The `!2b2ih` format is the right one for the data you say each line contains. The error is not so much in the code as in the way the file itself is designed.
There are two possible causes for the problem, and they can occur at the same time. This leaves aside, of course, errors made when creating the file, or the possibility that each "line" does not contain what it is supposed to:
- The `readline` method returns a byte string from the current cursor position up to and including the next EOL. It never strips the `\r` or `\n` character; that character is returned as part of the string. Therefore, your variable `line` will not have 12 bytes. It will have 13, or 14 if the line ending is CRLF.
- Keep in mind that, from the data's point of view, the bytes you read from a file are just bytes. They carry no meaning by themselves, and nothing distinguishes an integer from a character or a float. That meaning is assigned later, when you interpret them. This is obvious but very important: the integer 10 in binary is `0000 1010` (`0A` in hexadecimal), exactly the same value as the line feed character (`\n`) in the ASCII table. What happens if a 10 is stored in the file as a 16-bit `int`? When you use `readline`, the last byte of that integer is taken as a line break, and there is the problem...
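Both points can be checked quickly. The following is a small sketch that uses an in-memory `io.BytesIO` buffer in place of the real file; the record values (65, 66, 1, 2, 3) are made up for illustration and were chosen so that none of their bytes is `0x0A`:

```python
import io
import struct

# Hypothetical 12-byte record (values chosen so no byte equals 0x0A).
record = struct.pack('!2b2ih', 65, 66, 1, 2, 3)
print(len(record))  # 12

# readline keeps the terminator: 12 + 1 with LF, 12 + 2 with CRLF.
lf_line = io.BytesIO(record + b'\n').readline()
crlf_line = io.BytesIO(record + b'\r\n').readline()
print(len(lf_line), len(crlf_line))  # 13 14

# A big-endian 16-bit 10 is the bytes 00 0A, and 0A is the LF byte.
print(struct.pack('!h', 10))  # b'\x00\n'
```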
We can see this more clearly with an example. If we concatenate the byte `0100 0001`, the byte `0100 0010`, the 4-byte integer `14753`, the 4-byte integer `10`, the 2-byte integer `23` and a line break, we get the following byte string (in hex):
```python
b'\x41\x42\x00\x00\x39\xa1\x00\x00\x00\x0A\x00\x17\x0A'
```
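Those values are exactly what the `!2b2ih` format describes: two bytes, two 4-byte integers and one 2-byte integer. Packing them with `struct` and appending a newline should reproduce the same 13 bytes. The following is a short check, with `packed` and `expected` used as illustrative names:

```python
import struct

# 'A', 'B', int 14753, int 10, short 23, then a real line break.
packed = struct.pack('!2b2ih', 0x41, 0x42, 14753, 10, 23) + b'\n'
expected = b'\x41\x42\x00\x00\x39\xa1\x00\x00\x00\x0A\x00\x17\x0A'

print(packed == expected)      # True
print(packed.hex())            # 4142000039a10000000a00170a
print(packed.count(b'\n'))     # 2: one inside the integer 10, one real EOL
```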
When `readline` goes through the file, it reads from the current cursor position until it finds an LF character. If we go through the previous string with `readline` or `readlines`, we obtain the following:
```python
>>> from io import BytesIO
>>> file = BytesIO(b'\x41\x42\x00\x00\x39\xa1\x00\x00\x00\x0A\x00\x17\x0A')
>>> file.readline()
b'AB\x00\x009\xa1\x00\x00\x00\n'
>>> file.readline()
b'\x00\x17\n'
```
The integer 10 breaks the string. We end up with one "line" of 10 bytes and another of 3 bytes that contains the final `\n`...
To store your structure, you should not use line breaks at all. Simply write each 12-byte string right after the previous one. How do we unpack it later? If we know the data comes in 12-byte packets, we just iterate over the file and take 12 bytes at a time.
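Reading packet by packet can also be done without loading the whole file into memory. The following is a minimal sketch. It assumes the file contains only back-to-back 12-byte records with no separators. `read_records` is a hypothetical helper name, and an in-memory buffer stands in for the real file:

```python
import io
import struct

RECORD = struct.Struct('!2b2ih')  # precompiled 12-byte record layout

def read_records(f):
    """Yield one tuple per fixed-size record read from binary file f."""
    while True:
        chunk = f.read(RECORD.size)
        if not chunk:
            break  # clean end of file
        if len(chunk) != RECORD.size:
            raise ValueError(f'truncated record: {len(chunk)} bytes')
        yield RECORD.unpack(chunk)

# Demo with an in-memory buffer instead of a real file.
data = RECORD.pack(65, 66, 107788, 17455, 23) + RECORD.pack(10, 70, 47588, 42442, 77)
for rec in read_records(io.BytesIO(data)):
    print(rec)
```

With a real file, you would call it as `with open('Fichero', 'rb') as f:` followed by `for rec in read_records(f): ...`. Reading 12 bytes at a time keeps memory use constant regardless of file size. Python's file buffering keeps the many small `read` calls cheap.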
Here is a very simple example that uses the `struct` module to create the file and then read it:
```python
import struct

with open("Fichero", "wb") as f:
    f.write(struct.pack('!2b2ih', ord("A"), ord("B"), 107788, 17455, 23))
    f.write(struct.pack('!2b2ih', ord("C"), ord("D"), 19488, 431542, 5588))
    f.write(struct.pack('!2b2ih', ord("\n"), ord("F"), 47588, 42442, 77))

with open('Fichero', 'rb') as f:
    for st in struct.iter_unpack('!2b2ih', f.read()):
        print(st)
```
This parses the file without problems, even the record that starts with `ord("\n")`. The output will be:
```
(65, 66, 107788, 17455, 23)
(67, 68, 19488, 431542, 5588)
(10, 70, 47588, 42442, 77)
```
The `struct.iter_unpack` function was added in Python 3.4. If you are using an older version, you can always write your own:
```python
import struct

def iter_unpack(fmt, buffer):
    # Split the buffer into fixed-size chunks and unpack each one.
    size = struct.calcsize(fmt)
    gen = (buffer[i: i+size] for i in range(0, len(buffer), size))
    for chunk in gen:
        yield struct.unpack(fmt, chunk)

with open('Fichero', 'rb') as f:
    for st in iter_unpack('!2b2ih', f.read()):
        print(st)
```
If you did not create the file yourself and you have to use it as it is no matter what, you can still work around the problem, provided it is caused by the issues described above.
The idea is to read the file as before, but in 13-byte packets, removing or ignoring the final byte, which in principle is the `\n`.
First, we create a file in which each data packet is followed by a newline:
```python
import struct

with open("Fichero", "wb") as f:
    f.write(struct.pack('!2b2ih', ord("A"), ord("B"), 107788, 17455, 23))
    f.write(b"\n")
    f.write(struct.pack('!2b2ih', ord("C"), ord("D"), 19488, 431542, 5588))
    f.write(b"\n")
    f.write(struct.pack('!2b2ih', ord("\n"), ord("F"), 47588, 42442, 77))
    f.write(b"\n")
```
If we try to read it as before:
```python
with open('Fichero', 'rb') as f:
    for st in struct.iter_unpack('!2b2ih', f.read()):
        print(st)
```
We get:
```
Traceback (most recent call last):
  File "D:\test.py", line 14, in <module>
    for st in struct.iter_unpack('!2b2ih', f.read()):
struct.error: iterative unpacking requires a bytes length multiple of 12
```
Does that error look familiar? To work around it, we can do the following:
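```python
with open('Fichero', 'rb') as f:
    # The extra trailing 'b' consumes the separator byte (13 bytes per packet).
    for st in struct.iter_unpack('!2b2ihb', f.read()):
        st = st[:-1]  # drop the separator
        print(st)
```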
Here we simply make `struct` parse the LF character as one more byte, which we then discard. Another, more manual option would be:
```python
with open('Fichero', 'rb') as f:
    fmt = '!2b2ih'
    size = struct.calcsize(fmt) + 1  # 12 data bytes + 1 separator byte
    dat = f.read()
    gen = (dat[i: i+size] for i in range(0, len(dat), size))
    for chunk in gen:
        print(struct.unpack(fmt, chunk[:-1]))
```
In both cases we assume that a `\n` was added at the end of each data "packet". If `\r\n` is used instead, or if the last line has no EOL, the code would have to be adapted accordingly.
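The following is one possible adaptation. It is a sketch under these assumptions: every record is exactly the size given by `fmt`, each record is followed by a known separator (`b'\n'` or `b'\r\n'`), and the separator may be missing after the last record. Unlike the versions above, it checks the separator bytes instead of discarding them blindly. `unpack_separated` is a hypothetical name:

```python
import struct

def unpack_separated(data, fmt='!2b2ih', sep=b'\n'):
    """Unpack fixed-size records each followed by sep; the final sep is optional."""
    size = struct.calcsize(fmt)
    step = size + len(sep)
    if len(data) % step == size:
        data += sep  # last record had no EOL: add one so every chunk is uniform
    if len(data) % step != 0:
        raise ValueError('data length does not match the record layout')
    records = []
    for i in range(0, len(data), step):
        chunk = data[i:i + step]
        if chunk[size:] != sep:
            raise ValueError(f'unexpected separator at offset {i + size}')
        records.append(struct.unpack(fmt, chunk[:size]))
    return records

# Usage: CRLF-separated records, with no EOL after the last one.
rec = struct.pack('!2b2ih', 65, 66, 107788, 17455, 23)
print(unpack_separated(rec + b'\r\n' + rec, sep=b'\r\n'))
```

Checking the separator makes a misaligned or corrupted file fail loudly. Ignoring the extra byte could instead silently shift every later record.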