Error unpacking data from file with struct.unpack, incorrect length of string bytes


I'm writing a Python program that unpacks a series of records from a file in which they are stored line by line. As you can see, I'm using the struct module and its struct.unpack() function for that purpose.

The data on each line is: two single bytes, two 32-bit integers and one 16-bit integer, in that order (12 bytes in total).

The code that I have implemented is the following:

#!/usr/bin/python
import struct

f = open('Fichero', 'rb')

while(1):
    line = f.readline()
    if not line:
        break
    else:
        print(struct.unpack('!2b2ih', line))

f.close()

The problem is that when I run the code I get the following error:

  

struct.error: unpack requires a bytes object of length 12

I do not understand this error, because in principle line is 12 bytes long and the format string passed as the first argument seems to be correct.

    
asked by Kolorao 02.03.2018 at 17:39

1 answer


The !2b2ih format is the correct one for the data you say each line contains. The error is not so much in the code as in the design of the file itself.

There are two possible causes, which can occur together (leaving aside, of course, mistakes made when creating the file, so that each "line" does not contain what it is supposed to):

  • The readline method returns a bytes string that runs from the current cursor position up to and including the next EOL. It never strips the \r or \n character; that character is returned as part of the string. Therefore your line variable will not hold 12 bytes but 13 (or 14 if the line ending is CRLF).

  • Keep in mind that, from the point of view of the data, what you read from a file is just bytes. By themselves they mean nothing: nothing marks one byte as part of an integer and another as a character or a float. That meaning is assigned later. This is obvious but very important: the integer 10 in binary is 0000 1010 (0A in hexadecimal), exactly the same value as the line feed character (\n) in the ASCII table. What happens if the value 10 is stored in the file as a 16-bit int? When you use readline, the last byte of that integer is taken as a line break, and that is where the problem starts.
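Both points can be checked directly with struct. The following is only a small sketch: the variable names and the sample values 1, 2, 3 are illustrative and do not come from the question.

```python
import struct

fmt = '!2b2ih'

# The format describes 1 + 1 + 4 + 4 + 2 = 12 bytes per record.
record_size = struct.calcsize(fmt)

# The 16-bit integer 10 is packed as the bytes 0x00 0x0A, and 0x0A is '\n'.
packed_ten = struct.pack('!h', 10)
contains_newline = b'\n' in packed_ten

# A record followed by a line break is 13 bytes long, so unpacking it
# with the 12-byte format raises struct.error.
line = struct.pack(fmt, 65, 66, 1, 2, 3) + b'\n'
try:
    struct.unpack(fmt, line)
    unpack_failed = False
except struct.error:
    unpack_failed = True

print(record_size, packed_ten, contains_newline, len(line), unpack_failed)
```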

We can see this more graphically with an example. If we concatenate the byte 0100 0001, the byte 0100 0010, the 4-byte integer 14753, the 4-byte integer 10, the 2-byte integer 23 and a line break, we get the following string of bytes (hex):

  

b'\x41\x42\x00\x00\x39\xa1\x00\x00\x00\x0A\x00\x17\x0A'

readline reads from the current cursor position until it finds an LF character. If we go through the previous string with readline or readlines we get the following:

>>> from io import BytesIO
>>> file = BytesIO(b'\x41\x42\x00\x00\x39\xa1\x00\x00\x00\x0A\x00\x17\x0A')
>>> file.readline()
b'AB\x00\x009\xa1\x00\x00\x00\n'
>>> file.readline()
b'\x00\x17\n'

The integer 10 breaks the string in two: we end up with one "line" of 10 bytes and another of 3 bytes that contains the final \n.

To store your structure you should not use line breaks. Simply write each 12-byte string one after another. How do we unpack it later? Since we know the data comes in packets of 12 bytes, we just iterate over the file, taking 12 bytes at a time.

A very simple example that uses the struct module to create the file and then read it:

import struct



with open("Fichero", "wb") as f:
    f.write(struct.pack('!2b2ih', ord("A"), ord("B"), 107788, 17455, 23))
    f.write(struct.pack('!2b2ih', ord("C"), ord("D"), 19488, 431542, 5588))
    f.write(struct.pack('!2b2ih', ord("\n"), ord("F"), 47588, 42442, 77))

with open('Fichero', 'rb') as f:
    for st in struct.iter_unpack('!2b2ih', f.read()):
        print(st)

This parses the file without problems. The output is:

  

(65, 66, 107788, 17455, 23)
(67, 68, 19488, 431542, 5588)
(10, 70, 47588, 42442, 77)

The struct.iter_unpack function was added in Python 3.4. With an older version we can always write our own:

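def iter_unpack(fmt, buffer):
    # Size in bytes of one record described by fmt
    size = struct.calcsize(fmt)
    # Slice the buffer into consecutive chunks of that size
    gen = (buffer[i: i+size] for i in range(0, len(buffer), size))
    for chunk in gen:
        yield struct.unpack(fmt, chunk)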

with open('Fichero', 'rb') as f:
    for st in iter_unpack('!2b2ih', f.read()):
        print(st)
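Both versions above load the whole file with f.read(). For a large file, one possible variant reads one record at a time. This is only a sketch: read_records is a hypothetical helper name, and io.BytesIO stands in for the real file so the example is self-contained.

```python
import io
import struct

def read_records(f, fmt):
    """Yield one unpacked tuple per fixed-size record read from f."""
    record = struct.Struct(fmt)
    while True:
        chunk = f.read(record.size)
        if not chunk:
            break  # clean end of file
        if len(chunk) != record.size:
            raise ValueError('truncated record: got %d bytes' % len(chunk))
        yield record.unpack(chunk)

# In-memory stand-in for the file (assumption: two records, no separators).
data = (struct.pack('!2b2ih', 65, 66, 107788, 17455, 23)
        + struct.pack('!2b2ih', 10, 70, 47588, 42442, 77))
records = list(read_records(io.BytesIO(data), '!2b2ih'))
print(records)
```

With a real file, the same generator would be called as read_records(f, '!2b2ih') inside a with open('Fichero', 'rb') as f: block.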

If you did not create the file and therefore have to use it exactly as it is, you can still work around the problem, provided it is caused by what was described above.

The idea is to read the file as before, but taking packets of 13 bytes and removing or ignoring the final byte, which in principle will be the \n.

First we create a file in which each data packet is followed by a line break:

import struct



with open("Fichero", "wb") as f:
    f.write(struct.pack('!2b2ih', ord("A"), ord("B"), 107788, 17455, 23))
    f.write(b"\n")
    f.write(struct.pack('!2b2ih', ord("C"), ord("D"), 19488, 431542, 5588))
    f.write(b"\n")
    f.write(struct.pack('!2b2ih', ord("\n"), ord("F"), 47588, 42442, 77))
    f.write(b"\n")

If we try to read it:

with open('Fichero', 'rb') as f:
    for st in struct.iter_unpack('!2b2ih', f.read()):
        print(st)

We get:

Traceback (most recent call last):
  File "D:\test.py", line 14, in <module>
    for st in struct.iter_unpack('!2b2ih', f.read()):
struct.error: iterative unpacking requires a bytes length multiple of 12

Does the error look familiar? To solve it we can do the following:

with open('Fichero', 'rb') as f:
    for st in struct.iter_unpack('!2b2ihb', f.read()):
        st = st[:-1]
        print(st)

In the previous case we simply make struct.iter_unpack parse the LF character as one extra byte, which we then discard. Another, more manual option would be:

with open('Fichero', 'rb') as f:
    fmt = '!2b2ih'
    size = struct.calcsize(fmt) + 1
    dat = f.read()
    gen = (dat[i: i+size] for i in range(0, len(dat), size))
    for chunk in gen:
        print(struct.unpack(fmt, chunk[:-1]))

In both cases we are assuming that \n was added at the end of each data "packet". Logically, if \r\n is used or the last line has no EOL, the code would have to be adapted.
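One way such an adaptation might look is sketched below. unpack_separated and its sep parameter are hypothetical names, and the sample data is built in memory, assuming CRLF separators and no EOL after the last record.

```python
import struct

def unpack_separated(data, fmt, sep=b'\n'):
    """Unpack fixed-size records that are each followed by sep.

    The separator after the last record may be missing.
    """
    record = struct.Struct(fmt)
    step = record.size + len(sep)
    results = []
    for start in range(0, len(data), step):
        chunk = data[start:start + step]
        body, tail = chunk[:record.size], chunk[record.size:]
        if len(body) != record.size:
            raise ValueError('truncated record at offset %d' % start)
        # An empty tail can only occur on the final chunk (no trailing EOL).
        if tail not in (sep, b''):
            raise ValueError('unexpected separator at offset %d' % start)
        results.append(record.unpack(body))
    return results

rec1 = struct.pack('!2b2ih', 65, 66, 107788, 17455, 23)
rec2 = struct.pack('!2b2ih', 10, 70, 47588, 42442, 77)
crlf_data = rec1 + b'\r\n' + rec2  # CRLF separators, no final EOL
parsed = unpack_separated(crlf_data, '!2b2ih', sep=b'\r\n')
print(parsed)
```

Because the records are located by offset rather than by searching for the separator, a 0x0A byte inside a record (as in the second one here) does not cause trouble.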

    
answered by 02.03.2018 / 22:30