Diagnosis
Although the JSON pasted in the question is correct (except for a stray trailing comma introduced when copying it), when the user tries the same operations on his own JSON he gets a ValueError, which is not very informative.
After exchanging a few comments with the user, I obtained the JSON file he is actually working with, and I tried to replicate the execution of his code under Python 2 (the version the user is on). Indeed, although the supplied JSON looks correct, I get the error:
ValueError: No JSON object could be decoded
If, on the other hand, I repeat the run under Python 3, the diagnosis is much more precise and confirms my suspicion that there are hidden characters at the beginning of the file causing the problem:
json.decoder.JSONDecodeError: Unexpected UTF-8 BOM (decode using utf-8-sig): line 1 column 1 (char 0)
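The symptom can be reproduced without the user's file. In this sketch the JSON content is made up; the point is that the same three leading bytes trigger exactly the error above when the text is decoded as plain UTF-8:

```python
import json

# A minimal JSON document preceded by the UTF-8 BOM bytes (EF BB BF),
# as a Windows editor might have saved it. The content is invented.
bom_json = b"\xef\xbb\xbf" + b'{"clave": "valor"}'

try:
    # Decoding with plain "utf-8" keeps the BOM as the character U+FEFF,
    # which the JSON parser then rejects as unexpected input.
    json.loads(bom_json.decode("utf-8"))
except json.JSONDecodeError as e:
    print(e)  # Unexpected UTF-8 BOM (decode using utf-8-sig): line 1 column 1 (char 0)
```
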
The problem
The file begins with a sequence of bytes called a "BOM" (Byte Order Mark). These bytes are invisible when the file is displayed on screen or loaded into an editor, but not when it is read from a program.
The purpose of those bytes, if the file were in UTF-16, is to let the programs reading it deduce the endianness of the architecture on which the file was generated (i.e., whether it is little endian or big endian). However, in a UTF-8 file it makes no sense to include these bytes, because the UTF-8 format is immune to the endianness problem.
Nevertheless, many Windows editors and programs add these bytes anyway when saving as UTF-8, and that is why the file is not compatible with the JSON standard.
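For the curious, the standard library exposes these marks as constants, so it is easy to see the exact bytes involved. The BOM is the code point U+FEFF; in UTF-8 it always serializes to the same three bytes, which is precisely why it carries no byte-order information there:

```python
import codecs

# The UTF-8 serialization of U+FEFF: always these three bytes.
print(codecs.BOM_UTF8)            # b'\xef\xbb\xbf'
print("\ufeff".encode("utf-8"))   # the same three bytes

# In UTF-16 the two byte orders differ, which is what the mark disambiguates:
print(codecs.BOM_UTF16_LE)        # b'\xff\xfe'
print(codecs.BOM_UTF16_BE)        # b'\xfe\xff'
```
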
Solution
Using Python 3 it is possible to pass open() a parameter that specifies the encoding of the file to be read (if it is not passed, the platform's default encoding is assumed, usually utf-8). In this case we would have to pass utf-8-sig, as Python 3 itself tells us in its error message.
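Under Python 3, then, the fix is a one-liner. This sketch first creates a sample file with a BOM (the file name and content are invented) and then reads it back, letting open() strip the mark:

```python
import json

# Create a sample file with a BOM, as the user's editor would have
# (writing with utf-8-sig prepends the BOM automatically).
with open("datos.json", "w", encoding="utf-8-sig") as f:
    f.write('{"clave": "valor"}')

# In Python 3, telling open() the real encoding is all that is needed:
with open("datos.json", encoding="utf-8-sig") as f:
    data = json.load(f)

print(data)  # {'clave': 'valor'}
```
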
However, since the user is on Python 2, he cannot pass that parameter when opening the file, so we have no choice but to read the entire file into a byte string and then decode that string to Unicode using the encoding in question. We then use json.loads() instead of json.load(), since that way we can pass it the correctly decoded Unicode string instead of the file.
That is:
import json
with open("json.json") as f:
    raw_data = f.read()
# Decoding with utf-8-sig strips the BOM if present
data = json.loads(raw_data.decode("utf-8-sig"))
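A variant that works unchanged on both Python 2 and Python 3 is to open the file in binary mode, since that yields a byte string in either version. The sketch below creates a sample "json.json" with a BOM first, just so it is self-contained (the content is invented):

```python
import json

# Create a sample "json.json" with a BOM so the snippet runs on its own.
with open("json.json", "wb") as f:
    f.write(b"\xef\xbb\xbf" + b'{"clave": "valor"}')

# Binary mode returns bytes in both Python 2 and Python 3, so the same
# decode("utf-8-sig") call strips the BOM in either version.
with open("json.json", "rb") as f:
    data = json.loads(f.read().decode("utf-8-sig"))

print(data)  # {'clave': 'valor'}
```
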
This solution uses a bit more memory than the Python 3 one, because we hold both the raw byte string and its decoded copy before parsing the JSON, whereas in Python 3 open() takes care of the decoding for us; but since the file is not very large (61 K), that is no problem.