'ascii' codec can't decode byte 0xc3 in position 43: ordinal not in range(128) in Python


I wrote a Python program that counts words in .txt files. It works on some English files, but when a file contains accented characters or ñ, it fails with this error:

  

'ascii' codec can't decode byte 0xc3 in position 43: ordinal not in range(128)

Does anyone know what the problem might be? Thanks in advance. :)

Code

archivo = input("Ingresa el nombre del archivo\n")
ahandle = open(archivo)
counts = dict()
for line in ahandle:
    words = line.split()
    for word in words:
        counts[word] = counts.get(word, 0) + 1
bigcount = None
bigword = None
for word, count in counts.items():
    if bigcount is None or count > bigcount:
        bigword = word
        bigcount = count
print(bigword, bigcount)
    
asked by Guillermo 23.02.2018 at 21:40

1 answer


First, it would help to know which version of Python you are using (2 or 3). Given the parentheses in print(), I will assume Python 3.

Second, you need to know the encoding of the file you are reading. On Linux or a modern Mac it is most likely utf-8, although it could be something else. On Windows it could be iso-8859-15 or utf-8 (unless the file was created with a console editor, which I will rule out as unlikely).
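As a side note, the 'ascii' in the error message is the codec Python chose on its own, not the file's encoding. A minimal sketch, assuming a typical Python 3 setup where open() in text mode without an encoding argument falls back to the locale's preferred encoding; the variable name default_encoding is only for illustration:

```python
import locale

# In most Python 3 versions, open() without an explicit encoding
# falls back to the locale's preferred encoding.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)
```

If this reports an ASCII variant, any byte of 128 or above in the file will trigger the error from the question.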

Since the error occurs on a byte with the value 0xc3, the file is most likely utf-8, because that is the lead byte utf-8 uses for accented characters and ñ.
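You can check this yourself with a short sketch (the names lead_bytes and message are only for illustration). It encodes a few Spanish characters as utf-8, records the first byte of each, and then reproduces the error by decoding those bytes as ASCII:

```python
# First byte of each character once encoded as UTF-8.
lead_bytes = {ch: ch.encode("utf-8")[0] for ch in "áéíóúñ"}
for ch, first in lead_bytes.items():
    print(ch, hex(first))

# Decoding UTF-8 bytes with the ASCII codec fails on that first byte,
# because ASCII only accepts values below 128.
message = None
try:
    "ñ".encode("utf-8").decode("ascii")
except UnicodeDecodeError as err:
    message = str(err)
print(message)
```

Every character in the loop reports 0xc3 as its first byte, which matches the byte named in the error.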

What you need to do is pass the file's encoding as a parameter to open(), for example:

ahandle = open(archivo, encoding="utf-8")

If it still fails, the file uses a different encoding.
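If you are not sure which encoding applies, one option is to try a short list of candidates in order. This is only a sketch: read_text is a hypothetical helper, not part of the standard library, and the candidate list is an assumption based on the encodings mentioned above. utf-8 goes first because a single-byte codec such as iso-8859-15 accepts any byte sequence and would therefore never fail, even on the wrong file.

```python
def read_text(path, encodings=("utf-8", "iso-8859-15")):
    """Return (text, encoding) for the first candidate that decodes cleanly.

    Hypothetical helper for illustration; adjust the candidates to your files.
    """
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_error = None
    for encoding in encodings:
        try:
            with open(path, encoding=encoding) as handle:
                return handle.read(), encoding
        except UnicodeDecodeError as err:
            # Remember the failure and move on to the next candidate.
            last_error = err
    raise last_error
```

A clean decode does not prove the guess is right, so it is still worth looking at the returned text to confirm that the accented characters display correctly.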

    
answered 23.02.2018 at 22:39