fastText word vectors: "could not convert string to float"?


I am trying to use word vectors trained with fastText as the word embedding for a neural network in the Kaggle competition on toxic comment classification. I was inspired by another competitor's network and tried to use different word vectors than he did. The file in question was crawl-300d-2M.vec, which no longer seems to be available, at least not from the Kaggle site or the fastText site: I get the following message when I try to download it:

  

/tmp/mozilla_mike0/NrvYil8h.zip.part could not be saved, because the source file could not be read.

Then I tried to use another file:

EMBEDDING_FILE = '../FastText/wiki.en.vec' # was crawl-300d-2M.vec before
embeddings_index = dict(get_coefs(*o.strip().split()) 
    for o in open(EMBEDDING_FILE, encoding="utf8"))

But it fails, saying that a string could not be converted to float: '·'

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-15f34ea11d1d> in <module>()
     26 print('Preparing Dictionary...')
     27 # Read the FastText word vectors (space delimited strings) into a dictionary from word->vector
---> 28 embeddings_index = dict(get_coefs(*o.strip().split()) for o in open(EMBEDDING_FILE, encoding="utf8"))
     29 print("embeddings_index size: ", len(embeddings_index))
     30 dictionary = dict.fromkeys(embeddings_index, None)

<ipython-input-16-15f34ea11d1d> in <genexpr>(.0)
     26 print('Preparing Dictionary...')
     27 # Read the FastText word vectors (space delimited strings) into a dictionary from word->vector
---> 28 embeddings_index = dict(get_coefs(*o.strip().split()) for o in open(EMBEDDING_FILE, encoding="utf8"))
     29 print("embeddings_index size: ", len(embeddings_index))
     30 dictionary = dict.fromkeys(embeddings_index, None)

<ipython-input-7-4a3efa694941> in get_coefs(word, *arr)
      1 def get_coefs(word, *arr):
----> 2     return word, np.asarray(arr, dtype='float32')
      3 

~/.local/lib/python3.5/site-packages/numpy/core/numeric.py in asarray(a, dtype, order)
    529 
    530     """
--> 531     return array(a, dtype, copy=False, order=order)
    532 
    533 

ValueError: could not convert string to float: '·'

I also tried wiki.simple.vec and got almost the same message, but with a different token: ValueError: could not convert string to float: 'united'
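One possible cause worth checking (an assumption, not confirmed against these particular files): `o.strip().split()` with no argument splits on every Unicode whitespace character, including the no-break space U+00A0. If a token in the .vec file contains such a character, the token is cut in two and its second half lands among the numbers passed to `get_coefs`. The sketch below splits on the ASCII space only, from the right, so the last `dim` fields are always the numbers. The names `parse_vec_line` and `load_vectors` and the sample rows are hypothetical, and plain lists of floats stand in for `np.asarray(..., dtype='float32')` to keep the example dependency-free.

```python
def parse_vec_line(line, dim):
    """Split one .vec line into (token, values), keeping whitespace inside the token."""
    # Remove only the newline and any trailing ASCII spaces, then split from
    # the right on ' ' so that exactly the last `dim` fields are the numbers.
    parts = line.rstrip("\n").rstrip(" ").rsplit(" ", dim)
    if len(parts) != dim + 1:
        return None  # header line ("<vocab_size> <dim>") or malformed row
    token, values = parts[0], parts[1:]
    return token, [float(v) for v in values]


def load_vectors(lines, dim):
    """Build a token -> vector dict from an iterable of .vec lines."""
    index = {}
    for line in lines:
        parsed = parse_vec_line(line, dim)
        if parsed is not None:
            index[parsed[0]] = parsed[1]
    return index


# Tiny in-memory stand-in for a .vec file with dim=3 (made-up tokens and numbers).
sample = [
    "2 3\n",                          # header: vocabulary size and dimension
    "hello 0.1 0.2 0.3\n",
    "x\u00a0united 0.4 0.5 0.6 \n",  # hypothetical token containing U+00A0
]
vectors = load_vectors(sample, dim=3)

# What the original call would have passed to get_coefs for the last row:
naive_fields = sample[2].strip().split()
```

With the real file, the same function could be fed `open(EMBEDDING_FILE, encoding="utf8")` and `dim=300`. In the sample, `naive_fields` starts with `'x', 'united'`, so `get_coefs` would try to convert `'united'` to a float, which has the same shape as the error reported above.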

    
asked by ThePassenger 16.03.2018 at 06:24

0 answers