I am trying to use word vectors trained with fastText as the word embedding in a neural network for the Kaggle competition on toxic comment classification. I was inspired by another competitor's network and tried to use different word vectors from the ones he used. In fact, I tried to use crawl-300d-2M.vec,
which no longer seems to be available, either on the Kaggle site or on the fastText site. I get the following message when I try to download it:
/tmp/mozilla_mike0/NrvYil8h.zip.part could not be saved, because the source file could not be read.
Then I tried to use another file:
EMBEDDING_FILE = '../FastText/wiki.en.vec' # was crawl-300d-2M.vec before
embeddings_index = dict(get_coefs(*o.strip().split())
for o in open(EMBEDDING_FILE, encoding="utf8"))
But it fails with the error "could not convert string to float: '·'":
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-16-15f34ea11d1d> in <module>()
26 print('Preparing Dictionary...')
27 # Read the FastText word vectors (space delimited strings) into a dictionary from word->vector
---> 28 embeddings_index = dict(get_coefs(*o.strip().split()) for o in open(EMBEDDING_FILE, encoding="utf8"))
29 print("embeddings_index size: ", len(embeddings_index))
30 dictionary = dict.fromkeys(embeddings_index, None)
<ipython-input-16-15f34ea11d1d> in <genexpr>(.0)
26 print('Preparing Dictionary...')
27 # Read the FastText word vectors (space delimited strings) into a dictionary from word->vector
---> 28 embeddings_index = dict(get_coefs(*o.strip().split()) for o in open(EMBEDDING_FILE, encoding="utf8"))
29 print("embeddings_index size: ", len(embeddings_index))
30 dictionary = dict.fromkeys(embeddings_index, None)
<ipython-input-7-4a3efa694941> in get_coefs(word, *arr)
1 def get_coefs(word, *arr):
----> 2 return word, np.asarray(arr, dtype='float32')
3
~/.local/lib/python3.5/site-packages/numpy/core/numeric.py in asarray(a, dtype, order)
529
530 """
--> 531 return array(a, dtype, copy=False, order=order)
532
533
ValueError: could not convert string to float: '·'
I also tried wiki.simple.vec and got almost the same message, but this time with ValueError: could not convert string to float: 'united'
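One possible explanation, which is an assumption and not something the traceback confirms, is that some tokens in these files contain whitespace themselves (an ordinary space or a non-breaking space). In that case `o.strip().split()` cuts the token into several pieces, and the second piece ('·' or 'united') is passed to `np.asarray` as if it were a number. The sketch below is a hypothetical replacement for the dictionary-building line: `load_vectors` is an illustrative name, not part of fastText. It splits on ASCII spaces only, treats the last `dim` fields as the vector and everything before them as the word, and skips the "count dimension" header line that fastText .vec files usually start with.

```python
import numpy as np

def load_vectors(lines, dim=None):
    """Parse fastText-style .vec text lines into a dict word -> vector.

    Hypothetical helper: assumes each data line is
    '<word> <v1> ... <v_dim>' separated by ASCII spaces, where <word>
    may itself contain whitespace. If dim is None it is taken from the
    header line, or else inferred from the first data line.
    """
    index = {}
    for raw in lines:
        # Strip only the line ending and trailing spaces; a plain strip()
        # would also remove non-breaking spaces that belong to the token.
        line = raw.rstrip(" \r\n")
        if not line:
            continue
        # Split on ASCII space only, so U+00A0 inside a token is preserved.
        parts = line.split(" ")
        if dim is None:
            # Header line of the form '<vocab_size> <dimension>'.
            if len(parts) == 2 and all(p.isdigit() for p in parts):
                dim = int(parts[1])
                continue
            dim = len(parts) - 1
        if dim < 1 or len(parts) <= dim:
            continue  # malformed line: not enough fields for a vector
        # The last `dim` fields are the coefficients; the rest is the word.
        word = " ".join(parts[:-dim])
        index[word] = np.asarray(parts[-dim:], dtype="float32")
    return index
```

With this helper, the failing line would become `embeddings_index = load_vectors(open(EMBEDDING_FILE, encoding="utf8"))`. If the cause is something else, printing the offending line inside a `try`/`except ValueError` around the conversion should show what the file actually contains at that point.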