How can I unify words that are the same but are written differently?


For a program, I need to read the values from a file into a list.

The problem is that there are entries like "Real-Madrid", "real madrid", "madrid", etc. They all refer to the same thing, so they should appear only once. However, when I compare them in Python with a == b, the result says they are different because they are written differently.

Is there any way to detect that they should be treated as the same?

    
asked by arnold 10.02.2017 at 21:23

2 answers


You could search for the word inside each string, for example:

words = ["Real Madrid", "Real-Madrid", "madrid", "Other Word"]
selected_word = "madrid"

for word in words:
    # lower() makes the search case-insensitive; find() returns -1 if not found
    result = word.lower().find(selected_word)
    print(result)

You would get this output:

5
5
0
-1

This compares each element of the list with the chosen word and prints the position where it appears; if the word is not in the string, find() returns -1. lower() is used so that uppercase and lowercase letters do not affect the comparison.
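Building on the substring idea, here is a minimal sketch that actually collapses the list to one entry per name instead of only printing positions. The helpers normalize and unique_by_containment are hypothetical names, and two assumptions are baked in: spaces and punctuation such as hyphens never matter, and a name contained in another one refers to the same thing. The second assumption will over-merge short or generic words (a bare "real" would be treated as the same as "Real Madrid").

```python
def normalize(name):
    # Lowercase and keep only letters and digits, so "Real-Madrid",
    # "real madrid" and "RealMadrid" all become "realmadrid".
    return "".join(ch for ch in name.lower() if ch.isalnum())

def unique_by_containment(names):
    # Keep the first spelling seen; skip any later name whose normalized
    # form contains, or is contained in, a normalized form already kept.
    kept = []  # pairs of (normalized key, original spelling)
    for name in names:
        key = normalize(name)
        if not key:
            continue  # nothing left to compare, e.g. "---"
        if any(key in k or k in key for k, _ in kept):
            continue
        kept.append((key, name))
    return [original for _, original in kept]

words = ["Real Madrid", "Real-Madrid", "real madrid", "madrid", "Other Word"]
print(unique_by_containment(words))  # ['Real Madrid', 'Other Word']
```

Because the first spelling wins, the order of the input list decides which variant is kept: if "madrid" came first, it would be the one left in the result.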

    
answered by 10.02.2017 at 21:41

The Python standard library has a module for that: difflib.

from difflib import SequenceMatcher

def similar(a, b):
    # ratio() returns a similarity score between 0.0 and 1.0
    return SequenceMatcher(None, a, b).ratio()

print(similar("Real Madrid", "Real-Madrid"))  # 0.9090909090909091
print(similar("Real Madrid", "RealMadrid"))   # 0.9523809523809523
print(similar("Real Madrid", "Madrid"))       # 0.7058823529411765
print(similar("Real Madrid", "Salamanca"))    # 0.3
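To turn these scores into a deduplicated list, one option is to keep a name only if its similarity to every name already kept stays below some threshold. This is a minimal sketch; the names similarity and unique_by_similarity are hypothetical, and threshold=0.7 is an arbitrary assumption that you would need to tune on your own data. It also lowercases both strings first, which the similar() function above does not do.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Case-insensitive version of the ratio, so "madrid" vs "Madrid" is not penalized.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def unique_by_similarity(names, threshold=0.7):
    # Keep a name only if it is below the threshold against every kept name.
    # threshold=0.7 is an assumed starting point, not a recommended value.
    kept = []
    for name in names:
        if all(similarity(name, k) < threshold for k in kept):
            kept.append(name)
    return kept

words = ["Real Madrid", "Real-Madrid", "RealMadrid", "madrid", "Salamanca"]
print(unique_by_similarity(words))  # ['Real Madrid', 'Salamanca']
```

Note that "madrid" is merged only because its score against "Real Madrid" (about 0.706) barely clears 0.7, which shows how sensitive the result is to the threshold. difflib also provides get_close_matches(word, possibilities, n=3, cutoff=0.6), which uses the same ratio with a cutoff to pick the closest candidates for a single word.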
    
answered by 16.02.2017 at 15:58