Count the words repeated in a dictionary in Python

Question

1

I would like to count all the repeated words in this list of dictionaries and group them in a new one.

data = [{'text': 'Real Madrid', 'type': 'ORG'}, {'text': 'España', 'type': 'LOC'}, {'text': 'Real Madrid TV', 'type': 'MISC'}, {'text': 'España', 'type': 'PER'}, {'text': 'España', 'type': 'PER'}, {'text': 'Real Madrid', 'type': 'ORG'}, {'text': 'Atlético de Madrid', 'type': 'LOC'}, {'text': 'Real Madrid', 'type': 'ORG'}, {'text': 'Real Madrid', 'type': 'ORG'}, {'text': 'Real Mad', 'type': 'ORG'}, {'text': 'Ricardo Rodríguez', 'type': 'PER'}]

As you can see, Real Madrid is repeated 4 times, España is repeated 2 times, and the others appear once.

What I want is to create another dictionary with an added quantity field that holds the number of repetitions. That is, the new dictionary would have the fields text, type and quantity, and it would contain 6 entries.

This is my progress so far:

valor = data[0]['text']
del data[0]
count = 0
value_count = []
for x in data:
    if x['text'] == valor:
        value_count.append({'text': x['text'], 'type':x['type']})
        valor = x['text']
print (value_count)

Thanks

python python-3.x array diccionarios

asked by Alex Ancco Cahuana 10.08.2018 at 22:27

2 answers

1

You need to count how many times each text is repeated, keep the ones that appear more than once, and check that the text is not already in the new list before adding it:

data = [{'text': 'Real Madrid', 'type': 'ORG'}, 
        {'text': 'España', 'type': 'LOC'}, 
        {'text': 'Real Madrid TV', 'type': 'MISC'}, 
        {'text': 'España', 'type': 'PER'}, 
        {'text': 'España', 'type': 'PER'}, 
        {'text': 'Real Madrid', 'type': 'ORG'}, 
        {'text': 'Atlético de Madrid', 'type': 'LOC'}, 
        {'text': 'Real Madrid', 'type': 'ORG'}, 
        {'text': 'Real Madrid', 'type': 'ORG'}, 
        {'text': 'Real Mad', 'type': 'ORG'}, 
        {'text': 'Ricardo Rodríguez', 'type': 'PER'}
]

data2 = []

def contarRepeticiones(palabra):
    cont = 0
    for i in data:
        if i['text']==palabra:
            cont+=1
    return cont


def seEcuentraenData2(palabra):
    for i in data2:
        if i['text']==palabra:
            return True
    return False            


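# Add each repeated text once. Matching is on 'text' only,
# so 'type' is taken from the first occurrence of that text.
for i in data:
    cantidad = contarRepeticiones(i['text'])
    if cantidad > 1 and not seEcuentraenData2(i['text']):
        data2.append({'text': i['text'], 'type': i['type'], 'cantidad': cantidad})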

print (data2)
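
As a rough sketch, the same text-only count could also be done in a single pass with collections.Counter from the standard library, instead of rescanning data for every entry. The names counts and first_type are only illustrative, and the sketch assumes Python 3.7+ so that insertion order is preserved.

```python
from collections import Counter

data = [
    {'text': 'Real Madrid', 'type': 'ORG'}, {'text': 'España', 'type': 'LOC'},
    {'text': 'Real Madrid TV', 'type': 'MISC'}, {'text': 'España', 'type': 'PER'},
    {'text': 'España', 'type': 'PER'}, {'text': 'Real Madrid', 'type': 'ORG'},
    {'text': 'Atlético de Madrid', 'type': 'LOC'}, {'text': 'Real Madrid', 'type': 'ORG'},
    {'text': 'Real Madrid', 'type': 'ORG'}, {'text': 'Real Mad', 'type': 'ORG'},
    {'text': 'Ricardo Rodríguez', 'type': 'PER'},
]

# Count how many times each 'text' value appears (the type is ignored here).
counts = Counter(item['text'] for item in data)

# Remember the type of the first entry seen for each text (illustrative helper).
first_type = {}
for item in data:
    first_type.setdefault(item['text'], item['type'])

# Keep only the texts that appear more than once.
data2 = [
    {'text': text, 'type': first_type[text], 'cantidad': n}
    for text, n in counts.items()
    if n > 1
]
print(data2)
```

Because only text is compared, España is counted 3 times here and reported with the type of its first occurrence (LOC).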

answered 10.08.2018 at 22:47

score 1 · Accepted Answer

I hope this code helps. You can remove the if elem_count > 1: check if you want to count every entry, repeated or not.

data = [
{'text': 'Real Madrid', 'type': 'ORG'}, 
{'text': 'España', 'type': 'LOC'}, 
{'text': 'Real Madrid TV', 'type': 'MISC'}, 
{'text': 'España', 'type': 'PER'}, 
{'text': 'España', 'type': 'PER'}, 
{'text': 'Real Madrid', 'type': 'ORG'}, 
{'text': 'Atlético de Madrid', 'type': 'LOC'}, 
{'text': 'Real Madrid', 'type': 'ORG'}, 
{'text': 'Real Madrid', 'type': 'ORG'}, 
{'text': 'Real Mad', 'type': 'ORG'}, 
{'text': 'Ricardo Rodríguez', 'type': 'PER'}]

new_datalist = []
items_found = []
for element in data:
    if element not in items_found:
        # items_found collects the dicts already processed so they are not repeated
        items_found.append(element)
        elem_count = data.count(element)  # Count the identical dicts (same text and type)
        if elem_count > 1:
            # If it appears more than once, build the new dictionary
            new_elem = {}
            new_elem['text'] = element['text']
            new_elem['type'] = element['type']
            new_elem['cantidad'] = elem_count
            new_datalist.append(new_elem)
print(new_datalist)

Output:

[{'text': 'Real Madrid', 'type': 'ORG', 'cantidad': 4}, 
{'text': 'España', 'type': 'PER', 'cantidad': 2}]
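
For comparison, a minimal sketch of the same idea with collections.Counter. Dictionaries are not hashable, so this version assumes it is acceptable to key the counter on (text, type) tuples; pair_counts is just an illustrative name.

```python
from collections import Counter

data = [
    {'text': 'Real Madrid', 'type': 'ORG'}, {'text': 'España', 'type': 'LOC'},
    {'text': 'Real Madrid TV', 'type': 'MISC'}, {'text': 'España', 'type': 'PER'},
    {'text': 'España', 'type': 'PER'}, {'text': 'Real Madrid', 'type': 'ORG'},
    {'text': 'Atlético de Madrid', 'type': 'LOC'}, {'text': 'Real Madrid', 'type': 'ORG'},
    {'text': 'Real Madrid', 'type': 'ORG'}, {'text': 'Real Mad', 'type': 'ORG'},
    {'text': 'Ricardo Rodríguez', 'type': 'PER'},
]

# Count each (text, type) pair; tuples are hashable, dicts are not.
pair_counts = Counter((item['text'], item['type']) for item in data)

# Build the result, keeping only pairs that appear more than once.
new_datalist = [
    {'text': text, 'type': type_, 'cantidad': n}
    for (text, type_), n in pair_counts.items()
    if n > 1
]
print(new_datalist)
```

This counts in one pass and should give the same two entries as above: Real Madrid (ORG) with 4 and España (PER) with 2.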