If I understood your request correctly, the following code should do it. I have renamed some of your variables to follow the usual Python conventions, under which an initial capital letter is reserved for class names (this convention and others are laid out in PEP 8, for which there is an unofficial Spanish translation).
valores = {}
with open("Sentimientos.txt") as sentimientos:
    for linea in sentimientos:
        termino, valor = linea.split("\t")
        valores[termino] = int(valor)

with open("salida_tweets.txt") as tweets:
    for i, linea in enumerate(tweets):
        total = 0
        for sentimiento, valor in valores.items():
            if sentimiento in linea:
                print("Se ha encontrado {} en el tweet de la linea {} (valor={})"
                      .format(sentimiento, i, valor))
                total += valor
        print("El tweet de la línea {} tiene un valor de {}".format(i, total))
This code calculates the sum of values of all the feelings found in each tweet, which I think is what you asked for.
Update
Now that the OP has provided an example of the contents of the salida_tweets.txt file, it turns out that the file holds one tweet per line, but each tweet is a JSON structure, not a plain text string.
I copy here part of the content provided by the OP:
{"delete":{"status":{"id":294512601600258048,"id_str":"294512601600258048","user_id":90681582,"user_id_str":"90681582"},"timestamp_ms":"1410368494083"}}
{"created_at":"Wed Sep 10 17:01:33 +0000 2014","id":509748524897292288,"id_str":"509748524897292288","text":"@Brenamae_ I WHALE SLAP YOUR FIN AND TELL YOU ONE LAST TIME: GO AWHALE","source":"\u003ca href=\"http:\/\/twitter.com\/download\/android\" rel=\"nofollow\"\u003eTwitter for Android\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":509748106015948800,"in_reply_to_status_id_str":"509748106015948800","in_reply_to_user_id":242563886,"in_reply_to_user_id_str":"242563886","in_reply_to_screen_name":"Brenamae_","user":{"id":175160659,"id_str":"175160659","name":"Butterfly","screen_name":"VanessaLilyWan","location":"Canada, Montreal","url":"http:\/\/instagram.com\/vanessalilywan","description":"British youtubers. 'Nuff said.","protected":false,"verified":false,"followers_count":118,"friends_count":180,"listed_count":2,"favourites_count":319,"statuses_count":10221,"created_at":"Thu Aug 05 20:03:16 +0000 2010","utc_offset":-36000,"time_zone":"Hawaii","geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"B2DFDA","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme13\/bg.gif","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme13\/bg.gif","profile_background_tile":false,"profile_link_color":"93A644","profile_sidebar_border_color":"EEEEEE","profile_sidebar_fill_color":"FFFFFF","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/470701406245376000\/2aXDrauR_normal.jpeg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/470701406245376000\/2aXDrauR_normal.jpeg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/175160659\/1404361640","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"trends":[],"urls":[],"user_mentions":[{"screen_name":"Brenamae_","name":"I-G-G-Bye","id":242563886,"id_str":"242563886","indices":[0,10]}],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"medium","lang":"en","timestamp_ms":"1410368493668"}
{"delete":{"status":{"id":204951917716189185,"id_str":"204951917716189185","user_id":496152394,"user_id_str":"496152394"},"timestamp_ms":"1410368494071"}}
Many of the lines in this example do not look like "true" tweets, since they do not contain the "text" field. In fact, the only line that looks like a true tweet is the one that starts with {"created_at"... The others look more like deletion actions.
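As a minimal sketch of that distinction, the snippet below parses two shortened stand-ins for the records above (most fields are omitted, so they are illustrative rather than the OP's exact lines) and checks each one for a top-level "text" key. The helper name es_tweet is hypothetical.

```python
import json

# Shortened stand-ins for the records above (most fields omitted for brevity).
lineas = [
    '{"delete":{"status":{"id":294512601600258048},"timestamp_ms":"1410368494083"}}',
    '{"created_at":"Wed Sep 10 17:01:33 +0000 2014","text":"GO AWHALE","timestamp_ms":"1410368493668"}',
]

def es_tweet(linea):
    """Return True if the JSON record on this line has a top-level "text" field."""
    return "text" in json.loads(linea)

for linea in lineas:
    print(es_tweet(linea))  # False for the deletion record, True for the tweet
```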
With this new information, I do not think that the initial approach of looking for certain words (feelings) in each whole line is the most appropriate one. Suppose, for example, that one of the keywords to look for is "time". This string appears in every line, because every record contains in its JSON the moment it was issued, in a field called "timestamp_ms". But I understand that what is wanted is only the tweets that use the word "time" as part of the tweet message, not anywhere in the complete JSON.
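The difference can be seen with a short sketch. The record below is a shortened, illustrative deletion notice (not the OP's exact line); searching the raw line finds "time" inside "timestamp_ms", while searching only the message text does not.

```python
import json

# Shortened, illustrative deletion record: it has no "text" field at all.
linea = '{"delete":{"status":{"id":294512601600258048},"timestamp_ms":"1410368494083"}}'

# Searching the raw line matches the "time" inside "timestamp_ms".
en_linea = "time" in linea

# Searching only the message text (empty when there is no "text" field) does not.
data = json.loads(linea)
en_texto = "time" in data.get("text", "")

print(en_linea, en_texto)  # True False
```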
Moreover, the code as written did not take into account that a feeling should be found even if it is written in capital letters in the tweet. For example, the only tweet that contains text (the second line of the example) reads:
@Brenamae_ I WHALE SLAP YOUR FIN AND TELL YOU ONE LAST TIME: GO AWHALE
which is entirely in capital letters (and, by coincidence, uses the word TIME that I mentioned earlier).
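A quick check on that text shows why the case matters: Python's in operator on strings is case-sensitive, so the lowercase keyword is only found once the text has been lowercased.

```python
texto = "@Brenamae_ I WHALE SLAP YOUR FIN AND TELL YOU ONE LAST TIME: GO AWHALE"

print("time" in texto)          # False: the search is case-sensitive
print("time" in texto.lower())  # True: found after lowercasing the text
```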
Therefore, a correct way to approach the problem in my opinion would be:
1. Read each line of the tweets file.
2. Parse the JSON contained in that line to obtain a Python dictionary.
3. Check whether that dictionary contains the "text" field. If not, ignore the line, since it is not a "true" tweet.
4. Take the "text" field, convert it to lowercase, and search it for feelings to compute the corresponding scores.
All of this is done by the following code, in which I have supplied the contents of some sample files as strings, so that anyone can try it and see that it works without having the files. To make it work on files instead of strings, it only remains to replace each io.StringIO() with an open() of the corresponding file.
contenido_tweets = r'''
{"delete":{"status":{"id":294512601600258048,"id_str":"294512601600258048","user_id":90681582,"user_id_str":"90681582"},"timestamp_ms":"1410368494083"}}
{"created_at":"Wed Sep 10 17:01:33 +0000 2014","id":509748524897292288,"id_str":"509748524897292288","text":"@Brenamae_ I WHALE SLAP YOUR FIN AND TELL YOU ONE LAST TIME: GO AWHALE","source":"\u003ca href=\"http:\/\/twitter.com\/download\/android\" rel=\"nofollow\"\u003eTwitter for Android\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":509748106015948800,"in_reply_to_status_id_str":"509748106015948800","in_reply_to_user_id":242563886,"in_reply_to_user_id_str":"242563886","in_reply_to_screen_name":"Brenamae_","user":{"id":175160659,"id_str":"175160659","name":"Butterfly","screen_name":"VanessaLilyWan","location":"Canada, Montreal","url":"http:\/\/instagram.com\/vanessalilywan","description":"British youtubers. 'Nuff said.","protected":false,"verified":false,"followers_count":118,"friends_count":180,"listed_count":2,"favourites_count":319,"statuses_count":10221,"created_at":"Thu Aug 05 20:03:16 +0000 2010","utc_offset":-36000,"time_zone":"Hawaii","geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"B2DFDA","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme13\/bg.gif","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme13\/bg.gif","profile_background_tile":false,"profile_link_color":"93A644","profile_sidebar_border_color":"EEEEEE","profile_sidebar_fill_color":"FFFFFF","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/470701406245376000\/2aXDrauR_normal.jpeg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/470701406245376000\/2aXDrauR_normal.jpeg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/175160659\/1404361640","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"trends":[],"urls":[],"user_mentions":[{"screen_name":"Brenamae_","name":"I-G-G-Bye","id":242563886,"id_str":"242563886","indices":[0,10]}],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"medium","lang":"en","timestamp_ms":"1410368493668"}
{"delete":{"status":{"id":204951917716189185,"id_str":"204951917716189185","user_id":496152394,"user_id_str":"496152394"},"timestamp_ms":"1410368494071"}}
{"delete":{"status":{"id":509733211497193473,"id_str":"509733211497193473","user_id":2328935617,"user_id_str":"2328935617"},"timestamp_ms":"1410368494165"}}
'''
contenido_sentimientos = '''
time\t5
slap\t2
whale\t3
'''
# ------------------------
import io
import json

sentimientos = io.StringIO(contenido_sentimientos)
valores = {}
for linea in sentimientos:
    linea = linea.strip()
    if not linea:
        continue  # Skip blank lines
    termino, valor = linea.split("\t")
    valores[termino.lower()] = int(valor)

tweets = io.StringIO(contenido_tweets)
for i, linea in enumerate(tweets):
    total = 0
    linea = linea.strip()
    if not linea:
        continue  # Skip empty lines
    # Parse the JSON on this line into a Python dictionary
    data = json.loads(linea)
    if "text" not in data:
        continue  # Skip lines that do not hold a tweet
    for sentimiento, valor in valores.items():
        if sentimiento in data["text"].lower():
            print("Se ha encontrado {} en el tweet de la linea {} (valor={})"
                  .format(sentimiento, i, valor))
            total += valor
    print("El tweet de la línea {} tiene un valor de {}".format(i, total))
The result that appears on the screen is:
Se ha encontrado time en el tweet de la linea 2 (valor=5)
Se ha encontrado slap en el tweet de la linea 2 (valor=2)
Se ha encontrado whale en el tweet de la linea 2 (valor=3)
El tweet de la línea 2 tiene un valor de 10
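For completeness, here is a rough sketch of how the same approach might look when reading from real files instead of strings. The function names cargar_valores and puntuar_tweets are hypothetical, and the UTF-8 encoding is an assumption about how the files were saved; adjust both to your case.

```python
import json

def cargar_valores(ruta):
    """Read a tab-separated file of "term<TAB>score" lines into a dict."""
    valores = {}
    # Assumption: the file is UTF-8 encoded.
    with open(ruta, encoding="utf-8") as f:
        for linea in f:
            linea = linea.strip()
            if not linea:
                continue  # skip blank lines
            termino, valor = linea.split("\t")
            valores[termino.lower()] = int(valor)
    return valores

def puntuar_tweets(ruta, valores):
    """Yield (line index, total score) for each line that holds a tweet."""
    with open(ruta, encoding="utf-8") as f:
        for i, linea in enumerate(f):
            linea = linea.strip()
            if not linea:
                continue  # skip empty lines
            data = json.loads(linea)
            if "text" not in data:
                continue  # not a tweet (for example, a deletion record)
            texto = data["text"].lower()
            # Add the score of every term that occurs in the lowercased text.
            yield i, sum(v for t, v in valores.items() if t in texto)
```

With the OP's file names, this would be used by calling cargar_valores("Sentimientos.txt") and then looping over puntuar_tweets("salida_tweets.txt", valores) to print each line index and its total. Returning the totals instead of printing them keeps the scoring separate from the output, which makes it easier to reuse.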