R - Twitter - json - lists

I've extracted about 3000 Twitter messages, including retweets, and now I want to analyze them. I saved the file in UTF-8 format without BOM. I did the original extraction as follows:

library(streamR);
source('credentials.R')
filterStream("tweets.json", track = c("Obama", "Putin"), timeout = 60, oauth = cred);

Then I load it into a list with the following instruction:

lista_mensajes_twitter <- readTweets("mensajes_twitter.json")

So far so good. Now I need to obtain the following, and I do not know how to do it:

  • Average length of text of captured tweets.
  • Correlation between the number of followers of each user and the number of RTs that the user has received (of those who have at least 1).
  • Correlation between the number of followers of each user and the number of replies received by that user (of those who have received at least 1).
asked by Groguet 25.05.2017 at 15:43

    1 answer

    To begin with, you save the messages as "tweets.json" but then load "mensajes_twitter.json"; be careful with details like this when writing the question. Also, if you want to count retweets, one minute is a very short time to capture that information.

    R reads the .json file as a list. Each list is made up of elements that can hold logical values, vectors, or further lists. Run str(lista_mensajes_twitter[[1]]) and you will see how the first tweet in the list is structured.

    That said, to compute the length of the first tweet you need to know which element holds the text.

    str(lista_mensajes_twitter[[1]], max.level = 1)
    
     List of 30
      $ created_at               : chr "Thu May 25 14:44:02 +0000 2017"
      $ id                       : num 8.68e+17
      $ id_str                   : chr "867753126899666944"
      $ text                     : chr "RT @HoopsOverHoes_: Bro in the first pic I  thought....nvm https://t.co/BVsMV3HTWQ"
      $ source                   : chr "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>"
      $ truncated                : logi FALSE
      $ in_reply_to_status_id    : NULL
      $ in_reply_to_status_id_str: NULL
      $ in_reply_to_user_id      : NULL
      $ in_reply_to_user_id_str  : NULL
      $ in_reply_to_screen_name  : NULL
      $ user                     :List of 38
      $ geo                      : NULL
      $ coordinates              : NULL
      $ place                    : NULL
      $ contributors             : NULL
      $ retweeted_status         :List of 29
      $ quoted_status_id         : num 8.67e+17
      $ quoted_status_id_str     : chr "867076974337982465"
      $ quoted_status            :List of 27
      $ is_quote_status          : logi TRUE
      $ retweet_count            : num 0
      $ favorite_count           : num 0
      $ entities                 :List of 4
      $ favorited                : logi FALSE
      $ retweeted                : logi FALSE
      $ possibly_sensitive       : logi FALSE
      $ filter_level             : chr "low"
      $ lang                     : chr "en"
      $ timestamp_ms             : chr "1495723442183"
    

    Knowing that the text is in the $text element (the fourth one in the list), we compute its length:

    nchar(lista_mensajes_twitter[[1]][['text']])
    81
    

    To apply it to every tweet, use a loop, one of the apply functions, or the functions of the purrr package (very powerful for handling lists):

    library(purrr)
    
    len_tweets <- lista_mensajes_twitter %>% map("text") %>% map_int(nchar)
    
    head(len_tweets)
    [1]  81 140 140 134 140 127
    

    The result is a vector with the number of characters per tweet.
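    The average length asked about in the question is then the mean of that vector. As a minimal sketch in base R, assuming a small made-up list (tweets is hypothetical and only mimics the shape of lista_mensajes_twitter), the same computation without purrr could look like this:

```r
# Hypothetical stand-in for lista_mensajes_twitter: one list per tweet,
# each with a "text" element, as shown in the str() output above.
tweets <- list(
  list(text = "RT @someone: hello"),
  list(text = "Obama and Putin met today"),
  list(text = "short")
)

# One character count per tweet (base R alternative to map() + map_int()).
len_tweets <- vapply(tweets, function(tw) nchar(tw[["text"]]), integer(1))

# Average tweet length across the captured tweets.
mean_len <- mean(len_tweets)
print(mean_len)
```

    With the real data, mean(len_tweets) on the vector built by the purrr pipeline gives the same kind of result.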

    To extract the number of followers and number of retweets:

    fllw <- lista_mensajes_twitter %>% map("user") %>% map_dbl("followers_count")
    
    rt <- lista_mensajes_twitter %>% map_dbl("retweet_count")
    

    The number of replies is harder to determine, since you have to match the id of each tweet against the in_reply_to_status_id of all the others.
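    One possible way to do that matching is sketched below on made-up data (tweets and its values are hypothetical). It compares the string fields id_str and in_reply_to_status_id_str rather than the numeric id, because ids of this size do not fit exactly in a double. It only counts replies that are themselves in the capture, so it is an approximation rather than a true reply count:

```r
# Hypothetical miniature of the tweet list; only the fields used here are set.
tweets <- list(
  list(id_str = "1", in_reply_to_status_id_str = NULL),
  list(id_str = "2", in_reply_to_status_id_str = "1"),
  list(id_str = "3", in_reply_to_status_id_str = "1"),
  list(id_str = "4", in_reply_to_status_id_str = "2")
)

# Id of every captured tweet.
ids <- vapply(tweets, function(tw) tw[["id_str"]], character(1))

# Id each tweet replies to, or NA when the field is NULL (not a reply).
reply_to <- vapply(tweets, function(tw) {
  x <- tw[["in_reply_to_status_id_str"]]
  if (is.null(x)) NA_character_ else x
}, character(1))

# For each captured tweet, count how many captured tweets reply to it.
n_replies <- vapply(ids, function(id) sum(reply_to == id, na.rm = TRUE),
                    integer(1), USE.NAMES = FALSE)
print(n_replies)
```

    To get replies per user instead of per tweet, the same idea should work with in_reply_to_user_id_str matched against the id_str inside each tweet's user element.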

    The correlation is then computed with cor on the resulting vectors.
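    As an illustration, the "at least 1" condition from the question can be applied with a logical filter before calling cor. The numbers below are invented; with the real data, followers and retweets would be the fllw and rt vectors, provided both are plain numeric vectors of the same length:

```r
# Hypothetical follower and retweet counts, one entry per tweet.
followers <- c(120, 3500, 80, 15000, 640)
retweets  <- c(0, 12, 1, 40, 0)

# Keep only the entries with at least one retweet, as the question requires.
keep <- retweets >= 1

# Pearson correlation (the default of cor) on the filtered vectors.
r <- cor(followers[keep], retweets[keep])
print(r)

# Follower counts tend to be very skewed, so a rank-based correlation
# may be worth comparing.
r_spearman <- cor(followers[keep], retweets[keep], method = "spearman")
print(r_spearman)
```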

        
    answered by 25.05.2017 / 18:57