Use of dplyr and summarize with missing values

I have a set of countries for which I observe the suicide rate over time, and I want the average suicide rate by country. I tried the code below; because there are missing values I also included na.rm, but it does not give me the averages. This is what I get:

Suicide.avg.per.country <- suicidedata %>% group_by(country_name) %>% summarise(AVG_SUICIDE = mean(suicidedata$suicidetotal), na.rm=T)

A tibble: 22 × 3
     country_name AVG_SUICIDE na.rm
            <chr>       <lgl> <lgl>
1            MKD*          NA  TRUE
2         Armenia          NA  TRUE
3      Azerbaijan          NA  TRUE
4         Belarus          NA  TRUE
5        Bulgaria          NA  TRUE
6  Czech Republic          NA  TRUE
7         Estonia          NA  TRUE
8         Georgia          NA  TRUE
9         Hungary          NA  TRUE
10     Kazakhstan          NA  TRUE
# ... with 12 more rows

However, when I try tapply I do get the expected results:

 tapply(suicidedata$suicidetotal, suicidedata$country_name, mean, na.rm=TRUE)
               MKD*             Armenia          Azerbaijan 
           4.996500            2.861031            1.767627 
            Belarus            Bulgaria      Czech Republic 
          46.165000           11.471000           16.550500 
            Estonia             Georgia             Hungary 
          38.774500            3.543044           30.316000 
         Kazakhstan          Kyrgyzstan              Latvia 
          40.884500           17.784500           41.274500 
          Lithuania              Poland Republic of Moldova 
          59.663500           21.236111           25.288500 
            Romania  Russian Federation            Slovakia 
          15.783500           48.499000           17.968889 
           Slovenia          Tajikistan             Ukraine 
          26.684500            4.652000           34.449500 
         Uzbekistan 
           9.869333 

Thanks in advance, Antonio

    
asked by Antonio 07.06.2017 at 16:57

2 answers

One possible problem is that the data is not being considered as numeric, but that is quickly resolved:

tapply(as.numeric(suicidedata$suicidetotal), suicidedata$country_name, mean, na.rm=TRUE)

There is also a simpler way using aggregate:

aggregate(as.numeric(suicidetotal) ~ country_name, suicidedata, mean)
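
As a rough illustration of why the aggregate call needs no na.rm: with the formula interface, aggregate drops rows containing NA before computing anything (its default is na.action = na.omit), whereas tapply needs na.rm passed through to mean. The sketch below uses a small made-up data frame called toy; its values are invented for illustration and are not taken from the question's data.

```r
# Hypothetical toy data; the values are invented for illustration only
toy <- data.frame(
  country_name = c("A", "A", "B", "B"),
  suicidetotal = c(10, NA, 20, 40)
)

# Formula interface: rows with NA are removed first (na.action = na.omit),
# so mean() never sees the missing value
by_formula <- aggregate(suicidetotal ~ country_name, data = toy, FUN = mean)

# tapply does not drop NAs itself, so na.rm must be forwarded to mean()
by_tapply <- tapply(toy$suicidetotal, toy$country_name, mean, na.rm = TRUE)

print(by_formula)
print(by_tapply)
```

One thing to keep in mind with this approach: because the NA rows are removed up front, a country whose values are all missing would not appear in the aggregate result at all.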
    
answered 07.06.2017 at 18:29

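I think the problem is that na.rm=T must be passed as an argument to the mean function, not to summarise. As written, summarise treats na.rm=T as the definition of a new column named na.rm, which is why the output contains a na.rm column filled with TRUE.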

That is, instead of:

Suicide.avg.per.country <- suicidedata %>% group_by(country_name) %>% 
summarise(AVG_SUICIDE = mean(suicidedata$suicidetotal), na.rm=T)

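write the following, which also refers to the column by its bare name (suicidetotal instead of suicidedata$suicidetotal) so that the mean is computed within each country rather than over the whole column:

Suicide.avg.per.country <- suicidedata %>% group_by(country_name) %>% 
summarise(AVG_SUICIDE = mean(suicidetotal, na.rm=TRUE))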
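
To make the difference concrete, here is a minimal sketch on a small made-up data frame (the name toy and its values are assumptions for illustration, not the question's data). It compares three variants: na.rm given to summarise, na.rm given to mean with the bare column name, and na.rm given to mean while still referring to the column through the data frame with $.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration only
toy <- data.frame(
  country_name = c("A", "A", "B", "B"),
  suicidetotal = c(10, NA, 20, 40)
)

# na.rm given to summarise(): it becomes a new column called na.rm,
# and mean() still returns NA for any group that contains an NA
wrong <- toy %>%
  group_by(country_name) %>%
  summarise(AVG_SUICIDE = mean(suicidetotal), na.rm = TRUE)

# na.rm given to mean(), bare column name: one mean per country
right <- toy %>%
  group_by(country_name) %>%
  summarise(AVG_SUICIDE = mean(suicidetotal, na.rm = TRUE))

# toy$suicidetotal bypasses the grouping: every country gets the overall mean
overall <- toy %>%
  group_by(country_name) %>%
  summarise(AVG_SUICIDE = mean(toy$suicidetotal, na.rm = TRUE))

print(wrong)
print(right)
print(overall)
```

In this sketch only the second variant gives a separate, NA-free average for each country; the third runs without error but repeats the same overall mean on every row, which is easy to miss in real data.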
    
answered 15.06.2017 at 00:11