use aggregate based on two columns of the data frame

1

I have a data matrix (dfA) with several columns: province (dfA is a subset for one of the provinces), municipality, crop, number of farms, area, production ... And I want to calculate the number of total farms , total area, total production ... for each crop in each municipality.

STRUCTURE of the data matrix:

    str(dfA)

'data.frame':   523 obs. of  8 variables:

$ CODPROV: Factor w/ 8 levels "4","11","14",..: 1 1 1 1 1 1 1 1 1 1 ...

$ MUN_INE: num  1 1 1 1 1 1 1 1 2 2 ...

$ OTE_NUM: Factor w/ 4 levels "15","20","36",..: 1 3 3 3 2 2 2 4 3 3 ...

$ NUM_EXP: num  1 1 7 14 2 2 3 20 3 8 ...

$ PRODEST: num  3510 830 1924 3388 696 ...

$ UTA_TOT: num  1 90 1705 2053 2613 ...

$ SUP_TOT: num  1395 15 820 1846 2519 ...

 $ SAU_TOT: num  1368 15 367 1779 2450 ...

FIRST LINES of the data matrix:

CODE that I have written so far:

aggregate (cbind(df$NUM_EXP,df$PRODEST,df$UTA_TOT,df$SAU_TOT) ~ df$OTE_NUM, data=dfA, FUN=sum)

However, I do NOT want to add only according to 'df$OTE_NUM' but according to 'df$MUN_INE' and 'df$OTE_NUM' . For this I tried to create a list:

aggregate (cbind(df$NUM_EXP,df$PRODEST,df$UTA_TOT,df$SAU_TOT) ~ list(df$MUN_INE, df$OTE_NUM), data=dfA, FUN=sum)

But entering the list gives me an error, how can I do it? I need something like this:

MANY THANKS for the help:)

    
asked by Laura Pradera 02.03.2017 в 18:40
source

2 answers

3

Good! I've already managed to solve it, I've done it this way:

dfA.sum<-aggregate (cbind(dfA$NUM_EXP,dfA$PRODEST,dfA$UTA_TOT,dfA$SAU_TOT) ~ dfA$MUN_INE + dfA$OTE_NUM, data=dfA, FUN=sum)

Thanks Toledo, jbkunst and Juan Bosco.

    
answered by 03.03.2017 в 08:35
1

As proposed @jbkunst , using the dplyr package is simpler and easier to interpret the code.

Here I try to reproduce some lines of your dataframe .

library(dplyr)

(dfA <- data_frame(CODPROV = c( 1,  1,  1,  1,  1,  1,  2,  2,  2,  3,  3,  3),
                   OTE_NUM = c(15, 36, 36, 20, 20, 20, 36, 36, 20, 36, 36, 36),
                   NUM_EXP = c( 1,  1,  7, 14,  2,  3,  3,  8, 45,  6,  1,  4)))

# A tibble: 12 × 3
   CODPROV OTE_NUM NUM_EXP
     <dbl>   <dbl>   <dbl>
1        1      15       1
2        1      36       1
3        1      36       7
4        1      20      14
5        1      20       2
6        1      20       3
7        2      36       3
8        2      36       8
9        2      20      45
10       3      36       6
11       3      36       1
12       3      36       4

Now I group according to CODPROV and OTE_NUM with group_by , and calculate the sum of NUM_EXP and call it SUM_NUM_EXP with function summarise .

dfA %>% 
  group_by(CODPROV, OTE_NUM) %>%
  summarise(SUM_NUM_EXP = sum(NUM_EXP))

  CODPROV OTE_NUM SUM_NUM_EXP
    <dbl>   <dbl>       <dbl>
1       1      15           1
2       1      20          19
3       1      36           8
4       2      20          45
5       2      36          11
6       3      36          11
    
answered by 06.03.2017 в 22:18