Sum of a category within a data.frame

From a data.frame with the following characteristics:

df <- data.frame(mes = rep(10:12, each = 4),
                 hour = rep(1:2, 6),
                 value = sample(1:12))   # sample() is random, so value changes on every run

df

> df
   mes hour value
1   10    1     6
2   10    2     9
3   10    1    10
4   10    2    12
5   11    1     7
6   11    2     2
7   11    1    11
8   11    2     3
9   12    1     5
10  12    2     1
11  12    1     8
12  12    2     4 

I would like to generate another data.frame with the sum of "value" for each hour within each month, something like this:

>df2
   mes hour value
1   10    1    16
2   10    2    21
3   11    1    18
4   11    2     5
5   12    1    13
6   12    2     5 

Any ideas on how I can achieve it?

    
asked by Edgar Mora 30.11.2017 at 23:57

2 answers

For this kind of situation, the dplyr package makes the transformation you need very easy. Simply create the groups first (in this case, each combination of month and hour) and then summarise them, here with the sum of value for each of the groups you defined. I use pipes to chain the functions, but you could write it another way.

library(tidyverse)      # This package in turn bundles 'dplyr' and other packages that are very useful for data manipulation.

tribble(
~mes, ~hour, ~value,
 10,    1,     6,
 10,    2,     9,
 10,    1,    10,
 10,    2,    12,
 11,    1,     7,
 11,    2,     2,
 11,    1,    11,
 11,    2,     3,
 12,    1,     5,
 12,    2,     1,
 12,    1,     8,
 12,    2,     4) -> datos    # Create the data structure and assign it the name datos.

With this I already have the datos object in my environment.

# The pipe passes the output of the left-hand side to the function on the right-hand side. Here it passes the datos object.
datos %>%
  # Group by mes and hour. Printing this step shows the data unchanged, but it adds grouping attributes that the next function uses.
  group_by(mes, hour) %>%
  # Build a summary with the sum of value, computed separately for each of the groups defined above.
  summarise(suma = sum(value))

and I get:

# A tibble: 6 x 3
# Groups:   mes [?]
mes  hour  suma
<dbl> <dbl> <dbl>
  10     1    16
  10     2    21
  11     1    18
  11     2     5
  12     1    13
  12     2     5

After the last function you can use the right-hand assignment operator -> to name the output, if that is what you need, or use the traditional assignment operator <- at the beginning of the whole chain. The result is the same either way: a tibble, which is also a data.frame. This matters because, if you want to keep operating on the result, you have a very manageable data structure.
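As a minimal sketch of the two assignment styles described above: the names df2, df2_right and df2_plain are made up for illustration, the data are the question's values typed in by hand, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# The question's data, typed in by hand so the result is reproducible
datos <- data.frame(
  mes   = rep(10:12, each = 4),
  hour  = rep(1:2, 6),
  value = c(6, 9, 10, 12, 7, 2, 11, 3, 5, 1, 8, 4)
)

# Leftward assignment at the start of the chain
df2 <- datos %>%
  group_by(mes, hour) %>%
  summarise(suma = sum(value), .groups = "drop")  # .groups = "drop" returns an ungrouped tibble

# Rightward assignment at the end of the chain yields the same object
datos %>%
  group_by(mes, hour) %>%
  summarise(suma = sum(value), .groups = "drop") -> df2_right

# A tibble is already a data.frame; as.data.frame() only drops the tibble classes
df2_plain <- as.data.frame(df2)
```

Dropping the grouping at the end is a design choice: it keeps later operations on df2 from being silently applied per month.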

Other uses of summarise()

With group_by() and summarise() it is very easy to summarise the data, provided they have the right structure (one row per observation). For example, to compute the average value in each month:

datos %>% 
  group_by(mes) %>% 
  summarise(promedio=mean(value))

Or to find the highest value for each hour in each month:

datos %>% 
  group_by(mes, hour) %>% 
  summarise(maximo=max(value))
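Several summaries can also be computed in a single summarise() call. The following is only a sketch on the question's values; the name resumen and the output column names are chosen here for illustration.

```r
library(dplyr)

# The question's data, typed in by hand so the result is reproducible
datos <- data.frame(
  mes   = rep(10:12, each = 4),
  hour  = rep(1:2, 6),
  value = c(6, 9, 10, 12, 7, 2, 11, 3, 5, 1, 8, 4)
)

# One row per month, with several summary columns computed at once
resumen <- datos %>%
  group_by(mes) %>%
  summarise(
    suma     = sum(value),   # total per month
    promedio = mean(value),  # average per month
    maximo   = max(value),   # highest value per month
    n        = n()           # number of rows in each group
  )

print(resumen)
```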
    
answered by 01.12.2017 / 01:14

As an alternative to mpaladino's good answer, I can't help but point out the solution using base R:

aggregate(value~mes+hour, data=df, FUN=sum)

Explanation:

  • aggregate() is the base R function for splitting a data.frame into groups and applying a function to each group
  • It can be invoked in many ways; the formula interface used here, value~mes+hour, is the most convenient because it returns a data.frame similar to the one you are looking for
  • The function sum() is applied to each group
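To illustrate the point about the different ways of invoking aggregate(), here is a small sketch comparing the formula interface with the by = interface. It uses the question's values typed in by hand, and the names res_formula and res_by are made up for the example.

```r
# The question's data, typed in by hand so the result is reproducible
df <- data.frame(
  mes   = rep(10:12, each = 4),
  hour  = rep(1:2, 6),
  value = c(6, 9, 10, 12, 7, 2, 11, 3, 5, 1, 8, 4)
)

# Formula interface: the summarised column keeps its name (value)
res_formula <- aggregate(value ~ mes + hour, data = df, FUN = sum)

# by = interface: same groups and sums, but the summarised column is named x
res_by <- aggregate(df$value, by = list(mes = df$mes, hour = df$hour), FUN = sum)

print(res_formula)
print(res_by)
```

Both calls return the groups with the first grouping variable varying fastest, so the rows come out ordered by hour and then by mes.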

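The only issue I see is that the default output is not in the order you would expect: with value~mes+hour the rows come out sorted by hour first and then by mes. We could sort the result through an intermediate object or, even easier, adjust the call to aggregate() a bit by swapping the grouping variables in the formula and then reordering the columns (the values below differ from the question's because df is built with sample()):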

> df
   mes hour value
1   10    1     8
2   10    2     5
3   10    1     4
4   10    2    11
5   11    1     2
6   11    2     3
7   11    1    12
8   11    2    10
9   12    1     6
10  12    2     7
11  12    1     9
12  12    2     1

> aggregate(value~hour+mes, data=df, FUN=sum)[,c(2,1,3)]
  mes hour value
1  10    1    12
2  10    2    16
3  11    1    14
4  11    2    13
5  12    1    15
6  12    2     8
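The other option mentioned above, sorting through an intermediate object, could look like the sketch below. It uses the df printed in this answer, typed in by hand; res and res_sorted are names chosen for illustration.

```r
# The df printed in this answer, typed in by hand so the result is reproducible
df <- data.frame(
  mes   = rep(10:12, each = 4),
  hour  = rep(1:2, 6),
  value = c(8, 5, 4, 11, 2, 3, 12, 10, 6, 7, 9, 1)
)

# Intermediate object: rows come out ordered by hour, then mes
res <- aggregate(value ~ mes + hour, data = df, FUN = sum)

# Reorder the rows by mes, then hour, and reset the row names
res_sorted <- res[order(res$mes, res$hour), ]
rownames(res_sorted) <- NULL

print(res_sorted)
```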
    
answered by 01.12.2017 at 02:25