For situations like this, the dplyr package makes the transformation you need very easy. First create the groups (in this case, each combination of month and hour) and then compute a summary, in this case the sum of value for each of the groups you defined. I use pipes to chain the functions, but you could write it another way.
library(tidyverse) # This package in turn bundles 'dplyr' and other very useful data-manipulation packages.
tribble(
~mes, ~hour, ~value,
10, 1, 6,
10, 2, 9,
10, 1, 10,
10, 2, 12,
11, 1, 7,
11, 2, 2,
11, 1, 11,
11, 2, 3,
12, 1, 5,
12, 2, 1,
12, 1, 8,
12, 2, 4) -> datos # Create the data structure and assign it the name datos.
With this, the datos object is now in my environment.
datos %>% # The pipe passes the output of the previous expression (left-hand side) to the next function (right-hand side). Here it passes the datos object.
  group_by(mes, hour) %>% # Group by mes and hour. If you inspect this function's output, the data itself is unchanged, but it gains grouping attributes that the next function will use.
  summarise(suma = sum(value)) # Create a summary with the sum of value. The sum is computed separately for each of the groups defined above.
and I get:
# A tibble: 6 x 3
# Groups: mes [?]
mes hour suma
<dbl> <dbl> <dbl>
10 1 16
10 2 21
11 1 18
11 2 5
12 1 13
12 2 5
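The `# Groups: mes` line in that output reflects how grouping behaves: group_by() only attaches grouping metadata, and summarise() then removes the innermost grouping level (hour), so the result is still grouped by mes. Below is a minimal sketch for inspecting this. It assumes dplyr 1.0.0 or later for the `.groups` argument, rebuilds the same data with base data.frame() only to keep the block self-contained, and uses illustrative names (agrupado, resumen, plano) that are not part of the original answer.

```r
library(dplyr)

# Same values as the tribble above, rebuilt with base R (assumption: only for self-containment).
datos <- data.frame(
  mes   = rep(c(10, 11, 12), each = 4),
  hour  = rep(c(1, 2), times = 6),
  value = c(6, 9, 10, 12, 7, 2, 11, 3, 5, 1, 8, 4)
)

# group_by() leaves the rows untouched and only records the grouping columns.
agrupado <- group_by(datos, mes, hour)
group_vars(agrupado)  # "mes" "hour"

# summarise() collapses each group to one row and drops the last grouping level.
resumen <- summarise(agrupado, suma = sum(value))
group_vars(resumen)   # "mes"

# With .groups = "drop" the result carries no grouping at all.
plano <- summarise(agrupado, suma = sum(value), .groups = "drop")
group_vars(plano)     # character(0)
```

If later steps should not operate per month, dropping the leftover grouping (with `.groups = "drop"` or ungroup()) avoids surprises.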
After the last function you could use the right-hand assignment operator -> to name your output, if that is what you need. Or use the traditional assignment operator <- at the beginning of the entire chain. The result is the same: a data.frame (more precisely a tibble, which is a kind of data.frame). This is important because if you want to keep operating on it, you have a very manageable data structure.
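The three ways of writing and naming the chain mentioned in this answer can be sketched side by side. This is only an illustration: the object names (por_hora, por_hora_der, por_hora_anidado) are hypothetical, and the data is rebuilt with base data.frame() so the block runs on its own.

```r
library(dplyr)

# Same values as the tribble above, rebuilt with base R (assumption: only for self-containment).
datos <- data.frame(
  mes   = rep(c(10, 11, 12), each = 4),
  hour  = rep(c(1, 2), times = 6),
  value = c(6, 9, 10, 12, 7, 2, 11, 3, 5, 1, 8, 4)
)

# Traditional assignment at the beginning of the chain.
por_hora <- datos %>%
  group_by(mes, hour) %>%
  summarise(suma = sum(value))

# Right-hand assignment at the end of the chain.
datos %>%
  group_by(mes, hour) %>%
  summarise(suma = sum(value)) -> por_hora_der

# The same computation without pipes, as nested function calls.
por_hora_anidado <- summarise(group_by(datos, mes, hour), suma = sum(value))

# Whichever form is used, the result is a data.frame.
is.data.frame(por_hora)
```

Which form to use is mostly a matter of taste; the leading <- is the more common convention in R code.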
Other uses of summarise()
With group_by() and summarise() it is very easy to summarise the data, provided that it has the right structure.
For example, to get the average of value for each month:
datos %>%
  group_by(mes) %>%
  summarise(promedio = mean(value))
Or to identify the highest value for each hour within each month:
datos %>%
  group_by(mes, hour) %>%
  summarise(maximo = max(value))
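Several summaries can also be computed in a single summarise() call, one named argument per output column. A small sketch under the same assumptions as before (data rebuilt with base data.frame() for self-containment; resumen_mes and the column name n are illustrative choices, not part of the original answer):

```r
library(dplyr)

# Same values as the tribble above, rebuilt with base R (assumption: only for self-containment).
datos <- data.frame(
  mes   = rep(c(10, 11, 12), each = 4),
  hour  = rep(c(1, 2), times = 6),
  value = c(6, 9, 10, 12, 7, 2, 11, 3, 5, 1, 8, 4)
)

# One row per month, with three summary columns computed together.
resumen_mes <- datos %>%
  group_by(mes) %>%
  summarise(
    promedio = mean(value),  # average of value in the month
    maximo   = max(value),   # highest value in the month
    n        = n()           # number of rows in the month
  )

resumen_mes
```

For this data, the monthly averages are 9.25, 5.75, and 4.5, the maxima are 12, 11, and 8, and each month has 4 rows.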