How to collapse a data frame with R


Hi, I'm trying to collapse a data frame in R. I have three variables: site, species, and biomass. Each site has a list of species, and the same species can appear several times within a site, each time with its own biomass. I want each species to appear only once per site, with its biomass summed whenever it occurs more than once at that site. I was using the dplyr package and tried summarize() with group_by(), but I only get a list of all the species found across all sites, with their total biomass over all sites combined, not broken down by site. Any way of doing it in R would be fine. Thank you very much for your time.

asked by Nestor.S.M 26.03.2016 at 21:13

2 answers


Greetings. Including an example of the data speeds things up. It can be code that generates the data, or the data itself copied from the clipboard. Here is a question in English on the subject. Let's assume data like this:

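datos <- data.frame(sitio=paste0("sitio",sample(3,10,replace=TRUE)),
                    especie=paste0("e",sample(4,10,replace=TRUE)),  # assumed: the original line is cut off after sitio
                    biomasa=sample(18:21,10,replace=TRUE))          # assumed: chosen to resemble the output below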

        sitio especie biomasa
    1  sitio3      e3      18
    2  sitio3      e1      21
    3  sitio2      e4      20
    4  sitio3      e4      21
    5  sitio2      e2      21
    6  sitio1      e4      19
    7  sitio1      e1      20
    8  sitio1      e1      18
    9  sitio3      e1      20
    10 sitio1      e4      20

Rubén's response is excellent:

 library(dplyr)
 datos %>% group_by(sitio, especie) %>% summarise(biomasa = sum(biomasa))

   sitio especie biomasa
  (fctr)  (fctr)   (dbl)
1 sitio1      e1      38
2 sitio1      e4      39
3 sitio2      e2      21
4 sitio2      e4      20
5 sitio3      e1      41
6 sitio3      e3      18
7 sitio3      e4      21
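
For what it's worth, the symptom described in the question (one total per species over all sites) is what you get when grouping by especie alone. Here is a minimal sketch of the difference, assuming the example data above typed in by hand and dplyr 1.0 or later (for the .groups argument). The names por_especie and por_sitio_especie are just illustrative:

```r
library(dplyr)

# Example data copied by hand from the table above
datos <- data.frame(
  sitio   = c("sitio3", "sitio3", "sitio2", "sitio3", "sitio2",
              "sitio1", "sitio1", "sitio1", "sitio3", "sitio1"),
  especie = c("e3", "e1", "e4", "e4", "e2", "e4", "e1", "e1", "e1", "e4"),
  biomasa = c(18, 21, 20, 21, 21, 19, 20, 18, 20, 20)
)

# Grouping by species only: one row per species, all sites pooled together
por_especie <- datos %>%
  group_by(especie) %>%
  summarise(biomasa = sum(biomasa))

# Grouping by site and species: one row per species within each site
por_sitio_especie <- datos %>%
  group_by(sitio, especie) %>%
  summarise(biomasa = sum(biomasa), .groups = "drop")
```

The first result has 4 rows (one per species) and the second has 7 (one per site/species pair), which is the collapsed table asked for.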

However, I suggest data.table as an alternative; it is slightly faster and, to me, seems easier:

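 library(data.table)
 datos <- as.data.table(datos)   # the by= syntax below needs a data.table, not a plain data.frame
 datos[,sum(biomasa), by=c("especie","sitio")]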
    especie  sitio V1
 1:      e3 sitio3 18
 2:      e1 sitio3 41
 3:      e4 sitio2 20
 4:      e4 sitio3 21
 5:      e2 sitio2 21
 6:      e4 sitio1 39
 7:      e1 sitio1 38
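
If the default V1 column name and the unsorted rows are a nuisance, a small variation may help. This is only a sketch, assuming the same example data typed in by hand; resumen is an illustrative name. Wrapping the sum in .() lets you name the result column, and keyby groups and also sorts the result by the grouping columns:

```r
library(data.table)

# Example data copied by hand from the table above
datos <- data.table(
  sitio   = c("sitio3", "sitio3", "sitio2", "sitio3", "sitio2",
              "sitio1", "sitio1", "sitio1", "sitio3", "sitio1"),
  especie = c("e3", "e1", "e4", "e4", "e2", "e4", "e1", "e1", "e1", "e4"),
  biomasa = c(18, 21, 20, 21, 21, 19, 20, 18, 20, 20)
)

# .() names the summed column; keyby groups by site and species
# and returns the rows sorted by those two columns
resumen <- datos[, .(biomasa = sum(biomasa)), keyby = .(sitio, especie)]
```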

answered 12.05.2016 at 20:05

You should give more details about the problem (e.g. include some code).

If you use the dplyr package, keep in mind that you can group by more than one variable. Example:

datos %>% group_by(sitio, especie) %>% summarise(biomasa = sum(biomasa))
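
Since the question says any way of doing it in R is fine, the same idea of grouping by more than one variable also works without extra packages. A minimal sketch with base R's aggregate(), assuming made-up example data with the columns sitio, especie and biomasa (resumen is an illustrative name):

```r
# Made-up example data with one row per observation
datos <- data.frame(
  sitio   = c("sitio3", "sitio3", "sitio2", "sitio3", "sitio2",
              "sitio1", "sitio1", "sitio1", "sitio3", "sitio1"),
  especie = c("e3", "e1", "e4", "e4", "e2", "e4", "e1", "e1", "e1", "e4"),
  biomasa = c(18, 21, 20, 21, 21, 19, 20, 18, 20, 20)
)

# One row per site/species combination, with biomass summed
resumen <- aggregate(biomasa ~ sitio + especie, data = datos, FUN = sum)

# aggregate() uses its own row order; sort by site, then species
resumen <- resumen[order(resumen$sitio, resumen$especie), ]
```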

I hope this helps.

answered 01.04.2016 at 10:57