Using dplyr
As I understand it, you want the first record of each id. For that you can simply use slice():
library(dplyr)
# Extend your example a bit with one more column
data <- data.frame(id = c(1, 1, 3, 4), n = c("x", "y", "e", "w"), o = c(4, 3, 2, 1))
data
id n o
1 1 x 4
2 1 y 3
3 3 e 2
4 4 w 1
data %>%
group_by(id) %>%
slice(1)
# A tibble: 3 x 3
# Groups: id [3]
id n o
<dbl> <fct> <dbl>
1 1. x 4.
2 3. e 2.
3 4. w 1.
The real question is which record of each group you want to keep and which ones to discard. slice() is a simple and clear option, but if you need to control the ordering that decides which record counts as the first, top_n() can be a much better choice, since its optional wt parameter lets you specify the column used to rank the rows within each group.
data %>%
group_by(id) %>%
top_n(n = 1, wt = -o) # Rank by column 'o' ascending: negating it keeps the smallest 'o' per group
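The same idea can be sketched with slice_min(), which supersedes top_n() in dplyr 1.0.0 and later (this assumes such a version is installed). The name `first_by_o` and the choice of `with_ties = FALSE` are illustrative additions, made so that exactly one row per group comes back even when values of `o` tie.

```r
library(dplyr)

data <- data.frame(id = c(1, 1, 3, 4),
                   n  = c("x", "y", "e", "w"),
                   o  = c(4, 3, 2, 1))

# For each id, keep the single row with the smallest value of 'o'.
first_by_o <- data %>%
  group_by(id) %>%
  slice_min(o, n = 1, with_ties = FALSE) %>%
  ungroup()

first_by_o
```

For id 1 this keeps the row with n = "y", because its `o` (3) is smaller than that of the "x" row (4).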
You can also check the English site: the equivalent question there has many other interesting ways to solve this.
Using base R
I couldn't resist working out how to do it in base R. It is not too complicated either:
aggregate(. ~id, data, function(x){head(x, 1)})
id n o
1 1 3 4
2 3 1 2
3 4 2 1
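Note that in this example the column n is a factor (the default for data.frame() before R 4.0.0), so it comes back as its integer codes. We therefore need to convert the columns to character first: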
aggregate(.~id, data.frame(lapply(data, as.character), stringsAsFactors=FALSE), function(x){head(x, 1)})
id n o
1 1 x 4
2 3 e 2
3 4 w 1
This works like slice(): head(x, 1) simply keeps the first record according to the existing row order of the object. If we need a different order to decide which record to keep, we must first sort the data.frame by that criterion.
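As a minimal sketch of sorting first, assume we want the row with the smallest `o` for each `id`. The names `sorted` and `first_rows` are only illustrative, and the first row per group is picked here with duplicated() rather than aggregate(), so that the column types are left untouched.

```r
data <- data.frame(id = c(1, 1, 3, 4),
                   n  = c("x", "y", "e", "w"),
                   o  = c(4, 3, 2, 1),
                   stringsAsFactors = FALSE)

# Sort by id, then by 'o' ascending, so the wanted row comes first in each group.
sorted <- data[order(data$id, data$o), ]

# Keep only the first row seen for each id in the sorted data.
first_rows <- sorted[!duplicated(sorted$id), ]
first_rows
```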
Finally, I almost forgot duplicated(), which is even simpler:
data[!duplicated(data$id),]
And I had also forgotten about unique(), thanks @mpaladino:
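data[match(unique(data$id), data$id), ] # unique() returns id values, not row positions, so match() maps each id to its first row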