Delete rows with duplicate id in R

I have data like the following:

data<-data.frame(id=c(1,1,3,4),n=c("x","y","e","w"))
data
  id n
1  1 x
2  1 y
3  3 e
4  4 w

I want to get an output like the following:

data
  id n
1  1 x
3  3 e
4  4 w

That is, I want unique ids regardless of the content of the other columns. So far I have achieved it with:

library(dplyr)
data<-group_by(data,id)%>%summarise(n=n[1])
data
# A tibble: 3 x 2
     id      n
  <dbl> <fctr>
1     1      x
2     3      e
3     4      w

However, when there are more columns like "n", this is impractical. My question is whether there is a way to do this that depends only on the "id" column.

    
asked by Rolando Tamayo 09.05.2018 at 18:48

1 answer

Using dplyr

If I understand correctly, you want to keep the first record of each id. For that you can simply use slice():

# Extend your example a little with one more column
data<-data.frame(id=c(1,1,3,4),n=c("x","y","e","w"), o=c(4,3,2,1))
data

  id n o
1  1 x 4
2  1 y 3
3  3 e 2
4  4 w 1

data %>%
    group_by(id) %>%
    slice(1)

# A tibble: 3 x 3
# Groups:   id [3]
     id n         o
  <dbl> <fct> <dbl>
1    1. x        4.
2    3. e        2.
3    4. w        1.
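
For the original question (deduplicating on id alone while keeping every other column), dplyr also has distinct() with .keep_all = TRUE. A minimal sketch, assuming the same toy data as above; the name first_per_id is only illustrative:

```r
library(dplyr)

# Same toy data as in the example above
data <- data.frame(id = c(1, 1, 3, 4),
                   n  = c("x", "y", "e", "w"),
                   o  = c(4, 3, 2, 1))

# Keep the first row found for each distinct id;
# .keep_all = TRUE retains all the other columns of that row
first_per_id <- distinct(data, id, .keep_all = TRUE)
first_per_id
```

Like slice(1), this keeps whichever row comes first in the current row order, so it should give the same three rows here.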

The real question is which record of each group you want to keep and which ones to discard. slice() is a simple and clear option, but if you need to control the order that decides which record comes first, top_n() may be a better fit, since its optional wt parameter lets you choose the column used to rank the rows within each group.

data %>%
    group_by(id) %>%
    top_n(n = 1, wt=-o) # Rank by column 'o' in ascending order
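
In dplyr 1.0.0 and later, top_n() is marked as superseded in favour of slice_min() and slice_max(). A rough equivalent of the call above, assuming a recent dplyr is installed; with_ties = FALSE is an extra choice made here so that exactly one row per id is returned, and lowest_o is just an illustrative name:

```r
library(dplyr)

# Same toy data as in the example above
data <- data.frame(id = c(1, 1, 3, 4),
                   n  = c("x", "y", "e", "w"),
                   o  = c(4, 3, 2, 1))

# For each id, keep the single row with the smallest value of 'o'
lowest_o <- data %>%
    group_by(id) %>%
    slice_min(o, n = 1, with_ties = FALSE) %>%
    ungroup()
lowest_o
```

For id 1 this keeps the row with o = 3 (n = "y") rather than the first row, which is the difference from slice(1).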

You can also check the English-language site, where this question has many other interesting ways to solve the problem.

Using base R

I can't resist showing how to do it in base R. It is not too complicated either:

aggregate(. ~id, data, function(x){head(x, 1)})

  id n o
1  1 3 4
2  3 1 2
3  4 2 1

As the output shows, one of the columns in this example is a factor, so aggregate() returns its integer codes instead of the labels. We need to convert it first:

aggregate(.~id, data.frame(lapply(data, as.character), stringsAsFactors=FALSE), function(x){head(x, 1)})

  id n o
1  1 x 4
2  3 e 2
3  4 w 1

This works like slice(): head(x, 1) simply keeps the first record according to the natural order of the object. If a different order should decide which record to keep, the data.frame must be sorted by that criterion first.

And finally, I was forgetting duplicated(), which is even simpler:

data[!duplicated(data$id),]
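
If a record other than the first one should be kept, the sorting step mentioned earlier can be combined with duplicated(). A minimal base R sketch, assuming we want the row with the smallest 'o' for each id; the names sorted and result are illustrative:

```r
# Same toy data as above, with 'n' kept as character
data <- data.frame(id = c(1, 1, 3, 4),
                   n  = c("x", "y", "e", "w"),
                   o  = c(4, 3, 2, 1),
                   stringsAsFactors = FALSE)

# Sort by id, then by 'o' ascending, so the row to keep comes first within each id
sorted <- data[order(data$id, data$o), ]

# duplicated() flags every repeat of an id after its first occurrence;
# negating it keeps only the first row per id
result <- sorted[!duplicated(sorted$id), ]
result
```

duplicated() also accepts fromLast = TRUE, which keeps the last occurrence of each id instead of the first without any re-sorting.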

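And I had also forgotten about unique(), thanks @mpaladino. Note that unique(data$id) returns id values rather than row positions, so it needs match() to be usable as a row index:

data[match(unique(data$id), data$id),]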
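
Indexing rows directly with unique(data$id) treats the id values as row numbers, so it returns the expected rows only when ids happen to coincide with row positions, as they do in the example data. A small check with hypothetical ids (10, 30, 40) that do not match any row position; df, by_value and by_position are illustrative names:

```r
# Hypothetical data whose ids are not valid row positions
df <- data.frame(id = c(10, 10, 30, 40),
                 n  = c("x", "y", "e", "w"),
                 stringsAsFactors = FALSE)

# unique() returns the values 10, 30, 40, which are then used as row numbers;
# those rows do not exist, so every returned row is filled with NA
by_value <- df[unique(df$id), ]

# match() converts each unique id into the position of its first occurrence
by_position <- df[match(unique(df$id), df$id), ]
by_position
```

The match() form selects the same rows as df[!duplicated(df$id), ].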
    
answered by 09.05.2018 / 19:36