Delete rows that contain values greater than the previous row

I have a very large CSV file with the following simplified structure:

Data

Here is a sample of the data table:

Ronda   Tratamiento
5        A
6        A
7        A
6        A
7        A
3        B
4        B
5        B
6        B
7        B
6        C
7        C
6        C
7        C
5        C
6        C
7        C
6        B
7        B

The Ronda (round) column contains values that increase from an arbitrary starting value up to 7. I would like to remove every row whose Ronda value is greater than the value in the previous row. In other words, I want to keep only the rows that hold the first value of each ascending series that runs up to 7.

For the example above, the result should look like this:

Ronda   Tratamiento
5        A
6        A
3        B
6        C
6        C
5        C
6        B

I tried to create an additional column that marks which rows to keep, with the idea of deleting the rest afterwards. However, I am not familiar with these operations in R, and there is surely a more effective and direct method. In any case, here is my attempt:

v <- c()
data$Round <- as.numeric(data$Round)
for (i in seq(1:(nrow(t)))) {
  v <-c(v, ifelse(t[i+1,5]> t[i,5], 1,0))
}
t$keep <- v
t<- t[!(t$keep==0),]

Thanks for the help,

    
asked by pyring 22.07.2018 at 17:05

1 answer

Looking at the problem another way, we could say that what you want is the rows in which Ronda takes the minimum value for its Tratamiento.

We will start with these data:

df <- data.frame(Ronda=c(5,6,7,5,6,7,3,4,5,6,7,7,7,6,7,6,6),
                 Tratamiento=c('A','A','A','A','A','A','B','B','B','B','B',
                               'C','C','C','C','C','C'))

In base R this can be solved as follows:

  • We obtain the minimum value of Ronda for each Tratamiento using aggregate() with the function min():

    minimos <- aggregate(Ronda ~ Tratamiento, df, min)
    
    minimos
      Tratamiento Ronda
    1           A     5
    2           B     3
    3           C     6
    
  • With the minimum of each group in hand, we now filter the rows of the main data.frame. There are several ways to do this; a very simple one is merge(), which performs what SQL calls an INNER JOIN, that is, it keeps only the rows present in both data sets:

    merge(df,minimos,by = c("Ronda","Tratamiento"))
      Ronda Tratamiento
    1     3           B
    2     5           A
    3     5           A
    4     6           C
    5     6           C
    6     6           C
    
  • You can also use dplyr, which is highly recommended because it makes the previous operations much more readable:

    library(dplyr)
    
    df %>% 
        group_by(Tratamiento) %>%    # Group by Tratamiento
        filter(Ronda == min(Ronda))  # Keep only rows where Ronda is the group minimum
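
For comparison, the same group-wise filter can also be sketched in base R with ave(), which returns the per-group minimum aligned with the original rows. Unlike merge(), which sorts the result by the key columns, this keeps the rows in their original order. The names minimo_grupo and filas_min are only illustrative and do not come from the answer.

```r
# Same sample data as in the answer above
df <- data.frame(Ronda = c(5,6,7,5,6,7,3,4,5,6,7,7,7,6,7,6,6),
                 Tratamiento = c('A','A','A','A','A','A','B','B','B','B','B',
                                 'C','C','C','C','C','C'))

# ave() returns, for every row, the minimum Ronda of that row's Tratamiento
minimo_grupo <- ave(df$Ronda, df$Tratamiento, FUN = min)

# Keep only the rows whose Ronda equals the minimum of their group
filas_min <- df[df$Ronda == minimo_grupo, ]
filas_min
```

The result contains the same six rows as the merge() approach, listed in the order in which they appear in df.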
    

    Edit:

    Based on your latest clarification, what you need is the first row of each ascending series. In that case I think base R is simpler, because it only takes a bit of vector arithmetic (here df holds the 19 rows from the question, not the sample defined above):

    df[c(max(df[,1]),df[1:nrow(df)-1,1]) - df[,1] > 0,]
    
       Ronda Tratamiento
    1      5           A
    4      6           A
    6      3           B
    11     6           C
    13     6           C
    15     5           C
    18     6           B
    

    Detail:

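    • With c(max(df[,1]),df[1:nrow(df)-1,1]) we build a copy of Ronda shifted by one position: the maximum value goes first and the last element is dropped. (1:nrow(df)-1 evaluates to 0:(nrow(df)-1), and R ignores the index 0, so rows 1 to nrow(df)-1 are selected.)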

      [1] 7 5 6 7 6 7 3 4 5 6 7 6 7 6 7 5 6 7 6
      
    • If we subtract the original vector df[,1] from it, that is:

      [1] 5 6 7 6 7 3 4 5 6 7 6 7 6 7 5 6 7 6 7
      

      We'll get something like this:

       [1]  2 -1 -1  1 -1  4 -1 -1 -1 -1  1 -1  1 -1  2
      [16] -1 -1  1 -1
      

      Note that the positive values fall exactly on the first row of each ascending series. So we turn the difference into a logical vector, c(max(df[,1]),df[1:nrow(df)-1,1]) - df[,1] > 0, and use it to select the rows of interest.
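
As a variation on the same idea, here is a minimal sketch that uses diff() instead of building the shifted vector by hand. It assumes the data are the 19 rows from the question; the names datos, inicio_serie and resultado are hypothetical. One difference worth noting: the expression above uses max() as a sentinel, so a first row whose Ronda already equals the maximum would give a difference of 0 and be dropped, whereas this version keeps the first row explicitly.

```r
# The 19 rows from the question (assumed to be the data the Edit works on)
datos <- data.frame(
  Ronda = c(5,6,7,6,7, 3,4,5,6,7, 6,7,6,7,5,6,7, 6,7),
  Tratamiento = c(rep("A", 5), rep("B", 5), rep("C", 7), rep("B", 2))
)

# diff() returns current minus previous; a negative value marks a drop,
# i.e. the first row of a new ascending series.
# The first row has no predecessor, so it is kept explicitly with TRUE.
inicio_serie <- c(TRUE, diff(datos$Ronda) < 0)

resultado <- datos[inicio_serie, ]
resultado
```

For the question's data this selects rows 1, 4, 6, 11, 13, 15 and 18, the same rows as the expression above. As in that expression, two equal consecutive values are not treated as the start of a new series.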

    answered by 22.07.2018 / 17:58