ifelse with two conditions

I want to create a new variable from two data frames, provided that two conditions are met. df1 and df2 share two variables, but only df1 contains a variable that I would like to add to df2. I tried the vectorized function ifelse, but I do not know the syntax for writing two simultaneous conditions.

I tried unsuccessfully:

df2$Nuevavariable <- ifelse(df1$ID == df2$ID && 
                            df1$edad == df2$edad ,df1$sexo ,"NO") 

I have also reviewed the post "Nested ifelse statement" and followed its nested ifelse examples, but R tells me that I have problems with commas.

I know that another option would be to do a merge, but I have so many variables that it does not seem worth generating a new file with variables that do not interest me and then removing them.

    
asked by Caro 30.07.2018 at 13:00

1 answer

ifelse() would only work here if both data frames have the same number of rows, the same IDs, and the same row order:

df1 <- data.frame(ID=c(1,2,3), edad=c(20, 30, 45))
df2 <- cbind(df1, sexo = c("F", "M", "F"))
df1$NuevaVariable <- ifelse(df1$ID == df2$ID & df1$edad == df2$edad, as.character(df2$sexo) ,"NO") 
df1

  ID edad NuevaVariable
1  1   20             F
2  2   30             M
3  3   45             F

As you can see, df2 is exactly equal to df1 with just one additional column. Very importantly, we are using the vectorized logical AND, that is, the single &. Do not use && in these cases: it does not compare element by element (older versions of R only check the first element of each vector, and since R 4.3.0 a condition longer than one element is an error). Now, if the data frames had different numbers of rows, different IDs, or a different row order, this statement would either not work or would give wrong results. For example, simply changing the row order in the example:

df2 <- df2[c(2,3,1), ]

df1$NuevaVariable <- ifelse(df1$ID == df2$ID & df1$edad == df2$edad, as.character(df2$sexo) ,"NO") 
df1

  ID edad NuevaVariable
1  1   20            NO
2  2   30            NO
3  3   45            NO
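Every row falls through to "NO" because == and & compare position by position. A minimal sketch of that behaviour, with the ID columns pulled out as plain vectors (the names ids1 and ids2 are only for illustration), next to match(), which looks values up regardless of position:

```r
# Illustrative vectors standing in for df1$ID and the reordered df2$ID
ids1 <- c(1, 2, 3)
ids2 <- c(2, 3, 1)

# == (like &) works element by element: position 1 against position 1, etc.
same_position <- ids1 == ids2
same_position   # FALSE FALSE FALSE

# match() instead returns where each value of ids1 occurs in ids2
pos <- match(ids1, ids2)
pos             # 3 1 2
```
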

With the rows of df2 reordered, ifelse() no longer gives the expected result. For what you are looking for, merge() is without a doubt the appropriate way to solve it:

df1 <- data.frame(ID=c(1,2,3), edad=c(20, 30, 45))
df2 <- cbind(df1, sexo = c("F", "M", "F"))
df1 <- merge(df1, df2, by = c("ID","edad"), all.x=TRUE)
df1
  ID edad sexo
1  1   20    F
2  2   30    M
3  3   45    F

merge() matches the rows on the columns given in by. In this case the column names are the same in both tables, so a single by is enough; otherwise the columns of the two tables would have to be given separately with by.x and by.y. The other important parameter is all.x=TRUE, which indicates that we want all the rows of df1, whether or not they match a row in df2.
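As a rough sketch of the by.x / by.y case, assume two hypothetical tables (personas and sexos, with invented column names) whose key columns are named differently, and where one row of the first table has no match in the second:

```r
# Hypothetical tables: the key columns have different names in each one
personas <- data.frame(ID = c(1, 2, 3), edad = c(20, 30, 45))
sexos <- data.frame(id_persona = c(3, 1), edad_persona = c(45, 20),
                    sexo = c("F", "F"), stringsAsFactors = FALSE)

# by.x names the key columns in the first table, by.y those in the second.
# all.x = TRUE keeps every row of personas, matched or not.
resultado <- merge(personas, sexos,
                   by.x = c("ID", "edad"),
                   by.y = c("id_persona", "edad_persona"),
                   all.x = TRUE)
resultado
```

The row with ID 2 has no counterpart in sexos, so its sexo comes back as NA instead of being dropped.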

If you only want the new column, starting again from the original df1 (before the merge above), you could do:

df1$NuevaVariable <- merge(df1, df2, by = c("ID","edad"), all.x=TRUE)[, "sexo"]
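One thing to watch with the line above: by default merge() sorts its result by the key columns, so assigning a column of the merged result back to df1 only lines up if df1 is already sorted that way. A possible alternative, sketched here with made-up data (the peso column and the clave1 / clave2 helper names are assumptions for illustration), is to look up each row with match(), which keeps the order of df1 and never builds the wide merged table:

```r
# Illustrative data: df1 is not sorted by ID, df2 carries an extra column
df1 <- data.frame(ID = c(3, 1, 2), edad = c(45, 20, 30))
df2 <- data.frame(ID = c(1, 2, 3), edad = c(20, 30, 45),
                  sexo = c("F", "M", "F"), peso = c(60, 80, 55),
                  stringsAsFactors = FALSE)

# Build one key per row from both columns, then find each df1 row in df2
clave1 <- paste(df1$ID, df1$edad, sep = "_")
clave2 <- paste(df2$ID, df2$edad, sep = "_")

# Rows of df1 without a match in df2 would get NA here
df1$NuevaVariable <- df2$sexo[match(clave1, clave2)]
df1
```

If the literal "NO" from the question is wanted for unmatched rows, the NA values can be replaced afterwards.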
    
answered by 30.07.2018 at 17:34