To be able to test the solution we create random data with the structure of your data. A data.frame with keys to keep, which we call df1
and a data.frame with those same keys and other different keys, in addition to some random data that simulate the statistics.
set.seed(2018)
df1 <- data.frame(claves = letters[1:3], # a, b y c
stringsAsFactors = FALSE)
df2 <- data.frame(claves = sample(letters, 100, replace = TRUE), #100 letras al azar, algunas son a, b o c
estadisticos = rnorm(100), #Unos datos numéricos al azar que simulan los estadísticos
stringsAsFactors = FALSE)
To filter we are going to ask R to keep only the rows of df2
in which the value of the key column is also present in the keys of df1
. We use the operator %in
(membership). If we used ==
(equality) it would only verify equality with an element, and we would verify equality in several elements.
We use df2[ , ]
to specify that we want a subset of that object. The name claves
is irrelevant, the columns could be called in any way.
df2[df2$claves %in% df1$claves, ]
It's a basic operation and there are many ways to do it. This solution uses simpler operators possible. If your data is very large it could be somewhat slow. In that case, it could be faster to work with a join of dplyr
, which tend to be faster:
library(dplyr)
left_join(df1, df2)
left_join
indicates that only the data.frame values on the left side (in this case df1
) that match those on the right side ( df2
) will be preserved.
In this case it is necessary that the tie column names (key) are exactly the same in both data.frame, otherwise left_join
would not know which column to join. You can specify names with the by=
argument, but it's more complicated.