First let's prepare your data in a reproducible example:
dat <- read.table(text='N1, N2
"jara", "moreno"
"moreno", "lopez"
"diaz", "Swanson"
"powell", "jara"
"Mckinze", "jenner"
"jenner", "londra"
"londra", "kennedy"',
header = TRUE, sep = ',', stringsAsFactors = FALSE, quote = '"', strip.white = TRUE)
This gives us a data.frame, but to be faithful to your question, your data seems to be a matrix without column names, so we'll convert it:
dat <- as.matrix(dat)
colnames(dat) <- NULL
dat
     [,1]      [,2]
[1,] "jara"    "moreno"
[2,] "moreno"  "lopez"
[3,] "diaz"    "Swanson"
[4,] "powell"  "jara"
[5,] "Mckinze" "jenner"
[6,] "jenner"  "londra"
[7,] "londra"  "kennedy"
Now that we have the data as you described it, let's move on to the solution. One way to get the values of one column that are repeated in another would be:
dat[dat[,1] %in% dat[,2], 1]
This gives us the values of column 1 that also appear in column 2. However, this approach quickly gets complicated, because you would also have to check the other direction (the values of column 2 that appear in column 1), and then repeat that for the 10 variables / columns that you mention.
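For just two columns, checking in both directions collapses to a set intersection. A minimal sketch, assuming the same two-column matrix as above (rebuilt here only so the snippet runs on its own; the names in_both_1, in_both_2 and common are illustrative):

```r
# Rebuild the example matrix (filled column by column) so the snippet is self-contained
dat <- matrix(c("jara", "moreno", "diaz", "powell", "Mckinze", "jenner", "londra",
                "moreno", "lopez", "Swanson", "jara", "jenner", "londra", "kennedy"),
              ncol = 2)

# Values of column 1 that also appear in column 2
in_both_1 <- unique(dat[dat[, 1] %in% dat[, 2], 1])

# Values of column 2 that also appear in column 1
in_both_2 <- unique(dat[dat[, 2] %in% dat[, 1], 2])

# Both directions describe the same set, which intersect() returns directly
common <- intersect(dat[, 1], dat[, 2])
common
```

With more than two columns you would need one such comparison per pair of columns, which is where this approach stops being practical.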
But luckily there is a very useful function for counting frequencies, table(), so we could do this:
tbl <- table(dat)
names(tbl[tbl > 1])
[1] "jara" "jenner" "londra" "moreno"
With table(dat) we obtain a frequency table of every value in your matrix, across all columns; if only some columns interest you, subset the matrix to those columns first. The result is something like this:
   diaz    jara  jenner kennedy  londra   lopez Mckinze  moreno  powell Swanson
      1       2       2       1       2       1       1       2       1       1
Quite clear. Now we only need the names that occur more than once, which we get with names(tbl[tbl > 1]).
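Restricting the count to the columns of interest could look like the sketch below. The third column (extra) and the index vector cols are hypothetical, added only to show that the subset changes the result:

```r
# Rebuild the example matrix so the snippet is self-contained
dat <- matrix(c("jara", "moreno", "diaz", "powell", "Mckinze", "jenner", "londra",
                "moreno", "lopez", "Swanson", "jara", "jenner", "londra", "kennedy"),
              ncol = 2)

# Hypothetical third column that we do NOT want to compare
extra <- c("diaz", "powell", "a", "b", "c", "d", "e")
dat3 <- cbind(dat, extra)

# Counting over every column also reports "diaz" and "powell"
tbl_all <- table(dat3)
repeated_all <- names(tbl_all[tbl_all > 1])

# Hypothetical selection: only columns 1 and 2 are of interest
cols <- c(1, 2)
tbl_sub <- table(dat3[, cols, drop = FALSE])
repeated_sub <- names(tbl_sub[tbl_sub > 1])

# Repeated names with their counts, most frequent first
sort(tbl_sub[tbl_sub > 1], decreasing = TRUE)
```

Using drop = FALSE keeps the subset a matrix even when a single column is selected.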
Important clarification: this solution also counts values that are repeated within the same column. If you do not want names that are repeated only within a single column, there is a small trick to adapt it:
tbl <- table(apply(dat, 2, function(x) {ifelse(duplicated(x), NA, x)}))
names(tbl[tbl > 1])
Basically, apply(dat, 2, function(x) {ifelse(duplicated(x), NA, x)}) replaces, within each column, every repeated occurrence of a value (after the first) with NA. Since table() ignores NA by default, each value is then counted at most once per column, so a count greater than 1 means the name appears in more than one column.
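To see the difference between the two variants, here is a small sketch. The extra row is hypothetical, added only so that "diaz" is repeated inside column 1 without ever appearing in column 2:

```r
# Rebuild the example matrix so the snippet is self-contained
dat <- matrix(c("jara", "moreno", "diaz", "powell", "Mckinze", "jenner", "londra",
                "moreno", "lopez", "Swanson", "jara", "jenner", "londra", "kennedy"),
              ncol = 2)

# Hypothetical extra row: "diaz" now occurs twice, but only in column 1
dat2 <- rbind(dat, c("diaz", "smith"))

# Plain table(): "diaz" is reported as repeated
tbl_all <- table(dat2)
repeated_all <- names(tbl_all[tbl_all > 1])

# Replace within-column repeats with NA first; table() drops NA by default
dedup <- apply(dat2, 2, function(x) ifelse(duplicated(x), NA, x))
tbl_cols <- table(dedup)
repeated_cols <- names(tbl_cols[tbl_cols > 1])

repeated_all
repeated_cols
```

Only the second result is limited to names that appear in more than one column.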