If we have a large set of columns / variables and we want to convert some of them to factors but not others, the simplest way is to build a vector of the columns to convert. Suppose we have the following data.frame:
txt <- "c1,c2,c3,c4,c5,c6
1,0,100,1,alto, 0
1,1,101,0,medio, 0
0,0,200,1,alto, 0
1,1,101,1,bajo, 0
"
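# stringsAsFactors = TRUE is needed on R >= 4.0.0 (where it defaults to FALSE) to get c5 as a factor
df <- read.table(textConnection(txt), sep = ",", header = TRUE, stringsAsFactors = TRUE)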
Inspecting:
> str(df)
'data.frame': 4 obs. of 6 variables:
$ c1: int 1 1 0 1
$ c2: int 0 1 0 1
$ c3: int 100 101 200 101
$ c4: int 1 0 1 1
$ c5: Factor w/ 3 levels "alto","bajo",..: 1 3 1 2
$ c6: int 0 0 0 0
Not all variables are factors: on import, only the character (string) columns were turned into factors. Now, suppose we want to convert columns 1, 2, 3 and 4 to factor, but not 6, nor 5, which is already a factor. We can solve it this way:
col.to.factor <- c(1,2,3,4)
df[col.to.factor] <- lapply(df[col.to.factor], as.factor)
The result:
> str(df)
'data.frame': 4 obs. of 6 variables:
$ c1: Factor w/ 2 levels "0","1": 2 2 1 2
$ c2: Factor w/ 2 levels "0","1": 1 2 1 2
$ c3: Factor w/ 3 levels "100","101","200": 1 2 3 2
$ c4: Factor w/ 2 levels "0","1": 2 1 2 2
$ c5: Factor w/ 3 levels "alto","bajo",..: 1 3 1 2
$ c6: int 0 0 0 0
Clearly, only the column / variable c6 is left as integer and the rest are now factors. A more interesting approach would be, for example, to automatically convert to factor every variable / column whose values are just 1 and 0, and leave the rest as they are.
First we generate a logical vector apply.factor that tells us which columns contain only 1 and 0:
apply.factor <- sapply(df, function(x) isTRUE(all.equal(levels(as.factor(x)),as.vector(as.factor(c("0", "1"))))))
> apply.factor
c1 c2 c3 c4 c5 c6
TRUE TRUE FALSE TRUE FALSE FALSE
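The important part is as.vector(as.factor(c("0", "1"))), which builds the set of values we want to check for in each column / variable (it evaluates to the character vector c("0", "1")). It can be changed to whatever we need, since the levels of each column are compared against this same vector. Note that the match must be exact: c6, which contains only 0, is not selected.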
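As a rough sketch of that idea, the comparison can be pulled out into a small helper that takes the values to look for as an argument. The names has.exactly and demo are made up for this illustration and are not part of the original code; setequal() is used here instead of all.equal() on the levels, so the order of the values does not matter.

```r
# Hypothetical helper (not in the original post): TRUE when the distinct
# non-missing values of x are exactly the given set, compared as strings.
has.exactly <- function(x, values) {
  setequal(as.character(unique(x[!is.na(x)])), as.character(values))
}

# Small stand-in data.frame so the example runs on its own
demo <- data.frame(c1 = c(1, 1, 0, 1),
                   c3 = c(100, 101, 200, 101),
                   c5 = c("alto", "medio", "alto", "bajo"),
                   c6 = c(0, 0, 0, 0))

# Columns whose values are exactly 0 and 1: only c1
only.01 <- sapply(demo, has.exactly, values = c(0, 1))

# Same helper with a different set of values: only c5
only.levels <- sapply(demo, has.exactly, values = c("alto", "medio", "bajo"))
```

Changing the values argument is then enough to apply a different criterion to every column.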
Then, in col.to.factor, we generate the vector with the indexes of the columns that we are going to convert (the columns that met our criterion):
col.to.factor <- seq(length(apply.factor))[apply.factor]
> col.to.factor
[1] 1 2 4
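As a side note, a minimal sketch of two equivalent ways to get at the same columns. The apply.factor vector is typed in by hand here so the snippet runs on its own, and the names idx.seq, idx.which and names.sel are only for illustration.

```r
# The logical vector from the previous step, written out by hand
apply.factor <- c(c1 = TRUE, c2 = TRUE, c3 = FALSE,
                  c4 = TRUE, c5 = FALSE, c6 = FALSE)

# The approach used in this post: positions of the TRUE entries
idx.seq <- seq(length(apply.factor))[apply.factor]

# which() gives the same positions and also keeps the column names
idx.which <- which(apply.factor)

# The names of the selected columns: "c1" "c2" "c4"
names.sel <- names(apply.factor)[apply.factor]
```

A data.frame can also be indexed with the logical vector itself, so df[apply.factor] selects the same columns as df[col.to.factor].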
And finally we apply the conversion only to the chosen columns:
df[col.to.factor] <- lapply(df[col.to.factor], as.factor)
Putting it all together, starting from the freshly read df:
> apply.factor <- sapply(df, function(x) isTRUE(all.equal(levels(as.factor(x)),as.vector(as.factor(c("0", "1"))))))
> col.to.factor <- seq(length(apply.factor))[apply.factor]
> df[col.to.factor] <- lapply(df[col.to.factor], as.factor)
> str(df)
'data.frame': 4 obs. of 6 variables:
$ c1: Factor w/ 2 levels "0","1": 2 2 1 2
$ c2: Factor w/ 2 levels "0","1": 1 2 1 2
$ c3: int 100 101 200 101
$ c4: Factor w/ 2 levels "0","1": 2 1 2 2
$ c5: Factor w/ 3 levels "alto","bajo",..: 1 3 1 2
$ c6: int 0 0 0 0
We see, then, that we have converted the columns we wanted to factor.
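If this is needed in more than one place, the three steps could be wrapped into a single function. What follows is only a sketch: the function name binary.to.factor, its values argument and the demo data are assumptions made for this example, and values has to be given in the same order as the factor levels.

```r
# Hypothetical wrapper around the three steps of this post.
# values: the expected levels, as strings, in level order.
binary.to.factor <- function(data, values = c("0", "1")) {
  # Step 1: which columns have exactly these levels?
  apply.factor <- vapply(data,
                         function(x) identical(levels(as.factor(x)), values),
                         logical(1))
  # Steps 2 and 3: convert only the matching columns, if there are any
  if (any(apply.factor)) {
    data[apply.factor] <- lapply(data[apply.factor], as.factor)
  }
  data
}

# Stand-in data so the example runs on its own
demo <- data.frame(c1 = c(1, 1, 0, 1),
                   c3 = c(100, 101, 200, 101),
                   c6 = c(0, 0, 0, 0))
demo <- binary.to.factor(demo)
str(demo)  # c1 becomes a factor; c3 and c6 stay numeric
```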
I hope it's useful for you.