R: Generate values in new columns in Dataframe from column names


Greetings to all,

I have a large data frame whose column names follow the pattern "XXX_YYY_ZZZ", where XXX, YYY and ZZZ can each take several different categories / factors. I would like to build a data frame with only 4 columns: "XXX", "YYY", "ZZZ" and the values from the original columns.

I was wondering if anyone knows any simple way to do all this.

For example, the current data frame looks like this:

"CYT_CD40L_exp1" | "MEMB_QVD_exp1" | "ORG_FasL_exp3"
       0         |        2        |        4
       1         |        3        |        5

and I would like to convert the table to:

Value | "XXX"  | "YYY"   | "ZZZ"
  0   | "CYT"  | "CD40L" | "exp1"
  1   | "CYT"  | "CD40L" | "exp1"
  2   | "MEMB" | "QVD"   | "exp1"
  3   | "MEMB" | "QVD"   | "exp1"
  4   | "ORG"  | "FasL"  | "exp3"
  5   | "ORG"  | "FasL"  | "exp3"

I know I can split the column names with strsplit(colnames(df), split = "_")[[y]] , but rearranging all the values by hand is very tedious.

I hope I have explained myself sufficiently ^^ U. Thank you very much.

    
asked by Ouuiea 28.09.2017 at 14:56

2 answers

1

Ouuiea, the most compact and simple way I can think of uses two functions from the tidyverse. I work on your minimal example; if it works there, it should scale to your real data. I use the pipe operator %>% to chain the functions, but you could write it differently.

# Load the library
library(tidyverse)
# Create the data frame foo with the example data.
tribble(~CYT_CD40L_exp1,~MEMB_QVD_exp1, ~ORG_FasL_exp3, 
                  0,             2,              4,
                  1,             3,              5) -> foo

# Reshape the data in two steps.
gather(foo) %>% 
  separate(key, into=c("XXX", "YYY", "ZZZ")) 

How does it work? gather() converts the data.frame to long format: one row per key-value pair, where the key is the original column name and the value is the cell content.
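
As a side note, gather() has since been superseded in tidyr by pivot_longer() (available from tidyr 1.0.0). The sketch below shows the same long-format step with that function; the data frame is rebuilt with data.frame() only so the block runs on its own, and the names "key" and "value" are chosen to mirror gather()'s defaults.

```r
library(tidyr)

# Same toy data as in the answer, rebuilt so this block is self-contained
foo <- data.frame(CYT_CD40L_exp1 = c(0, 1),
                  MEMB_QVD_exp1  = c(2, 3),
                  ORG_FasL_exp3  = c(4, 5))

# Long format: one row per (column name, value) pair
long <- pivot_longer(foo, cols = everything(),
                     names_to = "key", values_to = "value")
long
```

The rows may come out in a different order than with gather() (pivot_longer() walks the table row by row), but the set of key-value pairs is the same.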

separate() takes a column of delimited strings and splits it into several columns, one per piece. The names of the new columns are given in the into= argument as a character vector built with c() . By default separate() splits on any run of non-alphanumeric characters, which in this case is _ ; if that is ambiguous, you can set the separator explicitly with sep = "_" .
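
To illustrate when the explicit separator matters, here is a minimal sketch with hypothetical column names that contain a "." as well as "_" (these names are not from the question). With the default separator the "." would also be treated as a split point; passing sep = "_" keeps each middle piece intact.

```r
library(tidyr)

# Hypothetical column names mixing "." and "_" (not from the original data)
bar <- data.frame(CYT_CD40.L_exp1 = c(0, 1),
                  MEMB_Q.VD_exp1  = c(2, 3))

long <- gather(bar)                  # columns: key, value

# Split on "_" only, so "CD40.L" and "Q.VD" stay in one piece
res <- separate(long, key,
                into = c("XXX", "YYY", "ZZZ"),
                sep = "_")
res
```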

Greetings!

    
answered by 28.09.2017 at 16:09

Using only base R, you can solve it as follows:

df <- read.table(text = "CYT_CD40L_exp1 MEMB_QVD_exp1 ORG_FasL_exp3
0 2 4
1 3 5", header = TRUE)

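# Empty data frame to accumulate the results
newdf <- data.frame(Valor=numeric(), XXX=character(), YYY=character(), ZZZ=character())
for (col in colnames(df)) {
    # Split the column name on "_", turn the pieces into a one-row matrix,
    # and pair that row with every value of the column via merge()
    newdf <- rbind(newdf, merge(df[, col], t(c(unlist(strsplit(col, "_"))))))
}
newdf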

The output:

  x   V1    V2   V3
1 0  CYT CD40L exp1
2 1  CYT CD40L exp1
3 2 MEMB   QVD exp1
4 3 MEMB   QVD exp1
5 4  ORG  FasL exp3
6 5  ORG  FasL exp3

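In short, we create an empty data.frame with the expected structure and, for each column name in colnames(df) , split the name into its three parts and transpose them so that they form a single row. merge() then pairs that row with every value of the column itself (the two inputs share no column, so it returns all combinations), and rbind() appends the result to the new data frame. As the output shows, the result carries the default names x, V1, V2 and V3; assign names(newdf) afterwards if you need Value, XXX, YYY and ZZZ.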
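
For comparison, a loop-free base R variant is sketched below. It is not the approach of the answer above but a swapped-in alternative using stack(), and it assumes every column name has exactly three "_"-separated parts; the column name Value is chosen here to match the table in the question.

```r
df <- read.table(text = "CYT_CD40L_exp1 MEMB_QVD_exp1 ORG_FasL_exp3
0 2 4
1 3 5", header = TRUE)

# stack() puts all values in one column (`values`) and the
# originating column name in another (`ind`)
long <- stack(df)

# Split each name on "_" and bind the pieces into a 3-column character matrix
parts <- do.call(rbind, strsplit(as.character(long$ind), "_", fixed = TRUE))

newdf <- data.frame(Value = long$values,
                    XXX = parts[, 1],
                    YYY = parts[, 2],
                    ZZZ = parts[, 3])
newdf
```

Because the names are set when the data frame is built, no renaming step is needed afterwards.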

    
answered by 28.09.2017 at 20:31