Problem changing values in a data frame column in R


I'll start from the beginning. I have a function that takes a string and returns part of it. Given the string

x <- "LABEL=UCI-1, CellIndex=50, CGI=368010016000"

and my function:

value <- substr(x, 7, stop = gregexpr(",", x)[[1]][1] - 1)

it returns "UCI-1". In the same way, with the string "LABEL=MARRIBA-1, CellIndex=50, CGI=368010016000" it returns "MARRIBA-1".
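The expression above can be wrapped into a small function for one string at a time. This is a minimal sketch: `get_label` is a hypothetical name, and it assumes every value starts with the 6-character prefix "LABEL=", so the wanted text begins at character 7 and ends just before the first comma.

```r
# Hypothetical helper wrapping the expression from the question.
# Assumes the string starts with "LABEL=" (6 characters), so the value
# begins at character 7 and ends just before the first comma.
get_label <- function(x) {
  first_comma <- gregexpr(",", x)[[1]][1]  # position of the first "," in x
  substr(x, 7, stop = first_comma - 1)
}

get_label("LABEL=UCI-1, CellIndex=50, CGI=368010016000")      # "UCI-1"
get_label("LABEL=MARRIBA-1, CellIndex=50, CGI=368010016000")  # "MARRIBA-1"
```

Note that `[[1]]` only ever looks at the first string, which is fine here because `x` holds a single value.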

In short, it takes what comes after the "=" sign and before the first ",". So far so good, but the problem appears when I have a data frame like this one (in case you want to try it):

emp.data <- data.frame(
  emp_id = 1:3,
  emp_name = c("LABEL=UCI-1, CellIndex=50, CGI=368010016000",
               "LABEL=UCI-3, CellIndex=34, CGI=3680100150014",
               "LABEL=MARRIB2, CellIndex=50, CGI=368010016000"),
  salary = c(623.3, 515.2, 611.0),
  stringsAsFactors = FALSE
)

and I run the function on it:

emp.data$emp_name <- substr(emp.data$emp_name, 7, stop = gregexpr(",", emp.data$emp_name)[[1]][1] - 1)

the values I get back in the emp_name column are:

UCI-1
UCI-3
MARRI (which is incorrect, since it should be MARRIB2)

I understand that my problem is that the function keeps the comma position from its first execution, but I don't understand why it does so if it runs over each row of the data frame separately. What I want is help making the emp_name column of my data frame end up as:

UCI-1
UCI-3
MARRIB2

PS: I have much more data, so please don't limit answers to just these three values; there are others such as HLSBO2 and more.

    
asked by Leonar Bode 21.06.2018 at 18:30

1 answer


The problem you have is this:

> gregexpr(',',emp.data$emp_name)[[1]][1]-1
[1] 11

You use this value to decide where to cut each element of emp.data$emp_name. gregexpr() returns a list with one element per value of emp_name, each holding the positions of every comma in that value (one or more positions, or -1 if there is none). However, as you can see, [[1]][1] picks only the first comma of the first value, so the final result is a single number, 11, which substr() then reuses for every row. Indexing that way cannot work as you intended. I think the simplest approach is this:

> unlist(lapply(gregexpr(',',emp.data$emp_name), '[[', 1))-1
[1] 11 11 13

Conceptually, what lapply() does here is the following:

c(gregexpr(',',emp.data$emp_name)[[1]][1] - 1,
  gregexpr(',',emp.data$emp_name)[[2]][1] - 1,
  gregexpr(',',emp.data$emp_name)[[3]][1] - 1)

That is, we take the first element of each vector in the list, and unlist() flattens the resulting list into a plain vector.
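To see both the bug and the fix side by side, here is a minimal sketch using the three emp_name values from the question as a plain character vector. The vapply() line is an equivalent alternative to unlist(lapply(...)), not part of the original answer; it additionally checks that each list element yields exactly one integer.

```r
# The three emp_name values from the question
emp_name <- c("LABEL=UCI-1, CellIndex=50, CGI=368010016000",
              "LABEL=UCI-3, CellIndex=34, CGI=3680100150014",
              "LABEL=MARRIB2, CellIndex=50, CGI=368010016000")

# gregexpr() returns a list: one integer vector of comma positions per string
comma_pos <- gregexpr(",", emp_name)

# What the question's code used: the first comma of the FIRST string only.
# substr() recycles this single stop value for every row.
single_stop <- comma_pos[[1]][1] - 1
buggy <- substr(emp_name, 7, single_stop)

# First comma of EACH string, flattened into a plain vector (the answer's fix)
stops_lapply <- unlist(lapply(comma_pos, `[[`, 1)) - 1

# Same idea with vapply(), which also checks each result is one integer
stops_vapply <- vapply(comma_pos, function(p) p[1], integer(1)) - 1

single_stop   # 11
buggy         # "UCI-1" "UCI-3" "MARRI"
stops_lapply  # 11 11 13
stops_vapply  # 11 11 13
```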

Putting it all together:

> substr(emp.data$emp_name,7,stop=unlist(lapply(gregexpr(',',emp.data$emp_name), '[[', 1))-1)
[1] "UCI-1"   "UCI-3"   "MARRIB2"
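As an alternative to the lapply() step, here is a minimal sketch of two other base R options, assuming the same emp.data as in the question. regexpr() (without the "g") returns the position of the first match in each string as an integer vector, one entry per row, so no list indexing is needed. The sub() pattern is an assumption about the data's shape: it keeps whatever follows a leading "LABEL=" up to the first comma, so it does not depend on the prefix being exactly 6 characters.

```r
emp.data <- data.frame(
  emp_id = 1:3,
  emp_name = c("LABEL=UCI-1, CellIndex=50, CGI=368010016000",
               "LABEL=UCI-3, CellIndex=34, CGI=3680100150014",
               "LABEL=MARRIB2, CellIndex=50, CGI=368010016000"),
  salary = c(623.3, 515.2, 611.0),
  stringsAsFactors = FALSE
)

# Option 1: regexpr() gives the FIRST comma of every string in one vector
first_comma <- regexpr(",", emp.data$emp_name)
via_regexpr <- substr(emp.data$emp_name, 7, first_comma - 1)

# Option 2: regular expression. Assumes values look like "LABEL=<value>,...";
# the capture group keeps everything up to (not including) the first comma.
via_sub <- sub("^LABEL=([^,]*).*$", "\\1", emp.data$emp_name)

via_regexpr  # "UCI-1"   "UCI-3"   "MARRIB2"
via_sub      # "UCI-1"   "UCI-3"   "MARRIB2"

# Write the result back into the column
emp.data$emp_name <- via_sub
```

The two differ on unusual rows: if a value has no comma, regexpr() returns -1, so the substr() version yields an empty string, while the sub() version returns everything after "LABEL=". Values that do not start with "LABEL=" are left unchanged by sub().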
    
answered 21.06.2018 at 19:39