Extract text from a string in R


I have a dataset from which I need to extract part of each text entry: the company name.

Here's an example:

995 945 Disprofa 741 593 63 56 302 292 351 Servicios  
996 0 Terminal Puerto garin 740 Terminal  
997 783 Las Las 740 871 -73 -51 142 203 885 969 450 Activ. de edición

I was testing with regular expressions in R:

str2 <- "995 945 Disprofa 741 593 63 56 302 292 351 Servicios"
str3 <- strsplit(str2, "[0-9]\\s[0-9]\\s", perl = TRUE)  # split the string
str3 <- as.character(str3)
str4 <- strsplit(str3, "\\s[0-9]", perl = TRUE)

This lets me split the string, but I need to extract just the company name and apply that to a whole data frame.

Can someone come up with a solution?

    
asked by Juan Teje 06.09.2018 at 16:13

1 answer


Try the following to keep the first word or words between numbers:

texto <- '995 945 Paraguas David e hijos 741 593 63 56 302 292 351 Servicios 123 123'

texto <- sub("^[\\d\\s]+(\\w+(?:\\s+(?!\\d+\\s)\\w+)+)[\\s\\S]*", "\\1", texto, perl = TRUE)

print(texto)

Result:

[1] "Paraguas David e hijos"

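To apply this to a whole data frame, as the question asks, note that `sub()` is vectorised, so it can be run on an entire column in one call. Below is a minimal sketch under some assumptions: the data frame `df` and the column names `raw` and `company` are hypothetical, and the pattern is a simpler alternative rather than the one shown above. It skips the leading numbers and lazily captures non-digit text up to the next number, which should also cover single-word names such as Disprofa (the pattern above expects a name of at least two words). It assumes company names contain no digits.

```r
# Hypothetical data frame built from the sample rows in the question
df <- data.frame(
  raw = c(
    "995 945 Disprofa 741 593 63 56 302 292 351 Servicios",
    "996 0 Terminal Puerto garin 740 Terminal",
    "997 783 Las Las 740 871 -73 -51 142 203 885 969 450 Activ. de edición"
  ),
  stringsAsFactors = FALSE
)

# ^[\d\s]+   leading numbers and spaces
# (\D+?)     shortest run of non-digit characters (the company name)
# \s+-?\d    whitespace followed by the next (possibly negative) number
# .*$        the rest of the line
pattern <- "^[\\d\\s]+(\\D+?)\\s+-?\\d.*$"

# sub() is vectorised, so it processes every row of the column at once
df$company <- sub(pattern, "\\1", df$raw, perl = TRUE)

print(df$company)
```

For the three sample rows this gives "Disprofa", "Terminal Puerto garin" and "Las Las". Rows that do not match the pattern are returned unchanged by `sub()`, so it may be worth checking for those before relying on the new column.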

    
answered 29.12.2018 at 12:53