Extract a text string from an HTML file in R


I am working in RStudio with an HTML file. I have managed to build a table with the content I need, but the program that generates the .htm file is MetaTrader 4, and it stores the variables in the title attribute of each cell, as you can see below. So I'm trying to extract that text and split it into separate columns, because each piece is a variable.

The markup is:

<td title="SL=1900; DistMaxima=700; Horas=4; 
Margen=180;Operaciones=1;
Volumen=0.01; ">21</td>

The "21" is the reference field, and it is already read correctly because it is the cell content, between

<td> </td>

What I would like is to split the title text into these columns:

SL | DistMaxima | Horas | Margen | Operaciones | Volumen
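A minimal base-R sketch of the splitting step itself, assuming every title follows the `key=value;` layout shown above. `parse_title()` is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: turn one MetaTrader-style title string
# ("key=value; key=value; ...") into a named numeric vector.
parse_title <- function(title) {
  # Split on ";" and drop the empty piece left by the trailing separator
  pairs <- trimws(strsplit(title, ";", fixed = TRUE)[[1]])
  pairs <- pairs[pairs != ""]
  # Split each "key=value" into its two halves
  kv <- strsplit(pairs, "=", fixed = TRUE)
  values <- as.numeric(vapply(kv, `[`, character(1), 2))
  names(values) <- trimws(vapply(kv, `[`, character(1), 1))
  values
}

# The title from the question, including its line breaks
title <- "SL=1900; DistMaxima=700; Horas=4; \nMargen=180;Operaciones=1;\nVolumen=0.01; "
v <- parse_title(title)
v
```

Applied with `lapply()` over all the titles and combined with `do.call(rbind, ...)`, this should give one row per cell, provided every title lists the keys in the same order.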

The same pattern appears more than 1,000 times in the file, so I think it has to be done with a function, but I cannot find the right library or approach.

One approach I thought of is a function that searches for

<td title=" ****************** "> 

since every cell follows this pattern, and appends each match to a data frame to work with later, but I don't know how to do that either. Thank you very much in advance.
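The pattern-search idea can be sketched in base R with `gregexpr()` and `regmatches()`. This assumes each cell is written exactly as `<td title="...">value</td>`, with no other attributes and no nested tags; regular expressions over HTML break easily when the markup changes, so an HTML parser (as in the answer below) is usually the safer choice.

```r
# Raw HTML as text; for the real report something like
# html <- paste(readLines("../Data/AUDJPY-PV-104-1H.htm", warn = FALSE), collapse = "\n")
html <- '<td title="SL=1900; DistMaxima=700; Horas=4; 
Margen=180;Operaciones=1;
Volumen=0.01; ">21</td>
<td title="SL=1900; DistMaxima=800; Horas=4; 
Margen=180;Operaciones=1;
Volumen=0.01; ">23</td>'

# Every <td title="...">...</td>; [^"]* also matches the line breaks inside the title
m <- regmatches(html, gregexpr('<td title="[^"]*">[^<]*</td>', html))[[1]]

# Pull out the attribute text and the cell value from each match
titles <- sub('^<td title="([^"]*)">[^<]*</td>$', "\\1", m)
refs   <- sub('^<td title="[^"]*">([^<]*)</td>$', "\\1", m)

data.frame(ref = refs, title = titles, stringsAsFactors = FALSE)
```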

The code I used is:

# Load the required libraries
library(XML)

# Path to the optimization report
Optimizador_url <- "../Data/AUDJPY-PV-104-1H.htm"

# Read every table in the page into a list of data frames
Tablas <- readHTMLTable(Optimizador_url)

# Select and keep the table we are interested in
operaciones <- Tablas[[2]]

str(operaciones)
head(operaciones)
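Since the question already loads the XML package, the title attributes can also be read directly with an XPath query; `readHTMLTable()`, as the question shows, returns the cell text rather than the attribute. A sketch using a literal string in place of the report; with the real file, `htmlParse(Optimizador_url)` should work the same way.

```r
library(XML)

# Two sample cells from the question, wrapped in a table
doc <- htmlParse('<table><tr>
<td title="SL=1900; DistMaxima=700; Horas=4; Margen=180;Operaciones=1; Volumen=0.01; ">21</td>
<td title="SL=1900; DistMaxima=800; Horas=4; Margen=180;Operaciones=1; Volumen=0.01; ">23</td>
</tr></table>', asText = TRUE)

# XPath: every td element that has a title attribute
titles <- xpathSApply(doc, "//td[@title]", xmlGetAttr, "title")
refs   <- xpathSApply(doc, "//td[@title]", xmlValue)

data.frame(ref = refs, title = titles, stringsAsFactors = FALSE)
```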
    
asked by DaVid C 08.04.2018 at 17:46

1 answer


This could work.

web<-'<td title="SL=1900; DistMaxima=700; Horas=4; 
Margen=180;Operaciones=1;
Volumen=0.01; ">21</td>
<td title="SL=1900; DistMaxima=800; Horas=4; 
Margen=180;Operaciones=1;
Volumen=0.01; ">23</td>'

Load the libraries first (rvest re-exports read_html() from xml2), then parse the HTML:

library(stringr)
library(rvest)
web <- read_html(web)

This gives a vector with the title attributes of all the td tags:

data<-web%>%html_nodes("td")%>%html_attrs()
data<-unlist(data)
data<-data[names(data)=="title"]
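A slightly more direct variant of the step above selects only the `td` elements that have a `title` and reads it with `html_attr()`, which also makes it easy to keep the cell value (the "21") next to it. A sketch with the two sample cells; `html_nodes()` is the older name of `html_elements()` in rvest 1.0 and later, and both are still available.

```r
library(rvest)

# Two sample cells from the question, wrapped in a table
web <- read_html('<table><tr>
<td title="SL=1900; DistMaxima=700; Horas=4; Margen=180;Operaciones=1; Volumen=0.01; ">21</td>
<td title="SL=1900; DistMaxima=800; Horas=4; Margen=180;Operaciones=1; Volumen=0.01; ">23</td>
</tr></table>')

cells  <- html_nodes(web, "td[title]")   # only the td elements that carry a title
titles <- html_attr(cells, "title")      # one title string per cell
refs   <- html_text(cells)               # the visible cell value, e.g. "21"

data.frame(ref = refs, title = titles, stringsAsFactors = FALSE)
```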

Extract the numeric values (backslashes must be doubled inside an R string, otherwise "\." is an unrecognized escape):

data <- data %>% str_extract_all("[0-9]+\\.?[0-9]*")
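Extracting only the numbers relies on every title listing its keys in the same order. A sketch that captures each `key=value` pair with `stringr::str_match_all()` and aligns the columns by key name instead, so a reordered or missing key should give `NA` rather than a shifted column. The optional leading minus sign in the pattern is an assumption; the sample data has no negative values.

```r
library(stringr)

# Two title strings as they appear in the report (sample values from the question)
titles <- c(
  "SL=1900; DistMaxima=700; Horas=4; \nMargen=180;Operaciones=1;\nVolumen=0.01; ",
  "SL=1900; DistMaxima=800; Horas=4; \nMargen=180;Operaciones=1;\nVolumen=0.01; "
)

# One matrix per title: column 1 = full match, 2 = key, 3 = value
pairs <- str_match_all(titles, "(\\w+)=(-?[0-9.]+)")

# Turn each matrix into a named numeric vector
rows <- lapply(pairs, function(m) setNames(as.numeric(m[, 3]), m[, 2]))

# Align the vectors by key name; a key absent from a title becomes NA
keys <- unique(unlist(lapply(rows, names)))
mat  <- do.call(rbind, lapply(rows, function(v) unname(v[keys])))
colnames(mat) <- keys
params <- as.data.frame(mat)
params
```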

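Finally, bind the rows into a data frame and convert the columns to numbers:

data <- as.data.frame(do.call(rbind, data), stringsAsFactors = FALSE)
data[] <- lapply(data, as.numeric)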
    
answered by 10.04.2018 / 00:01