Ana, I'm more familiar with xml2
, so I'll propose a solution using this package. To install it: install.packages("xml2")
in the console.
library(xml2)
xmlText <- '<?xml version="1.0"?>
<sct type="detail" version="0.1">
<serviceLevelSegments timestamp="2017-10-25T11:45:44Z">
<Segment travelTimeMinutes="0.387" c-value="100" score="30" reference="85"
average="77" speed="95" code="447332188"/>
<Segment travelTimeMinutes="0.597" c-value="100" score="30" reference="85"
average="77" speed="85" code="447332203">
<SubSegment speed="95" PkEnd="782.112" PkIni="781.9" offset="0,212"/>
<SubSegment speed="77" PkEnd="782.746" offset="635,846" pkIni="782.535"/>
</Segment>
</serviceLevelSegments>
</sct>'
xml <- read_xml(xmlText)
niveles <- data.frame(id=c(1,2,3),
node=c('serviceLevelSegments', 'Segment', 'SubSegment'))
df <- data.frame()
colvar <- rep(NA,nrow(niveles))
for (n in xml_find_all(xml, ".//*")) {
for (a in names(xml_attrs(n))) {
df <- rbind(df, c(cbind(xml_name(n), a), colvar), stringsAsFactors=FALSE)
df[nrow(df),!is.na(c(NA,NA,niveles[match(niveles$node,xml_name(n)),1]))] <- xml_attr(n,a)
}
}
colnames(df) <- c("nodo", "attr",paste0("Valor",1:nrow(niveles)))
df
The Exit:
nodo attr Valor1 Valor2 Valor3
1 serviceLevelSegments timestamp 2017-10-25T11:45:44Z <NA> <NA>
2 Segment travelTimeMinutes <NA> 0.387 <NA>
3 Segment c-value <NA> 100 <NA>
4 Segment score <NA> 30 <NA>
5 Segment reference <NA> 85 <NA>
6 Segment average <NA> 77 <NA>
7 Segment speed <NA> 95 <NA>
8 Segment code <NA> 447332188 <NA>
9 Segment travelTimeMinutes <NA> 0.597 <NA>
10 Segment c-value <NA> 100 <NA>
11 Segment score <NA> 30 <NA>
12 Segment reference <NA> 85 <NA>
13 Segment average <NA> 77 <NA>
14 Segment speed <NA> 85 <NA>
15 Segment code <NA> 447332203 <NA>
16 SubSegment speed <NA> <NA> 95
17 SubSegment PkEnd <NA> <NA> 782.112
18 SubSegment PkIni <NA> <NA> 781.9
19 SubSegment offset <NA> <NA> 0,212
20 SubSegment speed <NA> <NA> 77
21 SubSegment PkEnd <NA> <NA> 782.746
22 SubSegment offset <NA> <NA> 635,846
23 SubSegment pkIni <NA> <NA> 782.535
I added the node to make it clearer to understand, but if you want to stay with the attributes only, you simply filter the column.
A little explanation:
-
First, we define which are the nodes that determine each level, in the order they correspond. Eventually you could see to determine the "levels" dynamically, but not to complicate the example because in your case the nodes are known in advance we do:
niveles <- data.frame(id=c(1,2,3),
node=c('serviceLevelSegments', 'Segment', 'SubSegment'))
-
Then iterate through all the nodes xml_find_all(xml, ".//*")
and all the attributes of each of these: names(xml_attrs(n))
- We add a new row to
data.frame
with the columns of nodo
, atributo
and one additional column for the value for each level: df <- rbind(df, c(cbind(xml_name(n), a), colvar), stringsAsFactors=FALSE)
- The value of the attribute is entered in the corresponding column according to the level at which we are:
df[nrow(df),!is.na(c(NA,NA,niveles[match(niveles$node,xml_name(n)),1]))] <- xml_attr(n,a)
- And finally, as an aesthetic detail, we rename the columns:
colnames(df) <- c("nodo", "attr",paste0("Valor",1:nrow(niveles)))