xml to R database, different levels

1

I have this xml code:

<?xml version="1.0"?>
-<sct type="detail" version="0.1">
-<serviceLevelSegments timestamp="2017-10-25T11:45:44Z">
<Segment travelTimeMinutes="0.387" c-value="100" score="30" reference="85" 
average="77" speed="95" code="447332188"/>
-<Segment travelTimeMinutes="0.597" c-value="100" score="30" reference="85" 
average="77" speed="85" code="447332203">
<SubSegment speed="95" PkEnd="782.112" PkIni="781.9" offset="0,212"/>
<SubSegment speed="77" PkEnd="782.746" offset="635,846" pkIni="782.535"/>
</Segment>
</serviceLevelSegments>
</sct>

And I want to get a table with all the attributes (of all the levels). It would be a table like this:

                          Value1      Value2     Value3     ...
timestamp
code
travelTimeMinutes
c-value
score
reference
average
speed (from Segment)
speed (from SubSegment)
PkEnd
PkIni
offset

In the empty cells I would like you to put: "NA".

I have only managed to obtain a separate table for each level (serviceLevelSegments, Segment or SubSegment separately). For example, to get the table for the SubSegment level, I found this code:

library(XML)
arch = "file.xml" #code in xml
input <- xmlParse(arch)
nodes <- getNodeSet(input,"//SubSegment")
all_parameters <- sapply(nodes, xmlAttrs)
all_parameters

But I only get a table with these attributes (in rows field): speed, offset, PkIni, PkEnd. And I need all the attributes that come out in the code.

Thank you very much in advance.

    
asked by Ana 03.11.2017 в 08:41
source

1 answer

0

Ana, I'm more familiar with xml2 , so I'll propose a solution using this package. To install it: install.packages("xml2") in the console.

library(xml2)

xmlText <- '<?xml version="1.0"?>
    <sct type="detail" version="0.1">
        <serviceLevelSegments timestamp="2017-10-25T11:45:44Z">
            <Segment travelTimeMinutes="0.387" c-value="100" score="30" reference="85" 
            average="77" speed="95" code="447332188"/>
            <Segment travelTimeMinutes="0.597" c-value="100" score="30" reference="85" 
            average="77" speed="85" code="447332203">
                <SubSegment speed="95" PkEnd="782.112" PkIni="781.9" offset="0,212"/>
                <SubSegment speed="77" PkEnd="782.746" offset="635,846" pkIni="782.535"/>
            </Segment>
        </serviceLevelSegments>
    </sct>'

xml <- read_xml(xmlText)

niveles <- data.frame(id=c(1,2,3), 
                      node=c('serviceLevelSegments', 'Segment', 'SubSegment'))

df <- data.frame()
colvar <- rep(NA,nrow(niveles))
for (n in xml_find_all(xml, ".//*")) {
    for (a in names(xml_attrs(n))) {
        df <- rbind(df, c(cbind(xml_name(n), a), colvar), stringsAsFactors=FALSE)
        df[nrow(df),!is.na(c(NA,NA,niveles[match(niveles$node,xml_name(n)),1]))] <- xml_attr(n,a)
    }
}
colnames(df) <- c("nodo", "attr",paste0("Valor",1:nrow(niveles)))
df

The Exit:

                   nodo              attr               Valor1    Valor2  Valor3
1  serviceLevelSegments         timestamp 2017-10-25T11:45:44Z      <NA>    <NA>
2               Segment travelTimeMinutes                 <NA>     0.387    <NA>
3               Segment           c-value                 <NA>       100    <NA>
4               Segment             score                 <NA>        30    <NA>
5               Segment         reference                 <NA>        85    <NA>
6               Segment           average                 <NA>        77    <NA>
7               Segment             speed                 <NA>        95    <NA>
8               Segment              code                 <NA> 447332188    <NA>
9               Segment travelTimeMinutes                 <NA>     0.597    <NA>
10              Segment           c-value                 <NA>       100    <NA>
11              Segment             score                 <NA>        30    <NA>
12              Segment         reference                 <NA>        85    <NA>
13              Segment           average                 <NA>        77    <NA>
14              Segment             speed                 <NA>        85    <NA>
15              Segment              code                 <NA> 447332203    <NA>
16           SubSegment             speed                 <NA>      <NA>      95
17           SubSegment             PkEnd                 <NA>      <NA> 782.112
18           SubSegment             PkIni                 <NA>      <NA>   781.9
19           SubSegment            offset                 <NA>      <NA>   0,212
20           SubSegment             speed                 <NA>      <NA>      77
21           SubSegment             PkEnd                 <NA>      <NA> 782.746
22           SubSegment            offset                 <NA>      <NA> 635,846
23           SubSegment             pkIni                 <NA>      <NA> 782.535

I added the node to make it clearer to understand, but if you want to stay with the attributes only, you simply filter the column.

A little explanation:

  • First, we define which are the nodes that determine each level, in the order they correspond. Eventually you could see to determine the "levels" dynamically, but not to complicate the example because in your case the nodes are known in advance we do:

    niveles <- data.frame(id=c(1,2,3), 
                      node=c('serviceLevelSegments', 'Segment', 'SubSegment'))
    
  • Then iterate through all the nodes xml_find_all(xml, ".//*") and all the attributes of each of these: names(xml_attrs(n))

  • We add a new row to data.frame with the columns of nodo , atributo and one additional column for the value for each level: df <- rbind(df, c(cbind(xml_name(n), a), colvar), stringsAsFactors=FALSE)
  • The value of the attribute is entered in the corresponding column according to the level at which we are: df[nrow(df),!is.na(c(NA,NA,niveles[match(niveles$node,xml_name(n)),1]))] <- xml_attr(n,a)
  • And finally, as an aesthetic detail, we rename the columns: colnames(df) <- c("nodo", "attr",paste0("Valor",1:nrow(niveles)))
answered by 05.11.2017 в 15:33