Extract XML data using Python

I have this code and I need to extract the comment counts from the XML data, calculate the sum of the numbers in the file, and write out the sum.

I would like to know which part of the code I need to change, because I have already tried different code to compute the sum of the numbers.

import urllib
import xml.etree.ElementTree as ET

serviceurl = 'http://maps.googleapis.com/maps/api/geocode/xml?'

while True:
    address = raw_input('Enter location: ')
    if len(address) < 1 : break

    url = serviceurl + urllib.urlencode({'sensor':'false', 'address': address})
    print 'Retrieving', url
    uh = urllib.urlopen(url)
    data = uh.read()
    print 'Retrieved',len(data),'characters'
    print data
    tree = ET.fromstring(data)

    results = tree.findall('result')
    lat = results[0].find('geometry').find('location').find('lat').text
    lng = results[0].find('geometry').find('location').find('lng').text
    location = results[0].find('formatted_address').text

    print 'lat',lat,'lng',lng
    print location
    
asked by JchG 25.01.2016 at 22:37

2 answers

With this code you can extract the numbers within the count tag, which I assume are the number of comments per user, and then add them:

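import urllib
import xml.etree.ElementTree as ET

url = 'http://python-data.dr-chuck.net/comments_228073.xml'
uh = urllib.urlopen(url)  # Python 2 API, like the code in the question
data = uh.read()
commentinfo = ET.fromstring(data)
# commentinfo[1] is the second child of the root element; it is assumed
# to be the element that contains the individual comment elements.
count_sum = sum([int(comment.find('count').text) for comment in commentinfo[1]])
print(count_sum)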
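
In case it helps, here is a minimal sketch of the same idea that looks the elements up by tag name instead of by position, written for Python 3 (where `urlopen` lives in `urllib.request`). The sample XML, the `name` tag, and the helper names `sum_counts` and `fetch` are assumptions made up for illustration. Only the `commentinfo` / `comments` / `comment` / `count` nesting is taken from the answers in this thread.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen  # Python 3 location of urlopen


def sum_counts(xml_text):
    """Return the sum of every <count> found under <comments>/<comment>."""
    root = ET.fromstring(xml_text)
    # The path is relative to the root element (<commentinfo>).
    comments = root.findall('comments/comment')
    return sum(int(comment.findtext('count')) for comment in comments)


def fetch(url):
    """Download the document at url and return its raw bytes."""
    with urlopen(url) as response:
        return response.read()


# Hypothetical sample in the assumed layout, so the sketch runs offline.
SAMPLE = """<commentinfo>
  <note>Sample data</note>
  <comments>
    <comment><name>Ana</name><count>97</count></comment>
    <comment><name>Luis</name><count>3</count></comment>
  </comments>
</commentinfo>"""

total = sum_counts(SAMPLE)
print(total)
```

With network access, `sum_counts(fetch(url))` should do the same for the URL used above, provided the real file follows the assumed layout.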
    
answered 26.01.2016 at 03:58

You already have Javier's solution using ElementTree, but that module may seem a little complicated to use at first. There are simpler alternatives that you could try.

untangle

The untangle module converts XML to Python objects. Its advantage is that you can pass the URL directly to it. To install:

$ pip install untangle

Solution:

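import untangle

url = 'http://python-data.dr-chuck.net/comments_228073.xml'
parsed_data = untangle.parse(url)  # parse() accepts the URL directly
# Each XML tag becomes an attribute; cdata holds the text inside the tag.
comments = parsed_data.commentinfo.comments.comment
total = sum([int(comment.count.cdata) for comment in comments])
print(total)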

xmltodict

The xmltodict module allows you to work with XML as if you were working with JSON. To install:

$ pip install xmltodict

Solution:

import urllib2  # Python 2 module
import xmltodict

url = 'http://python-data.dr-chuck.net/comments_228073.xml'
data = urllib2.urlopen(url)
parsed_data = xmltodict.parse(data.read())
# The result is nested dictionaries keyed by tag name; text values are strings.
comments = parsed_data['commentinfo']['comments']['comment']
total = sum([int(comment['count']) for comment in comments])
print(total)
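
One thing to watch for with xmltodict: when a tag appears only once, the value for that key is a single dictionary rather than a list, so the loop above would iterate over the dictionary's keys and fail. Here is a small sketch of one way to guard against that. The `as_list` helper is a hypothetical name, and `parsed_data` is written by hand in the shape xmltodict would be expected to produce for a file with one comment; it is not real data from the URL.

```python
def as_list(value):
    """Wrap a lone item in a list; leave lists untouched."""
    return value if isinstance(value, list) else [value]


# Hypothetical parsed result for a file with a single <comment>,
# hand-written to mimic xmltodict output (assumed shape, not real data).
parsed_data = {
    'commentinfo': {
        'comments': {
            'comment': {'name': 'Ana', 'count': '7'},
        },
    },
}

comments = as_list(parsed_data['commentinfo']['comments']['comment'])
total = sum(int(comment['count']) for comment in comments)
print(total)
```

If I remember correctly, `xmltodict.parse` also takes a `force_list` argument that can do this for chosen tags, but check the library's documentation before relying on it.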
    
answered 26.01.2016 at 14:09