Event vs. DOM Driven Parsing of XML

Published on April 29, 2008

I have recently been playing with parsing GPX files and spitting out the results into a special KML file. I initially wrote a parser using minidom, but the first run pinned my Core 2 Duo laptop at 100% CPU for 10 seconds, so I realized I needed to rewrite it using something else.

I spent a little time reading about the different XML parsers and eventually read up on cElementTree. And it is included with Python 2.5, sweet.

I quickly rewrote the code and did some tests. First, the two bits of code for parsing my GPX file:

minidom-speed.py

#!/usr/bin/python

from xml.dom import minidom
from genshi.template import TemplateLoader

def collect_info():
    dom = minidom.parse('airport.gpx')
    for node in dom.getElementsByTagName('trkpt'):
        lat = node.getAttribute('lat')
        lon = node.getAttribute('lon')
        speed = node.getElementsByTagName('speed')[0].firstChild.data
        speed = float(speed) * 10
        coords = '%s,%s' % (lon, lat)
        coords_speed = '%s,%s' % (coords, speed)
        yield {
            'coordinates': coords_speed
            }

loader = TemplateLoader(['.'])
template = loader.load('template-speed.kml')
stream = template.generate(collection=collect_info())

f = open('minidom.kml', 'w')
f.write(stream.render())
f.close()  # make sure the KML is flushed to disk

cet-speed.py

#!/usr/bin/python

import xml.etree.cElementTree as ET
import string
from genshi.template import TemplateLoader

def collect_info():
    # GPX 1.0 elements live in this namespace, so every tag name must be qualified.
    mainNS = string.Template("{http://www.topografix.com/GPX/1/0}$tag")

    trkptTag = mainNS.substitute(tag="trkpt")
    speedTag = mainNS.substitute(tag="speed")

    et = ET.parse("airport.gpx")
    # ".//" searches every descendant of the root for track points.
    for trkpt in et.findall(".//" + trkptTag):
        wptinfo = []
        wptinfo.append(trkpt.get("lon"))
        wptinfo.append(trkpt.get("lat"))
        wptinfo.append(str(float(trkpt.findtext(speedTag)) * 10))
        coords_speed = ",".join(wptinfo)
        yield {
            'coordinates': coords_speed,
            }
        
loader = TemplateLoader(['.'])
template = loader.load('template-speed.kml')
stream = template.generate(collection=collect_info())

f = open('cet.kml', 'w')
f.write(stream.render())
f.close()  # make sure the KML is flushed to disk
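Strictly speaking, both scripts above build the whole tree in memory before walking it. For a more event-driven flavor, ElementTree also offers iterparse, which hands over each element as soon as its closing tag is seen. What follows is only a minimal sketch of that idea, not what I actually ran: it assumes a modern Python 3, where the C accelerator is picked up automatically by xml.etree.ElementTree (the separate cElementTree module is gone in recent versions), and SAMPLE_GPX is a made-up stand-in for airport.gpx.

```python
import io
import xml.etree.ElementTree as ET

GPX_NS = "{http://www.topografix.com/GPX/1/0}"

# Tiny made-up GPX 1.0 document standing in for airport.gpx (an assumption).
SAMPLE_GPX = b"""<?xml version="1.0"?>
<gpx version="1.0" xmlns="http://www.topografix.com/GPX/1/0">
  <trk><trkseg>
    <trkpt lat="47.45" lon="-122.31"><speed>1.5</speed></trkpt>
    <trkpt lat="47.46" lon="-122.30"><speed>2.0</speed></trkpt>
  </trkseg></trk>
</gpx>"""


def collect_info(source):
    """Yield one dict per <trkpt>, handling each element once it is complete."""
    # An "end" event fires when an element's closing tag has been parsed,
    # so its attributes and children are fully available at that point.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != GPX_NS + "trkpt":
            continue
        speed = float(elem.findtext(GPX_NS + "speed")) * 10
        yield {
            "coordinates": "%s,%s,%s" % (elem.get("lon"), elem.get("lat"), speed),
        }
        # Drop the attributes and children we no longer need; the emptied
        # element itself stays attached to its parent.
        elem.clear()


points = list(collect_info(io.BytesIO(SAMPLE_GPX)))
for point in points:
    print(point["coordinates"])
```

The generator has the same shape as collect_info() in the scripts above, so it could be passed to template.generate() in the same way. Whether it would beat the plain ET.parse() version on a file this size is something I have not measured; the main benefit is likely lower memory use on large tracks.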

The speed difference between the two scripts is not just noticeable, but very noticeable.

minidom-speed.py

$ python -m cProfile minidom-speed.py
4405376 function calls (3787047 primitive calls) in 32.142 CPU seconds

cet-speed.py

$ python -m cProfile cet-speed.py
1082061 function calls (904167 primitive calls) in 6.736 CPU seconds

Roughly a quarter as many function calls and almost 5x faster (32.1 vs. 6.7 CPU seconds), at least that's how I interpret the results. Much better!
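cProfile adds overhead to every function call, and the numbers above also include the Genshi rendering, so a plain wall-clock timing of just the parsing step is a useful cross-check. Here is a rough sketch of how that could look. It assumes Python 3 and its built-in xml.etree.ElementTree, and it generates synthetic track points with a hypothetical make_gpx() helper instead of reading my airport.gpx, so the timings it prints will not match the figures above.

```python
import io
import time
import xml.etree.ElementTree as ET
from xml.dom import minidom

GPX_NS = "{http://www.topografix.com/GPX/1/0}"


def make_gpx(n):
    """Build a synthetic GPX 1.0 document with n track points (made-up data)."""
    pts = "".join(
        '<trkpt lat="%.4f" lon="%.4f"><speed>%.1f</speed></trkpt>'
        % (47 + i * 1e-4, -122 + i * 1e-4, i % 30)
        for i in range(n)
    )
    doc = ('<gpx version="1.0" xmlns="http://www.topografix.com/GPX/1/0">'
           "<trk><trkseg>%s</trkseg></trk></gpx>" % pts)
    return doc.encode("utf-8")


def with_minidom(data):
    """Extract 'lon,lat,speed*10' strings using the DOM API."""
    dom = minidom.parseString(data)
    out = []
    for node in dom.getElementsByTagName("trkpt"):
        speed = float(node.getElementsByTagName("speed")[0].firstChild.data) * 10
        out.append("%s,%s,%s" % (node.getAttribute("lon"), node.getAttribute("lat"), speed))
    return out


def with_elementtree(data):
    """Extract the same strings using ElementTree."""
    tree = ET.parse(io.BytesIO(data))
    out = []
    for pt in tree.iterfind(".//" + GPX_NS + "trkpt"):
        speed = float(pt.findtext(GPX_NS + "speed")) * 10
        out.append("%s,%s,%s" % (pt.get("lon"), pt.get("lat"), speed))
    return out


def timed(func, data):
    """Return (result, elapsed wall-clock seconds) for one call of func(data)."""
    start = time.perf_counter()
    result = func(data)
    return result, time.perf_counter() - start


data = make_gpx(5000)
dom_result, dom_seconds = timed(with_minidom, data)
et_result, et_seconds = timed(with_elementtree, data)
print("minidom: %.3fs  ElementTree: %.3fs" % (dom_seconds, et_seconds))
```

Checking that dom_result == et_result before comparing the times is worth the extra line: a faster parser is no use if it quietly produces different coordinates.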



Tagged as: gpx | graphing | xml
