Python XML: Expat, StartElementHandler

Investigate ways to parse XML. Use Expat, in xml.parsers.expat.

XML. Many programs require XML support. XML is a markup language that is commonly used for configuration and data files. In Python, several modules offer XML support. They have special features and advantages. We test Expat, one XML library.

To begin, we add the xml.parsers.expat module with an import statement. In this example, we add start tags, and element data, to a list. We assign the StartElementHandler and CharacterDataHandler to lambda expressions.

Tip: A def-method name could instead be used. Lambda expressions may be sufficient if you need just one statement.


Info: For StartElementHandler, we append the tag name to the list. For CharacterDataHandler, we append the data.

And: This yields a list containing start element names, and the contents of those elements.

Note: Many other handlers, including EndElementHandler and CommentHandler, are available. Please see the Python documentation.

Python program that uses xml.parsers.expat

import xml.parsers.expat

# Will store tag names and char data.
# (Named items so the built-in list type is not shadowed.)
items = []

# Create the parser.
parser = xml.parsers.expat.ParserCreate()

# Specify handlers.
parser.StartElementHandler = lambda name, attrs: items.append(name)
parser.CharacterDataHandler = lambda data: items.append(data)

# Parse a string.
parser.Parse("""<?xml version="1.0"?>
<item><name>Sam</name>
<name>Mark</name>
</item>""", True)

# Print the items in our list.
print(items)

Output

['item', 'name', 'Sam', '\n', 'name', 'Mark', '\n']
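Named handlers. As a rough sketch of the tip and note above, the handlers can be ordinary def functions, and EndElementHandler and CommentHandler can be assigned the same way. The function names, the events list, and the (kind, value) tuple layout are illustrative choices, not part of the original program.

```python
import xml.parsers.expat

# Collected (kind, value) pairs; this layout is an illustrative choice.
events = []

def start_element(name, attrs):
    # Called for each start tag; attrs maps attribute names to values.
    events.append(("start", name))

def end_element(name):
    # Called for each end tag.
    events.append(("end", name))

def comment(data):
    # Called for each comment, with the text between <!-- and -->.
    events.append(("comment", data))

parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element
parser.EndElementHandler = end_element
parser.CommentHandler = comment

parser.Parse("<item><!-- two names --><name>Sam</name><name>Mark</name></item>", True)

for event in events:
    print(event)
```

A def function is the easier choice once a handler needs more than one statement, such as a condition or some bookkeeping.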

Newlines. Notice that the newlines between elements are reported as character data. Most programs do not want this. The newlines could be skipped in the handler, or filtered out of the list afterward with a helper function. This would yield a cleaner data model.
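Filter example. One possible way to drop the newlines, shown here only as a sketch, is to test the data inside the CharacterDataHandler and skip anything that is empty after strip(). The char_data function name is an assumption made for this illustration.

```python
import xml.parsers.expat

items = []

def char_data(data):
    # Keep only character data that contains something besides whitespace.
    if data.strip():
        items.append(data)

parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = lambda name, attrs: items.append(name)
parser.CharacterDataHandler = char_data

parser.Parse("""<?xml version="1.0"?>
<item><name>Sam</name>
<name>Mark</name>
</item>""", True)

print(items)
```

With the same input as before, the list now holds only the tag names and the two names. Note that this approach also discards whitespace that might be meaningful in some documents, so it suits data files better than text-heavy markup.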

Discussion. Expat is not a Python technology. It is an older XML library created by James Clark in 1998. Written in C, it has excellent performance: it is noted as a "fast" parser. Generally, compiled C code outperforms Python code. Expat is also a non-validating parser: it checks that a document is well-formed, but it does not validate the document against a DTD.

So: For performance, Expat is a good choice. It may be harder to use than other solutions. This is a tradeoff you must evaluate.
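Errors. Although Expat does no validation, it still rejects input that is not well-formed. The sketch below feeds it a string with a mismatched end tag and catches the resulting xml.parsers.expat.ExpatError. The error variable and the sample string are assumptions made for this illustration.

```python
import xml.parsers.expat

parser = xml.parsers.expat.ParserCreate()
error = None

try:
    # The <name> element is never closed, so this is not well-formed.
    parser.Parse("<item><name>Sam</item>", True)
except xml.parsers.expat.ExpatError as err:
    error = err
    # lineno and offset report where the parser gave up.
    print("Not well-formed:", err, "at line", err.lineno)
```

Handling this exception is part of the extra work that a low-level parser asks of the caller.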


Summary. Nearly every developer will encounter XML files and need to parse them. No one way is ideal. A custom string-based parser, written in Python, is sometimes a good choice. A regular expression, with re, may also be enough for simple input.

But: A C-based, optimized XML parser like Expat is likely one of the fastest options. It also requires less testing on your part: it is already developed and widely used.
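Regex example. For comparison, the regular-expression approach mentioned in the summary might look like the sketch below. It assumes the name elements have no attributes, no nested tags, and no escaped characters, which is exactly the kind of assumption a real parser does not need. The pattern and variable names are illustrative.

```python
import re

text = """<?xml version="1.0"?>
<item><name>Sam</name>
<name>Mark</name>
</item>"""

# Non-greedy match of the text between <name> and </name>.
names = re.findall(r"<name>(.*?)</name>", text)

print(names)
```

This is short, but it breaks as soon as the markup varies, so it fits only small, well-known inputs.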

Dot Net Perls
© 2007-2020 Sam Allen.