Chapter 1
Python and XML
Python and XML
Python
and XML are two very different animals, each with a rich history. Python is a
full-scale programming language that has grown from scripting world roots in a
very organic way, through the vision and guidance of Python's inventor, Guido
van Rossum. Guido continues to take into account the needs of Python developers
as Python matures. XML, on the other hand, though strongly impacted by the
ideas of a small cadre of visionaries, has grown from standards-committee
roots. It has seen both quiet adoption and wrenching battles over its future.
Why bother putting the two technologies together?
Before
the Python/XML combination, there seemed no easy or effective way to work with
XML in a distributed environment. Developers were forced to rely on a variety
of tools used in awkward combination with one other. We used shell scripting
and Perl to process text and interact with the operating system, and then used
Java XML API's for processing XML and network programming. The shell provided
an excellent means of file manipulation and interaction with the Unix system,
and Perl was a good choice for simple text manipulation, providing access to
the Unix APIs. Unfortunately, neither sported a sophisticated object model.
Java, on the other hand, featured an object-oriented environment, a robust
platform API for network programming, threads, and graphical user interface
(GUI) application development. But with Java, we found an immediate lack of
text manipulation power; scripting languages typically provided strong text processing.
Python presented a perfect solution, as it combines the strengths of all of
these various options.
The Power of Python and
XML
Now that
we've introduced you to the world of XML, we'll look at what Python brings to
the table. We'll review the Python features that apply to XML, and then we'll
give some specific examples of Python with XML. As a very high-level language,
Python includes many powerful data structures as part of the core language and
libraries. The more recent versions of Python, from 2.0 onward, include
excellent support for Unicode and an impressive range of encodings, as well as
an excellent (and fast!) XML parser that provides character data from XML as
Unicode strings. Python's standard library also contains implementations of the
industry-standard DOM and SAX interfaces for working with XML data, and
additional support for alternate parsers and interfaces is available.
Of
course, this much could be said of other modern high-level languages as well.
Java certainly includes an impressive library of highly usable data structures,
and Perl offers equivalent data structures also. What makes Python preferable
to those languages and their libraries? There are several features, of which we
briefly discuss the most important:
- Python source code is
easy to read and maintain.
- The interactive
interpreter makes it simple to try out code fragments.
- Python is incredibly
portable, but does not restrict access to platform-specific capabilities.
- The object-oriented
features are powerful without being obscure.
The Power of Python and XML
Now
that we've introduced you to the world of XML, we'll look at what Python brings
to the table. We'll review the Python features that apply to XML, and then
we'll give some specific examples of Python with XML. As a very high-level
language, Python includes many powerful data structures as part of the core
language and libraries. The more recent versions of Python, from 2.0 onward,
include excellent support for Unicode and an impressive range of encodings, as
well as an excellent (and fast!) XML parser that provides character data from
XML as Unicode strings. Python's standard library also contains implementations
of the industry-standard DOM and SAX interfaces for working with XML data, and
additional support for alternate parsers and interfaces is available.
Of
course, this much could be said of other modern high-level languages as well.
Java certainly includes an impressive library of highly usable data structures,
and Perl offers equivalent data structures also. What makes Python preferable
to those languages and their libraries? There are several features, of which we
briefly discuss the most important:
- Python source code is easy to
read and maintain.
- The interactive interpreter
makes it simple to try out code fragments.
- Python is incredibly portable,
but does not restrict access to platform-specific capabilities.
- The object-oriented features
are powerful without being obscure.
There
are many languages capable of doing what can be done with Python, but it is rare
to find all of the "peripheral" qualities of Python in any single
language. These qualities do not so much make Python more capable, but they
make it much easier to apply, reducing programming hours. This allows more time
to be spent finding better ways to solve real problems or just allows the
programmer to move on to the next problem. Here we discuss these features in
more detail.
Easy to read and
maintain
As a programming language, Python exhibits a
remarkable clarity of expression. Though some programmers accustomed to other
languages view Python's use of significant whitespace with surprise, everyone
seems to think it makes Python source code significantly more readable than
languages that require more special characters to be introduced to mark structure
in the source. Python's structures are not simpler than those of other
languages, but the different syntax makes source code "feel" much
cleaner in Python.
The use of whitespace also helps avoid having
minor stylistic differences, such as the placement of structural braces, so
there's a greater degree of visual consistency across code by different
programmers. While this may seem like a minor thing to many programmers, the
effect is that maintaining code written by another programmer becomes much easier
simply because its easier to concentrate on the actual structure and algorithms
of the code. For the individual programmer, this is a nice side benefit, but
for a business, this results in lower expenses for code maintenance.
Exploratory programming
in an interactive interpreter
Many modern high-level programming languages
offer interpreters, but few have proved as successful at doing so as Python.
Others, such as Java, do not generally offer interpreters at all. If we
consider Perl, a language that is arguably very capable when used from a
command line, we see that it is not equipped with a rich interpreter. If we
start the Perl interpreter without naming a script, it simply waits for us to
type a complete script at the console, and then interprets the script when
we're done. It does allow us to enter a few commands on the command line
directly, but there's no ability to run one statement at a time and inspect the
results as we go in order to determine if each bit of code is doing exactly
what we expect. With Python, the interactive interpreter provides a rich
environment for executing individual statements and testing the results.
Portability without
restrictions
The Python interpreter is one of the most
portable language interpreters available. It is known to run on platforms
ranging from PDAs and other embedded systems to some of the most powerful
multiprocessor platforms ever built. It can run on more operating systems than
perhaps any other interpreter. Moreover, carefully written application code can
share much of this portability. Python provides a great array of abstractions
that do just enough to hide platform differences while allowing the programmer
to use the services of specific platforms when necessary.
When an application requires access to facilities
or libraries that Python does not provide, Python also makes it easy to add
extensions that take advantage of these additional facilities. Additional
modules can be created (usually in C or C++, but other languages can be used as
well) that allow Python code to call on external facilities efficiently.
Powerful but accessible
object-orientation
At one time, it was common to hear about how
object-oriented programming (OOP) would solve most of the technical problems
programmers had to deal with in their code. Of course, programmers knew better,
pushed back, and turned the concepts into useful tools that could be applied
when appropriate (though how and when it
should be applied may always be the subject of debate). Unfortunately, many
languages that have strong support for OOP are either very tedious to work with
(such as C++ or, to a lesser extent, Java), or they have not been as widely
accepted for general use (such as Eiffel).
Python is different. The language supports
object orientation without much of the syntactic overhead found in many widely
used object-oriented languages, making it very easy to define new object types.
Unlike many other languages, Python is highly polymorphic; interfaces are
defined in much less stringent ways than in languages such as C++ and Java.
This makes it easy to create useful objects without having to write code that
exists only to conform to an interface, but that will not actually be used in a
particular application. When combined with the excellent advantage taken by Python's
standard library of a variety of common interfaces, the value of creating
reusable objects is easily recognized, all while the ease of implementing
useful interfaces is maintained.
Python Tools for XML
Three
major packages provide Python tools for working with XML. These are, from the
most commonly used to the largest:
1.
The
Python standard library
2.
PyXML,
produced by the Python XML Special Interest Group
3.
4Suite,
provided by Fourthought, Inc.
The
Python standard library provides a minimal but useful set of interfaces to work
with XML, including an interface to the popular Expat XML parser, an
implementation of the lightweight Simple API for XML (SAX), and a basic
implementation of the core Document Object Model (DOM). The DOM implementation
supports Level 1 and much of Level 2 of the DOM specification from the W3C, but
does not implement most of the optional features. The material in the standard
library was drawn from material originally in the PyXML package, and additional
material was contributed by leading Python XML developers.
PyXML is
a more feature-laden package; it extends the standard library with additional
XML parsers, has a much more substantial DOM implementation (including more
optional features), has adapters to allow more parsers to support the SAX
interface, XPath expression parsing and evaluation, XSLT transformations, and a
variety of other helper modules. The package is maintained as a community
effort by many of the most active Python/XML programmers.
4Suite
is not a superset of the other packages, but is intended to be used in addition
to PyXML. It offers additional DOM implementations tailored for different
applications, support for the XLink and XPointer specifications, and tools for
working with Resource Description Framework (RDF) data.
What Can We Do with It?
Now that
we've looked at how we can use XML with Python, we need to look at how we can
apply our knowledge of XML and Python to real applications. In the Internet
age, this means widely distributed systems operating across the Internet.
There's
a lot to working with the Internet beyond XML and the CGI programming done in
many of the examples in the book. In case you're not already familiar with this
topic, we include an introduction to the facilities in the Python standard library
that help create clients and servers for the Internet in Chapter 8. We review
how to retrieve data from remote servers, and how to submit form-based requests
programmatically and read the result. We then learn to build custom web servers
that respond to HTTP requests, allowing us to build servers that do exactly
what we need them to.
With
these skills under our hat, we proceed to look at the emerging world of
"web services." Chapter 9 describes what we mean by web services and
introduces the specifications coming out in that area. We look at two packages
that allow us to use SOAP to call on web services and demonstrate how to create
one in Python.
In
Chapter 10, we pull together much of what we've learned with an extended
example that demonstrates how it all works together. Using XML as a
communications medium, we are able to build an application that uses a variety
of technologies and operates in diverse environments.
0 comments:
Post a Comment