The Document Object Model (DOM)
The Document Object Model, unlike SAX, has its origins in the World Wide Web Consortium (W3C). Whereas SAX is public-domain software, developed through long discussions on the XML-dev mailing list, the DOM is a standard, just as the XML specification itself is. The DOM is also not designed specifically for Java, but to represent the content and model of documents across all programming languages and tools. Bindings exist for JavaScript, Java, CORBA, and other languages, allowing the DOM to be a cross-platform and cross-language specification.
In addition to being different from SAX in regard to standardization and language bindings, the DOM is organized into "levels" instead of versions. DOM Level One is an accepted Recommendation, and Level One details the functionality and navigation of content within a document. A document in the DOM is not limited to XML, but can be HTML or other content models as well. Level Two, which should finalize in mid-2000, builds upon Level One by supplying modules and options aimed at specific content models, such as XML, HTML, and Cascading Style Sheets (CSS). These less-generic modules begin to "fill in the blanks" left by the more general tools provided in DOM Level One.
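The division into levels and modules can be observed from code, because the DOM defines a DOMImplementation.hasFeature( ) method that reports whether a given module and level are supported. The following is only a minimal sketch: the DomLevelCheck class and its supports( ) helper are hypothetical names, and the sketch assumes the JAXP DocumentBuilderFactory bundled with current JDKs (which is not part of the DOM itself) purely as a convenient way to obtain a DOM implementation to query. The answers depend entirely on the implementation installed.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;

/** Hypothetical helper: asks a DOM implementation which modules it claims to support. */
final class DomLevelCheck {

    /**
     * Returns true if the DOM implementation supplied by JAXP (an assumption
     * of this sketch) reports support for the given feature and version.
     */
    static boolean supports(String feature, String version) {
        try {
            DOMImplementation impl = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .getDOMImplementation();
            return impl.hasFeature(feature, version);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        // Feature names follow the DOM module names; versions map to levels.
        String[][] queries = {
            {"Core", "2.0"}, {"XML", "1.0"}, {"XML", "2.0"}, {"HTML", "2.0"}, {"CSS", "2.0"}
        };
        for (String[] q : queries) {
            System.out.println(q[0] + " " + q[1] + ": " + supports(q[0], q[1]));
        }
    }
}
```

A parser that implements only the general-purpose modules would be expected to answer false for the more specialized ones, such as HTML or CSS.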
The DOM and Java
Using the DOM for a specific programming language requires a set of interfaces and classes that define and implement the DOM itself. Because the methods involved are not outlined specifically in the DOM specification, and instead the model of a document is focused upon, language bindings must be developed to represent the conceptual structure of the DOM for its use in Java or any other language. These language bindings then serve as APIs for us to manipulate documents in the fashion outlined in the DOM specification.
We are obviously concerned with the Java language binding. The classes you should be able to add to your IDE or class path are all in the org.w3c.dom package (and its subpackages). However, before downloading these yourself, you should check the XML parser and XSLT processor you purchased or downloaded; like the SAX package, the DOM package is often included with these products. This also ensures a correct match between your parser, processor, and the version of DOM that is supported.
Most processors do not handle the task of generating DOM input themselves, but instead rely on an XML parser that is capable of generating a DOM tree. For this reason, it is often the XML parser that will have the needed DOM binding classes and not the XSLT processor. In addition, this maintains the loose coupling between parser and processor, letting one or the other be substituted with comparable products. As Apache Xalan, by default, uses Apache Xerces for XML parsing and DOM generation, it is the level of support for DOM that Xerces provides that is of interest to us.
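To give a feel for what the Java binding looks like in practice, the sketch below walks a small document using only interfaces from the org.w3c.dom package (Document, Element, Node, and NodeList). The DomBindingSketch class, its childElementNames( ) method, and the sample XML are all hypothetical. To stay self-contained, the sketch obtains its tree through the JAXP DocumentBuilderFactory bundled with current JDKs rather than a particular vendor's parser; that choice is an assumption of this example, not something the DOM specifies.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical example class showing the org.w3c.dom interfaces in use. */
final class DomBindingSketch {

    /**
     * Parses a small XML string and returns the root element's name followed
     * by the names of its element children, for example "book: title chapter".
     */
    static String childElementNames(String xml) {
        try {
            // Obtaining the tree is parser-specific; JAXP is assumed here.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // From here on, only the language binding itself is used.
            Element root = doc.getDocumentElement();
            StringBuilder names = new StringBuilder(root.getTagName()).append(':');
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip text, comments, and other non-element nodes.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    names.append(' ').append(child.getNodeName());
                }
            }
            return names.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
            childElementNames("<book><title>t</title><chapter/><chapter/></book>"));
    }
}
```

Everything after the call to parse( ) depends only on the org.w3c.dom interfaces, which is why the same traversal code should work regardless of which parser produced the tree.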
Getting a DOM Parser
One thing that the DOM does not specify is how a DOM tree is created. The specification instead focuses on the structure and APIs for manipulating this tree, which leaves a lot of latitude in how DOM parsers are implemented. Unlike the SAX XMLReaderFactory class, which dynamically loads a SAX XMLReader implementation, you will need to import and instantiate your vendor's DOM parser class explicitly. To begin, create a new Java file and call it DOMParserDemo.java. We will look at how to build a simple DOM parsing program to read in an XML document and print out its contents. Create the structure and skeleton of your example class first, as shown in Example B.
Example B. DOMParserDemo Class
// Import your vendor's DOM parser
import org.apache.xerces.parsers.DOMParser;

/**
 * DOMParserDemo will take an XML file and display
 * the document using DOM.
 */
public class DOMParserDemo {

    /**
     * This parses the file, and then prints the document out
     * using DOM.
     *
     * @param uri String URI of file to parse.
     */
    public void performDemo(String uri) {
        System.out.println("Parsing XML File: " + uri + "\n\n");

        // Instantiate your vendor's DOM parser implementation
        DOMParser parser = new DOMParser( );
        try {
            // parser.parse(uri);
        } catch (Exception e) {
            System.out.println("Error in parsing: " + e.getMessage( ));
        }
    }

    /**
     * This provides a command-line entry point for this demo.
     */
    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("Usage: java DOMParserDemo [XML URI]");
            System.exit(0);
        }

        String uri = args[0];

        DOMParserDemo parserDemo = new DOMParserDemo( );
        parserDemo.performDemo(uri);
    }
}
This is set up in a fashion similar to our earlier SAXParserDemo class, but imports the Apache Xerces DOMParser class directly and instantiates it. We have commented out our actual invocation of the parse( ) method for the moment; before looking at what is involved in parsing a document into a DOM structure, we need to address issues of vendor neutrality in our choice of parsers. Keep in mind that this approach is simple and works well for many applications, but it is not portable across parser implementations as our SAX example was.
The initial impulse would be to use Java constructs like Class.forName(parserClass).newInstance( ) to get an instance of the correct vendor parser class. However, different DOM implementations behave in a variety of fashions: sometimes the parse( ) method returns an org.w3c.dom.Document object (which we look at next); sometimes the parser class provides a getDocument( ) method; and sometimes different parameter types (InputSource, InputStream, String, URI, etc.) are required for the parse( ) method to be supplied with the URI. In other words, while the DOM tree created is portable, the method of obtaining that tree is not, at least not without fairly complex reflection and dynamic class and method loading.
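To make the cost of that reflection concrete, here is a rough sketch of what handling just two of those variations might look like. Every name in it is hypothetical: ReflectiveDomLoader is an illustrative helper, and StubVendorParser is a stand-in written for this example (built on the JAXP classes bundled with current JDKs, an assumption of the sketch) that mimics a parser whose parse( ) method returns nothing and whose tree is fetched with getDocument( ). The sketch only handles parsers that accept a String URI; supporting InputSource or InputStream parameters would require further lookups.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Hypothetical stand-in for a vendor parser: parse( ) is void, getDocument( ) returns the tree. */
class StubVendorParser {
    private Document document;

    public void parse(String uri) throws Exception {
        document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(uri);
    }

    public Document getDocument() {
        return document;
    }
}

/** Hypothetical loader that copes with two of the parser shapes described above. */
final class ReflectiveDomLoader {

    static Document load(String parserClassName, String uri) {
        try {
            Object parser = Class.forName(parserClassName)
                    .getDeclaredConstructor().newInstance();

            // Assumes the parser offers parse(String); other signatures are not handled.
            Method parse = parser.getClass().getMethod("parse", String.class);
            Object result = parse.invoke(parser, uri);
            if (result instanceof Document) {
                return (Document) result;        // shape 1: parse( ) returns the tree
            }

            // Shape 2: the tree is retrieved separately through getDocument( ).
            Method getDocument = parser.getClass().getMethod("getDocument");
            return (Document) getDocument.invoke(parser);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "Parser " + parserClassName + " does not fit either known shape", e);
        }
    }

    /** Writes the XML to a temporary file, loads it reflectively, and returns the root name. */
    static String rootNameOf(String parserClassName, String xml) {
        try {
            File tmp = File.createTempFile("dom-demo", ".xml");
            tmp.deleteOnExit();
            Files.write(tmp.toPath(), xml.getBytes(StandardCharsets.UTF_8));
            Document doc = load(parserClassName, tmp.toURI().toString());
            return doc.getDocumentElement().getTagName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
            rootNameOf(StubVendorParser.class.getName(), "<library><book/></library>"));
    }
}
```

Even this limited version needs two method lookups, a runtime type check, and a catch-all for reflection failures, and none of those problems can be caught at compile time. That is the trade-off to weigh against simply importing the vendor's class as Example B does.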