
The benefits of JAXP

One of the most important technologies available in Java is the set of APIs used to work with XML. There are basically two ways to work with XML documents. SAX is an event-driven means of processing XML that uses callbacks to handle the relevant events. DOM works with an in-memory tree structure of the XML document. Sun Microsystems created the Java API for XML Processing (JAXP) toolkit, which makes XML manageable for all developers. It is a key component for exploiting the possibilities of XML technology, such as building web services.

In this article I'm assuming that you have some basic knowledge of XML, although you may not know very much about XML parsing. If not, a large number of books are available to help you understand the basics of XML. Now let's get started!

There are two key things for any developer using JAXP to remember when deciding which of the two APIs to use for parsing an XML document. If you want to make one pass through the document and use the events raised along the way to capture key information, then the Simple API for XML (SAX) is the API you want. If you want to manipulate, transform, or query a document, then you are better off using the Document Object Model (DOM). In fact, within JAXP these are better described as abstraction layers than as APIs, since you can plug in the parser you prefer to perform these operations.

The Basics

Whatever you are trying to do with a parser, the general process is the same. The steps are the following:

1. Create a parser object
2. Pass your XML document to the parser
3. Process the results

With this process in mind, one can start to build applications that take advantage of XML. Of course, building applications or web services is more involved than this, but it shows the typical flow for an application using XML.

Types of parsers

There are different ways to categorize parsers.
There are parsers that support the Document Object Model (DOM) as well as those that support the Simple API for XML (SAX). Parsers using these abstraction models are written in a number of languages, including Java, Perl, and C++.

One can also differentiate between validating and non-validating parsers. XML documents that use a schema (or, for older documents, a DTD) and follow the rules defined in that schema or DTD are called valid documents. XML documents that follow the basic tagging rules are called well-formed documents. The XML specification requires all parsers to report errors when they find that a document is not well-formed. Validation, however, is a completely different issue. Validating parsers validate XML documents as they parse them. Non-validating parsers do not check validity at all. In other words, if an XML document is well-formed, a non-validating parser doesn't care whether the document follows the rules specified in its schema (if any).

The benefits of a non-validating parser

The benefit of using a non-validating parser is the gain in speed and efficiency from not validating the document. It takes a significant amount of effort for an XML parser to process a schema and make sure that every element in an XML document follows the rules of the schema. You would skip validation only if you are confident that the XML document is already valid (for example, because it has been used within your organization or comes from a trusted source), so there's no point in validating it again. Another scenario is when you simply want to find all of the XML tags in a document; once you have them, you can extract and process the data they contain.

The Simple API for XML (SAX)

The SAX API is an event-driven means of working with the contents of XML documents. It was developed by David Megginson and other members of the XML-Dev mailing list. When you parse an XML document with a SAX parser, the parser generates events at various points in your document. You then use callback functions to decide what to do with each of those events. A SAX parser generates events at the start and end of a document, at the start and end of an element, when it finds characters inside an element, and at several other points. You write the Java code (the callback) that handles each event, and you decide what to do with the information you get from the parser.

Working with SAX

In the SAX model, we send our XML document to the parser, and the parser notifies us when certain events happen. It's up to us to decide what we want to do with those events; if we ignore them, the information in the event is discarded. The SAX API defines a number of events. You can write Java code that handles all of the events you care about. If you don't care about a certain type of event, you don't have to write any code at all. Just ignore the event, and the parser will discard it.

Here is a list of the most commonly used SAX events. There are other SAX events, but they are not relevant for this article. The callbacks for these events are part of the DefaultHandler class in the org.xml.sax.helpers package.

startDocument - Signals the start of the document.
endDocument - Signals the end of the document.
startElement - Signals the start of an element. The parser fires this event when all of the contents of the opening tag have been processed. This includes the name of the tag and any attributes it might have.
endElement - Signals the end of an element.
characters - Contains character data, similar to a DOM Text node.

A Simple SAX Parser using JAXP

A simple SAX parser uses the following typical routine:

1. Create a SAXParser instance using the SAXParserFactory, which instantiates a specific vendor's parser implementation.
2. Register callback implementations (by extending DefaultHandler or another callback class).
3. Start parsing and sit back as your callback implementations are fired off.

JAXP's SAX component provides a simple means for doing all of this. JAXP lets you specify the parser through a Java system property. The parser used by default is Sun's version of Xerces. You can switch to another implementation just by changing the classpath setting, without recompiling any code. That is the beauty of JAXP.

Once you have set up the factory, invoking newSAXParser() returns a ready-to-use instance of the JAXP SAXParser class. This class wraps an underlying SAX parser (an instance of the SAX class org.xml.sax.XMLReader). It also protects you from using any vendor-specific additions to the parser class. This class allows actual parsing behavior to be kicked off. The first figure shows the handler with all the callbacks:
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SimpleHandler extends DefaultHandler {
    // SAX callback implementations from ContentHandler, ErrorHandler, etc.
    private Writer out;

    public SimpleHandler() throws SAXException {
        try {
            out = new OutputStreamWriter(System.out, "UTF8");
        } catch (IOException e) {
            throw new SAXException("Error getting output handle.", e);
        }
    }

    // Echo an XML declaration when parsing begins.
    public void startDocument() throws SAXException {
        print("<?xml version=\"1.0\"?>\n");
    }

    // Echo the opening tag together with its attributes.
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) throws SAXException {
        print("<" + qName);
        if (atts != null) {
            for (int i = 0, len = atts.getLength(); i < len; i++) {
                print(" " + atts.getQName(i) + "=\"" + atts.getValue(i) + "\"");
            }
        }
        print(">");
    }

    // Echo the closing tag.
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        print("</" + qName + ">\n");
    }

    // Echo character data exactly as the parser delivers it.
    public void characters(char[] ch, int start, int len) throws SAXException {
        print(new String(ch, start, len));
    }

    // Write to the output stream, wrapping I/O failures in a SAXException.
    private void print(String s) throws SAXException {
        try {
            out.write(s);
            out.flush();
        } catch (IOException e) {
            throw new SAXException("IO Error Occurred.", e);
        }
    }
}

Figure 1

The next figure shows the steps for how to create, configure, and use a SAX factory.
import java.io.File;
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

public class SimpleSAXParsing {
    public static void main(String[] args) {
        try {
            if (args.length != 1) {
                System.err.println("Usage: java SimpleSAXParsing [filename]");
                System.exit(1);
            }

            // Get SAX Parser Factory
            SAXParserFactory factory = SAXParserFactory.newInstance();

            // Turn on validation, and turn off namespaces
            factory.setValidating(true);
            factory.setNamespaceAware(false);

            // Create the parser and run it with our callback handler
            SAXParser parser = factory.newSAXParser();
            parser.parse(new File(args[0]), new SimpleHandler());
        } catch (ParserConfigurationException e) {
            System.out.println("The underlying parser does not support "
                    + " the requested features.");
        } catch (FactoryConfigurationError e) {
            System.out.println("Error occurred obtaining SAX Parser Factory.");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Figure 2
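One detail of the factory setup is worth a closer look: calling setValidating(true) only changes what the parser reports, and the error() callback inherited from DefaultHandler does nothing unless you override it. The following is a minimal sketch, not part of the original example, that parses the same in-memory document with and without validation and collects whatever the parser reports. The class name SaxValidationDemo, the CollectingHandler helper, and the sample document are assumptions made up for illustration; the JAXP and SAX calls themselves are the standard ones.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxValidationDemo {

    // Hypothetical sample: the DTD allows only <item> inside <list>, so the
    // <note> element makes this document well-formed but not valid.
    static final String SAMPLE =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE list [\n"
        + "  <!ELEMENT list (item*)>\n"
        + "  <!ELEMENT item (#PCDATA)>\n"
        + "]>\n"
        + "<list><item>one</item><note>two</note></list>";

    /** Records element names and any validation errors the parser reports. */
    static class CollectingHandler extends DefaultHandler {
        final List<String> elements = new ArrayList<>();
        final List<String> errors = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            elements.add(qName);
        }

        // Validation problems arrive here; DefaultHandler's version ignores them.
        @Override
        public void error(SAXParseException e) {
            errors.add(e.getMessage());
        }
    }

    /** Parses the given XML text with validation switched on or off. */
    static CollectingHandler parse(String xml, boolean validating) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(validating);
        SAXParser parser = factory.newSAXParser();
        CollectingHandler handler = new CollectingHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        CollectingHandler lax = parse(SAMPLE, false);
        CollectingHandler strict = parse(SAMPLE, true);
        System.out.println("elements seen: " + strict.elements);
        System.out.println("errors without validation: " + lax.errors.size());
        System.out.println("errors with validation: " + strict.errors.size());
    }
}
```

With a typical JAXP implementation, both runs should see the same elements, but only the validating run should report errors. A handler that extends DefaultHandler without overriding error() would let an invalid document pass silently even with validation turned on.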

Working with Document Object Model (DOM)

The Document Object Model defines an interface that enables programs to access and update the style, structure, and contents of XML documents. XML parsers that support the DOM implement that interface. When you use a DOM parser to parse an XML document, you get back a tree structure that contains all of the elements of the document. The DOM provides a variety of functions you can use to examine the contents and structure of the document. Here are the methods you will commonly use:

Document.getDocumentElement() - Returns the root element of the document.
Node.getFirstChild() - Returns the first child of a given Node.
Node.getLastChild() - Returns the last child of a given Node.
Node.getNextSibling() - Returns the next sibling of a given Node.
Node.getPreviousSibling() - Returns the previous sibling of a given Node.
Element.getAttribute(attrName) - For a given Element, returns the value of the attribute with the requested name.

A Simple DOM Parser using JAXP

Using DOM with JAXP is almost the same as using SAX. The differences are primarily in the names of the classes and the return types. JAXP is responsible for returning an org.w3c.dom.Document object from parsing. This object represents the XML document and is made up of DOM nodes that represent the elements, attributes, and other XML constructs. Unlike with SAX, we don't have any callback handler, so it is just a matter of parsing the XML document and then using the DOM object to address our needs. In this example, we show how to write out the DOM tree both forwards and in reverse.

// Method of the XMLDocumentWriter class used in Figure 3. The out field and
// the fixup() helper belong to that class and are not shown in the article.
public void write(Node node, String indent) {
    switch (node.getNodeType()) {
        case Node.DOCUMENT_NODE: {
            Document doc = (Document) node;
            out.println(indent + "<?xml version='1.0'?>");
            Node child = doc.getFirstChild();
            while (child != null) {
                write(child, indent);
                child = child.getNextSibling();
            }
            break;
        }
        case Node.DOCUMENT_TYPE_NODE: {
            DocumentType doctype = (DocumentType) node;
            out.println("<!DOCTYPE " + doctype.getName() + ">");
            break;
        }
        case Node.ELEMENT_NODE: {
            Element elt = (Element) node;
            out.print(indent + "<" + elt.getTagName());
            NamedNodeMap attrs = elt.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                out.print(" " + a.getNodeName() + "='" + fixup(a.getNodeValue()) + "'");
            }
            out.println(">");
            String newindent = indent + " ";
            Node child = elt.getFirstChild();
            while (child != null) {
                write(child, newindent);
                child = child.getNextSibling();
            }
            out.println(indent + "</" + elt.getTagName() + ">");
            break;
        }
        case Node.TEXT_NODE: {
            Text textNode = (Text) node;
            String text = textNode.getData().trim();
            if ((text != null) && text.length() > 0)
                out.println(indent + fixup(text));
            break;
        }
        case Node.PROCESSING_INSTRUCTION_NODE: {
            ProcessingInstruction pi = (ProcessingInstruction) node;
            out.println(indent + "<?" + pi.getTarget() + " " + pi.getData() + "?>");
            break;
        }
        case Node.ENTITY_REFERENCE_NODE: {
            out.println(indent + "&" + node.getNodeName() + ";");
            break;
        }
        case Node.CDATA_SECTION_NODE: {
            CDATASection cdata = (CDATASection) node;
            // Careful! Don't put a CDATA section in the program itself!
            out.println(indent + "<" + "![CDATA[" + cdata.getData() + "]]" + ">");
            break;
        }
        case Node.COMMENT_NODE: {
            Comment c = (Comment) node;
            out.println(indent + "<!--" + c.getData() + "-->");
            break;
        }
        default:
            System.err.println("Ignoring node: " + node.getClass().getName());
            break;
    }
}

Figure 1

// Same traversal as write(), but children and attributes are visited last to first.
public void reverse(Node node, String indent) {
    switch (node.getNodeType()) {
        case Node.DOCUMENT_NODE: {
            Document doc = (Document) node;
            out.println(indent + "<?xml version='1.0'?>");
            Node child = doc.getLastChild();
            while (child != null) {
                reverse(child, indent);
                child = child.getPreviousSibling();
            }
            break;
        }
        case Node.DOCUMENT_TYPE_NODE: {
            DocumentType doctype = (DocumentType) node;
            out.println("<!DOCTYPE " + doctype.getName() + ">");
            break;
        }
        case Node.ELEMENT_NODE: {
            Element elt = (Element) node;
            out.print(indent + "<" + elt.getTagName());
            NamedNodeMap attrs = elt.getAttributes();
            // Count down to index 0 inclusive so the first attribute is not skipped.
            for (int i = attrs.getLength() - 1; i >= 0; i--) {
                Node a = attrs.item(i);
                out.print(" " + a.getNodeName() + "='" + fixup(a.getNodeValue()) + "'");
            }
            out.println(">");
            String newindent = indent + " ";
            Node child = elt.getLastChild();
            while (child != null) {
                reverse(child, newindent);
                child = child.getPreviousSibling();
            }
            out.println(indent + "</" + elt.getTagName() + ">");
            break;
        }
        case Node.TEXT_NODE: {
            Text textNode = (Text) node;
            String text = textNode.getData().trim();
            if ((text != null) && text.length() > 0)
                out.println(indent + fixup(text));
            break;
        }
        case Node.PROCESSING_INSTRUCTION_NODE: {
            ProcessingInstruction pi = (ProcessingInstruction) node;
            out.println(indent + "<?" + pi.getTarget() + " " + pi.getData() + "?>");
            break;
        }
        case Node.ENTITY_REFERENCE_NODE: {
            out.println(indent + "&" + node.getNodeName() + ";");
            break;
        }
        case Node.CDATA_SECTION_NODE: {
            CDATASection cdata = (CDATASection) node;
            // Careful! Don't put a CDATA section in the program itself!
            out.println(indent + "<" + "![CDATA[" + cdata.getData() + "]]" + ">");
            break;
        }
        case Node.COMMENT_NODE: {
            Comment c = (Comment) node;
            out.println(indent + "<!--" + c.getData() + "-->");
            break;
        }
        default:
            System.err.println("Ignoring node: " + node.getClass().getName());
            break;
    }
}

Figure 2

// The constants JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA and JAXP_SCHEMA_SOURCE, the
// outputEncoding field, the usage() method, and the MyErrorHandler and
// XMLDocumentWriter classes are defined elsewhere in the program and are not
// shown in the article.
public static void main(String[] args) throws Exception {
    String filename = null;
    boolean dtdValidate = false;
    boolean xsdValidate = false;
    String schemaSource = null;
    boolean ignoreWhitespace = false;
    boolean ignoreComments = false;
    boolean putCDATAIntoText = false;
    boolean createEntityRefs = false;

    // Read the command-line options; the file name must come last.
    for (int i = 0; i < args.length; i++) {
        if (args[i].equals("-dtd")) {
            dtdValidate = true;
        } else if (args[i].equals("-xsd")) {
            xsdValidate = true;
        } else if (args[i].equals("-xsdss")) {
            if (i == args.length - 1) {
                usage();
            }
            xsdValidate = true;
            schemaSource = args[++i];
        } else if (args[i].equals("-ws")) {
            ignoreWhitespace = true;
        } else if (args[i].startsWith("-co")) {
            ignoreComments = true;
        } else if (args[i].startsWith("-cd")) {
            putCDATAIntoText = true;
        } else if (args[i].startsWith("-e")) {
            createEntityRefs = true;
        } else if (args[i].equals("-usage")) {
            usage();
        } else if (args[i].equals("-help")) {
            usage();
        } else {
            filename = args[i];
            if (i != args.length - 1) {
                usage();
            }
        }
    }
    if (filename == null) {
        usage();
    }

    // Configure the factory according to the options.
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setNamespaceAware(true);
    dbf.setValidating(dtdValidate || xsdValidate);
    if (xsdValidate) {
        try {
            dbf.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
        } catch (IllegalArgumentException x) {
            System.err.println(
                "Error: JAXP DocumentBuilderFactory attribute not recognized: "
                + JAXP_SCHEMA_LANGUAGE);
            System.err.println(
                "Check to see if parser conforms to JAXP 1.2 spec.");
            System.exit(1);
        }
    }
    if (schemaSource != null) {
        dbf.setAttribute(JAXP_SCHEMA_SOURCE, new File(schemaSource));
    }
    dbf.setIgnoringComments(ignoreComments);
    dbf.setIgnoringElementContentWhitespace(ignoreWhitespace);
    dbf.setCoalescing(putCDATAIntoText);
    dbf.setExpandEntityReferences(!createEntityRefs);

    // Create the builder, attach an error handler, and parse the file.
    DocumentBuilder db = dbf.newDocumentBuilder();
    OutputStreamWriter errorWriter =
        new OutputStreamWriter(System.err, outputEncoding);
    db.setErrorHandler(new MyErrorHandler(new PrintWriter(errorWriter, true)));
    Document doc = db.parse(new File(filename));

    // Print out the DOM tree
    OutputStreamWriter outWriter =
        new OutputStreamWriter(System.out, outputEncoding);
    XMLDocumentWriter xmlDocWriter =
        new XMLDocumentWriter(new PrintWriter(outWriter, true));
    xmlDocWriter.write(doc);
    xmlDocWriter.reverse(doc);
}

Figure 3
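As a smaller, self-contained illustration of the DOM navigation methods listed earlier, the sketch below parses a short in-memory document and reads an attribute from each child element of the root. It is only a sketch under stated assumptions: the class name DomWalkDemo, the childIds helper, and the sample catalog document are hypothetical and do not come from the example program above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomWalkDemo {

    // Hypothetical sample document, made up for this illustration.
    static final String SAMPLE =
        "<catalog><!-- stock --><book id=\"b1\">JAXP</book>"
        + "<book id=\"b2\">SAX</book></catalog>";

    /** Parses XML text into a DOM Document using the default JAXP builder. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(xml)));
    }

    /** Returns the "id" attribute of each child element of the root, in document order. */
    static List<String> childIds(Document doc) {
        List<String> ids = new ArrayList<>();
        Element root = doc.getDocumentElement();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            // Comments and text are sibling nodes too; only elements carry
            // attributes, so check the node type before casting.
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                ids.add(((Element) n).getAttribute("id"));
            }
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(childIds(parse(SAMPLE))); // prints [b1, b2]
    }
}
```

Note that getAttribute is called on an Element rather than on a plain Node, which is why the node type is checked first. Swapping getFirstChild and getNextSibling for getLastChild and getPreviousSibling would visit the same children in reverse order.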

Key Point

The key point I'll make is that when working with the Nodes in the DOM tree, you have to check the type of each Node before you work with it. Certain methods, such as getAttributes, return null for some node types. If you don't check the node type, you'll get unexpected results (at best) and exceptions (at worst).

Which parser should you use?

Use a DOM parser when:

- You need to know a lot about the structure of a document.
- You need to move parts of the document around (you might want to sort certain elements, for example).
- You need to use the information in the document more than once.

Use a SAX parser when:

- You only need to extract a few elements from an XML document.
- You don't have much memory to work with.
- You're only going to use the information in the document once (as opposed to parsing the information once, then using it many times later).

In this article, we have covered some of the basics of using JAXP and the benefits it provides for XML processing. In a future article we will look at some of the more advanced functions used with JAXP for both SAX and DOM parsers.
