
Document Object Model (DOM). DOM is an in-memory tree representation of the
structure of an XML document.
Simple API for XML (SAX). SAX is a standard for event-based XML parsing.
Java API for XML Processing (JAXP). JAXP is a standard interface for processing XML
with Java applications. It supports the DOM and SAX standards.
Document Type Definition (DTD). An XML DTD defines the legal structure of an XML
document.
XML Schema. Like a DTD, an XML schema defines the legal structure of an XML
document.
XML Namespaces. Namespaces are a mechanism for differentiating element and attribute
names.
Binary XML. Both scalable and nonscalable DOMs can save XML documents in this
format.
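
To make the namespace entry above concrete, here is a minimal sketch using the standard JAXP DOM parser. The document, the `b` and `p` prefixes, and the `urn:example:...` URIs are invented for illustration; the point is only that two elements with the same local name can be told apart by namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceSketch {
    // Two <title> elements that differ only by namespace (the URIs are made up).
    static final String XML =
        "<doc xmlns:b=\"urn:example:book\" xmlns:p=\"urn:example:person\">"
        + "<b:title>XML Basics</b:title>"
        + "<p:title>Dr.</p:title>"
        + "</doc>";

    // Returns the text of the first <title> element in the given namespace.
    static String titleIn(String namespaceUri) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // JAXP parsers are not namespace-aware by default
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        return doc.getElementsByTagNameNS(namespaceUri, "title")
            .item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titleIn("urn:example:book"));   // prints XML Basics
        System.out.println(titleIn("urn:example:person")); // prints Dr.
    }
}
```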

XML Parsing in Java


XMLParser is the abstract base class for the XML parser for Java. An instantiated parser invokes the
parse() method to read an XML document.
XMLDOMImplementation factory methods provide another way to parse Binary XML and create a
scalable DOM.
Figure 4-1 illustrates the basic parsing process, using XMLParser. The diagram does not apply to
XMLDOMImplementation().
Figure 4-1 The XML Parser Process


The following APIs provide a Java application with access to a parsed XML document:
DOM API, which parses XML documents and builds a tree representation of the documents
in memory. Use either a DOMParser object to parse with DOM or the
XMLDOMImplementation interface factory methods to create a pluggable, scalable DOM.
SAX API, which processes an XML document as a stream of events, which means that a
program cannot access random locations in a document. Use a SAXParser object to parse
with SAX.
JAXP, which is a Java-specific API that supports DOM, SAX, and XSL. Use a
DocumentBuilder or SAXParser object to parse with JAXP.

The sample XML document in Example 4-1 helps illustrate the differences among DOM, SAX, and JAXP.
Example 4-1 Sample XML Document
<?xml version="1.0"?>
<EMPLIST>
<EMP>
<ENAME>MARY</ENAME>
</EMP>
<EMP>
<ENAME>SCOTT</ENAME>
</EMP>
</EMPLIST>
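
As a rough illustration of the JAXP route listed above, the following sketch parses Example 4-1 with a `DocumentBuilder` and collects the `ENAME` values. It uses only the standard JDK classes, not the Oracle-specific `DOMParser`; the class and method names are hypothetical, and the document is inlined as a string so that no file is needed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class EmpListDomSketch {
    // Example 4-1, inlined so the sketch is self-contained.
    static final String XML =
        "<?xml version=\"1.0\"?>"
        + "<EMPLIST><EMP><ENAME>MARY</ENAME></EMP>"
        + "<EMP><ENAME>SCOTT</ENAME></EMP></EMPLIST>";

    // Builds the in-memory tree, then reads every ENAME element from it.
    static List<String> employeeNames(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList names = doc.getElementsByTagName("ENAME");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < names.getLength(); i++) {
            result.add(names.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(employeeNames(XML)); // prints [MARY, SCOTT]
    }
}
```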

DOM in XML Parsing


DOM builds an in-memory tree representation of the XML document. For example, the DOM API receives
the document described in Example 4-1 and creates an in-memory tree as shown in Figure 4-2. DOM
provides classes and methods to navigate and process the tree.
In general, the DOM API provides the following advantages:
DOM API is easier to use than SAX because it provides a familiar tree structure of objects.
Structural manipulations of the XML tree, such as re-ordering elements, adding to and
deleting elements and attributes, and renaming elements, can be performed.
Interactive applications can store the object model in memory, enabling users to access
and manipulate it.
DOM as a standard does not support XPath. However, most XPath implementations use
DOM. The Oracle XDK includes DOM API extensions to support XPath.
A pluggable, scalable DOM can be created that considerably improves scalability and
efficiency.
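
The structural manipulations mentioned in the list above can be sketched with the standard `org.w3c.dom` interfaces. This is an illustration only, not XDK-specific code: the added employee `ADAMS` and the `STATUS` attribute are invented, and the document is Example 4-1 without whitespace.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomEditSketch {
    static final String XML =
        "<EMPLIST><EMP><ENAME>MARY</ENAME></EMP>"
        + "<EMP><ENAME>SCOTT</ENAME></EMP></EMPLIST>";

    // Parses the document and applies add, re-order, attribute, and delete edits.
    static Document edit(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        NodeList emps = root.getElementsByTagName("EMP");
        Element mary = (Element) emps.item(0);
        Element scott = (Element) emps.item(1);

        // Add: build a new EMP subtree and append it to the root.
        Element ename = doc.createElement("ENAME");
        ename.setTextContent("ADAMS"); // hypothetical employee
        Element adams = doc.createElement("EMP");
        adams.appendChild(ename);
        root.appendChild(adams);

        // Re-order: inserting an existing node moves it.
        root.insertBefore(scott, mary);

        // Add an attribute (name and value are made up).
        scott.setAttribute("STATUS", "ACTIVE");

        // Delete: remove MARY's EMP element from the tree.
        root.removeChild(mary);
        return doc;
    }

    // Collects the ENAME values in document order.
    static List<String> names(Document doc) {
        NodeList list = doc.getElementsByTagName("ENAME");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < list.getLength(); i++) {
            result.add(list.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(names(edit(XML))); // prints [SCOTT, ADAMS]
    }
}
```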

DOM Creation
In Java XDK, there are three ways to create a DOM:
Parse a document using DOMParser. This has been the traditional XDK approach.
Create a scalable DOM using XMLDOMImplementation factory methods.
Use an XMLDocument constructor. This is not a common solution in XDK.

Scalable DOM
With Oracle 11g Release 1 (11.1), XDK provides scalable, pluggable support for DOM. This relieves
problems of memory inefficiency, limited scalability, and lack of control over the DOM configuration.
For the scalable DOM, the configuration and creation are mainly supported using the
XMLDOMImplementation class.
These are important aspects of scalable DOM:
Plug-in Data allows external XML representation to be directly used by Scalable DOM
without replicating XML in internal representation.
Scalable DOM is created on top of plug-in XML data through the Reader and
InfosetWriter abstract interfaces. XML data can be in different forms, such as Binary
XML, XMLType, and third-party DOM.
Transient nodes. DOM nodes are created lazily and may be freed if not in use.
Binary XML
The scalable DOM can use binary XML as both input and output format. Scalable DOM can
interact with the data in two ways:
Through the abstract InfosetReader and InfosetWriter interfaces.
Users can (1) use the BinXML implementation of InfosetReader and
InfosetWriter to read and write BinXML data, and (2) use other
implementations supplied by the user to read and write in other forms of XML
infoset.
Through an implementation of the InfosetReader and InfosetWriter
adaptor for BinXMLStream.

SAX in the XML Parser


Unlike DOM, SAX is event-based, so it does not build in-memory tree representations of input documents.
SAX processes the input document element by element and can report events and significant data to
callback methods in the application. The XML document in Example 4-1 is parsed as a series of linear
events as shown in Figure 4-2.
In general, the SAX API provides the following advantages:
It is useful for search operations and other programs that do not need to manipulate an
XML tree.
It does not consume significant memory resources.
It is faster than DOM when retrieving XML documents from a database.
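
A search of the kind described above might look like the following sketch, which uses the standard JAXP `SAXParser` rather than the Oracle-specific one. It checks whether Example 4-1 contains a given `ENAME` without ever building a tree; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxSearchSketch {
    static final String XML =
        "<EMPLIST><EMP><ENAME>MARY</ENAME></EMP>"
        + "<EMP><ENAME>SCOTT</ENAME></EMP></EMPLIST>";

    // Returns true if some ENAME element has exactly the wanted text.
    static boolean containsEmployee(String xml, String wanted) throws Exception {
        final boolean[] found = {false};
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                text.setLength(0); // start collecting text for this element
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // text may arrive in pieces
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("ENAME".equals(qName) && wanted.equals(text.toString().trim())) {
                    found[0] = true;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return found[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(containsEmployee(XML, "SCOTT")); // prints true
        System.out.println(containsEmployee(XML, "KING"));  // prints false
    }
}
```

Only the text of the element currently being read is held in memory, which is why this style suits large documents.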

Figure 4-2 Comparing DOM (Tree-Based) and SAX (Event-Based) APIs


JAXP in the XML Parser


The JAXP API enables you to plug in an implementation of the SAX or DOM parser. The SAX and DOM
APIs provided in the Oracle XDK are examples of vendor-specific implementations supported by JAXP.
In general, the advantage of JAXP is that you can use it to write interoperable applications. If an
application uses features available through JAXP, then it can very easily switch the implementation.
The main disadvantage of JAXP is that it runs more slowly than vendor-specific APIs. In addition, several
features are available through Oracle-specific APIs that are not available through JAXP APIs. Only some
of the Oracle-specific features are available through the extension mechanism provided in JAXP. If an
application uses these extensions, however, then the flexibility of switching implementation is lost.
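
The plug-in mechanism can be observed directly: application code asks a JAXP factory for a parser and never names the implementation class. The sketch below simply reports which implementation the factories resolved to. The class name `JaxpLookupSketch` is hypothetical, and the result depends on the runtime, so no output is shown.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

class JaxpLookupSketch {
    // Name of the DOM factory implementation that JAXP selected.
    static String domFactoryClass() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    // Name of the SAX factory implementation that JAXP selected.
    static String saxFactoryClass() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // The choice can be overridden without code changes, for example by
        // setting the system property javax.xml.parsers.DocumentBuilderFactory
        // to the class name of another vendor's factory.
        System.out.println("DOM: " + domFactoryClass());
        System.out.println("SAX: " + saxFactoryClass());
    }
}
```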

The sample XML considered in the examples is:


<employees>
  <employee id="111">
    <firstName>Rakesh</firstName>
    <lastName>Mishra</lastName>
    <location>Bangalore</location>
  </employee>
  <employee id="112">
    <firstName>John</firstName>
    <lastName>Davis</lastName>
    <location>Chennai</location>
  </employee>
  <employee id="113">
    <firstName>Rajesh</firstName>
    <lastName>Sharma</lastName>
    <location>Pune</location>
  </employee>
</employees>

And the object into which the XML content is to be extracted is defined as below:
class Employee{
    String id;
    String firstName;
    String lastName;
    String location;

    @Override
    public String toString() {
        return firstName+" "+lastName+"("+id+")"+location;
    }
}

There are 3 main parsers for which I have given sample code:
DOM Parser
SAX Parser
StAX Parser

Using DOM Parser

I am using the DOM parser implementation that ships with the JDK; the example uses JDK 7. The DOM
parser loads the complete XML content into a tree structure, and we iterate through the Node and
NodeList objects to get the content of the XML. The code for XML parsing using the DOM parser is
given below.
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DOMParserDemo {

    public static void main(String[] args) throws Exception {
        //Get the DOM Builder Factory
        DocumentBuilderFactory factory =
            DocumentBuilderFactory.newInstance();

        //Get the DOM Builder
        DocumentBuilder builder = factory.newDocumentBuilder();

        //Load and Parse the XML document
        //document contains the complete XML as a Tree.
        Document document =
            builder.parse(
                ClassLoader.getSystemResourceAsStream("xml/employee.xml"));

        List<Employee> empList = new ArrayList<>();

        //Iterating through the nodes and extracting the data.
        NodeList nodeList = document.getDocumentElement().getChildNodes();

        for (int i = 0; i < nodeList.getLength(); i++) {

            Node node = nodeList.item(i);
            //Whitespace between tags shows up as text nodes, so only
            //Element nodes are <employee> tags.
            if (node instanceof Element) {
                Employee emp = new Employee();
                emp.id = node.getAttributes().
                    getNamedItem("id").getNodeValue();

                NodeList childNodes = node.getChildNodes();
                for (int j = 0; j < childNodes.getLength(); j++) {
                    Node cNode = childNodes.item(j);

                    //Identifying the child tag of employee encountered.
                    if (cNode instanceof Element) {
                        //Reading the text from the element itself also
                        //works for an empty tag, which has no child node.
                        String content = cNode.getTextContent().trim();
                        switch (cNode.getNodeName()) {
                            case "firstName":
                                emp.firstName = content;
                                break;
                            case "lastName":
                                emp.lastName = content;
                                break;
                            case "location":
                                emp.location = content;
                                break;
                        }
                    }
                }
                empList.add(emp);
            }

        }

        //Printing the Employee list populated.
        for (Employee emp : empList) {
            System.out.println(emp);
        }

    }
}

class Employee{
    String id;
    String firstName;
    String lastName;
    String location;

    @Override
    public String toString() {
        return firstName+" "+lastName+"("+id+")"+location;
    }
}
The output for the above will be:
Rakesh Mishra(111)Bangalore
John Davis(112)Chennai
Rajesh Sharma(113)Pune
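
When only a single value is needed from the tree, the nested loops can be replaced by an XPath query over the same DOM. The following is a minimal sketch using the JDK's `javax.xml.xpath` package; it inlines two of the sample employees as a string instead of loading `xml/employee.xml`, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathLookupSketch {
    static final String XML =
        "<employees>"
        + "<employee id=\"111\"><firstName>Rakesh</firstName>"
        + "<lastName>Mishra</lastName><location>Bangalore</location></employee>"
        + "<employee id=\"112\"><firstName>John</firstName>"
        + "<lastName>Davis</lastName><location>Chennai</location></employee>"
        + "</employees>";

    // Returns the location of the employee with the given id, or "" if none matches.
    static String locationOf(String id) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The id is concatenated into the expression, which is acceptable
        // only for trusted input such as the literals used here.
        String expression = "/employees/employee[@id='" + id + "']/location";
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(locationOf("112")); // prints Chennai
    }
}
```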

Using SAX Parser

The SAX parser differs from the DOM parser in that it doesn't load the complete XML into memory.
Instead, it reads the XML sequentially and triggers different events as and when it encounters
different constructs such as an opening tag, a closing tag, character data and so on. This is the reason
why the SAX parser is called an event-based parser.
Along with the XML source file, we also register a handler which extends the DefaultHandler class. The
DefaultHandler class provides different callbacks, out of which we are interested in:
startElement(): called when the start of a tag is encountered.
endElement(): called when the end of a tag is encountered.
characters(): called when text data is encountered; one run of text may arrive in several calls.
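
To see the order in which these callbacks fire, the sketch below records every event for a one-employee document. The class name, the `trace` helper, and the `start:`/`end:`/`text:` labels are made up for illustration. Because text can be delivered in more than one `characters()` call, no exact trace is shown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventTraceSketch {
    // Parses the XML and returns the callbacks in the order they were invoked.
    static List<String> trace(String xml) throws Exception {
        final List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                events.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) { // ignore whitespace-only chunks
                    events.add("text:" + text);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<employee id=\"111\"><firstName>Rakesh</firstName></employee>";
        for (String event : trace(xml)) {
            System.out.println(event);
        }
    }
}
```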

The code for parsing the XML using SAX Parser is given below:
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SAXParserDemo {

    public static void main(String[] args) throws Exception {
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        SAXParser parser = parserFactory.newSAXParser();
        SAXHandler handler = new SAXHandler();
        parser.parse(ClassLoader.getSystemResourceAsStream("xml/employee.xml"),
                     handler);

        //Printing the list of employees obtained from XML
        for (Employee emp : handler.empList) {
            System.out.println(emp);
        }
    }
}

/**
 * The Handler for SAX Events.
 */
class SAXHandler extends DefaultHandler {

    List<Employee> empList = new ArrayList<>();
    Employee emp = null;
    //Text is buffered because characters() may be called
    //more than once for the content of a single tag.
    StringBuilder content = new StringBuilder();

    //Triggered when the start of tag is found.
    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes)
                             throws SAXException {

        //Discard any text collected before this tag.
        content.setLength(0);

        switch (qName) {
            //Create a new Employee object when the start tag is found
            case "employee":
                emp = new Employee();
                emp.id = attributes.getValue("id");
                break;
        }
    }

    @Override
    public void endElement(String uri, String localName,
                           String qName) throws SAXException {
        String text = content.toString().trim();
        switch (qName) {
            //Add the employee to list once end tag is found
            case "employee":
                empList.add(emp);
                break;
            //For all other end tags the employee has to be updated.
            case "firstName":
                emp.firstName = text;
                break;
            case "lastName":
                emp.lastName = text;
                break;
            case "location":
                emp.location = text;
                break;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length)
                           throws SAXException {
        content.append(ch, start, length);
    }

}

class Employee {

    String id;
    String firstName;
    String lastName;
    String location;

    @Override
    public String toString() {
        return firstName + " " + lastName + "(" + id + ")" + location;
    }
}
The output for the above would be:
Rakesh Mishra(111)Bangalore
John Davis(112)Chennai
Rajesh Sharma(113)Pune

Using StAX Parser

StAX stands for Streaming API for XML. The StAX parser differs from DOM in the same way the SAX
parser does, and it also differs from the SAX parser in a subtle way:
The SAX parser pushes the data to the application, but the StAX parser lets the application pull
the required data from the XML.
The StAX parser maintains a cursor at the current position in the document and allows the
application to extract the content available at the cursor, whereas the SAX parser issues events
as and when certain data is encountered.
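
The pull model means the application decides when to advance and when to stop. As a small sketch, the method below reads only as far as the first `firstName` element and then returns, leaving the rest of the document unread. The class and method names are hypothetical, and the XML is an inlined fragment of the sample document.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxPullSketch {
    static final String XML =
        "<employees>"
        + "<employee id=\"111\"><firstName>Rakesh</firstName></employee>"
        + "<employee id=\"112\"><firstName>John</firstName></employee>"
        + "</employees>";

    // Returns the text of the first <firstName> element, or null if there is none.
    static String firstFirstName(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                // The application asks for the next event; nothing is pushed to it.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "firstName".equals(reader.getLocalName())) {
                    return reader.getElementText(); // stop here, skip the rest
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(firstFirstName(XML)); // prints Rakesh
    }
}
```

With SAX, stopping early typically means throwing an exception from a callback, because the parser drives the loop rather than the application.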

XMLInputFactory and XMLStreamReader are the two types used to load an XML file. As we read
through the XML file using XMLStreamReader, each call to next() returns an event as an integer
value, which is then compared with the constants in XMLStreamConstants. The code below shows
how to parse XML using the StAX parser:
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxParserDemo {
    public static void main(String[] args) throws XMLStreamException {
        //The reader starts at START_DOCUMENT and next() never returns
        //that event, so the list is created up front.
        List<Employee> empList = new ArrayList<>();
        Employee currEmp = null;
        String tagContent = null;
        XMLInputFactory factory = XMLInputFactory.newInstance();
        //Deliver each run of text as a single CHARACTERS event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader =
            factory.createXMLStreamReader(
                ClassLoader.getSystemResourceAsStream("xml/employee.xml"));

        while (reader.hasNext()) {
            int event = reader.next();

            switch (event) {
                case XMLStreamConstants.START_ELEMENT:
                    if ("employee".equals(reader.getLocalName())) {
                        currEmp = new Employee();
                        //Look the attribute up by name rather than by position.
                        currEmp.id = reader.getAttributeValue(null, "id");
                    }
                    break;

                case XMLStreamConstants.CHARACTERS:
                    tagContent = reader.getText().trim();
                    break;

                case XMLStreamConstants.END_ELEMENT:
                    switch (reader.getLocalName()) {
                        case "employee":
                            empList.add(currEmp);
                            break;
                        case "firstName":
                            currEmp.firstName = tagContent;
                            break;
                        case "lastName":
                            currEmp.lastName = tagContent;
                            break;
                        case "location":
                            currEmp.location = tagContent;
                            break;
                    }
                    break;
            }

        }
        reader.close();

        //Print the employee list populated from XML
        for (Employee emp : empList) {
            System.out.println(emp);
        }

    }
}

class Employee{
    String id;
    String firstName;
    String lastName;
    String location;

    @Override
    public String toString(){
        return firstName+" "+lastName+"("+id+") "+location;
    }
}
The output for the above is:

Rakesh Mishra(111) Bangalore
John Davis(112) Chennai
Rajesh Sharma(113) Pune
