
Defining a Serializer

This chapter includes the following topics:


Serializer Overview
Creating the Project
Configuring the Serializer
Calling the Serializer Recursively
Points to Remember

Serializer Overview
In the preceding exercises, you have defined parsers, which convert documents in various formats to
XML. In this exercise, you will create a serializer that works in the opposite direction, converting XML
to another format.

Usually, it is easier to define a serializer than a parser because the input is a fully structured,
unambiguous XML document.

The serializer that you will define is very simple. It contains only four serialization anchors, which are
the opposite of the anchors that you use in parsing. Nonetheless, the serializer has some interesting
points:
The serializer is recursive. That is, it calls itself repeatedly to serialize the nested sections of an
XML document.
The output of the serializer is a worksheet that you can open in Excel.

You will define the serializer by editing the IntelliScript. It is also possible to generate a serializer
automatically by inverting the operation of a parser. For more information about defining serializers,
see the Data Transformation Studio User Guide.

Prerequisite
The output of this exercise is a *.csv (comma-separated values) file. You can view the output in
Notepad, but for the most meaningful display, you need Microsoft Excel.

You do not need Excel to run the serializer. It is recommended only to view the output.

Requirements Analysis
The input XML document is in the Exercise folder.

The document is an XML representation of a family tree. Notice the recursive structure. Each Person
element can contain a Children element, which contains additional Person elements.
<Person>
    <Name>Jake Dubrey</Name>
    <Age>84</Age>
    <Children>
        <Person>
            <Name>Mitchell Dubrey</Name>
            <Age>52</Age>
        </Person>
        <Person>
            <Name>Pamela Dubrey McAllister</Name>
            <Age>50</Age>
            <Children>
                <Person>
                    <Name>Arnold McAllister</Name>
                    <Age>26</Age>
                </Person>
            </Children>
        </Person>
    </Children>
</Person>
Output the names and ages of the Person elements as a *.csv file, which has the following structure:
Jake Dubrey,84
Mitchell Dubrey,52
Pamela Dubrey McAllister,50
Arnold McAllister,26
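
As a rough cross-check of the expected output, the following Python sketch lists every Person element in document order. It is not part of Data Transformation and only uses the Python standard library; the function name person_rows is hypothetical. Because each parent is visited before its children, the rows come out in the same order as the listing above.

```python
import xml.etree.ElementTree as ET

# Sample input, copied from the family tree shown above.
FAMILY_TREE = """\
<Person>
    <Name>Jake Dubrey</Name>
    <Age>84</Age>
    <Children>
        <Person>
            <Name>Mitchell Dubrey</Name>
            <Age>52</Age>
        </Person>
        <Person>
            <Name>Pamela Dubrey McAllister</Name>
            <Age>50</Age>
            <Children>
                <Person>
                    <Name>Arnold McAllister</Name>
                    <Age>26</Age>
                </Person>
            </Children>
        </Person>
    </Children>
</Person>
"""


def person_rows(xml_text):
    """Return one 'Name,Age' string per Person element, in document order."""
    root = ET.fromstring(xml_text)
    rows = []
    # iter() yields the root (if it matches) and then every matching
    # descendant, depth-first, so parents precede their children.
    for person in root.iter("Person"):
        name = person.findtext("Name")  # direct child <Name> only
        age = person.findtext("Age")    # direct child <Age> only
        rows.append(f"{name},{age}")
    return rows


if __name__ == "__main__":
    for row in person_rows(FAMILY_TREE):
        print(row)
```

This sketch flattens the tree in one pass. The serializer in this exercise reaches the same result differently, by calling itself for each nested Person element.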

Creating the Project


1. Click File > New > Project.
2. Under the Data Transformation node, select Serializer Project.
A wizard appears.
3. In the wizard, perform the following actions:
Name the project Tutorial_5.
Name the serializer FamilyTreeSerializer.
Name the script file Serializer_Script.
4. On the Schema page, browse to the schema FamilyTree.xsd, which is in the Exercise folder.
The Data Transformation Explorer displays the new project.
5. Double-click the Serializer_Script.tgp file to edit it.
6. Use the FamilyTree.xml input document to design the serializer.

Determining the Project Folder Location


Your project folder might be in a non-default location for either of the following reasons:
In the New Project wizard, you selected a non-default location.
Your copy of Eclipse is configured to use a non-default workspace location.
To determine the location of the project folder, select the project in the Data Transformation Explorer,
and click the File > Properties command. Alternatively, click anywhere in the IntelliScript editor, and
then click the Project > Properties command. The Info tab of the properties window displays the
location.

Project Properties
The properties window displays many useful options, such as the input and output encoding that are
used in your documents and the XML validation options. For more information about the project properties,
see the Data Transformation Studio User Guide.

Configuring the Serializer


Configure the serializer properties and add the serialization anchors.
1. Display the advanced properties of the serializer, and set output_file_extension = .csv, with a
leading period. When you run the serializer in the Studio, the serializer names the output file
output.csv.
2. Under the contains line of the serializer, insert a ContentSerializer serialization anchor and
configure the following properties:
data_holder = /Person/*s/Name
closing_str = ","
3. Define a second ContentSerializer and configure the following properties:
opening_str = ""
closing_str = "~013~010"
data_holder = /Person/*s/Age
Perform the following actions to enter the ASCII codes:
a. Select the closing_str property, and then press ENTER.
b. Press CTRL+A.
A small dot appears in the text box.
c. Enter 013.
d. Press CTRL+A again.
e. Enter 010.
f. Press ENTER to complete the property assignment.
This ContentSerializer writes the /Person/*s/Age data holder to the output. It appends a carriage
return (ASCII code 013) and a linefeed (ASCII 010) to the output.
4. Perform the following actions to run the serializer:
Set the serializer as the startup component.
Click Run > Run.
In the I/O Ports table, edit the first row and open the test input file, FamilyTree.xml.
Click Run.
5. Examine the Events view for errors.
6. In the Data Transformation Explorer view, under Results, double-click output.csv to view the
output.
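
The effect of the two ContentSerializer anchors configured above can be pictured with a small Python analogy. This is a sketch under stated assumptions, not a description of how the Data Transformation engine is implemented: the helper names content_serializer and serialize_top_level are hypothetical, and the closing string entered as ASCII codes 013 and 010 is written as "\r\n". The sketch assumes that, at this stage, only the top-level Person element is written.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample: a top-level Person with one nested child.
SAMPLE = """\
<Person>
    <Name>Jake Dubrey</Name>
    <Age>84</Age>
    <Children>
        <Person>
            <Name>Mitchell Dubrey</Name>
            <Age>52</Age>
        </Person>
    </Children>
</Person>
"""


def content_serializer(value, opening_str="", closing_str=""):
    """Hypothetical stand-in for one anchor: opening string, data, closing string."""
    return f"{opening_str}{value}{closing_str}"


def serialize_top_level(xml_text):
    """Write the Name and Age of the top-level Person only."""
    person = ET.fromstring(xml_text)
    # First anchor: Name followed by a comma.
    out = content_serializer(person.findtext("Name"), closing_str=",")
    # Second anchor: Age followed by carriage return (013) and linefeed (010).
    out += content_serializer(person.findtext("Age"), opening_str="", closing_str="\r\n")
    return out


if __name__ == "__main__":
    # repr() makes the trailing carriage return and linefeed visible.
    print(repr(serialize_top_level(SAMPLE)))
```

In this analogy the nested Person elements are ignored, because nothing yet tells the serializer to descend into the Children element.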

Calling the Serializer Recursively


Configure the serializer to move deeper in the XML tree and process the child Person elements.
1. Insert a RepeatingGroupSerializer.
You can use this serialization anchor to iterate over a repetitive structure in the input, and to
generate a repetitive structure in the output. In this case, the RepeatingGroupSerializer will iterate
over all the Person elements at a given level of nesting.
2. Within the RepeatingGroupSerializer, nest an EmbeddedSerializer. The purpose of this
serialization anchor is to call a secondary serializer.
3. Make the EmbeddedSerializer optional, and then assign the following properties:
serializer = FamilyTreeSerializer
schema_connections =
    Connect
        data_holder = /Person/*s/Children/*s/Person
        embedded_data_holder = /Person
The properties have the following meanings:
The assignment serializer = FamilyTreeSerializer means that the secondary serializer is the
same as the main serializer. In other words, the serializer calls itself recursively.
The schema_connections property means that the secondary serializer processes
/Person/*s/Children/*s/Person as though it were a top-level /Person element. This is what lets
the serializer move down through the generations of the family tree.
The optional property means that the secondary serializer does not cause the main serializer
to fail when it runs out of data.
4. Run the serializer.
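
The recursive configuration described in these steps can be sketched in Python as a function that calls itself. This is an analogy only, assuming the standard xml.etree.ElementTree module; the function names are hypothetical and the comments map each part to the anchor it loosely imitates.

```python
import xml.etree.ElementTree as ET

FAMILY_TREE = """\
<Person>
    <Name>Jake Dubrey</Name>
    <Age>84</Age>
    <Children>
        <Person>
            <Name>Mitchell Dubrey</Name>
            <Age>52</Age>
        </Person>
        <Person>
            <Name>Pamela Dubrey McAllister</Name>
            <Age>50</Age>
            <Children>
                <Person>
                    <Name>Arnold McAllister</Name>
                    <Age>26</Age>
                </Person>
            </Children>
        </Person>
    </Children>
</Person>
"""


def family_tree_serializer(person):
    """Serialize one Person element, then recurse into its child Person elements."""
    # ContentSerializer analogues: Name + ",", then Age + CR LF.
    out = person.findtext("Name") + ","
    out += person.findtext("Age") + "\r\n"
    # RepeatingGroupSerializer analogue: loop over Children/Person.
    # findall() returns an empty list when there are no children, so the
    # loop simply ends, much as the optional property prevents a failure.
    for child in person.findall("Children/Person"):
        # EmbeddedSerializer analogue: the child is handed to the same
        # function as though it were a top-level Person element.
        out += family_tree_serializer(child)
    return out


def serialize(xml_text):
    """Parse the document and serialize it starting at the root Person."""
    return family_tree_serializer(ET.fromstring(xml_text))


if __name__ == "__main__":
    print(serialize(FAMILY_TREE), end="")
```

Passing each child to the same function mirrors the schema connection, which presents /Person/*s/Children/*s/Person to the secondary serializer as /Person.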

Defining Multiple Components in a Project


For simplicity, each project in the exercises throughout this book contains a single parser,
serializer, or mapper.
It is quite possible for a single project to contain multiple components, for example:
Multiple parsers, serializers, or mappers
Multiple script (TGP) files
Multiple XML schemas
The following paragraphs are a brief summary of some possible project configurations involving
multiple components. For more information, see the Data Transformation Studio User Guide.

Multiple Transformation Components


To define multiple transformation components such as parsers, serializers, or mappers, insert them at
the global level of the script.
To run one of the components, set it as the startup component and use the commands on the Run
menu.
The startup component can call secondary parsers, serializers, or mappers to process portions of a
document.

Multiple Script Files


You can use multiple script files to organize your work.
To create a script file:
1. Right-click the Scripts node in the Data Transformation Explorer, and click New > Script. To add a
script that you created in another project, click Add File.
2. To open a script file for editing, double-click the file in the Data Transformation Explorer.

Multiple XML Schemas


To add multiple schemas to a project, right-click the XSD node in the Data Transformation Explorer
and click Add File. To create an empty schema that you can edit in any editor, click New > XSD.

Points to Remember
A serializer is the opposite of a parser: it converts XML to other formats. You can design a serializer
that outputs to any data format, for example, a CSV file that you can open in Microsoft Excel.
You can create a serializer either by generating it from an existing parser or by editing the IntelliScript.

A serializer contains serialization anchors that are analogous to the anchors that you use in a parser,
but work in the opposite direction.

In an IntelliScript editor, you can hide or display the panes by choosing the commands on the
IntelliScript menu or on the toolbar.

We recommend that you store all files associated with a project in the project folder, located in your
Eclipse workspace. You can determine the folder location by clicking File > Properties or Project >
Properties.

A single project can contain multiple transformation components such as parsers, serializers, and
mappers. The startup component can call secondary serializers, parsers, or mappers, which are
defined in the same project. Secondary components can process portions of a document. Recursion,
a component calling itself, is supported.
