
How to use PDL (Parameter Definition Language) in Ab Initio
26/04/2011

Author
Amitava Dey
amitava.dey@bt.com
About the Author
Has around 6 years of experience in design and development.
Currently working as a Billing Designer on the GenIUS (Billing and Payment Platform)
project for British Telecom Plc., UK.

TCS Internal
Introduction

This document gives an overview of how to improve Ab Initio graph performance, as
practised at BT to handle data for millions of customers within a short period of time.

What is Ab Initio
Ab Initio is an ETL tool capable of manipulating large volumes of data.

Ab Initio architecture:
Co>Operating System
The Co>Operating System is a program provided by Ab Initio that runs on
top of the native operating system and is the base for all Ab Initio processes.
It provides additional commands, known as air commands, and can be
installed on a variety of platforms, such as UNIX, HP-UX, Linux, IBM AIX
and Windows. The Ab Initio Co>Operating System provides the following
features:
- Managing and running Ab Initio graphs and controlling ETL processes
- Providing Ab Initio extensions to the operating system
- Monitoring and debugging ETL processes
- Managing metadata and interacting with the EME

Ab Initio GDE (Graphical Development Environment)

The GDE is a graphical application that developers use to design and run
Ab Initio graphs. An ETL process in Ab Initio is represented by an Ab Initio
graph, which is formed from components (from the standard component
library or custom-built), flows (data streams) and parameters. The GDE
provides:
- A user-friendly front end for designing Ab Initio ETL graphs
- The ability to run and debug Ab Initio jobs and trace execution logs
- Compilation of a graph into a UNIX shell script, which can be executed
on a machine without the GDE installed

Ab Initio EME
The Enterprise Meta>Environment (EME) is the Ab Initio repository and
environment for storing and managing metadata. It can store both business
and technical metadata. EME metadata can be accessed from the GDE, from
a web browser, or from the Co>Operating System command line (air
commands).

Parameter Definition Language

PDL (the Parameter Definition Language) is a simple but comprehensive
set of notations for expressing the values of parameters in components and
graphs.
PDL offers flexibility in parameter interpretation with minimal effort in
specifying which kind of interpretation you want: you simply specify PDL
interpretation, and every feature is available when you need it. Both $ and
${ } substitution are part of PDL, so you can use whichever one you need.

PDL-embedded DML

In addition, PDL allows you to use embedded DML within parameter
definitions to perform almost any operation or calculation you want. You
can code any DML statement within PDL's $[ ] construct. (The main
restrictions are that you cannot use global variables or lookup files in
PDL-embedded DML.) If you need more complicated DML in a parameter
definition, you can define helper functions, include them in the parameter
environment using the AB_DML_DEFS parameter definition, and then invoke
the functions in other parameter definitions. The following topics are
discussed in this section:

• Setting up to use PDL
• PDL parameter references and substitution
• Basic PDL rules
• DML inline computation in PDL
• PDL processing modes
• PDL examples

Setting up to use PDL

To use PDL, you must enable dynamic script generation:

1. In the GDE, select Run > Settings.
This brings up the Run Settings dialog.
2. Click the Script tab.
3. In the Script Generation drop-down list, select Dynamic (rather than
GDE 1.13 Compatible).
Note that once you have set Dynamic script generation, the following things
are true:

• If you change to a version 2.13 Co>Operating System run host, you must turn
off dynamic script generation before running the graph.

• If you change script generation back to GDE 1.13 Compatible, the GDE
checks that you do not have any PDL-interpreted parameters (except for
layout parameters or EME-dependent parameters). If it finds any, it reports
an error and does not change the setting.

Making PDL the default graph parameter interpretation

To set PDL as the default interpretation of graph parameters for a particular
graph:

1. In the GDE, select Edit > Properties.
This brings up the Graph Properties dialog.
2. Click the Description tab.
3. In the Interpreter drop-down list, select PDL.

Note that once you have chosen PDL as the default graph parameter
interpretation, you can no longer down-save the graph as a version 1.13 graph.

PDL parameter references and substitution

In general, PDL uses $name and ${name} substitution, where name is
assumed to be the name of a parameter that is being referred to; in resolving
the parameter, PDL substitutes the value of the referred-to parameter for the
$name or ${name} within the expression.

However, within quotes, substitution in PDL is turned off by default: the
quoting turns everything inside it into literals. For example, if the value of
$FOO is "xxx", the following business rule

"The value of \$FOO is $FOO"

in a transform with shell interpretation set will cause the following string to
be output:

The value of $FOO is xxx

But if you set PDL interpretation, the same rule will provoke an error, since
there is nothing for the \ to escape (the $ is already considered a literal
within the quotes).

You can change such behavior by specifying $-text at the start of the value.
This makes PDL use "text" mode (with DML quoting conventions used
everywhere) for everything that follows the $-text:

$-text "The value of \$FOO is $FOO" $-dml;

The $-dml directive at the end of the value restores the PDL interpretation's
default DML mode.

Basic PDL rules

The following rules describe what happens to the value of a parameter
whose Interpretation is set to PDL under various possible circumstances.

Literal interpretation in PDL
Unless you specify otherwise, all characters are interpreted literally.

Substitution in PDL
• If you don't want an identifier to be interpreted literally, prefix it
with $ ($name) to specify parameter substitution.
• ${identifier} also specifies parameter substitution; the { }s
delimit the identifier.
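To make the two reference forms concrete, here is a small Python sketch. It is a hypothetical toy model written for illustration, not Ab Initio's implementation: the helper substitute and the sample parameters FILE and RUN_DATE are invented, and the model ignores quoting, backslash escapes and $[ ] DML entirely.

```python
import re

# Hypothetical toy model: resolves $name and ${name} against a dict of
# parameter values. Quoting, backslash escapes and $[ ] DML are ignored.
_REF = re.compile(r"\$\{([A-Za-z_]\w*)\}|\$([A-Za-z_]\w*)")


def substitute(value: str, params: dict[str, str]) -> str:
    """Replace every $name or ${name} in value with params[name]."""
    def lookup(match: re.Match) -> str:
        name = match.group(1) or match.group(2)
        return params[name]  # an unknown name raises KeyError in this sketch
    return _REF.sub(lookup, value)


params = {"FILE": "customers", "RUN_DATE": "20110426"}
print(substitute("$FILE.dat", params))              # customers.dat
print(substitute("${FILE}_$RUN_DATE.dat", params))  # customers_20110426.dat
```

In this sketch, $FILE_bak would be read as a reference to a parameter named FILE_bak (and raise KeyError), because _ can be part of an identifier; writing ${FILE}_bak delimits the name and yields customers_bak. That is the job the { }s do.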

Escaping in PDL
• A single \, when it immediately precedes a $, escapes it.
• However, \ is interpreted as an escape only when it occurs
immediately before a $. In any other position, \ is regarded as a
normal character. The \ is not a global escape character.

If more than one contiguous \ occurs before a $, the following rules
apply:
• If there is an even number of contiguous backslashes before a $,
then the $ is interpreted to signify parameter substitution. One
backslash is output for every two contiguous backslashes found in
the input.
• If there is an odd number of contiguous backslashes, then the $ is
interpreted as a literal. One backslash is output for every two
contiguous backslashes found in the input (rounding down).
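The backslash-counting rules can be expressed as a short scanner. The Python below is a hypothetical model of the rules as stated above, not Ab Initio code; resolve and the parameter FOO are illustrative names, and the model does not apply the PDL rule that suppresses substitution inside quoted strings.

```python
import re

# Hypothetical toy model of the backslash rules; not Ab Initio's code.
# Only a run of backslashes that ends immediately before a $ is special.
_REF = re.compile(r"\$\{(\w+)\}|\$(\w+)")


def resolve(value: str, params: dict[str, str]) -> str:
    out: list[str] = []
    i = 0
    while i < len(value):
        if value[i] == "\\":
            j = i
            while j < len(value) and value[j] == "\\":
                j += 1                          # j is now just past the run
            run = j - i
            if j < len(value) and value[j] == "$":
                out.append("\\" * (run // 2))   # one \ per pair, rounded down
                if run % 2 == 1:
                    out.append("$")             # odd run: the $ is a literal
                    j += 1
                # even run: j stays on the $, so substitution happens next
            else:
                out.append(value[i:j])          # not before a $: ordinary text
            i = j
            continue
        ref = _REF.match(value, i)
        if ref:
            out.append(params[ref.group(1) or ref.group(2)])
            i = ref.end()
        else:
            out.append(value[i])
            i += 1
    return "".join(out)


p = {"FOO": "xxx"}
print(resolve(r"\$FOO", p))         # $FOO
print(resolve(r"\\$FOO", p))        # \xxx
print(resolve(r"\\\$FOO", p))       # \$FOO
print(resolve(r"C:\temp\$FOO", p))  # C:\temp$FOO
```

The last call shows that the \ before t is an ordinary character, while the \ before $ is consumed as an escape.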

In record format strings interpreted in PDL, you don't have to specially escape \s (just
as you don't have to escape them in DML).

For example, the following definition for the test_value field in an Output File's
record format:

will resolve like this:

Note also how the other fields' string('\n') definitions resolve as they were defined,
without any escaping.

String literals in PDL
• In PDL, substitution is turned off by default inside string literals.
• By default, DML lexical conventions are assumed.
• Strings in include statements in PDL behave just as they do in
regular DML.

Substitution and quoting in PDL
• $"identifier" and $'identifier' are interpreted as parameter
substitution plus quoting. This allows you to turn parameter values
into DML literals.
• Single and double quotes behave the same way.

Quoting for DML in PDL
• Quoting for DML automatically includes proper treatment of
embedded special characters.
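A rough Python sketch of $"identifier" handling follows. It is hypothetical: dml_quote and substitute_quoted are invented names, and it assumes that DML string literals use C-style escapes (a backslash before \ and ", and \n, \t, \r for control characters); the exact conventions PDL applies should be checked against the Ab Initio documentation. Both quote forms produce a double-quoted literal here, reflecting the rule that single and double quotes behave the same way.

```python
import re

# Hypothetical toy model of $"name" / $'name': substitute the value and
# quote it as a DML string literal. ASSUMPTION: C-style escapes for
# backslash, double quote and control characters.
_QREF = re.compile(r"""\$(["'])(\w+)\1""")
_ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\t": "\\t", "\r": "\\r"}


def dml_quote(text: str) -> str:
    """Wrap text in double quotes, escaping embedded special characters."""
    return '"' + "".join(_ESCAPES.get(c, c) for c in text) + '"'


def substitute_quoted(value: str, params: dict[str, str]) -> str:
    """Replace $"name" or $'name' (treated alike) with a quoted value."""
    # A callable replacement is inserted as-is, so backslashes survive.
    return _QREF.sub(lambda m: dml_quote(params[m.group(2)]), value)


params = {"MSG": 'say "hi"\n'}
print(substitute_quoted('out.msg :: $"MSG";', params))  # out.msg :: "say \"hi\"\n";
print(substitute_quoted("out.msg :: $'MSG';", params))  # out.msg :: "say \"hi\"\n";
```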

PDL processing modes

There are three PDL parsing modes, which you can turn on and off with the following
inline directives:

• $-dml

DML mode (the default). Specifies that DML quoting conventions are to be used for
$"name" processing, and when scanning for embedded strings. This has the effect of
turning off $name substitution within strings.

• $-text

"Text" mode. Specifies that $name substitution is not to be suppressed inside quoted
strings. DML quoting conventions are still used for $"name" constructs.

• $-literal

"Literal" mode. Everything following $-literal (except the immediately following
whitespace) is output verbatim.

PDL examples

The following are some examples of applying the basic PDL rules.

Example of literal interpretation

Here is an example of literal interpretation of a string in PDL:

Substitution examples

Prefixing $ to the identifier:

Surrounding the identifier with { } and prefixing it with $:

Examples of escaping in PDL

A single \ preceding a $, acting as an escape:

Here the even number of \s results in the $ substitution occurring, and half of the \s
being output:
