
Regular Expression Validation EP

Salim Achouche
Principal Software Engineer

Ashlee Bailey
Senior Technical Writer

Overview
If you are running PowerCenter 7.0, you can include Regular Expression Validation External
Procedure (EP) transformations in a mapping to validate patterns of data in String format. This
lets you validate data patterns, such as IDs, telephone numbers, postal codes, and state names.
You can also include the name of a Regular Expression Validation EP as part of an expression in
an Expression transformation in a mapping. This is useful when you want to validate more than
one data pattern in a single mapping.
You validate data patterns in the Regular Expression Validation EP transformation using Perl
Compatible Regular Expressions (PCRE) in the EP transformation. PCRE is a powerful tool for
matching data in String format that follows a pattern.

Installing the Regular Expression Validation EP Transformation


Before you can use the Regular Expression Validation EP transformation, you must download the
RegExValidation70.zip file. The ZIP file includes the following components:
- sample1.xml and sample2.xml files. These files contain a Regular Expression
  Validation EP transformation and two mappings that use the transformation.
- Documentation
- PowerCenter Server files for the Regular Expression Validation EP transformation
Once you download the ZIP file, you configure the Regular Expression Validation EP files and
import the transformation into your PowerCenter repository.
Note: There is no need to have a Perl installation when using the Regular Expression Validation
EP transformation, since it has been compiled with the open source PCRE RegEx library.
To install and configure the Regular Expression Validation EP transformation:
1. Download the RegExValidation70.zip file to a location on your local area network.
2. Unzip the RegExValidation70.zip file to a temporary directory.
The unzip process extracts a folder called RegExValidation70.
3. Open the RegExValidation70 folder.
4. Copy the PowerCenter Server library files for your operating system to the
PowerCenter Server\bin directory, as listed in the following table:

Operating System    Filename
Windows             pmdpregexpr.dll
                    pmpcre.dll
                    pcre.dll
                    pmdpmetadata.dll
Solaris             libpmdpregexpr.so.1
                    libpmpcre.so
                    libpcre.so.0
                    libpmdpmetadata.so
Linux               libpmdpregexpr.so.1
                    libpmpcre.so
                    libpcre.so.0
                    libpmdpmetadata.so
HP-UX               libpmdpregexpr.sl
                    libpmpcre.sl
                    libpcre.sl
                    libpmdpmetadata.so
AIX                 libpmdpregexpr.a
                    libpmpcre.a
                    libpcre.a
                    libpmdpmetadata.a

Note: You must have execute permission to run the PowerCenter Server libraries.
5. From the PowerCenter Designer, import the sample1.xml and sample2.xml files.

Working with the Regular Expression Validation EP Transformation

Include one Regular Expression Validation EP transformation in a mapping for each expression
you want to use to validate source data. For example, if you want to use a regular expression to
validate telephone numbers and employee IDs, you must add a separate Regular Expression
Validation EP transformation for each regular expression to your mapping.
The Regular Expression Validation EP transformation has one input and one output port.
COLUMN_VALUE is the input port. IS_VALID is the output port. The ports are predefined and
cannot be modified.
Table 1 describes the ports in the Regular Expression Validation EP transformation:
Table 1: Ports in a Regular Expression Validation EP Transformation

Port Name Description

COLUMN_VALUE Represents the input string that should be validated.

IS_VALID Set to 1 if input string is valid or null. Otherwise, set to 0.


Figure 1 shows the Ports tab of the Regular Expression Validation EP transformation with the
COLUMN_VALUE and IS_VALID ports:
Figure 1: Ports Tab of the Regular Expression Validation EP Transformation

You cannot use pass-through ports with the Regular Expression Validation EP transformation.
Tip: Use port concatenation as a workaround.
Figure 2 shows the properties tab of the Regular Expression Validation EP transformation. You
do not have to configure any properties when you include the transformation in your mapping.
Figure 2: Properties Tab of the Regular Expression Validation EP Transformation

Once you include the Regular Expression Validation EP transformation in a mapping, you can
define the regular expression on the Initialization Properties tab of the transformation at the
mapping level.
The data you want to validate must be in String format.
Tip: To validate data that is not in String format, you can use an Expression transformation in the
mapping to convert the data to String format.
Figure 3 shows the Initialization Properties tab of the Regular Expression Validation EP
transformation with a regular expression:
Figure 3: Initialization Properties Tab of the Regular Expression Validation EP Transformation

The Value attribute represents the regular expression that the PowerCenter Server uses to
validate the input port COLUMN_VALUE.
Working with Perl Compatible Regular Expressions in a Regular Expression Validation EP
When you use PCRE in a Regular Expression Validation EP, the following information applies:
- Regular expressions are case insensitive. For example, [a-z] and [A-Z] are equivalent.
- Spaces at the beginning and end of an input string are ignored.
- Regular expressions are automatically anchored. This means that matching starts at the
  beginning of the input string and ends at the end of the input string. For example, \d+ is
  equivalent to ^\d+$. ^ is the PCRE syntax for marking the beginning of a string. $ is the PCRE
  syntax for marking the end of a string.
- To turn off the default anchored behavior, you can add .* at the beginning and end of the
  regular expression. For example, .*\d+.* matches any input string that contains at least
  one digit.
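The rules above can be tried out away from PowerCenter. The following is a minimal sketch in Python, assuming that Python's built-in re module behaves closely enough to PCRE for the simple patterns in this document. It is a different engine and is not what the EP uses. The function name is_valid is hypothetical. It mirrors the IS_VALID port by returning 1 for a null or matching input and 0 otherwise.

```python
import re
from typing import Optional


def is_valid(pattern: str, column_value: Optional[str]) -> int:
    """Approximate the documented IS_VALID behavior (hypothetical helper)."""
    # A null input is treated as valid, as described for the IS_VALID port.
    if column_value is None:
        return 1
    # Spaces at the beginning and end of the input string are ignored.
    trimmed = column_value.strip(" ")
    # fullmatch emulates automatic anchoring (\d+ behaves like ^\d+$);
    # IGNORECASE emulates case-insensitive matching.
    return 1 if re.fullmatch(pattern, trimmed, re.IGNORECASE) else 0


if __name__ == "__main__":
    print(is_valid(r"\d+", " 12345 "))    # surrounding spaces are ignored
    print(is_valid(r"[a-z]{2}", "CA"))    # letters match in either case
    print(is_valid(r"\d+", "12a45"))      # anchored, so a partial match fails
    print(is_valid(r".*\d+.*", "12a45"))  # .* on both sides relaxes anchoring
    print(is_valid(r"\d+", None))         # null input counts as valid
```

Using fullmatch instead of adding ^ and $ to the pattern keeps the pattern text identical to what you would enter in the Value attribute.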
Table 2 provides guidelines for entering a regular expression in the Regular Expression
Validation EP transformation:
Table 2: PCRE Syntax

Syntax Description

. (a period) Matches any one character.

[a-z] Matches one instance of a letter. For example, [a-z][a-z] can match ab or CA.

\d Matches one instance of any digit from 0-9.


() Groups an expression. For example, the parentheses in (\d\d-\d\d) group
the expression \d\d-\d\d, which finds any two digits followed by a hyphen
and any two digits, as in 12-34.

{} Matches the preceding character or group exactly the specified number of
times. For example, \d{3} matches any three digits, such as 650 or 510. Or,
[a-z]{2} matches any two letters, such as CA or NY.

? Matches the preceding character or group of characters zero or one time. For
example, \d{3}(-\d{4})? matches any three digits, which can be followed by a
hyphen and any four digits.

* Matches zero or more instances of the character or group that precedes the
asterisk. For example, 0* matches zero or more 0 characters, and .*0 matches
any value that ends in 0.

For example, to create a regular expression for U.S. zip codes, you can enter the following:
\d{5}(-\d{4})?
This expression lets you validate a column that contains 5-digit U.S. zip codes, such as 93930, as
well as 9-digit zip codes, such as 93930-5407.
In this example, \d{5} refers to any five digits, such as 93930. The parentheses surrounding
-\d{4} group this segment of the expression. The hyphen represents the hyphen of a 9-digit zip
code, as in 93930-5407. \d{4} refers to any four digits, such as 5407. The question mark
indicates that the group containing the hyphen and last four digits can appear zero or one time.
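To check the zip code expression against sample values before entering it in the transformation, a short Python sketch such as the following may help. It assumes Python's re module as a stand-in for PCRE, and the function name is_us_zip is hypothetical.

```python
import re

# Pattern taken from the text: five digits, optionally followed by
# a hyphen and four more digits.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def is_us_zip(value: str) -> bool:
    """Return True when the whole value is a 5-digit or 9-digit zip code."""
    # fullmatch stands in for the EP's automatic anchoring, and
    # strip(" ") for its handling of leading and trailing spaces.
    return ZIP_PATTERN.fullmatch(value.strip(" ")) is not None


if __name__ == "__main__":
    for sample in ["93930", "93930-5407", "9393", "93930-54", "93930 5407"]:
        print(sample, is_us_zip(sample))
```

Only the first two samples pass. The others are too short, have an incomplete four-digit extension, or use a space where the expression requires a hyphen.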
Tips for Converting COBOL Syntax to PCRE format
If you are familiar with COBOL syntax, you can use the following information to help you write
regular expressions.
Table 3 shows examples of COBOL syntax and their PCRE equivalents:
Table 3: COBOL Syntax and PCRE Syntax Compared

COBOL Syntax PCRE Syntax Description

9 \d Matches one instance of any digit from 0-9.

9999 \d\d\d\d or \d{4} Matches any four digits from 0-9, as in 1234 or 5936.

x [a-z] Matches one instance of a letter.

9xx9 \d[a-z][a-z]\d Matches any digit followed by two letters and another
digit, as in 1ab2.
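The symbol-by-symbol correspondence in Table 3 can be sketched as a small translation routine. The Python code below is illustrative only: the function name cobol_to_regex is hypothetical, it handles only the 9 and x symbols shown in the table, and it uses Python's re module as a stand-in for PCRE.

```python
import re

# Mapping from Table 3: 9 is a digit, x is a letter.
COBOL_TO_PCRE = {"9": r"\d", "x": "[a-z]"}


def cobol_to_regex(picture: str) -> str:
    """Translate a string of 9 and x symbols into a regular expression."""
    parts = []
    for symbol in picture:
        key = symbol.lower()
        if key not in COBOL_TO_PCRE:
            # Anything outside Table 3 is rejected rather than guessed at.
            raise ValueError(f"unsupported symbol: {symbol!r}")
        parts.append(COBOL_TO_PCRE[key])
    return "".join(parts)


if __name__ == "__main__":
    pattern = cobol_to_regex("9xx9")
    print(pattern)
    # IGNORECASE mirrors the case-insensitive matching described earlier.
    print(bool(re.fullmatch(pattern, "1ab2", re.IGNORECASE)))
```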

Tips for Converting SQL Syntax to PCRE format


If you are familiar with SQL syntax, you can use the following information to help you write regular
expressions.
Table 4 shows examples of SQL syntax and their PCRE equivalents:
Table 4: SQL Syntax and PCRE Syntax Compared

SQL Syntax PCRE Syntax Description

% .* Matches any string.


A% A.* Matches the letter “A” followed by any string, as in Area.

_ . (a period) Matches any one character.

A_ A. Matches “A” followed by any one character, such as AZ.
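The wildcard correspondence in Table 4 can likewise be sketched in code. The following Python function is a hypothetical helper, not part of the EP. It translates only the % and _ wildcards, treats every other character literally, and does not handle SQL escape characters.

```python
import re


def like_to_regex(like_pattern: str) -> str:
    """Translate SQL LIKE wildcards into a regular expression."""
    parts = []
    for char in like_pattern:
        if char == "%":
            parts.append(".*")  # % matches any string
        elif char == "_":
            parts.append(".")   # _ matches any one character
        else:
            # Escape everything else so that characters such as . or +
            # are matched literally.
            parts.append(re.escape(char))
    return "".join(parts)


if __name__ == "__main__":
    print(like_to_regex("A%"))
    print(like_to_regex("A_"))
    print(bool(re.fullmatch(like_to_regex("A%"), "Area")))
    print(bool(re.fullmatch(like_to_regex("A_"), "AZ")))
```

Because the EP anchors expressions automatically, the translated pattern behaves like a LIKE comparison against the whole column value. The sketch uses fullmatch to reproduce that.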

Validating a Regular Expression


To validate a data pattern with a regular expression, include a Regular Expression Validation EP
transformation in the mapping along with an Expression transformation. The Expression
transformation names the column that contains the data you want to validate and passes the
data to the Regular Expression Validation EP transformation. The Regular Expression Validation
EP transformation then validates the data pattern.
For example, you want to verify that the telephone numbers in the TEL column in your source are
valid North American telephone numbers. You want to write invalid telephone numbers to a target
table that holds invalid employee attributes.
Use an Expression transformation to pass the data in the TEL column to a Regular Expression
Validation EP transformation. Enter the following regular expression on the Initialization
Properties tab in the EP transformation to determine if the data in the TEL column are valid North
American telephone numbers:
\d\d\d-\d\d\d-\d\d\d\d
In this expression, \d\d\d matches any three digits and \d\d\d\d matches any four digits. Therefore,
numbers such as 650-385-5000 are valid, while numbers such as 385-5000 are not.
Figure 4 shows an example of Regular Expression Validation EP transformation with a regular
expression to check valid North American telephone numbers:
Figure 4: Regular Expression for Validating North American Telephone Numbers

In the mapping, use a Filter transformation to filter out the valid telephone numbers and pass the
invalid values to the target.
Figure 5 shows the mapping for writing invalid telephone numbers to a target:
Figure 5: Mapping with a Single Regular Expression Validation EP Transformation
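The data flow of this mapping (validate the TEL column, then keep only the rows that fail) can be approximated in a few lines of Python. This is a sketch under stated assumptions rather than a description of how PowerCenter executes the mapping: the row layout, the EMP_ID column, and the function names are hypothetical, and Python's re module stands in for PCRE.

```python
import re

# Pattern from the text for North American telephone numbers.
TEL_PATTERN = re.compile(r"\d\d\d-\d\d\d-\d\d\d\d")


def tel_is_valid(tel):
    """Return 1 for a null or well-formed number, else 0 (mirrors IS_VALID)."""
    if tel is None:
        return 1
    return 1 if TEL_PATTERN.fullmatch(tel.strip(" ")) else 0


def invalid_rows(rows):
    """Play the role of the Filter transformation: keep only failing rows."""
    return [row for row in rows if tel_is_valid(row["TEL"]) == 0]


if __name__ == "__main__":
    # Hypothetical source rows; only the TEL column comes from the text.
    source = [
        {"EMP_ID": 1, "TEL": "650-385-5000"},
        {"EMP_ID": 2, "TEL": "385-5000"},
        {"EMP_ID": 3, "TEL": None},
    ]
    print(invalid_rows(source))
```

Note that the row with a null TEL value is not written to the invalid target, because IS_VALID is set to 1 for null input.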

Validating Multiple Regular Expressions


You can use regular expressions to validate multiple data patterns in a single mapping. In this
case, add a Regular Expression Validation EP transformation for each data pattern you want to
validate. Include the transformation name as part of an expression in an Expression
transformation. When you include Regular Expression Validation EP transformations in an
expression, you must leave the transformations unconnected in the mapping.
For example, you want to validate North American telephone numbers and five-digit customer ID
numbers. You want to write invalid values to target tables. Create two Regular Expression
Validation EP transformations: one to validate North American telephone numbers and one to
validate customer IDs. Include the name of the Regular Expression Validation EP transformations
as part of expressions in an Expression transformation.
Use the following syntax in the expression:
:EXT.<transformation_name>(<column_name>)
For example:
:EXT.t_RegExValidate_CID(CID)
Figure 6 shows the Ports tab of an Expression transformation that includes a Regular Expression
Validation EP transformation name in an expression:
Figure 6: Ports Tab of the Expression Transformation

In the mapping, use Filter transformations to filter the invalid data and pass it to target tables.
Figure 7 shows an example of a mapping with two Regular Expression Validation EP
transformations and a single Expression transformation:
Figure 7: Mapping for Validating Multiple Regular Expressions
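The multi-pattern approach can be sketched in Python as a lookup of named validators, where each name plays the role of an unconnected EP transformation called from an expression. Several details are assumptions made for illustration: the name t_RegExValidate_TEL, the \d{5} expression for five-digit customer IDs, the row layout, and the function names. Python's re module stands in for PCRE.

```python
import re

# One compiled pattern per validation, keyed by a name that plays the role
# of the transformation name. t_RegExValidate_CID comes from the text;
# t_RegExValidate_TEL and the \d{5} customer ID pattern are assumptions.
VALIDATORS = {
    "t_RegExValidate_TEL": re.compile(r"\d\d\d-\d\d\d-\d\d\d\d"),
    "t_RegExValidate_CID": re.compile(r"\d{5}"),
}


def ext(name, value):
    """Rough analogue of :EXT.<transformation_name>(<column_name>)."""
    if value is None:
        return 1
    return 1 if VALIDATORS[name].fullmatch(value.strip(" ")) else 0


def route_invalid(rows):
    """Collect the rows that fail each validation, one list per column."""
    invalid = {"TEL": [], "CID": []}
    for row in rows:
        if ext("t_RegExValidate_TEL", row["TEL"]) == 0:
            invalid["TEL"].append(row)
        if ext("t_RegExValidate_CID", row["CID"]) == 0:
            invalid["CID"].append(row)
    return invalid


if __name__ == "__main__":
    # Hypothetical rows with a telephone number and a customer ID.
    rows = [
        {"TEL": "650-385-5000", "CID": "12345"},
        {"TEL": "385-5000", "CID": "12345"},
        {"TEL": "650-385-5000", "CID": "1234A"},
    ]
    print(route_invalid(rows))
```

A row can appear in both lists when it fails both validations, which corresponds to using a separate Filter transformation per target.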
