Salim Achouche
Principal Software Engineer
Ashlee Bailey
Senior Technical Writer
Overview
If you are running PowerCenter 7.0, you can include Regular Expression Validation External
Procedure (EP) transformations in a mapping to validate patterns of data in String format. This
lets you validate data patterns, such as IDs, telephone numbers, postal codes, and state names.
You can also include the name of a Regular Expression Validation EP as part of an expression in
an Expression transformation in a mapping. This is useful when you want to validate more than
one data pattern in a single mapping.
You validate data patterns in the Regular Expression Validation EP transformation using Perl
Compatible Regular Expressions (PCRE). PCRE is a powerful syntax for matching String data
that follows a pattern.
You cannot use pass-through ports with the Regular Expression Validation EP transformation.
Tip: Use port concatenation as a workaround.
Figure 2 shows the Properties tab of the Regular Expression Validation EP transformation. You
do not have to configure any properties when you include the transformation in your mapping.
Figure 2: Properties Tab of the Regular Expression Validation EP Transformation
Once you include the Regular Expression Validation EP transformation in a mapping, you can
define the regular expression on the Initialization Properties tab of the transformation at the
mapping level.
The data you want to validate must be in String format.
Tip: To validate data that is not in String format, you can use an Expression transformation in the
mapping to convert the data to String format.
Figure 3 shows the Initialization Properties tab of the Regular Expression Validation EP
transformation with a regular expression:
Figure 3: Initialization Properties Tab of the Regular Expression Validation EP
Transformation
The Value attribute represents the regular expression that the PowerCenter Server uses to
validate the input port COLUMN_VALUE.
Working with Perl Compatible Regular Expressions in a Regular Expression Validation EP
When you use PCRE in a Regular Expression Validation EP, the following information applies:
Regular expressions are case insensitive. For example, [a-z] and [A-Z] are equivalent.
Spaces at the beginning and end of an input string are ignored.
Regular expressions are automatically anchored. This means that matching starts at the
beginning of the input string and ends at the end of the input string. For example, \d+ is
equivalent to ^\d+$. ^ is the PCRE syntax for marking the beginning of a string. $ is the PCRE
syntax for marking the end of a string.
To turn off the default anchored behavior, add .* at the beginning and end of the regular
expression. For example, .*\d+.* matches any input string that contains at least one digit.
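To make these rules concrete, the following sketch emulates them outside PowerCenter. It is an assumption for illustration, not the EP's implementation: Python's re module is not PCRE, although it treats the constructs used here the same way, and the function name ep_validate is hypothetical. re.fullmatch supplies the automatic anchoring, re.IGNORECASE supplies the case insensitivity, and stripping spaces models the trimming rule.

```python
import re

def ep_validate(pattern: str, value: str) -> bool:
    """Hypothetical emulation of the EP matching rules (not Informatica code)."""
    # Rule: spaces at the beginning and end of the input string are ignored.
    trimmed = value.strip(" ")
    # Rule: matching is anchored at both ends (fullmatch acts like ^...$)
    # and case insensitive (so [a-z] also matches A-Z).
    return re.fullmatch(pattern, trimmed, flags=re.IGNORECASE) is not None

# Anchored: \d+ behaves like ^\d+$, so trailing letters make it fail.
print(ep_validate(r"\d+", "12345"))
print(ep_validate(r"\d+", "12345abc"))
# Case insensitive and space trimmed.
print(ep_validate(r"[a-z]{2}", "  CA "))
# Adding .* at both ends of the expression turns off the anchoring effect.
print(ep_validate(r".*\d+.*", "abc123xyz"))
```

The four calls print True, False, True, and True, in that order.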
Table 2 provides guidelines for entering a regular expression in the Regular Expression
Validation EP transformation:
Table 2: PCRE Syntax
Syntax   Description
[a-z]    Matches one instance of a letter. For example, [a-z][a-z] can match ab or CA.
{}       Matches the preceding character or group exactly the number of times given in the
         braces. For example, \d{3} matches any three numbers, such as 650 or 510. Or,
         [a-z]{2} matches any two letters, such as CA or NY.
?        Matches the preceding character or group of characters zero or one time. For
         example, \d{3}(-\d{4})? matches any three numbers, which can be followed by a
         hyphen and any four numbers.
*        Matches zero or more instances of the preceding character or group. For example,
         .*0 matches any value that ends in 0, such as 10 or 2000.
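The Table 2 examples can be checked with a short script. This is a sketch under the assumption that an anchored, case-insensitive match in Python's re module behaves like the EP for these constructs; the helper name matches is hypothetical.

```python
import re

def matches(pattern: str, value: str) -> bool:
    # Anchored, case-insensitive match, assumed to mirror the EP's behavior.
    return re.fullmatch(pattern, value, flags=re.IGNORECASE) is not None

examples = [
    (r"[a-z][a-z]", "CA"),            # two letters, any case
    (r"\d{3}", "650"),                # exactly three digits
    (r"\d{3}", "6501"),               # four digits: one too many
    (r"\d{3}(-\d{4})?", "650"),       # optional group omitted
    (r"\d{3}(-\d{4})?", "650-1234"),  # optional group present once
    (r".*0", "2000"),                 # any value ending in 0
    (r".*0", "2001"),                 # does not end in 0
]
for pattern, value in examples:
    print(pattern, value, matches(pattern, value))
```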
For example, to create a regular expression for U.S. zip codes, you can enter the following:
\d{5}(-\d{4})?
This expression lets you validate a column that contains 5-digit U.S. zip codes, such as 93930, as
well as 9-digit zip codes, such as 93930-5407.
In this example, \d{5} refers to any five numbers, such as 93930. The parentheses surrounding
-\d{4} group this segment of the expression. The hyphen represents the hyphen of a 9-digit zip
code, as in 93930-5407. \d{4} refers to any four numbers, such as 5407. The question mark
makes the grouped hyphen and last four digits optional: they can appear zero times or one time.
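A quick way to try the zip code expression before entering it on the Initialization Properties tab is a small script such as the following. It assumes Python's re module as a stand-in for PCRE and models the EP's anchoring with fullmatch; is_valid_zip is a hypothetical name.

```python
import re

# The zip code expression from the text; fullmatch supplies the anchoring.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")

def is_valid_zip(value: str) -> bool:
    # Trim surrounding spaces, as the EP does, then require a full match.
    return ZIP_PATTERN.fullmatch(value.strip(" ")) is not None

for candidate in ["93930", "93930-5407", "9393", "93930-54", "93930 5407"]:
    print(candidate, is_valid_zip(candidate))
```

Only the first two candidates are valid: 9393 has too few digits, 93930-54 has an incomplete extension, and 93930 5407 uses a space instead of a hyphen.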
Tips for Converting COBOL Syntax to PCRE Format
If you are familiar with COBOL syntax, you can use the following information to help you write
regular expressions.
Table 3 shows examples of COBOL syntax and their PCRE equivalents:
Table 3: COBOL Syntax and PCRE Syntax Compared
COBOL    PCRE                   Description
9999     \d\d\d\d or \d{4}      Matches any four digits from 0-9, as in 1234 or 5936.
9xx9     \d[a-z][a-z]\d         Matches any digit followed by two letters and another digit,
                                as in 1ab2.
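The conversions in Table 3 are mechanical enough to script. The sketch below follows only the two mappings in the table (9 to \d, x to [a-z]) and collapses runs into the {n} form; the function name and mapping dictionary are hypothetical, and any other symbol raises a KeyError rather than guessing a translation.

```python
import re
from itertools import groupby

# Hypothetical mapping taken from Table 3 only: 9 -> digit, x -> letter.
COBOL_TO_PCRE = {"9": r"\d", "x": "[a-z]"}

def cobol_to_pcre(picture: str) -> str:
    """Translate a COBOL-style pattern such as '9xx9' into a PCRE expression."""
    parts = []
    for symbol, run in groupby(picture.lower()):
        count = len(list(run))
        token = COBOL_TO_PCRE[symbol]  # KeyError for symbols not covered by Table 3
        # Use the {n} quantifier for runs longer than one symbol.
        parts.append(token if count == 1 else f"{token}{{{count}}}")
    return "".join(parts)

print(cobol_to_pcre("9999"))  # \d{4}
print(cobol_to_pcre("9xx9"))  # \d[a-z]{2}\d
# The generated expression matches the Table 3 example (case insensitive, like the EP).
print(re.fullmatch(cobol_to_pcre("9xx9"), "1AB2", flags=re.IGNORECASE) is not None)
```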
In the mapping, use a Filter transformation to filter out the valid telephone numbers and pass the
invalid values to the target.
Figure 5 shows the mapping for writing invalid telephone numbers to a target:
Figure 5: Mapping with a Single Regular Expression Validation EP Transformation
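The filtering logic of this mapping can be sketched in a few lines. The telephone expression is not shown in this section, so the pattern below (three digits, hyphen, three digits, hyphen, four digits) is a hypothetical stand-in, as is the function name. The list comprehension plays the role of the Filter transformation: valid numbers are dropped and invalid values are passed on to the target.

```python
import re

# Hypothetical telephone pattern; the expression used in the original mapping is not shown.
PHONE_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")

def invalid_phone_numbers(values):
    """Mimic a Filter that drops valid numbers and passes invalid ones to the target."""
    # Trim spaces and require a full (anchored) match, as the EP does.
    return [v for v in values if PHONE_PATTERN.fullmatch(v.strip(" ")) is None]

rows = ["650-555-0100", "650-5550100", "555-0199", " 510-555-0123 "]
print(invalid_phone_numbers(rows))  # ['650-5550100', '555-0199']
```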
In the mapping, use Filter transformations to filter the invalid data and pass it to target tables.
Figure 7 shows an example of a mapping with two Regular Expression Validation EP
transformations and a single Expression transformation:
Figure 7: Mapping for Validating Multiple Regular Expressions
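A mapping that validates several patterns can be approximated by giving each column its own expression and collecting the failures per row. In this sketch each compiled pattern stands in for one Regular Expression Validation EP transformation, and the column names ZIP and STATE, the state pattern, and the helper name are all hypothetical; Python's re module is assumed as a close substitute for PCRE.

```python
import re

# Hypothetical columns and patterns; each one stands in for one EP transformation.
VALIDATORS = {
    "ZIP": re.compile(r"\d{5}(-\d{4})?", re.IGNORECASE),
    "STATE": re.compile(r"[a-z]{2}", re.IGNORECASE),
}

def invalid_columns(row: dict) -> list:
    """Return the names of the columns whose values fail their pattern."""
    return [col for col, rx in VALIDATORS.items()
            if rx.fullmatch(row[col].strip(" ")) is None]

rows = [
    {"ZIP": "93930", "STATE": "CA"},
    {"ZIP": "9393", "STATE": "CA"},
    {"ZIP": "10001-1234", "STATE": "N.Y."},
]
for row in rows:
    bad = invalid_columns(row)
    if bad:  # like a Filter transformation that passes only invalid rows to the target
        print(row, "->", bad)
```

Keeping the patterns in one dictionary mirrors the idea of calling each EP from a single Expression transformation: adding a new check means adding one entry rather than another branch of the pipeline.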