
Sorter Transformation Overview

By PenchalaRaju.Yanamala

Transformation type:
Active
Connected

You can sort data with the Sorter transformation. You can sort data in ascending
or descending order according to a specified sort key. You can also configure the
Sorter transformation for case-sensitive sorting, and specify whether the output
rows should be distinct. The Sorter transformation is an active transformation. It
must be connected to the data flow.

You can sort data from relational or flat file sources. You can also use the Sorter
transformation to sort data passing through an Aggregator transformation
configured to use sorted input.

When you create a Sorter transformation in a mapping, you specify one or more
ports as a sort key and configure each sort key port to sort in ascending or
descending order. You also configure sort criteria the Integration Service applies
to all sort key ports and the system resources it allocates to perform the sort
operation.

At session run time, the Integration Service passes the following rows into the
Sorter transformation:

ORDER_ID  ITEM_ID  QUANTITY  DISCOUNT
45        123456   3         3.04
45        456789   2         12.02
43        000246   6         34.55
41        000468   5         .56

After sorting the data, the Integration Service passes the following rows out of the
Sorter transformation:

ORDER_ID  ITEM_ID  QUANTITY  DISCOUNT
41        000468   5         .56
43        000246   6         34.55
45        123456   3         3.04
45        456789   2         12.02
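
The example above can be reproduced with a minimal Python sketch. It is only an illustration of sorting on one or more sort key ports, each ascending or descending, and not the Integration Service's implementation. The helper name `sort_rows` is hypothetical, and the sketch assumes the sort key in the example is ORDER_ID in ascending order, which the output rows suggest.

```python
from operator import itemgetter

# Input rows from the example above. ITEM_ID is kept as a string
# so that leading zeros are preserved.
rows = [
    {"ORDER_ID": 45, "ITEM_ID": "123456", "QUANTITY": 3, "DISCOUNT": 3.04},
    {"ORDER_ID": 45, "ITEM_ID": "456789", "QUANTITY": 2, "DISCOUNT": 12.02},
    {"ORDER_ID": 43, "ITEM_ID": "000246", "QUANTITY": 6, "DISCOUNT": 34.55},
    {"ORDER_ID": 41, "ITEM_ID": "000468", "QUANTITY": 5, "DISCOUNT": 0.56},
]


def sort_rows(rows, sort_key):
    """Sort rows on a list of (port_name, ascending) pairs.

    The first pair is the most significant sort key port.
    (Hypothetical helper, for illustration only.)
    """
    result = list(rows)
    # Python's sort is stable, so sorting on the least significant key
    # first and the most significant key last gives a multi-key order.
    for port, ascending in reversed(sort_key):
        result.sort(key=itemgetter(port), reverse=not ascending)
    return result


# Assumed sort key for the example: ORDER_ID, ascending.
sorted_rows = sort_rows(rows, [("ORDER_ID", True)])
for r in sorted_rows:
    print(r["ORDER_ID"], r["ITEM_ID"], r["QUANTITY"], r["DISCOUNT"])
```

Passing a second pair, for example `[("ORDER_ID", True), ("QUANTITY", False)]`, orders rows with the same ORDER_ID by descending QUANTITY.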

Sorter Transformation Properties

The Sorter transformation has several properties that specify additional sort
criteria. The Integration Service applies these criteria to all sort key ports. The
Sorter transformation properties also determine the system resources the
Integration Service allocates when it sorts data.

Sorter Cache Size


The Integration Service uses the Sorter Cache Size property to determine the
maximum amount of memory it can allocate to perform the sort operation. The
Integration Service passes all incoming data into the Sorter transformation before
it performs the sort operation. You can configure a numeric value for the Sorter
cache, or you can configure the Integration Service to determine the cache size
at run time. If you configure the Integration Service to determine the cache size,
you can also configure a maximum amount of memory for the Integration Service
to allocate to the cache. If the total configured session cache size is 2 GB
(2,147,483,648 bytes) or greater, you must run the session on a 64-bit
Integration Service.

Before starting the sort operation, the Integration Service allocates the amount of
memory configured for the Sorter cache size. If the Integration Service runs a
partitioned session, it allocates the specified amount of Sorter cache memory for
each partition.

If it cannot allocate enough memory, the Integration Service fails the session. For
best performance, configure Sorter cache size with a value less than or equal to
the amount of available physical RAM on the Integration Service machine.
Allocate at least 16 MB (16,777,216 bytes) of physical memory to sort data using
the Sorter transformation. Sorter cache size is set to 16,777,216 bytes by default.

If the amount of incoming data is greater than the Sorter cache size, the
Integration Service temporarily stores data in the Sorter transformation work
directory. When it stores data in the work directory, the Integration Service
requires disk space of at least twice the amount of incoming data. If the amount
of incoming data is significantly greater than the Sorter cache size, the
Integration Service may require much more than twice that amount of disk space
in the work directory.
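
As a rough planning aid, the disk space rule can be sketched in Python. The function name `min_work_dir_bytes` is hypothetical, and the sketch assumes no work directory space is needed when the incoming data fits in the cache. The result is only the stated lower bound; the actual requirement may be much higher.

```python
# Default Sorter cache size stated in this section (16 MB).
DEFAULT_SORTER_CACHE_BYTES = 16_777_216


def min_work_dir_bytes(incoming_bytes, cache_bytes=DEFAULT_SORTER_CACHE_BYTES):
    """Return the minimum work directory space, in bytes, for a sort.

    Assumption: data that fits in the Sorter cache is not written to disk.
    """
    if incoming_bytes <= cache_bytes:
        return 0
    # At least twice the amount of incoming data is required.
    return 2 * incoming_bytes


print(min_work_dir_bytes(10_000_000))  # fits in the default cache
print(min_work_dir_bytes(80_000_000))  # exceeds the default cache
```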

Use the following formula to determine the size of incoming data:

number_of_input_rows × [(Σ column_size) + 16]

Table 20-1 gives the column size values by datatype for Sorter data calculations:

Table 20-1. Column Sizes for Sorter Data Calculations


Datatype                                            Column Size
Binary                                              precision + 8, rounded to the nearest multiple of 8
Date/Time                                           29
Decimal, high precision off (all precision)         16
Decimal, high precision on (precision <=18)         24
Decimal, high precision on (precision >18, <=28)    32
Decimal, high precision on (precision >28)          16
Decimal, high precision on (negative scale)         16
Double                                              16
Real                                                16
Integer                                             16
Small integer                                       16
Bigint                                              64
NString, NText, String, Text                        Unicode mode: 2*(precision + 5)
                                                    ASCII mode: precision + 9

The column sizes include the bytes required for a null indicator.

To increase performance for the sort operation, the Integration Service aligns all
data for the Sorter transformation memory on an 8-byte boundary. Each Sorter
column includes rounding to the nearest multiple of eight.
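
The formula and Table 20-1 can be combined in a short Python sketch for estimating the size of incoming data. The function names and the datatypes and precisions chosen for the example ports are assumptions made for illustration. The sketch uses the table values literally, applies rounding only to Binary columns as the table specifies, and assumes that rounding means rounding up to the next 8-byte boundary.

```python
def round_up_to_8(n):
    # Assumption: rounding to a multiple of 8 means rounding up.
    return (n + 7) // 8 * 8


def column_size(datatype, precision=0, scale=0,
                high_precision=False, unicode_mode=False):
    """Return the column size in bytes according to Table 20-1."""
    dt = datatype.lower()
    if dt == "binary":
        return round_up_to_8(precision + 8)
    if dt == "date/time":
        return 29
    if dt == "decimal":
        if not high_precision or scale < 0 or precision > 28:
            return 16
        return 24 if precision <= 18 else 32
    if dt in ("double", "real", "integer", "small integer"):
        return 16
    if dt == "bigint":
        return 64
    if dt in ("nstring", "ntext", "string", "text"):
        return 2 * (precision + 5) if unicode_mode else precision + 9
    raise ValueError(f"unknown datatype: {datatype}")


def incoming_data_size(number_of_input_rows, column_sizes):
    """number_of_input_rows x [(sum of column sizes) + 16]"""
    return number_of_input_rows * (sum(column_sizes) + 16)


# Hypothetical datatypes for the four ports of the earlier example.
sizes = [
    column_size("Integer"),                          # ORDER_ID
    column_size("String", precision=6),              # ITEM_ID, ASCII mode
    column_size("Integer"),                          # QUANTITY
    column_size("Decimal", precision=10, scale=2),   # DISCOUNT, high precision off
]
total = incoming_data_size(1_000_000, sizes)
print(sizes, total)
```

Under these assumptions, one million rows exceed the default 16,777,216-byte Sorter cache, so the Integration Service would store data in the work directory.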

The Integration Service also writes the row size and amount of memory the
Sorter transformation uses to the session log when you configure the Sorter
transformation tracing level to Normal.

Case Sensitive

The Case Sensitive property determines whether the Integration Service
considers case when sorting data. When you enable the Case Sensitive
property, the Integration Service sorts uppercase characters higher than
lowercase characters.

Work Directory

You must specify a work directory the Integration Service uses to create
temporary files while it sorts data. After the Integration Service sorts the data, it
deletes the temporary files. You can specify any directory on the Integration
Service machine to use as a work directory. By default, the Integration Service
uses the value specified for the $PMTempDir process variable.

When you partition a session with a Sorter transformation, you can specify a
different work directory for each partition in the pipeline. To increase session
performance, specify work directories on physically separate disks on the
Integration Service system.

Distinct Output Rows

You can configure the Sorter transformation to treat output rows as distinct. If you
configure the Sorter transformation for distinct output rows, the Mapping
Designer configures all ports as part of the sort key. The Integration Service
discards duplicate rows compared during the sort operation.
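
A minimal Python sketch of the distinct behavior follows. The sample rows and the function name `sort_distinct` are hypothetical. The sketch treats every port as part of the sort key, so whole rows are compared, and it drops a row when it equals the row emitted before it.

```python
# Hypothetical input with one duplicate row.
rows = [
    (45, "123456", 3),
    (41, "000468", 5),
    (45, "123456", 3),  # duplicate of the first row
    (43, "000246", 6),
]


def sort_distinct(rows):
    """Sort on all ports and discard duplicate rows."""
    output = []
    # After sorting on all ports, duplicates are adjacent.
    for row in sorted(rows):
        if not output or output[-1] != row:
            output.append(row)
    return output


for row in sort_distinct(rows):
    print(row)
```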

Tracing Level

Configure the Sorter transformation tracing level to control the number and type
of Sorter error and status messages the Integration Service writes to the session
log. At Normal tracing level, the Integration Service writes the size of the row
passed to the Sorter transformation and the amount of memory the Sorter
transformation allocates for the sort operation. The Integration Service also
writes the time and date when it passes the first and last input rows to the Sorter
transformation.

If you configure the Sorter transformation tracing level to Verbose Data, the
Integration Service writes the time the Sorter transformation finishes passing all
data to the next transformation in the pipeline. The Integration Service also writes
the time to the session log when the Sorter transformation releases memory
resources and removes temporary files from the work directory.

Null Treated Low

You can configure the way the Sorter transformation treats null values. Enable
this property if you want the Integration Service to treat null values as lower than
any other value when it performs the sort operation. Disable this option if you
want the Integration Service to treat null values as higher than any other value.
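
The effect of this property can be sketched in Python, with `None` standing in for a null value. The function name `sort_with_nulls` is hypothetical, and the sketch assumes an ascending sort on a single port.

```python
def sort_with_nulls(values, null_treated_low=True):
    """Sort ascending, placing None before or after all other values."""
    null_rank = 0 if null_treated_low else 2

    def key(value):
        if value is None:
            # Nulls are ranked separately so they are never compared
            # directly with non-null values.
            return (null_rank, 0)
        return (1, value)

    return sorted(values, key=key)


values = [3, None, 1, None, 2]
print(sort_with_nulls(values, null_treated_low=True))
print(sort_with_nulls(values, null_treated_low=False))
```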

Transformation Scope

The transformation scope specifies how the Integration Service applies the
transformation logic to incoming data:

Transaction. Applies the transformation logic to all rows in a transaction.
Choose Transaction when a row of data depends on all rows in the same
transaction, but does not depend on rows in other transactions.

All Input. Applies the transformation logic to all incoming data. When you
choose All Input, PowerCenter drops incoming transaction boundaries.
Choose All Input when a row of data depends on all rows in the source.
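
The difference between the two scopes can be illustrated with a small Python sketch. It models each transaction as a list of values, which is an assumption made for illustration and not how the Integration Service represents transactions. The function names are hypothetical.

```python
# Hypothetical input: two transactions, each a list of ORDER_ID values.
transactions = [
    [45, 41, 43],
    [12, 50, 7],
]


def sort_transaction_scope(transactions):
    """Sort rows within each transaction; boundaries are kept."""
    return [sorted(t) for t in transactions]


def sort_all_input_scope(transactions):
    """Sort all rows together; transaction boundaries are dropped."""
    return sorted(row for t in transactions for row in t)


print(sort_transaction_scope(transactions))
print(sort_all_input_scope(transactions))
```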

Creating a Sorter Transformation

To add a Sorter transformation to a mapping, complete the following steps.

To create a Sorter transformation:

1. In the Mapping Designer, click Transformation > Create. Select the Sorter
   transformation.
   The naming convention for Sorter transformations is SRT_TransformationName.
   Enter a description for the transformation. This description appears in the
   Repository Manager, making it easier to understand what the transformation
   does.
2. Enter a name for the Sorter and click Create.
   The Designer creates the Sorter transformation.
3. Click Done.
4. Drag the ports you want to sort into the Sorter transformation.
   The Designer creates the input/output ports for each port you include.
5. Double-click the title bar of the transformation to open the Edit
   Transformations dialog box.
6. Select the Ports tab.
7. Select the ports you want to use as the sort key.
8. For each port selected as part of the sort key, specify whether you want the
   Integration Service to sort data in ascending or descending order.
9. Select the Properties tab. Modify the Sorter transformation properties.
10. Select the Metadata Extensions tab. Create or edit metadata extensions for
    the Sorter transformation.
11. Click OK.