Custom Stages

Agenda
Introduction
Types of Stages
How to build Custom Stages

2002. Infosys Technologies Ltd.

Introduction
DataStage provides a large number of built-in Stages to extract and transform data. In addition to the existing Stages, it also provides the capability to build custom Stages.

Types of Stages
There are three different types of Stages that can be built.
Custom: Use an existing Orchestrate operator as a Stage in parallel jobs.
Build: Create your own operators and use them as a Stage.
Wrapped: Specify a UNIX command as a Stage and use it.

Custom Stage
Custom Stages use existing Orchestrate operators. Steps in defining a Custom Stage:
1. Select the category in the repository.
2. Select File -> New Parallel Stage -> Custom.
3. On the General page, specify the name of the operator to be used.
4. On the Links page, specify the minimum and maximum number of input and output links.
5. On the Properties page, specify the properties.

Wrapped Stages
Wrapped Stages use UNIX commands. When defining a Wrapped Stage you provide the following information:
  Details of the UNIX command that the stage will execute.
  Description of the data that will be input to the stage.
  Description of the data that will be output from the stage.
  Definition of the environment in which the command will execute.
The UNIX command can be any command, such as sort or grep, or a script.

Build Stages
Build Stages enable you to create your own operators. The operators are written in C++, which gives the advantage of programming-language control.

Build Stages
Buildop provides a simple means of creating your own operator. It does not use an existing operator or executable. Reasons to use Buildop include:
  The functionality of multiple Stages can be combined into one.
  Complex business logic that cannot easily be implemented using existing Stages:
    Lookups across a range of values
    Surrogate key generation
  Better performance, as there is no unwanted functionality.
  A Buildop is reusable. It can be used within a project, and also exported and used in other projects.

Build Stage
The interface is similar to that of a Wrapped Stage. When defining a Build Stage one needs to provide:
  Input interface/schema
  Output interface/schema
  Transfer type. If Auto Transfer is selected, all the input columns are output.
  Header files and definitions
  Code to be executed before the stage processes any records
  Code to be executed for each input record
  Code to be executed after the stage has processed all records
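
The three code sections listed above can be pictured as the phases of an ordinary loop. The following is only a plain C++ sketch of that ordering, not code generated by or pasted into a Build Stage; the Record type and the runStage function are hypothetical names introduced for illustration.

```cpp
#include <vector>

// Hypothetical record type standing in for one row of an input interface.
struct Record {
    int amount;
};

// Simulates the order in which the three code sections of a Build Stage run.
// Returns the total accumulated over all records.
long long runStage(const std::vector<Record>& input) {
    // "Before" section: runs once, before any record is processed.
    long long total = 0;

    // "Per record" section: runs once for every input record.
    for (const Record& rec : input) {
        total += rec.amount;
    }

    // "After" section: runs once, after the last record has been processed.
    return total;
}
```

In a real Build Stage the loop itself is supplied by the framework; only the bodies of the three sections are written by the developer.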

Steps
Steps for defining a Build Stage:
1. Select the Stage Types category in which the Stage is to be created.
2. Choose File -> New Parallel Stage -> Build from the main menu, or New Parallel Stage -> Build from the shortcut menu.
The General tab has Stage Type, Category, Operator (by default the Build Stage name) and Class name (by default the Build Stage name).
The Creator tab has generic information about the version of the Build Stage, the author name and copyright information.
The Properties page defines all the options to be passed to the Build Stage as run-time options.
The Build page contains three tabs:
  Interfaces: contains the input and output interfaces/schemas defined.
  Logic: contains three sections, Pre Loop, Per Record and Post Loop.
  Advanced

Build Stage Macros
There are a number of macros you can use when specifying Pre-Loop, Per-Record, and Post-Loop code. They fall into four categories:
  Informational
  Flow-control
  Input and output
  Transfer

The Interfaces tab on the Build page contains the defined input and output interfaces.

Build Stage Macros
Informational Macros
These macros are used to determine the number of inputs, outputs, and transfers:
  inputs() - returns the number of inputs to the stage.
  outputs() - returns the number of outputs from the stage.
  transfers() - returns the number of transfers in the stage.

Flow-Control Macros
These macros are used to override the default behavior of the Per-Record loop in the stage definition:
  endLoop() - stops looping after the current loop completes and any auto outputs for this loop have been written.
  nextLoop() - immediately moves control to the start of the next loop.
  failStep() - returns a failed status and terminates the job.
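
The effect of the flow-control macros can be illustrated with an ordinary C++ loop. This is a simulation only: runLoop, RunResult and the trigger rules (zero, negative, stopAt) are hypothetical and chosen for illustration, and the sketch assumes that skipping to the next loop also skips the automatic output for the current record.

```cpp
#include <vector>

// Outcome of the simulated stage run.
struct RunResult {
    bool failed;              // true if the failStep() analogue fired
    std::vector<int> output;  // values written automatically at the end of each loop
};

// Simulated Per-Record loop. Trigger rules are arbitrary illustration choices:
//   value == 0      -> skip this record            (analogue of nextLoop())
//   value <  0      -> abort with a failed status  (analogue of failStep())
//   value == stopAt -> finish this loop, write its output, then stop (analogue of endLoop())
RunResult runLoop(const std::vector<int>& input, int stopAt) {
    RunResult result{false, {}};
    for (int value : input) {
        if (value == 0) {
            continue;  // jump straight to the start of the next loop
        }
        if (value < 0) {
            result.failed = true;  // report failure and terminate
            return result;
        }
        const bool stopAfterThisLoop = (value == stopAt);
        result.output.push_back(value);  // automatic output at the end of the loop
        if (stopAfterThisLoop) {
            break;  // no further records are processed
        }
    }
    return result;
}
```

Note that the endLoop() analogue still writes the current record before stopping, whereas the nextLoop() analogue writes nothing for the skipped record.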

Build Stage Macros
Input and Output Macros
The following macros are available:
  readRecord(input) - reads the next record from input, if there is one. If there is no record, the next call to inputDone() will return true.
  writeRecord(output) - writes a record to output.
  inputDone(input) - returns true if the last call to readRecord() for the specified input failed to read a new record because the input has no more records.
  holdRecord(input) - auto input is suspended for the current record, so that the operator does not automatically read a new record at the start of the next loop.
  discardRecord(output) - auto output is suspended for the current record, so that the operator does not output the record at the end of the current loop.
  discardTransfer(index) - auto transfer is suspended for the current record, so that the operator does not perform the transfer at the end of the current loop.
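
The relationship between readRecord() and inputDone() is easiest to see in a small simulation. The MockInput class below is a hypothetical stand-in written in plain C++, not the real input port; it only mimics the rule that inputDone() becomes true once a read finds no more records.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for an input port; not the real buildop interface.
class MockInput {
public:
    explicit MockInput(std::vector<int> records) : records_(std::move(records)) {}

    // Analogue of readRecord(input): loads the next record if there is one.
    void readRecord() {
        if (next_ < records_.size()) {
            current_ = records_[next_++];
            done_ = false;
        } else {
            done_ = true;  // the read failed because the input is exhausted
        }
    }

    // Analogue of inputDone(input): true if the last read found no record.
    bool inputDone() const { return done_; }

    // Value of the record loaded by the last successful read.
    int current() const { return current_; }

private:
    std::vector<int> records_;
    std::size_t next_ = 0;
    int current_ = 0;
    bool done_ = false;
};

// Reads every record explicitly and returns the sum of their values.
int sumAll(MockInput& in) {
    int sum = 0;
    in.readRecord();
    while (!in.inputDone()) {
        sum += in.current();
        in.readRecord();
    }
    return sum;
}
```

The read-then-test pattern in sumAll is the usual shape of a manual read loop: the check always follows a read, because only a failed read can report that the input is finished.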

Build Stage Macros
Transfer Macros
The following macros are available:
  doTransfer(index) - performs the transfer specified by index.
  doTransfersFrom(input) - performs all transfers from the specified input.
  doTransfersTo(output) - performs all transfers to the specified output.
  transferAndWriteRecord(output) - performs all transfers and writes a record for the specified output. Calling this macro is equivalent to calling the macros doTransfersTo() and writeRecord().
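
The stated equivalence between transferAndWriteRecord() and the pair doTransfersTo() plus writeRecord() can be sketched with plain functions. Everything here is hypothetical: Row, MockOutput and the functions with the Sim suffix are illustration-only stand-ins, and unlike the real macros they take the source row as an explicit argument.

```cpp
#include <string>
#include <vector>

// Hypothetical row type carried from input to output.
struct Row {
    int id;
    std::string name;
};

// Hypothetical output port: a staging record plus the records written so far.
struct MockOutput {
    Row pending{0, ""};
    std::vector<Row> written;
};

// Stand-in for doTransfersTo(output): copy the input columns into the output record.
void doTransfersToSim(MockOutput& out, const Row& in) {
    out.pending = in;
}

// Stand-in for writeRecord(output): emit the staged output record.
void writeRecordSim(MockOutput& out) {
    out.written.push_back(out.pending);
}

// Stand-in for transferAndWriteRecord(output): the two steps above in sequence.
void transferAndWriteRecordSim(MockOutput& out, const Row& in) {
    doTransfersToSim(out, in);
    writeRecordSim(out);
}
```

Calling the combined function on one output and the two separate functions on another leaves both outputs with the same written record.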

Build Stage
This page contains all header file information and definitions

Example
The Definitions tab contains header files and definitions:
    #include "apt_util/string.h"
    #include "apt_util/ints.h"
    int iHold = 0;
    int iVar = 0;
    int iCounter = 0;
    struct extract_type {
        long long gst_i;
        long long mail_addr_i;
        char surname[32];
        long long acct_cd_seq_i;
        long long dummy_grp_seq_i;
        char grp_end_d[10];
    };
    struct extract_type extract_rec[100];

Example
The Pre Loop section contains code to be executed before any input is processed.
The Per Record section contains the logic to be implemented for each record:
    if (input.MAIL_ADDR_I != tempMail) {
        // reading first record
        extract_rec[i].gst_i = input.GST_I;
        extract_rec[i].mail_addr_i = input.MAIL_ADDR_I;
        extract_rec[i].acct_cd_seq_i = input.ACCT_CD_SEQ_I;
        // Begin of Grouping logic

Each input column is accessed as input.Column, where input is the name of the input interface.
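
The excerpt above starts a new buffer entry when the mail address differs from the previous one. The grouping logic itself is not shown on the slide, so the following is only a guess at the general pattern, written as standalone C++: it assumes the input arrives sorted by the key, and InputRow, Group and groupByMailAddr are hypothetical names.

```cpp
#include <vector>

// Hypothetical input row with the two columns used in this sketch.
struct InputRow {
    long long mailAddr;
    long long gst;
};

// One group per distinct mail address, holding the ids seen for it.
struct Group {
    long long mailAddr;
    std::vector<long long> gstIds;
};

// Starts a new group whenever the key differs from the previous record's key.
// Assumes the rows are already sorted by mailAddr.
std::vector<Group> groupByMailAddr(const std::vector<InputRow>& rows) {
    std::vector<Group> groups;
    for (const InputRow& row : rows) {
        if (groups.empty() || groups.back().mailAddr != row.mailAddr) {
            groups.push_back(Group{row.mailAddr, {}});  // key changed: open a new group
        }
        groups.back().gstIds.push_back(row.gst);        // add the record to the current group
    }
    return groups;
}
```

Comparing each key only with the previous one keeps the check cheap, which is why this pattern depends on sorted input.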

The Per Record section contains the code to be executed for each input record.

Example
The code is written in C++, like any other C++ program but without a main function.
    // write output to output interface
    for (m = 0; m < i; m++) {
        output.GST_I = extract_rec[m].gst_i;
        output.MAIL_ADDR_I = extract_rec[m].mail_addr_i;
        output.ACCT_CD_SEQ_I = extract_rec[m].acct_cd_seq_i;
        output.PRIM_LAST_NAME = extract_rec[m].surname;
        // Writing the record to Output
        writeRecord(output.portid_);
    }

Data is transferred to the output interface by assigning the computed values to its columns using output.Column, where output is the interface name.

Output is written by calling the writeRecord(output) macro, which writes the record to the output interface.
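
The assign-then-write pattern described above can be tried outside DataStage with a small stand-in. In this sketch a std::vector plays the role of the output interface and push_back plays the role of writeRecord(); ExtractRec, OutputRow and flushBuffer are hypothetical names, and only three of the columns are modeled.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical buffered record, loosely modeled on a few extract_type fields.
struct ExtractRec {
    long long gst;
    long long mailAddr;
    std::string surname;
};

// Hypothetical output row with column names borrowed from the example.
struct OutputRow {
    long long GST_I;
    long long MAIL_ADDR_I;
    std::string PRIM_LAST_NAME;
};

// Copies the first `count` buffered records into output rows and "writes" each one.
// Returns the number of rows written.
std::size_t flushBuffer(const std::vector<ExtractRec>& buffer,
                        std::size_t count,
                        std::vector<OutputRow>& sink) {
    const std::size_t n = std::min(count, buffer.size());  // never read past the buffer
    for (std::size_t m = 0; m < n; ++m) {
        // Assign the computed values to the output columns.
        OutputRow out{buffer[m].gst, buffer[m].mailAddr, buffer[m].surname};
        // Stand-in for writeRecord(output): emit one output row per buffered record.
        sink.push_back(out);
    }
    return n;
}
```

Each pass through the loop writes exactly one record, so the number of output rows equals the number of buffered records flushed.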

Example
The Post Loop section contains code to be executed after all input records have been processed. It is written in the same way as the Pre Loop and Per Record sections, but runs only after the Per Record section has completed.
