Agenda
Introduction
Types of Stages
How to build Custom Stages
Introduction
DataStage provides a large number of built-in Stages to extract and transform data. In addition to the existing Stages, it also provides the capability to build custom Stages.
Types of Stages
Three types of Stages can be built:
Custom: uses an existing Orchestrate operator as a Stage in parallel jobs.
Build: lets you create your own operators and use them as a Stage.
Wrapped: specifies a UNIX command to run as a Stage.
Custom Stage
Custom Stages use existing Orchestrate operators. Steps for defining a Custom Stage:
1. Select the category from the repository.
2. Select File -> New Parallel Stage -> Custom.
3. On the General page, specify the name of the operator to be used.
4. On the Links page, specify the minimum and maximum number of input and output links.
5. On the Properties page, specify the properties.
Wrapped Stages
Wrapped Stages use UNIX commands. When defining a Wrapped Stage you provide the following information:
Details of the UNIX command that the Stage will execute.
Description of the data that will be input to the Stage.
Description of the data that will be output from the Stage.
Definition of the environment in which the command will execute.
The UNIX command can be any command, such as sort or grep, or a script.
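A command suitable for wrapping often behaves as a filter that reads records from one stream and writes results to another. The following is a minimal C++ sketch of such a filter, assuming line-oriented text input; the function name filterLines and the grep-like behavior are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical filter: copies every input line that contains `pattern`
// to the output, much as `grep pattern` would. Returns the number of
// lines written.
int filterLines(std::istream& in, std::ostream& out, const std::string& pattern) {
    assert(!pattern.empty());  // an empty pattern would match every line
    int written = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (line.find(pattern) != std::string::npos) {
            out << line << '\n';
            ++written;
        }
    }
    return written;
}
```

Built as a standalone program, this would be called from main as filterLines(std::cin, std::cout, pattern), so that the command reads standard input and writes standard output.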
Build Stages
Build Stages enable you to create your own operators. The code is written in C++, which gives you the control of a full programming language.
Buildop provides a simple means of creating your own operator. It does not use an existing operator or executable. Reasons to use Buildop include:
The functionality of multiple Stages can be combined into a single Stage.
Complex business logic that cannot easily be implemented using existing Stages
Lookups across a range of values
Surrogate key generation
Better performance, as there is no unwanted functionality.
A Buildop is reusable: it can be used within a project, and exported for use in other projects.
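Range lookups and surrogate key generation are the kind of logic that would be written in a Build Stage's code sections. The following is a minimal standalone C++ sketch of both ideas, not DataStage code: RangeLookup, SurrogateKeyGenerator and the start/step scheme are hypothetical names and assumptions chosen for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical range lookup: maps inclusive [low, high] ranges to a label.
// Ranges are assumed not to overlap.
class RangeLookup {
private:
    struct Range {
        long long high;
        std::string label;
    };
    std::map<long long, Range> ranges_;  // keyed by the low end of each range

public:
    void add(long long low, long long high, const std::string& label) {
        assert(low <= high);  // reject inverted ranges
        Range r;
        r.high = high;
        r.label = label;
        ranges_[low] = r;
    }

    // Stores the label of the range containing `value` in `label` and
    // returns true, or returns false when no range matches.
    bool find(long long value, std::string& label) const {
        auto it = ranges_.upper_bound(value);       // first range starting after value
        if (it == ranges_.begin()) return false;    // value is below every range
        --it;                                       // last range starting at or before value
        if (value > it->second.high) return false;  // value falls in a gap
        label = it->second.label;
        return true;
    }
};

// Hypothetical surrogate key generator. Starting each partition at a
// different offset and stepping by the partition count is one way to keep
// keys unique across partitions (an assumption, not taken from the slides).
class SurrogateKeyGenerator {
public:
    SurrogateKeyGenerator(long long start, long long step)
        : next_(start), step_(step) {
        assert(step > 0);
    }

    long long next() {
        long long key = next_;
        next_ += step_;
        return key;
    }

private:
    long long next_;
    long long step_;
};
```

Keying the map on the low end of each range lets a single upper_bound call find the only candidate range, so each lookup costs a logarithmic search rather than a scan of all ranges.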
Build Stage
The interface is similar to that of a Wrapped Stage. When defining a Build Stage, you need to provide:
Input interface/schema
Steps
Steps for defining a Build Stage:
1. Select the Stage Types category in which the Stage is to be created.
2. Choose File -> New Parallel Stage -> Build from the main menu, or New Parallel Stage -> Build from the shortcut menu.
The General tab has the Stage Type, Category, Operator (by default the Build Stage name) and Class name (by default the Build Stage name).
The Creator tab has generic information about the Build Stage: version, author name and copyright information.
On the Properties page, all the options to be passed to the Build Stage at run time are defined.
The Build page contains three tabs:
Interfaces: the input and output interfaces/schemas are defined here.
Logic: contains three sections, Pre Loop, Per Record and Post Loop.
Advanced
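The relationship between the three Logic sections can be sketched in plain C++. This is an emulation under stated assumptions, not generated operator code: AverageStage and runStage are hypothetical, and in a real Build Stage the surrounding loop is supplied for you rather than written by hand.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stage that averages a numeric column.
struct AverageStage {
    bool ready = false;
    long long count = 0;
    long long total = 0;

    // Pre Loop: runs once, before any record is processed.
    void preLoop() {
        count = 0;
        total = 0;
        ready = true;
    }

    // Per Record: runs once for every input record.
    void perRecord(long long value) {
        assert(ready);  // Pre Loop must have run first
        ++count;
        total += value;
    }

    // Post Loop: runs once, after the last record.
    long long postLoop() const {
        return count == 0 ? 0 : total / count;
    }
};

// Hypothetical driver standing in for the loop around the Per Record code.
long long runStage(AverageStage& stage, const std::vector<long long>& input) {
    stage.preLoop();
    for (long long value : input) {
        stage.perRecord(value);
    }
    return stage.postLoop();
}
```

State that must survive from one record to the next (here count and total) lives outside the per-record code, which is the role the Definitions tab plays in the example later in this deck.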
Informational
Flow-control
Input and output
Transfer
This slide shows the Interfaces tab of the Build page. The tab contains the defined input and output interfaces.
These macros are used to override the default behavior of the Per Record loop in the Stage definition:
endLoop() - stops looping after completion of the current loop, and after writing any auto outputs for this loop.
nextLoop() - immediately moves control to the start of the next loop.
failStep() - returns a failed status and terminates the job.
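The effect of the three flow-control macros can be emulated in plain C++. The sketch below is not the macros themselves: LoopAction, RunResult, runLoop and samplePolicy are hypothetical, and it assumes that nextLoop() skips the auto output for the current record, which the slide does not state.

```cpp
#include <cassert>
#include <vector>

// Hypothetical outcome of one Per Record iteration.
enum class LoopAction { Continue, NextLoop, EndLoop, FailStep };

struct RunResult {
    std::vector<int> written;  // records written as "auto outputs"
    bool failed;               // true when the step failed
};

// Emulates the loop around the Per Record code.
RunResult runLoop(const std::vector<int>& input, LoopAction (*perRecord)(int)) {
    assert(perRecord != nullptr);
    RunResult result;
    result.failed = false;
    for (int record : input) {
        LoopAction action = perRecord(record);
        if (action == LoopAction::FailStep) {  // like failStep(): fail and stop
            result.failed = true;
            break;
        }
        if (action == LoopAction::NextLoop) {  // like nextLoop(): jump to next record
            continue;                          // assumed: no output for this record
        }
        result.written.push_back(record);      // auto output for this loop
        if (action == LoopAction::EndLoop) {   // like endLoop(): finish this loop, then stop
            break;
        }
    }
    return result;
}

// Example policy: fail on values above 1000, skip negatives, stop at zero.
LoopAction samplePolicy(int record) {
    if (record > 1000) return LoopAction::FailStep;
    if (record < 0) return LoopAction::NextLoop;
    if (record == 0) return LoopAction::EndLoop;
    return LoopAction::Continue;
}
```

With samplePolicy, the input 5, -1, 7, 0, 9 yields the outputs 5, 7, 0: the negative value is skipped, and the zero is still written because the loop ends only after the current iteration completes.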
Build Stage
This page contains all header file information and definitions
Example
The Definitions tab contains header files and definitions:
#include "apt_util/string.h"
#include "apt_util/ints.h"

int iHold = 0;
int iVar = 0;
int iCounter = 0;

struct extract_type {
    long long gst_i;
    long long mail_addr_i;
    char surname[32];
    long long acct_cd_seq_i;
    long long dummy_grp_seq_i;
    char grp_end_d[10];
};

struct extract_type extract_rec[100];
Example
The Pre Loop section contains code to be executed before any input is processed.
Per Record section
This section contains the logic to be applied to each record.
if (input.MAIL_ADDR_I != tempMail) {
    // reading first record
    extract_rec[i].gst_i = input.GST_I;
    extract_rec[i].mail_addr_i = input.MAIL_ADDR_I;
    extract_rec[i].acct_cd_seq_i = input.ACCT_CD_SEQ_I;
    // Begin of Grouping logic
Each input column is accessed as input.Column, where input is the name of the input interface.
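The grouping logic itself is not shown in the excerpt above. One common way to group records is key-change detection, sketched below in standalone C++. It assumes the input arrives sorted by MAIL_ADDR_I; InputRecord, GroupCollector and kMaxGroupSize are hypothetical stand-ins, with the limit of 100 mirroring the size of the extract_rec array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the columns of the input interface.
struct InputRecord {
    long long GST_I;
    long long MAIL_ADDR_I;
    long long ACCT_CD_SEQ_I;
};

// Mirrors the fixed buffer size used in the Definitions example.
const std::size_t kMaxGroupSize = 100;

// Collects consecutive records that share the same MAIL_ADDR_I.
class GroupCollector {
public:
    // Called once per record. Returns the completed previous group when
    // the key changes, otherwise an empty vector.
    std::vector<InputRecord> perRecord(const InputRecord& input) {
        std::vector<InputRecord> finished;
        if (!current_.empty() &&
            input.MAIL_ADDR_I != current_.front().MAIL_ADDR_I) {
            finished.swap(current_);  // key changed: hand back the old group
        }
        assert(current_.size() < kMaxGroupSize);  // guard the buffer limit
        current_.push_back(input);
        return finished;
    }

    // Called after the last record to return the final group.
    std::vector<InputRecord> flush() {
        std::vector<InputRecord> last;
        last.swap(current_);
        return last;
    }

private:
    std::vector<InputRecord> current_;
};
```

The final group is only complete once the input ends, so it has to be returned by a separate call after the loop, which corresponds to the Post Loop section.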
The Per Record section contains the code to be executed for each input record. This page shows the code executed for each record.
Example
The code is written in C++, like any other C++ program but without a main function.
// Write output to output interface
for (m = 0; m < i; m++) {
    output.GST_I = extract_rec[m].gst_i;
    output.MAIL_ADDR_I = extract_rec[m].mail_addr_i;
    output.ACCT_CD_SEQ_I = extract_rec[m].acct_cd_seq_i;
    output.PRIM_LAST_NAME = extract_rec[m].surname;
    // Writing the record to Output
    writeRecord(output.portid_);
}
Data is transferred to the output interface by assigning the computed values to output.Column, where output is the name of the output interface.
Output is written by calling the writeRecord(output) macro, which transfers the record to the output interface.
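The assign-then-write pattern can be emulated in standalone C++ to show that each write emits the field values as they stand at that moment. This is a sketch under stated assumptions: OutputPort, emulateWriteRecord and writeBuffered are hypothetical and only imitate the behavior described above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Buffered record, modeled on part of the extract_type struct.
struct BufferedRecord {
    long long gst_i;
    long long mail_addr_i;
    char surname[32];
};

// Hypothetical stand-in for the columns of the output interface.
struct OutputRecord {
    long long GST_I;
    long long MAIL_ADDR_I;
    std::string PRIM_LAST_NAME;
};

// Hypothetical output port: `current` holds the values assigned by the
// stage code, `emitted` holds everything written so far.
struct OutputPort {
    OutputRecord current;
    std::vector<OutputRecord> emitted;
};

// Stand-in for a record write: emits a copy of the current field values.
void emulateWriteRecord(OutputPort& port) {
    port.emitted.push_back(port.current);
}

// Copies each buffered record to the output fields, then writes it.
void writeBuffered(const BufferedRecord* records, int count, OutputPort& output) {
    assert(count >= 0);
    for (int m = 0; m < count; ++m) {
        output.current.GST_I = records[m].gst_i;
        output.current.MAIL_ADDR_I = records[m].mail_addr_i;
        output.current.PRIM_LAST_NAME = records[m].surname;
        emulateWriteRecord(output);  // one output record per buffered record
    }
}
```

Because the write copies the current values, all fields must be assigned before each write; a field left unassigned would carry over the value from the previous record.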
Example
The Post Loop section contains code to be executed after all input has been processed. It is written in the same way as the Pre Loop and Per Record sections, but is executed after the Per Record section has completed.