
Mail: Kitsonlinetrainings@gmail.com
Phone: +91 9959766329

Ab initio Interview Questions

Q.What is surrogate key?

Answer: A surrogate key is a system-generated sequential number which acts as a primary key.

Q.Differences Between Ab-Initio and Informatica?

Answer: Informatica and Ab-Initio both support parallelism, but Informatica supports only one type of parallelism while Ab-Initio supports three types:

Component parallelism
Data parallelism
Pipeline parallelism

Ab-Initio does not have a scheduler like Informatica; you need to schedule graphs through a script or run them manually.

Ab-Initio supports different types of text files, meaning you can read the same file with different structures, which is not possible in Informatica. Ab-Initio is also more user friendly than Informatica.

Informatica is an engine-based ETL tool: the power of the tool is in its transformation engine, and the code it generates after development cannot be seen or modified.

Ab-Initio is a code-based ETL tool: it generates ksh or bat code, which can be modified to achieve goals that cannot be met through the ETL tool itself.

Initial ramp-up time with Ab-Initio is quick compared to Informatica; when it comes to standardization and tuning, both probably fall into the same bucket.

Ab-Initio doesn’t need a dedicated administrator; a UNIX or NT admin will suffice, whereas Informatica needs a dedicated administrator.

With Ab-Initio you can read data with multiple delimiters in a given record, whereas Informatica forces you to have all the fields delimited by one standard delimiter.

Error handling: in Ab-Initio you can attach error and reject files to each transformation and capture and analyze the messages and data separately. Informatica has one huge log, which is very inefficient when working on a large process with numerous points of failure.

Q.What is the difference between rollup and scan?

Answer: Rollup produces one summary record per key group, whereas scan produces a cumulative (running) summary record for every input record. If we need cumulative summary records, we cannot use rollup; we use scan instead.
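The distinction can be illustrated outside Ab Initio; this Python sketch (toy records, field names are illustrative) contrasts a rollup-style per-key total with a scan-style running total:

```python
from itertools import groupby

# Sample records, assumed sorted by key (as ROLLUP/SCAN expect)
records = [("A", 10), ("A", 20), ("B", 5), ("B", 15)]

def rollup(recs):
    """One summary record per key group (ROLLUP-like)."""
    return [(k, sum(v for _, v in grp))
            for k, grp in groupby(recs, key=lambda r: r[0])]

def scan(recs):
    """A cumulative record for every input record (SCAN-like)."""
    out, totals = [], {}
    for k, v in recs:
        totals[k] = totals.get(k, 0) + v
        out.append((k, totals[k]))
    return out

print(rollup(records))  # [('A', 30), ('B', 20)]
print(scan(records))    # [('A', 10), ('A', 30), ('B', 5), ('B', 20)]
```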
Q.Why we go for Ab-Initio?

Answer: Ab-Initio is designed to support the largest and most complex business applications.

We can develop applications easily using the GDE for business requirements.

Data processing is very fast and efficient compared to other ETL tools.

It is available on both Windows NT and UNIX.

Q.What is the difference between partitioning with key and round robin?

Answer:

PARTITION BY KEY:
In this, we have to specify the key based on which the partitioning will occur. Because all records with the same key go to the same partition, the data is not guaranteed to be balanced; skew can occur if some key values are much more frequent than others. It is required for key-dependent parallelism.
PARTITION BY ROUND ROBIN:
In this, the records are partitioned in a sequential way, distributing data evenly in blocksize chunks across the output partitions. It is not key based and results in well-balanced data, especially with a blocksize of 1. It is useful for record-independent parallelism.
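A minimal Python sketch of the two strategies (four partitions, toy records; the hash here merely stands in for Ab Initio's key-partitioning function):

```python
def partition_by_key(records, key_fn, n):
    """Records with the same key always land in the same partition."""
    parts = [[] for _ in range(n)]
    for rec in records:
        parts[hash(key_fn(rec)) % n].append(rec)
    return parts

def partition_round_robin(records, n):
    """Records are dealt out in turn, giving near-zero skew."""
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts

recs = list(range(10))
rr = partition_round_robin(recs, 4)
print([len(p) for p in rr])  # [3, 3, 2, 2]
```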

Q.How to Create a Surrogate Key using Ab Initio?

Answer: A key is a field or set of fields that uniquely identifies a record in a file or table.

A natural key is a key that is meaningful in some business or real-world sense. For example, a social security number for a person, or a serial number for a piece of equipment, is a natural key.

A surrogate key is a field that is added to a record, either to replace the natural key or in addition to it, and has no business meaning. Surrogate keys are frequently added to records when populating a data warehouse, to help isolate the records in the warehouse from changes to the natural keys by outside processes.
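In practice a surrogate key is usually assigned from a sequence. This Python sketch (natural-key field and start value are illustrative, not from the source) mirrors what sequence-based assignment produces per record:

```python
from itertools import count

def add_surrogate_keys(records, start=1):
    """Attach a system-generated sequential number to each record."""
    seq = count(start)
    return [{**rec, "surrogate_key": next(seq)} for rec in records]

rows = [{"ssn": "111-22-3333"}, {"ssn": "444-55-6666"}]  # natural keys
print(add_surrogate_keys(rows))
# [{'ssn': '111-22-3333', 'surrogate_key': 1},
#  {'ssn': '444-55-6666', 'surrogate_key': 2}]
```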

Q.What are the most commonly used components in Ab-Initio graphs?

Answer:

input file / output file

input table / output table

lookup / lookup_local

reformat

gather / concatenate

join

run sql

join with db

compression components

filter by expression

sort (single or multiple keys)

rollup

partition by expression / partition by key

Q.How do we handle if DML changes dynamically?

Answer: There are many ways to handle DMLs which change dynamically within a single file.

Some of the suitable methods are to use a conditional DML, or to use the vector functionality while calling the DMLs.

Q.What is meant by limit and ramp in Ab-Initio? In which situations are they used?

Answer: Limit and ramp are the variables used to set the reject tolerance for a particular graph. This is one of the options for the reject-threshold property. The limit and ramp values must be supplied if this option is enabled.

The graph stops execution when the number of rejected records exceeds the following formula:

limit + (ramp * no_of_records_processed)

The default value is 0.0.

The limit parameter contains an integer that represents a number of reject events. The ramp parameter contains a real number that represents a rate of reject events per record processed.

Typical limit and ramp settings:

Limit = 0   Ramp = 0.0   Abort on any error
Limit = 50  Ramp = 0.0   Abort after 50 errors
Limit = 1   Ramp = 0.01  Abort if more than 2 in 100 records cause errors
Limit = 1   Ramp = 1     Never abort
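The formula can be sketched as a simple check (a Python illustration of the threshold logic, not Ab Initio's actual implementation):

```python
def should_abort(rejects, processed, limit=0, ramp=0.0):
    """Abort once rejected records exceed limit + ramp * records processed."""
    return rejects > limit + ramp * processed

# Limit = 0, Ramp = 0.0: abort on any error
print(should_abort(1, 10))                       # True
# Limit = 50, Ramp = 0.0: tolerate up to 50 errors
print(should_abort(50, 1000, limit=50))          # False
# Limit = 1, Ramp = 0.01: 3 errors in 100 records exceeds 1 + 1.0 = 2
print(should_abort(3, 100, limit=1, ramp=0.01))  # True
```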

Q.What are data mapping and data modeling?

Answer: Data mapping deals with the transformation of the extracted data at FIELD level, i.e. the transformation of the source field to the target field is specified by the mapping defined on the target field. The data mapping is specified during the cleansing of the data to be loaded.

For example:

source:
string(35) name = "Siva Krishna      ";

target:
string("01") nm = NULL(""); /* maximum length is string(35) */

Then we can have a mapping like:

Straight move. Trim the leading or trailing spaces.

The above mapping specifies the transformation of the field nm.

Q.What is the difference between a DB config and a CFG file?

Answer: A .dbc file has the information required for Ab Initio to connect to the database to extract or load tables or views, while a .cfg file is the table configuration file created by db_config while using components like Load DB Table.

Q.What is meant by Layout?

Answer: A layout is a list of host and directory locations, usually given by the URL of a file or multifile. If a layout has multiple locations but is not a multifile, the layout is a list of URLs called a custom layout.

A program component’s layout is the list of hosts and directories in which the component runs.

A dataset component’s layout is the list of hosts and directories in which the data resides. Layouts are set on the Properties Layout tab.

The layout defines the level of parallelism. Parallelism is achieved by partitioning data and computation across processors.

Q.What are Cartesian joins?

Answer: A Cartesian join will get you a Cartesian product. A Cartesian join is when you join every row of one table to every row of another table. You can also get one by joining every row of a table to every row of itself.

Q.What is the function you would use to convert a string into a decimal?
Answer: For converting a string to a decimal we need to typecast it using the following syntax:
out.decimal_field :: (decimal(size_of_decimal)) string_field;
The above statement converts the string to a decimal and populates the decimal field in the output.

Q.How do we handle if DML changes dynamically?

Answer: There are many ways to handle DMLs which change dynamically within a single file. Some of the suitable methods are to use a conditional DML, or to use the vector functionality while calling the DMLs. We can also use the MULTIREFORMAT component to handle dynamically changing DMLs.

Q.Explain the differences between api and utility mode?

Answer: API and UTILITY are the two possible interfaces used to connect to databases to perform certain user-specific tasks. These interfaces allow the user to access or use certain functions (provided by the database vendor) to perform operations on the databases. The functionality of each of these interfaces depends on the database.

API has more flexibility but is often considered slower than UTILITY mode; the trade-off is between performance and functionality.


Q.What are the uses of the is_valid and is_defined functions?

Answer:

is_valid and is_defined are predefined DML functions.

is_valid(): Tests whether a value is valid.

The is_valid function returns:

The value 1 if expr is a valid data item.

The value 0 otherwise.

If expr is a record type that has field validity checking functions, the is_valid function calls each field validity checking function. The is_valid function returns 0 if any field validity checking function returns 0 or NULL.

Examples:

is_valid(1) → 1

is_valid("oao") → 1

is_valid((decimal(8))"1,000") → 0

is_valid((date("YYYYMMDD"))"19960504") → 1

is_valid((date("YYYYMMDD"))"abcdefgh") → 0

is_valid((date("YYYY MMM DD"))"1996 May 04") → 1

is_valid((date("YYYY MMM DD"))"1996*May&04") → 0

is_defined(): Tests whether an expression is not NULL.

The is_defined function returns:

The value 1 if expr evaluates to a non-NULL value.

The value 0 otherwise.

The inverse of is_defined is is_null.
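As a rough analogue of the 1/0 behaviour (a Python sketch with hypothetical helper names; Ab Initio's own validity semantics are richer than a date parse):

```python
from datetime import datetime

def is_defined(expr):
    """1 if the value is non-NULL, else 0 (inverse of is_null)."""
    return 1 if expr is not None else 0

def is_valid_date(text, fmt="%Y%m%d"):
    """1 if text parses as a date in the given format, else 0."""
    try:
        datetime.strptime(text, fmt)
        return 1
    except (ValueError, TypeError):
        return 0

print(is_valid_date("19960504"))  # 1
print(is_valid_date("abcdefgh"))  # 0
print(is_defined(None))           # 0
```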

Q.What is meant by merge join and hash join? Where are they used in Ab Initio?

Answer: The command-line syntax for the Join component consists of two commands. The first one calls the component, and is one of two commands:

mp merge join: to process sorted input

mp hash join: to process unsorted input


Q.What is the difference between sandbox and EME? Can we perform checkin and checkout through the sandbox? Can anybody explain checkin and checkout?

Answer: Sandboxes are work areas used to develop, test or run code associated with a given project. Only one version of the code can be held within the sandbox at any time.

The EME Datastore contains all versions of the code that have been checked into it. A particular sandbox is associated with only one project, whereas a project can be checked out to a number of sandboxes.

Q.What are the Graph parameters?

Answer: The graph parameters are the ones added to the respective graph. You can add graph parameters by selecting Edit > Parameters from the menu tab. Here is an example of using graph parameters:

If you want to run the same graph for n number of files in a directory, you can assign a graph parameter to the input file name and supply the parameter value from the script before invoking the graph.

Q.How to schedule graphs in Ab Initio, like workflow schedules in Informatica? And where must we use UNIX shell scripting in Ab Initio?

Q.How to improve performance of graphs in Ab Initio? Give some examples or tips.

Answer: There are many ways to improve the performance of graphs in Ab Initio. Here are a few points:

Use an MFS system, using Partition by Round Robin.
If needed, use lookup local rather than lookup when there is a large amount of data.
Take out unnecessary components like Filter by Expression; instead provide the filtering in Reformat/Join/Rollup.
Use Gather instead of Concatenate.
Tune MAX_CORE for optimal performance.
Try to avoid more phases.
Go parallel as soon as possible using Ab Initio partitioning techniques.
Once data is partitioned, do not bring it to serial and then back to parallel; repartition instead.
For small processing jobs, serial may be better than parallel.
Do not access large files across NFS; use the FTP component.
Use ad hoc MFS to read many serial files in parallel, and use the Concatenate component.

Using phase breaks lets you allocate more memory to individual components and make your graph run faster.
Use a checkpoint after the sort rather than landing data onto disk.
Use the in-memory feature of Join and Rollup.
Best performance is gained when components can work in memory, governed by MAX_CORE.
MAX_CORE for SORT is calculated by finding the size of the input data file.
For an in-memory join, the memory needed is equal to the non-driving data size plus overhead.
If an in-memory join cannot fit its non-driving inputs in the provided MAX_CORE, it will drop all the inputs to disk and in-memory processing makes no sense.
Use Rollup and Filter by Expression as early as possible to reduce the number of records.
When joining a very small dataset to a very large dataset, it is more efficient to broadcast the small dataset to the MFS using the Broadcast component, or to use the small file as a lookup.
Use MFS, and use round-robin partitioning or load balancing if you are not joining or rolling up.
Filter the data at the beginning of the graph.
Take out unnecessary components like Filter by Expression; instead use the select expression in Join, Rollup, Reformat etc.
Use lookups instead of joins if you are joining a small table to a large table.
Take out old components; use new components like Join instead of Match Merge.
Use Gather instead of Concatenate.
Use phasing if you have too many components.
Tune MAX_CORE for optimal performance.
Avoid sorting data by using the in-memory option of Join for smaller datasets.
Use Ab Initio layouts instead of the database default to achieve parallel loads.
Change the AB_REPORT parameter to increase the monitoring duration.
Use catalogs for reusability.
Use Sort after a partition component, not before.
Partition the data as early as possible and departition the data as late as possible.
Filter unwanted fields/records as early as possible.
Try to avoid the usage of the Join with DB component.

Q.How does the force_error function work? If we set never abort in Reformat, will force_error stop the graph, or will it continue to process the next set of records?

Answer: force_error, as the name suggests, works to force an error when a mentioned condition is not met. The function can be used as per the requirement.

If you want to stop execution of the graph when a specific condition is not met (say you have to reconcile input and output record counts, and the graph should fail if the input record count is not the same as the output record count), then set the reject-threshold to "Abort on first reject" so that the graph stops.

Note: force_error directs all records meeting the condition to the reject port, with the error message going to the error port.

In certain special circumstances you can also treat the reject port as an additional data flow path leaving the component. When using force_error to direct valid records to the reject port for separate processing, you must remember that invalid records will also be sent there.
Q.What are the most commonly used components in an Ab Initio graph? Can anybody give me a practical example of a transformation of data, say customer data in a credit card company, into meaningful output based on business rules?

Answer: The most commonly used components in any Ab Initio project are:

input file / output file

input table / output table

lookup file

reformat, gather, join, run sql, join with db, compression components, sort, trash, partition by expression, partition by key, concatenate

Q.How to work with parameterized graphs?

Answer: One of the main purposes of parameterized graphs is that if we need to run the same graph n number of times for different files, we set up graph parameters like $INPUT_FILE, $OUTPUT_FILE etc. and supply the values for these under Edit > Parameters. These parameters are substituted at run time. We can set different types of parameters: positional, keyword, local etc.

The idea here is that instead of maintaining different versions of the same graph, we can maintain one version for different files.

Q.What is the use of the unused port in the join component?

Answer: While joining two input flows, records which match the join condition go to the output port, and we can get the records which do not meet the join condition at the unused ports.

Q.What is meant by dedup sort with a null key?

Answer: If we don’t use any key in the sort component while using dedup sort, then the output depends on the keep parameter. The whole input is treated as one group:

first – only the first record

last – only the last record

unique_only – there will be no records in the output file.
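With no key, the keep parameter behaves as in this Python sketch (a toy stand-in for the dedup behaviour, not the component itself):

```python
def dedup_null_key(records, keep):
    """With a null key, the whole input is one group."""
    if not records:
        return []
    if keep == "first":
        return [records[0]]
    if keep == "last":
        return [records[-1]]
    if keep == "unique_only":
        # A group of more than one record has no unique member.
        return records if len(records) == 1 else []
    raise ValueError(keep)

recs = ["r1", "r2", "r3"]
print(dedup_null_key(recs, "first"))        # ['r1']
print(dedup_null_key(recs, "last"))         # ['r3']
print(dedup_null_key(recs, "unique_only"))  # []
```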

Q.Can anyone tell me what happens when a graph runs? i.e. the Co>Operating System is at the host, and we are running the graph at some other place. How does the Co>Operating System interact with the native OS?

Answer: The Co>Operating System is layered on top of the native OS.

When a graph is executed, it has to be deployed with host settings and a connection method like rexec, telnet, rsh or rlogin. This is how the graph interacts with the Co>Op.

Whenever you press the Run button in your GDE, the GDE generates a script, and the generated script is transferred to the host specified in your GDE run settings. The Co>Operating System then interprets this script and executes it on different machines (if required) as sub-processes (threads). After completion, each sub-process returns a status code to the main process, and this main process in turn returns the error or success code of the job to the GDE.

Q.Difference between conventional loading and direct loading? When is each used in real time?

Answer:

Conventional load:
Before loading the data, all the table constraints will be checked against the data.

Direct load (faster loading):
All the constraints will be disabled. Data will be loaded directly. Later the data will be checked against the table constraints, and the bad data won’t be indexed.

API mode corresponds to conventional loading; utility mode corresponds to direct loading.

Q.Explain the environment variables with an example.

Answer: Environment variables serve as global variables in a UNIX environment. They are used for passing values from one shell/process to another. They are inherited by Ab Initio as sandbox variables/graph parameters, such as:
AI_SORT_MAX_CORE
AI_HOME
AI_SERIAL
AI_MFS etc.
To see which variables exist in your UNIX shell, find out the naming convention and type a command like env | grep AI_. This will give you a list of the variables set in the shell. You can refer to the graph parameters/components to see how these variables are used inside Ab Initio.

Q.How to find the number of arguments defined in a graph?
Answer: $* gives the list of shell arguments.

Then what are $# and $?

$# – the number of positional parameters

$? – the exit status of the last executed command

Q.How many inputs does the join component support?

Answer: Join supports a maximum of 60 inputs, and the minimum is 2 inputs.

Q.What is max-core? What are the components that use MAX_CORE?

Answer: The MAX_CORE parameter determines the maximum amount of memory, in bytes, that a specified component will use. If the component is running in parallel, the value of MAX_CORE represents the maximum memory usage per partition. If MAX_CORE is set too low, the component will run slower than expected. Too high, and the component will use too many machine resources and slow down dramatically.
The MAX_CORE parameter can be defined in the following components:

SCAN
in-memory SCAN
ROLLUP
in-memory ROLLUP
in-memory JOIN
SORT

Whenever these components are used with the parameter set to "In memory: Inputs need not be sorted", a max-core value must be specified.

Q.What does dependency analysis mean in Ab Initio?

Answer:

Dependency Analysis

It analyses the project for the dependencies within and between the graphs. The EME examines the project and develops a survey tracing how data is transformed and transferred, field by field, from component to component. Dependency analysis has two basic steps:

Translation
Analysis

Analysis Level:

In the check-in wizard’s advanced options, the analysis level can be specified as one of the following:

None:
No dependency analysis is performed during the check-in.

Translation only:

The graph being checked in is translated to datastore format, but no error checking is done. This is the minimum requirement during check-in.

Translation with checking: (Default)

Along with the translation, errors which would interfere with dependency analysis are checked for. These include:
Absolute paths
Undefined parameters
DML syntax errors
Parameter references to objects that can’t be resolved
Wrong substitution syntax in parameter definitions

Full Dependency Analysis:

Full dependency analysis is done during check-in. It is not recommended, as it takes a long time and in turn can delay the check-in process.

What to analyse:

All files:

Analyse all files in the project.

All unanalysed files:

Analyse all files that have been changed, or which are dependent on or required by files that have changed, since the last time they were analysed.

Only my checked-in files:

All files checked in by you are analysed if they have not been before.

Only the file specified:

Apply analysis to the specified file only.


Q.What is the difference between .dbc and .cfg files?

Answer: A .cfg file is for a remote connection and a .dbc file is for connecting to the database.

A .cfg file contains:

The name of the remote machine
The username/password to be used while connecting to the database
The location of the operating system on the remote machine
The connection method

A .dbc file contains:

The database name
Database version
Userid/password
Database character set, and some more.

Q.What are the graph parameters?

Answer: There are 2 types of graph parameters in Ab Initio:
1. Local parameters
2. Formal parameters (those parameters working at run time)

Q.How many types of joins are there in Ab-Initio?

Answer: Join is based on a match key for the inputs. The Join component has an out port, unused ports, reject ports and a log port.

Inner Joins:

The most common case is when join-type is Inner Join. In this case, if each input port contains a record with the same value for the key fields, the transform function is called and an output record is produced.
If some of the input flows have more than one record with that key value, the transform function is called multiple times, once for each possible combination of records, taken one from each input port. Whenever a particular key value does not have a matching record on every input port and Inner Join is specified, the transform function is not called and all incoming records with that key value are sent to the unused ports.

Full Outer Joins:

Another common case is when join-type is Full Outer Join: if each input port has a record with a matching key value, Join does the same thing as it does for an Inner Join. If some input ports do not have records with matching key values, Join applies the transform function anyway, with NULL substituted for the missing records. The missing records are in effect ignored. With an Outer Join, the transform function typically requires additional rules (as compared to an Inner Join) to handle the possibility of NULL inputs.

Explicit Joins:

The final case is when join-type is Explicit. This setting allows you to specify True or False for the record-requiredn parameter for each inn port. The settings you choose determine when Join calls the transform function.

The join-type and record-requiredn Parameters

In the Ab Initio documentation, two intersecting ovals represent the key values in the records on the two input ports, in0 and in1. For each possible setting of join-type, or (if join-type is Explicit) combination of settings for record-requiredn, a shaded region represents the inputs for which Join calls the transform. Join ignores the records that have key values in the unshaded regions, and consequently those records go to the unused port.
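The inner and full-outer behaviours above can be mimicked in Python (toy single-key inputs, simplified to at most one matching record per side; not the component API):

```python
def join(left, right, join_type="inner"):
    """left/right map key -> value; returns (key, left_val, right_val) rows."""
    out = []
    for k in sorted(set(left) | set(right)):
        if join_type == "inner" and not (k in left and k in right):
            continue  # unmatched keys go to the unused ports
        # full outer: NULL (None) is substituted for missing records
        out.append((k, left.get(k), right.get(k)))
    return out

l = {"a": 1, "b": 2}
r = {"b": 20, "c": 30}
print(join(l, r, "inner"))  # [('b', 2, 20)]
print(join(l, r, "outer"))  # [('a', 1, None), ('b', 2, 20), ('c', None, 30)]
```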

Q.What is a semi-join?

Answer: A left semi-join on two input files connected to ports in0 and in1 is an inner join where the dedup0 parameter is set to "Do not dedup this input" but dedup1 is set to "Dedup this input before joining".

Duplicates are removed only from the in1 port, that is, from Input File 2.

Semi-joins can be achieved by using the join component with the Join Type parameter set to explicit join, and the parameters record-required0 and record-required1 set one to true and the other to false, depending on whether you require a left outer or right outer join.

In Ab Initio, there are 3 types of join:
1. inner join, 2. outer join, and 3. semi join.

For an inner join, the record-requiredn parameter is true for all in ports.

For an outer join, it is false for all the in ports.

If you want a semi join, you set record-requiredn to true for the required input and false for the others.

Q.How do we run sequences of jobs? e.g. the output of job A is the input to job B; how do we coordinate the jobs?

Answer: By writing wrapper scripts, we can control the sequence of execution of more than one job.

Q.How would you do performance tuning for an already built graph? Can you give some examples?

Answer:

Examples:
1) Suppose a sort is used in front of a merge component; there is no use in using the sort, because sorting is built into the merge component.
2) Use lookup instead of the Join or Merge component.
3) Suppose we want to join the data coming from 2 files and we don’t want duplicates: we can use a union function instead of adding an additional component for removing duplicates.

Q.What is the relation between EME, GDE and the Co>Operating system?

Answer: EME stands for Enterprise Metadata Environment, GDE is the Graphical Development Environment, and the Co>Operating system can be said to be the Ab Initio server. The relation between them is as follows: the Co>Operating system is the Ab Initio server, installed on a particular OS platform that is called the native OS. The EME is just like the repository in Informatica; it holds the metadata, transformations, db config files, and source and target information. The GDE is the end-user environment where we develop the graphs (mappings, just like in Informatica). The designer uses the GDE to design graphs and saves them to the EME or sandbox; the sandbox is at the user side, whereas the EME is at the server side.

Q.When do we use dynamic DML?

Answer: Dynamic DML is used if the input metadata can change. Example: at different times, different input files are received for processing, and they have different DMLs. In that case we can use a flag in the DML; the flag is read first from the input file received, and according to the flag its corresponding DML is used.

Q.Explain the differences between Replicate and BROADCAST?

Answer: Replicate takes records from its input flow and gives a copy of the flow to each component connected to its output port. Broadcast is a partition component that copies each input record to every flow connected to its output port. Consider one example: an input file contains 4 records and the level of parallelism is 3. Replicate gives 4 records to each component connected to its out port, whereas Broadcast delivers the 4 records to each of the 3 partitions, 12 record copies in total.

Q.How do you truncate a table?

Answer: From Ab Initio, use the Run SQL component with the DDL "TRUNCATE TABLE", or use the Truncate Table component in Ab Initio.

Q.How to get DML using utilities in UNIX?

Answer: By using the command:

m_db gendml -table

Q.Explain the difference between REFORMAT and Redefine FORMAT?

Answer: Reformat changes the record format by adding or deleting fields in the DML record; the length of the record can change.

Redefine copies its input flow to its output port without any transform. Redefine is used to rename the fields in the DML, but the length of the record must not change.

Q.How to work with parameterized graphs?

Answer: Parameterized graphs specify everything through parameters, i.e. data locations of input/output files, DMLs etc.

Q.What is the driving port? When do you use it?

Answer: When you set the sorted-input parameter of the JOIN component to "In memory: Input need not be sorted", you can find the driving parameter.

Generally, the driving port is used to improve performance in a graph.

The driving input is the largest input. All other inputs are read into memory.

For example, suppose the largest input to be joined is on the in1 port. Specify a port number of 1 as the value of the driving parameter. The component then reads all other inputs to the join (for example, in0 and in2) into memory.

The default is 0, which specifies that the driving input is on port in0.

Join improves performance by loading all records from all inputs except the driving input into main memory.

The driving port in Join supplies the data that drives the join. That means every record from the driving port is compared against the data from the non-driving ports.

We set the driving port to the larger dataset so that the non-driving data, which is smaller, can be kept in main memory to speed up the operation.

Q.How can we test Ab-Initio graphs manually and with automation?

Answer: Running a graph through the GDE is a manual test.

Running a graph using the deployed script is an automated test.

Q.What is the difference between partitioning with key and round robin?

Answer: Partition by key (hash partition) is a partitioning technique used to partition data when the keys are diverse. If a key value is present in large volume, there can be a large data skew; nevertheless, this method is used more often for parallel data processing.
Round-robin partitioning is another partitioning technique that uniformly distributes the data over the destination data partitions. The skew is zero in this case when the number of records is divisible by the number of partitions. A real-life example is how a pack of 52 cards is distributed among 4 players in a round-robin manner.

Q.What is skew and skew measurement?
Answer: Skew is the measure of how unevenly data flows to each partition.

Suppose the input comes from 4 files and the total size is about 1 GB:

1 GB = 100 MB + 200 MB + 300 MB + 500 MB

The average per partition is 1000 MB / 4 = 250 MB.

For the 100 MB partition, the skew is (100 - 250) / 500 = -0.3, a negative value; calculate the same for the 200 MB, 300 MB and 500 MB partitions.

A skew value close to zero is desirable. Skew is an indirect measure of the health of a graph.
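The per-partition arithmetic can be written out as a Python sketch, using skew = (partition_size - average) / largest partition, as in the worked example above. Note the source rounds the total to 1000 MB; computing the average exactly from these sizes (1100 MB / 4 = 275 MB) gives slightly different values, and Ab Initio's own reported skew metric may be defined differently:

```python
def partition_skews(sizes_mb):
    """Per-partition skew: (size - average) / largest partition size."""
    avg = sum(sizes_mb) / len(sizes_mb)
    largest = max(sizes_mb)
    return [round((s - avg) / largest, 2) for s in sizes_mb]

sizes = [100, 200, 300, 500]  # MB across 4 partitions
print(partition_skews(sizes))  # [-0.35, -0.15, 0.05, 0.45]
```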

Q.What is the error called ‘depth not equal’?

Answer: When two components are linked together, if their layouts do not match, this problem can occur during the compilation of the graph. A solution is to use a partitioning component in between where the layout changes.


Q.Which is faster for processing, fixed-length DMLs or delimited DMLs, and why?

Answer: Fixed length, because for a delimited DML the component has to check for the delimiter every time, but for a fixed-length DML the length is taken directly.

Q.What kinds of layouts does Ab-Initio support?

Answer: Ab-Initio supports two kinds of layouts:

Serial layout

Multi layout.
In Ab-Initio, the layout tells which component should run where, and it also gives the level of parallelism.

For a serial layout, the level of parallelism is 1.

For a multi layout, the level of parallelism depends on the data partitioning.

Q.How can you run a graph infinitely?

Answer:

To run a graph infinitely, the end script of the graph should call the .ksh file of the graph. Thus, if the name of the graph is abc.mp, then in the end script of the graph there should be a call to abc.ksh. Then this graph will run infinitely.

Alternatively, run the deployed script in an infinite loop.

Q.What are local and formal parameters?

Answer: Both are graph-level parameters, but a local parameter must be initialized with a value at the time of declaration, whereas a formal parameter need not be initialized; the graph prompts for its value at the time of running.

A local parameter is like a local variable in the C language, whereas a formal parameter is like a command-line argument that we need to pass at run time.

Q.What are BROADCAST and REPLICATE?

Answer: Broadcast can do everything that Replicate does; Broadcast can also send a serial file to an MFS without splitting it, making multiple copies of the single file across the multifile. Replicate receives records in a single flow and writes a copy of that flow to each of its output flows.

Replicate generates multiple straight flows as output, whereas Broadcast produces a fan-out flow.

Replicate improves component parallelism, whereas Broadcast improves data parallelism.
Broadcast – Takes data from multiple inputs, combines it and sends it to all the output ports.

Eg – You have 2 incoming flows (this can be data parallelism or component parallelism) on the Broadcast component, one with 10 records and the other with 20 records. Then every outgoing flow (there can be any number of flows) will have 10 + 20 = 30 records.

Replicate – It replicates the data of a particular partition and sends it out to multiple output ports of the component, but maintains the partition integrity.

Eg – Your incoming flow to Replicate has a data parallelism level of 2, with one partition having 10 records and the other having 20 records. Now suppose you have 3 output flows from Replicate. Then each flow will have 2 data partitions with 10 and 20 records respectively.

Q.what is the importance of EME in


abinitio?

Answer: EME is a repository in Ab Initio used for check-in and check-out of graphs; it also maintains graph versions.

Q.What is m_dump?

Answer: It is a Co>Operating system command that we use to view data from the command prompt.

The m_dump command prints the data in a formatted way:

m_dump <record-format> <data-file>

Q.What is the syntax of the m_dump command?

Answer: m_dump <record-format> <data-file> [options], where the record format can be given as a DML file.

Q.What are the differences between different GDE versions (1.10, 1.11, 1.12, 1.13 and 1.15), and between different versions of the Co>Op?

Answer: 1.10 is a non-key version and the rest are key versions.

A lot of components were added and revised in the later versions.
Q.How to run the graph without GDE?

Answer: In the run directory a graph can be deployed as a .ksh file. Now, this .ksh file can be run at the command prompt as:

ksh <graph-name>.ksh

Q.What is the difference between a DML expression and an XFR expression?

Answer: A DML expression means the Ab-Initio DML is stored or saved in a file. DML describes the data in terms of expressions that perform simple computations, in terms of transform functions that control data transforms, and in terms of keys that specify grouping or non-grouping. In other words, DML expressions are non-embedded record-format files.

An .xfr is simply a non-embedded transform file. A transform function expresses business rules, local variables and statements, as well as the connections between these elements and the input and output fields.

Q.How does MAXCORE work?

Answer: Max-core is temporary memory used, for example, to sort records.

Max-core is a value (typically specified in bytes or KB). Whenever a component is executed it will take the amount of memory we specified for its execution.

Max-core is the maximum memory that can be used by a component in its execution.

Q.What is $mpjret? Where is it used in Ab-Initio?

Answer: $mpjret is the return value of the shell command "mp run" that executes an Ab-Initio graph.

It is generally treated as the graph execution status return value.

Q.What is the latest version that is available in Ab-Initio?

Answer: The latest version of GDE is 1.15 and of the Co>Operating system is 2.14.

Q.What is meant by the Co>Operating system and why is it special for Ab-Initio?

Answer: The name Co>Operating system itself means a lot; it is not merely an engine or interpreter. As the name says, it is an operating system which co-exists with another operating system. What does that mean? In layman's terms, Ab-Initio, unlike other applications, does not sit as a layer on top of an OS. It has quite a lot of operating-system-level capabilities, such as multifiles, memory management and so on, and in this way it integrates completely with any other OS and works jointly on the available hardware resources. This sort of synergy with the OS optimizes the utilization of the available hardware resources. Unlike other applications (including most other ETL tools) it does not work like a layer and interpret commands. That is the major difference with other ETL tools; this is the reason why Ab-Initio is much faster than any other ETL tool, and obviously much costlier as well.

Q.How to take the input data from an Excel sheet?

Answer: There is a Read Excel component that reads the Excel file either from the host or from a local drive. The DML will be a default one.

Through the Read Excel component in $AB_HOME we can read Excel files directly.

Q.How will you test a .dbc file from the command prompt?

Answer: You can test a .dbc file from the command prompt (Unix) using the m_db test command, which checks the database connection and reports the database version and user.

Q.Which one is faster for processing, fixed-length DMLs or delimited DMLs, and why?

Answer: Fixed-length DMLs are faster because the data is read directly by length without any comparisons, but with delimited ones every character has to be compared against the delimiter, which causes delays.

Q.What are the continuous components in Ab-Initio?

Answer: Continuous components are used to create graphs that produce useful output files while running continuously.

Ex: Continuous Rollup, Continuous Update, Batch Subscribe.

Q.How can I calculate the total memory


requirement of a graph?

Answer:

You can roughly calculate the memory requirement as follows:

Each partition of a component uses ~8 MB + max-core (if any).

Add the size of lookup files used in the phase (if multiple components use the same lookup, count it only once). Multiply by the degree of parallelism. Add up all components in a phase; that is how much memory is used in that phase.
Add the size of the input and output datasets. The total memory requirement of a graph is greater than that of the largest-memory phase in the graph.
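As a rough sanity check, the rule of thumb above can be written out in Python (a sketch only; the ~8 MB per-partition overhead is the approximation quoted in the answer, and real usage varies):

```python
def phase_memory_mb(components, lookup_mb=0):
    """components: list of (max_core_mb, degree_of_parallelism) pairs.
    Each partition of a component uses ~8 MB plus its max-core; lookup
    files used in the phase are counted once."""
    total = sum((8 + max_core) * dop for max_core, dop in components)
    return total + lookup_mb

# Example phase: a 4-way component with 100 MB max-core, a 4-way component
# with no max-core, and a 50 MB lookup file shared within the phase
print(phase_memory_mb([(100, 4), (0, 4)], lookup_mb=50))  # → 514
```

The graph's overall requirement is then at least the largest phase's estimate, plus the input and output datasets.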

Q.What is a multistage component?

Answer: Multistage components are nothing but the transform components where the records are transformed in five stages: input selection, temporary records initialization, processing, finalization and output selection.

Examples of multistage components are:

Rollup

Scan

Normalize

Denormalize Sorted.

Q.What is the use of Aggregate when we have Rollup? As we know, the Rollup component in Ab-Initio is used to summarize groups of data records; then where would we use Aggregate?
Answer: Rollup has better control over record selection, grouping and aggregation compared to Aggregate. Rollup is an updated version of Aggregate.

When Rollup is in template mode it has aggregation functions available, so it is better to go for Rollup.

Q.Phase versus Checkpoint?

Answer:

Difference between a phase and a checkpoint:

Phases are used to break up a graph so that it does not use up all the memory. Phasing limits the number of active components, reducing the number of components running in parallel and hence improving performance. Phases make possible the effective utilization of resources such as memory, disk space and CPU. So when we have memory-consuming components in a straight flow and the data in the flow is in millions of records, we can separate the process out into its own phase so that more CPU is allocated to it and the whole process takes less time.

Temporary files created during a phase will be deleted after completion of that phase.

Don't put a phase break after Replicate or Sort, across all-to-all flows, or after components that already create temporary files.

Checkpoints are used for the purpose of recovery.

Checkpoints are like save points. They are required if we need to rerun the graph from the last saved phase-recovery file (a phase break with checkpoint) when it fails unexpectedly.

At job start, output datasets are copied into temporary files, and after the completion of checkpointing all datasets and the job state are copied into temporary files. So if any failure occurs, the job can be rerun from the last committed checkpoint.
Use of phase breaks that include checkpoints degrades performance somewhat but ensures a save-point run.

The major difference between the two is that phasing deletes the intermediate files made at the end of each phase as soon as it enters the next phase.

On the other hand, checkpointing stores these intermediate files till the end of the graph. Thus we can easily use the intermediate files to restart the process from where it failed. This cannot be done in the case of phasing alone.

We can have phases without checkpoints.

We cannot assign checkpoints without phases.

Q.In Ab-Initio, how can you display records between 50 and 75?

Answer: Suppose the input dataset has 100 records and I want records between 50 and 75; then use m_dump with the -start 50 and -end 75 options.

For serial and MFS files there are several ways components can be used:

1. Filter by Expression: use next_in_sequence() >= 50 && next_in_sequence() <= 75.

2. We can also use multiple LEADING RECORDS components to meet the requirement.

If you have access to the Co>Op shell you can try an alternative. Say the input file is file1. Use the Run Program component in GDE and write the command:

sed -n '50,75p' file1 > file2
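The Filter by Expression approach can be mimicked with a plain counter (illustrative Python, not Ab Initio code; note that to include records 50 and 75 themselves the comparisons should be >= and <=):

```python
def records_between(records, start, end):
    """Keep records whose 1-based sequence number lies in [start, end],
    like filtering on next_in_sequence() in Filter by Expression."""
    return [rec for i, rec in enumerate(records, start=1) if start <= i <= end]

rows = [f"rec{i}" for i in range(1, 101)]   # a 100-record input dataset
selected = records_between(rows, 50, 75)
print(len(selected), selected[0], selected[-1])  # → 26 rec50 rec75
```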

Q.What is the order of evaluation of


parameters?

Answer: When you run a graph, parameters are evaluated in the following order:

The host setup script is run. Common (i.e., included) sandbox parameters are evaluated.

Sandbox parameters are evaluated.

The project-start.ksh script is run.

Graph parameters are evaluated.

The graph Start Script is run.

Then execution proceeds: processes are run simultaneously based on the components' layouts.

The lookup files are opened.

The graph metadata is checked.

The input/output file paths and files are checked.

The graph runs in order of phase 0, phase 1, phase 2, ...

Q.How do you convert a 4-way MFS to an 8-way MFS?

Answer: By repartitioning. We can use any partition method to repartition.

Partitioning methods are:

Partition by Round-robin
Broadcast
Partition by Key
Partition by Expression
Partition by Range
Partition by Percentage
Partition by Load Balance

Q.For data parallelism,we can use


partition components. For component
parallelism,we can use replicate
component.Like this which
component(s) can we use for pipeline
parallelism?

Answer: When a connected sequence of components on the same branch of a graph executes concurrently, it is called pipeline parallelism.

Components like Reformat, where we distribute the input flow to multiple output flows using output_index depending on some selection criteria and process those output flows simultaneously, create pipeline parallelism.

But components like Sort, where the entire input must be read before a single record is written to the output, cannot achieve pipeline parallelism.
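Chained generators give a rough feel for pipeline parallelism: the downstream stage consumes records while the upstream stage is still producing (a conceptual sketch, not Ab Initio code):

```python
def read_file():
    for i in range(5):
        yield i            # upstream component emits one record at a time

def reformat(flow):
    for rec in flow:
        yield rec * 2      # downstream starts consuming before upstream ends

print(list(reformat(read_file())))  # → [0, 2, 4, 6, 8]
```

A Sort-like stage, by contrast, would have to drain the whole upstream generator before yielding anything, which is exactly why it breaks the pipeline.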
Q.What is meant by "fancing" in Ab-Initio?

Answer: The word "ab initio" means "from the beginning."

Did you mean "fanning"? "fan-in"? "fan-out"?

Q.How to retrieve data from a database to a source; which component is used for this?

Answer: To unload (retrieve) data from a database such as DB2, Informix, or Oracle, we have components like Input Table and Unload DB Table; by using these two components we can unload data from the database.

Q.What is the relation between EME, GDE and the Co>Operating system?

Answer: EME stands for Enterprise Metadata Environment, GDE for Graphical Development Environment, and the Co>Operating system can be described as the Ab-Initio server.

The relation between the Co>Op, EME and GDE is as follows:
The Co>Operating system is the Ab-Initio server. It is installed on a particular OS platform, which is called the NATIVE OS. Coming to the EME, it is just like the repository in Informatica; it holds the metadata, transformations, DB config files, and source and target information. Coming to the GDE, it is the end-user environment where we develop the graphs (mappings, just like in Informatica).

The designer uses the GDE to design graphs and saves them to the EME or to a sandbox. The sandbox is on the user side, whereas the EME is on the server side.

Q.What is the use of Aggregate when we have Rollup? As we know, the Rollup component in Ab-Initio is used to summarize groups of data records; then where would we use Aggregate?

Answer: Aggregate and Rollup can both summarize data, but Rollup is much more convenient to use, and for understanding how a particular summarization is performed Rollup is much more explanatory compared to Aggregate. Rollup can also do other things, like input and output filtering of records.

Q.What kinds of layouts does Ab-Initio support?

Answer: Basically there are serial and parallel layouts supported by Ab-Initio. A graph can have both at the same time. The parallel one depends on the degree of data parallelism. If the multifile system is 4-way parallel, then a component in a graph can run 4-way parallel if the layout is defined to match the degree of parallelism.

Q.How can you run a graph infinitely?

Answer:To run a graph infinitely, the end


script in the graph should call the .ksh
file of the graph. Thus if the name of the
graph is abc.mp then in the end script of
the graph there should be a call to
abc.ksh.
Like this the graph will run infinitely.
Q.How do you add default rules in
transformer?

Answer: Double click on the transform


parameter of parameter tab page of
component properties, it will open
transform editor. In the transform editor
click on the Edit menu and then select
Add Default Rules from the drop down.
It will show two options – 1) Match
Names 2) Wildcard.

Q.Do you know what a local lookup is?

Answer: If your lookup file is a multifile and is partitioned/sorted on a particular key, then the lookup_local function can be used instead of the lookup function call. It is local to a particular partition depending on the key.

A lookup file consists of data records which can be held in main memory. This makes the transform function retrieve the records much faster than retrieving them from disk. It allows the transform component to process the data records of multiple files quickly.
Q.What is the difference between a lookup file and a lookup, with a relevant example?

Answer: Generally a lookup file represents one or more serial files (flat files). The amount of data is small enough to be held in memory. This allows transform functions to retrieve records much more quickly than they could from disk.
A lookup is a component of an Ab-Initio graph where we can store data and retrieve it by using a key parameter.

A lookup file is the physical file where the data for the lookup is stored.

Q.How to handle a DML that changes dynamically in Ab-Initio?

Answer: If the DML changes dynamically then both the DML and the XFR have to be passed as graph-level parameters at run time.

This can be done by parameterization, by a conditional record format, or by metadata-driven DML.
Q.Explain what is lookup?

Answer: A lookup is basically a specific dataset which is keyed. It can be used to map values according to the data present in a particular file (serial/multifile). The dataset can be static as well as dynamic (in the case where the lookup file is generated in a previous phase and used as a lookup file in the current phase). Sometimes hash joins can be replaced by using Reformat with a lookup, if one of the inputs to the Join contains a small number of records with a slim record length.

Ab-Initio has built-in functions to retrieve values using the key of the lookup.

Q.What is a ramp limit?

Answer: The limit parameter contains an integer that represents a number of reject events.
The ramp parameter contains a real number that represents a rate of reject events per record processed.
Number of bad records allowed = limit + (number of records * ramp).
Ramp is basically a fractional rate (from 0 to 1).
These two together provide the threshold value for bad records.
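The threshold formula can be checked with a one-line sketch (illustrative only; Ab Initio evaluates this internally, and the limit and ramp values here are made-up examples):

```python
def reject_threshold(limit, ramp, records_processed):
    """Allowed bad records = limit + records_processed * ramp (ramp in [0, 1])."""
    return limit + records_processed * ramp

# e.g. limit=10, ramp=0.25: after 200 records, up to 60 rejects are tolerated
print(reject_threshold(10, 0.25, 200))  # → 60.0
```

Because the threshold grows with the record count, a graph tolerates proportionally more rejects on a large run while still failing fast on a small one.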

Q.Have you worked with packages?

Answer: Multistage transform components use packages by default. However, a user can create his own set of functions in a transform function and include this in other transform functions.

Q.Have you used rollup component?


Describe how.

Answer: If the user wants to group the records on particular field values then Rollup is the best way to do that. Rollup is a multistage transform function and it contains the following mandatory functions:
1. initialise
2. rollup
3. finalise
You also need to declare a temporary variable if you want to get counts of a particular group.
For each group, Rollup first calls the initialise function once, followed by rollup function calls for each of the records in the group, and finally calls the finalise function once at the end of the last rollup call.
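The initialise / rollup / finalise call sequence can be modeled in plain Python (a conceptual sketch, not DML; the count temporary variable mirrors the answer's example):

```python
from itertools import groupby

def rollup(records, key):
    """Mimic Rollup (expanded mode) on grouped input: initialise once per
    group, call rollup once per record, finalise once at group end."""
    out = []
    for k, group in groupby(sorted(records, key=key), key=key):
        temp = {"count": 0}                # initialise: set up temporary record
        for rec in group:
            temp["count"] += 1             # rollup: called for each record
        out.append((k, temp["count"]))     # finalise: emit one output record
    return out

recs = [{"dept": "A"}, {"dept": "B"}, {"dept": "A"}, {"dept": "A"}]
print(rollup(recs, key=lambda r: r["dept"]))  # → [('A', 3), ('B', 1)]
```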

Q.How do you add default rules in


transformer?

Answer: In the case of Reformat, if the destination field names are the same as, or a subset of, the source fields, then there is no need to write anything in the reformat XFR, unless you want a real transform beyond reducing the set of fields or splitting the flow into a number of flows to achieve the functionality.

1)If it is not already displayed, display


the Transform Editor Grid.
2)Click the Business Rules tab if it is not
already displayed.
3)Select Edit > Add Default Rules.

Add Default Rules — Opens the Add Default Rules dialog. Select one of the following: Match Names — generates a set of rules that copies input fields to output fields with the same name. Use Wildcard (.*) Rule — generates one wildcard rule that copies input fields to output fields with the same name.

Q.What is the difference between


partitioning with key and round robin?

Answer: Partition by Key (hash partition) -> This is a partitioning technique used to partition data when the keys are diverse. If one key value is present in large volume then there can be a large data skew. This method is nevertheless used often for parallel data processing.
Round-robin partition is another partitioning technique, used to uniformly distribute the data across the destination data partitions. The skew is zero in this case when the number of records is divisible by the number of partitions. A real-life example is how a pack of 52 cards is distributed among 4 players in a round-robin manner.

If you take some 30 cards at random from a 52-card pack and use card colour (red or black) as the key, the number of cards in each partition may vary greatly. But in round robin we distribute by block size, so the variation is limited to the block size.

Partition by Key – Distributes according to the key value.

Partition by Round-robin – Distributes a predefined number of records to one flow, then the same number of records to the next flow and so on; after the last flow it resumes the pattern, distributing the records almost evenly. This pattern is called round-robin fashion.
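The card-dealing analogy maps directly onto a sketch of round-robin partitioning (illustrative Python; the block_size parameter is an assumption mirroring how records can be dealt in blocks rather than one at a time):

```python
def round_robin(records, n_partitions, block_size=1):
    """Deal records into partitions in blocks, like dealing cards to players."""
    parts = [[] for _ in range(n_partitions)]
    for i, rec in enumerate(records):
        parts[(i // block_size) % n_partitions].append(rec)
    return parts

cards = list(range(52))
parts = round_robin(cards, 4)
print([len(p) for p in parts])  # → [13, 13, 13, 13]: zero skew
```

Key-based partitioning of the same deck by colour could instead give two uneven piles, which is exactly the skew risk described above.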

Q.How do you truncate a table? (Each


candidate would say only 1 of the
several ways to do this.)

Answer: From Ab-Initio, use the Run SQL component with the DDL "truncate table".
Or use the Truncate Table component in Ab-Initio.

There are several ways to do it:

1. Probably the easiest way is to use Truncate Table.

2. Run SQL or Update Table can be used to do the same thing.

3. Run Program.

Q.Have you ever encountered an error called "depth not equal"? (This occurs when you extensively create graphs; it is a trick question.)

Answer: When two components are linked together, if their layouts do not match then this problem can occur during the compilation of the graph. A solution to this problem is to use a partitioning component in between where the layout changes.

We have talked about a situation where you have linked 2 components, each of them having different layouts.

Think about a situation where the component on the left-hand side is linked to a serial dataset and on the right-hand side the downstream component is linked to a multifile. The layout is going to be propagated from neighbours.

So without any partitioning component the jump in depth cannot be achieved, and you need a partitioning component to resolve this depth discrepancy.

Q.What is the function you would use to


transfer a string into a decimal?

Answer: In this case no specific function is required if the size of the string and the decimal are the same. Just use a decimal cast with the size in the transform function and it will suffice. For example, if the source field is defined as string(8) and the destination as decimal(8):
out.field :: (decimal(8)) in.field;
If the destination field size is smaller than the input, then string_substring can be used, like the following. Say the destination field is decimal(5):
out.field ::
(decimal(5))string_lrtrim(string_substring(in.field,1,5));
/* string_lrtrim is used to trim leading and trailing spaces */

Q.How many parallelisms are in


Abinitio? Please give a definition of
each.

Answer: There are 3 kinds of parallelism:
1) Data Parallelism
2) Component Parallelism
3) Pipeline Parallelism.

When the data is divided into small chunks and processed on different partitions simultaneously, we call it data parallelism.

When different components work on different data sets simultaneously, it is called component parallelism.

When a graph's connected components run simultaneously, each processing records as they arrive from upstream, we call it pipeline parallelism.

Q.What is multi directory?

Answer:A multi directory is a parallel


directory that is composed of individual
directories, typically on different disks
or computers. The individual directories
are partitions of the multi directory.
Each multi directory contains one
control directory and one or more data
directories. Multi files are stored in multi
directories.

Q.What is multi file?

Answer: A multifile is a parallel file that is composed of individual files, typically on different disks or computers. The individual files are partitions of the multifile. Each multifile contains one control partition and one or more data partitions. Multifiles are stored in distributed directories called multi directories.
The data in a multifile is usually divided across partitions by one of these methods:

Random or round-robin partitioning

Partitioning based on ranges or functions

Replication or broadcast, in which each partition is an identical copy of the serial data.

Q.What is meant by GDE, SDE? What is the purpose of GDE, SDE?

Answer:

GDE – Graphical Development


Environment –it is used for developing
the graphs

SDE – Shell Development Environment,


which is used for developing the korn
shell script on co>operating system.

Q.What is difference between Rollup


and Scan ?

Answer:
Rollup component:

Rollup evaluates a group of input records that have the same key and then generates data records that either summarize each group or select certain information from each group.

The Rollup component can be used in two ways: 1. Template mode 2. Expanded mode.

1. Template Mode:

This mode evaluates using built-in aggregation functions like sum, min, max, count, avg, product, first, last.

2. Expanded Mode:

This mode evaluates using user-defined functions (without aggregation functions) such as the temporary, initialize, finalize and rollup functions in the transform-function property.

Scan generates a series of cumulative summary records, such as successive year-to-date totals, for groups of data records. Scan produces intermediate summary records.

Rollup is for group-by and Scan is for successive totals. Basically, when we need to produce running summaries we use Scan; Rollup is used to aggregate data.

Q.What is Runtime Behavior of Rollup?

Answer: Rollup supports two modes.

1. Template Mode:

This mode evaluates using built-in aggregation functions like sum, min, max, count, avg, product, first, last.

2. Expanded Mode:

This mode evaluates using user-defined functions (without aggregation functions) such as the temporary, initialize, finalize and rollup functions in the transform-function property.

Rollup's behavior differs depending on whether its input is sorted or unsorted.

When Rollup input is sorted:

When you set the sorted-input parameter to "Input must be sorted or grouped" (the default), Rollup requires data records grouped according to the key parameter. If you need to group the records, use Sort with the same key specifier that you use for Rollup. Rollup then produces sorted output on its output port.

When Rollup input is unsorted:

When you set the sorted-input parameter to "In memory: Input need not be sorted", Rollup accepts ungrouped input and groups all records according to the key parameter. It does not produce sorted output.

Q.How do you do rollback in Ab-Initio?

Answer: Ab-Initio has very good recovery options for failures at run time and interruptions at development time.

Development time:

You get a recovery graph file if an interrupted failure occurs at development time.

At run time:

You get a recovery file if a failure occurs during the execution of a graph, and you can restart the execution. The recovery file has the last checkpoint information and restarts from the last checkpoint onwards.

You can roll back Ab-Initio graphs with the m_rollback command:

m_rollback -d <recovery-file> deletes all intermediate files and checkpoints.

Q.What is the internal execution (process) of Ab-Initio graphs in the Ab-Initio Co>Operating system while running the graphs?

Answer: Normally the Ab-Initio Co>Operating system checks that the code is compatible between the GDE and the Co>Operating system, and checks the layouts of any lookup files used in the graph. This is called lookup layout checking.

Graphs have input and output files, and the system checks whether their paths are correct. The sequence of processing while running a graph is given below:

Checks lookup file layouts.

Checks the metadata (whether the data types used are valid, and everything related); DML checking on a per-component basis.

Checks input files.

Checks output files.

Checks each component's layout.

Finally, it checks the flow of each process and assigns the flow type (e.g. straight).

Q.What does dependency analysis


mean in Ab-Initio?
Answer: Dependency analysis answers questions regarding data lineage, that is, where the data comes from, which applications produce it, and what depends on this data, etc.

Q.What is meant by Fencing in Ab-


Initio?

Answer: In the software world, fencing means controlling jobs on a priority basis.

In Ab-Initio it actually refers to customized phase breaking.

A well-fenced graph means that no matter what the source data volume is, the process will not get caught in deadlocks.

It actually limits the number of simultaneous processes.

In Ab-Initio you sometimes need to fence a job to stop the schedule.

Fencing is nothing but changing the priority of a particular job.

Q.What is the function of fuse


component?
Answer: Fuse combines multiple input flows into a single output flow by applying a transform function to corresponding records of each flow.

Runtime behavior of Fuse:

Fuse applies a transform function to corresponding records of each input flow. The first time the transform function executes, it uses the first record of each flow. The second time the transform function executes, it uses the second record of each flow, and so on. Fuse sends the result of the transform function to the out port.

The component works as follows. It tries to read from each of its input flows.

* If all of its input flows are finished, Fuse exits.

* Otherwise, Fuse reads one record from each still-unfinished input port and a NULL from each finished input port.
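Fuse's record pairing (one record from each unfinished flow, NULL from each finished one) behaves like zip_longest plus a transform (a conceptual sketch; None stands in for Ab Initio's NULL):

```python
from itertools import zip_longest

def fuse(transform, *flows):
    """Apply transform to corresponding records of each flow; once a flow
    is finished it contributes None (standing in for Ab Initio's NULL)."""
    return [transform(*recs) for recs in zip_longest(*flows, fillvalue=None)]

a = [1, 2, 3]          # first input flow
b = [10, 20]           # second, shorter input flow
out = fuse(lambda x, y: (x or 0) + (y or 0), a, b)
print(out)  # → [11, 22, 3]
```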
Q.What is data skew? How can you eliminate data skew while using Partition by Key?

Answer: The skew of a data or flow partition is the amount by which its size deviates from the average partition size, expressed as a percentage of the largest partition:

Skew of data = (partition size - avg. partition size) * 100 / (size of largest partition)

To reduce skew with Partition by Key, choose a key with many distinct, evenly distributed values, or repartition by round robin where key-based grouping is not required.

Q.What is $mpjret? Where it is used in


ab-Initio?

Answer:

$mpjret gives the exit status of a graph.

You can use $mpjret in the end script like this:

if [ $mpjret -eq 0 ]
then
    echo "success"
else
    mailx -s "[graph_name] failed" mail_id
fi

Q.What are primary keys and foreign
keys?

Answer: In an RDBMS the relationship between two tables is represented as a primary key and foreign key relationship, where the primary-key table is the parent table and the foreign-key table is the child table. The criterion for both tables is that there should be a matching column.

Q.What is an outer join?

Answer: An outer join is used when one wants to select all the records from a port, whether they have satisfied the join criteria or not.

If you want to see all the records of one input file independent of whether there is a matching record in the other file or not, then it is an outer join.

Q.What are Cartesian joins?

Answer: A Cartesian join joins two tables without a join key; the key should be {}.

A Cartesian join will give you a Cartesian product: you join every row of one table to every row of another table. You can also get one by joining every row of a table to every row of itself.
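The every-row-to-every-row behavior is exactly a Cartesian product, which can be illustrated with itertools.product (a sketch; in Ab-Initio's Join you would set the key to {}):

```python
from itertools import product

left = ["a", "b"]
right = [1, 2, 3]

# Joining with an empty key pairs every left row with every right row
cartesian = list(product(left, right))
print(len(cartesian))  # → 6 (= 2 x 3)
```

The output size is the product of the input sizes, which is why an accidental Cartesian join on large tables explodes.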

Q.What is the difference between a DB


config and a CFG file?

Answer: A .dbc file has the information required for Ab-Initio to connect to the database to extract or load tables or views, while a .cfg file is the table configuration file created by db_config when using components like Load DB Table.

Both DBC and CFG files are used for database connectivity; basically both have a similar use. The only difference is that a cfg file is used for the Informix database, whereas dbc files are used for other databases such as Oracle or SQL Server.

Q.What is the difference between a


Scan component and a RollUp
component?

Answer: Rollup is for group by and Scan


is for successive total. Basically, when
we need to produce summary then we
use scan. Rollup is used to aggregate
data.

Q.What are local and formal parameters?

Answer: Both are graph-level parameters, but a local parameter must be initialized with a value at the time of declaration, whereas a formal parameter need not be initialized; the graph prompts for its value at the time of running.

Q.How will you test a dbc file from


command prompt ??

Answer: try “m_db test myfile.dbc”

Q.Explain the difference between the


“truncate” and “delete” commands ?

Answer: Truncate: It is a DDL command, used to empty tables or clusters. Since it is a DDL command it auto-commits, and rollback can't be performed. It is faster than delete.
Delete: It is a DML command, generally used to delete records, clusters or tables. The rollback command can be performed in order to retrieve the earlier deleted rows. To make the deletions permanent, the "commit" command should be used.

Q.How to retrieve data from a database to a source; which component is used for this?

Answer: To unload (retrieve) data from a database such as DB2, Informix, or Oracle, we have components like Input Table and Unload DB Table; by using these two components we can unload data from the database.

Q.How many components are there in


your most complicated graph?

Answer: This is a tricky question; the number of components in a graph has nothing to do with the level of knowledge a person has. On the contrary, a properly standardized, modular, parametric approach will reduce the number of components to very few. In a well-thought-out modular and parametric design, most graphs will have 3-4 components, each doing a particular task and then calling another set of graphs to do the next, and so on. This way the total number of distinct graphs comes down drastically, and support and maintenance are much more simplified. The bottom line is, there are a lot more important things to plan than adding components.

Q.Do you know what a local lookup is?

Answer: This function is similar to a
lookup, the difference being that this
function returns NULL when there is no
record having the value that has been
mentioned in the arguments of the
function.
If it finds the matching record it
returns the complete record, that is,
all the fields along with their values
corresponding to the expression
mentioned in the lookup_local function.
e.g. lookup_local("LOOKUP_FILE", 81) ->
null
if the key on which the lookup file is
partitioned does not hold the value
mentioned.
Local Lookup files are small files that
can be accommodated into
physical memory for use in transforms.
Details like country
code/country, Currency code/currency,
forexrate/value can be used in a lookup
file and mapped during transformations.
Lookup files are not connected to any
component of the graph but available to
reformat for
mapping.

Q.How to Create Surrogate Key using Ab


Initio?

Ans. A key is a field or set of fields that


uniquely identifies a record in a file or
table.

A natural key is a key that is meaningful


in some business or real-world sense.
For example, a social security number
for a person, or a serial number for a
piece of equipment, is a natural key.
A surrogate key is a field that is added
to a record, either to replace the natural
key or in addition to it, and has no
business meaning. Surrogate keys are
frequently added to records when
populating a data warehouse, to help
isolate the records in the warehouse
from changes to the natural keys by
outside processes.

Q.How to Improve Performance of


graphs in Ab initio?
Give some examples or tips.

Ans. There are many ways to improve
the performance of graphs in
Ab-Initio.
A few points from my side:
1. Use a multifile system (MFS),
partitioning with Partition by
Round-robin.
2. If needed, use lookup_local rather
than lookup when the data is large.
3. Take out unnecessary components such
as Filter by Expression; put the filter
logic in a Reformat/Join/Rollup instead.
4. Use Gather instead of Concatenate.
5. Tune max-core for optimal
performance.
6. Try to avoid too many phases.

There are many ways the performance


of the graph can be improved.
1) Use a limited number of components
in a particular phase
2) Use optimum value of max core
values for sort and join components
3) Minimise the number of sort
components
4) Minimise sorted join component and
if possible replace them by in-memory
join/hash join
5) Use only required fields in the sort,
reformat, join components
6) Use phasing/flow buffers in case of
merge, sorted joins
7) If the two inputs are huge then use
sorted join, otherwise use hash join with
proper driving port
8) For large dataset don’t use broadcast
as partitioner
9) Minimise the use of regular
expression functions like re_index in the
trasfer functions
10) Avoid repartitioning of data
unnecessarily

Q.Describe the process steps you


would perform when defragmenting a
data table. This table contains mission
critical data ?

Answer: There are several ways to do


this:
1) We can move the table within the same
or to another tablespace and rebuild all
the indexes on the table:
'alter table ... move' reclaims the
fragmented space in the table;
'analyze table table_name compute
statistics' captures the updated
statistics.

2)Reorg could be done by taking a dump


of the table, truncate the table and
import the dump back into the table.

Q.How do we handle if DML changing


dynamically ?

Answer: There are lot many ways to


handle the DMLs which changes
dynamically with in a single file. Some
of the suitable methods are to use a
conditional DML or to call the vector
functionality while calling the DMLs.

Q.What r the Graph parameter?

Answer: There are 2 types of graph


parameters in Ab-Initio:
1. Local parameters
2. Formal parameters (whose values are
supplied at run time).

Q.What is the meaning of "Ab Initio"?

Answer: The term Ab Initio means "from
the beginning".

Q.What is a ramp limit?

Answer: Limit and Ramp.


For most graph components, we
can manually set the error
threshold limit, after which the graph
exits. There are three levels of
thresholds. "Never Exit" and
"Exit on First Occurrence" are
self-explanatory and represent
the two extremes. The third
one is Limit along with Ramp. Limit
specifies the maximum absolute number
of rejects, whereas Ramp is a
percentage of processed records. For
example, a ramp value of 5 means: if
less than 5% of the total records are
rejected, continue running; if the
rejects cross the ramp, the graph
exits. Typically development starts
with Never Exit, followed by Ramp,
and finally "Exit on First Occurrence"
in production. Case by case, Ramp can
be used in production, but it is
definitely not a desired approach.

Q.Difference between conventional


loading and direct loading ? when it is
used in real time ?

Answer:

Conventional Load:

Before loading the data all the Table


constraints will be checked against the
data.

Direct load:(Faster Loading)

All the Constraints will be disabled. Data


will be loaded directly. Later the data
will be checked against the table
constraints and the bad data won’t be
indexed.

api conventional loading

utility direct loading.

Q.How do you done the unit testing in


Ab-Initio? How will you perform the Ab-
Initio Graphs executions? How will you
increase the performance in Ab-Inito
graphs?

Answer:

The Ab-Initio Co>Operating System
handles the graph with multiple
processes running simultaneously; this
is the primary source of performance.
Beyond that, follow the actions given
below:

1. Use data separators such as "\307"
and "\007" instead of "~", "," and
other special characters, and avoid
such delimiters in the data, because
Ab-Initio has predefined these data
separators.

2. Avoid repeated aggregation in graphs.
Calculate the required aggregation once,
store the result, and reuse it through
parameters wherever it is required.

3. Avoid an excessive number of
components in a graph, and keep
max-core components to a minimum.

4. Don't write looping statements in
the start script.

5. Prefer flat files as sources.


Q.How would you do performance


tuning for already built graph?

Answer:Steps to performance Tuning


for already built graph.

Understand the functionality of the


Graph.

Modularize(i.e,check for dependencies


among components).

Give Phasing.

Check for correct parallelism.

Check the DB components (i.e., take only
the required data from the DB instead of
taking the whole data, which consumes
more time and memory).

Q.What is .abinitiorc ? What it contain?

Answer:.abinitiorc is a file which


contains the credentials to connect to
host.

Credentials like

1)Host IP

2)User-name

3)Password etc…

This is a config file for ab-Initio – in


user’s home directory and in
$AB_HOME/Config. It sets Ab-Initio
home path, configuration variables
(AB_WORK_DIR, AB_DATA_DIR, etc.),
login info (id, encrypted password),
login methods for hosts for execution
(like EME host, etc.), etc.

Q.Why might you create a stored


procedure with the ‘with recompile’
option?

Answer: Recompile is useful when the


tables referenced by the stored
procedure undergoes a lot of

modification/deletion/addition of data.
Due to the heavy modification activity,
the execution plan
becomes outdated and hence the stored
procedure performance goes down. If
we create the stored procedure with
recompile option, the sql server wont
cache a plan for this stored procedure
and it will be recompiled every time it is
run.

Q.What is the purpose of having stored


procedures in a database?

Answer: The main purpose of stored
procedures is to reduce network traffic:
all the SQL statements execute on the
database server, so execution is much
faster.

We use Run SQL and Join with DB


components to run Stored Procedures.

Q.What is mean by Co>Operating


system and why it is special for Ab-
Initio?

Answer:

Co > Operating System:Layered top to


the Native operating system.

It converts the Ab-Initio specific code


into the format, which the
UNIX/Windows can understand and
feeds it to the native operating system,
which carries out the task.

Q.How to retrieve data from database


to source in that case which component
is used for this?

Answer: To unload (retrieve) Data from


the database DB2, Informix, or Oracle
we have components like Input Table
and Unload DB Table by using these two
components we can unload data from
the database.

Input Table Component use the


following parameters:

1)db_config file(which contains


credentials to interface with Database)
2)Database Types

3)SQL file (which contains sql queries to


unload data from table(s)).

Q.How to execute the graph from start


to end stages?Tell me and how to run
graph in non Ab-Initio system?

Answer:

There are several ways to do this:

1. You can run the components phase by
phase, according to the phases you
defined.

2. You can also run the graph by
creating ksh or sh scripts.

Q.What is Join With DB?

Answer: Join with DB Component joins


records from the flow or flows
connected to its in port with records
read directly from a database, and
outputs new records containing data
based on the transform function.

Q.How do you truncate a table?

Answer: Use Truncate Table component


to truncate a table from DB in Ab-Initio.

Truncate Table Component has the


following parameters:

1)db_config file(which contains


credentials to interface with Database)

2)Database Types

3)SQL file (which contains sql queries to


truncate table(s)).

Q.Can we load multiple files?

Answer: Yes,we can load multiple file in


Ab-Initio.

Q.What is the syntax of m_dump


command?

Answer: The m_dump command prints the
data in a formatted way.

The general syntax is:

m_dump <metadata> <data> [action]

e.g.

m_dump emp.dml emp.dat -start 10 -end 20

– it will print records 10 to 20 of the
emp.dat file.

Q.How to Create Surrogate Key using


Ab-Initio?

Answer: A surrogate key is a


substitution for the natural primary key.

–It is just a unique identifier or number


for each record like ROWID of an Oracle
table.

Surrogate keys can be created using

1)next_in_sequence

2)this_partition

3)no_of_partitions

Q.Can any one give me an example of


real-time start script in the graph?

Answer: A start script is a script that
is executed before the graph execution
starts. If we want to export parameter
values to the graph, we can write the
exports in the start script; when the
graph runs, those values are available
to it.
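As a minimal sketch of the idea (the parameter name RUN_DATE and its use are hypothetical), a start script simply runs before the graph and can export values the graph later reads:

```shell
#!/bin/sh
# start script (sketch) -- runs before graph execution; exports a
# value computed at run time so the graph can pick it up as a
# parameter. RUN_DATE is an assumed, illustrative parameter name.

RUN_DATE=$(date '+%Y%m%d')
export RUN_DATE

echo "start script exported RUN_DATE=${RUN_DATE}"
```

The graph would then reference $RUN_DATE, for example inside an input-file path, instead of a hard-coded date.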

Q.What is the difference between


sandbox and EME, can we perform
checkin and checkout through
sandbox/ Can anybody explain checkin
and checkout?

Answer. Sandboxes are work areas


used to develop, test or run code
associated with a given project. Only
one version of the code can be held
within the sandbox at any time.
The EME Datastore contains all versions
of the code that have been checked into
it. A particular sandbox is associated
with only one Project where as a Project
can be checked out to a number of
sandboxes.

Q.What is skew and skew


measurement?

Answer: Skew is the measure of how
unevenly data flows to each partition.
Suppose the input comes from 4
partitions and the total size is 1 GB:
1 GB = (100 MB + 200 MB + 300 MB + 500 MB)
average = 1000 MB / 4 = 250 MB
skew of partition 1
= (100 - 250) / 500 = -0.30;
calculate likewise for the 200, 300 and
500 MB partitions.
A skew value close to zero is always
desirable; skew is an indirect measure
of graph performance.
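Using the sample partition sizes above, the same per-partition calculation can be sketched with plain shell and awk (the sizes are illustrative only):

```shell
#!/bin/sh
# Skew per partition = (partition size - average) / largest partition,
# following the worked example: sizes 100, 200, 300, 500 MB, avg 250 MB.
sizes="100 200 300 500"

awk -v sizes="$sizes" 'BEGIN {
    n = split(sizes, s, " ")
    total = 0; max = 0
    for (i = 1; i <= n; i++) {
        total += s[i]
        if (s[i] > max) max = s[i]
    }
    avg = total / n          # 1000 / 4 = 250
    for (i = 1; i <= n; i++)
        printf "partition %d: %d MB, skew = %.2f\n", i, s[i], (s[i] - avg) / max
}'
# prints skews -0.30, -0.10, 0.10, 0.50 -- the closer to zero, the better
```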

Q.What is the latest version that is


available in Ab-initio?

Answer: The latest version of the GDE
is 1.15 and of the Co>Operating System
is 2.14 (at the time of writing).

Q.What is the Difference between DML


Expression and XFR Expression ?

Answer: The main difference between DML
and XFR is that
DML represents the format (record
layout) of the metadata, while
XFR represents the transform
functions, which contain the business
rules.

Q.What are the most commonly used


components in an Ab-Initio graph? Can
anybody give me a practical example of
a transformation of data, say customer
data in a credit card company, into
meaningful output based on business
rules?

Answer: The most commonly used


components in any Ab-Initio project
are:
Input File / Output File
Input Table / Output Table
Lookup File
Reformat, Gather, Join, Run SQL, Join
with DB, Compress components, Sort,
Trash, Partition by Expression,
Partition by Key, Concatenate

Q.Have you used rollup component?


Describe how ?

Answer: Rollup component can be used


in different number of ways. It basically
acts on a group of records based on a
certain key.
The simplest application would be to
count the number of records in a certain
file or table.
In this case there would not be any key
associated with it. A temporary variable
would be created, e.g. 'temp.count',
which would be incremented with
every record (since there is no key
here, all the records are treated as one
group) that flows through the transform,
like temp.count = temp.count + 1.
Again, the rollup component can be used
to discard duplicates from a group,
rollup basically acting as the Dedup
component in this case.

Q.What is the difference between
partitioning with key and round robin?

Answer: PARTITION BY KEY:


In this, we have to specify the key based
on which the partition will occur. Since it
is key based it results in very well
balanced data. It is useful for key
dependent parallelism.
PARTITION BY ROUND ROBIN:
In this, the records are partitioned in
sequential way, distributing data evenly
in blocksize chunks across the output
partition. It is not key based and results
in well balanced data especially with
blocksize of 1. It is useful for record
independent parallelism.

Q.How to work with parameterized


graphs?

Answer: One of the main purposes of
parameterized graphs is that if we need
to run the same graph n number of
times for different files, we set up the
graph parameters like $INPUT_FILE,
$OUTPUT_FILE etc and we supply the
values for these in the
Edit>parameters.These parameters are
substituted during the run time. we can
set different types of parameters like
positional, keyword, local etc.
The idea here is, instead of maintaining
different versions of the same graph, we
can maintain one version for different
files.

Q.How does MAX-CORE work?

Answer: Max-core is a value (specified
in KB). Whenever a component is
executed, it takes up to that much of
the memory we specified for execution.

Q.What does layout means in terms of


Ab Initio?

Answer: Before you can run an Ab Initio


graph, you must specify layouts to
describe the following to the
Co>Operating System:

The location of files

The number and locations of the


partitions of multifiles

The number of, and the locations in


which, the partitions of program
components execute

A layout is one of the following:

A URL that specifies the location of a


serial file

A URL that specifies the location of the


control partition of a multifile

A list of URLs that specifies the locations


of:
The partitions of an ad hoc multifile

The working directories of a


program component

Every component in a graph — both


dataset and program components —
has a layout. Some graphs use one
layout throughout; others use several
layouts and repartition data as needed
for processing by a greater or lesser
number of processors.

During execution, a graph writes various


files in the layouts of some or all of the
components in it. For example:

An Intermediate File component writes to


disk all the data that passes through it.

A phase break, checkpoint, or watcher


writes to disk, in the layout of the
component downstream from it, all the
data passing through it.

A buffered flow writes data to disk, in the


layout of the component downstream
from it, when its buffers overflow.

Many program components — Sort is one


example — write, then read and remove,
temporary files in their layouts.

A checkpoint in a continuous graph


writes files in the layout of every
component as it moves through the
graph.

Q.Can we load multiple files?

Answer: Load multiple files from my


perspective means writing into more
than one file at a time. If this is the
same case with you, Ab-Initio provides a
component called Write Multiple Files
(in the Dataset component group) which
can write multiple files at a time. But
the files to be written must be local
files, i.e., they should reside on your
local machine. For more information,
read about this component in the help
file.

Q.How would you do performance


tuning for already built graph ? Can you
let me know some examples?

Answer: Example: suppose a Sort is
used in front of a Merge component;
there is no use in adding the Sort,
since:
1) the Merge component has sorting
built in;
2) we can use a lookup instead of a
Join or Merge component;
3) suppose we want to join the data
coming from 2 files and we don't want
duplicates: we can use a union function
instead of adding an additional
component to remove duplicates.

Q.Which one is faster for processing


fixed length dmls or delimited dmls and
why ?

Answer: Fixed-length DMLs are faster
because the data is read directly by
length without any comparisons, but
with delimited DMLs every character has
to be compared against the delimiter,
and hence the delay.

Q.What is the function you would use to


transfer a string into a decimal?

Answer: For converting a string to a


decimal we need to typecast it using the
following syntax,
out.decimal_field :: ( decimal(
size_of_decimal ) ) string_field;
The above statement converts the
string to decimal and populates it to the
decimal field in output.

Q.What is the importance of EME in ab


initio?

Answer: EME is a repository in Ab


Inition and it used for checkin and
checkout for graphs also maintains
graph version.

Q.How do you add default rules in


transformer?

Answer: Double click on the transform


parameter of parameter tab page of
component properties, it will open
transform editor. In the transform editor
click on the Edit menu and then select
Add Default Rules from the dropdown. It
will show two options – 1) Match
Names 2) Wildcard.

Q.What is data mapping and data


modeling?

Answer: data mapping deals with the


transformation of the extracted data at
FIELD level i.e. the transformation of the
source field to target field is specified
by the mapping defined on the target
field. The data mapping is specified
during the cleansing of the data to be
loaded.
For Example:
source;
string(35) name = “Siva Krishna “;
target;
string(“01”) nm=NULL(“”);/*(maximum
length is string(35))*/
Then we can have a mapping like:
Straight move; trim the leading or
trailing spaces.
The above mapping specifies the
transformation of the field nm.

Q.What are the continuous components
in Ab-Initio?

Answer: Continuous components are used
to create graphs that produce useful
output files while running continuously.
Ex: Continuous Rollup, Continuous
Update, Batch Subscribe

Q.How do you add default rules in


transformer?

Answer: Open the transformer, go to the
Edit menu, then click Add Default Rules.
In Ab-Initio there is a concept called
rule priority, in which you can assign
a priority to the rules in a
transformer.
Let's have an example:
Output.var1 :1: input.var1 + 10
Output.var1 :2: 100
This example shows that the output
variable is assigned the input variable
+ 10, or, if the input variable does not
have a value, the default value 100 is
set on the output variable.
The numbers 1 and 2 represent the
priority.

Q.How do we run sequences of jobs,
where for example the output of job A
is the input to job B? How do we
coordinate the jobs?

Answer: By writing the wrapper scripts


we can control the sequence of
execution of more than one job.
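A minimal wrapper-script sketch of this idea (job_a and job_b are hypothetical stand-ins for deployed graph scripts such as job_a.ksh and job_b.ksh): run job A, and start job B only if A succeeded:

```shell
#!/bin/sh
# wrapper (sketch) -- coordinate two jobs where B consumes A's output.
# In a real project these functions would instead invoke the deployed
# graph scripts; stub functions stand in for them here.

job_a() { echo "job A: writing intermediate file"; }
job_b() { echo "job B: reading intermediate file"; }

if ! job_a; then
    echo "job A failed; not starting job B" >&2
    exit 1
fi

job_b
echo "sequence complete"
```

The same pattern extends to longer chains: each step runs only when the previous step's exit status is zero.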

Q.What are BROADCAST and
REPLICATE?
Answer: Broadcast takes data from
multiple inputs, combines it and sends
it to all the output ports.
Eg: You have 2 incoming flows (this
can be data parallelism or component
parallelism) on a Broadcast component,
one with 10 records and the other with
20 records. Then all the outgoing flows
(any number of flows) will each carry
10 + 20 = 30 records.
Replicate replicates the data of a
particular partition and sends it out
on multiple output ports of the
component, but maintains the partition
integrity.
Eg: Your incoming flow to Replicate has
a data parallelism level of 2, with one
partition having 10 records and the
other having 20. Now suppose you have
3 output flows from Replicate. Then
each flow will have 2 data partitions
with 10 and 20 records respectively.
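The record counts in the two examples can be simulated with plain shell (temporary files stand in for the partitions; this only illustrates the counting, not the components themselves):

```shell
#!/bin/sh
# Two incoming partitions of 10 and 20 records (simulated with files).
tmp=$(mktemp -d)
seq 1 10 > "$tmp/p0"
seq 1 20 > "$tmp/p1"

# Broadcast: inputs are combined, so every output flow sees 10 + 20 = 30.
cat "$tmp/p0" "$tmp/p1" > "$tmp/broadcast_out"
echo "broadcast: each output flow carries $(wc -l < "$tmp/broadcast_out" | tr -d ' ') records"

# Replicate: each output flow keeps the partitions as-is (10 and 20).
cp "$tmp/p0" "$tmp/rep_out_p0"
cp "$tmp/p1" "$tmp/rep_out_p1"
echo "replicate: each output flow keeps partitions of $(wc -l < "$tmp/rep_out_p0" | tr -d ' ') and $(wc -l < "$tmp/rep_out_p1" | tr -d ' ') records"
# (files under $tmp can be removed afterwards)
```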

Q.When using multiple DML statements


to perform a single unit of work, is it
preferable to use implicit or explicit
transactions, and why.

Answer: Implicit transactions are
handled internally by the database,
while explicit transactions are opened
by the user. When multiple DML
statements form a single unit of work,
explicit transactions are preferable,
since the whole unit can be committed
or rolled back together.

Q.What are kinds of layouts does ab


initio supports

Answer: Basically there are serial and


parallel layouts supported by AbInitio. A
graph can have both at the same time.
The parallel one depends on the degree
of data parallelism. If the multi-file
system is 4-way parallel then a
component in a graph can run 4 way
parallel if the layout is defined such as
it’s same as the degree of parallelism.

Q.What is the difference between look-


up file and look-up, with a relevant
example?

Answer: A lookup is a component of


abinitio graph where we can store data
and retrieve it by using a key parameter.
A lookup file is the physical file where
the data for the lookup is stored.

Q.How will you test a dbc file from


command prompt?

Answer: A .dbc file can be tested using
the m_db command

eg: m_db test myfile.dbc

Q.Can we merge two graphs?

Answer: You can not merge two ab-


Initio graphs. You can use the output of
one graph as input for another. You can
also copy/paste the contents between
graphs.

Q.Explain the differences between api


and utility mode?

Answer: api and Utility are Database


Interfaces.

api use SQL where table constrains are


checked against the data before loading
data into Database.

Utility uses Bulk Loading where table


constraints are disabled first and data
loaded into Database and then table
constraints are checked against data.

Data loading using utility mode is
faster when compared to api mode. If a
crash occurs while loading data into
the database, we can commit and roll
back in api mode, but in utility mode
we need to reload the whole data.

Q.How to Schedule Graphs in Ab-


Initio,like work flow Schedule in
Informatica? And where we must use
Unix shell scripting in Ab-Initio?

Answer: We can use Autosys, Control-


M, or any other external scheduler to
schedule graphs in Ab-Initio.

We can take care of dependencies in


many ways. For example, if scripts
should run sequentially, we can arrange
for this in Autosys, or we can create a
wrapper script and put there several
sequential commands (nohup
command1.ksh & ; nohup
command2.ksh &; etc). We can even
create a special graph in Ab-Initio to
execute individual scripts as needed.

Q.What is Environment project in Ab-


Initio?

Answer: Environment project is a


special public project that exists in
every Ab-Initio environment. It contains
all the environment parameters required
by the private or public projects which
constitute AI Standard Environment.

Q.What is Component Folding?What is


the use of it?

Answer: Component Folding is a new


feature by which Co>operating System
combines a group of components and
runs them as a single process.

Component Folding improves the


performance of graph.

Pre-Requirements for Component


Folding

The components must be foldable.


They must be in same phase and
layout.
Components must be connected via
straight flow

Q.How do you debug a graph if an error
occurs while running?

Answer: There are many ways to debug


a graph. we can use

Debugger
File Watcher
Intermediate File for debugging
purpose.

Q.What do u mean by $RUN?

Answer: This is a parameter variable;
it contains the path of the project
sandbox's run directory. Use this
parameter instead of a hard-coded
value; it is the default sandbox
run-directory parameter.

fin -------> top-level directory ($AI_PROJECT)

|---- mp ------> second-level directory ($AI_MP)
|---- xfr -----> second-level directory ($AI_XFR)
|---- run -----> second-level directory ($AI_RUN)
|---- dml -----> second-level directory ($AI_DML)

Q.What is the importance of EME in ab-


Initio?

Answer: EME is a repository in Ab-Initio


and it used for check-in and checkout
for graphs also maintains graph version.

EME is the source code control system
in the Ab-Initio world. It is the
repository where the code versions of
all sandbox (project) objects,
including graph versions, are
maintained; we simply check graphs in
and out and modify them accordingly. A
lock is put on an object once it is
accessed by any user.

Q.What is difference between sandbox


parameters and graph parameters?

Answer: Sandbox parameters are
common parameters for the project;
they are accessible anywhere within
the project. Graph parameters are used
within a graph and cannot be accessed
from other graphs; these are called
local parameters.

Q.How do you connect EME to Ab-Initio


Server?

Answer: There are several ways of
connecting to the EME:
Set AB_AIR_ROOT
From the GDE, connect to the EME
data-store
Log in to the EME web interface
Use the air command from the command
line

Q.What is use of co>operating system


between GDE and Host?

Answer: The Co>Operating System is the
heart of the GDE. It always refers to
the host settings, environment
variables and functions while running
graphs through the GDE, and it
interfaces the connection-setting
information between the host and the
GDE.

Q.What is the use of Sandbox ? What is


it.?

Answer: Sandbox is a directory


structure of which each directory level is
assigned a variable name, is used to
manage check-in and checkout of
repository based objects such as mp,
run, dml, db, xfr and sql (graphs, graph
ksh files, wrapper scripts, dml files, xfr
files, dbc files, sql files.)

Fin ——-> top-level directory (


$AI_PROJECT )

|—- mp ——-> second level directory


($AI_MP )

|—- xfr ——-> second level directory


($AI_XFR )

|—- run ——–> second level directory


($AI_RUN )

|—- dml ——-> second level directory


($AI_DML )

Sandbox contains various directories,


which is used for specific purpose only.
The mp directory is used for storing
data mapping details about between
sources and targets or components and
the file extension must be *.mp. The xfr
directory denotes purpose of stores the
transform files and the file extension
must be *.xfr. The dml directory is used
for storing all meta-data information of
data with Ab-Initio supported data types
and the file extensions must be *.dml.
The run directory contains only the
graph’s shell script (korn shell script)
files that are created after deploying the
graph.

In short, the sandbox stores all of
these kinds of project information.

Q.What is mean by EME Data Store and


what is use of EME Data Store in
Enterprise world?

Answer: The EME Data Store is the
Enterprise Meta Environment data store
(enterprise repository); it contains
any number of projects (sandboxes)
that share metadata between them.
These sandbox project objects (mp,
run, db, xfr, dml) can easily be
checked in to and checked out of the
repository.

Mode:

In the EME Data-store Mode box of the


EME Data-store Settings dialog, choose
one of the following:

Source Code Control — This is the


recommended setting. When you set a
data-store to this mode, you must check
out a project in order to work on it. This
prevents multiple users from making
conflicting changes to a project.

Full Access — This setting is strongly


not recommended. It is for advanced
users only. It allows you to edit a project
in the data-store without checking it out.

Save Script When Graph Saved to


Sandbox

In the EME Data-store Settings dialog,


select this option to have the GDE save
the script it generates for a graph when
you save the graph. The script lets you
run the graph without the GDE if, for
example, you relocate the project.
