
Mail: Kitsonlinetrainings@gmail.com
Phone: +91 9959766329

Ab initio Interview Questions

Q.What is surrogate key?

Answer: A surrogate key is a system-generated sequential number which acts as a primary key.

Q.Differences Between Ab-Initio and Informatica?

Answer: Informatica and Ab-Initio both support parallelism, but Informatica supports only one type of parallelism while Ab-Initio supports three types:

Component parallelism
Data parallelism
Pipeline parallelism

Ab-Initio does not have a scheduler like Informatica; you need to schedule graphs through a script or run them manually.

Ab-Initio supports different types of text files, meaning you can read the same file with different structures, which is not possible in Informatica. Ab-Initio is also more user friendly than Informatica.

Informatica is an engine-based ETL tool: the power of the tool is in its transformation engine, and the code it generates after development cannot be seen or modified.

Ab-Initio is a code-based ETL tool: it generates ksh or bat code, which can be modified to achieve goals that cannot be met through the ETL tool itself.

Initial ramp-up time with Ab-Initio is quick compared to Informatica; when it comes to standardization and tuning, both probably fall into the same bucket.

Ab-Initio doesn’t need a dedicated administrator; a UNIX or NT admin will suffice, whereas Informatica needs a dedicated administrator.

With Ab-Initio you can read data with multiple delimiters in a given record, whereas Informatica forces you to have all the fields delimited by one standard delimiter.

Error handling: in Ab-Initio you can attach error and reject files to each transformation and capture and analyze the messages and data separately. Informatica has one huge log, which is very inefficient when working on a large process with numerous points of failure.

Q.What is the difference between rollup and scan?

Answer: Rollup produces one summary record per key group, whereas scan produces a cumulative (running) summary record for every input record. If we need cumulative summary records, we cannot use rollup; we use scan instead.
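The distinction can be illustrated outside Ab Initio; this Python sketch (toy records, field names are illustrative) contrasts a rollup-style per-key total with a scan-style running total:

```python
from itertools import groupby

# Sample records, assumed sorted by key (as ROLLUP/SCAN expect)
records = [("A", 10), ("A", 20), ("B", 5), ("B", 15)]

def rollup(recs):
    """One summary record per key group (ROLLUP-like)."""
    return [(k, sum(v for _, v in grp))
            for k, grp in groupby(recs, key=lambda r: r[0])]

def scan(recs):
    """A cumulative record for every input record (SCAN-like)."""
    out, totals = [], {}
    for k, v in recs:
        totals[k] = totals.get(k, 0) + v
        out.append((k, totals[k]))
    return out

print(rollup(records))  # [('A', 30), ('B', 20)]
print(scan(records))    # [('A', 10), ('A', 30), ('B', 5), ('B', 20)]
```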
Q.Why we go for Ab-Initio?

Answer: Ab-Initio is designed to support the largest and most complex business applications.

We can develop applications easily using the GDE for business requirements.

Data processing is very fast and efficient compared to other ETL tools.

It is available on both Windows NT and UNIX.

Q.What is the difference between partitioning with key and round robin?

Answer:

PARTITION BY KEY:
In this, we have to specify the key based on which the partitioning will occur. Because all records with the same key go to the same partition, the data is not guaranteed to be balanced; skew can occur if some key values are much more frequent than others. It is required for key-dependent parallelism.
PARTITION BY ROUND ROBIN:
In this, the records are partitioned in a sequential way, distributing data evenly in blocksize chunks across the output partitions. It is not key based and results in well-balanced data, especially with a blocksize of 1. It is useful for record-independent parallelism.
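A minimal Python sketch of the two strategies (four partitions, toy records; the hash here merely stands in for Ab Initio's key-partitioning function):

```python
def partition_by_key(records, key_fn, n):
    """Records with the same key always land in the same partition."""
    parts = [[] for _ in range(n)]
    for rec in records:
        parts[hash(key_fn(rec)) % n].append(rec)
    return parts

def partition_round_robin(records, n):
    """Records are dealt out in turn, giving near-zero skew."""
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts

recs = list(range(10))
rr = partition_round_robin(recs, 4)
print([len(p) for p in rr])  # [3, 3, 2, 2]
```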

Q.How to Create a Surrogate Key using Ab Initio?

Answer: A key is a field or set of fields that uniquely identifies a record in a file or table.

A natural key is a key that is meaningful in some business or real-world sense. For example, a social security number for a person, or a serial number for a piece of equipment, is a natural key.

A surrogate key is a field that is added to a record, either to replace the natural key or in addition to it, and has no business meaning. Surrogate keys are frequently added to records when populating a data warehouse, to help isolate the records in the warehouse from changes to the natural keys by outside processes.
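In practice a surrogate key is usually assigned from a sequence. This Python sketch (natural-key field and start value are illustrative, not from the source) mirrors what sequence-based assignment produces per record:

```python
from itertools import count

def add_surrogate_keys(records, start=1):
    """Attach a system-generated sequential number to each record."""
    seq = count(start)
    return [{**rec, "surrogate_key": next(seq)} for rec in records]

rows = [{"ssn": "111-22-3333"}, {"ssn": "444-55-6666"}]  # natural keys
print(add_surrogate_keys(rows))
# [{'ssn': '111-22-3333', 'surrogate_key': 1},
#  {'ssn': '444-55-6666', 'surrogate_key': 2}]
```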

Q.What are the most commonly used components in Ab-Initio graphs?

Answer:

input file / output file

input table / output table

lookup / lookup_local

reformat

gather / concatenate

join

run sql

join with db

compression components

filter by expression

sort (single or multiple keys)

rollup

partition by expression / partition by key

Q.How do we handle if DML changes dynamically?

Answer: There are many ways to handle DMLs which change dynamically within a single file.

Some of the suitable methods are to use a conditional DML, or to use the vector functionality while calling the DMLs.

Q.What is meant by limit and ramp in Ab-Initio? In which situations are they used?

Answer: Limit and ramp are the variables used to set the reject tolerance for a particular graph. This is one of the options for the reject-threshold property. The limit and ramp values must be supplied if this option is enabled.

The graph stops execution when the number of rejected records exceeds the following formula:

limit + (ramp * no_of_records_processed)

The default value is 0.0.

The limit parameter contains an integer that represents a number of reject events. The ramp parameter contains a real number that represents a rate of reject events per record processed.

Typical limit and ramp settings:

Limit = 0   Ramp = 0.0   Abort on any error
Limit = 50  Ramp = 0.0   Abort after 50 errors
Limit = 1   Ramp = 0.01  Abort if more than 2 in 100 records cause errors
Limit = 1   Ramp = 1     Never abort
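The formula can be sketched as a simple check (a Python illustration of the threshold logic, not Ab Initio's actual implementation):

```python
def should_abort(rejects, processed, limit=0, ramp=0.0):
    """Abort once rejected records exceed limit + ramp * records processed."""
    return rejects > limit + ramp * processed

# Limit = 0, Ramp = 0.0: abort on any error
print(should_abort(1, 10))                       # True
# Limit = 50, Ramp = 0.0: tolerate up to 50 errors
print(should_abort(50, 1000, limit=50))          # False
# Limit = 1, Ramp = 0.01: 3 errors in 100 records exceeds 1 + 1.0 = 2
print(should_abort(3, 100, limit=1, ramp=0.01))  # True
```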

Q.What are data mapping and data modeling?

Answer: Data mapping deals with the transformation of the extracted data at FIELD level, i.e. the transformation of the source field to the target field is specified by the mapping defined on the target field. The data mapping is specified during the cleansing of the data to be loaded.

For example:

source:
string(35) name = "Siva Krishna      ";

target:
string("01") nm = NULL(""); /* maximum length is string(35) */

Then we can have a mapping like:

Straight move. Trim the leading or trailing spaces.

The above mapping specifies the transformation of the field nm.

Q.What is the difference between a DB config and a CFG file?

Answer: A .dbc file has the information required for Ab Initio to connect to the database to extract or load tables or views, while a .cfg file is the table configuration file created by db_config while using components like Load DB Table.

Q.What is meant by Layout?

Answer: A layout is a list of host and directory locations, usually given by the URL of a file or multifile. If a layout has multiple locations but is not a multifile, the layout is a list of URLs called a custom layout.

A program component’s layout is the list of hosts and directories in which the component runs.

A dataset component’s layout is the list of hosts and directories in which the data resides. Layouts are set on the Properties Layout tab.

The layout defines the level of parallelism. Parallelism is achieved by partitioning data and computation across processors.

Q.What are Cartesian joins?

Answer: A Cartesian join will get you a Cartesian product. A Cartesian join is when you join every row of one table to every row of another table. You can also get one by joining every row of a table to every row of itself.

Q.What is the function you would use to convert a string into a decimal?
Answer: For converting a string to a decimal we need to typecast it using the following syntax:
out.decimal_field :: (decimal(size_of_decimal)) string_field;
The above statement converts the string to a decimal and populates the decimal field in the output.

Q.How do we handle if DML changes dynamically?

Answer: There are many ways to handle DMLs which change dynamically within a single file. Some of the suitable methods are to use a conditional DML, or to use the vector functionality while calling the DMLs. We can also use the MULTIREFORMAT component to handle dynamically changing DMLs.

Q.Explain the differences between api and utility mode?

Answer: API and UTILITY are the two possible interfaces used to connect to databases to perform certain user-specific tasks. These interfaces allow the user to access or use certain functions (provided by the database vendor) to perform operations on the databases. The functionality of each of these interfaces depends on the database.

API has more flexibility but is often considered slower than UTILITY mode; the trade-off is between performance and functionality.


Q.What are the uses of the is_valid and is_defined functions?

Answer:

is_valid and is_defined are predefined DML functions.

is_valid(): Tests whether a value is valid.

The is_valid function returns:

The value 1 if expr is a valid data item.

The value 0 otherwise.

If expr is a record type that has field validity checking functions, the is_valid function calls each field validity checking function. The is_valid function returns 0 if any field validity checking function returns 0 or NULL.

Examples:

is_valid(1) → 1

is_valid("oao") → 1

is_valid((decimal(8))"1,000") → 0

is_valid((date("YYYYMMDD"))"19960504") → 1

is_valid((date("YYYYMMDD"))"abcdefgh") → 0

is_valid((date("YYYY MMM DD"))"1996 May 04") → 1

is_valid((date("YYYY MMM DD"))"1996*May&04") → 0

is_defined(): Tests whether an expression is not NULL.

The is_defined function returns:

The value 1 if expr evaluates to a non-NULL value.

The value 0 otherwise.

The inverse of is_defined is is_null.
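As a rough analogue of the 1/0 behaviour (a Python sketch with hypothetical helper names; Ab Initio's own validity semantics are richer than a date parse):

```python
from datetime import datetime

def is_defined(expr):
    """1 if the value is non-NULL, else 0 (inverse of is_null)."""
    return 1 if expr is not None else 0

def is_valid_date(text, fmt="%Y%m%d"):
    """1 if text parses as a date in the given format, else 0."""
    try:
        datetime.strptime(text, fmt)
        return 1
    except (ValueError, TypeError):
        return 0

print(is_valid_date("19960504"))  # 1
print(is_valid_date("abcdefgh"))  # 0
print(is_defined(None))           # 0
```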

Q.What is meant by merge join and hash join? Where are they used in Ab Initio?

Answer: The command-line syntax for the Join component consists of two commands. The first one calls the component, and is one of two commands:

mp merge join: to process sorted input

mp hash join: to process unsorted input


Q.What is the difference between sandbox and EME? Can we perform checkin and checkout through the sandbox? Can anybody explain checkin and checkout?

Answer: Sandboxes are work areas used to develop, test or run code associated with a given project. Only one version of the code can be held within the sandbox at any time.

The EME Datastore contains all versions of the code that have been checked into it. A particular sandbox is associated with only one project, whereas a project can be checked out to a number of sandboxes.

Q.What are the Graph parameters?

Answer: The graph parameters are the ones added to the respective graph. You can add graph parameters by selecting Edit > Parameters from the menu tab. Here is an example of using graph parameters:

If you want to run the same graph for n number of files in a directory, you can assign a graph parameter to the input file name and supply the parameter value from the script before invoking the graph.

Q.How to schedule graphs in Ab Initio, like workflow schedules in Informatica? And where must we use UNIX shell scripting in Ab Initio?

Q.How to improve performance of graphs in Ab Initio? Give some examples or tips.

Answer: There are many ways to improve the performance of graphs in Ab Initio. Here are a few points:

Use an MFS system, using Partition by Round Robin.
If needed, use lookup local rather than lookup when there is a large amount of data.
Take out unnecessary components like Filter by Expression; instead provide the filtering in Reformat/Join/Rollup.
Use Gather instead of Concatenate.
Tune MAX_CORE for optimal performance.
Try to avoid more phases.
Go parallel as soon as possible using Ab Initio partitioning techniques.
Once data is partitioned, do not bring it to serial and then back to parallel; repartition instead.
For small processing jobs, serial may be better than parallel.
Do not access large files across NFS; use the FTP component.
Use ad hoc MFS to read many serial files in parallel, and use the Concatenate component.

Using phase breaks lets you allocate more memory to individual components and make your graph run faster.
Use a checkpoint after the sort rather than landing data onto disk.
Use the in-memory feature of Join and Rollup.
Best performance is gained when components can work in memory, governed by MAX_CORE.
MAX_CORE for SORT is calculated by finding the size of the input data file.
For an in-memory join, the memory needed is equal to the non-driving data size plus overhead.
If an in-memory join cannot fit its non-driving inputs in the provided MAX_CORE, it will drop all the inputs to disk and in-memory processing makes no sense.
Use Rollup and Filter by Expression as early as possible to reduce the number of records.
When joining a very small dataset to a very large dataset, it is more efficient to broadcast the small dataset to the MFS using the Broadcast component, or to use the small file as a lookup.
Use MFS, and use round-robin partitioning or load balancing if you are not joining or rolling up.
Filter the data at the beginning of the graph.
Take out unnecessary components like Filter by Expression; instead use the select expression in Join, Rollup, Reformat etc.
Use lookups instead of joins if you are joining a small table to a large table.
Take out old components; use new components like Join instead of Match Merge.
Use Gather instead of Concatenate.
Use phasing if you have too many components.
Tune MAX_CORE for optimal performance.
Avoid sorting data by using the in-memory option of Join for smaller datasets.
Use Ab Initio layouts instead of the database default to achieve parallel loads.
Change the AB_REPORT parameter to increase the monitoring duration.
Use catalogs for reusability.
Use Sort after a partition component, not before.
Partition the data as early as possible and departition the data as late as possible.
Filter unwanted fields/records as early as possible.
Try to avoid the usage of the Join with DB component.

Q.How does the force_error function work? If we set never abort in Reformat, will force_error stop the graph, or will it continue to process the next set of records?

Answer: force_error, as the name suggests, works to force an error when a mentioned condition is not met. The function can be used as per the requirement.

If you want to stop execution of the graph when a specific condition is not met (say you have to reconcile input and output record counts, and the graph should fail if the input record count is not the same as the output record count), then set the reject-threshold to "Abort on first reject" so that the graph stops.

Note: force_error directs all records meeting the condition to the reject port, with the error message going to the error port.

In certain special circumstances you can also treat the reject port as an additional data flow path leaving the component. When using force_error to direct valid records to the reject port for separate processing, you must remember that invalid records will also be sent there.
Q.What are the most commonly used components in an Ab Initio graph? Can anybody give me a practical example of a transformation of data, say customer data in a credit card company, into meaningful output based on business rules?

Answer: The most commonly used components in any Ab Initio project are:

input file / output file

input table / output table

lookup file

reformat, gather, join, run sql, join with db, compression components, sort, trash, partition by expression, partition by key, concatenate

Q.How to work with parameterized graphs?

Answer: One of the main purposes of parameterized graphs is that if we need to run the same graph n number of times for different files, we set up graph parameters like $INPUT_FILE, $OUTPUT_FILE etc. and supply the values for these under Edit > Parameters. These parameters are substituted at run time. We can set different types of parameters: positional, keyword, local etc.

The idea here is that instead of maintaining different versions of the same graph, we can maintain one version for different files.

Q.What is the use of the unused port in the join component?

Answer: While joining two input flows, records which match the join condition go to the output port, and we can get the records which do not meet the join condition at the unused ports.

Q.What is meant by dedup sort with a null key?

Answer: If we don’t use any key in the sort component while using dedup sort, then the output depends on the keep parameter. The whole input is treated as one group:

first – only the first record

last – only the last record

unique_only – there will be no records in the output file.
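With no key, the keep parameter behaves as in this Python sketch (a toy stand-in for the dedup behaviour, not the component itself):

```python
def dedup_null_key(records, keep):
    """With a null key, the whole input is one group."""
    if not records:
        return []
    if keep == "first":
        return [records[0]]
    if keep == "last":
        return [records[-1]]
    if keep == "unique_only":
        # A group of more than one record has no unique member.
        return records if len(records) == 1 else []
    raise ValueError(keep)

recs = ["r1", "r2", "r3"]
print(dedup_null_key(recs, "first"))        # ['r1']
print(dedup_null_key(recs, "last"))         # ['r3']
print(dedup_null_key(recs, "unique_only"))  # []
```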

Q.Can anyone tell me what happens when a graph runs? i.e. the Co>Operating System is at the host, and we are running the graph at some other place. How does the Co>Operating System interact with the native OS?

Answer: The Co>Operating System is layered on top of the native OS.

When a graph is executed, it has to be deployed with host settings and a connection method like rexec, telnet, rsh or rlogin. This is how the graph interacts with the Co>Op.

Whenever you press the Run button in your GDE, the GDE generates a script, and the generated script is transferred to the host specified in your GDE run settings. The Co>Operating System then interprets this script and executes it on different machines (if required) as sub-processes (threads). After completion, each sub-process returns a status code to the main process, and this main process in turn returns the error or success code of the job to the GDE.

Q.Difference between conventional loading and direct loading? When is each used in real time?

Answer:

Conventional load:
Before loading the data, all the table constraints will be checked against the data.

Direct load (faster loading):
All the constraints will be disabled. Data will be loaded directly. Later the data will be checked against the table constraints, and the bad data won’t be indexed.

API mode corresponds to conventional loading; utility mode corresponds to direct loading.

Q.Explain the environment variables with an example.

Answer: Environment variables serve as global variables in a UNIX environment. They are used for passing values from one shell/process to another. They are inherited by Ab Initio as sandbox variables/graph parameters, such as:
AI_SORT_MAX_CORE
AI_HOME
AI_SERIAL
AI_MFS etc.
To see which variables exist in your UNIX shell, find out the naming convention and type a command like env | grep AI_. This will give you a list of the variables set in the shell. You can refer to the graph parameters/components to see how these variables are used inside Ab Initio.

Q.How to find the number of arguments defined in a graph?
Answer: $* gives the list of shell arguments.

Then what are $# and $?

$# – the number of positional parameters

$? – the exit status of the last executed command

Q.How many inputs does the join component support?

Answer: Join supports a maximum of 60 inputs, and the minimum is 2 inputs.

Q.What is max-core? What are the components that use MAX_CORE?

Answer: The MAX_CORE parameter determines the maximum amount of memory, in bytes, that a specified component will use. If the component is running in parallel, the value of MAX_CORE represents the maximum memory usage per partition. If MAX_CORE is set too low, the component will run slower than expected. Too high, and the component will use too many machine resources and slow down dramatically.
The MAX_CORE parameter can be defined in the following components:

SCAN
in-memory SCAN
ROLLUP
in-memory ROLLUP
in-memory JOIN
SORT

Whenever these components are used with the parameter set to "In memory: Inputs need not be sorted", a max-core value must be specified.

Q.What does dependency analysis mean in Ab Initio?

Answer:

Dependency Analysis

It analyses the project for the dependencies within and between the graphs. The EME examines the project and develops a survey tracing how data is transformed and transferred, field by field, from component to component. Dependency analysis has two basic steps:

Translation
Analysis

Analysis Level:

In the check-in wizard’s advanced options, the analysis level can be specified as one of the following:

None:
No dependency analysis is performed during the check-in.

Translation only:

The graph being checked in is translated to datastore format, but no error checking is done. This is the minimum requirement during check-in.

Translation with checking: (Default)

Along with the translation, errors which would interfere with dependency analysis are checked for. These include:
Absolute paths
Undefined parameters
DML syntax errors
Parameter references to objects that can’t be resolved
Wrong substitution syntax in parameter definitions

Full Dependency Analysis:

Full dependency analysis is done during check-in. It is not recommended, as it takes a long time and in turn can delay the check-in process.

What to analyse:

All files:

Analyse all files in the project.

All unanalysed files:

Analyse all files that have been changed, or which are dependent on or required by files that have changed, since the last time they were analysed.

Only my checked-in files:

All files checked in by you are analysed if they have not been before.

Only the file specified:

Apply analysis to the specified file only.


Q.What is the difference between .dbc and .cfg files?

Answer: A .cfg file is for a remote connection and a .dbc file is for connecting to the database.

A .cfg file contains:

The name of the remote machine
The username/password to be used while connecting to the database
The location of the operating system on the remote machine
The connection method

A .dbc file contains:

The database name
Database version
Userid/password
Database character set, and some more.

Q.What are the graph parameters?

Answer: There are 2 types of graph parameters in Ab Initio:
1. Local parameters
2. Formal parameters (those parameters working at run time)

Q.How many types of joins are there in Ab-Initio?

Answer: Join is based on a match key for the inputs. The Join component has an out port, unused ports, reject ports and a log port.

Inner Joins:

The most common case is when join-type is Inner Join. In this case, if each input port contains a record with the same value for the key fields, the transform function is called and an output record is produced.
If some of the input flows have more than one record with that key value, the transform function is called multiple times, once for each possible combination of records, taken one from each input port. Whenever a particular key value does not have a matching record on every input port and Inner Join is specified, the transform function is not called and all incoming records with that key value are sent to the unused ports.

Full Outer Joins:

Another common case is when join-type is Full Outer Join: if each input port has a record with a matching key value, Join does the same thing as it does for an Inner Join. If some input ports do not have records with matching key values, Join applies the transform function anyway, with NULL substituted for the missing records. The missing records are in effect ignored. With an Outer Join, the transform function typically requires additional rules (as compared to an Inner Join) to handle the possibility of NULL inputs.

Explicit Joins:

The final case is when join-type is Explicit. This setting allows you to specify True or False for the record-requiredn parameter for each inn port. The settings you choose determine when Join calls the transform function.

The join-type and record-requiredn Parameters

In the Ab Initio documentation, two intersecting ovals represent the key values in the records on the two input ports, in0 and in1. For each possible setting of join-type, or (if join-type is Explicit) combination of settings for record-requiredn, a shaded region represents the inputs for which Join calls the transform. Join ignores the records that have key values in the unshaded regions, and consequently those records go to the unused port.
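The inner and full-outer behaviours above can be mimicked in Python (toy single-key inputs, simplified to at most one matching record per side; not the component API):

```python
def join(left, right, join_type="inner"):
    """left/right map key -> value; returns (key, left_val, right_val) rows."""
    out = []
    for k in sorted(set(left) | set(right)):
        if join_type == "inner" and not (k in left and k in right):
            continue  # unmatched keys go to the unused ports
        # full outer: NULL (None) is substituted for missing records
        out.append((k, left.get(k), right.get(k)))
    return out

l = {"a": 1, "b": 2}
r = {"b": 20, "c": 30}
print(join(l, r, "inner"))  # [('b', 2, 20)]
print(join(l, r, "outer"))  # [('a', 1, None), ('b', 2, 20), ('c', None, 30)]
```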

Q.What is a semi-join?

Answer: A left semi-join on two input files connected to ports in0 and in1 is an inner join where the dedup0 parameter is set to "Do not dedup this input" but dedup1 is set to "Dedup this input before joining".

Duplicates are removed only from the in1 port, that is, from Input File 2.

Semi-joins can be achieved by using the join component with the Join Type parameter set to explicit join, and the parameters record-required0 and record-required1 set one to true and the other to false, depending on whether you require a left outer or right outer join.

In Ab Initio, there are 3 types of join:
1. inner join, 2. outer join, and 3. semi join.

For an inner join, the record-requiredn parameter is true for all in ports.

For an outer join, it is false for all the in ports.

If you want a semi join, you set record-requiredn to true for the required input and false for the others.

Q.How do we run sequences of jobs? e.g. the output of job A is the input to job B; how do we coordinate the jobs?

Answer: By writing wrapper scripts, we can control the sequence of execution of more than one job.

Q.How would you do performance tuning for an already built graph? Can you give some examples?

Answer:

Examples:
1) Suppose a sort is used in front of a merge component; there is no use in using the sort, because sorting is built into the merge component.
2) Use lookup instead of the Join or Merge component.
3) Suppose we want to join the data coming from 2 files and we don’t want duplicates: we can use a union function instead of adding an additional component for removing duplicates.

Q.What is the relation between EME, GDE and the Co>Operating system?

Answer: EME stands for Enterprise Metadata Environment, GDE is the Graphical Development Environment, and the Co>Operating system can be said to be the Ab Initio server. The relation between them is as follows: the Co>Operating system is the Ab Initio server, installed on a particular OS platform that is called the native OS. The EME is just like the repository in Informatica; it holds the metadata, transformations, db config files, and source and target information. The GDE is the end-user environment where we develop the graphs (mappings, just like in Informatica). The designer uses the GDE to design graphs and saves them to the EME or sandbox; the sandbox is at the user side, whereas the EME is at the server side.

Q.When do we use dynamic DML?

Answer: Dynamic DML is used if the input metadata can change. Example: at different times, different input files are received for processing, and they have different DMLs. In that case we can use a flag in the DML; the flag is read first from the input file received, and according to the flag its corresponding DML is used.

Q.Explain the differences between Replicate and BROADCAST?

Answer: Replicate takes records from its input flow and gives a copy of the flow to each component connected to its output port. Broadcast is a partition component that copies each input record to every flow connected to its output port. Consider one example: an input file contains 4 records and the level of parallelism is 3. Replicate gives 4 records to each component connected to its out port, whereas Broadcast delivers the 4 records to each of the 3 partitions, 12 record copies in total.

Q.How do you truncate a table?

Answer: From Ab Initio, use the Run SQL component with the DDL "TRUNCATE TABLE", or use the Truncate Table component in Ab Initio.

Q.How to get DML using utilities in UNIX?

Answer: By using the command:

m_db gendml -table

Q.Explain the difference between REFORMAT and Redefine FORMAT?

Answer: Reformat changes the record format by adding or deleting fields in the DML record; the length of the record can change.

Redefine copies its input flow to its output port without any transform. Redefine is used to rename the fields in the DML, but the length of the record must not change.

Q.How to work with parameterized graphs?

Answer: Parameterized graphs specify everything through parameters, i.e. data locations of input/output files, DMLs etc.

Q.What is the driving port? When do you use it?

Answer: When you set the sorted-input parameter of the JOIN component to "In memory: Input need not be sorted", you can find the driving parameter.

Generally, the driving port is used to improve performance in a graph.

The driving input is the largest input. All other inputs are read into memory.

For example, suppose the largest input to be joined is on the in1 port. Specify a port number of 1 as the value of the driving parameter. The component then reads all other inputs to the join (for example, in0 and in2) into memory.

The default is 0, which specifies that the driving input is on port in0.

Join improves performance by loading all records from all inputs except the driving input into main memory.

The driving port in Join supplies the data that drives the join. That means every record from the driving port is compared against the data from the non-driving ports.

We set the driving port to the larger dataset so that the non-driving data, which is smaller, can be kept in main memory to speed up the operation.

Q.How can we test Ab-Initio graphs manually and with automation?

Answer: Running a graph through the GDE is a manual test.

Running a graph using the deployed script is an automated test.

Q.What is the difference between partitioning with key and round robin?

Answer: Partition by key (hash partition) is a partitioning technique used to partition data when the keys are diverse. If a key value is present in large volume, there can be a large data skew; nevertheless, this method is used more often for parallel data processing.
Round-robin partitioning is another partitioning technique that uniformly distributes the data over the destination data partitions. The skew is zero in this case when the number of records is divisible by the number of partitions. A real-life example is how a pack of 52 cards is distributed among 4 players in a round-robin manner.

Q.What is skew and skew measurement?
Answer: Skew is the measure of how unevenly data flows to each partition.

Suppose the input comes from 4 files and the total size is about 1 GB:

1 GB = 100 MB + 200 MB + 300 MB + 500 MB

The average per partition is 1000 MB / 4 = 250 MB.

For the 100 MB partition, the skew is (100 - 250) / 500 = -0.3, a negative value; calculate the same for the 200 MB, 300 MB and 500 MB partitions.

A skew value close to zero is desirable. Skew is an indirect measure of the health of a graph.
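The per-partition arithmetic can be written out as a Python sketch, using skew = (partition_size - average) / largest partition, as in the worked example above. Note the source rounds the total to 1000 MB; computing the average exactly from these sizes (1100 MB / 4 = 275 MB) gives slightly different values, and Ab Initio's own reported skew metric may be defined differently:

```python
def partition_skews(sizes_mb):
    """Per-partition skew: (size - average) / largest partition size."""
    avg = sum(sizes_mb) / len(sizes_mb)
    largest = max(sizes_mb)
    return [round((s - avg) / largest, 2) for s in sizes_mb]

sizes = [100, 200, 300, 500]  # MB across 4 partitions
print(partition_skews(sizes))  # [-0.35, -0.15, 0.05, 0.45]
```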

Q.What is the error called ‘depth not equal’?

Answer: When two components are linked together, if their layouts do not match, this problem can occur during the compilation of the graph. A solution is to use a partitioning component in between where the layout changes.


Q.Which is faster for processing, fixed-length DMLs or delimited DMLs, and why?

Answer: Fixed length, because for a delimited DML the component has to check for the delimiter every time, but for a fixed-length DML the length is taken directly.

Q.What kinds of layouts does Ab-Initio support?

Answer: Ab-Initio supports two kinds of layouts:

Serial layout

Multi layout.
In Ab-Initio, the layout tells which component should run where, and it also gives the level of parallelism.

For a serial layout, the level of parallelism is 1.

For a multi layout, the level of parallelism depends on the data partitioning.

Q.How can you run a graph infinitely?

Answer:

To run a graph infinitely, the end script of the graph should call the .ksh file of the graph. Thus, if the name of the graph is abc.mp, then in the end script of the graph there should be a call to abc.ksh. Then this graph will run infinitely.

Alternatively, run the deployed script in an infinite loop.

Q.What are local and formal parameters?

Answer: Both are graph-level parameters, but a local parameter must be initialized with a value at the time of declaration, whereas a formal parameter need not be initialized; the graph prompts for its value at the time of running.

A local parameter is like a local variable in the C language, whereas a formal parameter is like a command-line argument that we need to pass at run time.

Q.What are BROADCAST and REPLICATE?

Answer: Broadcast can do everything that Replicate does; Broadcast can also send a serial file to an MFS without splitting it, making multiple copies of the single file across the multifile. Replicate receives records in a single flow and writes a copy of that flow to each of its output flows.

Replicate generates multiple straight flows as output, whereas Broadcast produces a fan-out flow.

Replicate improves component parallelism, whereas Broadcast improves data parallelism.
Broadcast – Takes data from multiple inputs, combines it and sends it to all the output ports.

Eg – You have 2 incoming flows (this can be data parallelism or component parallelism) on the Broadcast component, one with 10 records and the other with 20 records. Then every outgoing flow (there can be any number of flows) will have 10 + 20 = 30 records.

Replicate – It replicates the data of a particular partition and sends it out to multiple output ports of the component, but maintains the partition integrity.

Eg – Your incoming flow to Replicate has a data parallelism level of 2, with one partition having 10 records and the other having 20 records. Now suppose you have 3 output flows from Replicate. Then each flow will have 2 data partitions with 10 and 20 records respectively.

Q.what is the importance of EME in


abinitio?

Answer: EME is a repository in Ab Initio used for check-in and check-out of graphs; it also maintains graph versions.

Q.What is m_dump?

Answer: It is a Co>Operating system command that we use to view data from the command prompt.

The m_dump command prints the data in a formatted way:

m_dump <record-format> <data-file>

Q.What is the syntax of the m_dump command?

Answer: m_dump <record-format> <data-file> [options], where the record format can be given as a DML file.

Q.What are the differences between different GDE versions (1.10, 1.11, 1.12, 1.13 and 1.15), and between different versions of the Co>Op?

Answer: 1.10 is a non-key version and the rest are key versions.

A lot of components were added and revised in the later versions.
Q.How to run the graph without GDE?

Answer: In the run directory a graph can be deployed as a .ksh file. Now, this .ksh file can be run at the command prompt as:

ksh <graph-name>.ksh

Q.What is the difference between a DML expression and an XFR expression?

Answer: A DML expression means the Ab-Initio DML is stored or saved in a file. DML describes the data in terms of expressions that perform simple computations, in terms of transform functions that control data transforms, and in terms of keys that specify grouping or non-grouping. In other words, DML expressions are non-embedded record-format files.

An .xfr is simply a non-embedded transform file. A transform function expresses business rules, local variables and statements, as well as the connections between these elements and the input and output fields.

Q.How does MAXCORE work?

Answer: Max-core is temporary memory used, for example, to sort records.

Max-core is a value (typically specified in bytes or KB). Whenever a component is executed it will take the amount of memory we specified for its execution.

Max-core is the maximum memory that can be used by a component in its execution.

Q.What is $mpjret? Where is it used in Ab-Initio?

Answer: $mpjret is the return value of the shell command "mp run" that executes an Ab-Initio graph.

It is generally treated as the graph execution status return value.

Q.What is the latest version that is available in Ab-Initio?

Answer: The latest version of GDE is 1.15 and of the Co>Operating system is 2.14.

Q.What is meant by the Co>Operating system and why is it special for Ab-Initio?

Answer: The name Co>Operating system itself means a lot; it is not merely an engine or interpreter. As the name says, it is an operating system which co-exists with another operating system. What does that mean? In layman's terms, Ab-Initio, unlike other applications, does not sit as a layer on top of an OS. It has quite a lot of operating-system-level capabilities, such as multifiles, memory management and so on, and in this way it integrates completely with any other OS and works jointly on the available hardware resources. This sort of synergy with the OS optimizes the utilization of the available hardware resources. Unlike other applications (including most other ETL tools) it does not work like a layer and interpret commands. That is the major difference with other ETL tools; this is the reason why Ab-Initio is much faster than any other ETL tool, and obviously much costlier as well.

Q.How to take the input data from an Excel sheet?

Answer: There is a Read Excel component that reads the Excel file either from the host or from a local drive. The DML will be a default one.

Through the Read Excel component in $AB_HOME we can read Excel files directly.

Q.How will you test a .dbc file from the command prompt?

Answer: You can test a .dbc file from the command prompt (Unix) using the m_db test command, which checks the database connection and reports the database version and user.

Q.Which one is faster for processing, fixed-length DMLs or delimited DMLs, and why?

Answer: Fixed-length DMLs are faster because the data is read directly by length without any comparisons, but with delimited ones every character has to be compared against the delimiter, which causes delays.

Q.What are the continuous components in Ab-Initio?

Answer: Continuous components are used to create graphs that produce useful output files while running continuously.

Ex: Continuous Rollup, Continuous Update, Batch Subscribe.

Q.How can I calculate the total memory


requirement of a graph?

Answer:

You can roughly calculate the memory requirement as follows:

Each partition of a component uses ~8 MB + max-core (if any).

Add the size of lookup files used in the phase (if multiple components use the same lookup, count it only once). Multiply by the degree of parallelism. Add up all components in a phase; that is how much memory is used in that phase.
Add the size of the input and output datasets. The total memory requirement of a graph is greater than that of the largest-memory phase in the graph.
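As a rough sanity check, the rule of thumb above can be written out in Python (a sketch only; the ~8 MB per-partition overhead is the approximation quoted in the answer, and real usage varies):

```python
def phase_memory_mb(components, lookup_mb=0):
    """components: list of (max_core_mb, degree_of_parallelism) pairs.
    Each partition of a component uses ~8 MB plus its max-core; lookup
    files used in the phase are counted once."""
    total = sum((8 + max_core) * dop for max_core, dop in components)
    return total + lookup_mb

# Example phase: a 4-way component with 100 MB max-core, a 4-way component
# with no max-core, and a 50 MB lookup file shared within the phase
print(phase_memory_mb([(100, 4), (0, 4)], lookup_mb=50))  # → 514
```

The graph's overall requirement is then at least the largest phase's estimate, plus the input and output datasets.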

Q.What is a multistage component?

Answer: Multistage components are nothing but the transform components where the records are transformed in five stages: input selection, temporary records initialization, processing, finalization and output selection.

Examples of multistage components are:

Rollup

Scan

Normalize

Denormalize Sorted.

Q.What is the use of Aggregate when we have Rollup? As we know, the Rollup component in Ab-Initio is used to summarize groups of data records; then where would we use Aggregate?
Answer: Rollup has better control over record selection, grouping and aggregation compared to Aggregate. Rollup is an updated version of Aggregate.

When Rollup is in template mode it has aggregation functions available, so it is better to go for Rollup.

Q.Phase versus Checkpoint?

Answer:

Difference between a phase and a checkpoint:

Phases are used to break up a graph so that it does not use up all the memory. Phasing limits the number of active components, reducing the number of components running in parallel and hence improving performance. Phases make possible the effective utilization of resources such as memory, disk space and CPU. So when we have memory-consuming components in a straight flow and the data in the flow is in millions of records, we can separate the process out into its own phase so that more CPU is allocated to it and the whole process takes less time.

Temporary files created during a phase will be deleted after completion of that phase.

Don't put a phase break after Replicate or Sort, across all-to-all flows, or after components that already create temporary files.

Checkpoints are used for the purpose of recovery.

Checkpoints are like save points. They are required if we need to rerun the graph from the last saved phase-recovery file (a phase break with checkpoint) when it fails unexpectedly.

At job start, output datasets are copied into temporary files, and after the completion of checkpointing all datasets and the job state are copied into temporary files. So if any failure occurs, the job can be rerun from the last committed checkpoint.
Use of phase breaks that include checkpoints degrades performance somewhat but ensures a save-point run.

The major difference between the two is that phasing deletes the intermediate files made at the end of each phase as soon as it enters the next phase.

On the other hand, checkpointing stores these intermediate files till the end of the graph. Thus we can easily use the intermediate files to restart the process from where it failed. This cannot be done in the case of phasing alone.

We can have phases without checkpoints.

We cannot assign checkpoints without phases.

Q.In Ab-Initio, how can you display records between 50 and 75?

Answer: Suppose the input dataset has 100 records and I want records between 50 and 75; then use m_dump with the -start 50 and -end 75 options.

For serial and MFS files there are several ways components can be used:

1. Filter by Expression: use next_in_sequence() >= 50 && next_in_sequence() <= 75.

2. We can also use multiple LEADING RECORDS components to meet the requirement.

If you have access to the Co>Op shell you can try an alternative. Say the input file is file1. Use the Run Program component in GDE and write the command:

sed -n '50,75p' file1 > file2
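The Filter by Expression approach can be mimicked with a plain counter (illustrative Python, not Ab Initio code; note that to include records 50 and 75 themselves the comparisons should be >= and <=):

```python
def records_between(records, start, end):
    """Keep records whose 1-based sequence number lies in [start, end],
    like filtering on next_in_sequence() in Filter by Expression."""
    return [rec for i, rec in enumerate(records, start=1) if start <= i <= end]

rows = [f"rec{i}" for i in range(1, 101)]   # a 100-record input dataset
selected = records_between(rows, 50, 75)
print(len(selected), selected[0], selected[-1])  # → 26 rec50 rec75
```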

Q.What is the order of evaluation of


parameters?

Answer: When you run a graph, parameters are evaluated in the following order:

The host setup script is run. Common (i.e., included) sandbox parameters are evaluated.

Sandbox parameters are evaluated.

The project-start.ksh script is run.

Graph parameters are evaluated.

The graph Start Script is run.

Then execution proceeds: processes are run simultaneously based on the components' layouts.

The lookup files are opened.

The graph metadata is checked.

The input/output file paths and files are checked.

The graph runs in order of phase 0, phase 1, phase 2, ...

Q.How do you convert a 4-way MFS to an 8-way MFS?

Answer: By repartitioning. We can use any partition method to repartition.

Partitioning methods are:

Partition by Round-robin
Broadcast
Partition by Key
Partition by Expression
Partition by Range
Partition by Percentage
Partition by Load Balance

Q.For data parallelism,we can use


partition components. For component
parallelism,we can use replicate
component.Like this which
component(s) can we use for pipeline
parallelism?

Answer: When a connected sequence of components on the same branch of a graph executes concurrently, it is called pipeline parallelism.

Components like Reformat, where we distribute the input flow to multiple output flows using output_index depending on some selection criteria and process those output flows simultaneously, create pipeline parallelism.

But components like Sort, where the entire input must be read before a single record is written to the output, cannot achieve pipeline parallelism.
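Chained generators give a rough feel for pipeline parallelism: the downstream stage consumes records while the upstream stage is still producing (a conceptual sketch, not Ab Initio code):

```python
def read_file():
    for i in range(5):
        yield i            # upstream component emits one record at a time

def reformat(flow):
    for rec in flow:
        yield rec * 2      # downstream starts consuming before upstream ends

print(list(reformat(read_file())))  # → [0, 2, 4, 6, 8]
```

A Sort-like stage, by contrast, would have to drain the whole upstream generator before yielding anything, which is exactly why it breaks the pipeline.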
Q.What is meant by "fancing" in Ab-Initio?

Answer: The word "ab initio" means "from the beginning."

Did you mean "fanning"? "fan-in"? "fan-out"?

Q.How to retrieve data from a database to a source; which component is used for this?

Answer: To unload (retrieve) data from a database such as DB2, Informix, or Oracle, we have components like Input Table and Unload DB Table; by using these two components we can unload data from the database.

Q.What is the relation between EME, GDE and the Co>Operating system?

Answer: EME stands for Enterprise Metadata Environment, GDE for Graphical Development Environment, and the Co>Operating system can be described as the Ab-Initio server.

The relation between the Co>Op, EME and GDE is as follows:
The Co>Operating system is the Ab-Initio server. It is installed on a particular OS platform, which is called the NATIVE OS. Coming to the EME, it is just like the repository in Informatica; it holds the metadata, transformations, DB config files, and source and target information. Coming to the GDE, it is the end-user environment where we develop the graphs (mappings, just like in Informatica).

The designer uses the GDE to design graphs and saves them to the EME or to a sandbox. The sandbox is on the user side, whereas the EME is on the server side.

Q.What is the use of Aggregate when we have Rollup? As we know, the Rollup component in Ab-Initio is used to summarize groups of data records; then where would we use Aggregate?

Answer: Aggregate and Rollup can both summarize data, but Rollup is much more convenient to use, and for understanding how a particular summarization is performed Rollup is much more explanatory compared to Aggregate. Rollup can also do other things, like input and output filtering of records.

Q.What kinds of layouts does Ab-Initio support?

Answer: Basically there are serial and parallel layouts supported by Ab-Initio. A graph can have both at the same time. The parallel one depends on the degree of data parallelism. If the multifile system is 4-way parallel, then a component in a graph can run 4-way parallel if the layout is defined to match the degree of parallelism.

Q.How can you run a graph infinitely?

Answer:To run a graph infinitely, the end


script in the graph should call the .ksh
file of the graph. Thus if the name of the
graph is abc.mp then in the end script of
the graph there should be a call to
abc.ksh.
Like this the graph will run infinitely.
Q.How do you add default rules in
transformer?

Answer: Double click on the transform


parameter of parameter tab page of
component properties, it will open
transform editor. In the transform editor
click on the Edit menu and then select
Add Default Rules from the drop down.
It will show two options – 1) Match
Names 2) Wildcard.

Q.Do you know what a local lookup is?

Answer: If your lookup file is a multifile and is partitioned/sorted on a particular key, then the lookup_local function can be used instead of the lookup function call. It is local to a particular partition depending on the key.

A lookup file consists of data records which can be held in main memory. This makes the transform function retrieve the records much faster than retrieving them from disk. It allows the transform component to process the data records of multiple files quickly.
Q.What is the difference between a lookup file and a lookup, with a relevant example?

Answer: Generally a lookup file represents one or more serial files (flat files). The amount of data is small enough to be held in memory. This allows transform functions to retrieve records much more quickly than they could from disk.
A lookup is a component of an Ab-Initio graph where we can store data and retrieve it by using a key parameter.

A lookup file is the physical file where the data for the lookup is stored.

Q.How to handle a DML that changes dynamically in Ab-Initio?

Answer: If the DML changes dynamically then both the DML and the XFR have to be passed as graph-level parameters at run time.

This can be done by parameterization, by a conditional record format, or by metadata-driven DML.
Q.Explain what is lookup?

Answer: A lookup is basically a specific dataset which is keyed. It can be used to map values according to the data present in a particular file (serial/multifile). The dataset can be static as well as dynamic (in the case where the lookup file is generated in a previous phase and used as a lookup file in the current phase). Sometimes hash joins can be replaced by using Reformat with a lookup, if one of the inputs to the Join contains a small number of records with a slim record length.

Ab-Initio has built-in functions to retrieve values using the key of the lookup.

Q.What is a ramp limit?

Answer: The limit parameter contains an integer that represents a number of reject events.
The ramp parameter contains a real number that represents a rate of reject events per record processed.
Number of bad records allowed = limit + (number of records * ramp).
Ramp is basically a fractional rate (from 0 to 1).
These two together provide the threshold value for bad records.
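The threshold formula can be checked with a one-line sketch (illustrative only; Ab Initio evaluates this internally, and the limit and ramp values here are made-up examples):

```python
def reject_threshold(limit, ramp, records_processed):
    """Allowed bad records = limit + records_processed * ramp (ramp in [0, 1])."""
    return limit + records_processed * ramp

# e.g. limit=10, ramp=0.25: after 200 records, up to 60 rejects are tolerated
print(reject_threshold(10, 0.25, 200))  # → 60.0
```

Because the threshold grows with the record count, a graph tolerates proportionally more rejects on a large run while still failing fast on a small one.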

Q.Have you worked with packages?

Answer: Multistage transform components use packages by default. However, a user can create his own set of functions in a transform function and include this in other transform functions.

Q.Have you used rollup component?


Describe how.

Answer: If the user wants to group the records on particular field values then Rollup is the best way to do that. Rollup is a multistage transform function and it contains the following mandatory functions:
1. initialise
2. rollup
3. finalise
You also need to declare a temporary variable if you want to get counts of a particular group.
For each group, Rollup first calls the initialise function once, followed by rollup function calls for each of the records in the group, and finally calls the finalise function once at the end of the last rollup call.
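The initialise / rollup / finalise call sequence can be modeled in plain Python (a conceptual sketch, not DML; the count temporary variable mirrors the answer's example):

```python
from itertools import groupby

def rollup(records, key):
    """Mimic Rollup (expanded mode) on grouped input: initialise once per
    group, call rollup once per record, finalise once at group end."""
    out = []
    for k, group in groupby(sorted(records, key=key), key=key):
        temp = {"count": 0}                # initialise: set up temporary record
        for rec in group:
            temp["count"] += 1             # rollup: called for each record
        out.append((k, temp["count"]))     # finalise: emit one output record
    return out

recs = [{"dept": "A"}, {"dept": "B"}, {"dept": "A"}, {"dept": "A"}]
print(rollup(recs, key=lambda r: r["dept"]))  # → [('A', 3), ('B', 1)]
```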

Q.How do you add default rules in


transformer?

Answer: In the case of Reformat, if the destination field names are the same as, or a subset of, the source fields, then there is no need to write anything in the reformat XFR, unless you want a real transform beyond reducing the set of fields or splitting the flow into a number of flows to achieve the functionality.

1)If it is not already displayed, display


the Transform Editor Grid.
2)Click the Business Rules tab if it is not
already displayed.
3)Select Edit > Add Default Rules.

Add Default Rules — Opens the Add Default Rules dialog. Select one of the following: Match Names — generates a set of rules that copies input fields to output fields with the same name. Use Wildcard (.*) Rule — generates one wildcard rule that copies input fields to output fields with the same name.

Q.What is the difference between


partitioning with key and round robin?

Answer: Partition by Key (hash partition) -> This is a partitioning technique used to partition data when the keys are diverse. If one key value is present in large volume then there can be a large data skew. This method is nevertheless used often for parallel data processing.
Round-robin partition is another partitioning technique, used to uniformly distribute the data across the destination data partitions. The skew is zero in this case when the number of records is divisible by the number of partitions. A real-life example is how a pack of 52 cards is distributed among 4 players in a round-robin manner.

If you take some 30 cards at random from a 52-card pack and use card colour (red or black) as the key, the number of cards in each partition may vary greatly. But in round robin we distribute by block size, so the variation is limited to the block size.

Partition by Key – Distributes according to the key value.

Partition by Round-robin – Distributes a predefined number of records to one flow, then the same number of records to the next flow and so on; after the last flow it resumes the pattern, distributing the records almost evenly. This pattern is called round-robin fashion.
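The card-dealing analogy maps directly onto a sketch of round-robin partitioning (illustrative Python; the block_size parameter is an assumption mirroring how records can be dealt in blocks rather than one at a time):

```python
def round_robin(records, n_partitions, block_size=1):
    """Deal records into partitions in blocks, like dealing cards to players."""
    parts = [[] for _ in range(n_partitions)]
    for i, rec in enumerate(records):
        parts[(i // block_size) % n_partitions].append(rec)
    return parts

cards = list(range(52))
parts = round_robin(cards, 4)
print([len(p) for p in parts])  # → [13, 13, 13, 13]: zero skew
```

Key-based partitioning of the same deck by colour could instead give two uneven piles, which is exactly the skew risk described above.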

Q.How do you truncate a table? (Each


candidate would say only 1 of the
several ways to do this.)

Answer: From Ab-Initio, use the Run SQL component with the DDL "truncate table".
Or use the Truncate Table component in Ab-Initio.

There are several ways to do it:

1. Probably the easiest way is to use Truncate Table.

2. Run SQL or Update Table can be used to do the same thing.

3. Run Program.

Q.Have you ever encountered an error called "depth not equal"? (This occurs when you extensively create graphs; it is a trick question.)

Answer: When two components are linked together, if their layouts do not match then this problem can occur during the compilation of the graph. A solution to this problem is to use a partitioning component in between where the layout changes.

We have talked about a situation where you have linked 2 components, each of them having different layouts.

Think about a situation where the component on the left-hand side is linked to a serial dataset and on the right-hand side the downstream component is linked to a multifile. The layout is going to be propagated from neighbours.

So without any partitioning component the jump in depth cannot be achieved, and you need a partitioning component to resolve this depth discrepancy.

Q.What is the function you would use to


transfer a string into a decimal?

Answer: In this case no specific function is required if the size of the string and the decimal are the same. Just use a decimal cast with the size in the transform function and it will suffice. For example, if the source field is defined as string(8) and the destination as decimal(8):
out.field :: (decimal(8)) in.field;
If the destination field size is smaller than the input, then string_substring can be used, like the following. Say the destination field is decimal(5):
out.field ::
(decimal(5))string_lrtrim(string_substring(in.field,1,5));
/* string_lrtrim is used to trim leading and trailing spaces */

Q.How many parallelisms are in


Abinitio? Please give a definition of
each.

Answer: There are 3 kinds of parallelism:
1) Data Parallelism
2) Component Parallelism
3) Pipeline Parallelism.

When the data is divided into small chunks and processed on different partitions simultaneously, we call it data parallelism.

When different components work on different data sets simultaneously, it is called component parallelism.

When a graph's connected components run simultaneously, each processing records as they arrive from upstream, we call it pipeline parallelism.

Q.What is multi directory?

Answer:A multi directory is a parallel


directory that is composed of individual
directories, typically on different disks
or computers. The individual directories
are partitions of the multi directory.
Each multi directory contains one
control directory and one or more data
directories. Multi files are stored in multi
directories.

Q.What is multi file?

Answer: A multifile is a parallel file that is composed of individual files, typically on different disks or computers. The individual files are partitions of the multifile. Each multifile contains one control partition and one or more data partitions. Multifiles are stored in distributed directories called multi directories.
The data in a multifile is usually divided across partitions by one of these methods:

Random or round-robin partitioning

Partitioning based on ranges or functions

Replication or broadcast, in which each partition is an identical copy of the serial data.

Q.What is meant by GDE, SDE? What is the purpose of GDE, SDE?

Answer:

GDE – Graphical Development


Environment –it is used for developing
the graphs

SDE – Shell Development Environment,


which is used for developing the korn
shell script on co>operating system.

Q.What is difference between Rollup


and Scan ?

Answer:
Rollup component:

Rollup evaluates a group of input records that have the same key and then generates data records that either summarize each group or select certain information from each group.

The Rollup component can be used in two ways: 1. Template mode 2. Expanded mode.

1. Template Mode:

This mode evaluates using built-in aggregation functions like sum, min, max, count, avg, product, first, last.

2. Expanded Mode:

This mode evaluates using user-defined functions (without aggregation functions) such as the temporary, initialize, finalize and rollup functions in the transform-function property.

Scan generates a series of cumulative summary records, such as successive year-to-date totals, for groups of data records. Scan produces intermediate summary records.

Rollup is for group-by and Scan is for successive totals. Basically, when we need to produce running summaries we use Scan; Rollup is used to aggregate data.

Q.What is Runtime Behavior of Rollup?

Answer: Rollup supports two modes.

1. Template Mode:

This mode evaluates using built-in aggregation functions like sum, min, max, count, avg, product, first, last.

2. Expanded Mode:

This mode evaluates using user-defined functions (without aggregation functions) such as the temporary, initialize, finalize and rollup functions in the transform-function property.

Rollup's behavior differs depending on whether its input is sorted or unsorted.

When Rollup input is sorted:

When you set the sorted-input parameter to "Input must be sorted or grouped" (the default), Rollup requires data records grouped according to the key parameter. If you need to group the records, use Sort with the same key specifier that you use for Rollup. Rollup then produces sorted output on its output port.

When Rollup input is unsorted:

When you set the sorted-input parameter to "In memory: Input need not be sorted", Rollup accepts ungrouped input and groups all records according to the key parameter. It does not produce sorted output.

Q.How do you do rollback in Ab-Initio?

Answer: Ab-Initio has very good recovery options for failures at run time and interruptions at development time.

Development time:

You get a recovery graph file if an interrupted failure occurs at development time.

At run time:

You get a recovery file if a failure occurs during the execution of a graph, and you can restart the execution. The recovery file has the last checkpoint information and restarts from the last checkpoint onwards.

You can roll back Ab-Initio graphs with the m_rollback command:

m_rollback -d <recovery-file> deletes all intermediate files and checkpoints.

Q.What is the internal execution (process) of Ab-Initio graphs in the Ab-Initio Co>Operating system while running the graphs?

Answer: Normally the Ab-Initio Co>Operating system checks that the code is compatible between the GDE and the Co>Operating system, and checks the layouts of any lookup files used in the graph. This is called lookup layout checking.

Graphs have input and output files, and the system checks whether their paths are correct. The sequence of processing while running a graph is given below:

Checks lookup file layouts.

Checks the metadata (whether the data types used are valid, and everything related); DML checking on a per-component basis.

Checks input files.

Checks output files.

Checks each component's layout.

Finally, it checks the flow of each process and assigns the flow type (e.g. straight).

Q.What does dependency analysis


mean in Ab-Initio?
Answer: Dependency analysis answers questions regarding data lineage, that is, where the data comes from, which applications produce it, and what depends on this data, etc.

Q.What is meant by Fencing in Ab-


Initio?

Answer: In the software world, fencing means controlling jobs on a priority basis.

In Ab-Initio it actually refers to customized phase breaking.

A well-fenced graph means that no matter what the source data volume is, the process will not get caught in deadlocks.

It actually limits the number of simultaneous processes.

In Ab-Initio you sometimes need to fence a job to stop the schedule.

Fencing is nothing but changing the priority of a particular job.

Q.What is the function of fuse


component?
Answer: Fuse combines multiple input flows into a single output flow by applying a transform function to corresponding records of each flow.

Runtime behavior of Fuse:

Fuse applies a transform function to corresponding records of each input flow. The first time the transform function executes, it uses the first record of each flow. The second time the transform function executes, it uses the second record of each flow, and so on. Fuse sends the result of the transform function to the out port.

The component works as follows. It tries to read from each of its input flows.

* If all of its input flows are finished, Fuse exits.

* Otherwise, Fuse reads one record from each still-unfinished input port and a NULL from each finished input port.
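Fuse's record pairing (one record from each unfinished flow, NULL from each finished one) behaves like zip_longest plus a transform (a conceptual sketch; None stands in for Ab Initio's NULL):

```python
from itertools import zip_longest

def fuse(transform, *flows):
    """Apply transform to corresponding records of each flow; once a flow
    is finished it contributes None (standing in for Ab Initio's NULL)."""
    return [transform(*recs) for recs in zip_longest(*flows, fillvalue=None)]

a = [1, 2, 3]          # first input flow
b = [10, 20]           # second, shorter input flow
out = fuse(lambda x, y: (x or 0) + (y or 0), a, b)
print(out)  # → [11, 22, 3]
```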
Q.What is data skew? How can you eliminate data skew while using Partition by Key?

Answer: The skew of a data or flow partition is the amount by which its size deviates from the average partition size, expressed as a percentage of the largest partition:

Skew of data = (partition size - avg. partition size) * 100 / (size of largest partition)

To reduce skew with Partition by Key, choose a key with many distinct, evenly distributed values, or repartition by round robin where key-based grouping is not required.

Q.What is $mpjret? Where it is used in


ab-Initio?

Answer:

$mpjret gives the exit status of a graph.

You can use $mpjret in the end script like this:

if [ $mpjret -eq 0 ]
then
    echo "success"
else
    mailx -s "[graph_name] failed" mail_id
fi

Q.What are primary keys and foreign
keys?

Answer: In an RDBMS the relationship between two tables is represented as a primary key and foreign key relationship, where the primary-key table is the parent table and the foreign-key table is the child table. The criterion for both tables is that there should be a matching column.

Q.What is an outer join?

Answer: An outer join is used when one wants to select all the records from a port, whether they have satisfied the join criteria or not.

If you want to see all the records of one input file independent of whether there is a matching record in the other file or not, then it is an outer join.

Q.What are Cartesian joins?

Answer: A Cartesian join joins two tables without a join key; the key should be {}.

A Cartesian join will give you a Cartesian product: you join every row of one table to every row of another table. You can also get one by joining every row of a table to every row of itself.
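The every-row-to-every-row behavior is exactly a Cartesian product, which can be illustrated with itertools.product (a sketch; in Ab-Initio's Join you would set the key to {}):

```python
from itertools import product

left = ["a", "b"]
right = [1, 2, 3]

# Joining with an empty key pairs every left row with every right row
cartesian = list(product(left, right))
print(len(cartesian))  # → 6 (= 2 x 3)
```

The output size is the product of the input sizes, which is why an accidental Cartesian join on large tables explodes.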

Q.What is the difference between a DB


config and a CFG file?

Answer: A .dbc file has the information required for Ab-Initio to connect to the database to extract or load tables or views, while a .cfg file is the table configuration file created by db_config when using components like Load DB Table.

Both DBC and CFG files are used for database connectivity; basically both have a similar use. The only difference is that a cfg file is used for the Informix database, whereas dbc files are used for other databases such as Oracle or SQL Server.

Q.What is the difference between a


Scan component and a RollUp
component?

Answer: Rollup is for group by and Scan


is for successive total. Basically, when
we need to produce summary then we
use scan. Rollup is used to aggregate
data.

Q.What are local and formal parameters?

Answer: Both are graph-level parameters, but a local parameter must be initialized with a value at the time of declaration, whereas a formal parameter need not be initialized; the graph prompts for its value at the time of running.

Q.How will you test a dbc file from


command prompt ??

Answer: try “m_db test myfile.dbc”

Q.Explain the difference between the


“truncate” and “delete” commands ?

Answer: Truncate: It is a DDL command, used to empty tables or clusters. Since it is a DDL command it auto-commits, and rollback can't be performed. It is faster than delete.
Delete: It is a DML command, generally used to delete records, clusters or tables. The rollback command can be performed in order to retrieve the earlier deleted rows. To make the deletions permanent, the "commit" command should be used.

Q.How to retrieve data from a database to a source; which component is used for this?

Answer: To unload (retrieve) data from a database such as DB2, Informix, or Oracle, we have components like Input Table and Unload DB Table; by using these two components we can unload data from the database.

Q.How many components are there in


your most complicated graph?

Answer: This is a tricky question; the number of components in a graph has nothing to do with the level of knowledge a person has. On the contrary, a properly standardized, modular, parametric approach will reduce the number of components to very few. In a well-thought-out modular and parametric design, most graphs will have 3-4 components, each doing a particular task and then calling another set of graphs to do the next, and so on. This way the total number of distinct graphs comes down drastically, and support and maintenance are much more simplified. The bottom line is, there are a lot more important things to plan than adding components.

Q.Do you know what a local lookup is?

Answer: This function is similar to a
lookup, the difference being that this
function returns NULL when there is no
record having the value that has been
mentioned in the arguments of the
function.
If it finds the matching record it
returns the complete record, that is,
all the fields along with their values
corresponding to the expression
mentioned in the lookup_local function.
e.g. lookup_local("LOOKUP_FILE", 81) ->
null
if the key on which the lookup file is
partitioned does not hold the value
mentioned.
Local Lookup files are small files that
can be accommodated into
physical memory for use in transforms.
Details like country
code/country, Currency code/currency,
forexrate/value can be used in a lookup
file and mapped during transformations.
Lookup files are not connected to any
component of the graph but available to
reformat for
mapping.

Q.How to Create Surrogate Key using Ab


Initio?

Ans. A key is a field or set of fields that


uniquely identifies a record in a file or
table.

A natural key is a key that is meaningful


in some business or real-world sense.
For example, a social security number
for a person, or a serial number for a
piece of equipment, is a natural key.
A surrogate key is a field that is added
to a record, either to replace the natural
key or in addition to it, and has no
business meaning. Surrogate keys are
frequently added to records when
populating a data warehouse, to help
isolate the records in the warehouse
from changes to the natural keys by
outside processes.

Q.How to Improve Performance of


graphs in Ab initio?
Give some examples or tips.

Ans. There are many ways to improve
the performance of graphs in
Ab-Initio.
A few points from my side:
1. Use a multifile system (MFS),
partitioning with Partition by
Round-robin.
2. If needed, use lookup_local rather
than lookup when the data is large.
3. Take out unnecessary components such
as Filter by Expression; put the filter
logic in a Reformat/Join/Rollup instead.
4. Use Gather instead of Concatenate.
5. Tune max-core for optimal
performance.
6. Try to avoid too many phases.

There are many ways the performance


of the graph can be improved.
1) Use a limited number of components
in a particular phase
2) Use optimum value of max core
values for sort and join components
3) Minimise the number of sort
components
4) Minimise sorted join component and
if possible replace them by in-memory
join/hash join
5) Use only required fields in the sort,
reformat, join components
6) Use phasing/flow buffers in case of
merge, sorted joins
7) If the two inputs are huge then use
sorted join, otherwise use hash join with
proper driving port
8) For large dataset don’t use broadcast
as partitioner
9) Minimise the use of regular
expression functions like re_index in the
trasfer functions
10) Avoid repartitioning of data
unnecessarily

Q.Describe the process steps you


would perform when defragmenting a
data table. This table contains mission
critical data ?

Answer: There are several ways to do


this:
1) We can move the table within the same
or to another tablespace and rebuild all
the indexes on the table:
'alter table ... move' reclaims the
fragmented space in the table;
'analyze table table_name compute
statistics' captures the updated
statistics.

2)Reorg could be done by taking a dump


of the table, truncate the table and
import the dump back into the table.

Q.How do we handle if DML changing


dynamically ?

Answer: There are lot many ways to


handle the DMLs which changes
dynamically with in a single file. Some
of the suitable methods are to use a
conditional DML or to call the vector
functionality while calling the DMLs.

Q.What r the Graph parameter?

Answer: There are 2 types of graph


parameters in Ab-Initio:
1. Local parameters
2. Formal parameters (whose values are
supplied at run time).

Q.What is the meaning of "Ab Initio"?

Answer: The term Ab Initio means "from
the beginning".

Q.What is a ramp limit?

Answer: Limit and Ramp.


For most graph components, we
can manually set the error
threshold limit, after which the graph
exits. There are three levels of
thresholds. "Never Exit" and
"Exit on First Occurrence" are
self-explanatory and represent
the two extremes. The third
one is Limit along with Ramp. Limit
specifies the maximum absolute number
of rejects, whereas Ramp is a
percentage of processed records. For
example, a ramp value of 5 means: if
less than 5% of the total records are
rejected, continue running; if the
rejects cross the ramp, the graph
exits. Typically development starts
with Never Exit, followed by Ramp,
and finally "Exit on First Occurrence"
in production. Case by case, Ramp can
be used in production, but it is
definitely not a desired approach.

Q.Difference between conventional


loading and direct loading ? when it is
used in real time ?

Answer:

Conventional Load:

Before loading the data all the Table


constraints will be checked against the
data.

Direct load:(Faster Loading)

All the Constraints will be disabled. Data


will be loaded directly. Later the data
will be checked against the table
constraints and the bad data won’t be
indexed.

api conventional loading

utility direct loading.

Q.How do you done the unit testing in


Ab-Initio? How will you perform the Ab-
Initio Graphs executions? How will you
increase the performance in Ab-Inito
graphs?

Answer:

The Ab-Initio Co>Operating System
handles the graph with multiple
processes running simultaneously; this
is the primary source of performance.
Beyond that, follow the actions given
below:

1. Use data separators such as "\307"
and "\007" instead of "~", "," and
other special characters, and avoid
such delimiters in the data, because
Ab-Initio has predefined these data
separators.

2. Avoid repeated aggregation in graphs.
Calculate the required aggregation once,
store the result, and reuse it through
parameters wherever it is required.

3. Avoid an excessive number of
components in a graph, and keep
max-core components to a minimum.

4. Don't write looping statements in
the start script.

5. Prefer flat files as sources.


Q.How would you do performance


tuning for already built graph?

Answer:Steps to performance Tuning


for already built graph.

Understand the functionality of the


Graph.

Modularize(i.e,check for dependencies


among components).

Give Phasing.

Check for correct parallelism.

Check the DB components (i.e., take only
the required data from the DB instead of
taking the whole data, which consumes
more time and memory).

Q.What is .abinitiorc ? What it contain?

Answer:.abinitiorc is a file which


contains the credentials to connect to
host.

Credentials like

1)Host IP

2)User-name

3)Password etc…

This is a config file for ab-Initio – in


user’s home directory and in
$AB_HOME/Config. It sets Ab-Initio
home path, configuration variables
(AB_WORK_DIR, AB_DATA_DIR, etc.),
login info (id, encrypted password),
login methods for hosts for execution
(like EME host, etc.), etc.

Q.Why might you create a stored


procedure with the ‘with recompile’
option?

Answer: Recompile is useful when the


tables referenced by the stored
procedure undergoes a lot of

modification/deletion/addition of data.
Due to the heavy modification activity,
the execution plan
becomes outdated and hence the stored
procedure performance goes down. If
we create the stored procedure with
recompile option, the sql server wont
cache a plan for this stored procedure
and it will be recompiled every time it is
run.

Q.What is the purpose of having stored


procedures in a database?

Answer: The main purpose of stored
procedures is to reduce network traffic:
all the SQL statements execute on the
database server, so execution is much
faster.

We use Run SQL and Join with DB


components to run Stored Procedures.

Q.What is mean by Co>Operating


system and why it is special for Ab-
Initio?

Answer:

Co > Operating System:Layered top to


the Native operating system.

It converts the Ab-Initio specific code


into the format, which the
UNIX/Windows can understand and
feeds it to the native operating system,
which carries out the task.

Q.How to retrieve data from database


to source in that case which component
is used for this?

Answer: To unload (retrieve) Data from


the database DB2, Informix, or Oracle
we have components like Input Table
and Unload DB Table by using these two
components we can unload data from
the database.

Input Table Component use the


following parameters:

1)db_config file(which contains


credentials to interface with Database)
2)Database Types

3)SQL file (which contains sql queries to


unload data from table(s)).

Q.How to execute the graph from start


to end stages?Tell me and how to run
graph in non Ab-Initio system?

Answer:

There are several ways to do this:

1. You can run the components phase by
phase, according to the phases you
defined.

2. You can also run the graph by
creating ksh or sh scripts.

Q.What is Join With DB?

Answer: Join with DB Component joins


records from the flow or flows
connected to its in port with records
read directly from a database, and
outputs new records containing data
based on the transform function.

Q.How do you truncate a table?

Answer: Use Truncate Table component


to truncate a table from DB in Ab-Initio.

Truncate Table Component has the


following parameters:

1)db_config file(which contains


credentials to interface with Database)

2)Database Types

3)SQL file (which contains sql queries to


truncate table(s)).

Q.Can we load multiple files?

Answer: Yes,we can load multiple file in


Ab-Initio.

Q.What is the syntax of m_dump


command?

Answer: The m_dump command prints the
data in a formatted way.

The general syntax is:

m_dump <metadata> <data> [action]

e.g.

m_dump emp.dml emp.dat -start 10 -end 20

– it will print records 10 to 20 of the
emp.dat file.

Q.How to Create Surrogate Key using


Ab-Initio?

Answer: A surrogate key is a


substitution for the natural primary key.

–It is just a unique identifier or number


for each record like ROWID of an Oracle
table.

Surrogate keys can be created using

1)next_in_sequence

2)this_partition

3)no_of_partitions

Q.Can any one give me an example of


real-time start script in the graph?

Answer: A start script is a script that
is executed before the graph execution
starts. If we want to export parameter
values to the graph, we can write the
exports in the start script; when the
graph runs, those values are available
to it.
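As a minimal sketch of the idea (the parameter name RUN_DATE and its use are hypothetical), a start script simply runs before the graph and can export values the graph later reads:

```shell
#!/bin/sh
# start script (sketch) -- runs before graph execution; exports a
# value computed at run time so the graph can pick it up as a
# parameter. RUN_DATE is an assumed, illustrative parameter name.

RUN_DATE=$(date '+%Y%m%d')
export RUN_DATE

echo "start script exported RUN_DATE=${RUN_DATE}"
```

The graph would then reference $RUN_DATE, for example inside an input-file path, instead of a hard-coded date.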

Q.What is the difference between


sandbox and EME, can we perform
checkin and checkout through
sandbox/ Can anybody explain checkin
and checkout?

Answer. Sandboxes are work areas


used to develop, test or run code
associated with a given project. Only
one version of the code can be held
within the sandbox at any time.
The EME Datastore contains all versions
of the code that have been checked into
it. A particular sandbox is associated
with only one Project where as a Project
can be checked out to a number of
sandboxes.

Q.What is skew and skew


measurement?

Answer: Skew is the measure of how
unevenly data flows to each partition.
Suppose the input comes from 4
partitions and the total size is 1 GB:
1 GB = (100 MB + 200 MB + 300 MB + 500 MB)
average = 1000 MB / 4 = 250 MB
skew of partition 1
= (100 - 250) / 500 = -0.30;
calculate likewise for the 200, 300 and
500 MB partitions.
A skew value close to zero is always
desirable; skew is an indirect measure
of graph performance.
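Using the sample partition sizes above, the same per-partition calculation can be sketched with plain shell and awk (the sizes are illustrative only):

```shell
#!/bin/sh
# Skew per partition = (partition size - average) / largest partition,
# following the worked example: sizes 100, 200, 300, 500 MB, avg 250 MB.
sizes="100 200 300 500"

awk -v sizes="$sizes" 'BEGIN {
    n = split(sizes, s, " ")
    total = 0; max = 0
    for (i = 1; i <= n; i++) {
        total += s[i]
        if (s[i] > max) max = s[i]
    }
    avg = total / n          # 1000 / 4 = 250
    for (i = 1; i <= n; i++)
        printf "partition %d: %d MB, skew = %.2f\n", i, s[i], (s[i] - avg) / max
}'
# prints skews -0.30, -0.10, 0.10, 0.50 -- the closer to zero, the better
```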

Q.What is the latest version that is


available in Ab-initio?

Answer: The latest version of the GDE
is 1.15 and of the Co>Operating System
is 2.14 (at the time of writing).

Q.What is the Difference between DML


Expression and XFR Expression ?

Answer: The main difference between DML
and XFR is that
DML represents the format (record
layout) of the metadata, while
XFR represents the transform
functions, which contain the business
rules.

Q.What are the most commonly used


components in an Ab-Initio graph? Can
anybody give me a practical example of
a transformation of data, say customer
data in a credit card company, into
meaningful output based on business
rules?

Answer: The most commonly used


components in any Ab-Initio project
are:
Input File / Output File
Input Table / Output Table
Lookup File
Reformat, Gather, Join, Run SQL, Join
with DB, Compress components, Sort,
Trash, Partition by Expression,
Partition by Key, Concatenate

Q.Have you used rollup component?


Describe how ?

Answer: Rollup component can be used


in different number of ways. It basically
acts on a group of records based on a
certain key.
The simplest application would be to
count the number of records in a certain
file or table.
In this case there would not be any key
associated with it. A temporary variable
would be created, e.g. 'temp.count',
which would be incremented with
every record (since there is no key
here, all the records are treated as one
group) that flows through the transform,
like temp.count = temp.count + 1.
Again, the rollup component can be used
to discard duplicates from a group,
rollup basically acting as the Dedup
component in this case.

Q.What is the difference between
partitioning with key and round robin?

Answer: PARTITION BY KEY:


In this, we have to specify the key based
on which the partition will occur. Since it
is key based it results in very well
balanced data. It is useful for key
dependent parallelism.
PARTITION BY ROUND ROBIN:
In this, the records are partitioned in
sequential way, distributing data evenly
in blocksize chunks across the output
partition. It is not key based and results
in well balanced data especially with
blocksize of 1. It is useful for record
independent parallelism.

Q.How to work with parameterized


graphs?

Answer: One of the main purposes of
parameterized graphs is that if we need
to run the same graph n number of
times for different files, we set up the
graph parameters like $INPUT_FILE,
$OUTPUT_FILE etc and we supply the
values for these in the
Edit>parameters.These parameters are
substituted during the run time. we can
set different types of parameters like
positional, keyword, local etc.
The idea here is, instead of maintaining
different versions of the same graph, we
can maintain one version for different
files.

Q.How does MAX-CORE work?

Answer: Max-core is a value (specified
in KB). Whenever a component is
executed, it takes up to that much of
the memory we specified for execution.

Q.What does layout means in terms of


Ab Initio?

Answer: Before you can run an Ab Initio


graph, you must specify layouts to
describe the following to the
Co>Operating System:

The location of files

The number and locations of the


partitions of multifiles

The number of, and the locations in


which, the partitions of program
components execute

A layout is one of the following:

A URL that specifies the location of a


serial file

A URL that specifies the location of the


control partition of a multifile

A list of URLs that specifies the locations


of:
The partitions of an ad hoc multifile

The working directories of a


program component

Every component in a graph — both


dataset and program components —
has a layout. Some graphs use one
layout throughout; others use several
layouts and repartition data as needed
for processing by a greater or lesser
number of processors.

During execution, a graph writes various


files in the layouts of some or all of the
components in it. For example:

An Intermediate File component writes to


disk all the data that passes through it.

A phase break, checkpoint, or watcher


writes to disk, in the layout of the
component downstream from it, all the
data passing through it.

A buffered flow writes data to disk, in the


layout of the component downstream
from it, when its buffers overflow.

Many program components — Sort is one


example — write, then read and remove,
temporary files in their layouts.

A checkpoint in a continuous graph


writes files in the layout of every
component as it moves through the
graph.

Q.Can we load multiple files?

Answer: Load multiple files from my


perspective means writing into more
than one file at a time. If this is the
same case with you, Ab-Initio provides a
component called Write Multiple Files
(in the Dataset component group) which
can write multiple files at a time. But
the files to be written must be local
files, i.e., they should reside on your
local machine. For more information,
read about this component in the help
file.

Q.How would you do performance


tuning for already built graph ? Can you
let me know some examples?

Answer: Example: suppose a Sort is
used in front of a Merge component;
there is no use in adding the Sort,
since:
1) the Merge component has sorting
built in;
2) we can use a lookup instead of a
Join or Merge component;
3) suppose we want to join the data
coming from 2 files and we don't want
duplicates: we can use a union function
instead of adding an additional
component to remove duplicates.

Q.Which one is faster for processing


fixed length dmls or delimited dmls and
why ?

Answer: Fixed-length DMLs are faster
because the data is read directly by
length without any comparisons, but
with delimited DMLs every character has
to be compared against the delimiter,
and hence the delay.

Q.What is the function you would use to


transfer a string into a decimal?

Answer: For converting a string to a


decimal we need to typecast it using the
following syntax,
out.decimal_field :: ( decimal(
size_of_decimal ) ) string_field;
The above statement converts the
string to decimal and populates it to the
decimal field in output.

Q.What is the importance of EME in ab


initio?

Answer: EME is a repository in Ab


Inition and it used for checkin and
checkout for graphs also maintains
graph version.

Q.How do you add default rules in


transformer?

Answer: Double click on the transform


parameter of parameter tab page of
component properties, it will open
transform editor. In the transform editor
click on the Edit menu and then select
Add Default Rules from the dropdown. It
will show two options – 1) Match
Names 2) Wildcard.

Q.What is data mapping and data


modeling?

Answer: data mapping deals with the


transformation of the extracted data at
FIELD level i.e. the transformation of the
source field to target field is specified
by the mapping defined on the target
field. The data mapping is specified
during the cleansing of the data to be
loaded.
For Example:
source;
string(35) name = “Siva Krishna “;
target;
string(“01”) nm=NULL(“”);/*(maximum
length is string(35))*/
Then we can have a mapping like:
Straight move; trim the leading or
trailing spaces.
The above mapping specifies the
transformation of the field nm.

Q.What are the continuous components
in Ab-Initio?

Answer: Continuous components are used
to create graphs that produce useful
output files while running continuously.
Ex: Continuous Rollup, Continuous
Update, Batch Subscribe

Q.How do you add default rules in


transformer?

Answer: Open the transformer, go to the
Edit menu, then click Add Default Rules.
In Ab-Initio there is a concept called
rule priority, in which you can assign
a priority to the rules in a
transformer.
Let's have an example:
Output.var1 :1: input.var1 + 10
Output.var1 :2: 100
This example shows that the output
variable is assigned the input variable
+ 10, or, if the input variable does not
have a value, the default value 100 is
set on the output variable.
The numbers 1 and 2 represent the
priority.

Q.How do we run sequences of jobs,
where for example the output of job A
is the input to job B? How do we
coordinate the jobs?

Answer: By writing the wrapper scripts


we can control the sequence of
execution of more than one job.
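A minimal wrapper-script sketch of this idea (job_a and job_b are hypothetical stand-ins for deployed graph scripts such as job_a.ksh and job_b.ksh): run job A, and start job B only if A succeeded:

```shell
#!/bin/sh
# wrapper (sketch) -- coordinate two jobs where B consumes A's output.
# In a real project these functions would instead invoke the deployed
# graph scripts; stub functions stand in for them here.

job_a() { echo "job A: writing intermediate file"; }
job_b() { echo "job B: reading intermediate file"; }

if ! job_a; then
    echo "job A failed; not starting job B" >&2
    exit 1
fi

job_b
echo "sequence complete"
```

The same pattern extends to longer chains: each step runs only when the previous step's exit status is zero.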

Q.What are BROADCAST and
REPLICATE?
Answer: Broadcast takes data from
multiple inputs, combines it and sends
it to all the output ports.
Eg: You have 2 incoming flows (this
can be data parallelism or component
parallelism) on a Broadcast component,
one with 10 records and the other with
20 records. Then all the outgoing flows
(any number of flows) will each carry
10 + 20 = 30 records.
Replicate replicates the data of a
particular partition and sends it out
on multiple output ports of the
component, but maintains the partition
integrity.
Eg: Your incoming flow to Replicate has
a data parallelism level of 2, with one
partition having 10 records and the
other having 20. Now suppose you have
3 output flows from Replicate. Then
each flow will have 2 data partitions
with 10 and 20 records respectively.
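The record counts in the two examples can be simulated with plain shell (temporary files stand in for the partitions; this only illustrates the counting, not the components themselves):

```shell
#!/bin/sh
# Two incoming partitions of 10 and 20 records (simulated with files).
tmp=$(mktemp -d)
seq 1 10 > "$tmp/p0"
seq 1 20 > "$tmp/p1"

# Broadcast: inputs are combined, so every output flow sees 10 + 20 = 30.
cat "$tmp/p0" "$tmp/p1" > "$tmp/broadcast_out"
echo "broadcast: each output flow carries $(wc -l < "$tmp/broadcast_out" | tr -d ' ') records"

# Replicate: each output flow keeps the partitions as-is (10 and 20).
cp "$tmp/p0" "$tmp/rep_out_p0"
cp "$tmp/p1" "$tmp/rep_out_p1"
echo "replicate: each output flow keeps partitions of $(wc -l < "$tmp/rep_out_p0" | tr -d ' ') and $(wc -l < "$tmp/rep_out_p1" | tr -d ' ') records"
# (files under $tmp can be removed afterwards)
```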

Q.When using multiple DML statements


to perform a single unit of work, is it
preferable to use implicit or explicit
transactions, and why.

Answer: Implicit transactions are
handled internally by the database,
while explicit transactions are opened
by the user. When multiple DML
statements form a single unit of work,
explicit transactions are preferable,
since the whole unit can be committed
or rolled back together.

Q.What are kinds of layouts does ab


initio supports

Answer: Basically there are serial and


parallel layouts supported by AbInitio. A
graph can have both at the same time.
The parallel one depends on the degree
of data parallelism. If the multi-file
system is 4-way parallel then a
component in a graph can run 4 way
parallel if the layout is defined such as
it’s same as the degree of parallelism.

Q.What is the difference between look-


up file and look-up, with a relevant
example?

Answer: A lookup is a component of


abinitio graph where we can store data
and retrieve it by using a key parameter.
A lookup file is the physical file where
the data for the lookup is stored.

Q.How will you test a dbc file from


command prompt?

Answer: A .dbc file can be tested using
the m_db command

eg: m_db test myfile.dbc

Q.Can we merge two graphs?

Answer: You can not merge two ab-


Initio graphs. You can use the output of
one graph as input for another. You can
also copy/paste the contents between
graphs.

Q.Explain the differences between api


and utility mode?

Answer: api and Utility are Database


Interfaces.

api use SQL where table constrains are


checked against the data before loading
data into Database.

Utility uses Bulk Loading where table


constraints are disabled first and data
loaded into Database and then table
constraints are checked against data.

Data loading using utility mode is
faster when compared to api mode. If a
crash occurs while loading data into
the database, we can commit and roll
back in api mode, but in utility mode
we need to reload the whole data.

Q.How to Schedule Graphs in Ab-


Initio,like work flow Schedule in
Informatica? And where we must use
Unix shell scripting in Ab-Initio?

Answer: We can use Autosys, Control-


M, or any other external scheduler to
schedule graphs in Ab-Initio.

We can take care of dependencies in


many ways. For example, if scripts
should run sequentially, we can arrange
for this in Autosys, or we can create a
wrapper script and put there several
sequential commands (nohup
command1.ksh & ; nohup
command2.ksh &; etc). We can even
create a special graph in Ab-Initio to
execute individual scripts as needed.

Q.What is Environment project in Ab-


Initio?

Answer: Environment project is a


special public project that exists in
every Ab-Initio environment. It contains
all the environment parameters required
by the private or public projects which
constitute AI Standard Environment.

Q.What is Component Folding?What is


the use of it?

Answer: Component Folding is a new


feature by which Co>operating System
combines a group of components and
runs them as a single process.

Component Folding improves the


performance of graph.

Pre-Requirements for Component


Folding

The components must be foldable.


They must be in same phase and
layout.
Components must be connected via
straight flow

Q.How do you debug a graph if an error
occurs while running?

Answer: There are many ways to debug


a graph. we can use

Debugger
File Watcher
Intermediate File for debugging
purpose.

Q.What do u mean by $RUN?

Answer: This is a parameter variable;
it contains the path of the project
sandbox's run directory. Use this
parameter instead of a hard-coded
value; it is the default sandbox
run-directory parameter.

fin -------> top-level directory ($AI_PROJECT)

|---- mp ------> second-level directory ($AI_MP)
|---- xfr -----> second-level directory ($AI_XFR)
|---- run -----> second-level directory ($AI_RUN)
|---- dml -----> second-level directory ($AI_DML)

Q.What is the importance of EME in ab-


Initio?

Answer: EME is a repository in Ab-Initio


and it used for check-in and checkout
for graphs also maintains graph version.

EME is the source code control system
in the Ab-Initio world. It is the
repository where the code versions of
all sandbox (project) objects,
including graph versions, are
maintained; we simply check graphs in
and out and modify them accordingly. A
lock is put on an object once it is
accessed by any user.

Q.What is difference between sandbox


parameters and graph parameters?

Answer: Sandbox parameters are
common parameters for the project;
they are accessible anywhere within
the project. Graph parameters are used
within a graph and cannot be accessed
from other graphs; these are called
local parameters.

Q.How do you connect EME to Ab-Initio


Server?

Answer: There are several ways of
connecting to the EME:
Set AB_AIR_ROOT
From the GDE, connect to the EME
data-store
Log in to the EME web interface
Use the air command from the command
line

Q.What is use of co>operating system


between GDE and Host?

Answer: The Co>Operating System is the
heart of the GDE. It always refers to
the host settings, environment
variables and functions while running
graphs through the GDE, and it
interfaces the connection-setting
information between the host and the
GDE.

Q.What is the use of Sandbox ? What is


it.?

Answer: Sandbox is a directory


structure of which each directory level is
assigned a variable name, is used to
manage check-in and checkout of
repository based objects such as mp,
run, dml, db, xfr and sql (graphs, graph
ksh files, wrapper scripts, dml files, xfr
files, dbc files, sql files.)

Fin ——-> top-level directory (


$AI_PROJECT )

|—- mp ——-> second level directory


($AI_MP )

|—- xfr ——-> second level directory


($AI_XFR )

|—- run ——–> second level directory


($AI_RUN )

|—- dml ——-> second level directory


($AI_DML )

Sandbox contains various directories,


which is used for specific purpose only.
The mp directory is used for storing
data mapping details about between
sources and targets or components and
the file extension must be *.mp. The xfr
directory denotes purpose of stores the
transform files and the file extension
must be *.xfr. The dml directory is used
for storing all meta-data information of
data with Ab-Initio supported data types
and the file extensions must be *.dml.
The run directory contains only the
graph’s shell script (korn shell script)
files that are created after deploying the
graph.

In short, the sandbox stores all of
these kinds of project information.

Q.What is mean by EME Data Store and


what is use of EME Data Store in
Enterprise world?

Answer: The EME Data Store is the
Enterprise Meta Environment data store
(enterprise repository); it contains
any number of projects (sandboxes)
that share metadata between them.
These sandbox project objects (mp,
run, db, xfr, dml) can easily be
checked in to and checked out of the
repository.

Mode:

In the EME Data-store Mode box of the


EME Data-store Settings dialog, choose
one of the following:

Source Code Control — This is the


recommended setting. When you set a
data-store to this mode, you must check
out a project in order to work on it. This
prevents multiple users from making
conflicting changes to a project.

Full Access — This setting is strongly


not recommended. It is for advanced
users only. It allows you to edit a project
in the data-store without checking it out.

Save Script When Graph Saved to


Sandbox

In the EME Data-store Settings dialog,


select this option to have the GDE save
the script it generates for a graph when
you save the graph. The script lets you
run the graph without the GDE if, for
example, you relocate the project.
