
datastage 8.1: September 2010 http://datastageinfosoft.blogspot.com/2010_09_01_archive.html


The Complete datastage solutions by Madhava


DS interview questions
1. What is the difference between server jobs and parallel jobs?
Parallel jobs are available if you have DataStage 6.0 PX or DataStage 7.0 installed.
They are especially useful when you have large amounts of data to process.
2. What is merging? And how do you use merge?
Merge joins rows from two inputs on their common key columns (see question 6 for an
example with two tables).
3. How do we use the NLS function in DataStage? What are the advantages of NLS? Where
can we use it? Explain briefly.
According to the manuals and documents, there are different levels of interfaces, more
specifically the Teradata interface operators, DB2 interface operators, Oracle interface
operators and SAS-interface operators. Orchestrate National Language Support (NLS)
makes it possible to process data in international languages using Unicode character
sets. International Components for Unicode (ICU) libraries support NLS functionality in
Orchestrate. Operators with NLS functionality:
* Teradata interface operators
* switch operator
* filter operator
* DB2 interface operators
* Oracle interface operators
* SAS-interface operators
* transform operator
* modify operator
* import and export operators
* generator operator
4. What is APT_CONFIG in DataStage?
APT_CONFIG_FILE (not just APT_CONFIG) is the environment variable that points to the
configuration file, which defines the nodes (with their scratch and temp areas) for the
specific project.
5. What is OCI? And how is it used with the ETL tools?
OCI stands for Oracle Call Interface, which the Oracle OCI stages use to connect to
Oracle. When a client has bulk data and retrieval takes much longer, Orabulk is used
instead, so that the data is divided up and retrieved in bulk.
6. What is merge and how can it be done? Explain with a simple example taking 2 tables.
Merge is used to join two tables. It takes the key columns and sorts them in ascending
or descending order. Consider two tables, Emp and Dept. If we want to join these two
tables, they have DeptNo as a common key, so we can give that column name as the key.
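As a rough illustration of the Emp/Dept example in question 6, here is a small Python sketch (not DataStage code) of a sort-then-join on DeptNo. The sample rows and the function name merge_on_key are made up for illustration, and the sketch assumes Dept has one row per DeptNo.

```python
# Hypothetical sample data; only the Emp/Dept/DeptNo names come from the text.
emp = [
    {"EmpNo": 2, "Name": "Ravi", "DeptNo": 20},
    {"EmpNo": 1, "Name": "Asha", "DeptNo": 10},
]
dept = [
    {"DeptNo": 10, "DeptName": "Sales"},
    {"DeptNo": 20, "DeptName": "Finance"},
]


def merge_on_key(master, update, key):
    """Sort both inputs on the key, then walk them together (a sort-merge join).

    Assumes the update input has at most one row per key value.
    """
    master = sorted(master, key=lambda r: r[key])
    update = sorted(update, key=lambda r: r[key])
    out, j = [], 0
    for row in master:
        # Advance the update side until its key catches up with the master key.
        while j < len(update) and update[j][key] < row[key]:
            j += 1
        if j < len(update) and update[j][key] == row[key]:
            out.append({**row, **update[j]})
    return out


merged = merge_on_key(emp, dept, "DeptNo")
```

Sorting first is what lets the join run in a single pass over both inputs, which is why the key columns have to be sorted the same way on both sides.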
7. What is Version Control?
Version Control:
* stores different versions of DS jobs
* runs different versions of the same job
* reverts to a previous version of a job
* lets you view version histories
8. What are the Repository Tables in DataStage and what are they?
A data warehouse is a repository (centralized as well as distributed) of data, able to
answer any ad hoc, analytical, historical or complex queries. Metadata is data about
data. Examples of metadata include data element descriptions, data type descriptions,
attribute/property descriptions, range/domain descriptions, and process/method
descriptions. The repository environment encompasses all corporate metadata
resources: database catalogs, data dictionaries, and navigation services. Metadata
includes things like the name, length, valid values, and description of a data element.
Metadata is stored in a data dictionary and repository. It insulates the data warehouse
from changes in the schema of operational systems. In DataStage I/O and Transfer,
under the Interface tab, the Input, Output and Transfer pages have 4 tabs; the last one
is Build, and under that you can find the TABLE NAME. The DataStage client components
are:
* Administrator: administers DataStage projects and conducts housekeeping on the server
* Designer: creates DataStage jobs that are compiled into executable programs
* Director: used to run and monitor the DataStage jobs
* Manager: allows you to view and edit the contents of the repository
9. How can we pass parameters to a job by using a file?
You can do this by reading the parameters from a UNIX file and then calling the
execution of the DataStage job. The DS job has the parameters defined, and their values
are passed in from UNIX.
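One way to picture question 9 is the Python sketch below, which reads NAME=VALUE pairs and turns them into a dsjob argument list using one -param option per parameter. The NAME=VALUE file layout, the function names, and the project, job and parameter names are assumptions made for illustration. The sketch only builds the command and does not run it.

```python
def read_params(lines):
    """Parse NAME=VALUE lines, skipping blank lines and # comments."""
    params = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        params[name.strip()] = value.strip()
    return params


def build_dsjob_command(project, job, params):
    """Build (but do not run) a dsjob argument list with one -param per entry."""
    cmd = ["dsjob", "-run"]
    for name, value in params.items():
        cmd += ["-param", f"{name}={value}"]
    return cmd + [project, job]


# Hypothetical file contents; in practice these lines would come from open(path).
sample = [
    "# parameter file",
    "FILE_NAME=/data/in/emp.txt",
    "",
    "LOAD_DATE=2010-09-01",
]
command = build_dsjob_command("myproject", "myjob", read_params(sample))
```

On a server where dsjob is on the PATH, a list like this could be handed to subprocess.run(command) from a scheduler-driven script.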
10. Where does a UNIX script of DataStage execute, on the client machine or on the
server? If on the server, how does it execute there?
DataStage jobs are executed on the server machine only. Nothing is stored on the client
machine.


11. Default nodes for DataStage Parallel Edition
The number of nodes depends on the number of processors in your system. If your system
has two processors, we get two nodes by default.
12. What happens if RCP is disabled?
In that case OSH has to perform import and export every time the job runs, and the
processing time of the job also increases.
13. I want to process 3 files sequentially, one by one. How can I do that? While
processing, it should fetch the files automatically.
If the metadata for all the files is the same, create a job that takes the file name as
a parameter, then use the same job in a routine and call the job with different file
names, or create a sequencer to use it.
14. Scenario-based question: Suppose 4 jobs (job 1, job 2, job 3, job 4) are controlled
by a sequencer. Job 1 has 10,000 rows, but after the run only 5,000 rows have been
loaded into the target table; the remaining rows are not loaded and the job aborts.
How can you sort out the problem?
Suppose the job sequencer synchronizes or controls 4 jobs but job 1 has a problem. In
this situation you should go to Director and check what type of problem is shown: a
data type problem, a warning message, a job failure or a job abort. If the job failed,
it means a data type problem.
15. What is a Batch Program and how can it be generated?
A batch program is a program generated at run time and maintained by DataStage itself,
but you can easily change it on the basis of your requirements (Extraction,
Transformation, Loading). How batch programs are generated depends on the nature of
your job, either simple
16. In how many places can you call Routines?
You can call them in four places:
(i) Transform of routine: (A) Date Transformation, (B) Upstring Transformation
(ii) Transform of the Before & After Subroutines
(iii) XML transformation
(iv) Web-based transformation


17. How do you fix the error "OCI has fetched truncated data" in DataStage?
Can we use the Change Capture stage to get the truncated data? Members, please confirm.
18. Importance of Surrogate Key in data warehousing?
A Surrogate Key is a Primary Key for a Dimension table. Its main importance is that it
is independent of the underlying database, i.e. a Surrogate Key is not affected by the
changes going on in the database.
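A minimal Python sketch of the surrogate key idea from question 18: each natural (source-system) key is mapped to a warehouse-owned integer that never changes, whatever happens to the source key format. The class name and the sample customer codes are hypothetical.

```python
class SurrogateKeyGenerator:
    """Hands out sequential integer keys, one per distinct natural key."""

    def __init__(self, start=1):
        self._next = start
        self._by_natural_key = {}

    def key_for(self, natural_key):
        # Reuse the existing surrogate key if this natural key was seen before.
        if natural_key not in self._by_natural_key:
            self._by_natural_key[natural_key] = self._next
            self._next += 1
        return self._by_natural_key[natural_key]


gen = SurrogateKeyGenerator()
keys = [gen.key_for(code) for code in ["CUST-A", "CUST-B", "CUST-A"]]
```

The dimension table would store the integer as its primary key and keep the natural key as an ordinary attribute.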
19. What's the difference between DataStage Developers and DataStage Designers? What
are the skills required for this?
A DataStage developer is the one who codes the jobs. A DataStage designer is the one
who designs the job: he deals with the blueprints and designs the jobs and the stages
that are required in developing the code.
20. How do you merge two files in DS?
Either use the Copy command as a Before-job subroutine if the metadata of the 2 files
is the same, or create a job to concatenate the 2 files into one if the metadata is
different.
21. How do we automate DS jobs?
We can call a DataStage batch job from the command prompt using 'dsjob', and we can
also pass all the parameters from the command prompt. Then call this shell script from
any of the schedulers available on the market. The 2nd option is to schedule these jobs
using Data
22. What is DS Director used for? Did you use it?
DataStage Director is used to run and validate jobs. We can open DataStage Director
from DataStage Designer itself.
23. What is DS Manager used for? Did you use it?
DataStage Manager is used for export and import. The main use of export and import is
sharing jobs and projects from one project to another.
24. What are the types of Hashed File?
Hashed File is classified broadly into 2 types. a) Static - Sub divided into 17 types
based on Primary Key Pattern. b) Dynamic - sub divided into 2 types i) Generic ii)
Specific. Default Hashed file is "Dynamic – Type
25. How do you eliminate duplicate rows?
Removal of duplicates is done in two ways: 1. Use the "Duplicate Data Removal" stage, or
2. Use GROUP BY on all the columns used in the SELECT, and the duplicates will go away.
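The second approach in question 25 (group by on all the selected columns) amounts to keeping one row per distinct combination of column values. A small Python sketch of that idea, with made-up rows and a hypothetical function name:

```python
def remove_duplicates(rows, key_columns):
    """Keep the first row for each distinct combination of key_columns."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [
    {"id": 1, "city": "Pune"},
    {"id": 1, "city": "Pune"},   # exact duplicate of the first row
    {"id": 2, "city": "Pune"},
]
deduped = remove_duplicates(rows, ["id", "city"])
```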


26. What about System variables?
DataStage provides a set of variables containing useful system information that you
can access from a transform or routine. System variables are read-only.

@DATE The internal date when the program started. See the Date function.

@DAY The day of the month extracted from the value in @DATE.

@FALSE The compiler replaces the value with 0.

@FM A field mark, Char (254).

@IM An item mark, Char (255).

@INROWNUM Input row counter. For use in constraints and derivations in Transformer
stages.

@OUTROWNUM Output row counter (per link). For use in derivations in Transformer
stages.

@LOGNAME The user login name.

@MONTH The current month extracted from the value in @DATE.

@NULL The null value.

@NULL.STR The internal representation of the null value, Char (128).

@PATH The pathname of the current DataStage project.

@SCHEMA The schema name of the current DataStage project.

@SM A subvalue mark (a delimiter used in Universe files), Char(252).

@SYSTEM.RETURN.CODE
Status codes returned by system processes or commands.

@TIME The internal time when the program started. See the Time function.

@TM A text mark (a delimiter used in Universe files), Char (251).

@TRUE The compiler replaces the value with 1.

@USERNO The user number.

@VM A value mark (a delimiter used in Universe files), Char (253).

@WHO The name of the current DataStage project directory.

@YEAR The current year extracted from @DATE.

REJECTED Can be used in the constraint expression of a Transformer stage of an
output link. REJECTED is initially TRUE, but is set to FALSE whenever an output link is
successfully written.
27. What is DS Designer used for? Did you use it?
You use the Designer to build jobs by creating a visual design that models the flow and
transformation of data from the data source through to the target warehouse. The
Designer graphical interface lets you select stage icons, drop them onto the Designer
28. What is DS Administrator used for? Did you use it?
The Administrator enables you to set up DataStage users, control the purging of the
Repository, and, if National Language Support (NLS) is enabled, install and manage
maps and locales.


29. Dimensional modeling is again subdivided into 2 types.
A) Star Schema - Simple and much faster. Denormalized form.
B) Snowflake Schema - Complex with more granularity. More normalized form.
30. How will you call an external function or subroutine from DataStage?
DataStage has an option to call external programs: ExecSH.
31. How do you pass a filename as the parameter for a job?
During job development we can create a parameter 'FILE_NAME', and its value can be
passed while running the job.
32. How do you handle date conversions in DataStage? Convert an mm/dd/yyyy format
to yyyy-dd-mm.
We use a) the "Iconv" function - internal conversion, and b) the "Oconv" function -
external conversion. The function to convert mm/dd/yyyy format to yyyy-dd-mm is
Oconv(Iconv(FieldName,"D/MDY[2,2,4]"),"D-YDM[4,2,2]")
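The two-step idea in question 32 (parse the external text into an internal date, then format the internal date back out) can be mirrored in Python with the standard datetime module. This is only an analogy to Iconv/Oconv, not DataStage BASIC, and the function name is made up.

```python
from datetime import datetime


def mdy_to_ydm(text):
    """Convert 'mm/dd/yyyy' text to 'yyyy-dd-mm' text."""
    # Step 1 (like Iconv): external string -> internal date value.
    parsed = datetime.strptime(text, "%m/%d/%Y")
    # Step 2 (like Oconv): internal date value -> output string.
    return parsed.strftime("%Y-%d-%m")


converted = mdy_to_ydm("10/31/2006")
```

Going through a real date value in the middle means an impossible input such as month 13 is rejected instead of being silently rearranged.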
33. What's the difference between an operational data store (ODS) and a data warehouse?
ODS data is volatile, whereas data warehouse (DWH) data is nonvolatile, historical and
time-variant. In simple terms, ODS is dynamic data.
34. When should we use ODS?
DWHs are typically read-only and batch-updated on a schedule. ODSs are maintained in
closer to real time and are trickle-fed constantly.
35. What are Job parameters?
These parameters are used to provide administrative access and to change run-time
values of the job.
EDIT > JOB PARAMETERS
In the Parameters tab we can define the name, prompt, type and value.
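36. How can we join one Oracle source and a Sequential file?
Join and Lookup are used to join an Oracle source and a sequential file.
37. What are the Iconv and Oconv functions?
Iconv and Oconv are conversion functions, commonly used for dates: Iconv converts to the
internal format and Oconv converts to an output format.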
38. Difference between a Hash file and a Sequential file?
A hash file stores data based on a hash algorithm and a key value. A sequential file
is just a file with no key column. A hash file can be used as a reference for lookups;
a sequential file cannot.
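To make the contrast in question 38 concrete, here is a Python sketch that uses a dict as a stand-in for a keyed hash file and a plain list as a stand-in for a sequential file. The department rows are made up. Both lookups return the same answer, but the keyed one goes straight to the row while the sequential one has to scan.

```python
# Hypothetical reference data: (DeptNo, DeptName) pairs.
dept_rows = [(10, "Sales"), (20, "Finance"), (30, "HR")]


def sequential_lookup(rows, dept_no):
    """Scan the rows in order until the key matches (no key structure to use)."""
    for key, name in rows:
        if key == dept_no:
            return name
    return None


# Building the keyed structure once is the analogue of loading a hash file.
dept_by_key = dict(dept_rows)


def hashed_lookup(index, dept_no):
    """Jump directly to the row via the hashed key."""
    return index.get(dept_no)
```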
39. How do you rename all of the jobs to support your new file-naming conventions?
Create an Excel spreadsheet with the new and old names. Export the whole project as a
dsx. Write a Perl program that does a simple rename of the strings by looking them up
in the Excel file. Then import the new dsx file, probably into a new project for testing.
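The answer to question 39 suggests Perl; the sketch below does the same kind of string rename in Python instead. It treats the export purely as text and replaces whole-word job names from an old-to-new mapping. The job names and the two-line sample are simplified stand-ins, not a real dsx export, and the mapping would in practice be read from the spreadsheet (for example saved as CSV).

```python
import re


def rename_jobs(dsx_text, renames):
    """Replace whole-word occurrences of each old name with its new name."""
    if not renames:
        return dsx_text
    # Try longer names first so a name that is a prefix of another cannot win.
    ordered = sorted(renames, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(old) for old in ordered) + r")\b")
    # A single pass means a new name is never renamed a second time.
    return pattern.sub(lambda m: renames[m.group(1)], dsx_text)


# Hypothetical old -> new names.
renames = {"LoadCust": "JB_LOAD_CUSTOMER", "LoadCustDim": "JB_LOAD_CUSTOMER_DIM"}
sample = 'Identifier "LoadCust"\nIdentifier "LoadCustDim"\n'
renamed = rename_jobs(sample, renames)
```

Importing the result into a fresh project first, as the answer recommends, is a cheap way to catch any rename that touched something it should not have.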
40. Does the selection of 'Clear the table and Insert rows' in the ODBC stage send
a Truncate statement to the DB, or does it do some kind of Delete logic?
There is no TRUNCATE on ODBC stages. The option is 'Clear the table', and that is a
DELETE FROM statement. On an OCI stage such as Oracle, you do have both Clear and
Truncate options. They are radically different in permissions (Truncate requires you to
have ALTER TABLE permissions, whereas Delete doesn't).
41. Tell me one situation from your last project where you faced a problem, and how did
you solve it?
A. The jobs in which data was read directly from OCI stages were running extremely
slowly. I had to stage the data before sending it to the Transformer to make the jobs
run faster.
B. The job aborted in the middle of loading some 500,000 rows. The options were either
to clean/delete the loaded data and then run the fixed job, or to run the job again from
the row at which it had aborted. To make sure the load was proper we opted for the
former.
42. The above might raise another question: why do we have to load the dimension
tables first, then the fact tables?
As we load the dimension tables, the (primary) keys are generated, and these primary
keys are foreign keys in the fact tables.
43. How will you determine the sequence of jobs to load into the data warehouse?
First we execute the jobs that load the data into the dimension tables, then the fact
tables, then the aggregate tables (if any).
44. What are the command line functions that import and export DS jobs?
dsimport.exe imports the DataStage components. dsexport.exe exports the DataStage
components.
45. What is the utility you use to schedule the jobs on a UNIX server other than
using Ascential Director?
Use the crontab utility along with the dsexecute() function, with the proper parameters
passed.
46. How would you call an external Java function that is not supported by DataStage?
Starting from DS 6.0 we have the ability to call external Java functions using a Java
package from Ascential. In this case we can even use the command line to invoke the
Java function, write the return values from the Java program (if any) to a file, and use
that file as a source in a DataStage job.
47. What will you do in a situation where somebody wants to send you a file, and you
need to use that file as an input or reference and then run a job?
A. Under Windows: use 'WaitForFileActivity' under the Sequencers and then run the
job. Maybe you can schedule the sequencer around the time the file is expected to
arrive.
B. Under UNIX: poll for the file. Once the file has arrived, start the job.
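The "poll for the file" option in question 47 could look roughly like the Python sketch below: check for the file at a fixed interval and give up after a timeout. The function name and the default interval are assumptions, and a real script would start the job once the function returns True.

```python
import os
import time


def wait_for_file(path, timeout_seconds, poll_seconds=1.0):
    """Return True as soon as path exists, or False once the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
```

A monotonic clock is used for the deadline so that a change to the system time during the wait does not shorten or extend it.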
48. Read about the String functions in DS
Functions like [] -> the sub-string function and ':' -> the concatenation operator.
Syntax: string [ [ start, ] length ]
string [ delimiter, instance, repeats ]
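For readers who know Python better than BASIC, the two bracket forms in question 48 behave roughly like the helper functions below. This is an approximation written for illustration (the function names are made up), covering only the case where start is given, with 1-based positions as in BASIC.

```python
def substring(text, start, length):
    """Approximates text[start, length]: start is 1-based."""
    return text[start - 1:start - 1 + length]


def field(text, delimiter, instance, repeats=1):
    """Approximates text[delimiter, instance, repeats]: 1-based delimited fields."""
    parts = text.split(delimiter)
    return delimiter.join(parts[instance - 1:instance - 1 + repeats])


# The ':' concatenation operator corresponds to '+' on Python strings.
full_name = "Data" + "Stage"
```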
49. How did you connect with DB2 in your last project?
Most of the time the data was sent to us in the form of flat files; the data was dumped
and sent to us. In some cases where we needed to connect to DB2, for look-ups for
instance, we used ODBC drivers to connect to DB2 (or DB2-UDB), depending on the
situation.
50. What are Sequencers?
Sequencers are job control programs that execute other jobs with preset job
parameters.
51. How did you handle an 'Aborted' sequencer?
In almost all cases we have to manually delete from the DB the data inserted by it,
fix the job, and then run the job again.
52. What other performance tuning have you done in your last project to increase the
performance of slowly running jobs?
Staged the data coming from ODBC/OCI/DB2UDB stages or any database on the server
using Hash/Sequential files, for optimum performance and also for data recovery in
case a job aborts. Tuned the 'Array Size' and 'Rows per Transaction' numerical values
of the OCI stage.
53. How did you handle reject data?
Typically a Reject link is defined and the rejected data is loaded back into the data
warehouse. So a Reject link has to be defined for every Output link from which you wish
to collect rejected data. Rejected data is typically bad data, such as duplicates of
primary keys or null rows.
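As a sketch of the reject idea in question 53, the Python function below routes each row either to an "accepted" list or to a "rejected" list, treating a null key or a repeated primary key as bad data. The function name, the column name and the sample rows are made up for illustration.

```python
def split_rejects(rows, key):
    """Send rows with a null or duplicate key to the reject list."""
    accepted, rejected, seen = [], [], set()
    for row in rows:
        value = row.get(key)
        if value is None or value in seen:
            rejected.append(row)      # goes down the reject link
        else:
            seen.add(value)
            accepted.append(row)      # goes down the normal output link
    return accepted, rejected


rows = [{"id": 1}, {"id": None}, {"id": 1}, {"id": 2}]
accepted, rejected = split_rejects(rows, "id")
```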
54. If you worked with DS 6.0 and later versions, what are Link Partitioner and
Link Collector used for?
Link Partitioner - used for partitioning the data. Link Collector - used for collecting
the partitioned data.
55. What are Routines, where/how are they written, and have you written any routines
before?
Routines: Routines are stored in the Routines branch of the DataStage Repository,
where you can create, view, or edit them using the Routine dialog box. The following
program components are classified as routines:
• Transform functions. These are functions that you can use when defining custom
transforms. DataStage has a number of built-in transform functions, which are located
in the Routines ➤ Examples ➤ Functions branch of the Repository. You can also define
your own transform functions in the Routine dialog box.
• Before/After subroutines. When designing a job, you can specify a subroutine to run
before or after the job, or before or after an active stage. DataStage has a number of
built-in before/after subroutines, which are located in the Routines ➤ Built-in ➤
Before/After branch in the Repository. You can also define your own before/after
subroutines using the Routine dialog box.
• Custom Universe functions. These are specialized BASIC functions that have been
defined outside DataStage. Using the Routine dialog box, you can get DataStage to
create a wrapper that enables you to call these functions from within DataStage. These
functions are stored under the Routines branch in the Repository. You specify the
category when you create the routine. If NLS is enabled, you should be aware of any
mapping requirements when using custom Universe functions. If a function uses data in
a particular character set, it is your responsibility to map the data to and from
Unicode.
• ActiveX (OLE) functions. You can use ActiveX (OLE) functions as programming
components within DataStage. Such functions are made accessible to DataStage by
importing them. This creates a wrapper that enables you to call the functions. After
import, you can view and edit the BASIC wrapper using the Routine dialog box. By
default, such functions are located in the Routines ➤ Class name branch in the
Repository, but you can specify your own category when importing the functions.
When using the Expression Editor, all of these components appear under the DS
Routines… command on the Suggest Operand menu. A special case of routine is the
job control routine. Such a routine is used to set up a DataStage job that controls other
DataStage jobs. Job control routines are specified in the Job control page on the Job
Properties dialog box. Job control routines are not stored under the Routines branch in
the Repository.
Transforms: Transforms are stored in the Transforms branch of the DataStage
Repository, where you can create, view or edit them using the Transform dialog box.
Transforms specify the type of data transformed, the type it is transformed into, and
the expression that performs the transformation. DataStage is supplied with a number
of built-in transforms (which you cannot edit). You can also define your own custom
transforms, which are stored in the Repository and can be used by other DataStage
jobs. When using the Expression Editor, the transforms appear under the
DS Transform… command on the Suggest Operand menu.
Functions: Functions take arguments and return a value. The word "function" is applied
to many components in DataStage:
• BASIC functions. These are one of the fundamental building blocks of the BASIC
language. When using the Expression Editor, you can access the BASIC functions via the
Function… command on the Suggest Operand menu.
• DataStage BASIC functions. These are special BASIC functions that are specific to
DataStage. These are mostly used in job control routines. DataStage functions begin
with DS to distinguish them from general BASIC functions. When using the Expression
Editor, you can access the DataStage BASIC functions via the DS Functions… command
on the Suggest Operand menu.
The following items, although called "functions," are classified as routines and are
described under "Routines" above. When using the Expression Editor, they all appear
under the DS Routines… command on the Suggest Operand menu.
• Transform functions
• Custom Universe functions
• ActiveX (OLE) functions
Expressions: An expression is an element of code that defines a value. The word
"expression" is used both as a specific part of BASIC syntax, and to describe portions
of code that you can enter when defining a job. Areas of DataStage where you can use
such expressions are:
• Defining breakpoints in the debugger
• Defining column derivations, key expressions and constraints in Transformer stages
• Defining a custom transform
In each of these cases the DataStage Expression Editor guides you as to what
programming elements you can insert into the expression.
56. What are the OConv() and IConv() functions and where are they used?
IConv() - converts a string to an internal storage format. OConv() - converts an
expression to an output format.
57. How did you connect to DB2 in your last project?
Using DB2 ODBC drivers.
58. Do you know about MetaStage?
In simple terms, metadata is data about data, and MetaStage can be anything like
DS (dataset, sequential file, etc.).
59. Do you know about INTEGRITY/QUALITY stage?
Integrity/QualityStage is a data integration tool from Ascential which is used to
standardize/integrate the data from different sources.
60. What versions of DS have you worked with?
DS 7.0.2/6.0/5.2
61. What are Static Hash files and Dynamic Hash files?
The names themselves suggest what they mean. In general we use Type-30 dynamic
hash files. The data file has a default size of 2 GB, and the overflow file is used if
the data exceeds the 2 GB size.
62. What is the Hash file stage and what is it used for?
Used for look-ups. It is like a reference table. It is also used in place of ODBC/OCI
tables for better performance.
63. Have you ever been involved in updating DS versions, like DS 5.X? If so, tell us
some of the steps you have taken in doing so.
Yes. The following are some of the steps I have taken in doing so:
1) Definitely take a backup of the whole project(s) by exporting the project as a
.dsx file.
2) See that you are using the same parent folder for the new version also for your
old jobs.
64. Did you parameterize the job or hard-code the values in the jobs?
Always parameterized the job. Either the values come from Job Properties or from a
'Parameter Manager', a third-party tool. There is no way you will hard-code some
parameters in your jobs. The often-parameterized variables in a job are: DB
DSN name
65. Tell me the environment in your last projects.
Give the OS of the server and the OS of the client of your most recent project.
66. How many jobs have you created in your last project?
100+ jobs for every 6 months if you are in development; if you are in testing, 40 jobs
for every 6 months, although it need not be the same number for everybody.
67. What are the often-used stages, or the stages you worked with in your last
project?
A) Transformer, ORAOCI8/9, ODBC, Link-Partitioner, Link-Collector, Hash,
Aggregator, Sort.
68. What are XML files, how do you read data from XML files, and what stage is to be
used?
In the palette there are Real Time stages like XML Input, XML Output and XML
Transformer.
69. Suppose there are a million records; did you use OCI? If not, then what stage do
you prefer?
Using Orabulk.
70. How do you pass the parameter to the job sequence if the job is running at
night?
Two ways:
1. Set the default values of the parameters in the Job Sequencer and map these
parameters to the job.
2. Run the job in the sequencer using the dsjob utility, where we can specify the
values to be taken for each parameter.
71. What happens if the job fails at night?
The Job Sequence aborts.
72. What is SQL tuning? How do you do it?
In the database, using hints.
73. How do you track performance statistics and enhance it?
Through the Monitor we can view the performance statistics.
74. What is the order of execution done internally in the Transformer, with the stage
editor having input links on the left-hand side and output links on the right?
Stage variables, then constraints, then column derivations or expressions.
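The order given in question 74 can be pictured with the Python loop below. It is only a model of the per-row sequence (stage variable, then constraint, then derivations), not how the Transformer is implemented, and the column names and the "positive amounts only" constraint are made up.

```python
def transform(rows):
    """Process rows in the order: stage variables, constraint, derivations."""
    output = []
    running_total = 0  # plays the role of a stage variable
    for row in rows:
        # 1. Stage variables are evaluated first, for every input row.
        running_total += row["amount"]
        # 2. The constraint decides whether the row reaches the output link.
        if row["amount"] <= 0:
            continue
        # 3. Column derivations build the output columns.
        output.append(
            {"id": row["id"], "amount": row["amount"], "running_total": running_total}
        )
    return output


result = transform(
    [{"id": 1, "amount": 5}, {"id": 2, "amount": -3}, {"id": 3, "amount": 4}]
)
```

Because the stage variable is updated before the constraint is tested, the row that is filtered out (id 2) still affects the running total seen by the next row.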
75. What are the difficulties faced in using DataStage? Or what are the constraints
in using DataStage?
1) What if the number of lookups is large? 2) What will happen if the job aborts for
some reason while loading the data?
76. Differentiate database data and data warehouse data.
Data in a database is a) detailed or transactional, b) both readable and writable, c)
current.
77. Dimension Modeling types along with their significance
Data Modeling is broadly classified into 2 types. a) E-R Diagrams (Entity -
Relationships). b) Dimensional Modeling
78. What is the flow of loading data into fact and dimension tables?
Fact table - a table with a collection of foreign keys corresponding to the primary
keys in the dimension tables; it consists of fields with numeric values. Dimension
table - a table with a unique primary key. Load - data should first be loaded into the
dimension tables, then into the fact table.
79. Orchestrate vs DataStage Parallel Extender?
Orchestrate itself is an ETL tool with extensive parallel processing capabilities,
running on the UNIX platform. DataStage used Orchestrate with DataStage XE (the beta
version of 6.0) to incorporate the parallel processing capabilities. Orchestrate has
since been purchased and integrated with DataStage XE, and released as a new version,
DataStage 6.0, i.e. Parallel Extender.
80. Differentiate Primary Key and Partition Key.
A Primary Key is a combination of unique and not null. It can be a collection of key
columns, called a composite primary key. A Partition Key is just a part of the Primary
Key. There are several methods of partitioning, like Hash, DB2, Random etc. While using
Hash partitioning we specify the Partition Key.
81. How do you execute a DataStage job from the command line prompt?
Using the "dsjob" command as follows: dsjob -run -jobstatus projectname jobname
82. What are Stage Variables, Derivations and Constraints?
Stage Variable - an intermediate processing variable that retains its value during read
and doesn't pass the value into a target column. Derivation - an expression that
specifies the value to be passed on to the target column. Constraint - conditions that
are either
83. What is the default cache size? How do you change the cache size if needed?
The default cache size is 256 MB. We can increase it by going into DataStage
Administrator, selecting the Tunables tab and specifying the cache size there.
84. Containers: Usage and Types?
Container is a collection of stages used for the purpose of Reusability. There are 2
types of Containers. a) Local Container: Job Specific b) Shared Container: Used in
any job within a project.
85. Compare and Contrast ODBC and Plug-In stages?
ODBC: a) Poor Performance. b) Can be used for Variety of Databases. c) Can handle
Stored Procedures. Plug-In: a) Good Performance. b) Database specific.(Only one
database) c) Cannot handle Stored Procedures.
86. How do you run a shell script within the scope of a DataStage job?
By using the "ExecSH" command in the Before/After job properties.
87. Types of Parallel Processing?
Parallel processing is broadly classified into 2 types. a) SMP - Symmetric
Multiprocessing. b) MPP - Massively Parallel Processing.
88. What does a config file in Parallel Extender consist of?
The config file consists of the following: a) the number of processes or nodes; b) the
actual disk storage locations.
89. Functionality of Link Partitioner and Link Collector?
Link Partitioner: it splits data into various partitions or data flows using various
partition methods.
Link Collector: it collects the data coming from the partitions, merges it into a
single data flow and loads it to the target.
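A small Python sketch of the partition-then-collect flow from question 89, using hash partitioning on a key column. CRC32 is used here only as a convenient, repeatable stand-in hash; it is an assumption for illustration and not the function DataStage itself uses. The function names and sample rows are also made up.

```python
import zlib


def hash_partition(rows, key, node_count):
    """Split rows into node_count partitions by hashing the key column."""
    partitions = [[] for _ in range(node_count)]
    for row in rows:
        # Rows with the same key value always land in the same partition.
        bucket = zlib.crc32(str(row[key]).encode("utf-8")) % node_count
        partitions[bucket].append(row)
    return partitions


def collect(partitions):
    """Merge the partitions back into a single flow, one partition after another."""
    return [row for part in partitions for row in part]


rows = [{"DeptNo": n % 3, "EmpNo": n} for n in range(9)]
parts = hash_partition(rows, "DeptNo", 2)
collected = collect(parts)
```

Keeping equal keys together is what makes hash partitioning safe for key-based work such as joins or duplicate removal on each partition.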
90. What is Modulus and Splitting in a Dynamic Hashed File?
In a dynamic hashed file, the size of the file keeps changing as data is added and
removed. The modulus is the current number of groups in the file. When the file grows,
groups are split ("splitting"), which increases the modulus; when the file shrinks,
groups are merged, which decreases the modulus.
91. Types of views in DataStage Director?
There are 3 types of views in DataStage Director: a) Job View - dates of jobs
compiled. b) Log View - status of the job's last run. c) Status View - warning
messages, event messages, and program-generated messages.

Posted by Madhava 0 comments


DataStage 8.1 Interview Questions


INTERVIEW QUESTIONS

What is E-R modeling and why is it used for OLTP design?


The E-R (Entity-Relationship) model is used in two-dimensional databases such as SQL
Server or Oracle, where a table is made of rows and columns. OLTP systems are generally
based on these two dimensions.
In dimensional modeling, by contrast, we have more than two dimensions.
A cube represents a three-dimensional model in a data warehouse, where the data is
stored as summarized information that can be retrieved more easily than from a normal
OLTP database.
Let us assume PROD, GEOG, TIME and MEAS are the four dimensions we have, and that
a DW system stores information along these four dimensions. Suppose you want to know
the sales of Lux (PROD) in North India (GEOG) during October 2006 (TIME), measured in
units of Lux 75 grams (MEAS).
i.e., FACT_TBL(PROD LUX, GEOG NORTH_INDIA, TIME OCT06, MEAS Units) would
return some quantity, say 75809 units. This means that this many units were sold in North
India during the given period.
You can get the same answer from a normal OLTP system. The problem is that as the data
grows, the system will not tolerate the load and query performance will degrade. For this
and many other reasons, we need a DWH instead of a normal OLTP system.
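
The FACT_TBL lookup described above can be sketched in Python. This is only an illustration: the dictionary layout and the helper names `units_sold` and `units_by_period` are assumptions, and every figure except the 75809 quoted in the text is made up.

```python
# Hypothetical fact table keyed by (PROD, GEOG, TIME); the value is the
# MEAS quantity in units. Only the 75809 figure comes from the text above.
FACT_TBL = {
    ("LUX", "NORTH_INDIA", "OCT06"): 75809,
    ("LUX", "SOUTH_INDIA", "OCT06"): 41200,  # made-up value
    ("LUX", "NORTH_INDIA", "NOV06"): 70150,  # made-up value
}


def units_sold(prod, geog, time):
    """Return the units for one combination of dimension values (0 if absent)."""
    return FACT_TBL.get((prod, geog, time), 0)


def units_by_period(prod, time):
    """Roll the GEOG dimension up: total units of a product in one period."""
    return sum(units for (p, _g, t), units in FACT_TBL.items()
               if p == prod and t == time)


north_oct = units_sold("LUX", "NORTH_INDIA", "OCT06")
all_india_oct = units_by_period("LUX", "OCT06")
```

A real warehouse would hold this in a fact table joined to dimension tables, but the idea is the same: pick a value for each dimension and read off the measure.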

What is the architecture of any Data warehousing project? What is the flow?
1) Data warehousing starts with data modelling, i.e. the creation of dimensions and facts.
2) Data is collected from source systems such as OLTP, CRM, ERP, etc.
3) Cleansing and transformation are done with an ETL (Extraction, Transformation,
Loading) tool.
4) By the end of the ETL process, the target databases (dimensions, facts) are loaded with
data that satisfies the business rules.
5) Finally, reporting tools (OLAP) are used to get the information needed for decision
support.
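
The flow above can be sketched end to end in a few lines of Python. This is a toy illustration, not how an ETL tool such as DataStage is implemented: the two source lists, the column names and the cleansing rules are all assumptions.

```python
# Step 2: rows collected from two hypothetical source systems.
crm_rows = [{"cust": "a01", "gender": "M", "amount": "120.50"}]
erp_rows = [{"cust": "B02", "gender": "FEMALE", "amount": "80"}]

# Assumed cleansing rule: map every gender spelling to one code.
GENDER_MAP = {"M": "M", "MALE": "M", "F": "F", "FEMALE": "F"}


def extract():
    """Collect rows from all sources."""
    return crm_rows + erp_rows


def transform(rows):
    """Step 3: cleanse and convert each row to the target format."""
    cleaned = []
    for row in rows:
        cleaned.append({
            "cust": row["cust"].upper(),
            "gender": GENDER_MAP[row["gender"].upper()],
            "amount": float(row["amount"]),
        })
    return cleaned


def load(rows, target):
    """Step 4: append the cleaned rows to the target table."""
    target.extend(rows)
    return target


warehouse = load(transform(extract()), [])
# Step 5: a reporting query on the loaded data.
total_amount = sum(row["amount"] for row in warehouse)
```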

Discuss the advantages & Disadvantages of star & snowflake schema?


In a star schema every dimension has a primary key.
In a star schema, a dimension table does not have any parent table, whereas in a
snowflake schema a dimension table has one or more parent tables.
In a star schema, hierarchies for the dimensions are stored in the dimension table itself,
whereas in a snowflake schema the hierarchies are broken into separate tables. These
hierarchies help to drill down from the topmost level to the lowermost level.

Compare Data Warehousing Top-Down approach with Bottom-up approach?


In the top-down approach, we first build the data warehouse and then build the data
marts. This needs more cross-functional skills, takes more time, and costs more.
In the bottom-up approach, we first build the data marts and then the data warehouse.
The data mart that is built first serves as a proof of concept for the others. It takes less
time and costs less than the top-down approach.

Definition of data marts?


A data mart is a subset of a data warehouse. You can also think of a data mart as holding
the data of one subject area. For example, consider an organization that has HR,
Finance, Communications and Corporate Service divisions. For each division you can
create a data mart. In the bottom-up approach, the historical data is stored in the data
marts first and then exported to the data warehouse.

What is the difference between E-R modeling and Dimensional modeling?


E-R modeling describes the relations between entities in normalized form.
Dimensional modeling describes the relations between dimensions in denormalized form.

Are OLAP databases also called decision support systems? True/false?


True

What is the difference between OLAP and datawarehouse?
A data warehouse is the place where the data is stored for analysis, whereas OLAP is the
process of analyzing the data, managing aggregations, and partitioning information into
cubes for in-depth visualization.

What is the difference between Data warehousing and Business Intelligence?


Data warehousing deals with all aspects of managing the development, implementation
and operation of a data warehouse or data mart including meta data management, data
acquisition, data cleansing, data transformation, storage management, data distribution,
data archiving, operational reporting, analytical reporting, security management,
backup/recovery planning, etc.
Business intelligence, on the other hand, is a set of software tools that enable an
organization to analyze measurable aspects of their business such as sales performance,
profitability, operational efficiency, effectiveness of marketing campaigns, market
penetration among certain customer groups, cost trends, anomalies and exceptions, etc.
Typically, the term "business intelligence" is used to encompass OLAP, data visualization,
data mining and query/reporting tools. Think of the data warehouse as the back office and
business intelligence as the entire business including the back office. The business needs
the back office on which to function, but the back office without a business to support
makes no sense.

Why Denormalization is promoted in Universe Designing?
In a relational data model, for normalization purposes, some lookup tables are not merged
into a single table. In dimensional data modeling (star schema), these tables are merged
into a single table called a DIMENSION table, for performance and for slicing data.
Merging the tables into one large dimension table removes the complex intermediate
joins, because dimension tables are joined directly to fact tables. Although redundant data
occurs in the DIMENSION table, its size is only about 15% of the FACT table. This is why
denormalization is promoted in Universe Designing.

What is a factless fact table? Where have you used it in your project?
A factless fact table contains nothing but dimensional keys. It is used to support negative
analysis reports, for example a store that did not sell a product for a given period.

What is snapshot?
A snapshot is a static data source: a permanent local copy or picture of a report. It is
suitable for disconnected networks. We can't add any columns to a snapshot. We can
sort, group and aggregate it, and it is mainly used for analyzing historical data.

What are non-additive facts in detail?


A fact may be a measure, a metric or a dollar value. Measures and metrics are often
non-additive facts, while a dollar value is an additive fact: if we want to find the amount
for a particular place for a particular period of time, we can add the dollar amounts and
come up with the total amount.
A non-additive fact is, for example, the measured height of 'citizens by geographical
location': when we roll up 'city' data to 'state' level, we should not add the heights of the
citizens; rather, we may want to use them to derive a 'count'.

Data warehouse interview questions only


What is source qualifier?
Difference between DSS & OLTP?

What is a cube and why are we creating a cube? What is the difference between ETL and OLAP cubes?
Any schema, table or report that gives you meaningful information about one attribute
with respect to more than one other attribute is called a cube. For example, in a product
table with Product ID and Sales columns, we can analyze Sales with respect to Product
Name; but if you analyze Sales with respect to Product as well as Region (Region being an
attribute in the Location table), the report, resultant table or schema would be a cube.
ETL cubes: built in the staging area to load frequently accessed reports to the target.
Reporting cubes: built after the actual load of all the tables to the target, depending on the
customer's requirements for business analysis.

What is surrogate key?


Surrogate key is a substitution for the natural primary key.

What are Aggregate tables?


An aggregate table contains a summary of existing warehouse data, grouped to certain
levels of dimensions. Retrieving the required data from the actual table, which may have
millions of records, takes more time and also affects server performance. To avoid this,
we can aggregate the table to the required level and use that instead. The aggregate table
reduces the load on the database server, improves query performance, and returns
results very quickly.
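
As a rough sketch of the idea, the snippet below builds a monthly aggregate table from a detail table using Python's built-in sqlite3 module. The table names, columns and figures are all hypothetical, and SQLite stands in here for whatever warehouse database is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical detail-level fact table (one row per day and product).
    CREATE TABLE sales_fact (sale_date TEXT, product TEXT, units INTEGER);
    INSERT INTO sales_fact VALUES
        ('2006-10-01', 'LUX', 10),
        ('2006-10-02', 'LUX', 15),
        ('2006-11-01', 'LUX', 7);

    -- Aggregate table: the detail rows grouped up to the month level.
    CREATE TABLE sales_month_agg AS
        SELECT substr(sale_date, 1, 7) AS month,
               product,
               SUM(units) AS units
        FROM sales_fact
        GROUP BY substr(sale_date, 1, 7), product;
""")

# Reports now read the small aggregate table instead of the detail table.
monthly = conn.execute(
    "SELECT month, units FROM sales_month_agg ORDER BY month"
).fetchall()
```

The aggregate has to be rebuilt or refreshed after each load of the detail table, which is the price paid for the faster reads.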

How is data in the data warehouse stored after it has been extracted and transformed from
heterogeneous sources, and where does the data go from the data warehouse?
Data in a data warehouse is stored in the form of relational tables; most data warehouses
use a snowflake schema.

What is the difference between hierarchies and levels?


Levels: the columns available in a dimension table are levels.
Hierarchies: the arrangement of levels from top to bottom or bottom to top.
Ex: Region, Country, State, City
Year, Month, Day, Hours
Multi-level hierarchies can be natural, like Year, Month, and Day. But a hierarchy doesn’t
have to be natural; you can create a hierarchy just for navigational or reporting purposes.
Ex: Days to Manufacture and Safety Stock Level. There’s no relationship between the two
attributes in this navigational hierarchy.
A natural hierarchy is one in which you should define attribute relationships between levels.
Levels are constructed from attributes.

What is the difference between data warehouse and BI?


DATAWAREHOUSE: A data warehouse is an integrated, time-variant, subject-oriented and
non-volatile collection of data in support of the management decision-making process.
BUSINESS INTELLIGENCE: Business intelligence is the process of extracting the data,
converting it into information and then into a knowledge base.

What are non-additive facts?


# Additive: Additive facts are facts that can be summed up through all of the dimensions in
the fact table.
# Semi-Additive: Semi-additive facts are facts that can be summed up for some of the
dimensions in the fact table, but not the others.
# Non-Additive: Non-additive facts are facts that cannot be summed up for any of the
dimensions present in the fact table.

What are the different architectures of a data warehouse?


There are three types of architectures.
Data warehouse basic architecture:
In this architecture, end users access data that is derived from several sources through the
data warehouse.
Architecture: Source –> Warehouse –> End Users
Data warehouse with staging area architecture:
When the data derived from the sources needs to be cleaned and processed before being
put into the warehouse, a staging area is used.
Architecture: Source –> Staging Area –> Warehouse –> End Users
Data warehouse with staging area and data marts architecture:
When the warehouse architecture needs to be customized for different groups in the
organization, data marts are added and used.
Architecture: Source –> Staging Area –> Warehouse –> Data Marts –> End Users

What are modeling tools available in the Market?


These tools are used for data/dimension modeling:
Oracle Designer
Erwin (Entity Relationship for Windows)
Informatica (Cubes/Dimensions)
Embarcadero
Sybase PowerDesigner

What is the main difference between schema in RDBMS and schemas in Data Warehouse?
RDBMS Schema
* Used for OLTP systems
* Traditional and old schema
* Normalized
* Difficult to understand and navigate
* Cannot solve extract and complex problems
* Poorly modeled
DWH Schema
* Used for OLAP systems
* New generation schema
* Denormalized
* Easy to understand and navigate
* Extract and complex problems can be easily solved
* Very good model

What is ODS?
ODS stands for Operational Data Store.

What is a general purpose scheduling tool?


The basic purpose of a scheduling tool in a DW application is to streamline the flow of
data from source to target at a specific time or based on some condition.

What is the need of a surrogate key? Why is the primary key not used as the surrogate key?
A surrogate key is an artificial identifier for an entity. Surrogate key values are generated
sequentially by the system (like the Identity property in SQL Server or a Sequence in
Oracle). They do not describe anything.
A primary key here is a natural identifier for an entity. Its values are entered manually by
the users and uniquely identify each row, so there is no repetition of data.

Need for surrogate key not Primary Key


If a column is made a primary key and its data type or length later needs to change, then
all the foreign keys that depend on that primary key must be changed too, making the
database unstable.
Surrogate keys make the database more stable because they insulate the primary and
foreign key relationships from changes in data types and lengths.
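
A small sketch of this insulation, using Python's built-in sqlite3 module. The tables, columns and codes are hypothetical; `AUTOINCREMENT` plays the role that Identity or a Sequence plays in the databases named above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_dim (
        customer_key  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_code TEXT UNIQUE                         -- natural key
    );
    CREATE TABLE sales_fact (
        customer_key INTEGER REFERENCES customer_dim(customer_key),
        amount       REAL
    );
""")

# The system generates the surrogate key; the fact row stores only that key.
cur = conn.execute("INSERT INTO customer_dim (customer_code) VALUES ('C-17')")
key = cur.lastrowid
conn.execute("INSERT INTO sales_fact VALUES (?, ?)", (key, 99.0))

# The natural key later changes format and length. No fact row is touched.
conn.execute(
    "UPDATE customer_dim SET customer_code = 'CUST-000017' WHERE customer_key = ?",
    (key,),
)

row = conn.execute("""
    SELECT d.customer_code, f.amount
    FROM sales_fact f
    JOIN customer_dim d ON d.customer_key = f.customer_key
""").fetchone()
```

Had `sales_fact` referenced `customer_code` directly, the format change would have required rewriting every matching fact row.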

What is Snow Flake Schema?


Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension
data has been grouped into multiple tables instead of one large table. For example, a
product dimension table in a star schema might be normalized into a products table, a
product category table, and a product manufacturer table in a snowflake schema. While
this saves space, it increases the number of dimension tables and requires more foreign
key joins. The result is more complex queries and reduced query performance.

What is the Difference between OLTP and OLAP?


OLTP is Online Transaction Processing: it holds normalized tables and online data with
frequent inserts, updates and deletes. OLAP is Online Analytical Processing: it holds
denormalized, largely historical data that is queried for analysis rather than updated by
transactions.

What are conformed dimensions?
They are dimension tables in a star schema data mart that adhere to a common structure
and therefore allow queries to be executed across star schemas. For example, the
Calendar dimension is needed in most data marts. By making this Calendar dimension
adhere to a single structure, regardless of which data mart in your organization it is used
in, you can query by date/time from one data mart to another.

How are the Dimension tables designed?


Find where data for this dimension are located.
Figure out how to extract this data.
Determine how to maintain changes to this dimension.
Change fact table and DW population routines.

What are conformed dimensions?


A conformed dimension is a single, coherent view of the same piece of data throughout the
organization. The same dimension is used in all subsequent star schemas defined. This
enables reporting across the complete data warehouse in a simple format.

What are the advantages data mining over traditional approaches?


Data mining is used for estimating the future. For example, for a company or business
organization, data mining can be used to predict the future of the business in terms of
revenue, employees, customers, orders, etc.
Traditional approaches use simple algorithms for estimating the future, but they do not
give results as accurate as data mining.

Which automation tool is used in data warehouse testing?


No tool-based testing is done in DWH; only manual testing is done.

Give examples of degenerated dimensions


A degenerate dimension is a dimension key without a corresponding dimension table.
Example:
In the PointOfSale Transaction Fact table, we have:
Date Key (FK), Product Key (FK), Store Key (FK), Promotion Key (FK), and POS
Transaction Number.
The Date dimension corresponds to Date Key, and the Product dimension corresponds to
Product Key. In a traditional parent-child database, POS Transaction Number would be
the key to the transaction header record that contains all the information valid for the
transaction as a whole, such as the transaction date and store identifier. But in this
dimensional model, we have already extracted this information into other dimensions.
Therefore, POS Transaction Number looks like a dimension key in the fact table but does
not have a corresponding dimension table.
Therefore, POS Transaction Number is a degenerate dimension.
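
To make this concrete, the sketch below groups a few fact rows by POS Transaction Number without any dimension table behind it. The row layout follows the key list above; the key values, transaction numbers and unit counts are made up.

```python
from collections import defaultdict

# Hypothetical fact rows:
# (date_key, product_key, store_key, promotion_key, pos_txn_no, units)
pos_fact = [
    (20061001, 1, 10, 0, "T-1001", 2),
    (20061001, 2, 10, 0, "T-1001", 1),
    (20061001, 1, 11, 5, "T-1002", 4),
]

# The transaction number has no dimension table to join to, but it still
# groups the line items that belong to the same sale.
units_per_txn = defaultdict(int)
for _date, _product, _store, _promo, txn_no, units in pos_fact:
    units_per_txn[txn_no] += units
```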

What is hybrid slowly changing dimension?

Whatever changes are made in the source, for each and every record there is a new entry
on the target side, whether it is an UPDATE or an INSERT, and the target maintains the
history.
Let me give an example to make the point clear.
Account information is usually maintained in two categories: Current Account and Time
of Event Account, i.e. we have two sets of tables. CUR_ACCT is a fast-moving dimension
containing information like Balance, while TOE_ACCT contains information like Contact
Details and Phone No, where history is important and the data changes slowly.
In this respect the TOE_ACCT table qualifies as a slowly changing dimension.

What is the data type of the surrogate key?


Normally surrogate keys are sequences that keep increasing as new records are inserted
into the table. The standard datatype is integer.

What are the steps to build the datawarehouse?


1. Understand the business requirements.
2. Once the business requirements are clear, identify the grains (levels).
3. Once the grains are defined, design the dimension tables with the lower-level grains.
4. Once the dimensions are designed, design the fact table with the key performance
indicators (facts).
5. Once the dimension and fact tables are designed, define the relationships between the
tables using primary keys and foreign keys. In the logical phase the database design looks
like a star, so it is named Star Schema Design.

What are the Different methods of loading Dimension tables?


Conventional load:
Before loading the data, all the table constraints are checked against the data.
Direct load (faster loading):
All the constraints are disabled and the data is loaded directly. Later the data is checked
against the table constraints, and the bad data is not indexed.

What is a linked cube?


A cube can be stored on a single analysis server and then defined as a linked cube on
other Analysis servers. End users connected to any of these analysis servers can then
access the cube. This arrangement avoids the more costly alternative of storing and
maintaining copies of a cube on multiple analysis servers. Linked cubes can be connected
using TCP/IP or HTTP. To end users a linked cube looks like a regular cube.

What is degenerate dimension table?


The values of a dimension that are stored in the fact table are called degenerate
dimensions. These dimensions don't have their own dimension tables.

Difference between Snowflake and Star Schema. In what situations is a Snowflake Schema
better than a Star Schema, and when is the opposite true?
Star schema: a centralized fact table surrounded by different dimensions.
Snowflake schema: the dimensions of the same star schema are split into further dimensions.
A star schema contains highly denormalized data.
A snowflake schema contains partially normalized data.
A star schema dimension cannot have a parent table,
but a snowflake dimension has parent tables.
Why go for a star schema:
1) It contains fewer joins.
2) The database is simpler.
3) It supports drilling up.
Why go for a snowflake schema:
Sometimes we need to provide separate dimensions from existing dimensions; in that
case we go for a snowflake schema.
Disadvantage of snowflake:
Query performance is lower because there are more joins.

Star Schema | Snowflake
A centralized fact table surrounded by different dimensions | Dimensions of the same star schema split into further dimensions
Contains highly denormalized data | Contains partially normalized data
Cannot have parent tables | Contains parent tables
Contains fewer joins | Contains more joins
Performance is comparatively not very low, because there are fewer joins | Performance is very low, because there are more joins

What are slowly changing dimensions?


Dimensions that change over time are called Slowly Changing Dimensions. For instance, a
product price changes over time, people change their names for some reason, and country
and state names may change over time. These are a few examples of Slowly Changing
Dimensions, since changes happen to them over a period of time.
If the data in the dimension table changes very rarely, it is called a slowly changing
dimension.
Ex: a change in the name or address of a person, which happens rarely.
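
One common way to handle such a change while keeping history is to close the current row and add a new one (often called a Type 2 change; the text above does not name a technique, so treat this as one possible approach). The column names and sample customer below are hypothetical.

```python
from datetime import date

# Hypothetical customer dimension: each row is one version of a customer.
customer_dim = [
    {"key": 1, "cust_id": "C1", "name": "A. Rao",
     "valid_from": date(2005, 1, 1), "valid_to": None},
]


def apply_change(dim, cust_id, new_name, change_date):
    """Close the current version of cust_id and append a new version."""
    for row in dim:
        if row["cust_id"] == cust_id and row["valid_to"] is None:
            row["valid_to"] = change_date
    dim.append({
        "key": max((r["key"] for r in dim), default=0) + 1,  # next surrogate key
        "cust_id": cust_id,
        "name": new_name,
        "valid_from": change_date,
        "valid_to": None,  # None marks the current version
    })


apply_change(customer_dim, "C1", "A. Sharma", date(2010, 9, 1))
current = [r for r in customer_dim if r["valid_to"] is None]
```

The old row stays in the table with its validity dates, so reports for earlier periods still see the name that was correct at the time.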

What are the various Reporting tools in the Market?


1. MS-Excel
2. Business Objects (Crystal Reports)
3. Cognos (Impromptu, Power Play)
4. Microstrategy
5. MS reporting services
6. Informatica Power Analyzer
7. Actuate
8. Hyperion (BRIO)
9. Oracle Express OLAP
10. Proclarity

What is a Star Schema?


A star schema is a way of organising the tables such that we can retrieve results from the
database easily and quickly in the warehouse environment. Usually a star schema consists
of one or more dimension tables around a fact table, which looks like a star; that is how it
got its name.

What is the main difference between Inmon and Kimball philosophies of data warehousing?
The two differ in their concept of how to build the data warehouse.
According to Kimball:
Kimball views data warehousing as a constituency of data marts. Data marts are focused
on delivering business objectives for departments in the organization, and the data
warehouse is a conformed dimension of the data marts.
Hence a unified view of the enterprise can be obtained from the dimension modeling on a
local departmental level.
Kimball: first data marts, then combined into the data warehouse.

According to Inmon:
Inmon believes in creating a data warehouse on a subject-by-subject area basis. Hence the
development of the data warehouse can start with data from the online store. Other
subject areas can be added to the data warehouse as their needs arise. Point-of-sale
(POS) data can be added later if management decides it is necessary.
Inmon: first the data warehouse, later the data marts.

Why is the fact table in normal form?


Basically the fact table consists of the index keys of the dimension/lookup tables and the
measures.
So whenever we have the keys in a table, that itself implies that the table is in normal
form.

What are the data types present in BO? And what happens if we implement a view in the Designer and report?
To my knowledge, these are called object types in Business Objects. An alias is
different from a view in the universe: a view is at the database level, but an alias is a
different name given to the same table to resolve loops in the universe.
The different data types in Business Objects are: 1. Character 2. Date 3. Long text
4. Number

What is meant by metadata in context of a Datawarehouse and how it is important?


Metadata is data about data. Examples of metadata include data element descriptions,
data type descriptions, attribute/property descriptions, range/domain descriptions, and
process/method descriptions. The repository environment encompasses all corporate
metadata resources: database catalogs, data dictionaries, and navigation services.
Metadata includes things like the name, length, valid values, and description of a data
element. Metadata is stored in a data dictionary and repository. It insulates the data
warehouse from changes in the schema of operational systems.
Metadata synchronization is the process of consolidating, relating and synchronizing data
elements with the same or similar meaning from different systems. Metadata
synchronization joins these differing elements together in the data warehouse to allow for
easier access.

How do you load the time dimension?


Every data warehouse maintains a time dimension. It is kept at the most granular level at
which the business runs (e.g. week day, day of the month and so on). Depending on the
data loads, the time dimension is updated: a weekly process updates it every week and a
monthly process every month.
Generally we load the time dimension using one passive stage (a Sequential File stage) as
the source, and in the Transformer stage we manually write functions such as Month and
Year functions; for the lower level, i.e. Day, there is also a function to implement the
loading of the time dimension.
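
Outside DataStage, the same idea of deriving year, month and day attributes from a calendar date can be sketched in plain Python. This is not the Transformer-stage logic itself; the function name and the column names are assumptions.

```python
from datetime import date, timedelta


def build_time_dimension(start, end):
    """Return one row per calendar day from start to end (inclusive)."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_key": int(day.strftime("%Y%m%d")),  # e.g. 20100901
            "year": day.year,
            "month": day.month,
            "day_of_month": day.day,
            "week_day": day.isoweekday(),  # 1 = Monday ... 7 = Sunday
        })
        day += timedelta(days=1)
    return rows


time_dim = build_time_dimension(date(2010, 9, 1), date(2010, 9, 30))
```

Rows like these would typically be written to a file or table once and extended by the weekly or monthly process.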

What does level of Granularity of a fact table signify?


In simple terms, the level of granularity defines the extent of detail. As an example, let us
look at geographical granularity. We may analyze data at the levels of COUNTRY,
REGION, TERRITORY, CITY and STREET. In this case, we say the highest (most detailed)
level of granularity is STREET.

Differences between star and snowflake schemas ?


The star schema is created when all the dimension tables directly link to the fact table.
Since the graphical representation resembles a star it is called a star schema. It must be
noted that the foreign keys in the fact table link to the primary key of the dimension table.
This sample provides the star schema for a sales_fact for the year 1998. The dimensions
created are Store, Customer, Product_class and time_by_day. The Product table links to
the product_class table through the primary key and indirectly to the fact table. The fact
table contains foreign keys that link to the dimension tables.

What is Fact table?


A fact table contains the measurements, metrics or facts of a business process. If your
business process is “Sales”, then a measurement of this business process such as
“monthly sales number” is captured in the fact table. The fact table also contains the
foreign keys to the dimension tables.

What is a Data Warehouse?


A data warehouse is a repository of integrated information, available for queries and
analysis. Data and information are extracted from heterogeneous sources as they are
generated. This makes it much easier and more efficient to run queries over data that
originally came from different sources. Typical relational databases are designed for
on-line transactional processing (OLTP) and do not meet the requirements for effective
on-line analytical processing (OLAP). As a result, data warehouses are designed
differently than traditional relational databases.

Steps In Building the Data Model


While the ER model lists and defines the constructs required to build a data model, there
is no standard process for doing so. Some methodologies, such as IDEF1X, specify a bottom-up

Why is Data Modeling Important?


Data modeling is probably the most labor intensive and time consuming part of the
development process. Why bother especially if you are pressed for time? A common

What is Dimensional Modelling?


Dimensional modelling is a design concept used by many data warehouse designers to
build their data warehouse. In this design model all the data is stored in two types of
tables: fact tables and dimension tables. The fact table contains the facts/measurements
of the business, and the dimension table contains the context of the measurements, i.e.
the dimensions on which the facts are calculated.

What are the methodologies of Data Warehousing?


There are mainly 2 methods:
1. Ralph Kimball model
2. Inmon model
The Kimball model is always structured as a denormalized structure.
The Inmon model is structured as a normalized structure.
Depending on the requirements of the company, its DWH will follow one of the above
models.

What type of Indexing mechanism do we need to use for a typical datawarehouse?


On the fact table it is best to use bitmap indexes. Dimension tables can use bitmap and/or
the other types of clustered/non-clustered, unique/non-unique indexes.
To my knowledge, SQLServer does not support bitmap indexes. Only Oracle supports
bitmaps.

What is Normalization, First Normal Form, Second Normal Form , Third Normal Form?
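Normalization can be defined as segregating a table into two or more tables so as to
avoid duplication of values.
First Normal Form: every column holds atomic values and there are no repeating groups.
Second Normal Form: the table is in First Normal Form and every non-key column
depends on the whole primary key.
Third Normal Form: the table is in Second Normal Form and no non-key column depends
on another non-key column.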

What are semi-additive and factless facts, and in which scenarios will you use such kinds of
fact tables?
Semi-additive: Semi-additive facts are facts that can be summed up for some of the
dimensions in the fact table, but not the others. For example:
Current_Balance and Profit_Margin are the facts. Current_Balance is a semi-additive fact,
as it makes sense to add it up across all accounts (what’s the total current balance for all
accounts in the bank?), but it does not make sense to add it up through time (adding up
all current balances for a given account for each day of the month does not give us any
useful information).
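
The Current_Balance case can be sketched with a few daily snapshots. The account names, dates and amounts below are made up, and the two helper functions are only illustrative.

```python
# Hypothetical daily balance snapshots: (day, account, current_balance).
balances = [
    ("2010-09-01", "ACC1", 100.0),
    ("2010-09-01", "ACC2", 250.0),
    ("2010-09-02", "ACC1", 120.0),
    ("2010-09-02", "ACC2", 250.0),
]


def total_balance_on(day):
    """Adding across accounts for one day is meaningful."""
    return sum(bal for d, _acct, bal in balances if d == day)


def closing_balance(account):
    """Across time, take the latest snapshot instead of summing."""
    latest_day = max(d for d, acct, _bal in balances if acct == account)
    return next(bal for d, acct, bal in balances
                if acct == account and d == latest_day)


bank_total = total_balance_on("2010-09-02")
acc1_closing = closing_balance("ACC1")
# Summing ACC1 over both days would give 220.0, a figure with no business meaning.
```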
A factless fact table captures the many-to-many relationships between dimensions, but
contains no numeric or textual facts. They are often used to record events or coverage
information. Common examples of factless fact tables include:
- Identifying product promotion events (to determine promoted products that did not sell).
- Tracking student attendance or registration events
- Tracking insurance-related accident events
- Identifying building, facility, and equipment schedules for a hospital or university
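
The first example in the list (promoted products that did not sell) can be sketched as a set difference between a coverage table and the sales fact table. All names and rows below are hypothetical.

```python
# Hypothetical factless fact table: which product was on promotion in which
# store and week. It holds only dimension keys, no measures.
promotion_coverage = {
    ("WK40", "STORE1", "LUX"),
    ("WK40", "STORE1", "SOAP_B"),
    ("WK40", "STORE2", "LUX"),
}

# Hypothetical sales fact rows: (week, store, product, units).
sales_fact = [
    ("WK40", "STORE1", "LUX", 12),
    ("WK40", "STORE2", "LUX", 3),
]

# Negative analysis: combinations that were promoted but have no sales row.
sold = {(week, store, product) for week, store, product, _units in sales_fact}
promoted_but_not_sold = sorted(promotion_coverage - sold)
```

The sales fact table alone cannot answer this question, because a product that did not sell leaves no row in it.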

Is it correct/feasible develop a Data Mart using an ODS?


Yes, it is correct to develop a data mart using an ODS, because an ODS stores transaction
data for a few days (little historical data), which is what a data mart requires. So it is
correct to develop a data mart using an ODS.

Explain degenerated dimension.


A degenerate dimension is a dimension that has only a single attribute.
This dimension is typically represented as a single field in a fact table.
Data items that are not facts and that do not fit into the existing dimensions are termed
degenerate dimensions.
Degenerate dimensions are the fastest way to group similar transactions.
Degenerate dimensions are used when fact tables represent transactional data.
They can be used as the primary key for the fact table, but they cannot act as foreign keys.

What is the difference between view and materialized view?


View - stores the SQL statement in the database and lets you use it as a table. Every time
you access the view, the SQL statement executes.
Materialized view - stores the results of the SQL statement in table form in the database.
The SQL statement executes only once (until the materialized view is refreshed), and after
that every time you run the query, the stored result set is used. Pros include quick query
results.
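
The difference can be demonstrated with Python's built-in sqlite3 module. SQLite has no materialized views, so a table created with CREATE TABLE ... AS SELECT is used here as a stand-in for one; the table and view names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (product TEXT, units INTEGER);
    INSERT INTO sales VALUES ('LUX', 10), ('LUX', 5);

    -- View: only the SELECT statement is stored.
    CREATE VIEW v_totals AS
        SELECT product, SUM(units) AS units FROM sales GROUP BY product;

    -- Stand-in for a materialized view: the result rows are stored.
    CREATE TABLE mv_totals AS
        SELECT product, SUM(units) AS units FROM sales GROUP BY product;
""")

# New data arrives after both objects were created.
conn.execute("INSERT INTO sales VALUES ('LUX', 1)")

view_units = conn.execute("SELECT units FROM v_totals").fetchone()[0]
mv_units = conn.execute("SELECT units FROM mv_totals").fetchone()[0]
```

The view re-runs its query and sees the new row; the stored result stays as it was until it is rebuilt, which is the trade-off between freshness and speed.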

What are non-additive facts?


A fact table typically has two types of columns: those that contain numeric facts (often
called measurements), and those that are foreign keys to dimension tables.
A fact table contains either detail-level facts or facts that have been aggregated. Fact
tables that contain aggregated facts are often called summary tables. A fact table usually
contains facts with the same level of aggregation.
Though most facts are additive, they can also be semi-additive or non-additive. Additive
facts can be aggregated by simple arithmetical addition. A common example of this is
sales. Non-additive facts cannot be added at all.
An example of this is averages. Semi-additive facts can be aggregated along some of the
dimensions and not along others. An example of this is inventory levels, where you cannot
tell what a level means simply by looking at it.

What is VLDB?
Very Large Database (VLDB).
The term is sometimes used to describe databases occupying magnetic storage in the terabyte
range and containing billions of table rows. Typically, these are decision support systems
or transaction processing applications serving large numbers of users.

Can a dimension table contain numeric values?


Yes, we can have numeric values in a dimension table, but they are not frequently
updated: a dimension table contains constant data that changes only on some occasions.

What is rapidly changing dimension?


There is no dimension called a rapidly changing dimension in DWH. But if you consider
the overall ODS tables, a rapidly changing dimension is one that holds transactional data
rather than staging data.

What is ETL?
ETL is an abbreviation for “Extract, Transform and Load”. This is the process of extracting
data from operational or external data sources, transforming the data (which includes
cleansing, aggregation, summarization and integration, as well as basic transformation),
and loading the data into some form of data warehouse.

What is the definition of normalized and denormalized view and what are the differences
between them?
Since OLTP is in normalized form, more tables are scanned or referred to for a single
query, as data needs to be fetched from the respective master tables through primary
keys and foreign keys. In OLAP, since the data is in denormalized form, fewer tables are
queried. For example, in a banking application in an OLTP environment we will have
separate tables for customer personal details, address details, transaction details, etc.,
whereas in an OLAP environment all these details can be stored in one single table, which
reduces the scanning of multiple tables for a single customer's details.

What is junk dimension?


A “junk” dimension is a collection of random transactional codes, flags and/or text
attributes that are unrelated to any particular dimension. The junk dimension is simply a
structure that provides a convenient place to store the junk attributes. A degenerate
dimension, by contrast, is data that is dimensional in nature but stored in a fact table.

What are Data Marts?


Data Mart: a data mart is a small data warehouse. In general, a data warehouse is divided
into smaller units according to the business requirements. For example, the data warehouse
of an organization may be divided into individual data marts such as a Data Mart of Sales,
a Data Mart of Finance, a Data Mart of Marketing and a Data Mart of HR. Data marts are
used to improve performance during the retrieval of data.

What is data mining?


Data mining is a process of extracting hidden trends from within a data warehouse. For
example, an insurance data warehouse can be mined to find the highest-risk people to
insure in a certain geographical area.

What is Data cleansing..?


This is nothing but polishing of data. For example, one of the subsystems may store the
gender as M and F, while another stores it as MALE and FEMALE. So we need to polish this
data and clean it before it is added to the data warehouse. Another typical example is
addresses. The subsystems may all maintain the customer address in different forms. We
might need an address cleansing tool to get the customers' addresses into a clean and
neat form.
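
The gender example above can be sketched as a small standardization function. The mapping, the `U` code for unknown values and the function name are assumptions made for illustration.

```python
# Hypothetical mapping from source-system spellings to one warehouse code.
GENDER_MAP = {"M": "M", "MALE": "M", "F": "F", "FEMALE": "F"}


def clean_gender(raw):
    """Return 'M' or 'F', or 'U' (unknown) when the value is missing or unrecognized."""
    if raw is None:
        return "U"
    return GENDER_MAP.get(raw.strip().upper(), "U")


source_values = ["M", "female", " Male ", "F", "", None]
cleaned = [clean_gender(v) for v in source_values]
print(cleaned)
```

Mapping unrecognized values to an explicit unknown code, rather than dropping the row, keeps the record available for later correction.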

What is active data warehousing?


An active data warehouse provides information that enables decision-makers within an
organization to manage customer relationships nimbly, efficiently and proactively. Active
data warehousing is all about integrating advanced decision support with day-to-day, even
minute-to-minute, decision making in a way that increases the quality of those customer
touches, which encourages customer loyalty and thus secures an organization’s bottom line.
The marketplace is coming of age as we progress from first-generation “passive” decision
support systems to current- and next-generation “active” data warehouse implementations.

What is the difference between ODS and OLTP?


ODS: holds data alongside the data warehouse as a stand-alone store. No further transactions
take place on the current data that is part of the data warehouse. The current data changes
only when it is uploaded through ETL on a scheduled basis.
OLTP: holds data in an online system that is connected to the network, where every
transaction update happens within seconds. Summarized values change every second.

Explain about type 1, type 2 (SCD), type 3?


SCD stands for slowly changing dimension, that is, a dimension whose data changes only
rarely. There are mainly 3 types of SCD: Type 1, Type 2 and Type 3.

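What is conformed fact?


Conformed dimensions are those tables that have a fixed structure. There will be no need to
change the metadata of these tables and they can go along with any number of facts in
that application without any changes. In the same way, a conformed fact is a measure that
keeps the same definition and units in every fact table where it appears.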

What are aggregate table and aggregate fact table?


An aggregate table contains summarized data. Materialized views are aggregate tables.
For example, in sales we have only date-level transactions. If we want to create a report
such as sales by product per year, we aggregate the date values into week_agg, month_agg,
quarter_agg and year_agg. To retrieve data from these tables we use the @aggregate function.

What is BUS Schema?
The usual term is BUS Matrix rather than BUS Schema. A BUS Matrix (in the Kimball approach)
identifies the common dimensions across business processes, i.e. it is a way of identifying
conformed dimensions.

What are the possible data marts in Retail sales?


Example: product information, sales, location, time…

What are data validation strategies for data mart validation after loading process?
Data validation is to make sure that the loaded data is accurate and meets the business
requirements.


Validation strategies are the different methods followed to meet the validation requirements.

Summarize the difference between OLTP, ODS AND DATA WAREHOUSE?


OLAP - stands for Online Analytical Processing, which supports multidimensional analysis
and reporting. It should not be confused with OLTP.
OLTP - stands for Online Transaction Processing. OLTP databases, as the name implies,
handle real-time transactions, which inherently have some special requirements. Databases
such as Oracle, SQL Server and DB2 are commonly used for OLTP systems.
ODS - stands for Operational Data Store. It is an integration point in the ETL process:
data is loaded into the ODS before it is loaded into the target.
Data Warehouse - a subject-oriented, integrated, time-variant and non-volatile collection
of data that is used to take management decisions.

Why are OLTP database designs not generally a good idea for a Data Warehouse?
An OLTP design is not meant to store historical information about the organization. It is
used for storing the details of daily transactions, while a data warehouse is a huge store
of historical information, obtained from different sources, for making intelligent
decisions about the organization.

What is data cleaning? How is it done?


Put simply, it is purifying the data.
Data cleansing: the act of detecting and removing and/or correcting a database's dirty
data (i.e., data that is incorrect, out-of-date, redundant, incomplete, or formatted
incorrectly).

What is a level of Granularity of a fact table?


Level of granularity means the level of detail that you put into the fact table in a data
warehouse. For example, based on the design you can decide to store the sales data for each
transaction. The level of granularity is then the amount of detail you are willing to keep
for each transactional fact: product sales for every individual transaction, or sales
aggregated up to the minute (or a coarser period) before they are stored.
It also means that we can have (for example) data aggregated for a year for a given
product, and that the data can be drilled down to a monthly, weekly and daily basis. The
lowest level is known as the grain. Going down to the details is granularity.

Which columns go to the fact table and which columns go the dimension table?
The aggregation or calculated value columns go to the fact table, and the detail information
goes to the dimension table.
To add on, foreign key elements are stored in the fact table along with business measures
such as sales in $ amount and units (quantity sold). Date may also be a business measure in
some cases. It also depends on the granularity at which the data is stored.

What is a CUBE in data warehousing concept?


Cubes are logical representation of multidimensional data. The edge of the cube contains
dimension members and the body of the cube contains data values.

What is SCD1, SCD2, and SCD3?


In SCD Type 1, the attribute value is overwritten with the new value, obliterating the
historical attribute values. For example, when the product roll-up changes for a given
product, the roll-up attribute is merely updated with the current value.
In SCD Type 2, a new record with the new attributes is added to the dimension table.
Historical fact table rows continue to reference the old dimension key with the old roll-up
attribute; going forward, the fact table rows will reference the new surrogate key with the
new roll-up, thereby perfectly partitioning history.

In SCD Type 3, attributes are added to the dimension table to support two simultaneous
roll-ups: perhaps the current product roll-up as well as “current version minus one”, or
current version and original.
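
The three behaviours can be sketched on a dimension held as a list of dictionaries. This is a minimal illustration, not how Datastage implements SCDs; the column names (`product_key`, `rollup`, `is_current` and so on) are assumptions.

```python
from datetime import date


def scd_type1(row, new_rollup):
    """Type 1: overwrite in place; the old value is lost."""
    row["rollup"] = new_rollup
    return row


def scd_type2(table, product_id, new_rollup, change_date):
    """Type 2: expire the current row and add a new row with a new surrogate key."""
    current = next(r for r in table if r["product_id"] == product_id and r["is_current"])
    current["is_current"] = False
    current["end_date"] = change_date
    new_row = {
        "product_key": max(r["product_key"] for r in table) + 1,  # new surrogate key
        "product_id": product_id,
        "rollup": new_rollup,
        "start_date": change_date,
        "end_date": None,
        "is_current": True,
    }
    table.append(new_row)
    return new_row


def scd_type3(row, new_rollup):
    """Type 3: keep the previous value in its own column next to the current one."""
    row["previous_rollup"] = row["rollup"]
    row["rollup"] = new_rollup
    return row


dim = [{"product_key": 1, "product_id": "P1", "rollup": "Snacks",
        "start_date": date(2010, 1, 1), "end_date": None, "is_current": True}]
added = scd_type2(dim, "P1", "Health Food", date(2010, 9, 1))
print(len(dim), added["product_key"])
```

Old fact rows keep pointing at `product_key` 1 and new fact rows use the key of the added row, which is what partitions history in Type 2.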

What is real time data-warehousing?


Real-time data warehousing is a combination of two things: 1) real-time activity and 2)
data warehousing. Real-time activity is activity that is happening right now. The activity
could be anything such as the sale of widgets. Once the activity is complete, there is data
about it.
Data warehousing captures business activity data. Real-time data warehousing captures
business activity data as it occurs. As soon as the business activity is complete and there
is data about it, the completed activity data flows into the data warehouse and becomes
available instantly. In other words, real-time data warehousing is a framework for deriving
information from data as the data becomes available.

What is ER Diagram?
The Entity-Relationship (ER) model was originally proposed by Peter Chen in 1976 [Chen76]
as a way to unify the network and relational database views.

What is a lookup table?


A lookup table is a reference table: it supplies values to the table that refers to it. It
is used at run time, and it saves joins and space in transformations. For example, a lookup
table called states provides the actual state name (’Texas’) in place of TX in the output.
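
The states example can be sketched with an in-memory dictionary playing the role of the lookup table. The table contents, the `UNKNOWN` default and the function name are illustrative assumptions.

```python
# Hypothetical lookup (reference) table: state code -> state name.
STATES = {"TX": "Texas", "CA": "California", "NY": "New York"}


def resolve_state(rows, default="UNKNOWN"):
    """Add the state name found in the lookup table to each row."""
    return [
        {**row, "state_name": STATES.get(row["state"], default)}
        for row in rows
    ]


customers = [{"id": 1, "state": "TX"}, {"id": 2, "state": "ZZ"}]
resolved = resolve_state(customers)
print(resolved)
```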

What are the various ETL tools in the Market?


1. Informatica PowerCenter
2. Ascential DataStage
3. ESS Base Hyperion
4. Ab Initio
5. BO Data Integrator
6. SAS ETL
7. MS DTS
8. Oracle OWB
9. Pervasive Data Junction
10. Cognos Decision Stream

What is Data warehousing Hierarchy?


Hierarchies are logical structures that use ordered levels as a means of organizing data. A
hierarchy can be used to define data aggregation. For example, in a time dimension, a
hierarchy might aggregate data from the month level to the quarter level to the year level.

What is a dimension table?


A dimension table is a collection of hierarchies and categories along which the user can
drill down and drill up. It contains mainly textual attributes.

Why should you put your data warehouse on a different system than your OLTP system?
Data Warehouse is a part of OLAP (On-Line Analytical Processing). It is the source from
which any BI tools fetch data for Analytical, reporting or data mining purposes. It generally
contains the data through the whole life cycle of the company/product. DWH contains
historical, integrated, Denormalized, subject oriented data.

The OLTP system, on the other hand, contains data that is generally limited to the last
couple of months or a year at most. The nature of data in OLTP is current, volatile and
highly normalized. Since the two systems differ in nature and functionality, we should
always keep them on different systems.

Explain the advantages of RAID 1, 1/0, and 5. What type of RAID setup would you put your TX logs on?
RAID 0 - Striping. Makes several physical hard drives look like one hard drive. No
redundancy but very fast. May be used for temporary space where loss of the files will not
result in loss of committed data.
RAID 1 - Mirroring. Each hard drive in the drive array has a twin. Each twin has an exact
copy of the other twin's data, so if one hard drive fails, the other is used to read the
data. Every write goes to both drives, so RAID 1 writes more slowly than RAID 0, but read
and write performance are both good.
RAID 1/0 - Combines striping (RAID 0) with mirroring (RAID 1). Similar to RAID 1 and
sometimes faster, depending on the vendor implementation.
RAID 5 - Striping with parity. Great for read-only systems. Read performance is similar to
RAID 1, but writes are noticeably slower because parity has to be updated on every write.
RAID 5 is great for a DW but not good for OLTP.


Hard drives are cheap now, so I always recommend RAID 1, including for the TX logs.


What are the differences between the static and dynamic caches?
A static cache stores the lookup values in memory and does not change throughout the
running of the session.
A dynamic cache also stores the values in memory, but it changes dynamically during the
running of the session. It is used in SCD types, where the target table changes and its
cache has to change with it.

Do you need separate space for Data warehouse & Data mart?
The data warehouse holds all the information of the enterprise, whereas a data mart is
specific to a particular analysis such as sales or production. A data mart is therefore
subject oriented. The warehouse can be seen as a collection of data marts, so it is also
treated as subject oriented. For individual analyses we need data marts.

What is Difference between E-R Modeling and Dimensional Modeling?


In E-R modeling the data is represented as entities and attributes, and it is in
normalized form. In dimensional modeling the data is represented in the form of facts and
dimensions. The facts contain only numeric measures and foreign keys, and the dimensions
are used to describe the data in the fact tables. Using these facts and dimensions we can
build OLAP cubes for analysis, which are useful for management decision support.

What are the types of dimensional Modeling?


1. Star schema
2. Star flake schema
3. Snow flake schema
4. Extended star schema
5. Galaxy schema

What is star schema?
A fact surrounded by different dimensions and their respective levels is called a
star schema.

What is star flake schema?


A fact surrounded by different dimensions and their respective levels, with a single
level of hierarchy, is called a star flake schema.

What is snow flake schema?


A fact surrounded by different dimensions and their respective levels, with
multiple hierarchies, is called a snow flake schema.

What is extended star schema?


An extended star schema is a star schema that contains some additional data. This data
is placed in a separate table that is connected to a dimension.

What is galaxy schema?


Sometimes two facts share common dimensions. This is called a
fact constellation or galaxy schema.

ASCENTIAL DATASTAGE 7.5

1. What are other performance tunings you have done in your last project to
increase the performance of slowly running jobs?

1) Minimize the usage of the Transformer (use Copy, Modify, Filter or Row Generator
instead).
2) Use SQL code while extracting the data. Handle the nulls. Minimize the warnings.
3) Reduce the number of lookups in a job design. Use not more than 20 stages in a job.
4) Use an IPC stage between two passive stages to reduce processing time.
5) Drop indexes before data loading and recreate them after loading data into tables.

6) There is no hard limit on the number of stages, such as 20 or 30, but a large job can
be broken into smaller jobs that use Data Set stages to store the intermediate data.
7) Check the write cache of the hash file. If the same hash file is used for lookup as well
as target, disable this option. If the hash file is used only for lookup, then enable
"Preload to memory". This will improve the performance. Also, check the order of execution
of the routines.
8) Don't use more than 7 lookups in the same transformer; introduce new transformers
if it exceeds 7 lookups.
9) Use the Preload to memory option in the hash file output.
10) Use Write to cache in the hash file input.
11) Write into the error tables only after all the transformer stages.
12) Reduce the width of the input record - remove the columns that you would not use.
13) Cache the hash files you are reading from and writing into. Make sure your cache is
big enough to hold the hash files.
(Use ANALYZE.FILE or HASH.HELP to determine the optimal settings for your hash files.)
This would also minimize overflow on the hash file.

14) If possible, break the input into multiple threads and run multiple instances of the job.

15) Staged the data coming from ODBC/OCI/DB2UDB stages for optimum performance
also for data recovery in case job aborts.

16) Tuned the OCI stage for 'Array Size' and 'Rows per Transaction' numerical values for
faster inserts, updates and selects.
17) Tuned the 'Project Tunables' in Administrator for better performance.
18) Sorted the data as much as possible in DB and reduced the use of DS-Sort for better
performance of jobs. Used sorted data for Aggregator.
19) Removed the data not used from the source as early as possible in the job.
20) Worked with DB-admin to create appropriate Indexes on tables for better performance
of DS queries
21) Converted some of the complex joins/business logic in DS to stored procedures for
faster execution of the jobs.
22) If an input file has an excessive number of rows and can be split-up then use standard
logic to run jobs in parallel.
23) Constraints are generally CPU intensive and take a significant amount of time to
process. This may be the case if the constraint calls routines or external macros but if it is
inline code then the overhead will be minimal.
24) Try to have the constraints in the 'Selection' criteria of the jobs themselves. This
will keep unnecessary records from getting in before the joins are made.
25) Tuning should occur on a job-by-job basis.
26) Using a constraint to filter a record set is much slower than performing a SELECT …
WHERE….
27) Make every attempt to use the bulk loader for your particular database. Bulk loaders
are generally faster than using ODBC or OCI.

2. How can I extract data from DB2 (on IBM i-series) to the data warehouse via
Datastage as the ETL tool? I mean do I first need to use ODBC to create
connectivity and use an adapter for the extraction and transformation of data?

You would need to install ODBC drivers to connect to the DB2 instance. They do not come
with the regular drivers that we usually install, so use the CD provided for the DB2
installation, which has the ODBC drivers for connecting to DB2, and then try it out.

3. What is DS Designer used for - did you use it?

You use the Designer to build jobs by creating a visual design that models the flow and
transformation of data from the data source to the target warehouse. The Designer
graphical interface lets you select stage icons, drop them onto the Designer work area,
and add links.


4. How to improve the performance of hash file?

You can improve the performance of a hashed file by:


1. Preloading the hash file into memory --> this can be done by enabling the preloading
options in the hash file output stage.
2. Write caching options --> data is written into a cache before being flushed to disk.
You can enable this to ensure that hash file rows are written in order into the cache
before being flushed to disk, instead of in the order in which individual rows are written.
3. Preallocating --> estimating the approximate size of the hash file so that the file
does not need to be split too often after write operations.

5. How can we pass parameters to job by using file?

You can do this by reading the parameter values from a UNIX file in a script and then
calling the execution of the Datastage job. The DS job has the parameters defined, and
their values are passed in by the UNIX script.
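
A minimal sketch of that approach: parse `NAME=VALUE` lines from a parameter file and build a `dsjob -run` command line from them. The parameter names, project name and job name are hypothetical, and the `-param name=value` form is used here on the assumption that the installed `dsjob` client accepts it.

```python
def read_params(lines):
    """Parse NAME=VALUE lines, skipping blank lines and # comments."""
    params = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        params[name.strip()] = value.strip()
    return params


def build_dsjob_command(project, job, params):
    """Build the argument list for: dsjob -run -jobstatus -param N=V ... project job."""
    cmd = ["dsjob", "-run", "-jobstatus"]
    for name, value in params.items():
        cmd += ["-param", f"{name}={value}"]
    return cmd + [project, job]


# Stand-in for the lines of a parameter file; the names are made up.
sample = ["# hypothetical parameter file", "SRC_DSN=ORA_DEV", "LOAD_DATE=2010-09-01"]
command = build_dsjob_command("projectname", "jobname", read_params(sample))
print(" ".join(command))
```

On a Datastage server the resulting list could be handed to a process launcher such as `subprocess.run`; the sketch only prints it so that it runs anywhere.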

6. What is a project? Specify its various components?

You always enter Datastage through a Datastage project. When you start a Datastage
client you are prompted to connect to a project.

7. How can you implement slowly changing dimensions in Datastage? Explain?

8. What are built-in components and user-defined components?


Built-in components. These are predefined components used in a job.
User-defined components. These are customized components created using the
Datastage Manager or Datastage Designer

9. Can you join a flat file and a database in Datastage? How?

Yes, we can do it in an indirect way. First create a job that populates the data from the
database into a Sequential file and name it Seq_First1. Then take the flat file you already
have and use the Merge Stage to join the two files. The Merge Stage has various join types
such as Pure Inner Join, Left Outer Join and Right Outer Join. You can use whichever one
suits your requirements.

10. Can anyone tell me how to extract data from more than 1 heterogeneous
source? For example, 1 sequential file, Sybase and Oracle in a single job.

Yes, you can extract data from heterogeneous sources in Datastage using the Transformer
stage. You just need to link each of the sources to the Transformer stage.

11. Will Datastage consider the second constraint in the transformer if the first
constraint is satisfied (if link ordering is given)?
Answer: Yes.

12. How do we use the NLS function in Datastage? What are the advantages of the NLS function?
Where can we use it? Explain briefly?

By using NLS function we can do the following


- Process the data in a wide range of languages
- Use Local formats for dates, times and money
- Sort the data according to the local rules

If NLS is installed, various extra features appear in the product.


For Server jobs, NLS is implemented in the Datastage Server engine. For Parallel jobs, NLS
is implemented using the ICU library.

13. If a Datastage job aborts after say 1000 records, how to continue the job from
the 1000th record after fixing the error?

By specifying checkpointing in the job sequence properties. When the sequence is
restarted, it skips the activities that already completed successfully and resumes from
the point of failure. This option is available in the 7.5 edition.

14. How to kill the job in Datastage?

By killing the respective process ID.

15. What is an environment variable? What is the use of this?

An environment variable is a predefined variable that we can use while creating a DS
job. We can set it either at project level or at job level. Once we set a specific
variable, it is available to the project or job.
We can also define new environment variables. For that we go to DS Administrator.

16. What are all the third party tools used in Datastage?

Autosys, TNG and Event Coordinator are some of those that I know and have worked with.

What is APT_CONFIG in Datastage?

APT_CONFIG_FILE (not just APT_CONFIG) is an environment variable used to identify the
*.apt file. Don’t confuse the variable with the *.apt file itself, which holds the node
information and the configuration of the SMP/MPP server.

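17. If you’re running 4 ways parallel and you have 10 stages on the canvas, how
many processes does Datastage create?
The expected answer is 40: there are 10 stages, and each stage can be partitioned and run
on 4 nodes, which gives 10 x 4 = 40 processes. In practice the number can differ, because
the engine may combine adjacent operators into one process and also starts a conductor
process and one section leader process per node.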

18. Did you parameterize the job or hard-code the values in the jobs?
Always parameterize the job. The values come either from Job Properties or from a
‘Parameter Manager’, a third-party tool. You should never hard-code parameters in your
jobs. The most often parameterized variables in a job are: DB DSN name, username, and
password.

19. Default nodes for Datastage Parallel Edition

The number of nodes is defined in the configuration file and usually follows the number
of processors in your system. If your system has two processors, we will get two nodes by
default.

20. Is it possible to run parallel jobs in server jobs?

No, it is not possible to run Parallel jobs in server jobs. But Server jobs can be executed in
Parallel jobs.

21. Is it possible for two users to access the same job at the same time in Datastage?

No, two users cannot access the same job at the same time. DS will produce
the following error: "Job is accessed by other user"

22. Do you know about METASTAGE?

MetaStage is used to handle the metadata, which is very useful for data lineage and
data analysis later on. Metadata defines the type of data we are handling. These data
definitions are stored in the repository and can be accessed with the use of MetaStage.

23. What is merge and how can it be done? Please explain with a simple example taking 2
tables

Merge is used to join two tables. It takes the key columns and sorts them in ascending or
descending order. Let us consider two tables, Emp and Dept. If we want to join these two
tables, they have DeptNo as a common key, so we can give that column name as the key,
sort DeptNo in ascending order, and join the two tables.
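
The Emp/Dept example can be sketched as a sort-merge inner join on DeptNo. This only illustrates the idea of sorting both inputs on the key and walking them together; it is not the Merge stage's actual implementation, and the sample rows are made up. It assumes each DeptNo appears once in Dept.

```python
def sort_merge_join(emp, dept, key="DeptNo"):
    """Sort both inputs on the key, then walk them together in one pass."""
    emp = sorted(emp, key=lambda r: r[key])
    dept = sorted(dept, key=lambda r: r[key])
    joined = []
    j = 0
    for e in emp:
        # Advance the Dept pointer until its key catches up with the Emp key.
        while j < len(dept) and dept[j][key] < e[key]:
            j += 1
        if j < len(dept) and dept[j][key] == e[key]:
            joined.append({**e, **dept[j]})
    return joined


emp = [
    {"EmpNo": 7369, "Name": "SMITH", "DeptNo": 20},
    {"EmpNo": 7499, "Name": "ALLEN", "DeptNo": 30},
    {"EmpNo": 7782, "Name": "CLARK", "DeptNo": 10},
    {"EmpNo": 7900, "Name": "JAMES", "DeptNo": 50},  # no matching department
]
dept = [
    {"DeptNo": 10, "DName": "ACCOUNTING"},
    {"DeptNo": 20, "DName": "RESEARCH"},
    {"DeptNo": 30, "DName": "SALES"},
]
joined = sort_merge_join(emp, dept)
print([(r["Name"], r["DName"]) for r in joined])
```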

24. What is the difference between Merge stage and Join stage?

Merge and Join Stage Difference:


1. The Merge stage has reject links.
2. The Merge stage can take multiple update links.
3. If you use it for comparison, then the first matching data will be the output,
because it uses the update links to extend the primary details that come from the
master link.

25. What are the enhancements made in Datastage 7.5 compared with 7.0?

Many new stages were introduced compared to Datastage version 7.0. In server jobs we
have the stored procedure stage and the command stage, and a generate report option was
added to the File tab.
In job sequences, many stages such as start loop activity, end loop activity, terminate
loop activity and user variables activity were introduced.
In parallel jobs, the surrogate key stage and the stored procedure stage were introduced.

26. How can we join one Oracle source and a Sequential file?

The Join and Lookup stages can be used to join an Oracle source and a sequential file.

27. What is the purpose of exception activity in Datastage 7.5?

The stages that follow the exception activity are executed whenever an unknown
error occurs while running the job sequencer.

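28. What is Modulus and Splitting in Dynamic Hashed File?

The modulus is the number of groups in the file. In a dynamic hashed file, a group is
split and the modulus grows as the file fills up. The modulus size can also be increased
by contacting your Unix Admin.

29. What is DS Manager used for - did you use it?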


The Manager is a graphical tool that enables you to view and manage the contents of the
Datastage Repository

30. What is the difference between Datastage and Informatica?

The main difference is the vendor. Each product has its own advantages arising from its
architecture. For Datastage it is a top-down approach. We have to choose products based
on the business needs.

31. What are Static Hash files and Dynamic Hash files?

Hashed files have a default size established by their modulus and separation when
you create them, and this can be static or dynamic.
Overflow space is only used when data grows beyond the reserved size for some of
the groups (sectors) within the file. There are as many groups as specified by the
modulus.

32. What is the exact difference between Join, Merge and Lookup Stage?

The three stages differ mainly in the memory they use.


Datastage doesn't know how large your data is, so it cannot make an informed choice
whether to combine data using a Join stage or a Lookup stage.

Here's how to decide which to use:


If the reference datasets are big enough to cause trouble, use a join. A join does a
high-speed sort on the driving and reference datasets. This can involve I/O if the data is
big enough, but the I/O is all highly optimized and sequential. Once the sort is over, the
join processing is very fast and never involves paging or other I/O.
Unlike Join stages and Lookup stages, the Merge stage allows you to specify several
reject links, as many as there are update input links.

33. How do you eliminate duplicate rows?

Duplicates can be eliminated by loading the corresponding data into a hash file.
Specify the columns on which you want to eliminate duplicates as the keys of the hash file.
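
The effect of keying a hashed file on the duplicate columns can be imitated with a dictionary: a row written with an existing key replaces the earlier row. This is an analogy in Python under that assumption, not Datastage code, and the column names are made up.

```python
def dedupe_by_key(rows, key_columns):
    """Write rows into a dict keyed on the chosen columns; later rows overwrite earlier ones."""
    hashed = {}
    for row in rows:
        hashed[tuple(row[c] for c in key_columns)] = row
    return list(hashed.values())


rows = [
    {"cust_id": 1, "city": "Austin"},
    {"cust_id": 2, "city": "Dallas"},
    {"cust_id": 1, "city": "Houston"},  # same key as the first row
]
unique = dedupe_by_key(rows, ["cust_id"])
print(unique)
```

Because the last write wins, the order in which rows arrive decides which duplicate survives.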

34. What does the separation option in a static hash file mean?

The different hashing algorithms are designed to distribute records evenly among the
groups of the file based on characters and their position in the record ids.

When a hashed file is created, Separation and modulo respectively specifies the group
buffer size and the number of buffers allocated for a file. When a Static Hash file is
created, DATASTAGE creates a file that contains the number of groups specified by
modulo.
Size of Hash file = modulus (no. groups) * Separations (buffer size)
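
The formula above can be turned into a quick estimate. The 512-byte unit for separation is an assumption stated here for illustration, and the modulus and separation values are arbitrary.

```python
BLOCK_BYTES = 512  # assumption: separation is counted in 512-byte units


def static_hash_file_bytes(modulus, separation):
    """Initial size = number of groups (modulus) * group buffer size (separation)."""
    return modulus * separation * BLOCK_BYTES


size = static_hash_file_bytes(modulus=101, separation=4)
print(size)
```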

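35. How can we implement Lookup in Datastage Server jobs?

In server jobs, a lookup is done through a reference link into a Transformer stage. The
DB2 stage can be used as the source for such lookups.


In the Enterprise Edition, the Lookup stage can be used for doing lookups.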

36. Importance of Surrogate Key in Data warehousing?

The concept of a surrogate key comes into play when there is a slowly changing dimension
in a table.
In such a condition there is a need for a key by which we can identify the changes made in
the dimensions.

Surrogate keys are system-generated keys. Mainly they are just a sequence of numbers, but
they can also be alphanumeric values.

These slowly changing dimensions can be of three types namely SCD1, SCD2, and SCD3.

37. How do we do the automation of dsjobs?

We can call a Datastage batch job from the command prompt using 'dsjob'. We can also pass
all the parameters from the command prompt.
The call can be placed in a shell script, which is then run by any of the schedulers
available on the market.
The second option is to schedule these jobs using Datastage Director.

38. What is Hash file stage and what is it used for?

We can use the Hash File stage to avoid or remove duplicate rows by specifying the
hash key on a particular field.


39. What is version Control?

Version Control stores different versions of DS jobs, runs different versions of the same
job, reverts to a previous version of a job, and shows version histories.

40. How to find the number of rows in a sequential file?

Using Row Count System variable

41. Suppose there are a million records, would you use OCI? If not, then what stage do
you prefer?

Using Orabulk

42. How to run the job in command prompt in UNIX?

Using the dsjob command with the appropriate options:

dsjob -run -jobstatus projectname jobname

How to find errors in job sequence?


Using Datastage Director we can find the errors in job sequence

How good are you with your PL/SQL?


We will not be writing PL/SQL in Datastage! SQL knowledge is enough...

If I add a new environment variable in Windows, how can I access it in Datastage?


You can view all the environment variables in Designer. You can check them in Job
properties. You can add and access the environment variables from Job properties.

How do you pass the parameter to the job sequence if the job is running at night?

Two ways:
1. Set the default values of Parameters in the Job Sequencer and map these parameters
to job.
2. Run the job in the sequencer using the dsjob utility, where we can specify the values
to be taken for each parameter.

What is the transaction size and array size in OCI stage? How can these be used?
Transaction Size - This field exists for backward compatibility, but it is ignored for
release 3.0 and later of the Plug-in. The transaction size for new jobs is now handled by
Rows per transaction on the Transaction Handling tab on the Input page.
Rows per transaction - The number of rows written before a commit is executed for the
transaction. The default value is 0, that is, all the rows are written before being committed
to the data table.
Array Size - The number of rows written to or read from the database at a time. The
default value is 1, that is, each row is written in a separate statement.
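
How the two settings interact can be sketched with Python's built-in `sqlite3` module standing in for the target database. The function name, table and batch sizes are illustrative assumptions; the OCI stage itself is configured through its properties, not through code like this.

```python
import sqlite3


def load_rows(conn, rows, array_size=100, rows_per_transaction=0):
    """Insert rows in batches of array_size; commit every rows_per_transaction rows.

    rows_per_transaction=0 means a single commit after all rows are written.
    Returns the number of commits issued.
    """
    commits = 0
    uncommitted = 0
    for start in range(0, len(rows), array_size):
        batch = rows[start:start + array_size]
        conn.executemany("INSERT INTO target (id, name) VALUES (?, ?)", batch)
        uncommitted += len(batch)
        if rows_per_transaction and uncommitted >= rows_per_transaction:
            conn.commit()
            commits += 1
            uncommitted = 0
    if uncommitted:
        conn.commit()
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, name TEXT)")
data = [(i, f"row{i}") for i in range(1000)]
commits = load_rows(conn, data, array_size=100, rows_per_transaction=250)
count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(commits, count)
```

A larger array size means fewer round trips to the database, while rows per transaction controls how much work is lost and redone if the load fails part-way.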

What is the difference between DRS (Dynamic Relational Stage) and ODBC
STAGE?

The DRS stage should be faster than the ODBC stage, as it uses native database
connectivity. You will need to install and configure the required database clients on
your Datastage server for it to work.
The Dynamic Relational Stage was developed for PeopleSoft so that a job can run on any of
the supported databases. It supports ODBC connections too. Read more about that in the
plug-in documentation.
The ODBC stage uses the ODBC driver for a particular database. DRS (Dynamic Relational
Stage) is a stage that tries to make switching from one database to another seamless.
It uses the native connectivity for the chosen target ...


How do you track performance statistics and enhance it?


Through the Monitor we can view the performance statistics.

What is the meaning of "Try to have the constraints in the 'Selection' criteria of the
jobs itself. This will eliminate the unnecessary records even getting in before joins
are made"?

This means improving performance by avoiding constraints in the job wherever
possible and instead filtering while selecting the data itself, using a WHERE clause. This
improves performance.
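
The idea can be sketched outside DataStage with an in-memory SQLite table. The table name, columns and filter value below are made up for illustration; the point is only that both approaches return the same rows, but filtering in the WHERE clause never pulls the unwanted rows into the job.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EAST", 10.0), (2, "WEST", 20.0), (3, "EAST", 30.0)],
)

# Filtering late: every row is read, then a constraint discards some of them.
all_rows = conn.execute(
    "SELECT id, region, amount FROM orders ORDER BY id"
).fetchall()
filtered_late = [row for row in all_rows if row[1] == "EAST"]

# Filtering early: the WHERE clause removes unwanted rows at the source.
filtered_early = conn.execute(
    "SELECT id, region, amount FROM orders WHERE region = ? ORDER BY id",
    ("EAST",),
).fetchall()

print(len(all_rows), len(filtered_early))
```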

How do you drop the index before loading data into the target, and how do you rebuild it in DataStage?

This can be achieved with the "Direct Load" option of the SQL*Loader utility.


There are three different types of user-created stages available.
What are they?
These are the three different stages:
i) Custom
ii) Build
iii) Wrapped

How will you call an external function or subroutine from DataStage?

DataStage has an option to call external programs: the ExecSH routine.

What is DS Administrator used for? Did you use it?

The Administrator enables you to set up Datastage users, control the purging of the
Repository, and, if National Language Support (NLS) is enabled, install and manage maps
and locales.

What is the max capacity of a Hash file in DataStage?

Take a look at the uvconfig file:

# 64BIT_FILES - This sets the default mode used to
# create static hashed and dynamic files.
# A value of 0 results in the creation of 32-bit
# files. 32-bit files have a maximum file size of
# 2 gigabytes. A value of 1 results in the creation
# of 64-bit files (ONLY valid on 64-bit capable platforms).
# The maximum file size for 64-bit
# files is system dependent. The default behavior
# may be overridden by keywords on certain commands.
64BIT_FILES 0

What is the difference between Symmetric Multiprocessing and Massively Parallel
Processing?

Symmetric Multiprocessing (SMP) - Some hardware resources may be shared by the
processors. The processors communicate via shared memory and have a single operating
system.
Cluster or Massively Parallel Processing (MPP) - Known as shared nothing, in which
each processor has exclusive access to hardware resources. Cluster systems can be
physically dispersed. The processors have their own operating systems and communicate
via a high-speed network.


What is the order of execution done internally in the Transformer, with the stage
editor having input links on the left-hand side and output links on the right-hand side?
Stage variables first, then constraints, then column derivations or expressions.

What are Stage Variables, Derivations and Constraints?

Stage Variable - An intermediate processing variable that retains its value during read and
doesn't pass the value into a target column.
Derivation - An expression that specifies the value to be passed on to the target column.
Constraint - A condition that is either true or false and that controls the flow of data on a link.
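
To make the three terms and their evaluation order concrete, here is a small sketch in plain Python rather than in a Transformer. The column names and the constraint (`amount > 0`) are invented for the example; it is an analogy for the stage, not how DataStage is implemented.

```python
def transform(rows):
    """Process rows in Transformer order: stage variables, constraint, derivations."""
    running_total = 0  # stage variable: keeps its value from one row to the next
    output = []
    for row in rows:
        # 1. Stage variables are evaluated first, for every input row.
        running_total += row["amount"]
        # 2. The constraint decides whether the row flows down the output link.
        if row["amount"] <= 0:
            continue
        # 3. Derivations compute the values of the output columns.
        output.append(
            {
                "id": row["id"],
                "amount_x2": row["amount"] * 2,
                "running_total": running_total,
            }
        )
    return output


result = transform(
    [{"id": 1, "amount": 5}, {"id": 2, "amount": -3}, {"id": 3, "amount": 4}]
)
print(result)
```

Note that the rejected row (id 2) still updates the stage variable, because stage variables are evaluated before the constraint.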

How to implement a type 2 slowly changing dimension in DataStage? Give an
example.

Slowly changing dimensions are a common problem in data warehousing. For example: there
exists a customer called Lisa in a company ABC and she lives in New York. Later she
moves to Florida. The company must modify her address now. In general there are 3 ways to solve
this problem:

Type 1: The new record replaces the original record, with no trace of the old record at all.
Type 2: A new record is added to the customer dimension table. Therefore, the
customer is treated essentially as two different people.
Type 3: The original record is modified to reflect the changes.

In Type 1 the new record overwrites the existing one, which means no history is maintained.
The history of where the person stayed last is lost, but it is simple to use.

In Type 2 a new record is added, so both the original and the new record will be
present, and the new record gets its own primary key. The advantage of Type 2 is that
historical information is maintained, but the size of the dimension table grows, so storage and
performance can become a concern.
Type 2 should only be used if it is necessary for the data warehouse to track the historical
changes.

In Type 3 there will be 2 columns, one to indicate the original value and the other to indicate
the current value. For example, a new column will be added which shows the original address
as New York and the current address as Florida. This helps in keeping some part of the history,
and the table size is not increased. But one problem is that when the customer moves from Florida
to Texas, the New York information is lost. So Type 3 should only be used if the changes
will only occur a finite number of times.
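
The Type 2 behaviour can be sketched in a few lines of Python. This is not a DataStage job, only the logic such a job would implement. The column names (`sk`, `customer_id`, `start_date`, `end_date`, `is_current`) and the dates are assumptions made for the example.

```python
from itertools import count


def apply_scd2(dimension, natural_key, new_attrs, load_date, key_gen):
    """Expire the current row for natural_key if it changed, then add a new row."""
    current = next(
        (r for r in dimension if r["customer_id"] == natural_key and r["is_current"]),
        None,
    )
    if current is not None:
        if all(current[k] == v for k, v in new_attrs.items()):
            return dimension  # nothing changed, so no new version is needed
        # Close the old version instead of overwriting it (this keeps history).
        current["is_current"] = False
        current["end_date"] = load_date
    # The new version gets its own surrogate key.
    dimension.append(
        {
            "sk": next(key_gen),
            "customer_id": natural_key,
            **new_attrs,
            "start_date": load_date,
            "end_date": None,
            "is_current": True,
        }
    )
    return dimension


keys = count(1)
dim = []
apply_scd2(dim, "LISA", {"state": "New York"}, "2010-01-01", keys)
apply_scd2(dim, "LISA", {"state": "Florida"}, "2010-09-01", keys)
for row in dim:
    print(row)
```

After the two loads the table holds two rows for Lisa: the expired New York row and the current Florida row, each with a different surrogate key.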

Functionality of Link Partitioner and Link Collector?

Server jobs mainly execute in a sequential fashion. The IPC stage, as well as the Link
Partitioner and Link Collector, simulate a parallel mode of execution in server jobs
having a single CPU.
Link Partitioner: It receives data on a single input link and diverts the data
to a maximum of 64 output links, where the data is processed by the same stage having the
same metadata.
Link Collector: It collects the data from up to 64 input links, merges it into a
single data flow and loads it to the target.
Both are active stages, and the design and mode
of execution of server jobs has to be decided by the designer.
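
A minimal sketch of the partition-then-collect idea follows. Round-robin distribution is chosen here only for simplicity and is an assumption of the example; the functions are plain Python stand-ins, not DataStage APIs.

```python
MAX_LINKS = 64  # limit stated in the text above


def partition_round_robin(rows, n_links):
    """Spread rows from one input over n_links outputs, one row at a time."""
    if not 1 <= n_links <= MAX_LINKS:
        raise ValueError("number of output links must be between 1 and 64")
    links = [[] for _ in range(n_links)]
    for i, row in enumerate(rows):
        links[i % n_links].append(row)
    return links


def collect(links):
    """Merge the rows from several input links back into a single flow."""
    merged = []
    for link in links:
        merged.extend(link)
    return merged


links = partition_round_robin(list(range(10)), 3)
merged = collect(links)
print(links)
print(merged)
```

The collected flow contains every row exactly once, but not necessarily in the original order.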

What happens if the job fails at night?

The job sequence aborts.

What is job control? How can it be used? Explain with steps.

JCL stands for Job Control Language. It is used to run a number of jobs at a time, with or
without using loops.

Steps: click on Edit in the menu bar, select 'Job Properties' and enter the parameters as:

Parameter   Prompt    Type
STEP_ID     STEP_ID   string
Source      SRC       string
DSN         DSN       string
Username    unm       string
Password    pwd       string

After entering the above parameters, select the JCL
button, select the jobs from the list box and run the job.

What is the difference between DataStage and DataStage TX?

This is a difficult question to answer, but one thing I can tell you is that DataStage TX is not an ETL
tool and is not a new version of DataStage 7.5. As far as I know, TX is used for ODS
sources.

If the size of the Hash file exceeds 2GB, what happens? Does it overwrite the
current rows?

Yes, it overwrites the file.

Do you know about the INTEGRITY/QUALITY stage?

Integrity/Quality Stage is a data integration tool from Ascential which is used to
standardize/integrate the data from different sources.

How much would be the size of the database in DataStage? What is the difference
between In-process and Inter-process?

In-process:
You can improve the performance of most DataStage jobs by turning in-process row
buffering on and recompiling the job. This allows connected active stages to pass data via
buffers rather than row by row.

Note: You cannot use in-process row-buffering if your job uses COMMON blocks in
transform functions to pass data between stages. This is not recommended practice, and
it is advisable to redesign your job to use row buffering rather than COMMON blocks.

Inter-process

Use this if you are running server jobs on an SMP parallel system. This enables the job to
run using a separate process for each active stage, which will run simultaneously on a
separate processor.

Note: You cannot use inter-process row buffering if your job uses COMMON blocks in
transform functions to pass data between stages. This is not recommended practice, and
it is advisable to redesign your job to use row buffering rather than COMMON blocks.

How can you do an incremental load in DataStage? Incremental load means daily load.

Whenever you select data from the source, select the records which were loaded or
updated between the timestamp of the last successful load and today's load start date and
time. For this you have to pass parameters for those two dates.
Store the last run date and time in a file, read it in through a job parameter,
and pass the current date and time as the second parameter.
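
The bookkeeping around the two timestamps might look like the sketch below. The state file name, the timestamp format and the fallback start date are assumptions for illustration. In practice the two values would be handed to the job as parameters and used in the source WHERE clause.

```python
import tempfile
from datetime import datetime
from pathlib import Path

FMT = "%Y-%m-%d %H:%M:%S"
FIRST_RUN_START = "1900-01-01 00:00:00"  # assumed fallback when no state file exists


def load_window(state_file, now):
    """Return (from_ts, to_ts): last successful load time and this run's start time."""
    path = Path(state_file)
    last = path.read_text().strip() if path.exists() else FIRST_RUN_START
    return last, now.strftime(FMT)


def mark_success(state_file, to_ts):
    """Record the end of the window, but only after the load has succeeded."""
    Path(state_file).write_text(to_ts)


state = Path(tempfile.mkdtemp()) / "last_run.txt"
first = load_window(state, datetime(2010, 9, 1, 2, 0, 0))
mark_success(state, first[1])
second = load_window(state, datetime(2010, 9, 2, 2, 0, 0))
print(first)
print(second)
```

Updating the file only after a successful load means a failed night is simply picked up again by the next run's window.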

How do you remove duplicates without using remove duplicate stage?

In the target make the column as the key column and run the job.

What are XML files, how do you read data from XML files, and what stage is to be used?

In the palette there are Real Time stages such as xml-input, xml-output and xml-transformer.

Where are the flat files actually stored? What is the path?

Flat files store the data, and the path can be given in the General tab of the Sequential File
stage.

What is a data set? And what is a file set?

File set: The File Set stage allows you to read data from or write data to a file set. The stage can have a
single input link, a single output link and a single rejects link. It only executes in parallel
mode. The data files and the file that lists them are called a file set. This capability is useful
because some operating systems impose a 2 GB limit on the size of a file and you need to
distribute files among nodes to prevent overruns.

Data sets are used to import the data in parallel jobs, like ODBC in server jobs.

What is the meaning of file extender in DataStage server jobs? Can we run one DataStage
job from another job?
File extender means adding columns or records to an already existing file. In
DataStage, we can run one job from another job.

How do you merge two files in DS?

Either use the copy command as a before-job subroutine if the metadata of the 2 files is the
same, or create a job to concatenate the 2 files into one if the metadata is different.

What is the default cache size? How do you change the cache size if needed?
The default read cache size is 128MB. We can increase it by going into DataStage
Administrator, selecting the Tunables tab and specifying the cache size.

What about System variables?

Datastage provides a set of variables containing useful system information that you can
access from a transform or routine. System variables are read-only.

@DATE The internal date when the program started. See the Date function.
@DAY The day of the month extracted from the value in @DATE.
@FALSE The compiler replaces the value with 0.
@FM A field mark, Char(254).
@IM An item mark, Char(255).
@INROWNUM Input row counter. For use in constraints and derivations in Transformer
stages.
@OUTROWNUM Output row counter (per link). For use in derivations in Transformer
stages.
@LOGNAME The user login name.
@MONTH The current month extracted from the value in @DATE.
@NULL The null value.
@NULL.STR The internal representation of the null value, Char(128).
@PATH The pathname of the current Datastage project.
@SCHEMA The schema name of the current Datastage project.
@SM A subvalue mark (a delimiter used in UniVerse files), Char(252).
@SYSTEM.RETURN.CODE Status codes returned by system processes or commands.
@TIME The internal time when the program started. See the Time function.
@TM A text mark (a delimiter used in UniVerse files), Char(251).
@TRUE The compiler replaces the value with 1.
@USERNO The user number.
@VM A value mark (a delimiter used in UniVerse files), Char(253).

@WHO The name of the current DataStage project directory.
@YEAR The current year extracted from @DATE.
REJECTED Can be used in the constraint expression of a Transformer stage of an output
link. REJECTED is initially TRUE, but is set to FALSE whenever an output link is
successfully written.

Where does a UNIX script of DataStage execute, on the client machine or on the
server?
DataStage jobs are executed on the server machines only. Nothing is stored on
the client machine.

What is DS Director used for? Did you use it?

DataStage Director is a GUI to monitor, run, validate & schedule DataStage server jobs.

What's the difference between DataStage Developers and DataStage Designers?
What are the skills required for this?

A DataStage developer is one who codes the jobs. A DataStage designer is one who
designs the job; that is, he deals with the blueprints and designs the jobs and the stages
that are required in developing the code.

What other ETL tools have you worked with?

Ab Initio
DataStage EE (Parallel Edition)
Oracle ETL
There are 7 ETL tools in the market!

What will you do in a situation where somebody wants to send you a file, and you have to use that
file as an input or reference and then run a job?

A. Under Windows: Use the 'WaitForFileActivity' under the Sequencers and then run the job.
Maybe you can schedule the sequencer around the time the file is expected to arrive.
B. Under UNIX: Poll for the file. Once the file has arrived, start the job or sequencer depending on
the file.
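
Polling for a file, as described for UNIX, can be sketched as follows. The file name, timeout and polling interval are placeholders, and the step that would actually start the job is left out.

```python
import os
import tempfile
import time


def wait_for_file(path, timeout_s, interval_s=1.0):
    """Poll until path exists; return True if it arrived before the timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Small demonstration with a temporary directory and very short timeouts.
with tempfile.TemporaryDirectory() as tmp:
    expected = os.path.join(tmp, "input.dat")
    missing_result = wait_for_file(expected, timeout_s=0.05, interval_s=0.01)
    open(expected, "w").close()  # simulate the file arriving
    found_result = wait_for_file(expected, timeout_s=0.05, interval_s=0.01)

print(missing_result, found_result)
```

A real script would also want to check that the sender has finished writing the file before the job is started.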

What are the command line functions that import and export the DS jobs?

A. dsimport.exe - imports the DataStage components.
B. dsexport.exe - exports the DataStage components.

Dimensional modeling is again subdivided into 2 types.

a) Star Schema - Simple & much faster. Denormalized form.
b) Snowflake Schema - Complex with more granularities. More normalized form.

What is the sequencer stage in a job sequence? What are the conditions?

A sequencer allows you to synchronize the control flow of multiple activities in a job
sequence. It can have multiple input triggers as well as multiple output triggers. The
sequencer operates in two modes:
ALL mode. In this mode all of the inputs to the
sequencer must be TRUE for any of the sequencer outputs to fire.
ANY mode. In this
mode, output triggers can be fired if any of the sequencer inputs are TRUE.
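
The two modes correspond to a logical AND and a logical OR over the input triggers. The function below is a toy model of that rule, with a made-up name, and is not part of any DataStage API.

```python
def sequencer_fires(input_triggers, mode):
    """Return True if a sequencer in the given mode would fire its outputs."""
    if mode == "ALL":
        # Every input must be TRUE.
        return all(input_triggers)
    if mode == "ANY":
        # One TRUE input is enough.
        return any(input_triggers)
    raise ValueError(f"unknown mode: {mode}")


print(sequencer_fires([True, False, True], "ALL"))
print(sequencer_fires([True, False, True], "ANY"))
```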

What are the Repository Tables in DataStage and what are they?

A data warehouse is a repository (centralized as well as distributed) of data, able to
answer any ad hoc, analytical, historical or complex queries. Metadata is data about data.
Examples of metadata include data element descriptions, data type descriptions,
attribute/property descriptions, range/domain descriptions, and process/method
descriptions.
The repository environment encompasses all corporate metadata resources: database
catalogs, data dictionaries, and navigation services.
Metadata includes things like the name, length, valid values, and description of a data
element. Metadata is stored in a data dictionary and repository. It insulates the data
warehouse from changes in the schema of operational systems. In DataStage I/O and
Transfer, under the Interface tab, the Input, Output & Transfer pages have 4 tabs, and the last
one is Build; under that you can find the TABLE NAME.

What is the difference between an operational data store (ODS) & a data warehouse?

A data warehouse is a decision support database for organizational needs. It is a
subject oriented, non-volatile, integrated, time variant collection of data.

An ODS (Operational Data Store) is an integrated collection of related information. It contains
a maximum of 90 days of information.

How many jobs have you created in your last project?

100+ jobs for every 6 months if you are in Development, if you are in testing 40 jobs for
every 6 months although it need not be the same number for everybody

1.What about System variables?
2.How can we create Containers?
3.How can we improve the performance of DataStage?
4.What are the Job parameters?
5.What is the difference between routine and transform and function?
6.What are all the third party tools used in DataStage?
7.How can we implement Lookup in DataStage Server jobs?
8.How can we implement Slowly Changing Dimensions in DataStage?.
9.How can we join one Oracle source and Sequential file?.
10.What is iconv and oconv functions?
11.Difference between Hashfile and Sequential File?
12. Maximum how many characters we can give for a Job name in DataStage?

How do you pass filename as the parameter for a job?

1. Go to DataStage Administrator->Projects->Properties->Environment->UserDefined.
Here you can see a grid, where you can enter your parameter name and the
corresponding path of the file.

2. Go to the stage Tab of the job, select the NLS tab, click on the "Use Job Parameter"
and select the parameter name which you have given in the above. The selected
parameter name appears in the text box beside the "Use Job Parameter" button. Copy the
parameter name from the text box and use it in your job. Keep the project default in the
text box.

How to remove duplicates in a server job?

1) Use a Hashed File stage, or
2) If you use the sort command in UNIX (as a before-job subroutine), you can reject duplicate
records using the -u parameter, or
3) Use a Sort stage.
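
The first option works because writing a row to a keyed store replaces any earlier row with the same key. A dictionary gives the same effect in a few lines. This sketch assumes that "last row wins" is the behaviour wanted, and the column names are invented for the example.

```python
def dedupe_by_key(rows, key_fields):
    """Keep one row per key; a later row with the same key replaces the earlier one."""
    keyed = {}
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        keyed[key] = row  # same key: overwrite, like a keyed write
    return list(keyed.values())


rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "c"},
]
unique_rows = dedupe_by_key(rows, ["id"])
print(unique_rows)
```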

What is the utility you use to schedule the jobs on a UNIX server other than using
Ascential Director?

AutoSys: Through AutoSys you can automate the job by invoking the shell script written to
schedule the DataStage jobs.

Is it possible to call one job from another job in server jobs?

I think we can call one job from another job. In fact, "calling" isn't quite the right word, because you
attach/add the other job through job properties. In fact, you can attach zero or more jobs.

Steps will be Edit --> Job Properties --> Job Control

Click on Add Job and select the desired job.

How do you clean the DataStage repository?

Remove log files periodically.

If data is partitioned in your job on key 1 and then you aggregate on key 2, what
issues could arise?

The data will have to be partitioned on both keys, so the job will take more time to execute.
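
One concrete issue can be shown with a toy example. If rows are spread across partitions by key 1, rows sharing the same key 2 can land in different partitions, so a per-partition aggregate on key 2 yields partial totals until the data is repartitioned on key 2. The modulus partitioner and the column names below are assumptions made for the sketch.

```python
from collections import defaultdict


def partition_by(rows, key, n_partitions):
    """Assign each row to a partition using the modulus of the chosen key."""
    parts = [[] for _ in range(n_partitions)]
    for row in rows:
        parts[row[key] % n_partitions].append(row)
    return parts


def sum_by(rows, key):
    """Sum the 'amount' column per value of key within one partition."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["amount"]
    return dict(totals)


rows = [
    {"key1": 0, "key2": 7, "amount": 10},
    {"key1": 1, "key2": 7, "amount": 5},
    {"key1": 2, "key2": 8, "amount": 1},
]

# Partitioned on key1: key2 = 7 shows up in both partitions with partial sums.
per_partition = [sum_by(p, "key2") for p in partition_by(rows, "key1", 2)]

# Repartitioned on key2: each key2 value is summed in exactly one partition.
repartitioned = [sum_by(p, "key2") for p in partition_by(rows, "key2", 2)]

print(per_partition)
print(repartitioned)
```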

What is job control? How is it developed? Explain with steps.

Job control means controlling DataStage jobs through some other DataStage jobs. Ex: Consider two jobs XXX
and YYY. The job YYY can be executed from job XXX by using DataStage macros in
routines.

To execute one job from another job, the following steps need to be followed in routines:

1. Attach the job using the DSAttachJob function.

2. Run the other job using the DSRunJob function.

3. Stop the job using the DSStopJob function.

Containers: Usage and Types?

A container is a collection of stages used for the purpose of reusability. There are 2 types
of containers.
a) Local Container: Job specific.
b) Shared Container: Used in any job within a project.
There are two types of shared container:
1. Server shared container. Used in server jobs (can also be used in parallel jobs).
2. Parallel shared container. Used in parallel jobs. You can also include server shared
containers in parallel jobs as a way of incorporating server job functionality into a parallel
stage (for example, you could use one to make a server plug-in stage available to a
parallel job).

What does a Config File in Parallel Extender consist of?

a) Number of processes or nodes.
b) Actual disk storage location.

How can you implement complex jobs in DataStage?

A complex design means having more joins and more lookups; such a job design is
called a complex job. We can easily implement any complex design in DataStage by
following simple tips, which also help performance. There is no limit on the number of
stages in a job, but for better performance use at most 20 stages in each job. If it
exceeds 20 stages, then go for another job. Use not more than 7 lookups for a
transformer; otherwise include one more transformer.

What validations do you perform after creating jobs in Designer?

What are the different types of errors you faced during loading and how did you solve them?

Check the parameters, check whether the input files exist, check
whether the input tables exist, and also check usernames, data source names, passwords
and the like.

What is the User Variables activity? When, how and where is it used? Give a real
example.

By using the User Variables activity we can create variables in the job sequence; these
variables are available to all the activities in that sequence.

Usually this activity is placed at the start of the job sequence.

I want to process 3 files sequentially, one by one. How can I do that? While
processing, it should fetch the files automatically.

If the metadata for all the files is the same, then create a job having the file name as a parameter,
then use the same job in a routine and call the job with a different file name each time, or create a
sequencer to use the job.

What happens if the output of a hash file is connected to a Transformer? What error does it
throw?

If the hash file output is connected to a Transformer stage, the hash file is treated as the
lookup file when there is a primary link to the same Transformer stage; if there is no primary
link, then it is treated as the primary link itself. You can do SCD in a server job by using the lookup
functionality. This will not return any error code.

What are the Iconv and Oconv functions?

Iconv( ) - converts a string to internal storage format.
Oconv( ) - converts an expression to an output format.

What are the Oconv() and Iconv() functions and where are they used?

Iconv is used to convert a date into internal format, i.e. a format only DataStage can understand.
Example: a date comes in mm/dd/yyyy format.
DataStage will convert this date into some number, like 740.
You can use this 740 in a derivation in your own format by using Oconv.
Suppose you want to change mm/dd/yyyy to dd/mm/yyyy. Now you will use Iconv and Oconv:

Oconv(Iconv(datecomingfrominputstring, SOMEXYZ (see in help which is the
Iconv format)), defineoconvformat)
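
The round trip through an internal day number can be imitated in Python. This sketch assumes the UniVerse convention in which internal dates count days from 31 December 1967 (day 0). The helper names `to_internal` and `from_internal` are invented stand-ins for Iconv and Oconv, not DataStage functions.

```python
from datetime import date, datetime, timedelta

EPOCH = date(1967, 12, 31)  # assumed day 0 of the internal date format


def to_internal(text, fmt="%m/%d/%Y"):
    """Stand-in for Iconv: parse a date string into an internal day number."""
    return (datetime.strptime(text, fmt).date() - EPOCH).days


def from_internal(days, fmt="%d/%m/%Y"):
    """Stand-in for Oconv: format an internal day number as a date string."""
    return (EPOCH + timedelta(days=days)).strftime(fmt)


internal = to_internal("09/25/2010")        # mm/dd/yyyy in
converted = from_internal(internal)         # dd/mm/yyyy out
print(internal, converted)
print(from_internal(740))
```

Because the intermediate value is a plain number, the input and output formats can be chosen independently of each other.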

What is 'inserting for update' in DataStage?

I think 'insert to update' means the updated value is inserted to maintain history.

How can I convert Server Jobs into Parallel Jobs?

I have never tried doing this, however, I have some information which will help you in
saving a lot of time. You can convert your server job into a server shared container. The
server shared container can also be used in parallel jobs as shared container.

Can we use a shared container as a lookup in DataStage server jobs?

I am using DataStage 7.5 on Unix. We can use a shared container more than once in the
job. Is there any limit on using it? I ask because in my job I used the shared container in 6
flows, and at any time only 2 flows are working. Can you please share any info on this?

DataStage from Staging to MDW is only running at 1 row per second! What do we do to
remedy?

I am assuming that there are too many stages, which is causing the problem, and am providing a
solution on that basis.

In general, if you have too many stages (especially Transformers and hash lookups), there will be
a lot of overhead and the performance will degrade drastically. I would suggest
writing a query instead of doing several lookups. It may seem embarrassing to have a
tool and still write a query, but that is best at times.

If there are too many look ups that are being done, ensure that you have appropriate
indexes while querying. If you do not want to write the query and use intermediate stages,
ensure that you use proper elimination of data between stages so that data volumes do
not cause overhead. So, there might be a re-ordering of stages needed for good
performance.

Other things in general that could be looked in:

1) For massive transactions, set the hashing size and buffer size to appropriate values so that
as much as possible is performed in memory and there is no I/O overhead to disk.

2) Enable row buffering and set an appropriate size for row buffering.

3) It is important to use appropriate objects between stages for performance.

What is the flow of loading data into fact & dimensional tables?

Here is the sequence of loading a data warehouse.

1. The source data is first loaded into the staging area, where data cleansing takes place.
2. The data from the staging area is then loaded into dimensions/lookups.
3. Finally the fact tables are loaded from the corresponding source tables in the
staging area.

What is the difference between a sequential file and a dataset? When to use the Copy stage?

The Sequential File stage stores a small amount of data in a file with any extension,
whereas a Dataset is used to store a huge amount of data and opens only with the
extension .ds.

The Copy stage copies a single input data set to a number of output
datasets. Each record of the input data set is copied to every output data set. Records
can be copied without modification, or you can drop columns or change their order.

What happens if RCP is disabled?

Runtime column propagation (RCP): If RCP is enabled for a job, and specifically for
those stages whose output connects to the shared container input, then metadata will be
propagated at run time, so there is no need to map it at design time.

If RCP is disabled for the job, OSH has to perform import and export every
time the job runs, and the processing time of the job also increases.

What are Routines and where/how are they written and have you written any
routines before?

Routines are stored in the Routines branch of the DataStage Repository, where you can
create, view, or edit them using the Routine dialog box. The following program components
are classified as routines:
• Transform functions. These are functions that you can use
when defining custom transforms. DataStage has a number of built-in transform functions
which are located in the Routines ➤ Examples ➤ Functions branch of the Repository. You
can also define your own transform functions in the Routine dialog box.
• Before/After subroutines. When designing a job, you can specify a subroutine to run before or after the
job, or before or after an active stage. DataStage has a number of built-in before/after
subroutines, which are located in the Routines ➤ Built-in ➤ Before/After branch in the
Repository. You can also define your own before/after subroutines using the Routine dialog
box.
• Custom UniVerse functions. These are specialized BASIC functions that have been
defined outside DataStage. Using the Routine dialog box, you can get DataStage to create
a wrapper that enables you to call these functions from within DataStage. These
functions are stored under the Routines branch in the Repository. You specify the category
when you create the routine. If NLS is enabled,

How can we call a routine in a DataStage job? Explain with steps.

Routines are used for implementing business logic. They are of two types: 1) Before Sub
Routines and 2) After Sub Routines.
Steps: double-click on the Transformer stage, right-click on
any one of the mapping fields, select the [dsroutines] option, give the business
logic within the edit window, and select either of the options (Before / After Sub Routines).

How can we improve the performance of Datastage jobs?


Performance and tuning of DS jobs:

1.Establish Baselines
2.Avoid the Use of only one flow for tuning/performance testing
3.Work in increments
4.Evaluate data skew
5.Isolate and solve
6.Distribute file systems to eliminate bottlenecks
7.Do not involve the RDBMS in initial testing
8.Understand and evaluate the tuning knobs available.

Types of Parallel Processing?

Parallel Processing is broadly classified into 2 types.

a) SMP - Symmetric Multiprocessing.
b) MPP - Massively Parallel Processing.

What are the ORABULK and BCP stages?

ORABULK is used to load bulk data into a single table of a target Oracle database.

BCP is used to load bulk data into a single table for Microsoft SQL Server and Sybase.

How can we ETL an Excel file to a data mart?

Open the ODBC Data Source Administrator found in Control Panel/Administrative Tools.
Under the System DSN tab, add the Microsoft Excel driver.
Then you'll be able to access the XLS file from DataStage.

What is OCI? And how to use the ETL tools?

OCI doesn't mean the orabulk data. It actually uses the "Oracle Call Interface" of
Oracle to load the data. It is the lowest-level Oracle interface used for loading the
data.

What are the different types of lookups in DataStage?

There are two types of lookups: the Lookup stage and the Lookup File Set.

Lookup: The Lookup stage references another stage or database to get data from it and
passes the transformed data on to another database.
Lookup File Set: It allows you to create a lookup file set or reference one for a lookup. The
stage can have a single input link or a single output link. The output link must be a
reference link. The stage can be configured to execute in parallel or sequential mode when
used with an input link. When creating Lookup file sets, one file will be created for each
partition. The individual files are referenced by a single descriptor file, which by convention
has the suffix .fs.

How can we create Containers?

There are two types of containers:
1. Local Container.
2. Shared Container.
A local container is available for that particular job only,
whereas shared containers can be used anywhere in the project.

Local container:
Step 1: Select the stages required
Step 2: Edit > Construct Container > Local

Shared container:
Step 1: Select the stages required
Step 2: Edit > Construct Container > Shared
Shared containers are stored in the Shared Containers branch of the tree structure.

How do you populate source files?

There are many ways to populate source files; writing a SQL statement in Oracle is one way.

What are the differences between DataStage 7.0 and 7.5 in server jobs?

There are a lot of differences. A lot of new stages are available in DS 7.5, for example
the CDC stage, the Stored Procedure stage, etc.

Briefly describe the various client components?

There are four client components:

DataStage Designer. A design interface used to create DataStage applications (known
as jobs). Each job specifies the data sources, the transforms required, and the destination
of the data. Jobs are compiled to create executables that are scheduled by the Director
and run by the Server.

DataStage Director. A user interface used to validate, schedule, run, and monitor
DataStage jobs.

DataStage Manager. A user interface used to view and edit the contents of the
Repository.

DataStage Administrator. A user interface used to configure DataStage projects and
users.

Types of views in DataStage Director?

There are 3 types of views in DataStage Director:

a) Job View - Dates of jobs compiled.
b) Status View - Status of the job's last run.
c) Log View - Warning messages, event messages, and program-generated
messages.

What are the environment variables in DataStage? Give some examples.

These are the variables used at the project or job level. We can use them to configure the
job, i.e. we can associate the configuration file (without this you cannot run your job) or increase
the sequential or dataset read/write buffer.

Ex: $APT_CONFIG_FILE
Like the above, we have many environment variables. Go to Job Properties and click
on "Add Environment Variable" to see most of the environment variables.

What are the steps involved in the development of a job in DataStage?

The steps required are:
1. Select the data source stage depending upon the sources, for example flat file, database, XML, etc.
2. Select the required stages for the transformation logic, such as Transformer, Link Collector, Link
Partitioner, Aggregator, Merge, etc.
3. Select the final target stage where you want to load the data, whether it is a data warehouse, data
mart, ODS, staging, etc.

What is the difference between validated OK and compiled in DataStage?

When we say "Validating a Job", we are talking about running the job in "check only"
mode. The following checks are made:

- Connections are made to the data sources or data warehouse.
- SQL SELECT statements are prepared.
- Files are opened. Intermediate files in Hashed File, UniVerse, or ODBC stages that use
the local data source are created, if they do not already exist.

Why do you use SQL*Loader or the OCI stage?

When the data volume is enormous, or for bulk loads, we can use the OCI stage or SQL*Loader,
depending on the source.

Where do we use the Link Partitioner in a DataStage job? Explain with an example.

We use the Link Partitioner in DataStage server jobs. The Link Partitioner stage is an active
stage which takes one input and allows you to distribute partitioned rows to up to 64 output
links.
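
To picture what "distributing rows to output links" means, here is a minimal Python sketch. It is not DataStage code; the function names are made up for illustration, and round-robin and modulus are used here only as two simple distribution rules. The 64-link limit is the one quoted above.

```python
MAX_OUTPUT_LINKS = 64  # limit quoted in the answer above


def partition_round_robin(rows, n_links):
    """Deal rows out to the output links one at a time, in turn."""
    if not 1 <= n_links <= MAX_OUTPUT_LINKS:
        raise ValueError("n_links must be between 1 and 64")
    links = [[] for _ in range(n_links)]
    for position, row in enumerate(rows):
        links[position % n_links].append(row)
    return links


def partition_by_modulus(rows, key, n_links):
    """Send each row to the link given by (integer key value) mod n_links."""
    if not 1 <= n_links <= MAX_OUTPUT_LINKS:
        raise ValueError("n_links must be between 1 and 64")
    links = [[] for _ in range(n_links)]
    for row in rows:
        links[row[key] % n_links].append(row)
    return links


rows = [{"id": i} for i in range(10)]
round_robin_links = partition_round_robin(rows, 3)
modulus_links = partition_by_modulus(rows, "id", 3)
```

With the modulus rule every row with the same key value lands on the same link, which matters as soon as a later stage groups by that key.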

Purpose of using keys, and the difference between surrogate keys and natural keys

We use keys to provide relationships between the entities (tables). By using primary and
foreign key relationships, we can maintain the integrity of the data.

The natural key is the one coming from the OLTP system.

The surrogate key is the artificial key which we create in the target DW. We
can use these surrogate keys instead of the natural key. In SCD Type 2 scenarios
surrogate keys play a major role.

How does DataStage handle user security?

We create users in the Administrator client and give them the necessary privileges.

How to parameterize a field in a sequential file? I am using DataStage as the ETL tool, with a
sequential file as source.

We cannot parameterize a particular field in a sequential file; instead we can parameterize
the source file name in the Sequential File stage.
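
As an illustration of parameterizing the file name, the Python sketch below resolves a path template before a run. It assumes the #Name# style of referencing job parameters; the parameter names SourceDir and FileName and the helper function are hypothetical, and this is not DataStage code.

```python
import re


def resolve_parameters(template, params):
    """Replace every #Name# token in the template with its parameter value."""
    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value supplied for job parameter {name!r}")
        return params[name]

    return re.sub(r"#([A-Za-z_][A-Za-z0-9_]*)#", substitute, template)


path = resolve_parameters(
    "#SourceDir#/#FileName#",
    {"SourceDir": "/data/in", "FileName": "customers.txt"},
)
```

The same job design can then read a different file on every run just by passing different parameter values.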

Is it possible to move data from an Oracle warehouse to an SAP warehouse using
the DataStage tool?

We can use the DataStage Extract Pack for SAP R/3 and the DataStage Load Pack for SAP BW
to transfer the data from Oracle to the SAP warehouse. These plug-in packs are available
with DataStage version 7.5.

How to implement Type 2 slowly changing dimensions in DataStage? Explain with an example.

We can handle SCDs in the following ways:

Type 1: Just use "Insert rows else update rows"
or
"Update rows else insert rows" as the update action of the target.
Type 2: Use the following steps:
a) Use one hash file to look up the target.
b) Take 3 instances of the target.
c) Give different conditions depending on the process.
d) Give different update actions in the target.
e) Use system variables like Sysdate and Null.
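
The Type 2 logic can be sketched in plain Python to show what the lookup, the conditions and the different update actions achieve together. This is only an illustration under assumptions: the column names (cust_key, cust_id, city, start_date, end_date) are invented, and a dictionary stands in for the hash file lookup.

```python
from datetime import date


def apply_scd2(dimension, incoming, load_date):
    """Expire changed rows and append new versions (Type 2 behaviour).

    dimension: list of row dicts; end_date None marks the current version.
    incoming: dict mapping natural key (cust_id) to the latest city value.
    """
    # Plays the role of the hash file lookup against the target.
    current = {row["cust_id"]: row for row in dimension if row["end_date"] is None}
    next_key = max((row["cust_key"] for row in dimension), default=0) + 1

    for cust_id, city in incoming.items():
        existing = current.get(cust_id)
        if existing is not None and existing["city"] == city:
            continue  # unchanged: nothing to do
        if existing is not None:
            existing["end_date"] = load_date  # update action: close old version
        dimension.append({  # insert action: new version with a new surrogate key
            "cust_key": next_key,
            "cust_id": cust_id,
            "city": city,
            "start_date": load_date,
            "end_date": None,
        })
        next_key += 1
    return dimension


dim = [{"cust_key": 1, "cust_id": "C1", "city": "Pune",
        "start_date": date(2010, 1, 1), "end_date": None}]
apply_scd2(dim, {"C1": "Delhi", "C2": "Agra"}, date(2010, 9, 19))
```

After the call the old C1 row is closed with an end date, and two new rows (the changed C1 and the brand new C2) carry fresh surrogate keys.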

How to handle rejected rows in DataStage?

We can handle rejected rows in two ways with the help of constraints in a Transformer:
1) Tick the Reject cell of the output link where we write our constraints in the properties of the
Transformer.
2) Use REJECTED in the expression editor of the constraint.
Create a hash file as temporary storage for rejected rows. Create a link and use it as one of the outputs of
the Transformer. Apply either of the two steps above to that link. All the rows which
are rejected by all the constraints will go to the hash file.
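
The idea of a reject link, "whatever no other constraint accepted", can be shown with a short Python sketch. The function and link names are hypothetical and the predicates are invented for the example; it only mimics the routing, not the Transformer itself.

```python
def route_rows(rows, constraints):
    """Send each row to every link whose constraint it meets, else to rejects."""
    outputs = {name: [] for name in constraints}
    rejected = []
    for row in rows:
        matched = False
        for name, predicate in constraints.items():
            if predicate(row):
                outputs[name].append(row)
                matched = True
        if not matched:
            rejected.append(row)  # no constraint accepted this row
    return outputs, rejected


rows = [{"id": 1, "amount": 50}, {"id": 2, "amount": 500}, {"id": 3, "amount": -5}]
outputs, rejected = route_rows(rows, {
    "small": lambda r: 0 <= r["amount"] < 100,
    "large": lambda r: r["amount"] >= 100,
})
```

Here the row with the negative amount fails both constraints, so it is the only one that ends up in the reject list.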

What are the difficulties faced in using DataStage?

1) What if the number of lookups is large?
2) What if clients want currency expressed as an integer combined with a character, like 2m, 3l?
3) What happens if the job aborts for some reason while loading the data?

Does Enterprise Edition only add parallel processing for better performance?
Are any stages/transformations available in the Enterprise Edition only?

• DataStage Standard Edition was previously called DataStage and DataStage Server
Edition.
• DataStage Enterprise Edition was originally called Orchestrate, then renamed to
Parallel Extender when purchased by Ascential.
• DataStage Enterprise: server jobs, sequence jobs, parallel jobs. The Enterprise Edition
offers parallel processing features for scalable high-volume solutions. Designed originally
for UNIX, it now supports Windows, Linux and UNIX System Services on mainframes.
• DataStage Enterprise MVS: server jobs, sequence jobs, parallel jobs, MVS jobs. MVS jobs
are jobs designed using an alternative set of stages that are generated into COBOL/JCL code.
They are developed on a UNIX or Windows server and transferred to the mainframe to be
compiled and run.

The first two versions share the same Designer
interface but have a different set of design stages depending on the type of job you are
working on. Parallel jobs have parallel stages but also accept some server stages via a
container. Server jobs only accept server stages; MVS jobs only accept MVS stages.
There are some stages that are common to all types (such as aggregation) but they tend
to have different fields and options within that stage.

What utility do you use to schedule jobs on a UNIX server other than
Ascential Director?


AutoSys: through AutoSys we can automate the job by invoking a shell script written
to run the DataStage jobs.
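
A script called by an external scheduler typically ends up running the dsjob command-line client. The Python sketch below only builds and prints such a command; it does not run anything. The project name, job name and parameter are hypothetical, and the options shown (-run, -jobstatus, -param) should be checked against the dsjob documentation of your own installation.

```python
import shlex


def build_dsjob_command(project, job, params=None):
    """Return the argument list for running one job and waiting for its status."""
    command = ["dsjob", "-run", "-jobstatus"]
    for name, value in (params or {}).items():
        command += ["-param", f"{name}={value}"]
    command += [project, job]
    return command


cmd = build_dsjob_command(
    "dwh_project",            # hypothetical project name
    "load_customers",         # hypothetical job name
    {"FileName": "customers.txt"},
)
print(shlex.join(cmd))
```

Building the command as a list keeps parameter values with spaces intact when the list is later handed to something like subprocess.run.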

Is it possible to call one job from another job in server jobs?

Yes, we can call one job from another. Strictly speaking "calling" is not quite the right word, because you
attach/add the other job through the job properties. You can attach zero or more jobs.

The steps are Edit --> Job Properties --> Job Control; click Add Job and select the
desired job.

How do you clean the DataStage repository?

Remove log files periodically.

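If data is partitioned in your job on key 1 and then you aggregate on key 2, what issues could arise?

Rows that share the same key 2 value can end up in different partitions, so each partition
aggregates only its own subset and the output can contain several partial results for the same
key 2 value. To get correct totals the data has to be repartitioned on key 2 before the
aggregation, and that extra repartitioning makes the job take longer to execute.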
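
A small Python sketch (not DataStage code) makes the effect of partitioning on one key and aggregating on another concrete. The column names and the toy hash are assumptions chosen so the result is easy to follow by hand.

```python
from collections import defaultdict


def partition(rows, key, n_partitions):
    """Send each row to a partition chosen from the value of `key`."""
    parts = [[] for _ in range(n_partitions)]
    for row in rows:
        # Deterministic toy hash: sum of the character codes of the key value.
        bucket = sum(ord(ch) for ch in row[key]) % n_partitions
        parts[bucket].append(row)
    return parts


def aggregate_each_partition(parts, key):
    """Sum qty per key value inside each partition independently."""
    results = []
    for part in parts:
        totals = defaultdict(int)
        for row in part:
            totals[row[key]] += row["qty"]
        results.extend(totals.items())
    return results


rows = [
    {"region": "N", "product": "A", "qty": 1},
    {"region": "S", "product": "A", "qty": 2},
    {"region": "N", "product": "B", "qty": 3},
]

# Partitioned on region (key 1) but aggregated on product (key 2).
partial = aggregate_each_partition(partition(rows, "region", 2), "product")
# Repartitioned on product before aggregating on product.
correct = aggregate_each_partition(partition(rows, "product", 2), "product")
```

In the first case product A shows up twice with partial sums, because its rows sat in different partitions; after repartitioning on the aggregation key it appears once with the full total.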

Dimension Modeling types along with their significance

Ans.
1) E-R Diagrams
2) Dimensional modeling
a) logical modeling b) Physical modeling

What is job control? How is it developed? Explain with steps.

Job control means controlling DataStage jobs through other DataStage jobs. Example: consider two jobs XXX
and YYY. Job YYY can be executed from job XXX by using the DataStage job control functions in
routines.

To execute one job from another job, the following steps need to be followed in the routine:
1. Attach the job using the DSAttachJob function.
2. Run the other job using the DSRunJob function.
3. Wait for it to finish using the DSWaitForJob function.
4. Release the job handle using the DSDetachJob function (DSStopJob is only needed to stop a running job early).

Containers: usage and types?

A container is a collection of stages used for the purpose of reusability.

There are 2 types of containers:

a) Local container: job specific.
b) Shared container: can be used in any job within a project.

There are two types of shared container:

1. Server shared container. Used in server jobs (can also be used in parallel jobs).
2. Parallel shared container. Used in parallel jobs. You can also include server shared
containers in parallel jobs as a way of incorporating server job functionality into a parallel
stage (for example, you could use one to make a server plug-in stage available to a
parallel job).

What are constraints and derivations?

A constraint specifies the condition under which data flows through an output link;
in other words, constraints decide which output link is used. Constraints are nothing but business rules or logic.


For example, to split a customers.txt file into customer address files based on
customer country, we need to use constraints. If we want the US customer addresses,
we define a constraint for the US customer file, and similarly for Canadian and Australian
customers.
Constraints are used to check for a condition and filter the data. Example: Cust_Id<>0
set as a constraint means that only those records meeting this condition will be processed
further.

A derivation is a method of deriving fields, for example if you need to get a SUM or
AVG. A derivation specifies the expression that passes a value to the target column. In the
simplest case, the input column itself is the derivation that passes its value to the target column.

What is the difference between constraints and derivations?

Constraints are applied to links whereas derivations are applied to columns.
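
The link-versus-column distinction can be sketched in Python: each output link carries one constraint (a row-level test) and a set of derivations (one expression per target column). The link names, column names and helper function are all hypothetical; this only imitates the behaviour described above.

```python
def run_transformer(rows, links):
    """Apply each link's constraint to the row, then derive its target columns."""
    outputs = {name: [] for name in links}
    for row in rows:
        for name, link in links.items():
            if link["constraint"](row):  # constraint: decided per link
                outputs[name].append(
                    {column: derive(row)  # derivation: decided per column
                     for column, derive in link["derivations"].items()}
                )
    return outputs


derivations = {
    "cust_id": lambda r: r["cust_id"],                      # plain pass-through
    "full_name": lambda r: f'{r["first"]} {r["last"]}',     # computed value
}
links = {
    "us_customers": {
        "constraint": lambda r: r["country"] == "US" and r["cust_id"] != 0,
        "derivations": derivations,
    },
    "ca_customers": {
        "constraint": lambda r: r["country"] == "CA" and r["cust_id"] != 0,
        "derivations": derivations,
    },
}
rows = [
    {"cust_id": 1, "first": "Ann", "last": "Lee", "country": "US"},
    {"cust_id": 2, "first": "Bob", "last": "Roy", "country": "CA"},
    {"cust_id": 0, "first": "Bad", "last": "Row", "country": "US"},
]
out = run_transformer(rows, links)
```

The row with cust_id 0 fails every constraint, so it reaches neither link; the other two rows arrive with only the derived target columns.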

Explain the process of taking a backup in DataStage.

Any DataStage objects, including whole projects, which are stored in the Manager repository
can be exported to a file. This exported file can then be imported back into DataStage.

Import and export can be used for many purposes, such as:

1) Backing up jobs and projects.
2) Maintaining different versions of jobs or projects.
3) Moving DataStage jobs from one project to another. Just export the objects,
move to the other project, and then re-import them into the new project.
4) Sharing jobs and projects between developers. The export files, when zipped, are small and
can easily be emailed from one developer to another.

How can you implement complex jobs in DataStage?

A complex design means one with many joins and lookups; such a job design is
called a complex job. We can implement any complex design in DataStage by
following a few simple tips that also help performance. There is no limit on the number of
stages in a job, but for better performance use at most 20 stages in each job; if a design
exceeds 20 stages, split it into another job. Use no more than 7 lookups per
Transformer; otherwise add another Transformer.

What validations do you perform after creating jobs in Designer?

Validation checks that a DataStage job is ready to run; it carries out the following without
actually processing any data:
1) Connections are made to the sources.
2) Files are opened.
3) The SQL statements necessary for fetching the data are prepared.
4) All connections from source to target are made ready for data processing.
5) Parameters are checked, including whether the input files and input tables exist and whether
usernames, data source names, passwords and the like are valid.

What are the different types of errors you faced during loading and how did you solve them?

How do you fix the error "OCI has fetched truncated data" in DataStage?

Can we use the Change Capture stage to get the truncated data? Members, please
confirm.

What is the User Variables activity? When, how and where is it used? Explain with a real
example.

By using the User Variables activity we can create variables in a job sequence; these
variables are available to all the activities in that sequence.

This activity is usually placed at the start of the job sequence.

What is the meaning of the following?

1) If an input file has an excessive number of rows and can be split up, then use standard
logic to run jobs in parallel.
Ans: row partitioning and collecting.

2) Tuning should occur on a job-by-job basis. Use the power of the DBMS.

If you have SMP machines you can use IPC, Link Collector and Link Partitioner for performance
tuning. If you have cluster or MPP machines you can use parallel jobs.

What is DS Manager used for?

The Manager is a graphical tool that enables you to view and manage the contents of the
DataStage Repository. Its export and import functions are mainly used for sharing jobs and projects
from one project to another.
How to handle date conversions in DataStage?
We use a) the "Iconv" function - internal conversion.
b) the "Oconv" function - external conversion.
E.g. the function to convert mm/dd/yyyy format to yyyy-mm-dd is
Oconv(Iconv(FieldName,"D/MDY[2,2,4]"),"D-YMD[4,2,2]")
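
The same two-step idea (parse the external text into an internal date, then format that date again) looks like this in Python. It is an analogy rather than DataStage BASIC; the function name is made up, and the target layout yyyy-mm-dd is an assumption about what the conversion is meant to produce.

```python
from datetime import datetime


def mdy_to_ymd(text):
    """Convert 'mm/dd/yyyy' text to 'yyyy-mm-dd' text."""
    parsed = datetime.strptime(text, "%m/%d/%Y")  # like Iconv: text to internal date
    return parsed.strftime("%Y-%m-%d")            # like Oconv: internal date to text


converted = mdy_to_ymd("09/19/2010")
```

Going through an internal date value, instead of shuffling substrings, also means an impossible date such as 13/45/2010 is refused instead of being silently rearranged.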
Importance of surrogate keys in data warehousing?
The concept of a surrogate key comes into play when there is a slowly changing dimension in a
table.
In such a condition there is a need for a key by which we can identify the changes made in
the dimension.
Slowly changing dimensions can be of three types, namely SCD1, SCD2 and SCD3.
Surrogate keys are system-generated keys. Mainly they are just sequences of numbers, but they can be
alphanumeric values too.
How can we implement a lookup in DataStage server jobs?
We can use a hash file as a lookup in server jobs. The hash file needs at least one key
column to be created.
How can you implement slowly changing dimensions in DataStage?

You can implement Type 1, Type 2 or Type 3. Let me try to explain Type 2 with a time
stamp.
Step 1: We create the time stamp via a shared container, which returns the system time and one
key. To satisfy the lookup condition we create a key column using the Column
Generator.
Step 2: Our source is a data set and the lookup table is an Oracle OCI stage. Using the Change
Capture stage we find the differences. The Change Capture stage returns a value
for change_code. Based on the return value we find out whether the row is an insert, an edit or
an update. If it is an insert we stamp it with the current timestamp, and the old timestamp is kept
as history.
Sep 19
Summarize the difference between OLTP, ODS and data warehouse?
OLTP - means online transaction processing. It is nothing but a transactional database; databases such as
Oracle, SQL Server and DB2 are commonly used for OLTP systems.
OLTP databases, as the name implies, handle real-time transactions, which inherently have
some special requirements.
ODS - stands for Operational Data Store. It is an integration point in the ETL process: we load
the data into the ODS before loading it into the target.
Data warehouse - a data warehouse is a subject-oriented, integrated, time-variant and non-volatile
collection of data which is used to support management decisions.


Why are OLTP database designs not generally a good idea for a data warehouse?
OLTP systems are not designed to store historical information about the organization. They are used for storing the
details of daily transactions, while a data warehouse is a huge store of historical
information obtained from different datamarts for making intelligent decisions about the
organization.
What is data cleaning? How is it done?
Put simply, it is purifying the data.
Data cleansing: the act of detecting and removing and/or correcting a database's dirty
data (i.e., data that is incorrect, out of date, redundant, incomplete, or formatted
incorrectly).
What is the level of granularity of a fact table?
Level of granularity means the level of detail that you put into the fact table in a data
warehouse. For example, based on the design you can decide to store the sales data for each
transaction. The level of granularity then means how much detail you are willing to keep for each
transactional fact: product sales recorded for every individual transaction, or aggregated
up to the minute before storing.
It also means that we can have (for example) data aggregated for a year for a given
product, and the data can be drilled down to a monthly, weekly and daily basis. The
lowest level is known as the grain; going down to the details is granularity.
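
To see grain in action, the Python sketch below rolls the same transaction-level facts up to a daily and then a monthly grain. The sample rows and the helper name are invented for illustration.

```python
from collections import defaultdict

# Transaction grain: one row per sale (timestamp, product, quantity).
sales = [
    ("2010-09-19 10:01", "widget", 2),
    ("2010-09-19 10:02", "widget", 3),
    ("2010-09-20 09:00", "widget", 1),
]


def roll_up(facts, prefix_length):
    """Aggregate quantity by (timestamp prefix, product)."""
    totals = defaultdict(int)
    for timestamp, product, qty in facts:
        totals[(timestamp[:prefix_length], product)] += qty
    return dict(totals)


daily = roll_up(sales, 10)    # 'yyyy-mm-dd' prefix gives a daily grain
monthly = roll_up(sales, 7)   # 'yyyy-mm' prefix gives a monthly grain
```

Rolling up is always possible from a finer grain, but a fact table stored at the monthly grain can never be drilled back down to days, which is why the grain has to be decided at design time.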
Which columns go to the fact table and which columns go to the dimension table?
The aggregated or calculated value columns go to the fact table and the descriptive detail goes
to the dimension table.
To add on: foreign key elements along with business measures, such as sales in $ amount
or units (quantity sold), are stored in the fact table; date may also be a business
measure in some cases. It also depends on the granularity at which the data
is stored.

What is a cube in data warehousing?

Cubes are logical representations of multidimensional data. The edges of the cube contain
dimension members and the body of the cube contains data values.
What is SCD1, SCD2, SCD3?
SCD Type 1: the attribute value is overwritten with the new value, obliterating the historical
attribute values. For example, when the product roll-up
changes for a given product, the roll-up attribute is merely updated with the current value.
SCD Type 2: a new record with the new attributes is added to the dimension table.
Historical fact table rows continue to reference the old dimension key with the old roll-up
attribute; going forward, the fact table rows will reference the new surrogate key with the
new roll-up, thereby perfectly partitioning history.
SCD Type 3: attributes are added to the dimension table to support two simultaneous
roll-ups, perhaps the current product roll-up as well as "current version minus one", or
current version and original.
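
The contrast between overwriting (Type 1) and keeping one previous value in an extra column (Type 3) fits in a few lines of Python. The column names category and previous_category are hypothetical; Type 2 is left out here because it adds a whole new row instead of changing the existing one.

```python
def scd_type1(row, new_category):
    """Type 1: overwrite in place, so the old value is lost."""
    row["category"] = new_category
    return row


def scd_type3(row, new_category):
    """Type 3: keep the prior value in an extra column, then overwrite."""
    row["previous_category"] = row["category"]
    row["category"] = new_category
    return row


type1_row = scd_type1({"product": "hammer", "category": "Tools"}, "Hardware")
type3_row = scd_type3({"product": "hammer", "category": "Tools"}, "Hardware")
```

Type 3 remembers exactly one step of history; a second change would push "Tools" out of the row for good.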
What is real time data-warehousing?
Real-time data warehousing is a combination of two things: 1) real-time activity and 2)
data warehousing. Real-time activity is activity that is happening right now. The activity
could be anything such as the sale of widgets. Once the activity is complete, there is data
about it.
Data warehousing captures business activity data. Real-time data warehousing captures
business activity data as it occurs. As soon as the business activity is complete and there
is data about it, the completed activity data flows into the data warehouse and becomes
available instantly. In other words, real-time data warehousing is a framework for deriving
information from data as the data becomes available.
What is an ER diagram?
The Entity-Relationship (ER) model was originally proposed by Peter Chen in 1976 [Chen76] as
a way to unify the network and relational database views.
What is a lookup table?
A lookup table is a reference table: it supplies values to the table that references it.
It is used at run time, and it saves joins and space in terms of transformations. For example, a
lookup table called states provides the actual state name ('Texas') in place of TX in the output.
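
In Python terms the states lookup is just a dictionary consulted while rows stream through. The sample data, the column names and the "UNKNOWN" default are assumptions made for the sketch.

```python
STATES = {"TX": "Texas", "CA": "California"}  # the lookup (reference) table


def add_state_name(rows, lookup, default="UNKNOWN"):
    """Return copies of the rows with the state code expanded to its name."""
    return [{**row, "state_name": lookup.get(row["state"], default)} for row in rows]


enriched = add_state_name(
    [{"cust": "Ann", "state": "TX"}, {"cust": "Bob", "state": "ZZ"}],
    STATES,
)
```

Deciding what happens on a miss (a default value here, a reject in other designs) is part of designing any lookup.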

What is the main difference between schemas in an RDBMS and schemas in a data warehouse?


RDBMS Schema
* Used for OLTP systems
* Traditional and old schema
* Normalized
* Difficult to understand and navigate
* cannot solve extract and complex problems
* poorly modelled
DWH Schema
* Used for OLAP systems
* New generation schema
* De Normalized
* Easy to understand and navigate
* Extract and complex problems can be easily solved
* Very good model
What is the need for a surrogate key; why is the primary key not used as the surrogate key?
A surrogate key is an artificial identifier for an entity. Surrogate key values are
generated by the system sequentially (like the Identity property in SQL Server and a Sequence
in Oracle). They do not describe anything.
A natural primary key is a natural identifier for an entity. Its values are entered
manually by the users and uniquely identify each row. There is no repetition of data.
Need for a surrogate key rather than the primary key:
If a column is made a primary key and later its data type or
length needs to change, then all the foreign keys that depend on that primary key
have to be changed too, making the database unstable.
Surrogate keys make the database more stable because they insulate the primary and
foreign key relationships from changes in data types and lengths.
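
A sequence-style surrogate key generator can be sketched in a few lines of Python. The class name and the sample natural keys are hypothetical; a real warehouse would normally rely on a database sequence or identity column instead.

```python
import itertools


class SurrogateKeyGenerator:
    """Hand out one sequential integer per distinct natural key."""

    def __init__(self, start=1):
        self._counter = itertools.count(start)
        self._assigned = {}

    def key_for(self, natural_key):
        # Reuse the existing surrogate key if this natural key was seen before.
        if natural_key not in self._assigned:
            self._assigned[natural_key] = next(self._counter)
        return self._assigned[natural_key]


generator = SurrogateKeyGenerator()
first = generator.key_for("CUST-001")
second = generator.key_for("CUST-XYZ-9")
repeat = generator.key_for("CUST-001")
```

Whatever shape or length the natural key has, the fact tables only ever see the small integer, which is what keeps the key relationships stable.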

What is Snow Flake Schema?


Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension
data has been grouped into multiple tables instead of one large table. For example, a
product dimension table in a star schema might be normalized into a products table, a
product category table, and a product manufacturer table in a snowflake schema. While
this saves space, it increases the number of dimension tables and requires more foreign
key joins. The result is more complex queries and reduced query performance
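
The extra join that a snowflake schema costs can be seen with an in-memory SQLite database from Python's standard library. The table and column names follow the products example above but are otherwise assumptions, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_category (
    category_id INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id INTEGER REFERENCES product_category(category_id)
);
CREATE TABLE sales_fact (
    product_id INTEGER REFERENCES products(product_id),
    qty INTEGER
);
INSERT INTO product_category VALUES (1, 'Tools');
INSERT INTO products VALUES (10, 'Hammer', 1);
INSERT INTO products VALUES (11, 'Wrench', 1);
INSERT INTO sales_fact VALUES (10, 2);
INSERT INTO sales_fact VALUES (11, 5);
INSERT INTO sales_fact VALUES (10, 1);
""")

# Two joins are needed to reach the category name from the fact table;
# a star schema would keep category_name on the product dimension itself.
totals_by_category = conn.execute("""
    SELECT c.category_name, SUM(f.qty)
    FROM sales_fact AS f
    JOIN products AS p ON p.product_id = f.product_id
    JOIN product_category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
""").fetchall()
conn.close()
```

The category name is stored once instead of on every product row, and the price is the second join in every query that reports by category.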

INTERVIEW QUESTIONS (GENERAL)

What are warehouse schemas like "star schema" and "snowflake", and their advantages /
disadvantages under different conditions?
How to design an optimized data warehouse from both the data upload and query
performance points of view?
What exactly is parallel processing and partitioning, and how can it be employed for
optimizing the data warehouse design?
What are the preferred indexes and constraints for a DWH?
How will the volume of data (from medium to very high) and the frequency of querying affect
the d/n considerations?
Why a data warehouse?
Difference between OLTP and OLAP?
What are the features of a DWH?
Do you know any other ETL tools?
What is the use of a staging area?
Do you know the life cycle of a WH?
Have you heard about star
Tell me about yourself.
How many dimensions and facts are there in your project?
What is a dimension?
Difference between a DWH and a data mart?
1. How can you explain a DWH to a layman?
2. What are MOLAP and ROLAP? What is the difference between them?
3. What are the different schemas used in a DWH? Which one is
most commonly used?


4. What is a snowflake schema? Explain.
5. What is a star schema? Explain.
6. How do you decide that you have to go for
warehousing? In the requirement study?
7. What are all the questions you put to your client when you are designing a DWH?

Oracle:
How many types of indexes are there?
Which indexes are used in a warehouse?
What is the difference between TRUNCATE and DELETE on a table?
How do you optimise a query? Read up on optimisation in
Oracle.

Project:
Project description and all.

What is a data warehouse?
What is the difference between OLTP and OLAP?
How can you explain a data warehouse to a layman?
Do you mean to say that if we increase the system resources like RAM, hard disk
and processor, we can as well make an OLTP system behave as a DWH?
As a DBA, what differences do you think a DWH architecture should have, or
what are the parameters that you are concerned about when taking a DWH into account?
What are indexes and what are the different types?
Why should you do indexing first of all?
What sort of indexing is done in facts and why?
What sort of indexing is done in dimensions and why?
What sort of normalization will you have on dimensions and facts?
What are materialized views?
What is a star schema?
What is a snowflake schema?
What is the difference between those two?
Which one would you choose and why?
What are dimensions?
What are facts?
What is your role in the projects that you have done?
What are Informatica PowerCenter capabilities?
What are the different types of transformers?
What is the difference between Source and Joiner transformers?
Why is a source used and why is a joiner used?
What are active and passive transformers?
How many transformers have you used?
What is CDC?
What are SCDs and what are the different types?
Which of the types have you used in your project?
What is a date dimension?
How have you handled sessions?
How can you handle multiple sessions and batches?
On what platform was your Informatica server?
How many mappings were there in your projects?
How many transformers did your biggest mapping have?
Which were your source and target platforms?
What is a cube?
How can you create a catalog and what are its types?
What is PowerPlay Transformer?
What is PowerPlay Administrator? (Same as above question.)
What is slice and dice?
What is your idea of an Authenticator?
Can cubes exist independently?
Can cubes be sources to another application?
What is the maximum number of rows a date dimension can have?


How have you done reporting?
What are hot files?
What are snapshots?
What is the difference?

I-GATE INTERVIEW QUESTIONS

1] What is the difference between snowflake and star schemas?
2] How will you come to know that you have to do performance tuning?
3] Describe your project.
4] How many dimensions and facts are there in your project?
5] Draw SCD Type 1 and SCD Type 2.
6] If from target you are getting timestamp data and you have one port in target
having data type as date then how will you load it?
7] What are the different types of lookups?
8] What condition will you give in the update strategy transformation in SCD Type 1?
9] What are the different types of variables in the update transformation?
10] What are target-based commit and source-based commit?
11] Why do you think SCD Type 2 is critical?
12] What are the types of facts?
13] What is a factless fact?
14] If I am returning one port through a connected lookup then why do you need an
unconnected lookup?
15] If some duplicate rows are coming from a flat file then how will you remove them
using Informatica?
16] If duplicate rows are coming from a relational table then how will you remove
them using Informatica?
17] If I do not give the group by option in the aggregator transformation then what will be
the result?
18] What is multidimensional analysis?
19] If I give all the characteristics of a data warehouse to an OLTP system then will it be a data
warehouse?
20] What are the characteristics of a data warehouse?
21] What is the break-up of your team?
22] How will you do performance tuning in a mapping?
23] Which is good for performance, static or dynamic cache?
24] What is target load order?
25] Which transformations have you worked on?
26] What naming convention are you using?
27] How are you getting data from the client?
28] How will you convert rows into columns and columns into rows using Informatica?
29] How will you enable test load?
30] Did you work with connected and unconnected lookups? Tell the difference.
31] Did you ever use the normalizer?


Posted by Madhava 0 comments
