
Re: How to create a document in DataStage?

Answer #1: In DataStage 8.0, in the XML Output stage: go to the Output tab ---> Document Settings ---> check the "general XML chunk" property.

Re: A flat file contains 200 records. I want to load the first 50 records on the first run of the job, the next 50 on the second run, and so on. How can you develop the job? Please give the steps.

Answer #1: Design the job like this:
1. Read the records from the input flat file and enable the row number column option in the file stage. It will generate a unique number for each record in that file.
2. Use a Filter stage and write conditions like this:
   a. rownumbercolumn <= 50 (on the 1st link, to load the records into the target file/database)
   b. rownumbercolumn > 50 (on the 2nd link, to write the remaining records back to a file with the same name as the input file, in overwrite mode)

So, the first time the job runs, the first 50 records are loaded into the target and at the same time the input file is overwritten with the remaining records, i.e. 51 to 200. The second time the job runs, the next 50 records (i.e. 51-100) are loaded into the target and the input file is overwritten with records 101 to 200. And so on: 50 records are loaded into the target on each run.
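
The same windowing idea can also be written in SQL for comparison. This is only a sketch: the table name src, the column rownumbercolumn and the :run_no parameter are assumptions, not part of the original answer.

  -- Hypothetical SQL equivalent: return the 50-row window for a given run number.
  -- :run_no = 1 gives rows 1-50, :run_no = 2 gives rows 51-100, and so on.
  SELECT *
  FROM   src
  WHERE  rownumbercolumn BETWEEN (:run_no - 1) * 50 + 1 AND :run_no * 50
  ORDER  BY rownumbercolumn;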

Re: In the Aggregator, how can I get the sum in a readable format?

Answer #1: In a server job: open the Aggregator stage properties, select the Output tab, select the column you want to sum, double-click its Derivation cell, select the source column, then select the SUM option from the Aggregate Function drop-down list and click OK.
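
For comparison, the same aggregation in SQL; the table and column names below (employees, dept_id, salary) are placeholders, not taken from the original answer.

  -- Sum one column per group, the SQL counterpart of the Aggregator's SUM function.
  SELECT dept_id, SUM(salary) AS total_salary
  FROM   employees
  GROUP  BY dept_id;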

Re: Given an input column i/p and two outputs o/p1 and o/p2:

  i/p : 1 1 1 2 2 2 3 4 5 6
  o/p1: 1 1 1 2 2 2 3
  o/p2: 4 5 6

How do you populate the i/p rows into o/p1 and o/p2 using DataStage stages? And how would you do the same scenario using SQL?

Answer #1 (Vijaya): First sort the input data, then use a Transformer stage. With stage variables you can apply logic like this: compare each record with the previous one; if the two values are equal, send the row to o/p1, otherwise send it to o/p2. I hope that makes the logic clear.

Re: (same scenario as above) How to populate the i/p rows into o/p1 and o/p2 using DataStage stages, and the same scenario using SQL?

Answer #2: First sort the data, then take two stage variables and use a constraint such as: if (sv1 = sv2) then send the row down ds.link1, else down ds.link2.
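
Since the question also asks for SQL, here is one possible sketch. It assumes a single-column table t(col) and that o/p1 should receive the values that repeat while o/p2 receives the values that occur only once; adjust the rule if the intended split is different.

  -- o/p1: rows whose value occurs more than once
  SELECT col
  FROM  (SELECT col, COUNT(*) OVER (PARTITION BY col) AS cnt FROM t) x
  WHERE  cnt > 1;

  -- o/p2: rows whose value occurs exactly once
  SELECT col
  FROM  (SELECT col, COUNT(*) OVER (PARTITION BY col) AS cnt FROM t) x
  WHERE  cnt = 1;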

Re: What are the differences between the hash and modulus partitioning methods?

Answer #3 (Venkata): Modulus is nothing but the modulus operation from maths, so it can be performed only on numeric data fields. Hash can be used for any kind of data field; it assigns rows with the same key values to the same partition.

Re: What are the differences between the hash and modulus partitioning methods?

Answer #4 (Vijay):

  Modulus:                          Hash:
  1. For numerics only              1. For numerics and characters
  2. Datatype specific              2. Not datatype specific
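
To illustrate what modulus partitioning does with a numeric key, here is a rough SQL sketch; the table emp, the column emp_id and the choice of 4 partitions are assumptions for illustration only.

  -- MOD(key, number_of_partitions) yields the partition number (0-3 here),
  -- which is essentially how modulus partitioning assigns rows.
  SELECT emp_id, MOD(emp_id, 4) AS partition_no
  FROM   emp;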

Re: How can you handle null values in the Transformer stage?

Answer #2: Three null-handling functions are available in the Transformer:
1. NullToValue(input column, value)
2. NullToEmpty(input column)
3. IsNull(input column), used inside an If Then Else derivation.
Re: How do you cleanse data?

Answer #1 (Navin): Data cleansing means converting data held in inconsistent formats into one consistent format. This can be performed in the Transformer stage.

Re: How do you cleanse data?

Answer #2 (Satyanarayana): Cleansing removes unwanted data (bad records or NULL values), finds inconsistent data and makes it consistent.

Example:

  Loc          Loc (after cleansing)
  ---          ---
  Hyd          Hyderabad
  Hyderabad    Hyderabad
  hyde         Hyderabad

Re: How do you cleanse data?

Answer #3 (B.rambabu): Data cleansing is the process of identifying data inaccuracies and inconsistencies and correcting them.

Example of data inaccuracy:
  hyd, Hydrabad        --> after cleansing: hydrabad, hydrabad

Example of data inconsistency:
  10.78, 10.23465      --> after cleansing: 10.78, 10.23

Re: How do you cleanse data?

Answer #4: It is the process of correcting inconsistent data and bringing it into a consistent format.
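
A very small cleansing rule of the kind described above, written as SQL; the table locations, the column loc and the mapping to 'Hyderabad' are only illustrative.

  -- Standardise the different spellings of one city name into a single value.
  UPDATE locations
  SET    loc = 'Hyderabad'
  WHERE  UPPER(loc) LIKE 'HYD%';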

Re: How to transfer a file from one system to another system in UNIX? Which command should be used?

Answer #1 (Kalyani): Using FTP (File Transfer Protocol), we can transfer the file.

Re: How to transfer a file from one system to another system in UNIX? Which command should be used?

Answer #2: We can also transfer the file with the scp command, e.g. scp file.txt user@remotehost:/target/path/.

Re: If we take 2 tables (like EMP and DEPT) and use a Join stage, how do we improve the performance?

Answer #1 (Kiran): Whenever you join two tables on key columns: if the key column is numeric, set modulus partitioning; if the key column is non-numeric, set hash partitioning. Compared with a Lookup, a Join gives better performance because the Join has a sort operation by default.

Re: If we take 2 tables (like EMP and DEPT) and use a Join stage, how do we improve the performance?

Answer #2: The answer above has one mistake: the Join stage does not sort by default; we have to specify the sort explicitly.

Re: How to transfer a file from one system to another system in UNIX? Which command should be used?

Answer #1: By using the FTP command we can move files from one system to another system.

Re: What is a factless fact table?

Answer #1: A fact table without any facts (measures) is known as a factless fact table. A fact should always be a numeric value, but not every numeric value is a fact.
Re: What is force compile?

Answer #1 (Vijaya): When you compile a job for the first time it is known as a force compile; when you make some changes to the job and compile it again, that compilation is known as an ordinary compile.

Re: What is force compile?

Answer #2: For parallel jobs there is also a force compile option. The compilation of parallel jobs is by default optimized such that Transformer stages only get recompiled if they have changed since the last compilation. The force compile option overrides this and causes all Transformer stages in the job to be compiled. To select this option: choose File > Force Compile.

Re: What is RCP?

Answer #2: RCP stands for Runtime Column Propagation. While a job runs, the columns may change from one stage to the next, and we may be carrying unnecessary columns through stages that do not need to process them. By enabling RCP we can propagate columns at run time and load only the required columns into the target database.

Re: What is a time dimension, and how do you populate a time dimension?

Answer #1: Every data warehouse has a time dimension. You can load the time dimension through a PL/SQL script.
Re: If a column contains data like ram, rakesh, madhan, suraj, pradeep, bhaskar and I want to place the names, separated by commas, in another column, how can we do that?

Answer #1: By using a stage variable in the Transformer:

  stg_name = ''                                  (initial value)
  stg_name = input.name : " , " : stg_name       (stage variable derivation)

Transformer output derivation:

  stg_name --> Target_name
Re: Hi, this is Kiran. I have one table and I want to divide it into two different tables, like even rows and odd rows. How can I do this? Please tell me.

Answer #1: Oracle stage ---> Surrogate Key Generator ---> Transformer.
In the Transformer use the mod function:

  mod(empkey, 2) = 0   ---> even rows
  mod(empkey, 2) <> 0  ---> odd rows

where empkey is generated by the Surrogate Key Generator stage.
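
For the SQL side of the same scenario, one possible sketch (Oracle-style syntax; the table emp and column empno are assumptions) uses a row number in place of the generated surrogate key:

  -- Even rows; use MOD(rn, 2) = 1 for the odd rows.
  SELECT *
  FROM  (SELECT e.*, ROW_NUMBER() OVER (ORDER BY empno) AS rn FROM emp e) x
  WHERE  MOD(rn, 2) = 0;
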
Re: col1 contains 123, abc, 234, def, jkl, 768, opq, 567, 789, but I want two targets: target1 should contain only the numeric values and target2 only the alphabetic values, i.e. trg1 = 123, 234, 768, 567, 789 and trg2 = abc, def, jkl, opq.

Answer #1 (Mike): In the Transformer stage, use the Alpha function, which checks whether a string is alphabetic or not; if its return value is 1, the value is alphabetic.

Re: (same scenario as above) Splitting col1 into a numeric target and an alphabetic target.

Answer #2: This is not working for me; please explain it more clearly.
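
One possible SQL version of the split, assuming an Oracle-style REGEXP_LIKE function and a table src(col1); both names are placeholders, not from the original answers.

  -- trg1: purely numeric values
  SELECT col1 FROM src WHERE REGEXP_LIKE(col1, '^[0-9]+$');

  -- trg2: purely alphabetic values
  SELECT col1 FROM src WHERE REGEXP_LIKE(col1, '^[A-Za-z]+$');
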
Re: What is the use of a surrogate key in DataStage?

Answer #1 (Navin): A data warehouse maintains historical as well as current data. If we use the natural key as the primary key in the DW, we cannot maintain history, so we need an artificial key in the DW that is generated by the ETL developer. The key we develop in the DW for this purpose is called a surrogate key.

Re: What is the use of a surrogate key in DataStage?

Answer #2: We don't use production keys in our DW, so we generate our own keys to implement the DW, using the surrogate key.

Re: Where do you use UNIX commands as an ETL developer?

Answer #2: You can call UNIX commands in the following places in ETL (DataStage):
1. Sequential File stage (the stage's Filter command option).
2. Before- and after-job subroutines in the job properties.
3. Before- and after-stage subroutines in the Transformer stage.
4. Job sequences, using the Execute Command activity and the Routine activity.
5. DataStage routines.
6. The built-in routines ExecSH and ExecSHSilent.

Re: Hi, how would I run job1, then job3, then job2 in a sequence of job1, job2, job3? Thanks, Sunitha.

Answer #4: Hi. You can run job1 and then job3 in a sequence in two ways: by using a conditional trigger, or by using a Nested Condition activity.

What are conformed dimensions?


Ans: A conformed dimension is a single, coherent view of the same piece of data throughout the
organization. The same dimension is used in all subsequent star schemas defined. This enables reporting
across the complete data warehouse in a simple format.

Why is the fact table in normal form?

Ans: Basically the fact table consists of the index keys of the dimension/look-up tables and the measures. So whenever a table holds only such keys (and measures), that itself implies that the table is in normal form.

What is a linked cube?


Ans: A cube can be stored on a single analysis server and then defined as a linked cube on other
Analysis servers. End users connected to any of these analysis servers can then access the cube. This
arrangement avoids the more costly alternative of storing and maintaining copies of a cube on multiple
analysis servers. linked cubes can be connected using TCP/IP or HTTP. To end users a linked cube
looks like a regular cube.
What is a degenerate dimension table?
Ans: A dimension whose values are stored directly in the fact table is called a degenerate dimension. These dimensions do not have dimension tables of their own.

What is ODS?
Ans: ODS stands for Operational Data Store: it holds current, detailed, transaction-level data with little or no history.

What is a general purpose scheduling tool?

Ans: The basic purpose of a scheduling tool in a DW application is to streamline the flow of data from source to target at a specific time or based on some condition.

What is the need for a surrogate key? Why is the primary key not used as the surrogate key?
Ans: A surrogate key is an artificial identifier for an entity. Surrogate key values are generated by the system sequentially (like the Identity property in SQL Server or a Sequence in Oracle); they do not describe anything. A primary key is a natural identifier for an entity: its values are entered by the user and uniquely identify each row, with no repetition of data.
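
A minimal sketch of a sequence-generated surrogate key in Oracle-style SQL; the object names are illustrative only.

  -- The sequence supplies the surrogate key; the natural key is kept as an ordinary column.
  CREATE SEQUENCE customer_sk_seq START WITH 1 INCREMENT BY 1;

  INSERT INTO dim_customer (customer_sk, customer_natural_id, customer_name)
  VALUES (customer_sk_seq.NEXTVAL, 'C-1001', 'John Doe');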

What is Hash file stage and what is it used for?


Ans: Used for Look-ups. It is like a reference table. It is also used in-place of ODBC, OCI tables for better
performance.

What is the utility you use to schedule the jobs on a UNIX server other than using Ascential Director?
Ans: Use the crontab utility along with the dsexecute() function, with the proper parameters passed.

What is OCI, and how is it used with the ETL tool?

Ans: OCI is the Oracle Call Interface. The DataStage Oracle OCI stage uses it for native access to Oracle, which is faster than going through ODBC; for bulk volumes of data, the Orabulk stage can be used to load the data in bulk.

What are the Oconv() and Iconv() functions and where are they used?
Ans: Iconv() converts a string to an internal storage format; Oconv() converts an expression from internal format to an output format. They are used in derivations and routines in server jobs, typically for date and number conversions.

What is Fact table?


Ans: Fact Table contains the measurements or metrics or facts of a business process. If your business process is "Sales", then a measurement of this business process such as "monthly sales number" is captured in the fact table. The fact table also contains the foreign keys to the dimension tables.

What are the steps in building the data model?

Ans: While the ER model lists and defines the constructs required to build a data model, there is no standard process for doing so. Some methodologies, such as IDEF1X, specify a bottom-up approach.

What is Dimensional Modelling?

Ans: Dimensional modelling is a design concept used by many data warehouse designers to build their data warehouse. In this design model all the data is stored in two types of tables - fact tables and dimension tables. The fact table contains the facts/measurements of the business, and the dimension table contains the context of the measurements, i.e. the dimensions on which the facts are calculated.

What type of Indexing mechanism do we need to use for a typical datawarehouse?


Ans: On the fact table it is best to use bitmap indexes. Dimension tables can use bitmap and/or the other
types of clustered/non-clustered, unique/non-unique indexes.

What is Normalization, First Normal Form, Second Normal Form, Third Normal Form?
Ans: Normalization can be defined as splitting a table into two or more tables so as to avoid duplication of values. First Normal Form requires atomic column values and no repeating groups; Second Normal Form requires 1NF plus that every non-key column depends on the whole primary key (no partial dependencies); Third Normal Form requires 2NF plus that no non-key column depends on another non-key column (no transitive dependencies).

Is it correct/feasible to develop a Data Mart using an ODS?

Ans: Yes, it is correct to develop a Data Mart using an ODS, because an ODS stores transactional data for only a few days (little historical data), which is what a data mart requires; so it is correct to develop a data mart from an ODS.

What are other Performance tunings you have done in your last project to increase the
performance of slowly running jobs?
Ans: 1. Staged the data coming from ODBC/OCI/DB2UDB stages or any database on the server using Hash/Sequential files, for optimum performance and also for data recovery in case the job aborts.
2. Tuned the OCI stage's 'Array Size' and 'Rows per Transaction' numerical values for faster inserts, updates and selects.
3. Tuned the 'Project Tunables' in the Administrator for better performance.
4. Used sorted data for the Aggregator.
5. Sorted the data as much as possible in the database and reduced the use of DS-Sort for better job performance.
6. Removed the data not used from the source as early as possible in the job.
7. Worked with the DBA to create appropriate indexes on tables for better performance of DS queries.
8. Converted some of the complex joins/business logic in DS to stored procedures for faster execution of the jobs.
9. If an input file has an excessive number of rows and can be split up, then use standard logic to run jobs in parallel.
10. Before writing a routine or a transform, make sure that the required functionality is not already in one of the standard routines supplied in the SDK or DS utilities categories. Constraints are generally CPU intensive and take a significant amount of time to process; this may be the case if the constraint calls routines or external macros, but if it is inline code then the overhead will be minimal.
11. Try to have the constraints in the 'Selection' criteria of the jobs themselves. This will eliminate the unnecessary records before the joins are made.
12. Tuning should occur on a job-by-job basis.
13. Use the power of the DBMS.
14. Try not to use a Sort stage when you can use an ORDER BY clause in the database.
15. Using a constraint to filter a record set is much slower than performing a SELECT … WHERE….
16. Make every attempt to use the bulk loader for your particular database. Bulk loaders are generally faster than using ODBC or OLE.
