
All Questions

DWH & DataStage

1. What and why DW? When did it come into implementation?
2. Concepts of DW?
3. Architecture of DW.
4. Difference between OLTP & DW.
5. What is ODS?
6. What are the different types of schemas that you come across in DW?
7. Different types of dimensions.
8. Brief about Slowly Changing Dimensions and their implementation in DW.
9. In DW, what kind of data can be stored?
10. What is data modeling?
11. Hardware configuration required for a DS setup?
12. What is ETL, and define its processing. What are the different ETL tools in the market?
13. Architecture of DataStage. Components of DataStage: Administrator, Designer, Director, Manager.
14. What is the basic requirement for installing DataStage Server on an NT machine?
15. What are the different services that have to be running on the server when running a job from a client machine? [RPC / Telnet / Engine Resource services]
    There are 3 DataStage services: DataStage Engine Resource Service, DataStage Telnet Service, and Uni RPC Service.

16. What is a Project, and where do you create it? — By using the Administrator.
17. What are the different roles in DS? How do you create users in DS?
    DataStage users: the Administrator can create users on a per-project basis. You cannot assign individual users to these categories; you have to assign the operating system user group to which the user belongs.
    - DataStage Developer: has full access to all areas of a DataStage project.
    - DataStage Operator: can run and manage DataStage jobs.
    - None: does not have permission to log on to DataStage.

18. Where do you specify the inactivity timeout?

By default, the connection between the DataStage client and server times out after 3600 seconds of inactivity. To change the default:
1. In the DataStage Administration window, click the General tab to bring the General page to the front.
2. In the Inactivity Timeout area, use the up and down buttons to change the timeout period, or enter the new timeout period in the seconds field.
3. To disable inactivity timeout, select the Do not timeout check box.
4. Click Apply to apply the new settings.
The changes take effect when you restart the server engine. If inactivity timeout is already disabled when the DataStage Administrator starts, the timeout reverts to the 3600-second default when you re-enable it.

You use the pages in the Project Properties window to do the following:
- General: enable job administration from the DataStage Director and define a project-wide setting for auto-purge of the job log. If MetaStage is installed, you can also configure DataStage to send it metadata generated by project jobs.
- Permissions: assign user categories to operating system user groups, or enable operators to view all the details of an event in a job log file. The Permissions tab is enabled only if you have logged on to DataStage using a name that gives you administrator status.
- Tracing: enable or disable tracing on the server.
- Schedule: set up a user name and password to use for running scheduled DataStage jobs. The Schedule tab is enabled only if you have logged on to a Windows NT server.
- Mainframe: set mainframe job properties and the default platform type. This page is enabled only if DataStage/390 is installed.
- Tunables: configure cache settings for Hashed File stages.

Link types (see question 20): Input Link, Output Link, Reference Link, Reject Link.

19. What is tracing in DS? What is validation?
20. What are the different types of links, and how many input/output links can a Transformer have?
21. What are the different views that you can see in Director? — View Status, View Schedule, View Log.
22. Where do you schedule a job, and how? — Add To Job Scheduler in the Director.

23. What are the different types of transformers?
24. What is the execution order in Transformers?
25. Errors that you have come across in DataStage.
26. What are the different steps that you come across during creation/execution of a job?
27. What are Active (manipulating) and Passive (insert) stages in DataStage?
28. What are constraints and derivations?
29. How do you do a backup of DataStage jobs? — By using DataStage Manager.
30. Different languages used in DS.
31. Conversion of date formats using DataStage, or: how do you convert a date into DataStage native format?
32. How do you avoid blank rows in a Sequential File?
33. How will you execute a UNIX script in DS?
34. How do you trigger jobs in a sequencer with an OK or Warning condition?
35. How can you get the start time and end time of a job into one table?
36. What are Routines and Transforms?
37. How do you do job scheduling in DS, and what are the different activities you can do in job scheduling? — Add To Job Scheduler in the Director.
38. What are stage/system variables? Give an example of where they can be used.
39. Hash files / ODBC / OCI: what are the major differences between them, and when is each used?
40. What are the different possible ways of extracting data from a DB, and what is the fastest way of retrieving data from a DB?
41. What are the different algorithms used for hashing?
42. How does a hash file work in DS as a lookup?
43. What are the different kinds of hash files, and in what scenarios are they used?
44. How do you call a stored procedure from DataStage, and how do you pass the parameters dynamically?
45. How do you parameterize a job in DS?
46. What are the different permissions that you can provide for a user?
47. What are the different update criteria that you can use in the ODBC stage?
48. What is the Operator/Operand menu in DS?
49. What are the ways in which you think you can increase performance in DS?
50. I want to check who has locked a job; through which option do you rectify that?
    Locks: when you enable the job administration commands in the Director, two commands are enabled in the Director Job menu:
    - Cleanup Resources: lets the user view processes and locks, end job processes, and release locks.
    - Clear Status File: lets the user remove the status records associated with all stages of the selected job, and helps you return a job to a state in which you can rerun it.
51. What are shared and local containers? When and where are they used?
    Containers: a container is a group of stages and links. Containers enable you to simplify and modularize your server job designs by replacing complex areas of the diagram with a single container stage.
    - Local containers: these are created within a job and are only accessible by that job. You can create a local container from scratch or place a set of existing stages and links within a container. You can deconstruct a shared container stage in a job design by first converting it to a local container and then deconstructing it.
    - Shared containers: these are created separately and are stored in the Repository in the same way that jobs are. Instances of a shared container can be inserted into any server job in the project. When you add an instance of a shared container to a job, you need to specify metadata details for the links into and out of the container. If you change the contents of a shared container, you need to recompile the jobs that use the container for the changes to take effect.
52. How can you remove duplicates?
53. How can you call stored procedures?
54. What is a staging area? When do you use files, and when do you use tables, for the staging area?
55. Stage variables.
56. Environment variables.
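Outside DataStage, the de-duplication idea behind question 52 can be sketched with standard UNIX tools. This is only an illustration (the sample file and paths are made up), not the DataStage method itself — in DS you would typically land the data through a hash file keyed on the duplicate columns, or use a sort/aggregation stage:

```shell
#!/bin/sh
# Hypothetical sample data: one record per line, key in the first comma-separated field.
cat > /tmp/emp.csv <<'EOF'
101,Asha
102,Ravi
101,Asha
103,Mei
102,Ravi
EOF

# Drop exact duplicate records: sort -u keeps one copy of each identical line.
sort -u /tmp/emp.csv > /tmp/emp_dedup.csv

# Keep only the first record seen for each key (awk remembers keys in an array) --
# closer to how a hash-file lookup de-duplicates on a key column.
awk -F, '!seen[$1]++' /tmp/emp.csv > /tmp/emp_firstperkey.csv

wc -l < /tmp/emp_dedup.csv        # 3 unique records remain
wc -l < /tmp/emp_firstperkey.csv  # 3 records, first occurrence per key
```

Note the difference: `sort -u` compares whole lines, while the `awk` form de-duplicates on the key field only, which matters when duplicate keys carry different attribute values.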

In 5.x, system variables are a set of read-only variables that store system data such as the current date, time, pathname, and so on. They can be accessed from a routine or a transform. Examples: @DATE, @DAY, @FALSE, @LOGNAME, @TIME, @USERNO, @WHO, REJECTED, etc. In 7.x these are environment variables.
57. Hash file / Sequential File.
58. How can you save the details that appear in the Monitor?
    Tracing: helps Ascential analysts during troubleshooting, and appears only for compiled jobs. The options determine the amount of diagnostic information generated (in the &PH& subdirectory of your DataStage server installation directory) the next time a job is run. Trace levels:
    - Report row data: record an entry for every data row read on input or written on output.
    - Property values: record an entry for every input and output opened and closed.
    - Subroutine calls: record an entry for every BASIC subroutine called.
59. Five jobs are running in parallel using a single container that has a sequence generator. If one job aborts and you reset it, what is the outcome for the remaining four jobs?

Parallel Extender

60. What is an APT file, and where do you define it?
61. Where do you define the parameters of DS, and what are the different variables that you have to specify to run the job successfully (config file)?
62. I want to pad a string character globally; what will you do?
63. How will you commit n rows globally through DS?
64. How will you check the connectivity of DS?
65. What is the difference between the Copy and Transformer stages?
    Copy copies a single input data set to a number of output data sets. Each record of the input data set is copied to every output data set. Records can be copied without modification, or you can drop columns or change the order of columns.
66. What is the difference between a File Set and a Data Set? Where will you use them?
67. I have 4 nodes specified, but I want to use only one node for a particular job. Where do you specify this in the job?
68. What are the different stages for debugging in PX?
69.
What is Usage Analysis?
    The Usage Analysis tool allows you to check where items in the DataStage Repository are used. Usage Analysis gives you a list of the items that use a particular source item.
70. What is the evaluation sequence in a Transformer?
71. What are the different types of lookup available in DataStage?
72. How do you improve the import and export performance of Sequential File stages? ($APT_IMPORT_BUFFER_SIZE and $APT_EXPORT_BUFFER_SIZE)
73. Explain how you would handle database rejects for a given link in DataStage (i.e. rows getting rejected due to primary key constraints). Give the DataStage BASIC expression you would use in a constraint to handle this for a given link.
74. How do you build a controlling job? (DS job routines)
75. How do you conditionally abort a job?
76. Modify stage: what is the use of this stage?
    The Modify operator takes a single data set as input and alters the record schema of the input data set to create the output data set. It supports: 1. keeping and dropping fields; 2. renaming fields; 3. changing a field's data type; 4. changing the null attribute of a field.
77. Lookup vs. Join stage.
78. What is the difference between Merge/Join and Lookup?
    - Description: Join = RDBMS-style relational tables; Lookup = source and lookup tables in RAM; Merge = master table and one or more update tables.
    - Memory usage: Join = light; Lookup = heavy; Merge = light.
    - Number and names of inputs: Join = 2 or more inputs; Lookup = 1 source and N lookup tables; Merge = 1 master table and N update tables.
    - Handling of duplicates in the primary input: Join = OK, produces a cross-product; Lookup = OK; Merge = warning given, and the duplicate becomes an unmatched primary.
    - Handling of duplicates in the secondary input: Join = OK, produces a cross-product; Lookup = warning given, the second lookup table entry is ignored; Merge = OK only when N = 1.
    - Options on unmatched primary: Join = none; Lookup = fail, continue, drop, or reject (fail is the default); Merge = keep or drop (keep is the default).
    - Options on unmatched secondary: Join = none; Lookup = none; Merge = capture in reject sets.
    - On match, secondary entries are: Join = reusable; Lookup = reusable; Merge = reusable.
    - Number of outputs: Join = 1; Lookup = 1 output and optionally 1 reject; Merge = 1 output and 1 reject for each update table.
    - Captured in reject sets: Join = does not apply; Lookup = unmatched primary entries; Merge = unmatched secondary entries.
79. When do you use a Funnel stage?
    Funnel operators copy multiple data sets to a single output data set. Funnel combines the records of the input data in no guaranteed order; Sortfunnel combines the input records in the order defined by the value(s) of one or more key fields. Funnel merges the files without any key checks; it just brings all the input files into a single output file.
80. What is an IPC stage?
    It is a passive stage which provides a communication channel between DataStage processes running simultaneously in the same job. It allows you to design jobs that run on SMP systems with great performance benefits. You can also specify this behavior implicitly by turning inter-process row buffering on.
81. Change Capture stage / Change Apply: give an example of when you would use them.
    Change Capture inputs two data sets, denoted before and after, and outputs a single data set whose records represent the changes made to the before data set to obtain the after data set. The operator produces a change data set, whose schema is transferred from the schema of the after data set with the addition of one field: a change code whose values encode the four actions insert, delete, copy, and edit. You can use the companion operator changeapply to combine the changes from the changecapture operator with the original before data set to reproduce the after data set.
82. How do you get the top 10 employees of a company?
83. How do you convert rows to columns?
84. How will you implement an inner join in DataStage? — Using constraints (table2.key field is not null).
85. How do you apply a Slowly Changing Dimension in DataStage?
86. Can I use an ODBC / OCI / hash file as a lookup? If not, why not? If yes, which is most beneficial, and what is the difference between them?
    You can use all three as lookups; the hash file is the most beneficial one.
87. How many output links can you have from a reference link? — N number of links.
88.
When I use the Compare stage I get the following error: Compare_2: When checking operator: User inserted sort "Compare_2.DSLink3_Sort" does not fulfill the sort requirements of the downstream operator "Compare_2". What is the reason?
    This is because the sort order in each of the files is different; in the Compare stage the keys should be specified in the Properties tab.
89. Merge can be used as a lookup. Which one is faster, the Lookup stage or the Merge stage?

UNIX

90. Shell scripting for:
    a. Creating a directory/file
    b. Deleting a file
    c. Moving a file
    d. The copy command
    e. The view command
    f. Finding files
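The file operations listed in question 90 can be sketched in portable Bourne shell. The directory and file names below are made up for illustration:

```shell
#!/bin/sh
# a. Create a directory and a file
mkdir -p /tmp/ds_demo
touch /tmp/ds_demo/notes.txt

# d. Copy a file, then c. move (rename) the copy
cp /tmp/ds_demo/notes.txt /tmp/ds_demo/notes.bak
mv /tmp/ds_demo/notes.bak /tmp/ds_demo/notes.old

# e. View a file's contents (cat for the whole file; head/tail for parts)
echo "hello" > /tmp/ds_demo/notes.txt
cat /tmp/ds_demo/notes.txt

# f. Find files by name under a directory
find /tmp/ds_demo -name '*.old'

# b. Delete a file
rm /tmp/ds_demo/notes.old
```

Each command here is POSIX; `mkdir -p` is used so the script can be re-run without failing if the directory already exists.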

91. Executing SQL files through UNIX.
92. Changing the access mode of UNIX files.
93. SED and AWK commands for searching and printing.
94. Usage of the grep command.
95.
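A short sketch of the chmod/grep/sed/awk usage behind questions 92–94. The log file and its contents are hypothetical, chosen only to make each command's effect visible:

```shell
#!/bin/sh
# Hypothetical sample log file.
cat > /tmp/jobs.log <<'EOF'
job1 OK
job2 WARN
job3 OK
EOF

# chmod: change the access mode (owner read/write/execute, group and others read-only).
chmod 744 /tmp/jobs.log

# grep: search for lines matching a pattern.
grep OK /tmp/jobs.log

# sed: stream-edit every line, here replacing a trailing WARN with WARNING.
sed 's/WARN$/WARNING/' /tmp/jobs.log > /tmp/jobs_fixed.log

# awk: split each line into fields and print a selected field,
# here the job name (field 1) of every line whose status (field 2) is OK.
awk '$2 == "OK" { print $1 }' /tmp/jobs.log
```

grep answers "which lines match", sed rewrites matching text in place in the stream, and awk adds field-level logic — the usual division of labour among the three.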
