
DWH Concepts and Informatica Q&A

Q. What are the important fields in a recommended Time dimension table?
Time_key, Day_of_week, Day_number_in_month, Day_number_overall, Month, Month_number_overall, Quarter, Fiscal_period, Season, Holiday_flag, Weekday_flag, Last_day_in_month_flag.

Q. Why have a surrogate time key rather than a real date?
The timestamp in a fact table should be a surrogate key instead of a real date because:
- the rare timestamp that is inapplicable, corrupted, or hasn't happened yet needs a value that cannot be a real date
- most end-user calendar navigation constructs, such as fiscal periods, end-of-periods, holidays, day numbers and week numbers, aren't supported by database timestamps
- integer time keys take up much less disk space than full dates

Q. Why have more than one fact table instead of a single fact table?
We cannot combine all of the business processes into a single fact table because:
- the separate fact tables in the value chain do not share all the dimensions; you simply can't put the customer ship-to dimension on the finished goods inventory data
- each fact table possesses different facts, and the fact table records are recorded at different times along the value chain

Q. What types of data sources does Data Warehouse Center support?
The Data Warehouse Center supports a wide variety of relational and non-relational data sources. You can populate your Data Warehouse Center warehouse with data from the following databases and files:
- Any DB2 family database
- Oracle
- Sybase
- Informix
- Microsoft SQL Server
- IBM DataJoiner
- Multiple Virtual Storage (OS/390), Virtual Machine (VM), and local area network (LAN) files
- IMS and Virtual Storage Access Method (VSAM) (with DataJoiner Classic Connect)

Q. What are the different types of OLAP? What are their differences?
Desktop OLAP (Cognos), ROLAP, MOLAP (Oracle Discoverer). ROLAP, MOLAP and HOLAP are specialized OLAP (Online Analytical Processing) applications.
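Returning to the surrogate time key question above, the idea can be sketched in Python: a tiny time dimension keyed by compact integers, with a reserved key for dates that are inapplicable or not yet known. All names and values here are illustrative, not Informatica or Data Warehouse Center APIs.

```python
from datetime import date, timedelta

# Sketch: build a small time dimension keyed by an integer surrogate key.
# Key 0 is reserved for "inapplicable/unknown" dates that a real DATE
# column could not represent. Column choices are illustrative.
UNKNOWN_TIME_KEY = 0
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def build_time_dimension(start, days):
    rows = {UNKNOWN_TIME_KEY: None}  # reserved row for bad/missing dates
    key = 1
    d = start
    for _ in range(days):
        rows[key] = {
            "full_date": d,
            "day_of_week": DAY_NAMES[d.weekday()],
            "month": d.month,
            "quarter": (d.month - 1) // 3 + 1,
            "weekday_flag": d.weekday() < 5,
        }
        key += 1
        d += timedelta(days=1)
    return rows

dim = build_time_dimension(date(2024, 1, 1), 7)
```

The integer keys are what the fact table would store, keeping fact rows narrow while the calendar attributes (quarter, holiday flags, and so on) live once in the dimension.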

ROLAP stands for Relational OLAP. Users see their data organized in cubes with dimensions, but the data is really stored in a relational database (RDBMS) like Oracle. Because the RDBMS stores data at a fine-grained level, response times are usually slow.
MOLAP stands for Multidimensional OLAP. Users see their data organized in cubes with dimensions, but the data is stored in a multidimensional database (MDBMS) like Oracle Express Server. In a MOLAP system many queries have a finite answer, and performance is usually critical and fast.
HOLAP stands for Hybrid OLAP; it is a combination of both worlds. Seagate Software's Holos is an example of a HOLAP environment. In a HOLAP system one will find queries on aggregated data as well as on detailed data.

Q. What approach should be followed for creation of a Data Warehouse?
Top Down approach (data warehouse first), Bottom Up approach (data marts first), or an Enterprise Data Model (which combines both).

Q. Database administrators (DBAs) have always said that having non-normalized or denormalized data is bad. Why is denormalized data now okay when it's used for Decision Support?
Normalization of a relational database for transaction processing avoids processing anomalies and results in the most efficient use of database storage. A data warehouse for Decision Support is not intended to achieve these same goals. For data-driven Decision Support, the main concern is to provide information to the user as fast as possible. Because of this, storing data in a denormalized fashion, including storing redundant data and pre-summarizing data, provides the best retrieval results. Also, data warehouse data is usually static, so anomalies will not occur from operations like adding, deleting or updating a record or field.

Q. How often should data be loaded into a data warehouse from transaction processing and other source systems?
It all depends on the needs of the users, how fast the data changes and the volume of information that is to be loaded into the data warehouse.
It is common to schedule daily, weekly or monthly dumps from operational data stores during periods of low activity (for example, at night or on weekends). The longer the gap between loads, the longer the processing time for the load when it does run. A technical IS/IT staffer should make some calculations and consult with potential users to develop a schedule to load new data.

Q. What are the benefits of data warehousing?
Some of the potential benefits of putting data into a data warehouse include:
1. Improving turnaround time for data access and reporting;
2. Standardizing data across the organization so there will be one view of the "truth";
3. Merging data from various source systems to create a more comprehensive information source;
4. Lowering costs to create and distribute information and reports;
5. Sharing data and allowing others to access and analyze the data;
6. Encouraging and improving fact-based decision making.

Q. What are the limitations of data warehousing?
The major limitations associated with data warehousing are related to user expectations, lack of data and poor data quality. Building a data warehouse creates some unrealistic expectations that need to be managed. A data warehouse doesn't meet all decision support needs. If needed data is not currently collected, transaction systems need to be altered to collect the data. If data quality is a problem, the problem should be corrected in the source system before the data warehouse is built. Software can provide only limited support for cleaning and transforming data. Missing and inaccurate data cannot be "fixed" using software. Historical data can be collected manually, coded and "fixed", but at some point source systems need to provide quality data that can be loaded into the data warehouse without manual clerical intervention.

Informatica

Q. What types of repositories can be created using Informatica Repository Manager?
A. Informatica PowerCenter includes the following types of repositories:
- Standalone Repository: A repository that functions individually and is unrelated to any other repositories.
- Global Repository: A centralized repository in a domain. This repository can contain objects shared across the repositories in the domain. The objects are shared through global shortcuts.
- Local Repository: A repository that is within a domain but is not the global repository. A local repository can connect to the global repository using global shortcuts and can use objects in its shared folders.
- Versioned Repository: Either a local or global repository that allows version control. A versioned repository can store multiple copies, or versions, of an object. This feature allows you to efficiently develop, test and deploy metadata into the production environment.

Q. What is a code page?
A. A code page contains the encoding to specify characters in a set of one or more languages. The code page is selected based on the source of the data. For example, if the source contains Japanese text, then the code page should be selected to support Japanese text. When a code page is chosen, the program or application for which the code page is set refers to a specific set of data that describes the characters the application recognizes. This influences the way that application stores, receives, and sends character data.

Q. Which databases can PowerCenter Server on Windows connect to?
A. PowerCenter Server on Windows can connect to the following databases: IBM DB2, Informix, Microsoft Access, Microsoft Excel, Microsoft SQL Server, Oracle, Sybase, Teradata.
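As a rough analogy to the code page answer above, Python's codecs show why the encoding must match the data source: the same characters map to different byte values under different encodings. This is a sketch, not Informatica configuration.

```python
# The same Japanese text produces different bytes under different encodings,
# which is why the chosen code page must match the data source.
text = "日本語"
sjis = text.encode("shift_jis")   # a common Japanese code page
utf8 = text.encode("utf-8")
assert sjis != utf8                        # different byte representations
assert sjis.decode("shift_jis") == text    # round-trips under the right code page
```

Decoding Shift JIS bytes as if they were UTF-8 would either fail or produce garbage, which is the same failure mode as picking the wrong code page for a source.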

Q. Which databases can PowerCenter Server on UNIX connect to?
A. PowerCenter Server on UNIX can connect to the following databases: IBM DB2, Informix, Oracle, Sybase, Teradata.

Informatica Mapping Designer

Q. How to execute a PL/SQL script from an Informatica mapping?
A. A Stored Procedure (SP) transformation can be used to execute PL/SQL scripts. In the SP transformation the PL/SQL procedure name can be specified. Whenever the session is executed, the session will call the PL/SQL procedure.

Q. How can we eliminate duplicate values from a lookup without overriding the SQL?
We can remove the duplicate values from the lookup by using the DISTINCT clause in the SQL query.
2A: In the Source Qualifier, enable the Select Distinct option.
3A: If I understand this question correctly, the lookup itself eliminates duplicate rows through options like First Value and Last Value: whenever there is more than one row matching the lookup condition, the duplicates are resolved by the First Value / Last Value setting.

Q. What is the difference between OLTP and ODS?
OLTP is an online transaction processing system and ODS is an operational data store. In OLTP we keep the current data; it depends on the day-to-day transactions and stores the day-to-day data. In an ODS we can also store data for a month or more; it is not restricted to a specific day or transaction.

Q. Discuss the advantages and disadvantages of star and snowflake schemas.
Star schema: It is a fully denormalized schema. The diagram of the fact table with its dimension tables resembles a star, which is why it is called a star schema. All the dimensions will be in 2nd normal form.
Snowflake schema: In this schema all dimensions will be in normalized form, which is why it is also called a normalized star schema. For each attribute a separate table may be created. As there is a possibility of a larger number of joins, performance will obviously be degraded.
2A: In a star schema the dimension tables are denormalized, with a primary/foreign key relationship between the fact table and each dimension table. For better performance we use a star schema compared to a snowflake schema, where the dimension tables are normalized; for every dimension table there can be further lookup tables, so we have to dig from top to bottom in a snowflake schema.
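The star-schema join described above can be sketched with SQLite: the fact table joins straight to a denormalized dimension, with no extra hops as in a snowflake schema. Table and column names are made up for illustration.

```python
import sqlite3

# Sketch of a star-schema query: one join from the fact table to a
# denormalized dimension, then aggregation by a dimension attribute.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY,
                          product_name TEXT, category TEXT);
CREATE TABLE fact_sales (product_key INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1, 'Widget', 'Tools'), (2, 'Gadget', 'Toys');
INSERT INTO fact_sales VALUES (1, 10.0), (1, 5.0), (2, 7.5);
""")
rows = con.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales f JOIN dim_product d ON f.product_key = d.product_key
    GROUP BY d.category ORDER BY d.category
""").fetchall()
# rows -> [('Tools', 15.0), ('Toys', 7.5)]
```

In a snowflake design, category would live in its own table keyed from dim_product, so the same query would need a second join; that extra hop is the performance cost the answer above refers to.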

Q. What real-time problems generally come up while running a mapping or a transformation?
Examples: the mapping is invalid; database driver errors.

Q. One flat file is comma-delimited. How can that comma delimiter be changed to another character at run time?
We can change it in the session properties, on the Mapping tab: select the flat file and choose Set File Properties, where the delimiter can be changed.

Q. In a Router, the source row is a boy of age 20, and three group conditions are given: a>20, a<=20, a=20. Which one is evaluated first?
The Router transformation tests the row against each group filter condition in the order the groups are defined, and passes the row to every group whose condition evaluates to true. Here both a<=20 and a=20 are true for age 20, so the row is routed to both of those groups; a>20 is false and receives nothing.

Q. Version control in Informatica?
Versioning is used when you want to edit a mapping while keeping its history; some teams also use external version control such as VSS (Visual SourceSafe).
2A: Version control means that if you modify a mapping, a new version is stored; the version numbers are like 1, 2, etc., not 1.1, 1.2. It provides functions such as check in, check out, undo check out, view history, delete, recover and purge. PowerCenter versioning is a repository option: it can be enabled when the repository is created, or afterwards, but once enabled it cannot be disabled again.

Re: I have two flat files containing the same type of data and I want to load them into the DWH. How many source qualifiers do I need?
If the two flat files have the same structure, we can use the file list (indirect file) concept in Informatica. Only one source qualifier is needed, and the source definition can be based on either of the flat files.

Re: I am getting five source files a day and I do not know when I will get them. I need to load the data into the target and run the session, but I can't keep the session running and can't stop it manually. Please help.
You can use the Event-Wait task and trigger the workflow whenever a particular file arrives in a specified location. The file name should be in a specific format.
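For the comma-delimiter question above, an alternative to session properties is a small preprocessing step before the load; here is a sketch using Python's csv module (the data and delimiter choice are made up for illustration):

```python
import csv, io

# Sketch: re-delimit comma-separated data to pipe-separated before loading.
def redelimit(src_text, new_delim="|"):
    out = io.StringIO()
    reader = csv.reader(io.StringIO(src_text))
    writer = csv.writer(out, delimiter=new_delim)
    for row in reader:
        writer.writerow(row)
    return out.getvalue()

result = redelimit("id,name\n1,Widget\n")
# result is now pipe-delimited (csv.writer emits \r\n line endings by default)
```

Using the csv module rather than a plain string replace keeps quoted fields that contain commas intact.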
Re: A flat file has 1 lakh (100,000) records and I want to push only 52,000 records to the target?
I think this can be handled through UNIX. If the source file contains more than 52,000 records, we can use the head command in a script to take the first 52,000 records from the file, write them to another file, and use that file as the source.
2A: Using the test load option we can specify a particular number of records.

How to move objects from the development repository to the testing repository:

Exporting a file from the development repository:
1. Open the Informatica PowerCenter Designer client.
2. Log in to the development repository by entering the username, password, host name and port number.
3. Log in to the testing repository in the same way. Now both repositories appear in the Repository Navigator.
4. Click the development repository's mapping folder, then right-click the mapping.
5. Choose Open from the context menu (the mapping opens in the Mapping Designer).
6. Choose Export from the context menu. You will be asked to save the file; give the path where you want to export it (it is a .xml file).
7. Once you save it, the output, errors (if any) and warnings (if any) are displayed. Click Close. You have now exported the file.

Importing the file from the local folder into the test repository:
1. From the Designer menu bar select Repository, then Import Objects.
2. A wizard appears; browse to the object file you exported from the development repository and click Next.
3. Add the source, target and mapping objects to be imported into the test repository, click Next again, and finally click Finish. The file is now imported into the test repository.

Workflows are moved from the development repository to the test repository in the same way, using the Workflow Designer. The same procedure is used to move objects from the testing repository to the production repository.

Q. What is the difference between the IIF and DECODE functions?
You can use nested IIF statements to test multiple conditions. The following example tests for various conditions and returns 0 if sales is zero or negative:
IIF( SALES > 0, IIF( SALES < 50, SALARY1, IIF( SALES < 100, SALARY2, IIF( SALES < 200, SALARY3, BONUS))), 0 )
You can use DECODE instead of IIF in many cases. DECODE may improve readability.
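As a quick check of the nested IIF logic, the same bands can be mirrored in ordinary Python (the salary and bonus values are made up for illustration):

```python
# Python equivalent of the nested IIF: returns 0 for non-positive sales,
# otherwise picks a payout band. Band values are illustrative.
SALARY1, SALARY2, SALARY3, BONUS = 100, 200, 300, 500

def payout(sales):
    if sales <= 0:
        return 0
    if sales < 50:
        return SALARY1
    if sales < 100:
        return SALARY2
    if sales < 200:
        return SALARY3
    return BONUS
```

Walking the boundaries (0, 50, 100, 200) against this function is an easy way to confirm that a DECODE rewrite covers exactly the same ranges.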
The following shows how you can use DECODE instead of IIF:
DECODE( TRUE,
SALES > 0 and SALES < 50, SALARY1,
SALES > 49 AND SALES < 100, SALARY2,
SALES > 99 AND SALES < 200, SALARY3,
SALES > 199, BONUS )

14. What are reusable transformations?

Reusable transformations can be used in multiple mappings. When you need to incorporate such a transformation into a mapping, you add an instance of it to the mapping. Later, if you change the definition of the transformation, all instances of it inherit the changes. Since each instance of a reusable transformation is a pointer to that transformation, you can change the transformation in the Transformation Developer and its instances automatically reflect those changes. This feature can save you a great deal of work.

15. What are the methods for creating reusable transformations?
Two methods:
1. Design it in the Transformation Developer.
2. Promote a standard transformation from the Mapping Designer. After you add a transformation to a mapping, you can promote it to the status of a reusable transformation.
Once you promote a standard transformation to reusable status, you can demote it to a standard transformation at any time. If you change the properties of a reusable transformation in a mapping, you can revert to the original reusable transformation properties by clicking the Revert button.

16. What are the unsupported repository objects for a mapplet?
COBOL source definitions, Joiner transformations, Normalizer transformations, non-reusable Sequence Generator transformations, pre- or post-session stored procedures, target definitions, PowerMart 3.5-style lookup functions, XML source definitions, and IBM MQ source definitions.

17. What are mapping parameters and mapping variables?
A mapping parameter represents a constant value that you can define before running a session. A mapping parameter retains the same value throughout the entire session. When you use a mapping parameter, you declare and use the parameter in a mapping or mapplet, then define the value of the parameter in a parameter file for the session.
Unlike a mapping parameter, a mapping variable represents a value that can change throughout the session. The Informatica server saves the value of a mapping variable to the repository at the end of the session run and uses that value the next time you run the session.

18. Can you use the mapping parameters or variables created in one mapping in another mapping?
No. We can use mapping parameters or variables only in transformations of the same mapping or mapplet in which they were created.

19. Can you use the mapping parameters or variables created in one mapping in any other reusable transformation?
Yes, because a reusable transformation is not contained within any particular mapplet or mapping.

20. How can you improve session performance in an Aggregator transformation?
Use sorted input.

21. What is the aggregate cache in an Aggregator transformation?
The Aggregator stores data in the aggregate cache until it completes the aggregate calculations. When you run a session that uses an Aggregator transformation, the Informatica server creates index and data caches in memory to process the transformation. If the Informatica server requires more space, it stores overflow values in cache files.

22. What is the difference between the Joiner transformation and the Source Qualifier transformation?

You can join heterogeneous data sources in a Joiner transformation, which you cannot achieve in a Source Qualifier transformation. You need matching keys to join two relational sources in a Source Qualifier transformation, whereas you do not need matching keys to join two sources in a Joiner. For a Source Qualifier join, the two relational sources must come from the same data source; with a Joiner you can join relational sources coming from different data sources as well.

23. In which conditions can we not use a Joiner transformation (limitations of the Joiner transformation)?
- Both pipelines begin with the same original data source.
- Both input pipelines originate from the same Source Qualifier transformation.
- Both input pipelines originate from the same Normalizer transformation.
- Both input pipelines originate from the same Joiner transformation.
- Either input pipeline contains an Update Strategy transformation.
- Either input pipeline contains a connected or unconnected Sequence Generator transformation.

24. What are the settings used to configure the Joiner transformation?
Master and detail source, type of join, and the join condition.

25. What are the join types in a Joiner transformation?
Normal (default), Master Outer, Detail Outer, Full Outer.

26. What are the Joiner caches?
When a Joiner transformation occurs in a session, the Informatica server reads all the records from the master source and builds index and data caches based on the master rows. After building the caches, the Joiner transformation reads records from the detail source and performs the joins.

27. What is the Lookup transformation?
Use a Lookup transformation in your mapping to look up data in a relational table, view or synonym. The Informatica server queries the lookup table based on the lookup ports in the transformation. It compares the Lookup transformation port values to the lookup table column values based on the lookup condition.

28. Why use the Lookup transformation?
To perform the following tasks:
Get a related value.
For example, if your source table includes an employee ID, but you want to include the employee name in your target table to make your summary data easier to read.
Perform a calculation. Many normalized tables include values used in a calculation, such as gross sales per invoice or sales tax, but not the calculated value (such as net sales).
Update slowly changing dimension tables. You can use a Lookup transformation to determine whether records already exist in the target.

29. What are the types of lookup?
Connected and unconnected.

30. What are the differences between connected and unconnected lookups?
Connected lookup:
- Receives input values directly from the pipeline.
- You can use a dynamic or static cache.
- The cache includes all lookup columns used in the mapping.
- Supports user-defined default values.
Unconnected lookup:
- Receives input values from the result of a :LKP expression in another transformation.
- You can use a static cache only.
- The cache includes all lookup output ports in the lookup condition and the lookup/return port.
- Does not support user-defined default values.

50. Describe the two levels at which the Update Strategy can be set.
Within a session: when you configure a session, you can instruct the Informatica server either to treat all records in the same way (for example, treat all records as inserts) or to use instructions coded into the session mapping to flag records for different database operations.
Within a mapping: within a mapping, you use the Update Strategy transformation to flag records for insert, delete, update, or reject.

51. What is the default source option for the Update Strategy transformation?
Data driven.

52. What is Data Driven?
The Informatica server follows the instructions coded into Update Strategy transformations within the session mapping to determine how to flag records for insert, update, delete or reject. If you do not choose the Data Driven option, the Informatica server ignores all Update Strategy transformations in the mapping.

53. What are the options in the target session for the Update Strategy transformation?
Insert, Delete, Update (Update as update, Update as insert, Update else insert), Truncate table.

54. What are the types of mapping wizards provided in Informatica?
The Designer provides two mapping wizards to help you create mappings quickly and easily. Both wizards are designed to create mappings for loading and maintaining star schemas, a series of dimensions related to a central fact table.
Getting Started Wizard: creates mappings to load static fact and dimension tables, as well as slowly growing dimension tables.
Slowly Changing Dimensions Wizard: creates mappings to load slowly changing dimension tables based on the amount of historical dimension data you want to keep and the method you choose to handle historical dimension data.

22. What is a folder?
A folder contains repository objects such as sources, targets, mappings and transformations, which help logically organize our data warehouse.
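The data-driven behavior described above can be sketched in Python. The DD_* constants mirror the values documented for Informatica (0 through 3); the flagging rule itself is a made-up example, not an actual Update Strategy expression.

```python
# Sketch: flag each row for insert/update/delete/reject, as an Update
# Strategy expression would. Constants mirror Informatica's documented values.
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3

def flag_row(row, existing_keys):
    if row.get("amount") is None:      # bad data: reject the row
        return DD_REJECT
    if row.get("deleted"):             # source says the record is gone
        return DD_DELETE
    if row["id"] in existing_keys:     # already in the target: update
        return DD_UPDATE
    return DD_INSERT                   # otherwise insert

flags = [flag_row(r, {1, 2}) for r in [
    {"id": 1, "amount": 10},
    {"id": 3, "amount": 5},
    {"id": 2, "amount": None},
]]
# flags -> [DD_UPDATE, DD_INSERT, DD_REJECT]
```

In a real mapping, this per-row decision is what the Update Strategy expression computes, and the session's Data Driven setting tells the server to honor it.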

Q. Can you create a folder within the Designer?
Not possible.

31. What ports are available for the Update Strategy, Sequence Generator, Lookup and Stored Procedure transformations?
- Update Strategy: Input, Output
- Sequence Generator: Output only
- Lookup: Input, Output, Lookup, Return
- Stored Procedure: Input, Output