A data warehouse can be thought of as a three-tier system in which a middle tier provides usable data in a secure way to end users. On either side of this middle tier are the end users and the back-end data stores. The three tiers are:
1. Application tier
2. Data tier
3. Presentation tier
In a data warehouse, the three-tier architecture can be described as follows:
1. The source layer, where data lands.
2. The integration layer, where the data is stored after cleansing and transformation.
3. The dimension layer, on which the actual presentation layer stands.

Why do we use a surrogate key?
A surrogate key is a system-generated key that cannot be edited by the user. It is the primary key of a dimension table in the warehouse: simply a sequence that generates the numbers for the primary key column.
Surrogate keys are system-generated integer keys. They are extremely useful when maintaining Type 2 data (i.e., storing historical information). For example, consider a table in which a person and his location are stored. When his location changes and we want to keep a historical record of the change, the new version is stored with a new surrogate key that lets us uniquely identify the record. This is also why OLTP keys are not used in the warehouse and a separate dimension (surrogate) key is maintained.
It is mainly used for tracking changes to the data; we can easily find the most recently updated record through this key.
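To make the Type 2 example above concrete, here is a minimal sketch in Python. The dimension layout, column names, and key-generation rule are illustrative assumptions, not a prescribed implementation:

```python
from datetime import date

# A tiny in-memory "customer dimension" keyed by surrogate key.
# natural_key is the OLTP key; it repeats across historical rows.
dim_customer = [
    {"customer_sk": 1, "natural_key": "C100", "name": "Asha",
     "location": "Pune", "effective_from": date(2020, 1, 1),
     "effective_to": None, "is_current": True},
]

def apply_type2_change(dim, natural_key, new_location, change_date):
    """Expire the current row and insert a new row with the next surrogate key."""
    next_sk = max(row["customer_sk"] for row in dim) + 1
    for row in dim:
        if row["natural_key"] == natural_key and row["is_current"]:
            # Close out the old version instead of overwriting it.
            row["effective_to"] = change_date
            row["is_current"] = False
            new_row = dict(row, customer_sk=next_sk, location=new_location,
                           effective_from=change_date, effective_to=None,
                           is_current=True)
            dim.append(new_row)
            return new_row
    raise KeyError(f"No current row for natural key {natural_key}")

apply_type2_change(dim_customer, "C100", "Mumbai", date(2024, 6, 1))
for row in dim_customer:
    print(row)
```

Both rows now share the natural key C100, but each version has its own surrogate key, which is what lets fact rows point at the version of the customer that was current when the transaction occurred.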
Informatica allows end users and partners to extend the metadata stored in the repository by associating information with individual objects in the repository. For example, when you create a mapping, you can store your contact information with the mapping. You associate information with repository metadata using metadata extensions. Informatica Client applications can contain the following types of metadata extensions:
- Vendor-defined. Third-party application vendors create vendor-defined metadata extensions. You can view and change the values of vendor-defined metadata extensions, but you cannot create, delete, or redefine them.
- User-defined. You create user-defined metadata extensions using PowerCenter/PowerMart. You can create, edit, delete, and view user-defined metadata extensions, and you can change their values.
used columns are at the beginning of the delivery. It is also a best practice to avoid delivering transition data: a field that is used only as a source (for example, the date in CYYMMDD format) should not be delivered to the final table. The IBM Cognos Decision Stream tool gives the ETL developer all the tools necessary to gather, manipulate, and deliver the data needed to meet the reporting needs of the business.
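As an aside on the CYYMMDD date mentioned above: this is typically an IBM-style format where C is a century flag (0 for 19xx, 1 for 20xx). Here is a minimal Python sketch of converting such a source-only field into a real date before delivery; the function name is illustrative, and the century-flag interpretation is an assumption about the source system:

```python
from datetime import date

def cyymmdd_to_date(value: int) -> date:
    """Convert an IBM-style CYYMMDD integer to a date.

    C is a century flag: 0 means 19xx, 1 means 20xx.
    For example, 1240315 -> 2024-03-15 and 991231 -> 1999-12-31.
    """
    c, rest = divmod(value, 1_000_000)
    yy, mmdd = divmod(rest, 10_000)
    mm, dd = divmod(mmdd, 100)
    return date(1900 + 100 * c + yy, mm, dd)

print(cyymmdd_to_date(1240315))  # 2024-03-15
```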
Production
These kinds of tools can facilitate testing and problem correction:
- Automated test tool
- Test data generator
- Test data masker
- Defect manager
- Automated test scripts

Unit Testing for the Data Warehouse
Developers perform tests on their deliverables during and after the development process. The unit test is performed on individual components and is based on the developer's knowledge of what should be developed. Unit testing should definitely be performed before deliverables are turned over to QA; tested components are likely to have fewer bugs. A minimal sketch of such a unit test appears after the list of performance tests below.

QA Testers Perform Many Types of Tests
QA testers design and execute a number of tests:
- Integration Test. Test the system's operation from beginning to end, focusing on how data flows through the system. This is sometimes called "system testing" or "end-to-end testing".
- Regression Test. Validate that the system continues to function correctly after being changed, to avoid "breaking" the system.

Can the Data Warehouse Perform?
Tests can be designed and executed that show how well the system performs with heavy loads of data:
- Extract Performance Test. Test the performance of the system when extracting a large amount of data.
- Transform and Load Performance Test. Test the performance of the system when transforming and loading a large amount of data. Testing with a high volume is sometimes called a "stress test".
- Analytics Performance Test. Test the performance of the system when manipulating the data through calculations.
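As promised above, here is a minimal sketch of a developer unit test for a single ETL component, written in Python with unittest. The transformation, its name, and the code mapping are illustrative assumptions:

```python
import unittest

def standardize_gender(raw: str) -> str:
    """Hypothetical ETL transform: normalize source gender codes."""
    mapping = {"m": "Male", "male": "Male", "f": "Female", "female": "Female"}
    return mapping.get(raw.strip().lower(), "Unknown")

class TestStandardizeGender(unittest.TestCase):
    def test_known_codes(self):
        self.assertEqual(standardize_gender(" M "), "Male")
        self.assertEqual(standardize_gender("female"), "Female")

    def test_unknown_code_defaults(self):
        # Bad source data should land in a catch-all bucket, not fail the load.
        self.assertEqual(standardize_gender("x"), "Unknown")

if __name__ == "__main__":
    unittest.main()
```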
Business Users Test Business Intelligence
Does the system produce the results desired by business users? The main concern is functionality, so business users perform functional tests to make sure that the system meets business requirements. The testing is performed through the user interface (UI), which includes data exploration and reporting.
- Correctness Test. The system must produce correct results. The measures and supporting context need to match numbers in other systems and be calculated correctly.
- Usability Test. The system should be as easy to use as possible. This involves a controlled experiment in how business users can use the business intelligence system to reach stated goals.
- Performance Test. The system must be able to return results quickly without bogging down other resources.
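One piece of the correctness test above can often be automated as a reconciliation check that compares totals in the warehouse against the source system. A minimal Python sketch, assuming two DB-API connections and hypothetical table and column names:

```python
# Reconciliation check: warehouse totals should tie out to the source system.
# The connections, tables, and columns below are illustrative assumptions.

def fetch_total(conn, sql):
    cur = conn.cursor()
    cur.execute(sql)
    return cur.fetchone()[0]

def reconcile(source_conn, dwh_conn, tolerance=0.01):
    source_total = fetch_total(
        source_conn,
        "SELECT SUM(amount) FROM orders WHERE order_date = CURRENT_DATE")
    dwh_total = fetch_total(
        dwh_conn,
        "SELECT SUM(sales_amount) FROM fact_sales WHERE load_date = CURRENT_DATE")
    if abs(source_total - dwh_total) > tolerance:
        raise AssertionError(
            f"Totals do not reconcile: source={source_total}, warehouse={dwh_total}")
    print("Reconciliation passed.")
```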
Business Intelligence Must Be Believed
Quality must be baked into the data warehouse, or users will quickly lose faith in the business intelligence produced, and it then becomes very difficult to get people back on board. Launching a successful data warehousing / business intelligence effort requires both the testing described in this article and data quality at the source, described in the article Data Sources for Data Warehousing.
Understanding Business Intelligence
- Analyze the current state of the data warehousing industry
- Data warehousing fundamentals
- Operational data store (ODS) concepts
- Data mart fundamentals
- Defining metadata and its critical role in data warehousing and testing

Key Principles in Testing
- Introduction
- Testing concepts
- Overview of the testing and quality assurance phases

Project Management Overview
- Basic project management concepts
- Project management in software development and data warehousing
- Testing and quality assurance as part of software project management

Requirements Definition for Data Warehouses
- Requirements management workflow
- Characteristics of good requirements for decision support systems
- Requirements-based testing concepts and techniques

Audiences in Testing
- Audiences and their profiles
- User profiles
- Customer profiles
- Functional profiles
- Testing strategies by audience
- Test management overview

Risk Analysis and Testing
- Risk analysis overview for testing

Test Methods and Testing Levels
- Static vs. dynamic tests
- Black, grey and white box testing
- Prioritizing testing activities
- Testing from unit to user acceptance

Test Plans and Procedures
- Writing and managing test plans and procedures
- Test plan structure and test design specifications

Test Cases Overview
- Test case components
- Designing test scenarios for data warehouse usage
- Creating and executing test cases from scenarios

Validation and Verification
- Validating customer needs for decision support
- Tools and techniques for validation, verification and assessment

Acceptance Testing for Data Warehouses
- Ways to capture informal and formal user issues and concerns
- Test readiness review
- Iterative testing for data warehouse projects

Reviews and Walk-throughs
- Reviews versus walkthroughs
- Inspections in testing and quality assurance

Testing Traceability
- Linking tests to requirements with a traceability matrix
- Change management in decision support systems and testing

Test Execution and Documentation
- Managing the testing and quality assurance process
- Documentation for the testing process

Conclusion
- Summary, advanced exercises, resources for further study

To learn more about how EWSolutions can provide our World-Class Training for your company or to request a quote, please contact David Marco, our Director of Education, at DMarco@EWSolutions.com or call him at 630.920.0005 ext. 103.
resilient ETL system.

Data-lineage and data-dependency functionality. We would like to be able to right-click on a number in a report and see exactly how it was calculated, where the data was stored in the data warehouse, how it was transformed, when the data was most recently refreshed, and what source system or systems underlie the numbers. Dependency is the flip side of lineage: we'd like to look at a table or column in the source system and know which ETL modules, data warehouse tables, OLAP cubes, and user reports might be affected by a structural change. In the absence of ETL standards that hand-coded systems could conform to, we must rely on ETL tool vendors to supply this functionality, though, unfortunately, few have done so to date.

Advanced data cleansing functionality. Most ETL systems are structurally complex, with many sources and targets. At the same time, requirements for transformation are often fairly simple, consisting primarily of lookups and substitutions. If you have a complex transformation requirement, for example if you need to de-duplicate your customer list (a small sketch follows this list of points), you should use a specialized tool. Most ETL tools either offer advanced cleansing and de-duplication modules (usually for a substantial additional price) or integrate smoothly with other specialized tools. At the very least, ETL tools provide a richer set of cleansing functions than are available in SQL.

Performance. You might be surprised that performance is listed last among the advantages of ETL tools. It's possible to build a high-performance ETL system whether you use a tool or not. It's also possible to build an absolute dog of an ETL system whether you use a tool or not. I've never been able to test whether an excellent hand-coded ETL system outperforms an excellent tool-based ETL system; I believe the answer is that it's situational. But the structure imposed by an ETL tool makes it easier for an inexperienced ETL developer to build a quality system.

Software licensing cost. The greatest disadvantage of ETL tools in comparison to hand-crafted systems is the licensing cost for the ETL tool software. Costs vary widely in the ETL space, from several thousand dollars to hundreds of thousands of dollars.

Uncertainty. We've spoken with many ETL teams that are uncertain, and sometimes misinformed, about what an ETL tool will do for them. Some teams under-value ETL tools, believing they are simply a visual way to connect SQL scripts together. Other teams unrealistically over-value ETL tools, imagining that building the ETL system with such a tool will be more like installing and configuring software than developing an application.

Reduced flexibility. A tool-based approach limits you to the tool vendor's abilities and scripting languages.
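Returning to the de-duplication requirement mentioned under "Advanced data cleansing functionality": here is a minimal Python sketch, with made-up records and a deliberately crude matching rule, of what naive customer de-duplication looks like. It shows why real duplicates (fuzzy names, typos, address variants) are exactly the kind of problem that specialized cleansing tools exist to solve:

```python
import re
from itertools import groupby

def normalize(name: str) -> str:
    """Crude matching key: lowercase, strip punctuation and extra spaces."""
    return re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()

customers = [
    {"id": 1, "name": "Acme Corp."},
    {"id": 2, "name": "ACME Corp"},
    {"id": 3, "name": "Bright Ideas LLC"},
]

# Group records sharing the same normalized name; keep the lowest id as survivor.
deduped = []
keyed = sorted(customers, key=lambda c: normalize(c["name"]))
for key, group in groupby(keyed, key=lambda c: normalize(c["name"])):
    survivor = min(group, key=lambda c: c["id"])
    deduped.append(survivor)

print(deduped)  # "Acme Corp." and "Bright Ideas LLC" survive
```

This exact-match-on-normalized-key approach misses "Acme Corporation" entirely, which is the point: anything beyond trivial duplicates calls for probabilistic or fuzzy matching.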
Build a Solid Foundation

There are some over-arching themes in successful ETL system deployments regardless of which tools and technologies are used. Most important, and most frequently neglected, is the practice of designing the ETL system before development begins. Too often we see systems that just evolved without any initial planning. These systems are inefficient and slow, they break down all the time, and they're unmanageable. The data warehouse team has no idea how to pinpoint the bottlenecks and problem areas of the system.

A solid system design should incorporate the concepts described in detail in Kimball University: The Subsystems of ETL Revisited, by Bob Becker. Good ETL system architects will design standard solutions to common problems such as surrogate key assignment. Excellent ETL systems will implement these standard solutions most of the time but offer enough flexibility to deviate from those standards where necessary. There are usually half a dozen ways to solve any ETL problem, and each one may be the best solution in a specific set of circumstances. Depending on your personality and fondness for solving puzzles, this can be either a blessing or a curse.

One of the rules you should try to follow is to write data as seldom as possible during the ETL process. Writing data, especially to the relational database, is one of the most expensive tasks that the ETL system performs. ETL tools contain functionality to operate on data in memory and guide the developer along a path that minimizes database writes until the data is clean and ready to go into the data warehouse table. However, the relational engine is excellent at some tasks, particularly joining related data. There are times when it is more efficient to write data to a table, even index it, and let the relational engine perform a join than it is to use the ETL tool's lookup or merge operators. We usually want to use those operators, but don't overlook the powerful relational database when trying to solve a thorny performance problem.

Whether your ETL system is hand-coded or tool-based, it's your job to design the system for manageability, auditability, and restartability. Your ETL system should tag all rows in the data warehouse with some kind of batch identifier or audit key that describes exactly which process loaded the data (a minimal sketch appears at the end of this article). Your ETL system should log information about its operations, so your team can always know exactly where the process is now and how long each step is expected to take. You should build and test procedures for backing out a load, and, ideally, the system should roll back transactions in the event of a midstream failure. The best systems monitor data health during extraction and transformation, and they either improve the data or issue alerts if data quality is substandard. ETL tools can help you with the implementation of these features, but the design is up to you and your team.

Should you use an ETL tool? Yes. Do you have to use an ETL tool? No. For teams building their first or second ETL system, the main advantages of visual tools are self-documentation and a structured development path. For neophytes, these advantages are worth the cost of the tool. If you're a seasoned expert, perhaps a consultant who has built dozens of ETL systems by hand, it's tempting to stick to what has worked well in the past. With this level of expertise, you can probably build a system that performs as well, operates as smoothly, and perhaps costs less to develop than a tool-based ETL system. But many seasoned experts are consultants, so you should think objectively about how maintainable and extensible a hand-crafted ETL system might be once the consultant has moved on.

Don't expect to reap a positive return on investment in an ETL tool during the development of your first system. The advantages will come as that first phase moves into operation, as it's modified over time, and as your data warehouse grows with the addition of new business process models and associated ETL systems.
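As promised in the manageability discussion above, here is a minimal Python sketch of tagging rows with an audit/batch key during a load, with basic operational logging. The row shape, column names, and batch-id scheme are illustrative assumptions:

```python
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl")

def load_with_audit_key(rows, batch_id):
    """Tag every row with the batch that loaded it, and log progress."""
    log.info("Batch %s: starting load of %d rows", batch_id, len(rows))
    loaded = []
    for row in rows:
        loaded.append({**row, "audit_batch_id": batch_id,
                       "load_ts": datetime.now(timezone.utc)})
    log.info("Batch %s: loaded %d rows", batch_id, len(loaded))
    return loaded

# A failed batch can later be backed out with something like:
#   DELETE FROM fact_sales WHERE audit_batch_id = :batch_id
batch = load_with_audit_key([{"sale_id": 1, "amount": 9.99}], batch_id=20240601)
```

The payoff is restartability: because every row names the batch that produced it, backing out a bad load is one targeted delete rather than detective work.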
Price and Quantity are measurable attributes of a transaction. Store, Products Sold, Sales Person, Store Name, Sales Date, and Customer are dimensional attributes of a transaction. We can see that the dimensional data is already embedded in the transaction, and with these dimensional attributes we can successfully complete the transaction. Dimensional data that directly participates in a transaction is master data. But is the list of dimensional attributes in the transaction complete? Asking a few analytical questions can help us discover the answer:
- What is the male-to-female ratio of customers purchasing at the store?
- What type of products are customers buying? E.g., electronics, computers, toys.
- What type of store is it? E.g., web store, brick and mortar, telesales, catalog sales.
The above questions cannot be answered by the attributes in the transaction; this dimensional data is missing from the transactions. Missing dimensional data that does not directly participate in the transaction, but consists of attributes of the dimension, is reference data.

Why is it important for an ETL person to understand the difference? Reference Data Management (RDM) was once the popular term; then, in the last few years, the new term Master Data Management (MDM) appeared. These terms mean different things, and they have significant implications for how the data is managed. But that will be a topic of discussion for some future post! I hope this article helps clear at least some of the confusion.
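To make the distinction concrete, here is a minimal Python sketch with made-up records. The transaction carries only the master-data key (the customer id), and the gender question from the list above becomes answerable only after joining to the customer dimension's reference attributes:

```python
# Transactions carry master-data keys (customer_id) but no gender.
transactions = [
    {"tx_id": 1, "customer_id": "C1", "amount": 20.0},
    {"tx_id": 2, "customer_id": "C2", "amount": 35.0},
    {"tx_id": 3, "customer_id": "C1", "amount": 12.5},
]

# Reference data: attributes of the customer dimension that never
# appear on the transaction itself.
customer_ref = {
    "C1": {"gender": "Female"},
    "C2": {"gender": "Male"},
}

# The male-to-female question is only answerable after the join.
counts = {"Male": 0, "Female": 0}
for tx in transactions:
    gender = customer_ref[tx["customer_id"]]["gender"]
    counts[gender] += 1

print(f"Male:Female = {counts['Male']}:{counts['Female']}")  # 1:2
```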