
DWH Concepts

What is a DATA WAREHOUSE?

A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources. In addition to a relational database, a data warehouse environment includes an extraction, transportation, transformation, and loading (ETL) solution, an online analytical processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.

A data warehouse is a database designed to support a broad range of decision tasks in a specific organization. It is usually batch updated and structured for rapid online queries and managerial summaries. Data warehouses contain large amounts of historical data. The term data warehousing is often used to describe the process of creating, managing and using a data warehouse.

What are the characteristics of a DATA WAREHOUSE?

The characteristics of a DWH are:

Subject-Oriented: DWHs are designed to help you analyze data. For example, to learn more about the company's sales data, you can build a warehouse that concentrates on sales. This ability to define a DWH by subject matter, sales in this case, makes the DWH subject-oriented.
Integrated: Integration is closely related to subject orientation. DWHs put data from disparate sources into a consistent format. They must resolve such problems as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.
Nonvolatile: Once entered into the warehouse, data should not change. This is logical because the purpose of a warehouse is to enable you to analyze what has occurred, and whatever once happened never changes.
Time-Variant: In order to discover trends, analysts need large amounts of data. This is very much in contrast to OLTP systems, where performance requirements demand that historical data be moved to an archive. A DWH's focus on change over time is what is meant by the term time-variant.
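
The "Integrated" characteristic above can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the two source systems, their field names (cust_id, CustomerNo) and their units are hypothetical and not taken from any real system.

```python
# Hypothetical example: two source systems name the same field differently
# and report weight in different units. The warehouse stores one format.
POUNDS_PER_KG = 2.20462

def from_system_a(rec):
    # System A (assumed): field "cust_id", weight already in kilograms.
    return {"customer_id": rec["cust_id"], "weight_kg": float(rec["weight_kg"])}

def from_system_b(rec):
    # System B (assumed): field "CustomerNo", weight in pounds.
    return {"customer_id": rec["CustomerNo"],
            "weight_kg": round(rec["weight_lb"] / POUNDS_PER_KG, 3)}

def integrate(a_records, b_records):
    """Resolve the naming conflict and the unit mismatch into one record layout."""
    return ([from_system_a(r) for r in a_records]
            + [from_system_b(r) for r in b_records])
```

After this step every record uses the same field names and the same unit, which is the sense in which the warehouse data is "integrated".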

What are the goals of a DATA WAREHOUSE?

The goals of a DATA WAREHOUSE are

To provide a reliable, single integrated source of key corporate information.
To give end users access to their data without a reliance on reports produced by the information systems department.
To allow analysts to analyze corporate data and even produce predictive "what if" models from that data.

The data warehouse is simply one component of modern reporting architectures. The real goal of reporting systems is decision support, or its modern equivalent, business intelligence: to help people make better, more intelligent decisions.

When should a company consider implementing a data warehouse?

Data warehouses, or a more focused database called a data mart, should be considered when a significant number of potential users are requesting access to a large amount of related historical information for analysis and reporting purposes. So-called active or real-time data warehouses can provide advanced decision support capabilities.

What are the uses of a DATA WAREHOUSE?

It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.
It manages the process of gathering data and delivering it to business users.
It is used to analyze data.
It puts data from disparate sources into a consistent format.

What are the benefits of data warehousing?

Some of the potential benefits of putting data into a data warehouse include:
1. Improving turnaround time for data access and reporting;
2. Standardizing data across the organization so there will be one view of the "truth";
3. Merging data from various source systems to create a more comprehensive information source;
4. Lowering costs to create and distribute information and reports;
5. Sharing data and allowing others to access and analyze the data;
6. Encouraging and improving fact-based decision-making.

What are the limitations of data warehousing?

The major limitations associated with data warehousing are related to user expectations, lack of data and poor data quality. Building a data warehouse creates some unrealistic expectations that need to be managed. A data warehouse doesn't meet all decision support needs. If needed data is not currently collected, transaction systems need to be altered to collect the data. If data quality is a problem, the problem should be corrected in the source system before the data warehouse is built. Software can provide only limited support for cleaning and transforming data. Missing and inaccurate data cannot be "fixed" using software. Historical data can be collected manually, coded and "fixed", but at some point source systems need to provide quality data that can be loaded into the data warehouse without manual clerical intervention.

What data is stored in a data warehouse?

In general, organized data about business transactions and business operations is stored in a data warehouse. However, any data used to manage a business, or any type of data that has value to a business, should be evaluated for storage in the warehouse. Some static data may be compiled for initial loading into the warehouse. Any data that comes from mainframe, client/server, or web-based systems can then be periodically loaded into the warehouse. The idea behind a data warehouse is to capture and maintain useful data in a central location. Once data is organized, managers and analysts can use software tools like OLAP to link different types of data together and potentially turn that data into valuable information that can be used for a variety of business decision support needs, including analysis, discovery, reporting and planning.

Database administrators (DBAs) have always said that having non-normalized or de-normalized data is bad.

What are the methodologies of Data Warehousing?

Every company has a methodology of its own. To name a few, the SDLC methodology and the AIM methodology are widely used. Other methodologies include AMM, World Class methodology and many more.

How does my company get started with data warehousing?

Build one! The easiest way to get started with data warehousing is to analyze some existing transaction processing systems and see what type of historical trends and comparisons might be interesting to examine to support decision making. See if there is a "real" user need for integrating the data. If there is, then IS/IT staff can develop a data model for a new schema, load it with some current data and start creating a decision support data store using a database management system (DBMS). Find some software for query and reporting and build a decision support interface that's easy to use. Although the initial data warehouse/data-driven DSS may seem to meet only limited needs, it is a "first step". Start small and build more sophisticated systems based upon experience and successes.

What are the data warehouse implementation schemes?

What type of indexing mechanism do we need to use for a typical data warehouse?

On the fact table it is best to use bitmap indexes. Dimension tables can use bitmap and/or the other types of clustered/non-clustered, unique/non-unique indexes. Bitmap index support varies by DBMS: Oracle supports bitmap indexes, while, to my knowledge, SQL Server does not.

What are the steps to build the data warehouse?

1. Gathering business requirements
2. Identifying sources
3. Identifying facts
4. Defining dimensions
5. Defining attributes
6. Redefining dimensions and attributes
7. Organizing the attribute hierarchy and defining relationships
8. Assigning unique identifiers
9. Additional conventions: cardinality/adding ratios

How often should data be loaded into a data warehouse from transaction processing and other source systems?

It all depends on the needs of the users, how fast data changes and the volume of information that is to be loaded into the data warehouse. It is common to schedule daily, weekly or monthly dumps from operational data stores during periods of low activity (for example, at night or on weekends). The longer the gap between loads, the longer the processing time for the load when it does run. A technical IS/IT staffer should make some calculations and consult with potential users to develop a schedule to load new data.
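
To make the bitmap index idea mentioned above more concrete, here is a minimal Python sketch. It is not how any particular DBMS implements bitmap indexes; it only shows the principle that each distinct column value gets a bit vector, and that combining filter conditions becomes a bitwise operation. The sample rows and column names are invented for illustration.

```python
from collections import defaultdict

def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to an int used as a bitset:
    bit i is set when rows[i][column] equals that value."""
    index = defaultdict(int)
    for i, row in enumerate(rows):
        index[row[column]] |= 1 << i
    return dict(index)

def matching_rows(bitmap):
    """Return the row positions whose bit is set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical fact rows with two low-cardinality columns.
sales = [
    {"region": "EAST", "channel": "WEB"},
    {"region": "WEST", "channel": "STORE"},
    {"region": "EAST", "channel": "STORE"},
    {"region": "EAST", "channel": "WEB"},
]
region_idx = build_bitmap_index(sales, "region")
channel_idx = build_bitmap_index(sales, "channel")

# A filter like "region = 'EAST' AND channel = 'WEB'" becomes a bitwise AND.
hits = matching_rows(region_idx["EAST"] & channel_idx["WEB"])
```

Bitmaps of this kind are compact when a column has few distinct values, which is one reason they tend to be recommended for the foreign key columns of fact tables.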

What are the different architectures of a data warehouse?

What are the different approaches to a data warehouse?

There are two main approaches:

Top down - (Bill Inmon)
Bottom up - (Ralph Kimball)

What are the types of a data warehouse?

What is the main difference between the Inmon and Kimball philosophies of data warehousing?

They differ in how the data warehouse is built. Kimball views the data warehouse as a constituency of data marts. Data marts are focused on delivering business objectives for departments in the organization, and the data warehouse is formed by tying the data marts together through conformed dimensions. Hence a unified view of the enterprise can be obtained from dimensional modeling at a local, departmental level. Inmon believes in creating a data warehouse on a subject-by-subject area basis. Hence the development of the data warehouse can start with data from, for example, the online store. Other subject areas can be added to the data warehouse as the need arises. Point-of-sale (POS) data can be added later if management decides it is necessary.

In short:
Kimball: first data marts, then combined into a data warehouse.
Inmon: first the data warehouse, later the data marts.

When should I consider a data warehouse solution?

What is the process of warehousing data?

Explain the architecture of a data warehouse with the diagram.

What is a Staging Area?

What is a general purpose scheduling tool?

The basic purpose of a scheduling tool in a DW application is to streamline the flow of data from source to target at a specific time or based on some condition.

What is real time data warehousing?

Real-time data warehousing is a combination of two things: 1. real-time activity and 2. data warehousing. Real-time activity is activity that is happening right now. The activity could be anything, such as the sale of widgets. Once the activity is complete, there is data about it. Data warehousing captures business activity data. Real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into the data warehouse and becomes available instantly. In other words, real-time data warehousing is a framework for deriving information from data as the data becomes available.

What is ODS?

ODS means Operational Data Store. It is a collection of operational or base data that is extracted from operational databases and standardized, cleansed, consolidated, transformed, and loaded into the enterprise data architecture. An ODS is used to support data mining of operational data, or as the store for base data that is summarized for a data warehouse. The ODS may also be used to audit the data warehouse to assure that summarized and derived data is calculated properly. The ODS may further become the enterprise shared operational database, allowing operational systems that are being reengineered to use the ODS as their operational database.

What is Active data warehousing?

An active data warehouse provides information that enables decision-makers within an organization to manage customer relationships nimbly, efficiently and proactively. Active data warehousing is all about integrating advanced decision support with day-to-day, even minute-to-minute, decision making in a way that increases the quality of customer touches, which encourages customer loyalty and thus secures an organization's bottom line. The marketplace is coming of age as we progress from first-generation "passive" decision-support systems to current- and next-generation "active" data warehouse implementations.

An active data warehouse also means that every user can access the database at any time, 24/7. Active transformation means that data can be changed as it passes through.

What is meant by OLTP?

OLTP stands for On-Line Transaction Processing. It uses a standard, normalized database structure and is designed for day-to-day transactions. An OLTP database typically has hundreds of users connected to it. These databases are normalized to reduce the redundancy of the data and to increase performance while inserting data. The number of records inserted is higher than the number of records updated or deleted. OLTP systems are not designed for analysis, reporting and decision support. Examples: ATMs, online shopping, online application filing, and online railway reservations.

Why are OLTP database designs not generally a good idea for a data warehouse?

In OLTP, tables are normalized, so query response will be slow for the end user. In addition, an OLTP system does not contain years of data and hence cannot be used for historical analysis.

Why is de-normalized data now ok when it's used for Decision Support?

Normalization of a relational database for transaction processing avoids processing anomalies and results in the most efficient use of database storage. A data warehouse for decision support is not intended to achieve these same goals. For data-driven decision support, the main concern is to provide information to the user as fast as possible. Because of this, storing data in a de-normalized fashion, including storing redundant data and pre-summarizing data, provides the best retrieval results. Also, data warehouse data is usually static, so anomalies will not occur from operations like adding, deleting or updating a record or field.

Why should you put your data warehouse on a different system than your OLTP system?

An OLTP system is basically data-oriented (ER model) and not subject-oriented (dimensional model). That is why we design a separate, subject-oriented OLAP system. Moreover, a complex query fired on an OLTP system causes heavy overhead on the OLTP server, which directly affects day-to-day business.

What is Business Intelligence?

Business intelligence (BI) is a broad category of applications and technologies for gathering, storing, analyzing, and providing access to data to help enterprise users make better business decisions.
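
The trade-off between normalized and de-normalized storage discussed above can be illustrated with a small sketch using Python's built-in sqlite3 module. The table and column names are hypothetical. The normalized query needs a join at query time, while the de-normalized table stores the region redundantly on every row and can be summarized without a join; both return the same answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customer VALUES (1, 'EAST'), (2, 'WEST');
INSERT INTO orders VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 70.0);

-- De-normalized copy: region is stored redundantly on every order row.
CREATE TABLE orders_wide AS
SELECT o.order_id, o.amount, c.region
FROM orders o JOIN customer c ON c.customer_id = o.customer_id;
""")

# Normalized (OLTP-style) design: the join is performed when the query runs.
normalized = conn.execute("""
SELECT c.region, SUM(o.amount) FROM orders o
JOIN customer c ON c.customer_id = o.customer_id
GROUP BY c.region ORDER BY c.region
""").fetchall()

# De-normalized (decision-support style) design: no join is needed.
denormalized = conn.execute("""
SELECT region, SUM(amount) FROM orders_wide
GROUP BY region ORDER BY region
""").fetchall()
```

With three rows the difference is invisible, but the same pattern on large historical tables is why redundant and pre-summarized data is accepted in a warehouse that is loaded in batches rather than updated row by row.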

What are the important concerns of OLTP and DSS systems?

Aspect | OLTP | DSS
No. of users | Many | Few
Data (structure) | Stored in a complex (normalized) data format | Stored in multidimensional structures, e.g. a cube (3-dimensional)
Data (form) | Stored in normalized form, normally 3rd normal form; normalization improves transaction performance | Stored in de-normalized format
Data (volume) | Small volumes of data | Large volumes of data
Data (volatility) | Data is volatile in nature | Static in nature, with periodic loads
Operations | Transactions | Reporting
Indexes | Few | Many
Joins | Many (because it is normalized) | Few (because it is de-normalized)
Performance | Concurrency and availability are the more important aspects, e.g. ATMs | Response time is most important

The same comparison, summarized by characteristic:

Characteristic | OLTP | DSS
Data structures | Complex data structures | Multidimensional data structures
Indexes | Few | Many
Joins | Many | Some
Duplicated data | Normalized DBMS | De-normalized DBMS
Derived data and aggregates | Rare | Common
Number of users | Many | Few
Workload | Predefined operations | Ad-hoc queries
Data modifications | Volatile | Updated on a regular basis
Data | Small volumes | Large volume (historical data)
Key requirement | Availability must be high | Response time must be good

What is the difference between ODS and OLTP?

ODS: A collection of tables created in the data warehouse that maintains only current data. The data stands alone within the data warehouse: no further transactions take place against it, and the current data changes only when it is reloaded through ETL on a scheduled basis.

OLTP: Maintains data only for transactions. These systems are designed for recording the daily operations and transactions of a business. The data lives in an online system connected to the network, updates from transactions happen within seconds, and summarized values can change every second.

What is OLAP? What are the types of OLAP?

OLAP is software for manipulating multidimensional data from a variety of sources. The data is often stored in a data warehouse. OLAP software helps a user create queries, views, representations and reports. OLAP tools can provide a "front-end" for a data-driven DSS.

OLAP stands for On-Line Analytical Processing. It is a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.

An OLAP system stores data in multidimensional databases. You then access these databases to perform financial and statistical analysis on different combinations of the data. An OLAP database is generally used to analyze data and is optimized so that you can retrieve data quickly. It is generally created from the information you have put in an OLTP database.

OLAP products can be grouped into 3 categories.

MOLAP: (Multidimensional OLAP)
o Data is stored in multidimensional arrays in order to be viewed in a multidimensional manner.
o Multidimensional arrays provide efficiency in storage and operations.
o Examples: Oracle Express Server, Essbase by Hyperion Software, PowerPlay by Cognos.
o MOLAP is less suited to ad-hoc queries because it is optimized for multidimensional operations.
o Retrieval is fast.
o Storage is very efficient.

ROLAP: (Relational OLAP)
o Data is stored in a relational model, on the view that OLAP capabilities are best provided against the relational database.
o Examples: Oracle, SQL Server etc.
o ROLAP integrates naturally with existing technology and standards.
o ROLAP can readily take advantage of parallel relational technology.

HOLAP: (Hybrid OLAP)
o These products combine MOLAP and ROLAP.
o With HOLAP products, a relational database stores most of the data.
o A separate multidimensional database stores a small portion of the data.
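
The difference between the ROLAP and MOLAP categories described above can be sketched in plain Python. This is a simplified illustration, not how any of the named products work internally: the "ROLAP" function aggregates detail rows when the query runs, while the "MOLAP" function precomputes every cell of a small cube, including roll-ups marked with the placeholder value "ALL". The sample facts are invented.

```python
from collections import defaultdict
from itertools import product

# Hypothetical detail rows: (product, region, units).
facts = [("pen", "EAST", 5), ("pen", "WEST", 3), ("ink", "EAST", 2)]

def rolap_query(rows, product_name=None, region=None):
    """ROLAP style: scan the detail rows and aggregate at query time."""
    return sum(units for prod_name, reg, units in rows
               if product_name in (None, prod_name) and region in (None, reg))

def build_cube(rows):
    """MOLAP style: precompute each (product, region) cell plus 'ALL' roll-ups."""
    cube = defaultdict(int)
    for prod_name, reg, units in rows:
        for key in product((prod_name, "ALL"), (reg, "ALL")):
            cube[key] += units
    return dict(cube)

cube = build_cube(facts)
pen_total_rolap = rolap_query(facts, product_name="pen")
pen_total_molap = cube[("pen", "ALL")]  # a single lookup, no scan
```

The cube answers predefined questions with a lookup but has to be rebuilt when the data changes, whereas the relational scan handles any filter at the cost of doing the work per query. A hybrid (HOLAP) design keeps the detail rows relational and only a small aggregated portion in the cube.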

Are OLAP databases called decision support systems? True or false?

True.

What does the term Metadata mean?

Very loosely, it is documentation about data; it is how you provide context for data people might be using. Metadata is basically the wrapping you put around data you use in everyday life to transform it into meaningful information.

What is the difference between data warehousing and OLAP?

The terms data warehousing and OLAP are often used interchangeably, but, as the definitions suggest, they are not the same. Warehousing refers to the organization and storage of data from a variety of sources so that it can be analyzed and retrieved easily. OLAP deals with the software and the process of analyzing data, managing aggregations, and partitioning information into cubes for in-depth analysis, retrieval and visualization. Some vendors are replacing the term OLAP with the terms analytical software and business intelligence. In short, the data warehouse is the place where the data is stored for analysis, whereas OLAP is the process of analyzing that data.

What is OLAP, MOLAP, ROLAP, DOLAP, and HOLAP?

OLAP - On-Line Analytical Processing: Designates a category of applications and technologies that allow the collection, storage, manipulation and reproduction of multidimensional data, with the goal of analysis.

MOLAP - Multidimensional OLAP: This term designates, more specifically, a Cartesian data structure. In effect, MOLAP contrasts with ROLAP. In the former, joins between tables are already resolved, which enhances performance. In the latter, joins are computed during the request. MOLAP is targeted at groups of users because it is a shared environment. Data is stored in an exclusive server-based format. It performs more complex analysis of data.

ROLAP - Relational OLAP: Designates one or several star schemas stored in relational databases. This technology permits multidimensional analysis with data stored in relational databases. It is used for large departments or groups because it supports large amounts of data and users.

DOLAP - Desktop OLAP: Small OLAP products for local multidimensional analysis. There can be a mini multidimensional database (using Personal Express), or extraction of a data cube (using Business Objects). It is designed for a low-end, single, departmental user. Data is stored in cubes on the desktop. It's like having your own spreadsheet. Since the data is local, end users don't have to worry about performance hits against the server.

HOLAP - Hybrid OLAP: A hybridization of OLAP, which can include any of the above.

What is meant by metadata in the context of a data warehouse, and why is it important?

Metadata is data about data. A business analyst or data modeler usually captures information about data in a data dictionary, a.k.a. metadata: the source (where and how the data originated), the nature of the data (char, varchar, nullable, existence, valid values etc.) and the behavior of the data (how it is modified or derived, and its life cycle). Metadata is also presented at the data mart level, for subsets, facts and dimensions, the ODS etc. For a DW user, metadata provides vital information for analysis / DSS.

What is the difference between MOLAP and ROLAP?

Aspect | ROLAP | MOLAP
Orientation | Tactical | Strategic
Data | Detailed data | Summary data
Calculations | Simple | Complex
Typical use | Analyze past trends | Predict future trends
Data storage structure | Tables | Cube
Advantages | Requires less memory/storage space | Data access is faster
Disadvantages | Data access is slow | Requires more memory/storage space; the cube is sparsely filled as the number of dimensions increases
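
The remark that a cube becomes sparsely filled as dimensions are added follows from simple arithmetic: the number of cells is the product of the dimension sizes, while the number of populated cells is bounded by the number of facts. A minimal sketch, with made-up dimension sizes:

```python
from math import prod

def cube_density(dimension_sizes, populated_cells):
    """Fraction of the cube's cells that actually hold a value."""
    return populated_cells / prod(dimension_sizes)

# 1,000 populated cells in a 10 x 10 x 10 cube fill it completely ...
dense = cube_density((10, 10, 10), 1000)
# ... but the same 1,000 cells in a cube with one more dimension of size 10
# occupy only a tenth of it.
sparse = cube_density((10, 10, 10, 10), 1000)
```

Each additional dimension multiplies the space the cube must address, which is why MOLAP storage needs tend to grow faster than the underlying detail data.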

What is the difference between OLTP and OLAP?

The main differences between OLTP and OLAP are:

1. User and System Orientation
OLTP: customer-oriented, used for transaction and query processing by clerks, clients and IT professionals.
OLAP: market-oriented, used for data analysis by knowledge workers (managers, executives, analysts).

2. Data Contents
OLTP: manages current data, very detail-oriented.
OLAP: manages large amounts of historical data, provides facilities for summarization and aggregation, and stores information at different levels of granularity to support the decision making process.

3. Database Design
OLTP: adopts an entity relationship (ER) model and an application-oriented database design.
OLAP: adopts a star, snowflake or fact constellation model and a subject-oriented database design.

4. View
OLTP: focuses on the current data within an enterprise or department.
OLAP: spans multiple versions of a database schema due to the evolutionary process of an organization; integrates information from many organizational locations and data stores.

What types of Metadata are there and when will they be available?

Metadata will be made available on the Decision Support website as each increment 'goes live'. We have two classifications of metadata: business and technical. Technical metadata is fairly clear-cut: where did the data come from, and how was it transformed along the way? Business metadata deals more with the possible meaning of the data and how it can be used.

Why is Metadata important to the DWH User?

Metadata is what makes the data in the Data Warehouse meaningful. The Data Warehouse is very different from an operational application. When you're using an operational application, you can get clues from the screen that tell you to update a particular field on the window. If I'm processing a new employee, I know exactly what needs to be updated for that new employee record, and can move through the process based on the context that the application provides. In a data-warehousing environment, you don't have that context or workflow. You have data that is interrelated, and it is out there in raw form, but there is no application between you and the data.
Basically, you have a number of tables and structures that you have access to without a business layer, without a definition on top of it. So metadata is very important to be able to provide that context to people so they know how to go between subject areas or how data within a subject area is related and what it defines and represents.

Is Metadata a description of what the data represents?

In the simplest terms it is. As an example, if a user of the Data Warehouse is interested in a field called "campus code", then the metadata might have a definition of what the campus code represents, such as "an indicator for one of the three campuses". That is a form of metadata, although it is not a complete picture of what metadata can be.
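
As a rough illustration of the two metadata classifications, the sketch below combines technical metadata read from a database catalog with a hand-maintained business definition. It uses Python's built-in sqlite3 module and its PRAGMA table_info statement; the student table and its columns are hypothetical, and the "campus code" definition is the one quoted in the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse table used only for this illustration.
conn.execute("CREATE TABLE student (student_id INTEGER NOT NULL, campus_code TEXT)")

# Business metadata is maintained by people, not derived from the database.
business_definitions = {
    "campus_code": "An indicator for one of the three campuses",
}

def describe(connection, table):
    """Combine technical metadata from the catalog with business definitions."""
    rows = connection.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [
        {"column": name, "type": col_type, "nullable": not notnull,
         "definition": business_definitions.get(name, "(not documented)")}
        for _cid, name, col_type, notnull, _default, _pk in rows
    ]

dictionary = describe(conn, "student")
```

The technical half (name, type, nullability) can be regenerated at any time, whereas the business half only stays useful if someone owns and updates the definitions.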

What types of Metadata will be made available to the User?

Decision Support has identified several kinds of metadata that will be published on the website. Some basic categories are the data model, source-to-target mapping, and the logical and physical model. The logical model gives more of a grouping, or identifies logically what would be expected from the business side. The physical model goes into more detail, with more of the data dictionary definition, but it gives the user a pictorial representation of the data, not just a list of columns and tables. It provides a visual so people can see how data elements relate to each other. There is also a category of metadata that we call usage notes. These expand on how someone might query the Data Warehouse or use a query against a data mart. Based on going through the requirements process and working with the focus groups, we expect to expand the metadata categories as data becomes available.

Is Metadata also useful to the average User of the DWH, in addition to a department's technical staff?

Yes. For an "ad hoc" user, there may be questions as to what a field represents. Another form of metadata at a business user level would be sample queries that Decision Support's Services area would publish based on findings from the requirements process and focus groups. These queries provide samples of relating data to answer a business question.

What Challenges are involved when providing Metadata?

Historically, organizations find it a challenge to manage metadata over time. So I think the biggest challenge that we face at Decision Support is learning from those mistakes and from what we've read in the industry. We need to make sure the metadata we have is live; that it's not something that is static and put on the shelf. Decision Support has formed a Custodial Data Council that will take ownership in making sure we have business definitions and work with the user community.
I think we also need to technically streamline those processes as much as possible, publish the metadata, and make it as consistent as possible.

What is the difference between DWH and BI?

There may be a feature film (movie) without a trailer, but there will be no trailer without a movie. Similarly, data warehousing is a concept related to extracting a client's business data, applying business processing to that data according to user needs, and finally loading the processed data into a database. This database is what we call a warehouse or data warehouse. After the data warehouse is complete, the business user ultimately wants to view the data (precise and summarized data), but as a business person he may not have the knowledge to access a database (a computer person can access the database with SQL). This is where OLAP tools come in, which help that person access the database. We can call these OLAP tools Business Intelligence tools (intelligent in the sense that they generate SQL queries internally and provide many facilities for report developers to format the data and present it in a highly convenient manner). So the data warehouse (movie) is a database, and business intelligence tools (trailers) present the content of that database in an efficient manner.

Simply speaking, BI is the capability of analyzing the data of a data warehouse to the advantage of the business. A BI tool analyzes the data of a data warehouse so that business decisions can be made depending on the result of the analysis.

Data warehousing deals with all aspects of managing the development, implementation and operation of a data warehouse or data mart, including metadata management, data acquisition, data cleansing, data transformation, storage management, data distribution, data archiving, operational reporting, analytical reporting, security management, backup/recovery planning, etc. Business intelligence, on the other hand, is a set of software tools that enable an organization to analyze measurable aspects of its business such as sales performance, profitability, operational efficiency, effectiveness of marketing campaigns, market penetration among certain customer groups, cost trends, anomalies and exceptions, etc. Typically, the term business intelligence is used to encompass OLAP, data visualization, data mining and query/reporting tools. Think of the data warehouse as the back office and business intelligence as the entire business including the back office. The business needs the back office on which to function, but the back office without a business to support makes no sense.

DATA WAREHOUSE: A data warehouse is an integrated, time-variant, subject-oriented and non-volatile collection of data in support of management's decision making process.

BUSINESS INTELLIGENCE: Business intelligence is the process of extracting data, converting it into information and then into a knowledge base.

A data warehouse is a database geared towards the business intelligence requirements of an organization. It integrates data from the various operational systems and is typically loaded from these systems at regular intervals.

BI is a category of technologies that allows for gathering, storing, accessing and analyzing data to help business users make better decisions. To make business analysis effective and efficient, we require a specialized form of storage. This special form of data storage is called a data warehouse, and the process is called data warehousing. Business intelligence is the mechanism of using data, according to the type of industry, for predictive analysis, fault finding, process improvement etc.

What is a Data Dictionary?

A data dictionary is a kind of metadata. A data dictionary explains how data physically resides in an environment. It identifies the type of a column, whether it is character or numeric or some other type. It identifies the width of a column as well as the name of the column. Sometimes in data dictionaries you see descriptions; sometimes you don't. But basically it is how that field is physically represented in Oracle or Sybase or some other platform, if that's where the data resides. It's difficult to do any meaningful query or report without basic metadata.

What are the possible data marts in Retail sales?

Product information, sales information.

What are data validation strategies for data mart validation after the loading process?

Data validation makes sure that the loaded data is accurate and meets the business requirements. Strategies are the different methods followed to meet the validation requirements.

What is a Data Mart?

A data mart is a focused subset of a DWH that deals with a single area of data and is organized for quick analysis. It contains the summarized data of the warehouse and is referred to as a high performance query structure. Data marts consist of materialized views and special indexes. In some businesses the data marts are maintained within the warehouse, whereas in other scenarios they are maintained apart from the DWH.

A data mart is a repository of data gathered from operational data and other sources that is designed to serve a particular community of knowledge workers. It is a system designed for a particular line of business.

What are Data Marts?

Data Marts are designed to help managers make strategic decisions about their business. Data Marts are subsets of the corporate-wide data that are of value to a specific group of users. There are two types of Data Marts:
1. Independent data marts are sourced from data captured from OLTP systems, from external providers, or from data generated locally within a particular department or geographic area.
2. Dependent data marts are sourced directly from enterprise data warehouses.

What are the levels of Data mart?

What is the difference between a Database, a Data Warehouse and a Data Mart?
A Database is an organized collection of data. A DWH is a very large database with a special set of tools to extract and cleanse data from operational systems and to analyze data. A Data Mart is a focused subset of a DWH that deals with a single area of data and is organized for quick analysis.

What is Data Sampling?
What is Data Scrubbing?
What is Data Acquisition Process?

What is data mining?
Data mining is the process of extracting hidden trends from a data warehouse. For example, an insurance data warehouse can be mined to find the highest-risk people to insure in a certain geographical area.

What is a transformation?
It is a repository object that generates, modifies or passes data.

Transformations: Transformations are the manipulation of data from how it appears in the source systems into another form in the DWH or data mart, in a way that enhances or simplifies its meaning. In other words, you transform data into information. This includes the following:

Data Merging: The process of standardizing data types and fields. Suppose one source system stores integer data as smallint whereas another stores the same data as decimal. The data from the two source systems needs to be rationalized when it is moved into the Oracle data type NUMBER.

Cleansing: The process of validating the data brought from multiple sources. This involves:
Identifying any inconsistencies or inaccuracies.
Eliminating inconsistencies in the data from multiple sources.
Converting data from different systems into a single consistent data set suitable for analysis.
Meeting a standard for data elements, codes, domains, formats and naming conventions.
Correcting data errors and filling in missing data values.

Aggregation: The process whereby multiple detailed values are combined into a single summary value, typically a sum of numbers representing dollars spent or units sold. It generates summarized data for use in aggregate fact and dimension tables.

What are the advantages of data mining over traditional approaches?
Data mining is used for estimating the future. For example, for a company or business organization, data mining can be used to predict the future of the business in terms of revenue, employees, customers, orders, etc. Traditional approaches use simple algorithms for estimating the future, which generally give less accurate results than data mining.

What is ETL?
ETL stands for extraction, transformation and loading. ETL tools provide developers with an interface for designing source-to-target mappings, transformations and job control parameters.
Extraction: Takes data from an external source and moves it to the warehouse pre-processor database.
Transformation: The transform data task allows point-to-point generating, modifying and transforming of data.
Loading: The load data task adds records to a database table in a warehouse.

Explain the classification of Tables in a Data warehouse?

What is Fact table?
A Fact Table contains the measurements, metrics or facts of a business process. If your business process is "Sales", then a measurement of this business process, such as "monthly sales number", is captured in the fact table. The fact table also contains the foreign keys to the dimension tables.
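The fact-table description above can be illustrated with a small sketch. It uses Python's built-in sqlite3 module and assumes a hypothetical "Sales" process; the table and column names (dim_product, dim_date, fact_sales, sales_amount) are invented for illustration and do not come from any particular warehouse.

```python
import sqlite3


def build_sales_star(conn):
    """Create two small dimension tables and one fact table that references them."""
    conn.executescript(
        """
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
        CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, month TEXT);
        -- The fact table holds the measure plus foreign keys to the dimensions.
        CREATE TABLE fact_sales (
            product_key  INTEGER REFERENCES dim_product(product_key),
            date_key     INTEGER REFERENCES dim_date(date_key),
            sales_amount REAL
        );
        """
    )
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(1, "Pen"), (2, "Notebook")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                     [(20240105, "2024-01"), (20240120, "2024-01"), (20240203, "2024-02")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(1, 20240105, 10.0), (2, 20240120, 25.0), (1, 20240203, 5.0)])


def monthly_sales(conn):
    """Aggregation: combine detailed sales rows into one summary value per month."""
    return conn.execute(
        """
        SELECT d.month, SUM(f.sales_amount)
        FROM fact_sales f
        JOIN dim_date d ON f.date_key = d.date_key
        GROUP BY d.month
        ORDER BY d.month
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_sales_star(conn)
print(monthly_sales(conn))
```

The query joins the fact table to one dimension through its foreign key and sums the measure, which is the "monthly sales number" kind of measurement mentioned above.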

Why is the fact table in normal form?
A fact table basically consists of the keys of the dimension/lookup tables plus the measures. Every measure depends only on the combination of those keys, so the table is in normal form by construction.

What is the level of Granularity of a fact table?
Level of granularity means the level of detail that you put into the fact table in a data warehouse. For example, based on the design you can decide to store the sales data for each transaction. The level of granularity is then the amount of detail you are willing to store for each transactional fact: product sales for each individual transaction, or sales aggregated up to the minute.

What does level of Granularity of a fact table signify?
Granularity: The first step in designing a fact table is to determine its granularity. By granularity, we mean the lowest level of information that will be stored in the fact table. This involves two steps:
Determine which dimensions will be included.
Determine where along the hierarchy of each dimension the information will be kept.
The determining factors usually go back to the requirements.

What is aggregate fact table?
An aggregate table contains the measure values aggregated (grouped or summed up) to some level of a hierarchy.

What is fact less fact table? Where have you used it in your project?
A factless fact table contains only keys; no measures are available in it.
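As a rough sketch of the last two ideas, the snippet below rolls transaction-grain rows up into an aggregate table at (date, product) grain, and counts rows in a factless fact table. The data, the attendance example and the function names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Transaction-grain fact rows: (date_key, product_key, units_sold)
transaction_facts = [
    ("2024-01-05", "P1", 2),
    ("2024-01-05", "P1", 3),
    ("2024-01-05", "P2", 1),
    ("2024-01-06", "P1", 4),
]


def aggregate_daily(facts):
    """Build an aggregate fact table by summing units to (date, product) grain."""
    totals = defaultdict(int)
    for date_key, product_key, units in facts:
        totals[(date_key, product_key)] += units
    return dict(totals)


# Factless fact rows hold only keys: (date_key, student_key, class_key).
# There is no measure column; the existence of a row is the fact.
attendance_facts = [
    ("2024-01-05", "S1", "MATH"),
    ("2024-01-05", "S2", "MATH"),
    ("2024-01-05", "S1", "ART"),
]


def attendance_count(facts, class_key):
    """Analysis on a factless table is done by counting rows."""
    return sum(1 for _, _, cls in facts if cls == class_key)


print(aggregate_daily(transaction_facts))
print(attendance_count(attendance_facts, "MATH"))
```

Choosing a coarser grain makes the aggregate table smaller and faster to query, at the cost of losing the per-transaction detail.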

What is the common use of creating a Factless Fact Table?

What are the different types of Fact Table? Explain with an example.
1. Cumulative Fact Table:
2. Snapshot Fact Table:

What are the types of Facts?

Additive: A fact that can be summed across any of the dimensions is called an additive fact. The measure can participate in arithmetic calculations using all or any of the dimensions. Ex: Sales profit.
Semi-additive: A fact that can be summed across some of the dimensions, but not all, is called a semi-additive fact. The measure can participate in arithmetic calculations using only some dimensions. Ex: Account balance, which can be summed across accounts but not across time.
Non-additive: A fact that cannot be summed across any of the dimensions is called a non-additive fact. The measure can't participate in arithmetic calculations using the dimensions. Ex: Temperature.

What are Semi-additive and factless facts and in which scenario will you use such kinds of fact tables?
Snapshot facts are semi-additive; semi-additive facts are used when we maintain aggregated facts. Ex: Average daily balance.
A fact table without numeric fact columns is called a factless fact table. Ex: Promotion facts, used to record the promotion values of a transaction (e.g. product samples); this table doesn't contain any measures.

What are non-additive facts in detail?
A fact may be a measure, a metric or a dollar value. Some measures and metrics are non-additive facts, whereas a dollar value is typically an additive fact. If we want to find the amount for a particular place over a particular period of time, we can add the dollar amounts and come up with the total amount. A non-additive fact is, for example, the measured height(s) for "citizens by geographical location": when we roll up "city" data to "state" level, we should not add the heights of the citizens; rather, we may want to use them to derive a count.
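The difference between the fact types can be sketched with a few rows of made-up data. The account balances, heights and function names below are assumptions for illustration only: a balance is summed across accounts for one day but averaged over time, while heights are never summed at all.

```python
from statistics import mean

# Daily snapshot rows: (date, account, balance). Balance is semi-additive.
balances = [
    ("2024-01-01", "A", 100.0),
    ("2024-01-01", "B", 50.0),
    ("2024-01-02", "A", 120.0),
    ("2024-01-02", "B", 50.0),
]


def total_balance_on(day, rows):
    """Summing across the account dimension for a single day is meaningful."""
    return sum(balance for date, _, balance in rows if date == day)


def average_daily_balance(account, rows):
    """Across the time dimension, balances are averaged rather than summed."""
    return mean(balance for _, acct, balance in rows if acct == account)


# Non-additive measure: heights (in cm) of citizens, keyed by city.
heights_by_city = {"Pune": [160, 170], "Nagpur": [180]}


def state_rollup(cities):
    """Rolling up to state level derives a count and an average, never a sum."""
    all_heights = [h for heights in cities.values() for h in heights]
    return {"citizen_count": len(all_heights), "average_height": mean(all_heights)}


print(total_balance_on("2024-01-01", balances))
print(average_daily_balance("A", balances))
print(state_rollup(heights_by_city))
```

Adding the two daily balances of account A (100 + 120) would suggest the customer holds 220, which is why the time dimension needs an average or a period-end value instead of a sum.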

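What is conformed fact?
A conformed fact is a measure that has the same definition and units in every data mart in which it appears, so that it can be compared and combined across fact tables. Conformed dimensions, by comparison, are dimensions that can be used across multiple Data Marts in combination with multiple fact tables.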

What is a continuously valued fact?
What is Centipede Fact Table?
What is Fact Constellation?
What are the categories of Snapshot Fact Table Grains?

What is a dimension table?
A dimension table is a collection of hierarchies and categories along which the user can drill down and drill up. It contains only the textual attributes.

How are the Dimension tables designed?
Most dimension tables are designed using normalization principles up to 2NF. In some instances they are further normalized to 3NF. The steps are:
Find where the data for this dimension is located.
Figure out how to extract this data.
Determine how to maintain changes to this dimension (see more on this in the next section).
Change the fact table and DW population routines.

What are the Different methods of loading Dimension tables?
Conventional Load: Before the data is loaded, all the table constraints are checked against the data.
Direct Load (faster loading): All the constraints are disabled and the data is loaded directly. Later the data is checked against the table constraints, and the bad data won't be indexed.

Can a dimension table contain numeric values?

What is hierarchy relationship in a dimension? Whether it is:
1. 1:1
2. 1:m
3. m:m

What are the different types of dimensions? Explain with examples.
1. Regular Dimensions
2. Shared Dimensions

What are the different types of dimension tables? Explain with examples.

Why are dimensions de-normalized in nature?
Can 2 fact tables share the same dimension tables?

What is junk dimension?
Junk dimension: A grouping of random flags and text attributes that are taken out of a dimension and moved to a separate sub-dimension. A dimension which does not change the grain level is called a junk dimension. (Grain: the lowest level of reporting.)
(Or) The junk dimension is simply a structure that provides a convenient place to store the junk attributes.
(Or) A junk dimension is a convenient grouping of flags and indicators.

What are Conformed Dimensions?
A conformed dimension is a dimension that is used in more than one cube. The use of conformed dimensions and shared measures is the primary way a set of data marts can be united into one consolidated data warehouse.
Conformed dimensions are dimensions which are common to the cubes (cubes are the schemas that contain the fact and dimension tables). Consider Cube-1, containing F1, D1, D2, D3, and Cube-2, containing F2, D1, D2, D4, where F denotes facts and D denotes dimensions. Here D1 and D2 are the conformed dimensions.
Conformed dimensions mean exactly the same thing with every possible fact table to which they are joined. Ex: The Date dimension is connected to all facts, such as Sales facts, Inventory facts, etc.

What is degenerated dimension?
Degenerate Dimension: Keeping the control information on the fact table. Ex: Consider a dimension table with fields like order number and order line number that has a 1:1 relationship with the fact table. In this case the dimension is removed and the order information is stored directly in the fact table, in order to eliminate unnecessary joins while retrieving order information.

What is degenerate dimension table?
Degenerate Dimensions: Values in a table which are neither dimensions nor measures are called degenerate dimensions. Ex: invoice id, empno.

What is Audit dimension? Explain with an example.
What is a Fact Dimension?

What is a Mini Dimension?
What are Role-playing dimensions?
What is a Mystery Dimension?

How do you connect the facts and dimensions in the tables?
1. Smart matching of columns
2. Manual linking

Which columns go to the fact table and which columns go to the dimension table?
The Primary Key columns of the Tables (Entities) go to the Dimension Tables as Foreign Keys. The Primary Key columns of the Dimension Tables go to the Fact Tables as Foreign Keys.

What is Associate Table?
What is Bridge Table?
What is cross-reference table?
What is Event-Tracking Table?

What is a lookup table?
A lookup table is one that is used when updating a warehouse. When the lookup is placed on the target table (fact table / warehouse) based on the primary key of the target, it updates the table by allowing only new or updated records through, based on the lookup condition.

What is the data type of the surrogate key?
The data type of the surrogate key is integer, numeric or number.

What is a Schema?
What is a Star Schema?

A star schema is a way of organizing the tables so that results can be retrieved from the database easily and quickly in the warehouse environment. Usually a star schema consists of one or more dimension tables around a fact table, which looks like a star; that is how it got its name.

Differences between star and snowflake schemas?
Star schema: A single fact table with N dimensions. All dimensions are linked directly to the fact table.
Snowflake schema: A schema in which any dimension has extended dimensions is known as a snowflake schema. Dimensions may be interlinked or may have one-to-many relationships with other tables.

What is Snow-Flake Schema?
When do you go for Star Schema, and when do you go for Snow-Flake Schema?

What is the main difference between schema in RDBMS and schemas in Data Warehouse?

RDBMS Schema                                 DWH Schema
Used for OLTP systems                        Used for OLAP systems
Traditional and old schema                   New generation schema
Normalized                                   De-normalized
Difficult to understand and navigate         Easy to understand and navigate
Cannot solve extract and complex problems    Extract and complex problems can be easily solved
Poorly modeled                               Very good model

Why did you choose STAR SCHEMA only? What are the benefits of STAR SCHEMA?

Because of its de-normalized structure, i.e., its dimension tables are de-normalized. As to why we de-normalize, the first (and often only) answer is: speed. An OLTP structure is designed for data inserts, updates and deletes, but not for data retrieval. Therefore, we can often squeeze some speed out of it by de-normalizing some of the tables and having queries go against fewer tables. These queries are faster because they perform fewer joins to retrieve the same record set. Joins are also confusing to many end users. By de-normalizing, we can present the user with a view of the data that is far easier for them to understand.
Benefits of STAR SCHEMA:
Far fewer tables.
Designed for analysis across time.
Simplifies joins.
Less database space.
Supports drilling in reports.
Flexibility to meet business and technical needs.

Difference between Snow flake and Star Schema. What are situations where Snow flake Schema is better than Star Schema to use and when the opposite is true?
Star schema: It contains the dimension tables mapped around one or more fact tables. It is a de-normalized model. There is no need to use complicated joins. Queries return results quickly.
Snowflake schema: It is the normalized form of the star schema. It contains in-depth joins, because the tables are split into many pieces. Modifications can easily be made directly in the tables. We have to use complicated joins, since there are more tables, so there will be some delay in processing the query.

Which is preferable? Star Schema or Snow-Flake Schema?
If you have 2 fact tables connected in the schema, do you know the name of the schema?
What is Galaxy Schema?
What is Multi-Star Schema?

How do you load the time dimension?
Time dimensions are usually loaded by a program that loops through all possible dates that may appear in the data. It is not unusual for 100 years to be represented in a time dimension, with one row per day.
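A minimal sketch of such a loading program is shown below, assuming a simple date dimension with one row per day. The column names (date_key, quarter, day_of_week and so on) and the YYYYMMDD integer key are illustrative choices, not a prescribed layout.

```python
from datetime import date, timedelta


def build_date_dimension(start, end):
    """Loop through every date from start to end (inclusive), one row per day."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "date_key": int(current.strftime("%Y%m%d")),  # e.g. 20240101
            "full_date": current.isoformat(),
            "year": current.year,
            "quarter": (current.month - 1) // 3 + 1,
            "month": current.month,
            "day": current.day,
            "day_of_week": current.isoweekday(),  # 1 = Monday ... 7 = Sunday
        })
        current += timedelta(days=1)
    return rows


# One year of rows; a real load would typically cover many decades.
dim_date = build_date_dimension(date(2024, 1, 1), date(2024, 12, 31))
print(len(dim_date), dim_date[0])
```

Because the rows are generated rather than extracted, the same function can be rerun with a wider date range whenever the data starts to contain dates outside the loaded window.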

What are slowly changing dimensions?
SCD stands for Slowly Changing Dimensions. Slowly changing dimensions are of three types:
SCD1: Only the updated values are maintained. Ex: when a customer's address is modified, we update the existing record with the new address.
SCD2: Historical information and current information are both maintained by using A) Effective Date, B) Versions, C) Flags, or a combination of these.
SCD3: Historical information and current information are maintained by adding new columns to the target table.

Type-1: Most recent value.
Type-2 (full history): i) Version Number ii) Flag iii) Date.
Type-3: Current and one previous value.

Type 1: Data is overwritten.
Type 2: Current, recent and history data should be there.
Type 3: Current and recent data should be there.

What is BUS Schema?
A BUS Schema is composed of a master suite of conformed dimensions and standardized definitions of facts.

What is hybrid slowly changing dimension?
What are Critical columns?
What is a surrogate key? Why is it used? What is its need? Give an example.

Explain in detail what do you mean by Slicing and Dicing?
Slicing and dicing refers to the ability to combine and re-combine the dimensions to see different slices of the information. Picture slicing a three-dimensional cube of information in order to see what values are contained in the middle layer. Dicing is the ability to view the cube from different perspectives. Slicing and dicing a cube allows an end user to do the same thing with multiple dimensions.

What is a Measure?
What are the types of Measures?
How can you create Measures and Dimensions?
Can we group a measure?
What do you mean by Multi-dimensional Analysis?
What is a Grain?
What is Drill-up, Drill-down and Drill-Across?

Differentiate between Level and Category?
A level is a logical subdivision of a dimension. E.g. if orderdate is a dimension, the levels are year, quarter, month, week, day, etc. A category is one of the different instances of a level. E.g. if year is a level, the categories are 1996, 1997, 1998, etc.

What is a CUBE in data warehousing concept?
Cubes are a logical representation of multidimensional data. The edges of the cube contain dimension members and the body of the cube contains data values.

What is a Virtual Cube?

Difference between filter and condition?
The parameter is the only difference. A condition returns true or false. Ex: if Country = 'India' then ...
A filter returns two types of results:
1. Detail information, which is equivalent to the WHERE clause in a SQL statement.
2. Summary information, which is equivalent to the GROUP BY and HAVING clauses in a SQL statement.
In a filter we just create a parameter on which we can filter the fields, but in a condition we can have static functions such as "if yes then color it green, if no then color it red". So here we can create conditions for filtering in the report. This means we can apply different filtering functions at the same time by using conditional formatting.

What is snapshot?

You can disconnect the report from the catalog to which it is attached by saving the report with a snapshot of the data. However, you must reconnect to the catalog if you want to refresh the data.

What is a linked cube?
A linked cube is one in which a subset of the data can be analyzed in great detail. The linking ensures that the data in the cubes remains consistent.

What is VLDB?
VLDB stands for Very Large Database. It is an environment or storage space managed by a relational database management system (RDBMS) consisting of vast quantities of information.
(Or) VLDB doesn't refer to the size of the database or the amount of information stored. It refers to the window of opportunity for taking a backup of the database. The window of opportunity is the time interval available for the backup; if the DBA is unable to take the backup in the specified time, the database is considered a VLDB.

What is batch processing?

What is incremental loading?
Incremental loading means loading the ongoing changes in the OLTP system.

Explain the advantages of RAID 1, 1/0, and 5. What type of RAID setup would you put your TX logs on?
Transaction logs are written sequentially and don't need to be read at all. The ideal is to have each on RAID 1/0, because it has much better write performance than RAID 5. RAID 1 is also better for TX logs and costs less than 1/0 to implement. It has a tad less reliability, and its performance is generally a little worse. RAID 5 is generally best for data because of its cost and the fact that it provides great read capability.

What is BAS? What is the function?
The Business Application Support (BAS) functional area at SLAC provides administrative computing services to the Business Services Division and Human Resources Department. We are responsible for software development and maintenance of the PeopleSoft applications and consultation to customers with their computer-related tasks.
(Or) It is called Broadcast Agent Server. Its function is to run the scheduled jobs or reports, which can be monitored using the Broadcast Agent Console.

What are modeling tools available in the Market?
There are a number of data modeling tools:

Tool Name         Company Name
Erwin             Computer Associates
Embarcadero       Embarcadero Technologies
Rational Rose     IBM Corporation
Power Designer    Sybase Corporation
Oracle Designer   Oracle Corporation

What are the various Reporting tools in the Market?
1. MS-Excel
2. Business Objects (Crystal Reports)
3. Cognos (Impromptu, Power Play)
4. Microstrategy
5. MS Reporting Services
6. Informatica Power Analyzer
7. Actuate
8. Hyperion (BRIO)
9. Oracle Express OLAP
10. Proclarity

Some of the standard Business Intelligence tools in the market, according to their performance:
1) MICROSTRATEGY
2) BUSINESS OBJECTS, CRYSTAL REPORTS
3) COGNOS REPORT NET
4) MS-OLAP SERVICES
(Or)
1. Seagate Crystal Report
2. SAS
3. Business Objects
4. Microstrategy
5. Cognos
6. Microsoft OLAP
7. Hyperion
8. Microsoft Integrated Services
and some more.

What are the various ETL tools in the Market?
Various ETL tools used in the market are: Informatica, Data Stage, Oracle Warehouse Builder, Ab Initio, Data Junction.

Name some of the real time data-warehousing tools?

What is Outsourcing, Offshoring and Insourcing? And what is the difference between them?
Outsourcing is not strictly IT. Any function of an organization that is executed by non-employees is essentially an outsourced task.
Insourcing is the use of external resources (not employees of the organization) to accomplish some function, where they predominantly carry out the function at the client's site. So the function is sourced but not outsourced. These resources are also typically managed more closely and directly by the client, with little management involvement from the supplier.
Offshoring is a subset of outsourcing which is generally understood to involve a country in which costs remain lower than in the client's country of operations. While most offshoring situations are indeed an example of outsourcing, for those companies (HP for example) who now own their offshore operations and have folded them into the company, the line gets blurred. In other words, offshoring is not always outsourcing anymore.

What is ER Diagram?
The Entity-Relationship (ER) model was originally proposed by Peter Chen in 1976 [Chen76] as a way to unify the network and relational database views. Simply stated, the ER model is a conceptual data model that views the real world as entities and relationships. A basic component of the model is the Entity-Relationship diagram, which is used to visually represent data objects. Since Chen wrote his paper the model has been extended, and today it is commonly used for database design. For the database designer, the utility of the ER model is:
It maps well to the relational model. The constructs used in the ER model can easily be transformed into relational tables.
It is simple and easy to understand with a minimum of training. Therefore, the model can be used by the database designer to communicate the design to the end user.
In addition, the model can be used as a design plan by the database developer to implement a data model in specific database management software.

What Oracle tools can be used to build and design a warehouse?
What Oracle features can be used to optimize my warehouse system?

What is Data Modeling?
Data modeling represents information in terms of entities, attributes and relationships. It is a visual representation of the information.

What are the different steps for Data Modeling?
1. Define the problem and the scope of the problem.
2. Information gathering.
3. Analysis (normalization).
4. Create a logical data model (independent of platform).
5. Decide on the physical platform, such as Oracle or SQL Server.
6. Create a physical data model, which is platform specific.
7. Database creation.

What is Dimensional Modeling?
Dimensional Modeling is a design concept used by many data warehouse designers to build their data warehouse. In this design model all the data is stored in two types of tables: fact tables and dimension tables. The fact table contains the facts/measurements of the business, and the dimension table contains the context of the measurements, i.e., the dimensions on which the facts are calculated.

Data modeling is probably the most labor-intensive and time-consuming part of the development process. Why bother, especially if you are pressed for time?
A common response by practitioners who write on the subject is that you should no more build a database without a model than you should build a house without blueprints. The goal of the data model is to make sure that all data objects required by the database are completely and accurately represented. Because the data model uses easily understood notations and natural language, it can be reviewed and verified as correct by the end users. The data model is also detailed enough to be used by the database developers as a "blueprint" for building the physical database. The information contained in the data model will be used to define the relational tables, primary and foreign keys, stored procedures, and triggers. A poorly designed database will require more time in the long term. Without careful planning you may create a database that omits data required to create critical reports, produces results that are incorrect or inconsistent, and is unable to accommodate changes in the user's requirements.

What is Logical Modeling?
The Logical Model: In Erwin, the logical model is the version of the model that represents all of the logical business requirements of an organization. There are three levels of logical models that are used to capture these requirements:
The Entity Relationship Diagram: A high-level data model that includes all major entities and relationships. The Entity Relationship Diagram does not contain much detail and is often used in the initial planning phase.
The Key Based Model: A model that describes major data structures such as entities, primary keys, and sample attributes.
The Fully Attributed Model: A complete model that includes all required entities, attributes, key groups, and relationships.
In Erwin, a logical model can be created in conjunction with the physical model, or independently of the physical model. Logical models can also be derived from other models using the Derive Model Wizard. In addition, Erwin supports the definition of model objects in a logical model as logical only and in a physical model as physical only. These options allow the logical model to be fully normalized and the corresponding physical model to be de-normalized. Erwin also allows for the automatic conversion of many-to-many and supertype/subtype relationships when you change from a logical model to a physical model.

What are the types of Dimensional Modeling?
What is Conceptual Modeling?
What is Physical Modeling?

Comparing Logical and Physical Models in a Logical/Physical Model: In an Erwin logical/physical model, each model that you create automatically includes both a logical and a physical model. By default, the logical model is closely related to the physical model. If you make a change in the logical model, the change is automatically reflected in the physical model and vice versa. You can use either the logical model or the physical model to define and document database structures, although the model you use typically depends on the type of work you want to perform. You can use the logical model to represent business information and define business rules in a fully normalized model, while the physical model supports the needs of the database administrator, who focuses on the physical implementation of the model in a database.

Comparing Logical and Physical Model Objects: Most of the objects in the logical model correspond to a related object in the physical model. For example, the logical model contains entities, attributes, and key groups, which are represented in the physical model as tables, columns, and indexes, respectively. The following table compares the logical and physical components in an Erwin model.

What is Difference between E-R Modeling and Dimensional Modeling?
The basic difference is that E-R modeling has both a logical and a physical model, whereas a dimensional model has only a physical model. E-R modeling is used for normalizing the OLTP database design. Dimensional modeling is used for de-normalizing the ROLAP/MOLAP design.

What is Entity, Attribute and Relationship?
Entity: An entity is an object about which an organization wants to maintain information. E.g.: Employee.
Attribute: An attribute is an object that holds the information about an entity.
Key attribute: A key attribute consists of one or more attributes of an entity which uniquely identify the entity. E.g.: the bank account number identifies an account.
Relationship: Defines the association between different entities: one to one, one to many, many to one, many to many.

What is meant by De-Normalization?
What is the definition of normalized and denormalized view and what are the differences between them?
Normalization is the process of removing redundancies.
Denormalization is the process of allowing redundancies.
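To make the two definitions concrete, here is a small sketch using made-up customer and city data; the names and the denormalize helper are hypothetical. In the normalized form each city name is stored once and referenced by key, while the denormalized form repeats it on every row so that no lookup (join) is needed when reading.

```python
# Normalized: city names are stored once and referenced by key.
cities = {1: "Pune", 2: "Chennai"}
customers = [
    {"customer_id": 10, "name": "Asha", "city_id": 1},
    {"customer_id": 11, "name": "Ravi", "city_id": 1},
    {"customer_id": 12, "name": "Meena", "city_id": 2},
]


def denormalize(customer_rows, city_lookup):
    """Copy the city name onto every customer row, allowing redundancy."""
    return [
        {
            "customer_id": row["customer_id"],
            "name": row["name"],
            "city_name": city_lookup[row["city_id"]],
        }
        for row in customer_rows
    ]


flat_customers = denormalize(customers, cities)
for row in flat_customers:
    print(row)
```

In the flattened rows "Pune" appears twice: that repetition is the redundancy denormalization allows, traded for simpler and faster reads.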

Why is Denormalization promoted in Universe Designing?
In a relational data model, for normalization purposes, some lookup tables are not merged into a single table. In dimensional data modeling (star schema), these tables would be merged into a single table, called a DIMENSION table, for performance and for slicing data. Because the tables are merged into one large dimension table, complex intermediate joins are avoided; dimension tables are joined directly to fact tables. Although redundancy of data occurs in the DIMENSION table, the size of the DIMENSION table is only about 15% of the size of the FACT table. That is why denormalization is promoted in Universe Designing.

What is Cardinality?
What is Referential Integrity?
What are Integrity Constraints?

What is the difference between view and materialized view?
View: Stores the SQL statement in the database and lets you use it as a table. Every time you access the view, the SQL statement executes.
Materialized view: Stores the results of the SQL in table form in the database. The SQL statement executes only once; after that, every time you run the query, the stored result set is used. Pros include quick query results.

What is Normalization, First Normal Form, Second Normal Form, Third Normal Form?
1. Normalization is a process for assigning attributes to entities. It:
Reduces data redundancies.
Helps eliminate data anomalies.
Produces controlled redundancies to link tables.
2. Normalization is the analysis of functional dependency between attributes / data items of user views. It reduces a complex user view to a set of small and stable subgroups of fields / relations.
1NF: Repeating groups must be eliminated. Dependencies can be identified. All key attributes are defined. No repeating groups in the table.
2NF: The table is already in 1NF. It includes no partial dependencies: no attribute is dependent on a portion of the primary key. It is still possible to exhibit transitive dependency: attributes may be functionally dependent on non-key attributes.
3NF: The table is already in 2NF. It contains no transitive dependencies.

What is a Table space? What does it contain?
What is a Composite Key or Concatenated Key? What is its use?

What are Unique Identifiers?
What is an Index?
What are the types of Indexes?
What do you mean by Partitioned Indexes?
What is partitioning?
What are the methods of partitioning?
What is Parallelism?

What are the advantages and disadvantages of reporting directly against the database? Do you always need to copy the data before reporting on it? (Example: real-time and on-demand reporting is a requirement.)
There isn't any need to copy the data before reporting on it as long as the data is clean. If the data is not clean, it should be cleansed, so go for an ETL process.
Advantage of reporting directly against the database (OLTP): No need to maintain a separate database for reporting, so space consumption is reduced.
Disadvantage of reporting directly against the database (OLTP): It slows down processing, because the OLTP system is designed for the online application, whereas a data warehouse application has to do analysis on the same data and takes a long time to do it.

What are the most frequent data errors that slow down data input process?

Data mining is the process of data selection, exploration and building models using vast data stores to uncover previously unknown patterns. What does this mean to you? You can produce new knowledge to better inform decision makers before they act. Build a model of the real world based on data collected from a variety of sources, including corporate transactions, customer histories and demographics, even external sources such as credit bureaus. Then use this model to produce patterns in the information that can support decision making and predict new business opportunities. Text mining capabilities enable you to apply such analyses to text-based documents.
With SAS's rich suite of text processing and analysis tools, you can uncover underlying themes or concepts contained in large document collections, group documents into topical clusters, classify documents into predefined categories and integrate text data with structured data for enriched predictive modeling endeavors.

Before you begin, you should know the answers to the following questions. What is Data? What is a Database? What is an RDBMS? What is a Data Model? Why do we follow Normalization while designing a data model? What is an OLTP system?

WHAT IS DATA WAREHOUSING:
A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources. In addition to a relational database, a data warehouse environment includes an extraction, transportation, transformation, and loading (ETL) solution, an online analytical processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.
A data warehouse is a subject-oriented, integrated, time-variant and nonvolatile collection of data which helps the business make organizational decisions.
Subject Oriented: Data warehouses are designed to help you analyze data. For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "Who was our best customer for this item last year?" This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented.

Integrated: Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. They must resolve such problems as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.
Nonvolatile: Nonvolatile means that, once entered into the warehouse, data should not change. This is logical because the purpose of a warehouse is to enable you to analyze what has occurred.
Time Variant: In order to discover trends in business, analysts need large amounts of data. This is very much in contrast to online transaction processing (OLTP) systems, where performance requirements demand that historical data be moved to an archive. A data warehouse's focus on change over time is what is meant by the term time variant.
When should an organization create a Data Warehouse? An organization should create one when it holds so much information that it becomes too difficult to extract the meaningful information the business needs to make strategic decisions. The decisions made using data warehouse data affect the entire organization rather than one customer or one employee. Examples of such decisions: Should we continue with specific product offerings to our customers or not? Should we move the customer support department to a different location to save costs?
Data warehouses and OLTP systems have very different requirements. Here are some examples of differences between typical data warehouses and OLTP systems:

Workload Data warehouses are designed to accommodate ad hoc queries. You might not know the workload of your data warehouse in advance, so a data warehouse should be optimized to perform well for a wide variety of possible query operations. OLTP systems support only predefined operations. Your applications might be specifically tuned or designed to support only these operations.

Data modifications

A data warehouse is updated on a regular basis by the ETL process (run nightly or weekly) using bulk data modification techniques. The end users of a data warehouse do not directly update the data warehouse. In OLTP systems, end users routinely issue individual data modification statements to the database. The OLTP database is always up to date, and reflects the current state of each business transaction.

Schema design Data warehouses often use denormalized or partially denormalized schemas (such as a star schema) to optimize query performance. OLTP systems often use fully normalized schemas to optimize update/insert/delete performance, and to guarantee data consistency.

Typical operations A typical data warehouse query scans thousands or millions of rows. For example, "Find the total sales for all customers last month." A typical OLTP operation accesses only a handful of records. For example, "Retrieve the current order for this customer."

Historical data Data warehouses usually store many months or years of data. This is to support historical analysis. OLTP systems usually store data from only a few weeks or months. The OLTP system stores only historical data as needed to successfully meet the requirements of the current transaction.
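The contrast in typical operations can be sketched with a small runnable example. This is only an illustration: an in-memory SQLite database stands in for both kinds of system, and the orders table, its columns and its rows are hypothetical names and data that do not come from the text.

```python
import sqlite3

# In-memory SQLite is a stand-in; table, columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER,"
    " order_month TEXT,"
    " amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 10, "2024-01", 50.0),
        (2, 11, "2024-01", 75.0),
        (3, 10, "2024-02", 20.0),
        (4, 12, "2024-02", 30.0),
    ],
)

# OLTP-style operation: touch a handful of rows for one customer.
latest_order = conn.execute(
    "SELECT order_id, amount FROM orders"
    " WHERE customer_id = ? ORDER BY order_id DESC LIMIT 1",
    (10,),
).fetchone()

# Warehouse-style operation: scan every row and summarize by month.
monthly_totals = conn.execute(
    "SELECT order_month, SUM(amount) FROM orders"
    " GROUP BY order_month ORDER BY order_month"
).fetchall()

print(latest_order)
print(monthly_totals)
```

The first query reads one customer's rows, while the second has to read the whole table, which is why the two workloads are usually tuned and stored differently.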

END USER OF APPLICATION: What do we mean by end user in an OLTP system? An end user is someone who enters data into the system or reads a particular report from it. A bank teller, for example, enters the account number to see the balance or to deposit a cheque. A customer representative must see the customer information to be more effective.

What kind of information does management want to know? The question matters because DW data is primarily used by management. Which are our lowest/highest margin customers? What is the most effective distribution channel? What product promotions have the biggest impact on revenue? What impact will new products/services have on revenue and margins? Which customers are most likely to go to the competition? Who are my customers and what products are they buying? In OLTP applications, end users are individuals who take care of day-to-day operations. In DW applications, end users are managers and above who make decisions based on trends, history, predictions etc. If end users are not satisfied with the application, the product is considered a failure even if it is a great achievement technologically.

Data Warehouse Architecture:

Source Data: An organization will have many OLTP applications. All of this operational data becomes the source for the Data Warehouse database.
ETL (Extract, Transform and Load): We extract data from various operational systems and clean it so that we keep only the information that makes sense to have in the Data Warehouse. While cleansing the data we may reject some records or fill in missing information. Once the operational data is transformed into the format the DW expects, we load it into the DW. This process takes most of the time when developing DW applications.
DW Database: This is the area where we store the data required by the business so that they can run any report against it. A data warehouse holds both current and historical information, which is very useful for trend analysis, behavioral analysis etc.
What is Data Mart?

A data mart is a simple form of a data warehouse that is focused on a single subject (or functional area), such as Sales or Finance or Marketing. Data marts are often built and controlled by a single department within an organization. Given their single-subject focus, data marts usually draw data from only a few sources. The sources could be internal operational systems, a central data warehouse, or external data.
Difference between Data Warehouse and Data Mart
Data Warehouse: Enterprise-wide. Structure for corporate view of data. Organized as an E-R Model or a Galaxy of Stars (multiple star schemas in the data model). Long turnaround time.
Data Mart: Departmental. Star Schema based (facts and dimensions). Quick turnaround (up and running sooner, as there are fewer stakeholders).

Data Granularity
What is the Granularity of your DW? Granularity is the level of detail we want to store in the data warehouse. For a retail store, Point of Sale (POS) data is the lowest-granularity information available. For banking it is the account-level detail based on everyday transactions. As a DSS leans towards analyzing the data as a whole, the data warehouse will not necessarily hold all the details down to daily transactions. Possible levels are:
Daily sales by date, product and customer
Weekly sales by product and customer
Monthly sales by product and customer
Quarterly sales by product and customer
Yearly sales by product and customer
Usually an enterprise data warehouse (EDW) tends to hold POS-level data, whereas data marts hold it aggregated by week or month, so that the detailed information is never lost. This detail-level data can be used to find the micro behaviors of our customers (especially in Data Mining).
Data Warehousing Objects: Data warehousing consists of only two kinds of objects: Fact and Dimension.
Fact Tables: A fact table typically has two types of columns: those that contain numeric facts (often called measurements), and those that are foreign keys to dimension tables. A fact table contains either detail-level facts or facts that have been aggregated. Fact tables that contain aggregated facts are often called summary tables. A fact table usually contains facts with the same level of aggregation. Though most facts are additive, they can also be semi-additive or non-additive. Additive facts can be aggregated by simple arithmetical addition. A common example of this is sales. Non-additive facts cannot be added at all. An example of this is averages. Semi-additive facts can be aggregated along some of the dimensions and not along others. An example of this is inventory levels, where you cannot tell what a level means simply by looking at it.
Dimension Tables: A dimension is a structure, often composed of one or more hierarchies, that categorizes data. Dimensional attributes help to describe the dimensional value. They are normally descriptive, textual values. Several distinct dimensions, combined with facts, enable you to answer business questions. Commonly used dimensions are customers, products, and time. Dimension data is typically collected at the lowest level of detail and then aggregated into higher level totals that are more useful for analysis. These natural rollups or aggregations within a dimension table are called hierarchies.
Hierarchies: Hierarchies are logical structures that use ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation. For example, in a time

dimension, a hierarchy might aggregate data from the month level to the quarter level to the year level. A hierarchy can also be used to define a navigational drill path and to establish a family structure. Within a hierarchy, each level is logically connected to the levels above and below it. Data values at lower levels aggregate into the data values at higher levels. A dimension can be composed of more than one hierarchy. For example, in the product dimension, there might be two hierarchies--one for product categories and one for product suppliers. Dimension hierarchies also group levels from general to granular. Query tools use hierarchies to enable you to drill down into your data to view different levels of granularity. This is one of the key benefits of a data warehouse. When designing hierarchies, you must consider the relationships in business structures, for example, a divisional multilevel sales organization. Hierarchies impose a family structure on dimension values. For a particular level value, a value at the next higher level is its parent, and values at the next lower level are its children. These familial relationships enable analysts to access data quickly.
[Figure: time hierarchy with the levels YEAR, QUARTER, MONTH and WEEK]
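The relationship between a fact table, a dimension table and a hierarchy can be sketched as follows. This is a minimal illustration, not a reference design: SQLite stands in for the warehouse database, and the names time_dim, sales_fact, rollup and all sample rows are hypothetical.

```python
import sqlite3

# Hypothetical star-schema fragment: one fact table and one time dimension.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (
    time_id INTEGER PRIMARY KEY,
    month   TEXT,
    quarter TEXT,
    year    INTEGER
);
CREATE TABLE sales_fact (
    time_id    INTEGER REFERENCES time_dim(time_id),  -- foreign key to the dimension
    product_id INTEGER,
    amount     REAL                                   -- additive numeric fact
);
INSERT INTO time_dim VALUES
    (1, '2024-01', 'Q1', 2024),
    (2, '2024-02', 'Q1', 2024),
    (3, '2024-04', 'Q2', 2024);
INSERT INTO sales_fact VALUES
    (1, 100, 10.0),
    (1, 101, 5.0),
    (2, 100, 20.0),
    (3, 100, 40.0);
""")

HIERARCHY_LEVELS = ("month", "quarter", "year")

def rollup(level):
    """Aggregate the additive fact at one level of the time hierarchy."""
    if level not in HIERARCHY_LEVELS:
        raise ValueError(f"unknown hierarchy level: {level}")
    sql = (
        f"SELECT d.{level}, SUM(f.amount) "
        "FROM sales_fact f JOIN time_dim d ON d.time_id = f.time_id "
        f"GROUP BY d.{level} ORDER BY d.{level}"
    )
    return conn.execute(sql).fetchall()

for level in HIERARCHY_LEVELS:
    print(level, rollup(level))
```

Moving from month to quarter to year is a roll-up along the hierarchy, and reading the same calls in the opposite order corresponds to a drill-down.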

How to handle Slowly Changing Dimensions (SCDs) in data model design? Posted by Dylan Wan on January 13, 2007
There are multiple methods to handle slowly changing dimensions. Which technique to use depends on your business requirements. The choice among these three methods is not a technical design decision, since their behaviors are different.

Type One: Overwrite the old data with new data
Using this method, you do not store the history. For example, say each customer can have one salesrep at any given point in time. When the salesrep of ABC Inc. changes from Sandy to Laura, the fact that Sandy was a salesrep of ABC Inc. is not kept anywhere. Any report by salesrep will assume that Laura has always been the salesrep of ABC Inc. and will count all the sales made by Sandy as Laura's. This example may not seem to make business sense. However, if you only report the sales of the current period, and the salesrep does not change during the period, this method is acceptable. Many OLTP tables do not need to track the history of changes, so this method may be used by the source application. However, if you want to report on historical data, the data warehouse can still use other methods to track the history even if your OLTP system does not.
Type Two: Add a new record at the time of the change
Using this method, all prior history is saved. There are three alternative methods to model the key of this table.
Method A: No surrogate key, use a timestamp. When a change happens, a new record is added to the table. All the attributes are copied from the previous record except the changed values. The natural key is copied as well, so the timestamp is used to differentiate the records. When a fact table is joined with the dimension and you are interested in the historical data, the timestamp is used as part of the join condition. To ease the join, the record typically uses two date columns: the effective start date and the effective end date.
Method B: No surrogate key, use a version number. Instead of the date column, a version number is used to differentiate the versions of a record. This technique requires the fact table to store both the natural key and the version number in order to retrieve a given version of the dimension data.
Method C: Use a surrogate key. When an attribute changes, a new record is added with a sequence-generated key, and the fact table uses this key column as the foreign key.
Type Three: Track changes using a separate column
Using this method, you use a separate column of the dimension table to store the values of the previous year, in addition to the current year data.
This method does not track all the history, just one prior version. If the data changes, the old value is moved from the current-value column to the prior-value column and the new value overwrites the current column. This method is used when the changes are not random but happen at a predefined interval, such as annually.
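A Type Two dimension can be sketched as below. The sketch combines a surrogate key with effective start and end dates, which is one possible combination of the ideas above and is not presented as the author's exact design. SQLite stands in for the warehouse, and customer_dim, change_salesrep, salesrep_on and the dates are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE customer_dim (
    customer_key    INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    customer_id     TEXT NOT NULL,                      -- natural key
    salesrep        TEXT NOT NULL,
    effective_start TEXT NOT NULL,
    effective_end   TEXT                                -- NULL marks the current version
)
""")

def change_salesrep(customer_id, new_rep, change_date):
    """Type Two change: close the current version, then add a new one."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE customer_dim SET effective_end = ? "
            "WHERE customer_id = ? AND effective_end IS NULL",
            (change_date, customer_id),
        )
        conn.execute(
            "INSERT INTO customer_dim (customer_id, salesrep, effective_start) "
            "VALUES (?, ?, ?)",
            (customer_id, new_rep, change_date),
        )

def salesrep_on(customer_id, date):
    """Return the salesrep whose version was effective on the given date."""
    row = conn.execute(
        "SELECT salesrep FROM customer_dim "
        "WHERE customer_id = ? AND effective_start <= ? "
        "AND (effective_end IS NULL OR ? < effective_end)",
        (customer_id, date, date),
    ).fetchone()
    return row[0] if row else None

# Hypothetical dates for the Sandy-to-Laura change described above.
change_salesrep("ABC Inc.", "Sandy", "2006-01-01")
change_salesrep("ABC Inc.", "Laura", "2007-01-01")

history = conn.execute(
    "SELECT customer_key, salesrep, effective_start, effective_end "
    "FROM customer_dim ORDER BY customer_key"
).fetchall()
print(history)
print(salesrep_on("ABC Inc.", "2006-06-15"))
```

Because both versions survive, a sale made in 2006 can still be attributed to Sandy, which is exactly what the Type One overwrite loses.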

Structured Query Language
SQL is a database language used to create, manipulate and control access to database objects. SQL is a non-procedural language used to access relational databases. It is a flexible, efficient language with features designed to manipulate and examine relational data. SQL is used only for the definition and manipulation of database objects. It cannot be used for application development tasks such as form definitions or the creation of procedures. For that you need a 3GL such as COBOL or a 4GL such as dBase to provide front-end support to the database. Key features of SQL are:

Non-procedural language
Unified language
Common language for all relational databases (syntax may vary between different RDBMSs)

SQL is made up of three sub-languages:

Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)

Data Definition Language (DDL): Allows you to define database objects at the conceptual level. It consists of commands to create objects and alter the structure of objects, such as tables, views and indexes. Commonly used DDL statements are CREATE, DROP etc. If you want to create a table Student, use the following syntax:
    CREATE TABLE STUDENT ( STUDENT_ID INTEGER PRIMARY KEY, STUDENT_NM VARCHAR(30), COURSE_ID VARCHAR(15), PHONE VARCHAR(15), ADDRESS VARCHAR(50) );
To drop a table from the database:
    DROP TABLE STUDENT;
Data Manipulation Language (DML): Allows you to retrieve or update data within a database. It is used for query, insertion, deletion and updating of information stored in databases. Eg: Select, Insert, Update, Delete.

STUDENT_ID  STUDENT_NM  COURSE_ID     PHONE        ADDRESS
1001        JAMES       Oracle        972-8889018  888, North Central Exp, Dallas, TX- 75089
1002        JIM         MSSql Server  972-6788909  567, Preston Road, Dallas, TX - 75240
1003        BRUCE       Java          214-5711567  1234, Elm Street, Dallas, TX - 75039

Select statement: The Select statement in SQL is used to display certain data from a table. For example, suppose you want to know what course Jim is taking. A Select statement fetches the information you want by using the information you have. In this scenario the information you have is student_nm = Jim and the information you want is course_id; the intersection of those two columns in the table is what you are looking for.
    SELECT (what you want) FROM (which tables) WHERE (what you have)
The select statement to find Jim's course_id looks like this:
    SELECT COURSE_ID FROM STUDENT WHERE STUDENT_NM = 'JIM'
You will get the result as:
COURSE_ID
MSSql Server
If you want to see all the rows in the table then your select will be:
    SELECT * FROM STUDENT;
If you would like to show the student_nm and address of everyone attending the Oracle course in the form of a report, your select will look like:
    SELECT STUDENT_NM, ADDRESS FROM STUDENT WHERE COURSE_ID = 'Oracle'
The result will be

STUDENT_NM  ADDRESS
JAMES       888, North Central Exp, Dallas, TX- 75089

Insert Statement: The Insert statement is used to insert a new row into the table. For example, if a new student DAVE is joining the Java course, use the INSERT SQL statement:
    INSERT INTO STUDENT (STUDENT_ID, STUDENT_NM, COURSE_ID, PHONE, ADDRESS) VALUES (1004, 'DAVE', 'Java', '972-9124008', '567, Washington Ave, Dallas - 75543')
After executing the insert statement, your table should look like this when you issue a select from the student table:
STUDENT_ID  STUDENT_NM  COURSE_ID     PHONE        ADDRESS
1001        JAMES       Oracle        972-8889018  888, North Central Exp, Dallas, TX- 75089
1002        JIM         MSSql Server  972-6788909  567, Preston Road, Dallas, TX - 75240
1003        BRUCE       Java          214-5711567  1234, Elm Street, Dallas, TX - 75039
1004        DAVE        Java          972-9124008  567, Washington Ave, Dallas - 75543

Update Statement: The Update statement is used to change existing information in the table. For example, if DAVE moved to another address then we need to change the ADDRESS column of DAVE's record. If the new address is 146, Dallas Parkway, Dallas - 75240 then your update should be:
    UPDATE STUDENT SET ADDRESS = '146, Dallas Parkway, Dallas - 75240' WHERE STUDENT_NM = 'DAVE'
To make sure you updated the Address column for DAVE, issue the following SQL:
    SELECT * FROM STUDENT WHERE STUDENT_NM = 'DAVE'
You should then see the following result:
STUDENT_ID  STUDENT_NM  COURSE_ID  PHONE        ADDRESS
1004        DAVE        Java       972-9124008  146, Dallas Parkway, Dallas - 75240

Delete Statement: The Delete statement is used to remove rows from the table. For example, JAMES moved to a different city and does not want to take the course. To remove JAMES's record from the table we use the DELETE statement:
    DELETE FROM STUDENT WHERE STUDENT_NM = 'JAMES'
Once you delete the record and select all the information from the student table, you should see the following:
STUDENT_ID  STUDENT_NM  COURSE_ID     PHONE        ADDRESS
1002        JIM         MSSql Server  972-6788909  567, Preston Road, Dallas, TX - 75240
1003        BRUCE       Java          214-5711567  1234, Elm Street, Dallas, TX - 75039
1004        DAVE        Java          972-9124008  146, Dallas Parkway, Dallas - 75240
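The whole STUDENT walk-through can be replayed end to end with a short script. This is a sketch under one assumption: an in-memory SQLite database stands in for the Oracle database the text uses, so only syntax common to both is shown. The variable names are illustrative.

```python
import sqlite3

# In-memory SQLite is a stand-in for the database used in the text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT ("
    " STUDENT_ID INTEGER PRIMARY KEY,"
    " STUDENT_NM VARCHAR(30),"
    " COURSE_ID VARCHAR(15),"
    " PHONE VARCHAR(15),"
    " ADDRESS VARCHAR(50))"
)
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?, ?)",
    [
        (1001, "JAMES", "Oracle", "972-8889018", "888, North Central Exp, Dallas, TX- 75089"),
        (1002, "JIM", "MSSql Server", "972-6788909", "567, Preston Road, Dallas, TX - 75240"),
        (1003, "BRUCE", "Java", "214-5711567", "1234, Elm Street, Dallas, TX - 75039"),
    ],
)

# SELECT: what you want (COURSE_ID) from what you have (STUDENT_NM).
jim_course = conn.execute(
    "SELECT COURSE_ID FROM STUDENT WHERE STUDENT_NM = ?", ("JIM",)
).fetchone()[0]

# INSERT: a new student joins the Java course.
conn.execute(
    "INSERT INTO STUDENT (STUDENT_ID, STUDENT_NM, COURSE_ID, PHONE, ADDRESS) "
    "VALUES (?, ?, ?, ?, ?)",
    (1004, "DAVE", "Java", "972-9124008", "567, Washington Ave, Dallas - 75543"),
)

# UPDATE: DAVE moves to a new address.
conn.execute(
    "UPDATE STUDENT SET ADDRESS = ? WHERE STUDENT_NM = ?",
    ("146, Dallas Parkway, Dallas - 75240", "DAVE"),
)

# DELETE: JAMES leaves; the WHERE clause limits the delete to his row.
conn.execute("DELETE FROM STUDENT WHERE STUDENT_NM = ?", ("JAMES",))

remaining = [row[0] for row in conn.execute(
    "SELECT STUDENT_NM FROM STUDENT ORDER BY STUDENT_ID")]
dave_address = conn.execute(
    "SELECT ADDRESS FROM STUDENT WHERE STUDENT_NM = ?", ("DAVE",)
).fetchone()[0]

print(jim_course)
print(remaining)
print(dave_address)
```

The `?` placeholders pass values separately from the SQL text, which avoids quoting mistakes when a value itself contains commas or quotes.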

If you do not include a where clause in the delete statement, it will remove all the rows from the table.
Data Control Language (DCL): One of the main advantages of an RDBMS is the security it provides for the data in the database. You can allow a user to perform a specific operation, or all operations, on certain objects. Examples of DCL statements are GRANT and REVOKE. GRANT gives a permission to a user so that the user can perform that operation. REVOKE takes that permission on that object back from the user. For example, suppose we have two users, JAMES and DAVID. If JAMES created a table called ITEMS then JAMES becomes the owner of that table. DAVID cannot access the ITEMS table because he is not its owner. DAVID can access ITEMS only if JAMES gives him permission on the table. JAMES can give different types of access, such as Select, Update, Delete and Insert, on the ITEMS table to DAVID. For example:
If JAMES wants to provide only Select on ITEMS to DAVID then he can issue:
    GRANT SELECT ON ITEMS TO DAVID
If JAMES wants to provide only Select and Insert on ITEMS to DAVID then he can issue:
    GRANT SELECT, INSERT ON ITEMS TO DAVID
If JAMES wants to provide all the operations on ITEMS to DAVID then he can issue:
    GRANT ALL ON ITEMS TO DAVID

Once you grant all permissions on an object to a user, that user can manipulate the table much as the owner can, although ownership itself does not change.
Oracle datatypes
Data in a database is stored in the form of tables. Each table consists of rows and columns to store the data. A particular column in a table must contain the same type of data. For example:
PLAYER_NAME (char)  COUNTRY (char)  DATE_OF_BIRTH (date)  ROOM_NO (number)
AGASSI              USA             10/12/1969            1004
WILLIAM             USA             01/15/1975            1006
JIM                 RUSSIA          05/25/1980            1007
HINGIS              SWITZERLAND     06/25/1979            1009

Every column holds a certain kind of information: PLAYER_NAME is a char column, DATE_OF_BIRTH is a date column and ROOM_NO is a number column. Different datatypes available in an Oracle database:
CHAR: Used to store character data, for example the name of a person (you can save anything in a character field).
VARCHAR: Same as CHAR. The only difference between CHAR and VARCHAR is the way the database saves the data. To understand the difference better, take the following example.
    CREATE TABLE EMPLOYEE (EMP_NO NUMBER(4), ENAME CHAR(15))
EMP_NO  ENAME
888     CLARK
889     KING
890     DAVID COOPER
As the Ename column is defined as CHAR(15), every value you put in that column occupies all 15 bytes. CLARK is a 5-byte string, so the database pads it with 10 spaces.
    CREATE TABLE EMPLOYEE (EMP_NO NUMBER(4), ENAME VARCHAR(15))
EMP_NO  ENAME
888     CLARK
889     KING
890     DAVID COOPER
Here, as Ename is defined as VARCHAR(15), it occupies only the required space, so in the above table the ename CLARK occupies only 5 bytes in the database. So what are the advantages and disadvantages? As a rule of thumb, if a character column is used as a primary key, define it as CHAR. If a column holds comments, use VARCHAR.
NUMBER: Used to store numbers. For example, if you want to store employee numbers, define the column's datatype as NUMBER. If you want a column to store currency, you can define it as NUMBER(7,2).
DATE: Used to store dates, such as the date of birth of a person or the join date in a company.
LONG: Used to store variable-length character data.
RAW, LONG RAW: Used to store binary data of variable length.
LOB: Large objects, used to store binary files. In addition, Oracle 8 supports CLOB, BLOB and BFILE:
CLOB: Stores large character data. A table can have multiple columns of this type.
BLOB: Can store large binary objects such as graphics, video and sound files.
BFILE: Stores file pointers to LOBs managed by file systems external to the database.

Constraints When you bind a business rule to a column in the table then those rules are called the Constraints. Constraints are defined while creating the table. Say for example, you cannot have an employee who does not have a name, then employee name column in employee table should be a NOT NULL column. The NOT NULL is a constraint. The following table shows the constraint types and short descriptions.

Constraint Type  Description
NOT NULL         You must provide a value in that column; you cannot leave the column blank.
PRIMARY KEY      No duplicate values allowed; for example, Empno in the Employee table should be unique.
CHECK            Checks the value and controls the values being inserted and updated.
DEFAULT          Assigns a default value if no value is given.
REFERENCES       Maintains referential integrity (Foreign Key).

Examples of business rules that are usually implemented through constraints:
NOT NULL: If we have a business rule saying that all customers should have a name, we cannot have any customer without a name. To implement that business rule we can create the customer table and specify the customer name column as NOT NULL (constraint). Example:
    CREATE TABLE EMPLOYEE (EMPNO NUMBER(4) PRIMARY KEY, ENAME VARCHAR(4) NOT NULL);
CHECK: A check constraint is used to define a condition on a column. It takes the form col_name datatype CHECK (col_name IN (value1, value2)). Example: if you have a business rule saying that all employees in the organization should get at least $500, we can use a CHECK constraint while creating the table.
    CREATE TABLE EMPLOYEE ( EMPNO NUMBER(4) PRIMARY KEY, ENAME VARCHAR(4) NOT NULL, SALARY NUMBER(7,2) CHECK (SALARY >= 500) );
DEFAULT: When a row is inserted without values for every column, SQL must insert a default value to fill in the excluded columns, or the command will be rejected. The most common default value is NULL, which can be used with columns not defined as NOT NULL. A default value is assigned to a column when the table is created with CREATE TABLE. Example:
    CREATE TABLE ITEM (ITEM_ID NUMBER(4) PRIMARY KEY, ITEM_NAME VARCHAR(15), ITEM_DESC VARCHAR(100), QOH NUMBER(4) DEFAULT 100)
Assigning a default value of 0 to numeric columns makes computation easier, because arithmetic on a NULL yields NULL.
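How a database enforces these rules can be observed with a small script. This is a sketch only: SQLite stands in for Oracle (its error messages and type names differ), and the BONUS column and the try_insert helper are hypothetical additions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE EMPLOYEE (
    EMPNO  INTEGER PRIMARY KEY,              -- unique row identifier
    ENAME  VARCHAR(15) NOT NULL,             -- every employee needs a name
    SALARY NUMERIC CHECK (SALARY >= 500),    -- at least 500
    BONUS  NUMERIC DEFAULT 0                 -- hypothetical column with a default
)
""")

INSERT_SQL = "INSERT INTO EMPLOYEE (EMPNO, ENAME, SALARY) VALUES (?, ?, ?)"

def try_insert(params):
    """Attempt one insert and report whether a constraint rejected it."""
    try:
        with conn:
            conn.execute(INSERT_SQL, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

results = [
    try_insert((1, "CLARK", 900)),   # satisfies every rule
    try_insert((2, None, 900)),      # violates NOT NULL
    try_insert((3, "KING", 100)),    # violates CHECK
    try_insert((1, "DAVID", 900)),   # violates PRIMARY KEY (duplicate EMPNO)
]

default_bonus = conn.execute(
    "SELECT BONUS FROM EMPLOYEE WHERE EMPNO = 1"
).fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0]

for line in results:
    print(line)
print(default_bonus, row_count)
```

Only the first row is stored. The three rejected inserts never reach the table, so the business rules hold no matter which application issues the statement.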

PRIMARY KEY: The primary key of a table is a unique identifier of a row. For example, if you are maintaining customer profiles, you should assign a particular number to each one, so customer_number should be defined as the primary key of the Customer table.
REFERENCES: Defines a foreign key. A foreign key column value refers to a column in another table, and the database checks whether the value exists there.
UNIQUE: The values entered into the column are unique, i.e. no duplicate values exist. This constraint assures the business that no duplicates are allowed.
Data Definition Language
DDL is the part of the SQL language that creates database objects. Examples of database objects are tables, procedures, functions, packages etc. When you create or drop a table you are modifying the structure of the database, which is why it is called data definition language. When you issue a CREATE, ALTER or DROP statement, the database internally performs a commit, which is why DDL cannot be included as part of a transaction. Following are a few DDL statements.
    CREATE TABLE course ( course_id NUMBER(5) NOT NULL PRIMARY KEY, course_name VARCHAR2(30) NOT NULL, start_date DATE );
    ALTER TABLE course MODIFY ( start_date DATE NOT NULL );
    ALTER TABLE course ADD ( instructor_id NUMBER(5) NULL );
    DROP TABLE course;
    CREATE TABLE course ( course_id NUMBER(5) NOT NULL PRIMARY KEY, course_name VARCHAR(30), start_date DATE ) TABLESPACE course_info STORAGE ( INITIAL 1024K NEXT 1024 PCTINCREASE 10 );
(A new column requires a datatype; NUMBER(5) for instructor_id is assumed here for illustration.)
Data Manipulation Language
Data manipulation in an RDBMS means maintaining the data in the database. There are three DML statements that change data: Insert, Update and Delete. The INSERT statement is used to insert a new record into a table. The UPDATE statement is used to change existing information in a table. The DELETE statement is used to remove certain information from a table. Take an example: if you are running an apartment complex where you rent apartments, the day-to-day record maintenance would look like this.
tenant_id  aptno  tenant_name  home_phone    work_phone    apt_rent  no_of_pets
1000       888    SMITH        881-890-9000  767-908-5432  900       1
1001       889    STEVE        881-909-8971  898-543-9032  890       0
1002       890    BILL         781-897-9011  567-891-9108  880       2

INSERT Statement: If a person named JAMES rents an apartment, we need to add his information to the table. We have to do an INSERT because the information does not yet exist in the table. The following information has to be entered into the database: name = JAMES, aptno = 891, home_phone = 676-789-9011, work_phone = 777-567-1234, apt_rent = 880 and no_of_pets = 1. The INSERT statement is written as follows:
    INSERT INTO TENANT (tenant_id, aptno, tenant_name, home_phone, work_phone, apt_rent, no_of_pets) VALUES (1003, 891, 'JAMES', '676-789-9011', '777-567-1234', 880, 1)
After executing the insert statement the table should have four rows as shown below.
tenant_id  aptno  tenant_name  home_phone    work_phone    apt_rent  no_of_pets
1000       888    SMITH        881-890-9000  767-908-5432  900       1
1001       889    STEVE        881-909-8971  898-543-9032  890       0
1002       890    BILL         781-897-9011  567-891-9108  880       2
1003       891    JAMES        676-789-9011  777-567-1234  880       1

The different INSERT SQL syntaxes available are shown below.
Syntax 1:
    INSERT INTO table_name (col1, col2, col3 ...) VALUES (value1, value2, value3 ...)
In Syntax 1 we specify the column names of the table and the corresponding values. In application development this syntax is recommended for inserts, because if a column is later added to the table the statement does not fail: no value is supplied for the new column and the program keeps running.
Syntax 2:
    INSERT INTO table_name VALUES (value1, value2 ...)
In Syntax 2 we do not specify the column names and pass values for all the columns in order.
Syntax 3:
    INSERT INTO table_name (col1, col2, col3 ...) SELECT col1, col2, col3 ... FROM table

In Syntax 3 we can insert multiple rows using one INSERT INTO statement, whereas in Syntax 1 and Syntax 2 you can insert only one row at a time.
UPDATE Statement: The next DML statement is UPDATE. Update is used to change an existing value in a column of a table. As JAMES's work_phone number changed from 777-567-1234 to 765-123-9087, we need to change that information in JAMES's record in the table.
    UPDATE TENANT SET work_phone = '765-123-9087' WHERE tenant_name = 'JAMES'
After executing the UPDATE statement the table still has four rows, with the new work_phone for JAMES, as shown below.
tenant_id  aptno  tenant_name  home_phone    work_phone    apt_rent  no_of_pets
1000       888    SMITH        881-890-9000  767-908-5432  900       1
1001       889    STEVE        881-909-8971  898-543-9032  890       0
1002       890    BILL         781-897-9011  567-891-9108  880       2
1003       891    JAMES        676-789-9011  765-123-9087  880       1

Syntax
    UPDATE table_name SET colname1 = value1, colname2 = value2 ... [WHERE clause]
If you do not include a WHERE clause in your UPDATE statement, it will update all the rows in the table, so you should be very careful when writing UPDATE statements at work.
DELETE Statement: SMITH moves out of the apartment complex, so we no longer need his information in the table. You can use the DELETE SQL statement.
    DELETE FROM TENANT WHERE tenant_name = 'SMITH'
After executing the DELETE statement the table should have three rows as shown below.
tenant_id  aptno  tenant_name  home_phone    work_phone    apt_rent  no_of_pets
1001       889    STEVE        881-909-8971  898-543-9032  890       0
1002       890    BILL         781-897-9011  567-891-9108  880       2
1003       891    JAMES        676-789-9011  765-123-9087  880       1
Syntax

    DELETE FROM table_name [WHERE clause]
If you do not include a WHERE clause in your DELETE statement, you delete all the rows in the table, so you should be very careful when writing DELETE statements at work.
Some examples of INSERT, UPDATE and DELETE statements follow.
Insert SQL examples
Example 1: INSERT INTO BOOKS (book_id, book_nm, author, price) VALUES (234, 'Oracle', 'Smith', 45);
Example 2: INSERT INTO BOOKS VALUES (235, 'C++', 'Austin', 50);
Example 3: INSERT INTO BOOKS (book_id, book_nm, author, price) SELECT book_no, book_name, author_name, book_price FROM legacy_books WHERE author_name = 'BILL';
Update SQL examples
Example 1: UPDATE BOOKS SET book_nm = 'C++ for Experts'
Example 2: UPDATE BOOKS SET book_nm = 'Oracle' WHERE book_id = 103
Example 3: UPDATE BOOKS SET price = price - 5 WHERE author IN ( SELECT author FROM authors WHERE state = 'CA' )
Example 4: UPDATE BOOKS SET price = price - 2 WHERE EXISTS ( SELECT author FROM authors WHERE books.author = authors.author )
Delete SQL examples
Example 1: DELETE FROM BOOKS
Example 2: DELETE FROM BOOKS WHERE book_id = 235
Example 3: DELETE FROM BOOKS WHERE author IN ( SELECT author FROM authors WHERE state = 'TX' )
Create a table called PATIENT so that we can practice data manipulation with INSERT, UPDATE and DELETE statements.
Patient_id      Number(4)    Primary Key
Patient_name    Varchar(35)  Not Null
Primary_doctor  Number(4)    Foreign Key
Patient_dob     Date         Not Null
Patient_phone   Char(10)     NULL

Using INSERT statements, insert the following rows into the PATIENT table.

    PATIENT_ID  PATIENT_NAME  PRIMARY_DOC  PATIENT_DOB  PATIENT_PHONE
    1500        SMITH         ABDUL        10/10/1964   312-896-9632
    1501        KTMAN         JON          02/02/1960   312-666-1478
    1502        WATER         ABDUL        03/03/1955   312-885-9632
    1503        MARINO        JON          09/02/1975   312-555-7412
    1504        DAWKINS       DUPOINT      05/07/1978   312-951-7532

Change the patient name SMITH to RODMAN for the patient whose date of birth is 10/10/1964 and whose primary doctor is ABDUL. Change the phone number of WATER from 312-666-1478 to 312-567-8988. Delete patient SMITH from the PATIENT table.

NULL VALUES

According to Codd's rules, any RDBMS should support NULL values. What is a null value? It is an unknown or undefined value. How do you insert a NULL value into a table? For example, suppose you have a table called APT_ENQUIRY with the following structure.

    ENQ_NAME  char(25)     not null
    PHONE     char(10)     not null
    ADDRESS1  varchar(30)  not null
    ADDRESS2  varchar(30)
    CITY      varchar(30)  not null
    STATE     char(2)      not null
    ZIP       char(5)      not null

In the sample rows for this table, the ADDRESS2 column for MARK has no value; that is a NULL value. How do you insert a null value when the value is undefined? If you omit a column name from the column list of your INSERT statement, that column receives a NULL value in the new row. You cannot omit a NOT NULL column from the INSERT statement (unless it has a default value).

    ENQ_NAME  PHONE        ADDRESS1     ADDRESS2      CITY      STATE  ZIP
    SMITH     675-0983478  KING CORNER  QUEEN STREET  NEW YORK  NY     01123
    MARK      972-8907654  9th STREET                 DALLAS    TX     75240
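The rule above (an omitted column receives NULL) can be sketched as follows. This is a minimal illustration, assuming a table built from the APT_ENQUIRY structure shown earlier; the phone value is written without dashes only so that it fits the char(10) column, which is an assumption made for this sketch.

```sql
-- Illustrative table, following the APT_ENQUIRY structure described above.
CREATE TABLE apt_enquiry (
    enq_name  CHAR(25)    NOT NULL,
    phone     CHAR(10)    NOT NULL,
    address1  VARCHAR(30) NOT NULL,
    address2  VARCHAR(30),            -- nullable column
    city      VARCHAR(30) NOT NULL,
    state     CHAR(2)     NOT NULL,
    zip       CHAR(5)     NOT NULL
);

-- address2 is left out of the column list, so it is stored as NULL.
INSERT INTO apt_enquiry (enq_name, phone, address1, city, state, zip)
VALUES ('MARK', '9728907654', '9th STREET', 'DALLAS', 'TX', '75240');

-- Equivalent alternative: name the column and supply NULL explicitly.
INSERT INTO apt_enquiry (enq_name, phone, address1, address2, city, state, zip)
VALUES ('MARK', '9728907654', '9th STREET', NULL, 'DALLAS', 'TX', '75240');

-- Rows with no second address line are found with IS NULL, not with = NULL.
SELECT enq_name, city FROM apt_enquiry WHERE address2 IS NULL;
```

Both INSERT statements store the same row; the second form is sometimes preferred because it makes the missing value visible to the reader.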

Considerations while dealing with NULLs

A NULL value is different from the value 0 or a blank. It cannot be compared using the relational operators: a comparison such as = NULL or <> NULL never evaluates to true, so it returns no rows. Use IS NULL and IS NOT NULL instead.

    select * from apt_enquiry where address2 = null        -- wrong: fetches no rows
    select * from apt_enquiry where address2 is null       -- right
    select * from apt_enquiry where address2 <> null       -- wrong: fetches no rows
    select * from apt_enquiry where address2 is not null   -- right

SELECT Statement

SELECT is the SQL command we use most in day-to-day database activity. It is used to retrieve data from tables.

Employee table with data (the following examples are based on this table, EMP)

    empno  ename    dob         mgr   deptno  job       sal   comm
    1001   Jones    10/10/1967  1013  10      MANAGER   4000  500
    1002   Dave     10/10/1950  1001  10      CLERK     3000  50
    1003   Jhonson  08/06/1955  1013  20      MANAGER   4000
    1004   David    06/10/1960  1003  20      SALESMAN  3500

Syntax

SELECT col_name, col_name................. FROM table_name WHERE condition

Selecting all columns

We can select all the columns of a table by using * in the SELECT statement.

    SELECT * FROM EMP;

This displays every column of every row in the EMP table. This form is acceptable in a development environment, but we should not write this sort of SELECT in a production environment.

Selecting particular columns

We can select particular columns from a table. To select the empno, ename and sal column values from the EMP table, write the SELECT as follows.

    SELECT empno, ename, sal FROM Emp;

Column Aliases

When we select a column from a table, the column heading is normally the same as the column name. To change the heading for display purposes, use an alias for the column name. If the alias contains a space, enclose it in double quotes.

    SELECT empno "Employee Number", ename "Employee Name", sal Salary FROM Emp;

Specific Rows

Suppose we want to display the employee numbers and names of all employees who work in deptno 10. We need to display empno and ename, so those are the columns in the SELECT clause. In the FROM clause we specify the table name, EMP. The condition is that the employee works in deptno 10, so we write it in the WHERE clause. Here we are selecting specific rows of the table, so our SELECT statement is:

    SELECT empno, ename FROM Emp WHERE deptno = 10;

Ordering Rows

To display the result set in a particular order, include the ORDER BY clause in the SELECT statement. For example, display the employee names and salaries, sorted alphabetically by name:

    SELECT ename, sal FROM Emp ORDER BY ename;

To display the result set by salary in descending order:

    SELECT ename, sal FROM Emp ORDER BY sal DESC;

By default ORDER BY sorts ASC, i.e. ascending.

Expressions in the SELECT statement

To get the sum of salary and commission we need to add two columns, sal and comm. This can be done in the SELECT statement itself.

    SELECT ename, sal, comm, sal + comm "Total" FROM emp;

If the comm column is null, adding sal to it yields a null value, so we can use the NVL function to substitute 0.

    SELECT ename, sal, comm, sal + nvl(comm,0) "Total" FROM emp;

To display all the employees whose employee number is even:

    SELECT empno FROM emp WHERE mod(empno,2) = 0;

Concatenating Strings

Suppose we want to display the employee name and department in the following format:

    JONES works in deptno 10

Here JONES is the ename column and 10 is the deptno column of the emp table, and the text ' works in deptno ' is repeated for every row, so we concatenate the ename value with the literal text and the deptno value. To concatenate values in SQL you can use the || operator or the CONCAT function. In Oracle, CONCAT takes exactly two arguments, so the calls are nested.

    SELECT ename || ' works in deptno ' || deptno FROM emp;

or

    SELECT CONCAT(CONCAT(ename, ' works in deptno '), deptno) FROM emp;

Querying Multiple Tables

Joins are used to combine columns from different tables. With joins, the information from any number of tables can be related. In a join, the tables are listed in the FROM clause, separated by commas. The condition of the query can refer to any column of any joined table. The connection between tables is established through the WHERE clause, and the required rows are retrieved based on the condition specified there. The different types of joins are: Equi Joins, Cartesian Joins, Outer Joins and Self Joins.

Equi Joins

When two tables are joined using equality of values in one or more columns, they make an equi join. Table prefixes are used to prevent ambiguity, and the WHERE clause specifies the columns being joined.

Example: list the employee number, employee name, department number and department name. The employee number, employee name and department number are in the employee table, but the department name exists only in the department table. To get all the information in one SELECT we join the two tables on a column common to both (in the WHERE clause); here deptno is the common column between the emp and dept tables.

    Select empno, ename, emp.deptno, dname From emp, dept Where emp.deptno = dept.deptno;

Cartesian Joins

If you select from more than one table and do not specify a WHERE clause, each row of one table is matched with every row of the other table. This is a Cartesian join. If table TAB1 has 25 rows and TAB2 has 10 rows, joining them without a WHERE clause returns 25 * 10 (250) rows. Cartesian products are useful for finding all possible combinations of rows from different tables.

Outer Joins

If a row in one table has no corresponding value in the other table, an equi join does not select that row. Such rows can be forced into the result by using the outer join symbol (+). The columns taken from the table with no matching row are returned as NULLs.

Where would you use an outer join? Take the employee and department tables. In the department table deptno is the primary key; in the employee table deptno is a foreign key. By the primary and foreign key rule, the employee table cannot contain a deptno that does not exist in the dept table, but a department record can exist with no employees in it. In the emp table there is no employee belonging to department 40, so in the equi join example above the row for department 40 from the dept table is not displayed.

Display the list of employees working in each department, and display the department information even if no employee exists in that department:

    Select empno, ename, dept.deptno, dname, loc from emp, dept where emp.deptno (+) = dept.deptno;

The outer join symbol (+) cannot be used on both sides of the condition.

Self Join

To join a table to itself means that each row of the table is combined with itself and with every other row of the table. The self join can be viewed as a join of two copies of the same table. The table is not actually copied, but SQL performs the command as though it were.

Example: get the employee name and the name of the manager assigned to that employee. The manager is also an employee in the emp table.

    Select a.ename employee_name, b.ename manager_name from emp a, emp b where a.mgr = b.empno;

Built-in Database Functions

Character Functions

    LOWER(char variable)           - Shows the string in lower case
    UPPER(char variable)           - Shows the string in upper case
    LTRIM(char variable)           - Removes the spaces on the left side of the string
    RTRIM(char variable)           - Removes the spaces on the right side of the string
    SUBSTR(char variable, m)       - Gets a part of the string
    LENGTH(char variable)          - Gives the length of the string
    INSTR(string variable, char)   - Gives the position of the char you are searching for
    LPAD
    RPAD
    INITCAP(char variable)         - The first letter of every word in the string becomes upper case

Examples for Character Functions

    Select lower('EXAMPLE FOR LOWERCASE') from dual;
    Select upper('example for upper case') from dual;
    Select ltrim(' left trim example') from dual;
    Select rtrim('right trim example ') from dual;
    Select substr('you are correct', 1, 7) from dual;
    Select length('you are correct') from dual;
    Select instr('you are correct', 'correct') from dual;
    Select initcap('you are correct') from dual;

Arithmetic Functions

    ABS(numeric)  CEIL(numeric)  FLOOR(numeric)  MOD  POWER  SIGN  SQRT  TRUNC  ROUND

Examples for Arithmetic Functions

    Select ABS(-9) from dual;
    Select MOD(5,2) from dual;

Date Functions

    Sysdate                                - Gives the current date
    Add_months(date variable, number of months to be added to that date)
    Months_between(date variable d1, date variable d2)
    To_date(char variable, format)
    Last_Day(date variable)
    Next_Day(date variable, day)
    To_Char(date variable, format you want)

Examples for Date Functions

    SELECT sysdate FROM dual;
    SELECT to_date('1997 09 24', 'yyyy mm dd') FROM dual;
    SELECT months_between(sysdate, to_date('10-24-1994','MM-DD-YYYY')) FROM dual;
    SELECT add_months(sysdate, 4) FROM dual;
    SELECT last_day(sysdate) FROM dual;
    SELECT next_day(sysdate, 'monday') FROM dual;
    SELECT to_char(sysdate, 'day-month-yyyy') FROM dual;

Group Functions

GROUP BY is mostly used with group functions, where the function produces one value for each group.

    Avg  Sum  Count  Max  Min

Group By Clause

The GROUP BY clause can be used in a SELECT statement to collect data across multiple records and group the results by one or more columns. We use the GROUP BY clause when applying aggregate functions to records grouped on a column. All columns in the SELECT list that are not inside group functions must appear in the GROUP BY clause.

Example: find the sum of salary by department.

Emp Table:

    Empno  Ename     Salary  Deptno
    7369   AGASSI    20000   10
    7499   JIM       25000   10
    7321   HINGIS    9000    30
    7200   MARIA     7000    20
    7654   JULIE     19000   10
    7622   SANIA     10000   20
    7644   WILLIAMS  10000   20

To select the sum of salary for each department we write the query as

    SELECT SUM(SALARY), DEPTNO FROM EMP GROUP BY DEPTNO;

The output will be:

    SUM(SALARY)  DEPTNO
    64000        10
    27000        20
    9000         30

The Aggregate functions that can be used along with the GROUP BY clause are SUM(), MAX(), MIN(), COUNT(), AVG(), FIRST(), LAST().
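As a small sketch of several aggregate functions used together with GROUP BY, the query below computes a few summary values per department. It assumes the Emp table from the GROUP BY example, with the salary column named SALARY; the column aliases are illustrative names only.

```sql
-- One result row per department, with several aggregates computed at once.
SELECT deptno,
       COUNT(*)    AS emp_count,      -- number of employees in the department
       SUM(salary) AS total_salary,   -- sum of their salaries
       AVG(salary) AS avg_salary,     -- average salary
       MIN(salary) AS min_salary,     -- lowest salary
       MAX(salary) AS max_salary      -- highest salary
FROM   emp
GROUP  BY deptno
ORDER  BY deptno;
```

With the sample rows shown earlier, department 20 would come back with 3 employees, a total of 27000, an average of 9000, a minimum of 7000 and a maximum of 10000.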

NOTE: If we list any of the columns which are not encapsulated in the aggregate functions in the SELECT statement, we must list those columns in the GROUP BY clause. We call it as the "THUMB RULE". Example: SELECT DEPTNO, COUNT(*) AS "NUMBER OF EMPLOYEES" FROM EMP WHERE SALARY > 15000 GROUP BY DEPTNO; Because you have listed one column in your SELECT statement that is not encapsulated in the COUNT function, the DEPTNO field must, therefore, be listed in the GROUP BY section. HAVING CLAUSE The HAVING clause is used in combination with the GROUP BY clause. It can be used in a SELECT statement to filter the records that a GROUP BY returns. (i.e. we can also say HAVING clause is WHERE clause on the GROUP BY clause.) Example: SELECT DEPTNO, SUM(SALARY) AS "TOTAL SALARY" FROM EMP GROUP BY DEPTNO HAVING SUM(SALARY) > 30000; The above example gives the department number and sum of the salary to that department and filters the result like, the sum of salary should be greater than 30000. DECODE In Oracle/PLSQL, the DECODE function has the functionality of an IF-THEN-ELSE statement. The syntax for the DECODE function is: DECODE( expression , search , result [, search , result]... [, default] ) expression is the value to compare. search is the value that is compared against expression. result is the value returned, if expression is equal to search. default is optional. If no matches are found, the decode will return default. If default is omitted, then the decode statement will return null (if no matches are found). Example:1

EMP Table:

    Empno  Deptno  Gender  Ename
    7499   10      M       Raghu
    7234   20      F       Sita
    2345   30      M       Ramu
    1234   40      F       Rani

To display the gender column with 'M' changed to 'F' and 'F' changed to 'M', we write the query as

    SELECT DECODE(gender,'M','F','M') FROM EMP;

The output will be:

    gender
    F
    M
    F
    M

Example:2

The following expression decodes the DEPTNO column. If DEPTNO is 10 the expression evaluates to 'ACCOUNTING', if 20 it evaluates to 'RESEARCH', if 30 it evaluates to 'SALES', and otherwise it returns the default value 'NONE'.

    DECODE(deptno,10,'ACCOUNTING',20,'RESEARCH',30,'SALES','NONE')

The following example uses the DECODE expression in a SELECT statement.

    SELECT DECODE(deptno,10,'ACCOUNTING',20,'RESEARCH',30,'SALES','NONE') FROM EMP;

The output will be:

    DECODE
    ACCOUNTING
    RESEARCH
    SALES
    NONE

CASE The CASE function specifies conditions and results for a select or update statement. You can use the CASE function to search for data based on specific conditions or to update values based on the condition. The CASE expression can do all that DECODE does plus lot of other things including IF-THEN analysis, use of any comparison operator and checking multiple conditions, all in a SQL query itself. Moreover, using the CASE function, multiple conditions provided in separate SQL queries can be combined into one, thus avoiding multiple statements on the same table.
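To illustrate the point about comparison operators and multiple conditions, here is a minimal sketch of a searched CASE expression, which DECODE cannot express directly because DECODE only tests for equality. It assumes an EMP table with ename and salary columns (as in the GROUP BY example); the salary thresholds and band names are invented purely for illustration.

```sql
-- Searched CASE: each WHEN carries its own condition, evaluated top to bottom.
SELECT ename,
       salary,
       CASE
           WHEN salary >= 20000 THEN 'HIGH'     -- hypothetical threshold
           WHEN salary >= 10000 THEN 'MEDIUM'   -- reached only if the first test fails
           ELSE 'LOW'                           -- default when no condition matches
       END AS salary_band
FROM   emp;
```

Because the WHEN branches are checked in order, a salary of 25000 is reported as HIGH and never reaches the MEDIUM test.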

Syntax for the CASE function is:

    CASE
        WHEN condition 1 THEN result 1
        WHEN condition 2 THEN result 2
        ...
        WHEN condition n THEN result n
        ELSE default result
    END;

Example1: The following statement gives the same result as the DECODE statement used above.

    SELECT EMPNO,
           CASE deptno
               WHEN 10 THEN 'ACCOUNTING'
               WHEN 20 THEN 'RESEARCH'
               WHEN 30 THEN 'SALES'
               ELSE 'NONE'
           END
    FROM EMP;

The output will be:

    empno  case
    7499   ACCOUNTING
    7234   RESEARCH
    2345   SALES
    1234   NONE

VIEW

A view is a virtual table. A view consists of rows and columns just like a table. The difference between a view and a table is that views are definitions built on top of other tables (or views), and do not hold data themselves. If data is changing in the underlying table, the same change is reflected in the view. A view can be built on top of a single table or multiple tables. It can also be built on top of another view. DEF: Logically represents subsets of data from one or more tables ADVANTAGES: To restrict data access To make complex queries easy To provide data independence To present different views of the same data

SYNTAX

    CREATE VIEW viewname [(column name, ...)] AS subquery

    CREATE VIEW empvu80 AS
    SELECT employee_id, last_name, salary
    FROM employees
    WHERE department_id = 80;

Retrieval Operations: the contents of a view can be displayed using a SELECT statement.

    Eg: 1. Select * from view2
        2. Select totqty, title_id from totsales

Modification of views

You can add a record to a view by inserting a record into its base table. For example, you can insert a record into view2 by adding a record to the table Sales. Take another example.

Create a table table1 with two fields, col1 and col2, where col1 does not allow nulls and col2 allows nulls. Create a view view4 which contains only col1. Insert a record into view4, then use SELECT statements to display the contents of table1 and view4. You will find that table1 has a new record with a null value in col2, and view4 also includes this new record. When you insert a new record through a view, the base table columns that are not part of the view must allow null values. If they do not, inserting a record through the view is not possible. If you want to delete a record from the view, you can do so by deleting it from the base table. Similarly, a view is updated through its base tables.

SYNTAX

Modify the EMPVU80 view by using the CREATE OR REPLACE VIEW clause. Add an alias for each column name.

    CREATE OR REPLACE VIEW empvu80 (id_number, name, sal, department_id) AS
    SELECT employee_id, first_name || ' ' || last_name, salary, department_id
    FROM employees
    WHERE department_id = 80;

Column aliases in the CREATE VIEW clause are listed in the same order as the columns in the subquery.

REMOVING A VIEW:

    DROP VIEW view;

Sequences

A sequence:
    Automatically generates unique numbers
    Is a sharable object
    Is typically used to create a primary key value
    Replaces application code
    Speeds up the efficiency of accessing sequence values when cached in memory

A sequence is an object which generates sequence numbers: the first time you ask for a value you get 1, the next time 2, the next time 3, and so on.

SYNTAX:

    CREATE SEQUENCE sequence
        [INCREMENT BY n]
        [START WITH n]
        [{MAXVALUE n | NOMAXVALUE}]
        [{MINVALUE n | NOMINVALUE}]
        [{CYCLE | NOCYCLE}]
        [{CACHE n | NOCACHE}];

To drop a sequence:

    DROP SEQUENCE CUSTOMER_SEQ;

To alter a sequence:

    ALTER SEQUENCE CUSTOMER_SEQ CYCLE CACHE 100;

    <seq_name>.CURRVAL : Returns the current value of the sequence.
    <seq_name>.NEXTVAL : Returns the next value of the sequence. Also increments the value.
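The typical use of a sequence to supply primary key values can be sketched as below. This is a minimal Oracle example under stated assumptions: the table customer_demo and its columns are hypothetical names introduced only for this illustration, and CUSTOMER_SEQ is the sequence name used in the text.

```sql
-- Create the sequence; every clause here is optional and shown for clarity.
CREATE SEQUENCE customer_seq
    START WITH 1
    INCREMENT BY 1
    NOCYCLE
    CACHE 20;

-- Hypothetical table whose primary key is filled from the sequence.
CREATE TABLE customer_demo (
    cust_no   NUMBER(6)    PRIMARY KEY,
    cust_name VARCHAR2(50) NOT NULL
);

-- NEXTVAL hands out a new number on each reference.
INSERT INTO customer_demo (cust_no, cust_name)
VALUES (customer_seq.NEXTVAL, 'SMITH');

-- CURRVAL returns the value most recently generated in this session;
-- it is only valid after NEXTVAL has been called at least once in the session.
SELECT customer_seq.CURRVAL FROM dual;
```

Note that numbers taken from a sequence are not returned if the transaction rolls back, so gaps in the key values are normal.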

What is an Index?

An index:
    Is a schema object
    Is used by the Oracle server to speed up the retrieval of rows by using a pointer
    Can reduce disk I/O by using a rapid path access method to locate data quickly
    Is independent of the table it indexes
    Is used and maintained automatically by the Oracle server

How Are Indexes Created?

Automatically: a unique index is created automatically when you define a PRIMARY KEY or UNIQUE constraint in a table definition.

Manually: users can create nonunique indexes on columns to speed up access to the rows.

Creating an Index

    CREATE INDEX index ON table (column[, column]...);

When to Create an Index

You should create an index if:
    A column contains a wide range of values
    A column contains a large number of null values
    One or more columns are frequently used together in a WHERE clause or a join condition
    The table is large and most queries are expected to retrieve less than 2 to 4 percent of the rows

When Not to Create an Index

It is usually not worth creating an index if:
    The table is small
    The columns are not often used as a condition in the query
    Most queries are expected to retrieve more than 2 to 4 percent of the rows in the table
    The table is updated frequently
    The indexed columns are referenced as part of an expression

Removing an Index

    DROP INDEX index;

Synonyms

A synonym is another name for a table in the current database or in another database. Its easier

    CREATE [PUBLIC] SYNONYM synonym FOR object;

Creating and Removing Synonyms

    CREATE SYNONYM d_sum FOR dept_sum_vu;

REMOVING A SYNONYM

    DROP SYNONYM d_sum;

Introduction to PL/SQL

PL/SQL is a procedural language which includes SQL. It adds programming concepts such as variables, conditional branching (IF..THEN), loops and error handling. PL/SQL combines the power of SQL with procedural processing.

There are two types of blocks in PL/SQL:

1. Anonymous Blocks:
    have no name (like scripts)
    can be written and executed immediately in SQL*Plus
    can be used in a trigger

2. Named Blocks:
    Procedures
    Functions

Anonymous Blocks

A block is the basic structure in which you write PL/SQL code. It starts with the DECLARE section, where you declare all the variables you need in the block. This section is optional: if you have no variables, you do not need a DECLARE section. Next comes the BEGIN section, which is mandatory, followed by the optional EXCEPTION section where you handle errors, and the block ends with END;. An anonymous block is not stored in the database, so to run it again you usually save the code in a file.

Block Structure

    DECLARE
        -- where you declare variables, for example:
        var_customer   number(4);   -- a variable named var_customer of data type number
        var_numofrows  number(6);
    BEGIN
        -- where you actually perform the operation
        -- embedded select inside the PL/SQL block
        Select count(*) into var_numofrows from invoices where customer_no = var_customer;
        -- display the value which you got
        dbms_output.put_line('The number of invoices we have for the customer ' || var_customer || ' is ' || var_numofrows);
    EXCEPTION
        -- where you handle the error
        WHEN others THEN
            dbms_output.put_line('Error: ' || SQLERRM);
    END;

Advantages of PL/SQL

    Modularity  Reusability  Maintenance  Abstraction  Performance  Data Integrity  Data security

Difference between SQL & PL/SQL

SQL is a non-procedural, interactive language. PL/SQL is a programming language in which we can declare variables and write code to process a job.

Variables and Constants

Variables and constants are used to hold and manipulate values within PL/SQL. In the DECLARE section of a block you declare each variable and its data type. Suppose we want to hold the customer number in a block: how do we declare the variable? First we need to know the data type of the customer number, whether it is a number or a char data type. If it is a number data type, then

    var_custno NUMBER(5);

To declare a constant value:

    ticket_price CONSTANT number(4) := 150;
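A short sketch of a variable and a constant used together in an anonymous block follows. It reuses the ticket_price constant from the text; var_tickets and var_total are hypothetical names added for the illustration, and SET SERVEROUTPUT ON is the SQL*Plus setting that makes dbms_output visible.

```sql
SET SERVEROUTPUT ON

DECLARE
    ticket_price CONSTANT NUMBER(4) := 150;  -- a constant must be initialised and cannot be reassigned
    var_tickets  NUMBER(3) := 4;             -- an ordinary variable with an initial value
    var_total    NUMBER(7);                  -- NULL until a value is assigned
BEGIN
    var_total := ticket_price * var_tickets;
    dbms_output.put_line('Total cost is ' || var_total);
END;
/
```

Run in SQL*Plus, the block prints "Total cost is 600". Attempting to assign a new value to ticket_price inside the block would fail at compile time.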

SQL data types - CHAR, DATE, NUMBER, VARCHAR
PL/SQL data types - BOOLEAN, BINARY_INTEGER, EXCEPTION

Example for %TYPE

What is the use of a %TYPE declaration in PL/SQL instead of hardcoding the datatype?

    DECLARE
        var_custno    number(3);
        var_custname  varchar(100);
    BEGIN
        Select customer_name into var_custname from customer where customer_no = var_custno;
        dbms_output.put_line('Customer Name is ' || var_custname);
    EXCEPTION
        WHEN no_data_found then
            dbms_output.put_line('Customer does not exist in Customer table');
    END;

Suppose this code is in a production database and, some time later, the customer_no column is changed from number(3) to number(4). Your program then has to change as well, otherwise you will get a numeric or value error when a four-digit customer number is assigned to the variable. If we use %TYPE instead of number(3), we do not need to change the program: the variable takes its datatype from the table column, so a change to the column's datatype is reflected in the PL/SQL block automatically.

    DECLARE
        var_custno    CUSTOMER.CUSTOMER_NO%TYPE;
        var_custname  CUSTOMER.CUSTOMER_NAME%TYPE;
    BEGIN
        Select customer_name into var_custname from customer where customer_no = var_custno;
        dbms_output.put_line('Customer Name is ' || var_custname);
    EXCEPTION
        WHEN no_data_found then
            dbms_output.put_line('Customer does not exist in Customer table');
    END;

Example for %ROWTYPE

With %ROWTYPE we can assign a whole selected row of a table to one variable. A %ROWTYPE variable has as many fields as there are columns in the table from which the row type is defined.

The main use of %ROWTYPE is that we can pass the whole row to a function or procedure instead of passing all the columns as separate arguments, which makes maintenance easier.

Declaring the %ROWTYPE variable:

    variable_name TABLENAME%ROWTYPE;

Example

    DECLARE
        var_custno   CUSTOMER.CUSTOMER_NO%TYPE;
        var_custrec  CUSTOMER%ROWTYPE;
    BEGIN
        var_custno := &CustomerNumber;
        SELECT * into var_custrec FROM customer WHERE customer_no = var_custno;
        dbms_output.put_line(var_custrec.customer_name || ' , ' || var_custrec.cust_addr);
    EXCEPTION
        WHEN no_data_found then
            dbms_output.put_line('No data found for the Customer no you entered');
    END;

Record Datatype

A record in PL/SQL is a variable which groups fields of more than one datatype. First we declare the record type, and then we declare a variable of that type so that we can use it in the block. So there are two steps, and the syntax is

    TYPE rec_datatype IS RECORD (var_name1 datatype, var_name2 datatype, ...);
    var_rec rec_datatype;

Example for RECORD type

    DECLARE
        TYPE custinforec IS RECORD (
            var_custno    customer.cust_no%TYPE,
            var_custname  customer.cust_name%TYPE);
        var_custrec custinforec;
    BEGIN
        SELECT cust_no, cust_name into var_custrec FROM customer WHERE cust_no = 1123;
        -- fields are referenced by the names given in the record type
        dbms_output.put_line(var_custrec.var_custno);
    EXCEPTION
        WHEN no_data_found then
            dbms_output.put_line('No data found for the query');
    END;

Variable Scope

    DECLARE
        -- Outermost block
        var_customer number(4);
    BEGIN
        -- In this block we can see the var_customer variable, which is declared in this block.
        DECLARE
            -- Inner block
            var_innername varchar(20);
        BEGIN
            -- In this block we can see var_innername as well as var_customer, which is
            -- declared in the outermost block. We cannot refer to var_inexception from
            -- this block because it is declared in a different block. A block sees its
            -- own variables as well as the variables of its enclosing blocks.
            NULL;
        EXCEPTION
            -- Error handling for the inner block
            WHEN others THEN NULL;
        END;
    EXCEPTION
        WHEN others THEN
            -- Another block, nested inside the exception handler
            DECLARE
                var_inexception number(4);
            BEGIN
                -- In this block we can refer to var_inexception as well as var_customer,
                -- which is declared in the outermost block. We cannot refer to
                -- var_innername from this block because it is declared in a different block.
                NULL;
            EXCEPTION
                -- Error handling for the exception block, i.e. the current block
                WHEN others THEN NULL;
            END;
    END;
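The scope rules above can be seen in a small runnable sketch. The variable names follow the skeleton in the text; the initial values are made up for the illustration.

```sql
SET SERVEROUTPUT ON

DECLARE
    var_customer NUMBER(4) := 1001;              -- visible in this block and in nested blocks
BEGIN
    DECLARE
        var_innername VARCHAR2(20) := 'inner';   -- visible only inside this nested block
    BEGIN
        -- The inner block sees its own variable and the outer one.
        dbms_output.put_line('Inner block sees ' || var_customer || ' and ' || var_innername);
    END;

    -- Back in the outer block: var_innername is out of scope here, and
    -- referring to it would be a compile-time error.
    dbms_output.put_line('Outer block sees ' || var_customer);
END;
/
```

The block prints one line from the inner block and one from the outer block; uncommenting a reference to var_innername after the inner END would stop the whole block from compiling.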

Control Statements and Loops

These are used to control PL/SQL logic with conditional structures, with loops and with unconditional branching.

    PL/SQL Control Statement   Description
    IF-THEN-ELSE               Condition: if the expression is true, execute one sequence of statements, else another sequence.
    LOOP                       Repeat a statement or set of statements unconditionally. You break out of the loop using the EXIT statement.
    FOR-LOOP                   Repeat a statement or set of statements a fixed number of times.
    WHILE-LOOP                 Repeat a statement or set of statements until the condition is FALSE.
    GOTO                       Branch to a new set of statements.

A Loop is nothing but executing the same block of code more than one time. PL/SQL supports the Loops, various types of loops in PL/SQL are as shown in this page. IF..THEN Statement is used to check certain condition, if the condition is TRUE then execute the THEN set of statements, otherwise execute ELSE set of statements. IF-THEN ELSE DECLARE var_num1 number(4); var_num2 number(4); BEGIN var_num1 := 10; var_num2 := 20; IF var_num1 > var_num2 THEN dbms_output.put_line('The largest number is ' || to_char(var_num1)); ELSE dbms_output.put_line('The largest number is ' || to_char(var_num2)); END IF; END;

    DECLARE
        var_checkno number(4);
    BEGIN
        var_checkno := &tocheck;
        IF mod(var_checkno,2) = 0 THEN
            dbms_output.put_line('Even number');
        ELSE
            dbms_output.put_line('Odd number');
        END IF;
    END;

The following PL/SQL block lets you enter 3 numbers and finds the largest one. Here you can see an IF..THEN within another IF..THEN.

    DECLARE
        first_num  number(3);
        sec_num    number(3);
        third_num  number(3);
    BEGIN
        first_num := &number1;
        sec_num   := &number2;
        third_num := &number3;
        IF first_num > sec_num THEN
            IF first_num > third_num THEN
                dbms_output.put_line('First Number ' || to_char(first_num) || ' is the greatest of all entered numbers');
            ELSE
                dbms_output.put_line('Third Number ' || to_char(third_num) || ' is the greatest of all entered numbers');
            END IF;
        ELSE
            IF sec_num > third_num THEN
                dbms_output.put_line('Second Number ' || to_char(sec_num) || ' is the greatest of all entered numbers');
            ELSE
                dbms_output.put_line('Third Number ' || to_char(third_num) || ' is the greatest of all entered numbers');
            END IF;
        END IF;
    END;

IF-THEN-ELSIF

Enter a customer number. If the total number of orders is less than 1000 the customer is OK, between 1000 and 2000 the customer is GOOD, otherwise the customer is a TOP CUSTOMER.

    DECLARE
        var_custno  CUSTOMER.CUSTNO%TYPE;
        var_orders  number(10);
    BEGIN
        var_custno := &CustomerNo;
        Select count(order_id) into var_orders From orders where customer_no = var_custno;
        IF var_orders < 1000 THEN
            dbms_output.put_line(to_char(var_custno) || ' is a OK customer');
        ELSIF var_orders between 1000 and 2000 THEN
            dbms_output.put_line(to_char(var_custno) || ' is a GOOD customer');
        ELSE
            dbms_output.put_line(to_char(var_custno) || ' is a TOP customer');
        END IF;
    END;

Unconditional Loop

What is an unconditional loop? It is a loop which is entered first and only then checks the condition for getting out of the loop. A conditional loop, by contrast, checks the condition first and, based on the result, decides whether to enter the loop or bypass it and continue with the next statement in the block.

    DECLARE
        var_running number(4);
    BEGIN
        var_running := 1;
        Loop
            var_running := var_running + 1;
            dbms_output.put_line(' The current number is ' || to_char(var_running));
            If var_running > 101 then
                Exit;
            End If;
        End Loop;
    END;

While Loop

The syntax for the While loop is

    WHILE condition LOOP
        -- pl/sql statements
    END LOOP;

For Loop

If you know the number of times you are going to execute the code, you can use the For loop in PL/SQL. The syntax for the For loop is

    For var in starting_no..ending_no Loop
        -- write the code to execute so many times
    End Loop;

Example: display the even numbers between 1 and 200 using a For loop in a PL/SQL block.

    DECLARE
        var_runningvalue number(3);
    BEGIN
        dbms_output.put_line('Even numbers between 1 and 200');
        dbms_output.put_line('-----------------------------------');
        For var_runningvalue in 1..200 Loop
            If mod(var_runningvalue,2) = 0 then
                dbms_output.put_line(var_runningvalue);
            End If;
        End Loop;
    END;

GOTO Statement

In a block we can skip some of the statements and jump to another position in the code using the GOTO statement. The form of the GOTO statement is GOTO label_name; and the target is marked with <<label_name>>.

Example

    DECLARE
        var_empno     employee.employeeno%type;
        var_empname   employee.employeename%type;
        var_empstate  employee.state_code%type;
        var_salary    number(12,2);
        var_statetax  state.state_tax%type;
    BEGIN
        Select state_code, salary into var_empstate, var_salary
        From employee Where employeeno = var_empno;
        IF var_empstate = 'TX' THEN
            GOTO texas;
        END IF;
        Select state_tax into var_statetax From state where state = var_empstate;
        var_salary := var_salary - (var_salary * var_statetax/100);
        <<texas>>
        var_salary := var_salary + 0;   -- just add 0 to var_salary if the state is Texas
    END;
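The WHILE loop is described earlier only by its syntax, so here is a minimal sketch of it in use. The counter name and the limit of 5 are arbitrary choices for the illustration.

```sql
SET SERVEROUTPUT ON

DECLARE
    var_counter NUMBER(3) := 1;
BEGIN
    -- The condition is tested before every pass, so the body may run zero times.
    WHILE var_counter <= 5 LOOP
        dbms_output.put_line('Iteration ' || var_counter);
        var_counter := var_counter + 1;   -- without this the loop would never end
    END LOOP;
END;
/
```

The block prints "Iteration 1" through "Iteration 5". If var_counter started at 6, nothing would be printed, which is the difference from the unconditional LOOP shown earlier.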

CURSORS is the way you loop through the rows returned by a Select statement. Say for example from SQL* Plus if you write SELECT customer_name FROM customer returns the set of rows from customer table. If your corporation decides to give some discounts for your customers, based on how loyal he is, how much business we do with that customer etc etc, now we need to check some of the stuff before giving discounts, so we need to check one by one row from customer table and make a decision based on the rules. So here we cannot use a single update statement to the related tables. Now Cursor comes into picture. A cursor is nothing but a result set through which you can fetch one by one row. They are two different types of cursors

Explicit Cursors
Implicit Cursors

Implicit cursors: when you issue a select statement, the server executes the query, stores the rows in a memory area on the server and returns the rows in network packets to the client. Here you have no control over the result set.

Explicit cursors: SQL statements where you have control over the result set and can fetch the rows from it one by one.

Things to be done while dealing with cursors:

Declare a cursor
Open a cursor
Fetch data into variables from the cursor
Close the cursor

The following picture gives some idea about the cursor.

Remember that while working with cursors you must provide the same number of variables as the number of columns selected in the select statement of the cursor. You cannot fetch once you have closed the cursor; if you do, it raises an exception called invalid_cursor. You can fetch only after opening the cursor.

Cursor Attributes

%ISOPEN returns TRUE if the cursor is already open and FALSE if it is not.
%NOTFOUND returns TRUE if the last fetch statement did not return a row.
%FOUND returns TRUE if the last fetch statement returned a row.
%ROWCOUNT returns the total number of rows fetched so far.

How do you declare a Cursor? This is done in the DECLARE section of a PL/SQL block.

DECLARE
    var_custname customer.cust_name%type;
    CURSOR getcustnames IS SELECT cust_name FROM customer;
BEGIN
    OPEN getcustnames;   -- opening a cursor executes the sql and places the rows in a server memory area
    LOOP
        FETCH getcustnames into var_custname;   -- fetch the current record
        Exit When getcustnames%NOTFOUND;   -- once all rows are consumed, %NOTFOUND becomes true
        dbms_output.put_line(var_custname);   -- display the customer name
    END LOOP;
    Close getcustnames;   -- close the cursor so that the server releases the memory
END;

Passing Arguments to a cursor

DECLARE
    TYPE id_emp_table IS TABLE OF emp.empno%type INDEX BY BINARY_INTEGER;
    v_deptno number(2);
    v_empno_hold emp.empno%type;
    i BINARY_INTEGER := 1;
    -- cursor parameters take a data type without a size
    CURSOR get_empno(v_in_dept number) IS SELECT empno FROM emp where deptno = v_in_dept;
    empno_plsql_table id_emp_table;
BEGIN
    v_deptno := 30;
    Open get_empno(v_deptno);
    Loop
        Fetch get_empno into v_empno_hold;
        If get_empno%FOUND then
            empno_plsql_table(i) := v_empno_hold;
            i := i + 1;
        Else
            Exit;
        End If;
    End Loop;
    Close get_empno;
    -- i points one past the last stored element
    For j in 1..i - 1 Loop
        dbms_output.put_line( empno_plsql_table(j) );
    End Loop;
END;
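The cursor attributes listed earlier can be seen together in one small sketch. It reuses the emp table from the example above; the cursor name c_emp and the variable v_empno are made up for illustration, and server output is assumed to be enabled.

```plsql
DECLARE
    CURSOR c_emp IS SELECT empno FROM emp WHERE deptno = 30;
    v_empno emp.empno%TYPE;
BEGIN
    -- %ISOPEN guards against opening a cursor twice
    IF NOT c_emp%ISOPEN THEN
        OPEN c_emp;
    END IF;

    LOOP
        FETCH c_emp INTO v_empno;
        EXIT WHEN c_emp%NOTFOUND;   -- TRUE once a fetch returns no row
        -- %ROWCOUNT counts the rows fetched so far: 1, 2, 3, ...
        dbms_output.put_line(c_emp%ROWCOUNT || ': ' || v_empno);
    END LOOP;

    -- read %ROWCOUNT before CLOSE; after closing, the attribute raises invalid_cursor
    dbms_output.put_line('Rows fetched: ' || c_emp%ROWCOUNT);
    CLOSE c_emp;
END;
/
```

Checking %NOTFOUND immediately after the FETCH matters: if the check came after the put_line call, the last row would be printed twice.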

FOR UPDATE cursor

DECLARE
    CURSOR c1 IS select empno, deptno from emp FOR UPDATE;   -- locks the selected rows
BEGIN
    For c1_record IN c1 Loop
        If c1_record.deptno = 40 then
            DELETE from emp WHERE CURRENT OF c1;   -- deletes the row just fetched
        End If;
    End Loop;
    COMMIT WORK;
END;

PROCEDURE

A procedure is a PL/SQL block wrapped up under a name so that it can be saved in the database.

What is the difference between a PL/SQL block and a Procedure? When you execute a PL/SQL block, the RDBMS checks the syntax, parses the queries, creates the execution plan and then executes the block. When you create a stored procedure, the database checks the syntax and parses the queries at the time the procedure is saved and stores that information, so that when the stored procedure is executed it does not repeat that work but uses the existing information.

Syntax

CREATE OR REPLACE PROCEDURE procedure_name ( argument1 in/out data type, argument2 in/out data type.... ) AS
    PL/SQL Block
END procedure_name;

An argument can be an IN argument, an OUT argument or an IN OUT argument.

IN - passes a value from the calling environment into the procedure.
OUT - returns a value from the procedure to the calling program.
IN OUT - passes a value from the calling program, and the called program passes some other calculated value back to the calling program through the same variable.

The following diagram explains the difference between IN and IN OUT arguments passed to a stored procedure or stored function.

From where can we call stored procedures? We can call a stored procedure from a PL/SQL block, another stored procedure, a function or a trigger.
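The three argument modes described above can be sketched with one small procedure. The procedure apply_discount and its parameter names are hypothetical and are not part of the original notes; server output is assumed to be enabled.

```plsql
CREATE OR REPLACE PROCEDURE apply_discount (
    p_pct   IN     NUMBER,   -- IN: read-only value from the caller
    p_price IN OUT NUMBER,   -- IN OUT: passed in, then overwritten with the discounted price
    p_saved OUT    NUMBER    -- OUT: set by the procedure and handed back to the caller
) AS
BEGIN
    p_saved := p_price * p_pct / 100;
    p_price := p_price - p_saved;
END apply_discount;
/

-- Calling the procedure from an anonymous PL/SQL block
DECLARE
    v_price NUMBER := 200;
    v_saved NUMBER;
BEGIN
    apply_discount(10, v_price, v_saved);
    dbms_output.put_line('price=' || v_price || ' saved=' || v_saved);
END;
/
```

The block prints price=180 saved=20. OUT and IN OUT arguments must be variables in the caller, since the procedure writes to them; a literal such as 10 is only allowed for an IN argument.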

PRODUCT_ID   PRODUCT_NM         QTY_ON_HAND   PRICE_PER_QTY   REORDER_LEVEL
1250         GEM MONITORS       25            $125            10
1251         Microsoft Win 98   100           $50             50

Write a stored procedure that, when you sell a product, checks whether qty_on_hand is equal to or less than the reorder level and, if so, inserts a row into the orders table. If an order for that product was already placed within the last 2 days, do not place another one.

CREATE OR REPLACE PROCEDURE check_update_reorder ( prod_id in number, curr_qty in number ) is
    v_reorder_level products.reorder_level%type;
    v_recent_orders number;
begin
    select reorder_level into v_reorder_level from products where product_id = prod_id;
    If curr_qty <= v_reorder_level then
        -- a select in PL/SQL needs an INTO clause; count the orders of the last 2 days
        select count(*) into v_recent_orders from orders
        where product_id = prod_id and order_date >= trunc(sysdate) - 2;
        If v_recent_orders = 0 then
            insert into orders ( order_id, product_id, order_date )
            values ( order_seq.nextval, prod_id, sysdate );
        End If;
    End If;
end check_update_reorder;

FUNCTION

A function is a stored PL/SQL program which takes arguments, performs some operation and returns a value to the calling program.

Difference between Procedure and Function: a procedure need not return a value to the calling program, whereas a function must always return a value to the calling program.

Syntax

CREATE OR REPLACE FUNCTION function_name ( argument1 in/out data type, argument2 in/out data type.... ) RETURN data type AS
    PL/SQL block
END function_name;

While writing a function we must have a return statement within the PL/SQL block. You cannot execute a function in the same way as a stored procedure. You call a function from a PL/SQL block, from a SQL statement, or from another stored procedure or stored function, because the function returns a value and that value has to be received, for example in a variable.

Write a function to get the customer name from the customer table by passing the customer number.

CREATE OR REPLACE FUNCTION get_custname ( var_custno CUSTOMER.CUST_NO%TYPE ) return varchar2 AS
    var_custnmhold CUSTOMER.CUST_NAME%TYPE;
BEGIN
    SELECT cust_name into var_custnmhold FROM customer WHERE cust_no = var_custno;
    Return var_custnmhold;
EXCEPTION
    WHEN no_data_found then
        Return ' ';
END get_custname;

Write a function to update the customer name by passing the customer number and the new name. If the row is found and updated, return 1, else return -1.

CREATE OR REPLACE FUNCTION func_upt_custname ( var_custno customer.cust_no%TYPE, var_custname customer.cust_name%TYPE ) return number IS
BEGIN
    Update customer set cust_name = var_custname Where cust_no = var_custno;
    IF SQL%FOUND then
        Return 1;    -- at least one row was updated
    ELSE
        Return -1;   -- no row matched the customer number
    END IF;
EXCEPTION
    WHEN Others THEN
        Return -1;
END func_upt_custname;
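As a sketch of how the two functions above might be called: the customer number 1001 and the new name are made-up values, the variable names are illustrative, and server output is assumed to be enabled.

```plsql
-- Calling the functions from an anonymous PL/SQL block:
-- the returned value is received in a variable.
DECLARE
    v_name   customer.cust_name%TYPE;
    v_result NUMBER;
BEGIN
    v_name := get_custname(1001);   -- 1001 is a hypothetical customer number
    dbms_output.put_line('Customer: ' || v_name);

    v_result := func_upt_custname(1001, 'New Name');
    IF v_result = 1 THEN
        COMMIT;   -- the function itself does not commit the update
    END IF;
END;
/

-- Calling a function from a SQL statement
SELECT cust_no, get_custname(cust_no) AS cust_name
FROM customer;
```

Only get_custname is used in the SELECT. A function that changes data, such as func_upt_custname, cannot normally be called from a query, so it is called from PL/SQL instead.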

PACKAGES

A package is an object in which you put all the related procedures and functions together. A package has two parts: the Package Spec and the Package Body. The Package Spec is the object in which you declare the names and arguments of the procedures and functions you are going to group together, i.e. the declaration part of the procedures and functions. In the Package Body we write the code for all the procedures and functions declared in the package spec. Every procedure and function declared in the spec must be implemented in the body, otherwise you will get an error when you try to save the body.

Syntax to create the Package Spec

CREATE OR REPLACE PACKAGE <spec_name> AS
    -- declare variables here so that any procedure or function within this package can use them
    -- subprogram declarations, for example:
    PROCEDURE invoice_monthly_report ( var_mnthyear char );
    FUNCTION check_invoice_balance ( var_invno number ) RETURN number;
END <spec_name>;

Syntax to create the Package Body

CREATE OR REPLACE PACKAGE BODY <spec_name> AS
    PROCEDURE invoice_monthly_report ( var_mnthyear char ) AS
        -- declare variables
    BEGIN
        -- write the pl/sql code
    EXCEPTION
        -- handle the exceptions
    END invoice_monthly_report;

    FUNCTION check_invoice_balance ( var_invno number ) RETURN number AS
        -- declare the variables
    BEGIN
        -- write pl/sql code
        -- return the value to the calling program
    EXCEPTION
        -- handle the exception
        -- return the value (may be -1 if the program failed)
    END check_invoice_balance;
END <spec_name>;

Note that inside a package the subprograms are written without CREATE, and argument data types are given without a size (char, number).
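The package syntax above is only a skeleton, so here is a small complete sketch. The package name customer_pkg, its variable g_call_count and the customer number 1001 are hypothetical; the customer table and its cust_no and cust_name columns come from the earlier function examples.

```plsql
CREATE OR REPLACE PACKAGE customer_pkg AS
    g_call_count NUMBER := 0;   -- package variable, visible to every subprogram below

    FUNCTION get_custname (p_custno customer.cust_no%TYPE) RETURN VARCHAR2;
    PROCEDURE print_custname (p_custno customer.cust_no%TYPE);
END customer_pkg;
/

CREATE OR REPLACE PACKAGE BODY customer_pkg AS
    FUNCTION get_custname (p_custno customer.cust_no%TYPE) RETURN VARCHAR2 AS
        v_name customer.cust_name%TYPE;
    BEGIN
        g_call_count := g_call_count + 1;
        SELECT cust_name INTO v_name FROM customer WHERE cust_no = p_custno;
        RETURN v_name;
    EXCEPTION
        WHEN no_data_found THEN
            RETURN NULL;   -- no such customer
    END get_custname;

    PROCEDURE print_custname (p_custno customer.cust_no%TYPE) AS
    BEGIN
        -- a subprogram can call its siblings in the same package directly
        dbms_output.put_line(get_custname(p_custno));
    END print_custname;
END customer_pkg;
/

-- From outside the package, prefix the subprogram with the package name
BEGIN
    customer_pkg.print_custname(1001);
END;
/
```

The spec and the body are created as two separate statements. The spec can be compiled on its own, which lets other code reference the package before the body is written.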

TRIGGERS

A trigger is a stored program which gets executed when an event occurs on a table, the event being an insert, update or delete statement. You cannot call a trigger like a stored procedure or a function, and you cannot pass arguments to a trigger. The following are the different types of triggers on a table.

Insert Trigger (Before Statement, Before Row, After Row, After Statement)
Update Trigger (Before Statement, Before Row, After Row, After Statement)
Delete Trigger (Before Statement, Before Row, After Row, After Statement)
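As an illustration of one of the trigger types listed above, here is a sketch of a Before Row trigger for insert and update. It assumes the products table with a qty_on_hand column from the stored procedure example; the trigger name and the error number are made up for illustration.

```plsql
CREATE OR REPLACE TRIGGER trg_products_qty_check
BEFORE INSERT OR UPDATE OF qty_on_hand ON products
FOR EACH ROW   -- row-level trigger: fires once for every affected row
BEGIN
    -- :NEW holds the column values the statement is about to store
    IF :NEW.qty_on_hand < 0 THEN
        -- raising an error makes the triggering statement fail
        RAISE_APPLICATION_ERROR(-20001, 'qty_on_hand cannot be negative');
    END IF;
END;
/
```

Leaving out FOR EACH ROW would make this a statement-level trigger, which fires once per statement and cannot refer to :NEW or :OLD.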
