
Implementing Data Flow in SQL Server Integration Services 2008

Course 10058

Table of Contents

Defining Data Sources and Destinations
    Introduction
    Introduction to Data Flows
    Data Flow Sources
        Object Linking and Embedding Database (OLE DB)
        Flat file
        Raw file
        Excel
        XML
        ADO.NET
    Data Flow Destinations
        Valid Data Destinations
        Invalid Data Destinations
    Configuring Access and Excel Data Sources
Data Flow Paths
    Introduction
    Introduction to Data Flow Paths
    Data Viewers
        Grid
        Histogram
        Scatter Plot
        Column Chart
Implementing Data Flow Transformations: Part 1
    Introduction
    Introduction to Transformations
    Data Formatting Transformations
    Column Transformations
    Multiple Data Flow Transformations
    Custom Transformations
    Slowly Changing Dimension Transformation
Implementing Data Flow Transformations: Part 2
    Introduction
    Creating a Lookup and Cache Transformation
    Data Analysis Transformations
    Data Sampling Transformations
    Audit Transformations
    Fuzzy Transformations
    Term Transformations
Best Practices
Lab: Implementing Data Flow in SQL Server Integration Services 2008
    Lab Overview
    Scenario
    Exercise Information
    Lab Instructions
    Lab Review
Module Summary
Glossary

Defining Data Sources and Destinations


Introduction
Lesson Introduction

SSIS provides support for a wide range of data sources and destinations within a package. The starting point of a Data Flow task is to define the data source, which informs the Data Flow task of the location of the data that will be moved. Depending on the data source used, different properties must be configured, and understanding the properties that are available within a data source will help you configure them efficiently.

Data destinations are objects within the Data Flow task that must be configured separately from data sources. Like data sources, they consist of properties that need to be configured to inform SSIS of the destination that the data will be loaded into. There are also additional data destinations, such as Analysis Services.

Lesson Objectives

After completing this lesson, you will be able to:

- Describe data flows.
- Use data flow sources.
- Use data flow destinations.
- Configure an OLE DB data source.
- Configure Microsoft Office Access and Microsoft Office Excel data sources.

Introduction to Data Flows


Data flows are configured within the Data Flow task to determine the location of the source data, the destination that the data will be inserted into, and, optionally, any transformations that are performed on the data as it moves between the source and the destination.

SQL Server Integration Services starts by defining a data source. Depending on the data source chosen, different properties will have to be configured. Typically, you define connection information that includes the server name and database name of the source data if you are accessing a table within a database, or the file name if the source is a text or raw file. You can also define more than one data source.

You can then optionally add one or more transformations after the data source is defined. Transformations are used to modify the data so that it can be standardized. SQL Server Integration Services provides a wide variety of transformations to meet an organization's requirements, and each transformation contains different properties to control how the data is changed.

Finally, you define the data destinations into which the transformed data is loaded. Like data sources, the properties that are configured differ depending on the data destination chosen, and you are not limited to one data destination. To connect data sources, transformations, and data destinations together, you use Data Flow paths to control the flow of the Data Flow task.

Data Flow Sources


SSIS provides a range of data source connections that you can use to access source data from a wide variety of technologies. Additional sources are available for download, such as the Microsoft Connectors for Oracle and Teradata by Attunity and the Microsoft SQL Server 2008 Feature Pack.

Object Linking and Embedding Database (OLE DB)

Using OLE DB, you can access data that exists within SQL Server, Access, and Excel, and you can connect to OLE DB providers for third-party databases. With OLE DB, you can access data directly from tables or views within a database. You can also use SQL statements to target specifically the data that you wish to return and take advantage of SQL clauses, such as ORDER BY, to retrieve the data. Furthermore, parameters can be defined in the SQL statement by using ? (question mark) markers and mapping each parameter to an SSIS variable. The following properties can be configured:

- Connection Manager page. Here, you can define a connection to the server, the database, and the authentication by clicking the New button. The Data Access Mode list defines how to access the data; options can include Table or View, Table Name or View Name from a variable, SQL Command, or SQL Command from a variable. Depending on what is selected, the options change so that you can select a specific table, view, or variable, or manually type the SQL command. There is also a Preview button to view the data.
- Columns page. You can use this page to view the Available External Columns and choose which columns are part of the data source; those selected appear under External Columns. You can also rename the output of a column by typing a different name in the Output Column list.
- Error Output page. You can use this page to control the error handling options. Should the data fail, you can ignore the failure, redirect the row, or fail the component, and this can be specified separately for errors caused by data truncation and for general data errors. The Column property lists the columns that are part of the data source, and you can add an optional description.
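The positional ? markers described above can be sketched outside SSIS as well. In this illustrative Python example, sqlite3 stands in for the OLE DB provider (it happens to use the same ? parameter convention), and the table, columns, and variable are made up:

```python
import sqlite3

# Stand-in for an OLE DB source: sqlite3 also uses positional "?" markers,
# the same convention SSIS uses to map SQL parameters to package variables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER, Region TEXT, Amount REAL)")
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)",
                 [(1, "East", 100.0), (2, "West", 250.0), (3, "East", 75.0)])

# In SSIS, each "?" would be mapped to a package variable in the
# parameter-mapping dialog; here a plain Python variable plays that role.
region_variable = "East"
rows = conn.execute(
    "SELECT OrderID, Amount FROM Orders WHERE Region = ? ORDER BY OrderID",
    (region_variable,),
).fetchall()
print(rows)  # → [(1, 100.0), (3, 75.0)]
```

Because the marker is positional, the order of the mapped variables must match the order of the ? markers in the statement, just as in the SSIS parameter mapping.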

Flat file

You can connect to text files by using the Flat File data source connection. This allows you to control how the text file is structured by defining the column and row delimiters. You can also define whether the first row contains headers and provide information about the width of the columns and the locale of the text file. The following properties can be configured:

- Connection Manager page. Here, you can define a connection to the text file by clicking the New button. This opens the Flat File Connection Manager Editor, where you can define the location of the text file, the column and row delimiters, whether the text is qualified, the locale of the text file, and whether the first row contains headings. Once defined, you can preview the data by clicking the Preview button. You can also specify whether null columns in the text file are retained by selecting the check box next to Retain null values from the data source as null values in the data flow.
- Columns page. This page enables you to view the Available External Columns and choose which columns are part of the data source; those selected appear under External Columns. You can also rename the output of a column by typing a different name in the Output Column list.
- Error Output page. You can use this page to control the error handling options. Should the data fail, you can ignore the failure, redirect the row, or fail the component, and this can be specified separately for errors caused by data truncation and for general data errors. The Column property lists the columns that are part of the data source, and you can add an optional description.
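The main Flat File options — the column delimiter, whether the first row contains headers, and how empty fields arrive — have direct analogues in Python's csv module. A minimal sketch, with a made-up file layout:

```python
import csv
import io

# Stand-in for a flat file: the "|" column delimiter and the
# first-row-contains-headers option mirror the Flat File Connection
# Manager settings. The column names and data are made up.
text_file = io.StringIO("FirstName|LastName|City\nEduardo|Pla|Seattle\nAnnik|Stahl|")

reader = csv.DictReader(text_file, delimiter="|")  # header row -> column names
rows = list(reader)

# An empty field arrives as "" rather than None -- loosely analogous to
# SSIS treating empty strings and NULLs differently unless the
# "retain null values" option is selected.
print(rows[1]["City"])  # → "" (empty string, not None)
```

As in SSIS, getting the delimiter and header settings right up front determines whether every downstream column mapping works.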

In the advanced properties, the Fast Parse property provides a fast, simple set of routines for parsing data. These routines are not locale-sensitive, and they support only a subset of date, time, and integer formats. By enabling Fast Parse, a package forfeits its ability to interpret date, time, and numeric data in locale-specific formats.

Raw file

The Raw File data flow source retrieves raw data that was previously written by the Raw File destination. It allows for fast reading and writing of data and is typically used as an intermediary data file in a larger data load operation. The Raw File source has fewer configuration options than the text file source, and because no translation of the data is required, extraction is fast. There is no Error Output page for this data source, as little parsing of the data is required. The following properties can be configured:

- Connection Manager page. Here, you can define a connection to the raw file by first specifying the Access mode, which can be either a file name or a file name from a variable. If Filename is selected, you can browse to the raw file in the file system; if Filename from variable is selected, you can select the variable from a drop-down list.
- Columns page. This page enables you to view the Available External Columns and choose which columns are part of the data source; those selected appear under External Columns. You can also rename the output of a column by typing a different name in the Output Column list.

Excel

Connecting to Excel 2007 requires an OLE DB connection that uses the Microsoft Office 12.0 Access Database Engine OLE DB provider. For earlier versions of Excel, use the Excel Source data source component. The options are similar to those of the OLE DB data source, except that you point the connection manager to the Excel file, and any named ranges defined in Excel are the equivalent of tables and views. The following properties can be configured:

- Connection Manager page. Here, you can define a connection to the Excel file by clicking the New button and browsing to the Excel file in the Excel Connection Manager dialog box. The Data Access Mode list defines how to access the data; options can include Table or View, Table Name or View Name from a variable, SQL Command, or SQL Command from a variable. Depending on what is selected, the options change so that you can select a specific table, view, or variable, or manually type the SQL command, using the worksheet name as the equivalent of a table name in the FROM clause. There is also a Preview button to view the data.
- Columns page. You can use this page to view the Available External Columns and choose which columns are part of the data source; those selected appear under External Columns. You can also rename the output of a column by typing a different name in the Output Column list.
- Error Output page. You can use this page to control the error handling options. Should the data fail, you can ignore the failure, redirect the row, or fail the component, and this can be specified separately for errors caused by data truncation and for general data errors. The Column property lists the columns that are part of the data source, and you can add an optional description.
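In SQL Command mode, the worksheet name plays the role of a table name. The common convention with the Jet/ACE providers is to suffix the sheet name with $ and wrap it in brackets; the sheet names below are made up for illustration. A small sketch of building such a command:

```python
# Build the SELECT text you would type into the Excel source's SQL Command
# box. "[SheetName$]" is the conventional Jet/ACE way to address a worksheet
# as if it were a table; the sheet names here are hypothetical.
def excel_query(sheet_name, columns="*"):
    return "SELECT {cols} FROM [{sheet}$]".format(cols=columns, sheet=sheet_name)

print(excel_query("Sheet1"))                     # → SELECT * FROM [Sheet1$]
print(excel_query("Orders", "OrderID, Amount"))  # → SELECT OrderID, Amount FROM [Orders$]
```

A named range, by contrast, is addressed without the $ suffix, which is why SSIS treats named ranges as the closer equivalent of tables and views.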

XML

The XML data source helps you retrieve data from an XML source document. You can also specify a schema — either an inline schema or a separate XML Schema Definition (XSD) file — to describe the structure of the XML content. Document Type Definition (DTD) files are not supported. A schema can support only a single namespace; schema collections are not supported. Note that the XML source does not validate the data in the XML file against the XSD file. The following properties can be configured:

- Connection Manager page. The Data Access Mode list defines how to access the XML data; options can include XML file location, XML file from variable, or XML data from variable. Depending on what is selected, the options change so that you can select a specific file or variable in the list below the Data Access Mode. You can also define whether the XML file or fragment works in conjunction with an XSD file: if the schema is embedded in the XML data itself, select the Use inline schema check box; otherwise, refer to a separate XSD file by clicking the Browse button next to the XSD location box. There is also a Preview button to view the data.
- Columns page. You can use this page to view the Available External Columns and choose which columns are part of the data source; those selected appear under External Columns. You can also rename the output of a column by typing a different name in the Output Column list.
- Error Output page. You can use this page to control the error handling options. Should the data fail, you can ignore the failure, redirect the row, or fail the component, and this can be specified separately for errors caused by data truncation and for general data errors. The Column property lists the columns that are part of the data source, and you can add an optional description.
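Reading an XML document into rows of column values — without validating it against an XSD, just as the XML source behaves — can be sketched with Python's standard library. The document and element names below are made up for illustration:

```python
import xml.etree.ElementTree as ET

# A hypothetical XML fragment standing in for an XML source document.
# Nothing here validates the data against a schema; the structure is
# simply read into output "columns", mirroring the SSIS XML source.
document = """
<Customers>
  <Customer id="1"><Name>Eduardo</Name></Customer>
  <Customer id="2"><Name>Annik</Name></Customer>
</Customers>
"""

root = ET.fromstring(document)
names = [customer.find("Name").text for customer in root.findall("Customer")]
print(names)  # → ['Eduardo', 'Annik']
```

If the document violated its XSD (say, a missing Name element), this kind of non-validating read would surface the problem only downstream, which is why the lesson stresses that the XSD describes structure rather than enforces it.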

ADO.NET

You can use the ADO.NET source to connect to a database and retrieve data by using .NET. The options available within the ADO.NET data source are very similar to those of the OLE DB data source, and it can use the .NET provider for OLE DB to create a DataReader, which loads a single row of data into memory at a time. However, unlike the OLE DB data source, the ADO.NET data source can also use non-OLE DB connections, such as the .NET providers for ODBC. The following properties can be configured:

- Connection Manager page. Here, you can define a connection to the server, the database, and the authentication by clicking the New button. The Data Access Mode drop-down list defines how to access the data; options can include Table or View, Table Name or View Name from a variable, SQL Command, or SQL Command from a variable. Depending on what is selected, the options change so that you can select a specific table, view, or variable, or manually type the SQL command. There is also a Preview button to view the data.
- Columns page. You can use this page to view the Available External Columns and choose which columns are part of the data source; those selected appear under External Columns. You can also rename the output of a column by typing a different name in the Output Column list.
- Error Output page. You can use this page to control the error handling options. Should the data fail, you can ignore the failure, redirect the row, or fail the component, and this can be specified separately for errors caused by data truncation and for general data errors. The Column property lists the columns that are part of the data source, and you can add an optional description.
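The DataReader behaviour mentioned above — one row held in memory at a time — is analogous to iterating a database cursor rather than fetching every row at once. A minimal sketch, with sqlite3 standing in for a .NET data provider and a made-up table:

```python
import sqlite3

# Stand-in for an ADO.NET connection; the table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (Name TEXT)")
conn.executemany("INSERT INTO People VALUES (?)", [("Ana",), ("Ben",), ("Chi",)])

# Iterating the cursor pulls rows forward one at a time, like a DataReader,
# instead of materialising the whole result set with fetchall().
cursor = conn.execute("SELECT Name FROM People ORDER BY Name")
first = next(iter(cursor))  # only this row has been retrieved so far
print(first)  # → ('Ana',)
```

The forward-only, row-at-a-time pattern is what keeps memory use flat however large the result set grows.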

Data Flow Destinations


Valid Data Destinations

- Excel
- Recordset
- Flat file
- SQL Server
- OLE DB
- SQL Server Compact
- ADO.NET
- Raw file
- SQL Server Analysis Services (SSAS) partition
- SSAS dimension
- SSAS data mining training model

Invalid Data Destinations

- SQL Server Reporting Services (SSRS)
- Access
- XML

Configuring Access and Excel Data Sources


Before working with the data sources in the Data Flow task, create the connection managers first so that they can easily be reused within the Data Flow task. There are considerations to be mindful of when using Access and Excel in your SSIS package.

Excel

To connect to Excel, it is important to understand that different connection managers are used depending on the version of Excel that you are connecting to.

To connect to a workbook in Excel 2003 or an earlier version of Excel, you must create an Excel connection manager from the Connection Managers area. To create an Excel connection manager, perform the following steps:

1. In Business Intelligence Development Studio, open the package.
2. In the Connection Managers area, right-click anywhere in the area, and then select New Connection.
3. In the Add SSIS Connection Manager dialog box, select Excel, and then configure the connection manager.

To connect to a workbook in Excel 2007, you must create an OLE DB connection manager from the Connection Managers area. To create an OLE DB connection manager, perform the following steps:

1. In Business Intelligence Development Studio, open the package.
2. In the Connection Managers area, right-click anywhere in the area, and then select New OLE DB Connection.
3. In the Configure OLE DB Connection Manager dialog box, click New.
4. In the Connection Manager dialog box, for Provider, select Microsoft Office 12.0 Access Database Engine OLE DB, and then configure the connection manager as appropriate.

Access

To connect to Access, it is important to understand that different connection managers are used depending on the version of Access that you are connecting to. If you want to connect to a data source in Access 2003 or an earlier version of Access, you must create an OLE DB connection manager that uses the Microsoft Jet 4.0 OLE DB Provider. To create it, perform the following steps:

1. In Business Intelligence Development Studio, open the package.
2. In the Connection Managers area, right-click anywhere in the area, and then select New OLE DB Connection.
3. In the Configure OLE DB Connection Manager dialog box, click New.
4. In the Connection Manager dialog box, for Provider, select Microsoft Jet 4.0 OLE DB Provider, and then configure the connection manager as appropriate.

If you want to connect to a data source in Access 2007, you must create an OLE DB connection manager from the Connection Managers area. To create an OLE DB connection manager, perform the following steps:

1. In Business Intelligence Development Studio, open the package.
2. In the Connection Managers area, right-click anywhere in the area, and then select New OLE DB Connection.
3. In the Configure OLE DB Connection Manager dialog box, click New.
4. In the Connection Manager dialog box, for Provider, select Microsoft Office 12.0 Access Database Engine OLE DB, and then configure the connection manager as appropriate.
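The version-to-provider decision described in these steps can be summarized as a simple lookup. The provider display names come from this lesson; the short ProgIDs in the comments are the commonly used connection-string equivalents and are an assumption here, not something shown in these dialog boxes:

```python
# Map an Office product and version to the connection choice described in
# this lesson. ProgIDs in comments (Microsoft.ACE.OLEDB.12.0,
# Microsoft.Jet.OLEDB.4.0) are the usual connection-string forms -- an
# assumption, not part of the SSIS dialogs themselves.
def provider_for(product, version):
    if version >= 2007:
        # ProgID: Microsoft.ACE.OLEDB.12.0 (both Access and Excel 2007)
        return "Microsoft Office 12.0 Access Database Engine OLE DB"
    if product == "Access":
        # ProgID: Microsoft.Jet.OLEDB.4.0
        return "Microsoft Jet 4.0 OLE DB Provider"
    # Excel 2003 and earlier use a dedicated Excel connection manager
    # rather than an explicitly chosen OLE DB provider.
    return "Excel connection manager"

print(provider_for("Access", 2003))  # → Microsoft Jet 4.0 OLE DB Provider
```

The practical point is that the 2007 file formats share one provider across Access and Excel, while the older formats each take their own path.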

Data Flow Paths


Introduction
Lesson Introduction

Data Flow paths are similar to Control Flow paths in that they control the flow of data within a Data Flow task. A Data Flow path can simply connect a data source directly to a data destination. Typically, however, you use Data Flow paths to determine the order in which transformations take place, specifying the path that is taken should a transformation succeed or fail. This provides the ability to separate the data that causes errors from the data that is being transformed successfully.

You can add data viewers to a Data Flow path. This enables you to get a snapshot of the data that is being transformed, which is useful when developing packages and you wish to see the data before and after it is transformed.

Lesson Objectives

After completing this lesson, you will be able to:

- Describe Data Flow paths.
- Configure a Data Flow path.
- Describe a data viewer.
- Use a data viewer.


Introduction to Data Flow Paths


Data Flow paths play an important role in controlling the order in which data is transformed between a source connection and the destination connection. Here you can control the flow of the data when a Data Flow component executes successfully, and control the flow of the data when the Data Flow component fails. This enables you to create robust data flows.

When a data source or transformation is added to the Data Flow designer, a green arrow appears underneath the Data Flow component. You can click and drag the arrow to connect it to another Data Flow component. This indicates that, on successful execution of the first Data Flow component, the data flow provides input data to the next Data Flow component. When this is done, a red arrow appears under the original Data Flow component. You can click and drag this to another Data Flow component, typically a data destination. This red path represents the error output of the Data Flow component: rows that fail are redirected along it to the component that it is connected to. In this manner, you can control the workflow of the Data Flow task by using Data Flow paths.

A Data Flow path can be configured by double-clicking it. Properties include its name and description. You can also view the metadata of the data that is involved in the data flow. Data viewers can also be configured so that you can view the data as it passes through the data flow.
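Conceptually, the green and red paths partition the rows a component processes. The following sketch (invented names, not SSIS code) shows how a component's success output and error output split the data:

```python
# Illustrative sketch (not SSIS code): how a component's success (green) and
# error (red) outputs partition rows. In SSIS, the error output also carries
# error code and error column information alongside the failed row.
def route_rows(rows, transform):
    """Apply `transform` to each row; route failures to the error output."""
    success_output, error_output = [], []
    for row in rows:
        try:
            success_output.append(transform(row))
        except ValueError:
            error_output.append(row)  # red path: the row that caused the error
    return success_output, error_output

ok, bad = route_rows(["10", "20", "oops"], int)
```

Here `ok` receives the converted rows and `bad` receives the one row that could not be converted, so downstream components can process clean data while the failures are sent elsewhere.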


Data Viewers
A data viewer is a useful debugging tool that enables you to view the data as it passes through the data flow between two Data Flow components. You can apply data viewers to any Data Flow path so that you can view the state of the data at each stage of the Data Flow task. Data viewers provide four different methods for viewing the data.

A data viewer window shows data one buffer at a time. By default, the data flow pipeline limits buffers to about 10,000 rows. If the data flow extracts more than 10,000 rows, it passes the data through the pipeline in multiple buffers. For example, if the data flow extracts 25,000 rows, the first two buffers will each contain about 10,000 rows, and the third buffer will contain about 5,000 rows. You can advance to the next buffer by clicking the green arrow in the data viewer window.

Grid

The Grid data viewer type returns the data in rows and columns in a table. This is useful if you want to view the impact that a transformation has had on the data. The data viewer allows you to copy the data within the data viewer so that it can be stored in a separate file, such as an Excel file.

Histogram

Working with numeric data only, the Histogram data viewer type allows you to select one column from the data flow. The histogram then displays the distribution of numeric data within the specified column. This is useful if you wish to view the frequency of particular numeric values within a specific column. You can also copy the results to an external file.

Scatter Plot

The Scatter Plot data viewer type allows you to select two numeric columns from a data source. This information is then plotted on the X-axis and Y-axis of a chart. With this data viewer, you can see how the numeric data from the two columns is related. This information can be copied to an external file.

Column Chart

The Column Chart data viewer type allows you to select one column from the data flow. This presents a column chart that shows the number of occurrences of each value within the data, which can provide an indication of the data values that are stored within the column. The result can be copied to an external file.
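The buffer arithmetic in the 25,000-row example can be sketched as follows. The function is invented for illustration; only the default of about 10,000 rows per buffer comes from the text above.

```python
# Illustrative sketch: how a large extract is split into ~10,000-row buffers,
# matching the 25,000-row example above. DEFAULT_MAX_ROWS mirrors the default
# pipeline setting described in the text; the function itself is invented.
DEFAULT_MAX_ROWS = 10_000

def buffer_sizes(total_rows, max_rows=DEFAULT_MAX_ROWS):
    """Return the row count of each buffer the pipeline would produce."""
    sizes = []
    while total_rows > 0:
        sizes.append(min(total_rows, max_rows))
        total_rows -= sizes[-1]
    return sizes
```

So `buffer_sizes(25_000)` yields three buffers of 10,000, 10,000 and 5,000 rows, and a data viewer would show each of those buffers in turn.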


Implementing Data Flow Transformations: Part 1


Introduction
Lesson Introduction

Data Flow transformations have the ability to ensure that your BI solution provides one version of the truth when it comes to providing data to the data warehouse. The transformations can be used to change the format of data, sort and group data, and perform custom transformations. This ensures that the data is placed within the data warehouse in a standardized format that can then be consumed by Analysis Services as a cube, by Reporting Services as reports, or by a combination of both. Understanding the capabilities of the many transformations that are available will aid you in building a trusted data warehouse.

Lesson Objectives

After completing this lesson, you will be able to:

- Describe transformations.
- Use data formatting transformations.
- Use column transformations.
- Use multiple Data Flow transformations.
- Use custom transformations.
- Implement transformations.
- Use the Slowly Changing Dimension transformation.


Introduction to Transformations
Transformations are the unique aspect of SQL Server Integration Services that allows you to change data as it is being moved from a source connection to a destination connection, such as from a text file to a table within a database. Transformations can be simple, such as performing a straight copy of the data between a source and a destination, or complex, such as performing fuzzy lookups on the data being moved. However, all can be used to standardize and cleanse the data, an important objective when loading a data warehouse.


Data Formatting Transformations


Data formatting transformations convert data as it passes through the data flow. By using these transformations, you can change data types, adjust value lengths, convert values to a different case or perform a number of other operations. Sorting and grouping transformations reorganize data as it passes through the data flow.

Character Map transformation

The Character Map transformation applies string operations to the data. For example, you can convert data from lowercase to uppercase for a State column in a customers table. The transformation can be performed in place, or a new output column can be generated from the character map conversion.

Mapping Operations with the Character Map Transformation

The following table describes the mapping operations that the Character Map transformation supports.

Value                 Description
Lowercase             Convert to lowercase.
Uppercase             Convert to uppercase.
Byte reversal         Convert by reversing byte order.
Hiragana              Convert Japanese katakana characters to hiragana.
Katakana              Convert Japanese hiragana characters to katakana.
Half width            Convert full-width characters to half-width.
Full width            Convert half-width characters to full-width.
Linguistic casing     Apply linguistic rules of casing (Unicode simple case mapping for Turkic and other locales) instead of the system rules.
Simplified Chinese    Convert traditional Chinese characters to simplified Chinese.
Traditional Chinese   Convert simplified Chinese characters to traditional Chinese.


Mutually Exclusive Mapping Operations

More than one operation can be performed in a transformation. However, some mapping operations are mutually exclusive. The following table lists restrictions that apply when you use multiple operations on the same column. Operations in the columns Operation A and Operation B are mutually exclusive.

Operation A           Operation B
Lowercase             Uppercase
Hiragana              Katakana
Half width            Full width
Traditional Chinese   Simplified Chinese
Lowercase             Hiragana, katakana, half-width, full-width
Uppercase             Hiragana, katakana, half-width, full-width

You use the Character Map Transformation Editor dialog box to make the changes by using the following properties:

- Available Input Columns. The Available Input Columns area enables you to select the columns that the operation will affect. When a column is selected, it appears in the Input Columns list.
- Destination column. You use the Destination column to determine whether the change will generate a new column or be an in-place change.
- Operation column. The Operation column provides a drop-down list to specify the operation that occurs on the data, such as Uppercase.
- Output Alias column. The Output Alias column allows you to specify the column name for a new destination column, or it retains the same column name for transformations that are in-place changes.

Data Conversion transformation

The Data Conversion transformation converts data from one data type to another during the data flow and creates a new column with the converted data. This can be useful when data is extracted from different data sources and needs standardizing before being loaded into a single destination. Like the Character Map transformation, this may cause some of the data to be truncated; you can use the Configure Error Output option to handle these types of errors. The Data Conversion task can be configured by using the following properties:

- Available Input Columns. The Available Input Columns area enables you to select the columns that the operation will affect; when a column is selected, it appears in the Input Columns list.


- Output Alias column. The Output Alias column allows you to define a name for the new column. You can then set the DataType, Length, Precision and Scale for the data to be converted. Furthermore, the Code Page is used to define the code page for any columns that use the DT_STR data type.

Sort transformation

The Sort transformation takes data from an input and sorts the data in ascending or descending order before passing it to the output. The Sort transformation can perform multiple sorts on different columns within the same transformation, and duplicate values can be removed by the sort operation. Any columns that are not part of the sort operation are passed through to the transformation output.

Within the Sort Transformation Editor dialog box, the Available Input Columns area enables you to select the columns that the operation will affect. When a column is selected, it appears in the Input Columns list. The Output Alias defines the name of the output column, which defaults to the same name as the input column. The Sort Type property determines whether the sort operation is ascending or descending, and the Sort Order property controls which column is sorted first when multiple columns are defined; the lowest number specified is the first column to be sorted. Comparison Flags can be set to ignore case and ignore character width. To remove duplicate values, ensure that the Remove rows with duplicate sort values check box is selected. The Sort transformation does not support Error Output configuration.

Aggregate transformation

Not only does the Aggregate transformation apply aggregate functions to a set of numeric data to create a new transformation output, it can also apply the equivalent of the Transact-SQL GROUP BY clause, which allows you to apply aggregate functions to groups of data. The Aggregate Transformation Editor dialog box contains two tabs. On the Aggregations tab, the Available Input Columns area enables you to select the columns that the operation will affect. When a column is selected, it appears in the Input Columns list. The Output Alias defines the name of the output column. The Operation column determines the aggregate function that is used, or the Group by operation can be selected.

Comparison flags can be configured to refine the data that is aggregated, such as ignoring spacing. The Count Distinct Scale property can be used to specify an approximate number of distinct values, and the Count Distinct Keys property can be used to specify an exact number of distinct values. Alternatively, by clicking the Advanced button, you can use the Keys property to specify the exact number of groups, or the Keys Scale property to specify an approximate number of groups, that the aggregation is expected to produce. These values can be used to improve the performance of the Aggregate transformation, and they can also be configured on the Advanced tab. The Aggregate transformation does not support Error Output.
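The group-and-aggregate behavior described above can be sketched as follows. This is not SSIS code: the function and the column names are invented, and only the Group by plus Sum semantics come from the text.

```python
# Illustrative sketch (not SSIS code): the effect of an Aggregate
# transformation configured with Group by on `territory` and Sum on `amount`.
from collections import defaultdict

def aggregate_sum(rows, group_col, sum_col):
    """Group rows by `group_col` and sum `sum_col`, like GROUP BY ... SUM()."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_col]] += row[sum_col]
    return dict(totals)

orders = [
    {"territory": 1, "amount": 100.0},
    {"territory": 2, "amount": 50.0},
    {"territory": 1, "amount": 25.0},
]
totals = aggregate_sum(orders, "territory", "amount")
```

The three input rows collapse to one output row per territory, with the amounts summed within each group.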


Column Transformations
Column transformations copy and create columns in the data flow. These transformations enable you to import large files, such as images or documents, into the data flow, or to export the same to a file.

Copy Column transformation

The Copy Column transformation takes a data flow input and creates a new column as the transformation output. You have the ability to create multiple copies of the same column. The Copy Column Transformation Editor dialog box consists of the Available Input Columns property, which enables you to select the columns that the copy operation will affect. When a column is selected, it appears in the Input Columns list. The Output Alias allows you to define the name of the output column. The Copy Column transformation does not support Error Output configuration.

Derived Column transformation

The Derived Column transformation allows you to create a new column, or replace values in an existing column, by using expressions that combine variables, functions, operators and columns from the transformation input. You can use this task to concatenate columns, use functions to extrapolate information from existing input columns and perform mathematical calculations. The Derived Column Transformation Editor dialog box contains an expression editor used to create expressions within the Expression property. The Derived Column property allows you to determine whether the operation will create a new column or replace values in an existing column. This setting affects the Derived Column Name property, which allows you to specify the name for the column. You can then set the DataType, Length, Precision and Scale for the data to be derived. Furthermore, the Code Page is used to define the code page for any columns that use the DT_STR data type. The Derived Column transformation may cause some of the data to be truncated; you can use the Configure Error Output to handle these types of errors.
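The effect of a Derived Column expression, such as concatenating two name columns, can be sketched as follows. The column names are invented for the example; in SSIS the expression is written in the SSIS expression language rather than Python.

```python
# Illustrative sketch: the effect of a Derived Column expression such as
# FirstName + " " + LastName producing a new FullName column.
def add_derived_column(rows, new_col, expression):
    """Add `new_col` to each row by evaluating `expression(row)`."""
    for row in rows:
        row[new_col] = expression(row)
    return rows

people = [{"FirstName": "Ann", "LastName": "Lee"}]
add_derived_column(people, "FullName",
                   lambda r: r["FirstName"] + " " + r["LastName"])
```

Each row gains a FullName value derived from its existing columns; choosing "replace" instead of a new column would write the result back into an existing column.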
Import Column transformation

The Import Column transformation reads data from a file and imports it into a column in the data flow. This transformation does the opposite of the Export Column transformation, adding text and images stored in separate files to a data flow. The Import Column transformation contains three tabs:

- Component Properties tab. The Component Properties tab allows you to define a Name and Description for the task and configure the locale for the task by using the LocaleID property. The ValidateExternalMetadata property defines whether the transformation is validated against external data at design time or when it is executed.

- Input Columns tab. The Input Columns tab consists of the Available Input Columns property, which enables you to select the columns that the operation will affect. When a column is selected, it appears in the Input Columns list. The Output Alias allows you to define the name of the output column. The Usage Type property defines whether the data imported is READONLY or READWRITE.
- Input and Output Properties tab. The Input and Output Properties tab enables you to configure additional properties for the input and output columns.

Export Column transformation

The Export Column transformation allows you to take images and documents that are stored within the data flow and export them to a file. Specifically, the data types that can be exported to a file are DT_IMAGE, DT_TEXT and DT_NTEXT. The Export Column Transformation Editor dialog box contains the following properties. The Extract Column property allows you to select the input column to be transferred. The File Path Column must point to a column within the input columns that specifies the file name. Both of these properties are mandatory. You can then use the Allow Append and Force Truncate check boxes to determine whether a new file is created for the exported data or an existing file is used, if present.

How the settings for the Append and Truncate options affect results:

Append  Truncate  File exists  Results
False   False     No           The transformation creates a new file and writes the data to the file.
True    False     No           The transformation creates a new file and writes the data to the file.
False   True      No           The transformation creates a new file and writes the data to the file.
True    True      No           The transformation fails design time validation. It is not valid to set both properties to True.
False   False     Yes          A run-time error occurs. The file exists, but the transformation cannot write to it.
False   True      Yes          The transformation deletes and re-creates the file and writes the data to the file.
True    False     Yes          The transformation opens the file and writes the data at the end of the file.
True    True      Yes          The transformation fails design time validation. It is not valid to set both properties to True.

The Write Byte-Order Mark property specifies whether to write a byte-order mark (BOM) to the file. A BOM is only written if the data has the DT_NTEXT or DT_WSTR data type and is not appended to an existing data file.


Multiple Data Flow Transformations


Multiple Data Flow transformations enable you to take a data input and separate the data based on an expression. For example, in the Conditional Split transformation, if your data flow includes employee information, you can split the data flow according to the cities in which the employees work. Multiple Data Flow transformations also enable you to join data together. For example, you can bring data together from separate data sources by using transformations such as the Merge or Union All transformations.

Conditional Split transformation

The Conditional Split transformation takes a single data flow input and creates multiple data flow outputs based on multiple conditional expressions defined within the transformation. The order of the conditional expressions is important. If a record satisfies the first condition, the data is routed based on that condition even if it also meets the second condition; the record is then no longer available to be evaluated against the second condition. An expression can be a combination of functions and operators that defines a condition. The Conditional Split Transformation Editor dialog box contains an expression editor and a number of properties that can be used to configure the conditional split. The Order property determines the order in which the conditions are evaluated. You can then provide an Output Name for the data that is output by each condition. The Condition property allows you to define the expression for the condition. Examples include:
SUBSTRING(FirstName,1,1) == "A"
TerritoryID == 1
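The first-match routing described above can be sketched as follows. Python predicates stand in for the SSIS expressions, and the output names and row data are invented for the example.

```python
# Illustrative sketch: Conditional Split routing. Each row goes to the output
# of the FIRST condition it satisfies; unmatched rows go to the default output.
def conditional_split(rows, conditions):
    """`conditions` is an ordered list of (output_name, predicate) pairs."""
    outputs = {name: [] for name, _ in conditions}
    outputs["default"] = []
    for row in rows:
        for name, predicate in conditions:
            if predicate(row):
                outputs[name].append(row)
                break  # first match wins; later conditions never see this row
        else:
            outputs["default"].append(row)
    return outputs

rows = [{"FirstName": "Ann", "TerritoryID": 1},
        {"FirstName": "Bob", "TerritoryID": 1},
        {"FirstName": "Cal", "TerritoryID": 2}]
split = conditional_split(rows, [
    ("StartsWithA", lambda r: r["FirstName"][:1] == "A"),  # SUBSTRING(FirstName,1,1) == "A"
    ("Territory1", lambda r: r["TerritoryID"] == 1),       # TerritoryID == 1
])
```

Note that Ann satisfies both conditions but is routed only to the first output, which is exactly the ordering behavior the transformation exhibits.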

You can use the Configure Error Output to handle errors.

Multicast transformation

The Multicast transformation allows you to output multiple copies of the same data flow input to different data flow outputs. This transformation can be useful when you wish to output the same data so that it can be transformed differently further down the data flow. For example, one output may be summarized using an Aggregate transformation, while the other output is used as a basis to provide more detailed information in a separate data flow. The properties of the Multicast Transformation Editor dialog box can only be viewed once the outputs of the transformation have been configured. Within the editor, you are presented with an Outputs pane on the left, which shows you the outputs that the Multicast transformation is generating. By selecting an output, the Properties pane shows read-only information such as the Identification String and ID properties. The only properties that you can change are the Name and Description properties. The Multicast transformation does not support Error Output configuration.


Merge transformation

The Merge transformation takes multiple inputs and merges the data together from the separate inputs. A prerequisite for the merge to work successfully is that the input columns are sorted. Furthermore, the columns that are merged must be of compatible data types. For example, you cannot merge an input that has a character data type with a second input that has a numeric data type. The Merge Transformation Editor dialog box consists of a number of columns, depending on how many inputs are connected to the Merge transformation. For example, if three inputs are defined, then four columns will appear; if two inputs are defined, then three columns appear, and so on. The first column is the Output column, which allows you to define a name for the output data flow. The second column is called Merge Input 1; in this column, you map the input column to the output column. The third column is called Merge Input 2; again, you map the input column to the output column. If more inputs are defined, the number of Merge Input columns increases. The Merge transformation does not support Error Output configuration.

Merge Join transformation

The Merge Join transformation is similar to the Merge transformation. However, you can make use of the equivalent of the Transact-SQL FULL, LEFT or INNER join clauses to determine how the data is merged. Like the Merge transformation, the input columns must be sorted, and the columns that are joined must have compatible data types. You must also specify the type of join the Merge Join transformation will use and how it will handle nulls in the data. The Merge Join Transformation Editor dialog box has, at the top, a Join Type drop-down list that allows you to specify the type of join that will be used in the transformation. The Input property enables you to select the columns that the Merge Join operation will affect. When a column is selected, it appears in the Input Columns list, and the Input column determines which data flow input the data comes from. The Output Alias allows you to define the name of the data flow output.

Union All transformation

The Union All transformation is very similar to the Merge transformation. The key difference is that the Union All transformation does not require the input columns to be sorted. However, the columns that are mapped must still have compatible data types. The Union All Transformation Editor dialog box consists of a number of columns, depending on how many inputs are connected to the Union All transformation. For example, if three inputs are defined, then four columns will appear; if two inputs are defined, then three columns appear, and so on. The first column is the Output column, which allows you to define a name for the output data flow. The second column is called Union All Input 1; in this column, you map the input column to the output column. The third column is called Union All Input 2; again, you map the input column to the output column. If more inputs are defined, the number of Union All Input columns increases. The Union All transformation does not support Error Output configuration.
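The requirement that Merge Join inputs be sorted is what allows the join to run in a single pass. The following sketch shows an INNER join over two pre-sorted inputs; the data and function are invented, and for brevity it assumes the join key is unique on each side (the real transformation also handles duplicate keys).

```python
# Illustrative sketch: an INNER Merge Join over two inputs that are already
# sorted on the join key, as the transformation requires. Assumes unique keys
# on each side for brevity; SSIS also supports LEFT and FULL joins.
def merge_join_inner(left, right, key):
    """Two-pointer inner join of two lists of dicts sorted on `key`."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1          # advance the side with the smaller key
        elif lk > rk:
            j += 1
        else:
            merged = dict(left[i])
            merged.update(right[j])
            out.append(merged)
            i += 1
            j += 1
    return out

customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [{"id": 2, "total": 50}, {"id": 3, "total": 75}]
joined = merge_join_inner(customers, orders, "id")
```

Only the row with id 2 appears in the output, since it is the only key present in both sorted inputs; a LEFT or FULL join would additionally emit the unmatched rows with nulls.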

Custom Transformations
Many of the transformations provided within SSIS will meet many of your business requirements when performing ETL operations. However, there may be situations when the provided transformations do not offer a solution. You can use the Script Component transformation to create custom transformations by using .NET. The OLE DB Command transformation allows you to apply Transact-SQL statements to data within a Data Flow path.

Script Component transformation

The Script Component transformation enables you to add custom data sources, transformations and destinations by using .NET code, which can be programmed in Visual Basic 2008 or Visual C# 2008. It is similar to the Script task within the control flow of an SSIS package, but is used within the Data Flow task. In order to use the Script Component, the local machine on which the package runs must have Microsoft Visual Studio Tools for Applications installed. This provides a rich environment for building custom scripts, including IntelliSense and its own Object Explorer. You can access Microsoft Visual Studio Tools for Applications from within the Script Component on the Script page by clicking the Edit Script button. The Script page is also where you can define the scripting language, and it allows you to specify a Name and Description for the Script Component. You can also specify a locale with the LocaleID property, and whether the data flow is validated at run time or design time, by using the ValidateExternalMetadata property. You can also specify the ReadOnlyVariables and ReadWriteVariables that are available to the Script Component. When the Script Component is added to the data flow, you are first prompted to select the Script Component Type. This determines whether the Script Component is used as a Source, a Transformation or a Destination, and it affects the Script Component editor. The following properties can be configured:

- Input Columns tab. The Input Columns tab consists of the Input Name, which determines the data flow input to use, and the Available Input Columns property, which enables you to select the columns that the Script Component operation will affect. When a column is selected, it appears in the Input Columns list. The Output Alias allows you to define the name of the output column. The Usage Type property defines whether the data is READONLY or READWRITE.
- Input and Output Properties tab. The Input and Output Properties tab allows you to set the properties of the input and output columns.
- Connection Managers tab. The Connection Managers tab allows you to define connection information that is used by the Script Component. This includes a Name and Description property for the connection. The Connection Manager property allows you to select a predefined connection manager, or to Add or Remove connection managers.

Note that the Script Component does not support error outputs.


OLE DB Command transformation

The OLE DB Command transformation enables you to apply SQL statements to each row within the data flow. The SQL statement can include data manipulation statements such as INSERT, UPDATE and DELETE. The SQL statement can accept parameters, which are represented as question marks (?) within the SQL statement. Each question mark is named Param_0, Param_1 and so on. You can use the OLE DB Command transformation to make changes to the data as it passes through the data flow. For example, a change in the tax rate for selling products can be applied by using the OLE DB Command transformation as the data runs through the data flow. The changed data becomes the output of the OLE DB Command transformation. The Advanced Editor for OLE DB Command dialog box contains four tabs that allow you to configure the transformation:

- Connection Managers tab. The Connection Managers tab allows you to define connection information that is used within the data flow. This includes a Name and Description property for the connection. The Connection Manager property allows you to select a predefined connection manager.
- Component Properties tab. The Component Properties tab allows you to specify a Name and Description for the OLE DB Command task. You can also specify a locale with the LocaleID property, and whether the data flow is validated at run time or design time by using the ValidateExternalMetadata property. The SQLCommand property is where the SQL statement is defined; you can also use a property expression to define the content of the SQLCommand property. The CommandTimeout property defines the number of seconds the command has to run, and the DefaultCodePage property sets the code page for the SQL statement.
- Column Mappings tab. The Column Mappings tab allows you to map the columns from the data flow input to the parameters that are defined in the SQLCommand property. This is done by mapping the Available Input Columns to the Destination Columns.
- Input and Output Properties tab. The Input and Output Properties tab allows you to set the properties of the input and output columns.
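The per-row parameterized statement execution can be sketched with Python's sqlite3 module, which also uses ? placeholders. sqlite3 merely stands in for the OLE DB connection here, and the table and column names are invented for the tax-rate example.

```python
# Illustrative sketch: running a parameterized UPDATE for each row in the data
# flow, as the OLE DB Command transformation does. sqlite3 stands in for the
# OLE DB connection; the schema is invented for the tax-rate example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, TaxRate REAL)")
conn.executemany("INSERT INTO Product VALUES (?, ?)", [(1, 0.15), (2, 0.15)])

# Each incoming row supplies the values for the ? placeholders, in order
# (first ? = Param_0 = new rate, second ? = Param_1 = product id).
data_flow_rows = [(0.20, 1), (0.175, 2)]
for row in data_flow_rows:
    conn.execute("UPDATE Product SET TaxRate = ? WHERE ProductID = ?", row)

rates = dict(conn.execute("SELECT ProductID, TaxRate FROM Product"))
```

The Column Mappings tab performs exactly this association: each input column of the data flow is mapped to one of the positional ? parameters of the statement.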


Slowly Changing Dimension Transformation


The Slowly Changing Dimension transformation performs a very important role when loading and updating data within a dimension table in a data warehouse. Through the Slowly Changing Dimension transformation, you can manage changes to the data. Some of the data within a dimension may remain static; you can define this data as a fixed attribute, and any changes that occur to it will be treated as an error.

The Slowly Changing Dimension transformation supports two types of Slowly Changing Dimension. A Type 1 Slowly Changing Dimension is an overwrite of the original data, referred to as a changing attribute within the wizard. Here, no historical content is retained; this is useful for overwriting invalid data values. A Type 2 Slowly Changing Dimension is referred to as a historical attribute. Here, changed data generates a new row of data. The business key is used to identify that the records are related, and start and end dates are used to indicate which record is the current record.

A Type 3 Slowly Changing Dimension makes use of an additional attribute within the record to hold the record's original value alongside an attribute for the most recent value. This is not supported directly by the Slowly Changing Dimension Wizard. To overcome this, you can use the Slowly Changing Dimension transformation to identify a Type 3 column as fixed; on the output of these columns, you can then perform inserts and updates to carry out Type 3 changes. The Slowly Changing Dimension transformation makes the process of managing dimension data within a data warehouse straightforward.
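The difference between Type 1 (overwrite) and Type 2 (new historical row) can be sketched as follows. The field names and function are invented; the real transformation is configured through the Slowly Changing Dimension Wizard rather than coded by hand.

```python
# Illustrative sketch: Type 1 (overwrite in place) vs. Type 2 (expire the old
# row, insert a new current row) for a dimension keyed on a business key.
def apply_change(dimension, business_key, column, new_value, change_type, today):
    """Apply a change to the current row for `business_key`."""
    current = next(r for r in dimension
                   if r["key"] == business_key and r["end_date"] is None)
    if change_type == 1:             # changing attribute: overwrite, no history
        current[column] = new_value
    else:                            # Type 2, historical attribute: new row
        current["end_date"] = today  # expire the previously current row
        new_row = dict(current, **{column: new_value,
                                   "start_date": today, "end_date": None})
        dimension.append(new_row)

dim = [{"key": "C1", "city": "Leeds", "start_date": "2007-01-01", "end_date": None}]
apply_change(dim, "C1", "city", "York", change_type=2, today="2008-06-01")
```

After the Type 2 change, the dimension holds two related rows sharing the business key C1: the expired Leeds row and the new current York row, distinguished by their end dates.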


Implementing Data Flow Transformations: Part 2


Introduction
Lesson Introduction

Data Flow transformations can go beyond changing data by providing transformations that can perform data analysis, sampling and auditing.

Lesson Objectives

After completing this lesson, you will be able to:

- Use the Lookup and Cache transformations.
- Use data analysis transformations.
- Use data sampling transformations.
- Use monitoring transformations.
- Use fuzzy transformations.
- Use term transformations.


Creating a Lookup and Cache Transformation


The Lookup transformation enables you to take information from an input column and then look up additional information from another dataset that is linked to the input columns through a common column. The dataset can be a table, view, SQL query or cache file.

The Cache transformation is introduced in SQL Server 2008. The Cache transformation can be used to improve the performance of a Lookup transformation by connecting to a data source and populating a cache file on the server on which the package runs. This means that the Lookup transformation performs its lookup against the cache file rather than against a remote dataset. The Cache transformation requires a connection manager to point to the cache file, and it contains a Mappings tab where you can map the input columns to the cache file. Note that at least one of the columns must be marked as an index column.
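The cache-then-lookup pattern described above can be sketched as follows. An in-memory dictionary stands in for the cache file, and all names are invented for the example.

```python
# Illustrative sketch: a Lookup that enriches each row from a pre-built cache
# keyed on the common (index) column, instead of querying the reference
# dataset for every row.
def build_cache(reference_rows, index_col):
    """Cache step: load the reference dataset keyed on the index column."""
    return {row[index_col]: row for row in reference_rows}

def lookup(rows, cache, index_col, wanted_col):
    """Lookup step: add `wanted_col` from the cache; None when there is no match."""
    for row in rows:
        match = cache.get(row[index_col])
        row[wanted_col] = match[wanted_col] if match else None
    return rows

products = [{"ProductID": 1, "Name": "Bolt"}, {"ProductID": 2, "Name": "Nut"}]
cache = build_cache(products, "ProductID")
enriched = lookup([{"ProductID": 2}], cache, "ProductID", "Name")
```

The reference data is read once when the cache is built; every subsequent row is resolved locally, which is the performance benefit the Cache transformation provides.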


Data Analysis Transformations


SSIS provides a range of transformations that enable you to analyze data.

Pivot transformation

The Pivot transformation takes data from a normalized result set and presents the data in a cross-tabulated, or denormalized, structure. For example, a normalized Orders data set that lists customer name, product and quantity purchased typically has multiple rows for any customer who purchased multiple products, with each row for that customer showing order details for a different product. By pivoting the data set on the product column, the Pivot transformation can output a data set with a single row per customer. That single row lists all the purchases by the customer, with the product names shown as column names and the quantity shown as a value in each product column. Because not every customer purchases every product, many columns may contain null values. The Advanced Editor for Pivot dialog box contains three tabs to configure the properties:

Component Properties tab. The Component Properties tab allows you to specify a Name and Description for the Pivot transformation. You can also specify a locale with the LocaleID property and whether the data flow is validated at run time or design time by using the ValidateExternalMetadata property.

Input Columns tab. The Input Columns tab consists of the Available Input Columns property that enables you to select the columns that the Pivot transformation operation will affect. When a column is selected, it appears in the Input Columns list. The Output alias allows you to define the name of the output column. The Usage Type property defines whether the data imported is READONLY data or READWRITE data.

Input and Output Properties tab. The Input and Output Properties tab allows you to set the properties of the input and the output columns. The most important property here is the PivotUsage property. This determines what role the input column plays in creating the pivot table and can be configured with the following values:

o 0. The column is not pivoted, and the values are passed through to the transformation output.
o 1. The column is part of the set key that identifies one or more rows as part of one set.
o 2. The column is a pivot column. At least one column is created from each column value. This must be a sorted input column.
o 3. The values from this column are placed in columns that are created as a result of the pivot.
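For comparison, the same reshaping can be expressed with the T-SQL PIVOT operator. The table and product names below are hypothetical:

```sql
-- T-SQL analogue of the Pivot transformation: one row per customer,
-- one column per product (hypothetical Orders table and product names).
SELECT CustomerName, [Bike], [Helmet], [Gloves]
FROM (SELECT CustomerName, Product, Quantity FROM Orders) AS src
PIVOT (SUM(Quantity) FOR Product IN ([Bike], [Helmet], [Gloves])) AS p;
-- Customers who never bought a given product get NULL in that column.
```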

Unpivot transformation
The Unpivot transformation takes data from a denormalized or cross-tabulated result set and presents the data in a normalized structure. The Unpivot transformation can be configured with the following properties.


At the bottom of the Unpivot Transformation Editor dialog box is the Pivot key value column name. Here, you define a column heading for the column that will hold the pivoted data that is converted into normalized data, such as Products or Fruits. The Available Input Columns property enables you to select the input columns that the Unpivot transformation operation turns into rows. When a column is selected, it appears in the Input Columns list. Any columns that are not selected are passed through to the data flow output. The Destination Column allows you to define the name of the destination column in the normalized output. In the Unpivot scenario, multiple input columns are usually mapped to one destination column. For example, the Available Input Columns may consist of column headings such as Apples, Pears and Peaches. All of these input columns are mapped to a destination column named Fruits that may be defined by the Pivot key value column name property. The Pivot Key value property specifies the value that is used in the rows in the normalized result set; by default, it uses the same name as the input column but can be changed.

Data Mining Query transformation
The Data Mining Query transformation enables you to run Data Mining Extensions (DMX) statements that use prediction statements against a mining model. Prediction queries enable you to use data mining to make predictions, for example, about sales or inventory figures. You can then create a data flow output of the results. One transformation can execute multiple prediction queries if the models are built on the same data mining structure.
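The Unpivot reshaping described above corresponds to the T-SQL UNPIVOT operator. Using the same fruit example (the FruitSales table is hypothetical):

```sql
-- T-SQL analogue of the Unpivot transformation: cross-tabulated fruit
-- columns turned back into normalized rows (hypothetical FruitSales table).
SELECT CustomerName, Fruit, Quantity
FROM FruitSales
UNPIVOT (Quantity FOR Fruit IN ([Apples], [Pears], [Peaches])) AS u;
```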

Mining Model tab. The Mining Model tab is used to provide an existing Connection to the Analysis Services database. You can specify a new connection by clicking the New button. The Mining Structure allows you to specify the data mining structure that is to be used as a basis for analysis. A list of mining models is then presented.

Query tab. The Query tab allows you to write the DMX prediction query. A Build New Query button is provided to build the DMX prediction query through a builder.
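A DMX prediction query entered on the Query tab typically follows the shape below. The model name, column names, and source query are all hypothetical:

```sql
-- Hypothetical DMX prediction query for the Query tab.
SELECT t.CustomerKey,
       Predict([Bike Buyer]) AS PredictedBuyer
FROM [TM Decision Tree]
PREDICTION JOIN
  OPENQUERY([Adventure Works DW],
            'SELECT CustomerKey, Age, YearlyIncome FROM DimCustomer') AS t
ON  [TM Decision Tree].[Age] = t.Age
AND [TM Decision Tree].[Yearly Income] = t.YearlyIncome;
```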


Data Sampling Transformations


Data sampling transformations are useful when you want to extract sample data from the data flow or count the number of rows in the data flow. This can be useful in a number of different scenarios. Ultimately, the objective is to create a small data output that can be used for testing or development within the SSIS package.

Percentage Sampling transformation
The Percentage Sampling transformation allows you to select a percentage of random rows from a data flow input. This can be useful to generate a smaller set of data that is representative of the whole and can be used for development purposes. For example, in data mining, you can randomly divide a data set into two data sets: one for training the data-mining model, and one for testing the model. A random seed determines the randomness. If you use the Random Seed property, you can specify a number that the transformation will use. If you use the same number, it will always return the same result set, provided the sampling is based on the same source data. The Percentage Sampling transformation contains one screen that holds the properties to be configured. You can specify the percentage of rows to take from the data flow input by using the Percentage of Rows property. You can also provide a name for the data flow outputs generated for both the Sample Output Name and the Unselected Output Name. You can define your own random seed by specifying a value in the Specify random seed value property.

Row Sampling transformation
The Row Sampling transformation allows you to select an exact number of random rows from a data flow input. This can also be useful to generate a smaller, representative set of data for development purposes. For example, a company could run the transformation against an employee database to randomly select exactly 50 employees to receive prizes for a calendar year. A random seed determines the randomness. If you use the Random Seed property, you can specify a number that the transformation uses. If you use the same number, it always returns the same result set, provided the sampling is based on the same source data. The Row Sampling transformation contains two pages that hold properties to be configured:

Sampling page. The Sampling page allows you to specify the exact number of rows to take from the data flow input by using the Number of Rows property. You can also provide a name for the data flow outputs generated for both the Sample Output Name and the Unselected Output Name. You can define your own random seed by specifying a value in the Specify random seed value property.

Columns page. The Columns page consists of the Available Input Columns property that enables you to select the columns that the Row Sampling transformation operation affects. When a column is selected, it appears in the Input Columns list. The Output alias allows you to define the name of the output column.
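Similar sampling behavior can be approximated directly in T-SQL, which can help when testing outside the package. The Employee table here is hypothetical, and note that TABLESAMPLE samples by page, so the percentage is approximate:

```sql
-- Approximate T-SQL analogues of the sampling transformations
-- (hypothetical Employee table).

-- Percentage sampling: roughly 10 percent of rows (page-based, approximate).
SELECT * FROM Employee TABLESAMPLE (10 PERCENT);

-- Row sampling: exactly 50 random rows.
SELECT TOP (50) * FROM Employee ORDER BY NEWID();
```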

Row Count transformation
A Row Count transformation counts the rows that pass through the data flow and stores the result of the count in a variable. This variable can then be used elsewhere in the SSIS package. The following properties can be configured:

Component Properties tab. The Component Properties tab allows you to specify a Name and Description for the Row Count transformation. You can also specify a locale with the LocaleID property and whether the data flow is validated at run time or design time by using the ValidateExternalMetadata property. The most important property here is the Variable property. You use this to map the result of the Row Count transformation to a user-defined variable.

Input Columns tab. The Input Columns tab consists of the Available Input Columns property that enables you to select the columns that the Row Count operation affects. When a column is selected, it appears in the Input Columns list. The Output alias allows you to define the name of the output column. The Usage Type property defines whether the data imported is READONLY data or READWRITE data.

Input and Output Properties tab. The Input and Output Properties tab allows you to set the properties of the input and the output columns.
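Once the Row Count transformation has populated the variable, it can drive control flow elsewhere in the package. For example, a precedence constraint expression (the variable name here is hypothetical) could allow a downstream task to run only when rows were processed:

```
@[User::ProductRowCount] > 0
```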


Audit Transformations
The Audit transformation allows you to create additional output columns within the data flow that hold metadata about the SSIS package. These columns map to system variables that exist within the SSIS package. The following information is available within the Audit transformation and appears in a drop-down list in the AuditType property:

ExecutionInstanceGUID
PackageID
PackageName
VersionID
ExecutionStartTime
MachineName
UserName
TaskName
TaskId

The only other property to configure in the Audit transformation is the Output Column Name that allows you to define a name for the columns that are used in the data flow output.


Fuzzy Transformations
Fuzzy transformations can be very useful for improving the data quality of existing data as well as new data that is being loaded into your database.

Fuzzy Lookup
The Fuzzy Lookup transformation performs data cleansing tasks such as standardizing data, correcting data and providing missing values. It applies the same approximate-matching ("fuzzy") logic used by the Fuzzy Grouping transformation to lookup operations, so that it can return data from a dataset that closely, rather than exactly, matches the lookup value. This is what separates the Fuzzy Lookup transformation from the Lookup transformation, which requires an exact match. Note that the connection to SQL Server must resolve to a user who has permission to create tables in the database. The Fuzzy Lookup Transformation Editor dialog box consists of three tabs to configure:

Reference Table tab. The Reference Table tab allows you to define connection information that is used within the data flow. This includes an OLE DB Connection Manager property for the connection. The Reference table property allows you to select the reference table. You can also choose whether to create new or use existing indexes with the Store New Index or Use Existing Index property.

Columns tab. The Columns tab consists of the Available Input Columns and Available Lookup Columns properties that enable you to select the columns that the Fuzzy Lookup transformation operation affects. When a column is selected in the Available Lookup Columns, it appears in the Lookup Columns list. The Output alias allows you to define the name of the output column.

Advanced tab. The Advanced tab sets the Similarity threshold property, which is a slider. The closer the threshold is to one, the more the rows must resemble each other to qualify as duplicates. You can also tokenize data by using the Token delimiters property.
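To get an intuition for approximate matching, a rough T-SQL sketch using SOUNDEX-based comparison is shown below. This is not the algorithm Fuzzy Lookup uses (Fuzzy Lookup relies on token-based similarity scoring), and the table names are hypothetical:

```sql
-- Rough illustration of approximate matching in T-SQL (not the
-- algorithm Fuzzy Lookup uses; StageAddress and RefCity are hypothetical).
SELECT s.City,
       r.City AS MatchedCity,
       DIFFERENCE(s.City, r.City) AS SoundexScore  -- 0 (worst) to 4 (best)
FROM StageAddress AS s
JOIN RefCity AS r
    ON DIFFERENCE(s.City, r.City) >= 3;
```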

Fuzzy Grouping
The Fuzzy Grouping transformation allows you to standardize and cleanse data by selecting likely duplicate data and comparing it to an alias row of data that is used to standardize the input data. A connection to SQL Server is required because the Fuzzy Grouping transformation needs a temporary table to perform its work. The Fuzzy Grouping transformation allows you to perform an exact match or a fuzzy match. An exact match means that the data must match exactly for it to be part of the same group. A fuzzy match groups data together that is approximately the same. You can control the fuzziness by configuring numerous properties that determine how dissimilar data values can be. The Fuzzy Grouping Transformation Editor dialog box consists of three tabs:

Connection Managers tab. The Connection Managers tab allows the Fuzzy Grouping transformation to create the temporary table required to perform its work. You use the OLE DB Connection Manager property to point to an existing OLE DB connection, or click New to create a new OLE DB connection.

Columns tab. The Columns tab consists of the Available Input Columns property that enables you to select the columns that the Fuzzy Grouping transformation operation affects. When a column is selected, it appears in the Input Columns list. The Output alias allows you to define the name of the output column. The Group Output Alias allows you to define a group name for the data that is grouped together. The Match Type property defines the type of fuzzy operation that is conducted, which can be exact or fuzzy. You can control the fuzziness by using the Minimum Similarity property (a value close to one means that the data must be nearly identical), and the Similarity Output Alias generates a new output column that contains the similarity scores for the selected join. You can specify how leading and trailing values are evaluated by using the Numerals property, and Comparison Flags can be used to ignore spaces or character widths.

Advanced tab. The Advanced tab sets the Input key column name for the output column that contains the unique identifier for each input row; the Output key column name for the output column that contains the unique identifier for the alias row of a group of duplicate rows; and the Similarity score column name for the column that contains the similarity score. The Similarity threshold property is a slider. The closer the threshold is to one, the more the rows must resemble each other to qualify as duplicates. You can also tokenize data by using the Token delimiters property.


Term Transformations
You can extract nouns only, noun phrases only, or both nouns and noun phrases from descriptive columns with the Term Extraction and Term Lookup transformations.

Term Extraction transformation
The Term Extraction transformation allows data flow inputs to be compared to a built-in dictionary to extract nouns only, noun phrases only, or both nouns and noun phrases. A noun phrase includes at least two words: a noun and another word that is a noun or an adjective. The transformation can also stem nouns to extract the singular noun from a plural noun, so "cars" becomes "car". This extraction forms the basis of the data flow output. This capability is only available for the English language. The Term Extraction Transformation Editor dialog box contains three tabs to configure:

Term Extraction tab. The Term Extraction tab specifies a text column that contains text to be extracted. The Available Input Columns property enables you to select the columns that the Term Extraction transformation operation affects. You can define an output column name for the term that is extracted by using the Term property. The Score property allows you to define a column name for the score that is assigned to the extracted term column.

Exclusion tab. The Exclusion tab allows you to point to a table that consists of a list of terms that are excluded from the term extraction. This includes an OLE DB Connection Manager property for the connection. The Table or View and Column properties allow you to select the column within the table that holds the exclusion terms.

Advanced tab. The Advanced tab allows you to set the term extraction type by using the Term Type property, set to nouns only, noun phrases only, or both nouns and noun phrases. The Score type property sets the basis for scoring the terms by using frequency or Term Frequency Inverse Document Frequency (TFIDF, a term-scoring algorithm). You can specify case-sensitive extractions and set Parameters for the Frequency Threshold, which specifies how frequently a word must appear before it is extracted, and the Maximum length of term, which defines the maximum number of characters in a word on which to perform the term extraction.

Term Lookup transformation
The Term Lookup transformation can perform an extraction of terms from a reference table rather than the built-in dictionary. It counts the number of times a term in the lookup table occurs in the input data set, and writes the count together with the term from the reference table to columns in the transformation output.

Reference Table tab. The Reference Table tab allows you to define connection information that is used within the data flow. This includes an OLE DB Connection Manager property for the connection. The Reference table property allows you to select the reference table.

Term Lookup tab. The Term Lookup tab consists of the Available Input Columns and Available Reference Columns properties that enable you to select the columns that the Term Lookup transformation operation affects. When a column is selected in the Available Input Columns, it appears in the Pass-through Columns list. The Output Column alias allows you to define the name of the output data flow column.

Advanced tab. The Advanced tab provides the Use case-sensitive term lookup option to add case sensitivity to the Term Lookup transformation.
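A much-simplified sketch of the counting idea in T-SQL is shown below. Unlike Term Lookup, this counts input rows containing each term rather than individual occurrences, and the Terms and Documents tables are hypothetical:

```sql
-- Simplified sketch of Term Lookup's counting behavior (exact substring
-- match only; counts rows containing each term, not occurrences;
-- Terms and Documents are hypothetical tables).
SELECT t.Term, COUNT(*) AS Frequency
FROM Documents AS d
JOIN Terms AS t
    ON d.DocumentText LIKE '%' + t.Term + '%'
GROUP BY t.Term;
```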


Best Practices

Use the correct data sources from the Data Flow Sources section of the Business Intelligence Development Studio Toolbox to extract data.
Use the correct data destinations from the Data Flow Destinations section of the Business Intelligence Development Studio Toolbox to load the data.
Use OLE DB data sources to connect to SQL Server tables, Access databases and Excel 2007 spreadsheets.
Use the ADO.NET data source to connect to ODBC data sources and destinations.
Identify the transformation required to meet the data load requirements.
Use built-in transformations when possible.
Use the Script Component Data Flow transformation to create custom data sources, data destinations or transformations.
Use Data Flow paths to control how data moves between Data Flow transformations.
Use the Slowly Changing Dimension transformation to manage changing data in dimension tables in a data warehouse.
Use the Lookup transformation to load a fact table in a data warehouse with the correct data.
Use the Cache transformation in conjunction with the Lookup transformation to improve the performance of loading fact tables.


Lab: Implementing Data Flow in SQL Server Integration Services 2008


Lab Overview
Lab Introduction
The purpose of this lab is to focus on using data flows within an SSIS package to populate a simple data warehouse. You will first edit an existing package to add data sources and destinations and use common transformations to complete the loading of the StageProduct table. You will also implement a data viewer in this package and run the package to ensure that data is being loaded correctly into the StageProduct table. You will then create the dimension tables in the data warehouse, focusing specifically on the Slowly Changing Dimension task to manage changing data in the dimension tables. You will finally explore how to populate the fact table within the data warehouse by using the Lookup transformation to ensure that the correct data is being loaded into the fact table.

Lab Objectives
After completing this lab, you will be able to:

Define data sources and destinations. Work with data flow paths. Implement data flow transformations.


Scenario
You are a database professional for Adventure Works, a manufacturing company that sells bicycles and bicycle components through the Internet and a reseller distribution network. You are continuing to work on using SSIS to populate a simple data warehouse for testing purposes in a database named AdventureWorksDWDev. You want to complete the AWStaging package by configuring the Data Flow task that will load data into the StageProduct table. You will implement simple transformations that you think you will use in the production data warehouse. To verify that the transformations are working, you will add data viewers to the data path to view the data before and after the transformation has occurred. You will then edit the package named AWDataWarehouse. You will first edit a Data Flow task to explore common transformations that are used within the data flow. You then want to explore the use of the Slowly Changing Dimension task to manage data changes when transferring data from the StageProduct table to the ProductDim table. Finally, you will edit the LoadFact Data Flow task that will populate the FactSales table, which will use a Lookup transformation to ensure that the correct data is loaded into the fact table.


Exercise Information
Exercise 1: Defining Data Sources and Destinations
In this exercise, you will complete the configuration of the AWStaging package by configuring the Data Flow task that will populate the StageProduct table. You will define the data source as the AdventureWorks2008 database. You will then use transformations to ensure that the data is cleanly loaded into the StageProduct table. You will then define the data destination as the StageProduct table in the AdventureWorksDWDev database.

Exercise 2: Working with Data Flow Paths
In this exercise, you will add an error Data Flow path from the AdventureWorksDWDev StageProduct Data Flow task to a text file named StageProductLoadErrors.txt located in the D:\Labfiles\Starter folder. You will add a data viewer before and after the Category Uppercase Character Map transformation. You will then run the package and review the data viewers before and after the Category Uppercase Character Map transformation runs to view the differences in the data. After completing the review, you will remove the data viewers.

Exercise 3: Implementing Data Flow Transformations
In this exercise, you will edit the AWDataWarehouse package. You will first edit the Generate Resellers Data Data Flow task to explore common transformations that are used within the data flow. You will then explore the use of the Slowly Changing Dimension task to manage data changes when transferring data from the StageProduct table to the ProductDim table, which is defined within the Generate Product Data Data Flow task. Finally, you will edit the Generate FactSales Data Data Flow task that will populate the FactSales table, using a Lookup transformation to ensure that the correct data is loaded into the fact table.


Lab Instructions: Implementing Data Flow in SQL Server Integration Services 2008
Exercise 1: Defining Data Sources and Destinations
Exercise Overview
In this exercise, you will complete the configuration of the AWStaging package by configuring the Data Flow task that will populate the StageProduct table. You will define the data source as the AdventureWorks2008 database. You will then use transformations to ensure that the data is cleanly loaded into the StageProduct table. You will then define the data destination as the StageProduct table in the AdventureWorksDWDev database.

Task 1: Log on to the MIAMI server with the username Student and password Pa$$w0rd. If you are already logged on, proceed to the next task
a. To log on to the MIAMI server, press CTRL+ALT+DELETE.
b. On the Login screen, click the Student icon.
c. In the Password box, type Pa$$w0rd and then click the Forward button.

Task 2: Open Business Intelligence Development Studio and open the solution file AW_BI located in the D:\Labfiles\Starter\AW_BI folder
1. Open Microsoft Business Intelligence Development Studio.
2. Open the AW_BI solution file in the D:\Labfiles\Starter\AW_BI folder.

Task 3: Open the AWStaging package in the AW_SSIS project in the AW_BI solution Open the AWStaging package in Business Intelligence Development Studio.

Task 4: Edit the Load Products Data Flow task and add an OLE DB Source to the data flow designer that is configured to retrieve data from the Production.Product table in the AdventureWorks2008 database
1. Open the Load Products Data Flow Designer in the AWStaging package in Business Intelligence Development Studio.
2. Add an OLE DB Source data flow source from the Toolbox onto the Data Flow Designer. Name the OLE DB Source data flow source AdventureWorks2008 Products.
3. Edit the AdventureWorks2008 Products OLE DB data source by retrieving the ProductID, Name, SubCategory name, Category name, ListPrice, Color, Size, Weight, DaystoManufacture, SellStartDate and SellEndDate from the Production.Product, Production.ProductSubcategory and Production.ProductCategory tables in the AdventureWorks2008 database. Add a WHERE clause that will return all products with a date later than the date stored in the ProductLastExtract variable.
4. Save the AW_BI solution.

Task 5: Add a Character Map transformation to the Load Products Data Flow Designer that is configured to transform the data in the Category column to uppercase. Name the transformation Category Uppercase and set the Data Flow path from the AdventureWorks2008 Products Data Flow task to the Category Uppercase transformation
1. Add a Character Map transformation from the Toolbox onto the Data Flow Designer. Name the Character Map transformation Category Uppercase.
2. Set the Data Flow path from the AdventureWorks2008 Products Data Flow task to the Category Uppercase transformation.


Task 6: Edit the Category Uppercase Character Map transformation to change the character set of the Category column to uppercase
1. Edit the Category Uppercase Character Map transformation to change the character set of the Category column to uppercase.
2. Save the AW_BI solution.

Task 7: Edit the Load Products Data Flow task and add an OLE DB Destination to the Data Flow Designer named AdventureWorksDWDev StageProduct. Then set the Data Flow path from the Category Uppercase transformation to the AdventureWorksDWDev StageProduct OLE DB Destination
1. Add an OLE DB Destination from the Toolbox onto the Data Flow Designer. Name the OLE DB Destination AdventureWorksDWDev StageProduct.
2. Set the Data Flow path from the Category Uppercase transformation to the AdventureWorksDWDev StageProduct OLE DB Destination.

Task 8: Edit the AdventureWorksDWDev StageProduct OLE DB Destination to load the data into the StageProduct table and remove the Check constraints option
1. Edit the AdventureWorksDWDev StageProduct OLE DB Destination to load the data into the StageProduct table in the AdventureWorksDWDev database.
2. Edit the AdventureWorksDWDev StageProduct OLE DB Destination by performing column mapping between the source and destination data.
3. Save and close the AW_BI solution.

Task 9: You have completed all tasks in this exercise
A successful completion of this exercise results in the following outcomes:
a. You have created an OLE DB Source data flow source.
b. You have created a Transact-SQL statement to query the source data.
c. You have created a simple character map transformation.
d. You have created an OLE DB Destination data flow destination.


Exercise 2: Working with Data Flow Paths


Exercise Overview
In this exercise, you will add an error Data Flow path from the AdventureWorksDWDev StageProduct Data Flow task to a text file named StageProductLoadErrors.txt located in the D:\Labfiles\Starter folder. You will then add a data viewer before and after the Category Uppercase Character Map transformation. You will then run the package and review the data viewers before and after the Category Uppercase Character Map transformation runs to view the differences in the data. After completing the review, you will remove the data viewers.

Task 1: Open Business Intelligence Development Studio and open the solution file AW_BI located in the D:\Labfiles\Starter\AW_BI folder
1. Open Microsoft Business Intelligence Development Studio.
2. Open the AW_BI solution file in the D:\Labfiles\Starter\AW_BI folder.

Task 2: Open the AWStaging package in the AW_SSIS project in the AW_BI solution Open the AWStaging package in Business Intelligence Development Studio.

Task 3: Edit the Load Products Data Flow task and add a Flat File Destination to the Data Flow Designer that is configured to write to a text file named StageProductLoadErrors.txt located in the D:\Labfiles\Starter folder
1. Open the Load Products Data Flow Designer in the AWStaging package in Business Intelligence Development Studio.
2. Add a Flat File Destination data flow destination from the Toolbox onto the Data Flow Designer. Name the Flat File Destination data flow destination StageProduct Load Errors.

Task 4: Create an error Data Flow path from the AdventureWorksDWDev StageProduct OLE DB Destination to the StageProduct Load Errors Flat File Destination
Set the Data Flow path from the AdventureWorksDWDev StageProduct OLE DB Destination to the StageProduct Load Errors Flat File Destination.

Task 5: Edit the StageProduct Load Errors Flat File Destination, creating a connection to the StageProductLoadErrors.txt file located in the D:\Labfiles\Starter folder. Name the connection StageProduct Errors
1. Configure the StageProduct Load Errors Flat File Destination to create a text file named StageProductLoadErrors.txt located in D:\Labfiles\Starter. Name the connection StageProduct Errors.
2. Review the column mappings between the AdventureWorksDWDev StageProduct OLE DB Destination and the StageProduct Load Errors Flat File Destination.

Task 6: Edit the AdventureWorksDWDev StageProduct OLE DB Destination to redirect rows when an error is encountered Configure AdventureWorksDWDev StageProduct OLE DB Destination to redirect rows when errors are encountered in the data flow.

Task 7: Add a Grid Data Viewer in the Data Flow path between the AdventureWorks2008 Products OLE DB Source and the Category Uppercase Character Map transformation Add a Grid Data Viewer in the Data Flow path between the AdventureWorks2008 Products OLE DB Source and the Category Uppercase Character Map transformation.


Task 8: Add a Grid Data Viewer in the Data Flow path between the Category Uppercase Character Map transformation and the AdventureWorksDWDev StageProduct OLE DB Destination. Then save the AW_BI solution
1. Add a Grid Data Viewer in the Data Flow path between the Category Uppercase Character Map transformation and the AdventureWorksDWDev StageProduct OLE DB Destination.
2. Save the AW_BI solution.

Task 9: Execute the Load Products Data Flow task and use the data viewers to confirm that the transform has worked correctly. Observe the data load into the StageProduct table of the AdventureWorksDWDev database, and for any records that have failed, verify that the data has loaded into the StageProductLoadErrors.txt file located in the D:\Labfiles\Starter folder
1. Execute the Load Products Data Flow task and view the data viewers that execute.
2. View the AdventureWorksDWDev StageProduct OLE DB Destination and confirm that 295 rows are inserted into the StageProduct table.
3. View the data in the StageProduct table in the AdventureWorksDWDev database by using SQL Server Management Studio.
4. Confirm that the StageProductLoadErrors.txt file located in the D:\Labfiles\Starter folder contains 50 records.

Task 10: Clean out the data from the StageProduct table and the StageProductLoadErrors.txt file. Remove the data viewers and correct the error that is occurring with the Load Products Data Flow task
1. In Notepad, delete the data within the StageProductLoadErrors.txt text file.
2. Remove the data from the StageProduct table in the AdventureWorksDWDev database. In the Query Window, type the following code:

USE AdventureWorksDWDev
GO
DELETE FROM StageProduct
GO
SELECT * FROM StageProduct

3. Stop debugging in Business Intelligence Development Studio and remove the data viewers from the Load Products Data Flow task.
4. Edit the AdventureWorks2008 Products OLE DB data source by changing the query to replace NULL values returned in the Color column with the value None.
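One way to express the NULL replacement in the source query is with ISNULL. This is a sketch of the relevant part of the query only, not the full lab query:

```sql
-- Sketch: replacing NULL Color values in the source query with 'None'.
SELECT ProductID,
       Name,
       ISNULL(Color, 'None') AS Color
FROM Production.Product;
```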

Task 11: Execute the Load Products Data Flow task and verify that the data now loads without errors. Then clean out the data from the StageProduct table
1. Execute the Load Products Data Flow task.
2. Confirm that the StageProductLoadErrors.txt file located in the D:\Labfiles\Starter folder contains 0 records.
3. View the data in the StageProduct table in the AdventureWorksDWDev database by using SQL Server Management Studio.
4. Remove the data from the StageProduct table in the AdventureWorksDWDev database. In the Query Window, type the following code:

USE AdventureWorksDWDev
GO
DELETE FROM StageProduct
GO
SELECT * FROM StageProduct

Task 12: Save and close the AW_BI solution in Business Intelligence Development Studio

Save and close the AW_BI solution.

Task 13: You have completed all tasks in this exercise

A successful completion of this exercise results in the following outcomes:
a. You have created and configured an error data path.
b. You have added data viewers to the Data Flow path.
c. You have observed the effects of Data Flow paths.
d. You have corrected errors in a data flow and observed the successful completion of a Data Flow path.


Exercise 3: Implementing Data Flow Transformations


Exercise Overview

In this exercise, you will edit the AWDataWarehouse package. You will first edit the Generate Resellers Data Data Flow task to explore common transformations that are used within the data flow. You will then explore the use of the Slowly Changing Dimension task to manage changes to data when transferring data from the StageProduct table to the DimProduct table within the Generate Product Data Data Flow task. Finally, you will edit the Generate FactSales Data Data Flow task, which populates the FactSales table and uses a Lookup transformation to ensure that the correct data is loaded into the fact table.

Task 1: Open Business Intelligence Development Studio and open the AW_BI solution file located in the D:\Labfiles\Starter\AW_BI folder

1. Open Microsoft Business Intelligence Development Studio.
2. Open the AW_BI solution file in the D:\Labfiles\Starter\AW_BI folder.

Task 2: Open the AWDataWarehouse package in the AW_SSIS project in the AW_BI solution

Open the AWDataWarehouse package in Business Intelligence Development Studio.

Task 3: Edit the Generate Resellers Data Data Flow task in the AWDataWarehouse package and add an OLE DB Source to the Data Flow Designer that is configured to retrieve data from the dbo.StageReseller table in the AdventureWorksDWDev database

1. Open the Generate Resellers Data Data Flow task in the AWDataWarehouse package in Business Intelligence Development Studio.
2. Add an OLE DB Source data flow source from the Toolbox onto the Data Flow Designer. Name the OLE DB Source data flow source AdventureWorksDWDev StageResellers.
3. Edit the AdventureWorksDWDev StageResellers OLE DB data source by retrieving all columns from the StageReseller table in the AdventureWorksDWDev database.
4. Save the AW_BI solution.

Task 4: Add a Conditional Split transformation that will keep all of the Resellers with an AddressType of Main Office within the dimension table data load and output other address types to a text file named NonMainOffice.txt in the D:\Labfiles\Starter folder. Name the Conditional Split transformation MainOffice

1. Add a Conditional Split transformation from the Toolbox onto the Data Flow Designer. Name the Conditional Split transformation MainOffice.
2. Configure the MainOffice Conditional Split transformation to identify records that have an AddressType of Main Office and those records that do not.
3. Create the Flat File Destination and name the Flat File Destination NonMainOffices.
4. Set the Data Flow path from the MainOffice Conditional Split transformation to the NonMainOffices Flat File Destination.
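Conceptually, the MainOffice Conditional Split evaluates an expression such as AddressType == "Main Office" against each row and routes it to one of two outputs. A hedged sketch in Python, with made-up row data standing in for the StageReseller rows:

```python
# Rows whose AddressType is "Main Office" continue down the dimension
# load path; everything else is routed to the NonMainOffices output,
# which the lab writes to a flat file. Sample rows are illustrative.
resellers = [
    {"Name": "A Bike Store",     "AddressType": "Main Office"},
    {"Name": "Brakes and Gears", "AddressType": "Shipping"},
    {"Name": "Bike World",       "AddressType": "Main Office"},
]

main_office = [r for r in resellers if r["AddressType"] == "Main Office"]
non_main    = [r for r in resellers if r["AddressType"] != "Main Office"]

print(len(main_office), len(non_main))  # 2 1
```

Each input row lands in exactly one output, which is what distinguishes a Conditional Split from a Multicast (where every output receives every row).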

Task 5: Add a Sort transformation named CountryRegionSort below the MainOffice Conditional Split transformation and drag a Data Flow path from the MainOffice Conditional Split transformation to the CountryRegionSort Sort transformation

1. Add a Sort transformation from the Toolbox onto the Data Flow Designer. Name the Sort transformation CountryRegionSort.
2. Set the Data Flow path from the MainOffice Conditional Split transformation to the CountryRegionSort Sort transformation.
3. Configure the CountryRegionSort Sort transformation to sort by CountryRegionName.


Task 6: Edit the Generate Reseller Data Data Flow task and add an OLE DB Destination to the Data Flow Designer named AdventureWorksDWDev DimReseller. Then set the Data Flow path from the CountryRegionSort transformation to the AdventureWorksDWDev DimReseller OLE DB Destination

1. Add an OLE DB Destination from the Toolbox onto the Data Flow Designer. Name the OLE DB Destination AdventureWorksDWDev DimReseller.
2. Set the Data Flow path from the CountryRegionSort Sort transformation to the AdventureWorksDWDev DimReseller OLE DB Destination.

Task 7: Edit the AdventureWorksDWDev DimReseller OLE DB Destination to load the data into the DimReseller table and remove the Check constraints option

1. Edit the AdventureWorksDWDev DimReseller OLE DB Destination to load the data into the DimReseller table in the AdventureWorksDWDev database.
2. Edit the AdventureWorksDWDev DimReseller OLE DB Destination by performing column mapping between the source and destination data.
3. Save the AW_BI solution.

Task 8: Edit the Generate Product Data Data Flow task in the AWDataWarehouse package and add an OLE DB Source to the Data Flow Designer that is configured to retrieve data from the dbo.StageProduct table in the AdventureWorksDWDev database

1. Open the Generate Product Data Data Flow task in the AWDataWarehouse package in Business Intelligence Development Studio.
2. Add an OLE DB Source data flow source from the Toolbox onto the Data Flow Designer. Name the OLE DB Source data flow source AdventureWorksDWDev StageProducts.
3. Edit the AdventureWorksDWDev StageProducts OLE DB data source by retrieving all columns from the StageProduct table in the AdventureWorksDWDev database.
4. Save the AW_BI solution.

Task 9: Edit the Generate Product Data Data Flow task in the AWDataWarehouse package and add a Slowly Changing Dimension task that loads data into the DimProduct table and treats the Category and Subcategory data as changing attributes and the EnglishProductName as a historical attribute. All remaining columns will be treated as fixed attributes

1. Open the Generate Product Data Data Flow task in the AWDataWarehouse package in Business Intelligence Development Studio.
2. Add a Slowly Changing Dimension Data Flow task to the Data Flow Designer and then create a Data Flow path from the AdventureWorksDWDev StageProducts OLE DB data source to the Slowly Changing Dimension.
3. Run the Slowly Changing Dimension Wizard, selecting DimProduct as the destination table and the ProductAlternateKey column as the business key.
4. In the Slowly Changing Dimension Wizard, treat the Category and Subcategory data as changing attributes and the EnglishProductName as a historical attribute. All remaining columns will be treated as fixed attributes.
5. In the Slowly Changing Dimension Wizard, set the wizard to fail transformations with changes to fixed attributes and use start and end dates to identify current and expired records based on the System::StartTime variable. Disable the inferred members support.
6. Save the AW_BI solution.
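The distinction the wizard draws between changing and historical attributes can be sketched in a few lines. This is a hedged illustration of the behavior, not the transformation's implementation; the key, column names and dates are made up:

```python
from datetime import date

# Type 1 (changing attribute, e.g. Category): overwrite in place.
# Type 2 (historical attribute, e.g. EnglishProductName): expire the
# current row with an EndDate and insert a new current row.
dim = [{"ProductAlternateKey": "BK-R50R", "EnglishProductName": "Road-150",
        "Category": "Bikes", "StartDate": date(2008, 1, 1), "EndDate": None}]

def apply_change(dim, key, column, value, change_type, today):
    current = next(r for r in dim
                   if r["ProductAlternateKey"] == key and r["EndDate"] is None)
    if change_type == "changing":        # Type 1: no history kept
        current[column] = value
    elif change_type == "historical":    # Type 2: history preserved
        current["EndDate"] = today
        dim.append(dict(current, **{column: value,
                                    "StartDate": today, "EndDate": None}))

apply_change(dim, "BK-R50R", "EnglishProductName", "Road-150 Red",
             "historical", date(2008, 6, 1))
print(len(dim))  # 2: one expired row and one current row
```

A fixed attribute would take neither branch; a change to it is an error, which is exactly why step 5 configures the wizard to fail the transformation in that case.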

Task 10: Review the FactSales table in the AdventureWorksDWDev database, removing the ExtendedAmount, UnitPriceDiscountPct, TotalProductCost and TaxAmount columns. Then, edit the Generate FactSales Data Data Flow task to load the FactSales table with the correct data

1. Open SQL Server Management Studio and view the columns in the FactSales table of the AdventureWorksDWDev database.
2. Maximize Business Intelligence Development Studio and add an OLE DB data source to the AdventureWorks2008 database within the Generate FactSales Data Data Flow task that uses the SourceFactLoad.sql file located in D:\Labfiles\Starter.
3. Use a Data Conversion transformation to convert the following columns that will be loaded into the FactSales table in the AdventureWorksDWDev database:
   o Convert the ProductID integer data type to a Unicode string (25) with an output name of ProductIDMapping.
   o Convert the BusinessEntityID integer data type to a Unicode string (25) with an output name of ResellerIDMapping.
   o Convert the SalesOrderNumber to a Unicode string (20) with an output name of StringSalesOrderNumber.
   o Convert the SalesOrderLineNumber to a single byte unsigned integer with an output name of TinyIntSalesOrderLineNumber.
   o Convert the UnitPriceDiscount column to a double-precision float data type with an output name of cnv_UnitPriceDiscount.
   o Convert the LineTotal column to a currency data type with an output name of cnv_LineTotal.
4. Add a Lookup task within the Generate FactSales Data Data Flow task that will look up the Product Dimension Key based on the ProductAlternateKey.
5. Add a Lookup task within the Generate FactSales Data Data Flow task that will look up the Reseller Dimension Key based on the BusinessEntityID.
6. Add a Raw File destination within the Generate FactSales Data Data Flow task that will be used as the error output for the ResellerKey Lookup task.
7. Add a Lookup task within the Generate FactSales Data Data Flow task that will look up the Time Key based on the OrderDate column.
8. Add a Lookup task within the Generate FactSales Data Data Flow task that will look up the Time Key based on the DueDate column.
9. Add a Lookup task within the Generate FactSales Data Data Flow task that will look up the Time Key based on the ShipDate column.
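Each of these Lookup tasks matches an input column against a common column in a dimension table and returns the surrogate key; rows with no match are redirected to an error output (here, the Raw File destination). A hedged sketch of the product lookup, with illustrative keys and values:

```python
# Reference dataset: ProductAlternateKey -> ProductKey, as DimProduct
# would supply it. Values are made up for illustration.
dim_product = {"BK-R50R": 101, "BK-M82S": 102}

fact_input = [{"ProductIDMapping": "BK-R50R", "OrderQty": 2},
              {"ProductIDMapping": "XX-0000", "OrderQty": 1}]

matched, errors = [], []
for row in fact_input:
    key = dim_product.get(row["ProductIDMapping"])
    if key is None:
        errors.append(row)                         # redirected to error output
    else:
        matched.append(dict(row, ProductKey=key))  # surrogate key attached

print(len(matched), len(errors))  # 1 1
```

The three Time Key lookups work the same way, differing only in which date column is matched against the time dimension.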

10. Add an OLE DB Destination to the data flow and map the input columns correctly to the columns in the SalesFact table of the AdventureWorksDWDev database. Map the following Available Input Columns to the Available Destination Columns:

Available Input Columns            Available Destination Columns
ProductKey                         ProductKey
OrderDate Lookup.TimeKey           OrderDateKey
DueDate Lookup.TimeKey             DueDateKey
ShipDate Lookup.TimeKey            ShipDateKey
ResellerKey                        ResellerKey
StringSalesOrderNumber             SalesOrderNumber
TinyIntSalesOrderLineNumber        SalesOrderLineNumber
RevisionNumber                     RevisionNumber
OrderQty                           OrderQuantity
UnitPrice                          UnitPrice
Cnv_UnitPriceDiscount              DiscountAmount
StandardCost                       ProductStandardCost
Cnv_LineTotal                      SalesAmount

11. Save the AW_BI solution.

Task 11: Execute the LoadAWDW package that contains the Execute Package tasks that control the load of the AdventureWorksDWDev data warehouse and review the data in the database by using SQL Server Management Studio

1. In Business Intelligence Development Studio, execute the LoadAWDW package.
2. Save and close the AW_BI solution.

Task 12: You have completed all tasks in this exercise

A successful completion of this exercise results in the following outcomes:
a. You have opened Business Intelligence Development Studio and opened a data flow component within a package.
b. You have added an OLE DB data source within a data flow.
c. You have added a Conditional Split transformation to a data flow task.
d. You have added a Sort transformation to a data flow task.
e. You have added and edited an OLE DB data destination within a data flow.
f. You have added and edited a Slowly Changing Dimension transformation.
g. You have added and edited a Lookup transformation to load a fact table with data within a data warehouse.
h. You have added and edited an Execute Package task to control the load of data into a data warehouse.


Lab Review
In this lab, you used data flows within an SSIS package to populate a simple data warehouse. You first edited an existing package to add data sources and destinations and used common transformations to complete the loading of the ProductStage table. Then, you implemented a data viewer in this package and ran the package to ensure that data was loaded correctly into the ProductStage table. You then created the dimension tables in the data warehouse, focusing specifically on the Slowly Changing Dimension task to manage changing data in the dimension tables. You finally explored how to populate the fact table within the data warehouse by using the Lookup transformation to ensure that the correct data was loaded into the fact table.

What is the purpose of Data Flow paths?
Data Flow paths are used to control the flow of data within the Data Flow task. You can define a success Data Flow path, represented by a green arrow, which will move the Data Flow path onto the next data flow component. You can also use an error output Data Flow path to control the flow of data when an error occurs.

What kind of errors can be managed by the error output Data Flow path?
You can define errors or truncation errors to be managed by the error output Data Flow path.

What data types does the Export Column transformation manage?
The DT_IMAGE, DT_TEXT and DT_NTEXT data types. The Export Column transformation moves this type of data stored within a table to a file.

What is the difference between a Type 1 and a Type 2 Slowly Changing Dimension and how are they represented in the Slowly Changing Dimension transformation?
Type 1 is a Slowly Changing Dimension that will overwrite data values within a dimension table. As a result, no historical data is retained. In the Slowly Changing Dimension Wizard, this is referred to as a Changing Attribute. A Type 2 Slowly Changing Dimension will insert a new record when the value in a dimension table changes. As a result, historical data is retained. This is referred to as a Historical Attribute in the Slowly Changing Dimension Wizard.

What is the difference between a Lookup and a Fuzzy Lookup transformation?
The Lookup transformation enables you to take information from an input column and then look up additional information from another dataset that is linked to the input columns through a common column. The Fuzzy Lookup transformation uses logic that can be applied to Lookup operations so that it can return data from a dataset that may closely match the Lookup value required.
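The Lookup versus Fuzzy Lookup distinction can be made concrete: an exact lookup fails on a misspelled value, while fuzzy matching still resolves it to the closest reference value. A rough sketch using Python's difflib as a stand-in for SSIS's fuzzy-matching algorithm (the category names are illustrative):

```python
import difflib

reference = ["Mountain Bikes", "Road Bikes", "Touring Bikes"]

def fuzzy_lookup(value, candidates, cutoff=0.8):
    # Returns the closest candidate above the similarity cutoff, or None;
    # an exact Lookup would return a match only on strict equality.
    matches = difflib.get_close_matches(value, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(fuzzy_lookup("Mountain Bikes", reference))  # exact value still matches
print(fuzzy_lookup("Mountian Bikes", reference))  # misspelling -> 'Mountain Bikes'
```

The cutoff plays the same role as the similarity threshold you configure on the Fuzzy Lookup transformation: lower values accept looser matches at the risk of false positives.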

50

Module Summary
Defining Data Sources and Destinations
In this lesson, you have learned the following key points:

The ETL operation uses data sources to retrieve the source data, transformations to change the data and data destinations to load the data into a destination database.

The range of data flow sources that enable SSIS to connect to a wide range of data sources includes:
o OLE DB to connect to SQL Server, Microsoft Access 2007 and Microsoft Excel 2007
o Flat file to connect to text and csv files
o Raw file to connect to raw file sources created by raw file destinations
o Microsoft Excel to connect to Microsoft Office Excel 97-2002
o XML to connect to XML data sources
o ADO.NET sources to connect to a database to create a DataReader

The data flow destinations that are available in SSIS include:
o OLE DB to connect to SQL Server, Microsoft Access 2007 and Microsoft Excel 2007
o Flat file to connect to text and csv files
o Raw file to connect to raw file sources created by raw file destinations
o Microsoft Excel to connect to Microsoft Office Excel 97-2002
o XML to connect to XML data sources
o ADO.NET sources to connect to a database to create a DataReader

You can configure an OLE DB Data Source to retrieve data from SQL Server 2008 objects, defining a server name, authentication method and database name. You can configure data sources for Access by using the OLE DB data source. You can configure data sources for specific versions of Excel by using OLE DB and Microsoft Excel data sources and destinations.

Data Flow Paths


In this lesson, you have learned the following key points:

Data flow paths can be used to control the flow of data flows and transformations in an SSIS package using success data flow paths and error data flow paths.

You can create data flow paths and use them to create inputs into other data flow components. In addition, you can use data flow paths to create error data flow outputs by clicking and dragging the data flow path between different data flow components.

Data viewers help you to view the data before and after transformations take place to verify that the transformations are working as expected.

The types of data viewers available to check the data within the data flow include:
o Grid, which returns the data in rows and columns in a table
o Histogram, which works with numeric data only, allowing you to select one column from the data flow
o Scatter Plot, which works with two numeric columns from a data source, providing the X-axis and Y-axis of a chart
o Column Chart, which allows you to select one column from the data flow and presents a column chart that shows the number of occurrences

You can create data viewers with SSIS to view the data flow as the package executes.

Implementing Data Flow Transformations: Part 1


In this lesson, you have learned the following key points:

Transformations in SSIS allow you to change the data as the data is being moved from a source connection to a destination connection. They can also be used to standardize and cleanse the data.

You can modify data by using the data formatting transformations, including:
o Character Map transformation for simple data transforms such as uppercase or lowercase
o Data Conversion transformation to convert data in the data flow
o Sort transformation to sort the data ascending or descending within the data flow
o Aggregate transformation that enables you to create a scalar result set or use in conjunction with a Group By clause to return multiple results

You can manipulate column data by using column transformations, including:
o Copy transformation to copy data between a source and a destination
o Derived Column transformation to create a new column of data
o Import Column transformation
o Export Column transformation

You can manage the data flow by using Multiple Data Flow transformations, including:
o Conditional Split transformation to separate data based on an expression that acts as a condition for the split
o Multicast transformation that enables you to generate multiple copies of the same data
o Merge transformation that enables you to merge sorted data
o Merge Join transformation that enables you to merge sorted data based on a join condition
o Union All transformation that enables you to merge unsorted data

You can create custom data sources, destinations and data transformations by using Custom transformations, including:
o Script transformation that allows you to create custom data sources, destinations and data transformations using Visual Basic or C#
o OLE DB Command transformation to issue OLE DB commands

You can implement simple transformations in the Data Flow of SSIS.

You can use the Slowly Changing Dimension transformation to manage changing data within a dimension table in a data warehouse.
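Two of the formatting transformations above are simple enough to sketch row by row, the way the data flow engine applies them. The column names and values here are made up for illustration:

```python
# Character Map: a simple string transform such as uppercasing a column.
# Derived Column: a new column computed from an expression over existing ones.
rows = [{"Category": "bikes", "UnitPrice": 1431.50, "OrderQty": 2}]

for row in rows:
    row["Category"] = row["Category"].upper()               # Character Map
    row["LineTotal"] = row["UnitPrice"] * row["OrderQty"]   # Derived Column

print(rows[0]["Category"], rows[0]["LineTotal"])  # BIKES 2863.0
```

In SSIS the Derived Column expression would be written in the SSIS expression language (for example, UnitPrice * OrderQty) rather than in code, but the per-row effect is the same.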


Implementing Data Flow Transformations: Part 2


In this lesson, you have learned the following key points:

You can create Lookup and Cache transformations in SQL Server 2008. The Lookup transformation helps you to take information from an input column and then look up additional information from another dataset that is linked to the input columns through a common column, managing data in a data warehouse. The Cache transformation is used to improve the performance of a Lookup transformation.

You can analyze data within the data flow by using Data Analysis transformations, including:
o Pivot transformation to create a crosstab result set
o Unpivot transformation to create a normalized result set
o Data Mining Query transformation to use Data Mining Extensions to perform data analysis

You can create a sample of data using Data Sampling transformations, including:
o Percentage Sampling transformation to generate a sample of data based on a percentage value
o Row Sampling transformation to generate a sample of data based on a set value
o Row Count transformation, which enables you to perform a row count of data and pass the value to a variable

The Audit transformation is used to add metadata information to the data flow.

Fuzzy transformations can be used to help standardize data, including:
o Fuzzy Lookup to perform lookups of data against data that may not exactly match
o Fuzzy Grouping to group together data values that are candidates for the same type of data

You can use Term transformations to extract nouns and noun phrases from within the data flow, including:
o Term Extraction transformation
o Term Lookup transformation
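The sampling transformations above differ mainly in how the sample size is expressed. A hedged sketch, using Python's random module on illustrative data:

```python
import random

rows = list(range(100))              # stand-in for 100 data flow rows

rng = random.Random(42)              # fixed seed so the sample is repeatable
row_sample = rng.sample(rows, 10)    # Row Sampling: exactly 10 rows
pct_sample = [r for r in rows if rng.random() < 0.25]  # Percentage Sampling: ~25%
row_count = len(rows)                # Row Count: total passed to a variable

print(len(row_sample), row_count)  # 10 100
```

Note the asymmetry: Row Sampling guarantees the sample size, while Percentage Sampling only approximates the requested proportion on any given run.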

Lab: Implementing Data Flow in SQL Server Integration Services 2008


In this lab, you used data flows within an SSIS package to populate a simple data warehouse. You first edited an existing package to add data sources and destinations and used common transformations to complete the loading of the ProductStage table. Then, you implemented a data viewer in this package and ran the package to ensure that data was loaded correctly into the ProductStage table. You then created the dimension tables in the data warehouse, focusing specifically on the Slowly Changing Dimension task to manage changing data in the dimension tables. You finally explored the ways to populate the fact table within the data warehouse by using the Lookup transformation to ensure that the correct data was loaded into the fact table.


Glossary
.NET Framework
An integral Windows component that supports building, deploying and running the next generation of applications and Web services. It provides a highly productive, standards-based, multilanguage environment for integrating existing investments with next-generation applications and services, as well as the agility to solve the challenges of deployment and operation of Internet-scale applications. The .NET Framework consists of three main parts: the common language runtime, a hierarchical set of unified class libraries and a componentized version of ASP called ASP.NET.

ad hoc report
An .rdl report created with Report Builder that accesses report models.

aggregation
A table or structure that contains precalculated data for a cube.

aggregation design
In Analysis Services, the process of defining how an aggregation is created.

aggregation prefix
A string that is combined with a system-defined ID to create a unique name for a partition's aggregation table.

ancestor
A member in a superior level in a dimension hierarchy that is related through lineage to the current member within the dimension hierarchy.

attribute
The building block of dimensions and their hierarchies that corresponds to a single column in a dimension table.

attribute relationship
The hierarchy associated with an attribute containing a single level based on the corresponding column in a dimension table.


axis
A set of tuples. Each tuple is a vector of members. A set of axes defines the coordinates of a multidimensional data set.

ActiveX Data Objects
Component Object Model objects that provide access to data sources. This API provides a layer between OLE DB and programming languages such as Visual Basic, Visual Basic for Applications, Active Server Pages and Microsoft Internet Explorer Visual Basic Scripting.

ActiveX Data Objects (Multidimensional)
A high-level, language-independent set of object-based data access interfaces optimized for multidimensional data applications.

ActiveX Data Objects MultiDimensional.NET
A managed data provider used to communicate with multidimensional data sources.

ADO MD
See Other Term: ActiveX Data Objects (Multidimensional)

ADOMD.NET
See Other Term: ActiveX Data Objects MultiDimensional.NET

AMO
See Other Term: Analysis Management Objects

Analysis Management Objects
The complete library of programmatically accessed objects that let an application manage a running instance of Analysis Services.

balanced hierarchy
A dimension hierarchy in which all leaf nodes are the same distance from the root node.

calculated column
A column in a table that displays the result of an expression instead of stored data.

calculated field
A field, defined in a query, that displays the result of an expression instead of stored data.


calculated member
A member of a dimension whose value is calculated at run time by using an expression.

calculation condition
An MDX logical expression that is used to determine whether a calculation formula will be applied against a cell in a calculation subcube.

calculation formula
An MDX expression used to supply a value for cells in a calculation subcube, subject to the application of a calculation condition.

calculation pass
A stage of calculation in a multidimensional cube in which applicable calculations are evaluated.

calculation subcube
The set of multidimensional cube cells that is used to create a calculated cells definition. The set of cells is defined by a combination of MDX set expressions.

case
In data mining, a case is an abstract view of data characterized by attributes and relations to other cases.

case key
In data mining, the element of a case by which the case is referenced within a case set.

case set
In data mining, a set of cases.

cell
In a cube, the set of properties, including a value, specified by the intersection when one member is selected from each dimension.

cellset
In ADO MD, an object that contains a collection of cells selected from cubes or other cellsets by a multidimensional query.


changing dimension
A dimension that has a flexible member structure, and is designed to support frequent changes to structure and data.

chart data region
A report item on a report layout that displays data in a graphical format.

child
A member in the next lower level in a hierarchy that is directly related to the current member.

clickthrough report
A report that displays related report model data when you click data within a rendered Report Builder report.

clustering
A data mining technique that analyzes data to group records together according to their location within the multidimensional attribute space.

collation
A set of rules that determines how data is compared, ordered and presented.

column-level collation
Supporting multiple collations in a single instance.

composite key
A key composed of two or more columns.

concatenation
The combining of two or more character strings or expressions into a single character string or expression, or to combine two or more binary strings or expressions into a single binary string or expression.

concurrency
A process that allows multiple users to access and change shared data at the same time. SQL Server uses locking to allow multiple users to access and change shared data at the same time without conflicting with each other.


conditional split
A transformation that routes data rows to different outputs, depending on the content of the data.

config file
See Other Term: configuration file

configuration
In reference to a single microcomputer, the sum of a system's internal and external components, including memory, disk drives, keyboard, video and generally less critical add-on hardware, such as a mouse, modem or printer.

configuration file
A file that contains machine-readable operating specifications for a piece of hardware or software, or that contains information about another file or about a specific user.

configurations
In Integration Services, a name or value pair that updates the value of package objects when the package is loaded.

connection
An interprocess communication (IPC) linkage established between a SQL Server application and an instance of SQL Server.

connection manager
In Integration Services, a logical representation of a run-time connection to a data source.

constant
A group of symbols that represent a specific data value.

container
A control flow element that provides package structure.

control flow
The ordered workflow in an Integration Services package that performs tasks.


control-break report
A report that summarizes data in user-defined groups or breaks. A new group is triggered when different data is encountered.

cube
A set of data that is organized and summarized into a multidimensional structure defined by a set of dimensions and measures.

cube role
A collection of users and groups with the same access to a cube.

custom rollup
An aggregation calculation that is customized for a dimension level or member, and that overrides the aggregate functions of a cube's measures.

custom rule
In a role, a specification that limits the dimension members or cube cells that users in the role are permitted to access.

custom variable
An aggregation calculation that is customized for a dimension level or member and overrides the aggregate functions of a cube's measures.

data dictionary
A set of system tables, stored in a catalog, that includes definitions of database structures and related information, such as permissions.

data explosion
The exponential growth in size of a multidimensional structure, such as a cube, due to the storage of aggregated data.

data flow
The ordered workflow in an Integration Services package that extracts, transforms and loads data.

data flow engine
An engine that executes the data flow in a package.


data flow task
Encapsulates the data flow engine that moves data between sources and destinations, providing the facility to transform, clean and modify data as it is moved.

data integrity
A state in which all the data values stored in the database are correct.

data manipulation language
The subset of SQL statements that is used to retrieve and manipulate data.

data mart
A subset of the contents of a data warehouse.

data member
A child member associated with a parent member in a parent-child hierarchy.

data mining
The process of analyzing data to identify patterns or relationships.

data processing extension
A component in Reporting Services that is used to retrieve report data from an external data source.

data region
A report item that displays repeated rows of data from an underlying dataset in a table, matrix, list or chart.

data scrubbing
Part of the process of building a data warehouse out of data coming from multiple (OLTP) systems.

data source
In ADO and OLE DB, the location of a source of data exposed by an OLE DB provider. Also, the source of data for an object such as a cube or dimension, or the specification of the information necessary to access source data; this sometimes refers to an object of ClassType clsDataSource. In Reporting Services, a specified data source type, connection string and credentials, which can be saved separately to a report server and shared among report projects or embedded in an .rdl file.


data source name
The name assigned to an ODBC data source.

data source view
A named selection of database objects that defines the schema referenced by OLAP and data mining objects in an Analysis Services database.

data warehouse
A database specifically structured for query and analysis.

database role
A collection of users and groups with the same access to an Analysis Services database.

data-driven subscription
A subscription in Reporting Services that uses a query to retrieve subscription data from an external data source at run time.

datareader
A stream of data that is returned by an ADO.NET query.

dataset
In OLE DB for OLAP, the set of multidimensional data that is the result of running an MDX SELECT statement. In Reporting Services, a named specification that includes a data source definition, a query definition and options.

decision support
Systems designed to support the complex analytic analysis required to discover business trends.

decision tree
A treelike model of data produced by certain data mining methods.

default member
The dimension member used in a query when no member is specified for the dimension.


delimited identifier
An object in a database that requires the use of special characters (delimiters) because the object name does not comply with the formatting rules of regular identifiers.

delivery channel type
The protocol for a delivery channel, such as Simple Mail Transfer Protocol (SMTP) or File.

delivery extension
A component in Reporting Services that is used to distribute a report to specific devices or target locations.

density
In an index, the frequency of duplicate values. In a data file, a percentage that indicates how full a data page is. In Analysis Services, the percentage of cells that contain data in a multidimensional structure.

dependencies
Objects that depend on other objects in the same database.

derived column
A transformation that creates new column values by applying expressions to transformation input columns.

descendant
A member in a dimension hierarchy that is related to a member of a higher level within the same dimension.

destination
An Integration Services data flow component that writes the data from the data flow into a data source or creates an in-memory dataset.

destination adapter
A data flow component that loads data into a data store.

dimension
A structural attribute of a cube, which is an organized hierarchy of categories (levels) that describe data in the fact table.

dimension granularity: The lowest level available to a particular dimension in relation to a particular measure group.
dimension table: A table in a data warehouse whose entries describe data in a fact table. Dimension tables contain the data from which dimensions are created.
discretized column: A column that represents finite, counted data.
document map: A navigation pane in a report arranged in a hierarchy of links to report sections and groups.
drillthrough: In Analysis Services, a technique to retrieve the detailed data from which the data in a cube cell was summarized. In Reporting Services, a way to open related reports by clicking hyperlinks in the main drillthrough report.
drillthrough report: A report with the 'enable drilldown' option selected. Drillthrough reports contain hyperlinks to related reports.
dynamic connection string: In Reporting Services, an expression that you build into the report, allowing the user to select which data source to use at run time. You must build the expression and data source selection list into the report when you create it.
Data Mining Model Training: The process a data mining model uses to estimate model parameters by evaluating a set of known and predictable data.
entity: In Reporting Services, a logical collection of model items, including source fields, roles, folders and expressions, presented in familiar business terms.
executable: In Integration Services, a package, Foreach Loop, For Loop, Sequence or task.

execution tree: The path of data in the data flow of a SQL Server 2008 Integration Services package from sources through transformations to destinations.
expression: In SQL, a combination of symbols and operators that evaluates to a single data value. In Integration Services, a combination of literals, constants, functions and operators that evaluates to a single data value.
ETL: Extraction, transformation and loading. The complex process of copying and cleaning data from heterogeneous sources.
fact: A row in a fact table in a data warehouse. A fact contains values that define a data event such as a sales transaction.
fact dimension: A relationship between a dimension and a measure group in which the dimension's main table is the same as the measure group's table.
fact table: A central table in a data warehouse schema that contains numerical measures and keys relating facts to dimension tables.
field length: In bulk copy, the maximum number of characters needed to represent a data item in a bulk copy character format data file.
field terminator: In bulk copy, one or more characters marking the end of a field or row, separating one field or row in the data file from the next.
filter expression: An expression used for filtering data in the Filter operator.


flat file: A file consisting of records of a single record type, in which there is no embedded structure information governing relationships between records.
flattened rowset: A multidimensional data set presented as a two-dimensional rowset in which unique combinations of elements of multiple dimensions are combined on an axis.
folder hierarchy: A bounded namespace that uniquely identifies all reports, folders, shared data source items and resources that are stored in and managed by a report server.
format file: A file containing metadata (such as data type and column size) that is used to interpret data when it is being read from or written to a data file.
File connection manager: In Integration Services, a logical representation of a connection that enables a package to reference an existing file or folder or to create a file or folder at run time.
For Loop container: In Integration Services, a container that runs a control flow repeatedly by testing a condition.
Foreach Loop container: In Integration Services, a container that runs a control flow repeatedly by using an enumerator.
Fuzzy Grouping: In Integration Services, a data cleaning methodology that examines values in a dataset and identifies groups of related data rows and the one data row that is the canonical representation of the group.
global assembly cache: A machine-wide code cache that stores assemblies specifically installed to be shared by many applications on the computer.
grant: To apply permissions to a user account, which allows the account to perform an activity or work with data.


granularity: The degree of specificity of information that is contained in a data element.
granularity attribute: The single attribute used to specify the level of granularity for a given dimension in relation to a given measure group.
grid: A view type that displays data in a table.
grouping: A set of data that is grouped together in a report.
hierarchy: A logical tree structure that organizes the members of a dimension such that each member has one parent member and zero or more child members.
hybrid OLAP: A storage mode that uses a combination of multidimensional data structures and relational database tables to store multidimensional data.
HTML Viewer: A UI component consisting of a report toolbar and other navigation elements used to work with a report.
input member: A member whose value is loaded directly from the data source instead of being calculated from other data.
input set: The set of data provided to an MDX value expression upon which the expression operates.
isolation level: The property of a transaction that controls the degree to which data is isolated for use by one process, and is guarded against interference from other processes. Setting the isolation level defines the default locking behavior for all SELECT statements in your SQL Server session.
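The isolation level entry above can be illustrated with a short Transact-SQL sketch. The table and column names (Sales.Orders, OrderTotal, OrderDate) are assumed for the example, not taken from the course:

```sql
-- Raise the isolation level for this session before a critical read.
-- SERIALIZABLE is the highest level (see the 'serializable' entry):
-- rows read by the transaction are locked until it completes.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN TRANSACTION;
    SELECT SUM(OrderTotal)
    FROM Sales.Orders
    WHERE OrderDate >= '20080101';
COMMIT TRANSACTION;
```

The setting applies to all subsequent statements in the session until it is changed again.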


item-level role assignment: A security policy that applies to an item in the report server folder namespace.
item-level role definition: A security template that defines a role used to control access to or interaction with an item in the report server folder namespace.
key: A column or group of columns that uniquely identifies a row (primary key), defines the relationship between two tables (foreign key) or is used to build an index.
key attribute: The attribute of a dimension that links the non-key attributes in the dimension to related measures.
key column: In an Analysis Services dimension, an attribute property that uniquely identifies the attribute members. In an Analysis Services mining model, a data mining column that uniquely identifies each case in a case table.
key performance indicator: A quantifiable, standardized metric that reflects a critical business variable (for instance, market share), measured over time.
KPI: See Other Term: key performance indicator
latency: The amount of time that elapses between when a data change is completed at one server and when that change appears at another server.
leaf: In a tree structure, an element that has no subordinate elements.
leaf level: The bottom level of a clustered or nonclustered index.


leaf member: A dimension member without descendants.
level: The name of a set of members in a dimension hierarchy such that all members of the set are at the same distance from the root of the hierarchy.
lift chart: In Analysis Services, a chart that compares the accuracy of the predictions of each data mining model in the comparison set.
linked dimension: In Analysis Services, a reference in a cube to a dimension in a different cube.
linked measure group: In Analysis Services, a reference in a cube to a measure group in a different cube.
linked report: A report that references an existing report definition by using a different set of parameter values or properties.
list data region: A report item on a report layout that displays data in a list format.
local cube: A cube created and stored with the extension .cub on a local computer using PivotTable Service.
lookup table: In Integration Services, a reference table for comparing, matching or extracting data.
many-to-many dimension: A relationship between a dimension and a measure group in which a single fact may be associated with many dimension members and a single dimension member may be associated with many facts.
matrix data region: A report item on a report layout that displays data in a variable columnar format.


measure: In a cube, a set of values that are usually numeric and are based on a column in the fact table of the cube. Measures are the central values that are aggregated and analyzed.
measure group: All the measures in a cube that derive from a single fact table in a data source view.
member: An item in a dimension representing one or more occurrences of data.
member property: Information about an attribute member, for example, the gender of a customer member or the color of a product member.
mining structure: A data mining object that defines the data domain from which the mining models are built.
multidimensional OLAP: A storage mode that uses a proprietary multidimensional structure to store a partition's facts and aggregations or a dimension.
multidimensional structure: A database paradigm that treats data as cubes that contain dimensions and measures in cells.
MDX: See Other Term: Multidimensional Expressions
Mining Model: An object that contains the definition of a data mining process and the results of the training activity.
Multidimensional Expressions: A syntax used for defining multidimensional objects and querying and manipulating multidimensional data.
named set: A set of dimension members or a set expression that is created for reuse, for example, in MDX queries.
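A minimal MDX query can make several of these terms concrete (measure, dimension member, named set). The Sales cube, the [Sales Amount] measure and the Date/Product hierarchies here are assumptions for illustration only:

```mdx
-- Define a named set, then query a measure for those members.
WITH SET [Top Bikes] AS
    TOPCOUNT([Product].[Category].[Bikes].Children, 5,
             [Measures].[Sales Amount])
SELECT
    { [Measures].[Sales Amount] } ON COLUMNS,
    [Top Bikes]                    ON ROWS
FROM [Sales]
WHERE ( [Date].[Calendar Year].&[2008] )
```

The WHERE clause here is a slice (see the 'slice' entry): it limits the result to the 2008 member of the Date dimension.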


natural hierarchy: A hierarchy in which at every level there is a one-to-many relationship between members in that level and members in the next lower level.
nested table: A data mining model configuration in which a column of a table contains a table.
nonleaf: In a tree structure, an element that has one or more subordinate elements. In Analysis Services, a dimension member that has one or more descendants. In SQL Server indexes, an intermediate index node that points to other intermediate nodes or leaf nodes.
nonleaf member: A member with one or more descendants.
normalization rules: A set of database design rules that minimizes data redundancy and results in a database in which the Database Engine and application software can easily enforce integrity.
Non-scalable EM: A Microsoft Clustering algorithm method that uses a probabilistic method to determine the probability that a data point exists in a cluster.
Non-scalable K-means: A Microsoft Clustering algorithm method that uses a distance measure to assign a data point to its closest cluster.
object identifier: A unique name given to an object. In Metadata Services, a unique identifier constructed from a globally unique identifier (GUID) and an internal identifier.
online analytical processing: A technology that uses multidimensional structures to provide rapid access to data for analysis.
online transaction processing: A data processing system designed to record all of the business transactions of an organization as they occur. An OLTP system is characterized by many concurrent users actively adding and modifying data.

overfitting: The characteristic of some data mining algorithms that assigns importance to random variations in data by viewing them as important patterns.
ODBC data source: The location of a set of data that can be accessed using an ODBC driver. A stored definition that contains all of the connection information an ODBC application requires to connect to the data source.
ODBC driver: A dynamic-link library (DLL) that an ODBC-enabled application, such as Excel, can use to access an ODBC data source.
OLAP: See Other Term: online analytical processing
OLE DB: A COM-based API for accessing data. OLE DB supports accessing data stored in any format for which an OLE DB provider is available.
OLE DB for OLAP: Formerly, the separate specification that addressed OLAP extensions to OLE DB. Beginning with OLE DB 2.0, OLAP extensions are incorporated into the OLE DB specification.
package: A collection of control flow and data flow elements that runs as a unit.
padding: A string, typically added when the last plaintext block is short. The space allotted in a cell to create or maintain a specific size.
parameterized report: A published report that accepts input values through parameters.
parent: A member in the next higher level in a hierarchy that is directly related to the current member.


partition: In replication, a subset of rows from a published table, created with a static row filter or a parameterized row filter. In Analysis Services, one of the storage containers for data and aggregations of a cube. Every cube contains one or more partitions. For a cube with multiple partitions, each partition can be stored separately in a different physical location. Each partition can be based on a different data source. Partitions are not visible to users; the cube appears to be a single object. In the Database Engine, a unit of a partitioned table or index.
partition function: A function that defines how the rows of a partitioned table or index are spread across a set of partitions based on the values of certain columns, called partitioning columns.
partition scheme: A database object that maps the partitions of a partition function to a set of filegroups.
partitioned index: An index built on a partition scheme, and whose data is horizontally divided into units which may be spread across more than one filegroup in a database.
partitioned snapshot: In merge replication, a snapshot that includes only the data from a single partition.
partitioned table: A table built on a partition scheme, and whose data is horizontally divided into units which may be spread across more than one filegroup in a database.
partitioning: The process of replacing a table with multiple smaller tables.
partitioning column: The column of a table or index that a partition function uses to partition a table or index.
perspective: A user-defined subset of a cube.
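The Database Engine partitioning objects above fit together in a fixed order: function, then scheme, then table. A hedged Transact-SQL sketch, in which the filegroup, table and column names are all illustrative:

```sql
-- 1. Partition function: maps OrderDate values to three ranges.
CREATE PARTITION FUNCTION pfOrderDate (datetime)
    AS RANGE RIGHT FOR VALUES ('20070101', '20080101');

-- 2. Partition scheme: maps each range to a filegroup.
CREATE PARTITION SCHEME psOrderDate
    AS PARTITION pfOrderDate TO (fg2006, fg2007, fg2008);

-- 3. Partitioned table, built on the scheme; OrderDate is the
--    partitioning column.
CREATE TABLE dbo.FactOrders
(
    OrderDate  datetime NOT NULL,
    OrderTotal money    NOT NULL
) ON psOrderDate (OrderDate);
```

An index created ON psOrderDate(OrderDate) would likewise be a partitioned index.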


pivot: To rotate rows to columns, and columns to rows, in a crosstabular data browser. To choose dimensions from the set of available dimensions in a multidimensional data structure for display in the rows and columns of a crosstabular structure.
polling query: A query, typically a singleton query, that returns a value that Analysis Services can use to determine whether changes have been made to a table or other relational object.
precedence constraint: A control flow element that connects tasks and containers into a sequenced workflow.
predictable column: A data mining column that the algorithm will build a model around based on values of the input columns.
prediction: A data mining technique that analyzes existing data and uses the results to predict values of attributes for new records or missing attributes in existing records.
proactive caching: A system that manages data obsolescence in a cube by which objects in MOLAP storage are automatically updated and processed in cache while queries are redirected to ROLAP storage.
process: In a cube, to populate a cube with data and aggregations. In a data mining model, to populate a data mining model with data mining content.
profit chart: In Analysis Services, a chart that displays the theoretical increase in profit that is associated with using each model.
properties page: A dialog box that displays information about an object in the interface.


property: A named attribute of a control, field or database object that you set to define one of the object's characteristics, such as size, color or screen location; or an aspect of its behavior, such as whether it is hidden.
property mapping: A mapping between a variable and a property of a package element.
property page: A tabbed dialog box where you can identify the characteristics of tables, relationships, indexes, constraints and keys.
protection level: In Integration Services, determines the protection method, the password or user key, and the scope of package protection.
ragged hierarchy: See Other Term: unbalanced hierarchy
raw file: In Integration Services, a native format for fast reading and writing of data to files.
recursive hierarchy: A hierarchy of data in which all parent-child relationships are represented in the data.
reference dimension: A relationship between a dimension and a measure group in which the dimension is coupled to the measure group through another dimension. This behaves like a snowflake dimension, except that attributes are not shared between the two dimensions.
reference table: The source table to use in fuzzy lookups.
refresh data: The series of operations that clears data from a cube, loads the cube with new data from the data warehouse and calculates aggregations.


relational database: A database or database management system that stores information in tables as rows and columns of data, and conducts searches by using the data in specified columns of one table to find additional data in another table.
relational database management system: A system that organizes data into related rows and columns.
relational OLAP: A storage mode that uses tables in a relational database to store multidimensional structures.
rendered report: A fully processed report that contains both data and layout information, in a format suitable for viewing.
rendering: A component in Reporting Services that is used to process the output format of a report.
rendering extension: A plug-in that renders reports to a specific format.
rendering object model: The report object model used by rendering extensions.
replay: In SQL Server Profiler, the ability to open a saved trace and play it again.
report definition: The blueprint for a report before the report is processed or rendered. A report definition contains information about the query and layout for the report.
report execution snapshot: A report snapshot that is cached.
report history: A collection of report snapshots that are created and saved over time.


report history snapshot: A report snapshot that appears in report history.
report intermediate format: A static report history that contains data captured at a specific point in time.
report item: Any object, such as a text box, graphical element or data region, that exists on a report layout.
report layout: In report designer, the placement of fields, text and graphics within a report. In report builder, the placement of fields and entities within a report, plus applied formatting styles.
report layout template: A predesigned table, matrix or chart report template in report builder.
report link: A URL to a hyperlinked report.
report model: A metadata description of business data used for creating ad hoc reports in report builder.
report processing extension: A component in Reporting Services that is used to extend the report processing logic.
report rendering: The action of combining the report layout with the data from the data source for the purpose of viewing the report.
report server database: A database that provides internal storage for a report server.
report server execution account: The account under which the Report Server Web service and Report Server Windows service run.


report server folder namespace: A hierarchy that contains predefined and user-defined folders. The namespace uniquely identifies reports and other items that are stored in a report server. It provides an addressing scheme for specifying reports in a URL.
report snapshot: A static report that contains data captured at a specific point in time.
report-specific schedule: A schedule defined inline with a report.
resource: Any item in a report server database that is not a report, folder or shared data source item.
role: A SQL Server security account that is a collection of other security accounts that can be treated as a single unit when managing permissions. A role can contain SQL Server logins, other roles, and Windows logins or groups. In Analysis Services, a role uses Windows security accounts to limit scope of access and permissions when users access databases, cubes, dimensions and data mining models. In a database mirroring session, the principal server and mirror server perform complementary principal and mirror roles. Optionally, the role of witness is performed by a third server instance.
role assignment: A definition of user access rights to an item. In Reporting Services, a security policy that determines whether a user or group can access a specific item and perform an operation.
role definition: A collection of tasks performed by a user (for example, browser or administrator). In Reporting Services, a named collection of tasks that defines the operations a user can perform on a report server.
role-playing dimension: A single database dimension joined to the fact table on different foreign keys to produce multiple cube dimensions.

RDBMS: See Other Term: relational database management system
RDL: See Other Term: Report Definition Language
Report Definition Language: A set of instructions that describe layout and query information for a report.
Report Server service: A Windows service that contains all the processing and management capabilities of a report server.
Report Server Web service: A Web service that hosts, processes and delivers reports.
ReportViewer controls: A Web server control and Windows Form control that provide embedded report processing in ASP.NET and Windows Forms applications.
scalar: A single-value field, as opposed to an aggregate.
scalar aggregate: An aggregate function, such as MIN(), MAX() or AVG(), that is specified in a SELECT statement column list that contains only aggregate functions.
scale bar: The line on a linear gauge on which tick marks are drawn, analogous to an axis on a chart.
scope: The extent to which a variable can be referenced in a DTS package.
script: A collection of Transact-SQL statements used to perform an operation.


security extension: A component in Reporting Services that authenticates a user or group to a report server.
semiadditive: A measure that can be summed along one or more, but not all, dimensions in a cube.
serializable: The highest transaction isolation level. Serializable transactions lock all rows they read or modify to ensure the transaction is completely isolated from other tasks.
server: The network location from which report builder is launched and where a report is saved, managed and published.
server admin: A user with elevated privileges who can access all settings and content of a report server.
server aggregate: An aggregate value that is calculated on the data source server and included in a result set by the data provider.
shared data source item: Data source connection information that is encapsulated in an item.
shared dimension: A dimension created within a database that can be used by any cube in the database.
shared schedule: Schedule information that can be referenced by multiple items.
sibling: A member in a dimension hierarchy that is a child of the same parent as a specified member.
slice: A subset of the data in a cube, specified by limiting one or more dimensions by members of the dimension.


smart tag: A design-time feature that exposes key configurations directly on the design surface to enhance overall design-time productivity in Visual Studio 2005.
snowflake schema: An extension of a star schema such that one or more dimensions are defined by multiple tables.
source: An Integration Services data flow component that extracts data from a data store, such as files and databases.
source control: A way of storing and managing different versions of source code files and other files used in software development projects. Also known as configuration management and revision control.
source cube: The cube on which a linked cube is based.
source database: In data warehousing, the database from which data is extracted for use in the data warehouse. In replication, a database on the Publisher from which data and database objects are marked for replication as part of a publication that is propagated to Subscribers.
source object: The single object to which all objects in a particular collection are connected by way of relationships that are all of the same relationship type.
source partition: An Analysis Services partition that is merged into another and is deleted automatically at the end of the merger process.
sparsity: The relative percentage of a multidimensional structure's cells that do not contain data.
star join: A join between a fact table (typically a large fact table) and at least two dimension tables.


star query: A query that joins a fact table and a number of dimension tables.
star schema: A relational database structure in which data is maintained in a single fact table at the center of the schema with additional dimension data stored in dimension tables.
subreport: A report contained within another report.
subscribing server: A server running an instance of Analysis Services that stores a linked cube.
subscription: A request for a copy of a publication to be delivered to a Subscriber.
subscription database: A database at the Subscriber that receives data and database objects published by a Publisher.
subscription event rule: A rule that processes information for event-driven subscriptions.
subscription scheduled rule: One or more Transact-SQL statements that process information for scheduled subscriptions.
Secure Sockets Layer (SSL): A proposed open standard for establishing a secure communications channel to prevent the interception of critical information, such as credit card numbers. Primarily, it enables secure electronic financial transactions on the World Wide Web, although it is designed to work on other Internet services as well.
Semantic Model Definition Language: A set of instructions that describe layout and query information for reports created in report builder.
Sequence container: A container that defines a control flow that is a subset of the package control flow.
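The star query and star schema entries can be sketched together in Transact-SQL. All table and column names here (FactSales, DimDate, DimProduct and their keys) are assumptions for illustration:

```sql
-- A star query: one central fact table joined to two dimension
-- tables on their surrogate keys, then aggregated.
SELECT d.CalendarYear,
       p.Category,
       SUM(f.SalesAmount) AS TotalSales
FROM FactSales AS f
JOIN DimDate    AS d ON f.DateKey    = d.DateKey
JOIN DimProduct AS p ON f.ProductKey = p.ProductKey
GROUP BY d.CalendarYear, p.Category;
```

If DimProduct were itself split into Product and ProductCategory tables, the same layout would be a snowflake schema.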


table data region: A report item on a report layout that displays data in a columnar format.
tablix: A Reporting Services RDL data region that contains rows and columns resembling a table or matrix, possibly sharing characteristics of both.
target partition: An Analysis Services partition into which another is merged, and which contains the data of both partitions after the merger.
temporary stored procedure: A procedure placed in the temporary database, tempdb, and erased at the end of the session.
time dimension: A dimension that breaks time down into levels such as Year, Quarter, Month and Day. In Analysis Services, a special type of dimension created from a date/time column.
transformation: In data warehousing, the process of changing data extracted from source data systems into arrangements and formats consistent with the schema of the data warehouse. In Integration Services, a data flow component that aggregates, merges, distributes and modifies column data and rowsets.
transformation error output: Information about a transformation error.
transformation input: Data that is contained in a column, which is used during a join or lookup process, to modify or aggregate data in the table to which it is joined.
transformation output: Data that is returned as a result of a transformation procedure.
tuple: Uniquely identifies a cell, based on a combination of attribute members from every attribute hierarchy in the cube.

two-phase commit: A process that ensures transactions that apply to more than one server are completed on all servers or on none.
unbalanced hierarchy: A hierarchy in which one or more levels do not contain members in one or more branches of the hierarchy.
unknown member: A member of a dimension for which no key is found during processing of a cube that contains the dimension.
unpivot: In Integration Services, the process of creating a more normalized dataset by expanding data columns in a single record into multiple records.
value expression: An expression in MDX that returns a value. Value expressions can operate on sets, tuples, members, levels, numbers or strings.
variable interval: An option on a Reporting Services chart that can be specified to automatically calculate the optimal number of labels that can be placed on an axis, based on the chart width or height.
vertical partitioning: To segment a single table into multiple tables based on selected columns.
very large database: A database that has become large enough to be a management challenge, requiring extra attention to people and processes.
visual: A displayed, aggregated cell value for a dimension member that is consistent with the displayed cell values for its displayed children.
VLDB: See Other Term: very large database


write back: To update a cube cell value, member or member property value.
write enable: To change a cube or dimension so that users in cube roles with read/write access to the cube or dimension can change its data.
writeback: In SQL Server, the update of a cube cell value, member or member property value.
Web service: In Reporting Services, a service that uses Simple Object Access Protocol (SOAP) over HTTP and acts as a communications interface between client programs and the report server.
XML for Analysis: A specification that describes an open standard that supports data access to data sources that reside on the World Wide Web.
XMLA: See Other Term: XML for Analysis
