
SQL Server 2005 Integration Services - Part 38 - Pivot Transformation
By Marcin Policht
The extensive ETL (Extraction, Transformation, and Loading) capabilities on which SQL Server
2005 Integration Services is based deliver such essential functionality as the
combination and cleanup of data originating from heterogeneous sources, or the scheduling
and coordination of activities that frequently take place beyond the boundaries of
database management systems. While a substantial number of these features help
with traditional database administration tasks, a few are intended primarily
for data analysis. In this article, we will cover the Pivot transformation,
which is one of the more popular choices in this category.
The Pivot operation (just like its T-SQL equivalent) modifies the way in which a recordset is
presented, typically by rotating row data into columns (SSIS also offers the Unpivot
transformation, which reverses this process). Even though these changes do not
introduce any new data, they tend to enhance the ability to analyze existing content by
simplifying comparisons and uncovering less apparent trends. In order to help you
understand this concept, we will present an example illustrating pivot operation. As our
data source, we will use sample spreadsheets available on the Microsoft Web site in the
form of Excel 2002 Sample: PivotTable Reports which you need to extract to an
arbitrary folder by running the downloadable Report.exe. The target location will host
SampleSalespersonReports.xls (which we will manipulate throughout the course of this
article), SampleProductReports.xls, SampleOrderReports.xls, and
SampleCustomerReports.xls Excel workbooks. Even though they were designed with
Excel Pivot Table functionality in mind, we will be able to leverage them for the purpose
of our demonstration.
The spreadsheet serving as our data source ('Source Data' in the
SampleSalespersonReports.xls) contains inventory of orders handled between July
2003 and May 2005 by nine salespeople located in the USA and the UK. Our intention
is to convert it into a recordset that would allow us to easily determine the total value
of orders for each salesperson during each year. More specifically, we want the
outcome to consist of five columns - Salesperson, Country, 2003 Orders Amount, 2004
Orders Amount, and 2005 Orders Amount. Since the values stored in the last three
need to be calculated by adding individual order amounts on per salesperson and per
year basis, we will use the Derived Column and Aggregate transformations for this
purpose. Once the summarized data is available (still in the original format), we will
reorganize it by applying Pivot. The final result will be saved in a spreadsheet by using
the Excel Destination Data Flow component.
To accomplish this, start by initiating a new project of Integration Services type in the
Business Intelligence Development Studio. Add to the newly created project a Data
Flow task (by dragging its icon from the Toolbox onto Designer interface) and double-
click on it to switch to its tabbed area. Create Excel Source (listed under Data Flow
sources in the Toolbox) and display its Editor (by selecting the Edit... entry from its
context sensitive menu). Within the Editor window, click on the New... command
button to provide parameters for Connection Manager, pointing to
SampleSalespersonReports.xls. Ensure that the "Table or view" option appears in the
"Data access mode" listbox and pick 'Source Data$' as Name of the Excel sheet. Switch
to the Column section within the Editor window and mark Country, Salesperson, Order
Date, and Order Amount in the Available External Columns listing. Once you complete
these steps, click on the OK button to close the Editor window.
Next, drag Derived Column transform from the Toolbox and connect the output of our
Excel Source with its input. Launch its Editor window and define a new derived column
named Order Year, calculated using the YEAR([Order Date]) expression (set its data
type to two byte unsigned integer). Accept the changes by clicking on the OK button.
Once back on the Data Flow tab, add an Aggregate transform to the Data Flow task area
and drag the green arrow originating from the Derived Column to its input. On the
Aggregation tab of its Editor window, select Country, Salesperson, Order Year and
Order Amount in the Available Input Columns box and ensure that the first three are
listed with "Group by" operation and the last one has "Sum" applied to it. Close the
Editor window and return to the Data Flow task tab.
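The Derived Column and Aggregate steps above can be sketched in plain Python. This is only an illustration of the logic, not of how SSIS implements it. The sample rows and salesperson names are hypothetical, since the spreadsheet contents are not reproduced in this article.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows: (Country, Salesperson, Order Date, Order Amount)
orders = [
    ("USA", "Alice", date(2003, 7, 10), 100.0),
    ("USA", "Alice", date(2003, 9, 2), 50.0),
    ("USA", "Alice", date(2004, 1, 15), 75.0),
    ("UK", "Bob", date(2004, 3, 1), 20.0),
]


def aggregate_by_year(rows):
    """Group by (Country, Salesperson, Order Year) and sum Order Amount."""
    totals = defaultdict(float)
    for country, salesperson, order_date, amount in rows:
        order_year = order_date.year  # plays the role of YEAR([Order Date])
        totals[(country, salesperson, order_year)] += amount
    return dict(totals)


totals = aggregate_by_year(orders)
```

Each key of `totals` corresponds to one row leaving the Aggregate transform, which is the shape the Pivot step expects.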
The next transformation that needs to be included in our package is Pivot. Once you have
dropped it onto the Data Flow area from the Toolbox, connect the output of Aggregate
to its input and select Edit or Show Advanced Editor - interestingly both present you
with the same Advanced Editor for Pivot window. Once there, review the Component
Properties tab and switch to the Input Columns tab. Ensure that all available input
columns (Salesperson, Order Amount, Country, and Order Year) are selected and
switch to the Input and Output Properties. This is where the majority of configuration
takes place.
As mentioned before, our goal is to display the outcome in the specific format, with
three extra columns (2003 Orders Amount, 2004 Orders Amount, and 2005 Orders
Amount), in addition to the two original ones - Salesperson and Country. With the
assistance of Derived Column and Aggregate, we have so far managed to create a
recordset with Salesperson, Country, Order Year, and Order Amount fields, which
contains the total amount of orders for a specific salesperson in a given year, giving us
27 rows (9 salespeople times 3 years) - down from 799 rows in the
SampleSalespersonReports.xls spreadsheet. At this point, we want to rearrange
records in such way that instead of Order Year and Order Amount columns, we will
have three columns, one per each year covered by our sales inventory (giving us a
table with 9 rows and 5 columns - with a single row for each salesperson) listing the
amount of sales for an individual salesperson in that year. According to pivot
nomenclature, Salesperson and Country function as SetKeys (values in these input
columns identify records that need to be grouped together in the same output row),
Order Year serves the role of the PivotKey (the column whose values are used to determine
additional columns in the resulting recordset), and Order Amount contains
PivotedValues (which are copied to the new columns created by pivot). Keep in mind
that each combination of SetKey and PivotKey values has to be unique across input rows
(which is the case, since the data has been aggregated prior to applying the pivot).
Continue our configuration by expanding the Pivot Default Input node, which lists all
input columns. For each, you need to define its role in the pivot process, by setting the
PivotUsage custom property, which can take on one of the following values:
• 0 - indicates that content of the column is simply copied to the output,
• 1 - designates a column participating in the SetKey (this value should be assigned for
Salesperson and Country),
• 2 - identifies the PivotKey column (Order Year in our case),
• 3 - intended for PivotedValues (Order Amount).
For all input columns, take a note of the values of their LineageID property, since you
will need to know them to proceed with the next step. Once completed, switch to the
Pivot Default Output node and create the following output columns:
• Salesperson - set its SourceColumn custom property to match the LineageID
parameter of the Salesperson input column,
• Country - set its SourceColumn custom property to match the LineageID
parameter of the Country input column,
• 2003 Orders - set its SourceColumn custom property to match the LineageID
parameter of the Order Amount input column and its PivotKeyValue to the
number 2003 (needs to be equal to "2003" integer value in Order Year column),
• 2004 Orders - set its SourceColumn custom property to match the LineageID
parameter of the Order Amount input column and its PivotKeyValue to the
number 2004 (needs to be equal to "2004" integer value in Order Year column),
• 2005 Orders - set its SourceColumn custom property to match the LineageID
parameter of the Order Amount input column and its PivotKeyValue to the
number 2005 (needs to be equal to "2005" integer value in Order Year column).
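The role of PivotKeyValue in the output column list above can be illustrated with a small Python sketch. It mimics the configuration (set key, pivot key, pivoted value) rather than the SSIS internals. The function name, the sample rows, and the salesperson names are assumptions made for the illustration.

```python
# Output column for each PivotKeyValue, mirroring the configuration above.
PIVOT_KEY_TO_COLUMN = {2003: "2003 Orders", 2004: "2004 Orders", 2005: "2005 Orders"}


def pivot(rows):
    """Pivot rows with Salesperson, Country, Order Year, Order Amount.

    Assumes rows sharing a set key (Salesperson, Country) are adjacent.
    A year missing from PIVOT_KEY_TO_COLUMN raises KeyError.
    """
    output = []
    current_key = None
    for row in rows:
        set_key = (row["Salesperson"], row["Country"])
        if set_key != current_key:
            # A new set key starts a new output row with empty pivoted columns.
            out_row = {"Salesperson": row["Salesperson"], "Country": row["Country"]}
            out_row.update({column: None for column in PIVOT_KEY_TO_COLUMN.values()})
            output.append(out_row)
            current_key = set_key
        # The pivot key value selects which output column receives the amount.
        output[-1][PIVOT_KEY_TO_COLUMN[row["Order Year"]]] = row["Order Amount"]
    return output


# Hypothetical aggregated input rows.
aggregated = [
    {"Salesperson": "Alice", "Country": "USA", "Order Year": 2003, "Order Amount": 150.0},
    {"Salesperson": "Alice", "Country": "USA", "Order Year": 2004, "Order Amount": 75.0},
    {"Salesperson": "Alice", "Country": "USA", "Order Year": 2005, "Order Amount": 10.0},
    {"Salesperson": "Bob", "Country": "UK", "Order Year": 2004, "Order Amount": 20.0},
]
result = pivot(aggregated)
```

A salesperson with no orders in a given year simply keeps `None` in that column, which corresponds to a NULL in the pivoted output.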
Confirm your choices by clicking on the OK button and return to the Data Flow tab
area. To capture the results, create an Excel Destination, connect its input with the
output of the Pivot transformation and specify the target spreadsheet by assigning
appropriate values in its Excel Connection Manager. The outcome should contain five
columns and nine rows, listing aggregate order values for each salesperson in each of
the three years covered by the SampleSalespersonReports.xls.
It is important to remember that correct output requires that SetKeys entries
containing identical values appear in adjacent input rows. In our example, this was
handled by the Aggregate component (grouping all records by salesperson); however,
in cases where this operation is not needed, make sure you introduce a Sort
transformation prior to performing pivot. Otherwise, you will end up with a separate
row for each non-adjacent value in the SetKey column (and NULL entries in some of
pivoted columns for this row). For example, if three rows for a given salesperson were
not grouped together in our Pivot input data, we would end up with three output rows
sharing the same SetKey value (i.e. for the total of 11 rows in the output recordset).
One of them would contain total sales in the 2003 Orders column as well as two NULLs
under 2004 and 2005 Orders, while the remaining two would have a single value in the
2004 Orders and 2005 Orders columns, respectively (and NULLs in the other two
columns).
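The adjacency requirement can be reproduced with Python's itertools.groupby, which, like the behavior described above, starts a new group every time the key value changes. This is an analogy rather than a model of the SSIS component, and the rows and names are hypothetical.

```python
from itertools import groupby


def pivot_adjacent(rows):
    """Pivot (salesperson, year, amount) tuples, one output row per run of equal names."""
    output = []
    for name, group in groupby(rows, key=lambda row: row[0]):
        by_year = {year: amount for _, year, amount in group}
        output.append((name, by_year.get(2003), by_year.get(2004), by_year.get(2005)))
    return output


# Hypothetical input in which rows for the same salesperson are not adjacent.
unsorted_rows = [
    ("Alice", 2003, 10.0),
    ("Bob", 2003, 5.0),
    ("Alice", 2004, 20.0),
    ("Bob", 2004, 6.0),
    ("Alice", 2005, 30.0),
]

fragmented = pivot_adjacent(unsorted_rows)      # one output row per input row
compact = pivot_adjacent(sorted(unsorted_rows))  # one output row per salesperson
```

Sorting first plays the part of the Sort transformation: the fragmented result has five rows padded with `None`, while the sorted input yields two.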
SQL Server 2005 Books Online (November 2008)
Pivot Transformation
Updated: 14 April 2006
The Pivot transformation makes a normalized data set into a less normalized but more
compact version by pivoting the input data on a column value. For example, a
normalized Orders data set that lists customer name, product, and quantity purchased
typically has multiple rows for any customer who purchased multiple products, with
each row for that customer showing order details for a different product. By pivoting
the data set on the product column, the Pivot transformation can output a data set
with a single row per customer. That single row lists all the purchases by the customer,
with the product names shown as column names, and the quantity shown as a value in
the product column. Because not every customer purchases every product, many
columns may contain null values.
When a dataset is pivoted, input columns perform different roles in the pivoting
process. A column can participate in the following ways:
• The column is passed through unchanged to the output. Because many input
rows can result only in one output row, the transformation copies only the first
input value for the column.
• The column acts as the key or part of the key that identifies a set of records.
• The column defines the pivot. The values in this column are associated with
columns in the pivoted dataset.
• The column contains values that are placed in the columns that the pivot
creates.
The following diagram shows a data set before the data is pivoted on the Product
column.

The following diagram shows a data set after the data has been pivoted on the
Product column.

To pivot data efficiently, which means creating as few records in the output dataset as
possible, the input data must be sorted on the pivot column. If the data is not sorted,
the Pivot transformation might generate multiple records for each value in the set key,
which is the column that defines set membership. For example, if the dataset is
pivoted on a Name column but the names are not sorted, the output dataset could
have more than one row for each customer, because a pivot occurs every time that the
value in Name changes.
The input data might contain duplicate rows, which will cause the Pivot transformation
to fail. "Duplicate rows" means rows that have the same values in the set key columns
and the pivot columns. For example, if you use the data set before the data is pivoted
on the Product column, as shown in the diagram, and add a row with Kate in the Cust
column and Soda in the Product column, these duplicate values would cause the
Pivot transformation to fail, regardless of the quantity in the Qty column. To avoid
failure, you can either configure the transformation to redirect error rows to an error
output or you can pre-aggregate values to ensure there are no duplicate rows. For
example, in the sample data set, you could sum the values in the Qty column by
customer and product.
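As a rough sketch of the pre-aggregation idea, the following Python detects duplicate (customer, product) pairs and sums their quantities. The quantities are hypothetical, since the diagram is not reproduced here, and the function names are invented for the illustration.

```python
from collections import Counter, defaultdict

# Hypothetical (Cust, Product, Qty) rows; Kate/Soda appears twice.
rows = [
    ("Kate", "Ham", 2),
    ("Kate", "Soda", 6),
    ("Kate", "Soda", 1),
    ("Fred", "Beer", 24),
]


def find_duplicates(rows):
    """Return (Cust, Product) pairs that occur in more than one row."""
    counts = Counter((cust, product) for cust, product, _ in rows)
    return sorted(key for key, count in counts.items() if count > 1)


def pre_aggregate(rows):
    """Sum Qty by (Cust, Product) so each pair occurs exactly once."""
    totals = defaultdict(int)
    for cust, product, qty in rows:
        totals[(cust, product)] += qty
    return [(cust, product, qty) for (cust, product), qty in sorted(totals.items())]


duplicates = find_duplicates(rows)
clean_rows = pre_aggregate(rows)
```

After aggregation no pair repeats, so the condition that would make the pivot fail no longer occurs.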
The Pivot transformation uses the properties on its input and output columns to define
the pivot operation.
The Pivot transformation includes the PivotKeyValue custom property. This property
can be updated by a property expression when the package is loaded. For more
information, see Integration Services Expression Reference, Using Property
Expressions in Packages, and Transformation Custom Properties.
This transformation has one input, one regular output, and one error output.
Configuring the Sample Dataset
The sample dataset shown in the diagram was configured as follows: the PivotUsage
property of the Cust column was set to 1, to indicate that it is a set key column; the
PivotUsage property of the Product input column was set to 2, to indicate that a
column must be created for each product; the PivotUsage property of the Qty input
column was set to 3, to indicate that quantity values are placed into the pivot column.
The transformation output was configured to include six columns. The columns, which
can be added by using the Advanced Editor dialog box, were named Cust, Ham,
Soda, Milk, Beer, and Chips. The PivotKeyValue property of the Ham column was
set to Ham, to indicate that the transformation should look for that value in the input
column. Similarly, the PivotKeyValue property of the Soda column was set to Soda,
and so on.
Columns in the transformation input were then mapped to columns in the output.
The SourceColumn property of the Cust column was configured to use the lineage
identifier of the Cust input column. The SourceColumn properties of the Ham, Soda,
Milk, Beer, and Chips columns were configured to use the lineage identifier of the
Qty input column. Another way to configure this would be to set the SourceColumn
property of the Ham, Soda, Milk, Beer, and Chips columns to -1, which would insert
the value True instead of the data value. For example, instead of the values 12 and 24,
the Beer column would then contain the value True, to indicate only that the customer
had purchased the product, instead of showing the quantity purchased.
The rows in the transformation output contain the values from the Cust and Qty input
columns.
Pivot Options
You set the PivotUsage property of the input columns to specify the role each column
performs in the pivoting process. The valid values of PivotUsage are 0, 1, 2, and 3.
The following table describes the PivotUsage options.
Option  Description
0       The column is not pivoted, and the column values are passed through to the
        transformation output.
1       The column is part of the set key that identifies one or more rows as part of
        one set. All input rows with the same set key are combined into one output
        row.
2       The column is a pivot column. At least one column is created from each column
        value.
3       The values from this column are placed in columns that are created as a result
        of the pivot.
Configuring the Pivot Transformation
You can set properties through SSIS Designer or programmatically.
For more information about the properties that you can set in the Advanced Editor
dialog box or programmatically, click one of the following topics:
• Common Properties
• Transformation Custom Properties
For more information about how to set the properties, click one of the following topics:
• How to: Set the Properties of a Data Flow Component in the Properties Window
• How to: Set the Properties of a Data Flow Component Using the Advanced
Editor
Unpivot Transformation
Updated: 14 April 2006
The Unpivot transformation makes an unnormalized dataset into a more normalized
version by expanding values from multiple columns in a single record into multiple
records with the same values in a single column. For example, a dataset that lists
customer names has one row for each customer, with the products and the quantity
purchased shown in columns in the row. After the Unpivot transformation normalizes
the data set, the data set contains a different row for each product that the customer
purchased.
The following diagram shows a data set before the data is unpivoted on the Product
column.

The following diagram shows a data set after it has been unpivoted on the Product
column.

Under some circumstances, the unpivot results may contain rows with unexpected
values. For example, if the sample data to unpivot shown in the diagram had null
values in all the Qty columns for Fred, then the output would include only one row for
Fred, not five. The Qty column would contain either null or zero, depending on the
column data type.
The Unpivot transformation includes the PivotKeyValue custom property. This
property can be updated by a property expression when the package is loaded. For
more information, see Integration Services Expression Reference, Using Property
Expressions in Packages, and Transformation Custom Properties.
This transformation has one input and one output. It has no error output.
Configuring the Unpivot Transformation
You can set properties through SSIS Designer or programmatically.
For more information about the properties that you can set in the Unpivot
Transformation Editor dialog box, click one of the following topics:
• Unpivot Transformation Editor
For more information about the properties that you can set in the Advanced Editor
dialog box or programmatically, click one of the following topics:
• Common Properties
• Transformation Custom Properties
For more information about how to set the properties, click one of the following topics:
• How to: Set the Properties of a Data Flow Component Using a Component
Editor
• How to: Set the Properties of a Data Flow Component in the Properties Window
• How to: Set the Properties of a Data Flow Component Using the Advanced
Editor
Pivot and UnPivot with SSIS
By : Dinesh Asanka
Nov 28, 2007

Next, we need to derive the Quarter. Even though we could modify the initial T-SQL to
return the Quarter, I have used the Derived Column data flow transformation. The
following expression is used to derive the Quarter.
MONTH(OrderDate) >= 1 && MONTH(OrderDate) <= 3 ? 1 :
MONTH(OrderDate) >= 4 && MONTH(OrderDate) <= 6 ? 2 :
MONTH(OrderDate) >= 7 && MONTH(OrderDate) <= 9 ? 3 :
MONTH(OrderDate) >= 10 && MONTH(OrderDate) <= 12 ? 4 : 0
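
For readers who want to check the logic of the expression above, here is a small Python sketch. The first function translates the nested conditional directly; the second is an arithmetic shortcut that gives the same result for months 1 to 12. The function names are hypothetical and the shortcut is not part of the original package.

```python
def quarter_nested(month):
    """Direct translation of the nested conditional expression."""
    return (1 if 1 <= month <= 3 else
            2 if 4 <= month <= 6 else
            3 if 7 <= month <= 9 else
            4 if 10 <= month <= 12 else 0)


def quarter_arithmetic(month):
    """Equivalent for valid months: integer-divide the zero-based month by 3."""
    return (month - 1) // 3 + 1


quarters = [quarter_nested(month) for month in range(1, 13)]
```

The trailing 0 in the nested form is only reached for a month outside 1 to 12, which MONTH should not return for a valid date.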

We now need to group the above data by Category and Quarter. We can use the
Aggregate transformation and configure it to group by Name and intQtr.

Next we need to add a Sort transformation, and here I have sorted by category. The data
must be sorted on the key column, otherwise pivot will not work properly. To see the data
up to this point, you can add a data viewer.
Below is a screenshot of the data set you should be getting, which is the data set we
need to pivot.
We have now reached the core part of this article: pivoting. The configuration of the
Pivot transformation is not exactly straightforward. On the input tab of
the Pivot transformation, you need to select the columns that you will use in the pivot
operation, which in this case is all three available columns.
The next most important tab is the ‘Input and Output’ properties tab, pictured below.
For input columns, we need to configure the pivot usage attribute.

Option Description

0 The column is not pivoted, and the column values are passed through to
the transformation output.

1 The column is part of the set key that identifies one or more rows as part
of one set. All input rows with the same set key are combined into one
output row.

2 The column is a pivot column. At least one column is created from each
column value.

3 The values from this column are placed in columns that are created as a
result of the pivot.

Source: Books on line, SQL Server 2005

According to the above table, the Name column should have Option 1, intQtr should have
Option 2, and OrderQty should have Option 3 as the pivot usage attribute value.
SSIS: UNPIVOT Transformation
Turning some columns into rows was one of the tasks I had to do recently. Even though I
couldn’t use the “SSIS UNPIVOT transformation” for that, I had a chance to play with it. As it is
a really useful data flow item for some operations, I thought I would write a post on it.

Given below is the data contained in a text file.


ProjectName  equipments  transportation  rental  software  hardware
CCN          54632.56    78433.00        9876    0         0
FX5          100547.55   205465.00       99526   78000     45465

Assume we need to load the above information structured as shown below.


ProjectName ExpenseType Amount
CCN equipments 54632.56
CCN hardware 0.00
CCN rental 9876.00
CCN software 0.00
CCN transportation 78433.00
FX5 equipments 100547.55
FX5 hardware 45465.00
FX5 rental 99526.00
FX5 software 78000.00
FX5 transportation 205465.00

Simply create an SSIS package and add the necessary source (for the text file) and destination items.
Then add an UNPIVOT transformation item and connect the source output path to it. Open the
UNPIVOT transformation editor and select equipments, transportation, rental, software and
hardware as Input Columns. Do not select ProjectName. Set “Amount” as the Destination Column
for all Input Columns. The Pivot Key Value will be the same as the Input Column name. Enter
“ExpenseType” for the Pivot key value column name. Connect the output of the UNPIVOT transformation item
to the destination. It is done!
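
The same reshaping can be sketched in a few lines of Python, using the two rows from the text file above. This only illustrates what the unpivot does to the data; the function and variable names are hypothetical, and the output follows the input column order rather than the alphabetical order of the listing above.

```python
# The two input rows from the text file.
projects = [
    {"ProjectName": "CCN", "equipments": 54632.56, "transportation": 78433.00,
     "rental": 9876, "software": 0, "hardware": 0},
    {"ProjectName": "FX5", "equipments": 100547.55, "transportation": 205465.00,
     "rental": 99526, "software": 78000, "hardware": 45465},
]

# Columns selected as Input Columns; ProjectName is deliberately left out.
EXPENSE_COLUMNS = ["equipments", "transportation", "rental", "software", "hardware"]


def unpivot(rows):
    """Emit one (ProjectName, ExpenseType, Amount) row per expense column."""
    output = []
    for row in rows:
        for column in EXPENSE_COLUMNS:
            output.append({
                "ProjectName": row["ProjectName"],
                "ExpenseType": column,         # the pivot key value is the column name
                "Amount": float(row[column]),  # all values land in one destination column
            })
    return output


expenses = unpivot(projects)
```

Two input rows with five expense columns each produce the ten rows shown in the target structure.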

Since my requirement was a little bit different, I had to load the data into a SQL Server temp table
and use the UNPIVOT T-SQL command. But for a scenario like this, this method can be easily applied.

You may need to add a Data Conversion item to convert the data if the destination is SQL Server.
How to convert a row to a column


RedDennis Thursday, August 02, 2007 1:21 PM


hi, I have a requirement as stated below


Convert one row in a table to a column:
Input
Name ID Date Profile Manager
John 12 1/1/1900 Admin 12
Cary 1 1/1/1900 Admin 12

Output

Name John Cary


ID 12 1
Date 1/1/1900 1/1/1900
Profile Admin Admin
Manager 12 12

I think I have to use the Pivot transformation in the data flow, but I am not sure how to
configure it. Can anyone suggest how to implement or configure this?

Thanks in advance!

All Replies

• Eric Wisdahl (Moderator) Thursday, August 02, 2007 5:53 PM


What it appears that you are trying to do doesn't make much sense. Would you
continue to add columns to the table if you had more than two records? Do you
know what type of records you would like to get out? What would tie the records
together such that you know that all of the values in the "Column" are related?

It appears that you would really just like to present the data differently, which will
be a front end application job (C# or VB).

If I am misreading your question, here is a brief overview of what a pivot
(denormalize) / unpivot (normalize) would do:

Pivot:

Name Type Amount


name1,typeA,2
name1,typeB,3
name2,typeB,1
name3,typeC,4

===>

Name typeA typeB typeC


name1,2,3,0
name2,0,1,0
name3,0,0,4

Notice that the name in the column "Type" is treated as the column name when
pivoted. These names can be set up to be different from the value which causes
them to pivot, but most people will leave them the same so as to not confuse
themselves later.
The value that was stored in the column "Amount" is transferred to the column
name associated with it.

The record is identified by the key value of name.

UnPivot:
Name typeA typeB typeC
name1,2,3,0
name2,0,1,0
name3,0,0,4

==>

Name Type Amount


name1,typeA,2
name1,typeB,3
name2,typeB,1
name3,typeC,4

Notice that the column name is transferred to the pivoting column "Type" and the
value that was stored in that column is pivoted into the column "Amount". The
record is identified by the key value of name.

<Rant>
NOTE:
There have been a few questions lately on how you would pivot multiple rows,
which usually look something like the following:

Name T1 D1 T2 D2
a,txt1,01/01/2000,txt2,01/01/2007
b,txt7,02/02/2010,txt3,08/08/2004

and they would like to "Pivot" to

Name T D
a,txt1,01/01/2000
a,txt2,01/01/2007
b,txt7,02/02/2010
b,txt3,08/08/2004

This IS NOT pivoting! This is subselecting into a generic category. A pivot or
unpivot operation must include the column name which is being transferred.
</Rant>
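
For what it is worth, the reshaping described in the note above (which is not a pivot) could be sketched in Python as splitting each row into one output row per (T, D) pair. The function name is hypothetical and this is not an SSIS configuration.

```python
# Rows from the note above: Name, T1, D1, T2, D2
wide_rows = [
    ("a", "txt1", "01/01/2000", "txt2", "01/01/2007"),
    ("b", "txt7", "02/02/2010", "txt3", "08/08/2004"),
]


def split_pairs(rows):
    """Emit one (Name, T, D) row for each (T, D) pair in a wide row."""
    output = []
    for name, *pairs in rows:
        # Walk the remaining columns two at a time: (T1, D1), (T2, D2), ...
        for index in range(0, len(pairs), 2):
            output.append((name, pairs[index], pairs[index + 1]))
    return output


long_rows = split_pairs(wide_rows)
```

Note that the source column names (T1, D1, T2, D2) are discarded, which is exactly why this does not qualify as a pivot or unpivot.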


• RedDennis Thursday, August 02, 2007 6:56 PM


I have an Excel workbook with many sheets; each sheet has a table with a different structure.
The first sheet has a list of all tables (it's like a table of contents). I will get one table name at a time
from the contents sheet and go to the respective sheet, which has the record information.
The first row has the column info, and based upon this info I have to create a table
dynamically.
What I am trying to do is get the column names (the number of columns and their names change
frequently) and build a table dynamically in the database.
For that I need to get the column names from the first row of the Excel sheet
and convert them to a single column, so that I can loop over them one by one and
dynamically generate a DDL script for the table. The diagram above illustrates how
I want to unpivot the table.
