
Naming conventions and effective practices for SQL Server Integration Services (SSIS)

Writing maintainable code

Sunil Kumar Arvindakshan Deloitte Consulting LLP October 2011

Purpose of document
This whitepaper describes naming conventions and effective practices for SQL Server Integration Services (SSIS), based on the author's experience on SSIS projects. SQL Server Integration Services (SSIS) is a component of the Microsoft SQL Server database software that can be used to perform a broad range of data migration tasks. SSIS is a platform for data integration and workflow applications, featuring a fast and flexible data warehousing tool used for data extraction, transformation, and loading (ETL). The tool may also be used to automate maintenance of SQL Server databases and updates to multidimensional cube data.

This paper is intended to help developers during all stages of a project. A software development life cycle passes through stages such as development and unit testing, system testing, user acceptance testing (UAT), and production, so different personnel may work on different modules at different stages. In this scenario, one of the main challenges for an SSIS developer is maintaining consistent names for the various SSIS components. A good naming convention helps developers easily locate and identify SSIS objects during all stages of project development.

The recommendations for effective practices in this whitepaper are based on the author's experience of building SSIS packages in a real-world environment. Effective practices help to maintain high-quality deliverables and improve performance and maintainability. They usually evolve as the product or solution evolves over time.

Naming conventions
This section suggests naming conventions that can be followed for the various SSIS components.

Control flow containers/tasks naming convention

Control Flow Item                         Prefix
ActiveX Script                            TSK_AXS_
Analysis Services Execute DDL             TSK_ASE_
Analysis Services Processing              TSK_ASP_
Bulk Insert                               TSK_BLK_
Data Flow                                 TSK_DF_
Data Mining Query                         TSK_DMQ_
Data Profiling Task                       TSK_DP_
Execute DTS 2000 Package                  TSK_EDTS_
Execute Package                           TSK_EPKG_
Execute Process                           TSK_EPRC_
Execute SQL                               TSK_ESQL_
File System                               TSK_FSYS_
For Loop Container                        CON_FL_
Foreach Loop Container                    CON_FEL_
FTP                                       TSK_FTP_
Message Queue                             TSK_MSMQ_
Script                                    TSK_SCR_
Send Mail                                 TSK_MAIL_
Sequence Container                        CON_SEQ_
Transfer Database                         TSK_TDB_
Transfer Error Messages                   TSK_TEM_
Transfer Jobs                             TSK_TJOB_
Transfer Logins                           TSK_TLOG_
Transfer Master Stored Procedures         TSK_TMSP_
Transfer SQL Server Objects               TSK_TSSO_
Web Service                               TSK_WS_
WMI Data Reader                           TSK_WMIDR_
WMI Event Watcher                         TSK_WMIEW_
XML                                       TSK_XML_

Data flow transformations naming convention

Data Flow Transformation                  Prefix
Aggregate                                 DFT_AGG_
Audit                                     DFT_AUD_
Cache Transform                           DFT_CT_
Character Map                             DFT_CHM_
Conditional Split                         DFT_CSPL_
Copy Column                               DFT_CPYC_
Data Conversion                           DFT_DCNV_
Data Mining Query                         DFT_DMQ_
Derived Column                            DFT_DERC_
Export Column                             DFT_EXPC_
Fuzzy Grouping                            DFT_FZGRP_
Fuzzy Lookup                              DFT_FZLKP_
Import Column                             DFT_IMPC_
Lookup                                    DFT_LKP_
Merge                                     DFT_MRG_
Merge Join                                DFT_MRGJ_
Multicast                                 DFT_MLT_
OLE DB Command                            DFT_CMD_<DB>_
Percentage Sampling                       DFT_PSMP_
Pivot                                     DFT_PVT_
Row Count                                 DFT_RCNT_
Row Sampling                              DFT_RSMP_
Script Component                          DFT_SCR_
Slowly Changing Dimension                 DFT_SCD_
Sort                                      DFT_SRT_
Term Extraction                           DFT_TEX_
Term Lookup                               DFT_TEL_
Tx TopQueries                             DFT_TTQ_
Union All                                 DFT_UALL_
Unpivot                                   DFT_UPVT_

Data flow sources naming convention

Data Flow Source                          Prefix
ADO NET Source                            SRC_ADONET_
DataReader Source                         SRC_DR_
Excel Source                              SRC_XLS_
Flat File Source                          SRC_FF_
OLE DB Source                             SRC_OLEDB_<DB>_
Performance Counters Source               SRC_PCNT_
Raw File Source                           SRC_RF_
XML Source                                SRC_XML_

Data flow destinations naming convention

Data Flow Destination                     Prefix
ADO NET Destination                       DST_ADONET_
Data Mining Model Training                DST_DMMT_
DataReader Destination                    DST_DR_
Dimension Processing                      DST_DP_
Excel Destination                         DST_XLS_
Flat File Destination                     DST_FF_
OLE DB Destination                        DST_OLEDB_<DB>_
Partition Processing                      DST_PP_
Raw File Destination                      DST_RF_
Recordset Destination                     DST_RS_
SQL Server Compact Edition Destination    DST_SSCE_
SQL Server Destination                    DST_SS_
SQL Server Mobile Destination             DST_SSM_

Connection manager naming convention

Connection Manager                        Prefix
ADO                                       CM_ADO_
ADO.NET                                   CM_ADONET_
Cache                                     CM_CH_
Excel                                     CM_XLS_
File                                      CM_<FileType>_
Flat File                                 CM_FF_
FTP                                       CM_FTP_
HTTP                                      CM_HTTP_
MSMQ                                      CM_MSMQ_
MSOLAP90                                  CM_AS90_
Multi File                                CM_MFILE_
Multi Flat File                           CM_MFF_
ODBC                                      CM_ODBC_
OLE DB                                    CM_OLEDB_<DB>_
SMTP                                      CM_SMTP_
SMOServer                                 CM_SMO_
SQLMobile                                 CM_SQLM_
WMI                                       CM_WMI_

Variable naming convention

- Use CAPITAL CASE for variables that hold values of configurable properties (for example, a variable named MAX_RETRY_COUNT), to help distinguish them from other variables.
- Use CamelCase for variables that do not hold values of configurable properties (for example, rowCount).
- Use descriptive variable names.
- Avoid single-character variable names such as i or n.
- Avoid Hungarian notation.

Effective practices
The following effective practices describe common performance-tuning techniques that you can apply to your SSIS solutions. These recommendations are based on lessons learned during production performance-tuning efforts. Effective practices improve overall performance, quality, and maintainability. However, other factors also affect performance; infrastructure and the network are among them.

- Use annotations wherever possible to document the package.
- Use a SELECT <column list> statement to fetch data from a table or view instead of using the table dropdown available in the OLE DB Source, Lookup transformation, and Fuzzy Lookup transformation.
- Perform data type conversion in the SELECT statement. This avoids an extra step in a Data Conversion or Derived Column transformation.
- Filter data in the source adapter wherever possible rather than filtering with a Conditional Split transformation after retrieving the data.
- When using a dynamic SQL statement in an OLE DB Source component, build the SQL statement in a variable with EvaluateAsExpression = True and set AccessMode = SQL Command from variable.
- Define variable scope properly; do not scope variables to the package container unless they are used at that level.
- In a Conditional Split transformation, place the condition most likely to be true before all other conditions. This improves performance because fewer conditions need to be evaluated per row.
- Avoid Sort and Aggregate transformations by performing these operations at the source wherever possible. These transformations must read and process all input rows before producing any output rows.
- Set IsSorted = True (using the Advanced Editor) on the source adapter if the data from the source is sorted. Setting this value does not perform a sort; it only indicates that the data is already sorted. Also specify the SortKeyPosition of the output columns.
- Remove columns that are not used in the downstream pipeline. This reduces the buffer size being used and also reduces OnWarning events at execution time.
- Use the Merge Join transformation instead of the Lookup transformation where possible, especially for large datasets.
- Use the Merge transformation instead of the Union All transformation for combining two sorted datasets.
- Use variables to store expressions and configurations. This makes them reusable across objects.
- Use package configurations to store environment-related information so it is easier to move packages across environments.
- Store the entire connection string when storing information about an OLE DB Connection Manager in a configuration, instead of storing individual properties such as Initial Catalog, Username, and Password.
- Organize the package layout using the Format menu options such as Autosize and Auto Layout. This makes it easier to understand the flow of activities within the package.
- Organize the package structure into logical units using Sequence containers. This makes it easier to identify what the package does and also helps to control transactions if they are implemented.
- Always log to a text file, even if you also log elsewhere; logging to a text file has the least dependency on external factors. Use dynamic file names when logging to a file so a new log file is created for each execution.
- Avoid database access in a Script component's ProcessInputRow method, since ProcessInputRow is called for each row of the input pipeline and can cause performance problems when the input volume is large.
- Avoid reading from and writing to the same table in the same Data Flow task or in parallel Data Flow tasks, as this may lead to deadlocks when the data volume is large.
- Avoid multiple inserts into the same table in the same Data Flow task or in parallel Data Flow tasks, as this may also lead to deadlocks when the data volume is large. For multiple inserts into the same table, use a Union All transformation to combine all inputs, followed by a single destination.
- Avoid cross-database queries in packages, as they may cause collation problems at execution time. Also, if one of the databases is moved to a different server, the packages will fail. This can be critical if the package is already in production, because any change to the package would need to be thoroughly tested.
- Verify and set appropriate data types for columns in the Flat File Connection Manager.
- Prefer custom tasks/components over Script tasks/components, because custom components are more reusable.
- Set ProtectionLevel = DontSaveSensitive for a package if it is to be shared with other team members or deployed to other development, QA, UAT, or production systems.
- Set DelayValidation = True on tasks and connection managers to load packages quickly in BIDS. Setting this value skips validation of the tasks and connection managers at load time; validation then happens at run time.
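As an illustrative sketch of the source-side practices above (explicit column list, data type conversion in the SELECT statement, and filtering at the source), a query typed into an OLE DB Source component might look like the following. The table and column names (dbo.SalesOrder, OrderAmount, OrderDate) are hypothetical.

```sql
-- Hypothetical source query for an OLE DB Source component.
-- Explicit column list instead of the table dropdown; columns not
-- needed downstream are simply not selected, keeping buffers small.
SELECT
    OrderID,
    CustomerID,
    -- Data type conversion done here, avoiding a separate
    -- Data Conversion / Derived Column transformation downstream.
    CAST(OrderAmount AS DECIMAL(18, 2)) AS OrderAmount,
    CONVERT(DATE, OrderDate)            AS OrderDate
FROM dbo.SalesOrder
-- Filter at the source rather than with a Conditional Split
-- after the rows have already entered the data flow.
WHERE OrderDate >= '2011-01-01'
-- Pre-sorted output: set IsSorted = True on the source adapter and
-- SortKeyPosition = 1 on OrderID in the Advanced Editor.
ORDER BY OrderID;
```

For the dynamic variant, the same statement would be assembled in a String variable whose expression (with EvaluateAsExpression = True) concatenates the filter value, and the source's AccessMode would be set to SQL Command from variable.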

Conclusion
Today, SSIS is widely used across industries such as telecommunications, retail, commercial real estate, finance, and supply chain. It is therefore important to follow proper naming conventions and effective practices to produce high-quality deliverables. This paper is intended to help SSIS developers adopt and apply good naming conventions. The effective practices explained in this whitepaper are based on lessons learned while working on SSIS projects. By understanding the importance of naming conventions and effective practices, you can make more informed design decisions and develop better solutions.

Sunil Aravindakshan is a Consultant with Deloitte Consulting LLP. He has work experience in data warehousing with ETL using DataStage and SSIS, and experience in developing, maintaining, and supporting data warehouse projects, including design, development, and testing. Sunil can be reached at sarvindakshan@deloitte.com.

This publication contains general information only and is based on the experiences and research of Deloitte practitioners. Deloitte is not, by means of this publication, rendering business, financial, investment, or other professional advice or services. This publication is not a substitute for such professional advice or services, nor should it be used as a basis for any decision or action that may affect your business. Before making any decision or taking any action that may affect your business, you should consult a qualified professional advisor. Deloitte, its affiliates, and related entities shall not be responsible for any loss sustained by any person who relies on this publication. As used in this document, Deloitte means Deloitte Consulting LLP, a subsidiary of Deloitte LLP. Please see www.deloitte.com/us/about for a detailed description of the legal structure of Deloitte LLP and its subsidiaries. Certain services may not be available to attest clients under the rules and regulations of public accounting. Copyright 2011 Deloitte Development LLC. All rights reserved.
