SvalTech
The Basics of Database Archiving
DAMA Kansas City Chapter
9 March 2010
Jack E. Olson jack.olson@SvalTech.com www.svaltech.com
Database Archiving: How to Keep Lots of Data for a Long Time, Jack E. Olson, Morgan Kaufmann, 2008
Copyright SvalTech, Inc., 2009
Topics
Database Archiving Definitions
Database Archiving Application Profiles
Elements of a Successful Implementation
Solution Comparisons
Business Case Basics
Database Archiving Definitions
Definition
The process of removing from operational databases selected data items that are not expected to be referenced again, and storing them in an archive database where they can be retrieved if needed.
Physical Documents: application forms, mortgage papers, prescriptions
File Archiving: structured files, source code, reports
Document Archiving: Word, PDF, Excel, XML
Multi-media Files: pictures, sound, telemetry
Email Archiving: Outlook, Lotus Notes
Database Archiving: DB2, IMS, Oracle, SAP, PeopleSoft
Business Records: the Archive Unit
You don't archive databases; you archive data from databases.
A Business Record is the data captured and maintained for a single business event or to describe a single real-world object.
Databases are collections of Business Records.
Database Archiving is Records Retention.
Examples: customer, employee, stock trade, purchase order, deposit, loan payment
Data Retention
The requirement to keep data for a business object for a specified period of time. The object cannot be destroyed until the time for all such requirements applicable to it has passed.
Business Requirements
Regulatory Requirements
The Data Retention requirement is the longest of all requirement lines.
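A minimal sketch of this rule, assuming each requirement line is expressed in whole years counted from the object's create event. The requirement names and durations are made up for illustration and do not come from any regulation.

```python
from datetime import date

# Hypothetical requirement lines for one type of business object, in years
# counted from the object's create event.
REQUIREMENTS = {
    "business: customer service": 3,
    "business: trend analysis": 5,
    "regulatory: financial reporting": 7,
}


def retention_years(requirements):
    """Return the longest requirement line, which sets the retention period."""
    return max(requirements.values())


def earliest_discard_date(created, requirements):
    """Return the first date on which every requirement line has elapsed."""
    target_year = created.year + retention_years(requirements)
    try:
        return created.replace(year=target_year)
    except ValueError:
        # Created on 29 February and the target year is not a leap year.
        return date(target_year, 3, 1)


print(retention_years(REQUIREMENTS))
print(earliest_discard_date(date(2010, 3, 9), REQUIREMENTS))
```

Adding a new requirement line can only lengthen the retention period, never shorten it, which is why the maximum is the right aggregate.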
Data Retention
Retention requirements vary by business object type
Retention requirements from regulations are exceeding business requirements
Retention requirements will vary by country
Retention requirements imply the obligation to maintain the authenticity of the data throughout the retention period
Retention requirements imply the requirement to faithfully render the data on demand in a common business form understandable to the requestor
The most important business objects tend to have the longest retention periods
The data with the longest retention periods tends to accumulate the largest number of instances
Retention requirements often exceed 10 years. Requirements exist for 25, 50, 70 and more years for some applications
Data Time Lines
for a single instance of a business record
create event -> operational phase -> reference phase -> inactive phase -> discard event
operational phase
can be updated, can be deleted, may participate in processes that create or update other data
reference phase
used for business reporting, extracted into business intelligence or analytic databases, anticipated queries
inactive phase
no expectation of being used again, no known business value, being retained solely for the purpose of satisfying retention requirements. Must be available on request in the rare event a need arises.
Data Time Lines
Some objects exit the operational phase almost immediately (financial records)
Some objects never exit the operational phase (customer name and address)
Most transaction data has an operational phase of less than 10% of the retention requirement and a reference phase of less than 20% of the retention requirement
Inactive data generally does not require access to application programs: only access to ad hoc search and extract tools
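The percentages above can be turned into a small worked example. This is a sketch only: the 10% and 20% figures are the upper bounds quoted on the slide, the 20-year retention period is an assumed input, and the function names are hypothetical.

```python
def phase_lengths(retention_years, operational_pct=10, reference_pct=20):
    """Split a retention period into (operational, reference, inactive) years."""
    operational = retention_years * operational_pct / 100
    reference = retention_years * reference_pct / 100
    inactive = retention_years - operational - reference
    return operational, reference, inactive


def phase_at(age_years, retention_years, operational_pct=10, reference_pct=20):
    """Name the phase a business record is in at a given age."""
    operational, reference, _ = phase_lengths(
        retention_years, operational_pct, reference_pct)
    if age_years < operational:
        return "operational"
    if age_years < operational + reference:
        return "reference"
    if age_years < retention_years:
        return "inactive"
    return "past retention"


# With a 20-year retention requirement the record is operational for at most
# 2 years, in reference use for at most 4 more, and inactive for at least 14.
print(phase_lengths(20))
print(phase_at(10, 20))
```

Because the slide gives upper bounds for the first two phases, the inactive share computed here is a lower bound.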
Database Archiving Application Profiles
Overloaded Operational Database
Transaction data
Lots of data
  Hundreds of millions of rows
  High daily transaction rate
24/7 operational availability requirement
Long retention period (15 years or more)
Short useful active life (less than 2 years)
Low access requirements during the inactive period
  Very low access frequency
  Response time not critical
  Access requirements are simple, easily satisfied with ad hoc tools
Retired Application
Merger of companies results in an operational application being duplicated
Data structures are not compatible
  One keeps data elements not in the other
  One encodes data elements differently
  One designed for a different OS/DBMS than the other
Decision is made to use one system and abandon the other one
Meets all characteristics of an operational application
Application Renovation Project
Application is undergoing major change
  Replaced with packaged application
  Legacy modernization
  Legacy termination
  Rewritten to be web-centric
  Need to satisfy new requirements
Old data structures are out of date
  Legacy DBMS
  Legacy file system
Data meets all other requirements for archiving an operational application
Elements of a Successful Implementation
Archive Staff
Database Archive Specialist
  Received education on database archive design and implementation
  Knows tools available
  Experienced
  Full time job
Database Archive Administrator
  Received education on database archiving administration
  Full time job
Supporting Roles
  Storage Administrators
  Database Administrators
  Data Stewards
  Security Administrators
  Compliance staff
  IT management
  Business Unit Management
  Legal
  Records Management
Architecture of Database Archiving
[Architecture diagram. Labels: Operational System (application program, operational database); Archive Extractor; Archive Server; Archive Administrator, Archive Designer, Archive Data Manager, Archive Access Manager; archive catalog; archive storage]
Archive Designer Component
Metadata
  Capture current metadata
  Validate it
  Enhance it
  Design archive storage format
Data
  Define business records to be archived
  Define source of data
  Define data structures within operational system
  Define reference data needed to include with it
  Define archive format of data
Policies
  Define extract policy (when a record becomes inactive)
  Define operational disposal policy (when to remove from operational database)
  Define storage policy (how to protect data in archive)
  Define discard policy (when to remove from archive)
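The four policies can be pictured as one small record per type of business record. The sketch below is an illustration only: the class, its field names and the day-based thresholds are assumptions, not part of any archiving product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ArchivePolicies:
    """Hypothetical policy record for one type of business record."""
    inactive_after_days: int       # extract policy: when a record becomes inactive
    delete_from_operational: bool  # disposal policy: delete (True) or only mark (False)
    backup_copies: int             # storage policy: protected copies kept in the archive
    discard_after_days: int        # discard policy: when to remove from the archive

    def should_extract(self, last_activity, today):
        """True once the record has been idle long enough to count as inactive."""
        return (today - last_activity).days >= self.inactive_after_days

    def should_discard(self, created, today):
        """True once the retention period measured from creation has elapsed."""
        return (today - created).days >= self.discard_after_days


policies = ArchivePolicies(
    inactive_after_days=2 * 365,
    delete_from_operational=True,
    backup_copies=2,
    discard_after_days=15 * 365,
)
today = date(2010, 3, 9)
print(policies.should_extract(date(2007, 1, 1), today))
print(policies.should_discard(date(2007, 1, 1), today))
```

Keeping the policies in one immutable record makes it easy to store them with the archive design and to audit which rules were in force for a given run.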
Archive Extractor Components
Extractor process
  Verify consistency with design metadata
  Extract data as defined in designer
  Mark or delete from operational database as defined in designer
  Pass data to archive data manager
  Keep audit records on everything done
  Do not impact operational performance
  Support interruptions with transaction level recovery
  Support restart
  Finish scans within acceptable time periods
Scheduling
  Establish periodic executions
  Find non-disruptive periods
  Be consistent
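A minimal sketch of an extractor loop with these properties, using Python's built-in sqlite3 module and in-memory databases as stand-ins for the operational and archive systems. The table names, columns and the date-based extract rule are assumptions made for the example.

```python
import sqlite3


def extract_inactive(op_db, archive_db, cutoff):
    """Move orders closed before `cutoff` into the archive, one record at a time."""
    ids = [row[0] for row in op_db.execute(
        "SELECT id FROM orders WHERE closed_on < ? ORDER BY id", (cutoff,))]
    for order_id in ids:
        record = op_db.execute(
            "SELECT id, customer, closed_on FROM orders WHERE id = ?",
            (order_id,)).fetchone()
        archived = archive_db.execute(
            "SELECT 1 FROM archived_orders WHERE id = ?", (order_id,)).fetchone()
        if archived is None:
            # The archive copy and its audit record commit together or not at all.
            with archive_db:
                archive_db.execute(
                    "INSERT INTO archived_orders VALUES (?, ?, ?)", record)
                archive_db.execute(
                    "INSERT INTO audit (order_id, action) VALUES (?, 'archived')",
                    (order_id,))
        # Delete only after the archive copy is committed. If a run is
        # interrupted here, a restart finds the copy and only deletes.
        with op_db:
            op_db.execute("DELETE FROM orders WHERE id = ?", (order_id,))
    return len(ids)


# Demo with in-memory databases standing in for the real systems.
op_db = sqlite3.connect(":memory:")
op_db.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, closed_on TEXT)")
op_db.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, "acme", "2005-06-30"),
    (2, "acme", "2009-11-15"),
    (3, "zenith", "2006-01-10"),
])
op_db.commit()

archive_db = sqlite3.connect(":memory:")
archive_db.execute(
    "CREATE TABLE archived_orders (id INTEGER PRIMARY KEY, customer TEXT, closed_on TEXT)")
archive_db.execute("CREATE TABLE audit (order_id INTEGER, action TEXT)")

moved = extract_inactive(op_db, archive_db, "2008-03-09")
print(moved)
```

Working in short per-record transactions keeps locks on the operational database brief, and writing to the archive before deleting means an interrupted run can simply be started again.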
Archive Extractors
Physical vs. Application Extractors
[Diagram. Labels: Operational System (application program, archive extractor, operational database); Archive Extractor]
Physical Extractor
  Gets/deletes data directly from the database: tables, rows, columns
Application Extractor
  Gets/deletes data through an application API: virtual tables, rows, columns
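The difference between the two extractor types can be sketched as two implementations of one interface. Everything here is hypothetical: the `customer` table, the `CustomerApi` class and its method names are invented to show the shape of the idea.

```python
import sqlite3
from abc import ABC, abstractmethod


class Extractor(ABC):
    """Common interface: get a business record, then delete it from the source."""

    @abstractmethod
    def get(self, customer_id): ...

    @abstractmethod
    def delete(self, customer_id): ...


class PhysicalExtractor(Extractor):
    """Works directly on database tables, rows and columns."""

    def __init__(self, conn):
        self.conn = conn

    def get(self, customer_id):
        row = self.conn.execute(
            "SELECT id, name FROM customer WHERE id = ?", (customer_id,)).fetchone()
        return None if row is None else {"id": row[0], "name": row[1]}

    def delete(self, customer_id):
        with self.conn:
            self.conn.execute("DELETE FROM customer WHERE id = ?", (customer_id,))


class CustomerApi:
    """Hypothetical application API that hides the physical storage."""

    def __init__(self, customers):
        self._customers = dict(customers)

    def read_customer(self, customer_id):
        return self._customers.get(customer_id)

    def remove_customer(self, customer_id):
        self._customers.pop(customer_id, None)


class ApplicationExtractor(Extractor):
    """Works through the application API and sees only its virtual view."""

    def __init__(self, api):
        self.api = api

    def get(self, customer_id):
        return self.api.read_customer(customer_id)

    def delete(self, customer_id):
        self.api.remove_customer(customer_id)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.commit()
physical = PhysicalExtractor(conn)
application = ApplicationExtractor(CustomerApi({1: {"id": 1, "name": "Ada"}}))
print(physical.get(1), application.get(1))
```

With a shared interface, the rest of the archive pipeline does not need to know which kind of extractor supplied a record.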
Archive Data Manager Component
Put data away
  Receive data from extractors
  Format into archive segment files
  Determine metadata version affinity
  Format and store metadata files if new
  Build or update segment indexes, both internal and external
Execute Storage policies
  Encryption/signatures
  Backup copies created and stored
  Geographic dispersion of backups
  Register archive files with archive catalog
  Enter audit trail information
Fetch metadata on request
  Return to accessing programs
Fetch data on request
  Scan archive segments
  Search through indexes
Execute Archive Discard Process
  Periodic scheduling
  Delete qualifying business records
  Update archive catalog
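A rough sketch of three of these duties: putting a segment away, registering it in the catalog, and running a discard pass. It assumes JSON-lines segments held in memory, and it uses a SHA-256 digest as a simple integrity check standing in for a real signature scheme. All names are hypothetical.

```python
import hashlib
import json

storage = {}   # in-memory stand-in for archive storage: segment name -> bytes
catalog = {}   # in-memory stand-in for the archive catalog


def store_segment(name, records, metadata_version):
    """Format records into a segment, store it and register it in the catalog."""
    payload = "\n".join(
        json.dumps(r, sort_keys=True) for r in records).encode("utf-8")
    storage[name] = payload
    catalog[name] = {
        "metadata_version": metadata_version,
        "record_count": len(records),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }


def read_segment(name):
    """Return the records of a segment after checking it is unchanged."""
    payload = storage[name]
    if hashlib.sha256(payload).hexdigest() != catalog[name]["sha256"]:
        raise ValueError(f"segment {name} does not match its catalog digest")
    return [json.loads(line) for line in payload.decode("utf-8").splitlines()]


def discard(name, is_expired):
    """Delete qualifying records from a segment and update the catalog."""
    records = read_segment(name)
    kept = [r for r in records if not is_expired(r)]
    store_segment(name, kept, catalog[name]["metadata_version"])
    return len(records) - len(kept)


store_segment("orders-0001", [
    {"id": 1, "closed_on": "1994-02-01"},
    {"id": 2, "closed_on": "2006-01-10"},
], metadata_version=1)
removed = discard("orders-0001", lambda r: r["closed_on"] < "1995-03-09")
print(removed, catalog["orders-0001"]["record_count"])
```

Recording the digest in the catalog rather than next to the data means a damaged or altered segment is detected on the next read.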
Archive Access Component
Query Capability
  Determine applicability based on archive segment versions of metadata
  SQL based is best, if possible
  Employ external indexes to determine which archive segments to look into
  Employ internal indexes to avoid reading all of an archive segment
Support standard access tools
  Report generation (such as Crystal Reports)
  Generic query tools
  JDBC interface
Support metadata version browsing
Support generation of load files based on query results
Support generation of load files for the original data source based on query results
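The two kinds of index mentioned under Query Capability can be illustrated with a toy archive. This is a sketch under assumptions: the segment contents, the year-range external index and the id-to-position internal index are all invented for the example.

```python
# Hypothetical archive segments, each holding a few business records.
SEGMENTS = {
    "seg-001": [{"id": 1, "year": 1998}, {"id": 2, "year": 1999}],
    "seg-002": [{"id": 3, "year": 2003}, {"id": 4, "year": 2004}],
    "seg-003": [{"id": 5, "year": 2004}, {"id": 6, "year": 2007}],
}

# External index: kept outside the segments, one (min, max) year range each.
EXTERNAL_INDEX = {
    name: (min(r["year"] for r in rows), max(r["year"] for r in rows))
    for name, rows in SEGMENTS.items()
}

# Internal index: kept with each segment, record id -> position in the segment.
INTERNAL_INDEX = {
    name: {r["id"]: pos for pos, r in enumerate(rows)}
    for name, rows in SEGMENTS.items()
}


def segments_for_years(first, last):
    """Use the external index to pick only segments that can hold matches."""
    return [name for name, (low, high) in EXTERNAL_INDEX.items()
            if low <= last and high >= first]


def find_by_id(record_id, segment_names):
    """Use internal indexes to fetch one record without scanning a segment."""
    for name in segment_names:
        pos = INTERNAL_INDEX[name].get(record_id)
        if pos is not None:
            return SEGMENTS[name][pos]
    return None


candidates = segments_for_years(2004, 2005)
print(candidates)
print(find_by_id(5, candidates))
```

The external index rules out whole segments before any of them is opened; the internal index then avoids reading a selected segment from start to end.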
Archive Administration Component
Manage Archive Catalog
  Application archive designs
  Audit trails
  Results logs
Manage Archive Storage Systems
  Ensure periodic readability checks
  Maintain access audit trails
Manage Archive Access
  Authorizations for users
  Authorizations for specific events
  Unloads
  Ensure audit records are created for all access
Manage e-Discovery requests
Ensure Extract and Discard processes are run when they are supposed to
Manage Metadata Change Process
Solution Comparisons
Home-Grown vs. Vendor
Home-Grown Solutions:
  Use Parallel DB
  Use Database Partitions
  Put in UNLOAD files
  Save Image Copies of DB
Vendor Solutions:
  More Complete Solutions
  Support Long Term Administration
  Put data in XML files
  Put data in reformatted files
  Exploit strengths of storage subsystems
Home-Grown Solutions
Solve Operational Problems, BUT:
  Create downstream problems
  Fail to achieve cost savings
  Render archive data inaccessible, either completely or by making it expensive in time and cost to query
  Lose data authenticity
Common Omissions
  No handling or improvement of metadata
  No change process for structure changes
  No long term storage management
  Fail to achieve application/system independence
  No administration platform
Vendor Solutions
Not a Lot of Vendors
  Only 6 I know of
  3 large companies
    Through acquisition
  Gartner pre-recession characterization
    Is a new technology
    $100M in 2008
    40% per year growth rate
    Early adopter stage
Solutions not complete
  Need growth in function and maturity
  Common weak spots
    Design modeling
    Extractor technology
    Not pervasive across data sources
    Storage structure
    Storage management
Business Case Basics
Drivers
Longer Data Retention requirements
Expanded Business
Mergers and Acquisitions
Overloaded operational databases
Operational problems
Cost of Keeping Old Systems
Difficulty in Making Application Changes
Data Governance, e-Records Retention, and e-Discovery Readiness concerns
Reason for Archiving
All data in operational db
  most expensive system
  most expensive storage
  most expensive software
In a typical op db 60-80% of data is inactive
This percentage is growing
Inactive data in archive db
  least expensive system
  least expensive storage
  least expensive software
[Diagram. Labels: Size Today; Operational; operational, archive]
Cost Saving Elements
Look for and compute difference in storage costs
  front-line vs archive storage
  byte count differences between operational and archive
Look for and compute difference in system costs
  operational vs archive systems
  are operational system upgrades avoided
  are software upgrades avoided
  can systems be eliminated for application
  can software be eliminated for application
Look for savings on people costs
  can people be eliminated or redirected for retired applications
Potential savings on changes/application renovations
  simplification of design
  elimination of data conversions
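The storage part of this calculation can be sketched as simple arithmetic. All inputs are assumptions chosen for illustration: a 1,000 GB database, 70% inactive data (inside the 60-80% range quoted earlier in this deck), and made-up yearly prices per GB. The function name and parameters are hypothetical.

```python
def annual_storage_saving(total_gb, inactive_pct, operational_cost_per_gb,
                          archive_cost_per_gb, archive_size_pct=100):
    """Return (cost before, cost after, saving) per year for one application.

    archive_size_pct models the byte count difference between the operational
    and the archive form of the same data (100 means no difference).
    """
    moved_gb = total_gb * inactive_pct / 100
    archived_gb = moved_gb * archive_size_pct / 100
    before = total_gb * operational_cost_per_gb
    after = ((total_gb - moved_gb) * operational_cost_per_gb
             + archived_gb * archive_cost_per_gb)
    return before, after, before - after


# Hypothetical prices: 10 per GB per year on front-line storage, 2 on archive.
before, after, saving = annual_storage_saving(
    total_gb=1000, inactive_pct=70,
    operational_cost_per_gb=10, archive_cost_per_gb=2)
print(before, after, saving)
```

The same function can be rerun with a smaller `archive_size_pct` when the archive format is more compact than the operational format, which increases the saving.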
Operational Efficiency Impacts
Will operational performance be enhanced with less data?
Will utility time periods be reduced (backup, reorganization)?
  fewer occurrences needed
  less data to process each time
Will recovery times be reduced, and what is that worth?
  interruption recoveries
  disaster recoveries
Will implementation of data structure changes be improved?
  avoided
  reduced amount of data to unload/modify/reload
Risk Factors
Will the saved data have better authenticity?
  not changed in archive
  shielded from updates or damage
  traceable back to original form
Will e-Discovery benefit from archiving?
  can locate and process data outside of operational environment
  can easily create legal-hold archive units
Will exposure of data be reduced?
  fewer authorized users against the archive
  complete audit trails of all access
Business Case Summary
Database Archiving solutions generally provide for lower cost software, can use lower cost storage more efficiently, and run on smaller machines.
Each business case is different
Many factors can be used in building a business case
  Seen an application justified on storage costs alone
  Seen an application justified on disaster recovery time alone
  Seen an application justified on better data security alone
Each organization will have many potential applications
Having a database archiving practice can create synergies across many applications, thus adding more value
Final Thoughts
Database Archiving is coming
Database Archiving is good
  Reduces cost
  Improves operational efficiency
  Reduces risk
Need a complete solution to be effective
Need professional staff
  Educated
  Full time