
World's Largest

Howard Fosdick
(C) 2004 FCI
Who Am I?

Hands-on DBA (and SA) for

Oracle, DB2, SQL Server

Unix, Linux, Windows


Author, Speaker

Independent Contractor

1. What's a Big Database?

2. DSS
3. OLTP
4. Observations
Statistics Sources

1. Winter Corp.

-- Database Top Ten

-- Yearly survey
-- Vendor neutral
-- Free at:


-- High-End BI/DW Competitive Analysis

-- Survey of 150 companies w/ big warehouses
-- Free at:

Thank You to both sources

Classifying Large Databases


Decision Support Systems (DSS)

- Online Analytical Processing (OLAP)
- Data Warehouses (DW)
- Multi-dimensional Databases (MDD)
+ Query oriented, mainly Read-only

Online Transaction Processing (OLTP)

+ Update with short transactions
(transaction = small CPU & data resources)
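
The contrast between the two workload types can be sketched with Python's built-in sqlite3 module. This is only an illustration: the sales table, its columns and the sample rows are made up for the example and are not taken from any system in this talk.

```python
import sqlite3

# In-memory database with a hypothetical sales table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, store INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (store, amount) VALUES (?, ?)",
    [(493, 10.00), (493, 24.39), (229, 5.50)],
)
conn.commit()

# OLTP style: a short transaction that updates a single row.
with conn:  # commits on success, rolls back on error
    conn.execute("UPDATE sales SET amount = amount + 1.00 WHERE id = ?", (1,))

# DSS style: a read-only query that scans and aggregates many rows.
totals = conn.execute(
    "SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY store"
).fetchall()
```

The update touches one row and holds its transaction only briefly, while the aggregate reads every row in the table. On a real warehouse the second kind of query is where the CPU and I/O go.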

Commercial IT vs. Scientific/Research databases

What's a Large Database?

Database Size

- User data
- User data plus metadata & indexes
- DASD farm

VLDB = Very Large Database


- Concurrent users
- Total user population


- Concurrent queries
- Queries / day or hour
(simple vs complex queries)

Good definitions and measurements are key to success

II. World's Biggest DSS Systems

Data Warehouses vs. Data Marts

Data Warehouses
+ Application neutral
+ Service multiple organizational needs

Data Marts
+ Application specific
+ Organizationally focused

Largest systems are usually data warehouses

What's Driving the Growth of
Large Data Warehouses?

Web Sites --
- Clickstream data

Retail --
- Transaction Level Detail (TLD)

Understanding customer behavior means $$$ !

[Figure: sample "Super Big Groceries" receipt with a Preferred Customer Card number, store and location codes, a line item (DIAPERS 2 10.00), tax, balance, cash and change, and a targeted coupon: "Get $2.00 off on Prozac When You Buy Super-Baby Food!"]
What's Driving the Growth of
Large Data Warehouses?

Necessary Preconditions --

Cheap Hardware

Higher reliability / availability

(based on dynamic hardware swapping)

Better Software

Lax privacy laws in USA

EU curtails cross-usage of data

EU has stronger privacy laws
World's Largest DSS Systems

(Source: 2003 Winter Corp.)

Way bigger than just 3 years ago

All Unix mainframes

All use SANs (Storage Area Networks) (aka ESS)
No IBM Mainframes
No Windows or Wintel
No SQL Server
No Linux or Open Source databases
NCR/Teradata niche market at 2.7% (Gartner 05/28/03)
Goodbye Informix!

Database Size = disk storage for user tables, indices, aggregates
Large DSS Systems

[Diagram: Users; Unix mainframe (Sun E12/15K, HP Superdome, IBM Regatta); Storage Area Network (EMC, Hitachi)]

Unix mainframes

+ Dynamically add/drop CPUs, RAM
(Sun calls it partitioning)
+ High reliability
(as good as clusters or Mainframes)
+ Capacity on Demand

SANs

+ Flash (snap) backup
(OS-level backup)
+ Large Cache
+ Intelligent data placement/movement
Example Evolution
Scaling a Unix Mainframe

Concurrent users    CPUs    RAM
12                  8       16 Gig
25                  32      64 Gig
35                  64      64 Gig

Other upgrades:

Oracle 8i -> 9i
Sun E10K -> E12K
World's Largest DSS Systems -- Windows

2003 Winter Corp.

Way smaller than Unix systems

Way bigger than just 3 years ago
Oracle vs SQL Server (like market share battle for Windows DBMSs)
Also use SANs (Storage Area Networks)
No Teradata
World's Largest DSS Systems
-- By Peak Workload

(Two charts, source: 2003 Winter Corp.)

Where Did IBM Mainframes Go?

1994: Big Iron
2004: Big Silicon

-- Goodbye
-- Largest databases
-- Smaller mainframes (VM, VSE)
-- Reliability advantage eroded
-- High cost per CPU

+ Hello Linux !
+ Good for --
+ Consolidation platform
+ Legacy systems
+ Virtualization (multi-OS platform)
Oracle Rising

Joined the Top Ten list 3 to 5 years ago

8i added essential DSS technologies ...

+ Partitions
+ New ROW ID (for bigger databases)
+ Thorough Parallelism (DML, DDL, utilities)
+ Index improvements
(bit mapped IXs, function-based, desc, others)
+ Resource Manager (proactive)
+ Materialized Views
+ Large memory mgmt
+ Optimizer is Partition-aware
+ Online DDL operations and Utilities
Example Oracle Warehouses

                   Amazon                Best Buy                         Colgate               Telecom Italia
System             HP Superdome          Sun 15K                          IBM p690              HP AlphaServer
Architecture       SMP                   SMP                              SMP                   Cluster
Processors         64                    24                               24                    2 node cluster
Oracle Version     9i                    8i                               9i                    8i
DB Size            13 T                  6.3 T                            3.8 T                 16 T
Number of          600                   4025                             27,000                1,200
Detail Data        Clickstream data      Sales Transaction data           Varied detail data    Call detail records
User Population    800                   16,000                           6,200                 400
Concurrent         55-60                 600-700                          600-700               55
                   2                     2                                n/a                   3
Peak Workload      4300 queries / day    150,000 queries / 4 hour period  14,200 steps / day    700 M records loaded / day
Why Not Oracle Clustering ?

+ Great for non-disruptive scaling of existing systems

. . . But the biggest systems tend not to use it

-- Unix mainframe no longer requires clustering

for reliability, availability or easy scalability

-- Clustering means complexity in minimizing the

-- Locking issues

9i improved this via Cache Fusion

but SMP Unix mainframe will still be favored
Where's SQL Server 2000?

Big in OLTP but lacks essential DSS technologies ...

-- Parallelism restricted to SELECTs

-- Needs it for other DML, DDL, utilities

-- Partitions

-- Wintel restriction

Yukon ?
-- Many new features. . . ready for Top Ten DSS ?
(Features = partitioning, database mirroring, mirrored backups, online Indexing & Restore, fast recovery,
ANSI 1999 T-SQL, CLR support, native XML, XML Query, better .NET support,
Reporting Services, Service Broker (async messaging), extensible data types)
Where's Open Source?

Linux

+ 2.6 kernel now out

+ More CPUs (to 16)
+ More RAM (> 4+ Gig)
+ Better threading, file system support

MySQL and PostgreSQL

-- Top out at 500,000 page views per day (EWeek 2003)

(or 15 per second)
+ Improving rapidly
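
The two page-view figures above can be related with a little arithmetic. The sketch below is only that: the slide does not say how the per-second number was derived, so reading it as a busy-period rate is an assumption, and the function names are made up for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_per_second(events_per_day):
    """Flat average rate if events were spread evenly over 24 hours."""
    return events_per_day / SECONDS_PER_DAY


def busy_window_hours(events_per_day, rate_per_second):
    """Hours needed to serve a day's events at a constant given rate."""
    return events_per_day / rate_per_second / 3600


avg = average_per_second(500_000)        # roughly 5.8 per second
window = busy_window_hours(500_000, 15)  # roughly 9.3 hours
```

Spread evenly, 500,000 page views per day is under 6 per second. A rate of 15 per second matches the daily total only if the traffic is concentrated in a window of about nine hours, which would be typical of a site with a daytime peak.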

Prediction: open source will support big databases

but not Top Ten list sites
Risks of Large DWs

40% of IT projects fail due to Management (time & budget issues)

Large warehouses are unforgiving --

Design issues critical

Database Design
Query design (and EXPLAINs)
ETL design and scheduling

Pre-program wherever possible

(control users and the resources they use)

Monitoring and alerts

Scale gradually (staggered loads on a schedule)

Benchmarks (after each Scaling Point)

Risks of Large DWs

Partitioning data properly is critical

For better physical management (utilities)

Optimizers use this info
Parallelism via multiple partitions

How to partition

Depends on data usage

Examples: geographical, hash, unique id, ranges
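
The partitioning schemes named above can be sketched in a few lines of Python. This is a simplified illustration, not how Oracle or any other DBMS implements partitioning. The function names, the date-style integer keys and the region map are all assumptions made for the example.

```python
import bisect
import zlib


def range_partition(value, upper_bounds):
    """Range partitioning: index of the first partition whose
    exclusive upper bound is greater than value."""
    return bisect.bisect_right(upper_bounds, value)


def hash_partition(key, n_partitions):
    """Hash partitioning: spread keys evenly using a stable hash."""
    return zlib.crc32(str(key).encode("utf-8")) % n_partitions


def list_partition(region, region_map, default=None):
    """List (e.g. geographical) partitioning: explicit value-to-partition map."""
    return region_map.get(region, default)


def partitions_to_scan(low, high, upper_bounds):
    """Partition pruning: only partitions overlapping [low, high] are read."""
    first = range_partition(low, upper_bounds)
    last = range_partition(high, upper_bounds)
    return list(range(first, last + 1))


# Quarterly bounds on a YYYYMMDD key give four partitions (0 to 3).
bounds = [20040101, 20040401, 20040701]
```

With these bounds, a query restricted to February through April 2004, `partitions_to_scan(20040201, 20040501, bounds)`, reads two of the four partitions. This is the kind of information a partition-aware optimizer exploits, and each surviving partition can be scanned by a separate parallel worker.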
III. World's Biggest OLTP Systems

World's Largest OLTP Systems

2003 Winter Corp.

Wintel mainframes arrive !

SQL Server arrives
Use SANs
CA can do the job (but has tiny overall database market share)
Oracle has big systems -- but not in the top ten
World's Largest OLTP Systems
-- Unix -- Windows

(Two charts, source: Winter Corp.)
World's Largest OLTP Systems
-- By Number of Rows

(Two charts, source: Winter Corp.)
OLTP Observations

Wintel mainframes w/ SQL Server displace MVS/CICS

SQL Server dominates Wintel OLTP

Great for pre-programmed, resource-limited txns

Oracle dominates Unix OLTP

IV. Observations

Clusters

Shared-nothing
(Massively Parallel Processing or MPP)

Large SMP

The architectural debate means far less than it used to!
Vendor Architectures

Product             Architecture                     Implementation
DB2 UDB for z/OS    Shared-disk clustering           DB2 Data Sharing on Sysplex
DB2 UDB for LUW     Shared nothing                   DB2 UDB ESE partitioning
Oracle              Shared-disk clustering or SMP    Real Application Clusters (RAC) -- previously known as Oracle Parallel Server
SQL Server 2000     Shared nothing or SMP            Customer-developed partitioning based on SQL Server features
Teradata            Shared nothing                   Teradata on NCR MPP
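
The shared-nothing rows in the table above can be illustrated with a toy scatter/gather query. This is a minimal sketch of the general idea only, not any vendor's implementation: the Node class, the hash routing and the (store, amount) rows are all hypothetical.

```python
import zlib
from collections import defaultdict


class Node:
    """One shared-nothing node: it owns its rows and shares no disk or memory."""

    def __init__(self):
        self.rows = []

    def local_sum_by_store(self):
        """Aggregate only the rows this node owns."""
        totals = defaultdict(float)
        for store, amount in self.rows:
            totals[store] += amount
        return totals


def owner(key, n_nodes):
    """Pick the owning node with a stable hash of the partitioning key."""
    return zlib.crc32(str(key).encode("utf-8")) % n_nodes


def load(nodes, rows):
    """Scatter: route each (store, amount) row to the node that owns its key."""
    for store, amount in rows:
        nodes[owner(store, len(nodes))].rows.append((store, amount))


def total_by_store(nodes):
    """Gather: each node aggregates locally, then a coordinator merges."""
    merged = defaultdict(float)
    for node in nodes:
        for store, subtotal in node.local_sum_by_store().items():
            merged[store] += subtotal
    return dict(merged)
```

In a shared-disk cluster every node can reach every row, so the nodes must coordinate locks and caches among themselves. Here each row has exactly one owner, and the only cross-node traffic is the small set of partial results sent to the coordinator.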
DBMS Licensing Costs

+ Low-cost SQL Server supports the biggest OLTP systems

-- Pressure on Teradata to keep its niche

+ Open Source DBMSs have a role but it's not Top Ten databases

[Chart: DBMSs ranked by license cost ($): Teradata, Oracle, DB2 UDB, SQL Server 2000, Open Source (MySQL, PostgreSQL), with the biggest DSS systems and the biggest OLTP systems marked]

Database pricing varies by the options selected and by the deal an IT organization cuts with the vendor.
Your mileage may vary! TCO ?
DW Labor Costs


Like TCO, Labor Costs may be an un-measurable

Figures applicable across sites ?

Every vendor claims lowest labor costs
Terabytes per DBA may be non-linear!
1 or 2 DBAs for a 24/7 site ?
Development staff will be larger than Maintenance staff
Your mileage will vary
Multi-Machine Mixed Systems

Sabre /

45 Linux w/ MySQL servers
(Fare look-up and routing)

17 Himalaya Non-stop w/ Master database
(Transactional updates)

EWeek, 2/23/04
Multi-Machine Mixed Systems

Steaks
* 50,000 to 68,000 daily sessions
* 1 year in Production / 8 Million sessions

17 Linux w/ MySQL servers
(Shopping cart)

DB2

EWeek 2003
Databases are growing exponentially

IT is closing in on Scientific/Research databases

Multiple machine mixed systems are becoming popular

(Monolithic central databases are no longer the only game in town)

Mixed use databases are becoming more common

Multiple applications
Read and update

Open Source supports large systems -- but not Top Ten

VLDBs are instructive but unique in some ways

? ? ?
