
World's Largest Databases
Howard Fosdick
(630)-279-4286
(C) 2004 FCI
Who Am I?

Hands-on DBA (and SA) for
  Oracle, DB2, SQL Server
  Unix, Linux, Windows

Founder IDUG, MWDUG, CAMP


Author, Speaker

Independent Contractor
(630)-279-4286
hfosdick@compuserve.com
Outline

1. What's a Big Database?


2. DSS
3. OLTP
4. Observations
Statistics Sources

1. Winter Corp.

-- Database Top Ten


-- Yearly survey
-- Vendor neutral
-- Free at: www.wintercorp.com

2. Survey.com

-- High-End BI/DW Competitive Analysis


-- Survey of 150 companies w/ big warehouses
-- Free at: www.survey.com

Thank You to both sources


Classifying Large Databases

DSS: Decision Support Systems
  Online Analytical Processing (OLAP)
  Data Warehouses (DW)
  Multi-dimensional Databases (MDD)
  + Query oriented, mainly read-only

OLTP: Online Transaction Processing
  + Updates with short transactions
    (transaction = small CPU & data resources)

Commercial IT vs. Scientific/Research databases


What's a Large Database?

Database Size

- User data
- User data plus metadata & indexes
- DASD farm

VLDB = Very Large Database

Users

- Concurrent users
- Total user population

Load

- Concurrent queries
- Queries / day or hour
(simple vs complex queries)

Good definitions and measurements are key to success


II. World's Biggest DSS Systems
Data Warehouses vs. Data Marts

DW: Application neutral; serves multiple organizational needs
DM: Application specific; organizationally focused

Largest systems are usually data warehouses


What's Driving the Growth of Large Data Warehouses?

Web Sites --
  - Clickstream data
    ("Hello, I'm Scot94")
Retail --
  - Transaction Level Detail (TLD)

Understanding customer behavior means $$$!

  !!!!! Super Big Groceries !!!!!
  Preferred Customer Card #283736
  03/04/04 02:38 3284 03 2918 33
  Store 493 Loc 229

  PRETTY-LADY HAIRCLR   1    5.99
  AARP MAGAZINE         1    4.95
  DIAPERS               2   10.00
  BEER SIX-PACK         1    3.45

  Tax 2.40   BAL 36.79
  Cash 40.00
  Change 3.21

  Save this Receipt
  Get $2.00 off on Prozac
  When You Buy Super-Baby Food!
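The receipt is the kind of transaction level detail a retailer keeps: one row per item rather than a single total per sale. Its figures reconcile only if the DIAPERS line is read as 2 units at 10.00 each (an assumption; the receipt does not label the column). A small sketch that stores the receipt as TLD rows and checks the arithmetic, using `Decimal` so money sums are exact; the `LineItem` class is hypothetical:

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class LineItem:
    """One transaction-level detail (TLD) row: a single item on a single receipt."""
    description: str
    quantity: int
    unit_price: Decimal  # assumption: the receipt's price column is a unit price

    @property
    def extended(self) -> Decimal:
        return self.quantity * self.unit_price

receipt = [
    LineItem("PRETTY-LADY HAIRCLR", 1, Decimal("5.99")),
    LineItem("AARP MAGAZINE", 1, Decimal("4.95")),
    LineItem("DIAPERS", 2, Decimal("10.00")),
    LineItem("BEER SIX-PACK", 1, Decimal("3.45")),
]
tax = Decimal("2.40")
cash = Decimal("40.00")

subtotal = sum((item.extended for item in receipt), Decimal("0"))
balance = subtotal + tax
change = cash - balance
print(subtotal, balance, change)  # 34.39 36.79 3.21
```

Keeping the item rows, not just the 36.79 balance, is what lets an analyst see that diapers and beer left the store in the same basket.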
What's Driving the Growth of Large Data Warehouses?

Necessary Preconditions --

Cheap Hardware

Higher reliability / availability


(based on dynamic hardware swapping)

Better Software

Lax privacy laws in USA

EU curtails cross-usage of data


EU has stronger privacy laws
World's Largest DSS Systems
[Chart: 2003 Winter Corp.]

Way bigger than just 3 years ago

All Unix mainframes
All use SANs (Storage Area Networks) (aka ESS)
No IBM Mainframes
No Windows or Wintel
No SQL Server
No Linux or Open Source databases
NCR/Teradata niche market at 2.7% (Gartner 05/28/03)
Goodbye Informix!

Database Size = disk storage for user tables, indices, aggregates
Large DSS Systems

[Diagram: Query Users -> Unix mainframe -> Storage Area Network]
  Unix mainframes: Sun E12/15K, HP Superdome, IBM Regatta
  SANs: EMC, Hitachi, HP, LSI

Unix mainframes
  + Dynamically add/drop CPUs, RAM
    (Sun calls it partitioning)
  + High reliability
    (as good as clusters or Mainframes)
  + Capacity on Demand

SANs
  + Flash (snap) backup
    (OS-level backup)
  + Large Cache
  + Intelligent data placement/movement
Example Evolution
Scaling a Unix Mainframe

   8 CPUs @ 16 Gig RAM  ->  12 concurrent users
  32 CPUs @ 64 Gig RAM  ->  25 concurrent users
  64 CPUs @ 64 Gig RAM  ->  35 concurrent users

Other upgrades:
  Oracle 8i -> 9i
  Sun E10K -> E12K
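Reading the chart as three stages (8 CPUs / 16 GB serving 12 users, 32 CPUs / 64 GB serving 25, 64 CPUs / 64 GB serving 35), concurrent users per CPU fall at each step. That pairing of labels to bars is an assumption drawn from the flattened chart, and because Oracle and the server model were upgraded too, the ratio does not isolate the effect of CPUs alone:

```python
# (CPUs, RAM in GB, concurrent users) per stage -- assumed pairing read from the chart
stages = [(8, 16, 12), (32, 64, 25), (64, 64, 35)]

users_per_cpu = [users / cpus for cpus, _ram_gb, users in stages]

for (cpus, ram_gb, users), ratio in zip(stages, users_per_cpu):
    print(f"{cpus:>2} CPUs, {ram_gb} GB RAM: {users} users ({ratio:.2f} per CPU)")
```

Capacity grew roughly threefold in users for an eightfold increase in CPUs, one concrete instance of scaling that is not linear.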
World's Largest DSS Systems -- Windows
[Chart: 2003 Winter Corp.]

Way smaller than Unix systems


Way bigger than just 3 years ago
Oracle vs. SQL Server (mirrors their market share battle for Windows DBMSs)
Also use SANs (Storage Area Networks)
No IBM DB2 UDB
No Teradata
World's Largest DSS Systems -- By Peak Workload
[Two charts: 2003 Winter Corp.]
Where Did IBM Mainframes Go?

1994: Big Iron  ->  2004: Big Silicon   (Poof!)

-- Goodbye
-- Largest databases
-- Smaller mainframes (VM, VSE)
-- Reliability advantage eroded
-- High cost per CPU

+ Hello Linux!
+ Good for --
   + Consolidation platform
   + Legacy systems
   + Virtualization (multi-OS platform)
Oracle Rising

Joined the Top Ten list 3 to 5 years ago


8i added essential DSS technologies ...

+ Partitions
+ New ROW ID (for bigger databases)
+ Thorough Parallelism (DML, DDL, utilities)
+ Index improvements
(bitmap, function-based, descending, others)
+ Resource Manager (proactive)
+ Materialized Views
+ Large memory mgmt
+ Optimizer is Partition-aware
+ Online DDL operations and Utilities
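Bitmap indexes, one of the index improvements listed above, suit low-cardinality DSS columns: each distinct value gets a bit vector with one bit per row, so combining WHERE predicates becomes bitwise AND/OR. A toy sketch of the idea, using Python integers as bit vectors; the `BitmapIndex` class, column names and data are hypothetical, and real implementations compress the bitmaps and keep them on disk:

```python
from collections import defaultdict

class BitmapIndex:
    """Toy bitmap index: one bit vector per distinct value of a column.
    Bit i of a value's vector is set when row i holds that value."""

    def __init__(self, column_values):
        self.bitmaps = defaultdict(int)  # Python ints act as arbitrary-length bit vectors
        for row_id, value in enumerate(column_values):
            self.bitmaps[value] |= 1 << row_id

    def eq(self, value) -> int:
        """Bit vector for WHERE column = value (all zeros if the value never occurs)."""
        return self.bitmaps.get(value, 0)

def row_ids(bits: int) -> list[int]:
    """Decode a bit vector back into matching row numbers."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]

# Two hypothetical low-cardinality columns of an 8-row fact table
region = ["East", "West", "East", "North", "West", "East", "North", "East"]
channel = ["web", "store", "store", "web", "web", "web", "store", "store"]
region_ix, channel_ix = BitmapIndex(region), BitmapIndex(channel)

# WHERE region = 'East' AND channel = 'web'  -> bitwise AND
print(row_ids(region_ix.eq("East") & channel_ix.eq("web")))    # [0, 5]
# WHERE region IN ('East', 'North')          -> bitwise OR
print(row_ids(region_ix.eq("East") | region_ix.eq("North")))   # [0, 2, 3, 5, 6, 7]
```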
Example Oracle Warehouses

                 | Amazon            | Best Buy                      | Colgate            | Telecom Italia Mobile
System           | HP Superdome      | Sun 15K                       | IBM p690 Regatta   | HP AlphaServer
Architecture     | SMP               | SMP                           | SMP                | Cluster
Storage          | EMC               | EMC                           | IBM                | EMC
Processors       | 64                | 24                            | 24                 | 2 node cluster
Oracle Version   | 9i                | 8i                            | 9i                 | 8i
DB Size          | 13 T              | 6.3 T                         | 3.8 T              | 16 T
Number of Tables | 600               | 4,025                         | 27,000             | 1,200
Detail Data      | Clickstream data  | Sales transaction data        | Varied detail data | Call detail records
User Population  | 800               | 16,000                        | 6,200              | 400
Concurrent Users | 55-60             | 600-700                       | 600-700            | 55
DBAs             | 2                 | 2                             | n/a                | 3
Peak Workload    | 4,300 queries/day | 150,000 queries/4-hour period | 14,200 steps/day   | 700 M records loaded/day

Source: 2003 Winter Corp.
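The peak workloads in the table are quoted in different units and time windows. Normalizing them to average per-second rates makes them easier to compare; this sketch assumes the work is spread evenly across each stated window, so true peaks inside the window will be higher. It also computes terabytes per DBA for the three sites that report a DBA count:

```python
SECONDS_PER_HOUR = 3_600

def per_second(count: float, window_hours: float) -> float:
    """Average rate, assuming the work is spread evenly over the window."""
    return count / (window_hours * SECONDS_PER_HOUR)

# Peak workloads from the table: (count, unit, window in hours)
peak_workloads = {
    "Amazon": (4_300, "queries", 24),
    "Best Buy": (150_000, "queries", 4),
    "Colgate": (14_200, "steps", 24),
    "Telecom Italia Mobile": (700_000_000, "records loaded", 24),
}

for site, (count, unit, hours) in peak_workloads.items():
    print(f"{site:<22} {per_second(count, hours):>10.2f} {unit}/sec")

# DB size (TB) and DBA count; Colgate reports n/a for DBAs, so it is left out
size_and_dbas = {"Amazon": (13, 2), "Best Buy": (6.3, 2), "Telecom Italia Mobile": (16, 3)}
tb_per_dba = {site: tb / dbas for site, (tb, dbas) in size_and_dbas.items()}
print(tb_per_dba)
```

Averaged this way, Best Buy's 4-hour window works out to roughly 10 queries per second and Telecom Italia Mobile's load to roughly 8,100 records per second; such rates are only comparable once the query mix (simple vs. complex) is known.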
Why Not Oracle Clustering?

+ Great for non-disruptive scaling of existing systems

. . . But the biggest systems tend not to use it

-- Unix mainframes no longer require clustering
   for reliability, availability or easy scalability

-- Clustering adds complexity in minimizing locking issues

9i improved this via Cache Fusion,
but SMP Unix mainframes will still be favored
Where's SQL Server 2000?

Big in OLTP but lacks essential DSS technologies ...

-- Parallelism restricted to SELECTs

-- Needs it for other DML, DDL, utilities

-- Partitions (no native table partitioning)

-- Wintel restriction

Yukon?
-- Many new features . . . ready for Top Ten DSS?
(Features = partitioning, database mirroring, mirrored backups, online Indexing & Restore, fast recovery,
ANSI 1999 T-SQL, CLR support, native XML, XML Query, better .NET support,
Reporting Services, Service Broker (async messaging), extensible data types)
Where's Open Source?

Linux

+ 2.6 kernel now out


+ More CPUs (to 16)
+ More RAM (> 4 Gig)
+ Better threading, file system support

MySQL and PostgreSQL

-- Top out at 500,000 page views per day (EWeek 2003)
   (about 6 per second averaged over 24 hours;
   about 15 per second if concentrated into ~9 hours)
+ Improving rapidly

Prediction: open source will support big databases,
but not Top Ten list sites
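A per-day ceiling and a per-second rate agree only under an assumption about how the day's traffic is spread. A quick check of the 500,000 page views per day figure, computing the 24-hour average and how many hours of traffic at 15 views per second it would take to reach the daily total:

```python
SECONDS_PER_HOUR = 3_600
PAGE_VIEWS_PER_DAY = 500_000  # EWeek 2003 figure quoted on the slide

def avg_rate(views_per_day: float, active_hours: float = 24) -> float:
    """Page views per second if the day's traffic is spread evenly over active_hours."""
    return views_per_day / (active_hours * SECONDS_PER_HOUR)

def hours_at_rate(views_per_day: float, views_per_sec: float) -> float:
    """How many hours of traffic at views_per_sec it takes to serve views_per_day."""
    return views_per_day / views_per_sec / SECONDS_PER_HOUR

print(f"{avg_rate(PAGE_VIEWS_PER_DAY):.1f} views/sec averaged over 24 hours")  # 5.8
print(f"{hours_at_rate(PAGE_VIEWS_PER_DAY, 15):.1f} hours at 15 views/sec")    # 9.3
```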
Risks of Large DWs

40% of IT projects fail due to management (time & budget) issues

Large warehouses are unforgiving -- Survey.com

Design issues critical


Database Design
Query design (and EXPLAINs)
ETL design and scheduling

Pre-program wherever possible


(control users and the resources they use)

Monitoring and alerts

Scale gradually (staggered loads on a schedule)

Benchmarks (after each Scaling Point)
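"Pre-program wherever possible (control users and the resources they use)" can be enforced mechanically with admission control: cap how many queries run at once and make the rest wait. A minimal sketch, assuming a hypothetical `QueryGovernor` wrapper and a short sleep standing in for real query work; database resource managers provide richer, built-in versions of this kind of control:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

class QueryGovernor:
    """Hypothetical admission controller: at most max_concurrent queries run at once;
    any further callers block until a slot frees up."""

    def __init__(self, max_concurrent: int):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self.running = 0  # queries currently executing
        self.peak = 0     # highest concurrency observed

    def run(self, query_fn, *args):
        with self._slots:  # wait here for a free slot
            with self._lock:
                self.running += 1
                self.peak = max(self.peak, self.running)
            try:
                return query_fn(*args)
            finally:
                with self._lock:
                    self.running -= 1

def fake_query(n: int) -> int:
    """Stand-in for real query work."""
    time.sleep(0.01)
    return n * n

# 20 user sessions submit 40 queries, but the governor admits only 4 at a time.
governor = QueryGovernor(max_concurrent=4)
with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(lambda n: governor.run(fake_query, n), range(40)))
print("peak concurrent queries:", governor.peak)
```

Queuing rather than rejecting keeps users' queries alive while protecting the warehouse, and the cap is a natural knob to revisit at each benchmarked scaling point.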


Risks of Large DWs

Partitioning data properly is critical

For better physical management (utilities)


Optimizers use this info
Parallelism via multiple partitions

How to partition

Depends on data usage


Examples: geographical, hash, unique id, ranges
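Range and hash partitioning from the examples above can be sketched as functions that map a row to a partition number. Range partitioning also lets a query skip partitions that cannot match (partition pruning), one way optimizers use partition information. The monthly boundaries and key values here are hypothetical:

```python
import bisect
import zlib
from datetime import date

# Hypothetical monthly range boundaries for a sales fact table.
# Partition 0: dates before Feb 1; 1: Feb; 2: Mar; 3: Apr 1 and later.
RANGE_BOUNDS = [date(2004, 2, 1), date(2004, 3, 1), date(2004, 4, 1)]

def range_partition(sale_date: date) -> int:
    """Range partitioning: a boundary date starts the next partition."""
    return bisect.bisect_right(RANGE_BOUNDS, sale_date)

def hash_partition(key: str, n_partitions: int) -> int:
    """Hash partitioning: spreads keys roughly evenly with no natural order.
    zlib.crc32 is used because it is stable across runs (Python's hash() of str is not)."""
    return zlib.crc32(key.encode("utf-8")) % n_partitions

def prune(lo: date, hi: date) -> list[int]:
    """Partition pruning: only partitions that can hold dates in [lo, hi] need scanning."""
    return list(range(range_partition(lo), range_partition(hi) + 1))

print(range_partition(date(2004, 3, 4)))            # 2
print(prune(date(2004, 3, 1), date(2004, 3, 31)))   # [2]
print(prune(date(2004, 1, 15), date(2004, 2, 15)))  # [0, 1]
print(hash_partition("283736", 8))                  # a partition number from 0 to 7
```

Range partitions by date also match how warehouses load and age data, so utilities can work one partition at a time; hash partitions trade that for an even spread of rows, which helps parallel scans.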
III. World's Biggest OLTP Systems
World's Largest OLTP Systems
[Chart: 2003 Winter Corp.]

Wintel mainframes arrive!


SQL Server arrives
Use SANs
CA can do the job (but has tiny overall database market share)
Oracle has big systems -- but not in the top ten
World's Largest OLTP Systems -- Unix / Windows
[Two charts: 2003 Winter Corp.]
World's Largest OLTP Systems -- By Number of Rows
[Two charts: 2003 Winter Corp.]
OLTP Observations

Wintel mainframes w/ SQL Server displace MVS/CICS

SQL Server dominates Wintel OLTP

Great for pre-programmed, resource-limited txns

Oracle dominates Unix OLTP


IV. Observations
Architectures

Shared-disk Clusters
Shared-nothing (Massively Parallel Processing or MPP)
Large SMP mainframe

The architectural debate means far less than it used to!
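In a shared-nothing (MPP) design, each node owns a slice of the rows and answers its part of a query locally; a coordinator merges the partial results. A minimal sketch of a parallel SUM ... GROUP BY, with threads standing in for nodes and a modulo on a hypothetical receipt id as the distribution key:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

N_NODES = 4  # hypothetical number of nodes

def node_for(receipt_id: int) -> int:
    """Each row lives on exactly one node, chosen from its distribution key."""
    return receipt_id % N_NODES

def distribute(rows):
    """Load step: spread (receipt_id, store, cents) rows across nodes; nodes share nothing."""
    nodes = [[] for _ in range(N_NODES)]
    for row in rows:
        nodes[node_for(row[0])].append(row)
    return nodes

def local_sales_by_store(node_rows):
    """Runs on one node against its own rows only: a partial SUM(cents) GROUP BY store."""
    partial = Counter()
    for _receipt_id, store, cents in node_rows:
        partial[store] += cents
    return partial

def sales_by_store(nodes):
    """Coordinator: scatter the query to every node in parallel, then merge partial sums."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(local_sales_by_store, nodes))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds values rather than replacing them
    return dict(total)

rows = [(1, 493, 3679), (2, 493, 1250), (3, 17, 899), (4, 493, 100), (5, 17, 1)]
print(sales_by_store(distribute(rows)))  # {493: 5029, 17: 900}
```

Store 493's rows land on three different nodes, so the merge step is what turns three partial sums into one answer; shared-disk clusters and large SMP machines reach the same result with different plumbing.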
Vendor Architectures

Product          | Architecture                  | Implementation
DB2 UDB for z/OS | Shared-disk clustering        | DB2 Data Sharing on Sysplex
DB2 UDB for LUW  | Shared nothing                | DB2 UDB ESE partitioning feature
Oracle           | Shared-disk clustering or SMP | Real Application Clusters (RAC), previously known as Oracle Parallel Server (OPS)
SQL Server 2000  | Shared nothing or SMP         | Customer-developed partitioning based on SQL Server features
Teradata         | Shared nothing                | Teradata on NCR MPP
DBMS Licensing Costs

+ Low-cost SQL Server supports the biggest OLTP systems

-- Pressure on Teradata to keep its niche

+ Open Source DBMSs have a role,
  but it's not Top Ten databases

[Chart: relative license cost, from $$$$$ down to $]
  Teradata, Oracle, DB2 UDB        -> Biggest DSS Systems
  SQL Server 2000                  -> Biggest OLTP Systems
  Open Source (MySQL, PostgreSQL)  -> $

Database pricing varies by the options selected and by the deal
an IT organization cuts with the vendor.
Your mileage may vary! TCO?
DW Labor Costs
[Chart: 2002 Survey.com]

Like TCO, labor costs may be unmeasurable

Are the figures applicable across sites?

Every vendor claims lowest labor costs
Terabytes per DBA may be non-linear!
1 or 2 DBAs for a 24/7 site?
Development staff will be larger than Maintenance staff
Your mileage will vary
Multi-Machine Mixed Systems

Sabre / Travelocity (EWeek, 2/23/04)

  45 Linux w/ MySQL servers       17 Himalaya NonStop w/ Master database
  (Fare look-up and routing)      (Transactional updates)
Multi-Machine Mixed Systems

Omaha Steaks (EWeek 2003)
  * 50,000 to 68,000 daily sessions
  * 1 year in Production / 8 Million sessions

  17 Linux w/ MySQL servers       DB2 on iSeries
  (Shopping cart)                 (Transactional updates)
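Both systems split work across machines: a farm of Linux/MySQL servers handles front-end work (fare look-up and routing; shopping carts) while a master system takes the transactional updates. A minimal sketch of the Sabre-style split as a statement router, assuming SELECTs may go to any read server and everything else must go to the master; the class and server names are hypothetical, and keeping the read copies in sync with the master (replication) is not shown:

```python
import itertools

class MixedSystemRouter:
    """Hypothetical router: read-only SELECTs go round-robin to a pool of read servers,
    all other statements (INSERT, UPDATE, DELETE, ...) go to the master database."""

    def __init__(self, master: str, read_servers: list[str]):
        if not read_servers:
            raise ValueError("need at least one read server")
        self.master = master
        self._read_servers = itertools.cycle(read_servers)

    def route(self, statement: str) -> str:
        words = statement.split()
        if words and words[0].upper() == "SELECT":
            return next(self._read_servers)
        return self.master

router = MixedSystemRouter("master-db", [f"linux-mysql-{i:02d}" for i in range(1, 46)])
print(router.route("SELECT fare FROM fares WHERE origin = 'ORD'"))            # linux-mysql-01
print(router.route("UPDATE bookings SET seats = seats - 1 WHERE flight = 42"))  # master-db
print(router.route("select route FROM routes"))                              # linux-mysql-02
```

Scaling the read side then means adding cheap servers to the pool, while the master stays the single authority for updates.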
Conclusions
Databases are growing exponentially

IT is closing in on Scientific/Research databases

Multiple machine mixed systems are becoming popular

(Monolithic central databases are no longer the only game in town)

Mixed use databases are becoming more common

Multiple applications
Read and update

Open Source supports large systems -- but not Top Ten

VLDBs are instructive but unique in some ways


Questions?