
Module 10: Sizing

After completing this module, you will be able to:
  Determine column sizing requirements based on chosen data type.
  Determine physical table row size.
  Determine table sizing requirements via estimates and empirical evidence.
  Determine index sizing requirements via estimates and empirical evidence.

Row Format

[Figure: physical row layout. Fields, in order: Row Length; Row ID (Part. #, Row Hash, Uniq. Value); Spare and Presence Bits; VARCHAR Offsets; Fixed Length Columns; Uncompressed Columns; VARCHAR Columns; Ref. Array Ptr.]

Notes from the figure:
  Part. # is present only for a V2R5 table with a PPI.
  VARCHAR Offsets are present only if variable length columns are declared.
  Uncompressed Columns hold the data for compressible columns that are neither compressed nor NULL.

SELECT * returns columns in the order declared in the CREATE TABLE.

Presence Bits
Single-Value Compression (prior to V2R5): Based on the column attributes, Teradata may need 0, 1, or 2 presence bits per column to represent data storage.

Multi-Value Compression (V2R5 Feature): Based on the column attributes, Teradata may need 0 to 9 presence bits per column to represent data storage.

Presence bits for NULL and COMPRESS occupy one or more whole bytes.

Column Attribute              # of Presence Bits   Description
NOT NULL                      0                    No NULLs; no values are compressed
NOT NULL COMPRESS 'value'     1                    No NULLs; compresses the value specified
Nullable                      1                    Allows NULLs; nothing is compressed
Nullable COMPRESS             1                    Allows NULLs; only NULLs are compressed
Nullable COMPRESS 'value'     2                    Allows NULLs; compresses 'value' specified

NULL and COMPRESS



The default for all columns is NULLable and NOT compressible. NULL column values are NOT automatically compressed.

COMPRESS compresses NULL column values to zero space.


COMPRESS <constant> compresses NULL and occurrences of <constant> to zero space.
How many bits are allocated?

CREATE TABLE Employee
  ( emp_num   INTEGER NOT NULL,                 -- 0 presence bits
    dept_num  INTEGER COMPRESS,                 -- 1 presence bit
    country   CHAR(20) COMPRESS 'Australia',    -- 2 presence bits
      :

COMPRESS all fixed length columns where at least 10% of the rows participate.

COMPRESS creates smaller rows, therefore more rows/block and fewer blocks.

You cannot use the COMPRESS option with:
  Primary Index
  Referenced and referencing data columns (PK-FK)
  VARCHAR, VARBYTE, and VARGRAPHIC data types
  CHAR, BYTE, and GRAPHIC data types > 255 characters

NOT NULL Clause


In relational systems, NULL means missing or unknown.

The default for all columns is NULLable.


Specify NOT NULL on the CREATE TABLE as appropriate:
  All PRIMARY KEY columns
  All UNIQUE columns
  All columns with NN (No NULLS) constraints

Any index may contain NULLs unless explicitly prohibited.

NOT NULL permits storage of zero and blank values.

Multi-Value Compression (V2R5 Feature)


What is Multi-Value Compression?
  A V2R5 enhancement that allows up to 255 distinct values (plus NULL) to be compressed per fixed width column; a.k.a. Value List Compression.
  Improves system cost and performance for high data volume applications.
  Provides significant performance improvement for general ad-hoc workloads and full-table scan applications.

Reduces storage cost by storing more logical data per unit of physical capacity. Performance is improved because there is less physical data to retrieve during scan-oriented queries.

Best candidate for compression: a fixed width column with a small number of frequently occurring values, in a large table that is frequently used for full-table scans (FTSs).

Compression in V2R4 provided up to 25% capacity savings and scan performance increases up to 35%. V2R5 Compression is expected to yield even greater savings.

Rules for Compression


Up to 255 values (plus NULL) can be compressed per column.
The maximum size of a compress value remains 255 bytes.
Only fixed-width columns can be compressed.
Primary index columns cannot be compressed.
ALTER TABLE does not support compression on existing columns.
You cannot add a compress value to a column if doing so would cause the table header row to exceed its maximum length (64 KB).

Number of Presence Bits needed for Multi-Value Compression:

Compressed Values   # of Bits
1                   1
2 - 3               2
4 - 7               3
8 - 15              4
16 - 31             5
32 - 63             6
64 - 127            7
128 - 255           8

Note: If column is "nullable", there will be 1 additional presence bit.
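
The table and note above follow a simple pattern: the number of value bits is the number of binary digits needed to write the count of compressed values, plus one bit if the column is nullable. A minimal Python sketch of that rule follows; the function name `mvc_presence_bits` is hypothetical and is not part of any Teradata tool.

```python
def mvc_presence_bits(compressed_values: int, nullable: bool) -> int:
    """Presence bits for one column, per the Multi-Value Compression table.

    compressed_values: number of values in the COMPRESS list (0 to 255).
    nullable: True if the column allows NULLs (adds one presence bit).
    """
    if not 0 <= compressed_values <= 255:
        raise ValueError("a column can compress 0 to 255 values")
    # 1 -> 1 bit, 2-3 -> 2 bits, 4-7 -> 3 bits, ... 128-255 -> 8 bits
    value_bits = compressed_values.bit_length()
    return value_bits + (1 if nullable else 0)


if __name__ == "__main__":
    print(mvc_presence_bits(15, nullable=True))   # 15 countries, nullable column
    print(mvc_presence_bits(2, nullable=True))    # 'F' and 'M', nullable column
```

With 255 compressed values on a nullable column the result is 9, which matches the stated maximum of 9 presence bits per column.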

Implementing Multi-Value Compression


CREATE TABLE accepts a list of values for field attribute 'COMPRESS'.
ALTER TABLE accepts a list of values for field attribute 'COMPRESS'.

Compress the top 15 most populated countries. Compress the Sex field.

CREATE TABLE People
  ( Name     VARCHAR(100),
    Address  VARCHAR(100),
    Country  CHAR(100) COMPRESS
             ( 'Australia', 'Bangladesh', 'Brazil', 'China', 'England',
               'France', 'Germany', 'India', 'Indonesia', 'Japan',
               'Mexico', 'Nigeria', 'Pakistan', 'Russian Federation',
               'United States of America' ),
    Sex      CHAR(1) COMPRESS ('F', 'M') );

Add an Education column.

ALTER TABLE People
  ADD Education CHAR(10) COMPRESS
      ('ELEMENTARY', 'MIDDLE', 'HIGH', 'COLLEGE', 'POST GRAD') ;

Multi-Value Compression vs. VARCHAR


There is no general rule; evaluate the options.

VARCHAR will generally be better when the difference between maximum and average field length is high, and a low percentage of fields are compressible.

Compression will generally be better when the difference between maximum and average field length is low, and a high percentage of fields are compressible.

When neither is a clear winner, use VARCHAR since it uses slightly less CPU resource.

Note: Compression does not carry forward into spool files. VARCHAR is carried into spool, but not uncompressed.

V2R5 Compression
Compression is not supported on the following data types:
  INTERVAL, TIME, TIMESTAMP, VARCHAR, VARBYTE, VARGRAPHIC

Compression is supported on the following data types (storage bytes in parentheses):
  BYTEINT (1)
  SMALLINT (2)
  INTEGER (4)
  DATE (4)
  DECIMAL (1, 2, 4, or 8)
  FLOAT/REAL (8)
  DOUBLE (8)
  CHAR (N) (N<256)
  BYTE (N) (N<256)

Columns with frequently occurring values may be highly compressed. Examples:
  NULLs, Zeros, Spaces, Default Values
  Flags, Binary Indicators (e.g., T/F)
  Age (in years), # of Children, Gender, Education Level
  Credit Card Type, Automobile Make
  State, County, City
  Reason, Category, Codes, Status

ANSI and Teradata Data Types


Teradata                                    ANSI (Core) equivalent
INTEGER                                     INTEGER
SMALLINT                                    SMALLINT
BYTEINT
DATE
TIME, TIME WITH TIME ZONE
TIMESTAMP, TIMESTAMP WITH TIME ZONE
FLOAT                                       FLOAT
FLOAT                                       REAL
FLOAT                                       DOUBLE PRECISION
DECIMAL(n,m)                                DECIMAL(n,m)
DECIMAL(n,m)                                NUMERIC(n,m)
CHAR(n)                                     CHAR(n)
VARCHAR(n), CHAR VARYING(n)
LONG VARCHAR
BYTE(n), VARBYTE(n)
GRAPHIC(n)
VARGRAPHIC(n), LONG VARGRAPHIC

Data Types
BYTEINT (signed): -128 to +127 (Non-ANSI)

SMALLINT (signed): -32,768 to +32,767

INTEGER (signed): -2,147,483,648 to +2,147,483,647

Data Types (Date and Time)


DATE (4 Bytes): stored as ((YYYY - 1900) * 10000) + (MM * 100) + DD

TIME (6 Bytes): hh:mm:ss.ssssss

TIME WITH TIME ZONE (8 Bytes): hh:mm:ss.ssssss +/- hh:mm

TIMESTAMP (10 Bytes): Date + Time

TIMESTAMP WITH TIME ZONE (12 Bytes): Date + Time + Zone

DATE, TIME, and TIMESTAMP are also SQL functions. CURRENT_DATE, CURRENT_TIME, and CURRENT_TIMESTAMP represent values.
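
The DATE storage formula above is plain integer arithmetic, so it can be checked by hand or with a few lines of code. The Python sketch below is illustrative only; the function names are hypothetical, and it assumes dates on or after 1900-01-01 so that the stored integer is non-negative.

```python
from datetime import date


def encode_date(d: date) -> int:
    """Integer form of a DATE: ((YYYY - 1900) * 10000) + (MM * 100) + DD."""
    return (d.year - 1900) * 10000 + d.month * 100 + d.day


def decode_date(stored: int) -> date:
    """Inverse of encode_date (assumes a non-negative stored value)."""
    year = stored // 10000 + 1900
    month = stored // 100 % 100
    day = stored % 100
    return date(year, month, day)


if __name__ == "__main__":
    stored = encode_date(date(2003, 6, 15))
    print(stored, decode_date(stored))
```

For example, 15 June 2003 becomes (103 * 10000) + (6 * 100) + 15 = 1030615.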

Data Types (Decimal, Numeric, Float, Real, Double Precision)


Decimal and Numeric

DECIMAL [ ( n [ , m ] ) ]
  n = 1 - 18
  m = 0 - n
  Default = (5, 0)
  Stored in scaled binary

Number of Digits   Number of Bytes
1 to 2             1 byte
3 to 4             2 bytes
5 to 9             4 bytes
10 to 18           8 bytes

FLOAT, REAL, and DOUBLE PRECISION

Stored in 8 bytes: Sign, Exponent (11 bits), Fraction/Mantissa (52 bits).

Notes:
  Range is 2 * 10^-307 to 2 * 10^+308.
  15 significant decimal digit precision.
  Manipulated in IEEE floating point format.
  Corresponds to, but is not identical to, IBM normalized 64 bit floating point.
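
The digits-to-bytes table for DECIMAL can be expressed as a small lookup, which is handy when filling in the row size form for many columns. This is a sketch of the table above only; the function name is hypothetical.

```python
def decimal_storage_bytes(digits: int) -> int:
    """Storage bytes for DECIMAL(n, m), where digits is n (1 to 18)."""
    if not 1 <= digits <= 18:
        raise ValueError("n must be between 1 and 18")
    if digits <= 2:
        return 1
    if digits <= 4:
        return 2
    if digits <= 9:
        return 4
    return 8


if __name__ == "__main__":
    # DECIMAL(10,2), as used for a salary amount, needs 8 bytes.
    print(decimal_storage_bytes(10))
```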

Data Types (CHAR, VARCHAR, LONG VARCHAR)


CHARACTER
  1 byte per character.
  Stored in 8 bit ASCII.
  Conversion to/from host done by the system.
  National Character values.
  Japanese single-byte Katakana.

CHAR ( n )      n = 1 - 64000    Fixed length character string

VARCHAR ( n )   n = 1 - 64000    Variable length character string
                                 2 byte column offset identifies location in row

LONG VARCHAR    Equivalent to VARCHAR (64000)
                                 2 byte column offset identifies location in row

Data Types (BYTE and VARBYTE)



BYTE
  Stored in host format.
  Never translated by the Teradata Database.
  Handled as if they were n-byte, unsigned binary integers.
  Suitable for digitized image information.

BYTE ( n )      n = 1 - 64000    Fixed length binary string

VARBYTE ( n )   n = 1 - 64000    Variable length binary string
                                 2 byte column offset identifies location in row

Data Types (GRAPHIC, VARGRAPHIC, LONG VARGRAPHIC)


GRAPHIC
  2 bytes per character.
  Used for double-byte Kanji and Hiragana, and Chinese double-byte Hanzi values.

GRAPHIC ( n )      n = 1 - 32000    Fixed length multi-byte character string; n is the length in logical characters.

VARGRAPHIC ( n )   n = 1 - 32000    Variable length multi-byte character string; n is the length in logical characters.
                                    2 byte column offset identifies location in row

LONG VARGRAPHIC    Equivalent to VARGRAPHIC (32000)
                                    2 byte column offset identifies location in row

Variable Column Offsets


[Figure: an offset array followed by the variable length data. Offset array entries: c1 = 50, c2 = 75, c3 = 75, x (end) = 100. In the data area, c1 data starts at byte 50 and c3 data runs from byte 75 to byte 100.]

Offset values tell the starting location of a variable column.

Determine the column length by subtracting its starting location from the next column's starting location.

The definition of variable length columns requires one additional 2-byte offset that locates the end of the final variable column.
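
The subtraction rule can be tried on the offsets from the figure (50, 75, 75, 100). The Python sketch below is only an illustration of that arithmetic; the function names are hypothetical.

```python
def variable_column_lengths(offsets: list[int]) -> list[int]:
    """Column lengths from an offset array.

    offsets holds one starting location per variable column plus one
    final entry that marks the end of the last variable column.
    """
    return [nxt - start for start, nxt in zip(offsets, offsets[1:])]


def offset_array_bytes(variable_columns: int) -> int:
    """Bytes used by the offset array: (columns * 2) + 2, or 0 if none."""
    return variable_columns * 2 + 2 if variable_columns else 0


if __name__ == "__main__":
    print(variable_column_lengths([50, 75, 75, 100]))  # lengths of c1, c2, c3
    print(offset_array_bytes(3))
```

With these offsets c1 is 25 bytes, c2 is 0 bytes (its offset equals the next column's offset), and c3 is 25 bytes.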

Sizing Considerations

Compress only columns where at least 10% to 20% of the rows participate.
COMPRESS will create smaller rows, and smaller rows are generally more efficient.
Compress columns whose NULL values are not subject to changes.
Compression saves space but costs computational overhead.
Adding a column that is not compressible expands all rows.
Adding a column that is compressible, when there are no spare presence bits, expands all rows.
Dropping a column changes all row sizes where data is present.
Use VARCHAR whenever the space savings offset the overhead.

Row Size Calculation Form


Table Name ____________________

Variable Column Data Detail
  Column Name     Type     Max     Average
  ___________     ____     ___     _______
                                   SUM(a) = _______

Data Type            # of Columns   *   Size      =   TOTAL
BYTEINT              ____           *   1         =   ____
SMALLINT             ____           *   2         =   ____
INTEGER              ____           *   4         =   ____
DATE                 ____           *   4         =   ____
TIME(ANSI)           ____           *   6         =   ____
TIME with ZONE       ____           *   8         =   ____
TIMESTAMP            ____           *   10        =   ____
TIMESTAMP/ZONE       ____           *   12        =   ____
DECIMAL 1-2          ____           *   1         =   ____
DECIMAL 3-4          ____           *   2         =   ____
DECIMAL 5-9          ____           *   4         =   ____
DECIMAL 10-18        ____           *   8         =   ____
FLOAT                ____           *   8         =   ____
Fixed                                   SUM(n)    =   ____
Variable                                SUM(a)    =   ____
LOGICAL SIZE                                      =   ____
Overhead                                          =   14
Partitioned Primary Index Overhead (2)            =   ____
Variable Column Offsets (__ * 2) + 2; zero if no variable columns = ____
_____ Bits for Compressible Columns + _____ Nullable Columns = _____ / 8 (Quotient only) = ____
PHYSICAL ROW SIZE **                              =   ____

SUM(a) = SUM of the AVERAGE number of bytes expected for the variable column.
SUM(n) = SUM of the CHAR and GRAPHIC column bytes.
** For V2, round up to an even number of bytes.

Example: Sizing a Row


EMPLOYEE

Column           Constraint   Data Type
EMP #            PK, SA       INT
SUPV EMP #       FK           INT
DEPT #           FK           INT
JOB CODE         FK           SMALL INT
LAST NAME        NN           Char Fix 20
FIRST NAME       NN           Char Var 30
HIRE DATE        NN           DATE
BIRTH DATE       NN           DATE
SALARY AMOUNT    NN           DEC (10,2)

Using this logical row layout, the next page will size a typical row of the Employee table.

Example: Completing the Row Size Calculation Form


Table Name EMPLOYEE

Variable Column Data Detail
  Column Name     Type     Max     Average
  First Name      CV       30      14
                                   SUM(a) = 14

Data Type            # of Columns   *   Size      =   TOTAL
BYTEINT                             *   1         =
SMALLINT             1              *   2         =   2
INTEGER              3              *   4         =   12
DATE                 2              *   4         =   8
TIME(ANSI)                          *   6         =
TIME with ZONE                      *   8         =
TIMESTAMP                           *   10        =
TIMESTAMP/ZONE                      *   12        =
DECIMAL 1-2                         *   1         =
DECIMAL 3-4                         *   2         =
DECIMAL 5-9                         *   4         =
DECIMAL 10-18        1              *   8         =   8
FLOAT                               *   8         =
Fixed                1                  SUM(n)    =   20
Variable             1                  SUM(a)    =   14
LOGICAL SIZE                                      =   64
Overhead                                          =   14
Partitioned Primary Index Overhead (2)            =   0
Variable Column Offsets (1 * 2) + 2; zero if no variable columns = 4
0 Bits for Compressible Columns + 3 Nullable Columns = 3 / 8 (Quotient only) = 0
PHYSICAL ROW SIZE **                              =   82

SUM(a) = SUM of the AVERAGE number of bytes expected for the variable column.
SUM(n) = SUM of the CHAR and GRAPHIC column bytes.
** For V2, round up to an even number of bytes.
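
The arithmetic on the Row Size Calculation Form can also be written as a short function. The Python sketch below follows the form line by line (14 bytes of overhead, 2 bytes for a PPI, the offset array, whole presence bytes from the quotient, and rounding up to an even size); the function and parameter names are hypothetical.

```python
def physical_row_size(fixed_bytes: int,
                      avg_variable_bytes: int,
                      variable_columns: int,
                      compress_bits: int,
                      nullable_columns: int,
                      ppi: bool = False) -> int:
    """Physical row size in bytes, following the Row Size Calculation Form."""
    logical_size = fixed_bytes + avg_variable_bytes
    overhead = 14                                   # fixed row overhead on the form
    ppi_overhead = 2 if ppi else 0                  # partition number in the Row ID
    offsets = variable_columns * 2 + 2 if variable_columns else 0
    presence_bytes = (compress_bits + nullable_columns) // 8   # quotient only
    size = logical_size + overhead + ppi_overhead + offsets + presence_bytes
    return size + size % 2                          # round up to an even number


if __name__ == "__main__":
    # Employee example: 1 SMALLINT, 3 INTEGER, 2 DATE, 1 DECIMAL(10,2), CHAR(20)
    fixed = 1 * 2 + 3 * 4 + 2 * 4 + 1 * 8 + 20
    print(physical_row_size(fixed, 14, variable_columns=1,
                            compress_bits=0, nullable_columns=3))
```

For the Employee table this gives 64 bytes of logical size and a physical row size of 82 bytes, the same result as the completed form.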


Row Size Exercise


CALL

[Figure: logical layout of the CALL table. Column headings: CALL NUMBER, TAKEN BY EMPLOYEE NUMBER, PLACED BY CUSTOMER NUMBER, PLACED BY CONTACT NUMBER, EMPLOYEE NUMBER, ORIGINAL CALL NUMBER, CALL DATE, CALL TIME, CALL STATUS CODE, CALL TYPE CODE, CALL PRIORITY CODE, AREA CODE, PHONE, EXTENSION, SYSTEM NUMBER, PART CATEGORY. Constraint markers used in the figure: PK, SA, FK, NN, UPI, NUSI. The value 100 is shown as an example of a constraint number.]

DOMAIN NAME           DATA TYPE    MAX BYTES
Area_Code             SMALL INT    2
Call_Number           INT          4
Call_Priority_Code    SMALL INT    2
Call_Status_Code      SMALL INT    2
Call_Type_Code        CF           2
Contact_Number        INT          4
Customer_Number       INT          4
Date                  D            4
Employee_Number       INT          4
Extension             INT          4
Part_Category         INT          4
Phone                 INT          4
System_Number         INT          4
Time                  TIME         6

This table will be partitioned via RANGE_N on Call_Date with Monthly intervals.

Row Size Calculation Form


Table Name CALL

Variable Column Data Detail
  Column Name     Type     Max     Average
  (none)
                                   SUM(a) = 0

Data Type            # of Columns   *   Size      =   TOTAL
BYTEINT                             *   1         =
SMALLINT             3              *   2         =   6
INTEGER              10             *   4         =   40
DATE                 1              *   4         =   4
TIME(ANSI)           1              *   6         =   6
TIME with ZONE                      *   8         =
TIMESTAMP                           *   10        =
TIMESTAMP/ZONE                      *   12        =
DECIMAL 1-2                         *   1         =
DECIMAL 3-4                         *   2         =
DECIMAL 5-9                         *   4         =
DECIMAL 10-18                       *   8         =
FLOAT                               *   8         =
Fixed                1                  SUM(n)    =   2
Variable                                SUM(a)    =   0
LOGICAL SIZE                                      =   58
Overhead                                          =   14
Partitioned Primary Index Overhead (2)            =   2
Variable Column Offsets (___ * 2) + 2; zero if no variable columns = 0
0 Bits for Compressible Columns + 9 Nullable Columns = 9 / 8 (Quotient only) = +1
PHYSICAL ROW SIZE **                              =   75, rounded up to 76

SUM(a) = SUM of the AVERAGE number of bytes expected for the variable column.
SUM(n) = SUM of the CHAR and GRAPHIC column bytes.
** For V2, round up to an even number of bytes.

Sizing Tables and Indexes


These options on the Teradata Database support variable length rows:
  COMPRESS
  VARCHAR
  LONG VARCHAR
  VARBYTE

Variable length blocks within a table are also supported.

These features also make accurate space estimates for tables and their indexes more difficult.

Physical row size and table row count determine space requirements.

Table Headers
One row per AMP per table.
Table headers are a separate subtable.
Minimum table header block size is 512 bytes (1 sector) per AMP.

Typically, a table header will be at least 1024 bytes. For example:
  Tables with 4 or more columns
  Tables with 2 columns and a NUSI

Compressed values are maintained within the table header. Multiple values that are compressed for a column are maintained in an array in the table header.

Maximum value of approximately 64K.

The base table header covers all of its secondary index subtables.

[Figure: V2R4 example of a table header. Contents, in order:
  Standard row header: length, row ID, presence/spare bytes
  Offset array: Field 2 offset through Field 9 offset, plus an extra offset
  Field 1: database and table names, database ID, other internal info (creation date, protection, type of journaling, journal ID, struct version, etc.)
  Fields 2 - 4: index descriptors ((36 bytes * # indexes) plus 20 bytes per index column); a field that is always null; FastLoad & Restore information (usually null); base column info (count of columns, location of first fixed field, number of presence bits, etc.; 24 bytes)
  Field 5: column information for each column, 20 bytes per column (+ compress value): data type, offset within row, nullable/not nullable, compress/no compress, presence bit location, etc.
  Fields 6 - 9: restartable sort information (usually null); the remaining fields are always null
  Row length or ref. array pointer]

Sizing a Data Table


Block sizes vary within a table, so compute a range.

Typically, the maximum block size is 63.5 KB, and a typical block size is 48 KB.

Formula:
  (BlockSize - 38) / RowSize           = RowsPerBlock   (rounded down)
  RowCount / RowsPerBlock              = Blocks         (rounded up)
  NumAmps * 1024                       = Header
  (Blocks * BlockSize) + Header        = NO FALLBACK    (BlockSize = Typical Block Size)
  (Blocks * BlockSize) * 2 + Header    = FALLBACK       (BlockSize = Typical Block Size)

Parameters:
  38         = Block Header + Block Trailer
  1024       = Typical table header size
  BlockSize  = Typical block size in bytes
  NumAmps    = Number of AMPs in the system
  RowCount   = Number of table rows expected
  RowSize    = Physical row size

Note: For large tables, table headers and block overhead (38 bytes) add a minimal amount of size to the table. Therefore, as a quick estimate, multiply row size by number of rows and double for Fallback.

Table Sizing Exercise


Given this data, estimate the size of a table with Fallback and a typical block size of 48K.

  BlockSize  = 49,152 bytes (48K)
  NumAmps    = 20
  RowCount   = 501,000,000
  RowSize    = 98 bytes (includes overhead)

Formula:
  (BlockSize - 38) / RowSize           = RowsPerBlock   (round down)
  RowCount / RowsPerBlock              = Blocks         (round up)
  NumAmps * 1024                       = Header
  (Blocks * BlockSize) + Header        = No Fallback
  (Blocks * BlockSize) * 2 + Header    = Fallback

Calculation:
  (49,152 - 38) / 98                   = 501 rows per block
  501,000,000 / 501                    = 1,000,000 blocks
  20 * 1024                            = 20,480 for table headers
  (1,000,000 * 49,152) + 20,480        = 49,152,020,480   (No Fallback)
  (1,000,000 * 49,152) * 2 + 20,480    = 98,304,020,480   (Fallback)

An easier way to estimate this table size:

  501,000,000 x 98 bytes x 2 (Fallback) = 98,196,000,000
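
The table sizing formula and its shortcut can be compared side by side in code. The Python sketch below is a direct transcription of the formula in this module, using the exercise figures as input; the function names are hypothetical, and the 38-byte block overhead and 1024-byte header are the typical values quoted above.

```python
def table_size_bytes(row_count: int, row_size: int, block_size: int,
                     num_amps: int, fallback: bool) -> int:
    """Estimated table size using the block-based formula."""
    rows_per_block = (block_size - 38) // row_size     # rounded down
    blocks = -(-row_count // rows_per_block)           # rounded up
    header = num_amps * 1024                           # typical table header per AMP
    data = blocks * block_size
    return (data * 2 if fallback else data) + header


def quick_table_size_bytes(row_count: int, row_size: int, fallback: bool) -> int:
    """Shortcut: row size times row count, doubled for Fallback."""
    size = row_count * row_size
    return size * 2 if fallback else size


if __name__ == "__main__":
    full = table_size_bytes(501_000_000, 98, 49_152, 20, fallback=True)
    quick = quick_table_size_bytes(501_000_000, 98, fallback=True)
    print(full, quick, f"{(full - quick) / full:.2%} difference")
```

For this exercise the two estimates differ by roughly 0.1%, which is why the shortcut is usually adequate for large tables.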

Estimating the Size of a USI Subtable


[Figure: USI subtable row layout. Fields, in order: Row Length; Row ID of USI (Row Hash, Uniq. Value); Spare & Presence; Row Offsets & Misc. (>= 7 bytes); Secondary Index Value (variable length); Base Table Row Identifier (optional 2-byte Part. #, Row Hash, Uniq. Value); Ref. Array Pointer.]

There is one index row for each base table row.

USI subtable row size = (Index value size + 29 or 31)

Where 31 =   4   (Row Header and Row Ref. Array pointer)
           + 8   (This row's Row ID)
           + 9   (Spare, presence, and offset bytes)
           + 2   (Optional Partition # for PPI tables)
           + 8   (Base table Row ID)

To estimate the amount of space needed for a USI subtable, you can use the following formulas.

  For tables with NPPI, USI Subtable Size = (Row count) * (index value size + 29)
  For tables with PPI,  USI Subtable Size = (Row count) * (index value size + 31)

Note: Double this figure for Fallback.
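
As a quick check of the USI formulas, the Python sketch below applies them to two cases used elsewhere in this module: a 1,000,000 row partitioned Fallback table with an INTEGER USI, and a 15,000 row non-partitioned Fallback table. The function name and the `even_rows` switch (which adds 1 byte when the index row length is odd, since rows are allocated on even offsets) are assumptions made for illustration.

```python
def usi_subtable_bytes(row_count: int, index_value_size: int,
                       ppi: bool = False, fallback: bool = False,
                       even_rows: bool = False) -> int:
    """Estimated USI subtable size in bytes."""
    row_size = index_value_size + (31 if ppi else 29)
    if even_rows:
        row_size += row_size % 2          # pad odd row lengths to an even size
    size = row_count * row_size           # one index row per base table row
    return size * 2 if fallback else size


if __name__ == "__main__":
    print(usi_subtable_bytes(1_000_000, 4, ppi=True, fallback=True, even_rows=True))
    print(usi_subtable_bytes(15_000, 4, fallback=True))
```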

Estimating the Size of a NUSI Subtable


[Figure: NUSI subtable row layout. Fields, in order: Row Length; Row ID of NUSI (Row Hash, Uniq. Value); Spare & Presence; Row Offsets & Misc. (>= 7 bytes); Secondary Index Value (variable length); Table Row ID List, where each entry is P (2 bytes, PPI tables only), RH (4 bytes), U (4 bytes), for 8 or 10 bytes per entry; Ref. Array Pointer.]

There is at least one index row per AMP for each distinct index value that is in the base table on that AMP.

To estimate the size of a NUSI subtable:

  Size = (Row count) * 8 (or 10 for PPI tables)
         + ( (#distinct values) * (Index value size + 21) * MIN ( #AMPs, Rows per value ) )

  MIN( ___ , ___ ): use the smaller of the two values.
  Double this figure for Fallback.

Example:

More typical rows/value than AMPs (50 rows/value, 10 AMPs):
  Every AMP probably has every value.
  Every AMP has a subtable row for every value.
  More weakly selective.
  More rows returned from an equality search.

More AMPs than typical rows/value (10 AMPs, 5 rows/value):
  NOT every AMP has every value.
  NOT every AMP has a subtable row for every value.
  More strongly selective.
  Fewer rows returned from an equality search.
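
The NUSI formula has two parts: a row ID list entry for every base table row, and one index row header per distinct value per AMP that holds that value. The Python sketch below is a plain transcription of the formula; the function name is hypothetical, and rows per value is taken as the average (row count divided by distinct values), which is an assumption. Because distinct values times rows per value equals the row count, the MIN term is computed in whole numbers as min(distinct * AMPs, row count).

```python
def nusi_subtable_bytes(row_count: int, distinct_values: int,
                        index_value_size: int, num_amps: int,
                        ppi: bool = False, fallback: bool = False) -> int:
    """Estimated NUSI subtable size in bytes."""
    row_id_bytes = row_count * (10 if ppi else 8)
    # distinct * MIN(#AMPs, rows per value), using average rows per value
    index_rows = min(distinct_values * num_amps, row_count)
    size = row_id_bytes + index_rows * (index_value_size + 21)
    return size * 2 if fallback else size


if __name__ == "__main__":
    # 1,000,000 rows, 20,000 distinct CHAR(29) values, 20 AMPs, PPI, Fallback
    print(nusi_subtable_bytes(1_000_000, 20_000, 29, 20, ppi=True, fallback=True))
```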

Estimating the Size of a Reference Index Subtable


[Figure: Reference Index subtable row layout. Fields, in order: Row Length; Row ID of RI (Row Hash, Uniq. Value); Overhead; Row Offsets (optional presence and variable length indicators); Valid Flag; Foreign Key Value (variable length); Count; Ref. Array Pointer.]

There is one reference index row for each distinct foreign key value.

RI subtable row size = (Index value size + 25)

Where 25 =   4   (Row Header and Row Ref. Array pointer)
           + 8   (This row's Row ID)
           + 8   (Overhead and row offset bytes)
           + 1   (Validity flag)
           + 4   (Count)

To estimate the size of a Reference Index (RI) subtable, you can use the following formula.

  RI Subtable Size = (Distinct count) * (index size + 25)

Double this figure for Fallback.
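
The Reference Index formula is the simplest of the three. The Python sketch below applies it to 1,000 distinct INTEGER foreign key values with Fallback, the case used in Lab 10-3; the function name is hypothetical.

```python
def ri_subtable_bytes(distinct_fk_values: int, index_value_size: int,
                      fallback: bool = False) -> int:
    """Estimated Reference Index subtable size in bytes."""
    size = distinct_fk_values * (index_value_size + 25)   # one row per distinct FK value
    return size * 2 if fallback else size


if __name__ == "__main__":
    print(ri_subtable_bytes(1_000, 4, fallback=True))
```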

Index Sizing Exercise


A 1,000,000 row Fallback table, on a 20-AMP system, has an Integer USI and a 50 Row per Value CHAR(29) NUSI with 20,000 distinct values. Assume the table is partitioned. Estimate the space for each secondary index.

Formulas:

  USI Size = Row Count * (IndexValueSize + 31)

    1,000,000 * (4 + 31 + 1) * 2 = 72,000,000

  NUSI Size = (Row Count * 10) + ( (#distinct values) * (Index value size + 21) * MIN ( #AMPs, Rows per value ) )

    (10,000,000 + (20,000 * (29 + 21) * 20)) * 2 = 60,000,000

Note: The +1 in the USI calculation is added because rows are allocated on even offsets within the data block.

Empirical Sizing
The best way to size a production table, including indexes, is:
  1. Load a known percentage of rows onto the system.
  2. Query the DD through the view DBC.TableSize.
  3. Create one index.
  4. Query the DD through the view DBC.TableSize.
  5. Repeat steps 3 and 4 as necessary.
  6. Multiply the results to determine the production size.

Example:

Step 1  Load 1% of a table onto a system.

Step 2  SELECT SUM(CurrentPerm)
        FROM   DBC.Tablesize
        WHERE  DatabaseName = DATABASE
        AND    TableName = 'Daily_Sales' ;

        Sum(CurrentPerm)
        671,744

Step 3  CREATE INDEX (sales_date) ON Daily_Sales;

Step 4  SELECT SUM(CurrentPerm)
        FROM   DBC.Tablesize
        WHERE  DatabaseName = DATABASE
        AND    TableName = 'Daily_Sales' ;

        Sum(CurrentPerm)
        914,944

Therefore, index size is: 914,944 - 671,744 = 243,200

Note: The same query without the SUM keyword returns per-AMP figures, which reveal distribution efficiency.
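
Step 6 (multiplying the sample results up to production size) can be sketched as follows. This Python example simply scales the CurrentPerm figures from the Daily_Sales example by the loaded percentage; the function name is hypothetical, and linear scaling is an assumption that ignores fixed costs such as table headers.

```python
def scale_to_production(sample_bytes: int, sample_percent: float) -> int:
    """Scale a measured sample size up to 100% of the rows."""
    return round(sample_bytes * 100 / sample_percent)


if __name__ == "__main__":
    table_before_index = 671_744          # SUM(CurrentPerm) after loading 1%
    table_with_index = 914_944            # SUM(CurrentPerm) after CREATE INDEX
    index_sample = table_with_index - table_before_index

    print("index sample bytes:", index_sample)
    print("production table  :", scale_to_production(table_before_index, 1))
    print("production index  :", scale_to_production(index_sample, 1))
```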

Spool Space
Maximum spool space needs vary with table size, use (type of application),
and frequency of use.

Large systems use more spool to duplicate tables on each AMP.

Cylinders not currently used for data may be used for spool.
The user's spool amount may be changed dynamically. Avoid unnecessary copying or redistribution of entire tables to spool.

As user concurrency and/or SQL complexity increases, add more SPOOL.


Running out of Spool Space

If a user exceeds their Spool space limit, they will receive the following error
message.
2646 No more spool space in username

If the AMP runs out of Spool space (insufficient available cylinders for Spool),
the following message will be displayed.

2507 - Out of spool space on disk

Release of Spool
Intermediate Spool

Intermediate Spool results are held until the (LastUse) Explain step.

Output Spool

Output Spool results are held until:
  Last spool Response - BTEQ
  CLOSE cursor - PreProcessor
  ERQ, Terminate function - CLI
  Session ends (Job Abort, timeout, logoff, etc.)
  System is restarted

System Restart - each AMP rebuilds its Master Index from its Cylinder Indexes. The AMPs delete all spool files by moving them to the Free Cylinder List. This costs only one I/O per spool cylinder, and saves maintaining the Master Index on disk.

System Sizing Exercise


Some general guidelines for estimating system size:
  20% (or more) of total space for Spool
  5% of total space for PJs/development/staging
  5% of total space for DBC & Transient Journal
  20 - 50% of data size for indexes

If not using Fallback, multiply the amount of raw data by a factor of 2 or 3.
If using Fallback, multiply the amount of raw data by a factor of 4 or 5.

Example:
  User raw data                          400 GB
  Estimate of Vdisk space needed         1600 - 2000 GB

Proof:
  Estimate of Vdisk space                1600 - 2000 GB
  - Spool (20%)                          - 400 GB
  - PJs/development/staging (5%)         - 100 GB
  - DBC & Transient Journal (5%)         - 100 GB
                                         = 1000 - 1400 GB

  User raw data                          400 GB
  20 - 50% for indexes                   80 - 200 GB
  Fallback (for data and indexes)        960 - 1200 GB

A 4 node system, each with 7 AMPs, and each AMP with 72 GB of Vdisk space would meet this requirement.

  4 x 7 x 72 GB = 2016 GB of available space
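
The proof above can be replayed with a few lines of arithmetic. The Python sketch below uses hypothetical function names; it applies the guideline percentages to the 2000 GB upper estimate, which is where the 400 GB and 100 GB deductions in the example come from, and it treats Fallback as doubling data plus indexes.

```python
def perm_needed_gb(raw_gb: int, index_percent: int, fallback: bool = True) -> int:
    """Space for data plus indexes, doubled when Fallback is used."""
    data_and_indexes = raw_gb + raw_gb * index_percent // 100
    return data_and_indexes * 2 if fallback else data_and_indexes


def space_left_for_data_gb(total_gb: int) -> int:
    """Total space minus Spool (20%), PJs/dev/staging (5%), DBC & TJ (5%)."""
    return total_gb - total_gb * 20 // 100 - total_gb * 5 // 100 - total_gb * 5 // 100


if __name__ == "__main__":
    print(perm_needed_gb(400, 20), "to", perm_needed_gb(400, 50), "GB needed")
    print(space_left_for_data_gb(2000), "GB left for data on a 2000 GB estimate")
    print(4 * 7 * 72, "GB available on 4 nodes x 7 AMPs x 72 GB")
```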

Sizing Summary
Accurate row counts and sizes are needed to get good space estimates.

Database sizing includes:
  Tables + Fallback
  + Secondary Indexes + Fallback
  + Reference Indexes + Fallback
  + Join Indexes + Fallback
  + Hash Indexes + Fallback
  + Permanent Journal (dual or single)
  + Stored Procedure space
  + Spool space
  + Temporary space

Review Questions
1. Which of the following can be used with the COMPRESS option?
   a. Referencing columns - Foreign Key
   b. Referenced column - Primary Key as a USI
   c. Unique Primary Index
   d. Non-unique Secondary Index

2. Which section of a row identifies the starting location of variable length column data and is present only if variable length columns are declared?
   a. Uncompressed Columns
   b. VARCHAR Columns
   c. Presence Bits
   d. Column Offsets

3. How can you override the default that a column with a NULL value will require row space?
   a. Use the COMPRESS option on the column as part of the CREATE TABLE statement.
   b. When creating the user, set the default so that columns will default to COMPRESS when creating a table.
   c. Use the NOT NULL option on the column as part of the CREATE TABLE statement.
   d. Use the DEFAULT NULL option on the column as part of the CREATE TABLE statement.

4. What is the minimum space the table headers will take for a 6-column table on a 10 AMP system?
   a. 10240 bytes
   b. 4096 bytes
   c. 5120 bytes
   d. 1024 bytes

5. What DD view can you query to get sizing information about tables? _____________________

Module 10: Review Question Answers


1. Which of the following can be used with the COMPRESS option?
   d. Non-unique Secondary Index

2. Which section of a row identifies the starting location of variable length column data and is present only if variable length columns are declared?
   d. Column Offsets

3. How can you override the default that a column with a NULL value will require row space?
   a. Use the COMPRESS option on the column as part of the CREATE TABLE statement.

4. What is the minimum space the table headers will take for a 6-column table on a 10 AMP system?
   a. 10240 bytes

5. What DD view can you query to get sizing information about tables?
   DBC.Tablesize

Lab Exercises
Lab Exercise 10-1
Purpose
  In this lab, you will compress multiple values for a column in order to reduce Perm space.

What you need
  Populated AU.Accounts table and an empty table in your database

Tasks
1. Populate your Accounts table from the AU.Accounts table using the INSERT/SELECT statement:

   INSERT INTO Accounts SELECT * FROM AU.Accounts;

   Using the DBC.TableSize view, what is the amount of Perm space used?
   Accounts = ___________

2. Create a new table, named "Accounts_MVC", based on the Accounts table except compress the following city names: Culver City, Hermosa Beach, Los Angeles, and Santa Monica.

   Populate your Accounts_MVC table from the AU.Accounts table using INSERT/SELECT.

   Using the DBC.TableSize view, what is the amount of Perm space used?
   Accounts_MVC = ___________

Lab Exercise 10-2

Purpose
  In this lab, you will populate tables, determine table sizes, and create secondary indexes.

What you need
  Populated AU.Trans table and an empty table in your database

Tasks
1. Determine the size of your empty Trans table using the DBC.TableSize view (SELECT with and without the SUM aggregate function).
   Size of empty Trans = _______________
   What size are the table headers on each AMP? _______________

2. Using SHOW TABLE, the Row Size Calculation form, and the Sizing a Data Table formula, estimate the size of this table; assume 15,000 rows.
   Estimated size of Trans = _______________

3. Populate your Trans table from the AU.Trans table using the following INSERT/SELECT statement:

   INSERT INTO Trans SELECT * FROM AU.TRANS;

   Use the SELECT COUNT(*) function to verify the number of rows. ___________

4. Using the DBC.TableSize view, determine the actual size of the Trans table by using the SUM aggregate function.
   Size of populated Trans = _______________

5. Create a USI on the Trans_Number column.
   Estimate the size of the USI = _______________
   Actual size of the USI = _______________ (use the empirical sizing technique)

6. Create a NUSI on the Trans_ID column.
   Estimate the size of the NUSI = ______________ (Hint: use DISTINCT function)
   Actual size of the NUSI = ______________ (use the empirical sizing technique)

Lab Exercise 10-3

Purpose
  In this lab, you will determine table sizes and establish referential integrity between two tables.

What you need
  Populated PD tables and empty tables in your database

Tasks
1. Populate your Employee and Emp_Phone tables from the PD.Employee and PD.Emp_Phone tables using the following INSERT/SELECT statements.

   INSERT INTO Employee SELECT * FROM PD.Employee;
   INSERT INTO Emp_Phone SELECT * FROM PD.Emp_Phone;

2. Using the DBC.TableSize view, determine the actual size of the Emp_Phone table by using the SUM aggregate function.
   Size of populated Emp_Phone table = _______________

3. The Foreign Key is Employee_Number in PD.Emp_Phone and the Primary Key is the Employee_Number in PD.Employee. Create a References constraint on Employee_Number using the following SQL statement.

   ALTER TABLE Emp_Phone ADD CONSTRAINT fk1 FOREIGN KEY (Employee_Number) REFERENCES Employee (Employee_Number);

   (Use HELP CONSTRAINT Emp_Phone.fk1; to view constraint information.)

4. Using the DBC.TableSize view, determine the actual size of the Emp_Phone table by using the SUM aggregate function.
   Estimate the size of the Reference Index = _______________
   Size of populated Emp_Phone with references index = _______________
   Size of references index = _______________

5. Drop the Foreign Key constraint by executing the following SQL command.

   ALTER TABLE Emp_Phone DROP CONSTRAINT fk1;

Lab Solutions for Lab 10-1


Lab Exercise 10-1
1. Populate your Accounts table from the AU.Accounts table using the INSERT/SELECT statement:

   INSERT INTO Accounts SELECT * FROM AU.Accounts;

   Using the DBC.TableSize view, what is the amount of Perm space used?
   Accounts = 1,804,288

2. Create a new table, named "Accounts_MVC", based on the Accounts table except compress the following city names: Culver City, Hermosa Beach, Los Angeles, and Santa Monica.

   CREATE SET TABLE Accounts_MVC, FALLBACK,
        NO BEFORE JOURNAL, NO AFTER JOURNAL
      ( ACCOUNT_NUMBER   INTEGER NOT NULL,
        NUMBER           INTEGER,
        STREET           CHAR(25),
        CITY             CHAR(20) COMPRESS
                         ('Hermosa Beach', 'Culver City', 'Los Angeles', 'Santa Monica'),
        STATE            CHAR(2),
        ZIP_CODE         INTEGER,
        BALANCE_FORWARD  DECIMAL(10,2),
        BALANCE_CURRENT  DECIMAL(10,2) )
   PRIMARY INDEX ( ACCOUNT_NUMBER );

   Populate your Accounts_MVC table from the AU.Accounts table using INSERT/SELECT.

   Using the DBC.TableSize view, what is the amount of Perm space used?
   Accounts_MVC = 1,404,828

Lab Solutions for Lab 10-2


Lab Exercise 10-2
1. Determine the size of your empty Trans table using the DBC.Tablesize view (SELECT with and without the SUM aggregate function).

   SELECT SUM(CurrentPerm)
   FROM   DBC.Tablesize
   WHERE  DatabaseName = DATABASE
   AND    TableName = 'Trans' ;

   Sum(CurrentPerm)
   8192

   Size of empty Trans = 8192 (Captured from an 8 AMP system)
   What size are the table headers on each AMP? 1024

2. Using SHOW TABLE, the Row Size Calculation form, and the Sizing a Data Table formula, estimate the size of this table; assume 15,000 rows.

   Each row is 24 bytes long plus 14 bytes for overhead = 38 bytes
   38 x 15,000 = 570,000 x 2 (Fallback) = 1,140,000 bytes approx.

   Estimated size of Trans = 1,140,000

3. Populate your Trans table from the AU.Trans table using the following INSERT/SELECT statement:

   INSERT INTO Trans SELECT * FROM AU.TRANS;

   Use the SELECT COUNT(*) function to verify the number of rows. 15,000

   SELECT COUNT(*) FROM Trans;

   Count(*)
   15000

4. Using the DBC.Tablesize view, determine the actual size of the Trans table by using the SUM aggregate function.

   Size of populated Trans = 1,153,024 (Estimated size was 1,140,000)

   SELECT SUM(CurrentPerm)
   FROM   DBC.Tablesize
   WHERE  DatabaseName = DATABASE
   AND    TableName = 'Trans' ;

   Sum(CurrentPerm)
   1153024

5. Create a USI on the Trans_Number column.

   CREATE UNIQUE INDEX (Trans_Number) on Trans;

   Estimate the size of the USI = 990,000

   (4 + 29) x 15,000 = 495,000 x 2 (Fallback) = 990,000 bytes approx.

   Actual size of the USI = 1,026,048 (use the empirical sizing technique)

   SELECT SUM(CurrentPerm)
   FROM   DBC.Tablesize
   WHERE  DatabaseName = DATABASE
   AND    TableName = 'Trans' ;

   Sum(CurrentPerm)
   2179072

   2,179,072 - 1,153,024 = 1,026,048

6. Create a NUSI on the Trans_ID column.

   CREATE INDEX (Trans_ID) on Trans;

   Estimate the size of the NUSI = 630,000

   SELECT COUNT(DISTINCT(Trans_ID)) FROM Trans;

   Count(Distinct(TRANS_ID))
   975

   (15,000 x 8) + ( 975 x (4 + 21) x 8 ) = 315,000 bytes approx.
   315,000 x 2 (Fallback) = 630,000 bytes approx.

   Actual size of the NUSI = 523,264 (use the empirical sizing technique)

   SELECT SUM(CurrentPerm)
   FROM   DBC.Tablesize
   WHERE  DatabaseName = DATABASE
   AND    TableName = 'Trans' ;

   Sum(CurrentPerm)
   2702336

   2,702,336 - 2,179,072 = 523,264

Lab Solutions for Lab 10-3


Lab Exercise 10-3
1. Populate your Employee and Emp_Phone tables from the PD.Employee and PD.Emp_Phone tables using the following INSERT/SELECT statements:

   INSERT INTO Employee SELECT * FROM PD.Employee;
   INSERT INTO Emp_Phone SELECT * FROM PD.Emp_Phone;

2. Using the DBC.Tablesize view, determine the actual size of the Emp_Phone table by using the SUM aggregate function.

   Size of populated Emp_Phone = 124,928

   SELECT SUM(CurrentPerm)
   FROM   DBC.Tablesize
   WHERE  DatabaseName = DATABASE
   AND    TableName = 'Emp_Phone' ;

   Sum(CurrentPerm)
   124,928

3. The Foreign Key is Employee_Number in the Emp_Phone table and the Primary Key is the Employee_Number in the Employee table. Create a References constraint on Employee_Number using the following SQL statement.

   ALTER TABLE Emp_Phone ADD CONSTRAINT fk1 FOREIGN KEY (Employee_Number) REFERENCES Employee (Employee_Number);

   (Use HELP CONSTRAINT Emp_Phone.fk1; to view constraint information.)

   HELP CONSTRAINT Emp_Phone.fk1;

   Name   Type        State   Index ID   Foreign Key Columns   ...
   FK1    REFERENCE   VALID   0          EMPLOYEE_NUMBER       ...

4. Using the DBC.Tablesize view, determine the actual size of the Emp_Phone table by using the SUM aggregate function.

   SELECT COUNT(DISTINCT(Employee_Number)) AS "Count" FROM Emp_Phone;

   Count
   1000

   (4 + 25) x 1,000 = 29,000 x 2 (Fallback) = 58,000 bytes approx.

   Estimate the size of the Reference Index = 58,000

   Size of populated Emp_Phone with references index = 190,464

   SELECT SUM(CurrentPerm)
   FROM   DBC.Tablesize
   WHERE  DatabaseName = DATABASE
   AND    TableName = 'Emp_Phone' ;

   Sum(CurrentPerm)
   190464

   190,464 - 124,928 = 65,536

   Size of references index = 65,536

5. Drop the Foreign Key constraint by executing the following SQL command.

   ALTER TABLE Emp_Phone DROP CONSTRAINT fk1;