Block Level Compression In Teradata 14.0

Introducing Temperature Based Compression

By: Smarak Das


Employee Id: 391485
Project: Kaiser Permanente
Role: Teradata DBA

1. INTRODUCTION
Two key advantages of data compression are:
(a) Less disk space required to physically store the data
(b) Improved throughput when reading from or writing to disk
Teradata 13.10 introduced new ways to compress data in the database, bringing the
total number of available approaches to three:
(a) Multi Value Compression (MVC)
(b) Algorithmic Level Compression (ALC)
(c) Block Level Compression (BLC)
This document focuses on Block Level Compression (BLC) and the enhanced
functionality introduced in Teradata 14.0.

2. AUDIENCE
The targeted audience for this document is experienced Teradata database
administrators, application implementers, and Teradata associates who have a
background in database management and implementation. This document has
been prepared by referencing several Teradata guides and Orange Books.

3. BLOCK LEVEL COMPRESSION BASICS
Block level compression is a space-reduction feature, not a performance feature.
All rows in each data block are included in a single act of compression.
Once a table, or some number of cylinders from a table, has been compressed, if a
query needs to access even one column from a single row, the data block in its
entirety must be decompressed. In order for the query to process the data, the
compressed version of the data block is brought into the FSG cache (the data block
memory cache) and then decompressed into its original form.
Even though one query decompresses a data block, the next query that accesses
the same data block must decompress it again, even if the compressed version of
the block stays in the cache. Decompressed data blocks cannot be shared across
sessions. In addition, if the same session re-accesses a compressed data block, it
will have to decompress the data block a second time.

When a row is modified, its entire data block must first be decompressed in
memory. The change is made to the appropriate row, and then the entire data
block is compressed once again. If fallback has been defined on the table, the
fallback copy will also undergo decompression followed by recompression to
accommodate update activity. Because there is overhead associated with these
compression and decompression steps, BLC is typically targeted for tables that are
infrequently accessed and even less frequently changed. These types of tables are
often referred to as cold data.

Block level compression can be applied as follows:
(a) Empty table: using Query Band options or the appropriate DBS Control settings.
(b) Non-empty table: using the Ferret utility.
Teradata currently uses the ZLIB library, based on the Lempel-Ziv algorithm, for all
software block level compression.
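As an illustration, here is a minimal sketch of the query band approach for an
empty table. The database and table names are hypothetical, and the exact query
band name and values should be verified against the documentation for your release:

    /* Request block level compression for data loaded by this session
       (BlockCompression query band per Teradata 13.10; names hypothetical). */
    SET QUERY_BAND = 'BlockCompression=Yes;' FOR SESSION;

    /* Rows inserted into the empty target table are written as compressed blocks. */
    INSERT INTO sandbox_db.cold_history
    SELECT * FROM sandbox_db.stg_history;

For a populated table, a comparable sketch uses the Ferret utility from the
database console (command syntax may vary slightly by release):

    scope table sandbox_db.cold_history
    compress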

4. SPACE REDUCTION FUNDAMENTALS

The primary purpose of block level compression is to reduce the space on disk
required to physically store a table. It is strongly recommended that you use the
maximum allowed block size (127.5 KB) for tables that will be compressed. This can
be set either as a table-level attribute (with CREATE/ALTER TABLE) or through the
DBS Control File System settings.
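As a hedged sketch of the table-level approach, the definition below is entirely
hypothetical; confirm the DATABLOCKSIZE syntax and accepted units for your release:

    /* Hypothetical table created with the maximum 127.5 KB (130560-byte) block size. */
    CREATE TABLE sandbox_db.cold_history ,FALLBACK ,
        DATABLOCKSIZE = 130560 BYTES
        (
          txn_id   INTEGER,
          txn_date DATE,
          amount   DECIMAL(18,2)
        )
    PRIMARY INDEX (txn_id);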
[Figure: data block packing during the compression process]

Factors affecting the degree of compression:

(a) Individual data blocks within a table may have different reduction rates.
(b) If MVC or ALC is defined on a table when it is compressed at the block level,
the degree of reduction from adding BLC will be less than it would have been
without the preexisting compression, but the overall compression level will still
increase.
(c) Fallback secondary index subtables may be compressed, but primary secondary
index subtables are not. So if your table contains indexes before compression, the
post-compression reduction rate will appear smaller than if it had no indexes at
the time of compression.
(d) Some data blocks are considered too small to provide any meaningful
compression, and those blocks will be bypassed during the compression process.
(e) The patterns of data within the data block will impact how effectively
compression can be applied.

5. Using Manual BLC as it Exists in Teradata Database 13.10


Block level compression can be costly, particularly when it is applied to all data
blocks of the primary and fallback copies of a table, as it is in Teradata 13.10.
The three major costs attributed to compression come from:
(a) Applying the initial compression (a one-time cost)
(b) Decompressing the data blocks when they are accessed
(c) Recompressing the data blocks when they are modified
The best candidates for block level compression are tables whose data blocks do not
have to be decompressed very often. These would be tables that hold cold data (very
infrequently accessed data) that are rarely or never updated (static tables).
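As a starting point for finding such candidates, the sketch below ranks tables by
total current perm space using the standard DBC.TableSizeV dictionary view; pairing
the result with DBQL access history (not shown here) would confirm which of these
large tables are also cold:

    /* Largest tables by total current perm, summed across all AMPs. */
    SELECT DataBaseName, TableName,
           SUM(CurrentPerm) AS TotalPerm
    FROM   DBC.TableSizeV
    GROUP  BY DataBaseName, TableName
    ORDER  BY TotalPerm DESC;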
If actively accessed tables are being considered for manual compression, BLC is a
better fit for platforms that have plenty of excess CPU to absorb the additional
overhead involved. Because of the increase in CPU required for decompression on
actively accessed tables, applying BLC to non-cold data is usually not recommended
for environments with high CPU utilization (80% or greater).

6. Temperature Based Block Level Compression


When considering block level compression on a table, a new enhancement makes it
possible to compress only the data that is infrequently accessed, and leave the more
frequently used data blocks uncompressed. Temperature based block level
compression may be applied on a table-by-table basis, or by default on all user tables.
When temperature based BLC is enabled, the Teradata Virtual Storage (TVS)
infrastructure is used to determine the temperature of the data being held in disk
storage. Temperature is an indication of how often the data is accessed. A cold
temperature indicates the data is accessed less often than other data on the system,
whereas a hot temperature indicates it is accessed more often. This temperature
determination performed by TVS is used to decide which cylinders should be
compressed.

6.1 SETTING THE THRESHOLD FOR COMPRESSION

One of the tasks performed by the TVS routines is to rate all cylinders by
relative temperature and then determine which cylinders are to be marked
as cold.
The threshold that determines compression depends on a DBS Control
setting found in the Compression group of fields:
TempBLCThresh = COLD (COLD, WARM, HOT, 0-100)
Zero specifies the default, which is COLD. The lowest threshold that you can
select is 0.001. The TempBLCThresh setting indicates the temperature
threshold at which data will be compressed when under the control of
temperature based BLC. Any data that is colder than what the setting
specifies will be targeted for compression. Here is what the different
options mean (a configuration sketch follows the option descriptions):
COLD: By default, the lowest 20% of the allocated perm cylinders
on the platform (this excludes spool tables), once they have been
sorted by temperature (as described above), will be labeled as cold. If
this option is selected, the 20% of cylinders (by default) with the
lowest relative temperature will be compressed.
WARM: When specifying WARM, all of the data cylinders that have
been classified as either WARM or COLD will be compressed. Only
the hot cylinders will remain uncompressed.
HOT: Selecting this option will effectively compress all data on the
system. If that is the desired result, the same result can be achieved
with less overhead by avoiding the temperature based feature and
instead using manual block level compression. There is no current
valid use case for this option.
0-100: This option allows you to determine where you would like
the threshold between compressed and non-compressed storage to be.
The number that you specify here, which can have up to 3 decimal
places, will be applied to the sorted list of cylinder temperatures. For
example, if you want 30% of your allocated cylinders (the coldest
30%) to be compressed, you would assign 30 to this parameter.
Specifying a percent only affects the Cold-Warm boundary, because
only that boundary is considered when deciding the compression
status of a cylinder.
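As a hedged sketch of changing this threshold with the dbscontrol utility: the
field number for TempBLCThresh within the Compression group varies by release,
so display the group first and treat <n> below as a placeholder:

    display compression
      (lists the Compression group fields and their current values)
    modify compression <n> = 30
      (sets TempBLCThresh so the coldest 30% of perm cylinders qualify)
    write
      (commits the change to DBS Control)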

Sequence of activities performed for temperature based block level compression:

(a) TVS keeps track of access counts for each cylinder and periodically re-evaluates
them. The AutoTempComp background task of BLC takes the information provided
by the TVS routines and determines which cylinders should be compressed and
which should be uncompressed. The task, as is true of all file system activity,
runs independently on each AMP. This means that until a steady state is reached
across the system, different portions of the same table may be compressed on some
AMPs and uncompressed on others.
(b) Access counts are fed into a formula that assigns a number to each cylinder
representing its relative temperature compared to other cylinders. This list of
numbers has a range into the billions and is capable of representing a high degree
of differentiation between cylinders.
(c) This list of numerically based cylinder temperatures is sorted low to high, and
two boundaries are set based on predefined thresholds: (1) the Cold-Warm
boundary and (2) the Warm-Hot boundary. The boundary that demarcates cold
data from warm data is the most relevant for temperature based BLC, because it
separates cylinders that need to be compressed (or left compressed) from cylinders
that need to be uncompressed (or left uncompressed).
(d) Each data block I/O, whether a read or a write, will increment the temperature
counter for that cylinder. These access counts feed into the temperature rating
algorithm.

6.2 AUTOCYLPACK

A separate background task called AutoCylPack will come along behind
AutoTempComp and attempt to combine cylinders of recently compressed data
blocks. This will only happen if a DBS Control File System parameter called
AutoCylPackColdData is changed to TRUE.
AutoCylPackColdData has a default of FALSE. Setting the default in this manner is
intended to prevent AutoCylPack from reading cold cylinders and inadvertently
heating them up. However, the benefit of quickly consolidating compressed data
blocks onto fewer cylinders will in most cases outweigh the risk of heating up the
cylinders for temperature based compression.
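A minimal sketch of enabling it, again via dbscontrol; the File System group field
number for AutoCylPackColdData is a placeholder here:

    display filesys
      (lists the File System group fields and their current values)
    modify filesys <n> = TRUE
      (allows AutoCylPack to pack cylinders holding cold, compressed data)
    write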

7. Additional Considerations Using Temperature Based BLC
Consider these additional points having to do with temperature based BLC:
(a) Determining whether a cylinder is hot, warm, or cold is based on statistics
that are collected by the Teradata Virtual Storage (TVS) routines.
Temperature is a relative value, not an absolute one. If you access half of a
table's data each day and the other half every other day, the second group
will be labeled colder than the first group, even though it is processed
frequently.
(b) When you first enable temperature based BLC on a non-TVS system,
compression and decompression activity will not begin immediately. When
first enabled, the TVS routines have to collect metrics for a period of time;
it may be as long as a week before any meaningful metrics are produced.
(c) Metrics will need to be collected over a long enough period of time to
accurately represent most of the query patterns on a system. In order to
achieve a fair view of activity on your system, consider enabling
temperature based compression as a single first step, and run that way for a
month, without specifying any tables for temperature based compression.
(d) Compressing data could actually make the data have a warmer temperature
than when it was uncompressed. This is due to the fact that after
compression and cylinder consolidation, each cylinder contains more data
than it previously had. Because temperature is measured at the cylinder
level, a greater number of data blocks could mean a higher cylinder access
count.

(e) If you wish to disable automatic compression, but still be able to compress
tables manually without consideration of temperature, you can simply turn
off the master parameter that controls the temperature based compression
functionality: EnableTempBLC = FALSE.
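A sketch of that change, assuming EnableTempBLC sits in the same DBS Control
Compression group as TempBLCThresh (field number again a placeholder):

    modify compression <n> = FALSE
      (turns off temperature based BLC; manual BLC remains available)
    write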

One side-effect to be aware of is that some tables will be left in a mixed state, with some
of the cylinders compressed, and other cylinders not. This is not necessarily a problem
that needs to be fixed.

8. CONCLUSION
Block level compression on all the primary data blocks (as well as all the fallback
data blocks, if present) has the potential to provide the greatest space reduction,
but it is also the most costly in terms of decompression overhead if compressed
tables are accessed or updated. Use block level compression on primary data when
one or more of these situations exists:
(a) The platform is extremely space-constrained, and achieving the maximum
degree of compression is a priority.
(b) MVC combined with ALC does not offer enough size reduction.
(c) The database contains large tables which are either infrequently accessed and
rarely updated (the optimal choice for BLC), or frequently accessed or updated,
provided that plenty of spare CPU cycles exist (CPU is <= 60% utilized) or very
strong workload management techniques restrict the CPU used for decompression.

When to Use Temperature Based Block Level Compression:

Temperature based compression is not suited for all tables. Tables that are historical
and infrequently accessed are better handled in manual mode. Temperature based
compression is most appropriate in these cases:
(a) Large tables that experience frequent access to a subset of the data.
(b) Tables that remain relatively stable in their temperature profile over time.
(c) Large partitioned primary index (PPI) tables where most access activity is
concentrated in the current partitions, but older partitions are relatively dormant.
(d) Columnar tables, due to the clustering of frequently accessed data.
If adequate amounts of space are not being saved using temperature based
compression, consider raising the compression temperature threshold, which will
increase the percentage of cylinders that qualify for compression. Another
alternative is to make a broader list of tables eligible for temperature based
compression.