The logical schema structure is a key aspect in the development of any system. There are many data
structures that can store data in an efficient and reliable way.
In this resource we'll talk about two very important logical structures in data warehousing: the star schema
and the snowflake schema. These two distinct ways of organizing large quantities of data are the most
commonly used in data warehouse structures.
We therefore describe both in detail, covering their design, advantages, and disadvantages.
Star Schema
Regarding the logical organization of your data, the star schema is the easiest and most
straightforward way to structure the information.
The architecture of this kind of schema is quite simple and provides several advantages, as well as some
disadvantages, which we describe below.
Design
In the star schema we have one or more centralized fact tables, each with one or multiple dimensions linked
to it. These dimensions are related only to the fact table, so the only structural link they have is to that
specific table.
The fact table relates to the dimensions by holding their primary keys as foreign keys, along with other
attributes relevant to the data warehouse. Each combination of these keys therefore identifies a unique
fact or piece of information. In most cases, the fact tables are in third normal form and the dimension
tables are not normalized.
http://www.dataonfocus.com/starschemaandsnowflakeschema/ 1/4
1/26/2017 StarSchemaandSnowflakeSchemaDataOnFocus
This way, the core data of the database is separated from its attributes, which characterize that
information.
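The design described above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration under assumptions: the table and column names (fact_sales, dim_date, dim_product, dim_store) and the sample rows are hypothetical and do not come from the original article.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

conn.executescript("""
-- Dimension tables: de-normalized, linked only to the fact table.
CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, full_date TEXT, year INTEGER);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT, country TEXT);

-- Fact table: the dimension primary keys appear as foreign keys,
-- and their combination identifies one fact.
CREATE TABLE fact_sales (
    date_id    INTEGER NOT NULL REFERENCES dim_date(date_id),
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    store_id   INTEGER NOT NULL REFERENCES dim_store(store_id),
    quantity   INTEGER NOT NULL,
    amount     REAL    NOT NULL,
    PRIMARY KEY (date_id, product_id, store_id)
);
""")

conn.execute("INSERT INTO dim_date VALUES (1, '2017-01-26', 2017)")
conn.execute("INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics')")
conn.execute("INSERT INTO dim_store VALUES (1, 'Lisbon', 'Portugal')")
conn.execute("INSERT INTO fact_sales VALUES (1, 1, 1, 3, 2700.0)")

# A second fact with the same key combination is rejected.
try:
    conn.execute("INSERT INTO fact_sales VALUES (1, 1, 1, 5, 4500.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

fact_count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
print(duplicate_rejected, fact_count)
```

Using the composite of the dimension keys as the fact table's primary key is one possible choice; other designs add a separate surrogate key to the fact table instead.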
Advantages
Implementing this structure brings several benefits to your data warehouse. Below we describe
the most important ones:
Simple data structure: It is easy to understand how the elements are connected, which simplifies the
reporting of the information.
Most common: It is easy to integrate with other tools.
More effective queries: Queries in these systems are usually simpler, since the data does not
follow strict normalization rules and there are fewer tables to join.
Performance enhancements: Performance gains substantially from the de-normalized
form of the data.
Optimized for large data sets: Because of the better performance of the system and its queries, the
star schema is efficient on data warehouses or data marts with huge data sets.
Rapid aggregations: Tasks like sum, average, count, and others are performed quickly on
these systems.
Good for OLAP
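As a rough illustration of the simpler queries and quick aggregations mentioned in the list above, the following sketch aggregates a small fact table with a single join. The tables, columns, and sample values are hypothetical and chosen only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical star schema fragment: one fact table, one dimension.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    quantity   INTEGER,
    amount     REAL
);
INSERT INTO dim_product VALUES
    (1, 'Laptop', 'Electronics'),
    (2, 'Phone',  'Electronics'),
    (3, 'Desk',   'Furniture');
INSERT INTO fact_sales VALUES
    (1, 1, 900.0), (2, 2, 1200.0), (3, 1, 150.0), (1, 1, 900.0);
""")

# Every dimension is one hop away from the fact table,
# so a per-category report needs a single join.
rows = conn.execute("""
    SELECT p.category, SUM(f.amount) AS revenue, COUNT(*) AS sales
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

for category, revenue, sales in rows:
    print(category, revenue, sales)
```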
Disadvantages
The de-normalization of the star schema's data also has some disadvantages. In this chapter we
focus on the most important ones:
Poor data integrity: Because the tables are not normalized, information can be
replicated, creating several anomalies in the data.
Long dimension table loading times: When data integrity is low and the amount of replicated values is high,
the loading time of the tables increases.
More disk space
Additional processing: Some controlling processes are usually added to avoid the data integrity
issue.
Harder complex queries: Since the schema is built specifically to analyse a given set of data, its
de-normalized organization makes it harder to develop new complex queries.
No many-to-many: This schema has no many-to-many relationships.
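The data integrity issue listed above, and the kind of controlling process used to detect it, can be sketched as follows. The dimension table, its category_manager column, and the sample names are hypothetical and serve only to show how a replicated value can drift out of sync.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical de-normalized dimension: the manager of a category
-- is repeated on every product row of that category.
CREATE TABLE dim_product (
    product_id       INTEGER PRIMARY KEY,
    name             TEXT,
    category         TEXT,
    category_manager TEXT
);
INSERT INTO dim_product VALUES
    (1, 'Laptop', 'Electronics', 'Alice'),
    (2, 'Phone',  'Electronics', 'Alice'),
    (3, 'Desk',   'Furniture',   'Bob');
""")

# A partial update touches only one of the two 'Electronics' rows,
# so the replicated value becomes inconsistent (an update anomaly).
conn.execute("UPDATE dim_product SET category_manager = 'Carol' WHERE product_id = 1")

# A simple controlling check: categories with more than one manager value.
conflicts = conn.execute("""
    SELECT category, COUNT(DISTINCT category_manager)
    FROM dim_product
    GROUP BY category
    HAVING COUNT(DISTINCT category_manager) > 1
""").fetchall()

print(conflicts)
```

A check like this is the sort of extra processing a de-normalized dimension may need, since the schema itself does not prevent the inconsistency.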
Example
To consolidate these ideas, we provide a simple visual example of a star schema:
As you can see, this structure resembles a star, hence the name of this logical schema.
Snowflake Schema
In this chapter we'll describe what the snowflake schema is, along with its benefits and disadvantages.
Design
The snowflake schema architecture is based on one or more central fact tables. These tables relate
to one or multiple dimensions. The dimension tables, in turn, can be connected to other
dimension tables.
The dimensions are normalized into multiple tables that are related to each other, which
provides a more normalized logical structure. These connections can span multiple levels of
depth, depending on the complexity of the structure.
Each related dimension table can have multiple parent dimension tables, resulting in a more complex, and
normalized, system.
This schema is usually used to support the specific needs of a data warehouse or data mart, improving its
capability.
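A minimal sketch of this layout, again with Python's sqlite3 module, is shown below. The table names (dim_department, dim_category, dim_product, fact_sales) are hypothetical, and for brevity each dimension has a single parent, although the text above allows several.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
-- Hypothetical snowflaked product dimension: product -> category -> department.
CREATE TABLE dim_department (
    department_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
CREATE TABLE dim_category (
    category_id   INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    department_id INTEGER NOT NULL REFERENCES dim_department(department_id)
);
CREATE TABLE dim_product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    category_id INTEGER NOT NULL REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    quantity   INTEGER NOT NULL,
    amount     REAL    NOT NULL
);
""")


def parent_tables(table):
    """Return the tables referenced by the foreign keys of `table`."""
    # Column 2 of PRAGMA foreign_key_list is the referenced table name.
    return [row[2] for row in conn.execute(f"PRAGMA foreign_key_list({table})")]


# Walk from the fact table outwards to show the levels of depth.
chain = ["fact_sales"]
while parent_tables(chain[-1]):
    chain.append(parent_tables(chain[-1])[0])

print(" -> ".join(chain))
```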
Advantages
There are obvious differences between the two structures covered in this article. Here we describe the
benefits of the snowflaking approach:
Better data quality: The information stored in the dimensions usually has far fewer anomalies
than in the star schema.
Less storage used: Because the dimension tables are normalized, a lot of storage space is
saved through the significant decrease in data replication.
Better specific query performance: Specific views and queries are optimized by this structure, since it is built
to support them.
Optimized tools: There are several tools built to work with this kind of data organization.
More structured data: Information is much more organized than in non-normalized
structures.
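The reduction in data replication can be illustrated by snowflaking a small de-normalized dimension. This is a sketch under assumptions: the table names (star_product, snow_product, dim_category) and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- De-normalized dimension: the category text is repeated per product.
CREATE TABLE star_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
INSERT INTO star_product VALUES
    (1, 'Laptop', 'Electronics'),
    (2, 'Phone',  'Electronics'),
    (3, 'Tablet', 'Electronics'),
    (4, 'Desk',   'Furniture');

-- Snowflaked version: each category name is stored once and referenced by key.
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE snow_product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);

INSERT INTO dim_category (name)
    SELECT DISTINCT category FROM star_product;
INSERT INTO snow_product
    SELECT s.product_id, s.name, c.category_id
    FROM star_product AS s
    JOIN dim_category AS c ON c.name = s.category;
""")

# Number of stored copies of a category name in each layout.
copies_star = conn.execute("SELECT COUNT(category) FROM star_product").fetchone()[0]
copies_snow = conn.execute("SELECT COUNT(name) FROM dim_category").fetchone()[0]
product_rows = conn.execute("SELECT COUNT(*) FROM snow_product").fetchone()[0]

print(copies_star, copies_snow, product_rows)
```

With only four rows the saving is negligible; the point is that the number of stored category names grows with the number of categories rather than with the number of products.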
Disadvantages
Despite all of the benefits stated above, there are also drawbacks to using this structure:
Not fully normalized: This structure imposes some normalization on the dimension tables.
However, full normalization isn't achieved and, to ensure data quality, some extra processing tasks need to
be developed.
High join complexity: The structure of the data is far more complex to analyse and to work with
than in non-normalized schemas.
Poor general performance: Despite the storage and organization benefits, this schema usually
performs worse on general queries due to the high number of join operations needed to retrieve
non-specific data.
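The join overhead described in the list above can be seen in a small sketch. The tables, columns, and values are hypothetical; the query reports revenue per department on a snowflaked product dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical snowflake layout: product -> category -> department.
CREATE TABLE dim_department (department_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_category (
    category_id   INTEGER PRIMARY KEY,
    name          TEXT,
    department_id INTEGER REFERENCES dim_department(department_id)
);
CREATE TABLE dim_product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    amount     REAL
);
INSERT INTO dim_department VALUES (1, 'Technology'), (2, 'Home');
INSERT INTO dim_category VALUES (1, 'Computers', 1), (2, 'Phones', 1), (3, 'Furniture', 2);
INSERT INTO dim_product VALUES (1, 'Laptop', 1), (2, 'Phone', 2), (3, 'Desk', 3);
INSERT INTO fact_sales VALUES (1, 900.0), (2, 600.0), (3, 150.0), (1, 900.0);
""")

# Reaching the department name means walking every level of the dimension.
snowflake_sql = """
    SELECT d.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product    AS p ON p.product_id    = f.product_id
    JOIN dim_category   AS c ON c.category_id   = p.category_id
    JOIN dim_department AS d ON d.department_id = c.department_id
    GROUP BY d.name
    ORDER BY d.name
"""
result = conn.execute(snowflake_sql).fetchall()
join_count = snowflake_sql.count("JOIN")

print(result, join_count)
```

In a star schema the department name would typically sit directly on the product dimension, so the same report would need one join instead of three.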
Example