The standard architecture for building enterprise data warehouses over the past 20 years has been a
combination of third normal form (3NF) (Inmon) and dimensional modelling (Kimball). When data
warehouses were first built, businesses opted for a big bang approach and tried to build the entire
warehouse prior to producing reports. This took a long time, was very expensive and didn’t offer much
immediate value.
Over time, companies and organizations have generally adopted a more incremental approach and built
parts of the warehouse as the business areas demanded it. The data explosion over recent years has seen
a demand for information to be collected and crunched at a faster pace, leaving the traditional methods
unable to keep pace with the changing data landscape.
Data Vault modelling is a more recent hybrid approach, which gathers data from multiple sources and is
specifically designed to be resilient to environmental changes. Dan Linstedt, the creator of this methodology,
describes it as:
“A detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more
functional areas of business. The design is flexible, scalable, consistent and adaptable to the needs of the
enterprise.”
Data Vault is very prescriptive in its use of tables and attributes, but it is built from simple, easily understood
building blocks that every member of the team can discuss, adapt and conform to. This reduces the
dependence on a single arbiter of modelling standards, who is often a bottleneck, and decomposes the
problem into manageable pieces.
Most important is that the building blocks themselves are simple enough to easily be modified and
recombined in place to deal with inevitable change.
An example of the power of the Data Vault approach is that these concepts are defined so that all the Hubs
can be populated in parallel, then all the Links in parallel, then all the Satellites in parallel, which makes a
simple and rapid batch load easy to build.
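The staged batch described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the table names and the `make_loader` helper are hypothetical, and each loader only records that it ran instead of touching a real database.

```python
from concurrent.futures import ThreadPoolExecutor


def load_stage(loaders):
    """Run every loader of one stage concurrently and wait for all of them."""
    with ThreadPoolExecutor() as pool:
        # Leaving the with-block waits for every submitted loader to finish.
        return list(pool.map(lambda load: load(), loaders))


def make_loader(table_name, log):
    """Hypothetical stand-in for a real table load: it only logs the table name."""
    def load():
        log.append(table_name)
        return table_name
    return load


def run_batch(hubs, links, satellites):
    """Load all Hubs, then all Links, then all Satellites, each stage in parallel."""
    log = []
    stage_results = []
    for stage in (hubs, links, satellites):
        loaders = [make_loader(name, log) for name in stage]
        stage_results.append(load_stage(loaders))
    return stage_results, log


# Assumed example tables, chosen only for illustration.
results, log = run_batch(
    hubs=["hub_customer", "hub_product"],
    links=["link_order"],
    satellites=["sat_customer_details", "sat_product_details"],
)
```

Within a stage the loaders may finish in any order, but no Link load starts before every Hub load has completed, and no Satellite load starts before every Link load has completed.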
System of record
Data Vault separates Hubs and Links from the descriptive content, which is held in Satellite tables. Each
attribute change in the operational systems is written as a new time-stamped row to a Satellite table. This
means all changes in the operational systems are captured in the Data Vault model and data for any point in
time can be extracted.
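To illustrate how time-stamped Satellite rows allow extraction for any point in time, here is a small sketch. The `SatelliteRow` structure, its field names and the sample customer data are assumptions made for this example; a real Satellite would normally be a database table.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SatelliteRow:
    hub_key: str         # business key of the parent Hub (assumed name)
    load_ts: datetime    # time stamp at which this version was written
    attributes: dict     # descriptive attributes valid from load_ts onwards


def as_of(rows, hub_key, point_in_time):
    """Return the latest row for hub_key written at or before point_in_time."""
    candidates = [
        row for row in rows
        if row.hub_key == hub_key and row.load_ts <= point_in_time
    ]
    # default=None covers the case where the key did not exist yet.
    return max(candidates, key=lambda row: row.load_ts, default=None)


# Hypothetical history: the customer's city changes once.
rows = [
    SatelliteRow("CUST-1", datetime(2020, 1, 1), {"city": "Sydney"}),
    SatelliteRow("CUST-1", datetime(2021, 3, 15), {"city": "Melbourne"}),
]

mid_2020 = as_of(rows, "CUST-1", datetime(2020, 6, 1))
early_2022 = as_of(rows, "CUST-1", datetime(2022, 1, 1))
before_history = as_of(rows, "CUST-1", datetime(2019, 1, 1))
```

Because earlier rows are never overwritten, the same query answers "what did we know on this date?" for any date covered by the load history.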
Performance
The vertical partitioning of data into Hubs, Links and Satellites is fundamental to the Data Vault architecture
and enables parallel processing of data and therefore shorter load cycles. This makes the Data Vault
architecture well suited to near real-time loading. New Hub, Link and Satellite records are inserted and
existing records are never updated, which is more efficient and therefore faster.
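The insert-only loading pattern can be sketched with an in-memory SQLite database. The `sat_customer` table, its columns and the `load_satellite` helper are hypothetical names chosen for this example; the point is only that a change produces a new row and no `UPDATE` statement is ever issued.

```python
import sqlite3


def load_satellite(conn, hub_key, load_ts, city):
    """Insert a new Satellite row only if the attribute differs from the latest one."""
    latest = conn.execute(
        "SELECT city FROM sat_customer "
        "WHERE hub_key = ? ORDER BY load_ts DESC LIMIT 1",
        (hub_key,),
    ).fetchone()
    if latest is not None and latest[0] == city:
        return False  # nothing changed, so nothing is written
    conn.execute(
        "INSERT INTO sat_customer (hub_key, load_ts, city) VALUES (?, ?, ?)",
        (hub_key, load_ts, city),
    )
    return True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sat_customer ("
    "hub_key TEXT, load_ts TEXT, city TEXT, "
    "PRIMARY KEY (hub_key, load_ts))"
)

# ISO-8601 strings are used as time stamps so that text ordering matches time ordering.
first = load_satellite(conn, "CUST-1", "2020-01-01T00:00:00", "Sydney")
unchanged = load_satellite(conn, "CUST-1", "2020-06-01T00:00:00", "Sydney")
changed = load_satellite(conn, "CUST-1", "2021-03-15T00:00:00", "Melbourne")

history = conn.execute(
    "SELECT load_ts, city FROM sat_customer ORDER BY load_ts"
).fetchall()
```

After these three loads the table holds two rows for the customer: the original value and the changed value, each with its own time stamp.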
Our experience is that modelers, architects and ETL developers all appreciate the opportunity to explore
modelling and development techniques that may improve productivity and, if managed well, the experiment
itself will build knowledge and insights that improve your designs under other modelling regimes.
Formal training and certification paths are available in Australia (through C3 and other specialist training
providers). This is valuable for introducing your team to the initial concepts. The modelling method is very
well documented in text and videos on http://learndatavault.com.
There are also technical advice forums and support from the community of Data Vault modelers through
LinkedIn and other technical-social platforms.
The concept of ‘late binding’, where data isn’t integrated until it’s needed, is gathering momentum. Indeed,
Data Vault is a good middle ground between 3NF rigidity and extreme flexibility (hyper-generalization).
Using the Data Vault methodology enables you to work on projects with ‘high business value’ rather than
doing all of the work and then determining what you want to use. Finally, Data Vault works well with an Agile
approach because it slices the data into smaller parts and allows for new data sources to be added without
impacting the existing design.
So if you’re considering what modelling methods to use in your organization to extract the relevant
information from your data warehouse in less time than traditional methods, contact us to find out more about
Data Vault and how it can help you.
About EY
EY is a global leader in assurance, tax, transaction and
advisory services. The insights and quality services we
deliver help build trust and confidence in the capital markets
and in economies the world over. We develop outstanding
leaders who team to deliver on our promises to all of our
stakeholders. In so doing, we play a critical role in building a
better working world for our people, for our clients and for
our communities.
EY refers to the global organization, and may refer to one or
more of the member firms of Ernst & Young Global Limited,
each of which is a separate legal entity. Ernst & Young
Global Limited, a UK company limited by guarantee, does
not provide services to clients. For more information about
our organization, please visit ey.com.
eyc3.com
ey.com/analytics
Contact details:
analytics@eyc3.com