Data Warehousing After the Bubble


by Lou Agosta
Originally published December 4, 2008
One thing is clear: Basel II and Sarbanes-Oxley did not work. Instead, credit default
swaps (CDSs) were regarded as free money rather than as insurance against which
reserves had to be set aside to cover inevitable losses. That exposure was not captured
in the underlying risk data warehouse because the mortgage and CDS products
were not supposed to be a hazard. Oops. For my part, I am still trying to understand
how a hard-working house cleaner and her house painter husband got a $624K
mortgage; was a paltry quarter of a million dollars not enough? Mortgages were
considered safe, long-term investments on the part of banks and other long-term
loan originators. However, at the risk of 20-20 hindsight, of which there is no
shortage, this approach dated from the days when banking was supposed to be
boring, and innovative financial instruments such as CDSs and packages of
subprime mortgages were just a glimmer in the investment banker's eye.
So what is the lesson here? Meaning is use, and data in isolation is worthless. It is
information that is useful, as the means to eliminate or reduce uncertainty. No data
warehouse can contain all dimensions of a business, industry or market; and when
so many variables are changing simultaneously, blind spots happen. The inside of a
bubble is a lot more comfortable than the reality outside it. Things were neither as
rosy as they seemed, nor are they as dark as they now appear. People still need
food to eat; transportation, including cars, to get around; and places to live. Mark
Twain is reported to have said, "Buy land. They aren't making any more of it." I
believe he added, "but make sure it is not under water." The bursting of the bubble
and the ongoing economic challenges will reinforce several trends in data
warehousing.
Open source data warehousing. This will accelerate open source's readiness for
enterprise deployment. Functions such as heartbeat, which make data warehouses
capable of supporting mission-critical applications that require mirroring, rollback,
automatic failover, redo and related components of high availability, are now a
requirement. These capabilities are a work in progress, but at an accelerating pace,
given the urgency of the situation. Open source databases, in particular MySQL
from Sun and Postgres Plus from EnterpriseDB, are working their way through the
enterprise as components of appliances, column-oriented data marts and related
applications. These applications are often support-oriented rather than mission-critical,
but they provide a testing and training ground for the next generation of frontline
infrastructure.
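
To make the heartbeat idea concrete, here is a minimal sketch, in Python, of the kind of liveness check that sits underneath automatic failover. Everything named here (the connection string, the promotion script) is a hypothetical placeholder, and a production monitor would handle many more failure modes; this illustrates the pattern, not any vendor's implementation.

    # Hypothetical heartbeat monitor: poll the primary database and run a
    # failover (standby promotion) script after several consecutive misses.
    import subprocess
    import time

    import psycopg2  # assumes the PostgreSQL driver is installed

    PRIMARY_DSN = "host=primary-db dbname=dw user=monitor"  # placeholder DSN
    FAILOVER_CMD = ["/usr/local/bin/promote_standby.sh"]    # placeholder script
    MAX_MISSES = 3        # consecutive failures tolerated before failing over
    INTERVAL_SECONDS = 5  # polling interval

    def primary_is_alive() -> bool:
        """Return True if the primary answers a trivial query in time."""
        try:
            conn = psycopg2.connect(PRIMARY_DSN, connect_timeout=2)
            try:
                cur = conn.cursor()
                cur.execute("SELECT 1")
                return cur.fetchone() == (1,)
            finally:
                conn.close()
        except psycopg2.Error:
            return False

    misses = 0
    while misses < MAX_MISSES:
        misses = 0 if primary_is_alive() else misses + 1
        time.sleep(INTERVAL_SECONDS)

    # Only reached after MAX_MISSES consecutive failed checks.
    subprocess.run(FAILOVER_CMD, check=True)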
Open source is not for the technologically faint of heart. It is optimally deployed in
connection with a support package from a vendor that is going to be available on
weekends, holidays and when you least expect to need it. Still, the potential for
disruption of the existing installed base of the standard relational databases is
significant. While information technology is holding up relatively well amidst the
recession, it is hard to see how that can continue when customers in finance, retail,
hospitality, travel and manufacturing are taking it on the chin. In contrast, open
source remains a bright spot where innovation promises to return improved
productivity, which after all is the best way of creating new opportunities for cost
reduction, efficiency and profitability.
Data warehousing in the clouds. As noted in my recent article, cloud computing
has come up fast with companies such as Amazon, Google and Salesforce.com
stealing a march on information technology infrastructure stalwarts such as Dell,
HP, IBM, Microsoft and Oracle. Cloud computing differs from all the usual suspects,
namely the grid, software as a service (SaaS) and simple web hosting, by providing
virtualization of the entire technology stack and a retail interface for the purchase of
business applications in small increments with which to run a medium-sized
business or perhaps an enterprise. The service level agreement (SLA) for the
application in the cloud is one that a business person can understand,
accommodating data persistence, system reliability, redundancy, security and
business continuity. However, the catch is that the SLA is still in the process of being
defined in an enterprise context. Thus, cloud computing is best suited for small and
medium-sized organizations that can afford to be flexible about their requirements in
order to save a few nickels on infrastructure.
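
To illustrate (not to prescribe), the SLA terms listed above can be thought of as a small, checkable contract. The sketch below models them as a plain data structure; the field names and thresholds are invented for the example.

    # Hypothetical model of the SLA terms named above, as a checkable record.
    from dataclasses import dataclass

    @dataclass
    class CloudSLA:
        durability_pct: float        # data persistence guarantee
        uptime_pct: float            # system reliability
        replica_count: int           # redundancy
        encrypted_at_rest: bool      # security
        recovery_time_hours: float   # business continuity (recovery time)

    # Invented numbers: what a provider offers versus what a medium-sized
    # business might require.
    offered = CloudSLA(99.99, 99.9, 2, True, 4.0)
    required = CloudSLA(99.9, 99.5, 2, True, 8.0)

    meets = (offered.durability_pct >= required.durability_pct
             and offered.uptime_pct >= required.uptime_pct
             and offered.replica_count >= required.replica_count
             # True >= False: encryption offered satisfies any requirement
             and offered.encrypted_at_rest >= required.encrypted_at_rest
             and offered.recovery_time_hours <= required.recovery_time_hours)
    print("SLA acceptable:", meets)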
Back to basics: data quality. Data quality remains an issue as some customers
just disappear, leaving only an entry in the data warehouse to be cleared up. As
rapid seismic changes in consumer behavior occur, other customers move into
demographically different categories and no longer have the same marketing, buying
or shopping profiles. Retail discovers the returning popularity of layaway plans,
which require their own application profile. The basic question of data warehousing
remains more important than ever: who is buying or using what product or service,
and when and where are they doing so? In any crisis and breakdown of what is
ordinary (in this case, the end of living beyond one's means), the natural tendency
is to overreact. In that respect, a single, high quality data point (fact) from a data
warehouse is worth a thousand opinions. Stay the course.
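
As a concrete illustration of that basic question, the following sketch runs a classic dimensional query against a hypothetical star schema: a sales fact joined to conformed customer, product, date and store dimensions. All table and column names are invented for the example; the point is that one well-modeled query answers who, what, when and where in a single pass.

    # Who is buying what product or service, and when and where?
    # Expressed against a hypothetical star schema using the standard
    # library's SQLite driver; any relational engine would do.
    import sqlite3

    QUERY = """
    SELECT c.customer_segment,
           p.product_name,
           d.calendar_month,
           s.store_region,
           SUM(f.sales_amount) AS total_sales
    FROM   fact_sales f
    JOIN   dim_customer c ON f.customer_key = c.customer_key
    JOIN   dim_product  p ON f.product_key  = p.product_key
    JOIN   dim_date     d ON f.date_key     = d.date_key
    JOIN   dim_store    s ON f.store_key    = s.store_key
    GROUP  BY c.customer_segment, p.product_name,
              d.calendar_month, s.store_region
    ORDER  BY total_sales DESC
    """

    conn = sqlite3.connect("warehouse.db")  # placeholder warehouse file
    for row in conn.execute(QUERY):
        print(row)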
Back to basics: front end. Perception of business value migrates in the direction of
the user interface. In addition to enterprise front ends from SAP/Business Objects or
IBM/Cognos, upstarts are offering engaging variations on the dashboard theme. For
example, for those interested in new options, check out LogiXML and SiSense. A
front end vendor that blurs the front/back distinction in an interesting way, reaching
back to data sources and providing ETL-like access in addition to analytics, is
Lyza Software. Now layer open source on top of this market. Pentaho is more than
an open source front end, since it aspires to data mining and data integration as well.
However, it did get its start in reporting and dashboards. When successful, all the
laborious work of upstream data integration will result in an "Aha!" experience as the
business analyst gains an insight about customer relations, product offerings or
market dynamics. A new and better user interface is not in itself the cause of the
breakthrough. Without the work of integrating the upstream data, the result would
not have been possible.
Back to basics: middle layer. Data integration is arguably a trend with many of the
enterprise application integration (EAI), extract, transform and load (ETL) and
customer data integration (CDI) service vendors leading the charge. Now layer open
source on top of it. An interesting approach to open source ETL is provided by
Apatar. In addition to a compelling pricing proposition, Apatar is building a
community of users by means of a shareable database of ETL maps submitted and
maintained, in the spirit of open ETL, in its Forge database. While it is improbable
that anyone else's application is exactly like yours, business problems fall into
categories and one is likely to be close to what you are seeking. This is a great way
to avoid reinventing the wheel and to jumpstart a project. Data integration requires
schema integration. A schema is a database model (structure) that accurately
represents the data in such a way that it is meaningful. To compare entities such as
customers, products, sales or store geography across different data stores, the
schemas must be reconciled in terms of consistency and meaning. If the meanings
differ, then translation (transformation) rules must be designed and implemented.
The point is that IT developers cannot plug into data integration simply by purchasing
a plug-in for a tool; they must also undertake the design work to integrate (i.e., map
and translate) the schemas representing the targets and sources.
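
A minimal sketch of what such a translation rule looks like in practice follows, assuming two hypothetical sources whose customer records disagree on both structure (one name field versus two) and units (dollars versus thousands of euros). The record layouts and the conversion rate are invented for the example; the design work is precisely in noticing and reconciling these differences.

    # Hypothetical translation (transformation) rules that reconcile two
    # source schemas onto one target (warehouse) schema.

    # Source A stores the name as one field and revenue in dollars.
    record_a = {"cust_name": "Acme Corp", "rev_usd": 12500.0}

    # Source B splits the name and stores revenue in thousands of euros.
    record_b = {"first": "Acme", "last": "Corp", "rev_keur": 9.8}

    EUR_TO_USD = 1.27  # assumed rate; a real rule would look this up

    def from_source_a(rec: dict) -> dict:
        """Map a Source A record onto the target schema (structure only)."""
        return {"customer_name": rec["cust_name"],
                "revenue_usd": rec["rev_usd"]}

    def from_source_b(rec: dict) -> dict:
        """Map a Source B record, reconciling both structure and units."""
        return {"customer_name": f"{rec['first']} {rec['last']}",
                "revenue_usd": rec["rev_keur"] * 1000 * EUR_TO_USD}

    for rec in (from_source_a(record_a), from_source_b(record_b)):
        print(rec)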
Back to basics: back end. Design consistent and unified definitions of product,
customer, channel, sales or store geography, etc. This is the single most important
action an IT department can undertake regarding a data warehousing architecture.
Frontline data warehousing with clickstream applications is here to stay, and key
data dimensions and attributes now also include those relevant to the Web, such as
page hierarchies, sessions, user IDs and shopping carts. Every department (finance,
marketing, inventory, production) wants the same data in a different form; that's why
the star schema design and its data warehouse implementation were invented.
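
As a sketch of that design point, the DDL below (expressed through Python's built-in SQLite driver, with illustrative names only) shows a sales fact and a clickstream fact sharing one conformed set of dimensions, so that finance and marketing are looking at the same customer and the same product.

    # Hypothetical star schema: conformed dimensions shared by two facts.
    import sqlite3

    DDL = """
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY,
                               customer_name TEXT, customer_segment TEXT);
    CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY,
                               product_name TEXT, category TEXT);
    CREATE TABLE dim_page     (page_key     INTEGER PRIMARY KEY,
                               page_url TEXT, page_hierarchy TEXT);

    -- Classic sales fact: one row per line item.
    CREATE TABLE fact_sales (
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        product_key  INTEGER REFERENCES dim_product(product_key),
        date_key     INTEGER,
        sales_amount REAL
    );

    -- Clickstream fact: one row per page view, same conformed dimensions.
    CREATE TABLE fact_pageview (
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        page_key     INTEGER REFERENCES dim_page(page_key),
        session_id   TEXT,
        view_time    TEXT
    );
    """

    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)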
Extensive research is available on how to avoid the religious wars between data
warehouses and data marts by means of a flexible data warehouse design. The
previously cited comments on open source databases and data warehousing in the
clouds are relevant here. According to my calculation, that constitutes front end,
middle, and back end open source options from which to assemble a complete
system. Obviously, enterprise customers will find value in having even more choices,
and those are coming. In my opinion, economic uncertainty will be a benefit to open
source and its users. IT benefits from the bandwidth that developers may now have
to start something really engaging (cool), and open source limits the downside
financial risk. Win-win.
Plenty of blame and finger-pointing is available as the responsibilities for the housing
bubble, credit default swaps and packages of toxic mortgage debt get
passed around like hot potatoes. Self-scrutiny on the part of Barney Frank, Chuck
Schumer and Henry Waxman, members of Congress who urged on the excesses of
mortgage lenders Freddie Mac and Fannie Mae, is noticeably absent. It is true that
Alan Greenspan, in testimony before Congress, indicated that one of the problems
was that he had bad data; but that was in the context of acknowledging that his
point of view on regulation was in need of more work.[1]
This is the moral equivalent, when decoded from central banker talk, of the former
Fed chairman saying that he was wrong, the recognition of which I shall cherish no
matter how long I live. Fortunately for Greenspan, he had already published his book,
because his reputation now looks to have been as inflated as the price of housing in
the year 2006.
However, the one thing that no one has yet done is blame it on the data warehouse.
Accurate, timely data is more important than ever before, and the data warehouse is
one of the best ways of assuring it. Seriously, I expect there to be more work in the
public sector building data warehouses (as well as transactional systems) to
navigate through the economic and political dynamics.
End Note:
1. Neil Irwin and Amit Paley, "Greenspan says he was wrong on regulation," The
Washington Post, October 24, 2008.
Lou Agosta
Lou Agosta is an independent industry analyst, specializing in data
warehousing, data mining and data quality. A former industry analyst at Giga
Information Group, Agosta has published extensively on industry trends in data
warehousing, business and information technology. He is currently focusing on
the challenge of transforming America's healthcare system using information
technology (HIT). He can be reached at LAgosta@acm.org.
