
Cloud Computing Models for Data Warehousing

2012 Technology White Paper

TABLE OF CONTENTS
Executive Summary
Cloud Computing Concepts
    Utility Computing
    Cloud Definitions
        Core Characteristics of a Cloud
        Cloud Deployment Models
    The Benefits Reported for Cloud Computing
    The Promise of Cloud Computing for Data Warehousing
        Lower Costs
        Faster Delivery
        Performance and Scalability
        Agility
Database Workloads and Their Fit with Cloud Infrastructure
    Shared-nothing Databases are Required to Support BI in the Cloud
    Shared-nothing Databases: Necessary but Not Sufficient
Public Versus Private Clouds for BI and Analytic Databases
    Challenges with the Public Cloud Today
    Benefits of Private Cloud over the Traditional Server Model
    Cloud Adoption Preferences
Private Clouds and the Data Warehouse in Action
    Creating a Consolidated Data Platform
    Costs, Budgeting and Planning
    Managing Performance
    Agility
Conclusion and Recommendations
About the Author


Executive Summary
Cloud computing is creating a new era for IT by providing a set of services that appears to have infinite capacity, immediate deployment and high availability at trivial cost. It's the result of the evolution of computing and communications technology from a high-value asset to a simple commodity. In that evolution, the focus shifts from the concept of computing as a physical thing in a data center to computing as a service, like electricity, that is accessible from the nearest network connection.

Today most organizations look at the cloud as a way to lower data center and IT provisioning costs. While cost reduction is a real benefit, there is more value in the increased speed, flexibility and ease of delivery in cloud environments. The only way to gain these advantages is by changing the approach and practices for delivering applications and data. The real change in IT is a change in how work is done rather than the addition of a new technology.

Early worries about loss of control over the environment are being outweighed by the combination of lower costs, faster deployments and simpler scalability. Even so, not all deployments are moving to public cloud providers. Many organizations are adopting private clouds for some applications for technical, performance and regulatory reasons.

Database workloads are a particularly challenging area for the cloud environment. As a rule, cloud deployments beyond a moderate scale favor shared-nothing database architectures designed to run transparently in a multi-node environment. Despite the availability of these databases, performance and scalability of relational query workloads can suffer in the public cloud. We are still in an early period of standardization and design of software to run in the cloud. Not all workloads are suitable for deployment on a collection of small virtualized commodity servers today. Business intelligence and analytic database workloads fall into this area, raising the importance of careful analysis of fit with both public and private cloud options.

There are other reasons for not using public cloud environments. Data privacy and security regulations can prevent an organization from using a public cloud. Data movement and data management between internal systems and the cloud may be enough of a challenge that it eliminates any speed or cost advantages associated with using the cloud.

Private clouds offer a solution to these challenges. A private cloud is like a single-tenant version of a public cloud. The dedicated nature of a private cloud resolves the privacy and regulatory difficulties and can resolve some of the technical challenges with BI workloads. There are tradeoffs between public and private clouds that make hybrid solutions likely for the next five to ten years. Teradata Active Data Warehouse Private Cloud is the first real example of a private cloud for data warehousing workloads on the market, embodying key elements of self-service, pay-for-use and elastic growth and shrinkage of resources.


Cloud Computing Concepts


Utility Computing
Cloud computing is a model for delivering IT platform infrastructure. It's a shift from the idea of computing platforms as hardware and software products to the idea of computing platforms as a service used by applications, much as a household appliance uses electricity as a service.

This utility computing model parallels the evolution of the electric industry. In the early days of electricity there was no electric grid. Many small electric companies started with their own generators and wires running directly to customers, with the biggest demand being for street lights. Organizations and individuals wanting a reliable and controlled electric supply installed their own generation equipment that was sized to their needs. Generation and transmission technology matured, going through a commoditization process. Standards developed for electricity and the electric market consolidated into a smaller number of suppliers with interconnected service. The availability of electricity as a service meant there was no longer a need for private generators. It's rare to find an individual with their own home generator today.

Figure 1: Private home generator circa 1918

Electricity available as a metered service to organizations resulted in a savings in capital assets since generators and transformers were no longer needed, a savings in resources to supply the generators (coal or oil), and an equally large savings in operations and engineering labor for the people who maintained the equipment.

In similar fashion, the IT industry grew, spread and has been consolidating around a small number of large platform suppliers. The IT market today is much like the market for electricity a hundred years ago: organizations with a desire for reliable and controlled computing buy and run their own equipment.

Cloud computing is the inevitable result of the commoditization of hardware and communications technologies. The combination of computing power and the ability to access it from anywhere means there is less need to buy and maintain hardware, much like universal access to electricity reduced the need for private generators. The important aspect of computing is the work done by applications and the data they generate. Computing platforms have become a commodity service that can be transmitted from a remote location. This is a fundamental disruption to the IT industry, a disruption we're still at the beginning of.


Cloud Definitions
It's important to define terms before progressing. The most important is cloud computing, which the National Institute of Standards and Technology defines as "a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction."¹
Core Characteristics of a Cloud

The key elements NIST defines for a cloud that differentiate it from a cluster of servers in a data center or rented hardware at a hosting provider are:

On-demand self-service. A consumer should be able to acquire computing resources as needed without requiring human interaction with the service provider.

Network accessibility. The capabilities provided should be available over a network using standard client software that is independent of any underlying hardware.

Resource pooling. The computing resources are allocated from a shared pool in a way that is transparent to the consumers of the service. The resources can be dynamically reassigned based on demand and have no strict dependence on physical location.

Elasticity. Capacity should be dynamically provisioned so that it can grow or shrink on demand, and should appear as if it comes from an unlimited pool of resources.

Measured service. Resources should be delivered in a "pay-for-use" model where the consumer is charged based on actual use of resources. The consumer should have the ability to monitor and control resource use, making the billing process transparent. (A minimal metering sketch follows this list.)
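To make the measured-service characteristic concrete, here is a minimal metering-and-billing sketch. The resource names, unit rates and sample data are illustrative assumptions, not part of the NIST definition or any provider's actual price book; the point is only that consumption is recorded, aggregated per consumer, and billed in arrears.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical unit rates; real provider pricing will differ.
RATES = {"cpu_hours": 0.08, "gb_storage_month": 0.10, "gb_network_out": 0.12}

@dataclass
class UsageSample:
    consumer: str    # department or application being metered
    resource: str    # key into RATES
    quantity: float  # amount consumed in this sample

def bill(samples):
    """Aggregate metered samples per consumer and price them: pay only for what was used."""
    totals = defaultdict(lambda: defaultdict(float))
    for s in samples:
        totals[s.consumer][s.resource] += s.quantity
    return {consumer: sum(RATES[r] * qty for r, qty in used.items())
            for consumer, used in totals.items()}

samples = [
    UsageSample("finance_bi", "cpu_hours", 120.0),
    UsageSample("finance_bi", "gb_storage_month", 500.0),
    UsageSample("marketing_mart", "cpu_hours", 30.0),
]
print(bill(samples))  # approx {'finance_bi': 59.6, 'marketing_mart': 2.4}
```

Because every charge traces back to recorded samples, the same data supports the monitoring and billing transparency the definition calls for.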
Cloud Deployment Models

Cloud architecture is presented as being either public or private, which implies both who has access and where it is located. The difference most commonly described is that public clouds pool resources and share them across many organizations, while private clouds are dedicated to a single organization. The tradeoff is important, as it means that private clouds don't have the benefit of pooled resources outside a single firm. This dedicated nature of the private cloud implies that costs will be higher since they can't be shared across multiple organizations. There is also a hard limit to the scalability of the infrastructure, beyond which more physical resources need to be added.

The other differentiator between public and private clouds is the ability to place a private cloud on-site, to own the environment but place it at a managed-hosting provider, or to obtain dedicated infrastructure from a cloud service provider. The use of a private cloud is driven by the need to maintain control of the service delivery environment. This could be due to industry requirements, regulatory controls or specific performance requirements. Most organizations face data privacy or security regulations that prevent them from locating data in uncertified facilities or locations.
¹ The NIST Definition of Cloud Computing, http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf


The Benefits Reported for Cloud Computing


The core benefits of a cloud over a traditional server environment vary by need and the type of systems deployed. Figure 2 lists the benefits reported by IT decision makers as reasons for moving to the cloud.²

Benefits stated as reasons for using cloud computing (percent of respondents citing each):

Cost reduction
    Pay only for what we use: 50%
    Hardware savings: 47%
    Software license savings: 46%
    Lower labor costs: 44%
    Lower outside maintenance costs: 42%
    Reduce IT support needs: 40%

Reduce time to value
    Able to take advantage of latest functionality: 40%
    Rapid deployment: 39%
    Relieve pressure on internal resources: 39%
    Able to scale IT resources to meet needs: 39%
    Resolve problems related to updating/upgrading: 39%

Figure 2: Benefits seen as reasons for deploying in the cloud.

The reasons are easily grouped into one of two categories: cost savings or time-to-value. Cost is the biggest single factor for most people, but the real justifications are more complex, involving several combined benefits.

A large part of the reduction in costs comes from shared services and metered billing. The pooled resources of the public cloud provider mean that they can keep all equipment at a higher utilization than a traditional server farm, reducing the per-unit cost below what is normally possible in a dedicated data center. In a private cloud the resources are limited to one company, but may still be pooled across lines of business or departments, providing some of the same benefits.

The ability to turn off or reduce the amount of resource dedicated to an application when it's not needed reduces the cost of running that application. In the case of a data warehouse, the hardware is usually sized to the peak workload. The data warehouse will use fewer resources than the peak most of the time. Paying only for the resources that are needed, and paying when they are needed, can drastically reduce costs.

The ability to provision resources immediately, without the procurement, delivery or setup time involved in the traditional model, is a key element in speeding projects. The reduced costs and the provisioning speedup have a secondary benefit: many small projects that were too hard to deliver in the timeframe required, or that couldn't meet the ROI hurdle, become viable in the new environment.

² Source of data: IBM global survey of IT and line-of-business decision makers.


The Promise of Cloud Computing for Data Warehousing


Organizations often look at the IT department first when cutting costs. IT is an obvious starting point because of the combination of high capital costs, high labor costs and the challenge of explaining the value of IT. The business intelligence (BI) group is one area of increasing expense in many organizations. BI and analytics have been in the top five IT spending priorities for several years according to multiple CIO surveys. With increasing spending comes increased scrutiny.
Lower Costs

Cloud computing is seen as a potentially inexpensive alternative to conventional data warehouse servers. The lowering of costs is due to a number of factors. First is the economy of scale that a cloud provider has when purchasing and pooling resources across a large customer base. This achieves higher utilization than an internal data center, allowing the provider's per-unit cost to be lower than the per-unit cost in IT.

Second, the unit cost of incremental growth is lower in the cloud. In the traditional model, the incremental cost of scaling up a data warehouse involves adding expensive server resources, upgrading to a larger server or adding nodes to a cluster. Because this is a capital cost, it is usually planned far in advance and more resources are purchased than are immediately needed. In the cloud, the incremental cost is limited to only the resources needed at a point in time and is paid after use.

The elastic environment of cloud computing translates into cost savings. The model is usage based, so resources may be reduced when they aren't needed, and increased when required. This same elasticity applies to development and test environments, which can be shut down when not needed. In a traditional environment, the hardware and software must be purchased up front.

There is no need for hardware upgrades in the public cloud, simplifying operations. Upgrades are driven by the aging of physical assets and the need for increased capacity or performance. The complexity of managing upgrades disappears when the provider manages these as part of the service. If hardware is changed by the service provider, it is transparent to the virtual machine layer running on top of that hardware. If additional capacity or performance is needed by IT, resources are added on demand, bypassing the need for upgrades.

The cost of operations is lower in the public cloud because the provider delivers operations as part of the service. There are no data center, systems, storage or network administration costs. Only the DBA or developer must be involved to determine capacity and performance needs. The proper cost comparison between the traditional and cloud models is therefore not just hardware, but the total picture of data center and operations costs associated with the environment.
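The utilization argument can be made concrete with a back-of-the-envelope calculation. The server cost and utilization figures below are illustrative assumptions, not vendor numbers; the point is simply that the effective cost of an hour of useful work falls as utilization rises, which is the economy a pooled provider (or a consolidated private cloud) exploits.

```python
def cost_per_useful_hour(annual_server_cost, utilization):
    """Effective cost of one hour of useful work on a server run at a given utilization."""
    hours_per_year = 24 * 365
    return annual_server_cost / (hours_per_year * utilization)

annual_cost = 20_000.0  # assumed fully loaded annual cost of one server
dedicated = cost_per_useful_hour(annual_cost, 0.20)  # dedicated DW server sized for peak load
pooled    = cost_per_useful_hour(annual_cost, 0.70)  # pooled resource kept busier

print(f"dedicated: ${dedicated:.2f} per useful hour")  # ~ $11.42
print(f"pooled:    ${pooled:.2f} per useful hour")     # ~ $3.26
```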


Faster Delivery

Using the cloud for database infrastructure removes procurement and provisioning barriers that slow projects. Imagine if, instead of waiting months for budget approval, order processing, shipment, setup and configuration, you could start development on projects immediately and deploy them into production with no delays due to operations. If it took less than an hour to add the capacity needed, what would you do differently? For a start, resources dedicated to provisioning could be moved to more productive work. Project delays based on capacity and performance would no longer be a problem.

On-demand self-service enables faster provisioning of resources for a data warehouse, whether it's for the initial install or adding resources for more capacity or better performance. The benefit isn't limited to the production environment. Ongoing projects in a BI program can be delivered more quickly since development and testing environments can be created and removed as needed.

The entire procurement process is accelerated by the pay-for-use model. Because the cloud is treated as a service, it is paid for as an operational expense rather than as a capital expense. This removes the need for a capital budget for infrastructure, simplifying the IT budgeting process.

Unexpected BI requests and unplanned projects are common in most organizations. These usually require additional hardware resources. The need for a capital acquisition can slow or stall a request while the budget changes are allocated and approved. If costs can be expensed, they can easily be paid by the group making the request without a slow budget approval process.
Performance and Scalability

Performance and scalability are the two biggest challenges faced by most data warehouse DBAs. Performance management is even more challenging for operational BI workloads with strict performance service level agreements (SLAs). It involves a lot of effort managing workloads, tuning, and possibly segregating some work to a separate environment.

The approach taken to performance management in the cloud is different. The cloud allows hardware resources to fluctuate with demand. The static model of hardware in the traditional environment is replaced by a dynamic model of computing resource delivery. There is no need to size a public cloud environment for a planned peak capacity to meet an SLA. Resources are effectively unlimited, so the DBA can specify an SLA and allow the environment to automatically adjust resources as the system is running. This replaces the need to throttle work that is interfering with the performance of critical workloads. Public cloud database offerings today lack this type of SLA management.

An additional tool for the DBA is the ability to provide feedback to the departments that consume more resources than expected. The consumers can decide whether the work they are doing is worth the additional cost of dynamic resources.


This changes the nature of discussion about performance. It's now a discussion about the cost of doing a set of work. The cost remains static whether the work is done on one machine in a hundred hours or a hundred machines in one hour. IT can discuss with users the value of the work being done, and they can make rational decisions about work, cost and timeliness.
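To make the arithmetic explicit: assume, for illustration, a uniform hourly rate r per node and ideal linear scaling, so a job needing H node-hours costs the same however the work is spread across machines.

```latex
\text{cost} = r \times n \times t = r \times H,
\qquad
r \times 1 \times 100\,\text{h} = r \times 100 \times 1\,\text{h} = 100\,r
```

Real systems deviate from this ideal (scaling is rarely perfectly linear and pricing tiers vary), but the equation captures why the conversation shifts from capacity to the value and timeliness of the work.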
Agility

The combination of increased speed, lower cost and the elastic nature of the cloud translates into agility for the data warehouse. The faster turnaround of BI projects means a more responsive BI group that is better able to address unplanned requests. The measure of a BI organization is its ability to handle the normal workload and meet unexpected demands.

Every organization has unplanned projects. Many smaller projects have a time limit after which they are no longer valuable. Today, one-off projects and those with an unclear benefit are hard to justify because of the capital budgeting process and the time and effort to provide resources. In a cloud environment they can be built at lower cost, and if they don't have the expected benefit they can quickly be shut down with no sunk cost in hardware or software licenses.

The combination of fast provisioning for development and new production workloads and the lower cost of doing so allows the BI group to complete more of these projects. It also allows the BI group to give some control over resources and priorities to others in the organization, supporting more agile BI development practices.


Database Workloads and Their Fit with Cloud Infrastructure


Using the public cloud is not a simple migration of applications. Some applications benefit more than others, and some are more appropriate for private than public cloud deployment. The most important factor in deciding the suitability of the public cloud is the workload. Workloads have different characteristics which make them more or less suited to today's typical cloud environment. There are three primary system workloads in IT.

OLTP. Transaction processing is a mixed read-write workload which can be lightly to very write-intensive. OLTP requires low-latency response, accesses small amounts of data at one time and has predictable access patterns with few, if any, complex joins between tables.

Business intelligence. BI workloads are read-intensive, with writes usually done during off-hours or in ways that don't compete with queries. While quick response times are desired, they are not typically in the sub-second range that OLTP requires. Data access patterns are more unpredictable, read lots of data and can have many complex joins.

Analytics. Analytic workloads are both compute-intensive and read-intensive, similar in many ways to BI except that access patterns are more predictable. They generally access entire datasets at one time, sometimes with complex joins prior to doing computations. Most analytic workloads are done in batch mode, with the output used downstream via BI or other applications.

The success of large-scale and extremely high-concurrency workloads at Web and online startups demonstrates how well the cloud can support transaction processing. These workloads are simpler to run at scale because the data volume and complexity of an individual transaction is small and easy to isolate, simplifying the back-end database. It's easier to gain scalability by spreading the back-end work across virtual servers in the cloud.

Shared-nothing Databases are Required to Support BI in the Cloud


Business intelligence and analytic database workloads sit at an intersection of requirements that makes them harder to run in the public cloud. BI queries normally retrieve some, but not all, of the data. This selectivity poses challenges for brute-force cloud processing models. The query needs of BI can't be met by the cloud databases available today because they are either single-node or non-relational.

Single-node databases can't scale in a cloud environment. A public cloud is more like a collection of equally sized small nodes that are used as building blocks. Increasing the resources of a single node in a public cloud is very limited when compared to on-premises hardware. If a database can't grow past the boundary of a single cloud node, then the system has an inherent performance limit.

The non-relational or "NoSQL" databases are designed more like object stores, with limited to no SQL support or ability to join tables. Typical BI queries require joins across many tables, and they process significantly more data than what is found in a single transaction. These databases are designed mostly to address OLTP problems, much like the standard relational databases in use today.


Another complication is the mismatch between BI tools and cloud databases. Most cloud databases are non-relational, making them incompatible with the SQL-based query, reporting and dashboard tools in use today.

Architects face a technical problem when moving to a cloud environment: relational databases are the primary choice to support BI workloads, but conventional databases are a poor match for public cloud infrastructure. These databases are designed to run on a single system or in a shared-disk cluster. The way to increase database performance for larger data volumes is by making the servers larger.

The optimal database architecture for cloud deployment is a shared-nothing database. The massively parallel processing (MPP) database model matches the architecture of a cloud environment. Shared-nothing relational databases, unlike their traditional counterparts, are designed to function in a distributed hardware environment such as the cloud.
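The sketch below illustrates the shared-nothing idea in miniature: rows are hash-distributed across nodes, each node scans and aggregates only its own slice, and a final step merges the partial results. It is a toy model of how an MPP database parallelizes a query, not any particular vendor's implementation; the key property is that adding a node simply adds another independent slice of data and compute.

```python
from collections import defaultdict

NUM_NODES = 4  # illustrative; an MPP system scales by adding nodes like these

def node_for(key):
    """Hash-distribute rows so each node owns a disjoint slice of the table."""
    return hash(key) % NUM_NODES

# "Load" phase: each node holds only its own rows (no shared disk or memory).
nodes = defaultdict(list)
sales = [("east", 100.0), ("west", 250.0), ("east", 75.0), ("north", 40.0), ("west", 10.0)]
for region, amount in sales:
    nodes[node_for(region)].append((region, amount))

def local_aggregate(rows):
    """Each node computes SUM(amount) grouped by region over its slice only."""
    partial = defaultdict(float)
    for region, amount in rows:
        partial[region] += amount
    return partial

# Query execution: nodes work in parallel in a real system; partial results are then merged.
partials = [local_aggregate(rows) for rows in nodes.values()]
result = defaultdict(float)
for partial in partials:
    for region, total in partial.items():
        result[region] += total

print(dict(result))  # totals: east 175.0, west 260.0, north 40.0 (order may vary)
```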

Shared-nothing Databases: Necessary but Not Sufficient


Cloud computing is more than a hardware platform. The concept includes the ability to provision easily, to dynamically adjust resources as needed, and to pay for use rather than paying up front for hardware and software. A data warehouse platform that is truly a cloud service should include all of these capabilities.

While provisioning public cloud resources may be simple and inexpensive, extending a database across more nodes is usually not. Adding resources to a database in the public cloud can require extensive work by administrators to redistribute data. Self-service data and resource provisioning is required in order to speed projects.

Database licensing presents a second obstacle to provisioning. Most MPP database vendors today have some form of scalable pricing based on data volume or nodes. The concept of elasticity challenges the way vendors sell their products. An elastic model implies that resources can grow and shrink; current vendor licensing assumes that resources and costs can only grow. Most database vendors do not allow customers to pay based on actual use. These vendors lack the concept of an abstract service with usage that can be metered, unlike the Teradata Active Data Warehouse Private Cloud, which provides flexible pricing options that can grow and shrink based on resource usage. Without this, they have no way to measure the use of the database and charge for it. They are trying to sell a product in a service delivery world.


Public Versus Private Clouds for BI and Analytic Databases


The public cloud solves many problems, but it introduces new challenges as well. Platform requirements vary based on workload characteristics, data sensitivity and business requirements. It's important to understand what the challenges are in order to determine which cloud deployment model, public or private, can deliver the greatest benefit.

Challenges with the Public Cloud Today


Performance for BI Workloads. Public cloud infrastructure is dramatically different from the conventional hardware environment used for data warehousing. The cloud is built with uniform commodity components rather than high-end servers. This means there are no high-speed interconnects between nodes, no high-speed I/O subsystems, and direct-attached high-speed disks are rare. These differences can slow the performance of BI workloads. Data warehouse databases are usually configured to work with hardware that is designed for heavy I/O workloads, something not normally considered in public clouds. This means that even if they are able to run in the cloud, they probably will not perform well.

Public clouds are multi-tenant. A node in the public cloud is most likely not a server, but one of several virtual machines running on a single server. While the memory, CPU and storage are dedicated, the virtual machines generally share the same internal bus, network hardware and I/O channels. Most databases expect hardware to be dedicated rather than shared. This can lead to hidden conflicts as an unknown virtual machine saturates the underlying server's shared resources, causing one node of the database to run slower.

Private clouds can deliver a better-performing option than public clouds. The hardware can be configured for heavy database workloads, either by IT or by a vendor. In an appliance model, the vendor has already specified the proper hardware configurations.

Legal and Regulatory Challenges with Data. The use of a private cloud is often driven by the need to maintain stricter control over data. Multiple organizations' data can be mingled in the same database, creating a liability if the public cloud provider accidentally exposes that data or the systems that access it. Privacy laws in many countries regulate where data can be stored or moved. In a public cloud, it's not normally apparent where the data is stored. This makes it difficult, if not impossible, for an organization to dictate that its data remain within a specific country's borders. Use of a public cloud will be bound by these regulations.

Using a private cloud allows an organization to control the physical location of the infrastructure and who has network access to the systems. This provides most of the public cloud computing benefits in situations where regulatory controls dictate that data reside only in certain locales or data centers, or be stored in non-shared databases.


Data Management and Integration. The public cloud adds complexity to data management. Most data integration and data management tools in use today are not designed for compatibility with cloud infrastructure and communication standards. They also share the same licensing incompatibilities as databases when deployed in a cloud environment.

The location of the data to be loaded into the database is another consideration. When the data to be loaded is largely external, data movement is less of a challenge. When the data originates inside a data center and has to be sent out over typical bandwidth-constrained network connections, data movement can be an obstacle. In the case of ETL-style workloads where data can move back and forth many times, performance can suffer. There is a possibility of increased cost too, as some cloud providers add charges for data movement into or out of the cloud.

Security Requirements. The public access nature of the cloud and multi-tenant cloud databases is a problem for some IT organizations. Most databases assume they are running inside a private data center, not exposed as an endpoint on the Internet. They haven't been designed for open public exposure. This creates additional compliance and security costs because there are more components to monitor and exceptions to standard practices that must be managed. The software and personnel costs can outweigh the benefits of running the database externally. According to one survey, 75 percent of financial services companies said that concerns about data security and privacy were the biggest obstacle to using a public cloud.³

Benefits of Private Cloud over the Traditional Server Model


The ability to run BI workloads in a public cloud is limited today. Until public cloud infrastructure can be configured to take into account the specialized needs of BI and analytic query workloads, performance at scale will be challenging. The regulatory, security and privacy concerns will prevent some organizations from using the public cloud. For these reasons, many organizations are using private clouds while public cloud technologies and practices mature and standardize. The private cloud offers the control over the environment needed to meet regulatory, security and data management concerns while still delivering the performance, cost and scalability benefits.

IT departments are concerned about resource utilization in the data center. A data warehouse or mart has highly variable workloads that can leave an expensive server idle for long periods. With multiple marts, the inefficiency is even more pronounced as multiple servers are underused. A private cloud offers more efficient use of these resources. With a private cloud, the data warehouse environment can be sized to the baseline workload in order to maintain much higher server utilization. When the workload increases, the elastic facilities allow the environment to maintain a constant high utilization while scaling up, and then shrink back to the baseline as the workload subsides.

³ IBM global CIO survey, 2010.


A private cloud provides the ability to consolidate multiple data marts onto a single platform and maintain utilization rates of more than 90% for all of the environments, taking advantage of the elastic capability to keep performance consistent. This provides significant cost savings from efficient use, as well as reducing license and maintenance costs by running a leaner data warehouse environment.

Figure 3 compares some of the key attributes of the traditional server-based and private cloud models. A private cloud can deliver many of the public cloud benefits for a data warehouse while avoiding some of the limitations of the public cloud.

Initial purchase time
    Traditional server model: weeks to months
    Private cloud: weeks to months

Initial install time
    Traditional server model: days to weeks after receiving hardware
    Private cloud: days to weeks after receiving hardware

Time for incremental purchase
    Traditional server model: weeks to months (including CapEx approvals)
    Private cloud: minutes to hours*

Startup costs
    Traditional server model: High: servers, storage, network, software, resources to configure the environment
    Private cloud: Moderate: servers, storage, network, software, resources to configure the environment**

Cost model
    Traditional server model: CapEx, typically TCO or ROI justification; depreciation over extended period
    Private cloud: Mix of CapEx and OpEx, monthly fee based on use***

Incremental scale costs
    Traditional server model: High
    Private cloud: Low****

Performance for BI/DW workloads
    Traditional server model: High
    Private cloud: High

Control over data location and access
    Traditional server model: High
    Private cloud: High

Figure 3: Comparison of key attributes in the deployment models.

There are some aspects of private cloud where the answer depends on the context:
* Cloud speed for incremental scale unless at a hard physical boundary where more physical hardware is needed; then there is a larger additional cost for most cloud capacity.
** Startup costs vary. A white-box hardware environment requiring full purchase and configuration is similar to the traditional model, therefore high. A vendor appliance format for a private cloud offering is lower.
*** CapEx for the initial environment; OpEx for the ongoing scale and elastic properties.
**** Low cost to scale until hitting a hardware boundary, which requires another capacity purchase.


Cloud Adoption Preferences


Due to the limitations of the public cloud, adoption for data warehouses is still low. Private cloud use for some workloads is growing faster than the same use in the public cloud, and varies by industry. Financial services is an example of an industry with many different workloads on large volumes of data. Figure 4 shows the preference of decision makers in financial services for deployment of the three core workloads.⁴
Figure 4: Stated deployment preferences for different workloads in the financial services industry. The chart shows, for transactional databases, data warehouses or data marts, and data mining, text mining, or other analytics, the share of respondents preferring a private cloud, preferring a public cloud, or preferring not to use the cloud.

The financial services industry tends to favor options that allow for more control over data because of the many security, privacy and legal requirements it faces. This industry profile is closer to mainstream IT behavior than the many examples of cloud use reported by Web startups and online businesses. The innovators and early adopters in the technology industry don't share the technical and organizational barriers that many IT organizations face.

⁴ Source of data: IBM global survey of IT and line-of-business decision makers.


Private Clouds and the Data Warehouse in Action


This section describes the experiences of several organizations moving from traditional server-based data warehousing to an Analytic Private Cloud that delivers scalability, stable performance and self-service provisioning, and permits more flexible deployment and payment options.
Creating a Consolidated Data Platform

Data warehousing was historically built around the idea of a single reporting system rather than the concept of delivering a platform for multiple uses. The primary goal of data warehousing is a combination of centralized management of important data and the capability to build systems on top of the data warehouse that can use the data. Our industry has not paid as much attention to the latter.

Organizations need more than a passive repository. They need a platform that enables multiple, different uses of data. A platform addresses data infrastructure that will support different workloads, data latencies and performance requirements. One IT executive describes the difference between the system-oriented view of the data warehouse and the view of the data warehouse as a platform as being like the difference between using a server and using the cloud. A platform allows multiple, sometimes incompatible uses. "The idea of a cloud gives us something we didn't have with the traditional model. The cloud allows us to build an enterprise data warehouse but without the constraint of enforcing a single universal data model to solve all needs," he said. This allows them to have an enterprise data warehouse, but also attach data marts to it within the same platform.

The private cloud provides an alternative to building many marts scattered across the organization. There is still a central data model, but also the ability to manage other arrangements of the data, or data that is not included in the core data model. "The [Teradata] Data Lab lets us support separate environments for different groups that are integrated and managed within a single platform," he said.

It may appear easier to give different groups their own data marts focused on their specific needs because the incremental cost of expanding a difficult-to-scale single database is too high. As one BI director says, "If you have a central data warehouse with performance problems, then adding to it is unlikely, and instead you add another mart." The challenge with this divided approach is not the cost of managing single systems, but the total cost of the BI environment. While individual marts address specific needs, they introduce redundancy and complexity. They make integrating data harder because it is stored in silos, introducing data reconciliation problems.

Most of those who consolidated multiple data marts onto a Teradata platform were less concerned about gaining better performance than they were about reducing the cost and complexity of their BI environment. "We want a central clearing house for data without the messy moving around of data to lots of databases we have today," said one IT manager of his goal. "Building a [single] information repository is where the [Teradata Analytic Private] cloud is especially valuable."


Organizations need better ways to manage and deliver data. The challenge is that the more you make data a centralized resource in a traditional server model, the more difficult it is to support, and the higher costs go. Creating multiple data silos as a way around these problems adds more complexity and cost. Private clouds offer a better way to deliver this environment.
Costs, Budgeting and Planning

Cost savings is still the biggest driver for people looking at cloud computing alternatives. It's a reaction to the rising costs from data growth and the increasing cost of performance as the data grows. The incremental growth of infrastructure is curtailed during budget-tightening periods. "Capital budgets are very competitive," said one BI manager. "When times are tougher, we get asked whether we've done everything we can to squeeze out more performance before they'll approve any new money. If we're at the absolute limit, then you can add resources."

Worry about the cost of a desired level of performance and the incremental cost of scale are symptomatic of a way of thinking: data warehouse as a product rather than as a service. The metered payment model of the cloud lessens these concerns for IT. As one BI manager described it, "You buy a thing with limited capacity and you're constrained by it, instead of buying as much or as little capacity as you need at one time." The new model for planning data warehouse infrastructure is the way we use electricity. A product-centric model means diligent capacity planning and up-front payments, while a service model means paying for what you need when you need it.

The cloud pay-for-use model allows companies to shift the mix of money from capital expenses (CapEx) to operating expenses (OpEx). Capital budgets require longer-term planning and are done annually at most firms. This is at odds with the dynamic nature of information use, where capacity and performance needs fluctuate throughout the day, month to month and over annual cycles. "For us it's about not having to commit large amounts of capital up-front for data projects that might not last," said one BI executive. "Getting CapEx is really tough. Avoiding that is the number one benefit for us," reported another BI executive. "Every time we need to upgrade our systems [during the year] we needed to purchase hardware. This took months every time. Now we don't have to go to the executive committee. We pay for what we consume."

The benefit isn't solely consumption-based pricing. Consumers can provision on demand in small increments rather than the large increments of a server-based environment. This allows resources to be committed a little at a time. With the cloud, responsibility for much of the BI platform budget is pushed to the business departments as part of their operating expenses. This has the added benefit of faster management approval for BI projects. There are no high up-front costs, no capital budget approval process, and each department funds only what it uses.

A budgeting challenge pointed out by one BI manager was the way costs are allocated for projects shared by several departments. "If the priorities for one department change and they drop out, everything changes. We can't buy what we planned, and we have to redo our justification and ROI model. This takes weeks." The flexible pricing models of Teradata's Analytic Private Cloud allow purchases like this to be easily readjusted.

A benefit of the on-demand and pay-for-use model is transparency for the data warehouse budget and better visibility into use. Most organizations allocate the total CapEx cost across departments. In this model, it's common for some departments to complain that they are being hit with more than their fair share of costs. Because pay-for-use requires metering, monitoring is better in the cloud environment, and better monitoring can prove or disprove departments' claims of unfairness. Metered billing aligns payment with consumption. "One department was paying 50% of the data warehouse budget under the old allocation model," reported one BI manager. "The problem is they used about 20% of the resources while another department paid 20% and used 40%. Both groups were surprised when we showed them the real numbers."
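A minimal sketch of the shift from fixed allocation to metered chargeback is shown below. The departments, percentages and budget figure are hypothetical, loosely echoing the example quoted above; the point is that billing on measured consumption rather than a negotiated allocation key changes who pays what.

```python
total_budget = 1_000_000.0  # assumed annual data warehouse cost to recover

# Old model: fixed allocation percentages negotiated up front.
fixed_allocation = {"dept_a": 0.50, "dept_b": 0.20, "dept_c": 0.30}

# New model: each department's share of measured resource consumption from monitoring.
measured_share = {"dept_a": 0.20, "dept_b": 0.40, "dept_c": 0.40}

for dept in fixed_allocation:
    old = total_budget * fixed_allocation[dept]
    new = total_budget * measured_share[dept]
    print(f"{dept}: allocated ${old:,.0f} -> metered ${new:,.0f}")
```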
Managing Performance

As one IT executive stated, "Performance management is everyone's biggest issue here." The challenge with performance is that data warehouse workloads are characterized by unpredictable demand. "Unexpectedly spiking workloads are the major problem," he said. The only solution in a server-based model is to size the environment for the expected peak workload and adjust the resources annually or, if lucky, a few times per year. On-demand, automatic resource provisioning and the elastic nature of the Teradata Analytic Private Cloud can help with unplanned demand.

BI applications, operational BI and management dashboards often have performance SLAs. The goal of the SLAs is to keep the user experience consistent. It's simpler to maintain consistency in a cloud environment by allowing resources to fluctuate around a specified performance level. This is even more important if one consolidates data marts onto the same platform. The marts can't have a negative impact on performance for existing users, and a mart shouldn't perform worse than it did when it was standalone. Elastic on-demand resources can be used to support this consolidation of workloads. "Scale and performance in minutes, not months" is how one DBA manager describes it.

There is a choice with elasticity: one can keep costs steady with a performance ceiling, or keep performance steady with variable costs. The only option in non-cloud environments is the former. "We see consumption jump immediately after an upgrade because of the backlog of things that could impact performance for other more important applications," he said. "By the end of the year, the warehouse is hitting a performance ceiling, and we have to delay or stop new requests. That problem goes away with the Teradata [Active Data Warehouse] cloud."

New BI workloads are a source of growing performance demands. Exploratory analysis and data discovery can require large volumes of data. Unlike reporting, they need to deliver the data interactively, in real time. Unlike dashboards, data access is highly unpredictable.


This holds true for analytics projects as well. Many analytics projects require high consumption of resources for periods of a few days. These projects use enough resources to slow everyone else down, or they are moved lower in workload priority so they don't affect other users of the warehouse. Either way, someone is hurt by the performance problem. In a private cloud the elastic resources and self-service provisioning support these new uses. Performance can be maintained at a steady level by allowing resources to fluctuate based on the current workloads in the warehouse.
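The elasticity choice described above, keeping cost steady behind a performance ceiling versus keeping performance steady at variable cost, can be sketched as two simple scaling policies. The capacity figures and demand profile below are made up for illustration; real workload management is far more nuanced.

```python
def nodes_fixed_cost(demand, capacity_per_node, budgeted_nodes):
    """Fixed-cost policy: never exceed the budgeted node count; queries queue or slow down."""
    needed = -(-demand // capacity_per_node)  # ceiling division
    return min(needed, budgeted_nodes)

def nodes_fixed_performance(demand, capacity_per_node):
    """Fixed-performance policy: scale node count with demand; spend varies instead."""
    return -(-demand // capacity_per_node)

# Hypothetical hourly demand (arbitrary query units) including an afternoon spike.
demand_by_hour = [40, 35, 50, 180, 420, 390, 120, 60]
for d in demand_by_hour:
    print(d, nodes_fixed_cost(d, 50, 4), nodes_fixed_performance(d, 50))
```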
Agility

There are several aspects to agility. Speed of deployment is a large factor in BI projects because of the long lead times associated with hardware procurement and setup. The on-demand capability of a cloud allows a team to quickly develop, test and deploy projects that would require coordination across several operations groups in a traditional environment, taking weeks or months longer. "On-demand resources mean we can deliver projects as needed," said an IT manager using Teradata Data Labs. "We can set up the environment for a project overnight. If the project wasn't valuable enough to the sponsors, we can end it and the resources are just given up with no penalty or cost."

In a central data warehouse the higher-ROI projects get done first. A side effect is the creation of a set of projects that get perpetually pushed down in priority because they don't offer enough of the right kind of benefits to justify working on them. Speed to deploy and to expand resources in a Teradata Active Data Warehouse Private Cloud removes bottlenecks that halt smaller projects.

Many business projects aren't planned more than a year in advance, when the data warehouse team plans hardware and software purchases. Departments build these unplanned projects themselves when the BI team can't deliver them. "This is what really happens to unplanned projects here. If they are important enough, they get done independently outside the data warehouse as independent data marts," reported a DBA. These data silos recreate the problems of irreconcilable metrics, integration complexity and cost that a data warehouse is supposed to remove.

On-demand provisioning and an elastic platform allow IT to deliver these projects. Provisioning delays don't exist in an on-demand cloud, allowing the BI team to deliver projects when needed. With a scalable on-demand platform like the Teradata Active Data Warehouse Private Cloud, it's also possible for departments to do self-funded projects but build them on the warehouse. "We had massive growth in consumption after moving to a more flexible platform because we were always hemmed in by capital investment in the past," said one IT executive. "Now we no longer have that artificial limit on what to do with limited resources." On a Teradata Active Data Warehouse Private Cloud the BI group can provide a platform for data marts or one-off projects that is identical to the data warehouse, simplifying the overall environment.

The result is faster project delivery, which means the ability to complete more projects and the ability to do smaller projects that were often not valuable enough to justify the months of effort to develop and deploy. Many of these small projects are also temporary in nature, or the benefit is uncertain until they've been done as a proof of concept.

These examples of organizations deploying various flavors of self-service provisioning, on-demand capacity and elastic models demonstrate key benefits. They show how the data warehouse might evolve into a data platform capable of serving a broader set of needs while simplifying the environment.


Conclusion and Recommendations


Our current assumptions about the data warehouse environment are changing. Under the existing product view, hardware and software platforms can be more expensive, take a long time to purchase, take a long time to provision once purchased, are available in large increments, require up-front payment and are assets that are depreciated over a long term. The computing-as-a-service view inverts these assumptions. The cloud view is that the platform is an inexpensive commodity service, purchased and provisioned immediately, paid for in small increments in arrears, and treated as an expense rather than a depreciable asset.

The public cloud is still evolving both technically and procedurally. It's maturing as a platform for applications and computational processing. As a platform for BI and analytic database workloads, the public cloud is still immature. Private clouds for data warehouse workloads deliver many of the benefits of public clouds without some of the problems.

Successfully adopting cloud computing for BI and analytic database workloads requires a shift in technology and methodology. The discussions with organizations using elements of a private cloud model highlight several key findings.

A private cloud is the better option today for gaining the scalability, elasticity and performance management capabilities cloud computing provides. The private cloud also offers data warehouse teams a more controlled environment than the public cloud for learning how to change development and management practices while the public cloud matures from today's early state.

An environment without pay-for-use software licensing doesn't offer the cost and deployment benefits that a private cloud should, making this an important element to evaluate. Paying for use is a major change for the organization. A key recommendation is to continue with the existing budget allocations while running the new payment model in parallel. This allows everyone to understand the OpEx budget impact, how processes must change, and how much business units actually use the data warehouse.

Good usage monitoring by the platform is key. While important for billing, ongoing monitoring provides an early warning of unexpected spikes in use. This helps with managing SLAs as well as avoiding end-of-month billing surprises for departments. Capacity planning and performance management change, becoming simpler, but don't go away. Monitoring provides the baseline for planning both expense budgets and resource needs.

There are enough differences between the private cloud and the traditional model that transition planning is vital to success. There are no cookbooks for the best way to implement a private cloud for data warehousing. One must learn how people will use the new environment, and then adjust development, deployment and administration practices to follow.

We are still in an early stage of the market. The benefits have been demonstrated by early adopters. As the cloud market matures, data warehousing will be an important beneficiary.


About the Author


Mark Madsen is a research analyst focused on analytics, business intelligence and information management. Mark is an award-winning architect and former CTO whose work has been featured in numerous industry publications. He is an international speaker and author. For more information, or to contact Mark, visit http://ThirdNature.net.

Third Nature is a research and consulting firm focused on new practices and emerging technology for business intelligence, analytics and information management. The goal of the company is to help organizations learn how to take advantage of new information-driven management practices and applications. We offer consulting, education and research services to support business and IT organizations and technology vendors.

Teradata is the world's largest company focused on analytic data solutions through integrated data warehousing, big data analytics, and business applications. Only Teradata gives organizations the advantage to transform data across the organization into actionable insights, empowering leaders to think boldly and act decisively for the best decisions possible. Visit teradata.com.

© Third Nature Inc., 2012. All rights reserved. Teradata and the Teradata logo are registered trademarks of Teradata Corporation and/or its affiliates in the U.S. and worldwide. No part of this document may be reproduced in any form or incorporated into any information retrieval system, electronic or mechanical, without the permission of the copyright owner. Inquiries regarding permission or use of material contained in this document should be addressed to: Third Nature, Inc., PO Box 1166, Rogue River, OR 97537. EB-6592 > 0412
