
Meeting the Challenge of Big Data

Learn about the opportunities for harnessing big data. Gain insight into architecture and tools. See how to grow revenues, reduce costs, and gain competitive advantage.


Table of Contents

Chapter 1 - Spotlight on Big Data
Chapter 2 - Architecture
Chapter 3 - Acquire
Chapter 4 - Organize
Chapter 5 - Analyze
Chapter 6 - Decide
Chapter 7 - Big Data Software
Chapter 8 - Engineered Systems
Resources

Chapter 1 - Spotlight on Big Data

Business and government can never have too much information for making the organization more efficient, profitable, or productive. For that reason, organizations turned to powerful data stores, including very large databases (VLDBs), to meet their information storage and retrieval needs. Due to exponential data growth in recent years, we have embraced new storage technologies, and the enterprise database now shares the spotlight with complementary technologies for storing and managing big data.
There are four key characteristics that define big data: Volume, Velocity, Variety, and Value. Volume and velocity aren't necessarily new problems for IT managers; these issues are just amplified today. The distinguishing characteristics of big data that do create new problems are the variety and the low value density of the data. Big data comes in many different formats that go beyond the traditional transactional data formats. It is also typically very low density; one single observation on its own doesn't have a lot of value. However, when this data is aggregated and analyzed, meaningful trends can be identified.
Gartner Research Vice President Merv Adrian discusses drivers for big data

Exponential Data Growth


The global data explosion is driven in part by technology such as digital video and music, smartphones, and the growth of the internet. For example, clickstream data became available for hundreds of millions of internet users after the browser became a universal client. Social networks have grown so large that the scope of data mining activity now encompasses hundreds of millions of users. Smartphones that can provide information for location-based services will soon be in the hands of 1 billion users. There is useful information to be derived from these disparate sources, such as Web server logs, data streams from instruments, real-time trading data, blogs, and social media such as Twitter and Facebook.

Online or mobile financial transactions, social media traffic, and GPS coordinates now generate over 2.5 quintillion bytes [exabytes] ... every day.1
Applications and Benefits
Today the processing of terabyte-size, and even petabyte-size, data sets
is within the budget of many organizations due to inexpensive CPU
cycles and low-cost storage. That puts many organizations in a position
to benefit from big data.

Watch Video

Big Data, Big Impact: New Possibilities for International Development,


World Economic Forum


Big data enables an organization to gain a much greater understanding of its user and customer base, its operations and supply chain, even its competitive or regulatory environment. When handled correctly, big data will have a positive impact on the top line and bottom line, enabling better services and better decisions based on improved business intelligence. Organizations can analyze big data to develop and refine sophisticated predictive analytics that can reduce costs and deliver sustainable competitive advantage.

IDC expects the Big Data technology and services market to grow to $16.9 billion in 2015 with a compound annual growth rate (CAGR) of 40 percent.2

IDC

Perhaps because of the benefits and the useful applications for big data,
industry analysts have forecast rapid growth in the market for big data
technology and services.
Developing a big data strategy is complex, with different kinds of data, new use cases, and additional software. Above all, what's the value to the business?

See More

When organizations use big data to develop a better understanding of customers and users, it generates benefits that are seen across both industry and government. The retail industry, for example, generates data sets for clickstream monitoring, consumer sentiment analysis, and making recommendations when a customer is online. In financial services, enhanced knowledge of the customer enables fraud detection and prediction, as well as analysis of spending habits to increase profitability per customer. And in both public and private healthcare, big data is expected to deliver cost reductions and efficiencies that will also result in better patient care.

Watch Video

2 IDC Worldwide Big Data Technology and Services 2012-2015 Forecast, doc #233485, March 2012.

Chapter 2 - Architecture

Big data represents a sea change in the technology we draw upon for
making decisions. Organizations will integrate and analyze data from
diverse sources, complementing enterprise databases with data from
social media, video, smart mobile devices, and other sources. The
evolution of information architectures to include big data will likely
provide the foundation for a new generation of enterprise infrastructure.
To exploit these diverse sources of data for decision-making, an
organization must develop an effective strategy for acquiring, organizing,
and analyzing big data, using it to generate new insights about the
business and make better decisions.

See More

Each step in the process of refining big data has requirements that are
best served by matching the right hardware and software to the job at
hand. Existing data warehouse infrastructure can grow to meet both
the scale of big data and the different analytics needs. But handling the
initial acquisition and organization of the new data types will require new
software, most notably Apache Hadoop.
Hadoop contains two main components: the Hadoop Distributed File
System (HDFS) for data storage, and the MapReduce programming
framework that manages the processing of the data. The Hadoop tool
suite enables organizations to organize raw (often unstructured) data and
transform it so it can be loaded into data warehouses and data marts for
integrated analysis.
Hadoop lets a cluster or grid of computers tackle big data workloads
by enabling parallel processing of large data sets. It operates primarily
with HDFS, which is fault-tolerant and can scale out to many clusters
with thousands of nodes. Hadoop MapReduce also provides capabilities
for analysis operations on massive data sets using a large number of
processors. For example, researchers at Yahoo sorted a petabyte of data
in 16.25 hours running Hadoop MapReduce on a cluster of 3,800 nodes.
Although Hadoop MapReduce is well suited to problems with key/value data sets, it's not intended for operations that require complex data or transactions.
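The map-shuffle-reduce flow described above can be illustrated in miniature. The sketch below is a single-process Python analogue of the programming model, not Hadoop itself, using the canonical word-count example: the map phase emits key/value pairs, the framework groups them by key, and the reduce phase aggregates each group.

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit (key, value) pairs -- here, (word, 1) for each word."""
    for record in records:
        for word in record.split():
            yield word.lower(), 1

def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate the values for each key -- here, summing the counts."""
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data big insight", "big data big value"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'big': 4, 'data': 2, 'insight': 1, 'value': 1}
```

On a real cluster, the map and reduce functions run in parallel on the nodes holding the data, which is what lets the same pattern scale to petabyte-size inputs.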
Hadoop is a core building block for most
big data architectures. It provides both data
acquisition and storage, and has three main
uses within organizations.
Watch Video

Chapter 3 - Acquire

Organizations need to choose the right storage technology for new data with a clear understanding of both the kind of data they plan to store and how they will use it. While there are many specialist storage technologies tuned for particular scenarios, there are two primary use cases to be aware of.

The sources of big data are numerous including both human- and
machine-generated data feeds. Acquisition of data from sources like
online activity, RFID, instrumentation, social media, clickstreams, and
trading systems is characterized by a large volume of transactions,
high velocity of data flow, and greater variety of data formats. Required
latency varies, from interactive systems that deliver a service and need
subsecond responses, to more batch-oriented systems that store data for
offline analysis later.
Gartner Research Vice President Merv
Adrian discusses big data technologies.
Watch Video

The diversity of content requires software to operate on structured and unstructured data, often in high-throughput scenarios. An effective big data solution must provide storage and processing capacity to collect, organize, and refine large volumes of data, even petabyte-size data sets.

Systems that are more batch-oriented with less stringent requirements


for response time, updates, and queries often use the Hadoop Distributed
File System (HDFS). Where time constraints are more stringent, with
applications needing subsecond query response times, or frequent updates
to existing data, some form of NoSQL database is usually required.
NoSQL emerged as companies such as Amazon, Google, LinkedIn, and Twitter struggled to deal with unprecedented data and operation volumes under tight latency constraints. Analyzing high-volume, real-time data, such as website clickstreams, can provide significant business advantage by harnessing unstructured and semistructured data sources to develop new business analysis models. Consequently, enterprises built upon a decade of research on distributed hash tables (DHTs) and utilized either conventional relational database systems or embedded key/value stores, such as Oracle's Berkeley DB, to develop highly available, distributed key-value stores.
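To make the key-value idea concrete, here is a toy Python sketch of a DHT-style store: each key is hashed to select the node that owns it, which is what lets these systems scale out. It is an illustration of the principle only, not the implementation of Berkeley DB or any other product.

```python
import hashlib

class TinyKeyValueStore:
    """Toy partitioned key-value store: keys are hashed to one of N 'nodes'
    (here just in-process dicts), the core idea behind DHT-based NoSQL stores."""

    def __init__(self, num_nodes=4):
        self.nodes = [{} for _ in range(num_nodes)]

    def _node_for(self, key):
        # Stable hash, so the same key always routes to the same node.
        digest = hashlib.md5(key.encode()).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)

store = TinyKeyValueStore()
store.put("user:1001", {"name": "Ada", "clicks": 42})
print(store.get("user:1001"))  # {'name': 'Ada', 'clicks': 42}
```

A production store layers replication, consistent hashing (so few keys move when nodes join or leave), and persistence on top of this basic partitioning scheme; that is what delivers the predictable, subsecond latency described above.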
Organizations acquire and store a variety
of structured and unstructured information.
They must understand whether their use case
requires subsecond interactive response or
comprises somewhat slower batch operations.
Watch Video


Chapter 4 - Organize

Deriving value from big data is a multiphase process that takes raw data and refines it into useful information. Data acquisition, such as taking data from streams and social media feeds, is a precursor to transforming and organizing data to derive business value. Preprocessing is used to weed out less useful data and structure what is left for analysis. Because big data comes in many shapes, sizes, and formats, this transformation is an important prerequisite to moving the data into the analytics environment.

Developers today typically write custom Java code that, in conjunction with the MapReduce programming framework, processes and transforms the data on the node where it is stored. Data movement is therefore minimized, since only the final results of preprocessing are uploaded to the data warehouse.
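The weed-out-and-structure step can be sketched as a small transform. The record format, field names, and threshold below are hypothetical, chosen only to show the shape of a preprocessing pass that discards malformed or low-value events and emits uniform rows ready for loading.

```python
import json

def preprocess(raw_lines):
    """Filter raw event lines and flatten the survivors into uniform rows
    suitable for bulk loading into a warehouse table.
    The 'user'/'page'/'duration_ms' schema is illustrative only."""
    rows = []
    for line in raw_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # discard malformed input rather than failing the batch
        if event.get("duration_ms", 0) < 100:
            continue  # weed out low-value noise events below the threshold
        rows.append((event["user"], event["page"], event["duration_ms"]))
    return rows

raw = [
    '{"user": "u1", "page": "/home", "duration_ms": 2500}',
    'not json at all',
    '{"user": "u2", "page": "/ads", "duration_ms": 20}',
]
rows_out = preprocess(raw)
print(rows_out)  # [('u1', '/home', 2500)]
```

In a MapReduce job this logic would run inside the map function on each storage node, so only the small, structured result set crosses the network to the warehouse.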

Hadoop MapReduce can preprocess and


transform data for loading into an Oracle data
warehouse.
See More

By prepping data to load into Oracle Exadata Database Machine, we set


the stage for integrated analysis with traditional enterprise data.

After we've collected big data, we need to transform and organize it as a precursor for additional refinement with analytics.
Watch Video

The refining of big data enables it to be analyzed alongside your


enterprise data. After raw data has been acquired, using data stores such
as Hadoop Distributed File System (HDFS) or a NoSQL database, it can
be preprocessed for loading into an analytics environment, such as a
data warehouse running on Oracle Exadata Database Machine. This type
of workload is often handled using Apache Hadoop.


See More


Chapter 5 - Analyze

Advanced Analytics

Organizations have long derived useful information by building mathematical models and sifting through large volumes of data. Once refined, big data expands existing models and is a potentially rich new source of insight for business intelligence applications that use the data warehouse.
Big data analysis is different. See how it
can uncover why things happen and what
kind of new analytics tools and processes
supplement what you already have.
Watch Video
The data warehouse is key to big data analysis. While data comes from many sources, new insight comes from an integrated analysis of all data together. Hence, the modern data warehouse becomes a repository for the data summaries created by Hadoop as well as for more traditional enterprise data.
New data sources are different: the data itself is often less well understood, but may also be inherently less precise or only indirectly relevant to the problem. So, to derive value from big data, we must turn to an analysis process of iteration and refinement. Each iteration can either reveal new insight or simply enable an analyst to rule out a particular line of inquiry. Big data analysis is about uncovering new relationships rather than reporting on a well-understood data set.


While traditional analysis tools are still important, advanced analytics involving both statistical analysis and data mining are required to get the most out of big data. A large user community has turned to the open source R statistical programming language, which has been evolving since 1997. Very popular among analysts and data scientists, R is also widely used in the academic world, so there's a ready pool of trained R developers coming along.
One use of statistical techniques, called predictive analytics, has gained traction across multiple industries including finance, retail, insurance, healthcare, pharmaceuticals, and telecommunications. Predictive analytics can use customer data to build and optimize predictive models. Organizations are using predictors, for example, to guide marketing campaigns and make them more effective. The surge of interest in predictive analytics has been made possible by gains in computing horsepower. With today's tools, predictive analytics can create sophisticated models and execute a variety of scenarios across large sets of data.
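As a minimal illustration of the fit-then-predict workflow behind predictive analytics, the sketch below fits an ordinary least-squares line to a handful of hypothetical marketing observations and then scores a new scenario. Real predictive analytics uses far richer models, in R or in-database, but the basic workflow is the same.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept,
    the simplest possible predictive model."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    slope = num / den
    return slope, mean_y - slope * mean_x

# Hypothetical observations: marketing spend (thousands) vs. responses
spend = [1.0, 2.0, 3.0, 4.0]
responses = [12.0, 19.0, 31.0, 38.0]

slope, intercept = fit_line(spend, responses)
predicted = slope * 5.0 + intercept  # score a scenario outside the data
print(round(predicted, 1))  # 47.5
```

Each "iteration" of analysis described above corresponds to refitting such a model on refined data and checking whether the new predictor actually improves on the old one.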

See More


Chapter 6 - Decide

When we make decisions in today's world, awash in data, we can use powerful tools to distill data and present information, making for a more intelligent decision-making process. Using automated analysis, we can make decisions that are data driven. We can turn big data into actionable insight and, with the right technology, do it in real time.

See More

Visualizations and business intelligence dashboards are a powerful assist to


decision-making, particularly when dealing with massive amounts of data.
Statistical software is a key element of analytics, business intelligence, and
decision support. The Web interface for running scripts of the R statistical
analysis language can be integrated into dashboards, providing analysis
and streaming graphics for the decision-making process.

Decisions in Real Time


The volume and velocity of big data have put new emphasis on scalability
and performance of analytics and business intelligence tools. Improvements
in server capacity, high-speed interconnects, and network bandwidth have
contributed to the emergence of a new generation of software that provides
in-memory, in-database, and real-time analytics.
In-memory databases, for example, give us the capacity for real-time
decision-making. The 64-bit addressing capability of modern systems
means we can configure servers with a terabyte (TB) of memory. That
capacity means databases, some in excess of a billion rows, can be loaded
into memory to sustain high-performance, low-latency processing, which
results in faster decision-making.
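The capacity claim above is easy to check with quick arithmetic: a terabyte of memory spread over a billion rows leaves roughly a kilobyte per row, which is ample headroom for many in-memory tables. A back-of-the-envelope sketch:

```python
TIB = 2 ** 40            # one terabyte (tebibyte) of memory, in bytes
rows = 1_000_000_000     # a billion-row table

bytes_per_row = TIB / rows
print(f"{bytes_per_row:.0f} bytes available per row")  # 1100 bytes

# Conversely, at a compact 200 bytes per row, a 1 TB server could hold:
max_rows = TIB // 200
print(f"{max_rows:,} rows")  # 5,497,558,138 rows
```

The exact row budget depends on indexes, compression, and overhead, but the arithmetic shows why billion-row in-memory working sets became practical once 64-bit servers could address a terabyte of RAM.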

Firms that adopt data-driven decision-making have output and productivity that is 5-6% higher than what would be expected.1

1 Brynjolfsson, Hitt, and Kim, "Strength in Numbers: How Does Data-Driven Decision Making Affect Firm Performance?" (April 22, 2011).


Chapter 7 - Big Data Software

Oracle offers a powerful stack of software, including new functionality specifically designed to handle the new challenges of big data. All components can run on both Oracle engineered systems and customer-integrated hardware.

Oracle Endeca Information Discovery


Oracle Endeca Information Discovery is an enterprise data discovery
platform for advanced exploration and analysis of complex and varied
data. Information is loaded from disparate source systems and stored
in a faceted data model that dynamically supports changing data. This
integrated and enriched data is made available for search, discovery and
analysis via interactive and configurable applications. Oracle Endeca's intuitive interface empowers business users to easily explore big data to determine its potential value.

Oracle NoSQL Database


Applications with diverse architectures and performance requirements also have diverse requirements for data storage and retrieval. Many big data applications require a fast, stripped-down data store that supports interactive queries and updates on a large volume of data.
Oracle NoSQL Database can quickly acquire and organize schemaless, unstructured, or semistructured data. It is an always-available, distributed key-value data store with predictable latency and fast response to queries, supporting a wide range of interactive use cases. And it has a simple programming model, making it easy to integrate into new big data applications.


Get Fast Answers to New Questions with


Information Discovery
Watch Video

Oracle Data Integration


Oracle Data Integrator provides data extraction, loading, and transformation (E-LT) for Oracle Database, Oracle Applications, and third-party application sources. Oracle GoldenGate provides high-volume, real-time transformation and loading of data into a data warehouse or data mart. Together these products work with Oracle Big Data Connectors to provide a gateway to integrating big data. The big data explosion has added to the importance of these products, because big data is not useful if it's siloed.



Oracle Big Data Connectors

Oracle has developed a suite of software for easily integrating Oracle Database with Hadoop. Oracle Big Data Connectors are available with Oracle Big Data Appliance or as individual software products. They facilitate access to the Hadoop Distributed File System (HDFS) from Oracle Database and data loading into Oracle Database from Hadoop. They also provide a native R interface to HDFS and the MapReduce framework, and enable Oracle Data Integrator to generate Hadoop MapReduce programs.

Oracle Advanced Analytics

We often see big data and analytics used in the same sentence because technology gains have enabled us to analyze increasingly large data sets. Not the least of those gains is the capacity for Oracle Database to embed analytics in the database, an architectural solution that provides scalability, performance, and security. This architecture offloads analytics work from RAM-limited computers and puts analytics processing closer to the data. This eliminates unnecessary network round trips, leverages an enterprise-class database, and lowers hardware costs.

Gartner Research Vice President Merv


Adrian discusses integrating big data
into your data center
Watch Video


Oracle Advanced Analytics turns Oracle Database into a sophisticated analytics platform ready for big data analytics. It combines the capabilities of Oracle Data Mining with Oracle R Enterprise, an enhanced version of the open source R statistical programming language. Oracle Advanced Analytics eliminates the network latency that results from marshalling data between a database and external clients doing analytics processing. This can produce a 10x to 100x improvement in performance compared to processing outside the database. Encapsulating analytics logic in the database also exploits the database's multilevel security model and enables the database to manage real-time predictive models and results.


Chapter 8 - Engineered Systems

Oracle's software stack is the foundation for a powerful line of engineered systems that will help you quickly find new insights and unlock the value in big data.

Oracle's engineered systems enable organizations to deploy big data solutions as a complement to operational systems, data warehousing, analytics, and business intelligence processing. Engineered systems are preintegrated, and therefore easier to deploy and support, and they deliver optimized performance. They can be deployed alone or alongside existing infrastructure.

Oracle Big Data Appliance 3-D Demo


View 3-D Demo

Clearly, Oracle's release of Oracle Big Data Appliance signifies a full commitment to Hadoop as a first-class citizen of the Oracle data platform. Its price, $450,000 for 216 CPU cores backed by 648TB of storage and the same InfiniBand backplane used by Oracle Exadata and Oracle's other engineered systems, is definitely competitive.1

1 "Oracle mainstreams its Hadoop platform with Cloudera [Oracle Enterprise Manager] deal," Tony Baer, Ovum, January 2012.

Oracle Big Data Appliance is a comprehensive, enterprise-ready


combination of hardware and software that makes getting started with big
data easy and fast. It is designed to run both Hadoop and Oracle NoSQL
Database for data acquisition, and to run Hadoop MapReduce algorithms
to organize the data and load it into a data warehouse for integrated
analysis.



Customers discuss the benefits of Oracle


Exalytics

Oracle has partnered with Cloudera to include the Cloudera Distribution


as part of Oracle Big Data Appliance. This ensures that customers have
access to a fully integrated and supported distribution of Hadoop, which
has tens of thousands of nodes in production, speeding deployment and
reducing ownership costs.
Oracle Exadata Database Machine represents a leading-edge combination of hardware and software that is easy to deploy, completely scalable, secure, and redundant. Innovative technologies such as Exadata Smart Scan, Exadata Smart Flash Cache, and Hybrid Columnar Compression enable Exadata to deliver extreme performance for everything from data warehousing to online transaction processing to mixed workloads. Oracle Exadata uses a massively parallel architecture and a high-speed InfiniBand network to sustain high-bandwidth links between the database servers and storage servers, and also to other engineered systems like Oracle Big Data Appliance and Oracle Exalytics.
Infosys discusses the benefits of Oracle
Big Data Appliance
Watch Video

Oracle Exadata supports deployment of massive data warehouses and


the iterative analysis needed to uncover new relationships and develop
new insight. Once this new analysis is operationalized, it becomes
available to decision-makers who can act upon it and realize the
business value.


Watch Video

Oracle Exalytics In-Memory Machine is an integrated hardware and software solution that provides in-memory analytics for rapid decision-making without breaking the budget. It can be deployed to support demand forecasting, revenue and yield management, pricing, inventory management, and a myriad of other applications. Plus, it can be linked by a high-speed InfiniBand connection to a data warehouse on Oracle Exadata, providing real-time analytics for business intelligence applications accessing large data warehouses.
Oracle Exalytics In-Memory Machine delivers speed-of-thought analysis. And this fundamentally changes how you interact with your BI software, enabling you to get more out of your data and to generate more business value.

Conclusion
To derive real business value from Big Data, you need the right tools to
capture and organize a wide variety of data types from different sources,
and to be able to easily analyze it within the context of all your enterprise
data. Oracles engineered systems and complementary software provide
an end-to-end value chain to help you unlock the value of big data.


Resources

Videos
- Gartner Research Vice President Merv Adrian discusses drivers for big data
- Gartner Research Vice President Merv Adrian discusses big data technologies
- Gartner Research Vice President Merv Adrian discusses integrating big data into your data center
- Episode 1: Developing a Big Data Strategy
- Episode 2: Understanding the Basics of Hadoop
- Episode 3: Comparing HDFS and NoSQL
- Episode 4: Transforming and Organizing Data with Hadoop
- Episode 5: Using Statistical Analysis to Generate New Insight
- Land O'Lakes: Get Fast Answers to New Questions with Information Discovery
- Infosys discusses the benefits of Oracle Big Data Appliance
- Customers discuss the benefits of Oracle Exalytics

White Papers
- Big Data for the Enterprise
- Oracle Big Data Connectors
- Oracle NoSQL Database
- Oracle Data Mining
- Oracle Information Architecture: An Architect's Guide to Big Data
- Big Data and Enterprise Data: Bridging Two Worlds with Oracle Data Integration

Datasheets
- Oracle Big Data Appliance Datasheet
- Oracle Exadata Database Machine X2-8 Datasheet
- Oracle Exadata Database Machine X2-2 Datasheet
- Oracle Exalytics In-Memory Machine X2-4 Datasheet

Reports
- McKinsey: Big Data: The Next Frontier for Innovation, Competition, and Productivity
- IDC: Oracle's All-Out Assault on the Big Data Market: Offering Hadoop, R, Cubes, and Scalable IMDB in Familiar Packages
- Forrester: Oracle Exadata Raises the Bar on Database Appliances
- World Economic Forum: Big Data, Big Impact: New Possibilities for International Development
- Ovum report on analytics
- From Overload to Impact: An Industry Scorecard on Big Data Business Challenges

Web
- www.oracle.com/bigdata
- www.oracle.com/exadata
- www.oracle.com/exalytics

Podcasts
- Oracle NoSQL Database with Dave Segleau
- Big Data Panel Discussion
- Oracle and Cloudera with Mike Olson, CEO of Cloudera
- Understanding Big Data Analysis with the R Language

Blogs
- Oracle Big Data Platform
- Oracle R Enterprise
- Oracle NoSQL Database
- Oracle Database Insider

Social Media
- Follow Oracle Database on Facebook
- Follow Oracle Database on Twitter
- Follow Oracle Database on LinkedIn
- Follow Oracle Database on Google+

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Meeting the Challenge of Big Data

oracle.com/bigdata

E-Book Popup Pages

Meeting the Challenge of Big Data


Chapter 1 - Spotlight on Big Data

Big data enables an organization to gain a much greater understanding


of their user and customer base, their operations and supply chain,
even their competitive or regulatory environment. When handled
correctly, big data will have a positive impact on the top line and bottom
line, enabling better services and better decisions based on improved
business intelligence. Organizations can analyze big data to develop and
refine sophisticated predictive analytics that can reduce costs and deliver
sustainable competitive advantage.

IDC expects the Big Data technology and


services market to grow to $16.9 billion in 2015
with a compound annual growth rate (CAGR) of
40 percent.2
IDC

Perhaps because of the benefits and the useful applications for big data,
industry analysts have forecast rapid growth in the market for big data
technology and services.
Developing a big-data strategy is complex with different kinds of data, new-use
cases, and additional software. Above all,
whats the value to the business?

See More

When organizations use big data to develop a better understanding of


customers and users it generates benefits that are seen across both
industry and government. The retail industry, for example, generates
data sets for clickstream monitoring, consumer sentiment analysis,
and making recommendations when a customer is online. In financial
services, enhanced knowledge of the customer enables fraud detection
and prediction, as well as analysis of spending habits to increase
profitability per customer. And in both public and private healthcare, big
data is expected to deliver cost reductions and efficiencies that will also
result in better patient care.

Watch Video

IDC Worldwide Big Data Technology and Services 2012-2015 Forecast, doc
#233485, March 2012

Meeting the Challenge of Big Data


Chapter 2 - Architecture
Chapter 2

Architecture

Big data represents a sea change in the technology we draw upon for
making decisions. Organizations will integrate and analyze data from
diverse sources, complementing enterprise databases with data from
social media, video, smart mobile devices, and other sources. The
evolution of information architectures to include big data will likely
provide the foundation for a new generation of enterprise infrastructure.
To exploit these diverse sources of data for decision-making, an
organization must develop an effective strategy for acquiring, organizing,
and analyzing big data, using it to generate new insights about the
business and make better decisions.

See More

Each step in the process of refining big data has requirements that are
best served by matching the right hardware and software to the job at
hand. Existing data warehouse infrastructure can grow to meet both
the scale of big data and the different analytics needs. But handling the
initial acquisition and organization of the new data types will require new
software, most notably Apache Hadoop.
Hadoop contains two main components: the Hadoop Distributed File
System (HDFS) for data storage, and the MapReduce programming
framework that manages the processing of the data. The Hadoop tool
suite enables organizations to organize raw (often unstructured) data and
transform it so it can be loaded into data warehouses and data marts for
integrated analysis.
Hadoop lets a cluster or grid of computers tackle big data workloads
by enabling parallel processing of large data sets. It operates primarily
with HDFS, which is fault-tolerant and can scale out to many clusters
with thousands of nodes. Hadoop MapReduce also provides capabilities
for analysis operations on massive data sets using a large number of
processors. For example, researchers at Yahoo sorted a petabyte of data
in 16.25 hours running Hadoop MapReduce on a cluster of 3,800 nodes.
Although Hadoop MapReduce is well suited to problems with key/value
data sets, its not intended for operations that require complex data or
transactions.
Hadoop is a core building block for most
big data architectures. It provides both data
acquisition and storage, and has three main
uses within organizations.

Chapter 4 - Organize

Developers today typically create custom-written Java code that, in
conjunction with the MapReduce programming framework, processes and
transforms the data on the node where it is stored. Overall, data movement
is therefore minimized, since only the final results of preprocessing are
uploaded to the data warehouse.

Deriving value from big data is a multiphase process that takes raw data
and refines it into useful information. Data acquisition, such as taking
data from streams and social media feeds, is a precursor to transforming
and organizing data to derive business value. Pre-processing is used to
weed out less useful data and structure what is left for analysis. Because
big data comes in many shapes, sizes, and formats, this transformation
is an important prerequisite to moving the data into the analytics
environment.
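One common approach is Hadoop Streaming, which lets a script in any language act as the mapper: Hadoop pipes raw records to the script's standard input, one per line, and reads tab-separated key/value pairs back from standard output. The sketch below uses a hypothetical comma-separated log format and field names to show the "weed out and structure" step; the cleaning logic is written as a function so it can also run outside the cluster.

```python
# Sketch of a Hadoop Streaming-style mapper (the log format and field
# names are hypothetical). Hadoop Streaming feeds raw records to the
# mapper's stdin, one per line, and collects tab-separated key/value
# pairs from stdout; in a real job this logic would wrap sys.stdin.

def clean_record(line):
    """Return 'user_id<TAB>action' for well-formed CSV lines, else None."""
    parts = line.strip().split(",")
    if len(parts) != 3:                # weed out malformed records
        return None
    user_id, action, timestamp = parts
    if not user_id or not timestamp:   # drop records missing key fields
        return None
    return f"{user_id}\t{action}"

def run_mapper(stream):
    """Emit one cleaned, structured record per well-formed input line."""
    return [cleaned for raw in stream
            if (cleaned := clean_record(raw)) is not None]

raw_lines = ["u1,click,2014-06-01", "corrupt record", "u2,view,2014-06-01"]
for out in run_mapper(raw_lines):
    print(out)
```

The malformed middle record is silently dropped, and only structured, tab-delimited output remains for loading downstream.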

Hadoop MapReduce can preprocess and transform data for loading into an
Oracle data warehouse.

By prepping data to load into Oracle Exadata Database Machine, we set
the stage for integrated analysis with traditional enterprise data.

After we've collected big data, we need to transform and organize it as a
precursor for additional refinement with analytics.

The refining of big data enables it to be analyzed alongside your
enterprise data. After raw data has been acquired, using data stores such
as Hadoop Distributed File System (HDFS) or a NoSQL database, it can
be preprocessed for loading into an analytics environment, such as a
data warehouse running on Oracle Exadata Database Machine. This type
of workload is often handled using Apache Hadoop.



Chapter 5 - Analyze

Advanced Analytics

Organizations have long derived useful information by building
mathematical models and sifting through large volumes of data. Once
refined, big data expands existing models and is a potentially rich new
source of insight for business intelligence applications that use the
data warehouse.
Big data analysis is different. See how it
can uncover why things happen and what
kind of new analytics tools and processes
supplement what you already have.
The data warehouse is key to big data analysis. While data comes from
many sources, new insight comes from an integrated analysis of all data
together. Hence, the modern data warehouse now becomes a repository
for the data summaries created by Hadoop as well as more traditional
enterprise data.
New data sources are different: the data itself is often less well
understood, but may also be inherently less precise or only indirectly
relevant to the problem. So, to derive value from big data, we must turn to
an analysis process of iteration and refinement. Each iteration can either
reveal new insight, or simply enable an analyst to rule out a particular line
of inquiry. Big data analysis is about uncovering new relationships rather
than reporting on a well-understood data set.


While traditional analysis tools are still important, advanced analytics
involving both statistical analysis and data mining are required to get the
most out of big data. A large user community has turned to the open source
R statistical programming language, which has been evolving since 1997. Very
popular among analysts and data scientists, R is also widely used in the
academic world, so there's a ready pool of trained R developers coming
along.
One use of statistical techniques, called predictive analytics, has gained
traction across multiple industries, including finance, retail, insurance,
healthcare, pharmaceuticals, and telecommunications. Predictive analytics
exploits customer data to build and optimize predictive models.
Organizations are using predictors, for example, to guide marketing
campaigns and make them more effective. The surge of interest in
predictive analytics has been made possible by gains in computing
horsepower. With today's tools, predictive analytics can create
sophisticated models and execute a variety of scenarios across large
sets of data.
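The train-then-score workflow behind predictive analytics can be illustrated in miniature. The toy sketch below, using entirely hypothetical data, fits a one-variable least-squares model to historical observations and then scores a new case; production models are far richer and often built in R, but the mechanics are the same.

```python
# Toy predictive model (illustrative only, hypothetical data): fit a
# one-variable least-squares line to historical observations, then
# score a new case. Real predictive analytics uses far richer models,
# but the workflow is the same -- train on the past, predict the new.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b

# Hypothetical history: monthly site visits vs. customer spend.
visits = [1, 2, 3, 4, 5]
spend = [10.0, 20.0, 30.0, 40.0, 50.0]

a, b = fit_line(visits, spend)
predicted = a + b * 6          # score a new customer with 6 visits
print(round(predicted, 2))     # the history is perfectly linear, so 60.0
```

A marketing team might use such a score to decide which customers receive a campaign, which is exactly the campaign-targeting use case described above.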



Chapter 6 - Decide

When we make decisions in today's world awash in data, we can use
powerful tools to distill data and present information, making for a more
intelligent decision-making process. Using automated analysis, we can
make decisions that are data driven. We can turn big data into actionable
insight and, with the right technology, do it in real time.


Visualizations and business intelligence dashboards are a powerful assist to
decision-making, particularly when dealing with massive amounts of data.
Statistical software is a key element of analytics, business intelligence, and
decision support. The Web interface for running scripts of the R statistical
analysis language can be integrated into dashboards, providing analysis
and streaming graphics for the decision-making process.

Decisions in Real Time

The volume and velocity of big data have put new emphasis on scalability
and performance of analytics and business intelligence tools. Improvements
in server capacity, high-speed interconnects, and network bandwidth have
contributed to the emergence of a new generation of software that provides
in-memory, in-database, and real-time analytics.
In-memory databases, for example, give us the capacity for real-time
decision-making. The 64-bit addressing capability of modern systems
means we can configure servers with a terabyte (TB) of memory. That
capacity means databases, some in excess of a billion rows, can be loaded
into memory to sustain high-performance, low-latency processing, which
results in faster decision-making.
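As a small illustration of the in-memory idea, Python's standard sqlite3 module can create a database that lives entirely in RAM; the table, data, and decision rule below are hypothetical, and a production in-memory database is vastly more capable, but the principle is the same: queries never touch disk, so decisions come back with low latency.

```python
import sqlite3

# Illustrative sketch (hypothetical table and threshold): an in-memory
# SQLite database stands in for the in-memory engines described above.
# The data lives entirely in RAM, so lookups avoid disk I/O.

conn = sqlite3.connect(":memory:")          # nothing touches disk
conn.execute("CREATE TABLE events (user_id TEXT, score REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("u1", 0.9), ("u2", 0.4), ("u1", 0.7)],
)

# A real-time decision might threshold an aggregate computed in memory.
row = conn.execute(
    "SELECT AVG(score) FROM events WHERE user_id = ?", ("u1",)
).fetchone()
avg_score = row[0]
decision = "offer" if avg_score > 0.5 else "hold"
print(decision)   # u1's average score is 0.8, above the 0.5 threshold
```

Because every row is already resident in memory, the aggregate-and-decide step completes in microseconds rather than waiting on storage, which is what makes per-transaction, real-time decisions practical.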

"Firms that adopt data-driven decision-making have output and productivity
that is 5-6% higher than what would be expected."¹

¹ Brynjolfsson, Hitt, and Kim, "Strength in Numbers: How Does Data-Driven
Decision Making Affect Firm Performance?" (April 22, 2011).

