
Unlocking Big Data

Building Big with Big Data


Companies today are in the middle of a transformation that forces them to become analytics-driven in order to remain competitive. Data analysis provides complete insight into their business and gives them noteworthy advantages over their competitors. Analytics-driven insights compel businesses to act on service innovation, enhance the client experience, detect irregularities in processes and free up time for product or service marketing. To carry out analytics-driven activities, companies need to gather, analyse and store information from all possible sources. They should put appropriate tools and workflows into practice to analyse data rapidly and continuously, draw insight from the analysis results, and change their business processes and practices on that basis. This helps them become more agile than their previous processes and functions allowed.

Introduction
A few noteworthy facts:
The New York Stock Exchange produces about 1 terabyte of new trade data every day.
Facebook hosts 10 billion photos, taking up 1 petabyte of storage space.
Ancestry.com stores around 2.5 petabytes of data.
The Internet Archive currently stores approximately 2 petabytes of data and is growing at a rate of 20 terabytes per month.
The Large Hadron Collider near Geneva generates around 15 petabytes of data per year.

According to a Gartner report, big data has already become a focal point of discussion for companies. Most organizations are now concentrating on finalizing their process for investing in big data.

Appropriate usage of Big Data


IT providers should gain excellent knowledge of and skills in big data and become big data champions, so that they stay relevant in an ever-changing industry. IT vendors are not dealing with just one distinct technology or one large sector; they have to work with several technologies and serve various industries. Companies are looking to vendors for business transformation capabilities, adopting big data for:

Research & Development


In research and development, the contribution of big data is more about variety and practicality, and sometimes about volume. The main data analytics competency is the ability to envision associations and patterns among the available information and data. Enterprises should combine real-time data with clinical data, mine genetic data, and understand regional and population data. By doing this, organizations can quickly start identifying the reasons for research failures. It also helps them design more efficient trials. Companies can also achieve rapid discovery and faster approval of new innovations, which reduces expenditure as well.

Customer Behavior Analysis


Customer behavior data has changed drastically because of the internet and social media such as Facebook and Twitter. Earlier, cash registers and point-of-sale systems were the way a business was run, and these systems could not record every move a consumer made. They have been replaced by e-commerce websites, which record every move a consumer makes in the process of a purchase. Product feedback used to be collected through phone calls. Now consumers express their opinions on purchased products or services through social media, where those opinions are digitally recorded. All of this data can be analysed to help improve the product or service.

The capability to gather, interpret and take advantage of huge volumes of data from customers, social media and real-time information on product demand and supply is one of the most important aspects of business. Competitive advantage, higher sales, greater customer loyalty and product enhancement can be achieved by investing in appropriate technology to analyse important business information and data. Companies should improve their capability to store and rapidly analyse these huge volumes of data with the right tools, and obtain business insights they can act on.

Threat Management
Precise risk assessment helps companies make high-quality decisions, reduce costs and comply with regulatory guidelines. There is a huge amount of data available to analyze. Companies require a universal workflow and thought process to successfully detect and evaluate all possible threats, known or unknown, that they might encounter. Businesses should detect every threat to the organization, whether it concerns the company's brand image, a data breach or a breach of regulatory guidelines. After detecting a threat, the organization must analyze its impact on business opportunities. Big data analysis can help maintain a balance between threat and opportunity.

Business Analysis Tools


Enterprises struggle to manage the sheer amount and variety of their data and to analyse it quickly enough to obtain actionable insights. Below are a few tools that can be used for data and business analysis:

Jaspersoft BI Suite
Jaspersoft BI Suite is an open source package that creates reports from database columns. One of its most valuable features is the ability to convert SQL tables into PDF. Companies use this feature to present tables in PDF format and discuss them in meetings. The JasperReports Server provides software to pull in data from storage platforms such as:
MongoDB
Cassandra
Redis
Riak
CouchDB
Neo4j
Hadoop

Hadoop itself is a nine-year-old open source data processing platform. Cloudera started providing support for it in 2008, and MapR and Hortonworks now provide support as well. Hadoop MapReduce jobs are commonly written in Java.

Pentaho Business Analytics

Pentaho started as a report-generating engine. It is now moving into big data by making it simple to gather data from new sources. Pentaho's tool can be hooked up to NoSQL databases such as MongoDB and Cassandra. Once it is connected to a database, columns can be dragged and dropped into views, and the information is presented as if it had been taken from a SQL database.

Tableau Desktop and Server


The Tableau Desktop visualization tool lets us look at data in a unique way, then analyse and view it in different ways. Tableau aims to provide a mechanism that lets users slice and dice data repeatedly as required.

Splunk
Splunk is not precisely a report-producing tool or a collection of AI routines, although it generates reports along the way. It builds an index of the data, and this indexing is flexible. Splunk makes sense of log files because it is already tuned to particular applications.
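Splunk's own indexing engine is proprietary and is not described here. As a rough illustration of what building an index over log data means, the following sketch builds a toy inverted index in Python. It maps each token to the log lines that contain it, so a search does not have to rescan every line. The function names `build_index` and `search` and the sample log lines are hypothetical.

```python
import re
from collections import defaultdict

# Tokens are lowercase words, numbers or dotted values such as IP addresses.
TOKEN = re.compile(r"[a-z0-9_]+(?:\.[a-z0-9_]+)*")


def build_index(lines: list[str]) -> dict[str, set[int]]:
    """Map each token to the set of line numbers (0-based) that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for line_no, line in enumerate(lines):
        for token in TOKEN.findall(line.lower()):
            index[token].add(line_no)
    return dict(index)


def search(index: dict[str, set[int]], lines: list[str], *terms: str) -> list[str]:
    """Return the lines containing every search term (AND semantics)."""
    if not terms:
        return []
    hits = set.intersection(*(index.get(t.lower(), set()) for t in terms))
    return [lines[i] for i in sorted(hits)]


# Hypothetical application log lines.
logs = [
    "2015-03-01 10:00:01 INFO login ok user=alice from 10.0.0.5",
    "2015-03-01 10:00:07 WARN login failed user=bob from 10.0.0.9",
    "2015-03-01 10:01:12 ERROR disk full on node=db1",
    "2015-03-01 10:02:30 WARN login failed user=bob from 10.0.0.9",
]
index = build_index(logs)
for line in search(index, logs, "login", "failed"):
    print(line)
```

A real deployment would also need timestamp handling, field extraction and a persistent index, all of which this sketch leaves out.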
There are a few more tools, such as Karmasphere Studio and Analyst, Talend Open Studio and Skytree Server, that can be used for business and data analysis.
Organizations will approach big data with their own unique thought processes. Companies will focus on analytics and agility as they seek to take advantage of big data and IT. Conventional businesses will not be fundamentally altered, but innovative technologies will change business processes and practices in ways that help organizations become more agile.

Analyzing Unstructured Data


Information digitization, combined with a high volume of multi-channel transactions, has resulted in a data flood. The ever-growing pace of digital data has caused the world's combined data to double. According to a Gartner report, approximately 80% of the data captured by a company is unstructured. It includes data from consumer calls, emails and opinions on social platforms. In addition, huge amounts of data are generated through diagnostic information logged by various user devices. Structured data alone is so vast that analysing it demands a huge effort, and making sense of unstructured data is far more difficult.
Companies should understand structured, semi-structured and unstructured information to arrive at important business decisions. Enterprises can make the right decisions, such as gauging consumer sentiment or customizing offers, only after analysing all available data.
Going through huge amounts of data might seem a tough job, but in the end it is rewarding. In unstructured data sets, relationships and patterns can be found by detecting connections between seemingly unrelated data sources. Trends discovered through this analysis can provide useful insights for a business.

Roadmap for Analyzing Unstructured Data


Use relevant data sources
To start, it is essential to understand which data sources are significant for the analysis. Streaming video, chat, emails, voice files and web logs all count as unstructured data sources. If a source is only loosely connected to the issue, it should be set aside. Only relevant data sources should be used, so that the analysis produces relevant outcomes.

Define analytics requirement


An analysis may become useless if the end requirement is not defined. It is key to know what kind of result is expected: volume, pattern, reason, impact or something different altogether. A roadmap for how the analysis results will be used should also be laid out, so that they can be used in predictive analysis prior to segmentation and integration.

Pick technology stack for data incorporation and storage


Fresh data can be brought in from various data sources. The analysis results should be kept in a technology stack or in cloud storage so that the data is easier to retrieve for analysis. The choice of data storage system depends on various aspects such as scalability, volume and velocity needs. It is essential to pick the right technology stack for data incorporation and storage, and the project's information architecture can be set only after the final requirements have been evaluated against that stack.

Below are a few business needs and the corresponding technology stack mapping:
Real-Time: Real-time quotes are very important for e-commerce organizations, which need to track user actions in real time and present offerings based on predictive analysis results. Storm, Flume and Lambda are some of the technologies that provide this.
Accessibility: This is vital when consuming data from social media. The technology should ensure that no data is lost in the real-time stream, and a data redundancy plan should be incorporated into the project. A messaging queue such as Apache Kafka can be used to hold incoming information.
Multi-tenancy: Another important aspect is the capability to separate information and resources between user groups. Big data solutions must be capable of supporting multi-tenant scenarios. Consumer data, feedback and insights are sensitive and extremely important, so data isolation is vital to fulfil confidentiality requirements.
Security logs: HBase or Cassandra, with their flexible column families, can be used to store and process unstructured web logs or security logs.
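The Accessibility requirement above depends on not losing messages between producers and consumers. Kafka addresses this by keeping messages in a durable, offset-addressed log. Each consumer group commits the offset it has finished processing, so unacknowledged messages are delivered again after a failure. The sketch below imitates that idea in memory only, with no persistence or replication. The class and method names are hypothetical and are not the Kafka client API.

```python
from dataclasses import dataclass, field


@dataclass
class AppendOnlyLog:
    """In-memory toy of an offset-addressed message log (not the Kafka API)."""

    records: list[str] = field(default_factory=list)
    # consumer group -> offset of the next record that group still has to process
    committed: dict[str, int] = field(default_factory=dict)

    def append(self, message: str) -> int:
        """Store a message and return its offset."""
        self.records.append(message)
        return len(self.records) - 1

    def poll(self, group: str, max_records: int = 10) -> list[tuple[int, str]]:
        """Read from the group's last committed offset without advancing it."""
        start = self.committed.get(group, 0)
        batch = self.records[start:start + max_records]
        return list(enumerate(batch, start=start))

    def commit(self, group: str, next_offset: int) -> None:
        """Mark everything before next_offset as processed by the group."""
        self.committed[group] = next_offset


log = AppendOnlyLog()
for post in ["post-1", "post-2", "post-3"]:  # hypothetical social media posts
    log.append(post)

first = log.poll("sentiment", max_records=2)
# Simulate a consumer crash: nothing was committed, so the same batch comes back.
retry = log.poll("sentiment", max_records=2)
log.commit("sentiment", retry[-1][0] + 1)  # batch processed successfully
rest = log.poll("sentiment")
print(first, rest)
```

Delivery here is at-least-once, so the consumer's processing should be idempotent. Otherwise a redelivered post would be counted twice.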

Use a data lake to keep data before sending it to the data warehouse
Conventionally, a company gathered data, cleaned it and stored it. For example, if the data source was an HTML file, only the text would be extracted and stored, and all other information in the HTML file was lost once the data was stored in the data warehouse. The appeal of this earlier approach was that the data was in a clean, consumable format that could be used as required. With the arrival of big data, however, data lakes are being used to store data in its original format. The data can then be provided in that format whenever it is judged useful and necessary. This preserves all the information that might help in analysis.
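To make the HTML example concrete, the sketch below keeps the raw document untouched in a "lake" and derives the text-only warehouse view on demand with Python's standard `html.parser`. The lake is a plain dictionary standing in for object storage, which is an assumption. Because the original is preserved, a later question, such as which pages a review links to, can still be answered. The names `ingest`, `warehouse_text` and `links` are hypothetical.

```python
from html.parser import HTMLParser


class TextAndLinks(HTMLParser):
    """Collect visible text fragments and link targets from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.text_parts: list[str] = []
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href" and value)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def parse(raw_html: str) -> TextAndLinks:
    parser = TextAndLinks()
    parser.feed(raw_html)
    parser.close()
    return parser


lake: dict[str, str] = {}  # source id -> raw document, stored exactly as received


def ingest(source_id: str, raw_html: str) -> None:
    lake[source_id] = raw_html


def warehouse_text(source_id: str) -> str:
    """The conventional warehouse view: extracted text only."""
    return " ".join(parse(lake[source_id]).text_parts)


def links(source_id: str) -> list[str]:
    """Information that a text-only copy would have thrown away."""
    return parse(lake[source_id]).links


raw = '<html><body><h1>Review</h1><p>Great <a href="/p/42">phone</a>, fast delivery.</p></body></html>'
ingest("review-1", raw)
print(warehouse_text("review-1"))
print(links("review-1"))
```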

Clean the Data


It is advisable to clean a copy of the data and keep the original file in its native format. For example, a text file can contain plenty of noise that obscures important information. A good practice is to remove noise such as extra whitespace and symbols when converting casual text into a formal document. The language of the text should be identified and recorded separately, and duplicate information should be removed.
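Here is a minimal sketch of the cleaning step in Python, assuming the input is a list of short free-text records such as feedback comments. It works on a copy, so the original records stay in their native form. It normalizes Unicode, replaces symbols with spaces, collapses whitespace and drops empty or duplicate entries, comparing them case-insensitively. Language identification is left out because it would need a third-party library. The function name `clean_copy` and the sample records are hypothetical.

```python
import re
import unicodedata

# Anything that is not a word character, whitespace or basic punctuation.
SYMBOLS = re.compile(r"[^\w\s.,!?'-]")
WHITESPACE = re.compile(r"\s+")


def clean_copy(records: list[str]) -> list[str]:
    """Return a cleaned, de-duplicated copy of records; the input list is not modified."""
    cleaned: list[str] = []
    seen: set[str] = set()
    for text in records:
        t = unicodedata.normalize("NFKC", text)
        t = SYMBOLS.sub(" ", t)             # replace symbols such as #, * or stars with a space
        t = WHITESPACE.sub(" ", t).strip()  # collapse runs of whitespace
        key = t.lower()
        if t and key not in seen:           # skip empty results and duplicates
            seen.add(key)
            cleaned.append(t)
    return cleaned


raw = ["  Great   phone!!! \u2605 ", "great phone!!!", "***", "Battery   drains #fast"]
print(clean_copy(raw))
```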

Recover Valuable Data


Part-of-speech tagging, together with named-entity recognition, can be used to find general entities such as people, companies and locations, and the connections among them. These techniques belong to natural language processing and semantic analysis. A frequency matrix can also be built to reveal word trends and patterns in the text.
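Part-of-speech tagging and entity recognition usually rely on an NLP library such as spaCy or NLTK, so they are not reproduced here. The frequency matrix itself is simple to sketch. The Python below builds a document-term matrix with one row per document and one column per vocabulary word. It uses plain regular-expression tokenization, which is an assumption; a real pipeline would typically lemmatize and drop stop words first. The function name and sample reviews are hypothetical.

```python
import re
from collections import Counter


def term_frequency_matrix(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return (vocabulary, matrix) where matrix[i][j] counts vocabulary[j] in docs[i]."""
    tokenized = [re.findall(r"[a-z']+", doc.lower()) for doc in docs]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


reviews = ["battery life is great", "battery died fast", "great screen great price"]
vocabulary, matrix = term_frequency_matrix(reviews)
# Column sums give each word's overall frequency across the collection.
totals = {term: sum(row[j] for row in matrix) for j, term in enumerate(vocabulary)}
print(vocabulary)
print(matrix)
print(totals["great"], totals["battery"])
```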

Ontology Assessment
Through analysis, connections among sources and entities can be built up to create a specific structured database. This can be a time-consuming task, but the resulting insights can be significant for any business.
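One simple way to start building such connections is to count how often two entities are mentioned together and load the result into a relational table. This sketch assumes that entities have already been extracted per document. The Python below uses an in-memory SQLite database from the standard library. The entity names, table name and column names are hypothetical.

```python
import sqlite3
from collections import Counter
from itertools import combinations


def entity_links(doc_entities: list[list[str]]) -> list[tuple[str, str, int]]:
    """Count co-mentions of each entity pair across documents, as (source, target, mentions) rows."""
    pair_counts: Counter = Counter()
    for entities in doc_entities:
        # Sorting makes (a, b) and (b, a) the same pair; set() ignores repeats within a document.
        for a, b in combinations(sorted(set(entities)), 2):
            pair_counts[(a, b)] += 1
    return [(a, b, n) for (a, b), n in sorted(pair_counts.items())]


docs = [["Acme", "Dallas", "Jane"], ["Acme", "Jane"], ["Dallas", "Globex"]]
rows = entity_links(docs)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity_link (source TEXT, target TEXT, mentions INTEGER)")
conn.executemany("INSERT INTO entity_link VALUES (?, ?, ?)", rows)
strongest = conn.execute(
    "SELECT source, target, mentions FROM entity_link ORDER BY mentions DESC LIMIT 1"
).fetchone()
print(rows)
print(strongest)
```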

Data Modeling and Text Mining


Data should be classified and segmented after the database is created. This takes less time when supervised and unsupervised machine learning algorithms are used, such as:
K-means
Logistic Regression
Naïve Bayes
Support Vector Machines
These techniques can reveal similarities and differences in consumer behavior, which helps in designing campaigns. The nature of consumers can be identified through sentiment analysis of their opinions and feedback.
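To illustrate one of the listed techniques applied to sentiment analysis, here is a minimal multinomial Naïve Bayes classifier with add-one (Laplace) smoothing in pure Python. The six training reviews and the labels `pos` and `neg` are invented and far too few for real use. In practice, a library implementation such as scikit-learn's `MultinomialNB` would normally be used. In this sketch, words not seen during training are ignored at prediction time.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesSentiment":
        self.classes = sorted(set(labels))
        self.log_prior: dict[str, float] = {}
        self.word_counts: dict[str, Counter] = {}
        self.total_words: dict[str, int] = {}
        self.vocabulary: set[str] = set()
        for c in self.classes:
            docs = [tokenize(t) for t, y in zip(texts, labels) if y == c]
            self.log_prior[c] = math.log(len(docs) / len(texts))
            counts = Counter(word for doc in docs for word in doc)
            self.word_counts[c] = counts
            self.total_words[c] = sum(counts.values())
            self.vocabulary.update(counts)
        return self

    def predict(self, text: str) -> str:
        v = len(self.vocabulary)

        def log_score(c: str) -> float:
            score = self.log_prior[c]
            for word in tokenize(text):
                if word in self.vocabulary:  # unseen words carry no evidence here
                    score += math.log((self.word_counts[c][word] + 1) / (self.total_words[c] + v))
            return score

        return max(self.classes, key=log_score)


texts = [
    "great product love it", "fast delivery great service", "love the battery",
    "terrible support", "battery died quickly terrible", "hate the slow delivery",
]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]
model = NaiveBayesSentiment().fit(texts, labels)
print(model.predict("love this great phone"), model.predict("terrible slow battery"))
```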

Impact Measurement
Analysis results should be shared in tabular and graphical formats and should provide actionable insights. Information should be rendered so that it can be accessed and used on handheld devices or web-based tools, which helps end users make the most of the analysis results. ROI should be measured in terms of investment and cost, and also in terms of improvements in process efficiency and effectiveness.
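The two measures named above can be written down as simple ratios. The sketch assumes the conventional definition of ROI as net gain divided by cost. It expresses process efficiency as the fractional reduction in processing time for the same workload. The function names and the figures in the example are hypothetical.

```python
def roi(total_benefit: float, total_cost: float) -> float:
    """Return on investment: (benefit - cost) / cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost


def efficiency_gain(hours_before: float, hours_after: float) -> float:
    """Fraction of processing time saved for the same workload."""
    if hours_before <= 0:
        raise ValueError("hours_before must be positive")
    return (hours_before - hours_after) / hours_before


# Hypothetical figures: an analytics project costing 100,000 that returns 150,000,
# and a weekly report that took 40 hours before and 30 hours after.
print(f"ROI: {roi(150_000, 100_000):.0%}")           # ROI: 50%
print(f"Time saved: {efficiency_gain(40, 30):.0%}")  # Time saved: 25%
```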
The real value lies in using data analysis to gain a 360-degree view, which requires combined analysis of structured and unstructured data. Structured data can forecast consumer behavior, while unstructured data analysis can reveal the motive behind that behavior. New data sources such as social platforms are vital to companies because they offer unique information that can be analyzed. Data scientists need to equip themselves with new and appropriate skills to analyse unstructured data.

About Orchestrate
Orchestrate is a US-based business process management organisation headquartered in Dallas, USA. Orchestrate caters to the diverse outsourcing requirements of clients in an extensive range of businesses, including IT, finance, mortgage, utilities and healthcare. Orchestrate is continuously motivated to add value to its clients' businesses through efficient back-office practices and noteworthy cost savings.

1330, Capital Parkway, Carrollton, Texas 75006


sales@orchestrate.com | Toll Free: 800-232-5130

www.orchestrate.com
