
MCA SEMESTER – IV

Subject Name: Data Analytics with R


Subject Code: 3640005

UNIT – I
Introduction to Data Analysis
Overview of Data Analytics (DA)
Analysis of data, also known as data analytics, is a process of inspecting,
cleansing, transforming, and modeling data with the goal of discovering useful
information, suggesting conclusions, and supporting decision-making.

Data analytics technologies and techniques are widely used in commercial
industries to enable organizations to make more-informed business decisions
and by scientists and researchers to verify or disprove scientific models,
theories and hypotheses.

Data analytics is the science of extracting patterns, trends, and actionable
information from large sets of data. As a term, data analytics predominantly
refers to an assortment of applications, from basic business intelligence (BI),
reporting and online analytical processing (OLAP) to various forms of advanced
analytics.

Business Intelligence (BI) is a broad category of computer software solutions
that enable a company or organization to gain insight into its critical
operations through reporting applications and analysis tools.

OLAP is an acronym for Online Analytical Processing. OLAP performs
multidimensional analysis of business data and provides the capability for
complex calculations, trend analysis, and sophisticated data modeling.
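
To make "multidimensional analysis" concrete, here is a minimal sketch of an
OLAP-style roll-up in plain Python. The sales figures, the dimension names
(region, product, quarter) and the roll_up helper are all assumptions made up
for illustration; a real OLAP engine works on far larger cubes and is not
implemented this way.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, product, quarter, revenue).
SALES = [
    ("North", "Laptop", "Q1", 1200.0),
    ("North", "Phone", "Q1", 800.0),
    ("South", "Laptop", "Q1", 950.0),
    ("North", "Laptop", "Q2", 1300.0),
    ("South", "Phone", "Q2", 700.0),
]

# Position of each dimension inside a fact tuple (illustrative layout).
DIMENSIONS = {"region": 0, "product": 1, "quarter": 2}


def roll_up(facts, dims):
    """Sum revenue grouped by the chosen dimensions."""
    totals = defaultdict(float)
    for fact in facts:
        key = tuple(fact[DIMENSIONS[d]] for d in dims)
        totals[key] += fact[3]
    return dict(totals)


by_region = roll_up(SALES, ["region"])
by_region_quarter = roll_up(SALES, ["region", "quarter"])
print(by_region)
print(by_region_quarter)
```

Grouping by fewer dimensions gives a coarser summary (revenue per region),
while adding a dimension drills down (revenue per region and quarter).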

Advanced Analytics is the autonomous or semi-autonomous examination of
data or content using sophisticated techniques and tools, typically beyond
those of traditional business intelligence (BI), to discover deeper insights, make
predictions, or generate recommendations.

Data analytics initiatives can help businesses increase revenues, improve
operational efficiency, optimize marketing campaigns and customer service
efforts, respond more quickly to emerging market trends and gain a
competitive edge over rivals, all with the ultimate goal of boosting business
performance. Depending on the particular application, the data that's analyzed
can consist of either historical records or new information that has been
processed for real-time analytics uses. In addition, it can come from a mix of
internal systems and external data sources.

Why is big data analytics important? (Need of Data Analytics)

Four types of big data analytics help businesses:

1. Prescriptive – This type of analysis reveals what actions should be taken.
This is the most valuable kind of analysis and usually results in rules and
recommendations for next steps.
2. Predictive – An analysis of likely scenarios of what might happen. The
deliverables are usually a predictive forecast.
3. Diagnostic – A look at past performance to determine what happened
and why. The result of the analysis is often an analytic dashboard.
4. Descriptive – What is happening now based on incoming data. To mine
the analytics, you typically use a real-time dashboard and/or email
reports.
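
The four types can be sketched on a toy series. The monthly sales numbers,
the function names and the reorder rule below are assumptions invented for
illustration only; the predictive step uses a plain least-squares straight
line, which is just one of many possible forecasting methods.

```python
# Hypothetical monthly unit sales; the numbers are made up for illustration.
monthly_sales = [100, 110, 120, 130, 140, 150]


def describe(series):
    """Descriptive: summarize the data as it stands."""
    return {"latest": series[-1], "mean": sum(series) / len(series)}


def diagnose(series):
    """Diagnostic: look back at month-over-month changes."""
    return [b - a for a, b in zip(series, series[1:])]


def predict_next(series):
    """Predictive: extrapolate a least-squares straight line one step ahead."""
    n = len(series)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(series) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, series)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    return intercept + slope * n


def prescribe(forecast, stock_on_hand):
    """Prescriptive: turn the forecast into a recommended action."""
    shortfall = forecast - stock_on_hand
    return f"order {shortfall:.0f} units" if shortfall > 0 else "no order needed"


print(describe(monthly_sales))
print(diagnose(monthly_sales))
forecast = predict_next(monthly_sales)
print(forecast)
print(prescribe(forecast, stock_on_hand=120))
```

Each function answers a different question about the same data: what is
happening, what changed, what is likely next, and what to do about it.
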
Big data analytics helps organizations harness their data and use it to identify
new opportunities. That, in turn, leads to smarter business moves, more
efficient operations, higher profits and happier customers.

1. Cost reduction. Big data technologies such as Hadoop and cloud-based
analytics bring significant cost advantages when it comes to storing large
amounts of data – plus they can identify more efficient ways of doing
business.

2. Faster, better decision making. With the speed of Hadoop and in-memory
analytics, combined with the ability to analyze new sources of data,
businesses are able to analyze information immediately – and make decisions
based on what they’ve learned.

3. New products and services. With the ability to gauge customer needs
and satisfaction through analytics comes the power to give customers
what they want. Davenport points out that with big data analytics, more
companies are creating new products to meet customers’ needs.
Classification of Data
Structured Data

Structured data covers all data that can be stored in an SQL database, in
tables with rows and columns. It has relational keys and can be easily mapped
into pre-designed fields. Today, it is the most widely processed kind of data
in development and the simplest way to manage information.

However, structured data represents only 5 to 10% of all data.

Semi-structured Data

Semi-structured data is information that doesn’t reside in a relational database
but that does have some organizational properties that make it easier to
analyze. With some processing it can be stored in a relational database (though
this can be very hard for some kinds of semi-structured data), but the
semi-structured form exists to save space, improve clarity or simplify
computation.

Examples of semi-structured data: CSV, XML and JSON documents are
semi-structured documents, and NoSQL databases are also considered
semi-structured.

Like structured data, semi-structured data represents only a small share of
all data (5 to 10%).
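
The "some processing" needed to move semi-structured data into relational
form can be sketched as follows. The JSON records, the column list and the
flatten and to_rows helpers are hypothetical, chosen only to show how nested
and optional fields are mapped onto fixed columns.

```python
import json

# Hypothetical customer records; note that the fields vary per record.
RAW = """
[
  {"id": 1, "name": "Asha", "contact": {"email": "asha@example.com"}},
  {"id": 2, "name": "Ravi", "contact": {"phone": "555-0100"}, "tags": ["vip"]}
]
"""

# Fixed columns of the (assumed) target table.
COLUMNS = ["id", "name", "contact.email", "contact.phone"]


def flatten(record, prefix=""):
    """Turn nested dictionaries into a flat {dotted.path: value} mapping."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


def to_rows(records, columns):
    """Map each record onto fixed columns; missing fields become None."""
    return [tuple(flatten(r).get(c) for c in columns) for r in records]


rows = to_rows(json.loads(RAW), COLUMNS)
for row in rows:
    print(row)
```

Fields that a record lacks become empty cells, and fields with no matching
column (such as "tags" here) are dropped, which is one reason the mapping can
be hard or lossy for some kinds of semi-structured data.
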
Unstructured data

Unstructured data represents around 80% of all data. It often includes text and
multimedia content. Examples include e-mail messages, word processing
documents, videos, photos, audio files, presentations, web pages and many
other kinds of business documents. Note that while these sorts of files may
have an internal structure, they are still considered "unstructured" because
the data they contain doesn’t fit neatly in a database.

Unstructured data is everywhere. In fact, most individuals and organizations
conduct their lives around unstructured data. Just as with structured data,
unstructured data is either machine generated or human generated.

Here are some examples of machine-generated unstructured data:

• Satellite images: This includes weather data or the data that the
  government captures in its satellite surveillance imagery. Just think
  about Google Earth, and you get the picture.
• Scientific data: This includes seismic imagery, atmospheric data, and
  high energy physics.
• Photographs and video: This includes security, surveillance, and traffic
  video.
• Radar or sonar data: This includes vehicular, meteorological, and
  oceanographic seismic profiles.

The following list shows a few examples of human-generated unstructured
data:

• Text internal to your company: Think of all the text within documents,
  logs, survey results, and e-mails. Enterprise information actually
  represents a large percent of the text information in the world today.
• Social media data: This data is generated from social media
  platforms such as YouTube, Facebook, Twitter, LinkedIn, and Flickr.
• Mobile data: This includes data such as text messages and location
  information.
• Website content: This comes from any site delivering unstructured
  content, like YouTube, Flickr, or Instagram.

And the list goes on.

Unstructured data is growing faster than the other types, and exploiting it
could help in business decisions.

A group called the Organization for the Advancement of Structured
Information Standards (OASIS) has published the Unstructured Information
Management Architecture (UIMA) standard. The UIMA "defines
platform-independent data representations and interfaces for software
components or services called analytics, which analyze unstructured
information and assign semantics to regions of that unstructured information."

Many industry watchers say that Hadoop has become the de facto industry
standard for managing Big Data.

Characteristics of Data

There is a lot of buzz around data these days. Businesses, big and small, have
started relying on data analytics for critical business decisions. However, not
all businesses are able to leverage the benefits of data analytics to the same
degree. Let us try to understand the reason behind this.

There are five data characteristics that are the building blocks of an efficient
data analytics solution: accuracy, completeness, consistency, uniqueness, and
timeliness. Understanding each of these helps explain why different businesses
are not able to leverage the benefits of data analytics to the same degree.

Accuracy
When insights are extracted from a well-developed and well-tested data
analytics solution, we assume that the underlying data is reliable and accurate.
However, flaws in data collection, data storage, or data retrieval will result in
unreliable data, and this will reduce the accuracy of the insights extracted by a
data analytics solution.
Completeness
The insights or information extracted by a data analytics solution depend a
great deal on the completeness of the data. Partial data or a dataset with a lot
of missing values represents an incomplete picture. Thus, the degree of
completeness of a dataset determines the accuracy of a data analytics solution.

Consistency
The consistency within a dataset is another important factor that determines
the degree of accuracy of a data analytics solution. A consistent dataset is less
prone to errors and results in better accuracy of a data analytics solution.

Uniqueness
One of the essential components of any business is high quality data. This data,
if used properly, can make a company competitive or can keep a company
competitive. Thus, the degree of uniqueness of data explains the efficiency of a
data analytics solution. In order to add value to any business, the data should
be unique and distinctive.

Timeliness
A data analytics solution that uses outdated data can keep a company from
achieving its goals or from surviving in a competitive arena. New and current
data is more valuable to a business than old, outdated data. Old data should
not be completely overlooked by a data analytics solution, but emphasis should
be placed on current data.
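
Some of these characteristics can be measured directly. The sketch below
computes simple scores for completeness, uniqueness and timeliness on a tiny
dataset. The records, field names and the 90-day freshness window are
assumptions for illustration; in particular, uniqueness is interpreted here as
the absence of duplicate records, which is one common reading and not
necessarily the only one intended above.

```python
from datetime import date

# Hypothetical customer records; None marks a missing value.
RECORDS = [
    {"id": 1, "email": "a@example.com", "updated": date(2024, 5, 1)},
    {"id": 2, "email": None, "updated": date(2024, 4, 20)},
    {"id": 3, "email": "c@example.com", "updated": date(2022, 1, 15)},
    {"id": 3, "email": "c@example.com", "updated": date(2022, 1, 15)},
]


def completeness(records, field):
    """Share of records where the field is present."""
    return sum(r[field] is not None for r in records) / len(records)


def uniqueness(records, field):
    """Share of records whose field value is distinct."""
    return len({r[field] for r in records}) / len(records)


def timeliness(records, field, today, max_age_days):
    """Share of records updated within the last max_age_days."""
    fresh = sum((today - r[field]).days <= max_age_days for r in records)
    return fresh / len(records)


today = date(2024, 5, 10)  # fixed reference date so the result is repeatable
print(completeness(RECORDS, "email"))
print(uniqueness(RECORDS, "id"))
print(timeliness(RECORDS, "updated", today, max_age_days=90))
```

A score of 1.0 means the dataset fully meets that characteristic; lower
scores point to missing values, duplicates or stale records that are likely to
reduce the accuracy of any analysis built on the data.
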
Applications of Data Analytics/ Uses of Data Science

Using data science, companies have become intelligent enough to push and sell
products according to customers' purchasing power and interests. Here’s how
they are ruling our hearts and minds:

Internet Search

When we speak of search, we think ‘Google’. Right? But there are many other
search engines like Yahoo, Bing, Ask, AOL, DuckDuckGo etc. All these search
engines (including Google) make use of data science algorithms to deliver the
best result for our searched query in a fraction of a second. Consider that
Google processes more than 20 petabytes of data every day. Had there
been no data science, Google wouldn’t have been the ‘Google’ we know today.
Digital Advertisements (Targeted Advertising and re-targeting)

If you thought search would have been the biggest application of data science
and machine learning, here is a challenger: the entire digital marketing
spectrum. From the display banners on various websites to the digital
billboards at airports, almost all of them are decided by using data
science algorithms.

This is the reason why digital ads have been able to get a much higher
click-through rate (CTR) than traditional advertisements. They can be targeted
based on a user’s past behaviour. This is why I see ads for analytics training
while my friend sees ads for apparel in the same place at the same time.

Recommender Systems

Who can forget the suggestions about similar products on Amazon? They not
only help you find relevant products among the billions available, but also
add a lot to the user experience.

A lot of companies have eagerly used this kind of system to promote their
products and suggestions in accordance with the user’s interest and the
relevance of information. Internet giants like Amazon, Twitter, Google Play,
Netflix, LinkedIn, IMDb and many more use this system to improve user
experience. The recommendations are made based on previous search results
for a user.
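
The idea of recommending from past behaviour can be sketched with a simple
co-occurrence count. The browsing histories, item names and the recommend
function below are hypothetical; the companies named above use far more
sophisticated methods, and this is not a description of any of their systems.

```python
from collections import Counter

# Hypothetical browsing histories: user -> set of items viewed.
HISTORY = {
    "u1": {"camera", "tripod", "sd_card"},
    "u2": {"camera", "sd_card"},
    "u3": {"camera", "tripod"},
    "u4": {"phone", "case"},
}


def recommend(user, history, top_n=2):
    """Suggest items that co-occur with the user's items in other histories."""
    seen = history[user]
    scores = Counter()
    for other, items in history.items():
        if other == user:
            continue
        overlap = len(seen & items)
        if overlap == 0:
            continue  # users with nothing in common contribute no evidence
        for item in items - seen:
            scores[item] += overlap
    # Sort by score (descending), then by name, so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


print(recommend("u2", HISTORY))
print(recommend("u4", HISTORY))
```

Users with similar histories vote for the items the target user has not seen
yet; a user with no overlap with anyone (u4 here) gets no suggestions, which
illustrates why such systems need enough behavioural data to work.
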
Image Recognition

You upload a photo with friends on Facebook and you start getting
suggestions to tag your friends. This automatic tag suggestion feature uses a
face recognition algorithm. Similarly, while using WhatsApp Web, you scan a QR
code in your web browser using your mobile phone. In addition, Google lets
you search for images by uploading them. It uses image
recognition and provides related search results. To learn more about image
recognition, see the video (1:31 minutes) at:

https://www.analyticsvidhya.com/blog/2015/09/applications-data-science/
Speech Recognition

Some of the best examples of speech recognition products are Google Voice,
Siri, Cortana etc. With speech recognition, even if you aren’t in a
position to type a message, your life doesn’t stop. Simply speak the
message and it will be converted to text. However, at times you will notice
that speech recognition doesn’t perform accurately. Just for a laugh, check out
the video (1:30 minutes) of the conversation between Cortana and Satya
Nadella (CEO, Microsoft).

https://www.analyticsvidhya.com/blog/2015/09/applications-data-science/

Gaming

EA Sports, Zynga, Sony, Nintendo and Activision-Blizzard have taken the gaming
experience to the next level using data science. Games are now designed using
machine learning algorithms which improve or upgrade themselves as the
player moves up to a higher level. In motion gaming too, your opponent
(the computer) analyzes your previous moves and shapes its game accordingly.
Price Comparison Websites

At a basic level, these websites are driven by lots and lots of data which
is fetched using APIs and RSS feeds. If you have ever used these websites, you
know the convenience of comparing the price of a product from
multiple vendors in one place. PriceGrabber, PriceRunner, Junglee, Shopzilla
and DealTime are some examples of price comparison websites. Nowadays, price
comparison websites can be found in almost every domain, such as technology,
hospitality, automobiles, durables, apparel etc.

Airline Route Planning

The airline industry across the world is known to bear heavy losses. Except
for a few airline service providers, companies are struggling to maintain their
occupancy ratio and operating profits. The steep rise in air fuel prices and
the need to offer heavy discounts to customers have made the situation worse.
It wasn’t long before airline companies started using data science to identify
the strategic areas of improvement. Now, using data science, the airline
companies can:

1. Predict flight delays

2. Decide which class of airplanes to buy

3. Decide whether to fly directly to the destination or make a stopover
(for example, a flight from New Delhi to New York can take a direct route
or halt in another country)

4. Effectively drive customer loyalty programs

Southwest Airlines and Alaska Airlines are among the top companies that have
embraced data science to change their way of working.

Fraud and Risk Detection


One of the first applications of data science originated in the finance
discipline. Companies were fed up with bad debts and losses every year.
However, they had a lot of data that was collected during the initial paperwork
while sanctioning loans. They decided to bring in data science practices to
rescue them from losses. Over the years, banking companies learned to divide
and conquer data via customer profiling, past expenditures and other essential
variables to analyze the probabilities of risk and default. Moreover, it also
helped them to push their banking products based on customers’ purchasing
power.
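
As a rough illustration of estimating "probabilities of risk and default" from
customer profiles, the sketch below computes the observed default rate per
customer segment. The loan records, the income bands and the 0.3 threshold
are invented for illustration; real credit-risk models use many more variables
and proper statistical modelling.

```python
from collections import defaultdict

# Hypothetical past loans: (income_band, defaulted). Data is made up.
LOANS = [
    ("low", True), ("low", False), ("low", True), ("low", False),
    ("medium", False), ("medium", False), ("medium", True), ("medium", False),
    ("high", False), ("high", False), ("high", False), ("high", False),
]


def default_rates(loans):
    """Estimate the probability of default per segment from past outcomes."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [defaults, total]
    for segment, defaulted in loans:
        counts[segment][0] += int(defaulted)
        counts[segment][1] += 1
    return {seg: d / total for seg, (d, total) in counts.items()}


def flag_risky(rates, threshold):
    """Return the segments whose estimated default rate exceeds the threshold."""
    return sorted(seg for seg, rate in rates.items() if rate > threshold)


rates = default_rates(LOANS)
print(rates)
print(flag_risky(rates, threshold=0.3))
```

Segmenting customers and comparing past outcomes per segment is the "divide
and conquer" step; the resulting rates can then inform which applications
need closer review.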

Delivery logistics

Who says data science has limited applications? Logistics companies like DHL,
FedEx, UPS and Kuehne+Nagel have used data science to improve their operational
efficiency. Using data science, these companies have discovered the best
routes to ship, the best time to deliver and the best mode of transport to
choose, leading to cost efficiency, among many other benefits. Furthermore,
the data that these companies generate from installed GPS devices
gives them many more possibilities to explore using data science.
Miscellaneous

Apart from the applications mentioned above, data science is also used in
Marketing, Finance, Human Resources, Health Care, Government Policies and
every possible industry where data gets generated. Using data science, the
marketing departments of companies decide which products are best for
up-selling and cross-selling, based on behavioral data from customers. In
addition, questions such as predicting the wallet share of a customer, which
customer is likely to churn and which customer should be pitched a high-value
product can be answered by data science. In Finance (credit risk, fraud) and
Human Resources (which employees are most likely to leave, employee
performance, employee bonuses), many other tasks are likewise accomplished
using data science.
