Big Data Unit 2

Unit 2. Big Data Analytics (Unsupervised Learning)
2.1. Big Data in R (packages: biglm, party, ff, bigmemory, bigtabulate, snow).
2.2. Big Data Analytics (Unsupervised Learning).
2.3. Clustering Machines.
2.4. Big Data Analytics in Parallel Unsupervised Learning Models.
2.5. Big Data Analytics for Calibrating Parallel Unsupervised Learning Models.
2.2. Big Data Analytics (Unsupervised Learning).
It is used to build descriptive models.
Big Data Analytics
Big Data Analytics is the process of examining large and varied
datasets, that is, Big Data, to uncover hidden patterns, unknown
correlations, market trends, customer preferences, and other
useful information that can help organizations make
better-informed business decisions.
Driven by specialized analytics systems and software, big data
analytics can point the way to various business benefits,
including new revenue opportunities, more effective marketing,
better customer service, improved operational efficiency, and
competitive advantages over rivals.
https://www.techopedia.com/definition/28659/big-data-analytics
https://www.simplilearn.com/data-science-vs-big-data-vs-data-analytics-article
6. «Big Data» - «Big Analytics» (1/1)
Complex mathematical operations are required (machine learning,
statistical learning, clustering, trend detection, …):
Exploratory analysis,
Predictive models,
Summarization models,
Symbolic models.
Understanding the data analytics project life cycle
Understanding data analytics problems
Understanding the data analytics project life cycle
1.- Identifying the problem
2.- Designing the data requirements
3.- Preprocessing the data
4.- Performing analytics over the data
5.- Visualizing the data

Stages to follow when working with Big Data
ETL (Extract, Transform, and Load)
Extract
Transform
Load
Parallel processing
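The three ETL stages listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the sample data, the column names (city, temp_f), the target table (temps), and the helper names extract, transform, and load are all hypothetical, and the parallel processing stage is not covered.

```python
import csv
import io
import sqlite3

# Hypothetical raw input, invented for illustration; in practice this
# would come from a file or an external source.
RAW_CSV = """city,temp_f
Puebla,68
Oaxaca,
Merida,95
"""

def extract(text):
    """Extract: read raw rows from CSV text."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with missing values, convert Fahrenheit to Celsius."""
    clean = []
    for row in rows:
        if not row["temp_f"]:
            continue  # skip incomplete records
        temp_c = round((float(row["temp_f"]) - 32) * 5 / 9, 1)
        clean.append((row["city"], temp_c))
    return clean

def load(records, conn):
    """Load: write the cleaned records into a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS temps (city TEXT, temp_c REAL)")
    conn.executemany("INSERT INTO temps VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory database used as the target
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT city, temp_c FROM temps ORDER BY city").fetchall()
print(loaded)
```

Keeping each stage as a separate function makes it possible to test or replace one stage without touching the others.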
Problems for understanding data analytics
Exploring web page categorization.
Computing the frequency of stock market changes.
Predicting the Blue Book sale price for bulldozers (case study).
Exploring web page categorization
CSV file.
Google Analytics data source
RGoogleAnalytics
Computing the frequency of stock market changes.
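As a rough illustration of what computing the frequency of stock market changes could look like, the sketch below counts how often a closing price rose, fell, or stayed flat from one day to the next. The interpretation of "frequency of changes", the function name change_frequency, and the price series are assumptions made for this example; the prices are invented and are not real market data.

```python
from collections import Counter

# Hypothetical daily closing prices, invented for illustration only.
closing_prices = [100.0, 102.0, 101.0, 101.0, 103.0, 102.0, 104.0]

def change_frequency(prices):
    """Count up, down, and flat moves between consecutive days."""
    counts = Counter()
    for previous, current in zip(prices, prices[1:]):
        if current > previous:
            counts["up"] += 1
        elif current < previous:
            counts["down"] += 1
        else:
            counts["flat"] += 1
    return counts

freq = change_frequency(closing_prices)
print(dict(freq))
```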
Predicting the Blue Book sale price for bulldozers (case study).
FOURTH PRACTICE: due September 27.
HADOOP AND R INTEGRATION (on-computer review of the execution
in real time):
RECOMMENDED REFERENCE MATERIAL (in the LIBROS folder inside the
MATERIAL DE BIG DATA PARA EL INTERNET DEL TODO folder):
Big.Data.Analytics.With.R.And.Hadoop(Nov.2013).
Hadoop-TheDefinitive Guide.(2015)4.edition. Tom.White.
DELIVERABLES:
Programs running correctly.
A practice report containing a Cover Page, Introduction,
Development, Conclusions, and References (APA format) that
describes all the activities carried out.
Introduction to Machine Learning
Machine learning lies at the intersection of statistics, database science,
and computer science. It is a powerful tool, capable of finding actionable
insight in large quantities of data.
Learning involves the abstraction of data into a structured
representation, and the generalization of this structure into action. In
more practical terms, a machine learner uses data containing examples
and features of the concept to be learned, and summarizes this data
in the form of a model, which is then used for predictive or descriptive
purposes.
Introduction to machine learning
Machine learning is a branch of artificial
intelligence that allows us to make our
application intelligent without being explicitly
programmed. Machine learning concepts are
used to enable applications to take a decision
from the available datasets. A combination of
machine learning and data mining can be used
to develop spam mail detectors, self-driven cars,
speech recognition, face recognition, and online
transactional fraud-activity detection.
Identification of unwanted spam messages in e-mail.
Segmentation of customer behavior for targeted advertising.
Forecasts of weather behavior and long-term climate changes.
Reduction of fraudulent credit card transactions.
Actuarial estimates of financial damage of storms and natural
disasters.
Prediction of popular election outcomes.
Development of algorithms for auto-piloting drones and self-driving
cars.
Optimization of energy use in homes and office buildings.
Projection of areas where criminal activity is most likely.
Discovery of genetic sequences linked to diseases.
A machine learning algorithm takes data and
identifies patterns that can be used for action.
In some cases, the results are so successful that
they seem to reach near-legendary status.
[Figure: input data -> actions]
Machine Learning
Definition
A machine is said to learn if it is able to
take experience and utilize it such that its
performance improves on similar
experiences in the future. This definition is
fairly exact, yet says little about how machine
learning techniques actually learn to transform
data into actionable knowledge.
Tom M. Mitchell
Due to the relative youth of machine learning as a
discipline and the speed at which it is
progressing, the associated legal issues and social
norms are often quite uncertain and constantly in
flux. Caution should be exercised when obtaining
or analyzing data in order to avoid breaking laws,
violating terms of service or data use
agreements, abusing the trust, or violating
privacy of the customers or the public.
How do machines learn?
Regardless of whether the learner is a human or a machine, the basic learning
process is similar. It can be divided into three components as follows:
Data input: It utilizes observation, memory storage, and
recall to provide a factual basis for further reasoning.
Abstraction: It involves the translation of data into
broader representations.
Generalization: It uses abstracted data to form a basis for
action.
Steps to apply machine learning
to your data
1. Collecting data: Whether the data is written on paper, recorded in text files and spreadsheets, or stored in
an SQL database, you will need to gather it in an electronic format suitable for analysis. This data will serve as
the learning material an algorithm uses to generate actionable knowledge.
2. Exploring and preparing the data: The quality of any machine learning project is based largely on the
quality of data it uses. This step in the machine learning process tends to require a great deal of human
intervention. An often cited statistic suggests that 80 percent of the effort in machine learning is devoted to
data. Much of this time is spent learning more about the data and its nuances during a practice called data
exploration.
3. Training a model on the data: By the time the data has been prepared for analysis, you are likely to have a
sense of what you are hoping to learn from the data. The specific machine learning task will inform the
selection of an appropriate algorithm, and the algorithm will represent the data in the form of a model.
4. Evaluating model performance: Because each machine learning model results in a biased solution to the
learning problem, it is important to evaluate how well the algorithm learned from its experience. Depending
on the type of model used, you might be able to evaluate the accuracy of the model using a test dataset, or
you may need to develop measures of performance specific to the intended application.
5. Improving model performance: If better performance is needed, it becomes necessary to utilize more
advanced strategies to augment the performance of the model. Sometimes, it may be necessary to switch to a
different type of model altogether. You may need to supplement your data with additional data, or perform
additional preparatory work as in step two of this process.
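The first four steps above can be traced in a small, self-contained Python sketch. Everything in it is an assumption made for illustration: the toy dataset (message length paired with a spam/ham label), the two-thirds train/test split, and the choice of a nearest-centroid classifier, which is just one simple model and not one prescribed by these slides.

```python
# Step 1, collecting data: hypothetical (message_length, label) pairs,
# invented for illustration; None marks a missing value.
raw = [(12, "ham"), (15, "ham"), (None, "ham"), (90, "spam"), (85, "spam"),
       (14, "ham"), (95, "spam"), (20, "ham"), (80, "spam"), (88, "spam")]

# Step 2, exploring and preparing the data: drop incomplete records.
data = [(x, y) for x, y in raw if x is not None]

# Hold out the last third of the records as a test set.
split = len(data) * 2 // 3
train, test = data[:split], data[split:]

# Step 3, training a model: a nearest-centroid model stores the mean
# feature value of each class.
def fit(records):
    sums, counts = {}, {}
    for x, y in records:
        sums[y] = sums.get(y, 0) + x
        counts[y] = counts.get(y, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))

# Step 4, evaluating model performance: accuracy on the held-out test set.
def accuracy(model, records):
    hits = sum(predict(model, x) == y for x, y in records)
    return hits / len(records)

model = fit(train)
test_accuracy = accuracy(model, test)
print(model, test_accuracy)
```

Step 5 would start from here: if test_accuracy were too low, one could add features, collect more data, or switch to a different type of model.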
Who is using Machine Learning
Algorithms?
Google uses machine learning in its intelligent
web search engine, for spam classification in
Google Mail, and for news labeling in Google
News; Amazon uses it for recommender
systems.
There are many open source frameworks
available for developing these types of
applications/frameworks, such as R, Python,
Apache Mahout, and Weka.
Types of machine-learning algorithms
Unsupervised machine-learning
algorithms(descriptive models).
Supervised machine-learning
algorithms(predictive models).
Recommender systems.
NOTE: If you load a dataset that does not fit into your machine's memory and try
to run a predictive analysis on it, the analysis will fail with a memory-related
error such as Error: cannot allocate vector of size 990.1 MB. The solution
is to upgrade the machine's configuration or to parallelize across commodity hardware.
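Besides the two remedies named in the note, a related way to stay within memory is to process the data in chunks so that the full dataset is never loaded at once, which is similar in spirit to the out-of-memory packages listed for section 2.1. The sketch below is an assumption-laden illustration of that idea in Python: the function name streaming_mean, the column name value, and the generated stand-in data are all hypothetical.

```python
import csv
import io

def streaming_mean(lines, column, chunk_size=1000):
    """Mean of one CSV column, holding at most chunk_size values in memory."""
    reader = csv.DictReader(lines)
    total, count, chunk = 0.0, 0, []
    for row in reader:
        chunk.append(float(row[column]))
        if len(chunk) == chunk_size:
            total += sum(chunk)   # fold the full chunk into the running totals
            count += len(chunk)
            chunk = []
    total += sum(chunk)           # fold in the leftover partial chunk
    count += len(chunk)
    return total / count if count else float("nan")

# Hypothetical stand-in for a large file; a real file object returned by
# open(...) can be passed in the same way.
fake_file = io.StringIO("value\n" + "\n".join(str(i) for i in range(1, 10001)))
mean_value = streaming_mean(fake_file, "value", chunk_size=256)
print(mean_value)
```

The same running-total pattern extends to other statistics that can be updated incrementally, such as counts, sums, minima, and maxima.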
Unsupervised Machine Learning
Unsupervised machine-learning algorithms.
Unsupervised learning
In machine learning, unsupervised learning is
used to find hidden structure within
datasets.
It differs from supervised learning in that
there is no a priori knowledge, that is, no
labels are provided.
Unsupervised vs. Supervised
● Unsupervised Learning
– Clustering: partitioning the data into groups
when no categories/classes are available.
– Requires only instances, not labels.
– Useful for understanding and summarizing the data.
● Supervised Learning
– Classification and regression.
– Requires labeled instances for
training.
Unsupervised learning
Unsupervised machine learning includes several
algorithms, some of which are the following:
• Clustering (the most popular algorithms, and
the ones we will consider).
• Artificial neural networks.
• Vector quantization.
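To make the idea of clustering concrete, here is a minimal sketch of k-means in plain Python. k-means is used only as one common clustering algorithm; the slides do not name a specific one at this point. The function name kmeans, its parameters, and the six sample points are assumptions chosen for illustration.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Basic k-means: returns (centroids, clusters) for a list of tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, clusters

# Hypothetical unlabeled 2-D points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

Note that no labels are passed in: the grouping comes only from the distances between instances, which is what distinguishes this from supervised learning.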
Complementary activity
Read chapter 1 of the book
Machine Learning with R - Second Edition, Brett Lantz,
2015 (Dropbox\BigData\MATERIAL DE BIG DATA PARA EL
INTERNET DEL TODO\UNIDAD 2_MATERIALES\LIBROS).
Review and analyze the practices carried out in R, in
order to do some exercises on Wednesday related to
the material covered in those practices.
An introduction to R.
FILE:
1_Manejo_y_Entendimiento_Datos_R
DATA HANDLING AND UNDERSTANDING IN R
2_Estadisticas_ff
3_TransDatFameRToDFff
4_1_NomArchUsaff
4_LeyeArchGigan
Applications for extracting information
Gephi
Netvizz (for Facebook)
Existing datasets
https://www.kaggle.com/c/titanic
http://github.com
https://github.com/gephi/gephi/wiki/Datasets
https://data.worldbank.org/topic/climate-change?end=2014&start=1960
