
COURSE SYLLABUS

Module 1 - What is Big Data?


1. Characteristics of Big Data
2. What are the V’s of Big Data?
3. The Impact of Big Data
Module 2 - Big Data - Beyond the Hype
1. Big Data Examples
2. Sources of Big Data
3. Big Data Adoption
Module 3 - Big Data and Data Science
1. The Big Data Platform
2. Big Data and Data Science
3. Skills for Data Scientists
4. The Data Science Process
Module 4 - Big Data Use Cases
1. Big Data Exploration
2. The Enhanced 360 View of a Customer
3. Security and Intelligence
4. Operations Analysis
Module 5 - Processing Big Data
1. Ecosystems of Big Data
2. The Hadoop Framework

https://cognitiveclass.ai/
 Module 1 - Big Data - Beyond the Hype
1. Big Data Skills and Sources of Big Data
2. Big Data Adoption

 Module 2 - What is Big Data?


1. Characteristics of Big Data - The Four V's
2. Understanding Big Data with Examples

 Module 3 - The Big Data Platform


1. Key aspects of a Big Data Platform
2. Governance for Big Data

 Module 4 - Five High-Value Big Data Use Cases


1. Overview of High-Value Big Data Use Cases
2. Examples

 Module 5 - Technical Details of Big Data Components


1. Text Analytics and Streams
2. Cloud and Big Data
Day 1
1. Introduction to Big Data

 What is Big Data?


 Usage of Big Data in real-world situations

2. Data Processing Lifecycle

 Collection
 Pre-processing
 Hygiene
 Analysis
 Interpretation
 Intervention
 Visualisation
 Sources of Data
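
The lifecycle stages above can be sketched as a chain of small functions. This is only an illustration of the flow from collection through interpretation; the function names and the hard-coded data are invented for the example.

```python
# A minimal sketch of the data processing lifecycle as composable steps.
# All names and data here are illustrative, not from any library.

def collect():
    # Collection: gather raw records (hard-coded for illustration)
    return [" 23 ", "17", "", "41", "oops", " 5 "]

def preprocess(raw):
    # Pre-processing: trim whitespace, drop empty entries
    return [r.strip() for r in raw if r.strip()]

def clean(records):
    # Hygiene: keep only records that parse as integers
    return [int(r) for r in records if r.isdigit()]

def analyse(values):
    # Analysis: compute simple summary statistics
    return {"count": len(values), "mean": sum(values) / len(values)}

def interpret(summary):
    # Interpretation: turn numbers into a finding (threshold is arbitrary)
    return "high" if summary["mean"] > 20 else "low"

summary = analyse(clean(preprocess(collect())))
finding = interpret(summary)
```

Intervention and visualisation would follow from the finding; they are omitted here for brevity.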

Technical Components (Optional). The modules below will be covered at the end of the day.

Introduction to Python

 Jupyter
 Interactive computing
 Functions, arguments in Python

Introduction to Pandas

Day 2
3. Sources of Data
Data collection is expensive and time-consuming. In some cases you will be lucky enough to have
existing datasets available to support your analysis: datasets from previous analyses, access to
providers, or curated datasets from your organization. In many cases, however, you will not have
access to the data you require, and you will have to find alternative mechanisms. Twitter data is a
good example: depending on the options selected by the Twitter user, every tweet contains not just
the message that most users are aware of, but also a view of the sender's network, their home
location, the location from which the message was sent, and a number of other features that can be
very useful when studying the networks around a topic of interest.

 Network Data
 Social Context Data
 Sensor Data
 Systems Data
 Machine log data
 Structured vs. Unstructured Data
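
As the Twitter example above suggests, a single tweet carries far more than its visible text. A hedged sketch of pulling out those extra features; the field names below mimic a Twitter-style payload but are assumptions, not the exact API schema:

```python
# Illustrative tweet-like record; field names are assumptions, not the
# exact Twitter API schema.
tweet = {
    "text": "Big Data course starts today!",
    "user": {"screen_name": "alice", "location": "Berlin",
             "followers_count": 320, "friends_count": 150},
    "coordinates": {"type": "Point", "coordinates": [13.4, 52.5]},
    "created_at": "2024-03-01T09:00:00Z",
}

def extract_features(t):
    """Pull out the network and location features discussed above."""
    user = t.get("user", {})
    coords = t.get("coordinates") or {}
    return {
        "author": user.get("screen_name"),
        "home_location": user.get("location"),
        "network_size": user.get("followers_count", 0)
                        + user.get("friends_count", 0),
        "sent_from": coords.get("coordinates"),
    }

features = extract_features(tweet)
```

In a real pipeline the same extraction would run over millions of such records, which is exactly where the message text stops being the interesting part.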

4. First-Order Analysis and Exploration

 Basic Statistics
 Analyse your dataset and determine features
 Data validation
 Noise and bias
 Random errors
 Systematic errors
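
A minimal sketch of the first-order checks above using only the standard library. The dataset and the two-standard-deviation outlier rule are illustrative choices, not a prescribed method:

```python
import statistics

values = [12.1, 11.9, 12.0, 12.2, 11.8, 19.5]  # one suspicious reading

mean = statistics.mean(values)
stdev = statistics.stdev(values)

# Flag values more than 2 standard deviations from the mean as possible
# random errors (noise). A constant offset affecting every value would
# instead suggest a systematic error, which this check cannot detect.
outliers = [v for v in values if abs(v - mean) > 2 * stdev]
```

Note that a single large outlier inflates both the mean and the standard deviation, which is one reason validation usually combines several checks rather than relying on one statistic.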
5. Graph Theory
Technical Components (Optional). The modules below will be covered at the end of the day.

Introduction to NetworkX

 Adjacency Matrix
 Clustering
 Create a Graph
 Measure centrality
 Degree distribution
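
The measures listed above can be sketched without any library; NetworkX provides the same ideas ready-made (e.g. `nx.adjacency_matrix`, `nx.degree_centrality`). A plain-Python illustration on an invented four-node graph:

```python
# Undirected graph as an edge list; nodes 0..3. Purely illustrative data.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# Adjacency matrix: A[i][j] == 1 iff an edge joins nodes i and j
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

# Degree of each node (row sums), and degree centrality: the degree
# normalised by the maximum possible degree, n - 1
degree = [sum(row) for row in A]
centrality = [d / (n - 1) for d in degree]
```

Here node 2 touches three of the four nodes, so its degree centrality is 1.0; the degree list itself is the degree distribution in raw form.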

6. Second-Order Analysis


According to the SAS Institute, machine learning is a method of data analysis that automates
analytical model building. Using algorithms that iteratively learn from data, machine learning allows
computers to find hidden insights without being explicitly programmed where to look. There are two
main classes of machine learning algorithms: (i) supervised and (ii) unsupervised learning. What
exactly does learning entail? At its most basic, learning involves specifying a model structure f that
can hopefully extract regularities from the data or problem at hand, together with an appropriate
objective function to optimize, defined via a specified loss function. Learning (or fitting) the model
essentially means finding the optimal parameters of the model structure from the provided
input/target data; this is also called training the model. It is common (and best practice) to split the
provided data into at least two sets: a training set and a test set.

 Machine Learning
 Metadata
 Training data and test data
 Identifying Features
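
The paragraph above boils down to: choose a model structure, fit its parameters on training data by minimising a loss, then evaluate on held-out test data. A minimal sketch in plain Python; scikit-learn's `train_test_split` and estimator API do this properly, and the one-parameter model here is purely illustrative:

```python
import random

# Toy supervised dataset: inputs x, targets y = 2x (noise-free for clarity)
data = [(x, 2 * x) for x in range(10)]

# Best practice from the text: split into training and test sets
random.seed(0)
random.shuffle(data)
train, test = data[:7], data[7:]

# Model structure: y ~ w * x. "Learning" means choosing w to minimise
# squared loss on the training set (closed form in this 1-D case).
num = sum(x * y for x, y in train)
den = sum(x * x for x, y in train)
w = num / den

# Evaluate with the same loss function, but on the held-out test data
test_error = sum((y - w * x) ** 2 for x, y in test)
```

Because the toy targets are noise-free, the fitted weight recovers exactly 2 and the test error is zero; with real, noisy data the test error is the honest estimate of how well the model generalises.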

Technical Components (Optional). The modules below will be covered at the end of the day.

 Introduction to Scikit-learn
 Introduction to Mlxtend

Day 3
7. Rolling out Big Data projects
Hypothetical Big Data project use case: cybersecurity measures within a company in relation to
insider threats. The company hosts thousands of applications for various business functions. The
context is User Behavior Analytics. Signals include login metadata for each application, location
data, network data, employee data, performance appraisal data, travel data, and desktop activity data.
The analytics are focused on determining a risk score for each user.

Technological component or trend:

The technology component in the insider threat context requires collection and processing of the
following data:

 User Data
 Application logs
 Access data
 Business data
 Assets, CMDB
 User activity
 Network data
A layered approach to data processing is ideal, starting with the implementation of an ETL (Extract,
Transform, Load) pipeline. The data is then processed through the following layers:

 Extract, Transform, Load


 Data processing
 Normalization
 Correlations
 Risk profiling
 Data lake

The last layer is the data lake, which stores all structured and unstructured data. It can be accessed
through tools such as pandas, Hadoop, and graph databases.

The data lake will enable building algorithms that detect risky behavior and send alerts. The
objective is to prioritize the alerts based on a risk score. For example, a user who accesses a certain
application from a specific IP address, has a recent low rating on their performance appraisal, and has
booked a long holiday will be flagged as high risk.

 Project Management
 Different Phases
 Technology components
 Privacy
 System architecture

Technical Components (Optional). The modules below will be covered at the end of the day.

 K-Anonymity
 Data Coarsening
 Data suppression

Final Exam
40 Questions

Pass mark: 65%
