Sie sind auf Seite 1von 2

18CPT22 DATA MINING AND ANALYTICS LT PC

3 0 0 3
Preamble:
Data mining is study of algorithms for finding patterns in large data sets. It is an integral part of modern
industry, where data from its operations and customers are mined for gaining business insight. It is also
important in modern scientific endeavors. Data mining is an interdisciplinary topic involving, databases,
machine learning and algorithms. Data Analytics is the science of analyzing data to convert information to
useful knowledge. This knowledge could help us understand world better, and in many contexts enable us to
make better decisions. This course will expose to the data analytics practices executed in the business
world. The course willintroduce the key areas of analytical process and deals about how data is created,
stored, accessed, and analyzed using HADOOP, R and NoSQL.
Course Outcomes: Upon completion of the course, students will be able to:
1. Describe the basic concepts of data mining.
2. Analyze data by utilizing Classification, Cluster and implement using R
3. Leverage the insights from big data analytics
4. Analyze data by utilizing using HADOOP framework
5.
Use various NoSql alternative database models depending on application
UNIT 1 DATA MINING 9
Introduction to Data Mining – Types of data and patterns to be mined-Technologies-Targeted Applications
– Major Issues in Data Mining- Data Preprocessing
UNIT 2 DATA ANALYSIS 9
Classification: Decision Tree & Naïve Bayes - Rule Mining - Cluster Analysis, Types of Data in Cluster
Analysis, Partitioning Methods, Hierarchical Methods, Density Based Methods, Grid Based Methods,
Model Based Clustering Methods, Clustering High Dimensional Data - Predictive Analytics – Data analysis
using R.
UNIT 3 INTRODUCTION TO BIG DATA 9
Big Data – Definition, Characteristic Features – Big Data Applications - Big Data vs Traditional Data -
Risks of Big Data - Structure of Big Data - Challenges of Conventional Systems - Web Data – Evolution of
Analytic Scalability - Evolution of Analytic Processes, Tools and methods - Analysis vs Reporting -
Modern Data Analytic Tools.
UNIT 4 HADOOP FRAMEWORK 9
Distributed File Systems - Large-Scale File System Organization – HDFS concepts - MapReduce
Execution, Algorithms using Map Reduce, Matrix-Vector Multiplication – Hadoop YARN
UNIT 5 BIG DATA FRAMEWORKS 9
Introduction to NoSQL – Aggregate Data Models – Hbase: Data Model and Implementations – Hbase
Clients – Examples – .Cassandra: Data Model – Examples – Cassandra Clients – Hadoop Integration. Pig –
Grunt – Pig Data Model – Pig Latin – developing and testing Pig Latin scripts. Hive – Data Types and File
Formats – HiveQL Data Definition – HiveQL Data Manipulation – HiveQL Queries
TOTAL : 45 PERIODS
TEXT BOOKS:
1. Bill Franks, ―Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with
Advanced Analytics‖, Wiley and SAS Business Series, 2012.
2. David Loshin, "Big Data Analytics: From Strategic Planning to Enterprise Integration with Tools,
Techniques, NoSQL, and Graph", 2013.
3 Jiawei Han, Micheline Kamber , Jian Pei, “Data Mining: Concepts and Techniques”, Third Edition ,
2013
REFERENCES:
1. Michael Minelli, Michelle Chambers, and Ambiga Dhiraj, "Big Data, Big Analytics: Emerging
Business Intelligence and Analytic Trends for Today's Businesses", Wiley, 2013.
2. P. J. Sadalage and M. Fowler, "NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot
Persistence", Addison-Wesley Professional, 2012
3. Richard Cotton, "Learning R – A Step-by-step Function Guide to Data Analysis, , O‘Reilly Media,
2013
e-RESOURCES:
1. https://lintool.github.io/MapReduceAlgorithms/ed1n.html
2. saphanatutorial.com/hadoop-cluster-architecture-and-corecomponents/

Das könnte Ihnen auch gefallen