
Data Mining

(DM)
SPRING 2017

Lecture 1:
Introduction

Ms Ansif Arooj, University of Education, S & T, Township Campus Laho


Introduction
INSTRUCTOR, STUDENTS AND COURSE
Course Description and Web Page
This course provides a comprehensive introduction to the data mining process; builds the theoretical and conceptual foundations of key data mining tasks such as itemset mining and clustering; discusses the analysis and implementation of algorithms; and introduces major sub-areas such as text and web mining.

Course Page: https://sites.google.com/site/uespring2017/


Textbook(s)/Supplementary Readings

Data Mining: Concepts and Techniques, J. Han, M. Kamber, and J. Pei, Third Edition, Morgan Kaufmann Publishers, 2012.
Reference:
Web Data Mining, B. Liu, Springer, 2006.
Introduction to Information Retrieval, C. Manning et al., Cambridge University Press, 2008. Available online.
Introduction to Data Mining, P.-N. Tan et al., Addison-Wesley, 2009.
Tools and Technologies: Weka
Grading Policy

Instrument                 Description                                                  Weight
Class Exercises            In-class exercises and evaluation
Assignments                Assigned during important stages of the course to apply
                           and practice the learnt concepts                             20%
Project and presentation   One group project
Quizzes                    In-class (un)announced 15-minute tests
Mid-Term Exam              A single 90-minute exam from the material covered during
                           the first 6-7 weeks                                          20%
Final Exam                 Will cover the entire course. At least 75% of the material
                           will be post-midterm                                         60%

Submission Policy: Late penalty is 10% per day, for a maximum of 2 days.
Let's Start!
WHAT IS DATA MINING AND WHY DO WE NEED IT?

*Slides edited from Han and Kamber's online lecture slides


Think deeply about this world of data
What is data?
What is a database?
What is Big Data? (the 3 Vs: volume, velocity, variety)
What is a data warehouse?
What is information?
What is knowledge?
Why do we need knowledge?
Why Data Mining?
The explosive growth of data: from terabytes to petabytes
Data collection and data availability
Automated data collection tools, database systems, the Web, a computerized society
Major sources of abundant data
Business: Web, e-commerce, transactions, stocks
Science: remote sensing, bioinformatics, scientific simulation
Society and everyone: news, digital cameras, YouTube
We are drowning in data, but starving for knowledge!
"Necessity is the mother of invention": data mining is the automated analysis of massive data sets

What is Data Mining?
Definition
Data mining (knowledge discovery from data)
Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data.
Process of semi-automatically or automatically analyzing large databases to find patterns that are:
valid: hold on new data with some certainty
novel: non-obvious to the system
useful: it should be possible to act on the item
understandable: humans should be able to interpret the pattern
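As a rough illustration of the "valid" criterion, the sketch below measures how often a candidate rule holds on the data it was found in and again on new data. The baskets, the rule (bread implies milk), and the helper name rule_confidence are all hypothetical and chosen only for illustration; this is a minimal sketch, not a prescribed method.

```python
def rule_confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    matching = [t for t in transactions if antecedent <= t]
    if not matching:
        return 0.0
    return sum(1 for t in matching if consequent <= t) / len(matching)


# Hypothetical toy baskets: the data the pattern was found in ...
seen = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
# ... and new data the pattern is checked against.
new = [
    {"bread", "milk"},
    {"bread", "eggs"},
    {"bread", "butter"},
    {"bread", "milk"},
]

conf_seen = rule_confidence(seen, {"bread"}, {"milk"})
conf_new = rule_confidence(new, {"bread"}, {"milk"})
print(f"confidence on seen data: {conf_seen:.2f}")
print(f"confidence on new data:  {conf_new:.2f}")
```

A rule whose confidence drops sharply on new data would be a weak candidate for a valid pattern, however strong it looked on the original data.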
What Is Data Mining?

Alternative names
Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
Watch out: is everything "data mining"? No. The following are not:
Simple search and query processing
(Deductive) expert systems

Knowledge Discovery (KDD) Process
[Figure: the KDD process as a sequence of steps, from bottom to top]
Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation -> Knowledge
This is a view from the typical database systems and data warehousing communities.
Data mining plays an essential role in the knowledge discovery process.
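The steps of the KDD process can be sketched end to end on toy data. Everything below is an assumption made for illustration: the two source lists, the record layout, the function names, and the choice of pair counting with a minimum-support threshold as the mining step. It is a minimal sketch of the flow, not the only way to realize it.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records from two source databases.
store_a = [
    {"id": 1, "items": ["bread", "milk"]},
    {"id": 2, "items": ["bread", "milk", "eggs"]},
    {"id": 3, "items": None},  # incomplete record
]
store_b = [
    {"id": 4, "items": ["milk", "eggs"]},
    {"id": 5, "items": ["bread", "milk"]},
    {"id": 5, "items": ["bread", "milk"]},  # duplicate record
]


def clean(records):
    """Data cleaning: drop records whose item list is missing or empty."""
    return [r for r in records if r["items"]]


def integrate(*sources):
    """Data integration: merge the sources, keeping one record per id."""
    merged = {}
    for source in sources:
        for record in source:
            merged.setdefault(record["id"], record)
    return list(merged.values())


def select(records):
    """Selection: keep only the task-relevant attribute (the item sets)."""
    return [frozenset(r["items"]) for r in records]


def mine(baskets):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return counts


def evaluate(counts, n_baskets, min_support):
    """Pattern evaluation: keep pairs whose support reaches the threshold."""
    return {
        pair: count / n_baskets
        for pair, count in counts.items()
        if count / n_baskets >= min_support
    }


warehouse = integrate(clean(store_a), clean(store_b))
baskets = select(warehouse)
patterns = evaluate(mine(baskets), len(baskets), min_support=0.5)
print(patterns)
```

Each function corresponds to one box of the figure, so the data mining step is visibly only one stage among several.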
Data Mining in Business Intelligence
[Figure: layered pyramid; the potential to support business decisions increases toward the top]
Layers, from top to bottom:
Decision Making
Data Presentation: Visualization Techniques
Data Mining: Information Discovery
Data Exploration: Statistical Summary, Querying, and Reporting
Data Preprocessing/Integration, Data Warehouses
Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems
Roles, from top to bottom: End User, Business Analyst, Data Analyst, DBA
KDD Process: A Typical View from ML and Statistics
[Figure: Input Data -> Data Pre-Processing -> Data Mining -> Post-Processing -> Pattern / Information / Knowledge]
Data Pre-Processing: data integration, normalization, feature selection, dimension reduction
Data Mining: pattern discovery, association & correlation, classification, clustering, outlier analysis
Post-Processing: pattern evaluation, pattern selection, pattern interpretation, pattern visualization

This is a view from typical machine learning and statistics communities
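The three stages of this view can be sketched with one task from each column: min-max normalization for pre-processing, outlier analysis for mining, and a readable summary for post-processing. The input values, the function names, and the z-score rule with a threshold of 2 are assumptions made for illustration; other outlier criteria would fit the same flow.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Pre-processing: rescale the values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def find_outliers(values, threshold=2.0):
    """Mining: indices of values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


def summarize(raw, indices):
    """Post-processing: present the flagged records in a readable form."""
    return [f"record {i}: value {raw[i]}" for i in indices]


# Hypothetical input data, e.g. daily purchase amounts.
raw = [10, 12, 11, 13, 12, 11, 95, 10, 12, 11]

normalized = min_max_normalize(raw)
outlier_idx = find_outliers(normalized)
report = summarize(raw, outlier_idx)
print(report)
```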

