
Data and Knowledge Management

Prof. Rushen Chahal

Page 1

Agenda
Information processing
Database
Data Administrator
The DBMS
Distributing data
Data warehousing and data mining

Page 2

Data
Set of discrete, objective facts about events
In business: structured records of transactions
On its own, has little relevance or purpose

Page 3

Information
Message with a sender and a receiver
Meant to change the way the receiver perceives something
Has an impact on the receiver's judgment or behavior

Page 4

Data Processing
Contextualize - why was the data gathered?
Categorize - what are its key components?
Calculate - analyze mathematically
Condense - summarize in more concise form
Page 5

Information Processing
Compare - in kind and in time
Consequences - how is it used in decisions / actions?
Connections - how does it relate to other information?
Conversation - what do other people think about this information?
Page 6

Agenda
Information processing
Database
Data Administrator
The DBMS
Distributing data
Data warehousing and data mining

Page 7

Database
Element
Types
Structure
Models
Creation
Topology

Page 8

Element
Bit, byte, field, record, file, database
Entity, attribute, key field
Relation
Class, object

Page 9

Database Types
Business database
Geographical information database
Knowledge database / deductive database
Data warehouse
Data marts
Multimedia and hypermedia database
Object-oriented database
Page 10

Database Structure
Data Definition Language (DDL)
    Schema and subschema

Data Manipulation Language (DML)
    Structured Query Language (SQL)
    Query By Example (QBE)

Data dictionary

Page 11

Database Models
Hierarchical
    One-to-many relationships
    Typical use: TPS or routine MIS

Network
    Many-to-many relationships
    Typical use: TPS or routine MIS

Relational
    Normalization
    Typical use: ad hoc reports or DSS

Object-oriented
    Typical use: e-commerce

Page 12

Database Creation
Conceptual design
    Logical view
    Entity-relationship (ER) diagram
    Normalization

Page 13

Entity Relationship Diagram


Entity: an object or concept
Relationship: a meaningful association between objects
Attribute: a property of an object
    Simple and composite
    Single-valued and multi-valued
    Derived

Key
    Primary key
    Foreign key

Page 14
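
A minimal sketch of how primary and foreign keys behave in practice, using Python's built-in sqlite3 module. The table layout and the sample rows are assumptions made for illustration; the slides only name the concepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only after this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE student (stuid TEXT PRIMARY KEY, stuname TEXT)")
conn.execute(
    "CREATE TABLE enrollment ("
    " stuid TEXT REFERENCES student(stuid),"  # foreign key to student
    " course TEXT,"
    " PRIMARY KEY (stuid, course))"           # composite primary key
)
# Hypothetical sample rows.
conn.execute("INSERT INTO student VALUES ('S101', 'Ann')")
conn.execute("INSERT INTO enrollment VALUES ('S101', 'MIS301')")


def rejected(sql):
    """Return True when the database refuses the statement on a key constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# A second student with the same primary key value.
duplicate_key = rejected("INSERT INTO student VALUES ('S101', 'Another Ann')")
# An enrollment whose foreign key points at no existing student.
orphan_row = rejected("INSERT INTO enrollment VALUES ('S999', 'MIS301')")
print(duplicate_key, orphan_row)
```

The primary key keeps each student unique; the foreign key keeps every enrollment attached to a student that exists.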

Normalization
A technique for identifying a true primary key for a relation
Types
    First normal form: no repeating groups
    Second normal form: every non-primary-key attribute is fully functionally dependent on the entire primary key
    Third normal form: no transitive dependencies (no non-key attribute depends on another non-key attribute)
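
One way to picture second normal form is to split a flat table whose primary key is (stuid, course). In the sketch below the rows and column names are made up for illustration: the student name depends only on stuid and the course title only on course, so each moves to its own relation.

```python
# Flat rows: (stuid, stuname, course, title, grade); key is (stuid, course).
# Hypothetical data; stuname and title repeat because they depend on only
# part of the key (a partial dependency).
flat = [
    ("S101", "Ann", "MIS301", "Databases", "A"),
    ("S114", "Bob", "MIS301", "Databases", "B"),
    ("S114", "Bob", "ACT201", "Accounting", "A"),
]

# Decompose so that every non-key attribute depends on its whole key.
student = {stuid: name for stuid, name, _, _, _ in flat}
course = {code: title for _, _, code, title, _ in flat}
enrollment = {(stuid, code): grade for stuid, _, code, _, grade in flat}

# Joining the three relations rebuilds the original rows, so nothing is lost.
rebuilt = sorted(
    (stuid, student[stuid], code, course[code], grade)
    for (stuid, code), grade in enrollment.items()
)
print(len(student), len(course), len(enrollment))
```

Each fact is now stored once, so renaming a course means changing a single entry rather than every enrollment row.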

Page 15

Structured Query Language


Select
Join

Page 16

SQL DML - SELECT

SELECT [DISTINCT | ALL] {* | [colexpr [AS newname]] [, ...]}
FROM table-name [alias] [, ...]
[WHERE condition]
[GROUP BY colm [, colm] [HAVING condition]]
[ORDER BY colm [, colm]]
Page 17

SQL DML - SELECT

SELECT attributes (or calculations: +, -, /, *) FROM relation
SELECT DISTINCT attributes FROM relation

Page 18

Examples
SELECT stuname FROM student;
SELECT stuid, stuname, credit FROM student;
SELECT stuid, stuname, credit+10 FROM student;
SELECT DISTINCT major FROM student;
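
These statements can be tried against an in-memory SQLite database from Python. This is a sketch under stated assumptions: the column types and the three sample rows are invented, and ORDER BY is added only so the results come back in a predictable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (stuid TEXT, stuname TEXT, major TEXT, credit INTEGER)"
)
# Hypothetical sample rows; the slides give only the column names.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [("S101", "Ann", "MIS", 30), ("S114", "Bob", "ACT", 45), ("S120", "Cho", "MIS", 60)],
)

# Project a single attribute.
names = conn.execute("SELECT stuname FROM student ORDER BY stuid").fetchall()
# A calculation in the select list.
bumped = conn.execute(
    "SELECT stuid, credit + 10 FROM student ORDER BY stuid"
).fetchall()
# DISTINCT collapses the two MIS rows into one.
majors = conn.execute("SELECT DISTINCT major FROM student ORDER BY major").fetchall()
print(names, bumped, majors)
```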
Page 19

SQL DML - SELECT

SELECT attributes (or * wild card) FROM relation WHERE condition

Page 20

Examples
SELECT * FROM student;
SELECT stuname, major, credit FROM student WHERE stuid = 'S114';
SELECT * FROM faculty WHERE dept = 'MIS';

Page 21

SELECT - WHERE condition


AND, OR, NOT
IN, NOT IN
BETWEEN
IS NULL, IS NOT NULL
LIKE with '%': matches any run of zero or more characters
LIKE with '_': matches exactly one character
Page 22

Examples
SELECT * FROM faculty WHERE dept = 'MIS' AND rank = 'full professor';
SELECT * FROM faculty WHERE dept = 'MIS' OR rank = 'full professor';
SELECT * FROM faculty WHERE dept = 'MIS' AND NOT rank = 'full professor';
Page 23

SELECT * FROM class WHERE room LIKE 'B_S%';
SELECT * FROM class WHERE room NOT LIKE 'BUS%';
SELECT productid, productname FROM inventory WHERE onhand BETWEEN 50 AND 100;
Page 24

SELECT companyid, companyname FROM company WHERE companyname BETWEEN 'G' AND 'K';
SELECT productid, productname FROM inventory WHERE onhand NOT BETWEEN 50 AND 100;
SELECT companyid, companyname FROM company WHERE companyname NOT BETWEEN 'G' AND 'K';

Page 25

SELECT facname FROM faculty WHERE dept IN ('MIS', 'ACT');
SELECT facname FROM faculty WHERE rank NOT IN ('assistant', 'lecturer');
SELECT customername FROM customer WHERE emailadd IS NOT NULL;

Page 26

SELECT customername FROM customer WHERE creditlimit IS NULL;
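
A short sketch of how the WHERE operators behave, run through Python's sqlite3 module. The tables are cut down to the columns each query needs and all rows are hypothetical. Note that BETWEEN includes both endpoints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and rows, reduced to what the queries touch.
conn.executescript("""
CREATE TABLE class (room TEXT);
INSERT INTO class VALUES ('BUS101'), ('BAS220'), ('HUM300');
CREATE TABLE inventory (productname TEXT, onhand INTEGER);
INSERT INTO inventory VALUES ('bolt', 50), ('nut', 75), ('washer', 120);
CREATE TABLE customer (customername TEXT, emailadd TEXT);
INSERT INTO customer VALUES ('Lee', 'lee@example.com'), ('Kim', NULL);
""")


def column(sql):
    """Run a one-column query and return its values as a plain list."""
    return [row[0] for row in conn.execute(sql)]


# '_' stands for exactly one character, '%' for any run of characters.
like_rooms = column("SELECT room FROM class WHERE room LIKE 'B_S%' ORDER BY room")
not_bus = column("SELECT room FROM class WHERE room NOT LIKE 'BUS%' ORDER BY room")
# BETWEEN is inclusive, so onhand = 50 qualifies.
in_range = column(
    "SELECT productname FROM inventory WHERE onhand BETWEEN 50 AND 100 "
    "ORDER BY productname"
)
# NULL is tested with IS NULL / IS NOT NULL, never with '='.
no_email = column("SELECT customername FROM customer WHERE emailadd IS NULL")
has_email = column("SELECT customername FROM customer WHERE emailadd IS NOT NULL")
print(like_rooms, not_bus, in_range, no_email, has_email)
```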

Page 27

SELECT - aggregate functions


COUNT(*): counts rows
COUNT(column): counts non-null values in the column
SUM
AVG
MIN
MAX

Page 28

Examples
SELECT COUNT(*) FROM student;
SELECT COUNT(major) FROM student;
SELECT COUNT(DISTINCT major) FROM student;

Page 29

SELECT COUNT(stuid), SUM(credit), AVG(credit), MAX(credit), MIN(credit) FROM student;
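
The three COUNT forms differ only when a column holds nulls or duplicates. The sketch below makes that visible with four invented student rows, one of which has no major.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (stuid TEXT, major TEXT, credit INTEGER)")
# Hypothetical rows; S130 has no declared major (NULL).
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("S101", "MIS", 30), ("S114", "ACT", 45), ("S120", "MIS", 60), ("S130", None, 15)],
)

# COUNT(*) counts rows, COUNT(major) skips NULLs,
# COUNT(DISTINCT major) also collapses duplicates.
counts = conn.execute(
    "SELECT COUNT(*), COUNT(major), COUNT(DISTINCT major) FROM student"
).fetchone()

summary = conn.execute(
    "SELECT COUNT(stuid), SUM(credit), AVG(credit), MAX(credit), MIN(credit) "
    "FROM student"
).fetchone()
print(counts, summary)
```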

Page 30

SELECT - GROUP

GROUP BY: forms groups of rows that share a value
HAVING: keeps only the groups that meet a condition

Page 31

Examples

SELECT major, AVG(credit) FROM student GROUP BY major HAVING COUNT(*) > 2;
SELECT course#, COUNT(stuid) FROM enrollment GROUP BY course# HAVING COUNT(*) > 2;
Page 32

SELECT major, AVG(credit) FROM student WHERE major IN ('MIS', 'ACT') GROUP BY major HAVING COUNT(*) > 2;
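
WHERE filters rows before grouping, while HAVING filters the groups afterwards. The sketch below shows both stages on eight invented rows: three MIS, two ACT, and three FIN students.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (major TEXT, credit INTEGER)")
# Hypothetical rows: MIS x3, ACT x2, FIN x3.
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [("MIS", 30), ("MIS", 60), ("MIS", 90),
     ("ACT", 45), ("ACT", 15),
     ("FIN", 10), ("FIN", 20), ("FIN", 30)],
)

# HAVING alone: ACT is dropped because its group has only two rows.
big_groups = conn.execute(
    "SELECT major, AVG(credit) FROM student "
    "GROUP BY major HAVING COUNT(*) > 2 ORDER BY major"
).fetchall()

# WHERE first removes the FIN rows, then HAVING removes the small ACT group.
filtered = conn.execute(
    "SELECT major, AVG(credit) FROM student WHERE major IN ('MIS', 'ACT') "
    "GROUP BY major HAVING COUNT(*) > 2 ORDER BY major"
).fetchall()
print(big_groups, filtered)
```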

Page 33

SELECT - ORDER BY

ORDER BY: ascending order (the default)
ORDER BY ... DESC: descending order

Page 34

Examples
SELECT facname, rank FROM faculty ORDER BY facname;
SELECT facname, rank FROM faculty ORDER BY rank DESC, facname;
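
A small sketch of a two-level sort, with three invented faculty rows. The column name rank is written in double quotes as a precaution, since some SQL dialects treat it as a keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE faculty (facname TEXT, "rank" TEXT)')
# Hypothetical rows.
conn.executemany(
    "INSERT INTO faculty VALUES (?, ?)",
    [("Clark", "full professor"), ("Baker", "assistant"), ("Adams", "full professor")],
)

by_name = conn.execute(
    'SELECT facname, "rank" FROM faculty ORDER BY facname'
).fetchall()
# Rank descending (text comparison), then name ascending within each rank.
by_rank = conn.execute(
    'SELECT facname, "rank" FROM faculty ORDER BY "rank" DESC, facname'
).fetchall()
print(by_name)
print(by_rank)
```

Because rank is stored as text here, DESC sorts it alphabetically in reverse, not by seniority.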

Page 35

SELECT - JOIN Tables


Multiple tables in the FROM clause must have join conditions. Without them, the query returns every combination of rows (a Cartesian product).

Page 36

Examples

SELECT stuname, grade FROM student, enrollment WHERE student.stuid = enrollment.stuid;

Page 37

SELECT enrollment.course#, stuname, major FROM class, enrollment, student WHERE class.course# = enrollment.course# AND enrollment.stuid = student.stuid AND facid = 'F114' ORDER BY enrollment.course#;
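
The effect of the join conditions can be checked with a sketch like the following, where all rows are invented. SQLite does not accept # in a bare identifier, so course# is written in double quotes here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows; "course#" is quoted for SQLite's sake.
conn.executescript('''
CREATE TABLE student (stuid TEXT, stuname TEXT, major TEXT);
CREATE TABLE class ("course#" TEXT, facid TEXT);
CREATE TABLE enrollment (stuid TEXT, "course#" TEXT, grade TEXT);
INSERT INTO student VALUES ('S101', 'Ann', 'MIS'), ('S114', 'Bob', 'ACT');
INSERT INTO class VALUES ('MIS301', 'F114'), ('ACT201', 'F200');
INSERT INTO enrollment VALUES
    ('S101', 'MIS301', 'A'), ('S114', 'MIS301', 'B'), ('S114', 'ACT201', 'A');
''')

# With the join condition: one row per actual enrollment.
joined = conn.execute(
    "SELECT stuname, grade FROM student, enrollment "
    "WHERE student.stuid = enrollment.stuid ORDER BY stuname, grade"
).fetchall()

# Without it: every student paired with every enrollment (2 x 3 rows).
cartesian = conn.execute("SELECT COUNT(*) FROM student, enrollment").fetchone()[0]

# Three tables need two join conditions, plus the filter on facid.
roster = conn.execute(
    'SELECT enrollment."course#", stuname, major '
    "FROM class, enrollment, student "
    'WHERE class."course#" = enrollment."course#" '
    "AND enrollment.stuid = student.stuid AND facid = 'F114' "
    'ORDER BY enrollment."course#", stuname'
).fetchall()
print(joined, cartesian, roster)
```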
Page 38

SUBQUERY, EXISTS, NOT EXISTS

SELECT s.stuname, major FROM student s WHERE EXISTS (SELECT * FROM enrollment e WHERE s.stuid = e.stuid);

Page 39

SELECT s.stuname, major FROM student s WHERE NOT EXISTS (SELECT * FROM enrollment e WHERE s.stuid = e.stuid);
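
EXISTS keeps a student when the correlated subquery finds at least one matching enrollment; NOT EXISTS keeps the rest. A sketch with three invented students, one of whom has no enrollments:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows; Cho (S120) is not enrolled in anything.
conn.executescript("""
CREATE TABLE student (stuid TEXT, stuname TEXT, major TEXT);
CREATE TABLE enrollment (stuid TEXT, course TEXT);
INSERT INTO student VALUES
    ('S101', 'Ann', 'MIS'), ('S114', 'Bob', 'ACT'), ('S120', 'Cho', 'MIS');
INSERT INTO enrollment VALUES
    ('S101', 'MIS301'), ('S114', 'MIS301'), ('S114', 'ACT201');
""")

enrolled = conn.execute(
    "SELECT s.stuname, major FROM student s "
    "WHERE EXISTS (SELECT * FROM enrollment e WHERE s.stuid = e.stuid) "
    "ORDER BY s.stuname"
).fetchall()

not_enrolled = conn.execute(
    "SELECT s.stuname, major FROM student s "
    "WHERE NOT EXISTS (SELECT * FROM enrollment e WHERE s.stuid = e.stuid) "
    "ORDER BY s.stuname"
).fetchall()
print(enrolled, not_enrolled)
```

Bob appears once in the first result even though he has two enrollments, because EXISTS only asks whether a match is present.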

Page 40

Database Creation
Physical design
    Physical view
    Data topology (organization)
        Centralized
        Distributed database
            Replicated database
            Partitioned database

    Organization and access method
        Sequential file
        Indexed sequential file
        Direct or random file

    Security
        Logical, physical, and transmission

Page 41

Selection Criteria
User needs (type of application)
Compatibility
Portability
Reliability
Cost
Features
Performance
Vendor support
Others?
Page 42

Agenda
Information processing
Database
Data Administrator
The DBMS
Distributing data
Data warehousing and data mining

Page 43

Data Administrator
Clean up data definitions
Control shared data
Manage distributed data
Maintain data quality

Page 44

Clean Up Definitions
Synonyms / aliases
Standard data definitions
    Names and formats

Data Dictionary
    Active
    Integrated

Page 45

Control Shared Data


Local - used by one unit
Shared - used by two or more activities
Impact of proposed program changes on shared data
Program-to-data element matrix

Page 46

Manage Distributed Data


Geographically dispersed
    Whether shared data or not

Different levels of detail
    Different management levels

Page 47

Maintain Data Quality


Put owners in charge of data
    Verify data accuracy and quality

Purge old data

Page 48

Agenda
Information processing
Database
Data Administrator
The DBMS
Distributing data
Data warehousing and data mining

Page 49

The DBMS
Database Management System (DBMS): software that permits a firm to:
    Centralize data
    Manage them efficiently
    Provide access to applications
        Such as payroll, inventory

Page 50

DBMS Components
Data Definition Language (DDL)
Data Manipulation Language (DML)
Inquiry Language (IQL)
Teleprocessing Interface (TP)

Page 51

Definitions
Views:
    Physical - how data is stored
    Logical - how data is viewed and used by users

Schema - overall logical layout of records and fields in a database
Subschema - an individual user's logical portion of the database (view)
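
In a relational DBMS a subschema is often realized as a view. The sketch below is one such illustration in SQLite: the view name mis_roster and the sample rows are assumptions, and sqlite_master is SQLite's own catalog table, which plays the role of a simple data dictionary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema (the student table) and one subschema (a view over it).
conn.executescript("""
CREATE TABLE student (stuid TEXT, stuname TEXT, major TEXT, credit INTEGER);
INSERT INTO student VALUES ('S101', 'Ann', 'MIS', 30), ('S114', 'Bob', 'ACT', 45);
CREATE VIEW mis_roster AS
    SELECT stuid, stuname FROM student WHERE major = 'MIS';
""")

# A user of the view sees only MIS students and only two columns.
roster = conn.execute("SELECT * FROM mis_roster").fetchall()

# The catalog lists the objects that make up the logical layout.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master "
    "WHERE type IN ('table', 'view') ORDER BY name"
).fetchall()
print(roster, catalog)
```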

Page 52

Agenda
Information processing
Database
Data Administrator
The DBMS
Distributing data
Data warehousing and data mining

Page 53

Distributing Data
Centralized files
Fragmented files
    Distribute data without duplication
    Users unaware of where data is located

Page 54

Distributing Data
Replicated files
    Data duplicated
    One site has master file
    Problem with data synchronization

Decentralized files
    Local data autonomy

Page 55

Distributing Data
Distributed files
    Client / server systems
    Stored centrally
    Portion downloaded to workstation
    Workstation can change data
    Changes uploaded to central computer

Page 56

Agenda
Information processing
Database
Data Administrator
The DBMS
Distributing data
Data warehousing and data mining

Page 57

Data Warehousing
Collect large amounts of data from multiple sources over several years
Classify each record into multiple categories
    Age
    Location
    Gender

Page 58

Data Warehousing
Rapidly select and retrieve by multiple dimensions
    All females in Chicago under 25 years of age

Provide tailored, on-demand reports
Data mart: a replicated subset of the data warehouse
    A functional or regional area
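
The "females in Chicago under 25" selection and a regional data mart can be sketched in a few lines. Everything here is illustrative: the customer table, its four rows, and the name chicago_mart are assumptions, and a real warehouse would hold far more data behind the same kind of query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (custid INTEGER, gender TEXT, city TEXT, age INTEGER)"
)
# Hypothetical warehouse rows, each classified by gender, location, and age.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [(1, "F", "Chicago", 22), (2, "F", "Chicago", 31),
     (3, "M", "Chicago", 19), (4, "F", "Boston", 24)],
)

# Select by three dimensions at once.
young_chicago_women = [
    row[0]
    for row in conn.execute(
        "SELECT custid FROM customer "
        "WHERE gender = 'F' AND city = 'Chicago' AND age < 25 ORDER BY custid"
    )
]

# A data mart as a replicated regional subset of the warehouse table.
conn.execute(
    "CREATE TABLE chicago_mart AS SELECT * FROM customer WHERE city = 'Chicago'"
)
mart_size = conn.execute("SELECT COUNT(*) FROM chicago_mart").fetchone()[0]
print(young_chicago_women, mart_size)
```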
Page 59

Data Mining
Fitting models to, or determining patterns from, warehoused data
Purposes:
    Analyze large amounts of data
    Find critical points of knowledge
    Perform automatic analyses

Page 60

Data Mining Terms


Data Visualization
Drill-down Analysis
    Hierarchical structure
    Leads to increasing level of detail

Expert System (ES) methodology
    e.g., neural networks

Page 61

Applications
Finance - fraud detection
Stock market - forecasting
Real estate - property evaluation
Airlines - customer retention
Retail - customer targeting

Page 62

Data Mining Example


What types of customers are buying specific products?
When are customers most likely to shop?
What types of products can be sold together?
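
The third question is commonly approached by counting how often pairs of products appear in the same purchase. The sketch below does only that counting step on three invented baskets; it is an illustration of the idea, not a full mining algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk", "cereal"},
]

pair_counts = Counter()
for basket in baskets:
    # Sort first so each pair is always counted under the same (a, b) order.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# The pair bought together most often is a candidate for joint promotion.
top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
```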

Page 63

Points to Remember
Information processing
Database
Data Administrator
The DBMS
Distributing data
Data warehousing and data mining

Page 64

Discussion Questions
How can a database help an organization?
Why is normalization important when building a database?
Do you see any problems with the database in your organization?

Page 65

Discussion Questions
What kind of database model is most suitable for a
    School?
    Department store?
    Police department?

Some organizations are hesitant to distribute data because they feel they may lose control.
    Do they lose control? Why?
    Could you suggest a good tactic?

Could data mining pose a threat to individual privacy?
    Why or why not?
    If so, how can we mitigate that threat?
    Do the advantages outweigh the disadvantages?

Page 66
