
Data mining & security

Data Mining
Data mining (also called knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information.
It analyzes data from many different dimensions or angles, categorizes it, and summarizes the relationships identified.
It finds correlations or patterns among dozens of fields.
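As a minimal sketch of finding correlations among fields, the Python below scans every pair of fields in a small table and reports the strongly correlated pairs. The table, its field names, and the 0.8 threshold are hypothetical, and Pearson correlation is only one common measure; real data mining tools use many others.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlated_fields(table, threshold=0.8):
    """Return (field_a, field_b, r) for every field pair with |r| >= threshold."""
    found = []
    for a, b in combinations(sorted(table), 2):
        r = pearson(table[a], table[b])
        if abs(r) >= threshold:
            found.append((a, b, round(r, 3)))
    return found

# Hypothetical customer table: one list of values per field (column).
customers = {
    "age":          [23, 35, 47, 52, 61],
    "income":       [28, 42, 55, 60, 71],
    "store_visits": [9, 3, 7, 2, 5],
}
print(correlated_fields(customers))
```

With this toy data only the age/income pair passes the threshold; store visits are only weakly (negatively) related to either field.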

Three commonly confused terms
Data: any facts or numbers.
Information: the patterns, associations, or relationships among all this data.
Knowledge: information converted into knowledge about historical patterns and future trends.

Data Warehouses
A collection of databases
Centralized data management and retrieval
Very large in size

Exercise: consider three tables.

Data mining categorises data as:

Clusters
Associations
Patterns
Classes (predetermined groups)
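Unlike classes, clusters are not predetermined: they emerge from the data. A minimal sketch, assuming one-dimensional k-means as the clustering technique (the slides do not name one) and hypothetical purchase amounts:

```python
def kmeans_1d(values, centroids, iterations=10):
    """Group numbers around k centroids (k = len(centroids)) by repeated reassignment."""
    groups = []
    for _ in range(iterations):
        # Assign each value to its nearest centroid.
        groups = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return centroids, groups

# Hypothetical purchase amounts; no group labels are given in advance.
amounts = [5, 7, 6, 48, 52, 50, 95, 105]
centroids, groups = kmeans_1d(amounts, [0, 50, 100])
print(centroids)  # [6.0, 50.0, 100.0]
print(groups)     # [[5, 7, 6], [48, 52, 50], [95, 105]]
```

The number of clusters and the starting centroids are choices made by the analyst; different starting points can lead to different clusters.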

Data mining consists of five major elements:

Extract, transform, and load transaction data onto the data warehouse system.
Store and manage the data in a multidimensional database system.
Provide data access to business analysts and information technology professionals.
Analyze the data using application software.
Present the data in a useful format, such as a graph or table.
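The five elements can be sketched end to end in a few lines of Python. This is a toy illustration only: the CSV layout and the table and column names are hypothetical, and an in-memory SQLite database stands in for the warehouse (it is not a multidimensional database system).

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a transaction system (layout is an assumption).
RAW = """txn_id,date,amount,region
1,2024-01-03, 19.99 ,north
2,2024-01-03,5.00,South
3,2024-01-04,42.50,north
"""

def extract(text):
    """Extract: read raw rows from the source export."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: convert types and normalize inconsistent region names."""
    return [
        (int(r["txn_id"]), r["date"], float(r["amount"]), r["region"].strip().lower())
        for r in rows
    ]

def load(conn, rows):
    """Load and store: write the cleaned rows into the stand-in warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(txn_id INTEGER PRIMARY KEY, date TEXT, amount REAL, region TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW)))

# Access, analyze, and present: total sales per region as a small text table.
query = "SELECT region, ROUND(SUM(amount), 2) FROM sales GROUP BY region ORDER BY region"
for region, total in conn.execute(query):
    print(f"{region:<8}{total:>8.2f}")
```

Without the transform step, "South" and "south" would be counted as different regions.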

Uses of data mining


Market segmentation - Identify the common characteristics of customers
who buy the same products from your company.
Customer churn - Predict which customers are likely to leave
your company and go to a competitor.
Fraud detection - Identify which transactions are most likely to
be fraudulent.
Direct marketing - Identify which prospects should be included
in a mailing list to obtain the highest response rate.
Interactive marketing - Predict what each individual accessing
a Web site is most likely interested in seeing.
Market basket analysis - Understand what products or services
are commonly purchased together; e.g., beer and diapers.
Trend analysis - Reveal the difference between a typical
customer this month and last.
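Market basket analysis is commonly expressed as association rules scored by support (how often items appear together) and confidence (how often the right-hand item appears given the left-hand one). A minimal sketch for item pairs, using hypothetical baskets and hypothetical thresholds:

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) rules 'lhs -> rhs' for item pairs."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in set(b))
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(set(b)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs in basket | lhs in basket)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

# Hypothetical shopping baskets.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "diapers", "milk"},
    {"chips", "milk"},
]
for lhs, rhs, s, c in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support {s:.2f}, confidence {c:.2f}")
```

Here every basket with beer also contains diapers (confidence 1.0), but not the other way round, which is why rules are directional.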

Security issues
Confidentiality - sensitive data; how to control what is disclosed or derived
Integrity - correctness; wrong data is useless and can be damaging
Availability - performance and structure
All of these apply with respect to databases.

Privacy and sensitivity


Should we consider individual data items or summary results?
Individual items are exposed to inference and aggregation.
Even if a trustworthy, neutral party mines the data, privacy is still at stake.
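The inference problem can be illustrated with a small, hypothetical example. A common control is to answer only summary queries over groups above a minimum size (here a hypothetical min_group of 3); the sketch shows that two permitted summary queries can still be subtracted to reveal one person's value, a so-called differencing attack. The table and names are invented for illustration.

```python
# Hypothetical salary table; users may only run SUM queries over groups of people.
SALARIES = {"alice": 52000, "bob": 61000, "carol": 58000, "dave": 75000}

def protected_sum(include, min_group=3):
    """Summary query that refuses to answer for fewer than min_group people."""
    names = [n for n in SALARIES if include(n)]
    if len(names) < min_group:
        raise PermissionError("query set too small")
    return sum(SALARIES[n] for n in names)

# Each query alone is a permitted summary result...
total = protected_sum(lambda n: True)
without_dave = protected_sum(lambda n: n != "dave")
# ...but their difference reveals one individual's salary.
print("inferred salary of dave:", total - without_dave)
```

Asking directly for dave's salary (a group of one) is refused, yet the two larger queries leak the same value.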

Data correctness and integrity
Before connecting information, we need to collect and correct it.
Not all collected information is correct.
So only correct data should be used; otherwise data mining will lead to wrong results.
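Correcting data before mining usually starts with checking each record against simple rules and setting aside the ones that fail. A minimal sketch; the record fields and validation rules are hypothetical assumptions, not rules from the slides.

```python
from datetime import date

def validate(record):
    """Return a list of problems found in one customer record (hypothetical rules)."""
    problems = []
    age = record.get("age")
    if not isinstance(age, int) or not 0 < age < 120:
        problems.append("age out of range")
    if "@" not in record.get("email", ""):
        problems.append("malformed email")
    joined = record.get("joined")
    if not isinstance(joined, date) or joined > date.today():
        problems.append("invalid join date")
    return problems

def split_clean(records):
    """Separate records that pass every rule from those that need correction."""
    clean, rejected = [], []
    for r in records:
        (rejected if validate(r) else clean).append(r)
    return clean, rejected

records = [
    {"age": 34, "email": "a@example.com", "joined": date(2020, 5, 1)},
    {"age": -3, "email": "b@example.com", "joined": date(2021, 1, 1)},
    {"age": 40, "email": "not-an-email", "joined": date(2019, 7, 9)},
]
clean, rejected = split_clean(records)
print(len(clean), "clean,", len(rejected), "rejected")
```

Rejected records are kept rather than silently dropped so they can be corrected and re-checked.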

Using comparable data


Data in the databases should be in the same format.
Only then is data mining possible, and only then does it yield correct results.
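Making data comparable typically means converting each source's format into one common format before records are matched. A minimal sketch, assuming two hypothetical databases ("db_a" and "db_b") that store names and dates differently:

```python
from datetime import datetime

# Hypothetical: each source database stores dates in its own format.
DATE_FORMATS = {"db_a": "%d/%m/%Y", "db_b": "%Y-%m-%d"}

def normalize(record, source):
    """Convert a record from the given source into one common format."""
    parsed = datetime.strptime(record["date"], DATE_FORMATS[source]).date()
    return {
        "customer": record["customer"].strip().title(),  # consistent name casing
        "date": parsed.isoformat(),                        # ISO 8601, e.g. 2024-04-03
    }

a = normalize({"customer": " jane doe", "date": "03/04/2024"}, "db_a")
b = normalize({"customer": "JANE DOE", "date": "2024-04-03"}, "db_b")
print(a == b)  # True
```

Compared as raw strings, "03/04/2024" and "2024-04-03" would never match, even though they describe the same day.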

Eliminating false matches


Data mining generates both false positive and false negative results.
These results must be understood properly.
Correctness of results and correct interpretation of results are a major security issue in data mining.
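Interpreting results correctly starts with counting how often the mining output was right and wrong. A minimal sketch for a fraud detection run; the predicted and actual flags are hypothetical, and precision and recall are standard summary measures rather than anything prescribed by the slides.

```python
def confusion_counts(predicted, actual):
    """Return (true positives, false positives, false negatives, true negatives)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(not p and a for p, a in zip(predicted, actual))
    tn = sum(not p and not a for p, a in zip(predicted, actual))
    return tp, fp, fn, tn

# Hypothetical fraud flags: what the mining flagged vs. what was actually fraud.
predicted = [True, True, False, True, False, False, True, False]
actual    = [True, False, False, True, True, False, False, False]

tp, fp, fn, tn = confusion_counts(predicted, actual)
precision = tp / (tp + fp)  # share of flagged transactions that were really fraud
recall = tp / (tp + fn)     # share of real fraud that was flagged
print(tp, fp, fn, tn, round(precision, 2), round(recall, 2))
```

In this toy run half of the flagged transactions are false positives, and one real fraud case is a false negative.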

Availability of data
Data is available for mining only when the databases interoperate properly.
It is better not to produce wrong results, but no result is not the same as a result of no correlation.
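The distinction between "no result" and "no correlation" can be made explicit in code by returning None when the data cannot support a conclusion and a number (possibly 0.0) when it can. A minimal sketch; the function name, the minimum of three joinable points, and the sample data are hypothetical.

```python
import math

def correlation_or_none(xs, ys, min_points=3):
    """Pearson r, or None when the available data cannot support any conclusion."""
    # Keep only positions where both sources supplied a value.
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if len(pairs) < min_points:
        return None  # no result: too little joinable data
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    if vx == 0 or vy == 0:
        return None  # no result: a constant field has no defined correlation
    return cov / math.sqrt(vx * vy)

unavailable = correlation_or_none([1, None, None, 4], [2, 3, None, None])
uncorrelated = correlation_or_none([1, 2, 3, 4], [1, -1, -1, 1])
print(unavailable, uncorrelated)  # None 0.0
```

Reporting 0.0 for the first case would wrongly claim the fields are unrelated, when in fact the databases simply did not provide enough matching data.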
End
