Data Mining

W2

Marketing and Sales


Companies precisely record massive amounts of marketing and sales data
Applications:
    Special offers: identifying profitable customers (e.g. reliable owners of credit cards that need extra money during the holiday season)

Marketing and Sales


Association techniques find groups of items that tend to occur together in a transaction
Historical analysis of purchasing patterns
Identifying prospective customers
Focusing promotional mailouts (targeted campaigns are cheaper than mass-marketed ones)
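
The idea of items that tend to occur together can be sketched in a few lines. This is only a minimal illustration, not a full association-rule miner: the transactions are invented, and `frequent_pairs` and `min_support` are hypothetical names chosen for this sketch. It counts how often each pair of items shares a basket and keeps the pairs whose support (the fraction of transactions containing both items) reaches a threshold.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def frequent_pairs(transactions, min_support):
    """Return {pair: support} for item pairs with support >= min_support.

    Support is the fraction of transactions that contain both items.
    """
    counts = Counter()
    for basket in transactions:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

pairs = frequent_pairs(transactions, min_support=0.5)
print(pairs)
```

On this toy data only bread and milk appear together in at least half of the baskets. Real association techniques extend the same counting to larger item sets and to rules with a confidence measure.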

Machine Learning & Statistics


Historical difference:
    Statistics: testing hypotheses
    Machine learning: finding the right hypothesis
But: huge overlap
    Decision trees (C4.5 and CART)
Today: perspectives have converged (joined)
    Most ML algorithms employ statistical techniques

Overfitting
Overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship. It generally occurs when a model is excessively complex, such as having too many parameters relative to the number of observations. A model that has been overfit will generally have poor predictive performance, as it can exaggerate minor fluctuations in the data. Especially in cases where learning was performed for too long or where training examples are rare, the learner may adjust to very specific random features of the training data that have no causal relation to the target function. In this process of overfitting, the performance on the training examples still increases while the performance on unseen data becomes worse.
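
The gap between training and unseen performance can be reproduced on a tiny hand-made data set. The sketch below is an illustration under stated assumptions, not a benchmark: the data, the two deliberately flipped labels, and the helper names (`fit_threshold`, `fit_nearest_neighbour`, `accuracy`) are all invented here. It compares a one-parameter threshold rule with a 1-nearest-neighbour model, which effectively keeps one parameter per training example and therefore memorises the noise.

```python
# Hand-made toy data (hypothetical): the true rule is label = 1 when x >= 5.
# Two training labels (x = 2 and x = 7) are deliberately flipped to act as noise.
train = [(0, 0), (1, 0), (2, 1), (3, 0), (4, 0),
         (5, 1), (6, 1), (7, 0), (8, 1), (9, 1)]
# Unseen points, labelled by the true rule without noise.
test = [(0.2, 0), (1.8, 0), (2.1, 0), (3.3, 0), (4.4, 0),
        (5.6, 1), (6.9, 1), (7.2, 1), (8.3, 1), (9.1, 1)]

def accuracy(predict, data):
    """Fraction of (x, y) pairs for which predict(x) equals y."""
    return sum(predict(x) == y for x, y in data) / len(data)

def fit_threshold(data):
    """Simple model: a single parameter t, predicting 1 when x >= t."""
    xs = sorted(x for x, _ in data)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best_t = max(candidates,
                 key=lambda t: accuracy(lambda x: int(x >= t), data))
    return lambda x: int(x >= best_t)

def fit_nearest_neighbour(data):
    """Complex model: memorises every training point, noise included."""
    def predict(x):
        _, label = min(data, key=lambda point: abs(point[0] - x))
        return label
    return predict

simple = fit_threshold(train)
memoriser = fit_nearest_neighbour(train)
for name, model in [("threshold", simple), ("1-NN", memoriser)]:
    print(name, "train:", accuracy(model, train), "test:", accuracy(model, test))
```

The memorising model scores perfectly on the training examples but worse than the simple rule on the unseen points, because it reproduces the two noisy labels instead of the underlying relationship.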

Overfitting
Modified strategy to avoid overfitting, e.g. pruning (simplifying a description)
    Prepruning: stops at a simple description before search proceeds to an overly complex one
    Postpruning: generates a complex description first and simplifies it afterwards
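
The two pruning styles can be contrasted on a very small decision tree. The following is a rough sketch under several assumptions, not the procedure used by C4.5 or CART: the one-dimensional data is invented, prepruning is approximated by a depth limit (`max_depth`), and postpruning is approximated by reduced-error pruning against a held-out validation set. All function names are hypothetical.

```python
from collections import Counter

# Hypothetical data: true rule is label = 1 when x >= 5; x = 2 and x = 7 are noisy.
train = [(0, 0), (1, 0), (2, 1), (3, 0), (4, 0),
         (5, 1), (6, 1), (7, 0), (8, 1), (9, 1)]
validation = [(0.2, 0), (1.8, 0), (2.1, 0), (3.3, 0), (4.4, 0),
              (5.6, 1), (6.9, 1), (7.2, 1), (8.3, 1), (9.1, 1)]

def majority(data):
    """Most common label in data (data must be non-empty)."""
    return Counter(y for _, y in data).most_common(1)[0][0]

def leaf_errors(data):
    """Number of points a majority-label leaf would misclassify."""
    return len(data) - Counter(y for _, y in data).most_common(1)[0][1]

def split(data, t):
    return [p for p in data if p[0] < t], [p for p in data if p[0] >= t]

def grow(data, max_depth=None, depth=0):
    """Leaves are labels; inner nodes are (threshold, left, right) tuples."""
    pure = len({y for _, y in data}) == 1
    if pure or (max_depth is not None and depth >= max_depth):
        return majority(data)  # prepruning stops here once max_depth is reached
    xs = sorted({x for x, _ in data})
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    t = min(candidates, key=lambda c: sum(leaf_errors(s) for s in split(data, c)))
    left, right = split(data, t)
    return (t, grow(left, max_depth, depth + 1), grow(right, max_depth, depth + 1))

def predict(tree, x):
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x < t else right
    return tree

def postprune(tree, train_part, val_part):
    """Bottom-up: replace a subtree by a leaf if validation errors do not rise."""
    if not isinstance(tree, tuple):
        return tree
    t, left, right = tree
    (l_tr, r_tr), (l_va, r_va) = split(train_part, t), split(val_part, t)
    tree = (t, postprune(left, l_tr, l_va), postprune(right, r_tr, r_va))
    leaf = majority(train_part)
    errors = lambda model: sum(predict(model, x) != y for x, y in val_part)
    return leaf if errors(leaf) <= errors(tree) else tree

full = grow(train)                      # complex description, fits the noise
prepruned = grow(train, max_depth=1)    # stops early at a simple description
postpruned = postprune(full, train, validation)  # simplifies afterwards
print(full, prepruned, postpruned, sep="\n")
```

On this toy data both strategies end at the same single split near x = 4.5, while the unpruned tree carves out extra branches around the two noisy points. In general the two approaches need not agree.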

Data Mining & Ethics


It is widely accepted that before people decide to provide personal information they need to know how it will be used and what it will be used for, what steps will be taken to protect its confidentiality and integrity, what the consequences of supplying or withholding the information are, and any rights of claim they may have. Whenever such information is collected, individuals should be told all of this straightforwardly, in plain language they can understand.

Data Mining & Ethics


The potential use of data mining techniques means that the ways in which a repository of data can be used may stretch far beyond what was conceived when the data was originally collected. This creates a serious problem: it is necessary to determine the conditions under which the data was collected and for what purposes it may be used. Does the ownership of data bestow the right to use it in ways other than those purported when it was originally recorded? Clearly in the case of explicitly collected personal data it does not. But in general the situation is complex.

Data Mining & Ethics


Ethical issues arise in practical applications
Data mining is often used to discriminate
    E.g. loan applications: using some information (e.g. sex, religion, race) is unethical
The ethical situation depends on the application
    E.g. the same information is acceptable in a medical application
Attributes may contain problematic information
    E.g. area code may correlate with race

Data Mining & Ethics


Important questions:
    Who is permitted access to the data?
    For what purpose was the data collected?
    What kind of conclusions can be legitimately drawn from it?
Caution must be attached to results
Are resources put to good use?