Why now?
- Precursors/impulses go back a long time
    - "We have always been an information society": control revolution of the 19th century
    - Industrial revolution: all this stuff, and people, to keep track of
    - Technologies of keeping-track: forms, standards, job descriptions/requirements, schedules, exams, inspections, categories, reports, files, "your permanent record"
    - Machine-readable and -processable data: Hollerith machines (from automatic looms), leading to IBM and the rest of the pre-computer information-processing industry
    - Statistics: knowing/finding resources, finding patterns, making plans (originally, a "statistician" was someone who advised a state about its resources and those of its enemies)
- Limited by cost: collecting, storing, examining data all expensive
    - especially when it must be done by hand
        - people are slow
www.stat.cmu.edu/~cshalizi/350/lectures/00/00.html 1/3
4/6/13
        - people are expensive (time, training)
        - people don't scale (can't just copy programs)
        - people can't explain themselves
    - and when data have to be specially made rather than a by-product of normal activity
- Computers drastically lower the cost of collecting, storing, accessing and examining data
    - think of drawing plots if nothing else!
    - plus you record transactions on the computer anyway
- Data-mining is about automating parts of the analysis process
    - look for patterns (what kind of pattern? look how?)
    - preferably interesting ones (interesting to who? how do you tell?)
    - and check that they're not just flukes (for example...)
- Clinical vs. actuarial judgment as proof-of-concept
    - psychiatrists are worse at predicting patient outcomes than simple decision rules
    - ... but it turns out no profession is better than simple rules (though some are as good)
    - what to do when there are no good professionals?
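One common way to "check that they're not just flukes" is a permutation test: shuffle one variable to destroy any real relationship, and see how often chance alone produces a pattern as strong as the one observed. The sketch below is only an illustration under stated assumptions. The pattern is taken to be a linear correlation, and the function names, the number of shuffles, and the seed are hypothetical choices, not something specified in these notes.

```python
import random
import statistics


def correlation(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(xs, ys, n_shuffles=2000, seed=0):
    """Estimate how often shuffled data show a correlation at least as
    strong (in absolute value) as the observed one.

    n_shuffles and seed are illustrative defaults (assumptions).
    """
    rng = random.Random(seed)
    observed = abs(correlation(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break any real link between xs and ys
        if abs(correlation(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


def demo():
    """Compare a real linear trend with pure noise (made-up data)."""
    rng = random.Random(1)
    xs = list(range(30))
    trend = [2 * x + rng.gauss(0, 5) for x in xs]
    noise = [rng.gauss(0, 5) for _ in xs]
    print("p-value, real trend:", permutation_p_value(xs, trend))
    print("p-value, pure noise:", permutation_p_value(xs, noise))
```

Calling `demo()` prints one p-value for data with a built-in trend and one for pure noise. A small p-value says the pattern would rarely arise by shuffling alone; it does not say the pattern is interesting, which is a separate question.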
Some Themes
- Choice of representation/abstraction is important
- Choices within method are important
- Methods and representations are interdependent
- Choices have to be justified as helping you meet specific goals; beware of optimality criteria!
- The importance of not fooling yourself and/or programming the machine to fool you: using predictions and perturbations
- Technical theme: bias/variance or accuracy/precision trade-off
- Technical theme: adaptability is a partial substitute for knowledge
- Technical theme: successive approximation/iterative algorithms
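The bias/variance theme can be made concrete with a small simulation. The sketch below is an assumption-laden illustration, not an example from the lecture: it estimates a mean from a few noisy observations, either with the plain sample mean or with a version shrunk toward zero. The estimator, the function name `simulate_shrunken_mean`, and all parameter values are hypothetical. Shrinking introduces bias but reduces variance, and with these settings the total error (mean squared error = bias squared + variance) goes down.

```python
import random
import statistics


def simulate_shrunken_mean(shrink, true_mean=1.0, noise_sd=3.0,
                           n=5, reps=5000, seed=0):
    """Simulate the estimator shrink * (sample mean) many times.

    Returns (bias, variance, mse) of the estimator over the simulated
    replications. All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(true_mean, noise_sd) for _ in range(n)]
        estimates.append(shrink * statistics.fmean(sample))
    bias = statistics.fmean(estimates) - true_mean
    variance = statistics.pvariance(estimates)  # spread around own mean
    squared_errors = [(e - true_mean) ** 2 for e in estimates]
    mse = statistics.fmean(squared_errors)      # spread around the truth
    return bias, variance, mse


def demo():
    """Print bias, variance and MSE for no shrinkage and for shrink=0.5."""
    for shrink in (1.0, 0.5):
        bias, variance, mse = simulate_shrunken_mean(shrink)
        print(f"shrink={shrink}: bias={bias:.3f}, "
              f"variance={variance:.3f}, mse={mse:.3f}")
```

Whether shrinking helps depends on the numbers: with a larger true mean or less noise, the added bias would outweigh the variance saved. That dependence is the trade-off.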