
INTRODUCTION

Applied Statistics and Computing Lab, Indian School of Business


LEARNING GOALS
What is the importance of statistics?
When is statistics needed?
Where can statistics be used?


A SMALL STORY FROM THE PAST


During World War II, the Royal Air Force (RAF) wanted to fit its aircraft with armor. Where should this armor be fitted? Imagine that:
all the aircraft were shot in exactly the same places;
each German bomber attacked aircraft in exactly the same manner.

The answer would then be obvious. But there is variation. To answer such questions, we need STATISTICS.


"Statistics is the grammar of science." (Karl Pearson)

THE STORY CONTD


First, gather some information, and only relevant information.
DATA: relevant information, collected with the aim of answering certain questions.


THE STORY CONTD


Acknowledge variability, collect data, answer the question. The goal is to find an answer that is valid for the population (the set of all objects about which we want to make inferences).
The RAF concluded that armor had to be fitted in all the places where the aircraft they examined had bullet holes.

THE STORY CONTD


Abraham Wald, a famous statistician, did not agree with this. His advice: fit the armor in the places with no damage! The RAF had considered only one part of the population. We call a part (subset) of the population a sample. This sample (aircraft that returned after combat) is not representative of the population.

THE STORY CONTD


What about the aircraft that did not survive? The planes that survived showed that the damage they suffered was not fatal. As we will soon see, for accurate conclusions to be drawn, a sample must be representative of the population it is taken from.
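Wald's point can be reproduced with a small simulation. The sketch below is purely illustrative: the four aircraft sections, the assumption that each plane takes exactly one hit in a uniformly random section, and the loss probabilities in `LOSS_PROB` are all invented for this example and are not historical RAF figures. It compares where hits fall across the whole fleet (the population) with where they show up on the planes that return (the sample).

```python
import random
from collections import Counter

# HYPOTHETICAL chance that a single hit in each section downs the plane.
# Invented for illustration only; these are not RAF data.
LOSS_PROB = {"engine": 0.7, "cockpit": 0.5, "fuselage": 0.1, "wings": 0.1}


def simulate(n_planes=10_000, seed=1):
    """Each plane takes one hit in a uniformly random section.

    Returns hit counts for the whole fleet (population) and for the
    planes that made it back (the sample the RAF actually saw)."""
    rng = random.Random(seed)
    sections = list(LOSS_PROB)
    population, returned = Counter(), Counter()
    for _ in range(n_planes):
        section = rng.choice(sections)
        population[section] += 1
        if rng.random() >= LOSS_PROB[section]:  # the plane survives this hit
            returned[section] += 1
    return population, returned


def shares(counts):
    """Fraction of all hits that landed in each section."""
    total = sum(counts.values())
    return {section: counts[section] / total for section in LOSS_PROB}


if __name__ == "__main__":
    population, returned = simulate()
    pop_share, ret_share = shares(population), shares(returned)
    for section in LOSS_PROB:
        print(f"{section:9s} all planes: {pop_share[section]:.2f}"
              f"  returned planes: {ret_share[section]:.2f}")
```

Under these assumptions, engine and cockpit hits are under-represented among the returning planes while fuselage and wing hits are over-represented, which is exactly why counting holes on survivors points the armor to the wrong places.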

EXAMPLE 1
The used car industry of the US:
How are these cars priced?
How do you determine the rate (price) at which a used car should be sold?
How can you determine its value based on several characteristics?


EXAMPLE 1 CONTD
Do leather seats attract customers more than the size of the engine? If a Buick with a 4-cylinder engine and leather seats has travelled more miles than a Cadillac with a 6-cylinder engine, are their prices the same? What is the expected retail price of a one-year-old Chevrolet sedan with a 4-cylinder engine that has already run 12,987 miles?

EXAMPLE 2
Cardiovascular diseases (CVD) are unfortunately becoming more common. If a person asked a doctor to evaluate their CVD risk, how would the doctor go about it? We often hear that being overweight increases the risk of CVD. This is not entirely accurate: it is actually the body fat, or adipose tissue (AT), along with the degree of obesity, that matters.

EXAMPLE 2 CONTD
Studies have shown that individuals with excess body fat in the abdominal area have a higher risk. Computed tomography (CT scan) is the only technique that allows precise and reliable measurement of AT (at any site in the body). However:
Many physicians do not have access to this method to evaluate their patients.
It irradiates the patient (which suppresses the immune system).
It is expensive.


EXAMPLE 2 CONTD
Is there a simpler yet reasonably accurate way to predict AT area? That is, a method that is easily available, inexpensive and risk free.

A group of researchers (Jean-Pierre Després, Denis Prudhomme, Marie-Christine Pouliot, Angelo Tremblay and Claude Bouchard) conducted a study with the aim of predicting the area of abdominal AT using simple anthropometric measurements, i.e. measurements of the human body. Various measurements were considered:
weight, subcutaneous skinfold thickness, hip circumference, waist circumference, etc.


EXAMPLE 2 CONTD: STATISTICAL INVESTIGATION


Now the question is: can any of these help predict the AT area? For example, can the waist circumference (WC) of an individual predict the amount of AT they have?

This is where statistical investigation begins: with a question. Observing the WC of all people is not feasible, so only a few people are considered, and inferences are drawn from the observations made on them.
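Predicting AT area from WC using a sample is a regression problem. As a minimal sketch, assuming a straight-line relationship between WC and AT area (an assumption made for illustration, not a finding of the study described above), the least-squares line has slope S_xy / S_xx and intercept mean(AT) - slope * mean(WC). The function names `fit_line` and `predict_at` are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(wc: Sequence[float], at: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of at ~ intercept + slope * wc; returns (intercept, slope)."""
    n = len(wc)
    if n != len(at) or n < 2:
        raise ValueError("need two equally long samples with at least 2 people")
    wc_bar = sum(wc) / n
    at_bar = sum(at) / n
    # S_xx: spread of WC around its mean; S_xy: how WC and AT vary together.
    s_xx = sum((x - wc_bar) ** 2 for x in wc)
    if s_xx == 0:
        raise ValueError("all waist circumferences are equal; slope is undefined")
    s_xy = sum((x - wc_bar) * (y - at_bar) for x, y in zip(wc, at))
    slope = s_xy / s_xx
    intercept = at_bar - slope * wc_bar
    return intercept, slope


def predict_at(intercept: float, slope: float, wc: float) -> float:
    """Predicted AT area for a person with waist circumference `wc`."""
    return intercept + slope * wc
```

In practice `wc` and `at` would hold the waist circumferences and CT-measured AT areas of the people in the sample; whether the fitted line predicts AT well enough for people outside the sample is the kind of question statistical inference addresses.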

EXAMPLE 3
If students knew their internal assessment marks and previous-year CGPA, could they get an idea of how they might perform in the finals? Suppose that a student had the following information on 50 students from the previous batch:
Marks in the final examination
Marks in the 3 internal assessment tests held during the academic year
CGPA obtained in the previous year

EXAMPLE 3 CONTD
Simply using these data and a few statistical tools, we can answer various interesting questions:
Can a student predict the range in which his/her final examination score will lie?
If only the best two internal marks are counted, does the second internal have a more important effect on the final score?
Does performing well in two internals but badly in the other affect the finals?
Is it correct to say that someone who did well (or not so well) in the internals will do well (or not so well) in the finals?
Does the previous year's CGPA, which does not depend on the present course, affect the final scores?
If yes, it could mean that the previous year's CGPA captures the innate ability of the student, which we otherwise cannot measure!
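Several of these questions can be approached by regressing the final mark on the other variables. The sketch below assumes a linear model, final ~ b0 + b1*internal1 + b2*internal2 + b3*internal3 + b4*CGPA (an assumption chosen for illustration, not a method prescribed by the source), and fits it by ordinary least squares in plain Python, solving the normal equations (X^T X) b = X^T y with Gaussian elimination. The names `fit_ols` and `predict` are hypothetical.

```python
from typing import List, Sequence


def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares: returns [b0, b1, ..., bk] for y ~ b0 + b1*x1 + ... + bk*xk.

    rows[i] holds the k predictor values for observation i (e.g. one student)."""
    if not rows or len(rows) != len(y):
        raise ValueError("rows and y must be non-empty and equally long")
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend the intercept column
    n, p = len(X), len(X[0])
    # Normal equations A b = rhs, with A = X^T X and rhs = X^T y.
    A = [[sum(X[i][a] * X[i][c] for i in range(n)) for c in range(p)] for a in range(p)]
    rhs = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear or there are too few observations")
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= factor * A[col][c]
            rhs[r] -= factor * rhs[col]
    # Back substitution on the upper-triangular system.
    b = [0.0] * p
    for r in range(p - 1, -1, -1):
        b[r] = (rhs[r] - sum(A[r][c] * b[c] for c in range(r + 1, p))) / A[r][r]
    return b


def predict(b: Sequence[float], x: Sequence[float]) -> float:
    """Predicted response for one new observation with predictor values x."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))
```

With the previous batch's 50 records as `rows` (each `[internal1, internal2, internal3, cgpa]`) and their final marks as `y`, the fitted CGPA coefficient is a starting point for the last question; judging whether it genuinely differs from zero needs inferential tools beyond this sketch. Solving the normal equations directly is the simplest approach but is less numerically stable than the QR-based solvers used by statistics libraries.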


EXAMPLE 4
Marketing research. The scenario:
There are 4 stores: OfficeStar, Paper & Co., Office Equipment and Supermarket. Some customers have visited and made purchases from each of these stores. The stores collect certain feedback from each customer: each customer rates each store on a scale from 1 to 5 (1 being the lowest and 5 the highest) on the following attributes:
Large choice (wide variety)
Low prices
Service quality
Product quality
Convenience
Preference score (overall satisfaction score)

EXAMPLE 4 CONTD
These stores are interested in answering the following questions:
What part of the variation in the ratings between stores is due to the customers and not to the stores themselves?
Does a particular class of customers (by age, gender, locality, etc.) prefer a particular store?
Does a particular store serve a particular class of people more efficiently?
Statistics can help provide answers to the above questions with a reasonable level of accuracy.
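For the first question, one standard way to quantify how much of the variation comes from customers rather than stores is the sum-of-squares decomposition of a two-way layout without replication: total = customer + store + residual. The sketch below assumes, as the scenario describes, that every customer rated every store on the attribute in question, and that `ratings[i][j]` holds the rating customer i gave store j; the function name and table layout are hypothetical, and this is one possible approach rather than the method used later in the tutorial.

```python
from typing import Dict, Sequence


def decompose_variation(ratings: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Split the total sum of squares of a complete customers x stores table.

    ratings[i][j] is the rating customer i gave store j on one attribute."""
    n_cust = len(ratings)
    n_store = len(ratings[0])
    if any(len(row) != n_store for row in ratings):
        raise ValueError("every customer must rate every store")
    grand = sum(sum(row) for row in ratings) / (n_cust * n_store)
    cust_means = [sum(row) / n_store for row in ratings]
    store_means = [sum(row[j] for row in ratings) / n_cust for j in range(n_store)]
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_customer = n_store * sum((m - grand) ** 2 for m in cust_means)  # between customers
    ss_store = n_cust * sum((m - grand) ** 2 for m in store_means)     # between stores
    ss_residual = ss_total - ss_customer - ss_store                    # left unexplained
    return {"total": ss_total, "customer": ss_customer,
            "store": ss_store, "residual": ss_residual}
```

The ratio `parts["customer"] / parts["total"]` then gives the share of the rating variation attributable to differences between customers rather than between stores.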


CONCLUDING REMARKS
The difficulty in providing straightforward answers to all the above questions arises from the fact that there is variability: different cars and different users, and hence a different condition of each car after a year; different people, with different ages, different weights, etc.

The idea behind introducing the above examples is to emphasize the need for statistical investigation when a question needs to be answered, or a hypothesis tested, in the presence of variation.

CONCLUDING REMARKS CONTD


The rest of this tutorial is intended to help the reader understand the concepts behind several statistical techniques and apply them effectively.


Thank you
