APPLIED STATISTICS SOFTWARE

STATISTICS - BASIC CONCEPTS

David García León


www.davidgl.eu
david@davidgl.eu
April 2016
UCAM MASTER IN BUSINESS ADMINISTRATION (MBA)
STATISTICS BASIC CONCEPTS
1. QUALITATIVE AND QUANTITATIVE RESEARCH
2. DESCRIPTIVE STATISTICS AND INFERENTIAL STATISTICS
3. POPULATION AND SAMPLE
4. SAMPLING METHODS
5. SAMPLE SIZE
6. TYPES OF VARIABLES


When you decide how to study a certain issue, problem or phenomenon, you can choose either a qualitative or a quantitative methodology.

Qualitative Research vs Quantitative Research

Different methods, tools and procedures to analyse information.


QUALITATIVE RESEARCH (I)


Objective: Understanding the deeply hidden nature of phenomena.
Obtaining knowledge about emotions, sensitivity thresholds,
barriers, attitudes, evaluations, desires and needs of a target
group.
Qualitative research is inductive (used to start the research
process).
What matters is what was said, not how many times:
processes and meanings are rigorously examined, but not
measured in terms of quantity, amount or frequency.


QUANTITATIVE RESEARCH
Objective: determining the relationship between an
independent variable and a dependent one.

It allows measuring the extent of phenomena.


Quantitative research is deductive (hypotheses are identified before research begins).
Quantitative research often requires recruiting hundreds of participants (to reduce the likelihood of biased results).


Complementary approaches:

Qualitative Research: research subject definition; hypotheses definition.
Quantitative Research: research hypotheses tests; generalizable conclusions.

Quantitative enumerates, and qualitative explains.

"Measure what can be measured, and make measurable what cannot be measured." (Galileo Galilei)


Choose a qualitative method when most of these conditions apply:
You have no existing research data on
this topic.
You are exploring the reasons why
people do or believe something.
The most appropriate unit of
measurement is not certain
(Individuals? Households?
Organizations?)
The concept is assessed with no clear
demarcation points.


Choose a quantitative method when most of these conditions apply:
The research is confirmatory rather
than exploratory (i.e. this is a
frequently researched topic, and
numerical data from earlier research is
available).
You are trying to measure a trend.
There is no ambiguity about the
concepts being measured, and only
one way to measure each concept.


Statistics is more than just a collection of mathematical techniques; it is not only putting numbers into formulas or computers.
Statistics is concerned with the collection, organization and description of a dataset, and with the use of probability theory to make predictions that are useful for making decisions under uncertainty.

Statistics is about learning from data.

2. DESCRIPTIVE STATISTICS AND INFERENTIAL STATISTICS

Descriptive statistics describes, shows or summarizes the basic features of the data in a study, by means of:
Tables (frequency distributions)
Graphs
Statistics (calculations)
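As an illustration, two of the three descriptive tools (a frequency table and summary calculations) can be sketched in Python with the standard library only; graphs are left out. The dataset, the number of store visits reported by ten customers, is hypothetical and invented for the example.

```python
from collections import Counter
import statistics

# Hypothetical data: number of store visits reported by ten customers
visits = [1, 3, 2, 3, 4, 1, 3, 2, 5, 3]

# Table: absolute (n_i) and relative (f_i) frequency distribution
freq = Counter(visits)
n = len(visits)
print("value  n_i  f_i")
for value in sorted(freq):
    print(f"{value:>5}  {freq[value]:>3}  {freq[value] / n:.2f}")

# Statistics (calculations): summary measures of the same data
mean = statistics.mean(visits)
median = statistics.median(visits)
mode = statistics.mode(visits)
print(f"mean={mean}, median={median}, mode={mode}")
```

The table and the summary measures describe the same ten observations; neither says anything about customers outside the dataset, which is the job of inferential statistics.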

Inferential statistics deduces (infers) the properties of a population from the analysis of the properties of a data sample drawn from it.
"Inference: using facts you have to learn about facts you don't have." (Gary King)

3. POPULATION AND SAMPLE


Population (universe): the entire set of all individuals, items, or subjects whose characteristics are being studied. The size of the population is referred to as N.
Parameter: measurable characteristic of a population. For example, the mean of a population is denoted by the symbol μ.
Sample: subset of items drawn from a population, and used to test hypotheses about that population. The size of the sample is referred to as n.
Statistic: measurable characteristic of a sample. Statistics vary from sample to sample. For example, the mean of a sample is denoted by the symbol x̄.
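The difference between a parameter and a statistic can be seen in a small simulation. This is a minimal sketch under stated assumptions: the population of N = 1,000 monthly salaries is hypothetical (drawn from a normal distribution only to have numbers to work with), and the seed is fixed only so the run is reproducible.

```python
import random
import statistics

rng = random.Random(2016)  # fixed seed so the run is reproducible

# Hypothetical population of N = 1,000 monthly salaries
population = [rng.gauss(2000, 400) for _ in range(1000)]
N = len(population)
mu = statistics.mean(population)  # parameter: one fixed value for the population

# Draw three samples of size n = 50 and compute the statistic x_bar for each
n = 50
sample_means = [statistics.mean(rng.sample(population, n)) for _ in range(3)]

print(f"N = {N}, mu = {mu:.1f}")
for i, x_bar in enumerate(sample_means, start=1):
    print(f"sample {i}: n = {n}, x_bar = {x_bar:.1f}")
```

The value of μ stays the same however often the code draws samples, while each sample produces its own x̄, which is why statistics are said to vary from sample to sample.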

Sometimes an entire population is analyzed (e.g. elections, or a study of all members of a small association), and then there is no sampling error.
But researchers often rely on samples!
Why choose a sample instead of the entire population?
Budget or time restrictions (e.g. unemployed people).
Impossibility of identifying and accessing all population members (e.g. people who may suffer from insomnia).
Sometimes analyzing an item means destroying it (e.g. bulbs produced by a certain factory).


WHICH INDIVIDUALS CONSTITUTE THE SAMPLE? → Sampling methods
HOW MANY INDIVIDUALS CONSTITUTE THE SAMPLE? → Sample size


4. SAMPLING METHODS

How are the individuals in the sample selected?
Objective: obtain a sample that is representative of the population, so that our findings can be generalized to the whole group.
SAMPLING METHODS: Probability and Non-Probability.


SAMPLING

Probability: every member of the population has a known, non-zero probability of being selected. The sampling error can be calculated, and inference can be undertaken.

Non-Probability: some elements of the population have no chance of selection, or the probability of selection can't be accurately calculated. The selection is instead based on assumptions regarding the population. Hence, this sampling does not allow the estimation of sampling errors, and inference cannot be undertaken.

SAMPLING

Probability: Simple Random Sampling, Systematic Sampling, Stratified Sampling, Cluster Sampling.
Non-Probability: Convenience Sampling, Judgement Sampling, Quota Sampling.


Probability Sampling

Simple Random Sampling (S.R.S.): Each member of the population has an equal and known probability of being selected.
Each one of them is assigned a number, and the sample is determined by generating random numbers.
Applicable when the population is small, homogeneous and readily available.

Estimates are easy to calculate.
It requires a complete and accurate record of the population.
It can only be done with small populations where all individuals are identified.
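The procedure above (number every member, then generate random numbers) can be sketched in a few lines of Python. The sampling frame of N = 500 numbered members and the sample size n = 20 are hypothetical values chosen for illustration.

```python
import random

# Hypothetical sampling frame: the complete record of the population,
# with each member assigned a number from 1 to N
N = 500
frame = list(range(1, N + 1))

n = 20
rng = random.Random(42)  # seeded only to make the example reproducible
sample = rng.sample(frame, n)  # n distinct members; every member equally likely

print(sorted(sample))
```

`random.sample` draws without replacement, so no member can appear twice, and it needs the whole frame in memory, which mirrors the requirement of a complete and accurate record of the population.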


Probability Sampling

Systematic Sampling. The population is arranged according to some ordering scheme, then a random start is chosen, and elements are selected at regular intervals (every kth element from then onwards) through that ordered list.

Easier to conduct than a simple random sample (gains in time, effort and cost).
It requires a complete and ordered record of the population.
It can produce biased findings if the population data presents
any hidden order, periodicity or pattern.

Example:
A simple example would be to select every 10th name from the
telephone directory (an 'every 10th' sample, also referred to as 'sampling
with a skip of 10').
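The 'every 10th' procedure can be sketched as follows. This is a minimal illustration, assuming the skip is computed as k = N // n and the random start is drawn among the first k positions; the directory of 1,000 names and the function name `systematic_sample` are hypothetical.

```python
import random

def systematic_sample(ordered_frame, n, rng=random):
    """Select n elements at a fixed skip k from an ordered list,
    starting at a random position within the first k elements."""
    N = len(ordered_frame)
    k = N // n                # sampling interval ("skip")
    start = rng.randrange(k)  # random start: 0 <= start < k
    return [ordered_frame[start + i * k] for i in range(n)]

# Hypothetical directory of 1,000 names, taking an 'every 10th' sample
directory = [f"name_{i:04d}" for i in range(1000)]
sample = systematic_sample(directory, n=100, rng=random.Random(7))
print(sample[:5])
```

Only one random number is generated (the start), which is where the savings in time and effort come from; it is also why any periodicity in the list that matches k can bias the result.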

Probability Sampling

Stratified Sampling. It involves dividing the population into non-overlapping groups (strata) that are heterogeneous with respect to each other, each containing fairly homogeneous individuals, e.g. age groups or genders.
Then each stratum is sampled as an independent sub-population, out of which individual elements are randomly selected, each with the same chance of being selected.

It allows comparison between strata, and estimates of the population parameters for each stratum.

Difficulty identifying the appropriate strata, particularly with little knowledge of the population characteristics.


Example: A certain company has a staff of 180 employees, and a sample of n = 40 individuals is needed, using stratified proportional sampling according to these categories:

Stratum              ni    Weight                 Sample
Male, full time      90    (90/180)x100 = 50%     20
Male, part time      18    (18/180)x100 = 10%      4
Female, full time     9    (9/180)x100 = 5%        2
Female, part time    63    (63/180)x100 = 35%     14
Total               180    100%                   40

The first step is to calculate the weight of each group in the total staff: 50% of the sample individuals should be male full time (20 people), 10% should be male part time (4 people), 5% should be female full time (2 people), and 35% should be female part time (14 people).
Then an SRS within each stratum would be conducted.
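The two steps of the example (proportional allocation, then an SRS inside each stratum) can be sketched in Python. The stratum sizes come from the example; the employee identifiers are hypothetical placeholders, and rounding is used for the allocation as a simplifying assumption.

```python
import random

# Staff of the example company, by stratum (N_h)
strata_sizes = {
    "Male, full time": 90,
    "Male, part time": 18,
    "Female, full time": 9,
    "Female, part time": 63,
}
N = sum(strata_sizes.values())  # 180
n = 40

# Step 1, proportional allocation: n_h = n * N_h / N
allocation = {h: round(n * N_h / N) for h, N_h in strata_sizes.items()}
print(allocation)

# Step 2, an SRS within each stratum (employee IDs are hypothetical)
rng = random.Random(1)
sample = {
    h: rng.sample([f"{h} #{i}" for i in range(1, N_h + 1)], allocation[h])
    for h, N_h in strata_sizes.items()
}
```

Here every n_h is a whole number, so the allocation adds up exactly to 40. With other stratum sizes, rounding can leave the total one or two units off n, and the allocation then has to be adjusted by hand.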

Probability Sampling

Cluster Sampling involves a two-step procedure:
1st, the population is subdivided into groups (clusters) that are expected to be homogeneous amongst each other but heterogeneous internally, so that each of them is as representative of the population as possible.
In a 2nd step, a random sample of these clusters is selected, and either all observations in the selected clusters are included in the sample (one-stage clustering), or a random sample of elements is selected within each of these groups (two-stage clustering).


Probability Sampling

Can reduce time, effort or administrative costs.
Simple when the population shows a natural arrangement (e.g. geographical).
Actual clusters are not completely homogeneous, so the sample may not be representative.

Example:
A chain of hardware stores wants to know the buying profile of its customers.
Since it may not be possible to list all of the customers of a chain of hardware stores, it would be possible to randomly select a subset of stores (stage 1 of cluster sampling) and then interview a random sample of customers who visit those stores (stage 2 of cluster sampling).
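The hardware-store example can be sketched as two-stage cluster sampling. Everything here is hypothetical: the chain is assumed to have 30 stores with between 80 and 200 customers each, and 5 stores and 10 customers per store are arbitrary choices.

```python
import random

rng = random.Random(3)

# Hypothetical chain: 30 stores (clusters), each with its own customers
stores = {
    f"store_{s:02d}": [f"s{s:02d}_customer_{c:03d}" for c in range(rng.randint(80, 200))]
    for s in range(30)
}

# Stage 1: randomly select a subset of clusters (stores)
selected_stores = rng.sample(sorted(stores), 5)

# Stage 2: random sample of customers within each selected store
sample = {store: rng.sample(stores[store], 10) for store in selected_stores}

for store, customers in sample.items():
    print(store, customers[:3])
```

Only the customers of the 5 selected stores ever need to be listed, which is where the savings in time and administrative cost come from.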


Non-Probability Sampling

Convenience Sampling: individuals are chosen for convenience or ease: they are readily available or at hand to the researcher.

Very popular in practice, due to its simplicity.
Elements are selected arbitrarily from the population, so the sample is not representative of the population.
There is no randomness and the likelihood of bias is high, so it is only adequate for subjective assessments or pilot studies.

Example: An interviewer has to conduct a survey at a shopping center. She goes early in the morning on a given day, so the people she could interview would be limited to those present at that particular time. They would not represent the views of other members of society in that area, as would be the case if the survey were conducted at different times of day and several times per week.

Judgment or Purposive Sampling


Sample selection is based on the researcher's belief that the selected individuals are appropriate for the study.
Often used in political polling: some districts are chosen because their voting pattern has in the past provided a good idea of the outcome for the whole electorate.

Used very often, since it involves low cost and little time and effort.
Elements are selected arbitrarily from the population, so the sample is not representative of the population.
There is no randomness and the likelihood of bias is high, so it is only adequate for subjective assessments or pilot studies.

Non-Probability Sampling

Quota Sampling involves a two-step procedure:
1st, the population is segmented into mutually exclusive subgroups (just as in stratified sampling), following one or more criteria such as age, income, frequency of purchase, or usage patterns.
Then, in a 2nd step, the convenience or judgment of the researcher is used to select individuals within each group (the sample size from each category is proportional to its weight in the whole population).
It is this second step which makes the technique a non-probability sampling method.

Quick and cheap to apply.

The structure or characteristics of the population have to be known ex-ante in order to obtain a similar structure for the sample.
As a non-probability technique, it does not allow inference to be undertaken.

Which is the appropriate sampling technique?

It depends on:

Research objectives
Need for statistical analysis and degree of accuracy required.
Available resources (time and funds)
Knowledge regarding the target population


5. SAMPLE SIZE

HOW MANY INDIVIDUALS COMPRISE THE SAMPLE?


Is sample size so important?
[Figure: examples labelled "Tested on 26 women & men", "Tested on 23 women", "Tested on 18 women", "Tested on 20 men"]

Sample information is not as accurate and truthful as population information. So, the bigger the sample is, the more precise the information it offers.

But, on the other hand, the bigger the sample is, the more expensive the sampling process is.
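One way to quantify "more precise" is the standard error of an estimate. The sketch below assumes simple random sampling and uses the usual formula for a sample proportion, sqrt(p(1 − p)/n), ignoring the finite population correction; p = 0.5 is a hypothetical value.

```python
import math

def standard_error_proportion(p, n):
    """Standard error of a sample proportion under simple random sampling
    (finite population correction ignored)."""
    return math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1600):
    print(n, round(standard_error_proportion(0.5, n), 4))
```

Quadrupling the sample size from 100 to 400 only halves the standard error (from 0.05 to 0.025), so each further gain in precision costs progressively more, which is the trade-off between precision and cost.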


So, which is the optimal sample size?

Insufficient size → no scientific results.
Excessive size → waste of resources.

Trade-off between quality and cost of our research.


Factors that determine the sample size:
Degree of variability of the measured variable.
Population homogeneity.
Population size, N.
Confidence level: the percentage of all possible samples whose interval estimate can be expected to include the true population parameter (it tells you how sure you can be).
Sampling error (precision required): maximum expected difference between the population parameter and its sample estimate.
Sampling method.
Statistical technique.
Available time & money!
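Several of these factors can be seen at work in one common textbook formula. This is an assumption rather than a prescribed method: the sketch uses Cochran's formula for estimating a population proportion, n0 = z²·p(1 − p)/e², with the finite population correction n = n0 / (1 + (n0 − 1)/N). The default p = 0.5 (maximum variability) and the example values are illustrative.

```python
import math
from statistics import NormalDist

def sample_size_proportion(error, confidence=0.95, p=0.5, N=None):
    """Sample size to estimate a population proportion within +/- error.

    Cochran's formula n0 = z^2 * p * (1 - p) / error^2, followed, if a
    population size N is given, by the finite population correction
    n = n0 / (1 + (n0 - 1) / N). The result is rounded up.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    n0 = z ** 2 * p * (1 - p) / error ** 2
    if N is not None:
        n0 = n0 / (1 + (n0 - 1) / N)
    return math.ceil(n0)

print(sample_size_proportion(0.05))          # very large population
print(sample_size_proportion(0.05, N=1000))  # population of 1,000
```

With a 95% confidence level and a sampling error of ±5 percentage points, this gives 385 individuals for a very large population and 278 for N = 1,000. Raising the confidence level, lowering the error, or assuming more variability all increase n. The tools listed below can handle many other designs and tests.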



Specific tools (freeware) for calculating sample size:

G*Power 3.1
http://www.psycho.uniduesseldorf.de/abteilungen/aap/gpower3/download-and-register

PS: Power and Sample Size
http://biostat.mc.vanderbilt.edu/wiki/Main/PowerSampleSize

ST Plan
https://biostatistics.mdanderson.org/SoftwareDownload/SingleSoftware.aspx?Software_Id=41


Websites for calculating sample size online:
http://stat.ubc.ca/~rollin/stats/ssize/
http://www.stat.uiowa.edu/~rlenth/Power/index.html
http://www.raosoft.com/samplesize.html
http://epitools.ausvet.com.au/content.php?page=SampleSize
http://statpages.org/index.html#Power


6. TYPES OF VARIABLES
Variable: any characteristic or attribute that differs across subjects.
Variables are classified according to their nature or their measurement scale.

According to their nature, variables are qualitative or quantitative.

QUALITATIVE or CATEGORICAL: represent characteristics (or categories) that cannot be measured or quantified. Such characteristics are not numbers, and even if they are expressed as numbers, those numbers cannot be used for calculations.

Dichotomous (binary, dummy): only two categories are defined.
E.g. gender (male/female), consumer (yes/no).
Polytomous: more than two categories are defined.
E.g. marital status, religious group, ZIP code.
QUANTITATIVE or NUMERICAL: represent characteristics that can be measured or quantified.

Discrete: variable whose values are countable.
E.g. number of children in a household, times a place has been visited.
Continuous: variable that can assume any numerical value over one or several intervals.
E.g. weight, temperature, salary.

Codification: assigning a certain number to each category of a qualitative variable, i.e. using numbers to describe the outcomes.
Gender: a) Male, b) Female
Those numbers do not have any numerical meaning, so they cannot be used for calculations.
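A short illustration of why codes cannot be used for calculations. The codes 1 and 2 and the five responses are hypothetical; counting codes is still meaningful, because it is the same as counting categories.

```python
from collections import Counter

# Hypothetical codification: the numbers are labels only
GENDER_CODES = {"Male": 1, "Female": 2}

responses = ["Female", "Male", "Female", "Female", "Male"]
coded = [GENDER_CODES[r] for r in responses]

# Meaningful: frequency of each code (same as frequency of each category)
print(Counter(coded))

# Not meaningful: the "average gender" depends on the arbitrary codes chosen
print(sum(coded) / len(coded))
```

Swapping the codes (Male = 2, Female = 1) leaves the frequencies unchanged but changes the "average", which shows that the average carries no information about the variable.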
Discretisation: converting a quantitative variable into a qualitative variable, according to whether or not the quantitative variable exceeds certain critical thresholds.
For the variable Monthly income, we can consider the following categories:
If monthly income ≥ 5,000: high income
If 2,000 ≤ monthly income < 5,000: medium-high income
If 1,000 ≤ monthly income < 2,000: medium-low income
If monthly income < 1,000: low income
Discretisation entails a loss of information.
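The income categories can be expressed as a small function. This sketch treats each upper bound as strict (e.g. "below 5,000"), so that every possible income, including values such as 4,999.50, falls into exactly one category; the sample incomes are hypothetical.

```python
def income_category(monthly_income):
    """Discretise monthly income into four ordered categories."""
    if monthly_income >= 5000:
        return "high income"
    if monthly_income >= 2000:
        return "medium-high income"
    if monthly_income >= 1000:
        return "medium-low income"
    return "low income"

# Hypothetical monthly incomes
incomes = [850, 1000, 1999.99, 2500, 4999.50, 5000, 12000]
print([income_category(x) for x in incomes])
```

The loss of information is visible in the output: 2,500 and 4,999.50 both become "medium-high income", and 5,000 and 12,000 become indistinguishable.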

According to their measurement scale, variables are nominal or ordinal (if qualitative), or interval or ratio (if quantitative).

NOMINAL: numbers serve only as labels for identifying individuals, and they are arbitrarily assigned. Categories cannot be rank ordered.
E.g. gender, marital status.

ORDINAL: categories can be ordered in a hierarchical fashion, but the values do not convey the relative distance between categories.
E.g. ranking of sportsmen, socioeconomic status, opinion.
opinion
INTERVAL: it provides distance properties, i.e., it allows the differences between individuals to be compared. The origin (zero point) is arbitrary.
E.g. time, temperature (°C).

RATIO: it provides assignment, order, distance and origin properties. The origin (zero) means absence of the characteristic.
E.g. sales, age, number of children in a household.

Measurement scale determines which statistical techniques can be applied.
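A quick check shows why ratios are meaningful on a ratio scale but not on an interval scale. The temperatures and sales figures are hypothetical; the sketch converts each to another unit and compares the results.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

# Interval scale: the zero point is arbitrary, so ratios are not meaningful
t1_c, t2_c = 20.0, 10.0
ratio_c = t1_c / t2_c                                                # 2.0
ratio_f = celsius_to_fahrenheit(t1_c) / celsius_to_fahrenheit(t2_c)  # 68 / 50 = 1.36

# ...but differences keep their meaning: 10 degrees C = 18 degrees F
diff_f = celsius_to_fahrenheit(t1_c) - celsius_to_fahrenheit(t2_c)   # 18.0

# Ratio scale: zero means absence, so ratios survive a change of unit
sales_1_eur, sales_2_eur = 200.0, 100.0
ratio_eur = sales_1_eur / sales_2_eur                      # 2.0
ratio_cents = (sales_1_eur * 100) / (sales_2_eur * 100)    # still 2.0

print(ratio_c, ratio_f, diff_f, ratio_eur, ratio_cents)
```

A day at 20 °C is not "twice as hot" as one at 10 °C, because the answer changes with the unit, whereas sales of 200 are twice sales of 100 whatever the currency unit.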

In Statistics (probability/inference), we will usually work with RANDOM VARIABLES.

Differences between a variable and a random variable?
The realisations (observed values) of a random variable depend on chance, according to a probability distribution.
A sample/dataset is a collection of realised random variables.

Examples of random variables
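As a hypothetical illustration, the number shown by a fair six-sided die is a random variable X: its values 1 to 6 each occur with probability 1/6. The sketch below generates realisations of X; the seed and the sample size of 20 are arbitrary choices.

```python
import random

rng = random.Random(10)

def roll_die():
    """One realisation of X = number shown by a fair six-sided die."""
    return rng.randint(1, 6)  # each value 1..6 has probability 1/6

# A sample is a collection of realisations of the random variable
sample = [roll_die() for _ in range(20)]
print(sample)
print(sum(sample) / len(sample))  # sample mean; the population mean E[X] is 3.5
```

Running the code with a different seed gives a different dataset from the same random variable, which is exactly what distinguishes a realised sample from the random variable that generated it.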


Thanks for your attention

