
Basic Statistics: Module 1

Caring People Building Businesses. Building Careers.


Global Reporting & Analytics
By the end of the session, analysts should be able to:

Define and describe basic statistical and analytical terms

Identify the appropriate tools and strategies to be used for various problems


Company Confidential
Agenda
I. Introduction: What is Statistics?
A. Terms and Definitions
a. Types of Data
b. Population and Sample
II. Distributions & Summary Statistics
A. Describing the distribution:
a. Shape
1. Symmetry
2. Skewness
3. Modality
b. Center
1. Average/Mean
2. Median
3. Mode
c. Spread
1. Percentile, Deciles and Quartiles
2. Range
3. Inter-quartile Range
4. Variance
5. Standard Deviation

Types of Data

Numeric Data (Quantitative Data)

Continuous: values that fall on a continuous scale (e.g., weight in lbs.)

Categorical Data (Qualitative Data)

Ordinal: the order of the values has meaning (e.g., year, weeks)
Nominal: values name or describe the object of interest, with no inherent order

Data Type Conversion
The tenure (in months) of Makati Analysts:
44, 90, 80, 135, 21, 53, 29, 128, 47, 11, 15, 49, 66, 49, 21, 110, 23,
50, 48, 50, 47, 45

The data can be further categorized into:

Ordinal:
less than 6 months
6 months to less than a year
1 year to less than 2 years
2 years and more

Nominal:
Not tenured
Tenured
Legend
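
The conversion from numeric tenure to the ordinal bands can be sketched in Python. This is a minimal illustration rather than part of the original module: the helper name `tenure_band` is hypothetical, and a year is assumed to be 12 months. The nominal labels are not coded because the slide does not state their cut-offs.

```python
from collections import Counter

# Tenure (in months) of Makati Analysts, copied from the slide.
tenure = [44, 90, 80, 135, 21, 53, 29, 128, 47, 11, 15, 49, 66, 49, 21, 110, 23,
          50, 48, 50, 47, 45]

def tenure_band(months):
    """Map a tenure in months to one of the four ordinal bands (hypothetical helper)."""
    if months < 6:
        return "less than 6 months"
    if months < 12:  # assumption: 1 year = 12 months
        return "6 months to less than a year"
    if months < 24:
        return "1 year to less than 2 years"
    return "2 years and more"

# Count how many analysts fall into each band.
band_counts = Counter(tenure_band(m) for m in tenure)
for band, count in band_counts.items():
    print(band, count)
```

Note that the conversion only works in one direction: once the months are grouped into bands, the original numeric values cannot be recovered from the categories.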

Data Type Drill
Determine the type of data for the following:
If possible, how can these be converted to other data types?

Population vs. Sample

A collection or set of well-defined objects is called a Population.

A subset of a Population is called a Sample.
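
As a small illustration of the two definitions, the sketch below draws a sample from a population in Python. Treating the 22 tenure values from the earlier slide as the whole population, the sample size of 5, and the fixed seed are all assumptions made for the example.

```python
import random

# Assumed population: tenure (in months) of all 22 Makati Analysts.
population = [44, 90, 80, 135, 21, 53, 29, 128, 47, 11, 15, 49, 66, 49, 21, 110, 23,
              50, 48, 50, 47, 45]

random.seed(1)  # fixed seed only so the illustration is repeatable

# A sample is a subset of the population; here, 5 values drawn
# at random without replacement.
sample = random.sample(population, k=5)
print(sample)
```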

Population vs. Sample

The pattern of how the data are distributed across the measurement scale is called a Distribution.

Describing a Distribution
We describe a distribution by its
1. Shape: usually described by
Symmetry
Modality
Outliers
2. Center: refers to the measure of the middle or expected value of the data set
Mean
Median
Mode
3. Spread: also called variation, denotes variability in a distribution
Percentile, Decile, Quartile
Range
Interquartile Range
Standard Deviation and Variance

Shape: Symmetry

Symmetric
Left and right side of the center are mirror images of each other.

Skewed
Skewed to the Right (Positively Skewed): long tail to the right
Skewed to the Left (Negatively Skewed): long tail to the left
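
The direction of skew can also be checked numerically. The slides do not give a formula, so the sketch below assumes one common measure, the moment-based skewness coefficient (third central moment divided by the second central moment raised to the power 1.5). The function name `skewness` is hypothetical.

```python
# Tenure (in months) of Makati Analysts, copied from the slides.
tenure = [44, 90, 80, 135, 21, 53, 29, 128, 47, 11, 15, 49, 66, 49, 21, 110, 23,
          50, 48, 50, 47, 45]

def skewness(values):
    """Moment-based skewness coefficient (an assumed measure, not from the slides)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

g1 = skewness(tenure)
if g1 > 0:
    print("skewed to the right (positively skewed)")
elif g1 < 0:
    print("skewed to the left (negatively skewed)")
else:
    print("symmetric")
```

A positive coefficient corresponds to a long tail to the right, a negative one to a long tail to the left, and a value of zero to a symmetric set of values.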

Shape: Modality

Modality
Refers to the number of peaks in a dataset.
Mode is the most frequent value in a dataset

Outliers

Outliers
Observations that deviate markedly from the rest of the data
Could result from special causes; may indicate bad data

Shape of the Tenure Data of Makati Analysts

The tenure (in months) of Makati Analysts:


44, 90, 80, 135, 21, 53, 29, 128, 47, 11, 15, 49, 66, 49, 21, 110, 23,
50, 48, 50, 47, 45
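[Figure: histogram of the tenure data. Horizontal axis: tenure bins labeled 24, 36, 48, 60, 72, 84, 96, More. Vertical axis: Frequency, from 0 to 6.]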
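
The frequencies behind a histogram like this one can be tabulated in Python. The sketch assumes each bin label is an inclusive upper bound (so the bin labeled 24 holds values up to and including 24) and that "More" collects everything above 96; the slide does not state its binning convention, so the counts depend on that assumption.

```python
import bisect

# Tenure (in months) of Makati Analysts, copied from the slide.
tenure = [44, 90, 80, 135, 21, 53, 29, 128, 47, 11, 15, 49, 66, 49, 21, 110, 23,
          50, 48, 50, 47, 45]

upper_bounds = [24, 36, 48, 60, 72, 84, 96]  # assumed inclusive upper bounds
labels = [str(bound) for bound in upper_bounds] + ["More"]

frequencies = [0] * len(labels)
for months in tenure:
    # bisect_left returns the index of the first bound >= months;
    # values above 96 get index 7, which is the "More" bin.
    frequencies[bisect.bisect_left(upper_bounds, months)] += 1

for label, frequency in zip(labels, frequencies):
    print(f"{label:>4} {'#' * frequency} ({frequency})")
```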

Measures of Central Tendency
Mean, Median, Mode

Mean
Ratio between the sum of values and the number of values
Arithmetic average

Median
Middle value of a set of data that has been put into rank order.

Mode
Observation with the highest frequency

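Central tendency measures of the tenure data of Makati Analysts

Mean 55.05
Median 48.50
Mode 21 (47, 49 and 50 also occur twice each, so the mode is not unique)

[Figure: histogram of the tenure data with Mean and Median labels. Horizontal axis: tenure bins labeled 24, 36, 48, 60, 72, 84, 96, More. Vertical axis: Frequency, from 0 to 6.]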
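
These three measures can be reproduced with Python's standard `statistics` module. This is a minimal sketch; `statistics.multimode` requires Python 3.8 or later and returns every value tied for the highest frequency, in order of first appearance in the data.

```python
import statistics

# Tenure (in months) of Makati Analysts, copied from the slides.
tenure = [44, 90, 80, 135, 21, 53, 29, 128, 47, 11, 15, 49, 66, 49, 21, 110, 23,
          50, 48, 50, 47, 45]

mean = statistics.mean(tenure)        # sum of values / number of values
median = statistics.median(tenure)    # middle value after ranking (average of the two middle values here)
modes = statistics.multimode(tenure)  # all values with the highest frequency

print(f"Mean   {mean:.2f}")
print(f"Median {median:.2f}")
print(f"Modes  {modes}")
```

The mean is larger than the median, which is consistent with a distribution whose long tail is on the right.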

Measures of Spread

Measures of spread are descriptive statistics that describe how similar the scores in a set are to each other
The more similar the scores are to each other, the lower the measure of spread will be
The less similar the scores are, the higher the measure of spread will be
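
A short sketch makes the point concrete. Both score sets below are hypothetical, invented for this illustration, and share the same mean of 50; the helper name `spread_summary` is also hypothetical. The range and the population standard deviation (via `statistics.pstdev`) are used as the measures of spread.

```python
import statistics

# Hypothetical score sets with the same mean (50) but different spread.
similar_scores = [48, 49, 50, 51, 52]
dissimilar_scores = [10, 30, 50, 70, 90]

def spread_summary(scores):
    """Return (range, population standard deviation) for a list of scores."""
    value_range = max(scores) - min(scores)
    std_dev = statistics.pstdev(scores)
    return value_range, std_dev

for name, scores in [("similar", similar_scores), ("dissimilar", dissimilar_scores)]:
    value_range, std_dev = spread_summary(scores)
    print(f"{name:>10}: range = {value_range}, standard deviation = {std_dev:.2f}")
```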

Percentiles, Deciles and Quartiles
Percentiles
Are values that divide a set of observations into 100 equal parts
Denoted by P1, P2, ..., P99
These values are such that 1% of the data falls below P1, 2% falls below P2, ..., and 99% falls below P99.
Deciles
Are values that divide a set of observations into 10 equal parts
Denoted by D1, D2, ..., D9
These values are such that 10% of the data falls below D1, 20% falls below D2, ..., and 90% falls below D9.
Quartiles
Are values that divide a data set into 4 equal parts
Denoted by Q1, Q2, and Q3
These values are such that 25% of the data falls below Q1, 50% falls below
Q2, and 75% falls below Q3.

How to Compute Percentiles

Say we are computing the 85th percentile (P85)

Let n = total number of observations = 40
1. Sort the data from lowest to highest
2. Compute (85/100) x 40 = 34
3. P85 is the 34th observation in the sorted data.

Compute the 50th percentile of the Makati Analysts tenure data:
44, 90, 80, 135, 21, 53, 29, 128, 47, 11, 15, 49, 66, 49, 21, 110, 23, 50, 48,
50, 47, 45
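
The positional rule can be sketched in Python with the data sorted in ascending order, so that the stated share of the data lies at or below the result. The function name `percentile_by_position` is hypothetical, and rounding a fractional position up to the next whole observation is an assumption, since the slide only shows a case where the position is a whole number.

```python
# Tenure (in months) of Makati Analysts, copied from the slide.
tenure = [44, 90, 80, 135, 21, 53, 29, 128, 47, 11, 15, 49, 66, 49, 21, 110, 23,
          50, 48, 50, 47, 45]

def percentile_by_position(values, p):
    """Return the observation at position (p/100) x n in the ascending data.

    p is a whole-number percentile (1..99). A fractional position is
    rounded up (assumption; the slide does not cover that case).
    """
    ordered = sorted(values)         # lowest to highest
    n = len(ordered)
    position = (p * n + 99) // 100   # integer ceiling of p * n / 100
    position = max(position, 1)      # guard against position 0
    return ordered[position - 1]     # positions are 1-based

# (50/100) x 22 = 11, so P50 is the 11th observation in sorted order.
p50 = percentile_by_position(tenure, 50)
print(p50)
```

Under this rule the 50th percentile of the tenure data is the 11th sorted observation, 48. Conventions that average the two middle observations give 48.5 instead, so small differences between tools are expected.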

How to Compute Percentiles, Deciles and Quartiles

Say we are computing the 85th percentile (P85)

Let n = total number of observations = 40
1. Sort the data from lowest to highest
2. Compute (85/100) x 40 = 34
3. P85 is the 34th observation in the sorted data.

To compute quartiles (e.g., Q1):
(25/100) x 40 = 10
Look for the 10th observation

To compute deciles (e.g., D7):
(70/100) x 40 = 28
Look for the 28th observation
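
Because quartiles and deciles are just particular percentiles (Q1 = P25, D7 = P70), one positional helper covers all three. The sketch below is illustrative only: the function names are hypothetical, the 40 observations are stand-in values 1 to 40 chosen so that the k-th sorted observation equals k, and fractional positions are rounded up as an assumption.

```python
def percentile_by_position(values, p):
    """Observation at position (p/100) x n in ascending order (fractions rounded up)."""
    ordered = sorted(values)
    position = max((p * len(ordered) + 99) // 100, 1)  # integer ceiling, at least 1
    return ordered[position - 1]

def quartile(values, k):
    """k-th quartile (k = 1, 2, 3), i.e. the (25 x k)-th percentile."""
    return percentile_by_position(values, 25 * k)

def decile(values, k):
    """k-th decile (k = 1..9), i.e. the (10 x k)-th percentile."""
    return percentile_by_position(values, 10 * k)

# Stand-in data: n = 40, and the k-th sorted observation is simply k.
observations = list(range(1, 41))

print(quartile(observations, 1))  # 10th observation
print(decile(observations, 7))    # 28th observation
```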

The Range

The range is defined as the difference between the largest and smallest scores in the data set.

The interquartile range (IQR) is defined as the difference between the third and first quartiles
IQR = Q3 - Q1, where Q1 is the 1st quartile (the 25th percentile) and Q3 is the 3rd quartile (the 75th percentile)
It is a measure that indicates the extent to which the central 50% of values within the data set are dispersed.
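
Both measures can be computed for the Makati Analysts tenure data with a short Python sketch. The quartiles here use the positional rule from the percentile slides, with fractional positions rounded up; that rounding and the helper name `percentile_by_position` are assumptions, and other quartile conventions would give somewhat different values.

```python
# Tenure (in months) of Makati Analysts, copied from the slides.
tenure = [44, 90, 80, 135, 21, 53, 29, 128, 47, 11, 15, 49, 66, 49, 21, 110, 23,
          50, 48, 50, 47, 45]

def percentile_by_position(values, p):
    """Observation at position (p/100) x n in ascending order (fractions rounded up)."""
    ordered = sorted(values)
    position = max((p * len(ordered) + 99) // 100, 1)  # integer ceiling, at least 1
    return ordered[position - 1]

# Range: largest score minus smallest score.
value_range = max(tenure) - min(tenure)

# IQR: third quartile (P75) minus first quartile (P25).
q1 = percentile_by_position(tenure, 25)
q3 = percentile_by_position(tenure, 75)
iqr = q3 - q1

print(f"Range = {value_range}")
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

The IQR ignores the lowest and highest quarters of the data, so it is much less affected by extreme values than the range.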

Computing for IQR
