
September 15

Lecture 1: Introduction to Data

Statistics 1

Theme: Science Behind Data

An Introduction to Data

JP/Stat1/2015 1

Theme: Science Behind Data

WeAreData.Watchdogs.com


Video

The Making of WeAreData


Overview

1.1 Types of Data


1.2 Data Collection
1.3 Data Presentation
1.4 Data Analysis
1.5 Mathematical Background

Introduction

What is Data?


Introduction

• Data is a collection of facts, such as values or measurements.

• It can be numbers, words, measurements, observations or even just descriptions of things.

1.1 – Types of Data

Types of Data
• Qualitative Data - Data which assume non-numerical values.

• Quantitative Data - Data which assume numerical values.


1.1 – Types of Data

Types of Data
Data can be classified into the following types:

Categorical Data
  Concept: The values of these variables are selected from a predetermined list of categories.
  Subtypes: None.
  Examples: Gender, a variable that has categories of “male” and “female”. Academic major, a variable having categories such as “Math”, “Science” etc.

Numerical Data
  Concept: The values of these variables involve a counted or measured value.
  Subtypes: Discrete values are counts of items. Continuous values are measures and can take any value which can theoretically occur, limited only by the precision of the measuring process.
  Examples: The number of people living in a household is a discrete numerical variable. The time taken to commute to school is a continuous variable.

1.1 – Types of Data

[Finger Exercise - Quick Poll]


Q1.1 The body style of an automobile (sedan, coupe, wagon, etc.) is an example of a:

(a) Discrete variable


(b) Continuous variable
(c) Categorical variable
(d) Constant


1.1 – Types of Data

Types of Data
• Discrete Data - Data which assume a finite or countable number of possible values. Usually obtained by counting.

• Continuous Data - Data which can assume any value within an interval, so the possible values are uncountably many. Usually obtained by measurement.
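The distinction between categorical, discrete and continuous data can be sketched in Python. This is only a rough heuristic based on how values are stored, and the function name `classify_variable` and the sample values are assumptions made for illustration. The storage type is a hint, not a rule: a postal code stored as a whole number is still categorical.

```python
from numbers import Integral, Real


def classify_variable(values):
    """Roughly classify a list of observations by data type.

    Heuristic only: text -> categorical, whole numbers -> discrete,
    other real numbers -> continuous.
    """
    if all(isinstance(v, str) for v in values):
        return "categorical"
    if all(isinstance(v, Integral) and not isinstance(v, bool) for v in values):
        return "discrete"
    if all(isinstance(v, Real) and not isinstance(v, bool) for v in values):
        return "continuous"
    return "mixed"


# Made-up observations for three variables.
academic_major = ["Math", "Science", "Math"]   # categories
household_size = [1, 4, 2, 3]                  # counts
commute_minutes = [12.5, 30.0, 47.25]          # measurements

for name, values in [
    ("academic_major", academic_major),
    ("household_size", household_size),
    ("commute_minutes", commute_minutes),
]:
    print(name, "->", classify_variable(values))
```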

1.1 – Types of Data

Data Classification
• Nominal Level - Level of measurement which classifies data into mutually exclusive categories in which no order or ranking can be imposed on the data.

• Ordinal Level - Level of measurement which classifies data into categories that can be ranked. Precise differences between the ranks are not meaningful.

• Interval Level - Level of measurement which classifies data that can be ranked and differences are meaningful. However, there is no meaningful zero, so ratios are meaningless.

• Ratio Level - Level of measurement which classifies data that can be ranked, differences are meaningful, and there is a true zero. True ratios exist between the different units of measure.
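A small sketch of why ratios fail at the interval level but work at the ratio level. The temperature example is not from the slides; it is a common illustration, with Celsius treated as an interval scale (arbitrary zero) and Kelvin as a ratio scale (true zero).

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature (interval level) to Kelvin (ratio level)."""
    return celsius + 273.15


cold_c, warm_c = 10.0, 20.0

# Differences are meaningful on an interval scale.
difference = warm_c - cold_c

# The Celsius ratio depends on an arbitrary zero, so it is not meaningful:
# 20 degrees Celsius is not "twice as hot" as 10 degrees Celsius.
celsius_ratio = warm_c / cold_c

# Kelvin has a true zero, so this ratio is meaningful.
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

print(difference)              # 10.0
print(celsius_ratio)           # 2.0
print(round(kelvin_ratio, 3))  # 1.035
```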


1.1 – Types of Data

[Finger Exercise – Quick Poll]


Q1.2 The height of an individual is an example of a:

(a) Discrete variable


(b) Continuous variable
(c) Categorical variable
(d) Constant

1.1 – Types of Data

[Finger Exercise – Quick Poll]


Q1.3 The number of credit cards in a person’s wallet is
an example of a:

(a) Discrete variable


(b) Continuous variable
(c) Categorical variable
(d) Constant


1.1 – Types of Data

[Finger Exercise – Quick Poll]


Q1.4 Which of the following is a discrete variable?

(a) The favorite flavor of ice cream of students at a primary school
(b) The time taken for a student to walk to school
(c) The distance between a student's home and their primary school
(d) The number of teachers employed at a primary school

1.1 – Types of Data

[Finger Exercise – Quick Poll]


Q1.5 Which of the following is a continuous variable?

(a) The eye color of children eating at a fast food chain
(b) The number of employees of a branch of a fast food chain
(c) The temperature at which a hamburger is cooked at a branch of a fast food chain
(d) The number of hamburgers sold in a day at a branch of a fast food chain


1.1 – Types of Data

[Finger Exercise – Quick Poll]


Q1.6 The number of cars that arrive per hour at a
parking lot is an example of:

(a) A categorical variable


(b) A discrete variable
(c) A continuous variable
(d) A statistic

1.2 – Data Collection

Data Collection…


1.2 – Data Collection

Sources of Data
• Published Sources

Concept – Data available in print or in electronic form, including data found on websites. Primary data sources are those published by the individual or group that collected the data. Secondary data sources are those compiled from primary sources.

Examples – Newspapers, Department of Statistics.

1.2 – Data Collection

Some Data Sources


• Singapore Government Data –
www.data.gov.sg
• World Bank Data –
www.data.worldbank.org
• National Bureau of Economic Research –
www.nber.org/data/
• Guardian.co.uk –
http://www.guardian.co.uk/data


1.2 – Data Collection

Sources of Data
• Experiments

Concept – A study that examines the effect on a variable of varying the values of another variable or variables, while keeping all else equal. A typical experiment consists of both a treatment group and a control group.

Examples – Pharmaceutical companies use experiments to determine whether a new drug is effective.

1.2 – Data Collection

Sources of Data
• Surveys

Concept – A process that uses questionnaires or similar means to get values for the responses from a group of participants.

Examples – A poll of likely voters, Census exercise by the Department of Statistics.


1.2 – Data Collection

[Finger Exercise – Quick Poll]


Q1.7 You are working on a project to examine the value of the American dollar as compared to the British pound. You accessed the internet, where you got this information for the past 50 years. Which method of data collection were you using?

(a) Published sources


(b) Experiments
(c) Surveys

1.2 – Data Collection

Population Vs Sample
• The population includes all objects of interest
whereas the sample is only a portion of the
population.

• We don't usually work with populations as they are usually large, and it is often impossible to get data for every object we're studying. Sampling does not usually occur without cost, and the more items surveyed, the larger the cost.


1.2 – Data Collection

[Finger Exercise – Quick Poll]


Q1.8 The portion of the population that is selected for analysis is called:

(a) A sample
(b) A frame
(c) A parameter
(d) A statistic

1.2 – Data Collection

• A Census is when you collect data for every member of the group (the whole "population").

• A Sample is when you collect data just for selected members of the group.
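The difference between a census and a sample can be sketched as follows. The population of ID numbers, the sample size and the seed are hypothetical choices for illustration, and simple random sampling is only one of several ways a sample could be selected.

```python
import random

# Hypothetical population: ID numbers 1..5000 (not data from the lecture).
population = list(range(1, 5001))

# A census uses every member of the population.
census = population

# A simple random sample selects members without replacement.
rng = random.Random(42)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, 100)

print(len(census))  # 5000
print(len(sample))  # 100
```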


1.2 – Data Collection

[Finger Exercise – Quick Poll]


Q1.9 The human resource director of a large corporation wants to develop a dental benefits package and decides to select 100 employees from a list of all 5000 workers in order to study their preferences for the various components of a package. The 100 selected employees constitute the:

(a) Sample
(b) Population
(c) Statistic
(d) Parameter

1.3 – Data Presentation

Data Presentation…


1.3 – Data Presentation

Presenting Categorical Data


Categorical Data

• Tabulating Data: Frequency Summary Table, 2-way Cross Classification
• Graphing Data: Bar Chart, Pie Chart, Pareto Diagram

1.3 – Data Presentation

Presenting Numerical Data


Numerical Data

• Frequency Distributions and Cumulative Distributions
• Stem-and-Leaf Display
• Line Charts and Scatter Plots
• Histogram


1.4 – Data Analysis

Data Analysis…

1.4 – Data Analysis

The Branches of Statistics


You can use parameters or statistics to describe your variables or to draw conclusions from your data. These purposes define the two branches of statistics, namely, descriptive statistics and inferential statistics.


1.4 – Data Analysis

The Branches of Statistics


• Descriptive Statistics

Concept – The branch of Statistics that focuses on collecting, summarizing, and presenting a set of data.

Examples – The average age of citizens who live in an area, the variation in the weight of 100 boxes of rice chosen from a factory.

1.4 – Data Analysis

The Branches of Statistics


• Inferential Statistics

Concept – The branch of statistics that analyzes sample data to reach conclusions about a population.

Examples – A survey that sampled 1200 women found that 52% of those polled considered recommendations from friends and family most reliable, compared to only 5% who considered advertising trustworthy.


1.4 – Data Analysis

[Finger Exercise – Quick Poll]


Q1.10 Those methods that involve collecting,
presenting and computing characteristics of a set of
data in order to properly describe the various features
of the data are called:

(a) Statistical inference


(b) The scientific method
(c) Sampling
(d) Descriptive statistics

1.4 – Data Analysis

[Finger Exercise – Quick Poll]


Q1.11 Based on the results of a poll of 500
registered voters, the conclusion that a particular
candidate for president will win in the upcoming
election is an example of:

(a) Inferential statistics


(b) Descriptive statistics
(c) A parameter
(d) A statistic


1.4 – Data Analysis

[Finger Exercise – Quick Poll]


Q1.12 Statistical inference occurs when you:

(a) Compute descriptive statistics from a sample


(b) Take a complete census of a population
(c) Present a graph of data
(d) Take the results of a sample and reach
conclusions about a population.

1.4 – Data Analysis

Parameter Vs Statistic
• Parameters are associated with populations
and statistics with samples.

• We compute statistics, and use them to estimate parameters.


1.4 – Data Analysis

Parameter Vs Statistic
• Parameter

Concept – A numerical measure that describes a variable of a population.

Examples – The percentage of all registered voters who intend to vote in the next election.

1.4 – Data Analysis

Parameter Vs Statistic
• Statistic

Concept – A numerical measure that describes a variable of a sample.

Examples – The percentage of registered voters in a sample who intend to vote in the next election.

Interpretation – Computing statistics for a sample is the most common approach, since collecting population data is impractical in most situations.
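The relationship between a parameter and a statistic can be sketched with a made-up population of voters. All numbers below (5000 voters, 3000 intending to vote, a sample of 200, the seed) are assumptions for illustration only. In practice the parameter is usually unknown, which is why the statistic is used to estimate it.

```python
import random

# Hypothetical population of 5000 registered voters:
# 1 = intends to vote, 0 = does not (made-up values for illustration).
population = [1] * 3000 + [0] * 2000

# Parameter: computed from every member of the population.
parameter = sum(population) / len(population)

# Statistic: computed from a sample only, used to estimate the parameter.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, 200)
statistic = sum(sample) / len(sample)

print(f"Population proportion (parameter): {parameter:.1%}")
print(f"Sample proportion (statistic):     {statistic:.1%}")
```

The two values will generally be close but not equal, and a different sample would give a different statistic.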


1.4 – Data Analysis

[Finger Exercise – Quick Poll]


Q1.13 A summary measure that is computed
from only a sample of the population is called:

(a) A parameter
(b) A population
(c) A discrete variable
(d) A statistic

1.4 – Data Analysis

[Finger Exercise – Quick Poll]


Q1.14 A numerical measure that is
computed to describe a characteristic of an entire
population is called:

(a) A parameter
(b) A population
(c) A discrete variable
(d) A statistic


1.4 – Data Analysis

[Finger Exercise – Quick Poll]


Q1.15 The human resource director of a large corporation wants to develop a dental benefits package and decides to select 100 employees from a list of all 5000 workers in order to study their preferences for the various components of a package. All the employees in the corporation constitute the:

(a) Sample
(b) Population
(c) Statistic
(d) Parameter

1.5 – Mathematical Background Review

Mathematics
Background


1.5 – Mathematical Background Review

Mathematical Review
This review aims to go through:

• The basic rules of arithmetic operations

• Notations concerned with absolute value and ‘greater than’ (>) and ‘less than’ (<) signs

• Summation signs to calculate simple statistical measures

• The properties of a straight line graph such as its slope and co-ordinates.

1.5 – Mathematical Background Review

BODMAS Rule Revisited


• Make sure you understand the rules of BODMAS, which
stands for:

• Brackets
• Of
• Division
• Multiplication
• Addition and
• Subtraction.

This shows the order in which you should prioritise arithmetic operations.
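A short Python sketch of the order of operations; the expressions are arbitrary examples. Python's operator precedence agrees with BODMAS for these operations. Two points worth noting: "Of" is read here as multiplication (as in "half of 10"), and division and multiplication share one level and are worked left to right, as do addition and subtraction.

```python
# Multiplication is done before addition.
without_brackets = 2 + 3 * 4   # 2 + 12

# Brackets are evaluated first.
with_brackets = (2 + 3) * 4    # 5 * 4

# Division and multiplication share a level and run left to right.
left_to_right = 8 / 4 * 2      # (8 / 4) * 2

# "Of" means multiplication, for example half of 10.
half_of_ten = 0.5 * 10

print(without_brackets)  # 14
print(with_brackets)     # 20
print(left_to_right)     # 4.0
print(half_of_ten)       # 5.0
```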


1.5 – Mathematical Background Review

Squares and Square Roots


• When a number x is multiplied by itself we write x², (that is x × x, the square of x).

• Remember that x² is always non-negative.

• It may help you to think of the square root of x (written as √x) as the opposite of the square, so that √x × √x = x.
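These properties can be checked with a few lines of Python. The values -3 and 16 are arbitrary, and `math.isclose` is used because square roots are generally computed only approximately in floating point.

```python
import math

x = -3
square = x * x       # same as x ** 2
print(square)        # 9, non-negative even though x is negative

y = 16
root = math.sqrt(y)  # defined for y >= 0
print(root)          # 4.0

# Multiplying the square root by itself recovers the original number.
print(math.isclose(root * root, y))  # True
```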

1.5 – Mathematical Background Review

Proportions and Percentages


• What is 95% of 200?

• Give 15/25 as a percentage.

• What is 25% of 98/145?
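One way to work through the three questions above is with exact fractions. This is a sketch using Python's standard `fractions` module; the variable names are arbitrary.

```python
from fractions import Fraction

# 95% of 200: convert the percentage to a proportion, then multiply.
a = Fraction(95, 100) * 200

# 15/25 as a percentage: multiply the proportion by 100.
b = Fraction(15, 25) * 100

# 25% of 98/145.
c = Fraction(25, 100) * Fraction(98, 145)

print(a)         # 190
print(b)         # 60
print(c)         # 49/290
print(float(c))  # roughly 0.169
```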


1.5 – Mathematical Background Review

The absolute value in Statistics


• One useful sign in statistics is | | (the absolute value of). Statisticians
sometimes want to indicate that they only want to use the positive value
of a number.

• Give the absolute values for:

a. | − 5 |
b. |(11 − 8)|.
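The two exercises can be checked with Python's built-in `abs`. The second part of the sketch shows one typical statistical use, the size of a deviation from a reference value regardless of direction; the observations and reference value there are made up for illustration.

```python
print(abs(-5))      # 5
print(abs(11 - 8))  # 3

# Size of each deviation from a reference value, ignoring its sign.
observations = [2, 7, 4]
reference = 5
absolute_deviations = [abs(x - reference) for x in observations]
print(absolute_deviations)  # [3, 2, 1]
```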

1.5 – Mathematical Background Review

‘Greater than’ and ‘less than’ signs


• > means ‘is greater than’

• ≥ means ‘is greater than or equal to’

• < means ‘is less than’

• ≤ means ‘is less than or equal to’

• ≈ means ‘is approximately equal to’.


1.5 – Mathematical Background Review

‘Greater than’ and ‘less than’ signs


• For which of the following is x > 3?
2, 3, 7, 9.

• For which is x < 3?

• For which is x ≤ 3?

• For which is x² ≥ 49?
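The four conditions can be tested against the listed values 2, 3, 7 and 9 with simple filters. This is a sketch; the variable names are arbitrary.

```python
candidates = [2, 3, 7, 9]

greater_than_3 = [x for x in candidates if x > 3]
less_than_3 = [x for x in candidates if x < 3]
at_most_3 = [x for x in candidates if x <= 3]
square_at_least_49 = [x for x in candidates if x ** 2 >= 49]

print(greater_than_3)      # [7, 9]
print(less_than_3)         # [2]
print(at_most_3)           # [2, 3]
print(square_at_least_49)  # [7, 9]
```

Note that 3 satisfies x ≤ 3 but not x < 3, and 7 satisfies x² ≥ 49 because equality is allowed.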

1.5 – Mathematical Background Review

Summation Sign Σ
• One of the basic quantities you will meet is
the arithmetic mean.

• Given x₁ = 2, x₂ = 1, x₃ = 3, x₄ = 5, x₅ = 8, find the mean.

• Mean = (∑ xᵢ) / 5, where the sum runs over i = 1 to 5.
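The summation and the mean for the five values above can be computed directly. This is a minimal sketch using Python's built-in `sum`.

```python
xs = [2, 1, 3, 5, 8]

total = sum(xs)  # the summation: x1 + x2 + x3 + x4 + x5
n = len(xs)      # number of observations
mean = total / n

print(total)  # 19
print(mean)   # 3.8
```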


1.5 – Mathematical Background Review

Graphs

[Figure: a point (x, f(x)) marked on the graph of a function f]


1.5 – Mathematical Background Review

The Graph of a Linear Function

[Figure: graph of the straight line y = 2x + 3]
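The slope and co-ordinates of this line can be sketched in Python. The two x-values used to measure the slope are arbitrary choices; any two distinct points on a straight line give the same slope.

```python
def f(x):
    """The linear function from the slide: y = 2x + 3."""
    return 2 * x + 3


# Two arbitrary points on the line (x-values chosen for illustration).
x1, x2 = 1, 4
y1, y2 = f(x1), f(x2)

slope = (y2 - y1) / (x2 - x1)  # rise over run
intercept = f(0)               # value of y where the line crosses the y-axis

print((x1, y1), (x2, y2))  # (1, 5) (4, 11)
print(slope)               # 2.0
print(intercept)           # 3
```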
