
Chapter 1: Data and Statistics

• Statistics
• Data and Data Sources
• Descriptive Statistics
• Statistical Inference
• Analytics
• Big Data and Data Mining
• Computers and Statistical Analysis
• Ethical Guidelines for Statistical Practice

EC 203 - SPRING 2018 1


What is Statistics?
• The term statistics can refer to numerical facts such as averages, medians, percentages,
and maximums that help us understand a variety of business and economic situations.

• Statistics can also refer to the art and science of collecting, analyzing, presenting, and
interpreting data.



Applications in Business and Economics
Accounting
Public accounting firms use statistical sampling procedures when conducting audits for their clients.

Economics
Economists use statistical information in making forecasts or to explain interactions between different
economic factors.

Finance
Financial advisors use price/earnings ratios and dividend yields to guide their investment advice on
stocks.



Applications in Business and Economics
Marketing
Electronic scanners at retail checkout counters are used to collect data for a variety of marketing
research applications.

Production
Statistical quality control charts are used to monitor the output of a production process.

Information Systems
A variety of statistical information helps administrators assess the performance of computer
networks.



Data and Data Sets
• Data are the facts and figures collected, analyzed, and summarized for presentation
and interpretation.

• All the data collected in a particular study are referred to as the data set for the study.



Elements, Variables, and Observations
• Elements are the entities on which data are collected.
  • e.g., students in EC 203
• A variable is a characteristic of the elements.
  • e.g., age, gender
• The set of measurements for a particular element is called an observation.
• The total number of data values in a complete data set is the number of observations
multiplied by the number of variables.



Data Set Example:
Company        Stock Exchange   Annual Sales ($M)   Earnings per share ($)
Dataram        NQ                           73.10                     0.86
EnergySouth    N                            74.00                     1.67
Keystone       N                           365.70                     0.86
LandCare       NQ                          111.40                     0.33
Psychemedics   N                            17.60                     0.13

- What’s the data element?


- How many variables?
- How many observations?
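
One way to check the counting rule (data values = observations × variables) against the table above is sketched below. It assumes that each company is an element and that the company name only identifies the element rather than counting as a variable; the dictionary keys are illustrative names, not part of the original table.

```python
# Data set from the example table. Assumption: the company name identifies
# each element, so the variables are the three remaining columns.
data_set = {
    "Dataram":      {"exchange": "NQ", "annual_sales_musd": 73.10,  "eps_usd": 0.86},
    "EnergySouth":  {"exchange": "N",  "annual_sales_musd": 74.00,  "eps_usd": 1.67},
    "Keystone":     {"exchange": "N",  "annual_sales_musd": 365.70, "eps_usd": 0.86},
    "LandCare":     {"exchange": "NQ", "annual_sales_musd": 111.40, "eps_usd": 0.33},
    "Psychemedics": {"exchange": "N",  "annual_sales_musd": 17.60,  "eps_usd": 0.13},
}

n_elements = len(data_set)                       # one element per company
variables = list(next(iter(data_set.values())))  # column names of one observation
n_variables = len(variables)
n_observations = len(data_set)                   # one observation per element
n_data_values = n_observations * n_variables

print(f"elements={n_elements}, variables={n_variables}, "
      f"observations={n_observations}, data values={n_data_values}")
```

Under that reading there are 5 elements, 3 variables, 5 observations, and 15 data values.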



Scales of Measurement
• The scale
  • determines the amount of information contained in the data
  • indicates the most appropriate data summary method

• Scales of measurement include
  • Nominal
  • Ordinal
  • Interval
  • Ratio



Scales of Measurement
Nominal scale
• Data for a variable consist of labels or names.
• A numeric code may be used, but the variable remains nominal.

Example:
Name        School
Sean D.     Arts and Sciences
Poonam B.   Business
Ming Y.     Education
Katy C.     Arts and Sciences



Scales of Measurement
Ordinal scale
• The data have the properties of nominal data and the order of the data is meaningful.
• A numeric code may be used.

Example:
Name        School              Class standing
Sean D.     Arts and Sciences   Freshman
Poonam B.   Business            Junior
Ming Y.     Education           Freshman
Katy C.     Arts and Sciences   Sophomore
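
A short sketch of what "a numeric code may be used" means for ordinal data follows. The coding 1 to 4 is a hypothetical choice made for illustration; only the order of the codes carries information, not the distance between them.

```python
# Hypothetical numeric coding of class standing (an assumption for this
# sketch): the codes preserve order, but differences between codes are
# not meaningful.
STANDING_CODE = {"Freshman": 1, "Sophomore": 2, "Junior": 3, "Senior": 4}

students = [
    ("Sean D.", "Freshman"),
    ("Poonam B.", "Junior"),
    ("Ming Y.", "Freshman"),
    ("Katy C.", "Sophomore"),
]

# Ordering and comparing are meaningful operations on ordinal data.
ranked = sorted(students, key=lambda s: STANDING_CODE[s[1]])
most_senior = max(students, key=lambda s: STANDING_CODE[s[1]])

for name, standing in ranked:
    print(f"{name:<10} {standing:<10} code={STANDING_CODE[standing]}")
print("Most senior:", most_senior[0])
```

Averaging the codes would not be meaningful here, because the scale does not say that a Junior is "three times" a Freshman.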



Scales of Measurement
Interval scale
• The data have the properties of ordinal data, and the interval between observations is
expressed in terms of a fixed unit of measure.
• Interval data are always numeric.

Example:
Name        School              Class standing   GRE score
Sean D.     Arts and Sciences   Freshman              1350
Poonam B.   Business            Junior                1200
Ming Y.     Education           Freshman              1370
Katy C.     Arts and Sciences   Sophomore             1100



Scales of Measurement
Ratio scale
• Data have all the properties of interval data and the ratio of two values is
meaningful.
• Ratio data are always numerical.
• The scale contains a true zero value: zero means that none of the quantity is present.

• Examples:
• Distance, height, weight, time
• Price of a book: a $200 book costs twice as much as a $100 book



Categorical and Quantitative Data
• Data can be further classified as being categorical or quantitative.
• The statistical analysis that is appropriate depends on whether the data for the variable
are categorical or quantitative.
• There are more alternatives for statistical analysis with quantitative data.



Categorical Data
• Labels or names are used to identify values
• Often referred to as qualitative data
• Use either the nominal or ordinal scale of measurement
• Can be either numeric or nonnumeric
• Statistical analyses are rather limited



Quantitative Data
• Quantitative data indicate how many or how much.
• Quantitative data are always numeric.
• Ordinary arithmetic operations are meaningful for quantitative data.
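
The difference in available analyses can be sketched with the student example from the scales-of-measurement slides: a categorical variable (School) is summarized by counting, while a quantitative variable (GRE score) supports arithmetic such as the mean. The variable names are illustrative.

```python
from collections import Counter
from statistics import mean

# Values taken from the student example tables.
schools = ["Arts and Sciences", "Business", "Education", "Arts and Sciences"]
gre_scores = [1350, 1200, 1370, 1100]

# Categorical data: counting how often each label occurs is meaningful.
school_counts = Counter(schools)

# Quantitative data: ordinary arithmetic is meaningful.
average_score = mean(gre_scores)

print(dict(school_counts))
print("Mean GRE score:", average_score)
```

Computing a "mean school" would make no sense, which is why the analyses available for categorical data are more limited.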



Scales of Measurement
Data
• Categorical: Nominal, Ordinal
• Quantitative: Interval, Ratio



Cross-Sectional and Time Series Data
• Whether data are cross-sectional or time series depends on when they are collected.
• Cross-sectional data are collected at the same or approximately the same point in time.
• Time series data are collected over several time periods.
• Graphs of time series data help analysts:
• identify any trends over time
• project future levels for the time series



Cross-Sectional and Time Series Data
Examples:
Cross-Sectional: Data on the number of building permits issued in November 2013 in
each of the counties of Ohio.

Time Series: Data on the number of building permits issued in Lucas County, Ohio every
month from 2013 to 2016.





Data Sources
1. Existing Sources
2. Observational Studies
3. Experimental Studies

1. Existing Sources
• Internal company records – sales and production records, employee records
• Business organizations – Bloomberg, Dow Jones, ACNielsen
• Government agencies – U.S. Department of Labor
• Industry associations, special-interest organizations, World Bank



Data Sources
Data Available From Selected Government Agencies:
Government Agency            Web address               Some of the Data Available
Census Bureau                www.census.gov            Population data, number of households, household income
Federal Reserve Board        www.federalreserve.gov    Data on money supply, exchange rates, discount rates
Office of Mgmt. & Budget     www.whitehouse.gov/omb    Data on revenue, expenditures, debt of federal government
Department of Commerce       www.doc.gov               Data on business activity, value of shipments, profit by industry
Bureau of Labor Statistics   www.bls.gov               Consumer spending, unemployment rate, hourly earnings, safety record



Data Sources
2. Observational Studies:
• Observe and record data on variables of interest; conduct analysis
• Researcher has no control or influence on the variables
• Example:
• To study the relationship between smoking and depression, you compare smokers and
nonsmokers using survey data
• This is observational because the researcher does not determine or control who smokes and who doesn’t



Data Sources
3. Experimental Studies:
• Involves controlled conditions and assignment to groups
• First, researcher identifies the variable of interest
• Then one or more other variables are identified and controlled so that data can be obtained
about how they influence the variable of interest.
• Example:
• How does new drug B affect blood pressure compared to generic drug A?
• Identify sample
• Divide sample into two groups: Group 1 receives A; Group 2 receives B
• Collect blood pressure data before and after
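
The "divide sample into two groups" step can be sketched as follows. The slide does not say how the groups are formed; random assignment is assumed here because it is the usual way to keep the researcher from influencing who receives which drug. The subject IDs and the function name `assign_groups` are hypothetical.

```python
import random

def assign_groups(subject_ids, seed=None):
    """Randomly split subjects into two groups of (nearly) equal size.

    Assumption: random assignment. The slide only says the sample is divided.
    """
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subject identifiers for a sample of 20 people.
subjects = [f"S{i:02d}" for i in range(1, 21)]
group_a, group_b = assign_groups(subjects, seed=203)

print("Group 1 (receives drug A):", sorted(group_a))
print("Group 2 (receives drug B):", sorted(group_b))
```

Blood pressure would then be recorded for every subject before and after treatment, and the changes compared between the two groups.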



Descriptive Statistics
• Summaries of data that are easy to understand are referred to as descriptive statistics.
• May be numeric, tabular or graphical.

Example:
The manager of Hudson Auto would like to have a better understanding of the cost of parts used
in the engine tune-ups in her shop. She examines 50 customer invoices.



Example: Hudson Auto Repair
Parts Costs ($) for 50 Tune-ups:

91 78 93 57 75 52 99 80 97 62

71 69 72 89 66 75 79 75 72 76

104 74 62 68 97 105 77 65 80 109

85 97 88 68 83 68 71 69 67 74

62 82 98 101 79 105 79 69 62 73



Tabular Summary: Frequencies
Parts Cost ($)   Frequency   Percent Frequency
50-59                    2                  4%
60-69                   13                 26%
70-79                   16                 32%
80-89                    7                 14%
90-99                    7                 14%
100-109                  5                 10%
TOTAL                   50                100%
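
The frequency table can be reproduced from the 50 parts costs with a few lines of code. This is a minimal sketch; the function name `frequency_table` and its parameters are illustrative, and the classes are assumed to be $10 wide starting at $50, as in the table.

```python
# Parts costs ($) for the 50 tune-ups, copied from the Hudson Auto slide.
parts_costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

def frequency_table(values, start, width, n_classes):
    """Return (class label, frequency, percent frequency) for each class."""
    rows = []
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width - 1
        count = sum(lower <= v <= upper for v in values)
        rows.append((f"{lower}-{upper}", count, 100 * count / len(values)))
    return rows

table = frequency_table(parts_costs, start=50, width=10, n_classes=6)
for label, count, percent in table:
    print(f"{label:>8} {count:>4} {percent:>5.0f}%")
```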



Graphical Summary: Histogram
Example: Hudson Auto Tune-up Parts Cost
[Figure: histogram. Vertical axis: Frequency; horizontal axis: Parts Cost ($) by class (50-59, 60-69, 70-79, 80-89, 90-99).]



Numerical Descriptive Statistics
• The most common numerical descriptive statistic is the mean.
• It provides a measure of the central tendency, or central location of the data.
• Hudson’s mean cost of parts, based on the 50 tune-ups studied?
• mean=$79

• Standard deviation: measures amount of variation or dispersion.


• Smaller values indicate that the data points tend to be close to the mean
• Hudson’s costs example, SD=$14
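
Both summary numbers can be checked against the 50 parts costs listed earlier. The sketch below assumes the quoted standard deviation is the sample standard deviation (divisor n - 1), which is what `statistics.stdev` computes; the slide does not say which version was used.

```python
import statistics

# Parts costs ($) for the 50 tune-ups, copied from the Hudson Auto slide.
parts_costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

mean_cost = statistics.mean(parts_costs)  # sum of the values divided by 50
sd_cost = statistics.stdev(parts_costs)   # sample standard deviation (n - 1)

print(f"mean = ${mean_cost:.2f}")
print(f"standard deviation = ${sd_cost:.2f}")
```

Rounded to whole dollars, the results agree with the mean of $79 and the standard deviation of $14 quoted above.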



Statistical Inference
• Population versus sample:
• Population: The set of all elements of interest in a particular study.
• Sample: A subset of the population.

• Statistical inference: The process of using data obtained from a sample to make estimates and
test hypotheses about the characteristics of a population.
• A major contribution of statistics

• Census: Collecting data for the entire population.


• Sample survey: Collecting data for a sample.



Process of Statistical Inference
Example: Hudson Auto

Step 1: The population consists of all tune-ups. The average cost of parts is unknown.
Step 2: A sample of 50 engine tune-ups is examined.
Step 3: The sample data provide a sample average parts cost of $79 per tune-up.
Step 4: The sample average is used to estimate the population average.
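
The four steps can be imitated with a small simulation. Everything in it is synthetic: the population of 10,000 costs is generated from a normal distribution with arbitrarily chosen parameters and is not Hudson Auto's data. In a real study the population average stays unknown; the simulation knows it only so that the quality of the estimate can be seen.

```python
import random
import statistics

rng = random.Random(2018)  # fixed seed so the sketch is repeatable

# Step 1: a synthetic population of tune-up parts costs (hypothetical values).
population = [rng.gauss(80, 14) for _ in range(10_000)]

# Step 2: examine a random sample of 50 tune-ups.
sample = rng.sample(population, 50)

# Step 3: compute the sample average.
sample_mean = statistics.mean(sample)

# Step 4: use the sample average as the estimate of the population average.
population_mean = statistics.mean(population)
print(f"sample mean (estimate)      = {sample_mean:.2f}")
print(f"population mean (simulated) = {population_mean:.2f}")
```

The two numbers will generally differ a little; that difference is the sampling error that statistical inference tries to quantify.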



Analytics

Analytics is the scientific process of transforming data into insight for making better decisions.
Techniques:
1. Descriptive analytics: Describes what has happened in the past.
2. Predictive analytics: Uses models constructed from past data to predict the future or to assess
the impact of one variable on another.
3. Prescriptive analytics: The set of analytical techniques that yield a best course of
action.



Big Data
Big data: Larger and more complex data sets.

Three V’s of big data:

Volume: Amount of available data
Velocity: Speed at which data are collected and processed
Variety: Different data types



Big Data

Data warehousing is the process of capturing, storing, and maintaining the data.
Organizations obtain large amounts of data on a daily basis by means of magnetic card readers,
bar code scanners and touch screen monitors.
• Wal-Mart captures data on 20-30 million transactions per day.
• Visa processes 6,800 payment transactions per second.



Data Mining
• Developing useful decision-making information from large databases.
• Using statistics, mathematics, and computer science, analysts “mine the data” to convert it into
useful information
• Relies heavily on statistical methods
• Also involves artificial intelligence and machine learning

• The major applications are at companies with a strong consumer focus
• Example: Online shopping suggestions



Data Mining: Model Reliability
• A statistical model that works well for a particular sample cannot necessarily be
reliably applied to other data.
• Common statistical approach to check reliability
• Divide the data into a training set and a test set
• Use the training set for model development and the test set for validation
• The huge amount of data available makes this easy

• Careful interpretation of results and precise modeling are important.

!!!Correlation is not causation!!!
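
The training set / test set idea can be sketched with a simple straight-line model. The data are synthetic (a known line plus random noise), the 70/30 split is an arbitrary choice, and the helper names `train_test_split`, `fit_line`, and `mean_squared_error` are defined here for illustration rather than taken from any library.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle the rows and hold out a fraction of them as the test set."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_line(points):
    """Least-squares slope and intercept for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_squared_error(points, slope, intercept):
    """Average squared prediction error of the line on the given points."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in points) / len(points)

# Synthetic data: y = 3 + 2x plus noise (hypothetical, for illustration only).
noise = random.Random(1)
data = [(x, 3.0 + 2.0 * x + noise.gauss(0, 1)) for x in range(100)]

train, test = train_test_split(data)
slope, intercept = fit_line(train)  # model development on the training set
train_mse = mean_squared_error(train, slope, intercept)
test_mse = mean_squared_error(test, slope, intercept)  # validation on held-out data

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"training MSE={train_mse:.3f}, test MSE={test_mse:.3f}")
```

A test error far above the training error would be a warning that the model does not carry over reliably to other data.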



Ethical Guidelines for Statistical Research
In statistics, unethical behavior can arise if researchers ‘milk’ the data until a desired result is
obtained.
Unethical behavior can include:
◦ Improper sampling
◦ Inappropriate analysis of the data
◦ Developing misleading graphs
◦ Use of inappropriate summary statistics
◦ Biased interpretation of results

You should strive to be fair, thorough, objective, and neutral as you collect, analyze, and present
data.



Ethical Guidelines for Statistical Research
• Developing misleading graphs:

[Figure: two charts, both titled "US Birthrate (per 1000)", with the years 2000 to 2015 on the horizontal axis. One chart uses a vertical axis running from 10 to 16; the other uses a vertical axis running from 12 to 14.]

Source: World Bank Development Indicators
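
Why the choice of vertical axis matters can be put into numbers. The sketch below uses the two axis ranges from the charts (10 to 16 and 12 to 14) and a hypothetical change of 1.5 births per 1000, chosen only for illustration, to show how much of the chart height the same change occupies.

```python
def fraction_of_axis(change, axis_min, axis_max):
    """Share of the vertical axis height that a given change spans."""
    return abs(change) / (axis_max - axis_min)

change = 1.5  # hypothetical change in birthrate (per 1000), for illustration

wide_axis = fraction_of_axis(change, axis_min=10, axis_max=16)
narrow_axis = fraction_of_axis(change, axis_min=12, axis_max=14)

print(f"axis 10-16: change fills {wide_axis:.0%} of the chart height")
print(f"axis 12-14: change fills {narrow_axis:.0%} of the chart height")
print(f"visual exaggeration factor: {narrow_axis / wide_axis:.1f}x")
```

The same change looks three times as steep on the narrower axis, even though the underlying data are identical.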

