Uncertainty: Decisions are often based on incomplete information from uncertain events. We use statistical methods and statistical analysis to make decisions in an uncertain environment.

Population: A population is the complete set of all items in which an investigator is interested.

Sample: A sample is a subset of population values.

Example:
Population
- High school students
- Households in the U.S.
Sample
- A sample of 30 students
- A Gallup poll of 1,000 consumers
- Nielsen survey of TV ratings

Random Sample: A random sample of n data values is one selected from the population in such a way that every different sample of size n has an equal chance of selection.

Example: Random Selection
- Lotto numbers
- Random numbers

Random Variable: A variable that takes different possible values for a given subject of study.

Numerical Variable: A numerical variable takes numerical values, either a countable (finite) set of numbers or infinitely many numbers.

Categorical Variable: A categorical variable takes values that belong to groups or categories.

Data: Data are measured values of the variable. There are two types of data: quantitative data and qualitative data.
Part I (Chapters 1-11)

Quantitative Data: Quantitative data are data measured on a numerical scale.

Qualitative Data: Qualitative data are non-numerical data that can only be classified into one of a group of categories.

Example:
1. Temperature
2. Height
3. Age in years
4. Income
5. Prices
6. Occupations
7. Race
8. Sales and Advertising
9. Consumption and Income

Statistics: Statistics is the science of data. This involves collecting, classifying, summarizing, and analyzing data, and then making inferences and decisions based on the data collected.

Population Parameters: The numerical measures of a population are called parameters.

Example: Population average.

Sample Statistics: The numerical measures of a sample are called sample statistics.

Example: Sample average.

Descriptive Statistics: Descriptive statistics involves collecting, classifying, and summarizing data.
Inferential Statistics: Inferential statistics makes statistical inferences about the population parameters based on sample information.

Business Decisions: From time to time, we use quantitative analysis to make business decisions.

Example:
Economics: Price of a Good, Interest Rate, Mortgage Rate
Finance: Returns, Stock Prices
Marketing: Advertising, Sales
Management: Quality Control
B.1 Describing Data Sets Graphically (Chapter 2)
The simplest way to describe data is to use graphs. The following shows two types of graphs: frequency histogram and line graph.

B.1.1 Relative Frequency Histogram
The relative frequency histogram shows the proportions of the total set of data values that fall in various numerical intervals.

Example: Sale Prices
The following data represent sale prices (in thousands of dollars) for a random sample of 25 residential properties sold.

66    89    71    109   42
59    129   95    77    36
106   74    72    68    148
50    82    57    101   94
63    84    76    65    112

Sort the data.

36    42    50    57    59
63    65    66    68    71
72    74    76    77    82
84    89    94    95    101
106   109   112   129   148

Organize the data and construct the following relative frequency distribution table.

Class i   Class Limits   Freq. (f_i)   Relative Frequency
1         (30, 49)       2             2/25 = 0.08
2         (50, 69)       7             7/25 = 0.28
3         (70, 89)       8             8/25 = 0.32
4         (90, 109)      5             5/25 = 0.20
5         (110, 129)     2             2/25 = 0.08
6         (130, 149)     1             1/25 = 0.04
Sum                      25            1
[Figure: relative frequency histogram. Vertical axis: Relative Frequency (0 to 0.4). Horizontal axis: Sale Price (ticks at 49.5, 69.5, 89.5, 109.5, 129.5, 149.5).]
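In this graph,
1. The data are classified into 6 classes.
2. Each class has the same width. The width is equal to 20.
3. The graph shows the midpoints of these classes on the horizontal axis.
4. The vertical bar shows the relative frequency of sale prices falling in each class interval.

$$\text{Relative Frequency} = \frac{\text{class frequency}}{\text{total number of data values}}$$

$$\text{Width} = \frac{\text{largest value} - \text{smallest value}}{\text{number of classes}}$$

$$\text{Width} = \frac{148 - 36}{6} = 18.67 \approx 20 .$$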
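The table and the width calculation can be checked with a short script. This is only a sketch in Python (a language these notes do not otherwise use); the variable names are illustrative, and the starting limit of 30 and the rounded width of 20 are taken from the table rather than computed.

```python
# Sale prices (in thousands of dollars) from the example.
prices = [66, 89, 71, 109, 42, 59, 129, 95, 77, 36, 106, 74, 72,
          68, 148, 50, 82, 57, 101, 94, 63, 84, 76, 65, 112]

num_classes = 6
# Width before rounding: (largest - smallest) / number of classes.
raw_width = (max(prices) - min(prices)) / num_classes
width = 20        # rounded up to a convenient value, as in the notes
lower_start = 30  # first lower class limit, as in the table

n = len(prices)
table = []
for i in range(num_classes):
    lower = lower_start + i * width
    upper = lower + width - 1  # integer class limits, e.g. (30, 49)
    freq = sum(lower <= p <= upper for p in prices)
    table.append((lower, upper, freq, freq / n))

for lower, upper, freq, rel in table:
    print(f"({lower}, {upper})  f = {freq}  relative frequency = {rel:.2f}")
```

The same loop can be reused for the YTD returns exercise by changing the data, the starting limit, and the width.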
Exercise: The following data are year-to-date (YTD) returns for a sample of 30 mutual funds.
1 0.9 -0.7
3.8 -1 1
0.9 -4.3 3
Sort the data as follows:

-4.3   -4.2   -1.1   -1.1   -1     -0.7   -0.5   -0.1   0.2    0.5
0.5    0.6    0.6    0.6    0.8    0.9    0.9    0.9    1      1
1.1    1.4    2.5    2.7    3      3.4    3.8    5.1    5.5    9.6

Organize the data and construct the following relative frequency distribution table.

Class i   Class Limits      Freq. (f_i)   Relative Frequency
1         (-5.00, -2.51)
2         (-2.50, -0.01)
3
4
5
6
Sum

Draw a relative frequency histogram.
70    74    72    78    75.

[Figure: line graph of Temperature (vertical axis, 65 to 80) against Date (Monday through Friday).]

[Figure: line graph of IBM Stock Price. Vertical axis: Price (0 to 14000). Horizontal axis: Date (ticks 80 through 02).]
The mean of a collection of n data values is the sum of the data values divided by n.
Example: Calculate the mean of the following daily high temperatures:

70    74    72    78    75.

The mean is

$$\frac{70 + 74 + 72 + 78 + 75}{5} = 73.8 .$$
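As a quick check of this arithmetic, the mean can be computed directly from its definition. This is a minimal Python sketch; the names `temps` and `mean` are illustrative.

```python
# Daily high temperatures from the example.
temps = [70, 74, 72, 78, 75]

# Mean = sum of the data values divided by the number of values.
mean = sum(temps) / len(temps)
print(mean)
```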
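Notation: Sum and Mean
Suppose there is a collection of n data values. These values are represented by $x_1, x_2, \ldots, x_n$. The sum of these values is denoted as

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n .$$

The mean of these values is denoted as

$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} .$$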
Population Mean, $\mu$
The mean of a population is denoted as $\mu$. If the data values of x are represented by $x_1, x_2, \ldots, x_N$, then the population mean is defined as

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} .$$
B.2.2 Median
The median of a collection of data values is the data value in the middle position for sorted data.

Example: Calculate the median of the following daily high temperatures:

70    74    72    78    75.

Sorted, the values are 70, 72, 74, 75, 78, so the median is 74.
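The "sort, then take the middle position" rule can be sketched in Python as follows. The function name `median` is illustrative, and the treatment of an even number of values (averaging the two middle values) is a common convention assumed here; the notes only cover the odd case.

```python
def median(values):
    """Middle value of the sorted data.

    For an even number of values this averages the two middle values,
    which is an assumed convention not stated in the notes.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

temps = [70, 74, 72, 78, 75]
temps_median = median(temps)
print(temps_median)
```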
B.3.1 Range
The range of a collection of data values is the difference between the largest and the smallest values.

Example: Calculate the range of the following daily high temperatures:

70    74    72    78    75.

The range is 78 - 70 = 8.

Example: Sale Prices for Residential Properties
Calculate the range. The range is 148 - 36 = 112.

Exercise: YTD Returns
Calculate the range. The range is ________.
B.3.2 Variance and Standard Deviation
The variance is used to measure the variation of the data values from their mean. The variance of a collection of data values is defined to be the average of the squares of the deviations of the data values about their mean.

Sample Variance, $s^2$
The variance of a sample of n data values $x_1, x_2, \ldots, x_n$ is defined as

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1} .$$
Example: Prices of product A
Suppose the prices of product A in the past five months are

6    4    2    3    5.

i      $x_i$    (deviation)$^2$ = $(x_i - \bar{X})^2$
1      6        4
2      4        0
3      2        4
4      3        1
5      5        1
Sum    20       10
Equivalently, the sample variance can be computed as

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{X}^2}{n - 1} .$$

Example: Prices of product A
Suppose the prices of product A in the past five months are

6    4    2    3    5.

i      $x_i$    $x_i^2$
1      6        36
2      4        16
3      2        4
4      3        9
5      5        25
Sum    20       90

The sample mean is $\bar{X} = \frac{20}{5} = 4$.

The sample variance is

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{X}^2}{n - 1} = \frac{90 - 5 \cdot 4^2}{4} = 2.5 .$$
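The two ways of computing the sample variance can be checked against each other. This is a small Python sketch using the prices of product A; the variable names are illustrative, and `statistics.variance` from the Python standard library is used only as an independent cross-check (it divides by n - 1).

```python
import statistics

# Prices of product A over the past five months.
prices_a = [6, 4, 2, 3, 5]
n = len(prices_a)
x_bar = sum(prices_a) / n

# Definition: sum of squared deviations about the mean, divided by n - 1.
s2_definition = sum((x - x_bar) ** 2 for x in prices_a) / (n - 1)

# Shortcut: (sum of squares - n * mean^2) / (n - 1).
s2_shortcut = (sum(x * x for x in prices_a) - n * x_bar ** 2) / (n - 1)

# Standard-library sample variance as a cross-check.
s2_library = statistics.variance(prices_a)

print(x_bar, s2_definition, s2_shortcut, s2_library)
```

Replacing `prices_a` with the prices of product B gives a way to check the exercise by hand calculation first and by script second.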
Exercise: Prices of product B
Suppose the prices for product B in the past five months are

3    7    5    4    1.

i      $x_i$
1      3
2      7
3      5
4      4
5      1
Sum    20
Example: Prices of Product A
The sample standard deviation is $s = \sqrt{2.5} = 1.58$.

Exercise: Prices of Product B
Calculate the sample standard deviation. The sample standard deviation is ________.
Population Variance $\sigma^2$ and Population Standard Deviation $\sigma$
The population variance is denoted as $\sigma^2$. For a population with the data values of $x_1, x_2, \ldots, x_N$ and the mean $\mu$, the population variance is defined as

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} .$$

The population standard deviation is

$$\sigma = \sqrt{\sigma^2} .$$

Note: Sample mean $\bar{X}$, variance $s^2$, and standard deviation $s$ are sample statistics. Population mean $\mu$, variance $\sigma^2$, and standard deviation $\sigma$ are population parameters.
Textbook Exercises: 3.12-3.14, 3.20-3.25, pages 59-60.
$$\text{skewness} = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{X})^3}{s^3} .$$

A large positive skewness indicates a long right tail. A large negative skewness indicates a long left tail.

Example: Sale Prices
The data set has a positive skewness. Hence, the distribution has a long right tail.
Column1

Mean                 81
Standard Error       5.293707
Median               76
Mode                 #N/A
Standard Deviation   26.46853
Sample Variance      700.5833
Kurtosis             0.488226
Skewness             0.658795
Range                112
Minimum              36
Maximum              148
Sum                  2025
Count                25
B.4.2 Kurtosis
The kurtosis is a measure of the thickness of the tails of a distribution (or relative frequency histogram) relative to those of a normal distribution. A normal distribution has a kurtosis of three. A kurtosis above three indicates "fat tails." The measure of sample kurtosis is defined as

$$\text{kurtosis} = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{X})^4}{s^4} .$$
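The skewness and kurtosis measures defined in these notes can be evaluated for the sale price data with a few lines of Python. This is only a sketch with illustrative variable names. It follows the formulas as written here (an average of cubed or fourth-power deviations divided by a power of the sample standard deviation s), so the results need not match spreadsheet output, which may apply different small-sample adjustments or report kurtosis relative to three.

```python
# Sale prices (in thousands of dollars) from the B.1.1 example.
prices = [66, 89, 71, 109, 42, 59, 129, 95, 77, 36, 106, 74, 72,
          68, 148, 50, 82, 57, 101, 94, 63, 84, 76, 65, 112]

n = len(prices)
x_bar = sum(prices) / n

# Sample standard deviation (divisor n - 1), as defined in B.3.2.
s = (sum((x - x_bar) ** 2 for x in prices) / (n - 1)) ** 0.5

# Skewness: average cubed deviation divided by s^3.
skewness = (sum((x - x_bar) ** 3 for x in prices) / n) / s ** 3

# Kurtosis: average fourth-power deviation divided by s^4.
kurtosis = (sum((x - x_bar) ** 4 for x in prices) / n) / s ** 4

print(x_bar, s, skewness, kurtosis)
```

A positive value of `skewness` agrees with the long right tail noted in the example.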
Excel: Descriptive Statistics (See Appendix)
1. Click on Tools.
2. Click on Data Analysis. (If Data Analysis is not on the list, click "Tools" and "Add-Ins". Check "Analysis ToolPak" to install the add-in from the Microsoft Office CD.)
3. Select Descriptive Statistics; click OK.
4. Complete dialog box: Input range contains data; Output range contains the starting cell with descriptive statistics; select Summary statistics. Click OK.
Random Experiment: A random experiment is a process leading to two or more possible outcomes with uncertainty as to which outcome will occur.

Random Variable: A random variable is a variable that takes on numerical values determined by the outcome of a random experiment.

Usually, there are two usages of random variables.

Random Variables for a Population: We can use a random variable to represent different possible data values for a population. This random variable has a probability distribution.

Example: The sale price can be represented by a variable X. Then different data values of sale price can also be represented by $(x_1, x_2, \ldots, x_n)$. The population mean is denoted as $\mu_X$ and the population standard deviation is $\sigma_X$.

Random Variables for Statistical Analysis: Some random variables have interesting probability distributions. These probability distributions are useful in statistical inference.

Example: The random variable Z has a standard normal distribution.
There are two types of random variables: discrete random variables and continuous random variables.

Discrete Random Variable: A discrete random variable takes some countable number of values.

Continuous Random Variable: A continuous random variable is a random variable taking values on a line interval.

Example:
Age in years - Discrete
Income - Discrete
Prices - Discrete
Temperature - Continuous
Height - Continuous
Growth rates - Continuous
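The probability distribution $P(x)$ of a discrete random variable X has the following properties:
a. $\sum_x P(x) = 1$.
b. $P(x) \ge 0$.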
Example: New Products
Suppose the number of new products introduced each year is a random variable X. The values and the probabilities of X are

x       3      4      5      6
P(x)    0.1    0.4    0.3    0.2
Mean and Standard Deviation
The mean of a discrete random variable X is

$$\mu_X = \sum_x x\,P(x) .$$

The mean of X is also called the expected value of X,

$$E(X) = \sum_x x\,P(x) .$$
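The variance of X is

$$\sigma_X^2 = \sum_x (x - \mu_X)^2 P(x) .$$

x      P(x)    x P(x)    x - μ_X    (x - μ_X)^2    (x - μ_X)^2 P(x)
3      0.1     0.3       -1.6       2.56           0.256
4      0.4     1.6       -0.6       0.36           0.144
5      0.3     1.5       0.4        0.16           0.048
6      0.2     1.2       1.4        1.96           0.392
Sum    1       4.6                                 0.84

Exercise:

x      P(x)    x P(x)    x - μ_X    (x - μ_X)^2    (x - μ_X)^2 P(x)
0
1
2
3
Sum

The variance can also be computed as

$$\sigma_X^2 = \sum_x x^2 P(x) - \mu_X^2 .$$

x      x^2    P(x)    x P(x)    x^2 P(x)
3      9      0.1     0.3       0.9
4      16     0.4     1.6       6.4
5      25     0.3     1.5       7.5
6      36     0.2     1.2       7.2
Sum           1       4.6       22

$$\sigma_X^2 = \sum_x x^2 P(x) - \mu_X^2 = 22 - 4.6^2 = 0.84 .$$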
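The mean and the two variance calculations for the new products example can be reproduced with a short Python sketch. The variable names are illustrative. The probabilities used here (0.1, 0.4, 0.3, 0.2) are an assumption inferred from the example's own columns: each is the $(x - \mu_X)^2 P(x)$ entry divided by the matching $(x - \mu_X)^2$ entry.

```python
# Values of X (number of new products) and their probabilities.
# The probabilities are inferred from the example's table, not quoted from it.
xs = [3, 4, 5, 6]
ps = [0.1, 0.4, 0.3, 0.2]

# A valid discrete distribution: probabilities are non-negative and sum to 1.
assert all(p >= 0 for p in ps)
assert abs(sum(ps) - 1.0) < 1e-12

# Mean (expected value): sum of x * P(x).
mean = sum(x * p for x, p in zip(xs, ps))

# Variance by definition: sum of (x - mean)^2 * P(x).
var_definition = sum((x - mean) ** 2 * p for x, p in zip(xs, ps))

# Variance by the shortcut: sum of x^2 * P(x), minus mean^2.
var_shortcut = sum(x * x * p for x, p in zip(xs, ps)) - mean ** 2

print(round(mean, 4), round(var_definition, 4), round(var_shortcut, 4))
```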
x      P(x)    x P(x)    x^2 P(x)
0
1
2
3
Sum

The variance is $\sigma_X^2 =$ ________.
Textbook Exercises: 5.1-5.8, pages 136, 137; 5.15-5.21, 5.25-5.29, pages 148-150.
C.3 Continuous Random Variable (Sections 6.1, 6.3, 8.3)
The probability distribution of a continuous random variable X can be denoted as $f(x)$. The probability distribution of X has the following properties:
a. $f(x) \ge 0$.
b. The total area under $f(x)$ is one.
c. The probability of x falling within an interval (a, b) is denoted as $P(a < x < b)$. It is the area under the curve $f(x)$ between a and b.

One of the most commonly used continuous random variables is the normal random variable.
C.3.1 Normal Distribution (Section 6.3)
Normal Random Variable and Normal Probability Distribution
A normal random variable with a normal probability distribution has the following properties:
a. The probability distribution is bell-shaped.
b. The distribution is symmetric about its mean $\mu$.
c. The spread of the distribution is determined by the standard deviation $\sigma$.
d. Any normal random variable X with mean $\mu$ and standard deviation $\sigma$ can be standardized as a standard normal random variable

$$Z = \frac{X - \mu}{\sigma} .$$
Using Standard Normal Probability Distribution Table

Case 1. Find $P(0 < Z < a)$.
Example:
$P(0 < Z < 1.2) = 0.3849$.
$P(0 < Z < 1.76) = 0.4608$.
Exercise:
$P(0 < Z < 1.64) =$
$P(0 < Z < 1.96) =$

Case 2. Find $P(-a < Z < 0)$.
Example:
$P(-1.2 < Z < 0) = 0.3849$.
$P(-1.76 < Z < 0) = 0.4608$.
$P(Z < 0) = 0.5$.
$P(Z > 0) = 0.5$.
Exercise:
$P(-1.28 < Z < 0) =$
$P(-2.33 < Z < 0) =$
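The table lookups in Cases 1 and 2 can be checked numerically. This is a sketch using `NormalDist` from Python's standard `statistics` module (available in Python 3.8 and later); the helper name `area_from_zero` is illustrative and not part of the notes.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_from_zero(a):
    """Area under the standard normal curve between 0 and a.

    For a >= 0 this is P(0 < Z < a). By symmetry about zero it also
    equals P(-a < Z < 0), so the sign of a is ignored.
    """
    return Z.cdf(abs(a)) - 0.5

print(round(area_from_zero(1.2), 4))   # Case 1 example
print(round(area_from_zero(-1.2), 4))  # Case 2 example, same area by symmetry
print(round(area_from_zero(1.76), 4))
```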
Case 5. The probability $P(0 < Z < a)$ is given. Find the value of a.
Example: $P(0 < Z < a) = 30\%$. What is a? From the table, a = 0.84.
Exercise: $P(0 < Z < a) = 40\%$. What is a?

Case 6. The probability $P(Z > a)$ is given. Find the value of a.
Example: $P(Z > a) = 5\%$, find a. The point a lies to the right of the origin, and $P(0 < Z < a) = 0.5 - 0.05 = 0.45$. With the given probability 0.45, we find a = 1.64 from the table.
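Cases 5 and 6 read the table in reverse, which corresponds to the inverse of the cumulative distribution function. This sketch uses `NormalDist.inv_cdf` from Python's standard library; the variable names are illustrative. The computed values are not rounded to the table's two decimals, so they can differ slightly from the table answers (for Case 6 the result falls between the table entries 1.64 and 1.65).

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# Case 5: P(0 < Z < a) = 0.30 means P(Z < a) = 0.5 + 0.30.
a_case5 = Z.inv_cdf(0.5 + 0.30)

# Case 6: P(Z > a) = 0.05 means P(Z < a) = 1 - 0.05.
a_case6 = Z.inv_cdf(1 - 0.05)

print(round(a_case5, 2), round(a_case6, 3))
```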
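Let X be a normal random variable with mean $\mu$ and variance $\sigma^2$. Then the random variable

$$Z = \frac{X - \mu}{\sigma}$$

is a standard normal random variable. Also,

$$P(a < X < b) = P\left(\frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma}\right) .$$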
Example: A company produces light bulbs whose life follows a normal distribution with mean 1,200 hours and standard deviation 250 hours. If we choose a light bulb at random, what is the probability that its lifetime will be between 900 and 1,300 hours?

Answers:

$$P(900 < X < 1300) = P\left(\frac{900 - 1200}{250} < \frac{X - 1200}{250} < \frac{1300 - 1200}{250}\right) = P(-1.2 < Z < 0.4) = 0.3849 + 0.1554 = 0.5403 .$$
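The light bulb calculation can be verified both directly and through standardization. This is a Python sketch with illustrative variable names, again using `NormalDist` from the standard library. Because it does not round the two table areas before adding them, the result can differ from 0.5403 in the fourth decimal place.

```python
from statistics import NormalDist

mu, sigma = 1200, 250  # mean and standard deviation of bulb life, in hours

# Direct calculation with the N(1200, 250) distribution.
life = NormalDist(mu=mu, sigma=sigma)
p_direct = life.cdf(1300) - life.cdf(900)

# Same probability after standardizing: Z = (X - mu) / sigma.
z_low = (900 - mu) / sigma    # -1.2
z_high = (1300 - mu) / sigma  # 0.4
Z = NormalDist()
p_standardized = Z.cdf(z_high) - Z.cdf(z_low)

print(round(p_direct, 3), round(p_standardized, 3))
```

The demand exercise that follows can be checked the same way by changing `mu`, `sigma`, and the interval limits.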
Exercise: Anticipated consumer demand for a product next month can be represented by a normal random variable with mean 1,200 units and standard deviation 100 units.

a. What is the probability that sales will be between 1,000 and 1,300 units?
b. What is the probability that sales will exceed 1,100 units?

Answers:
Textbook Exercises: 6.19 abc, 6.20 abc, 6.21 abc, 6.22 abd, 6.23 ab, 6.24 abc, 6.25, 6.26, 6.27 a, 6.31 ab, 6.35 ab, 6.36a, 6.37 ab, pages 208-210.
Properties of t-distribution:
1. Bell-shaped.
2. Symmetrical about t = 0.
3. The probability distribution has tails that are more spread out than the standard normal distribution.
4. The shape of the probability distribution depends on a constant, the degrees of freedom (v).
5. When v is large, the t-distribution is close to the standard normal distribution.

t Statistical Table
The table shows the value of $t_\alpha$ such that $P(t > t_\alpha) = \alpha$.

For $\alpha = 0.01$, $\alpha = 0.025$, and $\alpha = 0.05$, the values of $t_\alpha$ for different v are

v      t_.05    t_.025    t_.01
5
10
15
20