Presentation
Items to be covered in this
presentation include:
Mean
Median
Mode
Deviation
Variance
Standard Deviation
Sample Statistics
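The variance of a sample is given
by:
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
where the sample mean is
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
The variance of a population of size N is given by:
$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$
The standard deviation = square root of the variance.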
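As a minimal sketch of the variance and standard deviation definitions, the Python below computes both the n-1 (sample) and N (population) forms from scratch. The function names and the data values are hypothetical, chosen only for illustration.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

# Hypothetical data, not taken from the presentation.
data = [2, 4, 4, 4, 5, 5, 7, 9]

s2 = sample_variance(data)          # divisor n - 1
sigma2 = population_variance(data)  # divisor N
s = math.sqrt(s2)                   # sample standard deviation

print(s2, sigma2, s)
```

The standard library functions `statistics.variance` and `statistics.pvariance` use the same two divisors and can serve as a cross-check.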
n or n-1
In general, whenever a simple random sample is
used, one needs to increase the size of the variance
to compensate for the fact that $\bar{x}$ does not exactly
equal the true mean $\mu$; hence n-1 is used as the
divisor rather than n.
The term degrees of freedom is also used to
describe n-1.
Median
The median is another central measure; to find it:
Arrange all the sample values from highest to lowest (or lowest
to highest)
If n is odd, the median is the value in the middle position
((n+1)/2)
If n is even, the sample median is the average of the
values in positions (n/2) and ((n/2)+1)
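The median rule can be sketched in Python as follows. For even n, the two middle values sit at 1-based positions n/2 and n/2 + 1. The function name and the test values are hypothetical.

```python
def median_by_position(values):
    """Median found by sorting and picking the middle position(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is 0-based index n // 2.
        return ordered[mid]
    # Even n: average the values at 1-based positions n/2 and n/2 + 1.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_by_position([7, 1, 5]))     # odd number of values
print(median_by_position([7, 1, 5, 3]))  # even number of values
```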
Quartiles
The trimmed mean reduces the effect of outliers by
calculating the mean of the data set when p% of the values at
each end (highest and lowest) are trimmed.
Quartiles divide the ranked data set into four equal-size
groups, as closely as possible. Arrange the data set in order.
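A rough sketch of the trimmed mean in Python is shown here. The slide does not say how to handle a fractional number of values to trim, so rounding that count down is an assumption of this sketch. The function name and data are hypothetical.

```python
def trimmed_mean(values, p):
    """Mean after dropping p% of the values at each end of the ranked data.

    Assumption: the number of values trimmed per end is rounded down.
    """
    if not 0 <= p < 50:
        raise ValueError("p must be in the range [0, 50)")
    ordered = sorted(values)
    k = int(len(ordered) * p / 100)      # values dropped from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Hypothetical data with one obvious outlier.
data = [1, 2, 3, 4, 100]
print(trimmed_mean(data, 0))   # ordinary mean, pulled up by the outlier
print(trimmed_mean(data, 20))  # one value trimmed from each end
```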
Percentiles
The pth percentile divides a data set so that p%
of the values are less than, and (100 - p)% are
greater than, it.
pth percentile = value at position p(n + 1)/100. Other
definitions exist; this one works well except at the
extremes.
Q1 = 25th percentile; median = 50th percentile; Q3 = 75th
percentile.
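The position rule p(n + 1)/100 can be sketched in Python as below. The data are the tensile-test values from the example in this presentation. Two choices are assumptions of this sketch, not something the slide states: linear interpolation when the position is not a whole number, and clamping to the smallest or largest value at the extremes.

```python
def percentile(values, p):
    """pth percentile using the 1-based position p * (n + 1) / 100."""
    ordered = sorted(values)
    n = len(ordered)
    pos = p * (n + 1) / 100
    if pos <= 1:            # assumption: clamp at the low extreme
        return ordered[0]
    if pos >= n:            # assumption: clamp at the high extreme
        return ordered[-1]
    lower = int(pos)        # 1-based rank just below the position
    frac = pos - lower
    below = ordered[lower - 1]
    above = ordered[lower]
    # Assumption: interpolate linearly between the two neighbouring ranks.
    return below + frac * (above - below)

strengths = [15, 30, 51, 20, 17, 19, 20, 32, 17, 15,
             23, 19, 15, 18, 16, 22, 29, 15, 13, 15]  # ksi

q1 = percentile(strengths, 25)
q2 = percentile(strengths, 50)
q3 = percentile(strengths, 75)
print(q1, q2, q3)
```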
Summary Statistics
For a sample, the values calculated above are
known as statistics. For a population, the values
are called parameters.
We want to know the parameters of a population
but it is impractical/impossible to access the entire
population.
That is why we collect a sample and use the
descriptive statistics above to provide a cursory
assessment, or we use inferential statistical
methods to make estimates, test theories, or
formulate models of the parameters.
Data Plots
The most effective way
to review data is
graphically.
Engineers are visual
people and therefore require
charts to be transformed
to graphs.
Histogram
Histograms & frequency distributions
1. Choose boundary points for the class intervals (cells or
bins). Usually, intervals are the same width. Class limits
must not overlap.
2. Find the frequency (number of data values) in each
interval.
3. Calculate the relative frequencies (number of data values
in an interval / total number of data values).
4. If the class intervals are the same width, draw rectangles
with heights equal to the frequencies or relative
frequencies.
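Steps 1 to 3 above can be sketched in Python as follows. The class boundaries (10, 20, ..., 60) are hypothetical, and the data are the tensile-test values from the example in this presentation. Each interval is treated as including its lower boundary and excluding its upper one, which is one way to keep the class limits from overlapping.

```python
def frequency_table(values, edges):
    """Frequencies and relative frequencies for intervals [edges[i], edges[i+1]).

    Assumes the edges are increasing and cover every data value.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    total = len(values)
    relative = [c / total for c in counts]
    return counts, relative

strengths = [15, 30, 51, 20, 17, 19, 20, 32, 17, 15,
             23, 19, 15, 18, 16, 22, 29, 15, 13, 15]  # ksi
edges = [10, 20, 30, 40, 50, 60]  # hypothetical, equal-width classes

counts, relative = frequency_table(strengths, edges)
for lo, hi, c, r in zip(edges, edges[1:], counts, relative):
    # A crude text histogram: one '#' per data value in the class.
    print(f"{lo:>2} to <{hi}: {c:>2} ({r:.2f}) {'#' * c}")
```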
Histogram (example)
Skewed or Symmetric
A histogram is perfectly symmetric if its right
half is a mirror image of its left half.
Histograms that are not symmetric are referred to
as skewed.
A histogram with a long right tail is said to be
skewed to the right, or positively skewed.
A histogram with a long left tail is said to be
skewed to the left, or negatively skewed.
Box Plot
The box plot is a graphical display that simultaneously
describes several important features of a data set, such as
center, spread, departure from symmetry, and identification
of observations that lie unusually far from the bulk of the
data.
The basic boxplot presents the median, Q1, Q3, the maximum
and the minimum values of the data set.
The width of the box is the interquartile range (Q3-Q1). The
median is marked by a line in the box.
Draw lines from the box to the values that are closest to, but
within, a range of 1.5 x IQR (called whiskers or fences).
- Lower whisker = Q1 - 1.5(Q3 - Q1). Upper whisker = Q3 + 1.5(Q3
- Q1)
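The quantities needed to draw a box plot can be sketched in Python as below. The quartiles come from the standard library's `statistics.quantiles` with `method="exclusive"`, which uses positions based on n + 1. The function name and dictionary keys are hypothetical, and the data are the tensile-test values from the example in this presentation.

```python
import statistics

def boxplot_summary(values):
    """Quartiles, 1.5 x IQR limits, whisker ends and points beyond the limits."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="exclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    inside = [v for v in ordered if lower_limit <= v <= upper_limit]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],    # most extreme value within the lower limit
        "whisker_high": inside[-1],  # most extreme value within the upper limit
        "outliers": [v for v in ordered
                     if v < lower_limit or v > upper_limit],
    }

strengths = [15, 30, 51, 20, 17, 19, 20, 32, 17, 15,
             23, 19, 15, 18, 16, 22, 29, 15, 13, 15]  # ksi

summary = boxplot_summary(strengths)
for name, value in summary.items():
    print(name, value)
```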
Box Plots
Comparative (side-by-side)
boxplots
Example
Tensile tests on a set of aluminum alloy specimens yield the
following results (ksi): 15, 30, 51, 20, 17, 19, 20, 32, 17, 15,
23, 19, 15, 18, 16, 22, 29, 15, 13, 15.
Plot the data using stem and leaf,
histogram and box plot.
1 | 3 5 5 5 5 [Q1] 5 6 7 7 8 [Q2] 9 9
2 | 0 0 2 [Q3] 3 9
3 | 0 2
4 |
5 | 1
Median = 18.5
Mode = 15
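The stem-and-leaf display, median and mode for this example can be reproduced with a short Python sketch. The helper name `stem_and_leaf` is hypothetical. Stems are the tens digits and leaves are the units digits.

```python
import statistics
from collections import defaultdict

strengths = [15, 30, 51, 20, 17, 19, 20, 32, 17, 15,
             23, 19, 15, 18, 16, 22, 29, 15, 13, 15]  # ksi

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    # Include empty stems so gaps in the data stay visible.
    return {s: stems.get(s, []) for s in range(min(stems), max(stems) + 1)}

plot = stem_and_leaf(strengths)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")

median = statistics.median(strengths)
mode = statistics.mode(strengths)
print("Median =", median)
print("Mode =", mode)
```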
Time Series
Multivariate Data
Multivariate data occurs when each data
observation in the data set has two or more values
that are possibly related.
We can present bivariate data graphically using a
scatterplot or an x-y diagram.
From a scatterplot, we can assess the following
aspects of the possible relationship between the
two variables:
Direction: positive or negative
Strength: strong, medium, weak, no relationship
Linearity: linear or non-linear.
Multivariate Data
Covariance and correlation measure the
association or relatedness between variables for
bivariate (x,y) data.
Given bivariate (x,y) data, calculate
$\bar{x}$, $s_x$, $\bar{y}$, and $s_y$.
The sample covariance gives the direction of the
association but does not give an indication of the
comparative strength:
$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right)$$
$$\mathrm{Cov}(x, y) = s_{xy} = \frac{S_{xy}}{n-1}$$
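As a small check on the covariance definition, the Python sketch below computes the sum of cross-products both ways: directly from deviations about the means, and by the shortcut that subtracts (sum of x)(sum of y)/n from the sum of x*y. Both are then divided by n-1. The function name and data are hypothetical.

```python
def sample_covariance(xs, ys):
    """Return the sample covariance by the deviation form and the shortcut form."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Deviation form: sum of (x - x_bar)(y - y_bar).
    s_xy_dev = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Shortcut form: sum of x*y minus (sum x)(sum y)/n.
    s_xy_short = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return s_xy_dev / (n - 1), s_xy_short / (n - 1)

# Hypothetical bivariate data, not taken from the presentation.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

cov_dev, cov_short = sample_covariance(xs, ys)
print(cov_dev, cov_short)
```

A positive result indicates that y tends to increase with x. The magnitude depends on the units of both variables, which is why covariance alone does not indicate strength.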
Multivariate Data
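The sample correlation coefficient, r, gives the direction (+/-)
and the comparative strength of the association (-1 <= r_xy <= 1):
$$r_{xy} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$
$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = (n-1) s_x^2$$
$$S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = (n-1) s_y^2$$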
Example
The following are results for
tensile strength and hardness
for a copper alloy. Is there a
correlation?
Tensile Str.   Brinell Hardness
106.2          35.0
106.3          37.2
105.3          39.8
106.1          35.8
105.4          41.3
106.3          40.7
104.7          38.7
105.4          40.2
105.5          38.1
105.1          41.6
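One way to answer the question is to compute the sample correlation coefficient as the sum of cross-products divided by the square root of the product of the two sums of squares. The Python sketch below does this for the ten pairs listed. The variable and function names are hypothetical, and the pairing assumes each tensile strength value is followed by its hardness value, as in the listing.

```python
import math

tensile = [106.2, 106.3, 105.3, 106.1, 105.4,
           106.3, 104.7, 105.4, 105.5, 105.1]
hardness = [35.0, 37.2, 39.8, 35.8, 41.3,
            40.7, 38.7, 40.2, 38.1, 41.6]

def correlation(xs, ys):
    """Sample correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

r = correlation(tensile, hardness)
print(f"r = {r:.3f}")
```

The sign of r gives the direction of the association, and its closeness to -1 or +1 gives the comparative strength.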