
ECONE370 Exam 1 Formula Sheet



Fields of Statistics
Descriptive Statistics: used to describe (summarize) sets of data
Inferential Statistics: uses sample data to gain insight into larger population(s)

Sample vs. Population
Population: all people/things of interest
  o Parameter: numerical characteristic of a population
Sample: small random subset of the pop.
  o Statistic: numerical characteristic of a sample
Types of Data
Qualitative data: described using words/categories
  o nominal: by name only, i.e. no discernible order to groups
  o ordinal: groups have inherent order/sequence

Quantitative (interval) data: described using #'s (able to be averaged sensibly)
  o discrete: outcomes are countable (can only take on separate values, e.g. whole numbers 0, 1, 2, ...)
  o continuous: outcomes are measurable (can take on any value within a range)

Hierarchy of Data: vague to specific (i.e. nominal, ordinal, and then quantitative)

Sampling Methods: divided into good (probability) and bad (nonprobability) samples
Probability samples: all items in the population have a known probability of being selected (not necessarily equal for all items)

o simple random sample (SRS): gives each member of the pop. an equal chance of being selected
  o e.g. drawing names from a hat
  o assumes sample selections are drawn with replacement or the population is infinitely large
  o usually no big deal if the sample is less than 5% of the population
  o characterized by independence and lack of bias (thus, the most preferred method)
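A minimal Python sketch of an SRS, assuming the population is already available as a list (the student names and the `simple_random_sample` helper are hypothetical). `random.Random.sample` draws without replacement, like names from a hat, while `random.Random.choices` draws with replacement, which matches the independence assumption above.

```python
import random


def simple_random_sample(population, n, replace=False, seed=None):
    """Give every member of the population an equal chance of selection."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    if replace:
        # with replacement: each draw is independent, repeats are possible
        return rng.choices(population, k=n)
    # without replacement: like drawing names from a hat, no repeats
    return rng.sample(population, n)


students = [f"student_{i}" for i in range(1, 201)]  # hypothetical list of 200 names
picked = simple_random_sample(students, 10, seed=370)
print(picked)
print(len(picked) / len(students))  # 0.05, i.e. the sample is 5% of the population
```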

o systematic sampling: sample every kth observation [is 'approximately' a random sample]
  o e.g. sample every 5th student on a list of IU students
  o sample size n is selected and the interval k is determined by $k = \frac{N}{n}$ (N = population size)
  o a random starting point is selected and every kth observation is selected until n items are selected
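The systematic procedure above might look like this in Python. This is a sketch: the ID list and `systematic_sample` are hypothetical, and rounding k down when N/n is not a whole number is an assumption.

```python
import random


def systematic_sample(population, n, seed=None):
    """Select every kth item after a random start, with k = N // n."""
    N = len(population)
    k = N // n  # sampling interval (rounded down when N / n is not whole)
    start = random.Random(seed).randrange(k)  # random starting point among the first k items
    # every kth observation from the start until n items are selected
    return [population[start + i * k] for i in range(n)]


ids = list(range(1, 101))  # hypothetical list of N = 100 student IDs
chosen = systematic_sample(ids, 20, seed=5)  # n = 20, so k = 5
print(chosen)
```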



o stratified random sample: the population is divided into like groups (called strata) and samples are then taken from each group [is 'approximately' a random sample]
  o e.g. divide IU students into freshmen, sophomores, juniors, and seniors and take a sample from within each group
  o best if the population can be divided into a few strata (within which, people/things are quite similar)
  o strata must be mutually exclusive and collectively exhaustive
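A possible Python sketch of a stratified random sample. The strata and their sizes are made up, and drawing the same fraction from every stratum (proportional allocation) is an assumption; the sheet only says samples are taken from each group.

```python
import random


def stratified_sample(strata, fraction, seed=None):
    """Take an SRS from within each stratum, sized in proportion to the stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        size = max(1, round(len(members) * fraction))  # at least one per stratum
        sample[name] = rng.sample(members, size)
    return sample


# hypothetical class-year strata (mutually exclusive, collectively exhaustive)
strata = {
    "freshmen": [f"fr{i}" for i in range(400)],
    "sophomores": [f"so{i}" for i in range(300)],
    "juniors": [f"jr{i}" for i in range(200)],
    "seniors": [f"sr{i}" for i in range(100)],
}
picked = stratified_sample(strata, 0.10, seed=1)
print({year: len(names) for year, names in picked.items()})
```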

o cluster sample: a list of all individuals cannot be obtained, but a list of groups (not necessarily similar) can be, so a sample of groups is taken [is 'approximately' a random sample]
  o e.g. randomly select 4 dorms on the IU campus and sample all students within those dorms
  o if multiple iterations are performed, it is called a 'multistage cluster sample'
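A sketch of a one-stage cluster sample in Python: whole groups are chosen at random, then every member of a chosen group is kept. The dorm names, their sizes, and `cluster_sample` are hypothetical.

```python
import random


def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly pick whole groups, then keep every individual in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)  # SRS of group names
    return {name: list(clusters[name]) for name in chosen}


# hypothetical dorms, each with 50 residents
dorms = {f"dorm_{c}": [f"{c}{i}" for i in range(50)] for c in "ABCDEFGHIJ"}
sample = cluster_sample(dorms, 4, seed=2)
print(sorted(sample), sum(len(v) for v in sample.values()))
```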

Non-random samples: produce bias; bad [NOT representative of the population]
o judgment sample: sample is selected based on the opinion of some 'expert'
  o e.g. an IU student suggests that all students in ECONE370 should be selected to take a survey on political preference

o convenience sample: samples collected based on ease/convenience
  o e.g. sampling the first 50 students who go to an IU basketball game

Other important sampling terms

o sampling frame: list of all possible items in the population from which the selection is taken
  o usually very costly/difficult to get
o census: an attempt is made to sample the entire population (usually very costly and impractical)

Summarizing data: looking at the data's distribution, i.e. what values are taken on

graphically
  categorical - bar charts (bars do not touch), pie charts, and Pareto diagram
  quantitative - histograms (bars usually touch), stem plots, and ogive

numerically
  center
    mean - the 'average', the balancing point
    median - the middle ordered value, the 50th percentile, $Q_2$
    mode - the most frequently occurring number(s)
  spread
    range = Max - Min
    variance - represents the 'typical' squared distance from the mean, in squared units
    standard deviation - represents the 'typical' distance from the mean, in units of the data
    IQR = $Q_3 - Q_1$, where
      $Q_3$ = 75th percentile, the median of the 'top' half
      $Q_1$ = 25th percentile, the median of the 'bottom' half
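The center and spread measures above can be checked with Python's standard `statistics` module. This is a sketch on made-up data; `quartiles_by_halves` is a hypothetical helper that follows the 'median of each half' definition, and leaving the middle value out of both halves when n is odd is an assumption (textbooks differ).

```python
import statistics


def quartiles_by_halves(data):
    """Return (Q1, Q2, Q3): Q1/Q3 are the medians of the bottom/top halves.
    When n is odd, the middle value is left out of both halves."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return statistics.median(lower), statistics.median(xs), statistics.median(upper)


data = [2, 5, 7, 8, 13, 16, 17, 19, 19, 21, 25]  # hypothetical sample, n = 11
q1, q2, q3 = quartiles_by_halves(data)
print("mean:", statistics.mean(data))
print("median:", q2)  # 16
print("mode(s):", statistics.multimode(data))  # [19]
print("range:", max(data) - min(data))  # 23
print("sample variance:", statistics.variance(data))
print("sample standard deviation:", statistics.stdev(data))
print("IQR:", q3 - q1)  # 19 - 7 = 12
```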

Mutually Exclusive vs. Collectively Exhaustive

Mutually Exclusive: each data value belongs to one (and only one) group
Collectively Exhaustive: there must be a category to which every data point belongs

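Computing Summary Statistics (assumes upper/lower classes have closed limits)

Sample (statistics) → inference → Population (parameters)

Mean/average: sample $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$; population $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$

Variance: sample $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$; population $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

Standard Deviation: sample $s = \sqrt{s^2}$; population $\sigma = \sqrt{\sigma^2}$

Coeff. of Variation: sample $CV = \frac{s}{\bar{x}} \times 100\%$; population $CV = \frac{\sigma}{\mu} \times 100\%$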

Notes:

When comparing two distributions to see which has more 'relative variation' (i.e. more unstable, crazier, etc.), we use the coefficient of variation, i.e. a larger CV shows more relative variation.
Mean and median are unique to a data set (the mode isn't, and the mode is generally unreliable)
Sum of deviations from the mean = 0 (which is why we square them when finding variance)
Outliers are (generally) observations more than 3 standard deviations from the mean.
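A small Python sketch of the sample vs. population formulas on a made-up data set. It also shows that the deviations from the mean sum to 0 and that the CV, being unitless, does not change when every value is rescaled. `describe` is a hypothetical helper.

```python
import math
import statistics


def describe(data, sample=True):
    """Mean, variance, standard deviation and CV (in %) of a data set."""
    n = len(data)
    mean = sum(data) / n
    ss = sum((x - mean) ** 2 for x in data)  # sum of squared deviations
    var = ss / (n - 1) if sample else ss / n  # s^2 divides by n - 1, sigma^2 by N
    sd = math.sqrt(var)
    return mean, var, sd, sd / mean * 100


data = [4, 8, 6, 5, 7]  # hypothetical values
mean, var, sd, cv = describe(data)
print(mean, var, round(sd, 4), round(cv, 2))
print(sum(x - mean for x in data))  # deviations from the mean sum to 0

# scaling every value by 10 changes the SD but not the CV (it is unitless)
print(round(describe([10 * x for x in data])[3], 2))
```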

Computing Percentiles: gives location(s) of percentiles, e.g. Q1, Q3, etc.
A percentile always references how much data is less than it, e.g. being in the 80th percentile of test scores means 80% of people did worse than you. Yay!
The location of a percentile is computed as
$L_p = (n + 1)\frac{P}{100}$
e.g. we use P = 25 for the 25th percentile.
If $L_p$ is a whole number, we go to that position in the ordered sequence. If it is NOT a whole number, it is between two values and we use
$x_{\text{lower}} + (\text{decimal part of } L_p)(x_{\text{upper}} - x_{\text{lower}})$
to find its value, where $x_{\text{lower}}$ and $x_{\text{upper}}$ are the ordered values on either side of $L_p$.

Symmetry: discusses how much data is on either side of the mean
Right (positively) skewed: more data at lower values
Left (negatively) skewed: more data at higher values
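The percentile location rule $L_p = (n + 1)P/100$ with interpolation can be sketched in Python as follows. The data are hypothetical, and clamping locations that fall before the first or after the last value is an assumption. For this data the results should agree with `statistics.quantiles(data, n=4)`, whose default 'exclusive' method also places the ith cut point at position i(n + 1)/4.

```python
import math
import statistics


def percentile(data, p):
    """Value at the pth percentile using L_p = (n + 1) * P / 100."""
    xs = sorted(data)
    n = len(xs)
    loc = (n + 1) * p / 100  # 1-based location in the ordered data
    if loc <= 1:  # assumption: clamp locations that fall off either end
        return xs[0]
    if loc >= n:
        return xs[-1]
    whole = int(loc)  # the position just below L_p
    frac = loc - whole  # decimal part of L_p
    lower, upper = xs[whole - 1], xs[whole]  # 1-based positions -> 0-based indices
    return lower + frac * (upper - lower)


scores = [2, 5, 7, 8, 13, 16, 17, 19, 19, 21, 25]  # hypothetical sample, n = 11
print(percentile(scores, 25), percentile(scores, 80))  # L_25 = 3 (whole), L_80 = 9.6
```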

The degree of skewness is measured by Pearson's Second Skewness Coefficient
$Sk_2 = \frac{3(\bar{x} - \text{Median})}{s}$

(further from 0 → more skewed)

Notes:

o Second Skewness Coefficient usually between -3 (strong negative skew) and +3 (strong positive skew)
o First Skewness Coefficient $Sk_1 = \frac{\bar{x} - \text{Mode}}{s}$ (less preferred) & both coefficients are unitless
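Both Pearson coefficients can be computed with the `statistics` module. This sketch uses a hypothetical right-skewed sample; `pearson_skewness` is an illustrative helper, and `statistics.stdev` gives the sample standard deviation s.

```python
import statistics


def pearson_skewness(data):
    """First (mode-based) and second (median-based) Pearson skewness coefficients."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation
    first = (mean - statistics.mode(data)) / s
    second = 3 * (mean - statistics.median(data)) / s
    return first, second


right_skewed = [2, 5, 7, 8, 13, 16, 17, 19, 19, 21, 25, 26, 33, 46, 52]  # hypothetical
print(pearson_skewness(right_skewed))  # both > 0: mean (20.6) > median (19)
```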
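Summary Statistics for Grouped/Weighted Data

Sample (statistics) → inference → Population (parameters)

Mean/average: sample $\bar{x} = \frac{\sum f_i m_i}{n}$; population $\mu = \frac{\sum f_i m_i}{N}$

Variance: sample $s^2 = \frac{\sum f_i (m_i - \bar{x})^2}{n - 1}$; population $\sigma^2 = \frac{\sum f_i (m_i - \mu)^2}{N}$

Standard Deviation: sample $s = \sqrt{s^2}$; population $\sigma = \sqrt{\sigma^2}$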

Note: In these computations, $m_i$ = midpoint of class. Also, if given relative frequencies (i.e. percents), we use these relative frequencies for $f_i$ and ditch the denominator of the given formulas.
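A Python sketch of the grouped-data formulas, assuming three hypothetical classes with midpoints 5, 15, 25 and frequencies 2, 5, 3. It also shows the relative-frequency shortcut from the note: with $f_i$ replaced by proportions, the mean needs no denominator.

```python
import math


def grouped_stats(midpoints, freqs, sample=True):
    """Mean, variance and SD of grouped data from class midpoints m_i and frequencies f_i."""
    n = sum(freqs)  # total number of observations
    mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
    ss = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints))
    var = ss / (n - 1) if sample else ss / n
    return mean, var, math.sqrt(var)


# hypothetical classes [0, 10), [10, 20), [20, 30) with midpoints 5, 15, 25
midpoints = [5, 15, 25]
freqs = [2, 5, 3]
print(grouped_stats(midpoints, freqs))
print(grouped_stats(midpoints, freqs, sample=False))

rel = [f / sum(freqs) for f in freqs]  # relative frequencies 0.2, 0.5, 0.3
rel_mean = sum(p * m for p, m in zip(rel, midpoints))  # no denominator needed
print(rel_mean)
```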
Data Displays: graphical displays of data (depend upon whether we have qualitative or quantitative data)

Qualitative Graphs
  Pie Chart - each slice is a different category (relative comparisons)
  Bar Chart - bars show separate classes of observations
  Pareto Chart - bar chart w/ categories in decreasing frequency (80/20)

Quantitative Graphs
  Histogram - bar chart where bars = (classes of) observations & height = frequency
  Stem Plots - ordered array split up into stems and leaves
  Ogive - shows how many observations are at or below a point


Note: For histograms/stem plots it's typical for the lower bound of a class to be closed and the upper bound to be open, i.e. an observation equal to the lower bound is in the class, but an observation equal to the upper bound is in the next class.
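The 'lower bound closed, upper bound open' convention can be sketched with Python's `bisect.bisect_right`. The class edges and data are hypothetical; with these classes each value inside the range lands in exactly one class, so the classes are mutually exclusive.

```python
from bisect import bisect_right


def class_counts(data, edges):
    """Count observations per class [edges[i], edges[i + 1]):
    lower bound closed, upper bound open."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        i = bisect_right(edges, x) - 1  # last class whose lower bound is <= x
        if 0 <= i < len(counts):  # values outside every class are not counted
            counts[i] += 1
    return counts


edges = [0, 10, 20, 30]  # hypothetical classes 0-<10, 10-<20, 20-<30
print(class_counts([0, 5, 10, 19.9, 20, 29], edges))  # 10 and 20 fall in the next class
```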


Display Terminology: applies to both quantitative and qualitative displays
Frequency: simply a count of how many times a data value occurs in a data set
  o sum of frequencies in a data set equals the sample size (i.e. total number of observations)
Relative Frequency: same as frequency, only reported as a percent or proportion
  o sum of relative frequencies equals 1
Cumulative Frequency: adds up frequencies as data values increase, i.e. any cumulative frequency tells us how many data values are less than or equal to the one we're looking at.
  o can also be computed as cumulative relative frequency (reported as a percent)
  o highest (last) cumulative frequency equals the sample size
  o highest (last) cumulative relative frequency equals 1
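A frequency table with relative and cumulative columns, sketched in Python on made-up data (`frequency_table` is a hypothetical helper). The last cumulative frequency equals the sample size and the last cumulative relative frequency equals 1.

```python
from collections import Counter
from itertools import accumulate


def frequency_table(data):
    """Rows of (value, frequency, relative freq., cumulative freq., cumulative relative freq.)."""
    counts = Counter(data)
    values = sorted(counts)
    freqs = [counts[v] for v in values]
    n = sum(freqs)  # equals the sample size
    cum = list(accumulate(freqs))  # running totals as the values increase
    return [(v, f, f / n, c, c / n) for v, f, c in zip(values, freqs, cum)]


cars = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]  # hypothetical cars per household
for row in frequency_table(cars):
    print(row)
```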

Histograms: show different classes of observations and their frequencies (or relative frequencies)

[Figure: four example histograms, each with Frequency on the vertical axis]
  right skewed: mean > median > mode
  bell shaped: mean = median = mode
  left skewed: mean < median < mode
  rectangular / uniform: mean = median = mode

Stem plots (a.k.a. stem-and-leaf plots): show individual observations, not classes

right skewed
stem | leaf
0 | 2578
1 | 36799
2 | 156
3 | 3
4 | 6
5 | 2

bell shaped
stem | leaf
0 | 2
1 | 3678
2 | 134679
3 | 388
4 | 1
5 |

left skewed
stem | leaf
0 | 6
1 | 4
2 | 33
3 | 69
4 | 355789
5 | 24689

uniform
stem | leaf
0 | 225677
1 | 367889
2 | 134679
3 | 3888
4 | 122377
5 |

Note: A key is given with the stem plot, e.g.

Key: 5 | 2 = $52

Note: If a stem plot is NOT in order, order the values immediately.
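A Python sketch that orders the values and then builds a stem plot, assuming nonnegative whole numbers with stem = tens digit and leaf = ones digit (other keys would need a different split). The input list is hypothetical.

```python
from collections import defaultdict


def stem_plot(data):
    """Stem-and-leaf plot for nonnegative integers: stem = tens, leaf = ones digit."""
    stems = defaultdict(list)
    for x in sorted(data):  # order the values first
        stems[x // 10].append(x % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):  # keep empty stems so gaps show
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return "\n".join(lines)


values = [19, 2, 13, 5, 25, 7, 16, 8, 21, 17, 19, 26, 33, 46, 52]  # hypothetical, unordered
print(stem_plot(values))
```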

Chebyshev's Theorem (any shape distribution)
at least $\left(1 - \frac{1}{k^2}\right) \times 100\%$ of data values will fall within k standard deviations of the mean (for k > 1)
  b/w $\mu \pm 2\sigma$ → at least 75%
  b/w $\mu \pm 3\sigma$ → at least 88.89%, etc.
Note: If given a percent (p) and asked for how many SD's (k), use the formula
$k = \frac{1}{\sqrt{1 - p}}$
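Chebyshev's bound and its inverse, sketched in Python (the function names are hypothetical).

```python
import math


def chebyshev_min_fraction(k):
    """Smallest possible fraction of data within k standard deviations (k > 1)."""
    return 1 - 1 / k ** 2


def chebyshev_k(p):
    """Number of standard deviations k that guarantees at least a fraction p (0 <= p < 1)."""
    return 1 / math.sqrt(1 - p)


for k in (2, 3):
    print(k, round(chebyshev_min_fraction(k) * 100, 2))  # 75.0, then 88.89
print(chebyshev_k(0.75))  # 2.0
```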
The Empirical Rule (assumes the distribution is symmetrical/bell-shaped)

within one standard deviation from the mean, i.e. $\mu \pm 1\sigma$ → approx. 68%

within two standard deviations from the mean, i.e. $\mu \pm 2\sigma$ → approx. 95%

within three standard deviations from the mean, i.e. $\mu \pm 3\sigma$ → approx. 99.7%
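For a truly normal distribution the exact shares can be computed with `statistics.NormalDist` (Python 3.8+). This sketch should give roughly 68.3%, 95.4% and 99.7%, which the Empirical Rule rounds to 68%, 95% and 99.7%; `within_k_sd` is a hypothetical helper.

```python
from statistics import NormalDist


def within_k_sd(k):
    """Exact share of a normal distribution within mu +/- k*sigma."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(k, round(within_k_sd(k) * 100, 1))
```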
Covariance vs. correlation: analysis of the relationship between 2 quantitative variables
The sample covariance ($s_{xy}$) measures the direction of the linear relationship between x and y.
Since the covariance is unbounded and is affected by the units of x and y, it's hard to interpret, e.g. its units are the units of both variables multiplied together [$-\infty < s_{xy} < +\infty$]

Formulas: $s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$ (sample) and $\sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$ (population)


Notes:
o If a covariance is positive, the variables move in the same direction. If a covariance is negative, the variables move in opposite directions.
o If we multiply/divide all values of one variable by a constant, the covariance is multiplied/divided by that constant, e.g. if we double one variable's observations, the covariance doubles. Adding/subtracting a constant does not change the covariance.
o The covariance between a variable and itself is that variable's variance
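A Python sketch of the sample covariance that checks the notes above on hypothetical hours-studied and exam-score data. `covariance` is an illustrative helper; Python 3.10+ also provides `statistics.covariance`, which uses the same n - 1 denominator.

```python
def covariance(x, y, sample=True):
    """Sample (n - 1) or population (N) covariance of paired data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))  # sum of products of deviations
    return cross / (n - 1) if sample else cross / n


hours = [1, 2, 3, 4, 5]       # hypothetical hours studied
score = [55, 60, 70, 75, 90]  # hypothetical exam scores
print(covariance(hours, score))                    # positive: same direction
print(covariance([2 * h for h in hours], score))   # doubling x doubles s_xy
print(covariance([h + 10 for h in hours], score))  # adding a constant changes nothing
print(covariance(hours, hours))                    # equals the sample variance of hours
```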
The correlation coefficient is a standardized covariance,
$r = \frac{s_{xy}}{s_x s_y}$ (sample) and $\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$ (population)
[$-1 \le r \le +1$]
r measures the strength of the linear relationship between x and y, i.e. the closer r is to -1 or +1, the stronger the relationship.
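A sketch of $r = s_{xy}/(s_x s_y)$ in Python on hypothetical data, checking that r stays in [-1, +1] and is unchanged by a change of units or by swapping x and y. `correlation` is an illustrative helper.

```python
import math


def correlation(x, y):
    """Sample correlation r = s_xy / (s_x * s_y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return sxy / (sx * sy)


hours = [1, 2, 3, 4, 5]       # hypothetical hours studied
score = [55, 60, 70, 75, 90]  # hypothetical exam scores
r = correlation(hours, score)
print(round(r, 4))
print(round(correlation([60 * h for h in hours], score), 4))  # minutes instead of hours: same r
print(round(correlation(score, hours), 4))                    # swapping x and y: same r
```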

Scatter Plots: used to visualize the relationship between x and y, i.e. what we use if we think there might be a dependent relationship between two variables.

[Figure: example scatter plots]
  r = -1: perfect (-) linear rel.
  r = -.50: moderate (-) linear rel.
  r = 0: no linear relationship
  r = +.85: strong (+) linear rel.
  r = +1: perfect (+) linear rel.
r measures the STRENGTH (strong/weak) and DIRECTION (+ or -) of the linear relationship between x and y.
r must have the same sign as the slope & covariance
-1 ≤ r ≤ +1
  o [closer to -1 or +1, the better]
Correlation does NOT imply causation! Just because x and y are correlated (related), it does NOT imply that changes in x cause changes in y.
Beware of spurious correlations, e.g. the correlation between ice cream sales and heat stroke.
r is NOT affected by the units of x and y, nor is it affected by what we call x and what we call y.

The 'least squares' regression line: attempts to model the relationship between x and y

[Figure: points plotted on y vs. x axes with the fitted line; the vertical gap from a point to the line is labeled 'error (aka residual)']
error = actual - predicted, i.e. $e_i = y_i - \hat{y}_i$
The errors are the vertical distances between the points and the line.

The errors of the least squares line sum to 0.

The least squares regression line MINIMIZES the sum of SQUARED errors, i.e. $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

From algebra, the equation of a line was: y = mx + b.

We denote the least squares regression line in stats as:

$\hat{y} = b_0 + b_1 x$

$b_1$ = slope & $b_0$ = y-intercept
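The sheet does not list formulas for $b_1$ and $b_0$; the standard least squares solution is $b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ and $b_0 = \bar{y} - b_1 \bar{x}$. A Python sketch on hypothetical data (`least_squares` is an illustrative helper), which also confirms that the residuals sum to 0 and that changing the slope increases SSE:

```python
def least_squares(x, y):
    """Intercept b0 and slope b1 of the least squares line y-hat = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx     # slope (equivalently s_xy / s_x^2)
    b0 = my - b1 * mx  # the line passes through (x-bar, y-bar)
    return b0, b1


hours = [1, 2, 3, 4, 5]       # hypothetical hours studied
score = [55, 60, 70, 75, 90]  # hypothetical exam scores
b0, b1 = least_squares(hours, score)
residuals = [y - (b0 + b1 * x) for x, y in zip(hours, score)]  # actual - predicted
sse = sum(e ** 2 for e in residuals)
print(b0, b1, sse)
```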


Predicting values of $\hat{y}$
In order to make a 'prediction' or 'estimation', simply plug in the requested value of x (however, do not 'extrapolate', i.e. plug in silly values for x that lie outside the range of the observed data)
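A sketch of a prediction helper that refuses to extrapolate. The line $\hat{y} = 44.5 + 8.5x$ and the observed x values are hypothetical, and treating anything outside [min x, max x] as extrapolation is an assumption.

```python
def predict(b0, b1, x_new, x_observed):
    """Plug x_new into y-hat = b0 + b1 * x, refusing to extrapolate."""
    lo, hi = min(x_observed), max(x_observed)
    if not lo <= x_new <= hi:
        raise ValueError(f"x = {x_new} is outside the observed range [{lo}, {hi}]")
    return b0 + b1 * x_new


hours = [1, 2, 3, 4, 5]  # hypothetical observed x values
print(predict(44.5, 8.5, 3.5, hours))  # hypothetical line y-hat = 44.5 + 8.5x -> 74.25
```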

Interpreting the slope and intercept
$b_1$ slope → As x increases by 1 unit, y is predicted to increase/decrease by $b_1$.
$b_0$ y-intercept → When x = 0, y is predicted to be $b_0$. [do not interpret unless x = 0 is within the range of x and it makes sense]

Probability: the study of chance, i.e. how likely events are to occur
Classical Probability: (# of ways event can occur)/(total # of equally likely outcomes) → known probability
Relative Frequency Probability: (# times event occurs)/(# trials) → estimated probability
Subjective Probability: opinion or judgment assessed → guessed probability

Types of Probabilities
Marginal Probability: probability of just one outcome, e.g. probability a randomly selected home has a pool
Joint Probability: probability of two outcomes occurring simultaneously, e.g. probability a randomly selected home has a pool and a garage
Conditional Probability: probability of an outcome occurring, if another one is known to have occurred, e.g. probability a randomly selected home has a pool given that it has a garage.

Basic rules for finding probabilities

$P(A) = \frac{\text{\# of ways event A can occur}}{\text{total \# of ways}}$, where $0 \le P(A) \le 1$ and probabilities must sum to 1
P(not A) = 1 - P(A) → the 'Complement Rule', i.e. $P(A^c) = 1 - P(A)$
$P(A \cup B) = P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$ → the 'Additive Rule'
$P(A \cap B) = P(A \text{ and } B) = P(A) \times P(B)$ → the 'Multiplicative Rule' for independent events

$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$, similarly $P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$ → conditional probabilities
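The rules above applied to the pool/garage example, sketched with Python's `fractions.Fraction` so the probabilities stay exact. The counts are hypothetical.

```python
from fractions import Fraction

# hypothetical counts for 100 randomly selected homes
total = 100
pool, garage, pool_and_garage = 30, 60, 20

p_pool = Fraction(pool, total)             # marginal
p_garage = Fraction(garage, total)         # marginal
p_both = Fraction(pool_and_garage, total)  # joint: P(pool and garage)

p_no_pool = 1 - p_pool                         # Complement Rule
p_pool_or_garage = p_pool + p_garage - p_both  # Additive Rule
p_pool_given_garage = p_both / p_garage        # conditional: P(pool | garage)
p_garage_given_pool = p_both / p_pool          # conditional: P(garage | pool)

print(p_no_pool, p_pool_or_garage, p_pool_given_garage, p_garage_given_pool)
# Multiplicative Rule check: equality would mean the events are independent
print(p_pool * p_garage == p_both)
```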

Independent Events
Two events are independent if the occurrence of one does not affect the probability of the other.
If independent, $P(A \cap B) = P(A) \times P(B)$ and $P(A \mid B) = P(A)$
If asked how many observations we would expect to see if two variables are independent, use the formula
$\text{Expected} = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$
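A Python sketch of the expected-count formula for a two-way table; the observed pool/garage counts and `expected_counts` are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts if the row and column variables are independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


# hypothetical table: rows = pool (yes, no), columns = garage (yes, no)
observed = [[20, 10],
            [40, 30]]
print(expected_counts(observed))  # [[18.0, 12.0], [42.0, 28.0]]
```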


Random Variable: a variable which represents the numerical outcomes of a random phenomenon.

There are two different types of random variables.

1. Discrete RV's: RV's assume a finite (or countably infinite) number of outcomes. (outcomes are countable)
2. Continuous RV's: RV's assume an infinite number of outcomes over an interval. (outcomes are measurable)

Discrete Random Variable Characteristics

Mean (Expected Value): $E(X) = \mu = \sum_{i} x_i P(x_i)$

Variance: $V(X) = \sigma^2 = \sum_{i} (x_i - \mu)^2 P(x_i)$
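E(X) and V(X) for a hypothetical discrete distribution, sketched in Python with exact fractions.

```python
from fractions import Fraction

# hypothetical distribution of X = number of cars in a household
dist = {0: Fraction(1, 10), 1: Fraction(3, 10), 2: Fraction(4, 10), 3: Fraction(2, 10)}

assert sum(dist.values()) == 1  # probabilities must sum to 1
mu = sum(x * p for x, p in dist.items())               # E(X) = sum of x * P(x)
var = sum((x - mu) ** 2 * p for x, p in dist.items())  # V(X) = sum of (x - mu)^2 * P(x)
print(mu, var)  # 17/10 81/100
print(float(var) ** 0.5)  # standard deviation
```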


Transforming Random Variables

Transformation of random variables is done via Expectation & Variance Operators

Operation | Mean (EV) | Variance | Standard Deviation
Multiply RV by constant, a | $E(aX) = aE(X)$ | $V(aX) = a^2 V(X)$ | $SD(aX) = \lvert a \rvert \, SD(X)$
Just a constant, a | $E(a) = a$ | $V(a) = 0$ | $SD(a) = 0$
2 functions of the same RV | $E(aX + bX) = (a + b)E(X)$ | $V(aX + bX) = (a + b)^2 V(X)$ | $SD(aX + bX) = \lvert a + b \rvert \, SD(X)$

Note: Simply put, adding/subtracting/multiplying/dividing by a constant affects the mean. However, only multiplication/division by a constant affects variance & standard deviation.
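A quick Python check of the transformation rules using exact fractions. The distribution and the constants a = 3, b = 5 are hypothetical; `expect` and `variance` are illustrative helpers, not library functions.

```python
from fractions import Fraction


def expect(dist, g=lambda x: x):
    """E[g(X)] for a discrete distribution {value: probability}."""
    return sum(g(x) * p for x, p in dist.items())


def variance(dist, g=lambda x: x):
    """V[g(X)] = E[(g(X) - E[g(X)])^2]."""
    m = expect(dist, g)
    return expect(dist, lambda x: (g(x) - m) ** 2)


# hypothetical distribution of X
dist = {0: Fraction(1, 10), 1: Fraction(3, 10), 2: Fraction(4, 10), 3: Fraction(2, 10)}
a, b = 3, 5
print(expect(dist, lambda x: a * x) == a * expect(dist))           # E(aX) = aE(X)
print(variance(dist, lambda x: a * x) == a ** 2 * variance(dist))  # V(aX) = a^2 V(X)
print(variance(dist, lambda x: x + b) == variance(dist))           # adding b leaves V unchanged
print(expect(dist, lambda x: x + b) == expect(dist) + b)           # but shifts the mean by b
print(variance(dist, lambda x: -2 * x) == 4 * variance(dist))      # a < 0: V scales by a^2, SD by |a|
print(variance(dist, lambda x: a * x + b * x) == (a + b) ** 2 * variance(dist))  # same RV twice
```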
