
Mile Karanfilov 2014

1. Experiments and events


Each realization of a set of conditions S is called an experiment. If a person cannot affect the
outcome of an experiment, then that experiment is called a passive experiment. If a person
makes experiments with a specific purpose, then those experiments are called active
experiments.
Each result of the experiment S is called an event of the experiment S. Events are denoted
by capital letters A, B, C, …
An experiment can be deterministic or non-deterministic. If the outcome of the
experiment is known in advance, then the experiment is deterministic. In the
opposite case, if an experiment can result in different outcomes even though it is repeated
in the same manner every time, it is called a non-deterministic or random experiment.
An event A of an experiment S is called a random event if the following two conditions are
satisfied:
1. The experiment S can be repeated in the same manner as often as we want;
2. The relative frequencies of the event A, in each of many series of experiments, are
approximately equal numbers. This means that if we make k series with n1, n2, …, nk
experiments, then

m1(A)/n1 ≈ m2(A)/n2 ≈ … ≈ mk(A)/nk,

where mi(A) denotes the number of appearances of A in the i-th series.
A certain event of an experiment is the event which appears in each realization of that
experiment. An impossible event of an experiment is the event which never appears in any
realization of that experiment.
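For illustration, here is a minimal Python sketch of this statistical stability (the fair die and the series sizes are arbitrary choices): the relative frequency of the event A = {an even number appears} is computed for several series, and the printed values cluster around one number, here P(A) = 0.5.

    import random

    # Event A: an even number appears when rolling a fair die.
    def relative_frequency(n_trials):
        """Relative frequency m(A)/n of A in one series of n_trials rolls."""
        hits = sum(1 for _ in range(n_trials) if random.randint(1, 6) % 2 == 0)
        return hits / n_trials

    # Several series with different numbers of experiments n1, n2, ..., nk.
    for n in (100, 1_000, 10_000, 100_000):
        print(f"n = {n:>7}: m(A)/n = {relative_frequency(n):.4f}")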

2. Sample space
Definition:
i) An outcome or elementary event of an experiment is each logical result of the experiment
which cannot be split into other events. One and only one outcome appears in each
realization of the experiment.
ii) The set of all possible outcomes of a random experiment is called the sample space of
the experiment. The sample space is denoted by Ω (the sample space can be a finite,
countably infinite, or uncountably infinite set).


3. Random events
Definition:
A random event is a subset of the sample space of a random experiment.
The intersection of the events A and B, denoted by A ∩ B (or shortly AB), is an event which
appears if and only if the events A and B appear together.
If the events A and B cannot appear in the same realization of an experiment, then they are
called mutually exclusive events. Their intersection is an impossible event, A ∩ B = ∅.
The union of the events A and B, denoted by A ∪ B, is an event which appears if and only if
at least one of the two events appears.
The complement of an event A is an event which appears if and only if the
event A does not appear. It is denoted by Ā.
The difference between two events A and B is an event which appears if and only if
the event A appears, but the event B does not.

Let A1, A2, …, An be events such that A1 ∪ A2 ∪ … ∪ An = Ω. These sets are called exhaustive sets.

If A1, A2, …, An are given events such that Ai ∩ Aj = ∅ for i ≠ j, and A1 ∪ A2 ∪ … ∪ An = Ω, then we say
that Ω is presented as a union of mutually exclusive and exhaustive events.

4. Kolmogorov axioms
Definition:
The family F is called a σ-algebra of subsets of Ω, if the following three conditions are
satisfied:
F.1. Ω ∈ F;
F.2. If A ∈ F, then Ā ∈ F;
F.3. If Ai ∈ F, i = 1, 2, …, then ∪_{i=1}^{+∞} Ai ∈ F.
Theorem:
For each σ-algebra F, the following properties are satisfied:
i) ∅ ∈ F;
ii) If Ai ∈ F, i = 1, 2, …, then ∩_{i=1}^{+∞} Ai ∈ F;
iii) If A, B ∈ F, then A ∪ B ∈ F and A ∩ B ∈ F.
Proof:
i) Using the conditions F.1 and F.2 from the definition of a σ-algebra, we obtain:
Ω ∈ F ⟹ Ω̄ ∈ F, and Ω̄ = ∅,
i.e., the impossible event is also a random event.
ii) Let A1, A2, … ∈ F. By De Morgan's law, ∩_{i=1}^{+∞} Ai is the complement of
∪_{i=1}^{+∞} Āi. Therefore, the following is obtained:
Ai ∈ F, i = 1, 2, … ⟹ (by F.2) Āi ∈ F, i = 1, 2, … ⟹ (by F.3) ∪_{i=1}^{+∞} Āi ∈ F ⟹ (by F.2) ∩_{i=1}^{+∞} Ai ∈ F.
iii) If we put A1 = A, A2 = B, Ai = ∅, i = 3, 4, … in F.3, then we have:
∪_{i=1}^{+∞} Ai = A ∪ B ∈ F.
On the other side, if we take A1 = A, A2 = B, Ai = Ω, i = 3, 4, … in statement ii) of this
theorem, then
∩_{i=1}^{+∞} Ai = A ∩ B ∈ F.
Definition:
Let F be a σ-algebra of subsets of Ω. A mapping P : F → ℝ, where ℝ is the set of real
numbers, is called a probability, if the following three conditions are satisfied:
P.1. P(A) ≥ 0, for each A ∈ F;
P.2. P(Ω) = 1;
P.3. If Ai ∈ F, i = 1, 2, … and Ai ∩ Aj = ∅, for i ≠ j, then

P(∪_{i=1}^{+∞} Ai) = Σ_{i=1}^{+∞} P(Ai).
Theorem:
The probability P has the following properties:
i) P(∅) = 0;
ii) If Ai ∩ Aj = ∅ for i ≠ j, then P(∪_{i=1}^{n} Ai) = Σ_{i=1}^{n} P(Ai);
iii) If A ⊆ B, then P(A) ≤ P(B);
iv) For each A ∈ F, 0 ≤ P(A) ≤ 1;
v) P(Ā) = 1 − P(A);
vi) P(A ∪ B) = P(A) + P(B) − P(AB).
Proof:
i) P.1 implies that P(∅) ≥ 0. Now, we can present Ω as:
Ω = Ω + ∅ + ∅ + ⋯
From P.3, we obtain:
P(Ω) = P(Ω) + P(∅) + P(∅) + ⋯ = P(Ω) + Σ_{i=1}^{+∞} P(∅).
Therefore, we obtain that Σ_{i=1}^{+∞} P(∅) = 0. Since P(∅) is non-negative, the last sum will be
0 if and only if P(∅) = 0.

ii) Let An+1 = An+2 = ⋯ = ∅. Using i) and P.3, we get that P(∪_{i=1}^{n} Ai) = Σ_{i=1}^{n} P(Ai).
iii) Since A ⊆ B, the event B can be presented as:
B = ΩB = (A + Ā)B = AB + ĀB = A + ĀB.
Now, using the property ii), we have:
P(B) = P(A) + P(ĀB) ≥ P(A),
since P(ĀB) ≥ 0.
iv) For each A ∈ F, ∅ ⊆ A ⊆ Ω. Now, from the property iii), we can conclude that
P(∅) ≤ P(A) ≤ P(Ω), i.e.,
0 ≤ P(A) ≤ 1.
v) Using the equality Ω = A + Ā and the property ii), we obtain that
P(Ω) = P(A) + P(Ā), i.e.,
1 = P(A) + P(Ā).
Therefore,
P(Ā) = 1 − P(A).
vi) The event A ∪ B can be presented as:
A ∪ B = A + ĀB.
Using ii), we obtain:
P(A ∪ B) = P(A) + P(ĀB).   (1)
On the other side, using the equality
B = ΩB = (A + Ā)B = AB + ĀB,
and ii), we have:
P(B) = P(AB) + P(ĀB).
Therefore, P(ĀB) = P(B) − P(AB). By replacing the last expression in (1), we obtain that
P(A ∪ B) = P(A) + P(ĀB) = P(A) + P(B) − P(AB).

5. Discrete probability space


Let (Ω, F, P) be a given probability space. If the sample space Ω is a discrete (finite or
countably infinite) set of outcomes, and F = P(Ω) is the family of all subsets of Ω, then
(Ω, F, P) is called a discrete probability space.


Classical definition of probability


Let Ω = {E1, E2, …, En} be a given sample space and let all outcomes have an equal probability
of appearance, i.e., P(Ei) = 1/n, i = 1, 2, …, n. If A is a random event which contains k
outcomes, then the probability of A is given by

P(A) = k/n.
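For example, the classical definition can be checked by direct counting; a minimal sketch (the die and the event are arbitrary choices):

    from fractions import Fraction

    omega = {1, 2, 3, 4, 5, 6}            # sample space of a fair die
    A = {e for e in omega if e % 3 == 0}  # event: the result is divisible by 3

    # Classical definition: P(A) = k/n, k outcomes in A out of n in omega.
    print(Fraction(len(A), len(omega)))   # 1/3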

6. Uncountable probability space


Geometrical probability
Let the sample space Ω be an infinite uncountable set which allows a geometrical
interpretation, so that the given figure (denoted by Ω) has a finite measure m(Ω) (for example, if Ω is a
curve, m(Ω) will be its length).
We suppose that m(Ω) exists and is finite. By definition, the random events are subsets
of Ω. Since m(Ω) is finite, the measure of every random event will be finite. Then the
probability of a random event A is defined by

P(A) = m(A) / m(Ω).

Here, we suppose that all outcomes have an equal probability of appearance. This means
that the probability that a point of A will appear is proportional to m(A).
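A Monte Carlo sketch of this definition (the figure is an arbitrary choice): take Ω to be the unit square and A the quarter disc x² + y² ≤ 1, so that P(A) = m(A)/m(Ω) = π/4 ≈ 0.7854.

    import random

    def estimate_quarter_disc(n=1_000_000):
        """Monte Carlo estimate of P(A) = m(A)/m(Omega) = pi/4."""
        inside = 0
        for _ in range(n):
            x, y = random.random(), random.random()  # uniform point in the unit square
            if x * x + y * y <= 1.0:                 # the point falls in the quarter disc A
                inside += 1
        return inside / n

    print(estimate_quarter_disc())                   # approximately 0.7854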

7. Conditional probability
Let (Ω, F, P) be a given probability space and let B ∈ F have a positive probability,
P(B) > 0.
The conditional probability of an event A given an event B, denoted by P(A|B), is defined by

P(A|B) = P(AB) / P(B), A ∈ F.

From the equalities

P(A|B) = P(AB) / P(B) and P(B|A) = P(AB) / P(A)

we can express the probability of the intersection of two events:

P(AB) = P(A)P(B|A) = P(B)P(A|B).

Theorem:
For arbitrary n events A1, A2, …, An, the probability of their intersection is given by
P(A1A2A3 ⋯ An−1An) = P(A1)P(A2|A1)P(A3|A2A1) ⋯ P(An|An−1 ⋯ A2A1).

This is known as the multiplication rule.
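A small numeric check of the multiplication rule (the urn model and its numbers are arbitrary choices): two balls are drawn without replacement from an urn with 3 red and 2 blue balls, and P(R1R2) = P(R1)P(R2|R1).

    from fractions import Fraction

    # P(R1) and P(R2|R1) for an urn with 3 red and 2 blue balls.
    p_r1 = Fraction(3, 5)                  # 3 of 5 balls are red
    p_r2_given_r1 = Fraction(2, 4)         # one red ball has been removed
    print(p_r1 * p_r2_given_r1)            # P(R1 R2) = 3/10 by the multiplication rule

    # Brute-force verification over all ordered pairs of distinct balls:
    balls = ["r", "r", "r", "b", "b"]
    pairs = [(i, j) for i in range(5) for j in range(5) if i != j]
    both_red = sum(1 for i, j in pairs if balls[i] == "r" and balls[j] == "r")
    print(Fraction(both_red, len(pairs)))  # 3/10 again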

8. Independence of events
Let (Ω, F, P) be a given probability space and A, B ∈ F such that P(A) > 0 and P(B) > 0.
We say that the event A is independent of the event B, if
P(A|B) = P(A).

Theorem:
The events A and B are independent iff:
P(AB) = P(A)P(B).

Theorem:
If A and B are independent events, then the pairs A and B̄, Ā and B, Ā and B̄ are
independent, also.
Proof:
If A and B are independent events, then P(AB) = P(A)P(B). Now, A = AΩ = A(B + B̄) = AB
+ AB̄, so the probability of A is given by:
P(A) = P(AB) + P(AB̄).
Using the independence of A and B, we have:
P(A) = P(A)P(B) + P(AB̄), so
P(AB̄) = P(A) − P(A)P(B) = P(A)(1 − P(B)) = P(A)P(B̄).
From the last equality we conclude that A and B̄ are independent events.
By symmetry we can conclude that Ā and B are independent. The independence of Ā and
B̄ is a consequence of the independence of the previous two pairs.

Definition:
The events A1, A2, …, An are mutually independent, if for an arbitrary k (2 ≤ k ≤ n) and for an
arbitrary selection of indexes i1 < i2 < … < ik, the following equality is satisfied:
P(Ai1 Ai2 ⋯ Aik) = P(Ai1)P(Ai2) ⋯ P(Aik).
Definition:
The events A1, A2, …, An are pairwise independent, if for arbitrary i, j ∈ {1, 2, …, n} (i ≠ j), Ai
and Aj are independent events, i.e.,
P(Ai Aj) = P(Ai)P(Aj).
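Note that pairwise independence does not imply mutual independence. A classical illustration, sketched below (the two-coin model is chosen for concreteness): for two fair coin tosses, let A = {first toss is heads}, B = {second toss is heads}, C = {both tosses give the same result}; every pair is independent, but the triple is not.

    from fractions import Fraction
    from itertools import product

    omega = list(product("HT", repeat=2))   # four equally likely outcomes

    def prob(event):
        return Fraction(len(event), len(omega))

    A = {w for w in omega if w[0] == "H"}   # first toss is heads
    B = {w for w in omega if w[1] == "H"}   # second toss is heads
    C = {w for w in omega if w[0] == w[1]}  # both tosses give the same result

    # Every pair satisfies P(XY) = P(X)P(Y):
    for X, Y in [(A, B), (A, C), (B, C)]:
        assert prob(X & Y) == prob(X) * prob(Y)

    # But the three events are not mutually independent:
    print(prob(A & B & C), prob(A) * prob(B) * prob(C))  # 1/4 versus 1/8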


9. Total probability theorem


Total probability theorem
Let Hi ∈ F, i = 1, 2, …, with Hi ∩ Hj = ∅ for i ≠ j, and ∪_{i=1}^{+∞} Hi = Ω. For the probability of an arbitrary
event A ∈ F, the following formula is satisfied:

P(A) = Σ_{i=1}^{+∞} P(Hi)P(A|Hi).

Proof:
Since Hi ∩ Hj = ∅, we have that AHi ∩ AHj = ∅. Using this, we obtain:

A = AΩ = A(∪_{i=1}^{+∞} Hi) = ∪_{i=1}^{+∞} AHi.

Now,

P(A) = P(∪_{i=1}^{+∞} AHi) = Σ_{i=1}^{+∞} P(AHi) = Σ_{i=1}^{+∞} P(Hi)P(A|Hi).
The random events Hi, i = 1, 2, … are called hypotheses.
Their probabilities P(Hi) are called prior probabilities.
Very often, we have to calculate the probabilities P(Hj|A), j = 1, 2, …, i.e., we have to
evaluate the probability of the hypotheses after the occurrence of the event A. These
conditional probabilities are called posterior probabilities. They can be calculated using
Bayes' theorem.
Bayes' theorem
Let Hi ∈ F, i = 1, 2, …, with Hi ∩ Hj = ∅ for i ≠ j, and ∪_{i=1}^{+∞} Hi = Ω.
The following formulas are satisfied:

P(Hj|A) = P(Hj)P(A|Hj) / P(A) = P(Hj)P(A|Hj) / Σ_{i=1}^{+∞} P(Hi)P(A|Hi), j = 1, 2, …

Proof:
P(Hj|A) = P(Hj A) / P(A) = P(Hj)P(A|Hj) / P(A).
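A worked posterior computation (the priors and likelihoods are made-up numbers for illustration): two hypotheses H1 and H2 with prior probabilities 0.99 and 0.01, and an observed event A with P(A|H1) = 0.05 and P(A|H2) = 0.90.

    # Hypothetical priors and likelihoods (illustrative numbers only).
    priors = [0.99, 0.01]        # P(H1), P(H2)
    likelihoods = [0.05, 0.90]   # P(A|H1), P(A|H2)

    # Total probability theorem: P(A) = sum_i P(Hi) P(A|Hi)
    p_a = sum(p * l for p, l in zip(priors, likelihoods))

    # Bayes' theorem: P(Hj|A) = P(Hj) P(A|Hj) / P(A)
    posteriors = [p * l / p_a for p, l in zip(priors, likelihoods)]
    print(p_a, posteriors)       # the posteriors sum to 1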

10. Bernoulli trials


A Bernoulli scheme of n experiments (or trials) is a sequence of n independent and identical
trials in which the appearance of only one event A is considered. Trials are independent if the
outcome of one trial has no effect on the outcome obtained in any other. Let p = P(A),
q = 1 − p, and let Bk denote the event that A appears exactly k times in the n trials. Then

Pn(k) = P(Bk) = C(n, k) p^k q^(n−k),

where C(n, k) = n!/(k!(n−k)!) is the binomial coefficient.
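The formula can be evaluated directly; a minimal sketch (n, k and p are arbitrary choices):

    from math import comb

    def bernoulli_pnk(n, k, p):
        """P_n(k) = C(n, k) * p**k * q**(n - k), with q = 1 - p."""
        q = 1.0 - p
        return comb(n, k) * p**k * q**(n - k)

    print(bernoulli_pnk(10, 3, 0.5))                          # P_10(3) for a fair coin
    print(sum(bernoulli_pnk(10, k, 0.5) for k in range(11)))  # the probabilities sum to 1.0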


11. Random variables


Definition:
A random variable X defined on the probability space (Ω, F, P) is a function X : Ω → ℝ which
is F-measurable, i.e.,
{E ∈ Ω | X(E) < x} ∈ F, for each x ∈ ℝ.
Definition:
A cumulative distribution function of a random variable X is a function F : ℝ → ℝ defined
by
F(x) = P{E ∈ Ω | X(E) < x} = P{X < x}, for each x ∈ ℝ.
Theorem:
If F(x) is a cumulative distribution function of a random variable X, then
P{a ≤ X < b} = F(b) − F(a).
Proof:
The interval (−∞, b) can be presented as the union of mutually exclusive sets in the
following way:
(−∞, b) = (−∞, a) + [a, b).
Therefore,
{X ∈ (−∞, b)} = {X ∈ (−∞, a)} + {X ∈ [a, b)},
P{X ∈ (−∞, b)} = P{X ∈ (−∞, a)} + P{X ∈ [a, b)},
P{X < b} = P{X < a} + P{X ∈ [a, b)},
i.e.,
F(b) = F(a) + P{X ∈ [a, b)}.
We obtain that P{a ≤ X < b} = F(b) − F(a).
Theorem:
A cumulative distribution function of a random variable has the following properties:
F1) It is a non-decreasing function;
F2) lim_{x→−∞} F(x) = 0, lim_{x→+∞} F(x) = 1;
F3) It is a left-continuous function at each x0 ∈ ℝ.


Theorem:
Let X be a random variable and F(x) be its cumulative distribution function. If F(x) has a
discontinuity at x0, then P{X = x0} > 0. If F(x) is a continuous function at x0, then P{X = x0} =
0.


12. Discrete random variable


A discrete random variable X is a random variable with a discrete (finite or countably
infinite) range.
The range RX of a random variable X, together with the suitable probabilities pi, i = 1, 2, …,
determines the probability mass function, usually written as the table

X: ( x1  x2  …
     p1  p2  … )

Cumulative distribution function:

F(x) = { 0,                      x ≤ x1
         p1,                     x1 < x ≤ x2
         p1 + p2,                x2 < x ≤ x3
         p1 + p2 + p3,           x3 < x ≤ x4
         …
         p1 + p2 + … + pn−1,     xn−1 < x ≤ xn
         1,                      x > xn

Relation between the probability mass function and the cumulative distribution function:

pi = P{X = xi} = lim_{x→xi+} F(x) − F(xi).

Indicator function/Bernoulli distribution


IA - the number of appearances of the event A in one experiment.
It is clear that RIA = {0, 1}, since in a single experiment the event A can appear zero times
or once.
The random variable IA is called an indicator function of the event A or a random variable
with Bernoulli distribution.
Binomial distribution
X - the number of appearances of the event A in n Bernoulli trials, where p = P(A) and q = 1
− p.
The range RX of the random variable X is RX = {0, 1, 2, …, n}, and the probabilities are

pi = P{X = i} = Pn(i) = C(n, i) p^i q^(n−i), i ∈ RX.


The random variable X has a binomial distribution with parameters n and p. It is denoted
by X ~ B(n, p).
The Bernoulli distribution/indicator function is a special case of the binomial distribution
obtained for n = 1, i.e., B(1, p).
Discrete uniform distribution
Let X be a random variable whose range RX = {x1, x2, …, xn} is a finite set and all
probabilities are equal, i.e., pi = P{X = xi} = 1/n, i = 1, 2, …, n.

Then we say that X has a uniform distribution on the set RX.

It is denoted by X ~ U({x1, x2, …, xn}).
Hypergeometric distribution
Let's consider a set of n objects such that m of them have a given property A (m < n).
From the given set of n objects, we randomly choose k objects (k < n).
X - the number of chosen objects which have the property A.
Then RX = {0, 1, …, r}, where r = min{m, k}, and

pi = P{X = i} = C(m, i) C(n − m, k − i) / C(n, k), i ∈ RX.

Geometric distribution
Let's consider a sequence of independent and identical trials until the event A appears. Let p
= P(A) and q = 1 − p. Let X denote the number of trials in the previously described
sequence. The number of trials until the event A appears can be 1 or 2 or 3 or …
Therefore, RX = {1, 2, 3, …}.
Let Ai be the event that A appears in the i-th trial, i = 1, 2, … These events are independent and P(Ai) =
p, for each i = 1, 2, … The event {X = i} will occur if the event A does not appear in the first
i − 1 trials (which means that the events Ā1, Ā2, …, Āi−1 appear) and the event A appears in the last,
i-th trial. Therefore,

pi = P{X = i} = P(Ā1 Ā2 ⋯ Āi−1 Ai) = P(Ā1)P(Ā2) ⋯ P(Āi−1)P(Ai) = q^(i−1) p, i ∈ RX.

A random variable X has a geometric distribution with parameter p, denoted by
X ~ Geo(p).
Negative binomial distribution
Let's consider a sequence of independent and identical trials until the event A appears exactly
k times. Again, p = P(A) and q = 1 − p.
X - the number of trials in the previously described sequence.
Therefore, RX = {k, k + 1, k + 2, …} and

P{X = n} = P{the event A appears k − 1 times in the first n − 1 trials and A appears in the last, n-th trial}
         = C(n − 1, k − 1) (1 − p)^(n−k) p^k.

The random variable X has a negative binomial distribution with parameters k and p,
denoted by X ~ NB(k, p).
Poisson distribution
A random variable X has a Poisson distribution with parameter λ > 0, if RX = {0, 1, 2, …} and

pi = P{X = i} = (λ^i / i!) e^(−λ), i ∈ RX.

It is denoted by X ~ P(λ). X can be described as the number of independent events
occurring in a unit of time.
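A short sketch evaluating this probability mass function (λ and the displayed range are arbitrary choices); it also checks the well-known fact that B(n, λ/n) is close to P(λ) for large n:

    from math import comb, exp, factorial

    def poisson_pmf(i, lam):
        """p_i = lam**i / i! * e**(-lam)"""
        return lam**i / factorial(i) * exp(-lam)

    def binomial_pmf(i, n, p):
        return comb(n, i) * p**i * (1 - p)**(n - i)

    lam, n = 2.0, 1000
    for i in range(5):
        print(i, poisson_pmf(i, lam), binomial_pmf(i, n, lam / n))  # close values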



13. Continuous random variable


Let x, x + Δx ∈ (a, b) and Δx > 0. If there exists the limit

p(x) = lim_{Δx→0} P{x ≤ X < x + Δx} / Δx,

then p(x) is called a probability density function of the random variable X.

If a random variable X is given by its cumulative distribution function F(x), then the
suitable probability density function can be presented as

p(x) = lim_{Δx→0} (F(x + Δx) − F(x)) / Δx = F′(x).

If p(x) is an integrable function, then from the previous equality we find that

F(x) = ∫_{−∞}^{x} p(t) dt.

Here, we suppose that p(x) = 0, for x ≤ a or x ≥ b.
Definition:
If there exists an integrable function p(x) such that the cumulative distribution function of
X can be presented as

F(x) = ∫_{−∞}^{x} p(t) dt,   (1)

then we say that X is a continuous random variable.


Theorem:
Let X be a continuous random variable given by its probability density function p(x). Then:
i) p(x) ≥ 0, for each x ∈ ℝ;
ii) ∫_{−∞}^{+∞} p(x) dx = 1;
iii) P{a < X < b} = ∫_{a}^{b} p(x) dx.
Proof:
i) Since p(x) = F′(x), and the cumulative distribution function F(x) is a non-decreasing
function, we can conclude that F′(x) ≥ 0 for each x ∈ ℝ, i.e., p(x) ≥ 0, for each x ∈ ℝ.
ii) Using that lim_{x→+∞} F(x) = 1 and the equality (1), we find that

1 = lim_{x→+∞} F(x) = lim_{x→+∞} ∫_{−∞}^{x} p(t) dt = ∫_{−∞}^{+∞} p(t) dt.

iii) For a continuous random variable X, the equality P{X = a} = 0 holds for each a ∈ ℝ.
Therefore,

P{a < X < b} = P{a ≤ X < b} = F(b) − F(a) = ∫_{−∞}^{b} p(t) dt − ∫_{−∞}^{a} p(t) dt = ∫_{a}^{b} p(t) dt.



Uniform distribution on the interval (a, b)


A random variable X has a uniform distribution on the interval (a, b) (denoted by X ~
U(a, b)), if its probability density function is constant on this interval and equal to 0
outside of the interval, i.e.,

p(x) = { C,  x ∈ (a, b)
         0,  x ∉ (a, b)

The constant C can be found using the following condition:

1 = ∫_{−∞}^{+∞} p(x) dx = ∫_{a}^{b} C dx = C(b − a).

Therefore, C = 1/(b − a), so

p(x) = { 1/(b − a),  x ∈ (a, b)
         0,          x ∉ (a, b)

The cumulative distribution function of X can be found in the following way:

F(x) = { 0,                x ≤ a
         (x − a)/(b − a),  a < x ≤ b
         1,                x > b
Exponential distribution
A random variable X has an exponential distribution with a parameter λ > 0, denoted by
X ~ E(λ), if its probability density function is given by

p(x) = { 0,           x < 0
         λ e^(−λx),   x ≥ 0

The cumulative distribution function of X is given by

F(x) = { 0,             x < 0
         1 − e^(−λx),   x ≥ 0
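As a numeric sanity check (λ and x are arbitrary choices), integrating the density λe^(−λt) over [0, x] reproduces the stated cumulative distribution function:

    from math import exp

    def exp_cdf_by_integration(x, lam, steps=100_000):
        """Numerically integrate p(t) = lam * e**(-lam * t) over [0, x] (midpoint rule)."""
        h = x / steps
        return sum(lam * exp(-lam * (i + 0.5) * h) * h for i in range(steps))

    lam, x = 0.5, 3.0
    print(exp_cdf_by_integration(x, lam))  # approximately 0.7769
    print(1 - exp(-lam * x))               # closed form F(x) = 1 - e**(-lam * x)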

Gamma distribution
A random variable X has a gamma distribution with parameters α and β, denoted by X ~
Γ(α, β), if its probability density function is given by

p(x) = { 0,                                  x < 0
         (1/(Γ(α) β^α)) x^(α−1) e^(−x/β),    x ≥ 0

where Γ(α) = ∫_{0}^{+∞} t^(α−1) e^(−t) dt is the gamma function. If α ∈ ℕ, then Γ(α) = (α − 1)!.



If we replace α = 1 and β = 1/λ in the probability density function, then it becomes the
probability density function of the exponential distribution. This means that the exponential
E(λ) distribution is a special case of the gamma distribution, obtained for α = 1
and β = 1/λ.
Normal/Gaussian distribution
A random variable X has a normal (or Gaussian) distribution with parameters a and σ², if its
probability density function is given by

p(x) = (1/(σ√(2π))) e^(−(x−a)²/(2σ²)), x ∈ ℝ,

where a ∈ ℝ and σ > 0 are given constants. We denote X ~ N(a, σ²).


Cauchy distribution
A random variable X has a standard Cauchy distribution, if its probability density function
is given by

p(x) = 1/(π(1 + x²)), x ∈ ℝ.

14. Random vectors


Definition:
Let X1, X2, …, Xn be random variables defined on the same probability space (Ω, F, P). The
n-tuple X = (X1, X2, …, Xn) is called a random vector or multidimensional random variable.
Multinomial distribution
Let n1 + n2 + … + nk = N, where ni is the number of appearances of the event Ai in N
experiments, i = 1, 2, …, k. Then

P{X1 = n1, X2 = n2, …, Xk = nk} = (N!/(n1! n2! ⋯ nk!)) p1^n1 p2^n2 ⋯ pk^nk,

where pi = P(Ai), i = 1, 2, …, k. If n1 + n2 + … + nk ≠ N, then

P{X1 = n1, X2 = n2, …, Xk = nk} = 0.

We say the random vector (X1, X2, …, Xk) has a multinomial distribution.
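The formula can be computed directly; a minimal sketch (the counts and probabilities are made-up illustrative values):

    from math import factorial

    def multinomial_pmf(counts, probs, N):
        """N!/(n1! ... nk!) * p1**n1 * ... * pk**nk; zero if the counts do not sum to N."""
        if sum(counts) != N:
            return 0.0
        coef = factorial(N)
        prob = 1.0
        for n, p in zip(counts, probs):
            coef //= factorial(n)
            prob *= p**n
        return coef * prob

    # N = 10 experiments with three possible events A1, A2, A3.
    print(multinomial_pmf([5, 3, 2], [0.5, 0.3, 0.2], 10))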
Definition:
A joint cumulative distribution function of a random vector (X, Y) is a function F : ℝ² → ℝ
defined by
F(x, y) = P({X < x} ∩ {Y < y}) = P{X < x, Y < y}, for all (x, y) ∈ ℝ².

Theorem:
If F(x, y) is a joint cumulative distribution function of the two-dimensional random vector
(X, Y), then for a < b and c < d, the following equality is satisfied:
P{a ≤ X < b, c ≤ Y < d} = F(b, d) − F(b, c) − F(a, d) + F(a, c).
Proof:
P{a ≤ X < b, c ≤ Y < d} = P{X < b, Y < d} − P{X < b, Y < c} − P{X < a, Y < d} + P{X < a, Y < c}
= F(b, d) − F(b, c) − F(a, d) + F(a, c).
Properties of a joint cumulative distribution function
If F(x, y) is a joint cumulative distribution function of a random vector (X, Y), then F has
the following properties:
FR1) F(x, y) is a non-decreasing function in each of the variables x and y;
FR2) lim_{x→−∞} F(x, y) = 0, lim_{y→−∞} F(x, y) = 0, lim_{(x,y)→(+∞,+∞)} F(x, y) = 1;
FR3) F(x, y) is a left-continuous function in each of the variables x and y.

15. Discrete random vector


Definition:
A random vector (X, Y) is discrete, if there exists a discrete (finite or countably infinite) set
R(X,Y) ⊆ ℝ² such that P{(X, Y) ∉ R(X,Y)} = 0.

Marginal distribution
The individual probability distribution of a random variable is called a marginal probability
distribution.

FX(x) = P{X < x} = P{X < x, Y < +∞} = lim_{y→+∞} F(x, y),
FY(y) = P{Y < y} = P{X < +∞, Y < y} = lim_{x→+∞} F(x, y).

Conditional distribution

pX(xi | yj) = P({X = xi} | {Y = yj}) = P({X = xi} ∩ {Y = yj}) / P{Y = yj} = p(xi, yj) / pY(yj), xi ∈ RX.

Definition:
The conditional probability mass function of X given {Y = yj} (where P{Y = yj} > 0) is
defined by

pX(xi | yj) = p(xi, yj) / pY(yj), xi ∈ RX.

Analogously, for Y:

pY(yj | xi) = p(xi, yj) / pX(xi), yj ∈ RY.

16. Continuous random vector


Definition:
If there exists an integrable function p(x, y), x, y ∈ ℝ, such that the cumulative distribution
function of (X, Y) can be presented as

F(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} p(t1, t2) dt2 dt1,

then we say that (X, Y) is a continuous random vector.
Uniform distribution
A random vector (X, Y) has a uniform distribution in the region G ⊆ ℝ², if

p(x, y) = { 1/m(G),  (x, y) ∈ G
            0,       (x, y) ∉ G

where m(G) is the area of the region G. We denote (X, Y) ~ U(G).
Marginal distributions

pX(x) = ∫_{−∞}^{+∞} p(x, y) dy,
pY(y) = ∫_{−∞}^{+∞} p(x, y) dx.
Definition:
The conditional probability density function of X given {Y = y}, for pY(y) > 0, is defined by

pX(x|y) = p(x, y) / pY(y).

The conditional cumulative distribution function of X given {Y = y} is defined by

FX(x|y) = ∫_{−∞}^{x} pX(t|y) dt, x ∈ ℝ.

Theorem:
The random variables X and Y are independent if and only if the joint cumulative
distribution function of the vector (X, Y) can be presented as a product of the marginal
cumulative distribution functions of X and Y, i.e.,
F(x, y) = FX(x)FY(y).
If (X, Y) is a continuous random vector given by its probability density function p(x, y),
then X and Y are independent if and only if
p(x, y) = pX(x)pY(y), for all x, y ∈ ℝ.

17. Functions of random variables


Theorem:
Let X be a continuous random variable which takes values in the interval (a, b) (possibly
a = −∞ or b = +∞, or both). If f is a strictly monotonic function with a
continuous first derivative on (a, b), then the probability density function of Y = f(X) is
determined by:
pY(x) = pX(f⁻¹(x)) |(f⁻¹)′(x)|.
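For illustration, a sketch of the theorem with X ~ U(0, 1) and f(x) = x² (an arbitrary choice of a strictly monotonic f on (0, 1)): here f⁻¹(y) = √y and (f⁻¹)′(y) = 1/(2√y), so pY(y) = 1/(2√y) on (0, 1), which an empirical histogram of Y = X² should match.

    import random
    from math import sqrt

    # X ~ U(0, 1), Y = f(X) = X**2; the theorem predicts p_Y(y) = 1/(2*sqrt(y)) on (0, 1).
    samples = [random.random() ** 2 for _ in range(200_000)]

    # Compare an empirical histogram with the predicted density at bin centres.
    bins = 10
    for b in range(bins):
        lo, hi = b / bins, (b + 1) / bins
        empirical = sum(lo <= y < hi for y in samples) / len(samples) / (hi - lo)
        predicted = 1.0 / (2.0 * sqrt((lo + hi) / 2.0))
        print(f"[{lo:.1f}, {hi:.1f}): empirical {empirical:.3f}, predicted {predicted:.3f}")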
Standard normal distribution
Applying the transformation Y = (X − a)/σ to the random variable X ~ N(a, σ²), we obtain the
random variable Y ~ N(0, 1), which is said to have the standard normal distribution.

Creating a new random variable Y by this transformation of X is referred to as
standardizing.
Theorem:
Let X and Y be independent random variables and f, g : ℝ → ℝ be two given functions.
Then the random variables f(X) and g(Y) are independent, too.
