P(θ) = prior
P(D | θ) = likelihood
P(θ | D) = posterior
P(D) = marginal likelihood of D (acts as a normalizing constant):
P(D) = ∫ P(D | θ) P(θ) dθ
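As a minimal sketch of these definitions (not from the slides), Bayes' rule can be worked out on a discrete grid of θ values, where P(D) becomes a sum instead of an integral. The grid values, prior, and likelihood below are all hypothetical.

```python
# Bayes' rule on a discrete grid of theta values (illustration only).
thetas = [0.2, 0.5, 0.8]          # candidate parameter values (hypothetical)
prior = [1/3, 1/3, 1/3]           # P(theta): uniform prior
likelihood = [0.1, 0.4, 0.7]      # P(D | theta) for observed data D (hypothetical)

# P(D) = sum over theta of P(D | theta) * P(theta) -- the normalizing constant
evidence = sum(l * p for l, p in zip(likelihood, prior))

# P(theta | D) = P(D | theta) * P(theta) / P(D)
posterior = [l * p / evidence for l, p in zip(likelihood, prior)]

print(round(evidence, 3))               # → 0.4
print([round(q, 3) for q in posterior]) # posterior sums to 1
```

The posterior shifts mass toward θ values with higher likelihood, while the evidence term only rescales.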
Our selected application
• To do hypothesis testing given observed data
• The expected value of the posterior has to fall within the 95% region (credible interval) of the prior distribution
• If true, the hypothesis is accepted; otherwise it is rejected
Posterior expectation
• The expected value of the posterior, E_P(θ|D)[θ]
• So, E_P(θ|D)[θ] = ∫_{−∞}^{∞} θ · P(θ | D) dθ
• Using Bayes' rule,
  = ∫_{−∞}^{∞} θ · P(D | θ) · P(θ) / P(D) dθ
• Now we have changed the distribution from the posterior to the prior:
  E_P(θ|D)[θ] = E_P(θ)[θ · P(D | θ)] / P(D)
• We assume that a known sampling method for the prior distribution exists
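The rewrite above can be sketched numerically: draw θ from the prior, weight each draw by the likelihood P(D | θ), and estimate both the numerator E_P(θ)[θ · P(D | θ)] and P(D) from the same samples. The prior N(5.5, 1), the variance 0.04, and the three toy observations are assumptions for illustration, not the slides' full setup.

```python
import math
import random

random.seed(0)

data = [5.36, 5.29, 5.58]          # toy observations (hypothetical subset)

def likelihood(theta):
    # product of N(d; theta, 0.04) densities, reading 0.04 as the variance (assumed)
    var = 0.04
    return math.prod(
        math.exp(-(d - theta) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
        for d in data
    )

N = 100_000
samples = [random.gauss(5.5, 1.0) for _ in range(N)]   # draws from the prior (assumed)

num = sum(t * likelihood(t) for t in samples) / N      # ≈ E_P(θ)[θ · P(D|θ)]
den = sum(likelihood(t) for t in samples) / N          # ≈ P(D)
posterior_mean = num / den                             # ≈ E_P(θ|D)[θ]
print(posterior_mean)
```

Only prior samples are needed, which is exactly why the change of distribution matters: no sampler for the posterior is required.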
Hypothesis testing
• We do the testing to see if the calculated expected value of the posterior falls within the 95% region of the prior distribution
• That is, to see if
  ∫_{−∞}^{value} P(θ) dθ < 0.95
  where value = E_P(θ|D)[θ]
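This acceptance test can be sketched with prior samples alone: the integral is the prior CDF at the computed value, estimated by the fraction of prior draws that fall below it. The prior N(5.5, 1) and the value 5.41 are assumptions for illustration.

```python
import random

random.seed(1)

def prior_sample():
    return random.gauss(5.5, 1.0)   # draw from the prior (assumed)

value = 5.41                        # the posterior expectation (hypothetical)
N = 200_000

# Fraction of prior samples below value ≈ ∫_{−∞}^{value} P(θ) dθ
cdf_at_value = sum(prior_sample() < value for _ in range(N)) / N

accepted = cdf_at_value < 0.95      # the slide's acceptance criterion
print(cdf_at_value, accepted)
```

Here the value sits near the middle of the prior, so the estimated CDF is well below 0.95 and the hypothesis is accepted.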
Problems
• However, we still have to solve the integrals that appear in the denominator, P(D), and in the hypothesis testing
• An analytical method may not work because a closed-form solution may not be found
• Notice that we can convert back and forth between the integrals and the expectations
• However, how can we actually solve either an integral or an expected value?
Solutions
• Monte Carlo integration (MCI) can be used to approximate an expectation/integral involving a "random" process:
  E[f(x)] ≈ (1/N) ∑_{i=1}^{N} f(xᵢ)
• We apply MCI to the hypothesis test ∫_{−∞}^{value} P(θ) dθ < 0.95
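The approximation E[f(x)] ≈ (1/N) ∑ f(xᵢ) can be demonstrated with any distribution we can sample; here x ~ N(0, 1) and f(x) = x², both chosen for illustration, so the true expectation is Var(x) = 1.

```python
import random

random.seed(2)

f = lambda x: x * x
N = 100_000

# Monte Carlo estimate: average f over N draws from the distribution of x
estimate = sum(f(random.gauss(0.0, 1.0)) for _ in range(N)) / N
print(estimate)   # close to the true value 1
```

The error shrinks like 1/√N regardless of the dimension of x, which is what makes MCI attractive for the integrals above.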
• Each observation ~ N(μ, 0.04) (normal model)
• Likelihood: ∏_{i=1}^{23} N(Dᵢ; μ, 0.04) (observations are independent)
• The 23 observations we've used are from Cavendish's data:
  5.36, 5.29, 5.58, 5.65, 5.57, 5.53, …
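The likelihood product can be evaluated directly. The sketch below uses only the six Cavendish values listed above (the slides' full 23-point data set is not reproduced here) and reads N(μ, 0.04) as a normal with variance 0.04, which is an assumption.

```python
import math

data = [5.36, 5.29, 5.58, 5.65, 5.57, 5.53]   # the six listed values only
VAR = 0.04                                     # variance of each observation (assumed)

def likelihood(mu):
    # ∏_i N(D_i; mu, VAR), observations independent
    return math.prod(
        math.exp(-(d - mu) ** 2 / (2 * VAR)) / math.sqrt(2 * math.pi * VAR)
        for d in data
    )

# For a normal model with known variance, the sample mean maximizes the likelihood.
mean = sum(data) / len(data)
print(mean, likelihood(mean) > likelihood(5.0))
```

With the full 23 observations, products of many small densities can underflow, so log-likelihoods are usually preferred in practice.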
Problem size    Time
262,140         0.041
524,280         0.080
1,048,560       0.159
2,097,120       0.317
4,194,240       0.631
8,388,480       1.261
16,776,960      2.523
33,553,920      5.076
67,107,840      10.368
134,215,680     20.516
268,431,360     40.332
• We show that the smallest block size can also be used with
the largest problem size (this would not be possible in
our previous work)
Further optimization: Loop unrolling
(* parallel reduction in the reduce kernel *)
FOR s from num_samples/2 down to 64, halving s each iteration (s /= 2)
    Sync threads (* make sure that all threads are working on the same level of the tree *)
    IF threadId is less than s THEN
        Add s_data[threadId + s] to s_data[threadId]
    END IF
END FOR
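The tree-shaped reduction in the pseudocode can be simulated sequentially to check its logic: at each level, "thread" tid adds s_data[tid + s] into s_data[tid], and s halves until the final levels (which the kernel unrolls). This is an illustration only, not the CUDA kernel itself.

```python
def tree_reduce(s_data):
    # s_data length must be a power of two for this sketch
    s = len(s_data) // 2
    while s >= 1:                # the kernel stops the loop at s == 64 and unrolls the rest
        for tid in range(s):     # only "threads" with tid < s are active at this level
            s_data[tid] += s_data[tid + s]
        s //= 2                  # move down one level of the tree
    return s_data[0]             # the partial sum ends up in element 0

print(tree_reduce(list(range(8))))  # → 28, the sum of 0..7
```

Each level halves the number of active threads, so the reduction finishes in log₂(n) steps instead of n−1 sequential additions.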
• Original version: kernel_reduce<<<…, 1>>>(…)
• Modified version: kernel_reduce<<<num_samples/num_threads, num_threads>>>(…)
Effect of further optimization
• Unfortunately, each optimization introduced on the parallel reduction seems to yield only a small gain
• We find that this is due to the other hot spot in the program that dominates the computation (that is, the time spent on Monte Carlo integration (MCI))
• We want to integrate f in [a, b]:
  I = ∫_a^b f(x) dx
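MCI handles this definite integral by sampling xᵢ uniformly from [a, b] and scaling the average of f by the interval length: I ≈ (b − a)/N · ∑ f(xᵢ). The integrand f(x) = x² on [0, 1] is chosen for illustration because the exact answer, 1/3, is easy to check.

```python
import random

random.seed(3)

def mci(f, a, b, n):
    # I = ∫_a^b f(x) dx ≈ (b - a)/n * sum of f at n uniform draws from [a, b]
    total = sum(f(random.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n

estimate = mci(lambda x: x * x, 0.0, 1.0, 100_000)
print(estimate)   # close to the exact value 1/3
```

The same routine applies unchanged to integrands with no closed-form antiderivative, which is the situation the slides face with P(D).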