
Maximum Likelihood-Like Estimators for

the Gamma Distribution

Zhi-Sheng YE and Nan CHEN

Department of Industrial & Systems Engineering

National University of Singapore, Singapore, 117576

Abstract

It is well-known that maximum likelihood (ML) estimators of the two parameters

in a Gamma distribution do not have closed forms. This poses difficulties in some ap-

plications such as real-time signal processing using low-grade processors. The Gamma

distribution is a special case of a generalized Gamma distribution. Surprisingly, two out

of the three likelihood equations of the generalized Gamma distribution can be used as

estimating equations for the Gamma distribution, based on which simple closed-form

estimators for the two Gamma parameters are available. Intuitively, the performance of the
ML-like estimators should be close to that of the ML estimators. This study confirms the
conjecture by establishing the asymptotic behaviour of the new estimators. In addi-

tion, the closed forms enable bias-corrections to these estimators. The bias-correction

significantly improves the small-sample performance.

Keywords: Estimating equations; bias-correction; generalized Gamma distribution; asymp-

totic efficiency.

1 Introduction

The Gamma distribution is a two-parameter distribution with probability density function (PDF)

f_{\mathrm{gam}}(x) = \frac{x^{k-1}}{\theta^{k}\,\Gamma(k)} \exp(-x/\theta), \qquad x > 0,    (1)

where k > 0 is the shape parameter and θ > 0 is the scale parameter. Due to the moderate

skewness, the Gamma distribution is a useful model in many areas of statistics when the

normal distribution is not appropriate. For example, it is often used to model frailty and

random effects. In queueing theory, the Gamma distribution is often used as a distribution

for waiting times and service times (Whitt 2000). It is widely used in environmetrics such

as environmental monitoring and rainfall size (Bhaumik and Gibbons 2006; Krishnamoorthy

and Tian 2008). The Gamma distribution is a useful model for lifetime data (Meeker and Escobar

1998, Chapter 5.2). It is also a popular model in signal processing (Vaseghi 2008), and

physical and biological sciences (e.g., Bhaumik et al. 2009).

The most popular method in the estimation of the two parameters in the Gamma dis-

tribution is the maximum likelihood (ML) method. Nevertheless, there are no closed-form

expressions for the ML estimators. This poses difficulties in real-time data/signal process-

ing using battery-constrained, memory- and CPU-deficient mobile hand-held devices (Song
2008). Although the moment estimators for the two Gamma parameters have closed forms,
they are not efficient under either small or large samples; see Figures 1,

2 and 3 below. In order to obtain simple yet efficient estimators for the Gamma parameters,

we need to think outside the box of the two conventional inference methods.

A model outside the box of the Gamma distribution is the generalized Gamma distribution. It is a useful extension of the Gamma distribution with PDF

f_{\mathrm{gg}}(x) = \frac{\alpha\, x^{\alpha k-1}}{\theta^{\alpha k}\,\Gamma(k)} \exp\{-(x/\theta)^{\alpha}\}, \qquad x > 0,    (2)

where α > 0 is a parameter and Γ(·) is the Gamma function. This distribution proposed by

Stacy (1962) is a flexible model that contains the Gamma and Weibull distributions as special cases and the lognormal distribution as a limiting case. Many studies have focused on parameter inference for the generalized

Gamma distribution. See Lawless (1980) and Song (2008), among others. Inference in this

distribution is generally hard. The Gamma distribution is a special case of the generalized

Gamma when α = 1. Surprisingly, two estimating equations for the Gamma distribution

can be obtained by first treating the Gamma-distributed data as if they are generalized

Gamma distributed and then obtaining the three likelihood equations based on the gener-

alized Gamma distribution. Estimators based on the two estimating equations have simple

closed forms. We show that both the small-sample performance and the asymptotic efficiency of the estimators are almost the same as those of the ML estimators.

In addition, the closed forms enable bias-corrections to these estimators, which significantly

improves the performance in terms of bias and mean squared errors (MSEs).

The paper is organized as follows. Section 2 derives the ML-like estimators for the Gamma

distribution by looking to the generalized Gamma distribution. Large-sample properties of the new estimators are investigated in Section 3. Section 4 studies bias-corrections to the new estimators and assesses their small-sample performance through simulation.

2 The New Estimators

Let X ∼ gam(k, θ) and let X_1, X_2, . . . , X_n be n i.i.d. copies of X, where k and θ are the parameters of interest and need estimation. Obviously, X ∼ gg(k, θ, α) with α = 1. For now, let us pretend that X follows the above generalized Gamma distribution with α unknown. Then

the log-likelihood function based on the observed X_1, X_2, . . . , X_n is

l_{\mathrm{gg}}(k, \theta, \alpha) = \ln\alpha - \alpha k\ln\theta - \ln\Gamma(k) + \frac{1}{n}\sum_{i=1}^{n}\left[(\alpha k - 1)\ln X_i - (X_i/\theta)^{\alpha}\right].

The likelihood equations are obtained by taking the partial derivatives of l_{\mathrm{gg}} with respect to k, θ and α, respectively:

0 = -\psi(k) - \alpha\ln\theta + \frac{\alpha}{n}\sum_{i=1}^{n}\ln X_i,    (3)

0 = -k + \frac{1}{n}\sum_{i=1}^{n}(X_i/\theta)^{\alpha},    (4)

0 = 1/\alpha + \frac{k}{n}\sum_{i=1}^{n}\ln(X_i/\theta) - \frac{1}{n}\sum_{i=1}^{n}(X_i/\theta)^{\alpha}\ln(X_i/\theta),    (5)

where \psi(x) = d\ln\Gamma(x)/dx is the digamma function. Solving the above system of equations

gives the ML estimators of (k, θ, α). In particular, from (4), we can express θ as a function of k and α:

\theta(k, \alpha) = \left(\frac{\sum X_i^{\alpha}}{nk}\right)^{1/\alpha}.

Substitute the above display into (5) to give

k(\alpha) = \frac{n\sum X_i^{\alpha}}{\alpha\left(n\sum X_i^{\alpha}\ln X_i - \sum\ln X_i\sum X_i^{\alpha}\right)}.

Now, return to the Gamma distribution. We already know that α = 1. Use this fact in the above two displays to obtain the ML-like estimators for k and θ as

\hat{k} = \frac{n\sum X_i}{n\sum X_i\ln X_i - \sum\ln X_i\sum X_i},    (6)

and

\hat{\theta} = \frac{1}{n^{2}}\left(n\sum X_i\ln X_i - \sum\ln X_i\sum X_i\right).    (7)

From the viewpoint of estimating equations, k̂ and θ̂ are obtained based on the two estimating equations (4) and (5), while the two estimating equations originate from the likelihood equations of the generalized Gamma distribution.

Another common parametrization of the Gamma distribution is to replace θ by a rate parameter β = 1/θ. Under this parametrization, we can go through the above procedure again to obtain an estimator for β as

\hat{\beta} = \frac{n^{2}}{n\sum X_i\ln X_i - \sum\ln X_i\sum X_i},    (8)

which is simply the inverse of θ̂. On the other hand, the estimator for k remains the same as (6).
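For readers who want to compute these closed-form estimators directly, the following is a minimal sketch in Python. The function name gamma_ml_like and the use of NumPy are choices made here for illustration and are not part of the paper.

```python
import numpy as np

def gamma_ml_like(x):
    """Closed-form ML-like estimators (k_hat, theta_hat, beta_hat) from i.i.d. Gamma data."""
    x = np.asarray(x, dtype=float)
    n = x.size
    s = x.sum()
    # common denominator: n * sum(X ln X) - sum(ln X) * sum(X)
    d = n * np.sum(x * np.log(x)) - np.log(x).sum() * s
    k_hat = n * s / d          # shape estimator, equation (6)
    theta_hat = d / n**2       # scale estimator, equation (7)
    beta_hat = n**2 / d        # rate estimator, equation (8)
    return k_hat, theta_hat, beta_hat

# example usage with simulated data (shape 2, scale 3)
x = np.random.default_rng(0).gamma(shape=2.0, scale=3.0, size=200)
print(gamma_ml_like(x))
```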

Since the two estimating equations for the Gamma parameters are essentially likelihood

equations of the generalized Gamma distribution, it is expected that the performance of the

proposed estimators should be similar to that of the ML estimators. In the next section, we show that the asymptotic efficiency of the proposed estimators is almost the same as that of the ML estimators.

3 Large Sample Properties

In this section, we first show that the new estimators are strongly consistent in Theorem

1. Then, the asymptotic normality is established and the asymptotic covariance matrix is

derived in Theorem 2.

Theorem 1  The estimators k̂, θ̂ and β̂ given in (6), (7) and (8) are strongly consistent estimators of k, θ and β, respectively.

Proof  Given the n i.i.d. copies of X ∼ gam(k, θ), let X̄, Ȳ, Z̄ be the empirical means of X, ln X, X ln X, respectively. The mean of X is kθ. Based on the moment generating function of ln X,

M_{\ln X}(z) = \frac{\Gamma(k+z)\,\theta^{z}}{\Gamma(k)},    (9)

the mean of ln X is ψ(k) + ln θ. To obtain E[X ln X], note that


E[X\ln X] = \int_{0}^{\infty}\frac{x^{k}\ln x}{\theta^{k}\Gamma(k)}\exp(-x/\theta)\,dx = \frac{\theta\,\Gamma(k+1)}{\Gamma(k)}\int_{0}^{\infty}\frac{x^{k}\ln x}{\theta^{k+1}\Gamma(k+1)}\exp(-x/\theta)\,dx.

The second integral is the mean of ln X under gam(k+1, θ), so the above formula implies

E[X\ln X] = k\theta\left[\psi(k+1) + \ln\theta\right].

According to the strong law of large numbers,

(\bar{X}, \bar{Y}, \bar{Z}) \to (k\theta,\; \psi(k)+\ln\theta,\; k\theta[\psi(k+1)+\ln\theta]) \quad \text{almost surely}.

Define two functions

g_1(x, y, z) = \frac{x}{z - xy}, \qquad g_2(x, y, z) = z - xy.

Both g_1 and g_2 are continuous at (x, y, z) = (kθ, ψ(k) + ln θ, kθ[ψ(k+1) + ln θ]). An application of the continuous-mapping theorem yields that

\hat{\theta} = g_2(\bar{X}, \bar{Y}, \bar{Z}) \to k\theta[\psi(k+1) - \psi(k)] \quad \text{almost surely}.

For the arguments on the right-hand side of the above display,

\psi(k+1) - \psi(k) = \frac{d}{dk}\left[\ln\Gamma(k+1) - \ln\Gamma(k)\right] = \frac{d}{dk}\ln\frac{\Gamma(k+1)}{\Gamma(k)} = \frac{d}{dk}\ln k = 1/k,

so we have θ̂ → θ almost surely. By the continuous-mapping theorem again, k̂ = g_1(X̄, Ȳ, Z̄) → k almost surely. Since β̂ = 1/θ̂, its strong consistency is immediate based on the continuous-mapping theorem.

Theorem 2  When n → ∞, the two estimators k̂ and θ̂ in (6) and (7) are asymptotically normally distributed as

\sqrt{n}\,(\hat{k} - k,\; \hat{\theta} - \theta)' \to_{d} N\!\left(\begin{pmatrix}0\\0\end{pmatrix},\; \begin{pmatrix} k^{2}[1 + k\psi_{1}(k+1)] & -k\theta[1 + k\psi_{1}(k+1)] \\ -k\theta[1 + k\psi_{1}(k+1)] & \theta^{2}[1 + k\psi_{1}(k)] \end{pmatrix}\right).    (10)

Proof  Continue with the proof of Theorem 1 and let X ∼ gam(k, θ). Then E[X] = kθ and E[X²] = kθ² + k²θ². Based on the moment generating function (9) of ln X, define two quantities:

v_k \equiv E[\ln X] = \psi(k) + \ln\theta,

u_k \equiv E[(\ln X)^{2}] = \psi_1(k) + \psi^{2}(k) + 2\psi(k)\ln\theta + \ln^{2}\theta,

where \psi_1(x) is the trigamma function, equal to d\psi(x)/dx. By making use of these two quantities, we obtain E[X\ln X] = k\theta v_{k+1}, E[(X\ln X)^{2}] = \theta^{2}k(k+1)u_{k+2}, E[X\ln^{2} X] = k\theta u_{k+1}, and E[X^{2}\ln X] = \theta^{2}k(k+1)v_{k+2}. Based on the above expectations, we can show

after tedious calculations that


\sqrt{n}\left[(\bar{X}, \bar{Y}, \bar{Z}) - (k\theta,\; v_k,\; k\theta v_{k+1})\right] \to_{d} N(\mathbf{0}_3, \Lambda),

where \mathbf{0}_3 is a zero vector with 3 elements, and

\Lambda = \begin{pmatrix}
k\theta^{2} & \theta & \theta^{2}k(1 + v_{k+1}) \\
\theta & \psi_{1}(k) & \theta[k\psi_{1}(k+1) + v_{k+1}] \\
\theta^{2}k(1 + v_{k+1}) & \theta[k\psi_{1}(k+1) + v_{k+1}] & \theta^{2}k[(k+1)u_{k+2} - k v_{k+1}^{2}]
\end{pmatrix}.

Because k̂ = g_1(X̄, Ȳ, Z̄) and θ̂ = g_2(X̄, Ȳ, Z̄), the matrix of partial derivatives of (g_1, g_2) with respect to the three arguments (x, y, z), evaluated at (x, y, z) = (kθ, ψ(k) + ln θ, kθ[ψ(k+1) + ln θ]), is

A = \begin{pmatrix}
\partial g_1/\partial x & \partial g_1/\partial y & \partial g_1/\partial z \\
\partial g_2/\partial x & \partial g_2/\partial y & \partial g_2/\partial z
\end{pmatrix}
= \begin{pmatrix}
k v_{k+1}/\theta & k^{2} & -k/\theta \\
-v_k & -k\theta & 1
\end{pmatrix}.


An application of the delta method yields that \sqrt{n}\,(\hat{k} - k,\; \hat{\theta} - \theta)' is asymptotically normally distributed with mean \mathbf{0}_2 and covariance matrix A\Lambda A'. After tedious simplifications, we can show that

A\Lambda A' = \begin{pmatrix}
k^{2}[1 + k\psi_{1}(k+1)] & -k\theta[1 + k\psi_{1}(k+1)] \\
-k\theta[1 + k\psi_{1}(k+1)] & \theta^{2}[1 + k\psi_{1}(k)]
\end{pmatrix}.

Therefore, the theorem follows.
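The "tedious simplifications" above can also be checked numerically. The following sketch evaluates Λ, A and the matrix in (10) at one arbitrary pair (k, θ) and verifies that AΛA′ agrees with (10); the specific test values and the use of SciPy's digamma/polygamma routines are illustrative choices, not part of the proof.

```python
import numpy as np
from scipy.special import digamma, polygamma

k, theta = 2.5, 1.7                                    # arbitrary test values
v = lambda a: digamma(a) + np.log(theta)               # v_a = E[ln X] under gam(a, theta)
u = lambda a: polygamma(1, a) + v(a) ** 2              # u_a = E[(ln X)^2] under gam(a, theta)

# covariance matrix Lambda of sqrt(n) * (Xbar, Ybar, Zbar)
Lam = np.array([
    [k * theta**2, theta, theta**2 * k * (1 + v(k + 1))],
    [theta, polygamma(1, k), theta * (k * polygamma(1, k + 1) + v(k + 1))],
    [theta**2 * k * (1 + v(k + 1)),
     theta * (k * polygamma(1, k + 1) + v(k + 1)),
     theta**2 * k * ((k + 1) * u(k + 2) - k * v(k + 1) ** 2)],
])

# Jacobian A of (g1, g2) at the limiting point
A = np.array([
    [k * v(k + 1) / theta, k**2, -k / theta],
    [-v(k), -k * theta, 1.0],
])

# closed-form covariance matrix stated in (10)
c = 1 + k * polygamma(1, k + 1)
Sigma = np.array([
    [k**2 * c, -k * theta * c],
    [-k * theta * c, theta**2 * (1 + k * polygamma(1, k))],
])

assert np.allclose(A @ Lam @ A.T, Sigma)               # delta-method product matches (10)
```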

We compare the asymptotic efficiency of the new estimators, the ML estimators and the

moment estimators. The ML estimators of k and θ have to be obtained by solving the likelihood equations numerically. The moment estimators of k and θ are

\hat{k}_m = \frac{\left(\sum X_i\right)^{2}}{n\sum X_i^{2} - \left(\sum X_i\right)^{2}}, \qquad \hat{\theta}_m = \frac{n\sum X_i^{2} - \left(\sum X_i\right)^{2}}{n\sum X_i}.

The asymptotic variance matrix, which is also the Cramér-Rao lower bound, for the ML estimators of (k, θ) is obtained by first deriving the Fisher information matrix and then inverting it, which gives

\frac{1}{k\psi_{1}(k) - 1}\begin{pmatrix} k & -\theta \\ -\theta & \theta^{2}\psi_{1}(k) \end{pmatrix}.    (11)

The asymptotic variance matrix for the moment estimators can be obtained through the

delta method. Figure 1 shows the asymptotic variances of the three different estimators for k and θ. Because the asymptotic variances of k̂ and θ̂/θ do not depend on θ, we fix θ = 1 and vary k over the interval [0.1, 3], as shown in Figure 1. The asymptotic variances of the moment

estimators are much higher than the others. On the other hand, the two variance curves

of the proposed estimators and the ML estimators are almost the same. Simulation in the

next section shows the same conclusion under small samples. Nevertheless, due to the simple

closed forms, the proposed estimators can be calibrated to yield smaller biases under small

samples, as shown in the next section.
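A short numerical sketch of this comparison, restricted to the new estimators and the Cramér-Rao bound (11) (the moment estimators are omitted here), can be written as follows; the grid of k values and SciPy's polygamma are again illustrative choices.

```python
import numpy as np
from scipy.special import polygamma

theta = 1.0
for k in np.linspace(0.1, 3.0, 7):
    psi1_k, psi1_k1 = polygamma(1, k), polygamma(1, k + 1)
    avar_k_new = k**2 * (1 + k * psi1_k1)            # (10): asymptotic variance of k_hat
    avar_t_new = theta**2 * (1 + k * psi1_k)         # (10): asymptotic variance of theta_hat
    crlb_k = k / (k * psi1_k - 1)                    # (11): Cramer-Rao bound for k
    crlb_t = theta**2 * psi1_k / (k * psi1_k - 1)    # (11): Cramer-Rao bound for theta
    print(f"k={k:4.2f}  k_hat: {avar_k_new:8.3f} vs CRLB {crlb_k:8.3f}   "
          f"theta_hat: {avar_t_new:6.3f} vs CRLB {crlb_t:6.3f}")
```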

Figure 1: Asymptotic variances of the new estimators, ML estimators and moment estimators under different values of k. The left panel is for k and the right panel is for θ.

4 Small Sample Properties

In this section, an unbiased estimator for the scale parameter θ is obtained by calibrating the ML-like estimator θ̂. Unbiased estimators for the rate and the shape parameters are not available. Nevertheless, we give a method to calibrate the corresponding ML-like estimators by comparing the exact covariance and the asymptotic covariance between each of them and θ̂. A Monte Carlo simulation is used to show the good performance of the calibrated estimators in terms of bias and MSEs.

4.1 Bias correction

Theorem 3  An unbiased estimator for the scale parameter θ is

\tilde{\theta} = \frac{n}{n-1}\hat{\theta} = \frac{1}{n(n-1)}\left(n\sum X_i\ln X_i - \sum\ln X_i\sum X_i\right),

while an unbiased estimator for 1/k is

\tilde{k}^{-1} = \frac{n}{n-1}\hat{k}^{-1} = \frac{n\sum X_i\ln X_i - \sum\ln X_i\sum X_i}{(n-1)\sum X_i}.

Proof  First, express θ̂ as

\hat{\theta} = \frac{1}{n^{2}}\left[(n-1)\sum_{i=1}^{n} X_i\ln X_i - \sum_{i\neq j} X_i\ln X_j\right].

Note that the X_i are i.i.d. gam(k, θ), and that X_i and ln X_j are independent when i ≠ j. According to the proof of Theorem 1, E[X ln X] = kθ[ψ(k+1) + ln θ], E[X] = kθ and E[ln X] = ψ(k) + ln θ. Direct calculation yields

E[\hat{\theta}] = \frac{1}{n^{2}}\left\{(n-1)nk\theta[\psi(k+1) + \ln\theta] - n(n-1)k\theta[\psi(k) + \ln\theta]\right\}.

Simplify the above display to give

E[\hat{\theta}] = \frac{n-1}{n}\theta.

Therefore, an unbiased estimator for θ is \tilde{\theta} = \frac{n}{n-1}\hat{\theta}.

On the other hand, note that k̂ in (6) can be expressed as

\hat{k} = \frac{n}{\,n\sum_{i}\frac{X_i}{\sum_{j} X_j}\ln\frac{X_i}{\sum_{j} X_j} - \sum_{i}\ln\frac{X_i}{\sum_{j} X_j}\,}.

This expression shows that k̂ depends on the data only through the scale-free ratios X_i/Σ_j X_j, so k̂ is independent of the scale parameter θ. Based on the results in Pitman (1937, Section 6), k̂ is independent of Σ_i X_i. Therefore,

E\!\left[\frac{n\sum X_i}{\hat{k}}\right] = E\!\left[n\sum X_i\right]E[\hat{k}^{-1}] = n^{2}k\theta\, E[\hat{k}^{-1}].

But based on (6), the left-hand side of the above display is equal to E\!\left[n\sum X_i\ln X_i - \sum\ln X_i\sum X_i\right], which is equal to n(n-1)\theta. Therefore, E[\hat{k}^{-1}] = \frac{n-1}{n}k^{-1}, and an unbiased estimator for k^{-1} is then \frac{n}{n-1}\hat{k}^{-1}.

Next, we will show that the estimator k̂ can be calibrated to yield a smaller bias. First note that, since k̂θ̂ = X̄,

\mathrm{cov}(\hat{k}, \hat{\theta}) = E[\hat{k}\hat{\theta}] - E[\hat{k}]E[\hat{\theta}] = k\theta - \frac{n-1}{n}\theta\, E[\hat{k}].

On the other hand, Theorem 2 suggests that the asymptotic covariance between k̂ and θ̂ is

\mathrm{Acov}(\hat{k}, \hat{\theta}) = -k\theta[1 + k\psi_1(k+1)]/n.

Equate the previous two displays to yield

E[\hat{k}] = \frac{nk + k[1 + k\psi_1(k+1)]}{n-1}.

If we expand \psi_1(\cdot) as a Laurent series (Abramowitz and Stegun 1972, Eqn. 6.4.12) and keep the first term only, the right-hand side can be approximated by \frac{n+2}{n-1}k. Therefore, a bias-corrected estimator for k can be

\tilde{k} = \frac{n-1}{n+2}\hat{k} = \frac{n(n-1)\sum X_i}{(n+2)\left[n\sum X_i\ln X_i - \sum\ln X_i\sum X_i\right]}.

Figure 2: Absolute values of the biases (thin lines) and the rMSEs (bold lines) of the new estimators, the calibrated estimators, the ML estimators and the moment estimators when the sample size is n = 20. The left panel is for k, the middle is for θ and the right for β.

Similarly, by looking into the covariance and the asymptotic covariance between β̂ and θ̂, a bias-corrected estimator for the rate parameter can be obtained as

\tilde{\beta} = \frac{n-1}{n+2}\hat{\beta} = \frac{n^{2}(n-1)}{(n+2)\left[n\sum X_i\ln X_i - \sum\ln X_i\sum X_i\right]}.
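In the same spirit as the earlier sketch, the bias-corrected estimators of this section can be coded directly; the function name is again an illustrative choice and not from the paper.

```python
import numpy as np

def gamma_ml_like_corrected(x):
    """Bias-corrected closed-form estimators (k_tilde, theta_tilde, beta_tilde)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = n * np.sum(x * np.log(x)) - np.log(x).sum() * x.sum()
    theta_tilde = d / (n * (n - 1))                    # unbiased for theta (Theorem 3)
    k_tilde = n * (n - 1) * x.sum() / ((n + 2) * d)    # bias-corrected shape estimator
    beta_tilde = n**2 * (n - 1) / ((n + 2) * d)        # bias-corrected rate estimator
    return k_tilde, theta_tilde, beta_tilde
```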

4.2 Simulation

A simulation is used to assess the performance of the proposed estimators and the effects of calibration. Because the variance of k̂ and the asymptotic variance of θ̂/θ are independent of θ, we set θ = 1 in the simulation and vary k from 0.2 to 5. We consider two sample sizes, n = 20 and n = 50. The results under different sample sizes give the same conclusion. Under each sample size, the absolute biases and root MSEs (rMSEs) of the different estimators of k, θ and β are obtained based on 100,000 simulation replications.
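A compact sketch of this Monte Carlo comparison for the shape parameter is given below; it uses far fewer replications than the 100,000 used for the figures, and scipy.stats.gamma.fit with floc=0 as the ML routine, both of which are choices made here for illustration.

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(1)
k_true, n, reps = 2.0, 20, 2000                      # theta fixed at 1
est = {"new": [], "calibrated": [], "MLE": []}
for _ in range(reps):
    x = rng.gamma(k_true, 1.0, size=n)
    d = n * np.sum(x * np.log(x)) - np.log(x).sum() * x.sum()
    est["new"].append(n * x.sum() / d)                               # k_hat, equation (6)
    est["calibrated"].append(n * (n - 1) * x.sum() / ((n + 2) * d))  # bias-corrected k
    est["MLE"].append(gamma.fit(x, floc=0)[0])                       # ML estimate of the shape
for name, vals in est.items():
    vals = np.asarray(vals)
    print(f"{name:>10}: |bias| = {abs(vals.mean() - k_true):.3f}, "
          f"rMSE = {np.sqrt(np.mean((vals - k_true) ** 2)):.3f}")
```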

Figure 3: Absolute values of the biases (thin lines) and the rMSEs (bold lines) of the new estimators, the calibrated estimators, the ML estimators and the moment estimators when the sample size is n = 50. The left panel is for k, the middle is for θ and the right for β.

The results are shown in Figures 2 and 3. According to the results, the performance of the proposed estimators k̂ and θ̂, in terms of biases and rMSEs, is almost the same as that of the ML estimators. The bias calibration to k̂, θ̂ and β̂ significantly reduces

their biases and improves the performance of these estimators. On the other hand, the

moment estimators always have larger biases and rMSEs. It is interesting to observe that

the unbiased estimator θ̃ has a larger rMSE compared with θ̂. This is because the weight n/(n−1) used in the calibration of θ̂ is larger than 1. The calibration decreases the bias

but increases the variance. The increase in the variance overtakes the decrease in the bias,

leading to an increase in the rMSE.

References

Abramowitz, M. and Stegun, I. A. (1972), Handbook of Mathematical Functions: With Formulas, Graphs, and Mathematical Tables, no. 55, Courier Dover Publications.

Bhaumik, D. K. and Gibbons, R. D. (2006), "One-sided approximate prediction intervals for at least p of m observations from a gamma population at each of r locations," Technometrics, 48(1), 112-119.

Bhaumik, D. K., Kapur, K., and Gibbons, R. D. (2009), "Testing parameters of a gamma distribution for small samples," Technometrics, 51(3), 326-334.

Krishnamoorthy, K. and Tian, L. (2008), "Inferences on the difference and ratio of the means of two inverse Gaussian distributions," Journal of Statistical Planning and Inference, 138(7), 2082-2089.

Lawless, J. F. (1980), "Inference in the generalized gamma and log gamma distributions," Technometrics, 22(3), 409-419.

Meeker, W. Q. and Escobar, L. A. (1998), Statistical Methods for Reliability Data, John Wiley & Sons.

Pitman, E. J. (1937), "The closest estimates of statistical parameters," in Mathematical Proceedings of the Cambridge Philosophical Society, Cambridge Univ Press, vol. 33, pp. 212-222.

Song, K.-S. (2008), "Globally convergent algorithms for estimating generalized gamma distributions in fast signal and image processing," IEEE Transactions on Image Processing, 17(8), 1233-1250.

Stacy, E. W. (1962), "A generalization of the gamma distribution," The Annals of Mathematical Statistics, 1187-1192.

Vaseghi, S. V. (2008), Advanced Digital Signal Processing and Noise Reduction, John Wiley & Sons.

Whitt, W. (2000), "The impact of a heavy-tailed service-time distribution upon the M/GI/s waiting-time distribution," Queueing Systems, 36(1-3), 71-87.
