
Energy Vol. 13, No. 1, pp. 63-72, 1988
Printed in Great Britain. All rights reserved
0360-5442/88 $3.00 + 0.00
Copyright © 1988 Pergamon Journals Ltd

CHARACTERIZATION OF INSOLATION DATA FOR USE
IN PHOTOVOLTAIC SYSTEM ANALYSIS MODELS

S. RAHMAN, M. A. KHALLAT and Z. M. SALAMEH

Electrical Engineering Department, Virginia Polytechnic Institute, Blacksburg, VA 24061 and
Electrical Engineering Department, University of Lowell, Lowell, Massachusetts, U.S.A.

(Received 20 January 1987)

Abstract-A statistical technique to characterize insolation data for use in photovoltaic (PV) systems
is presented. We start by examining the frequency distribution of long-term insolation data. The
histogram is generated for observed insolation for a particular hour over a month for a number of
years. It is fitted to three distributions (Weibull, β and log normal). Four goodness-of-fit criteria are
employed in checking the best fit. These are Chi-square, Kolmogorov-Smirnov, Cramer-Von
Mises-Smirnov, and log-likelihood. SOLMET data from Sterling, Va, Raleigh-Durham, N.C. and
Miami, Fla are analyzed. It is found that the β distribution fits the long-term hourly global horizontal
insolation data best for these three southeastern U.S. locations.

1. INTRODUCTION

There are a number of photovoltaic (PV) systems analysis models that have been used for a wide
range of design and economic analyses, and policy studies. These models provide a broad set of
capabilities to be used for characterizing PV technologies and systems. Some of these
capabilities include modeling of cell characteristics, module characteristics, array characteristics, orientation and geometric characteristics, power conditioning characteristics, plant level
characteristics, operation and maintenance characteristics, and site specific characteristics. One
factor that is missing from this long list of characteristics is the characterization of weather
data. It is known that the cell performance is strongly dependent on insolation level, ambient
temperature and wind speed (in order of importance). Existing PV system analysis models use
historical weather data (mostly only insolation level) to calculate the cell or system
performance in a deterministic way. These may be termed as point analysis models in that the
PV system output is a function of known values of weather parameters at a particular site and
time. Therefore, these point analysis models would not be very useful in evaluating the
potential of PV systems as the expected weather variations over time cannot be reflected.
We attempt to close this gap by providing a technique to characterize the weather data for
use in PV system models. Of the three weather variables mentioned earlier, insolation level is
most closely related to the PV output with ambient temperature and wind speed being
responsible for second-order effects only. This paper deals with characterizing only the
insolation data for use in photovoltaic system design and analysis models. Of course, the wind
speed and ambient temperature data can also be characterized in the same manner.
The technique of characterizing the insolation level starts with examining the frequency
distribution of the data. This distribution provides clues concerning the probable distribution(s)
that may fit the data. Subsequently, the data are analyzed to check for the best fit among
several distributions. A number of goodness-of-fit tests are performed in the process of
choosing the best distribution. The distribution parameters are then calculated for the available
data. SOLMET data tapes for Sterling (Va), Raleigh-Durham (N.C.), and Miami (Fla) were
used to test our technique of insolation characterization.

2. BACKGROUND INFORMATION

There are numerous publications on photovoltaic system models for evaluating electricity
generation. The great majority of these have a deterministic approach. Thus, these models
generally use the known value for insolation, temperature and cell characteristics to determine
the PV power output. In a few cases, however, the probabilistic nature of the variability in
insolation levels has been considered in connection with PV performance evaluation models.
Several of these studies will be discussed here.
Harper and Percival1 use historical weather data to develop probability information for the
direct and global insolation for each hour of a typical day of each month. They determine the
number of times the reported insolations lie within various ranges of insolation bins and thus
calculate their probability of occurrence. Thus for a given hour of a day a range of insolation
values would be chosen and their corresponding probabilities would be found. This approach
may be adequate when the only consideration is to quantify the nature of irradiance at a given
site. If, however, one wants to characterize the insolation data for use in photovoltaic
performance analysis models, then the data have to be reduced into functions that can be
integrated into these models.
Ku et al.2 have suggested that it is desirable to represent PV generation data as a single
normal distribution to facilitate the merging of PV generation with electric system load. They
also report that their insolation data analysis indicates a bimodal normal distribution, one for
sunny periods and another for cloudy periods. However, they do not show how to determine
whether long-term insolation data fit any other known (but perhaps more complex)
distribution. For utility-system production-cost simulations, the PV generation data were
represented by daily averages.
The PVFORM system analysis program,3 developed at the Sandia National Laboratory,
simulates the hourly performance of a PV flat-plate system for a one year period. This code is
based on modeling the plane-of-array insolation, thermal, and power production functions of a
PV system. A Typical Meteorological Year (TMY) tape is used for the insolation (both global
horizontal and direct normal), temperature and wind speed data required by the model.
However, the TMY data are not designed to provide probabilistic information for the following
reasons.
There are 234 locations in the US for which TMY4 data are available. However, the TMY
data for only 26 locations were obtained from the rehabilitated SOLMET stations for which
insolation field measurements exist. Insolation data for the other 208 locations are synthesized
by using cloud-cover data and theoretical insolation values at solar noon under clear-sky
conditions. Furthermore, as the TMY tapes are constituted from SOLMET data, the
barometric pressure, temperature, and wind speed and direction data are scanned on a
month-by-month basis, and missing data are replaced by linear interpolation.4
The process of developing the TMY involved selecting a typical meteorological month
(TMM) for each of the 12 calendar months from the 23 yr data (from SOLMET tapes) and
synthesizing the typical year. Thus TMY data may be adequate for comparing two PV systems
for the same site, but the details necessary for predicting PV generator performance will not be
available from there.
It is possible to get different probability distributions for insolation, temperature and wind
speed. Even for the same variable (e.g. insolation) the probability distribution parameters will
vary from one hour to another. Thus it is believed that, if long-term field data are available
for a site, the approach presented in this paper will give a better prediction of PV system
performance.
In the following section we discuss how the insolation data from various sites are analyzed to
determine probability distribution functions that can be utilized in PV performance analysis
models.

3. INSOLATION HISTOGRAMS

In order to characterize the insolation data for various times during the day, long-term
observations of insolation levels for the same hour of the day (for a large number of days) are
examined. Histograms are plotted to check the type of variations encountered in the global
horizontal insolation data for the same hour on different days. In Figs. 1-6, six such histograms
are presented. These are due to insolation data for the same hour of the day in a particular
month for five years. Plots are shown for 12 noon and 4 p.m. readings in March for Sterling
(Va), Raleigh-Durham (N.C.), and Miami (Fla). For example, Fig. 1 shows the histogram for
the 12 noon insolation for all days during the month of March for 5 yr for Sterling (Va).

Figs. 1-6. Histograms of frequency versus midpoint insolation.

These histograms are plotted such that the full spectrum of insolation data is represented.
For example in Fig. 3 we see the midpoint insolation of 200 W/m2 has a frequency of 9. That
implies there are 9 observations between 100 and 300 whose midpoint is 200. Accordingly the
midpoint insolation of 600 W/m2 represents the 15 data points between 500 and 700.
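The class-midpoint convention described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `midpoint_histogram`, the 200 W/m2 class width (taken from the Fig. 3 example) and the sample readings are assumptions, not values from the SOLMET tapes.

```python
from collections import Counter


def midpoint_histogram(values, width=200.0, first_midpoint=200.0):
    """Count observations per class and key each class by its midpoint."""
    lower_edge = first_midpoint - width / 2.0
    counts = Counter()
    for v in values:
        # 0-based class number: 100-300 -> 0, 300-500 -> 1, 500-700 -> 2, ...
        index = int((v - lower_edge) // width)
        counts[first_midpoint + index * width] += 1
    return dict(sorted(counts.items()))


# Hypothetical hourly readings in W/m2 (illustration only).
readings = [120.0, 180.0, 250.0, 299.0, 310.0, 520.0, 640.0, 699.0]
hist = midpoint_histogram(readings)
print(hist)
```

Each key of `hist` is a class midpoint and each value is the number of readings falling within half a class width on either side of it.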
The insolation data are obtained from the SOLMET tapes for Sterling (Va), Raleigh-Durham
(N.C.), and Miami (Fla). These reflect the average global horizontal radiation (in
W/m2) over the hour represented. It should be noted that we have shown the March data for
three stations as samples only. Data for other hours and months were also examined and found
to follow similar trends.
A look at these histograms clearly shows the variations of the insolation levels for the same
hour in different seasons in different locations. Shapes of these histograms, however, suggest
that their relationships can be exploited to fit some form of frequency distribution. This would
be an improvement over the usual approach of predicting the performance (and often life cycle
analysis) of photovoltaic systems by directly using the observed data for a set of hours. The
technique of fitting distributions is discussed in the following section.

4. FITTING DISTRIBUTIONS

The appropriate probability distribution model needed to describe a random phenomenon,
like the insolation level, is difficult to discern. There are, however, occasions when the required
distribution is determined empirically. Alternatively an assumed distribution is accepted or
rejected on the basis of observed data. Histograms often provide clues for initially selecting
such distributions. This is the approach we have taken for fitting a distribution to observed
insolation data.
Histograms shown in Figs. 1-6 suggest that insolation data, for the same hour over a number
of days, would have a form of bimodal probability distribution function. Furthermore, it is
generally seen that there would be a concentration of observations at the lower end of the
spectrum, and another concentration at the higher end of the spectrum. There are of course
seasonal and climatological factors that will determine the exact nature of the frequency
distributions.
An examination of the histograms in Figs. 1-6 suggests that superposition of two known
probability distribution functions would closely replicate these frequency distributions.
Accordingly, the insolation measurements are separated into two groups such that a known frequency
distribution would fit each group of data. This grouping is done visually by looking at the
histograms in Fig. 1. Sometimes it may be necessary to group some of the insolation
measurements (for the same hour) into three parts for a better fit. Then the distribution
functions are weighted according to break points. For example, if we determine that the first
1/3 of the insolation observations have a probability distribution function of F(x) and the other
2/3 have a distribution function of G(x), then the entire data set would be represented by

$$P\{p \le x\} = \frac{F(x)}{3} + \frac{2G(x)}{3}, \tag{1}$$

where

$$F(x) = P\{x \ge m\} \quad \text{and} \quad G(x) = P\{m \ge x\}.$$
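The weighting in Eq. (1) can be illustrated as follows. The sketch below assumes that the break point is chosen by inspection, that the weights are the fractions of observations on either side of it, and it uses empirical cumulative distributions as stand-ins for the fitted F(x) and G(x). All function names and data values are hypothetical.

```python
def split_at(values, break_point):
    """Separate observations into a low group and a high group."""
    low = [v for v in values if v <= break_point]
    high = [v for v in values if v > break_point]
    return low, high


def empirical_cdf(sample):
    """Return a function giving the fraction of the sample <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        return sum(1 for v in ordered if v <= x) / n

    return cdf


def mixture_cdf(cdfs, weights):
    """Weighted superposition of group distributions, as in Eq. (1)."""
    def cdf(x):
        return sum(w * f(x) for f, w in zip(cdfs, weights))

    return cdf


# Hypothetical readings in W/m2: 3 low values and 6 high values.
readings = [100.0, 150.0, 200.0, 500.0, 550.0, 600.0, 650.0, 700.0, 750.0]
low, high = split_at(readings, 300.0)
weights = [len(low) / len(readings), len(high) / len(readings)]  # 1/3 and 2/3
combined = mixture_cdf([empirical_cdf(low), empirical_cdf(high)], weights)
print(combined(300.0), combined(600.0), combined(750.0))
```

In practice the two empirical functions would be replaced by the fitted distribution functions of each group; the weights stay the same.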

Figures 7-10 represent the histograms for the SOLMET insolation data for Raleigh-Durham
after the two-part grouping. For example, Figs. 7 and 8 represent the distribution functions of
12 noon insolation data in March for Raleigh-Durham in two groups. Each distribution (e.g.
the data in Fig. 7) is then tested for the best fit as described in the following.
Once it is determined that a known distribution would fit the observed data, several
distributions are tested for the best fit. It is observed that superposition of two distribution
functions of the same type would closely replicate the frequency distributions of the insolation
data as shown by these histograms. The data are divided into two groups, each having a
unimodal distribution. The three most likely probability distributions are then tested for the best
fit. These are log normal, Weibull and β. The predicted and observed class frequencies are
then compared by using four measures of fit. These are Chi-square, Kolmogorov-Smirnov,
Cramer-Von Mises-Smirnov and log-likelihood. All statistics of fit are computed using
estimated parameters of the distributions with the data grouped into classes.

Figs. 7-10. Histograms of frequency versus midpoint insolation for the Raleigh-Durham data after the two-part grouping.

Clearly this approach is different from the one using typical meteorological year (TMY) data
mentioned by Menicucci.3 The problem that may be encountered in using TMY data for PV
system performance analysis has already been discussed in Sec. 2.
In the following section we discuss briefly the distributions that have been considered for
fitting the insolation data.
4.1. Choice of distributions
The three distributions chosen to fit the insolation data will now be discussed.

Log normal distribution. The probability density function is given by

$$f(x) = \frac{1}{x\sqrt{2\pi c}} \exp\left[-\frac{1}{2}\left(\frac{\ln x - k}{\sqrt{c}}\right)^{2}\right], \tag{2}$$

where mean (μ) = exp[k + (1/2)c], variance (σ²) = [exp(c) - 1] × exp[2k + c]. This distribution has two parameters k and c.
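As a quick numerical check of the moment expressions quoted for Eq. (2), the following sketch assumes that k and c are the mean and variance of ln x, which is the reading under which the stated mean and variance hold. The parameter values and the simple midpoint-rule integrator are illustrative assumptions.

```python
import math


def lognormal_pdf(x, k, c):
    """Log normal density with k = mean and c = variance of ln(x)."""
    z = (math.log(x) - k) ** 2 / (2.0 * c)
    return math.exp(-z) / (x * math.sqrt(2.0 * math.pi * c))


def lognormal_mean(k, c):
    return math.exp(k + 0.5 * c)


def lognormal_variance(k, c):
    return (math.exp(c) - 1.0) * math.exp(2.0 * k + c)


def integrate(f, lo, hi, steps=100000):
    """Midpoint-rule quadrature; adequate for this smooth integrand."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


k, c = 0.0, 0.25  # illustrative parameters, not fitted values
numeric_mean = integrate(lambda x: x * lognormal_pdf(x, k, c), 1e-9, 20.0)
print(numeric_mean, lognormal_mean(k, c))
```

The numerically integrated mean should agree with exp[k + (1/2)c] to within the quadrature error.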
Weibull distribution. The probability density function is given by

$$f(x) = \frac{k}{c}\left(\frac{x}{c}\right)^{k-1} \exp\left[-\left(\frac{x}{c}\right)^{k}\right], \qquad 0 < x < \infty, \tag{3}$$

where c = scale factor and k = shape factor. This is also a two-parameter distribution and has
also been used to characterize wind speed data. A sample Weibull distribution is shown in
Fig. 11.

Fig. 11. Sample Weibull distribution.
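A minimal sketch of Eq. (3) follows, assuming the usual two-parameter Weibull form in which the shape factor k also appears as the exponent inside the exponential. The cumulative distribution 1 - exp[-(x/c)^k] is a standard result added here only for checking; the parameter values are hypothetical.

```python
import math


def weibull_pdf(x, k, c):
    """Weibull density with shape factor k and scale factor c."""
    return (k / c) * (x / c) ** (k - 1.0) * math.exp(-((x / c) ** k))


def weibull_cdf(x, k, c):
    """Standard Weibull cumulative distribution (not given in the text)."""
    return 1.0 - math.exp(-((x / c) ** k))


k, c = 2.0, 500.0  # illustrative shape and scale (scale in W/m2)
x, h = 300.0, 1e-3
# A central difference of the CDF should reproduce the density.
slope = (weibull_cdf(x + h, k, c) - weibull_cdf(x - h, k, c)) / (2.0 * h)
print(slope, weibull_pdf(x, k, c))
```

At x = c the cumulative probability is 1 - exp(-1), about 0.632, whatever the shape factor, which gives a simple interpretation of the scale factor.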
β distribution. A probability distribution appropriate for a random variable whose values are
bounded between finite limits a and b is the β distribution. Its density function is given by

$$f(x) = \frac{1}{B(k, c)} \, \frac{(x-a)^{k-1}(b-x)^{c-1}}{(b-a)^{k+c-1}}, \qquad a \le x \le b, \tag{4}$$

where k and c are the parameters of the distribution. Also

$$B(k, c) = \int_0^1 x^{k-1}(1-x)^{c-1}\,dx, \qquad B(k, c) = \frac{\Gamma(k)\,\Gamma(c)}{\Gamma(k+c)}.$$

The mean and variance of the β distribution in Eq. (4) are

$$\mu_x = a + \frac{k}{k+c}(b-a), \qquad \sigma_x^2 = \frac{kc}{(k+c)^2(k+c+1)}(b-a)^2.$$

When the values of the variate are limited between 0 and 1.0 (i.e. a = 0.0, b = 1.0), Eq. (4)
can be called the standard β distribution. Figure 12 shows6 the standard β density function with
different values of k and c. Since this distribution is bounded between two finite limits, it is
expected to be able to replicate the random nature of the insolation levels for any given hour
over a number of days.
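The density of Eq. (4) and its mean and variance can be evaluated directly, as in the sketch below. The function names and the example values (k = 2, c = 3, limits of 0 and 1000 W/m2) are hypothetical and chosen only so that the results are easy to check by hand.

```python
import math


def beta_function(k, c):
    """B(k, c) expressed through gamma functions."""
    return math.gamma(k) * math.gamma(c) / math.gamma(k + c)


def beta_pdf(x, k, c, a=0.0, b=1.0):
    """Beta density on the interval (a, b); zero outside it."""
    if not a < x < b:
        return 0.0
    numerator = (x - a) ** (k - 1.0) * (b - x) ** (c - 1.0)
    return numerator / (beta_function(k, c) * (b - a) ** (k + c - 1.0))


def beta_mean(k, c, a=0.0, b=1.0):
    return a + k / (k + c) * (b - a)


def beta_variance(k, c, a=0.0, b=1.0):
    return k * c / ((k + c) ** 2 * (k + c + 1.0)) * (b - a) ** 2


# Illustrative parameters: insolation bounded between 0 and 1000 W/m2.
mean = beta_mean(2.0, 3.0, 0.0, 1000.0)
variance = beta_variance(2.0, 3.0, 0.0, 1000.0)
density = beta_pdf(400.0, 2.0, 3.0, 0.0, 1000.0)
print(mean, variance, density)
```

For an hourly insolation series, a natural choice would be a = 0 and b equal to the largest value that can occur at that hour, but that choice is an assumption on our part.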

Fig. 12. Standard β density function for different values of k and c.

The four goodness-of-fit criteria that have been utilized in determining the best fit for the
distribution of insolation data are briefly discussed in the Appendix.

5. RESULTS AND DISCUSSION

Results obtained by applying the four goodness-of-fit criteria to determine the best
distribution fit (out of the three examined) for the grouped data are presented and discussed in
this section. Before fitting the distribution, the data are grouped into sets which have single
modes. As a sample we present the distribution fit for the March data obtained from the
Raleigh-Durham, NC SOLMET tape. Figures 9-10 show the two part grouping for the 4 p.m.
data during that month.
These two groups of data are separately fitted to three distributions, namely: log normal,
Weibull and β. In Table 1, we present the D-statistics for the Kolmogorov-Smirnov (K-S) test
for the three distributions tested for the group 1 data for all four hours. These D-statistics
are compared against the critical value of D for a 1% significance level (α = 0.01) for the
sample sizes as shown. It is seen that for three out of four hours the D-statistics for the beta
distribution are below the critical values. It therefore passes the K-S tests at this significance
level for those hours. The sample size, and the fact that for the first group of data there are wide variations in
the level of insolation, may have contributed to the narrow margin in meeting the test. It is
clear, however, that the beta distribution provides the best fit in comparison with Weibull or log
normal distributions, because the K-S tests are not successful for those distributions.
The information presented in Table 2 is similar to that of Table 1, except that group 2 data
have been used. Since our sample sizes are greater we have performed the K-S test for a 5%
significance level (α = 0.05). It is observed that the beta distribution best satisfies the K-S test for
all four hours.
In Table 3, we present the k and c parameters for the β distribution for the group 1 data. The
number of observations and the maximum values of the observations at those hours are also
listed.
Table 1. Analysis of the K-S test for group 1 data.

Time,    No. of          Critical    D-statistic
hour     observations    Dα          Log normal    Weibull    Beta
10:00    106             0.1583      0.2809        0.2535     0.1922
13:00    94              0.1681      0.2776        0.2530     0.1524
14:00    89              0.1728      0.2656        0.2278     0.1382
16:00    87              0.1748      0.2971        0.2577     0.1675

Table 2. Analysis of the K-S test for group 2 data.

Time,    No. of          Critical    D-statistic
hour     observations    Dα          Log normal    Weibull    Beta
10:00    204             0.0952      0.2005        0.0816     0.0700
12:00    204             0.0952      0.1931        0.0850     0.0675
14:00    221             0.0915      0.2280        0.1101     0.0847
16:00    223             0.0911      0.1617        0.0471     0.0498
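The critical values listed in Tables 1 and 2 appear to be consistent with the common large-sample approximations 1.63/sqrt(n) at the 1% level and 1.36/sqrt(n) at the 5% level. The paper does not state which formula was used, so the coefficients in this sketch are an assumption; the sample sizes and critical values are those of the tables.

```python
import math

# Assumed large-sample K-S coefficients (not stated in the paper).
KS_COEFFICIENT = {0.01: 1.63, 0.05: 1.36}


def ks_critical(n, alpha):
    """Approximate critical D for sample size n at significance alpha."""
    return KS_COEFFICIENT[alpha] / math.sqrt(n)


def passes_ks(d_statistic, n, alpha):
    """A fit passes when its D-statistic is below the critical value."""
    return d_statistic < ks_critical(n, alpha)


# Sample size -> tabulated critical value.
group1 = {106: 0.1583, 94: 0.1681, 89: 0.1728, 87: 0.1748}  # alpha = 0.01
group2 = {204: 0.0952, 221: 0.0915, 223: 0.0911}            # alpha = 0.05

for n, tabulated in group1.items():
    print(n, round(ks_critical(n, 0.01), 4), tabulated)
for n, tabulated in group2.items():
    print(n, round(ks_critical(n, 0.05), 4), tabulated)
```

With these values, for example, the 13:00 beta D-statistic of 0.1524 passes at n = 94 while the log normal value of 0.2776 does not.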

Table 3. k and c Parameters for group 1 data.

Time,    No. of          Max. value      Beta
hour     observations    (watts/sq m)    k         c
10:00    106             359             0.7696    1.2321
13:00    94              448             0.5341    0.8301
14:00    89              386             0.5569    0.7305
16:00    87              240             0.6280    0.8791

Table 4. k and c Parameters for group 2 data.

Time,    No. of          Max. value
hour     observations    (watts/sq m)
10:00    204             671
12:00    204             846
14:00    221             793
16:00    223             505

Table 5. Summary of multiple tests on 3 distributions for group 1 data.

              Time,    Goodness-of-fit test
Distr.        hour     C-S    K-S    C-V-S    L-L
Log normal    10:00
              13:00
              14:00
              16:00
Weibull       10:00
              13:00
              14:00
              16:00
Beta          10:00
              13:00
              14:00
              16:00

In Table 4, we present the k and c parameters for the beta distribution for group 2 data. The
number of observations and the maximum values of those observations are also shown.
Once we have validated the beta distribution fit using the K-S test, we would like to see how
the log normal and Weibull distributions compare when several other goodness-of-fit tests
are applied. In Table 5, we have summarized the results of the distribution fit, for the group 1
data, in the order the test statistics came out, with 1 being the best. The goodness-of-fit tests
applied were: (i) Chi-square (C-S), (ii) Kolmogorov-Smirnov (K-S), (iii) Cramer-Von
Mises-Smirnov (C-V-S) and (iv) log likelihood (L-L). It is clear that the beta distribution
better represents the Raleigh-Durham insolation data than the other two distributions.
In Table 6, we have summarized the results of a similar exercise involving group 2 data. We
see that the β distribution represents the Raleigh-Durham insolation data better than the other
two distributions. There is one exception, however. For 16:00 hr, the K-S test indicates that
the Weibull distribution has a better D-statistic than the beta distribution. If we look at Table 2
we shall find that the D-statistics for Weibull and β distributions at this hour are very close and
they both are well below the critical D-statistic shown for that hour. Thus, the beta distribution
fits the observed insolation data equally well.
We have presented our results based on March data for a number of years in Raleigh-Durham.
Similar results have been seen for other months as well. We have also observed that,
for Sterling, Va and Miami, Fla, the β distribution best fits the hourly insolation data.
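The ranking used in the summary tables (1 for the best statistic) can be sketched as below, using the 16:00 hr K-S D-statistics for group 2 as input. The helper name is hypothetical, and the rule "smaller is better" is assumed to apply to the K-S statistic; for the log likelihood the ordering would be reversed.

```python
def rank_distributions(statistics):
    """Return {name: rank}, with rank 1 for the smallest statistic."""
    ordered = sorted(statistics, key=statistics.get)
    return {name: position for position, name in enumerate(ordered, start=1)}


# K-S D-statistics at 16:00 hr for the group 2 data (Table 2).
d_1600 = {"log normal": 0.1617, "Weibull": 0.0471, "beta": 0.0498}
ranks = rank_distributions(d_1600)
print(ranks)
```

This reproduces the exception noted in the text: at 16:00 hr the Weibull distribution ranks first on the K-S test, with the beta distribution a close second.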

Table 6. Summary of multiple tests on 3 distributions for group 2 data.

              Time,    Goodness-of-fit test
Distr.        hour     C-S    K-S    C-V-S    L-L
Log normal    10:00
              12:00
              14:00
              16:00
Weibull       10:00
              12:00
              14:00
              16:00
Beta          10:00
              12:00
              14:00
              16:00

We conclude that, for these three stations in the southeastern United States, the β
distribution represents the long term hourly insolation data best. As is shown in Eq. (4), this is
a two-parameter distribution. The k and c parameters can easily be determined using statistical
packages. This distribution can then be used in photovoltaic performance analysis models.
Some examples may include PV system capacity factor calculation and energy credit analysis.
This technique is expected to better represent the insolation resource for a site than the current
practice of using TMY data, or actual observed data for only a few weeks or months to analyze
PV system performance.
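The text does not say which estimation routine was used beyond "statistical packages". As a rough stand-in, the sketch below uses the method of moments, which inverts the mean and variance expressions of Eq. (4); this is our substitution for illustration, not the procedure of the paper, and the sample values and limits are hypothetical.

```python
import statistics


def beta_moment_estimates(sample, a, b):
    """Method-of-moments estimates of k and c for a beta variate on (a, b)."""
    scaled = [(x - a) / (b - a) for x in sample]   # map data to (0, 1)
    m = statistics.mean(scaled)
    v = statistics.pvariance(scaled)
    common = m * (1.0 - m) / v - 1.0               # equals k + c
    return m * common, (1.0 - m) * common


# Two hypothetical readings chosen so that the answer is easy to verify:
# scaled mean 0.4 and variance 0.04 give k + c = 5, hence k = 2 and c = 3.
k_hat, c_hat = beta_moment_estimates([200.0, 600.0], 0.0, 1000.0)
print(k_hat, c_hat)
```

Moment estimates of this kind are often used as starting values for an iterative maximum-likelihood fit.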
Even though our data-characterization technique has been exemplified with regard to
insolation, a similar treatment may be given with long-term temperature and wind-speed data
that are used in PV performance analysis models.
Acknowledgements-All research was performed at the Virginia Polytechnic Institute. A part of this work was
supported by the Carolina Power & Light Company.

REFERENCES
1. J. R. Harper and C. D. Percival, Solar Electric Technologies: Methods of Electric Utility Value Analysis, 64 p.,
SERI/TR-214-1362, Solar Energy Research Institute, Golden, Colo. (May 1982).
2. W. S. Ku, N. E. Nour, T. M. Piascik, A. H. Firester, A. J. Stranix, and M. Zonis, Economic Evaluation of
Photovoltaic Generation Applications in a Large Electric Utility System, IEEE Trans. Power Apparatus Systems
PAS-102, (8) (August 1983).
3. D. F. Menicucci, PVFORM-A Photovoltaic System Analysis Program, draft report, Sandia National Laboratory
(February 1985).
4. Typical Meteorological Year (TMY) Users Manual, National Climatic Center, Asheville, N.C. TD-9734 (April
1981).
5. R. E. Walpole and R. H. Myers, Probability and Statistics for Engineers and Scientists, 2nd ed. Macmillan, New
York (1978).
6. A. H.-S. Ang and W. H. Tang, Probability Concepts in Engineering Planning and Design, Vol. 1. Wiley, New York
(1976).
7. H. T. Schreuder, W. L. Haffey, E. W. Whitehorne, and B. J. Dare, Maximum Likelihood Estimation for Selected
Distributions (MLESD), School of Forest Resources, North Carolina State University, Technical Report No. 61
(1978).

APPENDIX
Goodness-of-Fit Statistics

Four goodness-of-fit statistics are checked for each distribution in order to choose parameters for the probability
density function that fit the available insolation data best. These statistics are given here.

A1. Chi-square statistics

$$\chi^2 = \sum_{i=1}^{K} \frac{(nP_{oi} - nP_{ci})^2}{nP_{ci}},$$

where n = sample size, $P_{oi}$ = observed probability for class i, $P_{ci}$ = computed probability for class i, K = number of
classes the data are grouped into; K = n when the data are ungrouped, in which case $P_{oi} = 1/n$ for all i.

A2. Kolmogorov-Smirnov

$$d = \max_i \left| S_{oi} - F_{ci} \right|,$$

where $S_{oi}$ = observed cumulative probability in class i, $F_{ci}$ = computed cumulative probability in class i.
A3. Cramer-Von Mises-Smirnov

$$W^2 = \sum_{i=1}^{K} (S_{oi} - F_{ci})^2 P_{ci}.$$
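The three grouped-data statistics A1 to A3 can be computed from the observed and computed class probabilities as sketched below. The function names and the three-class example are hypothetical, and weighting the Cramer-Von Mises-Smirnov sum by the computed class probability is our reading of A3 and should be treated as an assumption.

```python
def cumulative(probabilities):
    """Running sum of class probabilities."""
    total, out = 0.0, []
    for p in probabilities:
        total += p
        out.append(total)
    return out


def chi_square(n, observed, computed):
    """A1: sum over classes of (n*Po - n*Pc)^2 / (n*Pc)."""
    return sum((n * po - n * pc) ** 2 / (n * pc)
               for po, pc in zip(observed, computed))


def ks_statistic(observed, computed):
    """A2: largest gap between observed and computed cumulative probability."""
    return max(abs(s - f)
               for s, f in zip(cumulative(observed), cumulative(computed)))


def cramer_von_mises(observed, computed):
    """A3: squared cumulative gaps, weighted by computed probability (assumed)."""
    return sum((s - f) ** 2 * pc
               for s, f, pc in zip(cumulative(observed), cumulative(computed), computed))


# Hypothetical three-class example with n = 100 observations.
observed = [0.20, 0.30, 0.50]
computed = [0.25, 0.25, 0.50]
cs = chi_square(100, observed, computed)
d = ks_statistic(observed, computed)
w2 = cramer_von_mises(observed, computed)
print(cs, d, w2)
```

For all three statistics a smaller value indicates a closer agreement between the fitted distribution and the grouped data.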
A4. Log likelihood
The log likelihood test takes different forms for different distributions as shown in the following.

Normal distribution.

$$\ln L = -\frac{n}{2}\left(1 + \ln 2\pi + \ln \sigma^2\right),$$

where σ = standard deviation.
Weibull distribution.

$$\ln L = -n\left[1 - (k-1)\mu + (k-1)\ln c - \ln\left(\frac{k}{c}\right)\right],$$

where μ = mean of ln x and k, c are defined in Eq. (3).


β distribution.

$$\ln L = n\left[\ln\frac{\Gamma(k+c)}{\Gamma(k)\Gamma(c)} - (k+c-1)\ln(b-a) + (k-1)\sum_{i=1}^{K} P_{oi}\ln(x_i - a) + (c-1)\sum_{i=1}^{K} P_{oi}\ln(b - x_i)\right].$$

These parameters have already been defined for Eq. (4).
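For ungrouped data the log likelihood of the β distribution is simply the sum of the logarithms of the Eq. (4) density over the observations. The sketch below evaluates it that way; the function name and the sample values are hypothetical, and the gamma functions are handled through their logarithms to avoid overflow.

```python
import math


def beta_log_likelihood(sample, k, c, a, b):
    """Sum of ln f(x) over the sample, with f the beta density on (a, b)."""
    n = len(sample)
    log_norm = math.lgamma(k + c) - math.lgamma(k) - math.lgamma(c)
    return (n * (log_norm - (k + c - 1.0) * math.log(b - a))
            + (k - 1.0) * sum(math.log(x - a) for x in sample)
            + (c - 1.0) * sum(math.log(b - x) for x in sample))


# Single observation at 400 with k = 2, c = 3 on (0, 1000): ln(0.001728).
ll_single = beta_log_likelihood([400.0], 2.0, 3.0, 0.0, 1000.0)
# With k = c = 1 the density is uniform, so ln L = -n ln(b - a).
ll_uniform = beta_log_likelihood([100.0, 900.0], 1.0, 1.0, 0.0, 1000.0)
print(ll_single, ll_uniform)
```

Among candidate parameter pairs, the one giving the largest value of this quantity is the better fit by the log likelihood criterion.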
