63-72, 1988
Printed in Great Britain. All rights reserved
0360-5442/88 $3.00 + 0.00
Copyright © 1988 Pergamon Journals Ltd
CHARACTERIZATION
OF INSOLATION DATA FOR USE
IN PHOTOVOLTAIC SYSTEM ANALYSIS MODELS
S. RAHMAN, M. A. KHALLAT and Z. M. SALAMEH
Abstract-A statistical technique to characterize insolation data for use in photovoltaic (PV) systems
is presented. We start by examining the frequency distribution of long-term insolation data. The
histogram is generated for observed insolation for a particular hour over a month for a number of
years. It is fitted to three distributions (Weibull, β and log normal). Four goodness-of-fit criteria are
employed in checking the best fit. These are Chi-square, Kolmogorov-Smirnov, Cramer-Von
Mises-Smirnov, and log-likelihood. SOLMET data from Sterling, Va, Raleigh-Durham, N.C. and
Miami, Fla are analyzed. It is found that the β distribution fits the long-term hourly global horizontal
insolation data best for these three southeastern U.S. locations.
1. INTRODUCTION
There are a number of photovoltaic (PV) systems analysis models that have been used for a wide
range of design and economic analyses, and policy studies. These models provide a broad set of
capabilities to be used for characterizing PV technologies and systems. Some of these
capabilities include modeling of cell characteristics, module characteristics, array characteristics,
orientation and geometric characteristics, power conditioning characteristics, plant level
characteristics, operation and maintenance characteristics, and site specific characteristics. One
factor that is missing from this long list of characteristics is the characterization of weather
data. It is known that the cell performance is strongly dependent on insolation level, ambient
temperature and wind speed (in order of importance). Existing PV system analysis models use
historical weather data (mostly only insolation level) to calculate the cell or system
performance in a deterministic way. These may be termed as point analysis models in that the
PV system output is a function of known values of weather parameters at a particular site and
time. Therefore, these point analysis models would not be very useful in evaluating the
potential of PV systems as the expected weather variations over time cannot be reflected.
We attempt to close this gap by providing a technique to characterize the weather data for
use in PV system models. Of the three weather variables mentioned earlier, insolation level is
most closely related to the PV output with ambient temperature and wind speed being
responsible for second-order effects only. This paper deals with characterizing only the
insolation data for use in photovoltaic system design and analysis models. Of course, the wind
speed and ambient temperature data can also be characterized in the same manner.
The technique of characterizing the insolation level starts with examining the frequency
distribution of the data. This distribution provides clues concerning the probable distribution(s)
that may fit the data. Subsequently, the data are analyzed to check for the best fit among
several distributions. A number of goodness-of-fit tests are performed in the process of
choosing the best distribution. The distribution parameters are then calculated for the available
data. SOLMET data tapes for Sterling (Va), Raleigh-Durham (N.C.), and Miami (Fla) were
used to test our technique of insolation characterization.
2. BACKGROUND INFORMATION
There are numerous publications on photovoltaic system models for evaluating electricity
generation. The great majority of these have a deterministic approach. Thus, these models
generally use the known value for insolation, temperature and cell characteristics to determine
the PV power output. In a few cases, however, the probabilistic nature of the variability in
insolation levels has been considered in connection with PV performance evaluation models.
Several of these studies will be discussed here.
Harper and Percival use historical weather data to develop probability information for the
direct and global insolation for each hour of a typical day of each month. They determine the
number of times the reported insolations lie within various ranges of insolation bins and thus
calculate their probability of occurrence. Thus for a given hour of a day a range of insolation
values would be chosen and their corresponding probabilities would be found. This approach
may be adequate when the only consideration is to quantify the nature of irradiance at a given
site. If, however, one wants to characterize the insolation data for use in photovoltaic
performance analysis models, then the data have to be reduced into functions that can be
integrated into these models.
Ku et al.2 have suggested that it is desirable to represent PV generation data as a single
normal distribution to facilitate the merging of PV generation with electric system load. They
also report that their insolation data analysis indicates a bimodal normal distribution, one for
sunny periods and another for cloudy periods. However, they do not show how to determine
whether long-term insolation data fit any other known (but perhaps more complex)
distribution. For utility-system production-cost simulations, the PV generation data were
represented by daily averages.
The PVFORM system analysis program,3 developed at the Sandia National Laboratory,
simulates the hourly performance of a PV flat-plate system for a one year period. This code is
based on modeling the plane-of-array insolation, thermal, and power production functions of a
PV system. A Typical Meteorological Year (TMY) tape is used for the insolation (both global
horizontal and direct normal), temperature and windspeed data required by the model.
However, the TMY data are not designed to provide probabilistic information for the following
reasons.
There are 234 locations in the US for which TMY4 data are available. However, the TMY
data for only 26 locations were obtained from the rehabilitated SOLMET stations for which
insolation field measurements exist. Insolation data for the other 208 locations are synthesized
by using cloud-cover data and theoretical insolation values at solar noon under clear-sky
conditions. Furthermore, as the TMY tapes are constituted from SOLMET data, the
barometric pressure, temperature, and wind speed and direction data are scanned on a
month-by-month basis, and missing data are replaced by linear interpolation.4
The process of developing the TMY involved selecting a typical meteorological month
(TMM) for each of the 12 calendar months from the 23 yr data (from SOLMET tapes) and
synthesizing the typical year. Thus TMY data may be adequate for comparing two PV systems
for the same site, but the details necessary for predicting PV generator performance will not be
available from there.
It is possible to get different probability distributions for insolation, temperature and wind
speed. Even for the same variable (e.g. insolation) the probability distribution parameters will
vary from one hour to another. Thus it is believed that, if long term field data were available
for a site, then the approach presented in this paper will give better prediction of a PV system
performance.
In the following section we discuss how the insolation data from various sites are analyzed to
determine probability distribution functions that can be utilized in PV performance analysis
models.
3. INSOLATION HISTOGRAMS
In order to characterize the insolation data for various times during the day, long-term
observations of insolation levels for the same hour of the day (for a large number of days) are
examined. Histograms are plotted to check the type of variations encountered in the global
horizontal insolation data for the same hour on different days. In Figs. 1-6, six such histograms
are presented. These are due to insolation data for the same hour of the day in a particular
month for five years. Plots are shown for 12 noon and 4 p.m. readings in March for Sterling
(Va), Raleigh-Durham (N.C.), and Miami (Fla). For example, Fig. 1 shows the histogram for
the 12 noon insolation for all days during the month of March for 5 yr for Sterling (Va).
[Figs. 1-6: histograms of frequency against midpoint insolation.]
These histograms are plotted such that the full spectrum of insolation data is represented.
For example in Fig. 3 we see the midpoint insolation of 200 W/m2 has a frequency of 9. That
implies there are 9 observations between 100 and 300 whose midpoint is 200. Accordingly the
midpoint insolation of 600 W/m2 represents the 15 data points between 500 and 700.
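The class-midpoint convention described above can be sketched as follows. This is a minimal illustration only: the function name, the class width of 200 W/m2 with a lower edge of 100 W/m2, and the sample readings are all hypothetical and are not taken from the SOLMET tapes.

```python
def histogram_by_midpoint(values, width, start=0.0):
    """Count observations per class and key each class by its midpoint.

    A class covers [start + i*width, start + (i+1)*width); its midpoint is
    start + (i + 0.5)*width.
    """
    counts = {}
    for v in values:
        index = int((v - start) // width)
        midpoint = start + (index + 0.5) * width
        counts[midpoint] = counts.get(midpoint, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical hourly insolation readings in W/m2 (not SOLMET data).
readings = [120.0, 250.0, 299.0, 510.0, 640.0, 699.0, 580.0]
hist = histogram_by_midpoint(readings, width=200.0, start=100.0)
print(hist)
```

With these assumed classes, every reading between 100 and 300 is counted under the midpoint 200, and every reading between 500 and 700 under the midpoint 600, which mirrors the reading of Fig. 3 given in the text.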
The insolation data are obtained from the SOLMET tapes for Sterling (Va), Raleigh-Durham
(N.C.), and Miami (Fla). These reflect the average global horizontal radiation (in
W/m2) over the hour represented. It should be noted that we have shown the March data for
three stations as samples only. Data for other hours and months were also examined and found
to follow similar trends.
A look at these histograms clearly shows the variations of the insolation levels for the same
hour in different seasons in different locations. Shapes of these histograms, however, suggest
that their relationships can be exploited to fit some form of frequency distribution. This would
be an improvement over the usual approach of predicting the performance (and often life cycle
analysis) of photovoltaic systems by directly using the observed data for a set of hours. The
technique of fitting distributions is discussed in the following section.
4. FITTING DISTRIBUTIONS
(1)
where
F(x) = P{x ≥ m}
and
Figures 7-10 represent the histograms for the SOLMET insolation data for Raleigh-Durham
after the two-part grouping. For example, Figs. 7 and 8 represent the distribution functions of
12 noon insolation data in March for Raleigh-Durham in two groups. Each distribution (e.g.
the data in Fig. 7) is then tested for the best fit as described in the following.
Once it is determined that a known distribution would fit the observed data, several
distributions are tested for the best fit. It is observed that superposition of two distribution
functions of the same type would closely replicate the frequency distributions of the insolation
data as shown by these histograms. The data is divided into two groups, each having a
unimodal distribution. Three most likely probability distributions are then tested for the best
fit. These are log normal, Weibull and β. The predicted and observed class frequencies are
then compared by using four measures of fit. These are Chi-square, Kolmogorov-Smirnov,
Cramer-Von Mises-Smirnov and log-likelihood. All statistics of fit are computed using
estimated parameters of the distributions with the data grouped into classes.
[Figs. 7-10: histograms of frequency against midpoint insolation for the Raleigh-Durham data after the two-part grouping.]
Clearly this approach is different from the one using typical meteorological year (TMY) data
mentioned by Menicucci.3 The problem that may be encountered in using TMY data for PV
system performance analysis has already been discussed in Sec. 2.
In the following section we discuss briefly the distributions that have been considered for
fitting the insolation data.
4.1. Choice of distributions
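Three distributions are considered: log normal, Weibull and β.
Log normal distribution. The probability density function is given by
$f(x) = \frac{1}{x\sqrt{2\pi c}} \exp\left[-\frac{1}{2}\,\frac{(\ln x - k)^2}{c}\right], \quad 0 < x < \infty,$ (2)
where mean ($\mu$) = exp[k + (1/2)c], variance ($\sigma^2$) = [exp(c) - 1] × exp[2k + c]. This distribution has two parameters k and c.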
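The moment expressions quoted above can be checked numerically. The sketch below assumes the standard two-parameter log normal density, in which k and c are the mean and variance of ln x; that reading is an assumption, chosen because it is consistent with the stated mean and variance. The parameter values and the integration helper are hypothetical.

```python
import math


def lognormal_pdf(x, k, c):
    """Log normal density, assuming k = mean of ln x and c = variance of ln x."""
    if x <= 0.0:
        return 0.0
    return math.exp(-((math.log(x) - k) ** 2) / (2.0 * c)) / (
        x * math.sqrt(2.0 * math.pi * c)
    )


def lognormal_moments(k, c):
    """Mean and variance from the closed-form expressions in the text."""
    mean = math.exp(k + 0.5 * c)
    variance = (math.exp(c) - 1.0) * math.exp(2.0 * k + c)
    return mean, variance


def integrate(func, lo, hi, steps=100_000):
    """Midpoint-rule integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (i + 0.5) * h) for i in range(steps))


k, c = 0.0, 0.25  # hypothetical parameter values
mean, variance = lognormal_moments(k, c)
numeric_mean = integrate(lambda x: x * lognormal_pdf(x, k, c), 0.0, 30.0)
numeric_var = integrate(lambda x: (x - mean) ** 2 * lognormal_pdf(x, k, c), 0.0, 30.0)
print(mean, numeric_mean, variance, numeric_var)
```

The upper integration limit of 30 is arbitrary; for these parameter values the density beyond it is negligible.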
Weibull distribution. The probability density function is given by
$f(x) = \frac{k}{c}\left(\frac{x}{c}\right)^{k-1} \exp\left[-\left(\frac{x}{c}\right)^{k}\right], \quad 0 < x < \infty,$ (3)
where c = scale factor and k = shape factor. This is also a two-parameter distribution and has
also been used to characterize wind speed data. A sample Weibull distribution is shown in
Fig. 11.
[Fig. 11: sample Weibull distribution.]
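For the Weibull case, a short sketch of the density and the matching cumulative distribution function (needed later for the K-S comparison) may help. It assumes the standard two-parameter Weibull form with shape factor k and scale factor c; the numerical values are hypothetical and not fitted to any data.

```python
import math


def weibull_pdf(x, k, c):
    """Weibull density with shape factor k and scale factor c."""
    if x <= 0.0:
        return 0.0
    return (k / c) * (x / c) ** (k - 1.0) * math.exp(-((x / c) ** k))


def weibull_cdf(x, k, c):
    """Weibull cumulative distribution function, 1 - exp[-(x/c)^k]."""
    if x <= 0.0:
        return 0.0
    return 1.0 - math.exp(-((x / c) ** k))


k, c = 2.0, 400.0  # hypothetical shape factor and scale factor (W/m2)
x, h = 300.0, 1e-3

# The density should equal the slope of the cumulative function.
slope = (weibull_cdf(x + h, k, c) - weibull_cdf(x - h, k, c)) / (2.0 * h)
print(weibull_pdf(x, k, c), slope)
```

At x = c the cumulative function equals 1 - exp(-1) regardless of k, which gives a quick sanity check on the scale factor.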
β distribution. A probability distribution appropriate for a random variable whose values are
bounded between finite limits a and b is the β distribution. Its density function is given by
$f(x) = \frac{1}{B(k, c)}\,\frac{(x - a)^{k-1}(b - x)^{c-1}}{(b - a)^{k+c-1}}, \quad a \le x \le b,$ (4)
where k and c are the parameters and $B(k, c) = \Gamma(k)\Gamma(c)/\Gamma(k + c)$ is the beta function. The mean and variance are
$\mu = a + \frac{k}{k + c}(b - a), \qquad \sigma_x^2 = \frac{kc}{(k + c)^2 (k + c + 1)}(b - a)^2.$
When the values of the variate are limited between 0 and 1.0 (i.e. a = 0.0, b = 1.0) then Eq. (4)
can be called the standard β distribution. Figure 12 shows6 the standard β density function with
different values of k and c. Since this distribution is bounded between two finite limits, it is
expected to be able to replicate the random nature of the insolation levels for any given hour
over a number of days.
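A minimal sketch of the β density on a general interval [a, b] follows, together with its mean and variance. It assumes the standard four-parameter form in which B(k, c) is the beta function; the parameter values (k = 2, c = 3, limits 0 and 800 W/m2) are hypothetical and are not the fitted values reported in the tables.

```python
import math


def beta_function(k, c):
    """B(k, c) = Gamma(k) * Gamma(c) / Gamma(k + c)."""
    return math.gamma(k) * math.gamma(c) / math.gamma(k + c)


def beta_pdf(x, k, c, a=0.0, b=1.0):
    """Beta density on (a, b).

    The end points are excluded because the density can be unbounded there
    when k < 1 or c < 1.
    """
    if not a < x < b:
        return 0.0
    numerator = (x - a) ** (k - 1.0) * (b - x) ** (c - 1.0)
    return numerator / (beta_function(k, c) * (b - a) ** (k + c - 1.0))


def beta_moments(k, c, a=0.0, b=1.0):
    """Mean and variance of the beta distribution on [a, b]."""
    mean = a + k / (k + c) * (b - a)
    variance = k * c / ((k + c) ** 2 * (k + c + 1.0)) * (b - a) ** 2
    return mean, variance


k, c, a, b = 2.0, 3.0, 0.0, 800.0  # hypothetical values
mean, variance = beta_moments(k, c, a, b)

# Midpoint-rule check that the density integrates to one over (a, b).
steps = 10_000
h = (b - a) / steps
total = h * sum(beta_pdf(a + (i + 0.5) * h, k, c, a, b) for i in range(steps))
print(mean, variance, total)
```

Setting a = 0 and b = 1 recovers the standard form; in practice the upper limit b would presumably be tied to the maximum insolation observed for the hour, but that choice is an assumption here.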
[Fig. 12: standard β density function for different values of k and c.]
5. RESULTS AND DISCUSSION
Results obtained by applying the four goodness-of-fit criteria to determine the best
distribution fit (out of the three examined) for the grouped data are presented and discussed in
this section. Before fitting the distribution, the data are grouped into sets which have single
modes. As a sample we present the distribution fit for the March data obtained from the
Raleigh-Durham, NC SOLMET tape. Figures 9-10 show the two part grouping for the 4 p.m.
data during that month.
These two groups of data are separately fitted to three distributions, namely: log normal,
Weibull and β. In Table 1, we present the D-statistics for the Kolmogorov-Smirnov (K-S) test
for the three distributions tested for the group one data for all four hours. These D-statistics
are compared against the critical value of D for a 1% significance level (α = 0.01) for the
sample sizes as shown. It is seen that for three out of the four hours the D-statistics for the beta
distribution are below the critical values. It therefore passes the K-S tests at this significance
level. The sample size, and the fact that for the first group of data there are wide variations in
the level of insolation, may have contributed to the narrow margin in meeting the test. It is
clear, however, that the beta distribution provides the best fit in comparison with Weibull or log
normal distributions, because the K-S tests are not successful for those distributions.
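The comparison against the critical value can be sketched with the usual large-sample approximation for the one-sample K-S test, D_alpha ≈ 1.63/sqrt(n) at the 1% level and 1.36/sqrt(n) at the 5% level. Using this approximation is an assumption on our part, although the critical values listed in the tables appear to be consistent with it. The D-statistics in the example are the 14:00 group 1 values from Table 1; the function names are hypothetical.

```python
import math

# Large-sample coefficients for the one-sample K-S critical value
# (assumed approximation: D_alpha ~ coefficient / sqrt(n)).
KS_COEFFICIENTS = {0.05: 1.36, 0.01: 1.63}


def ks_critical_value(n, alpha):
    """Approximate critical D for sample size n at significance level alpha."""
    return KS_COEFFICIENTS[alpha] / math.sqrt(n)


def passes_ks(d_statistic, n, alpha):
    """A fit passes when its D-statistic is below the critical value."""
    return d_statistic < ks_critical_value(n, alpha)


# D-statistics for the 14:00 group 1 data (n = 89), as listed in Table 1.
d_stats = {"log normal": 0.2656, "Weibull": 0.2278, "beta": 0.1382}
results = {name: passes_ks(d, 89, 0.01) for name, d in d_stats.items()}
print(round(ks_critical_value(89, 0.01), 4), results)
```

For this hour only the beta fit falls below the critical value, in line with the discussion above.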
The information presented in Table 2 is similar to that of Table 1, except that group 2 data
have been used. Since our sample sizes are greater we have performed the K-S test for a 5%
significance level (α = 0.05). It is observed that the beta distribution best satisfies the K-S test for
all four hours.
In Table 3, we present the k and c parameters for the β distribution for the group 1 data. The
number of observations and the maximum values of the observations at those hours are also
listed.
Table 1. Analysis of the K-S test for group 1 data.

Time     No. of         Critical   D-statistic
(hour)   observations   D_alpha    Log normal   Weibull   Beta
10:00    106            0.1583     0.2809       0.2535    0.1922
13:00     94            0.1681     0.2776       0.2530    0.1524
14:00     89            0.1728     0.2656       0.2278    0.1382
16:00     87            0.1748     0.2971       0.2577    0.1675
Table 2. Analysis of the K-S test for group 2 data.

Time     No. of         Critical   D-statistic
(hour)   observations   D_alpha    Log normal   Weibull   Beta
10:00    204            0.0952     0.2005       0.0816    0.0700
12:00    204            0.0952     0.1931       0.0850    0.0675
14:00    221            0.0915     0.2280       0.1101    0.0847
16:00    223            0.0911     0.1617       0.0471    0.0498
Table 3. Parameters k and c of the β distribution for group 1 data.

Time     No. of         Max. value   Beta
(hour)   observations   (W/m2)       k        c
10:00    106            359          0.7696   1.2321
13:00     94            448          0.5341   0.8301
14:00     89            386          0.5569   0.7305
16:00     87            240          0.6280   0.8791
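Table 4. Number of observations and maximum values for group 2 data.

Time     No. of         Max. value
(hour)   observations   (W/m2)
10:00    204            671
12:00    204            846
14:00    221            793
16:00    223            505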
[Table 5: ranking of the distribution fits (1 = best) for group 1 data under the C-S, K-S, C-V-S and L-L goodness-of-fit tests at 10:00, 13:00, 14:00 and 16:00 for the log normal, Weibull and beta distributions.]
In Table 4, we present the k and c parameters for beta distribution for group 2 data. The
number of observations and the maximum values of those observations are also shown.
Once we have validated the beta distribution fit using the K-S test we would like to see how
the log normal and Weibull distributions compare when several other goodness-of-fit tests
are applied. In Table 5, we have summarized the results of the distribution fit, for the group 1
data, in the order the test statistics came out, with 1 being the best. The goodness-of-fit tests
applied were: (i) Chi-square (C-S), (ii) Kolmogorov-Smirnov (K-S), (iii) Cramer-Von
Mises-Smirnov (C-V-S) and (iv) log likelihood (L-L). It is clear that the beta distribution
better represents the Raleigh-Durham insolation data than the other two distributions.
In Table 6, we have summarized the results of a similar exercise involving group 2 data. We
see that the β distribution represents the Raleigh-Durham insolation data better than the other
two distributions. There is one exception, however. For 16:00 hr, the K-S test indicates that
the Weibull distribution has a better D-statistic than the beta distribution. If we look at Table 2
we shall find that the D-statistics for Weibull and β distributions at this hour are very close and
they both are well below the critical D-statistic shown for that hour. Thus, the beta distribution
fits the observed insolation data equally well.
We have presented our results based on March data for a number of years in Raleigh-Durham.
Similar results have been seen for other months as well. We have also observed that,
for Sterling, Va and Miami, Fla the β distribution best fits the hourly insolation data.
[Table 6: ranking of the distribution fits (1 = best) for group 2 data under the C-S, K-S, C-V-S and L-L goodness-of-fit tests at 10:00, 12:00, 14:00 and 16:00 for the log normal, Weibull and beta distributions.]
We conclude that, for these three stations in the southeastern United States, the β
distribution represents the long term hourly insolation data best. As is shown in Eq. (4), this is
a two-parameter distribution. The k and c parameters can easily be determined using statistical
packages. This distribution can then be used in photovoltaic performance analysis models.
Some examples may include PV system capacity factor calculation and energy credit analysis.
This technique is expected to better represent the insolation resource for a site than the current
practice of using TMY data, or actual observed data for only a few weeks or months to analyze
PV system performance.
Even though our data-characterization technique has been exemplified with regard to
insolation, a similar treatment may be given with long-term temperature and wind-speed data
that are used in PV performance analysis models.
Acknowledgements-All research was performed at the Virginia Polytechnic Institute,
supported by the Carolina Power & Light Company.
REFERENCES
1. J. R. Harper and C. D. Percival, Solar Electric Technologies: Methods of Electric Utility Value Analysis. 64 p.,
SERI/TR-214-1362, Solar Energy Research Institute, Golden, Colo. (May 1982).
2. W. S. Ku, N. E. Nour, T. M. Piascik, A. H. Firester, A. J. Stranix, and M. Zonis, Economic Evaluation of
Photovoltaic Generation Applications in a Large Electric Utility System, IEEE Trans. Power Apparatus Systems
PAS-102, (8) (August 1983).
3. D. F. Menicucci, PVFORM-A Photovoltaic System Analysis Program, draft report, Sandia National Laboratory
(February 1985).
4. Typical Meteorological Year (TMY) Users Manual, National Climatic Center, Asheville, N.C. TD-9734 (April
1981).
5. R. E. Walpole and R. H. Myers, Probability and Statistics for Engineers and Scientists, 2nd ed. Macmillan, New
York (1978).
6. A. H.-S. Ang and W. H. Tang, Probability Concepts in Engineering Planning and Design, Vol. 1. Wiley, New York
(1976).
7. H. T. Schreuder, W. L. Haffey, E. W. Whitehorne, and B. J. Dare, Maximum Likelihood Estimation for Selected
Distributions (MLESD), School of Forest Resources, North Carolina State University, Technical Report No. 61
(1978).
APPENDIX
Goodness-of-Fit Statistics
Four goodness-of-fit statistics are checked for each distribution in order to choose the parameters and
density function that fit the available insolation data best. These statistics are given here.
A1. Chi-square statistics
$\chi^2 = \sum_{i=1}^{K} (nP_{oi} - nP_{ci})^2 / nP_{ci},$
where n = sample size, $P_{oi}$ = observed probability for class i, $P_{ci}$ = computed probability for class i, and K = number of
classes the data is grouped into. When the data are ungrouped, K = n and $P_{oi}$ = 1/n for all i.
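The Chi-square statistic for grouped data can be sketched as below. The sketch takes the observed class counts (that is, n times the observed probabilities) and the computed class probabilities from a fitted distribution; the function name and the sample numbers are hypothetical.

```python
def chi_square_statistic(observed_counts, computed_probs):
    """Sum over classes of (n*P_o - n*P_c)^2 / (n*P_c).

    observed_counts[i] is n*P_o for class i; computed_probs[i] is P_c.
    """
    n = sum(observed_counts)
    total = 0.0
    for count, p_c in zip(observed_counts, computed_probs):
        expected = n * p_c
        total += (count - expected) ** 2 / expected
    return total


# Hypothetical grouped data: four classes, 100 observations.
observed = [10, 20, 30, 40]
computed = [0.25, 0.25, 0.25, 0.25]
chi2 = chi_square_statistic(observed, computed)
print(chi2)
```

A smaller value indicates closer agreement between observed and predicted class frequencies, so the distribution with the smallest statistic would rank first on this criterion.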
A2. Kolmogorov-Smirnov
$D = \max_{i} |S_{oi} - F_{ci}|,$
where $S_{oi}$ = observed cumulative probability in class i, $F_{ci}$ = computed cumulative probability in class i.
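A grouped-data version of the K-S D-statistic, taken here as the largest absolute gap between the observed and computed cumulative class probabilities, can be sketched as follows. Treating D as this maximum is the usual definition and is assumed here; the function name and data are hypothetical.

```python
from itertools import accumulate


def ks_d_grouped(observed_counts, computed_probs):
    """Largest |S_o - F_c| over the classes, using cumulative probabilities."""
    n = sum(observed_counts)
    s_o = [cum / n for cum in accumulate(observed_counts)]
    f_c = list(accumulate(computed_probs))
    return max(abs(s - f) for s, f in zip(s_o, f_c))


# Hypothetical grouped data: four classes, 100 observations.
observed = [10, 20, 30, 40]
computed = [0.25, 0.25, 0.25, 0.25]
d_stat = ks_d_grouped(observed, computed)
print(d_stat)
```

The resulting D would then be compared with the critical value for the sample size and significance level in use.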
A3. Cramer-Von Mises-Smirnov
$W^2 = \sum_{i=1}^{K} (S_{oi} - F_{ci})^2 P_{ci}.$
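The Cramer-Von Mises-Smirnov statistic for grouped data can be sketched in the same way. The sketch assumes that each squared difference between the observed and computed cumulative probabilities is weighted by the computed class probability; this weighting is our reading of the formula and should be treated as an assumption. Names and data are hypothetical.

```python
from itertools import accumulate


def cvm_w2_grouped(observed_counts, computed_probs):
    """Sum over classes of (S_o - F_c)^2 * P_c (assumed weighting by P_c)."""
    n = sum(observed_counts)
    s_o = [cum / n for cum in accumulate(observed_counts)]
    f_c = list(accumulate(computed_probs))
    return sum((s - f) ** 2 * p for s, f, p in zip(s_o, f_c, computed_probs))


# Hypothetical grouped data: four classes, 100 observations.
observed = [10, 20, 30, 40]
computed = [0.25, 0.25, 0.25, 0.25]
w2 = cvm_w2_grouped(observed, computed)
print(w2)
```

Unlike the K-S statistic, which looks only at the single largest gap, this measure accumulates the discrepancy over all classes.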
A4. Log likelihood
The log likelihood test takes different forms for different distributions as shown in the following.
Normal distribution.
$\ln L = -\frac{n}{2}(1 + \ln 2\pi + \ln \sigma^2),$
where σ = standard deviation.
Weibull distribution.
1-(k-l)p+(k-l)lnC-In
( )I,
f