To cite this article: A. Mahabbati, A. Izady, M. Mousavi Baygi, K. Davary & S. M. Hasheminia
(2016): Daily soil temperature modeling using ‘panel-data’ concept, Journal of Applied Statistics,
DOI: 10.1080/02664763.2016.1214240
1. Introduction
Soil temperature is one of the most important meteorological factors in agricultural management [31] because of its great impact on plant growth. The temperature difference between the soil and the atmosphere is the primary driving force for soil water evaporation. Optimal temperatures are necessary for seed germination and normal plant growth [23]. Furthermore, the rate of most chemical reactions in soil is affected by the soil temperature profile [20]. However, soil temperature data at different depths are available only at meteorological stations, and worldwide only a small percentage of weather stations monitor it. Therefore, there is great interest in modeling or predicting the soil temperature profile and its spatial variations.
There are several methods for predicting soil temperature, such as analytical models [8,9,22], Fourier techniques [11,21], empirical equations (e.g. [30]) and artificial neural networks (ANNs) [10,34]. Although analytical models are accurate owing to their sound mathematical and physical background, they are difficult to apply in practice because of their size and the many assumptions they require [35]. The problem with the Fourier method is that its coefficients are suitable only for a particular site, which makes it impractical for simulations over many different sites [40]. ANNs are well suited to modeling dynamic nonlinear systems. However, these models tend to be used when understanding of the system is inadequate and obtaining accurate predictions is more important than conceptualizing the actual physics of the system [7]. Although empirical models are simple and easy to use, they require large databases from which to develop empirical coefficients for each specific site [27].
In search of a new method that improves modeling capabilities in this field, the main objective of this study was to investigate the applicability of the ‘panel-data’ concept [2,15], which appears to have the potential to predict soil temperature variability both spatially and temporally. Despite the wide application of panel-data modeling in economics [13,16,24,28], its application in the environmental sciences is very recent, beginning with the study of Izady et al. [17], who developed a panel-data-based model for predicting temporal fluctuations and spatial variations of groundwater level. The term ‘panel data’ refers to the pooling of observations on a cross-section over several time periods. This can be achieved by surveying a number of observation sites or stations and following them over time. In other words, panel-data analysis endows regression analysis with both spatial and temporal dimensions. The spatial dimension pertains to a set of cross-sectional units of observation. The temporal dimension pertains to periodic observations of a set of variables characterizing these cross-sectional units over a particular time span. The terms ‘spatial’ and ‘cross-sectional’ are used here in the sense of data, not in the sense of physical landforms.
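To make the idea of pooling concrete, the sketch below stacks a small, entirely hypothetical panel of soil temperatures (3 stations observed on 4 days) into one vector of N·T observations. The station-major ordering (station index i varying slowest, day index t fastest) is an assumption of this sketch, but it is the ordering that Kronecker-product expressions such as I_N ⊗ J_T in the panel-data literature presuppose.

```python
import numpy as np

# Hypothetical soil temperatures (°C): N = 3 stations (rows) x T = 4 days (columns).
panel = np.array([
    [10.1, 10.4, 10.9, 11.2],
    [ 8.7,  9.0,  9.6,  9.9],
    [12.3, 12.5, 12.8, 13.4],
])
N, T = panel.shape

# Pool the panel into one vector of N*T observations: station index i varies
# slowest, day index t fastest, so observation (i, t) sits at position i*T + t.
y = panel.reshape(N * T)
station = np.repeat(np.arange(N), T)   # cross-sectional (spatial) index
day = np.tile(np.arange(T), N)         # time-series (temporal) index

# With this ordering, I_N kron (J_T / T) averages over days within each station
# and returns every station's mean repeated T times.
P = np.kron(np.eye(N), np.ones((T, T)) / T)
station_means = P @ y
print(station_means.reshape(N, T)[:, 0])   # one mean per station
```

If the data were instead stacked day by day, the roles of I_N ⊗ J_T and J_N ⊗ I_T would swap, so the stacking order has to be fixed before any of the test statistics are computed.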
have good contact with the surrounding soil. The data were divided into two subsets: data from 2001 to 2008 were used for parameter estimation (training), and data from 2009 were used for validation. For more details about the annual soil temperature data, refer to the Electronic Supplementary Material (ESM).
4 A. MAHABBATI ET AL.
Soil temperature was measured three times daily (9:00, 12:00 and 15:00 GMT), and the average value was used for model development. The average daily air temperature was calculated as the mean of the daily minimum and maximum air temperatures. The panel-data models use the air temperatures of the previous day and of the previous week to predict the average soil temperature of the following day. To capture the short- and long-term effects of weather, the average air temperature of the previous day ($T_{d-1}$) and the average air temperature of the past week ($T_w$) were used, together with the previous day's rainfall ($R_{d-1}$). For more details about air temperatures and rainfall amounts, refer to the ESM. The use of these parameters for soil temperature prediction has been widely reported in the literature [3,30,40]. This relation can be formulated as a panel-data model, as follows:
where $T_s$ is the average soil temperature on a given Julian day (°C), $\alpha$ is the general intercept (°C), $T_{d-1}$ is the daily air temperature (°C) on Julian day $d-1$, $T_w$ is the average air temperature of the past week (°C), $R_{d-1}$ is the daily rainfall (mm) on Julian day $d-1$, $\mu_i$ and $\lambda_t$ are the unobservable individual (station) and time effects, respectively, and $\beta_1$, $\beta_2$ and $\beta_3$ are the coefficients of the independent variables.
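The equation these definitions refer to did not survive extraction. Assembling the terms just defined, with stations indexed by $i$ and days by $t$, the model can plausibly be written as below. This is a reconstruction for the reader's convenience, not a quotation; in particular, the remainder disturbance $\nu_{it}$ is an assumption borrowed from the standard two-way error-component structure [2].

```latex
T_{s,it} = \alpha + \beta_1 T_{d-1,it} + \beta_2 T_{w,it} + \beta_3 R_{d-1,it}
         + \mu_i + \lambda_t + \nu_{it}
```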
In addition, classic linear regression (CLR) models were fitted to predict soil temperature from the same data. In these models, the relationship between the dependent and independent variables can be formulated as follows:
where $i$ and $t$ denote the cross-section and time-series dimensions, respectively; $N$ is the number of cross-sections; $T$ is the length of the time series for each cross-section; $y$ is the dependent-variable vector; $X$ is the independent-variable matrix; $\alpha$ is a scalar; $\beta$ is the coefficient vector of the independent variables; and $u$ is the error component of the model.
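For orientation, the symbols just listed match the generic panel regression in Baltagi's notation [2], sketched below. The exact form of the equation in the original, and whether $X'_{it}$ carries a constant column, are assumptions here.

```latex
y_{it} = \alpha + X'_{it}\,\beta + u_{it}, \qquad i = 1, \dots, N; \quad t = 1, \dots, T
```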
The performance of any estimation procedure for the regression parameters depends on the statistical characteristics of the error components of the model. The panel-data procedure estimates the regression parameters of the preceding model under several common error structures. These error structures consist of one- and two-way fixed and
JOURNAL OF APPLIED STATISTICS 5
where $\mu_i$ denotes the unobservable individual-specific effect and $\nu_{it}$ denotes the remainder disturbance. Note that $\mu_i$ is time-invariant and accounts for any individual-specific effect that is not included in the regression. The remainder disturbance $\nu_{it}$ varies with individuals and time and can be thought of as the usual disturbance in the regression. Similarly, the specifications for the two-way model are:
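For reference, the standard textbook forms of the one-way and two-way error components [2] are sketched below, with $\iota$ a vector of ones. This is the usual specification rather than a verbatim copy of the lost equations; the labels $Z_\mu$ and $\iota$ are ours, while $Z_\lambda$ matches the time-dummy matrix named below.

```latex
\begin{aligned}
&\text{one-way:} && u_{it} = \mu_i + \nu_{it}, && u = Z_\mu \mu + \nu, && Z_\mu = I_N \otimes \iota_T,\\
&\text{two-way:} && u_{it} = \mu_i + \lambda_t + \nu_{it}, && u = Z_\mu \mu + Z_\lambda \lambda + \nu, && Z_\lambda = \iota_N \otimes I_T.
\end{aligned}
```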
To obtain the generalized least squares (GLS) estimator [5] of the regression coefficients, $\Omega^{-1}$ is required, where $\Omega = E(uu')$ is the variance–covariance matrix of the disturbances. This is a huge matrix for typical panels, of dimension $(NT \times NT)$. After $\Omega^{-1}$ is calculated using the method of Wansbeek and Kapteyn [36,37], GLS can be applied as a weighted least-squares estimator to obtain the coefficients of Equation (6).
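The practical point of the Wansbeek–Kapteyn spectral decomposition is that $\Omega^{-1}$ never has to be formed. A minimal sketch for the one-way random-effects case, assuming a balanced panel stacked station by station and known variance components $\sigma_\mu^2$ and $\sigma_\nu^2$ (in practice these are estimated first): writing $\Omega = \sigma_1^2 P + \sigma_\nu^2 Q$ with $P = I_N \otimes J_T/T$, $Q = I_{NT} - P$ and $\sigma_1^2 = T\sigma_\mu^2 + \sigma_\nu^2$, GLS reduces to OLS on quasi-demeaned data $y_{it} - \theta\bar{y}_{i\cdot}$ with $\theta = 1 - \sigma_\nu/\sigma_1$. The function names and simulated data are hypothetical; the brute-force version exists only to check the shortcut.

```python
import numpy as np

def re_gls_spectral(y, X, N, T, sigma_mu2, sigma_nu2):
    """One-way random-effects GLS via the spectral decomposition of Omega.
    Observations are stacked station-major (i slow, t fast); X may include a
    constant column. Variance components are taken as known."""
    sigma_1 = np.sqrt(T * sigma_mu2 + sigma_nu2)
    theta = 1.0 - np.sqrt(sigma_nu2) / sigma_1
    # P y and P X: station means repeated over the T days of each station.
    y_bar = np.repeat(y.reshape(N, T).mean(axis=1), T)
    X_bar = np.repeat(X.reshape(N, T, -1).mean(axis=1), T, axis=0)
    # Quasi-demeaning is equivalent to premultiplying by sigma_nu * Omega^(-1/2).
    beta, *_ = np.linalg.lstsq(X - theta * X_bar, y - theta * y_bar, rcond=None)
    return beta

def re_gls_bruteforce(y, X, N, T, sigma_mu2, sigma_nu2):
    """Same estimator using the full NT x NT matrix Omega (for checking only)."""
    omega = (sigma_mu2 * np.kron(np.eye(N), np.ones((T, T)))
             + sigma_nu2 * np.eye(N * T))
    w = np.linalg.inv(omega)
    return np.linalg.solve(X.T @ w @ X, X.T @ w @ y)

# Simulated example (hypothetical numbers): 5 stations, 8 days.
rng = np.random.default_rng(0)
N, T = 5, 8
x = rng.normal(size=N * T)
X = np.column_stack([np.ones(N * T), x])
mu = np.repeat(rng.normal(scale=1.0, size=N), T)   # station effects
y = 2.0 + 0.5 * x + mu + rng.normal(scale=0.3, size=N * T)

beta_fast = re_gls_spectral(y, X, N, T, sigma_mu2=1.0, sigma_nu2=0.09)
beta_slow = re_gls_bruteforce(y, X, N, T, sigma_mu2=1.0, sigma_nu2=0.09)
print(beta_fast, beta_slow)
```

The shortcut only needs station means, so its cost grows linearly with NT, whereas inverting $\Omega$ directly grows cubically.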
where $Z_\lambda$ is the matrix of time dummies that may be included in the regression to estimate the $\lambda_t$ if they are fixed parameters, and $I_{NT}$ is an identity matrix of dimension $NT$. To obtain the GLS estimator of the regression coefficients, $\Omega^{-1}$ is again required. After $\Omega^{-1}$ is calculated using a method developed by Hsiao [15], GLS can be applied as a weighted least-squares estimator to obtain the coefficients.
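The two-way fixed-effects model, which the results identify as the most adequate, does not need $\Omega^{-1}$ at all: for a balanced panel, its slope estimates follow from OLS after subtracting station means and day means and adding back the grand mean. A minimal sketch under that balanced-panel assumption, with simulated data; the names are hypothetical and this is not the StataSE routine used in the study.

```python
import numpy as np

def twoway_within(y, X, N, T):
    """Two-way fixed-effects (within) slope estimates for a balanced panel
    stacked station-major. X holds the time-varying regressors only; the
    intercept, station effects and day effects are swept out."""
    def sweep(a):
        a = np.asarray(a, dtype=float).reshape(N, T, -1)
        a = (a
             - a.mean(axis=1, keepdims=True)        # station (cross-section) means
             - a.mean(axis=0, keepdims=True)        # day (time) means
             + a.mean(axis=(0, 1), keepdims=True))  # grand mean
        return a.reshape(N * T, -1)
    beta, *_ = np.linalg.lstsq(sweep(X), sweep(y)[:, 0], rcond=None)
    return beta

# Hypothetical data in which the station effects are correlated with a
# regressor, the situation in which fixed effects are preferred.
rng = np.random.default_rng(1)
N, T = 10, 60
mu = rng.normal(scale=2.0, size=N)     # station effects
lam = rng.normal(scale=1.0, size=T)    # day effects
x1 = rng.normal(size=(N, T)) + mu[:, None]
x2 = rng.normal(size=(N, T)) + lam[None, :]
y = (0.6 * x1 + 0.3 * x2 + mu[:, None] + lam[None, :]
     + rng.normal(scale=0.1, size=(N, T)))

# ravel() is row-major, so rows (stations) become the slow index.
X = np.column_stack([x1.ravel(), x2.ravel()])
beta = twoway_within(y.ravel(), X, N, T)
print(beta)   # close to the simulated slopes 0.6 and 0.3
```

The double demeaning is exact only for balanced panels; with missing days the station and day effects would have to be estimated with explicit dummies instead.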
where $R_0^2$ is the sum of squared errors (SSE) of the pooled model and $R_1^2$ is the SSE of the fixed-effects model. If $F$ is larger than the critical (tabulated) value, the null hypothesis is rejected, which reveals the existence of significant fixed individual-specific effects. Having established that individual effects exist, it is necessary to determine whether they are random. For this purpose, different tests have been proposed.
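The F statistic itself is missing from this copy. Its usual poolability (Chow-type) form [2,6] compares the two SSEs on $N-1$ and $NT-N-K$ degrees of freedom. The sketch below assumes that one-way form, a balanced panel stacked station by station and $K$ slope regressors; it may differ in detail from the equation the authors used, and the data are hypothetical.

```python
import numpy as np

def sse(design, y):
    """Sum of squared OLS residuals of y on the given design matrix."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def f_test_station_effects(y, X, N, T):
    """F test of H0: common intercept (pooled) vs one intercept per station."""
    NT, K = X.shape
    pooled = np.column_stack([np.ones(NT), X])
    dummies = np.kron(np.eye(N), np.ones((T, 1)))   # NT x N station dummies
    lsdv = np.column_stack([dummies, X])            # least-squares dummy variables
    sse_pooled, sse_fe = sse(pooled, y), sse(lsdv, y)
    df1, df2 = N - 1, NT - N - K
    F = ((sse_pooled - sse_fe) / df1) / (sse_fe / df2)
    return F, df1, df2

# Hypothetical panel with strong station effects.
rng = np.random.default_rng(2)
N, T = 6, 30
x = rng.normal(size=N * T)
mu = np.repeat(rng.normal(scale=2.0, size=N), T)
y = 1.0 + 0.8 * x + mu + rng.normal(scale=0.5, size=N * T)

F, df1, df2 = f_test_station_effects(y, x[:, None], N, T)
print(f"F({df1}, {df2}) = {F:.1f}")
```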
For the random two-way error-component model, Breusch and Pagan [4] suggested the Lagrange multiplier (LM) test, with the following hypotheses:
H0: No random effects (the pooled model) ($\sigma_\mu^2 = \sigma_\lambda^2 = 0$)
H1: Random effects exist ($\sigma_\mu^2 \neq 0$ or $\sigma_\lambda^2 \neq 0$)
The LM test statistic is given by:
$$LM = LM_1 + LM_2 = \frac{NT}{2(T-1)}\left[1 - \frac{\tilde{u}'(I_N \otimes J_T)\tilde{u}}{\tilde{u}'\tilde{u}}\right]^2 + \frac{NT}{2(N-1)}\left[1 - \frac{\tilde{u}'(J_N \otimes I_T)\tilde{u}}{\tilde{u}'\tilde{u}}\right]^2, \qquad (9)$$
where $\tilde{u}$ is the vector of OLS residuals of the pooled model (so that $\tilde{u}'\tilde{u}$ is its SSE), $I$ is an identity matrix and $J$ is a matrix of ones, each of dimension $N$ or $T$ as indicated by the subscript. Under H0, LM is asymptotically distributed as $\chi^2$ with 2 degrees of freedom. If LM is larger than the critical value, the null hypothesis is rejected, indicating the presence of random individual and/or time effects.
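A minimal sketch of the two-way LM statistic, assuming residuals $\tilde{u}$ from pooled OLS stacked station by station. The quadratic forms do not require the Kronecker matrices: $\tilde{u}'(I_N \otimes J_T)\tilde{u}$ is the sum over stations of squared within-station residual totals, and $\tilde{u}'(J_N \otimes I_T)\tilde{u}$ is the same sum taken over days. Under H0 the statistic is compared with the $\chi^2_2$ critical value (5.99 at the 5% level). Function names and simulated data are hypothetical.

```python
import numpy as np

def breusch_pagan_two_way(resid, N, T):
    """Two-way Breusch-Pagan LM statistic from pooled-OLS residuals stacked
    station-major (i slow, t fast). Returns (LM1, LM2, LM)."""
    u = np.asarray(resid, dtype=float)
    U = u.reshape(N, T)
    uu = u @ u
    q_station = np.sum(U.sum(axis=1) ** 2)   # u'(I_N kron J_T)u
    q_day = np.sum(U.sum(axis=0) ** 2)       # u'(J_N kron I_T)u
    lm1 = N * T / (2.0 * (T - 1)) * (1.0 - q_station / uu) ** 2
    lm2 = N * T / (2.0 * (N - 1)) * (1.0 - q_day / uu) ** 2
    return lm1, lm2, lm1 + lm2

# Hypothetical panel with station effects; residuals come from pooled OLS.
rng = np.random.default_rng(3)
N, T = 8, 40
x = rng.normal(size=N * T)
y = 0.7 * x + np.repeat(rng.normal(scale=1.0, size=N), T) + rng.normal(size=N * T)
X = np.column_stack([np.ones(N * T), x])
resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

lm1, lm2, lm = breusch_pagan_two_way(resid, N, T)
print(f"LM1 = {lm1:.1f}, LM2 = {lm2:.1f}, LM = {lm:.1f} (chi2_2 at 5%: 5.99)")
```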
The Hausman specification test [14] is another classical test of whether the fixed- or random-effects model should be used. The key question is whether there is significant correlation between the unobserved individual-specific random effects and the regressors. If there is no such correlation, the random-effects model may be more efficient. If there is such a correlation, the random-effects estimator is inconsistent, and the fixed-effects model is the model of choice. The hypotheses are as follows:
H0: $E(X_{it}\mu_i) = 0$ → No correlation; random effects are consistent and efficient
H1: $E(X_{it}\mu_i) \neq 0$ → Correlation exists; fixed effects are consistent
The Hausman test statistic is given by:
$$m = \hat{q}'\,[\operatorname{var}(\hat{q})]^{-1}\,\hat{q}, \qquad \hat{q} = \hat{\beta}_{FE} - \hat{\beta}_{RE}, \qquad \operatorname{var}(\hat{q}) = \operatorname{var}(\hat{\beta}_{FE}) - \operatorname{var}(\hat{\beta}_{RE}). \qquad (10)$$
The statistic $m$ is asymptotically distributed as $\chi^2_k$, where $k$ denotes the number of regressors. If $m$ is larger than the critical value, the null hypothesis is rejected and the fixed-effects model is selected. StataSE software (version 10) was used to estimate and analyze the panel-data models.
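A minimal sketch of the Hausman statistic $m$, assuming the fixed- and random-effects slope estimates and their covariance matrices have already been obtained from the two fitted models. The numbers below are hypothetical; in real data the covariance difference is not guaranteed to be positive definite, in which case the statistic needs a generalized inverse or a different variance estimator.

```python
import numpy as np

def hausman(beta_fe, beta_re, cov_fe, cov_re):
    """Hausman statistic m = q' [Var(b_FE) - Var(b_RE)]^(-1) q, q = b_FE - b_RE,
    over the k slope coefficients shared by both models. Returns (m, k)."""
    q = np.asarray(beta_fe, dtype=float) - np.asarray(beta_re, dtype=float)
    v = np.asarray(cov_fe, dtype=float) - np.asarray(cov_re, dtype=float)
    m = float(q @ np.linalg.solve(v, q))
    return m, q.size

# Hypothetical estimates for two slopes (for example on T_{d-1} and T_w).
m, k = hausman(beta_fe=[0.62, 0.31], beta_re=[0.52, 0.51],
               cov_fe=np.diag([0.02, 0.05]), cov_re=np.diag([0.01, 0.01]))
print(f"m = {m:.2f} on {k} df; the 5% chi-square critical value for 2 df is 5.99")
```

With these made-up numbers m = 2, below 5.99, so the random-effects model would not be rejected; a value above the critical value would point to fixed effects, as in the study.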
In summary, panel-data analysis is a method of studying a particular subject across multiple sites that are observed periodically over a defined time frame. Moreover, with spatial observations and enough cross-sections, panel-data analysis permits the researcher to study the dynamics of change with time series [17].
Different criteria were used to evaluate the effectiveness of the models and their ability to make proper predictions, and to compare the two types of model. These included the coefficient of determination ($R^2$), root mean square error (RMSE), mean bias error (MBE), relative error (RE) and the Akaike information criterion (AIC). $R^2$, RMSE and RE are well known, so only MBE and AIC are defined here:
$$\mathrm{MBE} = \frac{1}{n}\sum_{i=1}^{n}(x_i - y_i), \qquad (11)$$
where $x$ and $y$ are the measured and estimated temperatures, respectively, and $n$ is the number of observations. With this sign convention, a negative MBE indicates overestimation.
$$\mathrm{AIC} = 2k + n\log(\mathrm{RSS}/n), \qquad (12)$$
where $k$ is the number of model parameters, $n$ is the sample size and RSS is the residual sum of squares.
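The criteria above can be sketched as small functions. MBE and AIC follow Eqs. (11) and (12), with the natural logarithm assumed in the AIC; $R^2$ is taken here as $1 - \mathrm{SSE}/\mathrm{SST}$, which is an assumption since the paper does not define it, and RE is omitted because its exact definition is not given. The four-point example is made up.

```python
import numpy as np

def mbe(measured, estimated):
    """Mean bias error, Eq. (11): mean of (measured - estimated)."""
    x, y = np.asarray(measured, float), np.asarray(estimated, float)
    return float(np.mean(x - y))

def rmse(measured, estimated):
    """Root mean square error."""
    x, y = np.asarray(measured, float), np.asarray(estimated, float)
    return float(np.sqrt(np.mean((x - y) ** 2)))

def r_squared(measured, estimated):
    """Assumed definition: R^2 = 1 - SSE/SST."""
    x, y = np.asarray(measured, float), np.asarray(estimated, float)
    return float(1.0 - np.sum((x - y) ** 2) / np.sum((x - x.mean()) ** 2))

def aic(measured, estimated, k):
    """AIC, Eq. (12): 2k + n log(RSS/n), natural log assumed."""
    x, y = np.asarray(measured, float), np.asarray(estimated, float)
    n = x.size
    rss = float(np.sum((x - y) ** 2))
    return 2 * k + n * np.log(rss / n)

# Made-up example: four measured and estimated temperatures (°C).
measured = [1.0, 2.0, 3.0, 4.0]
estimated = [1.5, 2.0, 2.5, 4.0]
print(mbe(measured, estimated), rmse(measured, estimated),
      r_squared(measured, estimated), aic(measured, estimated, k=2))
```

In this example the over- and underestimates cancel, so MBE is zero while RMSE is not; this is why the paper reports both.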
Table 2. Number of observations, performance indices and parameters for winter at different depths.
Depth  Observations  R² (PD)  R² (CLR)  RMSE (PD, °C)  RMSE (CLR, °C)  AIC (PD)  AIC (CLR)
5 cm 6300 0.97 0.84 1.01 2.38 10.6 12.9
10 cm 6300 0.97 0.87 0.85 1.96 10.2 12.5
20 cm 6300 0.97 0.88 0.81 1.72 9.9 12.4
30 cm 6300 0.97 0.87 0.77 1.60 9.8 12.3
50 cm 6300 0.97 0.81 0.64 1.57 8.0 9.9
100 cm 6300 0.96 0.50 0.53 1.78 7.7 10.4
Table 3. Number of observations, performance indices and parameters for spring at different depths.
Depth  Observations  R² (PD)  R² (CLR)  RMSE (PD, °C)  RMSE (CLR, °C)  AIC (PD)  AIC (CLR)
5 cm 6370 0.97 0.91 1.29 2.97 11.0 13.3
10 cm 6370 0.97 0.92 1.01 2.48 10.7 12.9
20 cm 6370 0.98 0.91 0.98 2.30 10.5 12.8
30 cm 6370 0.98 0.90 0.92 2.29 10.4 12.8
50 cm 6370 0.98 0.85 0.82 2.33 8.6 10.1
100 cm 6370 0.98 0.79 0.68 2.38 8.5 10.4
Table 4. Number of observations, performance indices and parameters for summer at different depths.
Depth  Observations  R² (PD)  R² (CLR)  RMSE (PD, °C)  RMSE (CLR, °C)  AIC (PD)  AIC (CLR)
5 cm 6440 0.94 0.84 1.12 2.08 10.3 13.1
10 cm 6440 0.96 0.85 0.85 1.78 10.0 12.5
20 cm 6440 0.96 0.81 0.73 1.69 9.9 11.9
30 cm 6440 0.97 0.75 0.64 1.81 9.2 12.5
50 cm 6440 0.97 0.64 0.59 1.90 8.1 9.8
100 cm 6440 0.97 0.35 0.46 2.14 7.3 10
Table 5. Number of observations, performance indices and parameters for autumn at different depths.
Depth  Observations  R² (PD)  R² (CLR)  RMSE (PD, °C)  RMSE (CLR, °C)  AIC (PD)  AIC (CLR)
5 cm 6440 0.98 0.94 0.97 2.24 10.5 12.8
10 cm 6440 0.99 0.95 0.83 1.99 10.2 12.5
20 cm 6440 0.99 0.96 0.82 1.88 10.1 12.4
30 cm 6440 0.99 0.95 0.76 1.82 10.0 12.3
50 cm 6440 0.99 0.92 0.69 1.85 8.3 10.1
100 cm 6440 0.99 0.86 0.53 1.99 8.1 10.2
Table 6. Errors (RMSE and MBE, °C) of the panel-data models for winter at different stations and depths in 2009.
5 cm 10 cm 20 cm 30 cm 50 cm 100 cm
Stations RMSE MBE RMSE MBE RMSE MBE RMSE MBE RMSE MBE RMSE MBE
Ghoochan 1.21 −0.13 1.09 −0.23 0.88 −0.44 0.86 −0.28 0.84 −0.52 0.75 −0.36
Golmakan 1.44 −0.06 1.33 −0.27 1.06 −0.33 0.82 −0.02 0.76 −0.07 0.66 0.31
Gonabad 1.53 0.39 1.21 0.02 1.30 0.37 1.13 0.12 0.96 0.41 0.83 0.29
Kashmar 1.25 −0.09 1.30 −0.01 1.16 −0.05 1.12 −0.35 0.91 0.12 0.96 −0.05
Mashhad 1.49 0.11 1.44 0.06 1.29 0.39 1.22 0.29 0.77 0.46 0.59 0.47
Neishabour 1.37 −0.07 1.32 −0.08 1.05 −0.14 0.89 0.18 0.71 0.27 0.65 0.23
Sabzevar 1.52 0.42 1.39 0.26 1.34 0.44 1.19 0.74 1.04 0.61 0.93 0.63
Sarakhs 1.49 0.12 1.36 0.14 1.32 0.73 1.13 0.43 0.92 0.34 0.87 0.65
Torbat H 1.42 −0.65 1.29 −0.67 1.29 −0.63 1.13 −0.67 0.86 −0.33 0.81 −0.07
Torbat J 1.41 0.00 1.38 −0.43 1.12 −0.49 0.99 −0.16 0.85 −0.26 0.68 −0.14
Therefore, the two-way fixed-effects model was chosen as the most adequate model for each depth and period (24 models).
Tables 2–5 present the number of observations and the performance indices of the models for each season and depth. All parameters were statistically significant at the level of P < 0.01, except $R_{d-1}$ at depths of 50 and 100 cm in all models and $T_{d-1}$ at 100 cm in the third season. The RMSE of the models varied from 0.46 to 1.29°C for PD and from 1.57 to 2.97°C for CLR. The $R^2$ values were high, ranging between 0.94 and 0.99 for the panel-data models, while they were lower for CLR, ranging between 0.35 and 0.96. Note that $R^2$ and RMSE for CLR were calculated based on the Thiessen areas of the meteorological stations; that is, these indices were calculated for each station separately at a given depth. The tables also show that, in the panel-data models, as soil depth increased, the RMSEs declined and the coefficients of $T_{d-1}$ and $R_{d-1}$ decreased. This shows that the effect of daily air temperature was inversely related to
Table 7. Errors (RMSE and MBE, °C) of the panel-data models for spring at different stations and depths in 2009.
5 cm 10 cm 20 cm 30 cm 50 cm 100 cm
Stations RMSE MBE RMSE MBE RMSE MBE RMSE MBE RMSE MBE RMSE MBE
Ghoochan 1.26 −0.09 1.36 −0.23 1.19 −0.72 1.16 −0.83 1.02 −0.84 0.72 −0.45
Golmakan 1.39 −0.54 1.24 −0.56 1.23 −0.57 0.94 0.03 0.92 −0.48 0.81 −0.37
Gonabad 1.43 0.53 1.26 0.39 0.94 −0.22 0.88 −0.30 0.84 −0.51 0.86 −0.45
Kashmar 1.39 −0.68 1.18 0.15 1.05 0.17 0.91 −0.50 0.87 −0.18 0.86 0.51
Mashhad 1.26 −0.19 1.28 0.06 1.21 −0.69 0.98 0.51 0.94 0.47 0.92 0.38
Neishabour 1.44 0.10 1.34 0.35 1.29 −0.59 1.29 −0.63 1.24 −0.61 1.12 −0.52
Sabzevar 1.47 −0.79 1.36 −0.68 1.27 −0.57 1.19 −0.52 1.13 −0.58 0.89 −0.26
Sarakhs 1.47 0.41 1.19 0.10 1.17 0.68 1.14 0.46 1.01 0.51 0.94 0.36
Torbat H 1.61 −0.82 1.37 −0.77 1.31 −0.72 1.23 −0.59 1.13 −0.52 0.98 −0.42
Torbat J 1.52 −0.74 1.43 −0.73 1.37 −0.77 1.21 −0.65 1.18 −0.53 1.05 −0.49
Table 8. Errors (RMSE and MBE, °C) of the panel-data models for summer at different stations and depths in 2009.
5 cm 10 cm 20 cm 30 cm 50 cm 100 cm
Stations RMSE MBE RMSE MBE RMSE MBE RMSE MBE RMSE MBE RMSE MBE
Ghoochan 1.13 −0.22 0.97 −0.58 0.93 −0.73 0.86 −0.32 0.84 −0.45 0.64 −0.21
Golmakan 1.21 −0.18 1.17 −0.38 1.01 −0.21 0.99 0.09 0.93 0.02 0.93 −0.38
Gonabad 1.33 0.43 1.13 0.20 0.85 −0.40 0.72 −0.07 0.67 0.27 0.57 −0.19
Kashmar 1.19 −0.66 1.15 −0.02 1.02 −0.22 1.12 −0.43 0.96 0.20 0.82 −0.01
Mashhad 1.28 −0.44 1.11 −0.27 1.07 0.26 1.02 0.39 0.94 0.46 0.94 0.49
Neishabour 1.29 −0.54 1.22 0.21 1.16 −0.56 1.14 −0.54 1.03 −0.38 0.98 −0.74
Sabzevar 1.44 −0.56 1.35 −0.49 1.21 −0.48 1.03 −0.55 0.95 −0.39 0.84 −0.30
Sarakhs 1.22 −0.34 1.18 −0.58 1.07 0.22 0.96 0.16 0.89 0.36 0.74 −0.02
Torbat H 1.46 −0.51 1.31 −0.55 1.29 −0.42 1.03 −0.62 0.95 −0.20 0.90 −0.68
Torbat J 1.25 −0.04 1.09 −0.51 1.09 −0.66 0.96 −0.32 0.88 −0.10 0.85 −0.47
Table 9. Errors (RMSE and MBE, °C) of the panel-data models for autumn at different stations and depths in 2009.
5 cm 10 cm 20 cm 30 cm 50 cm 100 cm
Stations RMSE MBE RMSE MBE RMSE MBE RMSE MBE RMSE MBE RMSE MBE
Ghoochan 1.26 −0.53 1.24 −0.61 1.15 −0.64 0.92 −0.43 0.72 −0.49 0.69 −0.56
Golmakan 1.43 −0.46 1.22 −0.32 1.07 0.02 0.98 0.34 0.79 0.22 0.82 −0.17
Gonabad 1.33 −0.58 1.30 −0.64 1.24 −0.58 1.20 −0.52 1.16 −0.61 1.09 −0.48
Kashmar 1.13 −0.32 1.11 −0.56 1.09 −0.58 0.90 −0.52 0.73 0.06 0.79 −0.15
Mashhad 1.31 −0.72 1.26 −0.72 1.07 −0.39 0.82 0.10 0.77 −0.09 0.68 −0.12
Neishabour 1.25 −0.57 1.18 −0.61 1.13 −0.53 1.06 −0.22 1.01 −0.49 0.95 −0.52
Sabzevar 1.36 0.37 1.28 0.21 1.03 0.70 0.99 0.64 0.92 0.06 0.72 −0.55
Sarakhs 1.40 0.55 1.27 0.30 1.05 0.57 1.08 0.67 1.04 0.62 1.03 0.69
Torbat H 1.46 −0.69 1.33 −0.61 1.27 −0.72 1.19 −0.68 1.09 −0.59 1.02 −0.60
Torbat J 1.43 −0.60 1.34 −0.58 1.02 −0.54 1.02 0.00 0.95 −0.39 0.94 −0.59
depth, which seems reasonable. This agrees with the well-known relation between long-term air temperatures and deep soil temperatures. Moreover, the AIC of the models varied from 7.3 to 11.0 for PD and from 9.8 to 13.3 for CLR; since a lower AIC indicates a better balance between goodness of fit and model complexity, this shows the superiority of PD over CLR.
Table 10. Relative error (RE) of the panel-data models for winter at different stations and depths in 2009.
Station  5 cm  10 cm  20 cm  30 cm  50 cm  100 cm
Ghoochan 0.082 0.077 0.077 0.083 0.098 0.125
Golmakan 0.089 0.087 0.082 0.078 0.084 0.094
Gonabad 0.071 0.062 0.081 0.078 0.085 0.122
Kashmar 0.057 0.062 0.065 0.071 0.072 0.133
Mashhad 0.077 0.079 0.086 0.087 0.081 0.120
Neishabour 0.070 0.071 0.080 0.080 0.080 0.105
Sabzevar 0.080 0.087 0.110 0.116 0.58 0.213
Sarakhs 0.063 0.065 0.076 0.073 0.072 0.141
Torbat H 0.079 0.077 0.084 0.082 0.088 0.148
Torbat J 0.069 0.074 0.077 0.082 0.087 0.116
Table 11. Relative error (RE) of the panel-data models for spring at different stations and depths in 2009.
Station  5 cm  10 cm  20 cm  30 cm  50 cm  100 cm
Ghoochan 0.057 0.064 0.063 0.064 0.065 0.060
Golmakan 0.053 0.048 0.054 0.043 0.052 0.062
Gonabad 0.048 0.047 0.042 0.039 0.046 0.064
Kashmar 0.065 0.056 0.052 0.045 0.048 0.055
Mashhad 0.056 0.063 0.067 0.054 0.059 0.067
Neishabour 0.048 0.050 0.063 0.065 0.073 0.089
Sabzevar 0.055 0.059 0.062 0.060 0.066 0.070
Sarakhs 0.056 0.054 0.056 0.056 0.055 0.061
Torbat H 0.095 0.087 0.089 0.088 0.089 0.094
Torbat J 0.056 0.060 0.068 0.062 0.072 0.083
Table 12. Relative error (RE) of the panel-data models for summer at different stations and depths in 2009.
Station  5 cm  10 cm  20 cm  30 cm  50 cm  100 cm
Ghoochan 0.096 0.088 0.092 0.101 0.131 0.162
Golmakan 0.071 0.095 0.085 0.094 0.103 0.185
Gonabad 0.102 0.096 0.093 0.093 0.093 0.259
Kashmar 0.075 0.073 0.077 0.096 0.098 0.138
Mashhad 0.067 0.071 0.087 0.093 0.116 0.203
Neishabour 0.077 0.116 0.174 0.163 0.172 0.294
Sabzevar 0.124 0.139 0.173 0.149 0.180 0.257
Sarakhs 0.076 0.090 0.092 0.094 0.103 0.142
Torbat H 0.116 0.124 0.151 0.140 0.166 0.250
Torbat J 0.094 0.100 0.140 0.138 0.162 0.252
Table 13. Relative error (RE) of the panel-data models for autumn at different stations and depths in 2009.
Station  5 cm  10 cm  20 cm  30 cm  50 cm  100 cm
Ghoochan 0.058 0.059 0.059 0.051 0.041 0.046
Golmakan 0.057 0.052 0.051 0.051 0.047 0.062
Gonabad 0.047 0.048 0.049 0.050 0.054 0.066
Kashmar 0.042 0.042 0.046 0.039 0.036 0.050
Mashhad 0.050 0.051 0.047 0.040 0.042 0.052
Neishabour 0.051 0.052 0.056 0.055 0.058 0.069
Sabzevar 0.059 0.060 0.056 0.057 0.057 0.053
Sarakhs 0.055 0.052 0.045 0.051 0.054 0.072
Torbat H 0.060 0.058 0.061 0.061 0.063 0.074
Torbat J 0.056 0.055 0.050 0.058 0.057 0.072
to this figure, the mean differences in most periods were scattered in the range of −1 to 1°C for the panel-data models (which seems acceptable according to similar studies, e.g. [30,32]) and from −1.5 to 1.5°C for the CLR models. The patterns of differences from measured temperatures were similar for the panel-data and CLR models, except at depths of 50 and 100 cm, where the panel-data models performed considerably better. Moreover, during Julian days 110–140, all models continuously overestimated soil temperatures at all depths. As Tables 6–9 indicate, RMSEs declined with depth at almost all stations and in all periods in the panel-data models. This could be because the variability of temperature declines with depth. Nevertheless, based on Tables 10–13, REs did not change significantly with depth, except at 100 cm, where REs were slightly higher than at the other depths. Furthermore, RMSEs varied from 0.57 to 1.61°C and MBEs varied in the range of −0.83 to 0.74°C. Whereas RMSEs showed a clear depth pattern, MBEs were depth-independent. The lowest RMSEs occurred at Ghoochan station, which can be explained by the fact that this station had the narrowest range of fluctuations in its annual air temperature regime. The average MBE closest to zero occurred at Mashhad station, because the characteristics of this station (elevation, average temperature and rainfall) were closest to the average of all 10 stations. In contrast, the largest RMSEs and MBEs occurred at Torbat-Heydarieh station. This can be explained by the fact that this station had the widest range of fluctuations in its annual air temperature regime and is located on the boundary of Khorasan-Razavi Province, near the drier and hotter southern region. In addition, the mean residuals differed from zero at most stations and for most depths. In general, mean residuals with an MBE below −0.3 or above 0.3°C were significantly different from zero, independently of depth. It should also be mentioned that none of the stations had biased residuals (consistently positive or consistently negative), and the residuals at all stations were normally distributed and showed no correlation.
Figure 2. Differences between the average measured and estimated soil temperatures (°C) during 2009 at depths of (a) 5 cm, (b) 10 cm, (c) 20 cm, (d) 30 cm, (e) 50 cm and (f) 100 cm, considering all stations.
models to include time effects. Furthermore, as shown in Tables 14–17, although the average MBEs did not indicate the superiority of either the panel-data or the CLR models, the CLR models produced the most extreme MBE values. In the PD models, RMSEs declined steadily from the shallowest depth (5 cm) to the deepest (100 cm), whereas in the CLR models the deeper depths (50 and 100 cm) showed some increases. The minimum average RMSEs occurred in summer for both models, while the maximum ones occurred in spring for the PD models and in winter for the CLR models.
According to Table 18, the t-tests rejected the null hypothesis that the average panel-data RMSE was equal to or greater than the average CLR RMSE; hence, the panel-data RMSEs were significantly lower than the CLR ones. Nevertheless, the t-tests did not show the superiority of either the panel-data or the CLR models when the average MBEs were compared.
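The one-sided comparison of RMSEs can be illustrated as follows. This sketch is an illustration only: it uses the season-by-depth RMSEs from Tables 2–5 rather than the data behind Table 18, and it assumes a paired test (each PD value is matched with the CLR value for the same season and depth), which the text does not specify. The critical value 2.500 is the tabulated one-sided 1% value of Student's t with 23 degrees of freedom.

```python
import numpy as np

# RMSEs (°C) of the panel-data (PD) and CLR models from Tables 2-5,
# ordered winter, spring, summer, autumn; depths 5, 10, 20, 30, 50, 100 cm.
rmse_pd = np.array([1.01, 0.85, 0.81, 0.77, 0.64, 0.53,
                    1.29, 1.01, 0.98, 0.92, 0.82, 0.68,
                    1.12, 0.85, 0.73, 0.64, 0.59, 0.46,
                    0.97, 0.83, 0.82, 0.76, 0.69, 0.53])
rmse_clr = np.array([2.38, 1.96, 1.72, 1.60, 1.57, 1.78,
                     2.97, 2.48, 2.30, 2.29, 2.33, 2.38,
                     2.08, 1.78, 1.69, 1.81, 1.90, 2.14,
                     2.24, 1.99, 1.88, 1.82, 1.85, 1.99])

# Paired one-sided t test of H0: mean(PD - CLR) >= 0 against H1: mean < 0.
d = rmse_pd - rmse_clr
n = d.size
t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(n))
t_crit = -2.500   # approx. one-sided 1% critical value of t with 23 df
reject_h0 = bool(t_stat < t_crit)
print(f"t = {t_stat:.1f} (df = {n - 1}), reject H0: {reject_h0}")
```

Because every PD value in these tables is below its CLR counterpart, the paired statistic is far beyond the critical value; an unpaired (two-sample) test would also be possible but ignores the season and depth matching.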
4. Conclusions
Panel-data models showed RMSEs from 0.46 to 1.29°C, considerably lower than those of the CLR models. Also, the $R^2$ values for each season and depth were acceptable for the PD models (0.94 to 0.99) and significantly greater than those of CLR (0.35 to 0.96). The PD models could predict soil temperatures at various depths and stations with mean errors in the range of −1 to 1°C for most of the year. Nonetheless, on some days during the second season (April and May), mean errors were larger, with a persistent overestimation of up to 2°C, especially at shallower depths. The overestimation of soil temperatures in the second season can be explained by the fact that in 2009 rainfall was significantly higher (more than twice the normal amount) almost throughout Khorasan-Razavi during spring, summer and autumn (especially during April and May). Although the mean errors were at a reasonable level, absolute errors at certain locations and shallower depths were occasionally on the order of 2.7°C, which can be explained by the larger temperature fluctuations at depths of 5 and 10 cm. In addition, the validation indicated that the panel-data models were useful and reliable for predicting soil temperature, although heavy and unusual rainfall could lead to overestimation. Moreover, the accuracy of the predictions improved with increasing soil depth. In contrast, in the CLR models the RMSEs did not follow the same pattern with increasing soil depth, even though the same independent variables were used. Overall, the RMSEs were substantially lower for the panel-data models than for the CLR ones. Consequently, panel-data models appear to predict variables with more sinusoidal and organized patterns, rather than those dominated by delay effects, much better than CLR models do.
Future investigations of panel-data applications to soil temperature modeling may comprehensively assess the pros and cons of the panel-data approach in comparison with other methods. Further studies are encouraged to examine the potential of the panel-data concept for broader use and modeling capability in meteorology and environmental sciences.
Acknowledgements
The authors would like to thank Dr Majid Sarmad and Ms Zahra Khoshkam from Department of
Statistics of Ferdowsi University of Mashhad for their insightful suggestions that led to a substantial
improvement of the manuscript.
Disclosure statement
No potential conflict of interest was reported by the authors.
References
[1] M. Arellano, Panel Data Econometrics, 2nd ed., Oxford University Press, New York, 2003.
[2] B.H. Baltagi, Econometric Analysis of Panel Data, 3rd ed., John Wiley & Sons, New York, 2005.
[3] B. Bond-Lamberty, C. Wang, and S.T. Gower, Spatiotemporal measurement and modeling of stand-level boreal forest soil temperatures, Agric. Forest Meteorol. 131 (2005), pp. 27–40.
[4] T.S. Breusch and A.R. Pagan, The Lagrange multiplier test and its applications to model
specification in econometrics, Rev. Econom. Stud. 47 (1980), pp. 239–253.
[5] M.W. Browne, Generalized least squares estimators in the analysis of covariance structures, South
African Statist. J. 8 (1974), pp. 1–24.
[6] G.C. Chow, Tests of equality between sets of coefficients in two linear regressions, Econometrica
28 (1960), pp. 591–605.
[7] I.N. Daliakopoulos, P. Coulibaly, and I.K. Tsanis, Groundwater level forecasting using artificial neural networks, J. Hydrol. 309 (2005), pp. 229–240.
[8] F. Droulia, S. Lykoudis, I. Tsiros, N. Alvertos, E. Akylas, and I. Garofalakis, Ground temperature
estimations using simplified analytical and semi-empirical approaches, Solar Energy 83 (2009),
pp. 211–219.
[9] Z. Gao, L. Bian, Y. Hu, L. Wang, and J. Fan, Determination of soil temperature in an arid region,
J. Arid Environ. 71 (2007), pp. 157–168.
[10] R.K. George, Prediction of soil temperature by using artificial neural network algorithms,
Nonlinear Anal. 47 (2001), pp. 1737–1748.
[11] E.A. Graham, Y. Lam, and E.M. Yuen, Forest understory soil temperatures and heat flux calcu-
lated using a Fourier model and scaled using a digital camera, Agric. Forest Meteorol. 150 (2010),
pp. 640–649.
[12] B.H. Hall, The relationship between firm size and firm growth in the US manufacturing sector,
J. Ind. Econ. 35 (1987), pp. 583–606.
[13] M. Harding and C. Lamarche, Least square estimation of a panel data model with multifactor
error structure and endogenous covariates, Econom. Lett. 111 (2011), pp. 197–199.
[14] J.A. Hausman, Specification tests in econometrics, Econometrica 46 (1978), pp. 1251–1271.
[15] C. Hsiao, Analysis of Panel Data, 2nd ed., Cambridge University Press, London, 2003.
[16] C. Hsiao, M. Pesaran, and A. Tahmiscioglu, Maximum likelihood estimation of fixed effects
dynamic panel data models covering short time periods, J. Econometrics 109 (2002), pp. 107–150.
[17] A. Izady, K. Davary, A. Alizadeh, B. Ghahraman, M. Sadeghi, and A. Moghaddamnia, Application of ‘panel-data’ modeling to predict groundwater levels in the Neishaboor Plain, Iran, Hydrogeol. J. 20 (2012), pp. 435–447. doi:10.1007/s10040-011-0814-2.
[18] A. Jebamalar, S. Raja, and S. Bai, Prediction of annual and seasonal soil temperature variation using artificial neural network, Indian J. Radio Space Phys. 41 (2012), pp. 48–57.
[19] A. Kangasharju, Regional variations in firm formation: Panel and cross-section data evidence
from Finland, Pap. Reg. Sci. 79 (2000), pp. 355–373.
[20] D. Kirkham and W.L. Powers, Advanced Soil Physics, 1st ed., John Wiley & Sons, New York, 1972.
[21] A. Krishnan and R.S. Kushwaha, Analysis of soil temperatures in the arid zone of India by Fourier
techniques, Agric. Meteorol. 10 (1972), pp. 55–64.
[22] P. Kumar and A. Kaleita, Assimilation of near-surface temperature using extended Kalman filter, Adv. Water Resour. 26 (2003), pp. 79–93.
[23] R. Lal and M.K. Shukla, Principles of Soil Physics, 1st ed., Marcel Dekker, New York, 2004.
[24] L. Lee and J. Yu, Some recent developments in spatial panel data models, Reg. Sci. Urban Econ.
40 (2010), pp. 255–271.
[25] L. Leng, T. Zhang, L. Kleinman, and W. Zhu, Ordinary least square regression, orthogonal regression, geometric mean regression and their applications in aerosol science, J. Phys.: Conf. Ser. 78 (2007), 012084.
[26] S. Liu, Matrix results on the Khatri-Rao and Tracy-Singh products, Linear Algebra Appl. 289 (1999), pp. 267–277.
[27] Y. Luo, R.S. Loomis, and T.C. Hsiao, Simulation of soil temperature in crops, Agric. Forest
Meteorol. 61 (1992), pp. 23–38.
[28] M. Mouchart and J. Rombouts, Clustered panel data models: An efficient approach for nowcast-
ing from poor data, Int. J. Forecast. 21 (2005), pp. 577–594.
[29] Y. Mundlak, On the pooling of time series and cross section data, Econometrica 46 (1978),
pp. 69–85.
[30] F. Plauborg, Simple model for 10 cm soil temperature in different soils with short grass, Eur. J.
Agron. 17 (2002), pp. 173–179.
[31] N. Rosenberg, Microclimate: The Biological Environment, 1st ed., John Wiley & Sons, New York,
1974.
[32] D.J. Timlin, Y.A. Pachepsky, B.A. Acock, J. Simunek, G. Flerchinger, and F. Whisler, Error anal-
ysis of soil temperature using measured and estimated hourly weather data with 2DSOIL, Agric.
Syst. 72 (2002), pp. 215–239.
[33] G. Trenkler, A Kronecker matrix inequality with a statistical application, Econom. Theory 11
(1995), pp. 654–655.
[34] M.R. Veronez, A.B. Thum, A.S. Luz, and D.R. Dasilva, Artificial neural networks applied in
the determination of soil surface temperature-SST, 7th International Symposium on Spatial
Accuracy Assessment in Natural Resources and Environmental Sciences, 2006.
[35] J. Wang and R.L. Bras, Ground heat flux estimated from surface soil temperature, J. Hydrol. 216
(1999), pp. 214–226.
[36] T.J. Wansbeek and A. Kapteyn, A simple way to obtain the spectral decomposition of variance
components models for balanced data, Commun. Statist. A11 (1982), pp. 2105–2112.
[37] T.J. Wansbeek and A. Kapteyn, A note on spectral decomposition and maximum likelihood
estimation of ANOVA models with balanced data, Statist. Probab. Lett. 1 (1983), pp. 213–215.
[38] J.M. Wooldridge, Econometric Analysis of Cross Section and Panel Data, MIT Press, Cambridge,
2002.
[39] R. Yaffee, A primer for panel data analysis, 2003. Available at http://www.nyu.edu/its/pubs/connect/fall03/yafee_primer.html (accessed 15 April 2008).
[40] D. Zheng, E.R. Hunt, and S.W. Running, A daily soil temperature model based on air temperature
and precipitation for continental applications, Climate Res. 2 (1993), pp. 183–191.