Geoderma
journal homepage: www.elsevier.com/locate/geoderma
ARTICLE INFO

Handling Editor: Alex McBratney

Keywords: Regression; Pedotransfer functions; Soil quality; Soil water characteristics curve

ABSTRACT

The S index, an indicator of soil physical quality, is the slope of the soil water release curve on a mass basis at its inflection point on a logarithmic matric potential scale. To predict the S index, the soil properties that most influence it must first be selected. In the current study, a hybrid algorithm combining a genetic algorithm (GA) with an artificial neural network (ANN) was designed for this purpose. The potential of the GA-ANN algorithm as a framework for identifying the most determinant parameters of the S index in an Iranian semiarid region was also investigated. The GA-ANN results showed that the subset of five properties, namely soil organic matter (SOM), sand, clay, calcium carbonate equivalent (CCE), and bulk density (BD), had the lowest error. The ANN method resulted in a higher model efficiency (R² = 0.92, RMSE = 0.00065 and MAPE = 0.14%) than the multiple linear regression (MLR) approach (R² = 0.016, RMSE = 2.72 and MAPE = 43.16%). Sensitivity analysis of the ANN model showed that BD and CCE had the highest and the lowest effects on the S index, respectively. Considering the sensitivity analysis results and the fact that measuring the S index is not cost-efficient, it is suggested that bulk density be used as an indicator of soil quality.
1. Introduction

Soil quality is defined as the capacity of a soil to function within land use and ecosystem boundaries, sustain biological productivity, preserve environmental quality, and promote plant, animal, and human health. Although soil quality cannot be measured directly, it can be evaluated with the help of soil quality indicators. The S index, first proposed and discussed by Dexter (2004b), has become well known as an important indicator of soil physical quality. It is the slope of the soil water release curve (SWRC) on a mass basis at its inflection point on a logarithmic matric potential scale. The S index was put forward as an “easy and unambiguous measure” (Dexter, 2004b) on the basis of observations of various soil properties; these observations were integrated to obtain soil physical quality indicators (SPQIs). Several studies have supported the suitability of the S index as an SPQI. For instance, according to Dexter (2004a, 2004b), the S index correlates with several important soil physical properties. As he stated, textural and structural pores are represented in the SWRC as pores smaller and larger, respectively, than those corresponding to the inflection point. The S index is used as an SPQI because soil physical degradation is always related to some change in the distribution of the structural pores, leading to a change in the shape of the SWRC and in turn in the S value. Dexter et al. (2007) gave two additional arguments for S as an adequate SPQI: first, “the same S values have the same physical meaning in widely different soils;” this is not true of “other soil physical properties such as bulk density” (BD). Second, S provides a more objective measurement with higher resolution (low coefficient of variation and standard error) than other measures, e.g. subjective visual examination of the SPQIs in the field. S was theoretically defined as a measure of soil microstructure, whose value would directly allow estimation of several important soil physical properties.

Although the correlation of the S index with other soil physical properties has been studied, to our knowledge there is still too little research on selecting appropriate soil properties for predicting the S index. This study tries to fill this gap and to suggest further studies.

Feature selection (FS) is the approach best suited to selecting important properties. FS is generally employed in machine learning and is especially helpful when the learning task involves high-dimensional datasets. It aims mainly at choosing a set of features or properties by discarding those that carry little predictive information as well as strongly correlated, redundant properties (Vieira et al., 2010). The
⁎ Corresponding author.
E-mail address: h.shekofteh@vru.ac.ir (H. Shekofteh).
https://doi.org/10.1016/j.geoderma.2019.113908
Received 23 February 2019; Received in revised form 8 July 2019; Accepted 7 August 2019
Available online 17 August 2019
0016-7061/ © 2019 Elsevier B.V. All rights reserved.
H. Shekofteh and A. Masoudi Geoderma 355 (2019) 113908
availability of an array of input data poses a challenge to more conventional statistical methods such as regression and classification analysis. For instance, using many input features to derive a pedotransfer function for predicting the S index may require estimating a sizable number of parameters in a regression analysis, and hence measuring more data points. Ideally, each feature entered into a regression analysis should add a separate piece of information. If the input variables are strongly correlated, the redundancy of the available information may reduce the accuracy of the regression model (Pal and Foody, 2010). Recently, nature-inspired meta-heuristic algorithms have been employed for feature selection in soil science. See Shirani et al. (2015) for an example of particle swarm optimization (PSO) used to select the properties most influential on soil physical quality indices, and Shekofteh et al. (2017) for an application of ant colony optimization (ACO) to select effective properties for soil CEC.

One of the nature-inspired meta-heuristic algorithms is the genetic algorithm (GA). In this study, a hybrid GA-ANN (artificial neural network) algorithm was used to select the best subset of properties affecting the S index.

The aims of this study were: (i) to assess the GA in combination with an ANN for selecting the best set of input properties influencing the S index; (ii) to model the S index from the selected properties with ANN and multiple linear regression (MLR) approaches; and (iii) to compare the ANN and MLR results.

2. Materials and methods

2.1. Study area

The study area was the same as that in Shekofteh et al. (2017). To select the set of features that affect the S index, soil samples were collected from parts of the Rabor region (29° 27′ N to 38° 54′ N and 56° 45′ E to 57° 16′ E). Rabor, a city in the southeast of Kerman province, is a semiarid farming area with a cold climate. The mean annual temperature of Rabor is 15 °C and its annual precipitation is 250 mm. According to Soil Survey Staff (2014), the soils studied belonged mainly to the Typic Calcixerepts subgroup. The main textural classes in the area are loam and sandy loam.

moisture retention curve data. Soil texture, bulk density and moisture retention curve data were used as inputs to the RETC 6.0 software, and the parameters of the van Genuchten (1980) equation (n and θr) were estimated. The S index was then calculated as follows (Dexter, 2004b):

$$ S = -n\,(\theta_{sat} - \theta_r)\left[1 + \frac{1}{m}\right]^{-(1+m)} \tag{1} $$

If m = 1 − 1/n is applied:

$$ S = -n\,(\theta_{sat} - \theta_r)\left[\frac{2n-1}{n-1}\right]^{\left[\frac{1}{n} - 2\right]} \tag{2} $$

where n (an empirical parameter) and θr (residual water content, g g−1) are obtained from the RETC software, whereas θsat (saturated water content, g g−1) was measured. Since the S index is always negative, the modulus of the S index was used in this paper.

2.3. Selection of features affecting the S index

In this research, nine properties were considered as inputs: electrical conductivity (EC), sand, silt, clay, CCE, BD, particle density (PD), soil pH, and SOM; the S index was the output. A summary of the GA and ANN methods is given below.

2.3.1. Genetic algorithm

The GA works on the basis of the theory of evolution in biology: the fitter a population, the more likely it is to survive and reproduce. The main data set in the GA consists of N features, which are first divided into subsets d1, d2, d3, … with n1, n2, n3, … members. These subsets are then passed to the artificial neural network (ANN). In the ANN, the data are again divided into two parts, train and test, in a ratio of 70 to 30, respectively, and the errors are computed. The error (root mean square error, RMSE) is obtained as the weighted sum of the train and test errors:

$$ RMSE_{final} = 0.7\,RMSE_{train} + 0.3\,RMSE_{test} \tag{3} $$

The mean error over 50 runs of the ANN model is taken as the final error for each subset. Next, the GA computes the fitness of each subset from its error value, and the best subset is selected on that basis.
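As a minimal sketch of Eqs. (1) to (3), the S index can be computed from van Genuchten parameters, and the train and test errors can be combined into the single error value used to rank a feature subset. The function names and the sample parameter values are illustrative assumptions, not values or code from this study.

```python
def s_index_general(n: float, theta_sat: float, theta_r: float, m: float) -> float:
    """Eq. (1): slope of the water release curve at its inflection point.

    theta_sat and theta_r are gravimetric water contents (g g^-1).
    """
    return -n * (theta_sat - theta_r) * (1.0 + 1.0 / m) ** (-(1.0 + m))


def s_index(n: float, theta_sat: float, theta_r: float) -> float:
    """Eq. (2): Eq. (1) with the restriction m = 1 - 1/n (requires n > 1)."""
    if n <= 1.0:
        raise ValueError("n must be greater than 1 so that m = 1 - 1/n is positive")
    base = (2.0 * n - 1.0) / (n - 1.0)
    return -n * (theta_sat - theta_r) * base ** (1.0 / n - 2.0)


def combined_rmse(rmse_train: float, rmse_test: float,
                  w_train: float = 0.7, w_test: float = 0.3) -> float:
    """Eq. (3): weighted sum of the train and test RMSE values."""
    return w_train * rmse_train + w_test * rmse_test


# Hypothetical parameter values, chosen only to exercise the functions.
s = s_index(n=1.5, theta_sat=0.40, theta_r=0.05)
modulus_of_s = abs(s)  # the paper works with the modulus of S
```

With these made-up inputs the modulus of S is about 0.083, and both forms of the equation give the same value, which is a quick check that Eq. (2) follows from Eq. (1).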
[Fig. 2 panels: predicted S index plotted against observed S index. Train data: y = 0.913x + 0.0031, R² = 0.9519. Test data: y = 0.9313x + 0.0024, R² = 0.9251.]
Fig. 2. Comparison of the predicted and measured S index of the ANN model.
is because adequate amounts of clay particles in soils play a considerable role in creating a good structure. In addition, clay particles have a high water retention capacity owing to their particularly high surface area. Their strong adhesion and plasticity also promote good soil structure. In this way, they affect the shape of the moisture curve and hence the S index, which is derived from it.

BD, as a soil structural property, influences soil processes including water flow, solute transport and air flow. BD indirectly reflects the compaction status of the soil and the volume of soil pores. Changes in soil BD affect a variety of properties and processes, such as porosity, pore size distribution, water retention capacity, and air capacity. As such, the S index, itself a soil structural index, changes too. In an assessment of factors influencing soil physical degradation by Dexter et al. (2007), the S index decreased with increasing bulk density.

Sand was one of the most influential features on the S index. The soils under study had a high sand percentage. Despite their low specific surface area, low surface charge and inability to bind soil particles into a structure, sand particles have a significant impact on water flow in soil and on the shape of the retention curve owing to the macropores they create. They therefore influence the S index, which is obtained from the moisture curve.

The other influential feature on the S index was CCE. Given the high percentage of CCE in the study area (owing to the semiarid setting and the lack of leaching), this cementing agent promotes flocculation of soil particles, aggregate formation and improvement of the soil structure. Regarding the impact of CCE on soil structure, carbonates act as a source of Ca2+ and help flocculate clay particles.

The pH, EC, PD and silt properties were considered redundant features among the input parameters and were therefore not used in the databases for modeling the S index. Electrical conductivity (EC) is indicative of the concentration of soluble salts, which cause flocculation of clay particles by reducing the thickness of the diffuse double layer. Since the EC values measured for the studied soil samples were low and had a limited range (see Table 2), they were probably not sufficient to produce significant differences in the S index. Consequently, the GA-ANN algorithm did not recognize EC as a discriminating property. Likewise, pH was removed from the input parameters because of its limited variation in the study area (Table 2). As an explanatory property, pH contributes to the chemical interpretation of soil; to have a substantial influence on soil structure, however, it needs to vary over a wide range. Another redundant feature was silt content, owing to its close correlation with the other soil texture components. Soil particle density is a further redundant feature, since it can be calculated from SOM and soil mineral matter. One purpose of feature selection is to remove redundant and correlated parameters; because soil particle density was correlated with SOM and soil mineral matter, it was regarded as redundant.
[Fig. 3 panels: predicted S index plotted against observed S index. Train data: y = 0.7897x + 0.0066, R² = 0.0404. Test data: y = 1.904x − 0.0326, R² = 0.0165.]
Fig. 3. Comparison of the predicted and measured S index of the MLR model.
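The agreement statistics reported alongside Figs. 2 and 3 (R², RMSE and MAPE) can be computed as in the following minimal sketch. The exact formulas used in the study are not given in this text, so the common textbook definitions below are an assumption; in particular, R² is taken here as the squared Pearson correlation between observed and predicted values, which is what a trend line fitted to such a scatter plot reports. The sample numbers are invented for illustration.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (observed values must be non-zero)."""
    ratios = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


# Invented values in the range of the S index shown on the figure axes.
obs = [0.0350, 0.0352, 0.0355, 0.0358]
pred = [0.0351, 0.0352, 0.0354, 0.0357]
summary = {"R2": r_squared(obs, pred), "RMSE": rmse(obs, pred), "MAPE": mape(obs, pred)}
```

Because MAPE is relative to the observed value while RMSE is in the units of the S index, the two can rank models differently when the observed range is as narrow as it is here.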
[Fig. 4: bar chart of the sensitivity index (vertical axis) for sand, organic matter, bulk density, clay and CCE (horizontal axis).]
Fig. 4. The sensitivities of RMSE derived by ANN model to removals of soil physicochemical properties.
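The removal-based sensitivity summarized in Fig. 4 can be sketched as follows: score the model with all inputs, drop one input at a time, and record how much the RMSE changes. The definition of the sensitivity index used in the study is not given in this text, so the percentage change in RMSE used below is an assumption. The scoring function is also a stand-in (a leave-one-out nearest-neighbour predictor rather than an ANN), and the data and feature names are made up.

```python
import math
from typing import Callable, Dict, List, Sequence

Rows = List[Sequence[float]]


def loo_nn_rmse(rows: Rows, targets: Sequence[float]) -> float:
    """Stand-in scorer (NOT an ANN): leave-one-out 1-nearest-neighbour RMSE."""
    total = 0.0
    for i, row in enumerate(rows):
        others = (j for j in range(len(rows)) if j != i)
        nearest = min(others, key=lambda j: math.dist(row, rows[j]))
        total += (targets[nearest] - targets[i]) ** 2
    return math.sqrt(total / len(rows))


def removal_sensitivity(rows: Rows, targets: Sequence[float], names: Sequence[str],
                        score: Callable[[Rows, Sequence[float]], float] = loo_nn_rmse
                        ) -> Dict[str, float]:
    """Percentage change in RMSE when each feature is removed in turn (assumed index)."""
    base = score(rows, targets)
    if base == 0.0:
        raise ValueError("baseline RMSE is zero; relative change is undefined")
    result = {}
    for k, name in enumerate(names):
        reduced = [[v for j, v in enumerate(row) if j != k] for row in rows]
        result[name] = 100.0 * (score(reduced, targets) - base) / base
    return result


# Made-up data: the target depends only on feature_a; feature_b is unrelated.
rows = [(float(i), float((7 * i) % 5)) for i in range(10)]
targets = [2.0 * i for i in range(10)]
sens = removal_sensitivity(rows, targets, ["feature_a", "feature_b"])
```

In this toy case, removing `feature_a` raises the error sharply while removing `feature_b` does not, which mirrors how the most influential property is read off a chart such as Fig. 4. Any scorer with the same signature, including one that trains a neural network, could be passed as `score`.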
3.3. Modeling of S index after selecting proper features

As already mentioned, in this section the dataset was randomly divided into two sets: a training data set (70%) and a testing data set (30%). The MAPE and RMSE values between the proposed ANN model and the observed S index for the training data were 0.11% and 0.0046, respectively. The coefficient of determination (R²) between the observed S index and the S index predicted by the ANN for the training data was 0.95 (Fig. 2). The MAPE and RMSE values between the proposed ANN model and the observed values for the testing data were 0.14% and 0.0065, respectively. The R² between the observed S index and the S index predicted by the ANN for the testing data was 0.92 (Fig. 2).

The MAPE and RMSE values between the MLR model and the observed S index for the training data were 39.54% and 2.05, respectively. The R² between the MLR predictions and the observed S index for the training data was 0.04 (Fig. 3). Moreover, the MAPE and RMSE values between the MLR model and the observed S index for the testing data were 43.16% and 2.72, respectively. The R² between the measured S index and the S index predicted by the MLR model for the testing data was 0.016 (Fig. 3).

The results for the training data show that the ANN model captured the relationship between the input parameters and the S index more accurately than the MLR model. The values of the performance criteria for the testing data show that ANN is a more accurate approach than MLR for predicting the S index in the studied area.

Based on the evaluation indices, the conventional regression model appears to be very poor at predicting the S index. These results suggest that conventional regression techniques (i.e. multiple linear regression) may not be reliable for predicting the S index at the studied site.

Multiple linear regression can only identify linear relationships between dependent and independent variables. When the relationship between these variables is non-linear, the efficiency of such models decreases significantly.

Previous research has also shown the superiority of ANN over MLR (Ingleby and Crowe, 2001; Amini et al., 2005; Asadi and Bagheri, 2010).

3.4. Sensitivity analysis of the ANN model

Given that only the ANN's accuracy was acceptable in predicting the S index, the sensitivity analysis was carried out with the ANN. The RMSE sensitivities for the ANN are shown in Fig. 4. The figure indicates that BD, sand, SOM, clay and CCE affect the S index in that order, from the highest effect to the lowest.

According to the sensitivity analysis, the S index was most sensitive to bulk density (BD), indicating that BD is the most important structural index. Given that BD is easier to measure than the S index, it can be used as a substitute for the latter. More research can be done on the optimal limits of BD.

After BD, sand had the highest effect on the S index. Given the high percentage of sand in the soils of the studied area, this result seems reasonable. However, the S index is not just an index of soil structure, because the soil water characteristic curve can be affected by both soil texture and structure. In this regard, Dexter (2004a) stated that the effect of organic matter on the S index is greater when clay content is low. In the study area, clay content was low; thus, organic matter had a greater effect on the S index.

4. Conclusion

The results of this study indicate that the GA-ANN can provide an appropriate framework for selecting the features most influential on the S index. The GA-ANN can be applied to similar areas and problems. According to the results, the neural network predicts the S index more accurately than linear regression, and its accuracy in predicting the S index is acceptable.

References

Amini, M., Abbaspour, K.C., Khademi, H., Fathianpour, N., Afyuni, M., Schulin, R., 2005. Neural network models to predict cation exchange capacity in arid regions of Iran. Eur. J. Soil Sci. 56 (4), 551–559.
Asadi, H., Bagheri, F., 2010. Comparison of regression pedotransfer functions and artificial neural networks for soil aggregate stability simulation. World Appl. Sci. J. 8 (9), 1065–1072.
Blake, G.R., Hartge, K.H., 1986. Bulk density. In: Page, A.L. (Ed.), Methods of Soil Analysis, Part 1. American Society of Agronomy, Madison, Wisconsin.
Dexter, A., 2004a. Soil physical quality: part II. Friability, tillage, tilth and hard-setting. Geoderma 120 (3–4), 215–225.
Dexter, A.R., 2004b. Soil physical quality: part I. Theory, effects of soil texture, density, and organic matter, and effects on root growth. Geoderma 120 (3–4), 201–214.
Dexter, A., Czyż, E., Gaţe, O., 2007. A method for prediction of soil penetration resistance. Soil Tillage Res. 93 (2), 412–419.
Gee, G.W., Bauder, J.W., 1986. Particle size analysis. In: Klute, A. (Ed.), Methods of Soil Analysis: Part 1. American Society of Agronomy and Soil Science Society of America, Madison, WI, pp. 383–411.
Ingleby, H., Crowe, T., 2001. Neural network models for predicting organic matter content in Saskatchewan soils. Outlook 1 (2), 3.
Klute, A., 1986. Methods of Soil Analysis. Part 1. American Society of Agronomy, Inc. Soil Science Society of America, Madison, Wisconsin, USA.
Nelson, R.E., 1982. Carbonate and gypsum. In: Page, A.L. (Ed.), Methods of Soil Analysis: Part 1. Agronomy Handbook 9. American Society of Agronomy and Soil Science Society of America, Madison, WI, pp. 181–197.
Nelson, D., Sommers, L.E., 1982. Total carbon, organic carbon, and organic matter. In: Methods of Soil Analysis. Part 2. Chemical and Microbiological Properties, pp. 539–579.
Pal, M., Foody, G.M., 2010. Feature selection for classification of hyperspectral data by SVM. IEEE Trans. Geosci. Remote Sens. 48 (5), 2297–2307.
Rhoades, J.D., 1996. Salinity: electrical conductivity and total dissolved solids. In: Page, A.L. (Ed.), Methods of Soil Analysis: Part 2. Agronomy Handbook 9. American Society of Agronomy and Soil Science Society of America, Madison, WI, pp. 417–435.
Shekofteh, H., Ramazani, F., Shirani, H., 2017. Optimal feature selection for predicting soil CEC: comparing the hybrid of ant colony organization algorithm and adaptive network-based fuzzy system with multiple linear regression. Geoderma 298, 27–34.
Shirani, H., Habibi, M., Besalatpour, A., Esfandiarpour, I., 2015. Determining the features influencing physical quality of calcareous soils in a semiarid region of Iran using a hybrid PSO-DT algorithm. Geoderma 259, 1–11.
Soil Survey Staff, 2014. Keys to Soil Taxonomy, Twelfth ed. NRCS, USDA, USA.
Thomas, G.W., 1996. Soil pH and soil acidity. In: Page, A.L. (Ed.), Methods of Soil Analysis: Part 2. Agronomy Handbook 9. American Society of Agronomy and Soil Science Society of America, Madison, WI, pp. 475–490.
Van Genuchten, M.T., 1980. A closed-form equation for predicting the hydraulic conductivity of unsaturated soils. Soil Sci. Soc. Am. J. 44 (5), 892–898.
Vieira, S.M., Sousa, J.M., Runkler, T.A., 2010. Two cooperative ant colonies for feature selection using fuzzy models. Expert Syst. Appl. 37 (4), 2714–2723.