Abstract - The deteriorating quality of natural water resources such as lakes, streams and estuaries is one of the most serious issues facing humanity. The effects of unclean water are far-reaching and touch every aspect of life, so managing water resources to optimize their quality is crucial. Water contamination can be tackled efficiently if data is analyzed and water quality is predicted beforehand. This issue has been addressed in many previous studies; however, more work is needed on the reliability, accuracy and usability of current water quality management methodologies. The goal of this study is to develop a water quality prediction model using an Artificial Neural Network (ANN) by determining the dependency among different water quality parameters, in order to assist decision making. This research uses historical water quality data from the United States Geological Survey (USGS). For this study, the data includes 7 parameters that affect water quality. To evaluate the performance of the model, the measures used are Mean Squared Error (MSE), Root Mean Squared Error (RMSE) and Regression Analysis. Previous work on water quality prediction is also analyzed, and future improvements are proposed in this paper.

Keywords - Artificial Neural Networks, Environmental Modeling, Machine Learning

1. INTRODUCTION
Natural water resources such as groundwater and surface water have always been the cheapest and most widely available sources of fresh water. However, these resources are also the most likely to become contaminated by various factors, including human, industrial and commercial activities as well as natural processes. In addition, poor sanitation infrastructure and lack of awareness contribute immensely to drinking water contamination [1]. The effects of water quality deterioration are far-reaching and severely harm health, the environment and infrastructure. According to the United Nations (UN), waterborne diseases cause the death of more than 1.5 million people each year, far more than the deaths caused by accidents, crimes and terrorism combined [2]. It is therefore crucial to devise novel approaches and methodologies for analyzing water quality and forecasting future water quality trends.

To carry out useful and efficient water quality analysis and to predict water quality patterns, it is important to determine the interdependence among different water quality parameters. Various methodologies have been proposed and applied in the past for analyzing and monitoring water quality and the interdependence of its parameters. These methodologies range from statistical techniques and visual modeling to analysis algorithms, prediction algorithms and decision making. Multivariate statistical techniques such as Principal Component Analysis (PCA) have been used to determine relationships among different water quality parameters [3]. Geo-statistical techniques that have been used include kriging, transition probability, multivariate interpolation and regression analysis [4]. Algorithms for analysis and prediction include Artificial Intelligence (AI) techniques such as Bayesian Networks (BN), Artificial Neural Networks (ANN) [5], Neuro-Fuzzy Inference [3], Support Vector Regression (SVR) [6], Decision Support Systems (DSS) and Auto-Regressive Moving Average (ARMA) models [7]. However, the non-linear nature of water quality data, as is the case in this research, makes it very complex to map inputs to outputs and predict future water quality [8].

To carry out efficient water quality analysis and prediction, the dependence among different water quality parameters must be determined. The basic idea of this research is to devise a comprehensive methodology that predicts, analyzes and visualizes the water quality of particular regions with the help of selected water quality parameters. These parameters include physical, biological and chemical factors that influence water quality. Organizations such as the World Health Organization (WHO) and the Environmental Protection Agency (EPA) have set quality standards that serve as benchmarks for determining water quality. In its document “Parameters of Water Quality”, the EPA lists a total of 101 parameters that affect water quality in one way or another [9]. However, some parameters have a greater and more visible effect on water quality than others.
This paper addresses this issue by proposing a model based on Machine Learning techniques that predicts the future water quality trends of a particular area from current water quality data and determines relationships among different water quality parameters. An Artificial Neural Network (ANN) model is used to develop a comprehensive methodology for efficient water quality prediction and analysis. This includes a correlation analysis between water quality parameters that determines their dependency and relationships.
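The correlation analysis mentioned above can be illustrated with a short sketch. The text does not say which correlation measure is used; this example assumes the Pearson coefficient, computed for every pair of parameters. The readings in `samples` are hypothetical values chosen only for illustration (they are not USGS data), and the function names are likewise illustrative.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def correlation_table(data):
    """Return {(name_a, name_b): r} for every pair of parameters in `data`."""
    return {
        (a, b): pearson_r(data[a], data[b])
        for a, b in combinations(sorted(data), 2)
    }


# Hypothetical readings, for illustration only (not USGS data).
samples = {
    "salinity": [20.1, 22.4, 25.0, 27.8, 30.2],
    "specific_conductance": [39000, 41500, 44200, 47000, 49000],
    "temperature": [2.0, 5.5, 3.1, 8.0, 6.4],
}

for (a, b), r in correlation_table(samples).items():
    print(f"{a} vs {b}: r = {r:+.3f}")
```

Pairs with |r| close to 1 are strongly dependent and are natural candidates to use as inputs when predicting one another.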
Figure 2. Structure of Artificial Neural Network
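Figure 2 shows a feed-forward ANN. The tests described in this section use six input parameters, one target parameter and a log-sigmoid activation. The sketch below shows that structure in NumPy under several assumptions that are not stated in the study: a single hidden layer of 10 units (hypothetical), a linear output unit (assumed), and plain full-batch gradient descent on MSE, substituted here for the Scaled Conjugate Gradient (SCG) algorithm the study actually uses. The class name `FeedForwardNet`, the synthetic data and the learning rate are all illustrative.

```python
import numpy as np


def logsig(z):
    """Log-sigmoid (logistic) activation: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


class FeedForwardNet:
    """Feed-forward net: inputs -> log-sigmoid hidden layer -> linear output.

    The hidden-layer size and the linear output unit are assumptions made
    for this sketch; they are not taken from the study.
    """

    def __init__(self, n_inputs=6, n_hidden=10, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, size=(n_inputs, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, size=(n_hidden, 1))
        self.b2 = np.zeros(1)

    def forward(self, X):
        # Hidden activations are cached on the instance for the backward pass.
        self.H = logsig(X @ self.W1 + self.b1)
        return (self.H @ self.W2 + self.b2).ravel()

    def train_step(self, X, y, lr=0.05):
        """One full-batch gradient-descent step on MSE; returns the pre-step MSE."""
        err = self.forward(X) - y
        n = len(y)
        d_out = (2.0 / n) * err[:, None]          # dMSE/d(output), shape (n, 1)
        grad_W2 = self.H.T @ d_out
        grad_b2 = d_out.sum(axis=0)
        # Chain rule through logsig, using logsig'(z) = s * (1 - s).
        d_hidden = (d_out @ self.W2.T) * self.H * (1.0 - self.H)
        grad_W1 = X.T @ d_hidden
        grad_b1 = d_hidden.sum(axis=0)
        self.W1 -= lr * grad_W1
        self.b1 -= lr * grad_b1
        self.W2 -= lr * grad_W2
        self.b2 -= lr * grad_b2
        return float(np.mean(err ** 2))


# Synthetic stand-in for scaled data: six input parameters, one target.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(200, 6))
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] ** 2 + 0.2 * np.sin(3.0 * X[:, 2])

net = FeedForwardNet()
history = [net.train_step(X, y) for _ in range(1000)]
print(f"MSE at first step: {history[0]:.4f}, at last step: {history[-1]:.4f}")
```

A conjugate-gradient optimizer such as SCG would replace only the fixed-step update inside `train_step`; the forward pass and the gradient computation stay the same.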
concentration, Turbidity, Dissolved Oxygen concentration and Specific Conductance.

Four ANN models were created for this test, for the parameters Chlorophyll, Turbidity, Specific Conductance and Dissolved Oxygen. In these tests there are 6 input units, with samples ranging from January to March 2014, and the seventh quality parameter is the target. A feed-forward neural network was used with the Scaled Conjugate Gradient (SCG) training algorithm and the log-sigmoid activation function. After running the test, the performance measures Regression (R), Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) were calculated. The performance of the four models is shown graphically through MSE and Regression analysis (Figures 3, 4, 5 and 6). The values of the performance measures of the four ANN models for the training and testing processes are shown in Table 2.

The Regression Analysis graphs show how well the data fits the function, for both training and testing. The closer the Regression value is to 1, the better the function fits, indicating higher accuracy. The MSE graphs show the number of epochs (iterations) the function takes to converge, together with the corresponding MSE for training, testing and validation. Figure 3(a) shows that most of the data for Chlorophyll prediction falls in the range 0 to 0.5, although there are many outliers. Here, the Regression value for both training and testing is about 0.6, so the training and testing data do not fit well. The MSE graph (Figure 3(b)) shows that training takes 65 epochs, with the best performance at epoch 59. The MSE curves for training, testing and validation almost overlap, with MSE values around 10⁻², and the MSE decreases only slightly as the iterations increase. In contrast, the Regression Analysis for Conductance (Figure 4(a)) shows that the data fits the function almost entirely, with a negligible number of outliers, so the Regression value is approximately 1 (0.998). This performance is also visible in the MSE graph (Figure 4(b)), where the function converges even more quickly (57 epochs, with the best performance at epoch 51). The curves almost overlap, with MSE in the range of 10⁻⁴, and the MSE decreases drastically at this point. Similarly, the Regression Analysis for Dissolved Oxygen (DO) (Figure 5(a)) shows the data fitting the function well (R = 0.994), with only a few outliers visible. The MSE graph for DO (Figure 5(b)) shows the training and validation errors almost completely overlapping, so there is little sign of over-fitting. The Regression Analysis for Turbidity (Figure 6(a)) shows relatively under-fitted data points, and hence the Regression value is about 0.5. The MSE graph (Figure 6(b)) shows that training runs for 189 epochs, so the function takes considerably longer to converge. The performance measures and this analysis can be verified in Table 2.

| Parameter | Minimum Value | Maximum Value | Mid-Range Value |
|---|---|---|---|
| Temperature (°C) | -1.0 | 28.4 | 14.7 |
| Specific Conductance (µS/cm) | 38900 | 49100 | 44000 |
| Salinity (PSU) | 19.9 | 31.7 | 25.8 |
| Nitrate (mg/L) | 0.01 | 1.0 | 0.505 |
| Dissolved Oxygen (mg/L) | 3.6 | 18.0 | 10.8 |
| Turbidity (FNU) | <0.1 | 120 | -- |
| Chlorophyll (µg/L) | 0.7 | 140 | 70.35 |

Table 1: Characteristics of Water Quality Data for 2014
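The Mid-Range column of Table 1 is the average of each parameter's minimum and maximum, (min + max) / 2. The short check below recomputes that column from the Table 1 values only; Turbidity is skipped because its minimum is given only as "<0.1" and no mid-range is listed. The names `table1`, `mid_range` and `mismatches` are illustrative.

```python
import math

# Table 1 values: (minimum, maximum, listed mid-range).
table1 = {
    "Temperature (°C)": (-1.0, 28.4, 14.7),
    "Specific Conductance (µS/cm)": (38900, 49100, 44000),
    "Salinity (PSU)": (19.9, 31.7, 25.8),
    "Nitrate (mg/L)": (0.01, 1.0, 0.505),
    "Dissolved Oxygen (mg/L)": (3.6, 18.0, 10.8),
    "Chlorophyll (µg/L)": (0.7, 140, 70.35),
}


def mid_range(lo, hi):
    """Mid-range of a sample: the average of its minimum and maximum."""
    return (lo + hi) / 2


def mismatches(table, tol=1e-6):
    """Names of rows whose listed mid-range differs from (min + max) / 2."""
    return [
        name
        for name, (lo, hi, listed) in table.items()
        if not math.isclose(mid_range(lo, hi), listed, abs_tol=tol)
    ]


for name, (lo, hi, listed) in table1.items():
    print(f"{name}: (min + max) / 2 = {mid_range(lo, hi):g}, listed {listed:g}")
print("rows that differ:", mismatches(table1))
```

The check flags only the temperature row: (-1.0 + 28.4) / 2 = 13.7, while Table 1 lists 14.7. The listed value would correspond to a minimum of +1.0 °C, so either the minimum or the mid-range entry for temperature may contain a transcription error; the table alone does not show which.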
Figure 3(a) Regression Analysis for Chlorophyll
Figure 3(b) Mean Squared Error for Chlorophyll
Figure 4(a) Regression Analysis for Conductance
Figure 4(b) Mean Squared Error for Conductance
Figure 5(a) Regression Analysis for Dissolved Oxygen
Figure 5(b) Mean Squared Error for Dissolved Oxygen
Figure 6(a) Regression Analysis for Turbidity
Figure 6(b) Mean Squared Error for Turbidity
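The MSE figures and the accompanying text report a total epoch count and a best epoch for each model. The stopping rule is not stated; however, the reported pairs for Chlorophyll (65 and 59) and Conductance (57 and 51) are each six epochs apart, which is consistent with, though not proof of, validation-based early stopping with a patience of six epochs. A minimal sketch of such a rule follows; `patience=6` is an assumption, the function name is illustrative, and the validation curves are synthetic.

```python
def early_stopping_epoch(val_errors, patience=6):
    """Return (stop_epoch, best_epoch), both 1-based, for a validation curve.

    Training stops once the validation error has failed to improve on the
    best value seen so far for `patience` consecutive epochs.
    """
    best_err = float("inf")
    best_epoch = 0
    fails = 0
    for epoch, err in enumerate(val_errors, start=1):
        if err < best_err:
            best_err, best_epoch, fails = err, epoch, 0
        else:
            fails += 1
            if fails >= patience:
                return epoch, best_epoch
    # Curve ended before the patience ran out.
    return len(val_errors), best_epoch


# Synthetic validation-error curve (illustrative, not the study's data):
# it improves until epoch 59, then gets worse.
curve = [1.0 / e for e in range(1, 60)] + [0.02 + 0.001 * k for k in range(1, 20)]
print(early_stopping_epoch(curve))  # -> (65, 59)
```

Under this rule the reported "best performance" epoch is the one whose weights would be kept, and the total epoch count is simply where the patience ran out.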
| Parameter | Unit | Model | No. of Inputs | Training R | Training MSE | Training RMSE | Testing R | Testing MSE | Testing RMSE |
|---|---|---|---|---|---|---|---|---|---|
| Chlorophyll | µg/L | ANN | 6 | 0.665 | 0.0051 | 0.071 | 0.611 | 0.00504 | 0.070 |
| Specific Conductance | µS/cm | ANN | 6 | 0.998 | 0.00014 | 0.012 | 0.996 | 0.000141 | 0.011 |
| Dissolved Oxygen | mg/L | ANN | 6 | 0.994 | 0.00092 | 0.030 | 0.992 | 0.00119 | 0.034 |
| Turbidity | FNU | ANN | 6 | 0.534 | 0.00040 | 0.02 | 0.552 | 0.00025 | 0.015 |

Table 2: Performance measures with ANN
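The three performance measures in Table 2 can be computed as sketched below. MSE and RMSE follow their standard definitions. The paper does not give a formula for R, so the sketch assumes it is the Pearson correlation between the targets and the network outputs, the usual meaning of R in regression plots. The `table2` dictionary copies the reported (MSE, RMSE) pairs so that RMSE = sqrt(MSE) can be checked row by row; all function names are illustrative.

```python
import math


def mse(y_true, y_pred):
    """Mean Squared Error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def regression_r(y_true, y_pred):
    """Pearson correlation between targets and outputs (assumed meaning of R)."""
    n = len(y_true)
    mt = sum(y_true) / n
    mp = sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (st * sp)


# Reported (MSE, RMSE) pairs from Table 2: training first, then testing.
table2 = {
    "Chlorophyll": [(0.0051, 0.071), (0.00504, 0.070)],
    "Specific Conductance": [(0.00014, 0.012), (0.000141, 0.011)],
    "Dissolved Oxygen": [(0.00092, 0.030), (0.00119, 0.034)],
    "Turbidity": [(0.00040, 0.02), (0.00025, 0.015)],
}

for name, pairs in table2.items():
    for split, (m, r) in zip(("training", "testing"), pairs):
        print(f"{name} ({split}): sqrt(MSE) = {math.sqrt(m):.4f}, reported RMSE = {r}")
```

Every reported RMSE agrees with sqrt(MSE) to within 0.001; some entries appear to be truncated rather than rounded (for example, sqrt(0.00504) ≈ 0.0710 is reported as 0.070).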