Abstract—Water is critical to a country's development: it is used not only in agriculture but also in industry. Like many poor nations, Tanzania faces serious problems in supplying its people with water. One third of the country is arid to semi-arid, so people who do not live near one of the three major lakes that border the country find it very difficult to obtain clean, sanitary water. As a result, groundwater is the major source of water for the nation's people; however, it is not always clean. Many groundwater wells are located near or next to toxic drainage systems, which leak into the fresh groundwater and contaminate it. Consequently, Tanzanians turn to surface water, which contains contaminants such as bacteria and human waste, and people have no choice but to drink from, bathe in, or wash their clothes in it. According to the Tanzania National Website, water-related illnesses such as malaria and cholera "account for over half of the diseases affecting the population," because people do not have access to sanitary options. Under these circumstances people, particularly women and girls, spend a significant amount of time traveling to collect water. We use a dataset of water pumps in Tanzania to predict the operating condition of a water point. By finding which water pumps are functional, functional but in need of repair, and non-functional, the Tanzanian Ministry of Water can improve its maintenance operations and make sure that clean, potable water is available to communities across Tanzania. While we were not able to identify all the pumps that need repair, our confidence in the ones we did identify is high, and we expect this to aid the maintenance process.

Index Terms—Machine Learning Algorithm, Deep Neural Networks, Random Forest, AdaBoost, XGBoost, Linear SVM.

I. INTRODUCTION
There is no doubt about the importance of water to human existence. People need clean water to survive and stay healthy. Lack of clean water contributes to the high mortality rates in children around the world. Tanzania has been blessed, both on the surface and below ground, with three times more renewable water resources than Kenya and 37 per cent more than Uganda. Despite the vast amounts of fresh water available, many Tanzanians still face water shortages because of insufficient capacity to access and store it, in both rural and urban areas.

The following statistics illustrate the magnitude of the problem:
- Access to water from a piped source all but stagnated over the past two decades. In 1991/92, 33.5 per cent of the population had such access; this figure was 33.1 per cent in 2010. Despite this, Tanzania is doing better than Uganda (15.3 per cent in 2006), is on par with Kenya (34.3 per cent in 2008-09), but is far behind Senegal (68.7 per cent in 2010);
- Urban areas witnessed a sharp deterioration in access to water, from 77.8 per cent to 58.6 per cent. Rural areas, on the other hand, experienced a slight improvement, from 19.2 per cent to 24.1 per cent, during the same period;
- A large majority of rural households (more than 70 per cent) were more than 15 minutes away from their main water source in 2010;
- Only 3 per cent of the total cultivated area in Tanzania was under irrigation in 2010.

Taarifa, with the help of the Tanzanian Ministry of Water, is looking into this problem. Taarifa is an online platform for crowd-sourced reporting and triaging of infrastructure-related issues. It allows citizens to engage with the government and register issues with the local infrastructure of their area. This project aims to analyze the data collected by Taarifa and the Tanzanian Ministry of Water.

Pump it Up: Data Mining the Water Table is a competition hosted by DrivenData for the betterment of Tanzania. We want to help the Tanzanian Ministry of Water identify the water pumps that are functional but need repair, so that immediate action can be taken to keep them running in a healthy state. If these water pumps are fixed early, the people of Tanzania could have improved and continuous access to running water.

We are using the data from Taarifa and the Tanzanian Ministry of Water to predict which water pumps are functional, functional but in need of repair, or non-functional.
III. METHODOLOGY
A. Preliminary Data Analysis
The first task for the project was to explore the dataset, look for nonlinear relationships between the features and the labels, and exclude features that did not affect the labels. A major problem was that features such as gps_height, population, latitude, and longitude had many missing data points. We filled each missing value with the mean or median (as appropriate) of that feature within the district in which the water point lies.
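The district-level imputation described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the helper name impute_by_group, the "district" key, the use of None to mark a missing value, and the toy records are all assumptions made for the example. Which features received the mean and which the median is not stated in the text, so the choice below is arbitrary.

```python
from collections import defaultdict
from statistics import mean, median


def impute_by_group(rows, group_key, column, strategy=median):
    """Fill missing (None) values in `column` with a per-group statistic.

    Falls back to the statistic over the whole column when a group has no
    observed values at all.
    """
    # Collect the observed (non-missing) values of `column` for each group.
    observed = defaultdict(list)
    for row in rows:
        if row[column] is not None:
            observed[row[group_key]].append(row[column])

    all_values = [v for values in observed.values() for v in values]
    if not all_values:
        raise ValueError(f"no observed values for {column!r}")
    fallback = strategy(all_values)

    # Return new records so the input list is left unchanged.
    filled = []
    for row in rows:
        new_row = dict(row)
        if new_row[column] is None:
            group_values = observed.get(row[group_key])
            new_row[column] = strategy(group_values) if group_values else fallback
        filled.append(new_row)
    return filled


# Toy records; the district names and values are made up for illustration.
pumps = [
    {"district": "A", "gps_height": 1200, "population": 100},
    {"district": "A", "gps_height": None, "population": 300},
    {"district": "A", "gps_height": 1400, "population": None},
    {"district": "B", "gps_height": 30, "population": None},
    {"district": "B", "gps_height": None, "population": 50},
]

# Arbitrary choice for the sketch: mean for gps_height, median for population.
pumps = impute_by_group(pumps, "district", "gps_height", strategy=mean)
pumps = impute_by_group(pumps, "district", "population", strategy=median)
```

Filling within a district rather than across the whole country keeps an imputed altitude or population close to that of neighbouring water points, which is presumably why the grouping was used.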
C. Model selection and training
For further processing we dropped the unnecessary features to simplify training, reducing the feature set to 18 features. We then had to choose a model to train on the dataset. We tried five different models and tuned each of them to maximize the score.
We first tried a linear SVM, which gave a score of around 0.7066.
For tuned AdaBoost, we got a score of around 0.724 on DrivenData.
We got the best result from the XGBoost classifier, which gave a score of 0.8114.
Flow Chart: [figure not reproduced]
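The text does not define the score it reports. A minimal sketch, assuming the score is the classification rate (the fraction of pumps whose predicted status matches the true status), is given below; the function name classification_rate and the toy labels are hypothetical, and the dictionary simply restates the three scores quoted in this section.

```python
def classification_rate(y_true, y_pred):
    """Fraction of pumps whose predicted status equals the true status."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty label list")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy labels using the three status classes from the paper.
y_true = ["functional", "non functional", "functional needs repair", "functional"]
y_pred = ["functional", "non functional", "functional", "functional"]
toy_score = classification_rate(y_true, y_pred)  # 3 of the 4 labels match

# Scores quoted in this section, keyed by model name.
reported_scores = {
    "Linear SVM": 0.7066,
    "AdaBoost (tuned)": 0.724,
    "XGBoost": 0.8114,
}
best_model = max(reported_scores, key=reported_scores.get)
```

Under this reading, a score of 0.8114 would mean that roughly four out of five pumps in the scored set received the correct status.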
V. CONCLUSION
The people of Tanzania suffer greatly from the water crisis and the lack of access to water resources. The objective of the project was to predict the status of a water pump, out of three possible values (functional, functional needs repair, non functional), from a variety of qualitative and quantitative water pump attributes and geographical values. Data on about 59,400 water pumps was used for training and validating five distinct types of classification models (linear SVM, random forest, DNN, bootstrap AdaBoost, and XGBoost). The output of these models is shown on an interactive geomap, on which the user can view the predicted statuses of more than 14,850 additional pumps whose functional status was not available as part of the original data set.
REFERENCES