Praveen Apc Report 2018

INDIAN INSTITUTE OF TECHNOLOGY MANDI
MANDI- 175 001 (H.P.), INDIA

www.iitmandi.ac.in
PROGRESS REPORT FOR THE ACADEMIC YEAR 2018
Scholar’s Name: Praveen Kumar Roll No: S17007

School: School of Computing and Electrical Engineering Date of Registration:01/02/2018
Date of Last Presentation: Date of Current Presentation: 25/02/2019
1. Research Objective
• To develop a MEMS- and IoT-based, low-cost and ultra-low-power Landslide
Monitoring and Early Warning System.
• To predict landslides and soil movement with machine-learning algorithms
trained on real-world landslide data.
2. Introduction
Landslides caused by the movement of soil mass are a major problem in India, especially
in Himachal Pradesh and Uttarakhand [1]. Landslides are natural hazards that often
occur without warning and cause massive damage to life and property across the
world. With about 11,000 deaths in the last 12 years, India tops the world in landslide
deaths. According to the Geological Survey of India (GSI), 12 landslides were reported
in India in 2017, and the GSI listed 23 events up to August 2018 [2].
One way to overcome the landslide problem is to use early warning systems against
landslides. Existing commercially available early warning systems use sensors like
vibrating wire piezometers and in-place inclinometers (IPI). These systems are installed
to determine the magnitude, rate, direction and types of landslides [3]. But these sensors
are very costly, and because of their cost it is difficult to install many sensors for
monitoring landslides. One solution to this problem is a low-cost landslide monitoring
and early warning system, which works on the same principle as that of a conventional
system but has very low cost and low power consumption.
Furthermore, machine-learning algorithms can help in the prediction of landslides.
The focus of machine learning in landslide mitigation is the timely prediction of soil
movements so that lives can be saved. Several researchers have applied machine
learning to landslides. For example, Hao et al. [4] decomposed landslide displacement
into cycle terms and trend terms and used the periodicity characteristics of the time
series to analyse the cycle terms. Building on the work of Hao, Dujuan et al. [5] used a
Back Propagation (BP) neural network to predict displacement. Qiang and Duan [6]
proposed a time-series analysis capable of forecasting the development trend of
complex systems and established ARIMA and CAR models for dynamic forecasts of
landslide displacement. Yesilnacar et al. [7] combined logistic regression and neural
networks to overcome the shortcomings of statistical methods, which could not
effectively model complex geological disasters. Xiangenjun [8] used rough sets to
extract the inherent patterns of slope-disaster activity from historical slope data.
Daifuchu [9], focusing on natural landslide spatial prediction in Hong Kong, adopted
two-class and single-class support vector machines for spatial prediction of landslide
hazard and compared them with logistic regression models [10].
Machine-learning algorithms can use the data collected by monitoring systems to let
researchers predict significant debris flows in advance. A large class of algorithms can
be used for such predictions; time-series forecasting algorithms are particularly
important.
Here, machine-learning algorithms like SMO [11] and autoregression [12], ensemble
algorithms (random forest [13], bagging [14], stacking [15], and voting [16]), and SARIMA
[17] could be used for time-series forecasting to predict movement one week ahead
given the soil displacement of the previous weeks. SMO optimizes the training of
support vector machines. Autoregression is mostly used for prediction, relating a
variable to its own earlier values. Random forest aggregates the outputs of a collection
of random trees. Bagging draws subsamples from the dataset with replacement and
trains the predictive model on those subsamples. Stacking combines multiple models of
different types, and voting combines classifiers that use distinct pattern
representations. In this research, we use these machine-learning algorithms for
time-series forecasting of debris flow on real-world data.
3. Work Done and Target Set for Last Year
A landslide monitoring and warning system (LMWS) was engineered with real-time
reporting for an active landslide site.
System Architecture
Figures 1A and 1B show the deployment and system architecture of the LMWS. As shown in
Figure 1A, the solar-powered LMWS is deployed on hills prone to landslides, and the data
from this system is used to trigger alerts on mobile phones and the web. As shown in
Figure 1B, the LMWS consists of a sensing unit, a data logging and thresholding unit, and
an alert generation unit. These units work together to sense movement and weather data
from a landslide site and log this data at a remote site on the cloud. Thresholding is then
used to generate phone and web alerts from the system.
Figure 1. (A) The deployment architecture of the landslide monitoring and warning
system (LMWS). (B) The system architecture of the LMWS.
Figure 2. Cloud data display
Figure 2 shows the data from the LMWS captured in a cloud database at a remote site.
The data is logged every 10 minutes and contains the weather parameters
(temperature, pressure, relative humidity, light intensity, and rainfall) and the soil
parameters (soil movement, moisture in the soil by volume, and the force exerted by
the soil at the point of deployment).
Sensors
The LMWS contains several sensors for measuring weather and soil parameters. These
sensors include the following:
1. Motion Processing Unit: The MPU sensor combines a MEMS-based accelerometer
and gyroscope in a single chip. The accelerometer measures the acceleration of
debris flow in the X, Y, and Z directions, and the gyroscope data is used to find the
total rotation of the soil movement about the X, Y, and Z directions. The sensor
can measure acceleration up to 16 g and rotation rates up to 2000 degrees per
second.
2. Soil Moisture: The soil moisture sensor works on the capacitive principle. When
moisture enters the gap between its two plates, the dielectric property changes
and the capacitance of the plates increases. For most types of slope failure, soil
moisture plays a critical role because increased pore water pressure reduces the
soil strength and increases stress [18]. With this sensor we can measure the
amount of moisture in the soil between 0 and 100 percent.
3. Force Sensor: Water pressure reduces the shearing force between soil particles.
The zone of soil below the water table is fully saturated, and the pressure in its
pores is higher than atmospheric pressure; hence, it is called positive pore
pressure. The force sensor is a variable resistor. When no force is applied to the
surface of the sensor, its resistance is very high. When force is applied to the
sensor's surface, the resistance decreases. We form a voltage divider from the
force sensor and a 10 kΩ resistor and apply a fixed voltage across the divider.
When force is applied to the surface of the sensor, the voltage across the sensor
drops accordingly. We calibrate that voltage to force in newtons. When this
sensor is buried in the soil, the pore water pressure in the wet soil exerts
additional force on the sensor, which is read by the microcontroller.
4. DHT-22: This sensor measures temperature in degrees Celsius and relative
humidity in the air. A humidity sensor (or hygrometer) senses, measures, and
reports both moisture and air temperature. Relative humidity is the ratio of the
moisture in the air to the highest amount of moisture the air can hold at a
particular temperature.
5. BMP-180: This sensor measures barometric pressure in millibars (mb).
Barometric pressure (also known as atmospheric pressure) is the pressure
caused by the weight of air pressing down on the Earth. Imagine a column of air
rising from the Earth's surface to the top of the atmosphere. The air in the
atmosphere has mass, so gravity causes the weight of that column to exert
pressure on the surface. Decreasing barometric pressure brings cloudy and
windy weather with a chance of precipitation.
6. BH1750 Sensor: This light sensor measures sunlight intensity in lux. After a
rainfall event, if sunlight is low, moisture will not evaporate from the soil, and
the soil stays wet for a long time, which can trigger a landslide. Thus, measuring
the light intensity is important.
7. Rain Gauge: The tipping-bucket rain gauge collects rain in a collector. A seesaw
below the collector holds up to 2.25 ml of water before tipping. Each time the
seesaw tips, a magnetic switch in the sensor registers the tip. Thus, by counting
the tips one can compute the volume of rainwater as well as the amount of rain
in mm.
8. LoRa: The LoRa (Long Range) radio module operates at a radio frequency of
433 MHz. It can transmit data up to 10 kilometres, but in hilly regions it is able to
transmit data only up to 1 kilometre. This module is used to signal a wireless
blinker or hooter.
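The two sensor conversions described in the list above (force-sensor voltage divider and
tipping-bucket rain gauge) can be sketched as follows. This is only an illustrative sketch:
the 5 V supply and the 50 cm² collector area are hypothetical values that are not given
in this report, and the function names are ours.

```python
R_FIXED_OHM = 10_000.0  # fixed divider resistor (10 kOhm, from the text)
ML_PER_TIP = 2.25       # water volume per seesaw tip (from the text)


def fsr_resistance(v_sensor, v_supply=5.0, r_fixed=R_FIXED_OHM):
    """Force-sensor resistance from the voltage measured across the sensor.

    Divider relation: v_sensor = v_supply * R_fsr / (R_fsr + r_fixed).
    v_supply = 5.0 V is an assumed value, not taken from the report.
    """
    if not 0.0 < v_sensor < v_supply:
        raise ValueError("sensor voltage must lie strictly between 0 and the supply")
    return r_fixed * v_sensor / (v_supply - v_sensor)


def rainfall_mm(tips, ml_per_tip=ML_PER_TIP, collector_area_cm2=50.0):
    """Rain depth in mm from the number of tips.

    volume (cm^3) / area (cm^2) gives depth in cm; multiply by 10 for mm.
    collector_area_cm2 = 50.0 is an assumed value, not taken from the report.
    """
    volume_cm3 = tips * ml_per_tip  # 1 ml equals 1 cm^3
    return 10.0 * volume_cm3 / collector_area_cm2
```

A lower measured voltage corresponds to a lower sensor resistance, that is, to a larger
applied force; the mapping from resistance to newtons still requires the calibration
mentioned above.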
Table 1 represents the comparison between conventional sensors and low-cost sensors.
The conventional sensors come with high accuracy and sensitivity, which increase their
costs to very high value compared to low-cost sensors. The low-cost sensors provide a
reasonable compromise between accuracy, sensitivity, and cost.
Table 1. Conventional sensors vs. low-cost sensors (FS: full scale).

Inclinometer
Parameter | Conventional sensor | Low-cost sensor
Sensor | MEMS type, uni-axial/bi-axial | MEMS type, tri-axis
Range | ±3/5/10&15º | 0 - 35º
Accuracy | ±0.05% FS | ±3.0%
Resolution | 0.008% FS; repeatability ±0.01% FS | Sensitivity changes ±0.02% FS
Source | http://www.aimil.com/RESOURCES/RESOURCEFILE/542_SMI.pdf | https://store.invensense.com/datasheets/invensense/MPU-6050_DataSheet_V3%204.pdf

Barometric Pressure Sensor
Parameter | Conventional sensor | Low-cost sensor
Pressure range | 500 to 1200 mbar | 300-1100 mbar
Operating temperature | -40 to +85 ºC | -40 to +85 ºC
Accuracy | ±0.4 mbar | Relative accuracy ±0.12 mbar
Current | 0.6 µA (standby ≤ 0.1 µA at 25 ºC) | Standby current: 0-4 µA at 25 ºC
Resolution | 0.00111% FS | 0.01 mbar
Source | https://aerospace.honeywell.com/en/~/media/aerospace/files/user-manual/hpbhpa-usermanual.pdf | https://cdn-shop.adafruit.com/datasheets/BST-BMP180-DS000-09.pdf

Light Sensor
Parameter | Conventional sensor | Low-cost sensor
Power dissipation | 100 mW | 260 mW
Voltage | Collector-emitter voltage 6 V; emitter-collector voltage 1.5 V | Operating voltage 2.4 - 3.6 V
Spectral range | Range of spectral bandwidth λ0.5: 440 to 800 nm | Matching the human eye (400 – 700 nm)
Wavelength of peak sensitivity | 570 nm | 560 nm
Source | http://www.mouser.com/ds/2/427/tept5700-247497.pdf | https://www.mouser.com/ds/2/348/bh1750fvi-e-186247.pdf

Temperature/Humidity Sensors
Parameter | Conventional sensor | Low-cost sensor
Power supply | 2 x AAA batteries (~3 V); 2.3 V to 5.5 V (for temperature sensor only) | 3.3-6 V DC
Operating range | Humidity 0-100% RH; temperature -40 to 125 ºC | Humidity 0-100% RH; temperature -40 to 80 ºC
Accuracy | Humidity ±3%; temperature 1% | Humidity ±2% RH (max ±5% RH); temperature < ±0.5 ºC
Source | http://wisense.in/wp-content/uploads/2018/10/WiSense-Ambient-Humidity-and-Pressure-Sensor.pdf | https://www.sparkfun.com/datasheets/Sensors/Temperature/DHT22.pdf

Soil Pressure Sensor
Parameter | Conventional sensor | Low-cost sensor
Range | 0.5 – 15.0 MPa | 0.1 – 77.4 MPa
Repeatability | Not available | ±2.0%
Accuracy | ±0.5% FS | ±10.0%
Operating range | -10 ºC to +70 ºC | -30 ºC to +70 ºC
Sensitivity | 0.5 MPa | 0.15 MPa
Hysteresis | Not available | +10%
Source | http://www.aimil.com/RESOURCES/RESOURCEFILE/542_SMI.pdf | http://www.trossenrobotics.com/productdocs/2010-10-26-datasheet-fsr402-layout2.pdf

Soil Moisture Sensor
Parameter | Conventional sensor | Low-cost sensor
Range | 0 – 100% | 0 – 100%
Accuracy | ±0.20% | ±5.0%
Sensitivity | 0.001% | 0.15%
Life expectancy | 10-20 years | 2 – 5 years
Environmental conditions | Not affected by ambient environmental temperature conditions | Affected by ambient environmental temperature conditions
Operating voltage | 220 VAC | 3.3 – 5.5 VDC
Source | http://commercialequipments.in/wp-content/uploads/2016/06/Soil.pdf | https://scholar.google.co.in/scholar?hl=en&as_sdt=0%2C5&q=dfrobot+soil+capacitive+moisture+sensor+SEN0193&btnG=
Working
The LEWS works on a master-slave configuration. The working of the system is
described in the following flow chart.
Figure 3. Flow chart for working of LMEWS
As shown in Figure 3, the master initialises all sensors (accelerometer, soil moisture,
DHT, BMP, force, and light intensity) and sets the timer for 10 minutes. Then, the master
sleeps in SLEEP_MODE_PWR_DOWN for 10 minutes to save power. In sleep the master
consumes 5 milliamps, and in operational mode it consumes 200 milliamps. After
wake-up, the master checks whether the 10 minutes are over. If they are, the master
first resets the slave system. After resetting the slave, the master records the values of
the accelerometer and the other weather sensors and sends these values to the cloud
via a GSM module. If the master is interrupted during sleep, it wakes up and handles
the interrupt. The system has two types of interrupts: a rain interrupt and an
accelerometer (movement) interrupt. When an accelerometer interrupt is triggered,
the master calculates the acceleration values in the X, Y, and Z directions. It also
calculates the total rotation in radians per second. If this total rotation breaches the
predefined threshold, the master sends an alert to the slave system, and the
accelerometer data is sent to the cloud.
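The decision logic of the master after each wake-up can be sketched as below. This is a
simplified illustration and not the deployed firmware: the action names, the threshold
value, the handling of the rain interrupt, and the definition of total rotation as the
Euclidean norm of the three gyroscope rates are all assumptions made for this sketch.

```python
import math

UPLOAD_PERIOD_MIN = 10  # master timer period (from the text)


def total_rotation(gx, gy, gz):
    """Assumed definition: Euclidean norm of the three gyroscope rates (rad/s)."""
    return math.sqrt(gx * gx + gy * gy + gz * gz)


def master_actions(event, minutes_elapsed=0, gyro=(0.0, 0.0, 0.0), threshold=1.0):
    """Return the ordered (hypothetical) actions the master takes after waking up."""
    if event == "timer":
        if minutes_elapsed >= UPLOAD_PERIOD_MIN:
            # Periodic path: reset the slave, read all sensors, upload via GSM.
            return ["reset_slave", "read_sensors", "send_to_cloud"]
        return ["sleep"]
    if event == "motion":
        # Movement interrupt: alert only when the rotation breaches the threshold.
        if total_rotation(*gyro) > threshold:
            return ["alert_slave", "send_accelerometer_to_cloud"]
        return ["sleep"]
    if event == "rain":
        # Assumed handling: count the rain-gauge tip and go back to sleep.
        return ["count_rain_tip", "sleep"]
    raise ValueError(f"unknown event: {event}")
```

For example, `master_actions("motion", gyro=(3.0, 4.0, 0.0), threshold=4.0)` takes the
alert path because the assumed total rotation is 5.0 rad/s.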
The slave system initialises the LoRa module and sets its timer for 11 minutes. If for
some reason the master system hangs, the slave system can reset the master when the
11-minute timer expires. The slave system receives the accelerometer interrupt from the
master system and transmits the alert message to the traffic light and the siren that are
connected wirelessly to the LEWS.
The cloud system records the data sent by the LEWS every 10 minutes. When data
arrives in the cloud, a Z-score is calculated based upon the previously stored values.
If the Z-score of the total rotation received from the system breaches the threshold
Z-score value, SMS alert messages are sent to the registered mobile numbers
immediately.
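A minimal sketch of the cloud-side Z-score check is given below. The threshold of 3.0,
the use of the population standard deviation, and the two-sided (absolute value) test are
assumptions made for illustration; the report does not state the values used in the
deployed system.

```python
from statistics import mean, pstdev


def z_score(new_value, history):
    """Z-score of a new reading against previously stored readings."""
    mu = mean(history)
    sigma = pstdev(history)  # population standard deviation (assumed choice)
    if sigma == 0:
        # Constant history: any deviation at all is treated as extreme.
        return 0.0 if new_value == mu else float("inf")
    return (new_value - mu) / sigma


def should_alert(new_value, history, z_threshold=3.0):
    """True when the reading breaches the (hypothetical) Z-score threshold."""
    return abs(z_score(new_value, history)) > z_threshold
```

With the stored values 1, 2, 3, 4, 5, a new total rotation of 10 gives a Z-score of about
4.95 and would trigger an SMS alert under these assumptions, while 3.5 would not.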
Deployment:
We deployed the LEWS at 10 selected sites in Mandi district. After installing the LEWS
on hills at the different sites, we placed two poles with a traffic light and a siren
alongside the road, with wireless connectivity, to alert vehicular traffic. The
MEMS-based accelerometer senses soil movement by measuring acceleration (the rate
of change of velocity of an object) in three orthogonal directions X, Y, and Z. When
interfaced with a microcontroller, this sensor provides analogue acceleration values.
These analogue units are converted to m/s² by using an appropriate calibration
procedure. If the accelerometer senses soil movement and this movement exceeds the
predefined threshold, an alarm signal is sent to the traffic light and siren on the road
for vehicular traffic. In parallel, all the acceleration and weather data are sent to the
cloud, and the cloud system sends SMS alerts to the registered people. These messages
contain the Google Maps coordinates of the movement location. The LEWS sends data
to the cloud every 10 minutes for minute-scale prediction of landslides.
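As an illustration of the conversion of raw accelerometer readings to m/s², the sketch
below assumes a linear calibration with a zero offset and a sensitivity of 2048 counts per
g, which is the MPU-6050 datasheet value for the 16 g range. The calibration actually
used in the deployed system is not described in this report, so both values are
assumptions.

```python
STANDARD_GRAVITY = 9.80665  # m/s^2 per g
COUNTS_PER_G = 2048.0       # assumed sensitivity (MPU-6050, 16 g range)


def counts_to_ms2(raw, offset=0.0, counts_per_g=COUNTS_PER_G):
    """Convert a raw accelerometer reading to m/s^2 with a linear calibration."""
    return (raw - offset) / counts_per_g * STANDARD_GRAVITY
```

A reading of 2048 counts with zero offset then corresponds to 1 g, or about 9.81 m/s².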
Landslide data processing:
While our sensors have been collecting data since their deployment, we contacted DTRL,
DRDO for data from their conventional landslide monitoring system deployed in
Chamoli district, Uttarakhand, India, and DTRL, DRDO provided this data. The data
are inclinometer movement readings in mm per m (essentially the angle by which the
inclinometer tilts). The Chamoli landslide site has five boreholes, and each borehole has
five sensors, giving 25 sensors in total. First, we calculated the average relative
displacement of each sensor from its initial reading at the time of installation. Second,
from each borehole we chose the sensor that gave the maximum average relative
displacement. Thus, we obtained data from five sensors, one per borehole. As the data
was sparse, we averaged the tilt over each week. The dataset covers 78 weeks, which we
split in an 80:20 ratio: 62 weeks for training and 16 weeks for testing the
machine-learning algorithms.
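The preprocessing steps above can be sketched as follows. The sketch assumes that each
sensor's readings are held in a plain list, that "maximum" refers to the magnitude of the
average relative displacement, and that weekly averaging is done over consecutive groups
of seven daily values; these are assumptions for illustration, since the raw data layout
is not described here.

```python
from statistics import mean


def relative_displacement(readings):
    """Displacement of each reading relative to the first (installation) reading."""
    first = readings[0]
    return [r - first for r in readings]


def most_active_sensor(borehole):
    """Pick the sensor with the largest average relative displacement (magnitude).

    borehole maps a sensor name to its list of readings in mm/m.
    """
    def avg_move(name):
        return abs(mean(relative_displacement(borehole[name])))
    return max(borehole, key=avg_move)


def weekly_average(daily, days_per_week=7):
    """Average consecutive groups of daily values into weekly values."""
    return [mean(daily[i:i + days_per_week])
            for i in range(0, len(daily), days_per_week)]


def train_test_split(series, train_fraction=0.8):
    """Chronological split: the first part for training, the rest for testing."""
    n_train = int(len(series) * train_fraction)
    return series[:n_train], series[n_train:]
```

For a 78-week series, `train_test_split` returns 62 training weeks and 16 test weeks,
matching the split reported above.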
Algorithms:
We have applied different algorithms: Sequential Minimal Optimization, Auto
Regression, Random Forest, Bagging, Stacking, Voting, and SARIMA.
Sequential Minimal Optimization:
John Platt invented sequential minimal optimization (SMO) in 1998. It is a widely used
algorithm for solving the quadratic programming (QP) problem that arises during the
training of support vector machines. The running time of a general QP solver is O(N^3)
in the worst case, which is prohibitive for large datasets. SMO requires an amount of
memory that is linear in the training set size N.
The goal of the SMO algorithm is to return alphas that satisfy the constrained
optimization problem below. The constraints are written as 'subject to (s.t.)' in machine
learning. These alphas are called Lagrange multipliers. They play a major role in
identifying the support vectors in our data.
$$\min_{\alpha} \; \frac{1}{2} \sum_{i} \sum_{j} \alpha_i \alpha_j y_i y_j \, X_i \cdot X_j - \sum_{i} \alpha_i$$
$$\text{s.t.} \quad \sum_{i} \alpha_i y_i = 0 \;\; \text{(equality constraint)}, \qquad \alpha_i \in [0, C]$$
The user chooses a kernel function K for a non-linear SVM to transform the data into a
higher dimension. The dot product of the inputs is therefore replaced with the kernel
function K:
$$\min_{\alpha} \; \frac{1}{2} \sum_{i} \sum_{j} \alpha_i \alpha_j y_i y_j \, K(X_i, X_j) - \sum_{i} \alpha_i$$
$$K(X_i, X_j) = \Phi(X_i) \cdot \Phi(X_j) \quad \text{(kernel function)}$$
$$\text{s.t.} \quad \sum_{i} \alpha_i y_i = 0 \;\; \text{(equality constraint)}, \qquad \alpha_i \in [0, C]$$
Instead of solving for all alphas at once, SMO is formulated as an iterative algorithm. It
breaks the large QP optimization problem into small sub-problems, each of which is
solved analytically, and repeats until convergence. Since large matrix computations are
avoided, SMO scales between linear and quadratic in the training set size N, depending
on the data analysis problem. The SMO computation time is dominated by the
evaluation of the SVM, and thus it is faster for linear SVM problems and sparse datasets.
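To make the optimization problem concrete, the sketch below evaluates the dual objective
and checks the constraints for a given set of alphas. It does not implement the SMO
iterations themselves. The polynomial kernel form (dot product raised to an exponent)
and the function names are assumptions made for illustration.

```python
def poly_kernel(x, z, exponent=1):
    """Assumed polynomial kernel: (x . z) ** exponent; exponent 1 is the linear kernel."""
    return sum(a * b for a, b in zip(x, z)) ** exponent


def dual_objective(alpha, y, X, kernel=poly_kernel):
    """Value of 1/2 * sum_i sum_j a_i a_j y_i y_j K(X_i, X_j) - sum_i a_i."""
    n = len(alpha)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * kernel(X[i], X[j])
               for i in range(n) for j in range(n))
    return 0.5 * quad - sum(alpha)


def is_feasible(alpha, y, C, tol=1e-9):
    """Check sum_i a_i y_i = 0 and 0 <= a_i <= C."""
    balanced = abs(sum(a * t for a, t in zip(alpha, y))) <= tol
    boxed = all(-tol <= a <= C + tol for a in alpha)
    return balanced and boxed
```

For two points X = [(1, 0), (-1, 0)] with labels y = [1, -1], the alphas [0.5, 0.5] are
feasible for C = 1 and give a dual objective of -0.5.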
Auto Regression:
A regression model, such as linear regression, models an output value based on a linear
combination of input values. Auto regression is a time series model that uses
observations from previous time steps as input to a regression equation to predict the
value at the next time step. It is a very simple idea that can result in accurate forecasts
on a range of time series problems.
For example:
$$Y = \beta_0 + \beta_1 X_1$$
where $Y$ is the prediction, $\beta_0$ and $\beta_1$ are coefficients found by optimizing
the model on training data, and $X_1$ is an input value.
This technique can be used on time series where input variables are taken as
observations at previous time steps, called lag variables.
For example, we can predict the value for the next time step (t+1) given the observations
at the last two time steps (t and t-1). As a regression model, this would look as follows:
$$X(t+1) = \beta_0 + \beta_1 X(t) + \beta_2 X(t-1)$$
Because the regression model uses data from the same input variable at previous time
steps, it is referred to as an auto regression.
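A two-lag auto regression of this kind can be fitted by ordinary least squares. The sketch
below uses only the Python standard library and solves the normal equations directly; it
is an illustration of the idea and not the implementation used for the results in this
report.

```python
def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ar2(series):
    """Least-squares (b0, b1, b2) for X(t+1) = b0 + b1 X(t) + b2 X(t-1)."""
    steps = range(1, len(series) - 1)
    rows = [(1.0, series[t], series[t - 1]) for t in steps]
    targets = [series[t + 1] for t in steps]
    # Normal equations: (A^T A) beta = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(3)]
    return solve_linear(ata, aty)


def predict_next(series, beta):
    """One-step-ahead forecast from the last two observations."""
    b0, b1, b2 = beta
    return b0 + b1 * series[-1] + b2 * series[-2]
```

The fit needs enough variation in the series for the normal equations to be
non-singular; a constant series, for example, cannot identify the three coefficients.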
Seasonal Auto-Regressive Integrated Moving-Average (SARIMA):
SARIMA is an extension of the ARIMA model, a statistical forecasting method popular
for univariate time-series forecasting. SARIMA can model data with a trend as well as
a seasonal component by describing the auto-correlations in the data [19].
Stationarity of Time-Series: A time-series whose mean, variance, and auto-correlation
are constant over time is stationary. Most statistical forecasting methods assume
that a time-series can be made approximately stationary using mathematical
transformations such as differencing [20]. The first step of building a SARIMA model is
stationarizing the data.
Auto-Regressive Models: In an auto-regressive model, we predict a variable using past
values of the same variable. Thus, an auto-regressive model is defined as:
$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \epsilon_t \qquad (1)$$
where p is the auto-regressive trend parameter, $\epsilon_t$ is white noise, and
$y_{t-1}, y_{t-2}, \dots, y_{t-p}$ denote the movement at previous time periods [19].
Moving-Average Models: A moving-average model uses past prediction errors in a
regression model. A moving-average model is defined as:
$$y_t = c + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \dots + \theta_q \epsilon_{t-q} \qquad (2)$$
where q is the moving-average trend parameter, $\epsilon_t$ is white noise, and
$\epsilon_{t-1}, \epsilon_{t-2}, \dots, \epsilon_{t-q}$ are the error terms at previous time periods.
If we combine auto-regression and a moving-average model on stationary
data, we obtain a non-seasonal ARIMA model, which is defined as:
$$y'_t = c + \phi_1 y'_{t-1} + \dots + \phi_p y'_{t-p} + \theta_1 \epsilon_{t-1} + \dots + \theta_q \epsilon_{t-q} + \epsilon_t \qquad (3)$$
SARIMA builds upon an ARIMA model and incorporates seasonal data. The
seasonal parameters of the model are like the non-seasonal parameters of the
model with the backshifts of the seasonal period.
As in ARIMA, three trend elements require calibration: the trend auto-regressive
order 'p', the trend difference order 'd', and the trend moving-average order 'q'.
Four additional seasonal elements require calibration: the seasonal
auto-regressive order 'P', the seasonal difference order 'D', the seasonal
moving-average order 'Q', and the number of time steps in a single seasonal
period 'm'.
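The effect of the difference orders d and D can be illustrated with a small sketch. A
first difference subtracts the previous value, and a seasonal difference subtracts the
value one seasonal period (m steps) earlier. The function names below are ours, and the
sketch covers only the differencing step, not the fitting of a SARIMA model.

```python
def difference(series, lag=1):
    """Return X_t - X_{t-lag} for every t where both terms exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sarima_differencing(series, d=0, D=0, m=1):
    """Apply D seasonal differences of period m, then d first differences."""
    out = list(series)
    for _ in range(D):
        out = difference(out, lag=m)
    for _ in range(d):
        out = difference(out, lag=1)
    return out
```

For a series that repeats the pattern 1, 2, 3, 4 with period m = 4 while rising by 2 each
period, one seasonal difference leaves the constant 2, and one further first difference
leaves zeros, which is a stationary series.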
The SARIMA(p, d, q)(P, D, Q)_m model is defined as:
$$\Phi(B^m)\,\phi(B)\,\Delta_m^D\,\Delta^d X_t = \Theta(B^m)\,\theta(B)\,Z_t \qquad (4)$$
where $Z_t$ is the white noise process.
Differencing using the 'D' parameter on the seasonal component and the 'd' parameter
on the non-seasonal component of the time-series is given by:
$$\Delta_m X_t = X_t - X_{t-m} \qquad (5)$$
$$\Delta X_t = X_t - X_{t-1} \qquad (6)$$
On applying equation 1 using the 'P' and 'm' parameters on the seasonal component and
the 'p' parameter on the non-seasonal component of the time-series, we obtain:
$$\Phi(B^m) = 1 - \Phi_1 B^m - \dots - \Phi_P B^{Pm} \qquad (7)$$
$$\phi(B) = 1 - \phi_1 B - \dots - \phi_p B^p \qquad (8)$$
On applying equation 2 using the 'Q' and 'm' parameters on the seasonal component and
the 'q' parameter on the non-seasonal component of the time-series, we obtain:
$$\Theta(B^m) = 1 + \Theta_1 B^m + \dots + \Theta_Q B^{Qm} \qquad (9)$$
$$\theta(B) = 1 + \theta_1 B + \dots + \theta_q B^q \qquad (10)$$
Random Forest:
The random forest algorithm was introduced by Leo Breiman in 2001 and developed
further with Adele Cutler [13]. Random forests, or random decision forests, are an
ensemble learning method for classification and regression. A random forest builds
many decision trees at training time and outputs the majority class (classification) or
the mean of the individual trees' predictions (regression). By aggregating many trees,
the random forest algorithm corrects the tendency of decision trees to overfit [22].
Here, the random forest algorithm is used for regression on continuous-valued
attributes with the random tree algorithm as the base learner [23]. Since the target
variable is a real-valued number, we fit a regression model to the target variable using
each of the independent variables. For each independent variable, the dataset is split at
several candidate split points by trial and error. At each split point, we calculate the
Residual Sum of Squares (RSS) between the predicted and the actual values. The RSS is
defined by the following equation:
$$RSS = \sum_{\text{left}} (y_i - y_L)^2 + \sum_{\text{right}} (y_i - y_R)^2$$
where $y_L$ is the mean y-value on the left side of the split point, $y_R$ is the mean
y-value on the right side, and $y_i$ are the points on the left and right sides of the split
point.
The attribute with the minimum RSS is selected at a node in the tree, and the split point
on this attribute is the one that minimizes the RSS among all split points. This process
continues recursively until all attributes are covered. Overall, the dataset is split into
several regions, and each region may represent a leaf node in the random tree. The
random tree algorithm uses the parameter K for the number of randomly chosen
attributes used to build the tree. We choose K by a heuristic rule as the integer part of
log2(number of predictors) + 1. The final output of the random forest algorithm is the
average of all random tree outputs.
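The split selection on a single attribute and the heuristic for K can be sketched as
follows. This is a simplified, standard-library illustration of the RSS criterion for one
attribute; it is not the random tree implementation used for the reported results, and
the function names are ours.

```python
import math
from statistics import mean


def rss(values):
    """Residual sum of squares of values around their own mean."""
    if not values:
        return 0.0
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_rss) for the split x <= threshold with minimum RSS.

    Returns None when x has fewer than two distinct values.
    """
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        total = rss(left) + rss(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


def k_features(n_predictors):
    """Heuristic from the text: integer part of log2(number of predictors) + 1."""
    return int(math.log2(n_predictors) + 1)
```

For x = [1, 2, 3, 4] and y = [1, 1, 5, 5], the best split is at x <= 2 with an RSS of 0,
because both sides are then perfectly described by their means.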
Bagging:
Bagging is a machine-learning ensemble meta-algorithm designed to improve the
stability and accuracy of machine-learning algorithms used in statistical classification
and regression. It draws subsamples from the dataset with replacement and trains the
predictive model on each subsample. The final output is the average of the predictions
of these models.
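A minimal sketch of bagging for regression is shown below. The base learner here is a
deliberately trivial one (it predicts the mean target of its bootstrap sample), chosen only
to keep the example short; the number of models and the seed are arbitrary values for
illustration.

```python
import random
from statistics import mean


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def fit_mean(sample):
    """Toy base learner: always predicts the mean target of its sample."""
    mu = mean(y for _, y in sample)
    return lambda x: mu


def bagged_prediction(train, fit, x, n_models=25, seed=0):
    """Train n_models learners on bootstrap samples and average their predictions."""
    rng = random.Random(seed)
    models = [fit(bootstrap_sample(train, rng)) for _ in range(n_models)]
    return mean(model(x) for model in models)
```

Any function that maps a list of (x, y) pairs to a predictor can be passed as `fit`, so
the same wrapper applies to stronger base learners.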
Voting:
The voting algorithm combines classifiers that use distinct pattern representations.
Many existing combination schemes can be considered special cases of compound
classification, where all the pattern representations are used jointly to make a
decision.
Stacking:
Stacked generalization (or stacking) is a different way of combining multiple models,
typically models of different types. The procedure is as follows:
1. Split the training set into two disjoint sets.
2. Train several base learners on the first part.
3. Test the base learners on the second part.
4. Using the predictions from 3) as the inputs, and the correct responses as the
outputs, train a higher-level learner.
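The four steps above can be sketched as follows for a regression task. The two base
learners (a mean predictor and a line through the origin) and the least-squares
higher-level learner are assumptions chosen to keep the example self-contained; they are
not the learners used for the results in this report.

```python
from statistics import mean


def fit_mean(data):
    """Base learner 1: predicts the mean target of its training data."""
    mu = mean(y for _, y in data)
    return lambda x: mu


def fit_slope(data):
    """Base learner 2: least-squares line through the origin, y = k x."""
    k = sum(x * y for x, y in data) / sum(x * x for x, _ in data)
    return lambda x: k * x


def fit_meta(features, targets):
    """Higher-level learner: least-squares weights for two base outputs."""
    a = sum(f[0] * f[0] for f in features)
    b = sum(f[0] * f[1] for f in features)
    c = sum(f[1] * f[1] for f in features)
    d = sum(f[0] * t for f, t in zip(features, targets))
    e = sum(f[1] * t for f, t in zip(features, targets))
    det = a * c - b * b  # 2x2 normal equations solved by Cramer's rule
    return ((d * c - b * e) / det, (a * e - b * d) / det)


def fit_stack(train):
    half = len(train) // 2
    part1, part2 = train[:half], train[half:]              # step 1: disjoint split
    bases = [fit_mean(part1), fit_slope(part1)]            # step 2: train base learners
    features = [[m(x) for m in bases] for x, _ in part2]   # step 3: predict on part 2
    weights = fit_meta(features, [y for _, y in part2])    # step 4: train meta learner
    return lambda x: sum(w * m(x) for w, m in zip(weights, bases))
```

On data generated by y = 2x + 1, neither base learner is exact on its own, but the
stacked combination reproduces the line.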
Parameters Tuning:
The SMO algorithm has two parameters. The first is the complexity parameter (C),
which affects how the separating 'hyperplane' used for classification, regression, or
other tasks is built. A good separation is one where the hyperplane has the largest
distance to the nearest training data points of any class, since in general the larger the
margin, the lower the generalization error of the classifier. C controls how soft the
class margins are, that is, in practice, how many instances are used as 'support vectors'
to draw the linear separation boundary in the transformed Euclidean feature space. The
second parameter of the SMO algorithm is the exponent (E) of the kernel. In its simplest
form, the kernel trick means transforming data into another dimension that has a clear
dividing margin between classes of data [21]. We used the following values of C and E
in SMO: C = 0, 1 and E = 1, 2, 3, 4 for the polynomial kernel; C = 0, 1 and E = 1, 2 for the
normalized polynomial kernel; and C = 0 and E = 1 for the RBF kernel. The best result
for this algorithm was obtained with the polynomial kernel with C = 1 and exponent
E = 1 (linear kernel).
Using a grid search procedure, eight free parameters were optimized in the SARIMA
model. These parameters were varied over the ranges given in Table 2. One reason
for using the SARIMA model was that it allows one to account for a seasonal trend
present in the time-series.
Table 2. Parameter Optimization of SARIMA
Parameter | Range of Values
Trend Auto-Regressive (p) | 0, 1, 2
Trend Differencing (d) | 0, 1
Trend Moving-Average (q) | 0, 1, 2
Trend | Absent, Constant, Trend, Constant Trend
Seasonal Auto-Regressive (P) | 0, 1, 2
Seasonal Differencing (D) | 0, 1
Seasonal Moving-Average (Q) | 0, 1, 2
Seasonal Periods (m) | 0, 1
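The size of the search space in Table 2 and the structure of the grid search can be
sketched as below. The `score` argument stands in for "fit a SARIMA model with these
settings and return its RMSE"; that fitting step is not shown, and the dictionary keys
and function names are our own labels.

```python
from itertools import product

SARIMA_GRID = {
    "p": [0, 1, 2],
    "d": [0, 1],
    "q": [0, 1, 2],
    "trend": ["absent", "constant", "trend", "constant trend"],
    "P": [0, 1, 2],
    "D": [0, 1],
    "Q": [0, 1, 2],
    "m": [0, 1],
}


def candidates(grid):
    """Yield every combination of parameter values as a dictionary."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def grid_search(grid, score):
    """Return the candidate with the lowest score (for example, lowest RMSE)."""
    return min(candidates(grid), key=score)
```

With the ranges in Table 2, the grid contains 3 x 2 x 3 x 4 x 3 x 2 x 3 x 2 = 2592
candidate configurations.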
Results:
Following are the RMSE data with their corresponding algorithm.
Table 3. Root-mean-squared error (RMSE) of different algorithms fitted to the training data for each borehole.
Algorithm | Borehole 1, Meter 3 | Borehole 2, Meter 12 | Borehole 3, Meter 6 | Borehole 4, Meter 15 | Borehole 5, Meter 15 | Average RMSE
Random Forest | 5.46 | 0 | 0.01 | 0.10 | 5.44 | 2.202
Voting | 8.10 | 0 | 0.02 | 0.17 | 8.01 | 3.26
Linear Regression | 14.73 | 0 | 0.03 | 0.27 | 12.36 | 5.478
Bagging | 16.60 | 0 | 0.04 | 0.29 | 13.04 | 5.994
SMO | 16.60 | 0 | 0.03 | 0.29 | 13.92 | 6.168
Gaussian | 16.86 | 0 | 0.03 | 0.31 | 15.76 | 6.592
SARIMA | 18.27 | 7.34 | 0.28 | 8.77 | 14.49 | 9.83
Stacking | 33.67 | 0 | 0.10 | 0.44 | 22.07 | 11.256
Table 4. Root-mean-squared error (RMSE) of different algorithms on the test data for each borehole.
Algorithm | Borehole 1, Meter 3 | Borehole 2, Meter 12 | Borehole 3, Meter 6 | Borehole 4, Meter 15 | Borehole 5, Meter 15 | Average RMSE
SARIMA | 0.0 | 0.11 | 9.17 | 1.18 | 19.51 | 5.99
SMO | 0.37 | 0 | 10.64 | 1.14 | 20.21 | 6.472
Bagging | 0.14 | 0 | 11.57 | 1.16 | 20.95 | 6.764
Voting | 2.28 | 0 | 19.61 | 1.30 | 16.52 | 7.942
Linear Regression | 11.04 | 0 | 9.98 | 1.17 | 24.76 | 9.39
Random Forest | 0 | 0 | 27.14 | 1.56 | 20.35 | 9.81
Gaussian | 9.81 | 0 | 16.63 | 1.34 | 28.51 | 11.258
Stacking | 24.32 | 0 | 27.18 | 1.72 | 34.01 | 17.446
As seen in Table 3, when the different algorithms were applied to this dataset during
training, there was a large variation in the average RMSE. For example, the
best-performing algorithms on the training dataset, Random Forest and Voting, had
RMSEs of 2.202 and 3.26 mm/m; however, the SARIMA and Stacking algorithms had much
larger RMSEs of 9.83 mm/m and 11.256 mm/m. When these models were generalized to
the test dataset, the SARIMA model performed best (average RMSE of 5.99 mm/m),
followed by the SMO algorithm (6.472 mm/m). Thus, both SARIMA and SMO predicted
the time-series movement data relatively well.
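For reference, the error measure and the averaging across boreholes can be written as a
short sketch; the function names are ours.

```python
import math


def rmse(actual, predicted):
    """Root-mean-squared error between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def average_rmse(per_borehole):
    """Plain mean of the per-borehole RMSE values."""
    return sum(per_borehole) / len(per_borehole)
```

Applied to the Random Forest training row (5.46, 0, 0.01, 0.10, 5.44), `average_rmse`
gives 2.202, and applied to the SARIMA test row (0.0, 0.11, 9.17, 1.18, 19.51) it gives
5.994, which is reported as 5.99.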
Figure 4 shows the results of the best-performing SARIMA model during training and
testing for each borehole. The blue line represents the actual data, and the orange
line represents the values predicted by the SARIMA model.
[Figure 4 panels: testing and training graphs of actual versus predicted relative
movement (mm/m) against week for Borehole 1 Meter 3 (RD13), Borehole 2 Meter 12
(RD212), Borehole 3 Meter 6 (RD36), Borehole 4 Meter 15 (RD415), and Borehole 5
Meter 15 (RD515).]
Figure 4. Training and Test RMSE in mm/m of SARIMA model
Discussion and Conclusions:
The focus of machine learning in landslide mitigation is the timely prediction of soil
movements so that lives can be saved. We applied different algorithms, namely
Sequential Minimal Optimization (SMO), Linear Regression, and Random Forest,
ensemble versions of these algorithms (bagging, stacking, and voting), and SARIMA.
Amongst all algorithms, the SARIMA model performed the best on this dataset.
In the near future, we want to compare the results of the SARIMA model with other
methods such as deep learning, the LSTM model, MLP, and the Holt-Winters method.
Also, we plan to perform machine learning on data collected by our low-cost LEWS
deployed in Mandi district.
Courses:
Semester | Course | Grade | CGPA
1st Semester | EE-592p Selected topics in IoT | C | 6.1
1st Semester | HS-616 Managerial Thinking and Decision Making | C | 6.1
1st Semester | CS-671 Deep Learning and Applications | E | 6.1
2nd Semester | CS-660 Data Mining for decision making | O | 7.5
2nd Semester | CS-601 Probability and random process | E | 7.5
3rd Semester | CS-606 Computational Modelling of Social Systems | Currently taking | -
3rd Semester | HS-650 Statistical Methods | Currently taking | -
4. Planned Work for the Next Year
Currently, our LEWS can detect soil movement up to a depth of 1 meter below the surface of the earth. This year, we plan to engineer a system that detects soil movement up to a depth of 15 meters. This sub-surface system will be buried in a 15-meter borehole, so that it can detect soil movement at every meter, down to 15 meters.
With the help of this system, we can measure the slope and direction of movement in the landslide and monitor what is happening in its sub-surface area. Landslide events occur due to slope failure in the hill. Sometimes the slope failure lies deep inside the hill, so it cannot be found with the 1-meter system. With the sub-surface system, we can detect such a failure. For example, if the slope failure is at the 10-meter level, the accelerometers from 10 to 15 meters will remain stable and the accelerometers from 1 to 10 meters will detect the soil movement.
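The depth reasoning above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `estimate_failure_depth`, the movement threshold and the sample readings are hypothetical, and the sensor at the failure level itself is assumed to register movement.

```python
from typing import Optional, Sequence


def estimate_failure_depth(movement_by_depth: Sequence[float],
                           threshold: float) -> Optional[int]:
    """Return the deepest depth (in meters) whose sensor reports movement.

    movement_by_depth[i] is the movement magnitude reported by the
    accelerometer at depth i + 1 meters (index 0 is the 1-meter sensor).
    Returns None when no sensor exceeds the threshold.
    """
    deepest = None
    for index, movement in enumerate(movement_by_depth):
        if abs(movement) > threshold:
            deepest = index + 1
    return deepest


# Hypothetical readings: sensors at 1-10 m move, sensors at 11-15 m stay still.
readings = [0.8] * 10 + [0.01] * 5
print(estimate_failure_depth(readings, threshold=0.1))  # prints 10
```

With a 1-meter system only the first entry of such a list would exist, which is why a deeper failure plane could not be located.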
Last year, we deployed 10 LEWSs in Mandi district and 5 LEWSs in Sirmour district of Himachal Pradesh. We are collecting data from these sensors at 10-minute intervals, which gives minute-scale time-series data. We now have a large amount of data from these sensors, and we will apply machine learning to this dataset to predict soil movement in the near future.
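Before modelling, readings taken every 10 minutes may need to be aggregated to a coarser time scale. The report does not state which preprocessing will be used, so the following is only one possible sketch, with a hypothetical function name `daily_means` and synthetic sample data.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def daily_means(samples):
    """Group (timestamp, value) samples by calendar day and average each day."""
    by_day = defaultdict(list)
    for timestamp, value in samples:
        by_day[timestamp.date()].append(value)
    return {day: mean(values) for day, values in sorted(by_day.items())}


# Synthetic data: two days of readings every 10 minutes (144 readings per day).
start = datetime(2019, 1, 1)
samples = [(start + timedelta(minutes=10 * i), float(i // 144))
           for i in range(288)]
for day, value in daily_means(samples).items():
    print(day, value)
```

Averaging is an assumption made for illustration. Other aggregates, such as the daily maximum, could be substituted in the same structure.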
5. Workshops/Conferences Attended
• Landslide Mitigation and Detailed Project Report (DPR) Preparation, IIT Mandi, 29 August 2018.
• 3rd Himachal Pradesh Science Congress, IIT Mandi, October 2018.
• Winter School on Cognitive Modelling, IIT Mandi, February 2019.
6. Papers Published/Communicated and Other Achievements
• Dutt, V., Chaturvedi, P., Agrawal, S., Kumar, P., Priyanka, S., Mali, N., Pathania, A., & Kala, U. (2018). Smart IOT based test-bed system for lab scale landslide monitoring experiment. Patent Application 201813039735. New Delhi, Patent Office Dwarka New Delhi 110078, 2018/10/22.
• Kumar, P., Shroti, S., Chaturvedi, P., Sihag, P., Agarwal, S., Pathania, A., Mali, N., Singh, R., Uday, K.V., Dutt, V. (in press, 2019). Daily-scale predictions of debris movement in Chamoli Uttarakhand area using conventional and deep machine-learning methods (ICITG2019, 064, v1).
• Pathania, A., Kumar, P., Kesri, J., Agarwal, S., Sihag, P., Mali, N., Singh, R., Chaturvedi, P., Uday, K.V., Dutt, V. (in press, 2019). Reducing power consumption of weather stations for landslide monitoring (ICITG2019, 062, v1).
• Won the 3rd prize in the Development of Innovative Prototypes for Disaster Risk Reduction (DRR), Shimla, Himachal Pradesh.
7. References
1. Pande, R. K. (2006). Landslide problems in Uttaranchal, India: issues and challenges. Disaster Prevention and
Management: An International Journal, 15(2), 247-255.
2. Landslide Recent Incidents - Geological Survey of India. Retrieved from https://gsi.gov.in

3. Chaturvedi, P., Srivastava, S., & Kaur, P. B. (2017). Landslide Early Warning System Development Using
Statistical Analysis of Sensors’ Data at Tangni Landslide, Uttarakhand, India. In Proceedings of Sixth
International Conference on Soft Computing for Problem Solving (pp. 259-270). Springer, Singapore.
4. Hao, X.Y., Hao, X.H., Xiong, H.M., et al., Journal of Engineering Geology, 7(3): 279-283, 1999.
5. Du, J., Yin, K.L., Chai, B., Chinese Journal of Rock Mechanics and Engineering,(09): 1783-1789, 2009.
6. Li, Q., Li, R.Y., Journal of Yangtze River Scientific Research Institute,22(6), 2005.
7. E.Yesilnacar, T.Topal. Engineering Geology, 79:251-266, 2005.
8. Wang, G.Y., Cui, H.L., Li, Q., Rock and Soil Mechanics, 30(8): 2418-2422, 2009.
9. Lin, D.C., An, F.P., Guo, Z.L., et al., Rock and Soil Mechanics, 32(1), 2011.
10. Jian Huang, Zhihuan Liu, and Ni Li. “Study on displacement prediction of landslide based on neural network”, ISSN: 0975-7384 CODEN(USA): JCPRC5.
11. J. Platt, “Sequential minimal optimization: A fast algorithm for training support vector machines,” Microsoft Res., Redmond, WA, USA, Rep. MSR-TR-98-14, Apr. 1998.
12. "What is an Autoregressive Model?". deepai.org.
13. L. Breiman, 2001, Random forests, Machine Learning, vol. 45, no. 1, pp. 5-32.
14. Breiman, Leo (1996). "Bagging predictors". Machine Learning. 24 (2): 123–140.
15. Wolpert, David. (1992). Stacked Generalization. Neural Networks. 5. 241-259.
16. Josef Kittler; Robert P.W. Duin; et al. (1998). "On combining classifiers". IEEE TPAMI. IEEE. 20 (3): 226–239.
17. Hyndman, Rob J; Athanasopoulos, George. 8.9 Seasonal ARIMA models. Forecasting: principles and practice.
oTexts. Retrieved 19 May 2015.
18. Ray, Ram & Jacobs, Jennifer. (2006). Relationships among remotely sensed soil moisture, precipitation and
landslide events. Natural Hazards. 43. 211-222. 10.1007/s11069-006-9095-9.
19. Asteriou D., Hall S. G., ARIMA Models and the Box-Jenkins Methodology, Applied Econometrics, pp. 265-286, 2011.
20. Hyndman R.J., Athanasopoulos G., Forecasting: Principles and Practice.
21. "The Kernel Trick". deepai.org.
22. Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2008). The Elements of Statistical Learning (2nd ed.).
Springer. ISBN 0-387-95284-5.
23. Ian H. Witten , Eibe Frank , Mark A. Hall, Data Mining: Practical Machine Learning Tools and Techniques,
Morgan Kaufmann Publishers Inc., San Francisco, CA, 2011
REPORT BY APC/DC COMMITTEE
1. Has the student met the targets set for last year?
(a) Mention the Achieved Targets:
(b) If not what are the major reasons?
2. Is there a reasonable target set for next year? Give detailed plan.
3. What is the perception of the student and guide(s) about the fraction of thesis work
completed?
4. What is the approximate time scale for thesis submission (only for students in their
5th year or above for Ph.D. and 3rd year and above for M.S. students).
5. Any other observations of the committee.

Recommendation of APC/DC (Tick Appropriately)

1. (a) Continuation of Registration is Recommended/ Not Recommended.
(b) Continuation of Scholarship/Research Assistantship Recommended/ Not
Recommended.
(c) Enhancement of Scholarship from JRF to SRF is Recommended/ Not Recommended (only after Two Years of Registration).
2. Source of Funding/Scholarship:
3. OVERALL PERFORMANCE: Very Good/Good/Satisfactory/Unsatisfactory
4. Any Other Recommendation/Comments (Attach separate sheet).
COMMITTEE MEMBERS
S. No. Faculty Name School/Department Signature Remarks
1 Dr. Varun Dutt (guide) SCEE, IIT Mandi
2 Dr. Venkata Uday Kala (Co-guide) SE, IIT Mandi
3 Dr. Arnav Bhavsar SCEE, IIT Mandi
4 Dr. Shyamasree Dasgupta SHSS, IIT Mandi
Signature of the Supervisor          School Chairperson
Date:                                Date:
Associate Dean (Research)

Date:
Note:
(i) Ph.D. Scholar shall, after Registration, submit a written report to Doctoral Committee in the required
format, annually for the first three years, and every six months thereafter.
(ii) M.S. Scholar shall, after Registration, submit annually a written report to Academic Progress Committee.
Attach additional sheets if required.
