
Handling interpretability issues in ANFIS using rule base simplification and constrained learning
Sharifa Rajab
Department of IT and SS, University of Kashmir, India
E-mail address: sharifa18mca@gmail.com
Received 15 December 2017; received in revised form 30 October 2018; accepted 20 November 2018

Abstract
Adaptive neuro-fuzzy inference system (ANFIS) is a well-known neuro-fuzzy model for approximating highly complex non-linear systems. ANFIS follows the precise fuzzy modelling approach, which emphasizes the accuracy of the designed fuzzy model rather than its interpretability. However, the interpretability of a fuzzy system is as important an aspect of fuzzy modelling as its accuracy. So far, research based on ANFIS has been mostly application oriented, and the various issues related to the interpretability of ANFIS have not been dealt with. The rule base of ANFIS is typically obtained using data driven clustering algorithms. This process introduces redundancy into the system in the form of similar fuzzy sets and redundant fuzzy rules, which unnecessarily increases system complexity and in turn reduces both the interpretability and the generalization capability of ANFIS. Additionally, ANFIS uses unconstrained gradient descent based learning algorithms to fine-tune the membership function parameters, which usually results in a rule base with inconsistent, excessively overlapping and indistinguishable membership functions for the input variables, so the interpretability of the final optimized system is not guaranteed. This paper addresses the issue of rule base redundancy in ANFIS to reduce complexity and enforces constraints during the learning phase to ensure interpretability of the final optimized system. Rule base redundancy is removed using similarity analysis based rule base simplification, in which similar fuzzy sets are merged and the resulting fuzzy rules with equal premises are subsequently combined. The hybrid learning technique, an efficient parameter tuning method for ANFIS, is constrained to prevent inconsistency, excessive overlapping and inclusion of membership functions so that the final fuzzy partitions of the inputs stay interpretable. The impact of rule base simplification and constrained learning on ANFIS is analyzed empirically by application to two well-known benchmark problems and a real world stock price prediction problem. The introduction of rule base simplification and constrained learning in ANFIS modelling yields a better accuracy-interpretability tradeoff than conventional ANFIS.
© 2018 Elsevier B.V. All rights reserved.

Keywords: Clustering; Neuro-fuzzy systems; Generalization; Interpretability; Fuzzy model

1. Introduction

System modelling is the process of constructing an abstract representation of a real system that eases the understanding of that system. The goal is to design a reliable and comprehensible model to simulate, explain, improve or predict a real system.


Fuzzy modelling is a popular system modelling approach which helps to model a real system
using fuzzy logic based descriptive language [1]. Such systems based on fuzzy set theory and fuzzy logic are called
fuzzy models and offer the benefit of representing expert knowledge in the form of fuzzy if–then rules. Fuzzy if–then
rules help in modelling different aspects of human knowledge and reasoning process without the need of precise qual-
itative analysis [2]. This enables the design of fuzzy models for ill-defined and uncertain systems. Also fuzzy systems
have better transparency because it is easier to interpret the system knowledge in the form of fuzzy if–then rules
which allows an in-depth understanding of the system functionality. Fuzzy models have the same universal approximation capability as artificial neural networks (ANNs) but, unlike ANNs, lack learning ability. Therefore, neuro-fuzzy
systems were introduced which combine the benefits of fuzzy systems in terms of interpretability with the learning
capability of ANNs.
There are two main but contradictory goals in fuzzy modelling with neuro-fuzzy systems, which are also used to assess the quality of fuzzy models: (1) Accuracy, which is the ability of the system to faithfully represent the real
system; (2) Interpretability which is the ability to express the behavior of the system in a comprehensible manner [1].
In practical data driven fuzzy modelling one of these two properties prevails over the other, increasing one usually de-
creases the other. In case expert knowledge is used to build a fuzzy model it is easier to ensure that the system remains
interpretable while achieving satisfactory accuracy. On the other hand if automated data driven fuzzy modelling ap-
proaches are used to construct fuzzy rules, the interpretability aspect is not necessarily guaranteed as it usually results
in a fuzzy model with poor transparency. Data driven fuzzy modelling is predominantly used in TSK (Takagi, Sugeno
and Kang) based neuro-fuzzy systems [3]. Neuro-fuzzy systems built on underlying TSK fuzzy model are one of
the important areas in practical and theoretical fuzzy system literature. Well-known neuro-fuzzy model viz. adaptive
neuro-fuzzy inference system (ANFIS) is based on TSK fuzzy modelling concept. ANFIS has been the most popular
neuro-fuzzy system with wide range of applications in control, forecasting and inference [26]. But ANFIS has been
used in real applications mainly to replace other black box models like ANNs, with the focus on how accurately the model approximates a real system while ignoring the important aspect of interpretability, a practice that is questionable as indicated in [4].
ANFIS is being predominantly used for solving real world problems in various fields like business, medical science,
image processing, student modelling, traffic control and so on. Recently, ANFIS has been successfully applied in
various novel domains. Lately, ANFIS was introduced for estimating the sediment transport in sewers [39]. The
study used grid partitioning and subtractive clustering for initial rule base induction and also used hybrid learning.
The results proved that ANFIS demonstrates greater precision in estimation than existing techniques used for the
purpose. In another study ANFIS was used in tuberculosis diagnosis [40]. The study aimed at diagnosing tuberculosis
as accurately as possible and to reduce the waiting time for commencing the treatment on suspected patients. ANFIS
showed classification accuracy of 97% as compared to rough set algorithm which showed an accuracy of 92%. ANFIS
has also been used in the novel field of predicting soil liquefaction potential due to earthquakes [41]. The model training was
done using a large database of case histories of soil liquefaction. The inputs included various parameters like water
table, vertical stress etc. effective in predicting soil liquefaction. The results of the study revealed the effectiveness
of ANFIS in predicting soil liquefaction potential. Also a novel adaptive fuzzy system was designed based on radial
basis function based components [38]. The system had the self-organizing capability of adjusting the number of fuzzy
rules during parameter learning phase. The system was successfully applied in 3 DOF helicopter systems. The fuzzy
system was based on sequential learning using sliding data window which reflected dynamic changes within the
system and dynamically growing and shrinking structure of fuzzy system. In a significant study ANFIS was used
for reconstructing the kinematics of colliding vehicles [37]. The authors used acceleration, displacement and velocity
for reproducing the kinematics of vehicles in oblique barrier collision. After training phase, the authors used the
same ANFIS model for simulating different other types of collisions and performed a comparison of the results with
other modelling techniques in which ANFIS based method showed better reliability. More recently, it was used in a
novel field of estimation of building energy consumption [42]. The estimation was done based on building envelope
parameters viz. insulation K-value and material thickness. The experimental study included 180 simulations using
different values of insulation K-values and material thickness. Lately ANFIS has also been used for classification
of work rate using field heart rate measurements [46]. The work rate was classified into very light, light, moderate
and heavy. In this study the classifier based on ANFIS showed superior results in terms of sensitivity, accuracy and
specificity as compared to currently used practice of establishing work rate using percent heart rate reserve.
In addition to above studies ANFIS has been used in a number of other real world applications with main focus
on approximation accuracy while the interpretability aspect is overlooked due to the assumption that a fuzzy model is

implicitly interpretable in the form of fuzzy if–then rules which is not essentially true. Therefore, in practice ANFIS is
used in much the same way as other black box techniques like ANNs which results in complicated fuzzy model with
high accuracy while disregarding interpretability. This poses problems for real applications requiring high human interaction, which demand interpretability as the main design criterion for yielding a comprehensible output explanation. This is
the case in applications like medical diagnosis, decision support systems or safety critical systems. Also interpretabil-
ity is needed in various industrial applications for easier and more intuitive design processes like model validation,
verification and maintenance.
Data driven fuzzy modelling of ANFIS is done in two stages, viz. a structure learning phase and a parameter learning phase.
In structure learning phase, clustering techniques are used to cluster the experimental data set and subsequently the
obtained clusters are used to construct fuzzy rules. Parameter learning phase involves fine-tuning of parameters of
fuzzy sets to optimize the performance of system in terms of approximation accuracy. In both these phases there are
interpretability issues which are not tackled in case of conventional ANFIS. During data driven structure learning
redundancy is introduced into the system in the form of similar fuzzy sets and fuzzy rules. This redundancy unneces-
sarily increases system complexity as the system uses multiple fuzzy sets to describe the same concept. This has the
effect of decreasing both the interpretability and generalization capability of the fuzzy system. The parameter learning
methods used in case of ANFIS are unconstrained which implies that any possible updates to the rule base parameters
are done, ignoring consistency, overlapping and distinguishability of the membership functions defining the fuzzy
partitions of input variables. This further reduces the interpretability of the final optimized system as it is difficult to
understand the fuzzy partitions of the system making it difficult for experts to predict the system output for a given
input vector. But most of the research studies based on the applications of ANFIS to real world problems ignore these
aspects related to the interpretability of ANFIS and apply it mainly for function approximation purpose.
In this paper, we attempt to address the above mentioned interpretability issues in ANFIS during structure learn-
ing and optimization phases. The proposed methodology is based on using a rule base simplification procedure after
structure learning to remove rule base redundancy and on applying constrained parameter learning. The rule base
simplification is done by merging the similar fuzzy sets using a set theoretic similarity measure and subsequently
merging the resulting fuzzy rules with equivalent premise parts. This removes rule base complexity and simplifies
the model which in turn improves system generalization capability in terms of forecasting accuracy. ANFIS uses an
efficient hybrid learning technique based on gradient descent and least square estimation (LSE) for fine-tuning the
system parameters. To address the interpretability issues, we have introduced constraints on the updates of the mem-
bership function parameters in this learning algorithm to avoid inconsistency, excessive overlapping and inclusions of
membership functions. This makes the fuzzy partitioning of input variables interpretable so that the use of linguistic
labels can be facilitated and the system interpretability is guaranteed after learning phase. The applicability of the
proposed approach has been empirically investigated by application of the simplified and constrained ANFIS to two
well-known benchmark prediction problems and a real world problem of stock price forecasting.
The next section gives the relevant literature survey; section 3 discusses ANFIS; section 4 describes the concept of data driven rule base induction; section 5 discusses the use of similarity measures in rule base simplification; section 6 introduces the rule base simplification procedure for ANFIS; section 7 discusses the constrained training methodology; section 8 presents the experimental results; and lastly section 9 provides the concluding remarks.

2. Literature survey

Since the advent of TSK fuzzy model in 1985 numerous studies have been devoted to the design of intelligent
modelling and control systems based on TSK fuzzy systems. As a result various design issues related to fundamental
problems of reliability, stability and interpretability of TSK fuzzy systems are being investigated. Stability analysis
of TSK fuzzy systems has received a lot of attention, and a number of significant contributions related to the issue have been
made [43–45]. In a significant study Karimi et al. [33] employed feedback linearization, H∞ control and supervisory
control to design adaptive control law and Lyapunov based design for developing parameter adaptation laws. The
authors demonstrated that the proposed system guaranteed the stability of the closed loop system, boundedness of
errors of states and convergence of network parameters. Zhao et al. [31] proposed a novel non-quadratic membership
dependent Lyapunov function in higher order and developed stability conditions for TSK systems based on this new
Lyapunov function method. The authors also showed that the conservativeness of the obtained stability criteria de-
creased as the membership function degree increased. Kommuri [32] proposed a novel fault tolerant cruise control

design based on higher order sliding mode (HOSM) observer for a permanent magnet synchronous motor (PMSM)
powered electric vehicle in the presence of speed sensor faults. The proposed system guaranteed finite time stability
of the perceived error for reconfiguration based control. The authors also demonstrated the robustness of the proposed
fault tolerant control (FTC) in existence of vehicular disturbances like road roughness. In another study, Wang et al.
[34] proposed required conditions for ensuring the asymptotic stability of sliding mode dynamics along with strictly
dissipative performance. The authors presented a fuzzy integrated sliding mode control for driving system trajectories
onto fuzzy switching surface in presence of matched/unmatched uncertainties and external disturbances. In the same
year, Jiang [35] conducted a study aimed at designing a novel fuzzy integral sliding surface without assuming input matrices with full column rank and subsequently developing fuzzy sliding mode controls for stochastic stability purposes. The authors also proposed a set of novel linear matrix inequality conditions for stochastic stability of sliding mode
dynamics with uncertain transition rates and then extended the results to the case where input matrices are plant-rule
independent. Yossef et al. [36] introduced a new technique for proportional integral observer design for sensor and
actuator faults assessment based on Takagi–Sugeno fuzzy model having unmeasurable input variables. The authors
developed sufficient design conditions for concurrent estimation of states and time varying actuator and sensor faults
on the basis of L2 performance analysis and Lyapunov stability theory.
The development of simple and transparent fuzzy systems has also received a lot of attention lately as is evident
from the relevant literature. Transparency of a fuzzy system makes it linguistically more interpretable in terms of
the rules extracted from the system and thus makes it easy to comprehend the functioning of the system. Various
techniques have been proposed that increase accuracy of these systems but generally increase model complexity as
well. Nonetheless, lately there has been a shift in fuzzy modelling research towards achieving a tradeoff between
accuracy and interpretability. A number of studies in literature are based on model simplification as a method to attain
accuracy-interpretability tradeoff in case of fuzzy systems by removing redundancy from the rule base. Setnes et al.
[6] proposed a rule base simplification method based on similarity analysis. The authors used a set theoretic similarity
measure to find and remove similar fuzzy sets from the rule base. This reduced model complexity while enhancing the
generalization capability. Jin [8] used similarity analysis for checking similarity between fuzzy rules to enhance fuzzy
model interpretability. The parameters of the fuzzy rules were fine-tuned using regularization. More recently, Chen
and Linkens [7] also used similarity analysis to remove similar fuzzy sets from the rule base of a TSK neuro-fuzzy
system using approximate similarity measures. The model simplification was followed by parameter fine-tuning us-
ing gradient descent based mechanism. The simplified model was shown to be linguistically more interpretable and
computationally efficient.
Many methods for attaining interpretability have been based on reducing the number of rules in an existing rule
base. Koczy and Hirota [10] simplified a complex rule base to simple rule base containing the important information
of the original rule base, and all other rules were replaced by an interpolation algorithm that recovered them to a
certain accuracy predefined before reduction. Klose et al. [9] used fuzzy rule performance to select the best rules from
the rule base in order to reduce the rule base size. But the rule base generation method used in this study usually leads
to a large rule base in case of high dimensionality data sets which diminishes the effect of rule base reduction used
after rule base induction. Espinosa and Vandewalle [11] proposed a method to induce fuzzy rules from dataset such
that linguistic integrity of the model is supported in order to guarantee interpretability in the linguistic context. The
approach also allowed inclusion of prior expert knowledge into the rule base.
Recently, in a study Pota and Esposito [29] proposed an index to control tradeoff between neuro-fuzzy model
performance and complexity. The authors also provided some insights into fuzzy partition properties, ideal fuzzy set
shape and evaluation of fuzzy rules. The authors gave evaluations about controversial interpretability properties. The
methods presented in this study helped to obtain best choice in terms of semantic interpretability at both fuzzy set level
and the partition level and also allowed employing gradient descent optimization method. Similarly, Łapa et al. [28]
proposed a novel method for obtaining an interpretable neuro-fuzzy system based on the appropriate use of parametric
triangular norms with weights of arguments. In addition the authors used a modified learning algorithm to select both
the model structure and its parameters with interpretability under consideration. The method was proved to be success-
ful on some well-known non-linear problems. Some studies were also conducted recently on the application of some
neuro-fuzzy models designed with the goal of interpretability for various real world problems. For example, Alonso et
al. [27] employed HLK (highly interpretable linguistic knowledge) neuro-fuzzy model for designing medical decision
support system. The authors used the system to predict the evolution of end-stage renal disease (ESRD) in people
affected by Immunoglobin nephropathy. Dian et al. [30] integrated particle swarm optimization (PSO) and ANFIS for

improved accuracy and better interpretability. PSO was used to find the optimal number of fuzzy rules which was also
helpful in improving the interpretability of the system in addition to enhancing accuracy. The modelling technique
was applied on some benchmark classification problems and showed good results.
In literature some studies [12,13] used various constraints on the fuzzy sets of the fuzzy rules to ensure the in-
terpretability of the system. Many of these constraints can be ensured at the rule base induction stage and some of
these are ensured at the rule base simplification stage for example by removing highly overlapping fuzzy sets by
fuzzy set merging [6]. But during the parameter optimization stage there is no guarantee that the final system follows these restrictions, due to which a complex rule base that is difficult to interpret may emerge. Therefore, some studies
introduced constrained parameter learning techniques that ensure interpretability of the neuro-fuzzy system after the
parameter optimization stage. Nauck et al. [4,14] used a constrained heuristic learning technique so that unconstrained
fine-tuning of the membership function parameters is not allowed. Constrained learning ensures that the membership
functions stay consistent, fuzzy sets do not exchange positions and have a certain degree of overlapping. But the
learning technique was not so efficient in terms of approximation accuracy [15]. Paiva and Dourado [15] introduced a
constrained gradient based technique for rule base fine-tuning. Rule base simplification based on similarity measure
was also implemented in this study. This technique demonstrated better approximation accuracy. Thus, in literature
interpretability in neuro-fuzzy systems has been ensured by using various methods during rule base induction, post
processing the initial rule base and use of well-organized parameter optimization techniques. But the impact of rule
base simplification and constrained learning on interpretability-accuracy balance in case of ANFIS has not been ana-
lyzed so far.

3. Adaptive neuro-fuzzy inference system (ANFIS)

Jang proposed ANFIS in 1993 [2]. ANFIS has three core parts: fuzzy rule base, membership functions defining
fuzzy sets in fuzzy rules and a reasoning mechanism. ANFIS is an adaptive fuzzy model that uses gradient descent
based optimization methods for tuning the membership function parameters. The architecture of ANFIS can be rep-
resented as a multilayer ANN like connectionist structure to represent the computations and dataflow through the
fuzzy model in order to formalize the use of learning techniques. ANFIS uses TSK fuzzy model which implies the
antecedents of fuzzy rules in the rule base consist of fuzzy sets corresponding to each model input variable and the
consequents are linear combinations of a constant and an input variable. A fuzzy rule base for an ANFIS with two
rules using two input variables x1 and x2 can be outlined as:

R1: if x1 is A1 and x2 is B1 then f1 = p1 x1 + q1 x2 + r1


R2: if x1 is A2 and x2 is B2 then f2 = p2 x1 + q2 x2 + r2

where Ai and Bi are the fuzzy sets corresponding to inputs x1 and x2 respectively, pi , qi and ri are the linear
consequent parameters. An equivalent ANFIS structure is shown in Fig. 1. This connectionist network structure is
based on six layers:

Layer 0: This layer represents the external inputs for ANFIS. This layer is not usually shown in the main ANFIS
structure.

Layer 1: This is the fuzzification layer which is built using membership functions corresponding to fuzzy sets of each
input variable. The membership function takes the input variable value and outputs the membership degree of the
input which lies between 0 and 1. This is the fuzzified value of the crisp input. Each node in this layer corresponds to
an adaptive membership function with output given by:

O1,i = μAj (x) (1)

where O1,i is the output of node i, μAj (x) is the output of the membership function for fuzzy set Aj . Each mem-
bership function is a piecewise differentiable and continuous function which implies that the function parameters can
be updated using a gradient descent based learning technique. Various types of membership functions can be used for
ANFIS but commonly used ones are Gaussian and generalized Bell functions.

Fig. 1. ANFIS architecture with two inputs and one output.

Layer 2: In this layer each node computes the product of outputs from the previous layer which corresponds to the
strength of a fuzzy rule. The output of i-th node in this layer is given by:

O2,i = wi = μAi (x1 )μBi (x2 ) (2)


Here O2,i represents the product of the membership values μAi(x1) and μBi(x2), which gives the firing strength of the ith rule. For the first rule shown above, the firing strength can be written as μA1(x1)μB1(x2). Instead of the product, any other fuzzy t-norm, for example the min operator, can be used.
Layer 3: Each node in this layer corresponds to the normalization of the firing strength of a fuzzy rule. Each j -th
node computes the normalized rule strength as the ratio of the firing strength of the j-th rule to the sum of the firing strengths of all rules. For example, the output of the j-th node in this layer is obtained as:
O_{3,j} = \bar{w}_j = \frac{w_j}{\sum_{i=1}^{R} w_i} \qquad (3)
where wi is the firing strength of ith rule and R is the number of rules in the rule base.
Layer 4: This layer is also an adaptive layer, like layer 1, with nodes that have updatable parameters associated with
them. The output of each node is a linear function given by:

O_{4,i} = \bar{w}_i f_i = \bar{w}_i \, (p_i x_1 + q_i x_2 + r_i) \qquad (4)


where O4,i is the output of node i in this layer, pi , qi and ri are the function coefficients that are updated during
parameter optimization phase. There are n + 1 parameters corresponding to n input variables (in our case there are 2
input variables).
Layer 5: This layer has a single node that computes the overall output as the sum of all incoming signals:

O_{5,1} = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i} \qquad (5)

where O5,1 is the final output available to the user and \bar{w}_i f_i is the output of node i in layer 4.
For optimization of the rule base parameters either standard error back-propagation algorithm or hybrid learning
algorithm based on gradient descent method and least square estimation (LSE) can be used in case of ANFIS [2].
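For illustration, the layer computations above can be sketched in a few lines of Python/NumPy. This is a minimal evaluation of a first-order TSK model with Gaussian membership functions for the two-rule example, not the author's implementation; all parameter values are arbitrary placeholders.

    import numpy as np

    def gauss(x, c, sigma):
        # Layer 1: Gaussian membership degree (see eq. (29))
        return np.exp(-(x - c) ** 2 / (2 * sigma ** 2))

    def anfis_forward(x1, x2, mf_params, consequents):
        # mf_params[r] = ((c_A, s_A), (c_B, s_B)) for rule r; consequents[r] = (p, q, r)
        w = np.array([gauss(x1, cA, sA) * gauss(x2, cB, sB)            # Layer 2: firing strengths
                      for (cA, sA), (cB, sB) in mf_params])
        w_bar = w / w.sum()                                            # Layer 3: normalization (eq. (3))
        f = np.array([p * x1 + q * x2 + r for p, q, r in consequents]) # rule consequents
        return np.dot(w_bar, f)                                        # Layers 4-5: weighted sum (eqs. (4)-(5))

    # Hypothetical parameters for the two-rule example above
    mf_params = [((-2.0, 1.5), (0.0, 2.0)), ((2.0, 1.5), (1.0, 2.0))]
    consequents = [(0.5, 0.1, -0.2), (-0.3, 0.4, 0.1)]
    print(anfis_forward(0.5, 1.0, mf_params, consequents))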

4. Data driven rule base induction

Using data driven fuzzy modelling, fuzzy rule base is generated automatically from input–output data patterns. In
case of TSK fuzzy system modelling, given n input–output data patterns, the goal is to find non-linear parameters in
antecedents and linear parameters in rule consequents plus a minimum number of fuzzy rules that approximate the real
system as accurately as possible. Data driven rule base induction is commonly performed using clustering algorithms.
A clustering algorithm partitions the dataset into several clusters that capture the internal trends in data space. Each
data cluster is a fuzzy relation and corresponds to a fuzzy rule. The fuzzy sets of the rules are typically obtained by
projecting the identified clusters against the corresponding data axes [6]. This process however, results in a rule base
that usually exhibits redundancy in the form of highly overlapping similar fuzzy sets as shown in Fig. 2. For a Mamdani
model [16] parameters of fuzzy sets in both rule premises and consequents are obtained by this method but for a TSK

Fig. 2. Data clusters and overlapping fuzzy sets.

model fuzzy sets are only in the premises of fuzzy rules and can be obtained using this method. The parameters in
consequents are obtained using a cluster covariance matrix [17] or some other parameter estimation technique.
Subtractive clustering [18] is a widely used, efficient and simple clustering algorithm to obtain the initial rule base
for ANFIS. It is a well-known data clustering algorithm based on the improved Mountain method of data partitioning.
The user need not set the number of clusters as the algorithm automatically determines the best possible number of
clusters for a given input–output dataset. However, the resulting rule base size depends on the cluster neighborhood
radius. A small value results in a large rule base and therefore high model complexity while a large value leads to a
small rule base resulting in a poor model. To obtain the best possible value of this parameter usually trial and error
method is used which is also followed in this paper. The next section gives a detailed overview of this clustering
algorithm.

4.1. Subtractive clustering

Subtractive clustering is based on calculating the potential function called mountain value at each data point of a
dataset. It uses each input data point in the dataset as a potential cluster center rather than the grid based formulation of the mountain clustering method. Thus this method achieves lower computational complexity for higher dimensional
data sets.
In subtractive clustering, the potential value at each data point di of a dataset D = {d1 , d2 , . . . , dP } is given by:

P_i = \sum_{j=1}^{P} e^{-\alpha \, \|d_i - d_j\|^{2}}, \qquad \alpha = \frac{4}{r_a^{2}} \qquad (6)

where P is the number of data patterns and ra is a positive constant called the cluster radius, which defines the range of influence of
a cluster center along each dimension and affects the number of clusters generated. A smaller radius leads to a higher
number of clusters which may lead to over-fitting and vice versa. Therefore to find the appropriate number of clusters
for a dataset various values of radii may be tested and the one with the best results should be chosen. The data point
with the highest potential value P1 is selected as the cluster center c1 . In order to find the subsequent cluster centers
the potential values for each data point di are modified as:
P_i = P_i - P_1 \, e^{-\beta \, \|d_i - c_1\|^{2}}, \qquad \beta = \frac{4}{r_b^{2}}, \qquad r_b = \eta \, r_a \qquad (7)
where r_b is a positive constant and η is the squash factor used to squash the potential values of distant points considered as part of a cluster. The reductions in the potential values of data points near the newly found cluster center are larger than those for distant points, so these nearby points have less chance of being selected as cluster centers. After
reducing the potential values, the data point with the highest potential is selected as the next cluster center and again
the potential of the rest of data points is reduced. In general when kth cluster center is selected, the potential value of
rest of the points is updated using:

P_i = P_i - P_k \, e^{-\beta \, \|d_i - c_k\|^{2}} \qquad (8)

At the end of this process n cluster centers are obtained. Each cluster center is used as a basis of obtaining a single
fuzzy rule that can describe the system behavior in a region of input output space. As a result in case of subtractive
clustering and ANFIS, if n data clusters are obtained after clustering process then each input variable for ANFIS has
n fuzzy sets associated with it and there are n fuzzy rules in the rule base.
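As an illustration, the potential updates of eqs. (6)-(8) can be sketched as follows. This simplified version stops after a fixed number of centers, whereas the full algorithm uses acceptance and rejection thresholds on the remaining potential; the data and radius below are hypothetical.

    import numpy as np

    def subtractive_clustering(data, ra=0.5, eta=1.25, n_centers=4):
        # data: (P, d) array of normalized input-output patterns
        alpha = 4.0 / ra ** 2
        beta = 4.0 / (eta * ra) ** 2
        d2 = ((data[:, None, :] - data[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
        potential = np.exp(-alpha * d2).sum(axis=1)                 # eq. (6)
        centers = []
        for _ in range(n_centers):
            k = int(np.argmax(potential))                           # point with the highest potential
            centers.append(data[k])
            # eqs. (7)-(8): subtract the influence of the newly selected center
            potential -= potential[k] * np.exp(-beta * ((data - data[k]) ** 2).sum(-1))
        return np.array(centers)

    # Hypothetical usage on random data
    centers = subtractive_clustering(np.random.rand(200, 3), ra=0.5)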
As depicted in Fig. 2 overlapping fuzzy sets may be present in the same region of a variable domain after clustering
and lead to redundancy as the model uses multiple fuzzy sets to represent the same domain area. This unnecessarily
increases model complexity and also complicates the linguistic labeling of the rule base. In case of a simple non redun-
dant rule base it is easier to assign meaningful labels to the fuzzy sets which can facilitate the linguistic interpretation
of a fuzzy model. Therefore, the resulting rule base needs to be analyzed for redundancy and simplified before tuning
the parameters of fuzzy sets so that the final model is interpretable.

5. Similarity measures

Fuzzy sets are similar if the membership functions defining them are highly overlapping, which results in approx-
imately equal membership degrees of the elements in a domain [6]. The most widely accepted and deeply studied
methods in literature for quantifying similarity between fuzzy sets are based on similarity measures [19–22]. A sim-
ilarity measure S between two fuzzy sets is a function that assigns a similarity value s to two fuzzy sets A and B
i.e.,

S(A, B) = s where s ∈ [0, 1] (9)


The similarity value indicates degree to which the fuzzy sets A and B are equal. A higher value indicates more
similar fuzzy sets with high overlapping and vice versa. If μA and μB are the membership functions associated with
fuzzy sets A and B respectively and U is the universe of discourse, then a similarity measure should satisfy following
criteria:

(1) If A and B are non-overlapping fuzzy sets then

∀x ∈ U, S(A, B) = 0 ⇔ μA (x)μB (x) = 0 (10)


(2) If A and B are equal sets then

∀x ∈ U, S(A, B) = 1 ⇔ μA (x) = μB (x) (11)


(3) If A and B are overlapping sets then

∃x ∈ U, S(A, B) > 0 ⇔ μA(x)μB(x) ≠ 0 (12)


(4) Similarity is not altered if the domain in which fuzzy sets are defined is scaled or shifted i.e.,
 
S(A, B) = S(A', B'), \quad \mu_A(x) = \mu_{A'}(nx + m) \ \text{and} \ \mu_B(x) = \mu_{B'}(nx + m), \quad m, n \in \mathbb{R}, \ m > 0 \qquad (13)

Various similarity measures have been proposed in literature, an in depth description of which can be found in
[17]. Similarity measures are divided into two main categories viz. geometric similarity measures and set theoretic
similarity measures. In case of geometric similarity measures fuzzy sets are considered as points in the data space
and similarity is defined as the inverse of the distance between them. Many geometric similarity measures have
been defined including those based on the Minkowski distance [6] and the generalized Hausdorff distance [23]. Set theoretic
similarity measures are considered apt for rule base simplification as these determine similarity between overlapping
fuzzy sets more appropriately and are not affected by ordering and scaling of a variable domain [24]. These measures
are based on the fuzzy set operations like union, intersection, etc. Below is an effective set theoretic similarity measure
used in practice which is based on fuzzy set union and intersection:
S(A, B) = \frac{|A \cap B|}{|A \cup B|} \qquad (14)

where |·| denotes fuzzy set cardinality (the sum of the membership degrees), and ∩ and ∪ are fuzzy set intersection and union operations respectively. This similarity
measure satisfies all the criteria for similarity measures mentioned above and has been proven to be quite adequate in
similarity analysis of fuzzy sets [5].
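For a discretized universe of discourse, the measure of eq. (14) can be evaluated numerically. The short sketch below assumes Gaussian membership functions with arbitrary parameters; it is an illustration rather than the paper's implementation.

    import numpy as np

    def gauss(x, c, sigma):
        return np.exp(-(x - c) ** 2 / (2 * sigma ** 2))

    def similarity(params_a, params_b, domain):
        # Set-theoretic similarity of eq. (14): |A ∩ B| / |A ∪ B| on a discretized domain
        mu_a = gauss(domain, *params_a)
        mu_b = gauss(domain, *params_b)
        return np.minimum(mu_a, mu_b).sum() / np.maximum(mu_a, mu_b).sum()

    x = np.linspace(0.0, 10.0, 501)                  # discretized universe of discourse
    print(similarity((4.0, 1.0), (4.5, 1.2), x))     # highly overlapping sets -> value close to 1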

6. Rule base simplification procedure

In order to simplify the initial rule base obtained from data driven clustering process we use a simplification
technique based on minimizing the number of fuzzy sets for each input variable of ANFIS by eliminating redundant
fuzzy sets and then removing the redundant fuzzy rules. This simplification procedure depicted in Fig. 4 can be broadly
divided into two phases: in the first phase fuzzy set merging is done using similarity analysis through similarity
measures, and in second phase fuzzy rules with equivalent premises are merged. The two phases are discussed in
below subsections:

6.1. Fuzzy set merging

In this paper, the similarity measure defined in eq. (14) is used to obtain the degree of similarity between each
pair of fuzzy sets for each input variable. Since fuzzy sets in the rule base are defined by membership functions, for a
discrete universe U = {xi |i = 1, 2, . . . , n}, the eq. (14) for fuzzy sets A and B can be expressed as:
S(A, B) = \frac{\sum_{i=1}^{n} \left[\mu_A(x_i) \wedge \mu_B(x_i)\right]}{\sum_{i=1}^{n} \left[\mu_A(x_i) \vee \mu_B(x_i)\right]} \qquad (15)
where μA and μB are the membership functions associated with A and B respectively. Symbols ∧ and ∨ represent fuzzy intersection (minimum) and union (maximum) respectively. A similarity threshold λ for the degree of similarity is also used, indicating the
similarity value above which the similarity between two fuzzy sets is considered significant. The value of the threshold
significantly affects the model accuracy and interpretability. A smaller value implies that more fuzzy sets have a similarity above the threshold and are merged, which may lead to an over-simplified model with decreased accuracy. On the other hand, a larger value may retain redundancy in the model due to less merging, which can also decrease accuracy through over-fitting. Therefore, it is an important parameter in balancing the interpretability
and the approximation accuracy of the model.
At each step of the merging process similarity measure is calculated between each distinct pair of fuzzy sets of each
input variable. Then a pair of fuzzy sets A and B with highest similarity value above similarity threshold is selected
from the rule base and replaced by a new fuzzy set in all the fuzzy rules where A and B are present. The new fuzzy
set is obtained by merging the parameters of the membership functions of these two fuzzy sets. In this study, we have
used the following method [15] to obtain the new fuzzy set C:
C_p = \frac{n_A A_p + n_B B_p}{n_A + n_B} \qquad (16)
where Cp is the vector of parameters defining fuzzy set C, nA and nB represent the number of fuzzy sets merged
before obtaining fuzzy sets A and B respectively. The above method gives more weight to the fuzzy set obtained after
merging the fuzzy sets in previous iterations. This gives better results than un-weighted average of parameters of the
two fuzzy sets being merged. The merging procedure continues until there is no pair of fuzzy sets that has similarity
value greater than merging threshold.
During the merging process the rule base is also updated as the fuzzy set obtained by merging two fuzzy sets
replaces both the fuzzy sets in all the fuzzy rules of the rule base. As a result redundant fuzzy rules may appear in the
rule base for which rule merging is done to reduce number of fuzzy rules.
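A minimal sketch of the weighted merging rule of eq. (16) for Gaussian membership functions is given below; the merge counters track how many original fuzzy sets each current set already aggregates, and the parameter values are arbitrary.

    import numpy as np

    def merge_fuzzy_sets(params_a, n_a, params_b, n_b):
        # params are (center, sigma) vectors; eq. (16): weighted average of the parameters
        params_a = np.asarray(params_a, dtype=float)
        params_b = np.asarray(params_b, dtype=float)
        params_c = (n_a * params_a + n_b * params_b) / (n_a + n_b)
        return params_c, n_a + n_b      # the merged set now represents n_a + n_b original sets

    # Example: A has already absorbed two original sets, B is still an original set
    c_params, c_count = merge_fuzzy_sets((4.0, 1.0), 2, (4.5, 1.2), 1)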

6.2. Fuzzy rule merging

The merging of fuzzy sets and replacing the original fuzzy sets with merged fuzzy set in the first phase of rule
base simplification may lead to several fuzzy rules with equivalent premises and consequents in case the initial rule
base is highly redundant. The premise or the consequent parts of two fuzzy rules are equivalent if these have the same
membership functions for each of the corresponding input and output variable. In case of Mamdani fuzzy models either

Fig. 3. Merging of two fuzzy sets (dotted curves) into a single fuzzy set (solid curve).

both the premises and consequents of some rules may become equal or only the premises may happen to be equal.
As we are concerned with ANFIS which is a TSK model, only the premises of fuzzy rules may become equivalent
on fuzzy set merging because rule consequents are not fuzzy at all in case of a TSK fuzzy system. The presence of
such fuzzy rules with equivalent premises and unequal consequents leads to inconsistency and redundancy in the rule
base which needs to be addressed. When the rule base has m fuzzy rules with equal premises, m − 1 of these rules
are removed from the rule base. However, the consequent parameters of the fuzzy rule kept in the rule base need to be
re-estimated. This process is referred to as fuzzy rule merging (Fig. 3).
In order to remove the redundant fuzzy rules, m fuzzy rules with equal premises are replaced by a single fuzzy
rule with premise part equal to that of m original rules and consequent parameters are obtained by averaging the
corresponding parameters of m equal rules i.e.,

P = \frac{1}{m} \sum_{i=1}^{m} P_i \qquad (17)

where Pi is the vector of the consequent parameters of the ith rule. The whole rule base simplification procedure is
depicted in Fig. 4, and the algorithm is summarized below.
Given an initial fuzzy rule base with N fuzzy rules, n input variables and a preset λ, λ ∈ (0, 1), the algorithm is
given below:
REPEAT:
    Step 1: for j = 1 to n
                for i = 1 to N
                    for k = 1 to N
                        s_ikj = S(A_ij, A_kj)
            set s_lmq = max_{i ≠ k} (s_ikj)
    Step 2: if s_lmq ≥ λ
                merge A_lq and A_mq into A
                set A_lq = A and A_mq = A in all fuzzy rules.
UNTIL: no two fuzzy sets in the rule base have similarity value s_ikj ≥ λ, i ≠ k.
Step 3: for i = 1 to N
            for j = 1 to N
                if premise(R_i) = premise(R_j), i ≠ j
                    i. merge R_i and R_j into R
                    ii. add R to the rule base
                    iii. remove R_i and R_j from the rule base
The optimal value of λ has been obtained by experimenting with different values and generally varies according to the problem at hand. A much larger value of λ may retain complexity in the model because some redundancies remain due to less fuzzy set merging. On the other hand, a much smaller value may unnecessarily
remove some important fuzzy sets and fuzzy rules from the rule base due to excessive fuzzy set merging which may
result in a poor under-fitted model. In case the initial model is overestimated, rule base simplification also improves
model generalization capability in addition to improving interpretability.
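The rule merging of Step 3 with the consequent averaging of eq. (17) can be sketched as follows; the representation of rules as (premise, consequent) pairs is a hypothetical data structure chosen for illustration only.

    import numpy as np
    from collections import defaultdict

    def merge_equal_premise_rules(rules):
        # rules: list of (premise, consequent), where premise is a tuple of fuzzy set ids
        # and consequent is the vector (p, q, ..., r) of linear parameters
        groups = defaultdict(list)
        for premise, consequent in rules:
            groups[premise].append(np.asarray(consequent, dtype=float))
        # eq. (17): replace each group by a single rule with averaged consequent parameters
        return [(premise, np.mean(cons, axis=0)) for premise, cons in groups.items()]

    rules = [(("A1", "B1"), (0.5, 0.1, -0.2)),
             (("A1", "B1"), (0.3, 0.3, 0.0)),     # same premise as the first rule
             (("A2", "B2"), (-0.3, 0.4, 0.1))]
    print(merge_equal_premise_rules(rules))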

Fig. 4. Rule base simplification procedure.

7. Training methodology

Using rule base simplification, ANFIS becomes more tractable in terms of interpretability but the approximation
accuracy of the system is not acceptable at this stage. A parameter fine-tuning algorithm is needed to train the system
so that a model with optimal or satisfactory accuracy is obtained. In case of ANFIS, system parameters are rule
base parameters that can be decomposed into two sets: one is the set A of non-linear parameters of membership
functions related to rule antecedents, and the other is the set B of linear parameters in rule consequents. We use a hybrid
back-propagation based learning technique in batch mode that combines gradient descent (GD) and least square
estimation (LSE) methods. This algorithm is more efficient than standalone GD method in that it is faster and has
lesser chances of getting stuck in local minima [2]. Each epoch of the hybrid learning method consists of a forward
and a backward pass. During forward pass an instance of a data set is passed through ANFIS structure via input layer
and outputs are obtained from output layer. If X is the data set of size M, |B| = P and Y is the output vector, for fixed
set of non-linear system parameters in A, the three can be put in matrix equation form as:

XB = Y (18)
where dimensions of X, B and Y are M × P , P × 1 and M × 1. This becomes an over-determined problem without
an exact solution as the size of data sets is usually greater than the number of the linear parameters. Therefore, a least

square estimate of B, denoted \hat{B}, is used that minimizes the squared error \|XB - Y\|^{2}. The value of \hat{B} is obtained using a widely known formula based on the pseudo-inverse of X:

\hat{B} = \left(X^{T} X\right)^{-1} X^{T} Y \qquad (19)
where X T is the transpose of X and (X T X)−1 X T is the pseudoinverse of X. The above formula is computationally
expensive and can be computed iteratively using sequential formulas as used in [25]:
B_{i+1} = B_i + S_{i+1} \, x_{i+1} \left( y_{i+1}^{T} - x_{i+1}^{T} B_i \right) \qquad (20)

S_{i+1} = S_i - \frac{S_i \, x_{i+1} \, x_{i+1}^{T} \, S_i}{1 + x_{i+1}^{T} \, S_i \, x_{i+1}}, \qquad 0 \le i \le M - 1 \qquad (21)

where x_i^T is the ith row vector of X, y_i is the ith element of Y, and S_i is the covariance matrix. Initially B_0 = 0 and S_0 = αI, where α is a positive number and I is an identity matrix of size P × P. In case of a system with n outputs, the derivation remains valid except that y_i^T is the ith row of matrix Y.
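A compact sketch of the sequential update of eqs. (20)-(21) is shown below. Here X would hold the rule-weighted regressors produced in the forward pass; the value of α and the test data are assumptions made for illustration.

    import numpy as np

    def recursive_lse(X, Y, alpha=1e6):
        # Sequential estimation of the linear consequent parameters (eqs. (20)-(21))
        M, P = X.shape
        B = np.zeros(P)
        S = alpha * np.eye(P)                              # covariance matrix, S_0 = alpha * I
        for i in range(M):
            x = X[i]                                       # i-th row of X
            Sx = S @ x
            S = S - np.outer(Sx, Sx) / (1.0 + x @ Sx)      # eq. (21)
            B = B + S @ x * (Y[i] - x @ B)                 # eq. (20)
        return B

    # Hypothetical usage: recovers the parameters of an over-determined linear system
    X = np.random.rand(100, 3)
    true_B = np.array([1.0, -2.0, 0.5])
    B_hat = recursive_lse(X, X @ true_B)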
Therefore, during each forward pass a data set pattern is passed into the network, the linear parameter set B is adjusted, and the output is obtained while the non-linear parameter set A stays constant. Next, based on fixed linear parameter values in B,
non-linear parameters of fuzzy sets in rule antecedents are updated. Using the outputs for each input data pattern p,
error measure Ep is calculated as the sum of squared errors as:

E_p = \sum_{i=1}^{n} \left(T_{i,p} - Y_{i,p}\right)^{2} \qquad (22)

where, n is the number of system outputs, Yi,p is the ith system output and Ti,p is the real output value for input
pattern p. Using per pattern error measure the overall network error is calculated as:


E = \sum_{p=1}^{M} E_p \qquad (23)

Starting from the nodes in output layer, a backward pass is done so that non-linear parameters in A are adjusted
in the direction of minimum error. This is done iteratively using gradient descent method by calculating error rates
from node activations at the nodes in various layers. The error rate at the ith output node is calculated from the
corresponding output value (node activation) as:
\frac{\partial E_p}{\partial Y_{i,p}} = -2\left(T_{i,p} - Y_{i,p}\right) \qquad (24)
The error rate at a node i in the hidden layer K is obtained using chain rule and is given by:
\frac{\partial E_p}{\partial Y_{i,p}^{K}} = \sum_{j=1}^{n_{K+1}} \frac{\partial E_p}{\partial Y_{j,p}^{K+1}} \, \frac{\partial Y_{j,p}^{K+1}}{\partial Y_{i,p}^{K}} \qquad (25)

where n_{K+1} is the number of nodes in the layer next to layer K. This gives the error rate at a hidden layer node as
the linear combination of the error rates at nodes in the next layer. If w is a generic updatable parameter, the error rate
with respect to w for a data pattern p is given by:
\frac{\partial E_p}{\partial w} = \sum_{Y^{*} \in Q} \frac{\partial E_p}{\partial Y^{*}} \, \frac{\partial Y^{*}}{\partial w} \qquad (26)

where Q is the set of nodes whose output depends on the parameter w. The derivative of the overall network error with respect to w is given by:

\frac{\partial E}{\partial w} = \sum_{p=1}^{M} \frac{\partial E_p}{\partial w} \qquad (27)

Using eq. (27) and learning rate η the generic parameter w is updated using
\Delta w = -\eta \, \frac{\partial E}{\partial w}, \qquad \eta = \frac{s}{\sqrt{\sum_{w} \left(\frac{\partial E}{\partial w}\right)^{2}}} \qquad (28)
In case of a TSK fuzzy model the generic parameter w refers to a parameter of a membership function in the fuzzy
rule antecedent, η is the learning rate and s is the step size which is a parameter that may have an impact on the
convergence speed of the network. Using a small value for s helps to closely trail the gradient path but reduces the
convergence speed while a large value increases the convergence speed but the system may fluctuate about the optimal
solution. In this paper, two heuristic rules used in [2] have been used for updating s and gave satisfactory results:

i. If the error measure E is reduced four times consecutively, s is increased by 10%.

ii. If the error measure shows a combination of an increase followed by a decrease twice consecutively, s is decreased by 10%.
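These two heuristics can be coded directly; the sketch below is one possible realization that tracks the five most recent error values, an interpretation of the rules rather than the author's exact implementation.

    def adapt_step_size(s, error_history):
        # error_history: list of overall error values E, most recent last
        if len(error_history) >= 5:
            last = error_history[-5:]
            # rule i: four consecutive reductions -> increase s by 10%
            if all(last[i + 1] < last[i] for i in range(4)):
                return s * 1.10
            # rule ii: two consecutive increase-then-decrease combinations -> decrease s by 10%
            if last[1] > last[0] and last[2] < last[1] and last[3] > last[2] and last[4] < last[3]:
                return s * 0.90
        return s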

7.1. Constraints during parameter learning

The parameter fine-tuning process in ANFIS may lead to a complex fuzzy rule base due to the unrestrained mod-
ification of membership function parameters. This results in a final fuzzy system with poor transparency even if an
interpretability oriented approach is followed during structure learning. In order to guarantee interpretability of the
final fuzzy system, constraints must be applied before applying updates to the parameters so that the adjacent mem-
bership functions do not overlap excessively i.e. stay distinguishable, do not have parameter values that are invalid
and remain consistent, do not exchange position with the adjacent membership functions and so on. This can be done
by considering the semantic properties of the membership functions based on width and position parameters which
determine the support size and location of a membership function respectively. We demonstrate the approach to retain
interpretability in terms of the well-known Gaussian membership function defined by eq. (29). The center c of the function determines its location and the width σ determines its support; σ is in fact the standard deviation of the function for an input x in the domain in which the variable is defined.
f(x; c, \sigma) = e^{-\frac{(c - x)^{2}}{2\sigma^{2}}} \qquad (29)
It is important to ensure that consistency of the membership functions is maintained during parameter updates. For
example, in case of a Gaussian membership function, if σ becomes negative, the membership function is incorrect and
therefore to maintain membership function validity, σ is set to a small value or simply zero. Similarly, if on update
the function center assumes a value that is lower than the lowest bound of the domain or greater than highest value,
the center is changed to be equal to the corresponding bound value that it crosses. That is if X is the domain in which
an input variable is defined having a membership function with center c,
if c < Xmin then c = Xmin and (30)
if c > Xmax then c = Xmax (31)
To ensure sufficient but proportionate overlapping between the membership functions either possibility measure
between each pair of fuzzy sets may be used or updates to the support parameters of the functions may be constrained.
In this work we have used the later method. If on update the extreme of the support of a Gaussian membership function
f becomes greater than the extreme of the adjacent function j on right side, overlapping is excessive. This is the case
when:
cf + 3σf > cj + 3σj (32)
To avoid this condition σf is changed as:
\sigma_f = \frac{c_j + 3\sigma_j - c_f}{3} \qquad (33)
In case excessive overlapping occurs with left neighboring function i i.e.:
c_f - 3\sigma_f < c_i - 3\sigma_i \qquad (34)

σf is changed as:

\sigma_f = \frac{c_i - 3\sigma_i - c_f}{-3} \qquad (35)
In case of a generic two sided membership functions the above constraints are applied on the updates of the position
and support parameters of both the left and right components of the function.
For avoiding a membership function being completely or almost completely included in other function, the distance
between membership functions has to be monitored during parameter learning phase. If on updating the center of a
membership function k, the condition:

c_k - c_i \ge \gamma \, (X_{max} - X_{min}) \qquad (36)


with its left neighboring function i does not hold, the centers of both the membership functions need to be changed
as:
c_k = \frac{c_k + c_i}{2} + \frac{\gamma \, (X_{max} - X_{min})}{2} \qquad (37)

c_i = \frac{c_k + c_i}{2} - \frac{\gamma \, (X_{max} - X_{min})}{2} \qquad (38)
where γ in eq. (36) is the fraction of the domain width used as the minimum allowed distance between adjacent membership function centers. In
the same way inclusion with the right neighboring membership function can be removed. The no pass constraint
i.e. the adjacent functions do not exchange positions is ensured by simply comparing the position parameters of the
adjacent membership functions and changing them appropriately. The constraints can be applied after every epoch or
periodically after n epochs but the experiments indicated that this changes the accuracy and interpretability results of
the model accordingly. The constraints may be relaxed if the accuracy of the model is not satisfactory but it may affect
the interpretability of the system.
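The constraints above can be enforced by a post-processing step over the (sorted) membership functions of each input variable after every epoch or every n epochs. The sketch below is one possible realization for Gaussian functions covering validity, the domain bounds of eqs. (30)-(31), the overlap limits of eqs. (32)-(35) and the minimum separation of eqs. (36)-(38); the tolerance eps and the default γ are assumed values.

    import numpy as np

    def enforce_constraints(centers, sigmas, x_min, x_max, gamma=0.1, eps=1e-3):
        # centers, sigmas: 1-D arrays describing the Gaussian MFs of one input variable
        order = np.argsort(centers)        # order by center (simple handling of the no-pass constraint)
        centers = np.asarray(centers, dtype=float)[order]
        sigmas = np.asarray(sigmas, dtype=float)[order]
        sigmas = np.maximum(sigmas, eps)                 # validity: sigma must stay positive
        centers = np.clip(centers, x_min, x_max)         # eqs. (30)-(31): keep centers inside the domain
        d_min = gamma * (x_max - x_min)                  # eq. (36): minimum allowed center distance
        for k in range(1, len(centers)):
            # eqs. (32)-(33): the right support edge must not pass that of the right neighbour
            if centers[k - 1] + 3 * sigmas[k - 1] > centers[k] + 3 * sigmas[k]:
                sigmas[k - 1] = (centers[k] + 3 * sigmas[k] - centers[k - 1]) / 3.0
            # eqs. (34)-(35): the left support edge must not pass that of the left neighbour
            if centers[k] - 3 * sigmas[k] < centers[k - 1] - 3 * sigmas[k - 1]:
                sigmas[k] = (centers[k] - centers[k - 1] + 3 * sigmas[k - 1]) / 3.0
            # eqs. (37)-(38): push adjacent centers apart if they are closer than d_min
            if centers[k] - centers[k - 1] < d_min:
                mid = 0.5 * (centers[k] + centers[k - 1])
                centers[k - 1], centers[k] = mid - d_min / 2.0, mid + d_min / 2.0
        return centers, sigmas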

8. Experimental results

In this section we provide the experimental analysis of the effectiveness of rule base simplification and constrained
learning in ANFIS by application to two well-known forecasting problems and a real world stock price prediction
problem. In all the experiments Gaussian membership functions have been used for all the input variables. A compari-
son with conventional ANFIS on the basis of forecasting accuracy and interpretability aspects has been provided and
discussed.

8.1. Non-linear function approximation

In this section the proposed interpretability oriented neuro-fuzzy modelling approach to ANFIS is used for approx-
imating a well-known non-linear sinc function given by:
\mathrm{sinc}(x, y) = \frac{\sin(x)}{x} \times \frac{\sin(y)}{y} \qquad (39)
Using the inputs in the range of [−15, 15] × [−15, 15] we obtained 372 input–output patterns. From this data set,
170 instances were used as training data, 102 instances as checking data and 100 records as test data.
The data was used in the structure learning phase to construct initial fuzzy rule base for ANFIS. As already dis-
cussed in section 4.1, the numerical value of cluster radius ra affects the number of data clusters obtained and therefore
the number of fuzzy rules and fuzzy sets per variable obtained after applying subtractive clustering. The larger the value of r_a, the smaller the number of clusters, and vice versa. Therefore, a trial and error method is followed to find the optimal
value of ra which is the one that gives least error on test data. For this approximation problem cluster radius of 0.5
was optimal with least training and testing RMSE. It resulted in 4 clusters and initial rule base for ANFIS with 4 fuzzy
rules and 4 membership functions per input variable.

Table 1
Rule base simplification results.

Model                           Fuzzy sets per input variable (X / Y)   Linear parameters   Non-linear parameters
Conventional ANFIS              4 / 4                                    12                  16
Simplified ANFIS (λ = 0.80)     4 / 2                                    12                  12
Simplified ANFIS (λ = 0.75)     3 / 2                                    9                   10
Simplified ANFIS (λ = 0.70)     2 / 2                                    9                   8

Fig. 5. Training RMSE plots of ANFIS based on proposed approach and conventional ANFIS.

Fig. 6. Checking RMSE plots of ANFIS based on proposed approach and conventional ANFIS.

ANFIS with 4 fuzzy rules in the rule base was subsequently passed through simplification phase to remove any
redundant fuzzy sets and fuzzy rules. It can be observed from Table 1 how the simplification process using different values of the similarity threshold λ reduces the number of fuzzy sets in the rule base through fuzzy set merging.
Fuzzy rule merging may also occur in case fuzzy set merging is high which reduces the number of fuzzy rules in
the rule base. Table 1 also depicts the structural properties of conventional ANFIS on same data for comparison
purpose.
For improving approximation accuracy, different simplified ANFIS versions were trained using constrained hy-
brid learning algorithm discussed in section 7. Conventional ANFIS was trained using unconstrained hybrid learning.
Training RMSE and checking RMSE plots during first 100 epochs for simplified ANFIS (λ = 0.75) and conventional
ANFIS are shown in Fig. 5 and Fig. 6 respectively which clearly indicate lower error values for simplified ANFIS.
Training was done for 500 epochs but the checking errors stayed at a constant value beyond epoch 198. The mem-
bership functions of conventional ANFIS and simplified ANFIS (λ = 0.75) after the completion of training phase are
shown in Fig. 7 and Fig. 8 respectively. The figures indicate better interpretable input space partitioning in case of
simplified ANFIS with non-overlapping membership functions as compared to that of conventional ANFIS. Table 2
shows the approximation accuracy of the models in terms of RMSE on test data. The results indicate how different
values of λ affect the RMSE value and therefore model generalization.
Similarity threshold λ affected the number of fuzzy sets and fuzzy rules being merged in the system which in turn
affected the complexity and approximation accuracy of the system. It can be observed from the table that simplified

Fig. 7. Membership functions of two inputs for conventional ANFIS.

Fig. 8. Membership functions of two inputs for ANFIS based on proposed approach (λ = 0.75).

Table 2
RMSE values on test data.

Model                                          No. of fuzzy rules   Test data RMSE
ANFIS                                          4                    0.0567
NEFPROX                                        59                   0.0571
Simplified (λ = 0.80) and constrained ANFIS    4                    0.0546
Simplified (λ = 0.75) and constrained ANFIS    3                    0.0534
Simplified (λ = 0.70) and constrained ANFIS    3                    0.0540

ANFIS showed the least RMSE at λ = 0.75. The most appropriate values for λ were in the range [0.70, 0.80], which led to simplified ANFIS versions with the least RMSE. Values of λ > 0.80 led to less fuzzy set and fuzzy rule merging, due to which redundancy persisted in the system, leading to an over-fitted model with lower approximation accuracy and higher RMSE on test data because of poor generalization. Reducing λ below 0.70 led to merging of many
non-overlapping fuzzy sets with lower similarity measures which over-simplified the model and thus RMSE on test
data increased.
Thus, the similarity threshold λ can be adjusted to obtain a desired balance between interpretability and approximation accuracy of the system.
Therefore, by using an appropriate similarity threshold during rule base simplification together with constrained learning, an interpretable neuro-fuzzy model with better accuracy during both training and testing phases can be obtained. Table 2 also compares the RMSE of conventional ANFIS and the various simplified versions of ANFIS with NEFPROX [4] on the same dataset. NEFPROX was chosen because it is a well-known neuro-fuzzy model for function approximation built around interpretability criteria for neuro-fuzzy systems. It is evident from the results that the ANFIS obtained using the proposed modelling approach, based on rule base simplification and constrained hybrid learning, is simpler, implying better interpretability, and also has a lower RMSE on test data, implying better generalization.

Fig. 9. A snapshot of the CNX Nifty stock price dataset with values of essential technical indicators and corresponding basic stock quantities.

8.2. Stock price prediction

Forecasting the direction of future stock market price fluctuations from historical prices is a prerequisite for investors and financial consultants to make efficient trading decisions. It is a complex and difficult task due to the chaotic behavior and high uncertainty of stock market prices. A number of research studies have employed ANFIS to obtain accurate and reliable stock trading systems. We used the proposed fuzzy modelling methodology, based on data driven rule base induction, rule base simplification and constrained hybrid learning, to build an ANFIS with better interpretability for predicting the day-ahead closing price. The impact of rule base simplification and constrained learning on the accuracy and interpretability of ANFIS is systematically investigated.
The daily CNX Nifty stock dataset of 2703 records from January 3, 2005 to December 24, 2015, comprising five fundamental stock quantities, viz. maximum price, open price, minimum price, close price and trading volume, was used as the experimental dataset. Forecasting was done using technical indicators as model inputs, which were calculated from the four basic stock quantities.
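As an illustration of how such inputs can be derived, the following sketch computes the indicators used later (MA, ROC, RSI, %K, %D, %R, MACD) from a pandas DataFrame of daily quotes over a 10-period window. The column names and the exact indicator definitions are assumptions based on common formulations and may differ in detail from those used in this study; MACD, in particular, conventionally uses 12- and 26-period exponential averages rather than the 10-period window.

```python
import pandas as pd

def technical_indicators(df, n=10):
    """df is assumed to have columns 'high', 'low', 'close'."""
    out = pd.DataFrame(index=df.index)
    out["MA"] = df["close"].rolling(n).mean()
    out["ROC"] = df["close"].pct_change(n) * 100.0
    # RSI: ratio of average gains to average losses over the window.
    delta = df["close"].diff()
    gain = delta.clip(lower=0).rolling(n).mean()
    loss = (-delta.clip(upper=0)).rolling(n).mean()
    out["RSI"] = 100.0 - 100.0 / (1.0 + gain / loss)
    # Stochastic %K, %D and Williams %R use the highest high / lowest low.
    hh = df["high"].rolling(n).max()
    ll = df["low"].rolling(n).min()
    out["%K"] = 100.0 * (df["close"] - ll) / (hh - ll)
    out["%D"] = out["%K"].rolling(3).mean()
    out["%R"] = -100.0 * (hh - df["close"]) / (hh - ll)
    # MACD: difference of fast and slow exponential moving averages.
    out["MACD"] = (df["close"].ewm(span=12, adjust=False).mean()
                   - df["close"].ewm(span=26, adjust=False).mean())
    return out.dropna()
```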
A correlation matrix was used as the feature selection method to select the most essential technical indicators as model inputs. Based on a two-tailed significance test (using 0.05 as the significance level), the model inputs were: volume (VOL), moving average (MA), rate of change (ROC), relative strength index (RSI), stochastic oscillators (%K, %D), Williams' percent range (%R) and moving average convergence divergence (MACD). Fig. 9 gives a snapshot of these essential technical indicators along with the basic stock quantities used to calculate them. The values of the indicators were calculated using 10 periods of the four basic stock quantities. Of the dataset, 1700 records were used for model training, 603 records as validation data and 400 records as test data.
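A minimal sketch of this correlation-based screening is shown below: each candidate indicator is correlated with the next-day close price, and indicators whose two-tailed p-value falls below 0.05 are retained. The variable names and the usage comments are illustrative; the paper does not specify the exact implementation.

```python
from scipy.stats import pearsonr

def select_by_correlation(indicators, target, alpha=0.05):
    """indicators: DataFrame of candidate inputs; target: Series of next-day
    close prices aligned on the same index. Returns (name, r) pairs that pass
    the two-tailed significance test."""
    selected = []
    for name in indicators.columns:
        r, p = pearsonr(indicators[name], target)   # p is two-tailed
        if p < alpha:
            selected.append((name, r))
    return selected

# Hypothetical usage with `ti` from the indicator sketch above:
# target = df["close"].shift(-1).reindex(ti.index).dropna()
# print(select_by_correlation(ti.loc[target.index], target))
```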
To obtain the initial rule base from data, subtractive clustering with cluster radius ra = 0.5 was used, which resulted in 18 fuzzy rules and therefore 18 fuzzy sets per input. This system configuration was found optimal with respect to forecasting error. However, the initial rule base exhibited high redundancy in terms of highly overlapping membership functions for all the input variables. Therefore, fuzzy set merging was performed to remove redundant fuzzy sets of the input variables, and subsequent rule deletion was done to simplify the model. Fuzzy set merging was done using different similarity thresholds; the results for three different values of λ are shown in Table 3. The results indicate a neuro-fuzzy model with a much simpler structure, having fewer fuzzy sets per input variable and fewer linear and non-linear parameters than the conventional ANFIS obtained using the same dataset. Fig. 10 and Fig. 11 compare the membership functions of simplified ANFIS (λ = 0.70) and conventional ANFIS for input 5 (%K) and input 7 (%R). It can be clearly observed from the figures how rule base simplification using an appropriate similarity

Table 3
Rule base simplification results.
Stock prediction model                         Fuzzy sets per input variable            Linear parameters    Non-linear parameters
                                               VOL  MA  ROC  RSI  %K  %D  %R  MACD
Conventional ANFIS                             18   18  18   18   18  18  18  18        162                  288
Simplified (λ = 0.80) and constrained ANFIS    3    7   5    7    8   7   8   6         162                  102
Simplified (λ = 0.75) and constrained ANFIS    6    7   3    7    2   7   4   6         144                  84
Simplified (λ = 0.70) and constrained ANFIS    2    6   2    7    2   6   4   6         144                  70

Fig. 10. Membership functions for two inputs of simplified ANFIS (λ = 0.75).

Fig. 11. Membership functions for two inputs of conventional ANFIS.

threshold λ removes the redundant fuzzy sets from the rule base, thereby simplifying the model and improving interpretability.
The prediction accuracy of the three versions of simplified ANFIS in Table 3 was improved by training each model with the constrained hybrid learning algorithm for 1500 epochs, beyond which the RMSE on the checking data remained more or less constant. After training, test data was used to assess generalization capability. The RMSE values of the different versions of simplified ANFIS, obtained using various values of the similarity threshold λ and trained using constrained hybrid learning, are given in Table 4. The different versions of simplified ANFIS showed different RMSE values on test data. Table 4 also compares these RMSE values with conventional ANFIS and NEFPROX on the same test data; the conventional ANFIS and NEFPROX models were obtained using the same training and checking data sets as used for simplified ANFIS. Evidently, the simplified ANFIS obtained using λ = 0.75 and trained using constrained hybrid learning showed the least RMSE, indicating better forecasting accuracy. The model therefore achieves a better accuracy–interpretability tradeoff: it is much simpler than conventional ANFIS, with fewer fuzzy rules and far fewer fuzzy sets per input variable (Table 3), yet shows the lowest test data RMSE. Reducing λ further increased the RMSE on test data due to model over-simplification.
Fig. 12 and Fig. 13 show the predicted close prices of the ANFIS based on the proposed approach (λ = 0.75) and of conventional ANFIS, respectively, against the actual next-day close prices in the test dataset. The day-ahead close price plot for simplified ANFIS with constrained learning shows only a small difference between the actual and predicted close prices. This demonstrates the applicability of ANFIS based on the proposed interpretability-oriented modelling technique to stock price prediction, a task characterized by a large amount of noise in stock market data. Therefore, it

Table 4
RMSE values on test data.
Model                                          No. of fuzzy rules    Test data RMSE
Conventional ANFIS                             18                    64.6742
NEFPROX                                        261                   70.455
Simplified (λ = 0.80) and constrained ANFIS    18                    65.8591
Simplified (λ = 0.75) and constrained ANFIS    16                    61.2799
Simplified (λ = 0.70) and constrained ANFIS    16                    63.3878

Fig. 12. Actual close prices and output close prices for ANFIS based on proposed approach (λ = 0.75) on test data.

Fig. 13. Actual close prices and output close prices for conventional ANFIS on test data.

is clear that, using rule base simplification with an appropriate value of the similarity threshold along with constrained hybrid learning, ANFIS achieves better forecasting accuracy on test data while also ensuring the interpretability of the final optimized model. This yields a simple stock price forecasting model with a better accuracy–interpretability tradeoff.

8.3. Chaotic time-series prediction

In this subsection, chaotic time series prediction is considered. For this purpose the Mackey–Glass time series is used. This time series is generated using the Mackey–Glass delay differential equation:
$$\dot{x}(t) = \frac{0.2\, x(t-\tau)}{1 + x^{10}(t-\tau)} - 0.1\, x(t) \qquad (40)$$
Using previous values of the time series up to a point t, separated by an equal interval I, the future value at some point t + p is predicted. That is, given N known previous values x(t − (N − 1)I), . . . , x(t − I), x(t), the future value x(t + p) is predicted. In this experiment we used N = 4, so that the model has four inputs, and p = I = 6. We obtained 1000 input–output patterns using the method described in [2], integrating the equation with a time step of 0.1, x(0) = 1.2, t in the range [0, 2000] and τ = 17. To build the model, 500 instances were used as training data and the remaining 500 records as validation data.
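A sketch of this data-generation step is given below: Eq. (40) is integrated with a fourth-order Runge–Kutta scheme (step 0.1, x(0) = 1.2, x(t) = 0 for t < 0, τ = 17), and input–output patterns of the form [x(t − 18), x(t − 12), x(t − 6), x(t)] → x(t + 6) are extracted, following the setup in [2]. Details such as the sampling offsets and holding the delayed term fixed over each integration step are assumptions consistent with that reference rather than an exact reproduction of it.

```python
import numpy as np

def mackey_glass(n_steps, dt=0.1, tau=17.0, x0=1.2):
    """Integrate dx/dt = 0.2 x(t-tau)/(1 + x(t-tau)^10) - 0.1 x(t) with RK4,
    assuming x(t) = 0 for t < 0."""
    lag = int(round(tau / dt))
    x = np.zeros(n_steps + 1)
    x[0] = x0
    f = lambda xt, xlag: 0.2 * xlag / (1.0 + xlag ** 10) - 0.1 * xt
    for k in range(n_steps):
        xlag = x[k - lag] if k >= lag else 0.0
        k1 = f(x[k], xlag)
        k2 = f(x[k] + 0.5 * dt * k1, xlag)   # delayed term held fixed over the step
        k3 = f(x[k] + 0.5 * dt * k2, xlag)
        k4 = f(x[k] + dt * k3, xlag)
        x[k + 1] = x[k] + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

# Sample at integer times and build [x(t-18), x(t-12), x(t-6), x(t)] -> x(t+6).
x = mackey_glass(n_steps=20000, dt=0.1)        # t in [0, 2000]
xs = x[::10]                                   # values at t = 0, 1, 2, ...
data = np.array([[xs[t - 18], xs[t - 12], xs[t - 6], xs[t], xs[t + 6]]
                 for t in range(118, 1118)])   # 1000 patterns
X_in, y = data[:, :4], data[:, 4]
```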

Table 5
Rule base simplification results.
Model                                          Fuzzy sets per variable                  Linear parameters    Non-linear parameters
                                               X(t − 18)  X(t − 12)  X(t − 6)  X(t)
Conventional ANFIS                             10         10         10        10       50                   80
Simplified (λ = 0.75) and constrained ANFIS    6          5          8         6        50                   50
Simplified (λ = 0.70) and constrained ANFIS    5          5          6         6        50                   44
Simplified (λ = 0.65) and constrained ANFIS    4          5          6         6        50                   42

Table 6
RMSE values on test data.
Model                                          No. of fuzzy rules    Test data RMSE
Conventional ANFIS                             10                    0.0017
Simplified (λ = 0.75) and constrained ANFIS    10                    0.0035
Simplified (λ = 0.70) and constrained ANFIS    10                    0.0019
Simplified (λ = 0.65) and constrained ANFIS    10                    0.0027
NEFPROX                                        128                   0.0301
NEFPROX 128 0.0301

Subtractive clustering with neighborhood radii of 0.5 and 0.4 resulted in 10 and 15 fuzzy rules respectively. However, in terms of RMSE on test data and model simplicity, the rule base with 10 rules performed better, so the model with 10 initial fuzzy rules was taken forward to the rule base simplification phase. The simplification results are displayed in Table 5, which also shows the configuration of the conventional un-simplified ANFIS. As can be observed, different values of λ led to different versions of simplified ANFIS, which also affected the forecasting accuracy. It is evident that, in terms of the number of fuzzy sets per input variable and therefore the number of non-linear parameters, ANFIS is significantly simplified.
To improve the approximation accuracy of the different versions of simplified ANFIS, the constrained hybrid learning technique was used. Fig. 14 (right) shows the fuzzy sets of the four input variables of the final conventional un-simplified ANFIS obtained using unconstrained hybrid learning, indicating high model complexity due to similar fuzzy sets defined by highly overlapping membership functions and unconstrained parameter updates. The ANFIS based on the proposed neuro-fuzzy approach is much more interpretable in terms of the fuzzy partitions of the input variables, as can be observed from Fig. 14 (left). The reason is the removal of highly overlapping membership functions through rule base simplification and the application of constrained learning for tuning the membership function parameters.
At the end of the learning phase the models were tested for generalization accuracy using an independent test dataset; the results are shown in Table 6. It can be observed that the forecasting accuracy differs across the simplified versions of ANFIS. Table 6 also compares the forecasting accuracy of ANFIS based on the proposed modelling approach with conventional ANFIS and NEFPROX for this approximation problem. Amongst the simplified models, the version of ANFIS obtained using λ = 0.70, with just five or six membership functions per input variable and trained using constrained parameter learning, has the lowest RMSE on test data, very close to that of conventional ANFIS. The result is significant, as this version of ANFIS is highly interpretable, with low complexity due to the removal of overlapping fuzzy sets and well distinguished fuzzy partitions of the input variables, and yet shows a small RMSE value. This confirms the value of rule base simplification and constrained learning during fuzzy modelling for achieving a balance between simplicity and approximation accuracy.

9. Conclusion

In this paper we proposed a novel approach to system identification using ANFIS, oriented towards improved interpretability and generalization capability. The methodology applies an effective rule base simplification technique to the initial fuzzy model obtained via data driven approaches, followed by a constrained hybrid learning method that separately tunes the linear and non-linear parameters of the rule base to maximize system accuracy.
The approach has been successfully applied to three simulation examples, and a thorough analysis of its impact on the interpretability and accuracy of ANFIS has been performed. It has been experimentally

Fig. 14. Membership functions of four inputs for ANFIS based on rule base simplification and constrained learning (left) after learning phase and
corresponding membership functions (right) in case of conventional ANFIS.

demonstrated that rule base simplification using similarity analysis, based on a set-theoretic similarity measure, effectively removes redundant fuzzy sets and fuzzy rules from the rule base. This simplifies the model and enhances its generalization capability. Further, by adjusting the value of the similarity threshold during the rule base simplification process, a desired interpretability–accuracy tradeoff can be obtained. The constrained hybrid technique based on GD and LSE effectively tunes the system while ensuring consistency, limited overlapping, no position exchange and distinguishability of the fuzzy sets of the resulting neuro-fuzzy system. The experimental results have shown that the proposed neuro-fuzzy modelling approach yields an ANFIS with better approximation accuracy and interpretability, and is thus useful for obtaining an effective and simple TSK neuro-fuzzy framework for real system identification and forecasting problems.

References

[1] J. Casillas, O. Cordon, et al., Interpretability Issues in Fuzzy Modeling, Springer, Berlin–Heidelberg, 2003.
[2] J.S.R. Jang, ANFIS: Adaptive-Network-Based Fuzzy Inference System, IEEE Trans. Syst. Man Cybern. 23 (1993) 665–685.
[3] M. Sugeno, G.T. Kang, Structure identification of fuzzy model, Fuzzy Sets Syst. 28 (1) (1988) 15–33.
[4] D. Nauck, R. Kruse, Neuro-fuzzy systems for function approximation, Fuzzy Sets Syst. 101 (1999) 261–271.
[5] M. Setnes, H. Roubos, GA-fuzzy modeling and classification: complexity and performance, IEEE Trans. Fuzzy Syst. 8 (2000) 509–522.
[6] M. Setnes, R. Babuska, et al., Similarity measures in fuzzy rule base simplification, IEEE Trans. Syst. Man Cybern., Part B 28 (3) (1998) 376–386.
[7] M.Y. Chen, D.A. Linkens, Rule-base self-generation and simplification for data-driven fuzzy models, Fuzzy Sets Syst. 142 (2004) 243–265.
[8] Y. Jin, Fuzzy modeling of high-dimensional systems: complexity reduction and interpretability improvement, IEEE Trans. Fuzzy Syst. 2
(2000) 212–221.
[9] A. Klose, A. Nurnberger, D. Nauck, Improved NEFCLASS pruning techniques applied to a real world domain, in: Proceedings of Neuronale
Netze in der Anwendung, NN’99, University of Magdeburg, 1999.
[10] L.T. Koczy, K. Hirota, Size reduction by interpolation in fuzzy rule bases, IEEE Trans. Syst. Man Cybern. 27 (1) (1997) 14–25.
[11] J. Espinosa, J. Vandewalle, Constructing fuzzy models with linguistic integrity from numerical data-AFRELI Algorithm, IEEE Trans. Fuzzy
Syst. 8 (2000) 591–600.
[12] J.V. de Oliveira, Towards neuro-linguistic modeling: constraints for optimization of membership functions, Fuzzy Sets Syst. 106 (3) (1999) 357–380.
[13] C. Mencar, A.M. Fanelli, Interpretability constraints for fuzzy information granulation, Inf. Sci. 178 (2008) 4585–4618.
[14] D. Nauck, R. Kruse, A neuro-fuzzy approach to obtain interpretable fuzzy systems for function approximation, in: Proceedings of IEEE Conference on Fuzzy Systems, 1998, pp. 1106–1111.
[15] R.R. Paiva, A. Dourado, Interpretability and learning in neuro-fuzzy systems, Fuzzy Sets Syst. 147 (2004) 17–38.
[16] E.H. Mamdani, Application of fuzzy algorithms for control of a simple dynamic plant, Proc. Inst. Electr. Eng. 12 (1974) 1585–1588.
[17] I. Couso, L. Garrido, L. Sánchez, Similarity and dissimilarity measures between fuzzy sets: a formal relational study, Inf. Sci. 229 (2013)
122–141.
[18] S. Chiu, Fuzzy model identification based on cluster estimation, J. Intell. Fuzzy Syst. 2 (1994) 267–278.
[19] C.P. Pappis, N.I. Karacapilidis, A comparative assessment of measures of similarity of fuzzy values, Fuzzy Sets Syst. 56 (1993) 171–174.
[20] S.M. Chen, M.S. Yeh, P.Y. Hsiao, A comparison of similarity measures of fuzzy values, Fuzzy Sets Syst. 72 (1995) 79–89.
[21] I. Beg, S. Ashraf, Similarity measures for fuzzy sets, Appl. Comput. Math. 8 (2009) 192–202.
[22] D. Guha, D. Chakraborty, New approach to fuzzy distance measure and similarity measure between two generalized fuzzy numbers, Appl.
Soft Comput. 10 (2010) 90–99.
[23] A.L. Ralescu, D.A. Ralescu, Probability and fuzziness, Inf. Sci. 34 (1984) 85–92.
[24] R. Zwick, E. Carlstein, D.V. Budescu, Measures of similarity among fuzzy concepts: a comparative analysis, Int. J. Approx. Reason. 1 (2)
(1987) 221–242.
[25] K.J. Åström, B. Wittenmark, Computer-Controlled Systems: Theory and Design, Prentice-Hall, 1984.
[26] S. Rajab, V. Sharma, A review on the applications of neuro-fuzzy systems in business, Artif. Intell. Rev. 49 (4) (2018) 481–510.
[27] J.M. Alonso, C. Castiello, M. Lucarelli, C. Mencar, Modeling interpretable fuzzy rule-based classifiers for medical decision support, in: Data
Mining: Concepts, Methodologies, Tools, and Applications, IGI Global, 2015.
[28] K. Łapa, K. Cpałka, L. Wang, New approach for interpretability of neuro-fuzzy systems with parametrized triangular norms, in: Interna-
tional Conference on Artificial Intelligence and Soft Computing, ICAISC 2016, in: Lecture Notes in Computer Science, vol. 9692, 2016,
pp. 248–265.
[29] M. Pota, M. Esposito, Insights into interpretability of neuro-fuzzy systems, in: 16th World Congress of the International Fuzzy Systems
Association, IFSA, 2015.
[30] P.R. Dian, S.M. Shamsuddin, S.S. Yuhaniz, Particle swarm optimization for ANFIS interpretability and accuracy, Soft Comput. 20 (2016)
251–262.
[31] X. Zhao, L. Zhang, P. Shi, H.R. Karimi, Novel stability criteria for T–S fuzzy systems, IEEE Trans. Fuzzy Syst. 22 (2014) 313–323.
[32] S.K. Kommuri, M. Defoort, H.R. Karimi, K.C. Veluvolu, A robust observer-based sensor fault-tolerant control for PMSM in electric vehicles,
IEEE Trans. Ind. Electron. 63 (2016) 7671–7681.
[33] H.R. Karimi, B. Lohmann, B. Moshiri, P.J. Maralani, Wavelet-based identification and control design for a class of nonlinear systems, Int. J.
Wavelets Multiresolut. Inf. Process. 4 (2006) 213–226.
[34] Y. Wang, H. Shen, H.R. Karimi, D. Duan, Dissipativity-based fuzzy integral sliding mode control of continuous-time T–S fuzzy systems,
IEEE Trans. Fuzzy Syst. 26 (2018) 1164–1176.
[35] B. Jiang, H.R. Karimi, Y. Kao, C. Gao, A novel robust fuzzy integral sliding mode control for nonlinear semi-Markovian jump T–S fuzzy
systems, IEEE Trans. Fuzzy Syst. (2018), https://doi.org/10.1109/TFUZZ.2018.2838552.
[36] T. Youssef, M. Chadli, H.R. Karimi, R. Wang, Actuator and sensor faults estimation based on proportional integral observer for TS fuzzy
model, J. Franklin Inst. 354 (2017) 2524–2542.
[37] L. Zhao, W. Pawlus, H.R. Karimi, K.G. Robbersmyr, Data-based modeling of vehicle crash using adaptive neural-fuzzy inference system,
IEEE/ASME Trans. Mechatron. 19 (2014) 684–696.
[38] X.C. Dong, Y.Y. Zhao, H.R. Karimi, P. Shi, Adaptive variable structure fuzzy neural identification and control for a class of MIMO nonlinear
system, J. Franklin Inst. 350 (2013) 1221–1247.
[39] I. Ebtehaj, H. Bonakdari, Performance evaluation of adaptive neural fuzzy inference system for sediment transport in sewers, Water Resour.
Manag. 28 (2014) 4765–4779.
[40] T. Uçar, A. Karahoca, D. Karahoca, Tuberculosis disease diagnosis by using adaptive neuro fuzzy inference system and rough sets, Neural
Comput. Appl. 23 (2013) 471–483.

[41] X. Xue, X. Yang, Application of the adaptive neuro-fuzzy inference system for prediction of soil liquefaction, Nat. Hazards 67 (2013) 901–917.
[42] S. Naji, S. Shamshirband, H. Basser, D. Petković, Application of adaptive neuro-fuzzy methodology for estimating building energy consump-
tion (Tier 1), Renew. Sustain. Energy Rev. 53 (2015) 1520–1528.
[43] X.P. Xie, Z.W. Liu, X.L. Zhu, An efficient approach for reducing the conservatism of LMI-based stability conditions for continuous-time T–S
fuzzy systems, Fuzzy Sets Syst. 263 (2015) 71–81.
[44] X.P. Xie, D. Yue, H. Zhang, C. Peng, Control synthesis of discrete-time T–S fuzzy systems: reducing the conservatism whilst alleviating the
computational burden, IEEE Trans. Cybern. 47 (2017) 2480–2490.
[45] L. Huang, K. Wang, P. Shi, H.R. Karimi, A novel identification method for generalized TS fuzzy systems, Math. Probl. Eng. 2012 (2012)
893807.
[46] K. Kolusa, D.I. Philippe, A. Dubé, D. Dubeau, Classifying work rate from heart rate measurements using an adaptive neuro-fuzzy inference
system, Appl. Ergon. 54 (2016) 158–168.