Available online at www.sciencedirect.com

ScienceDirect

Fuzzy Sets and Systems ••• (••••) •••–•••

www.elsevier.com/locate/fss

simplification and constrained learning

Sharifa Rajab

Department of IT and SS, University of Kashmir, India

Received 15 December 2017; received in revised form 30 October 2018; accepted 20 November 2018

Abstract

Adaptive neuro-fuzzy inference system (ANFIS) is a well-known neuro-fuzzy model for approximating highly complex non-linear systems. ANFIS follows the precise fuzzy modelling concept, which aims at the accuracy of the fuzzy model being designed rather than at its interpretability. But interpretability of a fuzzy system is as important an aspect of fuzzy modelling as accuracy. So far the research based on ANFIS has been mostly application oriented, due to which the various issues related to the interpretability of ANFIS have not been dealt with. The rule base of ANFIS is typically obtained using data driven clustering algorithms. But this process introduces redundancy into the system in the form of similar fuzzy sets and redundant fuzzy rules, which unnecessarily increases system complexity. This in turn reduces both the interpretability and the generalization capability of ANFIS. Additionally, in ANFIS unconstrained gradient descent based learning algorithms are used to fine-tune the membership function parameters, which usually results in a rule base with inconsistent, excessively overlapping and indistinguishable membership functions for the input variables, so the interpretability of the final optimized system is not guaranteed. This paper addresses the issue of rule base redundancy in ANFIS to reduce complexity, and enforces constraints during the learning phase to ensure interpretability of the final optimized system. Rule base redundancy is removed using similarity analysis based rule base simplification, in which similar fuzzy sets are merged and the resulting fuzzy rules with equal premises are subsequently combined. The hybrid learning technique, an efficient parameter tuning method for ANFIS, is constrained to prevent inconsistency, excessive overlapping and inclusion of membership functions so that the final fuzzy partitions of the inputs stay interpretable. The impact of rule base simplification and constrained learning on ANFIS is analyzed empirically by application to two well-known benchmark problems and a real world stock price prediction problem. Introducing rule base simplification and constrained learning into ANFIS modelling gave better results than conventional ANFIS in terms of obtaining a desired accuracy-interpretability tradeoff.

© 2018 Elsevier B.V. All rights reserved.

https://doi.org/10.1016/j.fss.2018.11.010

0165-0114/© 2018 Elsevier B.V. All rights reserved.

1. Introduction

System modelling is the process of studying a real system in order to build a theoretical model that eases the understanding of the system. The goal is to design a reliable and comprehensible model to simulate, explain, improve or predict a real system. Fuzzy modelling is a popular system modelling approach which helps to model a real system using a descriptive language based on fuzzy logic [1]. Such systems based on fuzzy set theory and fuzzy logic are called fuzzy models and offer the benefit of representing expert knowledge in the form of fuzzy if–then rules. Fuzzy if–then rules help in modelling different aspects of human knowledge and reasoning without the need for precise quantitative analysis [2]. This enables the design of fuzzy models for ill-defined and uncertain systems. Fuzzy systems also have better transparency, because it is easier to interpret system knowledge expressed as fuzzy if–then rules, which allows an in-depth understanding of the system functionality. Fuzzy models have the same universal approximation capability as artificial neural networks (ANNs) but, unlike ANNs, lack learning ability. Therefore, neuro-fuzzy systems were introduced, which combine the benefits of fuzzy systems in terms of interpretability with the learning capability of ANNs.

There are two main but conflicting goals in the fuzzy modelling of neuro-fuzzy systems, which are also used to assess the quality of fuzzy models: (1) accuracy, which is the ability of the system to faithfully represent the real system; and (2) interpretability, which is the ability to express the behavior of the system in a comprehensible manner [1]. In practical data driven fuzzy modelling one of these two properties prevails over the other, and increasing one usually decreases the other. When expert knowledge is used to build a fuzzy model, it is easier to ensure that the system remains interpretable while achieving satisfactory accuracy. On the other hand, if automated data driven fuzzy modelling approaches are used to construct the fuzzy rules, interpretability is not guaranteed, and the result is usually a fuzzy model with poor transparency. Data driven fuzzy modelling is predominantly used in TSK (Takagi, Sugeno and Kang) based neuro-fuzzy systems [3]. Neuro-fuzzy systems built on an underlying TSK fuzzy model are one of the important areas in the practical and theoretical fuzzy system literature. The well-known adaptive neuro-fuzzy inference system (ANFIS) is based on the TSK fuzzy modelling concept. ANFIS has been the most popular neuro-fuzzy system, with a wide range of applications in control, forecasting and inference [26]. But ANFIS has been used in real applications mainly to replace other black box models like ANNs, with the focus on how accurately the model approximates a real system, ignoring the important aspect of interpretability, which is questionable as indicated in [4].

ANFIS is being predominantly used for solving real world problems in various fields like business, medical science,

image processing, student modelling, traffic control and so on. Recently, ANFIS has been successfully applied in

various novel domains. Lately, ANFIS was introduced for estimating the sediment transport in sewers [39]. The

study used grid partitioning and subtractive clustering for initial rule base induction and also used hybrid learning.

The results proved that ANFIS demonstrates greater precision in estimation than existing techniques used for the

purpose. In another study ANFIS was used in tuberculosis diagnosis [40]. The study aimed at diagnosing tuberculosis

as accurately as possible and to reduce the waiting time for commencing the treatment on suspected patients. ANFIS

showed classification accuracy of 97% as compared to rough set algorithm which showed an accuracy of 92%. ANFIS

has also been used in the novel field of predicting soil liquefaction potential due to earthquakes [41]. The model training was

done using a large database of case histories of soil liquefaction. The inputs included various parameters like water

table, vertical stress etc. effective in predicting soil liquefaction. The results of the study revealed the effectiveness

of ANFIS in predicting soil liquefaction potential. Also a novel adaptive fuzzy system was designed based on radial

basis function based components [38]. The system had the self-organizing capability of adjusting the number of fuzzy

rules during parameter learning phase. The system was successfully applied in 3 DOF helicopter systems. The fuzzy

system was based on sequential learning using sliding data window which reflected dynamic changes within the

system and dynamically growing and shrinking structure of fuzzy system. In a significant study ANFIS was used

for reconstructing the kinematics of colliding vehicles [37]. The authors used acceleration, displacement and velocity

for reproducing the kinematics of vehicles in oblique barrier collision. After training phase, the authors used the

same ANFIS model for simulating different other types of collisions and performed a comparison of the results with

other modelling techniques in which ANFIS based method showed better reliability. More recently, it was used in a

novel field of estimation of building energy consumption [42]. The estimation was done based on building envelope

parameters viz. insulation K-value and material thickness. The experimental study included 180 simulations using

different values of insulation K-values and material thickness. Lately ANFIS has also been used for classification

of work rate using field heart rate measurements [46]. The work rate was classified into very light, light, moderate

and heavy. In this study the classifier based on ANFIS showed superior results in terms of sensitivity, accuracy and

specificity as compared to currently used practice of establishing work rate using percent heart rate reserve.

In addition to above studies ANFIS has been used in a number of other real world applications with main focus

on approximation accuracy while the interpretability aspect is overlooked due to the assumption that a fuzzy model is

implicitly interpretable in the form of fuzzy if–then rules, which is not necessarily true. Therefore, in practice ANFIS is used in much the same way as other black box techniques like ANNs, which results in a complicated fuzzy model with high accuracy but poor interpretability. This poses problems for real applications involving a high degree of human interaction, which demand interpretability as the main design criterion so that the output can be explained comprehensibly. This is the case in applications like medical diagnosis, decision support systems or safety critical systems. Interpretability is also needed in various industrial applications for easier and more intuitive design processes like model validation, verification and maintenance.

Data driven fuzzy modelling of ANFIS is done in two stages viz. structure learning and parameter learning phase.

In structure learning phase, clustering techniques are used to cluster the experimental data set and subsequently the

obtained clusters are used to construct fuzzy rules. Parameter learning phase involves fine-tuning of parameters of

fuzzy sets to optimize the performance of system in terms of approximation accuracy. In both these phases there are

interpretability issues which are not tackled in case of conventional ANFIS. During data driven structure learning

redundancy is introduced into the system in the form of similar fuzzy sets and fuzzy rules. This redundancy unneces-

sarily increases system complexity as the system uses multiple fuzzy sets to describe the same concept. This has the

effect of decreasing both the interpretability and generalization capability of the fuzzy system. The parameter learning

methods used in case of ANFIS are unconstrained which implies that any possible updates to the rule base parameters

are done, ignoring consistency, overlapping and distinguishability of the membership functions defining the fuzzy

partitions of input variables. This further reduces the interpretability of the final optimized system as it is difficult to

understand the fuzzy partitions of the system making it difficult for experts to predict the system output for a given

input vector. But most research studies that apply ANFIS to real world problems ignore these aspects related to the interpretability of ANFIS and use it mainly for function approximation.

In this paper, we attempt to address the above mentioned interpretability issues in ANFIS during structure learn-

ing and optimization phases. The proposed methodology is based on using a rule base simplification procedure after

structure learning to remove rule base redundancy and on applying constrained parameter learning. The rule base

simplification is done by merging the similar fuzzy sets using a set theoretic similarity measure and subsequently

merging the resulting fuzzy rules with equivalent premise parts. This removes rule base complexity and simplifies

the model which in turn improves system generalization capability in terms of forecasting accuracy. ANFIS uses an

efficient hybrid learning technique based on gradient descent and least square estimation (LSE) for fine-tuning the

system parameters. To address the interpretability issues, we have introduced constraints on the updates of the mem-

bership function parameters in this learning algorithm to avoid inconsistency, excessive overlapping and inclusions of

membership functions. This makes the fuzzy partitioning of input variables interpretable so that the use of linguistic

labels can be facilitated and the system interpretability is guaranteed after learning phase. The applicability of the

proposed approach has been empirically investigated by application of the simplified and constrained ANFIS to two

well-known benchmark prediction problems and a real world problem of stock price forecasting.

The next section gives the relevant literature survey. Section 3 discusses ANFIS, section 4 describes the concept of data driven rule base induction, section 5 discusses the use of similarity measures in rule base simplification, section 6 introduces the rule base simplification procedure for ANFIS, section 7 discusses the constrained training methodology, section 8 presents the experimental results and section 9 provides the concluding remarks.

2. Literature survey

Since the advent of TSK fuzzy model in 1985 numerous studies have been devoted to the design of intelligent

modelling and control systems based on TSK fuzzy systems. As a result various design issues related to fundamental

problems of reliability, stability and interpretability of TSK fuzzy systems are being investigated. Stability analysis of TSK fuzzy systems has received a lot of attention and a number of significant contributions related to the issue have been made [43–45]. In a significant study Karimi et al. [33] employed feedback linearization, H∞ control and supervisory

control to design adaptive control law and Lyapunov based design for developing parameter adaptation laws. The

authors demonstrated that the proposed system guaranteed the stability of the closed loop system, boundedness of

errors of states and convergence of network parameters. Zhao et al. [31] proposed a novel non-quadratic membership

dependent Lyapunov function in higher order and developed stability conditions for TSK systems based on this new

Lyapunov function method. The authors also showed that the conservativeness of the obtained stability criteria de-

creased as the membership function degree increased. Kommuri [32] proposed a novel fault tolerant cruise control

design based on higher order sliding mode (HOSM) observer for a permanent magnet synchronous motor (PMSM)

powered electric vehicle in the presence of speed sensor faults. The proposed system guaranteed finite time stability

of the perceived error for reconfiguration based control. The authors also demonstrated the robustness of the proposed

fault tolerant control (FTC) in existence of vehicular disturbances like road roughness. In another study, Wang et al.

[34] proposed required conditions for ensuring the asymptotic stability of sliding mode dynamics along with strictly

dissipative performance. The authors presented a fuzzy integrated sliding mode control for driving system trajectories

onto fuzzy switching surface in presence of matched/unmatched uncertainties and external disturbances. In the same

year, Jiang [35] conducted a study that aimed to design a novel fuzzy integral sliding surface without assuming that the input matrices have full column rank, and subsequently to develop fuzzy sliding mode controllers for stochastic stability. The authors also proposed a set of novel linear matrix inequality conditions for the stochastic stability of sliding mode dynamics with uncertain transition rates and then extended the results to the case where the input matrices are plant-rule independent. Yossef et al. [36] introduced a new technique for proportional integral observer design for the assessment of sensor and actuator faults, based on a Takagi–Sugeno fuzzy model having unmeasurable input variables. The authors developed sufficient design conditions for the concurrent estimation of states and time varying actuator and sensor faults on the basis of L2 performance analysis and Lyapunov stability theory.

The development of simple and transparent fuzzy systems has also received a lot of attention lately as is evident

from the relevant literature. Transparency of a fuzzy system makes it linguistically more interpretable in terms of

the rules extracted from the system and thus makes it easy to comprehend the functioning of the system. Various

techniques have been proposed that increase accuracy of these systems but generally increase model complexity as

well. Nonetheless, lately there has been a shift in fuzzy modelling research towards achieving a tradeoff between

accuracy and interpretability. A number of studies in literature are based on model simplification as a method to attain

accuracy-interpretability tradeoff in case of fuzzy systems by removing redundancy from the rule base. Setnes et al.

[6] proposed a rule base simplification method based on similarity analysis. The authors used a set theoretic similarity

measure to find and remove similar fuzzy sets from the rule base. This reduced model complexity while enhancing the

generalization capability. Jin [8] used similarity analysis for checking similarity between fuzzy rules to enhance fuzzy

model interpretability. The parameters of the fuzzy rules were fine-tuned using regularization. More recently, Chen

and Linkens [7] also used similarity analysis to remove similar fuzzy sets from the rule base of a TSK neuro-fuzzy

system using approximate similarity measures. The model simplification was followed by parameter fine-tuning us-

ing gradient descent based mechanism. The simplified model was shown to be linguistically more interpretable and

computationally efficient.

Many methods for attaining interpretability have been based on reducing the number of rules in an existing rule

base. Koczy and Hirota [10] simplified a complex rule base to simple rule base containing the important information

of the original rule base, and all other rules were replaced by an interpolation algorithm that recovered them to a

certain accuracy predefined before reduction. Klose et al. [9] used fuzzy rule performance to select the best rules from

the rule base in order to reduce the rule base size. But the rule base generation method used in this study usually leads

to a large rule base in case of high dimensionality data sets which diminishes the effect of rule base reduction used

after rule base induction. Espinosa and Vandewalle [11] proposed a method to induce fuzzy rules from dataset such

that linguistic integrity of the model is supported in order to guarantee interpretability in the linguistic context. The

approach also allowed inclusion of prior expert knowledge into the rule base.

Recently, in a study Pota and Esposito [29] proposed an index to control tradeoff between neuro-fuzzy model

performance and complexity. The authors also provided some insights into fuzzy partition properties, ideal fuzzy set

shape and evaluation of fuzzy rules. The authors gave evaluations about controversial interpretability properties. The

methods presented in this study helped to obtain best choice in terms of semantic interpretability at both fuzzy set level

and the partition level and also allowed employing gradient descent optimization method. Similarly, Łapa et al. [28]

proposed a novel method for obtaining an interpretable neuro-fuzzy system based on the appropriate use of parametric

triangular norms with weights of arguments. In addition the authors used a modified learning algorithm to select both

the model structure and its parameters with interpretability under consideration. The method was proved to be success-

ful on some well-known non-linear problems. Some studies were also conducted recently on the application of some

neuro-fuzzy models designed with the goal of interpretability for various real world problems. For example, Alonso et

al. [27] employed the HILK (highly interpretable linguistic knowledge) neuro-fuzzy model for designing a medical decision support system. The authors used the system to predict the evolution of end-stage renal disease (ESRD) in people affected by immunoglobulin A nephropathy. Dian et al. [30] integrated particle swarm optimization (PSO) and ANFIS for

improved accuracy and better interpretability. PSO was used to find the optimal number of fuzzy rules which was also

helpful in improving the interpretability of the system in addition to enhancing accuracy. The modelling technique

was applied on some benchmark classification problems and showed good results.

In literature some studies [12,13] used various constraints on the fuzzy sets of the fuzzy rules to ensure the in-

terpretability of the system. Many of these constraints can be ensured at the rule base induction stage and some of

these are ensured at the rule base simplification stage for example by removing highly overlapping fuzzy sets by

fuzzy set merging [6]. But during the parameter optimization stage there is no guarantee that the final system follows these restrictions, so a complex rule base that is difficult to interpret may emerge. Therefore, some studies introduced constrained parameter learning techniques that ensure interpretability of the neuro-fuzzy system after the parameter optimization stage. Nauck et al. [4,14] used a constrained heuristic learning technique in which unconstrained fine-tuning of the membership function parameters is not allowed. Constrained learning ensures that the membership functions stay consistent and that the fuzzy sets do not exchange positions and keep a certain degree of overlapping. But the learning technique was not very efficient in terms of approximation accuracy [15]. Paiva and Dourado [15] introduced a

constrained gradient based technique for rule base fine-tuning. Rule base simplification based on similarity measure

was also implemented in this study. This technique demonstrated better approximation accuracy. Thus, in literature

interpretability in neuro-fuzzy systems has been ensured by using various methods during rule base induction, post

processing the initial rule base and use of well-organized parameter optimization techniques. But the impact of rule

base simplification and constrained learning on interpretability-accuracy balance in case of ANFIS has not been ana-

lyzed so far.

3. ANFIS

Jang proposed ANFIS in 1993 [2]. ANFIS has three core parts: a fuzzy rule base, membership functions defining the fuzzy sets in the fuzzy rules, and a reasoning mechanism. ANFIS is an adaptive fuzzy model that uses gradient descent based optimization methods for tuning the membership function parameters. The architecture of ANFIS can be represented as a multilayer, ANN-like connectionist structure that represents the computations and data flow through the fuzzy model in order to formalize the use of learning techniques. ANFIS uses the TSK fuzzy model, which implies that the antecedents of the fuzzy rules in the rule base consist of fuzzy sets corresponding to each model input variable and the consequents are linear combinations of the input variables plus a constant. A fuzzy rule base for an ANFIS with two rules using two input variables x1 and x2 can be outlined as:

R1: if x1 is A1 and x2 is B1 then f1 = p1 x1 + q1 x2 + r1

R2: if x1 is A2 and x2 is B2 then f2 = p2 x1 + q2 x2 + r2

where Ai and Bi are the fuzzy sets corresponding to inputs x1 and x2 respectively, and pi, qi and ri are the linear consequent parameters. An equivalent ANFIS structure is shown in Fig. 1. This connectionist network structure is based on six layers:

Layer 0: This layer represents the external inputs for ANFIS. This layer is not usually shown in the main ANFIS

structure.

Layer 1: This is the fuzzification layer, which is built from the membership functions corresponding to the fuzzy sets of each input variable. A membership function takes the value of the input variable and outputs its membership degree, which lies between 0 and 1. This is the fuzzified value of the crisp input. Each node in this layer corresponds to an adaptive membership function with output given by:

$$O_{1,i} = \mu_{A_j}(x) \tag{1}$$

where $O_{1,i}$ is the output of node i and $\mu_{A_j}(x)$ is the output of the membership function for fuzzy set $A_j$. Each membership function is a piecewise differentiable and continuous function, which implies that the function parameters can be updated using a gradient descent based learning technique. Various types of membership functions can be used in ANFIS, but the most commonly used ones are Gaussian and generalized bell functions.

Layer 2: In this layer each node computes the product of the outputs from the previous layer, which corresponds to the firing strength of a fuzzy rule. The output of the i-th node in this layer is given by:

$$O_{2,i} = w_i = \mu_{A_i}(x_1)\,\mu_{B_i}(x_2) \tag{2}$$

Here $O_{2,i}$ is the product of the membership values $\mu_{A_i}(x_1)$ and $\mu_{B_i}(x_2)$, which gives the firing strength of the i-th rule. For the first rule shown above the firing strength can be written as $\mu_{A_1}(x_1)\,\mu_{B_1}(x_2)$. In place of the product, any fuzzy t-norm, for example the min operator, can be used.

Layer 3: Each node in this layer normalizes the firing strength of a fuzzy rule. The j-th node computes the normalized rule strength as the ratio of the firing strength of the j-th rule to the sum of the firing strengths of all rules:

$$O_{3,j} = \bar{w}_j = \frac{w_j}{\sum_{i=1}^{R} w_i} \tag{3}$$

where $w_i$ is the firing strength of the i-th rule and R is the number of rules in the rule base.
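As a quick numerical check of the normalization in Eq. (3), assume two rules with illustrative firing strengths $w_1 = 0.6$ and $w_2 = 0.2$ (made-up values, not taken from the paper's experiments):

$$\bar{w}_1 = \frac{0.6}{0.6 + 0.2} = 0.75, \qquad \bar{w}_2 = \frac{0.2}{0.6 + 0.2} = 0.25$$

The normalized strengths are non-negative and sum to 1, so the final model output is a convex combination of the rule consequents.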

Layer 4: This layer is also an adaptive layer, like layer 1, with nodes that have updatable parameters associated with them. The output of each node is the normalized firing strength multiplied by a linear function of the inputs:

$$O_{4,i} = \bar{w}_i f_i = \bar{w}_i\,(p_i x_1 + q_i x_2 + r_i) \tag{4}$$

where $O_{4,i}$ is the output of node i in this layer and pi, qi and ri are the function coefficients that are updated during the parameter optimization phase. Each node has n + 1 parameters for n input variables (in our case there are 2 input variables).

Layer 5: This layer has a single node that computes the overall output as the sum of all incoming signals:

$$O_{5,1} = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i} \tag{5}$$

where $O_{5,1}$ is the output available to the user and $\bar{w}_i f_i$ is the output of node i in layer 4.
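The layer-by-layer computation described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: Gaussian membership functions, the product t-norm, and one fuzzy set per input per rule. The function names `gaussian_mf` and `anfis_forward`, the array layout and all parameter values are hypothetical and are not taken from the paper.

```python
import numpy as np

def gaussian_mf(x, c, sigma):
    """Gaussian membership degree of x for a fuzzy set with centre c and width sigma."""
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)

def anfis_forward(x1, x2, premise, consequent):
    """Forward pass of a two-input TSK ANFIS (illustrative layout, not the paper's code).

    premise    : shape (R, 2, 2); premise[i, k] = (centre, sigma) of the fuzzy set
                 that rule i uses on input k.
    consequent : shape (R, 3); row i holds (p_i, q_i, r_i).
    """
    # Layer 1: fuzzification of both crisp inputs
    mu_a = gaussian_mf(x1, premise[:, 0, 0], premise[:, 0, 1])
    mu_b = gaussian_mf(x2, premise[:, 1, 0], premise[:, 1, 1])
    # Layer 2: firing strength of each rule (product t-norm)
    w = mu_a * mu_b
    # Layer 3: normalised firing strengths
    w_bar = w / w.sum()
    # Layer 4: linear consequent of each rule, weighted by w_bar
    f = consequent[:, 0] * x1 + consequent[:, 1] * x2 + consequent[:, 2]
    # Layer 5: overall output as the sum of the weighted consequents
    return float(np.sum(w_bar * f)), w_bar

# Two rules with made-up parameters, evaluated halfway between the rule centres.
premise = np.array([[[0.0, 1.0], [0.0, 1.0]],
                    [[2.0, 1.0], [2.0, 1.0]]])
consequent = np.array([[1.0, 1.0, 0.0],
                       [2.0, -1.0, 1.0]])
y, w_bar = anfis_forward(1.0, 1.0, premise, consequent)
```

At the chosen input both rules fire equally, so the output is the plain average of the two rule consequents.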

For optimization of the rule base parameters either standard error back-propagation algorithm or hybrid learning

algorithm based on gradient descent method and least square estimation (LSE) can be used in case of ANFIS [2].
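The least squares half of hybrid learning relies on the fact that, once the membership function parameters are held fixed, the model output is linear in the consequent parameters pi, qi and ri. The sketch below illustrates that step only, assuming Gaussian membership functions and the product t-norm. The helper names `normalised_strengths` and `fit_consequents` and the synthetic data are hypothetical; the gradient descent update of the premise parameters is not shown.

```python
import numpy as np

def normalised_strengths(X, centres, sigmas):
    """Normalised firing strengths, shape (N, R), for inputs X of shape (N, n).

    centres and sigmas have shape (R, n): one Gaussian fuzzy set per rule and input.
    """
    z = (X[:, None, :] - centres[None, :, :]) / sigmas[None, :, :]
    # The product of Gaussian memberships equals the exponential of the summed exponents.
    w = np.exp(-0.5 * np.sum(z ** 2, axis=2))
    return w / w.sum(axis=1, keepdims=True)

def fit_consequents(X, y, centres, sigmas):
    """Least squares estimate of the consequent parameters for fixed premises."""
    N, n = X.shape
    w_bar = normalised_strengths(X, centres, sigmas)
    X1 = np.hstack([X, np.ones((N, 1))])                      # inputs plus bias column
    # Design matrix: column block i holds w_bar_i * (x_1, ..., x_n, 1).
    A = (w_bar[:, :, None] * X1[:, None, :]).reshape(N, -1)
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return theta.reshape(-1, n + 1)                           # row i = (p_i, q_i, r_i)

# Synthetic check: generate targets from known consequents and recover them.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 3.0, size=(200, 2))
centres = np.array([[0.0, 0.0], [2.0, 2.0]])
sigmas = np.ones((2, 2))
true_theta = np.array([[1.0, 1.0, 0.0], [2.0, -1.0, 1.0]])
w_bar = normalised_strengths(X, centres, sigmas)
X1 = np.hstack([X, np.ones((200, 1))])
y = np.sum(w_bar * (X1 @ true_theta.T), axis=1)
estimated = fit_consequents(X, y, centres, sigmas)
```

Because the targets here are noise free, the estimated parameters should match the generating ones up to numerical precision; with real data the fit is only a least squares approximation.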

4. Data driven rule base induction

In data driven fuzzy modelling, the fuzzy rule base is generated automatically from input–output data patterns. In

case of TSK fuzzy system modelling, given n input–output data patterns, the goal is to find non-linear parameters in

antecedents and linear parameters in rule consequents plus a minimum number of fuzzy rules that approximate the real

system as accurately as possible. Data driven rule base induction is commonly performed using clustering algorithms.

A clustering algorithm partitions the dataset into several clusters that capture the internal trends in data space. Each

data cluster is a fuzzy relation and corresponds to a fuzzy rule. The fuzzy sets of the rules are typically obtained by

projecting the identified clusters against the corresponding data axes [6]. This process however, results in a rule base

that usually exhibits redundancy in the form of highly overlapping similar fuzzy sets as shown in Fig. 2. For a Mamdani

model [16] parameters of fuzzy sets in both rule premises and consequents are obtained by this method but for a TSK

model fuzzy sets are only in the premises of fuzzy rules and can be obtained using this method. The parameters in

consequents are obtained using a cluster covariance matrix [17] or some other parameter estimation technique.

Subtractive clustering [18] is a widely used, efficient and simple clustering algorithm to obtain the initial rule base

for ANFIS. It is a well-known data clustering algorithm based on the improved Mountain method of data partitioning.

The user need not set the number of clusters as the algorithm automatically determines the best possible number of

clusters for a given input–output dataset. However, the resulting rule base size depends on the cluster neighborhood

radius. A small value results in a large rule base and therefore high model complexity while a large value leads to a

small rule base resulting in a poor model. The best value of this parameter is usually found by trial and error, which is also the approach followed in this paper. A detailed overview of this clustering algorithm is given next.

Subtractive clustering is based on calculating the potential function called mountain value at each data point of a

dataset. It uses each input data point in the dataset as a potential cluster center rather than using grid based formulation

in mountain clustering method. Thus this method achieves lower computational complexity for higher dimensional

data sets.

In subtractive clustering, the potential value at each data point di of a dataset D = {d1 , d2 , . . . , dP } is given by:

$$P_i = \sum_{j=1}^{P} e^{-\alpha \lVert d_i - d_j \rVert^2}, \qquad \alpha = \frac{4}{r_a^2} \tag{6}$$

where P is the number of data patterns and ra is a positive constant called the cluster radius, which defines the range of influence of

a cluster center along each dimension and affects the number of clusters generated. A smaller radius leads to a higher

number of clusters which may lead to over-fitting and vice versa. Therefore to find the appropriate number of clusters

for a dataset various values of radii may be tested and the one with the best results should be chosen. The data point

with the highest potential value P1 is selected as the cluster center c1 . In order to find the subsequent cluster centers

the potential values for each data point di are modified as:

$$P_i = P_i - P_1 \, e^{-\beta \lVert d_i - c_1 \rVert^2}, \qquad \beta = \frac{4}{r_b^2} \quad \text{and} \quad r_b = \eta r_a \tag{7}$$

where rb is a positive constant and η is the squash factor, which is used to squash the potential values of distant points that are to be considered part of a cluster. The reduction in potential is larger for data points near the newly found cluster center than for distant points, so the nearby points have the least chance of being selected as cluster centers. After reducing the potential values, the data point with the highest remaining potential is selected as the next cluster center and the potential of the rest of the data points is reduced again. In general, when the kth cluster center is selected, the potential value of the rest of the points is updated using:

$$P_i = P_i - P_k \, e^{-\beta \lVert d_i - c_k \rVert^2} \tag{8}$$

At the end of this process n cluster centers are obtained. Each cluster center is used as the basis of a single fuzzy rule that describes the system behavior in a region of the input-output space. As a result, when subtractive clustering is used with ANFIS and n data clusters are obtained, each input variable of ANFIS has n fuzzy sets associated with it and there are n fuzzy rules in the rule base.
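The potential updates of eqs. (6)-(8) can be sketched in a few lines of Python. This is only an illustrative sketch: the squash factor of 1.25 and the stopping rule (stop when the best remaining potential falls below a fraction `stop_ratio` of the first peak) are assumptions made here, since the text does not state which values or stopping criterion were used.

```python
import math

def subtractive_clustering(points, ra=0.5, squash=1.25, stop_ratio=0.15):
    """Pick cluster centers with the potential updates of eqs. (6)-(8).

    squash is eta (rb = eta * ra). stop_ratio is an assumed stopping rule:
    stop once the best remaining potential is below stop_ratio * first peak.
    """
    alpha = 4.0 / ra ** 2
    beta = 4.0 / (squash * ra) ** 2

    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    # Eq. (6): potential of every point with respect to all points.
    potential = [sum(math.exp(-alpha * sqdist(p, q)) for q in points)
                 for p in points]
    first_peak = max(potential)
    centers = []
    while True:
        k = max(range(len(points)), key=potential.__getitem__)
        peak = potential[k]
        if peak < stop_ratio * first_peak:
            break
        center = points[k]
        centers.append(center)
        # Eqs. (7)-(8): subtract the influence of the newly chosen center.
        potential = [pi - peak * math.exp(-beta * sqdist(p, center))
                     for pi, p in zip(potential, points)]
    return centers

# Two well separated groups of normalized 2-D points (made-up data).
data = [(0.10, 0.10), (0.12, 0.09), (0.09, 0.12),
        (0.90, 0.90), (0.88, 0.91), (0.91, 0.87)]
centers = subtractive_clustering(data, ra=0.5)
print(centers)
```

With one center per group, this toy dataset would give two fuzzy rules and two fuzzy sets per input variable.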

As depicted in Fig. 2, overlapping fuzzy sets may be present in the same region of a variable domain after clustering. This leads to redundancy, as the model uses multiple fuzzy sets to represent the same domain area, which unnecessarily increases model complexity and complicates the linguistic labeling of the rule base. In a simple non-redundant rule base it is easier to assign meaningful labels to the fuzzy sets, which facilitates the linguistic interpretation of a fuzzy model. Therefore, the resulting rule base needs to be analyzed for redundancy and simplified before tuning the parameters of the fuzzy sets so that the final model is interpretable.

5. Similarity measures

Fuzzy sets are similar if the membership functions defining them overlap heavily, which results in approximately equal membership degrees for the elements of a domain [6]. The most widely accepted and deeply studied methods in the literature for quantifying similarity between fuzzy sets are based on similarity measures [19–22]. A similarity measure S between two fuzzy sets is a function that assigns a similarity value s to two fuzzy sets A and B, i.e., $s = S(A, B)$.

The similarity value indicates the degree to which the fuzzy sets A and B are equal. A higher value indicates more similar fuzzy sets with a high degree of overlap, and vice versa. If $\mu_A$ and $\mu_B$ are the membership functions associated with fuzzy sets A and B respectively and U is the universe of discourse, then a similarity measure should satisfy the following criteria:

(2) If A and B are equal sets then $S(A, B) = 1$.

(3) If A and B are overlapping sets then $S(A, B) > 0$.

(4) Similarity is not altered if the domain in which the fuzzy sets are defined is scaled or shifted, i.e.,

$$S(A, B) = S(A', B'), \quad \mu_A(x) = \mu_{A'}(nx + m) \ \text{and} \ \mu_B(x) = \mu_{B'}(nx + m), \quad \text{where } m, n \in \mathbb{R},\ m > 0 \tag{13}$$

Various similarity measures have been proposed in the literature; an in-depth description can be found in [17]. Similarity measures are divided into two main categories, viz. geometric similarity measures and set theoretic similarity measures. Geometric similarity measures treat fuzzy sets as points in the data space and define similarity as the inverse of the distance between them. Many geometric similarity measures have been defined, including those based on the Minkowski distance [6] and the generalized Hausdorff distance [23]. Set theoretic similarity measures are considered apt for rule base simplification, as they determine similarity between overlapping fuzzy sets more appropriately and are not affected by ordering and scaling of a variable domain [24]. These measures are based on fuzzy set operations such as union and intersection. Below is an effective set theoretic similarity measure used in practice, which is based on fuzzy set union and intersection:

$$S(A, B) = \frac{|A \cap B|}{|A \cup B|} \tag{14}$$


where $|\cdot|$ denotes set cardinality, and ∩ and ∪ are the fuzzy set intersection and union operations respectively. This similarity measure satisfies all the criteria for similarity measures mentioned above and has been proven to be quite adequate in similarity analysis of fuzzy sets [5].

To simplify the initial rule base obtained from the data driven clustering process, we use a simplification technique that minimizes the number of fuzzy sets for each input variable of ANFIS by eliminating redundant fuzzy sets and then removing the redundant fuzzy rules. This simplification procedure, depicted in Fig. 4, can be broadly divided into two phases: in the first phase fuzzy sets are merged on the basis of similarity analysis using similarity measures, and in the second phase fuzzy rules with equivalent premises are merged. The two phases are discussed in the following subsections.

In this paper, the similarity measure defined in eq. (14) is used to obtain the degree of similarity between each pair of fuzzy sets for each input variable. Since fuzzy sets in the rule base are defined by membership functions, for a discrete universe $U = \{x_i \mid i = 1, 2, \ldots, n\}$, eq. (14) for fuzzy sets A and B can be expressed as:

$$S(A, B) = \frac{\sum_{i=1}^{n} [\mu_A(x_i) \wedge \mu_B(x_i)]}{\sum_{i=1}^{n} [\mu_A(x_i) \vee \mu_B(x_i)]} \tag{15}$$

where $\mu_A$ and $\mu_B$ are the membership functions associated with A and B respectively. The symbols ∧ and ∨ represent fuzzy intersection (minimum) and fuzzy union (maximum) respectively, consistent with eq. (14). A similarity threshold λ is also used; it is the similarity value above which the similarity between two fuzzy sets is considered significant. The value of the threshold significantly affects the model accuracy and interpretability. A smaller value means that more pairs of fuzzy sets have similarity above the threshold and are merged, which may lead to an over-simplified model with decreased accuracy. On the other hand, a larger value may retain redundancy in the model due to less merging, which can also decrease accuracy due to model over-fitting. Therefore, it is an important parameter in balancing the interpretability and the approximation accuracy of the model.
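As an illustration of eq. (15), the following sketch evaluates the similarity of Gaussian fuzzy sets on a discretized universe, taking the pointwise minimum as the intersection and the pointwise maximum as the union. The universe [0, 10], its resolution and the example parameters are arbitrary choices made for this example.

```python
import math

def gaussian(x, c, sigma):
    """Gaussian membership degree of x for center c and width sigma."""
    return math.exp(-((c - x) ** 2) / (2.0 * sigma ** 2))

def similarity(set_a, set_b, universe):
    """Eq. (15): sum of pointwise minima over sum of pointwise maxima."""
    num = den = 0.0
    for x in universe:
        mu_a = gaussian(x, *set_a)
        mu_b = gaussian(x, *set_b)
        num += min(mu_a, mu_b)   # fuzzy intersection
        den += max(mu_a, mu_b)   # fuzzy union
    return num / den

universe = [i / 100.0 for i in range(0, 1001)]   # 0.00, 0.01, ..., 10.00
A = (4.0, 1.0)   # (center, sigma)
B = (4.2, 1.0)   # almost the same set as A
C = (8.0, 1.0)   # far away from A

print(similarity(A, A, universe))
print(similarity(A, B, universe))
print(similarity(A, C, universe))
```

With a threshold such as λ = 0.75, the pair (A, B) would be a merge candidate while (A, C) would not.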

At each step of the merging process, the similarity measure is calculated between each distinct pair of fuzzy sets of each input variable. Then the pair of fuzzy sets A and B with the highest similarity value above the similarity threshold is selected from the rule base and replaced by a new fuzzy set in all the fuzzy rules where A and B are present. The new fuzzy set is obtained by merging the parameters of the membership functions of these two fuzzy sets. In this study, we have used the following method [15] to obtain the new fuzzy set C:

$$C_p = \frac{n_A A_p + n_B B_p}{n_A + n_B} \tag{16}$$

where $C_p$ is the vector of parameters defining fuzzy set C, $A_p$ and $B_p$ are the parameter vectors of A and B, and $n_A$ and $n_B$ represent the number of fuzzy sets merged before obtaining fuzzy sets A and B respectively. This method gives more weight to a fuzzy set that was itself obtained by merging fuzzy sets in previous iterations, and gives better results than the un-weighted average of the parameters of the two fuzzy sets being merged. The merging procedure continues until no pair of fuzzy sets has a similarity value greater than the similarity threshold λ.
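A small sketch of the weighted merge in eq. (16) is given below. It assumes that a fuzzy set that has never been merged carries a count of 1 (the text does not state the initial value, but a count of 0 would make the formula undefined), and it represents each Gaussian fuzzy set by a hypothetical `(center, sigma)` tuple.

```python
def merge_sets(params_a, n_a, params_b, n_b):
    """Eq. (16): count-weighted average of two parameter vectors.

    Returns the merged parameters and the merge count of the new set.
    """
    merged = tuple((n_a * a + n_b * b) / (n_a + n_b)
                   for a, b in zip(params_a, params_b))
    return merged, n_a + n_b

# First merge: two original sets, each with count 1 (assumed initial value).
C, n_c = merge_sets((4.0, 1.0), 1, (4.4, 1.2), 1)
# Second merge: C already represents two sets, so it weighs twice as much.
D, n_d = merge_sets(C, n_c, (4.8, 0.8), 1)
print(C, n_c)
print(D, n_d)
```

In the second merge the previously merged set pulls the result towards itself, which is the weighting effect described above.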

During the merging process the rule base is also updated as the fuzzy set obtained by merging two fuzzy sets

replaces both the fuzzy sets in all the fuzzy rules of the rule base. As a result redundant fuzzy rules may appear in the

rule base for which rule merging is done to reduce number of fuzzy rules.

Merging fuzzy sets and replacing the original fuzzy sets with the merged fuzzy set in the first phase of rule base simplification may lead to several fuzzy rules with equivalent premises and consequents if the initial rule base is highly redundant. The premise or the consequent parts of two fuzzy rules are equivalent if they have the same membership functions for each of the corresponding input and output variables. In Mamdani fuzzy models, either both the premises and consequents of some rules may become equal or only the premises may happen to be equal. As we are concerned with ANFIS, which is a TSK model, only the premises of fuzzy rules may become equivalent on fuzzy set merging, because rule consequents are not fuzzy at all in a TSK fuzzy system. The presence of such fuzzy rules with equivalent premises and unequal consequents leads to inconsistency and redundancy in the rule base, which needs to be addressed. When the rule base has m fuzzy rules with equal premises, m − 1 of these rules are removed from the rule base. However, the consequent parameters of the fuzzy rule kept in the rule base need to be re-estimated. The process is referred to as fuzzy rule merging (Fig. 3).

Fig. 3. Merging of two fuzzy sets (dotted curves) into a single fuzzy set (solid curve).

In order to remove the redundant fuzzy rules, m fuzzy rules with equal premises are replaced by a single fuzzy

rule with premise part equal to that of m original rules and consequent parameters are obtained by averaging the

corresponding parameters of m equal rules i.e.,

$$P = \frac{1}{m} \sum_{i=1}^{m} P_i \tag{17}$$

where $P_i$ is the vector of the consequent parameters of the ith rule. The whole rule base simplification procedure is depicted in Fig. 4. Given an initial fuzzy rule base with N fuzzy rules, n input variables and a preset similarity threshold λ ∈ (0, 1), the algorithm is as follows:

REPEAT:
    Step 1: for j = 1 to n
                for i = 1 to N
                    for k = 1 to N
                        $s_{ikj} = S(A_{ij}, A_{kj})$
            set $s_{lmq} = \max_{i \neq k}(s_{ikj})$
    Step 2: if $s_{lmq} \geq \lambda$
                merge $A_{lq}$ and $A_{mq}$ into A
                set $A_{lq} = A$ and $A_{mq} = A$ in all fuzzy rules
UNTIL: no two fuzzy sets in the rule base have similarity value $s_{ikj} \geq \lambda$, $i \neq k$.
Step 3: for i = 1 to N
            for j = 1 to N
                if $\mathrm{premise}(R_i) = \mathrm{premise}(R_j)$, $i \neq j$
                    i. merge $R_i$ and $R_j$ into R
                    ii. add R to the rule base
                    iii. remove $R_i$ and $R_j$ from the rule base
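The three steps above can be sketched end to end as follows. This is a minimal illustration under several assumptions that are not part of the original text: fuzzy sets are Gaussian `(center, sigma)` pairs stored per input under hypothetical string labels, similarity is evaluated with eq. (15) on a discretized domain, unmerged sets carry a merge count of 1, and a rule is a pair of a label tuple (premise) and a list of linear consequent parameters.

```python
import math

def gauss(x, c, s):
    return math.exp(-((c - x) ** 2) / (2.0 * s ** 2))

def similarity(a, b, grid):
    """Eq. (15) on a discretized domain."""
    num = sum(min(gauss(x, *a), gauss(x, *b)) for x in grid)
    den = sum(max(gauss(x, *a), gauss(x, *b)) for x in grid)
    return num / den

def simplify(sets, rules, grids, lam):
    """sets[j] maps a label to (c, sigma) for input j; rules is a list of
    (premise, consequent) with one label per input in the premise."""
    sets = [dict(s) for s in sets]
    counts = [{k: 1 for k in s} for s in sets]        # n_A, n_B of eq. (16)
    rules = [(tuple(p), list(c)) for p, c in rules]
    while True:
        # Step 1: find the most similar distinct pair over all inputs.
        best = None
        for j, s in enumerate(sets):
            labels = sorted(s)
            for a in range(len(labels)):
                for b in range(a + 1, len(labels)):
                    val = similarity(s[labels[a]], s[labels[b]], grids[j])
                    if best is None or val > best[0]:
                        best = (val, j, labels[a], labels[b])
        if best is None or best[0] < lam:
            break
        # Step 2: merge the pair (eq. 16) and relabel it in every rule.
        _, j, la, lb = best
        na, nb = counts[j][la], counts[j][lb]
        merged = tuple((na * u + nb * v) / (na + nb)
                       for u, v in zip(sets[j][la], sets[j][lb]))
        del sets[j][lb], counts[j][lb]
        sets[j][la], counts[j][la] = merged, na + nb
        rules = [(tuple(la if (i == j and lab == lb) else lab
                        for i, lab in enumerate(p)), c) for p, c in rules]
    # Step 3: average the consequents of rules with equal premises (eq. 17).
    groups = {}
    for p, c in rules:
        groups.setdefault(p, []).append(c)
    merged_rules = [(p, [sum(col) / len(col) for col in zip(*cs)])
                    for p, cs in groups.items()]
    return sets, merged_rules

grid = [i / 100.0 for i in range(1001)]
init_sets = [{"x1": (2.0, 1.0), "x2": (2.1, 1.0), "x3": (7.0, 1.0)},
             {"y1": (3.0, 1.0), "y2": (3.1, 1.0), "y3": (8.0, 1.0)}]
init_rules = [(("x1", "y1"), [1.0, 0.0, 0.5]),
              (("x2", "y2"), [3.0, 0.0, 1.5]),
              (("x3", "y3"), [0.0, 2.0, 0.0])]
new_sets, new_rules = simplify(init_sets, init_rules, [grid, grid], lam=0.75)
print(new_sets)
print(new_rules)
```

In this toy rule base the first two rules end up with identical premises after set merging, so they collapse into one rule whose consequent parameters are the average of the two originals.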

The optimal value of λ has been obtained by experimenting with different values and generally depends on the problem at hand. A much larger value of λ may retain complexity in the model, because some redundancies remain due to less fuzzy set merging. On the other hand, a much smaller value may unnecessarily remove some important fuzzy sets and fuzzy rules from the rule base due to excessive fuzzy set merging, which may result in a poor, under-fitted model. If the initial model is overestimated, rule base simplification also improves model generalization capability in addition to improving interpretability.


7. Training methodology

After rule base simplification, ANFIS becomes more tractable in terms of interpretability, but the approximation accuracy of the system is not acceptable at this stage. A parameter fine-tuning algorithm is needed to train the system so that a model with optimal or satisfactory accuracy is obtained. In ANFIS, the system parameters are the rule base parameters, which can be decomposed into two sets: the set A of non-linear parameters of the membership functions in the rule antecedents, and the set B of linear parameters in the rule consequents. We use a hybrid back-propagation based learning technique in batch mode that combines gradient descent (GD) and least squares estimation (LSE). This algorithm is more efficient than the standalone GD method in that it is faster and less likely to get stuck in local minima [2]. Each epoch of the hybrid learning method consists of a forward pass and a backward pass. During the forward pass, an instance of the data set is passed through the ANFIS structure via the input layer and outputs are obtained from the output layer. If X is the data set of size M, |B| = P and Y is the output vector, then for a fixed set of non-linear system parameters in A the three can be put in matrix equation form as:

$$XB = Y \tag{18}$$

where the dimensions of X, B and Y are M × P, P × 1 and M × 1 respectively. This is an over-determined problem without an exact solution, as the size of the data set is usually greater than the number of linear parameters. Therefore, a least squares estimate $B^*$ of B is used that minimizes the squared error $\|XB - Y\|^2$. The value of $B^*$ is obtained using the widely known formula based on the pseudo-inverse of X:

$$B^* = (X^T X)^{-1} X^T Y \tag{19}$$

where $X^T$ is the transpose of X and $(X^T X)^{-1} X^T$ is the pseudo-inverse of X. The above formula is computationally expensive, so $B^*$ can instead be computed iteratively using the sequential formulas used in [25]:

$$B_{i+1} = B_i + S_{i+1} x_{i+1} \left( y_{i+1}^T - x_{i+1}^T B_i \right) \tag{20}$$

$$S_{i+1} = S_i - \frac{S_i x_{i+1} x_{i+1}^T S_i}{1 + x_{i+1}^T S_i x_{i+1}}, \qquad 0 \leq i \leq M - 1 \tag{21}$$

where $x_i^T$ is the ith row vector of X, $y_i^T$ is the ith element of Y and $S_i$ is the covariance matrix. Initially $B_0 = 0$ and $S_0 = \alpha I$, where α is a positive number and I is an identity matrix of size P × P. For a system with n outputs, the derivation is still valid except that $y_i^T$ is the ith row of matrix Y.
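To make the relation between the batch formula (19) and the sequential formulas (20)-(21) concrete, the sketch below computes both with NumPy on synthetic data and checks that they agree. The data, the value α = 10^6 and the use of `np.linalg.solve` instead of an explicit matrix inverse are choices made for this example only.

```python
import numpy as np

def batch_lse(X, Y):
    """Eq. (19), solved as the normal equations (X^T X) B = X^T Y."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

def recursive_lse(X, Y, alpha=1e6):
    """Eqs. (20)-(21), consuming one row of X per iteration."""
    M, P = X.shape
    B = np.zeros((P, 1))            # B_0 = 0
    S = alpha * np.eye(P)           # S_0 = alpha * I
    for i in range(M):
        x = X[i].reshape(P, 1)      # column vector x_{i+1}
        y = Y[i].reshape(1, 1)      # y_{i+1}^T (single output)
        denom = 1.0 + (x.T @ S @ x).item()
        S = S - (S @ x @ x.T @ S) / denom      # eq. (21)
        B = B + S @ x @ (y - x.T @ B)          # eq. (20)
    return B

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_B = np.array([[1.5], [-2.0], [0.7]])
Y = X @ true_B + 0.01 * rng.normal(size=(50, 1))

B_batch = batch_lse(X, Y)
B_seq = recursive_lse(X, Y)
print(np.allclose(B_batch, B_seq, atol=1e-4))
```

The sequential version never forms or inverts $X^T X$, which is what makes it attractive when patterns are presented one at a time.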

Therefore, during each forward pass a data set pattern is passed into network, linear parameter set B is adjusted,

output is obtained while non-linear parameter set A stays constant. Next, based on fixed linear parameter values in B,

non-linear parameters of fuzzy sets in rule antecedents are updated. Using the outputs for each input data pattern p,

error measure Ep is calculated as the sum of squared errors as:

$$E_p = \sum_{i=1}^{n} (T_{i,p} - Y_{i,p})^2 \tag{22}$$

where, n is the number of system outputs, Yi,p is the ith system output and Ti,p is the real output value for input

pattern p. Using per pattern error measure the overall network error is calculated as:

$$E = \sum_{p=1}^{M} E_p \tag{23}$$

Starting from the nodes in output layer, a backward pass is done so that non-linear parameters in A are adjusted

in the direction of minimum error. This is done iteratively using gradient descent method by calculating error rates

from node activations at the nodes in various layers. The error rate at the ith output node is calculated from the

corresponding output value (node activation) as:

$$\frac{\partial E_p}{\partial Y_{i,p}} = -2 (T_{i,p} - Y_{i,p}) \tag{24}$$

The error rate at a node i in the hidden layer K is obtained using chain rule and is given by:

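$$\frac{\partial E_p}{\partial Y_{i,p}^{K}} = \sum_{j=1}^{n_{K+1}} \frac{\partial E_p}{\partial Y_{j,p}^{K+1}} \, \frac{\partial Y_{j,p}^{K+1}}{\partial Y_{i,p}^{K}} \tag{25}$$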

where $n_{K+1}$ is the number of nodes in the layer next to layer K. This gives the error rate at a hidden layer node as a linear combination of the error rates at the nodes in the next layer. If w is a generic updatable parameter, the error rate with respect to w for a data pattern p is given by:

$$\frac{\partial E_p}{\partial w} = \sum_{Y^* \in Q} \frac{\partial E_p}{\partial Y^*} \, \frac{\partial Y^*}{\partial w} \tag{26}$$

where, Q is the set of nodes whose output depends on parameter w. The derivative of the overall network error using

w is given by:

$$\frac{\partial E}{\partial w} = \sum_{p=1}^{M} \frac{\partial E_p}{\partial w} \tag{27}$$
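The chain rule of eqs. (24)-(27) can be checked numerically on a very small model. The sketch below is not the ANFIS network of this paper: it assumes a single-input, single-output, zero-order TSK system with Gaussian memberships (constant consequents), which is enough to show how the derivative of the overall error with respect to a membership function center is accumulated over patterns. All data and parameter values are made up.

```python
import math

def tsk_output(x, centers, sigmas, consequents):
    """Weighted average of constant consequents; also returns firing strengths."""
    w = [math.exp(-((c - x) ** 2) / (2.0 * s ** 2))
         for c, s in zip(centers, sigmas)]
    y = sum(wi * fi for wi, fi in zip(w, consequents)) / sum(w)
    return y, w

def total_error(data, centers, sigmas, consequents):
    """Eqs. (22)-(23) for a single-output system."""
    return sum((t - tsk_output(x, centers, sigmas, consequents)[0]) ** 2
               for x, t in data)

def grad_center(data, centers, sigmas, consequents, r):
    """dE/dc_r assembled with the chain rule of eqs. (24)-(27)."""
    grad = 0.0
    for x, t in data:
        y, w = tsk_output(x, centers, sigmas, consequents)
        dE_dy = -2.0 * (t - y)                               # eq. (24)
        dy_dw = (consequents[r] - y) / sum(w)                # output w.r.t. firing strength
        dw_dc = w[r] * (x - centers[r]) / sigmas[r] ** 2     # Gaussian w.r.t. its center
        grad += dE_dy * dy_dw * dw_dc                        # eqs. (25)-(27)
    return grad

data = [(0.0, 0.1), (1.0, 0.8), (2.0, 0.9), (3.0, 0.3)]     # (input, target)
centers, sigmas, consequents = [0.5, 2.5], [1.0, 1.2], [0.2, 1.0]

analytic = grad_center(data, centers, sigmas, consequents, r=0)
h = 1e-6                                                     # central finite difference
e_plus = total_error(data, [centers[0] + h, centers[1]], sigmas, consequents)
e_minus = total_error(data, [centers[0] - h, centers[1]], sigmas, consequents)
numeric = (e_plus - e_minus) / (2.0 * h)
print(analytic, numeric)
```

Agreement between the two printed values is a quick sanity check that the back-propagated derivative matches the definition of the error in eqs. (22)-(23).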


Using eq. (27) and learning rate η the generic parameter w is updated using

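$$\Delta w = -\eta \frac{\partial E}{\partial w}, \qquad \eta = \frac{s}{\sqrt{\sum_{w} \left( \frac{\partial E_p}{\partial w} \right)^2}} \tag{28}$$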

In case of a TSK fuzzy model the generic parameter w refers to a parameter of a membership function in the fuzzy

rule antecedent, η is the learning rate and s is the step size which is a parameter that may have an impact on the

convergence speed of the network. Using a small value for s helps to closely trail the gradient path but reduces the

convergence speed while a large value increases the convergence speed but the system may fluctuate about the optimal

solution. In this paper, two heuristic rules used in [2] have been used for updating s and gave satisfactory results:

ii. If the error measure shows a combination of an increase followed by a decrease twice consecutively, s is decreased by 10%.
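The normalized update of eq. (28) and the step size adaptation can be sketched as below. Two assumptions are made here and should be read as such: the gradient passed in is the derivative of the overall error E, and the first heuristic (increase s by 10% after four consecutive error reductions) is the rule commonly quoted from the original ANFIS work, since only the second rule is stated in the text above.

```python
import math

def normalized_step(grads, s):
    """Eq. (28): delta_w = -eta * dE/dw with eta = s / ||gradient||."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm == 0.0:
        return [0.0] * len(grads)       # nothing to update at a stationary point
    eta = s / norm
    return [-eta * g for g in grads]

def adapt_step_size(s, errors):
    """errors: error measure per epoch, oldest first."""
    if len(errors) >= 5:
        last = errors[-5:]
        diffs = [b - a for a, b in zip(last, last[1:])]
        # Assumed rule i: four consecutive reductions -> increase s by 10%.
        if all(d < 0 for d in diffs):
            return s * 1.1
        # Rule ii: (increase, decrease) twice in a row -> decrease s by 10%.
        if diffs[0] > 0 and diffs[1] < 0 and diffs[2] > 0 and diffs[3] < 0:
            return s * 0.9
    return s

delta = normalized_step([3.0, 4.0], s=0.1)
print(delta)
print(adapt_step_size(0.1, [5.0, 4.0, 3.0, 2.0, 1.0]))
print(adapt_step_size(0.1, [1.0, 2.0, 1.0, 2.0, 1.0]))
```

Because the gradient is normalized, the length of the parameter update always equals s, which is why s directly controls how closely the gradient path is followed.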

The parameter fine-tuning process in ANFIS may lead to a complex fuzzy rule base due to the unrestrained modification of membership function parameters. This results in a final fuzzy system with poor transparency even if an interpretability oriented approach is followed during structure learning. To guarantee interpretability of the final fuzzy system, constraints must be applied before applying updates to the parameters so that adjacent membership functions do not overlap excessively (i.e., stay distinguishable), do not take invalid parameter values (i.e., remain consistent), do not exchange positions with adjacent membership functions, and so on. This can be done by considering the semantic properties of the membership functions based on the width and position parameters, which determine the support size and location of a membership function respectively. We demonstrate the approach to retaining interpretability in terms of the well-known Gaussian membership function defined by eq. (29). The center c determines the location of the function, and the width σ, which is in fact the standard deviation of the function, determines its support for an input x in the domain of the variable.

$$f(x; c, \sigma) = e^{-\frac{(c - x)^2}{2\sigma^2}} \tag{29}$$

It is important to ensure that consistency of the membership functions is maintained during parameter updates. For

example, in case of a Gaussian membership function, if σ becomes negative, the membership function is incorrect and

therefore to maintain membership function validity, σ is set to a small value or simply zero. Similarly, if on update

the function center assumes a value that is lower than the lowest bound of the domain or greater than highest value,

the center is changed to be equal to the corresponding bound value that it crosses. That is if X is the domain in which

an input variable is defined having a membership function with center c,

if c < Xmin then c = Xmin and (30)

if c > Xmax then c = Xmax (31)
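The validity checks above amount to a small projection step applied after each parameter update. The sketch below uses a small positive floor `min_sigma` for the width instead of zero, which is an assumption made here so that eq. (29) stays well defined; the function names are illustrative.

```python
import math

def gaussian(x, c, sigma):
    """Eq. (29)."""
    return math.exp(-((c - x) ** 2) / (2.0 * sigma ** 2))

def enforce_validity(c, sigma, x_min, x_max, min_sigma=1e-3):
    """Keep a Gaussian membership function valid after an update."""
    if sigma <= 0.0:
        sigma = min_sigma                 # invalid width -> small positive value
    c = min(max(c, x_min), x_max)         # eqs. (30)-(31): clamp center to the domain
    return c, sigma

print(enforce_validity(-0.4, 0.8, 0.0, 10.0))
print(enforce_validity(12.3, -0.2, 0.0, 10.0))
print(enforce_validity(5.0, 1.0, 0.0, 10.0))
```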

To ensure sufficient but proportionate overlapping between the membership functions, either a possibility measure between each pair of fuzzy sets may be used or updates to the support parameters of the functions may be constrained. In this work we have used the latter method. If, on update, the extreme of the support of a Gaussian membership function f becomes greater than the extreme of the adjacent function j on its right side, overlapping is excessive. This is the case when:

$$c_f + 3\sigma_f > c_j + 3\sigma_j \tag{32}$$

To avoid this condition, $\sigma_f$ is changed as:

$$\sigma_f = \frac{c_j + 3\sigma_j - c_f}{3} \tag{33}$$

If excessive overlapping occurs with the left neighboring function i, i.e., the left extreme of the support of f falls below that of i:

$$c_f - 3\sigma_f < c_i - 3\sigma_i \tag{34}$$

$\sigma_f$ is changed as:

$$\sigma_f = \frac{c_i - 3\sigma_i - c_f}{-3} \tag{35}$$

For generic two-sided membership functions, the above constraints are applied to the updates of the position and support parameters of both the left and right components of the function.
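The width constraint can be illustrated with a short function. The right-hand case follows eqs. (32) and (33); the left-hand case is treated here as its mirror image, that is, the width is corrected with eq. (35) when the left extreme of f falls below the left extreme of its left neighbor. The `(center, sigma)` tuples and the function name are illustrative choices.

```python
def constrain_width(c_f, s_f, left=None, right=None):
    """Limit sigma of function f so its 3-sigma support stays inside the
    3-sigma support of its neighbours; left/right are (c, sigma) or None."""
    if right is not None:
        c_j, s_j = right
        if c_f + 3 * s_f > c_j + 3 * s_j:           # eq. (32)
            s_f = (c_j + 3 * s_j - c_f) / 3         # eq. (33)
    if left is not None:
        c_i, s_i = left
        if c_f - 3 * s_f < c_i - 3 * s_i:           # mirror image of eq. (32)
            s_f = (c_i - 3 * s_i - c_f) / -3        # eq. (35)
    return s_f

# f reaches to 11 on the right, its right neighbour only to 10.
print(constrain_width(5.0, 2.0, right=(7.0, 1.0)))
# f reaches to -1 on the left, its left neighbour only to 1.
print(constrain_width(5.0, 2.0, left=(4.0, 1.0)))
# No violation on either side: sigma is left unchanged.
print(constrain_width(5.0, 1.0, left=(3.0, 1.0), right=(7.0, 1.0)))
```

After the correction the corresponding support extremes coincide, so the updated function no longer extends past its neighbor on that side.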

For avoiding a membership function being completely or almost completely included in other function, the distance

between membership functions has to be monitored during parameter learning phase. If on updating the center of a

membership function k, the condition:

with its left neighboring function i does not hold, the centers of both the membership functions need to be changed

as:

$$c_k = \frac{c_k + c_i}{2} + \frac{\alpha (X_{\min} - X_{\max})}{2} \tag{37}$$

$$c_i = \frac{c_k + c_i}{2} - \frac{\alpha (X_{\min} - X_{\max})}{2} \tag{38}$$

where, γ in eq. (36) is the percentage of domain for obtaining least distance between membership functions. In

the same way inclusion with the right neighboring membership function can be removed. The no pass constraint

i.e. the adjacent functions do not exchange positions is ensured by simply comparing the position parameters of the

adjacent membership functions and changing them appropriately. The constraints can be applied after every epoch or

periodically after n epochs but the experiments indicated that this changes the accuracy and interpretability results of

the model accordingly. The constraints may be relaxed if the accuracy of the model is not satisfactory but it may affect

the interpretability of the system.

8. Experimental results

In this section we provide an experimental analysis of the effectiveness of rule base simplification and constrained learning in ANFIS by application to two well-known forecasting problems and a real world stock price prediction problem. In all the experiments Gaussian membership functions have been used for all the input variables. A comparison with conventional ANFIS on the basis of forecasting accuracy and interpretability aspects is provided and discussed.

In this section the proposed interpretability oriented neuro-fuzzy modelling approach to ANFIS is used for approx-

imating a well-known non-linear sinc function given by:

$$\operatorname{sinc}(x, y) = \frac{\sin(x)}{x} \times \frac{\sin(y)}{y} \tag{39}$$

Using the inputs in the range of [−15, 15] × [−15, 15] we obtained 372 input–output patterns. From this data set,

170 instances were used as training data, 102 instances as checking data and 100 records as test data.
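For reference, eq. (39) can be evaluated as in the sketch below. The value at x = 0 or y = 0 is taken as the limit 1, which eq. (39) leaves implicit. The sampling grid shown is purely illustrative; the text does not say how the 372 patterns were drawn, so no attempt is made to reproduce them.

```python
import math

def sinc1(v):
    """sin(v)/v with the limiting value 1 at v = 0."""
    return 1.0 if v == 0.0 else math.sin(v) / v

def sinc2(x, y):
    """Eq. (39)."""
    return sinc1(x) * sinc1(y)

# Illustrative sampling of [-15, 15] x [-15, 15] on a coarse regular grid.
axis = [-15.0 + 3.0 * k for k in range(11)]
patterns = [((x, y), sinc2(x, y)) for x in axis for y in axis]
print(len(patterns), sinc2(0.0, 0.0), sinc2(1.0, 2.0))
```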

The data was used in the structure learning phase to construct the initial fuzzy rule base for ANFIS. As already discussed in section 4.1, the numerical value of the cluster radius $r_a$ affects the number of data clusters obtained and therefore the number of fuzzy rules and fuzzy sets per variable obtained after applying subtractive clustering. The larger the value of $r_a$, the smaller the number of clusters, and vice versa. Therefore, a trial and error method is followed to find the optimal value of $r_a$, which is the one that gives the least error on test data. For this approximation problem a cluster radius of 0.5 was optimal, with the least training and testing RMSE. It resulted in 4 clusters and an initial rule base for ANFIS with 4 fuzzy rules and 4 membership functions per input variable.


Table 1
Rule base simplification results.

| Model | Fuzzy sets for input X | Fuzzy sets for input Y | Linear parameters | Non-linear parameters |
|---|---|---|---|---|
| Conventional ANFIS | 4 | 4 | 12 | 16 |
| Simplified ANFIS (λ = 0.80) | 4 | 2 | 12 | 12 |
| Simplified ANFIS (λ = 0.75) | 3 | 2 | 9 | 10 |
| Simplified ANFIS (λ = 0.70) | 2 | 2 | 9 | 8 |

Fig. 5. Training RMSE plots of ANFIS based on proposed approach and conventional ANFIS.

Fig. 6. Checking RMSE plots of ANFIS based on proposed approach and conventional ANFIS.

The ANFIS with 4 fuzzy rules in the rule base was subsequently passed through the simplification phase to remove any redundant fuzzy sets and fuzzy rules. Table 1 shows how the simplification process, using different values of the similarity threshold λ, reduces the number of fuzzy sets in the rule base through fuzzy set merging. Fuzzy rule merging may also occur when many fuzzy sets are merged, which reduces the number of fuzzy rules in the rule base. Table 1 also lists the structural properties of conventional ANFIS on the same data for comparison.
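The parameter counts in Table 1 can be cross-checked with simple arithmetic, assuming (this is not stated explicitly in the text) a first-order TSK consequent with one coefficient per input plus a constant, and two parameters (c and σ) per Gaussian fuzzy set. The rule counts used below are the ones reported in Table 2.

```python
def parameter_counts(sets_per_input, n_rules):
    """Return (linear, non-linear) parameter counts of an ANFIS rule base."""
    n_inputs = len(sets_per_input)
    linear = n_rules * (n_inputs + 1)        # first-order consequent per rule
    nonlinear = 2 * sum(sets_per_input)      # (c, sigma) per Gaussian set
    return linear, nonlinear

# (fuzzy sets for X and Y, number of rules) for each model in Table 1.
models = {
    "Conventional ANFIS": ((4, 4), 4),
    "Simplified ANFIS (0.80)": ((4, 2), 4),
    "Simplified ANFIS (0.75)": ((3, 2), 3),
    "Simplified ANFIS (0.70)": ((2, 2), 3),
}
for name, (sets, rules) in models.items():
    print(name, parameter_counts(sets, rules))
```

Under these assumptions the computed pairs match the linear and non-linear parameter columns of Table 1.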

For improving approximation accuracy, different simplified ANFIS versions were trained using constrained hy-

brid learning algorithm discussed in section 7. Conventional ANFIS was trained using unconstrained hybrid learning.

Training RMSE and checking RMSE plots during first 100 epochs for simplified ANFIS (λ = 0.75) and conventional

ANFIS are shown in Fig. 5 and Fig. 6 respectively which clearly indicate lower error values for simplified ANFIS.

Training was done for 500 epochs but the checking errors stayed at a constant value beyond epoch 198. The mem-

bership functions of conventional ANFIS and simplified ANFIS (λ = 0.75) after the completion of training phase are

shown in Fig. 7 and Fig. 8 respectively. The figures indicate better interpretable input space partitioning in case of

simplified ANFIS with non-overlapping membership functions as compared to that of conventional ANFIS. Table 2

shows the approximation accuracy of the models in terms of RMSE on test data. The results indicate how different

values of λ affect the RMSE value and therefore model generalization.

The similarity threshold λ affected the number of fuzzy sets and fuzzy rules merged in the system, which in turn affected the complexity and approximation accuracy of the system. It can be observed from Table 2 that simplified ANFIS showed the least RMSE at λ = 0.75. The most appropriate values for λ were in the range [0.70, 0.80], which led to simplified ANFIS versions with the least RMSE. Values of λ > 0.80 led to less fuzzy set and fuzzy rule merging, due to which redundancy persisted in the system, leading to an over-fitted model with higher RMSE on test data due to poor model generalization. Reducing λ below 0.70 led to the merging of many fuzzy sets with low similarity values, which over-simplified the model and thus increased the RMSE on test data.

Fig. 8. Membership functions of two inputs for ANFIS based on proposed approach (λ = 0.75).

Table 2
RMSE values on test data.

| Model | No. of fuzzy rules | Test data RMSE |
|---|---|---|
| ANFIS | 4 | 0.0567 |
| NEFPROX | 59 | 0.0571 |
| Simplified (λ = 0.80) and constrained ANFIS | 4 | 0.0546 |
| Simplified (λ = 0.75) and constrained ANFIS | 3 | 0.0534 |
| Simplified (λ = 0.70) and constrained ANFIS | 3 | 0.540 |

Thus, similarity threshold λ can be adjusted to obtain a desired balance between interpretability and approximation

accuracy of the system.

Therefore, using an appropriate similarity threshold during rule base simplification and constrained learning an

interpretable neuro-fuzzy model with better accuracy during training and testing phases can be obtained. Table 2 also

compares the RMSE of conventional ANFIS and various simplified versions of ANFIS with NEFPROX [4] based on

the same dataset. We have considered NEFPROX as it is a well-known neuro-fuzzy model for function approximation

based on interpretability criteria for neuro-fuzzy systems. It is evident from the results that ANFIS obtained using the

proposed modelling approach based on rule base simplification and constrained hybrid learning is simple implying

better interpretability and also has lesser RMSE on test data implying better generalization.


Fig. 9. A snapshot of the CNX Nifty stock price dataset with values of essential technical indicators and corresponding basic stock quantities.

Forecasting the direction of future stock market price fluctuations from historical prices is a prerequisite for in-

vestors and financial consultants for making efficient trading decisions. It is a complex and difficult task due to the

chaotic behavior and high uncertainty in the stock market prices. A number of research studies have employed ANFIS

to obtain accurate and reliable stock trading systems. We used the proposed fuzzy modelling methodology based on

data driven rule base induction, rule base simplification and constrained hybrid learning to build ANFIS with bet-

ter interpretability for predicting the day-ahead closing price. The impact of rule base simplification and constrained

learning on accuracy and interpretability of ANFIS is methodically investigated.

Daily CNX Nifty stock dataset of 2703 records from January 3, 2005 to December 24, 2015 comprising of five

fundamental stock quantities viz. maximum price, open price, minimum price, close price and trading volume of stock

was used as experimental dataset. Forecasting was done using technical indicators as inputs of the model which were

calculated from the four basic stock quantities.

A correlation matrix was used as the feature selection method to select the most essential technical indicators as model inputs. Based on a two-tailed significance test (using 0.05 as the significance level), the model inputs were: volume (VOL), moving average (MA), rate of change (ROC), relative strength index (RSI), stochastic oscillators (%K, %D), Williams' percent range (%R) and moving average convergence and divergence (MACD). Fig. 9 gives a snapshot of these essential technical indicators along with the basic stock quantities used to calculate them. The values of the indicators were calculated using 10 periods of the four basic stock quantities. 1700 records were used for model training, 603 records as validation data and 400 instances as test data.
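As an illustration of how such inputs are derived from the basic price series, the sketch below computes four of the indicators over a 10-period window. The formulas are common textbook definitions and are an assumption here, since the exact definitions used for the experiments are not given in the text; RSI, %D and MACD are omitted for brevity, and the price data is synthetic.

```python
def moving_average(close, n=10):
    """Simple n-period moving average of the closing price."""
    return [sum(close[i - n + 1:i + 1]) / n for i in range(n - 1, len(close))]

def rate_of_change(close, n=10):
    """Percentage change of the close relative to n periods earlier."""
    return [100.0 * (close[i] - close[i - n]) / close[i - n]
            for i in range(n, len(close))]

def stochastic_k(high, low, close, n=10):
    """%K: position of the close inside the n-period high-low range (0..100)."""
    out = []
    for i in range(n - 1, len(close)):
        hh, ll = max(high[i - n + 1:i + 1]), min(low[i - n + 1:i + 1])
        out.append(100.0 * (close[i] - ll) / (hh - ll))   # assumes hh > ll
    return out

def williams_r(high, low, close, n=10):
    """%R: same range as %K but measured from the top (-100..0)."""
    out = []
    for i in range(n - 1, len(close)):
        hh, ll = max(high[i - n + 1:i + 1]), min(low[i - n + 1:i + 1])
        out.append(-100.0 * (hh - close[i]) / (hh - ll))  # assumes hh > ll
    return out

# Synthetic, steadily rising prices (not the CNX Nifty data).
close = [100.0 + i for i in range(12)]
high = [c + 1.0 for c in close]
low = [c - 1.0 for c in close]

ma = moving_average(close)
roc = rate_of_change(close)
k = stochastic_k(high, low, close)
r = williams_r(high, low, close)
print(ma[0], roc[0], k[0], r[0])
```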

To obtain the initial rule base from data, subtractive clustering with cluster radius r = 0.5 was used, which resulted in 18 fuzzy rules and therefore 18 fuzzy sets per input. This system configuration was found optimal with respect to forecasting error. However, the initial rule base exhibited high redundancy in terms of highly overlapping membership functions for all the input variables. Therefore, fuzzy set merging was performed to remove redundant fuzzy sets of input variables, and subsequent rule deletion was done to simplify the model. Fuzzy set merging was done using different similarity thresholds; the results for three different values of λ are shown in Table 3. The results indicate a neuro-fuzzy model with a much simpler structure, having a smaller number of fuzzy sets per input variable and fewer linear and non-linear parameters than conventional ANFIS obtained using the same dataset. Fig. 10 and Fig. 11 compare the membership functions of simplified ANFIS (λ = 0.70) and conventional ANFIS for input 5 (%K) and input 7 (%R). It can be clearly observed from the figures how rule base simplification using an appropriate similarity threshold λ removes the redundant fuzzy sets from the rule base, thereby simplifying the model and improving interpretability.

Table 3
Rule base simplification results. The columns VOL to MACD give the number of fuzzy sets per input variable.

| Stock prediction model | VOL | MA | ROC | RSI | %K | %D | %R | MACD | Linear parameters | Non-linear parameters |
|---|---|---|---|---|---|---|---|---|---|---|
| Conventional ANFIS | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | 162 | 288 |
| Simplified (λ = 0.80) and constrained ANFIS | 3 | 7 | 5 | 7 | 8 | 7 | 8 | 6 | 162 | 102 |
| Simplified (λ = 0.75) and constrained ANFIS | 6 | 7 | 3 | 7 | 2 | 7 | 4 | 6 | 144 | 84 |
| Simplified (λ = 0.70) and constrained ANFIS | 2 | 6 | 2 | 7 | 2 | 6 | 4 | 6 | 144 | 70 |

Fig. 10. Membership functions for two inputs of simplified ANFIS (λ = 0.75).

The prediction accuracy of three versions of simplified ANFIS in Table 3 was improved by training the model using

constrained hybrid learning algorithm for 1500 epochs above which the RMSE on checking data remained more or less

constant. After training, test data was used to assess generalization capability. The RMSE values of different versions

of the simplified ANFIS obtained using various values of similarity threshold λ trained using constrained hybrid

learning are depicted in Table 4. Different versions of simplified ANFIS showed different RMSE values on test data.

Table 4 also compares the RMSE values with conventional ANFIS and NEFPROX on the same test data. Conventional

ANFIS and NEFPROX models were obtained using the same training and checking data sets as used for simplified

ANFIS. Evidently, simplified ANFIS obtained using λ = 0.75 and trained using constrained hybrid learning showed

least RMSE indicating a better forecasting accuracy. The model therefore obtains a better accuracy-interpretability

tradeoff as it is much simpler than conventional ANFIS with lesser number of fuzzy rules and much smaller number

of fuzzy sets per input variable (Table 3) but shows lowest test data RMSE. Reducing λ further increased RMSE on

test data due to model over-simplification.

Fig. 12 and Fig. 13 show the plots of predicted close prices of the ANFIS based on proposed approach (λ = 0.75)

and conventional ANFIS against the actual next day close prices in the test dataset respectively. The day ahead close

price plots for simplified ANFIS with constrained learning shows meagre difference between the actual and output

close prices. This shows the applicability of ANFIS based on proposed interpretability oriented modelling technique

for stock price prediction which is characterized by tremendous amount of noise in stock market data. Therefore, it


Table 4

RMSE values on test data.

| Model | No. of fuzzy rules | Test data RMSE |
| --- | --- | --- |
| Conventional ANFIS | 18 | 64.6742 |
| NEFPROX | 261 | 70.455 |
| Simplified (λ = 0.80) and constrained ANFIS | 18 | 65.8591 |
| Simplified (λ = 0.75) and constrained ANFIS | 16 | 61.2799 |
| Simplified (λ = 0.70) and constrained ANFIS | 16 | 63.3878 |

Fig. 12. Actual close prices and output close prices for ANFIS based on proposed approach (λ = 0.75) on test data.

Fig. 13. Actual close prices and output close prices for conventional ANFIS on test data.

is evident that, when rule base simplification with an appropriate similarity threshold is combined with constrained hybrid learning, ANFIS demonstrates better forecasting accuracy on test data while also ensuring the interpretability of the final optimized model. The result is a simple stock price forecasting model with a better accuracy-interpretability tradeoff.
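The comparison above can be reproduced with a few lines of Python. This is only a sketch: the `rmse` helper is the standard root mean squared error, the dictionary simply copies the test data RMSE values from Table 4, and the names `table4`, `best` and `gain` are illustrative choices, not part of the original study.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

# Test data RMSE values copied from Table 4 (not recomputed here).
table4 = {
    "Conventional ANFIS": 64.6742,
    "NEFPROX": 70.455,
    "Simplified (lambda = 0.80)": 65.8591,
    "Simplified (lambda = 0.75)": 61.2799,
    "Simplified (lambda = 0.70)": 63.3878,
}

# Model with the lowest test RMSE and its relative gain over conventional ANFIS.
best = min(table4, key=table4.get)
baseline = table4["Conventional ANFIS"]
gain = (baseline - table4[best]) / baseline
```

With these values, `best` is the λ = 0.75 model and `gain` is a little over 5% relative to conventional ANFIS.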

In this subsection, chaotic time series prediction is considered, using the Mackey–Glass time series. This time series is generated by the Mackey–Glass delay differential equation:

$$\dot{x}(t) = \frac{0.2\,x(t-\tau)}{1 + x^{10}(t-\tau)} - 0.1\,x(t) \qquad (40)$$

Using previous values of the time series up to a point t, separated by an equal interval I, the future value at some point t + p is predicted. That is, given N known values x(t − (N − 1)I), . . . , x(t − I), x(t), the future value x(t + p) is predicted. In this experiment we used N = 4, so that the model has four inputs, and p = I = 6. We obtained 1000 input–output patterns using the method described in [2], with an integration time step of 0.1, x(0) = 1.2, t within the range [0, 2000] and τ = 17. To build the model, 500 instances were used as training data and the remaining 500 as validation data.
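A minimal sketch of how such a data set could be generated is given below. It integrates Eq. (40) with a fourth-order Runge–Kutta scheme in which the delayed term is held fixed within each step, and then forms patterns [x(t − 18), x(t − 12), x(t − 6), x(t); x(t + 6)]. Several details are assumptions rather than facts taken from this paper: the zero history for t < 0, the starting index `start=118`, and the function names `mackey_glass` and `make_patterns` are all illustrative.

```python
def mackey_glass(t_max=2000, h=0.1, tau=17.0, x0=1.2):
    """Integrate Eq. (40) with RK4 and return samples at integer t = 0..t_max."""
    delay_steps = int(round(tau / h))
    n_steps = int(round(t_max / h))
    x = [x0]
    for k in range(n_steps):
        # Assumption: the history before t = 0 is zero.
        x_tau = x[k - delay_steps] if k >= delay_steps else 0.0
        drive = 0.2 * x_tau / (1.0 + x_tau ** 10)

        def f(v):
            # Right-hand side of Eq. (40) with the delayed term held fixed.
            return drive - 0.1 * v

        k1 = f(x[k])
        k2 = f(x[k] + 0.5 * h * k1)
        k3 = f(x[k] + 0.5 * h * k2)
        k4 = f(x[k] + h * k3)
        x.append(x[k] + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0)
    per_unit = int(round(1.0 / h))
    return x[::per_unit]

def make_patterns(series, n_inputs=4, interval=6, horizon=6, start=118, count=1000):
    """Build (inputs, target) pairs with N = n_inputs and p = I = 6 by default."""
    patterns = []
    for t in range(start, start + count):
        inputs = [series[t - (n_inputs - 1 - j) * interval] for j in range(n_inputs)]
        patterns.append((inputs, series[t + horizon]))
    return patterns
```

Calling `make_patterns(mackey_glass())` yields 1000 patterns, which can then be split into the first 500 for training and the remaining 500 for validation.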


Table 5

Rule base simplification results. Columns X(t − 18) to X(t) give the number of fuzzy sets per input variable.

| Model | X(t − 18) | X(t − 12) | X(t − 6) | X(t) | Linear parameters | Non-linear parameters |
| --- | --- | --- | --- | --- | --- | --- |
| Conventional ANFIS | 10 | 10 | 10 | 10 | 50 | 80 |
| Simplified (λ = 0.75) and constrained ANFIS | 6 | 5 | 8 | 6 | 50 | 50 |
| Simplified (λ = 0.70) and constrained ANFIS | 5 | 5 | 6 | 6 | 50 | 44 |
| Simplified (λ = 0.65) and constrained ANFIS | 4 | 5 | 6 | 6 | 50 | 42 |

Table 6

RMSE values on test data.

| Model | No. of fuzzy rules | Test data RMSE |
| --- | --- | --- |
| Conventional ANFIS | 10 | 0.0017 |
| Simplified (λ = 0.75) and constrained ANFIS | 10 | 0.0035 |
| Simplified (λ = 0.70) and constrained ANFIS | 10 | 0.0019 |
| Simplified (λ = 0.65) and constrained ANFIS | 10 | 0.0027 |
| NEFPROX | 128 | 0.0301 |

Subtractive clustering with neighborhood radii of 0.5 and 0.4 resulted in 10 and 15 fuzzy rules, respectively. In terms of both RMSE on test data and model simplicity, the rule base with 10 rules performed better, so the model with 10 initial fuzzy rules was taken forward to the rule base simplification phase. The simplification results are given in Table 5, which also shows the configuration of the conventional un-simplified ANFIS. Different values of λ led to different versions of simplified ANFIS, which in turn affected the forecasting accuracy. In terms of the number of fuzzy sets per input variable, and therefore the number of non-linear parameters, ANFIS is significantly simplified: the non-linear parameter count falls from 80 to between 42 and 50, while the number of rules and linear parameters is unchanged.
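The parameter counts in Table 5 can be checked with a short calculation. The sketch below assumes first-order TSK consequents (one coefficient per input plus a constant in every rule) and membership functions with two parameters each, such as a Gaussian centre and width. Both are assumptions made for illustration, although they do reproduce every row of Table 5. The function name `parameter_counts` is hypothetical.

```python
def parameter_counts(sets_per_input, n_rules, params_per_set=2):
    """Return (linear, non-linear) parameter counts for a TSK rule base.

    Assumes first-order consequents and params_per_set parameters per fuzzy set.
    """
    n_inputs = len(sets_per_input)
    linear = n_rules * (n_inputs + 1)
    nonlinear = params_per_set * sum(sets_per_input)
    return linear, nonlinear

# Fuzzy sets per input variable, copied from Table 5; all models have 10 rules.
conventional = parameter_counts([10, 10, 10, 10], n_rules=10)
lambda_075 = parameter_counts([6, 5, 8, 6], n_rules=10)
lambda_070 = parameter_counts([5, 5, 6, 6], n_rules=10)
lambda_065 = parameter_counts([4, 5, 6, 6], n_rules=10)
```

Under these assumptions only the non-linear count changes with λ, because merging similar fuzzy sets leaves the number of rules, and hence the consequent parameters, untouched.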

Constrained hybrid learning was used to improve the approximation accuracy of the different versions of simplified ANFIS. Fig. 14 (right) shows the fuzzy sets of the four input variables of the final conventional un-simplified ANFIS obtained using unconstrained hybrid learning. They indicate high model complexity, caused by similar fuzzy sets with highly overlapping membership functions and by unconstrained parameter updates. ANFIS based on the proposed neuro-fuzzy approach is much more interpretable in terms of the fuzzy partitions of the input variables, as can be observed from Fig. 14 (left). The reason is the removal of highly overlapping membership functions by rule base simplification and the application of constrained learning for tuning the membership function parameters.
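To illustrate how highly overlapping membership functions might be detected, the following sketch computes a set-theoretic similarity between two fuzzy sets and flags pairs whose similarity reaches the threshold λ. It assumes Gaussian membership functions given as (centre, width) pairs and a Jaccard-type measure |A ∩ B| / |A ∪ B| with min and max as intersection and union, evaluated on a uniform grid. These choices, the grid size and the function names are assumptions for illustration and may differ from the exact measure and merging rule used in this work.

```python
import math

def gaussian(x, center, sigma):
    """Gaussian membership degree of x."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)

def jaccard_similarity(a, b, lo, hi, n=1001):
    """Approximate |A ∩ B| / |A ∪ B| on n grid points in [lo, hi]."""
    inter = 0.0
    union = 0.0
    for i in range(n):
        x = lo + (hi - lo) * i / (n - 1)
        mu_a = gaussian(x, *a)
        mu_b = gaussian(x, *b)
        inter += min(mu_a, mu_b)   # min as fuzzy intersection
        union += max(mu_a, mu_b)   # max as fuzzy union
    return inter / union

def similar_pairs(sets, lam, lo, hi):
    """Index pairs (i, j), i < j, whose similarity is at least lam."""
    pairs = []
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            if jaccard_similarity(sets[i], sets[j], lo, hi) >= lam:
                pairs.append((i, j))
    return pairs

# Hypothetical fuzzy sets: the first two nearly coincide, the third is distinct.
example_sets = [(0.50, 0.10), (0.51, 0.10), (1.00, 0.10)]
flagged = similar_pairs(example_sets, lam=0.75, lo=0.0, hi=1.5)
```

In this example only the first two sets are flagged as candidates for merging. Lowering λ flags more pairs and therefore simplifies the rule base more aggressively.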

At the end of the learning phase the models were tested for generalization accuracy using an independent test dataset. The results are shown in Table 6, which also compares the forecasting accuracy of ANFIS based on the proposed modelling approach with conventional ANFIS and NEFPROX for this approximation problem. The forecasting accuracy differs across the simplified versions of ANFIS. Amongst the simplified models, the version obtained using λ = 0.70, with just five or six membership functions per input variable and trained using constrained parameter learning, has the lowest RMSE on test data (0.0019), close to the 0.0017 of conventional ANFIS and far below the 0.0301 of NEFPROX. The result is significant because this version of ANFIS is highly interpretable, with low complexity due to the removal of overlapping fuzzy sets and well-distinguished fuzzy partitions of the input variables, and yet shows a small RMSE value. This confirms the need for rule base simplification and constrained learning during fuzzy modelling in order to achieve a balance between simplicity and approximation accuracy.

9. Conclusion

In this paper we proposed a novel approach to system identification using ANFIS, oriented towards improved interpretability and generalization capability. The methodology applies an effective rule base simplification technique to the initial fuzzy model obtained via data-driven approaches, and then uses a constrained hybrid learning method that separately tunes the linear and non-linear parameters of the rule base to maximize system accuracy.

The approach has been successfully applied to three simulation examples, and a thorough analysis of its impact on the interpretability and accuracy of ANFIS has been performed. It has been experimentally


Fig. 14. Membership functions of four inputs for ANFIS based on rule base simplification and constrained learning (left) after learning phase and

corresponding membership functions (right) in case of conventional ANFIS.

demonstrated that rule base simplification using similarity analysis based on a set-theoretic similarity measure effectively removes redundant fuzzy sets and fuzzy rules from the rule base. This simplifies the model and enhances its generalization capability. Further, by adjusting the value of the similarity threshold during the rule base simplification process, a desired interpretability-accuracy tradeoff can be obtained. The constrained hybrid technique based on GD and LSE effectively tunes the system while ensuring consistency, limited overlapping, no position exchange and distinguishability of the fuzzy sets of the resulting neuro-fuzzy system. The experimental results have shown that the proposed neuro-fuzzy modelling approach results in ANFIS with better approximation accuracy and interpretability features, and is thus useful in obtaining an effective and simple TSK neuro-fuzzy framework for real system identification and forecasting problems.


References

[1] J. Casillas, O. Cordon, et al., Interpretability Issues in Fuzzy Modeling, Springer, Berlin–Heidelberg, 2003.

[2] J.S.R. Jang, ANFIS: Adaptive-Network-Based Fuzzy Inference System, IEEE Trans. Syst. Man Cybern. 23 (1993) 665–685.

[3] M. Sugeno, G.T. Kang, Structure identification of fuzzy model, Fuzzy Sets Syst. 28 (1) (1988) 15–33.

[4] D. Nauck, R. Kruse, Neuro-fuzzy systems for function approximation, Fuzzy Sets Syst. 101 (1999) 261–271.

[5] M. Setnes, H. Roubos, GA-fuzzy modeling and classification: complexity and performance, IEEE Trans. Fuzzy Syst. 8 (1995) 509–522.

[6] M. Setnes, R. Babuska, et al., Similarity measures in fuzzy rule base simplification, IEEE Trans. Syst. Man Cybern. 3 (1998) 376–386.

[7] M.Y. Chen, D.A. Linkens, Rule-base self-generation and simplification for data-driven fuzzy models, Fuzzy Sets Syst. 142 (2004) 243–265.

[8] Y. Jin, Fuzzy modeling of high-dimensional systems: complexity reduction and interpretability improvement, IEEE Trans. Fuzzy Syst. 2

(2000) 212–221.

[9] A. Klose, A. Nurnberger, D. Nauck, Improved NEFCLASS pruning techniques applied to a real world domain, in: Proceedings of Neuronale

Netze in der Anwendung, NN’99, University of Magdeburg, 1999.

[10] L.T. Koczy, K. Hirota, Size reduction by interpolation in fuzzy rule bases, IEEE Trans. Syst. Man Cybern. 27 (1) (1997) 14–25.

[11] J. Espinosa, J. Vandewalle, Constructing fuzzy models with linguistic integrity from numerical data-AFRELI Algorithm, IEEE Trans. Fuzzy

Syst. 8 (2000) 591–600.

[12] V.D. Oliveira, Towards neuro-linguistic modeling: constraints for optimization of membership functions, Fuzzy Sets Syst. 3 (1999) 357–380.

[13] C. Mencar, A.M. Fanelli, Interpretability constraints for fuzzy information granulation, Inf. Sci. 178 (2008) 4585–4618.

[14] D. Nauck, R. Kruse, A neuro-fuzzy to obtain interpretable model for function approximation, in: Proceedings of IEEE Conference on Fuzzy

Systems, 1998, pp. 1106–1111.

[15] R.R. Paiva, A. Dourado, Interpretability and learning in neuro-fuzzy systems, Fuzzy Sets Syst. 147 (2004) 17–38.

[16] E.H. Mamdani, Application of fuzzy algorithms for control of a simple dynamic plant, Proc. Inst. Electr. Eng. 12 (1974) 1585–1588.

[17] I. Couso, L. Garrido, L. Sánchez, Similarity and dissimilarity measures between fuzzy sets: a formal relational study, Inf. Sci. 229 (2013)

122–141.

[18] S. Chiu, Fuzzy model identification based on cluster estimation, J. Intell. Fuzzy Syst. 2 (1994) 267–278.

[19] C.P. Pappis, N.I. Karacapilidis, A comparative assessment of measures of similarity of fuzzy value, Fuzzy Sets Syst. 56 (1993) 171–174.

[20] S.M. Chen, M.S. Yeh, P.Y. Hsiao, A comparison of similarity measures of fuzzy values, Fuzzy Sets Syst. 72 (1995) 79–89.

[21] I. Beg, S. Ashraf, Similarity measures for fuzzy sets, Appl. Comput. Math. 8 (2009) 192–202.

[22] D. Guha, D. Chakraborty, New approach to fuzzy distance measure and similarity measure between two generalized fuzzy numbers, Appl.

Soft Comput. 10 (2010) 90–99.

[23] A.L. Ralescu, D.A. Ralescu, Probability and fuzziness, Inf. Sci. 34 (1984) 85–92.

[24] R. Zwick, E. Carlstein, D.V. Budescu, Measures of similarity among fuzzy concepts: a comparative analysis, Int. J. Approx. Reason. 1 (2)

(1987) 221–242.

[25] K.J. Åström, B. Wittenmark, Computer-Controlled Systems: Theory and Design, Prentice-Hall, 1984.

[26] S. Rajab, V. Sharma, A review on the applications of neuro-fuzzy systems in business, Artif. Intell. Rev. 49 (4) (2018) 481–510.

[27] J.M. Alonso, C. Castiello, M. Lucarelli, C. Mencar, Modeling interpretable fuzzy rule-based classifiers for medical decision support, in: Data

Mining: Concepts, Methodologies, Tools, and Applications, IGI Global, 2015.

[28] K. Łapa, K. Cpałka, L. Wang, New approach for interpretability of neuro-fuzzy systems with parametrized triangular norms, in: International Conference on Artificial Intelligence and Soft Computing, ICAISC 2016, in: Lecture Notes in Computer Science, vol. 9692, 2016, pp. 248–265.

[29] M. Pota, M. Esposito, Insights into interpretability of neuro-fuzzy systems, in: 16th World Congress of the International Fuzzy Systems

Association, IFSA, 2015.

[30] P.R. Dian, S.M. Shamsuddin, S.S. Yuhaniz, Particle swarm optimization for ANFIS interpretability and accuracy, Soft Comput. 20 (2016)

251–262.

[31] X. Zhao, L. Zhang, P. Shi, H.R. Karimi, Novel stability criteria for T–S fuzzy systems, IEEE Trans. Fuzzy Syst. 22 (2014) 313–323.

[32] S.K. Kommuri, M. Defoort, H.R. Karimi, K.C. Veluvolu, A robust observer-based sensor fault-tolerant control for PMSM in electric vehicles,

IEEE Trans. Ind. Electron. 63 (2016) 7671–7681.

[33] H.R. Karimi, B. Lohmann, B. Moshiri, P.J. Maralani, Wavelet-based identification and control design for a class of nonlinear systems, Int. J.

Wavelets Multiresolut. Inf. Process. 4 (2006) 213–226.

[34] Y. Wang, H. Shen, H.R. Karimi, D. Duan, Dissipativity-based fuzzy integral sliding mode control of continuous-time T–S fuzzy systems,

IEEE Trans. Fuzzy Syst. 26 (2018) 1164–1176.

[35] B. Jiang, H.R. Karimi, Y. Kao, C. Gao, A novel robust fuzzy integral sliding mode control for nonlinear semi-Markovian jump T–S fuzzy

systems, IEEE Trans. Fuzzy Syst. (2018), https://doi.org/10.1109/TFUZZ.2018.2838552.

[36] T. Youssef, M. Chadli, H.R. Karimi, R. Wang, Actuator and sensor faults estimation based on proportional integral observer for TS fuzzy

model, J. Franklin Inst. 354 (2017) 2524–2542.

[37] L. Zhao, W. Pawlus, H.R. Karimi, K.G. Robbersmyr, Data-based modeling of vehicle crash using adaptive neural-fuzzy inference system,

IEEE/ASME Trans. Mechatron. 19 (2014) 684–696.

[38] X.C. Dong, Y.Y. Zhao, H.R. Karimi, P. Shi, Adaptive variable structure fuzzy neural identification and control for a class of MIMO nonlinear

system, J. Franklin Inst. 350 (2013) 1221–1247.

[39] I. Ebtehaj, H. Bonakdari, Performance evaluation of adaptive neural fuzzy inference system for sediment transport in sewers, Water Resour.

Manag. 28 (2014) 4765–4779.

[40] T. Uçar, A. Karahoca, D. Karahoca, Tuberculosis disease diagnosis by using adaptive neuro fuzzy inference system and rough sets, Neural

Comput. Appl. 23 (2013) 471–483.


[41] X. Xue, X. Yang, Application of the adaptive neuro-fuzzy inference system for prediction of soil liquefaction, Nat. Hazards 67 (2013) 901–917.

[42] S. Naji, S. Shamshirband, H. Basser, D. Petković, Application of adaptive neuro-fuzzy methodology for estimating building energy consumption (Tier 1), Renew. Sustain. Energy Rev. 53 (2015) 1520–1528.

[43] X.P. Xie, Z.W. Liu, X.L. Zhu, An efficient approach for reducing the conservatism of LMI-based stability conditions for continuous-time T–S

fuzzy systems, Fuzzy Sets Syst. 263 (2015) 71–81.

[44] X.P. Xie, D. Yue, H. Zhang, C. Peng, Control synthesis of discrete-time T–S fuzzy systems: reducing the conservatism whilst alleviating the

computational burden, IEEE Trans. Cybern. 47 (2017) 2480–2490.

[45] L. Huang, K. Wang, P. Shi, H.R. Karimi, A novel identification method for generalized TS fuzzy systems, Math. Probl. Eng. 2012 (2012)

893807.

[46] K. Kolusa, D.I. Philippe, A. Dubé, D. Dubeau, Classifying work rate from heart rate measurements using an adaptive neuro-fuzzy inference

system, Appl. Ergon. 54 (2016) 158–168.