
Chemometrics and Intelligent Laboratory Systems 162 (2017) 73–82

Fault propagation path estimation in NGL fractionation process using principal component analysis

Usama Ahmed (a), Daegeun Ha (a), Jinjoo An (a), Umer Zahid (b), Chonghun Han (a)
(a) School of Chemical and Biological Engineering, Seoul National University, Seoul 151-744, Republic of Korea
(b) Chemical Engineering Department, King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia


Keywords: Principal Component Analysis (PCA); Fault detection; Singular Value Decomposition (SVD); Residual subspace (RS); Fault propagation path estimation

Multivariate statistical methods for process monitoring are attracting considerable attention in the chemical and process industries as a means to enhance both process performance and safety. A fault in one process variable readily affects the other variables, which makes it difficult to identify the fault variable precisely. In this study, a principal component analysis (PCA) model has been developed and applied to monitor the NGL (natural gas liquid) fractionation process. Normal and fault case scenarios are developed and compared statistically to identify the fault variable and to estimate the fault propagation path in the system. The simulated NGL plant is first validated against the design data, and then the developed methodology is applied to predict the fault direction by projecting the samples on the residual subspace (RS). The RS of fault data is usually superimposed by normal variations, which must be eliminated to amplify the fault magnitude. The RS is further transformed into a covariance matrix, followed by Singular Value Decomposition (SVD) analysis to generate the fault direction matrix corresponding to the highest eigenvalue. The process variables are further analyzed according to the magnitude of their contribution towards a particular fault, which in turn can be used to determine the fault propagation path in the system. Furthermore, the applied methodology can quickly detect the fault variable irrespective of whether the fault detection indices are used, since the variable showing the highest variation is most likely to be the fault variable.

1. Introduction

The development in automation and control systems makes it possible to collect a large amount of data from both process and product development industries. However, analyzing and interpreting the data has always remained the key issue. Quick data analysis using various statistical tools has already enhanced process performance and reduced industrial waste, and it still has the potential to improve process economics. Real-time process monitoring and fault diagnosis are gaining a lot of attention in process industries to enhance both performance and product quality. The safe operation of chemical plants demands a large number of sensors to monitor the process variables. However, an increase in the number of sensors not only increases the chances of sensor faults but also makes it difficult to analyze the recorded data.

Data-driven models for fault detection and diagnosis have been widely used in process industries during the last few decades. Modern industries having large-scale unit operations and unit processes with a multi-level control hierarchy are good candidates for the application of data-driven models. The success of these models can be seen in various industries, for example, semiconductor manufacturing [1], chemical [2] and steel industries [3]. Based on knowledge, experience and existing models, several univariate and multivariate statistical methods have been used for fault detection and diagnosis. Most process variables exhibit a strong correlation, making multivariate statistical methods the preferred approach over univariate statistical methods for fault detection and diagnosis. Principal component analysis (PCA) is an efficient multivariate statistical technique used for process monitoring and control [4,5], fault detection and diagnosis [6,7], and sensor validation [8,9]. The basic strategy of PCA is to transform high-dimensional correlated data into a few uncorrelated principal components while retaining the original information. The PCA method can also be used as a tool that models the process behavior in terms of variables during normal operation and compares the changes in variables during a fault situation. Hotelling's $T^2$ statistic and the Q-statistic (squared prediction error, SPE) are extensively used for fault detection in various industrial applications [10–15]. The $T^2$ statistic and SPE represent the systematic and

Corresponding author.
E-mail address: (C. Han).
Received 31 May 2016; Received in revised form 22 December 2016; Accepted 9 January 2017
Available online 10 January 2017
0169-7439/ © 2017 Elsevier B.V. All rights reserved.

Nomenclature

PC    Principal component
PCA   Principal component analysis
SVD   Singular value decomposition
SPE   Squared prediction error
PCS   Principal component subspace
RS    Residual subspace
AGR   Acid gas removal
NG    Natural gas
NGL   Natural gas liquids

residual part of the process variation in the principal component subspace (PCS) and residual subspace (RS), respectively. Both the $T^2$ statistic and SPE are calculated for each sample/observation and compared with the corresponding control limits calculated for the normal process. Any occurrence of a fault in the process changes either the $T^2$ statistic or the SPE of the samples, or even both in some cases. The $T^2$ statistic measures the variation in each sample and indicates its distance from the center of the model. This method is fairly easy to use for overall process monitoring; however, it cannot be used to identify the process faults. As multiple sensors are affected simultaneously during a process abnormality, the contribution plots of the $T^2$ statistic for the samples represent the contribution of multiple variables towards a fault, which makes it difficult to precisely detect the fault variable. On the other hand, SPE measures the sum of variations in the RS by analyzing the contribution of each variable in a fault situation.

Data-driven methods used to analyze the relationships among variables do not require deep process knowledge to achieve satisfactory results. However, recent studies suggest that coupling the data-driven methods with process knowledge is highly essential to validate the reliability of the model [16]. The development of PCA models for various industrial applications is also attracting a lot of attention due to the lower computational complexity and an already developed framework. Jiang et al. [17] improved the PCA model by resolving the data loss issues through utilizing both the fault-related information and the normal process. Wang et al. [18] developed the partitioning PCA model to identify multiple fault variables in the process. Moreover, Jiang et al. [19] presented a method for optimizing the variable selection for model development that improved the process monitoring reliability with a more accurate description of faults. The developed models were tested for various chemical processes and showed improvement in the contribution plot of variables for fault detection. PCA modelling also finds application in various fields ranging from nano-material manufacturing industries to supercritical nuclear power plants and petroleum industries. Penha et al. [20] developed a PCA model to monitor temperature variations in nuclear reactors using both the $T^2$ statistic and SPE methods. Landells et al. [21] implemented PCA models in refinery and chemical production processes for early fault detection to enhance the process efficiency. Villegas et al. [13] applied the PCA model to monitor the liquid level in tanks by making various fault cases and validated the SPE results with the actual process. Similarly, Ferrer [22] used the $T^2$ statistic and score plots to observe the process shift in the automobile manufacturing industry, which in turn predicts the manufacturing quality. Qin [12] compared the two indices and showed that SPE predicts the fault more accurately than the $T^2$ statistic.

Several studies have used PCA models for fault detection and diagnosis; however, only a few studies have investigated the fault propagation path in real systems. A fault in one sensor or variable instantly affects the other process variables, so it is essential to estimate the fault propagation path in the system in addition to the fault detection. Recently, Hong et al. [23–25] developed a progressive PCA model to predict the fault propagation path in the penicillin production process in terms of process variables. The algorithm in the model detects the fault variable having the highest contribution in the SPE contribution plot. Once the variable is detected, it is removed and a new PCA model is developed with the remaining variables to detect the next fault variable. The detection of the fault variable from each new model in turn represents the hierarchy of variables involved in the fault propagation path in the system. However, this procedure becomes quite complicated and time-consuming with an increase in the number of process variables. Therefore, a more robust algorithm is required that can predict the hierarchy of variables in a fault situation. The main objective of this study is to develop a robust algorithm that can estimate the fault propagation path in a process system in addition to the detection of an actual fault. The developed algorithm represents the hierarchy of variables that can be affected by a certain fault. The developed methodology can trace the fault transmission direction in industries to identify the fault more readily compared to the other conventional techniques. Moreover, the fault propagation path can help in shortlisting the affected variables for corrective action to ensure safe and smooth operation of the process. The developed PCA model is applied to an NGL fractionation process containing a series of distillation columns, and hypothetical fault case scenarios are considered to identify the fault propagation path in the system.

The paper is divided into six major sections. The first section gives a brief introduction to PCA model development. The second section mainly discusses the fault detection indices, i.e. the $T^2$ statistic and SPE. The following section presents the algorithm for fault propagation path detection in the system. The fourth section briefly explains the NGL fractionation process and its model validation using the design data. The fifth section discusses various fault case scenarios and the implementation of the developed algorithm. Finally, the last section concludes the paper.

2. Principal Component Analysis (PCA) model

2.1. Model development

In this study, the PCA model is developed using MATLAB®, which is a commercial tool and contains a large number of built-in mathematical operations for handling large dimensional matrices. Two types of methodologies can be used in process monitoring and fault diagnosis. One way is to develop models for all the relevant fault cases, and the other is to make a general model for the normal case and compare it with all the fault case scenarios. It is quite evident that the latter requires less effort and could be used for a wide range of case studies. The PCA model transforms the set of correlated variables into fewer uncorrelated variables (principal components) while retaining most of the original information. It takes advantage of the redundant information existing in the correlated variables to reduce dimensionality. PCA requires a data matrix $X \in \mathbb{R}^{n \times m}$ containing $n$ samples/observations corresponding to $m$ variables. This data matrix $X$ is considered as the training data, which is scaled to zero mean and unit variance for PCA modelling. Usually, a large number of samples are involved in the training data so that the normal variations in the process are also incorporated in the model. The data matrix $X$ is given in Eq. (1), where $x_i$ represents the $i$th sample of $m$ variables.

$$X = [x_1\; x_2\; \ldots\; x_n]^{T} \in \mathbb{R}^{n \times m} \tag{1}$$

The next step after scaling the training data is to determine the principal components. Usually, two methods can be employed for generating the principal components, namely singular value decomposition (SVD) and eigenvalue decomposition. The SVD method can be applied to the training data matrix $X$ to decompose it into the score matrix $T \in \mathbb{R}^{n \times l}$, loading matrix $P \in \mathbb{R}^{m \times l}$ and residual matrix $E$ as given in Eq. (2),


where $l$ represents the number of PCs [10]. The score matrix $T$ and loading matrix $P$ in turn give information about the relationships among the samples and among the variables, respectively. The number of PCs generated from PCA is equal to the number of variables involved in the training data. The scree test, parallel analysis, percent variance test and residual sum of squares statistics are some of the common methods used to determine the number of PCs; however, there is no fixed criterion for choosing a specific technique [10]. The number of PCs used to develop the PCA model should explain as much variance in the data as possible. Therefore, in this study, the percent variance technique is used and only those PCs are retained that explain the cumulative percent variance (90–99%) of the data.

$$X = TP^{T} + E = TP^{T} + \tilde{T}\tilde{P}^{T} \tag{2}$$

The decomposed part $TP^{T}$ determines the system variation, whereas $E$ represents the noise in the system and is termed the residuals [20]. The residual matrix $E$ can be further decomposed into $\tilde{T}$ and $\tilde{P}$, which represent the residual scores and loading matrices, respectively. The range spaces of $P$ and $\tilde{P}$ are the PCS and RS, with dimensions $l$ and $m - l$, respectively. It can be seen from Eq. (2) that $T$ holds the linear combination of matrix $X$ defined by the transformation vector $P$, i.e. $T = XP$. The vectors $T$ are the principal component scores, which show the relationship of the samples with each other, whereas the vectors $P$ are the eigenvectors, also known as principal component loadings. The alternate method of finding principal components is to perform eigenvalue decomposition on the covariance matrix of $X$ as shown in Eq. (3).

$$\operatorname{cov}(X) = \frac{1}{N-1} X^{T} X \tag{3}$$

The eigen-decomposition also generates $P$ (principal components) with $l$ leading eigenvectors and a diagonal matrix containing the eigenvalues arranged in descending order as shown in Eq. (4).

$$\Lambda = \operatorname{diag}\{\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_l\} \tag{4}$$

where the $j$th eigenvalue corresponding to the $j$th column of the score matrix $T$ can be represented as follows:

$$\lambda_j = \frac{1}{N-1}\, t_j^{T} t_j \approx \operatorname{var}\{t_j\} \tag{5}$$

The projection of the sample vector $x \in \mathbb{R}^{m}$ on the principal component subspace (PCS) and residual subspace (RS) is given in Eq. (6) and Eq. (7), respectively.

$$\hat{x} = Pt \equiv PP^{T}x \quad \therefore\; t = P^{T}x \tag{6}$$

$$\tilde{x} = x - \hat{x} \equiv (I - PP^{T})x \tag{7}$$

$$x = \hat{x} + \tilde{x} \tag{8}$$

where $\hat{x}$ and $\tilde{x}$ are the projections of the sample vector on the PCS and RS, respectively.

2.2. Fault detection

The critical part of multivariate process monitoring is fault detection. Usually, Hotelling's $T^2$ and SPE (Q-statistic) indices are used to monitor the variability in the PCS and RS, respectively. The $T^2$ statistic measures the variation in the PCA model, whereas SPE measures the lack of model fit for each sample.

2.2.1. Squared prediction error (SPE)

The squared prediction error (SPE) index measures the fault by projecting the samples on the RS. The SPE for each sample can be calculated using Eq. (9), where $I$ represents the identity matrix.

$$\mathrm{SPE} = \lVert \tilde{x} \rVert^{2} \equiv \lVert (I - PP^{T})x \rVert^{2} \tag{9}$$

The process is considered to be normal as long as the sample's SPE is less than or equal to $\delta_\alpha^{2}$ as shown in Eq. (10),

$$\mathrm{SPE} \le \delta_\alpha^{2} \tag{10}$$

where $\delta_\alpha^{2}$ represents the upper control limit of SPE and $\alpha$ denotes the level of significance. Jackson et al. [26] developed an expression to calculate $\delta_\alpha^{2}$ as given in Eqs. (11–13).

$$\delta_\alpha^{2} = \theta_1 \left( \frac{c_\alpha \sqrt{2\theta_2 h_0^{2}}}{\theta_1} + 1 + \frac{\theta_2 h_0 (h_0 - 1)}{\theta_1^{2}} \right)^{1/h_0} \tag{11}$$

$$\theta_i = \sum_{j=a+1}^{m} \lambda_j^{\,i}, \quad i = 1, 2, 3 \tag{12}$$

$$h_0 = 1 - \frac{2\theta_1 \theta_3}{3\theta_2^{2}} \tag{13}$$

The SPE is not affected by the inaccuracies and over-sensitivities in the smaller singular values associated with noise measurements. However, any violation of the control limit represents the occurrence of an unusual event causing an alteration in the covariance matrix of the model. In the event of a fault, the sample vector $x$ contains the normal portion superimposed by the fault portion. The addition of the fault portion makes the SPE larger than the control limit ($\delta_\alpha^{2}$), which leads to fault detection.

2.2.2. Hotelling's T2 statistic

Hotelling's $T^2$ statistic measures the variations in the PCS for each sample $x$ as shown in Eq. (14),

$$T^{2} = x^{T} P\, (\Sigma_a)^{-2} P^{T} x \tag{14}$$

where $\Sigma_a$ contains the non-negative eigenvalues corresponding to the loading vectors in $P$. The upper confidence limit of the $T^2$ statistic for normal data following a multivariate normal distribution can be calculated from the F-distribution as given in Eq. (15),

$$T^{2}_{a,n,\alpha} = \frac{a(n-1)}{n-a}\, F_{a,\,n-a,\,\alpha} \tag{15}$$

where $n$, $a$ and $\alpha$ represent the number of samples, the number of principal components and the level of significance, respectively. As the $T^2$ statistic measures the systematic variation in the process, any violation of the upper confidence limit would in turn represent a fault in the system. $T^2$ statistical indices are used for analyzing the process shift in the plant, whereas SPE techniques are used for evaluating the fault in the system.

3. Fault direction estimation methodology

The occurrence of any fault in a system changes the values of the variables in such a way that the resultant contains the sum of the normal portion and the fault portion of the data. Therefore, it is important to reduce the impact of the normal portion of the data to amplify the fault effect. Qin [12] and Valle et al. [27] developed an expression to estimate the fault directions in the residual subspace of fault data, as given in Eq. (16). For instance, a sample vector $x$ under a fault situation can be projected on the RS, where $\tilde{x}^{*}$ represents the fault-free portion in the subspace and $\tilde{\Xi}_i f$ represents the actual fault,

$$\tilde{x} = \tilde{x}^{*} + \tilde{\Xi}_i f \tag{16}$$

where $\tilde{\Xi}_i$ is orthonormal and $\lVert f \rVert$ denotes the fault magnitude, which is subject to change over time as the fault develops. The contribution of $x^{*}$ projected on the RS is usually very small compared to the fault magnitude, so it can be eliminated as given in Eq. (17).

$$\lVert \tilde{x}^{*} \rVert^{2} = \mathrm{SPE}(x^{*}) < \delta_\alpha^{2} \tag{17}$$

Alternatively, moving average techniques can be used to reduce the impact of normal variations in situations where the fault magnitude is not too large [28]. In an actual fault situation, $x^{*}$ is usually unavailable and is overlapped with the fault data. Therefore, the removal of $x^{*}$ from the RS can be achieved by rescaling the residual


matrix of the fault data to zero mean and unit variance using the residuals of the normal data matrix. The algorithm developed for the PCA model to estimate the fault propagation path is given in Fig. 1. The PCA model is developed using the normal operation data, which determines the principal component loadings, principal component scores and the limits of the fault detection indices. The developed model can be further utilized for process monitoring using real-time operational data, or it can be compared with any hypothetical fault case scenario to estimate the fault direction. As fault detection is more accurate in the RS, the normal process variations are eliminated from the RS of the fault data to amplify the fault magnitude. The removal of $\tilde{x}^{*}$ amplifies the fault portion of the data as given in Eq. (18).

$$\tilde{x} = \tilde{\Xi}_i f \tag{18}$$

As an illustrative example, $X_i$ in Eq. (19) represents the fault data collected for any fault case scenario containing $n$ samples and $m$ variables,

$$X_i = \begin{bmatrix} x_{11} & \cdots & x_{1m} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{nm} \end{bmatrix} \equiv [x_1\; x_2\; \ldots\; x_n]^{T} \tag{19}$$

where $x_i$ represents the row vector showing the $i$th sample corresponding to $m$ variables. The projection of these samples on the RS containing the amplified fault data can be obtained from Eq. (18) as shown in Eq. (20),

$$\tilde{X}_i^{T} = \tilde{\Xi}_i\, [f_1\; f_2\; \ldots\; f_n] \tag{20}$$

where $\tilde{\Xi}_i$ and $\tilde{X}_i^{T}$ share the same range space. The covariance matrix of the fault data can be used to analyze the covariance among different variables ($\sigma_{ij}$) for a particular number of samples using Eq. (3).

$$\operatorname{Covariance}[\tilde{X}_i] = \begin{bmatrix} \sigma_{11} & \cdots & \sigma_{1m} \\ \vdots & \ddots & \vdots \\ \sigma_{m1} & \cdots & \sigma_{mm} \end{bmatrix} = [\sigma_{ij}]_{i,j = 1, 2, \ldots, m} \tag{21}$$

Performing SVD on the covariance matrix of $\tilde{X}_i$ and retaining the singular values can help in transforming correlated variables into uncorrelated variables. SVD decomposes the covariance matrix ($\tilde{X}$) into a product of three matrices as given in Eq. (22),

$$\tilde{X}_i = U_i D_i V_i^{T} \tag{22}$$

where $U_i$ is an orthogonal matrix and represents the fault direction, $D_i$ is the diagonal matrix containing the nonzero singular values arranged in descending order, and $V_i^{T}$ is the transpose of an orthogonal matrix, such that $U_i U_i^{T} = U_i^{T} U_i = I$ and $V_i V_i^{T} = V_i^{T} V_i = I$. The fault direction matrix can be chosen as $\tilde{\Xi}_i$, where the first column of the orthogonal matrix $U_i$, corresponding to the highest eigenvalue in the diagonal matrix $D_i$, represents the maximum variation in the variables as represented in Eq. (23).

$$\tilde{\Xi}_i = U_i(:, 1) \tag{23}$$

The historical data of various fault case scenarios can be used in a similar way to extract the fault directions, which could be used in the future to identify a particular fault.

4. Application to the Natural Gas Liquid (NGL) fractionation

4.1. Process description

In order to check the stability and accuracy of the above-mentioned methodology, it has been applied to the NGL (Natural Gas Liquids) fractionation process shown in Fig. 2. Natural gas (NG) from the well contains a high concentration of methane and other NGLs. Fractionation units are employed both to increase the purity of the methane gas and to separate NGLs. Three distillation columns are installed in series to separate the ethane, propane and butane, respectively. Each distillation column is equipped with a reboiler which provides heat to the column. Dynamic modelling of the fractionation process helps in generating data through computer simulations that represent the dynamics of a real process. It can also be used for analyzing process variable relationships, personnel training and specifying process limitations. Therefore, a dynamic model of the NGL fractionation process is developed to observe the process variable relationships and to generate data.

4.2. Model development and validation

The dynamic model of the NGL fractionation process is developed in Aspen HYSYS® (v7.3), which is commercially available software. The simulated model is then validated against the industrial design data of an NGL plant to check the accuracy of the model [29,30]. Fig. 2 shows the schematic process flow diagram for the simulated design. The feed stream is fed to the 27th tray (numbered from bottom) of the de-ethanizer column at 54.5 °C. In addition to hydrocarbons, the feed stream also contains some impurities like CO2 and H2S, which are removed in the first two columns. The bottoms from the de-ethanizer then enter the 22nd stage (numbered from bottom) of the de-propanizer column at 107.3 °C. Finally, the bottom stream from the de-propanizer column enters the 20th tray (numbered from bottom) of the de-butanizer column at 100 °C. All the distillation columns are installed in series to

Fig. 1. Algorithm for fault propagation path estimation.
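The fault direction steps of Section 3 (rescale the fault residuals with the statistics of the normal residuals, form the covariance matrix, apply SVD, take the first column of U and rank the variables by absolute contribution) can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' MATLAB implementation: the function name `fault_direction`, the 8-variable synthetic plant, the two retained PCs and the ramp fault on variable 2 (reaching variable 5 with a smaller gain) are all hypothetical.

```python
import numpy as np

def fault_direction(resid_normal, resid_fault):
    """Estimate a fault direction from residual-subspace data.

    Both inputs are (samples x variables) residual matrices. Returns the
    first left singular vector of the covariance matrix, the singular
    values, and the variable indices sorted by descending |contribution|.
    """
    # Rescale fault residuals with the mean/std of the NORMAL residuals.
    mu = resid_normal.mean(axis=0)
    sd = resid_normal.std(axis=0, ddof=1)
    z = (resid_fault - mu) / sd
    cov = np.cov(z, rowvar=False)        # m x m covariance, cf. Eq. (21)
    u, d, _ = np.linalg.svd(cov)         # cf. Eq. (22)
    xi = u[:, 0]                         # cf. Eq. (23)
    ranking = np.argsort(-np.abs(xi))    # absolute descending order
    return xi, d, ranking

# Synthetic illustration: every number below is an assumption.
rng = np.random.default_rng(0)
m = 8
mixing = np.zeros((2, m))
mixing[0, :4] = 1.0                      # variables 0-3 share one latent factor
mixing[1, 4:] = 1.0                      # variables 4-7 share another

def simulate(n):
    return rng.normal(size=(n, 2)) @ mixing + 0.1 * rng.normal(size=(n, m))

x_normal = simulate(2000)
mean, std = x_normal.mean(axis=0), x_normal.std(axis=0, ddof=1)
xs = (x_normal - mean) / std

# Loadings from the SVD of the scaled training data; keep l = 2 PCs.
_, _, vt = np.linalg.svd(xs, full_matrices=False)
p = vt[:2].T
proj_rs = np.eye(m) - p @ p.T            # (I - P P^T), cf. Eq. (7)

# Hypothetical fault: a ramp on variable 2 that also affects variable 5.
ramp = np.linspace(0.0, 5.0, 400)[:, None]
direction = np.zeros(m)
direction[2], direction[5] = 1.0, 0.6
xf = (simulate(400) + ramp * direction - mean) / std

xi, d, ranking = fault_direction(xs @ proj_rs, xf @ proj_rs)
print("variables ranked by |contribution|:", ranking)
```

A fault that grows over time is used on purpose: the covariance step removes the sample mean, so a perfectly constant bias would leave little variance for the SVD to pick up.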


Fig. 2. Schematic of NGL fractionation train with the process variable's location used for PCA model development.

achieve high concentration of ethane, propane and butane, respectively. The design specifications used for modeling the fractionation columns are presented in Table 1. Validation of the simulated model is highly important prior to its use for statistical analysis. Therefore, the simulated model was tuned and converged to achieve maximum robustness and reliability. Table 2 highlights the comparison between the design values and the simulated results. Distillate and bottoms rates, temperature, pressure, composition, and reboiler and condenser duties were calculated to test the accuracy of the model. The results show that the simulation results and design data are in good agreement, with an absolute error of less than 2.5%. Therefore, it can be said that the developed model is robust and can be used with high confidence for further statistical analysis.

Table 1
Specification of fractionation towers.

Specifications              De-ethanizer   De-propanizer   De-butanizer
Feed Pressure (Psig)        362            300             95
No of trays                 40             45              40
Feed Tray from bottom       27             22              20
Reboiler Pressure (Psig)    360            300             95
Condenser Pressure (Psig)   347            290             85
Condenser type              Full Reflux    Total           Total

5. Fault case scenario

5.1. Fault case scenario for de-ethanizer column

Fault case scenarios have been generated by manipulating one of the process variables during dynamic simulation to analyze its effect on the other variables. The NGL fractionation process can be affected by changing any of the stream flow rates, temperatures, pressures or the reboiler's heat duty. Any variation in a process variable affects the downstream process variables more readily than the upstream variables. The schematic of the fractionation process used in this study is presented in Fig. 2, where the numerical digits represent the locations of the specific variables defined in Table 3. Twenty-six variables are selected from various sections of the fractionation unit to record the data for both normal and abnormal operation, as shown in Table 3. Prefixes and suffixes are used with the stream numbers to represent the specific process variables. For instance, 1-F, 3-T and 12-P represent the flow rate, temperature and pressure of the 1st, 3rd and 12th streams, respectively. To illustrate the effect of fault generation by manipulating one variable, the reboiler duty of the de-ethanizer distillation column is increased by a step change and the values of all the variables are recorded over time. All the process variables are affected by the induced fault and show variation in their values. An increase in the heat duty of the reboiler causes an increase in the stage temperature of the de-ethanizer distillation column and also in the temperature of the downstream streams. The results obtained from simulations are further validated through multivariate process monitoring techniques, and the fault propagation path algorithm is used to estimate the fault direction.

5.1.1. Fault detection using fault indices

The PCA model is developed using training data (normal data) arranged in the form of a matrix of 10,000 rows (observations/samples) and 26 columns (variables). The input data is scaled to zero mean and unit variance to generate the model. The number of PCs is decided on the basis of the overall cumulative variance explained by the principal components. After developing the PCA model, the scores are generated for each sample and plotted on the PCs. Fig. 3 represents the score (red circles) of each sample on PC1 and PC2 for normal operation, where the blue circle represents the process limits. The results show that during normal operation, when all the process variables were within their normal limits, the scores for PC1 and PC2 were also within the process limits. The developed model can be compared with real plant data for process monitoring. Therefore, the fault case scenario is generated to monitor the overall process and to estimate the fault propagation path in terms of variables. In this study, fault data containing ~800 samples (observations) is used, corresponding to the same variables used for data training. Fig. 4 shows the scores for the fault case scenario plotted on the first two principal components. As long as the scores are within the limit circle, the process is considered to be normal. However, a fault can be detected if any score is plotted outside the limit circle. The fault can be detected by using either the T2 statistic or the SPE index. As the SPE control limits include residual components which mainly represent noise, faults with even small magnitudes are easily detectable. On the other hand, T2 has great variance and therefore requires a great change in the system characteristic to be detectable [12,31]. Therefore, only the SPE index is taken into account in this study to determine the fault propagation path. The upper confidence limit of the SPE index is calculated as 17.61 for the normal data using Eq. (11), and the SPE calculated for all the fault samples is presented in Fig. 5. The sample which exceeds the upper control limit in this study is detected as sample 105, and the contribution plot of variables for this sample is presented in Fig. 6. It can be seen from the results that only a few variables show a high contribution towards the fault, which makes it easy to identify the fault variable. For instance, variable 5-T in this particular example shows the highest contribution towards the fault and can be readily detected. It can also be seen from the results that the SPE index offers a better prediction of the fault through analyzing the variables' contribution plot compared to the T2 statistic.

5.1.2. Fault propagation path estimation

Any fault that occurs in a system must be readily identified, followed by a corrective action to bring the process variable back to its normal limits. If the fault is not instantly removed, it may disturb the rest of the system and tend to affect all the associated process variables, which makes the fault detection process even more complex and time


Table 2
Validation of simulation results with the design data.

                     Stream ethane                 Stream propane                Stream butane
Component            Design   Simulation  Error    Design   Simulation  Error    Design   Simulation  Error
CO2                  5.18%    5.19%       −0.01%   0.00%    0.00%       0.00%    0.00%    0.00%       0.00%
H2S                  1.15%    1.18%       −0.03%   0.21%    0.22%       −0.01%   0.00%    0.00%       0.00%
Methane              24.99%   25.00%      −0.01%   0.00%    0.00%       0.00%    0.00%    0.00%       0.00%
Ethane               66.17%   63.79%      2.38%    2.99%    4.95%       −1.95%   0.00%    0.00%       0.00%
Propane              2.51%    4.85%       −2.34%   95.81%   93.87%      1.94%    3.99%    4.10%       −0.11%
i-Butane             0.00%    0.00%       0.00%    0.89%    0.93%       −0.04%   25.48%   25.45%      0.03%
n-Butane             0.00%    0.00%       0.00%    0.10%    0.04%       0.06%    69.54%   69.76%      −0.22%
i-Pentane            0.00%    0.00%       0.00%    0.00%    0.00%       0.00%    0.92%    0.67%       0.25%
n-Pentane            0.00%    0.00%       0.00%    0.00%    0.00%       0.00%    0.06%    0.02%       0.05%
n-Hexane             0.00%    0.00%       0.00%    0.00%    0.00%       0.00%    0.00%    0.00%       0.00%
n-Heptane            0.00%    0.00%       0.00%    0.00%    0.00%       0.00%    0.00%    0.00%       0.00%
Total                100.00%  100.00%              100.00%  100.00%              100.00%  100.00%
Temp (°C)            −6.7     −6.7        –        60.6     55.0        –        62.2     56.6        –
Pressure (Psig)      –        347         –        –        290         –        –        85          –
Flow Rate (lbmol/h)  1939.8   1939        –        2394.1   2393        –        2189.6   2190        –
Reboiler Duty (kW)   –        14112       –        –        15955       –        –        10249       –
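The stated agreement of better than 2.5% can be checked directly from the composition rows of Table 2. The short sketch below assumes that the error column is the design value minus the simulated value, in percentage points; only the non-zero composition entries are transcribed, and the dictionary layout is an illustrative choice.

```python
# (design, simulation) mole percentages transcribed from Table 2.
table2 = {
    "ethane stream": {
        "CO2": (5.18, 5.19), "H2S": (1.15, 1.18), "Methane": (24.99, 25.00),
        "Ethane": (66.17, 63.79), "Propane": (2.51, 4.85),
    },
    "propane stream": {
        "H2S": (0.21, 0.22), "Ethane": (2.99, 4.95), "Propane": (95.81, 93.87),
        "i-Butane": (0.89, 0.93), "n-Butane": (0.10, 0.04),
    },
    "butane stream": {
        "Propane": (3.99, 4.10), "i-Butane": (25.48, 25.45),
        "n-Butane": (69.54, 69.76), "i-Pentane": (0.92, 0.67),
        "n-Pentane": (0.06, 0.02),
    },
}

# Assumed definition: error = design - simulation (percentage points).
errors = {
    (stream, comp): design - sim
    for stream, comps in table2.items()
    for comp, (design, sim) in comps.items()
}

# Entry with the largest absolute deviation.
worst = max(errors, key=lambda key: abs(errors[key]))
print(worst, round(errors[worst], 2))
```

Under this assumption the largest deviation is the ethane content of the ethane stream (66.17 − 63.79 = 2.38 percentage points), which is consistent with the stated bound.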

Table 3
Stream variables used for PCA model and fault case scenario.

No  Stream number  Variable name
1   1-F            Stream flow rate
2   1-T            Stream temperature
3   3-T            Stream temperature
4   5-F            Stream flow rate
5   5-T            Stream temperature
6   6-F            Stream flow rate
7   6-T            Stream temperature
8   7-T            Stream temperature
9   11-P           Stream pressure
10  12-P           Stream pressure
11  13-P           Stream pressure
12  11-T           Stream temperature
13  14-T           Stream temperature
14  15-P           Stream pressure
15  16-P           Stream pressure
16  17-T           Stream temperature
17  18-P           Stream pressure
18  19-P           Stream pressure
19  20-T           Stream temperature
20  21-P           Stream pressure
21  21-T           Stream temperature
22  22-P           Stream pressure
23  23-P           Stream pressure
24  24-T           Stream temperature
25  25-T           Stream temperature
26  26-T           Stream temperature

Fig. 3. Score plot of training data on principal components for NGL fractionation process.
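The model-building step described in Section 5.1.1 (a 10,000 × 26 training matrix scaled to zero mean and unit variance, the number of PCs chosen from the cumulative explained variance, and scores projected on the PCs for a score plot) might look roughly like the following NumPy sketch. The paper used MATLAB; the helper names `fit_pca` and `pca_scores`, the 95% variance target and the synthetic four-factor training data are assumptions made only for illustration.

```python
import numpy as np

def fit_pca(x, var_target=0.95):
    """Fit a PCA model on training data x (n samples x m variables)."""
    mean = x.mean(axis=0)
    std = x.std(axis=0, ddof=1)
    xs = (x - mean) / std                       # zero mean, unit variance
    # SVD of the scaled data: rows of vt are the loading vectors.
    _, s, vt = np.linalg.svd(xs, full_matrices=False)
    eigvals = s ** 2 / (x.shape[0] - 1)         # eigenvalues of the covariance
    cum = np.cumsum(eigvals) / eigvals.sum()    # cumulative explained variance
    n_pc = int(np.searchsorted(cum, var_target)) + 1
    return {"mean": mean, "std": std, "P": vt[:n_pc].T,
            "eigvals": eigvals, "l": n_pc}

def pca_scores(model, x):
    """Project samples onto the retained PCs (t = P^T x for each row)."""
    return ((x - model["mean"]) / model["std"]) @ model["P"]

# Synthetic stand-in for the normal-operation data (assumption):
# 26 correlated variables driven by 4 latent factors plus noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=(10_000, 4))
x_train = latent @ rng.normal(size=(4, 26)) + 0.3 * rng.normal(size=(10_000, 26))

model = fit_pca(x_train)
t = pca_scores(model, x_train)
print("retained PCs:", model["l"])
print("first sample on PC1/PC2:", t[0, :2])
```

Plotting `t[:, 0]` against `t[:, 1]` would give a score plot of the same kind as Fig. 3; the control-limit circle is not reproduced here.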

consuming. The SPE index provides a good estimate for fault detection by projecting each sample on the RS. Therefore, the residual matrix containing the fault data can be formulated in terms of the correlation among variables. The residual matrix generated from fault data is superimposed by the variations in the normal data. Therefore, the residual matrix of the fault data is re-scaled with the residuals of the normal data to amplify the fault effect in the resulting matrix. The covariance matrix of the resulting matrix is generated to analyze the variation in terms of process variables. Following the methodology described in Fig. 1, SVD analysis is performed on the covariance matrix to extract the fault direction matrix (U) and the eigenvalues (D). The contribution of all variables can be evaluated from the column of the U matrix corresponding to the highest eigenvalue. Fig. 7 represents the contribution of variables corresponding to the highest eigenvalue. The developed methodology uses an absolute descending order function to rearrange the variables according to their hierarchy of contribution, as represented in Fig. 8. Moreover, this methodology can also be used to detect the fault variable irrespective of whether the fault detection indices are used. The variables showing the highest contribution towards a fault are automatically rearranged in absolute descending order according to their magnitude. The variable showing the maximum contribution is most

Fig. 4. Score plot of fault data on principal components for NGL fractionation process.

U. Ahmed et al. Chemometrics and Intelligent Laboratory Systems 162 (2017) 73–82

Fig. 5. Fault detection using SPE index.

Fig. 8. Fault propagation path estimation in terms of process variables.

Fig. 6. Contribution plot of variables using residual matrix of fault data.

Fig. 9. Score plot of de-propanizer column for monitoring the product purity. (For
interpretation of the references to color in this figure legend, the reader is referred to the
web version of this article.)

Fig. 7. Contribution of variables in a fault case scenario.

likely to be the fault variable, whereas, the variables which shows least
contribution towards a fault are the least affected. The current
methodology is based on the variation in covariance matrix of the
selected variables due to an induced fault so the causal analysis of fault Fig. 10. Fault propagation path in terms of product purity and column temperature.

U. Ahmed et al. Chemometrics and Intelligent Laboratory Systems 162 (2017) 73–82

umns. It can be seen from Fig. 8 that the variables (5-T, 24-T, 25-T, 21-
T, 13-P and 11-P) corresponding to de-ethanzier and de-propanizer
column are more readily affected compared to the other variables.
The results also show that the upstream variables (1T, 1F) are least
influenced by the fault compared to the downstream variables. It has
been further analyzed that the results of both multivariate statistical
analysis and the dynamic model competes very well with each other
which in turn represents the robustness of both the models. Similarly, a
number of hypothetical fault case scenarios can be analyzed to predict
the effect of any fault on other sections of the process that can further
help in safe and smooth operation of plant.

5.2. Fault case scenario for de-propanizer column

Distillation columns have been widely used in the petrochemical

and gas processing industries for the separation processes. The conven-
tional distillation columns require a reboiler for a heat source which is
considered as a highly energy intensive unit. If the reboiler heat duty is
Fig. 11. Score plot to monitor the flooding in the de-propanizer column. (For interpreta-
not controlled as required by the process, it can affect the stage
tion of the references to color in this figure legend, the reader is referred to the web
version of this article.) temperatures of the column along with a damage to the column
integrity. Therefore, a high level control system is usually employed
to control the operational parameters and to ensure the safe operation
of the distillation column. In this study, the developed PCA algorithm
has been applied to the de-propanizer section of NGL fractionation
process to monitor the distillation column operation. For a simple
distillation column without any chemical reactions, the temperature of
bottom plates of column is usually higher compared to the top plates
due to the continuous heat supply from the reboiler. Two case studies
have been developed to analyze the effect of reboiler malfunction on
both the purity of product (propane) and the temperatures at various
stages of the column.

5.2.1. Effect of reboiler malfunction on the product purity

Distillation column depends on the reboiler heat duty to maintain
both the operational parameters and to achieve the required purity of
product. Some of the most common malfunctions associated with the
reboilers includes the plugging, pump failure, leakage, surging, fouling
in the heat exchangers and so on [32,33]. All of these factors can affect
the reboiler operation and decreases the heat input to the column which
ultimately affects the purity of product. This case study considered a
scenario in which the reboiler duty drops due to any of the above
Fig. 12. Disturbance propagation path for flooding case in the de-propanizer column. mentioned reasons. Logically, once the reboiler duty goes down, the
heat input to the column decreases causing a change in the temperature
would be an interesting area of research for future studies. at each of the stages which in turn affects the product purity. PCA
The validation of multivariate process monitoring results for fault model has been developed using 12 variables including temperature
detection and fault propagation path estimation can be achieved by (ST42, ST30, ST26, ST19, ST10 and ST5) and propane concentration
relating it with the real processes. There are two ways to validate the (SM45, SM40, SM29, SM25, SM20 and SM15) at different stages of the
current methodology. First, analyzing the real industrial process in fault de-propanizer column. For an instance, ST42 and SM45 represent the
situation and detect point to point fault to generate the hierarchy of temperature and propane mole fraction at the stage 42 and 45,
affected variables. This process could be very complex, expensive and respectively. Fig. 9 represents the score plot of distillation column
time consuming because large numbers of process variables are operation on the first two principal components where the blue and red
involved. Conversely, computerized simulators can be used to generate points represent the normal and fault operation of the de-propanizer
and analyze the fault case scenarios under normal and ab-normal column, respectively. It can be seen from the results that as the reboiler
situations. In this study, the second methodology is used to estimate the heat duty is decreased due to malfunction, the score plot violates the
fault variables to validate the reasonable effectiveness of the suggested process limits and indicates the fault in the system. The drop in
fault propagation path approach. For an instance, the results obtained reboiler's duty reduces the vapor generation rate which deceases the
from PCA model used for estimating the fault propagation path heat and mass transfer leading to a decline in the temperature of the
methodology indicates that an increase in the reboiler duty of de- column. Due to less heat input in the column, the separation efficiency
ethanizer column instantly increases the temperature of stream (5-T) is reduced and results in higher propane concentration at the bottom of
leaving the reboiler followed by an increase in temperature of the column rather than at the top. The next step is then to detect the fault
lowest stage (24-T) of the corresponding column. The similar affect can propagation path in order to find the cause of disturbance in the
be analyzed from the distillation column operational principles which product purity. Therefore, the fault propagation path algorithm has
validates the current methodology of multivariate statistical analysis. been applied to the current system to statistically analyze the effect of
Moreover, the increase in reboiler's duty increase the stages tempera- reboiler's malfunction on the process variables. The results of the fault
ture and pressure in de-ethanizer and de-propanizer distillation col- propagation path algorithm for the fault case scenario have been shown
in Fig. 10. The results show that the concentration of propane gas

U. Ahmed et al. Chemometrics and Intelligent Laboratory Systems 162 (2017) 73–82

(SM45) is more readily affected at the bottom of the column than at the top
stages. Similarly, the temperature distribution results show that the lower
stages (ST42) are more readily affected by a reboiler fault than the top
stages (ST5). Both results show that the reboiler heat duty affects the
variables associated with the bottom of the column more readily than those of
the top section. In this way, the fault propagation algorithm can identify the
area where the disturbance has been triggered, for the timely rectification of
the fault.

5.2.2. Flooding in the distillation column

Flooding is one of the common problems in NGL plant operations and can impact
the temperature and pressure gradients across the column [34,35]. Various
factors that can cause flooding include excessive vapor generation at the
bottom of the column, increased feed flow rate, unsuitable reflux rate and so
on. The excessive vapor generation case has been selected to analyze flooding
in the de-propanizer column by manipulating the reboiler heat duty. The
excessive reboiler heat duty not only increases the stage temperatures but can
also affect the metallurgical strength of the column. Due to the excessive
vapor generation, liquid is entrained with the vapors up the column and liquid
is also held up in the downcomer, resulting in an increased liquid holdup on
the trays. Hence, the flow rate of liquid down the column decreases, which
results in an increase of ΔT and ΔP across the column [36]. In this case, the
temperatures of eleven stages (ST-45, ST-40, ST-29, ST-28, ST-24, ST-22, ST-20,
ST-18, ST-10, ST-4 and ST-1) of the de-propanizer column have been selected
for the PCA model, where ST-45 and ST-1 represent the temperatures of the
bottom and top stages of the distillation column, respectively. Fig. 11
represents the score plot for the distillation column operation on PC1 and
PC2, where blue and red points represent the normal and fault operation of the
distillation column, respectively. The results show that all the scores fall
within the process limits for normal operation. However, the scores (red
points) move outside the process limits during the fault situation. The
developed algorithm has been applied to the fault case scenario to
statistically determine the fault propagation path in the process. Fig. 12
represents the hierarchy of variables that are affected by an increase in the
reboiler heat duty. Since temperature and pressure are tied together by the
vapor-liquid equilibrium at each stage, the change in stage temperatures due
to the reboiler malfunction also increases the pressure gradient across the
column [33]. The results show that the flooding caused an increase in ΔT
across the column, where the temperatures of the bottom stages (ST-45, ST-40)
are more readily affected than those of the top stages (ST-4, ST-1). The
hierarchy of variables obtained from the algorithm shows that the fault
propagates from the bottom to the top of the column in a sequential manner.

6. Conclusion

In this study, a PCA model has been developed for monitoring the NGL
fractionation process. Twenty-six process variables from various sections of
the fractionation unit were selected for the multivariate process monitoring
and analysis. A fault in any process variable affects all the other associated
variables, which makes the fault detection process more complex and time
consuming. A PCA based methodology is developed in this study to identify the
fault variable and to estimate the fault propagation path in the system. The
developed algorithm estimates the fault direction by projecting the samples on
the RS. The RS of the fault sample is superimposed with the normal variations,
which should be minimized in order to amplify the fault effect. Therefore, the
fault data is rescaled with the mean and unit variance of the normal data to
determine the actual fault in the RS. Applying SVD analysis to the covariance
matrix of the rescaled residual matrix generates the orthogonal matrix
corresponding to the highest eigenvalue. By using the absolute descending
order function, the process variables are rearranged according to the
hierarchy of contribution, which in turn represents the fault propagation path
in the system. Furthermore, the current methodology allowed us to readily
detect the fault variable without relying on the fault detection indices. The
multivariate statistical model is validated by its application to the NGL
fractionation process. The current methodology can also be used in various
process and product development industries to identify the fault readily, and
the fault directions could be used to analyze the impact of one fault on other
process variables.

Acknowledgements

This research was supported by the Brain Korea 21 Plus Program in 2017, by
Institute of Chemical Processes in Seoul National University, by a Grant
(14IFIP-B085984-02) from the Smart Civil Infrastructure Research Program
funded by the Ministry of Land, Infrastructure and Transport (MOLIT) of Korea
government and Korea Agency for Infrastructure Technology Advancement (KAIA).

References

[1] H.H. Yue, S.J. Qin, R.J. Markle, C. Nauert, M. Gatto, Fault detection of plasma etchers using optical emission spectra, Semicond. Manuf. IEEE Trans. 13 (3) (2000)
[2] K.A. Kosanovich, K.S. Dahl, M.J. Piovoso, Improved process understanding using multiway principal component analysis, Ind. Eng. Chem. Res. 35 (1) (1996) 138–146.
[3] I. Miletic, S. Quinn, M. Dudzic, V. Vaculik, M. Champagne, An industrial perspective on implementing on-line applications of multivariate statistics, J. Process Control 14 (8) (2004) 821–836.
[4] J.V. Kresta, J.F. MacGregor, T.E. Marlin, Multivariate statistical monitoring of process operating performance, Can. J. Chem. Eng. 69 (1) (1991) 35–47.
[5] R.D. De Veaux, L.H. Ungar, J.M. Vinson, Statistical approaches to fault analysis in multivariate process control, in: Proceedings of the American Control Conference on IEEE, 1994.
[6] T. Kourti, J. MacGregor, Multivariate SPC methods for monitoring and diagnosing of process performance, in: Proceedings of PSE, 1994.
[7] A. Raich, A. Cinar, Statistical process monitoring and disturbance diagnosis in multivariable continuous processes, AIChE J. 42 (4) (1996) 995–1009.
[8] H. Tong, C.M. Crowe, Detection of gross errors in data reconciliation by principal component analysis, AIChE J. 41 (7) (1995) 1712–1722.
[9] R. Dunia, S.J. Qin, T.F. Edgar, T.J. McAvoy, Identification of faulty sensors using principal component analysis, AIChE J. 42 (10) (1996) 2797–2812.
[10] L.H. Chiang, R.D. Braatz, E.L. Russell, Fault Detection and Diagnosis in Industrial Systems, Springer Science & Business Media, UK, 2001.
[11] S. Bezergianni, A. Kalogianni, Application of principal component analysis for monitoring and disturbance detection of a hydrotreating process, Ind. Eng. Chem. Res. 47 (18) (2008) 6972–6982.
[12] S. Joe Qin, Statistical process monitoring: basics and beyond, J. Chemom. 17 (8-9) (2003) 480–502.
[13] T. Villegas, M.J. Fuente, M. Rodríguez, Principal component analysis for fault detection and diagnosis. Experience with a pilot plant, in: Proceedings of the 9th WSEAS International Conference on Computational Intelligence, Man-machine Systems and Cybernetics, CIMMACS'10, 2010.
[14] S. Yoon, J.F. MacGregor, Principal-component analysis of multiscale data for process monitoring and fault diagnosis, AIChE J. 50 (11) (2004) 2891–2903.
[15] S.J. Qin, Survey on data-driven industrial process monitoring and diagnosis, Annu. Rev. Control 36 (2) (2012) 220–234.
[16] R. Landman, S.-L. Jämsä-Jounela, Hybrid approach to casual analysis on a complex industrial system based on transfer entropy in conjunction with process connectivity information, Control Eng. Pract. 53 (2016) 14–23.
[17] Q. Jiang, X. Yan, W. Zhao, Fault detection and diagnosis in chemical processes using sensitive principal component analysis, Ind. Eng. Chem. Res. 52 (4) (2013) 1635–1644.
[18] G. Wang, J. Liu, Y. Li, C. Zhang, Fault diagnosis of chemical processes based on partitioning PCA and variable reasoning strategy, Chin. J. Chem. Eng. (2016).
[19] Q. Jiang, X. Yan, B. Huang, Performance-driven distributed PCA process monitoring based on fault-relevant variable selection and Bayesian inference, IEEE Trans. Ind. Electron. 63 (1) (2016) 377–386.
[20] R. Penha, J.W. Hines, Using principal component analysis modeling to monitor temperature sensors in a nuclear research reactor, in: Proceedings of the 2001 Maintenance and Reliability Conference (MARCON 2001), Knoxville, TN, Citeseer, 2001.
[21] K. Landells, Z. RaWi, Abnormal Event Detection Using Principal Component Analysis. U.S. Patent No. 8,121,817B2, 2012.
[22] A. Ferrer, Multivariate statistical process control based on principal component analysis (MSPC-PCA): some reflections and a case study in an autobody assembly process, Qual. Eng. 19 (4) (2007) 311–325.
[23] J.J. Hong, J. Zhang, Progressive PCA modeling for enhanced fault diagnosis in a batch process, in: Proceedings of the International Conference on IEEE Control Automation and Systems (ICCAS), 2010.
[24] J.J. Hong, J. Zhang, J. Morris, Fault localization in batch processes through progressive principal component analysis modeling, Ind. Eng. Chem. Res. 50 (13) (2011) 8153–8162.
[25] J.J. Hong, J. Zhang, J. Morris, Progressive multi-block modelling for enhanced fault isolation in batch processes, J. Process Control 24 (1) (2014) 13–26.
[26] J.E. Jackson, G.S. Mudholkar, Control procedures for residuals associated with principal component analysis, Technometrics 21 (3) (1979) 341–349.
[27] S. Valle, S.J. Qi, M.J. Piovoso, Extracting fault subspaces for fault identification of a polyester film process, in: Proceedings of the American Control Conference on IEEE, 2001.
[28] H.H. Yue, S.J. Qin, Reconstruction-based fault identification using a combined index, Ind. Eng. Chem. Res. 40 (20) (2001) 4403–4414.
[29] M. Moshfeghian, Hydrogen damage (Blistering) case study: Mahshahr NGL plant, Iran. J. Sci. Technol. 11 (1) (1985).
[30] Parsons, R. M, NGL Fractionation Facilities, Operation Manual Bandar Mahshahr, The Ralph M. Parsons Company U.K. Ltd.
[31] L. Mujica, J. Rodellar, A. Fernandez, A. Guemes, Q-statistic and T2-statistic PCA-based measures for damage assessment in structures, Struct. Health Monit. (2010) (1475921710388972).
[32] H.Z. Kister, Distillation Operations, McGraw-Hill, New York, 1990.
[33] A. Gorak, H. Schoenmakers, Distillation: Operation and Applications, Academic Press, 2014.
[34] H.Z. Kister, Distillation Design 223, McGraw-Hill, New York, 1992.
[35] H.Z. Kister, Distillation Troubleshooting, Wiley Online Library, USA, 2006.
[36] N.P. Cheremisinoff, Handbook of Chemical Processing Equipment, Butterworth-Heinemann, USA, 2000.