
CHAPTER 2

CORRELATION METHODOLOGY

The literature contains much data on pilot plant and commercial plant operation of most of the petroleum refinery processes of interest. However, when it comes to correlating empirical data, one is rarely satisfied with the quantity or coverage (range of feeds and/or conditions) of the data. Yet these data are of little value until they are gathered together and correlated in a meaningful way.
Rarely will one find a set of data that exactly fits the particular set of conditions of interest (feedstock, product octane, product smoke point, etc.). In general, it is better to take information from a good correlation than to use isolated sets of data. This is particularly true when evaluating process results over a range of values (e.g., product octane). The absolute values from the correlation may be somewhat in error, but the differences between points should be very meaningful.
For our purposes, a set of data consists of the simultaneous,
steady-state yields of products (and their properties) when a unit is
processing a particular feedstock at a fixed set of conditions—
temperature, pressure, type of catalyst (if any), space velocity, etc.
Each product yield or property constitutes a point of data in the set.
A first step in correlating sets of gathered data is to tabulate the data with a row for each set and a column for each variable. Each product yield or property is a dependent variable, but at times it may also serve as an independent variable. Other independent variables can be the type of feed and/or one or more properties of the feed or of a product (gravity, boiling range, characterization [K] factor, etc.).

PETROLEUM REFINERY PROCESS ECONOMICS

As we would expect (and shall see later), the actual operating results from the literature will be scattered to some degree. This is due to the complexity of the systems involved (a multitude of hydrocarbon species) and, of course, to uncertainties in observations (errors in measuring, reading, and recording), failure to attain true steady state, etc.
In correlating these data, we are attempting to find a relatively simple expression (equation) to characterize the relationship between two or more variables in a very complex system. From the chemistry involved, including possible reaction mechanisms and kinetics, we may infer a possible relationship among a set of variables. A plot of the data on this basis will indicate, by the pattern of the points (trend and scatter), how well the assumed relation fits the data.
Ordering the data (arranging the sets in increasing or decreasing order) by one of the variables often helps to reveal a possible correlating parameter. Plotting an independent variable against one of the dependent variables may indicate the type of relationship between them (linear, quadratic, exponential, etc.), if any.
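The screening step can be sketched in a few lines of Python. This is a minimal illustration and not the author's method. It assumes each row is an (x, y) pair of positive values. The sketch sorts the rows by the candidate correlating variable, then compares three trial forms by the R² of a straight-line fit on transformed axes. The function names are hypothetical. The exponential and power forms are fitted on ln y, so their R² values give only a rough screen against the linear form, and a plot is still the better guide.

```python
import math


def fit_line(x, y):
    """Least-squares intercept a, slope b, and R^2 for y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b, sxy * sxy / (sxx * syy)


def screen_forms(rows):
    """Order (x, y) rows by x, then return the R^2 of each trial form."""
    rows = sorted(rows)  # ascending order of the candidate correlating variable
    x = [r[0] for r in rows]
    y = [r[1] for r in rows]
    ln_x = [math.log(v) for v in x]  # requires x > 0
    ln_y = [math.log(v) for v in y]  # requires y > 0
    trial_forms = {
        "linear": (x, y),          # y = a + b*x
        "exponential": (x, ln_y),  # ln y = a + b*x
        "power": (ln_x, ln_y),     # ln y = a + b*ln x
    }
    return {name: fit_line(xs, ys)[2] for name, (xs, ys) in trial_forms.items()}


# Hypothetical, noise-free data following y = 2*exp(0.3*x), entered out of order.
rows = [(x, 2 * math.exp(0.3 * x)) for x in (5, 1, 3, 2, 4)]
scores = screen_forms(rows)
print(max(scores, key=scores.get))  # prints: exponential
```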
Once a possible relationship is detected, the equation describing it is usually determined by linear regression analysis, or by multiple linear regression analysis if more than one independent variable is involved.
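Below is a minimal sketch of multiple linear regression in plain Python, with no external library. It assumes ordinary least squares solved through the normal equations, which is adequate for small, well-conditioned data sets. The code fits Y = b0 + b1·X1 + b2·X2 + …. The function names are hypothetical.

```python
def solve(a, b):
    """Solve the square system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def multiple_regression(x_columns, y, intercept=True):
    """Least-squares coefficients [b0, b1, ..., bk]; b0 is omitted if intercept=False."""
    n = len(y)
    cols = ([[1.0] * n] if intercept else []) + [list(c) for c in x_columns]
    k = len(cols)
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(cols[i][t] * y[t] for t in range(n)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data generated from Y = 1 + 2*X1 - 3*X2
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [1 + 2 * p - 3 * q for p, q in zip(x1, x2)]
print([round(c, 6) for c in multiple_regression([x1, x2], y)])  # prints [1.0, 2.0, -3.0]
```

If the data are poorly conditioned, a QR- or SVD-based solver is the safer choice. This happens, for example, when variables are nearly collinear or when X is raised to large powers.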
Before electronic computers (mainframe or personal computer [PC]), this was a very tedious process, even with a calculator. Now, with ready access to fast, high-capacity PCs, regression analysis is quick and easy.
Spreadsheet programs such as Lotus 1-2-3, Quattro Pro, and Excel provide great flexibility in arranging and manipulating data (moving columns, transforming data, etc.). They also plot data automatically, in addition to offering regression analysis.
In developing each correlation in this book, an attempt was made to find a single independent variable as a basis for correlation. The degree to which this succeeded will be apparent from the graphs on which both the raw data and the regression lines have been plotted. Frequently, two or more independent variables were needed to obtain a satisfactory correlation. The results are summarized in the following tabulation:


                                    Number of Independent Variables
Process                                 Yields          Properties
Solvent deasphalting (SDA)                1                 2
Visbreaking (VB)                          1                1–2
Delayed coking (DC)                      1–2               1–2
Fluid coking (FC)                         1                1–2
Fluid catalytic cracking (FCC)           1–2               1–2
Heavy oil cracking (HOC)                 1–2                1
Hydrocracking (HC)                       1–2               1–2
Hydrodesulfurization (HDS)                2                 2
Catalytic reforming (CR)                 1–5               1–2

Parameters Used in Correlating Process Yields


A review of the literature reveals some consistency in the correlation
parameters used by the author and others:

Process   Author                 HPI                       G&H
SDA       Wt% DAO                na                        na
VB        Wt% Conv               nC5 insol, Sed Cont       na
FC        Wt% CCR                na                        na
DC        Wt% CCR                Wt% CCR                   Wt% CCR
FCC       LV% Conv               LV% Conv                  LV% Conv
HOC       LV% Conv               na                        na
HC        LV% Gaso               LV% Lt HC                 LV% Lt HC
HDT       Feed API, Wt% S        % Desulf                  na
CR        Reformate RON,         Reformate RON,            Reformate RON,
          Feed N+2A              Feed N+2A                 Feed K


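where:
HPI represents HPI Consultants, Inc.
G&H represents Gary and Handwerk
DAO represents deasphalted oil
CCR represents Conradson carbon residue
LV represents liquid volume
RON represents research octane number
N+2A represents the naphthene content plus twice the aromatic content of the feed
K represents the characterization factor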

A General Data Correlation Procedure


A step-by-step procedure for performing data correlations follows:

• Enter data in a spreadsheet format with a column for each variable (yield or property) and a row for each set of data.
• Select a column for the dependent variable (product yield or property) to be correlated with some feed property.
• Select a column (or columns) for the independent variable(s), usually feed properties. Note that the independent variables must be in adjacent columns, since the selected range cannot be interrupted. One of the big advantages of spreadsheet programs is that columns can be moved easily.
• Delete any row where there is no entry for one or more of the selected variables, since there can be no empty cells in the selected columns.
• At this point, any column of data may be manipulated:
1. The data may be ordered by a variable (rows sorted in ascending or descending order of that variable).
2. Any variable may be transformed into a logarithmic value, a trigonometric function, a higher or lower power, etc.

• After indicating where on the spreadsheet the regression results are to be displayed, and whether an intercept is to be calculated or the line forced through the origin, the regression may be performed. There are times when you will know that the regression should pass through a certain point, such as the origin, but the regression results may indicate a better fit (over the range of the data) when a nonzero intercept is computed.
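The spreadsheet steps listed above can also be mirrored in code. The sketch below is hypothetical throughout. It assumes rows are Python dictionaries keyed by column name, with None standing for an empty cell. The code keeps only complete rows, can optionally transform the independent variable, and fits a single-variable line either with an intercept or forced through the origin.

```python
import math


def complete_pairs(rows, x_col, y_col):
    """Keep (x, y) only from rows with entries in both selected columns."""
    return [(r[x_col], r[y_col]) for r in rows
            if r.get(x_col) is not None and r.get(y_col) is not None]


def fit_line(pairs, intercept=True, transform=None):
    """Return (a, b) for Y = a + bX; a is 0 when the line is forced through the origin."""
    xs = [transform(x) if transform else x for x, _ in pairs]
    ys = [y for _, y in pairs]
    if not intercept:
        # Through the origin: b = sum(XY) / sum(X^2)
        return 0.0, sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


# Hypothetical rows; the third has an empty yield cell and is dropped.
rows = [
    {"feed_prop": 1.0, "yield": 5.0},
    {"feed_prop": 2.0, "yield": 8.0},
    {"feed_prop": 3.0, "yield": None},
    {"feed_prop": 3.0, "yield": 11.0},
    {"feed_prop": 4.0, "yield": 14.0},
]
pairs = complete_pairs(rows, "feed_prop", "yield")
print(fit_line(pairs))                      # with intercept: (2.0, 3.0)
print(fit_line(pairs, intercept=False))     # forced through the origin
print(fit_line(pairs, transform=math.log))  # Y against ln(feed_prop)
```

These hypothetical data lie exactly on Y = 2 + 3X. Forcing the line through the origin therefore gives a visibly different slope (110/30, about 3.67), which illustrates the caution in the last step.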

The regression output gives the value of the constant, the coefficient(s) of the variable(s), the coefficient of correlation, and the standard errors of the estimate of the dependent variable and of each coefficient, as well as the number of points and the degrees of freedom.

(NOTE: The coefficient of determination, R², measures the fraction of the variation in the dependent variable that is explained by the derived regression equation. The closer R² approaches 1.0, the less the scatter of the data points about the calculated regression line. The standard error of the estimate of the dependent variable is, in effect, the standard deviation of the data points about the regression line. In like manner, the standard error of a coefficient is a measure of the uncertainty in the value of that coefficient.)
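For the single-variable case, the statistics named in the note can be computed directly. The sketch below uses the standard textbook formulas for a straight-line fit, with n - 2 degrees of freedom. The function name and the small data set are hypothetical.

```python
import math


def regression_stats(xs, ys):
    """Straight-line fit Y = a + bX with R^2, SEE, and standard errors of a and b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # residual sum of squares
    dof = n - 2                                                 # two coefficients fitted
    see = math.sqrt(sse / dof)                                  # standard error of the estimate
    return {
        "a": a,
        "b": b,
        "r2": 1.0 - sse / syy,                                  # coefficient of determination
        "see": see,
        "se_a": see * math.sqrt(1.0 / n + mx * mx / sxx),       # standard error of intercept
        "se_b": see / math.sqrt(sxx),                           # standard error of slope
        "n": n,
        "dof": dof,
    }


stats = regression_stats([1, 2, 3, 4], [1, 3, 2, 4])
print(round(stats["r2"], 4), round(stats["see"], 4))  # prints 0.64 0.9487
```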

Having the equation of the regression line, one may calculate values
of the dependent variable for each of the sets. The difference between the
calculated value and the corresponding “observed” value may then be cal-
culated. The magnitude of the differences may point to certain data that do
not fit with the rest of the population. Reference to the source of these data
may suggest reasons for discarding these data. A plot of the data points
together with the regression line will give a visual indication of the appro-
priateness of the relation selected to represent the data.
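The residual check described above amounts to a short loop. In this hypothetical sketch, the coefficients a and b come from a previously derived line Y = a + bX. The sets are listed in order of the size of their difference, so the poorest fits appear first for review against their sources.

```python
def residual_report(xs, ys, a, b):
    """Return (index, observed, calculated, difference) tuples, largest |difference| first."""
    report = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        calc = a + b * x
        report.append((i, y, calc, y - calc))
    return sorted(report, key=lambda row: abs(row[3]), reverse=True)


# Hypothetical data and coefficients; the third set stands apart from the rest.
for row in residual_report([1, 2, 3, 4, 5], [3.1, 5.0, 9.0, 8.9, 11.0], a=1.0, b=2.0):
    print("set %d: observed %.2f, calculated %.2f, difference %+.2f" % row)
```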
It is not necessary to have a regression program to obtain the same
results. They can be calculated from the sums of the individual variables, of
their squares, and of their cross products. In the case of a first order or lin-
ear equation, this requires the sums of the following:

$\sum X$, $\sum Y$, $\sum XY$, $\sum X^2$, $\sum Y^2$, and $N$ (the number of points)
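From those six sums, the least-squares line and its correlation coefficient follow from the standard closed-form expressions. The Python sketch below uses hypothetical function names. It first tallies the sums in a single pass, as one would with a calculator, and then computes a, b, and r. Note that ΣY² is needed only for r.

```python
import math


def tally(xs, ys):
    """Accumulate the sums needed for a straight-line fit."""
    sx = sy = sxy = sx2 = sy2 = 0.0
    for x, y in zip(xs, ys):
        sx += x
        sy += y
        sxy += x * y
        sx2 += x * x
        sy2 += y * y
    return sx, sy, sxy, sx2, sy2, len(xs)


def line_from_sums(sx, sy, sxy, sx2, sy2, n):
    """Intercept a, slope b, and correlation coefficient r for Y = a + bX."""
    sxx = n * sx2 - sx * sx      # N times the sum of squared deviations of X
    syy = n * sy2 - sy * sy      # N times the sum of squared deviations of Y
    sxy_c = n * sxy - sx * sy    # N times the sum of cross products of deviations
    b = sxy_c / sxx
    a = (sy - b * sx) / n
    r = sxy_c / math.sqrt(sxx * syy)
    return a, b, r


print(line_from_sums(*tally([1, 2, 3, 4], [1, 3, 2, 4])))  # prints (0.5, 0.8, 0.8)
```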


For a second-order equation, 10 such sums are required, and for third order, 15. With so many regression programs available, some in the public domain, it is hardly practical to go through such a long and tedious procedure. A regression program expedites the work and lets the user focus on the relationship represented by the data and its significance.
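One way to account for the count of 10 is to assume that "second order" here means a fit with two independent variables or terms: X₁ and X₂, or X and X² for a quadratic. The count then follows from the normal equations for Y = a + b₁X₁ + b₂X₂:

$$
\begin{aligned}
\sum Y &= Na + b_1\sum X_1 + b_2\sum X_2\\
\sum X_1Y &= a\sum X_1 + b_1\sum X_1^2 + b_2\sum X_1X_2\\
\sum X_2Y &= a\sum X_2 + b_1\sum X_1X_2 + b_2\sum X_2^2
\end{aligned}
$$

These equations use nine of the quantities: N, ΣX₁, ΣX₂, ΣY, ΣX₁², ΣX₂², ΣX₁X₂, ΣX₁Y, and ΣX₂Y. ΣY² is the tenth, needed for the correlation coefficient and the standard error. The same tally gives 15 for three variables: N, four simple sums, and ten squares and cross products.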

Significance of Results
The yield of full-boiling range gasoline in FCC has been chosen to illustrate
the significance of a regression analysis and the use to which it may be put.
Figure 2–1 is a plot of the data (382 points) for gasoline yield (Y) from FCC
vs. conversion (X) together with the regression line for the equation:

$$Y = a + bX + cX^2$$

Assume, for practical purposes, that the standard error of the estimate (SEE) of Y equals the standard deviation (D). Assume also that, for a normal distribution, about 95% of the data should lie within ±2D of the regression line. Figure 2–2 is Figure 2–1 with lines at +2SEE and −2SEE added, defining the 95% probability band for the data.¹ This band is sometimes referred to as the error band. Points lying outside this band are known as "outliers" and may be disregarded in further regression of the data. Such points may result from errors in measuring variables or in recording data, or from data that do not fit the remaining population.
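The error-band test lends itself to a short routine. The Python sketch below is a hypothetical illustration and not necessarily how Figure 2–2 was prepared. It fits Y = a + bX + cX² by least squares, centering X internally to keep the 3 by 3 normal equations well conditioned and solving them by Cramer's rule. It then takes SEE with n - 3 degrees of freedom and flags any point whose residual exceeds 2 SEE. The data are synthetic, and the function names are assumptions.

```python
import math


def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(xs, ys):
    """Least-squares (a, b, c) for Y = a + bX + cX^2."""
    n = len(xs)
    xm = sum(xs) / n
    u = [x - xm for x in xs]                                   # centered X for conditioning
    s = [sum(ui ** k for ui in u) for k in range(5)]            # s[0] == n
    t = [sum(ui ** k * y for ui, y in zip(u, ys)) for k in range(3)]
    mat = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    d = det3(mat)
    coef = []
    for col in range(3):                                       # Cramer's rule, column by column
        mk = [row[:] for row in mat]
        for r in range(3):
            mk[r][col] = t[r]
        coef.append(det3(mk) / d)
    a_u, b_u, c_u = coef                                       # coefficients in u = X - xm
    return a_u - b_u * xm + c_u * xm * xm, b_u - 2 * c_u * xm, c_u


def error_band(xs, ys):
    """Return (a, b, c), SEE, and indices of points outside +/- 2 SEE."""
    a, b, c = fit_quadratic(xs, ys)
    resid = [y - (a + b * x + c * x * x) for x, y in zip(xs, ys)]
    see = math.sqrt(sum(r * r for r in resid) / (len(xs) - 3))  # n - 3 degrees of freedom
    outliers = [i for i, r in enumerate(resid) if abs(r) > 2 * see]
    return (a, b, c), see, outliers


# Synthetic data: a smooth quadratic trend, small alternating scatter, one bad point.
xs = [40 + 2 * i for i in range(21)]
ys = [10 + 0.5 * x - 0.002 * x * x + 0.2 * (-1) ** i for i, x in enumerate(xs)]
ys[10] += 6.0
coeffs, see, outliers = error_band(xs, ys)
print(outliers)  # prints [10]
```

Whether a flagged point is actually discarded should still depend on a look at its source, as discussed earlier.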
The simplicity of this correlation is all the more impressive when one
considers the very large number of variables at play in the FCC process:

• Boiling range of gasoline
• Composition of feed
• Type and activity of catalyst
• Catalyst-to-feed ratio
• Hetero elements such as S, N, Ni, V, Na, Fe, and As in the feed
• Reactor geometry
• Carbon on regenerated catalyst

Fig. 2–1 FCC Gasoline Yield Data

Fig. 2–2 FCC Gasoline Error Band

Application to an Existing Process Unit


The corresponding relationship for an existing unit may be obtained by drawing a "best" line parallel to (if not coincident with) the regression line and passing through actual operating data for that unit. Figure 2–3 shows such a plot, with actual plant data plotted. By measuring the deviation (d) of this line from the regression line, one obtains a constant that may be used to "tune" the correlation to the actual unit:

$$Y = (a + d) + bX + cX^2$$

Fig. 2–3 FCC Gasoline Yield Tuning


Notes
1. McElroy, E.E., Applied Business Statistics, Second Edition, Holden-Day, Inc., San Francisco, 1979, p. 293.
Barish, N.N., Economic Analysis for Engineering and Managerial Decision Making, Second Edition, McGraw-Hill Book Co., 1978, p. 597.
