PARAMETER ESTIMATION
Objectives
1. To solve curve-fitting problems using MS Excel®
2. To determine the best model for a given set of data
3. To estimate parameters based on the developed model
Theoretical Discussion
Chemical engineering often involves expressing experimental data in terms of an equation. The equation
must be developed, and the parameters that provide the best fit to the data must be determined. MS Excel® offers
simple methods for fitting a straight line to data, as well as methods for fitting a polynomial to data.
The equation depends on x and on some unknown parameters, {a1, a2, …, aM}. The goal is to find the set of
parameters that gives the best fit. The best fit is usually defined by minimizing the sum of the squared residuals,
where a residual is the difference between the predicted value and the data. Because the data may contain errors,
an exact fit will not be possible in most cases. The parameters are therefore chosen to minimize the variance of
the residuals:
$$\sigma^2 = \sum_{i=1}^{N} \frac{\left[y_i - y(x_i)\right]^2}{N}; \qquad y(x_i) \equiv y(x_i, a_1, a_2, \ldots, a_M) \qquad (3)$$
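As a cross-check outside the spreadsheet, Equation 3 can be sketched in Python with NumPy. This is a minimal sketch, not part of the Excel procedure: the function name `residual_variance`, the straight-line model and the four data points are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import numpy as np

def residual_variance(x, y, model, params):
    """Average squared residual, Eq. (3): sigma^2 = (1/N) * sum (y_i - y(x_i))^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = y - model(x, *params)   # data minus model prediction at each x_i
    return float(np.mean(residuals ** 2))

# Hypothetical straight-line model y = a + b*x and made-up data, for illustration only
def line(x, a, b):
    return a + b * x

x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 8.0]
print(residual_variance(x, y, line, (1.0, 2.0)))  # 0.25
```

Here the predictions are 1, 3, 5 and 7, so only the last point has a nonzero residual (1), and σ² = 1²/4 = 0.25.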
If the parameters enter the equation linearly, the minimization problem reduces to a set of linear equations
that MS Excel® solves easily. The quality of the curve fit is often reported as the square of the linear
correlation coefficient, r². The linear correlation coefficient is defined as:
$$r = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2 \sum_{i=1}^{N} (y_i - \bar{y})^2}} \qquad (4)$$
Values of r near 1 indicate a strong positive correlation, r near −1 a strong negative correlation, and r near zero
no linear correlation.
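Equation 4 translates directly into array operations. The sketch below assumes Python with NumPy; `correlation_coefficient` is a hypothetical helper name, and the data are made up to show the two limiting cases.

```python
import numpy as np

def correlation_coefficient(x, y):
    """Linear correlation coefficient r from Eq. (4)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()   # deviations from the mean of x
    dy = y - y.mean()   # deviations from the mean of y
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

x = [1, 2, 3, 4, 5]
print(correlation_coefficient(x, [2, 4, 6, 8, 10]))   # perfectly increasing: 1.0
print(correlation_coefficient(x, [10, 8, 6, 4, 2]))   # perfectly decreasing: -1.0
```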
Illustrative Examples
Table 2.1 lists seven (x, y) measurements. The goal is to find the equation y = a + bx
that best represents the data.
1. Prepare the worksheet containing the data given in Table 2.1. Also assign placeholder cells for the slope,
intercept, and square of the correlation coefficient. See Figure 2.1.
2. Determine the slope using the SLOPE function. Inserting the SLOPE function opens the dialog
box shown in Figure 2.2.
3. Determine the intercept and the square of the correlation coefficient. Figures 2.3 and 2.4 show the
dialog boxes for the Intercept and Correlation Coefficient functions, respectively.
4. The fit can also be obtained by plotting the data in MS Excel® as an XY (Scatter) chart. Highlight the data,
choose Insert/Chart, and choose the scatter plot with no lines. Figure 2.6 shows the chart produced.
5. To plot the trendline, right-click any data point and choose Add Trendline. Choose the Linear trendline
type. Tick Display Equation on chart and Display R-squared value on chart. Figure 2.7 shows the
Trendline Options, and Figure 2.8 shows the data plot with the trendline, including the curve fit and the r²
value.
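Excel's SLOPE and INTERCEPT functions return the ordinary least-squares slope and intercept, and squaring Equation 4 gives r². A minimal NumPy sketch of the same calculation follows; the helper `fit_line` and the seven data points are hypothetical stand-ins, since Table 2.1 is not reproduced here.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares line y = a + b*x, plus r^2 from Eq. (4)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    b = np.sum(dx * dy) / np.sum(dx ** 2)                            # slope
    a = y.mean() - b * x.mean()                                      # intercept
    r2 = np.sum(dx * dy) ** 2 / (np.sum(dx ** 2) * np.sum(dy ** 2))  # square of Eq. (4)
    return float(a), float(b), float(r2)

# Hypothetical data, not Table 2.1
x = [1, 2, 3, 4, 5, 6, 7]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8]
a, b, r2 = fit_line(x, y)
print(f"y = {a:.4f} + {b:.4f} x, r^2 = {r2:.4f}")
```

The results can be compared directly with the values the worksheet cells or the trendline label report for the same data.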
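The fit need not be limited to a straight line. The trendline options offer a number of regression
types. The curve fit that yields the r² value nearest 1 is generally taken as the best fit; note, however, that adding
terms to a polynomial never lowers r², so a higher-order fit is preferable only if its gain in r² justifies the extra parameters.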
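To see why r² alone can favor ever-larger polynomials, the sketch below fits polynomials of degree 1 to 4 to the same made-up data with NumPy's `polyfit` and computes r² as 1 − SS_res/SS_tot. The data and the helper `r_squared` are hypothetical; the point is only that r² cannot decrease as terms are added.

```python
import numpy as np

def r_squared(y, y_fit):
    """Coefficient of determination, r^2 = 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    ss_res = np.sum((y - y_fit) ** 2)        # scatter left after the fit
    ss_tot = np.sum((y - y.mean()) ** 2)     # total scatter about the mean
    return float(1.0 - ss_res / ss_tot)

# Hypothetical data with a maximum, for illustration only
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([1.2, 2.9, 4.1, 4.8, 5.2, 5.1, 4.7, 3.9])

r2_by_degree = {}
for degree in range(1, 5):
    coeffs = np.polyfit(x, y, degree)        # least-squares polynomial of this degree
    r2_by_degree[degree] = r_squared(y, np.polyval(coeffs, x))
    print(f"degree {degree}: r^2 = {r2_by_degree[degree]:.4f}")
```

With eight points, a degree-7 polynomial would pass through every point and give r² = 1 exactly, which says nothing about whether it is a sensible model; the residual pattern and the number of fitted parameters are a better guide.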
When the functions are not simple powers, multiple regression is used. However, to keep the problem linear, the
unknown coefficients must be coefficients of those functions; that is, the functions themselves are completely specified.
Multiple regression simply determines how much of each one is needed. The form of the equation is

$$y(x) = \sum_{i=1}^{M} a_i f_i(x) \qquad (5)$$

The goal is to find the best M values of {a_i}, given the M functions f_i(x) and the data y_i = y(x_i), i = 1, ..., N.
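Equation 5 is linear in the unknown a_i, so the best fit is an ordinary linear least-squares problem. A minimal NumPy sketch, assuming three hypothetical basis functions (a constant, exp(−x) and sin x) and noise-free synthetic data so the recovered coefficients can be checked:

```python
import numpy as np

def multiple_regression(x, y, functions):
    """Best-fit a_i for y(x) = sum_i a_i * f_i(x), Eq. (5), by linear least squares."""
    x = np.asarray(x, dtype=float)
    # Design matrix: column i holds f_i evaluated at every data point
    A = np.column_stack([f(x) for f in functions])
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coeffs

# Hypothetical basis functions: a constant, exp(-x) and sin(x)
functions = [np.ones_like, lambda x: np.exp(-x), np.sin]
x = np.linspace(0.0, 5.0, 11)
y = 2.0 + 3.0 * np.exp(-x) - 0.5 * np.sin(x)   # noise-free synthetic data
print(multiple_regression(x, y, functions))      # close to [2.0, 3.0, -0.5]
```

Only the coefficients are fitted; the functions f_i are fixed in advance, which is what keeps the problem linear.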
As an example, determine the constants in a reaction rate formula. The expected expression is

$$\text{rate} = k\, p_A^{\,n}\, p_B^{\,m} \qquad (6)$$

and the goal is to find the values of k, n and m that give the best fit of the rate for various partial pressures of
substances A and B. This form is not linear in the parameters, as multiple regression requires, but a transformation
can make it linear. In this case, take the logarithm of both sides.
$$\ln(\text{rate}) = \ln k + n \ln p_A + m \ln p_B \qquad (7)$$
2. Transform the data to make the model linear: take the logarithms of the partial pressures and of the rate.
Figure 2.9 shows the worksheet produced.
3. Use Data Analysis/Regression to determine the best line representing ln(rate) as a function of ln(pA) and
ln(pB). Figure 2.10 shows the Regression dialog box, and Figure 2.11 shows the results of the regression
analysis.
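The same regression can be sketched outside Excel by building a design matrix with a column of ones, ln pA and ln pB, and solving the linear least-squares problem of Equation 7. The helper name and the data are hypothetical: the rates are generated without noise from assumed values k = 7.1, n = 0.98 and m = 0.19 (close to the values this example obtains), so the fit should return them.

```python
import numpy as np

def fit_power_rate_law(pA, pB, rate):
    """Fit rate = k * pA**n * pB**m through its log form, Eq. (7)."""
    pA, pB, rate = (np.asarray(v, dtype=float) for v in (pA, pB, rate))
    # Columns: intercept (ln k), ln pA (coefficient n), ln pB (coefficient m)
    X = np.column_stack([np.ones_like(pA), np.log(pA), np.log(pB)])
    (a, b, c), *_ = np.linalg.lstsq(X, np.log(rate), rcond=None)
    return float(np.exp(a)), float(b), float(c)   # k = e^a, n = b, m = c

# Synthetic, noise-free data from assumed parameters (not Table 2.2)
pA = np.array([0.5, 1.0, 1.5, 2.0, 0.5, 1.0, 1.5, 2.0])
pB = np.array([0.5, 0.5, 0.5, 0.5, 2.0, 2.0, 2.0, 2.0])
rate = 7.1 * pA**0.98 * pB**0.19
k, n, m = fit_power_rate_law(pA, pB, rate)
print(f"k = {k:.3f}, n = {n:.3f}, m = {m:.3f}")
```

Because this fit minimizes the squared error in ln(rate) rather than in the rate itself, its parameters generally differ somewhat from those obtained by minimizing Equation 3 directly when the data contain scatter.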
The Coefficients column gives the results of the regression analysis. The best fit is
a = ln k = 1.9603 (intercept)
b = n = 0.9801 (coefficient of ln pA)
c = m = 0.1894 (coefficient of ln pB)
k = e^a = 7.101
The standard error indicates how accurately each parameter is determined. If it is a significant
fraction of the parameter value, the data are probably too scattered to be correlated reliably.
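The coefficient standard errors in a regression output of this kind should correspond to the usual ordinary-least-squares formula: the residual variance s² = SS_res/(N − p), with p the number of fitted coefficients, times the diagonal of (XᵀX)⁻¹. A NumPy sketch under that assumption, with a hypothetical helper and made-up data:

```python
import numpy as np

def coefficient_standard_errors(X, y):
    """OLS coefficients and their standard errors, sqrt(diag(s^2 (X^T X)^-1))."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_obs, n_par = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n_obs - n_par)     # residual variance with N - p degrees of freedom
    cov = s2 * np.linalg.inv(X.T @ X)        # covariance matrix of the estimates
    return beta, np.sqrt(np.diag(cov))

# Hypothetical straight-line data with some scatter
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.2, 3.8, 6.1, 8.2, 9.7, 12.1])
X = np.column_stack([np.ones_like(x), x])
beta, se = coefficient_standard_errors(X, y)
for name, value, err in zip(["intercept", "slope"], beta, se):
    print(f"{name}: {value:.4f} +/- {err:.4f} (relative {abs(err / value):.1%})")
```

In this made-up example the slope is well determined, but the intercept is smaller than its own standard error, which is exactly the situation in which a parameter should not be trusted.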
Other useful options in the Regression dialog box include Residuals, Residual Plots, and Line Fit Plots;
they are important in evaluating the results. The residuals should be both positive and negative, with no trends.
Here the r² value is 0.9969, which indicates a good correlation.
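The residual checks can also be scripted. This hypothetical helper reports r², the number of positive and negative residuals, and how often the residual changes sign; long runs of same-signed residuals suggest a systematic trend. The example deliberately fits a straight line to made-up quadratic data (y = x²) so that the trend shows up.

```python
import numpy as np

def residual_summary(y, y_fit):
    """r^2 plus simple residual checks: sign balance and number of sign changes."""
    y = np.asarray(y, dtype=float)
    resid = y - np.asarray(y_fit, dtype=float)
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    signs = np.sign(resid[resid != 0])
    return {
        "r2": float(r2),
        "positive": int(np.sum(resid > 0)),
        "negative": int(np.sum(resid < 0)),
        # Few sign changes relative to the number of points hints at a systematic trend
        "sign_changes": int(np.sum(signs[1:] != signs[:-1])),
    }

# Straight line fitted to made-up curved data
x = np.arange(6, dtype=float)
y = x ** 2
coeffs = np.polyfit(x, y, 1)
summary = residual_summary(y, np.polyval(coeffs, x))
print(summary)
```

The straight-line fit still gives r² above 0.9, yet the residuals are positive at both ends and negative in the middle, which is why the residual pattern is worth checking alongside r².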
Nonlinear regression is a curve fit in which the unknown parameters enter the model nonlinearly.
Because nonlinear regression is more difficult for the computer, the method does not always converge.
Nonlinear regression uses techniques borrowed from the field of optimization, and it is difficult to construct a
method that works for every problem.
To use nonlinear regression, Equation 3 is minimized with respect to the unknown parameters using the Solver
add-in.
1. Prepare a new worksheet for the reaction rate data in Table 2.2. Also set aside placeholder cells for the
parameters k, n and m, and give each an initial value of 1.
2. In another column, calculate the rate using the parameters, the partial pressure data and Equation 6.
3. In the next column, calculate the difference between the measured and calculated rates; square this
difference in the column after that.
4. Determine the average of the squares. Figure 2.11 shows the completed worksheet.
5. The goal is to minimize the average of the squared residuals by changing the parameters. Open the
Solver add-in by selecting Data/Analysis/Solver. Figure 2.12 shows the Solver dialog box.
6. Clicking Solve opens the Solver Solution dialog box; select the needed reports. The Answer Report
worksheet generated by Excel® gives the optimum solution. The final value of the Target Cell is
the minimum average of the squared residuals, and the final values of the Adjustable Cells
are the values of the parameters k, n and m. Figure 2.13 shows the Answer Report.
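For readers working outside Excel, the Solver procedure can be imitated with SciPy. This sketch swaps in `scipy.optimize.least_squares` (Levenberg-Marquardt) in place of Solver; minimizing the sum of squared residuals gives the same parameters as minimizing their average, Equation 3, because the two differ only by the constant factor N. The data are synthetic, generated from assumed values k = 7.1, n = 0.98 and m = 0.19, and the starting guess of 1 for each parameter follows step 1.

```python
import numpy as np
from scipy.optimize import least_squares

def rate_model(params, pA, pB):
    """Rate law of Equation 6: rate = k * pA**n * pB**m."""
    k, n, m = params
    return k * pA**n * pB**m

def residuals(params, pA, pB, rate):
    """Measured minus calculated rate, one entry per data point."""
    return rate - rate_model(params, pA, pB)

# Synthetic, noise-free data from assumed parameters (not Table 2.2)
pA = np.array([0.5, 1.0, 1.5, 2.0, 0.5, 1.0, 1.5, 2.0])
pB = np.array([0.5, 0.5, 0.5, 0.5, 2.0, 2.0, 2.0, 2.0])
rate = 7.1 * pA**0.98 * pB**0.19

guess = np.ones(3)                      # k = n = m = 1, as in step 1
fit = least_squares(residuals, guess, args=(pA, pB, rate), method="lm")
k, n, m = fit.x
target = np.mean(fit.fun ** 2)          # Equation 3: the quantity held in Solver's target cell
print(f"k = {k:.4f}, n = {n:.4f}, m = {m:.4f}, mean squared residual = {target:.2e}")
```

As with Solver, a poor starting guess can prevent convergence, so it is worth checking `fit.success` and trying other initial values if the result looks implausible.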
Problems
Answer the following problems using MS Excel®. Save the workbook on the mapped network drive using the
filename format MP1-Surname. Move the charts to separate sheets.
1. Ten data points were taken in an experiment in which the independent variable x is the mole percentage of a
reactant and the dependent variable Y is the yield. Fit a model to these data. Show all iterations made.
x (mole %)   Y (yield)
20           73
20           78
30           85
40           90
40           91
50           87
50           86
50           91
60           75
70           65
2. Using the data from Problem 1, fit a quadratic model and use it to determine the value of x that maximizes
the yield.
3. The following experimental data for the equilibrium adsorption of pure methane gas on activated carbon at
296 K were obtained by Ritter and Yang.
Determine which of the three most common isotherms (Linear, Freundlich, Langmuir) best describes the data.
Give the model, including its parameters.
1. What model best describes the data in Problem #1? What makes it the best model? Explain in terms of
statistics.
3. Explain through equations how the most appropriate isotherm for Problem #3 was determined.