
Statistics for Engineers

Master of Engineering Management, Department of Industrial and Production Engineering, University of Ibadan.

5/1/2013 Oloyede I. PhD [in view]

Descriptive Statistics
In the article Evaluation of Low-Temperature Properties of HMA Mixtures (P. Sebaaly, A. Lake, and J. Epps, Journal of Transportation Engineering, 2002:578-583), the following values of fracture stress (in megapascals) were measured for a sample of 24 mixtures of hot-mixed asphalt (HMA): 30 75 79 80 80 105 126 138 149 179 179 191 223 232 232 236 240 242 245 247 254 274 384 470

Compute the mean, median, and the 5%, 10%, and 20% trimmed means. A minimal computational sketch is given below.
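
The sketch below uses Python with numpy and scipy rather than STATISTICA; this is an assumption made purely for illustration. scipy's trim_mean cuts the stated proportion from each tail before averaging, which is one common convention for the p% trimmed mean.

```python
import numpy as np
from scipy import stats

# fracture stress values (MPa) for the 24 HMA mixtures quoted above
stress = np.array([30, 75, 79, 80, 80, 105, 126, 138, 149, 179, 179, 191,
                   223, 232, 232, 236, 240, 242, 245, 247, 254, 274, 384, 470])

print("mean:   ", stress.mean())
print("median: ", np.median(stress))
for p in (0.05, 0.10, 0.20):
    # trim the proportion p from each tail, then average what remains
    print(f"{p:.0%} trimmed mean:", stats.trim_mean(stress, p))
```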

Procedures: Irisdat.sta contains data reported by Fisher (1936). It contains the lengths and widths of sepals (Sepallen, Sepalwid) and petals (Petallen, Petalwid) for 50 flowers of each of three types of iris. Open this data file via the File - Open Examples menu; it is in the Datasets folder.

Specifying the analysis. Select Nonparametrics from the Statistics menu to display the Nonparametric Statistics Startup Panel. Next, select Ordinal descriptive statistics (median, mode, ...) on the Quick tab, and then click the OK button to display the Descriptive Statistics dialog.

First, specify the variables; click the Variables button to display the standard variable selection dialog. Since Iristype is a coding variable to identify the type of iris, select only Variables 1-4, and then click the OK button.

Reviewing the results. Click the Summary button to begin the analysis and display the results.

Plot a histogram of the frequency distribution with the normal curve superimposed. This plot is very useful for identifying deviations from the normal distribution. To produce this plot for variable Petallen, right-click on a cell in the Petallen row and select Graphs of Input Data - Histogram PETALLEN - Normal Fit from the resulting shortcut menu.
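
Outside STATISTICA, the same kind of plot can be sketched with matplotlib and scipy. The snippet below uses scikit-learn's bundled copy of Fisher's iris data as a stand-in for Irisdat.sta, which is an assumption for illustration only.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
from sklearn.datasets import load_iris

petallen = load_iris().data[:, 2]            # petal length (cm), analogous to Petallen

fig, ax = plt.subplots()
ax.hist(petallen, bins=10, density=True, alpha=0.6, label="observed")

# superimpose a normal curve fitted by the sample mean and standard deviation
mu, sigma = petallen.mean(), petallen.std(ddof=1)
x = np.linspace(petallen.min(), petallen.max(), 200)
ax.plot(x, norm.pdf(x, mu, sigma), label="normal fit")
ax.set_xlabel("petal length (cm)")
ax.set_ylabel("relative frequency")
ax.legend()
plt.show()
```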

Basic Statistics (Nominal Data Analysis): Select Basic Statistics/Tables from the Statistics menu to display the Basic Statistics and Tables Startup Panel. The Descriptive Statistics spreadsheets will contain the mean, valid N, standard deviation, and minimum and maximum values of the selected variables. Click on the Advanced tab to select the types of statistics to be calculated.

On the Advanced tab, mark all of the statistics you want, and click the Summary button to produce the spreadsheet of results.

The Chi-Square
Four machines manufacture cylindrical steel pins. The pins are subject to a diameter specification. A pin may meet the specification, or it may be too thin or too thick. Pins are sampled from each machine, and the number of pins in each category is counted.

Illustration: A contingency table summarizes the frequencies across two variables. It may be interesting to compare variables such as Level of Smoking and Employee Category. Does a difference in smoking rate exist across various employee categories? The data may list each employee, their job title, and level of smoking. This data can then be tabulated into a contingency table, showing the frequencies across the levels of these variables as shown in the table below.

At times, the data are collected in a contingency table format. Instead of each employee listed in a spreadsheet, the summarized contingency table is all that is available. When this is the case, and additional statistics are required such as chi-square tests for independence or row and column percentages, the data require rearrangement.

Rearranging the Data for Analysis


These data preparation tools include Stacking and Unstacking, Recode, Transpose, and spreadsheet formulas. Smoking.sta is a contingency table showing the crosstabulation frequencies of Level of Smoking (NONE, LIGHT, MEDIUM, and HEAVY) and Employee Category (SR. MANAGERS, JR. MANAGERS, SR. EMPLOYEES, JR. EMPLOYEES, and SECRETARIES). This is summarized data, not raw data. STATISTICA requires raw data for analysis. This data can easily be transformed to raw data with a few steps, and then be analyzed. 1. Open the data set Smoking.sta. This spreadsheet is the contingency table. 2. Select the Data tab. In the Transformations group, click Stack to display the Unstacking/Stacking dialog. Select the Stacking tab, and click the Variables button. In the variable selection dialog, select all four variables and click the OK button. In the Unstacking/Stacking dialog, in the Destination variable name field, enter Frequency. In the Code variable name field, enter Level of Smoking.

Click OK. A new spreadsheet is created with two variables, Frequency and Level of Smoking. 3. On the ribbon bar, select the Data tab. In the Variables group, click Variables and from the drop-down list, select Add to display the Add Variables dialog. In the How many field, enter 1. In the After field, enter 2. In the Name field, enter Employee Category.

Click OK. A new variable is added to the spreadsheet. 4. On the ribbon bar, select the Data tab. In the Cases group, click Names to display the Case Names Manager dialog. In the Transfer case names group box, select the To option button. Double-click in the Variable field to display the Select Variable dialog. Select Employee Category and click OK.

Click OK in the Case Names Manager dialog to update the Employee Category variable with the case names. 5. On the ribbon bar, select the Tools tab. Click Weight to display the Spreadsheet Case Weights dialog. In the Status group box, select the On option button. In the Weight variable field, enter Frequency.

Click OK. If the Setting Spreadsheet Case Weights dialog is displayed, click OK. The spreadsheet is now ready for analysis with the Crosstabulation tool. It contains the two variables for analysis, Level of Smoking and Employee Category. The third variable, Frequency, contains the frequency weights that will be used in analysis.
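
For readers working outside STATISTICA, the same rearrangement can be sketched with pandas: melt the contingency table into one row per cell, with the cell count kept as a Frequency column that later serves as the case weight. The counts below are illustrative only; the actual Smoking.sta values are not reproduced in this text.

```python
import pandas as pd

# illustrative contingency table: rows = Employee Category, columns = Level of Smoking
table = pd.DataFrame(
    {"NONE":   [4, 4, 25, 18, 10],
     "LIGHT":  [2, 3, 10, 24, 6],
     "MEDIUM": [3, 7, 12, 33, 7],
     "HEAVY":  [2, 4, 4, 13, 2]},
    index=["SR. MANAGERS", "JR. MANAGERS", "SR. EMPLOYEES",
           "JR. EMPLOYEES", "SECRETARIES"])
table.index.name = "Employee Category"

# "stack" the table: one row per (category, smoking level) cell with its frequency
long = (table.reset_index()
             .melt(id_vars="Employee Category",
                   var_name="Level of Smoking",
                   value_name="Frequency"))
print(long.head())
```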

Analyzing the Data


The Crosstabulation tool can be used for creating contingency tables as well as for the calculation of several statistics including various independence tests and percentages of rows, columns, and total. 1. On the ribbon bar, select the Statistics tab. In the Base group, click Basic Statistics to display the Basic Statistics and Tables Startup Panel. Select Tables and banners and click OK to display the Crosstabulation Tables dialog. 2. On the Crosstabulation tab, click the Specify tables (select variables) button to display the Select up to 6 lists of grouping variables dialog. In List 1, select Level of Smoking. In List 2, select Employee Category. Click OK in the Select up to 6 lists of grouping variables dialog, and click OK in the Crosstabulation Tables dialog to display the Crosstabulation Tables Results dialog.

3. Select the Options tab. In the Compute tables group box, select the Percentages of row counts check box and the Percentages of column counts check box. In the Statistics for two-way tables group box, select the Pearson & ML Chi-Square check box.

4. Select the Advanced tab. Click the Detailed two-way tables button to create the two-way table output and chi-square output. The 2-Way Summary Table output gives the contingency table along with the requested row and column percents.

The Pearson Chi-square statistic is 16.44164, with a non-significant p-value of 0.1783. This indicates that no statistically significant relationship exists between Level of Smoking and Employee Category; the data are consistent with the two variables being independent.
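
The same test can be reproduced outside STATISTICA with scipy's chi2_contingency, which returns the Pearson statistic, p-value, degrees of freedom, and expected counts. The table below is illustrative rather than the actual Smoking.sta frequencies, so the numbers will not necessarily match the output quoted above.

```python
import numpy as np
from scipy.stats import chi2_contingency

# illustrative counts: rows = employee categories, columns = NONE, LIGHT, MEDIUM, HEAVY
counts = np.array([[ 4,  2,  3,  2],
                   [ 4,  3,  7,  4],
                   [25, 10, 12,  4],
                   [18, 24, 33, 13],
                   [10,  6,  7,  2]])

chi2, p, dof, expected = chi2_contingency(counts)
print(f"Pearson chi-square = {chi2:.4f}, df = {dof}, p = {p:.4f}")

# row and column percentages, as requested on the Options tab
row_pct = counts / counts.sum(axis=1, keepdims=True) * 100
col_pct = counts / counts.sum(axis=0, keepdims=True) * 100
```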

T-Test: Paired-Sample and Independent-Sample T-Tests


The article Calibration of an FTIR Spectrometer (P. Pankratz, Statistical Case Studies for Industrial and Process Improvement, SIAM-ASA, 1997:19-38) describes the use of a spectrometer to make five measurements of the carbon content (in ppm) of a certain silicon wafer on each of two successive days. The results were as follows: Day 1: 2.1321 2.1385 2.0985 2.0941 2.0680 Day 2: 2.0853 2.1476 2.0733 2.1194 2.0717

Particulate matter (PM) emissions from automobiles are a serious environmental concern. Eight vehicles were chosen at random from a fleet, and their emissions were measured under both highway driving and stop-and-go driving conditions. The differences (stop-and-go emission minus highway emission) were computed as well. The results, in milligrams of particulates per gallon of fuel, were as follows:
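
For the FTIR carbon measurements quoted above, a scipy sketch of both flavors of t-test follows. Whether the five measurements are treated as paired or as two independent samples is a modeling choice, so both calls are shown; Python and scipy are used here as an illustrative substitute for STATISTICA.

```python
import numpy as np
from scipy import stats

day1 = np.array([2.1321, 2.1385, 2.0985, 2.0941, 2.0680])
day2 = np.array([2.0853, 2.1476, 2.0733, 2.1194, 2.0717])

# paired-sample t-test (measurement i on day 1 matched with measurement i on day 2)
t_paired, p_paired = stats.ttest_rel(day1, day2)

# independent-sample t-test (the two days treated as separate samples)
t_ind, p_ind = stats.ttest_ind(day1, day2)

print(f"paired:      t = {t_paired:.3f}, p = {p_paired:.3f}")
print(f"independent: t = {t_ind:.3f}, p = {p_ind:.3f}")
```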

Differences between Means (t-Test). The possibility of differences in response patterns between males and females is examined. Specifically, males may use some rating scales in a different way, resulting in higher or lower ratings on some scales. The t-test for independent samples will be used to identify such potential differences: the samples of males and females will be compared regarding their average ratings on each scale. Return to the Basic Statistics and Tables Startup Panel and double-click t-test, independent, by groups in order to display the T-Test for Independent Samples by Groups dialog box. Next, click the Variables button to display the standard variable selection dialog box. Here, you can select both the independent (grouping) and dependent variables for the analysis. For this example, select (highlight) variables 3 through 25 (the variables containing the responses) as the dependent variables, select variable Gender as the independent variable, and click OK.


Once you have made the grouping variable selection, STATISTICA will automatically propose the codes used in that variable to identify the groups to be compared (in this case, the codes are Male and Female). You can double-click on either the Code for Group 1 or Code for Group 2 boxes to display the Variable Codes dialog box in which you can review and select the codes for each group.
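
A pandas/scipy sketch of the "t-test, independent, by groups" idea is given below; the data frame is randomly generated for illustration, since the ratings themselves are not reproduced in this text.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Gender":    rng.choice(["MALE", "FEMALE"], size=50),
    "Measure01": rng.integers(0, 10, size=50),
    "Measure02": rng.integers(0, 10, size=50),
})

# compare male and female mean ratings on each dependent variable
for col in ["Measure01", "Measure02"]:
    males   = df.loc[df["Gender"] == "MALE", col]
    females = df.loc[df["Gender"] == "FEMALE", col]
    t, p = stats.ttest_ind(males, females)
    print(f"{col}: t = {t:.2f}, p = {p:.3f}")
```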

PART II
Correlation Coefficient
An environmental scientist is studying the rate of absorption of a certain chemical into skin. She places differing volumes of the chemical on different pieces of skin and allows the skin to remain in contact with the chemical for varying lengths of time. She then measures the volume of chemical absorbed into each piece of skin. She obtains the results shown in the following table.

Illustration: Open the Striving.sta data file via the File - Open Examples menu; it is in the Datasets folder. Twelve students completed two questionnaires designed to measure (1) authoritarianism and (2) striving for social status. Authoritarianism (Adorno et al., 1950) is a psychological concept; in short, highly authoritarian people tend to be rigid and believe in authority ("law and order").


The purpose of the study was to find out whether these two variables are correlated.

Specifying the analysis. Select Nonparametrics from the Statistics menu to display the Nonparametric Statistics Startup Panel. Next, select Correlations (Spearman, Kendall tau, gamma) on the Quick tab, and then click the OK button to display the Nonparametric Correlation dialog. In the Compute box, select Detailed report. Click the Variables button to display the standard variable selection dialog. From the First variable list, select Authorit; from the Second variable list, select Striving, and then click the OK button.

Reviewing the results. Now, click the Spearman rank R button to display a spreadsheet with the results of the analysis.

The correlation between the two scales is highly significant, so you can conclude that highly authoritarian individuals probably also tend to strive for social status. A correlation plot can be displayed to provide further information. Select Scatterplots from the Graph menu to display the 2D Scatterplots dialog. Then, click the Variables button to display the standard variable selection dialog. Here, select variable Striving as the X variable and variable Authorit as the Y variable, and then click the OK button. On the Advanced tab, select the Corr. and p (linear fit) check box under Statistics and the Confidence option button under Regression bands. Finally, click the OK button on the 2D Scatterplots dialog to produce the plot.
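
Outside STATISTICA, the Spearman rank correlation (and Kendall's tau from the same dialog) can be computed with scipy; the twelve paired scores below are invented for illustration and are not the Striving.sta values.

```python
import numpy as np
from scipy import stats

authorit = np.array([82, 98, 87, 40, 116, 113, 111, 83, 85, 126, 106, 117])
striving = np.array([42, 46, 39, 37,  65,  88,  86, 56, 62,  92,  54,  81])

rho, p_rho = stats.spearmanr(authorit, striving)
tau, p_tau = stats.kendalltau(authorit, striving)

print(f"Spearman R = {rho:.3f} (p = {p_rho:.4f})")
print(f"Kendall tau = {tau:.3f} (p = {p_tau:.4f})")
```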


Multiple Regression
A chemical engineer is studying the effect of temperature and stirring rate on the yield of a certain product. The process is run 16 times, at the settings indicated in the following table. The units for yield are percent of a theoretical maximum.

Illustration: The data in Poverty.sta are based on a comparison of 1960 and 1970 Census figures for a random selection of 30 counties. The names of the counties were entered as case names. Starting the Analysis. Select Multiple Regression from the Statistics menu. Specify the regression equation by clicking the Variables button on the Multiple Linear Regression dialog - Quick tab to display the variable selection dialog. Select PT_POOR as the Dependent variable and all of the other variables in the data file in the Independent variable list, and then click the OK button. On the Multiple Linear Regression dialog - Advanced tab, select the Review descriptive statistics, correlation matrix check box.

Specifying the Multiple Regression. Now click the OK button in the Review Descriptive Statistics dialog to perform the regression analysis and display the Multiple Regression Results dialog. A standard regression (which includes the intercept) will be performed. Reviewing Results. The Summary box from the top of the Multiple Regression Results dialog is displayed below. Overall, the multiple regression equation is highly significant. Thus, given the independent variables, you can "predict" poverty better than what would be expected by pure chance alone.


Regression coefficients. In order to learn which of the independent variables contributes most to the prediction of poverty, examine the regression (or B) coefficients. Click the Summary: Regression results button on the Quick tab to display a spreadsheet with those coefficients.

This spreadsheet shows the standardized regression coefficients (b*) and the raw regression coefficients (b). The magnitude of these Beta coefficients enables you to compare the relative contribution of each independent variable to the prediction of the dependent variable. As is evident in the spreadsheet shown above, variables POP_CHNG, PT_RURAL, and N_EMPLD are the most important predictors of poverty; of those, only the first two variables are statistically significant. The regression coefficient for POP_CHNG is negative; the less the population increased, the greater the number of families who lived below the poverty level in the respective county. The regression weight for PT_RURAL is positive; the greater the percent of rural population, the greater the poverty level.
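
A statsmodels sketch of the same kind of standard (intercept-included) regression is shown below. The variable names follow the text, but reading Poverty.sta from a CSV export and restricting the model to three predictors are assumptions made for illustration.

```python
import pandas as pd
import statsmodels.api as sm

poverty = pd.read_csv("poverty.csv")                 # hypothetical export of Poverty.sta
cols = ["PT_POOR", "POP_CHNG", "PT_RURAL", "N_EMPLD"]

# raw (b) coefficients, with an intercept term
X = sm.add_constant(poverty[cols[1:]])
fit = sm.OLS(poverty["PT_POOR"], X).fit()
print(fit.summary())                                  # b, t-values, p-values, R-squared

# standardized (beta*) coefficients: refit after z-scoring every variable
z = poverty[cols].apply(lambda s: (s - s.mean()) / s.std())
betas = sm.OLS(z["PT_POOR"], z[cols[1:]]).fit().params
print(betas)
```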

Analysis of Variance
In the article Review of Development and Application of CRSTER and MPTER Models (R. Wilson, Atmospheric Environment, 1993:41-57), several measurements of the maximum hourly concentrations (in μg/m3) of SO2 are presented for each of four power plants. The results are as follows (two outliers have been deleted):
Plant 1: 438 619 732 638
Plant 2: 857 1014 1153 883 1053
Plant 3: 925 786 1179 786
Plant 4: 893 891 917 695 675 595
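
Taking the grouping as laid out in the table above, the one-way ANOVA can be sketched with scipy's f_oneway (Python used here purely as an illustrative substitute for STATISTICA):

```python
from scipy import stats

plant1 = [438, 619, 732, 638]
plant2 = [857, 1014, 1153, 883, 1053]
plant3 = [925, 786, 1179, 786]
plant4 = [893, 891, 917, 695, 675, 595]

# one-way ANOVA: do the four plants share a common mean SO2 concentration?
f, p = stats.f_oneway(plant1, plant2, plant3, plant4)
print(f"F = {f:.3f}, p = {p:.5f}")
```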

Open Adstudy.sta via the File - Open Examples menu. After choosing the Breakdown and one-way ANOVA procedure from the Basic Statistics and Tables Startup Panel, select the Individual tables tab in the Statistics by Groups (Breakdown) dialog, and click the Variables button; select Measure01 through Measure23 as the Dependent variables, and the two variables Gender (subject's gender, Male and Female) and Advert (type of advertisement shown to the subjects; Coke and Pepsi) as the Grouping variables, and click OK.

Click the Codes for grouping variables button and select all codes for both of the grouping variables.

To select all codes for a variable, you can either enter the code numbers in the respective edit field, click the respective All button, or place an * in the respective edit field. Clicking the OK button without specifying any values is equivalent to selecting all values of all variables. Click the OK button in this dialog and in the Statistics by Groups (Breakdown) dialog to display the Statistics by Groups - Results dialog. This dialog provides various options and procedures for analyzing the data within groups in order to obtain a better understanding of the differences between categories of the grouping variables. Summary Table of Means. You can select the desired statistics to be displayed in the Summary: Table of statistics or Detailed two-way tables; click on the Descriptives tab and select all the options in the Statistics box. Now, click the Detailed two-way tables button to display that results spreadsheet.


This spreadsheet shows the selected descriptive statistics for the variables as broken down by the specified groups (scroll the spreadsheet to view the results for the rest of the variables). For example, looking at the means within each group in this spreadsheet, you can see that there is a slight difference between the means for Males and Females for variable Measure01. Now, examine the means within the Male and Female groups for variable Measure01; you can see that there is very little difference between the groups Pepsi and Coke within either gender; thus, the gender groups appear to be homogeneous in this respect. One-Way ANOVA and Post-Hoc Comparisons of Means. You can easily test the significance of these differences via the Analysis of Variance button on the ANOVA & tests tab in the Results dialog. Click this button to display the spreadsheet with the results of the univariate analysis of variance for each dependent variable.

The one-way Analysis of Variance procedure gave statistically significant results for Measure05, Measure07, and Measure09. These significant results indicate that the means across the groups differ. Now, return to the Results dialog and click on the Post-hoc tab to perform post-hoc tests for the significant differences between individual groups (means). You will first need to select the variable(s) for the comparisons: click the Variables button, select variable Measure07, and click OK. You can choose from among several post-hoc tests; click the LSD test or planned comparison button.
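
As a rough non-STATISTICA counterpart to this post-hoc step, the sketch below uses Tukey's HSD from statsmodels in place of the LSD test named in the dialog (a deliberate substitution), and the scores and group labels are randomly generated for illustration.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(1)
groups = np.repeat(["MALE-COKE", "MALE-PEPSI", "FEMALE-COKE", "FEMALE-PEPSI"], 12)
scores = np.concatenate([rng.normal(5.0, 1.0, 12),
                         rng.normal(5.2, 1.0, 12),
                         rng.normal(6.4, 1.0, 12),
                         rng.normal(4.3, 1.0, 12)])

# all pairwise comparisons of group means, with a family-wise error rate of 0.05
result = pairwise_tukeyhsd(scores, groups, alpha=0.05)
print(result)
```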


Two-Way Analysis of Variance
The thickness of the silicon dioxide layer on a semiconductor wafer is crucial to its performance. In the article Virgin Versus Recycled Wafers for Furnace Qualification: Is the Expense Justified? (V. Czitrom and J. Reece, Statistical Case Studies for Process Improvement, SIAM-ASA, 1997:87-103), oxide layer thicknesses were measured for three types of wafers: virgin wafers, wafers recycled in-house, and wafers recycled by an external supplier. In addition, several furnace locations were used to grow the oxide layer.
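
A statsmodels sketch of a two-way ANOVA with wafer type and furnace location as crossed factors is given below; the thickness values and level names are invented for illustration, since the article's data are not reproduced in this text.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "wafer": np.repeat(["virgin", "in_house", "external"], 12),
    "site":  np.tile(["loc1", "loc2", "loc3", "loc4"], 9),
    "thickness": 90.0 + rng.normal(0.0, 1.5, 36),       # oxide thickness, arbitrary units
})

# two-way ANOVA: main effects of wafer type and furnace location plus their interaction
model = ols("thickness ~ C(wafer) + C(site) + C(wafer):C(site)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```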

Distribution Fitting
The Distribution Fitting module is used to evaluate the fit of observed data to theoretical distributions. Also note that the Survival Analysis module contains specialized routines for fitting censored (incomplete) survival or failure time data to the Weibull and Gompertz distributions. Open the data file Irisdat.sta via the File - Open Examples menu; it is in the Datasets folder. This file contains data reported by Fisher (1936) on the lengths and widths of sepals (Sepallen, Sepalwid) and petals (Petallen, Petalwid) for 50 flowers of each of three types of iris.


The distributions of the four variables describing the lengths and widths of sepals and petals will now be examined. Specifically, it is expected that those measures follow the normal distribution. Specifying the analysis. Select Distribution Fitting from the Statistics menu to display the Distribution Fitting Startup Panel. Next, select the Continuous distributions option button and then double-click on Normal in the Startup Panel. In the resulting dialog (Fitting Continuous Distributions), click the Variable button to display the standard variable selection dialog. Here, select variable Sepallen and then click the OK button. At this point, the data file will be processed and the Parameters tab will show the computed mean and variance as the default values for the Mean and Variance boxes. You can also adjust the Number of categories and the Lower and Upper limits for the computation of the frequency distribution. The Fitting Continuous Distributions - Parameters tab appears as follows.

Next, click on the Options tab and select the Yes (continuous) option button under Kolmogorov-Smirnov test. Accept all of the other default selections on this dialog and click the Summary button to compute the frequency distribution.


Test statistics. The Chi-square value is significant at the .05 level (p = .026). Thus, based on the Chi-square test, you would conclude that the distribution deviates significantly from the normal distribution. However, the Kolmogorov-Smirnov d test is not significant (p < .20). This pattern of results is not uncommon, because the Kolmogorov-Smirnov test is not so much a precise procedure as it is a technique for detecting gross deviations from some assumed distribution. Often, the Chi-square value is greatly affected by the way in which the distribution is "sliced up," that is, by the number of categories and the minimum and maximum values that you choose. For example, if you slice the distribution for Sepallen into 23 pieces (enter 23 in the Number of categories box on the Parameters tab), rather than the default 10 categories, then the resulting Chi-square value is only marginally significant at the p = .04 level.
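
The two goodness-of-fit checks discussed above can be sketched with scipy, again using scikit-learn's copy of the iris data as a stand-in for Irisdat.sta; the exact statistics will differ from STATISTICA's output, which applies its own binning defaults.

```python
import numpy as np
from scipy import stats
from sklearn.datasets import load_iris

sepallen = load_iris().data[:, 0]                     # sepal length (cm)
mu, sigma = sepallen.mean(), sepallen.std(ddof=1)

# Kolmogorov-Smirnov test against the normal distribution fitted to the sample
d, p_ks = stats.kstest(sepallen, "norm", args=(mu, sigma))

# Chi-square test on binned data: observed vs. expected counts in 10 categories
edges = np.linspace(sepallen.min(), sepallen.max(), 11)
observed, _ = np.histogram(sepallen, bins=edges)
cdf = stats.norm.cdf(edges, mu, sigma)
expected = np.diff(cdf) / (cdf[-1] - cdf[0]) * sepallen.size
chi2, p_chi2 = stats.chisquare(observed, expected, ddof=2)   # two fitted parameters

print(f"K-S d = {d:.4f} (p = {p_ks:.3f}); chi-square = {chi2:.2f} (p = {p_chi2:.3f})")
```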

Of much greater importance is how the general shape of the observed distribution approximates the hypothesized normal distribution. Now, return to the Fitting Continuous Distributions dialog. On the Options tab, in the Graph group, you can choose to plot a histogram of the Frequency or Cumulative distribution with the Raw or Relative frequencies.

Accept the default graph selections and click the Plot of observed and expected distribution button on the Quick tab to produce the frequency histogram for this variable. (Note that the Number of categories on the Parameters tab should still be set to 23.)

It seems that the distribution of Sepallen is bimodal, that is, it appears to have two "peaks." Also, a major lack of fit exists on the left side of the observed distribution where the first peak occurs. Thus you would conclude from the analysis that the continuous normal distribution probably does not provide an adequate model for the observed distribution.


References
Statistics for Engineers and Scientists, Third Edition. McGraw-Hill Companies, Inc., 1221 Avenue of the Americas, New York, NY 10020, 2011.
Morrison, S. J. Statistics for Engineers: An Introduction. John Wiley & Sons, Ltd, 2009.
STATISTICA software Help menu, www.statsoft.com

