
Generating new variables in SPSS: The Transform/Compute command

SPSS provides a number of useful commands for analyzing data other than those we have introduced in the text. Three of these allow us to use existing variables to create new variables. The commands involved and their respective functions are described in Table 1.
Table 1 Summary of advanced SPSS commands

Command             What does it do?
Recode              Creates a new variable based on the values of an existing variable. In other words, it is used to collapse values or categories into fewer categories.
Compute             This is similar to the Recode command but creates a new variable based on the responses to a combination of existing variables.
Multiple response   Combines the answers to multiple response questions into a single variable.

Computing new variables

Imagine that we asked respondents six questions that rate satisfaction with medical services at the local hospital. Each question asks respondents to rate an aspect of the service (such as nursing care) on a four-point scale, with 0 indicating Completely unsatisfied, 1 indicating Somewhat unsatisfied, 2 indicating Somewhat satisfied, and 3 indicating Completely satisfied. The responses to these six questions will be entered in SPSS as six separate variables. However, we may have asked these questions because we want to combine the scores into an overall measure of satisfaction that incorporates all these individual aspects of hospital service. An intuitive approach is to add each person's scores together for the six questions: a person who was completely satisfied with all aspects of the hospital's service will get an aggregated score of (6 × 3 =) 18, while a person who was completely unsatisfied with every aspect of the hospital's service will score (6 × 0 =) 0.

SPSS provides a method of performing such a procedure through the Transform/Compute command. Its function is to transform the values of an existing variable or combination of variables to produce a new variable. In the example we will work through, we will perform a very simple compute procedure where the scores for a set of variables are summed to produce a new variable containing the total value. SPSS also provides many other, more complicated functions for computing new variables from existing variables.

To learn the basics of the Compute command we will use the data in the 1991 U.S. General Social Survey file that comes with the SPSS program and is included on the CD with the text. We will work with the hlth1-hlth9 variables that we also use in discussing the Multiple Response command in the accompanying chapter. The data for these variables came from the set of questions in Table 2. Say that I want to find out how many times in total each respondent answered Yes to the items in the list. In other words, I want to create a new variable that will take the value 9 for cases who responded Yes to all 9 health related items in the survey, 8 for cases who answered Yes to any 8 items, and so on down to 0, which is assigned to all cases who did not answer Yes to any of the items.
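The row-wise summing idea behind the satisfaction example can be sketched outside SPSS. The following is a minimal Python illustration, not SPSS syntax; the ratings and the name `total_score` are hypothetical and only show the arithmetic described above.

```python
# Hypothetical ratings for three respondents on the six satisfaction
# questions (0 = Completely unsatisfied ... 3 = Completely satisfied).
ratings = [
    [3, 3, 3, 3, 3, 3],  # completely satisfied with every aspect
    [0, 0, 0, 0, 0, 0],  # completely unsatisfied with every aspect
    [2, 3, 1, 2, 3, 2],  # mixed responses
]


def total_score(row):
    """Add one respondent's scores across all six items."""
    return sum(row)


# One total per respondent, analogous to a new computed variable.
totals = [total_score(row) for row in ratings]
print(totals)  # [18, 0, 13]
```

Each respondent keeps their six original scores; the total is simply an additional value derived from them.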

Statistics for Research

Table 2 Health questions from the 1991 US General Social Survey

Thinking about health related matters did any of the following happen to you since February/March 1990? (please tick all relevant responses)

Item                                                 Yes    No    Don't know
I was ill enough to go to the doctor
I sought counselling for mental problems
I had problems with infertility
I suffered from a drinking problem
I used illegal drugs
My child had to go to hospital
My partner had to go to hospital
A close friend died
My child suffered from drug or alcohol problems

Checking the coding scheme

When using the Compute command it is advisable, as a preliminary step, to look at the coding scheme of the existing variables that will be used to generate the new variable. Table 2 presents the coding scheme used in the 1991 U.S. General Social Survey for the hlth1-hlth9 variables:

Table 2 Coding scheme for hlth1-hlth9

Response             Value label   Value
No Answer Provided   NAP           0 (missing)
Yes                  Yes           1
No                   No            2
Don't know           DK            8 (missing)
Not applicable       NA            9 (missing)

We ideally want SPSS to add up the 1s in each of the hlth1-hlth9 variables so as to give us the total number of Yes responses. Unfortunately we cannot simply ask SPSS to sum the scores in these 9 columns of data to produce a new variable. If a case returned a value of 7 for this aggregated variable, we could not tell, on the basis of the initial coding scheme, whether this is because they answered Yes to 7 items; or they answered No to 3 items, Yes to 1 item, and left the remaining 5 items unanswered; or they answered No to 2 items, Yes to 3 items, and left the remaining items unanswered.

This is why data with No or Never responses should be coded as 0, and Yes responses coded as 1. It is conceptually appropriate, since 0 does signify a null case, and it is also practically useful when undertaking SPSS analysis, as we are discovering here where such a coding scheme has not been followed. To deal with the fact that the 1991 U.S. General Social Survey data are not coded this way, we use the Recode command to give these variables a more appropriate coding scheme, illustrated in Table 3. The actual steps I used in the Recode command to make these transformations for hlth1 are presented in Figure 1, so that you can open your copy of the 1991 U.S. General Social Survey and make the same changes if you want to follow the Compute procedure I am about to detail.
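The ambiguity of a raw total can be checked with plain arithmetic. The sketch below is Python rather than SPSS and deliberately treats the original codes as ordinary numbers (Yes = 1, No = 2, No Answer Provided = 0); the three response patterns are hypothetical cases constructed to match the argument above, not records from the survey file.

```python
# Original codes, treated here as plain numbers for the sake of the argument.
YES, NO, NAP = 1, 2, 0

# Three different hypothetical response patterns across the nine items.
patterns = {
    "7 Yes, 2 unanswered": [YES] * 7 + [NAP] * 2,
    "1 Yes, 3 No, 5 unanswered": [YES] * 1 + [NO] * 3 + [NAP] * 5,
    "3 Yes, 2 No, 4 unanswered": [YES] * 3 + [NO] * 2 + [NAP] * 4,
}

for label, codes in patterns.items():
    # The raw sum is identical for all three patterns, although the
    # number of Yes responses differs.
    print(label, "| raw sum:", sum(codes), "| Yes count:", codes.count(YES))
```

All three patterns total 7, yet they contain 7, 1, and 3 Yes responses respectively, which is why the codes need to be changed before summing.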

The SPSS Compute Command


Table 3 Recoded values for hlth1-hlth9

Response             Value label   Value
No                   No            0
Yes                  Yes           1
No Answer Provided   NAP           7 (missing)
Don't know           DK            8 (missing)
Not applicable       NA            9 (missing)

Figure 1 Recoding hlth1-hlth9
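As a rough illustration of what the recode achieves, the mapping from the original codes to the codes in Table 3 can be written as a lookup. This is a Python sketch, not the SPSS Recode dialog; the names `RECODE_MAP`, `MISSING_CODES`, `recode`, and `is_missing` are hypothetical.

```python
# Old code -> new code, following the original and recoded coding schemes.
RECODE_MAP = {
    2: 0,  # No
    1: 1,  # Yes
    0: 7,  # No Answer Provided (missing)
    8: 8,  # Don't know (missing)
    9: 9,  # Not applicable (missing)
}

# Recoded values that are to be treated as missing.
MISSING_CODES = {7, 8, 9}


def recode(value):
    """Translate an original hlth code into its recoded value."""
    return RECODE_MAP[value]


def is_missing(recoded_value):
    """Report whether a recoded value is one of the missing codes."""
    return recoded_value in MISSING_CODES


original = [1, 2, 0, 8, 9]
recoded = [recode(v) for v in original]
print(recoded)  # [1, 0, 7, 8, 9]
```

After this step only Yes (1) and No (0) remain as valid values, so a sum of the valid values equals the number of Yes responses.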

The Compute command

With this recoded data it is clear that if we now ask SPSS to sum the scores in the hlth1rec-hlth9rec variables we will get the desired result (scores of 7, 8, or 9 will not be summed since these are coded as missing values). The procedure for doing this is presented in Table 4. When the Compute command is completed, the new variable, which we have called hlthall, is added to the end of the data file. Figure 3 shows how the logic of the Compute command works to create the new variable. We can compare the scores for a handful of cases to check that the Compute command has turned out the way we hoped.

1. Case 1 had No Answer Provided for each of the hlth1rec-hlth9rec variables, which is coded as a missing value. The Compute command produces the desired result of a missing value for the hlthall variable.


2. Case 2 answered Yes to hlth1rec and hlth2rec but No to the remaining variables, and is therefore given a value of 2 for the computed variable.
3. Case 3 answered No to all questions and is given a score of 0 for hlthall.
Table 4 The SPSS Compute command

SPSS command/action, with comment

1   From the menu select Transform/Compute
    Comment: This brings up the Compute Variable dialog box
2   In the area headed Target Variable give the new variable a short name, such as hlthall
3   Click on the Type&Label button
    Comment: This will allow you to give the new variable a variable label
4   In the area next to Label type Health1-9 Total Yes Responses
    Comment: This provides the new variable with an explanatory label
5   Click Continue
    Comment: This returns you to the Compute Variable dialog box
6   In the area headed Functions scroll down and select Sum(?,?)
    Comment: This function will add up the items pasted in between the function's brackets
7   Click on the arrow (paste) button
    Comment: This pastes the selected function into the Numeric Expression area
8   Highlight over ?,? and from the source variable list select Ill Enough to Go to Doctor (hlth1rec)
9   Click on the arrow (paste) button
    Comment: This pastes hlth1rec into the function
10  Type ,
11  Select Counselling for Mental Problems (hlth2rec)
12  Click on the arrow (paste) button
    Comment: This pastes hlth2rec into the function
13  Type ,
14  Continue selecting and pasting the remaining hlth3rec-hlth9rec variables
15  Click OK
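The logic that the Sum function applies to each case can be sketched as follows. This is Python, not SPSS, and it assumes the behavior described in the text: values coded as missing are left out of the sum, and a case whose nine values are all missing receives a missing result (represented here by `None`). The name `count_yes` and the three example cases are hypothetical stand-ins for the cases discussed above.

```python
# Recoded values treated as missing (No Answer Provided, Don't know, Not applicable).
MISSING_CODES = {7, 8, 9}


def count_yes(recoded_row):
    """Sum the valid recoded values for one case; None if every value is missing."""
    valid = [v for v in recoded_row if v not in MISSING_CODES]
    if not valid:
        return None  # stands in for a missing value in the new variable
    return sum(valid)


case1 = [7] * 9                      # No Answer Provided on every item
case2 = [1, 1, 0, 0, 0, 0, 0, 0, 0]  # Yes to the first two items, No to the rest
case3 = [0] * 9                      # No to every item

print(count_yes(case1))  # None
print(count_yes(case2))  # 2
print(count_yes(case3))  # 0
```

Because No is coded 0 and Yes is coded 1 after recoding, the sum of the valid values is exactly the number of Yes responses.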

Figure 2 The SPSS Compute Variable and Compute Variable: Type and Label dialog boxes

Figure 3 The Data Editor with the computed variable


Having checked to see that the compute command has indeed worked the way we want it to work, it is important to save the file so that the new computed variable is locked into the data set and does not need to be recomputed each time we enter the file. This variable will now appear, like all the other variables, in dialog boxes in the source variable list from which we choose variables to analyze. A simple frequency table for this variable will produce the following results (Figure 4).

Figure 4 Frequency table for the computed variable

We can see that 300 cases did not answer Yes to any of the health variables, indicating that (according to this measure) they had no health problems. At the other end of the scale, we can see that no cases answered Yes to all health related questions. In fact the highest number of items to which a Yes was given is 6. We could use this variable for more elaborate analysis, for example by calculating the mean number of health related problems suffered by women as compared to men, or by correlating the number of health related problems with age.

Computing new variables using functions

In the following example we will illustrate how the Compute command can involve more elaborate transformations of existing data than simply summing a set of existing scores. We will use the Employee data.sav file that comes with the SPSS program, and which is included on the CD that comes with this text. When you open this file you will notice that one variable, bdate, contains the birth date for each case (Figure 5). The data entered for this variable are not the usual numeric type, but are instead defined as Date format. The Date format allows us to enter dates in different ways, such as month and year or, as in this case, month, day, and year.

Figure 5 The bdate variable


Knowing a person's birth date should allow us to determine each person's age at a specified date, and this age variable could then be used in further analysis. We therefore need to compute each person's age by calculating the difference between their birth date and some other specified date, such as the date of the survey, which we will take for the purposes of illustration to be 30 June 1995. Unfortunately, the date of the survey is not in the data set, so we need to create a new variable that will store this information. We follow the procedures for defining variables detailed in Chapter 2 of the text (p. 22), but in the variable Type column of the Variable View we select the radio button next to Date and then select the specific date format we want. In this instance we choose mm/dd/yy, which is the same format as the other date variable we are working with. We then click on Continue and OK to return to the Data Editor window (Figure 6).

Figure 6 The Define Variable Type dialog box

Now that we have defined the survey date variable, we need to enter the survey date (which we are assuming to be 30 June 1995) into each row. This may seem like a tedious process, entering 06/30/95 474 times. But there is a short-cut. We enter 06/30/95 into the first row (Figure 7).

Figure 7 Entering the survey date into the first row

We then (Figure 8): click on this first cell, select Edit/Copy, click on the variable name so that the whole column is highlighted, and select Edit/Paste.


Figure 8 Copying a cell into a whole column

Now that we have taken care of the preliminary step of adding this new variable, we can use it in the Compute command to generate a variable that will contain the age in years for each respondent (Table 5, Figure 9).
Table 5 Computing respondents' age in years

SPSS command/action, with comment

1   From the menu select Transform/Compute
    Comment: This brings up the Compute Variable dialog box
2   In the area headed Target Variable give the new variable a short descriptive name such as age
3   In the area headed Functions scroll down and select CTIME.DAYS(timevalue)
    Comment: This function calculates the number of days between two date format variables
4   Click on the arrow (paste) button
    Comment: This pastes the selected function into the Numeric Expression area
5   Highlight over timevalue and from the source variable list select srvydate
6   Click on the arrow (paste) button
    Comment: This pastes srvydate into the function
7   Type - (a minus sign)
8   From the source variable list select Date of Birth(bdate)
9   Click on the arrow (paste) button
    Comment: This pastes Date of Birth(bdate) into the function
10  Type /365.25
    Comment: This divides the number of days between the two dates by 365.25, which calculates the difference in years
11  Click OK
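The calculation these steps set up (days between the survey date and the birth date, divided by 365.25) can be sketched in Python. This is only an illustration of the arithmetic: it uses Python's own date subtraction rather than the SPSS CTIME.DAYS function, and the function name `age_in_years` and the birth date shown are hypothetical.

```python
from datetime import date

# Survey date assumed in the text for illustration: 30 June 1995.
SURVEY_DATE = date(1995, 6, 30)


def age_in_years(birth_date, survey_date=SURVEY_DATE):
    """Days between the two dates divided by 365.25, as in the steps above."""
    days_between = (survey_date - birth_date).days
    return days_between / 365.25


# Hypothetical birth date, used only to demonstrate the calculation.
example_birth_date = date(1952, 2, 3)
print(round(age_in_years(example_birth_date), 2))
```

Dividing by 365.25 rather than 365 allows for leap years on average, so the result is an age in years with a decimal fraction rather than a whole number of birthdays.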


Figure 9 The SPSS Compute Variable dialog box

The result of this command will be that a new column is added to the Data Editor containing the age in years for each case (Figure 10).

Figure 10 The SPSS Data Editor with the computed age variable

If the file is saved this variable will be a permanent element of the data set and will appear with the other variables in dialog boxes.
