PH6420 Fall 2015: Assignment 3

Due: October 26, 2015

Overview
Write one program for Parts A and B and one program for Part C. When creating new variables, make
sure that you account for possible missing data in the original variable.
Submit the 2 SAS programs answering all questions. You may include the output if it helps you
answer the questions, but it is not required.
PART A:
1. Create a SAS dataset that reads in the variables age, sex, income, educ, hdlbl, potassbl,
and potass12 from tomhs.dat.

2. Within the data step, create a new variable called agecat that divides age into 5 categories:
45-49; 50-54; 55-59; 60-64; 65-69. Assign values of 1-5 to the categories.

3. Create a new variable called income40 equal to 0 if income is less than $40,000 and
equal to 1 if income is $40,000 or more. Create a new variable called collgrad equal to 0
if the participant did not graduate from college and equal to 1 if the participant graduated
from college. (See the study forms for the income and education categories.)

4. Create a new variable that is the smaller of the two serum potassium values (variables
potassbl and potass12).

5. Run PROC MEANS for all variables to help verify that you read in and created the new
variables correctly. (For example, income and income40 should have the same number of
valid observations.) What percentage of participants graduated from college? What
percentage of participants have incomes of $40,000 or more?

6. Using PROC FREQ, display the 2 by 2 table of income40 and collgrad. Are college
graduates more likely than non-graduates to make $40,000 or above? Give summary
statistics to justify your answer. (You may add the CHISQ option to the TABLES statement if
you want to statistically test whether the two variables are related.)

7. Display the frequency distribution of the smaller potassium value. What percentage of
participants had a serum potassium less than 3.5?

8. Most studies show that women have lower rates of heart disease than men. One proposed
mechanism for this lower rate is that women have higher HDL cholesterol than men. Run
a SAS procedure that addresses this question using the TOMHS data. Do women tend to
have higher HDL than men?
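
The missing-data requirement in items 2 and 4 can be sketched as follows. This is a minimal illustration, not the expected solution: the layout of tomhs.dat is not given here, so the sketch reads a few made-up inline values instead, and the dataset name demo and the variable name minpotass are hypothetical.

```sas
/* Made-up values for illustration only, not TOMHS data */
data demo;
  input age potassbl potass12;

  /* agecat stays missing when age is missing or outside 45-69,
     because a missing value never satisfies any of these ranges */
  if      45 <= age <= 49 then agecat = 1;
  else if 50 <= age <= 54 then agecat = 2;
  else if 55 <= age <= 59 then agecat = 3;
  else if 60 <= age <= 64 then agecat = 4;
  else if 65 <= age <= 69 then agecat = 5;

  /* MIN ignores missing arguments and returns missing
     only when both values are missing */
  minpotass = min(potassbl, potass12);

  datalines;
47 4.1 3.9
52 .   4.4
68 3.4 .
.  4.0 4.2
;
run;

/* N and NMISS make it easy to compare valid counts across variables */
proc means data=demo n nmiss mean min max;
run;
```

Writing the ranges with explicit lower bounds matters because SAS treats a missing numeric value as smaller than any number, so a test such as age < 50 on its own would place missing ages in category 1.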

PART B:

1. Add to your program in Part A syntax to create suitable formats (use PROC FORMAT)
for sex, income and agecat. (This can be put at the top of your program or after the data
step.)

2. Add suitable labels for each variable read in and each new variable. This can be put in the
data step.

3. Use PROC FREQ to obtain a frequency distribution of sex, income and agecat. Apply
the formats created so that the formatted values are displayed rather than the numeric
values. What percentage of subjects are 65 years or older?
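
The format, label, and PROC FREQ steps above fit together roughly as in the sketch below. It is an illustration under stated assumptions: the inline values are made up, the format names sexf and agecatf are hypothetical, and the coding 1 = Men, 2 = Women is assumed and should be checked against the study forms. Income is left out because its categories are defined only on the study forms.

```sas
proc format;
  /* Assumed coding for sex; verify against the study forms */
  value sexf    1 = 'Men'
                2 = 'Women';
  value agecatf 1 = '45-49'
                2 = '50-54'
                3 = '55-59'
                4 = '60-64'
                5 = '65-69';
run;

/* Made-up values for illustration only, not TOMHS data */
data demo;
  input sex agecat;
  label sex    = 'Sex'
        agecat = 'Age category (years)';
  datalines;
1 1
2 3
1 5
2 5
;
run;

/* The FORMAT statement makes the tables show labels instead of codes */
proc freq data=demo;
  tables sex agecat;
  format sex sexf. agecat agecatf.;
run;
```

Format names may not end in a digit, which is why the sketch uses names such as agecatf rather than something like agecat5.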

PART C:
The following observations are made-up data from four hospital stays. The data are comma-delimited,
with the variables: patient ID, date of birth of patient, date of hospital admission, date
of hospital discharge, and total cost of stay in dollars.
001,10/21/1946,12/12/2004,12/14/2004,8000
002,05/01/1980,07/08/2004,08/08/2004,12000
003,01/01/1960,01/01/2004,01/04/2004,9000
004,06/23/1998,11/11/2004,12/25/2004,15123
1. Create a SAS dataset that reads in the data. Call the variables id, dob, admit, dischrg,
and fee. Read in id as a character variable. You may type the data within the program or
read it in from the data file hosp.csv on the class website. Note: To input the date
variables you will need to use colon modifiers on the INPUT statement (see Lecture 2).

2. Compute a variable called staydays, which is the number of days spent in the hospital.

3. Create a variable called age that is the age of the patient in years at time of admission.

4. Create a variable called costperday that is the average cost per day.

5. Use PROC PRINT to display all the variables. Provide date formats for each date
variable.
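
One possible shape for the Part C program is sketched below, using the four records listed above typed inline. Several choices are assumptions rather than requirements of the assignment: the dataset name hosp, the MMDDYY10. and DOLLAR10.2 formats, counting staydays as discharge date minus admission date (not counting the admission day itself), and approximating age with an average year length of 365.25 days.

```sas
data hosp;
  /* DSD reads the comma-delimited records */
  infile datalines dsd;

  /* Colon modifiers apply an informat while still stopping at the delimiter */
  input id :$3. dob :mmddyy10. admit :mmddyy10. dischrg :mmddyy10. fee;

  /* SAS dates are stored as day counts, so subtraction gives elapsed days */
  staydays = dischrg - admit;

  /* Approximate age in whole years at admission (assumed 365.25-day year) */
  age = floor((admit - dob) / 365.25);

  /* Leave costperday missing for missing or zero-length stays */
  if staydays > 0 then costperday = fee / staydays;

  format dob admit dischrg mmddyy10. costperday dollar10.2;

  datalines;
001,10/21/1946,12/12/2004,12/14/2004,8000
002,05/01/1980,07/08/2004,08/08/2004,12000
003,01/01/1960,01/01/2004,01/04/2004,9000
004,06/23/1998,11/11/2004,12/25/2004,15123
;
run;

proc print data=hosp;
run;
```

To read from hosp.csv instead, the INFILE statement would name the file rather than DATALINES; the path depends on where the file is saved. If a stay should count both the admission and discharge days, add 1 to staydays.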