
Using the 2008 OFHS Public Use File

A Self-Guided Tutorial
*Stata Version*

Introduction
This tutorial is intended for persons who wish to use the
2008 Ohio Family Health Survey (OFHS) Public Use File (PUF).
The PUF excludes any information that could, intentionally
or unintentionally, identify a respondent.
Geographic information below the county level has been
removed.
The dataset is a record of the responses to the survey
questions at the respondent level.
This version of the dataset is in Stata format and requires
Stata, statistical analysis software from StataCorp.
The dataset is also available for SAS and SPSS. There
is a separate tutorial for SAS users.

Stata Users
Prerequisites
User has Stata Release 9 or higher.
User has experience writing Stata programs.
User has an understanding of basic statistics,
including analysis of univariate data using
nominal and ordinal level variables.
User is comfortable with statistical terms such
as proportions, standard error, confidence
level, and confidence interval.

OFHS Background
The 2008 OFHS is the largest state-sponsored
health survey in the U.S.
Previous surveys were completed in 1998
and 2004.
The survey had a sample size of 50,993.
The survey was stratified to have enough
respondents to do some analysis for each
county in the state.

Documents that you may download before you get started:
OFHS Questionnaire
OFHS Codebook
These documents are available on the
OFHS web site.
http://grc.osu.edu/ofhs
Look on the Downloads page.

What you need to know about the survey.

Survey Design
Survey Questions
Imputation of Missing Values
Weighting of Responses
Constructed Variables

Survey Design
The survey is a stratified random sample of
Ohio's non-institutional population, conducted
through telephone interviews:
Land Lines (49,000 respondents)
Cell Phone (2,000 respondents)

Random digit dialing (land lines) within the exchange numbers associated with each county.
Exchanges are the first 3 digits of a seven-digit phone number.
The last four digits within each exchange are randomly selected.

Survey Design
Cell Phones
Exchanges are at the state level.

Oversamples
African Americans: some exchanges in the 6 largest urban counties have a
higher proportion of African Americans in the population. These
exchanges were sampled at a higher rate.
Asians and Hispanics: the survey was supplemented with lists of
persons with Hispanic or Asian surnames.

Household clusters
Each household/family forms a cluster within the sample.
One adult and one child are randomly selected within the family.
Each response includes information on the adult and the child (if there
are any children).
The adult who is most knowledgeable about the child's health responds
for the child.

Survey Design
The population of persons within each of the
strata (State, County, telephone exchange,
household, etc.) is already known or is collected
as a part of the survey.
A weight is established for each child and adult
which reflects the inverse of the probability of
being selected for the survey.
Indicators of the strata and the weights are used
in the Stata programs. We will come back to
this later.
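As a small worked illustration (the selection probability here is invented for the arithmetic, not taken from the OFHS): if person $i$ had probability $\pi_i$ of being selected, the weight is its inverse, so a person with a 1 in 250 chance of selection represents 250 people in the population.

```latex
w_i = \frac{1}{\pi_i}, \qquad \pi_i = \frac{1}{250} \;\Rightarrow\; w_i = 250
```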

Survey Questions
In the survey questionnaire there are
different kinds of questions. They include:
Questions that help to establish the weights for the
survey.
How many children are in the family?
How many phone numbers are in the home?

Survey Questions
Questions that identify the demographic and
socioeconomic characteristics of the
individuals and the family.
Age, gender, race, ethnicity.
Family income, employment, occupation.
Education

Survey Questions
Questions that identify the insurance status of the
adult and child respondents.
Source of Coverage (Job based, Medicare,
Medicaid, etc.)
If no insurance, the length of time without
insurance.
Difficulty in getting insurance.
Types of Coverage (dental, prescriptions, vision,
mental health)

Survey Questions
Health Status of Adult and Child
General health status
Chronic health conditions
Special Health Care needs
Functional disability
Height and weight

Survey Questions
Health Care Access, Utilization,
Satisfaction and Unmet needs.
Usual source of care
Care coordination
Specialists
Emergency room use
Hospitalizations
Types of unmet needs.

Survey Questions
Questions are at multiple levels.
Anchor Questions are questions that are
asked of everyone.
Qualifying Questions are questions that help
to narrow down who should be responding to
an in-depth question.
In-depth questions probe the dimensions of
the respondent's experience with a particular
phenomenon.

Example of Question levels

Anchor Question

D43. //Have you/Has person in S1// ever been told by a doctor or any other
health professional that //you/he// had diabetes or sugar diabetes?
   01  YES
   02  NO (Skip to D45)
   03  [VOLUNTEERED:] BORDERLINE
   98  DK
   99  REFUSED

D43a. //Have you/Has person in S1// ever been told by a doctor or any other
health professional that //you/he/she// had TYPE 1 CHILD ONSET
DIABETES or TYPE 2 ADULT ONSET, DIABETES?
[INTERVIEWER NOTE: PROBE FOR TYPE, AND IF RESPONDENT SAYS
BORDERLINE CODE AS 03]
   //Display response option 97, only if S15 = 02, 99.//
   97  [VOLUNTEERED:] YES, GESTATIONAL OR ONLY WHEN PREGNANT MENTIONED (Skip to D45)
   01  YES - TYPE I (JUVENILE)
   02  YES - TYPE II (ADULT ONSET)
   03  [VOLUNTEERED:] BORDERLINE DIAGNOSIS ONLY
   04  NO, NEVER DIAGNOSED WITH DIABETES (Skip to D45)
   98  DK (Skip to D45)
   99  REFUSED (Skip to D45)

Example of Question levels

Qualifying Question

D43b. //If (s15 = 02) then ask://
//Was your/Was person in S1's// DIABETES only during a time
associated with a pregnancy? [INTERVIEWER: PROBE FOR PROPER CODE]
   01  YES ONLY WHEN PREGNANT (Skip to D45)
   02  NO
   98  DK (Skip to D45)
   99  REFUSED (Skip to D45)

In-Depth Question

D44. //Is your/Is person in S1's// blood sugar or
glucose level, which affects diabetes, USUALLY under
control or where a physician wants it, even if medication
is required: Always, Usually, Sometimes, Rarely, or Never?
   01  ALWAYS
   02  USUALLY
   03  SOMETIMES
   04  RARELY
   05  NEVER
   98  DK
   99  REFUSED

Question levels
Notice that the example includes instructions to skip
to another question if the answer is no.
The anchor and qualifying questions screen respondents
out of the in-depth questions.
As a result, when a question is not asked of a
respondent, the respondent has a missing value for it
that is MISSING BY DESIGN.

Missing Values
Some data are missing in the survey
because the respondent refused to
answer the question or did not know the
answer.
These kinds of missing values need to be
treated differently than those that are
missing by design.
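One way to keep the two kinds of missing values apart in Stata is to recode them to different extended missing values (.a, .b, ...). The sketch below uses made-up data; the names d43 and d44 and the raw codes (02 = NO, 98 = DK, 99 = REFUSED, taken from the questionnaire excerpt above) are assumptions, so check the OFHS Codebook for how the PUF actually stores skipped and nonresponse values.

```stata
* Toy data: d43 and d44 are hypothetical stand-ins for the OFHS items
clear
input byte d43 byte d44
1  1
1 98
2  .
1 99
end

* Copy the raw item, then recode the two kinds of missingness separately
generate byte d44_clean = d44
* D43 = 02 (NO) skips D44: missing by design
replace d44_clean = .a if d43 == 2
* 98 = DK, 99 = REFUSED: item nonresponse
replace d44_clean = .b if inlist(d44, 98, 99)

* Label the extended missing values so tables show why a value is missing
label define missreason .a "Skipped (by design)" .b "DK/Refused"
label values d44_clean missreason
tabulate d44_clean, missing
```

Both .a and .b are dropped from estimates like any other missing value, but tabulate with the missing option still reports them separately. Leaving 98 and 99 as ordinary numbers would let them be averaged in as if they were real answers.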

Missing Values
There are some types of questions which
are very important to the survey design or
for public policy issues, for which it is not
acceptable to have values missing.
These include questions like:
Number of children in the family (design)
Family Income (public policy)

Imputation of Missing Values

Where it is important for the survey not to have any
missing values, the survey statisticians have replaced
each missing value with a value imputed from other
respondents who answered the rest of the survey in a
similar way.
Survey statisticians use sophisticated models and
processes to do imputation, and the practice is well
accepted.
When using this survey for analysis, you are expected
to choose the form of the variable that includes the
imputed values.
These variables are labeled and typically have the suffix
_imp.
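On the real file, ds *_imp lists every imputed variable in memory. The sketch below runs on toy data so it can be tried on its own; h87 stands in for a hypothetical raw (non-imputed) counterpart of h87_imp and may not exist under that name in the PUF.

```stata
* Toy data: h87 is a hypothetical raw item, h87_imp its imputed version
clear
input byte h87 byte h87_imp
3 3
. 5
2 2
end

* List every imputed variable in memory by its _imp suffix
ds *_imp

* The raw item has a gap; the imputed item does not
count if missing(h87)
count if missing(h87_imp)
```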

Weighting
Weights for each adult and child response,
which reflect the inverse of the probability
of being selected for the survey, have been
constructed and should be used in all
analyses.
When the weights are used, the results
accurately reflect the entire population.
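The standard weighted estimator of a proportion shows how this works (general survey notation, not a formula taken from the OFHS documentation). For a yes/no item coded $y_i = 1$ for yes and $0$ for no, with weight $w_i$:

```latex
\hat{p} = \frac{\sum_i w_i\, y_i}{\sum_i w_i}
```

Respondents from oversampled groups have higher selection probabilities and therefore smaller weights, so their larger numbers in the sample do not inflate their share of the estimate.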

Weighting
If the weights for children in the OFHS
were summed up across all responses,
the total would be equal to the child
population of Ohio. The same is true of
the adult weights.
The variable name for the adult weight is
wt_a.
The variable name for the child weight is
wt_c.
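The toy example below, with made-up weights rather than OFHS values, shows both properties: the weights sum to the population they represent, and a weighted share can differ from the unweighted one. summarize gives point estimates only; for standard errors and confidence intervals, use the svy commands described later.

```stata
* Toy data: four children with made-up weights (not OFHS values)
clear
input int masterid double wt_c byte uninsured
1 1000 0
2 4000 1
3 2500 0
4 2500 0
end

* The weights add up to the number of children the sample represents
summarize wt_c, meanonly
display "Children represented: " r(sum)

* Unweighted share uninsured: 1 of 4 children
summarize uninsured, meanonly
display "Unweighted: " r(mean)

* Weighted share uninsured: 4000 / 10000
summarize uninsured [aweight=wt_c], meanonly
display "Weighted: " r(mean)
```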

Constructed Variables
There are many variables in the OFHS file
that are constructed from the responses to
the survey questions to make the OFHS easier
to use. These variables include:
BMI: body mass index, an indicator of adult
and child obesity constructed from height and
weight. The formula is complicated, especially
for children. We make it easier for the user to
analyze obesity by pre-calculating it.
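For adults, the standard BMI calculation is weight in pounds divided by height in inches squared, times 703 (equivalently, kilograms divided by meters squared). The sketch below assumes hypothetical variables ht_in and wt_lb; the OFHS file stores its own pre-calculated BMI variable, whose name is given in the Codebook. Children's BMI is interpreted against age- and sex-specific growth-chart percentiles, which this formula alone does not provide.

```stata
* Toy data: hypothetical height (inches) and weight (pounds) variables
clear
input double ht_in double wt_lb
70 180
64 130
end

* Adult BMI = 703 * weight (lb) / height (in)^2
generate double bmi = 703 * wt_lb / (ht_in^2)
format bmi %5.1f
list
```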

Constructed Variables
Insurance Type: In many instances,
respondents to the survey had more than one
source of insurance. For example, many
seniors have insurance from both their private
pension plans and Medicare. To create an
unduplicated count of the population by
insurance status, we have created a variable
that imposes a hierarchy of insurance sources
to classify the population.
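A minimal sketch of how such a hierarchy can be built in Stata: each person starts in the lowest category and is overwritten by each higher-priority rule that applies, so everyone ends up in exactly one category. The 0/1 source flags are hypothetical, the category numbers and labels follow the tabulations later in this tutorial, and the priority order is an assumption for illustration, not the rule used to build the OFHS variable. Category 7 (Insured Type Unknown) is left out for simplicity.

```stata
* Toy data: hypothetical 0/1 flags for each reported source of coverage
clear
input byte medicaid byte medicare byte jobbased byte direct byte other
1 1 0 0 0
0 1 1 0 0
0 0 1 1 0
0 0 0 0 0
end

* Start everyone as uninsured, then apply rules from lowest to highest
* priority; the last rule that applies wins (priority order assumed)
generate byte ins_type = 8
replace ins_type = 6 if other == 1
replace ins_type = 5 if direct == 1
replace ins_type = 4 if jobbased == 1
replace ins_type = 3 if medicare == 1 & medicaid == 0
replace ins_type = 2 if medicaid == 1 & medicare == 0
replace ins_type = 1 if medicaid == 1 & medicare == 1

label define instype 1 "Medicaid & Medicare" 2 "Medicaid, No Medicare" 3 "Medicare, No Medicaid"
label define instype 4 "Job-based Coverage" 5 "Directly Purchased" 6 "Other" 8 "Uninsured", add
label values ins_type instype
list, noobs
```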

Using Stata with the OFHS

Step 1. Download and unzip the Stata dataset.
Step 2. Open the dataset in Stata.
Step 3. Set survey design parameters in Stata.
Step 4. Build and run your first OFHS Stata program.

Download and Unzip the Stata Dataset

You will find the OFHS Public Use Dataset at:
http://grc.osu.edu/ofhs/datadownloads/index.htm

Right-click the file name and select "Save Target As."
Save the ZIP file to the directory where you will
store the data (for example, c:\statadata\ofhs2008).
After the file has been saved, run WinZip, saving
the unzipped file to the same directory.

Setting survey design parameters

After you open the data in Stata, you have to set the
survey design parameters before running any analyses.
To do this, type the following command in the Stata
Command window, all on one line. (Note: You will have
to do this EVERY time you open the data.)
In these commands, masterid is declared as the primary
sampling unit (each household is a cluster), stratum
identifies the sampling stratum, and wt_a or wt_c is the weight.
If conducting analyses on adults:
svyset masterid [pweight=wt_a], strata(stratum) singleunit(certainty) vce(linearized)

If conducting analyses on the child population:
svyset masterid [pweight=wt_c], strata(stratum) singleunit(certainty) vce(linearized)
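To check that the design has been declared, type svyset with no arguments to redisplay the settings, and svydescribe to summarize the strata and sampling units. The sketch below runs on a six-household toy dataset instead of the PUF; on the real file, svydescribe reports the actual numbers of strata and households.

```stata
* Toy data standing in for the PUF: 6 households in 2 strata
clear
input int masterid byte stratum double wt_a
1 1 150
2 1 200
3 1 175
4 2 120
5 2 180
6 2 160
end

* Same design declaration as for the adult analyses above
svyset masterid [pweight=wt_a], strata(stratum) singleunit(certainty) vce(linearized)

* svyset alone redisplays the current settings;
* svydescribe summarizes strata and sampling units
svyset
svydescribe
```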

Build and run your first OFHS Stata Program

You should only use Stata commands that support
complex survey designs, including:
svy: mean (estimates means)
svy: proportion (estimates proportions)
svy: tabulate (provides tables)
A detailed list of commands that support complex
survey designs can be found by going to the Help
menu in Stata (in the toolbar), choosing Stata
Command, and typing svy estimation.
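A minimal sketch of svy: mean and svy: proportion on simulated data. The variables age and smoker are hypothetical stand-ins, not OFHS names. runiform() is not available in the oldest releases (Stata 9 used uniform()).

```stata
* Simulated data; age and smoker are hypothetical variables
clear
set seed 2008
set obs 200
generate int masterid = _n
generate byte stratum = 1 + mod(_n, 4)
generate double wt_a = 100 + 400*runiform()
generate double age = 18 + floor(60*runiform())
generate byte smoker = runiform() < 0.25

svyset masterid [pweight=wt_a], strata(stratum) singleunit(certainty) vce(linearized)

* Weighted mean of a continuous variable, with a linearized standard error
svy: mean age
* Weighted proportion in each category of a categorical variable
svy: proportion smoker
```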

svy: tabulate
Here is a simple program that calculates the percent of children by insurance type,
with a standard error and a 95% confidence interval around each estimate.
Note that you have already entered all of the sampling design parameters (at the
beginning of your session).
Remember that to calculate any adult variables, you will have to re-enter your design
parameters using the adult svyset command given under "Setting survey design parameters."

svy: tab i_type_c, se ci
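If you would rather have Stata print percentages than proportions, svy: tabulate accepts the percent and format() options. A sketch on simulated data (the category values are random, not OFHS results):

```stata
* Simulated data; i_type_c here holds random categories 1-8
clear
set seed 2008
set obs 300
generate int masterid = _n
generate byte stratum = 1 + mod(_n, 3)
generate double wt_c = 50 + 300*runiform()
generate byte i_type_c = 1 + floor(8*runiform())

svyset masterid [pweight=wt_c], strata(stratum) singleunit(certainty) vce(linearized)

* percent reports percentages instead of proportions; format() sets the decimals
svy: tabulate i_type_c, se ci percent format(%6.2f)
```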

svy: tab results
(with a little cutting and pasting and formatting of values)

Child Insurance Type         Proportion   Std. Error   95% CI Lower   95% CI Upper
1: Medicaid & Medicare            1.94%        0.17%          1.64%          2.30%
2: Medicaid, No Medicare         30.92%        0.55%         29.84%         32.01%
3: Medicare, No Medicaid          0.64%        0.09%          0.50%          0.83%
4: Job-based Coverage            53.29%        0.57%         52.16%         54.42%
5: Directly Purchased             2.55%        0.18%          2.22%          2.93%
6: Other                          0.63%        0.09%          0.47%          0.84%
7: Insured Type Unknown           5.99%        0.29%          5.45%          6.57%
8: Uninsured                      4.04%        0.21%          3.65%          4.48%
Total                           100.00%

svy: tabulate
Now you might add some domain analysis to this,
breaking out insurance status for children by poverty level.

* poverty200 = 1 if h87_imp <= 4 (FPL<201%), 0 if h87_imp > 4 (FPL>=201%)
* Stata treats missing values as larger than any number, so exclude them explicitly
generate poverty200=.
replace poverty200=0 if h87_imp>4 & !missing(h87_imp)
replace poverty200=1 if h87_imp<=4
svy: tab i_type_c if poverty200==0, se ci
svy: tab i_type_c if poverty200==1, se ci
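Stata's survey documentation recommends the subpop() option over if for domain analysis: if drops observations before the variances are computed, so standard errors may be incorrect, while subpop() keeps the full design. A sketch on simulated data follows; the mapping of h87_imp values 1-4 to FPL<201% is taken from the code and table titles in this tutorial, and the random values are not OFHS results.

```stata
* Simulated data; values are random, coding of h87_imp as in the example above
clear
set seed 2008
set obs 400
generate int masterid = _n
generate byte stratum = 1 + mod(_n, 4)
generate double wt_c = 50 + 300*runiform()
generate byte h87_imp = 1 + floor(8*runiform())
generate byte i_type_c = 1 + floor(8*runiform())
generate byte poverty200 = (h87_imp <= 4) if !missing(h87_imp)

svyset masterid [pweight=wt_c], strata(stratum) singleunit(certainty) vce(linearized)

* Domain estimates that keep the full design for variance estimation
svy, subpop(if poverty200 == 0): tabulate i_type_c, se ci
svy, subpop(poverty200): tabulate i_type_c, se ci

* Or both domains side by side as column proportions
svy: tabulate i_type_c poverty200, column se ci
```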

svy: tabulate with an if statement

Child Insurance Type if FPL>=201%

Child Insurance Type         Proportion   Std. Error   95% CI Lower   95% CI Upper
1: Medicaid & Medicare            0.43%        0.09%          0.29%          0.65%
2: Medicaid, No Medicare          7.64%        0.45%          6.80%          8.57%
3: Medicare, No Medicaid          0.57%        0.11%          0.39%          0.84%
4: Job-based Coverage            80.40%        0.63%         79.14%         81.60%
5: Directly Purchased             3.48%        0.29%          2.95%          4.10%
6: Other                          0.64%        0.13%          0.43%          0.94%
7: Insured Type Unknown           4.56%        0.34%          3.95%          5.27%
8: Uninsured                      2.28%        0.20%          1.91%          2.72%
Total                           100.00%

Child Insurance Type if FPL<201%

Child Insurance Type         Proportion   Std. Error   95% CI Lower   95% CI Upper
1: Medicaid & Medicare            3.77%        0.35%          3.14%          4.51%
2: Medicaid, No Medicare         58.93%        0.86%         57.23%         60.61%
3: Medicare, No Medicaid          0.74%        0.13%          0.52%          1.04%
4: Job-based Coverage            20.67%        0.68%         19.37%         22.03%
5: Directly Purchased             1.42%        0.19%          1.10%          1.84%
6: Other                          0.62%        0.14%          0.40%          0.96%
7: Insured Type Unknown           7.69%        0.48%             0%          8.70%
8: Uninsured                      6.16%        0.40%          5.43%          6.98%
Total                           100.00%

The END
