Introduction
This tutorial is intended for persons who wish to use the
2008 OFHS Public Use File (PUF).
The PUFs exclude any information that could, intentionally
or unintentionally, identify a respondent.
Geographic information below the county level has been
removed.
The dataset is a record of the responses to the survey
questions at the respondent level.
The dataset is in a format that requires the use of STATA, a
statistical analysis software from StataCorp.
The dataset is also available for SAS and SPSS. There
is a separate tutorial for SAS users.
STATA Users
Prerequisites
User has STATA Release 9 or Higher.
User has experience writing STATA programs.
User has an understanding of basic statistics,
including analysis of univariate data using
nominal and ordinal level variables.
User is comfortable with statistical terms such
as proportions, standard error, confidence
level, and confidence interval.
OFHS Background
The 2008 OFHS is the largest state-sponsored
health survey in the U.S.
Previous surveys were completed in 1998
and 2004.
The survey had a sample size of 50,993.
The survey was stratified to have enough
respondents to do some analysis for each
county in the state.
Survey Design
Survey Questions
Imputation of Missing Values
Weighting of Responses
Constructed Variables
Survey Design
The survey is a stratified random sample of
Ohio's non-institutional population.
Conducted through telephone interviews.
Land Lines (49,000 respondents)
Cell Phone (2,000 respondents)
Survey Design
Cell Phones
Exchanges are at state level.
Over Samples
African Americans - Some exchanges in the 6 largest urban counties
have a higher proportion of African Americans in the population. These
exchanges were sampled at a higher rate.
Asians and Hispanics - The survey was supplemented with lists of
persons with Hispanic or Asian surnames.
Household clusters
Each household/family forms a cluster within the sample.
One adult and one child are randomly selected within the family.
Each response includes information on the adult, and the child (if there
are any children).
The adult who is most knowledgeable about the child's health responds
for the child.
Survey Design
The population of persons within each of the
strata (State, County, telephone exchange,
household, etc.) is already known or is collected
as a part of the survey.
A weight is established for each child and adult
which reflects the inverse of the probability of
being selected for the survey.
Indicators of the strata and the weights are used
in the STATA programs. We will come back to
this later on.
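As a minimal sketch of how the strata and weights enter a STATA program, the design is declared once with svyset and is then used by every svy: command. The toy data and the names stratum, hh_id and insured are assumptions for illustration (only wt_c appears in this tutorial); the actual OFHS design variable names are not shown here.

```stata
* Toy data; stratum, hh_id and insured are hypothetical names
clear
input stratum hh_id wt_c insured
1 1 120 1
1 2 120 0
1 3 120 1
2 4 300 1
2 5 300 1
2 6 300 0
end

* Declare households as sampling units, the child weight, and the strata
svyset hh_id [pweight = wt_c], strata(stratum)

* Any svy: command now uses the declared design
svy: mean insured
```

Declaring the design once keeps the weight and strata out of each individual analysis command.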
Survey Questions
In the survey questionnaire there are
different kinds of questions. They include:
Qs that help to establish the weights for the
survey.
How many children are in the family?
How many phone numbers are in the home?
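To see why these answers matter for weighting, here is a hedged sketch of an inverse-probability weight. It assumes, purely for illustration, that a household's chance of selection is proportional to its number of phone numbers and that one child is drawn at random from the children in the family. The sampling rate and all variable names are hypothetical, and the actual OFHS weight construction is not described here.

```stata
* Hypothetical inputs: phone numbers and children per household
clear
input n_phones n_children
1 2
2 1
1 1
end
scalar base_rate = 0.001

* Chance the household is reached rises with its number of phone numbers
generate double p_household = base_rate * n_phones
* One child is selected at random from the children in the family
generate double p_child = 1 / n_children
* The weight is the inverse of the overall selection probability
generate double wt_child = 1 / (p_household * p_child)
list
```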
Survey Questions
Qs that identify the demographic and
socioeconomic characteristics of the
individuals and the family.
Age, gender, race, ethnicity.
Family income, employment, occupation.
Education
Survey Questions
Qs that identify the insurance status of the
adult and child respondents.
Source of Coverage (Job based, Medicare,
Medicaid, etc.)
If no insurance, the length of time without
insurance.
Difficulty in getting insurance.
Types of Coverage (dental, prescriptions, vision,
mental health)
Survey Questions
Health Status of Adult and Child
Survey Questions
Health Care Access, Utilization,
Satisfaction and Unmet needs.
Usual source of care
Care coordination
Specialists
Emergency room use
Hospitalizations
Types of unmet needs.
Survey Questions
Questions are at multiple levels.
Anchor Questions are questions that are
asked of everyone.
Qualifying Questions are questions that help
to narrow down who should be responding to
an in-depth question.
In-depth questions probe the dimensions of
the respondent's experience with a particular
phenomenon.
D43. //Have you/Has person in S1// ever been told by a doctor or any other
health professional that //you/he// had diabetes or sugar diabetes?
01 YES
02 NO (Skip to D45)
03 [VOLUNTEERED:] BORDERLINE
98 DK
99 REFUSED
D43a. //Have you/Has person in S1// ever been told by a doctor or any other
health professional that //you/he/she// had TYPE 1 CHILD ONSET
DIABETES or TYPE 2 ADULT ONSET DIABETES?
[INTERVIEWER NOTE: PROBE FOR TYPE, AND IF RESPONDENT SAYS
BORDERLINE CODE AS 03]
//Display response option 97 only if S15 = 02, 99.//
97 [VOLUNTEERED:] YES, GESTATIONAL OR ONLY WHEN PREGNANT MENTIONED (Skip to D45)
01
02
03
04
98
99
D43b. //If (s15 = 02) then ask://
//Was your/Was person in S1's// DIABETES only during a time
associated with a pregnancy? [INTERVIEWER: PROBE
FOR PROPER CODE]
01 YES ONLY WHEN PREGNANT (Skip to D45)
02 NO
98 DK (Skip to D45)
99 REFUSED (Skip to D45)
In-Depth Question
D44. //Is your/Is person in S1's// blood sugar or
glucose level, which affects diabetes, USUALLY under
control or where a physician wants it, even if medication
is required: Always, Usually, Sometimes, Rarely, or Never?
01 ALWAYS
02 USUALLY
03 SOMETIMES
04 RARELY
05 NEVER
98 DK
99 REFUSED
Question levels
Notice in the example that there are instructions
to skip to another question if the answer is no.
These are anchor questions and qualifying
questions which are eliminating persons from
answering the in-depth questions.
As a result, when a question is not asked of a
respondent it creates a missing value for the
respondent which is MISSING BY DESIGN.
Missing Values
Some data is missing in the survey
because the respondent refused to
answer the question, or did not know the
answer.
These kinds of missing values need to be
treated differently than those that are
missing by design.
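One way to keep the two kinds of missing values apart in STATA is to use extended missing codes. The sketch below uses toy data and assumes that D43 and D44 are stored in variables named d43 and d44 with the questionnaire's numeric codes. The variable names and the choice of .s, .d and .r are illustrative, not the coding used in the PUF.

```stata
* Toy responses; d43 and d44 are hypothetical variable names
clear
input d43 d44
1 2
1 98
2 .
1 99
2 .
3 4
end

* D43 = 02 skips to D45, so D44 was never asked: missing by design
replace d44 = .s if d43 == 2 & missing(d44)

* Don't know and refused are item nonresponse, kept distinct from skips
mvdecode d44, mv(98=.d \ 99=.r)

* The missing option lists each missing code separately
tabulate d44, missing
```

With the codes separated, respondents who were skipped can be excluded from a denominator without also dropping those who refused or did not know.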
Missing Values
There are some types of questions which
are very important to the survey design or
for public policy issues, for which it is not
acceptable to have values missing.
These include questions like:
Number of children in the family (design)
Family Income (public policy)
Weighting
Weights are constructed for each adult and child
response. Each weight reflects the inverse of the
probability of being selected for the survey, and
the weights should be used in all analysis.
When the weights are used, the results are
representative of the entire population.
Weighting
If the weights for children in the OFHS
were summed up across all responses,
the total would be equal to the child
population of Ohio. The same is true of
the adult weights.
The variable name for the adult weight is
wt_a.
The variable name for the child weight is
wt_c.
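A quick check of this property is to sum the weights. The sketch uses four made-up records, so the total is not a real population figure; on the PUF the same two commands would be run on wt_c or wt_a.

```stata
* Made-up child weights for four records
clear
input wt_c
1500
2200
1800
2500
end

* Each weight is the number of children that one record represents
quietly summarize wt_c
display "Estimated child population: " r(sum)
```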
Constructed Variables
There are many variables in the OFHS file
that are constructed from the responses to
the survey questions that make it easier to
use the OFHS. These variables include:
BMI: Body mass index. BMI is an indicator
of adult and child obesity constructed from
height and weight. The formula is
complicated, especially for children. We
make it easier for the user to do analysis of
obesity by pre-calculating it.
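For reference, the standard adult BMI formula is weight in kilograms divided by height in meters squared, or equivalently 703 times weight in pounds divided by height in inches squared. The sketch below applies it to toy data. The variable names are hypothetical, and the child classification, which also depends on age and sex, is not attempted.

```stata
* Toy heights and weights; variable names are hypothetical
clear
input weight_lb height_in
150 65
200 70
end

* Adult BMI from pounds and inches
generate bmi = 703 * weight_lb / height_in^2
list
```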
Constructed Variables
Insurance Type: In many instances,
respondents to the survey had more than one
source of insurance. For example, many
seniors have insurance from their private
pension plans and Medicare. For the purpose
of creating an unduplicated count of the
population by their insurance status, we have
created a variable which imposes a hierarchy
of insurance sources to classify the
population.
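The idea of a hierarchy can be sketched as follows. The priority order used here (public coverage first, then job-based, directly purchased, other) and the indicator names are assumptions for illustration only, not the actual OFHS rule. The category codes follow the labels shown in the results tables, and persons with both Medicaid and Medicare are left unclassified because the labels for codes 1 and 7 are not available.

```stata
* Toy indicators of each coverage source (hypothetical names)
clear
input has_medicaid has_medicare has_job has_direct has_other
1 0 1 0 0
0 1 1 0 0
0 0 1 1 0
0 0 0 1 0
0 0 0 0 1
0 0 0 0 0
end

* Start at uninsured, then let higher-priority sources override
generate ins_type = 8
replace ins_type = 6 if has_other == 1
replace ins_type = 5 if has_direct == 1
replace ins_type = 4 if has_job == 1
replace ins_type = 3 if has_medicare == 1
replace ins_type = 2 if has_medicaid == 1
* Dual Medicaid and Medicare is not classified in this sketch
replace ins_type = . if has_medicaid == 1 & has_medicare == 1

* Every record falls in at most one category, so counts are unduplicated
tabulate ins_type
```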
svy: tabulate
Here is a simple program which calculates the percent of children by Insurance Type.
It includes a 95% confidence interval around each proportion.
Note that you have already entered all of the sampling design parameters (at the
beginning of your session).
Remember that to calculate any adult variables, you will have to re-enter your design
parameters, using the code provided on slide 28.
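The program itself is not reproduced in this copy. Judging from the domain example later in the tutorial, it is presumably a one-line svy: tabulate call on i_type_c. The sketch below shows such a call on toy data; the design variable names stratum and hh_id are hypothetical, and the svyset line stands in for the design parameters entered at the beginning of the session.

```stata
* Toy data; stratum and hh_id are hypothetical design variables
clear
input stratum hh_id wt_c i_type_c
1 1 100 2
1 2 100 4
1 3 100 4
2 4 200 4
2 5 200 8
2 6 200 2
end
svyset hh_id [pweight = wt_c], strata(stratum)

* Weighted percent of children in each insurance type,
* with standard errors and 95% confidence intervals
svy: tabulate i_type_c, se ci percent
```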
Insurance Type              Proportions   Std. Error   95% CI Lower   95% CI Upper
1: (label missing)                1.94%        0.17%          1.64%          2.30%
2: Medicaid, No Medicare         30.92%        0.55%         29.84%         32.01%
3: Medicare, No Medicaid          0.64%        0.09%          0.50%          0.83%
4: Job-based Coverage            53.29%        0.57%         52.16%         54.42%
5: Directly Purchased             2.55%        0.18%          2.22%          2.93%
6: Other                          0.63%        0.09%          0.47%          0.84%
7: (label missing)                5.99%        0.29%          5.45%          6.57%
8: Uninsured                      4.04%        0.21%          3.65%          4.48%
Total                           100.00%
svy: tabulate
Now you might add some domain analysis to this,
breaking out insurance status for children by poverty level.
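* Flag children at or below 200% of the federal poverty level (FPL)
generate poverty200=.
replace poverty200=0 if h87_imp>4
replace poverty200=1 if h87_imp<=4
* STATA treats missing as larger than any number, so reset missing incomes
replace poverty200=. if h87_imp==.
* Insurance type for poverty200==0, then for poverty200==1
svy: tab i_type_c if poverty200==0, se ci
svy: tab i_type_c if poverty200==1, se ci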
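As an alternative sketch, the svy prefix has a subpop() option that is generally recommended over if for domain estimates, because it keeps the out-of-domain records in the variance calculation. The toy data and the names stratum, hh_id and above200 are assumptions for illustration.

```stata
* Toy data; stratum and hh_id are hypothetical design variables
clear
input stratum hh_id wt_c i_type_c poverty200
1 1 100 2 1
1 2 100 4 0
1 3 100 4 0
1 4 100 2 1
2 5 200 4 0
2 6 200 8 1
2 7 200 2 1
2 8 200 4 0
end
svyset hh_id [pweight = wt_c], strata(stratum)

* Domain: children flagged poverty200==1
svy, subpop(poverty200): tabulate i_type_c, se ci

* Domain: children flagged poverty200==0, via a 0/1 indicator
generate byte above200 = (poverty200 == 0)
svy, subpop(above200): tabulate i_type_c, se ci
```

The point estimates match those from the if versions; only the standard errors and confidence intervals can differ.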
Results for poverty200==0

Insurance Type              Proportions   Std. Error   95% CI Lower   95% CI Upper
1: (label missing)                0.43%        0.09%          0.29%          0.65%
2: Medicaid, No Medicare          7.64%        0.45%          6.80%          8.57%
3: Medicare, No Medicaid          0.57%        0.11%          0.39%          0.84%
4: Job-based Coverage            80.40%        0.63%         79.14%         81.60%
5: Directly Purchased             3.48%        0.29%          2.95%          4.10%
6: Other                          0.64%        0.13%          0.43%          0.94%
7: (label missing)                4.56%        0.34%          3.95%          5.27%
8: Uninsured                      2.28%        0.20%          1.91%          2.72%
Total                           100.00%

Child Insurance Type if FPL<201% (poverty200==1)

Insurance Type              Proportions   Std. Error   95% CI Lower   95% CI Upper
1: (label missing)                3.77%        0.35%          3.14%          4.51%
2: Medicaid, No Medicare         58.93%        0.86%         57.23%         60.61%
3: Medicare, No Medicaid          0.74%        0.13%          0.52%          1.04%
4: Job-based Coverage            20.67%        0.68%         19.37%         22.03%
5: Directly Purchased             1.42%        0.19%          1.10%          1.84%
6: Other                          0.62%        0.14%          0.40%          0.96%
7: (label missing)                7.69%        0.48%             0%          8.70%
8: Uninsured                      6.16%        0.40%          5.43%          6.98%
Total                           100.00%
The END