
Statistics

11/6/2014

Analysis of the National Survey on Drug Use and Health


The National Survey on Drug Use and Health is an annual survey of the civilian
noninstitutionalized population of the United States aged twelve or older. The data
collected are used to create both national and state-level estimates of the use of illicit
drugs (including nonmedical use of prescription drugs), alcohol, and tobacco products. The
survey also gathers data on the mental health of this population. The population covered
includes residents of houses, condominiums, apartments, dormitories, shelters, and group
homes, as well as civilians living on military bases. Active-duty military personnel and
homeless or transient persons who do not live in shelters are not included. The survey polls
approximately 70,000 randomly selected individuals and is funded by the Substance Abuse and
Mental Health Services Administration.
The survey is conducted through an in-person interview at the selected person's place of
residence and combines two computer-assisted methods. Computer-assisted personal
interviewing (CAPI) is used to collect demographic information such as race, ethnicity, age,
and income. The CAPI system is carried by the interviewer and improves data quality by
preventing the interviewer from accidentally skipping questions, preventing data entry
errors, and alerting the interviewer to inconsistent responses (Gilbert). An audio
computer-assisted self-interviewing (ACASI) system is used for the majority of questions in
the survey. ACASI lets interviewees answer sensitive questions privately, without feeling
judged, and therefore more honestly than if an interviewer asked them (Gilbert). The main
topics covered by the survey include use of illicit drugs, nonmedical use of prescription
psychotherapeutic drugs, use of alcohol and tobacco, dependence on and abuse of drugs and
alcohol, mental health problems, and treatment for substance abuse and mental health
problems. Historically, the survey focused on illicit drug use among youth and young adults,
but over the years its scope has broadened into other areas.
One of the main goals of the National Survey on Drug Use and Health is to identify trends
in the data from year to year, not just to capture drug use in a single year. Any single
survey is subject to both sampling and nonsampling errors, so the ability to track trends
across years is one of the survey's main strengths. Tracking trends requires a consistent
methodology from year to year. However, the survey must also be updated yearly to keep up
with new drugs and new ways of administering them; one of the largest issues is the
emergence of new prescription drugs, especially under various brand names. The survey
therefore maintains a set of core questions that remain constant from year to year, while
adding questions to capture any changes that may have affected people's interpretation of
the survey questions (SAMHSA 113).
The 2013 National Survey on Drug Use and Health continued the sample design used for the
2005 through 2009 surveys, which was extended through 2013. The sample design first
stratifies the country by state; the states are the primary sampling units. Each state is
then divided into substrata based on population, which are the secondary sampling units.
Eight of the primary sampling units (California, New York, Florida, Illinois, Michigan,
Ohio, Pennsylvania, and Texas) were designated as large sample states with target sample
sizes of 3,600; actual sample sizes ranged from 2,729 to 3,503. For the remaining states,
the target sample size was 900, with actual sample sizes ranging from 852 to 953. The
primary sampling units were divided into 900 secondary sampling units in total: 48 in each
large sample state and 12 in each small state. The substrata were drawn based on the
population density of each area so that each would yield approximately the same number of
sample elements (SAMHSA 114).
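The allocation figures above can be checked with a few lines of arithmetic. This sketch assumes that the "small" states are the remaining 42 states plus the District of Columbia (43 units), which is what makes the counts sum to 900; the variable names are illustrative only.

```python
# Allocation check for the sample design described above.
# Assumption: small states = remaining 42 states + District of Columbia (43 units).
LARGE_STATES = ["California", "New York", "Florida", "Illinois",
                "Michigan", "Ohio", "Pennsylvania", "Texas"]
N_SMALL = 43

SSU_PER_LARGE = 48    # secondary sampling units per large sample state
SSU_PER_SMALL = 12    # secondary sampling units per small state
TARGET_LARGE = 3600   # target interviews per large sample state
TARGET_SMALL = 900    # target interviews per small state

total_ssus = len(LARGE_STATES) * SSU_PER_LARGE + N_SMALL * SSU_PER_SMALL
total_target = len(LARGE_STATES) * TARGET_LARGE + N_SMALL * TARGET_SMALL

# Target interviews per secondary sampling unit in each type of state
per_ssu_large = TARGET_LARGE / SSU_PER_LARGE
per_ssu_small = TARGET_SMALL / SSU_PER_SMALL

print(total_ssus)                    # 900
print(total_target)                  # 67500
print(per_ssu_large, per_ssu_small)  # 75.0 75.0
```

Both types of state work out to 75 target interviews per secondary sampling unit, consistent with each substratum yielding about the same number of sample elements, and the total target of 67,500 is in line with the approximately 70,000 individuals polled.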
Within every secondary sampling unit, 48 census tracts were selected with probability
proportional to population size. Census tracts, the geographic regions defined for the
national census, form the first stage of the three-stage cluster design. Within each
selected census tract, adjacent census blocks were combined into segments, which form the
second stage of the cluster sample; each segment contains approximately the same number of
sample elements. One segment was then chosen within each census tract with probability
proportional to size.
The sampled segments were divided evenly among four samples, one for each three-month
period of the year. A list of 227,075 addresses was compiled from the area segments, of
which 190,067 were deemed eligible. Addresses were deemed ineligible if they contained
active military personnel or were institutional group quarters such as rehabilitation or
mental health facilities (SAMHSA 116). The eligible households form the third stage of the
cluster sample and were selected randomly through an automated screening procedure
programmed into a computer carried by the interviewers. Finally, zero to two people from
each household were chosen to participate in the survey (SAMHSA 114).
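Selection "with probability proportional to size," used above for census tracts and segments, can be illustrated with systematic PPS sampling, one common way of drawing such a sample. This is a sketch under that assumption: the source does not say which PPS algorithm the survey uses, and the function name and example sizes are hypothetical.

```python
import random

def systematic_pps(sizes, n, rng=random):
    """Select n units with probability proportional to size (systematic PPS).

    sizes: positive size measures, e.g. census tract population counts.
    Returns the indices of the selected units. A unit at least as large as
    the sampling interval is always selected and may be selected more than once.
    """
    total = sum(sizes)
    interval = total / n
    start = rng.random() * interval          # random start in [0, interval)
    points = [start + i * interval for i in range(n)]

    selected = []
    cumulative = 0.0
    j = 0
    for idx, size in enumerate(sizes):
        cumulative += size
        # take every selection point that falls inside this unit's range
        while j < n and points[j] < cumulative:
            selected.append(idx)
            j += 1
    # guard against floating-point round-off at the top of the range
    while len(selected) < n:
        selected.append(len(sizes) - 1)
    return selected

# Example: draw 2 of 5 hypothetical tracts with the given populations
print(systematic_pps([500, 1500, 3000, 1000, 4000], 2, random.Random(1)))
```

Each unit's chance of selection is its size divided by the sampling interval, so larger tracts are proportionally more likely to be drawn.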
Data collection begins with each selected household or dwelling unit being mailed an
introductory letter. A field interviewer then contacts the household and collects
demographic information on everyone living in the household or dwelling unit. A
preprogrammed algorithm selects zero to two people to take the survey and is designed to
select the appropriate number of people in each age group. The interviewer interviews each
selected resident separately in a private area of the dwelling. The first half of the
interview is read by the interviewer and entered into a computer, while in the second half
the interviewee reads the questions, or listens to them through headphones, and enters the
answers on the computer without the interviewer knowing the responses. The interview is
also available in Spanish on request. Interviewees are paid $30 upon successful completion
of the interview (SAMHSA 115).
The National Survey on Drug Use and Health consists of core and noncore sections. The core
set of questions remains constant from year to year so that basic trends in prevalence
estimates can be measured. The core questions make up the first part of the survey and
include initial demographic items and questions on the use of tobacco, alcohol, marijuana,
cocaine, crack cocaine, heroin, hallucinogens, inhalants, pain relievers, tranquilizers,
stimulants, and sedatives. The supplemental, or noncore, questions cover perceived risks of
substance abuse, injection drug use, substance dependence or abuse, incarceration,
treatment for substance abuse, health insurance coverage, and income (SAMHSA 115).
The data collected by field interviewers are then processed into a raw data file. Only
interviews containing data on lifetime cigarette use and on at least 9 of the 13 other
substances in the core section were treated as usable. Write-in responses, in which
participants marked OTHER and specified an answer, were assigned numeric codes using an
online coding database to generate the appropriate variables. The next step in processing
was logical editing, which used a respondent's answers to previous questions to reduce item
nonresponse, such as items that were accidentally skipped (SAMHSA 118). Logical editing was
also used to make related data elements consistent with each other and to resolve other
inconsistencies in the responses. For example, if a respondent reported last using a drug
more than 12 months ago but also reported first using the drug at his or her current age,
both statements cannot be true. When this occurred, the inconsistent period of most recent
use was replaced with an "indefinite" value, and the inconsistent age at first use was
replaced with a missing data code (SAMHSA 119).
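The usable-case rule and the logical edit in the example above can be expressed as two small functions. This is a sketch: the function names, the recency categories ("past_month", "past_year", "over_12_months"), and the INDEFINITE marker are hypothetical stand-ins, not the codes used in the survey's data files.

```python
from typing import Optional, Tuple

# Hypothetical codes for illustration; the survey's actual file codes differ.
INDEFINITE = "INDEFINITE"
MISSING = None

def is_usable(has_lifetime_cigarettes: bool, other_core_answered: int) -> bool:
    """Usable-case rule: lifetime cigarette data plus data on at least
    9 of the 13 other core substances."""
    return has_lifetime_cigarettes and other_core_answered >= 9

def edit_recency_vs_first_use(recency: str, age_first_use: Optional[int],
                              current_age: int) -> Tuple[str, Optional[int]]:
    """Resolve the recency / age-at-first-use conflict from the example.

    recency: hypothetical category, one of "past_month", "past_year",
    "over_12_months". Last use more than 12 months ago cannot coexist with
    first use at the current age, so recency becomes indefinite and age at
    first use becomes missing.
    """
    if recency == "over_12_months" and age_first_use == current_age:
        return INDEFINITE, MISSING
    return recency, age_first_use
```

Values set to missing by an edit like this are left for the imputation step that follows.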
For most variables, missing or ambiguous responses in the National Survey on Drug Use and
Health are imputed using a methodology known as predictive mean neighborhoods (PMN), which
was developed specifically for the survey. PMN uses covariates to determine donors, which
is an improvement over standard hot-deck imputation. PMN also allows the relative importance
of covariates to be determined by standard modeling techniques, allows correlations across
response variables to be accounted for by making the imputation multivariate, and allows
sampling weights to be easily incorporated into the models. PMN combines model-assisted
imputation with a random nearest neighbor hot-deck procedure. The hot-deck step within PMN
ensures that imputed values are consistent with the values of other variables. Variables
imputed using PMN include the core demographic variables, the core drug use variables (such
as most recent use, frequency of use, and age at first use), income, health insurance, work
status, and immigrant status (SAMHSA 120).
In standard hot-deck imputation, an item nonresponse is replaced with a response from a
similar respondent who has a value for that item. In random nearest neighbor hot-deck
imputation, the missing or ambiguous value is replaced by the value of a donor randomly
selected from a set of potential donors, namely those considered close to the unit with the
missing value according to a distance function. In the hot-deck step of PMN, the donors are
respondents with complete data whose predicted mean is close to that of the nonrespondent;
predicted means are computed both for respondents with and without missing data. The donor
set is either the 30 closest respondents or the respondents whose predicted mean is within 5
percent of the nonrespondent's predicted mean, whichever set is smaller. If no respondent
has a predicted mean within 5 percent, the respondent with the closest predicted mean is
selected as the donor.
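The donor-selection rule just described can be sketched as follows. The function name and arguments are hypothetical, and "within 5 percent" is interpreted here as a relative tolerance on the nonrespondent's predicted mean, which is an assumption; the predicted means themselves would come from the modeling stage.

```python
import random

def pmn_donor(recipient_pred, donor_preds, donor_values,
              k=30, delta=0.05, rng=random):
    """Impute one missing item using the PMN hot-deck rule (sketch).

    recipient_pred: predicted mean for the item nonrespondent.
    donor_preds, donor_values: predicted means and observed values for
    respondents with complete data.
    Neighborhood: the k closest donors or the donors within delta (relative)
    of recipient_pred, whichever set is smaller; if none is within delta,
    the single closest donor is used.
    """
    # donors sorted by distance between predicted means
    order = sorted(range(len(donor_preds)),
                   key=lambda i: abs(donor_preds[i] - recipient_pred))
    threshold = delta * abs(recipient_pred)
    n_within = sum(1 for i in order
                   if abs(donor_preds[i] - recipient_pred) <= threshold)

    if n_within == 0:
        chosen = order[0]                      # closest donor as fallback
    else:
        # both candidate sets are prefixes of `order`; keep the smaller one
        chosen = rng.choice(order[:min(k, n_within)])
    return donor_values[chosen]
```

Because both candidate neighborhoods are prefixes of the donors sorted by distance, "whichever set is smaller" amounts to keeping the first min(30, number within 5 percent) donors and drawing one at random.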
In the modeling stage of the PMN method, the model chosen depended on the variable being
imputed. The models used in the 2013 survey include binomial logistic regression,
multinomial logistic regression, Poisson regression, time-to-event regression, and ordinary
linear regression. Each of these models incorporates the sampling design weights
(SAMHSA 120).
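To illustrate the modeling stage, the sketch below fits a survey-weighted binomial logistic regression by Newton-Raphson (iteratively reweighted least squares), treating the design weights as multiplicative case weights, and returns the predicted means that the hot-deck step compares. Only the binomial logistic case is shown; the function names and this pseudo-likelihood fitting approach are assumptions for illustration, not the survey's production code.

```python
import numpy as np

def weighted_logistic(X, y, w, n_iter=25, tol=1e-10):
    """Fit a design-weighted logistic regression by Newton-Raphson.

    X: (n, p) covariate matrix including an intercept column.
    y: (n,) 0/1 outcomes for respondents who answered the item.
    w: (n,) sampling design weights.
    Returns the coefficient vector beta.
    """
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    w = np.asarray(w, float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))                       # weighted score
        hess = X.T @ (X * (w * p * (1.0 - p))[:, None])  # weighted information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def predicted_means(X, beta):
    """Predicted probability (mean) for every row, respondent or not."""
    return 1.0 / (1.0 + np.exp(-np.asarray(X, float) @ beta))
```

Predicted means would be computed for both respondents and nonrespondents, since the donor rule compares the two.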
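The analysis weights start from design-based weights, each of which is the product of the
inverses of the selection probabilities at every stage of selection. The survey has a
four-stage sample selection scheme in which an extra stage of selection occurs at the census
tract before a segment is selected, and the design-based weights $d_k$ incorporate this
extra level of selection. The adjustment factor $a_k(\lambda)$ is used to adjust for
nonresponse and to poststratify to known population control totals. Weight adjustments were
based on Deville and Särndal's logit model; this generalized exponential model incorporates
specific bounds for the adjustment factor. The final weights $w_k = d_k a_k(\lambda)$
minimize a distance function of the general Deville and Särndal form, subject to matching
the control totals:

$$\min_{w}\ \sum_k d_k\, G\!\left(\frac{w_k}{d_k}\right) \quad \text{subject to} \quad \sum_k w_k \mathbf{x}_k = \mathbf{T}_x,$$

where $G$ is the distance function implied by the bounded logit model, $\mathbf{x}_k$ is the
vector of covariates for unit $k$, and $\mathbf{T}_x$ is the vector of known control totals.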
The purpose of weighting in this sample design is to make the weighted sample represent the
actual population. This approach was used at several stages of the weight adjustment
process: adjustment of household weights for nonresponse at the screener level,
poststratification of household weights to meet population controls for various
household-level demographics by state, adjustment of household weights for extremes,
poststratification of selected person weights, adjustment of responding person weights for
nonresponse at the questionnaire level, poststratification of responding person weights, and
adjustment of responding person weights for extremes. An effort was made to include as many
state-specific covariates as possible in each of the multivariate models in order to adjust
the weights accurately. State-specific covariates are variables that may make a state's
population more or less likely to use drugs. For example, a state with stricter laws and a
larger police force may have less drug use because fewer drugs are available to consume.
Leaving such a factor out of a model would attribute its effect to the remaining covariates
and cause omitted variable bias. It was not possible to retain all state-specific
covariates, because subdividing state samples by demographic covariates often produced
sample sizes too small for the desired precision. Instead, a hierarchical structure was
used, grouping states with covariates defined at the national level, the census division
level, the state group level within each census division, and the state level
(SAMHSA 121).
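The weight adjustment described above, design weights multiplied by a bounded adjustment factor chosen so that weighted totals match known control totals, can be sketched with the bounded logit calibration method of Deville and Särndal, solved by Newton's method. This is a simplified stand-in: the survey's generalized exponential model allows unit-specific bounds and centering constants, whereas this sketch assumes a single pair of bounds (hypothetical defaults 0.5 and 2.0) centered at 1, and the function name is illustrative.

```python
import numpy as np

def logit_calibrate(d, X, totals, lower=0.5, upper=2.0, n_iter=50, tol=1e-10):
    """Bounded logit calibration (Deville and Särndal style) by Newton's method.

    d: (n,) design weights; X: (n, p) calibration covariates;
    totals: (p,) known population control totals.
    Returns final weights w = d * a, where every adjustment factor a lies
    strictly between lower and upper and X.T @ w matches totals.
    """
    d = np.asarray(d, float)
    X = np.asarray(X, float)
    totals = np.asarray(totals, float)
    A = (upper - lower) / ((1.0 - lower) * (upper - 1.0))
    lam = np.zeros(X.shape[1])
    for _ in range(n_iter):
        e = np.exp(A * (X @ lam))
        # adjustment factor a_k(lambda); equals 1 when lambda = 0
        a = (lower * (upper - 1.0) + upper * (1.0 - lower) * e) / (
            (upper - 1.0) + (1.0 - lower) * e)
        resid = totals - X.T @ (d * a)
        if np.max(np.abs(resid)) < tol:
            break
        # derivative of a with respect to the linear predictor X @ lam
        da = A * (a - lower) * (upper - a) / (upper - lower)
        jac = X.T @ (X * (d * da)[:, None])
        lam = lam + np.linalg.solve(jac, resid)
    return d * a

d = np.full(6, 10.0)
X = np.column_stack([np.ones(6), [0, 0, 0, 1, 1, 1]])
w = logit_calibrate(d, X, np.array([66.0, 36.0]))
```

With an intercept and a group indicator as covariates, the adjustment factor is 1.0 for the first group and 1.2 for the second, exactly what is needed to raise the second group's weighted total from 30 to 36 while keeping the overall total at 66.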
The variance for the total number of drug users was estimated using a Taylor series
linearization approach. Proportions of drug use were estimated with a ratio estimator, where
$\hat{Y}_d$ is a linear statistic estimating the number of substance users in domain $d$,
$\hat{N}_d$ is a linear statistic estimating the total number of persons in domain $d$, and
$\hat{p}_d$ is the estimated proportion of drug users:

$$\hat{p}_d = \frac{\hat{Y}_d}{\hat{N}_d}$$

The standard error was estimated using a Taylor series approximation to find an estimate of
the SE of $\hat{p}_d$. When the domain size $\hat{N}_d$ is free of sampling error, an
estimate of the SE for the total number of substance users is:

$$\operatorname{SE}(\hat{Y}_d) = \hat{N}_d \cdot \operatorname{SE}(\hat{p}_d)$$

This standard error is accurate when the domain size estimates $\hat{N}_d$ are among those
that have been forced to match U.S. Census Bureau population estimates through the weight
calibration process. When this occurs, $\hat{N}_d$ is not subject to any additional sampling
error from the sample design. Even if $\hat{N}_d$ is not forced to match U.S. Census Bureau
population estimates, the estimated standard error is likely accurate as long as the
variance of $\hat{N}_d$ is small relative to the variance of $\hat{p}_d$ (SAMHSA 124).
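The ratio estimator and standard error formulas above can be sketched with a Taylor linearization variance estimator. The sketch assumes a stratified design in which variance PSUs are treated as sampled with replacement within strata (a common approximation in survey software) and skips strata with a single PSU; the function name and arguments are hypothetical. The returned SE of the total uses $\operatorname{SE}(\hat{Y}_d) = \hat{N}_d \cdot \operatorname{SE}(\hat{p}_d)$, which is appropriate only when $\hat{N}_d$ is free of sampling error.

```python
import numpy as np

def ratio_se(y, w, stratum, psu):
    """Taylor-linearized SE of p_hat = Y_hat / N_hat (sketch).

    y: 0/1 substance-use indicator; w: final analysis weights;
    stratum, psu: variance stratum and PSU labels for each respondent.
    PSUs are treated as sampled with replacement within strata.
    Returns (p_hat, se_p, se_total), with se_total = N_hat * se_p.
    """
    y = np.asarray(y, float)
    w = np.asarray(w, float)
    stratum = np.asarray(stratum)
    psu = np.asarray(psu)

    n_hat = w.sum()                    # estimated persons in the domain
    p_hat = (w * y).sum() / n_hat      # ratio estimate of the proportion
    z = w * (y - p_hat) / n_hat        # linearized (influence) values

    var = 0.0
    for h in np.unique(stratum):
        in_h = stratum == h
        labels = np.unique(psu[in_h])
        totals = np.array([z[in_h & (psu == c)].sum() for c in labels])
        n_h = len(totals)
        if n_h > 1:                    # single-PSU strata contribute nothing here
            var += n_h / (n_h - 1) * ((totals - totals.mean()) ** 2).sum()
    se_p = np.sqrt(var)
    return p_hat, se_p, n_hat * se_p
```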
One of the largest concerns about the National Survey on Drug Use and Health is the
accuracy of the data collected. Because the survey collects self-reported data, participants
may provide inaccurate information either intentionally or unintentionally (Harrison,
Hughes 4). A participant providing incorrect data unintentionally may simply fail to
remember all instances of drug use and their frequency, or may be unaware of which drugs
they have actually consumed. Participants may also lie on the survey because they are
ashamed of their drug use or wary of reporting illegal activities. For example, a minor
asked about drug use may be more inclined to lie if he or she believes that parents or
guardians may find out. Many people are also hesitant to give the government
self-incriminating information for fear of being arrested. For these reasons, keeping
responses private, and ensuring that interviewees believe their information will remain
private, is crucial. These are among the reasons that the interview is conducted in private
and on an ACASI system.
Socioeconomic and cultural factors may influence the accuracy of the data provided by
certain groups, as well as their response rates, since some groups may be less trusting of
the government (Harrison, Hughes 7). While item response on the survey is generally high,
interviewees were allowed to give inconclusive or inconsistent responses about whether they
had ever used a given drug or when they last used it. Ambiguous responses were allowed
because forcing interviewees to answer questions they felt uncomfortable answering would
lead to false responses.
The National Survey on Drug Use and Health tends to be consistent with other surveys on
drug use; however, there are some inconsistencies. These may result from differences in the
populations sampled, sampling methods, methods of data collection, questionnaires, and
methods of estimation (SAMHSA 137). One example is the Youth Risk Behavior Survey, which
finds higher rates of drug use among high school students than the National Survey on Drug
Use and Health. This discrepancy may stem from differences in sample design, or from the
Youth Risk Behavior Survey being conducted in schools rather than in households. Despite
some inconsistencies with other surveys, the National Survey on Drug Use and Health is
widely used and accepted in journals and is used extensively in other surveys and research
projects. It is the primary source of information on the use of illicit drugs, alcohol, and
tobacco in the civilian noninstitutionalized population of the United States aged 12 or
older.
Annotated Bibliography
Substance Abuse and Mental Health Services Administration, Results from the 2013
National Survey on Drug Use and Health: Summary of National Findings, NSDUH
Series H-48, HHS Publication No. (SMA) 14-4863. Rockville, MD: Substance Abuse
and Mental Health Services Administration, 2014.

This report was prepared by the Center for Behavioral Health Statistics and Quality
(CBHSQ), Substance Abuse and Mental Health Services Administration (SAMHSA),
U.S. Department of Health and Human Services (HHS). This report presents a detailed
account of the National Survey on Drug Use and Health, including both the results of
the survey and its sample design and analysis.

Harrison, L., & Hughes, A. (Eds.). (1997). The validity of self-reported drug use:
Improving the accuracy of survey estimates (NIH Publication No. 97-4147, NIDA
Research Monograph 167). Rockville, MD: National Institute on Drug Abuse.
This source analyzes the conditions under which survey respondents present valid
information when asked about drug use in surveys. It also analyzes the conditions under
which people lie about or misrepresent drug use in surveys.

Gilbert, Nigel. "Computer Assisted Personal Interviewing." Social Research Update 3.
N.p., Mar. 1993. Web. 04 Dec. 2014.
This source documents the uses and history of CAPI and ACASI systems. It describes the
benefits of and issues with these systems and their impact on the data collected.
