Assignment # 02: Course

Business Research Methods 5599
Approved Study Center

ALLAMA IQBAL OPEN UNIVERSITY (AIOU)
ASSIGNMENT # 02
COURSE: BUSINESS RESEARCH METHODS (5599)
Level Executive MBA/MPA (3rd) Semester: SPRING 2010
Submitted to
MR.MAZHAR IQBAL
Submitted By
Muhammad Idrees
idreesiub7@gmail.com Roll # AD514761
Cell# 03006719422

ACKNOWLEDGEMENT
I would like to thank, first of all, my teacher Mr. Mazhar Iqbal
for his kind briefing regarding this assignment.
My teachers and the internet were a big help in completing
my assignment.
My class fellows also deserve appreciation and thanks,
since their guidance helped a lot.
Thanks


An Introduction to Secondary Data Analysis
What are Secondary Data?

In the fields of epidemiology and public health, the distinction
between primary and secondary data depends on the relationship
between the person or research team who collected a data set and
the person who is analyzing it. This is an important concept because
the same data set could be primary data in one analysis and
secondary data in another. If the data set in question was collected
by the researcher (or a team of which the researcher is a part) for the
specific purpose or analysis under consideration, it is primary data. If
it was collected by someone else for some other purpose, it is
secondary data. Of course, there will always be cases in which this
distinction is less clear, but it may be useful to conceptualize primary
and secondary data by considering two extreme cases. In the first,
which is an example of primary data, a research team conceives of
and develops a research project, collects data designed to address
specific questions posed by the project, and performs and publishes
their own analyses of the data they have collected. In this case, the
people involved in analyzing the data have some involvement in, or
at least familiarity with, the research design and data collection
process, and the data were collected to answer the questions
examined in the analysis. In the second case, which is an example of
secondary data, a researcher poses questions that are addressed
through analysis of data from the Behavioral Risk Factor Surveillance
System (BRFSS), a data set collected annually in the United States
through cooperation of the Centers for Disease Control and Prevention
and state health departments. In this case, the person performing the
analysis did not participate in either the research design or data
collection process, and the data were not collected to answer specific
research questions.
As an example of the same data set serving as both primary and
secondary data, consider the increasingly common practice of one
researcher performing an analysis of data collected by a research
team with whom he or she has no connection. This type of analysis is
facilitated by the ease of sharing data stored electronically and the
concomitant creation of electronic data archives that allow access to
secondary users; some of these archives are discussed in Chapter 7.
Such analyses may serve a variety of purposes, such as addressing
questions not considered in the original analysis or examining how a
different analytic approach might change the conclusions reached
from the first analysis. In either case, the same data set serves as
primary data for the original research team and secondary data for
the researcher performing the later analysis.
This book deals primarily with secondary data in the sense of data sets
that can be obtained and analyzed in detail by the individual
researcher. There is another type of secondary data, again not
mutually exclusive with the first, meaning statistical information about
some geographic region or other entity. This type of information is
often useful to researchers: when you place your research project in
context by describing the racial makeup or median house value in the
metropolitan area where you conduct your research, the data used to
compute those statistics were probably secondary data. Often these
statistics are computed on data collected by the federal government,
and Chapter 7 discusses several websites that were created
specifically to permit easy access to these types of statistics. In
addition, many of the data sets described in this book are accessible
through an online interface that allows the quick computation of basic
statistics, without requiring the user to download data and use a
statistical program to analyze it. The availability of such interfaces has
been noted in the sections pertaining to each data set.
Most of the data sets discussed in this volume contain either data collected
through surveys or censuses, such as the National Health
Interview Survey and the U.S. Census, or administrative records such
as the medical claims records submitted to the Medicare system.
There are other types of secondary data, including diaries, video
recordings, and transcripts of interviews and focus groups: some of
these are included in sources discussed in Chapter 7. Data such as
interview transcripts are often analyzed using qualitative data methods
rather than the quantitative techniques appropriate for most of the
data sets discussed in this volume. Secondary analysis of qualitative
data is a topic unto itself and is not discussed in this volume. The
interested reader is referred to references such as James and Sorenson
(2000) and Heaton (2004).
Advantages and Disadvantages of Secondary Data Analysis
The choice of primary or secondary data need not be an either/or
question. Most researchers in epidemiology and public health will work
with both types of data in the course of their careers, and many
research projects incorporate both types of data. A more useful
approach to this question is to focus on selecting data that are
appropriate to the research question being studied and the resources
available to the researcher; the latter include time, money, and
personal expertise. In this spirit, we offer a summary of the major
advantages and disadvantages of working with secondary, as opposed
to primary, data.
The first major advantage of working with secondary data is economy:
because someone else has already collected the data, the researcher
does not have to devote resources to this phase of research. Even if
the secondary data set must be purchased, the cost is almost
certainly lower than the expense of salaries, transportation, and so
forth that would be required to collect and process a similar data set
from scratch. There is also a savings of time. Because the data are
already collected, and frequently also cleaned and stored in electronic
format, the researcher can spend the bulk of his or her time analyzing
the data. There is also the influence of preference: secondary data
analysis is an ideal focus for researchers who prefer to spend their
working hours thinking of and testing hypotheses using existing data
sets, rather than writing grants to finance the data collection process
and supervising student interviewers and data entry clerks.
The second major advantage of using secondary data is the breadth of
data available. Few individual researchers would have the resources to
collect data from a representative sample of adults in every state in
the United States, let alone repeat this data collection process every
year, but the federal government conducts numerous surveys on that
scale. Data collected on a national basis are particularly important in
epidemiology and public health, fields that focus primarily on the
health of populations rather than of individuals. In addition, some of
the data sets discussed in Chapters 2 through 7 collect data using a
longitudinal design, and others are designed so certain questions are
included annually or at regular intervals, allowing researchers to
examine the changes in health status and health behaviors in the
population over time.
The third advantage in using secondary data is that often the data
collection process is informed by expertise and professionalism that may
not be available to smaller research projects. For instance, many of the
federal health surveys discussed in this volume use a complex sample
design and system of weighting that allows the researcher to compute
population-based estimates of health conditions and behaviors.
Although a local data collection project could conceivably use similar
techniques, more often a convenience sample, whose generalizability
is questionable, is used instead. To take another example, data
collection for many federal data sets is often performed by staff
members who specialize in that task and who may have years of
experience working on a particular survey. This is in contrast to many
smaller research projects, in which data are collected by students
working at a part-time, temporary job.
One major disadvantage to using secondary data is inherent in its
nature: because the data were not collected to answer your specific
research questions, particular information that you would like to have
may not have been collected. Or it may not have been collected in the
geographic region you want to study, in the years you would have
chosen, or on the specific population that is the focus of your interest.
In any case, you can only work with the data that exist, not what you
wish had been collected. A related problem is that variables may have
been defined or categorized differently than you would have chosen:
for instance, a data set may have collected age information in
categories rather than as a continuous variable, or race may have
been defined as only White/Other. A third difficulty is that data may
have been collected but are not available to the secondary researcher:
for instance, address and phone number information for survey
respondents may have been recorded by the original research team
but will not be released to secondary researchers for confidentiality
reasons. If an analysis incorporating geographic information was
planned, such a restriction might make the data set unusable. For
these reasons, a secondary data set should be examined carefully to
confirm that it includes the necessary data, that the data are
defined and coded in a manner that allows for the desired analysis,
and that the researcher will be allowed to access the data required.
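
The point about age recorded in categories can be illustrated with a short sketch. It assumes hypothetical cut points and labels (a real data set defines its own in its codebook) and shows that a continuous variable can always be collapsed into categories, whereas a data set that only stores the category cannot be converted back to exact ages.

```python
from bisect import bisect_right

# Hypothetical cut points and labels, chosen only for illustration.
AGE_CUTS = [18, 35, 50, 65]
AGE_LABELS = ["<18", "18-34", "35-49", "50-64", "65+"]

def age_group(age):
    """Map a continuous age in years to a category label."""
    # bisect_right counts how many cut points are <= age,
    # which is the index of the matching label.
    return AGE_LABELS[bisect_right(AGE_CUTS, age)]

ages = [17, 18, 34, 35, 64, 65, 90]
groups = [age_group(a) for a in ages]
print(groups)
```

If the analysis needs different age bands from the ones stored in the data set, only a data set with continuous age (or finer categories) can support it, which is why the coding should be checked before the data set is chosen.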
A second major disadvantage of using secondary data is that because
the analyst did not participate in the planning and execution of the
data collection process, he or she does not know exactly how it was
done. More to the point, the analyst does not know how well it was
done and therefore how seriously the data are affected by problems
such as low response rate or respondent misunderstanding of specific
survey questions. Every data collection effort has its “dirty little
secrets” that may not invalidate the data but should be taken into
account by the analyst. If the analyst was not present during the data
collection process, he or she has to try to find this information
through other means. Sometimes it is readily available; for instance,
many of the federal data sets have extensive documentation of their
data collection procedures, refusal rates, and other technical
information available on their websites or in published reports.
However, many other secondary data sets are not accompanied by
this type of information, and the analyst must learn to “read between
the lines” and consider what problems might have been encountered
in the data collection process.

Locating Appropriate Secondary Data
There is a vast quantity of secondary data in epidemiology and public
health that is available to the individual researcher. However, the
sheer quantity of data available, and the fact that the data are
collected and archived by many different governmental and private
entities, means that the process of locating appropriate secondary
data is not always straightforward. In fact, this book was written to
ameliorate some of the difficulties involved. There is no single process
to be followed in every case, but we offer two examples of the process
of locating and analyzing secondary data to address a specific
research question or problem.
This section might have been better titled “achieving a fit between
your research question and the data you choose to analyze” because
it is often an iterative process in which a research question is posed,
potential data sets are considered, the question is refined in terms of
the data available, other sources of data are considered, the question
is refined again, and so on. The most typical way to use secondary
data for research is to begin with a research question and seek a data
set that will allow analysis of that question. An alternative method is
to begin by selecting from among the available secondary data sets,
and then formulating a research question that may be answered using
the data chosen. Although the first method conforms more to
standard beliefs about how research is done, the second approach is
particularly useful in classroom instruction, and both methods can
produce quality research. If the researcher begins with a question and
then seeks out an appropriate data set, the following generalized
sequence of procedures may be useful:

1. Define the question you want to study; for
instance, “How does the experience of racism affect
an individual’s health?”
2. Specify the population you want to study. Are you
interested in children, adults, or people of all ages?
What races or ethnicities do you want to study? Do
you want to analyze a national sample or one confined
to a smaller area? What is the range of years
you would consider (e.g., you may only be interested
in data collected over the last 5 years)?
3. Specify what other variables you want to include in
your analysis. In this example, you might believe
that it was important to have information about the
respondents’ race, Hispanic ethnicity, age, gender,
income, and educational level in order to include
those factors in your analysis. If so, you must
confirm that the data you desire are contained in
the data set that you choose and that they are
recorded in a manner that is useful to you. If you
are interested in comparing the experiences of
Hispanic Blacks and non-Hispanic Blacks, information
about Hispanic ethnicity would need to be
recorded in the data set independently of
information about race.
4. Specify what kind of data is most appropriate for
your research question: for instance, can it best be
addressed through a national survey, examination
of hospital claims records, or transcriptions of
interviews? Also, specify if there are any specific
data collection techniques you believe are
particularly appropriate or inappropriate for your
question. For instance, if you do not believe people
would answer questions about racism honestly in a
personal interview, you would not consider any data
sets collected using that technique. However, if you
believe that a telephone survey would be the best
way to collect this information, you might begin
your search by looking at surveys that used this
data collection method.
5. Create a list of data sets that include information
related to your research question and examine
them to see if they meet your other requirements
(age range included, year of collection, etc.). This
is where the iterative process begins because you
may have to revise either your question or your
data requirements, depending on the data that are
available to you.
6. Once you have chosen your data set, examine the
variables you intend to use in the analysis for
problems such as missing data or out-of-range
values. Also, read whatever information you can
find about the data collection process, data cleaning
procedures, and so on in order to evaluate whether
the data quality is sufficient to meet your needs. If
so, continue with the analysis; if not, either devise a
way to work around it (e.g., by imputing values for
the missing data) or choose another data set.
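
Step 6 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the records, variable names, and valid ranges below are hypothetical, and mean imputation is used as the simplest possible workaround, not as a recommended method.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "age": 34, "bmi": 22.5},
    {"id": 2, "age": None, "bmi": 31.0},
    {"id": 3, "age": 212, "bmi": 27.3},  # age is out of range
    {"id": 4, "age": 58, "bmi": None},
]

# Hypothetical plausible ranges; real limits come from the codebook.
VALID_RANGES = {"age": (0, 120), "bmi": (10.0, 80.0)}

def is_valid(value, low, high):
    """True when the value is present and inside the allowed range."""
    return value is not None and low <= value <= high

def screen(records, valid_ranges):
    """Count missing and out-of-range values for each variable."""
    report = {}
    for var, (low, high) in valid_ranges.items():
        values = [r[var] for r in records]
        report[var] = {
            "missing": sum(v is None for v in values),
            "out_of_range": sum(
                v is not None and not (low <= v <= high) for v in values
            ),
        }
    return report

def impute_mean(records, var, low, high):
    """Return new records with invalid values of var replaced by the mean of valid ones."""
    valid = [r[var] for r in records if is_valid(r[var], low, high)]
    fill = mean(valid)
    return [r if is_valid(r[var], low, high) else {**r, var: fill} for r in records]

report = screen(records, VALID_RANGES)
cleaned = impute_mean(records, "age", *VALID_RANGES["age"])
print(report)
print([r["age"] for r in cleaned])
```

The screening report is the more important half: it tells you whether the problem is small enough to work around or large enough that another data set should be chosen.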

How do you generate the list mentioned in step five? By any means
necessary, as the saying goes. Consider the data sets described in
this book, search Medline to see what data sets other researchers
have used to address your topic, search the web portals listed in
Chapter 7, ask other researchers for suggestions, query relevant
email lists, and so forth.
If you take the approach of beginning with a data set and crafting a
research question that can be addressed using it, the process is similar,
but the order of events is different. In this case, you would
begin by looking at the variables contained in the data set and
considering how you might combine them to create an interesting
question. The process can begin with a germ of an idea, which may
reflect your personal interests or a question that has arisen in your
work. For instance, you might be interested in how disability affects
the amount of physical activity in which a person engages. You then
need to operationalize this question so it may be tested using the
variables available in the data set: how will you define disability, and
how will you define physical activity? At this point, a Medline search
for related articles would be in order, to see how others have
addressed similar questions and whether they have done so with the
data set you will be using. This step will help keep you from
reinventing the wheel and will place your research in context.
Alternatively, you can begin by simply looking at the variables included
in the data set to see which of them interest you. For instance, if you
were planning to work with the BRFSS data from 2005, you might
notice that eleven states included questions on weight control
procedures. You would then look at the actual questions asked and
confirm that the data were actually available. Information to answer
both questions can be found on the BRFSS website
(http://www.cdc.gov/brfss). This process should help you refine your
focus so you can craft a research question that can be answered
using BRFSS 2005 data and that would add to our understanding of
public health. Because the BRFSS includes racial and ethnic data, you
might decide to look at racial and ethnic differences in weight control
practices. Or, taking advantage of the fact that BRFSS data are
identified by state of residence, you could plan to conduct a
comparison of weight control practices in different states. You could
also plan a multilevel analysis that combined information about
state-level characteristics from the U.S. Census (e.g., racial makeup
or poverty level) with the individual-level data available in the BRFSS.
When you have selected the variables you will include in your analysis,
confirm that they are coded (or can be recoded by you) in a manner
that will support your intended analysis and that there are no major
data quality issues such as large quantities of missing data.
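
The multilevel idea described above, attaching state-level characteristics to individual-level records, can be sketched as a simple join on the state identifier. All values, field names, and the state code "ZZ" below are invented for illustration; they are not taken from the BRFSS or the U.S. Census.

```python
# Hypothetical individual-level records in the style of a survey file.
respondents = [
    {"id": 1, "state": "OH", "tried_to_lose_weight": True},
    {"id": 2, "state": "OH", "tried_to_lose_weight": False},
    {"id": 3, "state": "TX", "tried_to_lose_weight": True},
    {"id": 4, "state": "ZZ", "tried_to_lose_weight": True},  # no state-level match
]

# Hypothetical state-level characteristics (invented values).
state_data = {
    "OH": {"poverty_rate": 0.13},
    "TX": {"poverty_rate": 0.17},
}

def attach_state_data(respondents, state_data):
    """Add state-level fields to each respondent; report ids that could not be matched."""
    merged, unmatched = [], []
    for person in respondents:
        context = state_data.get(person["state"])
        if context is None:
            unmatched.append(person["id"])
            continue
        merged.append({**person, **context})
    return merged, unmatched

merged, unmatched = attach_state_data(respondents, state_data)
print(len(merged), "matched;", "unmatched ids:", unmatched)
```

Keeping a list of unmatched records is a deliberate choice: when two secondary sources are combined, silently dropped cases are a data quality issue in their own right.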
Questions to Ask About Any Secondary Data Set
Once you have located a secondary data set that you think is
appropriate for your analysis, you need to learn as much as you can
about why and how it was collected. In particular, you will want to
answer the following three questions:
1. What was the original purpose for which the data were collected?
2. What kind of data is it, and when and how were the data collected?
3. What cleaning and/or recoding procedures have been applied to the data?

Sources for this information include the website of the agency or other
entity responsible for collecting and/or making the data available,
published reports, research articles based on the data, and personal
communications with relevant individuals. For instance, many of the
federal agency websites include one or more contact people who are
available to answer questions about the data collected by that agency,
and a Medline search will often produce citations to reports and
articles discussing the procedures used to collect particular data sets.
The question of determining the original purpose of the project that
produced the data is important because its influence may be present
in other characteristics of the data, from the population targeted to
the specific wording of questions included in a survey. Because you
were not involved in planning phases for the project whose data you
will analyze, you need this information in order to place the data in
context. To take an extreme case, you would certainly want to know if
a research project on the health effects of smoking was sponsored by
a tobacco company or by a nonprofit dedicated to smoking prevention.
You would also like to know if there was any particular philosophy or
model of health behavior that shaped the project: for instance, was a
smoking cessation program structured using the Transtheoretical
Model? Knowledge of the core philosophical beliefs behind a research
project can illuminate the reasons for many choices made in the
planning and execution of the research and will be reflected in the
end product, the data you are proposing to analyze.
It is almost impossible to know too much about the data collection
process because it can influence the quality of the data in many
ways, some of them not obvious. To start with, you need to know
when the data were collected. A data set released in 2004 may have
been collected in the first 3 months of 2004 or over a 4-year period
from 2000 to 2003. Second, you want to know the process by which
the data were collected: was it via telephone interviews, in-person
interviews, abstraction of hospital records, or some other technique?
Third, you want to know the details of the data collection process.
Questions in this regard include who actually did the data collection,
how extensive was their training, and how carefully were they
supervised. If the data were collected through chart review, what
specific instructions were given to the reviewers? If the data were
collected through a survey, what was the response rate? How many
efforts were made to collect data from nonresponders? If data were
collected through a telephone survey, how were numbers selected?
Was there any attempt to correct for the bias introduced because
households without a telephone are not a random sample of all
households? The issues of survey data quality are the same whether
the data set is primary or secondary. For a thorough discussion of
these issues, consult a reference such as de Vaus (2002) or Bulmer,
Sturgis, and Allum (2006).
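
As a small worked example of the response rate question, the sketch below uses the simplest possible definition: completed interviews divided by eligible sampled units. The counts are hypothetical, and survey organizations often publish more detailed definitions, so the documentation of a given data set should be checked for the formula actually used.

```python
def response_rate(completed, eligible_sampled):
    """Simplest response rate: completed interviews / eligible sampled units."""
    if eligible_sampled <= 0:
        raise ValueError("eligible_sampled must be positive")
    if not 0 <= completed <= eligible_sampled:
        raise ValueError("completed must be between 0 and eligible_sampled")
    return completed / eligible_sampled

# Hypothetical counts for illustration only.
rate = response_rate(completed=612, eligible_sampled=1020)
print(f"Response rate: {rate:.1%}")
```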
The third major question in working with any secondary data set is
what was done to the data after they were collected. For instance,
almost all data sets include some missing data. Were these data left
as missing, or were values imputed, and if so, how was the
imputation done? Was any data cleaning done to remove out-of-
range values, and were those cases assigned missing values or was
some other procedure followed? Were certain combinations of
answers considered invalid, and if so, how were they treated? A
famous example of this last type of procedure was the decision in the
1990 U.S. Census to recode to the opposite gender one member of a
same-gender couple who declared themselves to be married. If any
recoding has been done, is it possible to restore the original values?
You also need to find out if data can be weighted, and if so, for what
aggregations the weighting allows the production of accurate
estimates (e.g., at the national level alone or at both the national and
the state levels).
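
To show why weighting matters, here is a minimal sketch comparing an unweighted and a weighted proportion. The four respondents and their weights are hypothetical. The sketch gives a point estimate only; standard errors for a complex sample design need design-aware methods that are not shown here.

```python
# Hypothetical respondents: (has_condition, survey_weight).
# Each weight is the number of people in the population the respondent represents.
sample = [
    (True, 1500.0),
    (False, 500.0),
    (False, 1000.0),
    (True, 1000.0),
]

def weighted_proportion(sample):
    """Weighted share of respondents with the condition."""
    total_weight = sum(weight for _, weight in sample)
    weighted_yes = sum(weight for flag, weight in sample if flag)
    return weighted_yes / total_weight

unweighted = sum(flag for flag, _ in sample) / len(sample)
weighted = weighted_proportion(sample)
print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

In this made-up sample the two estimates differ (0.500 versus 0.625) because respondents with the condition carry larger weights, which is exactly the situation in which ignoring the weights would mislead.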
Secondary data
Secondary data is data collected by someone other than the user. Common
sources of secondary data for social science include censuses, surveys,
organizational records and data collected through qualitative methodologies or
qualitative research. Primary data, by contrast, are collected by the investigator
conducting the research.
Secondary data analysis saves time that would otherwise be spent collecting data
and, particularly in the case of quantitative data, provides larger and higher-
quality databases than would be feasible for any individual researcher to collect
on their own. In addition to that, analysts of social and economic change consider
secondary data essential, since it is impossible to conduct a new survey that can
adequately capture past change and/or developments.
Sources of secondary data
As is the case in primary research, secondary data can be obtained from two
different research strands:
• Quantitative: Census, housing, social security as well as electoral statistics
and other related databases.
• Qualitative: Semi-structured and structured interviews, focus group
transcripts, field notes, observation records and other personal, research-
related documents.

A clear benefit of using secondary data is that much of the background work
needed has already been carried out: for example, literature reviews and case
studies might have been carried out, published texts and statistics could have
already been used elsewhere, and media promotion and personal contacts have also
been utilized.
This wealth of background work means that secondary data generally have a pre-
established degree of validity and reliability which need not be re-examined by the
researcher who is re-using such data.
Furthermore, secondary data can also be helpful in the research design of
subsequent primary research and can provide a baseline with which the collected
primary data results can be compared. Therefore, it is always wise to begin any
research activity with a review of the secondary data.
Secondary analysis or re-use of qualitative data
Qualitative data re-use provides a unique opportunity to study the raw materials of
the recent or more distant past to gain insights for both methodological and
theoretical purposes.
In the secondary analysis of qualitative data, the value of good documentation
cannot be overstated, as it provides necessary background and much-needed context,
both of which make re-use a more worthwhile and systematic endeavour. Indeed,
one could go as far as to claim that qualitative secondary data analysis “can be
understood, not so much as the analysis of pre-existing data; rather as involving a
process of re-contextualising, and re-constructing, data”.
Overall challenges of secondary data analysis
There are several things to take into consideration when using pre-existing data.
Secondary data does not permit the progression from formulating a research
question to designing methods to answer that question. It is also not feasible for a
secondary data analyst to engage in the habitual process of making observations
and developing concepts. These limitations hinder the ability of the researcher to
focus on the original research question.
Data quality is always a concern because its source may not be trusted. Even data
from official records may be unreliable because the data is only as good as the
records themselves, in terms of methodological validity and reliability.
Furthermore, in the case of qualitative material, primary researchers are often
reluctant to share “their less-than-polished early and intermediary materials, not
wanting to expose false starts, mistakes, etc.” [1].
So overall, there are six questions that a secondary analyst should be able to
answer about the data they wish to analyze.
1. What were the agency's or researcher's goals when collecting the data?
2. What data was collected and what is it supposed to measure?
3. When was the data collected?
4. What methods were used? Who was responsible and are they available for
questions?
5. How is the data organized?
6. What information is known about the success of that data collection? How
consistent is the data with data from other sources?

Advantages and Disadvantages of Using Secondary Analysis
Advantages
• Considerably cheaper and faster than doing original studies.
• You can benefit from the research of some of the top scholars in your
field, which for the most part ensures quality data.
• If you have limited funds and time, other surveys may have the advantage
of samples drawn from larger populations.
• How much you use previously collected data is flexible; you might only
extract a few figures from a table, you might use the data in a subsidiary
role in your research, or even in a central role.
• A network of data archives in which survey data files are collected and
distributed is readily available, making data for secondary analysis easily
accessible.
Disadvantages
• Since many surveys deal with national populations, if you are interested in
studying a well-defined minority subgroup you will have a difficult time
finding relevant data.
• Secondary analysis can be used in irresponsible ways. If variables aren't
exactly those you want, data can be manipulated and transformed in a way
that might lessen the validity of the original research.
• Much research, particularly of large samples, can involve large data files and
difficult statistical packages.

Practical use of Secondary Data
Secondary Use of Personal Information in Health Research: Case Studies,
November 2002
Executive Summary
The challenge facing Canada today is to reach a workable and practical balance
between the value Canadians place on the improvement of their health, the
effectiveness of their health care services and the sustainability of their health
care system, and the equally compelling value they place on their right to privacy
and confidentiality with respect to their personal information.
Health research, particularly in the areas of health services and policy, population
and public health, critically depends on the ready availability of existing data about
people. Such data include health surveys; hospital, physician and laboratory
records; provincial and federal billing and registration data; birth and death
records; socio-demographic data; cancer registry data; and employment records.
Large volumes of such data are generally needed in order to assemble unbiased
samples from which health researchers can draw meaningful conclusions that are
representative of populations.
The data are analyzed for the purposes of:
• monitoring the health of the population;
• identifying populations at high risk of disease;
• determining the effectiveness of treatment;
• quantifying prognosis and survival;
• assessing the usefulness of preventive strategies, diagnostic tests and
screening programs;
• informing health policy through studies on cost-effectiveness;
• supporting administrative functions; and
• monitoring the adequacy of care.
Health research based on the secondary use of data contributes to our present
level of understanding of the causes, patterns of expression and natural history of
diseases. It also helps us to assess the impact of strategies for improving
prevention, diagnosis and treatment and to evaluate policies for increasing the
effectiveness and economic efficiency of health services.
Indeed, in the present climate of major public concern about the quality and
sustainability of our health care system, health research is urgently needed to help
inform and guide health care reform.
While health research is of great social importance, Canadians also highly value
their rights to privacy and confidentiality. These rights are intimately connected
with the right to respect for one's dignity, integrity and autonomy in a free and
democratic society and are constitutionally enshrined in the Canadian Charter of
Rights and Freedoms (Part I of the Constitution Act, 1982, being enacted as
Schedule B to the Canada Act 1982, c. 11) and Quebec's Charter of Human Rights
and Freedoms (R.S.Q. c. C-12). Privacy and confidentiality lie at the root of
international and national ethics guidelines, as well as professional codes of
conduct.
Data protection legislation is rapidly emerging across the country with different
requirements applying either at the provincial, territorial or federal level, to
personal information generally or personal health information, in the private or
public sector. Yet health services and population health research often crosses
provincial or even national borders and requires access to general personal
information (e.g. income level, education, work history, etc.), as well as personal
health information (e.g. physician, laboratory, hospital records, registration and
billing data, etc.), that may be derived from either public or private sources. By
their very nature, therefore, these types of studies can potentially invoke multiple
laws with varying, and often inconsistent, requirements.
Despite the patchwork of existing legislation, most data protection laws are
generally modeled after the internationally accepted Guidelines on the Protection
of Privacy and Transborder Flows of Personal Data developed by the Organization
for Economic Cooperation and Development (OECD) in 1980. These guidelines
have since been adapted by Canadian businesses, consumer groups and
governments, under the auspices of the Canadian Standards Association,
reformulated into the Model Code for the Protection of Personal Information
CAN/CSA-Q830-96 (the 'CSA' Code) and more recently incorporated as Schedule 1
of the federal Personal Information Protection and Electronic Documents Act (S.C.
2000, c.5).
The CSA Code is based on ten fair information principles:
 Accountability
 Identifying purposes
 Consent
 Limiting collection
 Limiting use, disclosure and retention
 Accuracy
 Safeguards
 Openness
 Individual access
 Challenging compliance

Secondary Use of Personal Information in Health Research: Case Studies -
November 2002

It is in this context that CIHR created an ad hoc working group
composed of health researchers in order to provide real-life examples of research
involving the secondary use of data in Canada (see Appendix A for a list of
members and Appendix B for a description of the method followed). The objective
of this initiative is to foster dialogue:
 with those who draft policies and laws and those responsible for interpreting
them, by providing tangible illustrations of their practical application in the
health research context;
 among researchers about how to comply with the spirit of the fair
information principles and how to improve current information practices; and
 with privacy advocates and the broader public on the benefits and concrete
realities of health research.
Nineteen (19) case studies were developed to describe real-life examples of actual
research involving secondary use of data in Canada [2]. They attempt to outline why
each study is socially valuable and the benefits that have resulted, or may result.
They detail the kind of information health services and population health
researchers need and why. They explain how these researchers collect, use and
disclose data, what retention practices are followed, what security safeguards are
used and what review and oversight mechanisms are in place. When considered in
light of existing laws and ethics guidelines, these case studies highlight the
practical challenges that arise when applying various legal and ethical norms in the
specific context of population health and health services research. Accordingly, the
case studies identify a number of ethical and legal issues that warrant further
consideration and discussion.
An in-depth analysis of these case studies focused on a number of important
questions specific to health services and population health research, including:
i. Why do health researchers need to make secondary use of data?

ii. Why is it sometimes impracticable to obtain consent?

iii. Why do research databases need to be retained over a long period of time?
iv. Why do current access policies and linkage practices among data custodians
vary so?
v. What security safeguards do health researchers currently use?
vi. What review and oversight mechanisms are currently in place?
[2] Note: although case studies 5 and 6 involve secondary use of tissue originally
collected for another purpose, the present document does not purport in any way
to cover the complex issues specifically related to the collection and storage of
human tissue for research purposes. These warrant further development and
analysis that are beyond the scope of this project.
Secondary Use of Data
The case studies demonstrate that the ability to conduct health research,
particularly research on health services and population health, depends heavily on
large volumes of readily-accessible, existing data. Such data may include
information derived from: personal interviews; analyses of tissue samples; results
of scientific tests; physician, hospital and laboratory records; birth and death
records; billing claims and employee records. The case studies focus on examples
of research using data that were originally collected for another purpose
(secondary use of data). Such existing data are often found to be extremely useful
for identifying and understanding problems, as well as for providing potential
solutions.
Researchers who study health services or the health of populations rarely have
any direct interest in knowing the specific identities of the people they study. Their
focus is on aggregate trends. So, while personal information about identifiable
individuals may be the source of data, this type of research is conducted with
information that has either been made completely anonymous or has had as many

identifiers as possible removed and replaced with encrypted codes. Indeed, many
investigators conducting studies would not need any personal identifiers at all
were it not for the need to consider the effect of important individual
characteristics or to link data about individuals so as to construct histories over
time.
In some cases, therefore, the possibility of linking de-identified data to other
potentially identifying information remains crucial. This is necessary in order to:
 study the relationship between certain health determinants and health
status;
 group together individuals on the basis of common characteristics such as
age or geographic location; or
 track individuals over time in order to study the evolution of certain diseases
after long latent periods or to assess their progress through the continuum
of health care.
Researchers should implement deliberate strategies that make it impossible (or at
least extremely difficult) to determine the identity of an individual from the data
they use. Current practices for anonymizing, de-identifying and linking personal
information (whether carried out by the original data-holder before releasing the
data for research purposes or by the researchers themselves once in possession of
the data) tend to vary significantly according to what is considered 'identifiable'.
The ongoing challenge will be to reach agreement on what constitutes an
appropriate degree of identifiability, recognizing that this concept will continue to
evolve. Approaches for de-identifying and linking data need to achieve greater
consistency to streamline efforts for meeting and continually improving best
practices.
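The idea of replacing identifiers with codes while keeping records linkable can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the case studies: the field names (name, health_card_no), the secret key and the helper functions are all hypothetical, and a keyed hash (HMAC) stands in for whatever coding scheme a custodian actually uses.

```python
import hashlib
import hmac

# Hypothetical secret held only by the data custodian (assumption for this sketch).
SECRET_KEY = b"custodian-only-secret"

# Hypothetical list of fields treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "health_card_no"}


def pseudonym(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed code that cannot be reversed without the key."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def de_identify(record: dict, id_field: str = "health_card_no") -> dict:
    """Drop direct identifiers and put a study code in their place."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["study_code"] = pseudonym(record[id_field])
    return cleaned


def link(left: list, right: list) -> list:
    """Join two de-identified data sets on the shared study code."""
    index = {row["study_code"]: row for row in right}
    return [
        {**row, **index[row["study_code"]]}
        for row in left
        if row["study_code"] in index
    ]


# Invented example records from two different sources.
hospital = [{"name": "A. Example", "health_card_no": "1234567890", "diagnosis": "asthma"}]
billing = [{"name": "A. Example", "health_card_no": "1234567890", "visits": 3}]

linked = link([de_identify(r) for r in hospital], [de_identify(r) for r in billing])
print(linked)
```

Because the same identifier always yields the same code, records about one person can be combined across sources without the researcher ever seeing the name or card number. Whether such coded data count as 'identifiable' is exactly the question the text says remains open.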

Consent
In clinical research studies, researchers directly interact with potential participants

in well-defined protocols and provide them with the detailed information required
for obtaining their informed consent. However, strict application of traditional
consent procedures in health services and population health research
raises problematic issues, particularly for retrospective studies that rely on
already-existing, historical or archival data, including sample survey data. Among
the factors that often make seeking consent impracticable, impossible or self-
defeating in these particular types of studies are:
 the sheer size of the populations studied;

 the proportion of individuals who may have since relocated or died;
 the risk of introducing potential bias through the consent procedure itself
thereby affecting the generalizability and validity of research results;
 the creation of even greater privacy risks by having to link otherwise de-
identified data with nominal identifiers in order to communicate with
individuals so as to seek their consent;
 the risk of inflicting psychological, social or other harm by contacting
individuals and/or their families in delicate circumstances;
 the difficulty of contacting individuals directly when there are no ongoing
relations with them;
 the difficulty of contacting individuals even indirectly through public means
such as advertisements and notices; and
 the undue hardship that would be caused by the additional financial,
material, human, organizational or other resources required to obtain
individual consent.

As regards prospective data collection (for inclusion in a registry, for example),

obtaining express consent for future research purposes also poses a challenge. On
the one hand, obtaining specific consent for all possible secondary uses of the
information, which often cannot be predicted at the time of collection, is not
feasible. On the other hand, obtaining unqualified, blanket consent for undefined
future health research purposes is empty and meaningless and may sometimes
reduce, rather than increase, privacy protection on false assurances that informed
consent has been obtained.
The case studies demonstrate the need for constructive, creative and innovative
ways of respecting people's right to know and to control how their information is
used without necessarily requiring that express consent be obtained from every
individual in every instance. The case studies demonstrate the need to develop
appropriate alternatives to the traditional consent model, specifically for
population health and health services research, taking into account the overall
balance of risks and benefits both to individuals and society as a whole. These
alternatives do not in any way abrogate the obligations to ensure, among other
things, that:
 an open, transparent and accountable process is implemented for managing

privacy;
 the appropriate confidentiality agreements are in place binding users of the
information; and
 effective safeguards are taken to protect the data against unauthorized
disclosure.
Retention and Destruction of Data
The existence of data archives and registries, and the retention of data by
researchers more generally, make it possible to reconstitute the data if and when
needed, since researchers are often required to retain data for possible verification

and auditing purposes (though these requirements tend to vary among sponsors
and/or publishers). They also enable the expansion of the research question to
eventually incorporate additional dimensions or examine related hypotheses in the
future. Finally, in unique cases, they allow the identification of potentially affected
patients in order to notify them about possible long-term risks of contracting fatal
diseases or experiencing adverse effects that were unknown at the time of certain
interventions (e.g. risks of contracting Creutzfeldt-Jakob disease in human growth
hormone trials, contracting HIV in hepatitis B trials or experiencing adverse effects
from certain vaccines).
The automatic destruction of data and/or all possible identifiers immediately upon
the fulfillment of each discrete research purpose would prevent these and other
important subsequent uses. Furthermore, the destruction of large databases would
result in a huge waste of valuable public funds. Having to re-create new data
archives for each new research project would be completely impossible and/or
entirely cost-prohibitive. Creative means need to be further explored to assess
under what conditions databases should be retained in the long-term and if so,
how they should be secured (for instance kept in the hands of trusted guardians,
subject to formal periodic audits and proper oversight).
Data Access Policies and Capacity for Data Linkage
Access to existing data for purposes of carrying out health services or population
health research often involves many different data stewards or custodians. These
may include hospitals, public health clinics and laboratories, physicians' offices,
research centres, pharmacies, employers, registries, health information producers,
and federal/provincial/territorial/municipal government departments and
agencies.
The case studies reveal a significant degree of variation in the access policies of
these different custodians. Data access policies may involve research agreements

that custodians require researchers to sign before releasing any data to them.
Whether such agreements are more or less sophisticated and/or detailed in
content and conditions is often dictated by different legislative requirements that
vary across institutional settings, sectors and jurisdictions. Data access policies
may also require, as a condition for releasing data, review and oversight by
designated bodies. Some custodians may require approval by their own internal
oversight body, in addition to any other review and oversight mechanism(s) that
may be required by law. The criteria by which these multiple oversight bodies will
consider data access requests will depend, once again, on any internal criteria
specific to the custodian and whatever other criteria are imposed by legislation
across institutional settings, sectors and jurisdictions.
Moreover, the ability of custodians to perform data linkage in-house for

researchers will largely depend on their capacity and resources, which also tend to
vary quite significantly. Custodians that can perform data linkage completely in-
house rarely need to release any identifying information to researchers. However,
custodians that do not have the capacity to perform this service on behalf of
researchers are often requested to release identifying information in order for
researchers to be able to carry out the linkage themselves. Clearly there is a need
to harmonize legislative requirements and data access policies, accommodating
different capacity levels across institutions, in order to better streamline requests
for access to information for research purposes.
Security Safeguards
The case studies reveal a wide range of security safeguards that are commonly
used in the context of research, including:
 organizational safeguards, such as limited personnel access, security

clearance and employee confidentiality agreements;
 physical safeguards, such as locked rooms, filing cabinets and facilities; and

 technological safeguards, such as data encryption, special algorithms,
passwords, access codes, tracking features, firewalls, etc.
Further options for protecting personal information are multiplying rapidly with
advances in computing technology and a spectrum of new solutions is emerging.
More sophisticated techniques are increasingly available for authenticating
individual users and limiting access to only the minimal data needed in the most
general form possible, thereby ensuring confidentiality of the data while also
retaining their usefulness for research purposes. The challenges now lie in: better
disseminating information about existing security systems and processes;
developing a set of minimum standards; ensuring greater consistency in the
application of minimum standards; and, continually reviewing, updating and
adapting those standards as technology advances. There is clearly a need to
identify best practices that are pragmatic, cost-effective and sufficiently flexible to
evolve over time and accommodate different research methods.
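The principle of releasing only the minimal data needed in the most general form possible can be illustrated with a short Python sketch. The field names, the ten-year age bands and the three-character postal prefix are assumptions chosen for the example, not standards drawn from the case studies.

```python
def age_band(age: int, width: int = 10) -> str:
    """Replace an exact age with a band such as '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize(record: dict) -> dict:
    """Keep only the fields a study needs, each in a coarser form."""
    return {
        "age_band": age_band(record["age"]),
        # Keep only the first three characters of the postal code (assumed rule).
        "region": record["postal_code"][:3],
        "diagnosis": record["diagnosis"],
    }


# Invented example record held by a custodian.
record = {
    "name": "A. Example",
    "age": 47,
    "postal_code": "K1A 0B1",
    "diagnosis": "asthma",
}

released = generalize(record)
print(released)
```

The released record still supports analysis by age group and region, but the name, exact age and full postal code never leave the custodian.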
Review and Oversight Mechanisms
Health services and population health studies conducted in universities and

affiliated institutions are typically reviewed by research ethics boards (REBs). REBs
are composed of specialized experts in research, ethics and law, as well as lay
members of the community. REBs are close to the ground and sensitive to local
needs and values. They play a critical role in ensuring the protection of individual
privacy, within a larger ethical framework, in accordance with fundamental
principles of:
 respect for human dignity;

 respect for free and informed consent;
 respect for vulnerable persons;
 respect for privacy and confidentiality;
 respect for justice and inclusiveness;

 balancing of harms and benefits;

 minimizing harm; and
 maximizing benefit.
In Canada, only those research studies approved by REBs are eligible for funding
by the three federal granting agencies and/or regulatory approval under the Food
and Drugs Act (R.S.C. c. F-27). In particular cases involving secondary use of data
or proposed data linkages, academic REBs consider, among other factors: the
sensitivity of the information involved; the possibility of identifying particular
individuals; the magnitude and probability of harm or stigma resulting from
identification; the context in which the information was originally collected; the
possibility of obtaining consent; the appropriateness of using alternative strategies
for informing participants and/or consulting with representative members of the
study group; as well as any legal provisions that may apply in the situation. In
their review, REBs apply a proportionate approach in balancing risks and benefits
and modulate their requirements accordingly. (Canadian Institutes of Health
Research, Social Sciences and Humanities Research Council, Natural Sciences and
Engineering Research Council, Tri-Council Policy Statement: Ethical Conduct for
Research Involving Humans, August 1998).
Areas for further improvement include: strengthening privacy expertise and

education of REB members particularly in light of rapidly evolving technology and
emerging legislation; ensuring adequate resources for REBs to meet their mandate
for continuing review, monitoring and periodic audits; increasing public
accountability and transparency of REBs; and, further exploring the relationship
between REBs, privacy commissioners and other oversight bodies. Indeed, in some
provinces, legislation requires that privacy commissioners or special privacy
committees designated by law also approve (or at least be notified of) the
proposed research or data linkage. As discussed above, various data custodians
may, in addition, require review and approval by their own internal data access
committees. The challenge, therefore, will be to align these various review and
oversight bodies to ensure complementary forms of meaningful protection rather
than imposing unnecessary and duplicative hurdles.
Conclusion
In summary, the case studies provide examples of how researchers who use
secondary data attempt, in various ways, to comply with the spirit of the
fair information principles contained in the CSA Model Code. They suggest that
there is a need to further develop creative, effective and innovative mechanisms
for protecting privacy and confidentiality of data, as well as the need for ongoing
discussion and continual improvement of best practices. The CIHR case studies
further provide concrete illustrations of the importance of interpreting and
applying privacy laws and policies in a flexible, feasible and workable manner in
order to permit the valuable social benefits of health research to continue. The
case studies suggest that the health research community should work actively with
privacy advocates, consumers and the general public to identify and implement
strategies for balancing the right of individuals to have their personal information
protected and their desire for improved health, more effective health services and
a strengthened and sustainable health system.

Glossary
Term Definition
A
ACORN A classification of residential neighborhoods; a
marketing segmentation system that enables
consumers to be classified according to the type of
area they live in, usually just by using the consumer's
postcode.
Ad Hoc In basic terms this means 'as and when' required.
Generally applied to single surveys which are
designed as a 'one-off' rather than continuous
ongoing research.
Ad Testing Closely related to promotion testing, ad testing
(a.k.a. advertisement testing) refers to various
methodologies that focus exclusively on gauging the
perception, effectiveness, or targeting of
advertisements in a market, be they single adverts or
a series. The major difference between ad and
promo testing is that promo testing is usually carried
out on a larger scale to measure the reaction to a
campaign.
Ad testing can deal with adverts of all types across

the spectrum and can be employed at any stage
throughout the advertisement development process.
The most frequent use for ad testing is to identify
the most effective advert(s) at the prototype stage
which helps in turn to eliminate the likelihood of
expensive and ineffective campaigns. Furthermore,
it tests the appropriateness of an advert to its
audience.
Some use the method to measure an advert's
competitiveness against a major competitor's. In
this case researchers ask respondents to read
magazines or papers or ask them to watch a
television program with an advertisement break and
then ask the respondents to recall the adverts that

caught their attention or ask them for key

information that the adverts contained. A deeper
way of using this method is by conducting face-to-
face interviews or using focus groups to gain
consumer feedback to potential adverts or
advertising campaigns.
Researchers aim to analyze the ability of the
respondents to recall information, the effect that the
advert has on a respondent, its persuasive power,
whether respondents identify with the given setting
in the advert, understanding of the appeal, and the
respondents’ perception of the brand in question.
Agency An agency can be any individual, organization,
department or division (which includes those
belonging to the same organization as the client)
responsible for or acting on all or part of a market
research project.
Analysis This is the processing of data collected (which can
be qualitative or quantitative data) within a market
research project, which allows us to draw summaries
and conclusions in relation to the project.
B
Balanced Scale Used mainly in questionnaire formats, a balanced
scale is one on which the number of positive and
negative categories are equal, leaving the
respondent with a fair decision.
Bar Chart A chart which displays data in the form of bars
either vertically or horizontally with the length of the
bars relating directly to their value.
Bivariate Analysis This is the term for analysis in which two variables are
examined at the same time, usually to assess the relationship between them.
Brand A brand is a specific name, symbol or design which
is used to distinguish a particular seller's product
from any other.
Business to Business Research (B2B) B2B research is research that relates to businesses
rather than consumers. Whereas in the case of
consumer research, respondents could be asked for
their opinions on a certain product or service, B2B
is concerned solely with business-related issues.
C

CAPI CAPI stands for computer assisted personal
interviewing. It is performed with the respondent
face-to-face with the interviewer, using laptop
computers or other such devices (e.g. iPAQs). The
interviewer uses the computer to prompt the
questions and the interviewee's responses are
immediately inputted back into the computer which
enables the program to select the next appropriate
question. This enables the interview to pursue many
different, yet relevant, routes depending upon the
responses given by the interviewee. This form of
surveying is particularly effective because the data
is already on the computer, so the
analysis is quicker than with other methods.
CATI CATI stands for computer assisted telephone
interviewing. It is similar to CAPI but conducted over
the phone, rather than face to face.
CAWI CAWI stands for computer assisted web
interviewing. This is based on the same principle as
the two above, except that it is conducted over
the internet.
Census This is a study which incorporates every available

element within a defined population.
CHAID Used in the analysis of data, CHAID stands for chi-
square automatic interaction detection. It identifies
market segments that differ in response, screens out
extraneous predictor variables and collapses the
number of categories of explanatory variables.
Chi-Square Test Used to compare the distribution of a nominal
variable observed in a sample with the distribution
hypothesised for the population. The method is used in both univariate and
bivariate analysis.
Client A client is anybody, be it an individual, organization,
department or division, who is responsible for the
commissioning of, or who agrees to subscribe to, a
market research project.
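As an illustration of the Chi-Square Test entry above, the following Python sketch computes the chi-square statistic for one nominal variable (the univariate case). The category counts are invented for the example, and the function name is a hypothetical helper, not part of any library.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Invented counts: brand preference among 100 survey respondents.
observed = [50, 30, 20]

# Counts we would expect if the sample matched the hypothesised population split.
expected = [40, 40, 20]

statistic = chi_square_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1

print(f"chi-square = {statistic}, degrees of freedom = {degrees_of_freedom}")
```

The statistic would then be compared with the critical value from a chi-square table for the given degrees of freedom; a larger statistic suggests the sample differs from the hypothesised distribution.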
D
Data Observations or measurements made that apply to
an area of the marketing system.

Data Editing During the process of data editing, the data

collection instruments are tested to ensure that the
maximum accuracy in terms of results has been
attained. Checks are made on legibility, consistency
and completeness and above all the process aims to
avoid ambiguity.
Data Processing Data processing is the inputting of data into a
database format. Once in this format, data can be
tabulated and conclusions drawn as to the results of
the market research findings.
Data Sources There are four main sources of marketing data.
These are respondents, analogous situations,
experimentation and secondary data. For more
information on any of these sources, see glossary
definitions.
Depth Research A term which is used to describe a multitude of
data-collection techniques and procedures although
it is usually used in qualitative research with
individuals.
Desk Research This is a research technique which involves the
market researcher collating and drawing together
secondary sources of information. These secondary
sources can serve as complementary to primary
sources or can be collated to form part of a larger
unrelated project.
Diary Methodology Diary methodology is a procedure which is used to
compile consumer purchase data or media habits.
During this procedure, respondents will be required
to complete a written report which details their
behavior over a certain period.
E
Editing Editing is the procedure that takes place in order to
ensure that a questionnaire has been correctly
completed.
Elasticity Elasticity measures how much volume shifts in
response to a change in the variable under
scrutiny.
Element Sampling During this procedure, every unit in a specific
population has an equal probability of being chosen
for research.

Enabling Technique This is a technique that is used within a qualitative

interview that allows the respondent to articulate
their views.
Estimate An estimate is an assumption based upon a smaller
sample that is then applied as a hypothesis for a
whole population.
Experimental design This is a test whereby the researcher performing it
has control over one, or a few independent variables
and manipulates them to retrieve the most relevant
data.
Exploratory Research Used in the early stages of the decision-making
process, exploratory research is used to assess the
situation in hand with the minimum cost and time
possible. The process must be flexible due to the
unknown quantities that may be encountered.
Versatility and a wide-ranged approach to the
preliminary investigation are key. The exploratory
research can draw on interviews, observations,
group interviews, secondary data sources and case
histories.
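The Element Sampling entry in this section describes a draw in which every unit has an equal probability of being chosen. A minimal Python sketch of such a draw, using the standard library and an invented list of households, might look like this; the helper name and the seed are assumptions made for the example.

```python
import random


def simple_random_sample(population, size, seed=None):
    """Draw `size` distinct units, each equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, size)


# Invented sampling frame of 100 households.
population = [f"household_{i}" for i in range(1, 101)]

sample = simple_random_sample(population, 10, seed=42)
print(sample)
```

Fixing the seed is optional; it is used here only so that the same sample can be reproduced when the selection needs to be checked.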
F
F Test An F Test explores the probability that a specific
result may have been down to chance and therefore
inconclusive.
Face Validity This is a simple test which ensures that a
measurement or research actually targets the areas
that it should.
Factor Analysis Factor analysis of data identifies the underlying
factors that explain the correlation among a set of
variables. It is used to identify a new, smaller set of
uncorrelated variables to replace the original set of
correlated variables. It also identifies a smaller set
of salient variables for use in subsequent research.
It is used in various types of research including
product research, customer satisfaction, market
segmentation and consumer profiling.
False Accuracy False accuracy occurs when a collection of data gives
the impression of being accurate when in reality,
only a low degree of accuracy exists. This problem
can arise when results are being rounded to fit a
pattern (e.g. rounded to decimal places).

Fieldwork Fieldwork can be conducted by observation, surveys

(such as face to face interviews, telephone
interviews and web interviews) or experiments. It is
the basic term for the live collection of primary data
from external sources. Fieldwork is either co-
ordinated by an in-house fieldwork department
within a market research agency or an external
fieldwork company. Once fieldwork has been
conducted data processing is usually the next step.
Focus Groups (aka Group Discussions) In a focus group, respondents (normally between 8 and
10 people) are gathered together in order to gauge
their responses to specific stimuli. Groups are guided
by a research moderator who often uses a topic
guide to control the discussion to ensure it meets
the initial research objectives. The data generated is
probably most applicable to exploratory work. The
technique falls under the broad category of
qualitative research.
Freelance Market Researcher These self-employed market researchers usually
work alone. They would typically have many years'
experience in the field of work and are contracted by
market research agencies and consultancies in busy
periods, effectively as an extra resource. In many
cases they help out with qualitative research
techniques (for example conducting depth interviews
or group discussions) but they can also be utilized
for quantitative research projects.
Frequency Frequency measures the number of times that an
incident or event occurs. Used in data-analysis it
details the number of results that fall in the various
categories.
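A frequency count of the kind described in the Frequency entry can be sketched in Python with the standard library. The survey responses below are invented for the example.

```python
from collections import Counter

# Invented answers to a single survey question.
responses = ["agree", "disagree", "agree", "neutral", "agree", "disagree"]

# Count how many results fall in each category.
frequencies = Counter(responses)

for category, count in frequencies.most_common():
    share = count / len(responses)
    print(f"{category}: {count} ({share:.0%})")
```

Reporting the share alongside the raw count makes categories comparable across samples of different sizes.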
G
Geographic Geographic refers to any of the various methods
used to sub-divide a list based upon geographic or
political boundaries.
Grid Test This is a method in which two variables can be
tested for data at any one time.
Group Discussions (aka Focus Groups) In a focus group, respondents (normally between 8 and
10 people) are gathered together in order to gauge
their responses to specific stimuli. Groups are guided
by a research moderator who often uses a topic

guide to control the discussion to ensure it meets

the initial research objectives. The data generated is
probably most applicable to exploratory work. The
technique falls under the broad category of
qualitative research.
Group Dynamics Group dynamics refer to the ways in which people
interact and relate to one another in a group
situation. A successful moderator will be able to
manage these dynamics to produce a relevant and
informative discussion by employing a variety of
different techniques, including leading
the conversation away from trivial subjects when
needed.
Growth Rate Usually measured as a percentage, growth rate
records the changing size of a population by plotting
the change from a specific point in time. It is
calculated by dividing the total increase/decrease in
population during a set period by the average
population during that period.
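The calculation in the Growth Rate entry can be written out as a short Python sketch. One assumption is made that the entry does not state: the average population for the period is taken as the mean of the starting and ending populations. The figures are invented.

```python
def growth_rate(start_population: float, end_population: float) -> float:
    """Percentage growth: change in population divided by average population."""
    change = end_population - start_population
    # Assumption: average population is the mean of the start and end values.
    average = (start_population + end_population) / 2
    return change / average * 100


# Invented example: a market grows from 10,000 to 10,500 customers in a year.
rate = growth_rate(10_000, 10_500)
print(f"Growth rate: {rate:.2f}%")
```

A shrinking population gives a negative rate, which matches the entry's reference to a total increase or decrease.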
H
Hall Test During this procedure, a number of respondents are
invited to attend a fixed location (traditionally a hall
from which the term originates) and then are asked
to respond to stimuli, usually individually.
Hedonic Scale This scale is used to measure the generalized overall
views and opinions of a product or service.
Heteroscedasticity Heteroscedasticity is the term used to express a
non-constant variance during regression analysis.
Homogeneous Groups Groups in which the people or objects are extremely
similar if not identical in their characteristics or
behavior.
Homoscedasticity The opposite of heteroscedasticity. It is used to
describe a state of constant variance in regression
analysis.
Household The generic term used to describe every person
living in one unit of housing.
I
In-Depth Interview The in-depth interview is a method used primarily in
qualitative research and is carried out with
individuals (one-on-one), rather than in a group
discussion scenario. The interview takes place between a
researcher and a respondent (the interviewee). The
researcher normally uses a topic guide to steer the
discussion, which usually lasts between 30 minutes
and an hour. The interview helps the researcher to
gain detailed, in-depth information.
Incentive This is the payment made to a respondent in return
for their participation in a research project. The
incentive varies with the complexity and duration of
the research.
Internet Research The use of internet research has grown massively
over the last few years due to the increasing
popularity of the internet for both business and
leisure purposes. The research itself can be
qualitative and quantitative. Quantitative internet
research can be performed as a web-survey where
the respondents reply to questionnaire-based
emails. This means that with the click of a mouse,
the results are back in the inbox of the researcher
ready for analysis. Qualitative research can be
carried out for example by setting up group forums
on the internet, or by setting up group discussions
using web-cams, thus mimicking the conventional
group-discussion format.
Interview An interview is the general term for the method
used to draw information from a respondent. It can
take several guises, e.g. face-to-face, in-depth,
telephone interview, group discussion etc.
J
Judgement Sample The term used for a sample which consists of
respondents who have been selected with the
understanding that their opinions, behavior,
characteristics and so on, will be representative of
the population as a whole.
K
Key Verifying This is the process by which two people input the
same data to avoid the distortion of data at the data
input stage.
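As a rough illustration of key verifying, the sketch below compares two independently keyed copies of the same records and lists every field where they disagree. The function name, the record layout and the sample values are all hypothetical.

```python
def key_verify(first_entry, second_entry):
    """Return (row_number, field, value_1, value_2) for every disagreement
    between two independently keyed copies of the same records."""
    if len(first_entry) != len(second_entry):
        raise ValueError("both operators must key the same number of records")
    mismatches = []
    for row_number, (rec_1, rec_2) in enumerate(zip(first_entry, second_entry), start=1):
        for field in rec_1:
            if rec_1[field] != rec_2.get(field):
                mismatches.append((row_number, field, rec_1[field], rec_2.get(field)))
    return mismatches


# Hypothetical questionnaire data keyed by two different operators.
operator_a = [{"id": 1, "age": 34, "score": 4}, {"id": 2, "age": 51, "score": 2}]
operator_b = [{"id": 1, "age": 34, "score": 4}, {"id": 2, "age": 15, "score": 2}]

print(key_verify(operator_a, operator_b))  # [(2, 'age', 51, 15)]
```

Any mismatch is then checked against the original questionnaire before the data goes forward to analysis.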
L

Laboratory Experiments This term relates specifically to experiments that are
carried out in a controlled environment.
Likert scale The Likert Scale allows the respondent to specify an
opinion on statements which relate to the subject
being researched.
Longitudinal Research Longitudinal research is used in most instances
where continuous performance monitoring is taking
place which consists of repeatedly analyzing a fixed
sample of population elements.
M
Market Research Brief Also known as a research brief, this is a basic
plan which guides the data collection and analysis
phases of the research project. It acts as a
framework which details the type of information to
be collected, the data sources and the data
collection procedure.
Mean A simple average value of a set of data.
Median Another form of averaging data, the median value is
the middle value of a set of data when the data is
arranged in order of magnitude.
(e.g. 1, 2, 3, 4, 5, 6, 7 - four would be the median.)

Mode An average which is based on identifying the single
most frequent value in a collection of data.
(e.g. 2, 3, 2, 1, 6, 5, 2, 3 - two would be the mode.)
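The three averages defined in the Mean, Median and Mode entries can be checked with Python's standard `statistics` module. This short sketch reuses the two small data sets given in the entries; the variable names are illustrative only.

```python
import statistics

median_example = [1, 2, 3, 4, 5, 6, 7]    # data from the Median entry
mode_example = [2, 3, 2, 1, 6, 5, 2, 3]   # data from the Mode entry

mean_value = statistics.mean(mode_example)        # sum 24 divided by 8 values
median_value = statistics.median(median_example)  # middle of the ordered values
mode_value = statistics.mode(mode_example)        # most frequent value

print(f"mean = {mean_value:.2f}")   # mean = 3.00
print(f"median = {median_value}")   # median = 4
print(f"mode = {mode_value}")       # mode = 2
```

When a data set has an even number of values, `statistics.median` returns the midpoint of the two middle values.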
Modelling / Simulation The application of specific assumptions to a number
of variable factors and the relationships that exist
between those factors. Usually used in hypothetical
situations, the models can be verbal, mathematical
or graphical.
Monadic Rating This is where a respondent is asked to rate an item
on some form of scale without comparison to
another product or service.
Multiple-Choice Question A question which the respondent may only answer
from a predetermined selection of responses.
N
Nominal Scale A scale of measurement in which numbers do not
represent value; instead they are used to label or
categorize an object or an event.
Non-Balanced Scale A non-balanced scale is one which is unfairly
weighted towards one end of the response spectrum
and can often cause response bias.
Non-Probability Sample This is a sample which has been chosen without
random selection, so it cannot be assumed to
represent the wider population.
Non-Sampling Error A non-sampling error refers to any cause of bias or
inaccuracy with the results that has been caused by
anything other than the sampling.
O
Observation Observation is the collection of data through non-
verbal means which can stand alone as research on
its own or complement other relevant research. It is
the process of recognizing and recording relevant
objects and happenings.
Observational Methods These allow behavior to be recorded as it takes
place, eliminating the errors associated with the
recall of behavior. This allows the data to be more
precise, with less margin for error.
Omnibus Surveying An omnibus survey covers a variety of topics,
usually for different clients. The majority of the
samples used are generally nationally representative
and are made up of people who are in general
demand. The charges for this element of research
are based upon the size of the questionnaire
produced in either space or questions required. It is
a form of cost effective quantitative research -
clients can share much of the cost of the survey.
Questions normally cost around £1,000 each.
Open-Ended Question A question which has no predetermined set of
responses; the respondent answers in their own words.
P
Panels A panel is a long-standing sample that is retained by
a market research agency and from which data can be
obtained. It is most useful for continuous research
whereby the same set of respondents is used on a
continuous basis over time.
Performance-Monitoring Research Once a marketing plan has been
implemented, performance-monitoring research monitors
the progress of the plan. It checks that the
program is progressing as planned, since a deviation
from the initial projected route can lead to the plan
being carried out improperly or the pattern of
results changing inexplicably.
Pie Chart A circle which is divided into segments which vary in
size relating to the value which they represent.
Postal Research Postal research is carried out by sending
questionnaires or journals, which are to be
completed and returned, through the post. It is
perhaps the most traditional form of data collection.
The main drawback with postal research is response
rates - it is difficult to achieve a high response rate.
Projective Technique These are used by moderators or research
interviewers in interview situations to enable the
respondent to distance themselves from their
personal views. This enables the researcher to obtain
'the real views' that perhaps wouldn't be revealed
without using the technique.
Q
Qualitative Research Qualitative research is in-depth research performed
on a small scale to provide detailed results
and data. Qualitative research can be performed
over the phone, via group discussions or through
one-to-one interviews. Discussions are normally
aided by using a 'topic guide' - this outlines the basic
structure of the interview/group and indicates the
general direction in which the interview/group
should be led. Questions are open in nature, as
opposed to the closed questions which are used in
quantitative research.
Quantitative Research Quantitative research is performed on a far larger
scale compared with qualitative research (in terms
of the sample size) and helps to provide accurate
statistical data from which conclusions can be
drawn. Questions tend to be closed as opposed to
open.
Questionnaire This contains a group of questions and is used as an
investigative market research tool in order to gain
information from a respondent. It is used mainly in
quantitative research. Questionnaires normally
contain closed questions but open questions can also
be included.
Quota Sample A quota sample is one which has been selected to
represent the population, sometimes incorporating
certain control characteristics.
R
Research Design Also known as a market research briefing, this is a
basic plan which guides the data collection and
analysis phases of the research project. It acts as a
framework which details the type of information to
be collected, the data sources and the data
collection procedure.
Research Report The research report is the compilation of findings
from a piece of research. These findings are
normally presented in the form of a report or a
PowerPoint document.
Respondent A respondent is the person whose views and
opinions are required by the researcher.
Response Bias Response bias is the term for inaccurate response
data which could have been caused by a number of
factors such as monotony, tiredness, aiming to
please etc.
S
Sample In research terms, a sample refers to a group of
interviewees or respondents who are chosen to
represent the population as a whole. The sample
provides the data within the market research
project.
Secondary Research Secondary research utilizes data sources that
already exist. The technique used to collect the
information is called desk research (for a definition
please see desk research). Using the web or visiting
libraries would be two forms of desk research.
Semiotics Used in research, semiotics is a type of social
description and analysis which places specific
emphasis upon an understanding and exploration of
the cultural context within which the particular work
is taking place.
Social Grade/Socio-Economic Grouping Socio-economic groupings are a
way of separating various sections of society. There are six
social grades in the UK socio-economic grouping system.
These are:
• A: Upper middle class. Higher-level managers,
administrators and professionals.
• B: Middle class. Intermediate managers,
administrators and professionals.
• C1: Lower middle class. Supervisory or clerical
workers and junior managers, administrators
and professionals.
• C2: Skilled working class. Skilled manual
workers.
• D: Working class. Semi-skilled and unskilled manual
workers.
• E: Those at the lowest level of subsistence.
State pensioners or widows (no other
earnings), casual or lowest-grade workers.
Stratified Sampling A sampling procedure which is carried out in two
stages. First, the population is divided into separate
groups (strata) that are mutually exclusive and
collectively exhaustive; then a random sample is
taken from each group.
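The two stages of this sampling procedure can be sketched as follows. This is a minimal illustration using only the standard library: the function name, the `region` field and the 10% sampling fraction are assumptions made for the example, not part of the glossary.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, fraction, seed=0):
    """Draw the same fraction at random from every group (stratum).

    stratum_of is a function that returns the group label of a unit, so
    each unit falls into exactly one group and every unit is covered.
    """
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    # Stage 1: divide the population into groups.
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    # Stage 2: take a random sample from each group.
    sample = []
    for name in sorted(strata):
        members = strata[name]
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 60 urban and 40 rural respondents.
respondents = [{"id": i, "region": "urban" if i < 60 else "rural"} for i in range(100)]
sample = stratified_sample(respondents, lambda r: r["region"], fraction=0.1)

urban = sum(1 for r in sample if r["region"] == "urban")
rural = sum(1 for r in sample if r["region"] == "rural")
print(len(sample), urban, rural)  # 10 6 4
```

Because each group is sampled in proportion to its size, the sample keeps the 60/40 split of the population.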
Survey The systematic gathering, analysis and elucidation of
information about any aspect of study. In reference
to market research it specifically applies to the
gathering of information through processes such as
sampling or interviews with selected candidates,
respondents or interviewees.
T
T-Test When a sample is too small to use a Z-test to
analyze the results, a T-test is used to analyze a
single mean value.
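As a sketch of what a T-test on a single mean involves, the statistic is the difference between the sample mean and the hypothesized mean, divided by the standard error estimated from the sample. The function name and the scores below are hypothetical; turning the statistic into a significance level needs t-distribution tables or a statistics package, which this example leaves out.

```python
import math
import statistics


def one_sample_t(sample, hypothesized_mean):
    """Return the t statistic and degrees of freedom for a single mean."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    sample_mean = statistics.mean(sample)
    sample_sd = statistics.stdev(sample)       # uses n - 1 in the denominator
    standard_error = sample_sd / math.sqrt(n)
    t = (sample_mean - hypothesized_mean) / standard_error
    return t, n - 1


# Hypothetical small sample of five ratings, tested against a mean of 5.0.
scores = [5.1, 4.9, 5.6, 5.8, 6.0]
t_stat, df = one_sample_t(scores, hypothesized_mean=5.0)
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```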

Tabulation Tabulation is simply placing the results and data
obtained from research into data tables. (See cross
tabulation)
Target Population The basic term for the population that is being
studied.
Test Market A test market is used as a trial market for a new
product or service.
Time series analysis During this form of analysis, data is recorded at set
intervals in time.
Topic Guide A topic guide is a brief guide which is used by a
researcher in qualitative research interviews or
group discussions. It enables the interview to retain
relevance and ensures that the project meets the
initial research objectives.
Tracking Tracking refers to studies that monitor consumer
behavior towards a product, brand or service over a
period of time and is carried out over a
longer timescale than ad-hoc research.
Trimmed Mean The trimmed mean is obtained by discarding a fixed
percentage of the highest and the lowest values in
the data and then finding the mean of what remains.
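A trimmed mean is usually computed by dropping a set proportion of the smallest and largest values before averaging the rest. The sketch below assumes that reading; the function name and the data are hypothetical.

```python
def trimmed_mean(values, proportion):
    """Mean of the values left after dropping `proportion` of the
    observations from each end of the ordered data."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be at least 0 and below 0.5")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)   # number dropped from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical data with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(sum(data) / len(data))     # 14.5 (ordinary mean, pulled up by 100)
print(trimmed_mean(data, 0.10))  # 5.5  (1 and 100 dropped first)
```

The comparison shows why the trimmed mean is used: a single extreme value has far less influence on it than on the ordinary mean.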
Two Way Focus Groups This is where one focus group observes another and
then discusses what they have taken from the
observation.
U
Unbiased Estimator An estimator is unbiased when the mean of its
sampling distribution is equal to the true value of
the parameter being estimated.
Unbiased Samples These are samples that derive from an unbiased
source thus ensuring that any sampling error is a
result of the randomness and nothing else.
Unidimensional scaling These are procedures that are designed purely to
measure only one single attribute of an object or
respondent.
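Unipolar This is an ordinal scale which runs in one direction,
from the absence of an attribute at one end to its
maximum at the other, rather than between a negative
and a positive pole.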
Unstructured observation Unstructured observation is where a study takes
place with an observer or moderator simply
watching and taking notes on the behavior on
display.
Unstructured Segmentation This is the process of market segmentation which
uses data and analysis when no previous ideas or
views are held regarding the segmentation of that
particular market in any way.
V
Validation A post-research essential; validation is the process
by which a respondent verifies that the interview
was conducted correctly.
Validity Validity in reference to market research is concerned
with whether the purpose of the research was
fulfilled accurately.
Verbatim A verbatim is a record of actual spoken comments
by a respondent. These are used to back up any
research findings in the final report.
Video Focus Groups These are focus groups that, instead of meeting
face-to-face, carry out the group via a video conferencing
link.
Viewing Facility Viewing facilities are the premises used for
conducting group discussions or depth interviews in
qualitative research. These premises can contain
rooms in which the respondents can be
observed by the client through one-way mirrors or via
a video link. It is thought that because the client is
not physically present in the discussion, the
respondents may be less inhibited and more honest
and open about their opinions and feelings.
W
Word Association A technique in which the respondent is confronted
with a word and is asked to respond with the word
that immediately comes to mind.
Z
Z-Test A test which compares a sample statistic with a
hypothesized value. It is used when working with
interval data and large samples.
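One common form of the Z-test compares a sample mean with a hypothesized population mean when the population standard deviation is known. The sketch below assumes that form and a two-sided test; the function name and the figures are hypothetical.

```python
import math
from statistics import NormalDist


def one_sample_z(sample_mean, hypothesized_mean, population_sd, n):
    """Return the z statistic and two-sided p-value for a sample mean."""
    standard_error = population_sd / math.sqrt(n)
    z = (sample_mean - hypothesized_mean) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails
    return z, p_value


# Hypothetical survey: 100 respondents average 52.0 against a
# hypothesized mean of 50.0, with a known standard deviation of 10.0.
z, p = one_sample_z(sample_mean=52.0, hypothesized_mean=50.0,
                    population_sd=10.0, n=100)
print(f"z = {z:.2f}, p = {p:.4f}")  # z = 2.00, p = 0.0455
```

With a p-value just under 0.05, the difference would be judged significant at the conventional 5% level.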

