Translation and Validation of Study Instruments for Cross-Cultural Research
Department of Gastroenterology, Soroka University Medical Center, and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel

Cross-cultural research often involves physicians, nurses, and other health care providers. In studies of fecal and urinary incontinence, cross-cultural research has been applied to quality-of-life comparisons, and instruments have been translated into other languages for use in different countries. This report presents some of the principal methodological issues and problems associated with translating questionnaires for use in cross-cultural research, in a manner relevant to clinicians and health care practitioners, who need to be aware that the results of their research may be suspect unless these potential problems are addressed. Translation is the most common method of preparing instruments for cross-cultural research, and it has pitfalls that threaten validity. Some of these problems are difficult to detect and may have a detrimental effect on the study results. Identification and correction of these problems can enhance research quality and validity. A method for translation and validation is presented in detail. However, the specific validation method adopted is less important than the recognition that the translation process must be appropriate and the validation process rigorous.

Research projects often involve study populations in more than one cultural or ethnic group. This is sometimes simply due to the need for large study populations (e.g., in clinical drug trials). In other instances, the description of cultural differences is the primary goal of the research. Cross-cultural research has been applied for years in the social sciences, and its importance has gained recognition in the health sciences, especially with the growing role of health-related quality-of-life research. Cross-cultural research has also been performed in epidemiologic studies; the study of health-related beliefs, attitudes, and behaviors; health administration; and health economics.
In the areas of fecal and urinary incontinence, numerous scales of symptom frequency and symptom severity have been formulated and tested.1,2 The International Continence Society recommended inclusion of quality-of-life measures in all urinary incontinence studies,3 and disease-specific quality-of-life instruments have been developed.4,5 However, only a few instruments have been translated and validated in other languages for use in cross-cultural studies.6–9
Because of the unique complexities of cross-cultural
research, it is important for physicians, nurses, and other
health care providers to gain an understanding of the
basic concepts, considerations, and methodological problems entailed. The purpose of this report is to outline the
rationale and methodology of translation and validation
of questionnaires for use in cross-cultural research.

The Problem
Cross-cultural research has specific methodological problems, most relating to translation quality and the comparability of results in different cultural and ethnic groups. These technical traps may lead to erroneous research conclusions that stem from methodological flaws but cannot be recognized as such and are mistaken for substantive findings.
It is not enough to translate a questionnaire literally.
The additional challenge is to adapt it in a culturally
relevant and comprehensible form while maintaining the
meaning and intent of the original items.
Unfortunately, translation of a study instrument such
as a questionnaire is often an afterthought, treated as an
unimportant part of the study protocol and implemented
without attention to the critical issues involved. Some
clinicians are unaware that a problem exists. Even those
who are aware of the problem find its solution daunting.
The process of translating and adapting a questionnaire
for a different cultural group can be arduous and requires
a considerable investment of time and money. However,
unless this process is successfully implemented, the validity of the research results may be suspect.

There are 2 sets of baseline circumstances for instrument development in cross-cultural research. In the first, a research instrument is developed de novo for use in 2 or more languages and can be molded in an ongoing reciprocal process. An assumption underlying this approach is that neither language is primary (there is no source language). It allows for greater creativity and provides the opportunity to align the 2 versions more closely.
In the more usual set of circumstances, clinicians do not develop new instruments but use previously validated instruments, generally questionnaires, that are adapted for use in another target culture and language. These questionnaires cannot be changed in any fundamental way. Most questionnaires are translated from English, so there is a potential problem of ethnocentricity, or what has been termed cultural hegemony, in cross-cultural research.
There are different approaches to the translation process. In the simplest and possibly the most common
method, a questionnaire is translated (often by unqualified translators) and the translated version is used without further validation. Another approach involves translation by committee. In this case, 2 or more translators
work separately or together to produce a consensus questionnaire. Another approach is the back-translation
method. In this case, a questionnaire is translated into
the target language by one translator and then translated
back into the source language by an independent translator who is blinded to the original questionnaire. The 2
source-language versions are then compared.
The translation process requires skill, knowledge, and
experience. There are critical translation problems that
adversely affect many studies, even when professional
translators are used.10 Some translators are not sufficiently aware of the rigorous requirements of translation
for cross-cultural research. They may spend time on
literal translation without devoting enough attention to
cultural nuances. Colloquial phrases, slang and jargon,
idiomatic expressions, and emotionally evocative terms
may be particularly difficult to handle.
There are potential cultural differences in the interpretation of many terms. For example, the term "family" may be interpreted in a similar manner across cultures, but the term "adolescence" may not.11 However, even the term "family" entails potential difficulties. In some cultures, "family" may refer primarily to first-degree relatives, whereas in other cultures the interpretation may be much broader. "Female" and "male" are universal concepts, but the closely related terms "femininity" and "masculinity" may be interpreted very differently in some cultures.12
Two examples of problems that the author has encountered in preparing questionnaires for cross-cultural research may be illustrative. In translating the Rating Form of IBD Patient Concerns13 into Hebrew, the question about patients' concerns relating to "difficulty getting health insurance" was included. The validation process (described below) showed that the translation was well done. However, when the questionnaire in Hebrew was used in a pilot study in Israel, we realized that patients did not understand the question. Basic health insurance is universal in Israel and is unaffected by health status. Patients who answered the questionnaire did not understand why they should be concerned about this problem. Therefore, the translation was good, but the item was culturally irrelevant and had to be either deleted or rephrased to reflect Israeli reality.
In another case, an Arabic version of the Rome II criteria questions for irritable bowel syndrome was prepared. After the translation process was completed, the translated questionnaire was given to Israeli Bedouin and non-Bedouin Arab physicians for review. These physicians said that the term used for "bowel movement," a critical term in this setting, would be understood by non-Bedouin Israeli Arabs but would not be understood by 80%–90% of Israeli Bedouins, who use a different Arabic term for the same thing. The translation was literally good but confusing when applied to a different cultural group.
Translators are not always sufficiently knowledgeable in the specific subject area of the instrument. Specialized medical subjects are an example of this type of difficult content area. Even good professional translators may be unable to translate medical material accurately.

Fine Tuning of Translated

The back-translation technique is preferred even though it is time consuming and can be expensive. However, it also has potential traps. Good translators can achieve a back translation that is similar to the source even though the original translation is not good.14 Such an outcome is a methodological disservice. It can occur because back translators may intuitively make sense of poorly written language, in effect correcting it. Translators may also retain the grammatical form of the original language in the translation, making back translation grammatically easy but at the same time masking the critical differences between the 2 versions. For example, the item "Do you sometimes feel fed up?" is translated as "Do you sometimes feel that your stomach is full?" The back translator immediately identifies the mistake. However, instead of pointing it out to the researchers, the back translator simply corrects it by back translating the item into the original phrasing. The researchers, presented with 2 identical English versions, can only conclude that the translation is excellent and leave the critically faulted target-language version unchanged.
When the translation process is complete, many researchers proceed directly to implementing the study. However, 2 important stages remain. The first is validation of the translation, which is discussed below. The second is psychometric analysis of the instrument in its new version, which is beyond the scope of this report.

Validation of the Translation

Several methods can be used to validate a translation; none is fail-safe. One method is evaluation by teams of experts, bilinguals, or focus groups of potential research subjects. In one variation, the original and translated versions of the instrument are given to bilingual persons in alternating language order, and their responses to the 2 versions are compared. However, the use of bilingual subjects for pretesting also creates methodological problems. The translated instrument is ultimately intended for monolingual subjects. Bilingual individuals often adopt some of the concepts, values, attitudes, and role expectations of the culture of the second language that they have mastered. Thus, bilinguals represent a separate population whose responses cannot be automatically generalized to the monolingual target population.
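The article does not say how "alternating language order" is arranged. A minimal sketch, assuming strict alternation by enrollment sequence, is shown below; the function name `assign_language_order` and the "source"/"target" labels are hypothetical. Balancing the order only controls order effects; it does not address the separate problem that bilinguals may not represent the monolingual target population.

```python
from typing import Dict, List, Tuple

SOURCE = "source"  # original-language version of the instrument
TARGET = "target"  # translated version of the instrument


def assign_language_order(rater_ids: List[str]) -> Dict[str, Tuple[str, str]]:
    """Assign each bilingual rater an alternating presentation order.

    Raters at even positions in the list see the source version first;
    raters at odd positions see the target version first, so that roughly
    half the group meets each language first.
    """
    orders: Dict[str, Tuple[str, str]] = {}
    for position, rater in enumerate(rater_ids):
        if position % 2 == 0:
            orders[rater] = (SOURCE, TARGET)
        else:
            orders[rater] = (TARGET, SOURCE)
    return orders


if __name__ == "__main__":
    for rater, order in assign_language_order(["r1", "r2", "r3", "r4"]).items():
        print(f"{rater}: {order[0]} first, then {order[1]}")
```

Randomizing the order per rater would be an alternative design; strict alternation simply guarantees an even split when the group size is even.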
We developed an innovative method for validating the translated instrument15 and have used it to validate translated questionnaires on irritable bowel syndrome, inflammatory bowel disease, and health-related quality of life. This approach helps identify problem questions that can then be revised. We do this by introducing a new step into the translation validation process in which we formally compare the original source-language version with the back-translated source-language version. Each pair of corresponding items in the 2 versions is rated in terms of comparability of language and similarity of interpretability. The rating is performed by at least 30 raters who are fluent in the source language. The raters are independent of the investigators and do not include the translators. This process enables us to identify potentially problematic items and to reassess and retranslate them until we are as confident as possible that each item will be interpreted in the same manner in both languages. Figure 1 shows a flow diagram of the process. This technique is described in detail below.

Figure 1. Flow diagram of the translation (phase 1) and validation (phase 2) processes. The mean comparison scores at each stage determine the number of times that phase 2 is repeated.
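The repeat-until-acceptable structure of phase 2 can be sketched as a loop. This is a simplified illustration, not the author's software: it collapses the 2 rating dimensions into one score per rater, treats a mean of 3 or higher as "needs revision," and assumes that only revised items are re-rated. The callbacks `rate` and `revise` are hypothetical stand-ins for the human steps (collecting rating sheets; retranslating and back translating an item). Only the minimum of 30 raters comes from the text.

```python
from statistics import mean
from typing import Callable, Dict, List, Set

MIN_RATERS = 30  # the article requires at least 30 independent raters

# Hypothetical stand-ins for the human steps of phase 2:
#   rate(items)  -> {item_id: [one 1-7 score per rater]}
#   revise(item) -> None (retranslate and back translate that item)
RateFn = Callable[[Set[str]], Dict[str, List[int]]]
ReviseFn = Callable[[str], None]


def run_phase2(items: Set[str], rate: RateFn, revise: ReviseFn,
               cutoff: float = 3.0, max_rounds: int = 5) -> int:
    """Repeat rating and revision until no item's mean score reaches the cutoff.

    Returns the number of rating rounds that were needed.
    """
    pending = set(items)
    for round_no in range(1, max_rounds + 1):
        scores = rate(pending)
        for item, item_scores in scores.items():
            if len(item_scores) < MIN_RATERS:
                raise ValueError(f"{item}: {len(item_scores)} raters, need {MIN_RATERS}")
        flagged = {item for item, item_scores in scores.items()
                   if mean(item_scores) >= cutoff}
        if not flagged:
            return round_no
        for item in flagged:
            revise(item)  # human step: new translation and back translation
        pending = flagged  # assumption: only revised items are re-rated
    raise RuntimeError(f"items still flagged after {max_rounds} rounds")


if __name__ == "__main__":
    current = {"q7": 4}  # q7 starts out poorly rated; other items score 1

    def fake_rate(items: Set[str]) -> Dict[str, List[int]]:
        return {i: [current.get(i, 1)] * MIN_RATERS for i in items}

    def fake_revise(item: str) -> None:
        current[item] = 2  # pretend the retranslation fixed the item

    print(run_phase2({"q1", "q7"}, fake_rate, fake_revise))  # 2
```

The `max_rounds` guard is a design choice for the sketch; in practice, an item that keeps failing would go back to the translators for discussion rather than raising an error.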

Testing Comparability and

Our method uses 2 measures of comparison to
evaluate the success of the translation process: comparability of language and similarity of interpretability. Likert
scales ranging from 1 (extremely comparable/extremely
similar) to 7 (not at all comparable/not at all similar) are
used. Table 1 shows the rating sheet used for this evaluation. Comparability of language refers to the formal
similarity of words, phrases, and sentences. If the questions are judged to be identical or extremely comparable in
language, they are scored 1. Similarity of interpretability
refers to the degree to which the 2 versions engender the
same response even if the wording is not the same.
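The per-item mean scores that drive the process can be computed directly from the completed rating sheets. A minimal sketch, assuming each rater's response is recorded as one `Rating` per item pair; the `Rating` class and `item_means` function are hypothetical names. Keeping the 2 dimensions as separate fields mirrors the separate (A) and (B) responses on the rating sheet.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass(frozen=True)
class Rating:
    """One rater's scores for one item pair (1 = extremely comparable/similar, 7 = not at all)."""
    item_id: str
    comparability: int     # (A) comparability of language
    interpretability: int  # (B) similarity of interpretability

    def __post_init__(self) -> None:
        # Reject values outside the 7-point Likert scale.
        for name in ("comparability", "interpretability"):
            value = getattr(self, name)
            if not 1 <= value <= 7:
                raise ValueError(f"{name} must be on the 1-7 scale, got {value}")


def item_means(ratings: List[Rating]) -> Dict[str, Dict[str, float]]:
    """Average each dimension separately for every item pair."""
    by_item: Dict[str, List[Rating]] = {}
    for r in ratings:
        by_item.setdefault(r.item_id, []).append(r)
    return {
        item: {
            "comparability": mean(r.comparability for r in rs),
            "interpretability": mean(r.interpretability for r in rs),
        }
        for item, rs in by_item.items()
    }


if __name__ == "__main__":
    sheet = [Rating("q1", 2, 1), Rating("q1", 3, 2), Rating("q2", 1, 1)]
    print(item_means(sheet))
```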
In theory, back-translated items may differ from their counterparts in the original questionnaire in the linguistic form they assume and the meaning they convey. The ideal is that corresponding items have similar meanings and similar forms of language. Similarity of meaning, even at the expense of similarity of form, is much more desirable than the opposite. Form may be intentionally varied to guarantee equivalence of meaning. Accordingly, we assess perceived similarity of form and meaning separately to keep the 2 dimensions distinct. Otherwise, ratings may reflect overall similarity, combining meaning and form.

Table 1. Comparability/Interpretability Rating Sheet15

Please circle the response which most closely represents how you would rate the following pairs of items in terms of:
(A) Comparability of language (how comparable is the formal wording?) and
(B) Similarity of interpretation (would the paired items be interpreted similarly, even if the wording is different?).
Please circle only one response for (A) and one response for (B) for each pair of items.
Columns: Original English version; Back-translated English version

Table 2 shows several sample item pairs (original and back-translated versions) from the Abuse History Screening Questionnaire translated into Hebrew, with the mean score for each item pair in each of the 2 dimensions. Any mean score ≥3 (7 is worst agreement; 1 is best agreement) necessitates a formal review of the translation. Any mean score between 2.5 and 3 in the interpretability column is also considered problematic and is reviewed for possible correction. At times, even if the mean scores are good, a problem is identified and evaluated. For example, in the first item, the original version is worded "many people have unwanted," while the back-translated version states that "people have many unwanted." These are clearly not the same, even though the mean scores for comparability and similarity were 2.14 and 1.93, respectively. On reevaluation, it was found that the Hebrew was faithful to the original English and the problem lay in the back translation, so the Hebrew version was left unchanged.

Table 2. Sample Items With Mean Comparison Score for Each Item Pair

Item pair 1
Original English version: We now know that many people have unwanted sexual or violent experiences as children or adults. Some of these are with playmates or friends, and some with relatives or acquaintances.
Back-translated English version: We know that people have many unwanted sexual experiences or experiences of violence both as children and adults. Some of these experiences occurred with playmates or friends and some with relatives or
Comparability of language (mean score): 2.14
Similarity of interpretability (mean score): 1.93

Item pair 2
Original English version: These experiences may be so upsetting that they may not be discussed with anyone.
Back-translated English version: It is possible that some of these experiences are so disturbing that they are not discussed with anyone.

Item pair 3
Original English version: Sometimes they are forgotten for long periods of time, and sometimes they are frequently brought to mind.
Back-translated English version: Sometimes they are forgotten for long periods of time and sometimes they rise to the conscious level frequently.
Comparability of language (mean score): 2.77
Similarity of interpretability (mean score): 1.93

Item pair 4
Original English version: Have you ever discussed these experiences with anyone before (indicate all answers that apply to you)?
Back-translated English version: Have you ever before discussed these experiences with anyone (indicate all of the answers that apply to you)?

Item pair 5
Original English version: YES, with a family member
Back-translated English version: YES, with a member of my

Item pair 6
Original English version: YES, with a rabbi or lay
Back-translated English version: YES, with a rabbi or other spiritual advisor

Item pair 7
Original English version: Are you now seeing a counselor for these or any other emotional concerns?
Back-translated English version: Are you presently undergoing treatment with a professional consultant concerning these troubles or any other sensitive
The item "Sometimes they are forgotten for long periods of time, and sometimes they are frequently brought to mind" was back translated as "Sometimes they are forgotten for long periods of time and sometimes they rise to the conscious level frequently." The mean score for comparability of language was 2.77, indicating that the reviewers perceived a discrepancy in language, as indeed there was. However, the mean score for similarity of interpretability was 1.93, indicating that despite the difference in formal language, the reviewers did not believe there was a serious interpretation problem. Review of the Hebrew version convinced us that there was no serious problem with the translation. The item "Are you now seeing a counselor for these or any other emotional concerns?" received poor mean scores in both dimensions, indicating a serious problem with the translation. The translation was revised, and the revised item was back translated until the mean scores indicated a valid version.
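The review thresholds above can be expressed as a small decision function, checked here against the 2 pairs of mean scores reported in the text. This is a sketch, not the author's code: `review_decision` is a hypothetical name, and the boundary choices (a mean of 3 or higher triggers formal review; an interpretability mean from 2.5 up to, but not including, 3 triggers review for possible correction) are assumptions about how the stated ranges meet.

```python
def review_decision(comparability: float, interpretability: float) -> str:
    """Classify one item pair from its mean scores (1 = best, 7 = worst).

    "formal review": either mean is 3 or higher.
    "possible correction": interpretability mean in [2.5, 3).
    "pass": no rule triggered; reviewers may still spot a problem by reading.
    """
    if comparability >= 3 or interpretability >= 3:
        return "formal review"
    if 2.5 <= interpretability < 3:
        return "possible correction"
    return "pass"


if __name__ == "__main__":
    # Mean scores reported in the text for two Hebrew item pairs.
    print(review_decision(2.14, 1.93))  # "many people" item -> pass
    print(review_decision(2.77, 1.93))  # "brought to mind" item -> pass
```

Both example items pass the numeric rules, yet both were examined by hand in the study; the thresholds work as a screen that directs attention, not as a substitute for reading each item pair.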












This report presents methodological issues associated with translating questionnaires for use in cross-cultural research in a manner relevant to clinicians and health care practitioners. Translation is the most common method of preparing instruments for cross-cultural research, and it has pitfalls that threaten validity. Some of these problems are difficult to detect and may have detrimental effects on study results.
A method of translation validation is presented in
detail as an example. The specific validation method
adopted is less important than the recognition that the
translation process must be appropriate and the validation process rigorous.

