
medical education in review

Conducting systematic reviews in medical education: a stepwise approach
David A Cook1,2 & Colin P West2,3

OBJECTIVES As medical education research
continues to proliferate, evidence syntheses will
become increasingly important. The purpose of
this article is to provide a concise and practical
guide to the conduct and reporting of systematic reviews.
RESULTS (i) Define a focused question
addressing the population, intervention, comparison (if any) and outcomes. (ii) Evaluate
whether a systematic review is appropriate to
answer the question. Systematic and non-systematic approaches are complementary; the
former summarise research on focused topics
and highlight strengths and weaknesses in
existing bodies of evidence, whereas the latter
integrate research from diverse fields and
identify new insights. (iii) Assemble a team and
write a study protocol. (iv) Search for eligible
studies using multiple databases (MEDLINE
alone is insufficient) and other resources
(article reference lists, author files, content
experts). Expert assistance is helpful. (v)
Decide on the inclusion or exclusion of each
identified study, ideally in duplicate, using
explicitly defined criteria. (vi) Abstract key
information (including on study design, participants, intervention and comparison features, and outcomes) for each included article,
ideally in duplicate. (vii) Analyse and synthesise
the results by narrative or quantitative pooling,
investigating heterogeneity, and exploring the
validity and assumptions of the review itself. In
addition to the seven key steps, the authors
provide information on electronic tools to
facilitate the review process, practical tips to
facilitate the reporting process and an annotated bibliography.

Medical Education 2012: 46: 943–952


1 Office of Education Research, Mayo Medical School, Rochester,
Minnesota, USA
2 Division of General Internal Medicine, Mayo Clinic, Rochester,
Minnesota, USA
3 Division of Biomedical Statistics and Informatics, Mayo Clinic,
Rochester, Minnesota, USA

Correspondence: David A Cook MD, MHPE, Division of General Internal
Medicine, Mayo Clinic College of Medicine, 200 First Street SW,
Rochester, Minnesota 55905, USA. Tel: 00 1 507 266 4156;
Fax: 00 1 507 284 5370; E-mail:

Blackwell Publishing Ltd 2012. MEDICAL EDUCATION 2012; 46: 943–952


D A Cook & C P West


As medical education research continues to proliferate, syntheses of this evidence will become increasingly important. Systematic reviews play a critical role
in this process of synthesis by identifying and
summarising published research on focused topics
and highlighting strengths and weaknesses in that
field. Although some have criticised systematic
reviews as engendering false confidence in their
objectivity and freedom from bias,1 others have
argued for a more balanced role.2 Of the 108
applications for funding to conduct a literature
review received in the past 2 years by the Society of
Directors of Research in Medical Education, over half
have proposed systematic reviews. As chair of the
review committee, author DAC has observed that
many applicants fail to anticipate the key actions
required in a rigorous review. It appears that guidance on how to conduct a high-quality systematic
review in medical education is needed. Although a
number of books have been published on systematic
reviews, and previous articles in the medical education literature have highlighted challenges and
provided brief tips,3–5 we are not aware of any concise
guidelines offering a structured approach to planning, conducting and reporting a systematic review in
medical education.
The purpose of this article is to provide a concise
and practical guide to the conduct and reporting of
systematic reviews, with particular attention to issues
affecting medical education research. Table 1
summarises the key steps. An annotated bibliography (Appendix S1, online) lists resources that
elaborate on each of the methods that will be
described. At each step, we will illustrate the
principles involved using text drawn from a recent
review of simulation-based education published by
the first author.6


Table 1  Steps in the review process

1 Define a focused question
    Consider Population, Intervention, Comparison, Outcomes
2 Evaluate whether a systematic review is appropriate to answer the question
3 Assemble a team and write a protocol
4 Search for eligible studies
    Identify information sources: indexing databases; previous reviews; reference lists; author files, and experts in the field
    Define search terms
5 Decide on the inclusion or exclusion of each identified study
    Define inclusion and exclusion criteria; pilot-test and refine operational definitions
    Define restrictions
    Stage 1: review titles and abstracts in duplicate; err on the side of inclusion
    Stage 2: review full text in duplicate; resolve disagreements by consensus
6 Abstract data
    Define data abstraction elements; pilot-test and refine operational definitions
    Abstract data in duplicate; resolve disagreements by consensus
7 Analyse and synthesise
    Focus on synthesis: organise and interpret the evidence while providing transparency
    Pool results through narrative or meta-analysis
    Explore strengths, weaknesses, heterogeneity and gaps
    Explore the validity and assumptions of the review itself

Define a focused research question

The first step in conducting a systematic review is to
identify a focused question. This can be more
challenging than it would first appear. A good
question usually evolves from discussions with
collaborators and undergoes multiple iterations
before reaching its final form. The PICO mnemonic
often used in clinical evidence-based medicine can
also be helpful for systematic review questions. This
mnemonic requires specification of the population,
intervention (or other review topic, such as assessment tool), comparison interventions (if any), and
outcomes of interest. For example, a focused question
might ask:
P: in health professionals, is
I: training using simulation technologies,
C: in comparison with no intervention,
O: associated with improved knowledge, skills,
behaviours or patient effects?
The terms used in the question may require further
definition. For example, what are simulation technologies? Do veterinarians count as health professionals? What is the difference between a behaviour
and a patient effect? Thus, although the focused
question should be stated clearly and concisely
(usually as a single sentence within the PICO
framework), it will often require further elaboration
to be truly useful.
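One way to keep the four PICO elements explicit while the wording evolves is to record them as structured data. The following is a minimal sketch only; the class name FocusedQuestion and its fields are illustrative assumptions and are not part of any published tool.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FocusedQuestion:
    """A review question in PICO form (hypothetical structure for illustration)."""
    population: str
    intervention: str
    comparison: str              # leave empty if the question has no comparison
    outcomes: Tuple[str, ...]

    def as_sentence(self) -> str:
        # Build the single-sentence form of the question from its parts.
        compared = f" in comparison with {self.comparison}," if self.comparison else ""
        return (f"In {self.population}, is {self.intervention},"
                f"{compared} associated with improved "
                f"{', '.join(self.outcomes)}?")


question = FocusedQuestion(
    population="health professionals",
    intervention="training using simulation technologies",
    comparison="no intervention",
    outcomes=("knowledge", "skills", "behaviours", "patient effects"),
)
print(question.as_sentence())
```

Terms that need further definition (for example, what counts as a simulation technology) would still be elaborated in the protocol rather than in the structure itself.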
Researchers often have some idea of what the results
of a systematic review may show, based on relevant
theories, previous reviews addressing related questions, and their (non-systematic) familiarity with the
literature. It is often appropriate to make such
predictions explicit in the form of anticipated
results or formal hypotheses.
Although the question itself is deceptively brief, its importance
cannot be overstated. It will establish the
framework for every step that follows.
The authors of the review of simulation-based
education sought to answer two questions: (i)
"To what extent are simulation technologies for
training health care professionals associated
with improved outcomes in comparison with
no intervention?" and (ii) "How do outcomes
vary for different simulation instructional
designs?"6 Relevant outcomes were defined in
the subsequent paragraph as learning (knowledge or skills in a test setting), behaviours (in
practice), or effects on patients. The first
author framed a preliminary version of this
question prior to assembling the study team.
The scope and wording of the question evolved through iterative discussions
amongst team members before its final form
was established.

Evaluate whether a systematic review is an appropriate way of answering the question

There are both advantages and disadvantages to the
systematic review process in comparison with other
review approaches.2 The issue is, in many respects,
analogous to debates regarding quantitative and
qualitative research in general. Non-systematic
reviews integrate research from diverse fields and
identify new insights,1 whereas systematic reviews
summarise research on focused topics and highlight
strengths and weaknesses in existing bodies of
evidence.7 Much like a lighthouse shining over the
ocean, systematic reviews illuminate our understanding of a focused question, but leave other
issues in the dark: The very rules that enhance the
systematic review's rigor blind the researcher to
ideas outside the scope of the focused question and
resultant search strategy.2 Thus, systematic and non-systematic approaches are perhaps best viewed as
complementary, and the decision to conduct a
systematic review should be based on the overall
study objective.
Before embarking on a review, systematic or
otherwise, the reviewers should carefully consider
the strengths and weaknesses of existing reviews on
that topic. In order to contribute to the literature, a
new review must fill a meaningful gap in published
reviews and add significantly to current knowledge, in
terms of either quality or data. For example, a new
systematic review that adds one small study to an
existing high-quality systematic review of a dozen well-executed studies is unlikely to be useful.
Finally, prospective authors should realise that a well-conducted systematic review requires much time and
effort. Systematic reviews do not represent quick and
easy research.
The first author of the review of simulation-based
education6 conducted a thorough search to identify previous reviews, and found only one comprehensive review (8 years old) in the field and
only one meta-analysis (on a narrow subtopic).
The investigating team anticipated that there
would be substantial heterogeneity among different studies, but felt that a meta-analytic summary
would be useful provided they paid close attention
to the comparison group (e.g. active intervention
versus no intervention), and performed subgroup
analyses to explore heterogeneity when found.
The investigators conducted a preliminary search
to estimate the number of eligible articles and thus
anticipate the time commitment required.


Assemble a team and write a protocol

Assemble a team
Systematic reviews are a team activity. Choosing the
right team may be one of the most important
decisions in the entire review process. Although
consensus on definitions and coding decisions is
necessary for a review to move forward, a diversity of
perspectives helps to enrich discussions and ultimately enhances the quality and generalisability of
the review. At least one member of the team should
have experience with systematic reviews and ideally
one member (often a medical librarian) should have
experience in conducting literature reviews.
Because systematic reviews are hard work, it is helpful
to provide an idea of the scope of the project and
expected workload early in the process. Team members who lack the necessary time, commitment or
expertise may contribute to frustrating delays in the
research plan. Thus, effectively matching project
needs with available resources is important.
Write a protocol
As with any research activity, a project protocol is a
crucial element that provides both rigor and guidance
during the process. The protocol should be written
during or immediately after the writing of the focused
question. The protocol incorporates specific plans for
each of the elements of a successful systematic review,
listed in Table 1 and described in greater detail
herein. The protocol may be revised as the project
progresses and more is learned about the study
question, but the ability to reference a core protocol
document allows modifications to be tracked and
applied reproducibly to all steps of the review.
The first author of the review of simulation-based
education6 deliberately assembled a multidisciplinary team, including internal medicine doctors, surgeons, PhD education researchers, and
an experienced research librarian, to ensure that
diverse opinions and skills were reflected in the
review process. The anticipated scope, objectives,
workload, timeline and products were discussed
and agreed upon prior to beginning. Some of
those invited to join the team declined because
they could not commit to the deadline. A written
protocol including the focused question, initial
search terms, plans for further defining the
search strategy, inclusion and exclusion criteria,
key data abstraction items, and the analysis plan
was finalised prior to beginning the review.

Search for eligible studies

The comprehensive identification of relevant studies
is a hallmark of a systematic review. Much has been
written on the conduct of literature searches8–11 and
Maggio et al.12 have proposed 10 criteria that collectively define an ideal literature search. However, in
broad terms there are two key questions to consider.
Firstly, what sources of information will be used? A
comprehensive systematic search will interrogate
multiple information sources in an attempt to uncover
all eligible studies. MEDLINE is typically used, but
alone will usually be insufficient because the overlap
between MEDLINE and other databases is incomplete.8 Other indexing databases include EMBASE,
Scopus, PsycINFO, Web of Science, CINAHL (Cumulative Index to Nursing and Allied Health Literature
[for nursing]) and ERIC (Education Resources
Information Center [for education studies]). In
addition to these indexing databases, it is useful for
reviewers to look for relevant articles in their own files
and to contact experts in the field for further relevant
publications. The references cited in previous reviews
on the topic can be used both as a verification step (see
below) and to supplement gaps. Finally, a hand search
of the references cited in included articles may reveal
studies that were missed in the search but are known to
other authors in that field.
Secondly, what search terms will be used to query
these information sources? The development of a
thorough search strategy requires knowledge of
appropriate indexing terms [e.g. medical subject
headings (MeSH)], qualifiers and logical operators,
all of which vary from one indexing database to
another. For this reason, input from an expert in
literature searches, such as a research librarian, can
be invaluable. The sensitivity (ability to identify
relevant articles) of a preliminary search strategy
should be verified by ensuring that known relevant
articles (e.g. articles known to the reviewers or cited
in a previous review or seminal work) are identified
using the planned keywords. Reviewers should seek
new keywords in any omitted articles to improve the
search strategy in subsequent iterations.
A more sensitive search usually identifies more
ineligible articles (i.e. it is less specific). Finding the
right balance usually requires several iterations, and
benefits from expert assistance.
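The sensitivity check described above can be sketched in a few lines: compare the identifiers retrieved by a draft search against a list of articles already known to be relevant, and list those that were missed so that their titles and abstracts can be mined for new keywords. The function name and the identifier format below are assumptions made for illustration.

```python
def check_search_sensitivity(retrieved_ids, known_relevant_ids):
    """Return (sensitivity, missed) for a draft search strategy.

    sensitivity is the proportion of known relevant articles that the
    search retrieved; missed lists the known articles it failed to find.
    """
    retrieved = set(retrieved_ids)
    known = set(known_relevant_ids)
    if not known:
        raise ValueError("need at least one known relevant article")
    missed = sorted(known - retrieved)
    sensitivity = (len(known) - len(missed)) / len(known)
    return sensitivity, missed


# Hypothetical identifiers, for illustration only.
retrieved = ["pmid:101", "pmid:102", "pmid:103", "pmid:250"]
known = ["pmid:101", "pmid:103", "pmid:999"]
sensitivity, missed = check_search_sensitivity(retrieved, known)
print(f"sensitivity = {sensitivity:.2f}; missed = {missed}")
```

Each missed article is a prompt to revise the search terms and repeat the check in the next iteration.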
All articles identified in the search, including those
excluded at later stages, should be assigned a unique
identification number. The complete strategy,
including specific search terms for each indexing
database, the databases and other sources searched,
search dates and all search results should be carefully
archived for subsequent reporting.
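As a rough sketch of this bookkeeping, the results from several sources can be merged, obvious duplicates collapsed, and each remaining article given a unique identification number. Matching duplicates on a normalised title plus publication year is an assumption made here for simplicity, and the record fields and ID format are hypothetical; real projects often rely on bibliographic or purpose-built review software for this step.

```python
import re


def normalise_title(title):
    # Lower-case and collapse punctuation so trivial differences do not hide duplicates.
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_search_results(results_by_source):
    """results_by_source maps a source name to a list of {'title', 'year'} records."""
    merged = {}
    for source, records in results_by_source.items():
        for record in records:
            key = (normalise_title(record["title"]), record["year"])
            entry = merged.setdefault(
                key, {"title": record["title"], "year": record["year"], "sources": []}
            )
            if source not in entry["sources"]:
                entry["sources"].append(source)
    # Assign a stable unique ID to every article, in order of first appearance.
    return [{"id": f"A{number:05d}", **entry}
            for number, entry in enumerate(merged.values(), start=1)]


results = {
    "MEDLINE": [{"title": "Simulation in Surgery.", "year": 2009}],
    "ERIC": [{"title": "simulation in surgery", "year": 2009},
             {"title": "Virtual patients", "year": 2010}],
}
articles = merge_search_results(results)
for article in articles:
    print(article["id"], article["title"], article["sources"])
```

Keeping the list of sources for each article also preserves the information needed to report where each study was found.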
Using the initial search terms as a starting point,
the research librarian and the first author of the
review of simulation-based education6 worked in
collaboration to identify a comprehensive search
strategy. The search sensitivity was evaluated by
comparing the articles identified against those
already known to the authors and those cited in
previous seminal reviews. If an article was missed,
the title and abstract were carefully reviewed to
identify terms that would improve the sensitivity.
The search was repeated, with adaptations as
needed, in databases including PsycINFO, ERIC, Web of Science and Scopus. In
addition, all references cited in several seminal
reviews, and the entire table of contents of two
key journals, were added to the list of articles.
Finally, the reference lists of randomly selected
articles were hand-searched to identify additional
articles; this continued until no additional articles were identified. The unabridged search
strategy was published as an online appendix.
Decide on the inclusion or exclusion of each
identified study
After identifying a pool of articles, reviewers include
or exclude articles based on predefined inclusion and
exclusion criteria. These criteria typically emerge
naturally from the focused question and, again, the
PICO framework can often be used to help to define
the population (e.g. medical students), intervention
(e.g. problem-based learning), comparison (e.g. no
intervention) and outcome (e.g. any learning outcome). Study design (e.g. any comparative design
or only randomised trials) may also be considered,
although a more inclusive approach can allow
subsequent evaluation of whether results differ
depending on study design.
Additional restrictions may be placed on included
work. Reviewers will occasionally exclude articles
based on language (e.g. by excluding non-English
publications), publication date (e.g. by excluding
articles older than 20 years), length (e.g. by excluding abstracts) and rigor of peer review (e.g. by
excluding graduate theses, papers presented at
meetings, and other unpublished works which are
collectively termed grey literature). Decisions about
restrictions are best made on conceptual grounds
rather than convenience. For example, in a review of
Web-based learning it made conceptual sense to
begin the search at a date subsequent to the
development of the World Wide Web.13 By contrast,
there is rarely a good conceptual reason to limit the
search to English-language publications only, because excellent research is often published in other
languages.6 The inclusion of grey literature is more
controversial; some correctly argue that non-peer-reviewed research may be of inferior quality, but
others correctly argue that such studies can still,
when properly analysed, contribute importantly to
evidence-based decisions.
Defining inclusion and exclusion criteria
Regardless of the actual criteria selected, it is important to clearly define these criteria both conceptually
(often by using a formal definition from a dictionary,
theory or previous review) and operationally (by using
detailed explanations and elaborations that help
reviewers recognise the key concepts as reported by
authors in published articles). Although some operational definitions will be defined from the outset,
many of these may actually emerge during the process
of the review as reviewers come across articles of
uncertain inclusion status. Such cases should be
discussed by the group with the goal not only of
deciding on the inclusion or exclusion of that article,
but also of defining a rule that will determine the
triage of similar articles in the future. Such decisions,
along with brief examples of what should and should
not be included, can be catalogued in an explanatory
document. Although the conceptual definitions
should remain unchanged, the explanatory document
and the operational definitions it contains often
continue to evolve throughout the review process.
Involving the entire reviewer group in the development of the conceptual and operational definitions
not only improves the likelihood that others will
agree with the decisions made, but ensures that
everyone will apply the criteria using this shared
understanding. Yet even after the group development
process, it remains essential to pilot-test the inclusion/exclusion form and process on a small subset of
articles. After each round of pilot-testing, all reviewers
compare their decisions and use points of discrepancy to refine the operational definitions and to
recalibrate their own standards.
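A pilot round of this kind can be summarised by listing the articles on which two reviewers disagreed and by computing a chance-corrected agreement statistic. The sketch below uses Cohen's kappa, which is our assumption of a suitable statistic rather than something prescribed above; the function name and decision labels are illustrative.

```python
def compare_decisions(reviewer_a, reviewer_b):
    """Each argument maps an article ID to 'include' or 'exclude'.

    Returns (disagreements, observed_agreement, kappa) for articles
    rated by both reviewers.
    """
    shared = sorted(set(reviewer_a) & set(reviewer_b))
    if not shared:
        raise ValueError("no articles were rated by both reviewers")
    n = len(shared)
    disagreements = [a for a in shared if reviewer_a[a] != reviewer_b[a]]
    observed = 1 - len(disagreements) / n
    labels = {reviewer_a[a] for a in shared} | {reviewer_b[a] for a in shared}
    # Agreement expected by chance, from each reviewer's marginal proportions.
    expected = sum(
        (sum(reviewer_a[a] == label for a in shared) / n)
        * (sum(reviewer_b[a] == label for a in shared) / n)
        for label in labels
    )
    # Kappa is undefined when chance agreement is 1; report 1.0 by convention.
    kappa = 1.0 if expected == 1 else (observed - expected) / (1 - expected)
    return disagreements, observed, kappa


a = {"A1": "include", "A2": "exclude", "A3": "include", "A4": "exclude"}
b = {"A1": "include", "A2": "exclude", "A3": "exclude", "A4": "exclude"}
disagreements, observed, kappa = compare_decisions(a, b)
print(disagreements, round(observed, 2), round(kappa, 2))
```

The list of disagreements, rather than the statistic, is what drives the refinement of the operational definitions.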
The inclusion and exclusion process
As with nearly all phases of the review process,
inclusion and exclusion should involve at least two
reviewers. Duplicate, independent review minimises
random error and helps to avoid idiosyncrasies that
would bias the review.
The inclusion/exclusion process typically has two
stages. In stage 1, reviewers look only at the title,
abstract and, if available, the keywords. During this
stage, if both reviewers are confident based on the title
and abstract that the article is ineligible, it is excluded.
If there is any doubt, such as in a case in which the
abstract contains insufficient information, the article
advances to stage 2. Reviewers typically do not reconcile disagreements at this stage. If either reviewer feels
the paper should be included, it is duly advanced based
on the rationale that resolving uncertainties is best
done using the full text rather than the abstract alone.
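The stage 1 rule can be written down as a small decision function: an article is excluded only when at least two reviewers have voted and all of them voted to exclude; any other combination advances it to full-text review. The vote labels are hypothetical and chosen only for illustration.

```python
def stage1_decision(votes):
    """votes: title/abstract votes, each 'include', 'exclude' or 'unsure'."""
    # Exclusion requires at least two reviewers, all confident the article is ineligible.
    if len(votes) >= 2 and all(vote == "exclude" for vote in votes):
        return "exclude"
    # Any doubt, any positive vote, or a single reviewer: err on the side of inclusion.
    return "advance to stage 2"


for votes in (["exclude", "exclude"], ["exclude", "include"],
              ["exclude", "unsure"], ["exclude"]):
    print(votes, "->", stage1_decision(votes))
```

Because disagreements are not reconciled at this stage, the function needs no tie-breaking logic.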
During stage 2, reviewers read the full text of each
article to make a final inclusion/exclusion decision.
Here, two independent reviews are required in all
cases. Reviewers initially attempt to resolve the
inevitable coding disagreements through discussion
and consensus, and appeal to another member of the
review team if needed.
In the review of simulation-based education,6
inclusion and exclusion criteria had been defined
with the writing of the study protocol. Included
studies were required to have a comparison group,
but no other design restrictions were imposed
(i.e. both randomised and non-randomised studies were eligible). The investigators applied these
criteria to each article identified in step 4. In stage
1, one or two authors reviewed each title and
abstract; two negative votes were required to
exclude an article, whereas one positive vote would
advance the article to stage 2 (i.e. err on the side of
inclusion). In stage 2, two investigators independently reviewed the full text of each article and
resolved all disagreements by consensus. Whereas
the original wording of the inclusion and exclusion criteria remained unchanged, the operational definitions of these criteria evolved over
time. Over 30 articles were translated from non-English languages including Chinese, Japanese,
Korean, Spanish, French, German, Swedish and
Finnish. The authors kept a careful accounting of
the reason for each inclusion and exclusion, and
summarised this in a trial flow figure. All included
articles were listed in an online appendix.
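The accounting that feeds a trial flow figure can be sketched as a tally over the screening log. The record schema used here (stage1, stage2 and reason keys) is a hypothetical one chosen for illustration, not the format of any particular review tool.

```python
from collections import Counter


def trial_flow(records):
    """Summarise a screening log into the counts needed for a trial flow figure."""
    advanced = [r for r in records if r["stage1"] == "advance"]
    included = [r for r in advanced if r["stage2"] == "include"]
    # Tally the recorded reason for every excluded article.
    reasons = Counter(r["reason"] for r in records if r.get("reason"))
    return {
        "identified": len(records),
        "excluded at stage 1": len(records) - len(advanced),
        "full text reviewed": len(advanced),
        "excluded at stage 2": len(advanced) - len(included),
        "included": len(included),
        "exclusion reasons": dict(reasons),
    }


log = [
    {"id": "A00001", "stage1": "exclude", "stage2": None, "reason": "no comparison group"},
    {"id": "A00002", "stage1": "advance", "stage2": "exclude", "reason": "no comparison group"},
    {"id": "A00003", "stage1": "advance", "stage2": "include", "reason": None},
    {"id": "A00004", "stage1": "advance", "stage2": "exclude", "reason": "not health professionals"},
]
flow = trial_flow(log)
for label, value in flow.items():
    print(f"{label}: {value}")
```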
Abstract data
After studies have been selected for inclusion, the
next step is to methodically abstract key information
from each included article.


Defining the data abstraction elements

What information should be collected? The PICO
framework can again provide guidance in planning
which data to collect, including the key features of
participants (number and key demographics), interventions (key elements of design, intensity, timing,
duration and implementation), comparisons (similar
to interventions) and outcomes. Information on outcomes should include details of both the measurement
method (e.g. outcome classification, assessor blinding,
timing in relation to intervention, score validity) and
the actual results (mean and standard deviation, event
rate, correlation coefficient, effect size, etc.).
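When means and standard deviations are abstracted for two groups, they are often converted to a standardised mean difference so that studies using different instruments can be compared. The sketch below computes Hedges' g with its approximate variance using the usual textbook formulae; the choice of this particular effect size is our assumption, and the input values are invented for illustration.

```python
import math


def hedges_g(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Return (g, variance of g) for two independent groups."""
    df = n_1 + n_2 - 2
    # Pooled within-group standard deviation.
    pooled_sd = math.sqrt(((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / df)
    d = (mean_1 - mean_2) / pooled_sd
    var_d = (n_1 + n_2) / (n_1 * n_2) + d ** 2 / (2 * (n_1 + n_2))
    # Small-sample correction factor that converts Cohen's d to Hedges' g.
    correction = 1 - 3 / (4 * df - 1)
    return correction * d, correction ** 2 * var_d


# Invented example: intervention group versus comparison group test scores.
g, var_g = hedges_g(75.0, 10.0, 30, 70.0, 10.0, 30)
print(f"g = {g:.3f}, variance = {var_g:.4f}")
```

Studies that report event rates or correlation coefficients need different conversions, which is one reason to record the raw results as well as any derived effect size.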
Reviewers should also code information on study
design, which might include the number of groups,
method of group assignment (randomised versus
non-randomised), timing of assessments (e.g. post-intervention versus pre- and post-intervention),
enrolment and follow-up rates, and other features of
study quality. These elements may vary for different
study designs, but a focus on threats to study validity14
is common among them. Many instruments for
assessing study quality have been described, including
the Medical Education Research Study Quality
Instrument (MERSQI)15 for education research, the
Jadad scale16 for randomised trials, the Newcastle-Ottawa
Scale13,17 for non-randomised studies, and the
Quality Assessment of Diagnostic Accuracy Studies
(QUADAS-2)18 for studies of assessment tools. All
have strengths and weaknesses, and none has been
universally accepted as a reference standard. More
important than a score on any particular instrument
is the assessment of possible bias and validity threats
in each study in the systematic review.
The data abstraction process
A data abstraction form should be developed and
iteratively refined. As with the inclusion/exclusion
criteria, the elements of data to be abstracted must be
defined both conceptually and operationally, and the
development of an explanatory document with
detailed definitions and examples is essential. In
addition to the questions defined at the study outset,
new questions often emerge as the review team reads
articles during the inclusion process. These questions
dictate the data to be abstracted.
Pilot-testing is necessary to identify ambiguous definitions and other areas for which additional consistency or clarification is required. The entire review
team may examine the same article or small set of
articles prior to a group discussion, and additional
cycles of this process may be carried out thereafter as
needed until a high degree of consistency is achieved.
Data abstraction should ideally be carried out by two
independent reviewers. Coding disagreements must
be resolved, ideally by consensus and by appeal to a
third party if necessary.
Many reviews will encounter articles containing
incomplete information. The review team must
decide how to handle such articles; solutions might
involve excluding such articles, imputing missing
information from other articles, or attempting to
obtain missing information from the original authors.
The impact of these decisions on the overall review
results should be considered regardless of the choices made.
The authors of the review of simulation-based
education6 considered the findings of previous
reviews and their own impressions of the field as
they identified data elements to collect. Study
quality was evaluated using two complementary
criteria. Investigators reviewed each article in
duplicate to abstract this information using an
electronic tool designed for this purpose
and resolved discrepant codes
by consensus.
Analyse and synthesise
During synthesis the hard work of systematic inclusion and data abstraction pays off as the collected
evidence is analysed to answer the focused question
and to develop new insights. This synthesis can take
the form of a quantitative summary (i.e. meta-analysis)19 or a more qualitative narrative synthesis20
or realist review.21 However, the most important part
of the analysis or synthesis is that it should actually
synthesise the evidence rather than simply catalogue it.
Synthesis requires more than merely reporting the
results of each study one at a time (the litany of the
literature) or counting the number of studies with

Tools to facilitate the review



Useful for:

E-mail, instant messaging,

Various options

Team communication, particularly to accommodate


physical distances and scheduling difficulties



One-to-one communication, particularly if visual

E-mail listservs and groups

Google groups, Yahoo groups

Group discussions; particularly valuable when discussing

cues are important

coding criteria and other key decisions as the tool will
keep a permanent archive of the discussion that can
later be used to both recall the decision and
understand the arguments that led to that decision
Scheduling software

Doodle, MeetingWizard, Tungle, TimeBridge

Scheduling group meetings


Google Docs,

Group collaboration on documents such as the coding

explanatory document, deadlines, assigned tasks,
thematic analysis


Microsoft Excel, OpenOffice Calc, Google Docs

Article inclusion exclusion, data abstraction and

rudimentary data analysis


Reference Manager

Article inclusion exclusion, data abstraction,

Bibliographic software


Translation software

Google Translate

Translating articles from other languages

Purpose-built review

DistillerSR, EPPI-Reviewer 4

Article inclusion exclusion and data abstraction

preparation of reference lists

Meta-analysis software

RevMan, MetaAnalyst, Comprehensive

Meta Analysis, MetaWin,





do not directly support meta-analysis,

but macros are available to do this


Blackwell Publishing Ltd 2012. MEDICAL EDUCATION 2012; 46: 943952


D A Cook & C P West

Table 3

Tips on reporting a systematic review*

Manuscript section Key points


Keep the Introduction brief (two or three paragraphs)

Focus the literature review on previous review articles, emphasising their strengths and limitations, and highlighting
the added contribution of (i.e. need for) a new review on the topic


Divide the Methods into five sections, with headings pertaining to: a focused question; search strategy; inclusion
criteria; data abstraction, and analysis
Most of this information can be pulled from your written protocol


Divide the Results section into at least four sections, with headings pertaining to: trial flow; study characteristics;
study quality, and synthesis
Trial flow: describe the number of studies identified in the search, the number included and excluded, the number
remaining for full review, and any special issues that arose during this process
Study characteristics: describe a few (three to five) of the most salient study features
Study quality: describe the most important aspects of study quality
Synthesis: present the results of the narrative or quantitative data analysis. The synthesis narrative should distil the
evidence into a clear message for readers. Articles that support a similar point (either favourable or unfavourable
to the overall conclusion) should be grouped together rather than listed individually. For example, a reviewer
might write: It appears that intervention X improves outcomes. Five randomised trials addressed this issue; four
found favourable results and one found no significant difference (see Table 2 for details). The reviewer might
then proceed to discuss salient between-study differences, such as in design, participants, interventions or
instruments, that might have influenced results
Include at least one specific figure and two specific tables: (i) a trial flow diagram as specified in the QUOROM
and PRISMA guidelines; (ii) a table that contains information on the key features of each study, and (iii) a table
with details of each studys quality features


Divide the Discussion into at least four sections, with headings for the last three, pertaining to: summary;
limitations; integration with previous reviews, and implications for practice and future research
Summary (no separate heading): recap (but do not restate) the most important results, including key
uncertainties if any
Limitations: acknowledge the reviews limitations and unique strengths
Integration: discuss how the present review supports, contradicts or extends the findings of previous relevant
reviews. In addition to considering reviews in the present field of study, it is often helpful to draw parallels with
findings in other fields (e.g. clinical medicine, other education topics, or non-medical education)
Implications: outline two to four main points that can be immediately applied in practice or will provide a starting
point for future research. Note that there is no need for a separate conclusions section; the implications are,
in reality, your conclusions


Write the abstract last

Follow the structure prescribed by the PRISMA guidelines unless the journal requires a different format
Even if the journal requires a different format, retain the content requested by PRISMA

* These complement the more complete recommendations of the PRISMA statement26

statistically significant results (vote counting).22

Rather, synthesis involves pooling and exploring the
results to provide a bottom-line statement regarding
what the evidence supports and what gaps remain in
our current understanding. This requires reviewers to
organise and interpret the evidence, anticipating and
answering readers' questions about this topic, while
simultaneously providing transparency that allows
readers to verify the interpretations and arrive at their
own conclusions.

Blackwell Publishing Ltd 2012. MEDICAL EDUCATION 2012; 46: 943–952


Reviewers must make a number of key decisions
regarding the analysis. Firstly, should they attempt a
statistical pooling of quantitative results (i.e. meta-analysis)? If so, further decisions about this process
will concern which statistical model to apply (e.g. a
fixed-effects or random-effects model) and how to
standardise outcomes across studies. Details on
meta-analysis are beyond the scope of this article;
Appendix S1 lists several helpful resources.
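As a rough illustration of the modelling choice described above, the following Python sketch contrasts an inverse-variance fixed-effects estimate with a random-effects estimate. It assumes each study has already been reduced to a standardised effect size and its variance, and it uses the DerSimonian-Laird estimator for the between-study variance, which is one common option and is our choice for illustration rather than a method prescribed in this article. The function names and the data are hypothetical.

```python
import math


def pool_fixed(effects, variances):
    """Inverse-variance fixed-effects pooled estimate and its standard error."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    return pooled, math.sqrt(1.0 / total)


def pool_random(effects, variances):
    """Random-effects pooled estimate, standard error and tau^2.

    tau^2 (between-study variance) is estimated with the
    DerSimonian-Laird method-of-moments formula (an assumption of
    this sketch; other estimators exist).
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    fixed, _ = pool_fixed(effects, variances)
    # Cochran's Q: weighted squared deviations from the fixed-effects estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = total - sum(w * w for w in weights) / total
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    re_total = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / re_total
    return pooled, math.sqrt(1.0 / re_total), tau2


# Hypothetical standardised effect sizes and variances for five studies.
effects = [0.80, 0.20, 0.50, 0.10, 0.95]
variances = [0.04, 0.02, 0.05, 0.03, 0.06]

print("fixed effects: ", pool_fixed(effects, variances))
print("random effects:", pool_random(effects, variances))
```

When the studies disagree by more than sampling error would predict, the random-effects model gives relatively more weight to smaller studies and reports a wider standard error than the fixed-effects model.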
Secondly, how will reviewers explore heterogeneity
(inconsistency) across studies? The most informative
aspect of many reviews is not the average result across
studies, but, rather, the exploration of why results
differ from study to study. An explanation of between-study inconsistency should be part of all systematic reviews.
Finally, the authors should consider threats to
the validity of their own review. By transparently
reporting their methods, acknowledging key
assumptions, exploring potential sources of bias and
providing tables containing detailed information on
each study, the reviewers encourage readers to verify
and potentially reinterpret the information for
themselves. Indeed, the degree to which reviewers
explore the strengths, weaknesses, heterogeneity and
gaps in the evidence determines in large part the
value of the review.
The authors of the review of simulation-based
education6 used meta-analysis to synthesise the
results. They used I² statistics to quantify heterogeneity and subgroup analyses to explore this
heterogeneity. They also performed a narrative
synopsis of key study characteristics including
trainees, clinical topics and study quality. In
subsequent manuscripts (in press) on focused
topics they have used narrative synthesis methods
to identify and summarise salient themes.
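For readers unfamiliar with the I² statistic mentioned in this example, the sketch below shows one way it might be computed from Cochran's Q, both overall and within subgroups. It is a minimal illustration, not the code used in the cited review: the function names, the `design` grouping variable and the data are all hypothetical, and each study is assumed to contribute one effect size with a known variance.

```python
def cochran_q(effects, variances):
    """Cochran's Q: weighted squared deviations from the fixed-effects mean."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def i_squared(effects, variances):
    """I^2 as a percentage: max(0, (Q - df) / Q) * 100."""
    df = len(effects) - 1
    q = cochran_q(effects, variances)
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def i_squared_by_subgroup(studies, key):
    """Compute I^2 separately within each level of a study characteristic."""
    groups = {}
    for study in studies:
        groups.setdefault(study[key], []).append(study)
    return {
        level: i_squared(
            [s["effect"] for s in members],
            [s["variance"] for s in members],
        )
        for level, members in groups.items()
    }


# Hypothetical studies, each tagged with a design feature used for subgrouping.
studies = [
    {"design": "randomised", "effect": 0.30, "variance": 0.02},
    {"design": "randomised", "effect": 0.35, "variance": 0.03},
    {"design": "non-randomised", "effect": 0.90, "variance": 0.04},
    {"design": "non-randomised", "effect": 1.10, "variance": 0.05},
]

overall = i_squared(
    [s["effect"] for s in studies], [s["variance"] for s in studies]
)
print("overall I^2 (%):", overall)
print("by design:", i_squared_by_subgroup(studies, "design"))
```

If I² falls substantially within subgroups relative to the overall value, the grouping variable is a plausible (though not proven) explanation for the between-study inconsistency.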


Paper tracking and generic statistical software may
have been sufficient to perform a systematic review in
the past, but as the volume of evidence and the
sophistication of techniques expand, it is increasingly
necessary to rely on electronic resources and tools to
support high-quality reviews. As detailed in Table 2,
such tools can facilitate and archive team communications, streamline the process of inclusion/exclusion
and data abstraction,23 assist in thematic analysis,
format bibliographies and even translate excerpts
from articles in other languages.

The authors of the review of simulation-based
education6 used to track inclusion/exclusion and for data abstraction. They
used a Google Group to archive all e-mail
communications, Google Docs to keep an
ongoing list of articles in need of translation, to schedule teleconferences,
Google Translate for some simple translation needs, and EndNote to manage references. They used SAS macros to perform


The key elements in reporting systematic reviews and
meta-analyses have been codified in guidelines such
as the QUOROM (quality of reporting of meta-analyses),24 MOOSE (meta-analysis of observational
studies in epidemiology)25 and, most recently, PRISMA (preferred reporting items for systematic reviews
and meta-analyses)26 statements. We encourage
reviewers to adhere to these guidelines (http://, but we will not repeat
these in detail. We provide some practical advice for
writing the manuscript itself in Table 3.
The review6 team cited the PRISMA guidelines
and adhered to these during the planning,
conduct and reporting of the review.


As the volume and quality of evidence in medical
education continue to expand, the need for evidence
synthesis will grow. By following the seven key steps
outlined in this paper to complete a high-quality
systematic review, authors will more meaningfully
contribute to this knowledge base.

Contributors: DAC drafted the initial manuscript. Both
authors jointly revised subsequent drafts and both approved
the final manuscript for publication.
Acknowledgements: none.
Funding: none.
Conflicts of interest: none.
Ethical approval: not applicable.

1 Eva KW. On the limits of systematicity. Med Educ


2 Cook DA. Narrowing the focus and broadening
horizons: complementary roles for non-systematic and
systematic reviews. Adv Health Sci Educ Theory Pract
3 Reed D, Price EG, Windish DM, Wright SM, Gozu A,
Hsu EB, Beach MC, Kern D, Bass EB. Challenges in
systematic reviews of educational intervention studies.
Ann Intern Med 2005;142:1080–9.
4 Reeves S, Koppel I, Barr H, Freeth D, Hammick M.
Twelve tips for undertaking a systematic review. Med
Teach 2002;24:358–63.
5 Hammick M, Dornan T, Steinert Y. Conducting a
best evidence systematic review. Part 1: from idea to
data coding. BEME Guide No. 13. Med Teach
6 Cook DA, Hatala R, Brydges R, Zendejas B, Szostek JH,
Wang AT, Erwin P, Hamstra S. Technology-enhanced
simulation for health professions education: a systematic review and meta-analysis. JAMA 2011;306:978–88.
7 Cook DJ, Mulrow CD, Haynes RB. Systematic reviews:
synthesis of best evidence for clinical decisions. Ann
Intern Med 1997;126:376–80.
8 Lefebvre C, Manheimer E, Glanville J. Searching
for studies. In: Higgins JPT, Green S, eds. Cochrane
Handbook for Systematic Reviews of Interventions. Chichester: Wiley-Blackwell 2008;95–150.
9 White HD. Scientific communication and literature
retrieval. In: Cooper H, Hedges LV, Valentine JC, eds.
The Handbook of Research Synthesis, 2nd edn. New York,
NY: Russell Sage Foundation 2009;51–71.
10 Haig A, Dozier M. BEME Guide No. 3: systematic
searching for evidence in medical education. Part 1:
sources of information. Med Teach 2003;25:352–63.
11 Haig A, Dozier M. BEME Guide No. 3: systematic
searching for evidence in medical education. Part 2:
constructing searches. Med Teach 2003;25:463–84.
12 Maggio LA, Tannery NH, Kanter SL. Reproducibility of
literature search reporting in medical education reviews. Acad Med 2011;86:1049–54.
13 Cook DA, Levinson AJ, Garside S, Dupras DM, Erwin
PJ, Montori VM. Internet-based learning in the health
professions: a meta-analysis. JAMA 2008;300:1181–96.
14 Cook DA, Beckman TJ. Reflections on experimental
research in medical education. Adv Health Sci Educ
Theory Pract 2010;15:455–64.
15 Reed DA, Cook DA, Beckman TJ, Levine RB, Kern DE,
Wright SM. Association between funding and quality of
published medical education research. JAMA
16 Jadad AR, Moore RA, Carroll D, Jenkinson C, Reynolds
DJM, Gavaghan DJ, McQuay HJ. Assessing the quality of
reports of randomised clinical trials: is blinding necessary? Control Clin Trials 1996;17:1–12.
17 Wells GA, Shea B, O'Connell D, Peterson J, Welch V,
Losos M, Tugwell P. The Newcastle–Ottawa Scale
(NOS) for assessing the quality of non-randomised
studies in meta-analyses.
[Accessed 29 February 2012.]
18 Whiting PF, Rutjes AWS, Westwood ME, Mallett S,
Deeks JJ, Reitsma JB, Leeflang MMG, Sterne JAC,
Bossuyt PMM, QUADAS-2 Group. QUADAS-2: a revised tool for
the quality assessment of diagnostic accuracy studies.
Ann Intern Med 2011;155:529–36.
19 Cook DA. Randomised controlled trials and meta-analysis in medical education: what role do they play?
Med Teach 2012;34:468–73.
20 Bland CJ, Meurer LN, Maldonado G. A systematic
approach to conducting a non-statistical meta-analysis
of research literature. Acad Med 1995;70:642–53.
21 Pawson R, Greenhalgh T, Harvey G, Walshe K. Realist
review – a new method of systematic review designed
for complex policy interventions. J Health Serv Res Policy
2005;10 (Suppl 1):21–34.
22 Bushman BJ, Wang MC. Vote-counting procedures in
meta-analysis. In: Cooper H, Hedges LV, Valentine JC,
eds. The Handbook of Research Synthesis, 2nd edn. New
York, NY: Russell Sage Foundation 2009;207–20.
23 King R, Hooper B, Wood W. Using bibliographic software to appraise and code data in educational systematic review research. Med Teach 2011;33:719–23.
24 Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D,
Stroup DF. Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement. Quality of reporting of meta-analyses.
Lancet 1999;354:1896–900.
25 Stroup DF, Berlin JA, Morton SC et al. Meta-analysis of
observational studies in epidemiology: a proposal for
reporting. JAMA 2000;283:2008–12.
26 Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred
reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann Intern Med

Received 2 March 2012; editorial comments to authors 3 May
2012; accepted for publication 24 May 2012


Additional supporting information may be found in
the online version of this article. Available at: http://
Appendix S1. Annotated bibliography of additional
Please note: Wiley-Blackwell is not responsible for the
content or functionality of any supporting materials
supplied by the authors. Any queries (other than for
missing material) should be directed to the corresponding author for the article.
