Abstract
Background: Measurement of learning climates can serve as an indicator of a department’s educational functioning.
Aim: This article describes the development and psychometric qualities of an instrument to measure learning climates in
postgraduate specialist training: the Dutch Residency Educational Climate Test (D-RECT).
Method: A preliminary questionnaire was evaluated in a modified Delphi procedure. Simultaneously, all residents in the
Netherlands were invited to fill out the preliminary questionnaire. We used exploratory factor analysis to analyze the outcomes and
construct the definitive D-RECT. Confirmatory factor analysis tested the questionnaire’s goodness of fit. Generalizability studies
tested the number of residents needed for a reliable outcome.
Results: In two rounds, the Delphi panel reached consensus. In addition, 1278 residents representing 26 specialties completed the
questionnaire. The Delphi panel’s input in combination with the exploratory factor analysis of 600 completed surveys led to the
definitive D-RECT, consisting of 50 items and 11 subscales (e.g., feedback, supervision, patient handover and professional relations
between attendings). Confirmatory factor analyses of the remaining surveys confirmed the construct. The results showed that a
feasible number of residents is needed for a reliable outcome.
Conclusion: D-RECT appears to be a valid, reliable and feasible tool to measure the quality of clinical learning climates.
Correspondence: K. Boor, Department of Medical Education, St Lucas Andreas Hospital, Amsterdam, The Netherlands. Tel: +31 64 797 6494; fax: +31 20 5108791; email: klarkeboor@gmail.com
interaction with attendings, peers, nurses, and other healthcare personnel. In summary, an optimal learning climate was characterized by integration of work and training, tailored to individual residents' needs. Residents' input from this study served as the basis for the themes reflecting a good learning climate.

Second, it is difficult to construct an instrument with sound psychometric properties. Both within the field of organizational psychology (Bartram et al. 1993; Ashkanasy et al. 2000) and the medical educational field (Roff et al. 1997; Kanashiro et al. 2006), many instruments have been developed that tap into (learning) climates. Still, the development and psychometric testing of most instruments can be improved. For instance, the Dundee Ready Education Environment Measure (DREEM; Roff et al. 1997) and the Postgraduate Hospital Educational Environment Measure (PHEEM; Roff et al. 2005), two widely used instruments to measure learning environments, lack a clearly described theoretical base, and their underlying factor structure is disputed: PHEEM has been described as possessing three different subscale structures in three different publications (Roff et al. 2005; Boor et al. 2007; Schonrock-Adema et al. 2008). Such a controversial subscale structure hinders the possibilities for specific feedback to departments.

Aims

This article describes our attempt to overcome the above-described obstacles. We developed D-RECT guided by the results from our qualitative research. We searched for validity evidence on test content and internal structure (American Education Research Association and American Psychological Association 1999). We constructed subscales and tested them using input from a Delphi panel (obtaining validity evidence based on test content) and an exploratory factor analysis; the definitive questionnaire was confirmed with a confirmatory factor analysis (obtaining validity evidence based on internal structure). Moreover, this article studied D-RECT's reliability: using generalizability theory, we estimated the number of participants needed to get a reproducible outcome, i.e., how many residents must fill out the questionnaire to get a reliable (reproducible) result?

Methods

Setting

In the Netherlands, medical students obtain a basic medical degree after 6 years of undergraduate medical training. This entitles them to apply for a place in a training program in one of the 27 specialties. Depending on the specialty, training lasts from 4 to 6 years. In this article, we use the term resident to refer to a junior doctor who is undertaking specialty training in GME. Specialist training programs consist of rotations in a university hospital and an affiliated general hospital. In every hospital department where training is offered, one specialist is the "specialty tutor", formally responsible for residents' education and training as well as (bi-)annual assessments.

Ethical considerations

This study was exempt from Institutional Board Review under Dutch law. However, we made sure that no possible harm could come to our participants. In the invitation to the Delphi panelists and in the letter inviting the residents to take part in the questionnaire study, we explicitly stated that participation was voluntary and full anonymity was guaranteed.

Using multiple sources and different methods, we aimed to develop a questionnaire with reproducible subscales and offer validity and reliability evidence. Figure 1 shows an overview of the different analysis steps.

[Figure 1. Overview of the analysis steps, including a discussion round with 5 educationalists, 6 residents, and 3 attendings.]

Development of D-RECT

In an earlier study, we found that an optimal learning climate was characterized by the integration of work and training and tailored to residents' needs (Boor 2009). We used input from this qualitative study in designing the first version of the learning climate questionnaire, which consisted of 83 items and 21 subscales (e.g., feedback, collaboration between peers, patient handover, workload, or teamwork). These 83 items and 21 subscales were discussed in an expert group of two medical education experts (CvdV and AS), two medical doctors (PT and KB), and one specialty tutor (FS). Additionally, four residents, two specialty tutors, and three medical educationalists (all of whom were not in the research team) checked the items for face validity and made suggestions for removal or rewording. Duplicate or unclear items were removed. This resulted in a 75-item preliminary D-RECT. Every item invited agreement on a five-point Likert scale (1 = totally disagree and 5 = totally agree); we also included a not applicable option.

We submitted the 75 items of the preliminary D-RECT to a Delphi panel. Simultaneously, residents were asked to fill out the preliminary D-RECT, and we performed several analyses on the pool of completed questionnaires.

Delphi procedure

A Delphi procedure is aimed at achieving consensus among experts in a systematic manner (Fink et al. 1984; Jones & Hunter 1995). In multiple consultation rounds, experts indicate their (dis)agreement with statements or concepts. After the first round, the experts can change their own rating in light of the summarized (anonymous) ratings of the other panel members (Jones & Hunter 1995). In a modified Delphi procedure, the statements or items are not generated by the expert group but – as in this study – carefully selected based on earlier research (Rowe et al. 1991; Jones & Hunter 1995; Boor 2009).

In April and May 2008, we invited 10 medical educationalists, 10 policymakers, 10 residents, and 10 specialists (all specialty tutors) for our Delphi panel; the latter two groups represented different specialties as well as university and non-university hospitals. The 40 panelists were chosen for their involvement in postgraduate specialist training. They were offered a monetary incentive for participating in the complete Delphi procedure. Experts received the preliminary D-RECT and rated every item's relevance in relation to the postgraduate
learning climate on a five-point scale (1 = not relevant and 5 = highly relevant). For the analysis, the ratings were dichotomized (1, 2, and 3 were interpreted as not relevant; 4 and 5 as relevant). In the absence of undisputed guidelines to decide when consensus is reached (Holey et al. 2007), we decided that agreement among 80% of participants would lead to inclusion or exclusion of an item, and that consensus was considered to have been reached when 90% of the items that were included or excluded in one round remained unchanged in the subsequent round.
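To make these decision rules concrete, the sketch below applies them to made-up ratings; it is an illustration of the 80% and 90% rules described above, not code or data from the study.

    # Delphi decision rules applied to hypothetical panelist ratings.
    # Ratings are 1-5; after dichotomization, 4 and 5 count as "relevant".

    def item_decision(ratings, agreement=0.80):
        """Decide on one item from a list of panelist ratings (1-5)."""
        share_relevant = sum(1 for r in ratings if r >= 4) / len(ratings)
        if share_relevant >= agreement:
            return "include"        # at least 80% rated the item relevant
        if (1 - share_relevant) >= agreement:
            return "exclude"        # at least 80% rated it not relevant
        return "undecided"          # no agreement yet; on to the next round

    def consensus_reached(previous, current, stability=0.90):
        """Consensus: at least 90% of the items decided in the previous
        round keep the same decision in the current round."""
        decided = [item for item, d in previous.items() if d != "undecided"]
        unchanged = sum(1 for item in decided if current[item] == previous[item])
        return bool(decided) and unchanged / len(decided) >= stability

    # Ten hypothetical panelists rating three hypothetical items:
    round_1 = {"item_01": item_decision([5, 4, 4, 5, 4, 4, 5, 4, 3, 4]),  # include
               "item_02": item_decision([1, 2, 2, 1, 3, 2, 1, 2, 2, 1]),  # exclude
               "item_03": item_decision([4, 2, 5, 3, 4, 2, 4, 3, 5, 2])}  # undecided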
Questionnaire mailing

In May 2008, a letter was sent to all residents in the Netherlands, asking them to complete the web-based preliminary D-RECT and answer some demographic questions. D-RECT was administered in Dutch. Psychiatric residents were not included for logistical reasons. Some of the respondents could win an incentive provided they completed the questionnaire. No reminders were sent. We compared the response group with respect to sex and specialty to the entire population of residents using the chi-squared test (p < 0.05 was considered significant).
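As an illustration of this representativeness check, the sketch below compares hypothetical respondent counts for three specialties against equally hypothetical national shares; the study itself compared sex and all specialties.

    # Chi-squared comparison of the response group against the national
    # population, with made-up counts for three specialties only.
    from scipy.stats import chisquare

    observed = [312, 220, 101]                 # respondents per specialty
    population_share = [0.48, 0.36, 0.16]      # hypothetical national shares
    expected = [share * sum(observed) for share in population_share]

    stat, p = chisquare(f_obs=observed, f_exp=expected)
    print(f"chi2 = {stat:.2f}, p = {p:.4f}")   # p < 0.05 would signal a difference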
Exploratory factor analysis

Factor analysis is used to identify clusters of related variables. We randomly selected half of all returned questionnaires for exploratory factor analysis, using both Varimax rotation (which presupposes no correlations between the factors) and Oblimin rotation (which presupposes some correlations between the factors). Because the "Component Correlation Matrix" indicated correlations between the factors, from then on we only used Oblimin rotations (Field 2005). Items with weak factor loadings were eliminated, and the internal consistency of the factors was determined by calculating Cronbach's alpha.

Our theoretical framework (Boor 2009) and the outcomes of the Delphi procedure guided our decisions on the inclusion of ambiguous items. This resulted in a multi-factorial model: the definitive D-RECT.
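The study ran these analyses in SPSS; as a sketch of the same steps, the Python fragment below uses the factor_analyzer package with an illustrative 0.30 loading cut-off (neither the package nor the cut-off is taken from the article). It extracts obliquely rotated factors, flags weakly loading items, and computes Cronbach's alpha for one subscale; the DataFrame items (rows = residents, columns = items) is hypothetical.

    # Oblique factor extraction and internal consistency; a sketch only.
    import pandas as pd
    from factor_analyzer import FactorAnalyzer

    def explore(items: pd.DataFrame, n_factors: int, cutoff: float = 0.30):
        # Oblimin rotation, since the factors turned out to be correlated.
        fa = FactorAnalyzer(n_factors=n_factors, rotation="oblimin")
        fa.fit(items)
        loadings = pd.DataFrame(fa.loadings_, index=items.columns)
        weak = loadings.abs().max(axis=1) < cutoff   # candidates for removal
        return loadings, list(loadings.index[weak])

    def cronbach_alpha(subscale: pd.DataFrame) -> float:
        """Cronbach's alpha for the items belonging to one subscale."""
        k = subscale.shape[1]
        item_variances = subscale.var(axis=0, ddof=1).sum()
        total_variance = subscale.sum(axis=1).var(ddof=1)
        return k / (k - 1) * (1 - item_variances / total_variance)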
Testing the definitive D-RECT

Confirmatory factor analysis

Using structural equation modeling, we tested the goodness-of-fit of the multi-factorial model on the other half of the returned questionnaires. The comparative fit index (CFI), the root mean square error of approximation (RMSEA), and the relative chi-square (CMIN/DF) were used as indices of the goodness-of-fit (McDonald & Ho 2002).
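The study used Amos for this step. A rough Python counterpart with the semopy package is sketched below; the two-subscale model description and the file name are placeholders, not the actual 11-subscale D-RECT model.

    # Goodness-of-fit of a (placeholder) multi-factorial model.
    import pandas as pd
    import semopy

    desc = """
    feedback    =~ fb_1 + fb_2 + fb_3
    supervision =~ sup_1 + sup_2 + sup_3
    """

    data = pd.read_csv("drect_second_half.csv")    # hypothetical file
    model = semopy.Model(desc)
    model.fit(data)

    stats = semopy.calc_stats(model).T             # one fit statistic per row
    cfi = stats.loc["CFI", "Value"]
    rmsea = stats.loc["RMSEA", "Value"]
    cmin_df = stats.loc["chi2", "Value"] / stats.loc["DoF", "Value"]
    print(f"CFI={cfi:.3f}  RMSEA={rmsea:.3f}  CMIN/DF={cmin_df:.2f}")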
Reliability analysis

We used the same data (the second half of the returned questionnaires) in order to determine reliability. A variance component analysis was performed to measure the contributions of all relevant components (in this case, residents, departments, items, and their interactions) to the variance in an outcome measure (Crossley et al. 2002). We performed generalizability analysis for the mean total score and each separate subscale, to estimate the number of residents needed to obtain reliable test scores.

We treated the total number of items as fixed. The number of residents within a single department and the number of departments were allowed to vary. Following variance component estimation, we estimated the standard error of measurement (SEM) for a single department. The SEM can be interpreted on the original scoring scale (in this case 1–5), and we decided to accept a maximum "noise level" of 1.0 on the scale. We therefore used an SEM of 0.26 (1.96 × 0.26 × 2 ≈ 1.0) as the largest admissible value for a 95% confidence interval interpretation.

To use D-RECT across a group of departments, we estimated the RMSE, which can be interpreted in the same way as the SEM but now at group level.

We used Amos structural equation modeling software for the confirmatory factor analysis, URGENOVA software to analyze generalizability, and SPSS for all the other analyses.
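The arithmetic behind the noise-level criterion: a 95% confidence interval extends 1.96 × SEM on either side of a department mean, so a total width of 1.0 scale point requires SEM ≤ 1.0/(2 × 1.96) ≈ 0.26. The sketch below turns this into a required number of residents; the variance component is made up, since the real components were estimated with URGENOVA.

    # How many residents are needed before the SEM drops below the
    # admissible 0.26? The variance component is hypothetical.
    import math

    var_resident = 0.45      # made-up residual variance per resident

    def sem(n_residents: int) -> float:
        """SEM of a department mean based on n residents."""
        return math.sqrt(var_resident / n_residents)

    max_sem = 1.0 / (2 * 1.96)                            # ~0.255
    n_needed = next(n for n in range(1, 100) if sem(n) <= max_sem)
    print(f"admissible SEM = {max_sem:.3f}, residents needed = {n_needed}")
    # With 0.45 this gives n = 7, within the 3-11 range reported in Table 3.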
Results

Creating the definitive D-RECT

In total, 1278 residents representing 26 specialties completed the questionnaire (a 26% response rate). The response group was comparable to the entire population of Dutch residents (…jaarverslagenopleidingregistratie-1.htm, annual report 2007) with respect to sex and the 26 specialties, except for internal medicine residents (p = 0.01), obstetric-gynecological residents (p < 0.01), and rehabilitation medicine residents (p = 0.01).

Twenty-five questionnaires that had more than 25 unanswered items were excluded from the analysis. A total of 591 respondents checked "not applicable" on some items, but the response rate per item was never lower than 94%. The values of these items were replaced for the psychometric analysis using two-way imputation, a method that corrects both for person effects and item effects (Sijtsma & van der Ark 2003).
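For readers unfamiliar with two-way imputation: in its deterministic form, a missing score is replaced by person mean + item mean − overall mean, so that both person and item effects are respected (Sijtsma & van der Ark 2003). A sketch, assuming a hypothetical DataFrame responses with NaN for the "not applicable" answers:

    # Deterministic two-way imputation: person mean + item mean - grand mean.
    import pandas as pd

    def two_way_impute(responses: pd.DataFrame) -> pd.DataFrame:
        person_mean = responses.mean(axis=1)      # mean per respondent
        item_mean = responses.mean(axis=0)        # mean per item
        grand_mean = responses.stack().mean()     # mean over observed scores
        filled = responses.copy()
        for item in responses.columns:
            estimate = person_mean + item_mean[item] - grand_mean
            filled[item] = filled[item].fillna(estimate)
        return filled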
Exploratory factor analysis. We first analyzed 600 randomly chosen questionnaires using an exploratory factor analysis. For a sound factor analysis, a minimum of 5 subjects per item is needed, so this was a large enough sample for our analysis (Streiner 1994). We eliminated 14 items with weak factor loadings. Another 11 items were removed following the advice of the Delphi panel. Fifteen items were retained in the D-RECT because residents in earlier qualitative research had highlighted their specific importance and their factor loadings were high, although these items were not accepted by the Delphi panel (Table 1). Five subscales were excluded completely, and five subscales merged with other, related, subscales. This led to the definitive 50-item D-RECT with 11 subscales. Cronbach's alpha varied between 0.64 and 0.85 for the different factors (Tables 1 and 2).
Table 1. Continued.

Notes: A professional translator rendered the original Dutch questionnaire into English. This version was checked by a British medical specialist for clarity. A native speaker translated the questionnaire back into Dutch; this version was comparable to the original. N, number of respondents; α, Cronbach's alpha; Delphi, item accepted (A) or not accepted (NA) by the Delphi panel.
Table 3. Generalizability analysis of D-RECT total scores and subscales.

  Subscale                                    SEM: n (residents)   RMSE: n (residents)/n (departments)
  Total                                        3                    2/3
  Supervision                                  7                    4/4
  Coaching and assessment                      6                    3/3
  Feedback                                    11                    4/6
  Team work                                    8                    4/4
  Peer collaboration                           7                    4/4
  Professional relations between attendings    9                    4/6
  Work is adapted to residents' competence     7                    4/3
  Attendings' role                             5                    4/3
  Formal education                             7                    4/6
  Role of the specialty tutor                  7                    4/4
  Patient sign out                             8                    4/5

Notes: SEM, standard error of measurement; the SEM column gives the number of residents needed to get a reliable result for one department. RMSE, root mean square error; the RMSE column gives the number of residents needed to get a reliable outcome for a group of departments (for instance, three residents from each of three departments are needed to get a reliable outcome for the subscale "coaching and assessment").

Discussion

Consensus among experts determined the final inclusion and exclusion of items, and extensive psychometric analyses revealed a multi-factorial questionnaire. Generalizability theory was used to determine the number of respondents necessary for reliable outcomes.

D-RECT contains subscales that are relatively new to learning climate instruments, such as the role of the specialty tutor and patient sign out (the transfer of patient data because one shift is ending and the other shift is beginning). These issues were brought to the fore by residents in earlier qualitative research and confirmed by the Delphi panel. They are hardly described in relation to learning climates (Sanfey et al. 2008) and can offer new insights into why some climates "work" while others do not.

Strengths and weaknesses

It is a strength of this study that different strategies were used in developing the questionnaire. Other studies reporting on the development and/or validation of learning climate questionnaires did not describe their literature review or analytical methods and used Cronbach's alpha as the sole indicator of questionnaire stability (Roff et al. 1997; Cassar 2004; Roff et al. 2005). Some studies included an exploratory factor analysis (Bligh & Slade 1996; Pololi & Price 2000) but, to our knowledge, no earlier study combined theoretical input, a Delphi procedure, exploratory and confirmatory analyses, and generalizability studies in developing and validating an instrument. Another strength of this study is that the data for the psychometric analyses were obtained from residents in 26 different specialties, at different levels of training, and from 76 different hospitals. This strengthens the comprehensive applicability of D-RECT. Furthermore, the number of residents needed for a reliable outcome for one department lies between 3 and 11, but most subscales can be judged reliably by eight residents. This supports the feasibility of the instrument. As for groups of departments, even fewer residents (four from four different departments) would yield a reliable impression of most subscales. Other studies have shown similar generalizability outcomes or required (much) larger
numbers of participants for a reliable outcome (Bierer et al. 2004; van der Hem-Stokroos et al. 2005; Boor et al. 2007).

There are some caveats to take into account. First, the 26% response rate can be a source of potential bias. This could have been caused by the possibly sensitive nature of the questionnaire and the lack of funding for sending a reminder – a strategy proven to be effective in increasing response rates (Edwards et al. 2009). However, the goal of this part of the study was to test D-RECT's psychometric properties; for this goal, the number of respondents is sufficiently high (Streiner 1994; Field 2005). Second, we conducted our study within the context of GME in the Netherlands. Whether D-RECT is also valid outside the Netherlands warrants further investigation. Finally, D-RECT has not been tested in its final form; administering the 50-item questionnaire instead of the preliminary 75-item version could possibly lead to slightly different outcomes.
Departments can use D-RECT results as an indicator of their educational functioning in order to offer a better learning climate for residents. In addition, D-RECT offers benchmarking opportunities. Departments that devote time and effort to creating an optimal learning climate can substantiate their qualities, while departments in need of help can be made aware of their performance in relation to that of other departments. This is also relevant in light of governmental and public calls for transparency and monitoring of the quality of postgraduate medical education (Genn 2001; Afrin et al. 2006).

From a research point of view, it would be interesting to examine D-RECT's consequential validity to strengthen the instrument's evidence base (American Education Research Association and American Psychological Association 1999). D-RECT is intended to improve the educational climate. Does it succeed? Are there unintended positive or adverse effects? Another line of investigation would be to validate D-RECT for use in international settings. Finally, D-RECT may enable us to study the relationship between a good learning climate and patient and resident outcomes. Are high D-RECT scores related to the delivery of better patient care? Do they result in better specialists? Research could also examine whether some subscales in particular are associated with the delivery of better patient care or better doctors.

Acknowledgments

The authors thank the participants of the Delphi panel and all residents who completed D-RECT. We also thank Ron Hoogenboom and Henk van Berkel for their invaluable assistance in analyzing the data. In addition, we thank Mereke Gorsira for her support in editing this article's English and Tim Dornan for commenting on D-RECT's proper usage of English terms. The CBOG (an institute involved in residents' training) dispatched the questionnaire and offered administrative support in analyzing the filled-out surveys; they had no influence on the content or analysis of the data.

Declaration of interest: The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the article.

Notes on contributors

KLARKE BOOR is a resident and obtained a PhD in medical education.
CEES VAN DER VLEUTEN is a professor of education.
PIM TEUNISSEN is a resident and obtained a PhD in medical education.
ALBERT SCHERPBIER is a professor of education.
FEDDE SCHEELE is a gynecologist and a professor of education.

References

Ashkanasy NM, Wilderom CPM, Peterson MF. 2000. Handbook of organizational culture and climate. Thousand Oaks, CA: Sage Publications. pp 1–18.
Bartram D, Foster F, Lindley PA, Brown AJ, Nixon S. 1993. The Learning Climate Questionnaire. Employment Service and Newland Park Associates Ltd.
Bierer SB, Fishleder AJ, Dannefer E, Farrow N, Hull AL. 2004. Psychometric properties of an instrument designed to measure the educational quality of graduate training programs. Eval Health Prof 27(4):410–424.
Bligh J, Slade P. 1996. A questionnaire examining learning in general practice. Med Educ 30(1):65–70.
Boor K. 2009. The clinical learning climate. Amsterdam: VU Medical Center.
Boor K, Scheele F, van der Vleuten CP, Scherpbier AJ, Teunissen PW, Sijtsma K. 2007. Psychometric properties of an instrument to measure the clinical learning environment. Med Educ 41(1):92–99.
Cassar K. 2004. Development of an instrument to measure the surgical operating theatre learning environment as perceived by basic surgical trainees. Med Teach 26(3):260–264.
Crossley J, Davies H, Humphris G, Jolly B. 2002. Generalisability: A key to unlock professional assessment. Med Educ 36(10):972–978.
Edwards PJ, Roberts I, Clarke MJ, Diguiseppi C, Wentz R, Kwan I, Cooper R, Felix LM, Pratap S. 2009. Methods to increase response to postal and electronic questionnaires. Cochrane Database Syst Rev 3:MR000008.
Field A. 2005. Exploratory factor analysis. In: Discovering statistics using SPSS. 2nd ed. London: Sage. pp 619–680.
Fink A, Kosecoff J, Chassin M, Brook RH. 1984. Consensus methods: Characteristics and guidelines for use. Am J Publ Health 74(9):979–983.
Genn JM. 2001. AMEE medical education guide no. 23 (part 2): Curriculum, environment, climate, quality and change in medical education – a unifying perspective. Med Teach 23(5):445–454.
Holey EA, Feeley JL, Dixon J, Whittaker VJ. 2007. An exploration of the use of simple statistics to measure consensus and stability in Delphi studies. BMC Med Res Methodol 7:52.
Iverson DJ. 1998. Meritocracy in graduate medical education? Some suggestions for creating a report card. Acad Med 73(12):1223–1225.
Jones J, Hunter D. 1995. Consensus methods for medical and health services research. Br Med J 311(7001):376–380.
Kanashiro J, McAleer S, Roff S. 2006. Assessing the educational environment in the operating room: A measure of resident perception at one Canadian institution. Surgery 139(2):150–158.
McDonald RP, Ho MH. 2002. Principles and practice in reporting structural equation analyses. Psychol Methods 7(1):64–82.
Pololi L, Price J. 2000. Validation and use of an instrument to measure the learning environment as perceived by medical students. Teach Learn Med 12(4):201–207.
Roff S, McAleer S, Harden RM, Al-Qahtani M, Ahmed AU, Deza H, Groenen G, Primparyon P. 1997. Development and validation of the Dundee Ready Education Environment Measure (DREEM). Med Teach 19:295–299.
Roff S, McAleer S, Skinner A. 2005. Development and validation of an instrument to measure the postgraduate clinical learning and teaching educational environment for hospital-based junior doctors in the UK. Med Teach 27(4):326–331.
Rowe G, Wright G, Bolger F. 1991. Delphi: A reevaluation of research and theory. Technol Forecast Soc Change 39:235–251.
Sanfey H, Stiles B, Hedrick T, Sawyer RG. 2008. Morning report: Combining education with patient handover. Surgeon 6(2):94–100.
Schonrock-Adema J, Heijne-Penninga M, van Hell EA, Cohen-Schotanus J. 2008. Necessary steps in factor analysis: Enhancing validation studies of educational instruments. The PHEEM applied to clerks as an example. Med Teach 31:1–7.
Sijtsma K, van der Ark LA. 2003. Investigation and treatment of missing item scores in test and questionnaire data. Multivariate Behav Res 38:505–528.
Streiner DL. 1994. Figuring out factors: The use and misuse of factor analysis. Can J Psychiat 39(3):135–140.
van der Hem-Stokroos HH, van der Vleuten CP, Daelmans HE, Haarman HJ, Scherpbier AJ. 2005. Reliability of the clinical teaching effectiveness instrument. Med Educ 39(9):904–910.