Beruflich Dokumente
Kultur Dokumente
a r t i c l e i n f o a b s t r a c t
Article history: The growth of big data is transforming many economic sectors, including the medical and healthcare
Received 15 October 2017 sector. Despite this, research into the practical application of data analytics to the development of health
Received in revised form 6 February 2018 policy is still limited. In this study we examine how data science and machine learning methods can
Accepted 25 February 2018
be applied to a variety of open health datasets, including GP prescribing data, disease prevalence data
Available online 2 March 2018
and economic deprivation data. This paper discusses the context of mental health and antidepressant
Keywords: prescribing in Northern Ireland and highlights its importance as a public policy issue. A hypothesis is
Health policy proposed, suggesting that the link between antidepressant usage and economic deprivation is mediated
Data analytics by depression prevalence. An analysis of various heterogeneous open datasets is used to test this
Big data hypothesis. A description of the methodology is provided, including the open health datasets under
Prescribing investigation and an explanation of the data processing pipeline. Correlations between key variables
Deprivation and several different clustering analyses are presented. Evidence is provided which suggests that the
Prevalence
depression prevalence hypothesis is flawed. Clusters of GP practices based on prescribing behaviour and
disease prevalence are described and key characteristics are identified and discussed. Possible policy
implications are explored and opportunities for future research are identified.
© 2018 Published by Elsevier Inc.
https://doi.org/10.1016/j.bdr.2018.02.002
2214-5796/© 2018 Published by Elsevier Inc.
42 B. Cleland et al. / Big Data Research 12 (2018) 41–48
Fig. 2. Data flow diagram illustrating the main data sources and data processing activities.
Classifying the data according to the ATC system also allowed pre- 3.3. Analytics tools and pipeline
scribing levels to be compared across different drugs. This was
done using a “defined daily dose” (DDD), defined by the World Jupyter Notebooks [18] were used to algorithmically restruc-
Health Organisation as “the assumed average maintenance dose ture, transform and merge datasets were required. Pandas [33]
per day for a drug used for its main indication in adults” [37]. Due and NumPy [31] were used for the statistical analysis. Visualisation
to some gaps in the NHS dm+d data, only chapters 1–10 of the of the data was also done through Jupyter Notebooks using Mat-
BNF system could be meaningfully analysed in this study. PlotLib [34] and Seaborn [35] for charts and graphs and iPyLeaflet
[17] for maps. Correlations between the key variables were ex-
plored using the Pearson correlations and p-values. The scikit-learn
library [34] was employed to perform K-means clustering on the
3.2.2. Economic deprivation data
data.
The metric for economic deprivation used in this study was
Two major challenges were addressed in the creation of the
the Multiple Deprivation Measure (MDM), which is published by
data pipeline. The first was identifying the defined daily dosage
the Northern Ireland Statistics and Research Agency (NISRA) [30].
(DDD) for each drug in the GP prescribing datasets. This was re-
Unlike in other parts of the UK, individual GP practice data does quired so that prescribing patterns could be meaningfully com-
not include deprivation measures. In order to link GP practices to pared across GP practices. In order to do this, the BNF codes used
the MDM data, the postcode for each practice was obtained from by the UK authorities had to be linked to the World Health Organ-
the Detail Data portal [8], an open data resource provided by the isation’s ATC classification system, since only the latter provides
Northern Ireland Council for Voluntary Action. The online MySo- standardised DDD data. After some searching it was discovered
ciety Mapit service [28] was then used to convert the postcode that the NHS Business Services Authority had created a table or
data into super output areas (SOA). By linking each practice to a correspondences between BNF and ATC codes for its own use, and
specific SOA we could then assign a deprivation measure for each that this table was available as open data from their website.
prescriber, based on the data supplied by NISRA [30]. While de- The second challenge was assigning economic deprivation mea-
privation measures are available for other geographic boundaries, sures to the individual GP practices. Unlike in the rest of the UK,
super output areas were chosen due to their relatively high granu- GP practice data does not include this information. This problem
larity and stability compared to other boundaries such as electoral was resolved by using practice data to interrogate the online Mapit
wards. service (provided by the not-for-profit MySociety organisation),
which allowed a Special Output Area (SOA) to be assigned to each
practice. These SOAs were then matched to the economic depriva-
tion data from NISRA to allocate a multiple deprivation measure
3.2.3. Disease prevalence data
(mdm) to each practice.
Disease prevalence data, that is the number of patients per
The resulting data pipeline is outlined in Fig. 2.
1000 diagnosed as suffering from different diseases, is available
across most of the UK under the Quality Outcomes Framework
3.4. Results
(QOF), a collection of data which is designed to measure GP perfor-
mance in order to support GP payments. In Northern Ireland the 3.4.1. Variation in prescribing patterns across GP practices
QOF data no longer includes disease prevalence figures, but for- Prescribing behaviour for all GP Practices was analysed accord-
tunately these are still published separately by the Department of ing to how many defined daily doses were prescribed by each
Health [13]. Since the prevalence data is linked directly to GP prac- practice per patient per month, grouped by BNF chapter. Pre-
tice identifiers, it allows for a direct comparison between the num- scribing levels for each drug type were then visualised on a box
ber of patients being diagnosed with depression and the amount plot (Fig. 3). As can be seen from the diagram, the most heav-
of antidepressants being prescribed. ily prescribed drugs are those for central nervous system disorders
44 B. Cleland et al. / Big Data Research 12 (2018) 41–48
Fig. 5. Heat map showing correlations between economic deprivation (mdm), an-
tidepressant prescribing (ddd_per_patient), and other drug prescribing according to
BNF chapter. (For interpretation of the colours in the figure(s), the reader is referred
to the web version of this article.)
Fig. 6. Heat map showing correlations between economic deprivation (mdm), an-
tidepressant prescribing (ddd_per_patient), and other disease prevalences.
before the findings may be extended to other drug categories, it is strongly correlated with both mental health issues and dia-
types of illness, or other regions. Moreover, since it relied purely betes prevalence. This suggests that antidepressants are being pre-
on open data, factors such as GP gender, which previous research scribed to patients diagnosed with other mental health issues. It
has indicated might be significant [24], have not been examined in also implies that antidepressant usage is strongly linked to dia-
this study. Generally speaking, there are a range of variables which betes, which reflects other studies on mental health comorbidities
cannot be fully explored in an open data context due to privacy or in diabetic patients [8]. On the other hand, the lack of corre-
confidentiality considerations. lation between antidepressants and depression prevalence once
raises questions about how patients receiving antidepressants are
4.3. Antidepressant prescribing and depression prevalence classified in terms of depression and other disease categories. Ul-
timately, an improved understanding of the relationships between
There has been some exploration within the literature of the diagnosis and prescribing may lead to a better explanation of the
validity of using pharmaceutical data as a proxy for measuring factors driving antidepressant usage, and of the wider healthcare
clinical conditions in a given population [6,7,14,16]. The arguments implications of this trend.
in favour of such an approach include that fact that disease preva-
lence data is often difficult to capture accurately [36], may be 4.5. Clustering practices based on prescribing patterns and disease
inconsistently reported [6], and in some cases may not be available prevalence
at all [7]. In such situations, prescribing data, if available, may offer
The clustering of GP practices based on prescribing patterns ac-
an attractive alternative for studying public health issues. Indeed,
cording to BNF drug classifications shows two quite distinct groups
some studies have suggested a strong correlation between some
based on the overall level of prescribing. The group of practices in
antidepressants and clinical diagnoses. Gardarsdottir et al. [10] for
more deprived areas tended to show higher levels of prescribing in
example observed that 73% of patients on an SSRI had a diagnosis
all categories except “Malignant Disease”. A three-cluster analysis
of depression or anxiety, while Henriksson et al. [12] found that
of the same data showed a similar pattern, but effectively parti-
82% using SSRIs had been diagnosed with depression.
tioned the low-deprivation practices into two clusters – a higher
While the case for using pharmaceutical data as a tool for pub-
prescribing group and a lower prescribing group. None of these
lic health surveillance may seem fairly strong, there are a num-
clusters were characterised by strong differences in prescribing for
ber of important caveats that should be taken into consideration.
a particular type of drug, but rather by higher or lower prescribing
Firstly, as Henriksson et al. [10] show, not all drugs are equally
levels across all categories.
strongly correlated with specific illnesses. Many drugs have more
The second clustering analysis, which used disease prevalence
than one use, and many antidepressants are used for a wide range
to classify GP practices, similarly grouped practices into higher and
of disorders including fibromyalgia, chronic pain, eating disorders,
lower levels of prevalence. Interestingly, in the two cluster analysis,
insomnia and migraine [10]. An analysis by Mercier et al. [21] of
there was no clear distinction between the clusters in terms of de-
antidepressant prescribing among French GPs found that 20% of
privation scores. When the prevalence data was analysed in terms
prescriptions are potentially unrelated to any psychiatric condition.
of three clusters, the practices were once again split along depri-
The perceived “safety” of SSRIs may exacerbate the tendency to-
vation lines. The GP practices with higher deprivation scores were
wards prescribing for a wider range of conditions [17].
divided into those with high, medium or low prevalence measures
Our results show that the link between depression prevalence
across all disease types.
and antidepressant prescribing is weak. On the other hand, there
does appear to be a strong correlation between antidepressant pre-
5. Conclusion
scribing and prevalence of mental health disorders. This suggests
that there may be an issue with how patients who receive antide-
Mental health has been clearly identified as a priority area for
pressants are classified – in particular whether they are classified
public investment, both within Northern Ireland and across the UK
as suffering from either depression or from mental health issues.
as a whole. Policymakers have also recognised the need to address
In any case, it is clear that caution must be exercised when us-
effectiveness and efficiency in how these services are delivered.
ing prescribing data as in indicator of depression prevalence in the
The impact of big data on other sectors suggests that data science
wider population.
can help to meet some of these needs, although this is an area
that is currently under-explored within the literature. In this study
4.4. Factors driving antidepressant prescribing we investigated data analytics might be applied to open prescrib-
ing data, disease prevalence data and economic deprivation data in
If, as the data suggests, dramatic increases in the levels of an- order to shine a light on public policy and service delivery issues.
tidepressant prescribing are not being motivated by corresponding Our findings highlight both the limitations and opportunities
increases in depression prevalence, the question arises as to what of such an approach. Some widespread assumptions about the
other factors might be behind these trends. A number of studies value of prescribing data as a proxy for disease prevalence have
have attempted to answer this question by interviewing GPs in or- been called into question. On the other hand, our analysis sug-
der to get their perspective on the problem. McDonald et al. [17] gests that interesting correlations between disease prevalences and
propose that few GPs believe that depression levels has actually GP prescribing do exist, and may have useful implications for fu-
risen, and that there were some concerns about whether current ture policymaking. Moreover, the clustering of GP practices based
prescribing levels were appropriate. When asked to identify factors on depression prevalence data implies that different populations
that were driving the increase in prescribing rates, suggestions in- respond to economic deprivation factors in different ways – an
cluded: successful awareness-raising campaigns on depression, the insight that merits further investigation by both researchers and
perceived safety of selective serotonin reuptake inhibitors (SSRIs), policymakers.
and a willingness among patients to ask for help. Some clinicians The seeming weakness of the correlation between antidepres-
believed that normal human “unhappiness” was being inappropri- sant prescribing and depression prevalence calls into question the
ately interpreted as a medical condition. medical basis for increasing antidepressant usage. Within North-
A key finding of our study is that antidepressant prescribing ern Ireland, where levels of antidepressant prescribing greatly ex-
is not strongly correlated with depression prevalence, although ceed other countries, this is a particularly urgent concern. In order
48 B. Cleland et al. / Big Data Research 12 (2018) 41–48
to address this problem, further data analysis might be used to [13] N.I. Health, 2016/17 raw disease prevalence trend data for North-
identify anomalous prescribing patterns and thereby enable tar- ern Ireland, https://www.health-ni.gov.uk/publications/201617-raw-disease-
prevalence-trend-data-northern-ireland, 2017.
geted interventions by the Department of Health. Such analysis
[14] S. Henriksson, G. Boëthius, J. Hakansson, G. Isacsson, Indications for and out-
could also help to identify hidden environmental or clinical factors, come of antidepressant medication in a general population: a prescription
and to provide GPs with information to support clinical decision- database and medical record study in Jämtland county, Sweden, 1995, Acta
making. Psychiatr. Scand. (2016).
Future research in this area might examine why depression [15] House of Commons, The Big Data dilemma, https://www.publications.
parliament.uk/pa/cm201516/cmselect/cmsctech/468/46802.htm, 2016.
prevalence and antidepressant prescribing are so weakly correlated
[16] C.A. Huber, T.D. Szucs, R. Rapold, O. Reich, Identifying patients with chronic
and what the potential implications are for public policy. The exis-
conditions using pharmacy data in Switzerland: an updated mapping approach
tence of distinct depression prevalence clusters also merits a more to the classification of medications, BMC Public Health 13 (2013) 1030.
detailed investigation. Interviews with GPs and other experts might [17] iPyLeaflet, An IPython widget for leaflet maps, https://github.com/ellisonbg/
be helpful in order to gain information that is not apparent from ipyleaflet, 2017.
a purely quantitative analysis. Finally, while this study has focused [18] Jupyter, Project Jupyter, http://jupyter.org/, 2017.
[19] H.M. Krumholz, Big Data and new knowledge in medicine: the thinking, train-
on the use of open datasets, combining these with non-open data ing, and tools needed for a learning health system, Health Aff. (Millwood, Va.)
about individual GPs and patients might allow more granular ex- 33 (2014) 1163–1170, https://doi.org/10.1377/hlthaff.2014.0053.
ploration and provide further policy insights. [20] G. Loewenstein, D.A. Asch, J.Y. Friedman, L.A. Melichar, K.G. Volpp, Can be-
havioural economics make us healthier? 2012.
[21] S. Macdonald, J. Morrison, M. Maxwell, R. Munoz-Arroyo, A. Power, M. Smith,
Acknowledgements
M. Sutton, P. Wilson, “a coal face option”: GPs’ perspectives on the rise in an-
tidepressant prescribing, Br. J. Gen. Pract. 59 (2009) e299–e307.
The MIDAS Consortium gratefully acknowledge the support to [22] A. Maguire, C. Hughes, C. Cardwell, D. O’Reilly, Psychotropic medications and
this project from the European Union research fund ’Big Data Sup- the transition into care: a national data linkage study, J. Am. Geriatr. Soc. 61
(2013) 215–221, https://doi.org/10.1111/jgs.12101.
porting Public Health Policies’, under Grant Agreement No. 727721
[23] A. Maguire, M. McCann, J. Moritarty, D. O’Reilly, The grief study: using admin-
(H2020-SC1-2016-CNECT SC1-PM-18-2016). istrative data to understand the mental health impact of bereavement, Eur. J.
Public Health (2014), https://doi.org/10.1093/eurpub/cku165.058.
References [24] J. McClure, The script report, http://script-report.thedetail.tv/, 2014.
[25] D. Meeker, T.K. Knight, M.W. Friedberg, J.A. Linder, N.J. Goldstein, C.R. Fox, A.
Rothfeld, G. Diaz, J.N. Doctor, Nudging guideline-concordant antibiotic prescrib-
[1] J. Andreu-Perez, C.C.Y. Poon, R.D. Merrifield, S.T.C. Wong, G.Z. Yang, Big Data for
ing: a randomized clinical trial, JAMA Intern. Med. 174 (2014) 425–431.
health, IEEE J. Biomed. Health Inform. 19 (2015) 1193–1208, https://doi.org/10.
[26] Mental Health Foundation, Mental health in Northern Ireland: fundamen-
1109/JBHI.2015.2450362.
tal facts, https://www.mentalhealth.org.uk/sites/default/files/FF16%20Northern%
[2] Bamford Review of Mental Health and Learning Disability. A Vision of a Com-
20ireland.pdf, 2016.
prehensive Child and Adolescent Mental Health Service (CAMHs), DHSSPS,
[27] J. Morrison, M-J. Anderson, M. Sutton, R. Munoz-Arroyo, S. McDonald, M.
Belfast, 2006.
Maxwell, A. Power, M. Smith, P. Wilson, Factors influencing variation in pre-
[3] J. Betts, J. Thompson, Mental Health in Northern Ireland: Overview, Strategies,
scribing of antidepressants by general practices in Scotland, Br. J. Gen. Pract.
Policies, Care Pathways, CAMHS and Barriers to Accessing Services, 2017.
59 (2009) e25–e31.
[4] G.S. Birkhead, M. Klompas, N.R. Shah, Uses of electronic health records for pub-
[28] MySociety, MapIt, https://mapit.mysociety.org/, 2017.
lic health surveillance to advance public health, Annu. Rev. Public Health 36
[29] NHS Digital, NHSBSA dm+d and NHSBSA dm+d supplementary, https://isd.
(2015) 345–359, https://doi.org/10.1146/annurev-publhealth-031914-122747.
digital.nhs.uk/, 2017.
[5] B.P. Bunting, S.D. Murphy, S.M. O’Neill, F.R. Ferry, Lifetime prevalence of mental
[30] NISRA, Northern Ireland multiple deprivation measure 2010: SOA results,
health disorders and delay in treatment following initial onset: evidence from
https://www.nisra.gov.uk/publications/nothern-ireland-multiple-deprivation-
the Northern Ireland Study of Health and Stress, Psychol. Med. 42 (8) (2011)
measure-2010-soa-results, 2017.
1727–1739, https://doi.org/10.1017/S0033291711002510.
[31] Numpy, http://www.numpy.org/, 2017.
[6] F. Chini, P. Pezzotti, L. Orzella, P. Borgia, G. Guasticchi, Can we use the phar-
[32] Open Data NI, GP prescribing data, https://www.opendatani.gov.uk/dataset/gp-
macy data to estimate the prevalence of chronic conditions? A comparison of
prescribing-data, 2017.
multiple data sources, BMC Public Health 11 (2011) 688.
[33] Pandas, Python Data Analysis Library, https://pandas.pydata.org/, 2017.
[7] A. Coates, A.Y. Ng, Learning feature representations with k-means, in: Neural
[34] Scikit-learn, Machine learning in Python, http://scikit-learn.org/stable/, 2017.
Networks: Tricks of the Trade, Springer, Berlin, Heidelberg, 2012, pp. 561–580.
[35] Seaborn. Seaborn: statistical data visualization, https://seaborn.pydata.org/,
[8] Detail Data, GP practices, http://data.nicva.org/dataset/gp-practices, 2017.
2017.
[9] L. Ducat, L.H. Philipson, B.J. Anderson, The mental health comorbidities of dia-
[36] M. Vercambre, F. Gilbert, Respondents in an epidemiologic survey had fewer
betes, JAMA 312 (7) (2014) 691–692.
psychotropic prescriptions than nonrespondents: an insight into health-related
[10] L.E. Egede, D. Zheng, K. Simpson, Comorbid depression is associated with in- selection bias using routine health insurance data, J. Clin. Epidemiol. (2012),
creased health care use and expenditures in individuals with diabetes, Diabetes https://doi.org/10.1016/j.jclinepi.2012.05.002.
Care 25 (3) (2002) 464–470. [37] WHO Collaborating Centre for Drug Statistics Methodology, Structure and prin-
[11] H. Gardarsdottir, R. Heerdink, L. van Dijk, A. Egberts, Indications for antidepres- ciples, https://www.whocc.no/atc/structure_and_principles/, 2017.
sant drug prescribing in general practice in the Netherlands, J. Affect. Disord. [38] M. Yang, M. Kiang, W. Shang, Filtering big data from social media – building an
(2007). early warning system for adverse drug reactions, J. Biomed. Inform. 54 (2015)
[12] J.S. Gonzalez, Depression, in: A. Peters, L. Laffel (Eds.), Type 1 Diabetes Source- 230–240, https://doi.org/10.1016/j.jbi.2015.01.011.
book, American Diabetes Association, Alexandria, VA, 2013, pp. 169–179.