Methods to Identify Patient Clusters and Build Precision Analytics for Diagnosis, Prognosis, and Treatment

Nicole Meister¹, Hannah Cowley¹, Corban Rivera¹, Karla M Gray-Roncal¹, Kathryn Fitzgerald², Claudia Allshouse³, Anna Duerr³, Aalok Shah³, Paul Nagy³, Peter A Calabresi², Antony Rosen³, Ellen M Mowry³ and William R Gray-Roncal¹
¹Johns Hopkins University Applied Physics Laboratory, Laurel, MD; ²Johns Hopkins University School of Medicine, Baltimore, MD; ³Johns Hopkins University, Baltimore, MD

INTRODUCTION

• Precision medicine promises great advances in the treatment of multiple sclerosis (MS) by leveraging data science techniques for rapid diagnosis and targeted treatment
• Disparate data sources make it difficult to assemble large datasets (Fig. 1), and the lack of a standardized data science framework makes it challenging to extend or explore various approaches

OBJECTIVE

• Develop a proof-of-concept toolkit for data fusion, quality assessment, sub-cohort identification, and predictive analytics
• Lower the barriers to adoption and facilitate data science

Figure 1: Creating enriched datasets allows for more accurate prediction of MS outcomes. [Diagram labels: Data Results, MRI Results]

METHODOLOGY

• Our tools are organized into easy-to-use Python packages to be deployed in a Jupyter notebook environment
• We support both data-driven exploration and clinician-guided discovery to enable flexible and iterative experimentation (Fig. 2)

RESULTS

• We provide an initial toolbox to deploy data science methods across various patient data, providing access to standardized models, novel features, and visualization (Fig. 3)
• Clustering based on clinical expertise and machine learning methods helps researchers find sub-cohorts that may be used in prediction and treatment (Fig. 4)
• Our packages allow for quickly switching between research questions and models; we demonstrate these tools by predicting 25-ft walk time and Patient-Determined Disease Steps (PDDS) scores (Fig. 5)

Figure 3: Pairwise relationships between selected variables illustrate the problem complexity and opportunity when exploring features and patient information at scale

[Figure 4 panels: A. Clinician-selected clusters; B. Data-driven clusters in PCA space]
[Figure 5 panels: A. Histogram of PDDS score absolute errors (n=1653), axes: PDDS error and cohort percentage; B. Histogram of 25-ft walk time absolute errors (n=1653), axis: cohort percentage]
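As an illustration of the clinician-guided route to sub-cohort discovery, here is a minimal sketch of K-Means clustering on three clinician-selected features. It is not the toolkit's own code: the function names (`standardize`, `kmeans`), the synthetic patient profiles, and the choice of a dependency-free implementation of Lloyd's algorithm are all assumptions made for illustration. A notebook workflow would more likely rely on an established machine learning library, and the data-driven route could apply the same clustering step to features projected into PCA space.

```python
import random
import statistics


def standardize(rows):
    """Rescale every feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Guard against constant columns, whose standard deviation is zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: returns one cluster label per point and the centroids."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct, randomly chosen points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each patient joins the nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [statistics.fmean(col) for col in zip(*members)]
    return labels, centroids


# Synthetic stand-in data (hypothetical values, not from the MS PATHS cohort):
# columns are cognitive function score, years of education, fatigue score.
rng = random.Random(42)
profiles = [(30.0, 12.0, 60.0), (45.0, 16.0, 35.0), (60.0, 18.0, 15.0)]
patients = [
    [rng.gauss(cog, 3.0), rng.gauss(edu, 1.0), rng.gauss(fat, 4.0)]
    for cog, edu, fat in profiles
    for _ in range(50)
]

labels, centroids = kmeans(standardize(patients), k=3)
for j in range(3):
    print(f"sub-cohort {j}: {labels.count(j)} patients")
```

Standardizing first keeps a feature with a large numeric range from dominating the distance computation, which matters when clinical scores are measured on different scales.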

Figure 2: Two synergistic approaches capture and extend clinical insights through hypothesis-driven and data-driven approaches

Figure 4: K-Means clustering to discover sub-cohorts using clinician-defined and data-driven features. (A) Visualization of 3 sub-cohorts identified using the clinician-selected features of cognitive function, years of education, and fatigue scores. (B) Visualization of 5 sub-cohorts discovered using data-driven clustering.

Figure 5: Using our tools and the JHU MS PATHS cohort, researchers can model and visualize their results for two complementary research questions. (A) Logistic regression modeling of PDDS score (absolute error). (B) Linear regression model predicting 25-ft walk time (absolute error, in seconds).

DISCUSSION

• We present an initial suite of tools to support cohort discovery and predictive analytics for precision medicine
• Generalized linear models provide accurate predictions of 25-ft walk time and PDDS score, shown here as an engineering example
• Future work will package and extend these tools to a larger clinical research community to facilitate extensible and reproducible analysis
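The walk-time prediction task reported in this poster can be sketched as a linear regression followed by a histogram of absolute errors, expressed as a percentage of the evaluated cohort. The example below is only a sketch under stated assumptions: the single predictor, the synthetic data, the hold-out split, and the helper names (`fit_line`, `error_histogram`) are hypothetical and are not taken from the authors' packages or from the MS PATHS data.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y ~ intercept + slope * x (one predictor)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def error_histogram(abs_errors, bin_width, n_bins):
    """Percentage of the cohort per absolute-error bin; the last bin is open-ended."""
    counts = [0] * n_bins
    for err in abs_errors:
        counts[min(int(err // bin_width), n_bins - 1)] += 1
    return [100.0 * c / len(abs_errors) for c in counts]


# Synthetic stand-in data: a hypothetical clinical predictor and a
# 25-ft walk time in seconds that depends on it linearly, plus noise.
rng = random.Random(0)
predictor = [rng.uniform(0.0, 8.0) for _ in range(400)]
walk_time = [4.0 + 0.9 * p + rng.gauss(0.0, 1.0) for p in predictor]

# Hold out every fourth patient for evaluation.
pairs = list(zip(predictor, walk_time))
train = [pair for i, pair in enumerate(pairs) if i % 4 != 0]
held_out = [pair for i, pair in enumerate(pairs) if i % 4 == 0]

slope, intercept = fit_line([p for p, _ in train], [t for _, t in train])
abs_errors = [abs(t - (intercept + slope * p)) for p, t in held_out]

percentages = error_histogram(abs_errors, bin_width=0.5, n_bins=8)
for i, pct in enumerate(percentages):
    print(f"error >= {i * 0.5:.1f} s: {pct:.1f}% of held-out cohort")
```

Reporting errors as cohort percentages rather than raw counts keeps histograms comparable across cohorts of different sizes. For an ordinal outcome such as the PDDS score, the regression step would be swapped for a classification model (the poster names logistic regression), while the error summary stays the same.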
Acknowledgments: This work was supported by internal research, Biogen, JH Precision Medicine, MS Center of Excellence and NIH R01NS082347.