
Pilot study for ontology-based analysis of INPC data:
Final report
T. Schleyer, A. Ruttenberg, B. Duncan,
F. Smith, A. Roberts
Original project goals
select several representative research
questions that use INPC data
model the data needed for these questions in
an ontology
replicate data retrieval/analysis using SPARQL
and R
compare understandability, documentation,
query complexity, workflow and extensibility
INPC data analysis current workflow
Receive request for data
Search for data
Return results to requestor
- Often a list of terms/criteria or a brief written description
- Sometimes a spreadsheet of codes (ICD9, CPT, etc) to search for
- Find any codes needed (e.g. look up medications by name or class)
- Map between coding systems (e.g. ICD9 to Regenstrief dictionary)
- Have requestor review codes
- Perform search across numerous tables, some of which duplicate information
- Iterative process: refine and re-run query
Challenges with current process
data managers = Greek Oracle
relational database is a technical/idiosyncratic
construct (e.g. naming constraints,
normalization, performance)
meaningful, real-time interaction about data
difficult
little to no opportunities to leverage external
data representation resources
hard to detect problems

Relational databases and hidden meaning
sys_id is the coding system, such as ICD9,
local codes, SNOMED, LOINC.
code is the actual code, like ICD 920.1.
service_code is the question that the record is the answer to.
top_parent_service_code is the code of the parent question.
value_type indicates what the type is for the data (coded, numeric, etc.).
Sample query: Vending Machine:10 Breast cancer

WHERE
SERVICE_SYS_ID=1 and (SERVICE_CODE in
('189' /*DX and COMPLAINTS*/, '4569' /*E.R. DIAGNOSIS*/, '4966' /*HOSP DX*/,
'7076' /*DX LISTS*/, '7686' /*HOSP HX*/, '7909' /*DISCH DX*/,
'9950' /*REHAB DX*/, '9951' /*ORTHO DX*/, '9952' /*SURG DX*/,
'9953' /*ENT DX*/, '9954' /*EYE DX*/, '9955' /*DERM DX*/,
'9956' /*NEURO DX*/, '7909' /*Disch Dx*/,
'14360' /*OB Discharge Diagnosis*/, '36129' /*Axis IV Discharge Dx*/,
'3871' /*Initial Dx*/, '16501' /*Discharge Dx/Prob*/,
'19825' /*Ekg.Cart.Dx*/, '21827' /*DIDS DX*/,
'21669' /*ANDROLOGY DIAGNOSIS*/, '22813' /*VISIT DIAGNOSIS*/,
'21237' /*Primary Care Dx*/, '19788' /*Preoperative Diagnosis*/,
'37081' /*OB Triage Admission Diagnoses*/,
'37086' /*OB Triage Discharge Diagnoses*/)
and (upper(VALUE_TEXT_FOR_DISPLAY)='BREAST CA'))
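One way to make a query like this easier to maintain is to keep the diagnosis service codes in a named, labeled collection and generate the IN list from it. The following is a minimal sketch, not the query actually used on INPC data: the table name `clinical_variable`, the `patient_id` column, and the toy rows in an in-memory SQLite database are assumptions for illustration. Only the other column names and the codes come from the sample query.

```python
import sqlite3

# A few of the diagnosis "question" codes from the sample query, with their labels.
DIAGNOSIS_SERVICE_CODES = {
    "189": "DX and COMPLAINTS",
    "4569": "E.R. DIAGNOSIS",
    "4966": "HOSP DX",
    "7909": "DISCH DX",
}

def build_query(codes):
    """Build a parameterized query with one placeholder per service code."""
    placeholders = ", ".join("?" for _ in codes)
    return (
        "SELECT patient_id FROM clinical_variable "  # hypothetical table/column
        "WHERE SERVICE_SYS_ID = 1 "
        f"AND SERVICE_CODE IN ({placeholders}) "
        "AND upper(VALUE_TEXT_FOR_DISPLAY) = ?"
    )

# Toy data standing in for the real table (assumption, not INPC data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clinical_variable ("
    "patient_id INTEGER, SERVICE_SYS_ID INTEGER, "
    "SERVICE_CODE TEXT, VALUE_TEXT_FOR_DISPLAY TEXT)"
)
conn.executemany(
    "INSERT INTO clinical_variable VALUES (?, ?, ?, ?)",
    [
        (1, 1, "189", "Breast CA"),    # matches after upper()
        (2, 1, "4569", "Fracture"),    # diagnosis question, other answer
        (3, 1, "99999", "BREAST CA"),  # not one of the diagnosis questions
        (4, 2, "189", "BREAST CA"),    # different coding system
    ],
)

codes = sorted(DIAGNOSIS_SERVICE_CODES)
rows = conn.execute(build_query(codes), codes + ["BREAST CA"]).fetchall()
matching_patients = [row[0] for row in rows]
```

Even in this form the meaning of each code lives in the labels next to it, which is the documentation problem the ontology approach is meant to address.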
What are ontologies?
represent domains through classes and their
relationships
Each class in an ontology has a defined and
unique meaning.
Properties are semantic relationships among
classes, e.g.:
simple: "Patient has": gender, age
complex: is_a, is_treated_by, etc.
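The idea of classes linked by is_a and other relationships can be sketched in a few lines. This is a toy illustration only: the class names and the single `is_treated_by` assertion are invented for the example and are not taken from any of the ontologies used in the project.

```python
# Toy class hierarchy: child class -> parent class (the is_a relationship).
IS_A = {
    "breast cancer": "cancer",
    "cancer": "disease",
}

# Toy non-hierarchical relationship between classes (illustrative only).
IS_TREATED_BY = {
    "breast cancer": ["antineoplastic drug"],
}

def ancestors(cls):
    """Return every superclass of cls by following is_a links upward."""
    result = []
    while cls in IS_A:
        cls = IS_A[cls]
        result.append(cls)
    return result

def is_a(cls, candidate):
    """True if cls is the candidate class or one of its subclasses."""
    return cls == candidate or candidate in ancestors(cls)
```

Because is_a is transitive, anything asked about "disease" automatically covers "breast cancer" without listing it explicitly.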

Example: Oral Health and Disease Ontology

(http://code.google.com/p/ohd-ontology/, http://www.ontobee.org/browser/index.php?o=OHD)
OHD - Caries finding

OHD tooth restoration procedure

OHD tooth

Reusing other ontologies

Finding breast cancer drugs
First we find cancer patients by querying for patients that:
have a cancer diagnosis ICD9 code
have a concept code in clinical variable that identifies a cancer
diagnosis
Found a total of ~1500 patients for the 1 year of records we
have.
We search the pharmacy_order table for prescriptions to
cancer patients:
About 39,000 total: 26,000 have NDC codes, 13,000 don't!
The 13,000 prescriptions comprise ~400 prescription types
Examples include: MORPHINE SUL TAB 30MG ER, NAMENDA
TAB 5MG, NITROFURANTN CAP 100MG
Note that queries done at Regenstrief typically will miss 1/3 of
the prescriptions.
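The split described above (prescriptions with and without NDC codes, and the number of distinct prescription types among the uncoded ones) can be sketched as follows. The rows are toy data: the three uncoded descriptions are the examples named on the slide, while the coded rows and their NDC values are made-up placeholders.

```python
from fractions import Fraction

# Toy rows: (description, NDC code or None). NDC values are placeholders.
prescriptions = [
    ("MORPHINE SUL TAB 30MG ER", None),
    ("MORPHINE SUL TAB 30MG ER", None),
    ("NAMENDA TAB 5MG", None),
    ("EXAMPLE DRUG A", "00000-0000-01"),
    ("EXAMPLE DRUG A", "00000-0000-01"),
    ("EXAMPLE DRUG B", "00000-0000-02"),
    ("EXAMPLE DRUG C", "00000-0000-03"),
    ("EXAMPLE DRUG C", "00000-0000-03"),
    ("EXAMPLE DRUG D", "00000-0000-04"),
]

with_ndc = [p for p in prescriptions if p[1] is not None]
without_ndc = [p for p in prescriptions if p[1] is None]

# Distinct free-text prescription types that a code-based query would miss.
uncoded_types = sorted({description for description, _ in without_ndc})

# Share of prescriptions invisible to a query that searches by NDC code only.
fraction_missed = Fraction(len(without_ndc), len(prescriptions))
```

With the slide's figures the same arithmetic gives 13,000 / 39,000, i.e. one third of prescriptions missed.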

Components
Cancer patients
Prescriptions for them
Diagnoses of them
ICD9 Hierarchy
NDF-RT OWL translation
Mapping of NDC to RxNorm
Mapping of RxNORM to NDF-RT
Representation choices
Codes are information artifacts, about whom or
what they are coded.
Patients are actual patients.
NDF-RT are actual drugs.
Prescriptions are directive information entities.
OBO Ontologies: OBI, IAO, OGMS, OMRSE
Other ontologies/documents: NDF-RT, ICD9
Web services: RxNorm API
Store: OWLIM SE, Horst
Key Leverage
Use of NDF-RT hierarchies and relations
Ingredients
Physiological effects
Therapeutic classes
Cause May treat, Mechanism of Action
Use of ICD9, limited as it is
Leverage classification to be able to compute
malignant neoplasms = neoplasms - benign
neoplasms
Transparency of data artifacts
Data team has learned about structure in process.
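The "neoplasms minus benign neoplasms" computation amounts to a set difference over a class hierarchy. The sketch below assumes a deliberately tiny, hand-made hierarchy: the grouping nodes are illustrative and the real ICD9 structure is richer than this.

```python
# Toy hierarchy: child -> parent. Grouping nodes are illustrative only.
PARENT = {
    "174": "neoplasms",          # malignant neoplasm of female breast
    "174.9": "174",
    "benign neoplasms": "neoplasms",
    "217": "benign neoplasms",   # benign neoplasm of breast
}

def descendants(node):
    """Return all classes below node in the hierarchy."""
    found = set()
    for child, parent in PARENT.items():
        if parent == node:
            found.add(child)
            found |= descendants(child)
    return found

neoplasms = descendants("neoplasms")
benign = descendants("benign neoplasms") | {"benign neoplasms"}

# Everything classified under neoplasms that is not under benign neoplasms.
malignant = neoplasms - benign
```

The point of the design is that the set is computed from the classification each time, so it does not have to be maintained by hand as codes are added.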

RxNorm to NDF-RT
Restricted to cancer patients in 1 year
Find all prescription NDC codes
Use internal concept mapping to get 1037
RxNorm codes
Use NDF-RT to get 47488 NDF-RT<->RxNorm
mappings using SPARQL against OWL NDF-RT


prefix rxcui: <http://purl.obolibrary.org/n/NDFRT_C818>
SELECT ?class ?rxnorm
WHERE {
  ?class rxcui: ?rxnorm .
}
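For readers unfamiliar with SPARQL, the query above is a single triple pattern: fix the predicate and return every subject/object pair. A minimal sketch of that matching in plain Python follows. The triples and all identifiers in them are placeholders, not real NDF-RT or RxNorm values, and a real triple store does far more than this.

```python
# Toy triples (subject, predicate, object); every identifier is a placeholder.
RXCUI = "rxcui"
triples = [
    ("ndfrt:class_1", RXCUI, "1001"),
    ("ndfrt:class_1", "label", "example drug"),
    ("ndfrt:class_2", RXCUI, "1002"),
]

def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

# Counterpart of: SELECT ?class ?rxnorm WHERE { ?class rxcui: ?rxnorm . }
class_to_rxnorm = [(s, o) for s, _, o in match(triples, p=RXCUI)]
```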




RxNorm to NDF-RT
328 RxNorms not in NDF-RT derived map
Use the RxNorm web API to find:
more general term
or, remapped term
more general term of remapped term
remapped, remapped term
more general term of remapped, remapped term
and add mapping if found
Leaving: 21 unmapped terms
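The fallback steps listed above can be read as a cascade that is tried in order until one candidate is found in the NDF-RT derived map. The sketch below assumes that reading. It uses plain dictionaries in place of calls to the RxNorm web API, and the names `more_general`, `remapped`, and all identifiers are hypothetical.

```python
def resolve(rxcui, mapped, more_general, remapped):
    """Try each fallback in slide order; return the NDF-RT id of the first hit.

    mapped:       RxNorm code -> NDF-RT id (the derived map)
    more_general: RxNorm code -> more general RxNorm code (stand-in for API)
    remapped:     RxNorm code -> remapped RxNorm code (stand-in for API)
    """
    general = more_general.get
    remap = remapped.get
    steps = [
        lambda c: general(c),                # more general term
        lambda c: remap(c),                  # remapped term
        lambda c: general(remap(c)),         # more general term of remapped term
        lambda c: remap(remap(c)),           # remapped, remapped term
        lambda c: general(remap(remap(c))),  # more general of remapped, remapped
    ]
    for step in steps:
        candidate = step(rxcui)
        if candidate is not None and candidate in mapped:
            return mapped[candidate]
    return None  # stays unmapped

# Toy lookups; every code here is a placeholder.
mapped = {"G1": "NDFRT-A", "R2": "NDFRT-B"}
more_general = {"X1": "G1"}
remapped = {"X2": "R1", "R1": "R2"}

via_general = resolve("X1", mapped, more_general, remapped)
via_double_remap = resolve("X2", mapped, more_general, remapped)
unresolved = resolve("X3", mapped, more_general, remapped)
```

Codes for which every step fails are the ones left over as unmapped terms.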

RxNorm to NDF-RT mapping

1037 tried

1016 successful
9 have RxNorm codes that can't be resolved
207982, 309937, 311945, 314058, 314265, 404282, 562715, 845521, 966533
12 were not mapped
0.5 ML Influenza A virus vaccine, A-California-7-2009 (H1N1)-like virus 0.12 MG/ML / Influenza A virus vaccine, A-Victoria-361-2011 (H3N2)-like virus 0.12 MG/ML / Influenza B virus vaccine, B-Wisconsin-1-2010-like virus 0.12 MG/ML Prefilled Syringe [Fluzone High-Dose 2012-2013 Formula]
Coal Tar 200 MG/ML Topical Solution
Influenza A virus vaccine, A-California-7-2009 (H1N1)-like virus 0.03 MG/ML / Influenza A virus vaccine, A-Victoria-361-2011 (H3N2)-like virus 0.03 MG/ML / Influenza B virus vaccine, B-Wisconsin-1-2010-like virus 0.03 MG/ML Injectable Suspension [Fluzone 2012-2013 Formula]
Isopropyl Alcohol 0.7 ML/ML Medicated Pad [BD Alcohol]
Isopropyl Alcohol 0.7 ML/ML Medicated Pad
POLYETHYLENE GLYCOL 3350 105 MG/ML / Potassium Chloride 0.00497 MEQ/ML / Sodium Bicarbonate 0.017 MEQ/ML / Sodium Chloride 0.0479 MEQ/ML Oral Solution [NuLytely]
POLYETHYLENE GLYCOL 3350 105 MG/ML / Potassium Chloride 0.00497 MEQ/ML / Sodium Bicarbonate 0.017 MEQ/ML / Sodium Chloride 0.0479 MEQ/ML Oral Solution [TriLyte]
POLYETHYLENE GLYCOL 3350 59 MG/ML / Potassium Chloride 0.01 MEQ/ML / Sodium Bicarbonate 0.02 MEQ/ML / Sodium Chloride 0.025 MEQ/ML / sodium sulfate 0.04 MEQ/ML Oral Solution [GaviLyte-G]
POLYETHYLENE GLYCOL 3350 59 MG/ML / Potassium Chloride 0.01 MEQ/ML / Sodium Bicarbonate 0.02 MEQ/ML / Sodium Chloride 0.025 MEQ/ML / sodium sulfate 0.04 MEQ/ML Oral Solution [Golytely]
POLYETHYLENE GLYCOL 3350 59 MG/ML / Potassium Chloride 0.01 MEQ/ML / Sodium Bicarbonate 0.02 MEQ/ML / Sodium Chloride 0.025 MEQ/ML / sodium sulfate 0.04 MEQ/ML Oral Solution
Prednisone 10 MG Oral Tablet
hydrocortisone acetate 10 MG/ML / Pramoxine hydrochloride 10 MG/ML Topical Foam [Epifoam]
Lessons learned
discovery of data quality issues, such as missing
results and data irregularities
maintaining classes easier than maintaining
queries and sets
leveraging other people's work reduces your own
transparency of data discovery/query refinement
process
inherent documentation in ontologies (as
opposed to information in Faye's head)
Thank you for your attention.

Questions?
