
Ontology-based Classification of Social Media Text Data

Sasank Maganti¹, Karen A Monsen², Svetlana Yarosh¹
¹Department of Computer Science and Engineering, University of Minnesota, Twin Cities
²School of Nursing, University of Minnesota, Twin Cities

INTRODUCTION
The goal is to classify CaringBridge social media journal entries into problem concepts using the Omaha System taxonomy.
CaringBridge is a web-based social network widely used for social support.
The Omaha System provides a standard taxonomy-based terminology for nursing practice.

Figure 1: Snapshot of a test journal entry from a CaringBridge site

Figure 3: Steps in checking the feasibility of using the problem concepts
Pre-process
Step 1: Search for 54 terms/stems
Step 2: Consolidate into 42 problem concepts using OR
Step 3: Search for terms/stems of definitions and signs/symptoms
Step 4: Search for related words

RESULTS OF FEASIBILITY STUDY
All 54 problem concept stems were mentioned in the journals.
Frequencies of usage range from 336 to 2,685,494.

Figure 4: Frequencies of problem concepts after step 2 (red line: combined terms and stems; blue line: step 1; x-axis: problem concepts; y-axis: frequency, 0 to 4,500,000)

Figure 7: Comparison of frequencies of all problem concepts at step 2 and step 4 (series: Combined Stems, Plus Synonyms; y-axis: frequency, 0 to 7,000,000)
Problem concepts on the x-axis: communication/community/resources, income, skin, bowel, oral, caretaking/parenting, pain, nutrition, growth/development, medication, communicable/infectious, sanitation, interpersonal/relationship, neuro/musculo/skeletal, vision, speech/language, digestion/hydration, neighborhood/workplace, residence, grief, circulation, sleep/rest, urinary, abuse, social contact, sexuality, reproductive, postpartum, neglect, pregnancy, respiration, hearing, family planning, consciousness, health care, personal care, physical activity, mental health, role change, substance use, cognition, spirituality

DISCUSSION
At step 3, the frequency of problem concept usage increased by 2.5% to 957.3%.
At step 4, the increase by problem concept ranged from 7.4% to 381.2%.
Any problem concept may be found in 11.24% of the journals.
Communicable/infectious condition, spirituality, and neuro-musculo-skeletal function are found in more than 5.8 million journals.
21 of the 42 problem concepts are found in one million or more journals.
Disease-specific diagnostic terms were also identified in the data.
Some words, such as "pray" and "sleep", have very high frequencies, which made their respective problem concepts frequent.
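
The percentage increases reported in the Discussion can be illustrated with a small calculation. The sketch below assumes that each increase is measured relative to the frequency at the preceding step, which the poster does not state explicitly. The function names and all counts are hypothetical and are not taken from the study data.

```python
def percent_increase(before, after):
    """Relative growth in frequency, in percent, from one step to the next."""
    if before <= 0:
        raise ValueError("baseline frequency must be positive")
    return 100.0 * (after - before) / before


def increase_range(before_counts, after_counts):
    """Smallest and largest percent increase over concepts present in both mappings."""
    increases = [
        percent_increase(before_counts[concept], after_counts[concept])
        for concept in before_counts
        if concept in after_counts
    ]
    return min(increases), max(increases)


# Hypothetical journal counts per problem concept, NOT the poster's data.
step2 = {"pain": 1000, "sleep/rest": 400, "vision": 200}
step3 = {"pain": 1250, "sleep/rest": 1000, "vision": 210}

low, high = increase_range(step2, step3)
print(f"{low:.1f}% to {high:.1f}%")  # prints "5.0% to 150.0%"
```

A wide range like this arises when a few concepts gain many matches from the added search terms while others gain almost none.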

Figure 5: Frequencies of the most frequent problems in journal entries after step 3 (series: Problem/stem, Combined Stems, Plus S/Sx; y-axis: frequency, 0 to 4,500,000)

Figure 6: Frequencies of the most frequent problems in the journal entries after 4 steps (series: Problem/stem, Combined Stems, Plus S/Sx, Plus Synonyms; y-axis: frequency, 0 to 7,000,000)

FUTURE WORK
Build models to classify the journal entries into problem concepts.
Select features to train the models.
Features may include problem concept stems, signs and symptoms, and related words.

ACKNOWLEDGMENT
This study is funded by CaringBridge, a non-profit organization located in Eagan, Minnesota.

METHODS
Figure 2: Omaha System
The data corpus contains 13,757,900 de-identified CaringBridge journal entries.
The data were preprocessed: stop words and HTML text were removed, and the text was lemmatized.
A 4-step pass over the data examines the feasibility of using the Omaha System problem concepts.
Related words are derived from consumer versions of the Omaha System, clinical and terminology expertise, and internet searches.
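
The preprocessing and stem search described in the Methods could look roughly like the following. This is a minimal sketch, not the authors' pipeline. It assumes that a "stem" is matched as a word prefix and that consolidation "using OR" means a journal entry counts toward a concept when any of that concept's stems appears in it. The stop-word list, stems, and sample entries are all hypothetical. Lemmatization is deliberately left out to keep the example free of external dependencies; a lemmatizer such as NLTK's `WordNetLemmatizer` could be applied to the tokens before matching.

```python
import html
import re

# Tiny illustrative stop-word list; the poster does not say which list was used.
STOP_WORDS = {"a", "an", "and", "the", "is", "was", "to", "of", "in", "for", "we", "he", "she"}

TAG_RE = re.compile(r"<[^>]+>")
TOKEN_RE = re.compile(r"[a-z]+")


def preprocess(entry):
    """Strip HTML tags and entities, lowercase, tokenize, and drop stop words."""
    text = html.unescape(TAG_RE.sub(" ", entry))
    return [tok for tok in TOKEN_RE.findall(text.lower()) if tok not in STOP_WORDS]


def mentions(tokens, stems):
    """True if any token starts with any of the stems (logical OR over stems)."""
    return any(tok.startswith(stem) for tok in tokens for stem in stems)


def concept_frequencies(entries, concept_stems):
    """Count, per concept, the entries that mention at least one of its stems."""
    counts = dict.fromkeys(concept_stems, 0)
    for entry in entries:
        tokens = preprocess(entry)
        for concept, stems in concept_stems.items():
            if mentions(tokens, stems):
                counts[concept] += 1
    return counts


# Hypothetical stems and journal entries, for illustration only.
CONCEPT_STEMS = {
    "caretaking/parenting": ["caretak", "parent"],
    "sleep/rest": ["sleep", "rest"],
    "pain": ["pain"],
}
ENTRIES = [
    "<p>He is sleeping well &amp; the pain is less today.</p>",
    "Parenting through this is hard, but we got some rest.",
    "Thank you for the prayers.",
]

print(concept_frequencies(ENTRIES, CONCEPT_STEMS))
# prints {'caretaking/parenting': 1, 'sleep/rest': 2, 'pain': 1}
```

Prefix matching is simple but over-matches: the stem "rest" would also hit "restaurant". This kind of over-matching is one way a single common word can inflate the frequency of its problem concept.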

References
http://omahasystem.org/overview.html
Denecke K, Bamidis P, Bond C, et al. Ethical issues of social media usage in healthcare. Yearbook of Medical Informatics. 2015;10(1):137-147. doi:10.15265/IY-2015-001.
Bird S, Loper E, Klein E. Natural Language Processing with Python. O'Reilly Media Inc.; 2009.
