Beruflich Dokumente
Kultur Dokumente
In collaboration with Jun Zhang (Ph.D., 2005), Jie Bao (Ph.D., 2007)
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
Outline
• Background and motivation
• Learning from data revisited
• Learning predictive models from distributed data
• Learning predictive models from semantically
heterogeneous data
• Learning predictive models from partially specified data
• Current Status and Summary of Results
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
57-165
NES NLS
Terribilini, M., Lee. J-H., Yan, C., Carpenter, S., Jernigan, R., Honavar, V. and Dobbs, D. (2006)
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
Background
Data revolution
Bioinformatics
– Over 200 data repositories of interest to molecular biologists alone
(Discala, 2000)
Environmental Informatics
Enterprise Informatics
Medical Informatics
Social Informatics ...
Information processing revolution: Algorithms as theories
– Computation: Biology::Calculus:Physics
Connectivity revolution (Internet and the web)
Integration revolution
– Need to understand the elephant as opposed to examining the trunk,
the tail, etc.
Needed – infrastructure to support collaborative, integrative analysis of data
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
Outline
• Background and motivation
• Learning from data revisited
• Learning predictive models from distributed data
• Learning predictive models from semantically
heterogeneous data
• Learning predictive models from partially specified data
• Current Status and Summary of Results
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
Assumptions
Data L h
Knowledge
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
Data Learner
Labeled Examples Classifier
Classification
Outlook Ov
ny erca
{1, 2} Sun st
No Temp.
{3, 4} Entropy
|Di| |Di|
H ( D) = - ∑ ⋅ log 2
t
Co
Ho
ld
{4} {3}
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
{1, 2, 3, 4}
Outlook Ov
S unny erca
st
{1, 2}
No Temp.
{3, 4} Entropy
|Di| |Di|
H ( D) = - ∑ ⋅ log 2
t
Co
Ho
ld
{4} {3}
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
Outlook Ov
ny erca
{1, 2} Sun st
No Temp.
{3, 4} Entropy
|Di| |Di|
H ( D) = - ∑ ⋅ log 2
t
Co
Ho
ld
i∈Classes |D| |D|
No Yes
{4} {3}
Count
s(
Count Wind, Clas
s(Clas s
Yes Wind s|Outl |Outlook),
ook)
Counts
Strong Weak
Data
Yes Data
No
ok)
unts(Class|Outlo
y, C la s s |Outlook), Co
ts(Humid it
Coun
Humidity Counts
High Normal
No Yes
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
D s(hi hi+1, D)
D s(D)
L L
θ’∈Θ θ’∈Θ
h∈ H h∈ H
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
Hypothesis Construction s(hi > hi+1, D)
hi+1C(hi , s (hi > hi+1, D)) Data D
Statistical Query
Generation Query s(hi > hi+1, D)
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
Outline
• Background and motivation
• Learning from data revisited
• Learning predictive models from distributed data
• Learning predictive models from semantically
heterogeneous data
• Learning predictive models from partially specified data
• Current Status and Summary of Results
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
q1 D1
S (D, hi>hi+1)
Query
Decomposition
Learner q2 D2
Answer
Query S (D, hi>hi+1)
Composition q3 D3
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
Outline
• Background and motivation
• Learning from data revisited
• Learning classifiers from distributed data
• Learning classifiers from semantically heterogeneous data
• Learning Classifier from partially specified data
• Current Status and Summary of Results
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
HU(is-a)
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
H1(is-a) HU(is-a)
Mappings between O1 .. ON and O
Ontology M(O, O1..ON )
O
q1
D1, O1
SO (hi>hi+1 ,D) Query
Learner Decomposition D2, O2
q2
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
OU
H1(is-a) HU(is-a)
Outline
• Background and motivation
• Learning from data revisited
• Learning predictive models from distributed data
• Learning predictive models from semantically
heterogeneous data
• Learning predictive models from partially specified data
• Current Status and Summary of Results
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
On-Campus Off-Campus
Undergraduate Graduate h(Γ1)
TA RA AA
Senior Ph.D
Freshman Government Private
Master
Junior
Sophomore Federal Local Org
State Com
h(Γk)
[Zhang and Honavar, 2003; 2004; Zhang et al., 2006; Caragea et al., 2006]
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
Outline
• Background and motivation
• Learning from data revisited
• Learning predictive models from distributed data
• Learning predictive models from semantically
heterogeneous data
• Learning predictive models from partially specified data
• Current Status and Summary of Results
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
Summary
Algorithms learning classifiers from distributed data with
provable performance guarantees relative to their
centralized or batch counterparts
Tools for making data sources self-describing
Tools for specifying semantic correspondences between
data sources
Tools for answering statistical queries from semantically
heterogeneous data
Tools for collaborative construction of ontologies and
mappings, distributed reasoning..
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
Current Directions
Further development of the open source tools for collaborative
construction of predictive models from data
Resource bounded approximations of statistical queries under
different access constraints and statistical assumptions
Algorithms for learning predictive models from semantically
disparate alternately structured data
Further investigation of OEDS – Description logics, RDF..
Relation to modular ontologies and knowledge importing
Distributed reasoning, privacy-preserving reasoning…
Applications in bioinformatics, medical informatics, materials
informatics, social informatics
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)
Iowa State University Department of Computer Science
Center for Computational Intelligence, Learning, and Discovery
Acknowledgements
• Students
– Doina Caragea, Ph.D., 2004
– Jun Zhang, Ph.D., 2005
– Jie Bao, Ph.D., 2007
– Cornelia Caragea, Ph.D., in progress
– Oksana Yakhnenko, Ph.D., in progress
• Collaborators
– Giora Slutzki
– George Voutsadakis
Research supported in part by grants from the National Science Foundation (IIS 0219699, 0711356)