Beruflich Dokumente
Kultur Dokumente
European Research Network on Foundations, Software Infrastructures and Applications for large scale distributed, GRID and
Peer-to-Peer Technologies
-2-
Distributed data mining on the Grid
It also offers
security, information service, data access and
management, communication, scheduling, fault
European Research Network on Foundations, Software Infrastructures and Applications for large scale distributed, GRID and
Peer-to-Peer Technologies
detection, … -3-
Distributed data mining on the Grid
Weka4WS
Knowledge Grid
European Research Network on Foundations, Software Infrastructures and Applications for large scale distributed, GRID and
Peer-to-Peer Technologies
-9-
Knowledge Grid
European Research Network on Foundations, Software Infrastructures and Applications for large scale distributed, GRID and
Peer-to-Peer Technologies
- 10 -
The Knowledge Grid
European Research Network on Foundations, Software Infrastructures and Applications for large scale distributed, GRID and
Peer-to-Peer Technologies
- 11 -
Knowledge Grid architecture
High-level K-Grid layer
Presentation
Access Algorithms Plan
Service
Service Access Management
Service Service
KDS RAEMS
KBR
Resource
Knowledge
KMR Allocation and
Directory
Execution
Service
Management KEPR
Service
The operations
provided by the Core
K-Grid services are
invoked both by High-
European Research Network on Foundations, Software Infrastructures and Applications for large scale distributed, GRID and
European Research Network on Foundations, Software Infrastructures and Applications for large scale distributed, GRID and
Peer-to-Peer Technologies
- 15 -
Knowledge Grid: Service operations
European Research Network on Foundations, Software Infrastructures and Applications for large scale distributed, GRID and
Peer-to-Peer Technologies
- 16 -
Knowledge Grid: High-level
application design start
schema: “/../DMService.wsdl”
operation: “clusterize”
argument: “EM”
data mining data mining data mining data mining argument: “-I 100 -N -1 -S 90”
...
schema: “/../DMService.wsdl”
operation: “classify”
argument: “J48”
argument: “-C 0.25 -M 2” data mining data mining data mining data mining
...
voting
European Research Network on Foundations, Software Infrastructures and Applications for large scale distributed, GRID and
endTechnologies
Peer-to-Peer
- 17 -
Weka4WS
European Research Network on Foundations, Software Infrastructures and Applications for large scale distributed, GRID and
Peer-to-Peer Technologies
- 18 -
The Weka4WS framework
European Research Network on Foundations, Software Infrastructures and Applications for large scale distributed, GRID and
Peer-to-Peer Technologies
- 22 -
Software components
Graphical Web
User Interface Service
local
task Weka Client Weka
Library Module Library
remote
task
GT4 Services GT4 Services
Explorer KnowledgeFlow
European Research Network on Foundations, Software Infrastructures and Applications for large scale distributed, GRID and
Peer-to-Peer Technologies
- 25 -
Weka4WS Explorer
GRID
Please run one service
for me
European Research Network on Foundations, Software Infrastructures and Applications for large scale distributed, GRID and
Peer-to-Peer Technologies
- 34 -
Grid Services for Mobile Data Mining
components:
provider provider
Data providers.
Mobile clients.
Mobile client
Mobile client Mobile client Mobile client
Mobile client
European Research Network on Foundations, Software Infrastructures and Applications for large scale distributed, GRID and
Peer-to-Peer Technologies
- 35 -
The Mining server
A Mining server implements two Grid Services:
Data Collection Service (DCS): invoked by a data
provider to store data in the data store.
Data Mining Service (DMS): invoked by a mobile
client to ask for the execution of a data mining task.
Mining server
Data collection
requests Store / update
DCS ops Data
data
Collection Data store
Service
Data provider
European Research Network on Foundations, Software Infrastructures and Applications for large scale distributed, GRID and
Peer-to-Peer Technologies
- 36 -
Grid Services for Mobile Data Mining
European Research Network on Foundations, Software Infrastructures and Applications for large scale distributed, GRID and
Peer-to-Peer Technologies
- 37 -
Impact of the WSRF overhead
Execution times
1.03E+06
9.24E+05
8.30E+05
7.26E+05
6.25E+05
1.0E+07
5.19E+05
4.16E+05
3.11E+05
data mining
2.12E+05
1.12E+05
1.0E+06 Resource creation
Execution time (ms)
Notification subscription
Task submission
1.0E+05
Dataset download
Data mining
1.0E+04
Results notification
Resource destruction
1.0E+03 Total
1.18E+06
9.64E+05
1.06E+06
8.49E+05
7.25E+05
6.16E+05
4.86E+05
3.61E+05
1.0E+07
2.46E+05
data mining
1.30E+05
1.0E+06 Resource creation
Execution time (ms)
Notification subscription
Task submission
1.0E+05
Dataset download
dataset download Data mining
1.0E+04
Results notification
Resource destruction
1.0E+03 Total
1.0E+02
0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
Dataset size (MB)
European Research Network on Foundations, Software Infrastructures and Applications for large scale distributed, GRID and
Peer-to-Peer Technologies
- 40 -
Final remarks
Thank you!
Credits:
Mario Cannataro
Eugenio Cesario
Antonio Congiusta
Marco Lackovic
Andrea Pugliese
Oreste Verta
European Research Network on Foundations, Software Infrastructures and Applications for large scale distributed, GRID and
Peer-to-Peer Technologies
- 42 -
Weka4WS KnowledgeFlow