P. Papotti
Universita’ Roma Tre
Mappings
m1: “for every proj tuple there must be dept and project tuples such that …”
m2: “for every emp of a proj tuple there must be: dept, emp, worksOn, project …”
If we also had dependents under employees, then:
“for every dependent of an emp of a proj …”
and so on …
There is a lot of common mapping behavior that is repeated.
E.g., m2 repeats the mapping behavior of m1 (although for a “subconcept”).
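To make the repetition concrete, here is a minimal Python sketch that applies m1 and m2 as two independent mappings. The sample data, the names `make_pid`, `apply_m1` and `apply_m2`, and the exact fields each mapping copies are assumptions made for illustration, since the formulas above are only given in abbreviated form. Unknown target values (budget, pid) are stood in for by `None` and a placeholder string.

```python
# Hypothetical source instance: proj tuples, each with a nested set of employees.
source = [
    {"dname": "CS", "pname": "Clio",
     "emps": [{"ename": "Ann", "salary": 100}, {"ename": "Bob", "salary": 90}]},
    {"dname": "CS", "pname": "Hyperion",
     "emps": [{"ename": "Ann", "salary": 100}]},
]


def make_pid(p):
    # Placeholder for an invented project id (illustration only).
    return f"pid({p['dname']},{p['pname']})"


def apply_m1(projs):
    # m1 (assumed form): for every proj tuple, assert a dept and a project tuple.
    return [
        {"dname": p["dname"], "budget": None, "emps": [],
         "projects": [{"pid": make_pid(p), "pname": p["pname"]}]}
        for p in projs
    ]


def apply_m2(projs):
    # m2 (assumed form): for every emp of a proj tuple, assert dept, emp,
    # worksOn and project tuples, repeating what m1 already asserts.
    out = []
    for p in projs:
        for e in p["emps"]:
            out.append({
                "dname": p["dname"], "budget": None,
                "emps": [{"ename": e["ename"], "salary": e["salary"],
                          "worksOn": [{"pid": make_pid(p)}]}],
                "projects": [{"pid": make_pid(p), "pname": p["pname"]}],
            })
    return out


# Each mapping is evaluated on its own, so the shared part is asserted again.
basic_target = apply_m1(source) + apply_m2(source)
clio_assertions = sum(
    1 for d in basic_target if d["projects"][0]["pname"] == "Clio"
)
print(len(basic_target), clio_assertions)
```

With this sample data, two proj tuples yield five dept tuples, and the “Clio” project is asserted three times: once by m1 and once per employee by m2.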
Source schema:
proj: Set [
    dname
    pname
    emps: Set [
        ename
        salary
    ]
]

Target schema:
dept: Set [
    dname
    budget
    emps: Set [
        ename
        salary
        worksOn: Set [
            pid
        ]
    ]
    projects: Set [
        pid
        pname
    ]
]

(m1 and m2 both map the source schema to the target schema.)

We would like to reuse (in m2) the “dept” and “project” tuples that the simpler mapping m1 asserts.
Make m2 assert only the “extra” information.
Also accumulate the corresponding employees into one set.
Idea: Correlate the mapping formulas based on their common part.
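The idea of correlating m1 and m2 on their common part can be sketched in Python as a single nested loop. This is only an illustration under assumptions: the sample data, the name `apply_nested`, and the placeholder project id are hypothetical and do not reproduce the actual mapping formulas or generated queries.

```python
# Hypothetical source instance: proj tuples, each with a nested set of employees.
source = [
    {"dname": "CS", "pname": "Clio",
     "emps": [{"ename": "Ann", "salary": 100}, {"ename": "Bob", "salary": 90}]},
    {"dname": "CS", "pname": "Hyperion",
     "emps": [{"ename": "Ann", "salary": 100}]},
]


def apply_nested(projs):
    target = []
    for p in projs:
        # Outer level: the common part, i.e. what m1 asserts (dept + project).
        pid = f"pid({p['dname']},{p['pname']})"  # placeholder for an invented id
        dept = {"dname": p["dname"], "budget": None, "emps": [],
                "projects": [{"pid": pid, "pname": p["pname"]}]}
        # Inner level: only the "extra" information of m2. The employees are
        # accumulated into the emps set of the dept tuple created above.
        for e in p["emps"]:
            dept["emps"].append({"ename": e["ename"], "salary": e["salary"],
                                 "worksOn": [{"pid": pid}]})
        target.append(dept)
    return target


nested_target = apply_nested(source)
print(len(nested_target), [len(d["emps"]) for d in nested_target])
```

Here each proj tuple produces exactly one dept tuple, and the inner loop reuses the `dept` and `pid` bound by the outer loop instead of asserting them again.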
Basic mappings: need re-grouping (over entire data); generate duplicates
Nested mapping: single pass over the data; no duplicates
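The re-grouping step over the entire data can be sketched as follows. This is a simplified illustration, not the procedure of any particular tool: the flat row layout, the sample rows, and the name `regroup` are assumptions.

```python
# Hypothetical flat output of independently evaluated mappings: the same
# dept/project pair is asserted once without an employee and once per employee.
basic_output = [
    {"dname": "CS", "project": "Clio", "emp": None},
    {"dname": "CS", "project": "Clio", "emp": "Ann"},
    {"dname": "CS", "project": "Clio", "emp": "Bob"},
    {"dname": "EE", "project": "Grid", "emp": None},
    {"dname": "EE", "project": "Grid", "emp": "Eve"},
]


def regroup(rows):
    # Merge rows describing the same (dname, project) into one nested tuple.
    # No group is known to be complete until every row has been inspected,
    # which is why this pass has to cover the entire data.
    groups = {}
    for r in rows:
        key = (r["dname"], r["project"])
        group = groups.setdefault(
            key, {"dname": r["dname"], "project": r["project"], "emps": []}
        )
        if r["emp"] is not None and r["emp"] not in group["emps"]:
            group["emps"].append(r["emp"])
    return list(groups.values())


regrouped = regroup(basic_output)
print(len(basic_output), "->", len(regrouped))
```

In this sample, five flat rows collapse into two nested tuples. A nested mapping builds the grouped form directly, so this extra pass is not needed.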
[Figure: Basic query execution time / Nested query execution time (logarithmic axis, 1 to 1000) vs. nesting level (1 to 4); series labeled 312 KB, 514 KB, and 1020 KB. Annotation: execution time for basic: 22 minutes; execution time for nested: 1.1 seconds.]
[Figure: Basic query output size / Nested query output size (logarithmic axis, 1 to 100) vs. nesting level (1 to 4); series labeled 312 KB, 514 KB, and 2111 KB.]
The nested mapping results in much more efficient execution with less redundant data.
There are already commercial tools that use similar paradigms (e.g., IBM
Ascential DataStage TX), but most of the mapping generation work is manual.