Datasets
Javad Chamanara, Michael Owonibi, Alsayed Algergawy, Roman Gerlach
Friedrich Schiller University of Jena
Germany
Email: firstname.lastname@uni-jena.de
DATASETS Symposium, June 2015
Funded by:
- Storage mechanism:
  - Metadata + primary data storage/archiving as files
  - Metadata + data schema definition (in metadata) + primary data storage as files / DBMS
- Data harmonization/integration
- Flexible access patterns
- Provenance management
- Flexible security and access management
- Machine and human interpretability of data
- Semantic enablement
- etc.
Aim
Datasets are predominantly tabular. Therefore, to manage tabular data effectively in a data repository, the composition of tabular datasets needs to be modelled in a way that satisfies the many data management requirements outlined above.
- Annotation table: simple table + additional metadata
- Group of tables
Earlier Work
[Figure: High-level concepts of the research data repository conceptual model. UML component model with the components Metadata Structure, Semantics, Data Structure, Geo, Metadata, Data and Administration, connected by "use" dependencies.]
J. Chamanara and B. König-Ries, "A conceptual model for data management in the field of ecology," Ecological Informatics, vol. 24, 2014.
Data Structure
Data Descriptor
- Contains information about the columns of a dataset, such as the name, data type, unit of measurement, procedure for obtaining the data, methodology and scale
- Describes a variable or a parameter (auxiliary variable)
- Semantic annotation capability
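A data descriptor can be pictured as a small record attached to each column. The Python sketch below is a minimal illustration under assumptions: the class name `DataDescriptor`, its field names and all example values (including the example.org URI) are hypothetical and are not taken from an actual repository implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataDescriptor:
    """Hypothetical sketch of a column descriptor (variable or parameter)."""
    name: str
    data_type: type                  # Python type the column values must have
    unit: Optional[str] = None       # unit of measurement, e.g. "°C"
    procedure: Optional[str] = None  # procedure/methodology used to obtain the data
    scale: Optional[str] = None      # measurement scale, e.g. "interval"
    is_parameter: bool = False       # True for a parameter (auxiliary variable)
    concepts: tuple = ()             # semantic annotations, e.g. ontology term URIs


# Illustrative descriptors; every field value here is made up for the example.
temperature = DataDescriptor(
    name="Temperature",
    data_type=float,
    unit="°C",
    procedure="sensor reading",
    scale="interval",
    concepts=("http://example.org/ontology#Temperature",),
)
position = DataDescriptor(name="Pos.", data_type=str, is_parameter=True)
```

Because the descriptor is frozen, several datasets can safely share one instance, which is what makes the cross-dataset uses on the next slide possible.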
Data Structure
Data Descriptor: Temperature
Benefits
- Cross-dataset query
- Easier data integration
- Enhanced data discovery
[Figure: Dataset 1 and Dataset 2, two tables with columns including Plot, Depth, Time, Temp(C) and T(F), whose temperature columns (in °C and °F) share the Temperature data descriptor.]
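To illustrate how a shared descriptor enables a cross-dataset query, the sketch below assumes that each dataset binds its columns to a descriptor name plus a unit, so a query for "Temperature" can collect values from both datasets and convert them to one unit. `Dataset`, `TO_CELSIUS`, `query_descriptor`, the binding format and the sample values are all hypothetical; only the column names Temp(C) and T(F) come from the slide, and which dataset holds which column is assumed.

```python
from dataclasses import dataclass

# Converters from each supported unit to the canonical unit of the
# hypothetical "Temperature" descriptor (degrees Celsius).
TO_CELSIUS = {
    "C": lambda v: v,
    "F": lambda v: (v - 32.0) * 5.0 / 9.0,
}


@dataclass
class Dataset:
    name: str
    bindings: dict   # column name -> (descriptor name, unit)
    rows: list       # each row maps column name -> value


def query_descriptor(datasets, descriptor):
    """Return (dataset name, value in °C) for every value bound to `descriptor`."""
    result = []
    for ds in datasets:
        for column, (desc, unit) in ds.bindings.items():
            if desc != descriptor:
                continue  # column describes something else
            to_c = TO_CELSIUS[unit]
            result.extend((ds.name, round(to_c(row[column]), 2)) for row in ds.rows)
    return result


# Hypothetical sample rows; only the column names echo the slide.
d1 = Dataset("Dataset 1", {"Temp(C)": ("Temperature", "C")},
             [{"Temp(C)": 22.0}, {"Temp(C)": 21.0}])
d2 = Dataset("Dataset 2", {"T(F)": ("Temperature", "F")},
             [{"T(F)": 95.0}, {"T(F)": 50.0}])

print(query_descriptor([d1, d2], "Temperature"))
# [('Dataset 1', 22.0), ('Dataset 1', 21.0), ('Dataset 2', 35.0), ('Dataset 2', 10.0)]
```

The query never mentions column names: it is the descriptor binding, not the header text, that tells it where temperatures live and in which unit.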
Data Structure
Observation (tuple) = row; variable or parameter = column

S.N. | Tmp    | Time   | Depth | Pos. | Hu.
14   | 22, 22 | 1/1/12 | -10   |      | 46
13   | 23.22  | 1/1/12 | -10   |      | 45
16   | 21, 24 | 1/1/12 | -11   |      | 30
16   | 18, 18 | 2/1/12 | -10   |      | 25
18   | 14, 15 | 2/1/12 | -9    |      | 25
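Reading a table as a data structure plus observations can be sketched as follows, assuming the structure is simply an ordered tuple of variable names and each observation is a tuple of values in the same order. The names `STRUCTURE`, `OBSERVATIONS`, `column` and `as_record` are hypothetical; the values are the Depth, Pos., Hu., Soil_N. and Tmp values from the next table in this section.

```python
# Ordered variables of the dataset; position i of every tuple belongs to STRUCTURE[i].
STRUCTURE = ("Depth", "Pos.", "Hu.", "Soil_N.", "Tmp")

# Observations (tuples), one per row of the table.
OBSERVATIONS = [
    (-10, "A", 46, 14, 22),
    (-10, "B", 45, 13, 23),
    (-11, "C", 30, 16, 21),
    (-10, "A", 25, 16, 18),
    (-9, "D", 25, 18, 14),
]


def column(structure, observations, variable):
    """All values of one variable, i.e. one column of the table."""
    i = structure.index(variable)
    return [obs[i] for obs in observations]


def as_record(structure, observation):
    """Pair each value of one observation with its variable name."""
    if len(observation) != len(structure):
        raise ValueError("observation does not match the data structure")
    return dict(zip(structure, observation))


print(column(STRUCTURE, OBSERVATIONS, "Tmp"))  # [22, 23, 21, 18, 14]
print(as_record(STRUCTURE, OBSERVATIONS[2]))
# {'Depth': -11, 'Pos.': 'C', 'Hu.': 30, 'Soil_N.': 16, 'Tmp': 21}
```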
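Depth | Pos. | Hu. | Soil_N. | Tmp | Time
-10   | A    | 46  | 14      | 22  |
-10   | B    | 45  | 13      | 23  |
-11   | C    | 30  | 16      | 21  |
-10   | A    | 25  | 16      | 18  |
-9    | D    | 25  | 18      | 14  |
[Figure: amendments A1 to A4, highlighted in red, attached to the table, with fragments such as 78, Yes/No, 100 and 0.11.]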
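An annotation table (a simple table plus additional metadata) could be represented by keeping notes next to the rows instead of inside them, so that amendments never overwrite the measured values. This is a minimal sketch under that assumption; `AnnotatedTable`, its methods and the note texts are hypothetical, and the exact meaning of the amendments A1 to A4 on the slide is not known.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedTable:
    """Hypothetical annotation table: a simple table plus notes on cells and rows."""
    structure: tuple                                # variable names, in column order
    rows: list                                      # observations as tuples
    cell_notes: dict = field(default_factory=dict)  # (row, variable) -> [notes]
    row_notes: dict = field(default_factory=dict)   # row -> [notes]

    def _check_row(self, row):
        if not 0 <= row < len(self.rows):
            raise IndexError(f"no observation {row}")

    def annotate_cell(self, row, variable, note):
        self._check_row(row)
        if variable not in self.structure:
            raise KeyError(variable)
        self.cell_notes.setdefault((row, variable), []).append(note)

    def annotate_row(self, row, note):
        self._check_row(row)
        self.row_notes.setdefault(row, []).append(note)

    def notes_for(self, row, variable):
        """Notes that apply to one cell: observation-level first, then cell-level."""
        return self.row_notes.get(row, []) + self.cell_notes.get((row, variable), [])


table = AnnotatedTable(("Depth", "Pos.", "Tmp"), [(-10, "A", 22), (-10, "B", 23)])
# Hypothetical notes, loosely modelled on the amendment labels A1..A4.
table.annotate_cell(0, "Tmp", "A1: value rounded")
table.annotate_row(1, "A2: position corrected")
print(table.notes_for(0, "Tmp"))  # ['A1: value rounded']
print(table.notes_for(1, "Tmp"))  # ['A2: position corrected']
```

Keeping the notes in separate dictionaries leaves `rows` untouched, so the plain table can still be exported or queried without its annotations.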
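Data Structure
[Figure: annotations on an observation (tuple) and on columns, with labels such as 0.10%, Rounded, Yes, Interval and 1 Sec.]

Soil_Moi. | Depth | Pos. | Hu. | Soil_N. | Tmp | Time
12        | -10   |      | 46  | 14      | 22  |
10        | -10   |      | 45  | 13      | 23  |
12        | -11   |      | 30  | 16      | 21  |
15        | -10   |      | 25  | 16      | 18  |
17        | -9    |      | 25  | 18      | 14  |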
Source Dataset
Soil_N. | Tmp | Time    | Soil_Moi | Depth | Pos | Hu.
16      | 18  | 2/11/01 | 15       | -10   | A   | 25
14      | 22  | 3/11/01 | 12       | -10   | A   | 46
13      | 23  | 4/11/01 | 10       | -10   | B   | 45
16      | 21  | 5/11/01 | 12       | -11   | C   | 30

View
Tmp | Time    | Soil_Moi
18  | 2/11/01 | 15
22  | 3/11/01 | 12
23  | 4/11/01 | 10
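A view can be read as a stored definition (which variables to expose and which observations to keep) that is evaluated against the source dataset on demand. The sketch assumes rows are dictionaries; the `View` class is hypothetical, and the selection `Depth == -10` is an assumption chosen only because it reproduces the three rows of the View table, not a rule stated on the slide.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class View:
    """Hypothetical view: a stored projection + selection over a source dataset."""
    source: list                          # rows of the source dataset (dicts)
    columns: tuple                        # variables exposed by the view
    where: Callable = lambda row: True    # selection predicate

    def rows(self):
        # Evaluated on every call, so the view always reflects the current
        # source and never stores its own copy of the data.
        return [{c: row[c] for c in self.columns} for row in self.source if self.where(row)]


source = [
    {"Soil_N.": 16, "Tmp": 18, "Time": "2/11/01", "Soil_Moi": 15, "Depth": -10, "Pos": "A", "Hu.": 25},
    {"Soil_N.": 14, "Tmp": 22, "Time": "3/11/01", "Soil_Moi": 12, "Depth": -10, "Pos": "A", "Hu.": 46},
    {"Soil_N.": 13, "Tmp": 23, "Time": "4/11/01", "Soil_Moi": 10, "Depth": -10, "Pos": "B", "Hu.": 45},
    {"Soil_N.": 16, "Tmp": 21, "Time": "5/11/01", "Soil_Moi": 12, "Depth": -11, "Pos": "C", "Hu.": 30},
]

# Assumed selection: Depth == -10 reproduces the rows of the View table.
view = View(source, ("Tmp", "Time", "Soil_Moi"), where=lambda row: row["Depth"] == -10)
for r in view.rows():
    print(r)
```

Since only the definition is stored, appending a matching row to `source` makes it appear in the next call to `view.rows()` without any extra bookkeeping.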
Source Dataset 1
Tmp | Time    | Soil_Moi | Depth | Pos | Hu.
18  | 2/11/01 | 15       | -10   | A   | 25
22  | 3/11/01 | 12       | -10   | A   | 46
23  | 4/11/01 | 10       | -10   | B   | 45
21  | 5/11/01 | 12       | -11   | C   | 30

Source Dataset 2
Soil_N. | Tmp | Time   | Soil_Moi | Depth | Pos. | Hu.
26      | 33  | 1/1/11 | 30       | -10   | B    | 15
14      | 28  | 1/2/11 | 23       | -10   | C    | 32
13      | 29  | 1/3/11 | 28       | -10   | D    | 21

View (combining both sources)
Tmp | Time    | Soil_Moi
18  | 2/11/01 | 15
22  | 3/11/01 | 12
23  | 4/11/01 | 10
33  | 1/1/11  | 30
28  | 1/2/11  | 23
29  | 1/3/11  | 28
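The combined view over Source Dataset 1 and Source Dataset 2 can be sketched as a projection of the shared variables from each source followed by concatenation (UNION ALL semantics). `union_view` is a hypothetical helper, only the columns the view needs are included in the sample rows, and the filter `Depth == -10` is an assumption that happens to reproduce the six rows of the combined table.

```python
def union_view(sources, columns, where=lambda row: True):
    """Project `columns` from every source dataset and concatenate the rows
    that satisfy `where`, keeping source order (UNION ALL-style view)."""
    return [
        tuple(row[c] for c in columns)
        for rows in sources
        for row in rows
        if where(row)
    ]


# Values follow the two source tables; unused columns are omitted.
source1 = [
    {"Tmp": 18, "Time": "2/11/01", "Soil_Moi": 15, "Depth": -10},
    {"Tmp": 22, "Time": "3/11/01", "Soil_Moi": 12, "Depth": -10},
    {"Tmp": 23, "Time": "4/11/01", "Soil_Moi": 10, "Depth": -10},
    {"Tmp": 21, "Time": "5/11/01", "Soil_Moi": 12, "Depth": -11},
]
source2 = [
    {"Tmp": 33, "Time": "1/1/11", "Soil_Moi": 30, "Depth": -10},
    {"Tmp": 28, "Time": "1/2/11", "Soil_Moi": 23, "Depth": -10},
    {"Tmp": 29, "Time": "1/3/11", "Soil_Moi": 28, "Depth": -10},
]

view = union_view([source1, source2], ("Tmp", "Time", "Soil_Moi"),
                  where=lambda row: row["Depth"] == -10)
for r in view:
    print(r)
```

The helper only requires that every source provides the projected variables; the sources may otherwise have different column sets (for example, Soil_N. exists in only one of them).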
Based on:
- Checkout/checkin mechanism
- Version difference computation and storage
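A checkout/checkin mechanism with stored version differences can be sketched as follows. Assumptions: a table is a list of row tuples, version 0 is kept in full, each checkin stores only the row-level edit operations computed with Python's `difflib.SequenceMatcher`, and a single checkout flag guards against concurrent edits. `VersionedTable` and its methods are hypothetical; a real repository would also need per-user locking and persistent storage.

```python
import difflib


class VersionedTable:
    """Hypothetical sketch: version 0 is stored in full; each checkin stores only
    the edit operations that turn the previous version into the new one."""

    def __init__(self, rows):
        self._base = [tuple(r) for r in rows]
        self._deltas = []              # delta i turns version i into version i + 1
        self._checked_out = False

    @property
    def latest(self):
        return len(self._deltas)

    def get(self, version):
        """Rebuild any stored version by replaying deltas on the base version."""
        if not 0 <= version <= self.latest:
            raise ValueError(f"unknown version {version}")
        rows = list(self._base)
        for delta in self._deltas[:version]:
            rows = self._apply(rows, delta)
        return rows

    def checkout(self):
        """Lock the table and hand out a working copy of the latest version."""
        if self._checked_out:
            raise RuntimeError("table is already checked out")
        self._checked_out = True
        return self.get(self.latest)

    def checkin(self, rows):
        """Store the difference to the latest version and release the lock."""
        if not self._checked_out:
            raise RuntimeError("table is not checked out")
        new = [tuple(r) for r in rows]
        self._deltas.append(self._diff(self.get(self.latest), new))
        self._checked_out = False
        return self.latest

    @staticmethod
    def _diff(old, new):
        # Keep only the non-equal opcodes together with the rows they insert.
        matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
        return [(i1, i2, new[j1:j2])
                for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]

    @staticmethod
    def _apply(old, delta):
        rows, pos = [], 0
        for i1, i2, inserted in delta:
            rows.extend(old[pos:i1])   # unchanged rows before this edit
            rows.extend(inserted)      # new rows (empty for a deletion)
            pos = i2                   # skip the old rows that were replaced or deleted
        rows.extend(old[pos:])
        return rows


table = VersionedTable([(-10, "A", 22), (-10, "B", 23)])
work = table.checkout()
work[1] = (-10, "B", 24)       # edit a value
work.append((-11, "C", 21))    # add an observation
print(table.checkin(work))     # 1
print(table.get(0))            # [(-10, 'A', 22), (-10, 'B', 23)]
print(table.get(1))            # [(-10, 'A', 22), (-10, 'B', 24), (-11, 'C', 21)]
```

Storing deltas keeps unchanged rows out of every version after the first, at the cost of replaying deltas when an old version is requested.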
Conclusion
- A tabular data model was presented
- It can be used to enforce the structure and type of information to be collected, as well as serve as a basis for data validation
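As an illustration of using the data structure as a basis for validation, the sketch below checks each incoming value against the type declared for its variable. `validate` and the (variable, type) structure are hypothetical; the rows reuse S.N., Tmp, Depth and Hu. values from the first Data Structure table (Time and Pos. are left out for brevity), read as text as they would be from a CSV file. Multi-valued Tmp cells such as "22, 22" are reported as invalid floats.

```python
def validate(structure, rows):
    """Check raw (string) rows against a data structure of (variable, type) pairs.

    Returns a list of (row index, variable, problem) tuples; empty means valid.
    """
    problems = []
    for r, row in enumerate(rows):
        if len(row) != len(structure):
            problems.append((r, None, f"expected {len(structure)} values, got {len(row)}"))
            continue
        for (variable, typ), raw in zip(structure, row):
            try:
                typ(raw)  # conversion succeeds only for a well-formed value
            except (TypeError, ValueError):
                problems.append((r, variable, f"{raw!r} is not a valid {typ.__name__}"))
    return problems


# Hypothetical structure; the rows are values from the table, read as text.
STRUCTURE = [("S.N.", int), ("Tmp", float), ("Depth", int), ("Hu.", int)]
ROWS = [
    ("14", "22, 22", "-10", "46"),
    ("13", "23.22", "-10", "45"),
    ("16", "21, 24", "-11", "30"),
]

for problem in validate(STRUCTURE, ROWS):
    print(problem)
```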
Any Questions?