Datasets 2015

An Extensible Conceptual Model for Tabular Scientific Datasets
Javad Chamanara, Michael Owonibi, Alsayed Algergawy, Roman Gerlach
Friedrich Schiller University of Jena
Germany
Email : firstname.lastname@uni-jena.de
DATASETS Symposium, June 2015
funded by :
Research Data Management (RDM)

Becoming increasingly important
because of
Researchers' increased awareness of the benefits of
data management
Data proliferation
Funding agency requirements
Proliferation of RDM systems
Primary data and metadata management
Research Data Management (RDM) Systems

Examples: BExIS, Pangaea, Dryad, ...
Data heterogeneity challenge
Data model heterogeneity

Structural heterogeneity
Syntactic heterogeneity
Semantic heterogeneity

Focus on making
Data discoverable by humans
Data downloadable as data files
Storage mechanism
Metadata + primary data storage/archiving as files
Metadata + data schema definition (in metadata) + primary data storage as files / DBMS

Typical requirements
Heterogeneous data support
Data discovery beyond metadata
Which datasets have a temperature higher than 35 °C?
Data harmonization/integration
Flexible access pattern
Provenance management
Flexible security and access management
Machine and human interpretability of data
Semantic enablement
etc.
Current data management practices in many RDM systems cannot support all of these requirements
Aim
Datasets are predominantly tabular
Therefore, to manage tabular data effectively in a data repository, the
composition of tabular datasets must be modeled so that it
satisfies the manifold data management requirements outlined above
Application in biodiversity research domain

Examples of data
Applicability in other domains
Related Work : W3C Tabular Data Model

Simple table
set of rows where each row contains information about an object
Annotated table
simple table + additional metadata
Group of tables
Related Work : INSPIRE Observations and Measurements (O&M) Model

Representation of records of scientific measurement
Observation as an event whose result is an estimate of the value
of some property (or properties)
of a feature-of-interest
obtained using a specified procedure (the instrument, algorithm, or process used)
at a specific time
under some conditions (event-specific parameters, e.g. instrument settings)
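The elements of an observation listed above can be written down as a small record type. This is a minimal sketch only: the class name, field names, and sample values are illustrative assumptions and are not taken from the O&M standard or from the slides.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any


@dataclass
class Observation:
    """One O&M-style observation (field names are illustrative assumptions)."""
    observed_property: str      # the property whose value is estimated
    feature_of_interest: str    # the object the property belongs to
    procedure: str              # instrument, algorithm, or process used
    time: datetime              # when the observation applies
    result: Any                 # the estimated value
    parameters: dict = field(default_factory=dict)  # event-specific conditions


# Hypothetical sample values, chosen only to show how the fields are filled.
obs = Observation(
    observed_property="soil temperature",
    feature_of_interest="plot A",
    procedure="thermometer",
    time=datetime(2012, 1, 1),
    result=22.0,
    parameters={"depth": -10},
)
```

Keeping the conditions in a free-form `parameters` mapping mirrors the "event-specific parameters" item: they vary per observation and are not part of the fixed record.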
Related Work : Statistical Data and Metadata Exchange (SDMX) Model

Standards for describing statistical data and metadata
Data structure definition as a set of columns
Columns
Function - dimension, measure or attribute

Roles - identity, time format, frequency
Based on a pre-defined concept
Other properties - data type, domain
Earlier Work
High level concepts of research data repository conceptual model
[Figure: component model ("cmp Component Model") with the components Metadata Structure, Semantics, Data Structure, Geo, Metadata, Data, and Administration, connected by "use" dependencies]
J. Chamanara and B. König-Ries, A conceptual model for data management in the field of ecology, Ecological Informatics, vol. 24, 2014
Core Model : Dataset

Dataset
Set of tuples
Data container for observations,
measurements, simulations, and other
supported forms of data
has one Data Structure
Core Model : Data Structure & Data Descriptor

Data Structure
defines the organization & meaning of the data
comprises several Data Descriptors
Data Descriptor
Contains information such as the name,
data type, unit of measurement, procedure
of obtaining data, methodology, scale, etc.
of the columns of datasets.
Is either a Variable or a Parameter (parameters are auxiliary to variables)
Semantic annotation capability
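The relationship between Dataset, Data Structure, and Data Descriptor can be sketched as follows. The class and field names, the `kind` flag, and the sample columns and units are assumptions made for illustration; the slides do not prescribe an implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataDescriptor:
    """Describes one column of a dataset."""
    name: str
    data_type: type
    unit: Optional[str] = None
    kind: str = "variable"            # "variable" or "parameter" (assumed flag)
    annotation: Optional[str] = None  # hook for a semantic annotation


@dataclass
class DataStructure:
    """Organization and meaning of the data: a list of descriptors."""
    descriptors: list


@dataclass
class Dataset:
    """A container of tuples that conform to exactly one data structure."""
    structure: DataStructure
    tuples: list = field(default_factory=list)

    def add_tuple(self, values: dict) -> None:
        # Reject tuples whose values do not match the declared descriptors.
        for d in self.structure.descriptors:
            if not isinstance(values.get(d.name), d.data_type):
                raise TypeError(f"{d.name!r} must be {d.data_type.__name__}")
        self.tuples.append(values)


# Hypothetical structure: the unit of Depth and the variable/parameter split are assumed.
structure = DataStructure([
    DataDescriptor("Tmp", float, unit="C"),
    DataDescriptor("Depth", int, unit="cm", kind="parameter"),
])
dataset = Dataset(structure)
dataset.add_tuple({"Tmp": 22.0, "Depth": -10})
```

Because the structure is a separate object, the same validation applies to every dataset that refers to it.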
Core Model - Data Descriptor Reusability

Factor out reusable elements of variables and parameters
Reuse them across the different data structures used in different datasets
Automatic unit conversion functionality
Benefits
Cross-dataset query
Easier data integration
Enhanced data discovery
[Figure: a shared Data Descriptor "Temperature" linked from the data structures of Dataset 1 and Dataset 2, whose columns include Plot, Temp(C), Depth, T(F), and Time]
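A rough sketch of how a shared descriptor with unit conversion enables a cross-dataset query such as "which datasets have a temperature higher than 35 °C". The dictionary layout, the function name, and the assignment of sample values to the two datasets are assumptions for illustration only.

```python
# Conversions into one canonical unit for the shared descriptor (degrees Celsius).
TO_CELSIUS = {
    "C": lambda v: v,
    "F": lambda v: (v - 32.0) * 5.0 / 9.0,
}

# Two hypothetical datasets whose temperature columns refer to the same
# descriptor but are stored under different column names and units.
datasets = {
    "Dataset 1": {"column": "Temp(C)", "unit": "C",
                  "rows": [{"Temp(C)": 22.0}, {"Temp(C)": 25.0}]},
    "Dataset 2": {"column": "T(F)", "unit": "F",
                  "rows": [{"T(F)": 95.0}, {"T(F)": 103.0}]},
}


def datasets_with_temperature_above(threshold_c: float) -> list:
    """Return names of datasets with at least one temperature above the threshold."""
    hits = []
    for name, ds in datasets.items():
        convert = TO_CELSIUS[ds["unit"]]
        if any(convert(row[ds["column"]]) > threshold_c for row in ds["rows"]):
            hits.append(name)
    return hits
```

With these sample rows, a threshold of 35 matches only the Fahrenheit dataset, because 103 °F is about 39.4 °C while 95 °F is exactly 35 °C and therefore not above the threshold.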
Core Model - Data Cell

Data Tuple as a collection of Data Cells,
each containing one or more values
Linked to Data Descriptor

Single vs Multiple Value Cell
Data Cell Auxiliary Information
Sampling time
Result time
Descriptions about the values
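A data cell as described above could be sketched like this. The class and field names are assumptions; the sample values only illustrate the single versus multiple value distinction.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class DataCell:
    """One cell of a tuple, linked to the descriptor of its column."""
    descriptor: str                          # name of the linked Data Descriptor
    values: list                             # one entry: single value cell; several: multiple value cell
    sampling_time: Optional[datetime] = None
    result_time: Optional[datetime] = None
    description: Optional[str] = None        # free-text description of the values

    @property
    def is_multi_valued(self) -> bool:
        return len(self.values) > 1


# A data tuple is a collection of cells.
data_tuple = [
    DataCell("Tmp", [21, 24], sampling_time=datetime(2012, 1, 1)),
    DataCell("Depth", [-11]),
]
```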
Core Model : Sample Table

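S.N.  Tmp     Time    Depth  Hu.
14    22, 22  1/1/12  -10    46
13    23.22   1/1/12  -10    45
16    21, 24  1/1/12  -11    30
16    18, 18  2/1/12  -10    25
18    14, 15  2/1/12  -9     25

Each column corresponds to a Data Descriptor of the Data Structure and is labeled as a Variable or a Parameter; each row is an Observation (Tuple).
Cells such as "22, 22" in Tmp are Multiple Value Cells; cells such as "-10" in Depth are Single Value Cells.
The slide also shows a Pos. column between Depth and Hu.; its values are not present in the extracted text.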
Model Extensions - Amendment

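(Special) data cells
Attached to specific tuples
Different tuples can carry different Amendments
Example usage
Capturing exceptional observations

Soil_Moi.  Depth  Pos.  Hu.  Soil_N.  Tmp
12         -10    A     46   14       22
10         -10    B     45   13       23
12         -11    C     30   16       21
15         -10    A     25   16       18
17         -9     D     25   18       14

[Figure: the Data Structure above, with a further Time column, and Amendments A1 to A4 attached to individual Observations (Tuples); extracted amendment and time labels: 78; 1, 2, 3, 5, 6; Yes; No; 100; 0.11; red. Their assignment to tuples is not recoverable from the extracted text.]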
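One way to picture amendments is as extra cells stored next to a tuple's regular cells, outside the shared data structure. The sketch below assumes that reading; the class name and the amendment key are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataTuple:
    """A tuple with regular cells plus optional per-tuple amendments."""
    cells: dict                                     # values for the data structure's columns
    amendments: dict = field(default_factory=dict)  # extra cells only this tuple carries


tuples = [
    DataTuple({"Soil_Moi.": 12, "Depth": -10, "Tmp": 22}),
    DataTuple({"Soil_Moi.": 10, "Depth": -10, "Tmp": 23},
              amendments={"sensor_recalibrated": "Yes"}),  # hypothetical amendment
]

# Amendments leave the shared data structure untouched: all tuples expose the same columns.
column_sets = {frozenset(t.cells) for t in tuples}
amended = [t for t in tuples if t.amendments]
```

Keeping amendments out of `cells` means an exceptional observation does not force a new column onto every other tuple.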
Model Extensions - Extended Property

User-defined, dataset-specific attribute
whose value applies to a single column
Sample usage
Storing the error margin of the instrument
used to measure the values in a variable

Soil_Moi.  Depth  Hu.  Soil_N.  Tmp
12         -10    46   14       22
10         -10    45   13       23
12         -11    30   16       21
15         -10    25   16       18
17         -9     25   18       14

The Data Structure on the slide also contains the columns Pos. and Time; their values are not present in the extracted text.
Extended Properties shown in the figure: Error = 0.10%, Rounded = Yes, Interval = 1 Sec. (the columns they are attached to are not recoverable from the extracted text)
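Extended properties can be thought of as a small key-value store keyed by column. In this sketch the storage layout, the helper name, and in particular the pairing of each property with a column are assumptions; the slide only shows the property names and values.

```python
# Extended properties keyed by (column, property name).
# Which column each property belongs to is assumed here for illustration.
extended_properties = {
    ("Tmp", "Error"): "0.10%",
    ("Tmp", "Rounded"): "Yes",
    ("Time", "Interval"): "1 Sec.",
}


def properties_of(column: str) -> dict:
    """Collect all extended properties attached to one column."""
    return {
        prop: value
        for (col, prop), value in extended_properties.items()
        if col == column
    }
```

A lookup such as `properties_of("Tmp")` then returns every user-defined attribute for that column without changing the data structure itself.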
Model Extensions - View

Subset of a table obtained by selection or
projection
Purpose
Further processing, sharing, or sampling
Security / digital rights management

Source Dataset
Soil_N.  Tmp  Time     Soil_Moi  Depth  Pos  Hu.
16       18   2/11/01  15        -10    A    25
14       22   3/11/01  12        -10    A    46
13       23   4/11/01  10        -10    B    45
16       21   5/11/01  12        -11    C    30

View
Tmp  Time     Soil_Moi
18   2/11/01  15
22   3/11/01  12
23   4/11/01  10
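Selection and projection can be sketched in a few lines. The function name is hypothetical, and the selection criterion (Depth equal to -10) is an assumption that happens to reproduce the rows of the slide's View; the slide does not state which criterion was used.

```python
source = [
    {"Soil_N.": 16, "Tmp": 18, "Time": "2/11/01", "Soil_Moi": 15, "Depth": -10, "Pos": "A", "Hu.": 25},
    {"Soil_N.": 14, "Tmp": 22, "Time": "3/11/01", "Soil_Moi": 12, "Depth": -10, "Pos": "A", "Hu.": 46},
    {"Soil_N.": 13, "Tmp": 23, "Time": "4/11/01", "Soil_Moi": 10, "Depth": -10, "Pos": "B", "Hu.": 45},
    {"Soil_N.": 16, "Tmp": 21, "Time": "5/11/01", "Soil_Moi": 12, "Depth": -11, "Pos": "C", "Hu.": 30},
]


def make_view(rows, columns, predicate=lambda row: True):
    """Selection keeps rows matching the predicate; projection keeps the named columns."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]


# Assumed criterion: keep only tuples measured at a depth of -10.
view = make_view(source, ["Tmp", "Time", "Soil_Moi"], lambda row: row["Depth"] == -10)
```

The view is computed from the source rather than copied by hand, so restricting columns for sharing or access control does not require a second dataset.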
Model Extensions - Spanning View

View across multiple datasets using the
same Data Structure

Sample Data Structure: Soil_N., Tmp, Time, Soil_Moi, Depth, Pos., Hu.

Source Dataset 1
Soil_N.  Tmp  Time     Soil_Moi  Depth  Pos.  Hu.
16       18   2/11/01  15        -10    A     25
14       22   3/11/01  12        -10    A     46
13       23   4/11/01  10        -10    B     45
16       21   5/11/01  12        -11    C     30

Source Dataset 2
Soil_N.  Tmp  Time    Soil_Moi  Depth  Pos.  Hu.
26       33   1/1/11  30        -10    B     15
14       28   1/2/11  23        -10    C     32
13       29   1/3/11  28        -10    D     21

Spanning View (based on Source Dataset 1 & Source Dataset 2)
Tmp  Time     Soil_Moi
18   2/11/01  15
22   3/11/01  12
23   4/11/01  10
33   1/1/11   30
28   1/2/11   23
29   1/3/11   28
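A spanning view extends the same idea to several datasets that share one data structure. The sketch below uses a reduced column set, and the selection criterion (Depth equal to -10) is an assumption chosen because it yields the six rows of the slide's Spanning View; all names are hypothetical.

```python
STRUCTURE = ("Tmp", "Time", "Soil_Moi", "Depth")  # subset of the slide's columns


def rows(*tuples):
    """Turn positional tuples into dicts keyed by the shared structure."""
    return [dict(zip(STRUCTURE, t)) for t in tuples]


dataset_1 = {"structure": STRUCTURE, "rows": rows(
    (18, "2/11/01", 15, -10), (22, "3/11/01", 12, -10),
    (23, "4/11/01", 10, -10), (21, "5/11/01", 12, -11))}
dataset_2 = {"structure": STRUCTURE, "rows": rows(
    (33, "1/1/11", 30, -10), (28, "1/2/11", 23, -10), (29, "1/3/11", 28, -10))}


def spanning_view(datasets, columns, predicate=lambda row: True):
    """Select and project across datasets that all use the same data structure."""
    first = datasets[0]["structure"]
    if any(ds["structure"] != first for ds in datasets):
        raise ValueError("all source datasets must use the same data structure")
    return [
        {c: row[c] for c in columns}
        for ds in datasets
        for row in ds["rows"]
        if predicate(row)
    ]


result = spanning_view([dataset_1, dataset_2], ["Tmp", "Time", "Soil_Moi"],
                       lambda row: row["Depth"] == -10)
```

The structure check is what makes the concatenation safe: columns with the same name are guaranteed to mean the same thing in every source dataset.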
Model Extensions - Dataset Version

Permanent, change-resistant, citable
copy of a dataset
Independent of subsequent changes
Composed of Data Tuples
A Dataset can have multiple Dataset
Versions
Based on
Checkout / check-in mechanism
Version difference computation and
storage
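Version difference computation and storage can be illustrated with a simple tuple-level diff. This is a sketch under the assumption that each version maps a tuple identifier to its values; the function names and sample data are hypothetical and do not describe the BExIS 2 implementation.

```python
def diff_versions(old: dict, new: dict) -> dict:
    """Compute the difference between two versions (tuple id -> tuple values)."""
    return {
        "added": {k: new[k] for k in new.keys() - old.keys()},
        "removed": {k: old[k] for k in old.keys() - new.keys()},
        "changed": {k: new[k] for k in old.keys() & new.keys() if old[k] != new[k]},
    }


def apply_diff(old: dict, diff: dict) -> dict:
    """Rebuild the newer version from the older one plus the stored difference."""
    result = {k: v for k, v in old.items() if k not in diff["removed"]}
    result.update(diff["changed"])
    result.update(diff["added"])
    return result


# Hypothetical versions: tuple 2 is edited and tuple 3 is added after check-in.
version_1 = {1: {"Tmp": 22}, 2: {"Tmp": 23}}
version_2 = {1: {"Tmp": 22}, 2: {"Tmp": 24}, 3: {"Tmp": 21}}

delta = diff_versions(version_1, version_2)
```

Storing only `delta` keeps each version reproducible from its predecessor without duplicating unchanged tuples.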
Conclusion
Tabular data model presented
Can be used to enforce the structure and type of information to be collected,
and as a basis for data validation
Model assists scientists in
Dataset discovery, integration, quality management, provenance, citation, and
interpretability
Used in BExIS 2 software
Projects using the RDM application include AquaDiva (https://aquadivapub1.inf-bb.uni-jena.de/) and iDiv (http://idata.idiv.de/about-bdu)
About 5 more projects plan to migrate to or start using BExIS 2
Thanks For Your Attention
Any Questions?