Reading HDF5
Jeffrey Leek
Johns Hopkins Bloomberg School of Public Health
HDF5
+ Used for storing large data sets
+ Supports storing a range of data types
+ Hierarchical data format
+ groups containing zero or more data sets and metadata
- Have a group header with group name and list of attributes
- Have a group symbol table with a list of objects in group
+ datasets: multidimensional array of data elements with metadata
- Have a header with name, datatype, dataspace, and storage layout
- Have a data array with the data
http://www.hdfgroup.org/
R HDF5 package
source("http://bioconductor.org/biocLite.R")
biocLite("rhdf5")
library(rhdf5)
created = h5createFile("example.h5")
created
[1] TRUE
+ This will install packages from Bioconductor http://bioconductor.org/, which is primarily used for genomics but also has good "big data" packages
+ Can be used to interface with hdf5 data sets.
+ This lecture is modeled very closely on the rhdf5 tutorial that can be found here
http://www.bioconductor.org/packages/release/bioc/vignettes/rhdf5/inst/doc/rhdf5.pdf
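The biocLite.R script used on this slide has since been retired by Bioconductor in favor of the BiocManager package. A minimal sketch of the equivalent install on a current R setup follows. The CRAN mirror URL is an assumption, and the two `requireNamespace` guards are only there to skip work that has already been done.

```r
# Assumption: a recent R installation where BiocManager replaces biocLite.
# The CRAN mirror below is an illustrative choice, not part of the lecture.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager", repos = "https://cloud.r-project.org")
}

# Install rhdf5 from Bioconductor only if it is not already available.
if (!requireNamespace("rhdf5", quietly = TRUE)) {
  BiocManager::install("rhdf5")
}

library(rhdf5)
```

After this runs, the rest of the lecture's code (h5createFile, h5write, h5read, h5ls) should work unchanged.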
Create groups
created = h5createGroup("example.h5","foo")
created = h5createGroup("example.h5","baa")
created = h5createGroup("example.h5","foo/foobaa")
h5ls("example.h5")
  group   name     otype dclass dim
0     /    baa H5I_GROUP
1     /    foo H5I_GROUP
2  /foo foobaa H5I_GROUP
Write to groups
A = matrix(1:10,nr=5,nc=2)
h5write(A, "example.h5","foo/A")
B = array(seq(0.1,2.0,by=0.1),dim=c(5,2,2))
attr(B, "scale") <- "liter"
h5write(B, "example.h5","foo/foobaa/B")
h5ls("example.h5")
        group   name       otype  dclass       dim
0           /    baa   H5I_GROUP
1           /    foo   H5I_GROUP
2        /foo      A H5I_DATASET INTEGER     5 x 2
3        /foo foobaa   H5I_GROUP
4 /foo/foobaa      B H5I_DATASET   FLOAT 5 x 2 x 2
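The "scale" attribute set on B above does not appear in the listing. As far as the rhdf5 documentation describes it, h5write only stores R attributes when asked to. The sketch below assumes the `write.attributes` argument of h5write and the `h5readAttributes` function behave as documented in recent rhdf5 releases. It writes to a temporary file (a hypothetical scratch path, not the lecture's example.h5) so it can be run on its own.

```r
library(rhdf5)

# Hypothetical scratch file so the lecture's example.h5 is left untouched.
tmp <- tempfile(fileext = ".h5")
h5createFile(tmp)

# Same array and attribute as on the slide.
B <- array(seq(0.1, 2.0, by = 0.1), dim = c(5, 2, 2))
attr(B, "scale") <- "liter"

# Assumption: write.attributes = TRUE stores the R attributes of B
# alongside the dataset.
h5write(B, tmp, "B", write.attributes = TRUE)

# Read back the attributes stored with the dataset and list their names.
attrs <- h5readAttributes(tmp, "B")
names(attrs)
```

If the assumption holds, "scale" should be among the returned names; other R attributes of the array may be listed as well.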
Write a data set
df = data.frame(1L:5L,seq(0,1,length.out=5),
  c("ab","cde","fghi","a","s"), stringsAsFactors=FALSE)
h5write(df, "example.h5","df")
h5ls("example.h5")
        group   name       otype   dclass       dim
0           /    baa   H5I_GROUP
1           /     df H5I_DATASET COMPOUND         5
2           /    foo   H5I_GROUP
3        /foo      A H5I_DATASET  INTEGER     5 x 2
4        /foo foobaa   H5I_GROUP
5 /foo/foobaa      B H5I_DATASET    FLOAT 5 x 2 x 2
Reading data
readA = h5read("example.h5","foo/A")
readB = h5read("example.h5","foo/foobaa/B")
readdf= h5read("example.h5","df")
readA
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10
Writing and reading chunks
h5write(c(12,13,14),"example.h5","foo/A",index=list(1:3,1))
h5read("example.h5","foo/A")
     [,1] [,2]
[1,]   12    6
[2,]   13    7
[3,]   14    8
[4,]    4    9
[5,]    5   10
Notes and further resources
+ hdf5 can be used to optimize reading/writing from disk in R
+ The rhdf5 tutorial:
- http://www.bioconductor.org/packages/release/bioc/vignettes/rhdf5/inst/doc/rhdf5.pdf
+ The HDF group has information on HDF5 in general http://www.hdfgroup.org/HDF5/
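One way the first note plays out in practice is reading only part of a dataset instead of loading all of it. The sketch below uses the same `index` argument shown for h5write in the chunks slide, this time with h5read. The temporary file and the dataset name "A" are illustrative assumptions, chosen so the example runs without the lecture's example.h5.

```r
library(rhdf5)

# Hypothetical scratch file holding a small matrix like the lecture's A.
tmp <- tempfile(fileext = ".h5")
h5createFile(tmp)
A <- matrix(1:10, nrow = 5, ncol = 2)
h5write(A, tmp, "A")

# Read only rows 1 to 3 of column 2; the rest of the dataset
# is not returned to R.
part <- h5read(tmp, "A", index = list(1:3, 2))
part
```

For a 5 x 2 matrix the saving is negligible, but the same call pattern applies to datasets too large to hold in memory at once.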