
Module netCDF4

Introduction
Python interface to the netCDF version 4 library. netCDF version 4 has many features not found in
earlier versions of the library and is implemented on top of HDF5. This module can read and write files
in both the new netCDF 4 and the old netCDF 3 format, and can create files that are readable by HDF5
clients. The API is modelled after Scientific.IO.NetCDF, and should be familiar to users of that module.
Most new features of netCDF 4 are implemented, such as multiple unlimited dimensions, groups and
zlib data compression. All the new numeric data types (such as 64 bit and unsigned integer types) are
implemented. Compound and variable length (vlen) data types are supported, but the enum and opaque
data types are not. Mixtures of compound and vlen data types (compound types containing vlens, and
vlens containing compound types) are not supported.

Download
• Project page.
• Subversion repository.
• Source tar.gz.

Requires
• numpy array module http://numpy.scipy.org, version 1.2.1 or later.
• The HDF5 C library version 1.8.4-patch1 or higher
from ftp://ftp.hdfgroup.org/HDF5/current/src. Be sure to build with '--enable-hl --enable-shared'.
• Libcurl, if you want OPeNDAP support.
• The netCDF-4 C library from ftp://ftp.unidata.ucar.edu/pub/netcdf. Version 4.1.1 or higher is
required. Be sure to build with '--enable-netcdf-4 --with-hdf5=$HDF5_DIR --enable-shared',
and '--enable-dap' if you want OPeNDAP support. $HDF5_DIR is the directory where HDF5
was installed.

Install
• install the requisite python modules and C libraries (see above).
• optionally, set the HDF5_DIR environment variable to point to where HDF5 is installed (the
libs in $HDF5_DIR/lib, the headers in $HDF5_DIR/include).
• optionally, set the NETCDF4_DIR environment variable to point to where the netCDF version 4
library and headers are installed. If HDF5_DIR and NETCDF4_DIR are not set, some standard
locations will be searched.
• if HDF5 was built with szip, you may also need to set the SZIP_DIR environment variable to
point to where szip is installed. Note that netCDF 4.0 does not support szip compression, but
can read szip compressed files if the HDF5 lib is configured to support szip.
• run 'python setup.py install'
• run the tests in the 'test' directory by running python run_all.py.

Tutorial
1) Creating/Opening/Closing a netCDF file
To create a netCDF file from python, you simply call the Dataset constructor. This is also the method
used to open an existing netCDF file. If the file is open for write access (w, r+ or a), you may write any
type of data including new dimensions, groups, variables and attributes. netCDF files come in several
flavors (NETCDF3_CLASSIC, NETCDF3_64BIT, NETCDF4_CLASSIC, and NETCDF4). The first
two flavors are supported by version 3 of the netCDF library. NETCDF4_CLASSIC files use the
version 4 disk format (HDF5), but do not use any features not found in the version 3 API. They can be
read by netCDF 3 clients only if they have been relinked against the netCDF 4 library. They can also be
read by HDF5 clients. NETCDF4 files use the version 4 disk format (HDF5) and use the new features
of the version 4 API. The netCDF4 module can read and write files in any of these formats. When
creating a new file, the format may be specified using the format keyword in the Dataset constructor.
The default format is NETCDF4. To see how a given file is formatted, you can examine
the file_format Dataset attribute. Closing the netCDF file is accomplished via the close method of
the Dataset instance.
Here's an example:
>>> from netCDF4 import Dataset
>>> rootgrp = Dataset('test.nc', 'w', format='NETCDF4')
>>> print rootgrp.file_format
NETCDF4
>>>
>>> rootgrp.close()

Remote OPeNDAP-hosted datasets can be accessed for reading over http if a URL is provided to
the Dataset constructor instead of a filename. However, this requires that the netCDF library be built
with OPeNDAP support, via the --enable-dap configure option (added in version 4.0.1).

2) Groups in a netCDF file


netCDF version 4 added support for organizing data in hierarchical groups, which are analogous to
directories in a filesystem. Groups serve as containers for variables, dimensions and attributes, as well
as other groups. A netCDF4.Dataset creates a special group, called the 'root group', which is
similar to the root directory in a unix filesystem. To create Group instances, use
the createGroup method of a Dataset or Group instance. createGroup takes a single argument, a python
string containing the name of the new group. The new Group instances contained within the root group
can be accessed by name using the groups dictionary attribute of the Dataset instance.
Only NETCDF4 formatted files support Groups; if you try to create a Group in a netCDF 3 file you will
get an error message.
>>> rootgrp = Dataset('test.nc', 'a')
>>> fcstgrp = rootgrp.createGroup('forecasts')
>>> analgrp = rootgrp.createGroup('analyses')
>>> print rootgrp.groups
{'analyses': <netCDF4._Group object at 0x24a54c30>,
'forecasts': <netCDF4._Group object at 0x24a54bd0>}
>>>

Groups can exist within groups in a Dataset, just as directories exist within directories in a unix
filesystem. Each Group instance has a 'groups' attribute dictionary containing all of the group instances
contained within that group. Each Group instance also has a 'path' attribute that contains a simulated
unix directory path to that group.
Here's an example that shows how to navigate all the groups in a Dataset. The function walktree is a
Python generator that is used to walk the directory tree.
>>> fcstgrp1 = fcstgrp.createGroup('model1')
>>> fcstgrp2 = fcstgrp.createGroup('model2')
>>> def walktree(top):
>>>     values = top.groups.values()
>>>     yield values
>>>     for value in top.groups.values():
>>>         for children in walktree(value):
>>>             yield children
>>> print rootgrp.path, rootgrp
>>> for children in walktree(rootgrp):
>>>     for child in children:
>>>         print child.path, child
/ <netCDF4.Dataset object at 0x24a54c00>
/analyses <netCDF4.Group object at 0x24a54c30>
/forecasts <netCDF4.Group object at 0x24a54bd0>
/forecasts/model2 <netCDF4.Group object at 0x24a54cc0>
/forecasts/model1 <netCDF4.Group object at 0x24a54c60>
>>>

3) Dimensions in a netCDF file


netCDF defines the sizes of all variables in terms of dimensions, so before any variables can be created
the dimensions they use must be created first. A special case, not often used in practice, is that of a
scalar variable, which has no dimensions. A dimension is created using the createDimension method of
a Dataset or Group instance. A Python string is used to set the name of the dimension, and an integer
value is used to set the size. To create an unlimited dimension (a dimension that can be appended to),
the size value is set to None or 0. In this example, both the time and level dimensions are
unlimited. Having more than one unlimited dimension is a new netCDF 4 feature; in netCDF 3 files
there may be only one, and it must be the first (leftmost) dimension of the variable.
>>> rootgrp.createDimension('level', None)
>>> rootgrp.createDimension('time', None)
>>> rootgrp.createDimension('lat', 73)
>>> rootgrp.createDimension('lon', 144)

All of the Dimension instances are stored in a python dictionary.


>>> print rootgrp.dimensions
{'lat': <netCDF4.Dimension object at 0x24a5f7b0>,
'time': <netCDF4.Dimension object at 0x24a5f788>,
'lon': <netCDF4.Dimension object at 0x24a5f7d8>,
'level': <netCDF4.Dimension object at 0x24a5f760>}
>>>

Calling the python len function with a Dimension instance returns the current size of that dimension.
The isunlimited method of a Dimension instance can be used to determine if the dimension is
unlimited, or appendable.
>>> for dimname, dimobj in rootgrp.dimensions.iteritems():
>>>     print dimname, len(dimobj), dimobj.isunlimited()
lat 73 False
time 0 True
lon 144 False
level 0 True
>>>

Dimension names can be changed using the renameDimension method of a Dataset or Group instance.

4) Variables in a netCDF file


netCDF variables behave much like python multidimensional array objects supplied by the numpy
module. However, unlike numpy arrays, netCDF4 variables can be appended to along one or more
'unlimited' dimensions. To create a netCDF variable, use the createVariable method of
a Dataset or Group instance. The createVariable method has two mandatory arguments, the variable
name (a Python string), and the variable datatype. The variable's dimensions are given by a tuple
containing the dimension names (defined previously with createDimension). To create a scalar variable,
simply leave out the dimensions keyword. The variable primitive datatypes correspond to the dtype
attribute of a numpy array. You can specify the datatype as a numpy dtype object, or anything that can
be converted to a numpy dtype object. Valid datatype specifiers include: 'f4' (32-bit floating
point), 'f8' (64-bit floating point), 'i4' (32-bit signed integer), 'i2' (16-bit signed integer), 'i8' (64-bit
signed integer), 'i1' (8-bit signed integer), 'u1' (8-bit unsigned integer), 'u2' (16-bit unsigned
integer), 'u4' (32-bit unsigned integer), 'u8' (64-bit unsigned integer), or 'S1' (single-character string).
The old Numeric single-character typecodes ('f','d','h', 's','b','B','c','i','l'), corresponding to
('f4','f8','i2','i2','i1','i1','S1','i4','i4'), will also work. The unsigned integer types and the 64-bit integer
type can only be used if the file format is NETCDF4.
The dimensions themselves are usually also defined as variables, called coordinate variables.
The createVariable method returns an instance of the Variable class whose methods can be used later to
access and set variable data and attributes.
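Since these specifiers are just strings accepted by numpy.dtype, the correspondence between the old single-character typecodes and the fixed-width types can be checked directly in numpy; a quick sketch, independent of netCDF:

```python
import numpy as np

# The datatype specifiers accepted by createVariable are anything that
# numpy.dtype can convert; the old single-character typecodes resolve
# to the same fixed-width types listed above.
for code, fixed in [('f', 'f4'), ('d', 'f8'), ('h', 'i2'), ('b', 'i1')]:
    assert np.dtype(code) == np.dtype(fixed)
print(np.dtype('f4').name)  # float32
```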
>>> times = rootgrp.createVariable('time','f8',('time',))
>>> levels = rootgrp.createVariable('level','i4',('level',))
>>> latitudes = rootgrp.createVariable('latitude','f4',('lat',))
>>> longitudes = rootgrp.createVariable('longitude','f4',('lon',))
>>> # two dimensions unlimited.
>>> temp = rootgrp.createVariable('temp','f4',('time','level','lat','lon',))

All of the variables in the Dataset or Group are stored in a Python dictionary, in the same way as the
dimensions:
>>> print rootgrp.variables
{'temp': <netCDF4.Variable object at 0x24a61068>,
'level': <netCDF4.Variable object at 0x24a61098>,
'longitude': <netCDF4.Variable object at 0x24a61030>,
'time': <netCDF4.Variable object at 0x24a610a0>,
'latitude': <netCDF4.Variable object at 0x24a610c8>}
>>>

Variable names can be changed using the renameVariable method of a Dataset instance.

5) Attributes in a netCDF file


There are two types of attributes in a netCDF file, global and variable. Global attributes provide
information about a group, or the entire dataset, as a whole. Variable attributes provide information
about one of the variables in a group. Global attributes are set by assigning values
to Dataset or Group instance variables. Variable attributes are set by assigning values
to Variable instance variables. Attributes can be strings, numbers or sequences. Returning to our
example,
>>> import time
>>> rootgrp.description = 'bogus example script'
>>> rootgrp.history = 'Created ' + time.ctime(time.time())
>>> rootgrp.source = 'netCDF4 python module tutorial'
>>> latitudes.units = 'degrees north'
>>> longitudes.units = 'degrees east'
>>> levels.units = 'hPa'
>>> temp.units = 'K'
>>> times.units = 'hours since 0001-01-01 00:00:00.0'
>>> times.calendar = 'gregorian'

The ncattrs method of a Dataset, Group or Variable instance can be used to retrieve the names of all the
netCDF attributes. This method is provided as a convenience, since using the built-in dir Python
function will return a bunch of private methods and attributes that cannot (or should not) be modified
by the user.
>>> for name in rootgrp.ncattrs():
>>>     print 'Global attr', name, '=', getattr(rootgrp,name)
Global attr description = bogus example script
Global attr history = Created Mon Nov 7 10:30:56 2005
Global attr source = netCDF4 python module tutorial

The __dict__ attribute of a Dataset, Group or Variable instance provides all the netCDF attribute
name/value pairs in a python dictionary:
>>> print rootgrp.__dict__
{'source': 'netCDF4 python module tutorial',
'description': 'bogus example script',
'history': 'Created Mon Nov 7 10:30:56 2005'}

Attributes can be deleted from a netCDF Dataset, Group or Variable using the python del statement
(i.e. del grp.foo removes the attribute foo from the group grp).
6) Writing data to and retrieving data from a netCDF variable
Now that you have a netCDF Variable instance, how do you put data into it? You can just treat it like an
array and assign data to a slice.
>>> import numpy
>>> lats = numpy.arange(-90,91,2.5)
>>> lons = numpy.arange(-180,180,2.5)
>>> latitudes[:] = lats
>>> longitudes[:] = lons
>>> print 'latitudes =\n',latitudes[:]
latitudes =
[-90. -87.5 -85. -82.5 -80. -77.5 -75. -72.5 -70. -67.5 -65. -62.5
-60. -57.5 -55. -52.5 -50. -47.5 -45. -42.5 -40. -37.5 -35. -32.5
-30. -27.5 -25. -22.5 -20. -17.5 -15. -12.5 -10. -7.5 -5. -2.5
0. 2.5 5. 7.5 10. 12.5 15. 17.5 20. 22.5 25. 27.5
30. 32.5 35. 37.5 40. 42.5 45. 47.5 50. 52.5 55. 57.5
60. 62.5 65. 67.5 70. 72.5 75. 77.5 80. 82.5 85. 87.5
90. ]
>>>

Unlike NumPy's array objects, netCDF Variable objects with unlimited dimensions will grow along
those dimensions if you assign data outside the currently defined range of indices.
>>> # append along two unlimited dimensions by assigning to slice.
>>> nlats = len(rootgrp.dimensions['lat'])
>>> nlons = len(rootgrp.dimensions['lon'])
>>> print 'temp shape before adding data = ',temp.shape
temp shape before adding data = (0, 0, 73, 144)
>>>
>>> from numpy.random import uniform
>>> temp[0:5,0:10,:,:] = uniform(size=(5,10,nlats,nlons))
>>> print 'temp shape after adding data = ',temp.shape
temp shape after adding data = (5, 10, 73, 144)
>>>
>>> # levels have grown, but no values yet assigned.
>>> print 'levels shape after adding temp data = ',levels.shape
levels shape after adding temp data = (10,)
>>>

Note that the size of the levels variable grows when data is appended along the level dimension of the
variable temp, even though no data has yet been assigned to levels.
>>> # now, assign data to levels dimension variable.
>>> levels[:] = [1000.,850.,700.,500.,300.,250.,200.,150.,100.,50.]

Note, however, that there are some differences between NumPy and netCDF variable slicing rules. Slices
behave as usual, being specified as a start:stop:step triplet. Using a scalar integer index i takes the ith
element and reduces the rank of the output array by one. Boolean array and integer sequence indexing
behaves differently for netCDF variables than for numpy arrays. Only 1-d boolean arrays and integer
sequences are allowed, and these indices work independently along each dimension (similar to the way
vector subscripts work in fortran). This means that
>>> temp[0, 0, [0,1,2,3], [0,1,2,3]]

returns an array of shape (4,4) when slicing a netCDF variable, but for a numpy array it returns an array
of shape (4,). Similarly, a netCDF variable of shape (2,3,4,5) indexed with [0, array([True, False,
True]), array([False, True, True, True]), :] would return a (2, 3, 5) array. In NumPy, this would raise an
error since it would be equivalent to [0, [0,1], [1,2,3], :]. While this behaviour can cause some
confusion for those used to NumPy's 'fancy indexing' rules, it provides a very powerful way to extract
data from multidimensional netCDF variables by using logical operations on the dimension arrays to
create slices.
For example,
>>> tempdat = temp[10:20:2, [1,3,6], lats>0, lons>0]

will extract time indices 10,12,14,16 and 18, pressure levels 850, 500 and 200 hPa, all Northern
Hemisphere latitudes and Eastern Hemisphere longitudes, resulting in a numpy array of shape (5, 3, 36,
71).
>>> print 'shape of fancy temp slice = ',tempdat.shape
shape of fancy temp slice = (5, 3, 36, 71)
>>>
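The difference from NumPy described above can be reproduced for plain numpy arrays with numpy.ix_, which applies index arrays independently along each dimension, the way netCDF variables do; a sketch using a synthetic array, not a netCDF variable:

```python
import numpy as np

a = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

# numpy 'fancy' indexing pairs the index arrays elementwise -> shape (4,)
paired = a[0, 0, [0, 1, 2, 3], [0, 1, 2, 3]]

# netCDF-style indexing treats them independently, like np.ix_ -> shape (4, 4)
orthogonal = a[0, 0][np.ix_([0, 1, 2, 3], [0, 1, 2, 3])]

print(paired.shape, orthogonal.shape)  # (4,) (4, 4)
```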

Time coordinate values pose a special challenge to netCDF users. Most metadata standards (such as CF
and COARDS) specify that time should be measured relative to a fixed date using a certain calendar,
with units specified like hours since YY:MM:DD hh-mm-ss. These units can be awkward to deal with
without a utility to convert the values to and from calendar dates. The functions
num2date and date2num are provided with this package to do just that. Here's an example of
how they can be used:
>>> # fill in times.
>>> from datetime import datetime, timedelta
>>> from netCDF4 import num2date, date2num
>>> dates = [datetime(2001,3,1)+n*timedelta(hours=12) for n in range(temp.shape[0])]
>>> times[:] = date2num(dates,units=times.units,calendar=times.calendar)
>>> print 'time values (in units %s): ' % times.units+'\n',times[:]
time values (in units hours since 0001-01-01 00:00:00.0):
[ 17533056. 17533068. 17533080. 17533092. 17533104.]
>>>
>>> dates = num2date(times[:],units=times.units,calendar=times.calendar)
>>> print 'dates corresponding to time values:\n',dates
dates corresponding to time values:
[2001-03-01 00:00:00 2001-03-01 12:00:00 2001-03-02 00:00:00
2001-03-02 12:00:00 2001-03-03 00:00:00]
>>>

num2date converts numeric values of time in the specified units and calendar to datetime objects,
and date2num does the reverse. All the calendars currently defined in the CF metadata convention are
supported. A function called date2index is also provided which returns the indices of a netCDF time
variable corresponding to a sequence of datetime instances.
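For the proleptic Gregorian case, the arithmetic date2num performs can be sketched with the standard library alone (hours_since_year_one below is a hypothetical helper, not part of the netCDF4 package); it reproduces the time values shown in the example above:

```python
from datetime import datetime

# Hypothetical helper mirroring what date2num computes for units
# 'hours since 0001-01-01 00:00:00.0' on the proleptic Gregorian
# calendar (the calendar used by stdlib datetime).
def hours_since_year_one(date):
    epoch = datetime(1, 1, 1)
    return (date - epoch).total_seconds() / 3600.0

print(hours_since_year_one(datetime(2001, 3, 1)))  # 17533056.0
```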
7) Reading data from a multi-file netCDF dataset.
If you want to read data from a variable that spans multiple netCDF files, you can use
the MFDataset class to read the data as if it were contained in a single file. Instead of using a single
filename to create a Dataset instance, create an MFDataset instance with either a list of filenames, or a
string with a wildcard (which is then converted to a sorted list of files using the python glob module).
Variables in the list of files that share the same unlimited dimension are aggregated together, and can be
sliced across multiple files. To illustrate this, let's first create a bunch of netCDF files with the same
variable (with the same unlimited dimension). The files must be
in NETCDF3_64BIT, NETCDF3_CLASSIC or NETCDF4_CLASSIC format (NETCDF4 formatted
multi-file datasets are not supported).
>>> for nfile in range(10):
>>>     f = Dataset('mftest'+repr(nfile)+'.nc','w',format='NETCDF4_CLASSIC')
>>>     f.createDimension('x',None)
>>>     x = f.createVariable('x','i',('x',))
>>>     x[0:10] = numpy.arange(nfile*10,10*(nfile+1))
>>>     f.close()

Now read all the files back in at once with MFDataset


>>> from netCDF4 import MFDataset
>>> f = MFDataset('mftest*nc')
>>> print f.variables['x'][:]
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74
75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99]
>>>

Note that MFDataset can only be used to read, not write, multi-file datasets.
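The wildcard expansion mentioned above is ordinary glob matching plus a sort; a minimal sketch using hypothetical empty placeholder files in a temporary directory (real netCDF files would be used in practice):

```python
import glob
import os
import tempfile

# MFDataset converts a wildcard into a sorted list of matching files.
# Hypothetical empty files stand in for real netCDF files here.
tmpdir = tempfile.mkdtemp()
for n in (2, 0, 1):
    open(os.path.join(tmpdir, 'mftest%d.nc' % n), 'w').close()

files = sorted(glob.glob(os.path.join(tmpdir, 'mftest*.nc')))
print([os.path.basename(p) for p in files])  # ['mftest0.nc', 'mftest1.nc', 'mftest2.nc']
```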

8) Efficient compression of netCDF variables


Data stored in netCDF 4 Variable objects can be compressed and decompressed on the fly. The
parameters for the compression are determined by the zlib, complevel and shuffle keyword arguments
to the createVariable method. To turn on compression, set zlib=True. The complevel keyword regulates
the speed and efficiency of the compression (1 being fastest, but lowest compression ratio, 9 being
slowest but best compression ratio). The default value of complevel is 6. Setting shuffle=False will turn
off the HDF5 shuffle filter, which de-interlaces a block of data before compression by reordering the
bytes. The shuffle filter can significantly improve compression ratios, and is on by default.
Setting fletcher32 keyword argument to createVariable to True (it's False by default) enables the
Fletcher32 checksum algorithm for error detection. It's also possible to set the HDF5 chunking
parameters and endian-ness of the binary data stored in the HDF5 file with
the chunksizes and endian keyword arguments to createVariable. These keyword arguments are only
relevant for NETCDF4 and NETCDF4_CLASSIC files (where the underlying file format is HDF5) and
are silently ignored if the file format is NETCDF3_CLASSIC or NETCDF3_64BIT.
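The shuffle filter's byte reordering can be illustrated in numpy; this is a sketch of the idea, not the HDF5 implementation:

```python
import numpy as np

# The shuffle filter groups the i-th byte of every value together, so a
# block of 4-byte integers becomes: all first bytes, all second bytes, ...
# Small integers then yield long runs of zero bytes, which zlib compresses well.
data = np.arange(8, dtype='<i4')
shuffled = data.view(np.uint8).reshape(-1, 4).T.copy()

# De-shuffling (transposing back) recovers the original values.
unshuffled = shuffled.T.reshape(-1).view('<i4')
print(np.array_equal(unshuffled, data))  # True
```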
If your data only has a certain number of digits of precision (say for example, it is temperature data that
was measured with a precision of 0.1 degrees), you can dramatically improve zlib compression by
quantizing (or truncating) the data using the least_significant_digit keyword argument
to createVariable. The least significant digit is the power of ten of the smallest decimal place in the data
that is a reliable value. For example if the data has a precision of 0.1, then
setting least_significant_digit=1 will cause the data to be quantized
using numpy.around(scale*data)/scale, where scale = 2**bits, and bits is determined so that a precision
of 0.1 is retained (in this case bits=4). Effectively, this makes the compression 'lossy' instead of
'lossless', that is some precision in the data is sacrificed for the sake of disk space.
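The quantization rule just described can be sketched directly in numpy (quantize here is a hypothetical helper reproducing the stated numpy.around(scale*data)/scale formula, not the module's internal function):

```python
import numpy as np

def quantize(data, least_significant_digit):
    # choose bits so that 1/2**bits <= 10**-least_significant_digit;
    # for least_significant_digit=1 this gives bits=4, scale=16.
    bits = int(np.ceil(np.log2(10.0 ** least_significant_digit)))
    scale = 2.0 ** bits
    return np.around(scale * data) / scale

data = np.array([273.15, 280.04, 265.93])
print(quantize(data, 1))  # values rounded to multiples of 1/16
```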
In our example, try replacing the line
>>> temp = rootgrp.createVariable('temp','f4',('time','level','lat','lon',))

with
>>> temp = rootgrp.createVariable('temp','f4',('time','level','lat','lon',),zlib=True)

and then
>>> temp = rootgrp.createVariable('temp','f4',('time','level','lat','lon',),zlib=True,least_significant_digit=3)

and see how much smaller the resulting files are.

9) Beyond homogeneous arrays of a fixed type - compound data types


Compound data types map directly to numpy structured (a.k.a. 'record') arrays. Structured arrays are
akin to C structs, or derived types in Fortran. They allow for the construction of table-like structures
composed of combinations of other data types, including other compound types. Compound types
might be useful for representing multiple parameter values at each point on a grid, or at each time and
space location for scattered (point) data. You can then access all the information for a point by reading
one variable, instead of reading different parameters from different variables. Compound data types are
created from the corresponding numpy data type using the createCompoundType method of
a Dataset or Group instance. Since there is no native complex data type in netcdf, compound types are
handy for storing numpy complex arrays. Here's an example:
>>> f = Dataset('complex.nc','w')
>>> size = 3 # length of 1-d complex array
>>> # create sample complex data.
>>> datac = numpy.exp(1j*(1.+numpy.linspace(0, numpy.pi, size)))
>>> # create complex128 compound data type.
>>> complex128 = numpy.dtype([('real',numpy.float64),('imag',numpy.float64)])
>>> complex128_t = f.createCompoundType(complex128,'complex128')
>>> # create a variable with this data type, write some data to it.
>>> f.createDimension('phony_dim',None)
>>> v = f.createVariable('phony_var',complex128_t,'phony_dim')
>>> data = numpy.empty(size,complex128)
>>> data['real'] = datac.real; data['imag'] = datac.imag
>>> v[:] = data
>>> # close and reopen the file, check the contents.
>>> f.close(); f = Dataset('complex.nc')
>>> v = f.variables['phony_var']
>>> datain = v[:] # read in all the data into a numpy structured array
>>> # create an empty numpy complex array
>>> datac2 = numpy.empty(datain.shape,numpy.complex128)
>>> # .. fill it with contents of structured array.
>>> datac2.real = datain['real']; datac2.imag = datain['imag']
>>> print datac.dtype,datac
complex128 [ 0.54030231+0.84147098j -0.84147098+0.54030231j -0.54030231-0.84147098j]
>>>
>>> print datac2.dtype,datac2
complex128 [ 0.54030231+0.84147098j -0.84147098+0.54030231j -0.54030231-0.84147098j]
>>>

Compound types can be nested, but you must create the 'inner' ones first. Here's a more complex
example that uses a nested compound type to represent meteorological observations at stations:
>>> # compound type example.
>>> from netCDF4 import chartostring, stringtoarr
>>> f = Dataset('compound_example.nc','w') # create a new dataset.
>>> # create an unlimited dimension called 'station'
>>> f.createDimension('station',None)
>>> # define a compound data type (can contain arrays, or nested compound types).
>>> NUMCHARS = 80 # number of characters to use in fixed-length strings.
>>> winddtype = numpy.dtype([('speed','f4'),('direction','i4')])
>>> statdtype = numpy.dtype([('latitude', 'f4'), ('longitude', 'f4'),
... ('surface_wind',winddtype),
... ('temp_sounding','f4',10),('press_sounding','i4',10),
... ('location_name','S1',NUMCHARS)])
>>> # use these data type definitions to create compound data types
>>> # using the createCompoundType Dataset method.
>>> # create a compound type for vector wind which will be nested inside
>>> # the station data type. This must be done first!
>>> wind_data_t = f.createCompoundType(winddtype,'wind_data')
>>> # now that wind_data_t is defined, create the station data type.
>>> station_data_t = f.createCompoundType(statdtype,'station_data')
>>> # create nested compound data types to hold the units variable attribute.
>>> winddtype_units = numpy.dtype([('speed','S1',NUMCHARS),('direction','S1',NUMCHARS)])
>>> statdtype_units = numpy.dtype([('latitude', 'S1',NUMCHARS), ('longitude', 'S1',NUMCHARS),
... ('surface_wind',winddtype_units),
... ('temp_sounding','S1',NUMCHARS),
... ('location_name','S1',NUMCHARS),
... ('press_sounding','S1',NUMCHARS)])
>>> # create the wind_data_units type first, since it will be nested inside
>>> # the station_data_units data type.
>>> wind_data_units_t = f.createCompoundType(winddtype_units,'wind_data_units')
>>> station_data_units_t = f.createCompoundType(statdtype_units,'station_data_units')
>>> # create a variable of type 'station_data_t'
>>> statdat = f.createVariable('station_obs', station_data_t, ('station',))
>>> # create a numpy structured array, assign data to it.
>>> data = numpy.empty(1,station_data_t)
>>> data['latitude'] = 40.
>>> data['longitude'] = -105.
>>> data['surface_wind']['speed'] = 12.5
>>> data['surface_wind']['direction'] = 270
>>> data['temp_sounding'] = (280.3,272.,270.,269.,266.,258.,254.1,250.,245.5,240.)
>>> data['press_sounding'] = range(800,300,-50)
>>> # variable-length string datatypes are not supported inside compound types, so
>>> # to store strings in a compound data type, each string must be
>>> # stored as fixed-size (in this case 80) array of characters.
>>> data['location_name'] = stringtoarr('Boulder, Colorado, USA',NUMCHARS)
>>> # assign structured array to variable slice.
>>> statdat[0] = data
>>> # or just assign a tuple of values to variable slice
>>> # (will automatically be converted to a structured array).
>>> statdat[1] = (40.78,-73.99,(-12.5,90),
... (290.2,282.5,279.,277.9,276.,266.,264.1,260.,255.5,243.),
... range(900,400,-50),stringtoarr('New York, New York, USA',NUMCHARS))

All of the compound types defined for a Dataset or Group are stored in a Python dictionary, just like
variables and dimensions:
>>> print f.cmptypes
{'wind_data': <netCDF4.CompoundType object at 0x14dd698>,
'station_data_units': <netCDF4.CompoundType object at 0x14dd620>,
'wind_data_units': <netCDF4.CompoundType object at 0x14dd648>,
'station_data': <netCDF4.CompoundType object at 0x14dd670>}
>>>

Attributes cannot be assigned directly to compound type members. However, a compound data type
can be created to hold an attribute for each member. In this example we have created the compound
types wind_data_units_t and station_data_units_t to hold the units attribute for each member of the
nested compound type station_data_t. Now we can fill a numpy array with strings describing the units,
then assign that array to the units attribute of the station data variable. Note again that since there is no
fixed-length string type in netCDF, we have to use arrays of characters to represent strings. Variable
length strings are supported (see the next section), but not inside compound types.
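What stringtoarr does can be sketched with numpy alone (string_to_chararray below is a hypothetical stand-in for the real helper):

```python
import numpy as np

# Hypothetical sketch of stringtoarr: pad a python string into a
# fixed-length numpy array of single characters (dtype 'S1').
def string_to_chararray(s, numchars):
    arr = np.zeros(numchars, dtype='S1')
    for i, ch in enumerate(s[:numchars]):
        arr[i] = ch
    return arr

arr = string_to_chararray('hPa', 8)
print(b''.join(arr.tolist()))  # b'hPa'
```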
>>> windunits = numpy.empty(1,winddtype_units)
>>> stationobs_units = numpy.empty(1,statdtype_units)
>>> windunits['speed'] = stringtoarr('m/s',NUMCHARS)
>>> windunits['direction'] = stringtoarr('degrees',NUMCHARS)
>>> stationobs_units['latitude'] = stringtoarr('degrees north',NUMCHARS)
>>> stationobs_units['longitude'] = stringtoarr('degrees west',NUMCHARS)
>>> stationobs_units['surface_wind'] = windunits
>>> stationobs_units['location_name'] = stringtoarr('None', NUMCHARS)
>>> stationobs_units['temp_sounding'] = stringtoarr('Kelvin',NUMCHARS)
>>> stationobs_units['press_sounding'] = stringtoarr('hPa',NUMCHARS)
>>> statdat.units = stationobs_units

Now let's close the file, reopen it, and see what's in there. The command line utility ncdump can also be
used to get a quick look at the contents of the file.
>>> # close and reopen the file.
>>> f.close(); f = Dataset('compound_example.nc')
>>> statdat = f.variables['station_obs']
>>> # print out data in variable.
>>> # (also, try 'ncdump compound_example.nc' on the command line
>>> # to see what's in the file)
>>> print 'data in a variable of compound type:'
>>> print '----'
>>> for data in statdat[:]:
>>>     for name in statdat.dtype.names:
>>>         if data[name].dtype.kind == 'S': # a string
>>>             # convert array of characters back to a string for display.
>>>             print name,': value =',chartostring(data[name]),': units=',chartostring(statdat.units[name])
>>>         elif data[name].dtype.kind == 'V': # a nested compound type
>>>             print name,data[name].dtype.names,': value=',data[name],': units=',tuple([''.join(u.tolist()) for u in statdat.units[name]])
>>>         else: # a numeric type.
>>>             print name,': value=',data[name],': units=',chartostring(statdat.units[name])
>>>     print '----'
data in a variable of compound type:
----
latitude : value= 40.0 : units= degrees north
longitude : value= -105.0 : units= degrees west
surface_wind ('speed', 'direction') : value= (12.5, 270) : units= ('m/s', 'degrees')
temp_sounding : value= [280.3 272. 270. 269. 266. 258. 254.1 250. 245.5 240.] : units= Kelvin
press_sounding : value= [800 750 700 650 600 550 500 450 400 350] : units= hPa
location_name : value = Boulder, Colorado, USA : units= None
----
latitude : value= 40.78 : units= degrees north
longitude : value= -73.99 : units= degrees west
surface_wind ('speed', 'direction') : value= (-12.5, 90) : units= ('m/s','degrees')
temp_sounding : value= [290.2 282.5 279. 277.9 276. 266. 264.1 260. 255.5 243.] : units= Kelvin
press_sounding : value= [900 850 800 750 700 650 600 550 500 450] : units= hPa
location_name : value = New York, New York, USA : units= None
----
>>>
>>> f.close()

10) Variable-length (vlen) data types.


NetCDF 4 has support for variable-length or "ragged" arrays. These are arrays of variable length
sequences having the same type. To create a variable-length data type, use the createVLType method
of a Dataset or Group instance.
>>> f = Dataset('tst_vlen.nc','w')
>>> vlen_t = f.createVLType(numpy.int32, 'phony_vlen')

The numpy datatype of the variable-length sequences and the name of the new datatype must be
specified. Any of the primitive datatypes can be used (signed and unsigned integers, 32 and 64 bit
floats, and characters), but compound data types cannot. A new variable can then be created using this
datatype.
>>> x = f.createDimension('x',3)
>>> y = f.createDimension('y',4)
>>> vlvar = f.createVariable('phony_vlen_var', vlen_t, ('y','x'))

Since there is no native vlen datatype in numpy, vlen arrays are represented in python as object arrays
(arrays of dtype object). These are arrays whose elements are Python object pointers, and can contain
any type of python object. For this application, they must contain 1-D numpy arrays all of the same
type but of varying length. In this case, they contain 1-D numpy int32 arrays of random length between
1 and 10.
>>> import random
>>> data = numpy.empty(len(y)*len(x),object)
>>> for n in range(len(y)*len(x)):
...     data[n] = numpy.arange(random.randint(1,10),dtype='int32')+1
>>> data = numpy.reshape(data,(len(y),len(x)))
>>> vlvar[:] = data
>>> print 'vlen variable =\n',vlvar[:]
vlen variable =
[[[ 1 2 3 4 5 6 7 8 9 10] [1 2 3 4 5] [1 2 3 4 5 6 7 8]]
[[1 2 3 4 5 6 7] [1 2 3 4 5 6] [1 2 3 4 5]]
[[1 2 3 4 5] [1 2 3 4] [1]]
[[ 1 2 3 4 5 6 7 8 9 10] [ 1 2 3 4 5 6 7 8 9 10]
[1 2 3 4 5 6 7 8]]]
Numpy object arrays containing python strings can also be written as vlen variables. For vlen strings,
you don't need to create a vlen data type. Instead, simply use the python str builtin instead of a numpy
datatype when calling the createVariable method.
>>> z = f.createDimension('z',10)
>>> strvar = f.createVariable('strvar', str, 'z')

In this example, an object array is filled with random python strings with random lengths between 2
and 12 characters, and the data in the object array is assigned to the vlen string variable.
>>> chars = '1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> data = numpy.empty(10,'O')
>>> for n in range(10):
...     stringlen = random.randint(2,12)
...     data[n] = ''.join([random.choice(chars) for i in range(stringlen)])
>>> strvar[:] = data
>>> print 'variable-length string variable:\n',strvar[:]
variable-length string variable:
[aDy29jPt jd7aplD b8t4RM jHh8hq KtaPWF9cQj Q1hHN5WoXSiT MMxsVeq td LUzvVTzj
5DS9X8S]

All of the code in this tutorial is available in examples/tutorial.py. Unit tests are in the test directory.

Contact: Jeffrey Whitaker <jeffrey.s.whitaker@noaa.gov>


Copyright: 2008 by Jeffrey Whitaker.
License: Permission to use, copy, modify, and distribute this software and its documentation for any
purpose and without fee is hereby granted, provided that the above copyright notice appear in all copies
and that both the copyright notice and this permission notice appear in supporting documentation. THE
AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING
ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL
THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES
OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
SOFTWARE.
Version: 0.9.2

Classes
CompoundType
A CompoundType instance is used to describe a compound data type.
Dataset
Dataset(self, filename, mode="r", clobber=True, format='NETCDF4')
Dimension
Dimension(self, group, name, size=None)
Group
Group(self, parent, name)
MFDataset
MFDataset(self, files, check=False, exclude=[])
VLType
A VLType instance is used to describe a variable length (VLEN) data type.
Variable
Variable(self, group, name, datatype, dimensions=(), zlib=False, complevel=6, shuffle=True,
fletcher32=False, contiguous=False, chunksizes=None, endian='native',
least_significant_digit=None,fill_value=None)

Functions

chartostring(b)
convert a character array to a string array with one less dimension.

date2index(dates, nctime, calendar=None, select='exact')


Return indices of a netCDF time variable corresponding to the given dates.

date2num(dates, units, calendar='standard')


Return numeric time values given datetime objects.

getlibversion()
returns a string describing the version of the netcdf library used to build the module, and when it was
built.

num2date(times, units, calendar='standard')


Return datetime objects given numeric time values.

stringtoarr(a, NUMCHARS)
convert a string to a character array of length NUMCHARS

stringtochar(a)
convert a string array to a character array with one extra dimension
Variables
__hdf5libversion__ = '1.8.4-patch1'
__netcdf4libversion__ = '4.1.1'
__package__ = None
__required_hdf5version__ = '1.8.4-patch1'
__required_netcdf4version__ = '4.1.1'

Function Details

chartostring(b)
convert a character array to a string array with one less dimension.

Parameters:
• b - Input character array (numpy datatype 'S1'). Will be converted to a array of strings,
where each string has a fixed length of b.shape[-1] characters.
Returns:
A numpy string array with datatype 'SN' and shape b.shape[:-1], where N=b.shape[-1].
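The behavior can be sketched in pure numpy (chartostring_sketch is a hypothetical helper, not the library function itself), by reinterpreting the trailing 'S1' axis as one fixed-length string:

```python
import numpy as np

def chartostring_sketch(b):
    # Hypothetical pure-numpy approximation of chartostring:
    # view the trailing 'S1' axis as a single fixed-length string.
    n = b.shape[-1]
    return np.ascontiguousarray(b).view('S%d' % n).reshape(b.shape[:-1])

b = np.array([list('hello'), list('world')], dtype='S1')
s = chartostring_sketch(b)   # dtype 'S5', shape (2,)
```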

date2index(dates, nctime, calendar=None, select='exact')


Return indices of a netCDF time variable corresponding to the given dates.

Parameters:
• dates - A datetime object or a sequence of datetime objects. The datetime objects
should not include a time-zone offset.
• nctime - A netCDF time variable object. The nctime object must have
a units attribute.
• calendar - describes the calendar used in the time calculation. Valid
calendars are 'standard', 'gregorian', 'proleptic_gregorian',
'noleap', '365_day', '360_day', 'julian', 'all_leap',
'366_day'. Default is 'standard', which is a mixed Julian/Gregorian calendar.
If calendar is None, its value is given by nctime.calendar, or 'standard' if no
such attribute exists.
• select - the index selection method: 'exact', 'before', 'after' or
'nearest'. 'exact' will return the indices perfectly matching the dates
given. 'before' and 'after' will return the indices corresponding to the dates just before
or just after the given dates if an exact match cannot be found. 'nearest' will return the
indices that correspond to the closest dates.
Returns:
an index (indices) of the netCDF time variable corresponding to the given datetime object(s).
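The selection modes can be illustrated with a small pure-python sketch over already-numeric times (select_index is a hypothetical helper; the real function operates on a netCDF time variable and datetime objects):

```python
import bisect

def select_index(times, t, select='exact'):
    # Hypothetical sketch of the 'exact'/'before'/'after'/'nearest'
    # selection modes over a sorted list of numeric times.
    i = bisect.bisect_left(times, t)
    if i < len(times) and times[i] == t:
        return i                       # exact match always wins
    if select == 'exact':
        raise ValueError('no exact match for %s' % t)
    if select == 'before':
        return i - 1                   # date just before t
    if select == 'after':
        return i                       # date just after t
    # 'nearest': compare the neighbors on either side
    if i == 0:
        return 0
    if i == len(times):
        return len(times) - 1
    return i if times[i] - t < t - times[i - 1] else i - 1

times = [0, 6, 12, 18]                 # e.g. hours since some origin
select_index(times, 7, 'before')       # -> 1
select_index(times, 7, 'after')        # -> 2
select_index(times, 7, 'nearest')      # -> 1
```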

date2num(dates, units, calendar='standard')


Return numeric time values given datetime objects. The units of the numeric time values are described
by the units argument and the calendar keyword. The datetime objects must be in UTC with no
time-zone offset. If there is a time-zone offset in units, it will be applied to the returned numeric
values.

Like the matplotlib date2num function, except that it allows for different units and calendars.
Behaves the same if units = 'days since 0001-01-01 00:00:00' and calendar =
'proleptic_gregorian'.

Parameters:
• dates - A datetime object or a sequence of datetime objects. The datetime objects
should not include a time-zone offset.
• units - a string of the form 'time units since reference time' describing
the time units. time units can be days, hours, minutes or seconds. reference
time is the time origin. A valid choice would be units='hours since 1800-01-
01 00:00:00 -6:00'.
• calendar - describes the calendar used in the time calculations. All the values
currently defined in the CF metadata convention are supported. Valid
calendars are 'standard', 'gregorian', 'proleptic_gregorian',
'noleap', '365_day', '360_day', 'julian', 'all_leap',
'366_day'. Default is 'standard', which is a mixed Julian/Gregorian calendar.
Returns:
a numeric time value, or an array of numeric time values.

The maximum resolution of the numeric time values is 1 second.
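For the 'proleptic_gregorian' calendar, which python's datetime module implements natively, the conversion is plain timedelta arithmetic. A hand-rolled sketch for hour units (date2num_sketch is a hypothetical helper, not the library function):

```python
from datetime import datetime

def date2num_sketch(date, origin):
    # Hypothetical sketch for units='hours since <origin>' on the
    # proleptic_gregorian calendar: just timedelta arithmetic.
    delta = date - origin
    return delta.days * 24.0 + delta.seconds / 3600.0

origin = datetime(1800, 1, 1)
date2num_sketch(datetime(1800, 1, 2, 12, 0), origin)  # -> 36.0
```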

num2date(times, units, calendar='standard')


Return datetime objects given numeric time values. The units of the numeric time values are described
by the units argument and the calendar keyword. The returned datetime objects represent UTC
with no time-zone offset, even if the specified units contain a time-zone offset.

Like the matplotlib num2date function, except that it allows for different units and calendars.
Behaves the same if units = 'days since 0001-01-01 00:00:00' and calendar =
'proleptic_gregorian'.

Parameters:
• times - numeric time values. Maximum resolution is 1 second.
• units - a string of the form 'time units since reference time' describing
the time units. time units can be days, hours, minutes or seconds. reference
time is the time origin. A valid choice would be units='hours since 1800-01-
01 00:00:00 -6:00'.
• calendar - describes the calendar used in the time calculations. All the values
currently defined in the CF metadata convention are supported. Valid
calendars are 'standard', 'gregorian', 'proleptic_gregorian',
'noleap', '365_day', '360_day', 'julian', 'all_leap',
'366_day'. Default is 'standard', which is a mixed Julian/Gregorian calendar.
Returns:
a datetime instance, or an array of datetime instances.

The datetime instances returned are 'real' python datetime objects if the date falls within the
Gregorian calendar (i.e. calendar='proleptic_gregorian', or calendar =
'standard' or 'gregorian' and the date is after 1582-10-15). Otherwise, they are 'phony'
datetime objects which support some but not all the methods of 'real' python datetime objects.
This is because the python datetime module only uses
the 'proleptic_gregorian' calendar, even for dates before the switch from the Julian
calendar occurred in 1582. The datetime instances do not contain a time-zone offset, even if the
specified units contains one.
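The inverse direction, for dates within the proleptic Gregorian calendar, is again plain timedelta arithmetic (a hypothetical sketch, not the library function):

```python
from datetime import datetime, timedelta

# Hypothetical num2date sketch for units='hours since 1800-01-01 00:00:00'
# on the proleptic_gregorian calendar.
origin = datetime(1800, 1, 1)
d = origin + timedelta(hours=36.0)
# d -> datetime(1800, 1, 2, 12, 0)
```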

stringtoarr(a, NUMCHARS)
convert a string to a character array of length NUMCHARS

Parameters:
• a - Input python string.
• NUMCHARS - number of characters used to represent string (if len(a) < NUMCHARS, it
will be padded on the right with blanks).
Returns:
A rank 1 numpy character array of length NUMCHARS with datatype 'S1'
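A pure-numpy sketch of the same padding behavior (stringtoarr_sketch is a hypothetical helper, not the library function):

```python
import numpy as np

def stringtoarr_sketch(a, NUMCHARS):
    # Hypothetical sketch: right-pad (or truncate) to NUMCHARS with
    # blanks, then split the bytes into individual 'S1' characters.
    padded = a.ljust(NUMCHARS)[:NUMCHARS].encode('ascii')
    return np.frombuffer(padded, dtype='S1')

arr = stringtoarr_sketch('hi', 4)   # [b'h', b'i', b' ', b' ']
```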

stringtochar(a)
convert a string array to a character array with one extra dimension

Parameters:
• a - Input numpy string array with numpy datatype 'SN', where N is the number of
characters in each string. Will be converted to an array of characters (datatype 'S1') of
shape a.shape + (N,).
Returns:
A numpy character array with datatype 'S1' and shape a.shape + (N,), where N is the length of
each string in a.
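This is the inverse of chartostring, and can likewise be sketched with a numpy view (a hypothetical helper, assuming a contiguous input array):

```python
import numpy as np

def stringtochar_sketch(a):
    # Hypothetical sketch: reinterpret each N-character string as
    # N separate 'S1' characters, adding one trailing dimension.
    n = a.dtype.itemsize
    return np.ascontiguousarray(a).view('S1').reshape(a.shape + (n,))

a = np.array([b'abc', b'def'], dtype='S3')
c = stringtochar_sketch(a)          # shape (2, 3), dtype 'S1'
```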
