Reading a Scientific Data Set from an HDF File
The following sections detail how a user may utilize the HDF library and the SD API within a computer program to read a scientific data set from an HDF file. In this section, the tutorial will concentrate on using the FORTRAN programming language and the SD API. However, examples of the appropriate C code will also be given for certain steps. For the purpose of this tutorial, we are choosing the example of reading an entire data array that is the first and only data set in the HDF file. Similar to writing an HDF file, the user should follow these simple steps:
Does the current version of HDF support your computer platform and operating system?
As outlined in Section 4, the HDF library can not be run on just any computer platform or operating system. Before downloading the HDF library software, the user should make sure that the current release of HDF supports his or her computer and operating system. Otherwise, the user will be unable to work with the HDF library and files. It is also possible that a previous release of HDF supports the user's computer platform while the latest version does not. In this event, the user may wish to obtain the earlier software.
Downloading and Installing the HDF library
The HDF library and software are in the public domain and available free to all users. The library and code can be downloaded from the NCSA anonymous ftp server (ftp://ftp.ncsa.uiuc.edu/). Directions on how to install the HDF library can also be found at this location.
Are all libraries and programs properly linked and compiled?
In order to eventually run the HDF software, the library and the needed application routines and programs must first be properly compiled and linked. As of the current release of HDF (4.1r3), four separate libraries must be compiled and linked. These are the libmfhdf.a, libdf.a, libjpeg.a, and libz.a libraries. Provided below are examples of the command(s) that can be used for this action. It must be noted that the order in which the libraries are linked is important and should not vary from the order shown below:
For C programs:
cc -o <your program> <your program>.c \
-I<path for hdf include directory> \
-L<path for hdf libraries> -lmfhdf -ldf -ljpeg -lz
For FORTRAN programs:
f77 -o <your program> <your program>.f \
-I<path for hdf include directory> \
-L<path for hdf libraries> -lmfhdf -ldf -ljpeg -lz
For the various commands needed to link and compile the HDF library on each individual platform, please see Section 4: Compiling the HDF Library.
Writing a short program to read an HDF data set
As mentioned previously, the HDF library and programs can only be run by using either the C or FORTRAN programming language. This choice is up to the user depending on availability and the language he or she feels most familiar and comfortable with.
Make sure all include files are in place
Earlier, it was noted that a series of standard HDF definitions and declarations of file access codes (e.g., read, write) and data types (e.g., integer, character) must be included within the programs that the user writes to utilize the various application routines. In C programs, this is accomplished simply by adding the line #include "hdf.h" at the beginning of the program. This line effectively includes all the needed constants and definitions from the HDF software. When writing FORTRAN programs, this may also be done by adding an include statement that brings in only the needed definitions and declarations (constants.f) from the hdf.h header file. This is done by the following code: include 'constants.f'. However, not all FORTRAN compilers (particularly the older ones) support include statements. In this event, the user must type in and declare all the constants and definitions found in the constants.f file. All declarations, whether made through include statements or not, should appear at the beginning of the program.
Make all variables and parameter declarations
As with any program, the scientist/user should declare and initialize all variables and parameters at the beginning of the program. This includes all variables and arguments that will be used by the HDF commands to follow. The variable and parameter declarations needed for each call will be provided in the example boxes of the individual steps. These statements always belong at the top of the program.
Initialize access to the SD interface and open HDF file
The first real HDF programming step actually accomplishes two things: it initializes the SD interface and it opens the HDF file.
This is done by the following code:
sd_id = sfstart(filename, access_mode) (FORTRAN)
or
sd_id = SDstart(filename, access_mode) ( C )
where
sd_id = HDF file id returned by the sfstart/SDstart command
filename = the name of the existing HDF file (character string)
access_mode = Type of access required for this file
All available options for the access-mode argument are defined in the hdf.h header file mentioned previously and need only to be identified for all C and most FORTRAN operations. All options begin with the prefix "DFACC_" and include:
DFACC_CREATE (File Creation Access)
DFACC_RDONLY (Read Access)
DFACC_RDWR (Read and Write Access)
These definitions are stated in the hdf.h header file.
In the event that the user's FORTRAN compiler can not handle include statements with the header file (hdf.h), the DFACC_ variable must be defined, along with its assigned value, at the beginning of the program. This is done by a code line such as:
parameter (DFACC_RDONLY = 1) (For FORTRAN only)
Example:
FORTRAN:
integer*4 sd_id
integer sfstart
integer DFACC_RDONLY
parameter(DFACC_RDONLY = 1)
sd_id=sfstart("wind.hdf", DFACC_RDONLY)
C:
#include "hdf.h"
#include "mfhdf.h" /* declares the SD routines */
int32 sd_id;
sd_id=SDstart("wind.hdf", DFACC_RDONLY);
Select data set to be read from the HDF file
After initializing the SD interface and opening and assigning a file id (sd_id) to the HDF file to be used, the next step is to select the HDF Scientific Data Set (SDS) which will be read. This is done by the following code:
sds_id = sfselect (sd_id, sds_index) (FORTRAN)
or
sds_id = SDselect (sd_id, sds_index) ( C )
where
sds_id = HDF SDS array id returned by the sfselect/SDselect command
sd_id = the HDF file id created in the previous step (sfstart/SDstart)
sds_index = index number of data set within file
(i.e. 0 = first data set, 1 = second data set, etc.)
Example:
FORTRAN:
integer sds_id, sds_index, sd_id
integer sfselect
c sds_index = 0 represents the first data set
sds_id = sfselect(sd_id,0)
C:
int32 sd_id, sds_id;
/* sds_index = 0 represents the first data set */
sds_id = SDselect(sd_id,0);
Read an existing data set/array
After initializing the SD interface and selecting the SDS to be read, the next step is to actually read the existing HDF data by using the SDreaddata (sfrdata) command. This command reads either all or part of the existing n-dimensional data set (termed a "slab") into a data array in memory with the same number of dimensions. In addition, the size of each dimension of the data slab must be the same as or smaller than the corresponding dimension of the SDS. The SDreaddata/sfrdata command is used in the following fashion:
ret=sfrdata (sds_id, start, stride, edge, data) (FORTRAN)
or
ret=SDreaddata (sds_id, start, stride, edge, data); ( C )
(It should be noted that there are two versions of the read routine in FORTRAN. The sfrdata routine reads numeric scientific data while sfrcdata reads character scientific data.)
where
sds_id = the SDS id returned by using SDcreate/SDselect (sfcreate/sfselect)
start = An array which identifies where in the SDS that the reading will begin
The start array identifies the location or position in the SDS where the reading of the data "slab" will begin. This array must have one element for each dimension (rank) of the SDS, and each element must lie within the corresponding dimension of the SDS array. The start values can be assigned at the top of the program or just before the call to sfrdata (SDreaddata). As an example, to begin reading at the first element of a 2-dimensional SDS the following must be specified:
start(1) = 0 (FORTRAN)
start(2) = 0
or
start[0] = 0; ( C )
start[1] = 0;
If the user wishes to begin reading the data at a location other than the beginning of the data set, say at a first dimension (X) of 15, the declarations would be:
start(1) = 15 (FORTRAN)
start(2) = 0
or
start[0] = 15; ( C )
start[1] = 0;
stride = An array specifying the interval between read values in each dimension
The stride argument specifies, for each dimension, the interval between consecutive values read from the data set. In other words, a stride of 1 reads every array location, while a stride of 2 reads every other location. Like the start array, the stride argument must be defined before calling the sfrdata (SDreaddata) command, either directly before the call or at the top of the program.
If the user does not wish to skip any array locations in a 2-dimensional SDS, the following is to be declared:
stride(1) = 1 (FORTRAN)
stride(2) = 1
or
stride[0] = 1; ( C )
stride[1] = 1;
However, if the user wishes to skip every other X (dimension 1) location, the following would be used:
stride(1) = 2 (FORTRAN)
stride(2) = 1
or
stride[0] = 2; ( C )
stride[1] = 1;
edge = An array defining the number of data values to be read in each dimension
The edge array defines the number of data values/elements that will be read along each dimension of the multi-dimensional SDS array. In plain terms, this argument defines the size of the data slab (all or part of the data) to be read from the SDS in each dimension.
The parameter edge must be specified for each dimension of the data set and SDS array, and can not be larger than the length of the corresponding dimension of the array being read.
Similar to stride and start, the edge argument needs to be defined prior to calling the sfrdata (SDreaddata) command, whether it be at the top of the program or directly before the routine call. The array that will receive the data should also be declared at the beginning of the program and be large enough to hold the entire slab.
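The relationship between start, stride, and edge can be checked with a little arithmetic before the read routine is called. The sketch below is plain C and does not call the HDF library. The helper names slab_last_index and slab_fits are hypothetical, and the sketch assumes that the last element touched along a dimension is start + (edge - 1) * stride, which must fall inside that dimension.

```c
#include <assert.h>

/* Index of the last element touched along one dimension.
   Hypothetical helper, not part of the HDF library. */
long slab_last_index(long start, long stride, long edge)
{
    assert(stride >= 1 && edge >= 1);
    return start + (edge - 1) * stride;
}

/* Returns 1 if the slab stays inside a dimension of length dim_len,
   0 otherwise. Hypothetical helper, not part of the HDF library. */
int slab_fits(long dim_len, long start, long stride, long edge)
{
    if (start < 0 || stride < 1 || edge < 1)
        return 0;
    return slab_last_index(start, stride, edge) < dim_len;
}
```

For a dimension of length 30, slab_fits(30, 0, 1, 30) returns 1, while slab_fits(30, 15, 1, 30) returns 0 because a slab of 30 values starting at index 15 would run past the end of the dimension.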
As an example: Most often, the user will wish to read an HDF file which contains one data set (winddata), which starts from the beginning and does not contain any missing data or blanks.
For a 2-dimensional array of 30 x 30, read and stored into the data array "winddata", this can be done by:
start(1) = 0 (FORTRAN)
start(2) = 0
stride(1) = 1
stride(2) = 1
edge(1) = 30
edge(2) = 30
retn = sfrdata(sds_id, start, stride, edge, winddata)
or
start[0] = 0; ( C )
start[1] = 0;
stride[0] = 1;
stride[1] = 1;
edge[0] = 30;
edge[1] = 30;
retn = SDreaddata(sds_id, start, stride, edge, winddata);
Example:
For reading the entire data set from an HDF file which contains only one 2-D array
FORTRAN:
integer start(2), edge(2), stride(2)
integer XL, YL
integer retn, sfrdata
c Define the location, pattern + size of data to be read
YL = 30
XL = 30
start(1) = 0
start(2) = 0
stride(1) = 1
stride(2) = 1
edge(1) = XL
edge(2) = YL
retn = sfrdata(sds_id, start, stride, edge, winddata)
C:
int32 start[2], stride[2], edge[2], dims[2];
int32 XL, YL;
intn retn;
/* Define the location, pattern + size of data to be read */
YL = 30;
XL = 30;
dims[0] = YL;
dims[1] = XL;
start[0] = 0;
start[1] = 0;
stride[0] = 1;
stride[1] = 1;
edge[0] = dims[0];
edge[1] = dims[1];
retn = SDreaddata(sds_id, start, stride, edge, winddata);
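To make the effect of start, stride, and edge concrete, the following sketch copies a strided 2-dimensional slab out of an ordinary row-major C array. It only emulates the selection in memory and is not the HDF library routine. The function name copy_slab_2d, the use of float data, and the long index type are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Copy a strided 2-D slab out of a row-major source array.
   In-memory emulation only; this is NOT the HDF read routine.
   Returns 0 on success, -1 if the slab would leave the source array. */
int copy_slab_2d(const float *src, const long dims[2],
                 const long start[2], const long stride[2],
                 const long edge[2], float *out)
{
    long i, j;

    assert(src != NULL && out != NULL);

    /* Reject slabs that do not fit inside the source dimensions. */
    for (i = 0; i < 2; i++) {
        if (start[i] < 0 || stride[i] < 1 || edge[i] < 1 ||
            start[i] + (edge[i] - 1) * stride[i] >= dims[i])
            return -1;
    }

    /* out is packed: edge[0] rows of edge[1] values each. */
    for (i = 0; i < edge[0]; i++) {
        for (j = 0; j < edge[1]; j++) {
            long row = start[0] + i * stride[0];
            long col = start[1] + j * stride[1];
            out[i * edge[1] + j] = src[row * dims[1] + col];
        }
    }
    return 0;
}
```

With a 3 x 4 source array, start = {0, 1}, stride = {2, 2}, and edge = {2, 2}, the sketch picks rows 0 and 2 and columns 1 and 3, so the output holds four values.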
Once the data has been read, it can be written to a new non-HDF file (storage) using standard FORTRAN or C output statements. In addition, the user may wish to print out all or parts of the HDF data set to view the data or as a check of the procedure/operation.
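As one possible way to store or inspect the data with standard C output statements, the sketch below writes a row-major array to a text file, one row per line. The function name write_array_text, the float data type, and the "%g" output format are assumptions made for illustration; they are not part of the HDF library.

```c
#include <assert.h>
#include <stdio.h>

/* Write a rows x cols row-major array as text, one row per line,
   values separated by single spaces.
   Returns 0 on success, -1 if the file can not be opened or closed. */
int write_array_text(const char *path, const float *data, int rows, int cols)
{
    FILE *fp;
    int i, j;

    assert(data != NULL && rows > 0 && cols > 0);

    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    for (i = 0; i < rows; i++) {
        for (j = 0; j < cols; j++) {
            if (j > 0)
                fputc(' ', fp);
            fprintf(fp, "%g", data[i * cols + j]);
        }
        fputc('\n', fp);
    }
    return fclose(fp) == 0 ? 0 : -1;
}
```

For the 30 x 30 example, a call such as write_array_text("winddata.txt", &winddata[0][0], 30, 30) would produce 30 lines of 30 values each, assuming winddata is declared as float winddata[30][30].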
Optional operation: Get and Read Metadata
After opening the HDF file using the sfstart/SDstart, the first step is to see if the file or data sets do indeed contain attributes. This is done by using the following code:
attr_index = SDfindattr (sd_id, attr_name); ( C )
attr_index = sffattr (sd_id, attr_name) (FORTRAN)
where
attr_index = valid attribute index returned if attribute exists
sd_id = file identifier
attr_name = name of attribute (e.g., "Contents of file")
If there is an attribute index, the name, data type (num_type), and count (number of values, or number of characters for a character attribute) of the attribute can be obtained:
retn= SDattrinfo(sd_id, attr_index, attr_name, &num_type, &count); (C)
retn= sfgainfo (sd_id, attr_index, attr_name, num_type, count) (FORTRAN)
After completing these operations, the attributes can be read using the following:
retn= SDreadattr (sd_id, attr_index, buffer); ( C )
retn= sfrattr (sd_id, attr_index, buffer) (FORTRAN)
where
buffer is allocated to hold the attribute data
The above steps can also be followed for each data set within the file by getting the data set id (sds_id) of the data, finding a particular attribute (i.e.,"Units") and getting and reading the data.
Example:
FORTRAN:
c attr_name, data_type and count are filled in by sfgainfo
sd_id=sfstart ("wind.hdf", DFACC_RDONLY)
attr_index= sffattr (sd_id,"file_contents")
retn= sfgainfo (sd_id, attr_index, attr_name, data_type, count)
retn= sfrattr (sd_id, attr_index, buffer)
and
sds_id=sfselect (sd_id, 0)
attr_index= sffattr (sds_id,"units")
retn= sfgainfo (sds_id, attr_index, attr_name, data_type, count)
retn= sfrattr (sds_id, attr_index, buffer)
C:
/* attr_name is a char buffer; SDattrinfo fills it, data_type and count */
sd_id=SDstart ("wind.hdf", DFACC_RDONLY);
attr_index= SDfindattr (sd_id,"file_contents");
retn= SDattrinfo (sd_id, attr_index, attr_name, &data_type, &count);
retn= SDreadattr (sd_id, attr_index, buffer);
and
sds_id=SDselect (sd_id, 0);
attr_index= SDfindattr (sds_id,"units");
retn= SDattrinfo (sds_id, attr_index, attr_name, &data_type, &count);
retn= SDreadattr (sds_id, attr_index, buffer);
Terminate/Close access to all files, data sets, and APIs
After reading the data from the SDS, it is necessary to terminate or close access to the data set in order to prevent any possible loss of data. This is done by the following:
retn = sfendacc(sds_id) (FORTRAN)
or
retn = SDendaccess(sds_id) ( C )
In addition, the API called within the program must also be closed to prevent any data loss:
retn = sfend(sd_id) (FORTRAN)
or
retn = SDend(sd_id) ( C )
Example:
FORTRAN:
integer sfendacc, sfend
retn = sfendacc(sds_id)
retn = sfend(sd_id)
C:
retn = SDendaccess(sds_id);
retn = SDend(sd_id);
Execute like a standard FORTRAN or C program.