
A STUDY ON HOW A STATE OFFICE CAN SERVE AS A FOCAL POINT AND CLEARING HOUSE FOR SPATIAL DATA COMMUNICATIONS IN A THREE-TIER GOVERNMENT

NAME: NWANKWO JOEL C
MATRIC NO: 980405049
COURSE: SVY 822
LECTURER: DR OLUSINA

Contents
CHAPTER 1: INTRODUCTION
  Spatial Data Clearinghouse
  What is a Clearinghouse?
  Common Goals for a Clearinghouse
  Accessing Data in a Clearinghouse: Sample Configuration
  What Defines a Site/Node?
  Logical Grouping
  Spatial Data Cleaning and Validation
  Preparing Related Tabular Data
  Language
  Geo-processing
  Spatial Extent
  Field Names
  Study Objectives
  Initiatives at the Federal Level
Chapter 2: The Prototype Spatial Data Clearinghouse
  State Controlled Geospatial Data Clearinghouse
  Content
  Technology
  Management
  Policy
  Describing Data Sets with Standard Metadata
  Search and Retrieval
  The Search Form
  Clearinghouse Implementation & Reaction
  How the Clearinghouse can be constructed
  The Common Gateway Interface
  Other Options for Constructing the Clearinghouse
  Advantages offered by free WAIS-sf
  Advantages offered by a Relational Database
  Isite
  Other Uses of the Clearinghouse
CHAPTER 3: KEY FACTORS IN MANAGEMENT OF A CLEARING HOUSE
  The Number of Data Suppliers
  The Type of Data Accessibility
  The Metadata-Standard Used
  The Number of Spatial Datasets
  The Most Recently Produced Dataset
  The Number of Web References
  The Number of Monthly Visitors
  The Frequency of Web Updates
  The Languages Used
  The Use of Maps for Searching
  Registration-Only Access
  Summary
  Responding to data requests from Clearinghouse users
CONCLUSIONS

CHAPTER 1: INTRODUCTION

Spatial Data Clearinghouse


What is a Clearinghouse?
A clearinghouse can be defined as a network of searchable Internet servers containing structured metadata and, potentially, data. In other words, a clearinghouse can be thought of as being made up of three parts:
1. Metadata, providing a method of cataloguing and search.
2. The Internet, providing the backbone for the transfer of information.
3. Software tools, providing search and storage capability.
Some clearinghouses also provide direct access to the data themselves. In a clearinghouse, multiple sites (or nodes) can be searched and their information presented in a consistent way. These sites often link to the data, enabling direct downloading, or to points of sale and distribution. A clearinghouse allows individual agencies, consortia, or geographically defined communities to band together and promote their available digital spatial data. In this way, a clearinghouse is both a promoter and a product of partnership. It also advertises the data holdings of an organization or partnership.

Common Goals for a Clearinghouse


The clearinghouse provides the framework within which metadata become usable and useful. As a result, the goals and benefits of a clearinghouse often reflect the goals and benefits of metadata. For example, a clearinghouse will help to:
1. Promote the use of standards.
2. Develop free reference implementations and software for public and commercial re-use.
3. Promote a common vocabulary for geospatial data discovery on the Internet.
4. Minimize duplication of effort in spatial data collection and processing.
5. Provide a means to advertise data collection requirements, inventory, and quality.

Accessing Data in a Clearinghouse: Sample Configuration


In a typical configuration, spatial data are discovered through their metadata as follows. A user fills in a search form on a web client, and the query is sent to a special web server (the gateway), which forwards the request to many servers through a distributed search client installed on that web server. This client can connect to multiple Clearinghouse nodes at the same time. The gateway collates the results and returns them to the user as HTML.
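The fan-out-and-collate pattern described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Clearinghouse software: each node is a hypothetical in-memory `Node` object with a `search()` method rather than a real Z39.50 server, matching is a simple case-insensitive substring test, and the gateway queries all nodes concurrently with a thread pool before rendering a single HTML list.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from html import escape


@dataclass
class Node:
    """Hypothetical stand-in for one clearinghouse node (a metadata server)."""
    name: str
    records: list = field(default_factory=list)  # each record: {"title": ..., "abstract": ...}

    def search(self, term):
        """Case-insensitive substring match on title and abstract."""
        term = term.lower()
        return [r for r in self.records
                if term in r["title"].lower() or term in r["abstract"].lower()]


def gateway_search(nodes, term, timeout=10.0):
    """Send the same query to every node concurrently and collect the hits per node."""
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        futures = {node.name: pool.submit(node.search, term) for node in nodes}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=timeout)
            except Exception:  # a node that fails or times out contributes no hits
                results[name] = []
    return results


def render_html(results):
    """Collate hits from all nodes into one consistently formatted HTML list."""
    items = [f"<li>{escape(rec['title'])} <em>({escape(node)})</em></li>"
             for node, hits in sorted(results.items()) for rec in hits]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

For example, `render_html(gateway_search(nodes, "roads"))` returns one `<ul>` listing every matching title together with the node it came from; an error at one node removes only that node's hits rather than aborting the whole search.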

What Defines a Site/Node?


An NSDI Clearinghouse Node includes a public Z39.50 protocol server supporting the FGDC metadata standard (GEO profile). A node can also include a Web server to provide search forms and organizational reference information. Servers may be established institutionally or geographically to optimize the sharing of common computer resources. To be part of the FGDC Geospatial Data Clearinghouse, the data descriptions (metadata) must comply with the FGDC metadata standard, and the server holding the metadata must be registered as a clearinghouse server.

Logical Grouping

Each warehouse is a set of logically related information objects (tabular data and spatial data) that share the same geo-referencing coordinate system. Depending on your data, you may wish or need to define more than one warehouse. For example, if you have data at widely varying scales, you could create a global warehouse (1:5M scale), a national warehouse (1:1M and 1:250K scales), one or more provincial warehouses (1:100K or larger scales), or even site-specific warehouses. Another approach is to create subject-specific warehouses holding data at any scale, for example a natural resource warehouse, a political warehouse, or an economic indicators warehouse. The main criterion for choosing a storage scheme is what the user needs: put similar groups of data together in a logical way. There is no fixed limit on the number of map layers or the amount of data a warehouse can contain; the practical limit is how many map layers and related data sets you want to present to the user, since too much data degrades the usability of the warehouse.

Spatial Data Cleaning and Validation

Check that:

I. Each map layer has a unique name.
II. Each vector map layer is in ESRI shapefile or Arc/Info coverage format.
III. Each map layer is topologically clean: no malformed polygons, slivers, overshoots, or undershoots.
IV. There are no Multipoint map layers; these should all be converted to single-point features.
V. All map layers contain only valid business features, with no unwanted graphic primitives left over from digitizing or topological processing. Features that are not valid business features should be deleted from the map layer, or the layer should be re-processed with larger geoprocessing tolerances.
VI. Map layers that will be rendered by value or as labels have the appropriate value-rendering or label-rendering field(s) appended to the map layer .dbf table.
VII. The selectable map layers have been identified.

For selectable map layers:

VIII. Each map layer .dbf contains a business feature ID field and a business feature Name field. These fields should have intuitive names, such as MUN_ID for a municipal identifier; a good convention is <Map layer name>_ID and <Map layer name>_NAM. If necessary, the Name and ID fields can be the same field.
IX. All tabular data sets (system or user) use the same business feature ID as the map layer.
X. Each business feature ID value is unique across the map layer. Often the business feature ID is determined by the business program or area (e.g., each mine has a unique identifier); if so, that value should be used as the business feature identifier. In all cases the business feature identifier should be the ID field unless it is not unique, in which case define a unique ID field and use the business identifier as the Name field or as an additional attribute in the .dbf.

Finally, each map layer should have a spatial index created on the feature type field and, for selectable map layers, on the business feature ID and business feature Name fields.
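Several of the cleaning and validation checks above can be automated once layer attributes are available. The sketch below is hypothetical: layers are plain Python dicts (the keys `name`, `geometry_type`, `selectable`, `fields`, `rows`, and the optional `id_field`/`name_field` overrides are invented for illustration), and it covers only the attribute-level checks: unique layer names, no Multipoint layers, ID and Name fields on selectable layers, and unique business feature IDs. Topology, file format, and spatial-index checks still need a GIS tool.

```python
from collections import Counter

# Assumed layer shape, e.g.:
# {"name": "MUN", "geometry_type": "Polygon", "selectable": True,
#  "fields": ["MUN_ID", "MUN_NAM"], "rows": [{"MUN_ID": 1, "MUN_NAM": "A"}]}


def validate_layers(layers):
    """Return a list of human-readable problems found in the layer descriptions."""
    problems = []

    # Every map layer must have a unique name.
    for name, count in Counter(layer["name"] for layer in layers).items():
        if count > 1:
            problems.append(f"layer name {name!r} is used {count} times")

    for layer in layers:
        name = layer["name"]
        # Multipoint layers should be converted to single-point features first.
        if layer["geometry_type"].lower() == "multipoint":
            problems.append(f"{name}: multipoint layer, convert to single points")

        if not layer.get("selectable"):
            continue
        # Selectable layers need ID and Name fields; default to the <layer>_ID / <layer>_NAM
        # convention, but allow an override (the two may even be the same field).
        id_field = layer.get("id_field", f"{name}_ID")
        name_field = layer.get("name_field", f"{name}_NAM")
        for required in dict.fromkeys((id_field, name_field)):
            if required not in layer["fields"]:
                problems.append(f"{name}: missing field {required}")
        # Business feature ID values must be unique across the layer.
        if id_field in layer["fields"]:
            counts = Counter(row.get(id_field) for row in layer["rows"])
            duplicates = sorted(str(v) for v, c in counts.items() if c > 1)
            if duplicates:
                problems.append(f"{name}: duplicate {id_field} values {', '.join(duplicates)}")
    return problems
```

An empty list means the sketched checks passed; each string otherwise names the layer and the rule it breaks.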

Preparing Related Tabular Data


Sometimes a map layer's .dbf file contains information that would be better stored in a relational database. Define one or more tabular databases to act as repositories for all tabular data. Tabular data can be organized by subject, time period, language, geography, or some other logical grouping. A database can contain one or more physical tables and one or more SQL views defined on one or more physical tables. Each tabular database should be located under the Warehouse/System Data Sets folder; if necessary, use sub-folders to organize multiple databases (this does not apply to an Oracle database). For each selectable map layer's shapefile .dbf:

I. If the data warrant it, create a system data set from the contents of the .dbf file.
II. Create a clean .dbf file for the selectable map layer in Access, ArcView, or another GIS tool. Delete all unnecessary attributes from the table except the business feature ID field and the business feature Name field. If necessary, rename the business feature ID field following the <map layer name>_ID convention, and ensure that the field's physical type is the same as that of the business identifier in the system data set table.
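The clean-.dbf step can be illustrated with a small hypothetical helper that splits a selectable layer's attribute rows into a clean layer table (ID and Name only, renamed to the <map layer name>_ID and <map layer name>_NAM convention) and a system data set keyed by the same ID. Rows are plain dicts here; reading and writing real .dbf files, and the function and parameter names, are assumptions of this sketch.

```python
def split_layer_table(layer_name, rows, id_field, name_field):
    """Split attribute rows into (clean_rows, system_rows).

    clean_rows keep only the business feature ID and Name, renamed to
    <layer_name>_ID and <layer_name>_NAM; system_rows keep every other
    attribute, keyed by the same ID value so the two tables can be joined.
    """
    # The ID must have one physical type so the join key matches in both tables.
    id_types = {type(row[id_field]) for row in rows}
    if len(id_types) > 1:
        raise TypeError(f"{id_field} mixes types: {sorted(t.__name__ for t in id_types)}")

    new_id, new_name = f"{layer_name}_ID", f"{layer_name}_NAM"
    clean_rows, system_rows = [], []
    for row in rows:
        key = row[id_field]
        clean_rows.append({new_id: key, new_name: row[name_field]})
        extra = {k: v for k, v in row.items() if k not in (id_field, name_field)}
        system_rows.append({new_id: key, **extra})
    return clean_rows, system_rows
```

Because the same ID value (and therefore the same type) is written to both outputs, the clean layer table and the system data set can later be joined on `<layer_name>_ID`.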

Language

Consider whether you will need to translate the data and field names into another language to support clients. If so, you may also want to create a separate warehouse for each language.

Geo-processing

Perform any geo-processing on the map layer that might aid its rendering, such as merging or dissolving features. For example, if you want to apply value rendering to an elevation layer, you may first want to dissolve the features so that each elevation value has a single record rather than multiple records.

Spatial Extent

Consider the spatial extent of the data. For example, if a shapefile's extent reaches beyond the boundaries of the warehouse, clip the shapes to fit.
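Deciding which layers need clipping reduces to a bounding-box test. A minimal sketch, assuming extents are (xmin, ymin, xmax, ymax) tuples in the warehouse coordinate system; the `Extent` type and `layers_needing_clip` function are hypothetical, and the clip itself would still be performed in a GIS tool.

```python
from typing import NamedTuple, Optional


class Extent(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, other: "Extent") -> bool:
        return (self.xmin <= other.xmin and self.ymin <= other.ymin
                and self.xmax >= other.xmax and self.ymax >= other.ymax)

    def intersection(self, other: "Extent") -> Optional["Extent"]:
        box = Extent(max(self.xmin, other.xmin), max(self.ymin, other.ymin),
                     min(self.xmax, other.xmax), min(self.ymax, other.ymax))
        # No positive-area overlap means nothing would be left after clipping.
        return box if box.xmin < box.xmax and box.ymin < box.ymax else None


def layers_needing_clip(warehouse: Extent, layers: dict) -> dict:
    """Map each layer that extends past the warehouse to its extent after clipping."""
    return {name: warehouse.intersection(ext)
            for name, ext in layers.items() if not warehouse.contains(ext)}
```

A result of `None` flags a layer that lies entirely outside the warehouse and probably belongs in a different one.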

Field Names

As noted above, field names in the .dbf and in any related tabular databases should be chosen carefully for clarity. For numeric fields, make sure the unit of measure is stated.

Study Objectives
This report covers a study of how a state office can serve as a focal point and clearinghouse for spatial data for both government agencies and other states. The report is centered on the idea of a data cooperative in which the tiers of government, starting from the state, work together to share data and expertise for individual and mutual needs. The development of the Clearinghouse network has been motivated by a desire to minimize duplicated effort in collecting expensive digital spatial data and to foster cooperative digital data collection. The Clearinghouse and the associated metadata documentation of geographic holdings promote the availability, quality, and requirements of digital data through a searchable on-line system, and provide a primary data dissemination mechanism for traditional and non-traditional spatial data users. Potential participants can come from all levels of government, the private sector, the academic community, utilities, and the not-for-profit sector. The project research activities can involve participants from across the state as well as from the national GIS community. Presentations and articles in regional user group newsletters can be used to inform the GIS community about the project activities and to invite participation. Work can be done with the relevant state and national agencies and departments to improve the ability of any individual or organization to identify relevant data sets, to enhance understanding of the value of geographic information systems, and to explore the need to coordinate the use of GIS and spatial data in New York State. The project team identified a set of deliverables that reflected the proposal submitted by the Department of Environmental Conservation (DEC) and that, to the extent possible, also addressed some of the charges to the NYS Temporary GIS Council, which was getting underway at the same time. The study focused on two areas.
The first was the development of a prototype designed to demonstrate the efficacy of an on-line clearinghouse of metadata and spatial data sets. The Clearinghouse would be available to public, private, academic, and non-profit users as a mechanism for sharing data. The federal metadata standard was adopted for use in the prototype Clearinghouse, and in a parallel project activity it was further analyzed for its usability in supporting data sharing. The second focus was to review the literature and work with the GIS community to gather data on the value of GIS as a decision-making tool, to identify effective approaches to assessing costs and benefits, to identify barriers to sharing and coordinating GIS activities, and to gather information and recommendations from the community on the future coordination of GIS in New York State. Within this larger framework, the project team pursued three specific objectives:

1. Demonstrate the value of GIS by examining exemplary applications and existing evaluation approaches.
2. Identify barriers to sharing spatial data and explore potential solutions for overcoming them.
3. Investigate practical tools to support GIS coordination in New York State.

This paper reports on the project efforts associated with developing a web-based prototype data repository as a mechanism to support GIS coordination in New York State. It presents an overview of the larger environment within which the NYS Clearinghouse exists, the functionality of the Clearinghouse, and its underlying technical structure, followed by implementation alternatives and recommendations. (Please see Appendix A for a related products list.)

Initiatives at the Federal Level


The federal government has created a comprehensive national initiative focused on the value of spatial data, addressing GIS issues through the establishment of the National Spatial Data Infrastructure (NSDI). According to the Federal Geographic Data Committee (FGDC), the NSDI is a set of policies, standards, materials, technologies, people, and procedures, as well as spatial data, that provide a foundation for more efficient collection, management, and use of data. The goal is better access to higher-quality spatial data at lower cost to all. The NSDI requires cooperation and interaction among the various levels of government, the private sector, and academia. The major components of the NSDI are:
1. Standards to facilitate data collection, documentation, access, and transfer.
2. A basic framework of digital spatial data that meets the minimum needs of large numbers of data users over any given geographic area.
3. A clearinghouse to serve, search, query, find, access, and use spatial data.
4. Education and training in the collection, management, and use of spatial data.
While the NSDI is managed at the federal level under the leadership of the FGDC, many state and local governments, as well as academic and private sector organizations, have joined the effort to promote better access to spatial information, to increase communication and cooperation within the GIS community, and to eliminate costly data redundancy. The efforts of all these organizations together form the NSDI. The National Spatial Data Clearinghouse (NSDC), an Internet-based tool to facilitate the search and retrieval of spatial data sets, is a key part of the NSDI. A number of federal agencies and state governments have built spatial data clearinghouses on the Internet that can be accessed from the NSDI home page.

Chapter 2: The Prototype Spatial Data Clearinghouse


State Controlled Geospatial Data Clearinghouse
A clearinghouse is a distributed database server system, accessible through the Internet, that connects data producers, data users, and data managers. Many consider a clearinghouse a useful tool for searching, finding, evaluating, and obtaining any kind of data. The National Geospatial Data Clearinghouse is a clearinghouse that deals with geospatial databases produced and used within the territory of Indonesia, and its development involves all geospatial data stakeholders in Indonesia.

Content
The content of the state-controlled Geospatial Data Clearinghouse is information about all geospatial data sets. Information about data is known as metadata; because these data are geospatial, the metadata are called geospatial metadata. The metadata required by the Geospatial Data Clearinghouse consist of organization metadata, collection metadata, and inventory metadata. Organization metadata contain information about the organization that produces the data. Collection metadata contain information about a collection of data. Inventory metadata describe every item in a collection in detail. To develop metadata for the National Geospatial Data Clearinghouse, a metadata content standard is necessary. The study model has adopted the Federal Geographic Data Committee standard, the Content Standard for Digital Geospatial Metadata (CSDGM), as its metadata content standard. This means that all data producers must use this standard when developing their geospatial metadata. To implement the standard, an application has to be developed to generate the geospatial metadata and present it on the web.
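One way to picture the three metadata levels is as a nested data model. The class and field names below are hypothetical and are not CSDGM element names; the sketch only shows how organization, collection, and inventory metadata nest inside one another.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InventoryItem:
    """Inventory metadata: detail about one item within a collection."""
    item_id: str
    description: str


@dataclass
class Collection:
    """Collection metadata: information about one collection of data."""
    title: str
    theme: str
    items: List[InventoryItem] = field(default_factory=list)


@dataclass
class Organization:
    """Organization metadata: information about the data producer."""
    name: str
    contact: str
    collections: List[Collection] = field(default_factory=list)

    def inventory(self):
        """Flatten to (collection title, item id) pairs, e.g. for building an index."""
        return [(c.title, i.item_id) for c in self.collections for i in c.items]
```

Keeping the levels separate lets an organization's contact details be recorded once while its collections and inventory items grow independently.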

Technology
The National Geospatial Data Clearinghouse consists of a number of interconnected metadata servers that form a network. These metadata servers belong to data producer organizations and are called network nodes. Within this network, an additional server, called the metadata gateway server, operates as a network gateway that allows connections to other networks. The network is built on the TCP/IP protocol and connected to the Internet, so that metadata servers located in all parts of Indonesia can join the network and all metadata users can reach it from the Internet. A metadata server runs applications designed to generate a metadata database and make it available on the Internet. The metadata database is generated using the GEO Profile of the Z39.50 protocol as the technical specification of the metadata standard. The metadata stored on a server are collection metadata, inventory metadata, or both. Each metadata server should be developed and maintained by the institution that produces the geospatial data.

The metadata gateway server contains the registry of all node servers belonging to the clearinghouse network, together with information about the institutions that have registered metadata servers. The gateway server also hosts Internet-based applications designed to search and retrieve metadata stored in the metadata databases on the node servers; this search engine is built on the Z39.50 protocol. Users therefore access metadata on the node servers through the metadata gateway server. In line with this concept, Indonesia has been developing an application called the DDSN to implement the metadata gateway. The National Geospatial Data Clearinghouse adopts a distributed database system: every data producer develops the geospatial metadata databases under its responsibility, and each metadata database is collected, stored, and maintained by that producer. Each producer must ensure that its geospatial metadata comply with the metadata content standard adopted by the National Geospatial Data Clearinghouse. The Z39.50 protocol allows the National Geospatial Data Clearinghouse to be promoted in the world metadata community by linking it to regional and global clearinghouses, and it also makes the distributed database system possible. In this case the Geospatial Data Clearinghouse will be linked to the regional network, called the Regional Spatial Data Infrastructure (RSDI), and to the global network, called the National Spatial Data Infrastructure (NSDI), of which all other states are members. The Geospatial Data Clearinghouse then becomes an integrated national node of the APSDI and GSDI clearinghouses.
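The gateway's node registry can be sketched as a small class. Assumptions: nodes are keyed by host name, the default port is 210 (the IANA-registered Z39.50 port), and a node is accepted only if it declares the adopted standard, identified here by the CSDGM version 2 designation FGDC-STD-001-1998. The class names are hypothetical, and the Z39.50 search itself is not implemented.

```python
from dataclasses import dataclass
from typing import Dict, List

ADOPTED_STANDARD = "FGDC-STD-001-1998"  # CSDGM version 2 (assumed adopted version)


@dataclass(frozen=True)
class NodeServer:
    institution: str
    host: str
    port: int = 210  # IANA-registered Z39.50 port
    metadata_standard: str = ADOPTED_STANDARD


class GatewayRegistry:
    """Registry kept by the metadata gateway server (hypothetical sketch)."""

    def __init__(self) -> None:
        self._nodes: Dict[str, NodeServer] = {}

    def register(self, node: NodeServer) -> None:
        # Only nodes whose metadata follow the adopted content standard may join.
        if node.metadata_standard != ADOPTED_STANDARD:
            raise ValueError(f"{node.institution}: metadata must follow {ADOPTED_STANDARD}")
        self._nodes[node.host] = node  # re-registering a host replaces its entry

    def deregister(self, host: str) -> None:
        self._nodes.pop(host, None)

    def targets(self) -> List[str]:
        """Addresses the gateway's search engine would query, as host:port."""
        return sorted(f"{n.host}:{n.port}" for n in self._nodes.values())
```

Keeping the registry at the gateway means users never need to know individual node addresses; the search engine simply iterates over `targets()`.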

Management
In principle, the Geospatial Data Clearinghouse is managed by its stakeholders, namely all geospatial data producers and users in Indonesia. Management is carried out through three organizations that the stakeholders established for this purpose: the Permanent Committee, the National Metadata Gateway, and the Geospatial Metadata Centers. The Permanent Committee directs, administers, controls, and monitors the Geospatial Data Clearinghouse. The National Metadata Gateway is an inter-institutional body established to develop, maintain, and operate the metadata gateway. The Geospatial Metadata Centers are units within data producer institutions, established to develop, maintain, and operate a metadata server within each institution.

Policy
The Geospatial Data Clearinghouse will serve as the metadata gateway for all geospatial metadata produced in the country, which means that metadata users can access national geospatial metadata only through this focal clearinghouse. To improve service to users, the National Geospatial Data Clearinghouse will be linked to the RSDI and NSDI clearinghouses and become an integrated node of those clearinghouses.

The Spatial Data Clearinghouse will be developed in concert with the federal initiatives to facilitate the exchange of spatial data among members of the national GIS community. It will be available on the Internet and is designed to increase the value of spatial data through sharing. As part of this national effort, the Clearinghouse gives potential users of spatial data a way to determine whether the data sets they need are already available or under development. Improving access to and sharing of spatial data in this way has the potential to lower costs and greatly increase the use of these data throughout the nation. The Clearinghouse is unique among state clearinghouses for two primary reasons. First, it is a statewide resource that supports the sharing of spatial data across all sectors: state agencies, local governments, non-profits, academia, utilities, and the private sector may all use it both to share their own spatial data and to identify useful spatial data. Second, it is the only state clearinghouse that offers spatially searchable standard metadata for all three tiers of government. The primary purpose of the Clearinghouse is to allow producers of geographic data to describe the data sets they have available and to allow users of geographic information systems to find the data sets they need. Users may search for available data and review detailed descriptions of it. Once a data set of interest is identified, the system provides information on how to obtain the data files; for some data sets, immediate on-line transfer of the files is available using the standard File Transfer Protocol (FTP).
The URL for the prototype Spatial Data Clearinghouse is: http://www.xxxxx.html. Figure 1 shows the World Wide Web home page for the Clearinghouse.
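Where a metadata record's distribution section gives an ftp:// link, the transfer can be scripted with Python's standard `ftplib` module. A sketch, assuming anonymous login is accepted and the URL has the form ftp://host/path/to/file; the function names are hypothetical, and the FTP class is a parameter so the logic can be exercised without a network connection.

```python
import ftplib
import posixpath
from urllib.parse import urlparse


def local_filename(url):
    """File name to save under, taken from the last part of the URL path."""
    return posixpath.basename(urlparse(url).path)


def fetch_dataset(url, sink, ftp_class=ftplib.FTP):
    """Download url over FTP, passing each received block of bytes to sink."""
    parts = urlparse(url)
    if parts.scheme != "ftp" or not parts.hostname:
        raise ValueError(f"not an ftp:// URL: {url}")
    ftp = ftp_class(parts.hostname)  # ftplib.FTP(host) connects immediately
    try:
        ftp.login()  # no arguments: anonymous login
        ftp.retrbinary(f"RETR {parts.path}", sink)
    finally:
        ftp.quit()
```

In use, one would open a local file named by `local_filename(url)` in binary mode and pass its `write` method as `sink`, so the data set is written to disk block by block.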

Describing Data Sets with Standard Metadata


Standard metadata (information describing data) are an essential prerequisite for effective information sharing. Contributors to the Clearinghouse described their data sets using the FGDC Content Standard for Digital Geospatial Metadata. The FGDC standard specifies what a metadata record for a spatial data set should contain, including who produced the data, the geographic area covered, the data set category or theme, scale, accuracy information, and instructions for obtaining the data set. Although the FGDC standard may be further refined for state use, results from the prototype demonstrate that it could be used as-is in a production environment until such refinement takes place. Standard metadata are critical to collaboration and the exchange of data sets. A standard data description provides three components necessary for a successful spatial data clearinghouse:

I. A common language. Standard metadata give GIS users a common language for describing data sets. Since one goal is to facilitate the exchange of data sets among a large number of independent organizations, such a common language is needed.

II. Thorough descriptions. Standard metadata help ensure that data sets are described thoroughly. The metadata template identifies mandatory and suggested fields, so it serves as a guideline for those describing the data sets. The metadata can also serve as an important tool for an organization's internal documentation as well as for data set exchange.

III. Automation. Once metadata are standardized, software can be developed for creating, collecting, and searching the metadata on the Internet.
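To make the idea of a standard record concrete, the sketch below builds a fragment of a CSDGM-style record with `xml.etree.ElementTree`, using CSDGM short element names (`idinfo`, `citeinfo`, `spdom`, `bounding`, `keywords`, `metainfo`, and so on). It is deliberately partial: a record that validates against the FGDC standard needs many more mandatory elements (for example time period and status), and all values passed in are illustrative.

```python
import xml.etree.ElementTree as ET


def sub(parent, tag, text=None):
    """Append a child element, optionally with text content."""
    el = ET.SubElement(parent, tag)
    if text is not None:
        el.text = str(text)
    return el


def minimal_record(title, originator, pubdate, abstract, bbox, themes):
    """bbox = (west, east, north, south) in decimal degrees, in CSDGM element order."""
    md = ET.Element("metadata")
    idinfo = sub(md, "idinfo")
    citeinfo = sub(sub(idinfo, "citation"), "citeinfo")
    sub(citeinfo, "origin", originator)  # who produced the data
    sub(citeinfo, "pubdate", pubdate)
    sub(citeinfo, "title", title)
    sub(sub(idinfo, "descript"), "abstract", abstract)
    bounding = sub(sub(idinfo, "spdom"), "bounding")  # geographic area covered
    for tag, value in zip(("westbc", "eastbc", "northbc", "southbc"), bbox):
        sub(bounding, tag, value)
    theme = sub(sub(idinfo, "keywords"), "theme")  # data set category or theme
    sub(theme, "themekt", "None")  # no formal thesaurus used
    for keyword in themes:
        sub(theme, "themekey", keyword)
    metainfo = sub(md, "metainfo")
    sub(metainfo, "metstdn", "FGDC Content Standard for Digital Geospatial Metadata")
    sub(metainfo, "metstdv", "FGDC-STD-001-1998")
    return ET.tostring(md, encoding="unicode")
```

Because every contributor fills the same elements, a search engine can compare, for example, the `bounding` coordinates of records from different organizations without any per-source translation.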

Search and Retrieval


The Clearinghouse offers two methods for identifying data sets. The first is a directory that lists all available data sets, organized by category and distributor; users can browse it to find the data sets they need. The second is a search form in which users enter specific criteria. Both the directory listing and the search results list link to the full metadata record. When a data set is of interest, its full metadata document can be reviewed for a fuller understanding of its properties. In some cases, the metadata document also contains an image showing the geographic region and features that the data set covers. The distribution section of the metadata contains instructions for obtaining the data set; these instructions, supplied by the metadata provider, may include on-line file transfer, electronic order forms, or ordering by phone or mail.
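The browseable directory amounts to grouping records by category and then by distributor. A hypothetical sketch, assuming each record is a dict with `category`, `distributor`, `title`, and `metadata_url` keys (these names and the URL paths are invented for illustration):

```python
from collections import defaultdict


def build_directory(records):
    """Return {category: {distributor: [(title, metadata_url), ...]}}, all sorted."""
    tree = defaultdict(lambda: defaultdict(list))
    for r in records:
        tree[r["category"]][r["distributor"]].append((r["title"], r["metadata_url"]))
    return {cat: {dist: sorted(entries) for dist, entries in sorted(dists.items())}
            for cat, dists in sorted(tree.items())}


def directory_text(directory):
    """Indented plain-text listing; each entry points at its full metadata record."""
    lines = []
    for cat, dists in directory.items():
        lines.append(cat)
        for dist, entries in dists.items():
            lines.append(f"  {dist}")
            lines.extend(f"    {title} -> {url}" for title, url in entries)
    return "\n".join(lines)
```

An HTML version would emit the same nesting as lists of links, with each title linking to its full metadata document.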

The Search Form


The search form is an HTML forms document which allows users to enter their criteria for locating data sets. Three options are available for searching:

I. Geographic Area (mandatory): The geographic area can be selected by clicking on an area identified in the displayed list or by entering the latitude and longitude coordinates for the bounding rectangle of the desired area. The search will find metadata records whose bounding coordinates overlap any part of the area chosen. When the user selects an area from the displayed list (instead of supplying actual coordinates), the software automatically supplies the corresponding coordinates for the area and then conducts the search.

II. Data Set Category or Theme (optional): To narrow the search further, the user can select one or more of eighteen categories. The categories represent broad data set themes, such as Cadastral, Demographics, Environment, or Infrastructure. A Geo Thesaurus (Figure 3) has been constructed to provide further help in choosing the appropriate category. When the Thesaurus option is clicked, the screen displays a table of common terms and identifies which categories may be useful for finding related information. For example, the term 'tax map' points to Cadastral, while 'roads' points to Infrastructure.

III. General Query (optional): Words listed in this input box are matched against the entire metadata document. Desired words can simply be listed in the input box, separated by spaces, or more complex Boolean searches can be constructed using AND, OR, and parentheses. Right-hand truncation is also permitted with the use of an asterisk, so that sch* will find all documents which contain words beginning with "sch".
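The three search options can be combined in a short filter. The following is a minimal sketch, not the prototype's freeWAIS-sf implementation. It assumes each record is a dictionary with hypothetical `title`, `category`, `bbox`, and `text` keys. It treats every listed word as required and does not parse AND, OR, or parentheses.

```python
import re
from dataclasses import dataclass


@dataclass
class BBox:
    # Positional order: west, east, south, north (decimal degrees).
    west: float
    east: float
    south: float
    north: float

    def overlaps(self, other: "BBox") -> bool:
        # Rectangles overlap unless one lies wholly to one side of the other.
        # Touching edges count as overlap; boxes crossing the 180th meridian are not handled.
        return not (self.east < other.west or other.east < self.west
                    or self.north < other.south or other.north < self.south)


def term_matches(term, words):
    """Match one query word, honouring right-hand truncation with '*'."""
    if term.endswith("*"):
        prefix = term[:-1].lower()
        return any(w.startswith(prefix) for w in words)
    return term.lower() in words


def search(records, area, categories=None, query=""):
    """Return titles of records that overlap `area`, match a category, and contain all query words."""
    terms = query.split()
    hits = []
    for rec in records:
        if not rec["bbox"].overlaps(area):          # option I: mandatory area test
            continue
        if categories and rec["category"] not in categories:  # option II: optional theme filter
            continue
        words = set(re.findall(r"[a-z0-9]+", rec["text"].lower()))
        if all(term_matches(t, words) for t in terms):         # option III: optional words
            hits.append(rec["title"])
    return hits
```

For example, with an area covering part of the Hudson Valley, `search(records, area, query="sch*")` would return every overlapping record whose text contains a word such as "school" or "schools".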

Clearinghouse Implementation & Reaction


The Clearinghouse can go on-line in, with the relevant organizations providing metadata for nearly all data sets. The Federal Geographic Data Committee should create a link to the Clearinghouse site from the National Spatial Data Clearinghouse under its State and University listings. The Clearinghouse should also be registered with general Internet search sites such as Yahoo and Google. The number of visits made to the Clearinghouse and the rate of growth of hits per business day should be tracked. User reaction has been uniformly positive. An on-line survey gathering both general user information and specific feedback on the Clearinghouse was included in the prototype. The survey questions were designed to collect information about users and about their reaction to the Clearinghouse. User questions sought information about each visitor's level of Internet and GIS experience, search methods, and primary interest in visiting the Clearinghouse. Other questions asked how the search form, instructions, and results listings could be improved and what additional information could be included in the Clearinghouse. Few suggestions were made for improvement, and those made were primarily confined to a desire for more universal use of existing Clearinghouse features, such as automatic downloading of files through ftp and use of geographic map images to describe the data sets. Experience with the Clearinghouse has motivated some new efforts to provide metadata and spatial data. The Department of Environmental Conservation was an active participant in the design and implementation of the Clearinghouse and is now in the process of developing its own site for serving spatial data. Other positive reactions came from local governments. After attending a demonstration of the Clearinghouse, the Orange County Water Authority provided metadata for six data sets. Orange County Water Authority staff then demonstrated the Clearinghouse to other county officials to build support for providing full access to the County's data resources via a clearinghouse mechanism. As a result, Orange County is implementing a local server connected to the NYS Spatial Data Clearinghouse to provide no-cost on-line access to metadata and spatial data sets. Rockland County is considering a similar program. Despite these early successes, much remains to be learned about repository management and about the willingness and ability to use a web-based repository to support data sharing. As the prototype Clearinghouse contains fewer than fifty metadata descriptions, the full range and volume of use could not be tested. These and other user acceptance and performance issues will need further analysis as the Clearinghouse develops and grows.

How the Clearinghouse can be constructed


The metadata records should be collected using a simple metadata template that contributors can update using any word processor. The completed records were placed in a directory on the CTG server (a Digital Equipment Corporation DECstation 5000 with a RISC processor running ULTRIX 4.2) and were indexed using a software tool called freeWAIS-sf. FreeWAIS-sf was developed by . FreeWAIS-sf is an extension of freeWAIS, software provided by the Clearinghouse for Networked Information Discovery and Retrieval. FreeWAIS-sf provides two general types of indexing. Global indexing allows every word in the document to be indexed so that the entire text can be searched for a given word or query string. Local indexing allows specific fields to be defined within the document for more precise searching. Both methods were employed by the Clearinghouse, as its search form allows queries against specific fields as well as against the entire metadata document.
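The distinction between global and local (field) indexing can be illustrated with a toy inverted index. This is a minimal Python sketch of the idea only; it does not reproduce freeWAIS-sf's file formats or commands. The class and method names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class MetadataIndex:
    """Toy inverted index with a global (whole-document) index and per-field indexes."""

    def __init__(self):
        self.global_index = defaultdict(set)                      # word -> doc ids
        self.field_index = defaultdict(lambda: defaultdict(set))  # field -> word -> doc ids

    def add(self, doc_id, fields):
        # Every word goes into the global index; it is also recorded under its field.
        for name, value in fields.items():
            for word in tokenize(value):
                self.global_index[word].add(doc_id)
                self.field_index[name][word].add(doc_id)

    def search(self, word, field=None):
        """Global search when `field` is None, otherwise search a single field."""
        word = word.lower()
        if field is None:
            return set(self.global_index.get(word, set()))
        return set(self.field_index.get(field, {}).get(word, set()))
```

Under this design, a search for "roads" across the whole document finds any record that mentions roads anywhere. Restricting the search to the title field returns only records whose title contains the word, which is the more precise result that local indexing provides.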

The Common Gateway Interface


The search form interfaces with the http (web) server (NCSA http version 1.3) through the use of a common gateway interface (CGI). CGIs allow HTML forms to pass information from the form to the server and to indicate which program on the server should be executed when the data transmission takes place. This mechanism allows the search criteria to be sent to the server and allows programs to be executed which query the WAIS index for matching documents. The CGI can be built using Perl script programs as a foundation. The programs were modified to suit specific needs of the Clearinghouse. Complete details on the construction of the search mechanism for the Clearinghouse, including sample documents and the program code.
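The prototype's CGI programs were written in Perl. Purely as an illustration of the mechanism, the sketch below uses Python to show how a CGI program receives the form fields of a GET request through the `QUERY_STRING` environment variable and turns them into search criteria. The form field names (`west`, `east`, `south`, `north`, `category`, `query`) are hypothetical.

```python
import os
from urllib.parse import parse_qs


def read_criteria(query_string):
    """Turn URL-encoded form data into search criteria; raises ValueError if the area is missing."""
    form = parse_qs(query_string)

    def first(name, default=""):
        return form.get(name, [default])[0]

    return {
        # The geographic area is mandatory; float("") raises ValueError when a field is absent.
        "bbox": tuple(float(first(k)) for k in ("west", "east", "south", "north")),
        "categories": form.get("category", []),  # a multi-select arrives as repeated keys
        "query": first("query"),
    }


if __name__ == "__main__":
    # The web server places a GET request's form data in QUERY_STRING.
    try:
        body = repr(read_criteria(os.environ.get("QUERY_STRING", "")))
    except ValueError as exc:
        body = f"Invalid request: {exc}"
    # A CGI response starts with a header block, then a blank line, then the body.
    print("Content-Type: text/plain")
    print()
    print(body)
```

In a full script, the criteria would then be passed to the index query, and the matching records would be written back as an HTML results list.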

Other Options for Constructing the Clearinghouse


Through the use of common gateway interfaces, the same forms and functionality can be preserved on the client side of an Internet application, while a variety of options can be employed on the back end for carrying out the search and retrieval. The method employed in the prototype assumes that the metadata will be defined in an HTML or text document, with each field preceded by its tag (field name). FreeWAIS-sf is able to identify specific fields by recognizing these tags. A number of commercial products are available to perform the document searches conducted at the server. These alternatives to freeWAIS-sf are primarily relational databases, which allow you to store metadata and then search the database using a standard query language. The advantages of each approach are outlined below.
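Identifying fields by their tags can be sketched in a few lines. This example assumes a plain-text layout in which each field starts on a new line as `Tag: value` and indented lines continue the previous field. The tag names in the example are illustrative, and this is not freeWAIS-sf's own field-definition format.

```python
import re

# A tag is a run of letters/underscores at the start of a line, followed by a colon.
TAG_LINE = re.compile(r"^\s*([A-Za-z_]+):\s*(.*)$")


def parse_tagged(text):
    """Split a tagged metadata document into a {tag: value} dictionary."""
    fields = {}
    current = None
    for line in text.splitlines():
        m = TAG_LINE.match(line)
        if m:
            current = m.group(1)
            fields[current] = m.group(2).strip()
        elif current and line.strip():
            # Continuation line: append it to the value of the most recent tag.
            fields[current] = (fields[current] + " " + line.strip()).strip()
    return fields


if __name__ == "__main__":
    doc = (
        "Title: Road centerlines\n"
        "Originator: Example State DOT\n"
        "Abstract: Centerlines for\n"
        "  state highways.\n"
        "West_Bounding_Coordinate: -79.8\n"
    )
    print(parse_tagged(doc))
```

This simple scheme has two limitations. A repeated tag overwrites the earlier value, and a continuation line that happens to begin with "Word:" would be misread as a new tag. These limitations are one reason the text lists "does not require that field tags be replicated in each document" as an advantage of a relational database.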

Advantages offered by freeWAIS-sf:


I. Incorporates the Z39.50 standard (http://www.research.att.com/~wald/z3950.html), which is widely accepted by the library community, is mandated by the Government Information Locator Service (http://vinca.cnidr.org/protocols/z3950/iitf_report.html), and provides a common protocol for searching the databases of multiple servers.

II. Allows both field searching and full-document searching. Most relational databases allow only field searching, although new products are emerging which will allow full-text searching within a relational database.

III. Is freely available.

IV. Is easy to learn and implement.

V. Can search records in many different formats, such as text, HTML, GIF, etc.

Advantages offered by a Relational Database


I. Allows the metadata to be constructed using a relational model rather than the hierarchical model currently found in the FGDC standard. This model may more accurately reflect information about the underlying data sets, and it allows common sections of the metadata to be created only once for multiple data sets, reducing the effort required for metadata creation and maintenance.

II. Does not require that field tags be replicated in each document.

III. Allows use of the Structured Query Language (SQL), a widely accepted query language with more sophisticated options than Boolean searching alone.

IV. Has attracted greater commercial investment, so users can expect better documentation, more sophisticated tools, easier interfaces, and a variety of support options.
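The relational advantages above can be shown with a small schema. This minimal sketch uses SQLite (Python's `sqlite3`) as a stand-in for a commercial relational database. It uses hypothetical table and column names and stores a contact section once for several data sets. It then runs a theme-plus-area query in SQL.

```python
import sqlite3

SCHEMA = """
CREATE TABLE contact (
    contact_id   INTEGER PRIMARY KEY,
    organization TEXT NOT NULL,
    phone        TEXT
);
CREATE TABLE dataset (
    dataset_id INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    theme      TEXT NOT NULL,
    west REAL, east REAL, south REAL, north REAL,
    contact_id INTEGER NOT NULL REFERENCES contact(contact_id)
);
"""


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # One contact row is shared by every data set the organization distributes.
    conn.execute("INSERT INTO contact VALUES (1, 'Example County GIS Office', '555-0100')")
    conn.executemany(
        "INSERT INTO dataset (title, theme, west, east, south, north, contact_id) "
        "VALUES (?, ?, ?, ?, ?, ?, 1)",
        [
            ("Tax parcels", "Cadastral", -74.8, -73.9, 41.2, 41.6),
            ("Road centerlines", "Infrastructure", -74.8, -73.9, 41.2, 41.6),
        ],
    )
    return conn


def find(conn, theme, west, east, south, north):
    """Field search plus bounding-box overlap, expressed as one SQL query."""
    sql = """
        SELECT d.title, c.organization
        FROM dataset AS d JOIN contact AS c USING (contact_id)
        WHERE d.theme = ?
          AND d.east >= ? AND d.west <= ?
          AND d.north >= ? AND d.south <= ?
        ORDER BY d.title
    """
    return conn.execute(sql, (theme, west, east, south, north)).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(find(db, "Cadastral", -74.0, -73.5, 41.0, 41.3))
```

If the organization's phone number changes, one row is updated rather than every metadata document. This is the maintenance saving the first advantage describes.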

Isite
The Clearinghouse for Networked Information Discovery and Retrieval (CNIDR) has developed software called Isite (http://vinca.cnidr.org/software/Isite/Isite.html), designed to incorporate some of the benefits offered by both freeWAIS-sf and relational databases. Isite uses the Z39.50 protocol for communications between the client and server. On the server side, it includes its own text indexer and search engine. Alternatively, it includes an application program interface (API) which can be used to perform commercial database searches.

Spatial Data Clearinghouse Technical Information Environment

The Spatial Data Clearinghouse exists as part of a larger spatial data clearinghouse environment. The National Spatial Data Clearinghouse (NSDC), an initiative of the Federal Geographic Data Committee, is a network of virtual and physical repositories of spatial data available over the Internet.

Hardware

The Clearinghouse can be implemented within CTG on a DECstation 5000 running Ultrix 4.2, with 10 gigabytes of storage and 8 gigabytes of memory.

Contents

Forty-nine spatial data sets are described in the Clearinghouse with metadata which conform to the Federal Standard for Digital Geospatial Metadata. The Federal Metadata Standard was adopted as the standard for the Clearinghouse. Organizations interested in submitting data were provided with a packet containing information about the standard and a template of the standard on disk with embedded HTML code. The template was provided to support the collection of the metadata. In addition to providing metadata, some users provided both the data set and an image of the area represented by the data set.

Structure

The figure on the following page represents the structure of the prototype Clearinghouse.

Other Uses of the Clearinghouse


The Clearinghouse could serve as a "one-stop shopping" center for GIS-related information. Specifically, in addition to providing information on existing spatial data, it could be used to identify the data set needs of organizations looking for collaborators; provide information concerning GIS user groups; announce upcoming events; disseminate technical papers; disseminate vendor information; and provide links to other clearinghouses.

Recommendations for Implementation

The prototype Clearinghouse is a central site for storing metadata and some spatial data sets. A centrally managed site can offer savings, but we also believe that data management must remain in the hands of the data owners. Although it may be appropriate for a central site to contain metadata, the data sets themselves do not need to be stored at the same site. The size and quantity of the data sets may preclude central site storage. Experience with the existing Spatial Data Clearinghouse yields the following recommendations.

Fully automate as many aspects of the process as practical, including metadata creation, editing, and collection; updating of all Clearinghouse web pages as metadata is added or changed; and data set transfer through the use of ftp and online order forms.

Centralize the home page and technical support for the Clearinghouse. The Clearinghouse application and its technical administration can be managed most efficiently from a single site, avoiding the need for duplicative development and maintenance by multiple organizations. A central site also ensures a uniform presentation of the metadata and provides a principal forum for information exchange.

Data set owners, who know the data best, should create and manage their own metadata. Metadata creation, update, and deletion need to be the responsibility of each independent organization. The development of automated tools will make the independent management of metadata possible.

Organizations should house their own data sets and create their own ftp sites. The metadata can contain a link to the corresponding ftp site so that a simple click of the transfer option initiates the download of the file. Because the data location makes no difference to the user, several organizations can work together to form a cooperative ftp site.

Clearinghouse efforts should be leveraged by incorporating additional information and functionality for the GIS community. Making the site a one-stop shopping center for GIS-related activities in the nation will improve the utility of the Clearinghouse, increase visits to the site, and ensure more widespread participation in its ongoing development.

CHAPTER 3 KEY FACTORS IN MANAGEMENT OF A CLEARING HOUSE


The Number of Data Suppliers
This characteristic describes the number and diversity of data suppliers. The power of a clearinghouse is that several data suppliers can disseminate their products via this facility. The average number of data suppliers participating in a clearinghouse is high; however, there is great variety between the clearinghouses. For Austria, the Czech Republic, Slovenia, and the U.S., the number of data suppliers exceeds 100. In Canada, there are 1758 data suppliers. This contrasts with the 35 clearinghouses that have fewer than 10 suppliers (notably in South America and Asia, with their powerful national mapping agencies).

The Type of Data Accessibility


This characteristic describes the presentation of the content. Not all existing clearinghouses give access to data or metadata. For example, in some cases the clearinghouse presents only a simple (not standardized) description of the datasets. For this reason, three classes of accessibility are distinguished: 1) abstract (simple/short description about the databases without using any formal meta-data description); 2) metadata; and 3) data (+metadata). In most clearinghouses, the user has access to metadata (Table 3). However, in eight countries (Australia, Canada, Dominica, Finland, Malaysia, Portugal, Singapore, and the U.S.), an option exists to access the data itself.

The Metadata-Standard Used


This characteristic describes the metadata standard used. With the diverse sources from which spatial databases are built, it is extremely important to maintain information about the content, quality, source, and lineage of the data. A number of standards organizations have developed (or are in the process of developing) standards for storing and maintaining metadata. The most mature of these have been developed by the Federal Geographic Data Committee (FGDC) (1995) and the European Committee for Standardization (CEN/ 287 1996). These metadata standards form the backbone of national clearinghouses. The FGDC metadata standard is the most applied and widely distributed standard around the world (Table 4). The CEN standard is applied only in Europe. Recently, the International Organization for Standardization has created the ISO19115 standard (ISO/TC-211 2001). Currently, 10 countries have started a project to apply this last-mentioned standard for their national clearinghouse.

The Number of Spatial Datasets


One way to quantify the content of a clearinghouse is the number of datasets, although this number does not reflect the importance of the accessible datasets to the economic and social development of the country. The variety in the number of datasets is enormous (Table 5). For example, the U.S. federal clearinghouse gives access to almost 100,000 datasets (December 6, 2001), while the average for the 24 European clearinghouses is 440. The difference in the total number of accessible datasets between the U.S. and Europe is easily noticed (100,000 vs. 10,000). In total, the clearinghouses together describe 170,000 spatial datasets. Ten clearinghouses have more than 1000 datasets described (Australia, Austria, Canada, the Czech Republic, Japan, Mexico, South Africa, Switzerland, Uruguay, and the U.S.).
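The U.S.–Europe comparison can be checked against the figures quoted above. This uses only those numbers: 24 European clearinghouses averaging 440 datasets, and almost 100,000 U.S. datasets.

```latex
\[
24 \times 440 = 10\,560 \approx 10^{4},
\qquad
\frac{100\,000}{10\,560} \approx 9.5
\]
```

The result is consistent with the rounded "10,000" European total and with a U.S. count roughly ten times as large.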

The Most Recently Produced Dataset


This characteristic describes how up to date the content of the clearinghouse is and how that content is managed. It is the difference in months between the date of the Web survey and the date of the most recently produced dataset described in the national clearinghouse. On average, this difference exceeds 2 years (Table 6). Twenty-two national clearinghouses describe spatial datasets produced within 1 year of the Web survey. However, for 12 national clearinghouses, this duration is longer than 3 years (mainly countries located in South America or Asia).
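The indicator is a whole-month difference between two dates. A minimal sketch follows. It assumes partial months are dropped, which is a convention the source does not specify, and the dates in the usage line are hypothetical.

```python
from datetime import date


def months_between(later: date, earlier: date) -> int:
    """Whole calendar months from `earlier` to `later`; an incomplete final month is dropped."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the last month has not been completed yet
    return months


if __name__ == "__main__":
    # Hypothetical survey date and dataset production date.
    print(months_between(date(2001, 12, 6), date(1999, 10, 15)))  # 25
```

A result above 24 would place the clearinghouse beyond the 2-year average reported in Table 6. A result above 36 would place it among the clearinghouses whose newest dataset is more than 3 years old.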

The Number of Web References


This number can be interpreted as a measure of the popularity (use) of the clearinghouse site within the Internet. The Free Link Popularity Service (http://www.linkpopularity.com, The PC Edge, Inc.) is used. It measures the number of links to the home page of the national clearinghouse that are found by two search engines, AltaVista and Google. High link popularity can dramatically increase traffic to a Web site. The link popularity of national clearinghouses is high, which means that they are an excellent source of consistent and targeted Web traffic. However, the variety is enormous (Table 7). The number of Web references does not differ much between regions, so the popularity of national clearinghouses can be considered universal. The following national clearinghouses have high link popularity: Australia, Canada, Colombia, Finland, New Zealand, Norway, the U.S., and Venezuela.

The Number of Monthly Visitors


This characteristic describes the use of national clearinghouses for accessing spatial datasets. It is measured as the number of visitors to the home page of the clearinghouse. The average number of visits to this page exceeds 5000 per month. The variety between implementations is high, due to some particularly popular clearinghouses (Table 8). The following national clearinghouses are visited the most: Canada, Finland, Portugal, Slovenia, and the U.S. Portugal's clearinghouse has approximately 60,000 visits per month.

The Frequency of Web Updates


This characteristic describes the management of the content in the clearinghouse. One possible indication of a well-managed clearinghouse is the frequency with which its information is updated. The average number of days since the last update is high for the whole population of clearinghouses, due to instances of poor management in Europe and Asia, where some intervals exceed 100 days (Table 9). The variety between clearinghouses is high: alongside the more poorly managed clearinghouses, numerous excellently managed facilities operate, with updates within 1 day.

The Languages Used


This characteristic describes the number and diversity of users able to access data because of their familiarity with the given language. Thirty clearinghouses do not have a search mechanism written in English (five of these are written in Arabic, Chinese, Greek, Japanese, or Korean script). Twenty-nine clearinghouses use only their home language. These language problems reduce accessibility to the data for English-speaking users.

The Use of Maps for Searching


The use of this facility can improve accessibility to data. In 18 clearinghouses, maps can be used as an option to search for (meta)data. This relatively advanced search alternative is popular in Europe and Asia.

Registration-Only Access
This characteristic describes the management and possible limitations of use. Before accessing the data, users must register by entering personal details, which could have a negative impact on accessibility. For eight national clearinghouses, the user is required to register to access metadata or data (Canada, El Salvador, Finland, Hungary, Malaysia, Singapore, Spain, and Uruguay).

Summary
Information technologists can apply the environmentalists' concept of reduce, reuse, and recycle to improve the management of data. Members of this project sought to maximize the benefits of developing spatial data by reducing redundant data creation efforts, re-using existing expensive data sets, and recycling spatial data information. The recognition that organizational units can benefit from cooperative data development and exchange is part of an ongoing trend. It began when functional departments within organizations began to see themselves not as separate entities with independent informational needs, but as parts of an integrated whole which should work together to create an integrated information base. This trend progressed further as organizations realized the benefits of reaching out beyond their organizational walls and began exchanging information with their economic partners, such as suppliers and customers. The participants in this project have extended the concept of information sharing one step further. They have demonstrated the economic benefits of cooperatively developing and exchanging data among organizations which are essentially independent but have overlapping informational needs. The Internet-based Clearinghouse has proven to be a practical tool for fostering the desired cooperation and achieving the desired benefits. It is being used today with the tools currently available and will most certainly evolve into a more powerful instrument. Discussions are underway to transfer the Clearinghouse to a state agency, such as the State Library, for permanent operations.


Data clearing house policies in Nigeria

Figure 1 Information Structure for Spatial Clearinghouse

1. Every geospatial data producer shall provide metadata for each of its data holdings.

2. Government, through the lead agency and in consultation with the NGDI Committee, shall establish an electronic geospatial metadata catalogue and Clearinghouses in NGDI node agencies in partnership with those agencies.

3. A custodian of a fundamental dataset must, not later than 30 days after updating, furnish all updates of the base dataset to the clearinghouse; the clearinghouse shall in turn inform the custodian(s) of the derivative dataset(s) within 7 days, in order to ensure synchronous maintenance of the fundamental and derivative datasets.

There are also unique costs brought about by the requirement for a National Geospatial Data Clearinghouse. These Clearinghouse costs include Internet connectivity and system administration; server hardware purchase and maintenance; metadata creation for new and existing data sets; maintenance of inactive data sets; metadata distribution; and responding to data requests from Clearinghouse users.
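The two time limits in policy 3 lend themselves to an automated compliance check at the clearinghouse. This minimal sketch assumes that "30 days" and "7 days" mean calendar days. It also assumes that the 7-day period runs from the date the clearinghouse receives the update, which the policy implies but does not state. The function names are hypothetical.

```python
from datetime import date, timedelta

CUSTODIAN_LIMIT = timedelta(days=30)  # custodian -> clearinghouse, after the dataset is updated
NOTIFY_LIMIT = timedelta(days=7)      # clearinghouse -> custodians of derivative datasets


def deadlines(updated_on, received_on=None):
    """Latest permitted dates for furnishing an update and for notifying derivative custodians."""
    furnish_by = updated_on + CUSTODIAN_LIMIT
    # If the receipt date is not yet known, assume the latest compliant receipt.
    received = received_on or furnish_by
    return {"furnish_by": furnish_by, "notify_by": received + NOTIFY_LIMIT}


def is_compliant(updated_on, received_on, notified_on):
    """True if both the custodian and the clearinghouse met their deadlines."""
    d = deadlines(updated_on, received_on)
    return received_on <= d["furnish_by"] and notified_on <= d["notify_by"]


if __name__ == "__main__":
    print(deadlines(date(2024, 1, 10)))
```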


Once implementation is underway, the management of expectations begins. Here the goal is to deliver some operational capability through a pilot project that demonstrates capabilities using an initial geospatial database. This shows the capabilities in a USACE environment while delivering on a functional requirement, and it achieves a first success which should solidify management support. At the same time, it allows staff training to begin and provides a set of lessons learned for the larger implementations to follow.

CONCLUSIONS
Since 1994, the number of national clearinghouses has steadily increased to a total of 59. Looking at the trend of implementation, countries can expect to see additional national clearinghouses established. In fact, building clearinghouses is a global activity, with the exception of Africa and the Middle East (as well as Australasia and Oceania). Most existing clearinghouses are established in Europe, Southeast Asia, and North and South America. The main initiatives for establishment come from Anglo-Saxon countries, such as the U.S., South Africa, and Australia. The U.S. in particular, supported by the FGDC, has stimulated many countries on the American continent to build a clearinghouse. However, 124 countries have still not shown any initiative to build one. There are several reasons for this. For example, a country may not have an appropriate network architecture, or there may be institutional bottlenecks to implementation. The differences in content, use, and management between the clearinghouses are broad. One example is the total number of accessible datasets described in a clearinghouse. In the U.S. clearinghouse, this number is 10 times the combined total of all 24 European clearinghouses. This difference stems from each country's unique historical, institutional, economic, legal, technical, and cultural setting. Especially in Europe, there are great contrasts in the number of datasets, suppliers, visitors, Web references, and frequency of Web updates. These contrasts are probably the result of the high institutional, economic, legal, technological, and cultural diversity within this region. However, similarities between clearinghouses do exist (for example, the type of data accessibility and the metadata standard used). The most applied metadata standard is the FGDC standard. However, given the numerous projects to apply the ISO standard, it is likely that ISO19115 will be the most applied standard in the future.
This international consensus standard reflects FGDC, CEN, and other inputs. It provides detail that goes beyond FGDC and CEN metadata, including special coverage of raster and imagery information. Currently, there are several initiatives to create implementable subsets and extensions of ISO19115. These initiatives aim to facilitate the conversion of FGDC-supporting tools and implementations to meet ISO conformance requirements (Federal Geographic Data Committee Metadata Staff Coordinator 2001). Looking at the average number of data suppliers, Web references, and visitors, we can conclude that national clearinghouses are a popular facility for distributing and accessing spatial data. Finally, in the future, it is highly probable that many national clearinghouses will give access to the spatial data itself and provide complementary services such as online mapping. However, a concern is the low frequency of Web updates at several clearinghouses due to poor management. Special attention therefore has to be given to keeping clearinghouse managers motivated to maintain a well-managed clearinghouse. Based on the 12 characteristics used, we can conclude that Australia, Canada, Portugal, and the U.S. have the best existing national clearinghouses. Additionally, this Web survey shows that the richest countries are not the only ones with good clearinghouses. Examples of relatively poorer countries with suitable national clearinghouses are El Salvador, Nicaragua, and Uruguay. Based on the above research, it seems that for all countries one of the keys to successful clearinghouse implementation is strong political support and interest, expressed through funding and a long-term strategy.
