What metadata are and why they are important


Introduction

In the last few decades GIS has become a crucial support for research and natural resources planning. GIS practitioners have produced a huge amount of geographical data that can offer vast support to the needs of science and management. Like many other rapidly developing technologies, GIS has suffered from some errors because it boomed in a relatively short time. Most GIS datasets were created with a tight focus on specific projects, and little attention was given to the potential use of the same datasets for other applications. In fact, most of the GIS datasets produced until a few years ago lack documentation of their technical characteristics, methodology and quality, which is essential for any future use of the datasets by interested third parties. Undocumented datasets have lower potential for reuse because their fitness for other purposes is unknown. The outcome is an abundance of extensive datasets with limited potential for future applications.

The abundance of undocumented geospatial datasets has created two main problems:

  • The absence of documentation has hidden the existence of these datasets from many potential users.
  • Even when third parties are aware that the data exist, the lack of information on technical characteristics raises doubts about the fitness of the datasets for other applications. The required level of accuracy, quality and completeness of a GIS dataset depends on the focus of the application, so without documentation it is uncertain whether a specific dataset would be effective for the objectives of the research, and potential users may reasonably hesitate to adopt it.

This general lack of knowledge, rather than a lack of GIS resources, has often forced GIS institutions to spend considerable resources on data production, duplicating information that already exists and limiting the time available for data analysis.

Because of this situation, in the 1990s governmental agencies made a major effort to create standard methods for describing GIS data, with two main objectives:

  • To document a set of characteristics of GIS datasets that make the information more useful for third parties.
  • To list data attributes according to machine-readable standards, so that the GIS community can use online search engines to retrieve any useful dataset for given thematic and geographical attributes.

The implementation of GIS data documentation has a twofold result: it informs the community of data availability, and it documents data quality and fitness for other potential uses. Once data producers agree on a standard methodology to document GIS data, we can reasonably expect a wider knowledge of existing data, less duplication of data and more time for data analysis. Nowadays there is a major effort to create information that documents geospatial datasets, or 'metadata', in standard formats. Most digital geospatial files have some associated metadata, but often in a poorly standardized form; to be effective, metadata require a common level of standardization. Several organizations have already implemented a data documentation policy, while others are slowly starting to implement one or have not done so yet.


Definition of Metadata

The simplest definition of metadata is “data about data”. Although the word metadata is quite new, the concept behind it is not. Many people do not think about metadata very much, but we create and use them often in daily life. For instance, a map legend is pure metadata: it defines map information such as spatial reference, attribute descriptions, scale and map producer. Metadata have been used to describe many types of documents. For instance, libraries and archives have long used them to index texts, with citations that give the author, title and publication date of a resource in a fixed order. The following citation is a very simple example of metadata:

Maidment, D. R. (1993). Handbook of hydrology. New York, McGraw-Hill.

The citation above gives the information necessary to point to a unique document. If this information were not provided in a standardized format, it would be almost impossible to retrieve such a document from the enormous number of documents available in the world. A metadata record consists of a number of pre-defined elements representing specific attributes of a resource. An important point about a reference is that we understand the information it conveys by convention: the first element is the author's name, then the year, then the title, then the place of publication and the publisher.
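The idea that a citation is a record of pre-defined elements read by convention can be made concrete with a short Python sketch. It is only an illustration: the `Citation` class, the `parse_citation` function and the regular expression are hypothetical names introduced here, and the pattern handles only the "Author (Year). Title. Place, Publisher." layout of the example above.

```python
import re
from dataclasses import dataclass


@dataclass
class Citation:
    """A citation treated as a metadata record with fixed, named elements."""
    author: str
    year: int
    title: str
    place: str
    publisher: str


# Assumption: matches only the "Author (Year). Title. Place, Publisher." convention.
CITATION_PATTERN = re.compile(
    r"^(?P<author>.+?) \((?P<year>\d{4})\)\. "
    r"(?P<title>.+?)\. "
    r"(?P<place>[^,]+), (?P<publisher>.+?)\.$"
)


def parse_citation(text: str) -> Citation:
    """Split a citation string into its elements, or fail if the convention is broken."""
    match = CITATION_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"citation does not follow the expected convention: {text!r}")
    fields = match.groupdict()
    return Citation(
        author=fields["author"],
        year=int(fields["year"]),
        title=fields["title"],
        place=fields["place"],
        publisher=fields["publisher"],
    )


record = parse_citation(
    "Maidment, D. R. (1993). Handbook of hydrology. New York, McGraw-Hill."
)
print(record.title, record.year)
```

A string that does not follow the convention is rejected rather than guessed at, which is the practical reason a shared format matters: a program can only find the title if it knows where the title is.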

Metadata are therefore necessary to help potential users retrieve data and evaluate whether the data are appropriate for a specific use. Metadata help a data producer to publicize the data created and to support their use. Metadata increase the value of a dataset, because potential users are more likely to learn of its existence and to use it properly. They also protect an organization's investment in data over the years. Because personnel are likely to change over time, undocumented data may lose their value: subsequent workers may have little understanding of the contents and uses of a geospatial dataset, and they may doubt the results generated from it. This lack of confidence might lead to the same dataset being produced again, simply because the existing one is undocumented.

GIS data are often created for a particular purpose. While these data may be useful for some applications, they may be unsuitable for others. Metadata records promote awareness within the GIS community of the suitable and unsuitable uses of a dataset.

Metadata are the background information that describes the content, quality, condition and other specific characteristics of the data. In essence, metadata answer several significant questions about the data: what, who, where, when, why and how. The most useful metadata make it clear how a resource, once discovered, can be located and accessed.

It is important to realize that metadata are simply data about data and not the data themselves. Making metadata public does not, by itself, imply that the resource they describe is also publicly available. Access to the data can be more or less difficult according to the wishes of the owner of the resource. Obviously, it is not very useful for users to obtain information on data that they cannot access.

A note on the grammatical use of the word metadata. Although the word metadata, like the word data, is commonly used as both a singular and a plural noun, strictly speaking it is a plural noun. The academic GIS community adheres to the plural use of both words; therefore we should say 'what are metadata' rather than 'what is metadata'.


The burden of metadata

To new data producers the creation of metadata might seem burdensome, but the long-term advantages far exceed the initial burden of implementing a metadata policy within an organization. The initial expense of documenting data is clearly outweighed by the potential costs of duplicated or redundant data generation.

The reasons why metadata are often absent are twofold. First, creating metadata requires an initial commitment of time that people are often not prepared to make. Second, even when a metadata policy is properly implemented, metadata are not embedded in the data themselves. The first issue can be solved by providing guidelines and tools to facilitate the implementation, development and exchange of metadata. Unfortunately, the second issue will remain a problem until GIS software developers provide tools to efficiently embed metadata into the structure of the datasets they describe.

Implementing a metadata policy means not only creating metadata for every newly produced dataset, but also documenting all previously created datasets. The latter is by far the biggest burden for an organization. The information required to describe data created in the past is often missing, because the data creators might have left the organization or forgotten the necessary details over the years. It is important to understand that the longer the description of existing datasets is postponed, the more the knowledge of their characteristics will diminish. As metadata will become necessary in the future to guarantee quality standards, it is strongly recommended to plan the creation of metadata for existing datasets as soon as possible.


Who should write metadata, and when

Metadata development and maintenance require a conscious effort from data producers and from subsequent users who modify the data to suit their particular needs. In the coming years GIS data quality will presumably improve, and metadata use will grow in order to reduce the effort spent on data acquisition and increase the time available for data analysis. Metadata will unequivocally be needed to assure potential users of data quality. Data management therefore needs to be implemented deliberately to guarantee a high level of quality. In the process of metadata publication, the management of the people involved is crucial.

The best time to collect metadata is while the data are being processed, when the information required by the metadata standard is best known. Collecting metadata during data development is as simple as keeping a data logger that records detailed information that might otherwise be forgotten soon after the data are developed. Metadata collection also benefits data development itself, because it provides an analytical tool that helps to monitor the methodologies used.


Machine readable standards

One of the benefits of creating standardized metadata is that it provides a standard way to search for data. Because the format and the content of the document are known, programmers can develop precise tools to search for particular data.

Standard formats offer the following advantages:

  • They make it possible to use automated tools to validate that metadata comply with the standard format, and they make it much easier for a metadata reviewer to assess the information in the metadata.
  • They make it easier to use automated routines to incorporate metadata from different data producers into a clearinghouse search index.
  • They increase the exchangeability of metadata between agencies.
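The first advantage, automated validation, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the element names in `REQUIRED_ELEMENTS` and the `validate_record` function are hypothetical and do not reproduce the mandatory elements of any published standard.

```python
from typing import Dict, List

# Assumption: an illustrative list of required elements, not taken from any standard.
REQUIRED_ELEMENTS = ("title", "originator", "publication_date", "abstract", "bounding_box")


def validate_record(record: Dict) -> List[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems: List[str] = []

    # Every required element must be present and non-empty.
    for name in REQUIRED_ELEMENTS:
        if record.get(name) in (None, "", {}):
            problems.append(f"missing element: {name}")

    # Bounding coordinates must be complete and plausible decimal degrees.
    box = record.get("bounding_box")
    if box:
        missing = [key for key in ("west", "east", "south", "north") if key not in box]
        if missing:
            problems.append("bounding_box lacks: " + ", ".join(missing))
        else:
            if not (-180 <= box["west"] <= 180 and -180 <= box["east"] <= 180):
                problems.append("longitude out of range")
            if not (-90 <= box["south"] <= box["north"] <= 90):
                problems.append("latitude out of range or south above north")
    return problems


example = {
    "title": "Example river network",
    "originator": "Example Mapping Agency",
    "publication_date": "2004",
    "abstract": "",
    "bounding_box": {"west": -10.0, "east": 5.0, "south": 20.0, "north": 0.0},
}
for problem in validate_record(example):
    print(problem)
```

Because every producer uses the same element names, one such routine can check records from any agency before they are loaded into a shared index.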

The strict definition of Metadata is "machine-readable information about electronic resources or other things".

Metadata that are machine-readable can be searched and parsed by search engines when loaded on a spatial data clearinghouse. A clearinghouse can be used as a warehouse to store spatial data that can be retrieved by other organizations. Because spatial data are expensive and time-consuming to create, it is useful to share them with other organizations.

In the past, several agencies documented their geospatial datasets, but specific formatting standards for metadata were not available or required. This lack of formatting rules produced sets of metadata that could not be accepted by, or exported to, a generic form of metadata. Individual agencies produced metadata that met the requirement to provide documentation but were formatted in different ways, so their resources could not easily be exchanged or incorporated into a searchable clearinghouse. To overcome these problems, the Federal Geographic Data Committee (FGDC) created an encoding standard in the US with which all US federal agencies must comply. Once agencies began to comply with this single standard, their metadata could be included in spatial data clearinghouses and their content retrieved with search engines.


FGDC standards - How was the standard developed?

In 1994 the Federal Geographic Data Committee (FGDC) approved the first version of a content standard for metadata, the Content Standard for Digital Geospatial Metadata (CSDGM). Executive Order 12906, "Coordinating Geographic Data Acquisition and Access: The National Spatial Data Infrastructure," signed by President Clinton in 1994, requires all Federal agencies to use the CSDGM standard to document the data they produce, beginning in January 1995. In June 1998 the FGDC endorsed the second version of the Content Standard for Digital Geospatial Metadata (FGDC-STD-001-1998). At present, all US federal agencies that produce geospatial data are required to use the CSDGM standard. The standard is complex, but it provides a common framework on which agencies can build detailed metadata. State and local agencies have been encouraged to adopt it to help support the National Spatial Data Infrastructure (NSDI). Metadata that follow the Content Standard are machine-readable, so they can be searched and parsed on distributed NSDI clearinghouses.

The standard provides a common system for users to know:

  • What data are available
  • Whether the data meet their specific needs
  • Where to find the data
  • How to access the data

These standards are now slowly being implemented by governmental, non-profit and commercial participants worldwide, who can make their collections of spatial information searchable and accessible on the Internet using free reference implementation software developed by the FGDC.

For an overall view of the FGDC standard refer to the CSDGM Overview section within the CSI Metadata Resource web page.


Other Geospatial Metadata Standards

Other metadata standards for geospatial data have been published:

ISO 19115

Technical Committee 211 (ISO/TC 211), Geographic information/Geomatics, of the International Organization for Standardization (ISO) has developed the ISO 19100 series as a structured, multi-part set of international standards for information concerning geospatial phenomena.

In February 2001 the TC 211 Secretariat approved for publication the first draft of ISO 19115, a standard for metadata of geographic information. This standard provides guidelines for documenting geospatial datasets with four major objectives: data discovery, assessment of data fitness for other uses, data access and use of data. The ISO 19115 standard characterizes the identity, quality, spatial and temporal extent, spatial reference and distribution of the data.

The ISO 19115 standard has been carefully developed to accommodate the requirements of the different international metadata organizations. At the same time, most standards for geospatial metadata will likely consolidate in the future, through robust international discussion, into a single internationally accepted standard.

While the implementation of a worldwide ISO standard will mean future changes for other standards, TC 211 is working together with the main international metadata committees (such as FGDC, ANZLIC and Dublin Core) to guarantee that the ISO standards develop compatibly with the existing ones.

Dublin Core

The Dublin Core Metadata Initiative (DCMI) is an organization dedicated to promoting the widespread adoption of a simple yet effective metadata standard for describing a wide range of networked resources. Its name comes from the fact that the first Dublin Core Series Workshop took place in Dublin, Ohio, in 1995 (nothing to do with Ireland). The Dublin Core Metadata Element Set (DCMES) was the first metadata standard published by DCMI, in 1998, as IETF RFC 2413. The objective of Dublin Core metadata is not to replace existing metadata, but rather to supplement existing methods for searching and indexing electronic resources and documents on the Internet. It is a standard that can be applied to a broad range of products and disciplines of study. The Dublin Core element set is much simpler than other standards. It has been kept intentionally simple and small so that non-specialists can create simple descriptions easily and inexpensively, while still providing effective retrieval for potential users.

The Dublin Core Metadata Element Set is organized in two levels. The "Simple" level comprises 15 elements describing the general characteristics of the resource; the "Qualified" level includes an additional element and a group of element refinements that refine the description of the simple elements in ways that make the resource easier to discover.
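As a minimal sketch of the "Simple" level, the Python fragment below writes a few of the 15 elements as XML with the standard library. The element names and the namespace URI are those of the Dublin Core element set; the `build_dc_record` function and the `metadata` wrapper element are hypothetical choices made for this illustration, and the values reuse the hydrology citation quoted earlier on this page.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the "Simple" level.
SIMPLE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def build_dc_record(values: dict) -> ET.Element:
    """Build an XML record holding only recognised Simple Dublin Core elements."""
    unknown = sorted(set(values) - set(SIMPLE_ELEMENTS))
    if unknown:
        raise ValueError("not Simple Dublin Core elements: " + ", ".join(unknown))

    ET.register_namespace("dc", DC_NS)
    record = ET.Element("metadata")  # hypothetical wrapper element
    for name in SIMPLE_ELEMENTS:     # keep a predictable element order
        if name in values:
            child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
            child.text = values[name]
    return record


record = build_dc_record({
    "title": "Handbook of hydrology",
    "creator": "Maidment, D. R.",
    "publisher": "McGraw-Hill",
    "date": "1993",
})
print(ET.tostring(record, encoding="unicode"))
```

Every element is optional in this sketch, which mirrors the intent of keeping the set small enough for non-specialists to fill in.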

Several initiatives are underway for mapping the Dublin Core standard to FGDC. Successful mapping between these standards will allow interoperability within catalogs using either the Dublin Core or the FGDC standard.

ANZLIC

ANZLIC (Australia New Zealand Land Information Council) is an inter-governmental Council responsible for the coordination of spatial information management in Australia and New Zealand. In 1996 ANZLIC published the first version of a standard to define the metadata elements that characterize spatial datasets. The ANZLIC Guidelines were, as far as possible, made consistent with the CSDGM guidelines of the FGDC. In 2001, the current ANZLIC metadata guidelines were published. This review was strongly encouraged as an interim measure to support the development of the international standard ISO 19115. A further revision of the ANZLIC guidelines is likely to be undertaken once the international standard ISO 19115 becomes stable, with the objective of implementing ANZLIC guidelines as a "profile" of the ISO 19115 standard. This standard consists of 41 core elements grouped into ten categories: dataset, custodian, description, data currency, dataset status, access, data quality, contact information, metadata date and additional metadata.


XML format

Several GIS software packages use Extensible Markup Language (XML) as a storage mechanism for metadata. HTML (Hypertext Markup Language) is a very useful language for displaying information such as text and graphics in web browsers according to a set of rules. Although HTML has been a very powerful tool for presenting much of the information on the World Wide Web, it does not specify the structure of the information in a document, which limits its use for a structured metadata standard. XML, unlike HTML, lets users specify the structure and function of the data contained in a document, and it will likely revolutionize the way information is transferred on the Web in the coming years. XML is in many ways similar to a spreadsheet or a database in the way it stores data and information in a structured manner.
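Because XML names each piece of information, a program can pull out exactly the element it needs. The Python sketch below reads a small metadata fragment with the standard library. The tag names are modelled on the short element names used in CSDGM XML records; the record itself, its values and the `read_summary` function are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; tag names modelled on CSDGM short names, values invented.
SAMPLE_XML = """
<metadata>
  <idinfo>
    <citation>
      <citeinfo>
        <origin>Example Mapping Agency</origin>
        <pubdate>2004</pubdate>
        <title>Example river network</title>
      </citeinfo>
    </citation>
    <spdom>
      <bounding>
        <westbc>-10.0</westbc>
        <eastbc>5.0</eastbc>
        <northbc>20.0</northbc>
        <southbc>0.0</southbc>
      </bounding>
    </spdom>
  </idinfo>
</metadata>
"""


def read_summary(xml_text: str) -> dict:
    """Extract the title, originator and bounding coordinates from a record."""
    root = ET.fromstring(xml_text.strip())
    cite = "idinfo/citation/citeinfo/"
    bounds = "idinfo/spdom/bounding/"
    return {
        "title": root.findtext(cite + "title"),
        "originator": root.findtext(cite + "origin"),
        "west": float(root.findtext(bounds + "westbc")),
        "east": float(root.findtext(bounds + "eastbc")),
        "south": float(root.findtext(bounds + "southbc")),
        "north": float(root.findtext(bounds + "northbc")),
    }


print(read_summary(SAMPLE_XML))
```

The same lookup on an HTML page would have to guess which paragraph or table cell holds the title, since HTML tags describe presentation rather than meaning.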

For more information about XML, visit the web site of the World Wide Web Consortium.


Clearinghouses

A data clearinghouse functions as a detailed catalog service, with links to metadata for available digital spatial data. GIS clearinghouses take advantage of Web technology and offer tools for the query, search and presentation of available digital data. Interested users can search the collection of metadata and eventually retrieve the dataset they need for their intended application. Search engines inside a clearinghouse are connected to metadata records and can be used to search for specific attributes such as keywords, titles, abstracts, bounding coordinates and dates. The search engine parses the catalog and returns a complete description of the attributes of the dataset, with maps and, if available, links to download the data. Clearinghouses allow different types of agencies and organizations to band together and promote their available geospatial data. To handle metadata correctly, search engines rely on the metadata being exactly predictable, with a standardized structure of variables and attributes. Data can therefore be retrieved correctly by potential users only if the metadata conform to a standard format. Clearinghouse metadata provide a very powerful and cheap way to advertise data to the community through the Internet. A clearinghouse can function as a repository for the metadata, or the metadata can be stored on servers maintained by the agencies that generate the data. The latter option lets data producers manage and change the metadata accessed by the public directly.
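The kind of attribute search described above can be sketched with an in-memory catalog in Python. Everything here is hypothetical: the `CATALOG` records, the field names and the `search` function are invented for illustration and do not represent the software of any real clearinghouse. Bounding boxes are given as (west, east, south, north) in decimal degrees, and the overlap test assumes that no box crosses the 180° meridian.

```python
from typing import Dict, List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (west, east, south, north)

# Hypothetical catalog of metadata records with a predictable structure.
CATALOG: List[Dict] = [
    {"title": "Example river network", "keywords": ["hydrology", "rivers"],
     "bbox": (-10.0, 5.0, 0.0, 20.0)},
    {"title": "Example soil map", "keywords": ["soils"],
     "bbox": (30.0, 40.0, -5.0, 5.0)},
    {"title": "Example rainfall grid", "keywords": ["climate", "hydrology"],
     "bbox": (25.0, 45.0, -10.0, 10.0)},
]


def boxes_overlap(a: BBox, b: BBox) -> bool:
    """True when two boxes share any area (boxes crossing 180 degrees are not handled)."""
    a_west, a_east, a_south, a_north = a
    b_west, b_east, b_south, b_north = b
    return (a_west <= b_east and b_west <= a_east
            and a_south <= b_north and b_south <= a_north)


def search(catalog: List[Dict], keyword: Optional[str] = None,
           bbox: Optional[BBox] = None) -> List[str]:
    """Return titles of records matching the keyword and overlapping the box."""
    hits = []
    for record in catalog:
        if keyword is not None:
            if keyword.lower() not in [k.lower() for k in record["keywords"]]:
                continue
        if bbox is not None and not boxes_overlap(record["bbox"], bbox):
            continue
        hits.append(record["title"])
    return hits


print(search(CATALOG, keyword="hydrology", bbox=(35.0, 50.0, 0.0, 15.0)))
```

The search works only because every record stores its keywords and bounding coordinates under the same names, which is the practical meaning of metadata being "exactly predictable".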


The National Spatial Data Infrastructure (NSDI) manages the FGDC clearinghouse, which is currently the largest collection of data available in the world. All US federal agencies and many other public and private organizations, both in the US and abroad, share their metadata on this powerful search engine. The FGDC clearinghouse is a large network based on a series of nodes and gateways. The gateways function as entry points and run the web servers that search the FGDC clearinghouse. When a user searches for specific attribute values, the gateway search engines look through the metadata records stored at clearinghouse nodes located in the contributing organizations. Once a specific dataset has been found, information on how to access the data is made available to the user.

The FGDC clearinghouse can be accessed at http://clearinghouse1.fgdc.gov/.

Besides the FGDC clearinghouse, there are several other clearinghouses that use FGDC standards to inform the public about available datasets. The following are examples of commonly used clearinghouses:


Suggestions and Comments: csi@cgiar.org
© 2004. CGIAR - Consortium for Spatial Information (CGIAR-CSI)