Metadata Creation
Metadata describe information about data, including who, what, where, when, why, and how, so that it can be understood, re-used, and integrated with other data. Metadata records follow a standard format to enable interoperability.
Why Do We Need Metadata?
Metadata are crucial for any use or reuse of data; no one can responsibly reuse or interpret data without metadata that explain how and why the data were created, where they are geographically located, and how they are structured.
Uses for Metadata
Metadata are used for enabling data discovery, understanding data, analysis and synthesis, maintaining longevity of data, tracking the progress of a research project, and demonstrating the return on investment for research at an institution.
Getting Started with Metadata Creation
Gather content for the metadata record
- Understand what goes into a metadata record (e.g. title, abstract, methods, keywords, etc.).
- Use the Metadata Questionnaire [PDF] or Metadata in Plain Language [PDF] to gather content for building a metadata record, or use a metadata creation tool that asks you the same questions about your data.
Federal agencies are mandated by Executive Order 12906 to use metadata standards endorsed by the Federal Geographic Data Committee (FGDC) below:
- Content Standard for Digital Geospatial Metadata (CSDGM) or its extensions for biological data (Biological Data Profile) and shoreline data
- International Organization for Standardization (ISO) series of standards (19115, 19115-2, 19139, etc.). There is an ongoing effort to move towards adopting the ISO metadata standard.
Both FGDC-CSDGM and ISO require metadata to be formatted in Extensible Markup Language (XML), although a stylesheet can be applied to the XML to make it easier to read. Learn more about XML for Advanced Users.
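To make the XML requirement concrete, the following is a minimal sketch, not a complete or valid record, of how a few common fields (title, abstract, purpose) might be written as CSDGM-style XML using Python's standard library. The element short names (idinfo, citation, citeinfo, title, descript, abstract, purpose) follow the CSDGM; the function name and the sample values are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET


def build_skeleton_record(title, abstract, purpose):
    """Return a skeletal CSDGM-style record as an XML string.

    Illustrative only: a real record needs many more required
    elements and must still be checked with a validation tool.
    """
    metadata = ET.Element("metadata")
    idinfo = ET.SubElement(metadata, "idinfo")

    # Citation information holds the title.
    citation = ET.SubElement(idinfo, "citation")
    citeinfo = ET.SubElement(citation, "citeinfo")
    ET.SubElement(citeinfo, "title").text = title

    # Description holds the abstract and purpose.
    descript = ET.SubElement(idinfo, "descript")
    ET.SubElement(descript, "abstract").text = abstract
    ET.SubElement(descript, "purpose").text = purpose

    ET.indent(metadata)  # pretty-print; requires Python 3.9 or later
    return ET.tostring(metadata, encoding="unicode")


# Placeholder values, not taken from any real data release.
record_xml = build_skeleton_record(
    title="Example title (placeholder)",
    abstract="Example abstract (placeholder).",
    purpose="Example purpose (placeholder).",
)
print(record_xml)
```

In practice, the metadata creation tools listed on this page produce this structure for you; the sketch only shows what the underlying XML looks like.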
FGDC-CSDGM Standard
The following are examples of metadata records in FGDC-CSDGM for different types of information products. View each metadata record in its native XML code or with a stylesheet applied to make it easier to read.
- Biological data with taxonomy (Biological Data Profile) [XML][Stylesheet]
- Geospatial data [XML][Stylesheet]
- Tabular non-spatial data [XML][Stylesheet]
- Project level [XML][Stylesheet]
- Systems Level Applications or Collections [XML]
- Water Sampling Site [XML]
- Database [XML][Stylesheet]
- Models [Stylesheet]
ISO Standard
An example of a metadata record in ISO 19115-2. Please note that it may contain only certain sections of the ISO standard.
- USGS Barnegat Bay hydrodynamic model for March-September 2012 [XML][Stylesheet]
For more information about metadata as it pertains to the USGS data release process, visit Metadata for Scientific Data FAQs. Also, check out the Metadata for Research Data and Structuring and Documenting a USGS Public Data Release training modules on the Data Management Website's Training page.
Tools for Creating Metadata Records
The following free tools create or edit FGDC CSDGM metadata in XML. For a wider selection of tools see the FGDC Metadata Tools. For a list of tools for the ISO metadata standard, refer to the FGDC ISO Metadata Editor Review.
- USGS Metadata Wizard - A Python toolbox in Esri ArcGIS Desktop for creating FGDC-CSDGM metadata for geospatial data. The tool ingests geospatial files and, through a semi-automated workflow, creates and updates metadata records in Esri’s 10.x software. Best for geospatial data (e.g. rasters and shapefiles) and tabular data (e.g. Esri geodatabase or database file). Comma-separated value (CSV) files can be used but must first be converted into Esri formats.
- USGS Metadata Wizard 2.x - A cross-platform desktop application, modeled on the original Metadata Wizard, for creating CSDGM metadata. This version of the Metadata Wizard does not have Esri dependencies and provides support for additional tabular data file formats.
- USGS TKME - A Windows tool for creating FGDC-CSDGM metadata that can be configured for the Biological Data Profile and other extensions. The software is closely aligned with the Metadata Parser and can be configured for French and Spanish.
- mdEditor - A web-based tool for creating ISO and FGDC-CSDGM metadata.
- Data dictionary conversion service - Converts a data dictionary table to or from metadata format (instructions).
- USDA Metavist - A desktop metadata editor for creating FGDC-CSDGM metadata for geospatial data. Includes the Biological Data Profile (version 1.6). Produced and maintained by the USDA Forest Service.
- Microsoft XML Notepad - A simple intuitive user interface for browsing and editing XML files. Does not automatically produce FGDC-CSDGM records but allows easy editing and validating of existing metadata records. See Advanced Users to learn how to configure this tool.
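As a rough illustration of what converting a data dictionary table to metadata format involves, the sketch below turns rows of a small CSV data dictionary into a CSDGM Entity and Attribute (eainfo) section. It is not the implementation of the conversion service listed above. The CSV column names (label, definition, source), the sample rows, and the function name are assumptions made for the example; the XML element short names (eainfo, detailed, enttyp, enttypl, enttypd, enttypds, attr, attrlabl, attrdef, attrdefs) follow the CSDGM.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical data dictionary; the column names are assumptions.
DATA_DICTIONARY_CSV = """label,definition,source
site_id,Unique identifier for the sampling site,Producer defined
temp_c,Water temperature in degrees Celsius,Producer defined
"""


def data_dictionary_to_eainfo(csv_text, entity_label, entity_definition):
    """Build a CSDGM <eainfo> element from data dictionary rows.

    Simplified: attribute domain values (<attrdomv>) are omitted
    here and would still need to be added for each attribute.
    """
    eainfo = ET.Element("eainfo")
    detailed = ET.SubElement(eainfo, "detailed")

    # Describe the entity (for example, the table or file).
    enttyp = ET.SubElement(detailed, "enttyp")
    ET.SubElement(enttyp, "enttypl").text = entity_label
    ET.SubElement(enttyp, "enttypd").text = entity_definition
    ET.SubElement(enttyp, "enttypds").text = "Producer defined"

    # One <attr> per data dictionary row.
    for row in csv.DictReader(io.StringIO(csv_text)):
        attr = ET.SubElement(detailed, "attr")
        ET.SubElement(attr, "attrlabl").text = row["label"]
        ET.SubElement(attr, "attrdef").text = row["definition"]
        ET.SubElement(attr, "attrdefs").text = row["source"]
    return eainfo


eainfo = data_dictionary_to_eainfo(
    DATA_DICTIONARY_CSV, "sites.csv", "Table of sampling sites"
)
print(ET.tostring(eainfo, encoding="unicode"))
```

Keeping the data dictionary as a table and generating the XML from it means the attribute descriptions are written once and reused, rather than retyped into the metadata record.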
Best Practices for Metadata Creation
- Gather all information together, especially if multiple people have information that you need.
- Use information that is already developed.
- Re-use text from grant or funding proposals (e.g. abstract, purpose, date, etc.).
- Reference the data dictionary that was used during data collection and processing to complete the Entity & Attribute section of a CSDGM metadata record.
- Choose a descriptive title for your data that incorporates who, what, where, when, and scale.
- The single most important text in the metadata record is the title because it is the main thing people will see in metadata catalogs (e.g., USGS Science Data Catalog, Data.gov).
- Example template: [Measurement] of [phenomenon] in [geographic feature] at [geographic location] during [timeframe]
- Examples:
- Vertical chemical profiles collected across haloclines in the water column of the Ox Bel Ha cave network within the coastal aquifer of the Yucatan Peninsula in January 2015 and January 2016
- Aerial imagery and photogrammetric products from unmanned aerial systems (UAS) flights over the Lake Ontario shoreline at Sodus Bay, New York, July 12 to 14, 2017
- Greater Yellowstone Rivers from 1:126,700 U.S. Forest Service Visitor Maps between 1961-1983
- Choose keywords wisely: Consider all of the possible interpretations of your word choices and use a thesaurus to add descriptive terms you may not have otherwise selected.
- Placement of the DOI for the data in a CSDGM metadata record
- The DOI should go in the primary <onlink> in the Citation Information section.
- Make sure that the DOI is formatted as a URL (https://doi.org/10.5066/ABCD123), not in the form doi:10.5066/ABCD123. If your DOI is not entered as a URL, your metadata record will be rejected by catalogs such as the USGS Science Data Catalog and Data.gov.
- Placement of the DOI for the related publication in a CSDGM record
- The related publication is usually cited as a Larger Work Citation in the metadata. The Larger Work Citation has its own <onlink> field, and this is the correct location for the publication's DOI.
- Make sure that the DOI is formatted as a URL (https://doi.org/10.3133/ABCD123), not in the form doi:10.3133/ABCD123. If your DOI is not entered as a URL, your metadata record will be rejected by catalogs such as the USGS Science Data Catalog and Data.gov.
- Include as many details as you can in the metadata record for future users of the data.
- Whenever you change your metadata record, update the metadata date (date stamp) so that metadata repositories will know which version of the record is most recent.
- Use the best practices described in the Systems Level Applications or Collections [PDF] for large data systems or when describing "collections" of data.
- Specific guidance is available on creating metadata to accompany a NetCDF data release.
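Two of the practices above, entering the data DOI as a URL in the primary <onlink> and updating the metadata date after every change, can be scripted. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the DOI pattern is a simplified check rather than a full DOI grammar, and the metadata date is assumed to live in the CSDGM metd element under metainfo, written as YYYYMMDD.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

# Simplified pattern: optional "doi:" or doi.org prefix, then "10.<registrant>/<suffix>".
DOI_PATTERN = re.compile(
    r"^(?:doi:|https?://(?:dx\.)?doi\.org/)?(10\.\d{4,9}/\S+)$", re.IGNORECASE
)


def doi_to_url(doi):
    """Normalize a DOI in any common form to https://doi.org/<doi>."""
    match = DOI_PATTERN.match(doi.strip())
    if match is None:
        raise ValueError(f"Not a recognizable DOI: {doi!r}")
    return "https://doi.org/" + match.group(1)


def set_data_doi_and_date(root, doi, today=None):
    """Write the DOI URL to the primary <onlink> and refresh <metd>."""
    citeinfo = root.find("idinfo/citation/citeinfo")
    if citeinfo is None:
        raise ValueError("Record has no idinfo/citation/citeinfo section")

    # find() only looks at direct children, so the <onlink> nested in a
    # Larger Work Citation is not touched.
    onlink = citeinfo.find("onlink")
    if onlink is None:
        # Appended last; element order may need adjusting before validation.
        onlink = ET.SubElement(citeinfo, "onlink")
    onlink.text = doi_to_url(doi)

    metainfo = root.find("metainfo")
    if metainfo is None:
        metainfo = ET.SubElement(root, "metainfo")
    metd = metainfo.find("metd")
    if metd is None:
        metd = ET.Element("metd")
        metainfo.insert(0, metd)  # metadata date is the first child of metainfo
    metd.text = (today or date.today()).strftime("%Y%m%d")
    return root


print(doi_to_url("doi:10.5066/ABCD123"))
```

Run a validation tool on the record after any scripted change, because a script like this does not check the rest of the record.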
Validating Metadata Records
You must validate metadata to ensure that the record has been created properly and that all required elements have been filled in. Validation compares the XML metadata record against the metadata standard to ensure that the record conforms to the structure of the standard. For FGDC-CSDGM metadata, see the best practices in Checking Metadata with Data [PDF].
Tools:
- Metadata Wizard - Users can run validation on a metadata record within this tool.
- USGS Metadata Parser - A tool that validates XML metadata records against the FGDC-CSDGM standard and generates a report of any errors found. Suitable for geospatial and non-geospatial data. Users can view XML metadata records in easy-to-read formats (HTML, text). It is multilingual (English, French, and Spanish) and can be configured for the Biological Data Profile and other extensions. For advanced users, learn how to Run MP from the Command Line window.
- Microsoft XML Notepad - The tool offers the ability to validate records but requires a schema package. See Advanced Users to learn more.
Learn more about reviewing metadata on the Metadata Review page.
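Before running one of the validation tools listed in this section, a quick scripted pre-check can catch obviously empty fields. The sketch below is not a substitute for those tools and does not reproduce how they work: it only reports which of a small, illustrative subset of CSDGM elements are missing or empty. The list of paths and the function name are assumptions chosen for the example, not the full set of required elements.

```python
import xml.etree.ElementTree as ET

# Illustrative subset only; the standard requires many more elements.
CHECKED_PATHS = [
    "idinfo/citation/citeinfo/title",
    "idinfo/descript/abstract",
    "idinfo/descript/purpose",
    "metainfo/metd",
]


def find_missing_elements(xml_text, paths=CHECKED_PATHS):
    """Return the paths that are absent or empty in the record."""
    root = ET.fromstring(xml_text)
    missing = []
    for path in paths:
        element = root.find(path)
        if element is None or not (element.text or "").strip():
            missing.append(path)
    return missing


# Hypothetical, deliberately incomplete record.
sample = """<metadata>
  <idinfo>
    <citation><citeinfo><title>Example title</title></citeinfo></citation>
    <descript><abstract>   </abstract></descript>
  </idinfo>
</metadata>"""

for path in find_missing_elements(sample):
    print("Missing or empty:", path)
```

A record that passes this kind of pre-check can still fail formal validation, so the final record should always be run through a validation tool.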
My Metadata is Created, What’s Next?
- USGS policy requires a formal review of the data and metadata if intended as a USGS data release. Learn more about reviewing metadata on the Metadata Review page.
- Package your data and metadata together whenever possible since the metadata record is critical to understanding the data.
- Work with your organization to identify how metadata should be shared or visit Publish and Share for more information. Sharing metadata improves discoverability, access, and reuse of the data. The USGS Science Data Catalog is the approved mechanism for serving USGS metadata to catalogs such as data.doi.gov, data.gov, and geoplatform.gov. Learn more about how to submit your metadata records to the Science Data Catalog at Publish/Share > Data Catalogs.
- The USGS Science Data Catalog (SDC) now requires that every metadata record be assigned a unique persistent identifier (PID), so that records can be individually tracked in both the SDC and the downstream federal catalogs for uniqueness, provenance, and versioning. The metadata PID must be unique, must be registered in the USGS Asset Identifier Service (AIS), and must be placed in a specific location in the CSDGM or ISO XML record. Depending upon the repository selected, the metadata PID may need to be assigned by the metadata author prior to deposit in the repository, or it may be assigned by the repository staff as part of the finalization of the data release. Consult the PIR Tool FAQ site for information about which party is responsible for registering the metadata PID and inserting it in the final XML file.
Advanced Users
Microsoft XML Notepad - An XML editor that can help create and edit metadata records directly in XML code. The software is free to download but only available for PC systems.
- Instructions for using XML Notepad [PDF]
- Sample Starter Template [XML] - A starter metadata record that can be filled in with content.
- Metadata Wizard Stylesheet [XSL]: Use XML Notepad to display metadata in an easy to read form with the stylesheet. See Section 5 of the PDF, "Instructions for using XML Notepad."
- Find and Correct Errors: Use a schema package to ensure the metadata record is correct according to the FGDC-CSDGM standard. Once downloaded, schemas must be reconfigured in XML Notepad to point to the file location of the schema on your local computer. While the schemas help identify some errors, you must use a validation tool for the final metadata record.
EML to CSDGM-BDP Transform [XSL] - This transform file can transform metadata in the Ecological Metadata Language (EML) standard to FGDC-CSDGM Biological Data Profile. After transformation, validate the metadata record and check that the content was adequately transferred.
What the U.S. Geological Survey Manual Requires:
The USGS Survey Manual chapter SM 502.7 Fundamental Science Practices: Metadata for USGS Scientific Information Products including Data provides metadata requirements for USGS scientific information products and scientific data that are Bureau-approved for release.
SM 502.7 further specifies that metadata must accompany all USGS scientific data and other information products. Metadata records are to be developed in a standardized way that enables users to understand the context and to evaluate the usefulness of the data or information product. Metadata records for scientific data must comply with standards such as the FGDC Content Standard for Digital Geospatial Metadata, the International Organization for Standardization suite of standards, or other USGS-endorsed FGDC standards. A minimum of one metadata review by a qualified reviewer is required for all USGS scientific data and other information products approved for release.
The USGS Survey Manual chapter SM 502.8 Fundamental Science Practices: Review and Approval of Scientific Data for Release discusses when metadata requirements apply for release of scientific data.
SM 502.8 further specifies that scientific data approved for release must comply with the metadata requirements described in SM 502.7, and that the metadata must be deposited in and shared through the USGS Science Data Catalog. Reviews of the data and the associated metadata are required, and these reviews must be documented in the internal USGS Information Product Data System (IPDS).
For additional guidance, please refer to the Fundamental Science Practices FAQ: Metadata for USGS Scientific Data.
Page last updated 1/2/24.