ScienceBase Updates - Winter 2022

Winter 2022 topics include information on Section 508 and data release, open data policies, a tip on the ScienceBase Cloud Upload, and a featured data release on Landsat burned area products for the U.S.

ScienceBase Updates Header

Section 508 and Data Release

Section 508 is federal policy under the Rehabilitation Act, and requires the USGS “to ensure any information and communication technology (ICT) it develops, procures, uses, or maintains is accessible to both Federal employees and members of the public with disabilities” (USGS Survey Manual 600.6). While Section 508 compliance has been a major topic of discussion for USGS staff who manage websites and prepare reports for publication, it can often be unclear to data authors what exactly they need to do to make their data release compliant with Section 508. The ScienceBase Data Release (SBDR) team has compiled some resources and answers for data authors, in coordination with the Science Publishing Network (SPN), and helpful USGS data managers and center staff. A few quick tips are outlined below. Note that ScienceBase will flag non-compliant documents as best we can, but ultimately, it is up to data authors and the internal review process to ensure that data and related documents are compliant. 

Whenever possible, use open, machine-readable formats 

Using open, machine-readable formats (CSV, TXT, ASCII, XML, JSON, etc.) often means automatic compliance with Section 508. Machine-readable formats can generally be read by screen readers, and thus fulfill accessibility requirements. Below are a few considerations for making sure your machine-readable files are fully accessible:

  • Ensure that spreadsheets do not rely on colors or other visual elements to convey meaning that the text does not convey.
  • Use descriptive file names that convey what is within the document (e.g., avoid non-descriptive names like “Table 1” or “Bird Data”).
  • Verify that the data structure and any data tables are in a logical reading order and do not require visual aids to understand. For example, avoid merged cells, as it can be difficult to understand how these cells fit into the data structure without visual cues. 
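Some of these structural problems can be caught automatically before review. The following is a minimal Python sketch, not an official ScienceBase check: the function name check_csv_structure and the sample data are hypothetical, and the sketch assumes that blank header cells and rows of uneven length are signs of merged or decorative cells left over from a spreadsheet export.

```python
import csv
import io


def check_csv_structure(text):
    """Return a list of human-readable problems found in CSV text."""
    problems = []
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    header = rows[0]
    # Blank header cells are a common leftover of merged or decorative cells.
    for index, name in enumerate(header, start=1):
        if not name.strip():
            problems.append(f"column {index} has no header name")

    # Every data row should have exactly as many fields as the header.
    # The header counts as row 1, so the first data row is row 2.
    for row_number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {row_number} has {len(row)} fields, expected {len(header)}"
            )
    return problems


# Invented sample data for illustration only.
good = "site,date,species_count\nA1,2021-06-01,12\nA2,2021-06-02,9\n"
bad = "site,,species_count\nA1,2021-06-01,12\nA2,9\n"

print(check_csv_structure(good))
print(check_csv_structure(bad))
```

An empty list means no problems were detected by these two checks; it does not by itself establish Section 508 compliance.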

Embed alternative text in open image formats 

Whenever possible, images released through a data release should be in an open format (PNG, TIFF, JPG, etc.), with alternative text for each image embedded within the image’s metadata. If data authors only need to add alternative text for a few images, it can be easiest to edit the file directly. For PC users, this means right-clicking on the image within the file explorer, going to ‘Properties’, and then ‘Details’. Here, you can add a title and descriptive text in the ‘title’ and ‘subject’ fields (fig. 3). 

Details tab in the properties of a jpg image, focusing on the title and subject fields

Mac users can control-click the image and choose ‘Get Info’ (or press Command-I) to reach a similar screen.

To add alternative text to a large batch of image files where the text may only need to differ in small ways (e.g., site name), the SBDR team can share Python scripts created by the Geology, Geophysics, and Geochemistry Science Center’s data managers to help you get started.

Finally, imagery headers (e.g., EXIF, or exchangeable image file format) can be an option for embedding metadata into an image. In EXIF, for example, ImageDescription would be an appropriate field in which to place alternative text. There are several free tools available for helping users edit EXIF metadata, including some batch editing capabilities. Examples include ExifTool (https://exiftool.org/) and XnView MP (https://www.xnview.com/en/xnviewmp/).
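As a rough illustration of batch editing, the Python sketch below builds one ExifTool command per image from a text template that differs only by site name. It is not the script mentioned above. The file names, the template wording, and the helper name build_exiftool_command are hypothetical. The sketch only prints the commands so they can be reviewed, and it assumes ExifTool is installed if you choose to run them.

```python
import shlex


def build_exiftool_command(image_path, alt_text):
    """Build an ExifTool argument list that writes alt text to ImageDescription."""
    return ["exiftool", f"-ImageDescription={alt_text}", image_path]


# Hypothetical template and site list; replace with your own wording and files.
TEMPLATE = "Photograph of vegetation plot at site {site}, taken during the 2021 field season."
images = {
    "site_A1_plot.jpg": "A1",
    "site_B2_plot.jpg": "B2",
}

commands = [
    build_exiftool_command(path, TEMPLATE.format(site=site))
    for path, site in images.items()
]

for command in commands:
    # Print the command for review; to actually run it, pass the list to
    # subprocess.run(command, check=True) on a machine with ExifTool installed.
    print(shlex.join(command))
```

By default ExifTool keeps a backup of each edited file with an "_original" suffix, which makes it easier to recover if the text needs to be corrected.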

Avoid PDFs and Word documents 

PDFs and Word documents are the data release documents most commonly flagged as non-compliant with Section 508. These document types are often also the most difficult to remediate, as they need to include a logical reading order, alternative text for any images or figures, and appropriate tagging of document elements and tables. If possible, it is best to avoid these formats and use .txt or .csv instead.

If you must use these formats, there are resources on www.section508.gov that can help you make PDFs and Word documents 508 compliant. Adobe Acrobat DC/Pro includes an accessibility checker that is helpful for checking your PDF (fig. 3), as does Microsoft Word (fig. 4). The USGS Section 508 Coordinator demonstrated remediating both PDF and Microsoft Word documents during the Community for Data Integration’s Data Management Working Group June 2021 monthly meeting. View or download the recording here.

Screenshot of the accessibility checker in Adobe PDF

 

Screenshot of the accessibility checker within Microsoft Word

Did You Know?

The ScienceBase Cloud Upload / File Manager is now the best way to upload files larger than 1 GB. It has officially replaced the Large File Uploader. Learn more about using the new File Manager on the ScienceBase About Site.

Screenshot of the ScienceBase Cloud Upload/File Management setting section
Screenshot of ScienceBase Cloud Upload interface

 

Featured Data Release

Heat map of the United States

Caption: All burned areas mapped at 30-m resolution across the conterminous United States from 1984 through 2018 (Hawbaker and others, 2020). 

Hawbaker, T.J., Vanderhoof, M.K., Schmidt, G.L., Beal, Y., Picotte, J.J., Takacs, J.D., Falgout, J.T., and Dwyer, J.L., 2020, The Landsat Burned Area products for the conterminous United States (ver. 2.0, October 2021): U.S. Geological Survey data release, https://doi.org/10.5066/P9QKHKTQ

USGS Data Owner: Geosciences and Environmental Change Science Center 

Wildfires have been a major subject of research in recent years as fires become more severe and widespread. In response to the need for more accurate data on burned areas, the USGS created an algorithm that identifies burned areas in temporally dense time series of Landsat Analysis Ready Data (ARD) scenes to produce the Landsat Burned Area Products. Some products of this effort included in the data release are the maximum burn probability (BP); the burn classification count (BC), or the number of scenes in which a pixel was classified as burned; the filtered burn classification (BF), with burned areas persistent from the previous year removed; and the burn date (BD), or the Julian date of the first Landsat scene in which a burned area was observed.
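For readers working with the burn date (BD) values, a Julian date can be converted to a calendar date with a few lines of Python. This sketch assumes that BD is stored as a 1-based day of year (1 = January 1) and that the year is known from the annual product; neither detail is stated here, so check the data release metadata before relying on it. The function name day_of_year_to_date is hypothetical.

```python
from datetime import date, timedelta


def day_of_year_to_date(year, day_of_year):
    """Convert a 1-based day of year (1 = January 1) to a calendar date."""
    # The number of days in the year accounts for leap years.
    days_in_year = (date(year + 1, 1, 1) - date(year, 1, 1)).days
    if not 1 <= day_of_year <= days_in_year:
        raise ValueError(f"day_of_year must be between 1 and {days_in_year}")
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)


# The same day of year falls on different calendar dates in leap years.
print(day_of_year_to_date(2018, 200))
print(day_of_year_to_date(2016, 200))
```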

The data release landing page in ScienceBase has been heavily accessed since its publication in 2020, with over 1,900 downloads of the data, and over 10,000 visits to the main landing page, not including APIs (data obtained from the ScienceBase Data Release Summary Dashboard). The related publication has also received attention, having been cited by 22 other publications since July 2020. 

If you know of a data product available in ScienceBase that has gone on to be reused in other projects, to inform policy decisions, to garner attention in major media outlets, or to serve any other interesting purpose, we'd love to hear about it. Please complete this form to contribute your data story.

 

Open Data Policies

The OPEN Government Data Act (S. 760 / H.R. 1770) is part of the Foundations for Evidence-Based Policymaking Act of 2018 (H.R.4174), which was signed into law in early 2019. It requires that data assets published by Federal agencies be 1) machine-readable, 2) in an open format, and 3) made available under an open license.  

What does this mean for USGS data authors? Here's some additional information that may be helpful.  

Machine-readable means that, to the extent possible, data are stored in a format that is consistent in structure, exhibits clean formatting (e.g., columns, values, etc.), and is well-documented such that it can be opened and read with common software libraries and tools.

Let's say you are publishing a journal article, with supporting data in an associated data release. The data are presented in both places. The journal article contains a table that summarizes the data in an image (fig. 1). There is extra formatting to make it easier for a person to read, but it isn't considered machine-readable.  

Example of a formatted table in a publication

The corresponding data release contains the original data in CSV format (part of which is displayed in fig. 2). It's structured in a way that makes it easy for analyses to be run on the data. For example, the CSV file contains just one data table and doesn’t have extra descriptions or formatting that would require a person to read through to make sense of what data are present, or what they represent. Instead, a computer can programmatically ingest the data. This makes it an example of a machine-readable file.  

Example of a .csv with data
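To make "programmatically ingest" concrete, here is a short Python sketch that reads a single-table CSV and summarizes it with no manual cleanup. The column names and values are invented for illustration and are not taken from the data release shown in the figure; mean_by_group is a hypothetical helper.

```python
import csv
import io
from statistics import mean

# Hypothetical sample standing in for a data release CSV; the column names
# and values are invented for illustration only.
SAMPLE = """site,treatment,stem_density
M1,restored,14
M2,restored,18
R1,reference,22
R2,reference,26
"""


def mean_by_group(csv_text, group_column, value_column):
    """Average a numeric column for each distinct value of a grouping column."""
    groups = {}
    # DictReader uses the first row as column names, so no manual parsing is needed.
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups.setdefault(row[group_column], []).append(float(row[value_column]))
    return {name: mean(values) for name, values in groups.items()}


print(mean_by_group(SAMPLE, "treatment", "stem_density"))
```

This works only because the file has one header row and one consistent table. A formatted table with title rows, footnotes, or merged cells would need hand editing before the same code could read it.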

An open format is one that is free to use, has public specifications, and is platform independent, which means that both non-proprietary and proprietary applications can read and create files in this format. Examples include CSV, GeoTIFF, shapefiles, XML, netCDF, and PNG. 

If you use proprietary formats in your data release, please also include a copy of the data in an additional, more open format, when possible. 

A note: the ScienceBase team encourages authors to provide tabular data in open formats such as CSV or tab delimited, as opposed to or in addition to Excel formats. Excel is technically an open format, but it supports features that can make it less machine-readable (e.g., special formatting).  

Lastly, here is some information about open licenses.

Data created and released by the USGS are by default in the public domain (the equivalent of a CC0 license). The OPEN Data Act doesn’t allow restrictions on data use. For example, a data author can't require that users contact the author before using their data. 

If the USGS is obtaining data produced by non-Federal parties, the parties may request that an open license be applied to the data, instead of the default public domain. An open license is one that does not place restrictions on “copying, publishing, distributing, transmitting, citing, or adapting” data (OPEN Data Act). For example, a CC BY license is considered open because there are no restrictions on how people use the data; the only requirement is for attribution. If more restrictions are necessary, these should be documented in a legal agreement and data management plan at the beginning of the project. 

An additional note: The ScienceBase team shares guidance and information (such as this summary of the OPEN Data Act) with the goal of streamlining and improving compliance with policies. Responsibility for ensuring that the requirements of the OPEN Data Act are met rests with the approving officials and data authors. If a data release has been reviewed and approved in IPDS, ScienceBase can host the data.

References:

Open, Public, Electronic, and Necessary (OPEN) Government Data Act, S.760 / H.R. 1770, 115th Cong. (2017-2018). 

H.R.4174 - 115th Congress (2017-2018): Foundations for Evidence-Based Policymaking Act of 2018. (2019, January 14). https://www.congress.gov/bill/115th-congress/house-bill/4174.

Howard, R.J., 2019, Plant community establishment in a coastal marsh restored using sediment additions, Barataria Basin, Louisiana: U.S. Geological Survey data release, https://doi.org/10.5066/P9VGVX76

Howard, R.J., Rafferty, P.S., and Johnson, D.J., 2020, Plant Community Establishment in a Coastal Marsh Restored Using Sediment Additions: Wetlands v. 40, p. 877–892. https://doi.org/10.1007/s13157-019-01217-z

Subscribe to the ScienceBase Mailing List for Quarterly Updates.