ScienceBase Updates - Winter 2021

Winter 2021 topics include information on center names in ScienceBase, revision guidance updates, cleaning data with OpenRefine, a tip on file order in ScienceBase, and a featured data release on global distribution of critical minerals.

Center Names in ScienceBase

Screenshot of a ScienceBase data release landing page with the SDC Data Owner field circled in red.

As part of Bureau-wide efforts to better cross-link primary systems and support flexible information retrieval, ScienceBase now uses USGS center names from a controlled list to assign a Data Owner to every ScienceBase data release. This list, consumed from a machine-accessible web service, provides an unambiguous set of active USGS centers for the current fiscal year and ensures accuracy and consistency in labeling across multiple tools and systems. Values from the authoritative list of center names are used to populate the 'SDC Data Owner' field for data release products and enable browsing and querying for data by center, both in ScienceBase and in the Science Data Catalog. These center name values are also used to assign ScienceBase data releases, as well as other automated product content, to a specific center (with consistency in spelling and display) in the Drupal Content Management System that populates USGS web pages.

Examples of these ScienceBase queries, built using names as a structured query parameter, can be seen here.

Background on the USGS Centers Web service

The USGS center list used to manage ScienceBase data releases is pulled from the Science Inventory – Proposals to Products (SIPP) (requires USGS internal network access) web services maintained by the Water Mission Area Business Analytics Team. These web services provide machine access (from the internal USGS network) to organizational information for USGS business and science operations. USGS Science Data Management Branch (SDM) staff have worked in collaboration with the SIPP developers to maintain and refine these services, not only for delineating active USGS centers but also to support auto-fill features and internal linking across systems (e.g., ScienceBase, the USGS DOI tool, the Science Data Catalog, the USGS Publications Warehouse, center workflows, etc.).

In the web service, a center is defined as a logical grouping of cost centers and Federal Payroll and Personnel System (FPPS) organizations, usually with a director and a requirement to operate independently. This approach is based on the 'Cost Center' definition in Survey Manual 320.1 and includes additional fields to facilitate cross-linking between organizational resources in the USGS.

Each center has a unique two- or three-character alphanumeric code derived from the codes for each of its cost centers. Active status is determined in collaboration with the Office of Accounting and Funds Management (OAFM) (requires USGS internal network access) by identifying the accounts and cost centers that require funding. At the beginning of each fiscal year, the list is reevaluated to determine which centers have merged, split, or become inactive, and the web service is updated accordingly. Changes in the USGS centers web service (and in any downstream picklists that pull from this list, such as the one in the ScienceBase Data Release (SBDR) Tool) will not occur until the change is implemented in our financial systems, usually at the beginning of a fiscal year. SDM staff and SIPP managers work with Center Directors and Deputy Directors to ensure the spelling and format of the names for any merged, split, or renamed centers are correct in the web service.
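As a rough illustration of the code format described above, the following Python sketch checks whether a string looks like a center code. It assumes that "alphanumeric" means ASCII letters and digits only; the function name is hypothetical, and the check says nothing about whether a code is actually assigned to an active center.

```python
import re

# Assumption: a center code is two or three ASCII letters or digits.
CENTER_CODE_PATTERN = re.compile(r"[A-Za-z0-9]{2,3}")


def is_plausible_center_code(code: str) -> bool:
    """Return True if `code` is two or three ASCII alphanumeric characters.

    This is a format check only; it does not consult the centers web service.
    """
    return CENTER_CODE_PATTERN.fullmatch(code) is not None
```

A check like this could be used to catch obvious typing errors before a code is looked up in an authoritative list.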

Maintaining Center Names in ScienceBase

At the beginning of each fiscal year, the SBDR Team syncs the ScienceBase Active Center List with the USGS SIPP Centers service. It is this list that is currently driving picklists in the SDM's suite of tools, including the SBDR Tool and the USGS DOI Tool. When center names are updated, the SBDR Team will automatically update the names for the Science Data Catalog (SDC) Data Owner on the ScienceBase data release landing pages for that center. Likewise, when centers merge, the SDC Data Owner for each data release from the merged centers will be updated with the new merged name. In the case of centers that split, the SBDR Team will work with the centers' data manager(s) to determine how to divide the existing data releases. If a center is deprecated, the SDC Data Owner is left as is on its data releases, unless an active center decides to take on ownership of those data.
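The rename and merge rules described above can be sketched in Python as a simple lookup. This is only an illustration, not the SBDR Team's actual process: the `data_owner` key, the function name, and the center names in the usage example are all hypothetical. Centers that split are not handled, because the text says those cases are decided together with the centers' data managers.

```python
def update_data_owners(releases, renamed):
    """Return copies of `releases` with renamed or merged owners updated.

    `releases` is a list of dicts with a hypothetical "data_owner" key.
    `renamed` maps an old center name to its new (renamed or merged) name.
    Owners absent from `renamed`, including deprecated centers, are left as is.
    """
    updated = []
    for release in releases:
        owner = release["data_owner"]
        new_release = dict(release)  # shallow copy; the input is not modified
        new_release["data_owner"] = renamed.get(owner, owner)
        updated.append(new_release)
    return updated
```

For example, with made-up names, `update_data_owners([{"title": "t", "data_owner": "Example Center A"}], {"Example Center A": "Example Merged Center"})` returns the release with "Example Merged Center" as its owner, while a release owned by a center missing from the mapping is returned unchanged.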

Screenshot of ScienceBase's list of active organizations

Learn More

If you have questions about why a data release is labeled in a particular way, please reach out to the SBDR Team at sciencebase_datarelease@usgs.gov.

For additional questions about the SIPP web services, please contact Brian Reece (bdreece@usgs.gov).

Users interested in working with the SIPP services and other USGS data resources programmatically should contact Brandon Serna (bserna@usgs.gov) or Drew Ignizio (dignizio@usgs.gov).

 

Did You Know?

Attached files in ScienceBase are displayed in the order in which they are uploaded. ScienceBase users sometimes ask if it's possible to change file order without re-uploading all their files (large files can take a significant amount of time to upload, and some authors have numerous large files in their data releases). Although it is possible to reorder attached files 'on the fly' through the user interface, this does not change the order of the files in the underlying item JSON. If you need to change the file order in the JSON itself, the ScienceBase team has a Python script that can reorder attached files alphabetically by filename. If you'd like to access the script, or would like ScienceBase staff to run it on an item, please let us know at sciencebase_datarelease@usgs.gov.
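The idea behind such a reordering can be sketched in a few lines of Python. This is not the ScienceBase team's script; it is a minimal illustration that assumes the item JSON has been loaded into a dict with a `files` list whose entries each carry a `name` key, and that a case-insensitive sort is wanted. Saving the reordered item back to ScienceBase is out of scope here.

```python
def sort_files_by_name(item):
    """Return a copy of `item` with its attached files sorted by filename.

    Assumptions: `item` is a dict parsed from item JSON, with an optional
    "files" list of dicts that each have a "name" key. The sort ignores case.
    """
    reordered = dict(item)  # shallow copy; the original dict is not modified
    reordered["files"] = sorted(
        item.get("files", []),
        key=lambda file_entry: file_entry["name"].lower(),
    )
    return reordered
```

Because only the list order changes, the files themselves would not need to be uploaded again.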

Revision Guidance Updates

Do you need to make a change or update to a published data release? Guidance for data release revision has recently been updated.

Revision Levels 

The first step in revising a data release is to determine under which revision level the changes fall. There are now five revision levels: 

  • Level 1 revisions do not change the data itself. 
  • Level 2 revisions are changes that are not expected to have a significant impact on the use of the data, and only apply to a small number of data values. 
  • Level 3 revisions are data-appending revisions, usually adding new data without changing the data structure.
  • Level 4 revisions are changes to data structure that are expected to cause issues for existing automated processes that have been using the data. 
  • Level 5 revisions are changes that are expected to have such a significant impact on the use of the data that the original data must be withdrawn. 

Note: Revision levels 2-5 require approval in IPDS for any updates.
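For anyone tracking revisions in a script, the five levels and the approval note above can be captured as a small lookup. This is a sketch only: the dictionary and function names are hypothetical, and the descriptions are shortened paraphrases of the list above.

```python
# Short paraphrases of the five revision levels listed above.
REVISION_LEVELS = {
    1: "No change to the data itself",
    2: "Minor changes to a small number of data values",
    3: "Data appended without changing the data structure",
    4: "Data structure changed; may break automated processes",
    5: "Significant impact; original data must be withdrawn",
}


def requires_ipds_approval(level: int) -> bool:
    """Return True for revision levels 2-5, which require approval in IPDS."""
    if level not in REVISION_LEVELS:
        raise ValueError(f"Unknown revision level: {level}")
    return level >= 2
```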

Also, new to the guidance is a table outlining the steps to take for each revision level (portion of the table shown below).

Table showing data release revision levels.

Common elements for revision levels 2-5 include the following: a version history text (.txt) file that details the changes made, updates to the metadata to include any new processing steps, addition of an 'update' date to the DOI, and inclusion of a versioning element in the title and citation (e.g., ver. 2.0, January 2021). Specifics for these changes may vary by revision level.
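The versioning element follows a simple pattern, which the Python sketch below reproduces from the single example given above. The function name and its parameters are hypothetical, and the guidance itself remains the authority on how version numbers should be chosen.

```python
MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def version_label(major: int, minor: int, year: int, month: int) -> str:
    """Build a label in the style of the example "ver. 2.0, January 2021"."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1-12, got {month}")
    return f"ver. {major}.{minor}, {MONTHS[month - 1]} {year}"
```

Calling `version_label(2, 0, 2021, 1)` returns the string "ver. 2.0, January 2021".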

We advise contacting the SBDR Team at the start of your revision. In addition to offering revision tips, the SBDR Team can help with things like duplicating the current landing page to create a working copy, determining the best structure for the revision, and planning for periodic data updates.

Please email sciencebase_datarelease@usgs.gov for more on data release revision.

Featured Data Release

critical mineral map
Approximate locations of mines, deposits, and districts where critical minerals are found. The critical minerals are discussed in USGS Professional Paper 1802, and many of these locations are described in further detail in that report.

Labay, K., Burger, M.H., Bellora, J.D., Schulz, K.J., DeYoung, J.H., Jr., Seal, R.R., II, Bradley, D.C., Mauk, J.L., and San Juan, C.A., 2017, Global Distribution of Selected Mines, Deposits, and Districts of Critical Minerals: U.S. Geological Survey data release, https://doi.org/10.5066/F7GH9GQR.

USGS Data Owner: Mineral Resources Program

Mineral research has been considered a priority research topic in the U.S. and globally in recent years. This release contains a geodatabase focusing on the global distribution of selected mineral resource features (deposits, mines, districts, mineral regions) for 22 minerals or mineral commodities considered critical to the economy and security of the United States as of 2017. The related publication for this data release has been cited by eleven news outlets, tweeted 82 times, and read by 221 Mendeley users. Most significantly, the data contained in this release and publication have been used to inform policy decisions for the U.S. (Critical Minerals and U.S. Public Policy, 2019) and used to inform international publications (Study on the EU's list of critical raw materials, 2020).

Use of data for decision-making and policy building, as shown in the example above, is only one way of measuring the impact of a data release. If you know of a data product available in ScienceBase that has gone on to be reused in other projects, inform policy decisions, garner attention in major media outlets, or any other interesting use, we'd love to hear about it. Please complete this form to contribute your data story.

 

OpenRefine     

When working with data, it can be important to map information from a raw format to one that can be easily refined and analyzed, which is not always a trivial task. OpenRefine, previously known as Google Refine, is a free, open source software application for working with messy data, and it is well suited to organizing and formatting information in large datasets. With this tool, the user can clean up misspellings, split columns, track changes, and more. OpenRefine is also useful for getting a quick snapshot of a dataset's content and resolving inconsistencies.

OpenRefine supports a variety of file formats, including CSV and XML. Data can also be transformed within the OpenRefine interface using expression languages such as General Refine Expression Language (GREL), Python (through Jython), and Clojure. All actions performed on a dataset can easily be undone. Working with multiple data files often requires repeating the same steps; OpenRefine can make the process more efficient by replaying actions on multiple datasets, saving time and effort.
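To give a sense of the kind of inconsistency that this sort of cleanup resolves, the Python sketch below groups values that differ only in case, punctuation, surrounding whitespace, or word order. It is loosely modeled on the "fingerprint" style of key-based clustering and is not OpenRefine's own code; the function names are hypothetical. It does not catch true misspellings, which need fuzzier matching.

```python
import string
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Normalize a value: trim, lowercase, drop ASCII punctuation, sort unique words."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values sharing a fingerprint; keep groups with more than one distinct spelling."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]
```

Each group returned by `cluster` is a set of candidate variants that a person would then review and merge into a single preferred spelling.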

You can download and install the software from the OpenRefine homepage. The list of External Resources located on the OpenRefine wiki page is also a good place to get additional ideas and recipes for ways to work with data.

*Disclaimer: Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government. ScienceBase is not affiliated with OpenRefine.

Subscribe to the ScienceBase Mailing List for Quarterly Updates.