Deep Storage Data Release

Deep storage data releases in ScienceBase enable USGS researchers to release large volumes of data that may not be feasible or cost-effective to release through traditional publishing paths.

As datasets grow in size, it may not always be feasible or cost-effective to host the data on standard ScienceBase on-premises or cloud storage. In these cases, the ScienceBase Data Release (SBDR) Team can help authors use BlackPearl storage that is made accessible through Globus. The SBDR Team refers to data released through BlackPearl/Globus as “Deep Storage” data releases.

 

What is BlackPearl? 

BlackPearl is an object-based storage platform with an S3-like interface. It can simplify workflows for managing large volumes of data on various storage targets. USGS provides disk-based and tape-based storage targets behind the BlackPearl interface. 

 

What is Globus? 

Globus is a service that allows users to efficiently, reliably, and securely move data between systems through a single web interface. Globus enables data sharing through “Guest Collections” on top of USGS storage systems, such as BlackPearl and S3 buckets. 

 

Under what circumstances are Deep Storage data releases appropriate?  

A Deep Storage data release may be appropriate when the data files in a release, taken together, total multiple terabytes (TB).

 

Under what circumstances are Deep Storage data releases NOT appropriate?  

Deep Storage data releases are NOT appropriate when the data authors need fast, programmatic access to data file content via web services (e.g., Cloud Optimized GeoTIFFs). For example, web applications or workflows requiring streaming data cannot be built on top of the data in this deep storage. 

 

What do I need to do to create a Deep Storage data release? 

  1. Request a ScienceBase data release landing page and digital object identifier (DOI) through the ScienceBase Data Release Tool, as you would for a standard data release. 
  1. If you have not used Globus before, log into Globus once to instantiate your account. Select U.S. Geological Survey in the organizational login dropdown menu and you will be routed through the USGS multifactor authentication flow. 

     (Screenshot: the Globus login screen displaying "U.S. Geological Survey" in the organizational login dropdown menu.)
  1. Contact the SBDR Team at sciencebase_datarelease@usgs.gov with information about your data, including the total size, number of individual files, and anticipated downstream use of the data. If the SBDR Team agrees that the data warrants a Deep Storage data release, they will create a Globus guest collection to host the data and will provide you with the URL. 
  1. Transfer your data to the Globus guest collection using Globus File Manager. If the files are located on a personal computer or server, you may need to install Globus Connect Personal on that machine to enable the transfer. Globus Connect Personal is available for Windows, Mac, and Linux. Please read the installation instructions thoroughly before installing. You should NOT need USGS service desk assistance to install the software. 
     

    Screenshot of the Globus File Manager interface with the Panels option circled in red
  1. Upload a collection metadata record to the ScienceBase landing page. 
  1. Add one of the following statements to the end of the summary on the ScienceBase landing page: 
    1. Data files for this data release are large and cannot be easily downloaded through standard protocols. Therefore, the data files are provided via a Globus Access Portal (linked under Related External Resources). Globus* is a fast, secure, and reliable way to move large data files and will automatically resume transfers when there are network disruptions. Learn more at https://www.globus.org/data-transfer.   
       
      * Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.  
       
    2. Data files for this data release are numerous and cannot be easily displayed for browsing in the ScienceBase interface. Therefore, the data files are provided via a Globus Access Portal (linked under Related External Resources). Globus* is a fast, secure, and reliable way to move large data files and will automatically resume transfers when there are network disruptions. Learn more at https://www.globus.org/data-transfer.   
       
      * Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government. 
  1. Contact the SBDR Team at sciencebase_datarelease@usgs.gov to make the data release public. 
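
The request to the SBDR Team in the steps above asks for the total size and number of individual files. As a minimal sketch, assuming the data sit in a single local directory tree, a short Python script can gather both numbers. The function names `summarize_directory` and `format_size` are hypothetical and are not part of any ScienceBase or Globus tool; sizes are reported in decimal units (1 TB = 10^12 bytes), which is also an assumption.

```python
import os
import sys
from pathlib import Path


def summarize_directory(root):
    """Return (file_count, total_bytes) for regular files under root.

    Symbolic links are skipped so that linked files are not counted twice.
    """
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_file() and not path.is_symlink():
                file_count += 1
                total_bytes += path.stat().st_size
    return file_count, total_bytes


def format_size(num_bytes):
    """Format a byte count in decimal units, capped at TB (assumed convention)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if size < 1000 or unit == "TB":
            return f"{size:.2f} {unit}"
        size /= 1000


if __name__ == "__main__":
    # Usage: python summarize.py /path/to/data  (defaults to the current directory)
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    count, total = summarize_directory(root)
    print(f"{count} files, {format_size(total)} ({total} bytes)")
```

The printed line can be pasted into the email to the SBDR Team alongside a description of the anticipated downstream use of the data.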

 

What steps will the SBDR Team take to finalize a Deep Storage data release? 

  1. Perform all the same checks that are performed for a standard data release (see ScienceBase Data Release Checklist). 
  1. Add a link from the ScienceBase data release landing page’s Related External Resources section to the Globus guest collection. 
  1. Upload a copy of the final XML metadata and the ScienceBase landing page JSON to the Globus guest collection. 
  1. Update permissions on the Globus guest collection: authors’ write permissions are removed and read permissions are granted to all Globus users. To access the data, users will need a free Globus account, which they can create with their institutional, GitHub, Google, or ORCID identity. 
  1. Make the ScienceBase landing page and DOI public. 