High-Resolution, Interagency Biosurveillance of Threatened Surface Waters in the United States
Advances in information technology now enable large-volume, high-frequency data collection, which may improve real-time biosurveillance and forecasting. However, big data streams present challenges for data management and timely analysis. As a first step in creating a data science pipeline for translating large datasets into meaningful interpretations, we created a cloud-hosted PostgreSQL database that collates climate data served from PRISM (https://climatedataguide.ucar.edu/climate-data) and water-quality data from the National Water Quality Portal (https://www.waterqualitydata.us/) and NWIS (https://waterdata.usgs.gov/nwis; fig. 1). Python-based code queries and updates these data streams every 24 hours, and the spatial and temporal extent of the data is delineated by the locations and frequencies of environmental DNA (eDNA) sampling (T. bryosalmonae, invasive smallmouth bass, and E. coli) at USGS streamgages in the Yellowstone River, Montana. Following additional processing, the data are formatted for Bayesian hierarchical occupancy analysis to estimate eDNA detection probabilities and to relate these probabilities to attributes from the different data streams.
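The step of restricting the data streams to the eDNA sampling locations and dates, and then arranging them for occupancy analysis, can be illustrated with a minimal Python sketch. This is not the project's code: the site identifiers, dates, temperature values, and the function name `build_detection_histories` are all hypothetical, and the in-memory lists stand in for rows that would come from the PostgreSQL database. It shows only one plausible way to pair each eDNA sample with the covariate recorded at the same site on the same day.

```python
from collections import defaultdict
from datetime import date

# Hypothetical eDNA results: (site_id, sampling date, target detected?)
edna_samples = [
    ("site_A", date(2019, 7, 1), True),
    ("site_A", date(2019, 7, 8), False),
    ("site_B", date(2019, 7, 1), False),
    ("site_B", date(2019, 7, 8), False),
]

# Hypothetical daily covariate rows: (site_id, date, water temperature in deg C)
daily_covariates = [
    ("site_A", date(2019, 7, 1), 14.2),
    ("site_A", date(2019, 7, 2), 14.9),  # no eDNA sample that day, so unused
    ("site_A", date(2019, 7, 8), 16.1),
    ("site_B", date(2019, 7, 1), 12.7),
    ("site_B", date(2019, 7, 8), 13.4),
    ("site_C", date(2019, 7, 1), 11.0),  # site never sampled for eDNA, so unused
]


def build_detection_histories(samples, covariates):
    """Return {site: [(date, detection as 0/1, covariate or None), ...]}.

    Only sites and dates that appear in the eDNA samples are kept, so the
    sampling design delineates which covariate rows are used.
    """
    lookup = {(site, day): value for site, day, value in covariates}
    histories = defaultdict(list)
    # Sort by site, then date, so each history is in chronological order.
    for site, day, detected in sorted(samples, key=lambda row: (row[0], row[1])):
        histories[site].append((day, int(detected), lookup.get((site, day))))
    return dict(histories)


histories = build_detection_histories(edna_samples, daily_covariates)
for site, visits in histories.items():
    print(site, [(day.isoformat(), y, x) for day, y, x in visits])
```

Each site's list is a detection history (1 for a detection, 0 for a non-detection) with a matching covariate per visit, which is the general shape of input that occupancy models expect. A covariate of `None` marks a visit with no matching record and would need to be handled before model fitting.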
Principal Investigator: Sara L Eldridge
Co-Investigators: Elliott P Barnhart, Adam J Sepulveda
Image caption: Schematic representation of the processing steps to create a Digital Ocean PostgreSQL database for users to synthesize and analyze large and disparate environmental data streams.
- Source: USGS ScienceBase (id: 5cd2055ee4b09b8c0b7a59ba)