Preliminary machine learning models of manganese and 1,4-dioxane in groundwater on Long Island, New York

March 22, 2023

Manganese and 1,4-dioxane in groundwater underlying Long Island, New York, were modeled with machine learning methods to demonstrate the use of these methods for mapping contaminants in groundwater in the Long Island aquifer system. XGBoost, a gradient-boosted ensemble tree method, was applied to data from 910 wells for manganese and 553 wells for 1,4-dioxane. Explanatory variables included soil properties, groundwater flow, land use, and other features that describe the hydrogeology and geochemistry of the aquifer system. Four models were developed to predict the probability of manganese concentrations greater than a detection level of 10 micrograms per liter (μg/L) and greater than three threshold concentrations (50, 150, and 300 μg/L) relevant to drinking-water quality. One model was developed to predict the probability of 1,4-dioxane concentrations greater than a detection level of 0.07 μg/L. The 1,4-dioxane model was limited geographically to Suffolk County because of data availability. Predictions were made for two layers in the upper glacial aquifer and three layers in the Magothy aquifer, which are the upper two of the three major aquifers of the Long Island aquifer system.
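The threshold-based setup described above implies one binary target per concentration level, with a separate classifier trained on each. The following is a minimal sketch of that labeling step only, not the report's actual workflow: the function names and the sample concentrations are hypothetical, and the handling of nondetects and censored values used in the study is not described here and is not reproduced. Only the four manganese levels (10, 50, 150, and 300 μg/L) come from the text.

```python
# Manganese levels named in the abstract, in micrograms per liter (ug/L).
MANGANESE_THRESHOLDS_UG_L = [10, 50, 150, 300]


def exceedance_labels(concentrations, threshold):
    """Return 1 where a concentration is strictly greater than the threshold, else 0."""
    return [1 if c > threshold else 0 for c in concentrations]


def build_targets(concentrations, thresholds=MANGANESE_THRESHOLDS_UG_L):
    """Build one binary target vector per threshold (one per classifier)."""
    return {t: exceedance_labels(concentrations, t) for t in thresholds}


# Hypothetical concentrations (ug/L) for five wells; illustrative, not real data.
sample = [4.0, 12.0, 75.0, 150.0, 420.0]
targets = build_targets(sample)

for threshold, labels in targets.items():
    print(f"> {threshold} ug/L: {labels}")
```

Each target vector could then be passed, together with the explanatory variables, to a probabilistic classifier such as a gradient-boosted tree model, which would return a probability of exceedance for each prediction location.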

The objective of the study described in this report was to demonstrate the application of the methods rather than to develop precise estimates of manganese or 1,4-dioxane concentrations at any given location. The predictive models developed in the study are considered preliminary in the sense that they are an initial effort at developing these kinds of models specifically for Long Island. The models could be improved by the inclusion of additional data, by the use of methods to improve the modeling of infrequent high concentrations of manganese and 1,4-dioxane (above threshold concentrations), and by including more explanatory variables that specifically describe conditions and contaminant sources on Long Island. Nonetheless, the distribution of model predictions and the influence of explanatory variables in the models were consistent with the expected relations between contaminant concentrations and groundwater-flow-system characteristics and the distribution of manmade sources.

Mapped predictions indicated that manganese detections were more probable in the upper glacial aquifer and along the southern shore of Long Island, consistent with the distribution of anoxic conditions in groundwater in the Long Island aquifer system. Manganese was infrequently predicted at concentrations greater than thresholds of concern for drinking-water quality in any of the aquifer layers. Detections of 1,4-dioxane were predicted in the western, more highly developed parts of Suffolk County, in the upper glacial aquifer and the top and middle layers of the Magothy aquifer, and in northwestern Suffolk County in the bottom layer of the Magothy aquifer. Although preliminary in nature and based on limited data, these mapped predictions can be used to identify, in general terms, areas where manganese and 1,4-dioxane may be present at concentrations of concern, to prioritize areas for future monitoring, and to guide future modeling and mapping efforts.

Publication Year: 2023
Title: Preliminary machine learning models of manganese and 1,4-dioxane in groundwater on Long Island, New York
DOI: 10.3133/sir20225120
Authors: Leslie A. DeSimone
Publication Type: Report
Publication Subtype: USGS Numbered Series
Series Title: Scientific Investigations Report
Series Number: 2022-5120
Index ID: sir20225120
Record Source: USGS Publications Warehouse
USGS Organization: New England Water Science Center; Advanced Research Computing (ARC)