Techniques to improve ecological interpretability of black box machine learning models

October 28, 2021

Statistical modeling of ecological data is often faced with a large number of variables as well as possible nonlinear relationships and higher-order interaction effects. Gradient boosted trees (GBT) have been successful in addressing these issues and have shown a good predictive performance in modeling nonlinear relationships, in particular in classification settings with a categorical response variable. They also tend to be robust against outliers. However, their black-box nature makes it difficult to interpret these models. We introduce several recently developed statistical tools to the environmental research community in order to advance interpretation of these black-box models. To analyze the properties of the tools, we applied gradient boosted trees to investigate biological health of streams within the contiguous USA, as measured by a benthic macroinvertebrate biotic index. Based on these data and a simulation study, we demonstrate the advantages and limitations of partial dependence plots (PDP), individual conditional expectation (ICE) curves and accumulated local effects (ALE) in their ability to identify covariate–response relationships. Additionally, interaction effects were quantified according to interaction strength (IAS) and Friedman’s H² statistic. Interpretable machine learning techniques are useful tools to open the black-box of gradient boosted trees in the environmental sciences. This finding is supported by our case study on the effect of impervious surface on the benthic condition, which agrees with previous results in the literature. Overall, the most important variables were ecoregion, bed stability, watershed area, riparian vegetation and catchment slope. These variables were also present in most identified interaction effects. In conclusion, graphical tools (PDP, ICE, ALE) enable visualization and easier interpretation of GBT but should be supported by analytical statistical measures. Future methodological research is needed to investigate the properties of interaction tests. Supplementary materials accompanying this paper appear on-line.
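
The relationship between ICE curves and a PDP can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_predict` is a hypothetical stand-in for the prediction function of a fitted model such as a GBT, and the helper names `ice_curves` and `partial_dependence` are assumptions made for illustration. Each ICE curve varies one covariate over a grid while holding an observation's other covariates fixed, and the PDP is the pointwise average of those curves.

```python
from statistics import mean


def ice_curves(predict, rows, feature, grid):
    """Return one ICE curve per row: vary `feature` over `grid`,
    keeping every other covariate of that row at its observed value."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = dict(row)       # copy so the original row is untouched
            modified[feature] = value  # overwrite only the feature of interest
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(curves):
    """PDP: pointwise average of the ICE curves at each grid value."""
    return [mean(points) for points in zip(*curves)]


# Hypothetical black-box model with a pure interaction between x1 and x2.
def toy_predict(row):
    return row["x1"] * row["x2"]


rows = [{"x1": 0.0, "x2": -1.0}, {"x1": 0.0, "x2": 1.0}]
grid = [0.0, 1.0, 2.0]

curves = ice_curves(toy_predict, rows, "x1", grid)
pdp = partial_dependence(curves)
```

In this toy setting the two ICE curves have opposite slopes while the PDP is flat at zero, which shows how an averaged curve can hide an interaction that the individual curves reveal. This is one reason graphical tools may benefit from being paired with analytical interaction measures.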

Publication Year	2022
Title	Techniques to improve ecological interpretability of black box machine learning models
DOI	10.1007/s13253-021-00479-7
Authors	Thomas Welchowski, Kelly O. Maloney, Richard M. Mitchell, Matthias Schmid
Publication Type	Article
Publication Subtype	Journal Article
Series Title	Journal of Agricultural, Biological, and Environmental Statistics
Index ID	70225740
Record Source	USGS Publications Warehouse
USGS Organization	Eastern Ecological Science Center

Related Content

Kelly O Maloney, Ph.D.

Research Ecologist
