Capture-recapture meets big data: integrating statistical classification with ecological models of species abundance and occurrence
Advances in technologies such as remote cameras, noninvasive genetics, and bioacoustics provide massive quantities of electronic data. Much work has been done on automated (“machine learning”) classification methods that produce “sample class designations” (e.g., identification of species or individuals), which are then treated as observed data in ecological models. However, these “data” are actually derived quantities (or synthetic data) and are subject to several important sources of bias and error. If the derived quantities are used to make ecological determinations without accounting for these biases, the resulting inferences, which inform monitoring, conservation, and management, will be flawed. We propose to develop the concept of coupled classification, in which statistical classification models are linked to ecological models of species abundance or occurrence. In this new framework, classification (e.g., species identification) takes into account the local structure of populations, communities, and landscapes. It does not assume, as all current classification methods do, that the location where a sample is collected is independent of the class structure of the population. The proposed work addresses a significant bottleneck in the use of data from new technologies for monitoring and assessment of populations and communities: the lack of formal statistical frameworks that fully propagate uncertainty when automatically integrating observed digital monitoring data with ecological objectives of scientific and management concern. This connection between digital data and ecological objectives has yet to be made, except as outlined in our proposal. The proposed work is transformative because it provides a mechanism for directly integrating remotely sensed “big data” with ecological models while accounting for misclassification. A coupled classification system opens the possibility of fully automated data collection and processing.
Figure 1: Conceptual formulation of the coupled classification/ecological model. All modern technologies produce data that are subjected to a classification process (blue box), whose output is then used as data in a secondary (ecological) modeling process (yellow box). Presently, these two processes occur in isolation, with little communication or shared understanding between statisticians working on the classification process and statisticians working on the ecological models.
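The idea that classification should depend on where a sample was collected can be illustrated with a minimal sketch based on Bayes' rule. This is not the project's model, which would estimate the classifier and the ecological process jointly; it shows only one direction of the coupling, under simplifying assumptions. The function name, the two species, and all numeric values below are hypothetical and chosen purely for illustration.

```python
def posterior_class_probs(likelihoods, class_prior):
    """Combine classifier output with a prior over classes via Bayes' rule.

    likelihoods: P(sample features | class), one value per class
                 (hypothetical classifier output).
    class_prior: P(class | site), one value per class, e.g. derived from
                 an ecological model of occurrence or relative abundance.
    Returns P(class | sample features, site), normalized to sum to 1.
    """
    if len(likelihoods) != len(class_prior):
        raise ValueError("likelihoods and class_prior must have equal length")
    weighted = [lik * prior for lik, prior in zip(likelihoods, class_prior)]
    total = sum(weighted)
    if total <= 0:
        raise ValueError("no class has positive posterior weight")
    return [w / total for w in weighted]


# Hypothetical classifier output for one sample: species A looks more likely.
likelihoods = [0.6, 0.4]  # [species A, species B]

# Uncoupled classification: the site is ignored (flat prior over species).
uncoupled = posterior_class_probs(likelihoods, [0.5, 0.5])

# Coupled classification: an assumed ecological model says species A is
# rare at this site, so the site-specific prior favors species B.
coupled = posterior_class_probs(likelihoods, [0.1, 0.9])

print("uncoupled:", uncoupled)
print("coupled:  ", coupled)
```

With these illustrative numbers, the uncoupled rule assigns the sample to species A, while the site-aware rule assigns it to species B, because the assumed local population structure outweighs the modest difference in classifier scores.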
Principal Investigators:
J. Andrew Royle (USGS)
Angela K. Fuller (USGS)
Participants:
Kathi Irvine (Montana State University)
Terri Donovan (USGS CRU and Univ. Vermont)
Jackie Guzy (Caribbean-Florida WSC)
Kristen Hart (Caribbean-Florida WSC)
Wayne Thogmartin (USGS MESC)
Evan Grant (LSC Conte Fish Lab)
Richard Chandler (University of Georgia)
Ben Augustine (Cornell Univ)
Katie Banner (Montana State Univ)
Chris Wikle (Univ. Missouri)
Toryn Schafer (Univ. Missouri)
Maxwell Joseph (Univ. Colorado at Boulder)
Jonathan Gomes Selman (Stanford University)
- Source: USGS ScienceBase (id: 5d53203ce4b01d82ce8e2fd9)