A cross-validation package driving Netica with python
Bayesian networks (BNs) are powerful tools for probabilistically simulating natural systems and emulating process models. Cross-validation is a technique for detecting and avoiding the overfitting that results from overly complex BNs, which reduces predictive skill. Cross-validation for BNs is known but rarely implemented, partly because few software tools are designed to work with available BN packages. CVNetica is open-source, written in Python, and extends the Netica software package to perform cross-validation and to read, rebuild, and learn BNs from data. Insights gained from cross-validation, and their implications for prediction versus description, are illustrated with two examples: a data-driven oceanographic application and a model-emulation application. These examples show that overfitting occurs when BNs become more complex than the supporting data allow, and that overfitting incurs computational costs as well as reducing prediction skill. CVNetica evaluates overfitting using several complexity metrics (we used level of discretization) and its impact on performance metrics (we used skill).
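The idea of comparing calibration skill with cross-validated skill as discretization becomes finer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it does not use CVNetica or Netica, the binned-mean predictor is a hypothetical stand-in for a BN, the synthetic data are invented for the example, and skill is assumed here to mean one minus the ratio of the residual sum of squares to the total sum of squares.

```python
import random


def skill(obs, pred):
    """Assumed skill definition: 1 - (residual sum of squares / total sum of squares)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def fit_binned_mean(x, y, n_bins):
    """Discretize x in [0, 1) into n_bins equal bins and store the mean of y per bin."""
    overall = sum(y) / len(y)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for xi, yi in zip(x, y):
        b = min(int(xi * n_bins), n_bins - 1)
        sums[b] += yi
        counts[b] += 1
    # Bins with no training data fall back to the overall training mean.
    return [s / c if c else overall for s, c in zip(sums, counts)]


def predict(model, x):
    """Look up the stored bin mean for each x value."""
    n_bins = len(model)
    return [model[min(int(xi * n_bins), n_bins - 1)] for xi in x]


def k_fold_skill(x, y, n_bins, k=5):
    """Return (mean calibration skill, mean validation skill) over k folds."""
    n = len(x)
    cal_scores, val_scores = [], []
    for fold in range(k):
        held_out = set(range(fold, n, k))  # every k-th sample forms one fold
        train = [i for i in range(n) if i not in held_out]
        test = sorted(held_out)
        x_tr, y_tr = [x[i] for i in train], [y[i] for i in train]
        x_te, y_te = [x[i] for i in test], [y[i] for i in test]
        model = fit_binned_mean(x_tr, y_tr, n_bins)
        cal_scores.append(skill(y_tr, predict(model, x_tr)))  # fit to training data
        val_scores.append(skill(y_te, predict(model, x_te)))  # fit to held-out data
    return sum(cal_scores) / k, sum(val_scores) / k


# Synthetic data (invented for this sketch): a linear signal plus Gaussian noise.
random.seed(0)
x = [random.random() for _ in range(200)]
y = [xi + random.gauss(0.0, 0.3) for xi in x]

# Compare a coarse and a fine discretization.
results = {n_bins: k_fold_skill(x, y, n_bins) for n_bins in (4, 50)}
for n_bins, (cal, val) in results.items():
    print(f"bins={n_bins:3d}  calibration skill={cal:.2f}  validation skill={val:.2f}")
```

In this sketch the finer discretization fits the training folds more closely, while the gap between calibration and validation skill widens; that widening gap is the overfitting signature that cross-validation is meant to expose.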
Citation Information
Publication Year | 2014 |
---|---|
Title | A cross-validation package driving Netica with python |
DOI | 10.1016/j.envsoft.2014.09.007 |
Authors | Michael N. Fienen, Nathaniel G. Plant |
Publication Type | Article |
Publication Subtype | Journal Article |
Series Title | Environmental Modelling and Software |
Index ID | 70128127 |
Record Source | USGS Publications Warehouse |
USGS Organization | Wisconsin Water Science Center |