Python Hyperspectral Analysis Tool (PyHAT)

Active

By Astrogeology Science Center, March 21, 2024

The Python Hyperspectral Analysis Tool (PyHAT) provides data processing, analysis, and machine learning capabilities for spectroscopic applications. It includes a GUI, so you can start analyzing data without writing any code. If you are comfortable writing code, PyHAT can also be imported like any other Python package.

Data Format

PyHAT is built on a simple .csv data format that is read into a Pandas data frame for maximum flexibility. Each row of the data frame contains one spectrum, along with any associated metadata and compositional information. Each column uses a two-level label, which lets the user select data coarsely (for example, all spectra at once) or finely (for example, the single column corresponding to one wavelength).

Figure: Example of the simple PyHAT data format (screenshot of an example data table).
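The two-level column labels described above map naturally onto a Pandas `MultiIndex`. The following is a minimal sketch with synthetic values, not PyHAT code: the top-level labels `wvl`, `meta`, and `comp`, the wavelengths, and the target names are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical wavelengths (nm) and three synthetic spectra.
wavelengths = [240.0, 240.5, 241.0, 241.5]
rng = np.random.default_rng(0)
intensities = rng.random((3, len(wavelengths)))

# Top level groups the columns by kind; second level names each column.
spectra = pd.DataFrame(
    intensities,
    columns=pd.MultiIndex.from_product([["wvl"], wavelengths]),
)

# Metadata and composition get their own (assumed) top-level labels.
meta = pd.DataFrame({"target": ["basalt_1", "basalt_2", "andesite_1"]})
meta.columns = pd.MultiIndex.from_product([["meta"], meta.columns])
comp = pd.DataFrame({"SiO2": [48.2, 49.1, 58.7]})
comp.columns = pd.MultiIndex.from_product([["comp"], comp.columns])

data = pd.concat([spectra, meta, comp], axis=1)

# Coarse selection: every spectral channel at once.
all_spectra = data["wvl"]
# Fine selection: the single column for one wavelength.
one_channel = data[("wvl", 240.5)]
```

Selecting on the top level returns a sub-table, while a full `(top, second)` tuple returns a single column, which is what makes the coarse and fine selection equally short to write.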

Hyperspectral Cubes

PyHAT works with hyperspectral cubes by flattening them to match the tabular PyHAT format. Geographic information is stored as metadata so the cube can be reconstructed as needed. PyHAT has built-in support for reading and generating summary parameters from Moon Mineralogy Mapper (M3) and Compact Reconnaissance Imaging Spectrometer for Mars (CRISM) data cubes. Data in other formats can be used once flattened into the PyHAT format, and future work will add support for more common data sets.

Figure: CRISM summary parameter map of the fan deposit in Jezero crater, Mars (red = olivine, green = high-Ca pyroxene, blue = low-Ca pyroxene).
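The flattening step can be illustrated with NumPy and Pandas. This is a sketch under stated assumptions rather than PyHAT's reader: the cube is synthetic, the `wvl` and `meta` labels are illustrative, and pixel row and column indices stand in for the geographic metadata a real cube would carry.

```python
import numpy as np
import pandas as pd

# Synthetic cube: 2 rows x 3 columns of pixels, 4 spectral bands each.
rows, cols, bands = 2, 3, 4
cube = np.arange(rows * cols * bands, dtype=float).reshape(rows, cols, bands)
wavelengths = [1000.0, 1500.0, 2000.0, 2500.0]  # hypothetical band centers (nm)

# Flatten: one table row per pixel, one column per band.
flat = cube.reshape(rows * cols, bands)
row_idx, col_idx = np.unravel_index(np.arange(rows * cols), (rows, cols))

spectra = pd.DataFrame(
    flat, columns=pd.MultiIndex.from_product([["wvl"], wavelengths])
)
position = pd.DataFrame({"row": row_idx, "col": col_idx})
position.columns = pd.MultiIndex.from_product([["meta"], position.columns])
table = pd.concat([spectra, position], axis=1)

# Rebuild the cube from the table using the stored pixel positions.
rebuilt = np.empty_like(cube)
rebuilt[
    table[("meta", "row")].to_numpy(),
    table[("meta", "col")].to_numpy(),
    :,
] = table["wvl"].to_numpy()
```

Because each row keeps its own position, the table can be filtered or reordered during analysis and still be placed back on the image grid afterwards.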

Preprocessing

Baseline Removal

PyHAT includes multiple baseline removal algorithms that can be used to remove non-informative signal and leave only the useful features of a spectrum. Built-in plotting functions allow the user to visualize how these algorithms are working and decide which is most suitable.

Figure: Comparison of several baseline removal algorithms applied to a laser-induced breakdown spectroscopy (LIBS) spectrum of basalt. Colored lines show the baseline estimated by each algorithm.
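To make the idea of baseline removal concrete, here is a small sketch of one generic approach: iterative polynomial fitting, where the signal is repeatedly clipped to the fit so that peaks stop pulling the baseline upward. It is not taken from PyHAT and is not necessarily one of its built-in algorithms; the function name, the synthetic spectrum, and the parameter values are assumptions.

```python
import numpy as np
from numpy.polynomial import Polynomial


def polynomial_baseline(x, y, degree=3, n_iter=50):
    """Estimate a smooth baseline by iteratively fitting and clipping."""
    work = np.asarray(y, dtype=float).copy()
    fit = work
    for _ in range(n_iter):
        # Polynomial.fit rescales x internally, which keeps the fit stable.
        fit = Polynomial.fit(x, work, degree)(x)
        # Clip anything above the fit (mostly peaks) down to the fit.
        work = np.minimum(work, fit)
    return fit


# Synthetic spectrum: a curved baseline plus three narrow emission peaks.
x = np.linspace(0.0, 100.0, 500)
true_baseline = 5.0 + 0.05 * x + 0.001 * x**2
peaks = sum(
    10.0 * np.exp(-0.5 * ((x - center) / 0.5) ** 2)
    for center in (30.0, 55.0, 70.0)
)
y = true_baseline + peaks

estimated_baseline = polynomial_baseline(x, y)
corrected = y - estimated_baseline
```

Plotting `y`, `estimated_baseline`, and `corrected` together is a quick way to judge whether the chosen degree is following the continuum without cutting into the peaks.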

Dimensionality Reduction

PyHAT includes a variety of dimensionality reduction methods from scikit-learn and other libraries, including:

Principal Component Analysis (PCA)
Independent Component Analysis (ICA)
t-distributed Stochastic Neighbor Embedding (t-SNE)
Locally Linear Embedding (LLE)
Non-Negative Matrix Factorization (NNMF)
Linear Discriminant Analysis (LDA)
Minimum Noise Fraction (MNF)
Local Fisher Discriminant Analysis (LFDA)

Figure: PCA scores, color coded by Fe2O3T content, and the loading vectors used to calculate the scores. PyHAT includes a specialized plotting function for generating score and loading plots for PCA and ICA. Points in the scores plot can be color coded based on metadata or compositional data. In this example using LIBS spectra, positive values of PC1 correlate with high Fe2O3T abundance.
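Since several of these methods come from scikit-learn, the scores and loadings discussed above can be sketched directly with `sklearn.decomposition.PCA`. The spectra here are synthetic mixtures of two made-up endmember shapes; this calls scikit-learn on its own and does not show PyHAT's wrapper or plotting functions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_spectra, n_channels = 60, 200

# Two hypothetical endmember shapes mixed in random proportions, plus noise.
x = np.linspace(0.0, 1.0, n_channels)
end_a = np.exp(-0.5 * ((x - 0.3) / 0.05) ** 2)
end_b = np.exp(-0.5 * ((x - 0.7) / 0.05) ** 2)
fraction_a = rng.random(n_spectra)
spectra = (
    np.outer(fraction_a, end_a)
    + np.outer(1.0 - fraction_a, end_b)
    + 0.01 * rng.normal(size=(n_spectra, n_channels))
)

pca = PCA(n_components=3)
scores = pca.fit_transform(spectra)   # one row of scores per spectrum
loadings = pca.components_            # one loading vector per component

# PC1 should track the mixing fraction (the sign of a component is arbitrary).
pc1_correlation = abs(np.corrcoef(scores[:, 0], fraction_a)[0, 1])
```

Coloring a scatter plot of `scores[:, 0]` against `scores[:, 1]` by `fraction_a` would reproduce, on toy data, the kind of score plot described in the caption.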

Clustering

Both k-means and spectral clustering methods are available in PyHAT. Cluster assignments are stored as metadata columns and can be used in plotting functions to color-code points.

Figure: PCA scores and loadings for the same data as above, with points in the scores plot color coded by 8 k-means clusters.
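Storing cluster assignments as a metadata column can be sketched with scikit-learn's `KMeans` and a two-level Pandas table. The column labels (`PCA`, `meta`, `cluster`) and the synthetic score values are assumptions for illustration, not PyHAT's naming.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Synthetic 2-D scores drawn around three well-separated centers.
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
points = np.vstack([c + 0.3 * rng.normal(size=(20, 2)) for c in centers])

table = pd.DataFrame(
    points, columns=pd.MultiIndex.from_product([["PCA"], ["PC1", "PC2"]])
)

# Fit k-means and keep the labels next to the data as a metadata column.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
table[("meta", "cluster")] = kmeans.fit_predict(table["PCA"].to_numpy())

# The stored labels can then drive color coding, grouping, or filtering.
cluster_sizes = table[("meta", "cluster")].value_counts().sort_index()
```

Keeping the labels in the same table as the spectra means any later plot or subset operation can refer to the cluster without recomputing it.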

Outlier Identification

PyHAT leverages the scikit-learn library for many capabilities, including outlier identification. Two algorithms for outlier identification are implemented: Local Outlier Factor (LOF) and Isolation Forest (IF). Once spectra have been flagged as potential outliers, data manipulation functions allow them to be removed or partitioned into a separate data set.

Figure: PCA scores plot for a database of LIBS spectra. Points marked in red have been flagged as potential outliers.
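Both algorithms named above are available in scikit-learn, so the flag-then-partition workflow can be sketched as follows. The data are synthetic, default settings are used, and combining the two flags with a logical OR is an assumption made for this example rather than PyHAT's behavior.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(2)

# 100 ordinary points plus three that sit far from the rest.
inliers = rng.normal(size=(100, 2))
outliers = np.array([[8.0, 8.0], [-9.0, 7.0], [10.0, -8.0]])
scores = np.vstack([inliers, outliers])

# Both estimators label inliers as 1 and outliers as -1.
lof_labels = LocalOutlierFactor(n_neighbors=20).fit_predict(scores)
if_labels = IsolationForest(random_state=0).fit_predict(scores)

# Flag a point if either method considers it an outlier (a design choice).
flagged = (lof_labels == -1) | (if_labels == -1)

# Partition the data instead of discarding it outright.
kept = scores[~flagged]
set_aside = scores[flagged]
```

Partitioning rather than deleting keeps the flagged spectra available for inspection, since an "outlier" may be an unusual but valid target.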

Regression

A significant focus of PyHAT development has been regression: estimation of a quantitative property based on statistical models that have been trained on spectra of known targets. This is the approach used by the ChemCam and SuperCam teams to derive chemical compositions of Mars rocks using LIBS, but it is broadly applicable to many types of data. PyHAT provides easy access to the following regression algorithms:

Ordinary Least Squares (OLS)
Partial Least Squares (PLS)
Least Absolute Shrinkage and Selection Operator (LASSO)
Ridge Regression
Elastic Net
Bayesian Ridge Regression (BRR)
Automatic Relevance Determination (ARD)
Least Angle Regression (LARS)
Orthogonal Matching Pursuit (OMP)
Support Vector Regression (SVR)
Gradient Boosting Regression (GBR)
Local Regression (algorithm developed by the PyHAT team)

Cross Validation

A key step in training a regression model is tuning its hyperparameters. To avoid overfitting, models must be cross-validated by iteratively withholding some of the training data and predicting it as if it were unknown. This allows the user to identify the parameters that give the best balance between training set accuracy and generalizability. PyHAT provides a cross validation module that simplifies this process and uses parallel processing to run the often time-consuming cross validation calculations more efficiently.

Prediction

Once model parameters have been tuned, PyHAT can perform predictions on novel data. Multiple regression methods can be optimized and compared, and the plotting functions include settings to produce a "one-to-one" plot of predicted vs. actual values to aid in model evaluation and comparison. PyHAT also includes the ability to blend the predictions from multiple submodels, so that the benefits of specialized models can be combined (see related publications for more details).

Figure: Predicted vs. actual CaO content for a set of LIBS spectra of geologic targets, comparing two regression models.
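One simple way to picture submodel blending is a linear ramp: a reference prediction decides how much weight a "low-range" and a "high-range" submodel each receive. The function below is a hypothetical illustration of that idea with made-up numbers; the blending scheme in PyHAT and in the related publications may differ in its details.

```python
import numpy as np


def blend_submodels(reference, low_pred, high_pred, blend_start, blend_end):
    """Blend two submodel predictions with a linear ramp on a reference value.

    Below blend_start only the low submodel is used, above blend_end only
    the high submodel, and in between the two are mixed linearly.
    """
    weight_high = np.clip(
        (reference - blend_start) / (blend_end - blend_start), 0.0, 1.0
    )
    return (1.0 - weight_high) * low_pred + weight_high * high_pred


# Made-up predictions (e.g. wt.% of an oxide) for five targets.
reference = np.array([5.0, 30.0, 40.0, 50.0, 80.0])   # full-range model
low_pred = np.array([4.0, 29.0, 41.0, 55.0, 90.0])    # trained on low values
high_pred = np.array([10.0, 33.0, 39.0, 49.0, 79.0])  # trained on high values

blended = blend_submodels(reference, low_pred, high_pred, 30.0, 50.0)
# blended is [4.0, 29.0, 40.0, 49.0, 79.0]
```

The ramp avoids a hard switch between submodels, so a target whose reference value sits near a range boundary gets a prediction that changes smoothly rather than jumping.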

January 21, 2022

Post-landing major element quantification using SuperCam laser induced breakdown spectroscopy

The SuperCam instrument on the Perseverance Mars 2020 rover uses a pulsed 1064 nm laser to ablate targets at a distance and conduct laser induced breakdown spectroscopy (LIBS) by analyzing the light from the resulting plasma. SuperCam LIBS spectra are preprocessed to remove ambient light, noise, and the continuum signal present in LIBS observations. Prior to quantification, spectra are...

Authors

Ryan Anderson, Olivier Forni, Agnes Cousin, Roger C. Wiens, Samuel M. Clegg, Jens Frydenvang, Travis S. J. Gabriel, Ann M. Ollila, Susanne Schröder, Olivier Beyssac, Erin Gibbons, David Vogt, Elise Clave, Jose-Antonio Manrique, Carey Legett, Paolo Pilleri, Raymond Newell, Joseph Sarrao, Sylvestre Maurice, Gorka Arana, Karim Benzerara, Pernelle Bernardi, Sylvain Bernard, Bruno Bousquet, Adrian J. Brown, Cesar Alvarez-Llamas, Baptiste Chide, Edward A. Cloutis, Jade Comellas, Stephanie Connell, Erwin Dehouck, Dorothea Delapp, Ari Essunfeld, Cecile Fabre, Thierry Fouchet, Cristina Garcia, Laura Garcia-Gomez, Patrick J. Gasda, Olivier Gasnault, Elisabeth Hausrath, Nina L. Lanza, Javier Laserna, Jeremie Lasue, Guillermo Lopez, Juan Manuel Madariaga, Lucia Mandon, Nicolas Mangold, Pierre-Yves Meslin, Marion Nachon, Anthony Nelson, Horton E. Newsom, Adriana Reyes-Newell, Scott Robinson, Fernando Rull, Shiv Sharma, Justin I Simon, Pablo Sobron, Imanol Torre Fernandez, Arya Udry, Dawn Venhaus, Scott McLennan, Richard V. Morris, Bethany L. Ehlmann

Natural Hazards Mission Area, Astrogeology Science Center

March 1, 2017

Recalibration of the Mars Science Laboratory ChemCam instrument with an expanded geochemical database

The ChemCam Laser-Induced Breakdown Spectroscopy (LIBS) instrument onboard the Mars Science Laboratory (MSL) rover Curiosity has obtained > 300,000 spectra of rock and soil analysis targets since landing at Gale Crater in 2012, and the spectra represent perhaps the largest publicly-available LIBS datasets. The compositions of the major elements, reported as oxides (SiO2, TiO2, Al2O3...

Authors

Samuel M. Clegg, Roger C. Wiens, Ryan Anderson, Olivier Forni, Jens Frydenvang, Jeremie Lasue, Agnès Cousin, Valerie Payre, Tommy Boucher, M. Darby Dyar, Scott M. McLennan, Richard V. Morris, Trevor G. Graff, Stanley A Mertzman, Bethany L. Ehlmann, Ines Belgacem, Horton E. Newsom, Ben C. Clark, Noureddine Melikechi, Alissa Mezzacappa, Rhonda E. McInroy, Ronald Martinez, Patrick J. Gasda, Olivier Gasnault, Sylvestre Maurice

Natural Hazards Mission Area, Astrogeology Science Center

February 13, 2017

Improved accuracy in quantitative laser-induced breakdown spectroscopy using sub-models

Accurate quantitative analysis of diverse geologic materials is one of the primary challenges faced by the Laser-Induced Breakdown Spectroscopy (LIBS)-based ChemCam instrument on the Mars Science Laboratory (MSL) rover. The SuperCam instrument on the Mars 2020 rover, as well as other LIBS instruments developed for geochemical analysis on Earth or other planets, will face the same...

Authors

Ryan Anderson, Samuel M. Clegg, Jens Frydenvang, Roger C. Wiens, Scott M. McLennan, Richard V. Morris, Bethany L. Ehlmann, M. Darby Dyar

Natural Hazards Mission Area, Astrogeology Science Center

Astrogeology Science Center

Ryan Bradley Anderson

Physical Scientist

Travis S.J. Gabriel, Ph.D.

Research Physical Scientist

Itiya P Aneece

Research Geographer

U.S. Geological Survey

U.S. Department of the Interior