Document Type

Article

Publication Date

6-2026

Publication Title

Data in Brief

Abstract

Machine learning has become an increasingly important tool for overcoming agricultural challenges by enabling efficient and consistent classification of crop-related data. Training such supervised models requires high-quality labeled datasets. This work presents a dataset consisting of raw and preprocessed hyperspectral imaging (HSI) files capturing reflectance in the visible to near-infrared range (400–1000 nm) from two problematic weed species on California’s Central Coast: annual sowthistle (Sonchus oleraceus) and little mallow (Malva parviflora). Hyperspectral imaging provides rich spectral-spatial data cubes that can support the development of deep learning models and autonomous technology for precision weed management. Plants were grown in a greenhouse under five conditions: standard, drought, overwatering, excess fertilizer, and no fertilizer. Custom MATLAB scripts were used for preprocessing, including k-means clustering to define regions of interest (ROIs) and extraction of spectral metrics. Data visualization was performed using the Wolfram Language and MATLAB. The dataset includes both raw and ENVI-formatted hyperspectral cubes and preprocessed MATLAB outputs, supporting spectral feature engineering, benchmark development, and exploratory machine learning workflows for controlled-environment stress classification.
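
The preprocessing step described in the abstract (k-means clustering to define an ROI, then extraction of spectral metrics) can be illustrated with a minimal Python/NumPy sketch. It is not the authors' MATLAB code, and several choices are assumptions made only for illustration: the cube is taken to be already loaded as a (rows, cols, bands) reflectance array with a matching wavelength vector, k = 2 clusters (plant vs. background), the plant cluster is picked as the one whose center has the highest NDVI, 670 nm and 800 nm stand in for the red and NIR bands, and the function names `kmeans` and `plant_roi_metrics` are hypothetical. The demo runs on a small synthetic cube, not on data from the dataset.

```python
import numpy as np

def kmeans(pixels, k=2, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm) on an (n_pixels, n_bands) array."""
    rng = np.random.default_rng(seed)
    # Initialize centers with k distinct, randomly chosen pixel spectra
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)].astype(float)
    labels = np.zeros(len(pixels), dtype=int)
    for _ in range(iters):
        # Squared Euclidean distance from every pixel spectrum to every center
        d = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = pixels[labels == j]
            if len(members):  # keep the previous center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels, centers

def band_index(wavelengths, target_nm):
    """Index of the band whose center wavelength is closest to target_nm."""
    return int(np.argmin(np.abs(wavelengths - target_nm)))

def plant_roi_metrics(cube, wavelengths, k=2, red_nm=670.0, nir_nm=800.0):
    """Hypothetical helper: cluster pixels, keep the vegetation-like cluster
    as the ROI, and return the ROI mask, mean spectrum, and mean NDVI."""
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands)
    labels, centers = kmeans(pixels, k=k)
    r, n = band_index(wavelengths, red_nm), band_index(wavelengths, nir_nm)
    # Assumption: vegetation is the cluster whose center has the highest NDVI
    center_ndvi = (centers[:, n] - centers[:, r]) / (centers[:, n] + centers[:, r] + 1e-12)
    plant = int(center_ndvi.argmax())
    mask = (labels == plant).reshape(rows, cols)
    roi = cube[mask]  # (n_roi_pixels, bands)
    mean_spectrum = roi.mean(axis=0)
    ndvi = (roi[:, n] - roi[:, r]) / (roi[:, n] + roi[:, r] + 1e-12)
    return mask, mean_spectrum, float(ndvi.mean())

# Synthetic 10 x 10 cube, 400-1000 nm in 10 nm steps: left half "plant", right half flat background
wl = np.linspace(400, 1000, 61)
plant_spec = np.where(wl < 700, 0.05, 0.5)   # low visible, high NIR reflectance
bg_spec = np.full_like(wl, 0.3)              # spectrally flat background
cube = np.empty((10, 10, wl.size))
cube[:, :5] = plant_spec
cube[:, 5:] = bg_spec
cube += np.random.default_rng(1).normal(0, 0.01, cube.shape)

mask, spec, ndvi = plant_roi_metrics(cube, wl)
print(mask.sum(), round(ndvi, 2))
```

The all-pairs distance computation keeps the sketch short but holds an (n_pixels, k, n_bands) array in memory; for full-resolution cubes a library k-means or a batched distance loop would be more practical.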

Comments

Published in Data in Brief by Elsevier Inc. Available via doi: 10.1016/j.dib.2026.112858.

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
