Critical evaluation of the effects of a cross-validation strategy and machine learning optimization on the prediction accuracy and transferability of a soybean yield prediction model using UAV-based remote sensing

Luthfan Nur Habibi, Tsutomu Matsui, Takashi Tanaka*

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

Crop yield prediction models are critical tools for evaluating growth performance and informing farm management decisions. With a data-driven approach, it is challenging to develop yield prediction models that remain robust not only within the model's spatial domain but also at additional locations. The main objective of this study was to identify an appropriate cross-validation (CV) strategy for establishing UAV-based yield prediction models that transfer across different spatial domains (i.e., that meet extrapolation mapping objectives). We compared three data-splitting procedures for the CV protocols: random data splitting (random CV), cluster-based spatial splitting (spatial CV), and field-specific hold-out splitting (leave-one-field-out CV). We also examined whether model optimization affects the transferability of the yield model, by performing recursive feature elimination (RFE) and by comparing the algorithms used in the yield prediction model. Three base learner algorithms, namely random forest, XGBoost, and LASSO regression, were used, and a stacked ensemble model built from these base learners was also implemented. The established models were then tested on an independent field, used as a test dataset, to evaluate their transferability. Random CV tracked prediction error poorly when yield was predicted beyond the model's spatial domain, whereas spatial CV and leave-one-field-out CV gave error estimates that better reflected yield prediction performance outside the model's training spatial domain. Furthermore, simpler models, obtained by using LASSO regression and applying RFE, improved the model's capability in extrapolation tasks. These results suggest that spatially aware CV, rather than conventional random CV, should be the standard method for validating yield models, so that the models are more realistic and reliable for extrapolation objectives.
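The three data-splitting procedures compared above can be illustrated with a small plain-Python sketch. This is a hypothetical example, not the authors' pipeline: the function names, the synthetic plot layout, and the use of k-means with farthest-point initialization for the "cluster-based spatial splitting" are assumptions, since the abstract does not state the clustering algorithm or the number of folds. In practice, each model (random forest, XGBoost, LASSO, or the stacked ensemble) would be trained on every `train` index list and scored on the matching `test` list.

```python
import random
from collections import defaultdict


def random_folds(n_samples, k, seed=0):
    """Random CV: shuffle sample indices and deal them into k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[i::k]) for i in range(k)]


def _sqdist(p, q):
    """Squared Euclidean distance between two (x, y) points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def spatial_folds(coords, k, n_iter=100):
    """Spatial CV (assumed variant): k-means on plot coordinates, one fold per cluster.

    Centers are initialized by farthest-point traversal so the result is
    deterministic; empty clusters are dropped.
    """
    centers = [coords[0]]
    while len(centers) < k:
        # next center = the point farthest from its nearest existing center
        centers.append(max(coords, key=lambda p: min(_sqdist(p, c) for c in centers)))
    labels = None
    for _ in range(n_iter):
        # assignment step: each plot joins its nearest center
        new_labels = [min(range(k), key=lambda c: _sqdist(p, centers[c])) for p in coords]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # update step: move each center to the mean of its members
        for c in range(k):
            members = [coords[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centers[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    folds = [[i for i, lab in enumerate(labels) if lab == c] for c in range(k)]
    return [f for f in folds if f]


def leave_one_field_out_folds(field_ids):
    """Leave-one-field-out CV: every field is held out once as the test fold."""
    groups = defaultdict(list)
    for i, field in enumerate(field_ids):
        groups[field].append(i)
    return [groups[f] for f in sorted(groups)]


def train_test_splits(folds, n_samples):
    """Yield (train, test) index lists, one pair per fold."""
    for test in folds:
        held_out = set(test)
        yield [i for i in range(n_samples) if i not in held_out], test


# Hypothetical layout: three 4 x 4 plot grids (5 m spacing), fields 100 m apart.
plots = [(field, (ox + 5 * i, oy + 5 * j))
         for field, (ox, oy) in {"A": (0, 0), "B": (100, 0), "C": (0, 100)}.items()
         for i in range(4) for j in range(4)]
field_ids = [f for f, _ in plots]
coords = [xy for _, xy in plots]

for name, folds in [("random", random_folds(len(plots), 3)),
                    ("spatial", spatial_folds(coords, 3)),
                    ("leave-one-field-out", leave_one_field_out_folds(field_ids))]:
    # number of distinct fields present in each held-out fold
    print(name, [len({field_ids[i] for i in test})
                 for _, test in train_test_splits(folds, len(plots))])
```

With this well-separated synthetic layout, the spatial and leave-one-field-out folds each hold out exactly one field, while random folds typically mix plots from every field into each test fold. That mixing is what lets random CV report optimistic errors: neighboring, highly similar plots end up on both sides of the split.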
Original language: English
Article number: 101096
Journal: Journal of Agriculture and Food Research
Publication status: Published - Mar 2024

Keywords

  • Spatial data
  • Leave-one-field-out
  • Extrapolation
  • Vegetation indices
  • Spatial clustering
