Abstract
Recognizing the crucial role alfalfa plays in global food security, and noting recent successes in a growing number of domains due to advances in artificial intelligence and machine learning (ML), this work aims to increase the efficiency of alfalfa farming through applied ML. We propose a novel ML-based time series forecasting technique that outperforms traditional statistical methods, and we use it as the backbone of a proposed application we call Predict Your CropS (PYCS). This what-if and crop forecasting tool and its underlying techniques forecast future crop yields from historical weather data and historical crop yields, and they are intended to help farmers develop improved contingency plans for possible shortages and surpluses of alfalfa. Under ideal alfalfa farm management conditions, experiments have shown our forecaster to be surprisingly accurate, producing symmetric mean absolute percent error (sMAPE) scores as low as 9.81% and beating traditional, non-ML-based techniques such as ARIMA and SARIMAX. This is especially encouraging considering that this is a difficult domain. This work also explores estimating historical and present-day tabular data using data synthesis. In that phase of the project, we propose a novel tabular data synthesizer we call Scale Invariant Tabular Synthesis (SITS), which helps boost the performance of our ML models by increasing training dataset sizes. We show that our synthesis algorithm leads to R scores over 100% higher than the established synthesizer in this domain. Training with data from one location to estimate historical and present-day data in another location provides insight into which regions can be effectively used to train models to estimate other target regions, especially when the target region's dataset is too small to train its own model.
We call this non-local training. When we include synthesis in the pipeline, we call the approach Synthetic Non-Local Training (SNLT), which is essentially a form of domain adaptation (DA). The three primary contributions of this work are (1) our novel ML-based forecasting technique, (2) our novel DA technique combining the SITS data synthesizer with pre-training, and (3) the combination of our DA and forecasting techniques into one enhanced forecaster.
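
For readers unfamiliar with the sMAPE metric quoted above, the following is a minimal sketch of one common formulation, in which the absolute error is divided by the mean of the absolute actual and forecast values and the result ranges from 0% to 200%. The abstract does not state which sMAPE variant was used, so this choice is an assumption. The function name `smape` and the yield values are hypothetical and are not taken from this work.

```python
import numpy as np


def smape(actual, forecast):
    """Symmetric mean absolute percent error, in percent (0 to 200).

    Assumed variant: mean of |F - A| / ((|A| + |F|) / 2), times 100.
    Terms where both actual and forecast are zero count as zero error.
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    denom = (np.abs(actual) + np.abs(forecast)) / 2.0
    # Avoid division by zero: leave the ratio at 0 where the denominator is 0.
    ratio = np.divide(
        np.abs(forecast - actual),
        denom,
        out=np.zeros_like(denom),
        where=denom != 0,
    )
    return 100.0 * float(ratio.mean())


# Hypothetical yields (arbitrary units), for illustration only.
observed = [4.0, 5.0, 6.0]
predicted = [4.4, 4.5, 6.0]
score = smape(observed, predicted)
print(f"sMAPE: {score:.2f}%")
```

Because the denominator averages the actual and forecast magnitudes, the score is bounded even when an actual value is close to zero, which is one reason it is often preferred to plain MAPE for yield series.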