Coffee Yield Prediction

By: Teofilo Ligawa, Cedric Kiplimo

Background

Predicting coffee yield is an important aspect of the coffee industry, allowing farmers and businesses to plan ahead for harvest, sales, and potential shortages. This study aims to predict coffee yield from environmental and agricultural data by building models trained on historical yield and weather data. Such models could aid governments in making food security assessments and assist farmers in projecting their future earnings. There are two main approaches to coffee yield prediction: statistical methods and machine learning. Statistical methods such as linear regression use historical data on factors like rainfall, temperature, and fertilizer application to estimate yield. Machine learning techniques, such as artificial neural networks, can analyze larger datasets and identify complex relationships between these factors and yield.

Accomplishments

The study has applied multivariate linear regression and ensemble techniques to predict yield. The predictor variables were the weather variables, and the response variable was coffee yield. Model performance was evaluated using the root mean square error (RMSE). The models did not perform as well as we had hoped.
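For reference, the sketch below shows how the root mean square error can be computed and compared against a simple baseline that always predicts the mean yield. It is a minimal illustration, not the study's evaluation code: the function name rmse and all yield values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    if not actual:
        raise ValueError("sequences must not be empty")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical yields and model predictions (NOT data from the study).
actual_yield = [1.0, 2.0, 3.0, 4.0]
model_predictions = [1.5, 2.5, 2.5, 3.5]

# Baseline: always predict the mean of the observed yields.
mean_yield = sum(actual_yield) / len(actual_yield)
baseline_predictions = [mean_yield] * len(actual_yield)

model_rmse = rmse(actual_yield, model_predictions)
baseline_rmse = rmse(actual_yield, baseline_predictions)

print(f"model RMSE:    {model_rmse:.3f}")
print(f"baseline RMSE: {baseline_rmse:.3f}")
```

RMSE is expressed in the same units as yield, so a lower value means predictions are closer to the observed yields on average. Comparing a model's RMSE with that of a mean-only baseline is one way to judge whether the weather variables add predictive value. In practice the baseline mean would be taken from the training data rather than from the data being evaluated.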

The study has also applied time series modelling, where yield is predicted from its own historical values. The methods tried were traditional time series models, namely autoregressive integrated moving average (ARIMA) and seasonal autoregressive integrated moving average (SARIMA), as well as Facebook’s Prophet model and a deep learning approach, long short-term memory (LSTM), which is a type of recurrent neural network. As with the multivariate regression techniques, the performance of these models was unsatisfactory.
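To illustrate what predicting yield from its own history means, the sketch below fits a first-order autoregressive model, y_t = c + phi * y_(t-1), by ordinary least squares and uses it to forecast ahead. This is a deliberately stripped-down stand-in for the autoregressive component of ARIMA, not the models used in the study: it has no differencing, moving-average, or seasonal terms. The function names fit_ar1 and forecast_ar1 and the series values are hypothetical.

```python
def fit_ar1(series):
    """Fit y_t = c + phi * y_(t-1) by ordinary least squares."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    lagged = series[:-1]   # y_(t-1)
    current = series[1:]   # y_t
    n = len(lagged)
    mean_lag = sum(lagged) / n
    mean_cur = sum(current) / n
    var_lag = sum((x - mean_lag) ** 2 for x in lagged)
    if var_lag == 0:
        raise ValueError("series is constant; phi cannot be estimated")
    cov = sum((x - mean_lag) * (y - mean_cur) for x, y in zip(lagged, current))
    phi = cov / var_lag
    c = mean_cur - phi * mean_lag
    return c, phi


def forecast_ar1(series, c, phi, steps=1):
    """Iterate the fitted recurrence forward from the last observation."""
    forecasts = []
    last = series[-1]
    for _ in range(steps):
        last = c + phi * last
        forecasts.append(last)
    return forecasts


# Hypothetical yearly yields (NOT data from the study).
history = [10.0, 7.0, 5.5, 4.75, 4.375]

c, phi = fit_ar1(history)
next_values = forecast_ar1(history, c, phi, steps=2)

print(f"c = {c:.3f}, phi = {phi:.3f}")
print("forecasts:", [round(v, 4) for v in next_values])
```

With only a handful of observations, the two coefficients are estimated from very few lagged pairs, which shows why short yield histories limit what time series models can learn.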

The main challenge encountered in this study is the lack of sufficient data, which has significantly affected the performance of all the models attempted.

Next Steps

The next steps are to incorporate multivariate regression techniques into the time series modelling and to acquire more data to improve model performance.