We were asked by a major dairy exporter to investigate and propose analytics techniques to help improve the pricing of their wholesale dairy contracts. Milk is a vital food resource and milk production is a huge industry: global milk production is around 700 million tonnes annually.
What was the problem?
The problem was that price volatility had doubled over the previous decade and was predicted to continue to increase. By way of illustration, Fig. 1 shows the price of a specific class of milk, US Class III, over the period 1980–2013; the increase in volatility over the period is clearly evident. Our specific goal was to investigate, recommend and scope out technical solutions to help improve the pricing decision process.
Fig. 1: Price swings in US Class III milk prices, 1980–2013
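A claim such as "volatility has doubled" can be made concrete by comparing the standard deviation of monthly log returns across time windows. The sketch below is a minimal illustration that assumes a plain list of monthly prices. The function names and the synthetic price series are hypothetical; they are not the client's data or the study's method.

```python
import math
import statistics


def log_returns(prices):
    """Month-on-month log returns r_t = ln(P_t / P_{t-1})."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def rolling_volatility(prices, window=12):
    """Sample standard deviation of log returns over a sliding window.

    With monthly prices and window=12, each value is a trailing
    one-year volatility (not annualised).
    """
    r = log_returns(prices)
    return [statistics.stdev(r[i - window:i]) for i in range(window, len(r) + 1)]


def volatility_ratio(prices, window=12):
    """Volatility in the last window divided by volatility in the first."""
    vol = rolling_volatility(prices, window)
    return vol[-1] / vol[0]


# Hypothetical series: +/-1% monthly swings for a year, then +/-2% swings.
returns = [0.01 if i % 2 == 0 else -0.01 for i in range(12)]
returns += [0.02 if i % 2 == 0 else -0.02 for i in range(12)]
prices = [100.0]
for r in returns:
    prices.append(prices[-1] * math.exp(r))

print(round(volatility_ratio(prices), 6))  # 2.0
```

A 12-month window matches the monthly EU update frequency. A longer window gives smoother but less responsive estimates.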
We started by defining our ideal dataset as:
- Historical daily or weekly dairy prices going back several decades for each of the significant dairy-producing regions (e.g. EU, US, NZ)
- Historical data on each of the factors influencing dairy prices
- Access to news articles, trade reports and commentaries on the dairy industry
- Interview access to the organisation’s dairy market experts
We then determined the data that was accessible. It comprised:
- Regional monthly dairy prices going back several decades*
- Anecdotal information on the factors influencing dairy prices**
- Access to an in-house trade information portal, and to internet news articles and commentaries
- Interview access to senior management
* In the EU, dairy prices are updated monthly; in the US, they are updated biweekly.
** A cause-and-effect review to identify the most likely factors was to be a core part of the investigation.
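The EU series is monthly and the US series is biweekly, so the two must be put on a common frequency before they can be compared or modelled together. Below is a minimal sketch that assumes (date, price) pairs and uses simple within-month averaging. The function name and the sample values are hypothetical. Taking the last observation in each month would be an equally defensible choice.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def to_monthly(observations):
    """Average (date, price) observations into one value per calendar month.

    Returns a list of ((year, month), mean_price) sorted by month, so a
    biweekly series can be lined up with a monthly one.
    """
    buckets = defaultdict(list)
    for d, price in observations:
        buckets[(d.year, d.month)].append(price)
    return sorted((key, mean(values)) for key, values in buckets.items())


# Hypothetical biweekly US observations (illustrative values only).
us_biweekly = [
    (date(2010, 1, 1), 14.0),
    (date(2010, 1, 15), 15.0),
    (date(2010, 1, 29), 16.0),
    (date(2010, 2, 12), 13.0),
    (date(2010, 2, 26), 15.0),
]
print(to_monthly(us_biweekly))  # [((2010, 1), 15.0), ((2010, 2), 14.0)]
```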
Where did we source the data and what did it look like?
Since this was an investigation and our remit was to identify and recommend analytics approaches, we focused on publicly available dairy price data covering the period 2000–2010. We obtained datasets for each of the following regions: EU, US and NZ.
Our exploratory analysis revealed the following:
The time-series data was sparse: just 12 data points per year in Europe. Nevertheless, we considered investigating the time series a core part of the analysis, to see what lessons could be learned. We identified the following analyses as especially promising:
- ARIMA and GARCH modelling of the time series (GARCH allows volatility to vary over time rather than assuming it is constant)
- Spectral decomposition of the time series to extract cyclical components. This technique has reportedly been used effectively in the US market in at least one instance.
- Multivariate analysis, informed by a cause-and-effect review, of other factors that may be influencing prices
- Artificial Neural Networks (ANN)
- Text/sentiment analysis – initial research suggested that it may be possible to improve price forecasting by sensing market sentiment. Machine learning and sentiment analysis tools would need to be applied to text-based data sources, ranging from trade publications to social media outlets such as Twitter.
We also introduced an intermediate target for the project: to estimate the direction of dairy price movements in the near term (1–4 months) as a stepping stone towards forecasting absolute future prices.
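Spectral decomposition, listed above, looks for cycles by measuring how much of a series' variance falls at each frequency. The sketch below computes a raw periodogram on a synthetic monthly series, using a naive discrete Fourier transform, and reports the period of the strongest cycle. It assumes evenly spaced data with no trend. The function names and data are hypothetical. Real work would detrend the prices first and use an FFT routine.

```python
import cmath
import math


def periodogram(series):
    """Raw periodogram of the demeaned series.

    Returns (k, power) for Fourier frequencies k = 1..n//2, where
    power = |X_k|^2 / n and X_k is the discrete Fourier transform.
    """
    n = len(series)
    mean = sum(series) / n
    x = [v - mean for v in series]
    result = []
    for k in range(1, n // 2 + 1):
        X = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        result.append((k, abs(X) ** 2 / n))
    return result


def dominant_period(series):
    """Period, in samples, of the Fourier frequency with the largest power."""
    k, _ = max(periodogram(series), key=lambda kp: kp[1])
    return len(series) / k


# Hypothetical 10 years of monthly data: a 12-month cycle plus a weaker 40-month cycle.
series = [
    10 + 2 * math.sin(2 * math.pi * t / 12) + math.sin(2 * math.pi * t / 40)
    for t in range(120)
]
print(dominant_period(series))  # 12.0
```

The weaker 40-month cycle also shows up in the periodogram, at k = 3. The strongest peak, however, is the 12-month seasonal cycle.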
The statistical and predictive modelling step:
A time-series analysis was conducted using the statistical programming language R. Dairy price data for 2000–2009 was used as the training set and the data for 2010 as the test set. We generated a dynamic video demonstrator of the 4-month-ahead price-direction forecast. We also explored the concept of using ANNs to predict the direction of movement of dairy prices, using a dataset of 20 years of milk prices. Initial trials of the concept on a subset of the data showed encouraging results.
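The study's R code and model specification are not reproduced here. As a stand-in, the sketch below fits a deliberately simple AR(1) model by ordinary least squares in plain Python and scores 4-month-ahead direction calls. It uses the same split described above: 120 monthly points for 2000–2009 as training data and 12 points for 2010 as test data. The functions and the seeded random-walk prices are hypothetical. AR(1) is a special case of ARIMA, not the model actually fitted.

```python
import random


def fit_ar1(series):
    """Ordinary least squares fit of x_t = c + phi * x_{t-1} + e_t; returns (c, phi)."""
    x_prev, x_next = series[:-1], series[1:]
    n = len(x_prev)
    mx, my = sum(x_prev) / n, sum(x_next) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x_prev, x_next))
    sxx = sum((a - mx) ** 2 for a in x_prev)
    phi = sxy / sxx
    return my - phi * mx, phi


def forecast_ar1(c, phi, last, h):
    """Iterate the fitted recursion h steps ahead from the value `last`."""
    x = last
    for _ in range(h):
        x = c + phi * x
    return x


def direction_hit_rate(train, test, h=4):
    """Share of forecast origins where the predicted sign of the h-month change
    matches the realised sign. Parameters are fitted once on `train`; origins run
    from the last training month until the target would fall past the test data."""
    c, phi = fit_ar1(train)
    full = train + test
    hits = total = 0
    for origin in range(len(train) - 1, len(full) - h):
        predicted = forecast_ar1(c, phi, full[origin], h) - full[origin]
        actual = full[origin + h] - full[origin]
        if predicted != 0 and actual != 0:  # skip exactly flat moves
            total += 1
            hits += (predicted > 0) == (actual > 0)
    return hits / total if total else float("nan")


# Hypothetical monthly prices for Jan 2000 to Dec 2010 (a seeded random walk).
random.seed(1)
prices = [15.0]
for _ in range(131):
    prices.append(prices[-1] + random.gauss(0, 0.5))
train, test = prices[:120], prices[120:]  # 2000-2009 vs 2010
print(direction_hit_rate(train, test, h=4))
```

This setup keeps the parameters fixed while the forecast origin rolls through 2010, which mirrors a strict train/test split. Refitting at each origin would be a natural refinement.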
Interpretation of the results:
The initial results are encouraging insofar as they demonstrate the potential of the proposed approaches. Ideally, the following work should be carried out:
- Update the time-series forecast to the present day, express the predictions with confidence intervals, and check them against market data.
- Conduct the cause-and-effect analysis and incorporate data on the influential factors in a multivariate analysis of the problem.
- Perform an ANN analysis using the whole dataset
- Conduct sentiment analysis and text mining using appropriate data sources
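Confidence intervals, the first item above, are simplest for an AR(1) model with Gaussian errors. In that case the h-step forecast error variance has a closed form: sigma^2 * (1 + phi^2 + phi^4 + ... + phi^(2(h-1))). The sketch below works under those assumptions, and the 1.96 multiplier assumes normality. The function names and parameter values are illustrative. The band ignores uncertainty in the estimated coefficients, so it is optimistic.

```python
import math


def fit_ar1(series):
    """OLS fit of x_t = c + phi * x_{t-1} + e_t.

    Returns (c, phi, sigma2), where sigma2 is the residual variance
    with n - 2 degrees of freedom (two estimated coefficients).
    """
    x_prev, x_next = series[:-1], series[1:]
    n = len(x_prev)
    mx, my = sum(x_prev) / n, sum(x_next) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x_prev, x_next))
    sxx = sum((a - mx) ** 2 for a in x_prev)
    phi = sxy / sxx
    c = my - phi * mx
    resid = [b - (c + phi * a) for a, b in zip(x_prev, x_next)]
    sigma2 = sum(e * e for e in resid) / (n - 2)
    return c, phi, sigma2


def ar1_interval(c, phi, sigma2, last, h, z=1.96):
    """h-step-ahead point forecast with an approximate 95% interval.

    Forecast error variance: sigma2 * sum_{j=0}^{h-1} phi**(2*j).
    """
    point = last
    for _ in range(h):
        point = c + phi * point
    var = sigma2 * sum(phi ** (2 * j) for j in range(h))
    half_width = z * math.sqrt(var)
    return point, point - half_width, point + half_width


# Illustrative parameters, not estimates from the study.
point, lower, upper = ar1_interval(c=2.0, phi=0.5, sigma2=1.0, last=0.0, h=2)
print(point)  # 3.0
```

In use, `fit_ar1` would be run on the training prices, and its output passed to `ar1_interval` with the latest observed price as `last`.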
Challenge the results:
A fundamental question regarding the time-series analysis in this case is whether the data is frequent enough to support accurate predictions of future price movements.
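One way to frame this question: with 2010 as the test year, a monthly series yields only 12 test points, so even a high hit rate can be hard to distinguish from chance. A one-sided binomial test makes this concrete. The scores below are hypothetical, not results from the study.

```python
from math import comb


def binomial_p_value(hits, n, p=0.5):
    """One-sided P(X >= hits) for X ~ Binomial(n, p).

    With p = 0.5 this is the chance that coin-flip direction calls
    would score at least `hits` correct out of `n`.
    """
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(hits, n + 1))


# Hypothetical scores, not results from the study.
print(round(binomial_p_value(9, 12), 3))  # 0.073  (9 of 12 test months)
print(binomial_p_value(90, 120) < 0.001)  # True   (same 75% rate over 120 months)
```

The same 75% hit rate is unconvincing over 12 months but highly significant over 120. This is why data frequency matters so much here.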
Synthesize/write up results:
This document is a synopsis of the work done to date. An extensive report, with specific recommendations, was provided to the client. More work remains to be done.
Create reproducible code:
We used the R programming language to perform the time-series analysis of the historical milk price data and to generate the range of predicted values.