Sales forecasting and machine learning: what people get wrong

Sales and promotion forecasting is a critical capability for retailers both online and offline. This blog post sheds light on three myths that we believe are holding retailers back from better forecasting.


Over the past months we have received a lot of questions from retailers about sales and promotions forecasting. It is evident that accurate and accessible forecasts are increasingly sought after by retail top management and category organizations. It is equally evident that there are some major myths about forecasting out there, and that these myths prevent retailers from improving their forecasts.

What retailers tell us about forecasting

Retailers tell us that they need better forecasts to optimize prices so as to avoid stock-outs and overstock, to predict demand and reduce waste in the supply chain, and to craft efficient promotions that maximize growth, just to mention a few application areas. At the same time, nine out of ten retailers also tell us that they are struggling with inefficient forecasting infrastructure and processes:

  • The forecasting processes are manual and labour intensive, ad hoc rather than automated

  • A limited set of statistical models is used, rather than hundreds of self-learning models that adjust to changes in demand and compete against each other

  • To the extent that forecasting tools are used, they are often inaccessible to anyone but data scientists or other expert users

We believe that these challenges are in part due to a number of myths and misconceptions about forecasting, and that debunking these myths is a critical first step for every retailer that wants to improve its forecasting capabilities.

Three myths about forecasting

1. We don’t collect enough data to get started

It is tempting to think that more data is always better. However, we believe this is a misconception and recommend that retailers take an economic view of forecasting data. There is always a cost to adding more data sources to a forecast, and this cost must be balanced against the benefit the added data creates. A rule of thumb is to start your sales and promotions forecasting with data that is readily available in your company databases. Just think about all the data that you find in your e-commerce platform and BI system: sales volumes, prices (which have a huge impact), product hierarchies and promotions. That data provides you with a very good starting point. On top of that you can add external factors such as seasonal patterns, product trends and salary payments, which can be readily extracted from your transaction data. Less standardized data, such as above-the-line marketing data (e.g. TV campaign impressions), campaign plans and survey data, can and should be added to your forecasts further down the road, since that kind of data typically requires quite a lot of data engineering and manual fixes.
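As a rough illustration of how calendar effects can be derived from nothing more than the transaction date, here is a minimal Python sketch. The function name, the feature names and the payday on the 25th are our own assumptions for the example, not a description of any particular system; adjust them to your market.

```python
from datetime import date

# Assumption for illustration: salaries are paid on the 25th of each month.
PAYDAY_DAY_OF_MONTH = 25


def calendar_features(sale_date, payday=PAYDAY_DAY_OF_MONTH, window_days=7):
    """Derive simple seasonal and payday features from a transaction date."""
    weekday = sale_date.isoweekday()  # 1 = Monday ... 7 = Sunday
    return {
        "iso_week": sale_date.isocalendar()[1],  # week of year, captures seasonality
        "month": sale_date.month,
        "weekday": weekday,
        "is_weekend": weekday >= 6,
        # Simplification: only counts days after payday within the same month.
        "is_payday_window": payday <= sale_date.day < payday + window_days,
    }


if __name__ == "__main__":
    print(calendar_features(date(2024, 1, 26)))
```

Features like these can be joined to sales volumes, prices and promotion flags that already sit in your databases, without bringing in any new data source.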

2. Algorithms are more important than data

This myth has been around for some time. The idea is that you simply need a good algorithm to make an accurate forecast, or that machine learning/AI algorithms are some kind of magic box into which you can pour lots of unstructured data and expect good results.
In fact, for most problems it is much more important to use the right data and the right data structure for the specific problem. It doesn't matter whether you use machine learning models, neural networks, deep learning models or more traditional statistical models: you always have to feed and train the models with well-structured data. When you set up a forecast you should expect to invest the majority of your time and money in data engineering, and only a fraction of that in algorithm development. And, as far as algorithms are concerned, we strongly recommend that retailers use a multitude of models to make forecasts, rather than trying to build the "one perfect model" (it doesn't exist!).
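To make the idea of several models competing a bit more concrete, here is a minimal Python sketch. It assumes three deliberately simple candidate models, a four-week holdout and mean absolute error as the score; all names and parameters are hypothetical choices for the example, and a real setup would use many more (and better) models.

```python
from statistics import mean


def naive(history, horizon):
    """Repeat the last observed value."""
    return [history[-1]] * horizon


def moving_average(history, horizon, window=4):
    """Repeat the mean of the last `window` observations."""
    return [mean(history[-window:])] * horizon


def seasonal_naive(history, horizon, season=4):
    """Repeat the values observed one season ago."""
    return [history[-season + (i % season)] for i in range(horizon)]


CANDIDATES = {
    "naive": naive,
    "moving_average": moving_average,
    "seasonal_naive": seasonal_naive,
}


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def pick_best_model(series, holdout=4):
    """Score every candidate on the last `holdout` points and return the winner."""
    train, test = series[:-holdout], series[-holdout:]
    scores = {name: mae(test, model(train, holdout))
              for name, model in CANDIDATES.items()}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    weekly_units = [10, 20, 30, 40] * 3  # made-up series with a 4-week pattern
    print(pick_best_model(weekly_units))
```

The point of the design is that no model is trusted in advance: each one earns its place per product by how well it predicted data it had not seen.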

3. Once we have the data and the algorithm, we’re good to go

Sometimes frustrated retailers ask us for help because they have the models and all the data, but still can't get the forecasts to work. This is because data and algorithms in themselves can only take you so far. Retailers that want to predict product sales or promotions, and use those predictions in their ongoing business decisions, need an infrastructure to scale the forecasts. This includes infrastructure to validate the data that goes into the forecasting models (that is, to check it for errors), infrastructure to run and evaluate all forecasting models, and infrastructure to aggregate the results. Just to give an example: let’s say you have 50 000 SKUs in your assortment, and you want new forecasts every week. That means you need a way to run and evaluate at least 50 000 predictions every week (assuming you only run one model per SKU). It goes without saying that this cannot be done manually.
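The validate, run and aggregate steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the validation rules, the minimum of eight weeks of history and the placeholder moving-average model are hypothetical, and a production setup would add scheduling, logging and proper model evaluation.

```python
from statistics import mean


def validate_series(weekly_units, min_weeks=8):
    """Return a list of problems; an empty list means the series passed."""
    problems = []
    if len(weekly_units) < min_weeks:
        problems.append("too short")
    if any(u is None for u in weekly_units):
        problems.append("missing values")
    elif any(u < 0 for u in weekly_units):
        problems.append("negative sales")
    return problems


def forecast_next_week(weekly_units, window=4):
    """Placeholder model (assumption): mean of the last `window` weeks."""
    return mean(weekly_units[-window:])


def run_batch(sales_by_sku):
    """Validate every SKU, forecast the valid ones, collect the rejected ones."""
    forecasts, rejected = {}, {}
    for sku, units in sales_by_sku.items():
        problems = validate_series(units)
        if problems:
            rejected[sku] = problems
        else:
            forecasts[sku] = forecast_next_week(units)
    return forecasts, rejected


if __name__ == "__main__":
    sales = {  # made-up example data
        "A": [10] * 8,
        "B": [5, None, 3, 4, 5, 6, 7, 8],
        "C": [1, 2, 3],
    }
    forecasts, rejected = run_batch(sales)
    print(f"{len(forecasts)} forecasts, {len(rejected)} SKUs rejected: {rejected}")
```

With 50 000 SKUs and weekly runs, a loop like this produces 50 000 × 52 = 2.6 million forecasts a year for a single model per SKU, which is why the rejected list and the aggregated results matter as much as the forecasts themselves.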

To conclude, forecasting is critical for retailers' competitiveness and has applications for business decisions well outside the traditional supply chain context. The key to good forecasts is a limited number of data sources with high-quality, well-structured data that is analysed by multiple models. To make this happen you need an infrastructure that scales analytics and makes it accessible to the whole team, not just the number crunchers.

Merry forecasting! / the Formuate team

Arvid Stenback Lund