Analytic sales forecast
Analytic functions have been very helpful at my work. One of the good examples is trying to forecast next years sales for each item taking into consideration seasonal variations and whether the item has been going up in sales the last years or declining. In this blog post I demonstrate how to use REGR_SLOPE for that purpose.
We have our sales statistics by month (though we're thinking about going to weekly data.)
So for the simple demo we create this very simple table:
We populate it with real data for the years 2008, 2009 and 2010 for two items:
Let us now pretend we are now in January 2011 and wishes to forecast the 2011 sales.
Let me start by showing the result we are aiming for (larger picture here) :
Snowchain (in blue) sells in the winter months. The dotted line shows it is generally going up in sales. The stapled line (forecast) is in a sense the 2010 curve moved 12 months to the right along the dotted line. Therefore the 2011 forecast of 882 is higher than 2010 sales of 691.
Similarly for Sunshade (in red), but the peaks are here in the summer months and the sales decline.
Let us see what we can do with REGR_SLOPE analytic function :
REGR_SLOPE gives us the slope of the linear extrapolation of the dots in the graph.
The vertical axis of the graph contains sales.qty .
The horizontal axis is a numeric representation of the month (unit 1 = 1 month.)
PARTITION BY considers each item separately.
ORDER BY defines the order the RANGE clause will consider.
RANGE then says to calculate the slope based on the current row and 23 months back (= 2 years data.)
Note: our slope keeps changing as it uses a sliding 2-year window (rather than the simplified dotted line in the graph above.) By calculating the slope with a sliding window we can handle items that used to go up in sales the first years we sold it, but then started to decline as for example new technology gets on the market.
The calculated slope we can then use for moving each graph point 12 months into the future:
For each month in 2010 we take the quantity and add the slope multiplied by 12 months.
That will give us the quantity expected for the same month next year.
But we like the numbers to be non-negative integers :
Here we round the numbers and make sure they don't go negative.
Also we add 12 months to the date so we get the month the forecast is for.
And presto - we have seen the future :-)
We can then concatenate this to the actual sales if we wish :
And that is the data used for the graph shown above. "Forecast" data is the stapled lines.
We have our sales statistics by month (though we're thinking about going to weekly data.)
So for the simple demo we create this very simple table:
SQL> CREATE TABLE sales (
2 item VARCHAR2(10),
3 mth DATE,
4 qty NUMBER
5 )
6 /
Table created.
We populate it with real data for the years 2008, 2009 and 2010 for two items:
- Snowchain, which typically sells in the winter and has been going up in sales.
- Sunshade, which sells in the summer and has been slightly declining.
Let us now pretend we are now in January 2011 and wishes to forecast the 2011 sales.
Let me start by showing the result we are aiming for (larger picture here) :
Snowchain (in blue) sells in the winter months. The dotted line shows it is generally going up in sales. The stapled line (forecast) is in a sense the 2010 curve moved 12 months to the right along the dotted line. Therefore the 2011 forecast of 882 is higher than 2010 sales of 691.
Similarly for Sunshade (in red), but the peaks are here in the summer months and the sales decline.
Let us see what we can do with REGR_SLOPE analytic function :
SQL> SELECT sales.item
2 , sales.mth
3 , sales.qty
4 , REGR_SLOPE(
5 sales.qty
6 , EXTRACT(YEAR FROM sales.mth) * 12 + EXTRACT(MONTH FROM sales.mth)
7 )
8 OVER (
9 PARTITION BY sales.item
10 ORDER BY sales.mth
11 RANGE BETWEEN INTERVAL '23' MONTH PRECEDING AND CURRENT ROW
12 ) slope
13 FROM sales
14 ORDER BY sales.item, sales.mth;
ITEM MTH QTY SLOPE
---------- ---------- ---------- ----------
Snowchain 2008-01-01 79
Snowchain 2008-02-01 133 54
Snowchain 2008-03-01 24 -27,5
...
Snowchain 2010-10-01 1 -2,2743478
Snowchain 2010-11-01 73 -2,3630435
Snowchain 2010-12-01 160 -,99086957
Sunshade 2008-01-01 4
Sunshade 2008-02-01 6 2
Sunshade 2008-03-01 32 14
...
Sunshade 2010-10-01 11 ,217391304
Sunshade 2010-11-01 3 -,20043478
Sunshade 2010-12-01 5 -,57391304
72 rows selected.
REGR_SLOPE gives us the slope of the linear extrapolation of the dots in the graph.
The vertical axis of the graph contains sales.qty .
The horizontal axis is a numeric representation of the month (unit 1 = 1 month.)
PARTITION BY considers each item separately.
ORDER BY defines the order the RANGE clause will consider.
RANGE then says to calculate the slope based on the current row and 23 months back (= 2 years data.)
Note: our slope keeps changing as it uses a sliding 2-year window (rather than the simplified dotted line in the graph above.) By calculating the slope with a sliding window we can handle items that used to go up in sales the first years we sold it, but then started to decline as for example new technology gets on the market.
The calculated slope we can then use for moving each graph point 12 months into the future:
SQL> SELECT item
2 , mth
3 , qty
4 , qty + 12 * slope qty_next_year
5 FROM (
6 SELECT sales.item
7 , sales.mth
8 , sales.qty
9 , REGR_SLOPE(
10 sales.qty
11 , EXTRACT(YEAR FROM sales.mth) * 12 + EXTRACT(MONTH FROM sales.mth)
12 )
13 OVER (
14 PARTITION BY sales.item
15 ORDER BY sales.mth
16 RANGE BETWEEN INTERVAL '23' MONTH PRECEDING AND CURRENT ROW
17 ) slope
18 FROM sales
19 )
20 WHERE mth >= DATE '2010-01-01'
21 ORDER BY item, mth;
ITEM MTH QTY QTY_NEXT_YEAR
---------- ---------- ---------- -------------
Snowchain 2010-01-01 167 188,313043
Snowchain 2010-02-01 247 304,855652
Snowchain 2010-03-01 42 96,3913043
Snowchain 2010-04-01 0 42,6991304
Snowchain 2010-05-01 0 30,8869565
Snowchain 2010-06-01 0 19,0747826
Snowchain 2010-07-01 0 7,2626087
Snowchain 2010-08-01 1 -3,4295652
Snowchain 2010-09-01 0 -16,121739
Snowchain 2010-10-01 1 -26,292174
Snowchain 2010-11-01 73 44,6434783
Snowchain 2010-12-01 160 148,109565
Sunshade 2010-01-01 2 -11,617391
Sunshade 2010-02-01 8 -11,137391
Sunshade 2010-03-01 28 9,11304348
Sunshade 2010-04-01 26 8,86086957
Sunshade 2010-05-01 23 9,66434783
Sunshade 2010-06-01 46 39,1130435
Sunshade 2010-07-01 73 79,4486957
Sunshade 2010-08-01 25 31,7147826
Sunshade 2010-09-01 13 18,0504348
Sunshade 2010-10-01 11 13,6086957
Sunshade 2010-11-01 3 ,594782609
Sunshade 2010-12-01 5 -1,8869565
24 rows selected.
For each month in 2010 we take the quantity and add the slope multiplied by 12 months.
That will give us the quantity expected for the same month next year.
But we like the numbers to be non-negative integers :
SQL> SELECT item
2 , ADD_MONTHS(mth, 12) mth
3 , GREATEST(ROUND(qty + 12 * slope), 0) forecast
4 FROM (
5 SELECT sales.item
6 , sales.mth
7 , sales.qty
8 , REGR_SLOPE(
9 sales.qty
10 , EXTRACT(YEAR FROM sales.mth) * 12 + EXTRACT(MONTH FROM sales.mth)
11 )
12 OVER (
13 PARTITION BY sales.item
14 ORDER BY sales.mth
15 RANGE BETWEEN INTERVAL '23' MONTH PRECEDING AND CURRENT ROW
16 ) slope
17 FROM sales
18 )
19 WHERE mth >= DATE '2010-01-01'
20 ORDER BY item, mth;
ITEM MTH FORECAST
---------- ---------- ----------
Snowchain 2011-01-01 188
Snowchain 2011-02-01 305
Snowchain 2011-03-01 96
Snowchain 2011-04-01 43
Snowchain 2011-05-01 31
Snowchain 2011-06-01 19
Snowchain 2011-07-01 7
Snowchain 2011-08-01 0
Snowchain 2011-09-01 0
Snowchain 2011-10-01 0
Snowchain 2011-11-01 45
Snowchain 2011-12-01 148
Sunshade 2011-01-01 0
Sunshade 2011-02-01 0
Sunshade 2011-03-01 9
Sunshade 2011-04-01 9
Sunshade 2011-05-01 10
Sunshade 2011-06-01 39
Sunshade 2011-07-01 79
Sunshade 2011-08-01 32
Sunshade 2011-09-01 18
Sunshade 2011-10-01 14
Sunshade 2011-11-01 1
Sunshade 2011-12-01 0
24 rows selected.
Here we round the numbers and make sure they don't go negative.
Also we add 12 months to the date so we get the month the forecast is for.
And presto - we have seen the future :-)
We can then concatenate this to the actual sales if we wish :
SQL> SELECT item
2 , mth
3 , qty
4 , type
5 FROM (
6 SELECT sales.item
7 , sales.mth
8 , sales.qty
9 , 'Actual' type
10 FROM sales
11 UNION ALL
12 SELECT item
13 , ADD_MONTHS(mth, 12) mth
14 , GREATEST(ROUND(qty + 12 * slope), 0) qty
15 , 'Forecast' type
16 FROM (
17 SELECT sales.item
18 , sales.mth
19 , sales.qty
20 , REGR_SLOPE(
21 sales.qty
22 , EXTRACT(YEAR FROM sales.mth) * 12 + EXTRACT(MONTH FROM sales.mth)
23 )
24 OVER (
25 PARTITION BY sales.item
26 ORDER BY sales.mth
27 RANGE BETWEEN INTERVAL '23' MONTH PRECEDING AND CURRENT ROW
28 ) slope
29 FROM sales
30 )
31 WHERE mth >= DATE '2010-01-01'
32 )
33 ORDER BY item, mth;
ITEM MTH QTY TYPE
---------- ---------- ---------- --------
Snowchain 2008-01-01 79 Actual
Snowchain 2008-02-01 133 Actual
Snowchain 2008-03-01 24 Actual
Snowchain 2008-04-01 1 Actual
Snowchain 2008-05-01 0 Actual
Snowchain 2008-06-01 0 Actual
Snowchain 2008-07-01 0 Actual
Snowchain 2008-08-01 0 Actual
Snowchain 2008-09-01 1 Actual
Snowchain 2008-10-01 4 Actual
Snowchain 2008-11-01 15 Actual
Snowchain 2008-12-01 74 Actual
Snowchain 2009-01-01 148 Actual
Snowchain 2009-02-01 209 Actual
Snowchain 2009-03-01 30 Actual
Snowchain 2009-04-01 2 Actual
Snowchain 2009-05-01 0 Actual
Snowchain 2009-06-01 0 Actual
Snowchain 2009-07-01 0 Actual
Snowchain 2009-08-01 1 Actual
Snowchain 2009-09-01 0 Actual
Snowchain 2009-10-01 3 Actual
Snowchain 2009-11-01 17 Actual
Snowchain 2009-12-01 172 Actual
Snowchain 2010-01-01 167 Actual
Snowchain 2010-02-01 247 Actual
Snowchain 2010-03-01 42 Actual
Snowchain 2010-04-01 0 Actual
Snowchain 2010-05-01 0 Actual
Snowchain 2010-06-01 0 Actual
Snowchain 2010-07-01 0 Actual
Snowchain 2010-08-01 1 Actual
Snowchain 2010-09-01 0 Actual
Snowchain 2010-10-01 1 Actual
Snowchain 2010-11-01 73 Actual
Snowchain 2010-12-01 160 Actual
Snowchain 2011-01-01 188 Forecast
Snowchain 2011-02-01 305 Forecast
Snowchain 2011-03-01 96 Forecast
Snowchain 2011-04-01 43 Forecast
Snowchain 2011-05-01 31 Forecast
Snowchain 2011-06-01 19 Forecast
Snowchain 2011-07-01 7 Forecast
Snowchain 2011-08-01 0 Forecast
Snowchain 2011-09-01 0 Forecast
Snowchain 2011-10-01 0 Forecast
Snowchain 2011-11-01 45 Forecast
Snowchain 2011-12-01 148 Forecast
Sunshade 2008-01-01 4 Actual
Sunshade 2008-02-01 6 Actual
Sunshade 2008-03-01 32 Actual
Sunshade 2008-04-01 45 Actual
Sunshade 2008-05-01 62 Actual
Sunshade 2008-06-01 58 Actual
Sunshade 2008-07-01 85 Actual
Sunshade 2008-08-01 28 Actual
Sunshade 2008-09-01 24 Actual
Sunshade 2008-10-01 19 Actual
Sunshade 2008-11-01 6 Actual
Sunshade 2008-12-01 8 Actual
Sunshade 2009-01-01 2 Actual
Sunshade 2009-02-01 13 Actual
Sunshade 2009-03-01 29 Actual
Sunshade 2009-04-01 60 Actual
Sunshade 2009-05-01 29 Actual
Sunshade 2009-06-01 78 Actual
Sunshade 2009-07-01 56 Actual
Sunshade 2009-08-01 22 Actual
Sunshade 2009-09-01 11 Actual
Sunshade 2009-10-01 13 Actual
Sunshade 2009-11-01 5 Actual
Sunshade 2009-12-01 3 Actual
Sunshade 2010-01-01 2 Actual
Sunshade 2010-02-01 8 Actual
Sunshade 2010-03-01 28 Actual
Sunshade 2010-04-01 26 Actual
Sunshade 2010-05-01 23 Actual
Sunshade 2010-06-01 46 Actual
Sunshade 2010-07-01 73 Actual
Sunshade 2010-08-01 25 Actual
Sunshade 2010-09-01 13 Actual
Sunshade 2010-10-01 11 Actual
Sunshade 2010-11-01 3 Actual
Sunshade 2010-12-01 5 Actual
Sunshade 2011-01-01 0 Forecast
Sunshade 2011-02-01 0 Forecast
Sunshade 2011-03-01 9 Forecast
Sunshade 2011-04-01 9 Forecast
Sunshade 2011-05-01 10 Forecast
Sunshade 2011-06-01 39 Forecast
Sunshade 2011-07-01 79 Forecast
Sunshade 2011-08-01 32 Forecast
Sunshade 2011-09-01 18 Forecast
Sunshade 2011-10-01 14 Forecast
Sunshade 2011-11-01 1 Forecast
Sunshade 2011-12-01 0 Forecast
96 rows selected.
And that is the data used for the graph shown above. "Forecast" data is the stapled lines.
The complete spool output of the demo script can be downloaded here.
And you can read more in Oracle documentation on REGR_SLOPE or the data warehousing guide on linear regression.
great post Kim.
ReplyDeletethanks.
Kim
ReplyDeleteI'm not good in programming SQL but still trying to make some sollutions.
Could you possible provide me with an example as above forecasting by date. I have a long history but thought of using one year history forecasting one month ahead.
Best
Fredrik Bjønness
Hi, Fredrik
DeleteWell, what forecasting model would fit your data? That is the big question ;-)
In this example I do an extremely primitive model that simply transposes points on the graph linearly. I have written an article (http://bit.ly/kibeha_forecast_SQL_article) about using a slightly more advanced (though still fairly primitive) time series model.
But both models are best for data with a seasonal/cyclic variation. In my case monthly datapoints in a yearly cycle. Daily datapoints in a monthly cycle has the problem that there are not the same number of days in each month, which in these primitive models are tricky. If the cycle is weekly, you could use the model to forecast 4 weeks based on 52 weeks history, for example.
However, if you don't know already that your data fits some sort of cyclic time series predictive model, it will usually give better results to use a forecasting tool instead of modelling it manually in SQL. If you have the license for Advanced Analytic Option, you can use the data mining packages for this (I have no experience myself with this.) If you don't have the license, you can pull out the data and use other tools like R or Python. Again I am not an expert, but I have written another article on using R (http://bit.ly/kibeha_forecast_R_article), where the forecasting functions in R can automatically try out many models and pick the one that fits the data best.
Having said that, here's a quick example of doing the very simple forecasting of this post on a daily basis: https://livesql.oracle.com/apex/livesql/s/itd3qkums22bux4icxv3qit5p
In that LiveSQL script I create 500 days of history (2018-12-31 and 500 days back). For all days in december I calculate the slope based on 1 year backwards from that day. This I use to transpose every day of december 31 days forward as a forecast of january. It is as stated a rather primitive forecasting model I use, but if you can put it to some use, go right ahead :-)
Cheerio
/Kim