Analytic sales forecast

Analytic functions have been very helpful at my work. One of the good examples is trying to forecast next years sales for each item taking into consideration seasonal variations and whether the item has been going up in sales the last years or declining. In this blog post I demonstrate how to use REGR_SLOPE for that purpose.

We have our sales statistics by month (though we're thinking about going to weekly data.)
So for the simple demo we create this very simple table:

```SQL> CREATE TABLE sales (
2   item  VARCHAR2(10),
3   mth  DATE,
4   qty  NUMBER
5  )
6  /

Table created.
```

We populate it with real data for the years 2008, 2009 and 2010 for two items:
• Snowchain, which typically sells in the winter and has been going up in sales.
• Sunshade, which sells in the summer and has been slightly declining.
(My demo script that populates the table can be downloaded here.)

Let us now pretend we are now in January 2011 and wishes to forecast the 2011 sales.
Let me start by showing the result we are aiming for (larger picture here) :

Snowchain (in blue) sells in the winter months. The dotted line shows it is generally going up in sales. The stapled line (forecast) is in a sense the 2010 curve moved 12 months to the right along the dotted line. Therefore the 2011 forecast of 882 is higher than 2010 sales of 691.

Similarly for Sunshade (in red), but the peaks are here in the summer months and the sales decline.

Let us see what we can do with REGR_SLOPE analytic function :
```
SQL> SELECT sales.item
2     , sales.mth
3     , sales.qty
4     , REGR_SLOPE(
5          sales.qty
6        , EXTRACT(YEAR FROM sales.mth) * 12 + EXTRACT(MONTH FROM sales.mth)
7       )
8       OVER (
9          PARTITION BY sales.item
10          ORDER BY sales.mth
11          RANGE BETWEEN INTERVAL '23' MONTH PRECEDING AND CURRENT ROW
12       ) slope
13    FROM sales
14   ORDER BY sales.item, sales.mth;

ITEM       MTH               QTY      SLOPE
---------- ---------- ---------- ----------
Snowchain  2008-01-01         79
Snowchain  2008-02-01        133         54
Snowchain  2008-03-01         24      -27,5
...
Snowchain  2010-10-01          1 -2,2743478
Snowchain  2010-11-01         73 -2,3630435
Snowchain  2010-12-01        160 -,99086957
Sunshade   2008-02-01          6          2
Sunshade   2008-03-01         32         14
...
Sunshade   2010-10-01         11 ,217391304
Sunshade   2010-11-01          3 -,20043478
Sunshade   2010-12-01          5 -,57391304

72 rows selected.
```

REGR_SLOPE gives us the slope of the linear extrapolation of the dots in the graph.
The vertical axis of the graph contains sales.qty .
The horizontal axis is a numeric representation of the month (unit 1 = 1 month.)
PARTITION BY considers each item separately.
ORDER BY defines the order the RANGE clause will consider.
RANGE then says to calculate the slope based on the current row and 23 months back (= 2 years data.)

Note: our slope keeps changing as it uses a sliding 2-year window (rather than the simplified dotted line in the graph above.) By calculating the slope with a sliding window we can handle items that used to go up in sales the first years we sold it, but then started to decline as for example new technology gets on the market.

The calculated slope we can then use for moving each graph point 12 months into the future:
```
SQL> SELECT item
2     , mth
3     , qty
4     , qty + 12 * slope qty_next_year
5    FROM (
6   SELECT sales.item
7        , sales.mth
8        , sales.qty
9        , REGR_SLOPE(
10      sales.qty
11    , EXTRACT(YEAR FROM sales.mth) * 12 + EXTRACT(MONTH FROM sales.mth)
12          )
13          OVER (
14      PARTITION BY sales.item
15      ORDER BY sales.mth
16      RANGE BETWEEN INTERVAL '23' MONTH PRECEDING AND CURRENT ROW
17          ) slope
18     FROM sales
19       )
20   WHERE mth >= DATE '2010-01-01'
21   ORDER BY item, mth;

ITEM       MTH               QTY QTY_NEXT_YEAR
---------- ---------- ---------- -------------
Snowchain  2010-01-01        167    188,313043
Snowchain  2010-02-01        247    304,855652
Snowchain  2010-03-01         42    96,3913043
Snowchain  2010-04-01          0    42,6991304
Snowchain  2010-05-01          0    30,8869565
Snowchain  2010-06-01          0    19,0747826
Snowchain  2010-07-01          0     7,2626087
Snowchain  2010-08-01          1    -3,4295652
Snowchain  2010-09-01          0    -16,121739
Snowchain  2010-10-01          1    -26,292174
Snowchain  2010-11-01         73    44,6434783
Snowchain  2010-12-01        160    148,109565
Sunshade   2010-01-01          2    -11,617391
Sunshade   2010-02-01          8    -11,137391
Sunshade   2010-03-01         28    9,11304348
Sunshade   2010-04-01         26    8,86086957
Sunshade   2010-05-01         23    9,66434783
Sunshade   2010-06-01         46    39,1130435
Sunshade   2010-07-01         73    79,4486957
Sunshade   2010-08-01         25    31,7147826
Sunshade   2010-09-01         13    18,0504348
Sunshade   2010-10-01         11    13,6086957
Sunshade   2010-11-01          3    ,594782609
Sunshade   2010-12-01          5    -1,8869565

24 rows selected.
```

For each month in 2010 we take the quantity and add the slope multiplied by 12 months.
That will give us the quantity expected for the same month next year.

But we like the numbers to be non-negative integers :
```
SQL> SELECT item
2     , ADD_MONTHS(mth, 12) mth
3     , GREATEST(ROUND(qty + 12 * slope), 0) forecast
4    FROM (
5   SELECT sales.item
6        , sales.mth
7        , sales.qty
8        , REGR_SLOPE(
9      sales.qty
10    , EXTRACT(YEAR FROM sales.mth) * 12 + EXTRACT(MONTH FROM sales.mth)
11          )
12          OVER (
13      PARTITION BY sales.item
14      ORDER BY sales.mth
15      RANGE BETWEEN INTERVAL '23' MONTH PRECEDING AND CURRENT ROW
16          ) slope
17     FROM sales
18       )
19   WHERE mth >= DATE '2010-01-01'
20   ORDER BY item, mth;

ITEM       MTH          FORECAST
---------- ---------- ----------
Snowchain  2011-01-01        188
Snowchain  2011-02-01        305
Snowchain  2011-03-01         96
Snowchain  2011-04-01         43
Snowchain  2011-05-01         31
Snowchain  2011-06-01         19
Snowchain  2011-07-01          7
Snowchain  2011-08-01          0
Snowchain  2011-09-01          0
Snowchain  2011-10-01          0
Snowchain  2011-11-01         45
Snowchain  2011-12-01        148

24 rows selected.
```

Here we round the numbers and make sure they don't go negative.
Also we add 12 months to the date so we get the month the forecast is for.
And presto - we have seen the future :-)

We can then concatenate this to the actual sales if we wish :
```
SQL> SELECT item
2     , mth
3     , qty
4     , type
5    FROM (
6   SELECT sales.item
7        , sales.mth
8        , sales.qty
9        , 'Actual' type
10     FROM sales
11   UNION ALL
12   SELECT item
13        , ADD_MONTHS(mth, 12) mth
14        , GREATEST(ROUND(qty + 12 * slope), 0) qty
15        , 'Forecast' type
16     FROM (
17      SELECT sales.item
18    , sales.mth
19    , sales.qty
20    , REGR_SLOPE(
21         sales.qty
22       , EXTRACT(YEAR FROM sales.mth) * 12 + EXTRACT(MONTH FROM sales.mth)
23      )
24      OVER (
25         PARTITION BY sales.item
26         ORDER BY sales.mth
27         RANGE BETWEEN INTERVAL '23' MONTH PRECEDING AND CURRENT ROW
28      ) slope
29        FROM sales
30          )
31    WHERE mth >= DATE '2010-01-01'
32       )
33   ORDER BY item, mth;

ITEM       MTH               QTY TYPE
---------- ---------- ---------- --------
Snowchain  2008-01-01         79 Actual
Snowchain  2008-02-01        133 Actual
Snowchain  2008-03-01         24 Actual
Snowchain  2008-04-01          1 Actual
Snowchain  2008-05-01          0 Actual
Snowchain  2008-06-01          0 Actual
Snowchain  2008-07-01          0 Actual
Snowchain  2008-08-01          0 Actual
Snowchain  2008-09-01          1 Actual
Snowchain  2008-10-01          4 Actual
Snowchain  2008-11-01         15 Actual
Snowchain  2008-12-01         74 Actual
Snowchain  2009-01-01        148 Actual
Snowchain  2009-02-01        209 Actual
Snowchain  2009-03-01         30 Actual
Snowchain  2009-04-01          2 Actual
Snowchain  2009-05-01          0 Actual
Snowchain  2009-06-01          0 Actual
Snowchain  2009-07-01          0 Actual
Snowchain  2009-08-01          1 Actual
Snowchain  2009-09-01          0 Actual
Snowchain  2009-10-01          3 Actual
Snowchain  2009-11-01         17 Actual
Snowchain  2009-12-01        172 Actual
Snowchain  2010-01-01        167 Actual
Snowchain  2010-02-01        247 Actual
Snowchain  2010-03-01         42 Actual
Snowchain  2010-04-01          0 Actual
Snowchain  2010-05-01          0 Actual
Snowchain  2010-06-01          0 Actual
Snowchain  2010-07-01          0 Actual
Snowchain  2010-08-01          1 Actual
Snowchain  2010-09-01          0 Actual
Snowchain  2010-10-01          1 Actual
Snowchain  2010-11-01         73 Actual
Snowchain  2010-12-01        160 Actual
Snowchain  2011-01-01        188 Forecast
Snowchain  2011-02-01        305 Forecast
Snowchain  2011-03-01         96 Forecast
Snowchain  2011-04-01         43 Forecast
Snowchain  2011-05-01         31 Forecast
Snowchain  2011-06-01         19 Forecast
Snowchain  2011-07-01          7 Forecast
Snowchain  2011-08-01          0 Forecast
Snowchain  2011-09-01          0 Forecast
Snowchain  2011-10-01          0 Forecast
Snowchain  2011-11-01         45 Forecast
Snowchain  2011-12-01        148 Forecast
Sunshade   2008-01-01          4 Actual
Sunshade   2008-02-01          6 Actual
Sunshade   2008-03-01         32 Actual
Sunshade   2008-04-01         45 Actual
Sunshade   2008-05-01         62 Actual
Sunshade   2008-06-01         58 Actual
Sunshade   2008-07-01         85 Actual
Sunshade   2008-08-01         28 Actual
Sunshade   2008-09-01         24 Actual
Sunshade   2008-10-01         19 Actual
Sunshade   2008-11-01          6 Actual
Sunshade   2008-12-01          8 Actual
Sunshade   2009-01-01          2 Actual
Sunshade   2009-02-01         13 Actual
Sunshade   2009-03-01         29 Actual
Sunshade   2009-04-01         60 Actual
Sunshade   2009-05-01         29 Actual
Sunshade   2009-06-01         78 Actual
Sunshade   2009-07-01         56 Actual
Sunshade   2009-08-01         22 Actual
Sunshade   2009-09-01         11 Actual
Sunshade   2009-10-01         13 Actual
Sunshade   2009-11-01          5 Actual
Sunshade   2009-12-01          3 Actual
Sunshade   2010-01-01          2 Actual
Sunshade   2010-02-01          8 Actual
Sunshade   2010-03-01         28 Actual
Sunshade   2010-04-01         26 Actual
Sunshade   2010-05-01         23 Actual
Sunshade   2010-06-01         46 Actual
Sunshade   2010-07-01         73 Actual
Sunshade   2010-08-01         25 Actual
Sunshade   2010-09-01         13 Actual
Sunshade   2010-10-01         11 Actual
Sunshade   2010-11-01          3 Actual
Sunshade   2010-12-01          5 Actual
Sunshade   2011-01-01          0 Forecast
Sunshade   2011-02-01          0 Forecast
Sunshade   2011-03-01          9 Forecast
Sunshade   2011-04-01          9 Forecast
Sunshade   2011-05-01         10 Forecast
Sunshade   2011-06-01         39 Forecast
Sunshade   2011-07-01         79 Forecast
Sunshade   2011-08-01         32 Forecast
Sunshade   2011-09-01         18 Forecast
Sunshade   2011-10-01         14 Forecast
Sunshade   2011-11-01          1 Forecast
Sunshade   2011-12-01          0 Forecast

96 rows selected.
```

And that is the data used for the graph shown above. "Forecast" data is the stapled lines.

The complete spool output of the demo script can be downloaded here.
And you can read more in Oracle documentation on REGR_SLOPE or the data warehousing guide on linear regression.

1. great post Kim.
thanks.

2. Kim

I'm not good in programming SQL but still trying to make some sollutions.
Could you possible provide me with an example as above forecasting by date. I have a long history but thought of using one year history forecasting one month ahead.

Best
Fredrik Bjønness

1. Hi, Fredrik

Well, what forecasting model would fit your data? That is the big question ;-)

In this example I do an extremely primitive model that simply transposes points on the graph linearly. I have written an article (http://bit.ly/kibeha_forecast_SQL_article) about using a slightly more advanced (though still fairly primitive) time series model.

But both models are best for data with a seasonal/cyclic variation. In my case monthly datapoints in a yearly cycle. Daily datapoints in a monthly cycle has the problem that there are not the same number of days in each month, which in these primitive models are tricky. If the cycle is weekly, you could use the model to forecast 4 weeks based on 52 weeks history, for example.

However, if you don't know already that your data fits some sort of cyclic time series predictive model, it will usually give better results to use a forecasting tool instead of modelling it manually in SQL. If you have the license for Advanced Analytic Option, you can use the data mining packages for this (I have no experience myself with this.) If you don't have the license, you can pull out the data and use other tools like R or Python. Again I am not an expert, but I have written another article on using R (http://bit.ly/kibeha_forecast_R_article), where the forecasting functions in R can automatically try out many models and pick the one that fits the data best.

Having said that, here's a quick example of doing the very simple forecasting of this post on a daily basis: https://livesql.oracle.com/apex/livesql/s/itd3qkums22bux4icxv3qit5p

In that LiveSQL script I create 500 days of history (2018-12-31 and 500 days back). For all days in december I calculate the slope based on 1 year backwards from that day. This I use to transpose every day of december 31 days forward as a forecast of january. It is as stated a rather primitive forecasting model I use, but if you can put it to some use, go right ahead :-)

Cheerio
/Kim