## Tuesday, February 14, 2012

### Analytic sales forecast

Analytic functions have been very helpful at my work. One of the good examples is trying to forecast next years sales for each item taking into consideration seasonal variations and whether the item has been going up in sales the last years or declining. In this blog post I demonstrate how to use REGR_SLOPE for that purpose.

We have our sales statistics by month (though we're thinking about going to weekly data.)
So for the simple demo we create this very simple table:

```SQL> CREATE TABLE sales (
2   item  VARCHAR2(10),
3   mth  DATE,
4   qty  NUMBER
5  )
6  /

Table created.
```

We populate it with real data for the years 2008, 2009 and 2010 for two items:
• Snowchain, which typically sells in the winter and has been going up in sales.
• Sunshade, which sells in the summer and has been slightly declining.

Let us now pretend we are now in January 2011 and wishes to forecast the 2011 sales.
Let me start by showing the result we are aiming for (larger picture here) :

Snowchain (in blue) sells in the winter months. The dotted line shows it is generally going up in sales. The stapled line (forecast) is in a sense the 2010 curve moved 12 months to the right along the dotted line. Therefore the 2011 forecast of 882 is higher than 2010 sales of 691.

Similarly for Sunshade (in red), but the peaks are here in the summer months and the sales decline.

Let us see what we can do with REGR_SLOPE analytic function :
```
SQL> SELECT sales.item
2     , sales.mth
3     , sales.qty
4     , REGR_SLOPE(
5          sales.qty
6        , EXTRACT(YEAR FROM sales.mth) * 12 + EXTRACT(MONTH FROM sales.mth)
7       )
8       OVER (
9          PARTITION BY sales.item
10          ORDER BY sales.mth
11          RANGE BETWEEN INTERVAL '23' MONTH PRECEDING AND CURRENT ROW
12       ) slope
13    FROM sales
14   ORDER BY sales.item, sales.mth;

ITEM       MTH               QTY      SLOPE
---------- ---------- ---------- ----------
Snowchain  2008-01-01         79
Snowchain  2008-02-01        133         54
Snowchain  2008-03-01         24      -27,5
...
Snowchain  2010-10-01          1 -2,2743478
Snowchain  2010-11-01         73 -2,3630435
Snowchain  2010-12-01        160 -,99086957
...

72 rows selected.
```

REGR_SLOPE gives us the slope of the linear extrapolation of the dots in the graph.
The vertical axis of the graph contains sales.qty .
The horizontal axis is a numeric representation of the month (unit 1 = 1 month.)
PARTITION BY considers each item separately.
ORDER BY defines the order the RANGE clause will consider.
RANGE then says to calculate the slope based on the current row and 23 months back (= 2 years data.)

Note: our slope keeps changing as it uses a sliding 2-year window (rather than the simplified dotted line in the graph above.) By calculating the slope with a sliding window we can handle items that used to go up in sales the first years we sold it, but then started to decline as for example new technology gets on the market.

The calculated slope we can then use for moving each graph point 12 months into the future:
```
SQL> SELECT item
2     , mth
3     , qty
4     , qty + 12 * slope qty_next_year
5    FROM (
6   SELECT sales.item
7        , sales.mth
8        , sales.qty
9        , REGR_SLOPE(
10      sales.qty
11    , EXTRACT(YEAR FROM sales.mth) * 12 + EXTRACT(MONTH FROM sales.mth)
12          )
13          OVER (
14      PARTITION BY sales.item
15      ORDER BY sales.mth
16      RANGE BETWEEN INTERVAL '23' MONTH PRECEDING AND CURRENT ROW
17          ) slope
18     FROM sales
19       )
20   WHERE mth >= DATE '2010-01-01'
21   ORDER BY item, mth;

ITEM       MTH               QTY QTY_NEXT_YEAR
---------- ---------- ---------- -------------
Snowchain  2010-01-01        167    188,313043
Snowchain  2010-02-01        247    304,855652
Snowchain  2010-03-01         42    96,3913043
Snowchain  2010-04-01          0    42,6991304
Snowchain  2010-05-01          0    30,8869565
Snowchain  2010-06-01          0    19,0747826
Snowchain  2010-07-01          0     7,2626087
Snowchain  2010-08-01          1    -3,4295652
Snowchain  2010-09-01          0    -16,121739
Snowchain  2010-10-01          1    -26,292174
Snowchain  2010-11-01         73    44,6434783
Snowchain  2010-12-01        160    148,109565

24 rows selected.
```

For each month in 2010 we take the quantity and add the slope multiplied by 12 months.
That will give us the quantity expected for the same month next year.

But we like the numbers to be non-negative integers :
```
SQL> SELECT item
3     , GREATEST(ROUND(qty + 12 * slope), 0) forecast
4    FROM (
5   SELECT sales.item
6        , sales.mth
7        , sales.qty
8        , REGR_SLOPE(
9      sales.qty
10    , EXTRACT(YEAR FROM sales.mth) * 12 + EXTRACT(MONTH FROM sales.mth)
11          )
12          OVER (
13      PARTITION BY sales.item
14      ORDER BY sales.mth
15      RANGE BETWEEN INTERVAL '23' MONTH PRECEDING AND CURRENT ROW
16          ) slope
17     FROM sales
18       )
19   WHERE mth >= DATE '2010-01-01'
20   ORDER BY item, mth;

ITEM       MTH          FORECAST
---------- ---------- ----------
Snowchain  2011-01-01        188
Snowchain  2011-02-01        305
Snowchain  2011-03-01         96
Snowchain  2011-04-01         43
Snowchain  2011-05-01         31
Snowchain  2011-06-01         19
Snowchain  2011-07-01          7
Snowchain  2011-08-01          0
Snowchain  2011-09-01          0
Snowchain  2011-10-01          0
Snowchain  2011-11-01         45
Snowchain  2011-12-01        148

24 rows selected.
```

Here we round the numbers and make sure they don't go negative.
Also we add 12 months to the date so we get the month the forecast is for.
And presto - we have seen the future :-)

We can then concatenate this to the actual sales if we wish :
```
SQL> SELECT item
2     , mth
3     , qty
4     , type
5    FROM (
6   SELECT sales.item
7        , sales.mth
8        , sales.qty
9        , 'Actual' type
10     FROM sales
11   UNION ALL
12   SELECT item
14        , GREATEST(ROUND(qty + 12 * slope), 0) qty
15        , 'Forecast' type
16     FROM (
17      SELECT sales.item
18    , sales.mth
19    , sales.qty
20    , REGR_SLOPE(
21         sales.qty
22       , EXTRACT(YEAR FROM sales.mth) * 12 + EXTRACT(MONTH FROM sales.mth)
23      )
24      OVER (
25         PARTITION BY sales.item
26         ORDER BY sales.mth
27         RANGE BETWEEN INTERVAL '23' MONTH PRECEDING AND CURRENT ROW
28      ) slope
29        FROM sales
30          )
31    WHERE mth >= DATE '2010-01-01'
32       )
33   ORDER BY item, mth;

ITEM       MTH               QTY TYPE
---------- ---------- ---------- --------
Snowchain  2008-01-01         79 Actual
Snowchain  2008-02-01        133 Actual
Snowchain  2008-03-01         24 Actual
Snowchain  2008-04-01          1 Actual
Snowchain  2008-05-01          0 Actual
Snowchain  2008-06-01          0 Actual
Snowchain  2008-07-01          0 Actual
Snowchain  2008-08-01          0 Actual
Snowchain  2008-09-01          1 Actual
Snowchain  2008-10-01          4 Actual
Snowchain  2008-11-01         15 Actual
Snowchain  2008-12-01         74 Actual
Snowchain  2009-01-01        148 Actual
Snowchain  2009-02-01        209 Actual
Snowchain  2009-03-01         30 Actual
Snowchain  2009-04-01          2 Actual
Snowchain  2009-05-01          0 Actual
Snowchain  2009-06-01          0 Actual
Snowchain  2009-07-01          0 Actual
Snowchain  2009-08-01          1 Actual
Snowchain  2009-09-01          0 Actual
Snowchain  2009-10-01          3 Actual
Snowchain  2009-11-01         17 Actual
Snowchain  2009-12-01        172 Actual
Snowchain  2010-01-01        167 Actual
Snowchain  2010-02-01        247 Actual
Snowchain  2010-03-01         42 Actual
Snowchain  2010-04-01          0 Actual
Snowchain  2010-05-01          0 Actual
Snowchain  2010-06-01          0 Actual
Snowchain  2010-07-01          0 Actual
Snowchain  2010-08-01          1 Actual
Snowchain  2010-09-01          0 Actual
Snowchain  2010-10-01          1 Actual
Snowchain  2010-11-01         73 Actual
Snowchain  2010-12-01        160 Actual
Snowchain  2011-01-01        188 Forecast
Snowchain  2011-02-01        305 Forecast
Snowchain  2011-03-01         96 Forecast
Snowchain  2011-04-01         43 Forecast
Snowchain  2011-05-01         31 Forecast
Snowchain  2011-06-01         19 Forecast
Snowchain  2011-07-01          7 Forecast
Snowchain  2011-08-01          0 Forecast
Snowchain  2011-09-01          0 Forecast
Snowchain  2011-10-01          0 Forecast
Snowchain  2011-11-01         45 Forecast
Snowchain  2011-12-01        148 Forecast

96 rows selected.
```

And that is the data used for the graph shown above. "Forecast" data is the stapled lines.

The complete spool output of the demo script can be downloaded here.
And you can read more in Oracle documentation on REGR_SLOPE or the data warehousing guide on linear regression.

#### 1 comment:

1. great post Kim.
thanks.