Monday, January 23, 2017

Temporal validity and open/closed intervals

A short while ago I created a SQL quiz for Oracle Dev Gym (PL/SQL Challenge) demonstrating the use of temporal validity and VERSIONS PERIOD FOR syntax to create a "change report" for changes in item prices. Use of temporal validity makes this easier, shorter and more readable code, but even though I thought I had it all covered, player Iudith Mentzel pointed out a tiny quirk I'd overlooked.

You see, the "change in price" wasn't really happening at one specific point in time - the old price was valid up to (but not including) a point in time (open-ended interval) and the new price became valid from (and including) that point in time (close-ended interval). If my "change report" happened to match those "ends" of the validity intervals exactly, it would never catch the "change" happening.

All of the SQL shown here can be found in the demo script here.

So for this demonstration I'll have a price table with a temporal validity period valid_price defined on it. I'll assume in this case each item will always have exactly one valid price at a given point in time - that is, that the periods for one item never overlaps and never has gaps in them.


VALID_FROM is included in the valid_price period interval (closed interval), VALID_UP_TO is excluded from valid_price period interval (open interval), and NULL in either date column means "from/to infinity".

create table prices (
   item        varchar2(10) not null
 , valid_from  date
 , valid_up_to date
 , price       number not null
 , period for valid_price (valid_from, valid_up_to)
)
/

insert into prices values ('Santabeard', null             , date '2016-12-11', 100)
/
insert into prices values ('Santabeard', date '2016-12-11', date '2016-12-27',  75)
/
insert into prices values ('Santabeard', date '2016-12-27', null             , 105)
/
insert into prices values ('Beachball' , null             , date '2016-11-10',  14)
/
insert into prices values ('Beachball' , date '2016-11-10', date '2016-12-01',  12)
/
insert into prices values ('Beachball' , date '2016-12-01', null             ,  15)
/
insert into prices values ('Chessgame' , null             , date '2016-12-16',  23)
/
insert into prices values ('Chessgame' , date '2016-12-16', null             ,  21)
/
commit
/

So let's just see an overview of the data:

alter session set nls_date_format = 'YYYY-MM-DD'
/

select *
  from prices
 order by item, valid_from nulls first
/

ITEM       VALID_FROM VALID_UP_T      PRICE
---------- ---------- ---------- ----------
Beachball             2016-11-10         14
Beachball  2016-11-10 2016-12-01         12
Beachball  2016-12-01                    15
Chessgame             2016-12-16         23
Chessgame  2016-12-16                    21
Santabeard            2016-12-11        100
Santabeard 2016-12-11 2016-12-27         75
Santabeard 2016-12-27                   105

With AS OF we can view valid prices at a specific point in time:

select *
  from prices as of period for valid_price date '2016-12-12'
 order by item, valid_from nulls first
/

ITEM       VALID_FROM VALID_UP_T      PRICE
---------- ---------- ---------- ----------
Beachball  2016-12-01                    15
Chessgame             2016-12-16         23
Santabeard 2016-12-11 2016-12-27         75

Typically used in a query for "currently valid price":

select *
  from prices as of period for valid_price sysdate
 order by item, valid_from nulls first
/

ITEM       VALID_FROM VALID_UP_T      PRICE
---------- ---------- ---------- ----------
Beachball  2016-12-01                    15
Chessgame  2016-12-16                    21
Santabeard 2016-12-27                   105

Note that VALID_UP_TO being "open-ended interval" means the Santabead price 100 (that has VALID_UP_TO value of 2016-12-11 00:00:00) is valid at 1 second before midnight:

select *
  from prices as of period for valid_price
                       to_date('2016-12-10 23:59:59', 'YYYY-MM-DD HH24:MI:SS')
 where item = 'Santabeard'
 order by item, valid_from nulls first
/

ITEM       VALID_FROM VALID_UP_T      PRICE
---------- ---------- ---------- ----------
Santabeard            2016-12-11        100

But at midnight, the price 100 is no longer valid, instead the price is 75:

select *
  from prices as of period for valid_price
                       to_date('2016-12-11 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
 where item = 'Santabeard'
 order by item, valid_from nulls first
/

ITEM       VALID_FROM VALID_UP_T      PRICE
---------- ---------- ---------- ----------
Santabeard 2016-12-11 2016-12-27         75

While AS OF gives valid price at specific point in time, VERSIONS BETWEEN gives all prices that are valid within a time period:

select *
  from prices versions period for valid_price
                 between to_date('2016-12-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
                     and to_date('2016-12-31 23:59:59', 'YYYY-MM-DD HH24:MI:SS')
order by item, valid_from nulls first, valid_up_to nulls last
/

ITEM       VALID_FROM VALID_UP_T      PRICE
---------- ---------- ---------- ----------
Beachball  2016-12-01                    15
Chessgame             2016-12-16         23
Chessgame  2016-12-16                    21
Santabeard            2016-12-11        100
Santabeard 2016-12-11 2016-12-27         75
Santabeard 2016-12-27                   105

We can use analytic LEAD function to have the VALID_UP_TO and PRICE of the "old" row and the VALID_FROM and PRICE of the "new" row together

select item
     , valid_up_to as old_valid_up_to
     , price as old_price
     , lead(valid_from) over (
          partition by item order by valid_up_to
       ) as new_valid_from
     , lead(price) over (
          partition by item order by valid_up_to
       ) as new_price
  from prices versions period for valid_price
                 between to_date('2016-12-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
                     and to_date('2016-12-31 23:59:59', 'YYYY-MM-DD HH24:MI:SS')
order by item, valid_from nulls first, valid_up_to nulls last
/

ITEM       OLD_VALID_  OLD_PRICE NEW_VALID_  NEW_PRICE
---------- ---------- ---------- ---------- ----------
Beachball                     15
Chessgame  2016-12-16         23 2016-12-16         21
Chessgame                     21
Santabeard 2016-12-11        100 2016-12-11         75
Santabeard 2016-12-27         75 2016-12-27        105
Santabeard                   105

Which can lead us to a query showing "price changes in December":

select item, old_valid_up_to, old_price, new_valid_from, new_price
  from (
   select item
        , valid_up_to as old_valid_up_to
        , price as old_price
        , lead(valid_from) over (
             partition by item order by valid_up_to
          ) as new_valid_from
        , lead(price) over (
             partition by item order by valid_up_to
          ) as new_price
     from prices versions period for valid_price
                    between to_date('2016-12-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
                        and to_date('2016-12-31 23:59:59', 'YYYY-MM-DD HH24:MI:SS')
  )
 where new_price is not null
 order by item, old_valid_up_to
/

ITEM       OLD_VALID_  OLD_PRICE NEW_VALID_  NEW_PRICE
---------- ---------- ---------- ---------- ----------
Chessgame  2016-12-16         23 2016-12-16         21
Santabeard 2016-12-11        100 2016-12-11         75
Santabeard 2016-12-27         75 2016-12-27        105

All good - we have a "change report" query. But now let's look at valid prices in November instead of December:

select *
  from prices versions period for valid_price
                 between to_date('2016-11-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
                     and to_date('2016-11-30 23:59:59', 'YYYY-MM-DD HH24:MI:SS')
order by item, valid_from nulls first, valid_up_to nulls last
/

ITEM       VALID_FROM VALID_UP_T      PRICE
---------- ---------- ---------- ----------
Beachball             2016-11-10         14
Beachball  2016-11-10 2016-12-01         12
Chessgame             2016-12-16         23
Santabeard            2016-12-11        100

And then we try the "change report" for November:

select item, old_valid_up_to, old_price, new_valid_from, new_price
  from (
   select item
        , valid_up_to as old_valid_up_to
        , price as old_price
        , lead(valid_from) over (
             partition by item order by valid_up_to
          ) as new_valid_from
        , lead(price) over (
             partition by item order by valid_up_to
          ) as new_price
     from prices versions period for valid_price
                    between to_date('2016-11-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
                        and to_date('2016-11-30 23:59:59', 'YYYY-MM-DD HH24:MI:SS')
  )
 where new_price is not null
 order by item, old_valid_up_to
/

ITEM       OLD_VALID_  OLD_PRICE NEW_VALID_  NEW_PRICE
---------- ---------- ---------- ---------- ----------
Beachball  2016-11-10         14 2016-11-10         12

OK, we've now shown the change report for the entirety of November, which reports a single price change for Beachball. Before we had the change report for the entirety of December, which reports no price changes for Beachball.

But aren't we missing something? How about if we make a change report for November+December together?

select item, old_valid_up_to, old_price, new_valid_from, new_price
  from (
   select item
        , valid_up_to as old_valid_up_to
        , price as old_price
        , lead(valid_from) over (
             partition by item order by valid_up_to
          ) as new_valid_from
        , lead(price) over (
             partition by item order by valid_up_to
          ) as new_price
     from prices versions period for valid_price
                    between to_date('2016-11-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
                        and to_date('2016-12-31 23:59:59', 'YYYY-MM-DD HH24:MI:SS')
  )
 where new_price is not null
 order by item, old_valid_up_to
/

ITEM       OLD_VALID_  OLD_PRICE NEW_VALID_  NEW_PRICE
---------- ---------- ---------- ---------- ----------
Beachball  2016-11-10         14 2016-11-10         12
Beachball  2016-12-01         12 2016-12-01         15
Chessgame  2016-12-16         23 2016-12-16         21
Santabeard 2016-12-11        100 2016-12-11         75
Santabeard 2016-12-27         75 2016-12-27        105

This time we spot that there was a change for Beachball from a price of 12 valid up to (not including) 2016-12-01 to a price of 15 valid from 2016-12-01. Even though our two previous change reports covered the same time period, neither reported this change.

The "problem" is the price interval "ends" match completely the intervals we are using in our November and December price change report, thus we won't get "old" and "new" row in the same set of data.

If we want such a "price change" report, we'll have to use a "closed" interval for the month we are reporting for, so that when we do next months report, we'll actually have a one second "overlap" with the report from this month:

select item, old_valid_up_to, old_price, new_valid_from, new_price
  from (
   select item
        , valid_up_to as old_valid_up_to
        , price as old_price
        , lead(valid_from) over (
             partition by item order by valid_up_to
          ) as new_valid_from
        , lead(price) over (
             partition by item order by valid_up_to
          ) as new_price
     from prices versions period for valid_price
                    between to_date('2016-11-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
                        and to_date('2016-12-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
  )
 where new_price is not null
 order by item, old_valid_up_to
/

ITEM       OLD_VALID_  OLD_PRICE NEW_VALID_  NEW_PRICE
---------- ---------- ---------- ---------- ----------
Beachball  2016-11-10         14 2016-11-10         12
Beachball  2016-12-01         12 2016-12-01         15

This isn't demonstrating any problem with temporal validity. The interval definitions are as they have to be and temporal validity is simply an easier way to deal with these things than doing it yourself with predicates involving proper combinations of AND, OR, >=, < and IS NULL.

But no matter if you do it yourself or you do it with temporal validity, you should beware of how "closed" and "open" intervals work and why the "semi-closed" intervals in temporal validity sometimes can fool you, like missing that I'd actually need "overlap" when doing an otherwise seemingly straight-forward "price change report".

Thank $deity for sharp Dev Gym players - I can always count on them to discover the little quirks I miss myself.

3 comments:

  1. Hi Kim,

    Before, there was a change that you didn't report in either month. Now that change is reported in both months.

    Personally, as a user I would want the change reported once, in this case in December when the new price became effective.

    select * from (
    select ITEM, VALID_UP_TO price_change_date, PRICE old_price,
    lead(price) over(partition by item order by valid_from nulls first) new_price
    from prices
    where (valid_up_to is null or valid_up_to >= date '2016-12-01')
    and (valid_from is null or valid_from < add_months(date '2016-12-01',1))
    )
    where new_price is not null
    order by item, price_change_date;

    Of course, this doesn't use temporal validity syntax anymore, which is unfortunate...

    Best regards, Stew

    ReplyDelete
    Replies
    1. Hi, Stew

      I figured it was just a matter of time before you would comment - you were just faster than I thought ;-)

      Anyway - no, my little "1 second overlap" does not report change in both months, only one. I do my November change report as shown in the blog post with:

      from prices versions period for valid_price
      between to_date('2016-11-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
      and to_date('2016-12-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')

      That shows 2 Beachball price changes - one in the month, one at the month switch.
      Then I do my December change report with:

      from prices versions period for valid_price
      between to_date('2016-12-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
      and to_date('2017-01-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')

      And it does not show any beachball price changes.

      You're right, though, that from a user perspective, it would probably be better to have my one-second overlap in the other direction, so it shows in December change report instead.

      from prices versions period for valid_price
      between date '2016-12-01' - interval '1' second
      and date '2016-12-01' + interval '1' month - interval '1' second

      That'll also do the trick and get me same results as your query.

      For this use case, I think your query is "neater". Having to do this "trick" of a one-second overlap is kind of unfortunate for temporal validity, but I don't see a way that it could be different.

      Main point is to beware of this, as temporal validity syntax makes it easy to "fall into the trap" that I did ;-)

      Thanks for the comment
      /Kim

      Delete
  2. Hi Kim & Stew,

    Thanks a lot for your thorough analysis :)

    The problem arises from the usual speaking (and Oracle !) semantics of the word BETWEEN,
    which means an interval closed at both edges.

    Thus, the combination of BETWEEN with the temporal validity semi-closed interval logic,
    like in the VERSIONS PERIOD FOR ... BETWEEN query is the cause of the "missing edge"
    effect that we see.

    Temporal validity seems "more naturally suited" for the one-point-in-time
    AS OF PERIOD FOR queries than for the BETWEEN closed interval logic.


    It is always easier "to philosophize" after, than while playing a quiz in a competition :)

    And, just like Kim, I myself should thank the "$deity" once again
    for having given us the treasure of Kim's quizzes and web posts :) :) :)

    Cheers & Best Regards,
    Iudith

    ReplyDelete