Oracle Recursive With With Currywürste

At the age of 53 I have discovered the currywurst, a most delicious fast food found all over Germany. This food needs to be commercially discovered in the UK, my home, so I can enjoy it on-demand there too. On the internationally recognised O’Shea tastiness and satisfaction scale, the currywurst has a value of 6.022*1023, a magic number in more ways than one.  I tweeted my satisfaction with the currywurst a few weeks back too. My like for these things is also not fleeting, I have just purchased another one, and I include a photograph of this specific gastronomic marvel below. I am just about to eat it.

Yum yum, a lot gone, but still a bit left to gobble, and gobble it I will.

It is all gone now 🙁 Here is further photographic evidence. I wrote I was going to down it, I did, and I am a man of my word after all. I might opt for a German weißbier now too, perhaps two, but I digress. Those of you that know me can attest to this fact too, though I would politely request you do so to yourself quietly and not make comment on social media.

The basic ingredients of my currywurst are a bread roll (according to my German visual dictionary termed a Brötchen but here in Munich known as a Semmel), that in turn has ingredients of flour, water, salt, butter, yeast, and sugar (well I’m guessing these are the ingredients, I don’t know, perhaps I have forgotten seeds and more). Similarly there is the diced sausage, the curry powder, the sauce and its ingredients, and outlets other than Yormas Munich Hbf may offer the currywurst with chips rather than a semmel, so the recipe may vary.

Finally, all these ingredients have different manufacturing dates, and different expiry dates. No currywurst should be made with expired ingredients, and no currywurst should be sold to the consumer outside its expiry date either (that, by-the-way, is no more 10 seconds after it has been prepared). There are of course those peculiar food types too, and I doubt any would be included in the currywurst, that have a best after date, or do not use before date, that include some hung or uncured meats, wines, and cheeses.

In manufacturing terms, the currywurst ingredients, the ingredients ingredients, the ingredients ingredients ingredients, and so on, are a Bill of Materials (BOM).

For end-to-end

  • currywurst production (and most food), there will be quality processes in place to ensure that no out-of-date ingredients are included in the baking or cooking processes. Just thinking out loud – the use of out-of-date ingredients in food production could result in food poisoning, visits by food authorities, prosecution, and more.
  • software development, where there may be many in-house or third-party licenced components that have different component licence expiry dates, and their internal or embedded components in turn have different licence dates, there must also be a QA processes to ensure that nothing is included in the finished software application that is out-of-licence. Directly paraphrasing what I have written above, using components that are out-of-licence could result in claims of software piracy, royalty payment litigation, and more.

In an organisation of any size, the inventory and BOM would likely be in a database of some sort and (and this part I’ve just made up for the sake of discussion) it will be stored in a hierarchy. Rather than refer to individual software components, or currywurst [or their plural that is currywürste (I’m writing in English, but in German written as Currywürste as nouns are capitalised irrespective of where they appear in the sentence)], from here-on-in I will genericize the examples. For example, I represent some arbitrary widget as a hierarchical aggregate of other widgets, and these widgets in turn may be aggregates of other widgets, and so on. The example to the left shows one component, named Product A, that consists of other items that may be widgets or other products in their own right, ie. Product B and Product C. A currywurst, a software application, and an interdependent set of contracts all fit into this generic model, as does much else.  Furthermore, as above in the discussion about food expiry dates or software licence dates, each widget whatever it represents may have a valid-from and valid-to date, or its valid-from/to dates determined from its components.

How is this hierarchical information, the product names, the date boundaries, the relationship between the widgets, all represented within the database? I have concocted some dummy data to minimally represent this information, and show it below.

SQL>
SQL>
SQL>
SQL> SELECT *
  2    FROM bom;

 ID        WIDGETNAME   PARENTID VALIDFROM   VALIDTO
---------- ---------- ---------- --------- ---------
  100      Product A
 2016      Widget2016        100
 9994      Widget9994       2016
   81      Widget0081       9994
 1025      Widget1025         81 14-FEB-17
  105      Widget0105         81
  107      Widget0107       9994 20-FEB-17 10-NOV-17
  108      Widget0108       9994 01-MAY-17 14-AUG-17
  106      Widget0106       2016 03-FEB-17
  101      Widget0101        100
 8192      Widget8192        100 11-NOV-17
 9000      Product B         101 24-MAY-17 15-JUL-17
  110      Widget0110       9000
  109      Widget0109       9000 25-MAY-17
 3000      Widget3000        101
 1000      Product C        3000
 3110      Widget3110       1000 19-MAR-17 15-JAN-18
  104      Widget0104       3110 01-JAN-17 28-SEP-17
 3218      Widget3218       3110 16-MAR-17 31-DEC-17
 3219      Widget3219       3110 14-FEB-17 14-SEP-17
  103      Widget0103       1000 13-MAR-17 19-AUG-17
  102      Widget0102       3000
   14      Widget0014       3000 23-MAR-17 16-OCT-17

23 rows selected.

SQL>
SQL>

It is a little difficult to see the wood from the trees in this data, so to present it a more visually using the Oracle  START WITH CONNECT BY PRIOR query syntax, the way you used to perform hierarchical queries vide infra, the aggregate components of Product A is shown below. I have also added a bit of padding to show some component hierarchy and defaulted values for missing dates with the earliest and maximum dates supported by the current version of Oracle.

SQL> SELECT id,
 2      LPAD(' ', 4 * LEVEL - 1) || widgetName widgetName,
 3      COALESCE(validFrom, DATE '-4712-1-1') validFrom,
 4      COALESCE(validTo, DATE '9999-12-31') validTo
 5    FROM bom
 6      START WITH widgetName = 'Product A'
 7        CONNECT BY PRIOR id = parentID;

 ID        WIDGETNAME                               VALIDFROM   VALIDTO
---------- ---------------------------------------- --------- ---------
  100      Product A                                01-JAN-12 31-DEC-99
  101          Widget0101                           01-JAN-12 31-DEC-99
 3000              Widget3000                       01-JAN-12 31-DEC-99
   14                  Widget0014                   23-MAR-17 16-OCT-17
  102                  Widget0102                   01-JAN-12 31-DEC-99
 1000                  Product C                    01-JAN-12 31-DEC-99
  103                      Widget0103               13-MAR-17 19-AUG-17
 3110                      Widget3110               19-MAR-17 15-JAN-18
  104                          Widget0104           01-JAN-17 28-SEP-17
 3218                          Widget3218           16-MAR-17 31-DEC-17
 3219                          Widget3219           14-FEB-17 14-SEP-17
 9000              Product B                        24-MAY-17 15-JUL-17
  109                  Widget0109                   25-MAY-17 31-DEC-99
  110                  Widget0110                   01-JAN-12 31-DEC-99
 2016          Widget2016                           01-JAN-12 31-DEC-99
  106              Widget0106                       03-FEB-17 31-DEC-99
 9994              Widget9994                       01-JAN-12 31-DEC-99
   81                  Widget0081                   01-JAN-12 31-DEC-99
  105                      Widget0105               01-JAN-12 31-DEC-99
 1025                      Widget1025               14-FEB-17 31-DEC-99
  107              Widget0107                       20-FEB-17 10-NOV-17
  108              Widget0108                       01-MAY-17 14-AUG-17
 8192         Widget8192                            01-JAN-12 11-NOV-17

23 rows selected.

SQL>
SQL>

With the data presented in this hierarchy, it is clear that it maps to the image I have shown above for Products A, B, and C. Summarising, I have painted the BOM picture, and the data format, and that some dates are missing and use system limitation defaults. The big question now is how in SQL do you efficiently traverse this tree to get the most recent valid-from date and the earliest valid-to date. In other words the window of valid-from/valid-to derived from all the valid-from/valid-to dates for any widget in the database in the hierarchy, and the hierarchy may be large (an A380 is estimated to have 4 million parts, and Airbus make more than just this single airplane, and comparing a currywurst to an A380 is, I appreciate, a stretch by even my standards but my point is that this task requires a database and efficient design and it isn’t something that can be done efficiently or reliably by-hand or by eye-balling the data).

I have also been slightly mischievous in the wording of the sentences above,  the most recent valid-from date, and the earliest valid-to date. Put another way, if you believe that this data can be derived from a trivial MAX(validFrom) and MIN(validTo) columns of data shown in the hierarchical query above, then you have made a schoolboy error. I will explain.

The problem requires some thought, and the START WITH CONNECT BY PRIOR query syntax, although exceptionally simple and ideal for traversing hierarchical data, is deficient for the requirement. My direct translation of this non-standard and proprietary query syntax, this time without the widget name padding, is shown below. The query syntax is what is known colloquially as the Oracle Recursive WITH (but in the Oracle documentation referred to as recursive subquery refactoring, but in PostgreSQL, SQL Server, and ANSI standards referred to as Common Table Expressions or CTE’s) and initially it does look a little more complicated.

SQL>
SQL>
SQL>
SQL>
SQL> WITH dd AS (SELECT DATE '-4712-1-1' minValidFrom, DATE '9999-12-31' maxValidTo FROM DUAL),
 2    x (widgetName, childID, parentID, validFrom, validTo) AS
 3       (
 4           SELECT t0.widgetName, -- anchor member/start
 5                  t0.id childID,
 6                  t0.parentID parentID,
 7                  COALESCE(t0.validFrom, d0.minValidFrom),
 8                  COALESCE(t0.validTo, d0.maxValidTo)
 9             FROM bom t0
 10              CROSS JOIN dd d0
 11                WHERE t0.widgetName = 'Product A'
 12        UNION ALL
 13          SELECT t1.widgetName, -- recursive member
 14                 t1.id childID,
 15                 t1.parentID parentID,
 16                 COALESCE(t1.validFrom, d1.minValidFrom),
 17                 COALESCE(t1.validTo, d1.maxValidTo)
 18            FROM bom t1
 19             CROSS JOIN dd d1
 20               INNER JOIN x rw ON rw.childID = t1.parentID
 21       )
 22      SELECT t2.childID id,
 23             t2.widgetName,
 24             t2.validFrom,
 25             t2.validTo
 26        FROM x t2;

 ID        WIDGETNAME       VALIDFROM   VALIDTO
---------- ---------------- --------- ---------
  100      Product A        01-JAN-12 31-DEC-99
 2016      Widget2016       01-JAN-12 31-DEC-99
  101      Widget0101       01-JAN-12 31-DEC-99
 8192      Widget8192       01-JAN-12 11-NOV-17
 9994      Widget9994       01-JAN-12 31-DEC-99
  106      Widget0106       03-FEB-17 31-DEC-99
 9000      Product B        24-MAY-17 15-JUL-17
 3000      Widget3000       01-JAN-12 31-DEC-99
   81      Widget0081       01-JAN-12 31-DEC-99
  107      Widget0107       20-FEB-17 10-NOV-17
  108      Widget0108       01-MAY-17 14-AUG-17
  110      Widget0110       01-JAN-12 31-DEC-99
  109      Widget0109       25-MAY-17 31-DEC-99
 1000      Product C        01-JAN-12 31-DEC-99
  102      Widget0102       01-JAN-12 31-DEC-99
   14      Widget0014       23-MAR-17 16-OCT-17
 1025      Widget1025       14-FEB-17 31-DEC-99
  105      Widget0105       01-JAN-12 31-DEC-99
 3110      Widget3110       19-MAR-17 15-JAN-18
  103      Widget0103       13-MAR-17 19-AUG-17
  104      Widget0104       01-JAN-17 28-SEP-17
 3218      Widget3218       16-MAR-17 31-DEC-17
 3219      Widget3219       14-FEB-17 14-SEP-17

23 rows selected.

SQL>
SQL>

The result set however is the same.

To find the window, the most recent valid-from and valid-to dates, can be obtained using the the MIN and MAX aggregate functions thus:

SQL>
SQL>
SQL> WITH dd AS (SELECT DATE '-4712-1-1' minValidFrom, DATE '9999-12-31' maxValidTo FROM DUAL),
 2    x (widgetName, childID, parentID, validFrom, validTo) AS
 3       (
 4           SELECT t0.widgetName,
 5                   t0.id childID,
 6                   t0.parentID parentID,
 7                   COALESCE(t0.validFrom, d0.minValidFrom),
 8                   COALESCE(t0.validTo, d0.maxValidTo)
 9            FROM bom t0
 10             CROSS JOIN dd d0
 11               WHERE t0.widgetName = 'Product A'
 12       UNION ALL
 13          SELECT t1.widgetName,
 14                  t1.id childID,
 15                  t1.parentID parentID,
 16                  COALESCE(t1.validFrom, d1.minValidFrom),
 17                  COALESCE(t1.validTo, d1.maxValidTo)
 18           FROM bom t1
 19             CROSS JOIN dd d1
 20               INNER JOIN x rw ON rw.childID = t1.parentID
 21      )
 22    SELECT MAX(t2.validFrom) validFrom,
 23           MIN(t2.validTo) validTo
 24      FROM x t2;

VALIDFROM VALIDTO
--------- ---------
25-MAY-17 15-JUL-17

SQL>

The syntax of this SQL query also closely resembles the Standard, and also as implemented in PostgreSQL and Microsoft SQL Server too, so this has to be a good thing.

Here is the schoolboy error I referred to above. Did you notice the error in the data. Junk data would never occur in a corporate database would it? The junk … well notice that the validFrom date for widget0109 is after the validFrom date for Product B. How can Product B have a validFrom date from the date before?

Using the Recursive WITH, appending logic to prevent the query traversing the tree where the parent validTo node was before the child validFrom date can be addressed with a predicate. For example, this scenario can be directly accommodated for with the addition of

 WHERE COALESCE(t1.validFrom, d1.minValidFrom) <= rw.validFrom
    OR rw.validFrom = d1.minValidFrom

so execution of the full code snippet, tweaked to include this additional predicate, would be

SQL>
SQL>
SQL>
SQL> WITH dd AS (SELECT DATE '-4712-1-1' minValidFrom,  DATE '9999-12-31' maxValidTo FROM DUAL),
  2        x (widgetName, childID, parentID, validFrom, validTo) AS
  3        (
  4              SELECT t0.widgetName,
  5                     t0.id childID,
  6                     t0.parentID parentID,
  7                     COALESCE(t0.validFrom, d0.minValidFrom),
  8                     COALESCE(t0.validTo, d0.maxValidTo)
  9                FROM bom t0
 10                  CROSS JOIN dd d0
 11                    WHERE t0.widgetName = 'Product A'
 12          UNION ALL
 13              SELECT t1.widgetName,
 14                     t1.id childID,
 15                     t1.parentID parentID,
 16                     COALESCE(t1.validFrom, d1.minValidFrom),
 17                     COALESCE(t1.validTo, d1.maxValidTo)
 18                FROM bom t1
 19                  CROSS JOIN dd d1
 20                    INNER JOIN x rw ON rw.childID = t1.parentID
 21                      WHERE COALESCE(t1.validFrom, d1.minValidFrom) <= rw.validFrom
 22                         OR rw.validFrom = d1.minValidFrom
 23       )
 24     SELECT MAX(t2.validFrom) validFrom,
 25            MIN(t2.validTo) validTo
 26        FROM x t2;

 VALIDFROM VALIDTO
 --------- ---------
 24-MAY-17 15-JUL-17

SQL>
SQL>

Note the difference in the valid-from date, from 25-MAY-2017 to 24-MAY-2017.

What I have demonstrated in this blog post is the Oracle Recursive WITH (or CTE vide supra), as a replacement for the Oracle START WITH CONNECT BY PRIOR syntax.  I have also demonstrated how to add a predicate to the recursive member (there are two SELECT statements in the CTE, the anchor, a UNION ALL, and then the recursive member) to conditionally limit tree traversal. What I have not done, but it should be obvious but I will need to point this out, is that the START WITH CONNECT BY PRIOR syntax is restricted to traversing hierarchically organised data in a single table. The CTE can be used for multiple tables – there is nothing to limit the recursive or anchor member from containing any reasonable piece of SQL with multiple table joins, should the underlying data be structured in this manner.

I have rabbited on a bit (perhaps I did have those two weißbier after all  ….. burp) providing some background to a business problem where this type of hierarchical query could be used, and dressed it up a little to make things less of a dry read.

The code in this blog article was developed and tested against the following Oracle version.

SQL>
SQL> set linesize 132 wrap off tab off trunc off
SQL>
SQL> SELECT *
2      FROM v$version;

BANNER                                                                        CON_ID
----------------------------------------------------------------------------  ----------
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production 0
PL/SQL Release 12.1.0.2.0 - Production                                        0
CORE 12.1.0.2.0 Production                                                    0
TNS for HPUX: Version 12.1.0.2.0 - Production                                 0
NLSRTL Version 12.1.0.2.0 - Production                                        0
SQL>
SQL>

— Published by Mike, 18:45:31 28 May 2017 (CET)

Leave a Reply