At the age of 53 I have discovered the currywurst, a most delicious fast food found all over Germany. This food needs to be commercially discovered in the UK, my home, so I can enjoy it on-demand there too. On the internationally recognised O’Shea tastiness and satisfaction scale, the currywurst has a value of 6.022*1023, a magic number in more ways than one. I tweeted my satisfaction with the currywurst a few weeks back too. My like for these things is also not fleeting, I have just purchased another one, and I include a photograph of this specific gastronomic marvel below. I am just about to eat it.
Yum yum, a lot gone, but still a bit left to gobble, and gobble it I will.
It is all gone now 🙁 Here is further photographic evidence. I wrote I was going to down it, I did, and I am a man of my word after all. I might opt for a German weißbier now too, perhaps two, but I digress. Those of you that know me can attest to this fact too, though I would politely request you do so to yourself quietly and not make comment on social media.
The basic ingredients of my currywurst are a bread roll (according to my German visual dictionary termed a Brötchen but here in Munich known as a Semmel), that in turn has ingredients of flour, water, salt, butter, yeast, and sugar (well I’m guessing these are the ingredients, I don’t know, perhaps I have forgotten seeds and more). Similarly there is the diced sausage, the curry powder, the sauce and its ingredients, and outlets other than Yormas Munich Hbf may offer the currywurst with chips rather than a semmel, so the recipe may vary.
Finally, all these ingredients have different manufacturing dates, and different expiry dates. No currywurst should be made with expired ingredients, and no currywurst should be sold to the consumer outside its expiry date either (that, by-the-way, is no more 10 seconds after it has been prepared). There are of course those peculiar food types too, and I doubt any would be included in the currywurst, that have a best after date, or do not use before date, that include some hung or uncured meats, wines, and cheeses.
In manufacturing terms, the currywurst ingredients, the ingredients ingredients, the ingredients ingredients ingredients, and so on, are a Bill of Materials (BOM).
For end-to-end
In an organisation of any size, the inventory and BOM would likely be in a database of some sort and (and this part I’ve just made up for the sake of discussion) it will be stored in a hierarchy. Rather than refer to individual software components, or currywurst [or their plural that is currywürste (I’m writing in English, but in German written as Currywürste as nouns are capitalised irrespective of where they appear in the sentence)], from here-on-in I will genericize the examples. For example, I represent some arbitrary widget as a hierarchical aggregate of other widgets, and these widgets in turn may be aggregates of other widgets, and so on. The example to the left shows one component, named Product A, that consists of other items that may be widgets or other products in their own right, ie. Product B and Product C. A currywurst, a software application, and an interdependent set of contracts all fit into this generic model, as does much else. Furthermore, as above in the discussion about food expiry dates or software licence dates, each widget whatever it represents may have a valid-from and valid-to date, or its valid-from/to dates determined from its components.
How is this hierarchical information, the product names, the date boundaries, the relationship between the widgets, all represented within the database? I have concocted some dummy data to minimally represent this information, and show it below.
SQL> SQL> SQL> SQL> SELECT * 2 FROM bom; ID WIDGETNAME PARENTID VALIDFROM VALIDTO ---------- ---------- ---------- --------- --------- 100 Product A 2016 Widget2016 100 9994 Widget9994 2016 81 Widget0081 9994 1025 Widget1025 81 14-FEB-17 105 Widget0105 81 107 Widget0107 9994 20-FEB-17 10-NOV-17 108 Widget0108 9994 01-MAY-17 14-AUG-17 106 Widget0106 2016 03-FEB-17 101 Widget0101 100 8192 Widget8192 100 11-NOV-17 9000 Product B 101 24-MAY-17 15-JUL-17 110 Widget0110 9000 109 Widget0109 9000 25-MAY-17 3000 Widget3000 101 1000 Product C 3000 3110 Widget3110 1000 19-MAR-17 15-JAN-18 104 Widget0104 3110 01-JAN-17 28-SEP-17 3218 Widget3218 3110 16-MAR-17 31-DEC-17 3219 Widget3219 3110 14-FEB-17 14-SEP-17 103 Widget0103 1000 13-MAR-17 19-AUG-17 102 Widget0102 3000 14 Widget0014 3000 23-MAR-17 16-OCT-17 23 rows selected. SQL> SQL>
It is a little difficult to see the wood from the trees in this data, so to present it a more visually using the Oracle START WITH CONNECT BY PRIOR query syntax, the way you used to perform hierarchical queries vide infra, the aggregate components of Product A is shown below. I have also added a bit of padding to show some component hierarchy and defaulted values for missing dates with the earliest and maximum dates supported by the current version of Oracle.
SQL> SELECT id, 2 LPAD(' ', 4 * LEVEL - 1) || widgetName widgetName, 3 COALESCE(validFrom, DATE '-4712-1-1') validFrom, 4 COALESCE(validTo, DATE '9999-12-31') validTo 5 FROM bom 6 START WITH widgetName = 'Product A' 7 CONNECT BY PRIOR id = parentID; ID WIDGETNAME VALIDFROM VALIDTO ---------- ---------------------------------------- --------- --------- 100 Product A 01-JAN-12 31-DEC-99 101 Widget0101 01-JAN-12 31-DEC-99 3000 Widget3000 01-JAN-12 31-DEC-99 14 Widget0014 23-MAR-17 16-OCT-17 102 Widget0102 01-JAN-12 31-DEC-99 1000 Product C 01-JAN-12 31-DEC-99 103 Widget0103 13-MAR-17 19-AUG-17 3110 Widget3110 19-MAR-17 15-JAN-18 104 Widget0104 01-JAN-17 28-SEP-17 3218 Widget3218 16-MAR-17 31-DEC-17 3219 Widget3219 14-FEB-17 14-SEP-17 9000 Product B 24-MAY-17 15-JUL-17 109 Widget0109 25-MAY-17 31-DEC-99 110 Widget0110 01-JAN-12 31-DEC-99 2016 Widget2016 01-JAN-12 31-DEC-99 106 Widget0106 03-FEB-17 31-DEC-99 9994 Widget9994 01-JAN-12 31-DEC-99 81 Widget0081 01-JAN-12 31-DEC-99 105 Widget0105 01-JAN-12 31-DEC-99 1025 Widget1025 14-FEB-17 31-DEC-99 107 Widget0107 20-FEB-17 10-NOV-17 108 Widget0108 01-MAY-17 14-AUG-17 8192 Widget8192 01-JAN-12 11-NOV-17 23 rows selected. SQL> SQL>
With the data presented in this hierarchy, it is clear that it maps to the image I have shown above for Products A, B, and C. Summarising, I have painted the BOM picture, and the data format, and that some dates are missing and use system limitation defaults. The big question now is how in SQL do you efficiently traverse this tree to get the most recent valid-from date and the earliest valid-to date. In other words the window of valid-from/valid-to derived from all the valid-from/valid-to dates for any widget in the database in the hierarchy, and the hierarchy may be large (an A380 is estimated to have 4 million parts, and Airbus make more than just this single airplane, and comparing a currywurst to an A380 is, I appreciate, a stretch by even my standards but my point is that this task requires a database and efficient design and it isn’t something that can be done efficiently or reliably by-hand or by eye-balling the data).
I have also been slightly mischievous in the wording of the sentences above, the most recent valid-from date, and the earliest valid-to date. Put another way, if you believe that this data can be derived from a trivial MAX(validFrom) and MIN(validTo) columns of data shown in the hierarchical query above, then you have made a schoolboy error. I will explain.
The problem requires some thought, and the START WITH CONNECT BY PRIOR query syntax, although exceptionally simple and ideal for traversing hierarchical data, is deficient for the requirement. My direct translation of this non-standard and proprietary query syntax, this time without the widget name padding, is shown below. The query syntax is what is known colloquially as the Oracle Recursive WITH (but in the Oracle documentation referred to as recursive subquery refactoring, but in PostgreSQL, SQL Server, and ANSI standards referred to as Common Table Expressions or CTE’s) and initially it does look a little more complicated.
SQL> SQL> SQL> SQL> SQL> WITH dd AS (SELECT DATE '-4712-1-1' minValidFrom, DATE '9999-12-31' maxValidTo FROM DUAL), 2 x (widgetName, childID, parentID, validFrom, validTo) AS 3 ( 4 SELECT t0.widgetName, -- anchor member/start 5 t0.id childID, 6 t0.parentID parentID, 7 COALESCE(t0.validFrom, d0.minValidFrom), 8 COALESCE(t0.validTo, d0.maxValidTo) 9 FROM bom t0 10 CROSS JOIN dd d0 11 WHERE t0.widgetName = 'Product A' 12 UNION ALL 13 SELECT t1.widgetName, -- recursive member 14 t1.id childID, 15 t1.parentID parentID, 16 COALESCE(t1.validFrom, d1.minValidFrom), 17 COALESCE(t1.validTo, d1.maxValidTo) 18 FROM bom t1 19 CROSS JOIN dd d1 20 INNER JOIN x rw ON rw.childID = t1.parentID 21 ) 22 SELECT t2.childID id, 23 t2.widgetName, 24 t2.validFrom, 25 t2.validTo 26 FROM x t2; ID WIDGETNAME VALIDFROM VALIDTO ---------- ---------------- --------- --------- 100 Product A 01-JAN-12 31-DEC-99 2016 Widget2016 01-JAN-12 31-DEC-99 101 Widget0101 01-JAN-12 31-DEC-99 8192 Widget8192 01-JAN-12 11-NOV-17 9994 Widget9994 01-JAN-12 31-DEC-99 106 Widget0106 03-FEB-17 31-DEC-99 9000 Product B 24-MAY-17 15-JUL-17 3000 Widget3000 01-JAN-12 31-DEC-99 81 Widget0081 01-JAN-12 31-DEC-99 107 Widget0107 20-FEB-17 10-NOV-17 108 Widget0108 01-MAY-17 14-AUG-17 110 Widget0110 01-JAN-12 31-DEC-99 109 Widget0109 25-MAY-17 31-DEC-99 1000 Product C 01-JAN-12 31-DEC-99 102 Widget0102 01-JAN-12 31-DEC-99 14 Widget0014 23-MAR-17 16-OCT-17 1025 Widget1025 14-FEB-17 31-DEC-99 105 Widget0105 01-JAN-12 31-DEC-99 3110 Widget3110 19-MAR-17 15-JAN-18 103 Widget0103 13-MAR-17 19-AUG-17 104 Widget0104 01-JAN-17 28-SEP-17 3218 Widget3218 16-MAR-17 31-DEC-17 3219 Widget3219 14-FEB-17 14-SEP-17 23 rows selected. SQL> SQL>
The result set however is the same.
To find the window, the most recent valid-from and valid-to dates, can be obtained using the the MIN and MAX aggregate functions thus:
SQL> SQL> SQL> WITH dd AS (SELECT DATE '-4712-1-1' minValidFrom, DATE '9999-12-31' maxValidTo FROM DUAL), 2 x (widgetName, childID, parentID, validFrom, validTo) AS 3 ( 4 SELECT t0.widgetName, 5 t0.id childID, 6 t0.parentID parentID, 7 COALESCE(t0.validFrom, d0.minValidFrom), 8 COALESCE(t0.validTo, d0.maxValidTo) 9 FROM bom t0 10 CROSS JOIN dd d0 11 WHERE t0.widgetName = 'Product A' 12 UNION ALL 13 SELECT t1.widgetName, 14 t1.id childID, 15 t1.parentID parentID, 16 COALESCE(t1.validFrom, d1.minValidFrom), 17 COALESCE(t1.validTo, d1.maxValidTo) 18 FROM bom t1 19 CROSS JOIN dd d1 20 INNER JOIN x rw ON rw.childID = t1.parentID 21 ) 22 SELECT MAX(t2.validFrom) validFrom, 23 MIN(t2.validTo) validTo 24 FROM x t2; VALIDFROM VALIDTO --------- --------- 25-MAY-17 15-JUL-17 SQL>
The syntax of this SQL query also closely resembles the Standard, and also as implemented in PostgreSQL and Microsoft SQL Server too, so this has to be a good thing.
Here is the schoolboy error I referred to above. Did you notice the error in the data. Junk data would never occur in a corporate database would it? The junk … well notice that the validFrom date for widget0109 is after the validFrom date for Product B. How can Product B have a validFrom date from the date before?
Using the Recursive WITH, appending logic to prevent the query traversing the tree where the parent validTo node was before the child validFrom date can be addressed with a predicate. For example, this scenario can be directly accommodated for with the addition of
WHERE COALESCE(t1.validFrom, d1.minValidFrom) <= rw.validFrom OR rw.validFrom = d1.minValidFrom
so execution of the full code snippet, tweaked to include this additional predicate, would be
SQL> SQL> SQL> SQL> WITH dd AS (SELECT DATE '-4712-1-1' minValidFrom, DATE '9999-12-31' maxValidTo FROM DUAL), 2 x (widgetName, childID, parentID, validFrom, validTo) AS 3 ( 4 SELECT t0.widgetName, 5 t0.id childID, 6 t0.parentID parentID, 7 COALESCE(t0.validFrom, d0.minValidFrom), 8 COALESCE(t0.validTo, d0.maxValidTo) 9 FROM bom t0 10 CROSS JOIN dd d0 11 WHERE t0.widgetName = 'Product A' 12 UNION ALL 13 SELECT t1.widgetName, 14 t1.id childID, 15 t1.parentID parentID, 16 COALESCE(t1.validFrom, d1.minValidFrom), 17 COALESCE(t1.validTo, d1.maxValidTo) 18 FROM bom t1 19 CROSS JOIN dd d1 20 INNER JOIN x rw ON rw.childID = t1.parentID 21 WHERE COALESCE(t1.validFrom, d1.minValidFrom) <= rw.validFrom 22 OR rw.validFrom = d1.minValidFrom 23 ) 24 SELECT MAX(t2.validFrom) validFrom, 25 MIN(t2.validTo) validTo 26 FROM x t2; VALIDFROM VALIDTO --------- --------- 24-MAY-17 15-JUL-17 SQL> SQL>
Note the difference in the valid-from date, from 25-MAY-2017 to 24-MAY-2017.
What I have demonstrated in this blog post is the Oracle Recursive WITH (or CTE vide supra), as a replacement for the Oracle START WITH CONNECT BY PRIOR syntax. I have also demonstrated how to add a predicate to the recursive member (there are two SELECT statements in the CTE, the anchor, a UNION ALL, and then the recursive member) to conditionally limit tree traversal. What I have not done, but it should be obvious but I will need to point this out, is that the START WITH CONNECT BY PRIOR syntax is restricted to traversing hierarchically organised data in a single table. The CTE can be used for multiple tables – there is nothing to limit the recursive or anchor member from containing any reasonable piece of SQL with multiple table joins, should the underlying data be structured in this manner.
I have rabbited on a bit (perhaps I did have those two weißbier after all ….. burp) providing some background to a business problem where this type of hierarchical query could be used, and dressed it up a little to make things less of a dry read.
The code in this blog article was developed and tested against the following Oracle version.
SQL> SQL> set linesize 132 wrap off tab off trunc off SQL> SQL> SELECT * 2 FROM v$version; BANNER CON_ID ---------------------------------------------------------------------------- ---------- Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production 0 PL/SQL Release 12.1.0.2.0 - Production 0 CORE 12.1.0.2.0 Production 0 TNS for HPUX: Version 12.1.0.2.0 - Production 0 NLSRTL Version 12.1.0.2.0 - Production 0 SQL> SQL>
— Published by Mike, 18:45:31 28 May 2017 (CET)
By Month: November 2022, October 2022, August 2022, February 2021, January 2021, December 2020, November 2020, March 2019, September 2018, June 2018, May 2018, April 2018
Apple, C#, Databases, Faircom, General IT Rant, German, Informatics, LINQ, MongoDB, Oracle, Perl, PostgreSQL, SQL, SQL Server, Unit Testing, XML/XSLT
Leave a Reply