Oracle SQL – Querying columns containing CSV ID’s
Data modellers, developers, and purists all hate it when multiple ID values are stored in a single database column value.
- Data modellers hate it as there are no Foreign Key (FK) constraints to some underlying table with the key being the ID. After all, how can these professionals use a UI tool to model data by clicking “next”, “next”, “next” and then having a pretty ERD autogenerated if the FK relationship does not preexist?
- Purists hate it as the underlying data is not atomic and therefore not even in First Normal Form (1NF). That really grates them … they learned that to not honour 1NF in their first relational theory class is a sin, but I suspect too that they learned about populating some sort of LOB or XMLTYPE with XML data is OK, despite it not being atomic, and didn’t raise an eyebrow as they didn’t think for themselves
- Developers hate it as there are no easy joins between ID values on the underlying tables, and to tease out these ID values requires more than mediocre development skills. Of course, deconvoluting joins in this manner may impact on performance too, considerably.
Regardless of the underlying reason why lists of ID values may be found in a single column of data, and the reasons may be valid too, in many performant production systems columns of data containing ID values to some other lookup table do exist. Yes, a mapping table seems the obvious approach, but again, there might be valid reasons for the approach.
About a year ago there was tearoom chatter in my current client to upgrade their oraas versions from 11g to a more recent version of Oracle. Oracle 19c was eventually rolled out. As part of the risk and impact analysis, and out of general interest, during a bit of down- and weekend-time I familiarised myself with the new language features on offer. Put another way, I was interested in new feature, changes, bug fixes and so on solely from the perspective of a developer.
One of the things that really stuck out for me was to be able to have an inline procedure or function as part of the WITH clause (more formally a CTE Expression), in other words a transient method that I can write as I would write normal PL/SQL, that doesn’t already have to be in the schema requiring additional permissions or roles. A good summary of this now not so new functionality is described on Oracle-Base.
Today at work I had the requirement to convert a single CSV column of data containing lookup ID’s to a single column of data containing the underlying lookup values. It was required for ad hoc report and was looked upon by me as a disposable use-once piece of SQL. As part of my extracurricular play with this new Oracle functionality, I already had a test/play script and all I needed to do was to tart it up a little. In the process I thought I would also write this small blog article up to further solidify my understanding and the rationale behind the approach. Further, and this is not really surprising, I find that a lot of software development nowadays is done using the software development by search engine paradigm. Basically this paradigm is typing in a few keywords into a search engine, picking the top or near top example in the hits where someone else has touted an approach, and then adopting (ie. copy/paste) this approach too. Depending on the keywords chosen, the search engine will serve up the same old crap that in turn gets rolled into new systems and the cycle repeats itself over and over. Yes, software development can be quite demoralising at times. Hopefully the search engine will serve up my approach …. so here goes.
Anyway, I have painted the picture, so now on with a little technical implementation detail.
Here is a contrived table, some employees, and some CSV list of ID’s for ISO country code.
SQL> select * from employees;
ID NAME RESIDENTINCOUNTRIES
---------- -------------------- --------------------
2 Christoph 49
3 Heiko 49
1 Mike 64,44,49,41
As I write above, do not judge the design, I am just writing that this design exists for a number of reasons and sometimes you just have to suck it up.
Historically to separate out the ID’s values would require some complex table multiset cast, a home-rolled PSM, or worse. The crux of this blog article is to demonstrate that these custom PSM’s are not now necessary and the CTE can use out-of-the box Oracle functionality (having the added bonus of not having to reach out to ever cooperative production DBA’s justifying why you need atypical PSM execute privileges).
Should the ID’s of 64, 49 etc. represent the ID values from some country lookup table, and that lookup table name is known, then the query can be performed as such:
SQL> with function inlineCsvLookup(list in varchar2) return varchar2 is
2 csv varchar2(4000) := null;
4 select listagg(c.countryName, ',')
5 within group (order by c.countryName asc)
6 into csv
7 from (
8 select to_number(regexp_substr(list, '[^,]+', 1, level)) id
9 from dual
10 connect by regexp_substr(list, '[^,]+', 1, level) is not null
11 ) t,
12 countries c
13 where t.id = c.id;
14 return csv;
16 select e.name, inlineCsvLookup(e.residentInCountries) residentInCountries
17 from employees e;
Mike Germany,New Zealand,Switzerland,United Kingdom
There is a lot of scope for improvement here, and the implementation will clearly be tailored for every specific system. Regardless this remains an innocuous way of deconvoluting CSV ID lists over the historical approaches. I will leave it to the reader to search out these historical approaches; what you will find is both frightening and shocking so ensure you are sitting down first!
ps, I am very away that this Oracle CONNECT BY hierarchical query proprietary syntax has been superseded. I have written a blog article on this previously. I have opted to use the CONNECT BY syntax herein however as it is a lot less wordy thereby not distracting from my point.
Published by Mike, 20:18:46 16 Feb 2021 (GMT)