1NF and preservation of database result-set ordering through a database driver
According to the basics of relational theory, and specifically First Normal Form (1NF), a relation is unordered. Put another way,
- in theory there is no left to right ordering of relation attributes, or top to bottom or other ordering of tuples
- in practice there is no left to right ordering of table columns, or top to bottom or other ordering of table rows.
The relational database logical schema, an implementation of relational theory, embodies the base tables. The database external schema may expose the base tables, or perhaps views, or perhaps even an API where result-sets are returned as explicit cursors.
Of course middleware or front-end applications don’t communicate with the database directory, or more correctly the Database Management System (DBMS) directly, but through a Database Driver to the various objects exposed in the database external schema. The Driver, that might be something ubiquitous such as ODBC, but more often a native driver, does the end-to-end communication abstracting the intimate detail from the process, and provides a mechanism for interacting with database PSM’s, tables, views, mapping cursors etc.
Here’s a question: Is the Driver mandated by ISO/ANSI/BSI or other recognised Standards to preserve column ordering when a query is executed.
Here’s an answer: No
To add a bit of verbiage around this terse question and answer, if
SELECT a, b, c, d, e FROM somewhere
where provided to the Driver, would the application process or developer really care if the result set returned had column ordering b, d, e, c, a, or even c, a, b, d, e, or some other permutation? The underlying theory is clear – there shouldn’t be any sort of column ordering (attribute ordering). The only time a developer may raise an eyebrow is if column content were accessed by offset, ie. row n column 0 was expected to contain column a data, column 1 was expected to contain column b data, and so on. Nowadays, accepted practice is to retrieve data by name, eg. r[“a”], where “a” is the named column, or r[“b”] where “b” is the name column, …., under most circumstances. Accessing columns by name rather than offset aids source code maintenance, and insulates developers from column reordering, dropping, or insertion of new column content, and more.
So far, I have not yet written anything contentious or touched on anything even surprising.
Here’s a second and very similar question: Is the Driver mandated by ISO/ANSI/BSI or other Standards to preserve row ordering when a query is executed.
Here’s an answer: No
To put this question another way as I have done above, if
SELECT a, b, c, d, e FROM somewhere ORDER BY a ASC
where provided to the Driver, would the application process or developer really care if the result set returned was not ordered by “a” ASC, or unordered?
The answer should be no.
Relational theory states there is no ordering, the ORDER BY clause provides a type of override within the DBMS, but the Driver does not have to honour the ordering to invoking processes.
In a recent client project, I encountered a niche database Driver that actually partitioned the source query and made two parallel executions against the same database (with different predicates appended, for load balancing purposes). The result-set was the same, but the result-set was sorted/ordered differently almost every time the query was executed.
So in summary, database Drivers are not mandated to preserve row ordering by ISO/ANSI/BSI Standards, 1NF is unambiguous on the issue, but database implementations and the underlying theory differ. Almost all Drivers offered by mainstream vendors preserve result-set ordering by default, likely as they are just a conduit passing through the database result-set without much ado, but it isn’t mandated.
To summarise from a different viewpoint, a viewpoint I encountered on a recent client project, if some middeware or front-end process is assuming that the result-set returned from a database invocation should be ordered, an explicit sorting/ordering process should also be performed on the object collection post hoc. If the result-set returned by the database method invocation is already ordered (the ORDER BY clause is honoured by the Driver or this is one of those 999/1000 times where the result-set comes back ordered), no result-set reordering will be necessarily and the extra time will be the cost of an O(N) pass of the object collection. It’s also no big deal to perform this quick result-set sort either – if some ORM implementation is providing an object collection, that collection can be sorted in a trivial one-liner using technologies such as LINQ. The days of writing your own non-performant buggy sorting algorithm are well and truly gone.
— Published by Mike, 18:47:13 03 September 2016