Yesterday the system was running fun. Today it is slow. What changed? How many times has this damn question been asked? The answer is of course nothing, a little, or a lot, depending on your perspective, and especially when considering systems where the rate determining step involves a database.
Imagine some system where this month, the response time for aggregation of data for a critical report is taking 2hrs. Imagine for next month, the projection is 2hrs 20mins, the month after that 2hrs 40mins, then 3hrs, and so on. A finger in the air estimate is an additional 20mins/month due (due to an ever increasing amount of data populating the database). At some point, the time taken to aggregate the data for the report would become unbearable, impact on BAU processes, the phones would start ringing, and Directors would start banging their fists (or perhaps their head) on the desk.
You could plot the performance metrics on a graph, and extrapolate things out for the purposes of planning in a task to address the performance issues. I have contrived some data, and done just that, to support my thesis.
So
Any developer worth their salt, or manager worth the job title for that matter (this is merely SLA monitoring), would plan in some technical task to address performance degradation prior to month 31. Put another way, today is month 11, month 31 is therefore 20 months away, so there is time to address something else and more urgent now. This is the common sense approach, isn’t it? Or basic task prioritisation?
Enter the Oracle Cost Based Optimiser (CBO). The CBO is the crucial component of the Oracle (O)RDBMS for determining which of the many execution strategies should executed. The deciding factor in execution strategy, or plan, is cost, where cost means minimal system resource use. This is all an alternative way of saying a-function-of time (I dumb it down a bit, but this is the essence of the approach). Worded differently, there are likely many possible execution strategies and the CBO determines and selects the one that will execute the fastest, and this is the one that is used to retrieve the data.
The source of truth used by the CBO for determining the optimal execution strategy/path is some statistics; hardware performance metrics (single/multiblock read times), CPU speeds, and the distribution of the underlying data. From this information, for each SQL query, the CBO determines the strategy on whether, for example, if nested loops are used, which table is accessed first, or whether to use indexes (and if so, which ones) or perform full table scans, and so on.
As the underlying data changes, it is quite feasible that the execution strategy/plan changes. I refer to this as Execution Plan Lability rather than Execution Plan Stability – the theme however is that Oracle will choose a different execution strategy based on the statistics for the underlying data distribution at that time. Of course, if an alternate execution plan does produce a lower cost, and does execute faster, this is a good thing. On many many occasions it does not however (and yes, I want my cake and eat it too). On many occasions, for no apparent reason (likely a small edge-case change in the underlying data distribution, and/or a model or scenario not accommodated for within the CBO, or a bug), a totally inappropriate execution plan may be chosen once the database statistics are refreshed (this may occur nightly). When this happens, execution performance may dramatically and detrimentally change, and seemingly without reason. This scenario is depicted in orange in the graph above. In this scenario, without warning, around month 20 the system is hit by significant performance issues due to the lability of the execution plan. This is a terrible surprise – after all, how could any organisation plan for this scenario? The answer is it cannot, and the incontrovertible fact is that this is an unpredictable system. Furthermore, to rub salt into the wound, the system has been designed to be unpredictable. Is it really defensible to retain a system that exhibits such behaviour?
No! That is my answer.
In summary
Obviously I have a bee in my bonnet about Oracle CBO (and others including Microsoft SQL Server) DBMS implementations for determining execution path strategy, and more pointedly, execution plan lability. Am I justified in whinging about this approach while not offering an alternative? Perhaps not, but there remains nothing wrong with voicing opinion on why the current model for determining execution path strategy is broken.
— Published by Mike, 13:36:12 09 May 2017 (CET)
By Month: November 2022, October 2022, August 2022, February 2021, January 2021, December 2020, November 2020, March 2019, September 2018, June 2018, May 2018, April 2018
Apple, C#, Databases, Faircom, General IT Rant, German, Informatics, LINQ, MongoDB, Oracle, Perl, PostgreSQL, SQL, SQL Server, Unit Testing, XML/XSLT
Leave a Reply