LINQ: Expression or Query syntax? They’re equivalent, aren’t they?

Vocal members of the development community regularly spout their opinions. What they say has to be taken as fact, doesn’t it, especially as hordes of other vocal posters, spammers and copycats repeat the same viewpoints ad nauseam, just in different words? Eventually we reach a point where putting your head above the parapet and positing an alternative view is unthinkable.

The bottom line is that accepted truth is often wrong, and it goes unchallenged because the vast volume of repeated tripe was initially presented convincingly, or was presented or supported by someone with gravitas.

One of the arguments presented with such dogma is that the LINQ expression and query syntaxes are equivalent.

So, firstly and very tersely, what is the difference between the LINQ expression syntax (the form Microsoft's documentation calls method syntax) and the query syntax? Here is a small program that demonstrates one of the simplest joins, an inner join, between two enumerable sets. Read this code and the question answers itself.

using System.Linq;

namespace ConsoleApp952
{
    class Program
    {
        static void Main(string[] args)
        {
            int[] set1 = { 1, 2, 3 };
            int[] set2 = { 3, 4 };

            // query syntax: inner join on equality
            var join1 = from digit1 in set1
                        join digit2 in set2 on digit1 equals digit2
                        select digit2;

            // expression syntax: Intersect yields the distinct elements common to both sets
            var join2 = set1.Intersect(set2);
        }
    }
}

 

The LINQ joins above are likely convergent, by which I mean that both are compiled down to identical IL code. I have not checked it. There might be differences between compilers, in which set is evaluated first during the inner join, between .NET and .NET Core versions, in platform dependencies, in the way deferred execution works, and so on. The premise, however, is that execution performance and all other benchmarks for both joins are identical because the generated IL is identical. This is the fundamental argument used to justify the statement that the LINQ expression and query syntaxes are identical, and it is presented over and over as though it were the end of the story.
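For anyone who does want to check rather than assume, here is a minimal sketch. It leans on one thing I am confident of, namely that the C# compiler translates a join clause into a call to Enumerable.Join. The names JoinForms, QueryForm, MethodForm and IntersectForm are my own and purely illustrative. The sketch runs the query form, the corresponding Join call and Intersect over the same inputs so that the results can be compared.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative helper, not part of any library.
public static class JoinForms
{
    // Query syntax: inner join on equality.
    public static List<int> QueryForm(int[] set1, int[] set2) =>
        (from digit1 in set1
         join digit2 in set2 on digit1 equals digit2
         select digit2).ToList();

    // The explicit Enumerable.Join call that corresponds to the query above.
    public static List<int> MethodForm(int[] set1, int[] set2) =>
        set1.Join(set2,
                  digit1 => digit1,              // key taken from the outer element
                  digit2 => digit2,              // key taken from the inner element
                  (digit1, digit2) => digit2)    // what to return for each match
            .ToList();

    // Set intersection: each common element is returned once.
    public static List<int> IntersectForm(int[] set1, int[] set2) =>
        set1.Intersect(set2).ToList();

    public static void Main()
    {
        int[] set1 = { 1, 2, 3 };
        int[] set2 = { 3, 4 };

        Console.WriteLine(string.Join(",", QueryForm(set1, set2)));     // 3
        Console.WriteLine(string.Join(",", MethodForm(set1, set2)));    // 3
        Console.WriteLine(string.Join(",", IntersectForm(set1, set2))); // 3

        // Duplicates separate a join from an intersection.
        int[] dup = { 3, 3 };
        int[] one = { 3 };
        Console.WriteLine(string.Join(",", QueryForm(dup, one)));       // 3,3
        Console.WriteLine(string.Join(",", IntersectForm(dup, one)));   // 3
    }
}
```

With the two sets from the program above, all three forms return 3. Intersect is a set operation, though, so once an input contains duplicates it stops behaving like a join: {3, 3} against {3} gives 3, 3 from the query and from Join, but a single 3 from Intersect.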

The story does not stop here.

The fundamental difference between the two joins is that

  • the expression syntax uses an extension method (.Intersect), albeit one in the System.Linq namespace, and is imperative in nature, i.e. elements are piped through and the match, the join, is performed against the elements of the second set in that order
  • the query syntax is declarative, and neither the order of execution nor, implicitly, the join algorithm is specified.

Should some new optimization be introduced in the expression syntax, the internal transpilation and/or subsequent compilation of the query syntax as currently implemented would likely piggyback on this optimization for free, and again the IL code and all else would be identical.

Should some new optimization be introduced in the query syntax, the same is not true. The query syntax is declarative; the way the compiler chooses to implement the join just happens, for now, to be to compile the code down to the same IL. Imagine, however, that the .NET compiler team decided to introduce an efficiency where the currently unspecified flow and algorithm were to first sort each set, then probe the larger set with each element of the smaller set. In this case the runtime execution plan would be very different despite the query result being the same. The efficiency could be decided at run time too, the way that SQL optimisers identify the ‘driving table’ at run time in database theory, perhaps augmented by query or table hinting. Perhaps no sorting is performed at all and the join algorithm is tree traversal rather than ordered set probing (this is how it can be done in Oracle, e.g. see this post by Jonathan Lewis and my response in 2018). The declarative query syntax lends itself to these efficiency enhancements without any change to the code, and the theory is already proven.
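To make "same result, different execution plan" concrete, here is a rough sketch of two hand-written join algorithms over integer arrays. It is entirely hypothetical: JoinPlans, NestedLoopJoin and SortMergeJoin are names I made up, neither method is a claim about what .NET does or will do, and the sort-merge version assumes that neither input contains duplicate values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only; not how any .NET library implements joins.
public static class JoinPlans
{
    // Plan A: nested loops. Every outer element is compared with every inner element.
    public static List<int> NestedLoopJoin(int[] outer, int[] inner)
    {
        var result = new List<int>();
        foreach (int o in outer)
            foreach (int i in inner)
                if (o == i) result.Add(i);
        return result;
    }

    // Plan B: sort-merge. Sort both inputs, then walk two cursors forward in step.
    // Assumption: no duplicate values within either input.
    public static List<int> SortMergeJoin(int[] left, int[] right)
    {
        int[] a = left.OrderBy(x => x).ToArray();
        int[] b = right.OrderBy(x => x).ToArray();
        var result = new List<int>();
        int ia = 0, ib = 0;
        while (ia < a.Length && ib < b.Length)
        {
            if (a[ia] < b[ib]) ia++;          // left value too small, advance left
            else if (a[ia] > b[ib]) ib++;     // right value too small, advance right
            else { result.Add(b[ib]); ia++; ib++; } // match found
        }
        return result;
    }

    public static void Main()
    {
        int[] set1 = { 1, 2, 3 };
        int[] set2 = { 3, 4 };
        Console.WriteLine(string.Join(",", NestedLoopJoin(set1, set2))); // 3
        Console.WriteLine(string.Join(",", SortMergeJoin(set1, set2)));  // 3
    }
}
```

Both plans find the same matches while doing very different work. One visible side effect is ordering: for { 5, 1, 9, 3 } joined to { 3, 9, 7 } the nested loops return 9, 3 (outer order) while the sort-merge returns 3, 9 (sorted order), which is exactly the kind of detail a declarative syntax leaves unspecified.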

OK, I do not have a crystal ball. I do not know the aspirations of the .NET architects or whether an optimization of, or extension to, the limited LINQ query syntax is on the cards, but in summary the two syntaxes are not the same. Where I can, I write my LINQ using the query syntax. It is more readable, and for anyone who has written even a line of SQL the learning curve is lower. The syntax also future-proofs me a bit should the .NET compiler team choose alternative LINQ query execution optimizations and move away from the common IL code used for both the query and expression syntaxes. I do not think this will happen, but again my view is that the perception of the two syntaxes as equivalent is not the whole story.

Just my tuppence ...

Published by Mike, Saturday 22:21:34 08 October 2022 (CEST)
