‘SELECT’ing a distinct set of values from CSV :- LINQ and Regex

Over the past couple of years, I have found myself refactoring a lot of other people's spaghetti code, often written by informed and able developers. My eyes roll each time I see the same old pattern: a simple lexing operation, usually extracting substrings from some delimited format (tab, CSV, etc.), implemented with a few loops over variables i and j, some substring function, and … well, I am even boring myself writing this down.

Each time I see this pattern, I come to the same conclusion: the solution is and will always be a bit of LINQ with a regular expression smuggled in there somewhere.

I decided, this time, to document my code on my new blog, rather than include the code in some project hidden away in Subversion.

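The code excerpt below shows concisely how to collate a distinct set of trimmed words from a comma-delimited string, with minimal C#. The regular expression treats every non-word character (commas and spaces included) and the underscore as a delimiter, so stray commas and runs of spaces simply fall away as empty fragments.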

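  using System;
  using System.Linq;
  using System.Text.RegularExpressions;

  String ss = "the quick brown fox,, , jumps over the lazy dog dog  dog  ";

  // Split on any non-word character or underscore, skip the empty strings
  // left between adjacent delimiters, and keep one copy of each word.
  var ds = from f in Regex.Split(ss, @"\W|_")
           where f != String.Empty
           group f by f into gf
           select gf.Key;

  foreach (var s in ds)
      Console.WriteLine(s);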
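The group f by f into gf … select gf.Key step is doing the job of LINQ's Distinct() operator. For comparison, here is a minimal sketch of the same idea in method syntax, wrapped in a reusable helper. The class name WordSplitter and method name DistinctWords are hypothetical, chosen only for illustration; the regular expression is the one used above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative, not from the original code.
public static class WordSplitter
{
    // Splits on any non-word character or underscore (same pattern as above),
    // drops the empty strings produced by adjacent delimiters,
    // and removes duplicate words.
    public static List<string> DistinctWords(string input)
    {
        return Regex.Split(input, @"\W|_")
                    .Where(f => f.Length > 0)
                    .Distinct()
                    .ToList();
    }
}
```

Calling WordSplitter.DistinctWords on the sample string above yields the eight words the, quick, brown, fox, jumps, over, lazy and dog. One design note: Enumerable.Distinct is documented as returning an unordered sequence, whereas GroupBy is documented to yield groups in order of each key's first appearance, so if first-occurrence order matters the query-syntax version above is the safer choice.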

I have included a partial screenshot below showing the runtime output, as unscripted developer proof of what the code does. I would now struggle to write this type of algorithm any other way, and I exhort others to use this type of code construct too.

[Screenshot: LINQSelectDistinctFromCSV, console output of the code above]

Now isn’t that just so simple and minimal, clear to understand, and well on the way to a maintainable codebase!

— Published by Mike, 10:26:20 01 August 2016
