The Evolution Of LINQ And Its Impact On The Design Of C#

I was a huge fan of the Connections series, hosted by James Burke, when it aired on the Discovery Channel. Its basic premise was that seemingly unrelated discoveries influenced other discoveries, which ultimately led to some modern-day convenience. The moral, if you will, is that no advancement is made in isolation. Not surprisingly, the same is true for Language Integrated Query (LINQ).

In simple terms, LINQ is a series of language extensions that supports data querying in a type-safe way; it will be released with the next version of Visual Studio, code-named “Orcas.” The data to be queried can take the form of XML (LINQ to XML), databases (LINQ-enabled ADO.NET, which includes LINQ to SQL, LINQ to DataSet, and LINQ to Entities), objects (LINQ to Objects), and so on. The LINQ architecture is shown in Figure 1.

Figure 1 LINQ Architecture

Let’s look at some code. A sample LINQ query in the upcoming “Orcas” version of C# might look like:

var overdrawnQuery = from account in db.Accounts
                     where account.Balance < 0
                     select new { account.Name, account.Address };

When the results of this query are iterated over using foreach, each element returned would consist of a name and address of an account that has a balance less than 0.
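To make the iteration step concrete, here is a minimal sketch that runs the same query shape over an in-memory list (LINQ to Objects) rather than a database. The Account class, the OverdrawnDemo class, and the sample data are hypothetical stand-ins; the article's db.Accounts source is not defined in the text, so a plain List<Account> is assumed in its place.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical account type: only the members the query touches.
public class Account
{
    public string Name { get; set; }
    public string Address { get; set; }
    public decimal Balance { get; set; }
}

public static class OverdrawnDemo
{
    // Returns one "Name, Address" string per account with a negative balance.
    public static List<string> DescribeOverdrawn(IEnumerable<Account> accounts)
    {
        // Same shape as the article's query, over an in-memory sequence.
        var overdrawnQuery = from account in accounts
                             where account.Balance < 0
                             select new { account.Name, account.Address };

        var lines = new List<string>();

        // Each element is an anonymous type carrying just Name and Address.
        foreach (var item in overdrawnQuery)
        {
            lines.Add(item.Name + ", " + item.Address);
        }
        return lines;
    }

    public static void Main()
    {
        // Assumed sample data, for illustration only.
        var accounts = new List<Account>
        {
            new Account { Name = "Ann", Address = "1 Elm St", Balance = -25.50m },
            new Account { Name = "Bob", Address = "2 Oak Ave", Balance = 100m },
            new Account { Name = "Cy", Address = "3 Pine Rd", Balance = -0.01m }
        };

        foreach (string line in DescribeOverdrawn(accounts))
        {
            Console.WriteLine(line);
        }
    }
}
```

With LINQ to Objects, the filter and projection run lazily as the foreach loop pulls each element, not when the query variable is assigned.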

It’s immediately obvious from the sample above that the syntax is like SQL. Several years ago, Anders Hejlsberg (chief designer of C#) and Peter Golde thought of extending C# to better integrate data querying. Peter, who was the C# compiler development lead at the time, was investigating the possibility of making the C# compiler extensible, specifically to support add-ins that could verify the syntax of domain-specific languages like SQL. Anders, on the other hand, was conceiving a deeper, more specific level of integration. He was thinking about a set of “sequence operators” that would operate on any collection that implemented IEnumerable, as well as remote queries for types that implemented IQueryable. Ultimately, the sequence operator idea gained the most support, and in early 2004 Anders submitted a paper about the idea to Bill Gates’s Thinkweek. The feedback was overwhelmingly positive. In the early stages of the design, a simple query had the following syntax:

sequence<Customer> locals = customers.where(ZipCode == 98112);

Sequence, in this case, was an alias for IEnumerable<T>, and the word “where” was a special operator understood by the compiler. The implementation of the where operator was a normal C# static method that took in a predicate delegate (that is, a delegate of the form bool Pred<T>(T item)). The idea was for the compiler to have special knowledge about the operator. This would allow the compiler to correctly call the static method and create the code to hook up the delegate to the expression.
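The static method behind such an operator can be sketched in C# 2.0 terms. This is an illustration under stated assumptions, not the design team's actual code: the SequenceOperators class name and the sample data are hypothetical, and the Pred<T> delegate simply follows the form given in the text.

```csharp
using System;
using System.Collections.Generic;

// Predicate delegate of the form described in the text.
public delegate bool Pred<T>(T item);

public static class SequenceOperators
{
    // Hypothetical "where" operator: an ordinary static method that
    // yields only the items for which the predicate returns true.
    public static IEnumerable<T> Where<T>(IEnumerable<T> source, Pred<T> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))
            {
                yield return item;
            }
        }
    }

    public static void Main()
    {
        int[] numbers = { 1, 2, 3, 4, 5, 6 };

        // Without compiler support, the caller wires up the delegate by hand.
        IEnumerable<int> evens = Where<int>(numbers, delegate(int x) { return x % 2 == 0; });

        foreach (int n in evens)
        {
            Console.WriteLine(n); // prints 2, 4, 6 on separate lines
        }
    }
}
```

The compiler support described above would amount to generating the delegate and the static method call from the bare expression, so the caller would not have to write either by hand.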

Let’s suppose that the early-design syntax, customers.where(ZipCode == 98112), is the ideal syntax for a query in C#. What would this query look like in C# 2.0, without any language extensions?

IEnumerable<Customer> locals = EnumerableExtensions.Where(customers,
    delegate(Customer c)
    {
        return c.ZipCode == 98112;
    });

This code is frightfully verbose, and worse, it requires significant digging to find the relevant filter (ZipCode == 98112). And this example is simple; imagine how much more unreadable this would be with several filters, projections, and so forth. The root of the verbosity is the syntax required for anonymous methods. In the ideal query, the filter would consist of nothing but the expression to be evaluated. The compiler would then attempt to infer the context; for example, that ZipCode was really referring to the ZipCode defined on Customer. How could this problem be fixed? Hardcoding the knowledge of specific operators into the language didn’t sit well with the language design team, so they started looking for an alternate syntax for anonymous methods. They wanted it to be extremely concise, and yet not necessarily require more knowledge than the compiler currently needed for anonymous methods. Ultimately they devised lambda expressions.
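For comparison, here is a minimal sketch of the same ZipCode filter written both ways, using the Where method from System.Linq as it shipped in C# 3.0. The Customer class, the LambdaDemo class, and the sample data are hypothetical; the lambda version also relies on extension-method call syntax, which is assumed here rather than explained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical customer type with only the members the filter needs.
public class Customer
{
    public string Name { get; set; }
    public int ZipCode { get; set; }
}

public static class LambdaDemo
{
    // C# 2.0 style: the filter is buried inside anonymous-method syntax.
    public static List<string> LocalsWithAnonymousMethod(IEnumerable<Customer> customers)
    {
        IEnumerable<Customer> locals = Enumerable.Where(customers,
            delegate(Customer c)
            {
                return c.ZipCode == 98112;
            });
        return locals.Select(c => c.Name).ToList();
    }

    // C# 3.0 style: the lambda states only the parameter and the expression;
    // the compiler infers that c is a Customer.
    public static List<string> LocalsWithLambda(IEnumerable<Customer> customers)
    {
        IEnumerable<Customer> locals = customers.Where(c => c.ZipCode == 98112);
        return locals.Select(c => c.Name).ToList();
    }

    public static void Main()
    {
        // Assumed sample data, for illustration only.
        var customers = new List<Customer>
        {
            new Customer { Name = "Dana", ZipCode = 98112 },
            new Customer { Name = "Eli", ZipCode = 10001 },
            new Customer { Name = "Fay", ZipCode = 98112 }
        };

        Console.WriteLine(string.Join(", ", LocalsWithAnonymousMethod(customers).ToArray()));
        Console.WriteLine(string.Join(", ", LocalsWithLambda(customers).ToArray()));
    }
}
```

Both methods select the same customers; the difference is only in how much syntax surrounds the filter expression.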