Linq to SQL – DataLoadOptions (Previously DataShape)

DataLoadOptions allows you to specify which child entities you automatically want to be loaded with their parent entity. By defining how the data should be loaded, we can reduce the number of requests made to the database to retrieve information and thus improving performance of our application.
Take this code for example:

var query = from c in db.Customers
                  select c;

foreach (var c in query)
{
   Console.WriteLine(“{0}, Phone: {1} has {2} orders”, c.ContactName, c.Phone, c.Orders.Count);
}

This is a simple query which outputs the name, telephone number and the number of orders they have to the console.  There doesn’t look to be anything wrong with this, however if we run SQL Profiler that a large number of queries are being executed.

The query process is that all the customers are loaded into memory, and then for each customer a query is made to the orders table to return how many records they have which is then counted in memory.  Having queries like this will soon put pressure on your database.

This is where DataLoadOptions plays a part.  We can define that when a Customer object is loaded from the database, that all the Orders for that Customer should also be loaded with this code.

DataLoadOptions options = new DataLoadOptions();
options.LoadWith(Customer => Customer.Orders);
db.LoadOptions = options;
var query……

Now if we re-run the application, instead of having a query for each customer to find out the number of orders, we just have a single query being executed which returns the Customer and Order information.

SELECT [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName], [t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country],
[t0].[Phone], [t0].[Fax], [t1].[OrderID], [t1].[CustomerID] AS [CustomerID2], [t1].[EmployeeID], [t1].[OrderDate], [t1].[RequiredDate], [t1].[ShippedDate], [t1].[ShipVia],
[t1].[Freight], [t1].[ShipName], [t1].[ShipAddress], [t1].[ShipCity], [t1].[ShipRegion], [t1].[ShipPostalCode], [t1].[ShipCountry], (
    SELECT COUNT(*)
    FROM [dbo].[Orders] AS [t2]
    WHERE [t2].[CustomerID] = [t0].[CustomerID]
    ) AS [count]
FROM [dbo].[Customers] AS [t0]
LEFT OUTER JOIN [dbo].[Orders] AS [t1] ON [t1].[CustomerID] = [t0].[CustomerID]
ORDER BY [t0].[CustomerID], [t1].[OrderID]

Changing the foreach loop to output the order information will still only result in a single query being executed on the server.

foreach (var c in query)
{
    Console.WriteLine(“{0}, Phone: {1}”, c.ContactName, c.Phone);
    foreach (var o in c.Orders)
    {
        Console.WriteLine(“OrderID: {0}”, o.OrderID);
    }
}

However, it doesn’t solve all of our problems.  If we wanted a query which provided details on the Order Lines within the ProductID being outputted, we could write something like this:  

foreach (var c in query)
{
    Console.WriteLine(“{0}, Phone: {1}”, c.ContactName, c.Phone);
    foreach (var o in c.Orders)
    {
        Console.WriteLine(“Order Details for OrderID: {0} with {1} lines”, o.OrderID, o.Order_Details.Count);

        foreach (var od in o.Order_Details)
        {
            Console.WriteLine(“Product ID: {0}”, od.ProductID);
        }
    }
}

You would have thought that by adding the following code we would be able to limit the number of queries:

options.LoadWith(Order => Order.Order_Details);

Sadly, this is not the case and will still cause a query to be executed for each customer, in fact it is actually for each order. According to Michael Pizzo on the product team, this is by design with this explanation:

In LINQ to SQL, in order to reduce the complexity and amount of data returned by the join query required to do span, only one association in any given query is actually spanned in. LINQ to SQL picks the deepest level association in the query to do the join in order to minimize the number of queries generated.  So, in this case, by adding a span between orders and order_details, you are basically “hiding” any span of orders.

DataLoadOptions can improve the performance of your application and something which you should remember when writing Linq to SQL queries.  However, if we are going more than one layer deep, you are still going to have a huge number of queries being executed.  Remember to run SQL Profiler to make sure your application isn’t going to take out your production server(s).

Technorati Tags: , ,

Leave a Reply

Your email address will not be published. Required fields are marked *