Linq to SQL – Mapping Tables to Objects

In this post I am going to cover how Linq to SQL deals with the mappings between the objects in your assembly and the database.

There are two ways in which Linq to SQL handles the mapping: AttributeMappingSource or XmlMappingSource.  By default, Linq uses AttributeMappingSource, where all the information which links tables and columns to classes and properties is stored as attributes within the DataContext.

For example, this code links the Categories object to dbo.Categories table.

[Table(Name="dbo.Categories")]
public partial class Categories : INotifyPropertyChanging, INotifyPropertyChanged

While this code links the CategoryID property to the column, also called CategoryID.

[Column(Storage="_CategoryID", AutoSync=AutoSync.OnInsert, DbType="Int NOT NULL IDENTITY", IsPrimaryKey=true, IsDbGenerated=true)]
public int CategoryID

There are a few advantages to having the attributes saved within the code. Reading the code is easier as everything is in a single place, and the compiler can verify that everything is consistent.

The other choice is to use an external file which contains all the database information as XML.  This is then combined with a DataContext (which doesn’t have attributes) in order to be able to query the database.

The external mapping file can be generated by SQLMetal, as I discussed in my previous post.  The following command would generate the file:

SqlMetal /server:. /database:northwind /map:"%~dp0Northwind.map"

The contents of the map file would look something like the following (a trimmed, illustrative sketch showing just the Categories table; the real file lists every mapped table and column).

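<?xml version="1.0" encoding="utf-8"?>
<Database Name="northwind" xmlns="http://schemas.microsoft.com/linqtosql/mapping/2007">
  <Table Name="dbo.Categories" Member="Categories">
    <Type Name="Categories">
      <Column Name="CategoryID" Member="CategoryID" Storage="_CategoryID" DbType="Int NOT NULL IDENTITY" IsPrimaryKey="true" IsDbGenerated="true" AutoSync="OnInsert" />
      <Column Name="CategoryName" Member="CategoryName" Storage="_CategoryName" DbType="NVarChar(15) NOT NULL" CanBeNull="false" />
    </Type>
  </Table>
</Database>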
We can then generate the DataContext code and tell it to use the external file by passing both the map and code options, like below.

SqlMetal /server:. /database:northwind /map:"%~dp0Northwind.map" /code:"%~dp0Northwind.cs"

The DataContext would then just be plain code, without any mapping attributes.

public partial class Categories : INotifyPropertyChanging, INotifyPropertyChanged

and

public int CategoryID

As with attributes, there are also advantages to using an external file.  For one, you can make schema changes without having to recompile the application, though only minor changes are possible without breaking the mapping.  Another advantage is that the same mapping file can be used by multiple DataContext objects.  One possible disadvantage is the two becoming out of sync, which would result in runtime errors.

However, to use this in our application we need to specify that we want to use the external mapping source; we can do this when constructing the DataContext.

Northwind db = new Northwind(LinqConsole.Properties.Settings.Default.NorthwindConnectionString, XmlMappingSource.FromUrl("Northwind.map"));

(Note: XmlMappingSource.FromUrl loads the mapping from a file, whereas FromXml expects the XML content itself as a string.)

So which one should you use?  It really depends on your requirements; for the most part, either will be fine.

Finally, instead of SQLMetal connecting to the database directly to generate the mapping and DataContext, it can obtain all the information from a dbml file.  The dbml file contains all the database metadata and everything SQLMetal requires to generate the code and mappings.

To generate the dbml you use this command:

SqlMetal /server:. /Database:Northwind /dbml:Northwind.dbml

This creates an XML file containing data like the sketch below (again trimmed to the Categories table).  It looks similar to the mapping file, however it is slightly different; most notably, the dbml also records the .NET type of each member, since it is used for code generation.

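<?xml version="1.0" encoding="utf-8"?>
<Database Name="Northwind" xmlns="http://schemas.microsoft.com/linqtosql/dbml/2007">
  <Table Name="dbo.Categories" Member="Categories">
    <Type Name="Categories">
      <Column Name="CategoryID" Member="CategoryID" Type="System.Int32" DbType="Int NOT NULL IDENTITY" IsPrimaryKey="true" IsDbGenerated="true" AutoSync="OnInsert" CanBeNull="false" />
      <Column Name="CategoryName" Member="CategoryName" Type="System.String" DbType="NVarChar(15) NOT NULL" CanBeNull="false" />
    </Type>
  </Table>
</Database>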

Then, to generate the mapping file and code from the dbml instead of a live database connection, you would use this command:

SqlMetal /map:"%~dp0Northwind.map" /code:"%~dp0Northwind.cs" Northwind.dbml


The power of SQLMetal

During my time with Linq, I have created all of my DataContexts using the designer in Visual Studio 2008.  I've been aware of SQLMetal but always thought it was just a command line tool which didn't offer much.  After looking at it today, I was wrong: SqlMetal is very powerful and cool!

To start with, SqlMetal can generate a DataContext for your entire database with a single command.  This is very useful if you have a large number of tables in your system, as dragging and dropping them onto the designer would get boring very quickly.  By entering this command we can have our Northwind DataContext created for us and saved to the file NorthwindDataContext.cs.

SqlMetal /server:. /database:Northwind /code:NorthwindDataContext.cs

We can then include the class within our project and use it as if the designer had created it.  However, it would have been nice if it accepted a list of tables to exclude, or only include, during the creation process.  It also doesn't create a constructor overload which uses the App.config connection string, as the designer does.
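If you need that behaviour, a partial class alongside the generated code can add it.  A minimal sketch, assuming a connection string named NorthwindConnectionString in App.config and a generated context called Northwind (both names are illustrative):

using System.Configuration;

public partial class Northwind
{
    // Parameterless constructor that pulls the connection string from
    // App.config, mimicking what the designer-generated context does.
    public Northwind()
        : this(ConfigurationManager.ConnectionStrings["NorthwindConnectionString"].ConnectionString)
    {
    }
}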

We can also get SqlMetal to include all of our views, functions and sprocs:

SqlMetal /server:. /database:Northwind /code:NorthwindDataContext.cs /views /functions /sprocs

If it cannot extract an item from the database into code, then it will continue with the process and report the error at the end.  In my case I had this single error:

warning SQM1014: Unable to extract stored procedure 'dbo.sp_upgraddiagrams' from SqlServer. Invalid object name 'dbo.dtproperties'.

I think that’s a really powerful feature and makes life a lot simpler than manually creating everything.

In the above example, we are asking SQLMetal to generate our DataContext as code. However, SQLMetal can also output the dbml (/dbml:file.dbml) or mapping (/map:file.xml) file for the database.

By default, the Linq to SQL designer adjusts table names to give singular entity names (Orders => Order).  SQLMetal doesn't do this by default, however adding /pluralize forces the same behaviour.  I've spoken about this before.

You can define the language in which the code should be generated using the /language: option, and you can set the namespace the DataContext should be part of by using the /namespace: option.

If you want a different name for your DataContext class then you can use the /context: option; by default it uses the same name as the database.
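For example, the following command (the namespace and context names are illustrative) combines several of these options:

SqlMetal /server:. /database:Northwind /code:NorthwindDataContext.cs /pluralize /language:csharp /namespace:MyCompany.Data /context:NorthwindDataContext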

Onto my two favourite options.  The entitybase option (/entitybase:) allows you to specify a class or interface which all the entities in the DataContext must inherit from.

SqlMetal /server:. /database:northwind /code:"%~dp0Northwind.cs" /entitybase:MyNamespace.ITable

The code generated would then look like this:

[Table(Name="dbo.Categories")]
public partial class Categories : MyNamespace.ITable, INotifyPropertyChanging, INotifyPropertyChanged

This is a very useful feature; as discussed in the previous post, we should place all custom code in a partial class.  I don't think an interface is the best fit in this situation, as you would need to implement it on every entity; however, it could be useful for base classes.

Finally, the /serialization:Unidirectional option adds a [DataContract()] attribute to the classes and [DataMember(Order=#)] to the properties.  These two attributes allow the objects to be serialised, which means they can be used with WCF.
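The generated code then looks something like this (an illustrative sketch based on the Categories entity shown earlier; the actual Order values depend on your schema):

[Table(Name="dbo.Categories")]
[DataContract()]
public partial class Categories : INotifyPropertyChanging, INotifyPropertyChanged

[Column(Storage="_CategoryID", AutoSync=AutoSync.OnInsert, DbType="Int NOT NULL IDENTITY", IsPrimaryKey=true, IsDbGenerated=true)]
[DataMember(Order=1)]
public int CategoryID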

Ben’s top tip

Use a classic batch file (.bat) to store your SQLMetal command. Then, when you need to regenerate the DataContext, you will not end up using different settings by mistake.  This batch file can also be included within your source control for the rest of the team to use.  Use the "%~dp0" variable to ensure it runs from its own location and not the default command line location; generally this should be your project folder.

Given this command in a batch file, %~dp0 is expanded to the path where the batch file is located.

SqlMetal /server:. /database:northwind /code:"%~dp0Northwind.cs"

Becomes

SqlMetal /server:. /database:northwind /code:"E:\PATH\TO\MY\PROJECTS\Northwind.cs"


Effects database changes have on Linq to SQL

In this post I'm going to discuss the effects that changing the underlying database table has on Linq to SQL, and make people aware of the potential issues, as quite a few people seem to be asking about this.  Changes to the underlying database structure are a problem with any application and data access approach, be it ADO.net and DataSets, SubSonic, Linq or anything else.  If the data structure is changed without your application being updated or knowing about the change, then you will run into problems.

In this application, I will be working with the following entity.

[Image: the Person table design, with columns id (int, identity, primary key), Name, PostCode and Phone (int).]

Writing queries against this isn’t a problem.   For example, if we wanted to write everything out to the console we could do this:

var query = from p in db.Persons
            select p;

foreach (var person in query)
{
    Console.WriteLine("{0} | {1} | {2} | {3}",
        person.id, person.Name, person.PostCode, person.Phone);
}

This could result in:

1 | Bob | AAA | 123
2 | Jack | BBB | 0
3 | Gill | MB | 12346

Adding additional columns

If we added an additional column, say email, onto our table then the above code would still work fine; however, we would never be able to access that additional column until we regenerated our DataContext (see below).

If we inserted a new person then this also wouldn't stop our application from working, unless the new column was set to be NOT NULL; however, we wouldn't be able to fill in that information.

Person newP = new Person();
newP.Name = "Test";
newP.PostCode = "TTTT";
newP.Phone = new Random().Next();
db.Persons.Add(newP);
db.SubmitChanges();

Changing column data types

Changing the types in our database could cause our application to stop functioning. If we change the int to a bigint in the table design then our query would no longer run, because Linq cannot convert a long to an int.  If we did something like change an nvarchar(50) to an nvarchar(MAX) then this wouldn't cause a problem, as Linq treats them both as a string.

Removing columns

Removing columns is where most problems will occur. If I removed the Phone column from the table and executed the query then I would receive a SqlException: Invalid column name 'Phone'.  If our query was like the one below, then it wouldn't cause an exception, as Phone would never be requested.

var query = from p in db.Persons
            select new { p.id, p.Name, p.PostCode };

However, when inserting data we will always get a SqlException, even if the column was not populated, as Linq will try to insert null into the column.

Regenerating our DataContext

So, if you are changing the underlying table, updating the DataContext in your system is also required.  In Visual Studio 2008 there are only two ways to update a DataContext: either regenerate the entire DataContext using SQLMetal, or remove the table from the DataContext using the designer and then insert it again.  Shame there isn't a refresh button on the designer.

Thankfully, all the objects are partial, so you can store all your custom logic in a separate class which will not be affected by this.  However, if you go against this and write your logic in the actual DataContext.designer.cs file, then when you recreate it you will lose those changes.
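As a sketch of what that separation looks like (using the Person entity from above; the property itself is purely illustrative):

public partial class Person
{
    // Custom logic lives in its own file, so regenerating the
    // designer code never touches it.
    public bool HasPostCode
    {
        get { return !string.IsNullOrEmpty(this.PostCode); }
    }
}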


Converting IEnumerable<decimal?> to IEnumerable<decimal>

This morning on the forum there was a question regarding nullable types (i.e. decimal?) and how to convert an IEnumerable collection containing nullable types into one containing the equivalent non-nullable type (decimal).  This raised an interesting question: how do you convert a collection containing one type of object into a collection containing another type, be it nullable to non-nullable or otherwise?

To start with, I looked at the extension methods of IEnumerable<>, in particular Cast<>().  However, if the value is null we need to handle it differently, which Cast<>() has no ability to do.  So no luck there, and there didn't seem to be any other suitable methods.

I then looked at the methods which List<> contains, since it implements IEnumerable<>.  List<> does have a method which fits this requirement, called ConvertAll(), which takes a Converter<TInput, TOutput> delegate; that in turn points at a method which is called for each object in the list to handle the conversion.  This allowed me to write the following code:

List<decimal> nonNull;
List<decimal?> nullable = new List<decimal?>();

nullable.Add(null);
nullable.Add(1);

nonNull = nullable.ConvertAll(new Converter<decimal?, decimal>(ConvertToNonNull));

foreach (var i in nonNull)
{
   Console.WriteLine(i);   
}

public static decimal ConvertToNonNull(decimal? nullValue)
{
    return nullValue.GetValueOrDefault(0);
}

ConvertToNonNull is called for every item in the list, taking advantage of generics to keep everything strongly typed.  GetValueOrDefault() is available on all nullable types; you simply give it, as a parameter, the value you want returned if the value is null.

The problem is that you have to be working in terms of List<> and not IEnumerable<>.  The solution: create your own extension method for IEnumerable<>.

public static class IEnumerableExtension
{
    public static IEnumerable<TOutput> ConvertAll<T, TOutput>(this IEnumerable<T> collection, Converter<T, TOutput> converter)
    {
        if (converter == null)
            throw new ArgumentNullException("converter");

        List<TOutput> list = new List<TOutput>();

        foreach (T i in collection)
        {
            list.Add(converter(i));
        }

        return list;
    }
}

The extension method above attaches itself to IEnumerable<T> and takes a Converter<T, TOutput> as a parameter, just the same as List<> does.  I first check that the converter is not null, then create a new internal list of the output type (we need somewhere to hold the converted items).  Then, for each item in the source collection, I call the converter method.  Finally, I return the list as IEnumerable<TOutput>.  Here's the calling code.

IEnumerable<decimal> notNull;
IEnumerable<decimal?> INull;

INull = nullable; //Original list.

notNull = INull.ConvertAll(new Converter<decimal?, decimal>(ConvertToNonNull)); //My new extension method

foreach (decimal i in notNull)
{
    Console.WriteLine(i);
}

One of the nice things about extension methods is that if a method with the same name and parameter list is defined on the class itself, the instance method takes precedence over the extension implementation.  So the List<>.ConvertAll() functionality is not affected.
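To illustrate the precedence rule (reusing the types and ConvertToNonNull method from above):

List<decimal?> source = new List<decimal?> { null, 1m };

// Resolves to List<T>.ConvertAll - the instance method wins.
List<decimal> viaList = source.ConvertAll(new Converter<decimal?, decimal>(ConvertToNonNull));

// Resolves to our extension method, because IEnumerable<T> itself
// has no instance ConvertAll.
IEnumerable<decimal?> asEnumerable = source;
IEnumerable<decimal> viaExtension = asEnumerable.ConvertAll(new Converter<decimal?, decimal>(ConvertToNonNull));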

Hope you find a use for this.


Linq to SQL – DataLoadOptions – Does it improve performance?

Following on from my previous post on DataLoadOptions, I decided to do a quick test to see if it does actually improve performance, even with its limitations.

If we take this query, with DataLoadOptions set only a single query is executed, while without it a query is executed per customer.

DataLoadOptions options = new DataLoadOptions();
options.LoadWith<Customer>(Customer => Customer.Orders);
db.LoadOptions = options;

var query = from c in db.Customers
            select c;

foreach (var c in query)
{
    Console.WriteLine("{0}, Phone: {1} with {2} orders", c.ContactName, c.Phone, c.Orders.Count);
}

Without using DataLoadOptions, it took 1,739,904 ticks to execute, while with them it took 1,487,722 ticks. An improvement, but nothing ultra special.

Now consider this query which, as covered in the previous post, causes a query to be executed for each order even with DataLoadOptions on:

DataLoadOptions options = new DataLoadOptions();
options.LoadWith<Customer>(Customer => Customer.Orders);
options.LoadWith<Order>(Order => Order.Order_Details);
db.LoadOptions = options;

var query = from c in db.Customers
            select c;

foreach (var c in query)
{
    Console.WriteLine("{0}, Phone: {1}", c.ContactName, c.Phone);
    foreach (var o in c.Orders)
    {
        Console.WriteLine("Order Details for OrderID: {0} with {1} lines", o.OrderID, o.Order_Details.Count);

        foreach (var od in o.Order_Details)
        {
            Console.WriteLine("Product ID: {0}", od.ProductID);
        }
    }
}

Without using DataLoadOptions, it took 10,003,725 ticks to execute, while with DataLoadOptions, taking into account the time to set up the options, it took 6,536,320 ticks.  The performance improvement is much clearer when a larger number of queries is being executed.

There are definite performance improvements to be had, even though it cannot load all the data in at the same time, so DataLoadOptions should be used when executing these kinds of queries.

However, what happens if we are not using the Order/Order Details information and are simply using Customer data?  Executing our first query again but without c.Orders.Count, with options set it takes 1,609,867 ticks; without, it takes 1,174,217 ticks.  So, if you're not using the child data, there is additional overhead to be aware of.


Linq to SQL – DataLoadOptions (Previously DataShape)

DataLoadOptions allows you to specify which child entities you want automatically loaded with their parent entity. By defining how the data should be loaded, we can reduce the number of requests made to the database to retrieve information, and thus improve the performance of our application.
Take this code for example:

var query = from c in db.Customers
            select c;

foreach (var c in query)
{
    Console.WriteLine("{0}, Phone: {1} has {2} orders", c.ContactName, c.Phone, c.Orders.Count);
}

This is a simple query which outputs each customer's name, telephone number and the number of orders they have to the console.  There doesn't look to be anything wrong with this; however, if we run SQL Profiler we can see that a large number of queries are being executed.

What actually happens is that all the customers are loaded into memory, and then for each customer a separate query is made to the Orders table to return their records, which are then counted in memory.  Having queries like this will soon put pressure on your database.
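In Profiler this shows up as one query for the customers, followed by one shaped roughly like this per customer (an illustrative reconstruction rather than captured output):

SELECT [t0].[OrderID], [t0].[CustomerID], [t0].[EmployeeID], ...
FROM [dbo].[Orders] AS [t0]
WHERE [t0].[CustomerID] = @p0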

This is where DataLoadOptions plays a part.  We can define that when a Customer object is loaded from the database, all the Orders for that Customer should also be loaded, using this code.

DataLoadOptions options = new DataLoadOptions();
options.LoadWith<Customer>(Customer => Customer.Orders);
db.LoadOptions = options;
var query……

Now if we re-run the application, instead of having a query for each customer to find out the number of orders, we just have a single query being executed which returns the Customer and Order information.

SELECT [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName], [t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country],
[t0].[Phone], [t0].[Fax], [t1].[OrderID], [t1].[CustomerID] AS [CustomerID2], [t1].[EmployeeID], [t1].[OrderDate], [t1].[RequiredDate], [t1].[ShippedDate], [t1].[ShipVia],
[t1].[Freight], [t1].[ShipName], [t1].[ShipAddress], [t1].[ShipCity], [t1].[ShipRegion], [t1].[ShipPostalCode], [t1].[ShipCountry], (
    SELECT COUNT(*)
    FROM [dbo].[Orders] AS [t2]
    WHERE [t2].[CustomerID] = [t0].[CustomerID]
    ) AS [count]
FROM [dbo].[Customers] AS [t0]
LEFT OUTER JOIN [dbo].[Orders] AS [t1] ON [t1].[CustomerID] = [t0].[CustomerID]
ORDER BY [t0].[CustomerID], [t1].[OrderID]

Changing the foreach loop to output the order information will still only result in a single query being executed on the server.

foreach (var c in query)
{
    Console.WriteLine("{0}, Phone: {1}", c.ContactName, c.Phone);
    foreach (var o in c.Orders)
    {
        Console.WriteLine("OrderID: {0}", o.OrderID);
    }
}

However, it doesn't solve all of our problems.  If we wanted a query which gave details of the Order Lines within each Order, with the ProductID being output, we could write something like this:

foreach (var c in query)
{
    Console.WriteLine("{0}, Phone: {1}", c.ContactName, c.Phone);
    foreach (var o in c.Orders)
    {
        Console.WriteLine("Order Details for OrderID: {0} with {1} lines", o.OrderID, o.Order_Details.Count);

        foreach (var od in o.Order_Details)
        {
            Console.WriteLine("Product ID: {0}", od.ProductID);
        }
    }
}

You would have thought that by adding the following code we would be able to limit the number of queries:

options.LoadWith<Order>(Order => Order.Order_Details);

Sadly, this is not the case; a query is still executed, and in fact one is executed for each order, not each customer.  According to Michael Pizzo on the product team, this is by design, with this explanation:

In LINQ to SQL, in order to reduce the complexity and amount of data returned by the join query required to do span, only one association in any given query is actually spanned in. LINQ to SQL picks the deepest level association in the query to do the join in order to minimize the number of queries generated.  So, in this case, by adding a span between orders and order_details, you are basically "hiding" any span of orders.

DataLoadOptions can improve the performance of your application and is something you should remember when writing Linq to SQL queries.  However, if you are going more than one level deep, you are still going to have a huge number of queries being executed.  Remember to run SQL Profiler to make sure your application isn't going to take out your production server(s).


Extending DataContext using Extension methods

Extension methods are great, as they allow you to extend any class in C# 3.0, with the instance of the class being passed in as a parameter.

In C# 2.0 we had an initial take on this with partial classes, which allow us to extend classes with our own custom logic; however, they only work if the class is marked as partial.

I talked before about how we can extend the DataContext created by the designer (NorthwindDataContext) using partial classes; however, if we wanted a method on every DataContext object in our system, we would have to create a partial class for each of them separately. This would result in a lot of duplicate code and a maintenance nightmare.

partial class DataClasses1DataContext { //… }

partial class DataClasses2DataContext { //… }

The other solution would be to add the functionality to the DataContext base class itself and have it inherited by the other objects in the system; however, this doesn't work, as the DataContext class is not partial.

The answer is extension methods. Extension methods allow us to extend any object in the system, even if it is marked as sealed.  They require the following:

  • The method must live in a public static class.
  • The method must be public static.
  • The first parameter must use the 'this' keyword, followed by the type we want to extend.  Other parameters can be added after it.
  • The class must be accessible from the calling code.  If it is in its own namespace, the namespace needs to be referenced via a using directive.

Using this, we can extend the DataContext class and have all of our own DataContexts in the system pick up the new functionality.

The following code extends DataContext and can access any public information on it through the parameter:

using System;
using System.Data.Linq;

namespace MyExtension
{
    public static class Extension
    {
        public static void PrintConnection(this DataContext dc)
        {
            Console.WriteLine(dc.Connection.ConnectionString);
        }
    }
}

This could then be accessed by any child object in the system like this:

DataClasses1DataContext db = new DataClasses1DataContext();
db.PrintConnection();

We are limited to the public interface of DataContext, but this is still a very useful technique, and it applies to any other object within .Net.


Partial Methods in Linq

Following on from my previous post, I noticed that Linq takes advantage of the partial methods feature of the language, so I thought I would take a closer look.

Within Visual Studio 2008 Beta 2, the DataContext and entity objects contain a number of partial methods which allow you to hook into various points of the execution.  All of the definitions are in the DataContext.designer.cs file, in a region called Extensibility Method Definitions.

The standard methods included within the DataContext are:

  • partial void OnCreated();
  • partial void InsertCustomer(Customer instance);
  • partial void UpdateCustomer(Customer instance);
  • partial void DeleteCustomer(Customer instance);

OnCreated() is called when the DataContext object is created, while the others are called during SubmitChanges().  Once SubmitChanges() is called, the DataContext starts processing any changes and calls these methods if they have been implemented.  However, I found that if you implement one of these methods, then the action is never passed to the database.  This could be a bug, so it is something to be aware of; or it might be by design, with your implementation overriding the default logic.  It's not very clear.  If it does override the functionality and isn't a bug, I don't like the fact that there is no distinction between a method which is a hook and a method which overrides functionality; if it was something like 'partial override void ...' then it would be a lot better than how it is at the moment.
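For illustration, hooking one of these is just a matter of supplying the matching partial method in your own partial class (a sketch using the Northwind context; note the caveat above about the default action):

public partial class Northwind
{
    // Runs during SubmitChanges() when a Customer is being inserted.
    partial void InsertCustomer(Customer instance)
    {
        Console.WriteLine("Inserting customer {0}", instance.CustomerID);
    }
}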

The database entities also have extension points:

  • partial void OnLoaded();
  • partial void OnValidate();
  • partial void OnCreated();
  • partial void OnOrderIDChanging(int value);
  • partial void OnOrderIDChanged();

These partial methods are a little more interesting.  OnCreated() is called when an object is initialised; when loading from the database, OnCreated() is called first, then OnLoaded().  OnXChanging() is called when the property is changed, just before the value is stored in the internal variable (not when SubmitChanges() is called), and OnXChanged() is called after the value is stored.

The most important method is OnValidate(), which is called just before the action is passed to the database.  Here you can add custom/business logic to verify that the object is correct and valid before being saved; you can access everything, as you are inside the object (for example, this.OrderID).  If there is a problem, you can throw an exception and catch it by putting a try/catch around db.SubmitChanges(). A very useful feature.
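A minimal sketch of such a validation hook, assuming an Order entity like the one generated for Northwind (where OrderDate and RequiredDate are nullable DateTime properties):

public partial class Order
{
    partial void OnValidate()
    {
        // Business rule checked just before the change reaches the database;
        // throwing here surfaces as an exception from db.SubmitChanges().
        if (this.RequiredDate.HasValue && this.OrderDate.HasValue && this.RequiredDate < this.OrderDate)
            throw new InvalidOperationException("RequiredDate cannot be before OrderDate.");
    }
}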


All of the code I wrote can be found here: C# Sample Code
