Zipping a directory using C# and SharpZipLib

While I have used C# and SharpZipLib to zip a directory before, I simply used the sample provided with the library. However, that sample doesn’t maintain the directory structure in the zip file.

So, I had a quick look around the SharpZipLib site and noticed a FastZip class. It covers the common scenarios, such as CreateZip, ExtractZip and setting a password for the zip. It is very simple, and very quick to integrate and test.

Here is some sample code to zip the directory ZipTest into a zip file:

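ICSharpCode.SharpZipLib.Zip.FastZip z = new ICSharpCode.SharpZipLib.Zip.FastZip();
z.CreateEmptyDirectories = true;   // also store empty folders in the archive
// Arguments: zip file name, source directory, recurse into subdirectories, file filter
z.CreateZip("", @"F:\ZipTest", true, "");

if (File.Exists(""))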
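SharpZipLib is a separate download, so as a self-contained point of comparison, here is a minimal sketch of the same directory-preserving zip using the built-in System.IO.Compression.ZipFile class instead (available from .NET Framework 4.5, where it also needs a reference to System.IO.Compression.FileSystem, and in .NET Core). This swaps in a different library from the one used in the post; DirectoryZipper and ZipDirectory are hypothetical names.

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;

// Sketch using the framework's ZipFile class rather than SharpZipLib's FastZip.
public static class DirectoryZipper
{
    // Zips sourceDirectory into zipPath and returns the entry names stored.
    // Entry names are relative to sourceDirectory, so sub-folders are kept.
    public static string[] ZipDirectory(string sourceDirectory, string zipPath)
    {
        // CreateFromDirectory throws if the destination file already exists.
        if (File.Exists(zipPath))
            File.Delete(zipPath);

        ZipFile.CreateFromDirectory(sourceDirectory, zipPath,
            CompressionLevel.Optimal, includeBaseDirectory: false);

        // Read the archive back so callers can check the stored structure.
        using (ZipArchive archive = ZipFile.OpenRead(zipPath))
        {
            return archive.Entries.Select(e => e.FullName).ToArray();
        }
    }
}
```

Keeping the zip file outside the source directory avoids the archive trying to include itself.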

Problem with Debugger.Break();

While writing my previous post on how to debug Windows Live Writer plugins, I mentioned using System.Diagnostics.Debugger.Break(). However, I found a problem with this. If you have a debugger attached, for example Visual Studio, then the program will hit the line of code and the debugger will step in. Great!

However, if no debugger is attached, an error occurs instead.


I thought this was strange, so I built it as a Release version. The same thing happened. I knew that Debug.Write() calls are removed in Release builds, which saves time and space and means strange errors like this cannot occur.

The first thing I did was open up Reflector. The Debug class uses ConditionalAttribute, which provides a way to include or exclude calls to a method depending on a preprocessor symbol. Its methods are marked with [Conditional("DEBUG")], so if the DEBUG conditional compilation symbol is defined, calls to them are compiled into the build; otherwise all the calls are removed.
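The effect of ConditionalAttribute can be seen in a small sketch. The class and method names and the EXAMPLE_TRACING symbol are hypothetical; Debug.Write and Debug.WriteLine are marked the same way, with "DEBUG". When the symbol is not defined, the compiler drops the call entirely, including evaluation of its arguments.

```csharp
using System;
using System.Diagnostics;

public static class ConditionalDemo
{
    // Counts how many times the argument expression was evaluated.
    public static int Evaluations;

    // Calls to Log are only compiled in when EXAMPLE_TRACING is defined
    // (e.g. #define EXAMPLE_TRACING at the top of the calling file, or in the
    // project's conditional compilation symbols).
    [Conditional("EXAMPLE_TRACING")]
    public static void Log(string message)
    {
        Console.WriteLine(message);
    }

    static string BuildMessage()
    {
        Evaluations++;
        return "expensive message";
    }

    public static int Run()
    {
        Evaluations = 0;
        // With EXAMPLE_TRACING undefined, this whole call is removed,
        // so BuildMessage() never runs.
        Log(BuildMessage());
        return Evaluations;
    }
}
```

Without the symbol defined, ConditionalDemo.Run() returns 0, because the call and its argument were compiled out.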


However, if we look at Debugger.Break(), it doesn’t use ConditionalAttribute, which means we cannot take advantage of this compiler feature. As a result, you need to manually remove every line of code that calls Debugger.Break() before shipping the code to the client, or before running it without a debugger attached (for example, when running unit tests). Not great at all!

Strange how this got missed, unless it was done for a reason?
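One possible workaround, sketched under the assumption that breaking into the debugger is only wanted in Debug builds, is to wrap the call in your own helper. DebugHelper is a hypothetical name; Debugger.IsAttached and ConditionalAttribute are standard .NET. The attribute removes the calls from Release builds, like Debug.Write, and the IsAttached check means nothing happens when no debugger is attached, for example during unit test runs.

```csharp
using System.Diagnostics;

public static class DebugHelper
{
    // Calls are compiled out unless DEBUG is defined.
    [Conditional("DEBUG")]
    public static void Break()
    {
        // Only stop when a debugger is actually attached, so running without
        // one (e.g. unit tests) carries on instead of raising an error.
        if (Debugger.IsAttached)
        {
            Debugger.Break();
        }
    }
}
```

Code would then call DebugHelper.Break() wherever it previously called Debugger.Break().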

Linq to SQL – DataLoadOptions – Does it improve performance?

Following on from my previous post on DataLoadOptions, I decided to do a quick test to see if it does actually improve performance, even with its limitations.

If we take this query, with DataLoadOptions set only a single query is executed, while without it a query is executed per customer.

DataLoadOptions options = new DataLoadOptions();
options.LoadWith<Customer>(cust => cust.Orders);   // load each Customer's Orders with the Customer
db.LoadOptions = options;

var query = from c in db.Customers
            select c;

foreach (var c in query)
    Console.WriteLine("{0}, Phone: {1} with {2} orders", c.ContactName, c.Phone, c.Orders.Count);

Without DataLoadOptions it took 1,739,904 ticks to execute, while with them it took 1,487,722 ticks. That is an improvement of roughly 14%, but nothing special.

Now consider this query which, as described in the previous post, still causes a query to be executed for each order even with DataLoadOptions set:

DataLoadOptions options = new DataLoadOptions();
options.LoadWith<Customer>(cust => cust.Orders);
options.LoadWith<Order>(ord => ord.Order_Details);
db.LoadOptions = options;

var query = from c in db.Customers
            select c;

foreach (var c in query)
{
    Console.WriteLine("{0}, Phone: {1}", c.ContactName, c.Phone);
    foreach (var o in c.Orders)
    {
        Console.WriteLine("Order Details for OrderID: {0} with {1} lines", o.OrderID, o.Order_Details.Count);

        foreach (var od in o.Order_Details)
            Console.WriteLine("Product ID: {0}", od.ProductID);
    }
}

Without DataLoadOptions it took 10,003,725 ticks to execute. With DataLoadOptions, including the time to set up the options, it took 6,536,320 ticks, roughly 35% less. The improvement is much clearer when a larger number of queries are being executed.

There are definitely performance improvements to be had, even though DataLoadOptions cannot load all of the data in a single query, so it should be used when executing these kinds of queries.

However, what happens if we are not using the Order/Order Details information and only use Customer data? Running our first query again without c.Orders.Count, it takes 1,609,867 ticks with the options set and 1,174,217 ticks without, roughly 37% longer with them. So if you’re not using the related data, there is additional overhead to be aware of.
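The post does not say how the ticks were measured. A minimal sketch, assuming System.Diagnostics.Stopwatch, is shown below together with a helper that turns two tick counts into a relative change. Timing, MeasureTicks and RelativeChange are hypothetical names. Note that Stopwatch ticks depend on Stopwatch.Frequency and are not the same unit as DateTime/TimeSpan ticks (100 ns).

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Runs the action once and returns the elapsed Stopwatch ticks.
    public static long MeasureTicks(Action action)
    {
        Stopwatch watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.ElapsedTicks;
    }

    // Relative change from baseline to candidate:
    // negative means faster, e.g. -0.145 is 14.5% less time.
    public static double RelativeChange(long baseline, long candidate)
    {
        return (double)(candidate - baseline) / baseline;
    }
}
```

For example, RelativeChange(1739904, 1487722) is about -0.145 and RelativeChange(1174217, 1609867) is about 0.371. Since each figure in the post comes from a single run, timing several runs after a warm-up query would likely give more reliable numbers.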

Linq to SQL – DataLoadOptions (Previously DataShape)

DataLoadOptions allows you to specify which child entities you want to be loaded automatically with their parent entity. By defining how the data should be loaded, we can reduce the number of requests made to the database to retrieve information and thus improve the performance of our application.
Take this code for example:

var query = from c in db.Customers
            select c;

foreach (var c in query)
    Console.WriteLine("{0}, Phone: {1} has {2} orders", c.ContactName, c.Phone, c.Orders.Count);

This is a simple query which outputs each customer’s name, telephone number and number of orders to the console. There doesn’t look to be anything wrong with it; however, if we run SQL Profiler we can see that a large number of queries are being executed.

The query process is that all the customers are loaded into memory, and then for each customer a query is made to the Orders table to return that customer’s orders, which are then counted in memory. Having queries like this will soon put pressure on your database.
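The difference between the two loading strategies can be sketched without a database. The toy FakeDatabase class below is hypothetical (it is not LINQ to SQL) and simply counts round trips: loading the customers and then asking for each customer’s orders costs 1 + N queries, while a single joined load costs one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy in-memory "database" that counts how many queries are issued.
public class FakeDatabase
{
    public int QueryCount { get; private set; }

    readonly Dictionary<string, int[]> ordersByCustomer = new Dictionary<string, int[]>
    {
        { "ALFKI", new[] { 1, 2, 3 } },
        { "ANATR", new[] { 4 } },
        { "AROUT", new int[0] },
    };

    public List<string> GetCustomers()
    {
        QueryCount++;
        return ordersByCustomer.Keys.ToList();
    }

    public int[] GetOrdersFor(string customerId)
    {
        QueryCount++;
        return ordersByCustomer[customerId];
    }

    // One round trip returning customers with their orders, like a join.
    public Dictionary<string, int[]> GetCustomersWithOrders()
    {
        QueryCount++;
        return new Dictionary<string, int[]>(ordersByCustomer);
    }
}

public static class NPlusOneDemo
{
    // Lazy style: 1 query for customers + 1 per customer.
    public static int LazyQueries()
    {
        var db = new FakeDatabase();
        foreach (string c in db.GetCustomers())
            Console.WriteLine("{0} has {1} orders", c, db.GetOrdersFor(c).Length);
        return db.QueryCount;
    }

    // Eager style: a single query regardless of the number of customers.
    public static int EagerQueries()
    {
        var db = new FakeDatabase();
        foreach (var pair in db.GetCustomersWithOrders())
            Console.WriteLine("{0} has {1} orders", pair.Key, pair.Value.Length);
        return db.QueryCount;
    }
}
```

With three customers, LazyQueries() returns 4 and EagerQueries() returns 1; the gap grows with every extra customer.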

This is where DataLoadOptions plays a part. With the following code, we can specify that when a Customer object is loaded from the database, all the Orders for that Customer should be loaded as well.

DataLoadOptions options = new DataLoadOptions();
options.LoadWith<Customer>(cust => cust.Orders);   // load each Customer's Orders with the Customer
db.LoadOptions = options;
var query……

Now if we re-run the application, instead of having a query for each customer to find out the number of orders, we just have a single query being executed which returns the Customer and Order information.

SELECT [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName], [t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country],
[t0].[Phone], [t0].[Fax], [t1].[OrderID], [t1].[CustomerID] AS [CustomerID2], [t1].[EmployeeID], [t1].[OrderDate], [t1].[RequiredDate], [t1].[ShippedDate], [t1].[ShipVia],
[t1].[Freight], [t1].[ShipName], [t1].[ShipAddress], [t1].[ShipCity], [t1].[ShipRegion], [t1].[ShipPostalCode], [t1].[ShipCountry], (
    SELECT COUNT(*)
    FROM [dbo].[Orders] AS [t2]
    WHERE [t2].[CustomerID] = [t0].[CustomerID]
    ) AS [count]
FROM [dbo].[Customers] AS [t0]
LEFT OUTER JOIN [dbo].[Orders] AS [t1] ON [t1].[CustomerID] = [t0].[CustomerID]
ORDER BY [t0].[CustomerID], [t1].[OrderID]

Changing the foreach loop to output the order information will still only result in a single query being executed on the server.

foreach (var c in query)
{
    Console.WriteLine("{0}, Phone: {1}", c.ContactName, c.Phone);
    foreach (var o in c.Orders)
        Console.WriteLine("OrderID: {0}", o.OrderID);
}

However, it doesn’t solve all of our problems. If we wanted to output details of the order lines for each order, including each line’s ProductID, we could write something like this:

foreach (var c in query)
{
    Console.WriteLine("{0}, Phone: {1}", c.ContactName, c.Phone);
    foreach (var o in c.Orders)
    {
        Console.WriteLine("Order Details for OrderID: {0} with {1} lines", o.OrderID, o.Order_Details.Count);

        foreach (var od in o.Order_Details)
            Console.WriteLine("Product ID: {0}", od.ProductID);
    }
}

You would have thought that by adding the following code we would be able to limit the number of queries:

options.LoadWith<Order>(ord => ord.Order_Details);

Sadly, this is not the case: it still causes a query to be executed for each customer; in fact, it is actually one for each order. According to Michael Pizzo on the product team, this is by design, with the following explanation:

In LINQ to SQL, in order to reduce the complexity and amount of data returned by the join query required to do span, only one association in any given query is actually spanned in. LINQ to SQL picks the deepest level association in the query to do the join in order to minimize the number of queries generated.  So, in this case, by adding a span between orders and order_details, you are basically “hiding” any span of orders.

DataLoadOptions can improve the performance of your application and are something you should remember when writing LINQ to SQL queries. However, if you are going more than one level deep, you are still going to have a large number of queries being executed. Remember to run SQL Profiler to make sure your application isn’t going to take out your production server(s).

Extending DataContext using Extension methods

Extension methods are great, as they allow you to add methods to any class in C# 3.0, with the instance of the class being passed in as the first parameter.

In C# 2.0 we had an initial way to do this with partial classes, which allow us to extend classes with our own custom logic; however, this only works if the class is marked as partial.

I have talked before about how we can extend the DataContext created by the designer (NorthwindDataContext) using partial classes. However, if we wanted a method on all of the DataContext objects in our system, we would have to create a partial class for each of them separately. This would result in a lot of duplicate code and a maintenance nightmare.

partial class DataClasses1DataContext { /* … */ }

partial class DataClasses2DataContext { /* … */ }

The other solution would be to add a partial class for DataContext itself, so that every derived DataContext in the system inherits the new members; however, this doesn’t work, as the DataContext class is not partial.

The answer is extension methods. Extension methods allow us to extend any type in the system, even one marked as sealed. They require the following:

  • Be in their own static class (non-generic and not nested), usually public
  • Be static methods, usually public
  • Use the ‘this’ keyword on the first parameter, whose type is the object we want to extend. Other parameters can be added after this.
  • Be accessible from the calling code. If they are in their own namespace, the namespace needs to be imported via a using directive.

Using this, we can extend the DataContext class and have all of our own DataContexts in the system gain the new functionality.

The following code could be used; it can access any public information from the DataContext passed in as the parameter:

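namespace MyExtension {
    public static class Extension {
        public static void PrintConnection(this DataContext dc) {
            Console.WriteLine(dc.Connection.ConnectionString);   // dc is the instance the method is called on
        }
    }
}
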
This could then be called on any derived DataContext in the system like this:

DataClasses1DataContext db = new DataClasses1DataContext();
db.PrintConnection();

We are limited to the public interface of DataContext, but this is still a very useful technique, and it applies to any other type within .NET.
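Because DataContext needs a database and designer-generated classes, here is a self-contained sketch of the same idea. BaseContext stands in for DataContext, and SalesContext and HrContext for the generated contexts; all of these names are hypothetical. The extension method is written once against the base class and is available on every derived class, even sealed ones. In a real project the static class would normally live in its own namespace imported with a using directive; it is left in the global namespace here to keep the sketch short.

```csharp
using System;

// Hypothetical stand-ins for DataContext and two generated contexts.
public class BaseContext
{
    public string ConnectionString { get; set; }
}

public sealed class SalesContext : BaseContext { }
public sealed class HrContext : BaseContext { }

public static class ContextExtensions
{
    // 'this BaseContext' makes the method callable on any derived context.
    public static string DescribeConnection(this BaseContext context)
    {
        return context.GetType().Name + ": " + context.ConnectionString;
    }

    // Extension methods also work on sealed framework types such as string.
    public static bool IsBlank(this string value)
    {
        return string.IsNullOrEmpty(value) || value.Trim().Length == 0;
    }
}
```

For example, new SalesContext { ConnectionString = "x" }.DescribeConnection() returns "SalesContext: x", with no changes to SalesContext itself.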

Partial Methods in Linq

Following on from my previous post, I noticed Linq takes advantage of the Partial Methods feature in the language so I thought I would take a closer look.

Within VS 2008 Beta 2, the DataContext and entity classes contain a number of partial methods that let you hook into the execution at certain points. All of the definitions are in the DataContext’s .designer.cs file, in a region called Extensibility Method Definitions.

The standard methods included within the data context are:

  • partial void OnCreated();
  • partial void InsertCustomer(Customer instance);
  • partial void UpdateCustomer(Customer instance);
  • partial void DeleteCustomer(Customer instance);

OnCreated() is called when the DataContext object is created, while the others are called during SubmitChanges(). Once SubmitChanges() is called, the DataContext starts processing any changes, calling these methods if they have been implemented. HOWEVER, I found that if you implement one of these methods, the action is never passed to the database. My first thought was that this must be a bug, so it is something to be aware of, but it might be by design, with the implementation overriding the default logic; it’s not very clear. If it does override the functionality and isn’t a bug, I don’t like the fact that there is no distinction between a method which is a hook and a method which overrides functionality. Something like partial override void … would be a lot better than how it is at the moment.

The database entities also have extension points.

  • partial void OnLoaded();
  • partial void OnValidate();
  • partial void OnCreated();
  • partial void OnOrderIDChanging(int value);
  • partial void OnOrderIDChanged();

These partial methods are a little more interesting. OnCreated() is called when an object is initialised; when loading from the database, OnCreated() is called first, then OnLoaded(). OnXChanging() is called when the property is set, just before the value is stored in the internal field, not when SubmitChanges() is called. OnXChanged() is called after the value is stored.

The most important method is OnValidate(), which is called just before the action is passed to the database. Here, you can add custom business logic to verify that the object is correct and valid before it is saved (you can access everything, as you are inside the object, for example this.OrderID). If there is a problem, you can throw an exception and catch it by putting a try/catch around db.SubmitChanges(). A very useful feature.
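The OnValidate() pattern can be sketched outside LINQ to SQL with two halves of a partial class. This is a simplified, hypothetical model: Order, Quantity, RunValidation and TrySubmit are stand-ins, RunValidation plays the role of the point during SubmitChanges() where the hook is invoked, and TrySubmit plays the role of the try/catch around db.SubmitChanges().

```csharp
using System;

// "Generated" half: declares the hook and calls it.
public partial class Order
{
    public int Quantity { get; set; }

    partial void OnValidate();

    // Hypothetical stand-in for the DataContext invoking the hook.
    internal void RunValidation()
    {
        OnValidate();
    }
}

// Hand-written half: implements the hook with a business rule.
public partial class Order
{
    partial void OnValidate()
    {
        if (Quantity <= 0)
            throw new InvalidOperationException("Quantity must be greater than zero.");
    }
}

public static class ValidationDemo
{
    // Mirrors wrapping SubmitChanges() in try/catch.
    public static string TrySubmit(Order order)
    {
        try
        {
            order.RunValidation();
            return "saved";
        }
        catch (InvalidOperationException ex)
        {
            return "rejected: " + ex.Message;
        }
    }
}
```

TrySubmit(new Order { Quantity = 0 }) returns the rejection message, while an order with a positive quantity is "saved".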


All of the code I wrote can be found here: C# Sample Code

Partial Methods in C# 3.0 and VB 9.0

Partial methods are a new language feature in both VB and C# and are designed to enable lightweight event handling and hooks into a class. The code samples are in C#, but I have provided a link at the bottom for the VB version; the principles are the same.

A few important points:

  • Partial methods can only be declared and implemented within a partial class (or struct).
  • They must return void, but can accept arguments (ref parameters, but not out parameters).
  • They are implicitly private and cannot have an access modifier.

Consider the following C# code which has a simple getter and setter for the name of a person.

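partial class Person
{
    string name;

    public string Name
    {
        get { return name; }
        set
        {
            OnNameChanging(value);   // hook before the value is stored
            name = value;
            OnNameChanged();         // hook after the value is stored
        }
    }

    partial void OnNameChanging(string name);

    partial void OnNameChanged();
}
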

When a name is set, a call is made to the OnNameChanging method, passing in the new value as a parameter. After the name has changed, the OnNameChanged method is called. But these methods are just placeholders/interfaces which could be implemented by a developer.

In a separate partial class we can then implement this functionality which will handle what we want to happen when the methods are called.

partial class Person
{
    partial void OnNameChanged()
    {
        Console.WriteLine(name);   // the field now holds the new value
    }
    partial void OnNameChanging(string name)
    {
        Console.WriteLine("OnNameChanging(string name)");
    }
}

Now, when we compile, both parts will be merged and act as a single class. If we then set the name on the object to "Ben Hall", the following is written out to the console:

OnNameChanging(string name)
Ben Hall

But what happens if no partial class implements the body of a partial method? The compiler removes the calls to it, and the method declaration itself is not compiled. If the two methods were not implemented, the setter would effectively look like this:

  set { name = value; }

The result is that it is very effective to include calls to partial methods in generated code, like a LINQ to SQL DataContext, because if they are not used then they are removed.
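The removal goes further than the method body: the whole call is removed, including evaluation of its arguments. The hypothetical Counter class below shows this. NextValue() has a side effect, and because OnTick has no implementation anywhere, it is never run.

```csharp
using System;

public partial class Counter
{
    public static int ArgumentEvaluations;

    // Declared but never implemented, so every call to it is removed.
    partial void OnTick(int value);

    static int NextValue()
    {
        ArgumentEvaluations++;   // side effect that shows whether this ran
        return 42;
    }

    public void Tick()
    {
        // This entire statement, including NextValue(), is compiled away.
        OnTick(NextValue());
    }
}
```

After new Counter().Tick(), Counter.ArgumentEvaluations is still 0. Adding an implementation of OnTick in another partial class part would bring both the call and the argument evaluation back.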

Looking forward to seeing how these will be used.


C# Sample | VB Sample

