How To Unit Test – RegEx and Technorati

Following on from my previous post, I wanted to implement another feature in the blog application. The feature picks out the Technorati tags from a post and pulls related posts from the Technorati API.  I’ve already blogged about how to process the Technorati API.

To implement this feature, the first thing I need is the ability to return a list of all the tags for a post.

I found unit testing RegEx really difficult.  I’m not great at RegEx, as I only rarely need to write it, and this is where unit testing becomes difficult.  It’s easy to unit test code you have written before, but unit testing code you haven’t written in a while takes a little more effort.  I feel the reason is that you are holding the design in your head while trying to figure out how it all fits together.  During this implementation, I found the best approach was simply to start writing simple code and spike the RegEx.  To spike it, I wrote a simple test that I thought should pass based on my previous experience.

[Test]
public void RegExMatchesString()
{
    Regex r = new Regex("<a href=\"http://technorati.com/tags/(.+?)\" rel=\"tag\">(.+?)</a>");
    MatchCollection collection =
        r.Matches("<a href=\"http://technorati.com/tags/TestCode\" rel=\"tag\">Test Code</a>");

    Assert.AreEqual(1, collection.Count, "Collection");

    foreach (Match match in collection)
    {
        Assert.IsTrue(match.Success);
        Assert.AreEqual(2, match.Captures.Count, "Captures");
        Assert.AreEqual("TestCode", match.Captures[0].Value, "One");
        Assert.AreEqual("Test Code", match.Captures[1].Value, "Two");
    }
}

This failed, so I knew I must have done something wrong. I jumped into the debugger and found that instead of the Captures collection I wanted the Groups collection. I retested and found that Groups contains both captured groups plus the entire match. The entire match sits at index 0, so I had to move everything along by one.  A couple of minutes later I had my RegEx code working.  There’s nothing like seeing a test go green!  The actual implementation is below.

[Test]
public void Spike_RegExMatchesString()
{
    Regex r = new Regex("<a href=\"http://technorati.com/tags/(.+?)\" rel=\"tag\">(.+?)</a>");
    MatchCollection collection =
        r.Matches("<a href=\"http://technorati.com/tags/TestCode\" rel=\"tag\">Test Code</a>");

    Assert.AreEqual(1, collection.Count, "Collection");

    foreach (Match match in collection)
    {
        Assert.IsTrue(match.Success);
        // Groups[0] is the entire match, so the two captured groups are at 1 and 2.
        Assert.AreEqual(3, match.Groups.Count, "Groups");
        Assert.AreEqual("TestCode", match.Groups[1].Value, "One");
        Assert.AreEqual("Test Code", match.Groups[2].Value, "Two");
    }
}
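
As a rough illustration of why the first spike failed, the sketch below reports what Groups and Captures hold for a single match. The class and method names are hypothetical and are not part of the download for this post; only the System.Text.RegularExpressions types are real.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class GroupsVersusCaptures
{
    // Summarises the first match of 'pattern' in 'input'.
    public static string Describe(string pattern, string input)
    {
        Match match = Regex.Match(input, pattern);
        if (!match.Success)
        {
            return "no match";
        }

        // Groups[0] is the entire match; Groups[1..n] are the parenthesised groups.
        // Captures on the Match itself only records the whole-match capture.
        return string.Format(
            "groups={0}, captures={1}, group1={2}, group2={3}",
            match.Groups.Count,
            match.Captures.Count,
            match.Groups[1].Value,
            match.Groups[2].Value);
    }
}
```

Calling Describe with the pattern and input from the spike shows a Groups count that includes the whole match, which is why the indexes in the assertions had to shift by one.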

Based on the code completed during the spike, I can quickly test and implement the real code by refactoring the important parts out into properly unit-tested methods.  Spending that bit of extra time on the spike made the actual implementation a lot quicker and easier.

[Test]
public void GetMatchCollectionForHtml()
{
    string post = "<a href=\"http://technorati.com/tags/TestCode\" rel=\"tag\">Test Code</a>";
    MatchCollection collection = Technorati.GetMatches(post);
    Assert.AreEqual(1, collection.Count, "Collection");
}

[Test]
public void GetListFromMatches()
{
    string post = "<a href=\"http://technorati.com/tags/TestCode\" rel=\"tag\">Test Code</a>";
    MatchCollection collection = Technorati.GetMatches(post);

    List<string> tags = Technorati.GetTagList(collection);
    Assert.AreEqual(1, tags.Count);
    Assert.AreEqual("TestCode", tags[0]);
}
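
The methods under test are not shown in the post, so here is a minimal sketch of how they might look, assuming the pattern from the spike. The class name TagExtractorSketch is hypothetical; the real Technorati class is in the download and may differ.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the Technorati class, for illustration only.
public static class TagExtractorSketch
{
    // Same shape as the spike pattern; the dot in the host name is escaped here.
    private static readonly Regex TagLink = new Regex(
        @"<a href=""http://technorati\.com/tags/(.+?)"" rel=""tag"">(.+?)</a>");

    // Finds every Technorati tag link in the HTML of a post.
    public static MatchCollection GetMatches(string html)
    {
        return TagLink.Matches(html);
    }

    // Takes the tag name from the URL part of each link (group 1).
    public static List<string> GetTagList(MatchCollection matches)
    {
        List<string> tags = new List<string>();
        foreach (Match match in matches)
        {
            tags.Add(match.Groups[1].Value);
        }
        return tags;
    }

    // Convenience wrapper that chains the two steps.
    public static List<string> GetTagListFromPost(string post)
    {
        return GetTagList(GetMatches(post));
    }
}
```

Keeping GetMatches and GetTagList separate means each can be tested with a one-line input, as the two tests above do.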

Once they pass, I write a test to ensure everything works together with ‘real’ data.

[Test]
public void GetTagListFromPostText()
{
    RssFeed rss = TestHelper.CreateMockRSS(TestHelper.url);
    RssItemCollection items = rss.Channels[0].Items;

    string post = items[0].Description;
    List<string> tags = Technorati.GetTagListFromPost(post);
    Assert.AreEqual(3, tags.Count);
    Assert.AreEqual("MbUnit", tags[0]);
    Assert.AreEqual("TDD", tags[1]);
    Assert.AreEqual("Testing", tags[2]);
}

Great, that’s our tag list completed.  We now just need to pull the posts from Technorati.

The code in my previous post wasn’t very testable because the method was doing a lot: pulling data, processing the feeds and returning a list. This version has been separated into much smaller methods to make unit testing easier and more effective.  The first test makes sure we can return a list of XmlNodes from the API.

[Test]
public void GetXmlFromRss()
{
    int limit = 10;
    string technoratiTerm = "MbUnit";

    XmlNodeList xmlNodes = Technorati.GetXmlNodeList(Technorati.GetUrl(technoratiTerm));
    Assert.AreEqual(limit, xmlNodes.Count);
}

We then want to make a Blog object from one of the XmlNodes in the list.

[Test]
public void GivenXmlNodeCreateBlog()
{
    string technoratiTerm = "MbUnit";

    XmlNodeList xmlNodes = Technorati.GetXmlNodeList(Technorati.GetUrl(technoratiTerm));
    Blog blog = Technorati.CreateBlogPost(xmlNodes[0]);
    // Expected value first, actual value second.
    Assert.AreEqual(xmlNodes[0]["title"].InnerText, blog.Title);
}

After that has been implemented, we can create a list of Blog objects from a list of nodes (it simply calls the above method multiple times).

[Test]
public void GetBlogPostListTerm()
{
    string technoratiTerm = "MbUnit";

    XmlNodeList xmlNodes = Technorati.GetXmlNodeList(Technorati.GetUrl(technoratiTerm));
    List<Blog> blogs = Technorati.CreateBlogPostList(xmlNodes);
    Assert.AreEqual(10, blogs.Count);
}
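
The two steps above, building one Blog from a node and then a list from many nodes, could be sketched roughly as follows. BlogSketch and BlogFactorySketch are hypothetical names; the only detail taken from the tests is that a Blog has a Title read from the node's "title" element.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical minimal Blog type; the real class may carry more fields.
public class BlogSketch
{
    public string Title { get; set; }
}

// Hypothetical stand-in for the factory methods on the Technorati class.
public static class BlogFactorySketch
{
    // Builds one blog entry from a single <item> node.
    public static BlogSketch CreateBlogPost(XmlNode item)
    {
        XmlElement title = item["title"];  // null when the child element is missing
        BlogSketch blog = new BlogSketch();
        blog.Title = title != null ? title.InnerText : string.Empty;
        return blog;
    }

    // Builds the list by calling CreateBlogPost once per node.
    public static List<BlogSketch> CreateBlogPostList(XmlNodeList items)
    {
        List<BlogSketch> blogs = new List<BlogSketch>();
        foreach (XmlNode item in items)
        {
            blogs.Add(CreateBlogPost(item));
        }
        return blogs;
    }
}
```

Because the list method only loops over the single-node method, most of the behaviour is covered by the single-node test, and the list test just needs to check the count.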

Finally, we can get a list of blog posts for a list of strings (again, calling the above method multiple times).

[Test]
public void GetBlogPostListForListOfTerms()
{
    List<string> terms = new List<string>(3);
    terms.Add("MbUnit");
    terms.Add("TDD");
    terms.Add("Testing");

    List<Blog> blogs = Technorati.CreateBlogPostList(terms);
    Assert.AreEqual(30, blogs.Count);
}

The only problem with these tests is that they hit the Technorati API on every run.  It would be much better to abstract away from the API and use a stub object instead. A stub lets us replace the implementation. Stubs are not as clever as mock objects and are more limited, but for this simple task a stub fits the purpose: a full mock framework isn’t really required, and the simpler the better.

To do this, I’m going to inject the stub the same way I would a mock object, through a static property on the Technorati object.  Within my test code I use TechnoratiStub (implemented in my test assembly, so no test code lives inside the assembly under test).  In the real system I use TechnoratiAccessor.

Technorati.TechnoratiAPI = new TechnoratiStub();

private static ITechnoratiAccessor dataStore = new TechnoratiAccessor();
public static ITechnoratiAccessor TechnoratiAPI
{
    get { return dataStore; }
    set { dataStore = value; }
}

I then move the implementation of hitting the API and returning the XmlNodeList to the TechnoratiAccessor.

public interface ITechnoratiAccessor
{
    XmlNodeList GetXmlNodeList(string technoratiUrl);
}

public class TechnoratiAccessor : ITechnoratiAccessor
{
    public XmlNodeList GetXmlNodeList(string technoratiUrl)
    {
        XmlDocument xmlResultsSet = new XmlDocument();
        xmlResultsSet.Load(new XmlTextReader(technoratiUrl));
        XmlNodeList xmlResponse = xmlResultsSet.SelectNodes("/tapi/document/item");
        return xmlResponse;
    }
}

Then, in my Technorati object, I just return whatever the TechnoratiAPI gives me (be it the real implementation or the stub).

public static XmlNodeList GetXmlNodeList(string technoratiUrl)
{
    return TechnoratiAPI.GetXmlNodeList(technoratiUrl);
}

All of our tests still pass, so we can now implement the stub.  In all the tests I set the property to point to the stub, and the stub just loads a valid XML response that I have cached in a string into an XmlDocument and returns the XmlNodeList. Very quick and easy.

public class TechnoratiStub : ITechnoratiAccessor
{
    public XmlNodeList GetXmlNodeList(string technoratiUrl)
    {
        #region Xml
        string cachedXmlResponse = @"Too long to insert here";
        #endregion

        XmlDocument xmlDoc = new XmlDocument();
        xmlDoc.LoadXml(cachedXmlResponse);
        return xmlDoc.SelectNodes("/tapi/document/item");
    }
}
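
To show the wiring end to end without the long cached response, here is a self-contained sketch under some assumptions. TwoItemStub and TechnoratiSketch are hypothetical names, and the XML is a made-up two-item document in the same /tapi/document/item shape, not a real Technorati response.

```csharp
using System.Xml;

public interface ITechnoratiAccessor
{
    XmlNodeList GetXmlNodeList(string technoratiUrl);
}

// Hypothetical stub: returns a tiny invented document and never touches the network.
public class TwoItemStub : ITechnoratiAccessor
{
    // Records the requested URL so a test can check what was asked for.
    public string LastUrl;

    public XmlNodeList GetXmlNodeList(string technoratiUrl)
    {
        LastUrl = technoratiUrl;
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<tapi><document>" +
            "<item><title>First</title></item>" +
            "<item><title>Second</title></item>" +
            "</document></tapi>");
        return doc.SelectNodes("/tapi/document/item");
    }
}

// Hypothetical stand-in for the Technorati class with its injection point.
public static class TechnoratiSketch
{
    private static ITechnoratiAccessor dataStore;

    public static ITechnoratiAccessor TechnoratiAPI
    {
        get { return dataStore; }
        set { dataStore = value; }
    }

    // Delegates to whichever accessor is currently injected.
    public static XmlNodeList GetXmlNodeList(string technoratiUrl)
    {
        return TechnoratiAPI.GetXmlNodeList(technoratiUrl);
    }
}
```

In a test fixture the assignment would typically go in a set-up method. Because the property is static, the injected stub stays in place for later tests unless it is reset, so restoring the real accessor in a tear-down method is worth considering.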

Now when all of our tests execute, they use this Stub object instead of the real API.  Hope you find this useful.

Downloads: TechnoratiTests.cs  ||  Technorati.cs  (These include all of the code discussed in this post).
