Performing Parallel Processing in Azure Web Jobs

Azure Web Jobs are a new feature available on Azure Web Sites. They allow you to perform long-running, continuous background tasks alongside your web sites. For instance, you might have a blog and want to resize the images that its contributors upload. Until now, you had to create a separate thread in your web site, or even a separate Worker Role, to tackle these kinds of resource-intensive tasks.

Thankfully, Azure Web Jobs solve this problem quite neatly.

By the way, if you did not know about Azure Web Jobs or if you haven’t already had the chance to fiddle with this new feature, I suggest you read the excellent introductory post from Scott Hanselman on this subject. Bear in mind that Azure Web Jobs are still in preview and there is hardly any documentation available yet.

In what follows, I’m assuming that you already know about and have implemented Azure Web Jobs.

One of the neat use cases for Web Jobs is to trigger custom processing upon receiving a message from a queue. This is easily done by creating a simple function with a parameter decorated with the QueueInput attribute. The Web Job host infrastructure will then automatically invoke your method with the contents of the queued message as a parameter.

One of the benefits of Azure Web Jobs is that they scale by default with your web site. This means that there will be as many instances of your web job as there are instances of your web site, thus allowing for some degree of parallelism in your background processing.

If, however, you do not want some of your Azure Web Jobs to scale, you can optionally have them run as singletons. This means that however many instances your web site runs, only a single instance of the web job will run. This is great because it provides a form of automatic failover, should the singleton instance of your web job crash unexpectedly.

Create Scalable Azure Web Jobs

There is one kind of scenario, however, that Azure Web Jobs do not directly support: parallel processing. It is important to note that the method that performs the bulk of the processing in your Web Job cannot be reentrant. Indeed, each instance of a Web Job processes messages strictly one at a time as they arrive from, say, an Azure Queue.

In one of our projects, we use Web Jobs to frequently process several thousand items from a queue. Each individual piece of processing is fairly lightweight and involves calling a third-party REST API. But the fact that the items from the queue are processed sequentially considerably increases the total execution time.

For illustration purposes, consider the following code:

private static void Main(string[] args)
{
    // Start the Web Job host; it listens for queue messages and blocks.
    new JobHost().RunAndBlock();
}

// Invoked by the host for each message received on the "webjobq" queue.
public static void Work([QueueInput("webjobq")] string item)
{
    Console.WriteLine("This is a web job invocation: Process Id: {0}, Thread Id: {1}.", System.Diagnostics.Process.GetCurrentProcess().Id, Thread.CurrentThread.ManagedThreadId);
    Console.WriteLine(">> Yawn ...");
    Thread.Sleep(25000); // simulate 25 seconds of work
    Console.WriteLine(">> ... I think I fell asleep for awhile.");
    Console.WriteLine(">> Done.");
}

When running, the Azure Web Job will process each incoming queue item sequentially. Here is the result of the method invocations:

You can see that each item takes roughly 25 seconds to complete, almost all of it spent in the Thread.Sleep() call in the code above. The total execution time of sequentially processing input from an Azure Queue is therefore proportional to the number of items to process. To reduce the total processing time, we need to process items in parallel.
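To put rough numbers on this, here is a back-of-the-envelope sketch. The class name ProcessingTimeEstimate, the Estimate method and the figure of 1000 items are made up for illustration; only the 25 seconds per item comes from the example above. It also assumes every item takes the same time and ignores queue polling overhead.

```csharp
using System;

public static class ProcessingTimeEstimate
{
    // Estimated wall-clock time when at most `workers` items are processed
    // at once and every item takes `secondsPerItem` seconds (an assumption).
    public static TimeSpan Estimate(int itemCount, double secondsPerItem, int workers)
    {
        if (workers < 1) throw new ArgumentOutOfRangeException(nameof(workers));

        // Items are handled in "waves" of at most `workers` items each.
        int waves = (itemCount + workers - 1) / workers;
        return TimeSpan.FromSeconds(waves * secondsPerItem);
    }

    public static void Main()
    {
        Console.WriteLine(Estimate(1000, 25, 1)); // one at a time: 06:56:40
        Console.WriteLine(Estimate(1000, 25, 3)); // three at a time: 02:19:10
    }
}
```

With these assumed figures, processing three items at a time would cut roughly seven hours down to a little over two.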

Performing Parallel Processing in Azure Web Jobs

The obvious and simplest solution is to run multiple instances of the web job. For instance, you could deploy, say, five copies of this web job alongside your web site. This gives you a number of Web Job instances equal to five times the number of instances of your web site.

However this solution is not easily maintainable, unless proper automation is put in place.

Another attempt was to make the Web Job method asynchronous and have it return a Task. However, this does not yield the expected result and is not supported: the Web Job infrastructure appears to await the returned task anyway.

Another solution would be to offload the processing to a certain number of separate threads, in a single instance of a Web Job. This achieves the same result, but without having to deploy multiple identical instances of a single Web Job alongside your web site.

By using a simple Semaphore object, it is possible to limit the number of concurrent threads allowed to process items from the queue to a fixed or configurable value. Here is the updated code:

public static void Work([QueueInput("webjobq")] string item)
{
    LoggerFactory.GetCurrentClassLogger().Debug("Performing work...");

    try
    {
        // wait for a slot to become available
        // then spawn a new thread

        semaphore_.Wait();
        new Thread(ThreadProc).Start();
    }
    catch (Exception e)
    {
        Console.Error.WriteLine(e);
    }
}

private const int MaxNumberOfThreads = 3;
private static readonly SemaphoreSlim semaphore_ = new SemaphoreSlim(MaxNumberOfThreads, MaxNumberOfThreads);

public static void ThreadProc()
{
    try
    {
        Work();
    }
    catch (Exception e)
    {
        Console.Error.WriteLine(">> Error: {0}", e);
    }
    finally
    {
        // release a slot for another thread
        semaphore_.Release();
    }
}

public static void Work()
{
    Console.WriteLine("This is a web job invocation: Process Id: {0}, Thread Id: {1}.", System.Diagnostics.Process.GetCurrentProcess().Id, Thread.CurrentThread.ManagedThreadId);
    Console.WriteLine(">> Yawn ...");
    Thread.Sleep(25000);
    Console.WriteLine(">> ... I think I fell asleep for awhile.");
    Console.WriteLine(">> Done.");
}

In that case, each time an item is received from an Azure Queue, a new processing thread is created if there is a slot available. Here is the result of the invocations on many queue items.

You can see that several threads are created in quick succession (a few hundred milliseconds apart at most). Then, once the maximum number of concurrent threads is reached, the next one waits for a previously created thread to complete.
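If you want to observe this behaviour without deploying anything, the following console sketch reproduces the throttling outside of the Web Jobs SDK. It is only an approximation: the ThrottleDemo class, its Dispatch and Run methods and the 200 ms simulated work are hypothetical stand-ins, and a plain loop plays the role of the queue.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class ThrottleDemo
{
    private const int MaxNumberOfThreads = 3;
    private static readonly SemaphoreSlim Slots =
        new SemaphoreSlim(MaxNumberOfThreads, MaxNumberOfThreads);

    private static readonly object Gate = new object();
    private static int running; // workers currently busy
    private static int peak;    // highest value of `running` observed

    // Stand-in for the queue-triggered function: wait for a free slot,
    // hand the item to a new thread, and return without waiting for it.
    private static Thread Dispatch(string item)
    {
        Slots.Wait();
        var thread = new Thread(() =>
        {
            try
            {
                lock (Gate) { running++; peak = Math.Max(peak, running); }
                Console.WriteLine("{0:HH:mm:ss.fff} start {1}", DateTime.Now, item);
                Thread.Sleep(200); // simulated work
                lock (Gate) { running--; }
            }
            finally
            {
                Slots.Release(); // free the slot for the next item
            }
        });
        thread.Start();
        return thread;
    }

    // Feeds `itemCount` fake queue items through Dispatch and returns the
    // largest number of workers that were busy at the same time.
    public static int Run(int itemCount)
    {
        lock (Gate) { running = 0; peak = 0; }
        var threads = new List<Thread>();
        for (int i = 0; i < itemCount; i++)
        {
            threads.Add(Dispatch("item-" + i));
        }
        foreach (var t in threads) t.Join();
        return peak;
    }

    public static void Main()
    {
        Console.WriteLine("Peak concurrency: {0}", Run(10));
    }
}
```

The reported peak can never exceed MaxNumberOfThreads, because a worker only counts itself as busy while it holds a semaphore slot.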

There you have it: a simple way to perform parallel processing from within a Web Job. Of course, this technique is not specific to Web Jobs, but I think it allows you to work with Azure Web Jobs in a more flexible way. In our project, we used a slightly modified version of this code, taking advantage of a CancellationToken to gracefully stop the web job from either the Azure infrastructure or from the command line.
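The project code itself is not shown here, so the following is only a guess at what a CancellationToken-aware variant might look like. The CancellableThrottle class and its TryDispatch and Drain helpers are hypothetical, and Ctrl+C stands in for whatever signal actually requests the stop. The one SDK-independent fact it relies on is that SemaphoreSlim.Wait accepts a CancellationToken and throws OperationCanceledException when that token is cancelled.

```csharp
using System;
using System.Threading;

public static class CancellableThrottle
{
    private const int MaxNumberOfThreads = 3;
    private static readonly SemaphoreSlim Slots =
        new SemaphoreSlim(MaxNumberOfThreads, MaxNumberOfThreads);

    // Waits for a free slot and starts `work` on a new thread.
    // Returns false if cancellation was requested before a slot was obtained.
    public static bool TryDispatch(string item, CancellationToken token,
                                   Action<string, CancellationToken> work)
    {
        try
        {
            Slots.Wait(token); // throws OperationCanceledException on cancel
        }
        catch (OperationCanceledException)
        {
            return false;
        }

        new Thread(() =>
        {
            try { work(item, token); }
            catch (OperationCanceledException) { Console.WriteLine(">> {0} cancelled.", item); }
            catch (Exception e) { Console.Error.WriteLine(">> Error: {0}", e); }
            finally { Slots.Release(); } // always give the slot back
        }).Start();
        return true;
    }

    // Blocks until every worker has released its slot.
    public static void Drain()
    {
        for (int i = 0; i < MaxNumberOfThreads; i++) Slots.Wait();
        Slots.Release(MaxNumberOfThreads);
    }

    public static void Main()
    {
        using (var cts = new CancellationTokenSource())
        {
            // Assumption: Ctrl+C is the stop signal in this sketch.
            Console.CancelKeyPress += (sender, e) => { e.Cancel = true; cts.Cancel(); };

            for (int i = 0; i < 10; i++)
            {
                bool started = TryDispatch("item-" + i, cts.Token, (item, token) =>
                {
                    Console.WriteLine(">> Working on {0}", item);
                    // A sleep that wakes up early when cancellation is requested.
                    if (token.WaitHandle.WaitOne(500)) token.ThrowIfCancellationRequested();
                    Console.WriteLine(">> Done with {0}", item);
                });
                if (!started) break; // stop taking new items
            }

            Drain(); // let in-flight workers finish before exiting
        }
    }
}
```

The design choice in this sketch is that cancellation stops new items from being picked up, while workers that are already running get the chance to notice the token and finish cleanly before the process exits.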


5 Responses to Performing Parallel Processing in Azure Web Jobs

  1. Victor Hurdugaci (MSFT) says:

    This is an interesting approach and it can definitely work in some scenarios! However, I would like to point out two drawbacks:

    1. Since you are kicking off a new thread inside a WebJobs SDK function but you do not wait for it to complete, the dashboard will report the function as being completed even though the thread is still running.
    2. Also, because the function can complete before the thread finishes, you cannot use output parameters that are produced by the thread.

    But, as I said, there are scenarios in which the two cases above don’t matter as much as having support for concurrent execution.

    • Yes, you are absolutely right, thanks for your feedback.
      In fact, I think this means we need a way for Web Jobs to natively support reentrant methods with a configurable limit on concurrent invocations 🙂

      • Nick says:

        Yes I agree with Maxime that this needs to be natively supported. I have a desperate need for this in my project and will likely need to switch over to a worker role because of this limitation. Nice workaround in this blog post, but I need to have reliable/accurate status updates of invocations in the dashboard.

  2. Why threads instead of Tasks? You could await the group of tasks before returning from the WebJob.

    What would be really nice is a way for QueueTrigger to dequeue more than one message at a time for processing.

    • Good point.

      I was not as familiar with tasks as I would have liked when I made this post.

      I think a lot has changed now that WebJobs are in GA. But yes, it would be nice if what you suggest were supported…
