Scaling Long Running Web Services

Wednesday, July 30th, 2008.

While I was at TechEd USA an attendee, Will, came up and asked me an interesting question about how to handle web service calls that can take a long time to complete. He has a number of these kinds of requests, ranging from computationally intensive tasks to those that require sifting through large amounts of data. Will’s problem was preventing too many of these resource-intensive tasks from running concurrently, which caused increased memory usage, paging, and eventually the server becoming unavailable.

For comparison later, here’s a diagram showing the trivial interaction:

One solution that he’d tried was to set up the web server to throttle those requests and keep a much smaller maximum thread-pool size for that application pool. The unfortunate side effect of that solution was that clients would get “turned away” by a not-so-pleasant Connection Refused exception.

Will had been to my web scalability talk and was curious about how I was using queues behind my web services. I’ve also heard this question from people just getting started with nServiceBus when looking at the Web Services Bridge sample. Here’s the code that’s in the sample and in just a second I’ll tell you why you shouldn’t do this:

    [WebMethod]
    public ErrorCodes Process(Command request)
    {
        object result = ErrorCodes.None;

        // Send the request to the bus and register a callback for the correlated reply.
        IAsyncResult sync = Global.Bus.Send(request).Register(
            delegate(IAsyncResult asyncResult)
            {
                CompletionResult completionResult = asyncResult.AsyncState as CompletionResult;
                if (completionResult != null)
                {
                    result = (ErrorCodes) completionResult.ErrorCode;
                }
            },
            null
        );

        // Blocks this thread, and keeps the HTTP connection open, until the reply arrives.
        sync.AsyncWaitHandle.WaitOne();

        return (ErrorCodes)result;
    }

Let me repeat, this is demo-ware. Do not use this in production.

What’s happening is that in this web service call we’re putting a message in a queue for some other process/machine to process. When that processing is complete, we’ll get a message back in our local queue (which you don’t see) which is correlated to our original request, firing off the callback. We block the web method from completing (using the WaitOne call) thus keeping the HTTP connection to the client open.

The problem here is that we’re wasting resources (the HTTP connection and the thread) while waiting for a response which, as already mentioned, can take a long time. In B2B or other server-to-server integration environments there are all sorts of middleware solutions that help us solve these problems. In Will’s case, however, browsers needed to interact with this web service. All he had was HTTP.

HTTP Solutions

Another attendee who was listening in (sorry I don’t remember your name) said that he was solving similar problems using polling but that he was having scalability problems as well.

What often surprises my clients when we deal with these same issues is that I do suggest a polling based solution, but one that still uses messaging, and this is what I described to Will:

Since we can’t actually push a message to a browser over HTTP from our server when processing is complete, the browser itself will be responsible for pulling the response. We still don’t want to leave costly resources like HTTP connections open for a long time. However, if the browser is going to poll for a response, we’ll need some way to correlate those follow-up requests with the original one. What we’re going to do is use the Asynchronous Completion Token pattern, and later I’ll show how to optimize it for web server technology.

Basic Polling

When the browser calls the web service, the web service will generate a Guid, put it in the message that it sends for processing, and return that guid to the browser. When the processing of the message is complete, the result will be written to some kind of database, indexed by that guid. The browser will periodically call another web method, passing in the guid it previously received as a parameter. That web method will check the database for a response using the guid, returning null if no response is there. If the browser receives a null response, it will “sleep” a bit and then retry.
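As a rough sketch of the server side of this flow, the two web methods and the processing step might look like the following. The queue and the "database" are in-memory stand-ins, and the names `submitRequest`, `processNext`, and `getResponse` are hypothetical, chosen only for illustration.

```javascript
const { randomUUID } = require("crypto");

// In-memory stand-ins for the message queue and the response database
// (assumptions for this sketch; a real system would use durable stores).
const queue = [];
const responses = new Map();

// First web method: tag the request with a fresh guid, enqueue it,
// and return the guid to the browser right away.
function submitRequest(command) {
  const token = randomUUID();
  queue.push({ token, command });
  return token;
}

// Runs on a processing node: handle one queued message and store the
// result indexed by the guid that travelled with the message.
function processNext(handler) {
  const message = queue.shift();
  if (message === undefined) return false; // nothing to do
  responses.set(message.token, handler(message.command));
  return true;
}

// Second web method: the polling call. Returns null until a response exists.
function getResponse(token) {
  return responses.has(token) ? responses.get(token) : null;
}
```

The browser keeps the guid from `submitRequest` and calls `getResponse` with it, sleeping between attempts for as long as it gets null back.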

One of the problems with this solution is that polling uses up server resources – both on the web server and our DB; threads, memory, DB connections. A better solution would decrease the resource cost of the polling. Let’s use the fundamental building blocks of the web to our advantage – HTTP GET and resources:

REST-full Polling

Instead of using a guid to represent the id of the response, let’s consider the REST principle of “everything’s a resource”. That would mean that the response itself would be a resource. And since every resource has a URI, we might as well use that URI in lieu of the guid. So, instead of our web service returning a guid, let’s return a URI – something like:

http://www.acme.com/responses/88ec5359-a5d8-4491-a570-3bfe469f3a64.xml

As you can see, the guid is still there. So, what’s different?

What’s different is that instead of having the processing code write the response to the database, it writes it to a resource. In the case of a web farm, this can be done by writing some XML to a file on the SAN. Also, the browser wouldn’t need to call a web service to get the response; it would just do an HTTP GET on the URI. If it gets an HTTP 404, it would sleep and retry as before. The SAN is needed because, as the browser polls, its requests may arrive at various web servers, so the response needs to be accessible from any one of them.
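A minimal sketch of the browser-side loop, assuming the HTTP call and the sleep are passed in as functions so the logic can run without a network. The names `pollForResponse`, `httpGet`, and `sleep`, and the `maxAttempts` cap, are not from the original design; they are assumptions added for this example.

```javascript
// Polls a response URI until it stops returning 404.
// `httpGet(uri)` is assumed to resolve to { status, body };
// `sleep(ms)` is assumed to resolve after roughly ms milliseconds.
async function pollForResponse(uri, intervalMs, { httpGet, sleep, maxAttempts = 50 }) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const reply = await httpGet(uri);
    if (reply.status === 200) return reply.body; // the resource now exists
    if (reply.status !== 404) {
      throw new Error(`Unexpected status ${reply.status} for ${uri}`);
    }
    await sleep(intervalMs); // not there yet: wait, then retry
  }
  throw new Error(`No response at ${uri} after ${maxAttempts} attempts`);
}
```

In a browser, `httpGet` could be a thin wrapper around `fetch` and `sleep` a promise around `setTimeout`. The cap on attempts is one way to keep a lost response from leaving the page polling forever.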

Just as an aside, it would be better to free the processing node as quickly as possible and have something else write the response to the SAN. That could be done simply by having the processing node send a message to a different node whose only job is to write responses to disk.

The reason that the URI makes a difference is that serving “static” resources is something that web servers do extremely efficiently without requiring any managed resources (like ASP.NET threads). That’s a big deal.

We’re still using HTTP connections for the polling but that’s something whose effect can be mitigated to a certain degree.

Timed REST-full Polling

Since various requests can take varying amounts of time to process, it’s difficult to know at what rate the browser should poll. So, why don’t we have the web service tell it? As a part of the response to the original web service call, instead of just returning a URI, we could also return the polling interval: 1 second, 5 seconds, whatever is appropriate for the type of request. This value could easily be configurable [RequestType, PollingInterval].

An even more advanced solution would allow you to change these values dynamically. The advantage that would be gained would be that your operations team could better manage the load on your servers. When a large number of users are hitting your system, you could decrease the rate at which your servers would be polled, thus leaving more HTTP connections for other users.
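One way the [RequestType, PollingInterval] configuration could be sketched is a lookup table that the web service consults when it builds its initial reply. The request type names, the interval values, the default, and the reply field names here are all made up for illustration.

```javascript
// Hypothetical [RequestType, PollingInterval] table (values in milliseconds).
const intervalByRequestType = new Map([
  ["GenerateReport", 5000],
  ["PriceQuote", 1000],
]);
const DEFAULT_INTERVAL_MS = 2000; // assumed fallback for unlisted request types

// Lets operations change an interval at runtime, for example to slow
// polling down while the system is under heavy load.
function setPollingInterval(requestType, intervalMs) {
  if (!Number.isInteger(intervalMs) || intervalMs <= 0) {
    throw new RangeError("intervalMs must be a positive integer");
  }
  intervalByRequestType.set(requestType, intervalMs);
}

// Builds what the initial web service call returns:
// where the browser should poll and how often.
function buildAcceptedReply(requestType, token) {
  return {
    responseUri: `http://www.acme.com/responses/${token}.xml`,
    pollingIntervalMs: intervalByRequestType.get(requestType) ?? DEFAULT_INTERVAL_MS,
  };
}
```

Because the interval is read each time a reply is built, a change made through `setPollingInterval` affects only requests submitted afterwards. Browsers that are already polling keep the interval they were given.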

Scaling and Adaptive Polling

You’d probably also want to scale out the number of processing nodes behind your queue. The nice thing is that you could change the polling interval as you scale the various processing nodes per request type providing better responsiveness for the more critical requests. Once we add virtualization, things get really fun:

We had separate queues per request type, so that we could easily see the load we were under for each type of request. That way, we could scale out the processing nodes per request type as well as change the polling interval. By virtualizing our processing nodes, and writing scripts to monitor queue sizes, we had those scripts automatically provisioning (and de-provisioning) nodes as well as changing the polling interval of the browsers.

This had the enormous benefit of the system automatically shifting resources to provide the appropriate relative allocation for the current load as its macroscopic make-up changed.
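The monitoring scripts themselves aren’t shown here, so the following is only one plausible policy, not the one that was actually used. Given the length of a request type’s queue, it picks a node count that would drain the queue within a target time, and stretches the polling interval once the node limit is reached. The function name and every field of `policy` are assumptions.

```javascript
// Decides, for one request type, how many processing nodes to run and what
// polling interval to hand to browsers, based on the current queue length.
function planCapacity(queueLength, policy) {
  const {
    messagesPerNodePerSecond, // measured throughput of a single node
    targetDrainSeconds,       // how quickly we would like the queue emptied
    minNodes,
    maxNodes,
    baseIntervalMs,           // polling interval under normal load
    maxIntervalMs,            // never ask browsers to wait longer than this
  } = policy;

  const capacityPerNode = messagesPerNodePerSecond * targetDrainSeconds;
  const wantedNodes = Math.ceil(queueLength / capacityPerNode);
  const nodes = Math.min(maxNodes, Math.max(minNodes, wantedNodes));

  // If we would need more nodes than we may provision, slow the polling
  // down in proportion to the shortfall.
  const overload = Math.max(1, wantedNodes / nodes);
  const pollingIntervalMs = Math.min(maxIntervalMs, Math.round(baseIntervalMs * overload));

  return { nodes, pollingIntervalMs };
}
```

A script could run this per queue on a timer, provision or de-provision virtual nodes to match `nodes`, and publish `pollingIntervalMs` as the interval returned to new requests of that type.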

Summary

Will was well-pleased with the solution which, although more complicated than what he had originally tried, was flexible enough to meet his needs. As opposed to pure server-based solutions, here we make more use of the browser (writing our own JavaScript) instead of putting our faith in some Ajax-y library. That’s not to say that you couldn’t wrap this up into a library. In essence, it is a kind of messaging transport for browser-to-server communication that allows duplex conversations.

In fact, what could be done is to return multiple responses to the browser over a long period of time. Each response that comes back to the browser could include an additional URI where the next response will appear. This can be used for reporting the status of a long running process, paging results, and in many other scenarios.
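A small sketch of how a browser might follow such a chain, assuming each response carries a `payload` and an optional `next` URI. Those field names, the `followResponses` function, and the `maxHops` guard are hypothetical, and `getResource` is assumed to resolve only once the resource at the URI exists (for instance by retrying on 404).

```javascript
// Follows a chain of response resources, handing each payload to a callback,
// until a response arrives that does not point to a successor.
async function followResponses(firstUri, { getResource, onResponse, maxHops = 100 }) {
  let uri = firstUri;
  let hops = 0;
  while (uri) {
    if (hops++ >= maxHops) throw new Error("Too many chained responses");
    const resource = await getResource(uri); // waits until this resource exists
    onResponse(resource.payload);
    uri = resource.next ?? null; // no `next` means the conversation is over
  }
  return hops; // number of responses consumed
}
```

For status reporting, each intermediate payload could be a progress message and the last one the actual result. For paging, each payload would be one page of results.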

And, one parting thought, could this not be used for all browser to web service communication?

23 Comments

Klaus Hebsgaard Says:
July 30th, 2008 at 7:57 am
Could this not be handled using Asynchronous Pages in ASP.NET (see http://msdn.microsoft.com/en-us/magazine/cc163725.aspx ), as I understand it this technique works with WCF services as well….

udidahan Says:
July 30th, 2008 at 8:27 am
Klaus,

I actually have another sample with nServiceBus that demonstrates that, but it only works for ASP.NET pages. If you want to call the web service directly from the browser, there is no async model set up.

Mike Says:
July 31st, 2008 at 5:57 am
What process cleans up all the static resources?

chiph Says:
July 31st, 2008 at 2:02 pm
Like Mike said — you’d have to have something clean up the static files after a while. In my experience, NTFS starts to have problems when you’ve got 10,000+ items in a directory, so depending on your transaction volume, you’d need to run the cleanup possibly several times during the day. As well as schedule your defragger to run.

The trick would be knowing when an item becomes eligible for cleanup. I’m not sure you can delete it immediately after a successful (non-404 response) status request from the user (assuming you can even detect that under IIS, I don’t know). You might want to give them a little more time in case they have system problems at their end.

udidahan Says:
August 2nd, 2008 at 3:07 am
Chiph,

You’re absolutely right – which is why the process which actually writes the response to the disk is the beginning of a saga which may have its final (delete) phase triggered either by a read, a certain number of reads, and/or time.

Dan Finucane Says:
August 6th, 2008 at 9:18 am
I love this solution especially the REST piece. I have a system where sometimes the web service operation will complete in a minute or two and other times it may run for four hours. Your solution is a perfect fit. I have one question though – since the operation results are retrieved via plain vanilla HTTP GET’s how would you document/communicate the content of an operations result. Currently I use data contracts to describe the XML schema for the data I return. If I use the REST approach you outline would you continue to use the data contract to document the schema of the result or would you leave the result structure out of the WSDL and communicate the form through supplemental documentation?

In some cases I could see leaving the schema out of the contract because it makes it easier to add elements in future releases without breaking existing code.

udidahan Says:
August 9th, 2008 at 3:30 pm
Dan,

Glad you like it. Let me know how it works out for you.

I use XSD to define the structure of the data returned. Sometimes additional documentation is needed anyway.

BTW, the X in XML stands for eXtensible (which you already knew), but I’m just reiterating that to say that it is quite easy to add elements in future releases without breaking existing code.

Colin Jack Says:
August 12th, 2008 at 2:47 pm
Great stuff

I wondered if you’d looked at the duplex WCF functionality that should help remove the need to do so much polling in Silverlight apps:

http://weblogs.asp.net/dwahlin/archive/2008/06/16/pushing-data-to-a-silverlight-client-with-wcf-duplex-service-part-i.aspx

It’s also really interesting reading this article. I worked on an XML over HTTP project many years ago that used many of these patterns for long running async jobs and it’s good to see you’ve adopted the same techniques.

udidahan Says:
August 13th, 2008 at 1:51 am
Colin,

I agree with the first commenter on that post – Rob.
IIS and ASP.NET are not designed to do comet in a scalable way.

There is the other problem of synchronous communication from server to client where server threads end up being blocked while waiting for the communication to succeed.

Hope that helps.

Colin Jack Says:
August 18th, 2008 at 9:33 am
@Udi
Good point on the threads, hadn’t thought that through.

Also I found this page quite interesting (particularly the distributed observer pattern near the end):

http://duncan-cragg.org/blog/post/distributed-observer-pattern-rest-dialogues/

udidahan Says:
August 19th, 2008 at 11:48 pm
Colin,

What he’s describing is using REST/HTTP to do messaging and pub/sub. I think that’s great. However, from a “how do I get my head thinking the right way” perspective, I find that plain messaging and pub/sub is simpler to grasp. Once you understand the applicative protocol you want to set up, mapping that to resources and GETs and POSTs isn’t very difficult.

Does that make sense?

Colin Jack Says:
September 3rd, 2008 at 3:49 pm
@Udi
Yeah it definitely makes sense but I have one more question, do you use REST in your architectures and if so how do you find it works alongside SOA and DDD?

udidahan Says:
September 3rd, 2008 at 10:00 pm
Colin,

I do use REST where it makes sense – but primarily as a kind of message serialization mechanism.

Jan Van Ryswyck Says:
September 5th, 2008 at 12:36 pm
Hi Udi,

I listened to your latest DNR episode and after reading this post I must say that this is a really awesome approach. Thx for sharing.

I have a small nitpicking question though: what about security of the response resources (supposing that they contain sensitive information, which is not unlikely)? I know it’s not easy to determine a GUID at the right time (before the resource is deleted), but those kid hackers of today can do just about anything. Any thoughts about this topic?

Again, great stuff.

udidahan Says:
September 6th, 2008 at 2:56 am
Jan,

Glad you liked it.

Security is a big topic. The question is what threat profile we’re trying to protect against.

One option is for the same saga that created the resource to protect it with an ACL.

The thing is that you need to understand that probably the only way for an external attacker to know the guid/uri of the resource is for them to go for a man-in-the-middle attack. You’d need HTTPS to protect against that. Once you have that on the request, and you don’t allow anyone to list the response resources, you’re probably secure enough not to need HTTPS on the response.

Jonathan Dickinson Says:
September 10th, 2008 at 7:24 am
How about using an async HTTP handler with your restful stuff. This way you are not wasting _bandwidth_ (far more expensive than CPU/Memory resources).

I.e.
string MyWSStart(string bla) -> Returns URL “wswait.ashx?id=100”.
Open connection to wswait.ashx and wait for response.
string MyWSEnd() -> Returns result.

I will have it up on my blog at http://www.geekswithblogs.net/jcdickinson/ in a few minutes.

Simon Segal Says:
December 19th, 2008 at 11:29 pm
Udi

Love this approach. Were all the UI requests including more passive kinds of data (enumerated lists of regions for example) also handled by this single architectural approach in this scenario?

udidahan Says:
December 20th, 2008 at 2:55 am
Simon,

“Passive kinds of data”, like countries, regions, etc were done in a similar manner.

The main difference was that they had a well known URI ahead of time as well as longer cache timeouts. Regardless, when that data was changed on the server, it would also update that resource/URI.

Long-Running Webservices without Polling « Jonathan Dickinson Says:
June 1st, 2009 at 1:31 am
[…] read about how one could do this over at Udi Dahan’s blog, but this seems like a bad practice to me, most importantly – it uses polling. Most people seem to […]

Dan Says:
July 1st, 2015 at 4:42 pm
Hi Udi,

It is now 7 years since you posted this, and still it reigns as one of the top Google search results for “long running web services”.

Given the advent of web sockets, does your recommendation for long running services with polling still stand?