Archive for the ‘Messaging’ Category
Sunday, December 7th, 2008
One of the most common questions I get on the topic of pub/sub messaging is what happens if a notification is lost. Interestingly enough, there are some who almost entirely write off this pattern because of this issue, preferring the control of request/response-exception. So, what should be done about lost messages? The short answer is durable messaging. The long answer is design.
Durable Messaging
In order to prevent a message from being lost when it is sent from a publisher to a subscriber, the message is written to disk on the publisher side, and then forwarded to the subscriber, where it is also written to disk. This store-and-forward mechanism enables our systems to gracefully recover from either side being temporarily unavailable.
In my MSDN article on this topic, I outlined some problems with this approach. These problems are exacerbated for publishers. Imagine a publisher with 40 subscribers, publishing 10 messages a second, each containing 1MB of XML. If 10 of the subscribers are unavailable, that’s 100MB of data being written to the publisher’s disk every second, 6GB every minute. That’s liable to bring down a publisher before an administrator brews a cup of coffee.
Publishers have no choice but to throw away messages after a certain period of time.
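The arithmetic above is easy to check. Here is a minimal sketch; the function names and the 500GB of free disk are illustrative assumptions, and 1GB is taken as 1000MB to match the rounding in the text.

```python
def backlog_mb_per_second(unavailable_subscribers, messages_per_second, message_size_mb):
    """Data the publisher must persist each second for subscribers that are down."""
    return unavailable_subscribers * messages_per_second * message_size_mb


def seconds_until_disk_full(free_disk_mb, backlog_rate_mb_per_second):
    """How long the publisher can keep storing messages before its disk fills up."""
    return free_disk_mb / backlog_rate_mb_per_second


# Numbers from the text: 10 unavailable subscribers, 10 messages/s, 1MB each.
backlog_rate = backlog_mb_per_second(10, 10, 1)
backlog_gb_per_minute = backlog_rate * 60 / 1000

# Hypothetical publisher with 500GB of free disk.
time_left = seconds_until_disk_full(500_000, backlog_rate)
print(backlog_rate, backlog_gb_per_minute, time_left)
```

Whatever the free disk space is, the time left is finite, which is why the retention period has to be an explicit decision.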
Publisher Contracts
The whole issue of contracts and schema is considered one of the better understood parts of SOA. Unfortunately, the operational aspects of service contracts are hardly ever taken into account.
On top of the schema of the messages a service publishes, additional information is needed in the contract:
- How big will this message be?
- How often will it be published?
- How long will this message be stored if a subscriber is unavailable?
The first two pieces of information are important for subscribers to do load and capacity planning. The last one is the most important as it dictates the required availability and fault-tolerance characteristics of subscribers.
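One way to picture such a contract is as a small record that sits next to the schema. This is only a sketch: the class, its field names, and the numbers are hypothetical, not part of any existing contract format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublisherContract:
    """Operational facts published alongside the message schema (hypothetical shape)."""
    message_type: str
    max_message_size_kb: int        # how big will this message be?
    max_messages_per_second: float  # how often will it be published?
    retention_seconds: int          # how long is it stored for an unavailable subscriber?

    def peak_bandwidth_kb_per_second(self):
        """Input for a subscriber's load and capacity planning."""
        return self.max_message_size_kb * self.max_messages_per_second

    def tolerates_outage(self, outage_seconds):
        """True if a subscriber can be down this long without missing messages."""
        return outage_seconds <= self.retention_seconds


# Illustrative values only.
order_accepted = PublisherContract("OrderAccepted", 1024, 10, retention_seconds=15 * 60)
print(order_accepted.peak_bandwidth_kb_per_second())
print(order_accepted.tolerates_outage(10 * 60), order_accepted.tolerates_outage(60 * 60))
```

A subscriber that cannot promise to be back within the retention period knows up front that this contract is not for it.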
For Example
In the canonical retail scenario, when our sales service accepts an order, it publishes an order accepted event. Other services subscribed to this event include shipping, billing, and business intelligence.
While shipping and billing are highly available and able to keep up with the rate at which orders are accepted, the business intelligence service is not. BI has two main parts to it - a nightly batch that does the number crunching, and a UI for reporting off of the results of that number crunching. Some even do the reporting in a semi-offline fashion, emailing reports back to the user when they’re ready.
Furthermore, nobody’s going to invest in servers for making BI highly available.
And wasn’t the whole point of this publish/subscribe messaging to keep our services autonomous? That not all services have to have the same level of uptime?
Houston, do we have a problem?
Data Freshness
There is a glimmer of light in all this doom and gloom.
Not all services have the same data freshness requirements.
The business intelligence service above doesn’t need to know about orders the second they’re accepted. A daily roll-up would be fine, and an hourly roll-up would bring us that much closer to “real time business intelligence”.
So, while BI is ready to accept the sales message schema, it would like a slightly different contract around it - fewer messages per unit of time, more data in each message.
From the operational perspective of the sales service, it would be cost effective to have fewer “online” subscribers. It could even take things a few steps further. Instead of using the regular messaging backbone for transmitting these hourly messages, it could use FTP. The data could even be zipped to take up even less space. Since the total data size is smaller than that of the corresponding online stream, the data is stored on cheaper, larger storage, and the number of subscribers for this zipped, hourly update is fairly small, these messages can be kept around far longer.
If you’ve heard about consumer-driven contracts, this is it.
Note that we’re still talking about the same logical message schema.
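Here is a rough sketch of what the second pipe could look like: the same order-accepted events, grouped by hour and compressed. The event fields and function names are assumptions made for the example, and the transfer itself (FTP or otherwise) is left out.

```python
import gzip
import json
from collections import defaultdict
from datetime import datetime


def roll_up_hourly(events):
    """Group events by the hour in which the order was accepted."""
    batches = defaultdict(list)
    for event in events:
        hour = datetime.fromisoformat(event["accepted_at"]).strftime("%Y-%m-%dT%H")
        batches[hour].append(event)
    return dict(batches)


def compress_batch(batch):
    """Serialize one hourly batch and zip it for cheap, long-term storage."""
    return gzip.compress(json.dumps(batch).encode("utf-8"))


def decompress_batch(blob):
    """What the subscriber does after fetching the hourly file."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


events = [
    {"order_id": 1, "accepted_at": "2008-12-07T09:05:00"},
    {"order_id": 2, "accepted_at": "2008-12-07T09:40:00"},
    {"order_id": 3, "accepted_at": "2008-12-07T10:01:00"},
]
batches = roll_up_hourly(events)
nine_oclock_blob = compress_batch(batches["2008-12-07T09"])
print(sorted(batches), decompress_batch(nine_oclock_blob))
```

Each entry inside a batch is still an ordinary order-accepted event; only the packaging and delivery rate differ.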
Summary
It’s not that lost notifications aren’t a problem.
It’s that they feed the design process, so that the resulting service ecosystem is set up in such a way that notifications won’t get lost. I know that sounds kind of recursive, but that’s how it works. Either subscribers meet the SLA that allows them to process the online stream of events, or they should subscribe to a different pipe (which will have different SLA requirements, but maybe they can deal with those).
It makes sense to have multiple pipes for the same logical schema.
It’s practically a necessity to make pub/sub a feasible solution.
Related Content
MSDN article on messaging and lost messages
Durable messaging dilemmas
Additional logic required for service autonomy
More in depth example on events and pub/sub between services
Consumer-Driven Contracts
Posted in Architecture, Autonomous Services, EDA, Messaging, Pub/Sub, Reliability, SOA | 1 Comment »
Monday, August 11th, 2008
One of the common questions I receive from people starting to use nServiceBus is how one-way messaging fits with showing the user a grid (or list) of data. Thinking about publish/subscribe usually just gets them even more confused. Trying to resolve all this with Service Oriented Architecture leaves them wondering - why bother?

In regular client-server development, the server is responsible for providing the client with all CRUD (create, read, update, and delete) capabilities. However, when users look at data they do not often require it to be up to date to the second (given that they often look at the same screen for several seconds to minutes at a time). As such, retrieving data from the same table as that being used for highly consistent transaction processing creates contention resulting in poor performance for all CRUD actions under higher load.
A Scalable Solution
One of the common answers to this question is for the server/service to publish a message when data changes (say, as the result of processing a message) and for clients to subscribe to these messages. When such a notification arrives at a client, the client would cache the data it needs. Then, when the user wants to see a grid of data, that data is already on the client. Of course, this solution doesn’t work so well for older client machines (like some point of service devices) or if there are millions of rows of data.
The thing is that this solution is one implementation of a more general pattern - command query separation (CQS).
Command Query Separation
Wikipedia describes CQS as a pattern where "… every method should either be a command that performs an action, or a query that returns data to the caller, but not both. More formally, methods should return a value only if they are referentially transparent and hence possess no side effects."
Martin Fowler is less strict about the use of CQS allowing for exceptions: "Popping a stack is a good example of a modifier that modifies state. Meyer correctly says that you can avoid having this method, but it is a useful idiom. So I prefer to follow this principle when I can, but I’m prepared to break it to get my pop."
So, how does separating commands from queries and SOA help at all in getting data to and from a UI? The answer is based on Pat Helland’s thinking as described in his article Data on the Inside vs. Data on the Outside.
Services Cross Boxes
The biggest lie around SOA is that services run.
Let that sink in a second.
Sure services have runnable components, but that’s not why they’re important.
I’ll skip the books of background and cut to the chase:
Services communicate with each other using publish/subscribe and one-way messaging. Services have components inside them. Inside a service, these components can communicate with each other using synchronous RPC, or any other mechanism. Also, these components can reside on different machines.
This is broader than just scaling out a service. There can be service components running on the client as well as the server.
SOA & CQS
Combining these two concepts together, here’s what comes out:
In this solution there are two services that span both client and server - one in charge of commands (create, update, delete), the other in charge of queries (read). These services communicate only via messages - one cannot access the database of the other.
The command service publishes messages about changes to data, to which the query service subscribes. When the query service receives such notifications, it saves the data in its own data store which may well have a different schema (optimized for queries like a star schema).
The client component which is in charge of showing grids of data to the user behaves the same as it would in a regular layered/tiered architecture, using synchronous blocking request/response to get its data - SOA doesn’t change that.
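To make the split concrete, here is a toy, in-process sketch. The `Bus` class merely stands in for real publish/subscribe infrastructure, and all class, method, and message names are made up for illustration; none of this is nServiceBus API.

```python
from collections import defaultdict


class Bus:
    """Toy in-process publish/subscribe bus (stand-in for real messaging)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, message_type, handler):
        self._handlers[message_type].append(handler)

    def publish(self, message_type, message):
        for handler in self._handlers[message_type]:
            handler(message)


class CommandService:
    """Owns the transactional store; announces changes as messages."""

    def __init__(self, bus):
        self._bus = bus
        self._orders = {}  # private: the query service never reads this

    def create_order(self, order_id, customer, total):
        self._orders[order_id] = {"customer": customer, "total": total}
        self._bus.publish("OrderCreated",
                          {"order_id": order_id, "customer": customer, "total": total})


class QueryService:
    """Keeps its own store, shaped for the grid rather than for transactions."""

    def __init__(self, bus):
        self._totals_by_customer = defaultdict(float)
        bus.subscribe("OrderCreated", self._on_order_created)

    def _on_order_created(self, message):
        self._totals_by_customer[message["customer"]] += message["total"]

    def totals_grid(self):
        """Plain synchronous read, as the client component would call it."""
        return sorted(self._totals_by_customer.items())


bus = Bus()
queries = QueryService(bus)
commands = CommandService(bus)
commands.create_order(1, "acme", 100.0)
commands.create_order(2, "acme", 50.0)
commands.create_order(3, "zeta", 10.0)
print(queries.totals_grid())
```

The two stores hold different shapes of the same facts, and the only thing connecting them is the published message.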
Composite Applications
Although the client side components of both the command and query services are hosted in the same process, they are very much independent of each other. That being said, from an interoperability perspective (the one that most people attribute to SOA), all of the client-side components will likely be developed using the same technology - although there are already ways to host Java code in .NET and vice-versa.
Of course, once we talk about web UIs things are a bit different - but still similar. While web-server-side there may be a level of independence, for browser side inter-component communications we’re still likely to target JavaScript. There, I’ve managed to say something technical supporting mashups and SOA without lying through my teeth.
On the Microsoft side with the recent release of the Composite Application Guidance & Library (pronounced "prism") I hope that more of these principles will be reaching the "smart client". The command pattern is especially critical in maintaining the separation while enabling communication to still occur so I’m glad that, as one of the Prism advisors, I was able to simplify that part (Glenn still has nightmares about that rooftop conversation).
Publish / Subscribe
In the "scalable solution" section up top I mentioned how publish/subscribe to the smart client is really just one implementation of CQS and SOA. So, how different is it really?
Well, there will probably be a different technology mapping. Instead of a star-schema OLAP product, we might simply store the published data in memory on the client. That is, if you designed your components to be technology agnostic.
In terms of the use of nServiceBus, the same component is going to be subscribing to the same type of message - all that’s different is that now every client will be having data pushed to them rather than this occurring server-side only.
You could have the same code deployed differently in the same system - stronger clients subscribing themselves, weaker ones using a remote server. Web servers would probably be considered stronger clients. This kind of flexible deployment has proven to be extremely valuable for my larger clients. The added benefit of enabling users to work (view data) even while offline (somewhere there’s no WIFI) is just icing on the cake.
A Word of Warning
Once the client starts receiving notifications, and handling those on a background thread (as it should) the code becomes susceptible to deadlocks and data races. Juval does a good job of outlining some of those with respect to the use of WCF. Prism doesn’t provide any assurances in this area either.
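As a small illustration of the kind of care needed, here is a sketch of a client-side cache that background threads update and the UI reads. The class and its methods are hypothetical. The lock is held only for the dictionary operation itself and never while calling out to other code, which is one common way to keep deadlocks out of the picture.

```python
import threading


class GridCache:
    """Rows pushed by notifications; safe to touch from several threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rows = {}

    def apply_notification(self, row_id, row):
        # Called on background threads as messages arrive.
        with self._lock:
            self._rows[row_id] = row

    def snapshot(self):
        # Called by the UI: returns a copy so rendering never holds the lock.
        with self._lock:
            return dict(self._rows)


cache = GridCache()


def receive(start):
    for row_id in range(start, start + 1000):
        cache.apply_notification(row_id, {"id": row_id})


threads = [threading.Thread(target=receive, args=(n * 1000,)) for n in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(len(cache.snapshot()))
```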
Summary
NServiceBus is not designed to be used for any and all types of communication in a given architecture. In the examples above, nServiceBus handles the publish/subscribe but leaves the synchronous RPC to existing solutions like WCF. Not only that, but synchronous RPC does have its place in architecture, just not across service boundaries. In all cases, data is served to users from a store different from that which transaction processing logic uses.
Command Query Separation is not only a good idea at the method/class level but has advantages at the SOA/System level as well - yet another good idea from 20 years ago that services build upon. Making use of CQS requires understanding your data and its uses - SOA builds on that by looking into data volatility and the freshness business requirements around it.
Finally, designing the components of your services in such a way that their dependency on technology is limited buys a lot of flexibility in terms of deployment and, consequently, significant performance and scalability gains.
Simple, it is. Easy, it is not.
Posted in Architecture, Autonomous Services, Messaging, NServiceBus, Pub/Sub, SOA, Smart Client | 17 Comments »
Wednesday, July 30th, 2008
While I was at TechEd USA I had an attendee, Will, come up and ask me an interesting question about how to handle web service calls that can take a long time to complete. He has a number of these kinds of requests ranging from computationally intensive tasks to those requiring sifting through large amounts of data. What Will was having problems with was preventing too many of these resource-intensive tasks from running concurrently (causing increased memory usage, paging, and eventually the server becoming unavailable).
For comparison later, here’s a diagram showing the trivial interaction:
One solution that he’d tried was to set up the web server to throttle those requests and keep a much smaller maximum thread-pool size for that application pool. The unfortunate side effect of that solution was that clients would get “turned away” by a not-so-pleasant Connection Refused exception.
Will had been to my web scalability talk and was curious about how I was using queues behind my web services. I’ve also heard this question from people just getting started with nServiceBus when looking at the Web Services Bridge sample. Here’s the code that’s in the sample and in just a second I’ll tell you why you shouldn’t do this:
[WebMethod]
public ErrorCodes Process(Command request)
{
    object result = ErrorCodes.None;

    // Send the request to the queue and register a callback for the correlated reply.
    IAsyncResult sync = Global.Bus.Send(request).Register(
        delegate(IAsyncResult asyncResult)
        {
            CompletionResult completionResult = asyncResult.AsyncState as CompletionResult;
            if (completionResult != null)
            {
                result = (ErrorCodes) completionResult.ErrorCode;
            }
        },
        null
    );

    // Block this thread (and hold the HTTP connection open) until the reply arrives.
    sync.AsyncWaitHandle.WaitOne();

    return (ErrorCodes)result;
}
Let me repeat, this is demo-ware. Do not use this in production.
What’s happening is that in this web service call we’re putting a message in a queue for some other process/machine to process. When that processing is complete, we’ll get a message back in our local queue (which you don’t see) which is correlated to our original request, firing off the callback. We block the web method from completing (using the WaitOne call) thus keeping the HTTP connection to the client open.
The problem here is that we’re wasting resources (the HTTP connection and the thread) while waiting for a response which, as already mentioned, can take a long time. In B2B or other server to server integration environments there are all sorts of middleware solutions that help us solve these problems, however in Will’s case browsers needed to interact with this web service. All he had was HTTP.
HTTP Solutions
Another attendee who was listening in (sorry I don’t remember your name) said that he was solving similar problems using polling but that he was having scalability problems as well.
What often surprises my clients when we deal with these same issues is that I do suggest a polling based solution, but one that still uses messaging, and this is what I described to Will:
Since we can’t actually push a message to a browser over HTTP from our server when processing is complete, the browser itself will be responsible for pulling the response. We still don’t want to leave costly resources like HTTP connections open a long time, however if the browser is going to be polling for a response, we’ll need some way to correlate those following requests with the original one. What we’re going to do is use the Asynchronous Completion Token pattern, and later I’ll show how to optimize it for web server technology.
Basic Polling
When the browser calls the web service, the web service will generate a Guid, put it in the message that it sends for processing, and return that guid to the browser. When the processing of the message is complete, the result will be written to some kind of database, indexed by that guid. The browser will periodically call another web method, passing in the guid it previously received as a parameter. That web method will check the database for a response using the guid, returning null if no response is there. If the browser receives a null response, it will “sleep” a bit and then retry.
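The flow above can be sketched in a few lines. Everything here is a stand-in: an in-memory queue plays the message queue, a dictionary plays the database, and the function names are invented for the example. The `wait` callable is where a browser would sleep; in the demo it lets the worker run instead, so the example finishes immediately.

```python
import uuid
from queue import Queue

work_queue = Queue()  # stand-in for the message queue behind the web service
responses = {}        # stand-in for the database, indexed by guid


def submit(request):
    """First web method: enqueue the work and hand back a guid right away."""
    token = str(uuid.uuid4())
    work_queue.put({"token": token, "request": request})
    return token


def process_next():
    """Processing node: handle one message and store the result under its guid."""
    if work_queue.empty():
        return
    message = work_queue.get()
    responses[message["token"]] = message["request"].upper()


def get_response(token):
    """Second web method: returns None while no response is there."""
    return responses.get(token)


def poll(token, attempts, wait):
    """Browser side: ask, and if the answer is None, wait a bit and retry."""
    for _ in range(attempts):
        result = get_response(token)
        if result is not None:
            return result
        wait()
    return None


token = submit("monthly report")
first_try = get_response(token)
result = poll(token, attempts=3, wait=process_next)
print(first_try, result)
```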
One of the problems with this solution is that polling uses up server resources - both on the web server and our DB; threads, memory, DB connections. A better solution would decrease the resource cost of the polling. Let’s use the fundamental building blocks of the web to our advantage - HTTP GET and resources:
REST-full Polling
Instead of using a guid to represent the id of the response, let’s consider the REST principle of “everything’s a resource”. That would mean that the response itself would be a resource. And since every resource has a URI, we might as well use that URI in lieu of the guid. So, instead of our web service returning a guid, let’s return a URI - something like:
http://www.acme.com/responses/88ec5359-a5d8-4491-a570-3bfe469f3a64.xml
As you can see, the guid is still there. So, what’s different?
What’s different is that instead of having the processing code write the response to the database, it writes it to a resource. This can be done by writing some XML to a file on the SAN in the case of a webfarm. Also, the browser wouldn’t need to call a web service to get the response, it would just do an HTTP GET on the URI. If it gets an HTTP 404, it would sleep and retry as before. The reason that the SAN is needed is that, as the browser polls, it may have its requests arrive at various web servers so the response needs to be accessible from any one of them.
Just as an aside, it would be better to free the processing node as quickly as possible and have something else write the response to the SAN. That would be done simply by sending a message from the processing node that would be handled by a different node whose only job is to write responses to disk.
The reason that the URI makes a difference is that serving “static” resources is something that web servers do extremely efficiently without requiring any managed resources (like ASP.NET threads). That’s a big deal.
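Here is a sketch of the resource-based variant, with a temporary directory standing in for the SAN and a plain function standing in for the web server's static file handling. The function names are assumptions. Writing to a temporary name and then renaming is an extra precaution added in this example, so that a poller never sees a half-written file.

```python
import tempfile
import uuid
from pathlib import Path

SHARED_ROOT = Path(tempfile.mkdtemp())  # stand-in for the SAN share
BASE_URI = "http://www.acme.com/responses/"


def accept_request():
    """Web service: return the URI where the response will eventually appear."""
    return f"{BASE_URI}{uuid.uuid4()}.xml"


def path_for(uri):
    """Map a response URI to its file on the shared storage."""
    return SHARED_ROOT / uri.rsplit("/", 1)[-1]


def write_response(uri, xml):
    """Writer node: create the resource once processing is complete."""
    target = path_for(uri)
    partial = target.with_suffix(".tmp")
    partial.write_text(xml, encoding="utf-8")
    partial.replace(target)  # rename, so the file appears all at once


def static_get(uri):
    """What the web server does for a static file: 200 with a body, or 404."""
    target = path_for(uri)
    if not target.exists():
        return 404, None
    return 200, target.read_text(encoding="utf-8")


uri = accept_request()
before = static_get(uri)
write_response(uri, "<result>ok</result>")
after = static_get(uri)
print(before, after)
```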
We’re still using HTTP connections for the polling but that’s something whose effect can be mitigated to a certain degree.
Timed REST-full Polling
Since various requests can take varying amounts of time to process, it’s difficult to know at what rate the browser should poll. So, why don’t we have the web service tell it? As a part of the response to the original web service call, instead of just returning a URI, we could also return the polling interval - 1 second, 5 seconds, whatever is appropriate for the type of request. This value could easily be configurable [RequestType, PollingInterval].
An even more advanced solution would allow you to change these values dynamically. The advantage that would be gained would be that your operations team could better manage the load on your servers. When a large number of users are hitting your system, you could decrease the rate at which your servers would be polled, thus leaving more HTTP connections for other users.
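A sketch of the [RequestType, PollingInterval] idea follows. The request types, default value, and function names are invented for illustration. The table is an ordinary dictionary so that operations can change an interval while the system is running.

```python
import uuid

# [RequestType, PollingInterval] in seconds; illustrative values only.
polling_intervals = {"quick_lookup": 1, "monthly_report": 5}
DEFAULT_INTERVAL_SECONDS = 2


def accept_timed_request(request_type):
    """Initial web service reply: where to poll, and how often."""
    return {
        "uri": f"http://www.acme.com/responses/{uuid.uuid4()}.xml",
        "poll_every_seconds": polling_intervals.get(request_type, DEFAULT_INTERVAL_SECONDS),
    }


def set_polling_interval(request_type, seconds):
    """Operations hook: slow down (or speed up) polling for one request type."""
    if seconds <= 0:
        raise ValueError("polling interval must be positive")
    polling_intervals[request_type] = seconds


normal = accept_timed_request("monthly_report")["poll_every_seconds"]
set_polling_interval("monthly_report", 15)  # heavy load: ask browsers to back off
under_load = accept_timed_request("monthly_report")["poll_every_seconds"]
print(normal, under_load)
```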
Scaling and Adaptive Polling
You’d probably also want to scale out the number of processing nodes behind your queue. The nice thing is that you could change the polling interval as you scale the various processing nodes per request type providing better responsiveness for the more critical requests. Once we add virtualization, things get really fun:
We had separate queues per request type, so that we could easily see the load we were under for each type of request. That way, we could scale out the processing nodes per request type as well as change the polling interval. By virtualizing our processing nodes, and writing scripts to monitor queue sizes, we had those scripts automatically provisioning (and de-provisioning) nodes as well as changing the polling interval of the browsers.
This had the enormous benefit of the system automatically shifting resources to provide the appropriate relative allocation for the current load as its macroscopic make-up changed.
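The scripts themselves aren't shown here, so what follows is only a guess at the kind of rule such a script could apply per request type: size the node pool so the current queue drains within a target time, then set the polling interval to roughly the expected wait. The formulas, limits, and names are all assumptions.

```python
import math


def plan_nodes(queue_depth, per_node_rate, target_drain_seconds, min_nodes=1, max_nodes=20):
    """Nodes needed to drain the current backlog within the target time."""
    needed = math.ceil(queue_depth / (per_node_rate * target_drain_seconds))
    return max(min_nodes, min(max_nodes, needed))


def plan_polling_interval(queue_depth, nodes, per_node_rate, min_interval=1, max_interval=30):
    """Poll about as often as a newly queued request can be expected to finish."""
    expected_wait_seconds = queue_depth / (nodes * per_node_rate)
    return max(min_interval, min(max_interval, math.ceil(expected_wait_seconds)))


# Busy queue: 12,000 waiting messages, each node handles 100 per second.
busy_nodes = plan_nodes(12_000, 100, target_drain_seconds=10)
busy_interval = plan_polling_interval(12_000, busy_nodes, 100)

# Quiet queue: almost nothing waiting, so scale in and poll quickly.
quiet_nodes = plan_nodes(50, 100, target_drain_seconds=10)
quiet_interval = plan_polling_interval(50, quiet_nodes, 100)
print(busy_nodes, busy_interval, quiet_nodes, quiet_interval)
```

Running one such plan per queue is what lets each request type get its own share of nodes and its own polling rate.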
Summary
Will was well-pleased with the solution which, although more complicated than what he had originally tried, was flexible enough to meet his needs. As opposed to pure server-based solutions, here we make more use of the browser (writing our own JavaScript) instead of putting our faith in some Ajax-y library. That’s not to say that you couldn’t wrap this up into a library - in essence, it is a kind of messaging transport for browser to server communication allowing duplex conversations.
In fact, what could be done is to return multiple responses to the browser over a long period of time. In the response that comes back to the browser could be an additional URI where the next response will be. This can be used for reporting the status of a long running process, paging results, and in many other scenarios.
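A sketch of that chaining, with a dictionary standing in for the published resources. The response shape (a status plus a "next" URI) and the function name are assumptions made for the example.

```python
def follow_responses(first_uri, fetch):
    """Read responses in order until one is missing or the chain ends.

    Returns the statuses seen and the URI to resume polling at (None when done).
    """
    statuses = []
    uri = first_uri
    while uri is not None:
        response = fetch(uri)
        if response is None:  # not published yet: a browser would sleep and retry
            break
        statuses.append(response["status"])
        uri = response["next"]
    return statuses, uri


published = {
    "/responses/job-1/0.xml": {"status": "25% complete", "next": "/responses/job-1/1.xml"},
    "/responses/job-1/1.xml": {"status": "75% complete", "next": "/responses/job-1/2.xml"},
}
statuses, resume_at = follow_responses("/responses/job-1/0.xml", published.get)

# Later, the final response shows up where the previous one said it would.
published[resume_at] = {"status": "done", "next": None}
final_statuses, final_uri = follow_responses(resume_at, published.get)
print(statuses, resume_at, final_statuses, final_uri)
```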
And, one parting thought, could this not be used for all browser to web service communication?
Posted in Architecture, Messaging, Scalability, Web Services | 20 Comments »
Thursday, July 17th, 2008
I’ve received some great feedback on my MSDN article and some really great questions that I think more people are wondering about, so I think I’ll try to do a post per question and see how that goes.
Libor asks:
“Would you recommend using durable messaging for systems where there are similar requirements with respect to data reliability as you had – ie. not losing any messages? If so, then why didn’t the final version of your solution use it? If not, can you explain why?”
The answer is, as always, it depends, but here’s what it depends on:
When designing a system, we need to take a good, hard look at how we manage state, and what properties that state has. In a system of reasonable size we can expect various families of state with respect to their business value, data volatility, and fault-tolerance window. Each family needs to be treated differently. While durable messaging may be suitable for one, it may be overkill or underkill for another.
So, here’s what we’re going to be looking at:
- Business Value
- Data Volatility
- Fault-Tolerance Window
Business Value
When talking about business value, I want to talk about what “not losing any messages” means. The question is under what conditions the messages will not be lost, or rather, what the threshold conditions are where messages may start getting lost. If all our datacenters are nuked, we will lose data. It’s likely the business is OK with that (as much as can be expected under those circumstances). If a single server goes down, it’s likely the business would not be OK with losing messages containing financial data. However if a message requesting the health of a server were to get lost under those same conditions, that would probably be alright. In other words, what does that message represent in business terms?
Data Volatility
Data volatility also has an impact. Let’s say that we’re building a financial trading system. The time between receiving an event (message) saying that the price of a certain financial instrument has changed and sending the message requesting to buy that security is critical. Let’s say that has to be done in under 10ms. Now, some failure has occurred preventing our message from reaching its destination for 20ms. What should we do with that message? Should we keep it around, making sure it doesn’t get lost? Not in this domain. On the contrary, that message should be thrown away as its “business lifetime” has been exceeded. Furthermore, even during that original period of 10ms, the use of durable messaging may make it close to impossible to maintain our response times.
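In code, a “business lifetime” is little more than a timestamp and a comparison. This is a minimal sketch; the class and field names are made up, and times are passed in explicitly so the example doesn't depend on a real clock.

```python
from dataclasses import dataclass


@dataclass
class Message:
    body: str
    created_at_ms: float
    lifetime_ms: float  # the "business lifetime" of this message


def should_deliver(message, now_ms):
    """Deliver only while the message still means something to the business."""
    return now_ms - message.created_at_ms <= message.lifetime_ms


buy_order = Message("BUY 100", created_at_ms=0.0, lifetime_ms=10.0)
print(should_deliver(buy_order, now_ms=5.0))   # still within its 10ms lifetime
print(should_deliver(buy_order, now_ms=20.0))  # delayed 20ms: throw it away
```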
Fault-Tolerance Window
These two topics feed into the third and more architectural one - fault-tolerance window: for what period of time do we require fault tolerance, and with respect to how many (and what kind of) faults? This will lead us into an analysis of how many machines we need to copy a message to before we release the calling thread. We’d also look at which datacenters those machines reside in. This will also impact (or be impacted by) the kinds of links we have to these datacenters if we want to maintain response times. These numbers will need to change when the system identifies a disaster - degrading itself to a lower level of fault-tolerance after a hurricane knocks out a datacenter, and returning to normal once it comes back up.
Re-Evaluating Durable Messaging
Durable messaging may be used at various points in each part of the solution, but we need to look at message size, the rate at which those messages are being written to disk, how fast the disk is, how much available disk we have (so we don’t make things worse in the case of degraded service), etc. Companies like Amazon also take into account disk failure rates, replacement rates (disks aren’t replaced immediately you know), and many other factors when making these decisions.
Summary
Our job as architects when designing the system is to find that cost-benefit balance for the various parts of the system according to these very applicative parameters. No, it’s not easy. No, cloud computing will not magically solve all of this for us. But we are getting more technical tools to work with, operations staff are getting better at working with us in the design phase, and our thought processes are becoming more rigorous in dealing with the scary conditions of the real world.
To your question, Libor, as to why we didn’t eventually use durable messaging in our solution, the answer is that we solved the overall state management problem by setting up an applicative protocol with our partners which was resilient in the face of faults by using idempotent messages that could be resent as many times as necessary. You can read more about it here. This solution isn’t viable for other kinds of interactions but was just what we needed to get the job done.
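The general idea of idempotent, resendable messages can be sketched as follows. This is not the protocol from the article, just an illustration with invented names: the message sets state keyed by an ID, so applying it three times leaves the receiver exactly where applying it once would.

```python
class OrderReceiver:
    """Receiver whose handler is idempotent: it sets state, it doesn't add to it."""

    def __init__(self):
        self.orders = {}

    def handle(self, message):
        self.orders[message["order_id"]] = message["quantity"]


def lossy_channel(receiver, lost_acks):
    """Channel that delivers every message but loses the first few acknowledgements."""
    remaining = [lost_acks]

    def deliver(message):
        receiver.handle(message)  # the receiver did process the message...
        if remaining[0] > 0:
            remaining[0] -= 1
            raise ConnectionError("acknowledgement lost")  # ...but the sender can't tell

    return deliver


def send_with_retry(deliver, message, max_attempts):
    """Resend until acknowledged; safe only because handling is idempotent."""
    for attempt in range(1, max_attempts + 1):
        try:
            deliver(message)
            return attempt
        except ConnectionError:
            continue
    raise ConnectionError("gave up after %d attempts" % max_attempts)


receiver = OrderReceiver()
attempts = send_with_retry(lossy_channel(receiver, lost_acks=2),
                           {"order_id": 7, "quantity": 3}, max_attempts=5)
print(attempts, receiver.orders)
```

Had the handler added the quantity to a running total instead, the same three deliveries would have tripled the order.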
Hope that helps.
Posted in Architecture, Availability, Messaging, Performance, Reliability, Scalability | 4 Comments »
Thursday, June 19th, 2008
For those people who couldn’t come to TechEd USA and didn’t see my talks on how to build highly scalable web architectures, you’re in luck - Craig, the man behind the Polymorphic Podcast, sat down with me and we chatted about the problems, common solutions, and effective tactics in this space. For those of you who were at TechEd and still didn’t come to my talk - what were you thinking?!
Check it out.
Some of this stuff is a bit counter-intuitive (and not readily supported by the tools available in Visual Studio) so please, do feel free to ask questions (in the comments below).
Posted in Architecture, Caching, Messaging, Pub/Sub, Scalability, Web Services | No Comments »
Wednesday, May 21st, 2008
I’ve gotten this question several times already but now companies are beginning to look for performance comparisons in making decisions around the use of nServiceBus. It’s often compared to straight WCF, BizTalk, and now Neuron ESB. In Sam’s recent post he points to a case study of Neuron doing 28 million messages an hour. That’s far more than I’ve ever heard quoted for BizTalk.
Disclaimer
Before giving some numbers, please keep in mind that high performance of system infrastructure does not necessarily by itself mean that the system above it is running that fast. For instance, you may have server heartbeats running really quickly but the time it takes to save a purchase order borders on a minute. So, please, take all benchmarks with a grain of salt, or two, or a whole shaker-full.
While I’m not at liberty to say on which specific domain/company these numbers were measured, I can say that we had the full gamut of “stateless services”, stateful services (sagas), number crunching, large data sets, many users, complex visualization, etc. Also, this wasn’t the largest installation of nServiceBus that I’m aware of, but it’s the one I have the most specific numbers for.
Setup
OK, so using the default nServiceBus distribution using MSMQ, on servers where the queue files themselves were on separate SCSI RAID disks, we were pumping around 1000 durable, transactionally processed messages per second, per server. That means that similar to the Neuron case, no messages would be lost in the case of a single fault per server per window (time to replace a failed disk set at 3 hours from failure, through detection, to replacement per site - but that’s more an operational staffing concern, not the technology itself).
So, that’s 3.6 million messages per hour per server, at full load. We had a total of 98 servers doing these kinds of processing, not including web servers, databases, etc. Keep in mind that web servers would be communicating with other servers using nServiceBus, but that would maybe be an unfair comparison to the Neuron numbers.
Server Breakdown
Anyway, the 48 number crunching servers (blade centers) we had were at full load, so we were pumping more than 170 million messages an hour there. Keep in mind that those servers had a really fast backbone so weren’t held up by IO. Your environment may be different.
Another 30 (regular pizza boxes) were doing our sagas. Saga state was stored in a distributed in-memory “cache”, so once again IO wasn’t an issue for processing those messages. We were at about 70% utilization there, coming to just over 100 million messages an hour.
The last 20 were clustered boxes (fairly expensive) that handled the various nServiceBus distributor and timeout manager processes. They were at full load, since they handled control messages for all the servers as well as dynamically routing the load. However, on those boxes we used much higher performance disks for the messages, since they had to feed everything else, capable of doing, on average, around 5000 messages a second. That adds up to 360 million messages an hour.
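For anyone who wants to redo the multiplication for their own environment, here is the arithmetic behind the per-server, number crunching, and distributor figures. The helper name is illustrative. The saga tier is left out because its per-server rate isn't stated above.

```python
SECONDS_PER_HOUR = 3600


def messages_per_hour(servers, messages_per_second_per_server):
    """Hourly throughput of a tier running at full load."""
    return servers * messages_per_second_per_server * SECONDS_PER_HOUR


per_server = messages_per_hour(1, 1000)       # one default MSMQ server
number_crunching = messages_per_hour(48, 1000)  # the 48 blade servers
distributors = messages_per_hour(20, 5000)      # the 20 clustered boxes
print(per_server, number_crunching, distributors)
```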
Unnecessary Durability
Later, we moved a bunch of messages that didn’t need all that durability and transactionality off the disks, pushing the total throughput over 1 billion messages an hour. That was about 100 million per hour durable, 900 million per hour non-durable. You can guess that we were left with plenty of IO to spare at that point while we weren’t yet pushing the limit of our memory.
One thing that’s important to understand is the size of the messages that didn’t require durability was less than 1MB, with most weighing in under 10KB. Also, since most of those messages were published, less state management was required around them, enabling us to further improve performance.
Summary
NServiceBus didn’t give us all that by itself. It was the result of skilled architects, developers, and operations staff working together for many iterations, deploying, monitoring, re-designing, etc. You need to understand your technology, your hardware, and your specific performance, availability, and fault-tolerance requirements if you want to get anywhere.
There’s no magic.
I didn’t see the number or kinds of servers involved in the Neuron case study so this wasn’t ever really a comparison. Nor are we talking about the same system here.
So, please, don’t base your decisions on arbitrary numbers. Spend some time setting up a scaled down version of your target architecture with all the relevant technologies and measure. Be aware that you want high performance end to end, not just of the messaging part. At times, it makes sense to actively throw away messages (of the non-durable, published kind) to help a server come online faster especially after a restart.
Thus ends the tale of another “benchmark”.
Posted in Architecture, ESB, MSMQ, Messaging, NServiceBus, Performance, Scalability | 3 Comments »
Thursday, April 10th, 2008
I’ve published a new article on performance and scalability on InfoQ:
Spectacular Scalability with Smart Service Contracts
In this article, I attempt to debunk some of the myths around stateless-ness as the key to scalability.
Here’s how it starts:
It was a sunny day in June 2005 and our spirits were high as we watched the new ordering system we’d worked on for the past 2 years go live in our production environment. Our partners began sending us orders and our monitoring system showed us that everything looked good. After an hour or so, our COO sent out an email to our strategic partners letting them know that they should send their orders to the new system. 5 minutes later, one server went down. A minute after that, 2 more went down. Partners started calling in. We knew that we wouldn’t be seeing any of that sun for a while.
The system that was supposed to increase the profitability of orders from strategic partners crumbled. The then seething COO emailed the strategic partners again, this time to ask them to return to the old system. The weird thing was that although we had servers to spare, just a few orders from a strategic customer could bring a server to its knees. The system could scale to large numbers of regular partners, but couldn’t handle even a few strategic partners.
This is the story of what we did wrong, what we did to fix it, and how it all worked out.
Continue reading…
Posted in Architecture, Articles, ESB, Messaging, Performance, Pub/Sub, Scalability | 1 Comment »
Sunday, March 30th, 2008
Ayende’s been going over nServiceBus, seeing how it’s built, and raising various questions and concerns. I’ll begin by taking them from the outside, in - that is, first API questions, and then internal structure issues.
SendLocal
First of all, calling SendLocal on IBus takes all the logical messages passed in (params IMessage[] messages), wraps them in a single TransportMessage, and puts that physical message at the end of the local queue. This call is equivalent to calling “Send(TransportMessage m, string destination);” on ITransport, passing in transport.Address as the destination parameter.
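Here is a toy model of that equivalence. It is written in Python purely for illustration and is not the nServiceBus API: the Transport and Bus classes, the in-memory queues dictionary, and the address string are all assumptions of the sketch.

```python
from collections import deque
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TransportMessage:
    body: Tuple[object, ...]  # all logical messages travel in one physical message


class Transport:
    """Stand-in for a transport; queues are just in-memory deques keyed by address."""

    def __init__(self, address, queues):
        self.address = address
        self._queues = queues

    def send(self, message, destination):
        self._queues.setdefault(destination, deque()).append(message)


class Bus:
    def __init__(self, transport):
        self.transport = transport

    def send_local(self, *messages):
        # Wrap the logical messages in a single physical message...
        physical = TransportMessage(body=tuple(messages))
        # ...and send it to our own address, i.e. the end of the local queue.
        self.transport.send(physical, self.transport.address)


queues = {}
bus = Bus(Transport("orders@server1", queues))
bus.send_local("ReserveStock", "ChargeCard")
local_queue = queues["orders@server1"]
print(len(local_queue), local_queue[-1].body)  # 1 ('ReserveStock', 'ChargeCard')
```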
There are numerous advantages to having this method, but one is the most important.
When a client sends a service a set of messages using “void Send(params IMessage[] messages);”, the client is requesting that the server treat this batch of messages as a unit of work. Under certain conditions, the service may choose to ignore the client’s wishes - not least because the client has sent a ton of messages and the service doesn’t want ACID transactions to last a long time, as they hurt throughput. In this case the server would use an intercepting message handler to go over those messages and call SendLocal for each. In other words, the server can set up units of work as it sees fit - taking into account client preference as well.
Other advantages include the ability to break apart complex or long-running logic into an “internal pipeline”. The Timeout Manager also makes use of this facility for “holding onto” messages until some condition occurs.
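The batch-splitting idea above can be sketched with a small in-memory model. This is an illustration in Python, not nServiceBus code: the Bus class, the MAX_BATCH threshold, and the list that stands in for committed transactions are all hypothetical.

```python
from collections import deque

MAX_BATCH = 2  # hypothetical limit on how many messages the server accepts per unit of work


class Bus:
    def __init__(self):
        self.local_queue = deque()  # each entry is one physical message (a tuple of logical ones)

    def send_local(self, *messages):
        self.local_queue.append(tuple(messages))


def process(bus, units_of_work):
    while bus.local_queue:
        batch = bus.local_queue.popleft()
        if len(batch) > MAX_BATCH:
            # Intercepting step: too big for one transaction, so re-queue each
            # logical message on its own and stop handling this batch.
            for message in batch:
                bus.send_local(message)
            continue
        units_of_work.append(batch)  # stands in for handling the batch in one transaction


bus = Bus()
bus.send_local("a", "b", "c")  # the client asked for one unit of work
bus.send_local("d", "e")
units_of_work = []
process(bus, units_of_work)
print(units_of_work)  # [('d', 'e'), ('a',), ('b',), ('c',)]
```

Note that the oversized batch ends up behind the smaller one, since the re-queued messages go to the end of the local queue.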
Return(errorCode)
The reason that integers are used as error codes is just so that you can push enums through them. This is the simplest way to get errors back to the client. More importantly, we take into account who on the client would be interested in this data.
Clients are often built using MVC with an additional Service Agent layer. Service Agents deal with translating the intent of Controllers into messages. Controllers don’t know about messaging, nor should they. However, they need to know when something fails with calls they initiated. As such, they are the final consumer of these error-code-enums, and integers are used to express them; that way Controllers don’t need to take a dependency on nServiceBus.
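A minimal sketch of that layering, in Python rather than C# and with made-up names throughout (OrderError, OrderServiceAgent, OrderController are hypothetical): the service returns a plain integer, the Service Agent turns it back into the enum, and the Controller only ever sees the enum.

```python
from enum import IntEnum


class OrderError(IntEnum):
    """Shared error codes; hypothetical values for illustration."""
    NONE = 0
    OUT_OF_STOCK = 1
    CREDIT_DECLINED = 2


def service_return(error):
    # Server side: the enum goes over the wire as a plain integer.
    return int(error)


class OrderController:
    """Knows nothing about messaging; it only understands OrderError."""

    def __init__(self):
        self.last_error = None

    def on_order_completed(self, error):
        self.last_error = error


class OrderServiceAgent:
    """Translates between messaging-level integers and the controller's enum."""

    def __init__(self, controller):
        self._controller = controller

    def handle_return(self, error_code):
        self._controller.on_order_completed(OrderError(error_code))


controller = OrderController()
agent = OrderServiceAgent(controller)
agent.handle_return(service_return(OrderError.CREDIT_DECLINED))
print(controller.last_error.name)  # CREDIT_DECLINED
```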
DoNotContinueDispatchingCurrentMessageToHandlers
This method on the bus is used by intercepting message handlers in order to instruct the bus not to pass the current message on to subsequent handlers in the pipeline. This is often used by authentication and authorization handlers when those checks fail. This is what makes the message handling pipeline possible.
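As a rough illustration of the pipeline behaviour, here is a toy dispatcher in Python. It only mimics the idea; the Bus class, the handler functions, and the ALLOWED_USERS set are assumptions of the sketch, not the real implementation.

```python
class Bus:
    def __init__(self, handlers):
        self._handlers = list(handlers)  # order matters: interceptors go first
        self._continue = True

    def do_not_continue_dispatching_current_message_to_handlers(self):
        self._continue = False

    def dispatch(self, message):
        self._continue = True  # reset for every message
        for handler in self._handlers:
            handler(self, message)
            if not self._continue:
                break  # a handler asked us to stop the pipeline


ALLOWED_USERS = {"alice"}  # hypothetical authorization data
placed_orders = []


def authorize(bus, message):
    if message["user"] not in ALLOWED_USERS:
        bus.do_not_continue_dispatching_current_message_to_handlers()


def place_order(bus, message):
    placed_orders.append(message["order"])


bus = Bus([authorize, place_order])
bus.dispatch({"user": "alice", "order": "book"})
bus.dispatch({"user": "mallory", "order": "yacht"})
print(placed_orders)  # ['book']
```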
BuildAndDispatch
This method is defined on IBuilder and is used by the bus when dispatching messages to handlers. The reason that this exists instead of just having the bus ask the builder to create the handler and dispatch the call itself has to do with client-side threading. You can find the full explanation here - Object Builder, the place to fix system-wide threading bugs.
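The linked post has the actual reasoning. One plausible reading, and it is only my assumption here, is that if the builder owns the dispatch call, a client-side builder can marshal that call onto a specific thread (a UI thread, say) without the bus knowing. The Python sketch below models that with a single-threaded executor standing in for the UI thread; all class and function names are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Builder:
    """Default behaviour: create the handler and invoke it on the calling thread."""

    def build_and_dispatch(self, handler_type, action):
        action(handler_type())


class UiThreadBuilder(Builder):
    """Client-side variant: runs the dispatch on a designated thread instead."""

    def __init__(self, ui_executor):
        self._ui = ui_executor

    def build_and_dispatch(self, handler_type, action):
        # Block until the designated thread has finished handling the message.
        self._ui.submit(lambda: action(handler_type())).result()


class OrderAcceptedHandler:
    ran_on = None

    def handle(self, message):
        OrderAcceptedHandler.ran_on = threading.current_thread().name


def bus_dispatch(builder, handler_type, message):
    # The bus never creates or calls the handler itself; it delegates both steps.
    builder.build_and_dispatch(handler_type, lambda handler: handler.handle(message))


with ThreadPoolExecutor(max_workers=1, thread_name_prefix="ui") as ui_executor:
    bus_dispatch(UiThreadBuilder(ui_executor), OrderAcceptedHandler, "OrderAccepted")

print(OrderAcceptedHandler.ran_on.startswith("ui"))  # True
```

The point of the design is that threading policy lives in one place, the builder, so swapping it changes where every handler runs.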
Summary
NServiceBus has grown over the years in environments where I’ve had the luxury of deciding most, if not all, of the design of the systems involved. As such, it has taken on just the responsibilities needed from infrastructure in order to develop robust, flexible, and scalable systems. Check out the nServiceBus site.
Posted in Architecture, Messaging, NServiceBus | No Comments »
Friday, March 28th, 2008
Ted says it really well, and let me add a big +1.
Note to those who didn’t attend the session: you didn’t hear me say it, so I’ll repeat it: I hate WSDL almost as much as I hate Las Vegas. Ask me why sometime, or if I get enough of a critical mass of questions, I’ll blog it. If you’ve seen me do talks on Web Services, though, you’ve probably heard the rant: WSDL creates tightly-coupled endpoints precisely where loose coupling is necessary, WSDL encourages schema definitions that are inflexible and unevolvable, and WSDL intrinsically assumes a synchronous client-server invocation model that doesn’t really meet the scalability or feature needs of the modern enterprise. And that’s just for starters.
I hate WSDL.
I still hate Vegas more, though.
Web Services, and WSDL by association, have taken hold of the industry like cancer - inhibiting the minds of otherwise intelligent developers and architects. Whenever I get the “Web Services Question” (Does X support Web Services - where X is some design pattern, tool, and sometimes nServiceBus), I have to suppress an urge to groan - I’ve gotten the question that many times. The other day I was at a client and Sam, their head architect, asked me that question. I gave my stock response:
“When you say ‘Web Services’, are you referring to SOAP or WSDL, and is HTTP a necessary component too?”
See how good I got at the suppressing thing?
Sam conceded that Web Services over TCP is OK too, so I pressed on with:
“What about UDP? FTP? MSMQ? Is it still ‘Web Services’ then? Is the rule then that ‘Web Services’ == SOAP?”
At that point, Sam was beginning to get a little flustered.
“And what’s so great about SOAP? Is it the interoperability? Because that’s just because it’s based on XSD.”
He didn’t know how to reply. Instead, he walked away from the whiteboard and sat down. I didn’t let up:
“And what if we want to do something other than Request/Response? How about one request with many responses? How about many requests and one response? And why does this decision need to be rigid? Shouldn’t we just be able to decide programmatically how many responses we want to return? Wouldn’t that flexibility be better than creating huge response structures for web methods to return?”
Sam made his last stand:
“Look, we can’t go and do something different from the rest of the industry. Everybody else is doing Web Services. It’s not like the technology doesn’t work.”
I gave way, a little:
“If you want, we can offer two interfaces. One, the flexible, robust, scalable XSD over messaging based solution. The second, an icky, synchronous Web Services facade which calls into our first interface.
I’m not saying that the technology doesn’t work - but both of us know that every problem has multiple solutions: some are fragile and error prone, like WS; others are more elegant and have decades of knowledge behind them, like messaging.
But we can do both if you like. How’s that?”
And it was agreed. The entire system would be built on one-way messaging patterns using XSD in cases where interoperability was required. And WS would be layered on, like a tiny little pig on top of a gigantic lipstick … thing - hmm, that metaphor isn’t really working - well, you get the idea.
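For the curious, the shape of that arrangement can be sketched in a few lines. This is a toy Python model with invented names (MessagingEndpoint, SynchronousFacade, the "accepted" replies), not what we built for Sam: the messaging endpoint is one-way and may emit any number of replies per request, and the facade simply correlates them and hands them back as one synchronous result.

```python
import itertools
from collections import deque


class MessagingEndpoint:
    """One-way interface: accepts requests, emits zero or more replies later."""

    def __init__(self):
        self.inbox = deque()
        self.replies = deque()

    def send(self, correlation_id, request):
        self.inbox.append((correlation_id, request))  # fire and forget

    def pump(self):
        # Handle queued requests; each one may produce several replies.
        while self.inbox:
            correlation_id, request = self.inbox.popleft()
            for item in request["items"]:
                self.replies.append((correlation_id, f"{item}: accepted"))


class SynchronousFacade:
    """Request/response wrapper layered on top of the one-way interface."""

    def __init__(self, endpoint):
        self._endpoint = endpoint
        self._ids = itertools.count(1)

    def place_order(self, items):
        correlation_id = next(self._ids)
        self._endpoint.send(correlation_id, {"items": items})
        # In this sketch we pump in-process; a real endpoint would run on its own
        # and the facade would wait for the correlated replies to arrive.
        self._endpoint.pump()
        return [reply for cid, reply in self._endpoint.replies if cid == correlation_id]


facade = SynchronousFacade(MessagingEndpoint())
result = facade.place_order(["book", "pen"])
print(result)  # ['book: accepted', 'pen: accepted']
```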
I hate WSDL. Never been to Vegas, though.
Posted in Architecture, ESB, Messaging, NServiceBus, Pub/Sub, Web Services | 2 Comments »
Sunday, March 16th, 2008
In this podcast we revisit the topic of REST and how to make it work for process-centric enterprise systems. After describing the basic advantages and pitfalls of plain resource thinking, we’ll look at how mapping messaging concepts to resources provides solutions for transactional, multi-resource processing.
Download
Download via the Dr. Dobb’s site
Or download directly here.
Additional References
Want more?
Check out the “Ask Udi” archives.
Got a question?
Send Udi your question to answer on the show.
Posted in Ask Udi Podcast, ESB, Messaging, Pub/Sub, REST, SOA, Web Services | 9 Comments »