The question of how web-based (or 3rd party) consumers can work with pub/sub based services comes up a lot.
Many developers are used to implementing web services that expose methods like GetAllCustomers.
When moving to pub/sub and other more loosely coupled messaging patterns, developers look to implement the same pattern, opting for something like a duplex pair of GetCustomersRequest and GetCustomersResponse messages. The reasoning is simple and straightforward - it is difficult to push data over the web to consumers.
However, there are still ways to decouple the preparation of the data from its usage, thus gaining many of the advantages of pub/sub.
By employing REST principles and modelling our customer list as an explicit resource, web-based consumers would simply perform regular HTTP GET operations on the URI to get the list of customers.
The resource itself could be a simple XML file - it wouldn’t need to be dynamic at all.
You can get all the scalability benefits of pub/sub for web-based consumers. All you need is a bit of REST.
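Here is a minimal sketch of that idea. It assumes a subscriber receives hypothetical CustomerUpdated events, modelled as plain dicts, and regenerates a static customers.xml file that any ordinary web server can serve as-is. The names `on_customer_updated` and `render_customer_list` are illustrative only:

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical local view of the customer list, kept current by events.
customers: dict[str, str] = {}


def render_customer_list(path: Path) -> None:
    """Write the whole customer list as a static XML resource."""
    root = ET.Element("customers")
    for customer_id, name in sorted(customers.items()):
        ET.SubElement(root, "customer", id=customer_id).text = name
    # Write to a temp file first, then replace the published file in one step,
    # so an HTTP GET never sees a half-written document.
    tmp = path.with_suffix(".tmp")
    ET.ElementTree(root).write(str(tmp), encoding="utf-8", xml_declaration=True)
    tmp.replace(path)


def on_customer_updated(event: dict, path: Path) -> None:
    """Handler for a hypothetical CustomerUpdated event."""
    customers[event["customer_id"]] = event["name"]
    render_customer_list(path)
```

All the work happens when data changes, not when it is requested. Web consumers simply issue GET requests for the file, so the publisher does no per-request processing.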
According to the architectural principle of fully self-contained messages, events “can - instantly and in future - be interpreted as the respective event without the need to rely on additional data stores that would need to be in time-sync with the event during message-processing.”
Also, “passing reference data in a message makes the message-consuming systems dependent on the knowledge and availability of actual persistent data that is stored “somewhere”. This data must separately be accessed for the sake of understanding the event that is represented by the message.”
If everything in a system can access a central datastore, it is enough for one party to publish an event containing only the ID of an entity that it previously entered/updated. Upon receiving that event, a subscriber would go to the central datastore and get the fields it’s interested in for that ID. The advantage of this approach is that only the minimal amount of data crosses the network, as subscribers retrieve just the fields that interest them. Martin Fowler describes the disadvantages as:
“An integration database needs a schema that takes all its client applications into account. The resulting schema is either more general, more complex or both. The database usually is controlled by a separate group to the applications and database changes are more complex because they have to be negotiated between the database group and the various applications.”
This is far from being aligned with the principle of autonomy so important to SOA. In that respect, the architectural principle of self-contained messages points us away from those problems and towards more autonomous services.
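The following minimal sketch shows the ID-only approach and where its coupling lives. An in-memory SQLite database stands in for the shared integration database. The CustomerUpdated event and the table layout are hypothetical:

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerUpdated:
    customer_id: int  # only the ID crosses the network


# The shared "integration database" that every party can reach (in-memory for the sketch).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, credit_limit REAL)")


def publisher_update(customer_id: int, name: str, credit_limit: float) -> CustomerUpdated:
    """The publisher writes to the shared store, then announces only the ID."""
    db.execute("INSERT OR REPLACE INTO customers VALUES (?, ?, ?)",
               (customer_id, name, credit_limit))
    db.commit()
    return CustomerUpdated(customer_id)


def billing_subscriber(event: CustomerUpdated) -> float:
    # To understand the event, the subscriber must know the shared schema and rely on the
    # database being available and in sync, which is exactly the coupling Fowler describes.
    row = db.execute("SELECT credit_limit FROM customers WHERE id = ?",
                     (event.customer_id,)).fetchone()
    return row[0]
```

The message itself is tiny, but every subscriber now depends on one schema owned by somebody else.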
However, once we have these autonomous business services in place, we may find that we don’t need 100% fully self-contained messages anymore.
A Real-World Example
Let’s say we have three business services: Sales, Fulfillment, and Billing.
Sales publishes an OrderAccepted event when it accepts an order. That event contains all the order information.
Both Fulfillment and Billing are subscribed to this event, and thus receive it.
Fulfillment does not ship products to the customer until the customer has been billed, so it just stores the order information internally, and is done.
Billing starts the process of billing the customer for their order, possibly joining several orders into a single bill. After completing this process, it publishes a CustomerBilled event containing all billing information, as well as the IDs of the orders in that bill. It does not put all the order information in that event, as it is not the authoritative owner of that data.
When Fulfillment receives the CustomerBilled event, it uses the IDs of the orders contained in the event to find the order information it previously stored internally. It does not need to call the Sales service for this information or contact some central Master Data Management system. It uses the data it has, and goes about fulfilling the orders and shipping the products to the customer, finally publishing its own OrderShipped event.
Notice, as well, that the original OrderAccepted event contained the IDs of the products the customer ordered. These product IDs originated from another service, Merchandising, which is responsible for the product catalog. The same can be said for the customer ID, which originated from yet another service - Customer Care.
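The Fulfillment side of this example can be sketched as follows. Events are modelled as plain dicts, and field names like `order_ids` and `address` are hypothetical. The point is that CustomerBilled carries only order IDs, and Fulfillment resolves them against data it cached from OrderAccepted:

```python
from dataclasses import dataclass, field


@dataclass
class Fulfillment:
    # Order details cached from OrderAccepted, keyed by order ID (Fulfillment's own store).
    orders: dict = field(default_factory=dict)

    def on_order_accepted(self, event: dict) -> None:
        # The customer hasn't been billed yet: just remember the order and stop.
        self.orders[event["order_id"]] = event

    def on_customer_billed(self, event: dict) -> list[dict]:
        """Correlate billed order IDs with cached orders and return OrderShipped events."""
        shipped = []
        for order_id in event["order_ids"]:
            order = self.orders[order_id]  # local lookup: no call to Sales, no shared MDM
            shipped.append({"type": "OrderShipped", "order_id": order_id,
                            "address": order["address"]})
        return shipped
```

This sketch assumes OrderAccepted always arrives first. A missing order raises `KeyError`, which is exactly the situation discussed in the next section.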
The Issue of Time
One could argue that since subscribers use previously cached data when processing new events, that data might not be up to date. Also, we may have race conditions between our services. In the above example, if Billing were extremely fast and more highly available than Fulfillment, Billing could have received the OrderAccepted event, processed it, and published the CustomerBilled event before Fulfillment had received the OrderAccepted event. In short, the CustomerBilled and OrderAccepted messages could be out of order in Fulfillment’s queue.
What would Fulfillment do when trying to process the CustomerBilled message when it doesn’t have the order information?
Well, it knows that the world is parallel and non-sequential, so it does NOT return/log an error, but rather puts that message in the back of the queue to be processed again later (or maybe in some other temporary holding area). This enables the OrderAccepted message to be processed before the CustomerBilled message is retried. When the retry occurs, well, everything’s OK – it’s worked itself out over time.
In the case where we retry again and again and things don’t work themselves out (maybe the OrderAccepted event was lost), we move that message off to a different queue for something else to resolve the conflict (maybe a person, maybe software). If/when the conflict is resolved (say, by getting the Sales service or the messaging system to replay the OrderAccepted event), the conflict resolver returns the CustomerBilled message to the queue, and now everything works just fine.
As all of this is occurring, the only thing that’s visible to external parties is that it happens to be taking longer than usual for the OrderShipped event to be published. In other words, time is the only difference.
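Here is a minimal sketch of that retry policy. It assumes an in-memory deque stands in for Fulfillment's queue and that messages are plain dicts. `MAX_ATTEMPTS` and the `attempts` counter are illustrative and not part of any particular messaging product:

```python
from collections import deque

MAX_ATTEMPTS = 5  # assumed retry budget before handing the message off


def process_queue(queue: deque, orders: dict, error_queue: list, shipped: list) -> None:
    """Drain the queue, deferring CustomerBilled messages whose orders aren't known yet."""
    while queue:
        msg = queue.popleft()
        if msg["type"] == "OrderAccepted":
            orders[msg["order_id"]] = msg
        elif msg["type"] == "CustomerBilled":
            attempts = msg.get("attempts", 0) + 1
            if all(order_id in orders for order_id in msg["order_ids"]):
                shipped.extend(msg["order_ids"])
            elif attempts >= MAX_ATTEMPTS:
                # Not an error to log: park it for a person or tool to resolve.
                error_queue.append(msg)
            else:
                # Put it at the back so earlier-published messages get a chance to arrive.
                queue.append({**msg, "attempts": attempts})
```

Once the conflict resolver replays the missing OrderAccepted, it can put the parked message back on the queue with its counter reset, and the same loop processes it normally.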
Summary
The problem of non-self-contained events is mitigated first and foremost by business services in SOA, and the apparent issue of time-synchronization by business logic inside these services.
Don’t be afraid to put IDs in your messages and events.
Do be afraid of using those IDs to access datastores shared by multiple “services”.
Using IDs to correlate current events to data from previous events is not only OK, it’s to be expected.
The architectural principle of fully self-contained messages steers us away from the problems of Integration Databases and towards Application Databases, autonomous services, and a better SOA implementation. From there, following the principle of autonomy from a business perspective leads us to services that don’t publish data owned by other services in their messages, taking us to the next step of our journey to SOA.
One of the most common questions I get on the topic of pub/sub messaging is what happens if a notification is lost. Interestingly enough, there are some who almost entirely write off this pattern because of this issue, preferring the control of request/response-exception. So, what should be done about lost messages? The short answer is durable messaging. The long answer is design.
Durable Messaging
In order to prevent a message from being lost when it is sent from a publisher to a subscriber, the message is written to disk on the publisher side, and then forwarded to the subscriber, where it is also written to disk. This store-and-forward mechanism enables our systems to gracefully recover from either side being temporarily unavailable.
In my MSDN article on this topic, I outlined some problems with this approach. These problems are exacerbated for publishers. Imagine a publisher with 40 subscribers, publishing 10 messages a second, each containing 1MB of XML. If 10 of the subscribers are unavailable, that’s 100MB of data being written to the publisher’s disk every second, 6GB every minute. That’s liable to bring down a publisher before an administrator brews a cup of coffee.
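Spelled out, the backlog figure comes from multiplying the number of unavailable subscribers (each needs its own stored copy) by the publish rate and the message size:

```latex
\[
10\ \text{subscribers} \times 10\,\tfrac{\text{messages}}{\text{s}} \times 1\,\text{MB}
  = 100\,\tfrac{\text{MB}}{\text{s}},
\qquad
100\,\tfrac{\text{MB}}{\text{s}} \times 60\,\text{s} = 6000\,\text{MB} \approx 6\,\text{GB per minute}
\]
```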
Publishers have no choice but to throw away messages after a certain period of time.
Publisher Contracts
The whole issue of contracts and schema is considered one of the better understood parts of SOA. Unfortunately, the operational aspects of service contracts are hardly ever taken into account.
On top of the schema of the messages a service publishes, additional information is needed in the contract:
How big will this message be?
How often will it be published?
How long will this message be stored if a subscriber is unavailable?
The first two pieces of information are important for subscribers to do load and capacity planning. The last one is the most important, as it dictates the required availability and fault-tolerance characteristics of subscribers.
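One way to make those three questions explicit is to publish them as data alongside the schema. This is a minimal sketch, and the `PublisherContract` type and its field names are hypothetical:

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class PublisherContract:
    """Operational terms published alongside a message schema (names are illustrative)."""
    message_type: str
    max_message_size_kb: int    # how big will this message be?
    messages_per_second: float  # how often will it be published?
    retention: timedelta        # how long is it stored if a subscriber is unavailable?

    def subscriber_can_keep_up(self, max_downtime: timedelta) -> bool:
        # A subscriber whose outages can outlast the retention period will lose messages.
        return max_downtime <= self.retention

    def peak_storage_kb(self) -> float:
        # Worst case the publisher holds on behalf of one unavailable subscriber.
        return self.max_message_size_kb * self.messages_per_second * self.retention.total_seconds()
```

A subscriber can check `subscriber_can_keep_up` against its own expected downtime before committing to the online stream.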
For Example
In the canonical retail scenario, when our sales service accepts an order, it publishes an order accepted event. Other services subscribed to this event include shipping, billing, and business intelligence.
While shipping and billing are highly available and able to keep up with the rate at which orders are accepted, the business intelligence service is not. BI has two main parts to it - a nightly batch that does the number crunching, and a UI for reporting off of the results of that number crunching. Some even do the reporting in a semi-offline fashion, emailing reports back to the user when they’re ready.
Furthermore, nobody’s going to invest in servers for making BI highly available.
And wasn’t the whole point of this publish/subscribe messaging to keep our services autonomous? So that not all services have to have the same level of uptime?
Houston, do we have a problem?
Data Freshness
There is a glimmer of light in all this doom and gloom.
Not all services have the same data freshness requirements.
The business intelligence service above doesn’t need to know about orders the second they’re accepted. A daily roll-up would be fine, and an hourly roll-up would bring us that much closer to “real time business intelligence”.
So, while BI is ready to accept the sales message schema, it would like a slightly different contract around it - fewer messages per unit of time, more data in each message.
From the operational perspective of the sales service, it would be cost-effective to have fewer “online” subscribers. It could even take things a few steps further. Instead of using the regular messaging backbone for transmitting these hourly messages, it could use FTP. The data could even be zipped to take up even less space. Since the total data size is smaller than that of the corresponding online stream, the data is kept on cheaper, larger storage, and the number of subscribers to this zipped, hourly update is fairly small, these messages can be kept around far longer.
Note that we’re still talking about the same logical message schema.
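The hourly pipe can be sketched as a batching step over the same events. The sketch assumes each OrderAccepted event is a dict with an ISO-8601 `accepted_at` timestamp, and it uses JSON for brevity where the real stream might be XML. How the compressed batches reach the FTP server is left out:

```python
import gzip
import json
from collections import defaultdict
from datetime import datetime


def hourly_rollups(events: list[dict]) -> dict[str, bytes]:
    """Group events with the same logical schema into one gzip-compressed batch per hour."""
    buckets = defaultdict(list)
    for event in events:
        accepted = datetime.fromisoformat(event["accepted_at"])
        hour = accepted.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(event)
    # One compressed document per hour, e.g. to drop on an FTP server for BI to pick up.
    return {hour.isoformat(): gzip.compress(json.dumps(batch).encode("utf-8"))
            for hour, batch in sorted(buckets.items())}
```

The individual events are unchanged. Only the contract around them differs: far fewer, larger, compressed messages with a much longer retention period.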
Summary
It’s not that lost notifications aren’t a problem.
It’s that they feed the design process so that the resulting service ecosystem is set up in such a way that notifications won’t get lost. I know that sounds kind of recursive, but that’s how it works. Either subscribers meet an SLA that allows them to process the online stream of events, or they should subscribe to a different pipe (which will have different SLA requirements, but maybe they can deal with those).
It makes sense to have multiple pipes for the same logical schema.
It’s practically a necessity to make pub/sub a feasible solution.
There’s been some discussion on the SOA Yahoo group around the connection between SOA, EDA, and CEP (complex event processing) since Jack’s original post on the topic. I’ve been waiting for the right opportunity to jump in and it seems to have come.
Dennis asked this:
There are different design choices in a SOA, even when you already have identified the services. I have a simple example that I would like to share:
Imagine an order-to-cash process. One part of that process is to register an order. Suppose we have two services, Order Service and Inventory Service. The task is to register the order and make a corresponding reservation of the stock level. I would be pleased to have the group’s view on the following 3 design options (A, B, C):
A. 1. The “process/application” sends a message (sync or async) to “registerOrder” on the Order Service. 2. The “process/application” sends another message (sync or async) to “reserveStock” on the Inventory Service.
B. 1. The “process/application” sends a message (sync or async) to “registerOrder” on the Order Service. 2. The Order Service sends a message (sync or async) to “reserveStock” on the Inventory Service.
C. 1. The “process/application” sends a message (sync or async) to “registerOrder” on the Order Service. 2. The Order Service publishes an “orderReceived” event. 3. The Inventory Service subscribes to the “orderReceived” event.
On the whole “already identified the services” thing - naming a service doesn’t mean much. It’s all about allocating responsibility, and until that’s been done, those “services” don’t give us very much information.
Business Services
If we were to view this example in light of business services, and look at the business events that make up this process, maybe we’d get a different perspective.
Three business services: Sales, Inventory, and Shipping.
In Sales, many applications and people may operate, including the person and the application he used to submit the order. When the order is submitted and goes through all the internal validation stuff, Sales raises an OrderTentativelyAccepted event.
Inventory and Orders
Inventory, which is subscribed to this event, checks if it has everything in stock for the order. For every item in the order that is in stock, it allocates that stock to the order and publishes the InventoryAllocatedToOrder event for it. For items/quantities not in stock, it starts a long-running process which watches for inventory changes.
When an InventoryChanged event occurs, it matches that against orders requiring allocation – if it finds one that requires stock, based on some logic to choose which order gets precedence, it publishes the InventoryAllocatedToOrder event.
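Here is a minimal sketch of Inventory's side. It makes two simplifying assumptions: a line is allocated only when its full quantity is available (no partial allocation), and "oldest waiting order first" stands in for whatever precedence logic the business actually wants. Event shapes are hypothetical dicts:

```python
from dataclasses import dataclass, field


@dataclass
class Inventory:
    stock: dict                                  # product_id -> quantity on hand
    pending: list = field(default_factory=list)  # (order_id, product_id, qty) awaiting stock

    def on_order_tentatively_accepted(self, order_id: str, lines: dict) -> list[dict]:
        """Allocate what is in stock now; remember the rest. Returns InventoryAllocatedToOrder events."""
        events = []
        for product_id, qty in lines.items():
            if self.stock.get(product_id, 0) >= qty:
                self.stock[product_id] -= qty
                events.append({"order_id": order_id, "product_id": product_id, "qty": qty})
            else:
                self.pending.append((order_id, product_id, qty))
        return events

    def on_inventory_changed(self, product_id: str, added_qty: int) -> list[dict]:
        """Match new stock against waiting orders, oldest first."""
        self.stock[product_id] = self.stock.get(product_id, 0) + added_qty
        events, still_pending = [], []
        for order_id, pid, qty in self.pending:
            if pid == product_id and self.stock[pid] >= qty:
                self.stock[pid] -= qty
                events.append({"order_id": order_id, "product_id": pid, "qty": qty})
            else:
                still_pending.append((order_id, pid, qty))
        self.pending = still_pending
        return events
```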
Sales, which is subscribed to the InventoryAllocatedToOrder event, publishes an OrderAccepted event once it has received InventoryAllocatedToOrder events covering everything in the tentatively accepted order.
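The correlation Sales performs can be sketched in a few lines. This sketch assumes one InventoryAllocatedToOrder event per product line, and the event shapes are hypothetical:

```python
class Sales:
    def __init__(self) -> None:
        self.awaiting: dict[str, set] = {}  # order_id -> product_ids not yet allocated

    def on_order_tentatively_accepted(self, order_id: str, product_ids: list[str]) -> None:
        self.awaiting[order_id] = set(product_ids)

    def on_inventory_allocated(self, event: dict):
        """Correlate allocations; return an OrderAccepted event once every line is covered."""
        remaining = self.awaiting.get(event["order_id"])
        if remaining is None:
            return None  # unknown or already accepted order
        remaining.discard(event["product_id"])
        if not remaining:
            del self.awaiting[event["order_id"]]
            return {"type": "OrderAccepted", "order_id": event["order_id"]}
        return None
```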
Orders and Shipping
When Inventory receives the OrderAccepted event, it generates the pick list to bring all the stock from the warehouses to the loading docks, finally publishing the PickListGenerated event containing target docks.
When Shipping receives the PickListGenerated event, it starts the yard management necessary to bring the needed kinds of trucks to the docks.
What else is possible
I could go on, talking about things like the maximum amount of time stock of various kinds can wait to be loaded on trucks, subscribing to earlier events to employ all kinds of optimization and prediction algorithms, or having a Customer Care service notify the customer about what’s going on with their order (probably differently for different kinds of customers and their preferred means of communication). Obviously, we’d need a Billing service to handle the various kinds of billing procedures, whether or not the customer has credit, pays upon delivery, etc.
It turns out that many business domains map very well to this join of SOA and EDA.
What an ESB is for
When we have these kinds of business services primarily publishing events and subscribing to those of other services, you don’t need much else from your “enterprise service bus”. All sorts of transformation, routing, and orchestration capabilities don’t come into play at all.
In all truthfulness, those bits of functionality are really just a historical artifact of their broker heritage.
Don’t get me wrong, sometimes a broker is a nice thing to have - behind a service boundary in order to perform some complex integration between existing legacy applications.
Just keep that stuff in its place - not between services.
Complex Event Processing
We can look at how Sales transitions an order from being tentatively accepted to being accepted as requiring event correlation around InventoryAllocatedToOrder events. This isn’t exactly “complex” in its own right. If there were some kind of CEP engine that did this for us out of the box, it might be a possible technology choice for implementing this logic within our service.
As we add more concerns, like time, we may find new ways to make use of this engine. For instance, if the time to provide the order to the customer is approaching, we may choose to split the order into two - accepting one for which we have all the stock allocated, and leaving the second as tentatively accepted.
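The splitting rule can be sketched as a small decision function. The two-day `SPLIT_WINDOW` is a hypothetical business rule, and order lines are modelled as a product-to-quantity dict:

```python
from datetime import datetime, timedelta

SPLIT_WINDOW = timedelta(days=2)  # assumed rule: split when this close to the promised date


def maybe_split(order_lines: dict, allocated: set, promised_by: datetime, now: datetime):
    """Return (accepted_lines, still_tentative_lines) near the deadline, else None to keep waiting."""
    if promised_by - now > SPLIT_WINDOW:
        return None  # plenty of time left; wait for the rest of the stock
    accepted = {p: q for p, q in order_lines.items() if p in allocated}
    tentative = {p: q for p, q in order_lines.items() if p not in allocated}
    return accepted, tentative
```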
Summary
While it is difficult to move forward on service responsibility without discussing the events a service raises and those it subscribes to, the whole issue of CEP can be postponed for a while.
Although there aren’t many who would say that EDA is necessary for driving down coupling in SOA, or that SOA won’t likely provide much value without EDA, or that SOA is necessary for providing the right boundaries for EDA, it’s been my experience that that is exactly the case.
CEP is a challenging engineering field, managing its technical risks is necessary for some projects to succeed, and it really shines when used under the SOA/EDA umbrella. Even so, it should not be taken by itself and used at the topmost architectural levels.
And if you’re wondering about how to handle all that complexity inside services (different kinds of billing, periodic tests for electronics inventory, etc), you might like listening to this podcast about business components.
Of the tenets of Service Orientation, the tenet of Autonomy is one that many understand intuitively. Interestingly enough, many in that same intuitive category don’t see pub/sub as a necessity for that autonomy.
Watch that first step
Although sometimes described as the first step of an organization moving to SOA, web-service-izing everything results in synchronous, blocking, request/response interaction between services. The problem is that if one service becomes unavailable, none of its consumers can perform any work. With the deep service “call stacks” this architectural style condones, the availability and performance of the entire organization will be dictated by the weakest link.
So, while I’d agree that many organizations do need to take this step, I’d caution against going into production at this step.
Pub/Sub Considered Helpful
When services interact with each other using publish/subscribe semantics we don’t have that technical problem of blocking. Subscribers cache the data published to them (either in memory or durably depending on their fault-tolerance requirements) thus enabling them to function and process requests even if the publisher is unavailable.
Consider the following scenario:
Let’s say we have an e-commerce site, a part of our Sales service responsible for selling products. Another service, let’s call it merchandising, is responsible for the catalog of products, and how much each product costs. Sales is subscribed to price update events published by Merchandising and saves (caches) those prices in its own database. When a customer orders some products on the site, Sales does not need to call Merchandising to get the price of the product and just uses the previously saved (cached) price. Thus, even if Merchandising is unavailable, Sales is able to accept orders. This is a big win as our merchandising application is not nearly as robust as our sales systems.
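A minimal sketch of Sales' price cache follows. The price-update handler and its arguments are hypothetical, and recording when each price arrived sets up the freshness question discussed next:

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal


@dataclass
class CachedPrice:
    price: Decimal
    received_at: datetime


class SalesPriceCache:
    """Sales' own copy of prices, fed by Merchandising's price-update events."""

    def __init__(self) -> None:
        self._prices: dict[str, CachedPrice] = {}

    def on_price_updated(self, product_id: str, price: Decimal, received_at: datetime) -> None:
        self._prices[product_id] = CachedPrice(price, received_at)

    def price_for_order(self, product_id: str) -> Decimal:
        # No call to Merchandising here, so orders keep flowing even while it is down.
        return self._prices[product_id].price
```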
Yet, there are scenarios where data freshness requirements prevent this.
Too Much of a Good Thing?
Technically, the above story is accurate. There is nothing technically preventing Sales from accepting orders. Yet consider a scenario where Merchandising is down or unavailable for an extended period of time. While this may not be entirely likely for two servers in the same data center, consider physical kiosks which customers can use to buy products. Those kiosks may not receive updates for days. Should they accept orders?
That’s really a question to the business. If pricing data is stale for a time period greater than X, do not sell that item. The value of X may even be different for different kinds of products. Keep in mind that this issue only arose since we architected our services to be fully autonomous. In a synchronous systems architecture, this issue would not come up. As such, it is our responsibility as architects to go digging for these requirements as well as explaining to the business what the tradeoffs are.
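A rule like "if pricing data is stale for longer than X, do not sell that item" is small once the business has supplied X. In this sketch, the per-category values and the one-day default are hypothetical placeholders for those business answers:

```python
from datetime import datetime, timedelta

# Hypothetical business-supplied values of X, per kind of product.
MAX_PRICE_AGE = {"electronics": timedelta(hours=1), "books": timedelta(days=7)}
DEFAULT_MAX_AGE = timedelta(days=1)


def can_sell(category: str, price_received_at: datetime, now: datetime) -> bool:
    """Refuse to sell an item whose cached price is older than X for its category."""
    max_age = MAX_PRICE_AGE.get(category, DEFAULT_MAX_AGE)
    return now - price_received_at <= max_age
```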
In order to have more up to date data, we need to invest in more available hardware, networks, and infrastructure. This needs to be balanced against the predicted increase in revenue that more up to date (read higher) prices would give us.
You Can Get What You Pay For
Beyond the additional cost of writing that additional logic, and the perceived increased complexity, another difference to note between this architectural style and the synchronous/traditional one is that it puts control of spending back in the hands of business.
In a synchronous architecture, in order to achieve required performance and availability, all systems need to be performant, requiring across-the-board investments in servers, networks, and storage. Without investing everywhere, the weakest link is liable to undo all other investments. In other words, your developers have made your investment choices for you. Scary, isn’t it?
A more prudent investment strategy would prefer spending on services that give the biggest bang for the buck, better known as return on investment. A pub/sub based architecture allows investing in data-freshness where it makes the most sense. For example, in sales of high profit products to strategic customers rather than inventory management of raw materials for products slated to be decommissioned.
That sounds a lot like IT-Business Alignment.
Maybe there’s something to this SOA thing after all…
One of the common questions I receive from people starting to use nServiceBus is how one-way messaging fits with showing the user a grid (or list) of data. Thinking about publish/subscribe usually just gets them even more confused. Trying to resolve all this with Service Oriented Architecture leaves them wondering - why bother?
In regular client-server development, the server is responsible for providing the client with all CRUD (create, read, update, and delete) capabilities. However, when users look at data they do not often require it to be up to date to the second (given that they often look at the same screen for several seconds to minutes at a time). As such, retrieving data from the same table as that being used for highly consistent transaction processing creates contention resulting in poor performance for all CRUD actions under higher load.
A Scalable Solution
One of the common answers to this question is for the server/service to publish a message when data changes (say, as the result of processing a message) and for clients to subscribe to these messages. When such a notification arrives at a client, the client would cache the data it needs. Then, when the user wants to see a grid of data, that data is already on the client. Of course, this solution doesn’t work so well for older client machines (like some point of service devices) or if there are millions of rows of data.
The thing is that this solution is one implementation of a more general pattern - command query separation (CQS).
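The client-side half of that "scalable solution" can be sketched as a notification handler that maintains a local copy, with the grid reading only from that copy. The notification shape (`id`, `name`, `deleted`) is a hypothetical example:

```python
class ClientGridCache:
    """Client-side copy of rows, kept current by change notifications from the server."""

    def __init__(self) -> None:
        self._rows: dict = {}

    def on_customer_changed(self, event: dict) -> None:
        # Hypothetical notification shape: {"id": ..., "name": ..., "deleted": bool}
        if event.get("deleted"):
            self._rows.pop(event["id"], None)
        else:
            self._rows[event["id"]] = {"id": event["id"], "name": event["name"]}

    def grid(self, sort_key: str = "name") -> list[dict]:
        # Showing the grid is a local read: no round trip and no contention on the server's tables.
        return sorted(self._rows.values(), key=lambda row: row[sort_key])
```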
Command Query Separation
Wikipedia describes CQS as a pattern where "… every method should either be a command that performs an action, or a query that returns data to the caller, but not both. More formally, methods should return a value only if they are referentially transparent and hence possess no side effects."
Martin Fowler is less strict about the use of CQS allowing for exceptions: "Popping a stack is a good example of a modifier that modifies state. Meyer correctly says that you can avoid having this method, but it is a useful idiom. So I prefer to follow this principle when I can, but I’m prepared to break it to get my pop."
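At the method level, the distinction looks like this minimal stack sketch. The method names are illustrative. `top` is a pure query, `push` and `remove_top` are commands, and `pop` is the pragmatic modifier-that-returns-a-value Fowler is willing to keep:

```python
class Stack:
    """A CQS-style stack: queries return data, commands change state and return nothing."""

    def __init__(self) -> None:
        self._items: list = []

    def push(self, item) -> None:  # command
        self._items.append(item)

    def remove_top(self) -> None:  # command
        self._items.pop()

    def top(self):  # query: calling it twice gives the same answer
        return self._items[-1]

    def pop(self):
        # Fowler's exception: a modifier that also returns a value, built from the two above.
        item = self.top()
        self.remove_top()
        return item
```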
So, how does separating commands from queries and SOA help at all in getting data to and from a UI? The answer is based on Pat Helland’s thinking as described in his article Data on the Inside vs. Data on the Outside.
Services Cross Boxes
The biggest lie around SOA is that services run.
Let that sink in a second.
Sure, services have runnable components, but that’s not why they’re important.
Services communicate with each other using publish/subscribe and one-way messaging. Services have components inside them. Inside a service, these components can communicate with each other using synchronous RPC, or any other mechanism. Also, these components can reside on different machines.
This is broader than just scaling out a service. There can be service components running on the client as well as the server.
SOA & CQS
Combining these two concepts together, here’s what comes out:
In this solution there are two services that span both client and server - one in charge of commands (create, update, delete), the other in charge of queries (read). These services communicate only via messages - one cannot access the database of the other.
The command service publishes messages about changes to data, to which the query service subscribes. When the query service receives such notifications, it saves the data in its own data store which may well have a different schema (optimized for queries like a star schema).
The client component which is in charge of showing grids of data to the user behaves the same as it would in a regular layered/tiered architecture, using synchronous blocking request/response to get its data - SOA doesn’t change that.
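Here is a minimal in-process sketch of the command/query split. A direct function call stands in for the bus (nServiceBus or anything else), and the `OrderPlaced` message and the per-customer totals view are hypothetical. The query service never reads the command service's database and builds its own query-optimized shape instead:

```python
import sqlite3


class CommandService:
    """Owns the transactional store and publishes a message for every change."""

    def __init__(self, publish) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
        self.publish = publish

    def place_order(self, order_id: int, customer: str, total: float) -> None:
        with self.db:  # commits on success, rolls back on error
            self.db.execute("INSERT INTO orders VALUES (?, ?, ?)", (order_id, customer, total))
        self.publish({"type": "OrderPlaced", "id": order_id,
                      "customer": customer, "total": total})


class QueryService:
    """Builds its own denormalized, ready-to-display view purely from messages."""

    def __init__(self) -> None:
        self.totals_by_customer: dict[str, float] = {}

    def handle(self, message: dict) -> None:
        if message["type"] == "OrderPlaced":
            customer = message["customer"]
            self.totals_by_customer[customer] = (
                self.totals_by_customer.get(customer, 0) + message["total"])
```

Wiring them together is just `CommandService(QueryService().handle)` in this sketch. In a real deployment the subscription would go through the messaging infrastructure, and the UI would read `totals_by_customer` (or its persistent equivalent) with ordinary request/response.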
Composite Applications
Although the client side components of both the command and query services are hosted in the same process, they are very much independent of each other. That being said, from an interoperability perspective (the one that most people attribute to SOA), all of the client-side components will likely be developed using the same technology - although there are already ways to host Java code in .NET and vice-versa.
Of course, once we talk about web UIs things are a bit different - but still similar. While there may be a level of independence on the web server side, for inter-component communication in the browser we’re still likely to target JavaScript. There, I’ve managed to say something technical supporting mashups and SOA without lying through my teeth.
On the Microsoft side with the recent release of the Composite Application Guidance & Library (pronounced "prism") I hope that more of these principles will be reaching the "smart client". The command pattern is especially critical in maintaining the separation while enabling communication to still occur so I’m glad that, as one of the Prism advisors, I was able to simplify that part (Glenn still has nightmares about that rooftop conversation).
Publish / Subscribe
In the "scalable solution" section up top I mentioned how publish/subscribe to the smart client is really just one implementation of CQS and SOA. So, how different is it really?
Well, there will probably be a different technology mapping. Instead of a star-schema OLAP product, we might simply store the published data in memory on the client. That is, if you designed your components to be technology agnostic.
In terms of the use of nServiceBus, the same component is going to be subscribing to the same type of message - all that’s different is that now every client will have data pushed to it rather than this occurring server-side only.
You could have the same code deployed differently in the same system - stronger clients subscribing themselves, weaker ones using a remote server. Web servers would probably be considered stronger clients. This kind of flexible deployment has proven to be extremely valuable for my larger clients. The added benefit of enabling users to work (view data) even while offline (somewhere there’s no WIFI) is just icing on the cake.
A Word of Warning
Once the client starts receiving notifications, and handling those on a background thread (as it should) the code becomes susceptible to deadlocks and data races. Juval does a good job of outlining some of those with respect to the use of WCF. Prism doesn’t provide any assurances in this area either.
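One basic defence is to guard the shared cache with a lock and hand the UI a copy. The following is a minimal sketch with illustrative names, not a complete answer to every threading hazard:

```python
import threading


class ThreadSafeGridCache:
    """Cache updated from a background notification thread and read from the UI thread."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._rows: dict = {}

    def on_notification(self, row_id, row) -> None:
        # Runs on the background (message-handling) thread.
        with self._lock:
            self._rows[row_id] = row

    def snapshot(self) -> dict:
        # The UI thread gets a copy, so it never iterates a dict that is being mutated.
        with self._lock:
            return dict(self._rows)
```

Keep the lock held only briefly, and never call into UI code or acquire other locks while holding it. Nested locking is the usual route to the deadlocks mentioned above.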
Summary
NServiceBus is not designed to be used for any and all types of communication in a given architecture. In the examples above, nServiceBus handles the publish/subscribe but leaves the synchronous RPC to existing solutions like WCF. Not only that, but synchronous RPC does have its place in architecture, just not across service boundaries. In all cases, data is served to users from a store different from that which transaction processing logic uses.
Command Query Separation is not only a good idea at the method/class level but has advantages at the SOA/System level as well - yet another good idea from 20 years ago that services build upon. Making use of CQS requires understanding your data and its uses - SOA builds on that by looking into data volatility and the freshness business requirements around it.
Finally, designing the components of your services in such a way that their dependency on technology is limited buys a lot of flexibility in terms of deployment and, consequently, significant performance and scalability gains.
For those people who couldn’t come to TechEd USA and didn’t see my talks on how to build highly scalable web architectures, you’re in luck - Craig, the man behind the Polymorphic Podcast, sat down with me and we chatted about the problems, common solutions, and effective tactics in this space. For those of you who were at TechEd and still didn’t come to my talk - what were you thinking?!
Some of this stuff is a bit counter-intuitive (and not readily supported by the tools available in Visual Studio) so please, do feel free to ask questions (in the comments below).
This innocuous question comes up a lot. Usually I get this question after a short problem domain description. One of these came up on the nServiceBus discussion groups. Ayende took it and ran with it turning it into a nice blog post, An exercise in designing SOA systems. I’ve been meaning to write something myself. Bill put up a response already in his Service Granularity Example. So, I’m late to the party, again, but here we go.
It’s almost impossible to know, right away, which services are appropriate.
So, I’m going to focus more on the process of getting there, rather than describing the solution itself.
The domain deals with a placement agency placing physicians in positions at hospitals.
1. So, what does it actually do?
In Ayende’s post, he describes several services, but I’d rather look at them as use cases: registering an open position, registering a candidate, verifying their credentials, etc. It’s worth going through this requirements process. It doesn’t necessarily translate immediately to services, but there’s value in it.
2. What does it do it to?
We should also be looking at the data model, an entity relationship diagram (ERD), where we see that we may have placed a certain physician at a number of positions. It’s also important for us to know under which circumstances a physician finished their employment at a previous position before, say, trying to place them at a position in the same hospital or chain of hospitals. Don’t go thinking that this is what the database schema will look like; it’s all about understanding connections between various bits of data.
3. When does that happen?
The next step is to map the use cases above to the entities in the ERD: which entity is used in which use case. It’s also important to distinguish entities (or, even more importantly, specific fields of entities) that a given use case only reads from those it modifies. For instance, when registering a new position, we’ll want to check that against other open positions in the same hospital so we don’t end up registering the same position twice. Also, we might want to suggest verified physicians whose credentials match the position’s requirements. Data we wouldn’t be interested in might be which other physicians we placed at that hospital.
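One lightweight way to capture this mapping is a matrix from use case to the fields it touches, marking each as read or written. The sketch below assumes hypothetical field names (`Position.hospital`, `Physician.credentials_verified`, and so on); the useful part is being able to ask which fields a use case only reads.

```python
from enum import Enum
from typing import List


class Access(Enum):
    READ = "read"
    WRITE = "write"


# Hypothetical use-case-to-field matrix; entity and field names are
# illustrative assumptions, not taken from the original discussion.
USE_CASE_ACCESS = {
    "register position": {
        "Position.hospital": Access.WRITE,
        "Position.specialization": Access.WRITE,
        "Position.requirements": Access.WRITE,
        "Physician.specialization": Access.READ,        # to suggest matching physicians
        "Physician.credentials_verified": Access.READ,  # only verified physicians
    },
    "verify credentials": {
        "Physician.credentials": Access.READ,
        "Physician.credentials_verified": Access.WRITE,
    },
}


def fields_by_access(use_case: str, access: Access) -> List[str]:
    """List, in sorted order, the fields a use case touches with the given access."""
    return sorted(f for f, a in USE_CASE_ACCESS[use_case].items() if a is access)
```

Note what is absent: nothing in "register position" touches which other physicians were placed at the hospital.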
4. What just happened?
Another valuable perspective on the problem domain is the business process view - what are the interesting business events in the system and how they unfold over time. For instance, physician registered, position opened, physician’s credentials verified, and physician placed in position (or position filled by physician) are events that describe a different business perspective than use cases.
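The events named above can be written down as immutable, past-tense message types. The sketch below uses hypothetical fields and a small `timeline` helper (also hypothetical) to show the "how they unfold over time" view: the same physician and position, seen as a sequence of business facts rather than as use cases.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


# Hypothetical event types named in the past tense; fields are assumptions.
@dataclass(frozen=True)
class PhysicianRegistered:
    physician_id: str
    specialization: str
    occurred_at: datetime


@dataclass(frozen=True)
class PositionOpened:
    position_id: str
    hospital_id: str
    specialization: str
    occurred_at: datetime


@dataclass(frozen=True)
class PhysicianCredentialsVerified:
    physician_id: str
    occurred_at: datetime


@dataclass(frozen=True)
class PhysicianPlacedInPosition:
    physician_id: str
    position_id: str
    occurred_at: datetime


def timeline(events) -> List[str]:
    """Order events by when they happened and return their type names."""
    return [type(e).__name__ for e in sorted(events, key=lambda e: e.occurred_at)]
```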
5. How do I decide?
Once we know what events there are, we can start looking at what kind of decisions we might want to make when those events occur and what data we’d need to make those decisions. These decisions may be as simple as updating a database or sending an email to a user. They may also include more advanced logic, like: when the profitability of an agreement with a specific hospital chain changes, prefer placing physicians in positions in that chain over others.
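The hospital-chain example can be sketched as a small policy that reacts to an event and keeps only the data it needs for its decision. `AgreementProfitabilityChanged`, `OpenPosition`, the `margin` field, and the default of 0.0 for unknown chains are all assumptions for illustration, not part of the original domain description.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AgreementProfitabilityChanged:  # hypothetical event
    chain_id: str
    margin: float


@dataclass(frozen=True)
class OpenPosition:  # hypothetical read model of an open position
    position_id: str
    chain_id: str


class PlacementPreferencePolicy:
    """Keeps the latest margin per chain and ranks positions by it."""

    def __init__(self) -> None:
        self._margin_by_chain: Dict[str, float] = {}

    def handle(self, event: AgreementProfitabilityChanged) -> None:
        # The decision data arrives with the event; no shared database lookup.
        self._margin_by_chain[event.chain_id] = event.margin

    def rank(self, positions: List[OpenPosition]) -> List[OpenPosition]:
        # Chains without a known agreement default to 0.0 (an assumption).
        # sorted() is stable, so ties keep their original order.
        return sorted(positions,
                      key=lambda p: self._margin_by_chain.get(p.chain_id, 0.0),
                      reverse=True)
```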
6. How do I deal with all this information?
After we have all of this information, we can start looking for cohesive groupings across all of these axes using these rules:
Data that is modified by a use case gets published as an event.
Data that is required by a use case for read-only purposes, arrives as the result of subscribing to some event.
Look for rules that differentiate behaviour based on the properties of data. Look for a correlation to some business concept. For instance, physicians probably won’t be changing their specialization, and open positions often deal with a certain specialization. Therefore, specific data instances tied to two different specializations can be said to be loosely coupled.
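The first two rules are mechanical enough to apply to a use-case/field matrix directly. The sketch below assumes a hypothetical matrix where "W" marks modified fields and "R" marks read-only ones; each use case publishes the fields it modifies and subscribes to the events of whichever use cases modify the fields it only reads. The third rule, finding the business property that slices the data, still takes human judgement.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# Hypothetical matrix: use case -> {field: "W" (modified) or "R" (read-only)}.
USE_CASE_ACCESS = {
    "register physician": {"Physician.specialization": "W", "Physician.credentials": "W"},
    "verify credentials": {"Physician.credentials": "R", "Physician.credentials_verified": "W"},
    "register position": {
        "Position.specialization": "W",
        "Physician.specialization": "R",
        "Physician.credentials_verified": "R",
    },
}


def derive_messaging(
    access: Dict[str, Dict[str, str]],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Modified data is published; read-only data arrives by subscription."""
    writers = defaultdict(set)  # field -> use cases that modify it
    for use_case, fields in access.items():
        for name, mode in fields.items():
            if mode == "W":
                writers[name].add(use_case)

    publishes = defaultdict(set)   # use case -> fields carried in its events
    subscribes = defaultdict(set)  # use case -> use cases whose events it needs
    for use_case, fields in access.items():
        for name, mode in fields.items():
            if mode == "W":
                publishes[use_case].add(name)
            elif writers.get(name):
                subscribes[use_case] |= writers[name]
    return dict(publishes), dict(subscribes)
```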
7. Which property slices across the domain?
Even though the ERD may not have made it clear, and the use cases didn’t show any particular break-down, nor did the events call out this point, the key to finding the way a business domain decomposes into services lies in decoupling specific data instances.
Actually, at this point we can group autonomous components (mere technical bits, each handling a single message) into larger business components.
If you think about it, it makes a lot of sense. The kind of credential checking you’d do for physicians specializing in brain surgery would likely be different than for general practitioners. The kind of information you’d store would, therefore, also be different.
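Here is a minimal sketch of slicing by specialization. Each `CredentialsComponent` stands in for a hypothetical business component that owns its own rules and its own data, and `route` dispatches by the property that slices the domain. The message shape, the component class, and the required credentials per specialization are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set


@dataclass(frozen=True)
class VerifyCredentials:  # hypothetical message
    physician_id: str
    specialization: str
    credentials: FrozenSet[str]


@dataclass
class CredentialsComponent:
    """Hypothetical business component owning credential checks for one specialization."""
    specialization: str
    required: FrozenSet[str]
    verified: Set[str] = field(default_factory=set)  # this component's own data

    def handle(self, msg: VerifyCredentials) -> bool:
        ok = self.required <= msg.credentials  # all required credentials present
        if ok:
            self.verified.add(msg.physician_id)
        return ok


# Required credentials per specialization are illustrative assumptions.
COMPONENTS: Dict[str, CredentialsComponent] = {
    "brain surgery": CredentialsComponent(
        "brain surgery",
        frozenset({"medical license", "board certification", "neurosurgery residency"})),
    "general practice": CredentialsComponent(
        "general practice", frozenset({"medical license"})),
}


def route(msg: VerifyCredentials) -> bool:
    """Dispatch by specialization, the property that slices the domain."""
    return COMPONENTS[msg.specialization].handle(msg)
```

Because the two components never share data, either one can change what it checks or stores without affecting the other.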
But, which services do I need?
Quite frankly, I don’t have enough information to know.
But if we had continued this conversation, going through issues like transactional consistency, availability requirements, and other non-functional issues, we could have gotten there.
If there’s one thing that I hope you got out of this, it’s that the questions are what’s important. The iterative process of looking at the problem domain from various perspectives, incorporating the new-found knowledge, and asking more questions is what leads us to a solution. But we don’t stop there. We keep looking for characteristics which split services apart into business components, and for consistency requirements that bring autonomous components together into services.
It’s not easy, but by focusing on these simple questions, you can get to a coherent service oriented architecture.
In this video, Greg Young, Martin Fowler, Evan Hoff, Dru Sellers, myself and some others discussed various aspects of event-based systems, how Domain-Driven Design works with them, what role messaging has, and how all these connect to architectural properties like scalability and fault tolerance.
One of the questions that Martin started answering was how teams can start getting into the messaging state-of-mind. Unfortunately, the conversation veered off into what kind of messaging interactions are appropriate, leaving the original question unanswered.
I’m hoping to address this topic with some of the information I’m putting up on the nServiceBus site. There’s also Gregor Hohpe and Bobby Woolf’s excellent Enterprise Integration Patterns (EIP) book, which I think is a must for anybody writing distributed systems.
In this article, I attempt to debunk some of the myths around stateless-ness as the key to scalability.
Here’s how it starts:
It was a sunny day in June 2005 and our spirits were high as we watched the new ordering system we’d worked on for the past 2 years go live in our production environment. Our partners began sending us orders and our monitoring system showed us that everything looked good. After an hour or so, our COO sent out an email to our strategic partners letting them know that they should send their orders to the new system. 5 minutes later, one server went down. A minute after that, 2 more went down. Partners started calling in. We knew that we wouldn’t be seeing any of that sun for a while.
The system that was supposed to increase the profitability of orders from strategic partners crumbled. The now-seething COO emailed the strategic partners again, this time to ask them to return to the old system. The weird thing was that although we had servers to spare, just a few orders from a strategic customer could bring a server to its knees. The system could scale to large numbers of regular partners, but couldn’t handle even a few strategic partners.
This is the story of what we did wrong, what we did to fix it, and how it all worked out.