Autonomous Services – Udi Dahan – The Software Simplist

[Ask Udi] Two services operating on the same entity

udidahan — Fri, 19 Feb 2016 08:10:41 +0000

Most of my regular readers know that I recommend against having more than one service operating on the same entity, but with the groundswell of interest around microservices, there appear to be more and more people who are falling into this trap, so I thought it worthwhile to do a short refresher on the topic.

I got this question submitted the other day on an old post from 2007 on autonomous services and enterprise entity aggregation:

What happens when two services must be able to create/update the same entities?

For example, take this scenario, lets suppose we have two services: Marketing Service and Product Inventory Service. Marketing Service must be able to create new instances of the business entity Products, as the marketing team needs to make publicity about new products before the products is available for sale. On the other hand, the marketing department will not make publicity for the all products, so the Product Inventory Service needs to be able to create new instances of the Product too.

How to implement autonomous services when there is more than one service which is authoritative services over the same business entity?

Where confusion starts

When people look to allocate responsibility to their services, there are a number of implicit assumptions that are often made that make it difficult to follow the rules of service design.

Problematic Assumption #1

Services should be aligned with organizational boundaries.

For the full story as to why this isn’t a good idea, read my post on people, politics, and the single responsibility principle.

In the question above, assuming that the Marketing Service does all the things that the Marketing Department does, and inversely doesn’t do the things the department doesn’t do gets us into trouble. If a service is responsible for the creation of an entity, there shouldn’t be any other service that can create that entity. So, in this case, the “Marketing Service” (assuming that that would still be the best name for it) would always create the “Product” entity – assuming that that is the right name and set of attributes to put together.

Problematic Assumption #2

Entities should represent “real world” things.

For the full story as to why this isn’t a good idea, read my post don’t try to model the real world, it doesn’t exist.

In the question above, assuming that there is an entity called Product, and that this entity owns all data about a product (like its name, price, and amount in inventory), gets us into trouble. You see, each of those attributes relates to very different business processes when we look at their use in transactional logic. I mean, sure, you may need to show all the data on one screen in the UI, but that can be done fairly simply using UI composition techniques. For some more advanced UI composition ideas, see the video in this post on service-oriented composition.

Anyway, instead of having a single Product entity, we could have a “CatalogItem” entity that would encompass information about the “product” like its name, description, image, and category information. That entity would share a common ID with entities owned by other services – like an “InventoryItem” that would own information like the quantity of the product.

Depending on other more detailed information about the domain, we may decide on whether data like the dimensions and weight of the product would go together with the quantity or on a separate “ShippableItem” entity owned by yet a different service. One justification for the separation is that the quantity of inventory is quite a bit more volatile than the dimensions and weight. Still, we could have separate entities for those things in the same service.
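A minimal sketch of that split, in Python. Only the entity names CatalogItem, InventoryItem, and ShippableItem come from the text; every field name and the sample data are assumptions made for illustration. Each service keeps its own entity keyed by the same product ID, and nothing holds a combined Product:

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass
class CatalogItem:
    """Owned by one service: descriptive data only."""
    product_id: UUID
    name: str
    description: str


@dataclass
class InventoryItem:
    """Owned by a different service: the volatile stock level only."""
    product_id: UUID
    quantity: int


@dataclass
class ShippableItem:
    """Possibly owned by yet another service: stable physical attributes."""
    product_id: UUID
    weight_kg: float
    length_cm: float
    width_cm: float
    height_cm: float


# The only thing the three entities have in common is the identifier.
product_id = uuid4()
catalog = {product_id: CatalogItem(product_id, "Kettle", "1.7 litre electric kettle")}
inventory = {product_id: InventoryItem(product_id, quantity=40)}
shipping = {product_id: ShippableItem(product_id, 1.2, 22.0, 16.0, 25.0)}

# A stock change touches only the inventory entity.
inventory[product_id].quantity -= 1
```

Whether ShippableItem lives in its own service or alongside InventoryItem is exactly the kind of decision that depends on the domain analysis described above; the sketch works either way because the entities never reference each other's fields.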

And that’s why it’s so difficult

When you can no longer rely on the crutches of organizational structure and the nouns of the domain, it becomes much harder to model your domain – dividing up responsibility into services and setting up the right entity model for each of them. At that point, you really need to dive deep into the domain and analyze the way each attribute of data is used and what kind of transactional integrity constraints are real and which are imagined.

For those of you who haven’t gone back and read all of my older posts (and let’s be honest, nobody does that when they subscribe to a blog), I hope the links above give you some next steps to take in learning about service design.

If you want even more

I’ve been teaching about this stuff for a while now and most of the people who’ve attended the training think it’s quite helpful. To get a sense of what this course is like, check out this short video.

The thing is that these courses fill up pretty quick – the upcoming ones in Dallas Texas in March and London UK in April have already sold out, so I’m going to try to do more this year than just the four I did last year.

The next one that is now open for registration will be in Sydney Australia in May.

I’m going to try and see if I can get back to the US in August and London in December, but for now the next other course that is already available will be in Denver CO in November.

For those of you who would have a hard time traveling or taking time off work, you can get access to the recording of the first two days of the course – totally free.

Microservices presentation [London 2014]

udidahan — Tue, 21 Jul 2015 11:23:45 +0000

… in which I realize I shouldn’t put off blogging about the presentations I’ve given.

This one is from µCon 2014: The Microservices Conference at Skills Matter in London.

The title of this talk was: An Integrated Services Approach

and the description:

After many years of the largely enterprise-scale SOA philosophy being applied across multiple systems, we’re now seeing some of that philosophy being applied to the design of the systems themselves with Microservices. Unfortunately, unless we integrate these enterprise and system level philosophies appropriately, we’ll end up with a mess of data duplication and coupling that may even result in businesses running on inconsistent data. Join Udi for a discussion of a unified approach that leverages the best of both worlds.

Hope you find it interesting.

Finding Service Boundaries – illustrated in healthcare

udidahan — Mon, 02 Feb 2015 10:23:38 +0000

A couple of months back I gave a presentation at NDC London about how to find service boundaries, giving examples from the field of healthcare. The recording is now online here.

If you want to learn more about these topics, check out more of my posts on SOA here.

If you want the full, in-depth, zero-to-sixty experience – you should really attend my Advanced Distributed Systems Design course. The next one is in March in San Francisco but there will be others around the world through the rest of this year.

For the full list of events, click here.

Service-Oriented Composition (with video)

udidahan — Wed, 30 Jul 2014 12:44:48 +0000

When telling people about my approach to SOA, in which a given service would have client/browser-side components running side-by-side in the same process and even in the same page as components from other services, I often get asked this question:

“Doesn’t all of this loosely-coupled composition come with a high cost, in terms of client to server chit-chat?”

So, I’ve finally buckled down and put together a slide to illustrate how the technocratic IT/Ops service I’ve talked about in the past can provide components to resolve these sorts of problems.

After putting the slide together, and realizing some animation would do it good, I went and made a short (5 min) video including some verbal explanation as to how it all works – just for clarity. Check it out or watch it here:

And here’s the image showing everything in one picture:

People, Politics, and the Single Responsibility Principle

udidahan — Mon, 26 May 2014 06:24:59 +0000

In one of Uncle Bob’s recent blog posts on the Single Responsibility Principle he uses the example of using people and organization boundaries as an indication of possible good software boundaries:

When you write a software module, you want to make sure that when changes are requested, those changes can only originate from a single person, or rather, a single tightly coupled group of people representing a single narrowly defined business function. You want to isolate your modules from the complexities of the organization as a whole, and design your systems such that each module is responsible (responds to) the needs of just that one business function.

This is something that often comes up when I teach people about service boundaries when it comes to SOA – organization boundaries are the most intuitive choice.

And, once upon a time, that intuition might have indeed held up.

Stepping back in time

In the age before computers, organizations had a very specific way of structuring themselves.

People who had to work closely together sat in close physical proximity to each other. Data that was required on an ongoing basis would be in file cabinets also physically co-located with the people using that data, and it would be structured in a way that was optimal for their specific purposes. All of this was due to the high cost of communicating with people farther away.

If you needed data from a different department, you had to requisition it by filling out a special form and putting it in your outbox. Then some guy from the mail room would pick it up and physically schlep it to the right department, putting it in their inbox. Someone there would get your data for you, putting it together with your original request, and then the mail guy would schlep it back. This inbox/outbox style of communication should ring a bell from the messaging patterns I talk about with NServiceBus.

As a result, different departments had to have very clearly delineated responsibilities with minimal overlap with each other. The organization just couldn’t function any other way.

And then a bunch of us geeks came along.

Enter the age of computers and networks

By introducing this technology, the cost of communication across large distances started falling – slowly at first, and then quite dramatically.

When anyone in an organization was able to access data from anywhere in the blink of an eye, an interesting dynamic started to unfold. All of a sudden, the division of responsibility between departments wasn’t as critical as it was before. When an employee needed to do something, there wasn’t this “that isn’t our job, you need to go to so-and-so” reaction. Because things could be done instantly, that’s exactly what happened.

And then came the politics

By removing the cost of communication, it became possible for more power-hungry people in the organization to start making (or trying to make) decisions that they couldn’t have made before. The introduction of computers into an organization was heralded as a new way of doing business – that the old organizational boundaries were a relic that we should leave behind us.

And thus came the re-org (the first of many).

Responsibilities and people were shuffled around, managers vied for more power, and politics took its place as one of the driving forces in the company structure.

Nowadays, if you want a decision made in a company, there isn’t just one person who has the authority to sign off on it anymore. No, you need to have meetings – and more meetings, with people you never knew existed in the company, or why on earth they should have a say on how something is supposed to get done. But that is now our reality: endlessly partially overlapping responsibilities across the organization.

So, what of the Single Responsibility Principle

This just makes it that much harder to decide how to structure our software – there is no map with nice clean borders. We need to be able to see past the organizational dysfunction around us, possibly looking for how the company might have worked 100 years ago if everything was done by paper. This might be possible in domains that have been around that long (like banking, shipping, etc.), but even there, given the networked world we now live in, things that used to be done entirely within a single company are now spread across many different entities taking part in transnational value networks.

In short – it’s freakin’ hard.

But it’s still important.

Just don’t buy too deeply into the idea that by getting the responsibilities of your software right, you will somehow reduce the impact that all of that business dysfunction has on you as a software developer. Part of the maturation process for a company is cleaning up its business processes in parallel to cleaning up its software processes.

The good news is that you’ll always have a job.

On that Microservices thing

udidahan — Mon, 31 Mar 2014 16:40:36 +0000

Seems that I’m a bit late to the Microservices party – original article here.

But since I’ve been getting repeated requests to weigh in on the topic, I guess I’ll have to risk fanning the flames up again.

Also, since quite a few reactions have already been written on the topic (and I don’t want to repeat them here), I’ll just point to this post by Arnon which sums them all up pretty well.

Now, I don’t entirely agree with all the commentary Arnon pointed to, or all of his thoughts on the topic, but I’ll try to take those up some other time.

And before jumping into it, let me say that there is a lot of good stuff in the article and that, regardless of naming, spreading the word more broadly on these approaches has value.

So, where do I stand on the topic

First of all, for those of you who have been following my blog for a while I’d say this:

Microservices almost equals Autonomous Components.

Why “almost”?

Because an Autonomous Component (AC) isn’t necessarily a physical unit of deployment – very often we’ll see multiple ACs deployed in the same physical process. One of the most common occurrences is in a web front end built as a composite UI. In the same web server process we’ll see components from multiple Services.

This is something that was hardly mentioned in the original article.

On Services and Systems

In my world, Services are a larger organizing principle that is meant to align solution domain boundaries with problem domain boundaries.

Now, that might sound very similar to this passage from the original article:

“The microservice approach to division is different, splitting up into services organized around business capability. Such services take a broad-stack implementation of software for that business area, including user-interface, persistant storage, and any external collaborations. Consequently the teams are cross-functional, including the full range of skills required for the development: user-experience, database, and project management.”

Now, this isn’t entirely surprising because I did have several conversations with both James and Martin on the topic over the past couple of years.

Still, there is something missing here that I believe is essential for achieving loose coupling: Services necessarily have to span system boundaries.

Let me repeat that: a Service will need to have components that are deployed to more than one system.

Here’s why:

Let’s say you have a piece of data like the price of a product. Not only will that data be visible in one system, often it will need to be shown (as well as updated) in other systems too. In order to have appropriate encapsulation of that concept, the owning service will need to be the one that owns the components that operate on that concept in the other systems.

This means that if we need to show the product price on an invoice in a back-end system, then that invoice would have to be a composite UI as well, and the service which owns the price will have a component deployed there which would be responsible for showing the price on the invoice.

In this manner, no code outside the service boundary would know about the concept of the product price and thus could not end up coupled to it.
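To make the invoice case concrete, here is a rough Python sketch. All function names and data are hypothetical, and a real composite UI would involve far more than string fragments. The point it tries to show is that the invoice host passes around only a product ID and opaque rendered fragments, so it never learns what a price is:

```python
from typing import Callable, List

# Each service supplies a component that renders its own data for a product ID.
# The invoice host handles only IDs and opaque rendered fragments.
Component = Callable[[str], str]


def catalog_component(product_id: str) -> str:
    names = {"p-1": "Kettle"}      # data private to the catalog service
    return names[product_id]


def pricing_component(product_id: str) -> str:
    prices = {"p-1": "24.99"}      # data private to the service owning the price
    return f"{prices[product_id]} USD"


def render_invoice_line(product_id: str, components: List[Component]) -> str:
    """Host code: composes fragments without interpreting any of them."""
    return " | ".join(component(product_id) for component in components)


line = render_invoice_line("p-1", [catalog_component, pricing_component])
print(line)
```

If the pricing service later adds currencies or tax display, only pricing_component changes; render_invoice_line is untouched.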

Although the original article does get into this to some degree (when talking about Decentralized Data Management), I don’t really see how a microservice/AC could end up having this level of ownership of data.

Still, the point made about different persistence technologies is valid at the level of services (though not ACs).

How big is a service

Now, if the price is not shared outside the boundary of the service, then how would order totals be calculated?

The answer is that the totals must (MUST) be calculated in the same service.

This shouldn’t be surprising as it’s just good old OO – encapsulating data with the logic that operates on it. Or, if you’d like, call it the Single Responsibility Principle: there should only be a single service impacted by a change to the definition of this data.
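As a small illustration of keeping the calculation next to the data, here is a Python sketch under some assumptions: the PricingService class, its method, and the sample prices are invented for this example. Callers hand over product IDs and quantities and get back a total, without ever seeing a unit price:

```python
from decimal import Decimal
from typing import Dict


class PricingService:
    """Hypothetical service that owns prices and every calculation on them."""

    def __init__(self, prices: Dict[str, Decimal]) -> None:
        self._prices = prices  # never exposed outside the service

    def order_total(self, quantities: Dict[str, int]) -> Decimal:
        # Callers pass product IDs and quantities; unit prices stay internal.
        return sum(
            (self._prices[pid] * qty for pid, qty in quantities.items()),
            Decimal("0"),
        )


pricing = PricingService({"p-1": Decimal("24.99"), "p-2": Decimal("5.00")})
total = pricing.order_total({"p-1": 2, "p-2": 3})
```

A change to how prices are defined (say, multiple currencies) would then land in this one class rather than in every consumer of the price.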

As a result, you’ll tend to see services that aren’t all that small, and probably not so many of them. In my experience, I’ve seen between 7 and 15 services the majority of the time.

Cross-service collaboration

Although I am glad to see the recommendation for event-based interactions between microservices, the focus on cross-process communication ignores some extremely important collaboration scenarios – the most important of which is in the client tier.

In a web application, it is quite common to have components written in javascript from multiple services interacting among themselves in the browser – one publishing events, others subscribing to those JS events. It is also quite common to see those JS components request some data from back-end components in the same service in response to those JS events.

This type of synchronous RPC communication within a service boundary is perfectly acceptable, although it stands in contrast to the recommendations of the microservices approach.
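The shape of that client-side collaboration can be sketched as follows. It is written in Python rather than browser JavaScript only to keep the example self-contained and runnable; the EventBus class, the event name, and both components are assumptions standing in for in-page events and real service components:

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, str]], None]


class EventBus:
    """Stands in for in-page events shared by components of different services."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_name: str, handler: Handler) -> None:
        self._handlers[event_name].append(handler)

    def publish(self, event_name: str, payload: Dict[str, str]) -> None:
        for handler in self._handlers[event_name]:
            handler(payload)


def pricing_backend(product_id: str) -> str:
    """Stands in for a back-end component of the same (pricing) service."""
    return {"p-1": "24.99"}[product_id]


displayed: List[str] = []


def pricing_widget(payload: Dict[str, str]) -> None:
    # Synchronous request/response, but only inside the pricing service boundary.
    displayed.append(pricing_backend(payload["product_id"]))


bus = EventBus()
bus.subscribe("ProductSelected", pricing_widget)

# A component from a different service publishes; it knows nothing about prices.
bus.publish("ProductSelected", {"product_id": "p-1"})
```

Across services the interaction is an event carrying only an ID; the request for data happens within one service, between its front-end and back-end components.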

Caveat on sharing data

I’ve been going on and on about the importance of not sharing data, but there is one exception to that rule.

There is a special service that I call IT/Ops which (among other things) is responsible for integration with 3rd party systems. As a part of this integration, it encapsulates data transformation logic and, as a result, needs to be able to receive data from the other more business-centric services.

As you can imagine, this puts IT/Ops in the risky position of coupling itself to a lot of things and thus needs to be done carefully. As a result, I recommend that many of the most skilled technical people work within the IT/Ops team, also serving in a consultative capacity to the other service teams.

In closing

I am extremely thankful to Martin and James for writing the Microservices article.

I think that the conversations it has sparked are timely, and hopefully more people will ponder these questions of how to structure their code-bases in order to avoid them becoming monolithic.

And while I think that it’s great to consider aligning team boundaries with service boundaries, people need to understand that it will need to be an evolutionary process – it will take time to transition an existing code base and an existing team structure to this new model, especially since these teams will have to continue to deliver features and bug fixes through the transition period. Jumping to the new model directly may cause more harm than good.

This is actually one of the most salient topics of my course (next one in NYC in May) – how do you get there from here. In my experience there are 4 phases that companies go through, often taking at least a couple of years, with larger environments taking potentially longer.

In any case, let’s keep the conversation going.

What are your thoughts? Have you been applying the Microservices approach, or possibly the one I talk about (I really should give it a name)? What’s been working well for you? What hasn’t?

Leave me a comment or write your own blog post.

Data Duplication and Replication

udidahan — Tue, 28 Aug 2012 08:10:59 +0000

Occasionally I’ll get questions from people who have been going down the CQRS path about why I’m so against data duplication. Aren’t the performance benefits of a denormalized view model justified, they ask. This is even more pronounced in geographically distributed systems where the “round-trip” may involve going outside your datacenter over a relatively slow link to another site.

CQRS

As has been said several times before by many others, it’s not the denormalized view model that defines CQRS.

One of the things that sometimes surprises people after going through my course is that in most cases you don’t need a denormalized view model, or at least, not the kind you think. Yes, that’s right: MOST cases.

But I don’t want to get too deep into the CQRS thing in this post – that can wait.

SOA

The big thing I’m against is raw business data being duplicated between services.

Data that can be expected to be accessible in multiple services includes things like identifiers, status information, and date-times. These date-times are used to anchor the status changes in time so that our system will behave correctly even if data/messages are processed out of order. Not all status information necessarily needs to be anchored in time explicitly – sometimes this can be implicit to the context of a given flow through the system.

For example, the Amazon.com checkout workflow.

In that flow, if you provide a shipping address that is in the US, you are presented with one set of options for shipping speed, whereas an international address will lead you to a different set of options.

Assuming that the address information of the customer and the shipping speed options are in different services, we need to propagate the status InternationalAddress(true/false) between these services in that same flow. In this case, there isn’t a need to explicitly anchor that status in time.
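For the other case, where a status does need an explicit anchor in time, a receiving service might compare date-times so that a late-arriving message cannot overwrite newer information. The Python sketch below is one way to do that; the StatusChanged message, the StatusTracker class, and the order statuses are all hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict


@dataclass(frozen=True)
class StatusChanged:
    entity_id: str
    status: str
    occurred_at: datetime  # when the change happened at the source


class StatusTracker:
    """Keeps the latest status per entity by source time, not arrival order."""

    def __init__(self) -> None:
        self._latest: Dict[str, StatusChanged] = {}

    def handle(self, message: StatusChanged) -> None:
        current = self._latest.get(message.entity_id)
        if current is None or message.occurred_at > current.occurred_at:
            self._latest[message.entity_id] = message
        # Otherwise the message is stale and is ignored.

    def status_of(self, entity_id: str) -> str:
        return self._latest[entity_id].status


tracker = StatusTracker()
shipped = StatusChanged("order-7", "Shipped", datetime(2012, 8, 2, tzinfo=timezone.utc))
paid = StatusChanged("order-7", "Paid", datetime(2012, 8, 1, tzinfo=timezone.utc))

tracker.handle(shipped)  # arrives first
tracker.handle(paid)     # arrives late, but describes an earlier change
```

Note that the message carries only an identifier, a status, and a date-time, which matches the kinds of data described above as acceptable to share between services.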

But what’s so bad about duplication of data between services?

The danger is that functionality ultimately follows raw business data.

You start with something small like having product prices in the catalog service, the order service, and the invoice service. Then, when you get requirements around supporting multiple currencies, you now need to implement that logic in multiple places, or create a shared library that all the services depend on.

These dependencies creep up on you slowly, tying your shoelaces together, gradually slowing down the pace of development, undermining the stability of your codebase where changes to one part of the system break other parts. It’s a slow death by a thousand cuts, and as a result nobody is exactly sure what big decision we made that caused everything to go so bad.

That’s the thing, it wasn’t viewed as a “big decision” but rather as just one “pragmatic choice” for that specific case. The first one excuses the second, which paves the way for the third, and from that point on, it’s a “pattern” – how we do things around here; the proverbial slippery slope.

So what’s with the word “Replication” in the title of this post?

While data duplication between services is very dangerous, replication of business data WITHIN a service is perfectly alright.

Let’s get back into multi-site scenarios, like a retail chain that has a headquarters (HQ) and many stores. Prices are pushed out from the HQ and orders are pushed back from the stores according to some schedule.

We know that we can’t guarantee a perfect connection between all stores and the HQ at all times, therefore we copy the prices published from the HQ and store them locally in the store. Also, since we want to perform top-level analytics on the orders made at the various stores, that would be best done by having all of those orders copied locally at the HQ as well.

We should not view this movement of data from one physical location to another as duplication, but rather as replication done for performance reasons. If there were some magical always-on zero-latency network that existed, we wouldn’t need to do any of this replication.
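A store-side replica along those lines could look roughly like this in Python. The class and method names are assumptions, and how the prices actually travel from HQ (messaging, file drop, scheduled sync) is deliberately left out:

```python
from decimal import Decimal
from typing import Dict


class StorePriceReplica:
    """Store-side copy of prices, part of the same service as the HQ master."""

    def __init__(self) -> None:
        self._prices: Dict[str, Decimal] = {}

    def apply_published_prices(self, published: Dict[str, Decimal]) -> None:
        # Called whenever a price push from HQ reaches the store.
        self._prices.update(published)

    def price_of(self, product_id: str) -> Decimal:
        # Served locally, so the store keeps working while the link to HQ is down.
        return self._prices[product_id]


replica = StorePriceReplica()
replica.apply_published_prices({"p-1": Decimal("24.99")})
replica.apply_published_prices({"p-1": Decimal("22.49"), "p-2": Decimal("5.00")})
```

Both the HQ master and the store copy belong to the same logical service, so the code that interprets a price still lives in one place even though the data sits in two.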

And that’s just the thing – logical boundaries should not be impacted by these types of physical infrastructure choices (generally speaking). Since services are aligned with logical boundaries, we should expect to see them cross physical boundaries – this includes SYSTEM boundaries (since a system is really nothing more than a unit of deployment).

I know that you might be reading that and thinking “What!?” but there isn’t enough time to get into this in any more depth here. You can read some of my previous posts on the topic of SOA for more info here.

Cross-site integration without replication

There are some domains where sensitive data cannot be allowed to “rest” just anywhere. Let’s look at a healthcare environment where we’re integrating data from multiple hospitals and care providers. While all of these partners are interested in working together to make sure that patients get the best care, which means that they need to share their data with each other, they don’t want any of THEIR data to remain at any partner sites afterwards (and are quite adamant about this).

In these cases, the decision was made that performance is less important than data ownership. Personally, I don’t agree with this mindset. The fact that data is “at rest” in a location as opposed to “in flight” does not change ownership. It could be stored in an encrypted manner so that only a certain application could use it, resulting in the same overall effect, but this is an argument that I’ve never won.

People (as physical beings) put a great deal of emphasis on the physical locations of things. It’s understandable but quite counterproductive when dealing with the more abstract domain of software.

In closing

By virtue of the fact that we don’t duplicate raw business data between services, that means that the regular data structures inside a service already look very different from what they would have looked like in a traditional layered architecture with an ORM-persisted entity model.

In fact, you probably wouldn’t see very many relationships between entities at all.

Going beyond that, you probably wouldn’t see the same entities you had before. An Order wouldn’t exist the way you expect; addresses (billing and shipping) would be stored (indexed by OrderId) in one service whereas the shipping speed (also indexed by OrderId) would be in another, and the prices may well be in yet another.

It is in this manner that data does not end up being duplicated between services, but rather is composed by many services, whether that is in the UI of one system, the print-outs done by a second system, or in the integration with 3rd parties done by a third system.

If performance needs to be improved, look at having these services replicate their data from one physical system to another – in-memory caching is one way of doing this, denormalized view models might be thought of as another (until you realize there isn’t very much normalization within a service to begin with).

And a word from our sponsor

For those of you on “rewrite that big-ball-of-mud” projects looking to use these principles, I strongly suggest coming on one of my courses. The next one is in San Francisco and I’ve just opened up the registration for Miami.

For those of you on the other side of the Atlantic, the next courses will be in Stockholm in October and in London this December.

The schedule for next year is also coming together and it will include South Africa and Australia too.

Anyway, here’s what one attendee had to say after taking the course earlier this month:

I wanted to thank you for the excellent workshop in Toronto last week. I spent the better part of the weekend reflecting over what was presented, the insights we learned through the group exercises, and how my preconceptions of SOA have changed. By the end of the course, all the tidbits of (usually) rather ambiguous information that I’ve collected from various blogs, books, and other sources, finally coalesced into something more intelligible – one big A-HA moment if you will. Overall, I found the content of the workshop to be incredibly enlightening and it left me feeling invigorated and excited to learn more.
– Joel from Canada

Hope you’ll be able to make it.

If travel is out of the question for you, you can also look at getting a recording of the course here.

One final thing

If your employer won’t foot the bill for these, please get in touch with me.
I wouldn’t want you not to be able to come just because you’re paying out of pocket.

There are very substantial discounts available.

Contact me.

UI Composition vs. Server-side Orchestration

udidahan — Mon, 09 Jul 2012 06:49:26 +0000

Following on from my last post, UI composition techniques for correct service boundaries, one commenter didn’t seem to like the approach I described, saying:

“I’m sorry, but with all due respect I must strongly disagree. You haven’t avoided any orchestration work at all, you’ve just moved it in to client side script!

How are you going to deal with the scenario that one of the service calls fails? Say a failed credit card payment, or no more rooms left? In more javascript??

I would much rather take the less brittle approach of introducing an orchestration service. Like it or not, however trivial it may be, there is a relationship between these services, if one call fails, they both fail. This should be reflected in the architecture, not hidden in javascript. With an orchestration service you also either get transactions for free provided by infrastructure, or alternatively if the underlying service doesnt support this, explicit and unit testable control over recovery.”

Since this is a common point of view, I thought I’d take the time to explain a bit more.

Let’s start at a fairly high level.

On failures

I’ve talked many times in the past about how to handle technical causes for failure like server crashes, database deadlocks, and even deserialization exceptions. Messaging and queuing solutions like NServiceBus can help overcome these issues such that things don’t actually fail – they just take a little longer to succeed.

On the logical side of things, the CQRS patterns I talk about describe an approach where aggressive client-side validation is done to prevent almost all logical causes for failure. The only things that can’t be mitigated client-side are race conditions resulting from actions taken by other users at the same time.

In short, it really is uncommon for things to fail when being processed server-side.

Back to the specific example

The concerns raised in the comment specifically talked about a failed credit card payment or no rooms left in the hotel, so let’s start with the credit card thing:

In my last post I talked about collecting guest and credit card information from the user as a part of the “checkout” process when making a reservation for a hotel room. Just to be clear – there is a final “confirm your reservation” step that happens after all information has been collected.

What this means is that we aren’t actually charging the customer’s card when we collect that data, therefore there is no real issue with a failed credit card payment that needs to be handled by the client-side javascript. When the customer confirms their reservation, yes, there might be a failure when charging the card though there are only some specific types of rates for which the hotel charges your card when you make a reservation.

In general, failed credit card payments are handled pretty much the same way for all ecommerce – an email is sent to the customer asking for an alternative form of payment, also saying that their purchase won’t be processed until payment is made.

In any case, it is only after the reservation is placed that the responsible service would publish an event about that. The service which collected the credit card information would be subscribed to that event and initiate the charge of the card when that event arrives (or not, depending on the rate rules mentioned).
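Here is a hedged Python sketch of that event flow. The ReservationPlaced event, the PaymentService class, the rate codes, and the card tokens are all made up for illustration, and the delivery of the event (for example via a message bus) is reduced to a direct method call:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ReservationPlaced:
    reservation_id: str
    rate_code: str


class PaymentService:
    """Owns card details; reacts to events rather than being called directly."""

    # Assumption for illustration: only these rates are charged at booking time.
    CHARGE_AT_BOOKING = {"PREPAID"}

    def __init__(self) -> None:
        self._cards: Dict[str, str] = {}  # reservation_id -> card token
        self.charged: List[str] = []

    def collect_card(self, reservation_id: str, card_token: str) -> None:
        # Happens during checkout, before the reservation is confirmed.
        self._cards[reservation_id] = card_token

    def on_reservation_placed(self, event: ReservationPlaced) -> None:
        if event.rate_code in self.CHARGE_AT_BOOKING:
            self.charged.append(self._cards[event.reservation_id])


payments = PaymentService()
payments.collect_card("r-1", "tok_abc")
payments.collect_card("r-2", "tok_def")

# The reservation service would publish these after the customer confirms.
for event in (ReservationPlaced("r-1", "PREPAID"), ReservationPlaced("r-2", "FLEXIBLE")):
    payments.on_reservation_placed(event)
```

Collecting the card and charging it are separate steps in the same service, and nothing orchestrates the two services in a single call: the only coupling is the published event.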

With regards to there not being any rooms left, well, first of all, there’s overbooking – hotels accept more reservations than rooms available because they know that customers sometimes need to cancel, and some just don’t show up. Secondly, there is a manual compensation process if more people show up than there are actual rooms to put them in. In some cases, a hotel will bump you up to a higher class of room (assuming there aren’t too many reservations for those), and in others they will call a “partner” hotel nearby and put you up there instead.

In summary

While arguments can be made that yes, these issues have been addressed in this specific example, there may be other domains where it is not possible to do these kinds of “tricks”. Although I do agree with that in theory, I’ve spent the better part of 5 years travelling around the world talking to hundreds of people in quite a few business domains, and every single time I’ve found it possible to apply these principles.

In short, the use of UI composition allows services to collect their own data, so that nothing outside a service depends on its data structures, which makes both development and versioning much easier. Technical failure conditions can be mitigated at infrastructure levels in most cases and other business logic concerns can be addressed asynchronously with respect to the data collection.

Give it a try.

UI Composition Techniques for Correct Service Boundaries

udidahan — Sat, 23 Jun 2012 13:04:00 +0000

One of the things which often throws people off when looking to identify their service boundaries is the UI design. Even those who know that the screen a user is looking at is the result of multiple services working together sometimes stumble when dealing with forms that users enter data into.

Let’s take for example a screen from the Marriott.com online reservation system (below). This screen collects information about the guest staying at the hotel (name, phone number, address, etc) and credit card information.

While we might have wanted to keep guest information in a separate service from the credit card information (which may very well be the corporate card of someone responsible for travel), the above screen would seem to indicate that the data would be collected together, validated together, and would also have to be processed together.

The traditional way

In standard layered architectures you would have all the data submitted by the user passed in a single call from a controller to some “service layer” (possibly running on a different machine), which would then persist that data in one transaction.

Even if some attempt was made to separate things out, there likely would be some “orchestration service” that received the full set of data and it would make calls to the other “services”, passing in the specific data that each “service” is responsible for.

I am putting quotes around the word “service” to indicate that I don’t consider these proper services in the SOA sense (as they lack the necessary autonomy) – they are more like functions or procedures; whether or not they’re invoked via XML over HTTP is beside the point.

What to do?

Like so many other things, the solution is simple but a bit counter-intuitive as it doesn’t follow the way most web development is done, i.e. one submit button => one call to the server.

Let’s say the “Red” service is responsible for guest information and the “Blue” service is responsible for credit card data. In this case, each service would have its own javascript come down with the page and that script would register itself for a callback on the click of the submit button. Each service would take the data the user entered into its part of the page and independently make a call to “the” server (could be to 2 separate servers) where the data is persisted (potentially to 2 different databases).

This raises other questions, of course.

Now that the data submitted is being processed in 2 transactions rather than just one, we may need to figure out how to correlate the data. In this specific case, it’s not such a big deal as there is no direct relationship between the guest and the credit card – both need to be independently correlated to some reservation ID.

That reservation ID would likely have been “created” on a button click on a previous screen by some other service. The reason why I put the word “created” in quotes is that this could be as simple as having the client generate a new GUID and put that in a cookie (which would cause the reservation ID to end up being submitted along with subsequent requests). Another alternative would be to put the reservation ID in the session.

It’s quite possible that the reservation ID would only be persisted much later in the service that owns it when the user actually confirms the reservation on the website.

In any case, what we can see is that each of the commands of our respective services can now be processed independently of the others in an entirely asynchronous fashion thus vastly improving the autonomy of our services.
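To make the mechanics concrete, here is a small simulation of the submit flow. It is only a sketch: the `Page` class stands in for the browser page and its submit button, the `GuestService` and `CardService` classes play the “Red” and “Blue” services, and the form field names are invented for the example. In a real page each listener would be that service’s own javascript making its own HTTP call.

```python
import uuid


class GuestService:
    """'Red' service: owns guest details only."""

    def __init__(self):
        self.guests = {}  # its own store, keyed by reservation ID

    def on_submit(self, reservation_id, form):
        self.guests[reservation_id] = {
            "name": form["guest_name"],
            "phone": form["guest_phone"],
        }


class CardService:
    """'Blue' service: owns credit card data only."""

    def __init__(self):
        self.cards = {}  # a separate store, possibly a different database

    def on_submit(self, reservation_id, form):
        self.cards[reservation_id] = {"number": form["card_number"]}


class Page:
    """Stands in for the browser page: one submit button, many listeners."""

    def __init__(self):
        # Generated client-side; no service has to persist it yet.
        self.reservation_id = str(uuid.uuid4())
        self._submit_listeners = []

    def register(self, listener):
        self._submit_listeners.append(listener)

    def submit(self, form):
        # Each listener takes only the fields it owns and makes its own call.
        for listener in self._submit_listeners:
            listener(self.reservation_id, form)


red, blue = GuestService(), CardService()
page = Page()
page.register(red.on_submit)
page.register(blue.on_submit)
page.submit({
    "guest_name": "Jane Smith",
    "guest_phone": "555-0100",
    "card_number": "4111111111111111",
})
```

After the submit, each service holds only its own slice of the form, and both slices are correlated by the same client-generated reservation ID.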

Some words on CQRS

This style of UI composition where services leverage javascript code running in the browser isn’t technically difficult in the slightest. The rest of the implementation of each service – having a controller that takes that data and passes it on for persistence – can be quite simple.

I’d say even more strongly, most of the time you shouldn’t need to use any fancy-dancy messaging to get that data persisted – that is, unless you’re still stuck with the big relational database behind 23 firewalls type data tier. Embrace NoSQL databases for the simplicity and scalability they provide – don’t try to re-invent that using messaging, CQRS, persistent view models, event-sourcing, and other crap.

There are other very valid business reasons to embrace CQRS, but they have nothing to do with persistence.

Also notice, this is all happening within a service boundary / bounded context.

In closing

If you aren’t leveraging these types of composite UI techniques, it’s quite likely that your service boundaries aren’t quite right. Do be aware of the UI design and use it to inform your choices around boundaries, but be aware of certain programming “best practices” that may lead you astray with your architecture.

Also, if you’re planning on coming to my course in Toronto to learn more about these topics, just wanted to let you know that there’s one week left for the early-bird discount.

Finally, it’s good I have a birthday that comes around once a year to remind me that my time here isn’t unlimited and that I had better get off my rear and do something meaningful with the time I do have. If you get value from these posts, leave a comment or send me a tweet to let me know – it does wonders for my motivation.

Thanks a bunch.

Why you should be using CQRS almost everywhere…

udidahan — Sun, 02 Oct 2011 20:45:44 +0000

… but differently than the way most people have been using it.

I think I’ve just about driven everybody crazy now with my apparent zigzagging on CQRS.

Some people heard about CQRS first from one of my presentations and got all excited about it. Then I did some blogging which further drove people to CQRS (as did Greg Young and some others). As CQRS was just about to hit its stride with the Early Adopters, I started pushing a more balanced view – CQRS not as an answer, but as one of many questions. More recently I’ve pushed more strongly back against CQRS saying that it should be used rarely.

So what’s the missing piece?

If you’re in the Domain-Driven Design camp (as many doing CQRS are), then it’s Bounded Contexts.

If you’re in the Event-Driven SOA camp (a much smaller camp to be sure), then it’s Services.

The problem is the naming, because the DDD guys have their kinds of services which do not fit the definition for Service of the Event-Driven SOA approach.

Let me propose the term Autonomous Business Component for the purposes of this blog post to describe that thing which is both a DDD Bounded Context (hence the shared BC part of the acronym) and an SOA Autonomous Service, resulting in the nice short form: ABC (and everyone knows you need to have a good acronym if you want something to catch on).

What does this have to do with CQRS?

Nothing just yet. Well, at least, nothing directly to do with CQRS.

Although some proponents of CQRS have stated that it can and should be used as the top-most architectural pattern, both Greg Young and I (arguably the first two to talk about it and the two who ultimately collaborated on naming it – and now Google knows we didn’t mean “cars”) always recommended it as a pattern to be used one level down.

Although Greg and I have had many long discussions on the topic and do agree very much about what the overall structure should look like, I’ll try to avoid putting words in his mouth from this point on.

Before talking more about ABCs, let’s discuss the principle upon which they rest: The Single Responsibility Principle (SRP).

What does SRP have to do with CQRS?

Many developers are familiar with SRP and have seen good results from using it. What we’re going to do is take this principle to the next level.

In Object Orientation (OO), data is encapsulated in an object. A good object does not expose its data to other objects to do with as they wish. Rather, it exposes methods that other objects can invoke, and those methods operate on the internal data.

SRP would guide us to not have the same data exist in two objects. For example, if we saw the customer’s first name as an internal data member of two objects, we’d be right to question that kind of duplication and move to refactor it away. However, when we see two systems doing the exact same thing – somehow that gets excused.

“Of course we need to be able to see the customer’s first name in the front-end website as well as in the back-end fulfillment system. How could we NOT have the customer’s first name in both those code-bases?”

And there’s the catch.

Who said that a system should be a single code-base?

But what about integration?

Although many times we do need to integrate existing systems together, sometimes we have the ability to change those systems. More importantly, when going to create a new solution, we can avoid getting ourselves into the problems that integration tries to solve.

Integrating with a system that cannot be changed can be done also by composing multiple ABCs, but that’s a topic for another post.

It is better to think of integration as a necessary evil – kind of like regular expressions and multi-threading; things to be avoided unless absolutely necessary.

“If you have a problem that you decide to use a regular expression to solve, you now have 2 problems.” Or so the saying goes. With multi-threading, you have a non-deterministic number of problems to solve.

If you thought you had duplicate responsibilities with 2 systems operating on the same data, how will introducing a 3rd code base (also known as “integration”) help? Remember that Single Responsibility Principle – our goal is to get it down to one.

OK, so how do ABCs do that?

Getting back into alignment with SRP requires that responsibility for a single piece of data exist in one code base. Note that SRP makes no statements about how many physical places a given code base can be deployed to. Nor does it state that only a single technology can be in play – code that emits HTML can be packaged at design time together with rich-client code in the same solution.

If an ABC is responsible for a piece of data, it is responsible for it everywhere, and forever. No other ABC should see that data. That data should not travel between ABCs via remote procedure call (RPC) or via publish/subscribe. It is the ultimate level of encapsulation – SRP applied at the highest level of granularity.

This results in systems which are the result of deploying the components of multiple ABCs to the same physical place. The ABC which owns the customer name would have the necessary web code to render it in the e-commerce front-end and in the shipping back-end for printing on labels. This would mean that practically every screen in any UI is a composite of widgets owned by their respective ABCs.

This is ultimately what keeps the complexity of each ABC’s code base to a minimum.

But why not just use CQRS as the top-level pattern? ABCs are weird.

Imagine trying to create a single denormalized view model for the entire Amazon.com product page – product name, price, inventory, editorial review, customer comments, other products that customers viewed, other products that customers bought, etc.

Pretty complex, right?

How much duplication would you have for the page shown after you add an item to a cart? Once again, you need to show other products that customers bought, their names, images, prices, and inventory.

And then on the home page – items you might be interested in, names, images, prices.

And that’s only in the front-end system.

It’s not just the duplication, but how complex the code is for each one.

Instead of the duplication that top-level CQRS would bring you, consider an ABC responsible for product names and images that has just about the same view model composed on each of the above screens. The same with another ABC responsible for price.

You may be thinking that this would result in more queries to get the data to show on a page, and you’d be right. But it isn’t necessarily a classical N+1 Select problem, as the queries are bounded to the number of ABCs. Secondly, consider the ability to have well-tuned caching at the granularity of an ABC – something that would be much more difficult when dealing with everything as a single monolithic view model. In short, not only will it not be a performance problem, often it will actually improve performance.
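A rough sketch of why the query count stays bounded: if each ABC renders its own fragments for all the products on a page in one batched call, the number of queries equals the number of ABCs on the page, not the number of products. The class names, the sample data, and the `render` method below are assumptions made up for this illustration; the page shell only stitches fragments together and never looks inside them.

```python
class ProductNameABC:
    """Hypothetical ABC owning product names (and, in reality, images)."""

    def __init__(self):
        self._names = {1: "Kettle", 2: "Toaster", 3: "Blender"}
        self.queries = 0

    def render(self, product_ids):
        self.queries += 1  # one batched query per page, not one per product
        return {pid: f"<h2>{self._names[pid]}</h2>" for pid in product_ids}


class PriceABC:
    """Hypothetical ABC owning prices."""

    def __init__(self):
        self._prices = {1: 25.0, 2: 40.0, 3: 60.0}
        self.queries = 0

    def render(self, product_ids):
        self.queries += 1
        return {pid: f"<span>${self._prices[pid]:.2f}</span>" for pid in product_ids}


def compose(product_ids, widgets):
    """Page shell: asks each ABC once, then concatenates opaque fragments."""
    fragments = [widget.render(product_ids) for widget in widgets]
    return ["".join(f[pid] for f in fragments) for pid in product_ids]


names, prices = ProductNameABC(), PriceABC()
page = compose([1, 2, 3], [names, prices])
print(page[0])                        # <h2>Kettle</h2><span>$25.00</span>
print(names.queries, prices.queries)  # 1 1
```

Because each `render` call is self-contained, each ABC is also free to cache its fragments on whatever policy suits its own data.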

OK – that explains “everywhere”, what about “forever”?

Forever is where things get interesting – or more accurately, when they get interesting.

Let’s talk about things like invoices.

One of the requirements in this area is immutability. If the customer’s name was Jane Smith when they made their purchase, it doesn’t matter that they’ve since changed their name to Jane Jones, the invoice should still show Jane Smith.

Often developers push these types of requirements on the data warehouse guys – that’s where history gets handled. The only thing is that if your ABC owns the customer’s name, then no other code base can deal with it. If it’s your data, you have to handle all historical representations of it.

On the one hand, this would seem to kill the data warehouse. On the other hand, it means that the principles of data warehouses are now core to every code-base.

This means you don’t ever delete data (see my previous blog post on the subject), and you definitely don’t overwrite it with an update – even if you think you’re in a simple CRUD domain. The only case where you can get away with traditional CRUD is if we’re talking about private data – data that is only ever acted on by a single actor.

This sounds like the collaboration you talk about with CQRS

It’s similar in principle but different in practice.

In a collaborative domain, an inherent property of the domain is that multiple actors operate in parallel on the same set of data. A reservation system for concerts would be a good example of a collaborative domain – everyone wants the “good seats” (although it might be better to call that competitive rather than collaborative, it is effectively the same principle).

A customer’s name would not fall under that category. It isn’t an inherent property of the domain for multiple actors to operate on that data. While there can be multiple readers, one can easily enforce a single writer without any adverse effects. Doing that with a reservation system would cause the online system to behave as if users were lining up in front of a box office – not a desirable outcome.

Private data would be something like a user’s shopping cart. Until they make a purchase, that data doesn’t need to be visible anywhere. Here you could theoretically do simple CRUD – that is, until the business realizes that there’s extremely valuable information to be extracted from the historical record of things people do with their carts.

I think you’re ready to make your point, so just make it already

OK – so we now realize that Update and Delete don’t exist in their traditional form. Delete is really just a kind of update, and update is effectively an “upsert” – a combination of update and insert to retain history. This can be done by having ValidFrom and ValidTo columns for our data.

In which case, Create is really just a special case of Upsert, which looks like this:

UPDATE Something SET ValidTo = NOW() WHERE Id=@Id AND ValidTo IS NULL;
INSERT INTO Something SET { regular values }, Id=@Id, ValidFrom = NOW(), ValidTo = NULL;

And then we’d have 2 forms of Read – reading the current state (ValidTo IS NULL), and reading history (ValidFrom <= Instant AND (ValidTo >= Instant OR ValidTo IS NULL))
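As a runnable sketch of the upsert and the two kinds of read, here is the same idea against an in-memory SQLite database. The table layout, the `Name` column, and the integer timestamps are assumptions chosen to keep the example small. One deliberate deviation: the history query uses `ValidTo > instant` rather than `>=`, so that exactly one row matches at the instant a value changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Something (Id INTEGER, Name TEXT, ValidFrom INTEGER, ValidTo INTEGER)"
)


def upsert(conn, id_, name, now):
    """Close the current row (if any), then insert the new version."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE Something SET ValidTo = ? WHERE Id = ? AND ValidTo IS NULL",
            (now, id_),
        )
        conn.execute(
            "INSERT INTO Something (Id, Name, ValidFrom, ValidTo) VALUES (?, ?, ?, NULL)",
            (id_, name, now),
        )


def read_current(conn, id_):
    row = conn.execute(
        "SELECT Name FROM Something WHERE Id = ? AND ValidTo IS NULL", (id_,)
    ).fetchone()
    return row[0] if row else None


def read_as_of(conn, id_, instant):
    row = conn.execute(
        "SELECT Name FROM Something WHERE Id = ? AND ValidFrom <= ? "
        "AND (ValidTo > ? OR ValidTo IS NULL)",
        (id_, instant, instant),
    ).fetchone()
    return row[0] if row else None


upsert(conn, 1, "Jane Smith", now=100)  # create is just the first upsert
upsert(conn, 1, "Jane Jones", now=200)  # the "update" keeps the old row
print(read_current(conn, 1))     # Jane Jones
print(read_as_of(conn, 1, 150))  # Jane Smith
```

Nothing is ever overwritten or deleted: after the two upserts the table holds both rows, which is what lets an invoice from time 150 keep showing the old name.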

Here we don’t need fancy N-Tier architectures, data transfer objects, service layers, or domain models. A simple 2-Tier approach could probably suffice. We don’t need a task-based UI, events, denormalized view models, or any of that CQRS stuff. This was at the crux of my previous anti-CQRS post.

The only thing is that this is exactly CQRS.

Say what?

Have we not effectively separated the responsibility of commands/upserts and queries/reads?

As Greg Young has said before, “the creation of 2 objects where there previously was one”.

Effectively 2 paths through our ABC.

CQRS.

Let me give you a second to gather your thoughts.

You see, CQRS is an approach, a mind-set – not a cookie cutter solution. Frameworks that guide you to applying CQRS exactly the same way everywhere are taking you in the wrong direction. The fact is that you couldn’t possibly know what your Aggregate Roots were before you figured out how to break your system down into ABCs. Attempting to create commands and events for everything will make you overcomplicate your solution.

So the built-in history of this model is event-sourcing?

Well, it’s not event-sourcing in the sense that we don’t necessarily have events. It achieves many of the benefits of event-sourcing by giving us the full history of what happened.

On the whole issue of replaying events to fix bugs – that’s a bit problematic, logically, unless we have a closed system. A closed system is one that doesn’t interact with anything else – no other systems, no users, nothing. As such, closed systems aren’t that common.

In an open system, one with users, let’s say there was a bug. This bug could have caused the wrong data to be written and/or shown to users. As such, users could have submitted subsequent commands based on that erroneous data that they would not have submitted otherwise. There’s no way for us to know.

The problem with replaying events when we fix the bug is that we’re in essence rewriting history – making it as if the user didn’t see the wrong data. The only problem is that we can’t know which events not to replay – we can’t automatically come up with the right events that should have come afterwards. We could try to sit together with our users and have them try to revise history manually, but our organization often isn’t in a bubble. Our users interacted with customers and suppliers. It isn’t feasible to try to undo the real-world impacts of this situation.

Why didn’t you just tell us this from the very beginning?

I did, you just weren’t listening.

You wanted a cookie cutter, and until you tried CQRS out as cookie cutter (and saw it create a bunch of complexity) you wouldn’t listen to anything else.

As developers, we’re trained to solve problems – the faster the better. Unfortunately, this causes us to be blind to things that don’t immediately present themselves as solutions.

When applying CQRS with ABCs, the solutions you end up with are very simple, but the process of getting there is quite hard and takes practice. Finding the boundaries of ABCs such that data isn’t duplicated between them and that data doesn’t travel between them either via RPC or publish/subscribe – it may feel impossible the first several times you try. Keep at it – it is almost always possible.

We haven’t touched on the whole saga/aggregate-root thing yet, but that isn’t as important until you can successfully apply the principles described here.

Also, this post has already gotten long enough, so it looks like now would be a good time to stop.

Until next time…

Inconsistent data, poor performance, or SOA – pick one

udidahan — Sun, 18 Sep 2011 16:52:17 +0000

One of the things that surprises some developers that I talk to is that you don’t always get consistency even with end-to-end synchronous communication and a single database. This goes beyond things like isolation levels that some developers are aware of and is particularly significant in multi-user collaborative domains.

The problem

Let’s start with an image to describe the scenario:

Image 1. 3 transactions working in parallel on 3 entities

The main issue we have here is that the values transaction 2 gets for A and B are those from T0 – before either transaction 1 or 3 completed. The reason this is an issue is that these old values (usually together with some message data) are used to calculate what the new state of C should be.

Traditional optimistic concurrency techniques won’t detect any problem if we don’t touch A or B in transaction 2.
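The scenario can be reproduced with a toy versioned store. This is a simulation under stated assumptions, not a claim about how any particular database implements its isolation levels: the `Store` class, the version counters, and the `reads_still_current` check are all invented for the illustration.

```python
class Store:
    """Toy versioned store; each entity has a value and a version counter."""

    def __init__(self, **values):
        self.rows = {k: {"value": v, "version": 1} for k, v in values.items()}

    def read(self, key):
        row = self.rows[key]
        return row["value"], row["version"]

    def write(self, key, value, expected_version):
        # Traditional optimistic concurrency: check only the row being written.
        if self.rows[key]["version"] != expected_version:
            raise RuntimeError(f"concurrency conflict on {key}")
        self.rows[key] = {"value": value, "version": expected_version + 1}


store = Store(A=10, B=20, C=0)

# Transaction 2 starts: reads A and B (inputs) and C (the entity it will write).
a, a_ver = store.read("A")
b, b_ver = store.read("B")
_, c_ver = store.read("C")

# Transactions 1 and 3 commit in the meantime, changing A and B.
store.write("A", 100, expected_version=1)
store.write("B", 200, expected_version=1)

# Transaction 2 commits C from stale inputs; the check on C alone passes.
store.write("C", a + b, expected_version=c_ver)
print(store.read("C")[0])  # 30, although A + B is now 300


def reads_still_current(store, read_set):
    """Stricter check: every entity that was read must be unchanged at commit."""
    return all(store.read(key)[1] == version for key, version in read_set.items())


print(reads_still_current(store, {"A": a_ver, "B": b_ver}))  # False
```

The write to C succeeds because nothing else touched C; only a check that also covers the entities that were read would have flagged the stale inputs.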

In short, systems today are causing inconsistency.

Some solutions

1. Don’t have transactions which operate on multiple entities (which probably isn’t possible for some of your most important business logic).

2. Turn on multi-version concurrency control – this is called snapshot isolation in MS SQL Server.

Yes, you need to turn it on. It’s off by default.

The good news is that this will stop the writing of inconsistent data to your database.
The bad news is that it will probably cause your system many more exceptions when going to persist.

For those of you who are using transactional messaging with automatic retrying, this will end up as “just” a performance problem (unless you follow the recommendations below). For those of you who are using regular web/wcf services (over tcp/http), your “cross cutting” exception management will likely end up discarding all the data submitted in those requests (but since that’s what you’re doing when you run into deadlocks this shouldn’t be news to you).

The solution to the performance issues

Eventual consistency.

Funny isn’t it – all those people who were afraid of eventual consistency got inconsistency instead.

Also, it’s not enough to just have eventual consistency (like between the command and query sides of CQRS). You need to drastically decrease the size of your entities. And the best way of doing that is to partition those entities across multiple business services (also known in DDD lingo as Bounded Contexts) each with its own database.

This is yet another reason why I say that CQRS shouldn’t be the top level architectural breakdown. Very useful within a given business service, yes – though sometimes as small as just some sagas.

Next steps

It may seem unusual that the title of this post implies that SOA is the solution, yet the content clearly states that traditional HTTP-based web services are a problem. Even REST wouldn’t change matters as it doesn’t influence how transactions are managed against a database.

The SOA solution I’m talking about here is the one I’ve spent the last several years blogging about. It’s a different style of SOA which has services stretch up to contain parts of the UI as well as down to contain parts of the database, resulting in a composite UI and multiple databases. This is a drastically different approach than much of the literature on the topic – especially Thomas Erl’s books.

Unfortunately there isn’t a book out there with all of this in it (that I’ve found), and I’m afraid that with my schedule (and family) writing a book is pretty much out of the question. Let’s face it – I’m barely finding time to blog.

The one thing I’m trying to do more of is provide training on these topics. I’ve just finished a course in London, am doing another this week in Aarhus, Denmark, and another next month in San Francisco (which is now sold out). The next openings this year will be in Stockholm and London; Sydney, Australia and Austin, Texas will be coming in January of next year. I’ll be coming over to the US more next year so if you missed San Francisco, keep an eye out.

I wish there was more I could do, but I’m only one guy.

Hmm, maybe it’s time to change that.

The Danger of Centralized Workflows

udidahan — Wed, 13 Jul 2011 08:05:19 +0000

It isn’t uncommon for me to have a client or student at one of my courses ask me about some kind of workflow tool. This could be Microsoft Workflow Foundation, BizTalk, K2, or some kind of BPEL/orchestration engine. The question usually revolves around using this tool for all workflows in the system as opposed to the SOA-EDA-style publish/subscribe approach I espouse.

The question

The main touted benefit of these workflow-centric architectures is that we don’t have to change the code of the system in order to change its behavior, resulting in ultimate flexibility!

Some of you may have already gone down this path and are shaking your heads remembering how your particular road to hell was paved with the exact same good intentions.

Let me explain why these things tend to go horribly wrong.

What’s behind the curtain

It starts with the very nature of workflow – a flow chart is procedural in nature. First do this, then that, if this, then that, etc. As we’ve experienced first hand in our industry, procedural programming is fine for smaller problems but isn’t powerful enough to handle larger problems. That’s why we’ve come up with object-oriented programming.

I have yet to see an object-oriented workflow drag-and-drop engine. Yes, it works great for simple demo-ware apps. But if you try to throw your most complex and volatile business logic at it, it will become a big tangled ball of spaghetti – just like if you were using text rather than pictures to code it.

And that’s one of the fundamental fallacies about these tools – you are still writing code. The fact that it doesn’t look like the rest of your code doesn’t change that fact. Changing the definition of your workflow in the tool IS changing your code.

On productivity

Sometimes people mention how much more productive it would be to use these tools than to write the code “by hand”. Occasionally I hear about an attempt to have “the business” use these tools to change the workflows themselves – without the involvement of developers (“imagine how much faster we could go without those pesky developers!”).

For those of us who have experienced this first-hand, we know that’s all wrong.

If “the business” is changing the workflows without developer involvement, invariably something breaks, and then they don’t know what to do. They haven’t been trained to think the way that developers have – they don’t really know how to debug. So the developers are brought back in anyway and from that point on, the business is once again giving requirements and the devs are the ones implementing them.

Now when it comes to developer productivity, I can tell you that the keyboard is at least 10x more productive than the mouse. I can bang out an if statement in code much faster than draggy-dropping a diamond on the canvas, and two other activities for each side of the clause.

On maintainability

Sometimes the visualization of the workflow is presented as being much more maintainable than “regular code”.

When these workflows get to be too big/nested/reused, they end up looking like the wiring diagram of an Intel chip (or worse). Check out the following diagram taken from the DailyWTF on a customer friendly system:

The bigger these get, the less maintainable they are.

Now, some would push back on this saying that a method with 10,000 lines of code in it may be just as bad, if not worse. The thing is that these workflow tools guide developers down a path where it is very likely to end up with big, monolithic, procedural, nested code. When working in real code, we know we need to take responsibility for the cleanliness of our code using object-orientation, patterns, etc and refactoring things when they get too messy.

Here is where I’d bring up the SOA/pub-sub approach as an alternative – there is no longer this idea of a centralized anything. You have small pieces of code, each encapsulating a single business responsibility, working in concert with each other – reacting to each other’s events.

Productivity take 2: testing and version control

If you’re going to take your most complex and volatile business logic and put it into these workflow tools, have you thought about how you’re going to test it? How do you know that it works correctly? It tends to be VERY difficult to unit-test these kinds of workflows.

When a developer is implementing a change request, how do they know what other workflows might have been broken? Do they have to manually go through each and every scenario in the system to find out? How’s that for productivity?

Assuming something did break and the developer wants to see a diff – what’s different in the new workflow from the old one, what would that look like? When working with a team, the ability to diff and merge code is at the base of the overall team productivity.

What would happen to your team if you couldn’t diff or merge code anymore?
In this day and age, it should be considered irresponsible to develop without these version control basics.

In closing

There are some cases where these tools might make sense, but those tend to be much more rare than you’d expect (and there are usually better alternatives anyway). Regardless, the architectural analysis should start without the assumption of centralized workflow, database, or centralized anything for that matter.

If someone tries to push one of these tools/architectures on you, don’t walk away – run!

Service Boundaries Aren’t Process Boundaries

udidahan — Sun, 03 Jul 2011 12:23:37 +0000

Richard Veryard blogged about the topic of service boundaries in SOA, specifically asking why more people aren’t talking about service boundaries – especially if they’re such a core principle in SOA.

I can only speak for myself on this one, but I guess it’s that there are only so many times you can repeat yourself.

So, why this post?

Well, Richard was able to dig up an old (2004) presentation I gave about SOA in which I said:

“Services run in a separate process from their clients
A boundary must be crossed to get from the client to the service – network, security, …”

And 7 years later I can say, hand on heart, I was wrong.

Luckily, I’ve spent much of those past 7 years trying to correct that recommendation. One blog post in which I tried to do that (in mid-2007) was On Intermediation and SOA, in which I described the relationship between systems (i.e. process boundaries) and services:

“all of these “systems” might just end up within the same service, or having parts of them being used by multiple services

There can also be multiple services (or, more accurately, parts of multiple services) deployed together in the same system/process.

And this is nothing new – in the 4+1 Architectural View Model by Philippe Kruchten (1995) we can see very clearly the differentiation between the Logical View (our services) and the Physical View (a.k.a the Deployment View).

These views are orthogonal to each other – multiple elements from one view can map to a single element in another view (and vice versa).

This, if anything, makes it that much harder to identify service boundaries – if they have nothing to do with the existing applications and systems, then what are they? In my blog post on The Known Unknowns of SOA I point to the fact that Business Capabilities are much more appropriate constructs than, say, web services which (as it says in the referenced post) “[are] merely a standardized approach to accessing functionality on remote systems”.

As I bring this post to a close, I’m feeling more comfortable rehashing material I’ve published before:

Logical and Physical Architecture

and the rest of the SOA category on my blog here.

Happy boundary hunting.

The Known Unknowns of SOA

udidahan — Mon, 15 Nov 2010 13:44:40 +0000

One of the better known analysts in the enterprise software area, JP Morgenthal, wrote this post about the relationship between SOA, BPM, and EA. In it he defines SOA as follows:

“SOA is a practice that focuses on modeling the entities, and relationships between entities, that comprise the business as a set of services. This can be done on a small or large scale. Typically, the relationships in this model represent consumer/provider relationships.”

I have some serious concerns about the ramifications of this definition/description.

First of all, when reading “entities”, many people will interpret that to mean the entities found in Entity Relationship Diagrams [ERD] or in Object Oriented Analysis & Design [OOAD]. In both, these entities are identified as the “nouns” of the domain. Examples of these ERD/OOAD-type entities include things like Customer, Order, and Product.

These are almost always the wrong place to start for identifying services in SOA.

Second, on the consumer/provider relationship: on the one hand, this fits very well with how web services can consume (or call) other web services. However, the downsides of using web services as services in SOA are becoming well enough known that even in the same post we see this warning:

“Web Services is not SOA, it is merely a standardized approach to accessing functionality on remote systems.”

But the question remains, if a producer/consumer relationship is OK for SOA-type services, why doesn’t that hold for web services? And the answer is… it depends on the type of producer/consumer relationship. The typical relationship is one of synchronous calls from consumer to producer, and this is not OK for SOA-type services either.

You see, this synchronous producer/consumer implies a model where services are not able to fulfill their objectives without calling other services. In order for us to achieve the IT/Business alignment promised by SOA, we need services which are autonomous, i.e. able to fulfill their objectives without that kind of external help.

Instead, we need to look for a more loosely coupled producer/consumer relationship – like publish/subscribe, where the producer emits events, and the consumer subscribes and handles those events. The reason that this kind of relationship doesn’t hurt autonomy is that it disconnects services on the dimension of time. In order for a service to be able to make a decision autonomously without synchronously calling any other service, using only information provided by events it received in the past, it must be strongly aligned with the business.
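To make the difference from a synchronous call concrete, here is a minimal in-process sketch of publish/subscribe. Everything in it is an assumption made for illustration: the `Bus` class, the `CustomerBecameDelinquent` event and the `CustomerCareService` are hypothetical names, and a real bus would be durable and asynchronous rather than a dictionary of callbacks.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerBecameDelinquent:
    """Hypothetical event published by a billing service."""
    customer_id: int


class Bus:
    """Toy in-process bus, standing in for real messaging infrastructure."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event):
        # The publisher does not know or care who is listening.
        for handler in self._handlers[type(event)]:
            handler(event)


class CustomerCareService:
    """Keeps its own copy of the facts it needs, fed only by past events."""

    def __init__(self, bus):
        self._delinquent = set()
        bus.subscribe(CustomerBecameDelinquent, self._on_delinquent)

    def _on_delinquent(self, event):
        self._delinquent.add(event.customer_id)

    def can_make_preferred(self, customer_id):
        # Decided locally, with no synchronous call to billing.
        return customer_id not in self._delinquent


bus = Bus()
care = CustomerCareService(bus)
bus.publish(CustomerBecameDelinquent(customer_id=42))
```

The point to notice is that `can_make_preferred` never calls out to the publisher; it answers from what it has already been told.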

Most projects which bandy about the SOA acronym aren’t actually made up of services – they’re made up of XML over HTTP functions calling other XML over HTTP functions, eventually calling XML over HTTP databases. You can layer as much XML and HTTP as you want on top of it, but at the end of the day, most projects are just functions calling functions calling databases – in other words, procedural programming in the large, and no amount of SOAP will wash away the stink.

Here’s a different definition of services for SOA that may communicate a bit better what it’s all about:

A service is the technical authority for a specific business capability.
Any piece of data or rule must be owned by only one service.

What this means is that even when services are publishing and subscribing to each other’s events, we always know what the authoritative source of truth is for every piece of data and rule.

Also, when looking at services from the lens of business capabilities, what we see is that many user interfaces present information belonging to different capabilities – a product’s price alongside whether or not it’s in stock. In order for us to comply with the above definition of services, this leads us to an understanding that such user interfaces are actually a mashup – with each service having the fragment of the UI dealing with its particular data.

Ultimately, process boundaries like web apps, back-end, batch-processing are very poor indicators of service boundaries. We’d expect to see multiple business capabilities manifested in each of those processes.

I know that this may be more confusing than the traditional web services approach but, to paraphrase Donald Rumsfeld, it is better to know that you don’t know, than to not know that you don’t know.

Logical and Physical Architecture

udidahan — Mon, 08 Nov 2010 19:14:04 +0000

One architectural misunderstanding I see repeatedly in my work with clients is in the relationship between logical and physical architecture. The most common building-block of these misunderstandings is the web service (or its “upgraded” .NET counterpart – the WCF service).

Don’t get me wrong, sometimes there is a place for a web service, just not everywhere.

So, what’s the problem?

Well, when developers and architects use web services as the building blocks of their designs, they are creating the same architecture for both the logical and physical elements of their system. Back in 1995, Philippe Kruchten documented his 4 + 1 Architectural View Model in which he outlined 4 + 1 different views that should be used to describe an architecture.

Even though since 1995 the number and types of recommended views of software architecture have evolved (with things like the Zachman Framework for enterprise architecture numbering some 30 views), there is broad agreement that (at the very least) the logical and physical artifacts should likely be designed differently.

Just because two distinct logical components have been identified in the architecture, that doesn’t necessarily mean they should be hosted separately (for example by making each one a web/WCF service). In fact, there are significant disadvantages to doing so (as described in the Fallacies of Distributed Computing).

In some cases, this mistake is exacerbated by mistaking these components for SOA-type services, resulting in an attempt by developers to have each component have its own contract, which can then be independently versioned. This often results in the need for transformation between the structure of these so-called contracts, but not within the components themselves (oh-no, they’re “autonomous”), but rather in between them using some kind of “ESB” technology.

This architectural style is known as the Broker, Hub and Spoke, Mediator, and most importantly – not SOA. If you find a technology that fits this style perfectly (like BizTalk), that technology is not a Bus, not a Service Bus, and definitely not an Enterprise Service Bus.

One of the problems of this approach is that when any “service” contract changes, you have to change all the transformations in your broker that involve it. Unfortunately, most brokers have no unit-testing facility so it’s very much trial and error, and error, and error. The matter is even more serious since most brokers don’t enable you to have your transformations or orchestrations in source control, so you can’t diff to see what changed from the previous version.

It’s really amazing how much pain can be traced back to that one original misunderstanding.

Clarified CQRS

udidahan — Wed, 09 Dec 2009 14:57:19 +0000

After listening to how the community has interpreted Command-Query Responsibility Segregation I think that the time has come for some clarification. Some have been tying it together to Event Sourcing. Most have been overlaying their previous layered architecture assumptions on it. Here I hope to identify CQRS itself, and describe in which places it can connect to other patterns.

Download as PDF – this is quite a long post.

Why CQRS

Before describing the details of CQRS we need to understand the two main driving forces behind it: collaboration and staleness.

Collaboration refers to circumstances under which multiple actors will be using/modifying the same set of data – whether or not the intention of the actors is actually to collaborate with each other. There are often rules which indicate which user can perform which kind of modification and modifications that may have been acceptable in one case may not be acceptable in others. We’ll give some examples shortly. Actors can be human like normal users, or automated like software.

Staleness refers to the fact that in a collaborative environment, once data has been shown to a user, that same data may have been changed by another actor – it is stale. Almost any system which makes use of a cache is serving stale data – often for performance reasons. What this means is that we cannot entirely trust our users’ decisions, as they could have been made based on out-of-date information.

Standard layered architectures don’t explicitly deal with either of these issues. While putting everything in the same database may be one step in the direction of handling collaboration, staleness is usually exacerbated in those architectures by the use of caches as a performance-improving afterthought.

A picture for reference

I’ve given some talks about CQRS using this diagram to explain it:

The boxes named AC are Autonomous Components. We’ll describe what makes them autonomous when discussing commands. But before we go into the complicated parts, let’s start with queries:

Queries

If the data we’re going to be showing users is stale anyway, is it really necessary to go to the master database and get it from there? Why transform those 3rd normal form structures to domain objects if we just want data – not any rule-preserving behaviors? Why transform those domain objects to DTOs to transfer them across a wire, and who said that wire has to be exactly there? Why transform those DTOs to view model objects?

In short, it looks like we’re doing a heck of a lot of unnecessary work based on the assumption that reusing code that has already been written will be easier than just solving the problem at hand. Let’s try a different approach:

How about we create an additional data store whose data can be a bit out of sync with the master database – I mean, the data we’re showing the user is stale anyway, so why not reflect that in the data store itself? We’ll come up with an approach later to keep this data store more or less in sync.

Now, what would be the correct structure for this data store? How about just like the view model? One table for each view. Then our client could simply SELECT * FROM MyViewTable (or possibly pass in an ID in a where clause), and bind the result to the screen. That would be just as simple as can be. You could wrap that up with a thin facade if you feel the need, or with stored procedures, or using AutoMapper which can simply map from a data reader to your view model class. The thing is that the view model structures are already wire-friendly, so you don’t need to transform them to anything else.
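As a rough sketch of how little code the query side needs under this approach, here is a table-per-view read using Python’s built-in sqlite3 module. The table name `CustomerDetailsView`, its columns and the sample row are made up for the example; they are not taken from any real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be read by column name

# One denormalized table per screen; names are illustrative only.
conn.execute(
    "CREATE TABLE CustomerDetailsView ("
    "customer_id INTEGER PRIMARY KEY, display_name TEXT, status TEXT)"
)
conn.execute(
    "INSERT INTO CustomerDetailsView VALUES (?, ?, ?)",
    (42, "Ms Jane Doe", "Regular"),
)


def load_customer_details(customer_id):
    """Return the view model for one customer, or None if it is not there yet."""
    row = conn.execute(
        "SELECT * FROM CustomerDetailsView WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    # The row already has the shape of the screen, so there is nothing to map.
    return dict(row) if row is not None else None
```

There are no domain objects and no DTOs between the table and the screen; the dictionary that comes back is the view model.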

You could even consider taking that data store and putting it in your web tier. It’s just as secure as an in-memory cache in your web tier. Give your web servers SELECT only permissions on those tables and you should be fine.

Query Data Storage

While you can use a regular database as your query data store it isn’t the only option. Consider that the query schema is in essence identical to your view model. You don’t have any relationships between your various view model classes, so you shouldn’t need any relationships between the tables in the query data store.

So do you actually need a relational database?

The answer is no, but for all practical purposes and due to organizational inertia, it is probably your best choice (for now).

Scaling Queries

Since your queries are now being performed against a data store separate from your master database, and there is no assumption that the data that’s being served is 100% up to date, you can easily add more instances of these stores without worrying that they don’t contain the exact same data. The same mechanism that updates one instance can be used for many instances, as we’ll see later.

This gives you cheap horizontal scaling for your queries. Also, since you’re not doing nearly as much transformation, the latency per query goes down as well. Simple code is fast code.

Data modifications

Since our users are making decisions based on stale data, we need to be more discerning about which things we let through. Here’s a scenario explaining why:

Let’s say we have a customer service representative who is on the phone with a customer. This user is looking at the customer’s details on the screen and wants to make them a ‘preferred’ customer, as well as modifying their address, changing their title from Ms to Mrs, changing their last name, and indicating that they’re now married. What the user doesn’t know is that after opening the screen, an event arrived from the billing department indicating that this same customer doesn’t pay their bills – they’re delinquent. At this point, our user submits their changes.

Should we accept their changes?

Well, we should accept some of them, but not the change to ‘preferred’, since the customer is delinquent. But writing those kinds of checks is a pain – we need to do a diff on the data, infer what the changes mean, which ones are related to each other (name change, title change) and which are separate, identify which data to check against – not just compared to the data the user retrieved, but compared to the current state in the database, and then reject or accept.

Unfortunately for our users, we tend to reject the whole thing if any part of it is off. At that point, our users have to refresh their screen to get the up-to-date data, and retype in all the previous changes, hoping that this time we won’t yell at them because of an optimistic concurrency conflict.

As we get larger entities with more fields on them, we also get more actors working with those same entities, and the higher the likelihood that something will touch some attribute of them at any given time, increasing the number of concurrency conflicts.

If only there was some way for our users to provide us with the right level of granularity and intent when modifying data. That’s what commands are all about.

Commands

A core element of CQRS is rethinking the design of the user interface to enable us to capture our users’ intent such that making a customer preferred is a different unit of work for the user than indicating that the customer has moved or that they’ve gotten married. Using an Excel-like UI for data changes doesn’t capture intent, as we saw above.
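One way to picture the customer service scenario with commands instead of a single “save customer” call is sketched below. The three command classes and the `process` function are hypothetical, chosen only to show that each unit of intent can be accepted or rejected by itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MakeCustomerPreferred:
    customer_id: int


@dataclass(frozen=True)
class ChangeCustomerAddress:
    customer_id: int
    new_address: str


@dataclass(frozen=True)
class RegisterCustomerMarriage:
    customer_id: int
    new_title: str
    new_last_name: str


def process(command, delinquent_ids):
    """Accept (True) or reject (False) one command on its own merits."""
    if isinstance(command, MakeCustomerPreferred):
        return command.customer_id not in delinquent_ids
    # Moving house or getting married is not affected by billing status.
    return True


submitted = [
    MakeCustomerPreferred(42),
    ChangeCustomerAddress(42, "1 New Street"),
    RegisterCustomerMarriage(42, "Mrs", "Smith"),
]
results = [process(command, delinquent_ids={42}) for command in submitted]
```

Only the first command is rejected. No diffing of old and new field values is needed to work out what the user meant, because the user told us.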

We could even consider allowing our users to submit a new command even before they’ve received confirmation on the previous one. We could have a little widget on the side showing the user their pending commands, checking them off asynchronously as we receive confirmation from the server, or marking them with an X if they fail. The user could then double-click that failed task to find information about what happened.

Note that the client sends commands to the server – it doesn’t publish them. Publishing is reserved for events which state a fact – that something has happened, and that the publisher has no concern about what receivers of that event do with it.

Commands and Validation

In thinking through what could make a command fail, one topic that comes up is validation. Validation is different from business rules in that it states a context-independent fact about a command. Either a command is valid, or it isn’t. Business rules on the other hand are context dependent.

In the example we saw before, the data our customer service rep submitted was valid; it was only the billing event arriving earlier that required the command to be rejected. Had that billing event not arrived, the data would have been accepted.

Even though a command may be valid, there still may be reasons to reject it.

As such, validation can be performed on the client, checking that all fields required for that command are there, number and date ranges are OK, that kind of thing. The server would still validate all commands that arrive, not trusting clients to do the validation.
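The split between validation and business rules might look something like this. The `discount_percent` field is invented purely so that there is a range to check, and the three string outcomes are an assumption of the sketch, not a prescribed protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MakeCustomerPreferred:
    customer_id: int
    discount_percent: int  # hypothetical field, here to have a range to check


def validation_errors(command):
    """Context-independent checks: the same answer on client and server."""
    errors = []
    if command.customer_id <= 0:
        errors.append("customer_id must be positive")
    if not 0 <= command.discount_percent <= 100:
        errors.append("discount_percent must be between 0 and 100")
    return errors


def handle(command, delinquent_ids):
    """Server side: validate again, then apply the context-dependent rule."""
    if validation_errors(command):
        return "invalid"
    if command.customer_id in delinquent_ids:
        return "rejected"  # valid command, but current state says no
    return "accepted"
```

The same `validation_errors` function can run on the client before sending and on the server on arrival, since it needs nothing but the command itself. The business rule needs server-side state, which is why a valid command can still be rejected.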

Rethinking UIs and commands in light of validation

The client can make use of the query data store when validating commands. For example, before submitting a command that the customer has moved, we can check that the street name exists in the query data store.

At that point, we may rethink the UI and have an auto-completing text box for the street name, thus ensuring that the street name we’ll pass in the command will be valid. But why not take things a step further? Why not pass in the street ID instead of its name? Have the command represent the street not as a string, but as an ID (int, guid, whatever).

On the server side, the only reason that such a command would fail would be due to concurrency – that someone had deleted that street and that that hadn’t been reflected in the query store yet; a fairly exceptional set of circumstances.

Reasons valid commands fail and what to do about it

So we’ve got a well-behaved client that is sending valid commands, yet the server still decides to reject them. Often the circumstances for the rejection are related to other actors changing state relevant to the processing of that command.

In the CRM example above, it is only because the billing event arrived first. But “first” could be a millisecond before our command. What if our user pressed the button a millisecond earlier? Should that actually change the business outcome? Shouldn’t we expect our system to behave the same when observed from the outside?

So, if the billing event arrived second, shouldn’t that revert preferred customers to regular ones? Not only that, but shouldn’t the customer be notified of this, like by sending them an email? In which case, why not have this be the behavior for the case where the billing event arrives first? And if we’ve already got a notification model set up, do we really need to return an error to the customer service rep? I mean, it’s not like they can do anything about it other than notifying the customer.
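Here is a small sketch of what “behave the same regardless of order” could mean in code. The `Customer` class and its two methods are hypothetical, and the list of emails stands in for a real notification mechanism.

```python
class Customer:
    def __init__(self):
        self.preferred = False
        self.delinquent = False
        self.emails = []  # stand-in for a real notification channel

    def make_preferred(self):
        """Command from the customer service rep."""
        if self.delinquent:
            self.emails.append("Preferred status could not be granted")
        else:
            self.preferred = True

    def mark_delinquent(self):
        """Event from billing."""
        self.delinquent = True
        if self.preferred:
            self.preferred = False
            self.emails.append("Preferred status has been withdrawn")


def outcome(steps):
    """Apply the named steps in order and summarize the final state."""
    customer = Customer()
    for step in steps:
        getattr(customer, step)()
    return customer.preferred, customer.delinquent, len(customer.emails)


billing_first = outcome(["mark_delinquent", "make_preferred"])
command_first = outcome(["make_preferred", "mark_delinquent"])
```

Either way the customer ends up delinquent, not preferred, and with one email explaining why. The wording of the email differs, but no error ever has to travel back to the rep’s screen.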

So, if we’re not returning errors to the client (who is already sending us valid commands), maybe all we need to do on the client when sending a command is to tell the user “thank you, you will receive confirmation via email shortly”. We don’t even need the UI widget showing pending commands.

Commands and Autonomy

What we see is that in this model, commands don’t need to be processed immediately – they can be queued. How fast they get processed is a question of Service-Level Agreement (SLA) and not architecturally significant. This is one of the things that makes that node that processes commands autonomous from a runtime perspective – we don’t require an always-on connection to the client.

Also, we shouldn’t need to access the query store to process commands – any state that is needed should be managed by the autonomous component – that’s part of the meaning of autonomy.

Another part is the issue of failed message processing due to the database being down or hitting a deadlock. There is no reason that such errors should be returned to the client – we can just rollback and try again. When an administrator brings the database back up, all the messages waiting in the queue will then be processed successfully and our users receive confirmation.
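The retry behavior can be sketched with an in-memory queue. This is a toy: `TransientError`, `drain` and the `max_attempts` cut-off are assumptions of the example, and a real message bus would do the rollback and redelivery inside a transaction rather than in a loop like this.

```python
from collections import deque


class TransientError(Exception):
    """Stands in for a deadlock or an unreachable database."""


def drain(queue, handle, max_attempts=5):
    """Process queued commands; a failed one goes back on the queue."""
    done, gave_up = [], []
    attempts = {}
    while queue:
        command = queue.popleft()
        try:
            handle(command)
            done.append(command)
        except TransientError:
            attempts[command] = attempts.get(command, 0) + 1
            if attempts[command] < max_attempts:
                queue.append(command)  # nothing is reported to the client
            else:
                gave_up.append(command)  # assumption: park it for an admin
    return done, gave_up


def make_flaky_handler(failures):
    """Build a handler that fails the first `failures` calls, then works."""
    remaining = [failures]

    def handle(command):
        if remaining[0] > 0:
            remaining[0] -= 1
            raise TransientError("database is down")

    return handle


done, gave_up = drain(deque(["cmd-a", "cmd-b"]), make_flaky_handler(2))
```

Both commands fail once while the “database” is down and succeed on the next pass, and the sender never hears about the failures.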

The system as a whole is quite a bit more robust to any error conditions.

Also, since we don’t have queries going through this database any more, the database itself is able to keep more rows/pages in memory which serve commands, improving performance. When both commands and queries were being served off of the same tables, the database server was always juggling rows between the two.

Autonomous Components

While in the picture above we see all commands going to the same AC, we could logically have each command processed by a different AC, each with its own queue. That would give us visibility into which queue was the longest, letting us see very easily which part of the system was the bottleneck. While this is interesting for developers, it is critical for system administrators.

Since commands wait in queues, we can now add more processing nodes behind those queues (using the distributor with NServiceBus) so that we’re only scaling the part of the system that’s slow. No need to waste servers on any other requests.
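As a toy illustration of the visibility you get from a queue per command type, consider the following. The queue names are hypothetical and the in-memory deques stand in for real queues, where you would read the depth from the queuing system’s own counters instead.

```python
from collections import defaultdict, deque

queues = defaultdict(deque)  # one queue per command type name


def send(command_type, payload):
    queues[command_type].append(payload)


def longest_queue():
    """Return (command_type, depth) for the deepest queue, or None if all are empty."""
    depths = {name: len(q) for name, q in queues.items() if q}
    if not depths:
        return None
    name = max(depths, key=depths.get)
    return name, depths[name]


for customer_id in (1, 2, 3):
    send("MakeCustomerPreferred", customer_id)
send("ChangeCustomerAddress", 4)

bottleneck = longest_queue()
```

Whichever queue comes back from `longest_queue` is the one that deserves the next processing node.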

Service Layers

Our command processing objects in the various autonomous components actually make up our service layer. The reason you don’t see this layer explicitly represented in CQRS is that it isn’t really there, at least not as an identifiable logical collection of related objects – here’s why:

In the layered architecture (AKA 3-Tier) approach, there is no statement about dependencies between objects within a layer, or rather it is implied to be allowed. However, when taking a command-oriented view on the service layer, what we see are objects handling different types of commands. Each command is independent of the other, so why should we allow the objects which handle them to depend on each other?

Dependencies are things which should be avoided, unless there is good reason for them.

Keeping the command handling objects independent of each other will allow us to more easily version our system, one command at a time, not needing even to bring down the entire system, given that the new version is backwards compatible with the previous one.

Therefore, keep each command handler in its own VS project, or possibly even in its own solution, thus guiding developers away from introducing dependencies in the name of reuse (it’s a fallacy). If you do decide as a deployment concern, that you want to put them all in the same process feeding off of the same queue, you can ILMerge those assemblies and host them together, but understand that you will be undoing many of the benefits of your autonomous components.

Whither the domain model?

Although in the diagram above you can see the domain model beside the command-processing autonomous components, it’s actually an implementation detail. There is nothing that states that all commands must be processed by the same domain model. Arguably, you could have some commands be processed by transaction script, others using table module (AKA active record), as well as those using the domain model. Event-sourcing is another possible implementation.

Another thing to understand about the domain model is that it now isn’t used to serve queries. So the question is, why do you need to have so many relationships between entities in your domain model?

(You may want to take a second to let that sink in.)

Do we really need a collection of orders on the customer entity? In what command would we need to navigate that collection? In fact, what kind of command would need any one-to-many relationship? And if that’s the case for one-to-many, many-to-many would definitely be out as well. I mean, most commands only contain one or two IDs in them anyway.

Any aggregate operations that may have been calculated by looping over child entities could be pre-calculated and stored as properties on the parent entity. Following this process across all the entities in our domain would result in isolated entities needing nothing more than a couple of properties for the IDs of their related entities – “children” holding the parent ID, like in databases.
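For example, instead of a customer holding a collection of orders and summing over it, the totals can be kept on the customer and adjusted as each order is recorded. The class and field names below are illustrative assumptions only.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: int
    order_count: int = 0        # pre-calculated, not derived from a collection
    order_total_cents: int = 0  # likewise maintained incrementally


@dataclass(frozen=True)
class Order:
    order_id: int
    customer_id: int  # the "child" holds the parent ID, like in databases
    total_cents: int


def record_order(customer, order):
    """Update the parent's stored aggregates; no list of orders is kept."""
    if order.customer_id != customer.customer_id:
        raise ValueError("order belongs to a different customer")
    customer.order_count += 1
    customer.order_total_cents += order.total_cents


customer = Customer(customer_id=42)
record_order(customer, Order(order_id=1, customer_id=42, total_cents=1500))
record_order(customer, Order(order_id=2, customer_id=42, total_cents=2500))
```

A command that needs to know how much the customer has ordered reads two integers off one entity rather than loading and looping over children.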

In this form, commands could be entirely processed by a single entity – voilà, an aggregate root that is a consistency boundary.

Persistence for command processing

Given that the database used for command processing is not used for querying, and that most (if not all) commands contain the IDs of the rows they’re going to affect, do we really need to have a column for every single domain object property? What if we just serialized the domain entity and put it into a single column, and had another column containing the ID? This sounds quite similar to key-value storage that is available in the various cloud providers. In which case, would you really need an object-relational mapper to persist to this kind of storage?

You could also pull out an additional property per piece of data where you’d want the “database” to enforce uniqueness.
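A minimal sketch of that idea, using sqlite3 and JSON from the Python standard library: one column for the ID, one for the serialized entity, and one extra column (`email`, a hypothetical choice) pulled out only so the database can enforce uniqueness.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, "
    "email TEXT UNIQUE, "  # pulled out only so the database enforces uniqueness
    "body TEXT NOT NULL)"  # the whole entity, serialized
)


def save(customer):
    """Update the row for this ID, or insert it if it does not exist yet."""
    params = (customer["email"], json.dumps(customer), customer["id"])
    cursor = conn.execute(
        "UPDATE customers SET email = ?, body = ? WHERE id = ?", params
    )
    if cursor.rowcount == 0:
        conn.execute(
            "INSERT INTO customers (email, body, id) VALUES (?, ?, ?)", params
        )


def load(customer_id):
    """Return the deserialized entity, or None if there is no such ID."""
    row = conn.execute(
        "SELECT body FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


save({"id": 1, "email": "jane@example.com", "preferred": False})
```

Saving a second customer with the same email raises `sqlite3.IntegrityError`, so uniqueness is still enforced by the store even though the rest of the entity is opaque to it.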

I’m not suggesting that you do this in all cases – rather just trying to get you to rethink some basic assumptions.

Let me reiterate

How you process the commands is an implementation detail of CQRS.

Keeping the query store in sync

After the command-processing autonomous component has decided to accept a command, modifying its persistent store as needed, it publishes an event notifying the world about it. This event often is the “past tense” of the command submitted:

MakeCustomerPreferredCommand -> CustomerHasBeenMadePreferredEvent

The publishing of the event is done transactionally together with the processing of the command and the changes to its database. That way, any kind of failure on commit will result in the event not being sent. This is something that should be handled by default by your message bus, and if you’re using MSMQ as your underlying transport, requires the use of transactional queues.

The autonomous component which processes those events and updates the query data store is fairly simple, translating from the event structure to the persistent view model structure. I suggest having an event handler per view model class (AKA per table).
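An event handler for one view table might be as small as the sketch below. The `CustomerDetailsView` table, its columns and the handler name are assumptions for illustration, and sqlite3 stands in for whatever the query data store really is.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerHasBeenMadePreferredEvent:
    customer_id: int


query_store = sqlite3.connect(":memory:")
query_store.execute(
    "CREATE TABLE CustomerDetailsView ("
    "customer_id INTEGER PRIMARY KEY, display_name TEXT, status TEXT)"
)
query_store.execute(
    "INSERT INTO CustomerDetailsView VALUES (42, 'Ms Jane Doe', 'Regular')"
)


def update_customer_details_view(event):
    """Handler for one view table: translate the event into a row update."""
    with query_store:  # commits on success, rolls back on an exception
        query_store.execute(
            "UPDATE CustomerDetailsView SET status = 'Preferred' "
            "WHERE customer_id = ?",
            (event.customer_id,),
        )


def status_of(customer_id):
    """Read back what the screen would show for this customer."""
    row = query_store.execute(
        "SELECT status FROM CustomerDetailsView WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return row[0] if row else None


update_customer_details_view(CustomerHasBeenMadePreferredEvent(customer_id=42))
```

Each view table gets its own small handler like this one, so adding a new screen means adding a table and a handler without touching command processing.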

Here’s the picture of all the pieces again:

Bounded Contexts

While CQRS touches on many pieces of software architecture, it is still not at the top of the food chain. CQRS, if used, is employed within a bounded context (DDD) or a business component (SOA) – a cohesive piece of the problem domain. The events published by one BC are subscribed to by other BCs, each updating their query and command data stores as needed.

UI’s from the CQRS found in each BC can be “mashed up” in a single application, providing users a single composite view on all parts of the problem domain. Composite UI frameworks are very useful for these cases.

Summary

CQRS is about coming up with an appropriate architecture for multi-user collaborative applications. It explicitly takes into account factors like data staleness and volatility and exploits those characteristics for creating simpler and more scalable constructs.

One cannot truly enjoy the benefits of CQRS without considering the user-interface, making it capture user intent explicitly. When taking into account client-side validation, command structures may be somewhat adjusted. Thinking through the order in which commands and events are processed can lead to notification patterns which make returning errors unnecessary.

While the result of applying CQRS to a given project is a more maintainable and performant code base, this simplicity and scalability require understanding the detailed business requirements and are not the result of any technical “best practice”. If anything, we can see a plethora of approaches to apparently similar problems being used together – data readers and domain models, one-way messaging and synchronous calls.

Although this blog post is over 3000 words (a record for this blog), I know that it doesn’t go into enough depth on the topic (it takes about 3 days out of the 5 of my Advanced Distributed Systems Design course to cover everything in enough depth). Still, I hope it has given you the understanding of why CQRS is the way it is and possibly opened your eyes to other ways of looking at the design of distributed systems.

Questions and comments are most welcome.

The Fallacy Of ReUse

udidahan — Sun, 07 Jun 2009 08:40:16 +0000

This industry is pre-occupied with reuse.

There’s this belief that if we just reused more code, everything would be better.

Some even go so far as saying that the whole point of object-orientation was reuse – it wasn’t, encapsulation was the big thing. After that component-orientation was the thing that was supposed to make reuse happen. Apparently that didn’t pan out so well either because here we are now pinning our reuseful hopes on service-orientation.

Entire books of patterns have been written on how to achieve reuse with the orientation of the day.
Services have been classified every which way in trying to achieve this, from entity services and activity services, through process services and orchestration services. Composing services has been touted as the key to reusing, and creating reusable services.

I might as well let you in on the dirty little secret:

Reuse is a fallacy

Before running too far ahead, let’s go back to what the actual goal of reuse was: getting done faster.

That’s it.

It’s a fine goal to have.

And here’s how reuse fits in to the picture:

If we were to write all the code of a system, we’d write a certain amount of code.
If we could reuse some code from somewhere else that was written before, we could write less code.
The more code we can reuse, the less code we write.
The less code we write, the sooner we’ll be done!

However, the above logical progression is based on another couple of fallacies:

Fallacy: All code takes the same amount of time to write

Fallacy: Writing code is the primary activity in getting a system done

Anyone who’s actually written some code that’s gone into production knows this.

There’s the time it takes us to understand what the system should do.
Multiply that by the time it takes the users to understand what the system should do.
Then there’s the integrating that code with all the other code, databases, configuration, web services, etc.
Debugging. Deploying. Debugging. Rebugging. Meetings. Etc.

Writing code is actually the least of our worries.
We actually spend less time writing code than…

Rebugging code

Also known as bug regressions.

This is where we fix one piece of code, and in the process break another piece of code.
It’s not like we do it on purpose. It’s all those dependencies between the various bits of code.
The more dependencies there are, the more likely something’s gonna break.
Especially when we have all sorts of hidden dependencies,
like when other code uses stuff we put in the database without asking us what it means,
or, heaven forbid, changing it without telling us.

These debugging/rebugging cycles can make stabilizing a system take a long time.

So, how does reuse help/hinder with that?

Here’s how:

Dependencies multiply by reuse

It’s to be expected. If you wrote the code all in one place, there are no dependencies. By reusing code, you’ve created a dependency. The more you reuse, the more dependencies you have. The more dependencies, the more rebugging.

Of course, we need to keep in mind the difference between…

Reuse & Use

Your code uses the runtime API (JDK, .NET BCL, etc).
Likewise other frameworks like (N)Hibernate, Spring, WCF, etc.

Reuse happens when you extend and override existing behaviors within other code.
This is most often done by inheritance in OO languages.

Interestingly enough, by the above generally accepted definition, most web services “reuse” is actually really use.

Let’s take a look at the characteristics of the code we’re using and reusing to see where we get the greatest value:

The value of (re)use

If we were to (re)use a piece of code in only one part of our system, it would be safe to say that we would get less value than if we could (re)use it in more places. For example, we could say that for many web applications, the web framework we use provides more value than a given encryption algorithm that we may use in only a few places.

So, what characterizes the code we use in many places?

Well, it’s very generic.

Actually, the more generic a piece of code, the less likely it is that we’ll be changing something in it when fixing a bug in the system.

That’s important.

However, when looking at the kind of code we reuse, and the reasons around it, we tend to see very non-generic code – something that deals with the domain-specific behaviors of the system. Thus, the likelihood of a bug fix needing to touch that code is higher than in the generic/use-not-reuse case, often much higher.

How it all fits together

Goal: Getting done faster
Via: Spending less time debugging/rebugging/stabilizing
Via: Having fewer dependencies reasonably requiring a bug fix to touch the dependent side
Via: Not reusing non-generic code

This doesn’t mean you shouldn’t use generic code / frameworks where applicable – absolutely, you should.
Just watch the number and kind of dependencies you introduce.

Back to services

So, if we follow the above advice with services, we wouldn’t want domain-specific services reusing each other.
If we could get away with it, we probably wouldn’t even want them using each other either.

As use and reuse go down, we can see that service autonomy goes up. And vice-versa.
Luckily, we have service interaction mechanisms from Event-Driven Architecture that enable use without breaking autonomy.
Autonomy is actually very similar to the principle of encapsulation that drove object-orientation in the first place.
Interesting, isn’t it?

In summary

We all want to get done faster.

Way back when, someone told us reuse was the way to do that.

They were wrong.

Reuse may make sense in the most tightly coupled pieces of code you have, but not very much anywhere else.

When designing services in your SOA, stay away from reuse, and minimize use (with EDA patterns).

The next time someone pulls the “reuse excuse”, you’ll be ready.

Saga Persistence and Event-Driven Architectures

udidahan — Mon, 20 Apr 2009 11:50:44 +0000

When working with clients, I run into more than a couple of people that have difficulty with event-driven architecture (EDA). Even more people have difficulty understanding what sagas really are, let alone why they need to use them. I’d go so far as to say that many people don’t realize the importance of how sagas are persisted in making it all work (including the Workflow Foundation team).

The common e-commerce example

We accept orders, bill the customer, and then ship them the product.

Fairly straight-forward.

Since each part of that process can be quite complex, let’s have each step be handled by a service:

Sales, Billing, and Shipping. Each of these services will publish an event when it’s done its part. Sales will publish OrderAccepted containing all the order information – order Id, customer Id, products, quantities, etc. Billing will publish CustomerBilledForOrder containing the customer Id, order Id, etc. And Shipping will publish OrderShippedToCustomer with its data.

So far, so good. EDA and SOA seem to be providing us some value.

Where’s the saga?

Well, let’s consider the behavior of the Shipping service. It shouldn’t ship the order to the customer until it has received the CustomerBilledForOrder event as well as the OrderAccepted event. In other words, Shipping needs to hold on to the state that came in the first event until the second event comes in. And this is exactly what sagas are for.

Let’s take a look at the saga code that implements this. In order to simplify the sample a bit, I’ll be omitting the product quantities.

    public class ShippingSaga : Saga<ShippingSagaData>,
        ISagaStartedBy<OrderAccepted>,
        ISagaStartedBy<CustomerBilledForOrder>
    {
        public void Handle(OrderAccepted message)
        {
            this.Data.ProductIdsInOrder = message.ProductIdsInOrder;
        }

        public void Handle(CustomerBilledForOrder message)
        {
            // T in Send<T> is the shipping command's message type (not named in this post).
            this.Bus.Send(
                (m =>
                {
                    m.CustomerId = message.CustomerId;
                    m.OrderId = message.OrderId;
                    m.ProductIdsInOrder = this.Data.ProductIdsInOrder;
                }
                ));

            this.MarkAsComplete();
        }

        public override void Timeout(object state)
        {
        }
    }

First of all, this looks fairly simple and straightforward, which is good.
It’s also wrong, which is not so good.

One problem we have here is that events may arrive out of order – first CustomerBilledForOrder, and only then OrderAccepted. What would happen in the above saga in that case? Well, we wouldn’t end up shipping the products to the customer, and customers tend not to like that (for some reason).

There’s also another problem here. See if you can spot it as I go through the explanation of ISagaStartedBy.

Saga start up and correlation

The “ISagaStartedBy<T>” that is implemented for both messages indicates to the infrastructure (NServiceBus) that when a message of that type arrives and an existing saga instance cannot be found, a new instance should be started up. Makes sense, doesn’t it? For a given order, when the OrderAccepted event arrives first, Shipping doesn’t currently have any sagas handling it, so it starts up a new one. After that, when the CustomerBilledForOrder event arrives for that same order, the event should be handled by the saga instance that handled the first event – not by a new one.

I’ll repeat the important part: “the event should be handled by the saga instance that handled the first event”.

Since the only information we stored in the saga was the list of products, how would we be able to look up that saga instance when the next event came in containing an order Id, but no saga Id?

OK, so we need to store the order Id from the first event so that when the second event comes along we’ll be able to find the saga based on that order Id. Not too complicated, but something to keep in mind.

Let’s look at the updated code:

    public class ShippingSaga : Saga<ShippingSagaData>,
        ISagaStartedBy<OrderAccepted>,
        ISagaStartedBy<CustomerBilledForOrder>
    {
        public void Handle(CustomerBilledForOrder message)
        {
            this.Data.CustomerHasBeenBilled = true;

            // Stored so that the saga can be found by order Id when the other event arrives.
            this.Data.CustomerId = message.CustomerId;
            this.Data.OrderId = message.OrderId;

            this.CompleteIfPossible();
        }

        public void Handle(OrderAccepted message)
        {
            this.Data.ProductIdsInOrder = message.ProductIdsInOrder;

            this.Data.CustomerId = message.CustomerId;
            this.Data.OrderId = message.OrderId;

            this.CompleteIfPossible();
        }

        // Ships only once both events have been seen, in whichever order they arrived.
        private void CompleteIfPossible()
        {
            if (this.Data.ProductIdsInOrder != null && this.Data.CustomerHasBeenBilled)
            {
                // T in Send<T> is the shipping command's message type (not named in this post).
                this.Bus.Send(
                    (m =>
                    {
                        m.CustomerId = this.Data.CustomerId;
                        m.OrderId = this.Data.OrderId;
                        m.ProductIdsInOrder = this.Data.ProductIdsInOrder;
                    }
                    ));
                this.MarkAsComplete();
            }
        }
    }

And that brings us to…

Saga persistence

We already saw why Shipping needs to be able to look up its internal sagas using data from the events, but what that means is that simple blob-type persistence of those sagas is out. NServiceBus comes with an NHibernate-based saga persister for exactly this reason, though any persistence mechanism which allows you to query on something other than saga Id would work just as well.

Let’s take a quick look at the saga data that we’ll be storing and see how simple it is:

    public class ShippingSagaData : ISagaEntity
    {
        public virtual Guid Id { get; set; }
        public virtual string Originator { get; set; }
        public virtual Guid OrderId { get; set; }
        public virtual Guid CustomerId { get; set; }
        // Element type assumed to be Guid, like the other Ids.
        public virtual List<Guid> ProductIdsInOrder { get; set; }
        public virtual bool CustomerHasBeenBilled { get; set; }
    }

You might have noticed the “Originator” property in there and wondered what it is for. First of all, the ISagaEntity interface requires the two properties Id and Originator. Originator is used to store the return address of the message that started the saga. Id is for what you think it’s for. In this saga, we don’t need to send any messages back to whoever started the saga, but in many others we do. In those cases, we’ll often be handling a message from some other endpoint when we want to possibly report some status back to the client that started the process. By storing that client’s address the first time, we can then “ReplyToOriginator” at any point in the process.

The manufacturing sample that comes with NServiceBus shows how this works.

Saga Lookup

Earlier, we saw the need to search for sagas based on order Id. The way to hook into the infrastructure and perform these lookups is by implementing “IFindSagas<T>.Using<M>” where T is the type of the saga data and M is the type of message. In our example, doing this using NHibernate would look like this:

    public class ShippingSagaFinder :
        IFindSagas<ShippingSagaData>.Using<CustomerBilledForOrder>,
        IFindSagas<ShippingSagaData>.Using<OrderAccepted>
    {
        public ShippingSagaData FindBy(CustomerBilledForOrder message)
        {
            return FindBy(message.OrderId);
        }

        public ShippingSagaData FindBy(OrderAccepted message)
        {
            return FindBy(message.OrderId);
        }

        // Both events carry the order Id, so both lookups share one query.
        private ShippingSagaData FindBy(Guid orderId)
        {
            return sessionFactory.GetCurrentSession().CreateCriteria(typeof(ShippingSagaData))
                .Add(Expression.Eq("OrderId", orderId))
                .UniqueResult<ShippingSagaData>();
        }

        private ISessionFactory sessionFactory;

        public virtual ISessionFactory SessionFactory
        {
            get { return sessionFactory; }
            set { sessionFactory = value; }
        }
    }

For a performance boost, we’d probably index our saga data by order Id.
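To make the lookup requirement concrete without NHibernate, here is a minimal in-memory sketch of a saga store that can be queried by order Id as well as by saga Id. It is only an illustration: InMemorySagaStore and its methods are hypothetical names, not NServiceBus types, and the saga data is trimmed to the fields the sketch needs.

```csharp
using System;
using System.Collections.Generic;

// Trimmed-down stand-in for the saga data shown earlier (illustrative only).
public class ShippingSagaData
{
    public Guid Id { get; set; }
    public Guid OrderId { get; set; }
    public bool CustomerHasBeenBilled { get; set; }
}

// Hypothetical store; not an NServiceBus type.
public class InMemorySagaStore
{
    private readonly Dictionary<Guid, ShippingSagaData> bySagaId =
        new Dictionary<Guid, ShippingSagaData>();

    // Secondary index (order Id -> saga Id) so lookups by order Id avoid a scan.
    private readonly Dictionary<Guid, Guid> sagaIdByOrderId =
        new Dictionary<Guid, Guid>();

    public void Save(ShippingSagaData saga)
    {
        bySagaId[saga.Id] = saga;
        sagaIdByOrderId[saga.OrderId] = saga.Id;
    }

    // Returns null when no saga exists yet for the order,
    // which is the signal to start a new saga instance.
    public ShippingSagaData FindByOrderId(Guid orderId)
    {
        Guid sagaId;
        if (!sagaIdByOrderId.TryGetValue(orderId, out sagaId))
            return null;
        return bySagaId[sagaId];
    }

    // Removes a finished saga from both the primary map and the index.
    public void Complete(ShippingSagaData saga)
    {
        bySagaId.Remove(saga.Id);
        sagaIdByOrderId.Remove(saga.OrderId);
    }
}
```

The second dictionary plays the role that a database index on the order Id column would play: a blob store keyed only by saga Id has no equivalent, which is why it cannot serve these lookups.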

On concurrency

Another important note is that for this saga, if both messages were handled in parallel on different machines, the saga could get stuck. The persistence mechanism here needs to prevent this. When using NHibernate over a database with the appropriate isolation level (Repeatable Read – the default in NServiceBus), this “just works”. If/When implementing your own saga persistence mechanism, it is important to understand the kind of concurrency your business logic can live with.

Take a look at Ayende’s example for mobile phone billing to get a feeling for what that’s like.
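One way a home-grown persister could prevent the stuck saga is optimistic concurrency: every save checks that nobody else changed the saga since it was loaded, and the loser reloads and retries. The sketch below illustrates that idea only. It is not how the NHibernate persister does it (that relies on database isolation, as described above), and VersionedSagaStore, Load, and TrySave are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public class SagaState
{
    public bool OrderAccepted { get; set; }
    public bool CustomerBilled { get; set; }
    public int Version { get; set; }

    public bool CanShip { get { return OrderAccepted && CustomerBilled; } }

    public SagaState Copy()
    {
        return new SagaState
        {
            OrderAccepted = OrderAccepted,
            CustomerBilled = CustomerBilled,
            Version = Version
        };
    }
}

// Hypothetical store illustrating optimistic concurrency.
public class VersionedSagaStore
{
    private readonly object gate = new object();
    private readonly Dictionary<Guid, SagaState> sagas = new Dictionary<Guid, SagaState>();

    // Returns a private copy so that concurrent handlers do not share state.
    public SagaState Load(Guid orderId)
    {
        lock (gate)
        {
            SagaState stored;
            return sagas.TryGetValue(orderId, out stored) ? stored.Copy() : new SagaState();
        }
    }

    // Succeeds only if the stored version still matches the version that was loaded.
    public bool TrySave(Guid orderId, SagaState state)
    {
        lock (gate)
        {
            SagaState stored;
            int currentVersion = sagas.TryGetValue(orderId, out stored) ? stored.Version : 0;
            if (currentVersion != state.Version)
                return false; // someone else saved first; the caller should reload and retry

            SagaState toStore = state.Copy();
            toStore.Version = currentVersion + 1;
            sagas[orderId] = toStore;
            return true;
        }
    }
}
```

Without the version check, two handlers that both loaded the empty saga would each save only their own flag, the later save would overwrite the earlier one, and neither would ever see both events. With it, the second handler’s save is rejected, and on retry it loads the first handler’s flag and can complete the saga.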

Summary

In almost any event-driven architecture, you’ll have services correlating multiple events in order to make decisions. The saga pattern is a great fit there, and not at all difficult to implement. You do need to take into account that events may arrive out of order and implement the saga logic accordingly, but it’s really not that big a deal. Do take the time to think through what data will need to be stored in order for the saga to be fault-tolerant, as well as a persistence mechanism that will allow you to look up that data based on event data.

If you feel like giving this approach a try, but don’t have an environment handy for this, download NServiceBus and take a look at the samples. It’s really quick and easy to get set up.

Backwards-Compatibility: Why Most Versioning Problems Aren’t

udidahan — Fri, 10 Apr 2009 13:17:17 +0000

I’ve been to too many clients where I’ve been brought in to help them with their problems around service versioning when the solution I propose is simply to have version N+1 of the system be backwards-compatible with version N. If two adjacent versions of a given system aren’t compatible with each other, it is practically impossible to solve versioning issues.

Here’s what happens when versions aren’t compatible:

Admins stop the system from accepting any new requests, and wait until all current requests are done processing. They take a backup/snapshot of all relevant parts of the system (like data in the DB). Then, bring down the system – all of it. Install the new version on all machines. Bring everything back up. Let the users back in.

If, heaven-forbid, problems were uncovered with the new version (since some problems only appear in production), the admins have to roll back to the previous version – once again bringing everything down.

This scenario is fairly catastrophic for any company that requires, not even high availability, but just pretty continuous availability – like those running public-facing web apps.

If adjacent versions were compatible with each other, we could upgrade the system piece-meal – machine by machine, where both the old and new versions will be running side by side, communicating with each other. While the system’s performance may be sub-optimal, it will continue to be available throughout upgrades as well as downgrades.

This isn’t trivial to do.

It impacts how you decide what is (and more importantly, what isn’t) nullable.

It may force you to spread certain changes to features across more versions (aka releases).

As such, you can expect this to affect how you do release and feature planning.
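As one illustration of the nullability point, suppose version N+1 adds a field to a message. If the field is nullable and the new handler supplies a default, old senders and new receivers can run side by side during a machine-by-machine upgrade. This is a sketch under assumptions: the message, the field, and the three-day default are all made up for the example.

```csharp
using System;

// Version N+1 of a hypothetical message. RequestedDeliveryDate is new in N+1,
// so it is nullable: messages from version N senders arrive without it.
public class OrderAcceptedV2
{
    public Guid OrderId { get; set; }
    public DateTime AcceptedAt { get; set; }
    public DateTime? RequestedDeliveryDate { get; set; }
}

public static class DeliveryPlanner
{
    // Assumed business default, for illustration only.
    private static readonly TimeSpan DefaultLeadTime = TimeSpan.FromDays(3);

    // Works for both versions: falls back to a default when the new field is absent.
    public static DateTime PlanDelivery(OrderAcceptedV2 message)
    {
        return message.RequestedDeliveryDate ?? message.AcceptedAt + DefaultLeadTime;
    }
}
```

Making the field mandatory would have to wait for a later release, once no version N senders remain, which is one way a single feature ends up spread across more than one version.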

However, if you do not take these factors into account, it’s almost a certainty that your versioning problems will persist and no technology (new or old) will be able to solve them.

Coming next… Units of versioning – inside and outside a service.

Self-Contained Events and SOA

udidahan — Sat, 13 Dec 2008 23:35:08 +0000

In the architectural principle of fully self contained messages, events “can – instantly and in future – be interpreted as the respective event without the need to rely on additional data stores that would need to be in time-sync with the event during message-processing.”

Also, “passing reference data in a message makes the message-consuming systems dependent on the knowledge and availability of actual persistent data that is stored “somewhere”. This data must separately be accessed for the sake of understanding the event that is represented by the message.”

The discussion of self-contained events can be compared to integration databases vs application databases.

Centralized Integration – Pros & Cons

If everything in a system can access a central datastore, it is enough for one party to publish an event containing only the ID of an entity that that party previously entered/updated. Upon receiving that event, a subscriber would go to the central datastore and get the fields it’s interested in for that ID. The advantage of this approach is that the minimal amount of data necessary crosses the network, as subscribers only retrieve the fields that interest them. Martin Fowler describes the disadvantages as:

“An integration database needs a schema that takes all its client applications into account. The resulting schema is either more general, more complex or both. The database usually is controlled by a separate group to the applications and database changes are more complex because they have to be negotiated between the database group and the various applications.”

This is far from being aligned with the principle of autonomy so important to SOA. In that respect, the architectural principle of self-contained messages points us away from those problems and towards more autonomous services.

However, once we have these autonomous business services in place, we may find that we don’t need 100% fully self-contained messages anymore.

A Real-World Example

Let’s say we have 3 business services, Sales, Fulfillment, and Billing.

Sales publishes an OrderAccepted event when it accepts an order. That event contains all the order information.

Both Fulfillment and Billing are subscribed to this event, and thus receive it.

Fulfillment does not ship products to the customer until the customer has been billed, so it just stores the order information internally, and is done.

Billing starts the process of billing the customer for their order, possibly joining several orders into a single bill. After completing this process, it publishes a CustomerBilled event containing all billing information, as well as the IDs of the orders in that bill. It does not put all the order information in that event, as it is not the authoritative owner of that data.

When Fulfillment receives the CustomerBilled event, it uses the IDs of the orders contained in the event to find the order information it previously stored internally. It does not need to call the Sales service for this information or contact some central Master Data Management system. It uses the data it has, and goes about fulfilling the orders and shipping the products to the customer, finally publishing its own OrderShipped event.

Notice, as well, that in the original OrderAccepted event there were the IDs of products the customer ordered. These product IDs originated from another service, Merchandising, responsible for the product catalog. The same thing can be said for the customer ID originating from another service – Customer Care.

The Issue of Time

One could argue that since subscribers use previously cached data when processing new events, that data might not be up to date. Also, we may have race conditions between our services. In the above example, if Billing was extremely fast and more highly available than Fulfillment, Billing could have received the OrderAccepted event, processed it, and published the CustomerBilled event before Fulfillment had received the OrderAccepted event. In short, the CustomerBilled and OrderAccepted messages could be out of order in Fulfillment’s queue.

What would Fulfillment do when trying to process the CustomerBilled message when it doesn’t have the order information?

Well, it knows that the world is parallel and non-sequential, so it does NOT return/log an error, but rather puts that message in the back of the queue to be processed again later (or maybe in some other temporary holding area). This enables the OrderAccepted message to be processed before the CustomerBilled message is retried. When the retry occurs, well, everything’s OK – it’s worked itself out over time.

In the case where we retry again and again and things don’t work themselves out (maybe the OrderAccepted event was lost), we move that message off to a different queue for something else to resolve the conflict (maybe a person, maybe software). If/when the conflict is resolved (got the Sales system / messaging system to replay the OrderAccepted event), the conflict resolver returns the CustomerBilled message to the queue, and now everything works just fine.

As all of this is occurring, the only thing that’s visible to external parties is that it happens to be taking longer than usual for the OrderShipped event to be published. In other words, time is the only difference.
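The retry behavior described above can be sketched with two in-memory queues. This is only an illustration under simplifying assumptions: FulfillmentEndpoint, the limit of 3 retries, and the message shape are made up for the example, each CustomerBilled message carries a single order ID here, and a real endpoint would use durable queues and would not retry in a tight loop.

```csharp
using System;
using System.Collections.Generic;

public class Envelope
{
    public string Type { get; set; }   // "OrderAccepted" or "CustomerBilled"
    public Guid OrderId { get; set; }
    public int Retries { get; set; }
}

// Hypothetical endpoint; in-memory queues stand in for durable ones.
public class FulfillmentEndpoint
{
    private const int MaxRetries = 3; // assumed limit

    public readonly Queue<Envelope> Input = new Queue<Envelope>();
    public readonly Queue<Envelope> Conflicts = new Queue<Envelope>();
    public readonly List<Guid> Shipped = new List<Guid>();

    private readonly HashSet<Guid> knownOrders = new HashSet<Guid>();

    public void ProcessAll()
    {
        while (Input.Count > 0)
        {
            Envelope message = Input.Dequeue();

            if (message.Type == "OrderAccepted")
            {
                // Store the order information for later correlation.
                knownOrders.Add(message.OrderId);
            }
            else if (knownOrders.Contains(message.OrderId))
            {
                Shipped.Add(message.OrderId);
            }
            else if (message.Retries < MaxRetries)
            {
                // Not an error: the order may simply not have arrived yet,
                // so the message goes to the back of the queue.
                message.Retries++;
                Input.Enqueue(message);
            }
            else
            {
                // Retries exhausted: hand off for conflict resolution.
                Conflicts.Enqueue(message);
            }
        }
    }
}
```

If CustomerBilled is queued ahead of OrderAccepted, the first pass re-queues it, the order is then stored, and the retry ships it. If OrderAccepted never arrives, the message ends up in Conflicts, from where a resolver can return it to Input once the missing event has been replayed.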

Summary

The problem of non-self-contained events is mitigated first and foremost by business services in SOA, and the apparent issue of time-synchronization by business logic inside these services.

Don’t be afraid to put IDs in your messages and events.

Do be afraid of using those IDs to access datastores shared by multiple “services”.

Using IDs to correlate current events to data from previous events is not only OK, it’s to be expected.

The architectural principle of fully self-contained messages steers us away from the problems of Integration Databases and towards Application Databases, autonomous services, and a better SOA implementation. From there, following the principle of autonomy from a business perspective, will lead us to services not publishing data in their messages that is owned by other services, taking us the next step of our journey to SOA.

Lost Notifications? No Problem.

udidahan — Sun, 07 Dec 2008 09:46:05 +0000

One of the most common questions I get on the topic of pub/sub messaging is what happens if a notification is lost. Interestingly enough, there are some who almost entirely write-off this pattern because of this issue, preferring the control of request/response-exception. So, what should be done about lost messages? The short answer is durable messaging. The long answer is design.

Durable Messaging

In order to prevent a message from being lost when it is sent from a publisher to a subscriber, the message is written to disk on the publisher side, and then forwarded to the subscriber, where it is also written to disk. This store-and-forward mechanism enables our systems to gracefully recover from either side being temporarily unavailable.

In my MSDN article on this topic, I outlined some problems with this approach. These problems are exacerbated for publishers. Imagine a publisher with 40 subscribers, publishing 10 messages a second, each containing 1MB of XML. If 10 of the subscribers are unavailable, that’s 100MB of data being written to the publisher’s disk every second, 6GB every minute. That’s liable to bring down a publisher before an administrator brews a cup of coffee.
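The arithmetic behind those numbers, as a small sketch. BacklogEstimate is a hypothetical helper, and it uses decimal units (1 GB = 1000 MB) to match the rounded figures in the text.

```csharp
using System;

// Hypothetical helper for estimating a publisher's store-and-forward backlog.
public static class BacklogEstimate
{
    // Megabytes written to the publisher's disk per second while subscribers are down:
    // one stored copy of every message for every unavailable subscriber.
    public static double MegabytesPerSecond(int messagesPerSecond, double messageSizeMb, int unavailableSubscribers)
    {
        return messagesPerSecond * messageSizeMb * unavailableSubscribers;
    }

    // The same rate expressed in gigabytes per minute (1 GB = 1000 MB).
    public static double GigabytesPerMinute(double megabytesPerSecond)
    {
        return megabytesPerSecond * 60 / 1000;
    }
}
```

With 10 messages a second, 1MB per message, and 10 unavailable subscribers, MegabytesPerSecond gives 100 and GigabytesPerMinute gives 6. The other 30 subscribers do not add to the backlog, since their messages are forwarded as they are published.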

Publishers have no choice but to throw away messages after a certain period of time.

Publisher Contracts

The whole issue of contracts and schema is considered one of the better understood parts of SOA. Unfortunately, the operational aspects of service contracts are hardly ever taken into account.

On top of the schema of the messages a service publishes, additional information is needed in the contract:

How big will this message be?
How often will it be published?
How long will this message be stored if a subscriber is unavailable?

The first two pieces of information are important for subscribers to do load and capacity planning. The last one is the most important as it dictates the required availability and fault-tolerance characteristics of subscribers.
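Those three questions could be captured next to the message schema in something like the following. This is a sketch, not an existing contract format: PublisherContract and its members are hypothetical names.

```csharp
using System;

// Hypothetical description of the operational side of a publisher's contract.
public class PublisherContract
{
    public int MaxMessageSizeBytes { get; set; }   // how big will this message be?
    public double MessagesPerSecond { get; set; }  // how often will it be published?
    public TimeSpan Retention { get; set; }        // how long is it stored for an unavailable subscriber?

    // Bytes per second a subscriber must be able to absorb, at worst.
    public double PeakBytesPerSecond
    {
        get { return MaxMessageSizeBytes * MessagesPerSecond; }
    }

    // A subscriber whose longest expected outage exceeds the retention period
    // should expect to lose messages.
    public bool IsSafeFor(TimeSpan longestExpectedOutage)
    {
        return longestExpectedOutage <= Retention;
    }
}
```

A subscriber team can then check its own planned downtime against the contract: with a retention of one hour, a service that is only up for a nightly batch fails IsSafeFor, and needs either better availability or a different contract.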

For Example

In the canonical retail scenario, when our sales service accepts an order, it publishes an order accepted event. Other services subscribed to this event include shipping, billing, and business intelligence.

While shipping and billing are highly available and able to keep up with the rate at which orders are accepted, the business intelligence service is not. BI has two main parts to it – a nightly batch that does the number crunching, and a UI for reporting off of the results of that number crunching. Some even do the reporting in a semi-offline fashion, emailing reports back to the user when they’re ready.

Furthermore, nobody’s going to invest in servers for making BI highly available.

And wasn’t the whole point of this publish/subscribe messaging to keep our services autonomous? That not all services have to have the same level of uptime?

Houston, do we have a problem?

Data Freshness

There is a glimmer of light in all this doom and gloom.

Not all services have the same data freshness requirements.

The business intelligence service above doesn’t need to know about orders the second they’re accepted. A daily roll-up would be fine, and an hourly roll-up would bring us that much closer to “real time business intelligence”.

So, while BI is ready to accept the sales message schema, it would like a slightly different contract around it – fewer messages per unit of time, more data in each message.

From the operational perspective of the sales service, it would be cost effective to have fewer “online” subscribers. It could even take things a few steps further. Instead of using the regular messaging backbone for transmitting these hourly messages, it could use FTP. The data could even be zipped to take up even less space. Since the total data size is less than the corresponding online stream, is stored on cheaper, large storage, and the number of subscribers for this zipped, hourly update is fairly small, these messages can be kept around far longer.

If you’ve heard about consumer-driven contracts, this is it.

Note that we’re still talking about the same logical message schema.
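A rough sketch of what “same logical schema, different pipe” could look like on the publishing side: the individual order accepted events are grouped into one batch per hour. All type and member names here are hypothetical, and transport (FTP, zipping) is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class OrderAccepted
{
    public Guid OrderId { get; set; }
    public DateTime AcceptedAtUtc { get; set; }
}

// Hypothetical batch message: same logical schema, many orders per message.
public class HourlyOrderBatch
{
    public DateTime HourStartUtc { get; set; }
    public List<OrderAccepted> Orders { get; set; }
}

public static class RollUp
{
    // Groups events by the hour in which they were accepted, oldest hour first.
    public static List<HourlyOrderBatch> ByHour(IEnumerable<OrderAccepted> events)
    {
        return events
            .GroupBy(e => new DateTime(e.AcceptedAtUtc.Year, e.AcceptedAtUtc.Month,
                                       e.AcceptedAtUtc.Day, e.AcceptedAtUtc.Hour, 0, 0,
                                       DateTimeKind.Utc))
            .OrderBy(g => g.Key)
            .Select(g => new HourlyOrderBatch { HourStartUtc = g.Key, Orders = g.ToList() })
            .ToList();
    }
}
```

The batch contains the very same OrderAccepted items the online subscribers receive, so BI needs no second schema, only a different delivery contract.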

Summary

It’s not that lost notifications aren’t a problem.

It’s that they feed the design process, so that the resulting service ecosystem is set up in such a way that notifications won’t get lost. I know that that sounds kind of recursive, but that’s how it works. Either subscribers take care of their SLA allowing them to process the online stream of events, or they should subscribe to a different pipe (which will have different SLA requirements, but maybe they can deal with those).

It makes sense to have multiple pipes for the same logical schema.

It’s practically a necessity to make pub/sub a feasible solution.

SOA, EDA, and CEP a winning combo

udidahan — Sat, 01 Nov 2008 22:57:14 +0000

There’s been some discussion on the SOA yahoo group around the connection between SOA, EDA, and CEP (complex event processing) since Jack’s original post on the topic. I’ve been waiting for the right opportunity to jump in and it seems to have come.

Dennis asked this:

There are different design choices in a SOA, even when you already have identified the services. I have a simple example that I would like to share:

Imagine a order-to-cash process. One part of that process is to register an order. Suppose we have two services, Order Service and Inventory Service. The task is to register the order and make a corresponding reservation of the stock level. I would be pleased to have the groups view on the following 3 design options (A, B, C):

A.
1. The “process/application” sends a message (sync or async) to “registerOrder” on the Order Service.
2. The “process/application” sends another message (sync or async) to “reserveStock” on the the Inventory Service.

B.
1. The “process/application” sends a message (sync or async) to “registerOrder” on the Order Service.
2. The Order Service sends a message (sync or async) to “reserveStock” on the the Inventory Service.

C.
1. The “process/application” sends a message (sync or async) to “registerOrder” on the Order Service.
2. The Order Service publishes an “orderReceived” event.
3. The Inventory Service subscribes to the “orderReceived” event .

On the whole “already identified the services” thing – naming a service doesn’t mean much. It’s all about allocating responsibility, and until that’s been done, those “services” don’t give us very much information.

Business Services

If we were to view this example in light of business services, and look at the business events that make up this process, maybe we’d get a different perspective.

Three business services: Sales, Inventory, and Shipping.

In Sales, many applications and people may operate, including the person and the application he used to submit the order. When the order is submitted and goes through all the internal validation stuff, Sales raises an OrderTentativelyAccepted event.

Inventory and Orders

Inventory, which is subscribed to this event, checks if it has everything in stock for the order. For every item in the order in stock, it allocates that stock to the order and publishes the InventoryAllocatedToOrder event for it. For items/quantities not in stock, it starts a long running process which watches for inventory changes.

When an InventoryChanged event occurs, it matches that against orders requiring allocation – if it finds one that requires stock, based on some logic to choose which order gets precedence, it publishes the InventoryAllocatedToOrder event.

Sales, which is subscribed to the InventoryAllocatedToOrder event, upon receiving all events pertaining to the order tentatively accepted, will publish an OrderAccepted event.
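The correlation Sales performs here can be sketched as follows. It assumes one InventoryAllocatedToOrder event per product in the order and ignores quantities. TentativeOrderTracker and its members are hypothetical names, and adding to a list stands in for publishing the OrderAccepted event.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tracker inside Sales; names are illustrative.
public class TentativeOrderTracker
{
    // Order Id -> product Ids still waiting for stock to be allocated.
    private readonly Dictionary<Guid, HashSet<Guid>> pending =
        new Dictionary<Guid, HashSet<Guid>>();

    public readonly List<Guid> AcceptedOrders = new List<Guid>();

    public void OnOrderTentativelyAccepted(Guid orderId, IEnumerable<Guid> productIds)
    {
        pending[orderId] = new HashSet<Guid>(productIds);
    }

    public void OnInventoryAllocatedToOrder(Guid orderId, Guid productId)
    {
        HashSet<Guid> waitingFor;
        if (!pending.TryGetValue(orderId, out waitingFor))
            return; // unknown or already accepted order; ignored in this sketch

        waitingFor.Remove(productId);
        if (waitingFor.Count == 0)
        {
            pending.Remove(orderId);
            AcceptedOrders.Add(orderId); // stands in for publishing OrderAccepted
        }
    }
}
```

The order only moves from tentatively accepted to accepted once the last outstanding allocation arrives, however long Inventory’s long running process takes to find the stock.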

Orders and Shipping

When Inventory receives the OrderAccepted event, it generates the pick list to bring all the stock from the warehouses to the loading docks, finally publishing the PickListGenerated event containing target docks.

When Shipping receives the PickListGenerated event, it starts the yard management necessary to bring the needed kinds of trucks to the docks.

What else is possible

I could go on, talking about things like the maximum amount of time stock of various kinds can wait to be loaded on trucks, subscribing to earlier events to employ all kinds of optimization and prediction algorithms, having a Customer Care service notifying the customer about what’s going on with their order (probably different for different kinds of customers and preferred communication definitions). Obviously, we’d need a Billing service to handle the various kinds of billing procedures, whether or not the customer has credit, pays upon delivery, etc.

It turns out that many business domains map very well to this join of SOA and EDA.

What an ESB is for

When we have these kinds of business services primarily publishing events and subscribing to those of other services, you don’t need much else from your “enterprise service bus”. All sorts of transformation, routing, and orchestration capabilities don’t come into play at all.

In all truthfulness, those bits of functionality are really just a historical artifact of their broker heritage.

Don’t get me wrong, sometimes a broker is a nice thing to have – behind a service boundary in order to perform some complex integration between existing legacy applications.

Just keep that stuff in its place – not between services.

Complex Event Processing

We can look at how Sales transitions an order from being tentatively accepted to being accepted as requiring event correlation around InventoryAllocatedToOrder events. This isn’t exactly “complex” in its own right. If there were some kind of CEP engine that did this for us out of the box, it might be a possible technology choice for implementing this logic within our service.

As we add more concerns, like time, we may find new ways to make use of this engine. For instance, if the time to provide the order to the customer is approaching, we may choose to split the order into two – accepting one for which we have all the stock allocated, and leaving the second as tentatively accepted.

Summary

While it is difficult to move forward on service responsibility without discussing the events it raises and those it subscribes to, the whole issue of CEP can be postponed for a while.

Although there aren’t many who would say that EDA is necessary for driving down coupling in SOA, or that SOA won’t likely provide much value without EDA, or that SOA is necessary for providing the right boundaries for EDA, it’s been my experience that that is exactly the case.

CEP is a challenging engineering field, and managing the technical risks around it is necessary for a project to succeed in some circumstances. It really shines when used under the SOA/EDA umbrella, but it should not be taken by itself and used at the topmost architectural levels.

Additional Logic Required For Service Autonomy

udidahan — Wed, 22 Oct 2008 22:12:06 +0000

Of the tenets of Service Orientation, the tenet of Autonomy is one that many understand intuitively. Interestingly enough, many in that same intuitive category don’t see pub/sub as a necessity for that autonomy.

Watch that first step

Although sometimes described as the first step of an organization moving to SOA, web-service-izing everything results in synchronous, blocking, request/response interaction between services. The problem is that if one service were to become unavailable, all consumers of that service would not be able to perform any work. With the deep service “call stacks” this architectural style condones, the availability and performance of the entire organization will be dictated by the weakest link.

So, while I’d agree that many organizations do need to take this step, I’d caution against going into production at this step.

Pub/Sub Considered Helpful

When services interact with each other using publish/subscribe semantics we don’t have that technical problem of blocking. Subscribers cache the data published to them (either in memory or durably depending on their fault-tolerance requirements) thus enabling them to function and process requests even if the publisher is unavailable.

Consider the following scenario:

Let’s say we have an e-commerce site, a part of our Sales service responsible for selling products. Another service, let’s call it Merchandising, is responsible for the catalog of products, and how much each product costs. Sales is subscribed to price update events published by Merchandising and saves (caches) those prices in its own database. When a customer orders some products on the site, Sales does not need to call Merchandising to get the price of the product and just uses the previously saved (cached) price. Thus, even if Merchandising is unavailable, Sales is able to accept orders. This is a big win as our merchandising application is not nearly as robust as our sales systems.
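
A minimal sketch of the Sales side of this scenario, in Python, might look like the following. The class and method names are hypothetical, the dictionary stands in for Sales’ own database, and prices are kept as integer cents to sidestep rounding.

```python
class SalesPriceCache:
    """Sales' own copy of prices, fed by events published by Merchandising."""

    def __init__(self):
        self._prices = {}  # product_id -> price in cents

    def on_price_updated(self, product_id, price_in_cents):
        # Subscriber side: store the published price locally.
        self._prices[product_id] = price_in_cents

    def order_total(self, product_ids):
        # No call to Merchandising here; only the local copy is read,
        # so this works even while Merchandising is unavailable.
        return sum(self._prices[p] for p in product_ids)
```

The handler would be wired to whatever pub/sub infrastructure is in use; the important part is that order_total never leaves the Sales service.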

Yet, there are scenarios where data freshness requirements prevent this.

Too Much of a Good Thing?

Technically, the above story is accurate. There is nothing technically preventing Sales from accepting orders. Yet consider a scenario where Merchandising is down or unavailable for an extended period of time. While this may not be entirely likely for two servers in the same data center, consider physical kiosks which customers can use to buy products. Those kiosks may not receive updates for days. Should they accept orders?

That’s really a question for the business. If pricing data is stale for a time period greater than X, do not sell that item. The value of X may even be different for different kinds of products. Keep in mind that this issue only arose since we architected our services to be fully autonomous. In a synchronous systems architecture, this issue would not come up. As such, it is our responsibility as architects to go digging for these requirements as well as explaining to the business what the tradeoffs are.
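
The additional logic this implies is small. Here is one possible sketch in Python, where the value of X is looked up per kind of product; the class name, the notion of a product "kind", and the thresholds are all assumptions for illustration, since the real values have to come from the business.

```python
from datetime import datetime, timedelta


class FreshnessAwarePriceCache:
    """Cached prices that refuse to be used once they are too stale."""

    def __init__(self, max_age_by_kind, default_max_age):
        self._max_age_by_kind = max_age_by_kind  # kind -> timedelta (X)
        self._default_max_age = default_max_age
        self._entries = {}  # product_id -> (price, kind, received_at)

    def on_price_updated(self, product_id, kind, price, received_at):
        self._entries[product_id] = (price, kind, received_at)

    def can_sell(self, product_id, now):
        entry = self._entries.get(product_id)
        if entry is None:
            return False  # never received a price for this product
        _price, kind, received_at = entry
        max_age = self._max_age_by_kind.get(kind, self._default_max_age)
        return now - received_at <= max_age
```

A kiosk that has been offline for two days could then keep selling mugs (a week of tolerance, say) while refusing to sell televisions whose prices must be no more than an hour old.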

In order to have more up-to-date data, we need to invest in more available hardware, networks, and infrastructure. This needs to be balanced against the predicted increase in revenue that more up-to-date (read: higher) prices would give us.

You Can Get What You Pay For

Beyond the additional cost of writing that additional logic, and the perceived increased complexity, another difference to note between this architectural style and the synchronous/traditional one is that it puts control of spending back in the hands of business.

In a synchronous architecture, in order to achieve required performance and availability, all systems need to be performant, requiring across-the-board investments in servers, networks, and storage. Without investing everywhere, the weakest link is liable to undo all other investments. In other words, your developers have made your investment choices for you. Scary, isn’t it?

A more prudent investment strategy would prefer spending on services that give the biggest bang for the buck, better known as return on investment. A pub/sub based architecture allows investing in data-freshness where it makes the most sense. For example, in sales of high profit products to strategic customers rather than inventory management of raw materials for products slated to be decommissioned.

That sounds a lot like IT-Business Alignment.

Maybe there’s something to this SOA thing after all…

Services Don’t Serve

udidahan — Sat, 23 Aug 2008 14:42:36 +0000

Another prominent SOA practitioner and blogger, Steve Jones, shows that when you’re identifying your top-level business services, you shouldn’t be thinking about who’s going to consume them.

“We have three high level business services: Engagement, Management, [and] Production. […] they represent different operational ambitions. Engagement is all about quantity and then filtering. Management is about the quality and Production is about realising the benefits.”

Services are not about “are you being served?”

They’re not about re-use, and barely about use. Events are what it’s all about.

Each service has its own responsibility and does what it needs to do, business-wise, to achieve its goals. Whether it’s about increasing the number of leads, ensuring high-profile clients get good service, or maximizing equipment utilization, services take responsibility.

I know I harp on this a lot.

It’s because it’s that important.

Command Query Separation and SOA

udidahan — Mon, 11 Aug 2008 13:18:40 +0000

One of the common questions I receive from people starting to use nServiceBus is how one-way messaging fits with showing the user a grid (or list) of data. Thinking about publish/subscribe usually just gets them even more confused. Trying to resolve all this with Service Oriented Architecture leaves them wondering – why bother?

In regular client-server development, the server is responsible for providing the client with all CRUD (create, read, update, and delete) capabilities. However, when users look at data they do not often require it to be up to date to the second (given that they often look at the same screen for several seconds to minutes at a time). As such, retrieving data from the same table as that being used for highly consistent transaction processing creates contention resulting in poor performance for all CRUD actions under higher load.

A Scalable Solution

One of the common answers to this question is for the server/service to publish a message when data changes (say, as the result of processing a message) and for clients to subscribe to these messages. When such a notification arrives at a client, the client would cache the data it needs. Then, when the user wants to see a grid of data, that data is already on the client. Of course, this solution doesn’t work so well for older client machines (like some point of service devices) or if there are millions of rows of data.

The thing is that this solution is one implementation of a more general pattern – command query separation (CQS).

Command Query Separation

Wikipedia describes CQS as a pattern where "… every method should either be a command that performs an action, or a query that returns data to the caller, but not both. More formally, methods should return a value only if they are referentially transparent and hence possess no side effects."

Martin Fowler is less strict about the use of CQS allowing for exceptions: "Popping a stack is a good example of a modifier that modifies state. Meyer correctly says that you can avoid having this method, but it is a useful idiom. So I prefer to follow this principle when I can, but I’m prepared to break it to get my pop."
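
At the method level the distinction is easy to show. Here is a small Python sketch of a stack (the class is mine, for illustration only) with strict commands and queries, plus the pop that Fowler is prepared to break the rule for.

```python
class Stack:
    def __init__(self):
        self._items = []

    # Command: changes state, returns nothing.
    def push(self, item):
        self._items.append(item)

    # Query: returns data, changes nothing.
    def peek(self):
        return self._items[-1]

    # Command: changes state, returns nothing.
    def remove_top(self):
        del self._items[-1]

    # The useful idiom: a modifier that also returns a value.
    # Strict CQS would have callers use peek() then remove_top().
    def pop(self):
        top = self.peek()
        self.remove_top()
        return top
```

Calling peek any number of times gives the same answer, which is what makes queries safe to cache and repeat.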

So, how does separating commands from queries and SOA help at all in getting data to and from a UI? The answer is based on Pat Helland’s thinking as described in his article Data on the Inside vs. Data on the Outside.

Services Cross Boxes

The biggest lie around SOA is that services run.

Let that sink in a second.

Sure services have runnable components, but that’s not why they’re important.

I’ll skip the books of background and cut to the chase:

Services communicate with each other using publish/subscribe and one-way messaging. Services have components inside them. Inside a service, these components can communicate with each other using synchronous RPC, or any other mechanism. Also, these components can reside on different machines.

This is broader than just scaling out a service. There can be service components running on the client as well as the server.

SOA & CQS

Combining these two concepts together, here’s what comes out:

In this solution there are two services that span both client and server – one in charge of commands (create, update, delete), the other in charge of queries (read). These services communicate only via messages – one cannot access the database of the other.

The command service publishes messages about changes to data, to which the query service subscribes. When the query service receives such notifications, it saves the data in its own data store which may well have a different schema (optimized for queries like a star schema).

The client component which is in charge of showing grids of data to the user behaves the same as it would in a regular layered/tiered architecture, using synchronous blocking request/response to get its data – SOA doesn’t change that.
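
The shape of this can be sketched in a few lines of Python. Everything named here is hypothetical: the in-memory bus merely stands in for real pub/sub infrastructure and delivers synchronously for brevity, and the CustomerRenamed message is an invented example.

```python
from collections import defaultdict


class InMemoryBus:
    """Stand-in for a pub/sub transport (assumption: synchronous delivery)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, message_type, handler):
        self._handlers[message_type].append(handler)

    def publish(self, message_type, message):
        for handler in self._handlers[message_type]:
            handler(message)


class CommandService:
    """In charge of create, update, delete; owns the transactional store."""

    def __init__(self, bus):
        self._bus = bus
        self._customers = {}  # private; the query service never reads this

    def rename_customer(self, customer_id, new_name):
        self._customers[customer_id] = {"name": new_name}
        self._bus.publish(
            "CustomerRenamed", {"customer_id": customer_id, "name": new_name}
        )


class QueryService:
    """In charge of reads; keeps its own store, shaped for display."""

    def __init__(self, bus):
        self._names_by_id = {}
        bus.subscribe("CustomerRenamed", self._on_customer_renamed)

    def _on_customer_renamed(self, message):
        self._names_by_id[message["customer_id"]] = message["name"]

    def customer_grid(self):
        # Plain synchronous read, as in any layered architecture.
        return sorted(self._names_by_id.items(), key=lambda row: row[1])
```

The two services share nothing but the message. The client component showing the grid calls customer_grid and blocks for the answer, just as it always did.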

Composite Applications

Although the client side components of both the command and query services are hosted in the same process, they are very much independent of each other. That being said, from an interoperability perspective (the one that most people attribute to SOA), all of the client-side components will likely be developed using the same technology – although there are already ways to host Java code in .NET and vice-versa.

Of course, once we talk about web UIs, things are a bit different – but still similar. While web-server-side there may be a level of independence, for browser-side inter-component communications we’re still likely to target JavaScript. There, I’ve managed to say something technical supporting mashups and SOA without lying through my teeth.

On the Microsoft side with the recent release of the Composite Application Guidance & Library (pronounced "prism") I hope that more of these principles will be reaching the "smart client". The command pattern is especially critical in maintaining the separation while enabling communication to still occur so I’m glad that, as one of the Prism advisors, I was able to simplify that part (Glenn still has nightmares about that rooftop conversation).

Publish / Subscribe

In the "scalable solution" section up top I mentioned how publish/subscribe to the smart client is really just one implementation of CQS and SOA. So, how different is it really?

Well, there will probably be a different technology mapping. Instead of a star-schema OLAP product, we might simply store the published data in memory on the client. That is, if you designed your components to be technology agnostic.

In terms of the use of nServiceBus, the same component is going to be subscribing to the same type of message – all that’s different is that now every client will be having data pushed to them rather than this occurring server-side only.

You could have the same code deployed differently in the same system – stronger clients subscribing themselves, weaker ones using a remote server. Web servers would probably be considered stronger clients. This kind of flexible deployment has proven to be extremely valuable for my larger clients. The added benefit of enabling users to work (view data) even while offline (somewhere there’s no Wi-Fi) is just icing on the cake.

A Word of Warning

Once the client starts receiving notifications and handling them on a background thread (as it should), the code becomes susceptible to deadlocks and data races. Juval does a good job of outlining some of those with respect to the use of WCF. Prism doesn’t provide any assurances in this area either.
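
One common way to keep the client-side cache safe, sketched here in Python rather than .NET, is to guard it with a single lock and hand the UI a copy instead of the live structure. The class and method names are hypothetical, and this is only one of several possible approaches.

```python
import threading


class ClientGridCache:
    """Client-side cache written by a background thread, read by the UI."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rows = {}

    def on_notification(self, row_id, row):
        # Called on a background thread when a published message arrives.
        with self._lock:
            self._rows[row_id] = row

    def snapshot(self):
        # Called by the UI thread. Returning a copy means the UI never
        # iterates over a dict that the background thread is mutating.
        with self._lock:
            return dict(self._rows)
```

Holding only one lock, and never calling out to other code while holding it, also avoids the lock-ordering problems that lead to deadlocks.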

Summary

NServiceBus is not designed to be used for any and all types of communication in a given architecture. In the examples above, nServiceBus handles the publish/subscribe but leaves the synchronous RPC to existing solutions like WCF. Not only that, but synchronous RPC does have its place in architecture, just not across service boundaries. In all cases, data is served to users from a store different from that which transaction processing logic uses.

Command Query Separation is not only a good idea at the method/class level but has advantages at the SOA/System level as well – yet another good idea from 20 years ago that services build upon. Making use of CQS requires understanding your data and its uses – SOA builds on that by looking into data volatility and the freshness business requirements around it.

Finally, designing the components of your services in such a way that their dependency on technology is limited buys a lot of flexibility in terms of deployment and, consequently, significant performance and scalability gains.

Simple, it is. Easy, it is not.

[Ask Udi] Two services operating on the same entity

Where confusion starts

Problematic Assumption #1

Problematic Assumption #2

And that’s why it’s so difficult

If you want even more

More questions?

Microservices presentation [London 2014]

Finding Service Boundaries – illustrated in healthcare

Service-Oriented Composition (with video)

People, Politics, and the Single Responsibility Principle

Stepping back in time

Enter the age of computers and networks

And then came the politics

So, what of the Single Responsibility Principle

On that Microservices thing

So, where do I stand on the topic

On Services and Systems

How big is a service

Cross-service collaboration

Caveat on sharing data

In closing

Data Duplication and Replication

CQRS

SOA

But what’s so bad about duplication of data between services?

So what’s with the word “Replication” in the title of this post?

Cross-site integration without replication

In closing

And a word from our sponsor

One final thing

UI Composition vs. Server-side Orchestration

On failures

Back to the specific example

In summary

UI Composition Techniques for Correct Service Boundaries

The traditional way

What to do?

Some words on CQRS

In closing

Why you should be using CQRS almost everywhere…

So what’s the missing piece?

What does this have to do with CQRS?

What does SRP have to do with CQRS?

But what about integration?

OK, so how do ABCs do that?

But why not just use CQRS as the top-level pattern? ABCs are weird.

OK – that explains “everywhere”, what about “forever”?

This sounds like the collaboration you talk about with CQRS

I think you’re ready to make your point, so just make it already

Say what?

So the built-in history of this model is event-sourcing?

Why didn’t you just tell us this from the very beginning?

Inconsistent data, poor performance, or SOA – pick one

The problem

Some solutions

The solution to the performance issues

Next steps

The Danger of Centralized Workflows

The question

What’s behind the curtain

On productivity

On maintainability

Productivity take 2: testing and version control

In closing

Service Boundaries Aren’t Process Boundaries

The Known Unknowns of SOA

Logical and Physical Architecture

Clarified CQRS

Why CQRS

A picture for reference

Queries

Query Data Storage

Scaling Queries

Data modifications

Commands

Commands and Validation

Rethinking UIs and commands in light of validation

Reasons valid commands fail and what to do about it