Command Query Separation and SOA

Monday, August 11th, 2008.

One of the common questions I receive from people starting to use nServiceBus is how one-way messaging fits with showing the user a grid (or list) of data. Thinking about publish/subscribe usually just gets them even more confused. Trying to resolve all this with Service Oriented Architecture leaves them wondering – why bother?

[Figure: client server]

In regular client-server development, the server is responsible for providing the client with all CRUD (create, read, update, and delete) capabilities. However, when users look at data they do not often require it to be up to date to the second (given that they often look at the same screen for several seconds to minutes at a time). As such, retrieving data from the same table as that being used for highly consistent transaction processing creates contention resulting in poor performance for all CRUD actions under higher load.

A Scalable Solution

One of the common answers to this question is for the server/service to publish a message when data changes (say, as the result of processing a message) and for clients to subscribe to these messages. When such a notification arrives at a client, the client would cache the data it needs. Then, when the user wants to see a grid of data, that data is already on the client. Of course, this solution doesn’t work so well for older client machines (like some point of service devices) or if there are millions of rows of data.
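To make that concrete, here is a minimal sketch of a client caching published changes. The `Bus` class is a toy in-process stand-in for real messaging infrastructure (it is not the nServiceBus API), and `CustomerChanged` and `ClientCache` are hypothetical names chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerChanged:
    """Hypothetical notification published by the server when data changes."""
    customer_id: str
    name: str


class Bus:
    """Toy in-process publish/subscribe bus, for illustration only."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, message_type, handler):
        self._handlers[message_type].append(handler)

    def publish(self, message):
        for handler in self._handlers[type(message)]:
            handler(message)


class ClientCache:
    """Keeps the rows the grid needs, updated as notifications arrive."""

    def __init__(self, bus):
        self._rows = {}
        bus.subscribe(CustomerChanged, self._on_customer_changed)

    def _on_customer_changed(self, message):
        self._rows[message.customer_id] = message.name

    def grid_rows(self):
        # Served from local memory; no round trip to the server.
        return sorted(self._rows.items())


bus = Bus()
cache = ClientCache(bus)
bus.publish(CustomerChanged("c1", "Alice"))
bus.publish(CustomerChanged("c2", "Bob"))
bus.publish(CustomerChanged("c1", "Alice Smith"))
print(cache.grid_rows())
```

By the time the user opens the grid, the rows are already local. The cost is that every client holds its own copy of the data, which is exactly where the weak-client and millions-of-rows objections come from.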

The thing is that this solution is one implementation of a more general pattern – command query separation (CQS).

Command Query Separation

Wikipedia describes CQS as a pattern where "… every method should either be a command that performs an action, or a query that returns data to the caller, but not both. More formally, methods should return a value only if they are referentially transparent and hence possess no side effects."

Martin Fowler is less strict about the use of CQS allowing for exceptions: "Popping a stack is a good example of a modifier that modifies state. Meyer correctly says that you can avoid having this method, but it is a useful idiom. So I prefer to follow this principle when I can, but I’m prepared to break it to get my pop."
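A small sketch of the stack example may help. The method names `top` and `remove` are my own choice for illustration; the point is only that queries return a value without changing state, commands change state without returning a value, and `pop` is the pragmatic exception that does both.

```python
class Stack:
    def __init__(self):
        self._items = []

    # Queries: return a value, change nothing.
    def top(self):
        return self._items[-1]

    def is_empty(self):
        return not self._items

    # Commands: change state, return nothing.
    def push(self, item):
        self._items.append(item)

    def remove(self):
        del self._items[-1]

    # The pragmatic exception: a modifier that also returns a value.
    def pop(self):
        item = self.top()
        self.remove()
        return item


stack = Stack()
stack.push(1)
stack.push(2)
print(stack.top())   # query: the stack is unchanged afterwards
stack.remove()       # command: no return value
print(stack.pop())   # both at once
```

Strict CQS would offer only `top` and `remove`; `pop` is simply the two composed for convenience.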

So, how does separating commands from queries and SOA help at all in getting data to and from a UI? The answer is based on Pat Helland’s thinking as described in his article Data on the Inside vs. Data on the Outside.

Services Cross Boxes

The biggest lie around SOA is that services run.

Let that sink in a second.

Sure services have runnable components, but that’s not why they’re important.

I’ll skip the books of background and cut to the chase:

Services communicate with each other using publish/subscribe and one-way messaging. Services have components inside them. Inside a service, these components can communicate with each other using synchronous RPC, or any other mechanism. Also, these components can reside on different machines.

This is broader than just scaling out a service. There can be service components running on the client as well as the server.

SOA & CQS

Combining these two concepts together, here’s what comes out:

In this solution there are two services that span both client and server – one in charge of commands (create, update, delete), the other in charge of queries (read). These services communicate only via messages – one cannot access the database of the other.

The command service publishes messages about changes to data, to which the query service subscribes. When the query service receives such notifications, it saves the data in its own data store which may well have a different schema (optimized for queries like a star schema).
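Here is a rough sketch of that flow, under some loud assumptions: `OrderPlaced`, `CommandService`, and `QueryService` are hypothetical names, and publishing is reduced to a direct function call where a real system would deliver the message asynchronously over a bus. What it is meant to show is that each service owns its own store, and that the query side reshapes the data for reading.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderPlaced:
    """Hypothetical event published by the command service."""
    order_id: str
    customer: str
    amount: int


class CommandService:
    """Owns the transactional store; the query side never reads it."""

    def __init__(self, publish):
        self._orders = {}  # one record per order
        self._publish = publish

    def place_order(self, order_id, customer, amount):
        # A command: changes state and returns nothing.
        self._orders[order_id] = (customer, amount)
        self._publish(OrderPlaced(order_id, customer, amount))


class QueryService:
    """Owns a separate store shaped for reads (here: totals per customer)."""

    def __init__(self):
        self._total_by_customer = defaultdict(int)

    def handle(self, event):
        self._total_by_customer[event.customer] += event.amount

    def totals(self):
        # A query: returns data and changes nothing.
        return dict(self._total_by_customer)


query_service = QueryService()
# Assumption: a direct call stands in for pub/sub delivery.
command_service = CommandService(publish=query_service.handle)

command_service.place_order("o1", "alice", 30)
command_service.place_order("o2", "bob", 20)
command_service.place_order("o3", "alice", 15)
print(query_service.totals())
```

Notice that the query store never holds individual orders at all. It keeps only what its screens need, which is the sense in which its schema can differ from the transactional one.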

The client component which is in charge of showing grids of data to the user behaves the same as it would in a regular layered/tiered architecture, using synchronous blocking request/response to get its data – SOA doesn’t change that.

Composite Applications

Although the client side components of both the command and query services are hosted in the same process, they are very much independent of each other. That being said, from an interoperability perspective (the one that most people attribute to SOA), all of the client-side components will likely be developed using the same technology – although there are already ways to host Java code in .NET and vice-versa.

Of course, once we talk about web UIs things are a bit different – but still similar. While web-server-side there may be a level of independence, for browser-side inter-component communications we’re still likely to target JavaScript. There, I’ve managed to say something technical supporting mashups and SOA without lying through my teeth.

On the Microsoft side with the recent release of the Composite Application Guidance & Library (pronounced "prism") I hope that more of these principles will be reaching the "smart client". The command pattern is especially critical in maintaining the separation while enabling communication to still occur so I’m glad that, as one of the Prism advisors, I was able to simplify that part (Glenn still has nightmares about that rooftop conversation).

Publish / Subscribe

In the "scalable solution" section up top I mentioned how publish/subscribe to the smart client is really just one implementation of CQS and SOA. So, how different is it really?

[Figure: smart client pub/sub]

Well, there will probably be a different technology mapping. Instead of a star-schema OLAP product, we might simply store the published data in memory on the client. That is, if you designed your components to be technology agnostic.

In terms of the use of nServiceBus, the same component is going to be subscribing to the same type of message – all that’s different is that now every client will be having data pushed to them rather than this occurring server-side only.

You could have the same code deployed differently in the same system – stronger clients subscribing themselves, weaker ones using a remote server. Web servers would probably be considered stronger clients. This kind of flexible deployment has proven to be extremely valuable for my larger clients. The added benefit of enabling users to work (view data) even while offline (somewhere there’s no Wi-Fi) is just icing on the cake.
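One way to picture this flexibility is a subscriber that depends only on a small storage interface, so deployment decides which store sits behind it. Everything named here (`ReadStore`, `InMemoryStore`, `SqliteStore`, `ProductChangedHandler`) is hypothetical, and SQLite merely stands in for whatever server-side store you would actually use.

```python
import sqlite3
from typing import Protocol


class ReadStore(Protocol):
    """What the subscriber needs from storage, and nothing more."""

    def save(self, key: str, row: dict) -> None: ...

    def all_rows(self) -> list: ...


class InMemoryStore:
    """Suits a strong client that subscribes for itself."""

    def __init__(self):
        self._rows = {}

    def save(self, key, row):
        self._rows[key] = row

    def all_rows(self):
        return [self._rows[key] for key in sorted(self._rows)]


class SqliteStore:
    """Stands in for a server-side store serving weaker clients."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS rows (key TEXT PRIMARY KEY, name TEXT)"
        )

    def save(self, key, row):
        self._db.execute(
            "INSERT OR REPLACE INTO rows (key, name) VALUES (?, ?)",
            (key, row["name"]),
        )
        self._db.commit()

    def all_rows(self):
        cursor = self._db.execute("SELECT name FROM rows ORDER BY key")
        return [{"name": name} for (name,) in cursor]


class ProductChangedHandler:
    """The same subscriber code, whichever store it is deployed with."""

    def __init__(self, store: ReadStore):
        self._store = store

    def handle(self, product_id, name):
        self._store.save(product_id, {"name": name})


for store in (InMemoryStore(), SqliteStore()):
    handler = ProductChangedHandler(store)
    handler.handle("p2", "Gadget")
    handler.handle("p1", "Widget")
    handler.handle("p1", "Widget v2")
    print(type(store).__name__, store.all_rows())
```

The handler is identical in both deployments; only the object passed to its constructor changes.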

A Word of Warning

Once the client starts receiving notifications, and handling those on a background thread (as it should) the code becomes susceptible to deadlocks and data races. Juval does a good job of outlining some of those with respect to the use of WCF. Prism doesn’t provide any assurances in this area either.
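As one common mitigation (a sketch of my own, not something WCF or Prism hands you), the client-side cache can guard its state with a lock and give the UI thread copies rather than live references. `ThreadSafeCache` and `receive_notifications` are hypothetical names.

```python
import threading


class ThreadSafeCache:
    def __init__(self):
        self._lock = threading.Lock()
        self._rows = {}

    def apply(self, key, value):
        # Called from the background threads that receive notifications.
        with self._lock:
            self._rows[key] = value

    def snapshot(self):
        # Called from the UI thread. Returning a copy means the caller
        # never iterates a dict that another thread is mutating.
        with self._lock:
            return dict(self._rows)


cache = ThreadSafeCache()


def receive_notifications(start):
    for i in range(start, start + 1000):
        cache.apply(i, f"row {i}")


workers = [
    threading.Thread(target=receive_notifications, args=(n * 1000,))
    for n in range(4)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

print(len(cache.snapshot()))
```

The lock is held only for the dictionary operation itself and never while calling out to other code. Holding it across a callback into the UI is one of the easier ways to produce the deadlocks mentioned above.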

Summary

nServiceBus is not designed to be used for any and all types of communication in a given architecture. In the examples above, nServiceBus handles the publish/subscribe but leaves the synchronous RPC to existing solutions like WCF. Not only that, but synchronous RPC does have its place in architecture, just not across service boundaries. In all cases, data is served to users from a store different from that which transaction processing logic uses.

Command Query Separation is not only a good idea at the method/class level but has advantages at the SOA/System level as well – yet another good idea from 20 years ago that services build upon. Making use of CQS requires understanding your data and its uses – SOA builds on that by looking into data volatility and the freshness business requirements around it.

Finally, designing the components of your services in such a way that their dependency on technology is limited buys a lot of flexibility in terms of deployment and, consequently, significant performance and scalability gains.

Simple, it is. Easy, it is not.

Follow me on Twitter @UdiDahan.

24 Comments

Alex Simkin Says:
August 11th, 2008 at 8:34 am
Short question on CQS.

The “usual” return value of the CREATE command is a surrogate key of the inserted (or sometimes upserted) entity. Even if a natural key exists, it may be too clumsy to use. How should this (getting the identity of the created entity) be handled in CQS?

I mentioned upsert to underline that it’s not always possible to generate the identity on the client side.

commenter Says:
August 11th, 2008 at 10:54 am
It’s kind of funny that the Wikipedia article illustrates CQS with a function that returns a value and changes the state. I think they’re saying that returning a value via an out parameter isn’t ‘returning a value’.

So, to my understanding, CQS is really ‘only change state in methods with returntype == void’.

udidahan Says:
August 11th, 2008 at 1:33 pm
Alex,

Actually, the SQL insert command doesn’t return the value of an identity column – you specifically have to query for it. That being said, all SQL statements do return the number of rows affected, an apparent violation of CQS.

Personally, I like GUIDs for IDs.

udidahan Says:
August 11th, 2008 at 1:35 pm
Commenter,

Wikipedia is the outer join of the wisdom of crowds and the ignorance of herds, or so someone once told me 😉

Alex Simkin Says:
August 11th, 2008 at 3:06 pm
Even though you didn’t answer my question, your response gave me an idea: to use a GUID as a correlation value for a subsequent query. As I explained, I cannot generate the ID on the client; the record being “created” may already exist (the service automatically removes duplicate facts).

However, in this case, a new question arises: how long should the service keep the correlation value?

Sergey Shishkin Says:
August 11th, 2008 at 3:22 pm
Why don’t you use async messages for any potentially remote communication? Why at all do you encourage people to use RPC? (let’s leave streaming aside) Is your position more like “RPC is OK for in-service remote communication” or more like “Avoid using Messages for in-service communication”? Could you elaborate more on this?

Sergey Shishkin Says:
August 11th, 2008 at 3:28 pm
Off topic: the comments feed seems to be broken, I can’t subscribe. See http://feedvalidator.org/check.cgi?url=http%3a%2f%2fwww.udidahan.com%2fcomments%2ffeed%2f

Tyler Burd Says:
August 11th, 2008 at 3:50 pm
Thanks for the great post, Udi. This applies to my current situation, and it has helped clarify things greatly.

Alex, we’re toying with the idea of using an ID broker service for things that we can’t have GUIDs for. This way a client can send a RequestNextId message, the server generates the next id (and reserves it so no future entities can use it), and then sends back a message to the original sender. The correlation is done automatically for you in nServiceBus (via the IdForCorrelation property of the TransportMessage class if you’re using the MSMQTransport). The client receives the message and knows which RequestNextId message it correlates to.

udidahan Says:
August 11th, 2008 at 4:19 pm
Alex,

I’m not quite sure I understand your question, but does the REST style solution I described in a previous post correlate to the way you’re thinking about using GUIDs?

If so, then I’d use a long-running process like nServiceBus’ sagas to manage the allocation and cleanup of those GUIDs. You’d probably want to experiment with various cleanup times until you settle on a cleanup policy that works for you.

Hope that helps.

Alex Simkin Says:
August 11th, 2008 at 5:26 pm
Oh, now I see from where this idea has come to me 🙂

Thank you very much, indeed.

The straw that healed the camels back | The Freak Parade Says:
August 11th, 2008 at 10:11 pm
[…] the system within that paradigm, even when the fit seemed just terrible. Udi Dahan was kind enough to post about that exact topic in his blog last night, setting a critical thought-balloon free and allowing a system design […]

udidahan Says:
August 13th, 2008 at 2:20 am
Sergey,

> Why don’t you use async messages for any potentially remote communication?

When the architecture itself is synchronous, introducing one-way messaging brings quite a lot of complexity. Also, the robustness and scalability benefits are thwarted by the synchronous architecture.

You can’t be going both left and right at the same time.

> Why at all do you encourage people to use RPC?

I’m trying to encourage people to understand the tradeoffs between the various choices – one way/pub sub/rpc. What makes sense where, and why.

> Is your position more like “RPC is OK for in-service remote communication” or more like “Avoid using Messages for in-service communication”?

My position is that, if there’s any place where RPC will cause the minimal amount of damage, it’s within service boundaries.

I would definitely NOT suggest the latter. Messaging is such a strong paradigm that it requires the overall architecture to be aligned with it. When that occurs, you get tremendous benefits from using messaging within service boundaries as well.

Finally, the main point is to understand the different contexts for making decisions. Between services, under no circumstances should RPC be used – unless it’s an insignificant implementation detail in supporting a higher level messaging infrastructure (MSMQ makes use of RPC under the covers).

Hope that clears it up.

I’ll check on the comment feed, thanks.

Colin Jack Says:
October 21st, 2008 at 2:53 pm
@udidahan
Would it be fair to say that the upper layers mainly communicate with the Query services synchronously? I’m thinking here of getting all Customers, getting a Customer to edit and so on.

udidahan Says:
October 22nd, 2008 at 2:30 am
Colin,

That’s correct.

SOA, EDA, and CEP a winning combo Says:
November 1st, 2008 at 4:57 pm
[…] How client interaction fits with SOA […]

Steve Evers Says:
December 16th, 2008 at 5:21 pm
In the article you showed that the query component could be moved to the client [‘Instead of a star-schema OLAP product, we might simply store the published data in memory on the client.’] However, wouldn’t this mean that the client would initially have to request the whole dataset, as the data would be lost every time the client or app was restarted (or when a new client comes online)? How scalable would this be if large numbers of clients were requesting data in this way, especially over slow connections?

The client could save the information to disk, however they may also be disconnected or off for long periods of time. You mentioned in your recent article ‘Lost Notifications? No Problem.’ that [‘Publishers have no choice but to throw away messages after a certain period of time.’] What would you suggest in this situation (a differential query, P2P) or force the server to maintain the messages indefinitely (eg Outlook and Exchange)?

udidahan Says:
December 19th, 2008 at 2:58 pm
Steve,

Re: How scalable would this be if large numbers of clients were requesting data in this way, especially over slow connections?

This is the way clients that don’t use pub/sub do things anyway 🙂

The answer to the rest of your question deserves a full blog post of its own, if not more than one 🙂

Jump into >> Command Query Separation as an Architectural Concept « Jump into >> Says:
June 27th, 2009 at 12:07 pm
[…] Command Query Separation and SOA […]

The catalogue metaphor and command-query seperation architectures - Ian Cooper - CodeBetter.Com - Stuff you need to Code Better! Says:
October 8th, 2009 at 5:06 am
[…] separation is being popularized today with architects like Greg Young and Udi Dahan as a way of architecting systems. It can be hard to grasp as a concept, even if we understand […]

Elegant Code » DTO’s, DDD & The Anemic Domain Model Says:
November 13th, 2009 at 5:07 pm
[…] all somewhat related: http://jonathan-oliver.blogspot.com/2009/03/dddd-and-cqs-getting-started.html http://www.udidahan.com/2008/08/11/command-query-separation-and-soa/ http://codebetter.com/blogs/gregyoung/archive/2009/08/13/command-query-separation.aspx […]

Living in the Tech Avalanche Generation » Entity Framework – Don’t take aggregates for granted Says:
December 6th, 2009 at 6:44 am
[…] other thing to consider is that aggregations are usually a function of reporting and this is where architecturally speaking CQRS can help us solve the problem and build a more highly concurrent system by publishing […]

Roy Says:
January 13th, 2010 at 1:06 pm
I’m just getting up to speed with this CQS technique. I’ve noticed that the event store is usually serialized events. I’m definitely not used to serializing objects in a database, but my concern is what if an Event object needs to be changed? What happens when I need to add a property? How will that affect the other objects of the same type already stored? You can’t have two different versions of ClientMovedEvents in the same database. Wouldn’t that break the code if you need to replay them?

udidahan Says:
January 13th, 2010 at 1:55 pm
Roy,

There is a difference between CQS (recently renamed to CQRS – see here for more info: http://www.udidahan.com/2009/12/09/clarified-cqrs/ ) and what’s known as Event Sourcing. CQRS doesn’t make any statement about how data in the storage serving commands is structured – it could be relational, hierarchical, object-oriented, or anything else.

I think the best place to ask these questions is on the comment thread going on here:

http://blog.fohjin.com/blog/2009/11/12/CQRS_a_la_Greg_Young

Jiho Han Says:
July 20th, 2010 at 11:26 am
I like the CQS (or CQRS) approach in how it separates the domain concerns from the reporting concerns.

My question is regarding the message/event published from the domain to the reporting side. I understand that then the reporting side uses the message to populate its data store however it wants to. What happens when there is a failure during this transition? Does the message get re-applied until it succeeds?

Is the reporting store considered a “cache” of the system state? Are we expecting to be able to recreate the reporting store from scratch if needed using the domain store? Or does that capability only come with using event sourcing?

In general what happens when the domain and reporting are out of sync? Would you ever manually “fix” the reporting store?