CQRS isn’t the answer – it’s just one of the questions

Friday, May 7th, 2010.

With the growing interest in Command/Query Responsibility Segregation (CQRS), more people are starting to ask questions about how to apply it to their applications. CQRS is actually in danger of reaching “best practice” status, at which point people will apply it indiscriminately with truly terrible results.

One of the things that I’ve been trying to do with my presentations around the world on CQRS is to explain the why behind it, just as much as the what. The problem with the format of these presentations is that they’re designed to communicate a fairly closed message: here’s the problem, here’s how that problem manifests itself, here’s a solution.

In this post, I’m going to try to go deeper.

The hitchhiker’s guide to the galaxy

In this most excellent book, one of the things that struck me was the theme that made its way through the whole book – starting with the answer to life, the universe, and everything: 42. By the time you get to the end of the book, you find out that the real question to life, the universe, and everything is “what do you get when you multiply 6 by 9”. And that’s how the book leaves it.

We engineers can’t just accept the fact that the book would say that 6*9 = 42 when we know it’s 54. After bashing our heads on the rigid rules of math, we realize that not all math problems are necessarily in base 10, and that if we switch to base 13, the number 42 is 4*13 + 2 = 54. So, the book was right – but that’s not the point.

What’s the point?

The hitchhiker’s guide is an example of a teaching technique which presents an apparent paradox, leaving the student to dig up unspoken and unthought assumptions in order to resolve it. Key to this technique are rigid rules which do not allow any compromise or shortcuts on the student’s part.

The purpose of this technique is not for the student to learn the answer, but to gain deeper understanding, which in turn changes the way they go about thinking about problems in the future.

So, when given the problem 4*5, we do not just immediately answer 20, instead we clarify in which numeric base the question is being phrased, and only then go to solve the problem. In base 13, the answer would be 17. In hex, the answer would be 14.
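The base arithmetic above can be checked with a short Python sketch. The helper name to_base and the digit alphabet are illustrative choices for this example, not something from the book or from CQRS.

```python
DIGITS = "0123456789ABCDEF"  # digit symbols for bases up to 16 (illustrative choice)


def to_base(n: int, base: int) -> str:
    """Render a non-negative integer as a string of digits in the given base."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("base must be between 2 and 16")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    out = []
    while n > 0:
        # Peel off the least significant digit on each pass.
        n, remainder = divmod(n, base)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


print(to_base(6 * 9, 13))  # 42
print(to_base(4 * 5, 13))  # 17
print(to_base(4 * 5, 16))  # 14
print(to_base(4 * 5, 10))  # 20
```

The product itself never changes; only the way it is written down depends on the base, which is why the question has to be clarified before the answer means anything.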

The externally visible change is that we know which questions to ask in order to arrive at the right answer – not that we know the answer ahead of time.

Making an “ass” out of “u” and “me”

Let’s start at the end – one of the unspoken assumptions that has been causing problems:

All businesses can be treated the same from the perspective of software.

In our previous example, we assumed that all math problems use base 10. It turns out that different bases are useful for different domains (like base 2 for computers). We can say similar things about degrees and radians in geometry. The more we look at the real world, the more we see this repeating itself. There’s no reason that software should be any different.

Base 10 is not a ubiquitous best practice. We shouldn’t be surprised that there really aren’t best practices for software either.

Here’s another problematic assumption:

“The business” can (and does) tell us what it needs in a way we can understand.

So many software fads have been built on the quicksand of this assumption. OOAD – on verbs and nouns. 4GL and other visual tools that “the business” will use directly. SOA – on IT business alignment. I expect we haven’t seen the end of this.

Some of you may be wondering why this is false, others are sagely nodding their heads in agreement.

The myth of “the business”

Unless you have a single user, who is also the CEO paying for the development, there is no “the”. It’s an amalgam of people with different backgrounds, skills, and goals – there is no homogeneity. Even if no software was involved, many business organizations are dysfunctional with conflicting goals, policies, and politics.

To some extent, we technical people have hidden ourselves away in IT to avoid the scary world of business whose rules we don’t understand. With the rise in importance of information to the world, we’ve been pulled back – being forced to talk to people, and not just computers. Luckily, we’ve been able to create a buffer to insulate ourselves – we’ve taken the less successful technical people from our herd and nominated them “business analysts”. No, not all companies do it this way, but we do need to take a minute to reflect on how information flows from the business Mars into and out of the IT Venus.

On human communication

Even if we made this insulation layer more permeable, allowing and encouraging more technical people and business people to cross its boundary, we still need to deal with the problem of two humans communicating with each other. There are enough books that have been written on this topic, so I won’t go into that beyond recommending (strongly) to technical people to read (some of) them.

Rather, I’d like to focus on the environment in which these discussions take place. IT has been around long enough, and users have used computers long enough, that a certain amount of tainting has taken place. If the world was a trial, the evidence would have been thrown out as untrustworthy.

When users tell you what they want, they’re usually framing that with respect to the current system that they’re using. “Like the old system – but faster, and with better search, and more information on that screen, and…”

At this point, business analysts write down and formalize these “requirements” into some IT-sanctioned structure (use cases, user stories, whatever), at which point developers are told to build it. Users only know what they didn’t want when developers deliver exactly what was asked.

How can that be?

These are not the “requirements” you are looking for

Users ultimately dictate solutions to us, as a delta from the previous set of solutions we’ve delivered them. That’s just human psychology – writer’s block when looking at a blank page, as compared to the ease with which we provide “constructive criticism” on somebody else’s work.

We need to get the real requirements. We need to probe beyond the veneer:

Why do you need this additional screen?
What real-world trigger will cause you to open it?
Is there more than one trigger?
How are they different?
etc, etc, etc…

This is real work – different work than programming. It requires different skills. And that’s not even getting into the political navigation between competing organizational forces.

But let’s say that you don’t have (enough) people with these skills in your organization. What then?

Enter CQRS

CQRS gives us a set of questions to ask, and some rigid rules that our answers must conform to. If our answers don’t fit, we need to go back to the drawing board and move things around and/or go back to “the business” and seek deeper understanding there.

For each screen/task/piece of data:

Will multiple users be collaborating on data related to this task?
Look at every shred of raw data, not just at the entity level.
Are there business consistency requirements around groups of raw data?

If “the business” answers no – ask them if they see that answer changing, and if so, in what time frame, and why. What changing conditions in the business environment would cause that to change – what other parts of the system would need to be re-examined under those conditions.

If, after understanding all that, you find a true single-user-only-thing, then you can use standard “CRUD” techniques and technologies. There are no inherent time-propagation problems in a single-user environment – so eventual consistency is beyond pointless, it actually makes matters worse.

On the other hand, if the business-data-space is collaborative, the inherent time-propagation of information between actors means they will be making decisions on data that isn’t up-to-the-millisecond-accurate anyway. This is physics, gravity – you can’t fight it (and win).
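As a rough sketch only, the questions above can be written down as a small decision helper. The names TaskAnswers and suggested_approach, and the wording of the returned strings, are hypothetical and exist just for this illustration; the real work is in getting trustworthy answers from the business, not in the branching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskAnswers:
    """Answers gathered for one screen/task/piece of data (hypothetical shape)."""
    collaborative_now: bool       # do multiple users work on this data today?
    collaboration_expected: bool  # does the business see that answer changing?


def suggested_approach(answers: TaskAnswers) -> str:
    """Map the answers to a starting point for the design conversation."""
    if answers.collaborative_now:
        # Data shown to a user may already be stale, so design for that.
        return "CQRS"
    if answers.collaboration_expected:
        # Not collaborative yet: find out when and why that would change.
        return "re-examine with the business"
    # A true single-user-only thing: eventual consistency would only hurt.
    return "CRUD"


print(suggested_approach(TaskAnswers(collaborative_now=False, collaboration_expected=False)))
print(suggested_approach(TaskAnswers(collaborative_now=True, collaboration_expected=False)))
```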

The rule for collaboration

Actors must be able to submit one-way commands that will fail only under exceptional business circumstances.

The challenge we have is how to achieve the real business objectives uncovered in our previous “requirements excavation” activities and follow this rule at the same time. This will likely involve a different user-system interaction than those implemented in the past. UI design is part of the solution domain – it shouldn’t be dictated by the business (otherwise it’s like someone asking you to run a marathon, but also dictating how you do so, like by tying your shoelaces together).
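To make the rule a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the ordering domain, the names PlaceOrder, CommandBus, and Inventory, and the in-memory queue standing in for whatever durable messaging a real system would use. The point it tries to show is that sending returns nothing to the caller, and that a shortage of stock is handled as a normal business outcome (a backorder) rather than as a failed command.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaceOrder:
    """Hypothetical command: states the user's intent and expects no reply."""
    customer_id: str
    quantity: int


class CommandBus:
    """In-memory stand-in for a durable queue (an assumption of this sketch)."""

    def __init__(self) -> None:
        self._queue = deque()

    def send(self, command: PlaceOrder) -> None:
        # One-way: nothing is returned to the caller, the command is only queued.
        self._queue.append(command)

    def drain(self, handler) -> None:
        # Process queued commands in arrival order.
        while self._queue:
            handler(self._queue.popleft())


class Inventory:
    """Handler designed so that ordinary commands always succeed."""

    def __init__(self, on_hand: int) -> None:
        self.on_hand = on_hand
        self.backordered = 0

    def handle(self, command: PlaceOrder) -> None:
        if command.quantity <= 0:
            # Exceptional: a well-behaved client never sends this.
            raise ValueError("quantity must be positive")
        shipped = min(command.quantity, self.on_hand)
        self.on_hand -= shipped
        # Shortage is a business outcome, not a command failure.
        self.backordered += command.quantity - shipped


bus = CommandBus()
inventory = Inventory(on_hand=3)
bus.send(PlaceOrder("alice", 2))
bus.send(PlaceOrder("bob", 5))
bus.drain(inventory.handle)
print(inventory.on_hand, inventory.backordered)  # 0 4
```

Whether a backorder is acceptable is itself a business question; the sketch only shows the shape of an interaction in which the user is not asked to retry because someone else got there first.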

Many of the technical patterns I described in my previous blog post describe the tools involved. BTW, hackers can be considered “exceptional actors” – the business actually wants their commands to fail.

In Summary

The hard and fast rule of CQRS about one-way commands is relevant for collaborative domains only. This domain has inherent eventual consistency – in the real world. Taking that and baking it into our solution domain is how we align with the business.

The process we go through, until ultimately arriving at one-way-almost-always-successful-commands is business analysis. Rejecting pre-formulated solutions, truly understanding the business drivers, and then representing those as directly as possible in our solution domain – that’s our job.

After doing this enough times and/or in more than one business domain, we may gain the insight that there is no cookie-cutter, one-size-fits-all, best-practice solution architecture for everything. Each problem domain is distinct and different – and we need to understand the details, because they should shape the resulting software structure.

The next time the business tells us to implement 42, we’ll use CQRS along with other questioning techniques until we can get “6 x 9” out of them, learning from the exercise which parts of the business are significant and stable – ultimately helping us to “build the right system, and to build the system right”.

Don’t Panic 🙂

Follow me on Twitter @UdiDahan.

24 Comments

Rinat Abdullin Says:
May 7th, 2010 at 5:27 am
Nice article, I really liked the concise rule of collaboration in CQRS – straight to the point.

Potential typo: “After understanding all that and you find a true single-user-only-thing,”

BTW, Occam’s razor applies to the base of 13 theory – it was just a joke, as admitted by Adams))

Remco Ros Says:
May 7th, 2010 at 5:47 am
Really nice post Udi.

Funny side note: No matter what numeric base you take, 1 + 1 will always be 2 😉

junior programmer Says:
May 7th, 2010 at 6:00 am
please don’t talk vague! we need concrete examples.

Nitpicker Says:
May 7th, 2010 at 6:45 am
@Remco Ros:
1 + 1 = 10 in base 2!

PMBauer Says:
May 7th, 2010 at 7:35 am
@Remco Ros
Nope, in base 2, 1+1 = 10.
😀

Josh Schwartzberg Says:
May 7th, 2010 at 9:00 am
Great post. So basically 2 or more users working on the same data warrants CQRS :)? Before jumping right into CQRS and eventual consistency, Pessimistic Locking with CRUD can work well in small (5-10 user) collaborative data systems; preventing the user from ever attempting to alter stale data at the business cost of “waiting in line” to do so. Obviously this business cost grows exponentially the larger the system gets, but if you are fairly sure it’s not going to happen it should be evaluated as a solution.

Diogo Mafra Says:
May 7th, 2010 at 4:24 pm
Great post!! I would like to see more people asking the real question, the WHY behind the problem, and also realizing that there is not only one solution for everything.

By asking the why, we gain a better view of the problem and, sometimes, it helps the user to realize what he really needs.

James Pelletier Says:
May 9th, 2010 at 1:08 am
So often when I start asking why I get ushered out the door… I’m having trouble making people see these are important questions.

Tarek Says:
May 9th, 2010 at 3:37 am
Josh,

“Pessimistic Locking” is interpreted in different ways. Merely locking the record while you’re updating it does not really solve the problem if the other concurrent request will be served right after you release the lock. The other user has already made a decision based on data he had on the screen which is now stale.

A common solution to this problem is to read the version field of the record, store it on the client side, and then submit it along with the changes. So, if you read a record, spend 10 minutes updating it and during those 10 minutes someone else changes it then you’ll get an error. This happened to me the other day and I ended up losing 10 minutes of work and cursing the application 🙁

Of course, there is the solution of merging the changes but again that can only be done based on deep understanding of the business requirements.

I’m not saying CQRS is necessarily the solution but some solution is definitely required for this problem.

Chris Nicola Says:
May 9th, 2010 at 11:30 am
Udi, this is all very interesting and as a fairly pragmatic software developer I’d never assume CQRS is a silver bullet solution for anything. However while in this post you have given yet another reason for why and when we should use CQRS I’ve felt you have not really supported your assertion that there are situations when not to use it.

Could you give at least one concrete example of a situation where there would be notable drawbacks to using CQRS as a solution and perhaps illustrate what those drawbacks are?

Josh Schwartzberg Says:
May 9th, 2010 at 9:48 pm
Tarek,

More explicitly, if you do not allow the second user to click the “begin editing” button for an entity, they would never accidentally do some work and find out that all is lost when they go to hit save because someone did it first. This is much like the exclusive check-out mode that many source control systems have. Again, it’s not ideal behavior but it could be a valid solution for smaller systems.

Alex Simkin Says:
May 10th, 2010 at 7:25 am
Base 10 is not a ubiquitous best practice.

In fact it is. 10 (one followed by zero) is ALWAYS base in any positional numbering system.

So you should have been using word “ten” not symbol 10.

Frank Quednau Says:
May 10th, 2010 at 3:28 pm
Reminds me of the fact that there is a formula to calculate an arbitrary digit being at the nth position of the number Pi without calculating the preceding digits – however, it only works with the binary number system. (http://en.wikipedia.org/wiki/Bailey%E2%80%93Borwein%E2%80%93Plouffe_formula)

Tarek Nabil Says:
May 11th, 2010 at 2:18 am
Josh,

What you’re describing is sometimes referred to as a “Pessimistic Offline Lock”. Although it is suitable for certain situations, it has its limitations, especially in web applications, where users can abandon their sessions and the offline lock would only be released on timeout (whether of the session or the lock).

If you try to fix that by reducing the timeout period of the lock then you risk having the locks timeout while the users are still editing the data.

ste Says:
May 13th, 2010 at 12:17 pm
How is “true single-user-only-thing” defined?
Does it mean, that only a single user views or edits certain data at a time but there could be some kind of process which passes the data from one user to another. Or does it mean that certain data is owned by a single user and only used by this user (like a private document)?

Charlie Barker Says:
May 13th, 2010 at 6:22 pm
I like this rule of thumb; it is a good sanity check. I find that once I have a sufficient understanding of the business problem, building a solution becomes straightforward. The tricky part is building the understanding. In any given scenario there are many factors that will affect your ability to build your understanding. I find the larger and more complex problems take many sessions, starting at the high level and working down to the detail.
The toughest tasks are where a business is innovating so there are no experts to ask. In this scenario I accept early on that there may be significant direction changes along the way and thus I’m prepared to scrap or redesign as learning dictates.

Adam Says:
May 14th, 2010 at 8:58 am
Great article. Although you are explicitly talking about CQRS I feel like this applies on a much broader scope of “right tool for the job.” Too many times people find a hammer that they like and decide to use that hammer for everything, even if they really need a screwdriver, or jigsaw.

Kevin Jordan Says:
May 17th, 2010 at 9:28 am
“CQRS is actually in danger of reaching “best practice” status at which point in time people will apply it indiscriminately with truly terrible results.”

One does get this impression by the amount of material out there on the subject. It’s easy to see why this is the case. Here are the driving forces from your previous blog entry:

“Collaboration refers to circumstances under which multiple actors will be using/modifying the same set of data”

“Staleness refers to the fact that in a collaborative environment, once data has been shown to a user, that same data may have been changed by another actor – it is stale.”

The problem is an argument could be made that both of these driving forces were true for all of the applications (client side and web) I have built over the last 10 years.

I haven’t implemented CQRS or looked at a sample app as of yet, just doing a lot of reading and trying to wrap my head around when to apply it. Great articles….

Cheers

Eben Says:
May 24th, 2010 at 12:44 am
Nice post!

I do think, however, that as time goes by certain techniques *can* be used in *all* situations but it may be that in some applications it is overkill 🙂 — as would be the case in a ‘small application’

Anyway, any application that is going to be going anywhere (in terms of longevity and complexity) should be looking at CQRS as best practise.

Eventual consistency, OTOH, is another matter. I would not *not* consider it as best practice.

Eben

Matt S. Says:
June 29th, 2010 at 12:28 pm
I’m still having a hard time wrapping my brain around this technique when it comes to my users’ (primarily website users) need for instant gratification.

They expect to see their recent edits applied. I previously commented about a system that allows its users to update their account information. If they make such an edit, realize they forgot something, then go back to the edit view/page to change what they forgot, they may still see the old information (prior to their edit moments ago).

I would really rather not have to code for an ambitious user that constantly checks to make sure the change they just made is taking, and reissuing it if not (i.e. view edit page, submit change, view edit page and still see old data, submit change again, view edit page, submit change again, …).

Am I missing something or are we only talking about the customer service representative example I keep seeing replayed?

udidahan Says:
June 30th, 2010 at 11:29 pm
Matt,

Part of CQRS is explicitly telling users how stale the data they’re looking at is: “Account information as of today at 12:00pm”.

Keep in mind that if you have a user who is aggressively modifying some data, it’s likely that that data is private at that point in time – possibly requiring the user to perform a kind of “publish” operation to make it visible to other users.

When the user is working on private data, there is no collaboration going on, therefore no need for CQRS.

Hope that helps.

Matt S. Says:
July 6th, 2010 at 7:10 am
Udi,

That last bit REALLY helped! Making special cases for private data to directly query the live data makes complete sense. So does letting users know the time the current data was retrieved in a collaborative environment.

Thanks again for clearing it up.

Maninder Batth Says:
September 21st, 2010 at 9:16 am
Udi,
To what kind of data is CQRS applicable? Surely, with organizations having petabytes of information, it is not feasible that every function subscribes to “interesting” events and stores a local copy.
For example, take “reference data”. Typically there is a logical function called reference data services, which provides access to clean and up-to-date reference data. Should reference data systems start publishing their changes, and every system interested in reference data maintain a copy of events?
Secondly, CQRS also seems to imply additional CRUD capabilities for every entity that a function is interested in. Say function A is interested in events X, Y, Z, and the responsibility of function A is to analyze and present some information. One way to accomplish this is that function A pulls the information about X, Y and Z from their respective sources. In the CQRS model, however, function A will subscribe to these events and now needs some functionality to “do something” once it gets X, Y and Z: perhaps update logic, perhaps insert. Additionally, it has logic to retrieve X, Y and Z from its local db when it wants to analyze them.
It doesn’t seem bad with a few entities, but as subscription interest grows, so will this extra logic.