Validation – Udi Dahan – The Software Simplist

CQRS Video Online

udidahan — Fri, 26 Feb 2010 09:42:45 +0000

A couple of weeks ago I gave a talk on Command/Query Responsibility Segregation in London.

The recording of the talk is online here.

There is one important thing that I didn’t have enough time to cover, but I want you to keep in mind as you’re watching this. It is that CQRS is applicable only *within* the context of a single service/BC – NOT across or between them.

Let me know what you think.

Clarified CQRS

udidahan — Wed, 09 Dec 2009 14:57:19 +0000

After listening how the community has interpreted Command-Query Responsibility Segregation I think that the time has come for some clarification. Some have been tying it together to Event Sourcing. Most have been overlaying their previous layered architecture assumptions on it. Here I hope to identify CQRS itself, and describe in which places it can connect to other patterns.

Download as PDF – this is quite a long post.

Why CQRS

Before describing the details of CQRS we need to understand the two main driving forces behind it: collaboration and staleness.

Collaboration refers to circumstances under which multiple actors will be using/modifying the same set of data – whether or not the intention of the actors is actually to collaborate with each other. There are often rules which indicate which user can perform which kind of modification and modifications that may have been acceptable in one case may not be acceptable in others. We’ll give some examples shortly. Actors can be human like normal users, or automated like software.

Staleness refers to the fact that in a collaborative environment, once data has been shown to a user, that same data may have been changed by another actor – it is stale. Almost any system which makes use of a cache is serving stale data – often for performance reasons. What this means is that we cannot entirely trust our users decisions, as they could have been made based on out-of-date information.

Standard layered architectures don’t explicitly deal with either of these issues. While putting everything in the same database may be one step in the direction of handling collaboration, staleness is usually exacerbated in those architectures by the use of caches as a performance-improving afterthought.

A picture for reference

I’ve given some talks about CQRS using this diagram to explain it:

The boxes named AC are Autonomous Components. We’ll describe what makes them autonomous when discussing commands. But before we go into the complicated parts, let’s start with queries:

Queries

If the data we’re going to be showing users is stale anyway, is it really necessary to go to the master database and get it from there? Why transform those 3rd normal form structures to domain objects if we just want data – not any rule-preserving behaviors? Why transform those domain objects to DTOs to transfer them across a wire, and who said that wire has to be exactly there? Why transform those DTOs to view model objects?

In short, it looks like we’re doing a heck of a lot of unnecessary work based on the assumption that reusing code that has already been written will be easier than just solving the problem at hand. Let’s try a different approach:

How about we create an additional data store whose data can be a bit out of sync with the master database – I mean, the data we’re showing the user is stale anyway, so why not reflect in the data store itself. We’ll come up with an approach later to keep this data store more or less in sync.

Now, what would be the correct structure for this data store? How about just like the view model? One table for each view. Then our client could simply SELECT * FROM MyViewTable (or possibly pass in an ID in a where clause), and bind the result to the screen. That would be just as simple as can be. You could wrap that up with a thin facade if you feel the need, or with stored procedures, or using AutoMapper which can simply map from a data reader to your view model class. The thing is that the view model structures are already wire-friendly, so you don’t need to transform them to anything else.

You could even consider taking that data store and putting it in your web tier. It’s just as secure as an in-memory cache in your web tier. Give your web servers SELECT only permissions on those tables and you should be fine.

Query Data Storage

While you can use a regular database as your query data store it isn’t the only option. Consider that the query schema is in essence identical to your view model. You don’t have any relationships between your various view model classes, so you shouldn’t need any relationships between the tables in the query data store.

So do you actually need a relational database?

The answer is no, but for all practical purposes and due to organizational inertia, it is probably your best choice (for now).

Scaling Queries

Since your queries are now being performed off of a separate data store than your master database, and there is no assumption that the data that’s being served is 100% up to date, you can easily add more instances of these stores without worrying that they don’t contain the exact same data. The same mechanism that updates one instance can be used for many instances, as we’ll see later.

This gives you cheap horizontal scaling for your queries. Also, since your not doing nearly as much transformation, the latency per query goes down as well. Simple code is fast code.

Data modifications

Since our users are making decisions based on stale data, we need to be more discerning about which things we let through. Here’s a scenario explaining why:

Let’s say we have a customer service representative who is one the phone with a customer. This user is looking at the customer’s details on the screen and wants to make them a ‘preferred’ customer, as well as modifying their address, changing their title from Ms to Mrs, changing their last name, and indicating that they’re now married. What the user doesn’t know is that after opening the screen, an event arrived from the billing department indicating that this same customer doesn’t pay their bills – they’re delinquent. At this point, our user submits their changes.

Should we accept their changes?

Well, we should accept some of them, but not the change to ‘preferred’, since the customer is delinquent. But writing those kinds of checks is a pain – we need to do a diff on the data, infer what the changes mean, which ones are related to each other (name change, title change) and which are separate, identify which data to check against – not just compared to the data the user retrieved, but compared to the current state in the database, and then reject or accept.

Unfortunately for our users, we tend to reject the whole thing if any part of it is off. At that point, our users have to refresh their screen to get the up-to-date data, and retype in all the previous changes, hoping that this time we won’t yell at them because of an optimistic concurrency conflict.

As we get larger entities with more fields on them, we also get more actors working with those same entities, and the higher the likelihood that something will touch some attribute of them at any given time, increasing the number of concurrency conflicts.

If only there was some way for our users to provide us with the right level of granularity and intent when modifying data. That’s what commands are all about.

Commands

A core element of CQRS is rethinking the design of the user interface to enable us to capture our users’ intent such that making a customer preferred is a different unit of work for the user than indicating that the customer has moved or that they’ve gotten married. Using an Excel-like UI for data changes doesn’t capture intent, as we saw above.

We could even consider allowing our users to submit a new command even before they’ve received confirmation on the previous one. We could have a little widget on the side showing the user their pending commands, checking them off asynchronously as we receive confirmation from the server, or marking them with an X if they fail. The user could then double-click that failed task to find information about what happened.

Note that the client sends commands to the server – it doesn’t publish them. Publishing is reserved for events which state a fact – that something has happened, and that the publisher has no concern about what receivers of that event do with it.

Commands and Validation

In thinking through what could make a command fail, one topic that comes up is validation. Validation is different from business rules in that it states a context-independent fact about a command. Either a command is valid, or it isn’t. Business rules on the other hand are context dependent.

In the example we saw before, the data our customer service rep submitted was valid, it was only due to the billing event arriving earlier which required the command to be rejected. Had that billing event not arrived, the data would have been accepted.

Even though a command may be valid, there still may be reasons to reject it.

As such, validation can be performed on the client, checking that all fields required for that command are there, number and date ranges are OK, that kind of thing. The server would still validate all commands that arrive, not trusting clients to do the validation.

Rethinking UIs and commands in light of validation

The client can make of the query data store when validating commands. For example, before submitting a command that the customer has moved, we can check that the street name exists in the query data store.

At that point, we may rethink the UI and have an auto-completing text box for the street name, thus ensuring that the street name we’ll pass in the command will be valid. But why not take things a step further? Why not pass in the street ID instead of its name? Have the command represent the street not as a string, but as an ID (int, guid, whatever).

On the server side, the only reason that such a command would fail would be due to concurrency – that someone had deleted that street and that that hadn’t been reflected in the query store yet; a fairly exceptional set of circumstances.

Reasons valid commands fail and what to do about it

So we’ve got a well-behaved client that is sending valid commands, yet the server still decides to reject them. Often the circumstances for the rejection are related to other actors changing state relevant to the processing of that command.

In the CRM example above, it is only because the billing event arrived first. But “first” could be a millisecond before our command. What if our user pressed the button a millisecond earlier? Should that actually change the business outcome? Shouldn’t we expect our system to behave the same when observed from the outside?

So, if the billing event arrived second, shouldn’t that revert preferred customers to regular ones? Not only that, but shouldn’t the customer be notified of this, like by sending them an email? In which case, why not have this be the behavior for the case where the billing event arrives first? And if we’ve already got a notification model set up, do we really need to return an error to the customer service rep? I mean, it’s not like they can do anything about it other than notifying the customer.

So, if we’re not returning errors to the client (who is already sending us valid commands), maybe all we need to do on the client when sending a command is to tell the user “thank you, you will receive confirmation via email shortly”. We don’t even need the UI widget showing pending commands.

Commands and Autonomy

What we see is that in this model, commands don’t need to be processed immediately – they can be queued. How fast they get processed is a question of Service-Level Agreement (SLA) and not architecturally significant. This is one of the things that makes that node that processes commands autonomous from a runtime perspective – we don’t require an always-on connection to the client.

Also, we shouldn’t need to access the query store to process commands – any state that is needed should be managed by the autonomous component – that’s part of the meaning of autonomy.

Another part is the issue of failed message processing due to the database being down or hitting a deadlock. There is no reason that such errors should be returned to the client – we can just rollback and try again. When an administrator brings the database back up, all the message waiting in the queue will then be processed successfully and our users receive confirmation.

The system as a whole is quite a bit more robust to any error conditions.

Also, since we don’t have queries going through this database any more, the database itself is able to keep more rows/pages in memory which serve commands, improving performance. When both commands and queries were being served off of the same tables, the database server was always juggling rows between the two.

Autonomous Components

While in the picture above we see all commands going to the same AC, we could logically have each command processed by a different AC, each with it’s own queue. That would give us visibility into which queue was the longest, letting us see very easily which part of the system was the bottleneck. While this is interesting for developers, it is critical for system administrators.

Since commands wait in queues, we can now add more processing nodes behind those queues (using the distributor with NServiceBus) so that we’re only scaling the part of the system that’s slow. No need to waste servers on any other requests.

Service Layers

Our command processing objects in the various autonomous components actually make up our service layer. The reason you don’t see this layer explicitly represented in CQRS is that it isn’t really there, at least not as an identifiable logical collection of related objects – here’s why:

In the layered architecture (AKA 3-Tier) approach, there is no statement about dependencies between objects within a layer, or rather it is implied to be allowed. However, when taking a command-oriented view on the service layer, what we see are objects handling different types of commands. Each command is independent of the other, so why should we allow the objects which handle them to depend on each other?

Dependencies are things which should be avoided, unless there is good reason for them.

Keeping the command handling objects independent of each other will allow us to more easily version our system, one command at a time, not needing even to bring down the entire system, given that the new version is backwards compatible with the previous one.

Therefore, keep each command handler in its own VS project, or possibly even in its own solution, thus guiding developers away from introducing dependencies in the name of reuse (it’s a fallacy). If you do decide as a deployment concern, that you want to put them all in the same process feeding off of the same queue, you can ILMerge those assemblies and host them together, but understand that you will be undoing much of the benefits of your autonomous components.

Whither the domain model?

Although in the diagram above you can see the domain model beside the command-processing autonomous components, it’s actually an implementation detail. There is nothing that states that all commands must be processed by the same domain model. Arguably, you could have some commands be processed by transaction script, others using table module (AKA active record), as well as those using the domain model. Event-sourcing is another possible implementation.

Another thing to understand about the domain model is that it now isn’t used to serve queries. So the question is, why do you need to have so many relationships between entities in your domain model?

(You may want to take a second to let that sink in.)

Do we really need a collection of orders on the customer entity? In what command would we need to navigate that collection? In fact, what kind of command would need any one-to-many relationship? And if that’s the case for one-to-many, many-to-many would definitely be out as well. I mean, most commands only contain one or two IDs in them anyway.

Any aggregate operations that may have been calculated by looping over child entities could be pre-calculated and stored as properties on the parent entity. Following this process across all the entities in our domain would result in isolated entities needing nothing more than a couple of properties for the IDs of their related entities – “children” holding the parent ID, like in databases.

In this form, commands could be entirely processed by a single entity – viola, an aggregate root that is a consistency boundary.

Persistence for command processing

Given that the database used for command processing is not used for querying, and that most (if not all) commands contain the IDs of the rows they’re going to affect, do we really need to have a column for every single domain object property? What if we just serialized the domain entity and put it into a single column, and had another column containing the ID? This sounds quite similar to key-value storage that is available in the various cloud providers. In which case, would you really need an object-relational mapper to persist to this kind of storage?

You could also pull out an additional property per piece of data where you’d want the “database” to enforce uniqueness.

I’m not suggesting that you do this in all cases – rather just trying to get you to rethink some basic assumptions.

Let me reiterate

How you process the commands is an implementation detail of CQRS.

Keeping the query store in sync

After the command-processing autonomous component has decided to accept a command, modifying its persistent store as needed, it publishes an event notifying the world about it. This event often is the “past tense” of the command submitted:

MakeCustomerPerferredCommand -> CustomerHasBeenMadePerferredEvent

The publishing of the event is done transactionally together with the processing of the command and the changes to its database. That way, any kind of failure on commit will result in the event not being sent. This is something that should be handled by default by your message bus, and if you’re using MSMQ as your underlying transport, requires the use of transactional queues.

The autonomous component which processes those events and updates the query data store is fairly simple, translating from the event structure to the persistent view model structure. I suggest having an event handler per view model class (AKA per table).

Here’s the picture of all the pieces again:

Bounded Contexts

While CQRS touches on many pieces of software architecture, it is still not at the top of the food chain. CQRS if used is employed within a bounded context (DDD) or a business component (SOA) – a cohesive piece of the problem domain. The events published by one BC are subscribed to by other BCs, each updating their query and command data stores as needed.

UI’s from the CQRS found in each BC can be “mashed up” in a single application, providing users a single composite view on all parts of the problem domain. Composite UI frameworks are very useful for these cases.

Summary

CQRS is about coming up with an appropriate architecture for multi-user collaborative applications. It explicitly takes into account factors like data staleness and volatility and exploits those characteristics for creating simpler and more scalable constructs.

One cannot truly enjoy the benefits of CQRS without considering the user-interface, making it capture user intent explicitly. When taking into account client-side validation, command structures may be somewhat adjusted. Thinking through the order in which commands and events are processed can lead to notification patterns which make returning errors unnecessary.

While the result of applying CQRS to a given project is a more maintainable and performant code base, this simplicity and scalability require understanding the detailed business requirements and are not the result of any technical “best practice”. If anything, we can see a plethora of approaches to apparently similar problems being used together – data readers and domain models, one-way messaging and synchronous calls.

Although this blog post is over 3000 words (a record for this blog), I know that it doesn’t go into enough depth on the topic (it takes about 3 days out of the 5 of my Advanced Distributed Systems Design course to cover everything in enough depth). Still, I hope it has given you the understanding of why CQRS is the way it is and possibly opened your eyes to other ways of looking at the design of distributed systems.

Questions and comments are most welcome.

Don’t Delete – Just Don’t

udidahan — Tue, 01 Sep 2009 12:04:48 +0000

After reading Ayende’s post advocating against “soft deletes” I felt that I should add a bit more to the topic as there were some important business semantics missing. As developers discuss the pertinence of using an IsDeleted column in the database to mark deletion, and the way this relates to reporting and auditing concerns is weighed, the core domain concepts rarely get a mention. Let’s first understand the business scenarios we’re modeling, the why behind them, before delving into the how of implementation.

The real world doesn’t cascade

Let’s say our marketing department decides to delete an item from the catalog. Should all previous orders containing that item just disappear? And cascading farther, should all invoices for those orders be deleted as well? Going on, would we have to redo the company’s profit and loss statements?

Heaven forbid.

So, is Ayende wrong? Do we really need soft deletes after all?

On the one hand, we don’t want to leave our database in an inconsistent state with invoices pointing to non-existent orders, but on the other hand, our users did ask us to delete an entity.

Or did they?

When all you have is a hammer…

We’ve been exposing users to entity-based interfaces with “create, read, update, delete” semantics in them for so long that they have started presenting us requirements using that same language, even though it’s an extremely poor fit.

Instead of accepting “delete” as a normal user action, let’s go into why users “delete” stuff, and what they actually intend to do.

The guys in marketing can’t actually make all physical instances of a product disappear – nor would they want to. In talking with these users, we might discover that their intent is quite different:

“What I mean by ‘delete’ is that the product should be discontinued. We don’t want to sell this line of product anymore. We want to get rid of the inventory we have, but not order any more from our supplier. The product shouldn’t appear any more when customers do a product search or category listing, but the guys in the warehouse will still need to manage these items in the interim. It’s much shorter to just say ‘delete’ though.”

There seem to be quite a few interesting business rules and processes there, but nothing that looks like it could be solved by a single database column.

Model the task, not the data

Looking back at the story our friend from marketing told us, his intent is to discontinue the product – not to delete it in any technical sense of the word. As such, we probably should provide a more explicit representation of this task in the user interface than just selecting a row in some grid and clicking the ‘delete’ button (and “Are you sure?” isn’t it).

As we broaden our perspective to more parts of the system, we see this same pattern repeating:

Orders aren’t deleted – they’re cancelled. There may also be fees incurred if the order is canceled too late.

Employees aren’t deleted – they’re fired (or possibly retired). A compensation package often needs to be handled.

Jobs aren’t deleted – they’re filled (or their requisition is revoked).

In all cases, the thing we should focus on is the task the user wishes to perform, rather than on the technical action to be performed on one entity or another. In almost all cases, more than one entity needs to be considered.

Statuses

In all the examples above, what we see is a replacement of the technical action ‘delete’ with a relevant business action. At the entity level, instead of having a (hidden) technical WasDeleted status, we see an explicit business status that users need to be aware of.

The manager of the warehouse needs to know that a product is discontinued so that they don’t order any more stock from the supplier. In today’s world of retail with Vendor Managed Inventory, this often happens together with a modification to an agreement with the vendor, or possibly a cancellation of that agreement.

This isn’t just a case of transactional or reporting boundaries – users in different contexts need to see different things at different times as the status changes to reflect the entity’s place in the business lifecycle. Customers shouldn’t see discontinued products at all. Warehouse workers should, that is, until the corresponding Stock Keeping Unit (SKU) has been revoked (another status) after we’ve sold all the inventory we wanted (and maybe returned the rest back to the supplier).

Rules and Validation

When looking at the world through over-simplified-delete-glasses, we may consider the logic dictating when we can delete to be quite simple: do some role-based-security checks, check that the entity exists, delete. Piece of cake.

The real world is a bigger, more complicated cake.

Let’s consider deleting an order, or rather, canceling it. On top of the regular security checks, we’ve got some rules to consider:

If the order has already been delivered, check if the customer isn’t happy with what they got, and go about returning the order.

If the order contained products “made to order”, charge the customer for a portion (or all) of the order (based on other rules).

And more…

Deciding what the next status should be may very well depend on the current business status of the entity. Deciding if that change of state is allowed is context and time specific – at one point in time the task may have been allowed, but later not. The logic here is not necessarily entirely related to the entity being “deleted” – there may be other entities which need to be checked, and whose status may also need to be changed as well.

Summary

I know that some of you are thinking, “my system isn’t that complex – we can just delete and be done with it”.

My question to you would be, have you asked your users why they’re deleting things? Have you asked them about additional statuses and rules dictating how entities move as groups between them? You don’t want the success of your project to be undermined by that kind of unfounded assumption, do you?

The reason we’re given budgets to build business applications is because of the richness in business rules and statuses that ultimately provide value to users and a competitive advantage to the business. If that value wasn’t there, wouldn’t we be serving our users better by just giving them Microsoft Access?

In closing, given that you’re not giving your users MS Access, don’t think about deleting entities. Look for the reason why. Understand the different statuses that entities move between. Ask which users need to care about which status. I know it doesn’t show up as nicely on your resume as “3 years WXF”, but “saved the company $4 million in wasted inventory” does speak volumes.

One last sentence: Don’t delete. Just don’t.

Don’t Create Aggregate Roots

udidahan — Mon, 29 Jun 2009 11:52:37 +0000

My previous post on Domain Events left some questions about how aggregate roots should be created unanswered. It would actually be more accurate to say how aggregate roots should *not* be created. It turns out that this is one of the less intuitive parts of domain-driven design and has been the source of many arguments on the matter. Let’s start with the wrong way:

   1:  using (ISession s = sf.OpenSession())

   2:  using (ITransaction tx = s.BeginTransaction())

   3:  {

   4:      Customer c = new Customer();

   5:      c.Name = "udi dahan";

6:

   7:      s.Save(c);

   8:      tx.Commit();

   9:  }

I understand that the code above is representative of how much code is written when using an object-relational mapper. Many would consider this code to follow DDD principles – that Customer is an aggregate root. Unfortunately – that is not the case. The code above is missing the real aggregate root.

There’s also the inevitable question of validation – if the customer object isn’t willing to accept a name with a space in it, should we throw an exception? That would prevent an invalid entity from being saved, which is good. On the other hand, exceptions should be reserved for truly exceptional occurrences. But if we don’t use exceptions, using Domain Events instead, how do we prevent the invalid entity from being saved?

All of these issues are handled auto-magically once we have a true aggregate root.

Always Get An Entity

Let’s start with the technical guidance – always get an entity. At least one. Also, don’t add any objects to the session or unit of work explicitly – rather, have some other already persistent domain entity create the new entity and add it to a collection property.

Looking at the code above, we see that we’re not following the technical guidance.

But the question is, which entity could we possibly get from the database in this case? All we’re doing is adding a customer.

And that’s exactly where the technical guidance leads us to the business analysis that was missing in this scenario…

Business Analysis

Customers don’t just appear out of thin air.

Blindingly obvious – isn’t it.

So why would we technically model our system as if they did? My guess is that we never really thought about it – it wasn’t our job. So here’s the breaking news – if we want to successfully apply DDD we do need to think about it, it is our job.

Going back to the critical business question:

Where do customers come from?

In the real world, they stroll into the store. In our overused e-commerce example, they navigate to our website. New customers that haven’t used our site before don’t have any cookies or anything we can identify them with. They navigate around, browsing, maybe buying something in the end, maybe not.

Yet, the browsing process is interesting in its own right:

Which products did they look at?
Did they use the search feature?
How long did they spend on each page?
Did they scroll down to see the reviews?

If and when they do finally buy something, all that history is important and we’d like to maintain a connection to it.

Actually, even before they buy something, what they put in their cart is the interesting piece. The transition from cart to checkout is another interesting piece. Do they actually complete the checkout process, or do they abandon it midway through?

Add to that when we ask/force them to create a user/login in our system.

Are they actually a customer if they haven’t bought anything?

We’re beginning to get an inkling that almost every activity that results in the creation of an entity or storing of additional information can be traced to a transition from a previous business state.

In any transition, the previous state is the aggregate root.

In the beginning…

Let’s start at the very beginning then – someone came to our site. Either they navigated here from some other web page, they clicked on an email link someone sent them, or they typed in our URL. This can be designed as follows:

   1:  using (ISession s = sf.OpenSession())

   2:  using (ITransaction tx = s.BeginTransaction())

   3:  {

   4:     var referrer = s.Get(msg.URL);

   5:     referrer.BroughtVisitorWithIp(msg.IpAddress);

6:

   7:     tx.Commit();

   8:  }

9:

And our referrer code could look something like this:

   1:  public void BroughtVisitorWithIp(string ipAddress)

   2:  {

   3:     var visitor = new Visitor(ipAddress);

   4:     this.NewVisitors.Add(visitor);

   5:  }

6:

This follows the technical guidance we saw at the beginning.

It also allows us to track which referrer is bringing us which visitors, through tracking those visitors as they become shoppers (by putting stuff in their cart), finally seeing which become customers.

We can solve the situation of not having a referrer by implementing the null object pattern which is well supported by all the standard object-relational mappers these days.

How it works internally

When we call a method on a persistent entity retrieved by the object-relational mapper, and the entity modifies its state like when it adds a new entity to one of its collection properties, when the transaction commits, here’s what happens:

The mapper sees that the persistent entity is dirty, specifically, that its collection property was modified, and notices that there is an object in there that isn’t persistent. At that point, the mapper knows to persist the new entity without us ever having to explicitly tell it to do so. This is sometimes known as “persistence by reachability”.

Where validation happens

Let’s consider the relatively trivial rule that says that a user name can’t contain a space.

Also, keep in mind that a registered user is the result of a transition from a visitor.

Here’s *one* way of doing that:

   1:  public class Visitor

   2:  {

   3:     public void Register(string username, string password)

   4:     {

   5:        if (username.Contains(" "))

   6:        {

   7:           DomainEvents.Raise();

   8:           return;

   9:        }

10:

  11:        var user = new User(username, password);

  12:        this.RegisteredUser = u;

  13:     }

  14:  }

15:

This actually isn’t representative of most of the rules that will be found in the domain model, but it illustrates a way of preventing an entity from being created without our service layer needing to know anything. All the service layer does is get the visitor object and call the Register method.

Validation of string lengths, data ranges, etc is not domain logic and is best handled elsewhere (and a topic for a different post). The same goes for uniqueness.

Summary

The most important thing to keep in mind is that if your service layer is newing up some entity and saving it – that entity isn’t an aggregate root *in that use case*. As we saw above, in the original creation of the Visitor entity by the Referrer, the visitor class wasn’t the aggregate root. Yet, in the user registration use case, the Visitor entity was the aggregate root.

Aggregate roots aren’t a structural property of the domain model.

And in any case, don’t go saving entities in your service layer – let the domain model manage its own state. The domain model doesn’t need any references to repositories, services, units of work, or anything else to manage its state.

If you do all this, you’ll also be able to harness the technique of fetching strategies to get the best performance out of your domain model by representing your use cases as interfaces on the domain model like IRegisterUsers (implemented by Visitor) and IBringVisitors (implemented by Referrer).

And spending some time on business analysis doesn’t hurt either – unless customers really do fall out of the sky in your world

Generic Validation

thesoftwaresimplist — Mon, 30 Apr 2007 21:54:26 +0000

Ayende brought up the topic of Input & Business Rule Validation and I wanted to post how I solve this issue.

On kind of input validation is something you do as close to the user as possible for performance reasons. This includes all sorts of smart stuff you can do with JavaScript in Web scenarios. When in a Smart Client environment, you usually have greater capabilities.

When I look at the issue of validation, I see that it centers around the entity. Sometimes, it is also affected by other things, like what process are we in (as described in the comments on Ayende’s post).

So, we can model the thing that validates an entity with an interface, say, IValidator where T : IEntity. This interface will have one main method: bool IsValid(T entity); and one main property: string ErrorDescription { get; }

What this allows us to do is to separate out different validation concerns into different classes, yet have all of them implement the same interface.

The next thing we’ll need is to be able to get an instance for each of the classes that is a validator for a specific kind of entity. For instance, when the NewCustomerView raises an event proclaiming that it has a Customer object ready to be saved, the Controller will want to find all classes that implement IValidator so that it can run all the validation rules.

Luckily, the generics patch I put out for the Spring.Net Framework allows us to do this in one simple line of code:

IList> validators = spring.GetObjectsOfType(typeof(IValidator));

and quite simply perform the validation as follows:

foreach(IValidator v in validators)
  if (!v.IsValid(myCustomer))
    // notify user with v.ErrorDescription, write to log, whatever

Now, when using this in Smart Client scenarios, you will often have views that allow the user to enter in a single entity which you will then want to validate. If you have those views implement a generic interface like: IEntityView where T : IEntity, then you could have a single base class implement a “Validate” method, which would perform the work above generically like so:

foreach(IValidator v in spring.GetObjectsOfType(typeof(IValidator)))
  if (!v.IsValid(this.Entity))
    // notify user with v.ErrorDescription

and just have your specific view call that method on the button click.

This enables all entity views to activate all the relevant custom validation logic without being tied to it. It also enables you to extend your system by adding new classes implementing the IValidator interface, and have them automatically run without even changing a config file. How’s that for loose coupling?

Finally, on the issue of tying validation rules to specific processes, this can be done by extending the interface to: IValidator where T : IEntity, P : IProcess; You then model each process with a marker interface. You then have your specific validation classes implement the above generic interface for each specific process interface. For instance, say we have a validation rule that needs to run for processes P1 (marked by IP1), and P2 (marked by IP2), but not P3 (marked by IP3), which validates entities of type E. This would be done by defining the class like so:

public class MyValidator : IValidator, IValidator {}

and the Controller class that would request validators for process 2 would just call:

spring.GetObjectsOfType(typeof(IValidator));

That’s it. The basic principles are simple, but, as you can see, can create very powerful structures. I’ve got to say that this exemplifies one of the reasons why I love generics so much. When used with Dependency Injection, and/or Delegates, and/or Anonymous Methods, you get such a power of expression just by defining an interface. This is one of things that make coding fun for me. Or maybe I’m just wierd that way