Udi Dahan – The Software Simplist
 
Enterprise Development Expert & SOA Specialist
 
 
  

Archive for the ‘Architecture’ Category



Building Super-Scalable Web Systems with REST

Monday, December 29th, 2008

I’ve been consulting with a client who has a wildly successful web-based system, with well over 10 million users and looking at a tenfold growth in the near future. One of the recent features in their system was to show users their local weather and it almost maxed out their capacity. That raised certain warning flags as to the ability of their current architecture to scale to the levels that the business was taking them.


On Web 2.0 Mashups

One would think that sites like Weather.com and friends would be the first choice for implementing such a feature. Only thing is that they were strongly against being mashed-up Web 2.0 style on the client - they had enough scalability problems of their own. Interestingly enough (or not), these partners were quite happy to publish their weather data to us and let us handle the whole scalability issue.

Implementation 1.0

The current implementation was fairly straightforward. The client issues a regular web service request to the GetWeather webmethod. The server uses the user’s IP address to find out their location, uses that location to find the weather for that location in the database, and returns that to the user. Standard fare for most dynamic data, and the way most everybody would tell you to do it.

Only thing is that it scales like a dog.

Add Some Caching

When you have scalability problems and the database is the bottleneck, the first thing you do is cache. Well, that’s what everybody says (the same everybody as above).

The thing is that holding the weather for the entire globe in memory takes a lot of memory, more than is reasonable. So there’s a fairly decent chance that a given request can’t be served from the cache. That results in a query to the database and an update to the cache, which bumps out something else. In short, not a very good hit rate.

Not much bang for the buck.

If you have a single datacenter, a caching tier that stores this data is possible, but costly. If you want a highly available, multi-datacenter infrastructure that supports business continuity, the costs add up quite a bit quicker, to the point of not being cost effective (“You need HOW much money for weather?! We’ve got dozens more features like that in the pipe!”)

What we can do is to tell the client we’re responding to that they can cache the result, but that isn’t close to being enough for us to scale.

Look at the Data, Leverage the Internet

When you find yourself in this sort of situation, there’s really only one thing to do:

In order to save on bandwidth, the most precious commodity of the internet, the various ISPs and backbone providers cache aggressively. In fact, HTTP is designed exactly for that.

If user A asks for some html page, the various intermediaries between his browser and the server hosting that page will cache that page (based on HTTP headers). When user B asks for that same page, and their request goes through one of the intermediaries that user A’s request went through, that intermediary will serve back its cached copy of the page rather than calling the hosting server.

Also, users located in the same geographic region by and large go through the same intermediaries when calling a remote site.

Leverage the Internet

The internet is the biggest, most scalable data serving infrastructure that mankind was lucky enough to have happen to it. However, in order to leverage it - you need to understand your data and how your users use it, and finally align yourself with the way the internet works.

Let’s say we have 1,000 users in London. All of them are going to have the same weather. If all these users come to our site within a few hours and ask for the weather, they are all going to get the exact same data. The thing is that every user calls the same GetWeather webmethod, and the server derives the location from the caller’s IP address. Its response semantics must therefore prevent intermediaries from caching, so that users in Dublin and Glasgow don’t get London weather (although at times I bet they’d like to).

REST Helps You Leverage the Internet

Rather than thinking of getting the weather as an operation/webmethod, we can represent each location’s weather data as an explicit web resource with its own URI. Thus, the weather in London would be http://weather.myclient.com/UK/London.

If we were able to make our clients in London perform an HTTP GET on http://weather.myclient.com/UK/London then we could return headers in the HTTP response telling the intermediaries that they can cache the response for an hour, or however long we want.

That way, after the first user in London gets the weather from our servers, all the other 999 users will be getting the same data served to them from one of the intermediaries. Instead of getting hammered by millions of requests a day, the internet would shoulder easily 90% of that load making it much easier to scale. Thanks Al.
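To make this concrete, here is a minimal C# sketch of the two pieces involved. The first maps a location to its own URI. The second builds the response headers that let intermediaries cache it. The base address comes from the example above. The class name, the one-hour freshness window, and sending both Cache-Control and Expires are illustrative assumptions, not a description of the client's actual system.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Sketch only: each location's weather is a separate resource, and its
// responses carry headers that allow shared intermediaries to cache them.
public static class WeatherResource
{
    // Base address from the example URI in the text.
    public const string BaseUri = "http://weather.myclient.com";

    // ("UK", "London") -> http://weather.myclient.com/UK/London
    public static Uri For(string country, string city)
    {
        return new Uri(BaseUri + "/" + Uri.EscapeDataString(country) + "/" + Uri.EscapeDataString(city));
    }

    // "public" lets shared caches (e.g. ISP proxies) store the response;
    // max-age says for how many seconds it stays fresh. Expires is the
    // HTTP/1.0 equivalent, for older intermediaries.
    public static IDictionary<string, string> CacheHeaders(TimeSpan freshFor, DateTime nowUtc)
    {
        return new Dictionary<string, string>
        {
            ["Cache-Control"] = "public, max-age=" + (long)freshFor.TotalSeconds,
            ["Expires"] = (nowUtc + freshFor).ToString("R", CultureInfo.InvariantCulture)
        };
    }
}
```

With `public` and `max-age=3600` on the London response, any shared cache on the path may answer the next London request itself until the hour is up.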

This isn’t a “cheap trick”. While it’s straightforward for something like weather, understanding the nature of your data and intelligently mapping it to a URI space is critical to building a scalable system and reaping the benefits of REST.

What’s left?

The only thing that’s left is to get the client to know which URI to call. A simple matter, really.

When the user logs in, we perform the IP to location lookup and then write a cookie to the client with their location (UK/London). That cookie then stays with the user saving us from having to perform that IP to location lookup all the time. On subsequent logins, if the cookie is already there, we don’t do the lookup.

BTW, we also show the user “you’re in London, aren’t you?” with the link allowing the user to change their location, which we then update the cookie with and change the URI we get the weather from.
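Here is a minimal sketch of that login-time logic. It assumes two hypothetical stand-ins for whatever the real web stack provides: the cookie jar is treated as a simple dictionary, and the IP-to-location service is passed in as a delegate. The cookie name "location" is also made up for the example.

```csharp
using System;
using System.Collections.Generic;

// Sketch: resolve the user's location from a cookie, falling back to the
// IP-to-location lookup only when the cookie is missing.
public class LocationResolver
{
    public const string CookieName = "location";            // hypothetical
    private readonly Func<string, string> ipToLocation;     // e.g. IP -> "UK/London"

    public LocationResolver(Func<string, string> ipToLocation)
    {
        this.ipToLocation = ipToLocation;
    }

    // At login: reuse the cookie if present; otherwise do the lookup once
    // and write the result to the cookie so later logins skip it.
    public string Resolve(IDictionary<string, string> cookies, string clientIp)
    {
        string location;
        if (cookies.TryGetValue(CookieName, out location))
            return location;

        location = ipToLocation(clientIp);
        cookies[CookieName] = location;
        return location;
    }

    // When the user answers "you're in London, aren't you?" with another location.
    public static void ChangeLocation(IDictionary<string, string> cookies, string newLocation)
    {
        cookies[CookieName] = newLocation;
    }

    // The weather URI the client should GET for its current location.
    public static string WeatherUri(IDictionary<string, string> cookies)
    {
        return "http://weather.myclient.com/" + cookies[CookieName];
    }
}
```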

In Closing

While web services are great for getting a system up and running quickly and interoperably, scalability often suffers. Not so much as to be in your face, but after you’ve gone quite a ways and invested a fair amount of development in it, you find it standing between you and the scalability you seek.

Moving to REST is not about turning on the “make it restful” switch in your technology stack (ASP.NET MVC and WCF, I’m talking to you). Just like with databases, there is no “make it go fast” switch. You really do need to understand your data, the various users’ access patterns, and the volatility of the data so that you can map it to the “right” resources and URIs.

If you do walk the RESTful path, you’ll find that the scalability that was once so distant is now within your grasp.



SOA, REST, and Pub/Sub

Monday, December 15th, 2008

From Integrated Simplicity:

SOA & Web

The question of how web-based (or 3rd party) consumers can work with pub/sub based services comes up a lot.

Many developers are used to implementing web services exposing methods on them like GetAllCustomers.

When moving to pub/sub and other more loosely coupled messaging patterns, developers look to implement the same pattern, opting for something like duplex GetCustomersRequest and GetCustomersResponse. The reasoning is simple and straightforward - it is difficult to push data over the web to consumers.

However, there are still ways to disconnect the preparation of the data from its usage thus gaining many of the advantages of pub/sub.

By employing REST principles and modelling our customer list as an explicit resource, web-based consumers would simply perform regular HTTP GET operations on the URI to get the list of customers.

The resource itself could be a simple XML file - it wouldn’t need to be dynamic at all.
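Here is a rough C# sketch of that idea. It assumes a hypothetical CustomerUpdated notification and an output path that some web server exposes as the customer-list URI. The subscriber inside the service simply rewrites the file each time it is notified.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

// Hypothetical customer data carried by the change notification.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Subscriber that keeps a static XML file in sync with published changes.
public class CustomerListPublisher
{
    private readonly string path;
    private readonly Dictionary<int, Customer> customers = new Dictionary<int, Customer>();

    public CustomerListPublisher(string path) { this.path = path; }

    // Handler for a (hypothetical) CustomerUpdated event.
    public void OnCustomerUpdated(Customer c)
    {
        customers[c.Id] = c;
        Regenerate();
    }

    // Rewrites the whole list; web consumers just GET this file.
    private void Regenerate()
    {
        var doc = new XDocument(
            new XElement("customers",
                customers.Values.OrderBy(c => c.Id).Select(c =>
                    new XElement("customer",
                        new XAttribute("id", c.Id),
                        new XElement("name", c.Name)))));
        doc.Save(path);
    }
}
```

Web consumers never call into the service. They GET a file, which may well already be sitting in some intermediary's cache.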

You can get all the scalability benefits of pub/sub for web based consumers. All you need is a bit of REST :)



Self-Contained Events and SOA

Saturday, December 13th, 2008

In the architectural principle of fully self-contained messages, events “can - instantly and in future - be interpreted as the respective event without the need to rely on additional data stores that would need to be in time-sync with the event during message-processing.”

Also, “passing reference data in a message makes the message-consuming systems dependent on the knowledge and availability of actual persistent data that is stored “somewhere”. This data must separately be accessed for the sake of understanding the event that is represented by the message.”

The discussion of self-contained events can be compared to integration databases vs application databases.

Centralized Integration - Pros & Cons

If everything in a system can access a central datastore, it is enough for one party to publish an event containing only the ID of an entity that that party previously entered/updated. Upon receiving that event, a subscriber would go to the central datastore and get the fields it’s interested in for that ID. The advantage of this approach is that the minimal amount of data crosses the network, as subscribers only retrieve the fields that interest them. Martin Fowler describes the disadvantages as:

“An integration database needs a schema that takes all its client applications into account. The resulting schema is either more general, more complex or both. The database usually is controlled by a separate group to the applications and database changes are more complex because they have to be negotiated between the database group and the various applications.”

This is far from being aligned with the principle of autonomy so important to SOA. In that respect, the architectural principle of self-contained messages points us away from those problems and towards more autonomous services.

However, once we have these autonomous business services in place, we may find that we don’t need 100% fully self-contained messages anymore.

A Real-World Example

Let’s say we have 3 business services, Sales, Fulfillment, and Billing.

Sales publishes an OrderAccepted event when it accepts an order. That event contains all the order information.

Both Fulfillment and Billing are subscribed to this event, and thus receive it.

Fulfillment does not ship products to the customer until the customer has been billed, so it just stores the order information internally, and is done.

Billing starts the process of billing the customer for their order, possibly joining several orders into a single bill. After completing this process, it publishes a CustomerBilled event containing all billing information, as well as the IDs of the orders in that bill. It does not put all the order information in that event, as it is not the authoritative owner of that data.

When Fulfillment receives the CustomerBilled event, it uses the IDs of the orders contained in the event to find the order information it previously stored internally. It does not need to call the Sales service for this information or contact some central Master Data Management system. It uses the data it has, and goes about fulfilling the orders and shipping the products to the customer, finally publishing its own OrderShipped event.
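Here is a minimal C# sketch of Fulfillment's side of this exchange. The event classes and their fields are hypothetical shapes for OrderAccepted and CustomerBilled. What matters is that CustomerBilled carries only order IDs, and Fulfillment correlates those IDs against data it stored itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Published by Sales: carries the full order (hypothetical shape).
public class OrderAccepted
{
    public Guid OrderId;
    public Guid CustomerId;        // ID owned by Customer Care
    public List<Guid> ProductIds;  // IDs owned by Merchandising
}

// Published by Billing: billing data plus only the IDs of the billed orders.
public class CustomerBilled
{
    public Guid BillId;
    public List<Guid> OrderIds;
}

public class FulfillmentService
{
    // Fulfillment's own store: no shared database, no call back to Sales.
    private readonly Dictionary<Guid, OrderAccepted> orders = new Dictionary<Guid, OrderAccepted>();
    public readonly List<Guid> Shipped = new List<Guid>();

    public void Handle(OrderAccepted e)
    {
        orders[e.OrderId] = e;   // store it and wait for billing
    }

    // Returns false when a referenced order hasn't arrived yet, so the
    // caller can retry later instead of treating it as an error.
    public bool Handle(CustomerBilled e)
    {
        if (!e.OrderIds.All(orders.ContainsKey))
            return false;

        foreach (Guid id in e.OrderIds)
            Shipped.Add(id);     // ship, then publish OrderShipped (omitted)
        return true;
    }
}
```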

Notice, as well, that in the original OrderAccepted event there were the IDs of products the customer ordered. These product IDs originated from another service, Merchandising, responsible for the product catalog. The same thing can be said for the customer ID originating from another service - Customer Care.

The Issue of Time

One could argue that since subscribers use previously cached data when processing new events, that data might not be up to date. We may also have race conditions between our services. In the above example, suppose Billing was extremely fast and more highly available than Fulfillment. Billing could then have received the OrderAccepted event, processed it, and published the CustomerBilled event before Fulfillment had received the OrderAccepted event. In short, the CustomerBilled and OrderAccepted messages could be out of order in Fulfillment’s queue.

What would Fulfillment do when trying to process the CustomerBilled message when it doesn’t have the order information?

Well, it knows that the world is parallel and non-sequential, so it does NOT return/log an error, but rather puts that message in the back of the queue to be processed again later (or maybe in some other temporary holding area). This enables the OrderAccepted message to be processed before the CustomerBilled message is retried. When the retry occurs, well, everything’s OK – it’s worked itself out over time.

In the case where we retry again and again and things don’t work themselves out (maybe the OrderAccepted event was lost), we move that message off to a different queue for something else to resolve the conflict (maybe a person, maybe software). If/when the conflict is resolved (got the Sales system / messaging system to replay the OrderAccepted event), the conflict resolver returns the CustomerBilled message to the queue, and now everything works just fine.
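The retry policy can be sketched with in-memory queues standing in for the real message queues. The generic processor, the tryHandle delegate (which returns false for "can't process this yet"), and the maximum attempt count are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

public class DeferringProcessor<T>
{
    private readonly Queue<(T Message, int Attempts)> queue = new Queue<(T, int)>();
    public readonly List<T> ErrorQueue = new List<T>();  // resolved by a person or by software
    private readonly Func<T, bool> tryHandle;            // false = "can't process yet"
    private readonly int maxAttempts;

    public DeferringProcessor(Func<T, bool> tryHandle, int maxAttempts)
    {
        this.tryHandle = tryHandle;
        this.maxAttempts = maxAttempts;
    }

    public void Enqueue(T message) => queue.Enqueue((message, 0));

    // A message that can't be handled yet goes to the back of the queue
    // rather than being logged as an error. After maxAttempts it is moved
    // to the error queue instead.
    public void ProcessAll()
    {
        while (queue.Count > 0)
        {
            var (message, attempts) = queue.Dequeue();
            if (tryHandle(message))
                continue;
            if (attempts + 1 >= maxAttempts)
                ErrorQueue.Add(message);
            else
                queue.Enqueue((message, attempts + 1));
        }
    }

    // What the conflict resolver does once the missing event has been replayed.
    public void ReturnFromErrorQueue(T message)
    {
        ErrorQueue.Remove(message);
        Enqueue(message);
    }
}
```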

As all of this is occurring, the only thing that’s visible to external parties is that it happens to be taking longer than usual for the OrderShipped event to be published. In other words, time is the only difference.

 

Summary

The problem of non-self-contained events is mitigated first and foremost by business services in SOA, and the apparent issue of time-synchronization by business logic inside these services.

Don’t be afraid to put IDs in your messages and events.

Do be afraid of using those IDs to access datastores shared by multiple “services”.

Using IDs to correlate current events to data from previous events is not only OK, it’s to be expected.

The architectural principle of fully self-contained messages steers us away from the problems of Integration Databases and towards Application Databases, autonomous services, and a better SOA implementation. From there, following the principle of autonomy from a business perspective leads us to services that don’t publish data owned by other services in their messages, taking us the next step of our journey to SOA.


Related Content

[Podcast] Message Ordering - Is it cost effective?

Don’t EDA between existing systems

[Podcast] Handling dependencies between subscribers in SOA



Lost Notifications? No Problem.

Sunday, December 7th, 2008

One of the most common questions I get on the topic of pub/sub messaging is what happens if a notification is lost. Interestingly enough, there are some who almost entirely write-off this pattern because of this issue, preferring the control of request/response-exception. So, what should be done about lost messages? The short answer is durable messaging. The long answer is design.

Durable Messaging

In order to prevent a message from being lost when it is sent from a publisher to a subscriber, the message is written to disk on the publisher side, and then forwarded to the subscriber, where it is also written to disk. This store-and-forward mechanism enables our systems to gracefully recover from either side being temporarily unavailable.

In my MSDN article on this topic, I outlined some problems with this approach. These problems are exacerbated for publishers. Imagine a publisher with 40 subscribers, publishing 10 messages a second, each containing 1MB of XML. If 10 of the subscribers are unavailable, that’s 100MB of data being written to the publisher’s disk every second, 6GB every minute. That’s liable to bring down a publisher before an administrator brews a cup of coffee.
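Here is the arithmetic behind those numbers as a tiny C# helper. The names are made up, and only the inputs come from the example. Only the unavailable subscribers add to the publisher's disk usage.

```csharp
using System;

public static class PublisherBacklog
{
    // One stored copy of each message per unavailable subscriber.
    public static double MegabytesPerSecond(int unavailableSubscribers, double messagesPerSecond, double messageSizeMB)
    {
        return unavailableSubscribers * messagesPerSecond * messageSizeMB;
    }

    // Total written to the publisher's disk over an outage of the given length.
    public static double MegabytesAccumulated(TimeSpan outage, int unavailableSubscribers, double messagesPerSecond, double messageSizeMB)
    {
        return MegabytesPerSecond(unavailableSubscribers, messagesPerSecond, messageSizeMB) * outage.TotalSeconds;
    }
}
```

With 10 unavailable subscribers, 10 messages a second, and 1MB each, that gives 100MB per second and 6,000MB per minute. An hour-long outage comes to 360,000MB.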

Publishers have no choice but to throw away messages after a certain period of time.

Publisher Contracts

The whole issue of contracts and schema is considered one of the better understood parts of SOA. Unfortunately, the operational aspects of service contracts are hardly ever taken into account.

On top of the schema of the messages a service publishes, the contract needs additional information:

  1. How big will this message be?
  2. How often will it be published?
  3. How long will this message be stored if a subscriber is unavailable?

The first two pieces of information are important for subscribers to do load and capacity planning. The last one is the most important, as it dictates the required availability and fault-tolerance characteristics of subscribers.
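One way to picture such a contract is as a small descriptor published alongside the schema. The property names below are hypothetical. The outage check is just the simplest reading of the third point.

```csharp
using System;

// Operational terms that could accompany a publisher's message schema.
public class PublicationContract
{
    public string MessageType { get; set; }
    public long MaxMessageSizeBytes { get; set; }          // 1. how big
    public double MessagesPerSecond { get; set; }          // 2. how often
    public TimeSpan RetentionIfUnavailable { get; set; }   // 3. how long it is kept

    // Load/capacity planning: inbound bytes per second at the stated size and rate.
    public double BytesPerSecond => MaxMessageSizeBytes * MessagesPerSecond;

    // Availability: a subscriber whose outages can outlast the retention
    // period will lose messages on this pipe.
    public bool CanTolerateOutage(TimeSpan worstCaseOutage) => worstCaseOutage <= RetentionIfUnavailable;
}
```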

For Example

In the canonical retail scenario, when our sales service accepts an order, it publishes an order accepted event. Other services subscribed to this event include shipping, billing, and business intelligence.

While shipping and billing are highly available and able to keep up with the rate at which orders are accepted, the business intelligence service is not. BI has two main parts to it - a nightly batch that does the number crunching, and a UI for reporting off of the results of that number crunching. Some even do the reporting in a semi-offline fashion, emailing reports back to the user when they’re ready.

Furthermore, nobody’s going to invest in servers for making BI highly available.

And wasn’t the whole point of this publish/subscribe messaging to keep our services autonomous, so that not all services need the same level of uptime?

Houston, do we have a problem?

Data Freshness

There is a glimmer of light in all this doom and gloom.

Not all services have the same data freshness requirements.

The business intelligence service above doesn’t need to know about orders the second they’re accepted. A daily roll-up would be fine, and an hourly roll-up would bring us that much closer to “real time business intelligence”.

So, while BI is ready to accept the sales message schema, it would like a slightly different contract around it: fewer messages per unit of time, more data in each message.

From the operational perspective of the sales service, it would be cost effective to have fewer “online” subscribers. It could even take things a few steps further. Instead of using the regular messaging backbone for transmitting these hourly messages, it could use FTP. The data could even be zipped to take up less space. The total data size is smaller than the corresponding online stream, it is kept on cheaper, larger storage, and the number of subscribers for this zipped, hourly update is fairly small. Together, these mean the messages can be kept around far longer.

If you’ve heard about consumer-driven contracts, this is it.

Note that we’re still talking about the same logical message schema.
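Here is a sketch of what the hourly pipe might look like on the sales side. It assumes the same OrderAccepted data is grouped by hour and GZip-compressed before being dropped on an FTP server. The FTP transfer itself is left out, and the event type, its fields, and the comma-separated line format are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

// Hypothetical subset of the OrderAccepted schema.
public class OrderAcceptedEvent
{
    public Guid OrderId;
    public DateTime AcceptedAtUtc;
    public decimal Total;
}

public static class HourlyRollup
{
    // Truncates a timestamp to the start of its hour.
    public static DateTime HourOf(DateTime t) =>
        new DateTime(t.Year, t.Month, t.Day, t.Hour, 0, 0, t.Kind);

    // One batch per hour, keyed by the hour it covers.
    public static Dictionary<DateTime, List<OrderAcceptedEvent>> Batch(IEnumerable<OrderAcceptedEvent> events) =>
        events.GroupBy(e => HourOf(e.AcceptedAtUtc)).ToDictionary(g => g.Key, g => g.ToList());

    // Serializes a batch, one line per event, and GZip-compresses it.
    public static byte[] Compress(IEnumerable<OrderAcceptedEvent> batch)
    {
        string text = string.Join("\n", batch.Select(e =>
            e.OrderId + "," + e.AcceptedAtUtc.ToString("o") + "," + e.Total.ToString(CultureInfo.InvariantCulture)));
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                byte[] bytes = Encoding.UTF8.GetBytes(text);
                gzip.Write(bytes, 0, bytes.Length);
            }
            // ToArray still works after the GZipStream has closed the MemoryStream.
            return output.ToArray();
        }
    }
}
```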

Summary

It’s not that lost notifications aren’t a problem.

It’s that they feed the design process so that the resulting service ecosystem is set up in a way where notifications won’t get lost. I know that sounds kind of recursive, but that’s how it works. Either subscribers take care of their own SLA, allowing them to process the online stream of events, or they should subscribe to a different pipe (which will have different SLA requirements, but maybe they can deal with those).

It makes sense to have multiple pipes for the same logical schema.

It’s practically a necessity to make pub/sub a feasible solution.

 


Related Content

MSDN article on messaging and lost messages

Durable messaging dilemmas

Additional logic required for service autonomy

More in depth example on events and pub/sub between services

Consumer-Driven Contracts



Reliability, Availability, and Scalability

Saturday, November 15th, 2008

The great people at IASA have made the recording for my webcast available online.

You can find it here.
The slides can be found here.

I also gave this talk at TechEd Barcelona and wanted to thank the attendee who posted this comment:

“You’ve done it again. Everytime I attend a session of yours I leave the room with new insights and inspiration on how to improve my software…”

You made my day.



Domain Events - Take 2

Monday, August 25th, 2008

My previous post on how to create fully encapsulated domain models introduced the concept of events as a core pattern of communication from the domain back to the service layer. In that post, I put up enough code to get the idea across but didn’t address issues like memory leaks and multi-threading. This post will show the solution to those two critical points.

I’ve snipped out one of the events in the previous example for brevity.

Previous API

The previous API looked like this:

    public static class DomainEvents
    {
        public static event EventHandler GameReportedLost;
        public static void RaiseGameReportedLostEvent()
        {
            if (GameReportedLost != null)
                GameReportedLost(null, null);
        }

        public static event EventHandler CartIsFull;
        public static void RaiseCartIsFull()
        {
            if (CartIsFull != null)
                CartIsFull(null, null);
        }
    }

One thing that we want to keep in the solution is that all the code to define events, their names, and the parameters they bring will be in one place - in this case, the DomainEvents class. One thing that we’d like to fix is the amount of code needed to define an event.

Previous Service Layer

Here’s what our previous service layer code looked like:

    public class AddGameToCartMessageHandler :
        BaseMessageHandler<AddGameToCartMessage>
    {
        public override void Handle(AddGameToCartMessage m)
        {
            using (ISession session = SessionFactory.OpenSession())
            using (ITransaction tx = session.BeginTransaction())
            {
                ICart cart = session.Get<ICart>(m.CartId);
                IGame g = session.Get<IGame>(m.GameId);

                Domain.DomainEvents.GameReportedLost +=
                  gameReportedLost;
                Domain.DomainEvents.CartIsFull +=
                  cartIsFull;

                cart.Add(g);

                // If cart.Add throws, these handlers are never removed (a leak).
                Domain.DomainEvents.GameReportedLost -=
                  gameReportedLost;
                Domain.DomainEvents.CartIsFull -=
                  cartIsFull;

                tx.Commit();
            }
        }

        private EventHandler gameReportedLost = delegate {
              Bus.Return((int)ErrorCodes.GameReportedLost);
            };

        private EventHandler cartIsFull = delegate {
              Bus.Return((int)ErrorCodes.CartIsFull);
            };
    }

Another thing that should be improved is the amount of code needed in the service layer.

Raising an event, though, should still be fairly simple: one line of code similar to DomainEvents.RaiseGameReportedLostEvent().

New API

Here’s what the new API looks like:

    public static class DomainEvents
    {
        public static readonly DomainEvent<IGame> GameReportedLost =
                                             new DomainEvent<IGame>();

        public static readonly DomainEvent<ICart> CartIsFull =
                                             new DomainEvent<ICart>();
    }

It looks like we’ve managed to bring down the complexity of defining an event.

Raising an event is slightly different, but still only one line of code (“this” refers to the Cart class that is calling this API): DomainEvents.CartIsFull.Raise(this);

New Service Layer

Because registering with a domain event returns an IDisposable, we can use the “using” construct to clean up the registrations, even if an exception is thrown.

    public class AddGameToCartMessageHandler :
        BaseMessageHandler<AddGameToCartMessage>
    {
        public override void Handle(AddGameToCartMessage m)
        {
            using (ISession session = SessionFactory.OpenSession())
            using (ITransaction tx = session.BeginTransaction())
            using (DomainEvents.GameReportedLost.Register(gameReportedLost))
            using (DomainEvents.CartIsFull.Register(cartIsFull))
            {
                ICart cart = session.Get<ICart>(m.CartId);
                IGame g = session.Get<IGame>(m.GameId);

                cart.Add(g);

                tx.Commit();
            }   // registrations are removed here, even on exceptions
        }

        private Action<IGame> gameReportedLost = delegate {
              Bus.Return((int)ErrorCodes.GameReportedLost);
            };

        private Action<ICart> cartIsFull = delegate {
              Bus.Return((int)ErrorCodes.CartIsFull);
            };
    }

I also want to mention that the service layer object that handles these events doesn’t have to be the same one that calls the domain objects. For example, we can have singleton objects handling these events for things like sending emails, notifying external systems, and auditing.

The Infrastructure

The infrastructure that makes all this possible (in a thread-safe way) is quite simple and made up of two parts, the DomainEvent that we saw being used above, and the DomainEventRegistrationRemover which handles the disposing:

    using System;
    using System.Collections.Generic;
    using System.Threading;

    namespace DomainEventInfrastructure
    {
        public class DomainEvent<E>
        {
            // One invocation list per thread *and* per event instance.
            // A [ThreadStatic] static field would be shared by every
            // DomainEvent<E> that happens to use the same E.
            private readonly ThreadLocal<List<Action<E>>> actions =
                new ThreadLocal<List<Action<E>>>(() => new List<Action<E>>());

            public IDisposable Register(Action<E> callback)
            {
                actions.Value.Add(callback);
                return new DomainEventRegistrationRemover(delegate
                    {
                        actions.Value.Remove(callback);
                    }
                );
            }

            public void Raise(E args)
            {
                // Iterate over a snapshot so a handler can register or
                // unregister without breaking the enumeration.
                foreach (Action<E> action in actions.Value.ToArray())
                    action.Invoke(args);
            }
        }
    }

Note that the invocation list of the domain event is thread static, meaning that each thread gets its own copy - even though they’re all working with the same instance of the domain event.

Here’s the DomainEventRegistrationRemover - even simpler:

    using System;

    namespace DomainEventInfrastructure
    {
        public class DomainEventRegistrationRemover : IDisposable
        {
            private readonly Action callOnDispose;

            public DomainEventRegistrationRemover(Action toCall)
            {
                this.callOnDispose = toCall;
            }

            public void Dispose()
            {
                // Invoke the typed delegate directly; DynamicInvoke isn't needed.
                this.callOnDispose();
            }
        }
    }

For your convenience, I’ve made these available for download here.

I also want to add that if you haven’t looked at the comments on the original post - there’s some really good stuff there (36 comments so far). Take a look.



An Answer of Scale

Wednesday, August 13th, 2008

To the question of scale Ayende brings up, I thought I’d tap my concept map.

First of all, I wanted to address the relationship between various topics related to scalability:

[Figure: concept map of performance topics]

And on the connection between scalability and throughput:

[Figure: concept map of scalability topics]

The important message here is that the scalability of a system is a cost function: it gives throughput as a function of one-time and recurring costs, such as servers and other hardware, plus whatever combination of buying and building you chose:

Did you write your own locking/transaction mechanism on top of an open source distributed cache or did you buy a license for a space-based technology?

Also, don’t forget that people need to administer all the servers that you have. Those people cost money (easily 100K per year). Maybe, because you haven’t invested in management or monitoring tools, you need one person for every two servers. This will influence the breakdown of up-front costs and recurring costs. The level of availability you require will impact this as well.

In my experience, architects don’t consider the operations environment often enough in their "scalability calculations".

What this means is that there’s no such thing as technically "not being able to scale".

Rather, it means that the cost (up front + recurring) of supporting higher throughput grows faster than the revenue per user/request/whatever.

Sometimes, the solution is just to find ways to make more money per customer.
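Here is a back-of-the-envelope C# sketch of throughput economics. Only the 100K-per-year admin cost and the one-admin-per-two-servers ratio come from the text. Every other input, and the simple straight-line amortization of up-front costs, is an assumption.

```csharp
using System;

public static class ScalabilityEconomics
{
    // Yearly cost of running enough servers for some throughput:
    // recurring server costs, the people to administer them, and
    // up-front costs spread over an amortization period.
    public static double YearlyCost(int servers, double yearlyCostPerServer,
                                    double serversPerAdmin, double yearlyCostPerAdmin,
                                    double upFrontCost, int amortizationYears)
    {
        double admins = Math.Ceiling(servers / serversPerAdmin);
        return servers * yearlyCostPerServer
             + admins * yearlyCostPerAdmin
             + upFrontCost / amortizationYears;
    }

    // The system "scales" economically as long as this stays positive as load grows.
    public static double YearlyMargin(double requestsPerYear, double revenuePerRequest, double yearlyCost)
        => requestsPerYear * revenuePerRequest - yearlyCost;
}
```

For example, 10 servers at 5,000 a year each, one admin per two servers at 100K, and 300K up front amortized over 3 years comes to 650,000 a year. Whether that scales depends entirely on the revenue side.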

For more technical solutions, take a look at the difference between capacity and scalability and how the competing consumer pattern helps scale out.

Scalability, it’s all about the money.

Oh, I almost forgot, I also had a great conversation with Carl and Richard about scaling web sites that’s now up on the .NET Rocks site. Enjoy.



Command Query Separation and SOA

Monday, August 11th, 2008

One of the common questions I receive from people starting to use nServiceBus is how one-way messaging fits with showing the user a grid (or list) of data. Thinking about publish/subscribe usually just gets them even more confused. Trying to resolve all this with Service Oriented Architecture leaves them wondering - why bother?

[Figure: client server]

In regular client-server development, the server is responsible for providing the client with all CRUD (create, read, update, and delete) capabilities. However, when users look at data they often don’t require it to be up to date to the second (given that they often look at the same screen for several seconds to minutes at a time). As such, retrieving data from the same tables used for highly consistent transaction processing creates contention, resulting in poor performance for all CRUD actions under higher load.

A Scalable Solution

One of the common answers to this question is for the server/service to publish a message when data changes (say, as the result of processing a message) and for clients to subscribe to these messages. When such a notification arrives at a client, the client would cache the data it needs. Then, when the user wants to see a grid of data, that data is already on the client. Of course, this solution doesn’t work so well for older client machines (like some point of service devices) or if there are millions of rows of data.

The thing is that this solution is one implementation of a more general pattern - command query separation (CQS).

Command Query Separation

Wikipedia describes CQS as a pattern where "… every method should either be a command that performs an action, or a query that returns data to the caller, but not both. More formally, methods should return a value only if they are referentially transparent and hence possess no side effects."

Martin Fowler is less strict about the use of CQS allowing for exceptions: "Popping a stack is a good example of a modifier that modifies state. Meyer correctly says that you can avoid having this method, but it is a useful idiom. So I prefer to follow this principle when I can, but I’m prepared to break it to get my pop."
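In code, CQS is easiest to see on Fowler's own example. Below is a hypothetical C# stack where Push and RemoveTop are commands (they change state and return nothing) and Top and Count are queries (they return data and change nothing). Pop is kept as the pragmatic exception he mentions.

```csharp
using System;
using System.Collections.Generic;

public class CqsStack<T>
{
    private readonly List<T> items = new List<T>();

    public void Push(T item) => items.Add(item);                 // command
    public void RemoveTop() => items.RemoveAt(items.Count - 1);  // command (throws if empty)
    public T Top => items[items.Count - 1];                      // query (throws if empty)
    public int Count => items.Count;                             // query

    // Fowler's exception: modifies state *and* returns a value.
    public T Pop()
    {
        T top = Top;
        RemoveTop();
        return top;
    }
}
```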

So, how does separating commands from queries and SOA help at all in getting data to and from a UI? The answer is based on Pat Helland’s thinking as described in his article Data on the Inside vs. Data on the Outside.

Services Cross Boxes

The biggest lie around SOA is that services run.

Let that sink in a second.

Sure services have runnable components, but that’s not why they’re important.

I’ll skip the books of background and cut to the chase:

Services communicate with each other using publish/subscribe and one-way messaging. Services have components inside them. Inside a service, these components can communicate with each other using synchronous RPC, or any other mechanism. Also, these components can reside on different machines.

This is broader than just scaling out a service. There can be service components running on the client as well as the server.

SOA & CQS

Combining these two concepts together, here’s what comes out:

In this solution there are two services that span both client and server - one in charge of commands (create, update, delete), the other in charge of queries (read). These services communicate only via messages - one cannot access the database of the other.

The command service publishes messages about changes to data, to which the query service subscribes. When the query service receives such notifications, it saves the data in its own data store which may well have a different schema (optimized for queries like a star schema).
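The flow between the two services might be sketched as follows. This is a simplified, single-process illustration with hypothetical names: the command service keeps normalized data (one row per order) and publishes events, while the query service subscribes and maintains a pre-joined, per-customer row that a grid can read with a plain synchronous call. In a real deployment the events would travel over a message bus and each side would have its own database.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

@dataclass(frozen=True)
class CustomerRenamed:
    customer_id: int
    name: str

@dataclass(frozen=True)
class OrderPlaced:
    customer_id: int
    amount: float

Event = Union[CustomerRenamed, OrderPlaced]

class CommandService:
    """Owns the transactional data; nobody else reads its store."""

    def __init__(self) -> None:
        self._names: Dict[int, str] = {}
        self._orders: List[Tuple[int, float]] = []  # normalized: one row per order
        self._subscribers: List[Callable[[Event], None]] = []

    def subscribe(self, handler: Callable[[Event], None]) -> None:
        self._subscribers.append(handler)

    def _publish(self, event: Event) -> None:
        for handler in self._subscribers:
            handler(event)

    def rename_customer(self, customer_id: int, name: str) -> None:
        self._names[customer_id] = name
        self._publish(CustomerRenamed(customer_id, name))

    def place_order(self, customer_id: int, amount: float) -> None:
        self._orders.append((customer_id, amount))
        self._publish(OrderPlaced(customer_id, amount))

class QueryService:
    """Subscribes to events and keeps its own, query-shaped copy of the data."""

    def __init__(self) -> None:
        self._rows: Dict[int, dict] = {}  # denormalized: one ready-made row per customer

    def handle(self, event: Event) -> None:
        row = self._rows.setdefault(
            event.customer_id,
            {"id": event.customer_id, "name": "", "orders": 0, "total": 0.0},
        )
        if isinstance(event, CustomerRenamed):
            row["name"] = event.name
        else:
            row["orders"] += 1
            row["total"] += event.amount

    def customer_grid(self) -> List[dict]:
        # Synchronous request/response for the UI, answered from the query store only.
        return [dict(row) for row in sorted(self._rows.values(), key=lambda r: r["id"])]
```

The grid never touches the command service's data; it only sees what the query service assembled from published events.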

The client component which is in charge of showing grids of data to the user behaves the same as it would in a regular layered/tiered architecture, using synchronous blocking request/response to get its data - SOA doesn’t change that.

Composite Applications

Although the client side components of both the command and query services are hosted in the same process, they are very much independent of each other. That being said, from an interoperability perspective (the one that most people attribute to SOA), all of the client-side components will likely be developed using the same technology - although there are already ways to host Java code in .NET and vice-versa.

Of course, once we talk about web UIs things are a bit different - but still similar. While web-server-side there may be a level of independence, for browser-side inter-component communication we’re still likely to target JavaScript. There, I’ve managed to say something technical supporting mashups and SOA without lying through my teeth.

On the Microsoft side with the recent release of the Composite Application Guidance & Library (pronounced "prism") I hope that more of these principles will be reaching the "smart client". The command pattern is especially critical in maintaining the separation while enabling communication to still occur so I’m glad that, as one of the Prism advisors, I was able to simplify that part (Glenn still has nightmares about that rooftop conversation).
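One way to picture how commands keep composite-application modules independent: modules register handlers against a command name owned by the shell, and the toolbar invokes that command without knowing which modules are listening. The classes below are loosely modeled on the idea behind Prism's composite commands but are a hypothetical sketch, not Prism's API.

```python
from typing import Callable, Dict, List

class Command:
    """An action the UI can bind to; it knows nothing about who invokes it."""

    def __init__(self, execute: Callable[[], None],
                 can_execute: Callable[[], bool] = lambda: True) -> None:
        self._execute = execute
        self._can_execute = can_execute

    def can_execute(self) -> bool:
        return self._can_execute()

    def execute(self) -> None:
        if self.can_execute():
            self._execute()

class CompositeCommand:
    """Fans a single invocation out to every command registered with it."""

    def __init__(self) -> None:
        self._children: List[Command] = []

    def register(self, command: Command) -> None:
        self._children.append(command)

    def can_execute(self) -> bool:
        return all(child.can_execute() for child in self._children)

    def execute(self) -> None:
        if self.can_execute():
            for child in self._children:
                child.execute()

class Shell:
    """Hosts the modules; modules and toolbar share only well-known command names."""

    def __init__(self) -> None:
        self.commands: Dict[str, CompositeCommand] = {"SaveAll": CompositeCommand()}
```

For example, an Orders module and a Customers module each register a `Command` under `"SaveAll"`, and a toolbar button calls `shell.commands["SaveAll"].execute()` without referencing either module.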

Publish / Subscribe

In the "scalable solution" section up top I mentioned how publish/subscribe to the smart client is really just one implementation of CQS and SOA. So, how different is it really?

[Figure: smart client pub/sub]

Well, there will probably be a different technology mapping. Instead of a star-schema OLAP product, we might simply store the published data in memory on the client. That is, if you designed your components to be technology agnostic.

In terms of the use of nServiceBus, the same component is going to be subscribing to the same type of message - all that’s different is that now every client will have data pushed to it rather than this occurring server-side only.

You could have the same code deployed differently in the same system - stronger clients subscribing themselves, weaker ones using a remote server. Web servers would probably be considered stronger clients. This kind of flexible deployment has proven to be extremely valuable for my larger clients. The added benefit of enabling users to work (view data) even while offline (somewhere there’s no Wi-Fi) is just icing on the cake.
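The "same code, different deployment" idea can be sketched with a small interface that the UI component depends on. The names are hypothetical: `LocalSubscribedStore` plays the stronger client that subscribes itself and answers from memory (and so keeps working offline), while `RemoteServerStore` plays the weaker client that asks a server. The `fetch` callable stands in for whatever transport, for example a WCF or HTTP call, you actually use.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List

class CustomerQueries(ABC):
    """What the UI component depends on; no transport or storage details."""

    @abstractmethod
    def all_customers(self) -> List[str]: ...

class LocalSubscribedStore(CustomerQueries):
    """Stronger client: subscribes itself and answers from memory, even offline."""

    def __init__(self) -> None:
        self._names: Dict[int, str] = {}

    def on_customer_changed(self, customer_id: int, name: str) -> None:
        # Called by the client's own subscription when a notification arrives.
        self._names[customer_id] = name

    def all_customers(self) -> List[str]:
        return [self._names[key] for key in sorted(self._names)]

class RemoteServerStore(CustomerQueries):
    """Weaker client: asks a server that subscribes on its behalf."""

    def __init__(self, fetch: Callable[[], List[str]]) -> None:
        self._fetch = fetch  # hypothetical transport, e.g. a WCF or HTTP call

    def all_customers(self) -> List[str]:
        return self._fetch()

class CustomerGrid:
    """The same UI code, whichever deployment it ends up in."""

    def __init__(self, queries: CustomerQueries) -> None:
        self._queries = queries

    def render(self) -> str:
        return ", ".join(self._queries.all_customers())

def make_queries(strong_client: bool, fetch: Callable[[], List[str]]) -> CustomerQueries:
    # Deployment decision made at composition time, not inside the components.
    return LocalSubscribedStore() if strong_client else RemoteServerStore(fetch)
```

Because `CustomerGrid` only sees `CustomerQueries`, moving a client from one deployment to the other is a composition change rather than a code change.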

A Word of Warning

Once the client starts receiving notifications, and handling those on a background thread (as it should) the code becomes susceptible to deadlocks and data races. Juval does a good job of outlining some of those with respect to the use of WCF. Prism doesn’t provide any assurances in this area either.
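A common way to avoid those races is to never touch UI-owned state from the messaging thread; instead, hand each notification to the UI thread and apply it there. The sketch below assumes a hypothetical `ClientView` and uses a thread-safe `queue.Queue` with a `pump` method that the UI thread would call (for example from a timer). Real UI frameworks provide their own dispatchers for this hand-off.

```python
import queue
import threading
from typing import Dict

class ClientView:
    def __init__(self) -> None:
        self.rows: Dict[int, str] = {}  # owned by the UI thread only
        self._inbox: "queue.Queue[tuple]" = queue.Queue()  # thread-safe hand-off

    def on_notification(self, row_id: int, value: str) -> None:
        # Runs on a background (messaging) thread: enqueue, never touch self.rows.
        self._inbox.put((row_id, value))

    def pump(self) -> int:
        # Runs on the UI thread (e.g. from a timer); applies queued updates in order.
        applied = 0
        while True:
            try:
                row_id, value = self._inbox.get_nowait()
            except queue.Empty:
                return applied
            self.rows[row_id] = value
            applied += 1

view = ClientView()
workers = [threading.Thread(target=view.on_notification, args=(i, f"v{i}")) for i in range(5)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
print(view.pump(), sorted(view.rows))  # 5 [0, 1, 2, 3, 4]
```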

Summary

NServiceBus is not designed to be used for any and all types of communication in a given architecture. In the examples above, nServiceBus handles the publish/subscribe but leaves the synchronous RPC to existing solutions like WCF. Not only that, but synchronous RPC does have its place in architecture, just not across service boundaries. In all cases, data is served to users from a store different from that which transaction processing logic uses.

Command Query Separation is not only a good idea at the method/class level but has advantages at the SOA/System level as well - yet another good idea from 20 years ago that services build upon. Making use of CQS requires understanding your data and its uses - SOA builds on that by looking into data volatility and the business requirements for its freshness.

Finally, designing the components of your services in such a way that their dependency on technology is limited buys a lot of flexibility in terms of deployment and, consequently, significant performance and scalability gains.

Simple, it is. Easy, it is not.



Distributed Systems Concept Map

Monday, August 4th, 2008

The other day I had an idea: what if I were to take all the concepts I write, speak, and consult about and turn them into a concept map? That might help me explain how things like messaging, unit of work, and exception management work together and why. It also shouldn’t be too much work. Or so I thought.

I started out with a blank piece of paper, and this is what happened:

[Figure: concept map]

I got into some threat modeling, which connected to authentication, authorization, and integrity, connecting to consistency, transactions, and fault tolerance, and, on the flip side, eventual consistency, messaging, and REST. Yes, SOA is in a tiny circle at the bottom left :-)

I tried to keep the file big enough to be quite readable but small enough to be sent directly via email (less than 600KB).

The coming release is going to be a more interactive environment where you’ll be able to click on each concept for more information, see what else it’s linked to, and get links to online resources. This is quite an undertaking, so if this is something that you want to see move forward, please leave a comment or maybe link to it from your blog. It’s hard for me to know what really connects with you and what you just delete from your reader, so any feedback is really appreciated.



Logging - The Smart Way

Friday, August 1st, 2008

Don’t.

Not in applicative code anyway.

This follows up on Ayende’s post about the AOP way.

Now, I have nothing against AOP but some developers are leery of it.

In broader terms, all logging goes in framework-level code. For smart clients, one really good place to put logging is in your Command infrastructure - every time a command is invoked, log it and the args. For data access, well, any decent O/R Mapper has a lot of logging already, just use it. For communication, ditto. Funny that just last week this was one of the major bits of feedback I gave in a code review.
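Putting the logging in the command infrastructure rather than in applicative code might look like the following Python sketch. `CommandDispatcher` and `rename_customer` are hypothetical; the point is that the dispatcher logs every invocation (and any failure) with its arguments, so individual handlers contain no logging calls.

```python
import logging
from typing import Any, Callable, Dict

log = logging.getLogger("commands")

class CommandDispatcher:
    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, handler: Callable[..., Any]) -> None:
        self._handlers[name] = handler

    def invoke(self, name: str, **args: Any) -> Any:
        # The single place where command logging lives.
        log.info("Invoking %s with %r", name, args)
        try:
            return self._handlers[name](**args)
        except Exception:
            log.exception("Command %s failed with %r", name, args)
            raise

def rename_customer(customer_id: int, new_name: str) -> str:
    # Applicative code: no logging here.
    return f"{customer_id}:{new_name}"
```

With `logging.basicConfig(level=logging.INFO)` in place, `dispatcher.invoke("RenameCustomer", customer_id=7, new_name="Alice")` logs the command name and its arguments once, in one place.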

The Important Part

Logging is useful for developers to find out why a system isn’t working correctly.

It is terrible for knowing that a system isn’t working correctly.

If your entire exception management strategy is “write it to the log”, how will an admin know that something’s wrong? Did you remember to configure your logging library so that errors (and maybe warnings too) are pushed out to a monitoring system? Do you have a monitoring system?

And if the admins don’t know anything’s wrong, they won’t know they need to increase the fidelity of the logs, will they? Are you planning on providing training for your admins telling them this (and all the other things they need to know)? Or maybe this will all be set up as an automatic script?
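With Python's standard `logging` module, both questions have short answers: a handler whose level is ERROR can forward failures to a monitoring system, and raising a logger's level at runtime increases fidelity without a redeploy. In this sketch the monitoring system is a plain list and the `billing` logger name is an assumption; swap in your own monitoring client.

```python
import logging
from typing import List

class MonitoringHandler(logging.Handler):
    """Forwards ERROR-and-above records to a monitoring system (a list stands in here)."""

    def __init__(self, alerts: List[str]) -> None:
        super().__init__(level=logging.ERROR)
        self._alerts = alerts  # hypothetical: replace with your monitoring system's client

    def emit(self, record: logging.LogRecord) -> None:
        self._alerts.append(f"{record.levelname} {record.name}: {record.getMessage()}")

def configure(alerts: List[str]) -> logging.Logger:
    logger = logging.getLogger("billing")
    logger.setLevel(logging.WARNING)  # normal production fidelity
    logger.addHandler(MonitoringHandler(alerts))
    return logger

def raise_fidelity(logger_name: str) -> None:
    # What an admin (or an automatic script) runs once monitoring says something is wrong.
    logging.getLogger(logger_name).setLevel(logging.DEBUG)
```

Warnings still go wherever your normal log output goes; only errors reach the monitoring system, so admins are alerted without sifting through everything.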

An Agile Digression

I hope all that’s on your agile (“we can ship at the end of every 2-week iteration”) product backlog (pardon my cynicism). I hope it’s at least something that you’re looking at per release and feeding the relevant features into your iterations. Yes, there’s project work to do (writing training manuals) that isn’t “development” that needs to be handled; if you don’t timebox it into the same iterations, it won’t get done.

Now, back to your regularly scheduled logging…

Things Logging Doesn’t Address

Logging is a mildly useless tool for pinpointing where in the system the source of a problem is.

“I know the entity isn’t in the database. I can see that. I want to know why it isn’t there.”

Sure, if you had every SQL statement logged you could figure these sorts of things out. Of course, performance-wise, you wouldn’t put the system into production like that. In which case, the delete statement wouldn’t have been logged, leaving you with precious little information to solve the root cause.

Also consider that the more logging you do, the more crap you’ll have to sift through to find the proverbial needle. Developers often don’t think twice about increasing the amount of log crap they generate…

The Real Problem

The real problem is that developers think too much about logging and not nearly enough about designing the system in ways that make it possible to answer questions like those above without having to know exactly how the system is built. One of the reasons that developers should care about this is that it’ll decrease the number of times they need to get up at 3:00 am to answer those questions.

A Path to the Solution

Now, if you had some kind of business activity monitoring (BAM) capability in your system, an admin could do a simple search/query [WHEN entity DELETED] and find out answers to the questions above, find out the time that the relevant activities occurred, figure out what the problem is on their own, and maybe even fix it - especially if it has to do with some esoteric configuration variable. Regardless of whether you buy a BAM tool or roll what you need yourself, you need to understand what about the system needs to be monitored. That’s a very different thought-process to go through than “should we log this? Yeah, sure, why not.”
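A do-it-yourself version of that capability could be as small as a table of business activities written at the points where they happen, plus the query an admin would run. This is a sketch using an in-memory SQLite database; the `activity` schema, the `record`/`when_deleted` helpers, and any sample data are hypothetical, not a BAM product's API.

```python
import sqlite3
from datetime import datetime, timezone
from typing import List, Tuple

def open_activity_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE activity ("
        " at TEXT, entity TEXT, entity_id INTEGER, action TEXT, actor TEXT, reason TEXT)"
    )
    return conn

def record(conn: sqlite3.Connection, entity: str, entity_id: int,
           action: str, actor: str, reason: str) -> None:
    # Written where the business activity happens, not as a generic log line.
    conn.execute(
        "INSERT INTO activity VALUES (?, ?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), entity, entity_id, action, actor, reason),
    )

def when_deleted(conn: sqlite3.Connection, entity: str,
                 entity_id: int) -> List[Tuple[str, str, str]]:
    # The admin's [WHEN entity DELETED] question as a query: time, actor, reason.
    return conn.execute(
        "SELECT at, actor, reason FROM activity"
        " WHERE entity = ? AND entity_id = ? AND action = 'DELETED' ORDER BY at",
        (entity, entity_id),
    ).fetchall()
```

Because each record carries the actor and the reason, an admin could see, for instance, that a cleanup job removed an entity because of a configuration value, without reading any code.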

It’s called “Design for Operations”.

Take a holistic perspective on exception management, logging, monitoring, etc. Think about questions like those above and then analyse your use of the relevant tools in that context. Think about all the different kinds of users of the information that’s going to be generated and how quickly they’re going to need to act on that information. Admins in the data-center in the middle of a crisis are going to have different needs than developers analysing logs on their machine. Think about:

  • How will the administrator know that a server has been configured properly?
  • If the system is feeling slow, how can the administrator know which server/process is to blame?
    • So that maybe they can scale out that part of the system.
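Answering those two questions usually means the system itself reports on its configuration and on where time goes. A minimal sketch, assuming hypothetical component names and a plain settings dictionary: `check_configuration` lists missing required settings after deployment, and `OperationsMetrics.timed` records per-component durations so the slowest part of the system stands out.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator, List, Tuple

class OperationsMetrics:
    def __init__(self) -> None:
        self._durations: Dict[str, List[float]] = defaultdict(list)

    @contextmanager
    def timed(self, component: str) -> Iterator[None]:
        # Record how long the wrapped work took, even if it raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            self._durations[component].append(time.perf_counter() - start)

    def slowest(self) -> Tuple[str, float]:
        # Component with the highest average duration: a candidate for scaling out.
        averages = {name: sum(d) / len(d) for name, d in self._durations.items()}
        name = max(averages, key=averages.get)
        return name, averages[name]

def check_configuration(settings: Dict[str, str], required: List[str]) -> List[str]:
    """Return the required settings that are missing or empty on this server."""
    return [key for key in required if not settings.get(key)]
```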

In Closing

It’s a mindset.

It takes time to make the shift.

It takes more time to bring the development process to this kind of maturity (god, I hate that word).

Writing exceptions to the log is not a strategy.

At the very best, it’s a tactic.

What’s your strategy?



   



Recommendations

Sam Gentile, Independent WCF & SOA Expert
“Udi, one of the great minds in this area.
A man I respect immensely.”





Ian Robinson, Principal Consultant at ThoughtWorks
"Your blog and articles have been enormously useful in shaping, testing and refining my own approach to delivering on SOA initiatives over the last few years. Over and against a certain 3-layer-application-architecture-blown-out-to-distributed-proportions school of SOA, your writing steers a far more valuable course."

Simon Segal, Systems Integration Manager at LinFox
“Udi is one of the outstanding software development minds in the world today, his vast insights into Service Oriented Architectures and Smart Clients in particular are indeed a rare commodity. Udi is also an exceptional teacher and can help lead teams to fall into the pit of success. I would recommend Udi to anyone considering some Architectural guidance and support in their next project.”

Ohad Israeli, Chief Architect at Hewlett-Packard, Indigo Division
“When you need a man to do the job Udi is your man! No matter if you are facing near deadline deadlock or at the early stages of your development, if you have a problem Udi is the one who will probably be able to solve it, with his large experience at the industry and his widely horizons of thinking , he is always full of just in place great architectural ideas.
I am honored to have Udi as a colleague and a friend (plus having his cell phone on my speed dial).”

Eli Brin, Program Manager at RISCO Group
“We hired Udi as a SOA specialist for a large scale project. The development is outsourced to India. SOA is a buzzword used almost for anything today. We wanted to understand what SOA really is, and what is the meaning and practice to develop a SOA based system.
We identified Udi as the one that can put some sense and order in our minds. We started with a private customized SOA training for the entire team in Israel. After that I had several focused sessions regarding our architecture and design.
I will summarize it simply (as he is the software simplist): We are very happy to have Udi in our project. It has a great benefit. We feel good and assured with the knowledge and practice he brings. He doesn’t talk over our heads. We assimilated nServicebus as the ESB of the project. I highly recommend you to bring Udi into your project.”

Yoel Arnon, MSMQ Expert
“Udi has a unique, in depth understanding of service oriented architecture and how it should be used in the real world, combined with excellent presentation skills. I think Udi should be a premier choice for a consultant or architect of distributed systems.”

Vadim Mesonzhnik, Development Project Lead at Polycom
“When we were faced with a task of creating a high performance server for a video-tele conferencing domain we decided to opt for a stateless cluster with SQL server approach. In order to confirm our decision we invited Udi.

After carefully listening for 2 hours he said: "With your kind of high availability and performance requirements you don’t want to go with stateless architecture."

One simple sentence saved us from implementing a wrong product and finding that out after years of development. No matter whether our former decisions were confirmed or altered, it gave us great confidence to move forward relying on the experience, industry best-practices and time-proven techniques that Udi shared with us.
It was a distinct pleasure and a unique opportunity to learn from someone who is among the best at what he does.”

Jack Van Hoof, Enterprise Integration Architect at Dutch Railways
“Udi is a respected visionary on SOA and EDA, whose opinion I most of the time (if not always) highly agree with. The nice thing about Udi is that he is able to explain architectural concepts in terms of practical code-level examples.”

Nick Malik, Enterprise Architect at Microsoft Corporation
“You are an excellent speaker and trainer, Udi, and I've had the fortunate experience of having attended one of your presentations. I believe that you are a knowledgable and intelligent man.”

Sean Farmar, Chief Technical Architect at Candidate Manager Ltd
“Udi has provided us with guidance in system architecture and supports our implementation of NServiceBus in our core business application.

He accompanied us in all stages of our development cycle and helped us put vision into real life distributed scalable software. He brought fresh thinking, great in depth of understanding software, and ongoing support that proved as valuable and cost effective.

Udi has the unique ability to analyze the business problem and come up with a simple and elegant solution for the code and the business alike.
With Udi's attention to details, and knowledge we avoided pit falls that would cost us dearly.”

Motty Cohen, SW Manager at KorenTec Technologies
“I know Udi very well from our mutual work at KorenTec. During the analysis and design of a complex, distributed C4I system - where the basic concepts of NServiceBus start to emerge - I gained a lot of "Udi's hours" so I can surely say that he is a professional, skilled architect with a fresh ideas and unique perspective for solving complex architecture challenges. His ideas, concepts and parts of the artifacts are the basis of several state-of-the-art C4I systems that I was involved in their architecture design.”

Aaron Jensen, VP of Engineering at Eleutian Technology
“Awesome. Just awesome.

We’d been meaning to delve into messaging at Eleutian after multiple discussions with and blog posts from Greg Young and Udi Dahan in the past. We weren’t entirely sure where to start, how to start, what tools to use, how to use them, etc. Being able to sit in a room with Udi for an entire week while he described exactly how, why and what he does to tackle a massive enterprise system was invaluable to say the least.

We now have a much better direction and, more importantly, have the confidence we need to start introducing these powerful concepts into production at Eleutian.”

Gad Rosenthal, Department Manager at Retalix
“A thinking person. Brought fresh and valuable ideas that helped us in architecting our product. When recommending a solution he supports it with evidence and detail so you can successfully act based on it. Udi's support "comes on all levels" - As the solution architect through to the detailed class design. Trustworthy!”

Robert Lewkovich, Product / Development Manager at Eggs Overnight
“Udi's advice and consulting were a huge time saver for the project I'm responsible for. The $ spent were well worth it and provided me with a more complete understanding of nServiceBus and most importantly in helping make the correct architectural decisions earlier thereby reducing later, and more expensive, rework.”

Ray Houston, Director of Development at TOPAZ Technologies
“Udi's SOA class made me smart - it was awesome.

The class was very well put together. The materials were clear and concise and Udi did a fantastic job presenting it. It was a good mixture of lecture, coding, and question and answer. I fully expected that I would be taking notes like crazy, but it was so well laid out that the only thing I wrote down the entire course was what I wanted for lunch. Udi provided us with all the lecture materials and everyone has access to all of the samples which are in the nServiceBus trunk.

Now I know why Udi is the "Software Simplist." I was amazed to find that all the code and solutions were indeed very simple. The patterns that Udi presented keep things simple by isolating complexity so that it doesn't creep into your day to day code. The domain code looks the same if it's running in a single process or if it's running in 100 processes.”

Liron Levy, Team Leader at Rafael
“I've met Udi when I worked as a team leader in Rafael. One of the most senior managers there knew Udi because he was doing superb architecture job in another Rafael project and he recommended bringing him on board to help the project I was leading.
Udi brought with him fresh solutions and invaluable deep architecture insights. He is an authority on SOA (service oriented architecture) and this was a tremendous help in our project.
On the personal level - Udi is a great communicator and can persuade even the most difficult audiences (I was part of such an audience myself..) by bringing sound explanations that draw on his extensive knowledge in the software business. Working with Udi was a great learning experience for me, and I'll be happy to work with him again in the future.”

Eytan Michaeli, CTO Korentec
“Udi was responsible for a major project in the company, and as a chief architect designed a complex multi server C4I system with many innovations and excellent performance.”

Evgeny-Hen Osipow, Head of R&D at PCLine
“Udi has helped PCLine on projects by implementing architectural blueprints demonstrating the value of simple design and code.”

Nimrod Peleg, Lab Engineer at Technion IIT
“One of the best programmers and software engineer I've ever met, creative, knows how to design and implement, very collaborative and finally - the applications he designed and implemented work for many years without any problems!”

Consult with Udi

Guest Authored Books
Chapter: Introduction to SOA    Article: The Enterprise Service Bus and Your SOA



Creative Commons License  © Copyright 2008, Udi Dahan. email@UdiDahan.com