Ask Joel IT: High Cohesion and Low Coupling

We don't often talk about cohesion and coupling at the same time, but maybe we should. High cohesion is not a principle; it's a state that we try to achieve. Cohesion is a metric: a measurement of how closely related a group of things are.

We have been tasked with designing software for a bank. Here is our first shot at what an account should look like in the Unified Modeling Language. We are going to take this back to our customer and make sure that we are all on the same page about how an account works:

You may not like this example. Maybe you see some problems, or maybe it's just a feeling that something is wrong. So, let's pull it apart and find out exactly why!

The first question I'd like to ask is: what process can you think of that uses the balance and interest rate together? How about one that uses the balance and address together? Ah ha! That one raises a flag. Sure, they'll get printed together on a report, but where is the data actually used together?

Robert Martin (Uncle Bob) defines the Single Responsibility principle as a class or object having "a single reason to change" (Martin, Hansel). Remember, there are no classes in a prototypal language like JavaScript, only objects, but all of this still applies. A good way to look at the principle is this: if the only reason I can find to change the account class is to do something with the balance, then the class has a single responsibility: the balance! Single responsibility is not cohesion, but they are related: high cohesion is an indication that single responsibility may be met. So are the balance and customer information cohesive?
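To make the question concrete, here is a minimal Java sketch of a first-draft account. It assumes the account carries only the fields mentioned above (balance, interest rate, customer name, and address). The class and method names are hypothetical, and double stands in for a proper money type just to keep the sketch short.

```java
// Hypothetical first-draft account: every field lives in one class.
class FirstDraftAccount {
    private final double balance;       // money held in the account
    private final double interestRate;  // annual rate, e.g. 0.02 for 2%
    private final String customerName;
    private String customerAddress;

    FirstDraftAccount(double balance, double interestRate,
                      String customerName, String customerAddress) {
        this.balance = balance;
        this.interestRate = interestRate;
        this.customerName = customerName;
        this.customerAddress = customerAddress;
    }

    // Uses balance and interestRate together: these two fields are cohesive.
    double calculateInterest() {
        return balance * interestRate;
    }

    // Uses only the customer fields; the balance plays no part.
    String mailingLabel() {
        return customerName + "\n" + customerAddress;
    }

    // A second, unrelated reason for this class to change.
    void changeAddress(String newAddress) {
        this.customerAddress = newAddress;
    }
}
```

No method in the sketch touches both the balance and the address, which is a hint that the two groups of fields belong to different responsibilities.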

We took this diagram back to our customer and they scratched their heads a bit. Then they asked: does this support a customer who has multiple accounts? Or customers who share accounts? With this design, we'd have to duplicate the customer information in every account that they own. Of course, if they change their address, we'll have to fix it in all of those places. What if we miss one? Then which account has the right information?

Of course having customers share an account would mean that we have to have room for multiple names and addresses in each account. There is a law of nature involved here: no matter how many customers an account can have, you're going to find a case where you need just one more!

So what we are bumping up against here is the principle of Avoid Duplication, also known as DRY: Don't Repeat Yourself (Hunt). The principle addresses two concerns in programming: don't repeat code, and don't repeat information.

Well, now two red flags are up on our class, so maybe we had better do something about it. We'll do what you always do in OOP: decouple the information into related classes!
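One possible shape for that split, sketched in Java: customer details move into their own class, and an account holds references to its owners instead of copies of their data. The names Customer, Account, addOwner, and getOwners are assumptions made for this illustration, not part of the original design.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Customer information now lives in exactly one place.
class Customer {
    private final String name;
    private String address;

    Customer(String name, String address) {
        this.name = name;
        this.address = address;
    }

    String getName() { return name; }
    String getAddress() { return address; }
    void setAddress(String address) { this.address = address; }
}

// The account keeps the balance plus references to whoever owns it.
class Account {
    private final double balance;
    private final List<Customer> owners = new ArrayList<>();

    Account(double balance) { this.balance = balance; }

    // Any number of customers may share the account.
    void addOwner(Customer owner) { owners.add(owner); }

    // Read-only view so callers cannot change ownership behind our back.
    List<Customer> getOwners() { return Collections.unmodifiableList(owners); }

    double getBalance() { return balance; }
}
```

Because both of a customer's accounts point at the same Customer object, a single call to setAddress is seen from every account, and a shared account is just an account with more than one owner in its list.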

Ah, decoupling! So coupling is a metric too, and it's almost the same as cohesion. We talk about cohesion inside a class or object, and we talk about coupling between objects. But if you talk about the coupling between a group of objects in a module, isn't that the same as cohesion in the module? It's really just about your point of view, and we use different words to emphasize that point of view!

While we strive for high cohesion in a class, object, or module, on the flip side we strive for low coupling between them. The reason is that if classes or objects are tightly coupled, they depend heavily on each other, and a change to one forces a ripple effect through the others. So low coupling means that objects are not very dependent on each other. How do we achieve this?

The primary means in OOP to achieve low coupling is to design to an abstraction. You may be more familiar with the concept through a closely related term: polymorphism. Why would we make the accounts in a bank polymorphic? Well, because a checking account, savings account, and investment account all have similar behaviors. So we create a superclass (or prototype object) that encapsulates the shared behavior, and then we expand on that behavior in subclasses or extended objects:
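In Java, that hierarchy might be sketched as follows. Only two of the account types are shown, and the field names, the deposit method, and the choice that checking accounts earn no interest are assumptions made for the example.

```java
// Superclass holding the behavior that every account shares.
abstract class Account {
    protected double balance;

    Account(double openingBalance) {
        this.balance = openingBalance;
    }

    void deposit(double amount) { balance += amount; }

    double getBalance() { return balance; }

    // Each kind of account decides how its interest is worked out.
    abstract double calculateInterest();
}

class SavingsAccount extends Account {
    private final double rate; // annual rate, e.g. 0.02 for 2%

    SavingsAccount(double openingBalance, double rate) {
        super(openingBalance);
        this.rate = rate;
    }

    @Override
    double calculateInterest() { return balance * rate; }
}

class CheckingAccount extends Account {
    CheckingAccount(double openingBalance) {
        super(openingBalance);
    }

    // Assumption for this sketch: checking accounts earn no interest.
    @Override
    double calculateInterest() { return 0.0; }
}
```

The shared behavior (deposit, getBalance) is written once in Account, and each subclass supplies only the part that differs.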

One of the benefits of doing this in all OOP languages is that client code using an account can be designed to the abstraction of the superclass (or prototype object). My bank can keep a collection of account objects and periodically send messages to them to calculate interest on the balance. The bank doesn't care what kind of objects they are, and in fact the code doesn't need to be updated if new types of accounts are added, because we know that all new accounts will still have the method to calculate interest!

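import java.util.List;

class Bank {
    // Every element is treated as an Account, whatever its concrete type.
    List<Account> accounts;
    ...

    public void applyInterest() {
        // The same message goes to each account; the object decides how to respond.
        for (Account acct : accounts) {
            acct.calculateInterest();
        }
    }
}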

The underlying principle behind polymorphism is substitutability: when a particular data type is needed, any data of that type or of a type derived from it may be used. Most modern OOP languages build this into their type systems. The Liskov Substitution Principle extends substitutability with more constraints, among them that substitutability also applies to data returned from a method (Martin, Hansel).
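Java's covariant return types give a small illustration of the point about returned data: an overriding method may declare a more specific return type, and callers written against the superclass still work. The Statement and InterestStatement classes below are hypothetical and exist only for this sketch. Note that the compiler checks the types only; keeping the behavior substitutable is still up to the designer.

```java
class Statement {
    String summary() { return "statement"; }
}

class InterestStatement extends Statement {
    @Override
    String summary() { return "statement with interest"; }
}

class Account {
    // Callers of Account expect to get back a Statement.
    Statement statement() { return new Statement(); }
}

class SavingsAccount extends Account {
    // Covariant return: a narrower type that still satisfies those callers.
    @Override
    InterestStatement statement() { return new InterestStatement(); }
}

class StatementPrinter {
    // Written only against the supertypes Account and Statement.
    static String describe(Account account) {
        Statement s = account.statement();
        return s.summary();
    }
}
```

Passing a SavingsAccount to StatementPrinter.describe works without any change to the printer, because both the account and the statement it returns can stand in for their supertypes.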

So what we have here is the use of an abstraction. Uncle Bob Martin describes this as another principle, one first stated by Bertrand Meyer: open for extension, closed for modification, or open/closed (Martin). New types of accounts can be added (Account is open) without changing the bank that manages them (Bank is closed).
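A compact Java sketch of that idea follows. The interest rates, the open and totalInterest methods, and the InvestmentAccount class are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

abstract class Account {
    protected final double balance;

    Account(double balance) { this.balance = balance; }

    abstract double calculateInterest();
}

class SavingsAccount extends Account {
    SavingsAccount(double balance) { super(balance); }

    @Override
    double calculateInterest() { return balance * 0.02; } // assumed 2% rate
}

// Closed for modification: nothing in Bank names a concrete account type.
class Bank {
    private final List<Account> accounts = new ArrayList<>();

    void open(Account account) { accounts.add(account); }

    double totalInterest() {
        double total = 0.0;
        for (Account acct : accounts) {
            total += acct.calculateInterest();
        }
        return total;
    }
}

// Open for extension: added later, and Bank did not have to change.
class InvestmentAccount extends Account {
    InvestmentAccount(double balance) { super(balance); }

    @Override
    double calculateInterest() { return balance * 0.05; } // assumed 5% rate
}
```

Opening an InvestmentAccount at the bank requires no edit to Bank itself; the only new code is the new subclass.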

Hopefully all of this helped demystify what we mean when we talk about high cohesion and low coupling. More importantly, along the way we looked at four fundamental principles that underlie these two measurements: single responsibility, open/closed, Liskov substitution, and DRY. A fundamental goal behind all software development is to achieve the characteristics of RAM: reliable, adaptable, and maintainable software. Everything about successful programming and meeting that goal depends on the application of the fundamental principles of design.

References

See the references page.

Friday, February 14, 2014
