Ask Joel IT: There Can Be Only One!

Well, I want to talk about singletons, but that title is a bit misleading because you can have more than one instance of a singleton. Say what? My definition of a singleton is "restrict a class or object to a single instance in a shared context." OK, I know my definition isn't exactly the Gang of Four's: "Ensure a class only has one instance, and provide a global point of access to it" (Gamma). But honestly, some folks take that book way too literally.

Those guys used "global" in the context of C++ applications mostly written for the desktop, and their point gets twisted by the choice of that word. When you move the management of the object into the class, as the GoF teaches us, then that could mean one object in a module. Or what about web-server applications: how about one per connection, or one per session? Such software wasn't really around in 1994. That is why, almost two decades later, the word "context" is really a better choice for the description.

Why is "provide a global point of access" a constraint? The goal is to prevent more than one copy in a given context. There are many ways I can get the reference to it: a class method, from a factory class, as a configuration bean, passing it as a parameter, etc.

And I really need to step back from the word class too. It's appropriate for C++, Java, and C#, but JavaScript is a prototypal language without classes. It only has objects, and it isn't the only language like that. Of course I'm just being picky about terminology here: since JavaScript doesn't have classes, in one sense everything is a singleton (see my post about JavaScript Objects).

So the whole point of a singleton is to support DRY, or don't repeat yourself (Hunt). First of all, if I have multiple copies of the object and data it's hard to manage. There's a chance one or more of the copies will not be correct. And even worse, making new copies of the object is an expensive proposition: it requires allocating memory and initializing it, cleaning up when the object is no longer needed, and it eats into memory which is a finite resource.

So why not create an object and reference it through a global variable? That's not the object-oriented way and it's not even possible in Java or C#. So let's just create static methods in a class and then there is only one copy! Well, what if you need to have one copy per context? A static class doesn't allow for that. What if your language doesn't even have classes? And again, it's just not the object-oriented way. We'll still have to use the static keyword, but leave classes with static methods like that for grouping loosely related functions together; that's what the Math class does!

So Code it!

In C++, Java, and C# I'm tempted to simply code this as a private static member of the class and initialize it at the same time, and then add a static method to return the value. Then I always know that it's been created before it's ever going to be used. Here's what that would look like in Java:

public class MySingleton {

    private static final MySingleton instance = new MySingleton(); // static: one per class, not per object
    private MySingleton() { }
    public static MySingleton getInstance() { return instance; }
}

This doesn't follow the GoF pattern. But, I don't have a problem using it as long as you remember two things:

The instance is created eagerly. That means it will be created as soon as the class is loaded and initialized (in some languages that is right at program start), even if you never ask for the instance. That can potentially slow down the initial load of the software.
This won't work if you need to manage different instances in multiple contexts. And because we'd have to change the whole structure in the future to adapt to that requirement, maybe we should just build it a different way to begin with.

Before we leave the example, notice that the constructor is private. That stops any client code from accidentally creating another instance of the class.
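If you want to see when "eagerly" actually kicks in, here is a minimal sketch in Java. EagerDemo, its log field, and the run method are names I made up for the illustration, and the singleton is nested only to keep everything in one file. In Java the static field is filled in when the class is first initialized, which is the first time something touches it:

```java
class EagerDemo {
    // Hypothetical log used only to record the order of events.
    static final StringBuilder log = new StringBuilder();

    // The eager singleton from the example, nested to keep the sketch in one file.
    static final class MySingleton {
        private static final MySingleton instance = new MySingleton();
        private MySingleton() { log.append("constructed;"); }
        static MySingleton getInstance() { return instance; }
    }

    // Records when the constructor runs relative to the first getInstance() call.
    static String run() {
        log.append("main started;");
        MySingleton first = MySingleton.getInstance();
        MySingleton second = MySingleton.getInstance();
        log.append(first == second ? "same instance" : "different instances");
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Run on its own, this prints "main started;constructed;same instance": the constructor runs once, on the first use of the class, and both calls hand back the same object.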

So the other option is to create the instance in the method on demand (lazy instantiation). That brings us a whole lot closer to being able to create and manage multiple instances in different contexts:

public class MySingleton {
    private static MySingleton instance = null;
    private MySingleton() { }
    public static MySingleton getInstance() {

        // create the instance on first use only
        if (instance == null) { instance = new MySingleton(); }

        return instance;
    }
}

This would work fine in a single-threaded environment, which is where my mind is most of the time. But how do you know your code will only ever run on one thread? So, assuming that multiple threads could attempt to enter the getInstance method simultaneously, we need to use a lock to make sure only one can enter at a time. Here's what locking the whole method looks like in Java:

public class MySingleton {

    private static MySingleton instance = null;

    private MySingleton() { }

    // synchronized: only one thread at a time may run this method
    public static synchronized MySingleton getInstance() {

        if (instance == null) { instance = new MySingleton(); }

        return instance;
    }
}

The keyword synchronized has been added to the method declaration, which blocks multiple threads from entering at once. With a really simple method like this that doesn't look too bad, but... our method may have a whole lot more work to do to initialize the object. Even if the work is in the constructor, it's called from inside the method, and every other thread is going to block until everything is done. And getting a lock is expensive (in computer processing terms), so doing it every time the method is called is inefficient.

OK, so let's move the lock after the check for null. This is what that looks like:

public class MySingleton {

    private static MySingleton instance = null;

    private MySingleton() { }

    public static MySingleton getInstance() {

        if (instance == null) {

            // a static method has no "this", so lock on the class object for now
            synchronized (MySingleton.class) { instance = new MySingleton(); }
        }

        return instance;
    }
}

Well, that's a little better but what if two threads arrive at the check exactly at the same time and the instance is null? Both will move to the lock and the first one will enter that block and create an instance while the second one waits. But when the first thread leaves, the second thread will then enter and create yet another instance. Uh oh, that means one client has the first instance and the second client has the second instance. That is not a singleton!
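That unlucky timing is hard to hit by accident, so here is a sketch in Java that forces it. Everything in it is hypothetical scaffolding of mine: the RaceDemo and Broken names, the constructor counter, and especially the CyclicBarrier, which exists only to hold both threads between the null check and the lock. Real code would never contain that barrier; it just makes the bad interleaving happen every time:

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

class RaceDemo {
    // Check-then-lock singleton with no second check inside the lock.
    static final class Broken {
        private static Broken instance = null;
        // Counts constructor calls so the duplicate creation is visible.
        static final AtomicInteger created = new AtomicInteger();
        // Demo only: holds both threads after the null check, before the lock.
        private static final CyclicBarrier pastCheck = new CyclicBarrier(2);

        private Broken() { created.incrementAndGet(); }

        static Broken getInstance() {
            if (instance == null) {
                awaitOtherThread();
                synchronized (Broken.class) { instance = new Broken(); }
            }
            return instance;
        }

        private static void awaitOtherThread() {
            try {
                pastCheck.await();
            } catch (InterruptedException | BrokenBarrierException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    // Sends two threads through getInstance() and reports how many objects were built.
    static int runRace() {
        Thread a = new Thread(Broken::getInstance);
        Thread b = new Thread(Broken::getInstance);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return Broken.created.get();
    }

    public static void main(String[] args) {
        System.out.println("instances created: " + runRace());
    }
}
```

Both threads see null, both wait their turn at the lock, and both call the constructor, so the program reports two instances created.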

The solution is a double-check on the instance. Once we enter the locked block we'll check again to make sure the instance is still null. If it isn't, that means we were the second thread hitting the lock simultaneously and we will proceed with the instance the first thread created:

public class MySingleton {

    // volatile: without it, Java may let another thread see a half-built instance
    private static volatile MySingleton instance = null;

    private MySingleton() { }

    public static MySingleton getInstance() {

        if (instance == null) {

            // a static method has no "this", so lock on the class object for now
            synchronized (MySingleton.class) { if (instance == null) { instance = new MySingleton(); }}
        }

        return instance;
    }
}

Just one more thing: it's really bad form to lock on an object that other code can reach, such as this (or, in a static method, the class object). When you lock on an object, that blocks any other thread from entering any block that is also locked on the same object. You shouldn't have multiple blocks lock the same object when they do unrelated work. That can create a deadlock: thread A holds the lock and needs something from thread B, but thread B is waiting to enter another block locked on the same object. So both threads will wait forever.

As a matter of form you should always create separate objects to lock the individual tasks. We have only one locked block in the example, but a more complex class could easily end up with more. So in every case I create the lock objects that I need:

public class MySingleton {

    // volatile: without it, Java may let another thread see a half-built instance
    private static volatile MySingleton instance = null;
    // private lock object that no outside code can reach
    private static final Object instanceLock = new Object();

    private MySingleton() { }

    public static MySingleton getInstance() {

        if (instance == null) {

            synchronized (instanceLock) { if (instance == null) { instance = new MySingleton(); }}
        }

        return instance;
    }
}
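To check that the double-check really holds up, here is a rough test harness, again only a sketch. DoubleCheckDemo and distinctInstances are names I invented, and the singleton is repeated as a nested class so the file stands on its own. It assumes Java's synchronized block and a volatile field for the instance, which the Java memory model needs for double-checked locking to be safe. A CountDownLatch releases all the threads at once and an identity-based set records which objects they got back:

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.concurrent.CountDownLatch;

class DoubleCheckDemo {
    // Double-checked singleton with a private lock object.
    static final class MySingleton {
        private static volatile MySingleton instance = null;
        private static final Object instanceLock = new Object();
        private MySingleton() { }
        static MySingleton getInstance() {
            if (instance == null) {
                synchronized (instanceLock) {
                    if (instance == null) { instance = new MySingleton(); }
                }
            }
            return instance;
        }
    }

    // Releases threadCount threads together and counts the distinct objects they receive.
    static int distinctInstances(int threadCount) {
        Set<MySingleton> identitySet =
                Collections.newSetFromMap(new IdentityHashMap<MySingleton, Boolean>());
        Set<MySingleton> seen = Collections.synchronizedSet(identitySet);
        CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                try {
                    start.await(); // wait for the starting gun
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                seen.add(MySingleton.getInstance());
            });
            threads[i].start();
        }
        start.countDown();
        try {
            for (Thread t : threads) { t.join(); }
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("distinct instances: " + distinctInstances(8));
    }
}
```

However the threads interleave, every one of them should come back with the same object, so the count is 1. A passing run doesn't prove thread safety, of course, but a failing one would certainly disprove it.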

So there is my definition of the singleton pattern and how to apply it in Java. C++ and C# are almost identical; mostly the keywords are different (C# uses lock where Java uses synchronized, for example). The only thing I didn't demonstrate was managing multiple instances in different contexts, but that is not really part of the singleton problem. I always encourage the use of singletons instead of static objects; it's just more object-oriented!
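For the curious, here is one way the "one instance per context" idea could be sketched in Java. This is an illustration under my own assumptions, not a recipe: SessionScoped, contextId, and release are hypothetical names, and a plain String stands in for whatever identifies a session or connection in your application. It leans on ConcurrentHashMap.computeIfAbsent, which applies the creation function at most once per key, so the map does the locking for us:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SessionScoped {
    // One instance per context key; the String key is a stand-in for a session id.
    private static final Map<String, SessionScoped> instances = new ConcurrentHashMap<>();

    private final String contextId;

    private SessionScoped(String contextId) { this.contextId = contextId; }

    static SessionScoped getInstance(String contextId) {
        // computeIfAbsent runs the constructor at most once per key.
        return instances.computeIfAbsent(contextId, SessionScoped::new);
    }

    // Call when the context ends so its instance can be garbage collected.
    static void release(String contextId) { instances.remove(contextId); }

    String contextId() { return contextId; }

    public static void main(String[] args) {
        SessionScoped a1 = getInstance("session-A");
        SessionScoped a2 = getInstance("session-A");
        SessionScoped b = getInstance("session-B");
        System.out.println("same context, same instance: " + (a1 == a2));
        System.out.println("different context, same instance: " + (a1 == b));
    }
}
```

Within a context there is still only one, which is the whole point of my definition. The catch is that somebody has to call release when the context goes away, or the map will hang on to instances forever.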

References

See the references page.

Thursday, February 20, 2014
