B669: Personalized Data Mining and Mapping; Make Contracts

Encapsulating your code is far more than just making all variables private and adding public getters and setters to control their state; it's about taking control of all manipulations of your own state. It's about freedom.

Suppose, for example, that we want to give a number of objects of different classes a unique identifier. Say we have to put identifiers on a number of car parts of different types in a car factory. Each object (car part) must have its own identifier and there may be many classes (kinds) of objects.

Solution I

One obvious solution is to give each class its own range of int identifiers:


class SomeClass
	{
	//assign an initial identifier value
	//for the range of identifiers
	//to be used by instances of this class
	private static int identifierNumber = xxxx;

	//declare my identifier
	private int identifier;

	public SomeClass()
		{
		/*
		Set my identifier and do other constructor stuff.
		*/

		identifier = identifierNumber++;

		//other constructor code...
		}

	public int getIdentifier()
		{
		/*
		Tell the world what my identifier is.
		*/

		return identifier;
		}

	public void setIdentifier(int identifier)
		{
		/*
		Let the world reset my identifier.
		*/

		this.identifier = identifier;
		}

	//other code to implement SomeClass's behavior...
	}

This solution might look quite reasonable to a C programmer, but it is in fact very bad code. (Which might explain why C programs are so notoriously buggy...)

First, SomeClass has no real control over its identifiers, even though identifier is private, since its setIdentifier() method is public, which means that any other object could alter identifier at any time. Simply making all variables private and adding getters and setters does not encapsulate a class.

Second, SomeClass has no real control over the uniqueness of its identifiers, since it might be asked to create more objects than it has numbers in its (implicit) range. You might object that surely this problem isn't serious if we make each range number in the millions, but of course one day that decision will be seen as shortsighted, as the Year 2000 problem illustrates today. More than that, though, it's a sign of a deeper problem with the code: identifiers have nothing in particular to do with integers.

Third, because each class has its own range of identifiers hardwired in, at compile time we must have predecided exactly how many identifiers there will be for each type. Consequently, there is no flexibility at all in this code.

In sum, this code is bad because to work properly it relies completely on the programmer to always do the right thing. Whoever has to write or modify the client classes using identifiers has to know everything about all identifiers, and must remember to make all the right changes in all the right places. Which means of course that someone will forget something one day and catch a nasty bug. Never treat programmers like the machines they use.

This is extremely bad style. I'm ashamed to admit that it's my own braindamaged code! There might be a general rule here: whatever first pops up is probably a bad idea; think again.
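
To make the first objection concrete, here is a minimal sketch of how the public setter defeats uniqueness. The class name Bolt and the starting value 1000 are assumptions made up for illustration; the original leaves the range unspecified.

```java
// Hypothetical car-part class that mirrors Solution I.
// The name Bolt and the starting value 1000 are assumptions.
final class Bolt {
    private static int identifierNumber = 1000;
    private int identifier;

    Bolt() {
        identifier = identifierNumber++;
    }

    int getIdentifier() {
        return identifier;
    }

    // The public setter lets any caller overwrite the identifier.
    void setIdentifier(int identifier) {
        this.identifier = identifier;
    }
}

final class SetterProblemDemo {
    public static void main(String[] args) {
        Bolt first = new Bolt();
        Bolt second = new Bolt();

        // Nothing stops a client from copying one part's identifier onto another.
        second.setIdentifier(first.getIdentifier());

        // Two distinct parts now report the same identifier.
        System.out.println(first.getIdentifier() == second.getIdentifier()); // prints "true"
    }
}
```

The field is private, yet the class cannot promise that identifiers are unique or fixed.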

Solution II

A slightly less nasty version creates a separate Identifier class to hold identifiers and gets rid of the public setIdentifier() method entirely:


class SomeClass
	{
	//declare my identifier
	private int identifier;

	public SomeClass()
		{
		/*
		Set my identifier.
		*/

		this.identifier = Identifier.currentIdentifier++;
		}

	public int getIdentifier()
		{
		/*
		Tell the world what my identifier is.
		*/

		return identifier;
		}
	}

class Identifier
	{
	/*
	Hold the current identifier value in a class variable.
	*/

	public static int currentIdentifier = 0;
	}

This is a bit better, but it's still bad code.

First, the programmer working on SomeClass has to remember to increment the currentIdentifier variable. Multiply that effort by the number of classes that have to have identifiers and you have a problem. Worse, if it isn't done, or is done improperly, it could lead to subtle bugs without causing any fatal errors. This is a variant of the previous issue: don't treat programmers like machines; they'll disappoint you.

Second, the code is not thread-safe. If multiple objects are each running in their own thread and each wants to get an identifier, they will each try to increment the same class variable (in class Identifier) and might step on each other's toes. To solve that problem we could synchronize on class Identifier, but that's not a good solution; it's a band-aid because it doesn't address the deeper problem, which is that currentIdentifier is being treated as a good old C-style global variable!
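
For illustration, the band-aid might look like the following sketch. Gear and BandAidDemo are hypothetical names, and the thread counts are arbitrary. The point to notice is that every client class must remember to repeat the synchronized block around its own increment.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Solution II's data holder, unchanged: a public class variable.
final class Identifier {
    public static int currentIdentifier = 0;
}

// Hypothetical client class. Each client must repeat this locking itself.
final class Gear {
    private final int identifier;

    Gear() {
        // The band-aid: lock on the Identifier class object around the increment.
        synchronized (Identifier.class) {
            identifier = Identifier.currentIdentifier++;
        }
    }

    int getIdentifier() {
        return identifier;
    }
}

final class BandAidDemo {
    // Create Gears from several threads and collect the identifiers seen.
    static Set<Integer> createConcurrently(int threads, int perThread) {
        Set<Integer> seen = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    seen.add(new Gear().getIdentifier());
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Every Gear received a distinct identifier, so the set holds all of them.
        System.out.println(createConcurrently(4, 1000).size()); // prints "4000"
    }
}
```

The locking works only as long as every client uses it. A single client that increments currentIdentifier without the lock reintroduces the race, and nothing in class Identifier can prevent that.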

Third, we're in trouble if we ever change our assumptions about identifiers: to reserve ranges of identifiers, reuse identifiers of destroyed objects, keep track of how many identifiers are presently in use, ensure safe concurrent access to identifiers, add check digits to identifiers to ensure validity, or, indeed, make any change whatsoever to identifiers. Any such change may mean changing every single occurrence of the variables identifier and currentIdentifier everywhere in the code.

A telltale sign that this is a bad solution is that the new Identifier class is nothing more than a data storage device; it has no behavior. A good way to tell whether something is a true object is to see whether it has state (variables) and behavior (methods to operate on those variables). If it's missing one or the other, it isn't an object and should be folded into whichever class uses it. If there are many such classes then the design is broken and needs to be rethought so that the bad class is given state and behavior.

Solution III

A more sophisticated solution encapsulates identifiers still further by removing all knowledge of how identifiers are generated from the client classes:


class SomeClass
	{
	//declare my identifier
	private int identifier;

	public SomeClass()
		{
		/*
		Set my identifier.
		*/

		identifier = Identifier.getNewIdentifier();
		}

	public int getIdentifier()
		{
		/*
		Tell the world what my identifier is.
		*/

		return identifier;
		}
	}

class Identifier
	{
	/*
	Hold the current identifier value in a class variable.
	*/

	private static int currentIdentifier = 0;

	public static int getNewIdentifier()
		{
		/*
		Generate and return a new identifier.
		*/

		return currentIdentifier++;
		}
	}

This is the kind of code commonly displayed in programming language books as being "well-encapsulated". It's of course much better than the previous stabs but it still leaves unnecessary coupling between class Identifier and its client classes. For example, all the client classes still have to know that identifiers are integers, which means that if we ever decide to change that type, then all the clients will have to change their declarations of identifiers. The Identifier encapsulation is still incomplete.

Solution IV

An even better solution completely encapsulates all identifier assumptions into class Identifier:


class SomeClass
	{
	//declare my identifier
	private Identifier identifier;

	public SomeClass()
		{
		/*
		Set my identifier.
		*/

		identifier = Identifier.getNewIdentifier();
		}

	public Identifier getIdentifier()
		{
		/*
		Tell the world what my identifier is.
		*/

		return identifier;
		}
	}

class Identifier
	{
	/*
	Encapsulate the idea of an identifier
	by creating, returning, managing, and testing identifiers
	for arbitrary client classes.
	*/

	private static int currentIdentifier = 0;
	private int identifier;

	private Identifier()
		{
		/*
		Set the secret identifier
		for this particular Identifier object.

		Disallow outside instantiation
		to keep full control of Identifier creation.
		*/

		identifier = currentIdentifier;
		}

	public static synchronized Identifier getNewIdentifier()
		{
		/*
		Thread-safely generate and return a new Identifier.
		*/

		currentIdentifier++;
		return new Identifier();
		}

	public String toString()
		{
		/*
		Return a String representation of an Identifier.
		*/

		return "" + identifier;
		}

	public static boolean equals(Identifier identifier1,
		Identifier identifier2)
		{
		/*
		Test whether two Identifiers are equal.
		*/

		return (identifier1.identifier == identifier2.identifier);
		}

	//other identifier code...
	}

Each client class now gets an Identifier object and that's all it knows, or has to know. Hidden inside that object is the actual identifier, but that's not for the client classes to know about, or to care about. Class Identifier takes care of all the identifier management. It is now fully encapsulated.

The client classes' complete ignorance of what's inside an Identifier object is, paradoxically, good for everyone. It's easy to see, for example, how to change class Identifier to use Strings as identifiers, say, rather than ints, and not have the change affect any of the client classes at all:


class Identifier
	{
	/*
	Encapsulate the idea of an identifier
	by creating, returning, managing, and testing identifiers
	for arbitrary client classes.
	*/

	private static String currentIdentifier = "";
	private String identifier;

	private Identifier()
		{
		/*
		Set the secret identifier
		for this particular Identifier object.

		Disallow outside instantiation
		to keep full control of Identifier creation.
		*/

		identifier = currentIdentifier;
		}

	public static synchronized Identifier getNewIdentifier()
		{
		/*
		Thread-safely generate and return a new Identifier.
		*/

		currentIdentifier += "1";
		return new Identifier();
		}

	public String toString()
		{
		/*
		Return a String representation of an Identifier.
		*/

		return identifier;
		}

	public static boolean equals(Identifier identifier1,
		Identifier identifier2)
		{
		/*
		Test whether two Identifiers are equal.
		*/

		return (identifier1.identifier.equals(identifier2.identifier));
		}

	//other identifier code...
	}

Conclusion

Now we can see what's at the heart of good object-oriented style: class Identifier establishes a contract with the classes that use it (its clients). Class Identifier promises to do certain things (its public methods collectively form its API) and that's all its client classes have to know. In fact, the less they know about the actual implementation of class Identifier the less they have to change if class Identifier changes.

To establish contracts between all the involved classes we must face the issue of exactly what an identifier is when we're creating class Identifier in the first place. That constraint helps us produce the most insight and so the best code for class Identifier and its clients.

Because class Identifier has been encapsulated, once its contract has been decided, a programmer on the other side of the world can implement it in complete isolation from whoever is implementing its client classes.

The core identifier properties we've discovered are:

an object's identifier is unique,
an object's identifier is fixed,
two object identifiers can be tested for equality,
there are an unlimited number of identifiers, and
an object can publicly display its identifier.
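
The properties listed above can be exercised directly. The following is only a sketch under some assumptions: it stores the hidden value as a long, and it overrides the instance methods equals(Object) and hashCode() (rather than using a static equals) so that Identifiers also behave well as keys in Java collections. ContractDemo and the part name "piston" are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

final class Identifier {
    private static long nextValue = 0;   // long is an assumption, not part of the contract
    private final long value;            // final: an identifier is fixed once created

    private Identifier(long value) {
        this.value = value;
    }

    // Unique: each call hands out the next unused value, under a lock.
    public static synchronized Identifier getNewIdentifier() {
        return new Identifier(nextValue++);
    }

    // Testable for equality, by hidden value.
    @Override
    public boolean equals(Object other) {
        return other instanceof Identifier && ((Identifier) other).value == value;
    }

    // Kept consistent with equals so Identifiers work as hash keys.
    @Override
    public int hashCode() {
        return Long.hashCode(value);
    }

    // Publicly displayable.
    @Override
    public String toString() {
        return Long.toString(value);
    }
}

final class ContractDemo {
    public static void main(String[] args) {
        Identifier a = Identifier.getNewIdentifier();
        Identifier b = Identifier.getNewIdentifier();

        System.out.println(a.equals(b)); // prints "false"
        System.out.println(a.equals(a)); // prints "true"

        Map<Identifier, String> parts = new HashMap<>();
        parts.put(a, "piston");
        System.out.println(parts.get(a)); // prints "piston"
    }
}
```

Note that a long is large but not unlimited, so this sketch does not fully honor the fourth property; the client classes would not need to change if the hidden type were later widened.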

Of course, a good programmer's job is never done. Suppose we later decide that the contract for class Identifier should stipulate that an object can address another object given only its Identifier. To make that change could require us to change all the calls to getNewIdentifier() to add a reference to the calling object, which might then be saved in a private table internal to class Identifier.

Because it breaks the contract, that new condition requires us to change potentially a lot of calls. Even then, though, all that would really have to change is the clients' calls to getNewIdentifier() (or we might use reflection). Of course, the best solution is to think of this condition when we're designing class Identifier's contract in the first place!
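
A minimal sketch of that extended contract might look like this. The method name getOwner, the client class Wheel, and the use of a HashMap for the private table are all assumptions; the text above only says that a reference to the caller is saved in a private table.

```java
import java.util.HashMap;
import java.util.Map;

final class Identifier {
    private static int currentIdentifier = 0;

    // Private table from each Identifier to the object it was issued to.
    // Identifier does not override equals/hashCode here, so keys compare by
    // object identity, which is enough because every Identifier is distinct.
    private static final Map<Identifier, Object> owners = new HashMap<>();

    private final int identifier;

    private Identifier(int identifier) {
        this.identifier = identifier;
    }

    // The changed call: clients now pass a reference to themselves.
    public static synchronized Identifier getNewIdentifier(Object owner) {
        Identifier id = new Identifier(++currentIdentifier);
        owners.put(id, owner);
        return id;
    }

    // Hypothetical lookup: find an object given only its Identifier.
    public static synchronized Object getOwner(Identifier id) {
        return owners.get(id);
    }

    @Override
    public String toString() {
        return "" + identifier;
    }
}

// Hypothetical client class.
final class Wheel {
    private final Identifier identifier;

    Wheel() {
        // Caveat: this hands out a reference to an object that is still
        // being constructed.
        identifier = Identifier.getNewIdentifier(this);
    }

    Identifier getIdentifier() {
        return identifier;
    }
}

final class RegistryDemo {
    public static void main(String[] args) {
        Wheel wheel = new Wheel();
        System.out.println(Identifier.getOwner(wheel.getIdentifier()) == wheel); // prints "true"
    }
}
```

One design consequence worth noting: the table holds strong references, so in this sketch registered objects are never garbage collected. A real implementation would have to decide when identifiers are released.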

Another problem that might arise in actual implementations is that Identifier creation might be a bottleneck when we have to create thousands of objects since, at bottom, it presently depends on exactly one class variable. It might be better to distribute the work by creating multiple Identifier creator objects, each with their own variables and some way to probabilistically guarantee identifier uniqueness (generating large random numbers should do the trick).
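
As a rough sketch of that idea, each creator object below owns its own random source and shares no variable with any other creator. The nested class name Creator and the choice of 128 random bits are assumptions; uniqueness here is only probabilistic, as the paragraph above says.

```java
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.HashSet;
import java.util.Set;

final class Identifier {
    private static final int BITS = 128;   // assumed size; large enough that collisions are very unlikely
    private final BigInteger value;

    private Identifier(BigInteger value) {
        this.value = value;
    }

    // Each Creator has its own random source, so creators never contend
    // for a shared class variable.
    static final class Creator {
        private final SecureRandom random = new SecureRandom();

        Identifier newIdentifier() {
            // Uniformly random value in the range 0 to 2^BITS - 1.
            return new Identifier(new BigInteger(BITS, random));
        }
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof Identifier && ((Identifier) other).value.equals(value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }

    @Override
    public String toString() {
        return value.toString(16);
    }
}

final class RandomIdentifierDemo {
    public static void main(String[] args) {
        Identifier.Creator first = new Identifier.Creator();
        Identifier.Creator second = new Identifier.Creator();

        Set<Identifier> seen = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            seen.add(first.newIdentifier());
            seen.add(second.newIdentifier());
        }
        // A size below 2000 would mean two identifiers collided.
        System.out.println(seen.size());
    }
}
```

Because the constructor is still private, clients can obtain Identifiers only through a Creator, so the change stays invisible to code that merely stores and compares them.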

One final picky point is that there is nothing in the current contract that prevents a client object from requesting lots of identifiers and producing different ones when asked for its identifier. (Yes, I know that would probably take actual malice on the part of the client class programmer, but it might happen accidentally somehow.) Ideally, each client object should only ever get one Identifier and that protocol should be enforced exclusively by the Identifier class. But this is stepping into pretty rarefied areas that most programmers don't consider at all, so it's probably time to stop and sum up.
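
One way the Identifier class might enforce that protocol is sketched below, assuming clients pass themselves in when asking. The method name getIdentifierFor and the IdentityHashMap table are assumptions, not part of the contract described so far.

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class Identifier {
    private static int currentIdentifier = 0;

    // Remembers which Identifier was issued to which object. Keys are
    // compared by reference, so a client's own equals() cannot confuse it.
    private static final Map<Object, Identifier> issued = new IdentityHashMap<>();

    private final int identifier;

    private Identifier(int identifier) {
        this.identifier = identifier;
    }

    // Returns the one Identifier belonging to owner, creating it on first request.
    public static synchronized Identifier getIdentifierFor(Object owner) {
        Identifier existing = issued.get(owner);
        if (existing == null) {
            existing = new Identifier(++currentIdentifier);
            issued.put(owner, existing);
        }
        return existing;
    }

    @Override
    public String toString() {
        return "" + identifier;
    }
}

final class OneIdentifierDemo {
    public static void main(String[] args) {
        Object part = new Object();
        Identifier first = Identifier.getIdentifierFor(part);
        Identifier again = Identifier.getIdentifierFor(part);
        System.out.println(first == again); // prints "true"
    }
}
```

Asking twice now yields the same Identifier, so a client cannot collect several. As in any table keyed by live objects, the strong references mean entries are never released in this sketch.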

Object-oriented programming protects programmers by letting them establish firm contracts between different objects. Establishing firm contracts is what leads to clean, robust, reusable, and repurposable code. Learning the mere statements of Java is useless if this lesson is lost.