Make Contracts
Encapsulating your code is far more than just making all variables private and adding public getters and setters to control their state; it's about taking control of all manipulations of your own state. It's about freedom.
Suppose, for example, that we want to give a number of objects of different
classes a unique identifier. Say we have to put identifiers on a number of
car parts of different types in a car factory. Each object (car part) must
have its own identifier and there may be many classes (kinds) of objects.
Solution I
One obvious solution is to give each class its own range of int identifiers:
class SomeClass
{
    //assign an initial identifier value
    //for the range of identifiers
    //to be used by instances of this class
    private static int identifierNumber = xxxx;

    //declare my identifier
    private int identifier;

    public SomeClass()
    {
        /*
        Set my identifier and do other constructor stuff.
        */
        identifier = identifierNumber++;
        //other constructor code...
    }

    public int getIdentifier()
    {
        /*
        Tell the world what my identifier is.
        */
        return identifier;
    }

    public void setIdentifier(int identifier)
    {
        /*
        Let the world reset my identifier.
        */
        this.identifier = identifier;
    }
    //other code to implement SomeClass's behavior...
}
This solution might look quite reasonable to a C programmer, but it is in fact very bad code. (Which might explain why C programs are so notoriously buggy...)
First, SomeClass has no real control over its identifiers, even though
identifier is private, since its setIdentifier() method is public, which
means that any other object could alter identifier at any time.
Simply making all variables private and adding getters and setters
does not encapsulate a class.
Second, SomeClass has no real control over the uniqueness of its
identifiers, since it might be asked to create more objects than it has
numbers in its (implicit) range.
You might object that surely this problem isn't serious if we make the ranges
number in millions---but of course one day that decision will be seen as
shortsighted, as the Year 2000 problem illustrates today.
More than that, though, it's a sign of a deeper problem with the code:
identifiers have nothing in particular to do with integers.
Third, because each class has its own range of identifiers hardwired in, at compile time we must have predecided exactly how many identifiers there will be for each type. Consequently, there is no flexibility at all in this code.
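The first objection is easy to reproduce. The following is a minimal sketch, not part of the original design: the class name Part and the starting value 1000 are assumptions made only so the example compiles and runs.

```java
// Hypothetical stand-in for SomeClass; the name Part and the
// starting value 1000 are assumptions for illustration only.
class Part
{
    private static int identifierNumber = 1000;
    private int identifier;

    public Part()
    {
        identifier = identifierNumber++;
    }

    public int getIdentifier()
    {
        return identifier;
    }

    // The public setter that defeats the private field.
    public void setIdentifier(int identifier)
    {
        this.identifier = identifier;
    }
}

class SetterDemo
{
    public static void main(String[] args)
    {
        Part a = new Part();
        Part b = new Part();
        // Any client can overwrite b's identifier with a's.
        b.setIdentifier(a.getIdentifier());
        // Two distinct parts now share one identifier.
        System.out.println(a.getIdentifier() == b.getIdentifier()); // prints "true"
    }
}
```

Nothing in Part can detect or prevent the duplicate, which is why a private field behind a public setter is not encapsulation.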
In sum, this code is bad because to work properly it relies completely on the programmer to always do the right thing. Whoever has to write or modify the client classes using identifiers has to know everything about all identifiers, and remember to make all the right changes in all the right places. Which means of course that someone will forget something one day and catch a nasty bug. Never treat programmers like the machines they use.
This is extremely bad style.
I'm ashamed to admit that it's my own braindamaged code!
There might be a general rule here:
whatever first pops up is probably a bad idea; think again.
Solution II
A slightly less nasty version creates a separate Identifier class to hold
identifiers and gets rid of the public setIdentifier() method entirely:
class SomeClass
{
    //declare my identifier
    private int identifier;

    public SomeClass()
    {
        /*
        Set my identifier.
        */
        this.identifier = Identifier.currentIdentifier++;
    }

    public int getIdentifier()
    {
        /*
        Tell the world what my identifier is.
        */
        return identifier;
    }
}

class Identifier
{
    /*
    Hold the current identifier value in a class variable.
    */
    public static int currentIdentifier = 0;
}
This is a bit better, but it's still bad code.
First, the programmer working on SomeClass has to remember to increment
the currentIdentifier variable. Multiply that effort by the number of
classes that have to have identifiers and you have a problem.
Worse, if it isn't done, or is done improperly,
it could lead to subtle bugs without causing any fatal errors.
This is a variant of the previous issue:
don't treat programmers like machines; they'll disappoint you.
Second, the code is not thread-safe. If multiple objects are each running
in their own thread and each wants to get an identifier,
they will each try to increment the same class variable
(in class Identifier) and might step on each other's toes:
currentIdentifier++ is a read followed by a write,
so two threads can end up with the same value.
To solve that problem we could synchronize on class Identifier,
but that's not a good solution;
it's a band-aid because it doesn't address the deeper problem,
which is that currentIdentifier is being treated
as a good old C-style global variable!
Third, we're in trouble if we ever change our assumptions about identifiers
to reserve ranges of identifiers,
reuse identifiers of destroyed objects,
keep track of how many identifiers are presently in use,
ensure safe concurrent access to identifiers,
add check digits to identifiers to ensure validity,
or, indeed, make any change whatsoever to identifiers.
Making any such change may mean changing every single occurrence of
the variables identifier and currentIdentifier everywhere in the code.
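The "global variable" complaint can be shown without any threads at all. In this minimal sketch the names GlobalIdentifier and Wheel are hypothetical stand-ins for Identifier and SomeClass in Solution II; the point is only that any code anywhere may write to the public static field.

```java
// Hypothetical stand-in for Solution II's Identifier class.
class GlobalIdentifier
{
    public static int currentIdentifier = 0;
}

// Hypothetical client class, standing in for SomeClass.
class Wheel
{
    private final int identifier;

    public Wheel()
    {
        identifier = GlobalIdentifier.currentIdentifier++;
    }

    public int getIdentifier()
    {
        return identifier;
    }
}

class GlobalDemo
{
    public static void main(String[] args)
    {
        Wheel first = new Wheel();
        // Unrelated code "helpfully" resets the counter to first's value...
        GlobalIdentifier.currentIdentifier = first.getIdentifier();
        Wheel second = new Wheel();
        // ...and the next object silently receives a duplicate.
        System.out.println(first.getIdentifier() == second.getIdentifier()); // prints "true"
    }
}
```

No compiler error and no exception results, which matches the warning above about subtle bugs without fatal errors.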
A telltale sign that this is a bad solution is that the new Identifier
class is nothing more than a data storage device: it has no behavior.
A good way to tell whether something is a true object
is to see whether it has state (variables)
and behavior (methods to operate on those variables).
If it's missing one or the other, it isn't an object
and should be folded into whichever class uses it.
If there are many such classes then the design is broken
and needs to be rethought so that the bad class is given state and behavior.
Solution III
A more sophisticated solution encapsulates identifiers still further by removing all knowledge of how identifiers are generated from the client classes:
class SomeClass
{
    //declare my identifier
    private int identifier;

    public SomeClass()
    {
        /*
        Set my identifier.
        */
        identifier = Identifier.getNewIdentifier();
    }

    public int getIdentifier()
    {
        /*
        Tell the world what my identifier is.
        */
        return identifier;
    }
}

class Identifier
{
    /*
    Hold the current identifier value in a class variable.
    */
    private static int currentIdentifier = 0;

    public static int getNewIdentifier()
    {
        /*
        Generate and return a new identifier.
        */
        return currentIdentifier++;
    }
}
This is the kind of code commonly displayed in programming-language books
as being "well-encapsulated".
It's of course much better than the previous stabs,
but it still leaves unnecessary coupling
between class Identifier and its client classes.
For example, all the client classes still have to know that identifiers
are integers, which means that if we ever decide to change that type,
then all the clients will have to change their declarations of identifiers.
The Identifier encapsulation is still incomplete.
Solution IV
An even better solution completely encapsulates all
identifier assumptions into class Identifier:
class SomeClass
{
    //declare my identifier
    private Identifier identifier;

    public SomeClass()
    {
        /*
        Set my identifier.
        */
        identifier = Identifier.getNewIdentifier();
    }

    public Identifier getIdentifier()
    {
        /*
        Tell the world what my identifier is.
        */
        return identifier;
    }
}
class Identifier
{
    /*
    Encapsulate the idea of an identifier
    by creating, returning, managing, and testing identifiers
    for arbitrary client classes.
    */
    private static int currentIdentifier = 0;
    private int identifier;

    private Identifier()
    {
        /*
        Set the secret identifier
        for this particular Identifier object.
        Disallow outside instantiation
        to keep full control of Identifier creation.
        */
        identifier = currentIdentifier;
    }

    public static synchronized Identifier getNewIdentifier()
    {
        /*
        Thread-safely generate and return a new Identifier.
        */
        currentIdentifier++;
        return new Identifier();
    }

    public String toString()
    {
        /*
        Return a String representation of an Identifier.
        */
        return "" + identifier;
    }

    public static boolean equals(Identifier identifier1,
                                 Identifier identifier2)
    {
        /*
        Test whether two Identifiers are equal
        by comparing the values they hold,
        not the object references.
        */
        return (identifier1.identifier == identifier2.identifier);
    }
    //other identifier code...
}
Each client class now gets an Identifier object
and that's all it knows, or has to know.
Hidden inside that object is the actual identifier,
but that's not for the client classes to know about, or to care about.
Class Identifier takes care of all the identifier management.
It is now fully encapsulated.
The client classes' complete ignorance of what's inside
an Identifier object is, paradoxically, good for everyone.
It's easy to see, for example, how to change class Identifier
to use Strings as identifiers, say, rather than ints,
and not have the change affect any of the client classes at all:
class Identifier
{
    /*
    Encapsulate the idea of an identifier
    by creating, returning, managing, and testing identifiers
    for arbitrary client classes.
    */
    private static String currentIdentifier = "";
    private String identifier;

    private Identifier()
    {
        /*
        Set the secret identifier
        for this particular Identifier object.
        Disallow outside instantiation
        to keep full control of Identifier creation.
        */
        identifier = currentIdentifier;
    }

    public static synchronized Identifier getNewIdentifier()
    {
        /*
        Thread-safely generate and return a new Identifier.
        */
        currentIdentifier += "1";
        return new Identifier();
    }

    public String toString()
    {
        /*
        Return a String representation of an Identifier.
        */
        return identifier;
    }

    public static boolean equals(Identifier identifier1,
                                 Identifier identifier2)
    {
        /*
        Test whether two Identifiers are equal
        by comparing the Strings they hold.
        (Calling identifier1.equals(identifier2) here would fall
        through to Object.equals() and compare references.)
        */
        return (identifier1.identifier.equals(identifier2.identifier));
    }
    //other identifier code...
}
Conclusion
Now we can see what's at the heart of good object-oriented style:
class Identifier establishes a contract
with the classes that use it (its clients).
Class Identifier promises to do certain things
(its public methods collectively form its API)
and that's all its client classes have to know.
In fact, the less they know about the actual implementation
of class Identifier the less they change if class Identifier changes.
To establish contracts between all the involved classes
we must face the issue of exactly what an identifier is
when we're creating class Identifier in the first place.
That constraint helps us produce the most insight
and so the best code for class Identifier and its clients.
Because class Identifier has been encapsulated,
once its contract has been decided,
a programmer on the other side of the world can implement it
in complete isolation from whoever is implementing its client classes.
The core identifier properties we've discovered are:
an object's identifier is unique,
an object's identifier is fixed,
two object identifiers can be tested for equality,
there are an unlimited number of identifiers, and
an object can publicly display its identifier.
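The properties just listed can be written down as a small class and checked. What follows is a sketch under stated assumptions, not the Identifier class from Solution IV: the name PartId is hypothetical, a BigInteger counter is swapped in so that the supply is limited only by memory, and equality is expressed by overriding Object.equals() and hashCode() rather than through a static equals() method.

```java
import java.math.BigInteger;
import java.util.HashSet;
import java.util.Set;

// Hypothetical variant of class Identifier; PartId is an assumed name.
final class PartId
{
    // Assumption: a BigInteger counter, so the range never runs out.
    private static BigInteger next = BigInteger.ZERO;

    // final: an identifier is fixed once it has been created.
    private final BigInteger value;

    private PartId(BigInteger value)
    {
        this.value = value;
    }

    // The only way to obtain a PartId; each call returns a new value.
    public static synchronized PartId getNew()
    {
        next = next.add(BigInteger.ONE);
        return new PartId(next);
    }

    // Two identifiers are equal when the values they hold are equal.
    @Override
    public boolean equals(Object other)
    {
        return other instanceof PartId
            && ((PartId) other).value.equals(value);
    }

    @Override
    public int hashCode()
    {
        return value.hashCode();
    }

    // An identifier can be displayed publicly.
    @Override
    public String toString()
    {
        return value.toString();
    }
}

class ContractDemo
{
    public static void main(String[] args)
    {
        Set<PartId> seen = new HashSet<>();
        for (int i = 0; i < 1000; i++)
        {
            seen.add(PartId.getNew());
        }
        // Every identifier handed out was distinct.
        System.out.println(seen.size()); // prints "1000"
    }
}
```

Overriding equals() and hashCode() also lets identifiers be used as keys in the standard collections, which the static two-argument form does not.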
Of course, a good programmer's job is never done.
Suppose we later decide that the contract for class Identifier
should stipulate that
an object can address another object given only its Identifier.
To make that change could require us to change all
the calls to getNewIdentifier()
to add a reference to the calling object,
which might then be saved in a private table internal to
class Identifier.
Because it breaks the contract,
that new condition requires us to change potentially a lot of calls.
Even then, though, all that would really have to change
is the clients' calls to getNewIdentifier()
(or we might use reflection).
Of course, the best solution is to think of this condition
when we're designing class Identifier's contract in the first place!
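One way the "private table" idea might look is sketched below. Everything here is an assumption made for illustration: the class names TrackedId and Engine, the owner parameter, and the lookUp() method are hypothetical and do not appear in the original contract.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical variant of class Identifier that remembers its owners.
final class TrackedId
{
    private static int next = 0;

    // Private table from identifier to the object that owns it.
    // Each TrackedId is a distinct object, so the default
    // identity-based equals()/hashCode() are sufficient as keys.
    private static final Map<TrackedId, Object> owners = new HashMap<>();

    private final int value;

    private TrackedId(int value)
    {
        this.value = value;
    }

    // Changed contract: the caller must now pass a reference to itself.
    public static synchronized TrackedId getNewIdentifier(Object owner)
    {
        TrackedId id = new TrackedId(++next);
        owners.put(id, owner);
        return id;
    }

    // Address an object given only its identifier.
    public static synchronized Object lookUp(TrackedId id)
    {
        return owners.get(id);
    }

    @Override
    public String toString()
    {
        return Integer.toString(value);
    }
}

// Hypothetical client; only the call to getNewIdentifier() changes.
class Engine
{
    private final TrackedId identifier;

    public Engine()
    {
        identifier = TrackedId.getNewIdentifier(this);
    }

    public TrackedId getIdentifier()
    {
        return identifier;
    }
}
```

Two caveats for a real implementation: the table holds strong references, so owners are never garbage-collected while it exists, and passing this from a constructor exposes the object before it is fully built.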
Another problem that might arise in actual implementations
is that Identifier creation might be a bottleneck
when we have to create thousands of objects
since, at bottom, it presently depends on exactly one class variable.
It might be better to distribute the work by creating multiple
Identifier creator objects,
each with its own variables and some way to
probabilistically guarantee identifier uniqueness
(generating large random numbers should do the trick).
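A creator object of that kind could be as small as the following sketch. The class name RandomIdCreator and the choice of 128 bits are assumptions; the sketch returns raw BigInteger values rather than wrapping them in Identifier objects, to keep the idea visible.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

// Hypothetical creator object; many can exist and none share state.
final class RandomIdCreator
{
    // Assumption: 128 random bits make an accidental collision
    // vanishingly unlikely for any realistic number of objects.
    private static final int BITS = 128;

    // Each creator owns its own random source.
    private final SecureRandom random = new SecureRandom();

    // Return a uniformly random value in the range 0 to 2^BITS - 1.
    public BigInteger next()
    {
        return new BigInteger(BITS, random);
    }
}

class RandomIdDemo
{
    public static void main(String[] args)
    {
        RandomIdCreator east = new RandomIdCreator();
        RandomIdCreator west = new RandomIdCreator();
        // No coordination between creators is needed.
        System.out.println(east.next());
        System.out.println(west.next());
    }
}
```

The trade-off is that uniqueness becomes a matter of probability rather than a guarantee, so the contract for identifiers would have to say so.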
One final picky point is that there is nothing in the current contract
that prevents a client object from requesting lots of identifiers
and producing different ones when asked for its identifier.
(Yes, I know that would probably take actual malice
on the part of the client class programmer, but it might happen accidentally
somehow.)
Ideally, each client object should only ever get one Identifier
and that protocol should be enforced exclusively
by the Identifier class.
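A minimal sketch of such enforcement follows, assuming the client passes itself when asking for its identifier. The names OwnedId and identifierFor() are hypothetical, and a client that lies about which object it is can still defeat the scheme.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical variant of class Identifier: one identifier per owner.
final class OwnedId
{
    private static int next = 0;

    // Keyed by object identity, so a client's own equals() is irrelevant.
    private static final Map<Object, OwnedId> issued = new IdentityHashMap<>();

    private final int value;

    private OwnedId(int value)
    {
        this.value = value;
    }

    // Return the owner's identifier, creating it only on the first request.
    public static synchronized OwnedId identifierFor(Object owner)
    {
        OwnedId id = issued.get(owner);
        if (id == null)
        {
            id = new OwnedId(++next);
            issued.put(owner, id);
        }
        return id;
    }

    @Override
    public String toString()
    {
        return Integer.toString(value);
    }
}

class OwnedIdDemo
{
    public static void main(String[] args)
    {
        Object part = new Object();
        OwnedId first = OwnedId.identifierFor(part);
        OwnedId again = OwnedId.identifierFor(part);
        // Asking twice yields the very same identifier.
        System.out.println(first == again); // prints "true"
    }
}
```

As with any table of owners, the map keeps every owner reachable, so a production version would need some way to release entries.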
But this is stepping into pretty rarefied areas that most programmers
don't consider at all, so it's probably time to stop and sum up.
Object-oriented programming protects programmers by letting them establish firm contracts between different objects. Establishing firm contracts is what leads to clean, robust, reusable, and repurposable code. Learning the mere statements of Java is useless if this lesson is lost.