|
|||||||||||||||||||||||||||
HOME | COURSES | TALKS | ARTICLES | GENERICS | LAMBDAS | IOSTREAMS | ABOUT | CONTACT | | | | |||||||||||||||||||||||||||
|
Secrets of equals()
|
||||||||||||||||||||||||||
Secrets of equals() - Part 1 Not all implementations of equals() are equal
JavaSolutions, April 2002
Object Comparison - Implementing equals() correctly is harder than it looks at first sight Java magazines are full of articles describing the hottest and latest Java novelty to the interested reader. There is, however, little discussion of seemingly trivial core features of the Java programming language itself. Take for instance the equals() method. It is a piece of basic infrastructure that is common to all classes. It is defined in class Object and as a consequence all objects are equality-comparable. Object.equals() provides a default implementation of equality-comparison and at times the default implementation is just fine and semantically correct for a newly defined class. In other situations the new class must override equals() to provide meaningful and semantically correct equality-comparison. One would expect that overriding equals() , since it is a fairly common task, should be a piece of cake. The reality is far from that. There is an amazing amount of disagreement in the Java community regarding correct implementation of equals() . Look into the next best Java source code or open the an arbitrary Java textbook and take a look at what you find. Chances are good that you will find several different approaches and a variety of recommendations.
In this article we want to take a closer look at the different techniques
that are around for implementing
equals()
.
The goal is to evaluate the approaches in order to gain a deeper
understanding of the related up- and downsides. For purpose of the discussion
we arbitrarily picked a number of sample implementations that are accessible
to our readers, because they have been published. They are taken from "Program
Development in Java" by Barbara Liskov with John Guttag (see /
LIS
/), "Effective Java" by Joshua Bloch (see /
BLO
/), "Practical Java" by Peter Haggar (see /
HAG
/),
and from the JDK 1.3 source code (see /
JDK
/; authors:
James Gosling, Arthur van Hoff, Alan Liu). Listing 1 thru 4 show the examples.
A first glance at the source code reveals that the respective authors use quite a variety of techniques.
Entity Types vs. Value Types When I define a new class, do I have to override Object.equals() ? To answer this question we need to understand what the default implementation provided by Object.equals() does. It exhibit the same behavior as the == operator, namely a check for identity of two objects. Two objects are identical if they occupy the same memory location and are effectively the "same" object. Identity is different from equality . Two objects are equal when they behave in the same way, which often means that they have the same state or content. Equality is what the equals() method is supposed to checks for; identity is what operator == (and Object.equals() ) check for. Whether or not a class must override the default behavior of equals() (namely the check for identity provided by Object.equals() ) depends on the semantics of the class. There is a fundamental distinction between value types on the one hand and non-value types, sometimes called entity types, on the other hand.
The equals() Contract Classes that override Object.equals() must provide an implementation that compares for equality of objects. That's the intended and expected semantic meaning of method equals() . In addition to that, there are a couple of other requirements that an implementation of equals() must meet. These requirements stem from the fact that the Java platform libraries use equals() in various places, most notably in the hash-based collections such as HashSet and HashMap . These collections have certain expectations regarding the behavior of equals() ; otherwise they do not work properly. This behavior expected of equals() is defined in the so-called equals() contract. It is described in the documentation that comes with the Java platform libraries. Here is the contract as stated in the Java 2 Platform, Standard Edition, v 1.3.1 API Specification (see /JDOC/), to be looked up under Object.equals() :
Let us now take a look at the sample implementations in Listing 1 -
4. It will turn out that not all of them are correct.
Listing 1 : An Incorrect Implementation of equals()
Closer examination discloses that the implementation in
Listing
1
is not transitive and for that reason incorrect. The example in Listing
1 shows a subclass
Point3
derived of a superclass
Point2
that provides three versions of
equals()
with three different
signatures. Let us see what happens when we use these classes. Consider
the following situation:
System.out.println(origin.equals(p2)); // calls Point2.equals(Point2) System.out.println(p1.equals(p2)); // calls Point3.equals(Point3) true true true false Let's start with the first comparison p1.equals(origin) . Method Point3.equals(Point2) is called. Here is it implementation: public boolean equals(Point2 p) { // overriding definition if (p instanceof Point3) return equals((Point3)p); return super.equals(); } ... Both comparisons are mixed-type comparisons, involving a superclass and a subclass object. The functionality of mixed-type comparison in this sample implementaion is comparison of the part that both objects have in common, namely the superclass part. We will refer to this kind of comparison as slice comparison in the remainder of the article. Eventually method Point3.equals(Point3) is called to compare p1 and p2 . Following the rule of transitivity their comparison by means of equals() must yield true. Quite obviously p1 and p2 are not equal and indeed p1.equals(p2) yields false. So, what's going on here? The transitivity is violated because Point3.equals(Point3) performs a semantically different comparison. It does not compare superclass slices, in which case p1 and p2 would compare equal, because they both contain (0,0) . Instead it compares the entire subclass objects including their subclass-specific parts, and p1 and p2 happen to have a different z coordinate. For this reason the result is false.
The point is that
equals()
methods that perform semantically
different comparisons in a class hierarchy and allow mixed-type comparison
can never by transitive. The problem is not an implementation problem;
it is a conceptual problem. Method
equals()
is required to be transitive,
because of the
equals()
contract. At the same time it is required
to perform a comparison for equality of objects; that's the natural and
intended semantics of
equals()
. Objects of different types in
a class hierarchy of value types can have a different structure: typically
subclass objects have more fields than superclass objects. In this case
the subclass must provide an overriding version of the
equals()
method that takes the subclass-specific fields into account. Inevitably
the subclass version of
equals()
performs a comparison that is
semantically different from its superclass version. If
equals()
permits mixed-type comparison in such a situation, then it can never be
transitive. Either the
equals()
method is non-transitive (and
incorrect) (like in the example in
Listing 1
),
or it does not allow slice comparison of superclass and subclass objects
(as will be demonstrated in
Listing 4
), or the
equals()
method is final in the superclass and not overridden in any of the subclasses
(similar to
Listing 3
).
Listing 2 : A Common, Yet Debatable Implementation The next example in Listing 2 is taken from the Java platform libraries. It is the example of a non-final class that represents a value type, namely class Date . There is no problem with class Date itself or its implementation of the equals() method. Note, however, that neither class Date not its equals() method are final. Someone can derive a subclass of Date and override equals() . This subclass in conjunction with superclass Date will ask for trouble. Here is a conceivable implementation of a subclass and its equals() method: public boolean equals(Object other) { if (other instanceof NamedDate && !name.equals(((NamedDate)other).name)) return false; return super.equals(other)); } public boolean equals(Object obj) { return obj instanceof Date && getTime() == ((Date) obj).getTime(); } NamedDate TheEnd = new NamedDate(99,11,31,"end of the world"); Date NewYearsEve = new Date(99,11,31);
EndOfMillenium.equals(NewYearsEve)
// slice comparison: true
Listing 3 : A Correct Implementation
Listing 3 shows the example of a final class
PhoneNumber
. Its
implementation of
equals()
is similar to the solution in
Listing
2
in that is uses the
instanceof
test. Since the class is
final no subclasses can ever exists and the transitivity problem discussed
above can never come up. This solution is flawless, but restricted to final
classes.
Listing 4 : Another Correct Implementation Listing 4 shows yet another approach. Here we have the example of a superclass Golfball and its subclass MyGolfball , which overrides equals() because it has a subclass-specific field. The superclass performs the check for type match not using the instanceof operator as in the previous examples, but using the getClass() method instead. Here are the implementations of the two equals() methods: public boolean equals(object obj) { if (this == obj) return true; if (obj!=null && getClass() == obj.getClass()) { Golfball gb = (Golfball)obj; // Classes are equal, downcast. if (brand.equals(gb.brand()) && // Compare atrributes. make.equals(gb.make()) && compression == gb.compression()) return true; } return false; } class MyGolfball extends Golfball { public boolean equals(Object obj) { if (super.equals(obj)) { MyGolfball bg = (MyGolfball)obj; // Classes equal, downcast. if (ballConstruction == gb.construction()) return true; } return false; } MyGolfball gb1 = new MyGolfball("xyz", "abc", 100, MyGolfball.TwoPiece); MyGolfball gb2 = new MyGolfball("xyz", "abc", 100, MyGolfball.ThreePiece);
gb1.equals(original) // mixed-type
comparison: yields false
Revisiting Listing 2 Can we use this approach (using getClass() instead of instanceof ) to solve the problem with the subclass of class Date ? Unfortunately not. The superclass Date permits mixed-type comparison. For sake of symmetry any subclass must also allow mixed-type comparison. Here is what an incorrect, asymmetric implementation of NamedDate.equals() using getClass() could look like: public boolean equals(Object obj) { return obj instanceof Date && getTime() == ((Date) obj).getTime(); } public class NamedDate extends Date { public boolean equals(Object other) { if (other!=null && getClass() == other.getClass()) { // getClass() instead of instanceof if (!super.equals(other)) return false; return name.equals(((NamedDate)other).name)); } } Date NewYearsEve = new Date(99,11,31); EndOfMillenium.equals(NewYearsEve) // slice comparison: false NewYearsEve.equals(EndOfMillenium) // slice comparison: true Conclusions Having dissected the four arbitrarily chosen examples of implementations of equals() , what do we conclude? First of all: there are two substantially different ways of performing the check for type match in an implementation of equals() . A class can allow mixed-type comparison between super- and subclass objects by means of the instanceof operator, or a class can treat objects of different type as non-equal by means of the getClass() test. The examples above illustrated nicely that implementations of equals() using getClass() are generally more robust than those implementations using instanceof . The instanceof test is correct only for final classes or if at least method equals() is final in a superclass. The latter essentially implies that no subclass must extend the superclass's state, but can only add functionality or fields that are irrelevant for the object's state and behavior, such as transient or static fields.
Implementations using the
getClass()
test on the other hand
always comply to the
equals()
contract; they are correct and robust.
They are, however, semantically very different from implementations that
use the
instanceof
test. Implementations using
getClass()
do not allow comparison of sub- with superclass objects, not even when
the subclass does not add any fields and would not even want to override
equals()
. Such a "trivial" class extension would for instance be the addition of
a debug-print method in a subclass defined for exactly this "trivial" purpose.
If the superclass prohibits mixed-type comparison via the
getClass()
check, then the trivial extension would not be comparable to its superclass.
Whether or not this is a problem fully depends on the semantics of the
class and the purpose of the extension.
Guidelines For Practitioners Ultimately, what do we do when we design a class and/or use an existing class as a superclass? Considering the different approaches and their respective up- and downsides it should by now be clear that none of the solutions is the silver bullet. There is not easy recipe for doing this. A designer of a new class must carefully evaluate the intended and required semantics of the class in order to come to a decision regarding the right implementation of equals() (and other pieces of basic infrastructure not discussed in this article). Here are some guidelines for class designers:
If the designer of such a non-final class decides in favor of implementing equals() using instanceof , then no subclass can ever add fields and override equals() without violating the transitivity requirement of the equals() contract. If the designer decides in favor of implementing equals() using getClass() , then no subclass object will ever be comparable to a superclass object and trivial extensions may not make a lot of sense.
The superclass designer should actually make sure that you can't step into this pitfall by qualifying the equals() method as a final method in the superclass.
Instead of worrying about how to correctly design a superclass or how to implement a subclass of a superclass that does this or that - why don't we simply avoid inheritance in the first place? One way of making life easy is to habitually declare every new class a final class, unless you are sure that it is supposed to be a superclass. In case of defining a superclass the effort of providing a robust and correct implementation can be substantially higher. More often than not you'll find that you do not really want to or need to go all the trouble of defining a superclass. Keep it simple; try out a final class before you consider implementing a non-final class. Another way of making life easy is to avoid subclassing. In many situations inheritance can be replaced by delegation. Instead of implementing a new class as a subclass of an existing class, the new class can be implemented as an unrelated class that holds a reference to an object of the existing class type and implements the same methods as the existing class by delegation to the existing class's method. As an example let us re-implement class NamedDate : private Date date; // Date as a field instead of a superclass ... public boolean equals(Object other) { if (other!=null && getClass()==other.getClass()) { if(date.equals(((NamedDate)other).date) && name.equals(((NamedDate)other).name)) return true; } return false; } ... public boolean before(Date when) { return date.before(when); } public boolean after(Date when) { return date.after(when); } ... Note, how similar it is to the getClass() approach to equals() : in this case NamedDate would not be considered as compatible with Date and nobody would expect that objects of type Date and type NamedDate would be comparable for equality. If such a comparison were needed it would be provided in terms of methods such as NamedDate.isSameDateAs(Date) or Date.isSameDateAs(NamedDate) , both of which would be final methods that perform a slice comparison. But they would have nothing to do with equals() .
Useful as delegation is as an alternative to subclassing, there are
situations in which delegation cannot replace inheritance. An example are
cases in which protected parts of the superclass must be accessed by the
subclass. An example are implementations of the Template Method pattern
as described in the GOF book (see /
GOF
/ for reference).
In these cases inheritance cannot be avoided and the user must live with
the resulting restrictions. Another example are frameworks in which subclassing
cannot be avoided because the superclass serves as a marker interface at
the same time. An example is the class hierarchy of standard exceptions,
most notably class
Throwable
, class
Error
, class
Exception
,
and class
RuntimeException
. If you need to define a user-defined
domain-specific exception type, then you must derive of one of these classes
or any subclass thereof. Delegation is not an option here; a class that
contained an
IndexOutOfBoundsException
, instead of being derived
of it, would not be an exception type. The morale of the story is: avoid
subclassing if you can and try to live with it if you can't.
Summary In this article we poked around in the implementations of equals() , studying various published implementations for illustration of the non-trivial nature of a method that compares objects for equality of content. One case turned out to be problematic; it is the case of hierarchies of value types in which equals() is overridden on various levels of the hierarchy with naturally different semantics. In this case any mixed-type comparison between sub- and superclass objects fails to be transitive. But transitivity is one of the fundamental pieces of the equals() contract. The implementation detail of relevance in this article is the check for type match that is performed in an implementation of equals() . There are two techniques: using the instanceof operator, which allows mixed-type comparison of subclass and superclass objects. The other choice is using getClass() to make sure that mixed-type comparisons always fail. Implementations of equals() using getClass() are more robust, because they are always correct. Implementations of equals() using instanceof should be final methods; in practice they rarely are declared final, which opens up a notorious pitfall. This was a snippet of what can and must be said about equals() and other basic infrastructure of types in Java. We haven't even touched on related issues such as:
We haven't discussed the signature of equals() in subclasses. Should there be just one overriding version with the signature of Object.equals() ? Or should there be overloaded versions, as suggested in Listing1? What are the tradeoffs? The ideas in this article were in part inspired by a private email discussion led with Joshua Bloch in summer 2001; we thank Josh for his patience and thoughtful response. Another source of invaluable insight was Mark Weiss, a fellow instructor at DevelopMentor; he sat through our Java course in February 2001 and spotted the non-transitivity of our first naive attempts to get equals() right. Follow-UP Article In a second article (published later in the same magazine, see / KRE /) we explain an algorithm for implementation of equals() that allows mixed-type comparison and is correct. Readers' Comments
A reader sent the following alternative implementation of
equals()
:
References
|
|||||||||||||||||||||||||||
© Copyright 1995-2007 by Angelika Langer. All Rights Reserved. URL: < http://www.AngelikaLanger.com/Articles/JavaSolutions/SecretsOfEquals/Equals.html> last update: 10 Aug 2007 |