A set is a collection of objects in which no object occurs more than once. Sets implement all the methods in the interface *Collection<T>*, but do so in a way that ensures that no element occurs twice in the set. For example, if *set* is an object of type *Set<T>*, then *set.add(obj) *will have no effect on the set if *obj* is already an element of the set. Java has two classes that implement the interface *Set<T>: java.util.TreeSet *and* java.util.HashSet*.

In addition to being a Set, a *TreeSet* has the property that the elements of the set are arranged into ascending sorted order. An Iterator (or a for-each loop) for a *TreeSet* will always visit the elements of the set in ascending order.

A *TreeSet* cannot hold arbitrary objects, since there must be a way to determine the sorted order of the objects it contains. Ordinarily, this means that the objects in a set of type *TreeSet<T>* should implement the interface *Comparable<T> *and that *obj1.compareTo(obj2)* should be defined in a reasonable way for any two objects *obj1* and *obj2* in the set. Alternatively, an object of type *Comparator<T>* can be provided as a parameter to the constructor when the *TreeSet* is created. In that case, the *compareTo() *method of the *Comparator* will be used to compare objects that are added to the set.

A *TreeSet* does not use the *equals()* method to test whether two objects are the same. Instead, it uses the *compareTo()* method. This can be a problem as *compareTo()* can consider two objects to be the same for the purpose of the comparison even though the objects are not equal. For a *TreeSet,* this means that only **one** of those objects can be in the set. For example, if the *TreeSet* contains mailing addresses and if the *compareTo()* method for addresses just compares their post codes, then the set can contain only one address in each post code. You have to be aware of the semantics of *TreeSets*, and you need to make sure that *compareTo()* is defined in a reasonable way for objects that you put into a *TreeSet*. This will also be true for *Strings, Integers* and many other built-in types, since the *compareTo()* method for these types considers two objects to be the same only if they are actually equal.

In the implementation of a *TreeSet*, the elements are stored in something similar to a **binary sort tree**. However, the data structure that is used is **balanced** in the sense that all the leaves of the tree are at about the same distance from the root of the tree. This ensures that all the basic operations — inserting, deleting, and searching – – are efficient.

The fact that a *TreeSet* sorts its elements and removes duplicates makes it very useful in some applications, eg: if you wanted to write a program that would read a file and output an alphabetical list of all the words that occurred in the file, with duplicates removed. The words could be stored in an ArrayList, so it would be up to you to make sure that the list was sorted and contained no duplicates. This can be can be programmed much more easily using a *TreeSet* instead of a list.

A *TreeSet *automatically eliminates duplicates, and an iterator for the set will automatically visit the items in the set in sorted order. An algorithm for the program, using a *TreeSet*, would be:

TreeSet words = new TreeSet(); while there is more data in the input file: Let word = the next word from the file Convert word to lower case words.add(word) // Adds the word only if not already present. for ( String w : words ) // for each String w in words Output w

As another example, suppose that *coll* is any Collection of Strings. (This would also work for any other type for which *compareTo()* is properly defined.) We can use a *TreeSet* to sort the items of *coll *and remove the duplicates simply by saying:

TreeSet set = new TreeSet(); set.addAll(coll);

The second statement adds all the elements of the collection to the set. Since it’s a *Set*, duplicates are ignored. Since it’s a *TreeSet*, the elements of the set are sorted. If you would like to have the data in some other type of data structure, it’s easy to copy the data from the set. For example, to place the answer in an *ArrayList*, you could say:

TreeSet set = new TreeSet(); set.addAll(coll); ArrayList list = new ArrayList(); list.addAll(set);

Now, in fact, every one of Java’s collection classes has a constructor that takes a Collection as an argument. All the items in that Collection are added to the new collection when it is created. So, if *coll* is of type Collection, then *new TreeSet(coll)* creates a *TreeSet* that contains the same elements as *coll*, but with duplicates removed and in sorted order. This means that we can abbreviate the four lines in the above example to the single command:

ArrayList list = new ArrayList( new TreeSet(coll) );

This makes a sorted list of the elements of coll with no duplicates.

A *HashSet* stores its elements in a **hash table**, a type of data structure that we will discuss shortly. The operations of finding, adding, and removing elements are implemented very efficiently in hash tables, even more so than for *TreeSets*. The elements of a *HashSet* are not stored in any particular order, and so do not need to implement the *Comparable* interface. (They do, however, need to define a proper “hash code,” as we’ll see shortly.)

The *equals()* method is used to determine whether two objects in a *HashSet* are to be considered the same. An *Iterator* for a *HashSet* will visit its elements in what seems to be a completely arbitrary order, and it’s possible for the order to change completely when a new element is added. Use a *HashSet* instead of a *TreeSet *when the elements it contains are not comparable, or when the order is not important, or when the small advantage in efficiency is important.

In mathematical set theory, the items in a set are called **members **or **elements** of that set. Important operations include adding an element to a set, removing an element from a set, and testing whether a given entity is an element of a set. Operations that can be performed on two sets include union, intersection, and set difference. All these operations are defined in Java for objects of type *Set*, but with different names. Suppose that A and B are *Sets*. Then:

*A.add(x)***adds**the element x to the set A.*A.remove(x)***removes**the element x from the set A.*A.contains(x)***tests**whether x is an element of the set A.*A.addAll(B)*computes the**union**of A and B.*A.retainAll(B)*computes the**intersection**of A and B.*A.removeAll(B)*computes the**set difference**, A – B.

There are of course, differences between mathematical sets and sets in Java. Most important, perhaps, sets in Java must be **finite**, while in mathematics, most of the fun in set theory comes from working with infinity. In mathematics, a set can contain arbitrary elements, while in Java, a set of type *Set<T>* can only contain elements of type *T*. The operation *A.addAll(B)* acts by modifying the value of A, while in mathematics the operation A union B computes a new set, without changing the value of A or B.

**Next: EnumSet**