Lists and Sets

There are two types of collections: lists and sets:

  • A list is a collection in which the objects are arranged in a linear sequence, ie: there is a first item, a second item, and so on. Each item in the list, other than the last item, has an item that directly follows it.
  • The defining characteristic of a set is that no object can occur in it more than once. The elements of a set are not necessarily in any particular order.

Lists and sets can be represented as parameterized interfaces List<T> and Set<T>. These are sub-interfaces of Collection<T>, so any object that implements List<T> or Set<T> automatically implements Collection<T> as well. Collection<T> specifies general operations that can be applied to any collection while List<T> and Set<T> specify the additional operations that are applicable to lists and sets respectively.

Any actual object that is a collection, list or set must belong to a concrete class that implements the corresponding interface, eg: the class ArrayList<T> implements the interface List<T> and therefore also implements Collection<T>. This means that all the methods that are defined in the list and collection interfaces can be used with, for example, an ArrayList<String> object. We will look at various classes that implement the list and set interfaces shortly, but before we do that, we’ll look at some of the general operations that are available for all collections.

The interface Collection<T> specifies methods for performing some basic operations on any collection of objects. Operations that can be applied to all collections are very general. They are generic operations, meaning that they can be applied to various types of collections containing various types of objects. If coll is an object that implements the interface Collection<T> then the following operations are defined for coll:

coll.size() – returns an int giving the number of objects in the collection.

coll.isEmpty() – returns a boolean value which is true if the size of the collection is 0.

coll.clear() – removes all objects from the collection.

coll.add(tobject) – adds tobject to the collection. The parameter must be of type T, otherwise a syntax error occurs at compile time. This method returns a boolean value which indicates whether the operation actually modified the collection, eg: adding an object to a Set has no effect if that object was already in the set.

coll.contains(object) – returns a boolean value that is true if object is in the collection. Note that object does not need to be of type T, since it can be useful to check whether object is in the collection, irrespective of the type of object. For testing equality, null is considered to be equal to itself. The criteria for testing non-null objects for equality can differ between collections.

coll.remove(object) – removes object from the collection (if present) and returns a boolean value indicating whether or not the object was found. Object does not need to be of type T.

coll.containsAll(coll2) – returns a boolean value that is true if every object in coll2 is also in coll. The parameter can be any collection.

coll.addAll(coll2) – adds all the objects in coll2 to coll. coll2 can be any collection of type Collection<T>, but it can also be more general, eg: if T is a class and S is a sub-class of T, then coll2 can be of type Collection<S>. Any object of type S is automatically of type T and so can legally be added to coll.

coll.removeAll(coll2) – removes every object from coll that also occurs in the collection coll2.

coll.retainAll(coll2) – removes every object from coll that does not occur in coll2.

coll.toArray() – returns an array of type Object[] that contains all the items in the collection. Note that the return type is Object[], rather than  T[]. There is another version of this method that takes an array of type T[] as a parameter: the method coll.toArray(tarray) returns an array of type T[] containing all the items in the collection, eg: coll.toArray(new String[0]) can be used if coll is a collection of Strings and will return a new array of type String[].

These methods are part of the Collection<T> interface, so they must be defined for every object that implements that interface. This can potentially cause problems. For example, the size of some collections cannot be changed after they are created. Methods that add or remove objects cannot be used with these collections. While it is still legal to call the methods, an exception will be thrown when the call is evaluated at run time. The type of the exception is UnsupportedOperationException.

Since Collection<T> is an interface, rather than a concrete class, the actual implementation of the method is left to the classes that implement the interface. This means that the semantics of the methods are not guaranteed to be valid for all collection objects. However, they are valid for classes in the Java Collection Framework.

Note that even when an operation is defined for several types of collections, it might not be equally efficient in all cases. A method as simple as size() can vary greatly in efficiency. For some collections, computing the size() might involve counting the items in the collection. The number of steps in this process is equal to the number of items.

Other collections might use instance variables to keep track of the size, so evaluating size() only involves returning the value of a variable, so the calculation only takes one step, irrespective of the number of items. When working with collections it is useful to have an idea of how efficient operations are and to choose a collection where the operations that you need can be implemented most efficiently.

Next: ArrayList and Linked List