set and multiset Operations

Apache C++ Standard Library User's Guide

8.2 set and multiset Operations

The member functions provided by the set and multiset datatypes are described in the following sections. Note that while member functions provide basic operations, the utility of these data structures is greatly extended through the use of the generic algorithms described in Part IV.

8.2.1 Declaration and Initialization of set

A set class template is specialized on the type of the elements it contains and the operator used to compare keys. The latter argument is optional and, if not provided, the less-than operator for the key type is assumed. The element type can be a primitive language type, such as int or double; a pointer type; or a user-defined type. The element type must be comparable with both the equality testing operator==() and the less-than comparison operator<().

A set can be declared with no initial elements, or initialized from another container by providing a pair of iterators.

Whether a set is declared with no initial elements or initialized from another container, an optional argument is an alternative comparison function; this value overrides the value provided by the template parameter. The comparison object must be of the type given in the template parameter. If the comparison object needs non-default initialization, the object must be initialized and provided to the constructor for the container.

Note that a specialization is created for each combination of template type parameters. More specializations require more space. If saving space is important and two closely-related comparisons are needed for the same type of container, design comparison objects which change behavior when they are initialized (for example, an object which can be initialized to sort in either ascending or descending order). One template specialization can avoided for each additional behavior of the comparison object. However, performance will likely suffer due to increased overhead for the comparison.

The copy constructor can be used to form a new set that is a clone, or copy, of an existing set.

std::set<int> set_one;

std::set<int, std::greater<int> > set_two;
std::set<int, std::greater<int> > set_three(std::greater<int>());

std::set<gadget, std::less<gadget> > gset;
std::set<gadget> gset(std::less<gadget>());

std::set<int> set_four (aList.begin(), aList.end());
std::set<int, std::greater<int> > set_five 
   (aList.begin(), aList.end(), std::greater<int>());

std::set<int> set_six (set_four);              // copy constructor

A set can be assigned to another set, and two sets can exchange their values using the swap() operation in a manner analogous to other C++ Standard Library containers.

set_one = set_five;
set_six.swap(set_two);

8.2.2 Type Definitions

The classes set and multiset include a number of type definitions, most commonly used in declaration statements. For example, an iterator for a set specialized on int can be declared in the following fashion:

std::set<int>::iterator location;

In addition to iterator, set and multiset define the following types:

Table 13: Type definitions for the set and multiset class templates

Type	Definition
`value_type`	The type of the elements maintained by the set. Must satisfy the Assignable requirement.
`key_type`	The type of the elements maintained by the set. Must satisfy the Assignable requirement.
`key_compare`	A binary predicate (see Section 3.3) used to compare two values of `value_type`.
`const_iterator`	An iterator that does not allow modification of the underlying sequence.
`reverse_iterator`	An iterator that moves in a backward direction.
`const_reverse_iterator`	A reverse iterator that does not allow modification of the underlying sequence.
`reference`	A reference to an underlying element.
`const_reference`	A reference to an underlying element that will not permit modification.
`pointer`	A pointer to an underlying element
`const_pointer`	A constant pointer to an underlying element.
`size_type`	An unsigned integer type, used to refer to the size of containers.
`key_compare`	A binary predicate (see Section 3.3) used to compare two values of `key_type`.
`value_compare`	A binary predicate (see Section 3.3) used to compare two values of `value_type`.
`difference_type`	A signed integer type, used to describe the distance between iterators.
`allocator_type`	An allocator used by the container or all storage management.

8.2.3 Insertion

Unlike a list or vector, there is only one way to add a new element to a set. A value must be inserted into a set or a multiset using the insert() member function. With a multiset, the function returns an iterator that denotes the value just inserted. Insert operations into a set return a pair of values, in which the first field contains an iterator, and the second field contains a boolean value that is true if the element was inserted, and false otherwise.

The pair data structure is a tuple of values. The first value is accessed through the field name first, while the second is, naturally, named second. A function named std::make_pair() simplifies the task of producing an instance of class pair.

namespace std {

  template <class T1, class T2>
  struct pair {
      T1 first;
      T2 second;
      pair(const T1& x, const T2& y) : first(x), second(y) { }
  };


  template <class T1, class T2>
  inline pair<T1, T2> make_pair(const T1& x, const T2& y)
     { return pair<T1, T2>(x, y); }

}

NOTE -- If you want to use the pair datatype without using sets, you should include the <utility> header file.

Recall that in a set, an element will not be inserted if it matches an element already contained in the collection.

set_one.insert(18);

if (set_one.insert(18).second)
   std::cout << "element was inserted" << std::endl;
else
   std::cout << "element was not inserted" << std::endl;

To determine whether two keys are equivalent, the comparison function for the set is used. Remember that the comparison function does not test equality, but instead returns a Boolean value indicating whether or not the keys are in order. The strategy used to determine equivalence is simple. Two keys are equivalent if the comparison function for the keys is false in both directions. For example, if (key1 < key2) is false, and (key2 < key1) is also false, then key1 and key2 are equivalent.

Insertions of several elements from another container can also be performed using two iterators which indicate a range:

set_one.insert(set_three.begin(), set_three.end());

No value is returned. If there are duplicate values in the range, the duplicates are ignored.

8.2.4 Removal of Elements from a set

Values are removed from a set using the member function erase(). The argument can be either a specific value, an iterator that points to a single value, or a pair of iterators that indicate a range of values. When the first form is used on a multiset, all arguments matching the argument value are removed, and the return value indicates the number of elements that have been erased:

// erase element equal to 4
set_three.erase(4);

// find and then erase element equal to 5
std::set<int>::iterator five = set_three.find(5);
set_three.erase(five);

// erase all values between seven and eleven
std::set<int>::iterator seven  = set_three.find(7);
std::set<int>::iterator eleven = set_three.find(11);
set_three.erase (seven, eleven);

If the underlying element type provides a destructor, then the destructor is invoked prior to removing the element from the collection.

8.2.5 Searching and Counting

The member function size() yields the number of elements held by a container. The member function empty() returns a Boolean true value if the container is empty, and is generally faster than testing the size against zero.

The member function find() takes an element value, and returns an iterator denoting the location of the value in the set if it is present, or a value matching the end-of-set value yielded by the function end() if it is not. If a multiset contains more than one matching element, the value returned can be any one of the possible matching elements.

std::set<int>::iterator five = set_three.find(5);
if (five != set_three.end())
    std::cout << "set contains a five" << std::endl;

The member functions lower_bound() and upper_bound() are most useful with multisets, since with sets they simply mimic the function find(). The member function lower_bound() yields the first entry that matches the argument key, while the member function upper_bound() returns the first value past the last entry matching the argument. Finally, the member function equal_range() returns a pair of iterators holding the lower and upper bounds.

The member function count() returns the number of elements that match the argument. For a set, this value is either zero or one, whereas for a multiset it can be any nonnegative value. Since a non-zero integer value is treated as true, the count() function can be used to test for inclusion of an element, if all that is desired is to determine whether or not the element is present in the set. The alternative, using find(), requires testing the result returned by find() against the end-of-collection iterator.

if (set_three.count(5))
  std::cout << "set contains a five" << std::endl;

8.2.6 Iterators

The member functions begin() and end() produce iterators for both sets and multisets. The iterators produced by these functions are constant to ensure that the ordering relation for the set is not inadvertently or intentionally destroyed by assigning a new value to a set element. The iterators generate elements in sequence, ordered by the comparison operator provided when the set was declared. The member functions rbegin() and rend() produce iterators that yield the elements in reverse order.

NOTE -- Unlike a vector or deque, the insertion or removal of values from a set does not invalidate iterators or references to other elements in the collection.

8.2.7 set Operations

The traditional set operations for determining if one set is a subset of another, creating a union of two sets, finding the intersection between two sets, and finding the difference between two sets are not provided as member functions. Instead, these operations are implemented as generic algorithms that work with any ordered structure. These functions are described in more detail in Section 14.6. The following sections describe how these functions can be used with the set and multiset container classes.

8.2.7.1 Subset test

The function std::includes() can be used to determine if one set is a subset of another. The function returns true if and only if each distinct element in the second sorted range has a corresponding distinct element in the first sorted range to which it compares equal. The four arguments are a pair of iterators representing the (presumably) larger set, and a pair of iterators representing the (potentially) smaller set:

if (std::includes(set_one.begin(), set_one.end(),
                  set_two.begin(), set_two.end()))
  std::cout << "set_two is a subset of set_one" << std::endl;

The less-than operator, operator<(), is used for the comparison of elements, regardless of the comparison function used in the declaration of the container. Where this is inappropriate, an alternative version of the std::includes() function is provided. This form takes a fifth argument, which is the comparison function used to order the elements in the two sets.

8.2.7.2 Set Union or Intersection

The function std::set_union() can be used to construct a union of two sets. The two sets are specified by iterator pairs, and the union is copied into an output iterator that is supplied as a fifth argument. To form the result as a set, an insert iterator must be used to form the output iterator (Section 2.4). If the desired outcome is a union of one set with another, then a temporary set can be constructed, and the results swapped with the argument set prior to deletion of the temporary set:

// union two sets, copying result into a vector
std::vector<int> v_one(set_one.size() + set_two.size());

std::set_union(set_one.begin(), set_one.end(), 
               set_two.begin(), set_two.end(), v_one.begin());

// form union in place
std::set<int> temp_set;
std::set_union(set_one.begin(), set_one.end(), 
               set_two.begin(), set_two.end(), 
               std::inserter(temp_set, temp_set.begin()));
set_one.swap(temp_set);          // temp_set will be deleted

The function std::set_intersection() is similar, and forms the intersection of the two sets or two multisets.

As with the std::includes() function, the less-than operator<() is used to compare elements in the two argument sets, regardless of the operator provided in the declaration of the sets. Should this be inappropriate, alternative versions of both the std::set_union() or std::set_intersection() functions permit the comparison operator used to form the set to be given as a sixth argument.

The operation of taking the union of two multisets should be distinguished from the operation of merging two sets. Imagine that one argument set contains three instances of the element 7, and the second set contains two instances of the same value. The union will contain only three such values, while the merge will contain five. To form the merge, the function std::merge() can be used (Section 14.5). The arguments to this function exactly match those of the std::set_union() function.

8.2.7.3 Set Difference

There are two forms of set difference. A simple set difference represents the elements in the first set that are not contained in the second. A symmetric set difference is the union of the elements in the first set that are not contained in the second, with the elements in the second that are not contained in the first. These two values are constructed by the functions std::set_difference() and std::set_symmetric_difference(), respectively. The use of these functions is similar to the use of the std::set_union() function described earlier.

8.2.8 Other Generic Algorithms

Because sets are ordered and have constant iterators, a number of the generic functions described in Chapter 13 and Chapter 14 are either not applicable to sets or not particularly useful. However, Table 14 gives a few of the functions that can be used in conjunction with the set datatype.

Table 14: Generic algorithms useful for the set datatype

Purpose	Name	Where to find
Copy one sequence into another	`copy()`	Section 13.2.2
Find an element that matches a condition	`find_if()`	Section 13.3.1
Find a sub-sequence within a set	`search()`	Section 13.3.3
Count number of elements that satisfy condition	`count_if()`	Section 13.6.1
Reduce set to a single value	`accumulate()`	Section 13.6.2
Execute function on each element	`for_each()`	Section 13.8