Apache C++ Standard Library User's Guide

12.2 string Operations

In the following sections, we'll examine the C++ Standard Library operations used to create and manipulate strings.

12.2.1 Declaration and Initialization of string

The simplest form of declaration for a string names a new variable, optionally together with an initial value for the string. This form was used extensively in the example graph program given in Section 9.3.2. A copy constructor also permits a string to be declared that takes its value from a previously defined string:

std::string s1;
std::string s2("a string");
std::string s3 = "initial value";
std::string s4(s3);

In these simple cases the initial capacity is at least the number of characters being stored; the exact value depends on the implementation. An alternative constructor initializes the string with a given number of copies of a single character value:

std::string s7(10, '\n');          // holds ten newline characters

Finally, like all the container classes in the C++ Standard Library, a string can be initialized using a pair of iterators. The sequence denoted by the iterators must have the appropriate type of elements.

std::string s8(aList.begin(), aList.end());
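
The constructors described above can be tried out in a short sketch. The helper names repeated and from_list are hypothetical, introduced only for illustration; they are not part of the library.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical helper: build a string holding n copies of the character c.
std::string repeated(std::string::size_type n, char c)
{
    return std::string(n, c);
}

// Hypothetical helper: build a string from a list of characters
// using the iterator-pair constructor.
std::string from_list(const std::list<char>& chars)
{
    std::string s(chars.begin(), chars.end());
    assert(s.size() == chars.size());   // one string element per list element
    return s;
}
```

Any pair of input iterators whose elements convert to char should work in place of the list iterators used here.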

12.2.2 Resetting Size and Capacity

As with the vector datatype, the member function size() yields the current size of a string, and the current capacity is returned by capacity(). The latter can be changed by a call on the reserve() member function, which adjusts the capacity if necessary so that the string can hold at least as many elements as specified by the argument. The member function max_size() returns the maximum string size that can be allocated:

std::cout << s6.size() << std::endl;
std::cout << s6.capacity() << std::endl;
s6.reserve(200);                    // capacity is now at least 200
std::cout << s6.capacity() << std::endl;
std::cout << s6.max_size() << std::endl;

The member function length() is simply a synonym for size(). The member function resize() changes the size of a string, either truncating characters from the end or appending new characters. The optional second argument for resize() specifies the character used to fill the newly created positions; if it is omitted, null characters are used.

s7.resize(15, '\t');                   // add tab characters at end
std::cout << s7.length() << std::endl; // size should now be 15

The member function empty() returns true if the string contains no characters, and states the intent more directly than testing the length against zero.

if (s1.empty()) 
   std::cout << "string is empty" << std::endl;
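
As a minimal sketch of how resize() and reserve() differ, the following two helpers change the size and the capacity of a string respectively. The names pad_right and repeat_piece are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: pad s on the right with fill until it is at least
// width characters long; longer strings are returned unchanged.
std::string pad_right(std::string s, std::string::size_type width, char fill)
{
    if (s.size() < width)
        s.resize(width, fill);          // grows the string, appending fill
    return s;
}

// Hypothetical helper: reserve room up front, then append n copies of piece.
std::string repeat_piece(const std::string& piece, unsigned n)
{
    std::string result;
    result.reserve(piece.size() * n);   // affects capacity(), not size()
    assert(result.empty());
    assert(result.capacity() >= piece.size() * n);
    for (unsigned i = 0; i < n; ++i)
        result += piece;
    return result;
}
```

Reserving in advance is optional; it only avoids repeated reallocation while the string grows.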

12.2.3 Assignment, Append, and Swap

A string variable can be assigned the value of either another string, a literal C-style character array, or an individual character:

s1 = s2;
s2 = "a new value";
s3 = 'x';

The operator += can also be used with any of these three forms of argument, and specifies that the value on the right-hand side should be appended to the end of the current string value.

s3 += "yz";                   // s3 is now xyz

The more general assign() and append() member functions let you specify a subset of the right-hand side to be assigned to or appended to the receiver. Two arguments, pos and n, indicate that the n characters starting at position pos should be assigned or appended:

s4.assign(s2, 0, 3);         // assign first three characters
s4.append(s5, 2, 3);         // append characters 2, 3 and 4

The addition operator+() is used to form the concatenation of two strings. The operator+() creates a copy of the left argument, then appends the right argument to this value:

std::cout << (s2 + s3) << std::endl;  // output concatenation of
                                      // s2 and s3

As with all the containers in the C++ Standard Library, the contents of two strings can be exchanged using the swap() member function:

s5.swap(s4);                 // exchange s4 and s5
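
The substring forms of assign() and append(), together with swap(), might be combined as in the following sketch. The helpers splice and order_pair are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: join the first n characters of a with the
// n characters of b that start at position pos.
std::string splice(const std::string& a, const std::string& b,
                   std::string::size_type pos, std::string::size_type n)
{
    std::string result;
    result.assign(a, 0, n);      // at most n characters from the front of a
    result.append(b, pos, n);    // throws std::out_of_range if pos > b.size()
    assert(result.size() <= 2 * n);
    return result;
}

// Hypothetical helper: exchange a and b if they are out of order.
void order_pair(std::string& a, std::string& b)
{
    if (b < a)
        a.swap(b);               // exchanges the contents of a and b
}
```

If fewer than n characters are available, both member functions simply use as many as there are.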

12.2.4 Character Access

An individual character from a string can be accessed or assigned using operator[], which performs no range checking. The member function at() is almost a synonym for this operation, except that an out_of_range exception is thrown if the requested location is greater than or equal to size().

std::cout << s4[2] << std::endl;        // output position 2 of s4
s4[2] = 'x';                            // change position 2
std::cout << s4.at(2) << std::endl;     // output updated value

The member function c_str() returns a pointer to a null-terminated character array whose elements are the same as those contained in the string. This lets you use strings with functions that require a pointer to a conventional C-style character array. The resulting pointer is declared as constant, which means that you cannot use c_str() to modify the string. In addition, the pointer returned by c_str() might not be valid after any subsequent call to a non-const member function of the string, such as append() or insert(). The member function data() returns a pointer to the underlying character buffer. Under the original C++ standard (C++98) this buffer is not required to be null-terminated, so portable code should not pass it to functions that expect null-terminated sequences; since C++11, data() returns the same null-terminated array as c_str().

char d[256];
if (s4.size() < sizeof d)                      // leave room for the null
    std::strcpy(d, s4.c_str());                // copy s4 into array d
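
The difference between checked access with at() and the C-style view provided by c_str() can be sketched as follows. The helpers char_or and c_length are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical helper: return the character at pos, or fallback when
// pos is not a valid index.
char char_or(const std::string& s, std::string::size_type pos, char fallback)
{
    try {
        return s.at(pos);                  // range-checked access
    } catch (const std::out_of_range&) {
        return fallback;
    }
}

// Hypothetical helper: the length a C function sees through c_str().
std::size_t c_length(const std::string& s)
{
    std::size_t n = std::strlen(s.c_str());   // stops at the first null
    assert(n <= s.size());
    return n;
}
```

Because a string may contain embedded null characters, c_length() can be smaller than size().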

12.2.5 Iterators

The member functions begin() and end() return beginning and ending random-access iterators for the string. The values denoted by the iterators are individual string elements. The functions rbegin() and rend() return reverse iterators, which traverse the string from the last character to the first.
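
A minimal sketch of both kinds of iterator follows. The helper names count_char and reversed are hypothetical and serve only as illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: count how often c occurs in s by walking
// the range [begin(), end()).
std::string::size_type count_char(const std::string& s, char c)
{
    std::string::size_type n = 0;
    for (std::string::const_iterator it = s.begin(); it != s.end(); ++it)
        if (*it == c)
            ++n;
    assert(n <= s.size());
    return n;
}

// Hypothetical helper: build a reversed copy from the reverse iterators.
std::string reversed(const std::string& s)
{
    return std::string(s.rbegin(), s.rend());
}
```

The generic algorithms of the library, such as std::count() and std::reverse(), accept the same iterators and could replace the hand-written loop.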

12.2.6 Insertion, Removal, and Replacement

The string member functions insert() and erase() are similar to the vector functions insert() and erase(). Like the vector versions, they can take iterators as arguments, and specify the insertion or removal of the ranges specified by the arguments. The function replace() is a combination of erase and insert, in effect replacing the specified range with new values.

s2.insert(s2.begin()+2, aList.begin(), aList.end());
s2.erase(s2.begin()+3, s2.begin()+5);
s2.replace(s2.begin()+3, s2.begin()+6, s3.begin(), s3.end());

NOTE -- An iterator is not guaranteed to remain valid after any operation that might force a reallocation of the internal string buffer, such as an append or an insertion.

The functions above also have overloads that take positions rather than iterators. The insert() member function takes as arguments a position and a string, and inserts the string at the given position, in front of the character currently there. The erase() function takes two integer arguments, a position and a length, and removes the characters specified. The replace() function takes two similar integer arguments, as well as a string and an optional length, and replaces the indicated range with the string or with an initial portion of a string, if the length has been explicitly specified.

s3.insert(3, "abc");       // insert abc at position 3
s3.erase(4, 2);            // remove positions 4 and 5
s3.replace(4, 2, "pqr");   // replace positions 4 and 5 with pqr
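
The position-based forms can be sketched as below. The helpers insert_at and replace_all are hypothetical names; replace_all also relies on find(), which is described in Section 12.2.9.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: return a copy of s with text inserted in front
// of the character at position pos.
std::string insert_at(std::string s, std::string::size_type pos,
                      const std::string& text)
{
    s.insert(pos, text);          // throws std::out_of_range if pos > size()
    return s;
}

// Hypothetical helper: replace every occurrence of from in s with to.
std::string replace_all(std::string s, const std::string& from,
                        const std::string& to)
{
    if (from.empty())
        return s;                              // nothing sensible to search for
    std::string::size_type pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);       // position-and-length form
        pos += to.size();                      // continue after the new text
        assert(pos <= s.size());
    }
    return s;
}
```

Advancing past the inserted text keeps the loop from matching inside its own replacement, for example when to contains from.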

12.2.7 Copy and Substring

The member function copy() generates a substring and copies this substring to the char* target given as the first argument. The range of values for the substring is specified by either a length, or by a length and starting position. If only a length is provided, copying begins at the start of the string. Note that copy() does not append a terminating null character to the target:

std::string s1("0123456789");

char buf[11] = "----------";

s1.copy(buf,3,5);  // copy 3 characters from s1 into buf,
                   // starting at position 5

// buf now contains 567-------

s1.copy(buf,5);    // copy the first 5 characters of s1 into buf

// buf now contains 01234-----

The member function substr() returns a string that represents a portion of the current string. The range is specified by either an initial position, or a position and a length:

std::cout << s1.substr(3) << std::endl;      // output 3 to end

std::cout << s1.substr(3,2) << std::endl;    // output positions 3
                                             // and 4
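
Since copy() leaves the target unterminated, a caller that wants a C-style string has to add the null character itself. The sketch below shows one way to do so, along with a small use of substr(); copy_terminated and tail are hypothetical helper names.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: copy up to n characters of s, starting at pos,
// into buf and add the terminating null that copy() leaves out.
// The caller must make sure buf has room for n + 1 characters.
std::string::size_type copy_terminated(const std::string& s, char* buf,
                                       std::string::size_type n,
                                       std::string::size_type pos)
{
    std::string::size_type copied = s.copy(buf, n, pos);  // may be fewer than n
    buf[copied] = '\0';
    assert(std::strlen(buf) <= copied);
    return copied;
}

// Hypothetical helper: the last n characters of s, or all of s
// if it is shorter than n.
std::string tail(const std::string& s, std::string::size_type n)
{
    return n >= s.size() ? s : s.substr(s.size() - n);
}
```

Both copy() and substr() throw std::out_of_range when the starting position is greater than size().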

12.2.8 string Comparisons

The member function compare() performs a lexical comparison between the receiver and an argument string. Optional arguments restrict the comparison to a substring of the receiver, given by a starting position and a length, and optionally to a substring of the argument string as well. See Section 13.6.5 for a description of lexicographical ordering. The function returns a negative value if the receiver is lexically smaller than the argument, a zero value if they are equal, and a positive value if the receiver is larger than the argument.

The relational and equality operators, operator<(), operator<=(), operator==(), operator!=(), operator>=(), and operator>(), are all defined using the comparison member function. Comparisons can be made either between two strings, or between strings and ordinary C-style character literals.

Having explained the functionality of compare(), we should note that users seldom invoke this member function directly. Instead, comparisons of strings are usually performed using the conventional comparison operators, which in turn make use of the function compare().
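
For the occasions when compare() is called directly, the following sketch shows the whole-string form and the substring form. The helper names sign_of_compare and starts_with are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reduce the result of compare() to -1, 0, or +1.
// Only the sign of the value returned by compare() is meaningful.
int sign_of_compare(const std::string& a, const std::string& b)
{
    int r = a.compare(b);
    assert((r == 0) == (a == b));   // agrees with operator==
    return (r > 0) - (r < 0);
}

// Hypothetical helper: true if s begins with prefix. The substring form
// compares prefix.size() characters of s, starting at position 0.
bool starts_with(const std::string& s, const std::string& prefix)
{
    return s.size() >= prefix.size()
        && s.compare(0, prefix.size(), prefix) == 0;
}
```

The substring form avoids building a temporary string with substr() just to compare it.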

12.2.9 Searching Operations

The member function find() determines the first occurrence of the argument string in the current string. An optional integer argument lets you specify the starting position for the search. (Remember that string index positions begin at zero.) If the function can locate such a match, it returns the starting index of the match in the current string. Otherwise, it returns std::string::npos, a value outside the range of legal subscripts for any string. The function rfind() is similar, but scans the string from the end, moving backwards; its optional argument gives the last position at which a match may begin.

s1 = "mississippi";
std::cout << s1.find("ss") << std::endl;    // returns 2
std::cout << s1.find("ss", 3) << std::endl; // returns 5
std::cout << s1.rfind("ss") << std::endl;   // returns 5
std::cout << s1.rfind("ss", 4) << std::endl;// returns 2
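
A typical search loop tests the result against std::string::npos and resumes after each match. The sketch below assumes non-overlapping matches are wanted; count_occurrences and extension are hypothetical helper names.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: count non-overlapping occurrences of needle in s.
std::string::size_type count_occurrences(const std::string& s,
                                         const std::string& needle)
{
    if (needle.empty())
        return 0;                         // avoid matching at every position
    std::string::size_type count = 0;
    std::string::size_type pos = 0;
    while ((pos = s.find(needle, pos)) != std::string::npos) {
        ++count;
        pos += needle.size();             // resume after this match
        assert(pos <= s.size());
    }
    return count;
}

// Hypothetical helper: the text after the last '.' in name,
// or an empty string if name contains no '.'.
std::string extension(const std::string& name)
{
    std::string::size_type dot = name.rfind('.');
    if (dot == std::string::npos)
        return std::string();
    return name.substr(dot + 1);
}
```

Advancing by one position instead of needle.size() would count overlapping matches as well.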

The functions find_first_of(), find_last_of(), find_first_not_of(), and find_last_not_of() treat the argument string as a set of characters. As with many of the other functions, one or two optional integer arguments can be used to specify a subset of the current string. These functions find the first or last character that is either present or absent from the argument set. The position of the given character, if located, is returned. If no such character exists, std::string::npos is returned.

std::string::size_type i = s2.find_first_of("aeiou");         // find first vowel
std::string::size_type j = s2.find_first_not_of("aeiou", i);  // next non-vowel
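
The character-set searches combine naturally into a simple tokenizer. The following is a minimal sketch, assuming that runs of delimiters should be skipped rather than produce empty tokens; the function name split is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: split s into tokens separated by any of the
// characters in delims. Runs of delimiters produce no empty tokens.
std::vector<std::string> split(const std::string& s, const std::string& delims)
{
    std::vector<std::string> tokens;
    // Start of the first token: the first character not in delims.
    std::string::size_type start = s.find_first_not_of(delims);
    while (start != std::string::npos) {
        // End of the token: the next delimiter, or npos at the end of s.
        std::string::size_type stop = s.find_first_of(delims, start);
        assert(stop > start);
        tokens.push_back(s.substr(start, stop - start));
        // Searching from npos finds nothing, which ends the loop.
        start = s.find_first_not_of(delims, stop);
    }
    return tokens;
}
```

When stop is npos, the length passed to substr() exceeds the remaining characters, and substr() returns everything up to the end of the string.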