Character Sequences
Published by Juan Soulie
Last update on Sep 29, 2009 at 10:53am UTC
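Because strings are sequences of characters, they can also be represented as plain arrays of char elements. For example, the following array: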
    char jenny [20];
is an array that can store up to 20 elements of type char. It can be pictured as a row of 20 consecutive char cells, indexed from 0 to 19.
In theory, then, this array can store sequences of up to 20 characters. But it can also store shorter sequences: at different points in a program, jenny could hold either the sequence "Hello" or the sequence "Merry Christmas", since both are shorter than 20 characters.
Because the array can hold sequences shorter than its total length, a special character signals the end of the valid sequence: the null character, whose literal constant is written '\0' (backslash, zero). Since the null character itself occupies one element, jenny can actually hold at most 19 meaningful characters.
Our array of 20 elements of type char, jenny, can be pictured storing the character sequences "Hello" and "Merry Christmas" as follows (each cell is one char element; the space between the two words also takes one element):

    |H|e|l|l|o|\0|?|?|?|?|?|?|?|?|?|?|?|?|?|?|
    |M|e|r|r|y| |C|h|r|i|s|t|m|a|s|\0|?|?|?|?|

Notice how a null character ('\0') follows the valid content in order to indicate the end of the sequence. The cells marked ? are char elements with undetermined values.
Initialization of null-terminated character sequences
Because character arrays are ordinary arrays, they follow the same rules as any other array. For example, to initialize a character array with a predetermined sequence of characters, we can do it just as with any other array:

    char myword [] = { 'H', 'e', 'l', 'l', 'o', '\0' };

This declares an array of 6 elements of type char, initialized with the characters that form the word "Hello" plus a null character '\0' at the end.
But char arrays can also be initialized in another way: with string literals.
In the expressions used in examples in previous chapters, constants that represent entire strings of characters have already shown up several times. They are specified by enclosing the text between double quotes ("). For example:

    "Hello"

is a string literal, a kind of constant we have probably used already.
String literals enclosed in double quotes (") are constants whose type is in fact a null-terminated array of const char: "Hello", for instance, has type const char[6]. The compiler always appends a null character ('\0') automatically at the end of a string literal.
Therefore, we can initialize the array of char elements called myword with a null-terminated sequence of characters in either of these two ways:

    char myword [] = { 'H', 'e', 'l', 'l', 'o', '\0' };
    char myword [] = "Hello";
In both cases, the character array myword is declared with a size of 6 elements of type char: the 5 characters that compose the word "Hello" plus a final null character ('\0') that marks the end of the sequence. In the second case, using double quotes ("), the null character is appended automatically.
Note that we are talking about initializing a character array at the moment it is declared, not about assigning values to it once it has already been declared. Because null-terminated character arrays are regular arrays, they have the same restrictions as any other array: a whole block of data cannot be copied into them with an assignment operation.
Assuming mystext is a char[] variable, statements such as:

    mystext = "Hello";
    mystext[] = "Hello";

would not be valid, and neither would:

    mystext = { 'H', 'e', 'l', 'l', 'o', '\0' };
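The reason for this will become clearer once you know more about pointers. An array is not itself a pointer, but in most expressions an array name is converted into a pointer to its first element, and the array as a whole is not an assignable object, so it cannot appear on the left-hand side of an assignment.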
Using null-terminated sequences of characters
Null-terminated sequences of characters are a natural way of treating strings in C++ (it is the representation C++ inherits from C), so they can be used as strings in many procedures. In fact, regular string literals have this type (an array of const char) and can also be used in most of those cases.

cin and cout, for instance, support null-terminated sequences as valid containers for sequences of characters, so such arrays can be used directly to extract strings of characters from cin or to insert them into cout. For example:
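    // null-terminated sequences of characters
    #include <iomanip>
    #include <iostream>
    using namespace std;

    int main ()
    {
      char question[] = "Please, enter your first name: ";
      char greeting[] = "Hello, ";
      char yourname [80];
      cout << question;
      cin >> setw(80) >> yourname;  // read at most 79 chars, leaving room for '\0'
      cout << greeting << yourname << "!";
      return 0;
    }

Sample run (the user types John):

    Please, enter your first name: John
    Hello, John!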
As you can see, we have declared three arrays of char elements. The first two were initialized with string literal constants, while the third was left uninitialized. Every array still needs a size: for the first two (question and greeting), the size was implicitly defined by the length of the literal they were initialized with, including its terminating null character, while for yourname we explicitly specified a size of 80 chars.
Finally, sequences of characters stored in char arrays can easily be converted into string objects just by using the assignment operator:
    string mystring;
    char myntcs[] = "some text";
    mystring = myntcs;