cset type
Intro
The character set class (cset) implements a Pascal-style set of integer values from 0 to 255, i.e. a set of characters. Unlike Pascal sets, the range of a cset cannot be changed and is always 0 through 255. The cset class implements various operators (membership, union, intersection, equality, less than or equal to, etc.) as they are described in the Pascal language. See Operators for details.
A cset is a packed array of 256 bits that occupies 32 bytes of static or local memory. Each bit indicates whether the corresponding character is a member of the set.
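To make the 32-byte layout concrete, the sketch below packs 256 membership bits into a byte array. It is an illustration only and not the PTypes source: the names bitset256, include_char, contains and layout_demo are hypothetical, and the choice of byte c / 8 and bit c % 8 is an assumption about one possible bit order.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for cset's storage: 256 bits packed into 32 bytes.
struct bitset256 {
    unsigned char bits[32];
    bitset256() { std::memset(bits, 0, sizeof bits); }  // start as the empty set
};

// Set the bit for character code c (0..255): byte c / 8, bit c % 8.
inline void include_char(bitset256& s, unsigned char c) {
    s.bits[c / 8] |= static_cast<unsigned char>(1u << (c % 8));
}

// Test whether the bit for character code c is set.
inline bool contains(const bitset256& s, unsigned char c) {
    return (s.bits[c / 8] & (1u << (c % 8))) != 0;
}

// Usage: a quick self-check of the layout.
inline void layout_demo() {
    bitset256 s;
    include_char(s, 'A');        // 'A' is code 65: byte 8, bit 1
    assert(contains(s, 'A'));
    assert(!contains(s, 'B'));
    assert(sizeof(s) == 32);     // 256 bits occupy exactly 32 bytes
}
```

Because the whole set is a fixed-size array, both operations in this sketch take constant time and need no dynamic memory.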
Another difference between cset and Pascal sets is that C++ has no built-in set constructor like the one in Pascal (e.g. ['A'..'Z', 'a'..'z']), so cset provides a simple run-time interpreter instead (see Constructors).
The cset class is declared in <ptypes.h>.
The example below shows the general usage of character sets.
cset s = "A-Za-z!";           // same as ['A'..'Z', 'a'..'z', '!'] in Pascal
include(s, '_');              // include underscore character
include(s, '0', '9');         // include all chars from '0' to '9'
if ('a' & s)                  // check membership
    cout << "Letter 'a' found in the set! :)\n";
const cset letters = "A-Za-z_";       // define a set of letters
string tok = pin.token(letters);      // read a token from the input stream
Constructors
cset::cset() -- default constructor, initializes the set object to an empty set.
cset::cset(const cset& s) -- copy constructor.
cset::cset(const char* setinit) -- constructs a character set from a string. The setinit parameter is a sequence of characters, range specifiers and escape sequences. A range specifier consists of a lower boundary, a dash "-" and an upper boundary, e.g. "A-Z". An escape sequence begins with a tilde "~" and can be followed by a two-digit hexadecimal number. Escape sequences are also used to include the special characters tilde "~" and dash "-" (see examples below).
Constructing character sets using this interpreter can be a time-consuming operation. A better practice is to declare all constant character sets as static variables, so that each set-constructing string is interpreted only once, during program startup.
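The build-once idea can be sketched without PTypes, using std::bitset<256> as a stand-in for cset. Everything named here (charset, build_set, build_count, vowels, vowels_demo) is hypothetical. The sketch uses a function-local static, which is built on first use rather than at program startup; a static variable at file scope, as recommended above, gives the same build-once effect.

```cpp
#include <bitset>
#include <cassert>

typedef std::bitset<256> charset;   // stand-in for cset in this sketch

int build_count = 0;                // counts how often the "expensive" build runs

// Hypothetical builder: includes every character of the string literally
// (no ranges or escapes), standing in for the run-time interpreter.
charset build_set(const char* chars) {
    ++build_count;
    charset s;
    for (; *chars != 0; ++chars)
        s.set(static_cast<unsigned char>(*chars));
    return s;
}

// The set is constructed on the first call only; later calls reuse it.
const charset& vowels() {
    static const charset v = build_set("aeiouAEIOU");
    return v;
}

// Usage: several lookups, but only one build.
inline void vowels_demo() {
    assert(vowels().test('e'));
    assert(!vowels().test('x'));
    assert(build_count == 1);
}
```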
An initializer string can also be passed to a cset object through assign().
Note: this constructor does not generate errors if the syntax is violated.
Examples:
cset s1 = "135A-CZ";            // digits 1, 3, 5, letters A through C, and also Z
cset wspace1 = "~09~0d~0a ";    // tab, CR, LF and space
cset wspace2 = "~00-~20";       // all control and whitespace chars
cset s2 = ":@~~";               // colon, at and tilde (must be escaped with another tilde)
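For readers who want to see how such an initializer string might be interpreted, here is a minimal sketch written against std::bitset<256> rather than cset. It is not the PTypes implementation: charset, hex_value, parse_item, parse_set and parse_demo are hypothetical names, and the handling of malformed input (a reversed range yields nothing, a trailing dash or tilde is taken literally) is an assumption that may differ from the library.

```cpp
#include <bitset>
#include <cassert>
#include <cctype>

typedef std::bitset<256> charset;   // stand-in for cset in this sketch

// Value of one hexadecimal digit; the caller checks isxdigit() first.
inline unsigned hex_value(char c) {
    if (c >= '0' && c <= '9') return static_cast<unsigned>(c - '0');
    return static_cast<unsigned>(std::tolower(static_cast<unsigned char>(c)) - 'a') + 10;
}

// Read one item at p and advance p past it: a plain character,
// "~HH" (two hex digits), or "~c" for a literal c such as "~~" or "~-".
inline unsigned char parse_item(const char*& p) {
    if (*p == '~' && p[1] != 0) {
        ++p;
        if (std::isxdigit(static_cast<unsigned char>(p[0])) &&
            std::isxdigit(static_cast<unsigned char>(p[1]))) {
            unsigned v = hex_value(p[0]) * 16 + hex_value(p[1]);
            p += 2;
            return static_cast<unsigned char>(v);
        }
    }
    return static_cast<unsigned char>(*p++);
}

// Interpret a whole initializer string; never reports an error.
inline charset parse_set(const char* p) {
    charset s;
    while (*p != 0) {
        unsigned lo = parse_item(p);
        unsigned hi = lo;
        if (p[0] == '-' && p[1] != 0) {      // range specifier: lo-hi
            ++p;
            hi = parse_item(p);
        }
        for (unsigned c = lo; c <= hi; ++c)  // empty when hi < lo
            s.set(c);
    }
    return s;
}

// Usage: the strings from the examples above.
inline void parse_demo() {
    assert(parse_set("135A-CZ").count() == 7);   // 1, 3, 5, A, B, C, Z
    assert(parse_set("~00-~20").count() == 33);  // codes 0 through 32
    assert(parse_set(":@~~").test('~'));
}
```

The loop counter is an unsigned int rather than an unsigned char so that a range ending at code 255 terminates instead of wrapping around.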
Operators
The following rules apply to +, -, and *:
- An ordinal O is in X + Y if and only if O is in X or Y (or both). Equivalent of bitwise OR.
- O is in X - Y if and only if O is in X but not in Y. Equivalent of bitwise AND NOT.
- O is in X * Y if and only if O is in both X and Y. Equivalent of bitwise AND.
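The three rules map directly onto bitwise operations, which the following sketch shows with std::bitset<256> in place of cset. The names charset, unite, subtract, intersect, make_range and algebra_demo are hypothetical helpers for this illustration and are not part of PTypes.

```cpp
#include <bitset>
#include <cassert>

typedef std::bitset<256> charset;   // stand-in for cset in this sketch

// X + Y: members of X or Y, or both (bitwise OR).
inline charset unite(const charset& x, const charset& y) { return x | y; }

// X - Y: members of X that are not in Y (bitwise AND NOT).
inline charset subtract(const charset& x, const charset& y) { return x & ~y; }

// X * Y: members of both X and Y (bitwise AND).
inline charset intersect(const charset& x, const charset& y) { return x & y; }

// Hypothetical helper: the set of all codes from lo to hi inclusive.
inline charset make_range(unsigned char lo, unsigned char hi) {
    charset s;
    for (unsigned c = lo; c <= hi; ++c)
        s.set(c);
    return s;
}

// Usage: {a..f} combined with {a, e, u}.
inline void algebra_demo() {
    charset letters = make_range('a', 'f');
    charset vowels;
    vowels.set('a'); vowels.set('e'); vowels.set('u');
    assert(unite(letters, vowels).count() == 7);      // a..f plus u
    assert(subtract(letters, vowels).count() == 4);   // b, c, d, f
    assert(intersect(letters, vowels).count() == 2);  // a, e
}
```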
The following rules apply to comparison operations <=, >=, ==, !=:
- X <= Y is true if and only if every member of X is a member of Y; Z >= W is equivalent to W <= Z.
- U == V is true if and only if U and V contain exactly the same members; otherwise, U != V is true.
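The comparison rules can be expressed with the same bitwise building blocks. The sketch below again uses std::bitset<256> as a stand-in for cset; is_subset, is_superset and compare_demo are hypothetical names, not PTypes functions.

```cpp
#include <bitset>
#include <cassert>

typedef std::bitset<256> charset;   // stand-in for cset in this sketch

// X <= Y: no member of X lies outside Y, i.e. X AND NOT Y is empty.
inline bool is_subset(const charset& x, const charset& y) {
    return (x & ~y).none();
}

// Z >= W is the same test with the operands swapped.
inline bool is_superset(const charset& z, const charset& w) {
    return is_subset(w, z);
}

// Usage: {a} against {a, b}.
inline void compare_demo() {
    charset small, big;
    small.set('a');
    big.set('a'); big.set('b');
    assert(is_subset(small, big));
    assert(!is_subset(big, small));
    assert(is_superset(big, small));
    assert(is_subset(big, big));     // every set is a subset of itself
    assert(small != big);            // equality compares all 256 bits
}
```

Note that these relations form only a partial order: two sets can be unequal while neither is a subset of the other.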
For an ordinal O and a set S, O & S is true if and only if O is a member of S. Unlike the Pascal language, where the membership operator is in, PTypes uses the ampersand "&" as the membership test operator.
Note: regardless of whether the default char type is signed or unsigned (usually set through compiler options), cset always treats char arguments as unsigned. This means that if the value of an argument is -1, e.g. in a call to operator & or operator +, it is converted to 255; -2 is treated as 254, and so on.
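The conversion described in this note is the ordinary C++ cast from char to unsigned char. A minimal sketch, assuming 8-bit chars; index_of and index_demo are hypothetical names used only for this illustration:

```cpp
#include <cassert>

// Map a char to a set index by reinterpreting it as unsigned,
// so the result is always in the range 0..255 (assuming 8-bit chars).
inline unsigned index_of(char c) {
    return static_cast<unsigned char>(c);
}

// Usage: negative char values wrap around to the top of the range.
inline void index_demo() {
    assert(index_of(static_cast<char>(-1)) == 255);
    assert(index_of(static_cast<char>(-2)) == 254);
    assert(index_of('A') == 65);
}
```

The result is the same whether the compiler treats plain char as signed or unsigned, which is why characters above 127 select the same set member in either configuration.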