You are looking at an old revision of the page cset. This revision was created by Natalie Adams.

cset type

Intro

The character set class (cset) implements a Pascal-style set of integer values from 0 to 255, or equivalently a set of characters. Unlike Pascal sets, the range of a cset cannot be changed and is always 0 through 255. The cset class implements the usual set operators (membership, union, intersection, equality, less than or equal to, etc.) as they are described in the Pascal language. See Operators for details.

A cset is a packed array of 256 bits and occupies 32 bytes of static or local memory. Each bit indicates whether the corresponding character is a member of the set.
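The packed layout described above can be sketched in a few lines. The type mini_cset and the functions set_union and is_subset below are hypothetical names made up for this illustration; this is not the PTypes source, only a minimal sketch of how 256 bits can be stored in 32 bytes and how membership, union and the subset test reduce to bit operations.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical illustration only; not the actual PTypes implementation.
struct mini_cset {
    std::uint8_t bits[32];              // 256 bits, one per character code

    mini_cset() { std::memset(bits, 0, sizeof bits); }    // start as the empty set

    // Add one character: set bit (c % 8) of byte (c / 8).
    void include(unsigned char c) {
        bits[c / 8] |= static_cast<std::uint8_t>(1u << (c % 8));
    }

    // Add every character in the closed range [lo, hi].
    void include(unsigned char lo, unsigned char hi) {
        for (int c = lo; c <= hi; ++c)
            include(static_cast<unsigned char>(c));
    }

    // Membership test: is the bit for c set?
    bool contains(unsigned char c) const {
        return ((bits[c / 8] >> (c % 8)) & 1u) != 0;
    }
};

// Union: a character is in the result if it is in either operand.
inline mini_cset set_union(const mini_cset& a, const mini_cset& b) {
    mini_cset r;
    for (int i = 0; i < 32; ++i)
        r.bits[i] = static_cast<std::uint8_t>(a.bits[i] | b.bits[i]);
    return r;
}

// Subset test (Pascal's "less than or equal to" on sets):
// true if a has no bit that is missing from b.
inline bool is_subset(const mini_cset& a, const mini_cset& b) {
    for (int i = 0; i < 32; ++i)
        if (a.bits[i] & ~b.bits[i])
            return false;
    return true;
}
```

Because every operation works on a fixed 32-byte array, such a set can be copied and compared cheaply and needs no dynamic memory.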

Another difference between cset and Pascal sets is that C++ has no built-in set constructor like the one in Pascal (e.g. ['A'..'Z', 'a'..'z']), so cset provides a simple run-time interpreter instead (see Constructors).

The cset class is declared in <ptypes.h>.

The example below shows the general usage of character sets.

cset s = "A-Za-z!";       // same as ['A'..'Z', 'a'..'z', '!'] in Pascal

include(s, '_');          // include underscore character
include(s, '0', '9');     // include all chars from '0' to '9'

if ('a' & s)              // check membership
    cout << "Letter 'a' found in the set! :)\n";

const cset letters = "A-Za-z_";     // define a set of letters
string tok = pin.token(letters);    // read a token from the input stream

Constructors

cset::cset() -- default constructor, initializes the set object to an empty set.

cset::cset(const cset& s) -- copy constructor.

cset::cset(const char* setinit) -- constructs a character set from a string. The setinit parameter is a sequence of characters, range specifiers and escape sequences. A range specifier consists of a lower boundary, a dash "-" and a higher boundary, e.g. "A-Z". An escape sequence begins with a tilde "~" and can be followed by a two-digit hexadecimal number. Escape sequences are also used to include the special characters tilde "~" and dash "-" themselves (see examples below).

Constructing character sets using this interpreter can be a time-consuming operation. A better practice is to declare all constant character sets as static variables, so that each set-constructing string is interpreted only once, during program startup.
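The effect of this advice can be shown without PTypes itself. In the sketch below, std::bitset<256> stands in for a character set, and make_letters, build_count and is_letter are hypothetical names introduced only for this illustration; make_letters plays the role of the costly set-constructing step.

```cpp
#include <bitset>

typedef std::bitset<256> charset;       // stand-in for a character set type

static int build_count = 0;             // counts how often the builder runs

// Hypothetical builder standing in for an expensive set construction.
static charset make_letters() {
    ++build_count;
    charset s;
    for (int c = 'A'; c <= 'Z'; ++c) s.set(c);
    for (int c = 'a'; c <= 'z'; ++c) s.set(c);
    s.set('_');
    return s;
}

// Static constant: built once during static initialization,
// no matter how many times is_letter() is called afterwards.
static const charset letters = make_letters();

bool is_letter(unsigned char c) {
    return letters.test(c);
}
```

Had the set been a local variable inside is_letter(), the builder would run on every call; as a static constant it runs once and every later call is a plain bit test.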

An initializer string can also be passed to a cset object through assign().

Note: this constructor does not generate errors if the syntax is violated.

Examples:

cset s1 = "135A-CZ";            // digits 1, 3, 5, letters A through C, and also Z
cset wspace1 = "~09~0d~0a ";    // tab, CR, LF and space
cset wspace2 = "~00-~20";       // all control and whitespace chars
cset s2 = ":@~~";               // colon, at and tilde (must be escaped with another tilde)
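To make the initializer syntax concrete, here is a minimal sketch of an interpreter for such strings. It is not the PTypes code: parse_set, read_element and hex_value are hypothetical names, std::bitset<256> stands in for cset, and the handling of malformed input (a trailing "~" or "-", a reversed range) is an assumption of this sketch, since the page only says that no errors are generated.

```cpp
#include <bitset>

typedef std::bitset<256> charset;       // stand-in for a character set type

// Value of a hexadecimal digit, or -1 if ch is not one.
static int hex_value(char ch) {
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

// Read one element at p and advance p past it. An element is "~hh"
// (two hex digits), "~x" (x taken literally, e.g. "~~" or "~-"),
// or a plain character. Assumption: a lone trailing "~" is taken literally.
static int read_element(const char*& p) {
    if (*p == '~' && p[1] != '\0') {
        int hi = hex_value(p[1]);
        int lo = (hi >= 0) ? hex_value(p[2]) : -1;
        if (hi >= 0 && lo >= 0) {
            p += 3;
            return hi * 16 + lo;
        }
        int c = static_cast<unsigned char>(p[1]);
        p += 2;
        return c;
    }
    return static_cast<unsigned char>(*p++);
}

// Interpret an initializer string such as "A-Za-z!" or "~00-~20".
charset parse_set(const char* init) {
    charset s;
    const char* p = init;
    while (*p != '\0') {
        int lo = read_element(p);
        int hi = lo;
        if (*p == '-' && p[1] != '\0') {    // range specifier "lo-hi"
            ++p;
            hi = read_element(p);
        }
        // Assumption: a reversed range (lo > hi) adds nothing.
        for (int c = lo; c <= hi; ++c)
            s.set(c);
    }
    return s;
}
```

Applied to the strings above, parse_set("135A-CZ") yields seven members and parse_set("~00-~20") yields the 33 characters with codes 0 through 32.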

Created: 10 years 9 months ago
by Natalie Adams
