
Recipe 2.1 Determining the Kind of Character

Problem

You have a variable of type char and want to determine the kind of character it contains: a letter, digit, number, punctuation character, control character, separator character, symbol, whitespace, or surrogate character. Similarly, you have a string variable and want to determine the kind of character at one or more positions within that string.

Solution

Use the built-in static methods on the System.Char structure shown here:

Char.IsControl
Char.IsDigit
Char.IsLetter
Char.IsNumber
Char.IsPunctuation
Char.IsSeparator
Char.IsSurrogate
Char.IsSymbol
Char.IsWhiteSpace
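
As a minimal sketch of what these calls look like in practice, the following program applies a few of the methods to sample characters. The class name CharMethodsDemo and the sample characters are illustrative assumptions, not part of the recipe. Each method takes a char and returns a bool:

```csharp
using System;

public static class CharMethodsDemo
{
    public static void Main()
    {
        char c = '7';

        // Each static method takes a char and returns a bool.
        Console.WriteLine(Char.IsDigit(c));          // True
        Console.WriteLine(Char.IsLetter(c));         // False
        Console.WriteLine(Char.IsPunctuation('!'));  // True
        Console.WriteLine(Char.IsSymbol('+'));       // True

        // Note the capital S in the method name IsWhiteSpace.
        Console.WriteLine(Char.IsWhiteSpace(' '));   // True
    }
}
```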

Discussion

The following examples show how to combine the methods listed in the Solution section into a function that returns the kind of a character. First, create an enumeration that defines the various kinds of characters:

public enum CharKind
{
    Control,
    Digit,
    Letter,
    Number,
    Punctuation,
    Separator,
    Surrogate,
    Symbol,
    Whitespace,
    Unknown
}

Next, create a method that contains the logic to determine the type of a character and to return a CharKind enumeration value indicating that type:

public static CharKind GetCharKind(char theChar)
{
    if (Char.IsControl(theChar))
    {
        return CharKind.Control;
    }
    else if (Char.IsDigit(theChar))
    {
        return CharKind.Digit;
    }
    else if (Char.IsLetter(theChar))
    {
        return CharKind.Letter;
    }
    else if (Char.IsNumber(theChar))
    {
        return CharKind.Number;
    }
    else if (Char.IsPunctuation(theChar))
    {
        return CharKind.Punctuation;
    }
    else if (Char.IsSeparator(theChar))
    {
        return CharKind.Separator;
    }
    else if (Char.IsSurrogate(theChar))
    {
        return CharKind.Surrogate;
    }
    else if (Char.IsSymbol(theChar))
    {
        return CharKind.Symbol;
    }
    else if (Char.IsWhiteSpace(theChar))
    {
        return CharKind.Whitespace;
    }
    else
    {
        return CharKind.Unknown;
    }
}

If, however, a character in a string needs to be evaluated, use the overloaded static methods on the Char structure. The following code modifies the GetCharKind method to accept a string variable and a character position in that string. The character position determines which character in the string is evaluated:

public static CharKind GetCharKindInString(string theString, int charPosition)
{
    if (Char.IsControl(theString, charPosition))
    {
        return CharKind.Control;
    }
    else if (Char.IsDigit(theString, charPosition))
    {
        return CharKind.Digit;
    }
    else if (Char.IsLetter(theString, charPosition))
    {
        return CharKind.Letter;
    }
    else if (Char.IsNumber(theString, charPosition))
    {
        return CharKind.Number;
    }
    else if (Char.IsPunctuation(theString, charPosition))
    {
        return CharKind.Punctuation;
    }
    else if (Char.IsSeparator(theString, charPosition))
    {
        return CharKind.Separator;
    }
    else if (Char.IsSurrogate(theString, charPosition))
    {
        return CharKind.Surrogate;
    }
    else if (Char.IsSymbol(theString, charPosition))
    {
        return CharKind.Symbol;
    }
    else if (Char.IsWhiteSpace(theString, charPosition))
    {
        return CharKind.Whitespace;
    }
    else
    {
        return CharKind.Unknown;
    }
}
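
The string overloads throw an ArgumentNullException when the string is null and an ArgumentOutOfRangeException when the position is not a valid index. The following is a minimal sketch of one way to guard against both cases. The helper name IsDigitAt and the class name StringOverloadDemo are hypothetical and not part of the recipe:

```csharp
using System;

public static class StringOverloadDemo
{
    // Hypothetical helper: returns false instead of throwing when the
    // string is null or the position is not a valid index into it.
    public static bool IsDigitAt(string text, int position)
    {
        if (text == null || position < 0 || position >= text.Length)
        {
            return false;
        }

        // Same test as Char.IsDigit(text[position]).
        return Char.IsDigit(text, position);
    }

    public static void Main()
    {
        string sample = "abc1";

        Console.WriteLine(IsDigitAt(sample, 3));   // True
        Console.WriteLine(IsDigitAt(sample, 0));   // False
        Console.WriteLine(IsDigitAt(sample, 10));  // False (out of range)

        try
        {
            // Without the guard, an invalid position throws.
            Char.IsDigit(sample, 10);
        }
        catch (ArgumentOutOfRangeException)
        {
            Console.WriteLine("Position is outside the string.");
        }
    }
}
```

Whether to return false or let the exception propagate is a design choice. Returning false hides caller mistakes, so the guard suits input that is expected to be short or missing.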

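Both methods run a fixed series of tests using the Char type's built-in static methods and return the CharKind value for the first test that succeeds. GetCharKind tests a single char; GetCharKindInString tests the character at a given position in a string. The order of the tests matters because the categories overlap: a decimal digit is also a number, and a space is both a separator and whitespace, so the earlier test determines the result.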
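
To see how the order of the tests affects the result, the following minimal sketch calls pairs of Char methods on the same character. The class name OverlapDemo and the sample characters are illustrative assumptions:

```csharp
using System;

public static class OverlapDemo
{
    public static void Main()
    {
        // A space is both a separator and whitespace.
        Console.WriteLine(Char.IsSeparator(' '));    // True
        Console.WriteLine(Char.IsWhiteSpace(' '));   // True

        // A tab is both a control character and whitespace.
        Console.WriteLine(Char.IsControl('\t'));     // True
        Console.WriteLine(Char.IsWhiteSpace('\t'));  // True

        // A decimal digit is also a number.
        Console.WriteLine(Char.IsDigit('5'));        // True
        Console.WriteLine(Char.IsNumber('5'));       // True
    }
}
```

With the ordering used in GetCharKind, a space is reported as Separator, a tab as Control, and '5' as Digit. Going by the documented categories, every character for which Char.IsWhiteSpace returns true is also matched by Char.IsControl or Char.IsSeparator, so the Whitespace branch appears to be unreachable with this ordering. If whitespace should be reported as such, moving that test to the top of the chain is one option.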

Table 2-1 describes each of the static Char methods.

Table 2-1. Char methods

Char method     Description

IsControl       A control code in the ranges \u0000-\u001F, \u007F, and \u0080-\u009F.

IsDigit         Any decimal digit. This covers 0-9 as well as the decimal digits of other scripts.

IsLetter        Any alphabetic letter in any script.

IsNumber        Any decimal digit, letter number (such as a Roman numeral character), or other number (such as a fraction character). The hexadecimal digits A-F are letters, not numbers.

IsPunctuation   Any punctuation character.

IsSeparator     A space separating words, a line separator, or a paragraph separator.

IsSurrogate     Any surrogate character in the range \uD800-\uDFFF.

IsSymbol        Any mathematical, currency, modifier, or other symbol character.

IsWhiteSpace    Any space character and the following characters: \u0009, \u000A, \u000B, \u000C, \u000D, \u0085, \u2028, \u2029.
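
The difference between IsDigit and IsNumber is easy to miss. The following minimal sketch checks a few characters against both and uses Char.GetUnicodeCategory to show the underlying Unicode category. The class name CategoryDemo and the sample characters are illustrative assumptions, not part of the recipe:

```csharp
using System;
using System.Globalization;

public static class CategoryDemo
{
    public static void Main()
    {
        char half = '\u00BD';         // VULGAR FRACTION ONE HALF
        char arabicThree = '\u0663';  // ARABIC-INDIC DIGIT THREE

        // A fraction is a number but not a decimal digit.
        Console.WriteLine(Char.IsDigit(half));    // False
        Console.WriteLine(Char.IsNumber(half));   // True

        // Decimal digits are not limited to 0-9.
        Console.WriteLine(Char.IsDigit(arabicThree));  // True

        // The hexadecimal digit 'A' is a letter, not a number.
        Console.WriteLine(Char.IsNumber('A'));    // False
        Console.WriteLine(Char.IsLetter('A'));    // True

        // GetUnicodeCategory reports the category the tests rely on.
        UnicodeCategory category = Char.GetUnicodeCategory(half);
        Console.WriteLine(category);              // OtherNumber
    }
}
```

If only the ASCII digits 0-9 should be accepted, for example when validating input that is parsed later, a range check such as c >= '0' && c <= '9' is a stricter alternative to Char.IsDigit.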

The following code example determines whether the fifth character (the charPosition parameter is zero-based) in the string is a digit:

if (GetCharKindInString("abcdefg", 4) == CharKind.Digit) {...}

See Also

See the "Char Structure" topic in the MSDN documentation.
