
Recipe 2.1 Determining the Kind of Character

Problem

You have a variable of type char and wish to determine the kind of character it contains: a letter, digit, number, punctuation character, control character, separator character, symbol, whitespace, or surrogate character. Similarly, you have a string variable and want to determine the kind of character in one or more positions within this string.

Solution

Use the built-in static methods on the System.Char structure shown here:

Char.IsControl
Char.IsDigit
Char.IsLetter
Char.IsNumber
Char.IsPunctuation
Char.IsSeparator
Char.IsSurrogate
Char.IsSymbol
Char.IsWhiteSpace

Discussion

The following examples demonstrate how to use the methods shown in the Solution section in a function to return the kind of a character. First, create an enumeration to define the various types of characters:

public enum CharKind
{
    Control,
    Digit,
    Letter,
    Number,
    Punctuation,
    Separator,
    Surrogate,
    Symbol,
    Whitespace,
    Unknown
}

Next, create a method that contains the logic to determine the type of a character and to return a CharKind enumeration value indicating that type:

public static CharKind GetCharKind(char theChar)
{
    if (Char.IsControl(theChar))
    {
        return CharKind.Control;
    }
    else if (Char.IsDigit(theChar))
    {
        return CharKind.Digit;
    }
    else if (Char.IsLetter(theChar))
    {
        return CharKind.Letter;
    }
    else if (Char.IsNumber(theChar))
    {
        return CharKind.Number;
    }
    else if (Char.IsPunctuation(theChar))
    {
        return CharKind.Punctuation;
    }
    else if (Char.IsSeparator(theChar))
    {
        return CharKind.Separator;
    }
    else if (Char.IsSurrogate(theChar))
    {
        return CharKind.Surrogate;
    }
    else if (Char.IsSymbol(theChar))
    {
        return CharKind.Symbol;
    }
    else if (Char.IsWhiteSpace(theChar))
    {
        return CharKind.Whitespace;
    }
    else
    {
        return CharKind.Unknown;
    }
}
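To see what the method returns in practice, the following sketch runs it over a handful of sample characters. It repeats the enumeration and a condensed copy of GetCharKind (same tests, same order) only so that it compiles on its own; the class name GetCharKindDemo and the choice of sample characters are assumptions made for illustration.

```csharp
using System;

public enum CharKind
{
    Control, Digit, Letter, Number, Punctuation,
    Separator, Surrogate, Symbol, Whitespace, Unknown
}

public static class GetCharKindDemo
{
    // Condensed copy of GetCharKind: same tests, in the same order.
    public static CharKind GetCharKind(char theChar)
    {
        if (Char.IsControl(theChar)) return CharKind.Control;
        if (Char.IsDigit(theChar)) return CharKind.Digit;
        if (Char.IsLetter(theChar)) return CharKind.Letter;
        if (Char.IsNumber(theChar)) return CharKind.Number;
        if (Char.IsPunctuation(theChar)) return CharKind.Punctuation;
        if (Char.IsSeparator(theChar)) return CharKind.Separator;
        if (Char.IsSurrogate(theChar)) return CharKind.Surrogate;
        if (Char.IsSymbol(theChar)) return CharKind.Symbol;
        if (Char.IsWhiteSpace(theChar)) return CharKind.Whitespace;
        return CharKind.Unknown;
    }

    public static void Main()
    {
        // '\u00BD' is the fraction one-half, '\uD800' is a lone high
        // surrogate, and '\u0301' is a combining acute accent.
        char[] samples = { 'a', '5', '\u00BD', '!', '+', '\uD800', '\u0301' };
        foreach (char c in samples)
        {
            // Print the code point rather than the character itself,
            // because some of the samples are not printable.
            Console.WriteLine("U+{0:X4} -> {1}", (int)c, GetCharKind(c));
        }
    }
}
```

The combining accent falls through every test and comes back as CharKind.Unknown, which shows that the nine tests do not cover every character a char can hold.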

If, however, a character in a string needs to be evaluated, use the overloaded static methods on the Char structure that accept a string and an index. The following GetCharKindInString method is a variant of GetCharKind that accepts a string variable and a zero-based character position in that string. The character position determines which character in the string is evaluated:

public static CharKind GetCharKindInString(string theString, int charPosition)
{
    if (Char.IsControl(theString, charPosition))
    {
        return CharKind.Control;
    }
    else if (Char.IsDigit(theString, charPosition))
    {
        return CharKind.Digit;
    }
    else if (Char.IsLetter(theString, charPosition))
    {
        return CharKind.Letter;
    }
    else if (Char.IsNumber(theString, charPosition))
    {
        return CharKind.Number;
    }
    else if (Char.IsPunctuation(theString, charPosition))
    {
        return CharKind.Punctuation;
    }
    else if (Char.IsSeparator(theString, charPosition))
    {
        return CharKind.Separator;
    }
    else if (Char.IsSurrogate(theString, charPosition))
    {
        return CharKind.Surrogate;
    }
    else if (Char.IsSymbol(theString, charPosition))
    {
        return CharKind.Symbol;
    }
    else if (Char.IsWhiteSpace(theString, charPosition))
    {
        return CharKind.Whitespace;
    }
    else
    {
        return CharKind.Unknown;
    }
}
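The string overloads throw rather than return false when they cannot find the character: they raise an ArgumentNullException for a null string and an ArgumentOutOfRangeException for a position outside the string. The sketch below shows one way to guard against both cases. The IsDigitAt helper and the StringOverloadDemo class are hypothetical names introduced here, not part of the recipe.

```csharp
using System;

public static class StringOverloadDemo
{
    // Hypothetical helper: returns false instead of throwing when the
    // string is null or the position lies outside the string.
    public static bool IsDigitAt(string text, int position)
    {
        if (text == null || position < 0 || position >= text.Length)
        {
            return false;
        }
        return Char.IsDigit(text, position);
    }

    public static void Main()
    {
        string sample = "abc1";
        Console.WriteLine(IsDigitAt(sample, 3));   // True: '1' is a digit
        Console.WriteLine(IsDigitAt(sample, 0));   // False: 'a' is a letter
        Console.WriteLine(IsDigitAt(sample, 10));  // False: out of range

        try
        {
            // The unguarded call throws for the same out-of-range position.
            Char.IsDigit(sample, 10);
        }
        catch (ArgumentOutOfRangeException)
        {
            Console.WriteLine("Position 10 is outside the string.");
        }
    }
}
```

Whether to swallow the error, as IsDigitAt does, or let the exception propagate depends on whether an invalid position is an expected condition or a bug in the caller.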

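The GetCharKind method accepts a character as a parameter and performs a series of tests on that character using the Char type's built-in static methods, returning the CharKind value for the first test that succeeds. GetCharKindInString applies the same tests to the character at the given position in a string. The CharKind enumeration defines every kind that either method can return. Because the categories overlap (every character for which IsDigit returns true also satisfies IsNumber, for example), the order of the tests determines the result.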
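Because GetCharKind stops at the first test that succeeds, a character that satisfies several tests is always reported under the earliest one. The sketch below lists every test that a character passes, in the same order GetCharKind uses. MatchingTests and OverlapDemo are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class OverlapDemo
{
    // Hypothetical helper: names every Char test that returns true for c,
    // checked in the same order that GetCharKind uses.
    public static string MatchingTests(char c)
    {
        List<string> names = new List<string>();
        if (Char.IsControl(c)) names.Add("IsControl");
        if (Char.IsDigit(c)) names.Add("IsDigit");
        if (Char.IsLetter(c)) names.Add("IsLetter");
        if (Char.IsNumber(c)) names.Add("IsNumber");
        if (Char.IsPunctuation(c)) names.Add("IsPunctuation");
        if (Char.IsSeparator(c)) names.Add("IsSeparator");
        if (Char.IsSurrogate(c)) names.Add("IsSurrogate");
        if (Char.IsSymbol(c)) names.Add("IsSymbol");
        if (Char.IsWhiteSpace(c)) names.Add("IsWhiteSpace");
        return String.Join(", ", names.ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(MatchingTests('a'));   // IsLetter
        Console.WriteLine(MatchingTests('5'));   // IsDigit, IsNumber
        Console.WriteLine(MatchingTests(' '));   // IsSeparator, IsWhiteSpace
        Console.WriteLine(MatchingTests('\t'));  // IsControl, IsWhiteSpace
    }
}
```

With the ordering used in GetCharKind, a space is therefore reported as CharKind.Separator and a tab as CharKind.Control. These samples suggest that the CharKind.Whitespace branch is rarely, if ever, reached; if whitespace is the kind you care about most, consider moving the IsWhiteSpace test ahead of IsControl and IsSeparator.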

Table 2-1 describes each of the static Char methods.

Table 2-1. Char methods

Char method      Description
IsControl        A control code: \u007F, or a character in the ranges \u0000-\u001F and \u0080-\u009F.
IsDigit          Any Unicode decimal digit. This includes 0-9 as well as the decimal digits of other scripts.
IsLetter         Any alphabetic letter.
IsNumber         Any Unicode number character: a decimal digit, a letter number such as a Roman numeral, or another number such as a fraction or superscript. The hexadecimal digits A-F are letters, not numbers.
IsPunctuation    Any punctuation character.
IsSeparator      A space separating words, a line separator, or a paragraph separator.
IsSurrogate      Any surrogate character in the range \uD800-\uDFFF.
IsSymbol         Any mathematical, currency, or other symbol character. Includes characters that modify surrounding characters.
IsWhiteSpace     Any space character and the following characters: \u0009, \u000A, \u000B, \u000C, \u000D, \u0085, \u2028, \u2029.
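IsDigit and IsNumber are defined by Unicode categories rather than by the ASCII range 0-9, which matters when validating input. The sketch below contrasts them with a plain ASCII range check and with Uri.IsHexDigit, an existing framework method that tests for hexadecimal digits. IsAsciiDigit and DigitDemo are hypothetical names introduced for this example.

```csharp
using System;

public static class DigitDemo
{
    // Hypothetical helper: true only for the ASCII digits '0' through '9'.
    public static bool IsAsciiDigit(char c)
    {
        return c >= '0' && c <= '9';
    }

    public static void Main()
    {
        char arabicThree = '\u0663';  // ARABIC-INDIC DIGIT THREE
        char oneHalf = '\u00BD';      // VULGAR FRACTION ONE HALF

        Console.WriteLine(Char.IsDigit(arabicThree));   // True
        Console.WriteLine(IsAsciiDigit(arabicThree));   // False

        Console.WriteLine(Char.IsDigit(oneHalf));       // False
        Console.WriteLine(Char.IsNumber(oneHalf));      // True

        // 'F' is a letter as far as Char is concerned; use Uri.IsHexDigit
        // to test for hexadecimal digits.
        Console.WriteLine(Char.IsNumber('F'));          // False
        Console.WriteLine(Uri.IsHexDigit('F'));         // True
    }
}
```

If only 0-9 should be accepted, for example before parsing with an invariant-culture parser, the explicit range check is the safer choice.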

The following code example determines whether the fifth character (the charPosition parameter is zero-based) in the string is a digit:

if (GetCharKindInString("abcdefg", 4) == CharKind.Digit) {...}

See Also

See the "Char Structure" topic in the MSDN documentation.
