Recipe 8.2 Extracting Groups from a MatchCollection

Problem

You have a regular expression that contains one or more named groups, such as the following:

\\\\(?<TheServer>\w*)\\(?<TheService>\w*)\\

where the named group TheServer will match any server name within a UNC string, and TheService will match any service name within a UNC string.

You need to store the groups that are returned by this regular expression in a keyed collection (such as a Hashtable) in which the key is the group name.

Solution

The RegExUtilities class contains a method, ExtractGroupings, that returns one Hashtable per match, each holding that match's Group objects keyed by group name:

using System;
using System.Collections;
using System.Text.RegularExpressions;

public static ArrayList ExtractGroupings(string source, 
                                         string matchPattern,
                                         bool wantInitialMatch)
{
    ArrayList keyedMatches = new ArrayList( );
    int startingElement = 1;
    if (wantInitialMatch)
    {
        startingElement = 0;
    }

    Regex RE = new Regex(matchPattern, RegexOptions.Multiline);
    MatchCollection theMatches = RE.Matches(source);

    foreach(Match m in theMatches)
    {
        Hashtable groupings = new Hashtable( );

        for (int counter = startingElement; 
          counter < m.Groups.Count; counter++)
        {
            // If we had just returned the MatchCollection directly, the
            //  GroupNameFromNumber method would not be available to use
            groupings.Add(RE.GroupNameFromNumber(counter),
                           m.Groups[counter]);
        }

        keyedMatches.Add(groupings);
    }

    return (keyedMatches);
}

The ExtractGroupings method can be used in the following manner to extract named groups and organize them by name:

public static void TestExtractGroupings( )
{
    string source = @"Path = ""\\MyServer\MyService\MyPath;
                              \\MyServer2\MyService2\MyPath2\""";
    string matchPattern = @"\\\\(?<TheServer>\w*)\\(?<TheService>\w*)\\";

    foreach (Hashtable grouping in 
             ExtractGroupings(source, matchPattern, true))
    {
        foreach (DictionaryEntry DE in grouping)
            Console.WriteLine("Key / Value = " + DE.Key + " / " + 
                              DE.Value);
        Console.WriteLine("");
    }
}

This test method creates a source string and a regular expression pattern in the matchPattern variable. The two named groups in this regular expression are (?<TheServer>\w*) and (?<TheService>\w*):

string matchPattern = @"\\\\(?<TheServer>\w*)\\(?<TheService>\w*)\\";

These two groups are named TheServer and TheService. Text matched by either group can be accessed through its group name.

The source and matchPattern variables are passed in to the ExtractGroupings method, along with a Boolean value, which we will discuss shortly. This method returns an ArrayList containing Hashtable objects. These Hashtable objects contain the matches for each of the named groups in the regular expression, keyed by their group name.
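Because each Hashtable is keyed by group name, a caller can look up a group directly instead of enumerating every entry. The following is a minimal sketch of that idea, not part of the original recipe: the class name KeyedLookupSketch and the Describe helper are hypothetical, and ExtractGroupings is repeated in condensed form only so that the example compiles on its own.

```csharp
using System;
using System.Collections;
using System.Text.RegularExpressions;

public static class KeyedLookupSketch
{
    // Condensed copy of the recipe's ExtractGroupings method.
    public static ArrayList ExtractGroupings(string source,
                                             string matchPattern,
                                             bool wantInitialMatch)
    {
        ArrayList keyedMatches = new ArrayList();
        int startingElement = wantInitialMatch ? 0 : 1;
        Regex re = new Regex(matchPattern, RegexOptions.Multiline);

        foreach (Match m in re.Matches(source))
        {
            Hashtable groupings = new Hashtable();
            for (int counter = startingElement;
                 counter < m.Groups.Count; counter++)
            {
                groupings.Add(re.GroupNameFromNumber(counter),
                              m.Groups[counter]);
            }
            keyedMatches.Add(groupings);
        }
        return keyedMatches;
    }

    // Hypothetical helper: looks up both named groups by key.
    // The Hashtable stores Group objects, so a cast is needed.
    public static string Describe(Hashtable grouping)
    {
        Group server = (Group)grouping["TheServer"];
        Group service = (Group)grouping["TheService"];
        return server.Value + " -> " + service.Value;
    }

    public static void Main()
    {
        string source =
            @"\\MyServer\MyService\MyPath; \\MyServer2\MyService2\MyPath2\";
        string matchPattern =
            @"\\\\(?<TheServer>\w*)\\(?<TheService>\w*)\\";

        foreach (Hashtable grouping in
                 ExtractGroupings(source, matchPattern, false))
        {
            Console.WriteLine(Describe(grouping));
        }
    }
}
```

With this input, the sketch should print one line per UNC path, pairing each server name with its service name.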

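This test method, TestExtractGroupings, displays the following (a Hashtable does not guarantee enumeration order, so the lines within each block may appear in a different order):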

Key / Value = 0 / \\MyServer\MyService\
Key / Value = TheService / MyService
Key / Value = TheServer / MyServer

Key / Value = 0 / \\MyServer2\MyService2\
Key / Value = TheService / MyService2
Key / Value = TheServer / MyServer2

If the last parameter to the ExtractGroupings method were to be changed to false, the following output would result:

Key / Value = TheService / MyService
Key / Value = TheServer / MyServer

Key / Value = TheService / MyService2
Key / Value = TheServer / MyServer2

The only difference between these two outputs is that the first grouping (key 0) is not displayed when the last parameter to ExtractGroupings is changed to false. The first grouping is always the complete match of the regular expression.

Discussion

Groups within a regular expression can be defined in one of two ways. The first way is to add parentheses around the subpattern that you wish to define as a grouping. This type of grouping is sometimes labeled as unnamed. This grouping can later be easily extracted from the final text in each Match object returned by running the regular expression. The regular expression for this recipe could be modified, as follows, to use a simple unnamed group:

string matchPattern = @"\\\\(\w*)\\(\w*)\\";

After running the regular expression, you can access these groups by integer index, starting at 1. Index 0 always holds the complete match.
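As an illustration of numbered access, the sketch below walks every group index of each match produced by the unnamed pattern. The class name UnnamedGroupSketch and the Describe helper are hypothetical names chosen for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnnamedGroupSketch
{
    // Formats every group of one match as "index: value", index 0 first.
    public static string Describe(Match m)
    {
        string[] parts = new string[m.Groups.Count];
        for (int i = 0; i < m.Groups.Count; i++)
        {
            parts[i] = i + ": " + m.Groups[i].Value;
        }
        return string.Join(" | ", parts);
    }

    public static void Main()
    {
        string source =
            @"\\MyServer\MyService\MyPath; \\MyServer2\MyService2\MyPath2";
        string matchPattern = @"\\\\(\w*)\\(\w*)\\";

        foreach (Match m in Regex.Matches(source, matchPattern))
        {
            // Index 0 is the whole match; 1 and 2 are the two
            // parenthesized subpatterns, in left-to-right order.
            Console.WriteLine(Describe(m));
        }
    }
}
```

For the first UNC path, index 0 should hold \\MyServer\MyService\, index 1 MyServer, and index 2 MyService.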

The second way to define a group within a regular expression is to use one or more named groups. A named group is defined by adding parentheses around the subpattern that you wish to define as a grouping and, additionally, adding a named value to each grouping, using the following syntax:

(?<Name>\w*)

The Name portion of this syntax is the name you specify for this group. After executing this regular expression, you can access this group by the name Name.
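A named group is still assigned a number, so it can be reached either way. The sketch below, with the hypothetical class name NamedGroupSketch and helper ServiceByNumber, uses the Regex methods GroupNumberFromName and GetGroupNames to show the relationship for this recipe's pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupSketch
{
    private const string Pattern =
        @"\\\\(?<TheServer>\w*)\\(?<TheService>\w*)\\";

    // Hypothetical helper: reads the TheService group through its number
    // rather than its name, to show that both routes reach the same group.
    public static string ServiceByNumber(string source)
    {
        Regex re = new Regex(Pattern);
        Match m = re.Match(source);
        int number = re.GroupNumberFromName("TheService");
        return m.Groups[number].Value;
    }

    public static void Main()
    {
        Regex re = new Regex(Pattern);
        Match m = re.Match(@"\\MyServer\MyService\MyPath");

        // Access by name.
        Console.WriteLine(m.Groups["TheServer"].Value);

        // Access the other group by the number assigned to its name.
        Console.WriteLine(ServiceByNumber(@"\\MyServer\MyService\MyPath"));

        // List every group name the expression defines; the complete
        // match is reported under the name "0".
        Console.WriteLine(string.Join(", ", re.GetGroupNames()));
    }
}
```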

To access each group, you must first use a loop to iterate over each Match object in the MatchCollection. For each Match object, you access the GroupCollection's indexer, using the following numbered (unnamed) syntax:

string group1 = m.Groups[1].Value;
string group2 = m.Groups[2].Value;

or the following named syntax where m is the Match object:

string group1 = m.Groups["Group1_Name"].Value;
string group2 = m.Groups["Group2_Name"].Value;

If the Match method was used to return a single Match object instead of the MatchCollection, use the following syntax to access each group:

// Unnamed syntax
string group1 = theMatch.Groups[1].Value;
string group2 = theMatch.Groups[2].Value;

// Named syntax (an alternative to the lines above; the two forms
//  declare the same variables, so use one form or the other)
string group1 = theMatch.Groups["Group1_Name"].Value;
string group2 = theMatch.Groups["Group2_Name"].Value;

where theMatch is the Match object returned by the Match method.
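When working with a single Match object, it is usually worth checking its Success property before reading any groups, because a failed match still returns a Match object. The following sketch assumes the recipe's pattern; the class name SingleMatchSketch and the helper ServerOrEmpty are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SingleMatchSketch
{
    private const string Pattern =
        @"\\\\(?<TheServer>\w*)\\(?<TheService>\w*)\\";

    // Hypothetical helper: returns the server name from the first match,
    // or an empty string when the pattern does not match at all.
    public static string ServerOrEmpty(string source)
    {
        Match theMatch = Regex.Match(source, Pattern);
        if (!theMatch.Success)
        {
            return string.Empty;
        }
        return theMatch.Groups["TheServer"].Value;
    }

    public static void Main()
    {
        // A source containing a UNC path yields the server name.
        Console.WriteLine(ServerOrEmpty(@"\\MyServer\MyService\MyPath"));

        // A source without a UNC path yields an empty string
        // rather than an exception.
        Console.WriteLine(ServerOrEmpty("no UNC path here").Length);
    }
}
```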

See Also

See the ".NET Framework Regular Expressions" and "Hashtable Class" topics in the MSDN documentation.
