Recipe 8.10 Returning the Entire Line in Which a Match Is Found

Problem

You have a string or file that contains multiple lines. When a specific character pattern is found on a line, you want to return the entire line, not just the matched text.

Solution

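For a file, use the StreamReader.ReadLine method to read one line at a time and run the regular expression against each line. For a string, run the regular expression once against the whole string and then locate the line that contains each match: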

using System;
using System.Collections;
using System.IO;
using System.Text.RegularExpressions;

public static ArrayList GetLines(string source, string pattern, 
                                 bool isFileName)
{
    string text = source;
    ArrayList matchedLines = new ArrayList( );

    // If this is a file, read it one line at a time
    if (isFileName)
    {
        // Build the regex once rather than once per line
        Regex RE = new Regex(pattern, RegexOptions.Multiline);

        // The using statements close the reader and the stream even if
        // an exception is thrown while reading
        using (FileStream FS = new FileStream(source, FileMode.Open, 
                                       FileAccess.Read, FileShare.Read))
        using (StreamReader SR = new StreamReader(FS))
        {
            // ReadLine returns null at the end of the file
            while ((text = SR.ReadLine( )) != null)
            {
                // Keep the line if the regex matches anywhere in it
                if (RE.IsMatch(text))
                {
                    matchedLines.Add(text);
                }
            }
        }
    }
    else
    {
        // Run the regex once on the entire string
        Regex RE = new Regex(pattern, RegexOptions.Multiline);
        MatchCollection theMatches = RE.Matches(text);

        // Get the line for each match
        foreach (Match m in theMatches)
        {
            int lineStartPos = GetBeginningOfLine(text, m.Index);
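            // For a zero-length match (such as "^"), m.Index + m.Length - 1
            // falls before the match, so never start earlier than m.Index
            int lineEndPos = GetEndOfLine(text, 
                                 Math.Max(m.Index, m.Index + m.Length - 1));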
            string line = text.Substring(lineStartPos, 
                                         lineEndPos - lineStartPos);
            matchedLines.Add(line);
        }
    }

    return (matchedLines);
}

public static int GetBeginningOfLine(string text, int startPointOfMatch)
{
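    // A match at the very start of the text is already at the start of a line
    if (startPointOfMatch <= 0)
    {
        return (0);
    }

    // Begin one character before the match so that a match that starts
    // with '\n' is treated as part of the line it terminates
    --startPointOfMatch;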

    if (startPointOfMatch >= 0 && startPointOfMatch < text.Length)
    {
        // Move to the left until the first '\n' character is found
        for (int index = startPointOfMatch; index >= 0; index--)
        {
            if (text[index] == '\n')
            {
                return (index + 1);
            }
        }

        return (0);
    }

    return (startPointOfMatch);
}

public static int GetEndOfLine(string text, int endPointOfMatch)
{
    if (endPointOfMatch >= 0 && endPointOfMatch < text.Length)
    {
        // Move to the right until the first '\n' character is found
        for (int index = endPointOfMatch; index < text.Length; index++)
        {
            if (text[index] == '\n')
            {
                return (index);
            }
        }

        return (text.Length);
    }

    return (endPointOfMatch);
}

The following method shows how to call the GetLines method with either a filename or a string:

public static void TestGetLine( )
{
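    // Get each line within the file TestFile.txt as a separate string.
    // ReadLine strips the line terminator, so "\n" would never match here;
    // "^" matches at the start of every line instead.
    Console.WriteLine( );
    ArrayList lines = GetLines(@"C:\TestFile.txt", "^", true);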
    foreach (string s in lines)
        Console.WriteLine("MatchedLine: " + s);

    // Get the lines matching the text "Line" within the given string
    Console.WriteLine( );
    lines = GetLines("Line1\r\nLine2\r\nLine3\nLine4", "Line", false);
    foreach (string s in lines)
        Console.WriteLine("MatchedLine: " + s);
}

Discussion

The GetLines method accepts three parameters:

source

The string or filename in which to search for a pattern.

pattern

The regular expression pattern to apply to the source string.

isFileName

Pass in true if the source is a filename or false if source is a string.

This method returns an ArrayList of strings that contains each line in which the regular expression match was found.

The GetLines method obtains the lines on which matches occur, in either a string or a file. When a filename is passed in to the source parameter (isFileName equals true), the file is opened and read line by line. The regular expression is run against each line and, if a match is found, that line is stored in the matchedLines ArrayList. The ReadLine method of the StreamReader object returns each line without its line terminator, which saves you from having to determine where each line starts and ends. Determining where a line starts and ends in a string requires some work, as you shall see.

When a string is passed in to the source parameter (isFileName equals false), the regular expression is run once against the whole string, producing a MatchCollection. Each Match object in this collection is used to locate the line that contains it. The GetBeginningOfLine method starts one character before the first character of the match (unless the match is at the very beginning of the string) and moves left, one character at a time, until it finds a '\n' character or reaches the beginning of the source string. The resulting position is stored in the lineStartPos variable. The GetEndOfLine method then starts at the last character of the match and moves right until it finds a '\n' character or reaches the end of the source string; this position is stored in the lineEndPos variable. The text from lineStartPos up to, but not including, lineEndPos is the line in which the match was found. Because only '\n' is treated as a line terminator, a line that ends in "\r\n" keeps its trailing '\r'. Each line is added to the matchedLines ArrayList, which is returned to the caller.
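The boundary search described above can also be sketched with String.LastIndexOf and String.IndexOf instead of hand-written loops. This is only an illustration, not the recipe's code: the LineBounds class and LineContaining method are hypothetical names, and unlike GetLines this version trims the trailing '\r' left by "\r\n" line endings.

```csharp
using System;

public static class LineBounds
{
    // Hypothetical helper (not part of the recipe): returns the line of
    // 'text' that contains the match at matchIndex with length matchLength.
    public static string LineContaining(string text, int matchIndex,
                                        int matchLength)
    {
        // Start of line: one past the last '\n' before the match, or 0
        int start = matchIndex > 0
            ? text.LastIndexOf('\n', matchIndex - 1) + 1
            : 0;

        // End of line: the first '\n' at or after the last matched
        // character, or the end of the text if there is none
        int searchFrom = Math.Max(matchIndex, matchIndex + matchLength - 1);
        int end = searchFrom < text.Length
            ? text.IndexOf('\n', searchFrom)
            : -1;
        if (end < 0)
        {
            end = text.Length;
        }

        // Drop the '\r' left behind by "\r\n" line endings
        return text.Substring(start, end - start).TrimEnd('\r');
    }
}
```

Given a Match m found in text, a call such as LineBounds.LineContaining(text, m.Index, m.Length) would return the surrounding line.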

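One interesting use of the GetLines method is to pass the string "\n" in the pattern parameter when source is a string. Every line that ends in a newline is then returned as a separate string in the ArrayList; a final line with no trailing newline is not returned. This trick does not work when source is a filename, because ReadLine strips the line terminator from each line and "\n" therefore never matches. For a file, a pattern such as "^", which matches at the start of every line, returns every line instead.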
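The effect of ReadLine on line terminators can be checked without a file on disk. The following sketch reads from a MemoryStream through a StreamReader, which is an assumption made only to keep the example self-contained; ReadLineDemo and MatchLines are hypothetical names, and List<string> is used in place of ArrayList.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class ReadLineDemo
{
    // Hypothetical helper: returns the lines of 'content' that match
    // 'pattern', reading them through StreamReader.ReadLine
    public static List<string> MatchLines(string content, string pattern)
    {
        List<string> matched = new List<string>();
        Regex re = new Regex(pattern);

        using (MemoryStream ms =
                   new MemoryStream(Encoding.UTF8.GetBytes(content)))
        using (StreamReader sr = new StreamReader(ms))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                // 'line' never contains "\r" or "\n" at this point
                if (re.IsMatch(line))
                {
                    matched.Add(line);
                }
            }
        }

        return matched;
    }
}
```

With the content "a\r\nb\nc", the pattern "\n" matches no lines, while the pattern "^" matches all three.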

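Note that when source is a string and more than one match is found on a line, that line is added to the ArrayList once for each match. When source is a filename, a matching line is added only once, regardless of how many matches it contains.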
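If each line should appear only once when searching a string, one option is to remember where the most recently added line started and skip any later match that falls on the same line. The sketch below assumes that matches are enumerated in order of increasing index; DistinctLines and GetDistinctLines are hypothetical names, and List<string> is used in place of ArrayList.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DistinctLines
{
    // Hypothetical variant of the string branch of GetLines that returns
    // each matching line once, however many matches it contains
    public static List<string> GetDistinctLines(string text, string pattern)
    {
        List<string> lines = new List<string>();
        int lastLineStart = -1;

        foreach (Match m in Regex.Matches(text, pattern,
                                          RegexOptions.Multiline))
        {
            // Start of the line that contains the beginning of the match
            int start = m.Index > 0
                ? text.LastIndexOf('\n', m.Index - 1) + 1
                : 0;

            // Skip matches on a line that has already been recorded
            if (start == lastLineStart)
            {
                continue;
            }
            lastLineStart = start;

            // End of line: first '\n' at or after the last matched character
            int from = Math.Max(m.Index, m.Index + m.Length - 1);
            int end = from < text.Length ? text.IndexOf('\n', from) : -1;
            if (end < 0)
            {
                end = text.Length;
            }

            lines.Add(text.Substring(start, end - start));
        }

        return lines;
    }
}
```

As in the recipe, only '\n' is treated as a line terminator here, so lines that end in "\r\n" keep their trailing '\r'.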

See Also

See the ".NET Framework Regular Expressions," "FileStream Class," and "StreamReader Class" topics in the MSDN documentation.
