Recipe 8.10 Returning the Entire Line in Which a Match Is Found
Problem
You have a string or file that
contains multiple lines. When a specific character pattern is found
on a line, you want to return the entire line, not just the matched
text.
Solution
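Use the
StreamReader.ReadLine method to read a file one line
at a time and run a regular expression against each line. For a
string, run the regular expression once and expand each match to the
boundaries of the line that contains it: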
// Namespaces required by the methods in this recipe
using System;
using System.Collections;
using System.IO;
using System.Text.RegularExpressions;
public static ArrayList GetLines(string source, string pattern,
                                 bool isFileName)
{
    ArrayList matchedLines = new ArrayList( );
    // Create the regex once and reuse it for every line or match
    Regex RE = new Regex(pattern, RegexOptions.Multiline);
    if (isFileName)
    {
        // source is a filename: read the file one line at a time.
        // The using blocks close the file even if an exception is thrown.
        using (FileStream FS = new FileStream(source, FileMode.Open,
                                              FileAccess.Read, FileShare.Read))
        using (StreamReader SR = new StreamReader(FS))
        {
            string text;
            while ((text = SR.ReadLine( )) != null)
            {
                // Keep the line if the regex matches anywhere in it
                if (RE.IsMatch(text))
                {
                    matchedLines.Add(text);
                }
            }
        }
    }
    else
    {
        // source is the text itself: run the regex once on the entire string
        MatchCollection theMatches = RE.Matches(source);
        // Get the line for each match
        foreach (Match m in theMatches)
        {
            int lineStartPos = GetBeginningOfLine(source, m.Index);
            // A zero-length match has no last character, so use its position
            int lineEndPos = GetEndOfLine(source,
                                          m.Index + Math.Max(m.Length - 1, 0));
            string line = source.Substring(lineStartPos,
                                           lineEndPos - lineStartPos);
            // Drop the '\r' left behind by a "\r\n" line ending
            matchedLines.Add(line.TrimEnd('\r'));
        }
    }
    return (matchedLines);
}
public static int GetBeginningOfLine(string text, int startPointOfMatch)
{
    // Clamp the position so the scan never indexes past the end of the string
    if (startPointOfMatch > text.Length)
    {
        startPointOfMatch = text.Length;
    }
    // Start at the character before the match and move to the left
    // until the first '\n' char is found
    for (int index = startPointOfMatch - 1; index >= 0; index--)
    {
        if (text[index] == '\n')
        {
            // The line begins just after the '\n'
            return (index + 1);
        }
    }
    // No '\n' to the left: the line begins at the start of the string
    return (0);
}
public static int GetEndOfLine(string text, int endPointOfMatch)
{
    // A zero-length match at position 0 can produce -1; scan from 0 instead
    if (endPointOfMatch < 0)
    {
        endPointOfMatch = 0;
    }
    // Move to the right until the first '\n' char is found
    for (int index = endPointOfMatch; index < text.Length; index++)
    {
        if (text[index] == '\n')
        {
            return (index);
        }
    }
    // No '\n' to the right: the line ends at the end of the string
    return (text.Length);
}
The following method shows how to call the
GetLines method with either a filename or a
string:
public static void TestGetLine( )
{
    // Get each line within the file TestFile.txt as a separate string.
    // ReadLine strips the line terminator, so match on "^" rather than "\n".
    Console.WriteLine( );
    ArrayList lines = GetLines(@"C:\TestFile.txt", "^", true);
    foreach (string s in lines)
        Console.WriteLine("MatchedLine: " + s);
    // Get the lines matching the text "Line" within the given string
    Console.WriteLine( );
    lines = GetLines("Line1\r\nLine2\r\nLine3\nLine4", "Line", false);
    foreach (string s in lines)
        Console.WriteLine("MatchedLine: " + s);
}
Discussion
The GetLines method accepts three parameters:
source
The string or filename in which to search for a pattern.
pattern
The regular expression pattern to apply to the
source string.
isFileName
Pass in true if the
source is a filename or
false if source is a
string.
This method returns an ArrayList of strings that
contains each line in which the regular expression match was found.
The GetLines method can obtain the lines on which
matches occur, within a string or a file. When running a regular
expression against a file whose name is passed in to the
source parameter (when
isFileName equals true)
in the GetLines method, the file is opened and
read line-by-line. The regular expression is run against each line
and if a match is found, that line is stored in the
matchedLines ArrayList. Using
the ReadLine method of the
StreamReader object saves us from having to
determine where each line starts and ends. Determining where a line
starts and ends in a string requires some work, as you shall see.
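The file-reading approach described above can be sketched on its own. This is a minimal illustration, not the recipe's code: the class name LineReaderSketch and the method MatchLines are hypothetical, and the method takes a TextReader (the base class of StreamReader) so that it can be tried with a StringReader instead of a real file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class LineReaderSketch
{
    // Hypothetical helper: returns every line read from 'reader'
    // that the pattern matches somewhere.
    public static List<string> MatchLines(TextReader reader, string pattern)
    {
        var matched = new List<string>();
        var regex = new Regex(pattern);   // created once, reused for each line
        string line;
        // ReadLine returns null at end of input and strips "\r", "\n", or "\r\n"
        while ((line = reader.ReadLine()) != null)
        {
            if (regex.IsMatch(line))
            {
                matched.Add(line);
            }
        }
        return matched;
    }
}
```

Calling LineReaderSketch.MatchLines(new StringReader("Line1\r\nfoo\nLine3"), "Line") should yield the two lines Line1 and Line3, with no line terminators attached.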
Running the regular expression against a string passed in to the
source parameter (when
isFileName equals
false) in the GetLines method
produces a MatchCollection. Each
Match object in this collection is used to obtain
the line on which it is located in the
source string. The line is obtained by
starting at the position of the first character of the match in the
source string and moving one character to
the left until either a '\n' character is found or
the beginning of the source string is
found (this code is found in the
GetBeginningOfLine method). This gives you the
beginning of the line, which is placed in the variable
lineStartPos. Next, the end of the line is found
by starting at the last character of the match in the
source string and moving to the right
until either a '\n' character is found or the end
of the source string is found (this code
is found in the GetEndOfLine method). This ending
position is placed in the lineEndPos variable. All
of the text from lineStartPos up to, but not including,
lineEndPos is the line in which the match is
found. Each of these lines is added to the
matchedLines ArrayList and
returned to the caller.
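The boundary search described above can also be written with the built-in String.LastIndexOf and String.IndexOf methods instead of hand-written loops. The following is a sketch under that assumption, not the recipe's implementation; the names LineBoundsSketch and LineAt are hypothetical.

```csharp
using System;

public static class LineBoundsSketch
{
    // Hypothetical helper: returns the line of 'text' containing the match
    // that starts at matchIndex and is matchLength characters long.
    public static string LineAt(string text, int matchIndex, int matchLength)
    {
        // Line start: one past the nearest '\n' before the match, or 0 if none.
        // LastIndexOf returns -1 when nothing is found, so adding 1 gives 0.
        int start = matchIndex > 0
            ? text.LastIndexOf('\n', matchIndex - 1) + 1
            : 0;
        // Position of the last character of the match (or the match
        // position itself for a zero-length match).
        int last = matchIndex + Math.Max(matchLength - 1, 0);
        // Line end: the first '\n' at or after that position, or end of text.
        int end = last < text.Length ? text.IndexOf('\n', last) : -1;
        if (end < 0)
        {
            end = text.Length;
        }
        return text.Substring(start, end - start);
    }
}
```

For the text "alpha\nbeta gamma\ndelta", the word gamma starts at index 11 and is 5 characters long, so LineAt(text, 11, 5) finds a line start of 6 and a line end of 16 and returns "beta gamma".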
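Something interesting you can do
with the GetLines method is to pass in the string
"\n" in the pattern parameter when source is a string. This
trick returns each newline-terminated line of the string as a
separate string in the ArrayList; a final line that does not end
in '\n' is not returned. The trick does not work when source is a
filename, because StreamReader.ReadLine strips the line terminator
before the regular expression runs, so "\n" never matches. For a
file, use a pattern that matches every line, such as "^".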
Note that when source is a string and more than one match is found
on a line, that line is added to the ArrayList once for each match.
When source is a filename, each matching line is added only once.
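If repeated lines are not wanted when searching a string, one option is to remember where the previously added line began and skip any match that falls on the same line. The sketch below assumes that matches do not span line breaks; the names UniqueLinesSketch and GetUniqueLines are hypothetical and are not part of the recipe.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UniqueLinesSketch
{
    // Hypothetical helper: returns each matching line of 'text' only once,
    // in the order in which the lines appear.
    public static List<string> GetUniqueLines(string text, string pattern)
    {
        var lines = new List<string>();
        int lastLineStart = -1;   // start of the most recently added line
        foreach (Match m in Regex.Matches(text, pattern, RegexOptions.Multiline))
        {
            // Line start: one past the nearest '\n' before the match, or 0
            int start = m.Index > 0
                ? text.LastIndexOf('\n', m.Index - 1) + 1
                : 0;
            // Matches arrive in order of position, so a repeated line start
            // means this line has already been added
            if (start == lastLineStart)
            {
                continue;
            }
            lastLineStart = start;
            // Line end: the first '\n' at or after the end of the match
            int last = m.Index + Math.Max(m.Length - 1, 0);
            int end = last < text.Length ? text.IndexOf('\n', last) : -1;
            if (end < 0)
            {
                end = text.Length;
            }
            lines.Add(text.Substring(start, end - start));
        }
        return lines;
    }
}
```

With the text "a1 a2\nb\na3" and the pattern "a\\d", the first line contains two matches but is returned once, so the result holds "a1 a2" followed by "a3".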
See Also
See the ".NET Framework Regular
Expressions," "FileStream
Class," and "StreamReader
Class" topics in the MSDN documentation.