C# in a Nutshell, 2nd Edition

6.6 Cookbook Regular Expressions

To wrap up this overview of how regular expressions are used in C# applications, here is a set of useful expressions that were originally used in other environments.^[1] Each snippet assumes a using System.Text.RegularExpressions; directive at the top of the file.

^[1] These expressions were taken from the Perl Cookbook by Tom Christiansen and Nathan Torkington (O'Reilly), and updated for the C# environment by Brad Merrill of Microsoft.

Matching roman numerals:

string p1 = "^m*(d?c{0,3}|c[dm])"
  + "(l?x{0,3}|x[lc])(v?i{0,3}|i[vx])$";
string t1 = "vii";
Match m1 = Regex.Match(t1, p1);
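The pattern above accepts only lowercase numerals, and because every part of it is optional it also matches the empty string. A minimal sketch of a reusable check, assuming you want uppercase input accepted and empty input rejected (the RomanNumerals class and IsRomanNumeral method are hypothetical names):

```csharp
using System.Text.RegularExpressions;

static class RomanNumerals
{
    // Same structure as p1; the (?=.) lookahead requires at least one
    // character, and IgnoreCase lets "XIV" match as well as "xiv".
    static readonly Regex Pattern = new Regex(
        "^(?=.)m*(d?c{0,3}|c[dm])(l?x{0,3}|x[lc])(v?i{0,3}|i[vx])$",
        RegexOptions.IgnoreCase);

    public static bool IsRomanNumeral(string s)
    {
        return Pattern.IsMatch(s);
    }
}
```

For example, IsRomanNumeral("MCMXCIX") is true, while IsRomanNumeral("iiii") and IsRomanNumeral("") are false.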

Swapping first two words:

string t2 = "the quick brown fox";
string p2 = @"(\S+)(\s+)(\S+)";
Regex x2 = new Regex(p2);
string r2 = x2.Replace(t2, "$3$2$1", 1);
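Numbered references such as $3$2$1 are easy to get wrong when a pattern changes. .NET also supports named groups, referenced in a replacement string as ${name}. A sketch of the same swap using them (WordSwap and SwapFirstTwo are illustrative names):

```csharp
using System.Text.RegularExpressions;

static class WordSwap
{
    // Named groups make the replacement self-describing:
    // ${second}${gap}${first} swaps the words and keeps the original spacing.
    static readonly Regex FirstTwoWords =
        new Regex(@"(?<first>\S+)(?<gap>\s+)(?<second>\S+)");

    public static string SwapFirstTwo(string text)
    {
        // The count argument of 1 limits the replacement to the first match,
        // so later word pairs are left alone.
        return FirstTwoWords.Replace(text, "${second}${gap}${first}", 1);
    }
}
```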

Matching "keyword = value" patterns:

string t3 = "myval = 3";
string p3 = @"(\w+)\s*=\s*(.*?)\s*$";
Match m3 = Regex.Match(t3, p3);
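The keyword and value end up in Groups[1] and Groups[2] of the match. A sketch that applies the same idea to a multi-line block of settings and collects the pairs into a dictionary (KeyValueParser is a hypothetical name). It uses [ \t]* rather than \s* so a match can never run across a line break, and a lazy (.*?) so trailing blanks are not included in the value:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class KeyValueParser
{
    // Multiline makes ^ and $ match at each line; \r? tolerates CRLF line endings.
    static readonly Regex Line = new Regex(
        @"^[ \t]*(\w+)[ \t]*=[ \t]*(.*?)[ \t]*\r?$",
        RegexOptions.Multiline);

    public static Dictionary<string, string> Parse(string text)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in Line.Matches(text))
            result[m.Groups[1].Value] = m.Groups[2].Value; // later keys overwrite earlier ones
        return result;
    }
}
```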

Matching lines of at least 80 characters:

string t4 = "********************"
  + "******************************"
  + "******************************";
string p4 = ".{80,}";
Match m4 = Regex.Match(t4, p4);
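Because . does not match a newline, .{80,} cannot run across a line break, and since the quantifier is greedy each match is a whole long line. A sketch that uses Regex.Matches to return every such line from a larger text (LongLineFinder is an illustrative name; lines ending in "\r\n" would keep their "\r"):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LongLineFinder
{
    static readonly Regex AtLeast80 = new Regex(".{80,}");

    public static List<string> Find(string text)
    {
        var lines = new List<string>();
        // Each match starts at the beginning of a line that is long enough
        // and extends to the end of that line.
        foreach (Match m in AtLeast80.Matches(text))
            lines.Add(m.Value);
        return lines;
    }
}
```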

Extracting date/time values (MM/DD/YY HH:MM:SS):

string t5 = "01/01/01 16:10:01";
string p5 =
  @"(\d+)/(\d+)/(\d+) (\d+):(\d+):(\d+)";
Match m5 = Regex.Match(t5, p5);
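The six captured groups hold month, day, year, hour, minute, and second as strings. A sketch that turns them into a DateTime, assuming for illustration that two-digit years mean 2000 to 2099 (DateTimeExtractor and Extract are hypothetical names):

```csharp
using System;
using System.Text.RegularExpressions;

static class DateTimeExtractor
{
    // \d{2} is stricter than \d+: each field must be exactly two digits.
    static readonly Regex Stamp =
        new Regex(@"(\d{2})/(\d{2})/(\d{2}) (\d{2}):(\d{2}):(\d{2})");

    // Returns null when no timestamp is found. Assumption: YY means 20YY.
    // The DateTime constructor throws if a field is out of range (e.g. month 13).
    public static DateTime? Extract(string text)
    {
        Match m = Stamp.Match(text);
        if (!m.Success) return null;
        int Field(int i) => int.Parse(m.Groups[i].Value);
        return new DateTime(2000 + Field(3), Field(1), Field(2),
                            Field(4), Field(5), Field(6));
    }
}
```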

Changing a directory name in a Windows path:

string t6 =
  @"C:\Documents and Settings\user1\Desktop\";
string r6 = Regex.Replace(t6,
  @"\\user1\\",
  @"\user2\");

Expanding (%nn) hex escapes:

string t7 = "%41"; // capital A
string p7 = "%([0-9A-Fa-f][0-9A-Fa-f])";
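// uses a MatchEvaluator delegate (HexConvert, defined below)
string r7 = Regex.Replace(t7, p7,
  HexConvert);

// one possible HexConvert: converts the two hex digits
// captured by group 1 into the character they encode
static string HexConvert(Match m)
{
  int code = Convert.ToInt32(m.Groups[1].Value, 16);
  return ((char)code).ToString();
}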

Deleting C comments (imperfectly):

string t8 = @"
/*
 * this is an old C-style comment block
 */
";
string p8 = @"
  /\*  # match the opening delimiter
  .*? # match as few characters as possible
  \*/ # match the closing delimiter
";
string r8 = Regex.Replace(t8, p8, "", RegexOptions.Singleline
             | RegexOptions.IgnorePatternWhitespace);
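The lazy .*? makes each comment end at the first */ rather than the last one in the input. The "imperfectly" in the heading is easy to demonstrate: the regex knows nothing about C syntax, so comment-like text inside a string literal is deleted too. A compact sketch of the same pattern without the whitespace-and-comments formatting (CommentStripper is an illustrative name):

```csharp
using System.Text.RegularExpressions;

static class CommentStripper
{
    // Singleline lets . match newlines, so multi-line comments are removed.
    static readonly Regex Comment =
        new Regex(@"/\*.*?\*/", RegexOptions.Singleline);

    public static string Strip(string source)
    {
        return Comment.Replace(source, "");
    }
}
```

Strip("int a; /* one */ int b; /* two */") gives "int a;  int b; ", but Strip("s = \"/* not a comment */\";") also gives "s = \"\";", which changes the program.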

Removing leading and trailing whitespace:

string t9a = "   leading";
string p9a = @"^\s+";
string r9a = Regex.Replace(t9a, p9a, "");
  
string t9b = "trailing  ";
string p9b = @"\s+$";
string r9b = Regex.Replace(t9b, p9b, "");
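The two patterns can be combined with an alternation so both ends are trimmed in one Replace call. A sketch (Trimmer and TrimBoth are illustrative names); for plain whitespace trimming, String.Trim() does the same job without a regex:

```csharp
using System.Text.RegularExpressions;

static class Trimmer
{
    // ^\s+ removes leading whitespace; \s+$ removes trailing whitespace.
    static readonly Regex Ends = new Regex(@"^\s+|\s+$");

    public static string TrimBoth(string s)
    {
        return Ends.Replace(s, "");
    }
}
```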

Turning "\" followed by "n" into a real newline:

string t10 = @"\ntest\n";
string r10 = Regex.Replace(t10, @"\\n", "\n");
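The same approach extends to several escape sequences at once by capturing the letter after the backslash and choosing the replacement in a MatchEvaluator. A sketch that handles only \n and \t and leaves any other backslash sequence untouched (EscapeExpander is a hypothetical helper). The framework's Regex.Unescape also converts character escapes such as \n, but it follows regex escape rules for everything else:

```csharp
using System.Text.RegularExpressions;

static class EscapeExpander
{
    // Group 1 is the letter after the backslash; the lambda converts
    // to a MatchEvaluator delegate.
    public static string Expand(string text)
    {
        return Regex.Replace(text, @"\\([nt])",
            m => m.Groups[1].Value == "n" ? "\n" : "\t");
    }
}
```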

Detecting IP addresses:

string t11 = "55.54.53.52";
string p11 = "^" +
  @"([01]?\d\d?|2[0-4]\d|25[0-5])\." +
  @"([01]?\d\d?|2[0-4]\d|25[0-5])\." +
  @"([01]?\d\d?|2[0-4]\d|25[0-5])\." +
  @"([01]?\d\d?|2[0-4]\d|25[0-5])" +
  "$";
Match m11 = Regex.Match(t11, p11);
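Once an address matches, the four octets are available as Groups[1] through Groups[4]. A sketch that validates and extracts them (IPv4 and Parse are illustrative names). Each octet accepts 250-255, 200-249, or 0-199 written with one to three digits; [0-9] is used instead of \d because \d in .NET also matches non-ASCII digits, which int.Parse would reject:

```csharp
using System.Text.RegularExpressions;

static class IPv4
{
    const string Octet = "([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])";
    static readonly Regex Address = new Regex(
        "^" + Octet + @"\." + Octet + @"\." + Octet + @"\." + Octet + "$");

    // Returns the four octets, or null if the text is not a dotted quad.
    public static int[] Parse(string text)
    {
        Match m = Address.Match(text);
        if (!m.Success) return null;
        var octets = new int[4];
        for (int i = 0; i < 4; i++)
            octets[i] = int.Parse(m.Groups[i + 1].Value);
        return octets;
    }
}
```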

Removing the leading path from a filename:

string t12 = @"c:\file.txt";
string p12 = @"^.*\\";
string r12 = Regex.Replace(t12, p12, "");

Joining lines in multiline strings:

string t13 = @"this is 
a split line";
string p13 = @"\s*\r?\n\s*";
string r13 = Regex.Replace(t13, p13, " ");

Extracting all numbers from a string:

string t14 = @"
test 1
test 2.3
test 47
";
string p14 = @"(\d+\.?\d*|\.\d+)";
MatchCollection mc14 = Regex.Matches(t14, p14);
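The MatchCollection holds the numbers as strings. A sketch that converts each match to a double (NumberExtractor is an illustrative name); CultureInfo.InvariantCulture keeps "." as the decimal separator regardless of the machine's regional settings:

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

static class NumberExtractor
{
    // Digits with an optional fractional part, or a bare fraction such as ".5".
    static readonly Regex Number = new Regex(@"\d+\.?\d*|\.\d+");

    public static List<double> Extract(string text)
    {
        var values = new List<double>();
        foreach (Match m in Number.Matches(text))
            values.Add(double.Parse(m.Value, CultureInfo.InvariantCulture));
        return values;
    }
}
```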

Finding all-caps words:

string t15 = "This IS a Test OF ALL Caps";
string p15 = @"(\b[^\Wa-z0-9_]+\b)";
MatchCollection mc15 = Regex.Matches(t15, p15);

Finding all lowercase words:

string t16 = "This is A Test of lowercase";
string p16 = @"(\b[^\WA-Z0-9_]+\b)";
MatchCollection mc16 = Regex.Matches(t16, p16);

Finding words with initial caps:

string t17 = "This is A Test of Initial Caps";
string p17 = @"(\b[^\Wa-z0-9_][^\WA-Z0-9_]*\b)";
MatchCollection mc17 = Regex.Matches(t17, p17);
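The three word-case patterns rely on the same trick: [^\Wa-z0-9_] means "a word character that is not a lowercase ASCII letter, a digit, or an underscore", which for ASCII text is an uppercase letter. Because \w in .NET includes Unicode letters, accented lowercase letters such as é also pass that class. A sketch that collects the matching words for each pattern (WordCase and its methods are illustrative names):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class WordCase
{
    public static List<string> AllCaps(string text) =>
        Collect(text, @"\b[^\Wa-z0-9_]+\b");
    public static List<string> AllLower(string text) =>
        Collect(text, @"\b[^\WA-Z0-9_]+\b");
    public static List<string> InitialCaps(string text) =>
        Collect(text, @"\b[^\Wa-z0-9_][^\WA-Z0-9_]*\b");

    static List<string> Collect(string text, string pattern)
    {
        var words = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern))
            words.Add(m.Value);
        return words;
    }
}
```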

Finding links in simple HTML:

string t18 = @"
<html>
<a href=""http://windows.oreilly.com/news/first.htm"">first tag text</a>
<a href=""http://windows.oreilly.com/news/next.htm"">next tag text</a>
</html>
";
string p18 = @"<A[^>]*?HREF\s*=\s*[""']?"
  + @"([^'"" >]+?)[ '""]?>";
MatchCollection mc18 = Regex.Matches(t18, p18, RegexOptions.IgnoreCase
          | RegexOptions.Singleline);
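Group 1 of each match holds the link target without its surrounding quotes. A sketch that gathers the targets into a list (LinkFinder is an illustrative name). Like the pattern itself, it is meant for simple, well-behaved HTML, not as a general HTML parser:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LinkFinder
{
    // Same pattern as p18: quotes around the HREF value are optional.
    static readonly Regex Href = new Regex(
        @"<A[^>]*?HREF\s*=\s*[""']?([^'"" >]+?)[ '""]?>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<string> Find(string html)
    {
        var urls = new List<string>();
        foreach (Match m in Href.Matches(html))
            urls.Add(m.Groups[1].Value);
        return urls;
    }
}
```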

Finding middle initials:

string t19 = "Hanley A. Strappman";
string p19 = @"^\S+\s+(\S)\S*\s+\S";
Match m19 = Regex.Match(t19, p19);

Changing inch marks to quotation marks:

string t20 = @"2' 2"" ";
string p20 = "\"([^\"]*)";
string r20 = Regex.Replace(t20, p20, "``$1''");
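Pattern p20 has no closing quote, so it consumes a double quote and everything after it up to (but not including) the next double quote or the end of the input. If the goal is instead to turn each pair of straight double quotes into TeX-style ``...'' quotes, a sketch that matches both quotes is shown here (TexQuotes and ToTex are illustrative names, and this reading of the recipe's intent is an assumption):

```csharp
using System.Text.RegularExpressions;

static class TexQuotes
{
    // Opening quote, the quoted text (group 1), and the closing quote.
    static readonly Regex Quoted = new Regex("\"([^\"]*)\"");

    public static string ToTex(string text)
    {
        // Keep the quoted text; replace the straight quotes with `` and ''.
        return Quoted.Replace(text, "``$1''");
    }
}
```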
