
Recipe 2.5 Finding the Location of All Occurrences of a String Within Another String

Problem

You need to search a string for every occurrence of a specific string. In addition, the case-sensitivity, or insensitivity, of the search needs to be controlled.

Solution

By calling IndexOf or IndexOfAny in a loop, we can determine how many occurrences of a character or string exist as well as their locations within the string. To find each occurrence of a string in another string using a case-sensitive search, use the following code:

using System;
using System.Collections;

public static int[] FindAll(string matchStr, string searchedStr, int startPos)
{
    int foundPos = -1;   // -1 represents not found
    int count = 0;
    ArrayList foundItems = new ArrayList( );

    do
    {
       foundPos = searchedStr.IndexOf(matchStr, startPos);
       if (foundPos > -1)
       {
          startPos = foundPos + 1;
          count++;
          foundItems.Add(foundPos);

          Console.WriteLine("Found item at position: " + foundPos.ToString( ));
       }
    }while (foundPos > -1 && startPos < searchedStr.Length);

    return ((int[])foundItems.ToArray(typeof(int)));
}

If the FindAll method is called with the following parameters:

int[] allOccurrences = FindAll("Red", "BlueTealRedredGreenRedYellow", 0);

the string "Red" is found at locations 8 and 19 in the string searchedStr. This code uses the IndexOf method inside a loop to iterate through each found matchStr string in the searchedStr string.

To find a case-sensitive character in a string, use the following code:

public static int[] FindAll(char matchChar, string searchedStr, int startPos)
{
    int foundPos = -1;   // -1 represents not found
    int count = 0;
    ArrayList foundItems = new ArrayList( );

    do
    {
       foundPos = searchedStr.IndexOf(matchChar, startPos);
       if (foundPos > -1)
       {
          startPos = foundPos + 1;
          count++;
          foundItems.Add(foundPos);

          Console.WriteLine("Found item at position: " + foundPos.ToString( ));
       }
    }while (foundPos > -1 && startPos < searchedStr.Length);

    return ((int[])foundItems.ToArray(typeof(int)));
}

If the FindAll method is called with the following parameters:

int[] allOccurrences = FindAll('r', "BlueTealRedredGreenRedYellow", 0);

the character 'r' is found at locations 11 and 15 in the string searchedStr. This code uses the IndexOf method inside a do loop to iterate through each found matchChar character in the searchedStr string. Overloading the FindAll method to accept either a char or a string avoids the cost of converting the char to a string before searching.

To find each case-insensitive occurrence of a string in another string, use the following code:

public static int[] FindAny(string matchStr, string searchedStr, int startPos)
{
    int foundPos = -1;   // -1 represents not found
    int count = 0;
    ArrayList foundItems = new ArrayList( );

    // Factor out case-sensitivity
    searchedStr = searchedStr.ToUpper( );
    matchStr = matchStr.ToUpper( );

    do
    {
       foundPos = searchedStr.IndexOf(matchStr, startPos);
       if (foundPos > -1)
       {
          startPos = foundPos + 1;
          count++;
          foundItems.Add(foundPos);

          Console.WriteLine("Found item at position: " + foundPos.ToString( ));
       }
    }while (foundPos > -1 && startPos < searchedStr.Length);

    return ((int[])foundItems.ToArray(typeof(int)));
}

If the FindAny method is called with the following parameters:

int[] allOccurrences = FindAny("Red", "BlueTealRedredGreenRedYellow", 0);

the string "Red" is found at locations 8, 11, and 19 in the string searchedStr. This code uses the IndexOf method inside a loop to iterate through each found matchStr string in the searchedStr string. The search is rendered case-insensitive by using the ToUpper method on both the searchedStr and the matchStr strings.
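Upper-casing both strings allocates two new strings on every call. As a sketch of an alternative, and assuming .NET Framework 2.0 or later (where IndexOf has an overload that accepts a StringComparison value), the comparison rule can be passed to IndexOf directly. The class name CaseInsensitiveSearch and the method name FindAnyOrdinal are hypothetical and are not part of the recipe:

```csharp
using System;
using System.Collections.Generic;

public static class CaseInsensitiveSearch
{
    // Hypothetical helper: returns every position at which matchStr occurs
    // in searchedStr, ignoring case, starting the search at startPos.
    public static int[] FindAnyOrdinal(string matchStr, string searchedStr, int startPos)
    {
        List<int> foundItems = new List<int>();

        // An empty match string would "match" at every position; return nothing instead.
        if (matchStr.Length == 0)
        {
            return foundItems.ToArray();
        }

        while (startPos < searchedStr.Length)
        {
            // OrdinalIgnoreCase compares character by character, ignoring case,
            // without creating upper-cased copies of either string.
            int foundPos = searchedStr.IndexOf(matchStr, startPos,
                                               StringComparison.OrdinalIgnoreCase);
            if (foundPos < 0)
            {
                break;   // no further matches
            }

            foundItems.Add(foundPos);
            startPos = foundPos + 1;   // step past the start of this match
        }

        return foundItems.ToArray();
    }
}
```

Called as CaseInsensitiveSearch.FindAnyOrdinal("Red", "BlueTealRedredGreenRedYellow", 0), this sketch reports the same three positions as the ToUpper version. It uses an ordinal comparison, so its results may differ from the culture-sensitive ToUpper approach for some non-ASCII text.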

To find every occurrence of any one of a set of characters in a string, use the following code:

public static int[] FindAny(char[] MatchCharArray, string searchedStr, int startPos)
{
    int foundPos = -1;   // -1 represents not found
    int count = 0;
    ArrayList foundItems = new ArrayList( );

    do
    {
       foundPos = searchedStr.IndexOfAny(MatchCharArray, startPos);
       if (foundPos > -1)
       {
          startPos = foundPos + 1;
          count++;
          foundItems.Add(foundPos);

          Console.WriteLine("Found item at position: " + foundPos.ToString( ));
       }
    }while (foundPos > -1 && startPos < searchedStr.Length);

    return ((int[])foundItems.ToArray(typeof(int)));
}

If the FindAny method is called with the following parameters:

int[] allOccurrences = FindAny(new char[] {'R', 'r'},
                               "BlueTealRedredGreenRedYellow", 0);

the characters 'r' or 'R' are found at locations 8, 11, 15, and 19 in the string searchedStr. This code uses the IndexOfAny method inside a loop to iterate through each occurrence of any character in MatchCharArray within the searchedStr string. The search is rendered case-insensitive by passing an array of char that contains both the upper- and lowercase forms of each character to be searched for.
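The upper- and lowercase pair does not have to be typed by hand. The following sketch builds the pair from a single character and then searches with IndexOfAny. It assumes .NET Framework 2.0 or later for char.ToUpperInvariant and char.ToLowerInvariant; the names CharCaseSearch and FindAnyCase are hypothetical:

```csharp
using System;
using System.Collections.Generic;

public static class CharCaseSearch
{
    // Hypothetical helper: finds every position of matchChar in searchedStr,
    // matching both its uppercase and lowercase forms.
    public static int[] FindAnyCase(char matchChar, string searchedStr, int startPos)
    {
        // Build the two-element search set from the single character.
        char[] variants =
        {
            char.ToUpperInvariant(matchChar),
            char.ToLowerInvariant(matchChar)
        };

        List<int> foundItems = new List<int>();

        while (startPos < searchedStr.Length)
        {
            int foundPos = searchedStr.IndexOfAny(variants, startPos);
            if (foundPos < 0)
            {
                break;   // neither form occurs again
            }

            foundItems.Add(foundPos);
            startPos = foundPos + 1;   // continue after this character
        }

        return foundItems.ToArray();
    }
}
```

For the sample string used above, CharCaseSearch.FindAnyCase('r', "BlueTealRedredGreenRedYellow", 0) yields the positions 8, 11, 15, and 19.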

Discussion

In the example code, the foundPos variable contains the location of the found character/string within the searchedStr string. The startPos variable contains the next position in which to start the search. The IndexOf or IndexOfAny method is used to perform the actual searching. The count variable simply counts the number of times the character/string was found in the searchedStr string.

The example uses a do loop so that the IndexOf or IndexOfAny operation executes at least once before the check in the while clause determines whether any more character/string matches remain in the searchedStr string. The loop terminates when foundPos is -1 (meaning that no more characters/strings can be found in the searchedStr string) or when startPos reaches the end of the searchedStr string. The second check guards against an out-of-bounds condition: IndexOf and IndexOfAny throw an ArgumentOutOfRangeException when the starting position is greater than the length of the string. To prevent this, always check that any positioning variables that are modified inside the loop, such as the startPos variable, are within their intended bounds.
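The boundary behavior of the starting position can be checked with a small probe. This is only an illustrative sketch; the class name StartIndexBounds and the method name Throws are hypothetical:

```csharp
using System;

public static class StartIndexBounds
{
    // Hypothetical probe: returns true if IndexOf rejects the given
    // starting position with an ArgumentOutOfRangeException.
    public static bool Throws(string searchedStr, char matchChar, int startPos)
    {
        try
        {
            searchedStr.IndexOf(matchChar, startPos);
            return false;   // the call completed normally
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;    // startPos was outside the allowed range
        }
    }
}
```

For the three-character string "abc", a starting position of 3 (equal to the length) is accepted and the search simply returns -1, while a starting position of 4 throws. The startPos < searchedStr.Length test in the while clause keeps the loop from ever reaching that second case.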

Once a match is found by the IndexOf or IndexOfAny method, the if statement body is executed to increment the count variable by one and to move the startPos up past the previously found match. The count variable is incremented by one to indicate that another match was found. The startPos is increased to the starting position of the last match found plus 1. Adding 1 is necessary so that we do not keep matching the same character/string that was previously matched, which would cause an infinite loop to occur in the code if at least one match was found in the searchedStr string. To see this behavior, remove the +1 from the code.

There is one potential problem with this code. Consider the case where:

searchedStr = "aaaa";
matchStr = "aa";

The code contained in this recipe would match "aa" three times.

(aa)aa
a(aa)a
aa(aa)

This situation may be fine for some applications, but not if you need it to return only the following matches:

(aa)aa
aa(aa)

To do this, change the following line in the body of the do loop:

startPos = foundPos + 1;

to this:

startPos = foundPos + matchStr.Length;

This code moves the startPos pointer beyond the first matched string, disallowing any internal matches.
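The difference between the two step sizes can be seen side by side in the following sketch. The allowOverlap parameter and the OverlapDemo class are hypothetical additions made for illustration, and the sketch uses an ordinal comparison (available in .NET Framework 2.0 and later) rather than the culture-sensitive IndexOf call used in the recipe:

```csharp
using System;
using System.Collections.Generic;

public static class OverlapDemo
{
    // Hypothetical variant of FindAll: allowOverlap selects whether the next
    // search starts one character after the match or just past its end.
    public static int[] FindAll(string matchStr, string searchedStr, bool allowOverlap)
    {
        List<int> foundItems = new List<int>();

        // Guard against an empty match string, which would give a step of 0
        // in the non-overlapping case and never advance.
        if (matchStr.Length == 0)
        {
            return foundItems.ToArray();
        }

        int step = allowOverlap ? 1 : matchStr.Length;
        int startPos = 0;

        while (startPos < searchedStr.Length)
        {
            int foundPos = searchedStr.IndexOf(matchStr, startPos, StringComparison.Ordinal);
            if (foundPos < 0)
            {
                break;   // no further matches
            }

            foundItems.Add(foundPos);
            startPos = foundPos + step;
        }

        return foundItems.ToArray();
    }
}
```

Searching "aaaa" for "aa" with allowOverlap set to true gives positions 0, 1, and 2; with allowOverlap set to false it gives positions 0 and 2.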

To convert this code to use a while loop rather than a do loop, initialize the foundPos variable to 0 so that the loop condition is true on the first pass, and write the loop as follows:

while (foundPos >= 0 && startPos < searchedStr.Length)
{
   foundPos = searchedStr.IndexOf(matchChar, startPos);
   if (foundPos > -1)
   {
      startPos = foundPos + 1;
      count++;
   }
}

See Also

See the "String.IndexOf Method" and "String.IndexOfAny Method" topics in the MSDN documentation.
