
Recipe 2.22 Improving StringBuilder Performance

Problem

In an attempt to improve string-handling performance, you have converted your code to use the StringBuilder class. However, this change has not improved performance as much as you had hoped.

Solution

The chief advantage of a StringBuilder object over a string object is that it preallocates memory in an internal buffer in which a string value can expand and contract. When that buffer fills up, however, .NET must allocate a new, larger buffer. You can reduce how often this happens by explicitly setting the buffer's capacity, using either of two techniques. The first approach is to set the capacity when the StringBuilder constructor is called. For example, the code:

StringBuilder sb = new StringBuilder(200);

specifies that a StringBuilder object can hold 200 characters before new memory must be allocated.

The second approach is to change the capacity after the StringBuilder object has been created, using either the Capacity property or the EnsureCapacity method:

sb.Capacity = 200;
sb.EnsureCapacity(200);
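Here is a minimal sketch of the three techniques side by side. The class and method names are hypothetical and chosen for illustration only. Each method returns a builder that can hold at least 200 characters before its buffer must grow.

```csharp
using System;
using System.Text;

// Hypothetical demo class: three ways to reserve room for about 200 characters.
public static class CapacityDemo
{
    // Technique 1: set the capacity in the constructor.
    public static StringBuilder ViaConstructor()
    {
        return new StringBuilder(200);
    }

    // Technique 2a: start with the default capacity, then set Capacity.
    public static StringBuilder ViaCapacityProperty()
    {
        var sb = new StringBuilder();   // default capacity (16 characters)
        sb.Capacity = 200;              // buffer resized to hold 200 characters
        return sb;
    }

    // Technique 2b: EnsureCapacity grows the buffer only if it is too small.
    public static StringBuilder ViaEnsureCapacity()
    {
        var sb = new StringBuilder();
        sb.EnsureCapacity(200);         // returns the resulting capacity (ignored here)
        return sb;
    }
}
```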

Discussion

As noted in previous recipes in this chapter, the string class is immutable: once a string object has been created, its contents cannot be changed in any way. Changing the contents of a string variable therefore entails creating a new string object that contains the modified text, and the string variable must then be changed to reference this newly created object. If nothing else references the old string object, it becomes eligible for garbage collection, and its memory will eventually be freed. Because of this behind-the-scenes work, code that performs intensive string manipulation with the string class pays for a new string object on every modification, and the garbage collector comes under greater pressure to reclaim unused objects more frequently.
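The following sketch makes the immutability point concrete. Its names are hypothetical. It shows that "modifying" a string variable actually creates a new object and leaves any other reference to the old string untouched.

```csharp
using System;

public static class ImmutabilityDemo
{
    // Returns true when "modifying" a string variable produced a new object
    // and left the original string object untouched.
    public static bool ConcatCreatesNewString()
    {
        string original = "abc";
        string alias = original;   // a second reference to the same object
        original += "def";         // builds a new string "abcdef"; "abc" is unchanged
        return alias == "abc" && !ReferenceEquals(alias, original);
    }

    // In general, each += creates a new, longer string, and the previous
    // string becomes unreachable (garbage) once the variable moves on.
    public static string BuildWithConcat(int count)
    {
        string s = string.Empty;
        for (int i = 0; i < count; i++)
        {
            s += "x";
        }
        return s;
    }
}
```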

The StringBuilder class solves this problem by preallocating an internal buffer to hold a string, whose contents are then manipulated directly. Operations such as Append, Insert, Remove, and Replace modify this buffer in place. As a result, they do not carry the cost of creating a whole new string or StringBuilder object, and they do not fill the managed heap with many unused objects. A new string object is created when you call ToString to retrieve the result.
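By contrast, StringBuilder's mutating methods work on the same instance. In this sketch, whose names are illustrative, Append, Insert, and Replace each return the builder they were called on. Chaining them therefore never creates a new StringBuilder, and only the final ToString call produces a new string.

```csharp
using System;
using System.Text;

public static class InPlaceDemo
{
    public static (string Text, bool SameInstance) AppendInPlace()
    {
        var sb = new StringBuilder("abc", 32);
        // Each call edits sb's own buffer and returns sb itself.
        StringBuilder returned = sb.Append("def").Insert(0, ">").Replace("c", "C");
        // ToString is the point where a new string object is allocated.
        return (sb.ToString(), ReferenceEquals(returned, sb));
    }
}
```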

There is one caveat to using the StringBuilder class that, if not heeded, can impede performance. The StringBuilder class uses a default initial capacity to hold the characters of a string, unless you override it through one of the StringBuilder constructors. Once this space is exceeded (by appending characters, for instance), a new buffer twice the size of the original is allocated. For example, a StringBuilder object with an initial capacity of 20 characters would grow to 40 characters, then to 80 characters, and so on. The string held in the original internal buffer is then copied into the newly allocated buffer, along with any appended or inserted characters. (Starting with .NET Framework 4, the buffer is implemented as a linked list of chunks, so existing characters are no longer copied when it grows, but each growth step still allocates a new block of memory.)
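One way to observe this growth is to watch the Capacity property while appending. The sketch below appends one character at a time and counts how often Capacity changes; the GrowthCounter name is hypothetical. The exact growth sizes depend on the .NET version, so treat the printed numbers as illustrative.

```csharp
using System;
using System.Text;

public static class GrowthCounter
{
    // Appends totalChars characters one at a time and counts how many
    // times Capacity changed, i.e. how often the buffer had to grow.
    public static int CountGrowths(int initialCapacity, int totalChars)
    {
        var sb = new StringBuilder(initialCapacity);
        int growths = 0;
        int lastCapacity = sb.Capacity;
        for (int i = 0; i < totalChars; i++)
        {
            sb.Append('x');
            if (sb.Capacity != lastCapacity)
            {
                growths++;
                Console.WriteLine($"Length {sb.Length}: capacity {lastCapacity} -> {sb.Capacity}");
                lastCapacity = sb.Capacity;
            }
        }
        return growths;
    }
}
```

Calling `GrowthCounter.CountGrowths(200, 100)` should report no growth. `GrowthCounter.CountGrowths(20, 100)` prints each step at which the buffer expanded.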

The default capacity for a StringBuilder object is 16 characters, which in many cases is much too small. To set a larger size when the object is created, use the overloaded StringBuilder constructor that accepts an integer value as the starting capacity of the internal buffer. Choosing an initial size that is neither too large (which allocates unused space) nor too small (which incurs the cost of repeatedly allocating new buffers) may seem like more of an art than a science. However, finding a good value can pay off considerably when your application is tested for performance.
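When the final length can be computed from the inputs, you no longer have to guess the capacity. The following is a sketch that assumes all the pieces to be joined are known in advance. This hypothetical helper adds up their lengths plus the separators and sizes the builder exactly, so no growth occurs during the appends.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ExactCapacity
{
    // Hypothetical helper: joins items with a separator, sizing the builder
    // up front from the known lengths so it never grows while appending.
    public static string JoinPresized(IReadOnlyList<string> items, string separator)
    {
        int needed = 0;
        foreach (string item in items)
        {
            needed += item.Length;
        }
        if (items.Count > 1)
        {
            needed += separator.Length * (items.Count - 1);
        }

        var sb = new StringBuilder(needed);   // exact final length (0 is a valid capacity)
        for (int i = 0; i < items.Count; i++)
        {
            if (i > 0) sb.Append(separator);
            sb.Append(items[i]);
        }
        return sb.ToString();
    }
}
```

For ordinary joins, string.Join is usually the simpler choice. The helper is only meant to show the sizing arithmetic.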

When a good initial size for a StringBuilder object cannot be calculated, try running the application under a constant load while varying the initial StringBuilder size. Once a good initial size is found, vary the load while keeping this size constant; you may discover that the value needs to be tweaked for better performance. Keeping good records of each run, and graphing them, will be invaluable in choosing the appropriate number. In addition, using PerfMon (Administrative Tools > Performance Monitor) to track and graph the number of garbage collections may help you determine whether your initial StringBuilder size is causing too many reallocations of the internal buffer.
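A rough harness for this vary-the-size experiment might look like the following. It is only a sketch. The workload, which repeatedly appends a fixed fragment, and all names are hypothetical stand-ins for your application's real string-building code. For each candidate capacity, it records the elapsed time with Stopwatch and the change in generation-0 collections reported by GC.CollectionCount(0).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

public static class CapacityBenchmark
{
    // For each candidate capacity, builds 'iterations' strings of
    // 'appendsPerString' fragments and records time and gen-0 collections.
    public static List<(int Capacity, long ElapsedMs, int Gen0Collections)> Run(
        IEnumerable<int> candidateCapacities, int iterations, int appendsPerString)
    {
        var results = new List<(int Capacity, long ElapsedMs, int Gen0Collections)>();
        foreach (int capacity in candidateCapacities)
        {
            int gen0Before = GC.CollectionCount(0);
            var watch = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
            {
                var sb = new StringBuilder(capacity);
                for (int j = 0; j < appendsPerString; j++)
                {
                    sb.Append("fragment ");   // stand-in for real work
                }
                _ = sb.ToString();
            }
            watch.Stop();
            results.Add((capacity, watch.ElapsedMilliseconds, GC.CollectionCount(0) - gen0Before));
        }
        return results;
    }
}
```

Run it in a Release build, discard a warm-up pass, and repeat each measurement several times, because single runs are noisy.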


The most efficient method of setting the capacity of the StringBuilder object is to set it in the call to its constructor. The overloaded constructors of a StringBuilder object that accept a capacity value are defined as follows:

public StringBuilder(int capacity)
public StringBuilder(string str, int capacity)
public StringBuilder(int capacity, int maxCapacity)
public StringBuilder(string str, int startPos, int length, int capacity)
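The following is a quick illustration of these four overloads; the demo class is hypothetical. In particular, the maxCapacity argument is a hard ceiling: appending past it throws an ArgumentOutOfRangeException rather than growing the buffer.

```csharp
using System;
using System.Text;

public static class ConstructorDemo
{
    public static (StringBuilder Empty, StringBuilder Seeded, StringBuilder Capped, StringBuilder Slice) Create()
    {
        var empty = new StringBuilder(200);                       // room for 200 chars
        var seeded = new StringBuilder("Hello", 50);              // "Hello", room for 50
        var capped = new StringBuilder(16, 32);                   // 16 now, never more than 32
        var slice = new StringBuilder("Hello, world", 7, 5, 32);  // 5 chars from index 7: "world"
        return (empty, seeded, capped, slice);
    }

    // Returns true if appending past MaxCapacity is rejected.
    public static bool OverflowIsRejected()
    {
        var capped = new StringBuilder(16, 32);
        try
        {
            capped.Append(new string('x', 33));   // 33 > MaxCapacity of 32
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
    }
}
```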

In addition to the constructor parameters, one property of the StringBuilder object allows its capacity to be increased (or decreased). The Capacity property gets or sets the maximum number of characters that the instance's internal buffer can hold before it must be reallocated. Capacity cannot be set to a value less than the Length property; attempting to do so throws an ArgumentOutOfRangeException.
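Here is a small sketch of the Capacity rules described above, with illustrative names. Growing and shrinking are both allowed, but a value below the current Length is rejected with an ArgumentOutOfRangeException.

```csharp
using System;
using System.Text;

public static class CapacityPropertyDemo
{
    public static (int Grown, int Shrunk, bool RejectedBelowLength) Adjust()
    {
        var sb = new StringBuilder("0123456789");   // Length is 10
        sb.Capacity = 100;                           // grow the buffer
        int grown = sb.Capacity;
        sb.Capacity = 10;                            // shrink to exactly Length: allowed
        int shrunk = sb.Capacity;
        bool rejected = false;
        try
        {
            sb.Capacity = 5;                         // below Length: not allowed
        }
        catch (ArgumentOutOfRangeException)
        {
            rejected = true;
        }
        return (grown, shrunk, rejected);
    }
}
```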

A second way to change the capacity is through the EnsureCapacity method, which is defined as follows:

public int EnsureCapacity(int capacity)

This method returns the new capacity of the object. If the current capacity already equals or exceeds the value of the capacity parameter, the current capacity is retained and that value is returned.
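Both cases of EnsureCapacity can be seen in a few lines. This sketch uses hypothetical names.

```csharp
using System;
using System.Text;

public static class EnsureCapacityDemo
{
    public static (int Unchanged, int Grown) Check()
    {
        var big = new StringBuilder(100);
        int unchanged = big.EnsureCapacity(50);   // already >= 50: returns 100
        var small = new StringBuilder(10);
        int grown = small.EnsureCapacity(50);     // reallocates to hold at least 50
        return (unchanged, grown);
    }
}
```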

There is one drawback to using these last two members. If either of them increases the capacity of the StringBuilder object by even a single character, the internal buffer used to store the string has to be reallocated. Decreasing the capacity, by contrast, does not force the allocation of a new, larger buffer. These members are therefore best reserved for exceptional cases in which the StringBuilder needs an extra boost in capacity, so that fewer reallocations are performed in the long run.
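One way to apply this advice is to call EnsureCapacity once before appending a batch whose total size is known. This avoids letting the buffer grow repeatedly in the middle of the batch. The helper below is a hypothetical sketch of that pattern.

```csharp
using System;
using System.Text;

public static class BatchAppend
{
    // Hypothetical helper: grow the buffer once, up front, for a batch
    // whose total length is known, then append without further growth.
    public static StringBuilder AppendAll(StringBuilder sb, string[] parts)
    {
        int extra = 0;
        foreach (string p in parts)
        {
            extra += p.Length;
        }
        sb.EnsureCapacity(sb.Length + extra);   // at most one reallocation here
        foreach (string p in parts)
        {
            sb.Append(p);
        }
        return sb;
    }
}
```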

The StringBuilder object also contains a Length property. If Length is increased, the end of the existing string is padded with null characters (U+0000). If Length is decreased, characters are truncated from the end of the string. Increasing Length can also increase Capacity, but only as a side effect: if Length is set beyond the current Capacity, Capacity grows to at least the new Length. In this respect, setting Length acts much like setting Capacity:

sb.Length = 200;
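The following sketch shows both directions; its names are hypothetical. Lengthening pads the string with U+0000 characters and raises Capacity if needed, while shortening truncates.

```csharp
using System;
using System.Text;

public static class LengthDemo
{
    public static (string Padded, string Truncated, bool CapacityCoversLength) Adjust()
    {
        var sb = new StringBuilder("abc", 4);
        sb.Length = 6;                            // pads with '\0'; Capacity must grow past 4
        string padded = sb.ToString();            // "abc\0\0\0"
        bool covers = sb.Capacity >= sb.Length;
        sb.Length = 2;                            // truncates to "ab"
        return (padded, sb.ToString(), covers);
    }
}
```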

The string and StringBuilder types are nonblittable, which means that they require conversion whenever they are marshaled across a managed/unmanaged boundary in your code. The reason is that strings can be represented in several ways in unmanaged code (ANSI or Unicode characters, null-terminated or length-prefixed, for example), and there is no one-to-one correspondence between these unmanaged representations and the managed string. In contrast, types such as byte, sbyte, short, ushort, int, uint, long, ulong, IntPtr, and UIntPtr are blittable types. They have the same representation in managed and unmanaged memory and need no conversion. One-dimensional arrays of these blittable types are also blittable, as are structures or classes containing only blittable types, and they need no extra conversion when passed between managed and unmanaged code.
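For comparison, here is a sketch of a structure that should qualify as blittable. It contains only int fields and uses sequential layout, so its unmanaged size is simply the sum of its fields. The type name is hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;

// Only blittable (int) fields, laid out in declaration order, so the
// managed and unmanaged representations are identical.
[StructLayout(LayoutKind.Sequential)]
public struct BlittablePoint
{
    public int X;
    public int Y;
}

public static class BlittableDemo
{
    // Unmanaged size in bytes: two 4-byte ints.
    public static int UnmanagedSize() => Marshal.SizeOf(typeof(BlittablePoint));
}
```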

Because of this conversion, string and StringBuilder objects take more time to marshal. Calls into unmanaged code through P/Invoke perform better when only blittable types are passed, so consider using a byte array instead of a string or StringBuilder object wherever possible.
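The helpers below sketch the byte-array approach. They encode a string once into a null-terminated ASCII buffer and decode such a buffer back into a string. The DllImport declaration is purely hypothetical: native_lib and process_text are placeholders for whatever unmanaged function you actually call. The sketch also assumes ASCII is an acceptable encoding for your data.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

public static class ByteBufferInterop
{
    // HYPOTHETICAL native function: "native_lib" and "process_text" are
    // placeholders, not a real library. A byte[] is a blittable array, so
    // it can be passed without per-character conversion.
    [DllImport("native_lib", CallingConvention = CallingConvention.Cdecl)]
    private static extern int process_text(byte[] buffer, int length);

    // Encodes text once into a null-terminated ASCII buffer.
    public static byte[] ToNullTerminatedAscii(string text)
    {
        byte[] buffer = new byte[Encoding.ASCII.GetByteCount(text) + 1];
        Encoding.ASCII.GetBytes(text, 0, text.Length, buffer, 0);
        // The last byte is left as 0, which serves as the terminator.
        return buffer;
    }

    // Decodes a buffer (e.g. one filled by native code) up to the first 0 byte.
    public static string FromNullTerminatedAscii(byte[] buffer)
    {
        int end = Array.IndexOf(buffer, (byte)0);
        if (end < 0) end = buffer.Length;
        return Encoding.ASCII.GetString(buffer, 0, end);
    }
}
```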

See Also

See the "StringBuilder Class" topic in the MSDN documentation.
