[ Team LiB ] |
Recipe 10.1 Creating a Hash Code for a Data TypeProblemYou have created a class or structure that will be used as a key in a Hashtable. You need to overload the GetHashCode method in order to return a good distribution of hash values to be used in a Hashtable (the Discussion section defines a good distribution of hash values). You also need to choose the best hash code algorithm to use in the GetHashCode method of your object. SolutionThe following procedures implement hash code algorithms and can be used to override the GetHashCode method. Included in the discussion of each method are the pros and cons of using it, as well as why you would want to use one instead of another. In addition, it is desirable, for performance reasons, to use the return value of the GetHashCode method to determine whether the data contained within two objects is equal. Calling GetHashCode to return a hash value of two objects and comparing their hash values can be faster than calling Equals, which individually tests the equality of all pertinent data within two objects. In fact, some developers even opt to compare hash code values returned from GetHashCode, within their overloaded Equals method. The simple hashThis hash accepts a variable number of integer values and XORs each value to obtain a hash code. This simple algorithm has a good chance of producing an adequate distribution and good performance. Remember to profile and measure it to confirm that it works as well for your particular data set. It fails when you need to integrate more than just numeric values equal or smaller in size to an integer. Its code is: public int SimpleHash(params int[] values) { int hashCode = 0; if (values != null) { foreach (int val in values) { hashCode ^= val; } } return (hashCode); } The folding hashThis hash allows you to integrate the long data type into a hash algorithm. It takes the upper 32 bits of the long value and folds them over the lower 32 bits of this value. The actual process of folding the two values is implemented by XORing them and using the result. Once again, this is a good performing algorithm with good distribution properties, but, again, it fails when you need to go beyond the long data type. A sample implementation is: public int FoldingHash(params long[] values) { int hashCode = 0; if (values != null) { int tempLowerVal = 0; int tempUpperVal = 0; foreach (long val in values) { tempLowerVal = (int)(val & 0x000000007FFFFFFF); tempUpperVal = (int)((val >> 32) & 0xFFFFFFFF); hashCode^= tempLowerVal ^ tempUpperVal; } } return (hashCode); } The contained object cacheThis hash obtains the hash codes from a variable number of object types. The only types that should be passed in to this method are reference type fields contained within your object. This method XORs all the values returned by the GetHashCode method of each object. Its source code is: public int ContainedObjHash(params object[] values) { int hashCode = 0; if (values != null) { foreach (int val in values) { hashCode ^= val.GetHashCode( ); } } return (hashCode); } The CryptoHash methodPotentially the best method of obtaining a hash value for an object is to use the hashing classes built in to the FCL. The CryptoHash method returns a hash value for some input using the MACTripleDES class. This method returns a very good distribution for the hash value, although you may pay for it in performance. If you do not require a near perfect hash value and are looking for an excellent distribution, consider using this approach to calculate a hash value: public int CryptoHash(string strValue) { int hashCode = 0; if (strValue != null) { byte[] encodedUnHashedString = Encoding.Unicode.GetBytes(strValue); // Replace the following Key with your own // key value byte[] Key = new byte[16] {1,122,3,11,65,7,9,45,42,98, 77,34,99,45,167,211}; MACTripleDES hashingObj = new MACTripleDES(Key); byte[] code = hashingObj.ComputeHash(encodedUnHashedString); // use the BitConverter class to take the // first 4 bytes and use them as an int for // the hash code hashCode = BitConverter.ToInt32(code,0); } return (hashCode); } The CryptoHash method using a nonstringThis method shows how other, nonstring data types can be used with the built-in hashing classes to obtain a hash code. This method converts a numeric value to a string and then to a byte array. The array is then used to create the hash value using the SHA256Managed class. Finally, each value in the byte array is added together to obtain a hash code. The code is: public int CryptoHash(long intValue) { int hashCode = 0; byte[] encodedUnHashedString = Encoder.Unicode.GetBytes(intValue.ToString( )); SHA256Managed hashingObj = new SHA256Managed( ); byte[] code = hashingObj.ComputeHash(encodedUnHashedString); // use the BitConverter class to take the // first 4 bytes and use them as an int for // the hash code hashCode = BitConverter.ToInt32(code,0); return (hashCode); } The shift and add hashThis method uses each character in the input string, strValue, to determine a hash value. This algorithm produces a good distribution of hash codes even when this method is fed similar strings. However, this method will break down when long strings that end with the same characters are passed. While this may not happen many times with your applications, it is something to be aware of. If performance is critical, this is an excellent method to use. Its code is: public int ShiftAndAddHash (string strValue) { int hashCode = 0; long workHashCode = 0; if (strValue != null) { for (int counter=0; counter<strValue.Length; counter++) { workHashCode = (workHashCode << (counter % 4)) + (int)strValue[counter]; } workHashCode = workHashCode % (127); } hashCode = (int)workHashCode; return (hashCode); } The calculated hashThis method is a rather widely accepted method of creating a good hash value that accepts several different data types and uses a different algorithm to compute the hash value for each. It calculates the hash code as follows:
This algorithm will produce a good distributed hash code for your object and has the added benefit of the flexibility to employ any type of data type. This is a high performing algorithm for simple, moderately complex, and even many complex objects. However, for extremely complex objects—ones that contain many large arrays, large Hashtables, or other objects that use a slower hash code algorithm—this algorithm will start performing badly. In this extreme case, you may want to consider switching to another hash code algorithm to speed performance or simply paring down the amount of fields used in the calculation. Be careful if you choose this second method to increase performance; you could inadvertently cause the algorithm to produce similar values for differing objects. The code for the calculated hash method is: public int CalcHash(short someShort, int someInt, long someLong, float someFloat, object someObject) { int hashCode = 7; hashCode = hashCode * 31 + (int)someShort; hashCode = hashCode * 31 + someInt; hashCode = hashCode * 31 + (int)(someLong ^ (someLong >> 32)); long someFloatToLong = (long)someFloat; hashCode = hashCode * 31 + (int)(someFloatToLong ^ (someFloatToLong >> 32)); if (someObject != null) { hashCode = hashCode * 31 + someObject.GetHashCode( ); } return (hashCode); } The string concatenation hashThis technique converts its input into a string, and then uses that string's GetHashCode method to automatically generate a hash code for an object. It accepts an integer array, but you could substitute any type that can be converted into a string. You could also use several different types of arguments as input to this method. This method iterates through each integer in the array passed as an argument to the method. The ToString method is called on each value to return a string. The ToString method of an int data type returns the value contained in that int. Each string value is appended to the string variable HashString. Finally, the GetHashCode method is called on the HashString variable to return a suitable hash code. This method is simple and efficient, but it does not work well with objects that have not overridden the ToString method to return something other than their data type. It may be best to simply call the GetHashCode method on each of these objects individually. You should use your own judgment and the rules found in this recipe to make your decision: public int ConcatStringGetHashCode(int[] someIntArray) { int hashCode = 0 StringBuilder hashString = new StringBuilder( ); if (someIntArray != null) { foreach (int i in someIntArray) { hashString.Append(i.ToString( ) + "^"); } } hashCode = hashString.GetHashCode( ); return (hashCode); } The following using directives must be added to any file containing this code: using System; using System.Text; using System.Security.Cryptography; DiscussionThe GetHashCode method is called when you are using an instance of this class as the key in a Hashtable object. Whenever your object is added to a Hashtable as a key, the GetHashCode method is called on your object to obtain a hash code to be used by the Hashtable. A hash code is also obtained from your object when a search is performed for your object in the Hashtable. The following class implements the SimpleHash algorithm for the overloaded GetHashCode method: public class SimpleClass { private int x = 0; private int y = 0; public override int GetHashCode( ) { return(SimpleHash(x, y)); } public int SimpleHash(params int[] values) { int hashCode = 0; if (values != null) { foreach (int val in values) { hashCode ^= val; } } return (hashCode); } } This class could then be used as a key in a Hashtable through the following code: SimpleClass simpleClass = new SimpleClass( ); Hashtable hashTable = new Hashtable( ); hashTable.Add(simpleClass, 100); There are several rules for writing a good GetHashCode method and a good hash code algorithm:
The System.Int32, System.UInt32, and System.IntPtr data types in the FCL use an additional hash code algorithm not covered in the Solution section. Basically, these data types return the value that they contain as a hash code. Most likely, your objects will not be so simple as to contain a single numeric value, but if they are, this method works extremely well. You may also want to combine specific algorithms to suit your purposes. For instance, if your object contains one or more string types and one or more long data types, you could combine the ContainedObjHash method and the FoldingHash method to create a hash value for your object. The return values from each method could either be added or XORed together. Once an object is in use as a key in a Hashtable, it should never return a different value for the hash code. Originally, it was documented that hash codes must be immutable, as the authors of Hashtable thought that this should be dealt with by whomever writes GetHashCode. It doesn't take much thought to realize that for mutable types, if you require both that the hashcode never changes, and that Equals represents the equality of the mutable objects, and that if a.Equals(b), then a.GetHashCode( ) == b.GetHashCode( ), then the only possible value implementation of GetHashCode is one that returns the same integer constant for all values. The GetHashCode method is called when you are using this object as the key in a Hashtable object. Whenever your object is added to a Hashtable as a key, the GetHashCode method is called on your object to obtain a hash code. This hash code must not change while your object is a key in the Hashtable. If it does, the Hashtable will not be able to find your object. The GetHashCode method is called when you are using this object as the key in a Hashtable object. Whenever your object is added to a Hashtable as a key, the GetHashCode method is called on your object to obtain a hash code. This hash code must not change while your object is a key in the Hashtable. If it does, the Hashtable will not be able to find your object. From the perspective of memory consumption, this object will not be able to have its memory freed until the Hashtable is collected causing a noteworthy degree of memory retention. See AlsoSee the "GetHashCode Method" and "Hashtable Class" topics in the MSDN documentation. |
[ Team LiB ] |