[ Team LiB ] |
Recipe 6.3 Find All Records with Names That Sound Alike6.3.1 ProblemYou enter people's names into a table in which misspellings are a common occurrence. You would like a way to search for a person's record disregarding slight differences in spelling. You've tried using the Like operator with the first letter of the person's last name, but that produces too many names. Is there any way to search for records that sound alike? 6.3.2 SolutionAccess has no built-in sound-alike function, but you can create one that employs a standard algorithm called the Russell Soundex algorithm. Using this algorithm, it's fairly easy to search for a last name phonetically. Run the qrySoundex query found in 06-03.MDB. Enter a last name in the query parameter dialog, and qrySoundex will return all records from tblStaff that sound like the name you entered. For example, if you enter the name "Jahnsin" at the parameter prompt, qrySoundex will return the records shown in Figure 6-5. Figure 6-5. The records returned by searching for "Jahnsin"To perform Soundex searches in your own applications, follow these steps:
Figure 6-6. The qrySoundex query in design view
6.3.3 DiscussionYou can find the acbSoundex function in basSoundex in 06-03.MDB. This function takes a last name and returns a four-digit Soundex code for the name. If you look at the fourth column in Figure 6-5, you can see that the Soundex code for all rows is the same. In this case—for names sounding like "Jahnsin"—the code is "J525". Soundex codes always begin with the first letter of the name followed by three digits ranging between 0 and 6 that represent the remaining significant consonants in the name. The acbSoundex function is shown here: Public Function acbSoundex( _ ByVal varSurName As Variant) As Variant ' Purpose: ' Takes a surname string and returns a 4-digit ' code representing the Russell Soundex code. ' In: ' varSurName: A surname (last name) as a variant ' Out: ' Return value: A 4-digit Soundex code as a variant Const acbcSoundexLength = 4 On Error GoTo HandleErr Dim intLength As Integer Dim intCharCount As Integer Dim intSdxCount As Integer Dim intSeparator As Integer Dim intSdxCode As Integer Dim intPrvCode As Integer Dim strChar As String * 1 Dim strSdx As String * acbcSoundexLength Dim strName As String ' We add vbNullString to take care of a passed Null strName = varSurName & vbNullString intLength = Len(strName) strSdx = String(acbcSoundexLength, "0") If intLength > 0 Then intSeparator = 0 'Keeps track of vowel separators intPrvCode = 0 'The code of the previous char intCharCount = 0 'Counts number of input chars intSdxCount = 0 'Counts number of output chars 'Loop until the soundex code is of acbcSoundexLength 'or we have run out of characters in the surname Do Until (intSdxCount >= acbcSoundexLength Or intCharCount >= intLength) intCharCount = intCharCount + 1 strChar = Mid$(strName, intCharCount, 1) 'Calculate the code for the current character Select Case strChar Case "B", "F", "P", "V" intSdxCode = 1 Case "C", "G", "J", "K", "Q", "S", "X", "Z" intSdxCode = 2 Case "D", "T" intSdxCode = 3 Case "L" intSdxCode = 4 Case "M", "N" intSdxCode = 5 Case "R" intSdxCode = 6 Case "A", "E", "I", "O", "U", "Y" intSdxCode = -1 Case Else intSdxCode = -2 End Select 'Special case the first character If intCharCount = 1 Then Mid$(strSdx, 1, 1) = UCase(strChar) intSdxCount = intSdxCount + 1 intPrvCode = intSdxCode intSeparator = 0 'If a significant constant and not a repeat 'without a separator then code this character ElseIf intSdxCode > 0 And _ (intSdxCode <> intPrvCode Or intSeparator = 1) Then Mid$(strSdx, intSdxCount + 1, 1) = intSdxCode intSdxCount = intSdxCount + 1 intPrvCode = intSdxCode intSeparator = 0 'If a vowel, this character is not coded, 'but it will act as a separator ElseIf intSdxCode = -1 Then intSeparator = 1 End If Loop acbSoundex = strSdx Else acbSoundex = Null End If ExitHere: Err.Clear Exit Function HandleErr: Select Case Err.Number Case Else MsgBox Err.Number & ": " & Err.Description, _ vbOKOnly + vbCritical, "acbSoundex" End Select Resume ExitHere End Function The acbSoundex function is based on the Russell Soundex standard algorithm. Soundex is the most commonly used sound-alike algorithm in the U.S. It works by discarding the most unreliable parts of a name, while retaining much of the name's discriminating power. It works best when used with the English versions of names of people of European descent. Its discriminating power is reduced when it is used with very short or very long names or names with a high percentage of vowels. Other sound-alike algorithms may work better in these situations. The Soundex algorithm was created to work with people's last names. It appears to work reasonably well with people's first names also, but not for names of businesses. Soundex does not work well for business names primarily because these names tend to be longer than people's names, and Soundex encodes only the first four significant characters. We've found that extending the number of encoded characters to eight works better for business names, although this is a nonstandard implementation of the algorithm. You can easily extend the number of encoded characters by changing the acbcSoundexLength constant found at the beginning of acbSoundex. If you decide to do this, however, we suggest you rename the function to something like acbSoundex8 to distinguish it from the standard function. Soundex will not work satisfactorily with data other than names. |
[ Team LiB ] |