DekGenius.com
[ Team LiB ] Previous Section Next Section

Recipe 6.4 Find the Median Value for a Field

6.4.1 Problem

You need to calculate the median for a numeric field. Access provides the DAvg function to calculate the mean value for a numeric field, but you can't find the equivalent function for calculating medians.

6.4.2 Solution

Access doesn't provide a built-in DMedian function, but you can make one using VBA code. This solution demonstrates a median function that you can use in your own applications.

Load the frmMedian form from 06-04.MDB. Choose the name of a table and a field in that table using the combo boxes on the form. After you choose a field, the median value will be calculated and displayed in a text box using the acbDMedian function found in basMedian (see Figure 6-7). An error message will be displayed if you have chosen a field with a nonnumeric data type; the string "(Null)" will be displayed if the median value happens to be Null.

Figure 6-7. The frmMedian form
figs/acb2_0607.gif

Follow these steps to use acbDMedian in your own applications:

  1. Import the basMedian module from 06-04.MDB into your database.

  2. Call the acbDMedian function using syntax similar to that of the built-in DAvg function. The calling syntax is summarized in Table 6-3.

    Make sure each parameter is delimited with quotes. The third parameter is optional. For example, you might enter the following statement at the debug window:

    ? acbDMedian("UnitPrice", "tblProducts", "SupplierID = 1")

    The function would return a median value of 18 (assuming you are using the data in the 06-04.MDB sample database).

Table 6-3. The acbDMedian parameters

Parameter

Description

Example

Field

Name of field for which to calculate median

"UnitPrice"

Domain

Name of a table or query

"Products"

Criteria

Optional WHERE clause to limit the rows considered

"CategoryID = 1"

  1. The return value from the function is the median value.

This example uses the DAO type library, and you'll need to include the reference to the most current version of DAO in your own applications in order to take advantage of this code. Use the Tools References menu to add the necessary reference to use this code in your own database.


6.4.3 Discussion

The acbDMedian function in basMedian in 06-04.MDB is patterned to look and act like the built-in DAvg domain function. The algorithm used to calculate the median, however, is more complicated than what you would use to calculate the mean. The median of a field is calculated using the following algorithm:

  • Sort the dataset on the field.

  • Find the middle row of the dataset and return the value of the field. If there is an odd number of rows, this will be the value in a single row. If there is an even number of rows, there is no middle row, so the function finds the mean of the values in the two rows straddling the middle. You could modify the function to pick an existing value instead.

After declaring a few variables, the acbDMedian function creates a recordset based on the three parameters passed to the function (strField, strDomain, and varCriteria), as shown in the following source code:

Public Function acbDMedian( _
 ByVal strField As String, ByVal strDomain As String, _
 Optional ByVal strCriteria As String) As Variant

    ' Purpose:
    '     To calculate the median value
    '     for a field in a table or query.
    ' In:
    '     strField: The field
    '     strDomain: The table or query
    '     strCriteria: An optional WHERE clause to
    '                  apply to the table or query
    ' Out:
    '     Return value: The median, if successful;
    '                   otherwise, an error value

    Dim db As DAO.Database
    Dim rstDomain As DAO.Recordset
    Dim strSQL As String
    Dim varMedian As Variant
    Dim intFieldType As Integer
    Dim intRecords As Integer
    
    Const acbcErrAppTypeError = 3169
    
    On Error GoTo HandleErr

    Set db = CurrentDb( )
    
    ' Initialize the return value.
    varMedian = Null
    
    ' Build a SQL string for the recordset.
    strSQL = "SELECT " & strField
    strSQL = strSQL & " FROM " & strDomain
    
    ' Use a WHERE clause only if one is passed in.
    If Len(strCriteria) > 0 Then
        strSQL = strSQL & " WHERE " & strCriteria
    End If
    
    strSQL = strSQL & " ORDER BY " & strField
    
    Set rstDomain = db.OpenRecordset(strSQL, dbOpenSnapshot)
    
    ' Check the data type of the median field.
    intFieldType = rstDomain.Fields(strField).Type
    Select Case intFieldType
    Case dbByte, dbInteger, dbLong, dbCurrency, dbSingle, dbDouble, dbDate
        ' Numeric field.
        If Not rstDomain.EOF Then
            rstDomain.MoveLast
            intRecords = rstDomain.RecordCount
            ' Start from the first record.
            rstDomain.MoveFirst
    
            If (intRecords Mod 2) = 0 Then
                ' Even number of records. No middle record, so move
                ' to the record right before the middle.
                rstDomain.Move ((intRecords \ 2) - 1)
                varMedian = rstDomain.Fields(strField)
                ' Now move to the next record, the one right after
                ' the middle.
                rstDomain.MoveNext
                ' Average the two values.
                varMedian = (varMedian + rstDomain.Fields(strField)) / 2
                ' Make sure you return a date, even when averaging
                ' two dates.
                If intFieldType = dbDate And Not IsNull(varMedian) Then
                    varMedian = CDate(varMedian)
                End If
            Else
                ' Odd number of records. Move to the middle record
                ' and return its value.
                rstDomain.Move ((intRecords \ 2))
                varMedian = rstDomain.Fields(strField)
            End If
        Else
            ' No records; return Null.
            varMedian = Null
        End If
    Case Else
        ' Nonnumeric field; raise an app error.
        Err.Raise acbcErrAppTypeError
    End Select

    acbDMedian = varMedian
    
ExitHere:
    On Error Resume Next
    rstDomain.Close
    Set rstDomain = Nothing
    Exit Function

HandleErr:
    ' Return an error value.
    acbDMedian = CVErr(Err)
    Resume ExitHere
End Function

The process of building the SQL string that defines the recordset is straightforward, except for the construction of the optional WHERE clause. Because strCriteria was defined as an optional parameter (using the Optional keyword), acbDMedian checks if a value was passed by checking that the string has a length greater than zero.

Once acbDMedian builds the SQL string, it creates a recordset based on that SQL string.

Next, acbDMedian checks the data type of the field: it will calculate the median only for numeric and date/time fields. If any other data type has been passed to acbDMedian, the function forces an error by using the Raise method of the Err object and then uses the special CVErr function in its error handler to send the error state back to the calling procedure:

' Check the data type of the median field.
intFieldType = rstDomain.Fields(strField).Type
Select Case intFieldType
    Case dbByte, dbInteger, dbLong, dbCurrency, dbSingle, dbDouble, dbDate
' ... more code follows ...
    Case Else
        ' Nonnumeric field; raise an app error.
        Err.Raise acbcErrAppTypeError
End Select

' ... more code follows ...

ExitHere:
    On Error Resume Next
    rstDomain.Close
    Set rstDomain = Nothing
    Exit Function

HandleErr:
    ' Return an error value.
    acbDMedian = CVErr(Err)
    Resume ExitHere
End Function

If the field is numeric, the acbDMedian function checks to see if there are any rows in the recordset using the following If...Then statement, returning Null if there are no rows:

' Numeric field.
If Not rstDomain.EOF Then
    ' ... more code follows ...
Else
    ' No records; return Null.
    varMedian = Null
End If

If there are rows, the function moves to the end of the recordset to get a count of the total number of records. This is necessary because the RecordCount property returns only the number of rows that have been visited. The code is:

rstDomain.MoveLast
intRecords = rstDomain.RecordCount

If the number of records is even, acbDMedian moves to the record just before the middle using the Move method, which allows you to move an arbitrary number of records from the current record. The number of records to move forward is calculated using the following formula:

intRecords \ 2 - 1

This tells Access to divide the total number of records by 2 and then subtract 1 from the result (because you are starting from the first record). For example, if you are on the first of 500 records, you would move (500 \ 2 - 1) = (250 - 1) = 249 records forward, which would bring you to the 250th record. Once the function has moved that many records, it's a simple matter to grab the value of the 250th and 251st records and divide the result by 2. This part of the function is shown here:

' Start from the first record.
rstDomain.MoveFirst 

If (intRecords Mod 2) = 0 Then
    ' Even number of records. No middle record, so move
    ' to the record right before the middle.
    rstDomain.Move ((intRecords \ 2) - 1)
    varMedian = rstDomain.Fields(strField)
    ' Now move to the next record, the one right after
    ' the middle.
    rstDomain.MoveNext
    ' Average the two values.
    varMedian = (varMedian + rstDomain.Fields(strField)) / 2

Because acbDMedian supports dates, the function needs to make sure that a date value is returned when taking the average of two dates. The following code handles this:

' Make sure you return a date, even when
' averaging two dates.
If intFieldType = dbDate And Not IsNull(varMedian) Then
    varMedian = CDate(varMedian)
End If

The code for an even number of rows is much simpler:

Else
    ' Odd number of records. Move to the middle record
    ' and return its value.
    rstDomain.Move ((intRecords \ 2))
    varMedian = rstDomain.Fields(strField)
End If

Finally, acbDMedian returns the median value to the calling procedure:

acbDMedian = varMedian

The median, like the average (or arithmetic mean), is known statistically as a measure of central tendency. In other words, both measures estimate the middle of a set of data. The mean represents the mathematical average value; the median represents the middle-most value. For many datasets, these two measures are the same or very close to each other. Sometimes, however, depending on how the data is distributed, the mean and median will report widely varying values. In these cases, many people favor the median as a better "average" than the mean.

Calculating the median requires sorting the dataset, so it can be rather slow on large datasets. Calculating the mean, however, doesn't require a sort, so it will always be faster to calculate the mean.

Microsoft Excel includes a Median function that you can call from Access using OLE Automation. Chapter 12 shows you how to do this. Because using OLE Automation with Excel requires starting a copy of Excel to do the calculation, you'll almost always find it simpler and faster to use the all-Access solution presented here.


    [ Team LiB ] Previous Section Next Section