
Recipe 3.3 Determining the Differences in Data Between Two DataSet Objects

Problem

You have two DataSet objects with the same schema but containing different data and need to determine the difference between the data in the two.

Solution

Compare the two DataSet objects with the GetDataSetDifference( ) method shown in this solution and return the differences between the data as a DiffGram.

The sample code contains two event handlers and a single method:

Form.Load

Sets up the sample by creating two DataSet objects each containing a different subset of records from the Categories table from the Northwind sample database. The default view for each table is bound to a data grid on the form.

Get Difference Button.Click

Simply calls GetDataSetDifference( ) when the user clicks the button.

GetDataSetDifference( )

This method takes two DataSet objects with identical schemas as arguments and returns a DiffGram of the differences between the data in the two.

The C# code is shown in Example 3-3.

Example 3-3. File: DataSetDifferenceForm.cs
// Namespaces, variables, and constants
using System;
using System.Configuration;
using System.IO;
using System.Data;
using System.Data.SqlClient;

// Field name constants
private const String CATEGORYID_FIELD = "CategoryID";

DataSet dsA, dsB;

//  . . . 

private void DataSetDifferenceForm_Load(object sender, System.EventArgs e)
{
    SqlDataAdapter da;
    String sqlText;

    // Fill table A with Category schema and subset of data.
    sqlText = "SELECT CategoryID, CategoryName, Description " +
        "FROM Categories WHERE CategoryID BETWEEN 1 AND 5";
    DataTable dtA = new DataTable("TableA");
    da = new SqlDataAdapter(sqlText,
        ConfigurationSettings.AppSettings["Sql_ConnectString"]);
    da.Fill(dtA);
    da.FillSchema(dtA, SchemaType.Source);
    // Set up the identity column CategoryID.
    dtA.Columns[0].AutoIncrement = true;
    dtA.Columns[0].AutoIncrementSeed = -1;
    dtA.Columns[0].AutoIncrementStep = -1;
    // Create DataSet A and add table A.
    dsA = new DataSet( );
    dsA.Tables.Add(dtA);

    // Fill table B with Category schema and subset of data.
    sqlText = "SELECT CategoryID, CategoryName, Description " +
        "FROM Categories WHERE CategoryID BETWEEN 4 AND 8";
    DataTable dtB = new DataTable("TableB");
    da = new SqlDataAdapter(sqlText,
        ConfigurationSettings.AppSettings["Sql_ConnectString"]);
    da.Fill(dtB);
    da.FillSchema(dtB, SchemaType.Source);
    // Set up the identity column CategoryID.
    dtB.Columns[0].AutoIncrement = true;
    dtB.Columns[0].AutoIncrementSeed = -1;
    dtB.Columns[0].AutoIncrementStep = -1;
    // Create DataSet B and add table B.
    dsB = new DataSet( );
    dsB.Tables.Add(dtB);

    // Bind the default views for table A and table B to DataGrids
    // on the form.
    aDataGrid.DataSource = dtA.DefaultView;
    bDataGrid.DataSource = dtB.DefaultView;
}

private void getDifferenceButton_Click(object sender, System.EventArgs e)
{
    resultTextBox.Text = GetDataSetDifference(dsA, dsB);
}

private String GetDataSetDifference(DataSet ds1, DataSet ds2)
{
    // Accept any edits within the DataSet objects.
    ds1.AcceptChanges( );
    ds2.AcceptChanges( );

    // Create a DataSet to store the differences.
    DataSet ds = new DataSet( );

    DataTable dt1Copy = null;
    // Iterate over the collection of tables in the first DataSet.
    for (int i = 0; i < ds1.Tables.Count; i++)
    {
        DataTable dt1 = ds1.Tables[i];
        DataTable dt2 = ds2.Tables[i];

        // Create a copy of the table in the first DataSet.
        dt1Copy = dt1.Copy( );

        // Iterate over the collection of rows in the
        // copy of the table from the first DataSet.
        foreach(DataRow row1 in dt1Copy.Rows)
        {
            DataRow row2 = dt2.Rows.Find(row1[CATEGORYID_FIELD]);
            if(row2 == null)
            {
                // Delete rows not in table 2 from table 1.
                row1.Delete( );
            }
            else
            {
                // Modify table 1 rows that are different from
                // table 2 rows.
                for(int j = 0; j < dt1Copy.Columns.Count; j++)
                {
                    if(row2[j] == DBNull.Value)
                    {
                        // Column in table 2 is null,
                        // but not null in table 1
                        if(row1[j] != DBNull.Value)
                            row1[j] = DBNull.Value;
                    }
                    else if (row1[j] == DBNull.Value)
                    {
                        // Column in table 1 is null,
                        // but not null in table 2
                        row1[j] = row2[j];
                    }
                    else if(row1[j].ToString( ) !=
                        row2[j].ToString( ))
                    {
                        // Neither column in table 1 nor
                        // table 2 is null, and the
                        // values in the columns are
                        // different.
                        row1[j] = row2[j];
                    }
                }
            }
        }

        foreach(DataRow row2 in dt2.Rows)
        {
            DataRow row1 =
                dt1Copy.Rows.Find(row2[CATEGORYID_FIELD]);
            if(row1 == null)
            {
                // Insert rows into table 1 that are in table 2
                // but not in table 1.
                dt1Copy.LoadDataRow(row2.ItemArray, false);
            }
        }
        
        // Add the table to the difference DataSet.
        ds.Tables.Add(dt1Copy);
    }

    // Write an XML DiffGram containing the differences between tables.
    StringWriter sw = new StringWriter( );
    ds.WriteXml(sw, XmlWriteMode.DiffGram);

    return sw.ToString( );
}

Discussion

A DiffGram is an XML format used to specify original and current values for the data elements in a DataSet. It does not include any schema information. .NET Framework applications use the DiffGram as the serialization format for the contents of a DataSet, including changes made to the DataSet.

A DiffGram is XML-based, which makes it platform and application independent. It is not, however, widely used or understood outside of Microsoft .NET applications.

The DiffGram format is divided into three sections: current, original, and errors. The original and current data in the DiffGram can also be used to report the differences between data in two DataSet objects. For more information about the DiffGram XML format, see Recipe 8.8.
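
To make the three sections concrete, here is a minimal sketch that builds a small in-memory DataSet, edits it, and writes it as a DiffGram. The class name DiffGramSections, the table contents, and the error text are assumptions made for illustration; they are not part of the recipe's sample code.

```csharp
using System;
using System.Data;
using System.IO;

public static class DiffGramSections
{
    // Builds a one-table DataSet, edits it, and returns its DiffGram XML.
    public static string Build()
    {
        DataTable dt = new DataTable("Categories");
        dt.Columns.Add("CategoryID", typeof(int));
        dt.Columns.Add("CategoryName", typeof(string));
        dt.PrimaryKey = new DataColumn[] { dt.Columns["CategoryID"] };

        dt.Rows.Add(1, "Beverages");
        dt.Rows.Add(2, "Condiments");

        DataSet ds = new DataSet("Sample");
        ds.Tables.Add(dt);

        // After this call both rows are Unchanged, so later edits
        // are tracked as differences from this baseline.
        ds.AcceptChanges();

        // A modified row: the new value goes to the current section,
        // the old value to the <diffgr:before> block.
        dt.Rows[0]["CategoryName"] = "Drinks";

        // A row error: reported in the <diffgr:errors> block.
        dt.Rows[1].RowError = "Needs review";

        StringWriter sw = new StringWriter();
        ds.WriteXml(sw, XmlWriteMode.DiffGram);
        return sw.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Build());
    }
}
```

Running this should print a DiffGram in which the current data comes first, followed by a <diffgr:before> block holding the original values of the modified row and a <diffgr:errors> block for the row with an error.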

The sample code contains a method GetDataSetDifference( ) that takes two DataSet objects with the same schema as arguments and returns a DiffGram containing the differences in data when the second DataSet is compared to the first. Table 3-1 describes how the differences between the DataSet objects appear in the DiffGram.

Table 3-1. DiffGram representation of DataSet differences

Condition | DiffGram representation
----------|------------------------
Row is the same in both DataSet 1 and DataSet 2 | Row data appears only in the current data section of the DiffGram.
Row is in both DataSet 1 and DataSet 2 but the rows do not contain the same data | Row data appears in the current data section of the DiffGram. The row element contains the attribute diffgr:hasChanges with a value of "modified". The data in the current section is the updated data. The original data appears in the original <diffgr:before> block of the DiffGram.
Row is in DataSet 2 but not in DataSet 1 | Row data appears in the current data section of the DiffGram. The row element contains the attribute diffgr:hasChanges with a value of "inserted".
Row is in DataSet 1 but not in DataSet 2 | Row data appears only in the original <diffgr:before> block of the DiffGram.
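
The four conditions in Table 3-1 correspond to the row states Unchanged, Modified, Added, and Deleted once a DiffGram is loaded back into a DataSet. The following sketch round-trips a DiffGram through ReadXml( ) and reports the state of each row. The class DiffGramReader, its method names, and the sample rows are hypothetical; the sketch also assumes the receiving DataSet is given the same schema (here via Clone( )), because a DiffGram carries no schema of its own.

```csharp
using System;
using System.Data;
using System.IO;

public static class DiffGramReader
{
    // Builds a DataSet holding one row for each condition in Table 3-1.
    public static DataSet MakeSample()
    {
        DataTable dt = new DataTable("TableA");
        dt.Columns.Add("CategoryID", typeof(int));
        dt.Columns.Add("CategoryName", typeof(string));
        dt.PrimaryKey = new DataColumn[] { dt.Columns["CategoryID"] };
        dt.Rows.Add(1, "Beverages");
        dt.Rows.Add(2, "Condiments");
        dt.Rows.Add(3, "Confections");

        DataSet ds = new DataSet("Diff");
        ds.Tables.Add(dt);
        ds.AcceptChanges();

        dt.Rows[1]["CategoryName"] = "Sauces";  // row 2: modified
        dt.Rows[2].Delete();                     // row 3: deleted
        dt.Rows.Add(4, "Dairy Products");        // row 4: inserted
        return ds;
    }

    // Writes the DataSet as a DiffGram, reads it into a schema-only
    // clone, and lists each row's key together with its RowState.
    public static string Describe(DataSet edited)
    {
        StringWriter sw = new StringWriter();
        edited.WriteXml(sw, XmlWriteMode.DiffGram);

        DataSet loaded = edited.Clone();
        loaded.ReadXml(new StringReader(sw.ToString()),
            XmlReadMode.DiffGram);

        StringWriter report = new StringWriter();
        foreach (DataRow row in loaded.Tables[0].Rows)
        {
            // Deleted rows expose only their original values.
            DataRowVersion version =
                row.RowState == DataRowState.Deleted
                    ? DataRowVersion.Original
                    : DataRowVersion.Current;
            report.WriteLine("{0}: {1}",
                row["CategoryID", version], row.RowState);
        }
        return report.ToString();
    }

    public static void Main()
    {
        Console.Write(Describe(MakeSample()));
    }
}
```

The same approach could be applied to the string returned by GetDataSetDifference( ), provided the receiving DataSet has the schema of the tables being compared.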

The sample begins by loading different subsets of data from the Categories table and displaying them in two grids on the form. The data is editable within the grids so that you can investigate how DataSet differences are reported in the DiffGram. In this example, each DataSet contains just a single table. To determine the difference between the DataSet objects, the tables within them are compared as described next, and changes are applied to a copy of each table from the first DataSet until it matches the corresponding table in the second DataSet. Once all differences in all tables are processed, the DiffGram of the DataSet holding the modified copies describes how the second DataSet differs from the first.

More specifically, as each table is processed, a copy is made of it. The data in the copy of the first table is modified to make it consistent with the data in the second table. The modified copy of the first table is then added to the DataSet containing the differences between the two DataSet objects.
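
Working on a copy matters for two reasons: edits to the copy leave the caller's table untouched, and a DataTable can belong to only one DataSet at a time, so the original table could not be added to the difference DataSet directly. The sketch below illustrates both points; CopyDemo and the sample row are assumptions made for illustration.

```csharp
using System;
using System.Data;

public static class CopyDemo
{
    // Returns the RowState of the first row in the original table
    // and in its copy after the copy has been edited.
    public static DataRowState[] Run()
    {
        DataTable original = new DataTable("TableA");
        original.Columns.Add("CategoryID", typeof(int));
        original.Columns.Add("CategoryName", typeof(string));
        original.Rows.Add(1, "Beverages");

        DataSet owner = new DataSet();
        owner.Tables.Add(original);
        owner.AcceptChanges();

        // Copy( ) duplicates schema, data, and row state, but the
        // copy does not belong to any DataSet yet.
        DataTable copy = original.Copy();
        copy.Rows[0]["CategoryName"] = "Drinks";

        // The copy can join a second DataSet; adding 'original'
        // here instead would throw because 'owner' already holds it.
        DataSet diff = new DataSet();
        diff.Tables.Add(copy);

        return new DataRowState[]
        {
            original.Rows[0].RowState,
            copy.Rows[0].RowState
        };
    }

    public static void Main()
    {
        DataRowState[] states = Run();
        Console.WriteLine("original: {0}, copy: {1}",
            states[0], states[1]);
    }
}
```

The original row stays Unchanged while the copied row becomes Modified, which is what lets the DiffGram of the copy record the difference.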

The process of modifying the data in the copy of the first table to match the data in the second table involves several steps:

  • Rows that are in the copy of the first table but not in the second table (based on the primary key value) are deleted from the copy of the first table.

  • If a row in the copy of the first table is also found in the second table, the two rows are compared column by column, and each column value in the copy that differs is replaced with the value from the second table.

  • Rows that are in the second table but not in the copy of the first table are inserted into the copy of the first table without accepting changes.
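
The steps above can be sketched as a standalone helper. Note that this is a variation, not the recipe's code: instead of the CATEGORYID_FIELD constant it looks rows up through each table's PrimaryKey, and it compares values with object.Equals( ) rather than by their string forms. The names TableSync, MakeMatch, and KeyValues are hypothetical, and the sketch assumes both tables share the same column order and primary key.

```csharp
using System;
using System.Data;

public static class TableSync
{
    // Edits 'copy' until its current rows match those of 'target'.
    // Rows in 'copy' should be Unchanged on entry (AcceptChanges)
    // so that deletes are tracked rather than removed outright.
    public static void MakeMatch(DataTable copy, DataTable target)
    {
        DataColumn[] key = copy.PrimaryKey;

        // Select( ) returns a snapshot array of the current rows.
        foreach (DataRow row1 in copy.Select())
        {
            DataRow row2 = target.Rows.Find(KeyValues(row1, key));
            if (row2 == null)
            {
                // Step 1: row is missing from the second table.
                row1.Delete();
                continue;
            }

            // Step 2: overwrite only the values that differ.
            // DBNull.Value equals itself, so nulls need no special case.
            for (int j = 0; j < copy.Columns.Count; j++)
            {
                if (!object.Equals(row1[j], row2[j]))
                    row1[j] = row2[j];
            }
        }

        // Step 3: add rows found only in the second table, leaving
        // them in the Added state (fAcceptChanges = false).
        foreach (DataRow row2 in target.Rows)
        {
            if (copy.Rows.Find(KeyValues(row2, key)) == null)
                copy.LoadDataRow(row2.ItemArray, false);
        }
    }

    // Reads the key column values from a row by ordinal position.
    private static object[] KeyValues(DataRow row, DataColumn[] key)
    {
        object[] values = new object[key.Length];
        for (int i = 0; i < key.Length; i++)
            values[i] = row[key[i].Ordinal];
        return values;
    }

    public static void Main()
    {
        DataTable a = new DataTable("T");
        a.Columns.Add("Id", typeof(int));
        a.Columns.Add("Name", typeof(string));
        a.PrimaryKey = new DataColumn[] { a.Columns["Id"] };
        a.Rows.Add(1, "one");
        a.Rows.Add(2, "two");
        a.AcceptChanges();

        DataTable b = a.Clone();
        b.Rows.Add(2, "TWO");
        b.Rows.Add(3, "three");

        DataTable copy = a.Copy();
        MakeMatch(copy, b);
        foreach (DataRow row in copy.Rows)
            Console.WriteLine(row.RowState);
    }
}
```

Comparing with object.Equals( ) avoids formatting-dependent string comparisons, but it compares reference types such as byte arrays by reference, so binary columns would need their own comparison.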
