Java Database Best Practices-Java Database Best Practices

3.2 Concurrency

Transaction management is a big enough problem worrying only about consistency, atomicity, and durability. Perhaps one of the biggest issues is isolation—how do you manage the concurrent access to database resources by many different users? The mechanism your database uses is generally hidden from the software architect and database programmer. A data architect, however, should be familiar with how their database of choice manages isolation.

3.2.1 Isolation Levels

Before talking about the various mechanisms to support transaction isolation, it is useful to examine the concept in more detail. As I mentioned earlier, isolation is the I in ACID. When a second transaction attempts to read the data being modified by a first transaction, that second transaction will not see any changes the first transaction has made. For example, if my wife read a joint savings account balance at the same time I was transferring money from savings to checking, the following events might occur:

I see a savings account balance of $1,000.
I initiate a transfer of $500 from savings to checking.
The system debits my savings account.
My wife requests the balance for the savings account and sees $1000.
The system credits my checking account.

My wife should see $1,000 as the balance in the savings account even though the system has already debited the savings account. She reads the old value because the transfer transaction is in an inconsistent state at the time of her request.

So far in this chapter, we have looked at transaction isolation only in the context of full transaction isolation—where a transaction can act on the database as if it were the only thread operating on that database. In the real world, databases must support concurrent access by multiple transactions. Business applications in particular have significant concurrency demands. Because of the performance penalty of full transaction isolation, it is rarely practical for most systems.

ANSI SQL identifies four distinct transaction isolation levels that enable you to balance performance needs with data integrity needs. In order to understand the different transaction isolation levels, however, you should first understand a few terms relating to the interaction of concurrent transactions:

Dirty read: A dirty read occurs when one transaction views the uncommitted changes of another transaction. If the original transaction rolls back its changes, the one that read its changes is said to have "dirty" data.
Nonrepeatable read: A nonrepeatable read occurs when one transaction reads different data from the same query when it is issued multiple times and other transactions have changed the rows between reads by the first transaction. In other words, a transaction that mandates repeatable reads will not see the committed changes made by other transactions. An application needs to start a new transaction to see those changes.
Phantom read: A phantom read deals with changes occurring in other transactions that would result in new rows matching your transaction's WHERE clause. For example, if you had a transaction that reads all accounts with a balance of less than $100 and your transaction performs two reads of that data, a phantom read allows for new rows to appear in the second read based on changes made by other transactions. This situation can occur if someone withdrew money between your reads. The new rows are called phantom rows.

The four ANSI SQL transaction isolation levels are:

Read uncommittted transactions: The transaction allows dirty reads, nonrepeatable reads, and phantom reads.
Read committed transactions: Only data committed to the database may be read. The transaction can, however, perform nonrepeatable and phantom reads.
Repeatable read transactions: Committed, repeatable reads as well as phantom reads are allowed. Nonrepeatable reads are not allowed.
Serializable transactions: Only committed, repeatable reads are allowed. Phantom reads are specifically disallowed at this level.

Assuming your database engine supports these levels of transaction isolation, a Java database application can control its transaction isolation level through the JDBC Connection interface using the setTransactionIsolation( ) method:

Connection conn = dataSource.getConnection( );
   
conn.setTransactionIsolation(
    Connection.TRANSACTION_READ_COMMITTED);
// perform your transaction

To find out what transaction isolation levels your database engine and driver support, use the supportsTransactionIsolationLevel( ) method in DatabaseMetaData. Table 3-1 shows the transaction isolation constants in java.sql.Connection. You may also want to check whether the database supports transactions at all (mSQL, for example, does not support them). The method supportsTransactions( ) answers this question.

Table 3-1. Constants for managing JDBC transaction isolation levels

Constant

Meaning

TRANSACTION_NONE

Transactions are not supported by the database engine. You never set the transaction isolation level with this value. It is simply a return value from drivers when transactions are not supported.

TRANSACTION_READ_UNCOMMITTED

Reads are read uncommitted. This level represents the least-restrictive form in databases in which transactions are supported.

TRANSACTION_READ_COMMITTED

Reads pull only committed data from the database.

TRANSACTION_REPEATABLE_READ

Reads are repeatable, but phantom reads are allowed.

TRANSACTION_SERIALIZABLE

Phantom reads are not allowed.

As an application architect, your job is to balance database integrity with application performance. You therefore want to select the lowest level of transaction isolation that will protect the integrity of the data as you have it structured in your data model.

BEST PRACTICE: Use the lowest transaction isolation level that will still maintain the integrity of your database.

3.2.2 Locking

The way in which your database engine manages concurrency is specific to your database engine. As long as it maintains ACIDity, it does not matter from a standards perspective how it does it. Nevertheless, database engines all use some form of locking.

Conceptually, concurrency at the database level can be understood using Java as an example. Java, of course, uses locks on objects to manage concurrent access to synchronized blocks of Java code. Consider the following three methods:

public class MyClass {
    public synchronized void first( ) {
        // do something
    }
   
    public void second( ) {
        synchronized( this ) {
            // do something else
        }
    }
   
    public void third( ) {
        synchronized( otherObject ) {
            // do yet a third thing
        }
    }
}

If two threads call first( ) at the same time, one will be allowed to execute the method to completion before the other can start. This sequencing occurs because the synchronized keyword indicates that you are locking against a monitor. In the default case, the monitor is always this. Thus the monitor in first( ) and second( ) is this and the monitor for third( ) is otherObject. Only one thread at a time can execute code monitored by a common object. In this example, first( ) and the synchronized section of second( ) cannot be executed concurrently by multiple threads. However, first( ) and third( ) can be concurrently executed by two separate threads since they are guarded by different monitors.

Databases use a different concurrency mechanism, but the Java concepts are relevant. A database allows for locks to be taken on different scopes of data. A database can lock a table at any of the following levels:

Table

Page

Row

Depending on your database engine, your lock level can also be modified as read, update, exclusive, shared, or any other number of attributes. SQL Server, for example, supports the following kinds of locks:

Intent

Shared

Update

Exclusive

Schema

Bulk update

You need a nice spreadsheet and an understanding of your database of choice to appreciate how two kinds of locks interact. For example, many different threads can acquire a shared lock. These shared locks may—depending on the database engine—prevent another thread from acquiring an update lock until the shared locks are released. Because interaction varies by database, a discussion is beyond the scope of this book.

The easiest lock level to understand is a table-level lock. In this case, only one thread can touch the locked table during the lifetime of the lock. This approach is much like using MyClass.class as the monitor for access to all methods in your MyClass class. In other words, no two threads would be able to access any MyClass instance concurrently. As you can imagine, such a trick would not perform well in a Java application. Similarly, table-level locking performs horribly for databases. Page-level locking has no Java analogy. In short, it prevents concurrent access to all of the rows in a single page of memory. If, for example, you have rows 1-3 in one page of memory and 4-6 in another, access to data from rows 1-3 with page-level locking prevents other threads from accessing any of those three rows. A concurrent thread can, however, access the data in rows 4-6.

Row-level locking is like making every nonstatic method in a Java class synchronized. It prevents concurrent access by two threads to the same row in the database. Two threads may, however, access two different rows concurrently.

On one hand, locking is necessary to guarantee proper transaction isolation in a concurrent environment. On the other, locking slows down database applications. Any transaction that has a long duration and a large number of resources associated with it is going to drag down any application. Later in the chapter, I introduce some approaches to concurrency that enable you to strike a balance between isolation and performance.

[ Team LiB ]