Java Database Best Practices-Java Database Best Practices

10.4 Data Management

The first thing you will probably want to do with a newly created table is add data to it. Once the data is in place, you need to maintain it—add to it, modify it, and perhaps even delete it.

10.4.1 Inserts

Creating a row in a table is one of the more straightforward concepts in SQL. The standard form of the INSERT statement is:

INSERT [INTO] table_or_view_name (column1, column2, ..., columnN)
{ [DEFAULT] VALUES | VALUES (value1, value2, ..., valueN)
| select_statement }

You specify the columns followed by the values to populate those columns for the new row. When inserting data into numeric fields, you can insert the value as is; for all other fields, you must wrap them in single quotes. For example, to insert a row of data into a table of addresses, you might issue the following command:

INSERT INTO Address (name, address, city, state, phone, age)
VALUES('Robert Smith', '123 Fascination St.', 'New London', 'CT',
       '(800) 555-1234', 43)

In addition to the direct specification of the values to add, you can populate the table with a new row containing default values or even from the results of some other query. For example, to insert the results from a query as new rows in a table, you might execute the following SQL:

INSERT INTO FavoriteSong ( id, name, album, artist )
SELECT Song.id, Song.name, Album.title, Artist.name
FROM Song, Album, Artist
WHERE Song.ranking > 4
AND Song.album = Album.id
AND Album.artist = Artist.id

You should note that the number of columns in the INSERT call matches the number of columns in the SELECT call. In addition, the data types for the INSERT columns must match the data types for the corresponding SELECT columns. Finally, the SELECT clause in an INSERT statement cannot contain an ORDER BY modifier and cannot be selected from the same table in which the INSERT occurs.

10.4.2 Primary Keys

The best kind of primary key is one that has absolutely no meaning in the database except to act as a primary key. When you use information such as a username or an email address as a primary key, you are in effect saying that the username or email address is somehow an intrinsic part of who that person is. If that person ever changes the username or email address, you will have to go to great lengths to ensure the integrity of the data in the database. Consequently, it is a better design principle to use meaningless numbers as primary keys.

You thus need a mechanism for generating meaningless, yet unique, numbers every time you insert a new row. Every database provides some kind of extremely proprietary tool for generating unique identifiers. They differ so vastly that I cannot even begin to provide a generic description of unique identifier generation as I can with most SQL elements. Because of how greatly they differ, it is a good idea to simply avoid the proprietary database tools. Chapter 4 provides a database-independent approach to sequence generation.

10.4.3 Updates

The insertion of new rows into a database is just the start of data management. Unless your database is read-only, you will probably also need to make periodic changes to the data. The standard SQL modification statement looks like this:

UPDATE table_or_view_name
SET column1={DEFAULT |value} [, ... ]
[WHERE clause]

You specifically name the table you want to update and the values you want to assign in the SET clause and then identify the rows to be affected in the WHERE clause. If you fail to specify a WHERE clause, the database will update every row in the table.

In addition to assigning literal values to a column, you can also calculate the values. You can even calculate the value based on a value in another column:

UPDATE Project
SET end_year = begin_year+5

This command sets the value in the end_year column equal to the value in the begin_year column, plus 5, for each row in that table.

10.4.4 The WHERE Clause

The previous section introduced one of the most important SQL concepts, the WHERE clause. In SQL, a WHERE clause enables you to pick out specific rows in a table by specifying a value (like a primary key) that must be matched by the column in question. For example:

UPDATE Band
SET leadSinger = 'Ian Anderson'
WHERE id = 8

This UPDATE specifies that you should change only the leadSinger column for the row where id is 8. If the specified column is not a unique index, the WHERE clause may match multiple rows. Many SQL commands employ WHERE clauses to help pick out the rows on which you wish to operate. Because the columns in the WHERE clause are columns on which you search, you should generally have indexes created around whatever combinations you commonly use. We discuss the kinds of comparisons you can perform in the WHERE clause later in the chapter.

10.4.5 Deletes

Deleting data is a straightforward operation. You simply specify the table followed by a WHERE clause that identifies the rows you want to delete:

DELETE FROM table_name [WHERE clause]

As with other commands that accept a WHERE clause, the WHERE clause is optional. If you omit it, you will delete all of the records in the table! Of all the destructive commands in SQL, this is the easiest one to issue by mistake.

10.4.6 Queries

The last common SQL command, SELECT, enables you to view the data in the database. This action is by far the most common action performed in SQL. While data entry and modifications do happen on occasion, most databases spend the vast majority of their lives serving up data for reading. The general form of the SELECT statement is as follows:

SELECT [ALL | DISTINCT] column1 [, column2, ..., columnN ]
FROM table1 [, table2, ..., tableN ]
[JOIN condition]
[WHERE clause]
[GROUP BY column_list]
[HAVING condition]
[ORDER BY column_list [ASC | DESC]]

The SELECT statement enables you to identify the columns you want from one or more tables. The WHERE clause identifies the rows with the data you seek.

10.4.6.1 Basic queries

The variations on this syntax are numerous. The simplest form is:

SELECT 1;

This simple, though completely useless query returns a result set with a single row containing a single column with the value of 1. A more useful version of this query might be the MySQL query that tells you what database you are using:

mysql> SELECT DATABASE( );
+------------+
| DATABASE( ) |
+------------+
| jtest       |
+------------+
1 row in set (0.01 sec)

The expression DATABASE( ) is a MySQL function that returns the name of the current database. I will cover functions in more detail later in the chapter. Nevertheless, you can see how simple SQL can provide a quick-and-dirty way of finding out important information.

Most of the time, however, you should use slightly more complex queries that help you pull data from a table in the database. The first part of a SELECT statement enumerates the columns you wish to retrieve. You may specify a * to say that you want to select all columns. The FROM clause specifies which tables those columns come from.

The other optional clauses all determine what rows you are selecting and how to display the results. By now, you should feel comfortable with the WHERE clause. In the case of a SELECT statement, the WHERE clause tells the database to return only the rows that match the specified clause. I will cover the more complex clauses later in this chapter.

10.4.6.2 Aliasing

When you use column names that are fully qualified with their table and column names, the names can grow to be quite unwieldy. In addition, when referencing SQL functions (which will be discussed later in the chapter), you will likely find it cumbersome to refer to the same function more than once within a statement. You can get around these issues by using aliases. An alias is usually a shorter and more descriptive way of referring to a cumbersome name. You can use it anywhere in the same SQL statement in place of the longer name. For example:

# A column alias
SELECT long_field_names_are_annoying AS myfield
FROM table_name
WHERE myfield = 'Joe'
   
# A table alias
SELECT people.names, tests.score 
FROM tests, really_long_people_table_name AS people

10.4.6.3 Ordering

The results from a SELECT are, by default, indeterminate in the order they will appear. You can tell a database to order any results you see by a certain column. For example, if you specify that a query should order the results by last_name, then the results will appear alphabetized according to the last_name value. Ordering is handled by the ORDER BY clause:

SELECT last_name, first_name, age
FROM people
ORDER BY last_name, first_name

In this situation, we are ordering by two columns. You can order by any number of columns.

If you want to see things in reverse order, add the DESC (descending) keyword:

ORDER BY last_name DESC

The DESC keyword applies only to the field that comes directly before it. If you are sorting on multiple fields, only the field directly before DESC is reversed; the others are sorted in ascending order.

10.4.6.4 Grouping

Grouping lets you group rows with matching values for a specific column into a single row in order to operate on them together. You usually do this to perform aggregate functions on the results. I will go into functions a little later in the chapter.

Consider the following:

mysql> SELECT name, rank, salary FROM people;
+--------------+----------+--------+
| name           | rank      | salary  |
+--------------+----------+--------+
| Jack Smith     | Private    |  23000 |
| Jane Walker    | General    | 125000 |
| June Sanders   | Private    |  22000 |
| John Barker    | Sergeant   |  45000 |
| Jim Castle     | Sergeant   |  38000 |
+--------------+----------+--------+
5 rows in set (0.01 sec)

If you want to get a list of different ranks, you can use the GROUP BY clause to get a full account of the ranks:

mysql> SELECT rank FROM people GROUP BY rank;
+----------+
| rank     |
+----------+
| General  |
| Private  |
| Sergeant |
+----------+
3 rows in set (0.01 sec)

You should not, however, think of these results as simply a listing of the different ranks. The GROUP BY clause actually groups all of the rows matching the WHERE clause (in this case, every row) based on the GROUP BY clause. The two privates are thus grouped together into a single row with the rank Private. The two sergeants are similarly aggregated. With the individuals grouped according to rank, you can find out the average salary for each rank. Again, we will further discuss the functions you see in this example later in the chapter.

mysql> SELECT rank, AVG(salary) FROM people GROUP BY rank;
+----------+-------------+
| rank     | AVG(salary) |
+----------+-------------+
| General  | 125000.0000 |
| Private  |  22500.0000 |
| Sergeant |  41500.0000 |
+----------+-------------+
3 rows in set (0.04 sec)

Here you see the true power of grouping. This query uses an aggregate function, AVG( ) to operate on all of the rows grouped together for each row. In this case, the salaries of the two privates (23000 and 22000) are grouped together in the same row, and the AVG( ) function is applied to them.

The power of ordering and grouping combined with the utility of SQL functions enables you to do a great deal of data manipulation even before you retrieve the data from the server. However, you should take great care not to rely too heavily on this power. While it may seem more efficient to place as much processing load as possible onto the database server, this is not really the case. Your client application is dedicated to the needs of a particular client, while the server is shared by many clients. Because of the greater amount of work a server already has to do, it is almost always more efficient to place as little load as possible on the database server.

10.4.7 Operators

So far, we have used the = operator for the obvious task of verifying that two values in a WHERE clause equal each other. Other fairly basic operators include <>, >, <, <=, and >=. Note that though ANSI SQL requires the use of <> to check for inequality, most database engines also support !=. Table 10-4 contains a full set of ANSI SQL operators.

Not all databases support all operators. In addition, MySQL and SQL Server support various bitwise operators, including &, |, ^, <<, and >>.

Table 10-4. ANSI SQL Operators

Operator

Context

Description

+

Arithmetic

Addition (also works for date addition on some database engines)

-

Arithmetic

Subtraction

*

Arithmetic

Multiplication

/

Arithmetic

Division

=

Comparison

Equal

<>

Comparison

Not equal

<

Comparison

Less than

>

Comparison

Greater than

<=

Comparison

Less than or equal to

>=

Comparison

Greater than or equal to

BETWEEN

Comparison

Between two values

IN

Comparison

Membership in a list

LIKE

Comparison

Similarity

AND

Logical

And

OR

Logical

Or

NOT

Logical

Negation

+

Unary

Positive

-

Unary

Negative

~

Unary

Complement

ANSI SQL operators have the following order of precedence:

+ - ~ (unary)
NOT
* / %
+ - (arithmetic)
< <= > >= = <> IN LIKE
BETWEEN IN
AND
OR

Precedence moves from left to right for operators of equal precedence. You can override the rules of precedence through the use of parentheses, in which case elements within the parentheses have higher precedence. For expressions with nested parentheses, the innermost parentheses are evaluated first.

10.4.7.1 Logical operators

SQL's logical operators—AND, OR, and NOT—let you build more dynamic WHERE clauses. The AND and OR operators specifically let you add multiple criteria to a query:

SELECT name
FROM User
WHERE age > 18 AND status = 'RESIDENT';

This sample query provides a list of all users who are residents and are old enough to vote. In other words, it finds every resident 18 years or older.

You can build increasingly complex queries and override SQL's order of precedence with parentheses. The parentheses tell the database which comparisons to evaluate first:

SELECT name 
FROM User
WHERE (age > 18 AND status = 'RESIDENT')
OR (age > 18 AND status = 'APPLICANT');

In this more complex query, we are looking for anyone currently eligible to vote as well as people who might be eligible in the near future. You can also use the NOT operator to negate an entire expression:

SELECT name
FROM User
WHERE NOT (age > 18 AND status = 'RESIDENT');

In this case, negation provides all the users who are not eligible to vote.

10.4.7.2 Comparisons with NULL

NULL is a tricky concept for most people new to databases to understand. As in other programming languages, NULL is not a value, but the absence of a value. This concept is useful, for example, if you have a customer-profiling database that gradually gathers information about your customers as they offer it.

When you first create a record, for example, you may not know how many pets the customer has. You want that column to hold NULL instead of 0 so you can tell the difference between customers with no pets and customers whose pet ownership is unknown.

The concept of NULL gets a little funny when you use it in SQL calculations. Many programming languages use NULL as simply another kind of value. In Java, the following syntax evaluates to true when the variable is NULL and false when it is not:

str =  = NULL

The similar expression in SQL, col = NULL, is neither true nor false—it is always NULL, no matter what the value of the COL column. The following query will therefore not act as you might expect:

SELECT title FROM Book WHERE author = NULL;

Because the WHERE clause will never evaluate to true no matter what value is in the database for the author column, this query always provides an empty result set—even when you have author columns with NULL values. To test for "nullness," use the IS NULL and IS NOT NULL operators:

SELECT title FROM Book WHERE author IS NULL;

10.4.7.3 Membership tests

Sometimes applications need to check whether a value is a member of a set of values or within a particular range. The IN operator helps with the former:

SELECT title FROM Book WHERE author IN ('Stephen King', 'Richard Bachman');

This query will return the titles of all books written by Stephen King.^[5] Similarly, you can check for all books by authors other than Stephen King with the NOT IN operator.

^[5] Richard Bachman is a pseudonym used by Stephen King for some of his books.

To determine whether a value is in a particular range, use the BETWEEN operator:

SELECT title FROM Book WHERE bookID BETWEEN 1  AND 100;

Both of these simple examples could, of course, be replicated with the basic operators. The Stephen King check, for example, could have been done by using the = operator and an OR:

SELECT title 
FROM Book
WHERE author = 'Stephen King' OR author = 'Richard Bachman';

The check on book IDs could also have been done with an OR clause using the >= and <= or > and < operators. As your queries get more complex, however, membership tests can help you build both readable and better-performing queries than those you might create with the basic operators.

10.4.8 Functions

Functions in SQL are similar to functions in other programming languages such as C and Perl. A function takes zero or more arguments and returns some value. For example, the function SQRT(16) returns 4. Within an SQL SELECT statement, functions may be used in one of two ways:

As a value to be retrieved

This form involves a function in the place of a column in the list of columns to be retrieved. The return value of the function, evaluated for each selected row, is part of the returned result set as if it were a column in the database.^[6]

^[6] You can use aliasing, covered earlier in the chapter, to give the resulting columns "friendly" names.

This query selects the name of each event and today's date for all events more recent than the given time:

SELECT name, CURRENT_DATE( ) 
FROM Event
WHERE time > 90534323

This query selects the title of a paper, the full text of the paper, and the length of the text in bytes for all of the papers authored by Douglas Adams. The LENGTH( ) function returns the character length of a given string:

# The LENGTH( ) function returns the character length of
# a given string.
SELECT title, text, LENGTH(text)
FROM Paper
WHERE author = 'Douglas Adams'

As part of a WHERE clause

This form involves a function used in place of a constant when evaluating a WHERE clause. The value of the function is used for comparison for each row of the table.

This query randomly selects the name of an entry from a pool of 35 entries. The RAND( )function generates a random number between 0 and 1. This random value is then multiplied by 34 to turn the value into a number between 0 and 34. Incrementing the value by 1 provides a number between 1 and 35. The ROUND( ) function rounds the result to the nearest integer. The result is a whole number between 1 and 35 and will therefore match one of the ID numbers in the table:

SELECT name 
FROM Entry
WHERE id = ROUND( (RAND( )*34) + 1 )

You may use functions in both the value list and the WHERE clause. This query selects the name and date of each event less than a day old:

SELECT name
FROM Event
WHERE time > (CURRENT_TIMESTAMP( ) - (60 * 60 * 24) )

You may also use the value of a table field within a function. This example returns the names of people who used their names as passwords. The ENCRYPT( ) function from MySQL returns a Unix password-style encryption of the specified string using the supplied two-character salt. The LEFT( ) function returns the left-most n characters of the specified string:

SELECT name
FROM People
WHERE password = ENCRYPT(name, LEFT(name, 2))

Though there is a basic set of ANSI SQL functions, the number of non-ANSI functions in any database engine likely outnumbers their ANSI counterparts. When programming in SQL, it is therefore always a good idea to have a reference specific to your database at your side.

10.4.9 Joins

Joins put the "relational" in relational databases by enabling you to relate the data in one table with data in other tables. The basic form of a join is sometimes described as an inner join. Joining tables is a matter of specifying equality in columns from two tables:

SELECT Book.title, Author.name 
FROM Author, Book
WHERE Book.author = Author.id

This query pulls columns from two different tables when a relationship exists between rows in the two tables. Specifically, this query looks for situations in which the value of the Author column in the Book table matches the id value in the Author table. Consider a database in which the Book table looks like Table 10-5, and the Author table looks like Table 10-6.

Table 10-5. A Book table

ID

Title

Author

1

Slaughterhouse 5

4

2

Last Rites

2

3

The Vampire Lestat

3

4

The Shining

1

Table 10-6. An Author table

ID

Name

1

Stephen King

2

Terry Pratchett

3

Anne Rice

4

Kurt Vonnegut

5

Douglas Adams

An inner join creates a virtual table by combining the fields of both tables for rows that satisfy the query in both tables. In our example, the query specifies that the author field of the Book table must be identical to the id field of the Author table. The query's result would look like Table 10-7.

Table 10-7. Query results based on an inner join

Book title

Author name

The Shining

Stephen King

Last Rites

Terry Pratchet

Slaughterhouse 5

Kurt Vonnegut

The Vampire Lestat

Anne Rice

Douglas Adams is nowhere to be found in these results. He is left out because there is no value for his Author.id value found in the author column of the Book table. In other words, he did not write any of the books in our database! An inner join contains only those rows that match the query exactly.

Just about every database supports inner joins via the WHERE clause. Nevertheless, this approach violates the ANSI standard. ANSI-compliant joins—inner and otherwise—occur in the JOIN clause of your SELECT statement. The book query is properly written in ANSI SQL as:

SELECT Book.title, Author.name
FROM Author
JOIN Book ON Author.id = Book.author

What you get in your result set involving a join depends on the join type. You have already seen how an inner join fails to show Douglas Adams since he has no books listed in the Book table. ANSI SQL supports the following kinds of joins:

Inner join: An inner join is the default kind of join. Under an inner join, any rows that do not match from either table are discarded from the result set.
Left join: A left join enables us to see that we have Douglas Adams in the database with no books. In short, all rows from the left side of the join are included in the results regardless of whether they match a row from the right side of the join. Where no match exists, NULL values are shown for fields from the table on the right side of the join.
Right join: Right and left joins are variations of a single kind of join known as an outer join. As a left join provides all the rows from the left side of the join, a right join provides all the rows from the right side.

Because these two joins are only semantically different, MySQL has chosen not to support right joins. If you need a right join, you can simply restate your SQL as a left join in order to achieve the same results.

Full join

A full join is a combination of a right and left join. In other words, all rows from both tables appear in the result set. If no match exists for each side, NULL values appear in the result set.

Natural join

A natural join looks for columns in the joined tables having identical names, data types, and values. For example, the inner join we performed on the Book and Author tables could be made a natural join if the author column in the Book table were renamed authorID and the id column in the Author table renamed authorID:

SELECT Book.title, Author.name
FROM Author
NATURAL JOIN Book;

Cross join

A cross join provides the full data from the joined two tables and is the same as specifying no JOIN or WHERE clause at all. I cannot think of any reason why you would want to perform a cross join.

The left outer join that gives us Douglas Adams in our results looks like:

SELECT Book.title, Author.name 
FROM Author
LEFT JOIN Book ON Book.author = Author.id

The results of the outer join would therefore look like this:

+--------------------+----------------+
| Book.title         | Author.name    |
+--------------------+----------------+
| The Shining        | Stephen King   |
| Last Rites         | Terry Pratchett|
| The Vampire Lestat | Anne Rice      |
| Slaughterhouse 5   | Kurt Vonnegut  |
| NULL               | Douglas Adams  |
+--------------------+----------------+

[ Team LiB ]