[ Team LiB ] |
10.4 Data ManagementThe first thing you will probably want to do with a newly created table is add data to it. Once the data is in place, you need to maintain it—add to it, modify it, and perhaps even delete it. 10.4.1 InsertsCreating a row in a table is one of the more straightforward concepts in SQL. The standard form of the INSERT statement is: INSERT [INTO] table_or_view_name (column1, column2, ..., columnN) { [DEFAULT] VALUES | VALUES (value1, value2, ..., valueN) | select_statement } You specify the columns followed by the values to populate those columns for the new row. When inserting data into numeric fields, you can insert the value as is; for all other fields, you must wrap them in single quotes. For example, to insert a row of data into a table of addresses, you might issue the following command: INSERT INTO Address (name, address, city, state, phone, age) VALUES('Robert Smith', '123 Fascination St.', 'New London', 'CT', '(800) 555-1234', 43) In addition to the direct specification of the values to add, you can populate the table with a new row containing default values or even from the results of some other query. For example, to insert the results from a query as new rows in a table, you might execute the following SQL: INSERT INTO FavoriteSong ( id, name, album, artist ) SELECT Song.id, Song.name, Album.title, Artist.name FROM Song, Album, Artist WHERE Song.ranking > 4 AND Song.album = Album.id AND Album.artist = Artist.id You should note that the number of columns in the INSERT call matches the number of columns in the SELECT call. In addition, the data types for the INSERT columns must match the data types for the corresponding SELECT columns. Finally, the SELECT clause in an INSERT statement cannot contain an ORDER BY modifier and cannot be selected from the same table in which the INSERT occurs. 10.4.2 Primary KeysThe best kind of primary key is one that has absolutely no meaning in the database except to act as a primary key. When you use information such as a username or an email address as a primary key, you are in effect saying that the username or email address is somehow an intrinsic part of who that person is. If that person ever changes the username or email address, you will have to go to great lengths to ensure the integrity of the data in the database. Consequently, it is a better design principle to use meaningless numbers as primary keys. You thus need a mechanism for generating meaningless, yet unique, numbers every time you insert a new row. Every database provides some kind of extremely proprietary tool for generating unique identifiers. They differ so vastly that I cannot even begin to provide a generic description of unique identifier generation as I can with most SQL elements. Because of how greatly they differ, it is a good idea to simply avoid the proprietary database tools. Chapter 4 provides a database-independent approach to sequence generation. 10.4.3 UpdatesThe insertion of new rows into a database is just the start of data management. Unless your database is read-only, you will probably also need to make periodic changes to the data. The standard SQL modification statement looks like this: UPDATE table_or_view_name SET column1={DEFAULT |value} [, ... ] [WHERE clause] You specifically name the table you want to update and the values you want to assign in the SET clause and then identify the rows to be affected in the WHERE clause. If you fail to specify a WHERE clause, the database will update every row in the table. In addition to assigning literal values to a column, you can also calculate the values. You can even calculate the value based on a value in another column: UPDATE Project SET end_year = begin_year+5 This command sets the value in the end_year column equal to the value in the begin_year column, plus 5, for each row in that table. 10.4.4 The WHERE ClauseThe previous section introduced one of the most important SQL concepts, the WHERE clause. In SQL, a WHERE clause enables you to pick out specific rows in a table by specifying a value (like a primary key) that must be matched by the column in question. For example: UPDATE Band SET leadSinger = 'Ian Anderson' WHERE id = 8 This UPDATE specifies that you should change only the leadSinger column for the row where id is 8. If the specified column is not a unique index, the WHERE clause may match multiple rows. Many SQL commands employ WHERE clauses to help pick out the rows on which you wish to operate. Because the columns in the WHERE clause are columns on which you search, you should generally have indexes created around whatever combinations you commonly use. We discuss the kinds of comparisons you can perform in the WHERE clause later in the chapter. 10.4.5 DeletesDeleting data is a straightforward operation. You simply specify the table followed by a WHERE clause that identifies the rows you want to delete: DELETE FROM table_name [WHERE clause] As with other commands that accept a WHERE clause, the WHERE clause is optional. If you omit it, you will delete all of the records in the table! Of all the destructive commands in SQL, this is the easiest one to issue by mistake. 10.4.6 QueriesThe last common SQL command, SELECT, enables you to view the data in the database. This action is by far the most common action performed in SQL. While data entry and modifications do happen on occasion, most databases spend the vast majority of their lives serving up data for reading. The general form of the SELECT statement is as follows: SELECT [ALL | DISTINCT] column1 [, column2, ..., columnN ] FROM table1 [, table2, ..., tableN ] [JOIN condition] [WHERE clause] [GROUP BY column_list] [HAVING condition] [ORDER BY column_list [ASC | DESC]] The SELECT statement enables you to identify the columns you want from one or more tables. The WHERE clause identifies the rows with the data you seek. 10.4.6.1 Basic queriesThe variations on this syntax are numerous. The simplest form is: SELECT 1; This simple, though completely useless query returns a result set with a single row containing a single column with the value of 1. A more useful version of this query might be the MySQL query that tells you what database you are using: mysql> SELECT DATABASE( ); +------------+ | DATABASE( ) | +------------+ | jtest | +------------+ 1 row in set (0.01 sec) The expression DATABASE( ) is a MySQL function that returns the name of the current database. I will cover functions in more detail later in the chapter. Nevertheless, you can see how simple SQL can provide a quick-and-dirty way of finding out important information. Most of the time, however, you should use slightly more complex queries that help you pull data from a table in the database. The first part of a SELECT statement enumerates the columns you wish to retrieve. You may specify a * to say that you want to select all columns. The FROM clause specifies which tables those columns come from. The other optional clauses all determine what rows you are selecting and how to display the results. By now, you should feel comfortable with the WHERE clause. In the case of a SELECT statement, the WHERE clause tells the database to return only the rows that match the specified clause. I will cover the more complex clauses later in this chapter. 10.4.6.2 AliasingWhen you use column names that are fully qualified with their table and column names, the names can grow to be quite unwieldy. In addition, when referencing SQL functions (which will be discussed later in the chapter), you will likely find it cumbersome to refer to the same function more than once within a statement. You can get around these issues by using aliases. An alias is usually a shorter and more descriptive way of referring to a cumbersome name. You can use it anywhere in the same SQL statement in place of the longer name. For example: # A column alias SELECT long_field_names_are_annoying AS myfield FROM table_name WHERE myfield = 'Joe' # A table alias SELECT people.names, tests.score FROM tests, really_long_people_table_name AS people 10.4.6.3 OrderingThe results from a SELECT are, by default, indeterminate in the order they will appear. You can tell a database to order any results you see by a certain column. For example, if you specify that a query should order the results by last_name, then the results will appear alphabetized according to the last_name value. Ordering is handled by the ORDER BY clause: SELECT last_name, first_name, age FROM people ORDER BY last_name, first_name In this situation, we are ordering by two columns. You can order by any number of columns. If you want to see things in reverse order, add the DESC (descending) keyword: ORDER BY last_name DESC The DESC keyword applies only to the field that comes directly before it. If you are sorting on multiple fields, only the field directly before DESC is reversed; the others are sorted in ascending order. 10.4.6.4 GroupingGrouping lets you group rows with matching values for a specific column into a single row in order to operate on them together. You usually do this to perform aggregate functions on the results. I will go into functions a little later in the chapter. Consider the following: mysql> SELECT name, rank, salary FROM people; +--------------+----------+--------+ | name | rank | salary | +--------------+----------+--------+ | Jack Smith | Private | 23000 | | Jane Walker | General | 125000 | | June Sanders | Private | 22000 | | John Barker | Sergeant | 45000 | | Jim Castle | Sergeant | 38000 | +--------------+----------+--------+ 5 rows in set (0.01 sec) If you want to get a list of different ranks, you can use the GROUP BY clause to get a full account of the ranks: mysql> SELECT rank FROM people GROUP BY rank; +----------+ | rank | +----------+ | General | | Private | | Sergeant | +----------+ 3 rows in set (0.01 sec) You should not, however, think of these results as simply a listing of the different ranks. The GROUP BY clause actually groups all of the rows matching the WHERE clause (in this case, every row) based on the GROUP BY clause. The two privates are thus grouped together into a single row with the rank Private. The two sergeants are similarly aggregated. With the individuals grouped according to rank, you can find out the average salary for each rank. Again, we will further discuss the functions you see in this example later in the chapter. mysql> SELECT rank, AVG(salary) FROM people GROUP BY rank; +----------+-------------+ | rank | AVG(salary) | +----------+-------------+ | General | 125000.0000 | | Private | 22500.0000 | | Sergeant | 41500.0000 | +----------+-------------+ 3 rows in set (0.04 sec) Here you see the true power of grouping. This query uses an aggregate function, AVG( ) to operate on all of the rows grouped together for each row. In this case, the salaries of the two privates (23000 and 22000) are grouped together in the same row, and the AVG( ) function is applied to them. The power of ordering and grouping combined with the utility of SQL functions enables you to do a great deal of data manipulation even before you retrieve the data from the server. However, you should take great care not to rely too heavily on this power. While it may seem more efficient to place as much processing load as possible onto the database server, this is not really the case. Your client application is dedicated to the needs of a particular client, while the server is shared by many clients. Because of the greater amount of work a server already has to do, it is almost always more efficient to place as little load as possible on the database server. 10.4.7 OperatorsSo far, we have used the = operator for the obvious task of verifying that two values in a WHERE clause equal each other. Other fairly basic operators include <>, >, <, <=, and >=. Note that though ANSI SQL requires the use of <> to check for inequality, most database engines also support !=. Table 10-4 contains a full set of ANSI SQL operators.
ANSI SQL operators have the following order of precedence:
Precedence moves from left to right for operators of equal precedence. You can override the rules of precedence through the use of parentheses, in which case elements within the parentheses have higher precedence. For expressions with nested parentheses, the innermost parentheses are evaluated first. 10.4.7.1 Logical operatorsSQL's logical operators—AND, OR, and NOT—let you build more dynamic WHERE clauses. The AND and OR operators specifically let you add multiple criteria to a query: SELECT name FROM User WHERE age > 18 AND status = 'RESIDENT'; This sample query provides a list of all users who are residents and are old enough to vote. In other words, it finds every resident 18 years or older. You can build increasingly complex queries and override SQL's order of precedence with parentheses. The parentheses tell the database which comparisons to evaluate first: SELECT name FROM User WHERE (age > 18 AND status = 'RESIDENT') OR (age > 18 AND status = 'APPLICANT'); In this more complex query, we are looking for anyone currently eligible to vote as well as people who might be eligible in the near future. You can also use the NOT operator to negate an entire expression: SELECT name FROM User WHERE NOT (age > 18 AND status = 'RESIDENT'); In this case, negation provides all the users who are not eligible to vote. 10.4.7.2 Comparisons with NULLNULL is a tricky concept for most people new to databases to understand. As in other programming languages, NULL is not a value, but the absence of a value. This concept is useful, for example, if you have a customer-profiling database that gradually gathers information about your customers as they offer it. When you first create a record, for example, you may not know how many pets the customer has. You want that column to hold NULL instead of 0 so you can tell the difference between customers with no pets and customers whose pet ownership is unknown. The concept of NULL gets a little funny when you use it in SQL calculations. Many programming languages use NULL as simply another kind of value. In Java, the following syntax evaluates to true when the variable is NULL and false when it is not: str = = NULL The similar expression in SQL, col = NULL, is neither true nor false—it is always NULL, no matter what the value of the COL column. The following query will therefore not act as you might expect: SELECT title FROM Book WHERE author = NULL; Because the WHERE clause will never evaluate to true no matter what value is in the database for the author column, this query always provides an empty result set—even when you have author columns with NULL values. To test for "nullness," use the IS NULL and IS NOT NULL operators: SELECT title FROM Book WHERE author IS NULL; 10.4.7.3 Membership testsSometimes applications need to check whether a value is a member of a set of values or within a particular range. The IN operator helps with the former: SELECT title FROM Book WHERE author IN ('Stephen King', 'Richard Bachman'); This query will return the titles of all books written by Stephen King.[5] Similarly, you can check for all books by authors other than Stephen King with the NOT IN operator.
To determine whether a value is in a particular range, use the BETWEEN operator: SELECT title FROM Book WHERE bookID BETWEEN 1 AND 100; Both of these simple examples could, of course, be replicated with the basic operators. The Stephen King check, for example, could have been done by using the = operator and an OR: SELECT title FROM Book WHERE author = 'Stephen King' OR author = 'Richard Bachman'; The check on book IDs could also have been done with an OR clause using the >= and <= or > and < operators. As your queries get more complex, however, membership tests can help you build both readable and better-performing queries than those you might create with the basic operators. 10.4.8 FunctionsFunctions in SQL are similar to functions in other programming languages such as C and Perl. A function takes zero or more arguments and returns some value. For example, the function SQRT(16) returns 4. Within an SQL SELECT statement, functions may be used in one of two ways:
Though there is a basic set of ANSI SQL functions, the number of non-ANSI functions in any database engine likely outnumbers their ANSI counterparts. When programming in SQL, it is therefore always a good idea to have a reference specific to your database at your side. 10.4.9 JoinsJoins put the "relational" in relational databases by enabling you to relate the data in one table with data in other tables. The basic form of a join is sometimes described as an inner join. Joining tables is a matter of specifying equality in columns from two tables: SELECT Book.title, Author.name FROM Author, Book WHERE Book.author = Author.id This query pulls columns from two different tables when a relationship exists between rows in the two tables. Specifically, this query looks for situations in which the value of the Author column in the Book table matches the id value in the Author table. Consider a database in which the Book table looks like Table 10-5, and the Author table looks like Table 10-6.
An inner join creates a virtual table by combining the fields of both tables for rows that satisfy the query in both tables. In our example, the query specifies that the author field of the Book table must be identical to the id field of the Author table. The query's result would look like Table 10-7.
Douglas Adams is nowhere to be found in these results. He is left out because there is no value for his Author.id value found in the author column of the Book table. In other words, he did not write any of the books in our database! An inner join contains only those rows that match the query exactly. Just about every database supports inner joins via the WHERE clause. Nevertheless, this approach violates the ANSI standard. ANSI-compliant joins—inner and otherwise—occur in the JOIN clause of your SELECT statement. The book query is properly written in ANSI SQL as: SELECT Book.title, Author.name FROM Author JOIN Book ON Author.id = Book.author What you get in your result set involving a join depends on the join type. You have already seen how an inner join fails to show Douglas Adams since he has no books listed in the Book table. ANSI SQL supports the following kinds of joins:
The left outer join that gives us Douglas Adams in our results looks like: SELECT Book.title, Author.name FROM Author LEFT JOIN Book ON Book.author = Author.id The results of the outer join would therefore look like this: +--------------------+----------------+ | Book.title | Author.name | +--------------------+----------------+ | The Shining | Stephen King | | Last Rites | Terry Pratchett| | The Vampire Lestat | Anne Rice | | Slaughterhouse 5 | Kurt Vonnegut | | NULL | Douglas Adams | +--------------------+----------------+ |
[ Team LiB ] |