DNS and BIND, 4th Edition

6.1 The Resolver

We introduced resolvers way back in Chapter 2, but we didn't say much more about them. The resolver, you'll remember, is the client half of the Domain Name System. It's responsible for translating a program's request for host information into a query to a name server and for translating the response into an answer for the program.

We haven't done any resolver configuration yet, because the occasion for it hasn't arisen. When we set up our name servers in Chapter 4, the resolver's default behavior worked just fine for our purposes. But if we'd needed the resolver to do more than what it does by default or to behave differently from the default, we would have had to configure the resolver.

There's one thing we should mention up front: what we'll be describing in the next few sections is the behavior of the vanilla BIND 8.2.3 resolver in the absence of other naming services. Not all resolvers behave quite this way; some vendors still ship resolvers based on earlier versions of the DNS code, and some have implemented special resolver functionality that lets you modify the resolver algorithm. Whenever we think it's important, we'll point out differences between the behavior of the 8.2.3 BIND resolver and that of earlier resolvers, particularly the 4.8.3 and 4.9 resolvers, which many vendors were shipping when we last updated this book. We'll cover various vendors' extensions later in this chapter.

So what exactly does the resolver allow you to configure? Most resolvers let you configure at least three aspects of the resolver's behavior: the local domain name, the search list, and the name server(s) that the resolver queries. Many Unix vendors also allow you to configure other resolver behavior through nonstandard extensions to DNS. Sometimes these extensions are necessary to cope with other software, such as Sun's Network Information Service (NIS); sometimes they're simply value added by the vendor.^[1]

^[1] NIS used to be called "Yellow Pages" or "YP," but its name was changed to NIS because the British phone company held a trademark on the name Yellow Pages.

Almost all resolver configuration is done in the file /etc/resolv.conf (this might be /usr/etc/resolv.conf or something similar on your host—check the resolver manual page, usually in section 4 or 5, to make sure). There are five main directives you can use in resolv.conf: the domain directive, the search directive, the nameserver directive, the sortlist directive, and the options directive. These directives control the behavior of the resolver. There are other, vendor-specific directives available on some versions of Unix—we'll discuss them at the end of this chapter.

6.1.1 The Local Domain Name

The local domain name is the domain name in which the resolver resides. In most situations, it's the domain name of the zone in which you'd find the host running the resolver. For example, the resolver on the host terminator.movie.edu would probably use movie.edu as its local domain name.

The resolver uses the local domain name to interpret domain names that aren't fully qualified. For example, when you add an entry like:

relay bernie

to your .rhosts file, the name relay is assumed to be in your local domain. This makes a lot more sense than allowing access to a user called bernie on every host on the Internet whose domain name starts with relay. Other authorization files like hosts.equiv and hosts.lpd work the same way.

Normally, the local domain name is determined from the host's hostname; the local domain name is everything after the first "." in the name. If the name doesn't contain a ".", the local domain is assumed to be the root domain. So the hostname asylum.sf.ca.us implies a local domain name of sf.ca.us, while the hostname dogbert implies a root local domain, which probably isn't correct, given that there are very few hosts with single-label domain names.^[2]

^[2] There are actually some single-label domain names that point to addresses, like cc.
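The hostname rule is simple enough to sketch in a few lines of Python. This is only an illustration of the rule as we've described it, not the resolver's actual code; the function name is ours, and we write the root domain as "." for convenience.

```python
def local_domain_from_hostname(hostname):
    """Return the local domain implied by a hostname.

    Everything after the first dot is the local domain. A hostname
    with no dot implies the root domain, written here as ".".
    """
    _, dot, rest = hostname.partition(".")
    return rest if dot and rest else "."


print(local_domain_from_hostname("asylum.sf.ca.us"))  # sf.ca.us
print(local_domain_from_hostname("dogbert"))          # .
```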

You can also set the local domain name with the domain directive in resolv.conf. If you specify the domain directive, it overrides deriving the local domain name from the hostname.

The domain directive has a very simple syntax, but you've got to get it right since the resolver doesn't report errors. The keyword domain starts the line in column one, followed by whitespace (one or more blanks or tabs), then the name of the local domain. The local domain name should be written without a trailing dot, like this:

domain colospgs.co.us

In older versions of the BIND resolver (those before BIND 4.8.3), trailing spaces are not allowed on the line and will cause your local domain to be set to a name ending with one or more spaces, which is almost certainly not what you want.

And there's yet another way to set the local domain name: via the LOCALDOMAIN environment variable. LOCALDOMAIN is handy because you can set it on a per-user basis. For example, you might have a big, massively parallel box in your corporate computing center that employees from all over the world access. Each employee may do most of her work in a different company subdomain. With LOCALDOMAIN, each employee can set her local domain name appropriately in her shell startup file.

Which method should you use—hostname, the domain directive, or LOCALDOMAIN? We prefer using hostname primarily because that's the way Berkeley does it and it seems "cleaner" in that it requires less explicit configuration. Also, some Berkeley software, particularly software that uses the ruserok( ) library call to authenticate users, allows short host names in files like hosts.equiv only if hostname is set to the full domain name.

If you run software that can't tolerate long hostnames, though, you can use the domain directive. The hostname command will continue to return a short name, and the resolver will fill in the domain from resolv.conf. You may even find occasion to use LOCALDOMAIN on a host with lots of users.
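To see how the three methods fit together, here's a rough Python sketch of the selection. We've said that the domain directive overrides the hostname; the sketch additionally assumes that LOCALDOMAIN, when set, takes precedence over both, so treat that ordering as our assumption. The function name and the fx.movie.edu subdomain are made up for illustration.

```python
def choose_local_domain(hostname, domain_directive=None, environ=None):
    """Pick a local domain name from the three possible sources."""
    environ = environ or {}
    # Assumption: a per-user LOCALDOMAIN setting wins over everything else.
    if environ.get("LOCALDOMAIN"):
        return environ["LOCALDOMAIN"]
    # The domain directive in resolv.conf overrides the hostname.
    if domain_directive:
        return domain_directive
    # Otherwise, use everything after the first dot in the hostname.
    _, dot, rest = hostname.partition(".")
    return rest if dot and rest else "."


# A short hostname plus a domain directive still yields a sensible result.
print(choose_local_domain("terminator", domain_directive="movie.edu"))
```

Calling it with environ={"LOCALDOMAIN": "fx.movie.edu"} would return fx.movie.edu no matter what the other two sources say.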

6.1.2 The Search List

The local domain name, whether derived from hostname or resolv.conf, also determines the default search list. The search list was designed to make users' lives a little easier by saving them some typing. The idea is to search one or more domains for names typed at the command line that might be incomplete—that is, that might not be fully qualified domain names.

Most Unix networking commands that take a domain name as an argument, like telnet, ftp, rlogin, and rsh, apply the search list to those arguments.

Both the way the default search list is derived and the way it is applied changed from BIND 4.8.3 to BIND 4.9. If your resolver is an older make, you'll still see the 4.8.3 behavior, but if you've got a newer model, including BIND 8.2.3,^[3] you'll see the improvements in the 4.9 resolver.

^[3] Though the ISC added lots of new server functionality in BIND 8, the resolver is nearly identical to the BIND 4.9 resolver.

With any BIND resolver, a user can indicate that a domain name is fully qualified by adding a trailing dot to it.^[4] For example, the trailing dot in the command:

% telnet ftp.ora.com.

means "don't bother searching any other domains; this domain name is fully qualified." This is analogous to the leading slash in full pathnames in the Unix and MS-DOS filesystems. Pathnames without a leading slash are interpreted as relative to the current working directory, while pathnames with a leading slash are absolute, anchored at the root.

^[4] Note that we said that the resolver can handle a trailing dot. Some programs, particularly some Unix mail user agents, don't deal correctly with a trailing dot in email addresses. They choke even before they hand the domain name in the address to the resolver.

6.1.2.1 The BIND 4.8.3 search list

With BIND 4.8.3 resolvers, the default search list includes the local domain name and the domain names of each of its parent domains with two or more labels. Therefore, on a host running a 4.8.3 resolver and configured with:

domain cv.hp.com

the default search list would contain first cv.hp.com, the local domain name; then hp.com, the local domain's parent; but not com, as it has only one label.^[5] The name is looked up as-is only after the resolver has tried appending each element of the search list, and only if the name typed contains at least one dot. Thus, a user typing:

% telnet pronto.cv.hp.com

causes lookups of pronto.cv.hp.com.cv.hp.com and pronto.cv.hp.com.hp.com before the resolver looks up pronto.cv.hp.com by itself. A user typing:

% telnet asap

on the same host causes the resolver to look up asap.cv.hp.com and asap.hp.com, but not just asap, since the name typed ("asap") contains no dots.

^[5] One reason older BIND resolvers didn't append just the top-level domain name is that there were (and still are) very few hosts at the second level of the Internet's name space, so tacking on just com or edu to foo is unlikely to result in the domain name of a real host. Also, looking up the address of a foo.com or foo.edu might well require sending a query to a root name server, which taxes the roots and can be time-consuming.

Note that application of the search list stops as soon as a prospective domain name turns up the data being looked up. In the asap example, the search list would never get around to appending hp.com if asap.cv.hp.com resolved to an address.
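If you'd like to see the 4.8.3 rules spelled out, here's a small Python sketch of them. It models only what we've described (it isn't the resolver's code), and both function names are ours. The second function returns the names in the order the resolver would try them; a real resolver stops at the first one that turns up data.

```python
def default_search_list_483(local_domain):
    """Local domain, then each parent domain with two or more labels."""
    labels = local_domain.split(".")
    parents = [".".join(labels[i:]) for i in range(1, len(labels) - 1)]
    return [local_domain] + parents


def lookup_order_483(name, search_list):
    """Names a 4.8.3-style resolver would try, in order."""
    if name.endswith("."):
        return [name]  # trailing dot: fully qualified, no searching
    order = [f"{name}.{domain}" for domain in search_list]
    if "." in name:
        order.append(name)  # tried as-is last, and only if it has a dot
    return order


search_list = default_search_list_483("cv.hp.com")
print(search_list)                           # ['cv.hp.com', 'hp.com']
print(lookup_order_483("asap", search_list))
```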

6.1.2.2 The BIND 4.9 and later search list

With BIND 4.9 and later resolvers, the default search list includes just the local domain name. So, if you configure a host with:

domain cv.hp.com

the default search list would contain just cv.hp.com. Also, in a change from earlier resolvers, the search list is usually applied after the name is tried as-is. As long as the argument you type has at least one dot in it, it's looked up exactly as you typed it before any element of the search list is appended. If that lookup fails, the search list is applied. Even if the argument has no dots in it (that is, it's a single label name), it's tried as-is after the resolver appends the elements of the search list.

Why is it better to try the argument literatim first? From experience, the designers of DNS found that, more often than not, if a user bothered to type in a name with even a single dot in it, he was probably typing in a fully qualified domain name without the trailing dot. With older search list behavior, the resolver sent several fruitless queries before ever trying the name as typed.

Therefore, with a 4.9 or newer resolver, a user typing:

% telnet pronto.cv.hp.com

causes pronto.cv.hp.com to be looked up first (there are three dots in the argument). If that query fails, the resolver tries pronto.cv.hp.com.cv.hp.com. A user who types:

% telnet asap

on the same host causes the resolver to look up asap.cv.hp.com first, since the name doesn't contain a dot, and then just asap.
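The 4.9 ordering can be sketched the same way. Again, this is our own illustrative Python, with a function name we made up, and it assumes the default behavior in which a single dot is enough to have the name tried as-is first.

```python
def lookup_order_49(name, search_list):
    """Names a 4.9-style resolver would try, in order (default settings)."""
    if name.endswith("."):
        return [name]  # trailing dot: fully qualified, no searching
    searched = [f"{name}.{domain}" for domain in search_list]
    if "." in name:
        return [name] + searched  # has a dot: as-is first
    return searched + [name]      # single label: as-is last


# With "domain cv.hp.com", the default search list is just the local domain.
search_list = ["cv.hp.com"]
print(lookup_order_49("pronto.cv.hp.com", search_list))
print(lookup_order_49("asap", search_list))
```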

6.1.3 The search Directive

What if you don't like the default search list you get when you set your local domain name? In BIND 4.8.3 and all newer resolvers, you can set the search list explicitly, domain name by domain name, in the order you want the domains searched. You do this with the search directive.

The syntax of the search directive is very similar to that of the domain directive, except that it can take multiple domain names as arguments. The keyword search starts the line in column one, followed by a space or a tab, followed in turn by one to six domain names in the order you want them searched.^[6] The first domain name in the list is interpreted as the local domain name, so the search and domain directives are mutually exclusive. If you use both in resolv.conf, the one that appears last will override the other.

^[6] BIND 9 resolvers actually support eight elements in the search list.

For example, the directive:

search corp.hp.com paloalto.hp.com hp.com

instructs the resolver to search the corp.hp.com domain first, then paloalto.hp.com, and then the parent of both domains, hp.com.

This directive might be useful on a host whose users access hosts in both corp.hp.com and paloalto.hp.com frequently. On the other hand, on a BIND 4.8.3 resolver, the directive:

search corp.hp.com

causes the resolver to skip searching the local domain's parent domain when the search list is applied. (On a 4.9 or later resolver, the parent domain's name usually isn't in the search list, so this is no different from the default behavior.) This might be useful if the host's users only access hosts in the local domain, or if connectivity to the parent name servers isn't good (since it minimizes unnecessary queries to the parent name servers).

If you use the domain directive and update your resolver to BIND Version 4.9 or later, users who relied on your local domain's parent being in the search list may believe the resolver has suddenly broken. You can restore the old behavior by using the search directive to configure your resolver to use the same search list that it would have built before. For example, under BIND 4.9, BIND 8, or BIND 9, you can replace domain nsr.hp.com with search nsr.hp.com hp.com and get the same functionality.
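Here's a minimal Python sketch of how a 4.9 or later resolver might arrive at its search list from resolv.conf, given the rules above: domain yields a one-element list, search yields up to six elements, and whichever directive appears last wins. The function name is ours, and the sketch doesn't bother checking that the keyword starts in column one.

```python
def search_list_from_resolv_conf(text):
    """Return the search list implied by domain/search directives."""
    search_list = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        if fields[0] == "domain":
            search_list = [fields[1]]    # just the local domain
        elif fields[0] == "search":
            search_list = fields[1:7]    # one to six domain names
    return search_list


print(search_list_from_resolv_conf("domain nsr.hp.com\n"))
print(search_list_from_resolv_conf("search nsr.hp.com hp.com\n"))
```

The first call shows what users get after the upgrade; the second shows the search directive restoring the parent domain to the list.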

6.1.4 The nameserver Directive

Back in Chapter 4, we talked about two types of name servers: primary master name servers and slave name servers. But what if you don't want to run a name server on a host, yet still want to use DNS? Or, for that matter, what if you can't run a name server on a host (because the operating system doesn't support it, for example)? Surely you don't have to run a name server on every host, right?

No, of course you don't. By default, the resolver looks for a name server running on the local host—which is why we could use nslookup on terminator.movie.edu and wormhole.movie.edu right after we configured their name servers. You can, however, instruct the resolver to look to another host for name service. This configuration is called a DNS client in the BIND Operations Guide.

The nameserver directive (yep, all one word) tells the resolver the IP address of a name server to query. For example, the line:

nameserver 15.32.17.2

instructs the resolver to send queries to the name server running at the IP address 15.32.17.2 instead of to the local host. This means that on hosts not running name servers, you can use the nameserver directive to point them at a remote name server. Typically, you configure the resolvers on your hosts to query your own name servers.

However, since name servers before BIND 4.9 don't have any notion of access control and many administrators of newer servers don't restrict queries, you can configure your resolver to query almost anyone's name server. Of course, configuring your host to use someone else's name server without first asking permission is presumptuous, if not downright rude, and using one of your own usually gives you better performance, so we'll consider this only an emergency option.

You can also configure the resolver to query the host's local name server by using either the local host's IP address or the zero address. The zero address, 0.0.0.0, is interpreted by most TCP/IP implementations to mean "this host." The host's real IP address, of course, also means "this host." On hosts that don't understand the zero address, you can use the loopback address, 127.0.0.1.

Now what if the name server your resolver queries is down? Isn't there any way to specify a backup? Do you just fall back to using the host table?

The resolver also allows you to specify up to three (count 'em, three) name servers using multiple nameserver directives. The resolver queries those name servers, in the order listed, until it receives an answer or times out. For example, the lines:

nameserver 15.32.17.2
nameserver 15.32.17.4

tell the resolver to first query the name server at 15.32.17.2, and if it doesn't respond, to query the name server at 15.32.17.4. Be aware that the number of name servers you configure dictates other aspects of the resolver's behavior, too.

If you use multiple nameserver directives, don't use the loopback address! There's a bug in some Berkeley-derived TCP/IP implementations that can cause problems with BIND if the local name server is down. The resolver's connected datagram socket won't rebind to a new local address if the local name server isn't running, and consequently the resolver sends query packets to the fallback remote name servers with a source address of 127.0.0.1. When the remote name servers try to reply, they end up sending the reply packets to themselves.
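As a rough illustration, this Python sketch collects the name servers a resolver would use from the text of resolv.conf: it keeps at most three, in the order listed, and falls back to the local host if there are none. The function name is ours, and representing "the local host" with the zero address is simply our choice for the sketch.

```python
MAX_NAMESERVERS = 3  # the resolver uses at most three nameserver directives


def nameservers_from_resolv_conf(text, local_host="0.0.0.0"):
    """Return the name server addresses to query, in order."""
    servers = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            if len(servers) < MAX_NAMESERVERS:
                servers.append(fields[1])
    # With no nameserver directive, query a name server on this host.
    return servers or [local_host]


conf = "nameserver 15.32.17.2\nnameserver 15.32.17.4\n"
print(nameservers_from_resolv_conf(conf))  # ['15.32.17.2', '15.32.17.4']
```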

6.1.4.1 One name server configured

If there's only one name server configured,^[7] the resolver queries that name server with a timeout of five seconds. The timeout is the length of time the resolver will wait for a response from the name server before sending another query. If the resolver encounters an error that indicates the name server is really down or unreachable, or if it times out, it doubles the timeout and queries the name server again. The errors that could cause this include:

Receipt of an ICMP port unreachable message, which means that no name server is listening on the name server port
Receipt of an ICMP host unreachable or network unreachable message, which means that queries can't be sent to the destination IP address

^[7] When we say "one name server configured," that means either one nameserver directive in resolv.conf or no nameserver directive with a name server running locally.

If the domain name or data doesn't exist, the resolver doesn't retry the query. Theoretically, at least, each name server should have an equivalent "view" of the name space; there's no reason to believe one and not another. So if one name server tells you that a given domain name doesn't exist or that the type of data you're looking for doesn't exist for the domain name you specified, any other name server should give you the same answer.^[8] If the resolver receives a network error each time it sends a query (for a total of four errors^[9]), it falls back to using the host table. Note that these are errors, not timeouts. If it times out on even one query, the resolver returns a null answer and does not fall back to /etc/hosts.

^[8] The built-in latency of DNS makes this a small fib—a primary master name server can have authority for a zone and have different data from a slave that also has authority for the zone. The primary master may have just loaded new zone data from disk, while the slave may not have had time to transfer the new zone data from its master. Both name servers return authoritative answers for the zone, but the primary master may know about a brand-new host that the slave doesn't yet know about.

^[9] Two for BIND 8.2.1 and newer resolvers.

6.1.4.2 More than one name server configured

With more than one name server configured, the behavior is a little different. Here's what happens: the resolver starts by querying the first name server in the list, with a timeout of five seconds, just as in the single name server case. If the resolver times out or receives a network error, it falls back to the next name server, waiting the same five seconds for that name server. Unfortunately, the resolver won't receive many of the possible errors; the socket the resolver uses is "unconnected" since it must be able to receive responses from any of the name servers it queries, and unconnected sockets don't receive ICMP error messages. If the resolver queries all the configured name servers to no avail, it updates the timeouts and cycles through them again.

The resolver timeout for the next round of queries is based on the number of name servers configured in resolv.conf. The timeout for the second round of queries is 10 seconds divided by the number of name servers configured, rounded down. For each successive round, the resolver doubles the undivided timeout again (to 20 seconds, then 40) before dividing by the number of name servers and rounding down, which is why the three-server column of Table 6-1 reads 3, 6, and 13 seconds rather than 3, 6, and 12. After three sets of retransmissions (a total of four timeouts for every name server configured), the resolver gives up trying to query name servers.

In BIND 8.2.1, the ISC changed the resolver to send only one set of retries, or a total of two queries to each name server in resolv.conf. This was intended to reduce the amount of time a user would have to wait for the resolver to return if none of the name servers was responding.

For you mathophobes, Table 6-1 shows what the timeouts look like when you have one, two, or three name servers configured.

Table 6-1. Resolver Timeouts in BIND 4.9 to 8.2

                Name Servers Configured
Retry       1           2           3
0           5s          (2x) 5s     (3x) 5s
1           10s         (2x) 5s     (3x) 3s
2           20s         (2x) 10s    (3x) 6s
3           40s         (2x) 20s    (3x) 13s
Total       75s         80s         81s

For BIND 8.2.1 and later resolvers, Table 6-2 shows the default timeout behavior.

Table 6-2. Resolver Timeouts in BIND 8.2.1 and Later

                Name Servers Configured
Retry       1           2           3
0           5s          (2x) 5s     (3x) 5s
1           10s         (2x) 5s     (3x) 3s
Total       15s         20s         24s

So if you configure three servers, the resolver queries the first server with a timeout period of five seconds. If that query times out, the resolver queries the second server with the same timeout, and similarly for the third. If the resolver cycles through all three servers, it doubles the timeout period and divides by three (to three seconds, 10/3 rounded down) and queries the first server again.
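If you'd rather check the arithmetic than take our word for it, the following Python sketch reproduces the figures in Tables 6-1 and 6-2. It's a model of the calculation as described here, not the resolver's source; the function names are ours, and the one-second floor is our own assumption to keep the sketch from ever producing a zero timeout.

```python
def round_timeouts(nservers, initial=5, attempts=4):
    """Per-server timeout, in seconds, for each round of queries."""
    timeouts = []
    for attempt in range(attempts):
        timeout = initial << attempt   # 5, 10, 20, 40, ...
        if attempt > 0:
            timeout //= nservers       # divide later rounds, rounding down
        timeouts.append(max(timeout, 1))  # assumption: never drop to zero
    return timeouts


def worst_case_wait(nservers, initial=5, attempts=4):
    """Total seconds spent if no configured name server ever answers."""
    return nservers * sum(round_timeouts(nservers, initial, attempts))


for n in (1, 2, 3):
    # attempts=4 models BIND 4.9 to 8.2; attempts=2 models 8.2.1 and later.
    print(n, round_timeouts(n), worst_case_wait(n), worst_case_wait(n, attempts=2))
```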

Do these times seem awfully long? Remember, this describes a worst-case scenario. With properly functioning name servers running on tolerably fast hosts, your resolvers should get their answers back in well under a second. Only if all the configured name servers are really busy or they or your network is down will the resolver ever make it all the way through the retransmission cycle and give up.

What does the resolver do after it gives up? It times out and returns an error. Typically this results in an error message like:

% telnet tootsie
tootsie: Host name lookup failure

Of course, it may take 75 or so seconds of waiting to see this message, so be patient.

6.1.5 The sortlist Directive

The sortlist directive is a mechanism in BIND 4.9 and later resolvers that lets you specify subnets and networks for the resolver to prefer if it receives multiple addresses as the result of a query. In some cases, you'll want your host to use a particular network to get to certain destinations. For example, say your workstation and your NFS server have two network interfaces each: one on an Ethernet, subnet 128.32.1/24; and one on an FDDI ring, subnet 128.32.42/24. If you leave your workstation's resolver to its own devices, it's anybody's guess which of the NFS server's IP addresses you'll use when you mount a filesystem from the server—presumably, the first one in a reply packet from the name server. To make sure you try the interface on the FDDI ring first, you can add a sortlist directive to resolv.conf that sorts the address on 128.32.42/24 to the preferred position in the structure passed back to programs:

sortlist 128.32.42.0/255.255.255.0

The argument after the slash is the subnet mask for the subnet in question. To prefer an entire network, you can omit the slash and the subnet mask:

sortlist 128.32.0.0

The resolver then assumes you mean the entire network 128.32/16. (The resolver derives the default unsubnetted net mask for the network from the first two bits of the IP address.)

And, of course, you can specify several (up to 10) subnets and networks to prefer over others:

sortlist 128.32.42.0/255.255.255.0 15.0.0.0

The resolver sorts any addresses in a reply that match these arguments into the order in which they appear in the directive, and appends addresses that don't match to the end.
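To make the sorting concrete, here's a Python sketch that uses the standard ipaddress module. It's an approximation of the behavior just described, with function names of our own invention: each sortlist argument becomes a network (using the classful default mask when no mask is given), and addresses are ordered by the first argument they match, with unmatched addresses left at the end in their original order.

```python
import ipaddress


def classful_prefix(address):
    """Default unsubnetted prefix length, from the leading bits."""
    first_octet = int(address.split(".")[0])
    if first_octet < 128:
        return 8    # leading bit 0
    if first_octet < 192:
        return 16   # leading bits 10
    return 24


def parse_sortlist(arguments):
    """Turn sortlist arguments into networks (at most ten are used)."""
    networks = []
    for argument in arguments[:10]:
        if "/" not in argument:
            argument = f"{argument}/{classful_prefix(argument)}"
        networks.append(ipaddress.ip_network(argument, strict=False))
    return networks


def apply_sortlist(addresses, networks):
    """Order addresses by the first sortlist entry each one matches."""
    def rank(address):
        ip = ipaddress.ip_address(address)
        for position, network in enumerate(networks):
            if ip in network:
                return position
        return len(networks)  # no match: goes to the end
    return sorted(addresses, key=rank)  # sorted() is stable


networks = parse_sortlist(["128.32.42.0/255.255.255.0", "15.0.0.0"])
print(apply_sortlist(["128.32.1.5", "128.32.42.5"], networks))
```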

6.1.6 The options Directive

The options directive was introduced with BIND 4.9 and lets you tweak several internal resolver settings. The first is the debug flag, RES_DEBUG. The directive:

options debug

sets RES_DEBUG, producing lots of exciting debugging information on standard output, assuming your resolver was configured with DEBUG defined. (Actually, that may not be a good assumption, since most vendors compile their stock resolvers without DEBUG defined.) This is very useful if you're attempting to diagnose a problem with your resolver or with name service in general, but very annoying otherwise.

The second setting you can modify is ndots, which sets the minimum number of dots a domain name argument must have for the resolver to look it up before applying the search list. By default, one or more dots will do; this is equivalent to ndots:1. The resolver first tries the domain name as typed as long as the name has any dots in it. You can raise the threshold if you believe your users are more likely to type partial domain names that will need the search list applied. For example, if your local domain name is mit.edu and your users are accustomed to typing:

% ftp prep.ai

and having mit.edu automatically appended to produce prep.ai.mit.edu, you may want to raise ndots to two so that your users won't unwittingly cause lookups to the root name servers for names in the top-level ai domain. You could do this with:

options ndots:2
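A few lines of Python show the effect of the threshold on the prep.ai example. This is a sketch of the rule only, with a function name we've made up; it lists the names in the order they'd be tried.

```python
def lookup_order(name, search_list, ndots=1):
    """Order in which names are tried for a given ndots threshold."""
    searched = [f"{name}.{domain}" for domain in search_list]
    if name.count(".") >= ndots:
        return [name] + searched  # enough dots: as-is first
    return searched + [name]      # too few dots: search list first


print(lookup_order("prep.ai", ["mit.edu"], ndots=1))
print(lookup_order("prep.ai", ["mit.edu"], ndots=2))
```

With ndots:1, prep.ai goes out as typed first; with ndots:2, prep.ai.mit.edu is tried first, and the query for prep.ai is never sent if that lookup succeeds.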

BIND 8.2 introduced four new resolver options: attempts, timeout, rotate, and no-check-names. attempts allows you to specify how many queries the resolver should send to each name server in resolv.conf before giving up. If you think the new default value, two, is too low for your name servers, you can boost it back to four, the default value before BIND 8.2.1, with:

options attempts:4

The maximum value is five.

timeout allows you to specify the initial timeout for a query to a name server in resolv.conf. The default value is five seconds. If you'd like your resolver to retransmit more quickly, you could lower the timeout to two seconds with:

options timeout:2

The maximum value is 30 seconds. For the second and successive rounds of queries, the resolver still doubles the initial timeout and divides by the number of name servers in resolv.conf.

rotate lets your resolver make use of all the name servers in resolv.conf, not just the first one. As long as your resolver's first name server is healthy, it'll service all of your resolver's queries. Unless that name server gets very busy or goes down, your resolver will never query the second or third name servers in resolv.conf. If you'd like to spread the load around, you can set:

options rotate

to have each instance of the resolver rotate the order in which it uses the name servers in resolv.conf. In other words, an instance of the resolver still queries the first name server in resolv.conf first, but for the next domain name it looks up, it queries the second name server first, and so on.

Note that many programs can't take advantage of this since most programs initialize the resolver, look up a name, then exit. Rotation has no effect on repeated ping commands, for example, because each ping process initializes the resolver, queries the first name server in resolv.conf, and then exits before using the resolver again. Each successive invocation of ping has no idea which name server the previous one used—or even that ping was run earlier. But long-lived processes that send lots of queries, such as a sendmail daemon, can take advantage of rotation.

Rotation can also make debugging trickier. If you use it, you'll never be sure which name server in resolv.conf your sendmail daemon queried when it received that funky response.
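One way to picture rotation is as a shift of the name server list that advances with each lookup a process makes. The Python below is a simplified model along those lines, not the resolver's implementation; the function name and the lookup counter are ours.

```python
def server_order(servers, lookup_number, rotate=False):
    """Order in which one process queries its name servers.

    lookup_number counts the lookups this process has already made,
    starting at zero.
    """
    servers = list(servers)
    if not rotate or not servers:
        return servers
    shift = lookup_number % len(servers)
    return servers[shift:] + servers[:shift]


servers = ["15.32.17.2", "15.32.17.4"]
for lookup_number in range(3):
    print(server_order(servers, lookup_number, rotate=True))
```

A short-lived process such as ping only ever reaches lookup number zero, which is why it always starts with the first name server.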

no-check-names, finally, allows you to turn off the resolver's name checking, which is on by default.^[10] The name-checking routines examine domain names in responses to make sure they adhere to Internet host naming standards, which allow only alphanumerics and dashes in host names. You'll need to set this option if you want your users to be able to resolve domain names with underscores or other illegal characters in them.

^[10] In all resolvers that support it, from BIND 4.9.4 on.

If you want to specify multiple options, you can combine them on a single line in resolv.conf, like so:

options attempts:4 timeout:2 ndots:2
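For the curious, here's how an options line might be picked apart, sketched in Python. The function name is ours. The maximums for attempts and timeout come from the text above, but clamping an oversized value to the maximum (rather than rejecting it) is an assumption on our part.

```python
MAXIMUMS = {"attempts": 5, "timeout": 30}


def parse_options(line):
    """Parse one options line into a dictionary of settings."""
    settings = {}
    fields = line.split()
    if not fields or fields[0] != "options":
        return settings
    for field in fields[1:]:
        name, colon, value = field.partition(":")
        if colon:
            number = int(value)
            if name in MAXIMUMS:
                # Assumption: values over the maximum are clamped.
                number = min(number, MAXIMUMS[name])
            settings[name] = number
        else:
            settings[name] = True  # a flag such as rotate or debug
    return settings


print(parse_options("options attempts:4 timeout:2 ndots:2"))
```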

6.1.7 Comments

Also introduced with BIND 4.9 resolvers (and it's about time, if you ask us) is the ability to put comments in the resolv.conf file. Lines that begin with a pound sign or semicolon in the first column are interpreted as comments and ignored by the resolver.
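In code, the comment rule amounts to a one-line filter. This Python sketch (the function name is ours) drops comment lines, and blank lines too, before anything else looks at the file.

```python
def strip_comments(text):
    """Return the lines of resolv.conf that aren't comments or blank."""
    return [
        line for line in text.splitlines()
        if line and not line.startswith(("#", ";"))
    ]


conf = "# our name servers\n; primary first\nnameserver 15.32.17.2\n"
print(strip_comments(conf))  # ['nameserver 15.32.17.2']
```

Note that only a pound sign or semicolon in the first column counts, so an indented "#" doesn't start a comment in this sketch.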

6.1.8 A Note on the 4.9 Resolver Directives

If you're just moving to a BIND 4.9 resolver, be careful when using the new directives. You may still have older resolver code statically linked into programs on your host. Often, this isn't a problem because Unix resolvers ignore directives they don't understand. But don't count on all programs on your host obeying the new directives.

If you're running on a host with programs that include really old resolver code (before 4.8.3) and you still want to use the search directive with programs that can take advantage of it, here's a trick: use both a domain directive and a search directive in resolv.conf, with the domain directive first. Old resolvers will read the domain directive and ignore the search directive because they won't recognize it. New resolvers will read the domain directive, but the search directive will override its behavior.
