14.3 Potential Problem List

Now that we've given you a nice set of tools, let's talk about how you can use them to diagnose real problems. There are some problems that are easy to recognize and correct. We should cover these as a matter of course—they're some of the most common problems because they're caused by some of the most common mistakes. Here are the contestants, in no particular order. We call 'em our "Unlucky Thirteen."

14.3.1 Forgot to Increment Serial Number

The main symptom of this problem is that slave name servers don't pick up any changes you made to the zone's data file on the primary master. The slaves think the zone data hasn't changed since the serial number is still the same.

How do you check whether or not you remembered to increment the serial number? Unfortunately, that's not so easy. If you don't remember what the old serial number was and your serial number gives you no indication of when it was updated, there's no direct way to tell whether it's changed.[3] When you reload the primary, it loads the updated zone file regardless of whether you've changed the serial number. It checks the file's timestamp, sees that it's been modified since it last loaded the data, and reads the file. About the best you can do is to use nslookup to compare the data returned by the primary and by a slave. If they return different data, you probably forgot to increment the serial number. If you can remember a recent change you made, you can look for that data. If you can't remember a recent change, you could try transferring the zone from a primary and from a slave, sorting the results, and using diff to compare them.

[3] On the other hand, if you encode the date into the serial number, as many people do (e.g., 2001010500 is the first rev of data on January 5, 2001), you may be able to tell at a glance whether you updated the serial number when you made the change.
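If you do adopt the date-encoded convention, the bump itself is easy to automate. Here's a minimal sketch in Python, assuming a ten-digit YYYYMMDDNN serial like the one in the footnote; next_serial is a name we made up for illustration, not part of BIND or h2n:

```python
from datetime import date


def next_serial(current: int, today: date) -> int:
    """Return the next date-encoded serial (YYYYMMDDNN) after `current`."""
    # First revision for today, e.g. 2001010500 for January 5, 2001.
    base = int(today.strftime("%Y%m%d")) * 100
    # Never move backward: if the zone was already changed today (or the
    # current serial is somehow ahead of today's base), just add one.
    return max(base, current + 1)


print(next_serial(2001010500, date(2001, 1, 5)))  # 2001010501
print(next_serial(598, date(2001, 1, 5)))         # 2001010500
```

Note that this sketch doesn't guard against more than 100 changes in a day; the hundredth increment simply spills into the next day's range.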

The good news is that, although determining whether the zone was transferred is tricky, making sure the zone is transferred is simple. Just increment the serial number on the primary master's copy of the zone data file and reload the zone on the primary. The slaves should pick up the new data within their refresh interval, or sooner if they use NOTIFY. If you want to make sure the slaves transfer the new data, you can execute named-xfer by hand (on the slaves, naturally):

# /usr/sbin/named-xfer -z movie.edu -f db.movie -s 0 terminator.movie.edu
# echo $?

If named-xfer returns 1 or 4, the zone was transferred successfully. Other return values indicate that no zone was transferred, either because of an error or because the slave thought the zone was up to date. (See Section 14.2.1 earlier in this chapter for more details.)
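If you run named-xfer from a script, you might wrap that rule in a small helper. This sketch covers only the return values described in this chapter's text (1 and 4 for a successful transfer, 2 for an error); the function name and the wording of the descriptions are our own:

```python
# Exit statuses that this section says indicate a successful transfer.
TRANSFERRED = {1, 4}


def describe_named_xfer_status(status: int) -> str:
    """Summarize a named-xfer exit status using the rules given in the text."""
    if status in TRANSFERRED:
        return "zone transferred"
    if status == 2:
        return "error occurred; check syslog"
    # Anything else: no zone was transferred (for example, because the
    # slave thought the zone was already up to date).
    return "no zone transferred"


for status in (0, 1, 2, 4):
    print(status, describe_named_xfer_status(status))
```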

There's another variation of the "forgot to increment the serial number" problem. We see it in environments where administrators use tools like h2n to create zone data files from the host table. With scripts like h2n, it's temptingly easy to delete old zone data files and create new ones from scratch. Some administrators do this occasionally because they mistakenly believe that data in the old zone data files can creep into the new ones. The problem with deleting the zone data files is that, without the old data file to read for the current serial number, h2n starts over at serial number 1. If your zone's serial number on the primary master rolls all the way back to 1 from 598 or what-have-you, the slaves (Versions 4.8.3 and earlier) won't complain; they just figure they're all caught up and don't need zone transfers. A 4.9 or later slave server, however, is ever watchful and will emit a syslog error message warning you that something might be wrong:

Jun  7 20:14:26 wormhole named[29618]: Zone "movie.edu"
                (class 1) SOA serial# (1) rcvd from [192.249.249.3]
                is < ours (112)

So if the serial number on the primary master looks suspiciously low, check the serial number on the slaves, too, and compare them:

% nslookup
Default Server:  terminator.movie.edu
Address:  192.249.249.3

> set q=soa
> movie.edu.
Server:  terminator.movie.edu
Address:  192.249.249.3

movie.edu
        origin = terminator.movie.edu
        mail addr = al.robocop.movie.edu
        serial = 1
        refresh = 10800 (3 hours)
        retry   = 3600 (1 hour)
        expire  = 604800 (7 days)
        minimum ttl = 86400 (1 day)
> server wormhole.movie.edu.
Default Server:  wormhole.movie.edu
Addresses:  192.249.249.1, 192.253.253.1

> movie.edu.
Server:  wormhole.movie.edu
Addresses:  192.249.249.1, 192.253.253.1

movie.edu
        origin = terminator.movie.edu
        mail addr = al.robocop.movie.edu
        serial = 112
        refresh = 10800 (3 hours)
        retry   = 3600 (1 hour)
        expire  = 604800 (7 days)
        minimum ttl = 86400 (1 day)

wormhole.movie.edu, as a movie.edu slave, should never have a larger serial number than the primary master, so clearly something's amiss.
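If you save the nslookup output from each server, the comparison can be scripted. The following is a rough sketch that assumes output shaped like the listings above; soa_serial and slave_is_ahead are hypothetical helper names. It compares the serials as plain integers, which is an assumption on our part: name servers may compare serial numbers using the wraparound arithmetic of RFC 1982 instead.

```python
import re

# Matches the "serial = 112" line in nslookup's SOA output.
SERIAL_RE = re.compile(r"^\s*serial\s*=\s*(\d+)", re.MULTILINE)


def soa_serial(nslookup_output: str) -> int:
    """Extract the SOA serial number from saved nslookup output."""
    match = SERIAL_RE.search(nslookup_output)
    if match is None:
        raise ValueError("no SOA serial found in output")
    return int(match.group(1))


def slave_is_ahead(primary_output: str, slave_output: str) -> bool:
    """True if the slave reports a larger serial than the primary master."""
    return soa_serial(slave_output) > soa_serial(primary_output)


primary = "movie.edu\n        origin = terminator.movie.edu\n        serial = 1\n"
slave = "movie.edu\n        origin = terminator.movie.edu\n        serial = 112\n"
print(slave_is_ahead(primary, slave))  # True
```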

This problem is really easy to spot, by the way, with the tool we'll write in Chapter 15, coming up next.

14.3.2 Forgot to Reload Primary Master Name Server

Occasionally, you may forget to reload your primary master name server after making a change to the configuration file or to a zone data file. The name server won't know to load the new configuration or the new zone data—it doesn't automatically check the timestamp of the file and notice that it changed. Consequently, any changes you've made won't be reflected in the name server's data: new zones won't be loaded, and new records won't percolate out to the slaves.

To check when you last reloaded the name server, scan the syslog output for the most recent entry like this one from a BIND 9 name server:

Mar  8 17:22:08 terminator named[22317]: loading configuration from '/etc/named.conf'

Or like this for a BIND 4.9 or BIND 8 name server:

Mar  8 17:22:08 terminator named[22317]: reloading nameserver

These messages tell you the last time you sent a reload command to the name server. If you killed and then restarted the name server, you'll see an entry like this on a BIND 9 name server:

Mar  8 17:22:08 terminator named[22317]: starting BIND 9.1.0

On a BIND 8 name server, it'd look like:

Mar  8 17:22:08 terminator named[22317]: restarted

or, on a 4.9 name server:

Mar  8 17:22:08 terminator named[22317]: starting

If the time of the restart or reload doesn't correlate with the time you made the last change, reload the name server again. And check that you incremented the serial numbers in zone data files you changed, too. If you're not sure when you edited the zone data file, you can check the file modification time by doing a long listing of the file with ls -l.
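Scanning a long syslog file by eye gets tedious, so here's a small sketch that picks out the most recent reload or restart entry. It matches only the message texts shown in this section, so treat the marker list as an assumption to adjust for your version of BIND; the function names are ours.

```python
# Message fragments taken from the syslog examples in this section.
RELOAD_MARKERS = (
    "loading configuration from",  # BIND 9 reload
    "reloading nameserver",        # BIND 4.9 or BIND 8 reload
    "starting BIND",               # BIND 9 restart
    "restarted",                   # BIND 8 restart
)


def is_reload_entry(line: str) -> bool:
    """True if a syslog line looks like a named reload or restart message."""
    if " named[" not in line:
        return False
    text = line.rstrip()
    # BIND 4.9 logs a bare "starting" when it starts up.
    return text.endswith(": starting") or any(m in text for m in RELOAD_MARKERS)


def last_reload_entry(syslog_lines):
    """Return the last matching line, or None if there isn't one."""
    last = None
    for line in syslog_lines:
        if is_reload_entry(line):
            last = line.rstrip()
    return last
```

You could feed it the lines of your syslog file (the path varies from system to system) and compare the timestamp on the result with the output of ls -l on the zone data file.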

14.3.3 Slave Name Server Can't Load Zone Data

If a slave name server can't get the current serial number for a zone from its master name server, it logs a message via syslog. On a BIND 9 name server, that looks like:

Sep 25 22:02:38 wormhole named[21246]: refresh_callback: zone 
movie.edu/IN: failure for 192.249.249.3#53: timed out

On BIND 8, look for:

Jan  6 11:55:25 wormhole named[544]: Err/TO getting serial# for "movie.edu"

On BIND 4, it looks like this:

Mar  3 8:19:34 wormhole named[22261]: zoneref: Masters for secondary
       zone movie.edu unreachable

If you let this problem fester, the slave will expire the zone. A BIND 9 name server will report:

Sep 25 23:20:20 wormhole named[21246]: zone_expire: zone 
movie.edu/IN: expired

A BIND 4.9 or 8 name server will log:

Mar  8 17:12:43 wormhole named[22261]: secondary zone
       "movie.edu" expired

Once the zone has expired, you'll start getting SERVFAIL errors when you query the name server for data in the zone:

% nslookup robocop wormhole.movie.edu.
Server:  wormhole.movie.edu
Addresses:  192.249.249.1, 192.253.253.1

*** wormhole.movie.edu can't find robocop.movie.edu: Server failed

There are three leading causes of this problem: a loss in connectivity to the master server due to network failure, an incorrect IP address for the master server in the configuration file, or a syntax error in the zone data file on the master server. First check the configuration file's entry for the zone and see what IP address the slave is attempting to load from:

zone "movie.edu" {
                type slave;
                masters { 192.249.249.3; };
                file "bak.movie.edu";
};

On a BIND 4 server, the directive looks like this:

secondary        movie.edu        192.249.249.3        bak.movie.edu

Make sure that's really the IP address of the master name server. If it is, check connectivity to that IP address:

% ping 192.249.249.3 -n 10
PING 192.249.249.3: 64 byte packets

----192.249.249.3 PING Statistics----
10 packets transmitted, 0 packets received, 100% packet loss

If the master server isn't reachable, make sure that the host the name server runs on is really running (e.g., is powered on, etc.) or look for a network problem. If the host is reachable, make sure named is running on the host and that you can manually transfer the zone:

# /usr/sbin/named-xfer -z movie.edu -f /tmp/db.movie.edu -s 0 192.249.249.3
# echo $?
2

A return code of 2 means that an error occurred. Check to see if there is a syslog message. In this case, there was a message:

Jan  6 14:56:07 zardoz named-xfer[695]: record too short from [192.249.249.3], zone movie.edu

At first glance, this error looks like a truncation problem. The real problem is easier to see if you use nslookup:

% nslookup - terminator.movie.edu 
Default Server:  terminator.movie.edu
Address:  192.249.249.3

 > ls movie.edu                    —This attempts a zone transfer
[terminator.movie.edu]
*** Can't list domain movie.edu: Query refused

What's happening here is that named is refusing to allow you to transfer its zone data. The remote server has secured its zone data with an allow-transfer substatement, the secure_zone resource record, or the xfrnets boot file directive.

If the master server is responding as not authoritative for the zone, you'll see a message like this from your BIND 9 name server:

Sep 26 13:29:23 zardoz named[21890]: refresh_callback: zone 
movie.edu/IN: non-authoritative answer from 192.249.249.3#53

Or on BIND 8, like this:

Jan  6 11:58:36 zardoz named[544]: Err/TO getting serial# for "movie.edu"
Jan  6 11:58:36 zardoz named-xfer[793]: [192.249.249.3] not authoritative for
     movie.edu, SOA query got rcode 0, aa 0, ancount 0, aucount 0

If this is the correct master server, the server should be authoritative for the zone. This probably indicates that the master had a problem loading the zone, usually because of a syntax error in the zone data file. Contact the administrator of the master server and have her check her syslog output for indications of a syntax error (see Section 14.3.5, coming up).

14.3.4 Added Name to Zone Data File but Forgot to Add PTR Record

Because mappings of host names to IP addresses are kept separate from mappings of IP addresses to host names in DNS, it's easy to forget to add a PTR record for a new host. Adding the A record is intuitive, but many people who are used to host tables assume that adding an address record takes care of the reverse mapping, too. That's not true: you need to add a PTR record for the host to the appropriate reverse-mapping zone.

Forgetting to add the PTR record for a host's address usually causes that host to fail authentication checks. For example, users on the host won't be able to rlogin to other hosts without specifying a password, and rsh or rcp to other hosts simply won't work. The servers these commands talk to must be able to map a client's IP address to a domain name to check .rhosts and hosts.equiv. These users' connections will cause entries like this to be syslogged:

Aug 15 17:32:36 terminator inetd[23194]: login/tcp:
       Connection from unknown (192.249.249.23)

Also, many large FTP archives, including ftp.uu.net, refuse anonymous FTP access to hosts whose IP addresses don't map back to domain names. ftp.uu.net's FTP server emits a message that reads, in part:

530- Sorry, we're unable to map your IP address 140.186.66.1 to a hostname
530- in the DNS.  This is probably because your nameserver does not have a
530- PTR record for your address in its tables, or because your reverse
530- nameservers are not registered.  We refuse service to hosts whose
530- names we cannot resolve.

That makes the reason you can't use anonymous FTP pretty evident. Other FTP sites, however, don't bother printing informative messages; they simply deny service.

nslookup is handy for checking whether you've forgotten the PTR record or not:

% nslookup 
Default Server:  terminator.movie.edu
Address:  192.249.249.3

 > beetlejuice        —Check for a name-to-address mapping
Server:  terminator.movie.edu
Address:  192.249.249.3

Name:    beetlejuice.movie.edu
Address:  192.249.249.23

 > 192.249.249.23    —Now check for a corresponding address-to-name mapping
Server:  terminator.movie.edu
Address:  192.249.249.3

*** terminator.movie.edu can't find 192.249.249.23: Non-existent domain

On the primary master for 249.249.192.in-addr.arpa, a quick check of the db.192.249.249 file will tell you if the PTR record hasn't been added to the zone data file yet or if the name server hasn't been reloaded. If the name server having trouble is a slave for the zone, check that the serial number was incremented on the primary master and that the slave has had enough time to load the zone.
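If you're not sure which name the missing PTR record belongs under, Python's standard ipaddress module can derive the in-addr.arpa name for you. This is only a sketch: reverse_name and ptr_record are names we invented, and the record is printed with a fully qualified owner name rather than the abbreviated form you'd probably use in db.192.249.249.

```python
import ipaddress


def reverse_name(address: str) -> str:
    """Return the dot-terminated reverse-mapping name for an IP address."""
    return ipaddress.ip_address(address).reverse_pointer + "."


def ptr_record(address: str, domain_name: str) -> str:
    """Build a PTR record line mapping `address` back to `domain_name`."""
    if not domain_name.endswith("."):
        raise ValueError("use a dot-terminated domain name in the record data")
    return f"{reverse_name(address)} IN PTR {domain_name}"


print(reverse_name("192.249.249.23"))
# 23.249.249.192.in-addr.arpa.
print(ptr_record("192.249.249.23", "beetlejuice.movie.edu."))
# 23.249.249.192.in-addr.arpa. IN PTR beetlejuice.movie.edu.
```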

14.3.5 Syntax Error in Configuration File or Zone Data File

Syntax errors in a name server's configuration file and in zone data files are also relatively common (more or less, depending on the experience of the administrator). Generally, an error in the config file will cause the name server to fail to load one or more zones. Some typos in the options statement will cause the name server to fail to start at all and to log an error like this via syslog (BIND 9):

Sep 26 13:39:30 terminator named[21924]: change directory to '/var/name' failed: file not found
Sep 26 13:39:30 terminator named[21924]: options configuration failed: file not found
Sep 26 13:39:30 terminator named[21924]: loading configuration: failure
Sep 26 13:39:30 terminator named[21924]: exiting (due to fatal error)

A BIND 8 name server logs:

Jan  6 11:59:29 terminator named[544]: can't change directory to /var/name: No
     such file or directory

Note that you won't see an error message when you try to start named on the command line or at boot time, but named won't stay running for long.

If the syntax error is in a less important line in the config file—say, in a zone statement—only that zone will be affected. Usually, the name server won't be able to load the zone at all (say, you misspell "masters" or the name of the zone data file, or you forget to put quotes around the filename or domain name). This would produce syslog output from BIND 9 like this:

Sep 26 13:43:03 terminator named[21938]: /etc/named.conf:80: 
parse error near 'masters'
Sep 26 13:43:03 terminator named[21938]: loading configuration: failure
Sep 26 13:43:03 terminator named[21938]: exiting (due to fatal error)

Or from BIND 8:

Jan  6 12:01:36 terminator named[841]: /etc/named.conf:10: syntax error near
     'movie.edu'

If a zone data file contains a syntax error yet the name server succeeds in loading the zone, it will either answer as nonauthoritative for all data in the zone or return a SERVFAIL error for lookups in the zone:

% nslookup carrie
Server:  terminator.movie.edu
Address:  192.249.249.3

Non-authoritative answer:
Name:    carrie.movie.edu
Address:  192.253.253.4

Here's the BIND 9 syslog message produced by the syntax error that caused this problem:

Sep 26 13:45:40 terminator named[21951]: error: dns_rdata_fromtext: db.movie.edu:11: 
near 'postmanrings2x': unexpected token
Sep 26 13:45:40 terminator named[21951]: error: dns_zone_load: zone movie.edu/IN: 
database db.movie.edu: dns_db_load failed: unexpected token
Sep 26 13:45:40 terminator named[21951]: critical: loading zones: unexpected token
Sep 26 13:45:40 terminator named[21951]: critical: exiting (due to fatal error)

Here's BIND 8's error:

Jan  6 15:07:46 terminator named[693]: db.movie.edu:11: Priority error
     (postmanrings2x.movie.edu.)
Jan  6 15:07:46 terminator named[693]: master zone "movie.edu" (IN) rejected due
     to errors (serial 1997010600)

If you looked in the zone data file for the problem, you'd find this record:

postmanrings2x     IN     MX     postmanrings2x.movie.edu.

The MX record is missing the preference field, which causes the error.
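A quick script can catch this particular slip before the name server does. The check below is a deliberately naive sketch: it looks at one record per line, ignores parentheses and $ directives, and assumes the owner name is present whenever the line doesn't start with whitespace. The function name is hypothetical.

```python
def mx_missing_preference(record_line: str) -> bool:
    """True if the line holds an MX record with no numeric preference."""
    has_owner = bool(record_line[:1]) and not record_line[:1].isspace()
    fields = record_line.split(";", 1)[0].split()  # drop comment, tokenize
    if has_owner:
        fields = fields[1:]  # skip the owner name, which might be "mx"
    upper = [field.upper() for field in fields]
    if "MX" not in upper:
        return False
    rdata = fields[upper.index("MX") + 1:]
    # A well-formed MX record has a numeric preference, then the exchanger.
    return not (len(rdata) >= 2 and rdata[0].isdigit())


print(mx_missing_preference("postmanrings2x     IN     MX     postmanrings2x.movie.edu."))
# True
print(mx_missing_preference("zorba     IN     MX     10 zelig.movie.edu."))
# False
```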

Note that unless you correlate the lack of authority (when you expect the name server to be authoritative) with a problem or scan your syslog file assiduously, you might never notice the syntax error!

Starting with BIND 4.9.4, an "invalid" host name can be a syntax error:

Jan  6 12:04:10 terminator named[841]: owner name "ID_4.movie.edu" IN (primary)
     is invalid - rejecting
Jan  6 12:04:10 terminator named[841]: db.movie.edu:11: owner name error
Jan  6 12:04:10 terminator named[841]: db.movie.edu:11: Database error near (A)
Jan  6 12:04:10 terminator named[841]: master zone "movie.edu" (IN) rejected
     due to errors (serial 1997010600)

BIND 9, however, doesn't implement name checking as of 9.1.0. A future version of BIND 9 may.

14.3.6 Missing Dot at the End of a Domain Name in a Zone Data File

It's very easy to leave off trailing dots when editing a zone data file. Since the rules for when to use them change so often (don't use them in the configuration file, don't use them in resolv.conf, do use them in zone data files to override $ORIGIN...), it's hard to keep them straight. These resource records:

zorba         IN     MX     10 zelig.movie.edu
movie.edu     IN     NS     terminator.movie.edu

really don't look that odd to the untrained eye, but they probably don't do what they're intended to. In the db.movie.edu file, they'd be equivalent to:

zorba.movie.edu.        IN    MX    10 zelig.movie.edu.movie.edu.
movie.edu.movie.edu.    IN    NS    terminator.movie.edu.movie.edu.

unless the origin were explicitly changed.
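The expansion rule at work here is simple enough to sketch in a few lines of Python. This is an illustration of the rule, not BIND's parser, and qualify is a name we made up: a name that ends in a dot is left alone, "@" stands for the origin itself, and anything else has the origin appended.

```python
def qualify(name: str, origin: str) -> str:
    """Expand a zone-file name against a dot-terminated origin."""
    if name == "@":
        return origin            # "@" is shorthand for the origin
    if name.endswith("."):
        return name              # already fully qualified
    if origin == ".":
        return name + "."        # root origin: just add the trailing dot
    return f"{name}.{origin}"


print(qualify("zelig.movie.edu", "movie.edu."))       # zelig.movie.edu.movie.edu.
print(qualify("zorba", "movie.edu."))                 # zorba.movie.edu.
print(qualify("terminator.movie.edu.", "movie.edu.")) # terminator.movie.edu.
```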

If you omit a trailing dot after a domain name in the resource record's data (as opposed to leaving off a trailing dot in the resource record's name), you usually end up with wacky NS or MX records:

% nslookup -type=mx zorba.movie.edu.
Server:  terminator.movie.edu
Address:  192.249.249.3

zorba.movie.edu      preference = 10, mail exchanger
                     = zelig.movie.edu.movie.edu
zorba.movie.edu      preference = 50, mail exchanger
                     = postmanrings2x.movie.edu.movie.edu

The cause of this should be fairly clear from the nslookup output. But if you forget the trailing dot on the domain name field in a record (as in the movie.edu NS record just listed), spotting your mistake might not be as easy. If you try to look up the record with nslookup, you won't find it under the domain name you thought you used. Dumping your name server's database may help you root it out:

$ORIGIN edu.movie.edu.
movie    IN    NS    terminator.movie.edu.movie.edu.

The $ORIGIN line looks odd enough to stand out.

14.3.7 Missing Root Hints Data

If, for some reason, you forget to install a root hints file on your name server or if you accidentally delete it, your name server will be unable to resolve names outside of its authoritative data. This behavior is easy to recognize using nslookup, but be careful to use full, dot-terminated domain names or else the search list may cause misleading failures:

% nslookup 
Default Server:  terminator.movie.edu
Address:  192.249.249.3

 > ftp.uu.net.      —A lookup of a name outside your name server's authoritative data
                                 —causes a SERVFAIL error...
Server:  terminator.movie.edu
Address:  192.249.249.3

*** terminator.movie.edu can't find ftp.uu.net.: Server failed

A lookup of a name in your name server's authoritative data returns a response:

> wormhole.movie.edu.
Server:  terminator.movie.edu
Address:  192.249.249.3

Name:    wormhole.movie.edu
Addresses:  192.249.249.1, 192.253.253.1

> ^D

To confirm your suspicion that the root hints data is missing, check the syslog output for an error like this:

Jan  6 15:10:22 terminator named[764]: No root nameservers for class IN

The class in this message, IN, is the Internet class (class 1, numerically). This error indicates that because no root hints data was available, no root name servers were found.

You're unlikely to run into this problem with BIND 9, since it has built-in root hints.

14.3.8 Loss of Network Connectivity

Though the Internet is more reliable today than it was back in the wild and woolly days of the ARPAnet, network outages are still relatively common. Without "lifting the hood" and poking around in debugging output, these failures usually look like poor performance:

% nslookup nisc.sri.com.
Server:  terminator.movie.edu
Address:  192.249.249.3

*** Request to terminator.movie.edu timed out ***

If you turn on name server debugging, though, you may see that your name server, anyway, is healthy. It received the query from the resolver, sent the necessary queries, and waited patiently for a response. It just didn't get one. Here's what the debugging output might look like on a BIND 8 name server:

Debug turned ON, Level 1

Here, nslookup sends the first query to our local name server for the IP address of nisc.sri.com. Then the query is forwarded to another name server, and, when no answer is received, it is resent to a different name server:

datagram from [192.249.249.3].1051, fd 5, len 30
req: nlookup(nisc.sri.com) id 18470 type=1 class=1
req: missed 'nisc.sri.com' as 'com' (cname=0)
forw: forw -> [198.41.0.4].53 ds=7 nsid=58732 id=18470 0ms retry 4 sec
resend(addr=1 n=0) -> [128.9.0.107].53 ds=7 nsid=58732 id=18470 0ms

Now nslookup is getting impatient, and it queries our local name server again. Notice that it uses the same source port. The local name server ignores the duplicate query and tries forwarding the query two more times:

datagram from [192.249.249.3].1051, fd 5, len 30
req: nlookup(nisc.sri.com) id 18470 type=1 class=1
req: missed 'nisc.sri.com' as 'com' (cname=0)
resend(addr=2 n=0) -> [192.33.4.12].53 ds=7 nsid=58732 id=18470 0ms
resend(addr=3 n=0) -> [128.8.10.90].53 ds=7 nsid=58732 id=18470 0ms

nslookup queries the local name server again, and the name server fires off more queries:

datagram from [192.249.249.3].1051, fd 5, len 30
req: nlookup(nisc.sri.com) id 18470 type=1 class=1
req: missed 'nisc.sri.com' as 'com' (cname=0)
resend(addr=4 n=0) -> [192.203.230.10].53 ds=7 nsid=58732 id=18470 0ms
resend(addr=0 n=1) -> [198.41.0.4].53 ds=7 nsid=58732 id=18470 0ms
resend(addr=1 n=1) -> [128.9.0.107].53 ds=7 nsid=58732 id=18470 0ms
resend(addr=2 n=1) -> [192.33.4.12].53 ds=7 nsid=58732 id=18470 0ms
resend(addr=3 n=1) -> [128.8.10.90].53 ds=7 nsid=58732 id=18470 0ms
resend(addr=4 n=1) -> [192.203.230.10].53 ds=7 nsid=58732 id=18470 0ms
resend(addr=0 n=2) -> [198.41.0.4].53 ds=7 nsid=58732 id=18470 0ms
Debug turned OFF

On a BIND 9 name server, there's considerably less detail at debug level 1. Still, you can see that the name server is trying repeatedly to look up nisc.sri.com:

Sep 26 14:33:27.486 client 192.249.249.3#1028: query: nisc.sri.com A
Sep 26 14:33:27.486 createfetch: nisc.sri.com. A
Sep 26 14:33:32.489 client 192.249.249.3#1028: query: nisc.sri.com A
Sep 26 14:33:32.490 createfetch: nisc.sri.com. A
Sep 26 14:33:42.500 client 192.249.249.3#1028: query: nisc.sri.com A
Sep 26 14:33:42.500 createfetch: nisc.sri.com. A
Sep 26 14:34:02.512 client 192.249.249.3#1028: query: nisc.sri.com A
Sep 26 14:34:02.512 createfetch: nisc.sri.com. A

At higher debug levels, you can actually see the timeouts, but BIND 9.1.0 still doesn't show the addresses of the remote name servers tried.

From the BIND 8 debugging output, you can extract a list of the IP addresses of the name servers that your name server tried to query, and then check your connectivity to them. Odds are, ping won't have much better luck than your name server did:

% ping 198.41.0.4 -n 10   —ping first name server queried
PING 198.41.0.4: 64 byte packets

----198.41.0.4 PING Statistics----
10 packets transmitted, 0 packets received, 100% packet loss
% ping 128.9.0.107 -n 10   —ping second name server queried
PING 128.9.0.107: 64 byte packets

----128.9.0.107 PING Statistics----
10 packets transmitted, 0 packets received, 100% packet loss

If ping does get through, you should check that the remote name servers are really running. You might also check whether your Internet firewall is inadvertently blocking your name server's queries. If you've upgraded to BIND 8 or 9 recently, see "A Gotcha with BIND 8 or 9 and Packet-Filtering Firewalls" in Chapter 11, and see if it applies to you.

If ping can't get through either, all that's left to do is to locate the break in the network. Utilities like traceroute and ping's record route option can be very helpful in determining whether the problem is on your network, the destination network, or somewhere in the middle.

Also, use your own common sense when tracking down the break. In this trace, for example, the remote name servers your name server tried to query are all root name servers. (You might have had their PTR records cached somewhere, so you could find out their domain names.) Now it's not very likely that each root's local network went down, nor that the Internet's backbone networks collapsed entirely. Occam's razor says that the simplest condition that could cause this behavior—namely, the loss of your network's link to the Internet—is the most likely cause.
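Pulling the list of name servers out of a long BIND 8 debugging file by hand is error-prone, so here's a sketch that does it with a regular expression. It assumes forw and resend lines shaped like the ones in the trace earlier in this section; the pattern and the function name are ours, and other debug levels or BIND versions may format these lines differently.

```python
import re

# Matches the destination in lines such as
#   forw: forw -> [198.41.0.4].53 ds=7 nsid=58732 id=18470 0ms retry 4 sec
#   resend(addr=1 n=0) -> [128.9.0.107].53 ds=7 nsid=58732 id=18470 0ms
FORWARD_RE = re.compile(r"-> \[(\d{1,3}(?:\.\d{1,3}){3})\]\.\d+")


def queried_servers(debug_lines):
    """Return the addresses queried, in order of first appearance."""
    seen = []
    for line in debug_lines:
        match = FORWARD_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


trace = [
    "datagram from [192.249.249.3].1051, fd 5, len 30",
    "forw: forw -> [198.41.0.4].53 ds=7 nsid=58732 id=18470 0ms retry 4 sec",
    "resend(addr=1 n=0) -> [128.9.0.107].53 ds=7 nsid=58732 id=18470 0ms",
    "resend(addr=0 n=1) -> [198.41.0.4].53 ds=7 nsid=58732 id=18470 0ms",
]
print(queried_servers(trace))  # ['198.41.0.4', '128.9.0.107']
```

The "datagram from" lines are skipped because they record the resolver's query arriving, not a query your name server sent.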

14.3.9 Missing Subdomain Delegation

Even though registrars do their very best to process your requests as quickly as possible, it may take a day or two for your subdomain's delegation to appear in your parent zone's name servers. If your parent zone isn't one of the generic top-level domains, your mileage may vary. Some parents are quick and responsible, others are slow and inconsistent. Just like in real life, though, you're stuck with them.

Until your zone's delegation appears in your parent zone's name servers, your name servers will be able to look up data in the Internet's namespace, but no one out on the Internet (outside of your domain) will know how to look up data in your namespace.

That means that even though you can send mail outside of your domain, the recipients won't be able to reply to it. Furthermore, no one will be able to telnet to, ftp to, or even ping your hosts by domain name.

Remember that this applies equally to any in-addr.arpa zones you may run. Until their parent zones add delegation to your servers, name servers on the Internet won't be able to reverse map addresses on your networks.

To determine whether or not your zone's delegation has made it into your parent zone's name servers, query a parent name server for the NS records for your zone. If the parent name server has the data, any name server on the Internet can find it:

% nslookup 
Default Server:  terminator.movie.edu
Address:  192.249.249.3

 > server a.root-servers.net.   —Query a root name server
Default Server:  a.root-servers.net
Address:  198.41.0.4

> set norecurse               —Instruct the server to answer out of its own data
> set type=ns                 —and to look for NS records
> 249.249.192.in-addr.arpa.   —for 249.249.192.in-addr.arpa
Server:  a.root-servers.net
Address:  198.41.0.4

*** a.root-servers.net can't find 249.249.192.in-addr.arpa.: Non-existent domain

Here, the delegation clearly hasn't been added yet. You can either wait patiently or, if an unreasonable amount of time has passed since you requested delegation from your parent zone, contact your parent zone's administrator and ask what's up.

14.3.10 Incorrect Subdomain Delegation

Incorrect subdomain delegation is another familiar problem on the Internet. Keeping delegation up to date requires human intervention—informing your parent zone's administrator of changes to your set of authoritative name servers. Consequently, delegation information often becomes inaccurate as administrators make changes without letting their parents know. Far too many administrators believe that setting up delegation is a one-shot deal: they let their parents know which name servers are authoritative once when they set up their zone and then they never talk to them again. They don't even call on Mother's Day.

An administrator may add a new name server, decommission another, and change the IP address of a third, all without telling the parent zone's administrator. Gradually, the number of name servers correctly delegated to by the parent zone dwindles. In the best case, this leads to long resolution times as querying name servers struggle to find an authoritative name server for the zone. If the delegation information becomes badly out of date and the last authoritative name server is brought down for maintenance, the information within and below the zone will be inaccessible.

If you suspect bad delegation from your parent zone to your zone, from your zone to one of your children, or from a remote zone to one of its children, you can check with nslookup:

% nslookup 
Default Server:  terminator.movie.edu
Address:  192.249.249.3

> server a.root-servers.net.       —Set server to the parent zone's name server that 
                                                             —you suspect has bad delegation
Default Server:  a.root-servers.net
Address:  198.41.0.4

> set type=ns                      —Look for NS records
> hp.com.                          —for the zone in question
Server:  a.root-servers.net
Address:  198.41.0.4

Non-authoritative answer:
hp.com          nameserver = RELAY.HP.COM
hp.com          nameserver = HPLABS.HPL.HP.COM
hp.com          nameserver = NNSC.NSF.NET
hp.com          nameserver = HPSDLO.SDD.HP.COM

Authoritative answers can be found from:
hp.com          nameserver = RELAY.HP.COM
hp.com          nameserver = HPLABS.HPL.HP.COM
hp.com          nameserver = NNSC.NSF.NET
hp.com          nameserver = HPSDLO.SDD.HP.COM
RELAY.HP.COM    internet address = 15.255.152.2
HPLABS.HPL.HP.COM       internet address = 15.255.176.47
NNSC.NSF.NET    internet address = 128.89.1.178
HPSDLO.SDD.HP.COM       internet address = 15.255.160.64
HPSDLO.SDD.HP.COM       internet address = 15.26.112.11

Let's say you suspect that the delegation to hpsdlo.sdd.hp.com is incorrect. You now query hpsdlo.sdd.hp.com for data in the hp.com zone (e.g., the SOA record for hp.com) and check the answer:

> server hpsdlo.sdd.hp.com.
Default Server:  hpsdlo.sdd.hp.com
Addresses:  15.255.160.64, 15.26.112.11

> set norecurse
> set type=soa
> hp.com.
Server:  hpsdlo.sdd.hp.com
Addresses:  15.255.160.64, 15.26.112.11

Non-authoritative answer:
hp.com
        origin = relay.hp.com
        mail addr = hostmaster.hp.com
        serial = 1001462
        refresh = 21600 (6 hours)
        retry   = 3600 (1 hour)
        expire  = 604800 (7 days)
        minimum ttl = 86400 (1 day)

Authoritative answers can be found from:
hp.com          nameserver = RELAY.HP.COM
hp.com          nameserver = HPLABS.HPL.HP.COM
hp.com          nameserver = NNSC.NSF.NET
RELAY.HP.COM    internet address = 15.255.152.2
HPLABS.HPL.HP.COM       internet address = 15.255.176.47
NNSC.NSF.NET    internet address = 128.89.1.178

If hpsdlo.sdd.hp.com really were authoritative for hp.com, it would have responded with an authoritative answer. The administrator of the hp.com zone can tell you whether hpsdlo.sdd.hp.com should be an authoritative name server for hp.com, so that's who you should contact.

Another common symptom of this is a "lame server" error message:

Oct 1 04:43:38 terminator named[146]: Lame server on '40.234.23.210.in-addr.arpa'
(in '210.in-addr.arpa'?): [198.41.0.5].53 'RS0.INTERNIC.NET':
learnt(A=198.41.0.21,NS=128.63.2.53)

Here's how to read that: your name server was referred by the name server at 128.63.2.53 to the name server at 198.41.0.5 for a name in the domain 210.in-addr.arpa, specifically 40.234.23.210.in-addr.arpa. The response from the name server at 198.41.0.5 indicated that it wasn't, in fact, authoritative for 210.in-addr.arpa, and therefore either the delegation that 128.63.2.53 gave you is wrong or the server at 198.41.0.5 is misconfigured.
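If you have a lot of these messages to wade through, the reading above can be mechanized. The following Python sketch pulls the fields out of a lame server message. The regular expression is inferred from the one sample message shown here, so treat it as an assumption; other versions of BIND may word the message differently. The function name and the field names are ours, and the A= field is captured but not interpreted.

```python
import re

# Pattern inferred from the single sample message in the text; this is an
# assumption, not a documented log format.
LAME_SERVER_RE = re.compile(
    r"Lame server on '(?P<name>[^']+)'\s+"
    r"\(in '(?P<zone>[^']*)'\?\):\s+"
    r"\[(?P<lame_server>[0-9.]+)\]\.(?P<port>\d+)\s+"
    r"'(?P<server_name>[^']*)':\s+"
    r"learnt\s*\(A=(?P<learnt_a>[0-9.]+),NS=(?P<referred_by>[0-9.]+)\)"
)


def parse_lame_server(message):
    """Return the fields of a lame server message as a dict, or None."""
    match = LAME_SERVER_RE.search(message)
    return match.groupdict() if match else None


def explain(fields):
    """Restate the parsed fields in the terms used in the text."""
    return (
        "%(referred_by)s referred us to %(lame_server)s for %(name)s, "
        "but %(lame_server)s is not authoritative for %(zone)s" % fields
    )
```

The message must be passed in as a single string; if syslog or your pager has wrapped it in the middle of an address, join the pieces first.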

14.3.11 Syntax Error in resolv.conf

Despite the resolv.conf file's simple syntax, people do occasionally make mistakes when editing it. And, unfortunately, lines with syntax errors in resolv.conf are silently ignored by the resolver. The result is usually that some part of your intended configuration doesn't take effect: either your local domain name or search list isn't set correctly, or the resolver won't query one of the name servers you configured it to query. Commands that rely on the search list won't work, your resolver won't query the right name server, or it won't query a name server at all.

The easiest way to check whether your resolv.conf file is having the intended effect is to run nslookup. nslookup will kindly report the local domain name and search list it derives from resolv.conf, plus the name server it's querying, when you type set all, as we showed you in Chapter 12:

% nslookup
Default Server:  terminator.movie.edu
Address:  192.249.249.3

> set all
Default Server:  terminator.movie.edu
Address:  192.249.249.3

Set options:
  nodebug         defname          search         recurse
  nod2            novc             noignoretc     port=53
  querytype=A     class=IN         timeout=5      retry=4
  root=ns.nic.ddn.mil.
  domain=movie.edu
  srchlist=movie.edu

>

Check that the output of set all is what you expect, given your resolv.conf file. For example, if you set search fx.movie.edu movie.edu in resolv.conf, you expect to see:

domain=fx.movie.edu
srchlist=fx.movie.edu/movie.edu

in the output. If you don't see what you're expecting, look carefully at resolv.conf. If there's nothing obvious, look for unprintable characters (with vi's set list command, for example). Watch out for trailing spaces, especially; on older resolvers, a trailing space after the domain name will set the local domain name to include a space. No real top-level domain names actually end with spaces, of course, so all of your non-dot-terminated lookups will fail.
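Since the resolver won't complain about a bad line, it can help to have something else do the complaining. Here's a rough Python sketch of a checker for the problems just described: trailing whitespace, unprintable characters, and misspelled directives. The function name is ours, and the list of directives (domain, search, nameserver, sortlist, options) and the treatment of lines beginning with # or ; as comments are assumptions about your resolver; adjust them to match what your system actually supports.

```python
# Directives assumed to be valid; check your resolver's documentation.
KNOWN_DIRECTIVES = {"domain", "search", "nameserver", "sortlist", "options"}


def lint_resolv_conf(text):
    """Return (line_number, message) pairs for suspicious resolv.conf lines."""
    problems = []
    for number, line in enumerate(text.split("\n"), start=1):
        # Blank lines and comment lines are skipped without any checks.
        if not line.strip() or line.lstrip().startswith(("#", ";")):
            continue
        if line != line.rstrip():
            problems.append((number, "trailing whitespace"))
        # Tabs are legitimate separators; anything else unprintable is not.
        if any(not ch.isprintable() and ch != "\t" for ch in line):
            problems.append((number, "unprintable character"))
        fields = line.split()
        if fields[0] not in KNOWN_DIRECTIVES:
            problems.append((number, "unknown directive %r" % fields[0]))
        elif len(fields) < 2:
            problems.append((number, "directive %r has no value" % fields[0]))
    return problems
```

You might run it as lint_resolv_conf(open("/etc/resolv.conf").read()) and print whatever it returns. An empty list means only that none of these particular mistakes were found, not that the file says what you intended.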

14.3.12 Local Domain Name Not Set

Failing to set your local domain name is another old standby gaffe. You can set it implicitly by setting your hostname to your host's fully qualified domain name or explicitly in resolv.conf. The characteristics of an unset local domain name are straightforward: folks who use single-label names (or abbreviated domain names) in commands get no joy:

% telnet br
br: No address associated with name
% telnet br.fx
br.fx: No address associated with name
% telnet br.fx.movie.edu
Trying...
Connected to bladerunner.fx.movie.edu.
Escape character is '^]'.

HP-UX bladerunner.fx.movie.edu A.08.07 A 9000/730 (ttys1)
login:

You can use nslookup to check this one, much as you do when you suspect a syntax error in resolv.conf:

% nslookup
Default Server:  terminator.movie.edu
Address:  192.249.249.3

> set all
Default Server:  terminator.movie.edu
Address:  192.249.249.3

Set options:
  nodebug         defname         search          recurse
  nod2            novc            noignoretc      port=53
  querytype=A     class=IN        timeout=5       retry=4
  root=ns.nic.ddn.mil.
  domain=
  srchlist=

Notice that neither the local domain name nor the search list is set. You can also track this down by enabling debugging on the name server. (This, of course, requires access to the name server, which may not be running on the host that the problem is affecting.) Here's how the debugging output from a BIND 9 name server might look after trying those telnet commands:

Sep 26 16:17:58.824 client 192.249.249.3#1032: query: br A
Sep 26 16:17:58.825 createfetch: br. A
Sep 26 16:18:09.996 client 192.249.249.3#1032: query: br.fx A
Sep 26 16:18:09.996 createfetch: br.fx. A
Sep 26 16:18:18.677 client 192.249.249.3#1032: query: br.fx.movie.edu A

On a BIND 8 name server, it would look something like this:

Debug turned ON, Level 1

datagram from [192.249.249.3].1057, fd 5, len 20
req: nlookup(br) id 27974 type=1 class=1
req: missed 'br' as '' (cname=0)
forw: forw -> [198.41.0.4].53 ds=7 nsid=61691 id=27974 0ms retry 4 sec

datagram from [198.41.0.4].53, fd 5, len 20
ncache: dname br, type 1, class 1
send_msg -> [192.249.249.3].1057 (UDP 5) id=27974

datagram from [192.249.249.3].1059, fd 5, len 23
req: nlookup(br.fx) id 27975 type=1 class=1
req: missed 'br.fx' as '' (cname=0)
forw: forw -> [128.9.0.107].53 ds=7 nsid=61692 id=27975 0ms retry 4 sec

datagram from [128.9.0.107].53, fd 5, len 23
ncache: dname br.fx, type 1, class 1
send_msg -> [192.249.249.3].1059 (UDP 5) id=27975

datagram from [192.249.249.3].1060, fd 5, len 33
req: nlookup(br.fx.movie.edu) id 27976 type=1 class=1
req: found 'br.fx.movie.edu' as 'br.fx.movie.edu' (cname=0)
req: nlookup(bladerunner.fx.movie.edu) id 27976 type=1 class=1
req: found 'bladerunner.fx.movie.edu' as 'bladerunner.fx.movie.edu'
     (cname=1)
ns_req: answer -> [192.249.249.3].1060 fd=5 id=27976 size=183 Local
Debug turned OFF

Contrast this with the debugging output produced by the application of the search list in Chapter 13. The only names looked up here are exactly what the user typed, with no domain names appended at all. Clearly, the search list isn't being applied.
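To see why an empty search list produces exactly those queries, here's a simplified Python model of how a resolver might turn a typed name into the names it looks up. It's a sketch under stated assumptions, not the resolver's actual code: it assumes that a name with at least ndots dots is tried as typed first and then with each search list entry appended, and that a name with fewer dots is tried with the search list first. Real resolvers differ by version and configuration. The function name candidate_names and its parameters are ours.

```python
def candidate_names(typed, search_list, ndots=1):
    """Return the fully qualified names a resolver would try, in order.

    Simplified model: assumes the ordering rule described in the lead-in.
    """
    if typed.endswith("."):
        # A trailing dot means the name is already fully qualified.
        return [typed]
    expanded = ["%s.%s." % (typed, domain.rstrip(".")) for domain in search_list]
    as_typed = typed + "."
    if typed.count(".") >= ndots:
        return [as_typed] + expanded
    return expanded + [as_typed]


# With no local domain name or search list, only the typed name is tried:
print(candidate_names("br", []))
print(candidate_names("br.fx", []))
# With a search list in place, the lookups can succeed:
print(candidate_names("br", ["fx.movie.edu", "movie.edu"]))
```

In this model, the first two calls yield only br. and br.fx., which matches the debugging output above, while the third also tries br.fx.movie.edu. and br.movie.edu.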

14.3.13 Response from Unexpected Source

One problem we've seen increasingly often in the DNS newsgroups is the "response from unexpected source." This was once called a Martian response: it's a response that comes from an IP address other than the one your name server sent a query to. When a BIND name server sends a query to a remote server, BIND conscientiously makes sure that answers come only from the IP addresses on that server. This helps minimize the possibility of accepting spoofed responses. BIND is equally demanding of itself: a BIND server makes every effort to reply via the same network interface that it received a query on.

Here's the error message you'd see upon receiving a possibly unsolicited response:

Mar  8 17:21:04 terminator named[235]: Response from unexpected source ([205.199.4.131].53)

This can mean one of two things: either someone is trying to spoof your name server, or—more likely—you sent a query to an older BIND server or a different make of name server that's not as assiduous about replying from the same interface it receives queries on.
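The check itself is simple. The following Python sketch shows the idea, not BIND's implementation: remember the address a query was sent to and compare it with the address the response arrived from. The function name is ours, and the queried address 205.199.4.130 in the example is hypothetical, chosen only to differ from the address in the message above.

```python
import ipaddress


def is_expected_source(queried_addr, response_addr):
    """Return True if a response came from the address that was queried."""
    # Parsing both sides normalizes equivalent spellings of an address.
    return ipaddress.ip_address(queried_addr) == ipaddress.ip_address(response_addr)


# Hypothetical: we queried one address and the reply came from another
# interface on the same multihomed server.
if not is_expected_source("205.199.4.130", "205.199.4.131"):
    print("Response from unexpected source")
```

In a program that does its own queries over UDP, the response address would be the one returned alongside the data by the socket's recvfrom call.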
