Previous section   Next section

Recipe 15.14 Debugging DSLw

15.14.1 Problem

You want to debug and isolate DLSw problems.

15.14.2 Solution

The first thing to do with any DLSw issue is to verify that the peers are working correctly, as in Recipe 15.12. If the peers are not established, then test IP connectivity with ping packets. If you can ping but the peers won't come up, then verify your configuration as in Recipe 15.1. In particular, ensure that the remote peer of each router precisely matches the local peer on the other end.

If the DLSw peers are active, check the circuits, as in Recipe 15.12.

For failed circuits involving SDLC devices, check the serial interface, as in Recipe 15.13.

For Token Ring or Ethernet devices, verify that the interface is functioning properly as in Chapter 16.

If the peers are active and the interfaces look good, then there are three main things that could still be wrong. There could be a loop problem within the DLSw network. There could be a MAC address problem, or a MAC or LSAP filtering issue. Or there could be a network congestion or performance problem.

There are several useful debug commands for use with DLSw. For looking at the router-to-router DLSw transport, you can use the debug dlsw command:

dlsw-branch#debug dlsw

You can get other useful information about SNA and LLC2 connection problems with these debug commands:

dlsw-branch#debug sna state
dlsw-branch#debug llc2 state

15.14.3 Discussion

If your routers will not establish a DLSw peer relationship, the most common problem is simple IP connectivity. Verify that you can ping the address in the remote-peer statement. Note also that at most one end can use the promiscuous keyword to avoid configuring a remote-peer statement. This feature allows the router to sit and quietly wait for other routers to initiate connections. Somebody has to start the conversation.

Another common reason for failing to connect is simply a configuration mismatch between the local peer and remote peer definitions. Make sure that if you have a remote-peer statement, then the IP address exactly matches the address in the local-peer statement of the other router. It is not enough just to target any address on the remote peer router: it must be exactly the same address.

If this is correct and you can ping but still cannot connect, then there are two remaining possibilities. One is that there are port-specific filters or routing rules that permit ping, but handle DLSw packets differently. This is particularly possible when using policy-based routing in an attempt to give DLSw packets a preferred path through the network. The preferred path may have serious congestion problems, for example.

The other possibility is that there is a complete incompatibility between the versions of DLSw running at the two ends. If both are Cisco routers, then this is not going to happen. But there are still some pre-RFC 1795 DLSw implementations out there, and Cisco's DLSw+ cannot negotiate a connection with noncompliant versions.

If you are trying to debug a DLSw problem, make sure that the peers are connected:

dlsw-branch#show dlsw peers
Peers:            state     pkts_rx   pkts_tx  type  drops ckts TCP   uptime
 TCP 10.1.1.5    CONNECT    1348082    547505  conf      0    3   0     4w6d
 TCP 10.1.1.9    CONNECT     167013    200235  conf      0    3   0     4w5d
 TCP 10.2.1.5    DISCONN          0         0  conf      0    0   -        -
 TCP 10.2.1.9    DISCONN          0         0  conf      0    0   -        -
Total number of connected peers: 2
Total number of connections:     2

In this example, four peers have been configured, but only two of them are connected. A simple ping test would reveal in this case that the two disconnected peers are currently unreachable because of a network problem.

This command output also shows that there are currently three circuits connected to each of the active peers, for a total of six active circuits. You can look at information on the individual circuits with the show dlsw circuits command:

dlsw-branch#show dlsw circuits
Index           local addr(lsap)    remote addr(dsap)  state          uptime
2164260933      4000.53aa.2440(04)  4000.3745.aaaa(04) CONNECTED         5d00h
1627390033      4000.53aa.8420(04)  4000.3745.aaaa(04) CONNECTED         2d01h
2684354644      4000.53aa.7023(04)  4000.3745.aaaa(04) CONNECTED      09:41:36
2013265928      000c.29d0.bb41(14)  4000.3745.aaab(04) CONNECTED          4w5d
654311497       4001.001b.a1f3(04)  4000.3745.aaab(04) CONNECTED         3d18h
1342177367      4000.53aa.e3a1(04)  4000.3745.aaab(04) CONNECTED      06:09:14
Total number of circuits connected: 6

Here you can see that these six circuits connect to two different FEP MAC addresses. All of the DSAP values are 0x04, and one of the circuits has an LSAP value of 0x14. This is useful in making sure that you have appropriate SAP filters in place.

The following shows the output of a debug trace on this same router:

dlsw-branch#debug dlsw
DLSw reachability debugging is on at event level for all protocol traffic
DLSw peer debugging is on
DLSw local circuit debugging is on
DLSw core message debugging is on
DLSw core state debugging is on
DLSw core flow control debugging is on
DLSw core xid debugging is on
dlsw-branch#
May 15 22:18:35.320:  DLSW Received-ctlQ : CLSI Msg : TEST_STN.Ind   dlen: 40 
May 15 22:18:35.320: CSM: Received CLSI Msg : TEST_STN.Ind   dlen: 40 from
TokenRing0/0
May 15 22:18:35.320: CSM:   smac c001.001b.21aa, dmac 0020.353f.ab4a, ssap 4 , dsap 0 
May 15 22:18:35.320: broadcast filter failed mac check
May 15 22:18:35.320: broadcast filter failed mac check
May 15 22:18:35.320: CSM: Write to all peers not ok - PEER_NO_CONNECTIONS
May 15 22:18:35.556:  DISP Sent : CLSI Msg : TEST_STN.Req   dlen: 46 
May 15 22:18:35.556:  DISP Sent : CLSI Msg : TEST_STN.Req   dlen: 46 
May 15 22:18:35.556:  DISP Sent : CLSI Msg : TEST_STN.Req   dlen: 46 
May 15 22:18:35.556:  DISP Sent : CLSI Msg : TEST_STN.Req   dlen: 46 
May 15 22:18:35.556:  DISP Sent : CLSI Msg : TEST_STN.Req   dlen: 46 
May 15 22:18:35.556:  DISP Sent : CLSI Msg : TEST_STN.Req   dlen: 46 
May 15 22:18:36.240:  DLSW Received-ctlQ : CLSI Msg : TEST_STN.Ind   dlen: 40 
May 15 22:18:36.244: CSM: Received CLSI Msg : TEST_STN.Ind   dlen: 40 from
TokenRing0/0
May 15 22:18:36.244: CSM:   smac c001.001b.21aa, dmac 0060.9442.7cf3, ssap 4 , dsap 0 
May 15 22:18:36.244: broadcast filter failed mac check
May 15 22:18:36.244: broadcast filter failed mac check
May 15 22:18:36.244: CSM: Write to all peers not ok - PEER_NO_CONNECTIONS

Note that one device with a Token Ring MAC address of c001.001b.21aa is attempting to make a connection to two different destination MAC addresses. These connections are being rejected with the message "broadcast filter failed mac check." This message appears because of the dlsw icanreach mac-exclusive command that filters out all MAC addresses from crossing the network unless they are explicitly configured on the router.

Watching this same debug trace for a little while longer reveals more interesting entries:

May 15 22:18:46.852: DLSw: 654311497 decr s - s:58 so:0 r:89 ro:0
May 15 22:18:46.852: DLSW Received-disp : CLSI Msg : DATA.Ind   dlen: 336 
May 15 22:18:46.852: DLSw: START-FSM (654311497): event:DLC-Data.Ind state:CONNECTED
May 15 22:18:46.852: DLSw: core: dlsw_action_l(  )
May 15 22:18:46: %DLSWC-3-SENDSSP: SSP OP = 10( INFO ) to peer 10.1.1.5(2065) success
May 15 22:18:46.852: DLSw: END-FSM (654311497): state:CONNECTED->CONNECTED
May 15 22:18:46: %DLSWC-3-RECVSSP: SSP OP = 10( INFO )  from peer 10.1.1.5(2065)
May 15 22:18:46.920: DLSw: 654311497 decr r - s:58 so:0 r:88 ro:0
May 15 22:18:46.920: DLSw: START-FSM (654311497): event:WAN-INFO state:CONNECTED
May 15 22:18:46.924: DLSw: core: dlsw_action_m(  )
May 15 22:18:46.924:  DISP Sent : CLSI Msg : DATA.Req   dlen: 308 
May 15 22:18:46.924: DLSw: END-FSM (654311497): state:CONNECTED->CONNECTED
May 15 22:18:48.612: DLSw: Keepalive Request sent to peer 10.1.1.9(2065))
May 15 22:18:48.628: DLSw: Keepalive Response from peer 10.1.1.9(2065)

In this section of the trace, you can see one of the circuits (654311497) actually sends and receives a piece of information. You can see that it uses the first of the active peers, 10.1.1.5, and that the conversation happens on TCP port 2065. A short time later, you can see the router exchange keepalive messages with the other peer to ensure that the peer relationship remains active.

The following shows what happens when a particular circuit suddenly disconnects. In this case, we have deliberately cleared one of the circuits:

dlsw-branch# clear dlsw circuit 2013265928
May 15 22:18:55.412: DLSw: START-FSM (2013265928): event:ADMIN-STOP state:CONNECTED
May 15 22:18:55.412: DLSw: core: dlsw_action_r(  )
May 15 22:18:55: %DLSWC-3-SENDSSP: SSP OP = 25( HLTN )  to peer 10.1.1.5(2065) 
success
May 15 22:18:55.412:  DISP Sent : CLSI Msg : DISCONNECT.Req   dlen: 4 
May 15 22:18:55.412: DLSw: END-FSM (2013265928): state:CONNECTED->HALT_NOACK_PEND
May 15 22:18:55.424: SNA: Connection to Focal Point SSCP lost
May 15 22:18:55.424:  DLSW Received-ctlQ : CLSI Msg : DISCONNECT.Cfm CLS_OK dlen: 8 
May 15 22:18:55.424: DLSw: START-FSM (2013265928): event:DLC-Disc.Cnf state:HALT_
NOACK_PEND
May 15 22:18:55.424: DLSw: core: dlsw_action_z(  )
May 15 22:18:55.424:  DISP Sent : CLSI Msg : CLOSE_STN.Req   dlen: 4 
May 15 22:18:55.424: DLSw: END-FSM (2013265928): state:HALT_NOACK_PEND->CLOSE_PEND
May 15 22:18:55.424:  DLSW Received-ctlQ : CLSI Msg : DISCONNECT.Ind   dlen: 8 
May 15 22:18:55.424: DLSw: START-FSM (2013265928): event:DLC-Disc.Ind state:CLOSE_
PEND
May 15 22:18:55.424: DLSw: END-FSM (2013265928): state:CLOSE_PEND->CLOSE_PEND
May 15 22:18:55.424:  DLSW Received-ctlQ : CLSI Msg : CLOSE_STN.Cfm CLS_OK dlen: 8 
May 15 22:18:55.424: DLSw: START-FSM (2013265928): event:DLC-CloseStn.Cnf state:
CLOSE_PEND
May 15 22:18:55.428: DLSw: core: dlsw_action_y(  )
May 15 22:18:55.428: DLSw: 2013265928 to dead queue
May 15 22:18:55.428: DLSw: START-TPFSM (peer 10.1.1.5(2065)): event:CORE-DELETE
CIRCUIT state:CONNECT
May 15 22:18:55.428: DLSw: dtp_action_v(  ), peer delete circuit for peer 10.1.1.
5(2065)
May 15 22:18:55.428: DLSw: END-TPFSM (peer 10.1.1.5(2065)): state:CONNECT->CONNECT
May 15 22:18:55.428: DLSw: END-FSM (2013265928): state:CLOSE_PEND->DISCONNECTED

As you can see, there are several stages that the connection must go through to complete the disconnection request. First it goes into a "HALT_NOACK_PEND" state, meaning that both ends have not yet acknowledged the halt order. From there it goes into a "CLOSE_PEND" state, and finally "DISCONNECTED," as the two DLSw peer routers finally agree that the connection has been terminated.

For one final example, here is what the debug sna state trace looks like for a similar disconnection and reconnection:

dlsw-branch#debug sna state
  SNA state change debugging is on for all PUs
   
dlsw-branch#clear dlsw circuit 2130706520
dlsw-branch#
May 15 22:22:57.399: SNA: LS VDLCTEST: input=Disc.Ind, Connected -> PendClose
May 15 22:22:57.403: SNA: PU VDLCTEST: input=T2ResetPu, Active -> Reset
May 15 22:22:57.403: SNA: LS VDLCTEST: input=Close.Cnf, PendClose -> Reset
May 15 22:22:57.403: SNA: Connection to Focal Point SSCP lost
May 15 22:23:27.035: SNA: LS VDLCTEST: input=StartLs, Reset -> PendConOut
May 15 22:23:33.035: SNA: LS VDLCTEST: input=ReqOpn.Cnf, PendConOut -> Xid
May 15 22:23:33.303: SNA: LS VDLCTEST: input=Connect.Ind, Xid -> ConnIn
May 15 22:23:33.303: SNA: LS VDLCTEST: input=Connected.Ind, ConnIn -> Connected
May 15 22:23:33.427: SNA: PU VDLCTEST: input=Actpu, Reset -> Active
May 15 22:23:33.427: SNA: Connection to Focal Point SSCP established

In this trace, you can see the exchange of XID information and the final activation of the circuit takes less than a second here. The main time lag occurs as the end devices wait a few seconds after disconnection before trying to reconnect their SNA sessions.

15.14.4 See Also

Recipe 15.1; Recipe 15.12; Recipe 15.13; Chapter 16


  Previous section   Next section
Top