
Recipe 17.22 Using SAA

17.22.1 Problem

You want to configure the routers to automatically poll one another to collect performance statistics.

17.22.2 Solution

Cisco supplies a feature called the Service Assurance Agent (SAA) in IOS Version 12.0(5)T and higher, which allows the routers to automatically poll one another to collect end-to-end performance statistics:

Router1#configure terminal 
Enter configuration commands, one per line.  End with CNTL/Z.
Router1(config)#rtr responder
Router1(config)#rtr 10
Router1(config-rtr)#type echo protocol ipIcmpEcho 10.1.2.3
Router1(config-rtr)#tag ECHO_TEST
Router1(config-rtr)#threshold 1000
Router1(config-rtr)#frequency 300
Router1(config-rtr)#exit
Router1(config)#rtr schedule 10 life 2147483647 start-time now 
Router1(config)#rtr 20
Router1(config-rtr)#type jitter dest-ipaddr 10.1.2.3 dest-port 99 num-packets 100
Router1(config-rtr)#tag JITTER_TEST
Router1(config-rtr)#frequency 300
Router1(config-rtr)#exit
Router1(config)#rtr schedule 20 life 100000 start-time now ageout 3600
Router1(config)#exit
Router1#

The target router (10.1.2.3), which is specified as the destination in both of these tests, must be configured to respond to SAA tests:

Router2#configure terminal 
Enter configuration commands, one per line.  End with CNTL/Z.
Router2(config)#rtr responder
Router2(config)#exit
Router2#

17.22.3 Discussion

The SAA feature replaces the earlier Round Trip Reporter (RTR) and Round Trip Time Monitor (RTTMON) facilities, which were available in IOS Version 11.3, and it uses the same basic syntax. However, where RTR included only some simple round-trip ping and SNA tests, SAA includes several more interesting and useful features.

The first line in the example is the rtr responder command. This is required on all routers that will be taking part in SAA, including the targets of these tests. Both of these examples use a target IP address of 10.1.2.3. This destination must be another Cisco router that is also configured with the rtr responder command.

In this example, we have configured two tests. The first test is given the arbitrary number 10 and the name ECHO_TEST. The second test is number 20, and is called JITTER_TEST. Note that you don't actually need to give your SAA tests names, but it is a good idea if you have several of them. This name (or tag) is included in the SAA SNMP MIB table for this test. So, if you intend to download the test data via SNMP for performance management purposes, it can be extremely useful to name your tests so you know what they are.

Let's look at both of these example tests in more detail.

The first test does an ICMP echo (ping) to the destination device, 10.1.2.3:

Router1(config)#rtr 10
Router1(config-rtr)#type echo protocol ipIcmpEcho 10.1.2.3
Router1(config-rtr)#tag ECHO_TEST
Router1(config-rtr)#threshold 1000
Router1(config-rtr)#frequency 300
Router1(config-rtr)#exit

The threshold command defines a minimum interesting threshold, which is set to 1,000 milliseconds here. This allows you to count the number of ping tests in which the round trip time was greater than one second, in addition to keeping track of the ping times and number of ping failures, which we will show in a moment.

Next is the frequency command, which defines how often this test will be run in seconds. In this case, we want the test to run every 5 minutes (300 seconds).

Then, once you have defined the test in the rtr configuration block, you have to tell the router when to run it. This is done with the rtr schedule command:

Router1(config)#rtr schedule 10 life 2147483647 start-time now 

This command defines the schedule for running test number 10. It sets a lifetime for this test of 2,147,483,647 seconds (a very long time), which is the maximum value. This effectively means that this test will continue to run indefinitely. It is scheduled to start immediately.

When we scheduled the second test we used slightly different parameters:

Router1(config)#rtr schedule 20 life 100000 start-time now ageout 3600

In this case, the test is scheduled to run for only 100,000 seconds, which is almost 28 hours. We have also configured an ageout value of 3600 seconds for this test. This tells the router to keep the test definition in memory for one hour after the test expires. The ageout option allows you to restart the test without having to reconfigure it.
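The arithmetic behind these schedule values is easy to check. The following Python sketch is only an illustration: it does not talk to a router, and the constant and function names are our own rather than anything defined by IOS. It converts the two lifetimes into more familiar units and counts how many complete 300-second intervals fit into the lifetime of test 20.

```python
# Illustrative arithmetic only; the names below are hypothetical, not IOS terms.
LIFE_SECONDS = 100_000            # "life" value from rtr schedule 20
FREQUENCY_SECONDS = 300           # "frequency" value configured in rtr 20
MAX_LIFE_SECONDS = 2_147_483_647  # "life" value from rtr schedule 10


def to_hours(seconds):
    """Convert a duration in seconds to hours."""
    return seconds / 3600


def to_years(seconds):
    """Convert a duration in seconds to 365-day years."""
    return seconds / (365 * 24 * 3600)


life_hours = to_hours(LIFE_SECONDS)
intervals = LIFE_SECONDS // FREQUENCY_SECONDS
max_life_years = to_years(MAX_LIFE_SECONDS)

print(f"Test 20 lifetime: {life_hours:.1f} hours")
print(f"Complete {FREQUENCY_SECONDS}-second intervals in that lifetime: {intervals}")
print(f"Test 10 lifetime: about {max_life_years:.0f} years")
```

The last figure shows why the maximum lifetime can be treated as running indefinitely.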

You can view the data for the first test as follows:

Router1#show rtr operational-state 10
        Current Operational State
Entry Number: 10
Modification Time: 18:51:53.000 EST Tue Dec 17 2002
Diagnostics Text: 
Last Time this Entry was Reset: Never
Number of Octets in use by this Entry: 1910
Connection Loss Occurred: FALSE
Timeout Occurred: FALSE
Over Thresholds Occurred: FALSE
Number of Operations Attempted: 203
Current Seconds Left in Life: 2147483647
Operational State of Entry: active
Latest Completion Time (milliseconds): 54
Latest Operation Start Time: 11:41:53.000 EST Wed Dec 18 2002
Latest Operation Return Code: ok
Latest 10.1.2.3

In this output you can see that it has run this test 203 times, and that the last test took 54 milliseconds and completed successfully. Note that it doesn't give a running average ping time. However, one of the nicest features of SAA is that you can configure a network management station to download this data using SNMP, provided you have the SAA MIB loaded on your server.

The second test is considerably more interesting. This test measures jitter between the routers by sending a series of UDP packets and looking at the time differences between them at both ends:

Router1(config)#rtr 20
Router1(config-rtr)#type jitter dest-ipaddr 10.1.2.3 dest-port 99 num-packets 100
Router1(config-rtr)#tag JITTER_TEST
Router1(config-rtr)#frequency 300
Router1(config-rtr)#exit
Router1(config)#rtr schedule 20 life 100000 start-time now ageout 3600

The type command defines a jitter test to the same destination IP address as the previous test. In this case, we have decided to use UDP port 99 for our test, and each test run will consist of 100 packets. The frequency command tells the router to run this test every five minutes. Here is some sample output from this test:

Router1#show rtr operational-state 20
        Current Operational State
Entry Number: 20
Modification Time: 10:25:36.000 EST Wed Dec 18 2002
Diagnostics Text: 
Last Time this Entry was Reset: Never
Number of Octets in use by this Entry: 1742
Number of Operations Attempted: 22
Current Seconds Left in Life: 93400
Operational State of Entry: active
Latest Operation Start Time: 12:10:36.000 EST Wed Dec 18 2002
RTT Values:
NumOfRTT: 98    RTTSum: 6063    RTTSum2: 384317
Packet Loss Values:
PacketLossSD: 0 PacketLossDS: 2
PacketOutOfSequence: 2  PacketMIA: 0    PacketLateArrival: 0
InternalError: 0        Busies: 0
Jitter Values:
MinOfPositivesSD: 4     MaxOfPositivesSD: 14
NumOfPositivesSD: 32    SumOfPositivesSD: 175   Sum2PositivesSD: 1111
MinOfNegativesSD: 1     MaxOfNegativesSD: 5
NumOfNegativesSD: 60    SumOfNegativesSD: 175   Sum2NegativesSD: 547
MinOfPositivesDS: 1     MaxOfPositivesDS: 45
NumOfPositivesDS: 20    SumOfPositivesDS: 78    Sum2PositivesDS: 2166
MinOfNegativesDS: 1     MaxOfNegativesDS: 16
NumOfNegativesDS: 21    SumOfNegativesDS: 69    Sum2NegativesDS: 693

There is clearly a lot more information in this test output. This is because measuring jitter is not a simple single-variable test. A jitter measurement characterizes the statistical distribution of packet-by-packet variation in latency in the forward and backward directions, as well as for the round trip. Note that, as with the SAA ping test we discussed earlier, the router records only the results of the most recent test. If you want to keep historical records, you need to poll and download the SAA MIB tables once per poll cycle.

The first set of numbers are the Round Trip Time (RTT) values. You can see that this sample included 98 packets. The total of all of the round trip times of all of these packets was 6063 milliseconds, and the sum of the squares of all of these times was 384,317 milliseconds squared. These values are not extremely meaningful in themselves, but if you divide the RTTSum value by the number of measurements, you get the average latency for this set of packets: roughly 62 milliseconds.

Applying some simple statistics, you can use the sum of the squares to understand how the actual values are spread around this average. The mean of the squares of the round trip times is 3,922 milliseconds squared (the sum of the squares divided by the total number of samples). If you subtract the square of the average from this value and take the square root, you get the standard deviation, which is a statistical estimate of the variation in milliseconds. The higher this value, the greater the spread. In this case, you can calculate that this spread is roughly 10 milliseconds. If the values are approximately normally distributed, this means that about two-thirds of the time, the round trip latency is within the range 62±10ms. Note that the ± symbol is a standard mathematical notation that, in this case, indicates a range from 52ms (62-10) to 72ms (62+10).
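This calculation can be sketched in a few lines of Python. The function name mean_and_spread is our own invention, and the three input values are simply copied from the "RTT Values" line of the sample output. The sketch treats the spread as a population standard deviation, which is an assumption about how you want to interpret these counters rather than something the router reports itself.

```python
import math


def mean_and_spread(count, total, total_of_squares):
    """Return (mean, standard deviation) from a count, a sum, and a sum of squares."""
    mean = total / count
    variance = total_of_squares / count - mean ** 2
    # Guard against a tiny negative variance caused by floating-point rounding.
    return mean, math.sqrt(max(variance, 0.0))


# NumOfRTT, RTTSum, and RTTSum2 from the sample output.
num_of_rtt, rtt_sum, rtt_sum2 = 98, 6063, 384317

rtt_mean, rtt_spread = mean_and_spread(num_of_rtt, rtt_sum, rtt_sum2)
print(f"RTT: {rtt_mean:.1f} ms +/- {rtt_spread:.1f} ms")
```

It is worth carrying the unrounded mean through the calculation. With these counters the spread comes out close to 10 milliseconds, but if you round the mean down to 61 before squaring it, the result inflates to about 14 milliseconds.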

The next set of data records dropped packets. Recall that the sample size is 100 packets, but the NumOfRTT value is only 98. So the network must have dropped two of our test packets. SAA separately keeps track of packets lost in both directions, source to destination (PacketLossSD) and destination to source (PacketLossDS). This router is the source; the other router is the destination. So, in this example, both packets were lost on the return path. Also note that the output claims that there were two out-of-sequence packets during this test, which is consistent with the number of dropped packets. The router saw the sequence number jump by 2 because of the dropped packet, and recorded it as an out-of-sequence error.
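The same bookkeeping can be written out as a short Python sketch. The variable names are ours, the values are copied from the sample output, and we assume here that PacketMIA counts lost packets whose direction could not be determined, so that the three loss counters together should account for every packet that has no RTT measurement.

```python
# Counters copied from the sample output; PACKETS_SENT is the num-packets value.
PACKETS_SENT = 100
num_of_rtt = 98
packet_loss_sd = 0  # lost from source to destination
packet_loss_ds = 2  # lost from destination to source
packet_mia = 0      # assumed: lost, direction unknown

missing = PACKETS_SENT - num_of_rtt
accounted_for = packet_loss_sd + packet_loss_ds + packet_mia
loss_percent = 100 * missing / PACKETS_SENT

print(f"Packets without an RTT measurement: {missing}")
print(f"Losses reported by the counters: {accounted_for}")
print(f"Loss rate for this run: {loss_percent:.1f}%")
```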

The next group of numbers includes the actual jitter measurements. There are two groups of numbers here. The variables that end with "SD" are measured from the source to the destination, and those labeled "DS" are for the return path. Within each of these groups there are two subgroups, one for "Positives," and the other for "Negatives." "Positives" are events where the spacing between two packets has increased since the last pair of successive packets. The "Negatives" counters record all of the times that the interpacket spacing decreased. Let's look a little bit more closely at one set of values:

MinOfPositivesSD: 4     MaxOfPositivesSD: 14
NumOfPositivesSD: 32    SumOfPositivesSD: 175   Sum2PositivesSD: 1111

Of the 100 packets the router sent in this polling interval, there were 32 cases in which the jitter in the forward direction had a positive value. Of these, the largest value was 14 milliseconds and the smallest was 4 milliseconds. We can use the sum and the sum of the squares to calculate the average and spread of values in precisely the same way as we did to calculate the average latency. The result is that positive jitter in this direction was typically within the range 5.5±2.2ms.

Applying this same technique to the other jitter measurements gives other useful statistics. The negative jitter from source to destination was 2.9±0.8ms with a maximum of 5ms and a minimum value of 1ms. In the other direction, the positive jitter was 3.9±9.6ms and the negative jitter was 3.3±4.7ms. These last two values might look a little bit funny because the spread is larger than the mean. However, this is not actually bad because the output also shows that the maximum positive jitter in this direction was 45ms, and the maximum negative jitter was 16ms. The spread is very large, but the mean jitter values are relatively small. This is a fairly typical result.
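If you collect this output regularly, it is convenient to let a script do the arithmetic. The following Python sketch assumes you have captured the "Jitter Values" portion of the show rtr operational-state output as text (here it is pasted in as a string). The helper names parse_counters and mean_and_spread are hypothetical, and the simple regular expression only handles the "Name: integer" pairs shown in this recipe.

```python
import math
import re

# Jitter counters pasted from the sample output of test 20.
SAMPLE_OUTPUT = """\
MinOfPositivesSD: 4     MaxOfPositivesSD: 14
NumOfPositivesSD: 32    SumOfPositivesSD: 175   Sum2PositivesSD: 1111
MinOfNegativesSD: 1     MaxOfNegativesSD: 5
NumOfNegativesSD: 60    SumOfNegativesSD: 175   Sum2NegativesSD: 547
MinOfPositivesDS: 1     MaxOfPositivesDS: 45
NumOfPositivesDS: 20    SumOfPositivesDS: 78    Sum2PositivesDS: 2166
MinOfNegativesDS: 1     MaxOfNegativesDS: 16
NumOfNegativesDS: 21    SumOfNegativesDS: 69    Sum2NegativesDS: 693
"""


def parse_counters(text):
    """Collect every 'Name: integer' pair in the text into a dictionary."""
    return {name: int(value) for name, value in re.findall(r"(\w+):\s*(\d+)", text)}


def mean_and_spread(count, total, total_of_squares):
    """Return (mean, standard deviation) from a count, a sum, and a sum of squares."""
    mean = total / count
    variance = total_of_squares / count - mean ** 2
    return mean, math.sqrt(max(variance, 0.0))


counters = parse_counters(SAMPLE_OUTPUT)
summary = {}
for sign in ("Positives", "Negatives"):
    for direction in ("SD", "DS"):
        key = sign + direction
        summary[key] = mean_and_spread(
            counters["NumOf" + key],
            counters["SumOf" + key],
            counters["Sum2" + key],
        )

for key, (mean, spread) in summary.items():
    print(f"{key}: {mean:.1f} ms +/- {spread:.1f} ms (max {counters['MaxOf' + key]} ms)")
```

Keeping the parsing separate from the statistics means the same mean_and_spread helper could be reused on values retrieved some other way, such as from the SAA MIB tables.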

