Cisco IP SLA IOS-XE

Cisco IP Service Level Agreements (SLAs) is a proprietary feature available on Cisco routers and switches, which actively generates monitoring traffic, processes replies, and measures network performance.

This feature can be used to perform continuous end-to-end connectivity testing with automated re-routing over failover links. It can also simulate different application behavior, such as voice and video to check if the network provides the expected level of service.

The examples and features described in this article are based on Cisco IOS-XE version 16.9.

Configuration Components

IP SLA configuration starts with defining an SLA operation and then scheduling it to run immediately or at a specific time.

SLA Operation Definition

To create or edit an SLA operation use the “ip sla <operation-id>” global configuration mode command. It places CLI into the IP SLA configuration sub-mode, where you can select one of the IP SLA types and provide corresponding configuration parameters.

One of the simplest types of IP SLA operations is icmp-echo. The router pings a specified IP address and records round-trip times if the remote side is reachable. In the example below, we will set the type of SLA entry as ICMP echo with the destination’s IP address of 10.0.3.1. The sample topology is shown in Figure 1.

Figure 1. IP SLA Test Topology
Figure 1. IP SLA Test Topology

The configuration mode changes to IP SLA echo where you can adjust different optional parameters, such as request sending frequency and timeout for replies.

X(config)#ip sla 1 
X(config-ip-sla)# icmp-echo 10.0.3.1 source-ip 10.0.0.1
X(config-ip-sla-echo)#?
IP SLAs Icmp Echo Configuration Commands:
  data-pattern       Data Pattern
  default            Set a command to its defaults
  exit               Exit operation configuration
  frequency          Frequency of an operation
  history            History and Distribution Data
  no                 Negate a command or set its defaults
  owner              Owner of Entry
  request-data-size  Request data size
  tag                User defined tag
  threshold          Operation threshold in milliseconds
  timeout            Timeout of an operation
  tos                Type Of Service
  verify-data        Verify data
  vrf                Configure IP SLAs for a VRF

Scheduling IP SLA

Once SLA is defined and optional parameters are specified, start it by running the “ip sla schedule <id>” command. IP SLA can be configured to start at a specific time or as soon as the command is entered, which is shown in the example below.

X(config)#ip sla schedule 1 start-time now life forever

To check SLA operation’s details use the “show ip sla statistics <id> details” command.

X#show ip sla statistics 1 details 
IPSLAs Latest Operation Statistics

IPSLA operation id: 1
        Latest RTT: 1 milliseconds
Latest operation start time: 10:08:18 UTC Sat Jan 30 2021
Latest operation return code: OK
Over thresholds occurred: FALSE
Number of successes: 2
Number of failures: 0
Operation time to live: Forever
Operational state of entry: Active
Last time this entry was reset: Never

To see configuration details, including default values for various parameters, use the “show ip sla configuration” command.

X#show ip sla configuration  
IP SLAs Infrastructure Engine-III
Entry number: 1
Owner: 
Tag: 
Operation timeout (milliseconds): 5000
Type of operation to perform: icmp-echo
Target address/Source address: 10.0.3.1/10.0.0.1
Type Of Service parameter: 0x0
Request size (ARR data portion): 28
Data pattern: 0xABCDABCD
Verify data: No
Vrf Name: 
Schedule:
   Operation frequency (seconds): 60  (not considered if randomly scheduled)
   Next Scheduled Start Time: Start Time already passed
   Group Scheduled : FALSE
   Randomly Scheduled : FALSE
   Life (seconds): Forever
   Entry Ageout (seconds): never
   Recurring (Starting Everyday): FALSE
   Status of entry (SNMP RowStatus): Active
Threshold (milliseconds): 5000
Distribution Statistics:
   Number of statistic hours kept: 2
   Number of statistic distribution buckets kept: 1
   Statistic distribution interval (milliseconds): 20
Enhanced History:
History Statistics:
   Number of history Lives kept: 0
   Number of history Buckets kept: 15
   History Filter Type: None

Once SLA operation is scheduled, it cannot be modified. Instead, the old operation can be deleted, and then a new one created again using the same ID.

Different IP SLA Types

To view available types of SLA operations, use context-sensitive help as shown in the next example.

X(config-ip-sla)#?
IP SLAs entry configuration commands:
  dhcp         DHCP Operation
  dns          DNS Query Operation
  ethernet     Ethernet Operations
  exit         Exit Operation Configuration
  ftp          FTP Operation
  http         HTTP Operation
  icmp-echo    ICMP Echo Operation
  icmp-jitter  ICMP Jitter Operation
  mpls         MPLS Operation
  path-echo    Path Discovered ICMP Echo Operation
  path-jitter  Path Discovered ICMP Jitter Operation
  tcp-connect  TCP Connect Operation
  udp-echo     UDP Echo Operation
  udp-jitter   UDP Jitter Operation

The next subsections will review some of them.

Jitter operations

Jitter is a variation in the delay between packets. The smaller the jitter the better performance of time-sensitive applications such as voice. It also means that the network delivers packets with a predictable delay and doesn’t experience congestions causing intermittent delays along the paths.

There are 2 types of SLA operations performing Jitter measurements – ICMP and UDP-based.

ICMP Jitter

ICMP Jitter SLA operation is based on ICMP message types (Timestamp Request and Timestamp Reply). The destination can be any device that supports these ICMP messages. Not all devices support it, or it can be blocked by the firewalls. The previously shown ICMP Echo operation is based on more commonly used Echo and Reply message types.

The configuration in the example below demonstrates how to configure ICMP jitter operation and, that once launched, it reports the round-trip-time (RTT) and jitter statistics from the source to the destination, and vice versa.

X(config)# ip sla 2 
X(config-ip-sla)# icmp-jitter 10.0.3.1 source-ip 10.0.0.1
X(config)# ip sla schedule 2 start-time now life forever

X#show ip sla statistics 2
IPSLAs Latest Operation Statistics

IPSLA operation id: 2
Type of operation: icmp-jitter
        Latest RTT: 1 milliseconds
Latest operation start time: 00:20:39 UTC Sun Jan 31 2021
Latest operation return code: OK
RTT Values:
        Number Of RTT: 10               RTT Min/Avg/Max: 1/1/1 milliseconds
Latency one-way time:
        Number of Latency one-way Samples: 0
        Source to Destination Latency one way Min/Avg/Max: 0/0/0 milliseconds
        Destination to Source Latency one way Min/Avg/Max: 0/0/0 milliseconds
Jitter Time:
        Number of SD Jitter Samples: 9
        Number of DS Jitter Samples: 9
        Source to Destination Jitter Min/Avg/Max: 0/1/1 milliseconds
        Destination to Source Jitter Min/Avg/Max: 0/1/1 milliseconds
Over Threshold:
        Number Of RTT Over Threshold: 0 (0%)
Packet Late Arrival: 0
Out Of Sequence: 0
        Source to Destination: 0        Destination to Source 0
        In both Directions: 0
Packet Skipped: 0       Packet Unprocessed: 0
Packet Loss: 0
        Loss Periods Number: 0
        Loss Period Length Min/Max: 0/0
        Inter Loss Period Length Min/Max: 0/0
Number of successes: 4
Number of failures: 0
Operation time to live: Forever

To measure jitter, a source router sends a number of packets (10, by default) periodically. The time between these packets is called interval (20ms). The operation is repeated at a specified frequency (60 seconds by default).

The example below shows default configuration values – every 60 seconds the router will send 10 packets with 20 milliseconds interval between each packet.

X#show ip sla configuration 2
IP SLAs Infrastructure Engine-III
Entry number: 2
Owner: 
Tag: 
Operation timeout (milliseconds): 5000
Type of operation to perform: icmp-jitter
Target address/Source address: 10.0.3.1/10.0.0.1
Packet Interval (milliseconds)/Number of packets: 20/10
Type Of Service parameter: 0x0
Vrf Name: 
Schedule:
   Operation frequency (seconds): 60  (not considered if randomly scheduled)
<output is truncated>

UDP Jitter and IP SLA responder

UDP is used in voice and video communications. Using UDP traffic for measurements suits better to simulate such applications. In an IP SLA operation configuration codec type can be specified, and this will define packet size and enable various voice-specific metric calculations, such as Mean Opinion Score (MOS).

Figure 2. IP SLA Responder Configuration
Figure 2. IP SLA Responder Configuration

To use the UDP Jitter destination device must also be a Cisco device with an IP SLA responder feature enabled on it. The responder’s basic configuration is completed with the single command on router Z:

Z(config)#ip sla responder

Back on the router X, IP SLA configuration is done per the example below.

X(config)#ip sla 3
X(config-ip-sla)#udp-jitter 10.0.3.1 2345 source-ip 10.0.0.1
X(config)#ip sla schedule 3 start-time now life forever
X#show ip sla statistics 
IPSLAs Latest Operation Statistics

IPSLA operation id: 3
Type of operation: udp-jitter
        Latest RTT: 1 milliseconds
Latest operation start time: 05:53:56 UTC Sun Jan 31 2021
Latest operation return code: OK
RTT Values:
        Number Of RTT: 10               RTT Min/Avg/Max: 1/1/1 milliseconds
Latency one-way time:
        Number of Latency one-way Samples: 0
        Source to Destination Latency one way Min/Avg/Max: 0/0/0 milliseconds
        Destination to Source Latency one way Min/Avg/Max: 0/0/0 milliseconds
Jitter Time:
        Number of SD Jitter Samples: 9
        Number of DS Jitter Samples: 9
        Source to Destination Jitter Min/Avg/Max: 0/1/1 milliseconds
        Destination to Source Jitter Min/Avg/Max: 0/1/1 milliseconds
Over Threshold:
        Number Of RTT Over Threshold: 0 (0%)
Packet Loss Values:
        Loss Source to Destination: 0
        Source to Destination Loss Periods Number: 0
        Source to Destination Loss Period Length Min/Max: 0/0
        Source to Destination Inter Loss Period Length Min/Max: 0/0
        Loss Destination to Source: 0
        Destination to Source Loss Periods Number: 0
        Destination to Source Loss Period Length Min/Max: 0/0
        Destination to Source Inter Loss Period Length Min/Max: 0/0
        Out Of Sequence: 0      Tail Drop: 0
        Packet Late Arrival: 0  Packet Skipped: 0
Voice Score Values:
        Calculated Planning Impairment Factor (ICPIF): 0
        Mean Opinion Score (MOS): 0
Number of successes: 1
Number of failures: 0
Operation time to live: Forever

Path operations

When measuring latency between two hosts it is useful to know how much delay is contributed by each hop along the path. Path-type SLAs work by running traceroute first and then performing echo or jitter operation against each discovered hop.

X(config)#ip sla 4
X(config-ip-sla)#path-echo 10.0.3.1 source-ip 10.0.0.1
X(config)#ip sla schedule 4 start-time now life forever

The configuration is based on the same network topology with 2 paths available to the destination.

Figure 3. IP SLA Path Operations
Figure 3. IP SLA Path Operations

The “show ip sla statistics aggregated 4 details” command will display round-trip time statistics for each hop. For example, the command output below shows statistics for the first hop in the path going via router B. The remaining hops statistics are omitted in the example below.

X#show ip sla statistics aggregated 4 details 
IPSLAs aggregated statistics

Distribution Statistics:
Bucket Range: 0 to < 20 ms
Avg. Latency: 1 ms
Percent of Total Completions for this Range: 100 %
Number of Completions/Sum of Latency: 1/1 
Sum of RTT squared low 32 Bits/Sum of RTT squared high 32 Bits: 1/0
Operations completed over threshold: 0
Start Time Index: *09:54:47.234 UTC Tue Feb 2 2021
Path Index: 3
Hop in Path Index: 1
Type of operation: path-echo
Number of successes: 18
Number of failures: 0
Number of over thresholds: 0
Failed Operations due to Disconnect/TimeOut/Busy/No Connection: 0/0/0/0
Failed Operations due to Internal/Sequence/Verify Error: 0/0/0
Failed Operations due to Control enable/Stats retrieve Error: 0/0
Target Address 172.16.100.6
<output is truncated>

Other operations

IP SLA can perform application-specific monitoring, for example, it can run HTTP or FTP operations against a remote server. For generic TCP applications, TCP operation can be used, which validates if the TCP connection can be established and how long it takes to connect to the server.

The other available operation types are DHCP (checks how long it takes to obtain an IP address) and DNS queries.

Reactions and Proactive Monitoring

In the previous sections, we’ve reviewed different types of SLA operations and how to check their operations using CLI. By using SNMP-based monitoring software, administrators can poll SLA data from routers periodically, which is a more scalable approach to monitoring. Cisco provides SNMP MIB called RTTMON that provides access to IP SLAs from monitoring software.

It is also possible for a router to send an SNMP trap or Syslog message if a certain event happens. IP SLA includes a feature called proactive threshold monitoring. To enable it IP SLA reaction must be configured.

SLA reaction configuration command references an IP SLA operation and has a mandatory parameter specifying what monitored element of SLA we want to monitor. For example, this can be a timeout or threshold-based value. Available monitored elements vary based on the type of IP SLA operation. A compatibility table is available on this page. https://www.cisco.com/c/en/us/td/docs/ios-xml/ios/ipsla/configuration/xe-16-9/sla-xe-16-9-book/sla-threshold-mon.html

The example below uses a previously configured IP SLA operation of icmp-echo type.

ip sla 1
 icmp-echo 10.0.3.1 source-ip 10.0.0.1
ip sla schedule 1 life forever start-time now

The reaction configuration below references SLA operation 1 and selects timeout as a monitored element. Then we enable simple SNMP configuration to send a trap when operations times out.

X(config)#ip sla reaction-configuration 1 react timeout action-type trapOnly threshold-type immediate
X(config)#snmp-server enable traps ipsla
X(config)#snmp-server host 10.0.0.5 PUBLIC

To test it, I’ve turned off the interface on router Z and enabled debug of SNMP packets to confirm that SNMP traps were generated by the router.

X#debug snmp packets
*Feb  4 09:41:49.232: SNMP: Queuing packet to 10.0.0.5
*Feb  4 09:41:49.232: SNMP: V1 Trap, ent rttMonNotificationsPrefix, addr 10.0.0.1, gentrap 6, spectrap 2 
 rttMonCtrlAdminTag.1 =  
 rttMonHistoryCollectionAddress.1 = 0A 00  03 01    
 rttMonCtrlOperTimeoutOccurred.1 = 2
<output is truncated>

Enhanced Object Tracking

The final section of this document addresses how a router can change its data plane operation in response to IP SLA operational status.

The feature responsible for translating the status of IP SLA operations to the data plane protocols is called Enhanced Object Tracking. First-Hop Redundancy Protocols, such as HSRP and VRRP use the tracking objects to take over or release data forwarding roles. Static and policy-based routing can also use the tracking objects to re-route around degraded paths.

EEM (Embedded Event Manager) scripts can monitor tracking object’s status changes. This makes it possible to run powerful EEM scripting in response to track status change.

A Track object can have 2 states – Up and Down. IP SLA operations can have several return codes. For the purpose of object tracking, we are interested in the 3 return codes discussed below. All other codes that are not covered in this article translate into the Down state of the Track object.

Let’s re-use the IP SLA ICMP Echo operation from the previous example and check the code of the operation. The second command in the listing shows how to check the default threshold (5000ms):

X#show ip sla statistics 1 | incl code           
Latest operation return code: OK
X#show ip sla configuration 1 | include Threshold
Threshold (milliseconds): 5000

The operation’s return code is OK, which means that the ping operation is successful and its round-trip time is within the threshold (in our network it is 1ms). OK status always translates to the UP state of the corresponding track object.

If we will turn off the remote interface on router Z, the return code will change to Timeout, as per the listing below. This is the second SLA operation code, which always translates into the Down state of the Track object.

X#show ip sla statistics 1 | incl code
Latest operation return code: Timeout

The third code that is relevant for the purpose of track operation is OverThreshold. To demonstrate it, we increased the request data size to 5000 bytes, so the operation takes longer than 1ms. We also decreased the threshold to 1ms, so the operation goes over the threshold.

The OverThreshold code translates differently to the track object state, which we will discuss in the next section.

X(config)#ip sla 2
X(config-ip-sla)#icmp-echo 10.0.3.1 source-ip 10.0.0.1
X(config-ip-sla-echo)# request-data-size 5000
X(config-ip-sla-echo)# threshold 1
X(config)#ip sla schedule 2 start-time now life forever

X#show ip sla statistics 2               
IPSLAs Latest Operation Statistics

IPSLA operation id: 2
        Latest RTT: 2 milliseconds
Latest operation start time: 09:44:48 UTC Sat Feb 6 2021
Latest operation return code: Over threshold
Number of successes: 2
Number of failures: 0
Operation time to live: Forever

IP SLA Track State vs. Reachability

track <id> ip sla <sla-id>” command creates a track object that references SLA operation. There are 2 types of track objects for SLA – reachability, and state. Let’s configure both types against the previously configured SLA 2 to see the difference.

X(config)#track 1 ip sla 2 reachability
X(config)#track 2 ip sla 2 state

X#show track 
Track 1
  IP SLA 2 reachability
  Reachability is Up
    1 change, last change 00:00:29
  Latest operation return code: Over threshold
  Latest RTT (millisecs) 2
Track 2
  IP SLA 2 state
  State is Down
    1 change, last change 00:00:17
  Latest operation return code: Over threshold
  Latest RTT (millisecs) 2

Track 1 is based on reachability and it is UP, even when the underlying IP SLA object is over the threshold. As the name suggests, we only interested in the fact that we can reach the remote side of the connection. Track 2 is state-based and it returns DOWN for the same IP SLA object. Track of the state type checks both reachability and that the IP SLA object is under the specified threshold.

Static Routes

The next listing demonstrates how to use the track objects with static routes:

X(config)#ip route 0.0.0.0 0.0.0.0 172.16.100.2 track 1 200
X(config)#ip route 0.0.0.0 0.0.0.0 172.16.100.6 track 2

X#sh ip route 0.0.0.0
Routing entry for 0.0.0.0/0, supernet
  Known via "static", distance 200, metric 0, candidate default path
  Routing Descriptor Blocks:
  * 172.16.100.2
      Route metric is 0, traffic share count is 1

The route via 172.16.100.6 is the preferred, as it has the default administrative distance of 1. The route via 172.16.100.2 is a backup route with an administrative distance of 200. However, the route table shows that the router prefers the route via 172.16.100.2.

For a route to be installed into the routing table the corresponding track must be UP. Because track object #2 is Down, the static route via 172.16.100.6 is withdrawn and the backup route is chosen and installed into the routing table.

EEM Scripts with IP SLA

The final example of this article is an EEM (Embedded Event Manager) script that adjusts the router configuration, generates a Syslog message in response to the Track state change.

The script is triggered when Track object #1 goes down, with the “event track <object-id> state down” command.

The access-list is being adjusted by the script in this example only for demonstration purposes. It can be adjusted to suit real-life requirements. For example, the script can reconfigure VPN tunnel endpoints or enable a backup interface. EEM can perform many other non-CLI actions as well.

X(config)#event manager applet TRACK-1-DOWN 
X(config-applet)#event track 1 state down
X(config-applet)#action 1 cli command "enable"
X(config-applet)#action 2 cli command "conf t"
X(config-applet)#action 3 cli command "ip access-list extended INTERNET-IN"
X(config-applet)#action 4 cli command "permit ip host 1.2.3.4 any"         
X(config-applet)#action 5 syslog msg "TEST Message"

X#show run | section ^ip access-list
<none>

To test the script, I’ve turned off the remote interface on router Z causing IP SLA and track to go down. The router generates a test Syslog message and creates an access-list, as shown in the listing below. “terminal monitor” command displays console messages into VTY lines, such as SSH or Telnet.

X#terminal monitor
*Feb  6 10:13:50.285: %TRACK-6-STATE: 1 ip sla 2 reachability Up -> Down

*Feb  6 10:13:50.530: %HA_EM-6-LOG: TRACK-1-DOWN: TEST Message

X#sh run | section ^ip access-list
ip access-list extended INTERNET-IN
 permit ip host 1.2.3.4 any