Introduction

Multicast routing differs from unicast routing in several ways. The most important differences are in the ways that multicast routers use source and destination addresses. A multicast packet is addressed to a special IP address representing a group of devices that can be scattered anywhere throughout a network. Since the destinations can be anywhere, the only reliable way to eliminate loops in multicast routing is to look at the reverse path back to the source. So, while unicast routing cares about where the packet is going, multicast routing also needs to know where it came from.

For this reason, multicast routing protocols such as Protocol Independent Multicast (PIM) always work with the source address and destination group simultaneously. The usual notation for a multicast route is (Source, Group), as opposed to the unicast case, in which routes are defined by the destination address alone. We have already mentioned that this is necessary for avoiding loops, but the router also needs to keep track of both source and group addresses in each multicast routing table entry because there could be several sources for the same group.

For example, in Chapter 14 we discussed how a central device can use NTP to send time synchronization information as multicast traffic. We also explained why it was important to have more than one NTP server. So, even in a simple multicast example like this, it is quite likely that the routers will need to forward packets to the same set of end devices from two sources that may be on different network segments. The group address alone doesn't tell you enough about how to forward packets belonging to this group.

When you look at the multicast routing table with the show ip mroute command, you will see not only (Source, Group) pairs such as (192.168.15.35, 239.5.5.55), but also pairs that look like (*, 239.5.5.55). This means that the source is unspecified. Cisco routers organize their multicast routing tables with a parent (*, Group) for each group, and any number of (Source, Group) pairs under it. If there is a (*, Group), but no (Source, Group) entries for a group, then that just means that the router knows of group members but doesn't yet know where to expect this multicast traffic from.
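To make the parent and child relationship concrete, here is a minimal sketch of such a table in Python. It is only an illustration of the organization described above, not a model of how IOS stores its routes; the class and method names are our own inventions.

```python
class MrouteTable:
    """Toy multicast routing table: one (*, G) parent per group,
    with any number of (S, G) children under it."""

    def __init__(self):
        # group -> set of known sources; an empty set means that
        # only the (*, G) parent exists for this group
        self._groups = {}

    def add_member(self, group):
        """A receiver joined: make sure the (*, G) parent exists."""
        self._groups.setdefault(group, set())

    def add_source(self, source, group):
        """Traffic from a source was seen: add (S, G) under the parent."""
        self._groups.setdefault(group, set()).add(source)

    def entries(self):
        """Return entries with each parent first, then its children."""
        out = []
        for group in sorted(self._groups):
            out.append(("*", group))
            out.extend((s, group) for s in sorted(self._groups[group]))
        return out
```

A table that holds only ("*", "239.5.5.55") corresponds to the case where the router knows of group members but has not yet seen a source.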

Each of these (Source, Group) entries represents a Shortest Path Tree (SPT) that leads to the source of the multicast traffic. In sparse mode multicast routing, the root of the tree could actually be a central Rendezvous Point (RP) router, rather than the actual traffic source. Because each router must know about the path back to the source or RP, the term Reverse Path Forwarding (RPF) is often used to describe the process of building the SPT.
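The reverse path check itself can be sketched in a few lines of Python. This is a simplified illustration that assumes the unicast routing table is a plain dictionary of prefixes and interface names; the function names and the sample interface names are hypothetical.

```python
import ipaddress

def rpf_interface(unicast_table, source):
    """Longest-prefix match on the source address.

    unicast_table maps prefix strings to interface names.
    Returns the interface that leads back to the source, or None.
    """
    addr = ipaddress.ip_address(source)
    best = None
    for prefix, iface in unicast_table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best[1] if best else None

def rpf_check(unicast_table, source, arrival_iface):
    """Accept a multicast packet only if it arrived on the interface
    that the router would itself use to reach the source."""
    expected = rpf_interface(unicast_table, source)
    return expected is not None and expected == arrival_iface
```

For example, with a table of {"192.168.15.0/24": "Serial0", "0.0.0.0/0": "Ethernet0"}, a packet from 192.168.15.35 passes the check on Serial0 and fails it on Ethernet0, which is how a looped copy of the packet gets discarded.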

Two important elements are required for a multicast network to work. The first we've already mentioned: you need a way to route multicast packets from the source to all of the various destinations in the group. The other critical element is that the multicast network has to provide a way for end devices to subscribe to a multicast group so that they can receive the data. The network uses the Internet Group Management Protocol (IGMP) to manage group subscriptions.

IGMP and CGMP

IGMP functions mainly at Layer 3. Individual end devices use IGMP to announce that they wish to join a particular multicast group. The IGMP request is picked up by a router that attempts to fulfill the request by forwarding the multicast data stream to the network containing this device. The IGMP protocol is in its second version, which is defined in RFC 2236. A third version is currently in the draft stages.

What IGMP does is relatively simple in concept. It provides a method for end devices to join and leave multicast groups. Here is the output of tcpdump showing the device 192.168.1.104 joining the group 239.5.5.55:

17:10:16.397055 192.168.1.104 > 239.5.5.55: igmp nreport 239.5.5.55 (DF) [ttl 1]
17:10:19.276998 192.168.1.104 > 239.5.5.55: igmp nreport 239.5.5.55 (DF) [ttl 1]
17:10:21.027002 192.168.1.104 > 239.5.5.55: igmp nreport 239.5.5.55 (DF) [ttl 1]

Note that the device sends three IGMP packets stating its membership to make sure that it is heard. The router receives the request to join this group and sets a timer to count down for three minutes. As long as some device reasserts its membership with IGMP within this period, the group will remain in the router's multicast routing table. If all of the group members leave, or if they all simply stop sending IGMP updates for more than three minutes, the router will remove this group from its tables to save memory.
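The timer behavior can be sketched as follows. This is a simplified model that assumes a single three-minute timer per group and takes the current time as an argument instead of reading a clock; the class and its method names are hypothetical.

```python
MEMBERSHIP_TIMEOUT = 180  # seconds; the three-minute period described above

class GroupMembership:
    """Tracks which groups still have members on a segment."""

    def __init__(self, timeout=MEMBERSHIP_TIMEOUT):
        self.timeout = timeout
        self._expires = {}  # group -> time at which the entry expires

    def report(self, group, now):
        """An IGMP report was heard: start or restart the countdown."""
        self._expires[group] = now + self.timeout

    def expire(self, now):
        """Remove and return the groups whose timers have run out."""
        dead = [g for g, t in self._expires.items() if t <= now]
        for g in dead:
            del self._expires[g]
        return dead

    def groups(self):
        """Return the groups that currently have members."""
        return sorted(self._expires)
```

A report at time 0 followed by another at time 100 keeps the group alive until time 280, because each report restarts the countdown.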

When the device wants to stop receiving a multicast group, it sends a single IGMP leave packet. The router immediately reacts by sending a query to this segment to find out if there are still any other members left in this group. It tries twice before deciding to stop sending traffic for this group to this network segment:

17:16:17.934667 192.168.1.104 > ALL-ROUTERS.MCAST.NET: igmp leave 239.5.5.55 (DF) [ttl 1]
17:16:17.937715 192.168.1.1 > 239.5.5.55: igmp query [gaddr 239.5.5.55] [tos 0xc0] [ttl 1]
17:16:19.050430 192.168.1.1 > 239.5.5.55: igmp query [gaddr 239.5.5.55] [tos 0xc0] [ttl 1]
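The messages in these traces are small. As a rough sketch, the following Python builds and parses the 8-byte IGMP Version 2 message body, using the layout and type codes from RFC 2236 (type, maximum response time, checksum, group address). The function names are our own, and the code covers only the IGMP payload, not the IP header that carries it.

```python
import socket
import struct

# Message type codes from RFC 2236
IGMP_QUERY = 0x11
IGMP_V2_REPORT = 0x16
IGMP_LEAVE = 0x17

def internet_checksum(data):
    """16-bit one's complement of the one's complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_igmp(msg_type, group, max_resp=0):
    """Return the 8-byte IGMPv2 message for the given type and group."""
    addr = socket.inet_aton(group)
    # Compute the checksum over the message with the checksum field zeroed
    unsummed = struct.pack("!BBH4s", msg_type, max_resp, 0, addr)
    checksum = internet_checksum(unsummed)
    return struct.pack("!BBH4s", msg_type, max_resp, checksum, addr)

def parse_igmp(packet):
    """Decode an IGMPv2 message and verify its checksum."""
    msg_type, max_resp, _checksum, addr = struct.unpack("!BBH4s", packet[:8])
    return {
        "type": msg_type,
        "max_resp": max_resp,
        "group": socket.inet_ntoa(addr),
        # A message with a correct checksum sums to zero
        "valid": internet_checksum(packet[:8]) == 0,
    }
```

Calling build_igmp(IGMP_V2_REPORT, "239.5.5.55") produces the kind of report shown in the first trace, and build_igmp(IGMP_LEAVE, "239.5.5.55") produces the Leave message that the device sends to the all-routers address.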

The important changes to the protocol between Versions 1 and 2 of IGMP have to do with determining when all of the members of a group in a particular network have left. The most important addition to Version 3 is the ability to specify and filter multicast sources. So a device may specify that it is interested in receiving multicast messages from one source, but not from another—even though both sources may be sending to the same group.

Although it is not yet fully standard and few end devices support IGMP Version 3, Cisco has already adopted these extensions. You lose nothing by implementing them now because the IGMP protocol is fully backward compatible.

In a switched Ethernet LAN (including 100Mbps, 1000Mbps and higher speed variants), there is an additional benefit to multicast transmission. If the switches are multicast aware, they can forward packets with a particular group address to only those devices that are members of this group. So it is not necessary to flood the entire VLAN with multicast packets just because one device is a multicast group member. Naturally, this means that the switch must be able to read and use Layer 3 information, so this sort of functionality is not available on all Ethernet switches.

Many multicast-aware switches use IGMP snooping to read IGMP packets from devices as they join and leave particular groups. This sounds like a perfect and simple solution, but, in practice, it can be very complex to implement in the switch. The first problem is that there are several special cases that are difficult to manage. For example, things become quite complex when you have several multicast routers on a segment, or when there are complicated trunk topologies or connections to workgroup hubs. Another important problem with IGMP snooping is that the switch must read the contents of all multicast packets passing through it so that it won't miss any IGMP Join or Leave messages. In effect, the switch acts as if it were a member of every multicast group. If there is a heavy multicast application such as a multi-media application, this can cause serious CPU overhead on the switch.

Cisco has developed a proprietary protocol called Cisco Group Management Protocol (CGMP) to deal with these problems. CGMP is implemented on all Cisco routers and most new switches, even those without Layer 3 capabilities. It is a relatively simple protocol that allows the router to do most of the hard work for the switch. When a device on the LAN segment joins a multicast group by sending an IGMP Join message, the switch simply passes the IGMP packet through to the router as it would with any other packet. The router then sends a CGMP packet to the switch to let it know the MAC addresses of the device and the group. Similarly, when a device leaves a group, the router uses CGMP to tell the switch to stop forwarding this particular multicast group to this device. In this way the router, which has to keep track of this information anyway, can simply tell the switch what to do.
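The group MAC address mentioned here is derived from the group IP address by a fixed rule: the low-order 23 bits of the IP address are copied into the Ethernet prefix 01:00:5e. The following Python sketch shows the mapping; the function name is hypothetical.

```python
import ipaddress

def group_mac(group):
    """Map an IPv4 multicast group address to its Ethernet MAC address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError("%s is not a multicast address" % group)
    # Keep only the low-order 23 bits of the group address
    low23 = int(addr) & 0x7FFFFF
    mac = 0x01005E000000 | low23
    # Format the 48-bit value as six colon-separated hex bytes
    return ":".join("%02x" % ((mac >> shift) & 0xFF) for shift in range(40, -1, -8))
```

Because only 23 of the 28 significant bits survive, 32 different groups share each MAC address. For example, 239.5.5.55 and 224.133.5.55 both map to 01:00:5e:05:05:37, so a switch that filters on the MAC address alone cannot tell them apart.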

Unfortunately, CGMP doesn't solve all of the problems inherent in the IGMP model. Specifically, a device doesn't need to send an IGMP Leave message when it is no longer interested in receiving packets for that group. If the last group member leaves without sending the appropriate IGMP Leave message, the router will still think that there are devices in the group. It will continue to forward multicast packets to the segment until a timer expires. The router will eventually poll the LAN segment to see if any devices are still interested in receiving this group. If it gets no response, it will finally stop sending the multicast data stream. However, most implementations of IGMP Version 2 do send explicit Leave messages unless the end devices crash or terminate improperly. In any case it is usually better to have a device receive multicast data it didn't subscribe to than to lose the data. The only time when this isn't true is when the multicast data stream consumes too much bandwidth and starts to cause congestion for normal unicast traffic, or when processing the unnecessary multicast traffic causes CPU problems on the end devices.

Switches running newer versions of CGMP include a particularly nice feature called Local Leave Processing. With it, the switches are able to intercept IGMP Leave messages from devices and process them internally. If there are other group members elsewhere on the switch, it can simply stop sending data from this group to the device that no longer wishes to be a member. Then, when the last group member leaves the group, the switch will send a global IGMP Leave packet to the router to tell it to stop sending this multicast group.
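The logic behind this feature can be sketched as follows. This is an illustration of the behavior just described and not Cisco's implementation; the class name, the method names, and the port labels are hypothetical.

```python
class LocalLeaveSwitch:
    """Toy model of a switch that processes IGMP Leave messages locally."""

    def __init__(self):
        self.ports = {}  # group -> set of ports with members

    def join(self, group, port):
        """Record that a device on this port is a group member."""
        self.ports.setdefault(group, set()).add(port)

    def leave(self, group, port):
        """Handle a Leave from one port.

        Returns True if the Leave must be passed on to the router
        because no members remain on the switch, and False if the
        switch can handle it by itself.
        """
        members = self.ports.get(group, set())
        members.discard(port)
        if members:
            return False  # other ports still want this group
        self.ports.pop(group, None)
        return True
```

With members on two ports, the first Leave is absorbed by the switch and only the second one reaches the router.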

Multicast Routing Protocols

There are two general types of multicast routing protocols, called dense and sparse mode. Dense mode means that every multicast router receives every multicast packet unless and until it explicitly says that it doesn't want it. As we will discuss shortly, this applies to each group and each interface separately. Sparse mode, on the other hand, means (loosely) that no router will receive a multicast group unless it explicitly requests it. It is important to note that end devices, whether multicast servers or group members, are completely unaware of which mode their network uses, or even which multicast routing protocol. Indeed, it is possible to run a network where the routers use a combination of these modes.

There are many examples of dense-mode protocols, such as Protocol Independent Multicast-Dense Mode (PIM-DM), Distance Vector Multicast Routing Protocol (DVMRP), and Multicast Open Shortest Path First (MOSPF). There are fewer sparse-mode protocols, with the best examples being Protocol Independent Multicast-Sparse Mode (PIM-SM) and Core Based Trees (CBT).

Not all of these protocols are available in Cisco routers. Like most vendors, Cisco implements PIM-DM and PIM-SM, as well as MBGP. But Cisco does not implement MOSPF or CBT, and has a limited version of DVMRP.

There are two other general categories of multicast routing protocols: protocol dependent and protocol independent. The difference has to do with the interaction with an underlying routing protocol, not with the ability to handle non-IP multicast traffic. All of the multicast protocols mentioned in this book are specific to IP multicast communications.

For example, MOSPF is protocol dependent because it relies on OSPF, and uses a special OSPF LSA type to carry information about multicast routing. PIM and CBT, on the other hand, both use the multicast traffic itself, along with the standard unicast IP routing table and IGMP requests to build the multicast forwarding trees. Since they don't care how the router got its unicast IP routing table, they are called protocol independent.

For the network engineer these distinctions are quite important, since they affect flexibility, reliability, and network performance. In general, if you have a large network (particularly one with bandwidth-constrained WAN links) and the multicast sources and destinations can be more or less anywhere through your network, you should use a protocol independent sparse mode multicast routing protocol. Even if you're not sure that this really describes your network, it is generally safer and easier to lean in this direction.

The PIM protocols, in particular PIM-SM, are generally the best choices for implementing new multicast networks. In the past there were problems with interoperability in multivendor networks as different router manufacturers implemented different sets of multicast routing protocols. Since DVMRP was the first widely implemented multicast routing protocol, the rule of thumb used to be that DVMRP was the best way to allow communication between groups of routers from different vendors. However, a quick survey of protocols supported by major router vendors shows that almost all of them now support PIM and DVMRP.

PIM-DM and PIM-SM

PIM-DM and PIM-SM have several important similarities, as well as important differences. Let's look schematically at how each builds and maintains its multicast forwarding trees to explain how they work. Note that this is not intended to be a rigorous explanation of the protocols. Instead, we just want to give you a good basic understanding of what they do and how they do it. For more detailed information, refer to the standards documents, particularly RFCs 2362 and 2715.

Suppose a device wants to join a group, G. The first thing it does is to send an IGMP Join message to its local router. If this is the first group member, the router creates an entry in its multicast forwarding table for (*,G). This says to forward to this interface all multicast packets addressed to group G from any source. At this point, if the router receives any packets for this group, it knows at least one place to which to forward them.

In PIM-DM, the router will create the group and wait for packets. It will also send a Join request to each of its PIM neighbors to find out if they have this group. If it receives multicast packets for a group that it doesn't care about, then the router will send Prune messages back to where they came from, to ask to be removed from the forwarding tree for this group. This is commonly called a "flood and prune" model, which is common to all dense mode multicast protocols.
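The end result of flood and prune can be sketched with a small recursive function. This is a schematic model that assumes the flooded traffic follows a simple tree of routers and ignores timers and message formats; the function name and the router names are hypothetical.

```python
def flood_and_prune(children, root, members):
    """Return the set of routers that remain on the tree after pruning.

    children maps each router to the list of routers downstream of it.
    members is the set of routers with locally attached group members.
    """
    keep = set()

    def visit(router):
        # Flooding reaches every router. A router stays on the tree only
        # if it has local members or at least one unpruned downstream
        # neighbor; otherwise it sends a Prune upstream.
        needed = router in members
        for child in children.get(router, []):
            if visit(child):
                needed = True
        if needed:
            keep.add(router)
        return needed

    visit(root)
    return keep
```

With a topology of {"A": ["B", "C"], "B": ["D"], "C": ["E"]} and a single member behind router D, routers C and E prune themselves and the tree shrinks to A, B, and D.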

If this router uses PIM-SM, however, it will attempt to join a multicast tree rooted at the RP. An RP is a router somewhere in the network that acts as a central distribution point for one or more multicast groups. We will discuss how the other routers come to know about the RP in Recipe 23.2 and Recipe 23.3, but for now we'll assume that they know how to find it. When the last-hop router receives an IGMP message from a device asking to join a group, it has to go looking for that group. The best place to start looking is the RP.

So, the last-hop router looks at its unicast routing table to figure out which of its neighboring routers is the best path to the RP, and it sends it an explicit PIM-SM Join message for this group. If the neighboring router is already receiving this group, then the problem is solved and the data starts to flow. Otherwise, this neighbor must send another Join to the next-hop router in the direction of the RP, and so on until a multicast-forwarding tree is created with its root at the RP.
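This hop-by-hop process can be sketched as follows. The sketch assumes that each router's unicast next hop toward the RP is already known and is given as a dictionary in which the RP maps to None; the function name and the router names are hypothetical.

```python
def join_toward_rp(next_hop_to_rp, on_tree, last_hop):
    """Extend the shared tree from last_hop toward the RP.

    next_hop_to_rp maps each router to its best neighbor toward the RP.
    on_tree is the set of routers already receiving the group; it is
    updated in place. Returns the routers that had to send a Join.
    """
    senders = []
    router = last_hop
    # Stop as soon as we reach a router that already receives the group
    while router is not None and router not in on_tree:
        on_tree.add(router)
        upstream = next_hop_to_rp[router]
        if upstream is not None:
            senders.append(router)  # this router sends a Join upstream
        router = upstream
    return senders
```

The first member behind R4 causes Joins from R4, R3, and R2 in turn. A later member behind R5, whose next hop is R3, needs only a single Join because R3 is already on the tree.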

Each upstream router will automatically prune a branch of this multicast tree if it doesn't receive another explicit Join for that branch within the three-minute timeout period. So, by default, the routers all refresh the tree with a new Join for every active group once per minute. This creates and maintains a stable tree rooted at the RP and extending to all group members in the network. The tree remains active even if there is no multicast traffic being forwarded.

The only remaining piece of the puzzle is how the packets get from the sender to the RP. When the source device sends its first packet, the first-hop router receives it normally as it would any other packet. This first-hop router has already learned where the RP is. When it receives a multicast packet from a new source, the router must register this source with the RP. The router encapsulates the multicast packet in a PIM-SM registration packet, which it sends by unicast to the RP. The RP then removes the encapsulation and forwards the packet down the tree. The RP also sends an explicit PIM-SM Join message toward the source. The Join message links up a tree from the RP upstream to the source and downstream to the pre-existing tree containing the group members. Once the tree is built, there is no need for the first-hop router to continue encapsulating multicast packets to send them to the RP. So the first-hop router can revert to normal multicast forwarding instead, knowing that the RP is somewhere downstream on the SPT.

Finally, once there is a tree connecting the ultimate source with all of the group members, there is no more need for the RP. So the last-hop routers start to send PIM-SM Join messages to create a new tree that is centered on the source rather than the RP. This is actually controlled by a minimum traffic flow threshold. PIM-SM starts to build the new tree rooted at the source only if the amount of traffic coming down the tree for this group exceeds this threshold. This threshold traffic flow rate is zero by default on Cisco routers, but we will show how to adjust it in Recipe 23.4.
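The threshold decision is simple enough to state in one line of Python. This sketch follows the wording above, in which the traffic rate must exceed the threshold. The function name is hypothetical, and using an infinite threshold to mean "never switch" is an assumption made for this illustration.

```python
import math

def should_join_source_tree(observed_kbps, threshold_kbps=0.0):
    """Decide whether a last-hop router should build a tree rooted at
    the source instead of staying on the tree rooted at the RP."""
    return observed_kbps > threshold_kbps
```

With the default threshold of zero, any traffic at all triggers the switch. Passing math.inf as the threshold keeps the router on the RP tree regardless of the traffic rate.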

For LAN segments that have more than one multicast router, PIM also includes the concept of a Designated Router (DR), which preferentially handles multicast forwarding for this segment. The DR ensures that each multicast application packet appears on the LAN segment once and only once. This is similar to the OSPF Designated Router concept that we discussed in Chapter 8, but PIM's election process is much simpler. The PIM DR is just the router whose interface on this segment has the highest IP address. Whenever there are two or more multicast routers on the same segment, they learn about one another through PIM, and periodically send special multicast "Hello" packets to one another. This ensures that, if the current DR becomes unavailable, the next candidate can take over this function. Since the backup device resides on the same LAN segment, it is able to monitor IGMP, so it always knows which groups it will need to forward if it needs to become the DR.
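The election rule can be sketched in Python as follows. This is a minimal illustration of the highest-address rule described above and ignores the Hello exchange itself; the function name is hypothetical.

```python
import ipaddress

def elect_dr(interface_addresses):
    """Return the address of the PIM Designated Router on a segment.

    interface_addresses lists the addresses of the routers' interfaces
    on this segment. Addresses are compared numerically, not as text.
    """
    return str(max(ipaddress.IPv4Address(a) for a in interface_addresses))
```

Note that the comparison must be numeric. Among 192.168.1.1, 192.168.1.2, and 192.168.1.10, the DR is 192.168.1.10, although a plain string comparison would pick 192.168.1.2. If the DR disappears, running the election again over the remaining addresses yields the next candidate.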

DVMRP

Distance Vector Multicast Routing Protocol (DVMRP) is defined in RFC 1075, and was the first widely implemented multicast routing protocol. This protocol is similar to RIP in many ways. There are a few important differences, though. RIP treats a metric of 16 as unreachable, which limits the diameter of a RIP network to 15 hops, as we mentioned in Chapter 6. DVMRP raises this limit to a metric of 32, which drastically improves its flexibility. It's not hard to find a network with a diameter greater than 15 hops, but the diameter that DVMRP allows is sufficient for most real-world corporate networks. It is not sufficient for the public Internet, but that is why multiprotocol extensions to the Border Gateway Protocol, sometimes also called Multicast Border Gateway Protocol (MBGP), were invented.

DVMRP is often a good choice for allowing routers from different manufacturers to exchange multicast routing information. It is a dense mode protocol, however, so it is generally less efficient with network resources. We recommend using DVMRP primarily as a mechanism for exchanging multicast routing information with older non-Cisco devices. In recent years, PIM has become the popular choice for multicast routing among most large router vendors, though, so DVMRP's niche is now mostly in interconnecting with existing non-Cisco multicast networks.

In many ways, DVMRP functions in a similar way to PIM-DM. It uses a dense-mode strategy that forces all routers to prune themselves from any multicast trees that they don't require. It also uses a unicast routing table to determine the shortest path back to the source device. The main difference, however, is that DVMRP builds this table with its own internal unicast routing protocol, which it uses to help make decisions about the best SPT.

DVMRP uses an algorithm called Truncated Reverse Path Broadcasting (TRPB) to allow every router in the network to determine where it is relative to the multicast source, and to calculate the optimal SPT back to the source. Because DVMRP uses its own internal unicast routing protocol, it is not considered protocol independent.

You must take special measures to force DVMRP to follow the standard unicast routing table and make it protocol independent. Of course, this would break one of the main reasons for using DVMRP in the first place. Because it maintains its own routing tables, DVMRP is able to work in networks where the multicast and unicast topologies are different. This is not uncommon in cases where parts of the unicast network don't support multicast routing, or where traffic engineering leads you to put multicast traffic through different network links.

In fact, Cisco routers do not provide a full DVMRP implementation. They can take part in discovering and exchanging routing information with DVMRP neighbors. But the actual multicast routing is done using PIM while referring to the DVMRP routing tables.

MOSPF

Multicast Open Shortest Path First (MOSPF) is not really a separate protocol, but rather is a set of extensions to the popular Open Shortest Path First (OSPF) unicast routing protocol. OSPF is described in more detail in Chapter 8. To allow OSPF to carry multicast routing information, RFC 1584 added a new Link State Advertisement (LSA) type called Type 6, or simply the MOSPF LSA.

Cisco routers do not support MOSPF, so we will not discuss this protocol in any detail except to point out that Cisco routers will generate log error messages whenever they encounter Type 6 OSPF LSAs. Recipe 23.7 shows how to configure the router to ignore these packets.

The biggest advantage to MOSPF is that it is tightly integrated with OSPF, which can simplify network administration. Furthermore, because it uses the same Link State algorithm as OSPF, every router in the network can independently deduce the best path back to the source.

However, MOSPF is a dense-mode protocol, which makes it less efficient with network resources, and it works only in networks that run OSPF. This is almost certainly why Cisco has chosen not to implement it.

MBGP

Multicast Border Gateway Protocol (MBGP) is based on a small set of extensions to BGP defined in RFC 2858 to allow the exchange of any routable protocol information between ASes. It does this by simply introducing two new attributes to BGP: Multiprotocol Reachable NLRI (MP_REACH_NLRI) and Multiprotocol Unreachable NLRI (MP_UNREACH_NLRI), where NLRI stands for Network Layer Reachability Information. These attributes carry information about reachable and unreachable networks.

It's important to understand that MBGP is not really a multicast routing protocol in the same sense as PIM or DVMRP. It doesn't understand or have the ability to Join or Prune SPTs. It doesn't include any functionality for dealing with Rendezvous Points. All it does is forward information about multicast groups and sources, and make this information available to other multicast routing protocols. It needs another protocol to do all of the other work of joining and pruning multicast distribution trees. The two protocols most commonly used for this are PIM and DVMRP.
