Previous section   Next section

Introduction

Border Gateway Protocol (BGP) Version 4 is the lifeblood of the Internet. It is responsible for exchanging routing information between all of the major Internet Service Providers (ISPs), as well between larger client sites and their respective ISPs. And, in some large enterprise networks, BGP is used to interconnect different geographical or administrative regions.

Primarily to support the complexity of the public Internet, Cisco has added several clever and useful features to its BGP implementation. This book is focused on solutions to real-world problems, so we will not try to describe all of these features. And it would take a whole book to describe how to operate BGP in a large ISP network, so we will avoid discussing extremely large-scale BGP problems. Instead, we will look at two main classes of BGP problems: connecting a network to the public Internet, and interconnecting two or more Interior Gateway Protocols (IGP) in a private network.

A detailed discussion of the BGP protocol and its features is out of the scope of this book. For this type of information, we recommend referring instead to IP Routing by Ravi Malhotra (O'Reilly), or BGP by Iljitsch van Beijnum (O'Reilly). However, we will include a brief review of the most critical concepts.

Basic Terminology

BGP is an Exterior Gateway Protocol (EGP), which means that it exchanges routing information between Autonomous Systems (ASes). This is different from purely IGPs, such as RIP, EIGRP, and OSPF, which we discussed in Chapter 6, Chapter 7 and Chapter 8, respectively. It also uses a different basic algorithm for building a loop-free topology than any of those protocols. RIP is a distance vector protocol, OSPF is a link state protocol, and EIGRP is a distance vector protocol that incorporates many of the advantages of a link state protocol. BGP, on the other hand, uses a path vector algorithm. This means that instead of reducing each route's relative importance in the routing table to a single metric or cost value, BGP keeps a list of every AS that the path passes through. It uses this list to eliminate loops, because a router can check whether a route has already passed through a particular AS by simply looking at the path.

RFC 1930 describes what the Internet Engineering Task Force (IETF), which is the official Internet standards organization, considers to be the Best Current Practices (BCPs) for creating and numbering ASes. This document defines an AS as "a connected group of one or more IP prefixes run by one or more network operators which has a single and clearly defined routing policy." In practical terms, what appears on the Internet as a single AS may in fact represent an ISP as well as all their customer networks that aren't using BGP to advertise themselves as unique administrative domains.

A consistent routing policy in this context means that if a device on the edge of the AS advertises that it can handle routing for a particular set of prefixes, then all of the routers in the same AS can handle the same prefixes. It doesn't matter if some of these prefixes refer to internal routes and others refer to external routes. What matters is that the routers inside the AS must agree with one another on how to handle each route, and which internal or external router is the best place to send traffic for this particular network. This agreement is what it means for the AS to behave consistently.

It is important to note that this definition doesn't mean that there has to be one and only one IGP inside of an AS. In fact, there could be many IGPs, and there could even be no IGP. The interior routing inside of the AS could be handled entirely by a combination of BGP and static routes, for example.

BGP routers talk to one another over a permanent TCP connection on port 179. When BGP operates between two routers that are in the same AS, it is called Interior Border Gateway Protocol (iBGP). When the peers are in different ASes, they use external Border Gateway Protocol (eBGP). Unless you are using one of the more complex features that were invented specifically to avoid it, all of the BGP routers in an AS must peer with one another in a complete mesh. This ensures that the AS behaves consistently when advertising routes to other ASes.

Synchronization is a concept that comes up frequently in BGP configurations. Because the AS needs to behave consistently, if you run an IGP and iBGP, they have to agree. Think of a network where the iBGP peers are several hops apart, and the intervening network uses an IGP to communicate between them. Synchronization requires that, for a BGP route to be useable, the IGP must also contain a route to the same prefix. This ensures that one of these BGP peer routers doesn't try to forward a packet to the other internal BGP peer unless the network connecting them knows what to do with this packet.

Cisco routers allow you to disable synchronization, which is actually necessary in any case where you don't redistribute the IGP routes into BGP. Make sure that your network design doesn't require the IGP to have access to the BGP routes in order to communicate between the iBGP peers if you disable synchronization.

Every discussion of BGP includes frequent references to IP prefixes. A prefix is a Classless Inter-Domain Routing (CIDR) block of addresses. We previously discussed CIDR in Chapter 5. CIDR is a set of rules for IP subnetting that allows you to summarize groups of IP addresses. For example, you might have four network segments that use the IP addresses 172.25.4.0/24, 172.25.5.0/24, 172.25.6.0/24 and 172.25.7.0/24. Each of these network addresses is a prefix. If, for example, you wanted to send a packet to the device 172.25.5.5/32, your router only needs to know how to route packets for 172.25.5.0/24. This route prefix includes the specific host address.

But you can go one step further than this. If the paths to all of these IP networks pass through the same router, it is often useful to summarize or aggregate the prefixes. The router that leads to all of these networks might simply advertise a single prefix, 172.25.4.0/22, that covers all of the individual networks.

Similarly, CIDR allows you to create supernets that summarize several classful networks. For example, you could summarize 172.24.0.0/16 through 172.31.0.0/16 as 172.24.0.0/13.

BGP requires that every AS must have a 16-bit Autonomous System Number (ASN). Because it is 16 bits long, the ASN can have any value between and 65535.

The ASN is a globally unique identification number. BGP uses these ASNs to eliminate loops. Suppose two networks are using the same ASN. A router in the first AS will send out its routes normally, but the BGP router for the second network will drop these routes because they already appear to have passed through this AS. So it is important to ensure that you follow the rules on ASN selection, which are described in RFC 1930.

RFC 1930 originally divided up the range from 1 through 22,527 among the three major international Internet registry organizations (RIPE, ARIN, and APNIC) to allocate to networks connected to the public Internet. Since publication of that RFC, however, the IANA has distributed further blocks of numbers, currently extending up to almost ASN 30,000.

Just as RFC 1918 defines private unregistered ranges of IP addresses for networks that don't connect directly to the public Internet, RFC 1930 defines a series of private unregistered ASN values. You can use these private ASNs freely as long as they don't leak onto the public Internet. And, just as you can use NAT to hide your private IP addresses when you connect to the Internet, you can also hide private ASNs, as long as the AS that connects directly to the public Internet has a registered ASN and registered IP addresses.

All ASN values between 64,512 and 65,534 are designated for private use. This gives 1,023 ASN values that you can freely use in your internal network without registering, and without fear of conflict. If you use these private ASNs in an enterprise network, you must ensure that each private ASN is unique throughout the network. Enterprise networks that are large enough to require multiple ASs are generally managed by several different groups, so it is critical to coordinate the use of these private ASNs. If there is ever a conflict, with two ASes using the same ASN, it will disrupt routing to both of the conflicting ASes. And, if either of the conflicting ASes is used for transit, it could disrupt routing throughout the entire enterprise network, causing routing loops and unreachable networks. Although, of course, each individual AS will continue to function normally internally.

There are many situations where you can use unregistered ASNs. In fact, the only time a registered ASN is required is when you need to use BGP to exchange routing information with an ISP. Note that if you only have a single link to a single ISP, then you really don't require BGP at all. If you have only a single connection to the Internet, then you can get by with a single default route to the Internet because everything passes through this one link. If the link does go, there's nothing you can do anyway. So, in this case, running BGP is overkill. A small router with a default route is more than adequate.

You should consult your ISP to discuss your options. They might also be willing to let you use BGP and a private ASN, which they will remove when passing your routes to the rest of the world. They may even be willing to let you run a simpler routing protocol (such as RIPv2) to provide redundancy among two or more links that all use their network. In any case, your ISP will probably not pass your routes directly to the Internet anyway. It is more likely, and preferable, that they will allocate addresses to your network from a range that they can summarize. Then the ISP will just pass a single routing entry to the rest of the Internet to represent many customer networks.

You can also do this kind of AS path filtering internally. If you have several internal ASs, only one of which connects to the public Internet, then you can register the one directly connected ASN, and simply filter the private ASNs out of any path information that you pass to your ISPs. We show an example of this kind of filtering in Recipe 9.9.

Another special ASN value that bears mentioning is 65,535, which the IANA reserves for future requirements. RFC 1930, on the other hand, says that this ASN is part of the range that is freely available for unregistered use. We recommend avoiding this number, because the IANA is the ultimate authority. Although there is currently no conflict with this number, the IANA may decide to give it some special significance later, which could break existing private networks that might use it.

Throughout this chapter we will use private ASNs and private IP addresses in examples that are intended to represent the public Internet. This is purely for demonstration purposes. You must never allow these private values to reach the public Internet.

BGP Attributes

BGP associates several different basic attributes with each route prefix. These attributes include useful pieces of information about the route, where it came from, and how to reach it. Well-known attributes must be supported by every BGP implementation. Some well-known attributes are mandatory. All of the mandatory attributes must be included with every route entry. A BGP router will generate an error message if it receives a route that is missing one or more well-known mandatory attributes.

There are also well-known discretionary attributes, which every BGP router must recognize and support, but they don't have to be present with every route entry. Whenever a router passes along a route that it has learned via BGP to another BGP peer, it must include all of the well-known attributes that came with this route, including any discretionary attributes. Of course, the router may need to update some of these attributes before passing them along, to include itself in the path, for example.

BGP routes can also include one or more optional attributes. These are not necessarily supported by all BGP implementations. Optional attributes can be either transitive or non-transitive, which is specified by a special flag in the attribute type field. If a router receives a route with a transitive optional attribute, it will pass this information along intact to other BGP routers, even if it doesn't understand the option. The router will mark the Partial bit in the attribute flags to indicate that it was unable to handle this attribute, however.

The router will quietly drop any unrecognized non-transitive optional attributes from the route information without taking any action.

We will now describe several of the most common BGP attributes.

ORIGIN (well-known, mandatory)

This attribute can have one of three different values, reflecting how the BGP router that was responsible for originating the route first learned of it. The possible values are:

0 - IGP: The route came from an IGP interior to the originating AS.

1 - EGP: The route came from an EGP other than BGP.

2 - Incomplete: Any other method.

AS_PATH (well-known, mandatory)

The AS_PATH is a list of ASNs, showing the path taken to reach the destination network. There are actually two types of AS_PATHs. An AS_SEQUENCE describes the literal path taken to reach the destination, while an AS_SET is an unordered list of ASNs along the path. Each time a BGP router passes a route update to an eBGP peer, it updates the AS_PATH variable to include its own ASN.

NEXT_HOP (well-known, mandatory)

This attribute carries the IP address of the first BGP router along the path to the destination network. When the router installs the route for the associated prefix in its routing table, it will use this attribute for the next-hop router. This is where the router will forward its packets for this destination network.

By default, the NEXT_HOP router will be the router that announced this route to the AS. For routes learned from an external AS via eBGP, the NEXT_HOP router will be the first router in the neighboring AS. This information is passed intact throughout the AS using iBGP, so all routers in the AS see the same NEXT_HOP router.

MULTI_EXIT_DISC (optional, non-transitive)

The Multiple Exit Discriminatory (MED) option is also often called the BGP Metric. Because this 32-bit value is non-transitive, it is only propagated to adjacent ASes. Routers can use the MED to help differentiate between two or more equivalent paths between these ASes.

LOCAL_PREF (well-known, mandatory)

BGP only distributes Local Preference information with routes inside of an AS. Routers can use this number to allow the network to favor a particular exit point to reach a destination network. This information is not included with eBGP route updates.

ATOMIC_AGGREGATE (well-known, discretionary)

When a BGP router aggregates several route prefixes to simplify the routing tables that it passes to its peers, it usually sets the ATOMIC_AGGREGATE attribute to indicate that some information has been lost. It doesn't set this attribute, however, in cases where it uses an AS_SET in its AS_PATH to show the ASNs of all of the different prefixes being summarized.

AGGREGATOR (optional, transitive)

The AGGREGATOR attribute indicates that a router has summarized a range of prefixes. The router doing the route aggregation can include this attribute, which will include its own ASN and IP address or router ID.

Both the AGGREGATOR and the ATOMIC_AGGREGATE attributes have become relatively uncommon since the universal conversion to BGP Version 4.

COMMUNITY (optional, transitive)

A COMMUNITY is a logical grouping of networks. This attribute is defined in RFC 1997, and RFC 1998 describes a useful application of the concept to ISP networks.

MP_REACH_NLRI (optional, non-transitive)

This attribute carries information about reachable multiprotocol destinations and next-hop routers. Multiprotocol in this context could refer to any foreign protocol such as IPv6, although it is most commonly used with IP multicast, as we discuss in Chapter 23. MultiProtocol Label Switching (MPLS) also uses MBGP for per-VPN routing tables.

Carrying foreign routing information this way ensures backward compatibility. Routers that don't support the extension can easily interoperate with routers that do.

MP_UNREACH_NLRI (optional, non-transitive)

The MP_UNREACH_NLRI attribute is similar to the MP_REACH_NLRI, except that it carries information about unreachable multiprotocol destinations.

BGP has several other optional attributes as well, although we will not discuss them in this book. For more information, we suggest referring to Internet Routing Architectures (Cisco Press).

Route Selection

Unlike the various interior routing protocols that we discussed in the preceding chapters, BGP doesn't support multipath routing by default. So, if there are two or more paths to a destination, BGP will go to great extremes to ensure that only one of them is actually used.

BGP decides which route to use by applying a series of tests in order. It is important to understand these tests and the order that the router looks at them, particularly when you are trying to influence which routes are used. Otherwise you might end up wasting a lot of time trying to adjust your routing tables using one method, while the router is making the actual decision at some earlier step, without ever seeing your adjustments.

Note that at each step, there may be several routes to the same destination prefix that all meet the requirement, or are equal after a particular test. In that case, BGP will proceed to the next test to attempt to break the tie.

We should point out that these are the route selection rules on Cisco routers. Several of these rules are not part of the BGP specification. So, for non-Cisco equipment, you should consult the vendor's BGP documentation to see what the differences are.

  1. The first test is whether the next-hop router is accessible. By default, routers do not update the next-hop attribute when exchanging routes by iBGP. So it is possible to receive a route whose next-hop router is actually several hops away, and perhaps unreachable. BGP will not pass these routes to the main routing table, but it will keep them in its own route database.

  2. If synchronization is enabled, the router will ignore any iBGP routes that are not synchronized.

  3. The third test uses the Cisco proprietary weight parameter, selecting the route with the largest weight. This parameter is not part of the routing protocol. Adjusting the weight of a particular route on a router will only affect route selection on this router. It is a purely local concept. The default weight value is zero, except for locally sourced routes, which get a default weight of 32,768. The maximum possible weight is 65,535.

  4. If the weights are the same, BGP then selects the route with the highest Local Preference value from the LOCAL_PREF attribute. Routers only include this attribute when communicating within an AS (iBGP). For external routes, the router that receives a particular route via eBGP sets the Local Preference value. For internal routes, it is set by the router that introduced the route into BGP. This allows you to force every router in your AS to preferentially send traffic for a particular destination through a particular eBGP link.

  5. If two or more routes to the same destination network are still equal after comparing Local Preference values, the router moves on to look at the AS_PATH. This is the path vector that gives BGP its essential character. It is a set of AS numbers that describes the path to the destination network.

    BGP routers prefer routes that originate inside their own ASes.

    For routes that originate outside of the AS, BGP will prefer the one with the shortest path (i.e., the one with the fewest ASNs). This is a simple indication of the most direct path.

  6. BGP then looks at the ORIGIN attribute if the AS path lengths are the same, and selects IGP routes in preference to EGP, and EGP in preference to INCOMPLETE routes. An INCOMPLETE route is one that is injected into BGP via redistribution, so BGP isn't able to vouch for its validity.

  7. The next test looks at the MED, and selects the route with the lowest value. The MED is used only if both routes are received from the same AS, or if the command bgp always-compare-med has been enabled. With this command enabled, BGP will compare MED values even if they come from different ASes, although to reach this step the AS_PATHs must have the same length. Note that if you use this command at all, you should use it throughout the AS or you risk creating routing loops. MED values are only propagated to adjacent ASes, so routers that are further downstream don't see them at all.

  8. BGP prefers eBGP to iBGP paths. This helps to eliminate loops by ensuring that the route selected is the one that leads out of the AS most directly. Note that the iBGP routes don't include internal routes that are sourced from within your AS, because they are selected at step number 5. So this test only looks at routes to external destinations.

  9. The next test compares the IGP costs of the paths to the next-hop routers, and selects the closest one. This helps to ensure that faster links and shorter paths are used where possible.

  10. Next, BGP will look at the ages of the routes and use the oldest route to a particular destination. This is an indication of stability. If two routes are otherwise equivalent, it is best to use the one that appears to be the most stable.

  11. Finally, if the routes are still equivalent, BGP resorts to the router IDs of the next-hop routers to break any ties, selecting the next-hop router with the lowest router ID. Since router IDs are unique, this is guaranteed to eliminate any remaining duplicate route problems.

Note that there are subtle variations to these rules for special situations such as AS confederations, and many individual rules can be disabled if you want the router to skip them.

Cisco has also implemented a BGP multipath option that changes this route selection process somewhat. If you enable multiple path support, BGP will still perform the first 7 tests, evaluating everything up to and including the MED values. But, if two or more routes are still equivalent at this point, the router will install some or all of them (depending on how you implement this feature). Please refer to Recipe 9.8 for a discussion of this option.


  Previous section   Next section
Top