Unreal-to-Real: Path Vector – Border Gateway Protocol (BGP)

Path Vector – Border Gateway Protocol (BGP)

The Autonomous System (AS)

Although with routing protocols such as OSPF and EIGRP we talk of ASs, these ASs mean purely routing domains that use different IGPs. When we talk of ASs in the more global sense, then we are talking about ASs that are under different administrations, where we do not have the control on policies that we would if we were managing a group of internal ASs.

IDLP is BGP as implemented by ISO. The Internet Assigned Numbers Authority (IANA) now gives out Autonomous System (AS) numbers which range from 1 to 65,535. Any AS numbers between 64512 and 65535 are for private use. RFC 1930 gives guidelines as to the use of AS numbers.

Nowadays, ASs may be Single-homed (Stub), Multi-Homed or Transit ASs. When you need to pass traffic through your network from one or more ASs or you need to manage the traffic coming in from a particular AS, BGP is often necessary. ISPs typically use BGP. BGP is good at prevent looping, along with better advertisement of tens of thousands of routes and for better administration of routing policies.

Really, a Single-homed AS need only have a default route advertised internally, and the ISP need only advertise the internal AS network if it is not already part of the ISP's address space. No routing protocol would be required. You can have a Multi-homed AS where one link to a particular ISP acts as backup to a higher bandwidth link to the same ISP. Again using default routes where the backup route has a higher administrative distance would be fine here.

Having a backup link is not very efficient, a better solution is to load share across two links and enable each link to back up the other. Using OSPF is a good way to do this since both default routes can be advertised into the AS with equal costs and as External Type 1s. Routers within the AS take into account the internal cost of a route that is an External Type 1 (E1) to the ASBRs. This results in internal routers using the nearest exit points, thereby load balancing traffic. These solutions so far do not require BGP, but if you wanted to have more control on routes that are to be advertised and to modify metrics associated with these routes, then BGP may be more suitable. An example is when you are multihoming to multiple ASs where you have to advertise routes through different ISPs that own different blocks of addresses. These ISPs are unlikely to want to coordinate with each other, let alone 'punch holes' in their address blocks or advertise small address spaces .

Ultimately, load-balancing when connecting to ISPs is not precise because you cannot control the quality of access of the ISPs and beyond. Instead, multihoming should primarily be used for resilience.

IGPs and EGPs

IGPs (Interior Gateway protocols) use metric interface costs (OSPF) or hop counts (RIP) to determine the best paths. Exterior Gateway Protocols (EGPs) link varying IGPs and use administered routing policies to determine best paths through service providers. Originally EGP was used with the old Internet topology which, due to its small size, was a simple two tier model with a core AS and the additional ASs around it. An AS was given a 16-bit number and every 3 minutes EGP advertised the routes that it knew with other EGP peers via a full class IP address (no subnets) and a metric from 1 to 255, with 255 being unreachable. EGP is considered obsolete except in large private networks.

The main problem with EGP is that it could not cope with a meshed network of ASs, EGP could not detect loops and had no way of creating policies for routing. EGP was merely a reachability protocol rather than a routing protocol.

The Internet has grown substantially and now has a very hierarchical structure which can be summarized thus:

• Subscribers

• Local ISPs - Tier III

• Regional Service Providers - Tier II

• Network Service Providers - Tier I

• Network Access Points (NAP) - these interconnect the Tier I providers.

NAPs use Unix route servers running BGP. They share the Routing Arbiter Database of BGP routes which is copied between route servers.

Border Gateway Protocol (BGP) was the replacement for EGP and is not strictly a routing protocol, it is often described as a distance vector protocol because it uses path vectors, it is more policy-based than RIP. BGP knows nothing of what goes on within an Autonomous System (AS) it is used to link ASs, guaranteeing a loop-free environment.

BGP-1 was first defined in RFC 1105 in 1989, then was updated with BGP-2 in 1990 with RFC 1163 only to updated again to BGP-3 in 1991 with RFC 1267. The current version of BGP is known as BGP-4 and was defined in RFC 1654, RFC 1655, RFC 1771 and RFC 1772. BGP-4 was different from earlier versions in one main respect and that was it became a classless routing protocol and thereby supported CIDR. For information on CIDR have a look at CIDR.

CIDR does have a number of limitations:

· Lack of portability - an ISP will give you a CIDR block that is part of a larger CIDR block owned by that ISP. If you want to change ISP then you are unlikely to be able to keep the CIDR block. Having to change the IP addresses of the end user devices is made easier by using DHCP and/or NAT with private addressing. If you are an ISP and you want to change your upstream provider e.g. a Regional ISP, then you not only have to deal with your own addressing problems but also those end users that depend on you for access.

· Lack of flexibility when connecting to multiple providers - in order for Internet traffic to reach your small CIDR from both ISPs, you must advertise that block to both ISPs. If the CIDR block is say /22, then the rest of the world can see this block through two different ISPs and will use the more specific route to this network. The original ISP that 'owns' this block has it included as part of a larger block and only this larger block is advertised out. It is possible for the more specific /22 route to be re-advertised back into the originating ISP. To stop this the ISP that owns the block must advertise the more specific /22 block as well as the larger block i.e. punch a hole in the block.

This is not desirable. Even if the ISP does agree to advertise the /22 block, most Tier 1 providers do not accept any blocks less than /19 (called a Globally Routable Address) in order to minimize the routes in their backbone. To help with the address dependency problem, it is possible to obtain Provider Independent Address Space which is portable and is not dependent on the ISP. This does not help however if the address space is smaller than /19 and is therefore not accepted by Tier 1 providers.

BGP-4 Overview

BGP is considered to be a 'Path Vector' routing protocol rather than a distance vector routing protocol since it utilises a list of AS numbers to describe the path that a packet should take. This list is called the AS_PATH. Loops are prevented because if a BGP speaking router sees it's own AS in the AS_PATH of a route it rejects the route. A router in a transit AS may have extremely large routing tables (up to 90,000 networks amounting to over 30Mb) and BGP-4 uses Classless InterDomain Routing (CIDR) to slow the growth of these tables. BGP, border gateway protocol, The router maintains routing tables for the IGP as well as the BGP and information can be exchanged between them.

There are two types of sessions between a router and its neighbors:

· External BGP (EBGP) sessions occur between routers in different ASs which are usually next to each other sharing the same media and subnet.

· Internal BGP (IBGP) sessions occur between routers within the same AS and these sessions are used to synchronise the routing policy within an AS. These routers do not have to be next to each other however they do need to be able to see each other so that a TCP connection can be made between them! You would configure these if you were needing to pass BGP information to other ASs.

BGP-4 uses TCP (port 179) for sending and receiving messages reliably between Peer routers. (BGP calls routers Speakers, and routers that run between each other are called Peers). The reliable connection means that only changes need to be sent between peers rather than complete tables. These updates can be triggered updates rather than periodic updates. Only keep alive messages are sent regularly.

The Routing Process Simplified

Until now, this section has examined discrete aspects of routing—specifically, peer negotiation and static versus dynamic routing. Before diving into details of routing configuration, it makes sense to pause here and briefly overview the BGP routing process in its entirety. BGP is a fairly simple protocol, which is why it's so flexible. Routes are exchanged between BGP peers via UPDATE messages. BGP routers receive the UPDATE messages, run some policies or filters over the updates, and then pass the routes to other BGP peers. An implementation is required to keep all BGP updates in a BGP routing table separate from the IP routing table. In case multiple routes to the same destination exist, BGP does not flood its peers with all those routes; rather, it picks the best route and sends it. In addition to passing along EBGP routes from peers or IBGP routes from route reflector clients, a BGP router can originate routing updates to advertise internal networks that belong to its own autonomous system. Valid local routes originated in the system and the best routes learned from BGP peers are then installed in the IP routing table. The IP routing table is the final routing decision and is used to populate the forwarding table.

To model the BGP process, imagine that each BGP speaker has different pools of routes and different policy engines applied to the routes (although in reality only one pool may exist). The model would involve the following components:

• A pool of routes that the router receives from its peers

• An Input Policy Engine that can filter the routes or manipulate their attributes

• A decision process that decides which routes the router itself will use

• A pool of routes that the router itself uses

• An Output Policy Engine that can filter the routes or manipulate their attributes

• A pool of routes that the router advertises to other peers

BGP Message Structure

The BGP message varies between 19 and 4096 octets in size and has the following structure:

On this newly formed TCP connection the following list describes the types of messages with the code numbers:

1. Open message (code 1) - containing the BGP version number, the originating AS, the Hold time and the BGP router ID. This ID is by default the highest IP address on the router (as in OSPF).

2. Keepalive (code 4) - this follows the acceptance of the open message as it is sent back and consists of just the 19 octet message header. The keepalives are sent every 60 seconds just to stop the hold down time from expiring.

3. Update message (code 2) - one of these is required for each path. This contains one or more Network Layer Reachability Information (NLRI) listed as IP (Length, Prefixes) tuples, any withdrawn IP (Length, Prefixes) tuples (CIDR supernets) and Path attributes associated with the NLRI.

4. Notification Message (code 3) - this is sent if there is an error and always closes the connection. The following list details the possible Error codes and Error Subcodes in the Notification Message:

1. Message Header Error for which the possible Error Subcodes are:

      ·         Connection not synchronized
        ·         Bad message length
      ·         Bad message type
      2. Open Message Error for which the possible Error Subcodes are:
       ·         Version number not supported
       ·         Bad peer AS
          ·         Bad BGP ID
       ·         Optional parameter not supported
       ·         Authentication failed
       ·         Hold time not accepted
3. Update Message Error for which the possible Error Subcodes are:
      ·         Attribute list corrupted
      ·         Well-known attribute unrecognised
      ·         Well-known attribute missing
      ·         Attribute flag error
      ·         Attribute length error
      ·         ORIGIN attribute incorrect
      ·         AS routing loop
      ·         NEXT_HOP attribute incorrect
      ·         Optional attribute error
      ·         Network field incorrect
      ·         AS_PATH incorrect
4. Hold Timer expired
5. Finite State Machine error
6. Cease

The BGP connection has 6 possible states:

Idle - the router waits for a Start Event from either a new BGP process being configured or being reset before it initialises a TCP connection and starts the Connect Retry timer which defaults to 60 seconds.

Connect - the router waits for the completion of the TCP connection. Once complete, the router resets the Connect Retry timer and sends an Open message to the neighbor.

Active - this the state of the router that is initiating a TCP connection i.e. sends a Start Event. Again the ConnectRetry timer is used.

OpenSent - The Open message has been sent and the router is waiting for an Open Message from its neighbor. The Keepalive is sent next and the Hold Time is negotiated down to whichever router has the lowest value.

OpenConfirm - the router is waiting for a Keepalive or Notification message.

Established - once a Keepalive or Update message is received the Hold time is started and the BGP peer connection has started.

Path Attributes

In the Update message, Path attributes are sent in triplets which consist of Attribute Type, Attribute Length and Attribute Value. The Attribute Type is a 2 octet field and has this structure:

• Optional - 0 is for Optional and 1 for Well-known.

• Transitive - 0 is for Transitive and 1 for Non-transitive.
• Partial - 0 is for when the Transitive attribute is partial and 1 is for when it is complete.
• Extended Length - 0 indicates that the attribute length is 1 octet and 1 indicates that it is 2 octets.
• Attribute Type - the codes for the attribute types are listed as follows:

1. ORIGIN

2. AS_PATH
3. NEXT_HOP
4. MULTI_EXIT_DISC
5. LOCAL_PREF
6. ATOMIC_AGGREGATE
7. AGGREGATOR
8. COMMUNITY
9. ORIGINATOR_ID
10. CLUSTER_LIST

We have already mentioned AS_PATH, however there are other Path Attributes that contribute to complex policy making being available in BGP-4. BGP router sends Path Attributes in the Update messages and these act as metrics. These Path Attributes apply to the destination networks, the BGP routes. These attributes fall into a number of categories:

Well-known mandatory attributes - must be included in updates propagated to all peers and includes AS_PATH, NEXT-HOP and ORIGIN.

Well-known Discretionary attributes - includes LOCAL_PREF and ATOMIC_AGGREGATE and are optional attributes to include in updates.

Optional Transitive Attributes - includes AGGREGATOR and COMMUNITY and should be accepted by BGP even if the attribute is not supported by that router, it should pass on the attribute.

Optional Non-transitive Attribute - Includes the MULTI_EXIT_DISC (MED, the ORIGINATOR_ID and CLUSTER_LIST. Non-transitive means that if the BGP router does not recognize the attribute it can ignore it and not pass it on.

A transitive attribute that is not implemented by a router can be passed on to another router and is called 'partial'. A non-transitive attribute has to be deleted by a router if it hasn't implemented it.

ORIGIN

The mandatory Origin Attribute specifies the origin of a routing update. It uses the letter i to indicate that the NLRI was learned from a protocol inside the originating AS i.e. an IGP (e.g. by using the network statement). The letter e indicates that the NLRI has been learned from EGP externally, and a ? means that the NLRI is incomplete and was probably redistributed into BGP for which the source is obviously unknown. IGP is preferred over EGP which in turn is preferred over 'unknown'.

The Attribute Value Codes for these are 0 for AS_SET, 1 for EGP and 2 for incomplete.

AS_PATH

The path to the network specified by the NLRI is shared in the form of Path Vectors that contain AS numbers which a route should take to the destination network. Both BGP-3 and BGP-4 carry AS numbers of the ASs that have been traversed using the mandatory AS_PATH attribute and a router will reject updates containing its own AS number so preventing loops. When a BGP speaker originates a route it adds it's own AS number to the AS_PATH attribute for the NLRI sent in an update to an EBGP peer.

Subsequent BGP routers prepend their own ASs so the AS_PATH attribute grows from the beginning i.e. the originating AS is at the end of the string. This type of AS_PATH is strictly called an AS_SEQUENCE.

Other types include AS_SET, AS_CONFED_SEQUENCE and AS_CONFED_SET. The Attribute Value Codes for these are 1 for AS_SET, 2 for AS_SEQUENCE, 3 for AS_CONFED_SET and 4 for AS_CONFED_SEQUENCE.

If there are multiples paths to a destination network, BGP prefers the route with the shortest AS_PATH. Manipulation of where packets will go can be carried out by a BGP router modifying the AS_PATH. For instance, there may be two paths to a particular network but the shortest AS_PATH takes packets down a slower link than the other path. The router can prepend any number of AS numbers to this AS_PATH to increase the length of the AS_PATH, thereby forcing packets down the longer but more favourable route.

It is considered wise to just use multiple instances of the local AS number to increase the length of the AS_PATH rather than use arbitrary AS numbers in case of loops forming.

ATOMIC_AGGREGATE

A BGP router may advertise routes that 'overlap' meaning for example, both routes 20.1.0.0/16 and 20.1.1.0/24 may be advertised where 20.1.0.0/16 is an aggregate which includes the route 20.1.1.0/24. Routers always prefer the route with the longest mask (more specific route). When aggregating routes path information is lost as mentioned before, because the router advertising the aggregate just includes it's own AS in the AS_PATH rather than include the path information for the original routes being aggregated.

The Atomic Aggregate attribute informs a BGP neighbour that the router has aggregated the IP networks. As the route is advertised further to other peers the ATOMIC_AGGREGATE has to remain attached to the route. Another optional attribute called the AGGREGATOR indicates the router ID (IP address) and the AS number of the router that performed the aggregation.

AS_SET

Where the AS_PATH atribute (AS_SEQUENCE) is an ordered list, the AS_SET is unordered. Why have an unordered list? Well, when aggregation is performed the AS_PATH information is lost and the aggregate is advertised as originating from the BGP peer that performs the aggregation. The problem with this is that loop detection cannot occur and the aggregate may be advertised back to an AS which already knows about one or more of the specific routes included in the aggregate. Maintaining an ordered AS_PATH history for each of the specific routes is not possible, but an unordered list is perfectly possible and this is where AS_SET is useful.

The aggregating router has the AS_SEQUENCE information and just includes this as a jumbled list of ASs in the AS_SET attribute so helping other peers to still prevent routes. A route that has the AS_SET attribute need not have the ATOMIC_AGGREGATE.

NEXT_HOP

Consider these scenarios:

For EBGP peers the next hop to an external destination network is the IP address of the EBGP peer that sent the update.

· For IBGP peers the next hop to an internal network is that of the IBGP neighbor that originated the route. If this neighbour is separated by an IGP (i.e. the IBGP peers do not share a common link) then a recursive lookup must be performed to find the route to this next hop.

· Alternatively, for IBGP peers the next hop to an external network is the EBGP peer that learned the route, rather than the router that sent the update, since the IBGP router may not be the best located to get to the other AS. On a multi-access network like Ethernet, this is fine, since there is a route to the router in the other AS. There are however, issues if the network is a Non-broadcast Multi-access (NBMA) network such as Frame Relay.

Take the following example:

R1 and R3 are EBGP peers. R3 advertises the network 192.168.20.0 with the NEXT_HOP attribute set to 10.1.1.2 which is part of AS65001 NOT AS65000! R2 and R1 have an IBGP peering, however although R1 can advertise the 192.168.20.0 network to R2, when R2 performs a recursive route lookup it will not find the route to 10.1.1.0 network since it is not in AS65000. The route 192.168.20.0 will be in the BGP table but NOT installed as a BGP route in the IGP routing table because it is unreachable. You could get around this by setting a static route, or run the IGP in passive (listening) mode or set the AS border router (R1) as the next hop using the next-hop-self switch. This next hop address is known to the IGP.

LOCAL_PREF

The Local Preference is relevant when there is more than one path to a network outside of the current AS for instance if your network is connected to more than one ISP. Each of the routers that link to outside the AS can set a preference value for routes advertised into the AS, and this value indicates the router's preference for these routes. Only IBGP routers share the local preference values it does not leave the AS. The higher the value the more preferable the route is so if there are multiple paths to this network the route with the highest Local Preference is chosen and all traffic destined for the network is sent this way.

WEIGHT

The Weight attribute :Router-originated routes have a weight of 32768 by default and other routes have a weight of zero where higher weight routes are preferred. Weight acts in the same way as Local Preference, the only difference is that it only applies to routes within the box and is not communicated to other peers. If two peers are advertising the same route to a particular peer, then that peer can assign a higher weight to routes learned from one of those peers, and these routes would be preferred.

MULTI_EXIT_DISC

The Multi-Exit-Discriminator (MED) (BGP-2 and BGP-3 called this the INTER_AS metric) is used between EBGP peers when there are multiple paths from one AS to another and it indicates to external neighbors which path is preferred into an AS. The reason may be that one link has a higher capacity than another link. As with metrics the lower the MED the more preferable the path is. The LOCAL_PREF influences traffic leaving an AS whilst the MED influences traffic entering an AS.

The MED attribute is only used between directly connected ASs so it is not passed onwards to other ASs, if MED attributes are required there then they would have to be set separately.

COMMUNITY

The Community attribute (or Tag) allows BGP communities to be set up and provides a way of grouping destinations according to common BGP attributes, filters and policies. The Community attribute is made up of four octets where the first two indicate the AS and the last two contain the community identifier.

INTERNET • - by default all routes belong to this community and are advertised.

NO_EXPORT - this has the value 0xffffff01 and routes with this attribute cannot be advertised to EBGP peers i.e. outside of their AS with the exception of internal ASs within a Confederation.

NO_ADVERTISE - this has the value 0xffffff02 and routes with this attribute cannot be advertised to either EBGP or IBGP peers.

LOCAL_AS - this has the value 0xffffff03 and acts in the same way as NO_EXPORT except that routes with this attribute cannot even be advertised between EBGP peers in private ASs within a Confederation

Use of the Community attribute is described in RFC 1998. Communities are used to apply policies to a collection of routes. If a router sets a community attribute to a particular value for a group of routes, then the neighbours can apply their filtering, redistribution and attribute change policies to a group of routes based on the Community attribute rather than one at a time. One route can be assign more than one Community attribute. An aggregate route inherits all the community attributes of all the routes that are being aggregated.

ORIGINATOR_ID

This attribute is 32 bits long and is used by a Route Reflector as a Reflector ID (RID) to ensure that no loops occur in an AS using Route Reflectors.

CLUSTER_LIST

This attribute lists the route reflector cluster IDs that the route has passed through so that if a route reflector sees it's own cluster ID it drops the route to stop loops.

Peer Groups

We have mentioned the Community attribute to ease administration of policies for groups of routes. You can also set up groups of peers rather than routes and apply policies to groups of routers rather than groups of routes. Routers then only need to consult the policy database once and send copies of updates to multiple routers rather than perform multiple lookups.

Decision Criteria in a Multi-homed Environment

In a Multi-homed environment where resilience means that there are multiple connections to ISPs BGP only chooses one route to a particular destination and uses the following priorities in order when making its decision:

1. Only look at synchronized routes with a valid next hop.

2. Prefer highest WEIGHT
3. Prefer highest LOCAL_PREF.
4. Prefer route originated by the local router.
5. Prefer shortest AS_PATH.
6. Prefer IGP origin code over EGP and furthermore over incomplete.
7. Prefer lowest MED.
8. Prefer EBGP path rather than IBGP path.
9. Prefer the path through the closest IGP neighbor.
10. Prefer the oldest EBGP path.
11. Prefer the path with the lowest BGP router ID.

An Example demonstrating AS_PATH, AS_SET and the NLRI

In BGP terminology, a route is made up of Network Layer Reachability Information (NLRI) and path attributes. In BGP-4 the NLRI consists of an IP address prefix and a prefix length which is the number of bits that make up the range of addresses (like a subnet mask, but NOT), so a class B address would have 16-bit prefix length. BGP-4 does not respect the traditional class distinctions, e.g. 10.1.0.0 with a 16-bit prefix length would be treated like a class B address instead of a class A address as most other protocols would. This is the implementation of CIDR. BGP-4 can aggregate many networks into a single advertisement similar to OSPF summaries and indicates this with the two Aggregate Attributes described earlier.

Provider A advertises a BGP NLRI of 200.0.0.0/8 to provider B so that B knows that A is taking care of hosts 200.0.0.0 to 200.255.255.255. 'A' administers AS10, AS20, AS30 and AS40 and these can use a default route to access the routing tables on any router in AS100. AS200 learns two paths to 200.20.0.0, one AS path being (AS10, AS100) and the other being (AS10, AS100, AS50), the administrator can assign weights to these ASs in order to develop a policy on which path to use.

One rule in BGP is that a router in a particular AS will ignore routes that contain its own AS number since they represent a loop. For instance, when AS200 advertises to AS100 the paths it learns to 200.10.0.0, i.e. the AS path (AS100, AS20), then BGP in AS100 ignores it as it can see the AS100 in the AS path.

Another rule in BGP is that you should only advertise paths that you use. For example, routers running BGP in AS50 have two paths to 200.20.0.0, one being (AS10, AS100) and one from a router in AS200 (based on its choice of path).

You can use address aggregation in BGP-4 so that, instead of just advertising a single path you can advertise an AS set e.g. (10,20,30,40,100), which means any one of these ASs. Although you can inject routes learned by BGP into ASs running an IGP (ideal for a stub AS), you need to create at least two peer BGP routers within an IGP AS (normally on the edge of the AS) in order to maintain AS_path information if the AS is a transit AS, this is because BGP attributes must be shared by all routers running BGP. Similarly, an external BGP session needs to be created to a router in a different AS.

Synchronization

IBGP is used to pass routes learned from one EBGP (edge router) to another EBGP. Because IBGP peers are in the same AS, the AS_PATH does not change so there is no loop protection via the use of the AS_PATH attribute. The AS_PATH is only prepended when a route is advertised to an EBGP peer. So an IBGP peer is not allowed to advertise routes that it learned about to another IBGP peer. An IBGP router can only advertise routes that it knows about (i.e. learned from an IGP) to its peers in other ASs i.e. EBGP peers. The default configuration of BGP on a circuit does not advertise any routes or allow any learned routes into the IGP routing table, these have to be manually entered as Network statements or be redistributed into the IGP. The trouble with redistributing BGP routes into the IGP is that you can flood many thousands of routes into the IGP if you are not careful. This can overload some routers and bring down a network.

If you had a hub spoke topology of IBGP peers, the spokes will not share learned routes with each other. If you wanted to operate BGP with full reach ability within an AS and prevent routing loops, you would need to configure a fully-meshed IBGP peering topology, if you were not redistributing BGP routes into the IGP.

Consider the following topology where the IBGP peers in AS 10 are partially meshed.

An IGP such as RIP or OSPF is used in AS 10 to provide TCP connectivity between R3 and R4. R3 and R4 are IBGP peers with each other, plus they have EBGP peerings with routers in AS 20 and AS 30 respectively. They therefore share routes learned from these different ASs with each other. The TCP connection for the IBGP peering however, is routed using the IGP through R1 and R2. If R4 learns of the route 172.16.10.0/24 from R6 in AS 30 and advertises this to R3. R3 will advertise this route to AS 20. Any packets in AS 20 destined for 172.16.10.0/24, will now be forwarded to R3. R3 knows that to get to 172.16.10.0/24 it needs to get these packets to R4, however it reaches R4 via R1 and R2 using the IGP so packets destined for 172.16.10.0/24 are sent to R1. R1 however, knows nothing of the network 172.16.10.0/24 and so drops the packet thereby creating a black hole.

If the IGP knows about the same routes that BGP does then this black hole would not occur. The process of making sure that both BGP and the IGP know the same routes is called Synchronisation. This rule states that a route must first be known via IGP before this same route learned from an IBGP peer is advertised to any BGP peers or entered into the IGP routing table as a BGP route. This 'BGP Synchronisation' prevents Black Holes and creates consistency of routing information throughout the AS. It cannot however, influence how the other AS will route traffic and it assumes that you are redistributing routes between the IGP and BGP.

In a fully meshed BGP network without an IGP, all routers can learn all the routes without any gaps, but if synchronisation is still operating as discussed earlier, no BGP learned routes will be entered into the routing table. In this fully meshed scenario, synchronisation can be turned off and routes learned by one peer will be allowed to be advertised and to be entered into the IGP routing table. The fully meshed BGP network is becoming the way forward for many ISPs. Take another example where there is no redistribution between the IGP and BGP:

R1 and R3 are IBGP peers and R1 knows how to get to the 10.0.0.0. R2, R4 and R5 however do not know about the 10.0.0.0 network, all they see are IP packets being routed through them hop by hop. If R2, R4 and R5 need to know about the 10.0.0.0 network then you need to redistribute between the BGP and the IGP and have synchronisation on so that routes are not advertised into BGP unless it exists in the IGP routing table. Or you can have IBGP fully meshed peers everywhere and turn off Synchronisation. The latter option is impractical in even moderately sized networks, which is where techniques such as Router Reflectors and Confederations are useful.

Route Reflectors

As discussed earlier, in order to prevent routing loops the normal BGP rule is that routes learned via IBGP are not propagated to other IBGP peers, sometimes called BGP Split Horizon. All BGP Speakers have to be fully meshed in TCP/IP. This can be very tedious to set up plus impossible for more than a few routers, so the idea of Route Reflectors was introduced whereby a BGP Speaker would sit at the hub of a hubspoke arrangement and become a Route Reflector for the spoke BGP Speakers. In contrast to the normal rules, this Route Reflector can propagate routes learned via IBGP to other IBGP peers and does not affect the paths that normal packets travel along. The Route Reflector 'reflects' routes learned via one IBGP peer in the cluster to the other IBGP peers in the cluster. The peers that are not Route Reflectors are sometimes called Route Reflector clients, other BGP routers in the AS that are not part of the cluster are called 'non-clients'.

The rules for when a reflector updates are as follows:

• An update from a client peer is sent to all client and non-client peers.

• An update from a non-client peer is sent to all client peers.
• An update from a EBGP peer is sent to all client and non-client peers.

Route Reflectors are defined in RFC 1966.

There can be multiple Router Reflectors in the same cluster and at different levels (nested). A Route Reflector forms a peering with the other non-route reflectors called Clients and forms a Cluster with them. The clients are not peers with each other. Other IBGP routers not in the Cluster are called Nonclients. The ORIGINATOR_ID attribute is used to identify the router-id of the route reflector this enables the router to determine whether the route has come from itself in the first place thereby preventing loops.

Another way of preventing loops is by use of cluster lists. If you want multiple route reflectors for resilience then a CLUSTER_LIST attribute can be configured so that the route reflectors recognise each other as being part of the same cluster. You can have multiple clusters as well and a Route Reflector can determine whether a loop exists by looking for its own cluster ID in the Cluster List of the advertisement. It is important that the route reflectors themselves are fully meshed.

Confederations

In very large BGP Autonomous Systems with fully-meshed IBGP peers you can divide up the routers into smaller private (member) ASs and form a confederation of private ASs that come together to produce a public AS as far as the external networks are concerned. Confederations are described in RFC 1965. Reserved AS numbers used for private ASs are 64512 through to 65535. These numbers should therefore be used for the internal AS numbers. EBGP peers in each private AS peer with each other and the whole confederation has a Confederation ID which is a legitimate AS number that external ASs see. The confederation structure itself is invisible to the external ASs. Other ASs connect using the Confederation AS number. Normally Next-hop, metric and Local Preference information is not passed between EBGP peers, however within a confederation this information is shared. In order to prevent loops from occurring the AS_PATH has two versions, one called the

AS_CONFED_SEQUENCE that is an ordered list of ASs that are internal to the confederation, and the other called the AS_CONFED_SET that is an unordered list of ASs.

Other RFCs on BGP include:

RFC 1403 - BGP interaction with OSPF.

RFC 1773 - Experience of BGP-4.
RFC 1774 - Protocol Analysis
RFC 1863 - BGP/IDRP Route Server
RFC 2042 - Registering new attributes
RFC 2283 - Multiprotocol Extensions for BGP-4
RFC 2385 - BGP and TCP MD5 signatures
RFC 2439 - BGP Route Flap Damping

Overlapping Protocols: Backdoors

With different IGPs and EGPs working together to achieve routing, routes can be learned via different protocols; choosing one protocol over another affects how the traffic flows. For example, if traffic follows a RIP route, it might traverse one link, whereas if it follows an external BGP route, it might end up on another link. Backdoor links offer an alternative IGP path that can be used instead of the external BGP path. IGP routes that can be reached over the backdoor link are called backdoor routes. With the existence of such alternative routes, a mechanism that gives one protocol preference over other protocols is needed. Some Systems offers a preference parameter called the administrative distance of a protocol. The lower a routing protocol's administrative distance, the higher the preference for the protocol. It should be noted that administrative distance is a parameter that is relative only to the locally configured router and is not known by or communicated to any other routers in the AS. Thus, if you intend to modify the administrative distance of one router in the AS, it is highly recommended that the administrative distance be changed similarly on all routers in the AS in order to guarantee a consistent routing decision.

Route Dampening

Another mechanism for controlling route instability is route dampening. A route that appears and disappears intermittently causes BGP UPDATE and WITHDRAWN messages to be repeatedly propagated on the Internet. The tremendous amount of routing traffic generated can use up all the link's bandwidth and drive up CPU utilization of routers.

Dampening categorizes routes as well either behaved or ill behaved. A well-behaved route shows a high degree of stability during an extended period of time. On the other hand, an ill behaved route experiences a high level of instability in a short period of time. Ill-behaved routes should be penalized in a way that is proportional to the route's expected future instability. An unstable route should be suppressed (not advertised) until there is some degree of confidence that the route has become stable.

A route's recent history is used as a basis for estimating future stability. To track a route history, it is essential to track the number of times the route has flapped over a period of time. Under route dampening, each time a route flaps, it is given a penalty. Whenever the penalty reaches a predefined threshold, the route is suppressed. The route can continue to accrue penalties even after it is suppressed. The more frequently a route oscillates in a short amount of time, the faster the route is suppressed. Similar criteria are put in place to unsuppress a route and start readvertising it. An algorithmis implemented to decay (reduce) the penalty value exponentially. The algorithm bases itsconfiguration on a user-defined set of parameters. The following set of terms and parameters applies to the implementation:

• Penalty—

An incremented numeric value that is assigned to a route each time it flaps.

• Half-life—

A configurable numeric value that describes the amount of time that must elapse to reduce the penalty by one-half.

• Suppress limit—

A numeric value that is compared with the penalty. If the penalty is greater than the suppress limit, the route is uppressed.

• Reuse limit—

A configurable numeric value that is compared with the penalty. If the penalty is less than the reuse limit, a suppressed route that is up will no longer be suppressed.

• Suppressed route—

A route that is not advertised, even if it is up. A route is suppressed if the penalty value is greater than the suppressed limit.

• History entry—

An entry used to store flap information. For the purposes of monitoring and calculating a route's oscillation level, it is important to store this information in the router when the route oscillates. When the route stabilizes, the history entry becomes useless and must be flushed from the router.

Route Dampening: ISP Environment

One possible side effect of route dampening is that the customer will experience some short

outages even if his routes become stable. In Figure below, route R2 in the customer network is

flapping. When the customer's ISP is running route dampening, R2 will be penalized and

suppressed according to its level of oscillation. R2 could be dampened for minutes. Even if R2 stops oscillating, the penalty it had accumulated still might be far above the reuse limit, and it has to be decayed before the route can be used. In the meantime, some poor soul on the customer's network is pulling out his or her hair trying to figure out why some subnets can't be reached from the outside world. If administrators are unaware that their routes are being dampened, they might try to remedy the situation by some other means, which makes their routes flap even more and become more penalized. The better approach is to ask the provider whether he is receiving the routes, and if he is, check why they are not being advertised. Providers have strict policies and might not change the dampening behavior per the customer's request. What the provider can do is "flush" the history information of the routes being dampened to advertise the route. This is, of course, under the condition that the customer will investigate the routing problems causing the routes to fluctuate.

On the other hand, instabilities can be caused by the providers themselves, and the effect can be much larger. If a link carrying full routes between a provider and customer or a provider and another provider oscillates, the border routers will feel the impact. Suppose that you are getting full Internet routes (currently about 75,000 routes) from multiple providers. Now imagine that 5 percent of these routes (about 3,750 routes) are toggling every 2 minutes. Your border router will be unable to handle this load. Without route dampening, it is difficult to determine what is really happening. All you know is that the process utilization on your border router is increasing rapidly. With route dampening, all the unstable routes generate a history entry that shows the routes' level of stability. After the unstable routes are identified, it is easy to determine where they are coming from by looking at the next-hop address. Although route dampening in this case did not help solve the problem, it helped identify who was causing the problem. After you identify the culprit, you can temporarily remove your BGP session with the ISP at fault. Pick up the telephone, call the ISP, and start complaining.

In conclusion, route instabilities in the Internet will affect everybody one way or the other. It is everyone's responsibility to minimize route oscillation by being more aware of the things they do and why they do them. Providers are becoming tougher on culprits; some providers apply harsher penalties to routes with longer masks, for example. This might sound like overkill, but it is getting harder to control the Internet. Having a "routing patrol" issue tickets whenever someone breaks the rules might become necessary.

AS Prepending

Enterprise networks primarily use BGP with their Internet Service Providers if they want to be multi-homed (connected to more than one ISP). A very common requirement in a multi-homed design is the primary/backup setup where the lower speed (or sometimes lower quality) link should only be used when the primary link fails.

Competent ISPs help their customers reach this goal by using BGP local preference within their network and giving the customers the ability to indicate the desired value of BGP local preference through BGP communities: if the route received directly from the customer has low local preference, all other routes are preferred, resulting in the desired traffic flow that avoids the backup link if at all possible as shown in the next diagram:

Sometimes you are forced to deal with less than ideal ISPs (or the two ISPs you’re using are so far apart in the Internet topology that the BGP local preference solution doesn’t work). In these cases, the only means of influencing BGP route selection in the Internet is the extension of the AS path attribute (routes with shorter AS paths are preferred) with multiple copies of your own AS number: AS-path prepending. AS-path prepending is configured in IOS with route-map based per-neighbor outbound filter. The actual prepending is specified within the route-map with the set as-path prepend command, as illustrated in the following sample configuration:

router bgp 65001
neighbor 10.1.0.2 remote-as 65200
neighbor 10.1.0.2 description Backup ISP
neighbor 10.1.0.2 route-map prepend out
!
route-map prepend permit 10
set as-path prepend 65001 65001 65001

Unreal-to- Real

Saturday, May 25, 2013

Path Vector – Border Gateway Protocol (BGP)

No comments:

Post a Comment