Being in the hosting business is an unusual experience. At home or at a regular office, there's usually little choice in internet connectivity. In the United States, most consumers can choose between cable and ADSL (if that), and cable is an order of magnitude faster, which makes ADSL not much of a choice at all. Businesses can bring fiber into their office, but this gets very expensive very fast. In a carrier-neutral datacenter, on the other hand, there's usually at least a handful of wholesale ISPs available, vying for the business of hosting companies or other organizations hosting significant content in the datacenter. (We'll call them "hosters" for our purposes.)

Obviously, one strategy is to talk to the salespeople of each available ISP, and then sign up with the one that offers the best deal. However, that still leaves a hoster in a very bad position if something goes wrong in the chosen ISP's network. So many hosters multihome by connecting to two or more ISPs. This requires getting an Autonomous System (AS) number and IP addresses that are (at least somewhat) independent from any given ISP, as well as routers that can run BGP. The great thing about multihoming is that if one ISP goes down, BGP detects this within a few seconds to minutes and reroutes traffic over the connection with another ISP. You can also add new connections within days without having to reconfigure servers with new IP addresses. In addition, ISPs sometimes de-peer because of peering disputes. For instance, in 2008, Cogent customers couldn't reach Telia customers and vice versa for some time. Multihoming also protects against this by rerouting traffic for the affected destinations through another ISP.

However, having more options also makes life more complex. Suppose a hoster has a gigabit worth of traffic and three ISPs. How are they going to balance this traffic over the ISPs they're connected to? This question comes up at two levels: the business level and the technical (BGP) level.

On the business level, many ISPs offer attractive rates if you commit to a certain amount of traffic. For instance, the regular price per megabit per second might be $2 per month, while the committed price is $1. (This is usually charged based on 95th-percentile usage: traffic is measured every five minutes and the top 5% of measurements are thrown out.) So our example hoster could commit to 400 Mbps with each of three ISPs and pay $1200, rather than an aggregate of $2000 for 1000 Mbps of uncommitted traffic. But now it's important that traffic is reasonably balanced over the three ISPs. Suppose BGP decides that ISP A is much better than ISPs B and C, so all traffic flows over ISP A. That means 600 Mbps of extra traffic on top of the committed 400 Mbps, so the hoster now has to pay ISP A $1200 in overage charges on top of its $400 commitment while paying ISPs B and C $400 each without getting any use out of those connections. That adds up to $2400, more than the $2000 they'd have paid without any commitments!
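
The arithmetic in this example can be checked with a short Python sketch. It assumes a committed rate of $1 per Mbps and an overage rate of $2 per Mbps, which are the rates implied by the $1200 and $2400 totals; the function names are made up for illustration, and real ISP contracts may compute the 95th percentile or the overage differently.

```python
def percentile_95(samples_mbps):
    """Return the billable rate: drop the top 5% of samples, keep the highest remaining one."""
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    # ceil(0.95 * n) - 1, computed with integers to avoid float rounding
    index = (95 * len(ordered) + 99) // 100 - 1
    return ordered[index]


def monthly_cost(billable_mbps, commit_mbps, commit_rate=1.0, overage_rate=2.0):
    """Committed traffic is always paid for; anything above it is billed at the overage rate."""
    overage = max(0, billable_mbps - commit_mbps)
    return commit_mbps * commit_rate + overage * overage_rate


# Traffic spread roughly evenly over three 400 Mbps commitments
balanced = sum(monthly_cost(mbps, 400) for mbps in (334, 333, 333))
# All traffic on ISP A, nothing on ISPs B and C
skewed = sum(monthly_cost(mbps, 400) for mbps in (1000, 0, 0))

print(balanced, skewed)  # 1200.0 2400.0
```

With a month of five-minute samples, `percentile_95` would give the billable rate per ISP, and `monthly_cost` turns that rate into a bill.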

So we need BGP to align its decision-making with bandwidth commitments. Unfortunately, BGP isn't set up to work that way. What BGP does is determine the "best" path for each of the roughly 600,000 prefixes that are present in the BGP table (570,000 IPv4 and 25,000 IPv6). BGP has an extensive algorithm to determine which of the available paths to a certain destination is the best one. But left to its own devices, it pretty much comes down to the length of the AS path: the path that traverses the smallest number of ASes is the "best" one. In practice, different ISPs often reach the same destination over the same number of intermediate ASes, and then BGP has to use its tie-breaking rules, which are pretty arbitrary. In other situations, a path that's slightly longer is actually better, but BGP has no way of knowing that. Hence our use of "best" in quotation marks.
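
To make the "shortest AS path, then an arbitrary tie-breaker" behavior concrete, here is a deliberately simplified Python sketch. It is not the full BGP decision process, which also looks at attributes such as local preference, origin and MED before it reaches the final tie-breakers. The `Route` class, the `router_id` field and the AS numbers are all hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    isp: str
    as_path: tuple      # AS numbers from our upstream to the origin
    router_id: int      # hypothetical stand-in for the final tie-breaker


def best_route(routes):
    """Pick the shortest AS path; break ties on the lowest router_id."""
    return min(routes, key=lambda r: (len(r.as_path), r.router_id))


# Three paths to the same prefix; the AS numbers are made up for illustration.
candidates = [
    Route("ISP A", (64496, 64510, 64511), router_id=1),
    Route("ISP B", (64497, 64511), router_id=2),
    Route("ISP C", (64498, 64511), router_id=3),
]

winner = best_route(candidates)
print(winner.isp)  # ISP B: ties with ISP C on length, wins on router_id
```

Nothing in this selection looks at latency, packet loss, available bandwidth or cost, which is exactly the limitation described above.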

[Figure: AS numbers]

BGP makes sure that traffic towards AS60 is sent through ISP A because the de-peering between AS10 and AS20 makes it impossible for the traffic to get there through ISP B. To BGP, the paths ISP A – AS20 and ISP B – AS20 look the same. This is also true for the paths ISP B – AS40 and ISP C – AS40. However, a route optimizer such as Noction's Intelligent Routing Platform (IRP) will detect that the path ISP A – AS20 has packet loss and that the path ISP C – AS40 has more bandwidth available. To BGP, the path ISP C – AS50 looks better (shorter) than the path ISP B – AS20 – AS50, even though the connection between ISP C and AS50 is very long and thus incurs higher latency. The IRP will detect this as well and prefer the path over ISP B.

Network administrators can go in and manage traffic engineering manually to make sure that the amounts of traffic sent over different connections conform to the commitments that are in place and/or that the best path is used to reach important destinations.

However, this is a slow and labor-intensive process, one that automated route optimizers like the Noction Intelligent Routing Platform can take off the network administrator's hands so that they can focus on other things. Noction IRP probes different paths towards destinations to determine the best path based on actual performance metrics rather than solely on the limited information available to BGP. At the same time, the IRP takes commitments into consideration: it optimizes the flow of outgoing traffic so that the best path is used to reach each destination while staying within committed traffic levels whenever possible. (Of course, it is possible for total traffic to exceed total commitments.) Noction states that its Commit Control algorithm is now also being deployed successfully on incoming traffic, whereas route optimizers are usually known for working with outgoing traffic only.
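
The general idea of combining measured performance with commitments can be sketched in a few lines of Python. This is not Noction's algorithm, which is not described here; it is a simple greedy illustration under stated assumptions. The function name, the metric tuples (packet loss in percent, latency in milliseconds), the traffic figures and the prefixes (taken from the documentation address ranges) are all hypothetical.

```python
def assign_paths(destinations, commits_mbps):
    """Greedy sketch: send each destination over the best-measured ISP
    that still has committed bandwidth left.

    destinations: list of (prefix, mbps, {isp: (loss_pct, latency_ms)})
    commits_mbps: {isp: committed Mbps}
    """
    remaining = dict(commits_mbps)
    assignment = {}
    # Place the largest flows first while there is still headroom
    for prefix, mbps, metrics in sorted(destinations, key=lambda d: -d[1]):
        # Rank ISPs by lowest loss, then lowest latency
        ranked = sorted(metrics, key=lambda isp: metrics[isp])
        # Fall back to the best performer if no ISP has room left
        choice = next((isp for isp in ranked if remaining[isp] >= mbps), ranked[0])
        remaining[choice] -= mbps
        assignment[prefix] = choice
    return assignment


commits = {"ISP A": 400, "ISP B": 400, "ISP C": 400}
destinations = [
    ("192.0.2.0/24", 300, {"ISP A": (0.0, 20), "ISP B": (0.0, 30), "ISP C": (0.0, 40)}),
    ("198.51.100.0/24", 300, {"ISP A": (0.0, 25), "ISP B": (0.0, 35), "ISP C": (1.0, 20)}),
    ("203.0.113.0/24", 200, {"ISP A": (2.0, 15), "ISP B": (0.0, 50), "ISP C": (0.0, 45)}),
]

assignment = assign_paths(destinations, commits)
print(assignment)
```

In this made-up example, ISP A measures best for the first two destinations, but after the first one it no longer has 300 Mbps of commitment left, so the second destination goes to ISP B. The third goes to ISP C because ISP A shows packet loss on that path.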

May 29, 2020

Content hosting: fast, reliable, cheap – how to pick all three.