This is sort of an Internet architecture question for those in the know. Assuming there's no issue with client reachability or latency, what's stopping CloudFlare from having a single IP?
Suppose that IP was behind a fat enough pipe: why not load balance behind it, instead of DNS load-balancing in front of it (and additionally behind each IP, as I presume happens now)? Also, if that IP was anycast, then you could ignore the issue of client latency as well, assuming you have the necessary private network behind the endpoints to manage state.
If you don't like, or can't solve, the problem at the level of IP anycast, why not leverage a third-party anycast DNS and just have a few fixed IPs for specific geographic locales, again with fat enough pipes and load balancing behind them?
I guess what I'm saying is that there's no reason for an organization, a monolithic entity, to have more than a handful of IP addresses at most.
My understanding is that they basically "fast flux" IPs to funnel traffic from a targeted attack to a specific data center. So, while you may normally be sharing IPs, if an enterprise customer's website example.com starts getting attacked, they will put it on dedicated IPs and then broadcast those IPs from one or two data centers. They will then reroute all other enterprise traffic away from those data centers, thus minimizing the attack's effect on other customers. If these websites were all on the same IP, it would be impossible to distribute traffic selectively between data centers like this.
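To make that concrete, here's a minimal Python sketch of the isolation logic as I understand it. The move_dns/announce/withdraw helpers, the PoP names, and the IPs are all hypothetical stand-ins, not any real Cloudflare API:

    SHARED_IPS = ["198.51.100.10"]       # many customers multiplexed here
    DEDICATED_IPS = ["203.0.113.50"]     # pulled from a reserve pool
    SACRIFICIAL_POPS = ["ams", "fra"]    # where the attack gets absorbed

    def isolate_customer(hostname, move_dns, announce, withdraw):
        # 1) Move the attacked customer off the shared IPs.
        move_dns(hostname, DEDICATED_IPS)
        for pop in SACRIFICIAL_POPS:
            # 2) Announce the dedicated IPs only from the sacrificial
            #    PoPs, so the attack traffic funnels there.
            announce(pop, DEDICATED_IPS)
            # 3) ...and pull the shared prefixes out of those PoPs so
            #    other enterprise customers are served elsewhere.
            withdraw(pop, SHARED_IPS)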
Another thing they can do is use anycast to load balance across data centers. So, if a data center rather than a website is the target, the attackers will need to know which IPs to attack, and they can start flooding the IPs broadcasted on a particular route. However, if this happens, then hypothetically Cloudflare could just stop broadcasting the IPs at that particular data center, re-broadcast them at all the surrounding data centers, and basically spread the attack load across multiple sites. If the attackers change the IPs they target based on the new routes, then Cloudflare can keep fast-fluxing the IPs every 5 minutes and mitigate the attack.
It's a pretty cool use of BGP and anycast, and being able to change the IPs of a website, and where they are broadcasted, in real time is core to Cloudflare's security.
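If I had to sketch that fast-flux loop, it might look like the following; everything here (the helper functions, the neighbor map, the prefixes) is assumed purely for illustration:

    import time

    POP_NEIGHBORS = {"fra": ["ams", "lhr", "cdg"]}   # made-up topology

    def spread_attack(pop, prefix, announce, withdraw,
                      pick_fresh_ips, under_attack):
        while under_attack(prefix):
            # Stop announcing the targeted prefix at the hot data center...
            withdraw(pop, prefix)
            # ...and re-announce it from the surrounding PoPs, so the
            # flood is absorbed across several sites instead of one.
            for neighbor in POP_NEIGHBORS[pop]:
                announce(neighbor, prefix)
            # If the attackers adapt to the new routes, rotate the IPs.
            prefix = pick_fresh_ips(prefix)
            time.sleep(300)   # the "every 5 minutes" cadence from above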
Thanks for this comment. I guess, along with jgrahamc's sibling comment, you have to make a routing decision based on (source, port) at most if you have a fixed IP, since HTTPS ports are stupidly fixed. That is 32 + 16 bits of info at most, so an Ethernet MAC's worth. So now I can clarify my question as follows: with X bits of data, what is the present state-of-the-art latency for routing T Gbps of traffic? And it's not just that: you also need good latency when updating that routing table.
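To make the 48-bit constraint concrete, here's a toy Python sketch of the only kind of stateless decision a fixed-IP balancer could make; the backend IPs are invented:

    import hashlib
    import ipaddress

    BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical hosts

    def pick_backend(src_ip: str, src_port: int) -> str:
        # With destination (IP, port) fixed, (source IP, source port) is
        # all you have: 32 + 16 = 48 bits, an Ethernet MAC's worth.
        key = (int(ipaddress.ip_address(src_ip)).to_bytes(4, "big")
               + src_port.to_bytes(2, "big"))
        digest = hashlib.sha256(key).digest()
        return BACKENDS[int.from_bytes(digest[:8], "big") % len(BACKENDS)]

    print(pick_backend("192.0.2.7", 54321))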
Is there any research on the real entropy of (source, port) pairs on the Internet? There are also practical issues, like the fact that the distribution of (source, port) is hardly uniform, and is especially nasty when under attack, i.e. you want to manage latency based on both the distribution and the authenticity of traffic.
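For what it's worth, the measurement itself is easy to sketch: compute the empirical Shannon entropy over a window of observed (source, port) pairs. The sample data below is fabricated, purely to show how a spoofed flood can collapse the distribution:

    import math
    from collections import Counter

    def shannon_entropy(samples):
        counts = Counter(samples)
        total = len(samples)
        # H = sum p * log2(1/p), in bits
        return sum((c / total) * math.log2(total / c)
                   for c in counts.values())

    legit = [("192.0.2.1", 50000 + i) for i in range(1000)]   # diverse
    flood = [("198.51.100.9", 1234)] * 1000                   # degenerate
    print(shannon_entropy(legit))   # ~9.97 bits
    print(shannon_entropy(flood))   # 0.0 bits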
This is a very interesting mathematical problem. I have to work on expressing it a bit better before I can hope to formulate a solution, but yes, I can totally see now how BGP, anycast, and DNS TTLs are all knobs for heuristically solving this problem, instead of some crazy genius use of router TCAM silicon.
As a further observation, it makes the GitHub attack an interesting case study. You now have to further route on the GET target, and if traffic is encrypted, the routing decision is moved to a later stage.
In order to protect latency to other GET targets, you're going to have to start doing interesting things.
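For instance, a layer-7 split along these lines only becomes possible after TLS termination; the pool names and the "hot" path below are purely illustrative, not from the actual GitHub incident:

    ANOMALOUS_PREFIXES = ("/targeted-repo/",)   # hypothetical hot target

    def route_request(method: str, path: str) -> str:
        # By the time we can see the path, we've already paid for the
        # TCP handshake and the TLS termination.
        if method == "GET" and path.startswith(ANOMALOUS_PREFIXES):
            return "scrubbing-pool"   # isolate the hot GET target
        return "normal-pool"          # protect latency for everyone else

    print(route_request("GET", "/targeted-repo/index.html"))  # scrubbing-pool
    print(route_request("GET", "/api/v1/status"))             # normal-pool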
One future solution I can see is to multipath-TCP the anomalous traffic and close the original connection. But at that point you have to re-filter based on genuine vs. malicious traffic, and then there's the encrypted state you have to share for a proper stream handover. Ooof... what a nightmare.
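The first half of that idea is at least expressible today. Assuming a Linux kernel with MPTCP support (5.6+), opening a multipath socket looks like this; the handover/re-filtering half remains the nightmare:

    import socket

    IPPROTO_MPTCP = 262   # socket.IPPROTO_MPTCP on Python 3.10+

    try:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, IPPROTO_MPTCP)
        print("MPTCP socket created; subflows could be steered per path")
        s.close()
    except OSError:
        print("kernel lacks MPTCP support")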
In my defense, I was assuming SNI (aka the modern internet), and that the IP was reachable by those you care for it to be reachable by. Ignoring those issues, is there an "engineering" reason why a single IP won't work? For example, is it that hardware can't demultiplex CloudFlare's aggregate ingress volume and handle DoS mitigation?
I guess I'm asking this because of how woeful the "load-balancing" solutions from the major cloud providers look. I feel that, given the way they're externally documented and the way their APIs are specified, hitting them with more than a 40 Gbps fat-server's load of traffic will cause issues, regardless of how many hosts you have serving that load.
I'd appreciate some insight from those who handle such crazy amounts of traffic.
I work for a major CDN that uses anycast, and there are a number of reasons. I won't go into too many of them, but quickly:
1) Anycast doesn't give you fine-grained control. Once we announce our anycast routes, what traffic actually gets sent where is out of our control - it is based on the peering arrangements of our transit providers. If we need to balance traffic between our PoPs, we need finer-grained control than a single anycast IP.
2) IP addresses get blocked for all sorts of reasons (looking at you, China!). If all customers were on one IP address, then as soon as China decided to block one customer, they would all be blocked.
3) Anycast sometimes has weird behavior. For example, traffic might be sent to a data center that is close in terms of peering links but far in terms of physical distance and latency. Using DNS, we can route around these issues (see the sketch below).
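As a toy illustration of point 3, a latency-aware resolver can hand out per-PoP unicast IPs instead of trusting anycast's idea of "close"; the latency table, regions, and IPs here are invented:

    MEASURED_RTT_MS = {                 # client region -> {PoP: RTT}
        "BR": {"gru": 12, "mia": 110, "mad": 190},
        "JP": {"nrt": 8, "sjc": 105},
    }
    POP_VIP = {"gru": "203.0.113.1", "mia": "203.0.113.2",
               "mad": "203.0.113.3", "nrt": "203.0.113.4",
               "sjc": "203.0.113.5"}

    def resolve(client_region: str) -> str:
        rtts = MEASURED_RTT_MS[client_region]
        best = min(rtts, key=rtts.get)   # lowest measured latency,
        return POP_VIP[best]             # not whatever BGP happened to pick

    print(resolve("BR"))   # gru's IP, even if anycast might route to mad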
I am not sure what you mean about the "40 Gbps fat-server's load of traffic" causing issues. We handle many customers that push more than that.
Just disregard my "fat-server" comment. It comes more from being disillusioned with load-balancing solutions all being tied to a service provider. I'd like something cloud-agnostic, peered at multiple points with the major providers.
I guess this is step 1 in the same effort from CloudFlare, before they add AWS and Azure. But their interface is over-simple, which is understandable considering the technical proficiency of their average customer.
CloudFlare is too one-size-fits-all, but from a business perspective it's totally understandable.
I know it's a pipe dream, but I wish we could defragment the IP space and clean up the BGP tables. It would at least make anycast more reliable without resorting to DNS tricks like edns-client-subnet.
As for IP blocking, if undesirable sites are behind the same IP as publicly demanded ones, it could make blocking actions harder to get the populace to support. But worrying about authoritarian regimes is not my concern. After all, why make a service accessible if you cannot monetize the user base sufficiently?
> 2) IP addresses get blocked for all sorts of reasons (looking at you, China!). If all customers were on one IP address, then as soon as China decided to block one customer, they would all be blocked.
If you think of a connection as defined by the tuple (source_ip, source_port, destination_ip, destination_port), then you might run into problems if destination_ip was a single value, just because whatever hashing/table lookups you are using for connection management, DoS protection, etc. might have problems with the sheer size. We are doing a huge amount of traffic, and I can imagine having to engineer around some things related to that.
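A toy sketch of why the key space matters (the destination values are invented): with a single destination_ip and a fixed port 443, the connection key degenerates to (source_ip, source_port), so one structure has to absorb every concurrent flow instead of sharding naturally by destination:

    import hashlib

    def conn_key(src_ip, src_port, dst_ip="203.0.113.1", dst_port=443):
        # The last two fields are constants here, so they add zero
        # entropy: the effective key is just (src_ip, src_port),
        # bounded by 2^32 sources x 2^16 ports.
        raw = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}".encode()
        return hashlib.sha256(raw).hexdigest()

    print(conn_key("192.0.2.7", 54321))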
But the real issues are the ones outlined above.