I think you've got the right approach - a vertical slice through the app that checks every layer. You want to know whether a user can get useful info from your site, and the check follows (separately!) the common path a real query would take.
The danger is that the endpoint becomes public knowledge and comes under a DDoS attack. Putting an IP address filter on that endpoint is usually enough to stop that.
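As a sketch of that filter, here's what it could look like in nginx, assuming the health check lives at a path like `/healthz` and the load balancers sit on an internal range - both the path and the CIDR ranges are placeholders, not anything from the original comment:

```nginx
location /healthz {
    # Only the load balancers / internal network may probe this endpoint.
    allow 10.0.0.0/8;   # hypothetical internal range
    allow 127.0.0.1;
    deny  all;          # everyone else gets 403

    proxy_pass http://app_backend;  # hypothetical upstream name
}
```

The same idea works at the firewall or security-group level if you'd rather not encode it in the web server config.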
The concept I try to go for with that status check is 'Can this node connect to everything it needs, so it can successfully respond to HTTP requests?' However, my approach wouldn't identify an overloaded server - which might be a good thing when we need to scale up, since taking down an overloaded server is just going to make the other servers that much more overloaded.
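That "can this node connect to everything" idea can be sketched as a handler that runs one probe per dependency and only reports healthy if they all pass. The dependency names and probe functions below are hypothetical stand-ins, not anyone's actual stack:

```python
# Sketch of a "vertical slice" health check: the node is healthy only if
# every dependency it needs to serve real requests is reachable.

def check_database():
    # In a real service this would run e.g. `SELECT 1` against the DB.
    return True

def check_cache():
    # E.g. a PING against the cache server.
    return True

def health_status(checks):
    """Run every probe; report 200 only if all layers respond."""
    results = {name: probe() for name, probe in checks.items()}
    status = 200 if all(results.values()) else 503
    return status, results

status, results = health_status({
    "database": check_database,
    "cache": check_cache,
})
```

Note this deliberately checks reachability, not load - consistent with the point above that a check which fails on an overloaded-but-working node can make things worse.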
I'm always up for hearing about other ways people solve health checks.
> taking down an overloaded server is just going to make the other servers that much more overloaded
Emphatically agree, but it's important in the first place to design and deploy your infrastructure such that basic increases of scale are accounted for - prevention is the most important piece of the puzzle. Get that right and an overloaded server is symptomatic of something else, in which case taking down access to the unruly resource is first priority.
IMO, the big takeaway here is that they were load balancing by hitting only the top level - selective, per-layer health checks are somewhat tedious to build but worth it in the long run.
Not all load balancers operate at the same OSI layer. If you're using an IP-based load balancer with transparent NAT, you don't really have a choice: requests to that health check will be balanced across systems, but it's still a DoS if the check is expensive enough to serve and someone hammers it hard enough.
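One common way to blunt that (an assumption on my part, not something the comment describes) is to memoize the expensive probe for a few seconds, so hammering the endpoint only costs a cached lookup. A minimal sketch, with the TTL and probe as placeholders:

```python
import time

def make_cached_check(run_checks, ttl=5.0, clock=time.monotonic):
    """Wrap an expensive health probe so repeated hits within `ttl`
    seconds serve the memoized result instead of re-probing."""
    cached = {"at": None, "result": None}

    def check():
        now = clock()
        if cached["at"] is None or now - cached["at"] >= ttl:
            cached["result"] = run_checks()  # expensive probe runs at most once per ttl
            cached["at"] = now
        return cached["result"]

    return check
```

The trade-off is staleness: a node can report healthy for up to `ttl` seconds after a dependency dies, so the TTL should be shorter than the load balancer's failure-detection window.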