
   Fun fact: since Linux has no built-in DNS caching, most of the DNS queries are looking for…itself. Oh wait, that’s not a fun fact — it’s actually a pain in the ass.
Surely that should just be a very fast lookup in /etc/hosts?



The problem here is that these services move, so if it's in /etc/hosts, our failover mechanisms (to a DR data center which has a replica server) are severely hindered. We're adding some local cache, but there are some nasty gotchas with subnet-local ordering on resolution. By this I mean: New York resolves its local /16 first, and Denver resolves its local /16... but BIND doesn't care (by default) and is happy to auth against, say, the London office. Good times!


But that's what DNS search scope is for, surely?

We had n datacenters, each named after its city: ldn.$company.com, ny.$company.com, etc. In DHCP we pushed out the search order so that clients would try to resolve locally and, if that failed, try a level up until something worked.

This meant that when you bound to a service, it would first look up service.$___location.$company.com; if that wasn't there, it'd try service.$company.com.

This cut down on the need for nasty split-horizon DNS, and moving VMs/services/machines between datacenters was simple and zero-config.
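As a sketch, the client config that DHCP pushed out might look something like this (the domains and addresses here are invented for illustration):

```conf
# /etc/resolv.conf on a New York host (pushed via the DHCP ___domain-search option)
# Short names are tried against each search ___domain in order:
#   "service" -> service.ny.example.com, then service.example.com
search ny.example.com example.com
nameserver 10.1.0.53
nameserver 10.1.0.54
```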

If you were taking a service out of commission in one datacenter, you'd CNAME service.$___location.$company.com to a different datacenter, do a staged kick of the machines, and BOOM failed over with only one config change.
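A hypothetical zone-file sketch of that one-record failover (all names invented):

```zone
; Normal operation: NY's service name points at the local box.
service.ny.example.com.   300  IN  CNAME  app01.ny.example.com.

; Decommissioning NY: repoint the same name at London's instance,
; then do the staged kick of the machines. Clients keep resolving
; service.ny.example.com and land in London with one record change.
; service.ny.example.com. 300  IN  CNAME  app01.ldn.example.com.
```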

On a side note, you can use SSSD or shudder NSLCD to cache DNS.


We do, but in the specific case of Active Directory, we want to fail over and auth against another data center if the primary is offline. This means for our ___domain, the local (to the /16) ___domain controllers are returned first and then the others. The problem is BIND locally doesn't preserve this order and applications are suddenly authenticating across the planet.

DNS devolution isn't a good idea here, since the external ___domain is a wildcard. We'll be paying for that mistake from long ago until (if ever) we change the internal ___domain name.

This is a pretty recent problem we're just now getting to because the DNS volume has been a back-burner issue - we'll look into permanent solutions for all Linux services after the CDN testing completes. Recommendations on Linux DNS caching are much appreciated - we'll review each. It just hasn't been an issue in the past, so we're not experts in that particular area. I am surprised caching hasn't landed natively in most of the major distros yet, though.


Aha, gotcha. I was under the impression that SSSD chose the fastest AD server it could find (either via the SRV records or via a pre-determined list)? I've not had too much trouble with it stubbornly binding to the furthest-away server. (That's with AD doing the DNS and delegating to BIND.)

NSCD (name service caching daemon) is in RHEL and Debian, so I assume it'll be in Ubuntu as well. The problem is that it fights with SSSD if you're not careful. https://access.redhat.com/documentation/en-US/Red_Hat_Enterp...
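One common way to keep the two from fighting (a sketch; check the Red Hat docs above for the current recommendation) is to let SSSD own the identity maps and restrict nscd to hosts caching only, e.g. in /etc/nscd.conf:

```conf
# /etc/nscd.conf -- let SSSD handle users/groups; nscd only caches DNS
enable-cache  passwd  no
enable-cache  group   no
enable-cache  hosts   yes
positive-time-to-live  hosts  600
```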

Out of interest, what are you using to bind to AD?


> The problem is BIND locally doesn't preserve this order

Nor need any other DNS server software do so. The actual DNS protocol has no notion of an ordering within a resource record set in an answer.

I suspect, from your brief description here, that what you'll end up using is the "sortlist" option in the BIND DNS client library's configuration file, /etc/resolv.conf. Although SRV RRSets will introduce some interesting complexities.
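For reference, a sortlist sketch (the /16 is invented): it tells the stub resolver to move addresses on the listed subnets to the front of the returned set, regardless of the order the server sent.

```conf
# /etc/resolv.conf on a New York host: prefer A records in the local /16
sortlist 10.1.0.0/255.255.0.0
nameserver 10.1.0.53
```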

* http://homepage.ntlworld.com./jonathan.deboynepollard/FGA/dn...


It will. AFAIK systemd-resolved does caching by default.
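For what it's worth, the cache is controllable in /etc/systemd/resolved.conf (Cache=yes is, as far as I know, the shipped default):

```ini
# /etc/systemd/resolved.conf
[Resolve]
Cache=yes
```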


I'm confused. The lookup is for localhost, so how would this alter failover mechanisms? You don't want a lookup for localhost being answered with an address in a different data centre, surely?


It's not for localhost, it's for the server name. While Gitlab and Teamcity normally run on the same box, they can operate on different boxes or in different data centers. It's looking up a DNS name which happens to point at the same box... does that explain it more clearly?


Can't you just have all DNS traffic in a DC go through your local resolvers?

The first lookup might take longer, but subsequent ones should be fast.

Caching DNS resolvers can fit in a 256 MB RAM VM and use virtually zero CPU.
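For example, a minimal Unbound config as a small local caching forwarder (the upstream address is a placeholder):

```conf
# /etc/unbound/unbound.conf -- tiny local cache, forwards everything upstream
server:
    interface: 127.0.0.1
    access-control: 127.0.0.0/8 allow

forward-zone:
    name: "."
    forward-addr: 10.1.0.53
```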


Also, Linux has "built in" (whatever that means) DNS caching. It's called nscd. It's just usually not enabled by default (which is sensible, since it's better off shared).


nscd also has a known TTL bug that hasn't been fixed in 9 years.

https://sourceware.org/bugzilla/show_bug.cgi?id=4428


Wow. A testament to focus on quality, that is.



