4. Anycast
4.1. Introduction
A solution for failing servers is anycast. With anycast, we basically
do not care which server replies: we let the network determine which
server is best in terms of availability or speed. The routing protocol
(in this case RIP) will solve our availability issues. Furthermore,
client configuration becomes easier; we no longer have to determine by
hand which DNS server is closest.
There are some drawbacks. The routing protocol works at layer 3. That means
that if the server is available but the service is not, traffic will still
be routed to that server. So some sort of monitoring of the DNS service on
the server must be done. Furthermore, it is best if the DNS host itself
participates in the routing protocol. This makes the configuration of the
server more complicated.
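To illustrate what such monitoring could look like: a minimal watchdog
sketch (an assumption of mine, not part of the setup below) that withdraws
the anycast address when the local BIND stops answering, so that RIP stops
advertising this server. It uses the anycast address and zone that are
introduced later in this chapter:

#!/bin/sh
# Hypothetical watchdog: if named no longer answers on the anycast
# address, take down the lo:0 alias; the connected /32 route then
# disappears and RIP withdraws it from the network.
ANYCAST=10.128.224.2
while sleep 10 ; do
    if ! dig @$ANYCAST home. SOA +time=2 +tries=1 > /dev/null 2>&1 ; then
        ifconfig lo:0 down    # stop announcing the service address
    fi
done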
4.2. Anycast on the DNS servers
On Linux, dynamic routing can be done with Quagga, which includes a RIP
daemon. So first install Quagga on xenial1 and xenial2:
apt-get install quagga
Next, we'll define our DNS service address on xenial1 and xenial2:
ifconfig lo:0 10.128.224.2 netmask 255.255.255.255
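You can check that the alias is up and carries the /32 mask:

ifconfig lo:0           # should show 10.128.224.2, mask 255.255.255.255
ip addr show dev lo     # iproute2 equivalent; the alias appears under lo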
On xenial1 and xenial2, create a file zebra.conf in /etc/quagga with the
following content:
hostname xenial1
password password
enable password password
interface lo
interface lo:0
 ip address 10.128.244.2/32
interface eth1
 ip address 10.128.5.2/24
line vty
For lo:0, use the same IP address on both name servers. For eth1, use each
server's actual IP address.
Next, tell Quagga which daemons to start. Create a file daemons with the
following content:
zebra=yes
bgpd=no
ospfd=no
ospf6d=no
ripd=yes
ripngd=no
isisd=no
And, since we're using RIP, create ripd.conf with:
hostname xenial1
password password
log stdout
router rip
 version 2
 network 10.128.0.0/16
 network lo:0
 network eth1
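Once the daemons are running, you can inspect Quagga with vtysh (shipped
with the quagga package); if vtysh is not set up, you can instead telnet
to zebra (port 2601) or ripd (port 2602) on localhost and log in with the
configured password. For example:

vtysh -c 'show ip rip'          # RIP table: routes, metrics, timers
vtysh -c 'show ip rip status'   # interfaces and neighbours RIP talks to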
It would also be nice if we could switch anycast on and off with a simple
flag at start-up, so an if-then is put around the anycast configuration.
This gives us the following setup file:
echo "Setup xenial1" ETH1=$(dmesg | grep -i 'renamed from eth1' | sed -n 's/: renamed from eth1//;s/.* //p') ifconfig $ETH1 10.128.5.2 netmask 255.255.255.0 route add -net 10.128.0.0 netmask 255.255.0.0 gw 10.128.5.1 netstat -rn apt-get update echo "apt-get -y install bind9 dnsutils" apt-get -y install bind9 dnsutils apt-get -y install quagga cd /etc/bind perl /vagrant/make_config.1.perl /vagrant/dns-input-file cat > /etc/resolv.conf <<EOF domain home search home nameserver 127.0.0.1 EOF cat > /etc/hosts <<EOF 127.0.0.1 localhost 10.128.5.2 xenial1.home xenial1 EOF hostname xenial1 domainname home if [ -f /vagrant/do_anycast ] ; then ifconfig lo:0 10.128.224.2 netmask 255.255.255.255 cat > /etc/quagga/zebra.conf <<EOF hostname xenial1 password password enable password password interface lo interface lo:0 ip address 10.128.244.2/32 interface eth1 ip address 10.128.5.2/24 line vty EOF cat > /etc/quagga/daemons <<EOF zebra=yes bgpd=no ospfd=no ospf6d=no ripd=yes ripngd=no isisd=no EOF cat > /etc/quagga/ripd.conf <<EOF hostname xenial1 password password log stdout router rip version 2 network 10.128.0.0/16 network lo:0 network eth1 EOF ed /etc/default/bind9 << EOF 1 /-u bind d w q EOF /etc/init.d/quagga restart /etc/init.d/bind9 restart ping -c1 10.128.244.2 cat > /etc/resolv.conf <<EOF domain home search home nameserver 10.128.224.2 EOF fi hostname domainname echo /etc/resolv.conf cat /etc/resolv.conf echo /etc/hosts cat /etc/hosts ping -c1 10.128.5.1
and then, according to all the websites, it should work....
4.3. Debugging
... and of course it does not.
I had to do some debugging and this is what I found.
4.3.1. Strategy
The debugging strategy works as follows:
- first make sure that the name servers work as expected (see the example
  after this list)
- then verify the routing from and to the servers (the network part)
- then make sure that the clients are configured correctly
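For the first step, query each name server directly on its real eth1
address, bypassing anycast; for example (the record name is an assumption,
any name in the zone will do):

dig @10.128.5.2 xenial1.home    # ask xenial1 directly on its eth1 address
# repeat this against xenial2's eth1 address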
4.3.2. Provisioning problems
First, make sure you've created do_anycast (run touch do_anycast to create
the file).
The replacement of /etc/resolv.conf and /etc/hosts by the provisioning
script did not work as expected: the DHCP client overwrites these files.
Moving the creation of these files to the end of the set-up script seemed
to work. The alternative is to change the system's interface configuration.
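A sketch of that alternative, assuming the stock ISC DHCP client is in
use: tell it to ignore the name-server information offered by DHCP, so the
provisioning script's resolv.conf survives. In /etc/dhcp/dhclient.conf:

# force our own resolver settings, whatever the DHCP server offers
supersede domain-name "home";
supersede domain-name-servers 10.128.224.2;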
4.3.3. BIND-specific problems
There are two things that I see:
- nslookup does not get answers from 10.128.224.1 (not even from the
  connecting router), but ping replies correctly
- when a name server is shut down, even the ping does not seem to get
  redirected.
4.3.4. DNS
By default, BIND9 on Ubuntu runs with the -u bind option, for security
reasons.
From the man page (BSD, not Debian, but that doesn't make much difference):
-u  Specifies the user the server should run as after it initializes.
    The value specified may be either a username or a numeric user ID.
    If the -g flag is not specified, then the group ID used will be the
    primary group of the user specified (initgroups() is called, so all
    of the user's groups will be available to the server).
    Note: normally, named will rescan the active Ethernet interfaces
    when it receives SIGHUP. Use of the -u option makes this impossible
    since the default port that named listens on is a reserved port
    that only the superuser may bind to.
This suggests that BIND cannot rescan interfaces after a SIGHUP, but that
it should be able to bind its ports at start-up. BIND binds to port 53,
which should only be possible for root.
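A quick way to check both points, i.e. which addresses named listens on
and as which user it runs:

netstat -plnut | grep :53        # listening sockets on port 53 (-p needs root)
ps -C named -o user,pid,cmd      # shows whether named runs as root or bind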
However, it also suggests that running as root instead of bind should make
it easier to debug, because I don't have to restart every time. Running as
root is done by removing the option -u bind from /etc/default/bind9.
Restart, and suddenly it works.
A quick look at anycast/rc solves the mystery. First, bind is started, and
only after that is lo:0 created. If bind has already started and done its
setuid() to run as bind, it cannot listen on a privileged port (below 1024)
on the new interface. So either start bind after creating lo:0, or run it
as root.
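Schematically, the fix in such a start-up script (anycast/rc itself is not
shown here) is just a reordering:

# wrong order: named does its setuid() before lo:0 exists and can
# no longer bind port 53 on the new interface
#/etc/init.d/bind9 restart
#ifconfig lo:0 10.128.224.2 netmask 255.255.255.255

# right order: create the interface first, then (re)start bind
ifconfig lo:0 10.128.224.2 netmask 255.255.255.255
/etc/init.d/bind9 restart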
4.3.5. No take-over
On the Internet I read that anycast with RIP may take some time to do the
take-over. So, we'll make a script to test this theory. We'll use ping to
determine if the host is reachable.
First, add a host 'test' with a different IP address on each name server.
This enables us to see which name server is being used. Then start the
following script:
while :; do
    if ping -c1 10.128.224.2 ; then
        date >> log
        nslookup test >> log
        echo -n .
    fi
done
When the dots start running, shut down the DNS server (xenial1 if you're
doing this on xenial3).
And sure enough, after a while you'll see the take-over:
<snip>
Wed Feb 20 15:55:04 CET 2013
Server:     10.128.224.2
Address:    10.128.224.2#53

Name:   test.home
Address: 10.128.1.1

Wed Feb 20 15:55:04 CET 2013
Server:     10.128.224.2
Address:    10.128.224.2#53

Name:   test.home
Address: 10.128.1.1

Wed Feb 20 15:55:04 CET 2013
;; connection timed out; no servers could be reached

Wed Feb 20 15:58:54 CET 2013
Server:     10.128.224.2
Address:    10.128.224.2#53

Name:   test.home
Address: 10.128.2.2
<snip>
From the IP address of test, you can see that 10.128.224.2 is now routed
to the other name server.
You can also see that in this case the take-over took around 4 minutes.
That is normal for RIP: by default a route only times out 180 seconds
after the last update, and it takes additional time before the remaining
route is propagated.
Because the routing is unaware of the layer-4 protocol, you can also watch
the take-over with a ping:
[ljm@verlaine anycast.vagrant]$ vagrant ssh xenial3
Last login: Sun Nov 30 21:13:13 2014 from 10.0.2.2
[vagrant@xenial3 ~]$ ping 10.128.224.2
PING 10.128.224.2 (10.128.224.2) 56(84) bytes of data.
64 bytes from 10.128.224.2: icmp_seq=1 ttl=63 time=16.9 ms
64 bytes from 10.128.224.2: icmp_seq=2 ttl=63 time=11.9 ms
64 bytes from 10.128.224.2: icmp_seq=3 ttl=63 time=13.7 ms
64 bytes from 10.128.224.2: icmp_seq=4 ttl=63 time=19.1 ms
64 bytes from 10.128.224.2: icmp_seq=5 ttl=63 time=16.2 ms
64 bytes from 10.128.224.2: icmp_seq=6 ttl=63 time=11.5 ms
64 bytes from 10.128.224.2: icmp_seq=7 ttl=63 time=18.4 ms
64 bytes from 10.128.224.2: icmp_seq=8 ttl=63 time=15.7 ms
64 bytes from 10.128.224.2: icmp_seq=9 ttl=63 time=11.5 ms
64 bytes from 10.128.224.2: icmp_seq=10 ttl=63 time=16.1 ms
64 bytes from 10.128.224.2: icmp_seq=11 ttl=63 time=12.4 ms
64 bytes from 10.128.224.2: icmp_seq=12 ttl=63 time=18.0 ms
64 bytes from 10.128.224.2: icmp_seq=13 ttl=63 time=12.3 ms
64 bytes from 10.128.224.2: icmp_seq=14 ttl=63 time=16.9 ms
From 10.128.7.1 icmp_seq=232 Destination Host Unreachable
From 10.128.7.1 icmp_seq=233 Destination Host Unreachable
From 10.128.7.1 icmp_seq=234 Destination Host Unreachable
From 10.128.7.1 icmp_seq=235 Destination Host Unreachable
From 10.128.7.1 icmp_seq=236 Destination Host Unreachable
From 10.128.7.1 icmp_seq=237 Destination Host Unreachable
From 10.128.7.1 icmp_seq=238 Destination Host Unreachable
From 10.128.7.1 icmp_seq=239 Destination Host Unreachable
From 10.128.7.1 icmp_seq=240 Destination Host Unreachable
From 10.128.7.1 icmp_seq=241 Destination Host Unreachable
From 10.128.7.1 icmp_seq=242 Destination Host Unreachable
From 10.128.7.1 icmp_seq=243 Destination Host Unreachable
From 10.128.7.1 icmp_seq=244 Destination Host Unreachable
From 10.128.7.1 icmp_seq=245 Destination Host Unreachable
From 10.128.7.1 icmp_seq=246 Destination Host Unreachable
From 10.128.7.1 icmp_seq=247 Destination Host Unreachable
From 10.128.7.1 icmp_seq=248 Destination Host Unreachable
From 10.128.7.1 icmp_seq=249 Destination Host Unreachable
From 10.128.7.1 icmp_seq=250 Destination Host Unreachable
From 10.128.7.1 icmp_seq=251 Destination Host Unreachable
From 10.128.7.1 icmp_seq=252 Destination Host Unreachable
From 10.128.7.1 icmp_seq=253 Destination Host Unreachable
From 10.128.7.1 icmp_seq=254 Destination Host Unreachable
From 10.128.7.1 icmp_seq=255 Destination Host Unreachable
From 10.128.7.1 icmp_seq=256 Destination Host Unreachable
64 bytes from 10.128.224.2: icmp_seq=257 ttl=60 time=53.3 ms
64 bytes from 10.128.224.2: icmp_seq=258 ttl=60 time=46.9 ms
64 bytes from 10.128.224.2: icmp_seq=259 ttl=60 time=42.3 ms
64 bytes from 10.128.224.2: icmp_seq=260 ttl=60 time=37.8 ms
64 bytes from 10.128.224.2: icmp_seq=261 ttl=60 time=47.5 ms
64 bytes from 10.128.224.2: icmp_seq=262 ttl=60 time=42.8 ms
64 bytes from 10.128.224.2: icmp_seq=263 ttl=60 time=49.2 ms
64 bytes from 10.128.224.2: icmp_seq=264 ttl=60 time=36.6 ms
64 bytes from 10.128.224.2: icmp_seq=265 ttl=60 time=50.9 ms
Take-over when the server reboots is more seamless:
[ljm@xenial4 Documents]$ while :; do date; nslookup test.home; sleep 10; done
Thu May 9 13:02:59 CEST 2013
Server:     10.128.224.2
Address:    10.128.224.2#53

Name:   test.home
Address: 10.128.1.1

Thu May 9 13:03:09 CEST 2013
Server:     10.128.224.2
Address:    10.128.224.2#53

Name:   test.home
Address: 10.128.1.1

Thu May 9 13:03:19 CEST 2013
Server:     10.128.224.2
Address:    10.128.224.2#53

Name:   test.home
Address: 10.128.2.2

Thu May 9 13:03:29 CEST 2013
Server:     10.128.224.2
Address:    10.128.224.2#53

Name:   test.home
Address: 10.128.2.2
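If you want to experiment with faster take-over, Quagga's ripd lets you
shorten the RIP timers in ripd.conf (a sketch; these aggressive values
are an assumption of mine and increase protocol chatter):

router rip
 version 2
 network 10.128.0.0/16
 ! timers basic <update> <timeout> <garbage-collection>, defaults 30 180 120
 timers basic 5 15 10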
4.4. Conclusions
Using anycast as a fail-over mechanism with RIP as the routing protocol is
not a feasible option: take-over takes minutes. Some optimization of the
RIP timers may produce better results, but as fail-over it is not
sufficient.
Server configuration is uniform, which is nice if you want to deploy a standard
image without much customization.