4. Anycast

4.1. Introduction

A solution for failing servers is anycast. With anycast, we basically do not care which server replies: we let the network decide which server is best in terms of availability or speed. The routing protocol (in this case RIP) solves the availability problem for us. Client configuration also becomes easier; we no longer have to determine by hand which DNS server is closest.

There are some drawbacks. The routing protocol works at layer 3. That means that if the server is reachable but the service on it is not, traffic will still be routed to that server, so some form of service monitoring on the DNS server is needed. Furthermore, it is best if the DNS host itself participates in the routing protocol, which makes the configuration of the server more complicated.
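One way to bridge that gap between layer 3 and the service is a small health check that withdraws the anycast address when the DNS service stops answering, so the routing protocol stops advertising this host. A minimal sketch, not part of the setup below: the dig-based probe is my own choice, and it must run as root (from cron or a loop):

```shell
#!/bin/sh
# Health-check sketch (hypothetical): withdraw the anycast address when
# named stops answering, so RIP withdraws the /32 and traffic moves to
# the other name server.
ANYCAST_IP=10.128.224.2

if dig +time=2 +tries=1 @127.0.0.1 localhost A >/dev/null 2>&1; then
    # service answers: make sure the anycast address is (still) configured
    ifconfig lo:0 "$ANYCAST_IP" netmask 255.255.255.255
else
    # service dead: take the address down; zebra notices and ripd stops
    # advertising it
    ifconfig lo:0 down
fi
```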

4.2. Anycast on the DNS servers

On Linux, dynamic routing can be handled by Quagga, which includes a RIP daemon (ripd). So first install Quagga on xenial1 and xenial2:
apt-get install quagga

Next, we'll define our DNS service address on xenial1 and xenial2:
ifconfig lo:0 10.128.224.2 netmask 255.255.255.255

On xenial1 and xenial2, create a file zebra.conf in /etc/quagga with the following content:
hostname xenial1
password password
enable password password
interface lo
interface lo:0
ip address 10.128.224.2/32
interface eth1
ip address 10.128.5.2/24
line vty

For lo:0, use the same IP address on both name servers. For eth1, use the host's actual IP address. Next, tell Quagga which daemons to start. Create a file daemons with the following content:
zebra=yes
bgpd=no
ospfd=no
ospf6d=no
ripd=yes
ripngd=no
isisd=no

And, since we're using RIP, create ripd.conf with:
hostname xenial1
password password
log stdout
router rip
 version 2
 network 10.128.0.0/16
 network lo:0
 network eth1
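Once the daemons are up, Quagga's vtysh shell is a quick way to check that ripd actually advertises our /32. These are standard Quagga show commands; the exact output depends on the Quagga version:

```shell
# Inspect routing state on xenial1/xenial2 (requires the quagga package)
vtysh -c "show ip rip"          # routes ripd knows, including 10.128.224.2/32
vtysh -c "show ip rip status"   # RIP timers and the interfaces RIP runs on
vtysh -c "show ip route"        # what zebra installed in the kernel table
```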

It would also be nice if we could switch anycast on and off with a simple flag at start-up, so an if-then is put around the anycast configuration. This gives us the following setup file:
echo "Setup xenial1"
ETH1=$(dmesg | grep -i 'renamed from eth1' | sed -n 's/: renamed from eth1//;s/.* //p')
ifconfig $ETH1 10.128.5.2 netmask 255.255.255.0
route add -net 10.128.0.0 netmask 255.255.0.0 gw 10.128.5.1
netstat -rn
apt-get update
echo "apt-get -y install bind9 dnsutils"
apt-get -y install bind9 dnsutils
apt-get -y install quagga
cd /etc/bind
perl /vagrant/make_config.1.perl /vagrant/dns-input-file
cat > /etc/resolv.conf <<EOF
domain home
search home
nameserver 127.0.0.1
EOF
cat > /etc/hosts <<EOF
127.0.0.1    localhost
10.128.5.2   xenial1.home xenial1
EOF
hostname xenial1
domainname home
if [ -f /vagrant/do_anycast ] ; then
ifconfig lo:0 10.128.224.2 netmask 255.255.255.255
cat > /etc/quagga/zebra.conf <<EOF
hostname xenial1
password password
enable password password
interface lo
interface lo:0
ip address 10.128.224.2/32
interface eth1
ip address 10.128.5.2/24
line vty
EOF
cat > /etc/quagga/daemons <<EOF
zebra=yes
bgpd=no
ospfd=no
ospf6d=no
ripd=yes
ripngd=no
isisd=no
EOF
cat > /etc/quagga/ripd.conf <<EOF
hostname xenial1
password password
log stdout
router rip
 version 2
 network 10.128.0.0/16
 network lo:0
 network eth1
EOF
ed /etc/default/bind9 << EOF
1
/-u bind
d
w
q
EOF
/etc/init.d/quagga restart
/etc/init.d/bind9 restart
ping -c1 10.128.224.2
cat > /etc/resolv.conf <<EOF
domain home
search home
nameserver 10.128.224.2
EOF
fi
hostname 
domainname
echo  /etc/resolv.conf
cat  /etc/resolv.conf
echo /etc/hosts 
cat /etc/hosts 
ping -c1 10.128.5.1
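After provisioning, a quick check from a client (xenial3, say) shows whether the anycast address works at both layers; since the host 'test' is only added later, use any name your zone already defines, e.g. xenial1.home:

```shell
# Layer 3: is 10.128.224.2 routed to one of the name servers?
ping -c1 10.128.224.2
# Layer 7: does bind answer queries on the anycast address?
dig @10.128.224.2 xenial1.home A
```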

and then, according to all the websites, it should work....

4.3. Debugging

... and of course it does not.

I had to do some debugging and this is what I found.

4.3.1. Strategy

The debugging strategy works as follows:
  • first make sure that the name servers work as expected
  • then verify the routing from/to the servers (network part)
  • then make sure that the clients are configured correctly

4.3.2. Provisioning problems

First, make sure you've created do_anycast (a touch /vagrant/do_anycast will do); the setup script only enables anycast when this file exists.

The replacement of /etc/resolv.conf and /etc/hosts by the provisioning script did not work as expected: the DHCP client apparently overwrites these files. Moving the creation of these files to the end of the set-up script seemed to work. The alternative is to change the interfaces configuration of the system.
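That alternative can be sketched as a dhclient hook that disables its resolv.conf handling. This is a known isc-dhcp-client mechanism; the file name is arbitrary:

```shell
# /etc/dhcp/dhclient-enter-hooks.d/nodnsupdate (name is arbitrary)
# Override dhclient's resolv.conf writer with a no-op, so our own
# /etc/resolv.conf survives lease renewals.
make_resolv_conf() {
    :
}
```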

4.3.3. Bind specific problems

There are two things that I see:
  • nslookup does not get answers from 10.128.224.2 (not even from the directly connected router), but ping replies correctly
  • when a name server is shut down, even the ping does not seem to get redirected.

4.3.4. DNS

By default, Bind9 on Ubuntu runs with the option -u bind, for security reasons. From the man page (BSD, not Debian, but that doesn't make much difference):
       -u     Specifies the user the server should run  as  after
              it  initializes.  The value specified may be either
              a username or a numeric user ID.  If the -g flag is
              not  specified,  then the group ID used will be the
              primary group of the user  specified  (initgroups()
              is  called,  so  all  of  the user's groups will be
              available to the server).
              Note: normally, named will rescan the active Ether-
              net interfaces when it receives SIGHUP.  Use of the
              -u option makes this impossible since  the  default
              port  that named listens on is a reserved port that
              only the superuser may bind to.

This suggests that BIND cannot rescan interfaces and re-bind its ports after a SIGHUP: named listens on port 53, a reserved port that only root may bind. At start-up, however, named still runs as root, so it should be able to bind its ports then.

However, it also suggests that running as root instead of as bind should make debugging easier, because then I don't have to restart named every time an interface changes.

Running as root is done by removing the option -u bind from /etc/default/bind9.

Restart, and suddenly it works. A quick look at anycast/rc solves the mystery: first bind is started, and only after that is lo:0 created. If bind has already started and done its setuid() to run as bind, it cannot listen on a port below 1024 on the new interface. So either start bind after creating lo:0, or run it as root.
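To verify which addresses named actually bound, list the UDP listeners on port 53; after a correct start the anycast address should show up next to the regular ones:

```shell
# named should listen on 127.0.0.1, the eth1 address and the anycast address
netstat -lnu | grep ':53'
# or, with newer tooling:
ss -lnu | grep ':53'
```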

4.3.5. No take-over

On the Internet I read that anycast with RIP may take some time to do the take-over.

So we'll make a script to test this theory. We'll use ping to determine whether the host is reachable. First, add a host 'test' with a different IP address to both name servers; this lets us see which name server is answering. Then start the following script:
while :; do
    if ping -c1 10.128.224.2 >/dev/null ; then
        date >> log
        nslookup test >> log
        echo -n .
    fi
done
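A small variation makes the take-over time measurable: log epoch seconds (date +%s >> times next to the nslookup logging), and the outage is simply the largest gap between consecutive timestamps. A sketch, with demo data standing in for the real log:

```shell
# Demo data: epoch timestamps with a 238-second hole in the middle,
# as 'date +%s >> times' would produce around a fail-over
printf '100\n101\n102\n340\n341\n' > times

# Largest gap between consecutive timestamps = observed outage in seconds
awk 'NR > 1 && $1 - prev > max { max = $1 - prev }
     { prev = $1 }
     END { print max }' times
# prints: 238
```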

When the dots start running, shut down the DNS server (xenial1 if you're doing this on xenial3). And sure enough, after a while you'll see the take-over:
<snip>
Wed Feb 20 15:55:04 CET 2013
Server:  10.128.224.2
Address: 10.128.224.2#53
Name:    test.home
Address: 10.128.1.1
Wed Feb 20 15:55:04 CET 2013
Server:  10.128.224.2
Address: 10.128.224.2#53
Name:    test.home
Address: 10.128.1.1
Wed Feb 20 15:55:04 CET 2013
;; connection timed out; no servers could be reached
Wed Feb 20 15:58:54 CET 2013
Server:  10.128.224.2
Address: 10.128.224.2#53
Name:    test.home
Address: 10.128.2.2
<snip>

From the IP address of test, you can see that 10.128.224.2 is now routed to the other name server. You can also see that the take-over took around 4 minutes. That is consistent with RIP's default timers: a route times out after 180 seconds without updates and is flushed after a further 120 seconds, so an outage of several minutes is normal.

Because the routing is unaware of layer-4 protocols, you can also demonstrate this with a ping:
[ljm@verlaine anycast.vagrant]$ vagrant ssh xenial3
Last login: Sun Nov 30 21:13:13 2014 from 10.0.2.2
[vagrant@xenial3 ~]$ ping 10.128.224.2
PING 10.128.224.2 (10.128.224.2) 56(84) bytes of data.
64 bytes from 10.128.224.2: icmp_seq=1 ttl=63 time=16.9 ms
64 bytes from 10.128.224.2: icmp_seq=2 ttl=63 time=11.9 ms
64 bytes from 10.128.224.2: icmp_seq=3 ttl=63 time=13.7 ms
64 bytes from 10.128.224.2: icmp_seq=4 ttl=63 time=19.1 ms
64 bytes from 10.128.224.2: icmp_seq=5 ttl=63 time=16.2 ms
64 bytes from 10.128.224.2: icmp_seq=6 ttl=63 time=11.5 ms
64 bytes from 10.128.224.2: icmp_seq=7 ttl=63 time=18.4 ms
64 bytes from 10.128.224.2: icmp_seq=8 ttl=63 time=15.7 ms
64 bytes from 10.128.224.2: icmp_seq=9 ttl=63 time=11.5 ms
64 bytes from 10.128.224.2: icmp_seq=10 ttl=63 time=16.1 ms
64 bytes from 10.128.224.2: icmp_seq=11 ttl=63 time=12.4 ms
64 bytes from 10.128.224.2: icmp_seq=12 ttl=63 time=18.0 ms
64 bytes from 10.128.224.2: icmp_seq=13 ttl=63 time=12.3 ms
64 bytes from 10.128.224.2: icmp_seq=14 ttl=63 time=16.9 ms
From 10.128.7.1 icmp_seq=232 Destination Host Unreachable
From 10.128.7.1 icmp_seq=233 Destination Host Unreachable
From 10.128.7.1 icmp_seq=234 Destination Host Unreachable
<snip>
From 10.128.7.1 icmp_seq=255 Destination Host Unreachable
From 10.128.7.1 icmp_seq=256 Destination Host Unreachable
64 bytes from 10.128.224.2: icmp_seq=257 ttl=60 time=53.3 ms
64 bytes from 10.128.224.2: icmp_seq=258 ttl=60 time=46.9 ms
64 bytes from 10.128.224.2: icmp_seq=259 ttl=60 time=42.3 ms
64 bytes from 10.128.224.2: icmp_seq=260 ttl=60 time=37.8 ms
64 bytes from 10.128.224.2: icmp_seq=261 ttl=60 time=47.5 ms
64 bytes from 10.128.224.2: icmp_seq=262 ttl=60 time=42.8 ms
64 bytes from 10.128.224.2: icmp_seq=263 ttl=60 time=49.2 ms
64 bytes from 10.128.224.2: icmp_seq=264 ttl=60 time=36.6 ms
64 bytes from 10.128.224.2: icmp_seq=265 ttl=60 time=50.9 ms

Take-over when the server reboots is more seamless:
[ljm@xenial4 Documents]$ while :; do date; nslookup test.home; sleep 10; done
Thu May  9 13:02:59 CEST 2013
Server:         10.128.224.2
Address:        10.128.224.2#53
Name:   test.home
Address: 10.128.1.1
Thu May  9 13:03:09 CEST 2013
Server:         10.128.224.2
Address:        10.128.224.2#53
Name:   test.home
Address: 10.128.1.1
Thu May  9 13:03:19 CEST 2013
Server:         10.128.224.2
Address:        10.128.224.2#53
Name:   test.home
Address: 10.128.2.2
Thu May  9 13:03:29 CEST 2013
Server:         10.128.224.2
Address:        10.128.224.2#53
Name:   test.home
Address: 10.128.2.2

4.4. Conclusions

Using anycast as a fail-over mechanism with RIP as the routing protocol is not a feasible option: a multi-minute outage on every failure is too long. Some optimization (shorter RIP timers, for example) may produce better results, but as a fail-over mechanism it is not sufficient.

Server configuration is uniform, which is nice if you want to deploy a standard image without much customization.