Sunday, January 29, 2017

RaspberryPi2 As Network Failover/Load Balancer/Edge Router–A Reliable Implementation

 
 
I. MultiHoming & Multipath Routing:
 
Some time back I proposed using an RPi2 as a network failover device for a home network that has multiple sources of Internet.
 
I’ve tried many solutions, including NAT pooling, custom routing tables and network interface bonding, but none gave me a reliable implementation.
 
Finally I found the concept of MultiHoming & Multipath Routing in Linux, which actually did the trick. In MultiHoming you have multiple network interfaces in your Linux box, each serving a separate subnet/network. You can also configure a single network interface to serve multiple subnets (e.g. if eth0 is your device, you can configure eth0:0 and eth0:1 to deliver two separate network segments, say 192.168.1.0/24 and 172.16.1.0/24 respectively).
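As a rough illustration of the alias idea, the sketch below prints the `ip addr` commands that would attach two labelled addresses to eth0. The host addresses 192.168.1.5 and 172.16.1.5 are assumptions made for this example. The `run` helper is also my own: by default it only prints each command, and with DRY_RUN=0 (as root) it executes them.

```shell
#!/bin/sh
# Sketch: give eth0 two labelled addresses, one per subnet.
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them (needs root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

add_alias() {
    # $1 = address/prefix, $2 = device, $3 = alias label
    run ip addr add "$1" dev "$2" label "$3"
}

add_alias 192.168.1.5/24 eth0 eth0:0   # assumed host address on the first subnet
add_alias 172.16.1.5/24  eth0 eth0:1   # assumed host address on the second subnet
```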
 
The real magic of network failover/load balancing is provided by multipath routing, which is surprisingly simple:
ip route add default \
        nexthop via $P1 dev $IF1 weight 1 \
        nexthop via $P2 dev $IF2 weight 1
By default Linux uses a single gateway for routing packets that are not destined for your local network. The trick is to change this single gateway to a multipath gateway, as shown above. In the above example, P1 is the gateway for interface IF1 and P2 is the gateway for interface IF2; we added both with equal weight, which makes the box work as a load balancer. Linux will try to split the traffic equally between the two interfaces.
 
Now if you change the weights, say you give weight 5 to IF1, it works more like a network failover router: most of the traffic will be routed to IF1, and if it is down traffic will be routed to IF2. See our example, where we have three network interfaces and each of them provides internet.

ip route add default \
      nexthop via 192.168.1.1 dev eth0 weight 8 \
      nexthop via 192.168.42.129 dev usb0 weight 6 \
      nexthop via 112.111.12.13 dev ppp0 weight 3
eth0 is our LAN interface, which is also connected to an ADSL broadband router that provides internet; this is our primary source of internet. Next, usb0 is a 4G modem (USB Ethernet), considered the secondary source of internet. If both are down, we have a 3G GSM USB modem (ppp0) running PPP; it has the least priority, as it is much slower than the other two. So the single command above actually provided the required load balancing/network failover for the home network. (Of course we have source NATed both the usb0 and ppp0 networks, as they are in a different address space from the LAN on eth0, 192.168.1.0/24.)
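To get a feel for what the weights 8, 6 and 3 mean, here is a small sketch that turns them into approximate traffic shares. It assumes the kernel spreads new flows roughly in proportion to the configured weights; that is my assumption about the behaviour, not something measured here. `weight_shares` is a helper name invented for this example.

```shell
#!/bin/sh
# Sketch: expected share of new flows per nexthop, assuming the kernel
# spreads flows roughly in proportion to the configured weights.
weight_shares() {
    # Arguments are "dev:weight" pairs, e.g. eth0:8 usb0:6 ppp0:3
    total=0
    for pair in "$@"; do
        total=$((total + ${pair#*:}))
    done
    for pair in "$@"; do
        dev=${pair%%:*}
        weight=${pair#*:}
        # Integer percentage, rounded down
        echo "$dev $((weight * 100 / total))%"
    done
}

weight_shares eth0:8 usb0:6 ppp0:3
```

With the weights from the example this prints 47% for eth0, 35% for usb0 and 17% for ppp0 (rounded down). So the weights express a preference between the links rather than a strict primary/backup order.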
 
I followed this article to implement multipath routing for my own environment, and voila, it worked like a charm! Other than this I had to perform a few tweaks, such as DNS name server resolution and NATing my secondary internet sources, which I have detailed in the solution sections below.
 
II. Linux Kernel Version Problems past 3.6:
 
The above solution worked because the Linux kernel cached the route chosen when a packet first took a particular route through a given interface. All subsequent packets in that connection followed the same route through the same interface, and hence we had a reliable connection with load balancing/network failover. This is the flow-based load balancing technique.
 
Around version 3.6 of the Linux kernel, the routing cache was removed, since it was considered vulnerable to Denial of Service (DoS) attacks. I’m now using Ubuntu 14.04, which ships kernel version 3.13, with the routing cache removed, and the above implementation mysteriously stopped working after the 14.04 upgrade. With the routing cache gone, packets belonging to the same session no longer stick to the same route/interface; each one picks any available interface in quasi-random fashion. As a result, you may never be able to establish a TCP connection at all: the initial SYN packet may leave via eth0, but the subsequent packets of the handshake may pick usb0, so the TCP 3-way handshake may never complete.
 
Although kernel version 4.4 (Ubuntu 16.04) restored sticky route selection, using a more robust hash over source/destination addresses instead of the old flow-based cache, I don’t want to depend on kernel features that can change in future and break my implementation. So I decided to mimic the “route caching” with my own implementation using “TCP connection tracking”.
 
 
III. Reliable Solution: MultiHoming, Multipath Routing with TCP Connection Tracking:
 
I owe thanks to Will Cooke for his article and the initial scripts shared on his blog showing how to realize this method. I’ve modified his scripts for my environment and introduced a few other scripts and changes to complete this implementation.
 
Before diving into the solution, refer to the figure below, which depicts our environment.
 
[Figure: RPi2-Network Failover]
 
As you can see I’ve three networks.
 
a. Home LAN (Subnet: 192.168.1.0/24) – Connected to shared Network Switch
Source of primary internet.
The Home LAN also has a cabled ADSL Broadband Modem (192.168.1.1) attached. It has its own WAN port running a subnet range of 117.197.X.X. LAN addresses are SNATed by the ADSL router itself for outgoing packets to its WAN port.

b. 4G USB Ethernet WAN (Subnet: 192.168.42.0/X) – Connected to RPi2
Source of secondary internet. Priority: 2.
LAN addresses are SNATed while routing through this interface, as it has a different address space.

c. 3G USB PPP WAN (Subnet: 100.84.236.0/X) – Connected to RPi2
Source of secondary internet. Priority: 3.
LAN addresses are SNATed while routing through this interface, as it has a different address space.

The RaspberryPi2’s sole purpose is to act as the Edge Router, DHCP Server and DNS Server for the Home Network, so I’ve disabled the DHCP server on every other system and on the ADSL router. Now we need to implement a few scripts to make it work as a network failover/load balancer. By default the Pi2 routes all internet traffic to the primary internet source. Once it is down, traffic is routed to the secondary source (4G modem); if that is down too, traffic is routed to the other secondary source (3G modem).
 
If the primary internet comes back online, new traffic is re-routed through it, i.e. internet traffic should always pick the highest-priority source that is online. Switching back to the primary once it is online again was a major requirement.
 
 
 
RPi2-Configuration Steps
 
Solution Step1 : Enable Router Mode-
 
We have to enable IP forwarding and ICMP redirects in /etc/sysctl.conf. See the sysctl.conf file in the attached ZIP file.
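For readers without the ZIP file at hand, the settings might look roughly like the sketch below. `net.ipv4.ip_forward` is the standard key for IPv4 forwarding. The two redirect lines are only my assumption of what "enabling ICMP redirects" means here, so check them against the attached sysctl.conf. `router_sysctl_lines` is a helper name made up for this example.

```shell
#!/bin/sh
# Sketch: print the sysctl lines for router mode.
# The redirect settings are an assumption, not copied from the attached file.
router_sysctl_lines() {
    cat <<'EOF'
# Forward IPv4 packets between interfaces
net.ipv4.ip_forward = 1
# Accept and send ICMP redirects (assumed values)
net.ipv4.conf.all.accept_redirects = 1
net.ipv4.conf.all.send_redirects = 1
EOF
}

router_sysctl_lines
```

To apply them, the output could be appended with `router_sysctl_lines | sudo tee -a /etc/sysctl.conf` and loaded with `sudo sysctl -p`.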
 
Solution Step2 : Setup DHCP, DNS Server and Policies-
 
We’re using dnsmasq as the DHCP and DNS server. We have to tell the DHCP clients to use the RPi2 as the DNS/name server for address resolution, using the --server and --dhcp-option=6 options. Specify the RPi2’s LAN address (192.168.1.5) for these settings. See the ‘dnsmasq’ file in the attached ZIP file.
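As a hedged sketch of the DHCP side, the helper below prints dnsmasq configuration-file lines that hand out the RPi2 as both gateway (DHCP option 3) and DNS server (DHCP option 6). The address pool 192.168.1.50 to 192.168.1.150, the 12h lease time and the use of option 3 are assumptions for illustration; the attached ‘dnsmasq’ file may pass the same settings as command-line options instead.

```shell
#!/bin/sh
# Sketch: dnsmasq settings that advertise the Pi as router and DNS server.
# The DHCP range and lease time are assumed values, not taken from the attached file.
dnsmasq_lines() {
    # $1 = Pi LAN address, $2 = first pool address, $3 = last pool address
    cat <<EOF
interface=eth0
dhcp-range=$2,$3,12h
dhcp-option=3,$1
dhcp-option=6,$1
EOF
}

dnsmasq_lines 192.168.1.5 192.168.1.50 192.168.1.150
```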
 
Next we have to list the upstream DNS servers in ‘/etc/resolvconf/resolv.conf.d/head’, i.e. the DNS IP addresses of the ADSL broadband modem, the 4G USB modem and the 3G USB modem. We also have to set ‘options timeout:5 attempts:2’ in the same file, so that the RPi2 falls back to the secondary DNS servers when the primary interface is down and the primary DNS server is unreachable. This is the network failover setting for DNS: without it we cannot resolve names once the primary is down, even though a secondary network is running, because the primary DNS server is not reachable from the secondary internet source, which only knows its own DNS server. See the ‘head’ and ‘tail’ files in the attached ZIP file.
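A minimal sketch of what the ‘head’ file could contain is shown below. It assumes the ADSL and 4G modems answer DNS queries on their gateway addresses (192.168.1.1 and 192.168.42.129); the 3G provider’s DNS address is not given in this post, so it is left out and would be passed as a third argument. `resolv_head` is a helper name invented for this example.

```shell
#!/bin/sh
# Sketch: generate the content of /etc/resolvconf/resolv.conf.d/head.
# The nameserver addresses are assumptions; pass your own upstream resolvers.
resolv_head() {
    # Short timeout and few attempts so a dead resolver is skipped quickly
    echo "options timeout:5 attempts:2"
    for ns in "$@"; do
        echo "nameserver $ns"
    done
}

resolv_head 192.168.1.1 192.168.42.129
```

Keep in mind that the glibc resolver only uses the first three nameserver entries, so one entry per internet source is about the limit.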
 
 
Solution Step3 : Setup Routing Tables, NAT rules, Multipath Routes and Connection Marking/Tracking-
 
The steps and concepts are explained in much more detail here, so I’m only providing hints on certain constructs based on my environment.
 
Refer to “iptables.save” in the attached zip file, which does the following using PreRouting, Filter and PostRouting rules:
a. Mark all new packets as 1, 2 or 3, based on the incoming network interface, to track them during transit

b. Enable Source NAT on 4G USB (usb0) and 3G USB (ppp0)

c. Enable IP Forwarding between LAN and WAN interfaces
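The three points above can be sketched with plain iptables commands as follows. This is not a copy of iptables.save: it is a common variant that records, per connection, which uplink the first packet left through (1 = eth0, 2 = usb0, 3 = ppp0) and restores that mark on later packets. The attached file may mark packets differently. The `run` helper and the function names are my own; by default the commands are only printed.

```shell
#!/bin/sh
# Sketch of connection marking, SNAT and forwarding.
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 executes them (needs root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

setup_marking() {
    # Copy the connection's saved mark (if any) onto each arriving packet
    run iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
    # Remember which uplink a NEW connection left through: 1=eth0 2=usb0 3=ppp0
    mark=1
    for dev in eth0 usb0 ppp0; do
        run iptables -t mangle -A POSTROUTING -o "$dev" -m conntrack --ctstate NEW -j CONNMARK --set-mark "$mark"
        mark=$((mark + 1))
    done
}

setup_nat_and_forwarding() {
    for dev in usb0 ppp0; do
        # Source NAT for the two modems, which sit in other address spaces
        run iptables -t nat -A POSTROUTING -o "$dev" -j MASQUERADE
        # Let LAN clients out through the modem
        run iptables -A FORWARD -i eth0 -o "$dev" -j ACCEPT
    done
    # Let replies of known connections back in
    run iptables -A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
}

setup_marking
setup_nat_and_forwarding
```

No SNAT rule is added for eth0, since the ADSL router already does the NAT for the LAN range itself.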
Refer to “route.sh” in the attached zip file, which does the following:
a. Create separate routing tables for LAN, 4G USB and 3G USB, and populate them with their gateways

b. Create a dedicated routing table (loadbal) with multipath routes to the LAN and WANs, with corresponding weights for network failover/load balancing

c. Add a rule to pick the ‘loadbal’ routing table for all unmarked/untracked (new) packets, so that they are load balanced or pick a failover route

d. Add a rule to pick the corresponding routing table for all marked/tracked packets, as per the mark number

e. Allocate each network interface’s send/receive queues to separate CPU cores for better performance
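Points a to d above could look roughly like the following sketch. It is my own reconstruction, not the attached route.sh: the numeric table ids (1, 2, 3 and 100 for ‘loadbal’), the rule priorities and the `suppress_prefixlength` rule that keeps directly connected subnets on the main table are all assumptions. The gateway addresses are the ones from the multipath example earlier in this post. Commands are only printed unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch of per-uplink tables, a multipath table and the policy rules.
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 executes them (needs root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

GW_ETH0=${GW_ETH0:-192.168.1.1}
GW_USB0=${GW_USB0:-192.168.42.129}
GW_PPP0=${GW_PPP0:-112.111.12.13}
LOADBAL=100   # numeric id standing in for the 'loadbal' table (assumed)

setup_tables() {
    # One table per uplink; the table id equals the connection mark
    run ip route replace default via "$GW_ETH0" dev eth0 table 1
    run ip route replace default via "$GW_USB0" dev usb0 table 2
    run ip route replace default via "$GW_PPP0" dev ppp0 table 3
    # Weighted multipath default route for new connections
    run ip route replace default table "$LOADBAL" \
        nexthop via "$GW_ETH0" dev eth0 weight 8 \
        nexthop via "$GW_USB0" dev usb0 weight 6 \
        nexthop via "$GW_PPP0" dev ppp0 weight 3
}

setup_rules() {
    # Connected subnets keep using the main table; its default route is ignored
    run ip rule add priority 50 lookup main suppress_prefixlength 0
    # Marked (tracked) packets go to the table of their uplink
    for mark in 1 2 3; do
        run ip rule add priority $((100 + mark)) fwmark "$mark" lookup "$mark"
    done
    # Everything else is new traffic: use the multipath table
    run ip rule add priority 200 lookup "$LOADBAL"
}

setup_tables
setup_rules
```

Using `ip route replace` instead of `ip route add` keeps the script safe to re-run, which matters if the routes are rebuilt periodically.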
Refer to “RunJob_RouteConfig.sh” in the attached zip file, which does the following:
a. Check the route configuration every 15 seconds and rebuild all routes if any glitch is found

It should be run on startup using the command $ sudo bash RunJob_RouteConfig.sh
Run it manually or place it in /etc/rc.local
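The idea of such a watchdog can be sketched as below. This is an illustration, not the attached script: the table id 100, the check for a nexthop per device and the `rebuild_routes` hook that re-runs route.sh from the current directory are all assumptions.

```shell
#!/bin/sh
# Sketch of a periodic route check. Table id and rebuild hook are assumptions.
LOADBAL=100
CHECK_INTERVAL=15

loadbal_is_complete() {
    # $1 = text of "ip route show table <id>", remaining args = expected devices
    routes="$1"
    shift
    for dev in "$@"; do
        printf '%s\n' "$routes" | grep -Eq "dev $dev( |\$)" || return 1
    done
    return 0
}

rebuild_routes() {
    # Hypothetical hook: re-run the routing setup script
    sh ./route.sh
}

watch_routes() {
    # Runs forever: compare the live table with the expected uplinks
    while true; do
        current=$(ip route show table "$LOADBAL" 2>/dev/null)
        loadbal_is_complete "$current" eth0 usb0 ppp0 || rebuild_routes
        sleep "$CHECK_INTERVAL"
    done
}
```

Calling `watch_routes` at the end of the script starts the loop; it never returns, so it has to be run in the background if anything else should start after it.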

Solution Step4 : Redial 3G USB modem periodically, if it goes down-
 
Refer to “RunJob_WvDialPPP.sh” in the attached zip file, which redials the 3G modem whenever the PPP connection drops.
It should be run on startup using the command $ sudo bash RunJob_WvDialPPP.sh
Run it manually or place it in /etc/rc.local
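A redial loop of this kind might look like the sketch below. It is not the attached script: it assumes the link is "up" whenever the ppp0 interface exists, that wvdial is configured through its default /etc/wvdial.conf section, and that 30 seconds is an acceptable pause between checks.

```shell
#!/bin/sh
# Sketch of a redial loop for the 3G modem. Interval and up-check are assumptions.
SYS_NET_DIR=${SYS_NET_DIR:-/sys/class/net}
REDIAL_INTERVAL=${REDIAL_INTERVAL:-30}   # assumed pause in seconds

iface_exists() {
    # A network interface is present when its sysfs directory exists
    [ -d "$SYS_NET_DIR/$1" ]
}

redial_loop() {
    while true; do
        if ! iface_exists ppp0; then
            # wvdial stays in the foreground while the link is up and
            # returns when it drops, so the loop then dials again
            wvdial
        fi
        sleep "$REDIAL_INTERVAL"
    done
}
```

Calling `redial_loop` at the end of the script starts the loop; like the route watchdog, it runs indefinitely.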
Solution Deployment: Configure the scripts at startup -
 
Now, when it comes to firing up the Pi2 as a failover/load-balancing router, just plug in the LAN cable, the 4G USB Ethernet modem and the 3G USB PPP modem, and switch it on.
 
Run the two scripts below on startup (either manually after boot, or from /etc/rc.local so that they start automatically). They run indefinitely, periodically validating and refreshing the routing configuration and redialing the 3G connection if it gets disconnected.
 
$ sudo bash RunJob_RouteConfig.sh
$ sudo bash RunJob_WvDialPPP.sh
 
 
Congratulations, you’ve now transformed your RaspberryPi2 into a highly fault-tolerant network failover/load-balancing router!
 
A few screenshots from my environment: 
 
[Screenshot: RPi2 Running]
