Lawyer Broflovski’s Principle, or DIY: Cloud Load Balancing

Representing the side of “Everyone” is Gerald Broflovski, the lawyer from South Park who plans to make quite a commission. Representing the side of “Everyone Else” is Gerald Broflovski. So whatever the outcome, things look very bright for Kyle’s dad.
South Park, Episode 306, “Sexual Harassment Panda”

Smug Alert!

If you have already launched projects with multiple frontends, you probably know how to distribute traffic between them. Hosting providers like Amazon AWS or Google Cloud offer plenty of tools. But what if you are still not happy with them? Or what if you prefer working with bare metal servers rather than clouds? At Getintent, we love bare metal and can talk about its advantages for hours. Here are some highlights:

  • No traffic rip-off. Pay-per-gigabyte is the most popular pricing model among cloud providers, spoilt as they are by startup demand. By contrast, hardware providers compete fiercely for the tight-fisted customer, and their common practice is to throw in large traffic allowances for free. There are also clouds with pre-packaged traffic, such as Linode, but the power of their instances will disappoint you: to build a custom service you may need so many instances that you end up paying the cloud provider for traffic all the same.
  • You can buy servers predictably. The AWS Service Limits description alone runs to more than 30 printed pages. Each time you bump into a limit, you have to explain your use case to Amazon, which then decides whether to raise it. Requests from small customers are processed within three days, and often not resolved for a full week. Meanwhile, reliable bare metal providers arrange everything the same or the next day; their only concerns are how law-abiding you are and how promptly you pay the bills.
  • Hardware life makes you use proven open source software rather than proprietary solutions with vague features that cannot be changed when inappropriate or fixed when broken.
  • Some things work properly only on carefully chosen equipment. Take Aerospike DBMS, for example. The number of problems with it is directly proportional to the age of the SSDs. That is why we require not only bare metal SSDs, but brand new ones. Moving to NVMe interfaces where possible reduced the load average fivefold. Those using Aerospike in the cloud can only smile sadly and go on fixing yet another cluster problem.

A lot of dedicated providers have implemented virtual networking, so expanding a server farm with them is no problem. A notable exception is Hetzner, where you still need to reserve units and allocate space. Let us hope it will catch up with virtual networking soon.

Where it even makes sense to discuss balancing with the provider, they will most likely offer a dedicated hardware load balancer. Such hardware is pretty rare, therefore costs a fortune and requires specific expertise to configure.

And then it is time to think about …

Chef, what’s for lunch today?

Your farms must already be using some kind of automation, which means making changes at scale is cheap. On the HTTP frontend you are most likely running nginx on the same servers that host the application. This makes it easy to build a cross-balancing scheme in which each server is both a balancer and a host for the application that answers the request.

First, configure the server pool in gdnsd:

; Use an external command to check whether the server
; with a specific IP address is alive
service_types => {
   adserver => {
      plugin => extmon,
      cmd => ["/usr/local/bin/check-adserver-node-for-gdnsd", "%%IPADDR%%"],
      down_thresh => 4,  ; Failed four checks in a row?
                         ; It’s dead.
      ok_thresh => 1,    ; Responded at least once, it’s alive.
      interval => 20,    ; Check every 20 seconds.
      timeout => 5,      ; Consider a failure to respond 
                         ; within 5 seconds as no-response.
   } ; adserver
} ; service_types
plugins => {
   multifo => {
      ; If less than one third of the hosts are alive, 
      ; DNS will return the whole list of the servers 
      ; as there must be some problem, e.g. an application overload.
      ; Disabling the broken servers can only make things worse.  
      up_thresh => 0.3,
      adserver-eu => {
         service_types => adserver,
         www1-de => 192.168.93.1,
         www2-de => 192.168.93.2,
         www3-de => 192.168.93.3,
         www4-de => 192.168.93.4,
         www5-de => 192.168.93.5,
      } ; adserver-eu
   } ; multifo
} ; plugins

If less than one third of the hosts are alive, DNS will return the whole list of servers: something must be seriously wrong, such as an application overload, and disabling the broken servers would only make things worse.
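The extmon plugin runs an external command and looks only at its exit status: 0 means the node is up, anything else means it is down. A minimal sketch of what /usr/local/bin/check-adserver-node-for-gdnsd might do, written in Python; the /status endpoint and port 8080 are assumptions for illustration, not from our setup:

```python
import sys
import urllib.error
import urllib.request

def node_is_alive(ip, port=8080, timeout=4):
    """True if the application on ip:port answers its health endpoint.
    The timeout stays below the 5-second limit set in the gdnsd config."""
    try:
        with urllib.request.urlopen(f"http://{ip}:{port}/status",
                                    timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if __name__ == "__main__" and len(sys.argv) > 1:
    # gdnsd substitutes the node's address for %%IPADDR%% in argv[1]
    sys.exit(0 if node_is_alive(sys.argv[1]) else 1)
```

Any check works here, as long as it answers quickly and exits non-zero on failure.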

Use this configuration in the DNS zone file:

eu.adserver.sample. 60 DYNA multifo!adserver-eu

Then create a pool of nginx servers and deploy the configuration across the cluster:

upstream adserver-eu {
    server localhost:8080 max_fails=5  fail_timeout=5s;
    server www1-de.adserver.sample:8080 max_fails=5 fail_timeout=5s;
    server www2-de.adserver.sample:8080 max_fails=5 fail_timeout=5s;
    server www3-de.adserver.sample:8080 max_fails=5 fail_timeout=5s;
    server www4-de.adserver.sample:8080 max_fails=5 fail_timeout=5s;
    server www5-de.adserver.sample:8080 max_fails=5 fail_timeout=5s;
    keepalive 1000;
}
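One detail worth noting: the keepalive directive in the upstream only takes effect if the proxying side speaks HTTP/1.1 and clears the Connection header. A sketch of the corresponding server block (the listen port and the retry policy are illustrative, not from our configuration):

```nginx
server {
    listen 80;
    server_name eu.adserver.sample;

    location / {
        proxy_pass http://adserver-eu;
        # required for keepalive connections to the upstream:
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        # hand the request to the next server on errors and timeouts:
        proxy_next_upstream error timeout http_502 http_503;
    }
}
```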

If there are more than 25 servers in the rotation, split them into groups of up to 25 under separate names, so that the response still fits into a DNS packet, and have gdnsd dynamically select among those names with CNAME records (DYNC) instead of A records.

The list of servers can be filled in automatically based on data from gdnsd or your automation system.

At Getintent, we use Puppet, and a cron script keeps the lists of the several server types registered on the puppetmaster up to date. The localhost line keeps us safe in case the list mistakenly comes up empty: as long as each nginx serves at least itself, no crash should occur.
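We won't reproduce our Puppet code here, but the idea behind the cron script can be sketched in a few lines of Python; the function name and host names are illustrative:

```python
# Render the nginx upstream block from a host list pulled from gdnsd
# or the configuration-management database.

def render_upstream(name, hosts, port=8080):
    """Render an nginx upstream; localhost always comes first, so the
    balancer can serve at least itself if the list comes up empty."""
    lines = [f"upstream {name} {{"]
    lines.append(f"    server localhost:{port} max_fails=5 fail_timeout=5s;")
    for host in sorted(hosts):
        lines.append(f"    server {host}:{port} max_fails=5 fail_timeout=5s;")
    lines.append("    keepalive 1000;")
    lines.append("}")
    return "\n".join(lines)

print(render_upstream("adserver-eu", ["www1-de.adserver.sample"]))
```

The real script would then diff the result against the deployed file and reload nginx only when something changed.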

Bring it on, boys, bring it on!

So, here is what we’ve got:

  • Excellent fault tolerance. Risks are limited to network and configuration issues. A single point of failure in connectivity can be removed by distributing the application across several regions with independent power and network. As for configuration errors: yes, you can break everything with one bad nginx or gdnsd configuration change, but luckily there are dozens of ways to mitigate this, for example by enforcing a delay between server updates and updating each region independently.
  • No problems like a pool warm-up or lack of server capacity. Surely your web application will die from the load first.
  • Even traffic usage and network load distribution. Need to take 5 Gbit/s on 10 servers with gigabit NICs? No problem! Just don’t forget to route the traffic between the balancers and the applications over the internal network, and make sure the provider has not sold you a switch with 1 Gbit/s of total uplink to the Internet for those 10 servers.
  • Smooth deploys. When we used DNS balancing alone, our SSP partners would get pissed off each time our servers dropped out of the pool one by one. Furthermore, during initialization our Java applications load a lot of data into memory, which takes time and made the situation even messier. Cross-balancing has made the process almost unnoticeable to the other side.
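The network arithmetic in the traffic bullet above is easy to sanity-check with the article's own figures:

```python
# With round-robin DNS, inbound traffic spreads evenly across the balancers.
TOTAL_GBPS = 5.0   # inbound traffic to take
SERVERS = 10       # balancers, each with a gigabit NIC
NIC_GBPS = 1.0

per_server = TOTAL_GBPS / SERVERS
print(f"external load per balancer: {per_server:.1f} Gbit/s")  # 0.5
assert per_server < NIC_GBPS  # plenty of headroom on each NIC
```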

Mr Twig gets offended

There is no such thing as a free lunch, so you do have to pay for free balancing, specifically with sticky sessions. However, the free version of nginx offers IP-hash balancing; modern NoSQL databases like Aerospike can serve as fast, reliable storage for shared sessions; and you can always switch to HAProxy, which can route users across multiple balancer tiers by a single rule. So when we need sticky sessions, these tools will be at hand.
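For the HAProxy route, stickiness is a few lines of configuration. A sketch of cookie-based sticky sessions; the server names and addresses reuse the gdnsd example above, and the SRV cookie name is illustrative:

```haproxy
backend adserver-eu
    balance roundrobin
    # insert a cookie naming the chosen server so the client sticks to it
    cookie SRV insert indirect nocache
    server www1-de 192.168.93.1:8080 check cookie www1
    server www2-de 192.168.93.2:8080 check cookie www2
```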

Numbers

Now each of our 20-core servers handles up to 15,000 requests per second, which is close to the upper limit for our Java application. Across all clusters we balance and process more than half a million requests per second at peak: 5 Gbit/s of incoming and 4 Gbit/s of outgoing traffic, excluding CDN. Yes, a peculiarity of DSP is that incoming traffic exceeds outgoing, since there are more bids than relevant responses to them.
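These figures imply a minimum balancer count; a back-of-the-envelope estimate (the resulting server count is our arithmetic, not a number we have stated):

```python
# Minimum servers needed to carry the peak load at the per-server limit.
PEAK_RPS_TOTAL = 500_000
PEAK_RPS_PER_SERVER = 15_000

servers_needed = -(-PEAK_RPS_TOTAL // PEAK_RPS_PER_SERVER)  # ceiling division
print(servers_needed)  # at least 34 servers at full load
```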