Build your own high availability home data center, part 3!


Hey there!

Welcome to my blog! This is the third post in a series I’ve been doing, so if you haven’t had the chance to check out the other parts, I started the series here and the second part is here. In the last part we configured the Apache web servers, SSH, FTP and the database server with all its components. In this tutorial we’ll configure the load balancer and then set up the web app that serves our site, so play some music, grab something to drink and follow along! :) Hope you like this one!

Intro to load balancers

First, let’s tackle what a load balancer is. A load balancer is a device that sits in a network and efficiently distributes incoming traffic across different servers, or farms of them. The load balancer helps the servers maximize their speed and prevents any one server from being overworked. For more info you can read the Nginx guide here. Based on that, we now know that a load balancer will take the incoming requests we make to it and pass them on to the web servers; the web servers fetch the data we requested and send it back to us. This process of taking incoming requests from the client, gathering the information and responding on the servers’ behalf is called reverse proxying. The reverse proxy is usually the same server that performs the load balancing, doing all these different processes without the client noticing. The process works a little like this:

  1. A user sends a request for /index.html to our load balancer’s IP address.
  2. The reverse proxy receives the request, meaning the end user never talks to the web servers directly, which allows for a more secure environment.
  3. The reverse proxy load balances the request to the next available web server based on the load balancing algorithm it uses (more on that later).
  4. Once the web server gets the request, it queries the data, prepares it and sends it back to the load balancer.
  5. The proxy server (or load balancer) gets the response and sends it back to the client, fulfilling the request, all without the client noticing the work the proxy server did behind the scenes.
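To make the flow above concrete, here’s a minimal sketch of what a reverse proxy declaration looks like in Nginx config. The backend address is a placeholder, not a real IP from our lab; we’ll build the actual configuration later in this post:

```nginx
server {
    listen 80;

    location / {
        # Forward the client's request to a backend web server
        # and relay the response back, transparently to the client.
        proxy_pass http://<backend-ip>;   # placeholder: one of our web servers

        # Pass along the original Host header and client IP so the
        # backend can log who actually made the request.
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```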

With the process explained, let’s now look at those load balancing algorithms I mentioned a few moments ago. Choosing between the different algorithms will depend on what our site is doing. For example, say we are a bank with two web servers: when we log into the bank’s website (as a user), the session is opened on one web server and we’re able to see our money $$$. Now, let’s say the load balancing server uses a round-robin algorithm. This means the load balancer will alternate requests between the two web servers, one to each at a time. If web server A got the first request, then B will receive the next one, then back to A, and so on… The problem is that our session needs to be persistent: with round robin, our next request will go to a web server that doesn’t know us, since we only established a session with the server that took our first request. For this specific case we need the connection to be persistent and always land on the same web server. That means we’ll use an IP hashing algorithm, which takes the user’s IP address and sends every request from that IP to the same server every time, giving us the persistence we need.

Lastly, we also have a **least connected algorithm** that does what the name says: the load balancer takes the requests from the client and sends them to the web server that currently has the fewest active connections. As a summary: round robin rotates requests evenly across servers, IP hash pins each client to the same server, and least connected favors the server with the fewest open connections.
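To get a feel for how two of these algorithms behave, here’s a tiny shell sketch. This is a toy illustration only, not how Nginx implements them internally; the server names and the use of cksum as a hash are my own stand-ins:

```shell
#!/bin/sh
# Toy sketch of two load balancing algorithms.

pick_round_robin() {
  # $1 = request number (0-based); remaining args = server list.
  # The i-th request goes to server (i mod number_of_servers).
  n=$1; shift
  idx=$(( n % $# + 1 ))
  eval "echo \${$idx}"
}

pick_ip_hash() {
  # $1 = client IP; remaining args = server list.
  # Hash the IP, then map the hash to a server index, so the
  # same client IP always lands on the same server.
  ip=$1; shift
  count=$#
  hash=$(printf '%s' "$ip" | cksum | cut -d' ' -f1)
  idx=$(( hash % count + 1 ))
  eval "echo \${$idx}"
}

# Round robin alternates: request 0 -> web1, request 1 -> web2, then back to web1.
pick_round_robin 0 web1 web2
pick_round_robin 1 web1 web2
pick_round_robin 2 web1 web2

# IP hash is sticky: the same client IP picks the same server every time.
pick_ip_hash 10.0.0.5 web1 web2
pick_ip_hash 10.0.0.5 web1 web2
```

Notice how round robin cycles regardless of who is asking, while IP hash gives the session persistence we talked about for the bank example.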

I’d also like to go over the types of load balancers we can have. Just as we have type 1 and type 2 hypervisors, we have hardware and software load balancers. As you can imagine, a hardware load balancer is a server with built-in software designed specifically to move requests at the hardware level, normally through ASICs built for that purpose, but these are usually expensive as they are custom made; F5 load balancers are an example. On the other hand, we’ll be using a much cheaper and more flexible alternative: a software load balancer. The latter has software installed on top of the operating system (in our case Ubuntu) and does all its functions in software. We’ll be configuring Nginx, an open source software that is widely used in the industry. Let’s now proceed to the fun part, installing the software and configuring the load balancer!

Configuring Nginx

To be able to configure the load balancer we’ll just need to add SSH to the server so we don’t have to log in to it through the console. Take a look at part 1 of this tutorial series if you don’t know how to install SSH.

Next, we’ll proceed to install and configure Nginx. You’ll notice how easy it is to set up Nginx as a load balancer for high availability and as a reverse proxy. The steps are as follows:

sudo apt-get install nginx
sudo nano /etc/nginx/sites-available/default

Inside the default site file, add an upstream block listing our two Apache web servers (replace the placeholders with the IP addresses of your own servers), and point the location blocks at it:

upstream backend_servers {
    server <ip-of-apache1>;
    server <ip-of-apache2>;
}

location / {
    proxy_pass http://backend_servers;
}

location /php.php {
    proxy_pass http://backend_servers;
}

Then restart Nginx so the changes take effect:

sudo service nginx restart

The final configuration looks something like this:
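Roughly, putting the pieces together in /etc/nginx/sites-available/default, with placeholder IP addresses standing in for the two Apache servers:

```nginx
# Pool of backend web servers; swap the placeholder IPs for your own.
upstream backend_servers {
    server 192.168.0.101;   # placeholder: IP of Apache1
    server 192.168.0.102;   # placeholder: IP of Apache2
}

server {
    listen 80;

    # Everything, including the site root and the PHP page,
    # gets proxied to the backend pool.
    location / {
        proxy_pass http://backend_servers;
    }

    location /php.php {
        proxy_pass http://backend_servers;
    }
}
```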

And we are done! Go to your browser, search for the IP address of your load balancer server and you should see the Apache1 and Apache2 hostnames changing each time you reload the website. This is our load balancer performing a round robin on the traffic!
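By the way, if your app needed the session persistence we discussed earlier, Nginx lets you switch algorithms with a single directive at the top of the upstream block: ip_hash for IP hashing, or least_conn for least connected. A sketch, again with placeholder IPs:

```nginx
upstream backend_servers {
    ip_hash;                # pin each client IP to the same backend
    server 192.168.0.101;   # placeholder: IP of Apache1
    server 192.168.0.102;   # placeholder: IP of Apache2
}
```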


Final part, the code!

The last step we need to do is to setup our website. I have coded it for you, you’ll just need to follow the next steps to add the website to the Apache servers and you should be done! The code is here, in a GitHub repository. There are 3 important files there:

Please download these 3 files from the repository; you can either copy their content and save it (making sure you keep the file names and extensions exactly as they are) or clone the repo to your PC. The only change you need to make is to open the db_config.php file in your favorite text editor and replace the line

$servername = "server_ip";

with the IP address of your database server; this tells PHP where to look for our data.

Finally, all you need to do is upload the files to the web servers with FileZilla. You can refer to part 2 of this series to see how to do this. Remember to change the permissions of the files! Once that is done, browse to the IP address of the Nginx server; you should now be able to see traffic being load balanced to each web server and the CRUD app working!


And that is it! We have successfully configured a reverse proxy with Nginx that takes incoming HTTP requests and load balances them across two web servers, and those web servers get their data from a fourth server, our database.



Of course, the load balancer itself is now a single point of failure, so my next tutorial will be about scaling this lab and making it even more flexible and less prone to downtime. I’ll be configuring a virtual IP shared between two reverse proxies, one master and one slave. If at any time one server crashed, the second would take over the IP and continue routing traffic without issues. Coming soon!

I hope you liked this blog post, found it interesting and learned something! For comments, questions or suggestions feel free to send them through the contact me box at the bottom of the screen here. Thanks a lot for reading!

- Gabriel