Scaling our app

So far, our application has been running on a single Pod. What might work well for our demo application won’t be sufficient for the real world out there, especially for websites with high and/or spiky traffic loads. This is where scaling comes into play.

Scaling means running multiple Pods with our application, combined with load balancing to distribute the traffic across all of them. Luckily, Services help us here: they have an integrated load balancer and continuously monitor all running Pods to ensure that traffic is only sent to available, healthy Pods.
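
To see which Pods a Service currently routes traffic to, you can inspect its endpoints. Assuming the Service from the earlier lab is also called hello-world (adjust the name if yours differs), run:

kubectl get services
kubectl get endpoints hello-world

The second command lists the IP addresses of all Pods the Service currently considers healthy targets.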

Scale out

In order to scale our application, we first want to check which ReplicaSet was automatically created by our deployment from Lab 2:

kubectl get rs
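
The output should look similar to this (the ReplicaSet name ends in a generated hash, so yours will differ):

NAME                     DESIRED   CURRENT   READY   AGE
hello-world-5d7b8f9c6d   1         1         1       10m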

By default, a Deployment creates only one Pod, so DESIRED shows 1. As our Pod is up and running, CURRENT shows 1 as well.

Let’s scale out to 4 replicas:

kubectl scale deployments/hello-world --replicas=4

Let’s check what has happened to our deployments and pods:

kubectl get deployments
kubectl get pods -o wide
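
The Deployment should now report four ready replicas, similar to this (Pod names and ages will differ in your cluster):

NAME          READY   UP-TO-DATE   AVAILABLE   AGE
hello-world   4/4     4            4           15m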

Our scaling worked and we now have 4 Pods of our application available - you can also check this in the Docker Desktop app.

Scale down

We can also scale our app down, e.g. to run only 2 Pods:

kubectl scale deployments/hello-world --replicas=2

If you run the command below again, you will see two of the Pods change their status to “Terminating” before vanishing completely:

kubectl get pods
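
If you want to follow the transition live, kubectl can also stream the status changes until you stop it with Ctrl+C:

kubectl get pods --watch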

Test the load balancing

So we have our two Pods running with our app, but does the load balancing work, or is all traffic sent to just one Pod? Let’s find out!

We want to simulate traffic spikes by sending HTTP requests to our app. Before we can start, we need to install hey, a CLI to send HTTP requests.
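
How you install hey depends on your system; two common options - neither specific to this lab - are Homebrew on macOS or a Go toolchain:

brew install hey

or, with Go installed:

go install github.com/rakyll/hey@latest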

Now that we have installed hey, let’s send requests by running

hey -n 1000 http://localhost:3000/

in the terminal.

You should receive an output similar to this:

Summary:
  Total:	0.1128 secs
  Slowest:	0.0354 secs
  Fastest:	0.0008 secs
  Average:	0.0055 secs
  Requests/sec:	8861.5919

  Total data:	393000 bytes
  Size/request:	393 bytes

Response time histogram:
  0.001 [1]	|
  0.004 [395]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.008 [523]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.011 [35]	|■■■
  0.015 [18]	|■
  0.018 [11]	|■
  0.022 [3]	|
  0.025 [3]	|
  0.028 [4]	|
  0.032 [4]	|
  0.035 [3]	|

Latency distribution:
  10% in 0.0033 secs
  25% in 0.0038 secs
  50% in 0.0046 secs
  75% in 0.0060 secs
  90% in 0.0074 secs
  95% in 0.0101 secs
  99% in 0.0256 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0002 secs, 0.0008 secs, 0.0354 secs
  DNS-lookup:	0.0001 secs, 0.0000 secs, 0.0019 secs
  req write:	0.0000 secs, 0.0000 secs, 0.0002 secs
  resp wait:	0.0053 secs, 0.0008 secs, 0.0320 secs
  resp read:	0.0000 secs, 0.0000 secs, 0.0003 secs

Status code distribution:
  [200]	1000 responses

So we sent 1000 requests and always received HTTP status code 200.
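
The -n flag controls the total number of requests hey sends. To generate a heavier spike, you can also raise the number of concurrent workers with -c, for example:

hey -n 2000 -c 100 http://localhost:3000/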

Run the hey command from above several times in your terminal and then visit the Docker Desktop app.

Now we can investigate what happened within the containers of our app. Click on each of the containers and select Stats in the details view. You should see that both containers had peaks in CPU usage, which confirms that our load balancing works.
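
If you prefer the command line and your cluster has the metrics-server add-on installed (it is not part of this lab), you can get a similar picture of per-Pod resource usage with:

kubectl top pods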