So far, our application is running on a single Pod. What works well for our demo application won’t be sufficient in the real world, especially for websites with high and/or spiky traffic loads. This is where scaling comes into play.
Scaling means running multiple Pods with our application and load balancing the traffic across them. Luckily, Services help us here: they have an integrated load balancer and continuously monitor all running Pods to ensure that traffic is sent only to available, healthy Pods.
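For reference, the kind of Service that does this load balancing could look roughly like the sketch below. This is an illustrative example only: the name, label selector and ports are assumptions based on this lab (our app answers on http://localhost:3000/), not the exact manifest from the earlier labs.
apiVersion: v1
kind: Service
metadata:
  name: hello-world
spec:
  type: LoadBalancer          # on Docker Desktop this exposes the Service on localhost
  selector:
    app: hello-world          # traffic is balanced across all Pods carrying this label
  ports:
    - port: 3000              # the port we reach at http://localhost:3000/
      targetPort: 3000        # the container port of our app
The important part is the selector: the Service keeps track of every Pod matching this label and spreads incoming requests across them, so scaling the Deployment automatically adds or removes load-balancing targets.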
In order to scale our application, we first want to check which ReplicaSet was automatically created by our deployment from Lab 2:
kubectl get rs
By default, a Deployment creates only one Pod, so DESIRED shows 1. Since our Pod is up and running, CURRENT also shows 1.
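The output should look roughly like this - the ReplicaSet name is derived from the Deployment name plus a generated hash, so yours will differ (the values below are only illustrative):
NAME                     DESIRED   CURRENT   READY   AGE
hello-world-6d9f7c8b54   1         1         1       5m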
Let’s scale out to 4 replicas:
kubectl scale deployments/hello-world --replicas=4
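Under the hood, kubectl scale simply updates the spec.replicas field of our Deployment; the Deployment controller then creates the additional Pods. If you want to wait until all 4 replicas are actually available, you can run
kubectl rollout status deployments/hello-world
which blocks until the Deployment reports the desired number of available Pods.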
Let’s check what has happened to our deployments and pods:
kubectl get deployments
kubectl get pods -o wide
Our scaling worked and we now have 4 Pods of our application available - you can also verify this in the Docker Desktop app.
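If you want to see the scaling reflected in the Deployment itself, you can also describe it:
kubectl describe deployments/hello-world
The Events section at the bottom should contain an entry for the scale-up of the underlying ReplicaSet to 4 replicas.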
We can also scale our app down, e.g. to run only 2 Pods:
kubectl scale deployments/hello-world --replicas=2
If you check the status of the Pods again, you will see two of them change their status to “Terminating” before they disappear completely:
kubectl get pods
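If you prefer to follow this live instead of re-running the command, you can watch the Pods with the -w flag (stop watching with Ctrl+C):
kubectl get pods -w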
So we now have two Pods running our app, but does the load balancing actually work, or is all traffic sent to a single Pod? Let’s find out!
We want to simulate traffic spikes by sending HTTP requests to our app. Before we can start, we need to install hey, a small CLI tool for generating HTTP load.
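How to install hey depends on your platform; assuming you have either Homebrew or a Go toolchain available, one of the following should work:
brew install hey
go install github.com/rakyll/hey@latest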
Now that we have installed hey, let’s send requests by running
hey -n 1000 http://localhost:3000/
in the terminal.
You should receive an output similar to this:
Summary:
  Total:        0.1128 secs
  Slowest:      0.0354 secs
  Fastest:      0.0008 secs
  Average:      0.0055 secs
  Requests/sec: 8861.5919

  Total data:   393000 bytes
  Size/request: 393 bytes

Response time histogram:
  0.001 [1]   |
  0.004 [395] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.008 [523] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.011 [35]  |■■■
  0.015 [18]  |■
  0.018 [11]  |■
  0.022 [3]   |
  0.025 [3]   |
  0.028 [4]   |
  0.032 [4]   |
  0.035 [3]   |

Latency distribution:
  10% in 0.0033 secs
  25% in 0.0038 secs
  50% in 0.0046 secs
  75% in 0.0060 secs
  90% in 0.0074 secs
  95% in 0.0101 secs
  99% in 0.0256 secs

Details (average, fastest, slowest):
  DNS+dialup: 0.0002 secs, 0.0008 secs, 0.0354 secs
  DNS-lookup: 0.0001 secs, 0.0000 secs, 0.0019 secs
  req write:  0.0000 secs, 0.0000 secs, 0.0002 secs
  resp wait:  0.0053 secs, 0.0008 secs, 0.0320 secs
  resp read:  0.0000 secs, 0.0000 secs, 0.0003 secs

Status code distribution:
  [200] 1000 responses
So we sent 1000 requests and always received HTTP status code 200.
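By default, hey sends these requests with 50 concurrent workers. To make the spike more aggressive, you can raise the concurrency with the -c flag, for example:
hey -n 1000 -c 200 http://localhost:3000/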
Run the hey command from above several times in your terminal and then visit the Docker Desktop app.
Now we can investigate what happened inside the containers of our app. Click on each of the containers and select Stats in the details view. You should see that both containers show peaks in CPU usage, confirming that our load balancing works.
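Alternatively, if the metrics-server is installed in your cluster (it is not always enabled by default), you can compare the CPU and memory usage of the Pods directly from the command line:
kubectl top pods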