VMware PKS 1.3 Scale K8s Clusters

Since the release of VMware PKS 1.3, we can scale Kubernetes clusters both up and down. Scaling up has been possible since the first PKS release, but scaling down is a new capability in PKS 1.3. This feature gives Platform Reliability Engineers the elasticity and flexibility they need to manage the infrastructure capacity of their Kubernetes clusters. In this post, I describe the process of scaling a PKS-managed Kubernetes cluster.

Toolset

vSphere HTML5 Client

Before we start, let’s make sure we have the right tools available to execute and monitor the scaling process. First of all, we should open the vSphere HTML5 Client to see the VMs that form our PKS Kubernetes cluster.

In my case, I have a Kubernetes cluster with 2 worker nodes deployed, and I want to scale down to 1. We can check the role of a VM by looking at the Custom Attributes “instance_group” and “job”.

[Screenshots: Custom Attributes “instance_group” and “job” of the cluster VMs in the vSphere HTML5 Client]

BOSH CLI

The next tool we should have available and ready to use is bosh-cli. I am a Mac user, so I decided to use Homebrew to install the latest bosh-cli version.

➜  ~ brew install cloudfoundry/tap/bosh-cli
...
==> Tapping cloudfoundry/tap
Cloning into '/usr/local/Homebrew/Library/Taps/cloudfoundry/homebrew-tap'...
remote: Enumerating objects: 14, done.
remote: Counting objects: 100% (14/14), done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 14 (delta 1), reused 6 (delta 0), pack-reused 0
Unpacking objects: 100% (14/14), done.
Tapped 7 formulae (49 files, 55KB).
==> Installing bosh-cli from cloudfoundry/tap
==> Downloading https://s3.amazonaws.com/bosh-cli-artifacts/bosh-cli-5.4.0-darwin-amd64
######################################################################## 100.0%
==> Caveats
Bash completion has been installed to:
  /usr/local/etc/bash_completion.d
==> Summary
🍺  /usr/local/Cellar/bosh-cli/5.4.0: 4 files, 27.0MB, built in 14 seconds
➜  ~ bosh -v
version 5.4.0-891ff634-2018-11-14T00:21:14Z

Alternatively, you can download the right version for your OS on GitHub here and follow the instructions here, or use the bosh-cli version that is already installed on the Cloud Foundry Ops Manager. To SSH into the Ops Manager VM, use the “ubuntu” user and the password that you specified during the OVA deployment.

To log in to the BOSH Director, we first need to copy the root_ca_certificate to the workstation/client from which we want to use the bosh-cli. We can download the certificate from the Ops Manager UI under Settings/Advanced. If you want to execute bosh-cli from the Ops Manager VM, you can find the certificate at the following path: “/var/tempest/workspaces/default/root_ca_certificate”.

[Screenshot: root CA certificate download under Settings/Advanced in the Ops Manager UI]

As a next step, we need to get the necessary bosh command line credentials via the Ops Manager UI, see screenshot.

[Screenshot: BOSH command line credentials in the Ops Manager UI]

You should see an output like this after clicking on “Link to Credential”:

{"credential":"BOSH_CLIENT=ops_manager BOSH_CLIENT_SECRET=ZYxWvutsRqPoNmlKjIhgFeDcBA BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate BOSH_ENVIRONMENT=192.168.96.1 bosh "}

I recommend setting some variables to avoid the need to specify everything during command execution. Simply create a file and copy/format the collected content in the following way. If you are executing bosh-cli from a client and not from the Ops Manager itself, make sure that you change the BOSH_CA_CERT path to the location of the downloaded certificate.

export BOSH_CLIENT_SECRET=ZYxWvutsRqPoNmlKjIhgFeDcBA
export BOSH_CLIENT=ops_manager
export BOSH_ENVIRONMENT=192.168.96.1
export BOSH_CA_CERT=/home/aullah/root_ca_certificate

Save the file and “source” it whenever you need to execute bosh-cli commands against the environment. Alternatively, you can add the content to your bash profile to have it available every time you start.
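As a rough sketch of the copy/format step, the credential string from the Ops Manager UI can be turned into export lines with a small shell function. The function name credential_to_exports is my own placeholder and not part of bosh-cli or Ops Manager; the values are the sample values shown above.

```shell
# Hypothetical helper: turn the "credential" string from Ops Manager
# into one "export KEY=VALUE" line per variable.
credential_to_exports() {
  printf '%s\n' "$1" |
    tr ' ' '\n' |        # one token per line
    grep '=' |           # keep KEY=VALUE tokens, drop the trailing "bosh"
    sed 's/^/export /'   # prefix each remaining token with "export"
}

# Example with the sample values shown above.
credential_to_exports 'BOSH_CLIENT=ops_manager BOSH_CLIENT_SECRET=ZYxWvutsRqPoNmlKjIhgFeDcBA BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate BOSH_ENVIRONMENT=192.168.96.1 bosh '
```

Redirect the output into a file of your choice and source that file. When running from a client instead of the Ops Manager VM, BOSH_CA_CERT still has to be changed by hand to the location of the downloaded certificate.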

Execute the “bosh vms” command to see if it’s working. If everything is configured correctly, you should see an output like this. Find your Kubernetes deployment and make a note of your deployment ID, see screenshot. We will need the deployment ID later to monitor the scaling process.

[Screenshot: “bosh vms” output with the deployment ID of the Kubernetes cluster]

PKS CLI and Kubectl

To download and install the PKS CLI, simply follow the instructions here. Rename the downloaded PKS CLI file to “pks”, make it executable and move it to the /usr/local/bin/ folder.

➜ ~ mv pks-darwin-amd64-1.3.0-build.126 pks
➜ ~ chmod +x pks
➜ ~ mv pks /usr/local/bin/pks
➜ ~ pks --version

PKS CLI version: 1.3.0-build.126

Log in to PKS with your username and password. The -k flag skips SSL certificate validation.

pks login -a <API> -u <USERNAME> -p <PASSWORD> -k

In addition, we need kubectl to monitor the scheduling and restarts of our Pods during the scale operation. Kubernetes does not automatically rebalance Pods during normal operations; it only schedules them at creation time or when Pods get killed and need to be rescheduled. That’s why we need to keep an eye on the scale-down operation, as Pods will be killed and recreated on the remaining worker nodes. Here you can find more information on how to install kubectl.

Now that we have the vSphere HTML5 Client open, the bosh-cli configured, PKS CLI logged in and kubectl ready, we can start the scaling operation.

Scaling

Run the “pks cluster <clustername>” command to get some information about the cluster you want to scale down.

➜  ~ pks cluster k8s-cluster-01

Name:                     k8s-cluster-01
Plan Name:                small
UUID:                     d0bb926c-86ab-492a-b4b4-ba0824a8a49f
Last Action:              UPDATE
Last Action State:        succeeded
Last Action Description:  Instance update completed
Kubernetes Master Host:   pks-cluster-01
Kubernetes Master Port:   8443
Worker Nodes:             2
Kubernetes Master IP(s):  172.16.10.1
Network Profile Name:
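If you want to reuse single values from this output in scripts, for example the worker node count or the UUID, a small filter is enough. This is only a sketch: cluster_field is a hypothetical helper name, and it assumes the “Key: value” layout shown above.

```shell
# Hypothetical helper: print the value of one "Key: value" line
# from `pks cluster` output read on stdin.
cluster_field() {
  awk -F': *' -v key="$1" '$1 == key { sub(/[ \t]+$/, "", $2); print $2; exit }'
}

# Typical use (requires a logged-in PKS CLI):
#   pks cluster k8s-cluster-01 | cluster_field "Worker Nodes"
#   pks cluster k8s-cluster-01 | cluster_field "UUID"
```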

In parallel, use kubectl to monitor the running Pods of your Kubernetes cluster. Execute the following command to watch your Pods and the corresponding worker nodes.

➜  ~ kubectl get pods -o wide --watch
NAME                              READY   STATUS    RESTARTS   AGE     IP             NODE                                   NOMINATED NODE
nginx-85474d599b-clmtm            1/1     Running   0          8d      10.200.33.14   d9daf54a-06ac-4456-996e-970d42971141   <none>
nginx-85474d599b-w65tf            1/1     Running   0          8d      10.200.33.9    d9daf54a-06ac-4456-996e-970d42971141   <none>
redis-bb7894d65-xhrvm             1/1     Running   0          8d      10.200.33.6    d9daf54a-06ac-4456-996e-970d42971141   <none>
redis-bb7894d65-xtl7d             1/1     Running   0          8d      10.200.33.8    d9daf54a-06ac-4456-996e-970d42971141   <none>
redis-server-77b4d88467-j4glb     1/1     Running   0          6m31s   10.200.96.7    7f3298b9-a803-4b75-9e14-a67ad2cd1d28   <none>
yelb-appserver-58db84c875-gldl9   1/1     Running   0          6m31s   10.200.96.9    7f3298b9-a803-4b75-9e14-a67ad2cd1d28   <none>
yelb-db-69b5c4dc8b-78k5p          1/1     Running   0          6m31s   10.200.96.8    7f3298b9-a803-4b75-9e14-a67ad2cd1d28   <none>
yelb-ui-6b5d855894-4xjmp          1/1     Running   0          6m31s   10.200.96.6    7f3298b9-a803-4b75-9e14-a67ad2cd1d28   <none>
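To see at a glance how the Pods are spread across the worker nodes before scaling, the output can be summarized per node. The following is a minimal sketch; pods_per_node is a hypothetical helper, and it assumes NODE is the 7th whitespace-separated column, as in the output above.

```shell
# Hypothetical helper: count Pods per worker node from
# `kubectl get pods -o wide` output read on stdin.
# Assumes NODE is the 7th column and skips the header line.
pods_per_node() {
  awk '$1 != "NAME" { count[$7]++ } END { for (node in count) print node, count[node] }' | sort
}

# Typical use (requires a configured kubectl):
#   kubectl get pods -o wide | pods_per_node
```

For the listing above, this would report four Pods on each of the two worker nodes.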

As a next step, simply execute the “pks resize <clustername> -n x” command with a node count lower or higher than the current one. In my case, I want to scale down from 2 worker nodes to 1.

➜  ~ pks resize k8s-cluster-01 -n 1

Are you sure you want to resize cluster k8s-cluster-01 to 1? (y/n): y
Use 'pks cluster k8s-cluster-01' to monitor the state of your cluster
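The hint in the output can be turned into a simple polling loop. This is a sketch under assumptions: wait_for_cluster is a hypothetical helper, “succeeded” is the state shown in the “pks cluster” output earlier, and treating “failed” as the error state is my assumption.

```shell
# Hypothetical helper: poll `pks cluster` until the last action has finished.
# $1: cluster name, $2: seconds between polls (default 30), $3: max polls (default 60)
wait_for_cluster() {
  cluster=$1
  interval=${2:-30}
  attempts=${3:-60}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Extract the value of the "Last Action State" line.
    state=$(pks cluster "$cluster" |
      awk -F': *' '$1 == "Last Action State" { sub(/[ \t]+$/, "", $2); print $2; exit }')
    case "$state" in
      succeeded) echo "last action succeeded"; return 0 ;;
      failed)    echo "last action failed" >&2; return 1 ;;  # assumed error state
    esac
    sleep "$interval"
    i=$((i + 1))
  done
  echo "gave up after $attempts polls" >&2
  return 2
}

# Typical use (requires a logged-in PKS CLI):
#   wait_for_cluster k8s-cluster-01
```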

To monitor the progress, I recommend not relying only on the “pks cluster” command suggested in the output, but also using the BOSH CLI with the following commands.

Earlier, we made a note of our deployment ID. We can now execute “bosh tasks -d <deployment_ID>” to see which task is currently being executed, followed by “bosh task <task_number>” to get more details about the progress. Alternatively, you can execute “bosh tasks -r” to get a list of the recent tasks or “bosh task -a” to tail the latest task.
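The commands can be wrapped so that only the deployment ID has to be supplied. This sketch assumes the deployment ID follows the pattern service-instance_<cluster UUID>, with the UUID taken from the “pks cluster” output; that pattern is an assumption worth checking against your own “bosh vms” output. Both function names are hypothetical.

```shell
# Assumption: the BOSH deployment of a PKS cluster is named
# "service-instance_<cluster UUID>"; verify this against `bosh vms`.
deployment_name() {
  printf 'service-instance_%s\n' "$1"
}

# Hypothetical wrapper: list the current and the recent tasks of one deployment.
show_tasks() {
  bosh tasks -d "$1"      # tasks currently being executed
  bosh tasks -r -d "$1"   # recent tasks
}

# Typical use (requires the BOSH_* variables from the file created earlier):
#   show_tasks "$(deployment_name d0bb926c-86ab-492a-b4b4-ba0824a8a49f)"
```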

[Screenshot: BOSH task output during the resize operation]

Here you can find some more examples of how to use the BOSH CLI from Denny Zhang.

In vCenter, you will see that the worker node VM has been deleted and removed from disk.

[Screenshot: vCenter showing the deleted worker node VM]

In the meantime, the kubectl command should show you a few “Terminating”, “Pending” and “ContainerCreating” outputs. That is expected as we deleted a worker node and Kubernetes needed to reschedule the Pods on the remaining worker node. What matters is that the Pods are in a “Running” state at the end.

yelb-appserver-58db84c875-9jklw   0/1   ContainerCreating   0     3s    <none>   d9daf54a-06ac-4456-996e-970d42971141   <none>
yelb-db-69b5c4dc8b-78k5p   1/1   Terminating   0     10m   10.200.96.8   7f3298b9-a803-4b75-9e14-a67ad2cd1d28   <none>
yelb-db-69b5c4dc8b-78k5p   1/1   Terminating   0     10m   10.200.96.8   7f3298b9-a803-4b75-9e14-a67ad2cd1d28   <none>
yelb-ui-6b5d855894-4xjmp   1/1   Terminating   0     10m   10.200.96.6   7f3298b9-a803-4b75-9e14-a67ad2cd1d28   <none>
yelb-ui-6b5d855894-4xjmp   1/1   Terminating   0     10m   10.200.96.6   7f3298b9-a803-4b75-9e14-a67ad2cd1d28   <none>
yelb-db-69b5c4dc8b-kczqx   1/1   Running   0     14s   10.200.33.30   d9daf54a-06ac-4456-996e-970d42971141   <none>
yelb-appserver-58db84c875-9jklw   1/1   Running   0     15s   10.200.33.31   d9daf54a-06ac-4456-996e-970d42971141   <none>
yelb-ui-6b5d855894-hjxwm   1/1   Running   0     15s   10.200.33.32   d9daf54a-06ac-4456-996e-970d42971141   <none>
redis-server-77b4d88467-xfp9t   1/1   Running   0     15s   10.200.33.34   d9daf54a-06ac-4456-996e-970d42971141   <none>
                                                                                                                                               
➜  ~ kubectl get pods -o wide
NAME                              READY   STATUS    RESTARTS   AGE   IP             NODE                                   NOMINATED NODE
nginx-85474d599b-clmtm            1/1     Running   0          8d    10.200.33.14   d9daf54a-06ac-4456-996e-970d42971141   <none>
nginx-85474d599b-w65tf            1/1     Running   0          8d    10.200.33.9    d9daf54a-06ac-4456-996e-970d42971141   <none>
redis-bb7894d65-xhrvm             1/1     Running   0          8d    10.200.33.6    d9daf54a-06ac-4456-996e-970d42971141   <none>
redis-bb7894d65-xtl7d             1/1     Running   0          8d    10.200.33.8    d9daf54a-06ac-4456-996e-970d42971141   <none>
redis-server-77b4d88467-xfp9t     1/1     Running   0          15m   10.200.33.34   d9daf54a-06ac-4456-996e-970d42971141   <none>
yelb-appserver-58db84c875-9jklw   1/1     Running   0          15m   10.200.33.31   d9daf54a-06ac-4456-996e-970d42971141   <none>
yelb-db-69b5c4dc8b-kczqx          1/1     Running   0          15m   10.200.33.30   d9daf54a-06ac-4456-996e-970d42971141   <none>
yelb-ui-6b5d855894-hjxwm          1/1     Running   0          15m   10.200.33.32   d9daf54a-06ac-4456-996e-970d42971141   <none>
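The final check can also be scripted instead of read by eye. The following is a minimal sketch; pods_not_running is a hypothetical helper that assumes STATUS is the 3rd column, as in the output above. Note that it would also flag Pods in a “Completed” state, which may be perfectly fine for Jobs.

```shell
# Hypothetical check: print every Pod whose STATUS column is not "Running".
# Reads `kubectl get pods -o wide` output on stdin and
# exits non-zero if at least one such Pod is found.
pods_not_running() {
  awk '$1 != "NAME" && $3 != "Running" { print $1, $3; bad = 1 } END { exit bad + 0 }'
}

# Typical use (requires a configured kubectl):
#   kubectl get pods -o wide | pods_not_running && echo "all Pods are Running"
```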

Done, the scale down operation finished successfully and I have freed up some infrastructure resources. The same process can be used to scale up and add additional worker nodes to a PKS managed Kubernetes cluster.

Conclusion

Scaling PKS-managed Kubernetes clusters is a very important capability, as it allows for efficient resource utilization: unused resources can be given back, and clusters can be expanded quickly when resource demand grows.

The scaling operation itself is very easy to execute and to monitor if you have the right toolset in place.

Since VMware PKS 1.3, Platform Reliability Engineers can also make use of the scale down capability and manage infrastructure capacity for their Kubernetes clusters in a more efficient way.

If you want to learn more about PKS 1.3, have a look at my VMware PKS 1.3 What’s New blog post.

