Scaling Worker Nodes in a Tanzu Kubernetes Grid (TKG) Cluster with Cluster Autoscaler

TKG supports scaling worker nodes in a couple of ways: you can scale a cluster manually with the tanzu cluster scale command, or let Cluster Autoscaler add and remove nodes automatically.

In this blog post, I will be talking about scaling worker nodes with Cluster Autoscaler.

Scale Worker Nodes with Cluster Autoscaler

Cluster Autoscaler is a Kubernetes component that automatically scales the number of worker nodes based on the demands placed on the cluster. For example, if you try to schedule a pod that requests more memory than is available on the existing nodes, Cluster Autoscaler will trigger the addition of a worker node.

You can only enable Cluster Autoscaler in TKG workload clusters, not in the management cluster. It is disabled by default, and you need to specify the autoscaler parameters in the workload cluster configuration file during cluster creation. The variables and their descriptions are documented in the TKG product documentation.


#! ---------------------------------------------------------------------
#! Autoscaler related configuration
#! ---------------------------------------------------------------------
ENABLE_AUTOSCALER: false
AUTOSCALER_MAX_NODES_TOTAL: "0"
AUTOSCALER_SCALE_DOWN_DELAY_AFTER_ADD: "10m"
AUTOSCALER_SCALE_DOWN_DELAY_AFTER_DELETE: "10s"
AUTOSCALER_SCALE_DOWN_DELAY_AFTER_FAILURE: "3m"
AUTOSCALER_SCALE_DOWN_UNNEEDED_TIME: "10m"
AUTOSCALER_MAX_NODE_PROVISION_TIME: "15m"
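#! The _0, _1, and _2 suffixes below set the minimum and maximum worker
#! counts per machine deployment; a cluster with a single machine
#! deployment (such as the dev plan) only uses the _0 values.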
AUTOSCALER_MIN_SIZE_0:
AUTOSCALER_MAX_SIZE_0:
AUTOSCALER_MIN_SIZE_1:
AUTOSCALER_MAX_SIZE_1:
AUTOSCALER_MIN_SIZE_2:
AUTOSCALER_MAX_SIZE_2:

Above is an example of the Cluster Autoscaler settings in a cluster configuration file. You cannot modify these values after you deploy the cluster. Here is what I used during workload cluster deployment:

ENABLE_AUTOSCALER: true
AUTOSCALER_MAX_NODES_TOTAL: "3"
AUTOSCALER_SCALE_DOWN_DELAY_AFTER_ADD: "10m"
AUTOSCALER_SCALE_DOWN_DELAY_AFTER_DELETE: "10s"
AUTOSCALER_SCALE_DOWN_DELAY_AFTER_FAILURE: "3m"
AUTOSCALER_SCALE_DOWN_UNNEEDED_TIME: "10m"
AUTOSCALER_MAX_NODE_PROVISION_TIME: "15m"
AUTOSCALER_MIN_SIZE_0: "2"
AUTOSCALER_MAX_SIZE_0: "3"
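
With the autoscaler values in place, deploy the workload cluster as usual with tanzu cluster create. The configuration file name below is just an example; point the command at the file you edited above.

# Create the workload cluster from the configuration file that contains
# the ENABLE_AUTOSCALER settings (file name is an example)
tanzu cluster create wkldcluster --file wkldcluster-config.yaml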

Once the cluster is deployed, confirm it is running with tanzu cluster list, then follow the steps in the next section to test autoscaling.

tanzu cluster list
  NAME         NAMESPACE  STATUS   CONTROLPLANE  WORKERS  KUBERNETES        ROLES   PLAN  
  wkldcluster  default    running  1/1           1/1      v1.22.5+vmware.1  <none>  dev  
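
Optionally, you can confirm that an autoscaler was created for the new cluster. In TKG, the Cluster Autoscaler for a workload cluster runs as a deployment in the management cluster, named after the workload cluster; the management cluster context name below is an assumption, so substitute your own.

# Check the autoscaler deployment in the management cluster; it is named
# <cluster-name>-cluster-autoscaler (context name is an example)
kubectl get deployment wkldcluster-cluster-autoscaler --context tkg-mgmt-admin@tkg-mgmt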

Test Autoscaling

In order to test the autoscaling feature, we will deploy a pod that requests more memory than is available on the existing worker nodes. Choose the request size based on the memory available in your own cluster nodes.
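
To pick a suitable request size, you can first check how much memory each node reports as allocatable; a quick way to do that with kubectl is shown below.

# Show the allocatable memory reported by each node
kubectl get nodes -o custom-columns=NAME:.metadata.name,ALLOCATABLE_MEMORY:.status.allocatable.memory

In my cluster, a request of 3000Mi is more than any single worker node has free, so the pod spec below uses that value.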

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: nginx
  name: nginx
spec:
  containers:
  - image: quay.io/bitnami/nginx
    name: nginx
    resources:
      requests:
        memory: "3000Mi"
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

Apply the above yaml file by running kubectl apply -f <yaml file>

kubectl apply -f pod.yaml 
pod/nginx created

Notice the status of the pod: it shows Pending because no existing worker node has enough free memory for the request.

root@dt-vc-ubuntu:~/.config/tanzu/tkg/clusterconfigs# kubectl get pods
NAME          READY   STATUS    RESTARTS   AGE
memory-demo   1/1     Running   0          99s
nginx         0/1     Pending   0          2s

Describe the pod and notice the last event, which shows that Cluster Autoscaler has triggered a scale-up.

kubectl describe po nginx
Name:         nginx
Namespace:    default
Priority:     0
Node:         <none>
Labels:       run=nginx
Annotations:  <none>
Status:       Pending
IP:           
IPs:          <none>
Containers:
  nginx:
    Image:      quay.io/bitnami/nginx
    Port:       <none>
    Host Port:  <none>
    Requests:
      memory:     3000Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-htbvx (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  kube-api-access-htbvx:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age   From                Message
  ----     ------            ----  ----                -------
  Warning  FailedScheduling  29s   default-scheduler   0/2 nodes are available: 1 Insufficient memory, 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
  Normal   TriggeredScaleUp  18s   cluster-autoscaler  pod triggered scale-up: [{MachineDeployment/default/wkldcluster-md-0 1->2 (max: 3)}]

In vCenter, notice the newly created task for provisioning the additional worker node.
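
If you want to watch the autoscaler make this decision, you can also tail its logs from the management cluster. The deployment name below follows the <cluster-name>-cluster-autoscaler pattern mentioned earlier, and the context name is an example; adjust both if yours differ.

# Follow the autoscaler logs to see scale-up and scale-down decisions
kubectl logs deployment/wkldcluster-cluster-autoscaler -f --context tkg-mgmt-admin@tkg-mgmt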

See the newly created node by running the kubectl get nodes command.

$ kubectl get nodes
NAME                                STATUS   ROLES                  AGE   VERSION
wkldcluster-control-plane-hlbhr     Ready    control-plane,master   10h   v1.22.5+vmware.1
wkldcluster-md-0-865474b475-295xq   Ready    <none>                 86s   v1.22.5+vmware.1
wkldcluster-md-0-865474b475-w87l4   Ready    <none>                 8h    v1.22.5+vmware.1

Validate the status of the pod by running kubectl get pods; it should be Running now.

nginx         1/1     Running   0          3m59s
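
The autoscaler also scales back down: once a node has been unneeded for AUTOSCALER_SCALE_DOWN_UNNEEDED_TIME (10 minutes in my configuration), it is drained and removed, but never below the AUTOSCALER_MIN_SIZE_0 value. With a minimum of 2, deleting the test pod therefore leaves this cluster at two workers; with a lower minimum you would see the extra node disappear again.

# Remove the test pod and watch the node count
kubectl delete pod nginx
kubectl get nodes -w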

That’s it. Enabling Cluster Autoscaler is simple and straightforward. Try the other scaling options mentioned at the start of this post as well.
