VMware PKS 1.3 Monitor K8s with vROPs

Monitoring is a key discipline for everyone running Kubernetes clusters in production or similar environments. VMware PKS delivers out-of-the-box integrations for logging, application monitoring, and infrastructure monitoring to satisfy the requirements of the different personas working with the platform.

The requirements of operations teams or Platform Reliability Engineers tend to be more infrastructure-oriented. Areas such as capacity management, performance management, and health monitoring of all components in a service chain are highly important to ensure OLAs/SLAs are met.

In this blog post, I want to demonstrate how to leverage the vRealize Operations Manager (vROPs) integration to monitor PKS-managed K8s clusters.

Integration

First of all, we need to make sure we have everything in place to establish the integration between the different components. In the following scenario, we have a PKS 1.3 managed K8s cluster and vRealize Operations Manager version 7.0 already up and running.

➜  ~ pks cluster k8scl01

Name:                     k8scl01
Plan Name:                small
UUID:                     557cffde-3647-4267-a50f-fa3e09a39608
Last Action:              CREATE
Last Action State:        succeeded
Last Action Description:  Instance provisioning completed
Kubernetes Master Host:   pkscl01.aulab.local
Kubernetes Master Port:   8443
Worker Nodes:             2
Kubernetes Master IP(s):  172.16.10.1
Network Profile Name:

Screenshot 2019-02-08 at 14.05.30.png

To monitor our K8s cluster, we need to download and install the “vRealize Operations Management Pack for Container Monitoring” from VMware’s Solution Exchange. Have a look at the Technical Specifications to ensure you have the right vROPs version (6.6.x and above) as well as the right VMware PKS version in place (1.1 and above).

Deploy cAdvisor DaemonSet

As a prerequisite for the Management Pack for Container Monitoring, we need to deploy cAdvisor as a DaemonSet on our Kubernetes cluster. The instructions and the necessary yaml definition can be found in the Management Pack's User Guide.

Simply copy the following code or the content from the User Guide into a yaml file (e.g. vrops-cadvisor.yaml).

apiVersion: apps/v1beta2 # apps/v1beta2 in Kube 1.8, extensions/v1beta1 in Kube < 1.8
kind: DaemonSet
metadata:
  name: vrops-cadvisor
  namespace: kube-system
  labels:
    app: vrops-cadvisor
spec:
  selector:
    matchLabels:
      name: vrops-cadvisor
  template:
    metadata:
      labels:
        name: vrops-cadvisor
        version: v0.31.0
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      hostNetwork: true
      containers:
      - name: vrops-cadvisor
        image: google/cadvisor:v0.31.0
        imagePullPolicy: Always
        volumeMounts:
        - name: rootfs
          mountPath: /rootfs
          readOnly: true
        - name: var-run
          mountPath: /var/run
          readOnly: false
        - name: sys
          mountPath: /sys
          readOnly: true
        - name: docker
          mountPath: /var/lib/docker #Mounting Docker volume
          readOnly: true
        - name: docker-sock
          mountPath: /var/run/docker.sock
          readOnly: true
        - name: containerd-sock
          mountPath: /var/run/containerd.sock
          readOnly: true
        - name: disk
          mountPath: /dev/disk
          readOnly: true
        ports:
        - name: http
          containerPort: 31194 #Port exposed
          hostPort: 31194  #Host's port - Port to expose your cAdvisor DaemonSet on each node
          protocol: TCP
        securityContext:
          capabilities:
            drop:
            - ALL
            add:
            - NET_BIND_SERVICE
        args:
          - --port=31194
          - --profiling
          - --housekeeping_interval=1s
      terminationGracePeriodSeconds: 30
      volumes:
      - name: rootfs
        hostPath:
          path: /
      - name: var-run
        hostPath:
          path: /var/run
      - name: sys
        hostPath:
          path: /sys
      - name: docker
        hostPath:
          path: /var/vcap/store/docker/docker #Docker path in Host System
      - name: docker-sock
        hostPath:
          path: /var/vcap/sys/run/docker/docker.sock
      - name: containerd-sock
        hostPath:
          path: /var/run/docker/containerd/docker-containerd.sock
      - name: disk
        hostPath:
          path: /dev/disk

Create the DaemonSet with “kubectl create -f vrops-cadvisor.yaml”.

➜  ~ kubectl create -f vrops-cadvisor.yaml
daemonset.apps/vrops-cadvisor created
➜  ~ kubectl get pods --all-namespaces
NAMESPACE     NAME                                    READY   STATUS    RESTARTS   AGE
default       redis-server-77b4d88467-wc956           1/1     Running   0          24h
default       yelb-appserver-58db84c875-bgncm         1/1     Running   0          24h
default       yelb-db-69b5c4dc8b-zvhl2                1/1     Running   0          24h
default       yelb-ui-6b5d855894-v985g                1/1     Running   0          24h
kube-system   heapster-85647cf566-tnzkd               1/1     Running   0          3d16h
kube-system   kube-dns-7559c96fc4-lkw2n               3/3     Running   0          3d16h
kube-system   kubernetes-dashboard-5f4b59b97f-6dmpt   1/1     Running   0          3d16h
kube-system   metrics-server-555d98886f-rtfc9         1/1     Running   0          3d16h
kube-system   monitoring-influxdb-cdcf4674-27ndm      1/1     Running   0          3d16h
kube-system   vrops-cadvisor-d4dnm                    1/1     Running   0          7s
kube-system   vrops-cadvisor-p622f                    1/1     Running   0          7s
pks-system    event-controller-6c77ddd949-cszwv       2/2     Running   1          3d16h
pks-system    fluent-bit-88cxx                        2/2     Running   0          3d16h
pks-system    fluent-bit-p8qf9                        2/2     Running   0          3d16h
pks-system    sink-controller-65595c498b-gr8x4        1/1     Running   0          3d16h
pks-system    telemetry-agent-559f9c8855-6p2gr        1/1     Running   0          3d16h
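
Instead of scanning the pod list by eye, the DaemonSet status can also be checked from a script. The following is a minimal sketch, assuming the DaemonSet name and namespace from the yaml definition above; the function name daemonset_ready is my own and not part of any tool.

```shell
#!/usr/bin/env bash
# Succeeds when every scheduled cAdvisor pod of the DaemonSet reports ready.
# Defaults match the yaml definition used in this post.
daemonset_ready() {
  local name="${1:-vrops-cadvisor}" namespace="${2:-kube-system}"
  local status desired ready
  # Prints "<desired> <ready>", e.g. "2 2" on a cluster with two worker nodes.
  status="$(kubectl -n "$namespace" get daemonset "$name" \
    -o jsonpath='{.status.desiredNumberScheduled} {.status.numberReady}')" || return 1
  # Anything without a space separator is not the output we asked for.
  case "$status" in
    *" "*) ;;
    *) return 1 ;;
  esac
  desired="${status% *}"
  ready="${status#* }"
  [ -n "$desired" ] && [ "$desired" != "0" ] && [ "$desired" = "$ready" ]
}

# Usage:
#   daemonset_ready && echo "cAdvisor is running on all scheduled nodes"
```

The function only compares the desired and ready counters, so it does not say anything about whether cAdvisor is actually serving metrics. That is what the browser check in the next step is for.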

To verify the functionality of cAdvisor as part of our Kubernetes cluster, connect to “http://node_ip:31194/containers/” where node_ip is the IP address of your Kubernetes node. Make sure that you can access the cAdvisor webpage and that metrics data is coming in.


Additionally, check whether you can access information about the Docker containers via “http://node_ip:31194/docker/”. If the connection to the Docker daemon is working, you should see a list of containers.
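
If you prefer the command line over the browser, both pages can be probed with curl. This is only a sketch under a few assumptions: the helper names are made up for this post, port 31194 comes from the yaml definition above, and the nodes' InternalIP addresses must be reachable from the machine running the script.

```shell
#!/usr/bin/env bash
# Print the cAdvisor URL for a node, e.g. http://<node_ip>:31194/containers/
cadvisor_url() {
  local node_ip="$1" path="${2:-containers}" port="${3:-31194}"
  printf 'http://%s:%s/%s/\n' "$node_ip" "$port" "$path"
}

# Return 0 if both cAdvisor pages on the node answer without an HTTP error.
check_cadvisor() {
  local node_ip="$1" port="${2:-31194}" path
  for path in containers docker; do
    curl --silent --fail --max-time 5 --output /dev/null \
      "$(cadvisor_url "$node_ip" "$path" "$port")" || return 1
  done
}

# Probe every node that kubectl reports, using its InternalIP address.
check_all_nodes() {
  local ip
  for ip in $(kubectl get nodes \
      -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}'); do
    if check_cadvisor "$ip"; then echo "$ip ok"; else echo "$ip FAILED"; fi
  done
}

# Usage:
#   check_all_nodes
```

A node reported as FAILED is either unreachable on the port or returns an error for one of the two pages. The second case usually points at the Docker socket mount in the DaemonSet definition.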

Screenshot 2019-02-08 at 15.51.13

Install Management Pack

Now that we have cAdvisor running, let’s install the vRealize Operations Management Pack for Container Monitoring. Log in to vRealize Operations Manager with Admin permissions and go to the “Administration” tab. Click the green + icon under “Solutions” to start the installation wizard.

Screenshot 2019-02-08 at 16.14.23.png

Select the PAK file that we have downloaded from VMware’s Solution Exchange and click “UPLOAD”.

Screenshot 2019-02-20 at 17.44.10.png

Follow the self-explanatory wizard until the installation is completed.

Screenshot 2019-02-08 at 09.02.05

Configure Management Pack

As a next step, we need to configure an “Adapter Instance” of the installed Management Pack. Click on the little gear icon to open the configuration page. We can configure multiple adapter instances, one per Kubernetes cluster if required.

Screenshot 2019-02-08 at 16.21.24.png

To add another adapter instance, click on the green + icon and specify a display name. Enter the Master URL of your cluster, select DaemonSet as cAdvisor Service and specify the cAdvisor port from the yaml definition that we have used earlier to create the cAdvisor DaemonSet (31194 in our case). Before we can test the connection, we need to add valid credentials for our Kubernetes cluster. Click on the green + icon next to the credential field.

Screenshot 2019-02-08 at 16.30.12
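
The Master URL can be derived from the “pks cluster” output shown at the beginning of this post. The sketch below assumes that the adapter accepts the form https://host:port; master_url is a hypothetical helper name, not a PKS command.

```shell
#!/usr/bin/env bash
# Build the adapter's Master URL from the output of `pks cluster <name>`.
# Reads the CLI output on stdin and prints https://<host>:<port>.
master_url() {
  awk -F': *' '
    /^Kubernetes Master Host:/ { host = $2; sub(/[ \t\r]+$/, "", host) }
    /^Kubernetes Master Port:/ { port = $2; sub(/[ \t\r]+$/, "", port) }
    END {
      # Fail instead of printing a half-built URL.
      if (host == "" || port == "") exit 1
      printf "https://%s:%s\n", host, port
    }'
}

# Usage (requires a logged-in pks CLI):
#   pks cluster k8scl01 | master_url
```

For the cluster in this post, the result would be the Master Host and Master Port from the output above joined into a single URL.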

In this scenario, we simply use the token from our local kubectl config file. The config file can be found under “$HOME/.kube/”, or you can run “kubectl config view” while the correct kubectl context is active. Alternatively, you can choose Basic or Client Certificate Authentication.
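
To avoid copying the token by hand, it can be read directly from the active context. This is a minimal sketch, assuming the current context's user entry stores a bearer token (other authentication methods leave this field empty); current_token is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the bearer token of the user in the current kubectl context.
current_token() {
  local token
  # --minify limits the output to the current context,
  # --raw prevents kubectl from redacting secret values.
  token="$(kubectl config view --minify --raw \
    -o jsonpath='{.users[0].user.token}')" || return 1
  if [ -z "$token" ]; then
    echo "no bearer token found in the current context" >&2
    return 1
  fi
  printf '%s\n' "$token"
}

# Usage:
#   current_token
```

Treat the printed value like a password. It grants the same access to the cluster as your kubectl session.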

Specify the credential type, a display name, the Bearer Token value and click “OK”.

Screenshot 2019-02-08 at 16.51.46

To see whether the connection is working, click on “TEST CONNECTION”. If the connection was established successfully, you should see a message like this.

Screenshot 2019-02-08 at 16.30.20

Save the adapter instance settings by clicking on “SAVE SETTINGS”.

Screenshot 2019-02-09 at 21.40.32.png

Done, we have successfully integrated vRealize Operations Manager with our VMware PKS managed Kubernetes cluster. We can now have a look at the Kubernetes related information available in vROPs.

Monitoring

The “Kubernetes Overview” dashboard is now available under the “Dashboards” tab. The dashboard shows a lot of useful information about your K8s clusters, nodes, pods, and containers.

Cluster Widgets

First, select the K8s cluster you want to view under point 1. Immediately, you will see a lot of useful information, such as a summary of the K8s cluster objects (nodes, namespaces, pods, containers, …) and the corresponding health status. Widget 3 shows all active alerts and next to it you can see a health map of related objects.

Screenshot 2019-02-09 at 16.06.08.png

Node Widgets

In my lab, we can see a memory usage alert on both K8s nodes, and therefore the health status is degraded. By clicking on one of the nodes within widget 5, we get more details. Section 7 shows a health map with the pods running on the selected node. In widget number 8, we can see that the “Memory Usage (%)” metric has been trending quite high for some time. Right next to it, we can select the node metrics we want to add to the metric chart.

Screenshot 2019-02-09 at 16.09.45.png
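
The memory alert can be cross-checked from the command line with “kubectl top nodes”. The sketch below assumes the usual column layout of that command (NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%); the helper name and the default threshold of 80 percent are my own choices and are not taken from vROPs.

```shell
#!/usr/bin/env bash
# Read `kubectl top nodes` output on stdin and print the nodes whose
# memory usage is at or above the given percentage (default: 80).
high_memory_nodes() {
  local threshold="${1:-80}"
  awk -v t="$threshold" '
    $1 == "NAME" { next }              # skip the header line
    {
      pct = $5
      sub(/%$/, "", pct)               # "85%" -> "85"
      if (pct + 0 >= t + 0) print $1, $5
    }'
}

# Usage:
#   kubectl top nodes | high_memory_nodes 80
```

An empty result means that no node is above the threshold at the moment. Unlike the trend line in the dashboard, this is only a snapshot.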

Solving such a resource problem is a very easy task with VMware PKS. We can simply scale the Kubernetes cluster and add additional nodes to it. If you want to learn more about scaling a K8s cluster with PKS, have a look at my blog post “VMware PKS 1.3 Scale K8s Clusters”.
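
As a rough sketch, scaling from the CLI could look like the following. I am assuming the “pks resize” command with its “--num-nodes” flag as available in the PKS 1.3 CLI; the wrapper function scale_workers is hypothetical and only adds a basic input check.

```shell
#!/usr/bin/env bash
# Resize a PKS cluster to the requested number of worker nodes.
scale_workers() {
  local cluster="$1" nodes="$2"
  # Reject empty or non-numeric node counts before calling the CLI.
  case "$nodes" in
    ''|*[!0-9]*)
      echo "node count must be a whole number" >&2
      return 2
      ;;
  esac
  pks resize "$cluster" --num-nodes "$nodes"
}

# Usage (requires a logged-in pks CLI):
#   scale_workers k8scl01 3
```

Once the new worker node has joined the cluster, the cAdvisor DaemonSet schedules a pod on it automatically, so the node should show up in vROPs without further configuration.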

Pod & Container Widgets

Further down we can find information about the pods and containers. Select the pod to inspect under widget 11. We will see the related containers and the health status next to it. Widget 13 shows trend lines for Pod metrics and within window 14 we can select metrics to be added to the metric chart.

Screenshot 2019-02-09 at 16.12.54.png

This is a lot of very useful information in just one dashboard! It helps Operators to quickly identify the root cause and to speed up the troubleshooting process. Additionally, it will monitor the environment and create alerts based on certain Symptoms, see screenshot.

Screenshot 2019-02-10 at 17.15.45

You should also check out the video from VMware’s Michael West on monitoring Kubernetes clusters with vRealize Operations Manager.

Conclusion

Implementing the PKS/vROPs integration via the vRealize Operations Management Pack for Container Monitoring is quite easy if you follow the instructions in this blog post or within the User Guide. I expect that to become more automated in future VMware PKS versions.

The vRealize Operations Management Pack for Container Monitoring allows operations teams and Platform Reliability Engineers to monitor and quickly analyze their K8s clusters. Health-related alerts, in combination with the relationship views of native K8s objects, help to reduce the MTTR (mean time to repair) and to keep the environment in a healthy state. Additionally, vROPs can be used to plan and manage the capacity of the underlying infrastructure resources.

If you want to learn more about PKS 1.3, please have a look at my VMware PKS 1.3 What’s New blog post.
