
Configure vSphere with Tanzu behind a Proxy plus TKG Extensions

With the release of vSphere 7 U1c, a global HTTP/HTTPS proxy configuration became available for all Tanzu Kubernetes Clusters (TKCs), and since U2 it is even possible to configure the proxy settings per TKC. However, if you want to use the TKG Extensions (Contour, FluentBit, Prometheus, and Grafana), you have to configure the proxy settings for the kapp controller as well. This blog post is a comprehensive guide on how and where to configure HTTP/HTTPS proxy settings in vSphere with Tanzu.

My Setup

To test and validate the available options, I have deployed a simple Squid proxy in my home-lab environment. If you want to reproduce it, you can find a simple installation and configuration guide for Squid here.
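
The relevant part of such a lab setup is only a handful of squid.conf lines. The following is a minimal sketch, assuming the proxy listens on the default port 3128 and all lab networks live in 192.168.0.0/16:

# /etc/squid/squid.conf (minimal lab sketch, not hardened for production)
http_port 3128                      # port used as 192.168.96.229:3128 throughout this post
acl homelab src 192.168.0.0/16      # assumption: all home-lab networks sit in this range
http_access allow homelab           # allow proxy access from the lab networks
http_access deny all                # deny everything else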

Additionally, I am running vSphere 7 U2 Build 17920168 (April 2021 release) with VDS-based networking and the NSX Advanced Load Balancer. Nevertheless, you can apply this guide to NSX-T based environments as well. Note that I am referring to vSphere with Tanzu and the Tanzu Kubernetes Grid Service (TKGS), not Tanzu Kubernetes Grid Multi-Cloud (TKGM). For TKGM version 1.2.x, look at this blog post by William Lam or check out the official documentation for 1.3.x deployments.

Proxy settings for Tanzu Kubernetes Clusters

There are two options available to configure the HTTP/HTTPS proxy settings for your TKCs: globally via the TkgServiceConfiguration or per cluster as described here. Both methods configure the proxy settings on the TKC nodes to ensure they can pull container images via the proxy. We will have a look at both options.

Global configuration

For the global configuration to work, we need at least vCenter 7 Update 1c. Please note that it is not enough to update the vCenter Server; we also have to make sure the Supervisor Cluster is updated.

The global configuration is implemented via the TkgServiceConfiguration custom resource on the Supervisor Cluster. It can be used to configure the defaultCNI plugin (Antrea or Calico), proxy settings, and additional trusted certificates for TKCs. The TkgServiceConfiguration applies to all TKCs on your Supervisor Cluster. The first time you apply the configuration, a rolling update is triggered for all existing TKCs. For subsequent changes, you have to perform a scale or patch operation on the existing clusters to roll out the new settings.
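
As a sketch (the cluster name and namespace below are placeholders from my environment), a simple merge patch that bumps the worker count is enough to trigger such a roll-out on an existing cluster:

# Run against the Supervisor Cluster context: scaling the workers forces a
# node roll-out that picks up the updated TkgServiceConfiguration
kubectl patch tanzukubernetescluster tkg-proxy-cl1 -n services \
  --type merge -p '{"spec":{"topology":{"workers":{"count":2}}}}'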

To change the configuration, connect to the Supervisor Cluster as administrator and either edit the existing tkg-service-configuration object or apply a predefined YAML file. For example, in my environment, the manifest looks as follows:

apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TkgServiceConfiguration
metadata:
  name: tkg-service-configuration
spec:
  defaultCNI: antrea
  proxy:
    httpProxy: http://192.168.96.229:3128
    httpsProxy: http://192.168.96.229:3128
    noProxy: [192.168.96.0/24,192.168.24.0/24,192.168.14.0/24,.local,.svc,.svc.cluster.local]
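
Applying it from my workstation looks roughly like the following; the Supervisor Cluster endpoint is a placeholder for my environment:

# Log in to the Supervisor Cluster with the vSphere plugin for kubectl
kubectl vsphere login --server=192.168.14.10 --vsphere-username administrator@vsphere.local
kubectl config use-context 192.168.14.10
# Apply the predefined manifest (or use kubectl edit tkgserviceconfigurations tkg-service-configuration)
kubectl apply -f tkg-service-configuration.yaml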

IMPORTANT: Ensure you have the correct noProxy settings specified. Otherwise, you will face issues later.

As a minimum, you have to add the Workload, Management, and Front-end network CIDRs as well as .local, .svc, and .svc.cluster.local (required for the kapp controller / TKG Extensions) to the noProxy configuration. For NSX-T based deployments, also add the Ingress and Egress networks (see the sketch after the list below).

As stated in the documentation, the following networks do not need to be added to the noProxy list:

  • Service CIDR of the Supervisor Cluster (no interaction)
  • Service and Pod CIDRs of the TKCs (will be added automatically)
  • .local and 127.0.0.1 (will be added automatically)
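
For an NSX-T based Supervisor Cluster, the proxy block would therefore look similar to the following sketch; the Ingress and Egress CIDRs (10.30.0.0/16 and 10.40.0.0/16) are placeholders:

spec:
  defaultCNI: antrea
  proxy:
    httpProxy: http://192.168.96.229:3128
    httpsProxy: http://192.168.96.229:3128
    # Workload, Management, and Front-end CIDRs plus the (placeholder) NSX-T Ingress and Egress CIDRs
    noProxy: [192.168.96.0/24,192.168.24.0/24,192.168.14.0/24,10.30.0.0/16,10.40.0.0/16,.local,.svc,.svc.cluster.local]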

After applying the configuration, you can verify it by querying the tkg-service-configuration object as follows:

(⎈ |services:services)➜ k get tkgserviceconfigurations tkg-service-configuration -oyaml
apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TkgServiceConfiguration
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"run.tanzu.vmware.com/v1alpha1","kind":"TkgServiceConfiguration","metadata":{"annotations":{},"name":"tkg-service-configuration"},"spec":{"defaultCNI":"antrea","proxy":{"httpProxy":"http://192.168.96.229:3128","httpsProxy":"http://192.168.96.229:3128","noProxy":["192.168.96.0/24","192.168.24.0/24","192.168.14.0/24",".local",".svc",".svc.cluster.local"]}}}
  creationTimestamp: "2021-05-07T12:41:05Z"
  generation: 2
  managedFields:
  - apiVersion: run.tanzu.vmware.com/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
      f:spec:
        .: {}
        f:defaultCNI: {}
        f:proxy:
          .: {}
          f:httpProxy: {}
          f:httpsProxy: {}
          f:noProxy: {}
    manager: kubectl-client-side-apply
    operation: Update
    time: "2021-06-17T14:35:31Z"
  name: tkg-service-configuration
  resourceVersion: "34800395"
  selfLink: /apis/run.tanzu.vmware.com/v1alpha1/tkgserviceconfigurations/tkg-service-configuration
  uid: c3a80495-38e2-426f-9280-93a6b53f48ad
spec:
  defaultCNI: antrea
  proxy:
    httpProxy: http://192.168.96.229:3128
    httpsProxy: http://192.168.96.229:3128
    noProxy:
    - 192.168.96.0/24
    - 192.168.24.0/24
    - 192.168.14.0/24
    - .local
    - .svc
    - .svc.cluster.local

We can verify that the proxy settings were applied on the TKCs by connecting to a TKC node via SSH and checking the containerd proxy drop-in (/etc/systemd/system/containerd.service.d/http-proxy.conf):

vmware-system-user@tkg-svc-cl2-workers-ks6jk-7ccb7575d7-4sb98 [ / ]$ cat /etc/systemd/system/containerd.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://192.168.96.229:3128"
Environment="HTTPS_PROXY=http://192.168.96.229:3128"
Environment="NO_PROXY=192.168.96.0/24,192.168.24.0/24,192.168.14.0/24,.local,.svc,.svc.cluster.local,10.96.0.0/12,192.168.0.0/16,localhost,127.0.0.1"

Per TKC configuration

Another option is to specify the proxy settings per TKC. The per-cluster option always overrides the global configuration. You have to configure the proxy settings as part of the TKC manifest as follows:

apiVersion: run.tanzu.vmware.com/v1alpha1      #TKG API endpoint
kind: TanzuKubernetesCluster                   #required parameter
metadata:
  name: tkg-proxy-cl1                         #cluster name, user defined
  namespace: services                     #supervisor namespace
spec:
  distribution:
    fullVersion: null
    version: v1.20                             #resolved kubernetes version
  topology:
    controlPlane:
      count: 1                                 #number of control plane nodes
      class: best-effort-small                 #vmclass for control plane nodes
      storageClass: gold         #storageclass for control plane
    workers:
      count: 1                                 #number of worker nodes
      class: best-effort-small                 #vmclass for worker nodes
      storageClass: gold         #storageclass for worker nodes
  settings:
    storage:
      defaultClass: gold
    network:
      cni:
        name: antrea
      pods:
        cidrBlocks:
        - 193.0.2.0/16
      services:
        cidrBlocks:
        - 195.51.100.0/12
      proxy:
        httpProxy: http://192.168.96.229:3128  #Proxy URL for HTTP connections
        httpsProxy: http://192.168.96.229:3128 #Proxy URL for HTTPS connections
        noProxy: [192.168.96.0/24,192.168.24.0/24,192.168.14.0/24,.local,.svc,.svc.cluster.local]
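
Creating the cluster and double-checking that the proxy settings were accepted is then straightforward; the manifest file name is a placeholder:

# Deploy the TKC into the Supervisor namespace
kubectl apply -f tkg-proxy-cl1.yaml
# Confirm the proxy settings made it into the cluster spec
kubectl get tanzukubernetescluster tkg-proxy-cl1 -n services -o jsonpath='{.spec.settings.network.proxy}'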

IMPORTANT: Ensure that your Proxy server’s IP address is not overlapping with the Pod or Service CIDR of your TKC!

If you haven’t specified custom CIDRs in your TKC manifest, the following default values apply:

  • Pod/Cluster CIDR = 192.168.0.0/16
  • Service CIDR = 10.96.0.0/12

To find out what CIDRs are currently configured for your cluster, you can execute the following commands:

(⎈ |tkg-proxy-cl1:default)➜  kubectl cluster-info dump | grep -m 1 cluster-cidr

                            "--cluster-cidr=193.0.2.0/16",
(⎈ |tkg-proxy-cl1:default)➜  kubectl cluster-info dump | grep -m 1 service-cluster-ip-range

                            "--service-cluster-ip-range=195.51.100.0/12",

As a reminder, make sure you have the correct noProxy settings configured and verify them as described in the “Global configuration” section. Be aware that the per-cluster option allows you to specify a serviceDomain as part of the TKC manifest. If you change the serviceDomain, you have to adjust the noProxy entries from .svc.cluster.local to the domain you have specified.
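
As a sketch, a custom serviceDomain and the matching noProxy entry would look like this (corp.local is a placeholder domain):

  settings:
    network:
      serviceDomain: corp.local                #custom service domain instead of cluster.local
      proxy:
        httpProxy: http://192.168.96.229:3128
        httpsProxy: http://192.168.96.229:3128
        noProxy: [192.168.96.0/24,192.168.24.0/24,192.168.14.0/24,.local,.svc,.svc.corp.local]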

Proxy settings for TKG Extensions

After the steps above, your TKCs should have the proxy server configured and should be able to pull container images through it. However, this is just half of the story. If you want to install any of the TKG Extensions such as Contour (Ingress controller), Grafana, Prometheus, or FluentBit, you need to configure the proxy for the kapp controller as well.

Follow the instructions in the official documentation to download and extract the TKG Extension bundle. Within the extracted folder, you can find a “kapp-controller-config.yaml” under the extensions folder (e.g. /tkg-extensions-v1.3.1+vmware.1/extensions/kapp-controller-config.yaml). Edit the file so that it looks similar to the following:

---
apiVersion: v1
kind: Namespace
metadata:
  name: tkg-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  # Name must be `kapp-controller-config` for kapp controller to pick it up
  name: kapp-controller-config
  # Namespace must match the namespace kapp-controller is deployed to
  namespace: tkg-system
data:
  # A cert chain of trusted ca certs. These will be added to the system-wide
  # cert pool of trusted ca's (optional)
  #  caCerts: |
  #    -----BEGIN CERTIFICATE-----
  #    Certificate 1
  #    -----END CERTIFICATE-----
  #    -----BEGIN CERTIFICATE-----
  #    Certificate 2
  #    -----END CERTIFICATE-----

  # The url/ip of a proxy for kapp controller to use when making network
  # requests (optional)
  httpProxy: "http://192.168.96.229:3128"

  # The url/ip of a tls capable proxy for kapp controller to use when
  # making network requests (optional)
  httpsProxy: "http://192.168.96.229:3128"

  # A comma delimited list of domain names which kapp controller should
  # bypass the proxy for when making requests (optional)
  noProxy: "localhost,127.0.0.1,kubernetes.default.svc,.svc,cluster.local,.local,195.51.100.0/12"

  # A comma delimited list of hostnames for which kapp controller should
  # skip TLS verification (optional)
  #dangerousSkipTLSVerify: "cert-manager-webhook.cert-manager.svc,cert-manager-webhook"

IMPORTANT: Ensure you comment out all settings you don’t need, such as the caCerts section, and specify the noProxy settings as listed below!

It is crucial to specify the proxy settings for the kapp controller and the following noProxy settings:

  • localhost
  • 127.0.0.1
  • kubernetes.default.svc
  • .svc
  • cluster.local
  • .local
  • Service CIDR of the TKC

Otherwise, you will end up in a situation where the reconciliation of the extension app fails in one way or another. Also, make sure to apply the kapp-controller-config.yaml before you deploy the kapp controller itself.

(⎈ |tkg-proxy-cl1:default)➜ k apply -f kapp-controller-config.yaml
namespace/tkg-system created
configmap/kapp-controller-config created
(⎈ |tkg-proxy-cl1:default)➜ k apply -f kapp-controller.yaml
namespace/tkg-system unchanged
serviceaccount/kapp-controller-sa created
customresourcedefinition.apiextensions.k8s.io/apps.kappctrl.k14s.io created
deployment.apps/kapp-controller created
clusterrole.rbac.authorization.k8s.io/kapp-controller-cluster-role created
clusterrolebinding.rbac.authorization.k8s.io/kapp-controller-cluster-role-binding created
(⎈ |tkg-proxy-cl1:default)➜  k get pods -n tkg-system
NAME                              READY   STATUS    RESTARTS   AGE
kapp-controller-cf9df8646-nq6jh   1/1     Running   0          10m

I am not covering the entire TKG Extension deployment process in this blog post. After the kapp controller is running, continue with the deployment and configuration of the TKG Extension as described in the official documentation.
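
Before deploying an extension, it can be worth double-checking that the ConfigMap really contains the proxy values you expect:

# Verify the proxy settings the kapp controller will pick up
kubectl get configmap kapp-controller-config -n tkg-system -o yaml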

If everything is configured correctly, you should see the app reconciliation of your TKG Extensions (e.g., Contour) succeeding and pods running.

(⎈ |tkg-proxy-cl1:default)➜  k get app -n tanzu-system-ingress
NAME      DESCRIPTION           SINCE-DEPLOY   AGE
contour   Reconcile succeeded   6s             64s
(⎈ |tkg-proxy-cl1:default)➜  k get pods -n tanzu-system-ingress --watch
NAME                      READY   STATUS    RESTARTS   AGE
contour-b7dfd9bf9-fcbbc   1/1     Running   0          50s
contour-b7dfd9bf9-tf2dp   1/1     Running   0          50s
envoy-nwzd9               2/2     Running   0          52s
envoy-w4l65               2/2     Running   0          52s
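
If the reconciliation fails instead, the status of the App resource usually shows whether the fetch or the deploy step ran into the proxy; for example (the usefulErrorMessage field is available in recent kapp-controller versions):

# Inspect the App status for fetch/deploy errors, e.g. image pulls blocked by the proxy
kubectl describe app contour -n tanzu-system-ingress
kubectl get app contour -n tanzu-system-ingress -o jsonpath='{.status.usefulErrorMessage}'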

Conclusion

To summarize, if you have to run vSphere with Tanzu behind a proxy, you need to configure the proxy settings on the Kubernetes cluster level and for the kapp controller if you want to use any TKG Extensions. For the Kubernetes cluster configuration, you can use either the global or the per-cluster option. The global option triggers a rolling update for all existing TKCs when it is first applied, while the per-cluster option gives you more granularity. Additionally, you will need to configure the proxy settings on the vCenter Server to sync the TKC Photon OS images via the Content Library subscription. I have not covered the vCenter proxy configuration in this blog post as it is pretty straightforward and described in the official documentation. Here are three simple checks you should perform to end up with a working proxy configuration.

  1. Make sure you have all required noProxy settings specified on the TKC level (Workload CIDR, Management CIDR, Front-end CIDR, .local, .svc, .svc.cluster.local, Ingress and Egress CIDR for NSX-T based deployments).
  2. Verify that the IP address of your proxy is not overlapping with the Pod or Service CIDR of your TKC.
  3. Configure the correct noProxy settings for the kapp controller (localhost,127.0.0.1,kubernetes.default.svc,.svc,cluster.local,.local,Service CIDR of your TKC) and comment out all unused settings from the kapp-controller-config.yaml.

5 responses to “Configure vSphere with Tanzu behind a Proxy plus TKG Extensions”

  1. What should we do when we have set up the global proxy configuration and want to change the proxy address?

    I was able to update the global proxy using the same method, but the updated proxy address only works for newly deployed TKG clusters; existing clusters still show the old proxy address even after the change in the global configuration.

    Is there any method to push the new proxy settings from the global configuration to each existing TKG cluster after changing the proxy address?

    1. Applying the TkgServiceConfiguration for the first time triggers a rolling update for all existing TKG clusters. Subsequently, for existing TKG clusters, you need to perform a scale or patch operation to trigger a recreation.

  2. […] they can be used. You might have read about Tanzu Kubernetes Grid (TKG) Extensions in my previous blog post or the official VMware documentation. However, Tanzu Packages are the evolution of TKG Extensions. […]

  3. What is the source IP address hitting the proxy? I have restricted access to the proxy and need to know which IP address I must add to the proxy configuration file.
    PS: Very nice article.

    1. Sorry for the late response. The source IP will be the IP from one of the worker nodes. So you have to configure the IP range specified for your worker nodes on your proxy. Depending on which Antrea version you are currently using, you could also look at the Egress feature explained here: https://antrea.io/docs/v1.3.0/docs/egress/
