Combining Terraform and GitOps for EKS Automation using Tanzu Mission Control

Nowadays, DevOps or Platform teams are challenged with abstracting away infrastructure complexity and building consumer-friendly solutions. These platforms must be easily adjustable, operable, scalable, and secure. That is no easy task if you consider that the required skillset in the cloud-native and Kubernetes landscape is rare, and often not enough resources are available internally or externally. This is also reflected in the State of Kubernetes 2023 report, where 57% of the participants answered that “inadequate internal experience and expertise” is the number one challenge in managing Kubernetes.

That is one of the many reasons why automation is one of the most essential topics for DevOps or Platform teams. In this blog post, I want to demonstrate how to use Infrastructure as Code (IaC) with Terraform in combination with GitOps to achieve end-to-end automation using VMware Tanzu Mission Control (TMC).

TMC is VMware’s multi-cloud Kubernetes management solution, and it can be used as a SaaS offering or as a self-managed deployment option on-premises. If you want to learn more about deploying TMC on-premises, look at one of my recent blog posts here.

For my scenario, I will use the TMC Terraform provider to deploy and configure an AWS EKS cluster in my region of choice. Further, we will use the Continuous Delivery (CD) feature of TMC, which is based on FluxCD, to configure our cluster and to deploy our workload in a GitOps fashion.

TMC offers a very convenient way to apply and continuously synchronize Kubernetes manifests in a GitOps approach to multiple clusters without the need to configure CD on each cluster. Kubernetes clusters can be grouped in so-called Cluster Groups. Platform Operators can then define security, access, CD sources (Git repository and path to manifests), and more once on the Cluster Group level, and every cluster will inherit the configuration. FluxCD will be automatically deployed on each cluster in the Cluster Group, and the content from the Git repository will be picked up and synchronized with all clusters.

We will make use of TMC's CD functionality on the Cluster Group level in this article. If you want to learn more about FluxCD and how it works under the covers, look at one of my previous blog posts here.

What do we need?

Before we can start, we need the following prerequisites in place.

  1. A VMware TMC account with a trial or purchased subscription. We are going to use the SaaS version of TMC.
  2. An AWS account for our EKS deployment.
  3. Terraform cli installed on our client. As I am using macOS, I simply used Homebrew as described here and installed version 1.5.7 (a Homebrew example is shown right after this list).
  4. The latest TMC Terraform provider from here. Version 1.2.4 at the time of writing this article.
  5. Access to a Git-based code repository and the git cli installed on our client.
  6. The latest Tanzu cli installed on our client.
  7. Compatible kubectl version available on our client.
  8. Optional: Visual Studio Code with the HashiCorp Terraform extension.
  9. Optional: AWS cli installed on our client.
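
For reference, a quick Homebrew-based setup on macOS could look like the example below. The formula and tap names, especially for the Tanzu CLI, may change over time, so double-check the respective install documentation:

➜  brew tap hashicorp/tap
➜  brew install hashicorp/tap/terraform
➜  brew install awscli kubernetes-cli
➜  brew install vmware-tanzu/tanzu/tanzu-cli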

Let’s put it together!

1. Connect the AWS account with TMC

The first thing we will do is to configure our AWS account in TMC. Once done, we can deploy and life-cycle EKS clusters from TMC. Besides EKS, TMC supports cluster life-cycle functionality for Tanzu Kubernetes Grid (TKG) and Microsoft AKS clusters. Other Kubernetes-conformant clusters (DIY, Rancher, OpenShift…) can be attached to TMC, but cluster life-cycle operations will not be possible.

The process is pretty straightforward, and we are not going to cover it in much detail. Simply follow the documentation here to generate a Cloud Formation Stack template and run it via the AWS console; this creates the necessary AWS role. After the creation of the stack, we need to copy the role ARN from the outputs section and use it in TMC to establish the connection.

If we have done everything correctly, we should see the credentials showing up as “valid” under the Administration/Accounts section of TMC:

Additionally, we need to make a note of two more role ARNs that have been created during our stack run. We can collect these using the AWS console by going through the resources section of the stack run, or we use the AWS cli. Simply insert the GeneratedTemplateID of your Cloud Formation Stack template in the command, and you should get your roles and the associated ARNs.

➜  aws iam list-roles | grep 'worker.<GeneratedTemplateID>\|control-plane.<GeneratedTemplateID>'
            "RoleName": "control-plane.<GeneratedTemplateID>.eks.tmc.cloud.vmware.com",
            "Arn": "arn:aws:iam::012345678912:role/control-plane.<GeneratedTemplateID>.eks.tmc.cloud.vmware.com",
            "RoleName": "worker.<GeneratedTemplateID>.eks.tmc.cloud.vmware.com",
            "Arn": "arn:aws:iam::012345678912:role/worker.<GeneratedTemplateID>.eks.tmc.cloud.vmware.com",

2. Create a VPC for EKS cluster life-cycle

Before we can start using our newly added AWS account with TMC for EKS cluster life-cycle, we need to create a VPC with public and private subnets as described here. I am not going to cover the VPC creation task in detail in this blog post. Nevertheless, we have to execute a similar process to the one before, using a Cloud Formation Stack template in our AWS region of choice. Simply follow the procedure documented here for “public and private subnets” and use the URL for the IPv4 configuration. After creating the stack, we should make a note of the created subnets, security groups, and VPC, which we can find under the outputs section.
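
If you prefer the AWS cli over the console, the stack can also be created and inspected from the command line. The following is just a sketch: the stack name is arbitrary, and the template URL is a placeholder for the “public and private subnets” template referenced in the documentation.

➜  aws cloudformation create-stack --stack-name tmc-eks-vpc --region us-west-2 \
      --template-url <public-and-private-subnets-template-url>
➜  aws cloudformation wait stack-create-complete --stack-name tmc-eks-vpc --region us-west-2
➜  aws cloudformation describe-stacks --stack-name tmc-eks-vpc --region us-west-2 \
      --query 'Stacks[0].Outputs'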

Once that is done, we should be able to create and life-cycle EKS clusters from the TMC UI. Nevertheless, we want to do it in an automated way, so let’s move on to the TMC Terraform provider.

3. Prepare the TMC Terraform Provider

We should have the TMC Terraform provider already downloaded to a folder that we will use to work with Terraform-related files. Now we need to unpack the zip file and inspect the files.

➜  tar -xzf terraform-provider-tanzu-mission-control_1.2.4_darwin_amd64.zip
➜  ls
LICENSE                                                         terraform-provider-tanzu-mission-control_1.2.4_darwin_amd64.zip
README.md                                                       terraform-provider-tanzu-mission-control_v1.2.4

To configure the provider, we must create a main.tf and variables.tf file in the same folder. We specify the TMC Terraform provider version and two variables for the API endpoint and token in the main.tf as shown here:

terraform {
  required_providers {
    tanzu-mission-control = {
      source = "vmware/tanzu-mission-control"
      version = "1.2.4"
    }
  }
}

provider "tanzu-mission-control" {
  endpoint = var.endpoint
  vmw_cloud_api_token = var.vmw_cloud_api_token
}

In the variables.tf we will define the type and value of our variables:

variable "endpoint" {
  description = "The Tanzu Mission Control service endpoint hostname"
  type = string
  default = "your-tmc-org-api-endpoint"
}

variable "vmw_cloud_api_token" {
  description = "The VMware Cloud Services API Token"
  type = string
  default = "your-api-token"
}

In case you are wondering where to get the API endpoint and token from: the endpoint can quickly be taken from the browser URL once you are logged in to your TMC cloud org.

The API token can be created from the “My Account” section by clicking on your name in the top right corner. Select the “API Tokens” tab and click “GENERATE TOKEN”. Make sure the token has enough privileges to create workload clusters.
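
Instead of hardcoding the values as defaults in variables.tf, you can also pass them in as environment variables; Terraform automatically picks up variables prefixed with TF_VAR_. For example (the values shown are placeholders):

➜  export TF_VAR_endpoint="your-tmc-org-api-endpoint"
➜  export TF_VAR_vmw_cloud_api_token="your-api-token"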

After we have created the necessary files, we can initialize our Terraform backend with the following command:

➜  terraform terraform init

Initializing the backend...

Initializing provider plugins...
- Finding vmware/tanzu-mission-control versions matching "1.2.4"...
- Installing vmware/tanzu-mission-control v1.2.4...
- Installed vmware/tanzu-mission-control v1.2.4 (signed by a HashiCorp partner, key ID 6B6B0F38607A2264)

Partner and community providers are signed by their developers.
If you'd like to know more about provider signing, you can read about it here:
https://www.terraform.io/docs/cli/plugins/signing.html

Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

Great, the initialization finished successfully. Now, let’s continue with writing some Terraform scripts.

4. Create a Terraform script for Cluster Group creation

We can create the Cluster Group and configure CD via the TMC UI. However, we want to use Terraform and create everything automatically in one go. We can start by using the example shown here. The only things we might want to change are the name and the labels.

# Create Tanzu Mission Control cluster group
resource "tanzu-mission-control_cluster_group" "create_cluster_group" {
  name = "aullah-demo-clgrp"
  meta {
    description = "Create cluster group through terraform"
    labels = {
      "owner" : "aullah",
    }
  }
}

5. Create a Terraform script for CD configuration

Next, we want to configure CD on our Cluster Group. To accomplish this, we will create two Terraform scripts. One in which we specify the Git Repository we want to connect and another for the Kustomization, which defines the path to our Kubernetes manifest within the repository.

I am just going to use a simple web app for our CD configuration, but it could also be a cluster configuration, a Tanzu Package, or any kind of Kubernetes manifest that you want to be synced with every cluster in the Cluster Group.

As we need specific resources, such as the Cluster Group, to be available before creating our CD resources, we will use Terraform’s “References to named values” to create an implicit dependency between the two resources.

Here are some of the parameters for the Git Repository Terraform script you might want to change according to your needs:

Parameter | Description
name | Specify the name of the TMC Git repository object.
namespace_name | Only change this if you want your FluxCD custom resources created in a different namespace. TMC will create and use the “tanzu-continuousdelivery-resources” namespace per default!
scope.cluster_group.name | Specify the name of your TMC Cluster Group. In this example, we reference a named value from the Cluster Group creation script to create an implicit dependency.
spec.url | The URL of the Git repository that you want to connect.
spec.interval | The interval in which the FluxCD controller will check the repository.
spec.ref.branch | The Git branch of your repository you want to use.

# Create Tanzu Mission Control git repository on the cluster group level
resource "tanzu-mission-control_git_repository" "create_cluster_group_git_repository" {
  name = "my-demo-repo" # Required

  namespace_name = "tanzu-continuousdelivery-resources" #Required

  scope {
    cluster_group {
      name = tanzu-mission-control_cluster_group.create_cluster_group.name # Required
    }
  }

  meta {
    description = "my demo app repo"
    labels      = { "owner" : "aullah" }
  }

  spec {
    url                = "https://github.com/beyondelastic/vmware-explore-app" # Required
    interval           = "5m"    # Default: 5m
    git_implementation = "GO_GIT" # Default: GO_GIT
    ref {
      branch = "main"
    }
  }
}

Let’s continue with the script to create the Kustomization resource. Please review the parameters and change them according to your needs (the name, namespace_name, and scope.cluster_group.name parameters are not listed again, as the approach is the same as explained above):

Parameter | Description
spec.path | The path within your Git repository where the Kubernetes manifests are stored.
spec.prune | Clean up objects from the cluster if the referenced object is missing (true/false).
spec.interval | Interval in which the controller detects and corrects drift.
spec.target_namespace | The target namespace to which your Kubernetes manifests will be applied.
spec.source.name | The Git repository object name we are referencing. To create an implicit dependency, we reference a named value from the Git repository creation script.

# Create Tanzu Mission Control kustomization on the cluster group level
resource "tanzu-mission-control_kustomization" "create_cluster_group_kustomization" {
  name = "my-demo-app" # Required

  namespace_name = "tanzu-continuousdelivery-resources" #Required

  scope {
    cluster_group {
      name = tanzu-mission-control_cluster_group.create_cluster_group.name # Required
    }
  }

  meta {
    description = "my demo app"
    labels      = { "owner" : "aullah" }
  }

  spec {
    path             = "/deployment" # Required
    prune            = "false"
    interval         = "5m" # Default: 5m
    target_namespace = "default"
    source {
      name      = tanzu-mission-control_git_repository.create_cluster_group_git_repository.name      # Required
      namespace = "tanzu-continuousdelivery-resources" # Required
    }
  }
}

To summarize what these two scripts will do: they create our CD resources on the Cluster Group level within TMC. This triggers a FluxCD installation on each cluster in the Cluster Group and applies the FluxCD GitRepository and Kustomization custom resources. FluxCD will then do its magic and continuously synchronize the content from our repository with every cluster. This is pretty cool, but we still need a Kubernetes cluster in our Cluster Group, so let’s move on to the next step.

6. Create a Terraform script for EKS deployments

We can follow the EKS guide from the TMC Terraform provider to create a script with the correct parameters. We must reuse the IDs (subnet, security group) and role ARNs (control-plane, worker) created during our Cloud Formation Stack runs. Here are some of the parameters you need to change according to your environment:

Parameter | Description
resource.credential_name | The TMC credential name we created while connecting our AWS account with TMC.
resource.region | Specify the region where you want your EKS cluster to be deployed (the region in which you created your VPC).
resource.name | Specify the name of your EKS cluster.
resource.ready_wait_timeout | Wait time for cluster creation to finish.
resource.spec.cluster_group | Specify the name of the TMC Cluster Group. In this example, we use a reference to a named value from the Cluster Group creation script to create an implicit dependency.
resource.spec.config.role_arn | The control-plane role ARN we created and identified at the beginning.
resource.spec.config.kubernetes_version | Define one of the supported Kubernetes versions for your EKS cluster.
resource.spec.config.vpc.security_groups | AWS security group(s) that have been created as part of our VPC creation.
resource.spec.config.vpc.subnet_ids | AWS subnet IDs that have been created during the VPC creation.
resource.spec.nodepool.spec.role_arn | The worker role ARN that we created and identified at the beginning.
resource.spec.nodepool.spec.ami_type | Specify the AMI type of your choice.
resource.spec.nodepool.spec.root_disk_size | The root disk size for your nodes in GiB.
resource.spec.nodepool.spec.subnet_ids | AWS subnet IDs that have been created during the VPC creation.
resource.spec.nodepool.spec.scaling_config | Desired node count and min/max settings.
resource.spec.nodepool.spec.instance_types | AWS instance type for your worker nodes.

I recommend double-checking all available parameters and adding or changing the script according to your needs. A complete list can be found here.

# Create a Tanzu Mission Control AWS EKS cluster entry
resource "tanzu-mission-control_ekscluster" "tf_eks_cluster" {
  credential_name = "aullah-aws-eks-tmc"          // Required
  region          = "us-west-2"         // Required
  name            = "aullah-eks-cl1" // Required

  ready_wait_timeout = "45m" // Wait time for cluster operations to finish (default: 30m).

  meta {
    description = "aullah eks cluster"
    labels      = { "owner" : "aullah" }
  }

  spec {
    cluster_group = tanzu-mission-control_cluster_group.create_cluster_group.name // Default: default

    config {
      role_arn = "arn:aws:iam::xxxxxxxxxxxx:role/control-plane.<GeneratedTemplateID>.eks.tmc.cloud.vmware.com" // Required, forces new

      kubernetes_version = "1.27" // Required
      tags               = { "tagkey" : "tagvalue" }

      kubernetes_network_config {
        service_cidr = "10.100.0.0/16" // Forces new
      }

      logging {
        api_server         = false
        audit              = false
        authenticator      = false
        controller_manager = false
        scheduler          = false
      }

      vpc { // Required
        enable_private_access = true
        enable_public_access  = true
        public_access_cidrs = [
          "0.0.0.0/0",
        ]
        security_groups = [ // Forces new
          "sg-xxxxxxxxxxxx",
        ]
        subnet_ids = [ // Forces new
          "subnet-xxxxxxxxxxxx1",
          "subnet-xxxxxxxxxxxx2",
          "subnet-xxxxxxxxxxxx3",
          "subnet-xxxxxxxxxxxx4",
        ]
      }
    }

    nodepool {
      info {
        name        = "fist-np"
        description = "first eks nodepool"
      }

      spec {
        role_arn = "arn:aws:iam::xxxxxxxxxxxx:role/worker.<GeneratedTemplateID>.eks.tmc.cloud.vmware.com" // Required

        ami_type       = "AL2_x86_64"
        capacity_type  = "ON_DEMAND"
        root_disk_size = 30 // Default: 20GiB
        tags           = { "nptag" : "nptagvalue9" }
        node_labels    = { "nplabelkey" : "nplabelvalue" }

        subnet_ids = [ // Required
          "subnet-xxxxxxxxxxxx1",
          "subnet-xxxxxxxxxxxx2",
          "subnet-xxxxxxxxxxxx3",
          "subnet-xxxxxxxxxxxx4",
        ]

        scaling_config {
          desired_size = 3
          max_size     = 3
          min_size     = 2
        }

        update_config {
          max_unavailable_nodes = "1"
        }

        instance_types = [
          "m3.large",
        ]

      }
    }
  }
}

As mentioned above, many of these input parameters should instead be declared as variables to make the script reusable; a brief sketch of what that could look like is shown below. Nevertheless, for the sake of simplicity, I entered the values directly in this walkthrough.
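
For example, the region and the cluster name could be moved into variables.tf and referenced from the resource block. This is only a sketch with illustrative defaults:

variable "region" {
  description = "AWS region for the EKS cluster"
  type        = string
  default     = "us-west-2"
}

variable "cluster_name" {
  description = "Name of the EKS cluster managed by TMC"
  type        = string
  default     = "aullah-eks-cl1"
}

# ...and in the tanzu-mission-control_ekscluster resource:
#   region = var.region
#   name   = var.cluster_name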

Let’s run it!

We should have all the necessary Terraform scripts in our working directory and can finally test if everything works as expected. 🥁🥁🥁 We simply have to execute the “terraform plan” command to create and review the execution plan, followed by a “terraform apply” if things look good.

➜ terraform plan -out tfplan.out

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # tanzu-mission-control_cluster_group.create_cluster_group will be created
  + resource "tanzu-mission-control_cluster_group" "create_cluster_group" {
      + id   = (known after apply)
      + name = "aullah-demo-clgrp"

      + meta {
          + annotations      = (known after apply)
          + description      = "Create cluster group through terraform"
          + labels           = {
              + "owner" = "aullah"
            }
          + resource_version = (known after apply)
          + uid              = (known after apply)
        }
    }

  # tanzu-mission-control_ekscluster.tf_eks_cluster will be created
  + resource "tanzu-mission-control_ekscluster" "tf_eks_cluster" {

...
...
...

Plan: 4 to add, 0 to change, 0 to destroy.

────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Saved the plan to: tfplan.out

To perform exactly these actions, run the following command to apply:
    terraform apply "tfplan.out"

➜ terraform apply tfplan.out
tanzu-mission-control_cluster_group.create_cluster_group: Creating...
tanzu-mission-control_cluster_group.create_cluster_group: Creation complete after 6s [id=cg:01HBQRCE7XKH4HXSVXSD1WNACK]
tanzu-mission-control_git_repository.create_cluster_group_git_repository: Creating...
tanzu-mission-control_ekscluster.tf_eks_cluster: Creating...
tanzu-mission-control_git_repository.create_cluster_group_git_repository: Creation complete after 1s [id=cggitrepo:01HBQRCFAT2YHQE4Y649YJJW3F]
tanzu-mission-control_kustomization.create_cluster_group_kustomization: Creating...
tanzu-mission-control_kustomization.create_cluster_group_kustomization: Creation complete after 2s [id=cgks:01HBQRCGP95PKG8D1KQDVAS488]
tanzu-mission-control_ekscluster.tf_eks_cluster: Still creating... [10s elapsed]
tanzu-mission-control_ekscluster.tf_eks_cluster: Still creating... [20s elapsed]
tanzu-mission-control_ekscluster.tf_eks_cluster: Still creating... [30s elapsed]
...
...
...
tanzu-mission-control_ekscluster.tf_eks_cluster: Still creating... [5m40s elapsed]
tanzu-mission-control_ekscluster.tf_eks_cluster: Still creating... [5m50s elapsed]
tanzu-mission-control_ekscluster.tf_eks_cluster: Still creating... [6m0s elapsed]
tanzu-mission-control_ekscluster.tf_eks_cluster: Creation complete after 6m6s [id=c:01HBQRCF32D437SAFEFEM8AS9D]

Apply complete! Resources: 4 added, 0 changed, 0 destroyed.

That looks good. Let’s see if the EKS cluster got deployed and the TMC configuration is in place.

Let’s check the results!

The TMC Cluster Group should be available instantly, and we should see the EKS cluster being created and listed as part of the group.

The Git repository and the Kustomization should be visible under the “Add-ons” tab of the Cluster Group.

Nevertheless, as the cluster is not ready yet, the “Cluster sync status” will still show “Enabling”.

Once the EKS cluster is ready, it will install the FluxCD packages and continuously sync our content according to our CD configuration. Finally, the “Cluster sync status” will change to “Enabled”.

In the Namespaces section of TMC, we can check if our example app has been deployed.

We can now get kubectl access to the EKS cluster by using the TMC UI or the tanzu cli. We will work with the tanzu cli to check the created resources via kubectl. For this to work, you need to have a TMC context configured and working as mentioned here.

First, we will gather the kubeconfig file and switch our kubectl context via the following commands:

➜  ~ tanzu mission-control cluster kubeconfig get <your-cluster-name-as-shown-in-tmc> -m eks -p eks >> .kube/config
➜  ~ k config use-context <your-cluster-name-as-shown-in-tmc>
Switched to context "<your-cluster-name-as-shown-in-tmc>".
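
Optionally, we can verify that the FluxCD custom resources created by TMC have been synced to the cluster. Assuming the FluxCD CRDs have been installed by TMC as part of enabling CD, the following minimal check should list our GitRepository and Kustomization in the namespace configured earlier:

➜  ~ k get gitrepositories,kustomizations -n tanzu-continuousdelivery-resources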

Let’s have a final look at our test app that should be deployed on our EKS cluster.

➜  ~ k get pods,svc
NAME                                     READY   STATUS    RESTARTS   AGE
pod/vmware-explore-app-f95dfbfb9-5b82m   1/1     Running   0          65m
pod/vmware-explore-app-f95dfbfb9-hq857   1/1     Running   0          65m

NAME                         TYPE           CLUSTER-IP     EXTERNAL-IP                                                              PORT(S)        AGE
service/kubernetes           ClusterIP      10.100.0.1     <none>                                                                   443/TCP        100m
service/vmware-explore-app   LoadBalancer   10.100.7.124   a0200c87768344e129eb75ea85907934-xxxxxxxx.us-west-2.elb.amazonaws.com   80:31265/TCP   65m

We can also reach the test app via the assigned LoadBalancer URL.

Great, our test app is deployed via CD and available on our EKS cluster. Any changes I commit against the app’s Git repository will be automatically synced with my cluster.
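
For example, a change to the app manifests could be rolled out with a plain Git commit; the file path and commit message below are purely illustrative:

➜  git add deployment/
➜  git commit -m "update vmware-explore-app deployment"
➜  git push origin main

FluxCD will pick up the change within the configured interval (5m in our example) and reconcile it without any further interaction with the cluster.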

Summary

Combining Terraform or IaC with the TMC GitOps capabilities makes a lot of sense and helps DevOps teams automate their daily tasks, such as cluster provisioning, cluster configuration, or even workload deployments. Grouping Kubernetes clusters in organizational entities and using CD on the TMC Cluster Group further reduces the effort to ensure equal configuration for multiple clusters.

Additionally, the TMC Terraform provider offers much more than we have covered in this post. Security policies, access policies, network policies, and more can be created on various levels of TMC’s hierarchical policy structure.

At this point, you might think, is it possible to use GitOps for my Terraform scripts too? The answer is yes. There is a Terraform controller available from the FluxCD project that does exactly that.
