Saturday, 17 August 2024

Deploying Kubernetes Event Driven Autoscaling (KEDA) AKS add-on with Terraform and Azure Pipelines

 We have discussed deploying Kubernetes Event Driven Autoscaling (KEDA) with workload identity in AKS in the post "Setting Up Kubernetes Event Driven Autoscaling (KEDA) in AKS with Workload Identity". Then we discussed how to use an Azure DevOps pipeline to automate the deployment of KEDA in the post "Deploying Kubernetes Event Driven Autoscaling (KEDA) with Azure Pipelines Using Helm". If you are setting up KEDA with Helm as discussed there, instead of using the AKS KEDA add-on, you will have to monitor the documentation here and ensure a supported version is used. However, with AKS it is better to use the Microsoft-supported KEDA add-on, as it has better support in case of an issue. Additionally, the add-on sets up the correct supported version of KEDA, based on the AKS Kubernetes version used, according to the documentation here. Let's see what we need to do in Terraform and in Azure Pipelines to get AKS set up with the KEDA add-on.

As the first step, we need to enable the KEDA add-on in the AKS deployment. We can do that as shown below in the azurerm_kubernetes_cluster Terraform resource.

workload_autoscaler_profile {
    keda_enabled = true
  }

A full example azurerm_kubernetes_cluster resource with the KEDA add-on enabled is shown below.

resource "azurerm_kubernetes_cluster" "aks_cluster" {

  lifecycle {
    ignore_changes = [default_node_pool[0].node_count]
  }

  name                         = "${var.prefix}-${var.project}-${var.environment_name}-aks-${var.deployment_name}"
  kubernetes_version           = local.kubernetes_version
  sku_tier                     = "Standard"
  location                     = var.location
  resource_group_name          = var.rg_name
  dns_prefix                   = "${var.prefix}-${var.project}-${var.environment_name}-aks-${var.deployment_name}-dns"
  node_resource_group          = "${var.prefix}-${var.project}-${var.environment_name}-aks-${var.deployment_name}-rg"
  image_cleaner_enabled        = false # As this is a preview feature keep it disabled for now. Once the feature is GA, it should be enabled.
  image_cleaner_interval_hours = 48

  network_profile {
    network_plugin    = "azure"
    load_balancer_sku = "standard"
  }

  storage_profile {
    file_driver_enabled = true
  }

  default_node_pool {
    name                 = "chlinux"
    orchestrator_version = local.kubernetes_version
    node_count           = 1
    enable_auto_scaling  = true
    min_count            = 1
    max_count            = 4
    vm_size              = "Standard_B4ms"
    os_sku               = "Ubuntu"
    vnet_subnet_id       = var.subnet_id
    max_pods             = 30
    type                 = "VirtualMachineScaleSets"
    scale_down_mode      = "Delete"
    zones                = ["1", "2", "3"]

    upgrade_settings {
      drain_timeout_in_minutes      = 0
      max_surge                     = "10%"
      node_soak_duration_in_minutes = 0
    }
  }

  timeouts {
    update = "180m"
    delete = "180m"
  }

  # Enable workload identity requires both below to be set to true
  oidc_issuer_enabled       = true
  workload_identity_enabled = true

  identity {
    type         = "UserAssigned"
    identity_ids = [var.user_assigned_identity]
  }
  
  windows_profile {
    admin_username = "nodeadmin"
    admin_password = "AdminPasswd@001" # NOTE: use a variable sourced from a secret store instead of a hardcoded password in real deployments
  }

  ingress_application_gateway {
    gateway_id = azurerm_application_gateway.aks.id
  }

  key_vault_secrets_provider {
    secret_rotation_enabled = false
  }

  workload_autoscaler_profile {
    keda_enabled = true
  }

  azure_active_directory_role_based_access_control {
    azure_rbac_enabled = false
    managed            = true
    tenant_id          = var.tenant_id

    # Add subscription owners as cluster admins
    admin_group_object_ids = [var.sub_owners_objectid] # Azure AD group object ID
  }

  oms_agent {
    log_analytics_workspace_id = var.log_analytics_workspace_id
  }

  depends_on = [
    azurerm_application_gateway.aks
  ]

  tags = merge(tomap({
    Service = "aks_cluster"
  }), var.tags)
}

In Terraform we have to make sure we add a federated identity credential for the KEDA operator, as shown below, to make it work with workload identity. You can find more information on workload identity setup in the post "Setting Up Azure Workload Identity for Containers in Azure Kubernetes Services (AKS) Using Terraform - Improved Security for Containers in AKS" (a user assigned identity is set up for AKS to support workload identity in Terraform as described in that post). Note that the subject should be set to "system:serviceaccount:kube-system:keda-operator", as KEDA is deployed to the kube-system namespace by the KEDA add-on for AKS.

# Federated identity credential for AKS user assigned id - used with workload identity service account for KEDA
resource "azurerm_federated_identity_credential" "keda" {
  name                = "${var.prefix}-${var.project}-${var.environment_name}-aks-keda-fic-${var.deployment_name}"
  resource_group_name = var.rg_name
  audience            = ["api://AzureADTokenExchange"]
  issuer              = azurerm_kubernetes_cluster.aks_cluster.oidc_issuer_url
  parent_id           = var.user_assigned_identity
  subject             = "system:serviceaccount:kube-system:keda-operator"

  depends_on = [
    azurerm_kubernetes_cluster.aks_cluster
  ]

}

In the Azure DevOps pipeline, deploy the Terraform resources with a pipeline task such as the one below.

      - task: TerraformCLI@0
        displayName: 'Run terraform apply attempt ${{ parameters.attempt }}'
        name: terraformApply
        inputs:
          command: apply
          environmentServiceName: '${{ parameters.serviceconnection }}'
          workingDirectory: "$(System.DefaultWorkingDirectory)/infra/Deployment/Terraform"
          commandOptions: -var-file=env.tfvars

After the AKS cluster is deployed, we have to get the KEDA trigger authentication created and update the keda-operator service account to use workload identity. We can set up the below YAML and apply it to the AKS cluster. demo is the namespace where my applications and scaled jobs are deployed (note that the user assigned identity is set up for AKS to support workload identity in Terraform as described in the post mentioned above).

# Namespace in AKS for Apps
apiVersion: v1
kind: Namespace
metadata:
  name: demo

---
# Service account for AKS workload identity for KEDA
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    azure.workload.identity/client-id: "${sys_aks_uai_client_id}$"
    azure.workload.identity/tenant-id: "${tenantid}$"
  name: keda-operator # Referred by AKS user assigned identity federated credential
  namespace: kube-system

---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: av-keda-trigger-auth
  namespace: demo
spec:
  podIdentity:
    provider: azure-workload
    identityId: ${sys_aks_uai_client_id}$
---
# Service account for AKS workload identity
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    azure.workload.identity/client-id: "${sys_aks_uai_client_id}$"
    azure.workload.identity/tenant-id: "${tenantid}$"
  name: demo-wi-sa # Referred by AKS user assigned identity federated credential
  namespace: demo

To get the above YAML (k8s_prerequisites.yaml) deployed, we can use the Azure Pipelines steps below.

      - task: qetza.replacetokens.replacetokens-task.replacetokens@5
        displayName: 'Replace tokens in k8s_prerequisites.yaml'
        inputs:
          rootDirectory: '$(System.ArtifactsDirectory)'
          targetFiles: 'k8s_prerequisites.yaml'
          actionOnMissing: fail
          tokenPattern: custom
          tokenPrefix: '${'
          tokenSuffix: '}$'

      - task: Kubernetes@1
        displayName: 'Deploy k8s prerequisites'
        inputs:
          connectionType: 'Azure Resource Manager'
          azureSubscriptionEndpoint: '${{ parameters.serviceconnection }}'
          azureResourceGroup: 'ch-demo-$(envname)-rg'
          kubernetesCluster: 'ch-demo-$(envname)-aks-$(sys_app_deploy_instance_suffix)'
          useClusterAdmin: true
          command: apply
          arguments: '-f k8s_prerequisites.yaml'
          workingDirectory: '$(System.ArtifactsDirectory)'

Then we have to ensure that the keda-operator deployment is restarted so that workload identity gets applied to the keda-operator containers (pods). We can use a pipeline task such as the one below to get that done.

The task executes the following command against the AKS cluster.

 kubectl rollout restart deploy keda-operator -n kube-system

      - task: AzureCLI@2
        displayName: 'Update KEDA operator identity'
        inputs:
          azureSubscription: '${{ parameters.serviceconnection }}'
          scriptType: pscore
          scriptLocation: inlineScript
          inlineScript: |
            $rgName = 'ch-demo-$(envname)-rg';
            $aksName = 'ch-demo-$(envname)-aks-$(sys_app_deploy_instance_suffix)';
           
            Write-Host $aksName
           
            az aks get-credentials -n $aksName -g $rgName --admin --overwrite-existing
           
            kubectl rollout restart deploy keda-operator -n kube-system

            kubectl config delete-context (-join($aksName,'-admin'))
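
Optionally (an addition for illustration, not part of the original script), you can wait for the restarted keda-operator pods to become ready before deleting the kubectl context, using the standard kubectl rollout status command.

 kubectl rollout status deploy keda-operator -n kube-system --timeout=120s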

Once these steps are completed, the AKS cluster is successfully deployed with KEDA, and scaled jobs or deployments can be set up to use KEDA to scale based on events such as messages arriving on a queue. Refer to the posts mentioned above for more information.
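
As a minimal sketch (the container image, Service Bus queue and namespace names below are illustrative assumptions, not values from this deployment), a ScaledJob in the demo namespace could use the trigger authentication defined earlier as follows.

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: demo-queue-processor
  namespace: demo
spec:
  jobTargetRef:
    template:
      metadata:
        labels:
          azure.workload.identity/use: "true" # allow the job pod itself to use workload identity
      spec:
        serviceAccountName: demo-wi-sa # service account defined in k8s_prerequisites.yaml
        containers:
          - name: processor
            image: demoacr.azurecr.io/demo-processor:latest # hypothetical worker image
        restartPolicy: Never
  pollingInterval: 30
  maxReplicaCount: 10
  triggers:
    - type: azure-servicebus
      metadata:
        queueName: demo-queue # hypothetical Service Bus queue
        namespace: demo-sb-ns # hypothetical Service Bus namespace
        messageCount: "5"
      authenticationRef:
        name: av-keda-trigger-auth # trigger authentication created earlier

The authenticationRef points KEDA at the azure-workload trigger authentication, so the keda-operator queries the queue using the user assigned identity instead of a connection string.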


When setting up trigger authentications, the user assigned identity used for workload identity should be granted the necessary access to read the queue information. Refer to the KEDA documentation at https://keda.sh/docs/2.15/scalers/ to find the permissions required by the workload identity user assigned identity for each scaler.


KEDA deployed with the add-on for AKS.


The keda-operator using workload identity. Note that all of the below environment variables get correctly set for workload identity to be used with the keda-operator via the above described steps.

  • AZURE_CLIENT_ID
  • AZURE_TENANT_ID
  • AZURE_FEDERATED_TOKEN_FILE
  • AZURE_AUTHORITY_HOST
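
To verify, the injected variables can be inspected on the running pod spec; a sample check, assuming the add-on keeps KEDA's default app=keda-operator pod label:

 kubectl get pods -n kube-system -l app=keda-operator -o jsonpath='{.items[0].spec.containers[0].env}'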


