Saturday, 16 December 2023

Setting Up Azure Workload Identity for Containers in Azure Kubernetes Service (AKS) Using Terraform - Improved Security for Containers in AKS

Azure Workload Identity allows your containers in AKS to use a managed identity to access Azure resources securely, without depending on connection strings, passwords, access keys or secrets. In other words, you can simply use DefaultAzureCredential in your containers running in AKS, and it will use the workload identity assigned to the container to get access to the required Azure resource. Role-based access permissions remain in effect, so the user assigned managed identity used to set up the workload identity in AKS must be given the necessary roles on the target Azure resource (an AD app registration can be used as well, but a user assigned managed identity is recommended). This is far better than having to store secrets or connection strings to be used by the dotnet applications. In this post, let's understand how to set up workload identity for containers deployed in AKS, and explore how it simplifies the dotnet application code by allowing the application to access Azure resources securely with a managed identity.

Full example source code, with Terraform and a .NET application using default credentials to access the App Configuration service and Key Vault, is available here in my GitHub repo.

The first step of setting up workload identity in AKS is to enable the OIDC (OpenID Connect) issuer and workload identity. In the Terraform resource azurerm_kubernetes_cluster, set the following.

  oidc_issuer_enabled       = true
  workload_identity_enabled = true

An example of the AKS cluster Terraform code is below.

resource "azurerm_kubernetes_cluster" "aks_cluster" {

  lifecycle {
    ignore_changes = [default_node_pool[0].node_count]
  }

  name                         = "${var.prefix}-${var.project}-${var.environment_name}-aks-${var.deployment_name}"
  kubernetes_version           = local.kubernetes_version
  sku_tier                     = "Standard"
  location                     = var.location
  resource_group_name          = var.rg_name
  dns_prefix                   = "${var.prefix}-${var.project}-${var.environment_name}-aks-${var.deployment_name}-dns"
  node_resource_group          = "${var.prefix}-${var.project}-${var.environment_name}-aks-${var.deployment_name}-rg"
  image_cleaner_enabled        = false # As this is a preview feature, keep it disabled for now. Once the feature is GA, it should be enabled.
  image_cleaner_interval_hours = 48

  network_profile {
    network_plugin    = "azure"
    load_balancer_sku = "standard"
  }

  storage_profile {
    file_driver_enabled = true
  }

  default_node_pool {
    name                 = "chlinux"
    orchestrator_version = local.kubernetes_version
    node_count           = 1
    enable_auto_scaling  = true
    min_count            = 1
    max_count            = 7
    vm_size              = "Standard_DS4_v2"
    os_sku               = "Ubuntu"
    vnet_subnet_id       = var.subnet_id
    max_pods             = 30
    type                 = "VirtualMachineScaleSets"
    scale_down_mode      = "Delete"
    zones                = ["1", "2", "3"]
  }

  oidc_issuer_enabled       = true # Allow creating the OpenID Connect issuer URL to be used in the federated identity credential
  workload_identity_enabled = true # Enable workload identity in AKS

  identity {
    type = "SystemAssigned"
  }

  ingress_application_gateway {
    gateway_id =
  }

  key_vault_secrets_provider {
    secret_rotation_enabled = false
  }

  azure_active_directory_role_based_access_control {
    azure_rbac_enabled = false
    managed            = true
    tenant_id          = var.tenant_id

    # add sub owners as cluster admin
    admin_group_object_ids = [
    var.sub_owners_objectid] # azure AD group object ID
  }

  oms_agent {
    log_analytics_workspace_id = var.log_analytics_workspace_id
  }

  depends_on = [
  ]

  tags = merge(tomap({
    Service = "aks_cluster"
  }), var.tags)
}
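
To confirm that the issuer was actually enabled after the deployment, it can help to expose the issuer URL as a Terraform output. The snippet below is only a sketch: it assumes the cluster resource is named aks_cluster as in the example above, and the output name aks_oidc_issuer_url is a made-up name for illustration.

```hcl
# Hypothetical output name. The oidc_issuer_url attribute is the same one
# referenced later by the federated identity credential.
output "aks_oidc_issuer_url" {
  value = azurerm_kubernetes_cluster.aks_cluster.oidc_issuer_url
}
```

Running terraform output aks_oidc_issuer_url after the apply should then print the URL that the federated identity credential is going to trust.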

As the next step, we have to set up a user assigned managed identity, which will be used as the workload identity in AKS.

# User assigned identity to use as workload identity in AKS
resource "azurerm_user_assigned_identity" "aks" {
  location            = azurerm_resource_group.instancerg.location
  name                = "${var.PREFIX}-${var.PROJECT}-${var.ENVNAME}-aks-uai"
  resource_group_name =
}

Then we need to set up a federated identity credential on the user assigned identity for the AKS service account, which will be used to assign the identity to each pod. The federated identity credential setup needs the OIDC issuer URL of the AKS cluster. To understand the concepts in detail, read the docs here.

# Federated identity credential for AKS user assigned id - to be used with workload identity service account
resource "azurerm_federated_identity_credential" "aks" {
  name                = "${var.prefix}-${var.project}-${var.environment_name}-aks-fic-${var.deployment_name}"
  resource_group_name = var.rg_name
  audience            = ["api://AzureADTokenExchange"]
  issuer              = azurerm_kubernetes_cluster.aks_cluster.oidc_issuer_url
  parent_id           = var.user_assigned_identity
  subject             = "system:serviceaccount:widemo:wi-demo-sa" # system:serviceaccount:aksapplicationnamespace:workloadidentityserviceaccountname

  depends_on = [
  ]

  lifecycle {
    ignore_changes = []
  }
}
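
The subject has to match the namespace and service account name of the Kubernetes service account exactly, in the form system:serviceaccount:namespace:serviceaccountname; if it does not match, the token exchange is rejected. One possible way to keep the pieces in sync is to compose the subject from locals. This is a sketch only, and the local names (wi_namespace, wi_service_account, wi_subject) are hypothetical names that are not part of the original example.

```hcl
# Hypothetical locals; the values mirror the namespace and service account
# name used in this post.
locals {
  wi_namespace       = "widemo"
  wi_service_account = "wi-demo-sa"

  # Subject format expected by the federated identity credential:
  # system:serviceaccount:<namespace>:<service account name>
  wi_subject = "system:serviceaccount:${local.wi_namespace}:${local.wi_service_account}"
}
```

The federated identity credential could then use subject = local.wi_subject instead of the hard-coded string.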

We can use terraform output to obtain the client id of the user assigned managed identity we created. This is required for later use with the service account creation.

output "aks_uai_client_id" {
  value = azurerm_user_assigned_identity.aks.client_id
}

Once the Terraform code is deployed and the AKS cluster is created, we can use kubectl to create the service account as shown below.

apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    azure.workload.identity/client-id: userassignedidentitycientid #${USER_ASSIGNED_CLIENT_ID}$ # user Assigned identity client ID (aks_uai_client_id output from Terraform)
    azure.workload.identity/tenant-id: tenantid #${AZURE_TENANT_ID}$ # Azure tenant id
    # azure.workload.identity/service-account-token-expiration: "3600" # Default is 3600. Supported range is 3600-86400. Configure to avoid down time in token refresh. Setting in Pod spec takes precedence.
  name: wi-demo-sa
  namespace: widemo
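
If you would rather keep this step in Terraform instead of kubectl, the same service account could probably be declared with the hashicorp/kubernetes provider. This is an alternative sketch, not what the example repo does: it assumes that provider is already configured against the AKS cluster, that the widemo namespace already exists, and that var.TENANTID holds the tenant id. The resource label wi_demo is a hypothetical name.

```hcl
# Alternative sketch (assumes a configured hashicorp/kubernetes provider
# and an existing "widemo" namespace). The label "wi_demo" is hypothetical.
resource "kubernetes_service_account_v1" "wi_demo" {
  metadata {
    name      = "wi-demo-sa"
    namespace = "widemo"

    annotations = {
      # Client ID of the user assigned identity created earlier
      "azure.workload.identity/client-id" = azurerm_user_assigned_identity.aks.client_id
      # Azure tenant id
      "azure.workload.identity/tenant-id" = var.TENANTID
    }
  }
}
```

With this approach the client id flows straight from the identity resource into the annotation, so the Terraform output is not needed for this step.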

We can deploy our application pods with workload identity enabled, as shown below in the pod template.

  template:
    metadata:
      labels:
        app: wi-api
        service: wi-api
        azure.workload.identity/use: "true" # Required to make the containers in the pod use the workload identity
      # annotations:
      #   azure.workload.identity/service-account-token-expiration: "3600" # Configure to avoid downtime in token refresh. Takes precedence over the service account setting. Default 3600, acceptable range: 3600 - 86400 seconds.
      #   azure.workload.identity/skip-containers: "container1;container2" # Containers to skip using workload identity. By default all containers in the pod will use workload identity when the pod is labeled with azure.workload.identity/use: true
      #   azure.workload.identity/inject-proxy-sidecar: "true" # Default true. The proxy sidecar is used to intercept token requests to IMDS (Azure Instance Metadata Service) and acquire an AAD token on behalf of the user with federated identity credential.
      #   azure.workload.identity/proxy-sidecar-port: "8000" # Port of the proxy sidecar. Default 8000
    spec:
      serviceAccountName: wi-demo-sa # Service account (see aks_manifests\prerequisites\k8s.yaml) will provide identity to the pod

A full example deployment is below.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: wi-api
  namespace: widemo
  labels:
    app: wi-api
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 50%
      maxUnavailable: 25%
  minReadySeconds: 0
  selector:
    matchLabels:
      service: wi-api
  template:
    metadata:
      labels:
        app: wi-api
        service: wi-api
        azure.workload.identity/use: "true" # Required to make the containers in the pod use the workload identity
      # annotations:
      #   azure.workload.identity/service-account-token-expiration: "3600" # Configure to avoid downtime in token refresh. Takes precedence over the service account setting. Default 3600, acceptable range: 3600 - 86400 seconds.
      #   azure.workload.identity/skip-containers: "container1;container2" # Containers to skip using workload identity. By default all containers in the pod will use workload identity when the pod is labeled with azure.workload.identity/use: true
      #   azure.workload.identity/inject-proxy-sidecar: "true" # Default true. The proxy sidecar is used to intercept token requests to IMDS (Azure Instance Metadata Service) and acquire an AAD token on behalf of the user with federated identity credential.
      #   azure.workload.identity/proxy-sidecar-port: "8000" # Port of the proxy sidecar. Default 8000
    spec:
      serviceAccountName: wi-demo-sa # Service account (see aks_manifests\prerequisites\k8s.yaml) will provide identity to the pod
      nodeSelector:
        "": linux
      priorityClassName: widemo-highest-priority-linux
      # setting pod DNS policies to enable faster DNS resolution
      dnsConfig:
        options:
          # use FQDN everywhere
          # any cluster local access from pods needs the full CNAME to resolve
          # short names will not resolve to internal cluster domains
          - name: ndots
            value: "2"
          # dns resolver timeout and attempts
          - name: timeout
            value: "15"
          - name: attempts
            value: "3"
          # use TCP to resolve DNS instead of using UDP (UDP is lossy and pods need to wait for timeout for lost packets)
          - name: use-vc
          # open new socket for retrying
          - name: single-request-reopen
      volumes:
        # `name` here must match the name
        # specified in the volume mount
        - name: widemo-configmap-wi-api-volume
          configMap:
            # `name` here must match the name
            # specified in the ConfigMap's YAML. See aks_manifests\prerequisites\k8s.yaml
            name: widemo-configmap
      terminationGracePeriodSeconds: 90 # This must be set to a value that is greater than the preStop hook wait time.
      containers:
        - name: wi-api
          lifecycle:
            preStop:
              exec:
                command: ["sleep","60"]
          imagePullPolicy: Always
          # probe to determine the startup success
          startupProbe:
            httpGet:
              path: /api/health
              port: container-port
            initialDelaySeconds: 30 # give 30 seconds to get container started before checking health
            failureThreshold: 30 # max 300 (30*10) seconds wait for start up to succeed
            periodSeconds: 10 # interval of probe (300 (30*10) start up to succeed)
            successThreshold: 1 # how many consecutive success probes to consider as success
            timeoutSeconds: 10 # probe timeout
            terminationGracePeriodSeconds: 30 # restarts container (default restart policy is always)
          # readiness probe fail will not restart container but cut off traffic to container with one failure
          # as specified below and keep readiness probes running to see if container works again
          readinessProbe: # probe to determine if the container is ready for traffic (used by AGIC)
            httpGet:
              path: /api/health
              port: container-port
            failureThreshold: 1 # one readiness fail should stop traffic to container
            periodSeconds: 20 # interval of probe
            # successThreshold not supported by AGIC
            timeoutSeconds: 10 # probe timeout
          # probe to determine the container is healthy and if not healthy container will restart
          livenessProbe:
            httpGet:
              path: /api/health
              port: container-port
            failureThreshold: 3 # tolerates three consecutive failures before restart trigger
            periodSeconds: 40 # interval of probe
            successThreshold: 1 # how many consecutive success probes to consider as success after a failure probe
            timeoutSeconds: 10 # probe timeout
            terminationGracePeriodSeconds: 60 # restarts container (default restart policy is always)
          volumeMounts:
          - mountPath: /etc/config
            name: widemo-configmap-wi-api-volume
          ports:
            - name: container-port
              containerPort: 80
              protocol: TCP
          env:
            - name: ASPNETCORE_URLS
              value: http://+:80
            - name: ASPNETCORE_ENVIRONMENT
              value: Production
            - name: CH_WIDEMO_CONFIG
              value: /etc/config/config_dev-euw-001.json
          resources:
            limits:
              memory: 1Gi # the memory limit equals to the request!
              # no cpu limit! this is excluded on purpose
            requests:
              memory: 1Gi
              cpu: "500m"

---
apiVersion: v1
kind: Service
metadata:
  name: wi-api-clusterip
  namespace: widemo
  labels:
    app: wi-api
    service: wi-api
spec:
  type: ClusterIP
  ports:
    - port: 8091
      targetPort: 80
      protocol: TCP
  selector:
    service: wi-api

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: wi-api
  namespace: widemo
  annotations:
    # --------------
    # AGIC "true" "120" "true" "30"
    # --------------
spec:
  ingressClassName: azure-application-gateway
  rules:
  - host:
    http:
      paths:
      - path: /*
        pathType: Prefix
        backend:
          service:
            name: wi-api-clusterip
            port:
              number: 8091

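When our application pods are running, the workload identity webhook injects environment variables such as AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_FEDERATED_TOKEN_FILE and AZURE_AUTHORITY_HOST into the containers. This allows our application to authenticate via the user assigned managed identity, as explained here.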

So with this setup we can use code such as the below to load app configuration, with Key Vault access, in our apps running in AKS using workload identity. The App Configuration endpoint is just the endpoint, without any secret or connection information; that is enough to enable access to App Configuration, as we are now using default credentials.

using Azure.Identity;
using Azure.Security.KeyVault.Secrets;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Configuration.AzureAppConfiguration;

namespace common.lib.Configs
{
    public class ConfigLoader
    {
        public static void LoadConfiguration(IConfigurationBuilder configBuilder)
        {
            // Build once to read the bootstrap settings (endpoint, labels, key vault name)
            var config = configBuilder.Build();

            string? appConfigEndpont = config.GetSection("AppConfigEndpoint").Value;
            string? appConfigLabel = config.GetSection("AppConfigLabel").Value;
            string? sharedAppConfiglabel = config.GetSection("SharedAppConfiglabel").Value;
            string? keyVaultName = config.GetSection("KeyVaultName").Value;
            string? aadTenantId = config.GetSection("AadTenantId").Value;

            //Load configuration from Azure App Configuration
            configBuilder.AddAzureAppConfiguration(options =>
            {
                // In AKS this resolves to the workload identity assigned to the pod
                DefaultAzureCredential azureCredentials = new();

                // Connect with the endpoint and credential only, no connection string
                options.Connect(
                    new Uri(appConfigEndpont),
                    azureCredentials)
                        .Select(KeyFilter.Any, sharedAppConfiglabel)
                        .Select(KeyFilter.Any, appConfigLabel);

                SecretClient secretClient = new(
                    new Uri($"https://{keyVaultName}"),
                    azureCredentials);

                // Register the secret client so Key Vault references can be resolved
                options.ConfigureKeyVault(kv =>
                {
                    kv.Register(secretClient);
                });
            });
        }
    }
}

This is possible because in Terraform we can grant the necessary permissions to the user assigned managed identity, as shown below.

resource "azurerm_key_vault" "instancekeyvault" {
  name                        = "${var.PREFIX}-${var.PROJECT}-${var.ENVNAME}-kv"
  location                    = azurerm_resource_group.instancerg.location
  resource_group_name         =
  tenant_id                   = data.azurerm_client_config.current.tenant_id
  sku_name                    = "standard"
  enabled_for_deployment      = false
  enabled_for_disk_encryption = false
  purge_protection_enabled    = false # allow purge for drop and create in demos. else this should be set to true

  network_acls {
    bypass         = "AzureServices"
    default_action = "Deny"
    ip_rules       = ["", "${chomp(data.http.mytfip.response_body)}/32"]
    virtual_network_subnet_ids = [
    ]
  }

  # Sub Owners
  access_policy {
    tenant_id               = var.TENANTID
    object_id               = data.azuread_group.sub_owners.object_id
    key_permissions         = ["Get", "Purge", "Recover"]
    secret_permissions      = ["Get", "List", "Set", "Delete", "Purge", "Recover"]
    certificate_permissions = ["Create", "Get", "Import", "List", "Update", "Delete", "Purge", "Recover"]
  }

  # Infra Deployment Service Principal
  access_policy {
    tenant_id               = data.azurerm_client_config.current.tenant_id
    object_id               = data.azurerm_client_config.current.object_id
    key_permissions         = ["Get", "Purge", "Recover"]
    secret_permissions      = ["Get", "List", "Set", "Delete", "Purge", "Recover"]
    certificate_permissions = ["Create", "Get", "Import", "List", "Update", "Delete", "Purge", "Recover"]
  }

  # Containers in AKS via user assigned identity
  access_policy {
    tenant_id          = var.TENANTID
    object_id          = azurerm_user_assigned_identity.aks.principal_id # principal_id is the object id of the user assigned identity
    secret_permissions = ["Get", "List", ]
  }

  tags = merge(tomap({
    Service = "key_vault",
  }), local.tags)
}

# AKS user assigned identity as a reader
resource "azurerm_role_assignment" "appconf_datareader_aks" {
  scope                =
  role_definition_name = "App Configuration Data Reader"
  principal_id         = azurerm_user_assigned_identity.aks.principal_id
}

We can access storage blobs via default credentials as well, as shown below.

    private static BlobServiceClient GetBlobServiceClient(string accountName)
    {
        return new(new Uri($"https://{accountName}"),
            new DefaultAzureCredential());
    }

Or storage queue as shown below.

QueueClient queueClient = new(
    new Uri($"https://{QueueStorageName}{QueueName}"),
    new DefaultAzureCredential());
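
For the blob and queue clients above to be authorized, the user assigned identity also needs data-plane roles on the storage account, in the same way it was granted the App Configuration Data Reader role. A minimal sketch follows; azurerm_storage_account.demo and both resource labels are hypothetical names that do not come from the example repo, and a reader role such as Storage Blob Data Reader may be enough if the application only reads.

```hcl
# Hypothetical storage account reference (azurerm_storage_account.demo);
# point the scope at the storage account your application actually uses.
resource "azurerm_role_assignment" "storage_blob_aks" {
  scope                = azurerm_storage_account.demo.id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = azurerm_user_assigned_identity.aks.principal_id
}

# Same identity, granted access to queues in the same storage account.
resource "azurerm_role_assignment" "storage_queue_aks" {
  scope                = azurerm_storage_account.demo.id
  role_definition_name = "Storage Queue Data Contributor"
  principal_id         = azurerm_user_assigned_identity.aks.principal_id
}
```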

The above are only a few examples. With workload identity enabled, your containers deployed to AKS can securely access any Azure resource that supports Azure AD authentication, using DefaultAzureCredential with a managed identity. This is a far more secure approach than storing connection strings, secrets, etc. for your application and passing that secret information around between your application components.

