Saturday 16 December 2023

Setting Up Azure Workload Identity for Containers in Azure Kubernetes Services (AKS) Using Terraform - Improved Security for Containers in AKS

 Azure Workload Identity allows your containers in AKS to use a managed identity to access Azure resources securely, without having to depend on connection strings, passwords, access keys or secrets. In other words, you can simply use DefaultAzureCredential in your containers running in AKS, which will use the workload identity assigned to the container to get access to the required Azure resource. Role based access permissions will be in effect, and the user assigned managed identity (an Azure AD app registration can be used as well, but a user assigned managed identity is recommended) used to set up the workload identity in AKS should be given the necessary roles in the target Azure resource. This is far better than having to store secrets or connection strings to be utilized by the .NET applications. In this post let's understand how to set up workload identity for containers deployed in AKS, and explore how it simplifies the .NET application code, allowing the application to access Azure resources securely with a managed identity.

Full example source code, with Terraform and a .NET application using default credentials to access App Configuration and Key Vault, is available here in my GitHub repo.

The first step of setting up workload identity in AKS is to enable the OIDC (OpenID Connect) issuer and workload identity, in the Terraform resource azurerm_kubernetes_cluster.

  oidc_issuer_enabled       = true
  workload_identity_enabled = true

Example AKS cluster tf code is below

resource "azurerm_kubernetes_cluster" "aks_cluster" {

  lifecycle {
    ignore_changes = [default_node_pool[0].node_count]
  }

  name                         = "${var.prefix}-${var.project}-${var.environment_name}-aks-${var.deployment_name}"
  kubernetes_version           = local.kubernetes_version
  sku_tier                     = "Standard"
  location                     = var.location
  resource_group_name          = var.rg_name
  dns_prefix                   = "${var.prefix}-${var.project}-${var.environment_name}-aks-${var.deployment_name}-dns"
  node_resource_group          = "${var.prefix}-${var.project}-${var.environment_name}-aks-${var.deployment_name}-rg"
  image_cleaner_enabled        = false # As this is a preview feature keep it disabled for now. Once the feature is GA, it should be enabled.
  image_cleaner_interval_hours = 48

  network_profile {
    network_plugin    = "azure"
    load_balancer_sku = "standard"
  }

  storage_profile {
    file_driver_enabled = true
  }

  default_node_pool {
    name                 = "chlinux"
    orchestrator_version = local.kubernetes_version
    node_count           = 1
    enable_auto_scaling  = true
    min_count            = 1
    max_count            = 7
    vm_size              = "Standard_DS4_v2"
    os_sku               = "Ubuntu"
    vnet_subnet_id       = var.subnet_id
    max_pods             = 30
    type                 = "VirtualMachineScaleSets"
    scale_down_mode      = "Delete"
    zones                = ["1", "2", "3"]
  }

  oidc_issuer_enabled       = true # Allows creating the OpenID Connect issuer URL, to be used in the federated identity credential
  workload_identity_enabled = true # Enable workload identity in AKS

  identity {
    type = "SystemAssigned"
  }

  ingress_application_gateway {
    gateway_id = azurerm_application_gateway.aks.id
  }

  key_vault_secrets_provider {
    secret_rotation_enabled = false
  }

  azure_active_directory_role_based_access_control {
    azure_rbac_enabled = false
    managed            = true
    tenant_id          = var.tenant_id

    # add sub owners as cluster admin 
    admin_group_object_ids = [
    var.sub_owners_objectid] # azure AD group object ID
  }

  oms_agent {
    log_analytics_workspace_id = var.log_analytics_workspace_id
  }

  depends_on = [
    azurerm_application_gateway.aks
  ]

  tags = merge(tomap({
    Service = "aks_cluster"
  }), var.tags)
}
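
After the cluster is provisioned, the OIDC issuer URL is available as an attribute of the cluster resource. If you want to inspect it, for example to verify the federated identity credential setup, a Terraform output like the following can expose it (the output name here is my choice):

```terraform
# Expose the OIDC issuer URL of the AKS cluster (output name is illustrative)
output "aks_oidc_issuer_url" {
  value = azurerm_kubernetes_cluster.aks_cluster.oidc_issuer_url
}
```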

As the next step, we have to set up a user assigned managed identity, which will be used as the workload identity in AKS.

# User assigned identity to use as workload identity in AKS
resource "azurerm_user_assigned_identity" "aks" {
  location            = azurerm_resource_group.instancerg.location
  name                = "${var.PREFIX}-${var.PROJECT}-${var.ENVNAME}-aks-uai"
  resource_group_name = azurerm_resource_group.instancerg.name
}

Then we need to set up a federated identity credential for the user assigned identity, tied to the AKS service account which will be used to assign the identity to each pod. We need to use the OIDC issuer URL of the AKS cluster in the federated identity credential setup. To understand the concepts in detail, read the docs here.

# Federated identity credential for AKS user assigned id - to be used with workload identity service account
resource "azurerm_federated_identity_credential" "aks" {
  name                = "${var.prefix}-${var.project}-${var.environment_name}-aks-fic-${var.deployment_name}"
  resource_group_name = var.rg_name
  audience            = ["api://AzureADTokenExchange"]
  issuer              = azurerm_kubernetes_cluster.aks_cluster.oidc_issuer_url
  parent_id           = var.user_assigned_identity
  subject             = "system:serviceaccount:widemo:wi-demo-sa" # system:serviceaccount:aksapplicationnamespace:workloadidentityserviceaccountname

  depends_on = [
    azurerm_kubernetes_cluster.aks_cluster
  ]

  lifecycle {
    ignore_changes = []
  }
}
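
Note that the subject must exactly match the Kubernetes service account the application pods will use, in the fixed system:serviceaccount:&lt;namespace&gt;:&lt;service account name&gt; format. A quick sketch with the demo values used in this post:

```shell
# Compose the federated identity credential subject for the demo
# namespace and service account used in this post.
NAMESPACE="widemo"
SERVICE_ACCOUNT="wi-demo-sa"
SUBJECT="system:serviceaccount:${NAMESPACE}:${SERVICE_ACCOUNT}"
echo "${SUBJECT}"
```

If the namespace or service account name changes later, the federated identity credential has to be updated to match, otherwise the token exchange will fail.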


We can use a terraform output to obtain the client id of the user assigned managed identity we created. This is required later for the service account creation.

output "aks_uai_client_id" {
  value = azurerm_user_assigned_identity.aks.client_id
}

Once the terraform code is deployed and the AKS cluster is created, we can use kubectl to create the service account as shown below.

apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    azure.workload.identity/client-id: userassignedidentityclientid #${USER_ASSIGNED_CLIENT_ID}$ # user assigned identity client ID (aks_uai_client_id output from Terraform)
    azure.workload.identity/tenant-id: tenantid #${AZURE_TENANT_ID}$ # Azure tenant id
    # azure.workload.identity/service-account-token-expiration: "3600" # Default is 3600. Supported range is 3600-86400. Configure to avoid down time in token refresh. Setting in Pod spec takes precedence.
  name: wi-demo-sa
  namespace: widemo
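
A minimal sketch of applying the manifest, assuming it is saved as wi-demo-sa.yaml (a file name of my choice) with placeholder client/tenant ids; the kubectl steps assume kubectl is already connected to the new cluster, so they are shown commented out:

```shell
# Write the service account manifest (placeholder client/tenant ids)
# and apply it to the cluster.
cat > wi-demo-sa.yaml <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    azure.workload.identity/client-id: userassignedidentityclientid
    azure.workload.identity/tenant-id: tenantid
  name: wi-demo-sa
  namespace: widemo
EOF
# kubectl create namespace widemo
# kubectl apply -f wi-demo-sa.yaml
```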

We can deploy our application pods, enabling use of workload identity, as shown below in the pod template.

template:
    metadata:
      labels:
        app: wi-api
        service: wi-api
        azure.workload.identity/use: "true" # Required to make the containers in the pod use the workload identity
      # annotations:
      #   azure.workload.identity/service-account-token-expiration: "3600" # Configure to avoid downtime in token refresh. Takes precedence over the service account setting. Default 3600, acceptable range: 3600 - 86400 seconds.
      #   azure.workload.identity/skip-containers: "container1;container2" # Containers to skip using workload identity. By default all containers in the pod will use workload identity when the pod is labeled with azure.workload.identity/use: true
      #   azure.workload.identity/inject-proxy-sidecar: "true" # Default true. The proxy sidecar is used to intercept token requests to IMDS (Azure Instance Metadata Service) and acquire an AAD token on behalf of the user with the federated identity credential.
      #   azure.workload.identity/proxy-sidecar-port: "8000" # Port of the proxy sidecar. Default 8000
    spec:
      serviceAccountName: wi-demo-sa # Service account (see aks_manifests\prerequisites\k8s.yaml) will provide identity to the pod https://azure.github.io/azure-workload-identity/docs/concepts.html

A full example deployment is below.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: wi-api
  namespace: widemo
  labels:
    app: wi-api
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 50%
      maxUnavailable: 25%
  minReadySeconds: 0
  selector:
    matchLabels:
      service: wi-api
  template:
    metadata:
      labels:
        app: wi-api
        service: wi-api
        azure.workload.identity/use: "true" # Required to make the containers in the pod use the workload identity
      # annotations:
      #   azure.workload.identity/service-account-token-expiration: "3600" # Configure to avoid downtime in token refresh. Takes precedence over the service account setting. Default 3600, acceptable range: 3600 - 86400 seconds.
      #   azure.workload.identity/skip-containers: "container1;container2" # Containers to skip using workload identity. By default all containers in the pod will use workload identity when the pod is labeled with azure.workload.identity/use: true
      #   azure.workload.identity/inject-proxy-sidecar: "true" # Default true. The proxy sidecar is used to intercept token requests to IMDS (Azure Instance Metadata Service) and acquire an AAD token on behalf of the user with the federated identity credential.
      #   azure.workload.identity/proxy-sidecar-port: "8000" # Port of the proxy sidecar. Default 8000
    spec:
      serviceAccountName: wi-demo-sa # Service account (see aks_manifests\prerequisites\k8s.yaml) will provide identity to the pod https://azure.github.io/azure-workload-identity/docs/concepts.html
      nodeSelector:
        "kubernetes.io/os": linux
      priorityClassName: widemo-highest-priority-linux
      #------------------------------------------------------
      # setting pod DNS policies to enable faster DNS resolution
      # https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-policy
      dnsConfig:
        options:
          # use FQDN everywhere 
          # any cluster local access from pods need full CNAME to resolve 
          # short names will not resolve to internal cluster domains
          - name: ndots
            value: "2"
          # dns resolver timeout and attempts
          - name: timeout
            value: "15"
          - name: attempts
            value: "3"
          # use TCP to resolve DNS instead of using UDP (UDP is lossy and pods need to wait for timeout for lost packets)
          - name: use-vc
          # open new socket for retrying
          - name: single-request-reopen
      #------------------------------------------------------
      volumes:
        # `name` here must match the name
        # specified in the volume mount
        - name: widemo-configmap-wi-api-volume
          configMap:
            # `name` here must match the name
            # specified in the ConfigMap's YAML. See aks_manifests\prerequisites\k8s.yaml
            name: widemo-configmap
      terminationGracePeriodSeconds: 90 # This must be set to a value that is greater than the preStop hook wait time.
      containers:
        - name: wi-api
          lifecycle:
            preStop:
              exec:
                command: ["sleep","60"]
          image: chdemosharedacr.azurecr.io/widemo/wi-api:1.1
          imagePullPolicy: Always
          # probe to determine the startup success
          startupProbe:
            httpGet:
              path: /api/health
              port: container-port
            initialDelaySeconds: 30 # give 30 seconds to get container started before checking health
            failureThreshold: 30 # max 300 (30*10) seconds wait for start up to succeed
            periodSeconds: 10 # interval of probe (300 (30*10) start up to succeed)
            successThreshold: 1 # how many consecutive success probes to consider as success
            timeoutSeconds: 10 # probe timeout 
            terminationGracePeriodSeconds: 30 # restarts container (default restart policy is always)
          # readiness probe fail will not restart container but cut off traffic to container with one failure 
          # as specified below and keep readiness probes running to see if container works again
          readinessProbe: # probe to determine if the container is ready for traffic (used by AGIC)
            httpGet:
              path: /api/health
              port: container-port
            failureThreshold: 1 # one readiness fail should stop traffic to container
            periodSeconds: 20 # interval of probe
            # successThreshold not supported by AGIC
            timeoutSeconds: 10 # probe timeout
          # probe to determine the container is healthy and if not healthy container will restart
          livenessProbe: 
            httpGet:
              path: /api/health
              port: container-port
            failureThreshold: 3 # tolerates three consecutive failures before restart trigger
            periodSeconds: 40 # interval of probe
            successThreshold: 1 # how many consecutive success probes to consider as success after a failure probe
            timeoutSeconds: 10 # probe timeout 
            terminationGracePeriodSeconds: 60 # restarts container (default restart policy is always)
          volumeMounts:
          - mountPath: /etc/config
            name: widemo-configmap-wi-api-volume
          ports:
            - name: container-port
              containerPort: 80
              protocol: TCP
          env:
            - name: ASPNETCORE_URLS
              value: http://+:80
            - name: ASPNETCORE_ENVIRONMENT
              value: Production
            - name: CH_WIDEMO_CONFIG
              value: /etc/config/config_dev-euw-001.json
          resources:
            limits:
              memory: 1Gi # the memory limit equals the request!
              # no cpu limit! this is excluded on purpose
            requests:
              memory: 1Gi
              cpu: "500m"

---
apiVersion: v1
kind: Service
metadata:
  name: wi-api-clusterip
  namespace: widemo
  labels:
    app: wi-api
    service: wi-api
spec:
  type: ClusterIP
  ports:
    - port: 8091
      targetPort: 80
      protocol: TCP
  selector:
    service: wi-api

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: wi-api
  namespace: widemo
  annotations:
    # --------------
    # AGIC
    appgw.ingress.kubernetes.io/connection-draining: "true"
    appgw.ingress.kubernetes.io/connection-draining-timeout: "120"
    appgw.ingress.kubernetes.io/use-private-ip: "true"
    appgw.ingress.kubernetes.io/request-timeout: "30"
    # --------------
spec:
  ingressClassName: azure-application-gateway
  rules:
  - host: wi-api.aksblue.ch-wi-dev-euw-001.net
    http:
      paths:
      - path: /*
        pathType: Prefix
        backend:
          service:
            name: wi-api-clusterip
            port:
              number: 8091

When our application pods are running, the workload identity webhook injects a set of environment variables into the containers. This allows our application to authenticate via the user assigned managed identity as explained here.
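
The injected variables look like the following sketch (values are placeholders; a live pod can be inspected with kubectl exec -n widemo deploy/wi-api -- env):

```shell
# Environment variables injected by the workload identity webhook
# (values are placeholders; the token file path is the standard mount path).
ENV_VARS=$(cat <<'EOF'
AZURE_CLIENT_ID=<client id of the user assigned managed identity>
AZURE_TENANT_ID=<azure tenant id>
AZURE_FEDERATED_TOKEN_FILE=/var/run/secrets/azure/tokens/azure-identity-token
AZURE_AUTHORITY_HOST=https://login.microsoftonline.com/
EOF
)
echo "$ENV_VARS"
```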


So with this setup, we can use code such as below to load app configuration, with key vault access, in our apps running in AKS using workload identity. The app configuration endpoint is just the endpoint, without any secret or connection information; for example https://ch-wi-dev-euw-001-appconfig-ac.azconfig.io is enough to enable access to app config, as we are using default credentials now.

using Azure.Identity;
using Azure.Security.KeyVault.Secrets;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Configuration.AzureAppConfiguration;

namespace common.lib.Configs
{
    public class ConfigLoader
    {
        public static void LoadConfiguration(IConfigurationBuilder configBuilder)
        {
            configBuilder.AddJsonFile(Environment.GetEnvironmentVariable("CH_WIDEMO_CONFIG"));

            var config = configBuilder.Build();


            string? appConfigEndpoint = config.GetSection("AppConfigEndpoint").Value;
            string? appConfigLabel = config.GetSection("AppConfigLabel").Value;
            string? sharedAppConfiglabel = config.GetSection("SharedAppConfiglabel").Value;
            string? keyVaultName = config.GetSection("KeyVaultName").Value;
            string? aadTenantId = config.GetSection("AadTenantId").Value;


            //Load configuration from Azure App Configuration
            configBuilder.AddAzureAppConfiguration(options =>
            {
                DefaultAzureCredential azureCredentials = new();
                options.Connect(
                    new Uri(appConfigEndpoint),
                    azureCredentials);

                options
                        .Select(KeyFilter.Any, sharedAppConfiglabel)
                        .Select(KeyFilter.Any, appConfigLabel);

                SecretClient secretClient = new(
                    new Uri($"https://{keyVaultName}.vault.azure.net/"),
                    azureCredentials);

                options.ConfigureKeyVault(kv =>
                    kv.Register(secretClient));
            });

            configBuilder.Build();
        }
    }
}
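
For completeness, the JSON file referenced by the CH_WIDEMO_CONFIG environment variable (mounted from the configmap) would carry the keys read above. A sketch with placeholder values (only the endpoint is the real demo value; the labels, vault name and tenant id below are illustrative):

```json
{
  "AppConfigEndpoint": "https://ch-wi-dev-euw-001-appconfig-ac.azconfig.io",
  "AppConfigLabel": "wi-api",
  "SharedAppConfiglabel": "shared",
  "KeyVaultName": "ch-wi-dev-euw-001-kv",
  "AadTenantId": "<azure tenant id>"
}
```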


This is possible because in Terraform we can grant the necessary permissions to the user assigned managed identity, as shown below.

resource "azurerm_key_vault" "instancekeyvault" {
  name                        = "${var.PREFIX}-${var.PROJECT}-${var.ENVNAME}-kv"
  location                    = azurerm_resource_group.instancerg.location
  resource_group_name         = azurerm_resource_group.instancerg.name
  tenant_id                   = data.azurerm_client_config.current.tenant_id
  sku_name                    = "standard"
  enabled_for_deployment      = false
  enabled_for_disk_encryption = false
  purge_protection_enabled    = false # allow purge for drop and create in demos. else this should be set to true

  network_acls {
    bypass         = "AzureServices"
    default_action = "Deny"
    ip_rules       = ["xxx.xxx.xxx.xxx/32", "${chomp(data.http.mytfip.response_body)}/32"]
    virtual_network_subnet_ids = [
      "${azurerm_subnet.aks.id}"
    ]
  }

  # Sub Owners
  access_policy {
    tenant_id               = var.TENANTID
    object_id               = data.azuread_group.sub_owners.object_id
    key_permissions         = ["Get", "Purge", "Recover"]
    secret_permissions      = ["Get", "List", "Set", "Delete", "Purge", "Recover"]
    certificate_permissions = ["Create", "Get", "Import", "List", "Update", "Delete", "Purge", "Recover"]
  }

  # Infra Deployment Service Principal
  access_policy {
    tenant_id               = data.azurerm_client_config.current.tenant_id
    object_id               = data.azurerm_client_config.current.object_id
    key_permissions         = ["Get", "Purge", "Recover"]
    secret_permissions      = ["Get", "List", "Set", "Delete", "Purge", "Recover"]
    certificate_permissions = ["Create", "Get", "Import", "List", "Update", "Delete", "Purge", "Recover"]
  }

  # Containers in AKS via user assigned identity
  access_policy {
    tenant_id          = var.TENANTID
    object_id          = azurerm_user_assigned_identity.aks.principal_id # principal_id is the object id of the user assigned identity
    secret_permissions = ["Get", "List", ]
  }

  tags = merge(tomap({
    Service = "key_vault",
  }), local.tags)
}


# AKS user assigned identity as a reader
resource "azurerm_role_assignment" "appconf_datareader_aks" {
  scope                = azurerm_app_configuration.appconf.id
  role_definition_name = "App Configuration Data Reader"
  principal_id         = azurerm_user_assigned_identity.aks.principal_id
}

We can access storage blobs via default credentials as well, as shown below.

    private static BlobServiceClient GetBlobServiceClient(string accountName)
    {
        return new(new Uri($"https://{accountName}.blob.core.windows.net"),
            new DefaultAzureCredential());
    }

Or storage queue as shown below.

QueueClient queueClient = new(
    new Uri($"https://{QueueStorageName}.queue.core.windows.net/{QueueName}"),
    new DefaultAzureCredential());
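
As with app configuration, this only works if the user assigned identity has the data-plane roles on the storage account. A sketch in Terraform, assuming a storage account resource named azurerm_storage_account.instance (the resource name is illustrative):

```terraform
# Allow the AKS workload identity to read blobs and work with queue messages
resource "azurerm_role_assignment" "storage_blob_reader_aks" {
  scope                = azurerm_storage_account.instance.id
  role_definition_name = "Storage Blob Data Reader"
  principal_id         = azurerm_user_assigned_identity.aks.principal_id
}

resource "azurerm_role_assignment" "storage_queue_contributor_aks" {
  scope                = azurerm_storage_account.instance.id
  role_definition_name = "Storage Queue Data Contributor"
  principal_id         = azurerm_user_assigned_identity.aks.principal_id
}
```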


Above are only a few examples. With workload identity enabled, your containers deployed to AKS can securely access any Azure resource with DefaultAzureCredential, using a managed identity. This is a far more secure approach than having to store connection strings, secrets etc. for your application, and having to pass that secret information around in your application components.
