Friday 13 September 2024

Automate Health Check Validation for AKS Apps with Nginx Ingress Using Azure DevOps Pipelines

We discussed "Setup Application Ingress for AKS Using Nginx Ingress controller with Private IP" in a previous post. Once the application and Nginx ingress setup is deployed, it takes some time for the containers and pods to be ready to accept traffic. We should validate that the application containers are deployed with the correct Docker image and are running the number of replicas required by the horizontal pod autoscalers. Then we should verify that the ingress for each app is set up correctly. Only if all these health checks succeed can we enable live traffic into a newly deployed set of applications in an AKS cluster (this is useful in blue-green deployments where a new cluster or node pool represents the blue or green instance). Let's write a health validation script in PowerShell and run it in Azure Pipelines to automate health check validation for apps deployed to AKS with Nginx ingress.

The expected outcome is an Azure DevOps pipeline task that validates the ingress setup for the applications.



The health check script is below. The script waits for the deployed application pods to be ready based on the horizontal pod autoscaler, then ensures the correct image is running in each container. If any deployed pods have restarted, it checks further to see whether the restarts keep happening or have stabilized. After all application pods are ready, if any of the apps is an API, it verifies that the ingress is set up correctly for the API. Additionally, the script checks the successful deployment of cron jobs and Kubernetes scaled jobs. The script can be further enhanced to verify other types of apps deployed to AKS.

param
(
    [string]$aksName,    
    [string]$aksNamespace,
    [string]$privateIpNginx,
    [string]$buildId,
    [string]$deployApps,
    [string[]]$apps,
    [switch]$preCheck
)

function Invoke-AKS-App-Health-Check 
{
    param
    (
        [switch]$preCheckAttempt
    )

    $appHealthCheckMaxAttempts = 30; # Wait for maximum 30 minutes till apps are fully deployed and running in AKS
    $appHealthCheckIntervalSeconds = 60; # Check cycle in each minute
    $appHealthCheckAttempt = 0;
    $appRestartCheckMaxAttempts = 3; # Max 15 minutes wait for restarts to stabilize
    $appRestartCheckIntervalSeconds = 300; # Wait time for restarts to stabilize (max cap 5 minutes in k8s restart policy https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy)
    $appRestartCheckAttempt = 0;
    $appRestartCheckMaxCycles = 2; # Only run maximum of two restart verification cycles setting max threshold to 30 minutes
    $appRestartCheckCycle = 0;

    if ($preCheckAttempt)
    {
        $appHealthCheckMaxAttempts = 15; # Wait for maximum 15 minutes till apps are fully deployed and running in AKS
        $appRestartCheckMaxAttempts = 2; # Max 10 minutes wait for restarts to stabilize
        $appRestartCheckMaxCycles = 1; # Only run maximum of one restart verification cycles setting max threshold to 10 minutes
    }

    Write-Host (-join('AKS app health check max attempts ',$appHealthCheckMaxAttempts));
    Write-Host (-join('AKS app restart check max attempts ',$appRestartCheckMaxAttempts));
    Write-Host (-join('AKS app restart check max cycles ',$appRestartCheckMaxCycles));
    Write-Host ('--------------------------------------------------------');
    Write-Host (-join('Waiting for AKS ',$aksName,' apps to be ready...'));

    do
    {
        $allAppPodsReady = $true;
        $restaredPods = @{}; # Empty hash table
        $appHealthCheckAttempt++;

        foreach($app in $apps)
        {
            Write-Host ('=========================================================');
            Write-Host (-join('Inspecting app ',$app,' in AKS...'));

            if ($app.EndsWith('-scaledjob')) 
            {
                $scaledjob = kubectl get scaledjobs $app -n $aksNamespace -o json | ConvertFrom-Json;
                
                $scaledjobStatusType = $scaledjob.status.conditions[0].type;
                $scaledjobReadyStatus = $scaledjob.status.conditions[0].status;
                $scaledjobImage = $scaledjob.spec.jobTargetRef.template.spec.containers[0].image;

                Write-Host (-join('Apps deployed flag is ',$deployApps,'. ',$app,' status type is ',$scaledjobStatusType,'. Ready status is ',$scaledjobReadyStatus,'. Uses image ',$scaledjobImage));
                
                if ($scaledjobStatusType -ne 'Ready' -or $scaledjobReadyStatus -ne 'True' -or (($deployApps -eq 'True') -and (-not $scaledjobImage.EndsWith(-join(':',$buildId))))) 
                {
                    Write-Host (-join('Scaled job ',$app,' deployment failed in AKS. Marking app as not healthy...')) -ForegroundColor Yellow;
                    $allAppPodsReady = $false;
                }
                else 
                {
                    Write-Host (-join('Scaled job ',$app,' is successfully deployed in AKS.'));
                }
                Write-Host ('--------------------------------------------------------');
            }
            elseif ($app.EndsWith('-cronjob')) 
            {
                $cronjob = kubectl get cronjobs $app -n $aksNamespace -o json | ConvertFrom-Json;
                
                $cronjobSuspended = $cronjob.spec.suspend;
                $cronjobImage = $cronjob.spec.jobTemplate.spec.template.spec.containers[0].image;

                Write-Host (-join('Apps deployed flag is ',$deployApps,'. ',$app,' suspend status is ',$cronjobSuspended,' . Uses image ',$cronjobImage));
                
                if ($cronjobSuspended -eq $true -or (($deployApps -eq 'True') -and (-not $cronjobImage.EndsWith(-join(':',$buildId))))) 
                {
                    Write-Host (-join('Cron job ',$app,' deployment failed in AKS. Marking app as not healthy...')) -ForegroundColor Yellow;
                    $allAppPodsReady = $false;
                }
                else 
                {
                    Write-Host (-join('Cron job ',$app,' is successfully deployed in AKS.'));
                }
                Write-Host ('--------------------------------------------------------');
            }
            else 
            {
                $hpaName = -join($app,'-hpa');
                $hpa = kubectl get hpa $hpaName -n $aksNamespace -o json | ConvertFrom-Json;
                
                kubectl get pods --selector=app=$app -n $aksNamespace # printing pods for pipeline logs
                Write-Host ('--------------------------------------------------------');
                
                $pods = kubectl get pods --selector=app=$app -n $aksNamespace -o json | ConvertFrom-Json;
    
                Write-Host (-join($hpa.status.desiredReplicas,' desired pod(s) for the app ',$app,' and found ', $pods.items.Count,' pod(s).'));
    
                if ($pods.items.Count -eq $hpa.status.desiredReplicas)
                {
                    Write-Host (-join('Inspecting pod(s) for the app ',$app,' ...'));
    
                    foreach($pod in $pods.items)
                    {
                        $podName = $pod.metadata.name;
                                
                        # Filter pod status - This is required as pod status phase in json, only has Pending and Running status which is not sufficient if pods terminate or crash
                        $podInfoString = (kubectl get pods $podName -n $aksNamespace)[1] -replace '\s+',';';
                        $podInfo = $podInfoString.Split(';');
                        $podReady = $podInfo[1];
                        $podStatus = $podInfo[2];
                        
                        if(($null -eq $pod.status) -or ($null -eq $pod.spec) -or ($null -eq $pod.status.containerStatuses) -or ($null -eq $pod.spec.containers) -or ($pod.status.containerStatuses.Count -le 0) -or ($pod.spec.containers.Count -le 0))
                        {
                            Write-Host (-join($podName,' is not yet healthy. Marking app as not healthy...')) -ForegroundColor Yellow;
                            $allAppPodsReady = $false;
                        }
                        else
                        {       
                            $podRestarts = $pod.status.containerStatuses[0].restartCount
                            $podImage = $pod.spec.containers[0].image;
                            $podContainerReady = $pod.status.containerStatuses[0].ready;
                            $podContainerStarted = $pod.status.containerStatuses[0].started;
    
                            Write-Host (-join('Apps deployed flag is ',$deployApps,'. ',$podName,' status is ',$podReady,' ',$podStatus,'. Restarts are ',$podRestarts,'. Uses image ',$podImage,'. Container ready state is ',$podContainerReady,' started state is ',$podContainerStarted));
                                
                            if (($podStatus -ne 'Running') -or ($podReady -ne '1/1') -or (($deployApps -eq 'True') -and (-not $podImage.EndsWith(-join(':',$buildId)))) -or (-not $podContainerReady) -or (-not $podContainerStarted))
                            {
                                Write-Host (-join($podName,' is not yet healthy. Marking app as not healthy...')) -ForegroundColor Yellow;
                                $allAppPodsReady = $false;
                            }
                            elseif ($podRestarts -gt 0)
                            {
                                Write-Host (-join($podName,' has ',$podRestarts,' restart(s). collecting information for further verification...'));
                                $restaredPods.Add($podName,$podRestarts);
                            }
                        }
                    }
                    Write-Host ('--------------------------------------------------------');
                }
                else
                {
                    Write-Host (-join($hpa.status.desiredReplicas,' desired pod(s) for the app ',$app,' and pod count ', $pods.items.Count,' mismatch. Marking app as not healthy...')) -ForegroundColor Yellow;
                    $allAppPodsReady = $false;
                }
            }
        }

        if (($allAppPodsReady) -and ($restaredPods.Count -le 0))
        {
            Write-Host (-join('All apps are ready in health check attempt ',$appHealthCheckAttempt,'.')) -ForegroundColor Green;
            break;
        }
        elseif (($allAppPodsReady) -and ($restaredPods.Count -gt 0))
        {
            $appRestartCheckAttempt++;

            if($appRestartCheckAttempt -eq 1)
            {
                $appRestartCheckCycle++;

                if (($appHealthCheckAttempt + $appRestartCheckMaxAttempts) -gt $appHealthCheckMaxAttempts)
                {
                    $appHealthCheckMaxAttempts = $appHealthCheckAttempt + $appRestartCheckMaxAttempts;
                }
            }

            if (($appRestartCheckAttempt -le $appRestartCheckMaxAttempts) -and ($appRestartCheckCycle -le $appRestartCheckMaxCycles))
            {
                Write-Host (-join('All apps are ready in health check attempt ',$appHealthCheckAttempt,', but have restarts in ',$restaredPods.Count,' pod(s). Waiting for 5 minutes before reinspecting pod restarts...')) -ForegroundColor Yellow;
                Start-Sleep -Seconds $appRestartCheckIntervalSeconds

                foreach ($restartedPodName in $restaredPods.Keys)
                {
                    $previousRestarts = $restaredPods[$restartedPodName];
                    $currentPod = kubectl get pods $restartedPodName -n $aksNamespace -o json | ConvertFrom-Json

                    if (($null -eq $currentPod) -or ($null -eq $currentPod.status) -or ($null -eq $currentPod.status.containerStatuses) -or ($currentPod.status.containerStatuses.Count -le 0))
                    {
                        Write-Host (-join($restartedPodName,' is not in a stable state.')) -ForegroundColor Yellow;
                        $allAppPodsReady = $false;
                    }
                    else
                    {
                        $currentPodRestarts = $currentPod.status.containerStatuses[0].restartCount;
                    
                        if ($currentPodRestarts -gt $previousRestarts)
                        {
                            Write-Host (-join($restartedPodName,' restarts are not stabilized. Previous: ',$previousRestarts,' Current:',$currentPodRestarts)) -ForegroundColor Yellow;
                            $allAppPodsReady = $false;
                        }
                        else
                        {
                            Write-Host (-join($restartedPodName,' restarts are stabilized. Previous: ',$previousRestarts,' Current:',$currentPodRestarts));
                        }
                    }
                }

                if ($allAppPodsReady)
                {
                    Write-Host (-join('Pod restarts are stabilized. All apps are ready in health check attempt ',$appHealthCheckAttempt,', in restart check cycle ',$appRestartCheckCycle,' and in restart check attempt ',$appRestartCheckAttempt,'.')) -ForegroundColor Green;
                    break;
                }
                else
                {
                    Write-Host (-join('Pod restarts are not stabilized. All apps are not ready in health check attempt ',$appHealthCheckAttempt,', in restart check cycle ',$appRestartCheckCycle,' and in restart check attempt ',$appRestartCheckAttempt,'. Waiting for ',$appHealthCheckIntervalSeconds,' seconds before next check...')) -ForegroundColor Yellow;
                    Write-Host ('--------------------------------------------------------');
                    Write-Host ('--------------------------------------------------------');
                    Start-Sleep -Seconds $appHealthCheckIntervalSeconds
                }
            }
            else
            {
                Write-Error (-join('All apps are not ready in health check attempt ',$appHealthCheckAttempt,', in restart check cycle ',$appRestartCheckCycle,' and in restart check attempts ', $appRestartCheckAttempt,'. Rerun health check phase after manual verification.'));
                exit 1;
            }
        }
        else
        {
            Write-Host (-join('All apps are not ready in health check attempt ',$appHealthCheckAttempt,' Waiting for ',$appHealthCheckIntervalSeconds,' seconds before next check...')) -ForegroundColor Yellow;
            Write-Host ('--------------------------------------------------------');
            Write-Host ('--------------------------------------------------------');
            $appRestartCheckAttempt = 0;
            Start-Sleep -Seconds $appHealthCheckIntervalSeconds
        }

    } until($allAppPodsReady -or ($appHealthCheckAttempt -ge $appHealthCheckMaxAttempts))

    if ($allAppPodsReady)
    {
        Write-Host ('All apps ready in health checks. Proceeding to ingress nginx health check...') -ForegroundColor Green;
        Write-Host ('--------------------------------------------------------');
        Write-Host ('--------------------------------------------------------');
    }
    else
    {
        Write-Error ('All apps are not ready in all health check attempts. Rerun health check phase after manual verification.');
        exit 1;
    }
}

function Invoke-AKS-Ingress-Health-Check 
{
    param
    (
        [switch]$preCheckAttempt
    )

    $appIngressReadyCheckMaxAttempts = 5; # Attempt 5 times (once a minute) to check whether the Nginx ingress is ready
    $appIngressReadyCheckIntervalSeconds = 60; # Check in each minute
    $appIngressReadyCheckAttempt = 0;

    if ($preCheckAttempt)
    {
        $appIngressReadyCheckMaxAttempts = 3; # Attempt 3 times to check whether the Nginx ingress is ready
    }

    do
    {
        $allAppIngressReady = $true;
        $appIngressReadyCheckAttempt++;
        
        foreach($app in $apps)
        {
            if ($app.EndsWith('-api'))
            {
                Write-Host (-join('Checking ingress for ',$app,'...'));
                
                $appIngressNginx = $null;                
                $appIngressNginx = kubectl get ingress $app -n $aksNamespace -o json | ConvertFrom-Json;

                if ($null -eq $appIngressNginx)
                {
                    Write-Host (-join($app, ' ingress nginx is not available. Marking as app ingress unhealthy...')) -ForegroundColor Yellow;
                    $allAppIngressReady = $false;
                }
                else
                {
                    if (($null -eq $appIngressNginx.status.loadBalancer) `
                        -or ($null -eq $appIngressNginx.status.loadBalancer.ingress) `
                        -or ($appIngressNginx.status.loadBalancer.ingress.Count -ne 1) `
                        -or ($appIngressNginx.status.loadBalancer.ingress.ip -ne $privateIpNginx))
                    {
                        Write-Host (-join($app, ' ingress nginx is not ready. Marking as app ingress unhealthy...')) -ForegroundColor Yellow;
                        $allAppIngressReady = $false;
                    }
                    else
                    {
                        Write-Host (-join($app, ' ingress nginx is ready.')) -ForegroundColor Green;
                    }
                }
            }
            else
            {
                Write-Host (-join($app,' is not an API. Ingress health check is not required.')) -ForegroundColor Cyan;
            }
        
            Write-Host ('=========================================================');
        }

        if ($allAppIngressReady)
        {
            Write-Host (-join('All apps ingress ready in health check attempt ',$appIngressReadyCheckAttempt,'.')) -ForegroundColor Green;
            break;
        }
        else
        {
            Write-Host (-join('All apps ingress not ready in health check attempt ',$appIngressReadyCheckAttempt,' Waiting for ',$appIngressReadyCheckIntervalSeconds,' seconds before next check...')) -ForegroundColor Yellow;
            Write-Host ('--------------------------------------------------------');
            Write-Host ('--------------------------------------------------------');
            Start-Sleep -Seconds $appIngressReadyCheckIntervalSeconds
        }

    } until($allAppIngressReady -or ($appIngressReadyCheckAttempt -ge $appIngressReadyCheckMaxAttempts))

    if ($allAppIngressReady)
    {
        Write-Host ('All apps ingress ready. Successfully completed health checks.') -ForegroundColor Green;
    }
    else
    {
        Write-Error ('All apps ingress not ready in all health check attempts. Rerun health check phase after manual verification.');
        exit 1;
    }
}

function Invoke-AKS-Health-Check 
{
    param
    (
        [switch]$preCheckAttempt
    )
    
    if ($preCheckAttempt)
    {
        Invoke-AKS-App-Health-Check -preCheckAttempt
        Invoke-AKS-Ingress-Health-Check -preCheckAttempt
    }
    else
    {
        Invoke-AKS-App-Health-Check
        Invoke-AKS-Ingress-Health-Check
    }
}

if ($preCheck)
{
    Invoke-AKS-Health-Check -preCheckAttempt
}
else
{
    Invoke-AKS-Health-Check
}
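
The script above reads the pod status from the tabular `kubectl get pods` output and compares the ready column with `1/1`, which assumes a single container per pod. As one possible enhancement, the same decision could be made from the pod JSON alone. The sketch below is a minimal illustration under that assumption: `Test-PodHealthy` and the sample pod are hypothetical and are not part of the original script.

```powershell
# Hypothetical helper for illustration; it is not part of the script above.
function Test-PodHealthy
{
    param
    (
        [Parameter(Mandatory)] $pod,  # one item from 'kubectl get pods -o json | ConvertFrom-Json'
        [string]$expectedTag          # for example the build id; an empty value skips the image check
    )

    # A terminating pod can still report phase 'Running', so check deletionTimestamp as well.
    if ($null -ne $pod.metadata.deletionTimestamp) { return $false; }
    if ($pod.status.phase -ne 'Running') { return $false; }

    # Drop nulls so a missing property yields an empty array.
    $statuses = @($pod.status.containerStatuses | Where-Object { $null -ne $_ });
    $containers = @($pod.spec.containers | Where-Object { $null -ne $_ });
    if (($statuses.Count -eq 0) -or ($containers.Count -eq 0)) { return $false; }

    # Same image rule as the script: the first container must run the expected tag.
    if ($expectedTag -and (-not ([string]$containers[0].image).EndsWith(-join(':',$expectedTag)))) { return $false; }

    # Every container must be ready, not only the first one.
    foreach ($status in $statuses)
    {
        if (-not $status.ready) { return $false; }
    }

    return $true;
}

# Made-up sample pod with an unready sidecar container.
$samplePod = @'
{
  "metadata": { "name": "orders-api-7d9f" },
  "spec": { "containers": [
    { "name": "app", "image": "demo.azurecr.io/orders-api:1234" },
    { "name": "sidecar", "image": "demo.azurecr.io/proxy:2.0" } ] },
  "status": { "phase": "Running", "containerStatuses": [
    { "name": "app", "ready": true, "restartCount": 0 },
    { "name": "sidecar", "ready": false, "restartCount": 2 } ] }
}
'@ | ConvertFrom-Json;

Test-PodHealthy -pod $samplePod -expectedTag '1234'  # prints False because the sidecar is not ready
```

Keeping the decision in a small function like this would also allow it to be exercised against saved JSON without a live cluster.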

The job template below can be used in Azure Pipelines to run the above script.


parameters:
  - name: serviceconnection
    type: string
  - name: dependsonjobs
    type: object
    default: []
  - name: deployinfra
    type: boolean
  - name: deployapps
    type: boolean
  - name: apps
    type: object
  - name: precheck
    type: boolean
    default: false

jobs:
  - job: run_health_check_for_aks
    workspace:
      clean: all
    displayName: 'Run health check for AKS'
    dependsOn: ${{ parameters.dependsonjobs }}
    pool:
      vmImage: ubuntu-latest 
    timeoutInMinutes: 0 # No timeout for health checks
    steps:
      - checkout: self
        fetchDepth: 1
        lfs: true
        clean: true
        submodules: true
        persistCredentials: true

      - template: ../steps/switch_live_cluster_deploy_mode.yml
        parameters:
          deployapps: ${{ parameters.deployapps }}
          deployinfra: ${{ parameters.deployinfra }}

      - template: ../steps/log_blue_green_parameters.yml

      - task: KubectlInstaller@0
        displayName: 'Install Kubectl latest'

      - task: AzureCLI@2
        displayName: 'Run health check for AKS'
        inputs:
          azureSubscription: '${{ parameters.serviceconnection }}'
          scriptType: pscore # ps if windows
          scriptLocation: inlineScript
          inlineScript: |
            #region Nginx-change01
            $rgName = 'ch-demo-$(envname)-rg';
            $aksName = 'ch-demo-$(envname)-aks-$(sys_app_deploy_instance_suffix)';
            $aksNamespace = 'demo';
            $buildId = '$(Build.BuildId)';
            $skipHealthCheck = '$(sys_is_skip_healthcheck)';
            $apps = ('${{ convertToJson(parameters.apps) }}').Replace('_','-') | ConvertFrom-Json;
            $deployApps = '${{ parameters.deployapps }}';
            $preCheck = '${{ parameters.precheck }}';
            $sys_app_deploy_instance_suffix = '$(sys_app_deploy_instance_suffix)';
            $private_ip_nginx = '$(private_ip_nginx_blue)';

            if ($sys_app_deploy_instance_suffix -eq 'green')
            {
              $private_ip_nginx = '$(private_ip_nginx_green)';
            }
            
            Write-Host (-join('AKS name is ',$aksName));

            if ($skipHealthCheck -eq 'true')
            {
                Write-Host (-join('Skipping health check for ',$aksName));
            }
            else
            {
                az aks get-credentials -n $aksName -g $rgName --admin --overwrite-existing

                if ($preCheck -eq 'True')
                {
                  $(System.DefaultWorkingDirectory)/pipelines/scripts/aks_health_check.ps1 -aksName $aksName -aksNamespace $aksNamespace -privateIpNginx $private_ip_nginx -buildId $buildId -deployApps $deployApps -apps $apps -preCheck
                }
                else
                {
                  $(System.DefaultWorkingDirectory)/pipelines/scripts/aks_health_check.ps1 -aksName $aksName -aksNamespace $aksNamespace -privateIpNginx $private_ip_nginx -buildId $buildId -deployApps $deployApps -apps $apps
                }

                kubectl config delete-context (-join($aksName,'-admin'))
            }
            #endregion
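
The template passes the app list to the script by converting the `apps` parameter to JSON, replacing underscores with hyphens, and parsing the result. The script then decides which checks to run from each name's suffix. The sketch below walks through that flow outside a pipeline. The app names and the `Get-AppCheckKind` helper are made up for illustration, and the JSON string only stands in for what `convertToJson(parameters.apps)` might expand to.

```powershell
# Illustrative value only: stands in for the expanded apps parameter.
$appsJson = '["orders_api","cleanup_cronjob","queue_scaledjob","worker"]';
$apps = $appsJson.Replace('_','-') | ConvertFrom-Json;

# Hypothetical helper that mirrors the suffix checks in the health check script.
function Get-AppCheckKind
{
    param([string]$app)

    if ($app.EndsWith('-scaledjob')) { return 'scaled job'; }
    if ($app.EndsWith('-cronjob')) { return 'cron job'; }
    if ($app.EndsWith('-api')) { return 'pods and ingress'; }
    return 'pods only';
}

foreach ($app in $apps)
{
    Write-Host (-join($app,' -> ',(Get-AppCheckKind -app $app)));
}
```

Note that the `Replace` call runs on the whole JSON string, so every underscore in every app name becomes a hyphen before parsing.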
