In the post "High Availability Deployment of Nginx Gateway Fabric Replacing Retired Ingress Nginx in AKS - Part 2 - Deploy Nginx-Gateway-Fabric" we discussed how to set up nginx gateway in AKS. That approach works fine for a first install, and for true blue-green deployments with a fresh AKS cluster. However, when we run components such as Elasticsearch on AKS (setting up Elasticsearch on AKS will be covered in future posts), we have to use in-place AKS upgrades, with new node pools in the same cluster, because we want to persist the data in Elasticsearch. With in-place AKS upgrades, cert-manager and nginx gateway have to be upgraded in place as well. When we attempt such upgrades of cert-manager and nginx gateway, we run into the issue described below.
The Issue
Immediately after the upgrade, or some time later, the data plane pods of nginx gateway run into a high CPU situation and attempts are made to create new pods. These pods cannot start properly because they are unable to validate the generated certificates. Ideally, this situation should be handled by the control plane (operator) of nginx gateway; however, it does not handle it properly.
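The symptoms described above can be checked with a small diagnostic sketch. The helper name `Get-AKS-Not-Ready-Pod-Names` is hypothetical and not part of the original script, and the namespace `nginx-gateway` is assumed to match the one used by the deployment script in this post. The helper only inspects the JSON returned by `kubectl get pods` and lists pods that have no container statuses yet or have at least one container that is not ready.

```powershell
# Hypothetical helper (not from the original script): returns the names of pods
# that have no container statuses yet, or at least one container that is not ready.
function Get-AKS-Not-Ready-Pod-Names {
    param(
        [Parameter(Mandatory = $true)] [object] $PodList
    )

    foreach ($pod in $PodList.items) {
        # Drop nulls so a pod without containerStatuses yields an empty array.
        $statuses = @($pod.status.containerStatuses | Where-Object { $null -ne $_ });
        $notReady = @($statuses | Where-Object { -not $_.ready });

        if (($statuses.Count -eq 0) -or ($notReady.Count -gt 0)) {
            $pod.metadata.name;
        }
    }
}

# Usage against a cluster (assumes the 'nginx-gateway' namespace):
#   $pods = kubectl get pods -n nginx-gateway -o json | ConvertFrom-Json;
#   Get-AKS-Not-Ready-Pod-Names -PodList $pods;
#
# CPU usage per pod (requires a metrics server in the cluster):
#   kubectl top pods -n nginx-gateway;
```

If the helper returns data plane pod names while `kubectl top pods` shows high CPU in the same namespace, the cluster is likely in the state described above.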
The Solution
Once the upgrade is done, we have to ensure that the current certificates (secrets in AKS) used by nginx gateway are deleted. When we delete them, the operator automatically creates new certificates. We should then restart the control plane and the data plane, so that both pick up the new secrets. We can modify the script used in the post "High Availability Deployment of Nginx Gateway Fabric Replacing Retired Ingress Nginx in AKS - Part 2 - Deploy Nginx-Gateway-Fabric" and add the part below, which ensures that the certificates are deleted and recreated, and that the control plane and data plane pods are restarted after the certificates are recreated.
#region Nginx-Gateway with Nginx-Gateway-Fabric
$nginxGatewayFabricPrerequisites = -join($ManifestPath,'nginx_gateway/','nginx_gateway_fabric_prerequisites.yaml');
$nginxGatewayFabricHelmValuesManifest = -join($ManifestPath,'nginx_gateway/','nginx_gateway_fabric_helm_values.yaml');
$nginxGatewaySetupManifest = -join($ManifestPath,'nginx_gateway/','nginx_gateway_setup.yaml');

Write-Host (-join('Deploying Nginx-Gateway-Fabric prerequisites with: ',$nginxGatewayFabricPrerequisites, ' ...'));
kubectl apply -f $nginxGatewayFabricPrerequisites;
Write-Host ('Successfully deployed Nginx-Gateway-Fabric prerequisites.');
Write-Host ('=========================================================');

Write-Host ('Deploying Nginx-Gateway-Fabric with helm...');
helm upgrade ngf oci://ghcr.io/nginx/charts/nginx-gateway-fabric --install `
    --namespace nginx-gateway `
    --version 2.5.1 `
    -f $nginxGatewayFabricHelmValuesManifest `
    --set nginx.service.type="LoadBalancer" `
    --set nginx.service.loadBalancerIP=$nginxGatewayLoadBalancerIp `
    --set nginxGateway.replicas=3 `
    --set nginxGateway.snippets.enable=true;

Invoke-AKS-App-Health-Check -aksNamespace 'nginx-gateway' -apps @('nginx-gateway-fabric') -appReadyInitialWaitSeconds 20 -appHealthCheckMaxAttempts 60;
Write-Host ('Successfully deployed Nginx-Gateway-Fabric via helm.');
Write-Host ('=========================================================');

Write-Host (-join('Deploying Nginx-Gateway with: ',$nginxGatewaySetupManifest, ' ...'));
kubectl apply -f $nginxGatewaySetupManifest;

# New script section begins here
# Here we add the new code segment to ensure certs are deleted
# Note: the name selfhost-apps-gateway-nginx-agent-tls changes
# depending on the name you give to the gateway
Write-Host ('--------------------------------------------------------');
Write-Host ('Cleaning up existing secrets for nginx-gateway if any...');
$nginxSecretsToDelete = @(
    'selfhost-apps-gateway-nginx-agent-tls',
    'nginx-gateway-ca',
    'agent-tls',
    'server-tls'
);

Write-Host ('Existing secrets in nginx-gateway namespace before cleanup are:');
kubectl get secret -n nginx-gateway;
Write-Host ('--------------------------------------------------------');

$existingNginxSecrets = kubectl get secret -n nginx-gateway -o json | ConvertFrom-Json;

if (($null -eq $existingNginxSecrets) -or ($null -eq $existingNginxSecrets.items) -or ($existingNginxSecrets.items.Count -le 0))
{
    Write-Host ('No existing secrets found in nginx-gateway namespace. Skipping deletion.');
}
else
{
    $existingNginxSecretNames = $existingNginxSecrets.items | ForEach-Object { $_.metadata.name };

    foreach ($nginxSecret in $nginxSecretsToDelete)
    {
        if ($existingNginxSecretNames -contains $nginxSecret)
        {
            Write-Host ("Deleting secret: $nginxSecret");
            kubectl delete secret $nginxSecret -n nginx-gateway;
        }
        else
        {
            Write-Host ("Secret not found, skipping: $nginxSecret");
        }
    }
}

Write-Host ('Waiting for secrets to be automatically recreated by nginx gateway fabric operator if they are deleted...');
Start-Sleep -Seconds 10;
Write-Host ('Secrets deleted and recreated automatically. Current secrets in nginx-gateway namespace are:');
Write-Host ('--------------------------------------------------------');
kubectl get secret -n nginx-gateway;
Write-Host ('--------------------------------------------------------');
Write-Host ('Successfully refreshed secrets for nginx-gateway.');
Write-Host ('=========================================================');

# Restart the control plane so it picks up the new secrets
Write-Host ('Existing secrets are refreshed. Restarting nginx gateway fabric...');
kubectl rollout restart deployment/ngf-nginx-gateway-fabric -n nginx-gateway;
Invoke-AKS-App-Health-Check -aksNamespace 'nginx-gateway' -apps @('nginx-gateway-fabric') -appReadyInitialWaitSeconds 20 -appHealthCheckMaxAttempts 60;
Write-Host ('Successfully restarted Nginx-Gateway-Fabric.');
Write-Host ('=========================================================');

# Restart the data plane so it picks up the new secrets
Write-Host ('Existing secrets are refreshed. Restarting nginx gateway...');
kubectl rollout restart deployment/selfhost-apps-gateway-nginx -n nginx-gateway;
# New script section ends here

# Once we complete the cert refresh and the restart of the control plane and then the data plane,
# we can check if the data plane pods are started and the load balancer is working as expected
Invoke-AKS-App-Health-Check -aksNamespace 'nginx-gateway' -apps @('selfhost-apps-gateway') -appLabelName 'gateway.networking.k8s.io/gateway-name' -appReadyInitialWaitSeconds 30 -appHealthCheckMaxAttempts 60;
Invoke-AKS-Load-Balancer-Health-Check -loadBalancerServiceName 'selfhost-apps-gateway-nginx' -loadBalancerIP $nginxGatewayLoadBalancerIp -aksNamespace 'nginx-gateway';
Write-Host ('Successfully deployed Nginx-Gateway.');
Write-Host ('=========================================================');
#endregion Nginx-Gateway with Nginx-Gateway-Fabric
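The script above waits a fixed 10 seconds for the secrets to reappear. As a possible refinement, the wait could poll until the deleted secrets exist again. The sketch below is only an illustration: the function name `Wait-AKS-Secret-Recreation` and its parameters are hypothetical, and it assumes, as described above, that the operator recreates the deleted secrets under the same names. Only the names that were actually deleted should be passed in, since not every name in `$nginxSecretsToDelete` necessarily exists in a given cluster.

```powershell
# Hypothetical helper (not part of the original script): polls until every
# expected secret exists again or the timeout elapses. Returns $true on success.
# $GetSecretNames is a script block returning the current secret names; it is
# passed in so the polling loop can be exercised without a cluster.
function Wait-AKS-Secret-Recreation {
    param(
        [Parameter(Mandatory = $true)] [string[]] $ExpectedSecrets,
        [Parameter(Mandatory = $true)] [scriptblock] $GetSecretNames,
        [int] $TimeoutSeconds = 120,
        [int] $PollSeconds = 5
    )

    $deadline = (Get-Date).AddSeconds($TimeoutSeconds);

    while ($true) {
        $current = @(& $GetSecretNames);
        $missing = @($ExpectedSecrets | Where-Object { $current -notcontains $_ });

        if ($missing.Count -eq 0) {
            return $true;
        }

        if ((Get-Date) -ge $deadline) {
            Write-Host ("Timed out waiting for secrets: $($missing -join ', ')");
            return $false;
        }

        Start-Sleep -Seconds $PollSeconds;
    }
}

# Usage against a cluster (assumes the 'nginx-gateway' namespace and that
# $deletedSecrets holds the names that were actually deleted):
#   $getNames = {
#       (kubectl get secret -n nginx-gateway -o json | ConvertFrom-Json).items |
#           ForEach-Object { $_.metadata.name }
#   };
#   Wait-AKS-Secret-Recreation -ExpectedSecrets $deletedSecrets -GetSecretNames $getNames;
```

Polling avoids restarting the control plane and data plane before the new secrets exist, at the cost of a slightly longer script. Whether a timeout should abort the deployment is a choice left to the calling script.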
Check the post "High Availability Deployment of Nginx Gateway Fabric Replacing Retired Ingress Nginx in AKS - Part 2 - Deploy Nginx-Gateway-Fabric" to understand the full setup of the script. The additional part is marked in the script section above.
With this change, the in-place upgrade of cert-manager and nginx gateway runs smoothly.