Saturday 15 April 2023

Workaround Solution for Intermittent CrashLoopBackOff in Windows Containers Running on AKS (.NET 6 Apps with System.Net.Sockets.SocketException 11001 and 10060)

Let's look at a temporary workaround for the issue Intermittent CrashLoopBackOff in Windows Containers Running on AKS (.NET 6 Apps with System.Net.Sockets.SocketException 11001 and 10060). The same issue has been asked on Stack Overflow here. Instead of manually deleting pods that run into the issue, the cleaner app implemented in this repo automatically deletes pods in the CrashLoopBackOff state when a known exception is reported in the container log. If the exception is unknown, the pod in the CrashLoopBackOff state is not deleted; instead, its container log output is printed in the cleaner app logs to show the exception behind the CrashLoopBackOff state.

The solution is implemented as a container app based on the Azure CLI Docker image, which can be deployed to a Linux node in AKS. The Dockerfile builds the image by copying cleaner.sh and a second script into the container; one of the copied scripts is set as the startup script in the container. The solution is available on GitHub here.

FROM AS base
COPY ["",""]
COPY ["",""]

RUN ./

ENTRYPOINT ["sh", "" ]

The script run by the RUN step installs kubectl and jq while the Docker image is being built. You can change the kubectl version if required.
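As a rough illustration of what a pinned kubectl install can look like at image build time, here is a minimal sketch. It is not taken from the repo: the function names, the KUBECTL_VERSION variable and the default version shown are assumptions, and only the dl.k8s.io download URL layout is the documented one. Installing jq is left out because it depends on the package manager of the base image.

```shell
#!/bin/sh
# Hypothetical sketch: pin and install kubectl during the image build.
# KUBECTL_VERSION and both function names are assumptions for illustration.
KUBECTL_VERSION="${KUBECTL_VERSION:-v1.26.3}"

# Build the official download URL for a given kubectl release (linux/amd64).
kubectl_url() {
  printf 'https://dl.k8s.io/release/%s/bin/linux/amd64/kubectl\n' "$1"
}

# Download the binary and make it executable.
install_kubectl() {
  curl -fsSL -o /usr/local/bin/kubectl "$(kubectl_url "$KUBECTL_VERSION")" &&
    chmod +x /usr/local/bin/kubectl
}

# Usage (inside the build-time script): install_kubectl
```

Changing the kubectl version then only means changing KUBECTL_VERSION, ideally to a release close to the version of the AKS cluster.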

Next, the setup uses a service principal to run az login (the service principal should have Contributor permission on the AKS cluster). It then uses the az aks get-credentials command to configure the kubectl context for the required AKS cluster.
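The login step described above could look roughly like the sketch below. The function name aks_login and its positional parameters are hypothetical and are not the repo's actual code; the az subcommands and flags are the standard Azure CLI ones.

```shell
#!/bin/sh
# Hypothetical sketch of the service principal login and kubectl context setup.
# Arguments: $1 app id, $2 app password, $3 tenant id,
#            $4 subscription id, $5 resource group, $6 AKS cluster name.
aks_login() {
  # Sign in as the service principal (output suppressed to keep logs clean).
  az login --service-principal \
    --username "$1" --password "$2" --tenant "$3" >/dev/null || return 1
  # Select the subscription that contains the cluster.
  az account set --subscription "$4" || return 1
  # Write the cluster credentials into the kubectl context.
  az aks get-credentials --resource-group "$5" --name "$6" --overwrite-existing
}

# Usage:
# aks_login "$AzureSPNAppId" "$AzureSPNAppPwd" "$AzureTenantId" \
#           "$AzureSubscriptionId" "$aksClusterResourceGroupName" "$aksClusterName"
```

Passing the values as arguments keeps the secrets out of the function body, so they can come from environment variables or a Kubernetes secret rather than being hard-coded.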

The startup script keeps running for as long as the container is running. It checks once every minute whether any pod is in the CrashLoopBackOff state. If a pod is found in the CrashLoopBackOff state, the following steps are performed.

  • Read the last 5000 lines of the pod's log.
  • Check whether any known socket exception occurred due to an Azure App Configuration connection failure.
  • Delete the pod if the exception is known, or print the pod's log with the exception if the exception is unknown.
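The steps above can be sketched as a small polling loop. This is an illustration under stated assumptions, not the repo's cleaner script: the function names, the NAMESPACE variable and the KNOWN_PATTERN regular expression are hypothetical, and the pattern assumes the .NET log line contains text such as "SocketException (11001)". The real script may also match on the App Configuration service name.

```shell
#!/bin/sh
# Hypothetical sketch of the cleaner loop; names and pattern are assumptions.
NAMESPACE="${NAMESPACE:-default}"
KNOWN_PATTERN='SocketException \((11001|10060)\)'

# Print the names of pods that have a container waiting in CrashLoopBackOff.
list_crashloop_pods() {
  kubectl get pods -n "$NAMESPACE" -o json |
    jq -r '.items[]
      | select(any(.status.containerStatuses[]?;
                   .state.waiting.reason == "CrashLoopBackOff"))
      | .metadata.name'
}

# Delete the pod on a known exception, otherwise print its log.
handle_pod() {
  pod="$1"
  logs=$(kubectl logs "$pod" -n "$NAMESPACE" --tail=5000 2>&1)
  if printf '%s\n' "$logs" | grep -Eq "$KNOWN_PATTERN"; then
    echo "Known socket exception in $pod; deleting it"
    kubectl delete pod "$pod" -n "$NAMESPACE"
  else
    echo "Unknown exception in $pod; leaving it in place. Log output:"
    printf '%s\n' "$logs"
  fi
}

# Check once a minute for as long as the container runs.
run_cleaner() {
  while true; do
    for pod in $(list_crashloop_pods); do
      handle_pod "$pod"
    done
    sleep 60
  done
}

# Usage (as the container entry point): run_cleaner
```

Deleting the pod lets its controller create a fresh replacement, which is the same effect as the manual deletion this workaround replaces.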

To get the CrashLoopBackOff cleaner app deployed to AKS, follow the steps below.

  • Copy the content of cleanerapp/ and cleanerapp/ to a text editor. Then delete the two files and create them as new files with the same names, and paste the content back into these two .sh files. This avoids issues such as `curl: Failed to extract a sensible file name from the URL to use for storage!` when the two files are executed during docker build and run.
  • Replace AzureSPNAppId, AzureSPNAppPwd, AzureTenantId, AzureSubscriptionId, aksCusterName and aksClusterResourceGroupName in cleanerapp/
  • Replace appconfigsvcname and aksnamespace in cleanerapp/
  • Run docker build -t cleanerapp:dev .
  • Use the makefiles and k8s.yaml files in the deploy folder to get the app deployed to an AKS Linux node.

This workaround is implemented for a .NET 6 application with a single container running in each pod that runs into socket exceptions at startup while trying to connect to the Azure App Configuration service.
