Saturday, 22 March 2025

Gracefully Shut Down dotnet 8 IHostedService App - Deployed as a Windows Container in AKS - While Scale In or Pod Deallocations

Applications implemented with IHostedService in dotnet and deployed to Azure Kubernetes Service (AKS) as containers in pods get terminated when pod rescheduling or scale-in operations happen. However, unlike Linux containers, Windows containers do not receive a signal (similar to SIGTERM or SIGINT) to shut down gracefully. Once the pre stop hook is done, the container is immediately killed, disregarding the value set in terminationGracePeriodSeconds. Since the Windows container never receives a message to start a graceful shutdown and is killed abruptly, the in flight operations in the Windows app container are abandoned. Abandoned operations leave system data inconsistent and cause system failures. Therefore, it is mandatory to implement a proper graceful shutdown for Windows containers as well. Let's explore the issue in detail and see how to implement a proper solution that enables graceful Windows container shutdown for dotnet apps implemented with IHostedService. The issue happens with mcr.microsoft.com/dotnet/runtime:8.0-windowsservercore-ltsc2022 images, and the solution is tested with the same.

Windows app pod scale-in or pod reschedule


Unlike Windows app pods, Linux dotnet app containers receive the SIGTERM signal just after the pre stop hook. The application then starts a graceful shutdown. The container is not forcefully killed until terminationGracePeriodSeconds (which includes the pre stop hook time) has elapsed. If the container(s) in the pod shut down gracefully, after completing in flight processing, before the grace period is over, the pod terminates immediately. If the container does not stop before the grace period is over, it is killed as soon as the grace period ends. This provides the opportunity to set terminationGracePeriodSeconds based on how long the app needs to complete in flight processing and shut down gracefully.

Linux app pod scale-in or pod reschedule



How is the IHostedService setup?

The implementation of StartAsync in IHostedService allows the hosted service to start its operations, and it is called when the application starts. StopAsync in IHostedService needs to be implemented with the code necessary to complete in flight processing gracefully. StopAsync gets called when the application receives a shutdown signal such as ctrl+c. Read more about IHostedService here.

using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

public sealed class DemoHostedService : IHostedService, IDisposable
{
    internal const string ProcessorStartingMessage = "SDTEST: Starting processors.";
    internal const string ProcessorStoppingMessage = "SDTEST: Stopping processors.";

    private readonly ILogger<DemoHostedService> _logger;

    // Created in StartAsync, so it is still null if the host never starts the service.
    internal CancellationTokenSource CancellationTokenSource;

    private bool _disposed;

    public DemoHostedService(
        ILogger<DemoHostedService> logger)
    {
        _logger = logger;
    }

    // Called by the host when the application starts.
    public Task StartAsync(CancellationToken cancellationToken)
    {
        _logger.LogInformation(ProcessorStartingMessage);

        CancellationTokenSource = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);

        // Start hosted service steps here

        return Task.CompletedTask;
    }

    // Called by the host when a shutdown signal (for example ctrl+c or SIGTERM) arrives.
    public async Task StopAsync(CancellationToken cancellationToken)
    {
        _logger.LogInformation(ProcessorStoppingMessage);

        if (CancellationTokenSource is not null)
        {
            await CancellationTokenSource.CancelAsync();
        }

        // Gracefully stop processing events/requests
    }

    public void Dispose()
    {
        if (_disposed)
        {
            return;
        }

        _disposed = true;

        // Null-conditional calls avoid a NullReferenceException when StartAsync was never called.
        CancellationTokenSource?.Cancel();
        CancellationTokenSource?.Dispose();
    }
}
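The "gracefully stop processing" step in StopAsync is left as a comment above. One common way to fill it in is to keep a reference to the processing task and wait for it in StopAsync. The following is only a minimal sketch of that idea, not the code used in the tested app: DrainingWorker, CompletedItems, and itemDuration are hypothetical names, and Task.Delay stands in for real event processing.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;

// Hypothetical worker: finishes the item it is working on before stopping.
public sealed class DrainingWorker : IHostedService, IDisposable
{
    private readonly CancellationTokenSource _stopping = new();
    private readonly TimeSpan _itemDuration;
    private Task _loop = Task.CompletedTask;

    public DrainingWorker(TimeSpan itemDuration) => _itemDuration = itemDuration;

    // Number of items that were processed to completion.
    public int CompletedItems { get; private set; }

    public Task StartAsync(CancellationToken cancellationToken)
    {
        // Run the loop in the background so StartAsync returns immediately.
        _loop = Task.Run(() => RunAsync(_stopping.Token));
        return Task.CompletedTask;
    }

    public async Task StopAsync(CancellationToken cancellationToken)
    {
        // Ask the loop to stop picking up new items.
        _stopping.Cancel();

        // Wait until the in flight item is done, or stop waiting when the
        // host cancels the token passed to StopAsync.
        await Task.WhenAny(_loop, Task.Delay(Timeout.Infinite, cancellationToken));
    }

    private async Task RunAsync(CancellationToken stopping)
    {
        while (!stopping.IsCancellationRequested)
        {
            // Stand-in for real work. No token is passed, so an item that
            // has started is never interrupted half way.
            await Task.Delay(_itemDuration);
            CompletedItems++;
        }
    }

    public void Dispose() => _stopping.Dispose();
}
```

The token passed to StopAsync is cancelled when the host's shutdown timeout (HostOptions.ShutdownTimeout) elapses, so that timeout has to be at least as long as the longest in flight item, and the pod grace period has to be longer still.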


To test the hosted service, below is a dummy process implementation, which processes events received from Event Hubs in the IHostedService. A similar implementation is used in both the Windows and Linux apps.

protected override async Task<ProcessResult> ProcessEventAsync(ProcessEventArgs eventArgs)
{
    DummyVideoMessage dummyVideoMessage = _messageParser.Parse<DummyVideoMessage>(eventArgs.Data);
    Video video = _mapper.Map<Video>(dummyVideoMessage.Video);

    try
    {
        int processSeconds = 0;
        Logger.LogInformation("SDTEST: Starting video {VideoName} processing...", video.Path.VideoName);

        // Simulate long running work: PageCount is used as the number of seconds to "process".
        while (processSeconds < video.PageCount)
        {
            await Task.Delay(10000);
            processSeconds += 10;
            Logger.LogInformation("SDTEST: Processing video {VideoName} for {duration}...", video.Path.VideoName, processSeconds);
        }

        Logger.LogInformation("SDTEST: Video {VideoName} processing completed.", video.Path.VideoName);

        return new ProcessResult
        {
            MessageId = dummyVideoMessage.Id,
            Succeeded = true
        };
    }
    catch (Exception ex)
    {
        Logger.LogError(ex, "SDTEST: Error while generating dummy video previews for message {MessageId}", dummyVideoMessage.Id);

        return new ProcessResult
        {
            ExceptionMessage = ex.Message,
            MessageId = dummyVideoMessage.Id,
            Succeeded = false
        };
    }
}


How does application termination work in Visual Studio?

When the Windows application is run in Visual Studio, ctrl+c triggers a graceful shutdown. In flight processes fully complete and the app stops gracefully, proving that IHostedService is correctly implemented. The Linux app behaves the same way in Visual Studio.



How does application termination work in an AKS pod?

Windows pod terminationGracePeriodSeconds and pre stop hook setup.


Linux pod terminationGracePeriodSeconds and pre stop hook setup.
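Both setups can be sketched as pod template fragments like the ones below. The 300 second grace period and the 10 second pre stop delay are the values used in this post; the container names, the nodeSelector entries, and the exact sleep commands are assumptions for illustration.

```yaml
# Windows pod template fragment (sketch)
spec:
  terminationGracePeriodSeconds: 300
  nodeSelector:
    kubernetes.io/os: windows
  containers:
    - name: dummy-preview            # assumed container name
      lifecycle:
        preStop:
          exec:
            # assumed command: wait 10 seconds before the container is stopped
            command: ["powershell", "-Command", "Start-Sleep -Seconds 10"]
---
# Linux pod template fragment (sketch)
spec:
  terminationGracePeriodSeconds: 300
  nodeSelector:
    kubernetes.io/os: linux
  containers:
    - name: dummy-video              # assumed container name
      lifecycle:
        preStop:
          exec:
            # assumed command: wait 10 seconds before SIGTERM is sent
            command: ["sleep", "10"]
```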



When the pods are terminated, the behaviour is as described below. The Kusto query below can be used in Log Analytics to see how the shutdown happened in a pod, given the pod name.

ContainerLog
| where LogEntry has "SDTEST:" or LogEntry has "Application started" or LogEntry has "Application is shutting down"
| project TimeGenerated, LogEntry, ContainerID
| join kind=inner
    (KubePodInventory
        | where Name == 'podnamehere'
        | distinct ContainerID, Name
        | project ContainerID, Name)
on ContainerID
| project TimeGenerated, Name, LogEntry
| order by TimeGenerated desc 


Windows pod dummy-preview-854c5594c5-qmqdx terminating


The Windows pod started termination at 08:41:25, as shown below in events.

The Windows pod does not seem to have received a termination signal after the 10 seconds of the pre stop hook. Instead, 13 seconds in, just after the pre stop hook, it was killed abruptly at 08:41:38. The in flight processes were not completed.

Therefore, it is clear that Windows pods terminate without a graceful shutdown, as shown in the picture below.


Linux pod dummy-video-68f5f597f4-c6fxb terminating 


The Linux pod started termination at 08:08:36, as shown below in events.


The Linux pod received the termination signal at 08:08:46, after the 10 seconds of the pre stop hook. It started the application shutdown process gracefully by invoking StopAsync, and kept processing the in flight events until completion at 08:09:11, about 25 seconds after the termination signal was received. So it completed successfully within the grace period of 300 seconds.


Therefore, the Linux pod works correctly: it receives the termination signal and shuts down gracefully.

How does the application work locally with Docker?

Running docker stop --timeout 300 containerid locally also shows the Linux container shutting down gracefully. However, the Windows container shuts down abruptly and in flight processing is abandoned.




THE SOLUTION for Windows Pods

The dotnet IHostedService implementation in the application is correct. The actual root cause for the Windows container not shutting down gracefully is that neither docker stop nor Kubernetes pod termination delivers a termination signal to the Windows container.

What we need is to get the Windows pod to shut down its container gracefully in AKS. For this we can use a PowerShell script in the pre stop hook to send a ctrl+c signal to the dotnet process. The ctrl+c sending script is taken from here and modified to send the ctrl+c signal to the dotnet process in the Windows container.

# Find the app process. This assumes exactly one process named 'dotnet' runs in the container.
$dotnetProcess = Get-Process -Name 'dotnet';
$ProcessID = $dotnetProcess.Id;
# Build a helper command that attaches to the console of the dotnet process and raises CTRL_C_EVENT (0) on it.
$encodedCommand = [Convert]::ToBase64String([System.Text.Encoding]::Unicode.GetBytes("Add-Type -Names 'w' -Name 'k' -M '[DllImport(""kernel32.dll"")]public static extern bool FreeConsole();[DllImport(""kernel32.dll"")]public static extern bool AttachConsole(uint p);[DllImport(""kernel32.dll"")]public static extern bool SetConsoleCtrlHandler(uint h, bool a);[DllImport(""kernel32.dll"")]public static extern bool GenerateConsoleCtrlEvent(uint e, uint p);public static void SendCtrlC(uint p){FreeConsole();AttachConsole(p);GenerateConsoleCtrlEvent(0, 0);}';[w.k]::SendCtrlC($ProcessID)"));
# Run the helper in a separate powershell.exe, because the sender shares the target console and receives the ctrl+c event as well.
$stopProcess = start-process powershell.exe -PassThru -argument "-nologo -noprofile -executionpolicy bypass -EncodedCommand $encodedCommand";
$stopProcess.WaitForExit();
# Keep the pre stop hook running until the dotnet process has finished its graceful shutdown.
$dotnetProcess.WaitForExit();
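The C# that the script compiles through Add-Type is squeezed into one line, which makes it hard to follow. Expanded, it is roughly equivalent to the sketch below. The class name ConsoleSignal and the parameter names are chosen here for readability and are not in the script, and the unused SetConsoleCtrlHandler import is left out.

```csharp
using System.Runtime.InteropServices;

// Readable expansion of the type the pre stop script builds with Add-Type.
public static class ConsoleSignal
{
    // Detaches the calling process from its current console.
    [DllImport("kernel32.dll")]
    private static extern bool FreeConsole();

    // Attaches the calling process to the console of the given process.
    [DllImport("kernel32.dll")]
    private static extern bool AttachConsole(uint processId);

    // Raises a console control event for processes sharing the caller's console.
    [DllImport("kernel32.dll")]
    private static extern bool GenerateConsoleCtrlEvent(uint ctrlEvent, uint processGroupId);

    public static void SendCtrlC(uint processId)
    {
        // A process can be attached to only one console, so leave ours first.
        FreeConsole();

        // Join the console of the target dotnet process.
        AttachConsole(processId);

        // Event 0 is CTRL_C_EVENT. Group 0 means every process attached to
        // this console gets it, including the caller itself.
        GenerateConsoleCtrlEvent(0, 0);
    }
}
```

Because the caller receives the ctrl+c event too, the script runs this code in a throwaway powershell.exe process instead of in the pre stop hook script itself.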

In the Windows app Docker image, built on the base image above, the script is made available in the app directory as below.
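A Dockerfile along these lines would do that. This is a sketch rather than the Dockerfile used for the tested app: only the base image comes from this post, while the publish folder, the script file name graceful-shutdown.ps1, and the DummyPreview.dll entry point are assumed names.

```dockerfile
FROM mcr.microsoft.com/dotnet/runtime:8.0-windowsservercore-ltsc2022

WORKDIR /app

# Assumed location of the published application output
COPY ./publish/ .

# Assumed script file name; it ends up next to the app as C:\app\graceful-shutdown.ps1
COPY ./graceful-shutdown.ps1 .

# Assumed entry assembly name
ENTRYPOINT ["dotnet", "DummyPreview.dll"]
```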


Then the pre stop hook is set up as shown below.
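In a pod template, such a hook could look like the fragment below. It is a sketch under assumptions: the container name and the script path C:\app\graceful-shutdown.ps1 are illustrative, and the 300 second grace period mirrors the value used for the Linux pod in this post.

```yaml
spec:
  # Has to cover the whole pre stop hook, which now runs until the app has drained.
  terminationGracePeriodSeconds: 300
  containers:
    - name: dummy-preview                      # assumed container name
      lifecycle:
        preStop:
          exec:
            command:
              - powershell.exe
              - -NoProfile
              - -ExecutionPolicy
              - Bypass
              - -File
              - 'C:\app\graceful-shutdown.ps1'   # assumed script path
```

Since the script waits for the dotnet process to exit, the hook returns only after in flight processing is done, or it is cut off when the grace period runs out.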

The Windows pod dummy-preview-86656f579c-6x969 terminating after the fix.


The Windows pod started termination at 09:32:32.


Now the Windows container received the shutdown signal via the pre stop hook script at 09:32:39. It continued processing in flight events for about 42 seconds after the shutdown signal and stopped once processing completed. The fix allows the Windows pod to shut down its container gracefully as well.

