Thursday 28 September 2023

Reducing Log Analytics Cost by Preventing Container Log Ingestion from Azure Kubernetes Service (AKS)

Monitoring is an essential part of deploying software on a platform such as AKS. However, once monitoring is enabled, the monitoring data itself can carry a significant cost. When a Log Analytics workspace is enabled to ingest monitoring data from AKS, by default AKS ingests all container logs, except for the kube-system and gatekeeper-system namespaces. If our applications generate a large amount of container logs, the Log Analytics workspace cost will be considerably higher. If we are already using Application Insights or an alternative system to monitor the logs of the applications deployed to AKS, we have enough information to diagnose issues. Therefore, we can reduce unnecessary Log Analytics cost by preventing AKS from ingesting container logs into the Log Analytics workspace. Let's explore the steps.

Expected outcome

As shown in the figure below, the Log Analytics workspace ingestion-over-time chart indicates that container log data is no longer being ingested after the change is applied.


If we inspect the Log Analytics Usage table in more detail with the query below, we can see that over a 90-day period the highest portion (nearly 52%) of the Log Analytics cost was due to the container logs from AKS.

Usage
| where TimeGenerated > startofday(ago(90d))
| where IsBillable == true
| summarize IngestedGB = sum(Quantity) / 1000 by DataType
| sort by IngestedGB desc
| render piechart



For the last 24 hours, container logs account for about 6% (0.5 GB of data).

Usage
| where TimeGenerated > ago(24h)
| where IsBillable == true
| summarize IngestedGB = sum(Quantity) / 1000 by DataType
| sort by IngestedGB desc
| render piechart




However, after applying the change, if we inspect the Usage table for billable data in the Log Analytics workspace for the last 22 hours, we can hardly see any ingestion of container logs (only a few MBs). This is because the change was applied right at the beginning of that 22-hour window.

Usage
| where TimeGenerated > ago(22h)
| where IsBillable == true
| summarize IngestedGB = sum(Quantity) / 1000 by DataType
| sort by IngestedGB desc
| render piechart



For the last 21 hours there is no billable container log ingestion, which shows the change is effective and AKS no longer ingests container logs into the Log Analytics workspace.
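
If we want to look at the container log data types specifically rather than the overall split, a query along the following lines charts their billable ingestion over time. The DataType values ContainerLog and ContainerLogV2 are an assumption here; which one applies depends on the log schema the cluster uses.

Usage
| where TimeGenerated > ago(7d)
| where IsBillable == true
| where DataType in ("ContainerLog", "ContainerLogV2")
| summarize IngestedGB = sum(Quantity) / 1000 by bin(TimeGenerated, 1h), DataType
| render timechart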



How to apply the change

As documented here, we can create a Kubernetes config map in AKS to disable container log ingestion into the Log Analytics workspace. The exclude_namespaces setting is effective only when log collection is enabled, that is, when you want to ingest container logs for namespaces other than the excluded ones.


The config map YAML below can be used for this purpose. The full details of the file can be found here.

# Refer below for settings in this config map
# https://raw.githubusercontent.com/microsoft/Docker-Provider/ci_prod/kubernetes/container-azm-ms-agentconfig.yaml
# https://learn.microsoft.com/en-us/azure/azure-monitor/containers/container-insights-agent-config#data-collection-settings
kind: ConfigMap
apiVersion: v1
data:
  schema-version:
    v1
  config-version:
    ver1
  log-data-collection-settings: |-
    [log_collection_settings]
       [log_collection_settings.stdout]
          enabled = false
          exclude_namespaces = ["kube-system","gatekeeper-system"]
       [log_collection_settings.stderr]
          enabled = false
          exclude_namespaces = ["kube-system","gatekeeper-system"]
       [log_collection_settings.env_var]
          enabled = true
       [log_collection_settings.enrich_container_logs]
          enabled = false
       [log_collection_settings.collect_all_kube_events]
          enabled = false
  prometheus-data-collection-settings: |-
    [prometheus_data_collection_settings.cluster]
        interval = "1m"
        monitor_kubernetes_pods = false
    [prometheus_data_collection_settings.node]
        interval = "1m"
  metric_collection_settings: |-
    [metric_collection_settings.collect_kube_system_pv_metrics]
      enabled = false
  alertable-metrics-configuration-settings: |-
    [alertable_metrics_configuration_settings.container_resource_utilization_thresholds]
        container_cpu_threshold_percentage = 95.0
        container_memory_rss_threshold_percentage = 95.0
        container_memory_working_set_threshold_percentage = 95.0
    [alertable_metrics_configuration_settings.pv_utilization_thresholds]
        pv_usage_threshold_percentage = 60.0
    [alertable_metrics_configuration_settings.job_completion_threshold]
        job_completion_threshold_time_minutes = 360
  integrations: |-
    [integrations.azure_network_policy_manager]
        collect_basic_metrics = false
        collect_advanced_metrics = false
    [integrations.azure_subnet_ip_usage]
        enabled = false
  agent-settings: |-
    [agent_settings.prometheus_fbit_settings]
      tcp_listener_chunk_size = 10
      tcp_listener_buffer_size = 10
      tcp_listener_mem_buf_limit = 200
metadata:
  name: container-azm-ms-agentconfig
  namespace: kube-system
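
Assuming the config map above is saved to a file named container-azm-ms-agentconfig.yaml, it can be applied to the cluster with kubectl:

kubectl apply -f container-azm-ms-agentconfig.yaml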

The config map will be automatically loaded by the ama-logs and ama-logs-windows (if you have Windows containers as well) pods.

kubectl get pods -n kube-system


kubectl logs ama-logs-9t42x -n kube-system --timestamps=true
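
To check that the agent has picked up the settings, one option is to filter the agent log output for configuration-related lines; a minimal sketch (the pod name ama-logs-9t42x is from the example above and will differ in your cluster):

kubectl logs ama-logs-9t42x -n kube-system --timestamps=true | grep -i config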



You could enable container log ingestion and still disable it for a given namespace, say "demo", by using the settings below. Container logs will be ingested for all namespaces other than the excluded ones, which are "kube-system", "gatekeeper-system" and "demo".

log-data-collection-settings: |-
    [log_collection_settings]
       [log_collection_settings.stdout]
          enabled = true
          exclude_namespaces = ["kube-system","gatekeeper-system","demo"]
       [log_collection_settings.stderr]
          enabled = true
          exclude_namespaces = ["kube-system","gatekeeper-system","demo"]

