GitHub Self-Hosted Runners on AKS

Introduction

GitHub self-hosted runners let you run GitHub Actions workflows on your own infrastructure. This can be beneficial for several reasons: potential cost savings, better performance, longer-running jobs, and more control over the execution environment. This article covers setting up GitHub self-hosted runners on AKS together with external components such as Azure Key Vault.

Authentication Using GitHub Application

Using a GitHub App for authentication is more secure than a Personal Access Token (PAT). You must create a GitHub App, grant it the permissions your runners require, and generate a private key to use in your runner configuration. If you later need to update the permissions, you can do so in the GitHub App settings and approve the change in your account without regenerating the private key.

We will not cover the GitHub App creation process here, but you can find detailed instructions in the GitHub documentation.

Components

Update the Base Image

You may need custom tools in your CI and not want to install them on every run of your GitHub workflow. In that case, you can extend the default runner image and reference it in the Helm values.
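As an illustration, here is a minimal Dockerfile sketch; the installed packages are assumptions, and you should pin the runner tag that matches your controller version:

FROM ghcr.io/actions/actions-runner:latest

# Switch to root to install extra CI tooling, then drop back to the runner user.
USER root
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl jq unzip \
    && rm -rf /var/lib/apt/lists/*
USER runner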

The Controller

First, you must deploy the GitHub Actions Runner Controller on your Kubernetes cluster. Here is an example of how you can deploy the operator using Helm and Terraform:

provider "helm" {
kubernetes {
host = "<host>"
client_certificate = "<client_certificate_in_clear>"
client_key = "<client_key_in_clear>"
cluster_ca_certificate = "<cluster_ca_certificate_in_clear>"
}
}
resource "helm_release" "github-runner" {
provider = helm
chart = "gha-runner-scale-set-controller"
name = "arc"
namespace = "arc-systems"
repository = "oci://ghcr.io/actions/actions-runner-controller-charts/"
version = "0.10.1"
create_namespace = true
# You can defined additional configuration here.
values = [
<<EOF
nodeSelector:
github-listener: "yes"
EOF
]
}

The Scale Set

Next, you need to deploy the GitHub Actions Runner Scale Set. This component runs the workflow jobs. The following example defines more than the necessary configuration to give you an idea of what you can do. The complete list of configuration options is available here.

resource "helm_release" "runner" {
provider = helm
chart = "gha-runner-scale-set"
name = "myworkflow"
namespace = "myworkflow"
repository = "oci://ghcr.io/actions/actions-runner-controller-charts/"
version = "0.10.1"
create_namespace = true
values = [
<<EOF
githubConfigUrl: "https://github.com/<account>/<repository>"
githubConfigSecret:
github_app_id: "<github_app_id>"
github_app_installation_id: "<github_app_installation_id>"
github_app_private_key: |
${indent(4, <private_key_file_path>)}
listenerTemplate:
spec:
containers:
- name: listener
securityContext:
runAsUser: 1000
template:
spec:
serviceAccountName: "<my-service-account>"
initContainers:
- name: init-dind-externals
# Here we are using a custom image based on: ghcr.io/actions/actions-runner
image: "my-custom-image"
command: ["cp", "-r", "-v", "/home/runner/externals/.", "/home/runner/tmpDir/"]
volumeMounts:
- name: dind-externals
mountPath: /home/runner/tmpDir
containers:
- name: runner
# Here we are using a custom image based on: ghcr.io/actions/actions-runner
image: "my-custom-image"
command: ["/home/runner/run.sh"]
env:
- name: DOCKER_HOST
value: unix:///var/run/docker.sock
volumeMounts:
- name: work
mountPath: /home/runner/_work
- name: dind-sock
mountPath: /var/run
# Here an example of mounting secrets from a volume.
- name: secrets-store01-inline
mountPath: "/home/runner/.ssh"
readOnly: false
- name: secrets-store01-inline
mountPath: "/etc/ssh"
readOnly: true
resources:
limits:
cpu: "4"
memory: 8Gi
requests:
cpu: "1"
memory: 2Gi
# Sometimes we need more than just the runner container.
# Here we are adding a MongoDB container that the runner will use during the workflow execution.
- name: mongo
image: mongo
env:
- name: MONGO_INITDB_ROOT_USERNAME
value: "foo"
- name: MONGO_INITDB_ROOT_PASSWORD
value: "bar"
- name: MONGO_INITDB_DATABASE
value: "foobar"
# ... Resources configuration and volumes ...
- name: dind
image: docker:dind
args:
- dockerd
- --host=unix:///var/run/docker.sock
- --group=$(DOCKER_GROUP_GID)
env:
- name: DOCKER_GROUP_GID
value: "123"
securityContext:
privileged: true
volumeMounts:
- name: work
mountPath: /home/runner/_work
- name: dind-sock
mountPath: /var/run
- name: dind-externals
mountPath: /home/runner/externals
# ... Resources configuration and volumes ...
volumes:
- name: work
emptyDir: {}
- name: dind-sock
emptyDir: {}
- name: dind-externals
emptyDir: {}
# Mounting a secret from Azure Key Vault
- name: secrets-store01-inline
csi:
driver: secrets-store.csi.k8s.io
readOnly: true
volumeAttributes:
secretProviderClass: "my-secret-provider-class"
# Retricting the runner to spot instances
tolerations:
- key: "kubernetes.azure.com/scalesetpriority"
operator: "Equal"
value: "spot"
effect: "NoSchedule"
EOF
]
}
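Once the release is installed, the Helm release name ("myworkflow" here) is the runner scale set name that workflows target through runs-on. A minimal sketch of a workflow using it:

name: CI
on: push
jobs:
  build:
    # The runs-on label matches the Helm release name of the scale set.
    runs-on: myworkflow
    steps:
      - uses: actions/checkout@v4
      - run: echo "Running on a self-hosted AKS runner"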

Deep Dive into the Scale Set Configuration

Service Account with Entra Application

In the example above, we defined a service account for the runner pod. This service account is used to authenticate with Azure Key Vault to access secrets. The first step is to create an Entra application, then federated identity credentials for this application using the AKS cluster's OIDC issuer.

data "azuread_client_config" "current" {}
resource "azuread_application_registration" "self" {
description = "<description>"
display_name = "<display_name>"
sign_in_audience = "AzureADMyOrg"
}
resource "azuread_application_federated_identity_credential" "self" {
application_id = azuread_application_registration.self.id
audiences = ["api://AzureADTokenExchange"]
description = "<description>"
display_name = "<name>"
issuer = "<kubernetes-oidc-issuer>
subject = "system:serviceaccount:<namespace>:<name>"
}
resource "azuread_service_principal" "self" {
app_role_assignment_required = false
client_id = azuread_application_registration.self.client_id
description = "<description>"
owners = [data.azuread_client_config.current.object_id]
feature_tags {
enterprise = true
gallery = false
}
}
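The serviceAccountName referenced by the runner template must exist in the scale set namespace. Here is a minimal sketch, assuming the Kubernetes Terraform provider is configured; the workload identity annotation is what lets Azure map the service account to the Entra application:

resource "kubernetes_service_account" "runner" {
  metadata {
    name      = "<my-service-account>"
    namespace = "myworkflow"
    annotations = {
      # Links the service account to the Entra application created above.
      "azure.workload.identity/client-id" = azuread_application_registration.self.client_id
    }
  }
}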

Mounting Volume and Secrets

Azure Key Vault as Volume

You can mount secrets from Azure Key Vault as volumes in your runner pods, ensuring that sensitive information is securely managed. Remember to provide the Entra application with the necessary permissions to access the Key Vault.

data "azurerm_client_config" "current" {}
resource "kubernetes_manifest" "secrets" {
provider = kubernetes
manifest = {
apiVersion = "secrets-store.csi.x-k8s.io/v1"
kind = "SecretProviderClass"
metadata = {
name = "name"
namespace = "namespace"
}
spec = {
provider = "azure"
parameters = {
usePodIdentity = "false"
useVMManagedIdentity = "false"
clientID = "<entra-application-id>"
keyvaultName = "<keyvault-name>"
objects = <<EOF
array:
# Mounts a list of secrets from Azure Key Vault
%{ for secret in var.secrets ~}
- |
objectName: ${secret}
objectAlias: "${secret}"
objectType: secret
objectVersion: ""
%{ endfor ~}
EOF
tenantId = data.azurerm_client_config.current.tenant_id
}
}
}
}
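As for the permissions mentioned above, here is a hedged example of granting the Entra application read access to the vault's secrets, assuming the Key Vault uses Azure RBAC; "<keyvault-id>" stands for the vault's resource ID:

resource "azurerm_role_assignment" "keyvault_secrets_user" {
  # Grants the service principal read access to secrets in the vault.
  scope                = "<keyvault-id>"
  role_definition_name = "Key Vault Secrets User"
  principal_id         = azuread_service_principal.self.object_id
}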

Multiple Containers

You can run multiple containers alongside the main runner container. This is useful for running additional services like databases or caching layers.

# ...existing code...
containers:
  - name: mongo
    image: mongo
    env:
      - name: MONGO_INITDB_ROOT_USERNAME
        value: "foo"
      - name: MONGO_INITDB_ROOT_PASSWORD
        value: "bar"
      - name: MONGO_INITDB_DATABASE
        value: "foobar"
    resources:
      requests:
        cpu: "0.5"
        memory: 1Gi
      limits:
        cpu: "1"
        memory: 2Gi
# ...existing code...
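Because containers in a pod share a network namespace, the workflow can reach this sidecar on localhost. A hypothetical workflow step, assuming a MongoDB client such as mongosh is installed in the custom runner image:

# Hypothetical step: the sidecar listens on localhost inside the runner pod.
- name: Ping MongoDB
  run: mongosh "mongodb://foo:bar@localhost:27017/foobar" --eval "db.stats()"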

Spot and Machine-Specific Usage

This can optimize costs and performance based on your requirements, mainly because only the listener stays permanently up while runner pods come and go. It's also helpful for training a model or running a task that requires a particular type of machine, using GitHub Actions as the orchestrator.

# ...existing code...
tolerations:
  - key: "kubernetes.azure.com/scalesetpriority"
    operator: "Equal"
    value: "spot"
    effect: "NoSchedule"
# ...existing code...
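Note that a toleration only allows the pods onto tainted spot nodes; to actually restrict the runners to the spot pool, you can pair it with a node selector in the same template spec:

# ...existing code...
nodeSelector:
  kubernetes.azure.com/scalesetpriority: "spot"
# ...existing code...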

Note: I wrote an article about using GitHub Actions as an orchestrator for machine learning tasks; you can find it here.

Unexpected Behaviors

Sometimes, you may encounter unexpected behaviors when using self-hosted runners on Kubernetes. The worst we encountered was the scale set not scaling up when the listener requested a new runner. This blocked all workflows and impacted the development team. Even after lengthy troubleshooting, the root cause was not found, and we had to delete and redeploy all components.

To fix the problem, we tried removing all namespaces containing a scale set and the controller using Terraform; it was a nightmare. Namespaces stayed stuck in the Terminating state because some underlying resources could not be deleted. We finally had to patch those resources to empty their finalizers metadata and unblock the situation.
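For reference, the unblocking step looked like the following sketch, where the kind, name, and namespace are placeholders for whatever resource stays stuck:

kubectl patch <kind> <name> -n <namespace> --type merge -p '{"metadata":{"finalizers":[]}}'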

Conclusion

Using self-hosted runners on Kubernetes provides greater control and flexibility for running GitHub Actions workflows, even if it comes with a maintenance burden. Leveraging the AKS ecosystem lets you run jobs or cron jobs on top of GitHub and your existing infrastructure without provisioning additional resources.