Using IAM roles inside AWS Kubernetes

AWS managed Kubernetes (EKS) allows your pods to assume AWS IAM roles automatically so you don’t have to handle pesky credentials.

AWS managed Kubernetes (EKS) has tight integration with the wider AWS ecosystem, allowing pods to utilise Security Groups to gain access to network services like RDS and IAM roles to interact with AWS itself. In this article, we will look at how we can configure our pods to use IAM roles without having to worry about credentials or access tokens.

By removing access tokens and secrets from our applications, we eliminate the risk of those credentials being leaked or compromised. We also remove the need to manually rotate secret values, along with the disruptive downtime we experience when we forget.

This same approach also applies to other AWS compute services, such as Lambda functions, ECS tasks and conventional EC2 instances. AWS makes it trivial to provide AWS API permissions to your services without having to generate, store and distribute credentials.

Amazon has some excellent documentation on this topic, but it can be quite scattered.

To begin, you need to enable the IAM OIDC provider for your cluster, following this guide here.

We now need to create our roles.

We advise having two roles: one for the pod to assume, and another that the application will use to interact with AWS services. With this approach, we decouple the pod from the role it uses to interact with AWS services, which allows the pod to assume different roles if required and lets developers assume the same roles for debugging.

This article is about to take a deep dive into the mechanics and implementation, so click here to skip the techy bit

Creating and associating our role

EKS role assumption works by allowing a role to be assumed by a Kubernetes pod, so we need to create a trust policy for the role rather than granting permissions to the pod. This looks like the policy below.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "OIDC_ARN"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "OIDC_URI:sub": "system:serviceaccount:NAMESPACE:NAME"
                }
            }
        }
    ]
}

This policy can be attached to a role, and that role can be assumed by a pod which has mounted a service account with the right name and in the right namespace.
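As a rough illustration of how the placeholders in that policy fit together, the sketch below fills them in from four values. The function name and the sample values are hypothetical. It assumes OIDC_URI is the cluster's OIDC issuer without the https:// prefix, and that OIDC_ARN follows the usual arn:aws:iam::ACCOUNT_ID:oidc-provider/OIDC_URI form.

```python
import json


def build_pod_trust_policy(account_id, oidc_uri, namespace, service_account):
    """Return a trust policy letting one Kubernetes service account assume a role.

    oidc_uri is assumed to be the cluster's OIDC issuer without "https://".
    """
    oidc_arn = f"arn:aws:iam::{account_id}:oidc-provider/{oidc_uri}"
    subject = f"system:serviceaccount:{namespace}:{service_account}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": oidc_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                # Only tokens issued for this exact service account match.
                "Condition": {"StringEquals": {f"{oidc_uri}:sub": subject}},
            }
        ],
    }


# Placeholder values for illustration only.
policy = build_pod_trust_policy(
    account_id="111122223333",
    oidc_uri="oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLE",
    namespace="my-namespace",
    service_account="my-service-account",
)
print(json.dumps(policy, indent=4))
```

Building the document from a dictionary and serialising it with json.dumps also avoids hand-editing mistakes such as stray commas.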

Behind the scenes, EKS (through a mutating admission webhook) will inject and mount the necessary tokens to allow pods to automatically talk to AWS, but first we need to do the Kubernetes configuration.

The first component is the service account for our pod to use. Service accounts are a core Kubernetes component and are used to grant pods Kubernetes privileges, such as the ability to create other pods or watch ingresses. The service account we create needs to match the name and namespace in our earlier trust policy. It also needs an annotation which provides the ARN of the role you want it to assume.

Here is a template for this service account:

apiVersion: v1
kind: ServiceAccount
metadata:
    name: NAME
    namespace: NAMESPACE
    annotations:
        eks.amazonaws.com/role-arn: ROLE_ARN

Finally, we need to mount this service account to our pod.

Amazon cover this well in their documents here. Essentially, we are adding the following key to the pod spec:

serviceAccountName: my-service-account
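To show where these pieces sit relative to each other, here is a minimal sketch that builds the service account and a pod that references it, emitted as JSON (which kubectl accepts as well as YAML). The function names, pod name, image and role ARN are placeholders invented for the example.

```python
import json


def service_account_manifest(name, namespace, role_arn):
    """Service account annotated with the IAM role its pods should assume."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {"eks.amazonaws.com/role-arn": role_arn},
        },
    }


def pod_manifest(pod_name, namespace, service_account, image):
    """Pod whose spec references the service account by name."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name, "namespace": namespace},
        "spec": {
            "serviceAccountName": service_account,
            "containers": [{"name": "app", "image": image}],
        },
    }


# Placeholder values for illustration only.
manifests = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [
        service_account_manifest(
            "my-service-account",
            "my-namespace",
            "arn:aws:iam::111122223333:role/my-pod-role",
        ),
        pod_manifest("my-app", "my-namespace", "my-service-account", "my-registry/my-app:1.0"),
    ],
}
print(json.dumps(manifests, indent=2))
```

Note that the pod and the service account must share a namespace, and that the name and namespace must match the subject in the role's trust policy.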

Using our new role!

With all this configured and our pod deployed, we can start to use our role from our pod.

EKS has kindly mounted a web identity token inside the pod at the following path: /var/run/secrets/eks.amazonaws.com/serviceaccount/token

This path is configured into the AWS SDK authentication discovery process, so in most cases you will not need to know it exists.

From inside your pod, you should be able to confirm (if the AWS CLI is installed) that the role has been granted to the pod with the command: aws sts get-caller-identity
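If the AWS CLI is not installed in the image, it can still be useful to check which identity the mounted token carries. The sketch below decodes the token's claims without verifying the signature, so it is a debugging aid only. It assumes the token is a standard JWT and that the AWS_WEB_IDENTITY_TOKEN_FILE environment variable, when set, points at it. The function names are hypothetical.

```python
import base64
import json
import os

DEFAULT_TOKEN_PATH = "/var/run/secrets/eks.amazonaws.com/serviceaccount/token"


def decode_jwt_claims(token):
    """Return the claims of a JWT WITHOUT verifying its signature."""
    payload = token.strip().split(".")[1]
    # JWTs drop base64 padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def read_pod_token_claims():
    """Read the mounted token and return its claims (only works inside the pod)."""
    path = os.environ.get("AWS_WEB_IDENTITY_TOKEN_FILE", DEFAULT_TOKEN_PATH)
    with open(path) as handle:
        return decode_jwt_claims(handle.read())
```

Calling read_pod_token_claims() inside the pod should return a dictionary whose sub claim has the system:serviceaccount:NAMESPACE:NAME form used in the trust policy. If it does not match, the role assumption will be refused.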

If you granted that role permissions directly, this may be as far as you need to go. In our recommended configuration, you should instead allow that role to assume another.

To do this, create another role, but this time give it only the permissions it needs to do its job. This may be the ability to create DNS records in Route53 or to deploy certain AWS resources. This role will also need a trust policy, but this time we allow one role to assume another.

That trust policy will look something like this:

{
  "Version": "2012-10-17",
  "Statement": {
    "Effect": "Allow",
    "Principal": {
      "AWS": "OUR_FIRST_ROLES_ARN"
    },
    "Action": "sts:AssumeRole"
  }
}

With this in place, our pod can assume its default role but can also now assume this higher-privileged role when it needs to.
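From application code, hopping from the first role to the second could look like the sketch below. It assumes the boto3 library is available in the image, which the article does not require, and the helper names are hypothetical. The STS client is passed in so that the credential handling can be exercised without AWS access.

```python
def assume_role_credentials(sts_client, role_arn, session_name="pod-session"):
    """Exchange the caller's identity for temporary credentials of role_arn."""
    response = sts_client.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    credentials = response["Credentials"]
    # Key names match the keyword arguments accepted by boto3.client().
    return {
        "aws_access_key_id": credentials["AccessKeyId"],
        "aws_secret_access_key": credentials["SecretAccessKey"],
        "aws_session_token": credentials["SessionToken"],
    }


def route53_client_for(second_role_arn):
    """Build a Route53 client that acts as the second role (runs inside the pod)."""
    import boto3  # third-party; imported lazily so the helper above stays dependency-free

    # With no explicit credentials, boto3 authenticates as the pod's first role.
    sts_client = boto3.client("sts")
    return boto3.client("route53", **assume_role_credentials(sts_client, second_role_arn))
```

The returned credentials are temporary and expire, so a long-running service would need to repeat the call periodically, or rely on a profile-based configuration that refreshes them automatically.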

Switching roles

Another time when this is super useful is when we are running pipelines inside AWS, using a service like Atlantis or a GitLab runner. Tasks in these pipelines can be configured to use an AWS profile, and this profile can be made available to both developers and the pipeline, which helps them to share a common codebase.

We can provision a configuration file into our pod using a ConfigMap, and then tell the AWS CLI and SDKs to use it via an environment variable called AWS_CONFIG_FILE. This config file should be something like the one below:

[default]
web_identity_token_file = /var/run/secrets/eks.amazonaws.com/serviceaccount/token
role_arn = FIRST_ROLE_ARN
[profile DEPLOY_PROFILE_NAME]
role_arn = SECOND_ROLE_ARN
source_profile = default
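A config file like this is a natural candidate for generation. The sketch below renders it with Python's standard configparser module. The function name is hypothetical, and it writes the default profile as [default], which is the section name the AWS CLI documentation uses for the default profile.

```python
import configparser
import io

TOKEN_PATH = "/var/run/secrets/eks.amazonaws.com/serviceaccount/token"


def render_aws_config(first_role_arn, second_role_arn, deploy_profile):
    """Return AWS config file text with a default and a deploy profile."""
    config = configparser.ConfigParser()
    # The default profile turns the mounted token into the first role.
    config["default"] = {
        "web_identity_token_file": TOKEN_PATH,
        "role_arn": first_role_arn,
    }
    # The deploy profile assumes the second role using the default profile.
    config[f"profile {deploy_profile}"] = {
        "role_arn": second_role_arn,
        "source_profile": "default",
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Placeholder ARNs and profile name for illustration only.
print(render_aws_config(
    "arn:aws:iam::111122223333:role/my-pod-role",
    "arn:aws:iam::111122223333:role/my-deploy-role",
    "deploy",
))
```

The returned text can be stored under a key in the ConfigMap and mounted at whatever path AWS_CONFIG_FILE points to.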

With this in place, we can quickly switch between our roles by passing the --profile option to the AWS CLI or by steering AWS authentication with the AWS_PROFILE environment variable.

We can:

use our first role like this: aws sts get-caller-identity
or our second role with the environment variable: AWS_PROFILE=DEPLOY_PROFILE_NAME aws sts get-caller-identity
or switch with the option: aws sts get-caller-identity --profile DEPLOY_PROFILE_NAME
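A pipeline task that shells out to the AWS CLI can select the profile in the same way. The following is a minimal sketch with hypothetical helper names. It assumes the aws binary is on the PATH and that the config file has been mounted at the path you pass in.

```python
import json
import os
import subprocess


def profile_env(profile, config_file, base_env=None):
    """Return a copy of the environment that selects the given AWS profile."""
    env = dict(os.environ if base_env is None else base_env)
    env["AWS_CONFIG_FILE"] = config_file
    env["AWS_PROFILE"] = profile
    return env


def caller_identity(profile, config_file):
    """Run `aws sts get-caller-identity` under the given profile."""
    result = subprocess.run(
        ["aws", "sts", "get-caller-identity", "--output", "json"],
        env=profile_env(profile, config_file),
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI reports an error
    )
    return json.loads(result.stdout)
```

Because the profile is chosen through the environment rather than hard-coded, the same task definition can run unchanged on a developer's machine and inside the pipeline.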

That’s it, we have securely assumed a role without any pesky secrets!

Why?

As you can see, at no point did we need to handle pesky sensitive secret strings to allow our applications to perform actions on AWS. We get the benefits of authentication for our services without the messy problems of handling the authentication secrets ourselves. This is one of the security benefits you can get from moving to a cloud-native workload.

Do these patterns lead to “vendor lock-in”?

Some of our customers are quite rightly worried about making AWS-specific changes to their applications. It can feel wrong to use IAM role assumption, rather than credentials, because it is an AWS-specific feature and results in “vendor lock-in”. If they move their pipeline runners to another home, they will then need credentials to access AWS again and this requires rework. Whilst this is true, it’s a poor argument. It’s possible that those runners will never move, and if they did, they would likely need a lot more work than simply slotting in some credentials!

Next steps

This approach, when outlined in a blog, can look a little more involved and daunting than the less secure (but common) approach of creating a service user and generating an access token. Generating an access token creates its own problems, such as how we store the token and how we deliver it to the application. The unfortunate truth is that corners are often cut here, and what looks like less effort is actually a significant risk.

We recommend building a templated approach to creating these roles using Terraform. Terraform (and other IaC technologies) can make adding a role securely to one of your applications incredibly simple, reliable and repeatable. In this way, you can automate away the toil of creating multiple roles and trust policies. If we look back at this blog, a lot of those roles and policies are obvious candidates for templating and can be reduced to a few key variables. It is actually fairly trivial to get to a point where you can request a pod has access to a powerful role, via an intermediate role, with nothing more than 5 or 6 lines of configuration. As an added benefit, that configuration will be easy to read, audit and assure.

We help companies do this all the time, so their developers get the benefits of a rapid and secure mechanism for deploying applications without having the expensive and time-consuming journey of finding out what works.