How to search organisation-wide cloudtrails with Athena

Blog banner image


AWS Athena is the easiest way to search organisation-wide cloudtrails implemented by AWS Control Tower. In this blog, we walk you through it.



First, the basics?

AWS Control tower automatically adds multi-region cloudtrails into every single account in your organisation, and forwards all of these logs into the dedicated “Log Archive” account. This is fantastic for security, as it is simple to control access to those centralised AWS CloudTrail logs. You can take a feed of these logs and ship them into a product like Datadog or Splunk, but in this case we are sticking with AWS and are going to query them with AWS Athena.

We found that AWS Athena queries in the real-world can be very slow, but this can be vastly improved by partitioning the Athena tables. This way Athena can fetch much less data from S3, and avoid the other content of the buckets like AWS Config files.

Using our SQL queries here will make your searches much faster.

What are CloudTrail logs?

AWS CloudTrail is a log of all management activity in an aws account, such as creating / deleting resources or modifying security configuration. They are a key place to look for abuse, or to identify what changes have been made in AWS.

All those ControlTower-managed CloudTrail logs end up in a bucket inside the “Log Archive” account, and there is a structure to their placement. You can browse to this bucket in the “Log Archive” account and explore the bucket in the UI. Your logs should be setup like this automatically:

s3://aws-controltower-logs-{{ACCOUNT_ID}}-{{REGION}}/{{ORG_ID}}/AWSLogs/{{ORG_ID}}/

Now you will have some specifics to fill in, such as the {{ACCOUNT_ID}} and {{REGION}}. These are specific to your Log Archive account and Control Tower configuration.

Throughout this guide, we will use {{VAR}} formatting to show you where to add replacements. This is important as there is templating inside some of our code blocks and those have different variable formats. Don;t change those!

First we need to setup the Athena table

This big statement is the bit that’s always missing from the aws docs!

We want to tell Athena where to find our data and how to ```partition`` it.

Partitioning the table is the magic that allows Athena to find all our logs, no matter the region or account.

CREATE EXTERNAL TABLE cloudtrail_logs(
    eventVersion STRING,
    userIdentity STRUCT<
        type: STRING,
        principalId: STRING,
        arn: STRING,
        accountId: STRING,
        invokedBy: STRING,
        accessKeyId: STRING,
        userName: STRING,
        sessionContext: STRUCT<
            attributes: STRUCT<
                mfaAuthenticated: STRING,
                creationDate: STRING>,
            sessionIssuer: STRUCT<
                type: STRING,
                principalId: STRING,
                arn: STRING,
                accountId: STRING,
                userName: STRING>,
            ec2RoleDelivery:string,
            webIdFederationData:map<string,string>
        >
    >,
    eventTime STRING,
    eventSource STRING,
    eventName STRING,
    awsRegion STRING,
    sourceIpAddress STRING,
    userAgent STRING,
    errorCode STRING,
    errorMessage STRING,
    requestparameters STRING,
    responseelements STRING,
    additionaleventdata STRING,
    requestId STRING,
    eventId STRING,
    readOnly STRING,
    resources ARRAY<STRUCT<
        arn: STRING,
        accountId: STRING,
        type: STRING>>,
    eventType STRING,
    apiVersion STRING,
    recipientAccountId STRING,
    serviceEventDetails STRING,
    sharedEventID STRING,
    vpcendpointid STRING,
    eventCategory STRING,
    tlsDetails struct<
        tlsVersion:string,
        cipherSuite:string,
        clientProvidedHostHeader:string>
  )
PARTITIONED BY (
   `account_id` string,
   `timestamp` string,
   `aws_region` string)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://aws-controltower-logs-{{ACCOUNT_ID}}-{{REGION}}/{{ORG_ID}}/AWSLogs/{{ORG_ID}}/'
TBLPROPERTIES (
  'projection.enabled'='true', 
  'projection.account_id.type'='enum',
  'projection.account_id.values'='{{ACCOUNT1}},{{ACCOUNT2}},{{ACCOUNT3}},{{ACCOUNT4}},{{ACCOUNT5}}',
  'projection.aws_region.type'='enum',
  'projection.aws_region.values'='us-east-1,us-east-2,us-west-1,us-west-2,eu-central-1', 
  'projection.timestamp.format'='yyyy/MM/dd', 
  'projection.timestamp.interval'='1', 
  'projection.timestamp.interval.unit'='DAYS', 
  'projection.timestamp.range'='2024/01/01,NOW', 
  'projection.timestamp.type'='date', 
  'storage.location.template'='s3://aws-controltower-logs-{{ACCOUNT_ID}}-{{REGION}}/{{ORG_ID}}/AWSLogs/{{ORG_ID}}/${account_id}/CloudTrail/${aws_region}/${timestamp}')

If you templated that correctly, it should create the table!

With the Athena table created we can start to query information from it!

This uses SQL syntax and can be complicated. We’ve included a couple examples here to get you started.

Finding user activity when using IAM Identity center

If you’re using IAM Identity Center (previously AWS SSO) then you will already know that it works on role assumptions. This means you cannot correlate user activity back to an IAM user. Instead, we correlate back to a role assumption with a query like this:

SELECT
  eventTime,
  eventName,
  awsRegion,
  eventSource,
  sourceIPAddress,
  userAgent,
  errorCode,
  errorMessage,
  requestParameters,
  responseElements,
  additionalEventData,
  userIdentity.sessionContext.sessionIssuer.userName AS issuerUserName,
  userIdentity.arn AS assumedRoleArn,
  userIdentity.sessionContext.sessionIssuer.arn AS issuerArn
FROM
  cloudtrail_logs
WHERE
  userIdentity.type = 'AssumedRole'
  AND (userIdentity.sessionContext.sessionIssuer.userName = '{{USERNAME}}'
       OR userIdentity.arn LIKE '%{{USERNAME}}%')
ORDER BY
  eventTime DESC
LIMIT 100;

What about API keys

Of course, API keys are still used and those can be queried no matter if you are using IAM Identity Center or not:

SELECT
  eventTime,
  eventName,
  awsRegion,
  eventSource,
  sourceIPAddress,
  userAgent,
  errorCode,
  errorMessage,
  requestParameters,
  responseElements,
  additionalEventData,
  userIdentity.arn,
  userIdentity.accountId,
  userIdentity.accessKeyId
FROM
  cloudtrail_logs
WHERE
  userIdentity.accessKeyId = '{{KEY}}'
ORDER BY
  eventTime DESC;

We can also identify all changes to security groups

Another great query is looking for Security Group modifications inside your environment.

SELECT
  eventTime,
  eventName,
  awsRegion,
  sourceIPAddress,
  userIdentity.type as userType,
  userIdentity.userName as userName,
  userIdentity.arn as userArn,
  userIdentity.accountId as userAccountId,
  userIdentity.principalId as userPrincipalId,
  json_extract(requestParameters, '$.groupId') as groupId,
  errorCode,
  errorMessage,
  requestParameters,
  responseElements
FROM
  cloudtrail_logs
WHERE
  eventName LIKE 'AuthorizeSecurityGroup%'
  OR eventName LIKE 'RevokeSecurityGroup%'
ORDER BY
  eventTime DESC;

Conclusion

AWS provides a fantastic amount of native functionality, providing a “one-stop-shop” for your cloud environment. If you need help configuring AWS securely and really leveraging its features, reach out. We’d love to help.

For more information, email us at [email protected] or call us on 0161 660 3545

Author

Simon Gurney

- CTO -

Simon is one of the Punk Security Directors and has over 17 years experience working within IT, primarily focused on automation and InfoSec.

read more