Unraveling the Unknown Costs of CloudWatch Metrics

Introduction

Cloud monitoring is essential for maintaining and optimizing AWS infrastructure, with Amazon CloudWatch being a primary tool for tracking metrics, logs, and alarms. However, as your AWS usage scales, so do your CloudWatch costs, and sometimes those costs can be difficult to trace.

A common challenge AWS users face is pinpointing the exact cost drivers for CloudWatch in the Cost & Usage Report (CUR). While the report provides a high-level breakdown, it lacks the detailed attribution needed to identify specific sources of high costs.

This article presents a real-world example of diagnosing unexpected CloudWatch costs using DoiT Cloud Intelligence. After identifying high-cost operations, we will demonstrate how to leverage CloudTrail data events and Amazon Athena to uncover the origins of these charges and gain actionable insights. By the end, you'll have a clear strategy for understanding and managing hidden CloudWatch costs.

Problem Statement

Amazon CloudWatch is an essential monitoring tool for AWS environments, offering detailed insights into metrics and logs. However, its cost structure can sometimes feel opaque, making it challenging to identify the source of unexpected billing spikes.

While the Cost & Usage Report (CUR) provides a granular breakdown of AWS costs, it falls short when it comes to resource-level visibility for specific charges. A notable example is GetMetricData, an API operation used to retrieve CloudWatch metric data. Despite its significant impact on costs, CUR does not provide sufficient detail to determine which services, applications, or users are responsible for these charges.

This lack of transparency makes it challenging for AWS users to optimize costs, prevent budget overruns, and make informed decisions about their monitoring configurations.

Identifying High Costs in CloudWatch

To illustrate this challenge, we used DoiT Cloud Analytics reports, which help visualize and interpret cloud cost data. The cost data can be represented in various diagrams, then filtered and grouped for better insights.

For instance, the following analysis provides a detailed cost breakdown of CloudWatch usage over 28 days, highlighting the consistently high costs associated with GMD-Metrics (GetMetricData) operations.

The cost table below further categorizes CloudWatch expenses by SKU (Stock Keeping Unit), operation type, and resource information. Notably:

  • GMD-Metrics (GetMetricData) is a top cost driver.
  • Resource information is missing, making it difficult to determine the source of these requests.
  • MetricMonitorUsage also contributes to costs, but to a lesser extent.

Since GetMetricData is driving significant and unexplained costs, we need a more detailed investigation using CloudTrail data events and Amazon Athena to trace its origin.
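The grouping described above can also be approximated outside a reporting tool. The following is a minimal Python sketch, not the method used for the report in this article. It assumes CUR line items are available as dictionaries keyed by the Athena-style column names `line_item_usage_type`, `line_item_operation`, `line_item_resource_id`, and `line_item_unblended_cost`. The sample rows, their cost values, the account ID, and the `summarize_cloudwatch_costs` helper are all hypothetical and only illustrate the shape of the data.

```python
from collections import defaultdict

# Hypothetical CUR line items; every value here is made up for illustration.
SAMPLE_ROWS = [
    {"line_item_usage_type": "CW:GMD-Metrics", "line_item_operation": "GetMetricData",
     "line_item_resource_id": "", "line_item_unblended_cost": "12.50"},
    {"line_item_usage_type": "CW:GMD-Metrics", "line_item_operation": "GetMetricData",
     "line_item_resource_id": "", "line_item_unblended_cost": "7.25"},
    {"line_item_usage_type": "CW:MetricMonitorUsage", "line_item_operation": "MetricStorage",
     "line_item_resource_id": "arn:aws:cloudwatch:us-east-1:111122223333:alarm:example-alarm",
     "line_item_unblended_cost": "3.00"},
]


def summarize_cloudwatch_costs(rows):
    """Sum cost per (usage type, operation) and count rows lacking a resource ID."""
    totals = defaultdict(lambda: {"cost": 0.0, "rows": 0, "missing_resource": 0})
    for row in rows:
        key = (row["line_item_usage_type"], row["line_item_operation"])
        entry = totals[key]
        entry["cost"] += float(row["line_item_unblended_cost"])
        entry["rows"] += 1
        if not row["line_item_resource_id"].strip():
            entry["missing_resource"] += 1
    # Most expensive combination first.
    return sorted(totals.items(), key=lambda item: item[1]["cost"], reverse=True)


summary = summarize_cloudwatch_costs(SAMPLE_ROWS)
for (usage_type, operation), stats in summary:
    print(f"{usage_type:<24} {operation:<16} ${stats['cost']:.2f} "
          f"(rows without resource ID: {stats['missing_resource']}/{stats['rows']})")
```

A high cost combined with a resource ID that is always empty is the pattern that signals the CUR alone cannot explain the charge.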

 

Enabling CloudTrail Data Events

By default, AWS CloudTrail logs management events such as IAM changes, security configurations, and resource provisioning. However, data events, which capture service-specific API calls like S3 object-level operations or Lambda function invocations, are not enabled by default.

Since we need to track CloudWatch Metrics events, we must explicitly enable CloudTrail data events. This can be configured in an existing trail or by creating a new one.

Setting Up CloudTrail

1. Choose a CloudTrail trail

  • Modify an existing trail or create a new one.
  • Define an S3 bucket for storing CloudTrail logs.

2. Configure Optional Features

  • KMS Encryption (optional) for added security.
  • Log validation & SNS notifications (optional, for integrity and alerts).
  • CloudWatch Logs storage (not applicable here since we use Athena for analysis).

Defining the Data Event for CloudWatch Metrics

1. Select CloudWatch metric as the Data Event Type.

2. Specify Log Selector:

  • All events (our choice for simplicity).
  • Read-only events or Write-only events.
  • Custom selectors for more control.
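The console steps above can also be expressed programmatically. The sketch below builds an advanced event selector for the `AWS::CloudWatch::Metric` resource type and, optionally, applies it with boto3's `put_event_selectors`. It is an illustration under assumptions rather than the setup used in this article: the helper names, the selector name, and the trail name passed to `apply_selector` are hypothetical, and `apply_selector` needs boto3 and valid AWS credentials.

```python
import json


def cloudwatch_metric_selector(read_only=None):
    """Build an advanced event selector for CloudWatch metric data events.

    read_only=None logs all events; True or False restricts the selector
    to read-only or write-only events respectively.
    """
    field_selectors = [
        {"Field": "eventCategory", "Equals": ["Data"]},
        {"Field": "resources.type", "Equals": ["AWS::CloudWatch::Metric"]},
    ]
    if read_only is not None:
        field_selectors.append(
            {"Field": "readOnly", "Equals": ["true" if read_only else "false"]}
        )
    return {"Name": "CloudWatch metric data events", "FieldSelectors": field_selectors}


def apply_selector(trail_name, selector):
    """Attach the selector to a trail (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("cloudtrail")
    return client.put_event_selectors(
        TrailName=trail_name, AdvancedEventSelectors=[selector]
    )


# Print the selector that corresponds to the "All events" choice.
print(json.dumps(cloudwatch_metric_selector(), indent=2))
```

Note that `put_event_selectors` replaces the selectors already configured on a trail, so when modifying an existing trail, include any selectors you want to keep in the same call.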

Analyzing Log Data via Athena

Creating an Athena Table for CloudTrail Logs

With CloudTrail now logging CloudWatch GetMetricData requests to an S3 bucket, we can use Amazon Athena to analyze them. First, create an Athena table that references the log data stored in that bucket:

  • Access the CloudTrail Console and navigate to Event history in the left-hand menu.
  • Click on Create Athena table, and in the Storage location dropdown, select the S3 bucket where your CloudTrail logs are stored.

Querying GetMetricData Events

Now we can use Amazon Athena to find out who or what is making GetMetricData requests. The SQL query below is only an example run against a small sample dataset; on a real dataset you will likely need to adapt it, for example by restricting the time range.

-- Count GetMetricData calls per calling identity, busiest callers first.
SELECT
    COUNT(*) AS count,
    eventname,
    useridentity.principalId,
    useridentity.arn
FROM cloudtrail_logs_aws_cloudtrail_logs_cw_metrics
WHERE eventname = 'GetMetricData'
GROUP BY eventname, useridentity.principalId, useridentity.arn
ORDER BY count DESC
LIMIT 100;

Interpreting the Results

The query results (example shown below) reveal the sources generating GetMetricData requests.

  • The top row shows 18 requests, making that identity the largest source of requests in this sample and therefore the most likely cost driver.
  • The principalId and arn columns help identify whether the requests originate from a specific AWS service, IAM user/role, or application.
  • If some of these requests are unnecessary, consider reducing polling frequency, optimizing monitoring settings, or restricting access to lower costs.
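To turn the raw counts into something easier to compare, the results can be exported from Athena as CSV and converted into per-identity shares. The following is a minimal sketch under assumptions: the CSV header names (`count`, `eventname`, `principalid`, `arn`) mirror the query's columns, the sample rows and ARNs are invented (only the top count of 18 echoes the example above), and `request_shares` is a hypothetical helper. Because GetMetricData is billed by the number of metrics requested rather than per API call, the share of calls is only a rough proxy for the share of cost.

```python
import csv
import io

# Invented sample export; ARNs, IDs, and the lower counts are placeholders.
SAMPLE_CSV = '''"count","eventname","principalid","arn"
"18","GetMetricData","AROAEXAMPLEID:monitoring-session","arn:aws:sts::111122223333:assumed-role/example-monitoring-role/monitoring-session"
"4","GetMetricData","AIDAEXAMPLEID","arn:aws:iam::111122223333:user/example-user"
"2","GetMetricData","AROAEXAMPLEID2:dashboard","arn:aws:sts::111122223333:assumed-role/example-dashboard-role/dashboard"
'''


def request_shares(csv_text):
    """Return (arn, request count, share of all requests) per result row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Lower-case the header names so the lookup does not depend on their casing.
    rows = [{key.lower(): value for key, value in row.items()} for row in reader]
    total = sum(int(row["count"]) for row in rows)
    if total == 0:
        return []
    return [(row["arn"], int(row["count"]), int(row["count"]) / total) for row in rows]


shares = request_shares(SAMPLE_CSV)
for arn, count, share in shares:
    print(f"{share:6.1%}  {count:>4}  {arn}")
```

An identity that accounts for most of the calls is the natural first place to check polling intervals and dashboard refresh settings.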

 

Conclusion

Hidden CloudWatch costs, particularly those driven by GetMetricData, can be challenging to track using AWS's Cost & Usage Report (CUR). By using CloudTrail data events and Amazon Athena, we gained detailed insights into the exact sources responsible for these requests.

To avoid future unexpected high costs, consider:

  • Optimizing metric queries to reduce frequency.
  • Restricting IAM permissions for GetMetricData.
  • Using AWS Cost Explorer or DoiT Cloud Intelligence for ongoing cost monitoring.

With these strategies, you can gain much better visibility into CloudWatch costs and keep monitoring efficient without unnecessary expenses. If you've encountered similar challenges, try this approach and share your findings!
