
Cost anomalies

Overview

DoiT Cloud cost anomaly detection offers end-to-end monitoring of spikes in your Google Cloud, Amazon Web Services, and Microsoft Azure costs across all your projects and services.

The detection service uses machine learning algorithms to monitor billing data and analyze spending trends in your cloud environment. It identifies billing patterns across DoiT customers, forecasts your cloud spending, and continually refines its models to improve accuracy.

Billing records that don't align with your anticipated spending behavior are identified as potential anomalies. You can also get insights into which resources are causing the anomalies and take corrective actions if necessary.

Before you begin

The data analysis begins as soon as you sign up. However, for anomaly detection to work properly, we need at least seven full days of reference data in a specific project.

If anomaly detection is critical to your operation, we recommend that you wait out this seven-day period before making significant changes to your cloud spending.

Required Permissions
  • Attributions Manager, Anomalies Viewer, Cloud Analytics

Access anomalies

To access detected cost anomalies, select Governance from the top navigation menu, and then select Cost anomalies. The DoiT Platform stores all the historical cost anomalies.

The cost anomalies page.

For each anomaly, you'll find the following information:

  • Start Time: The start time of the hourly usage window in which the aggregated cost exceeds the predefined threshold and is considered a potential anomaly. The time value comes from the cloud provider's billing data: for AWS it is lineItem/UsageStartDate (UTC); for Google Cloud it is usage_start_time (PT); for Azure it is the usageStart property (UTC).

  • Service: See Resource metadata: Service.

  • Project/Account: See Hierarchy groups: Project/Account ID.

  • Billing Account ID: For Google Cloud it is the Cloud Billing account ID that the usage is associated with. For AWS it is your DoiT customer ID (if you're on a Dedicated payer account) or your CloudHealth account ID (if you're on a Consolidated billing account). For Azure it is the unique identifier of your Azure subscription; you can also find the Subscription ID on the DoiT Console's Assets page.

  • Platform: The cloud provider: Amazon Web Services, Google Cloud, or Microsoft Azure.

  • Time Frame: Whether the anomaly is triggered based on the Hourly or Daily time series of the usage and cost data.

  • Attribution: The group of resources being monitored. By default, the anomaly detection service monitors three preset attributions: All AWS Resources, All GCP Resources, and All Azure Resources. You can also Monitor cost anomalies on other specific subsets of your overall cloud spend.

  • Anomaly: A thumbnail image of the anomaly chart.

  • Severity: The severity level of the anomaly. There are three severity levels: Information, Warning, and Critical. They're defined by DoiT in accordance with the extent to which the actual cost deviates from the established pattern.

  • Cost of anomaly: The difference between the actual cost and the maximum cost in the normal range.

  • Details: You can select the View button in this column to view the details of the specific anomaly.
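The Cost of anomaly field described above can be illustrated with a small calculation. This is a minimal sketch, not DoiT's implementation: the function name, the parameter names, and the clamping to zero for in-range costs are assumptions made for illustration.

```python
def cost_of_anomaly(actual_cost: float, normal_range_max: float) -> float:
    """Return how far the actual cost exceeds the top of the normal range.

    Both arguments are in the same currency. Clamping to zero for costs
    inside the normal range is an assumption of this sketch.
    """
    return max(0.0, actual_cost - normal_range_max)


# Hypothetical figures: the service cost US$250 while the normal range topped out at US$180.
print(cost_of_anomaly(250.0, 180.0))  # 70.0
```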

How it works

The anomaly detection system uses a rule-based anomaly classification model to evaluate the cost and usage data per service. To be classified as an anomaly, the spend of a service must meet all the following criteria:

  1. The hourly spend of the service at the usage time considered is at least US$5.

  2. The daily spend of the service is at least US$90.

  3. The daily spend exceeds monthly seasonality.

  4. The daily spend exceeds the upper bound of the system's acceptable range.

    The anomaly detection system uses a model fitted on data from the preceding period to forecast expected spend. The acceptable range is determined by a DoiT-specific confidence interval, which is the range within which a given percentage of possible values is expected to fall. For example, a 90% confidence interval is the range expected to contain 90% of possible values.

    On anomaly charts, the acceptable range is depicted as a shaded area. In the example below, two points exceed the upper bound of the shaded area. Because they also meet the other three criteria, both are classified as anomalies.

    The cost anomalies page
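The four criteria above can be read as a single conjunction. The following is a minimal sketch of that logic, not DoiT's actual model: the ServiceSpend type and its field names are hypothetical, and the seasonality level and upper bound are treated as precomputed numbers because the source does not describe how they are derived.

```python
from dataclasses import dataclass

MIN_HOURLY_SPEND_USD = 5.0   # criterion 1
MIN_DAILY_SPEND_USD = 90.0   # criterion 2


@dataclass
class ServiceSpend:
    """Spend figures for one service at one usage time (hypothetical shape)."""
    hourly_spend: float            # spend in the hour being evaluated
    daily_spend: float             # spend for the day being evaluated
    monthly_seasonal_level: float  # assumed: spend explained by monthly seasonality
    acceptable_upper_bound: float  # assumed: top of the forecast's acceptable range


def is_anomaly(spend: ServiceSpend) -> bool:
    """Return True only if all four documented criteria are met."""
    return (
        spend.hourly_spend >= MIN_HOURLY_SPEND_USD
        and spend.daily_spend >= MIN_DAILY_SPEND_USD
        and spend.daily_spend > spend.monthly_seasonal_level
        and spend.daily_spend > spend.acceptable_upper_bound
    )


spike = ServiceSpend(hourly_spend=12.0, daily_spend=240.0,
                     monthly_seasonal_level=150.0, acceptable_upper_bound=180.0)
small = ServiceSpend(hourly_spend=2.0, daily_spend=40.0,
                     monthly_seasonal_level=10.0, acceptable_upper_bound=15.0)
print(is_anomaly(spike))  # True
print(is_anomaly(small))  # False: below the US$5 hourly and US$90 daily minimums
```

Note that the second example exceeds its upper bound by a wide margin yet is not an anomaly, because the absolute spend minimums are not met.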

FAQ

What is the scope evaluated by the anomaly detection system?

Data samples evaluated by the system are partitioned as follows:

  • per billing account
  • per project/account
  • per service
  • per attribution (if applicable)

The anomaly detection system evaluates anomalies per service per project/account across regions. It doesn't evaluate multiple services in a project/account as a whole or multiple projects/accounts per service.
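The partitioning described above can be sketched as a grouping step that sums cost across regions for each billing account, project/account, and service. This is an illustrative sketch only: the record layout and field names are assumptions, and the optional attribution dimension is omitted for brevity.

```python
from collections import defaultdict


def aggregate_by_scope(records):
    """Sum cost per (billing account, project/account, service), across regions.

    Each record is a dict with hypothetical keys: billing_account, project,
    service, region, and cost. The region is deliberately left out of the key.
    """
    totals = defaultdict(float)
    for record in records:
        key = (record["billing_account"], record["project"], record["service"])
        totals[key] += record["cost"]
    return dict(totals)


records = [
    {"billing_account": "ba-1", "project": "proj-a", "service": "Compute", "region": "us-east1", "cost": 10.0},
    {"billing_account": "ba-1", "project": "proj-a", "service": "Compute", "region": "europe-west1", "cost": 5.0},
    {"billing_account": "ba-1", "project": "proj-a", "service": "Storage", "region": "us-east1", "cost": 3.0},
]
totals = aggregate_by_scope(records)
print(totals[("ba-1", "proj-a", "Compute")])  # 15.0
```

Each resulting key is evaluated on its own, which is why a spike spread across several services or projects may not surface as a single anomaly.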

What is the latency of cost anomaly detection?

In most cases, an anomaly is reported within 12 hours after the aggregated cost exceeds the predefined threshold.

The anomaly detection engine checks usage and cost data hourly. The latency mainly relates to the varying intervals at which cloud providers report usage and cost data.

See also AWS cost data latency in DoiT Console and Google Cloud's frequency of data loads.

Why was a spike in my costs not reported as an anomaly?

The anomaly detection system evaluates costs per service; it doesn't evaluate the combined costs of multiple services.

If a spike in your cloud costs was not detected as an anomaly, it's important to first assess whether the spike was caused by more than one service.

In addition, the spend of a service must meet a specific set of criteria to qualify as an anomaly.

How does anomaly detection differ from alerts?

Anomaly detection differs from cost alerts in the following aspects:

  • Scope: Anomaly detection always monitors individual services, while the scope of cost alerts is defined by attribution (though it's possible to break down the evaluation by service).

  • Condition: The condition to trigger an alert is a single threshold, for example, a 5% increase of weekly cost. In contrast, to be classified as an anomaly, the cost must meet multiple criteria.

  • Objectivity: An alert reacts to an objective threshold, while anomaly detection also considers the anticipated spending behavior established by a fitted time series model.

In general, alerts are more "sensitive" than anomaly detection.
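To make the contrast concrete, a single-threshold alert condition such as the 5% weekly increase mentioned above can be sketched as one comparison. This is a hypothetical illustration, not the DoiT alerts implementation; the function name, parameters, and the handling of a zero baseline are assumptions.

```python
def weekly_increase_alert(previous_week_cost: float,
                          current_week_cost: float,
                          threshold_pct: float = 5.0) -> bool:
    """Return True if weekly cost rose by at least threshold_pct percent."""
    if previous_week_cost <= 0:
        # Assumption of this sketch: no baseline means no percentage to compare.
        return False
    increase_pct = (current_week_cost - previous_week_cost) / previous_week_cost * 100
    return increase_pct >= threshold_pct


print(weekly_increase_alert(1000.0, 1060.0))  # True: a 6% increase
print(weekly_increase_alert(1000.0, 1020.0))  # False: a 2% increase
```

A condition like this fires on any sufficiently large change, whereas an anomaly additionally requires the spend to clear minimum amounts and fall outside the forecast range.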

How does a cost anomaly alert differ from its report?

Data values

The chart included in a cost anomaly alert provides a snapshot of the billing data as of the time of detection.

The corresponding report (accessed via the Open in Reports button) contains the most up-to-date data and may therefore differ slightly at the latest time steps.

Data availability

The anomaly detection system uses the freshest billing data available in order to expedite alerts.

The report uses more detailed tables that require additional processing, which delays data availability. As a result, the billing data that triggered an alert may be temporarily unavailable in the report when the alert is sent.

Interactive demo

Try out our interactive demo for a hands-on walk-through experience.