Split Impact Allocation Data

SCAD

Split Cost Allocation Data (SCAD) is an AWS billing and cost management feature that gives organizations fine-grained visibility into how cloud resources are shared and consumed across multiple services, accounts, or workloads.

When enabled, shared costs (such as data transfer, EC2 instances, or load balancers) are allocated proportionally among all linked resources or cost allocation tags. This is particularly important in containerized and multi-tenant environments such as Kubernetes on AWS.

In Kubernetes, a single EC2 instance, EBS volume, or Elastic Load Balancer can support multiple pods, namespaces, or even different applications. This makes it difficult to understand:

  • How much a specific team, application, or namespace costs.
  • Which workloads are driving the majority of the cluster’s cloud expenses.
  • How to accurately charge back or show back costs to different business units.

Without split allocation, costs appear aggregated at the resource level rather than the usage level, making accurate chargeback or budgeting almost impossible.

Split cost allocation data introduces new usage records and new cost metric columns for each containerized resource ID (that is, each ECS task and Kubernetes pod) in the AWS CUR. For more information, see Split line item details.
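
As a concrete illustration, the sketch below shows how a parent resource line relates to its split line items, written as Python literals. The `split_line_item_*` names follow the CUR split line item columns, but the identifiers and values are invented for the example:

```python
# Invented example data: one EC2 instance line plus one split line per pod
# that ran on it during the hour. Column names follow the CUR split line
# item naming; values are illustrative only.
parent = {
    "line_item_resource_id": "i-0abc1234567890def",  # the shared EC2 instance
    "line_item_unblended_cost": 0.20,                # full instance cost for the hour
}
splits = [
    {
        "line_item_resource_id": "pod/team-a/frontend-7f9c",
        "split_line_item_parent_resource_id": "i-0abc1234567890def",
        "split_line_item_split_usage_ratio": 0.75,   # pod consumed 75% of the instance
        "split_line_item_split_cost": 0.15,          # 75% of 0.20
    },
    {
        "line_item_resource_id": "pod/team-b/batch-1a2b",
        "split_line_item_parent_resource_id": "i-0abc1234567890def",
        "split_line_item_split_usage_ratio": 0.25,
        "split_line_item_split_cost": 0.05,          # 25% of 0.20
    },
]
```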

How to Enable Split Cost Allocation Data

When defining the CUR data export, enable the options Split cost allocation data and Include resource IDs. See Generate Cost and Usage Reports for more details.
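
If the export is created programmatically rather than through the console, the same two options map to table configuration keys in the export definition. A minimal sketch of the relevant `DataQuery.TableConfigurations` fragment, assuming the AWS Data Exports table configurations for the `COST_AND_USAGE_REPORT` table:

```json
{
  "TableConfigurations": {
    "COST_AND_USAGE_REPORT": {
      "INCLUDE_RESOURCES": "TRUE",
      "INCLUDE_SPLIT_COST_ALLOCATION_DATA": "TRUE"
    }
  }
}
```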

Activating SCAD substantially increases the size of the CUR reports, as they then contain line items for every split in addition to the AWS resources.

From Cost to Impact Allocation

A similar problem arises with the environmental impacts estimated by SPRUCE. The split line items (e.g. Kubernetes pods) have a cost associated with them, but the emissions and other environmental impacts are still attached only to the AWS resources.

SPRUCE provides a separate Apache Spark job which:

  • groups all the line items by hourly time slot and resource ID
  • sums the impacts of the resources (EC2 instances, volumes, network)
  • sums the usage ratios of the splits sharing each resource
  • allocates the impacts to the splits in proportion to their usage ratios

The impact columns for the splits carry the prefix `split_`, e.g. `split_operational_energy_kwh`, so that the impacts are not counted twice: once with the resources and then again with the splits.
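
The following is a minimal PySpark sketch of that allocation logic. The actual SPRUCE SplitJob is implemented in Java; the `split_line_item_*` names follow the CUR columns, but treat the other column names and paths here as assumptions made for the example:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("split-impact-allocation").getOrCreate()

# CUR previously enriched by SPRUCE with impact columns such as operational_energy_kwh.
cur = spark.read.parquet("./enriched_curs")

ratio = F.col("split_line_item_split_usage_ratio")

# Resource-level items (EC2 instances, volumes, network) carry the impacts;
# split items (e.g. Kubernetes pods) carry a usage ratio but no impacts yet.
resources = cur.where(ratio.isNull())
splits = cur.where(ratio.isNotNull())

# Group by hourly time slot and resource ID, summing the resource impacts
# (only one impact column is shown; the others are handled identically).
totals = resources.groupBy("line_item_usage_start_date", "line_item_resource_id").agg(
    F.sum("operational_energy_kwh").alias("resource_energy_kwh")
)

# Sum the usage ratios of the splits sharing each resource.
ratio_sums = splits.groupBy(
    "line_item_usage_start_date", "split_line_item_parent_resource_id"
).agg(F.sum(ratio).alias("ratio_sum"))

# Allocate each resource's impact to its splits in proportion to their
# usage ratio, writing the result into a split_-prefixed column.
allocated = (
    splits.alias("s")
    .join(
        totals.alias("t"),
        (F.col("s.line_item_usage_start_date") == F.col("t.line_item_usage_start_date"))
        & (F.col("s.split_line_item_parent_resource_id") == F.col("t.line_item_resource_id")),
    )
    .join(
        ratio_sums.alias("r"),
        (F.col("s.line_item_usage_start_date") == F.col("r.line_item_usage_start_date"))
        & (F.col("s.split_line_item_parent_resource_id")
           == F.col("r.split_line_item_parent_resource_id")),
    )
    .withColumn(
        "split_operational_energy_kwh",
        F.col("t.resource_energy_kwh") * ratio / F.col("r.ratio_sum"),
    )
)

allocated.write.parquet("./enriched_curs_with_splits")
```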

HOWTO

The call is similar to how you run SPRUCE:

```
spark-submit --class com.digitalpebble.spruce.SplitJob --driver-memory 8g \
  ./target/spruce-*.jar -i ./enriched_curs -o ./enriched_curs_with_splits
```

The option `-c` specifies the impact columns to attribute to the splits. By default its value is "operational_energy_kwh, operational_emissions_co2eq_g, embodied_emissions_co2eq_g".

This can be changed to cater for columns generated by services other than SPRUCE, including commercial ones such as GreenPixie or TailPipe.
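
For example, to attribute a column produced by another tool in addition to the SPRUCE emissions column, the list can be overridden. This sketch assumes the same comma-separated format as the default value, and `water_usage_liters` is a hypothetical third-party column name:

```
spark-submit --class com.digitalpebble.spruce.SplitJob --driver-memory 8g \
  ./target/spruce-*.jar -i ./enriched_curs -o ./enriched_curs_with_splits \
  -c "operational_emissions_co2eq_g, water_usage_liters"
```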