Enrichment modules

SPRUCE generates its estimates by chaining EnrichmentModules. An EnrichmentModule is the unit of extension in SPRUCE. Each module reads columns from the CUR input row and/or from values set by earlier modules, then writes its results into a shared map. The pipeline materialises one output row per CUR row at the end, avoiding per-module row copies.
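
The chaining described above can be pictured with a small sketch. It is written in Python for brevity and is not SPRUCE's actual Java API: the function names, the usage_amount column, and the numeric factors are all hypothetical and serve only to show modules writing into a shared map that is merged into the output row once.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]            # one CUR input row: column name -> raw value
Shared = Dict[str, float]       # shared map the modules write their results into
Module = Callable[[Row, Shared], None]


def energy_module(row: Row, shared: Shared) -> None:
    # Reads a column from the input row (hypothetical column and factor).
    shared["operational_energy_kwh"] = float(row["usage_amount"]) * 0.5


def intensity_module(row: Row, shared: Shared) -> None:
    # Writes a made-up carbon intensity, for illustration only.
    shared["carbon_intensity"] = 400.0


def emissions_module(row: Row, shared: Shared) -> None:
    # Reads only values set by earlier modules; does nothing if they are missing.
    if "operational_energy_kwh" in shared and "carbon_intensity" in shared:
        shared["operational_emissions_co2eq_g"] = (
            shared["operational_energy_kwh"] * shared["carbon_intensity"]
        )


def enrich(row: Row, modules: List[Module]) -> Dict[str, object]:
    shared: Shared = {}
    for module in modules:        # order matters: later modules see earlier results
        module(row, shared)
    return {**row, **shared}      # the output row is materialised once, at the end
```

Calling enrich({"usage_amount": "4"}, [energy_module, intensity_module, emissions_module]) returns the original column plus the three enriched values. If emissions_module is placed first, it finds nothing to read and writes nothing, which is why module order in the configuration matters.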

For instance, the AverageCarbonIntensity.java module applies average carbon intensity factors to energy estimates based on the region in order to generate operational emissions.

The list of columns generated by the modules can be found in the SpruceColumn class.

The enrichment modules are listed and configured in a configuration file, one per cloud provider. If no configuration is specified, SPRUCE uses the bundled default for the active provider (e.g. default-config-aws.json for AWS). See Configure the modules for instructions on how to modify the enrichment modules.

Cloud Carbon Footprint

The following modules implement the heuristics from the Cloud Carbon Footprint project.

ccf.aws.Storage

Provides an estimate of energy used for storage by applying a flat coefficient per unit of storage (TB-hour), following the approach used by the Cloud Carbon Footprint project. Service-specific replication factors are applied. See methodology for more details.

The HDD and SSD coefficients (in Wh per TB-hour) can be overridden via configuration:

| Key | Default | Description |
|---|---|---|
| hdd_coefficient_tb_h | 0.65 | Energy per TB-hour for HDD storage |
| ssd_coefficient_tb_h | 1.2 | Energy per TB-hour for SSD storage |

Output column: operational_energy_kwh.
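
As a rough illustration of the arithmetic, the sketch below converts a storage volume in TB-hours into kWh using the default coefficients. It is not the module's code: the function name and the replication factor of 2 in the usage note are hypothetical, and the conversion of CUR usage amounts into TB-hours is assumed to have happened already.

```python
HDD_COEFFICIENT_WH_PER_TB_H = 0.65   # default hdd_coefficient_tb_h
SSD_COEFFICIENT_WH_PER_TB_H = 1.2    # default ssd_coefficient_tb_h


def storage_energy_kwh(
    tb_hours: float,
    coefficient_wh_per_tb_h: float,
    replication_factor: float = 1.0,
) -> float:
    """Energy in kWh for a storage volume expressed in TB-hours."""
    watt_hours = tb_hours * replication_factor * coefficient_wh_per_tb_h
    return watt_hours / 1000.0       # Wh -> kWh
```

For example, 1 TB kept on SSD for 720 hours with a hypothetical replication factor of 2 gives 720 × 2 × 1.2 / 1000, about 1.73 kWh.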

ccf.aws.Accelerators

Provides an estimate of energy used by accelerators, following the approach used by the Cloud Carbon Footprint project. See methodology for more details.

Output column: operational_energy_kwh.

Boavizta

The following modules make use of the BoaviztAPI.

boavizta.aws.BoaviztAPI

Provides an estimate of final energy used for computation (EC2, OpenSearch, RDS) as well as the related embodied emissions using the BoaviztAPI.

Output columns: operational_energy_kwh, embodied_emissions_co2eq_g and embodied_adp_sbeq_g.

From https://doc.api.boavizta.org/Explanations/impacts/

Abiotic Depletion Potential (ADP) is an environmental impact indicator. This category corresponds to mineral and resources used and is, in this sense, mainly influenced by the rate of resources extracted. The effect of this consumption on their depletion is estimated according to their availability stock at a global scale. This impact category is divided into two components: a material component and a fossil fuels component (we use a version of ADP which include both). This impact is expressed in grams of antimony equivalent (gSbeq).

Source: sciencedirect

boavizta.aws.BoaviztAPIstatic

Similar to the previous module, but reads the information from a static file generated from the BoaviztAPI rather than from a running instance of it. This makes SPRUCE simpler to use.

Output columns: operational_energy_kwh, embodied_emissions_co2eq_g and embodied_adp_sbeq_g.

EcoLogits

The following modules estimate the energy consumption and embodied emissions of LLM inference using static coefficients derived from the EcoLogits project.

ecologits.BedrockEcoLogits

Provides an estimate of energy consumption and embodied emissions for LLM inference on AWS Bedrock, based on static per-model coefficients from the EcoLogits project. This follows the same pattern as boavizta.aws.BoaviztAPIstatic: a static data file bundled in the JAR is loaded at initialisation time, and the module matches Bedrock CUR rows to per-model coefficients to compute energy usage and embodied emissions.

The module parses the line_item_usage_type field (format: {REGION}-{ModelKey}-{input|output}-tokens[-batch]) to extract both the model key and the token type, then normalises the token count from pricing_unit (handling real-world values such as 1K tokens or 1M tokens). Only output-token rows are scored: the EcoLogits methodology attributes nearly all generation cost to the autoregressive output phase, so input-token rows are skipped.
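
The parsing step can be sketched as follows. This is an illustrative Python version, not the module's Java implementation: the regular expressions, the BedrockUsage type, and the function names are hypothetical. It also assumes that the region prefix contains no hyphen while the model key may.

```python
import re
from typing import NamedTuple, Optional

# Pattern for {REGION}-{ModelKey}-{input|output}-tokens[-batch].
# Assumption: the region prefix has no hyphen; the model key may contain some.
USAGE_TYPE = re.compile(
    r"^(?P<region>[^-]+)-(?P<model>.+)-(?P<kind>input|output)-tokens(?P<batch>-batch)?$"
)
# Pattern for pricing units such as "1K tokens" or "1M tokens".
PRICING_UNIT = re.compile(
    r"^\s*(?P<count>\d+)\s*(?P<suffix>[KM]?)\s*tokens?\s*$", re.IGNORECASE
)
MULTIPLIERS = {"": 1, "K": 1_000, "M": 1_000_000}


class BedrockUsage(NamedTuple):
    region: str
    model_key: str
    token_type: str   # "input" or "output"
    batch: bool


def parse_usage_type(usage_type: str) -> Optional[BedrockUsage]:
    """Split a usage type into its parts, or return None if it does not match."""
    match = USAGE_TYPE.match(usage_type)
    if match is None:
        return None
    return BedrockUsage(
        match["region"], match["model"], match["kind"], match["batch"] is not None
    )


def token_count(usage_amount: float, pricing_unit: str) -> Optional[float]:
    """Convert a usage amount billed per '1K tokens' or '1M tokens' into tokens."""
    match = PRICING_UNIT.match(pricing_unit)
    if match is None:
        return None
    return usage_amount * int(match["count"]) * MULTIPLIERS[match["suffix"].upper()]
```

With a made-up usage type such as USE1-ExampleModel-output-tokens, parse_usage_type yields the model key ExampleModel and the token type output. A caller would then score the row only when token_type is "output", using token_count to turn a usage amount of 2.5 billed per 1K tokens into 2500 tokens.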

Output columns: operational_energy_kwh and embodied_emissions_co2eq_g.

Batch size assumption: EcoLogits hardcodes a batch size of B=64 concurrent requests. The resulting coefficients are a mid-batch estimate: they underestimate energy for low-traffic scenarios and overestimate it for high-throughput batch inference (e.g. Bedrock Batch mode). Making B dynamic would require provider telemetry that is not available in billing data.

ember.AverageCarbonIntensity

Adds average carbon intensity factors derived from Ember’s electricity data, distributed under the Creative Commons Attribution Licence (CC-BY-4.0). Values are keyed directly by cloud provider and region (e.g. aws:us-east-1). For regions in countries with sub-national data (currently the US and India), the carbon intensity is taken from the Ember value for the state hosting the data centre; otherwise the country-level Ember value is used.

The data is loaded from src/main/resources/ember/ember_co2_intensity.csv, which is generated from cloud_regions.json — see the scripts under scripts/ and the dedicated README for how to refresh it.

Output column: carbon_intensity.

Cloud region metadata

SPRUCE ships with src/main/resources/cloud_regions.json, a single JSON file listing the AWS, GCP, and Azure cloud regions together with their location (country, metro area, latitude/longitude), service status, and number of availability zones. It is the canonical source for the region-to-location mapping used by other modules and resource files (e.g. ember.AverageCarbonIntensity).

The file is produced in two steps by scripts under scripts/; see scripts/README.md for the full usage details.

RegionExtraction

Extracts the region information from the input and stores it in a standard location.

Output column: region.

PUE

Applies the 2024 Power Usage Effectiveness (PUE) data published by AWS to rows for which energy usage has been estimated. This is more accurate and up to date than the flat-rate approach in the CCF methodology.

Output column: power_usage_effectiveness.

Water

Estimates water consumption associated with cloud usage, producing three columns:

  • water_cooling_l – the volume of water (in litres) used for data centre cooling. Computed as operational_energy_kwh × power_usage_effectiveness × WUE, where WUE (Water Usage Effectiveness) is the number of litres of water consumed for cooling per kWh of IT energy. The per-region WUE values come from the 2024 data published by AWS.

  • water_electricity_production_l – the volume of water (in litres) consumed during electricity generation to power the data centre. Computed as operational_energy_kwh × power_usage_effectiveness × WCF, where WCF (Water Consumption Factor) represents the litres of water consumed per kWh of electricity generated. The WCF values per electricity grid zone are sourced from the WRI methodology for calculating water use embedded in purchased electricity.

  • water_consumption_stress_area_l – the total water consumption (water_cooling_l + water_electricity_production_l) attributed to regions under high or extremely high water stress (Aqueduct 4.0 baseline water stress category ≥ 3). This field is only populated when the electricity grid zone for the region has a water stress category of 3 (High) or 4 (Extremely High); it is absent otherwise. Water stress categories are derived from the WRI Aqueduct 4.0 dataset. The World Resources Institute’s Aqueduct tool is licensed under Creative Commons. The data has been extracted and mapped to cloud provider region codes.

Output columns: water_cooling_l, water_electricity_production_l, and water_consumption_stress_area_l.
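
The three formulas above can be put together in a short sketch. It is only an illustration in Python: the function name and parameters are hypothetical, and the WUE, WCF, and stress-category inputs stand in for the per-region lookups that the module performs.

```python
from typing import Dict, Optional


def water_columns(
    operational_energy_kwh: float,
    power_usage_effectiveness: float,
    wue_l_per_kwh: float,
    wcf_l_per_kwh: float,
    stress_category: Optional[int],
) -> Dict[str, float]:
    """Compute the water columns for one row from its energy estimate."""
    facility_energy_kwh = operational_energy_kwh * power_usage_effectiveness
    cooling = facility_energy_kwh * wue_l_per_kwh
    electricity = facility_energy_kwh * wcf_l_per_kwh
    columns = {
        "water_cooling_l": cooling,
        "water_electricity_production_l": electricity,
    }
    # Only populated for High (3) or Extremely High (4) water stress.
    if stress_category is not None and stress_category >= 3:
        columns["water_consumption_stress_area_l"] = cooling + electricity
    return columns
```

With made-up inputs of 10 kWh, a PUE of 1.2, a WUE of 0.5 L/kWh, and a WCF of 2.0 L/kWh, this gives roughly 6 L for cooling and 24 L for electricity production. The stress-area column (about 30 L) appears only when the category is 3 or 4.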

aws.Networking

Provides an estimate of energy used for networking in and out of data centres. The module distinguishes between three transfer types with separate coefficients (in kWh/GB):

| Transfer type | Key | Default | Description |
|---|---|---|---|
| Intra-region | intra | 0.001 | Traffic within the same region |
| Inter-region | inter | 0.0015 | Traffic between AWS regions |
| External | extra | 0.059 | Traffic to/from the internet (AWS Inbound / Outbound) |

The coefficients are taken from the Boavizta Cloud Emissions Working Group and can be overridden via the network_coefficients_kwh_gb configuration map.
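
A minimal sketch of how the coefficients translate into energy is shown below. It is illustrative Python rather than the module's code: the function name is hypothetical, and the overrides argument merely plays the role of the network_coefficients_kwh_gb configuration map.

```python
from typing import Dict, Optional

# Default coefficients in kWh per GB, keyed by transfer type.
DEFAULT_COEFFICIENTS_KWH_GB: Dict[str, float] = {
    "intra": 0.001,
    "inter": 0.0015,
    "extra": 0.059,
}


def networking_energy_kwh(
    gigabytes: float,
    transfer_type: str,
    overrides: Optional[Dict[str, float]] = None,
) -> float:
    """Energy in kWh for a data transfer of the given type."""
    coefficients = {**DEFAULT_COEFFICIENTS_KWH_GB, **(overrides or {})}
    if transfer_type not in coefficients:
        raise ValueError(f"unknown transfer type: {transfer_type}")
    return gigabytes * coefficients[transfer_type]
```

For instance, 100 GB of external traffic comes to about 5.9 kWh with the defaults, while the same volume between AWS regions comes to about 0.15 kWh.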

The relevance and usefulness of attributing emissions to networking based on usage is open to debate, as the energy use of networking is largely constant regardless of traffic. The consequences of reducing networking are probably negligible, but since the approach in SPRUCE is attributional, we do the same for networking in order to be consistent.

Output column: operational_energy_kwh.

aws.Serverless

Provides an estimate of energy for the memory and vCPU usage of serverless services like Fargate or EMR. The default coefficients are taken from the Tailpipe methodology and can be overridden via configuration:

| Key | Default | Description |
|---|---|---|
| memory_coefficient_kwh | 0.0000598 | kWh per GB of memory |
| arm_cpu_coefficient_kwh | 0.00191015625 | kWh per vCPU (ARM) |
| x86_cpu_coefficient_kwh | 0.0088121875 | kWh per vCPU (x86) |

Output column: operational_energy_kwh.
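
The sketch below shows how the default coefficients could be combined. It is a hypothetical Python illustration, not the module's implementation. In particular, it assumes that usage reaches the module as GB-hours of memory and vCPU-hours, which this page does not state.

```python
MEMORY_COEFFICIENT_KWH = 0.0000598        # default memory_coefficient_kwh
ARM_CPU_COEFFICIENT_KWH = 0.00191015625   # default arm_cpu_coefficient_kwh
X86_CPU_COEFFICIENT_KWH = 0.0088121875    # default x86_cpu_coefficient_kwh


def serverless_energy_kwh(
    memory_gb_hours: float = 0.0,
    vcpu_hours: float = 0.0,
    arm: bool = False,
) -> float:
    """Energy in kWh for memory and vCPU usage (units are an assumption)."""
    cpu_coefficient = ARM_CPU_COEFFICIENT_KWH if arm else X86_CPU_COEFFICIENT_KWH
    return memory_gb_hours * MEMORY_COEFFICIENT_KWH + vcpu_hours * cpu_coefficient
```

Under these assumptions, one x86 vCPU-hour with 2 GB-hours of memory comes to roughly 0.0089 kWh, and the same usage on ARM to roughly 0.0020 kWh.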

OperationalEmissions

Computes operational emissions from the energy usage, the average carbon intensity factor for the region, and the power_usage_effectiveness estimated by the preceding modules. It also accounts for two additional overheads:

  • Power Supply Efficiency: The power lost between the data centre mains electricity and the server (default 1.04).
  • Power Transmission Losses: The power lost between the power station and the data centre mains electricity (default 1.08).

These two values can be overridden via configuration (powerSupplyEfficiency and powerTransmissionLosses).

operational_emissions_co2eq_g is equal to operational_energy_kwh * carbon_intensity * power_usage_effectiveness * powerSupplyEfficiency * powerTransmissionLosses.

Output column: operational_emissions_co2eq_g.
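
The formula above can be checked with a few lines of Python. This is a sketch under two assumptions: the function name is hypothetical, and carbon_intensity is taken to be in gCO2eq per kWh, which is consistent with an output in grams but is not stated on this page.

```python
def operational_emissions_co2eq_g(
    operational_energy_kwh: float,
    carbon_intensity: float,              # assumed unit: gCO2eq per kWh
    power_usage_effectiveness: float,
    power_supply_efficiency: float = 1.04,     # default powerSupplyEfficiency
    power_transmission_losses: float = 1.08,   # default powerTransmissionLosses
) -> float:
    """Multiply the energy by the carbon intensity and the three overhead factors."""
    return (
        operational_energy_kwh
        * carbon_intensity
        * power_usage_effectiveness
        * power_supply_efficiency
        * power_transmission_losses
    )
```

With made-up inputs of 1 kWh, a carbon intensity of 400, and a PUE of 1.15, the defaults give 1 × 400 × 1.15 × 1.04 × 1.08, about 516.67 g CO2eq.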