Enrichment modules

SPRUCE generates its estimates by chaining EnrichmentModules. An EnrichmentModule is the unit of extension in SPRUCE. Each module reads columns from the CUR input row and/or from values set by earlier modules, then writes its results into a shared map. The pipeline materialises one output row per CUR row at the end, avoiding per-module row copies.
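The chaining described above can be pictured with a minimal sketch. The `SketchModule` interface, the `ModuleChainSketch` class and the `400.0` intensity value are all hypothetical and serve only as illustration; SPRUCE's actual `EnrichmentModule` interface may look different. The column name `product_region_code` is likewise used only as an example input column.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for SPRUCE's EnrichmentModule; the real interface may differ.
interface SketchModule {
    // Reads from the input row and/or values written by earlier modules,
    // then writes its own results into the shared map.
    void enrich(Map<String, Object> inputRow, Map<String, Object> shared);
}

class ModuleChainSketch {

    // Runs the modules in order and materialises a single output row at the end.
    static Map<String, Object> run(List<SketchModule> modules, Map<String, Object> inputRow) {
        Map<String, Object> shared = new LinkedHashMap<>();
        for (SketchModule module : modules) {
            module.enrich(inputRow, shared);
        }
        // One copy per input row, rather than one copy per module.
        Map<String, Object> output = new LinkedHashMap<>(inputRow);
        output.putAll(shared);
        return output;
    }

    public static void main(String[] args) {
        // First module: copies the region from an (illustrative) input column.
        SketchModule region = (row, shared) ->
                shared.put("region", row.get("product_region_code"));
        // Second module: depends on the value written by the first one.
        SketchModule intensity = (row, shared) -> {
            if ("us-east-1".equals(shared.get("region"))) {
                shared.put("carbon_intensity", 400.0); // made-up value
            }
        };
        Map<String, Object> row = Map.of("product_region_code", "us-east-1");
        System.out.println(run(List.of(region, intensity), row));
    }
}
```

The order of the list matters in this sketch: the second module only finds a region because the first one has already written it into the shared map.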

For instance, the AverageCarbonIntensity.java module applies average carbon intensity factors to energy estimates based on the region in order to generate operational emissions.

The list of columns generated by the modules can be found in the SpruceColumn class.

The enrichment modules are listed and configured in a configuration file, one per cloud provider. If no configuration is specified, SPRUCE uses the bundled default for the active provider (e.g. default-config-aws.json for AWS). See Configure the modules for instructions on how to modify the enrichment modules.

Cloud Carbon Footprint

The following modules implement the heuristics from the Cloud Carbon Footprint project.

ccf.aws.Storage

Provides an estimate of energy used for storage by applying a flat coefficient per TB-hour of storage, following the approach used by the Cloud Carbon Footprint project. Service-specific replication factors are applied. See methodology for more details.

The HDD and SSD coefficients (in Wh per TB-hour) can be overridden via configuration:

Key | Default | Description
hdd_coefficient_tb_h | 0.65 | Energy per TB-hour for HDD storage
ssd_coefficient_tb_h | 1.2 | Energy per TB-hour for SSD storage

Output column: operational_energy_kwh.
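As a rough illustration of how a flat per-TB-hour coefficient turns into kWh, here is a minimal sketch. It assumes the replication factor is applied as a simple multiplier, and the replication factor of 3 is a made-up value; the class and method names are hypothetical and not part of SPRUCE.

```java
class StorageEnergySketch {

    // Default coefficients from the table above, in Wh per TB-hour.
    static final double HDD_WH_PER_TB_HOUR = 0.65;
    static final double SSD_WH_PER_TB_HOUR = 1.2;

    // Energy in kWh for a volume of the given size kept for the given duration.
    // Assumption: the replication factor is a plain multiplier.
    static double energyKwh(double terabytes, double hours,
                            double whPerTbHour, double replicationFactor) {
        double wattHours = terabytes * hours * whPerTbHour * replicationFactor;
        return wattHours / 1000.0; // Wh -> kWh
    }

    public static void main(String[] args) {
        // 1 TB of SSD storage for 720 hours with an illustrative replication factor of 3.
        double kwh = energyKwh(1.0, 720.0, SSD_WH_PER_TB_HOUR, 3.0);
        System.out.println("operational_energy_kwh = " + kwh);
    }
}
```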

ccf.azure.Storage

Provides an estimate of energy used for Azure storage by applying the same Cloud Carbon Footprint storage coefficients used by ccf.aws.Storage. Service-specific replication factors are applied. Managed disks are estimated from their provisioned capacity.

The HDD and SSD coefficients (in Wh per TB-hour) can be overridden via configuration using the same keys as ccf.aws.Storage:

Key | Default | Description
hdd_coefficient_tb_h | 0.65 | Energy per TB-hour for HDD storage
ssd_coefficient_tb_h | 1.2 | Energy per TB-hour for SSD storage

Output column: operational_energy_kwh.

ccf.aws.Accelerators

Provides an estimate of energy used by accelerators, following the approach used by the Cloud Carbon Footprint project. See methodology for more details.

Output column: operational_energy_kwh.

Boavizta

The following modules make use of the BoaviztAPI.

boavizta.aws.BoaviztAPI

Provides an estimate of final energy used for computation (EC2, OpenSearch, RDS) as well as the related embodied emissions using the BoaviztAPI.

Output columns: operational_energy_kwh, embodied_emissions_co2eq_g and embodied_adp_sbeq_g.

From https://doc.api.boavizta.org/Explanations/impacts/

Abiotic Depletion Potential (ADP) is an environmental impact indicator. This category corresponds to mineral and resources used and is, in this sense, mainly influenced by the rate of resources extracted. The effect of this consumption on their depletion is estimated according to their availability stock at a global scale. This impact category is divided into two components: a material component and a fossil fuels component (we use a version of ADP which include both). This impact is expressed in grams of antimony equivalent (gSbeq).

Source: sciencedirect

boavizta.aws.BoaviztAPIstatic

Similar to the previous module, but reads the information from a static file generated from the BoaviztAPI rather than from a running instance of it. This makes SPRUCE simpler to use.

Output columns: operational_energy_kwh, embodied_emissions_co2eq_g and embodied_adp_sbeq_g.

boavizta.azure.BoaviztAPI

Similar to the AWS equivalent but for Azure.

Output columns: operational_energy_kwh, embodied_emissions_co2eq_g and embodied_adp_sbeq_g.

boavizta.azure.BoaviztAPIstatic

Similar to the AWS equivalent but for Azure.

Output columns: operational_energy_kwh, embodied_emissions_co2eq_g and embodied_adp_sbeq_g.

EcoLogits

The following modules estimate the energy consumption and embodied emissions of LLM inference using static coefficients derived from the EcoLogits project.

ecologits.BedrockEcoLogits

Provides an estimate of energy consumption and embodied emissions for LLM inference on AWS Bedrock, based on static per-model coefficients from the EcoLogits project. This follows the same pattern as boavizta.aws.BoaviztAPIstatic: a static data file bundled in the JAR is loaded at initialisation time, and the module matches Bedrock CUR rows to per-model coefficients to compute energy usage and embodied emissions.

The module parses the line_item_usage_type field (format: {REGION}-{ModelKey}-{input|output}-tokens[-batch]) to extract both the model key and the token type, then normalises the token count from pricing_unit (handling real-world values such as 1K tokens or 1M tokens). Only output-token rows are scored: the EcoLogits methodology attributes nearly all of the generation cost to the autoregressive output phase, so input-token rows are skipped.
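The parsing step can be sketched as follows. This is not the module's actual code: the class name, the regular expressions and the example usage types (`USE1-Claude3Sonnet-output-tokens` and similar) are assumptions made for illustration, and the sketch assumes that the region prefix contains no hyphen.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BedrockUsageTypeSketch {

    // {REGION}-{ModelKey}-{input|output}-tokens[-batch]
    // Assumption: the region prefix itself contains no hyphen.
    private static final Pattern USAGE_TYPE =
            Pattern.compile("^([^-]+)-(.+)-(input|output)-tokens(-batch)?$");

    // Matches pricing units such as "1K tokens" or "1M tokens".
    private static final Pattern PRICING_UNIT =
            Pattern.compile("^(\\d+)\\s*([KM])?\\s*tokens?$", Pattern.CASE_INSENSITIVE);

    static final class Parsed {
        final String region;
        final String modelKey;
        final String tokenType;
        final boolean batch;

        Parsed(String region, String modelKey, String tokenType, boolean batch) {
            this.region = region;
            this.modelKey = modelKey;
            this.tokenType = tokenType;
            this.batch = batch;
        }
    }

    // Returns empty when the usage type does not follow the expected format.
    static Optional<Parsed> parse(String usageType) {
        Matcher m = USAGE_TYPE.matcher(usageType);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new Parsed(m.group(1), m.group(2), m.group(3), m.group(4) != null));
    }

    // Number of tokens represented by one billed unit.
    static long tokensPerUnit(String pricingUnit) {
        Matcher m = PRICING_UNIT.matcher(pricingUnit.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognised pricing unit: " + pricingUnit);
        }
        long base = Long.parseLong(m.group(1));
        String suffix = m.group(2);
        if (suffix == null) {
            return base;
        }
        return suffix.equalsIgnoreCase("K") ? base * 1_000L : base * 1_000_000L;
    }

    // Only output-token rows are scored; input-token rows are skipped.
    static boolean isScored(Parsed parsed) {
        return "output".equals(parsed.tokenType);
    }

    public static void main(String[] args) {
        Parsed p = parse("USE1-Claude3Sonnet-output-tokens").orElseThrow();
        long tokens = 250 * tokensPerUnit("1K tokens"); // usage amount of 250 billed units
        System.out.println(p.modelKey + " " + p.tokenType + " scored=" + isScored(p)
                + " tokens=" + tokens);
    }
}
```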

Output columns: operational_energy_kwh and embodied_emissions_co2eq_g.

Batch size assumption: EcoLogits hardcodes a batch size of B=64 concurrent requests. The resulting coefficients are a mid-batch estimate: they underestimate energy for low-traffic scenarios and overestimate it for high-throughput batch inference (e.g. Bedrock Batch mode). Making B dynamic requires provider telemetry not available in billing data.

ember.AverageCarbonIntensity

Adds average carbon intensity factors derived from Ember's electricity data, distributed under the Creative Commons Attribution Licence (CC-BY-4.0). Values are keyed directly by cloud provider and region (e.g. aws:us-east-1). For regions in countries with sub-national data (currently the US and India), the carbon intensity is taken from the Ember value for the state hosting the data centre; otherwise the country-level Ember value is used.

The data is loaded from src/main/resources/ember/ember_co2_intensity.csv, which is generated from cloud_regions.json; see the scripts under scripts/ and the dedicated README for how to refresh it.

Output column: carbon_intensity.

Cloud region metadata

SPRUCE ships with src/main/resources/cloud_regions.json, a single JSON file listing the AWS, GCP, and Azure cloud regions together with their location (country, metro area, latitude/longitude), service status, and number of availability zones. It is the canonical source for the region-to-location mapping used by other modules and resource files (e.g. ember.AverageCarbonIntensity).

The file is produced in two steps by scripts under scripts/; see scripts/README.md for the full usage details.

RegionExtraction

Extracts the region information from the input and stores it in a standard location.

Output column: region.

AWS Module: com.digitalpebble.spruce.modules.aws.RegionExtraction

Azure Module: com.digitalpebble.spruce.modules.azure.RegionExtraction

PWUE

Loads and stores both Power Usage Effectiveness (PUE) and Water Usage Effectiveness (WUE) factors from a single CSV resource file. This module replaces the previous separate PUE module and centralizes the loading of these efficiency factors.

The module uses the 2024 data published by AWS for Power Usage Effectiveness and the corresponding WUE values. For Azure, the source is https://datacenters.microsoft.com/sustainability/efficiency/.

The PWUE module supports both AWS and Azure providers and loads the appropriate resource file based on the provider:

  • AWS: aws-pue-wue.csv
  • Azure: azure-pue-wue.csv

The lookup logic follows this priority:

  1. Exact region match (e.g., "us-east-1")
  2. Regex pattern match (e.g., "us-.+")
  3. Default configured value (fallback to 1.15 for PUE, null for WUE)
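The three-step lookup priority can be sketched for PUE as shown below. The class and its methods are hypothetical, and the per-region values 1.10 and 1.20 are made up for the example; only the 1.15 fallback comes from the description above.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class PueLookupSketch {

    // Fallback used when neither an exact nor a pattern entry matches.
    static final double DEFAULT_PUE = 1.15;

    private final Map<String, Double> exactEntries = new HashMap<>();
    // LinkedHashMap keeps the patterns in insertion order, so the first match wins.
    private final Map<Pattern, Double> patternEntries = new LinkedHashMap<>();

    PueLookupSketch addExact(String region, double pue) {
        exactEntries.put(region, pue);
        return this;
    }

    PueLookupSketch addPattern(String regex, double pue) {
        patternEntries.put(Pattern.compile(regex), pue);
        return this;
    }

    double lookup(String region) {
        // 1. Exact region match.
        Double exact = exactEntries.get(region);
        if (exact != null) {
            return exact;
        }
        // 2. Regex pattern match against the whole region name.
        for (Map.Entry<Pattern, Double> entry : patternEntries.entrySet()) {
            if (entry.getKey().matcher(region).matches()) {
                return entry.getValue();
            }
        }
        // 3. Default value.
        return DEFAULT_PUE;
    }

    public static void main(String[] args) {
        PueLookupSketch pue = new PueLookupSketch()
                .addExact("us-east-1", 1.10)   // made-up value
                .addPattern("us-.+", 1.20);    // made-up value
        System.out.println(pue.lookup("us-east-1"));
        System.out.println(pue.lookup("us-west-2"));
        System.out.println(pue.lookup("eu-west-1"));
    }
}
```

A WUE lookup would follow the same order, except that the final step would yield no value rather than a numeric default.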

Output columns: power_usage_effectiveness and water_usage_effectiveness.

Water

Estimates water consumption associated with cloud usage, producing three columns:

  • water_cooling_l: the volume of water (in litres) used for data centre cooling. Computed as operational_energy_kwh × power_usage_effectiveness × WUE, where WUE (Water Usage Effectiveness) is the ratio of litres of water consumed for cooling per kWh of IT energy. The per-region WUE values are loaded by the PWUE module from the 2024 data published by AWS. The source for Azure is https://datacenters.microsoft.com/sustainability/efficiency/.

  • water_electricity_production_l: the volume of water (in litres) consumed during electricity generation to power the data centre. Computed as operational_energy_kwh × power_usage_effectiveness × WCF, where WCF (Water Consumption Factor) represents the litres of water consumed per kWh of electricity generated. The WCF values per electricity grid zone are sourced from the WRI methodology for calculating water use embedded in purchased electricity.

  • water_consumption_stress_area_l: the total water consumption (water_cooling_l + water_electricity_production_l) attributed to regions under high or extremely high water stress (Aqueduct 4.0 baseline water stress category ≥ 3). This field is only populated when the electricity grid zone for the region has a water stress category of 3 (High) or 4 (Extremely High); it is absent otherwise. Water stress categories are derived from the WRI Aqueduct 4.0 dataset. The World Resources Institute's Aqueduct tool is licensed through Creative Commons. The data has been extracted and mapped to cloud provider region codes.

Output columns: water_cooling_l, water_electricity_production_l, and water_consumption_stress_area_l.
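The three water columns can be illustrated with a small worked sketch. The class is hypothetical and every input value (energy, PUE, WUE, WCF, stress category) is made up for the example; only the formulas and the category threshold follow the description above.

```java
import java.util.OptionalDouble;

class WaterSketch {

    // water_cooling_l = operational_energy_kwh x power_usage_effectiveness x WUE
    static double coolingLitres(double energyKwh, double pue, double wue) {
        return energyKwh * pue * wue;
    }

    // water_electricity_production_l = operational_energy_kwh x power_usage_effectiveness x WCF
    static double electricityLitres(double energyKwh, double pue, double wcf) {
        return energyKwh * pue * wcf;
    }

    // Populated only for water stress category 3 (High) or 4 (Extremely High).
    static OptionalDouble stressAreaLitres(double coolingLitres, double electricityLitres,
                                           int stressCategory) {
        return stressCategory >= 3
                ? OptionalDouble.of(coolingLitres + electricityLitres)
                : OptionalDouble.empty();
    }

    public static void main(String[] args) {
        double energyKwh = 10.0; // made-up inputs
        double pue = 1.15;
        double wue = 0.2;
        double wcf = 2.0;

        double cooling = coolingLitres(energyKwh, pue, wue);
        double electricity = electricityLitres(energyKwh, pue, wcf);
        System.out.println("water_cooling_l = " + cooling);
        System.out.println("water_electricity_production_l = " + electricity);
        System.out.println("stress area (category 4): " + stressAreaLitres(cooling, electricity, 4));
        System.out.println("stress area (category 1): " + stressAreaLitres(cooling, electricity, 1));
    }
}
```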

aws.Networking

Provides an estimate of energy used for networking in and out of data centres. The module distinguishes between three transfer types with separate coefficients (in kWh/GB):

Transfer type | Key | Default | Description
Intra-region | intra | 0.001 | Traffic within the same region
Inter-region | inter | 0.0015 | Traffic between AWS regions
External | extra | 0.059 | Traffic to/from the internet (AWS Inbound / Outbound)

The coefficients are taken from the Boavizta Cloud Emissions Working Group and can be overridden via the network_coefficients_kwh_gb configuration map.
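To make the effect of the coefficients concrete, here is a minimal sketch that multiplies a transferred volume by the coefficient for its transfer type. The class is hypothetical, the coefficients are assumed to apply per gigabyte transferred, and the override value of 0.03 is made up; the way SPRUCE actually reads network_coefficients_kwh_gb may differ.

```java
import java.util.HashMap;
import java.util.Map;

class NetworkingSketch {

    // Default coefficients from the table above.
    static final Map<String, Double> DEFAULT_COEFFICIENTS = Map.of(
            "intra", 0.001,
            "inter", 0.0015,
            "extra", 0.059);

    // Energy in kWh for a given volume and transfer type.
    static double energyKwh(String transferType, double gigabytes,
                            Map<String, Double> coefficients) {
        Double coefficient = coefficients.get(transferType);
        if (coefficient == null) {
            throw new IllegalArgumentException("Unknown transfer type: " + transferType);
        }
        return gigabytes * coefficient;
    }

    public static void main(String[] args) {
        // 100 units of external traffic with the default coefficients.
        System.out.println(energyKwh("extra", 100.0, DEFAULT_COEFFICIENTS));

        // Overriding a single coefficient, as a configuration map might do.
        Map<String, Double> overridden = new HashMap<>(DEFAULT_COEFFICIENTS);
        overridden.put("extra", 0.03); // made-up override
        System.out.println(energyKwh("extra", 100.0, overridden));
    }
}
```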

The relevance and usefulness of attributing emissions to networking based on usage is a subject of debate, as the energy use of networking is largely constant regardless of traffic. The consequences of reducing networking are probably negligible, but since the approach in SPRUCE is attributional, we do the same for networking in order to be consistent.

Output column: operational_energy_kwh.

azure.Networking

Same as above but for Azure.

Output column: operational_energy_kwh.

aws.Serverless

Provides an estimate of energy for the memory and vCPU usage of serverless services like Fargate or EMR. The default coefficients are taken from the Tailpipe methodology and can be overridden via configuration:

Key | Default | Description
memory_coefficient_kwh | 0.0000598 | kWh per GB of memory
arm_cpu_coefficient_kwh | 0.00191015625 | kWh per vCPU (ARM)
x86_cpu_coefficient_kwh | 0.0088121875 | kWh per vCPU (x86)

Output column: operational_energy_kwh.

OperationalEmissions

Computes operational emissions from the energy usage, the average carbon intensity factor and the power_usage_effectiveness estimated by the preceding modules for the region. It also accounts for two additional overheads:

  • Power Supply Efficiency: The power lost between the data centre mains electricity and the server (default 1.04).
  • Power Transmission Losses: The power lost between the power station and the data centre mains electricity (default 1.08).

These two values can be overridden via configuration (powerSupplyEfficiency and powerTransmissionLosses).

operational_emissions_co2eq_g is equal to operational_energy_kwh * carbon_intensity * power_usage_effectiveness * powerSupplyEfficiency * powerTransmissionLosses.

Output column: operational_emissions_co2eq_g.
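The formula for operational_emissions_co2eq_g can be checked with a short worked sketch. The class is hypothetical and the energy, carbon intensity and PUE inputs are made up; the two overhead defaults (1.04 and 1.08) are the ones listed above. Since the result is in grams, the sketch assumes carbon_intensity is expressed in g CO2eq per kWh.

```java
class OperationalEmissionsSketch {

    // Defaults for the two overheads described above.
    static final double POWER_SUPPLY_EFFICIENCY = 1.04;
    static final double POWER_TRANSMISSION_LOSSES = 1.08;

    // operational_energy_kwh * carbon_intensity * power_usage_effectiveness
    //   * powerSupplyEfficiency * powerTransmissionLosses
    static double emissionsGrams(double energyKwh, double carbonIntensity, double pue,
                                 double powerSupplyEfficiency, double powerTransmissionLosses) {
        return energyKwh * carbonIntensity * pue
                * powerSupplyEfficiency * powerTransmissionLosses;
    }

    public static void main(String[] args) {
        // Made-up inputs: 2 kWh, 400 g CO2eq/kWh, PUE of 1.15.
        double grams = emissionsGrams(2.0, 400.0, 1.15,
                POWER_SUPPLY_EFFICIENCY, POWER_TRANSMISSION_LOSSES);
        System.out.println("operational_emissions_co2eq_g = " + grams);
    }
}
```

With the default overheads, the emissions end up about 12% higher than energy × carbon intensity × PUE alone (1.04 × 1.08 ≈ 1.123).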