Enrichment modules¶

SPRUCE generates its estimates by chaining EnrichmentModules. An EnrichmentModule is the unit of extension in SPRUCE. Each module reads columns from the CUR input row and/or from values set by earlier modules, then writes its results into a shared map. The pipeline materialises one output row per CUR row at the end, avoiding per-module row copies.

For instance, the AverageCarbonIntensity.java module applies average carbon intensity factors to energy estimates based on the region in order to generate operational emissions.
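The chaining described above can be sketched roughly as follows. This is a minimal illustration, not SPRUCE's actual API: the `SketchModule` interface, its `enrich` signature, the `usage_hours` input column, and the numeric factors are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an enrichment module; SPRUCE's real interface may differ.
interface SketchModule {
    // Reads from the input row and/or values set by earlier modules, writes into 'enriched'.
    void enrich(Map<String, Object> inputRow, Map<String, Object> enriched);
}

class ModuleChainSketch {
    // Runs the modules in order against one shared map, then builds a single output row.
    static Map<String, Object> run(List<SketchModule> modules, Map<String, Object> inputRow) {
        Map<String, Object> enriched = new HashMap<>();
        for (SketchModule module : modules) {
            module.enrich(inputRow, enriched);
        }
        // The output row is materialised once, after all modules have run.
        Map<String, Object> output = new HashMap<>(inputRow);
        output.putAll(enriched);
        return output;
    }

    static Map<String, Object> demo() {
        List<SketchModule> modules = new ArrayList<>();
        // First module: made-up energy estimate derived from a made-up input column.
        modules.add((row, out) -> out.put("operational_energy_kwh",
                ((Number) row.get("usage_hours")).doubleValue() * 0.1));
        // Second module: reads the value written by the first module.
        modules.add((row, out) -> out.put("operational_emissions_co2eq_g",
                (Double) out.get("operational_energy_kwh") * 400.0));
        Map<String, Object> row = new HashMap<>();
        row.put("usage_hours", 10.0);
        return run(modules, row);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The order of the modules matters in this sketch: the second module would fail if it ran before the first, which mirrors why the configuration file lists modules in sequence.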

The list of columns generated by the modules can be found in the SpruceColumn class.

The enrichment modules are listed and configured in a configuration file, one per cloud provider. If no configuration is specified, SPRUCE uses the bundled default for the active provider (e.g. default-config-aws.json for AWS). See Configure the modules for instructions on how to modify the enrichment modules.

Cloud Carbon Footprint¶

The following modules implement the heuristics from the Cloud Carbon Footprint project.

ccf.aws.Storage¶

Provides an estimate of energy used for storage by applying a flat coefficient per TB-hour, following the approach used by the Cloud Carbon Footprint project. Service-specific replication factors are applied. See methodology for more details.

The HDD and SSD coefficients (in Wh per TB-hour) can be overridden via configuration:

Key	Default	Description
`hdd_coefficient_tb_h`	0.65	Energy per TB-hour for HDD storage
`ssd_coefficient_tb_h`	1.2	Energy per TB-hour for SSD storage

Output column: operational_energy_kwh.
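As a rough worked example of the arithmetic, the sketch below converts a usage expressed in TB-hours into kWh with the default coefficients. The class and method names are hypothetical, and treating the replication factor as a plain multiplier is an assumption made for illustration; the module's actual handling may differ.

```java
class StorageEnergySketch {
    // Default coefficients from the table above, in Wh per TB-hour.
    static final double HDD_COEFFICIENT_TB_H = 0.65;
    static final double SSD_COEFFICIENT_TB_H = 1.2;

    // Assumption: the replication factor simply multiplies the stored volume.
    static double energyKwh(double terabyteHours, double coefficientWhPerTbH, double replicationFactor) {
        double wattHours = terabyteHours * coefficientWhPerTbH * replicationFactor;
        return wattHours / 1000.0; // Wh -> kWh
    }

    public static void main(String[] args) {
        // 2 TB of SSD kept for 720 hours with an illustrative replication factor of 2.
        System.out.println(energyKwh(2 * 720, SSD_COEFFICIENT_TB_H, 2));
    }
}
```

With these illustrative inputs, 1,440 TB-hours of SSD storage comes to about 3.456 kWh.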

ccf.azure.Storage¶

Provides an estimate of energy used for Azure storage by applying the same Cloud Carbon Footprint storage coefficients used by ccf.aws.Storage. Service-specific replication factors are applied. Managed disks are estimated from their provisioned capacity.

The HDD and SSD coefficients (in Wh per TB-hour) can be overridden via configuration using the same keys as ccf.aws.Storage:

Key	Default	Description
`hdd_coefficient_tb_h`	0.65	Energy per TB-hour for HDD storage
`ssd_coefficient_tb_h`	1.2	Energy per TB-hour for SSD storage

Output column: operational_energy_kwh.

ccf.aws.Accelerators¶

Provides an estimate of energy used by accelerators, following the approach used by the Cloud Carbon Footprint project. See methodology for more details.

Output column: operational_energy_kwh.

Boavizta¶

The following modules make use of the BoaviztAPI.

boavizta.aws.BoaviztAPI¶

Provides an estimate of final energy used for computation (EC2, OpenSearch, RDS) as well as the related embodied emissions using the BoaviztAPI.

Output columns: operational_energy_kwh, embodied_emissions_co2eq_g and embodied_adp_sbeq_g.

From https://doc.api.boavizta.org/Explanations/impacts/

Abiotic Depletion Potential (ADP) is an environmental impact indicator. This category corresponds to mineral and resources used and is, in this sense, mainly influenced by the rate of resources extracted. The effect of this consumption on their depletion is estimated according to their availability stock at a global scale. This impact category is divided into two components: a material component and a fossil fuels component (we use a version of ADP which include both). This impact is expressed in grams of antimony equivalent (gSbeq).

Source: sciencedirect

boavizta.aws.BoaviztAPIstatic¶

Similar to the previous module, but reads its data from a static file generated from the BoaviztAPI instead of querying a running instance. This makes SPRUCE simpler to use.

Output columns: operational_energy_kwh, embodied_emissions_co2eq_g and embodied_adp_sbeq_g.

boavizta.azure.BoaviztAPI¶

Similar to the AWS equivalent but for Azure.

Output columns: operational_energy_kwh, embodied_emissions_co2eq_g and embodied_adp_sbeq_g.

boavizta.azure.BoaviztAPIstatic¶

Similar to the AWS equivalent but for Azure.

Output columns: operational_energy_kwh, embodied_emissions_co2eq_g and embodied_adp_sbeq_g.

EcoLogits¶

The following modules estimate the energy consumption and embodied emissions of LLM inference using static coefficients derived from the EcoLogits project.

ecologits.BedrockEcoLogits¶

Provides an estimate of energy consumption and embodied emissions for LLM inference on AWS Bedrock, based on static per-model coefficients from the EcoLogits project. This follows the same pattern as boavizta.aws.BoaviztAPIstatic: a static data file bundled in the JAR is loaded at initialisation time, and the module matches Bedrock CUR rows to per-model coefficients to compute energy usage and embodied emissions.

The module parses the line_item_usage_type field (format: {REGION}-{ModelKey}-{input|output}-tokens[-batch]) to extract both the model key and the token type, then normalises the token count from pricing_unit (handling real-world values such as 1K tokens or 1M tokens). Only output-token rows are scored: the EcoLogits methodology attributes nearly all generation cost to the autoregressive output phase, so input-token rows are skipped.
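The parsing step can be sketched as below. This is an illustration under stated assumptions rather than the module's implementation: the class name, the regular expressions, and the sample usage types are hypothetical, and the sketch assumes the region prefix contains no hyphen while the model key may.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BedrockUsageTypeSketch {
    // {REGION}-{ModelKey}-{input|output}-tokens[-batch]; assumes a hyphen-free region prefix.
    private static final Pattern USAGE_TYPE =
            Pattern.compile("^([^-]+)-(.+)-(input|output)-tokens(-batch)?$");
    // Matches pricing units such as "1K tokens" or "1M tokens".
    private static final Pattern PRICING_UNIT =
            Pattern.compile("^\\s*(\\d+)\\s*([KM]?)\\s+tokens\\s*$", Pattern.CASE_INSENSITIVE);

    // Returns {modelKey, tokenType}, or null when the usage type does not match the format.
    static String[] parse(String usageType) {
        Matcher m = USAGE_TYPE.matcher(usageType);
        return m.matches() ? new String[] {m.group(2), m.group(3)} : null;
    }

    // Only output-token rows are scored; input-token rows are skipped.
    static boolean isScored(String usageType) {
        String[] parsed = parse(usageType);
        return parsed != null && parsed[1].equals("output");
    }

    // Number of tokens represented by one pricing unit.
    static long tokensPerPricingUnit(String pricingUnit) {
        Matcher m = PRICING_UNIT.matcher(pricingUnit);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognised pricing unit: " + pricingUnit);
        }
        long base = Long.parseLong(m.group(1));
        String suffix = m.group(2).toUpperCase(Locale.ROOT);
        if (suffix.equals("K")) return base * 1_000L;
        if (suffix.equals("M")) return base * 1_000_000L;
        return base;
    }

    // Normalised token count for a row: billed quantity times tokens per pricing unit.
    static double tokenCount(double usageAmount, String pricingUnit) {
        return usageAmount * tokensPerPricingUnit(pricingUnit);
    }

    public static void main(String[] args) {
        String usageType = "USE1-Example-Model-output-tokens-batch"; // made-up sample
        String[] parsed = parse(usageType);
        System.out.println(parsed[0] + " / " + parsed[1] + " / scored=" + isScored(usageType));
        System.out.println(tokenCount(2.5, "1K tokens"));
    }
}
```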

Output columns: operational_energy_kwh and embodied_emissions_co2eq_g.

Batch size assumption: EcoLogits hardcodes a batch size of B=64 concurrent requests. The resulting coefficients are a mid-batch estimate: they underestimate energy for low-traffic scenarios and overestimate it for high-throughput batch inference (e.g. Bedrock Batch mode). Making B dynamic would require provider telemetry that is not available in billing data.

ember.AverageCarbonIntensity¶

Adds average carbon intensity factors derived from Ember's electricity data, distributed under the Creative Commons Attribution Licence (CC-BY-4.0). Values are keyed directly by cloud provider and region (e.g. aws:us-east-1). For regions in countries with sub-national data (currently the US and India), the carbon intensity is taken from the Ember value for the state hosting the data centre; otherwise the country-level Ember value is used.

The data is loaded from src/main/resources/ember/ember_co2_intensity.csv, which is generated from cloud_regions.json. See the scripts under scripts/ and the dedicated README for how to refresh it.

Output column: carbon_intensity.

Cloud region metadata¶

SPRUCE ships with src/main/resources/cloud_regions.json, a single JSON file listing the AWS, GCP, and Azure cloud regions together with their location (country, metro area, latitude/longitude), service status, and number of availability zones. It is the canonical source for the region-to-location mapping used by other modules and resource files (e.g. ember.AverageCarbonIntensity).

The file is produced in two steps by scripts under scripts/. See scripts/README.md for the full usage details.

RegionExtraction¶

Extracts the region information from the input and stores it in a standard location.

Output column: region.

AWS module: com.digitalpebble.spruce.modules.aws.RegionExtraction

Azure module: com.digitalpebble.spruce.modules.azure.RegionExtraction

PWUE¶

Loads and stores both Power Usage Effectiveness (PUE) and Water Usage Effectiveness (WUE) factors from a single CSV resource file. This module replaces the previous separate PUE module and centralizes the loading of these efficiency factors.

The module uses the 2024 data published by AWS for Power Usage Effectiveness and the corresponding WUE values. For Azure, the source is https://datacenters.microsoft.com/sustainability/efficiency/.

The PWUE module supports both AWS and Azure and loads the appropriate resource file for the provider:

- AWS: aws-pue-wue.csv
- Azure: azure-pue-wue.csv

The lookup logic follows this priority:

1. Exact region match (e.g., "us-east-1")
2. Regex pattern match (e.g., "us-.+")
3. Default configured value (falls back to 1.15 for PUE and null for WUE)
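The lookup priority can be sketched as follows for PUE. The class and method names are hypothetical, and the per-region values in the example are invented for illustration; only the 1.15 fallback comes from the description above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class PueLookupSketch {
    private final Map<String, Double> exact = new LinkedHashMap<>();
    // Insertion order is preserved so that patterns are tried in the order they were added.
    private final Map<Pattern, Double> patterns = new LinkedHashMap<>();
    private final double defaultPue;

    PueLookupSketch(double defaultPue) {
        this.defaultPue = defaultPue;
    }

    void addExact(String region, double pue) {
        exact.put(region, pue);
    }

    void addPattern(String regex, double pue) {
        patterns.put(Pattern.compile(regex), pue);
    }

    double lookup(String region) {
        // 1. Exact region match
        Double value = exact.get(region);
        if (value != null) {
            return value;
        }
        // 2. Regex pattern match against the whole region name
        for (Map.Entry<Pattern, Double> entry : patterns.entrySet()) {
            if (entry.getKey().matcher(region).matches()) {
                return entry.getValue();
            }
        }
        // 3. Default configured value
        return defaultPue;
    }

    static PueLookupSketch example() {
        PueLookupSketch lookup = new PueLookupSketch(1.15);
        lookup.addExact("us-east-1", 1.10); // illustrative value, not the published one
        lookup.addPattern("us-.+", 1.13);   // illustrative value, not the published one
        return lookup;
    }

    public static void main(String[] args) {
        PueLookupSketch lookup = example();
        System.out.println(lookup.lookup("us-east-1"));
        System.out.println(lookup.lookup("us-west-2"));
        System.out.println(lookup.lookup("eu-west-1"));
    }
}
```

A WUE lookup would follow the same three steps, except that the final fallback is null rather than a number.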

Output columns: power_usage_effectiveness and water_usage_effectiveness.

Water¶

Estimates water consumption associated with cloud usage, producing three columns:

water_cooling_l – the volume of water (in litres) used for data centre cooling. Computed as operational_energy_kwh × power_usage_effectiveness × WUE, where WUE (Water Usage Effectiveness) is the ratio of litres of water consumed for cooling per kWh of IT energy. The per-region WUE values are loaded by the PWUE module from the 2024 data published by AWS. The source for Azure is https://datacenters.microsoft.com/sustainability/efficiency/.
water_electricity_production_l – the volume of water (in litres) consumed during electricity generation to power the data centre. Computed as operational_energy_kwh × power_usage_effectiveness × WCF, where WCF (Water Consumption Factor) represents the litres of water consumed per kWh of electricity generated. The WCF values per electricity grid zone are sourced from the WRI methodology for calculating water use embedded in purchased electricity.
water_consumption_stress_area_l – the total water consumption (water_cooling_l + water_electricity_production_l) attributed to regions under high or extremely high water stress (Aqueduct 4.0 baseline water stress category ≥ 3). This field is only populated when the electricity grid zone for the region has a water stress category of 3 (High) or 4 (Extremely High); it is absent otherwise. Water stress categories are derived from the WRI Aqueduct 4.0 dataset. The World Resources Institute's Aqueduct tool is licensed through Creative Commons. The data has been extracted and mapped to cloud provider region codes.

Output columns: water_cooling_l, water_electricity_production_l, and water_consumption_stress_area_l.
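The three formulas above can be written out as a small sketch. The class and method names are hypothetical and the input values in `main` are invented; the use of `OptionalDouble` to represent an absent column is an illustrative choice, not necessarily how the module stores it.

```java
import java.util.OptionalDouble;

class WaterSketch {
    // water_cooling_l = operational_energy_kwh x power_usage_effectiveness x WUE
    static double coolingLitres(double energyKwh, double pue, double wue) {
        return energyKwh * pue * wue;
    }

    // water_electricity_production_l = operational_energy_kwh x power_usage_effectiveness x WCF
    static double electricityProductionLitres(double energyKwh, double pue, double wcf) {
        return energyKwh * pue * wcf;
    }

    // Populated only for water stress category 3 (High) or 4 (Extremely High); empty otherwise.
    static OptionalDouble stressAreaLitres(double coolingLitres, double electricityLitres, int stressCategory) {
        return stressCategory >= 3
                ? OptionalDouble.of(coolingLitres + electricityLitres)
                : OptionalDouble.empty();
    }

    public static void main(String[] args) {
        // Invented inputs: 10 kWh, PUE 1.2, WUE 0.5 l/kWh, WCF 2.0 l/kWh.
        double cooling = coolingLitres(10, 1.2, 0.5);
        double electricity = electricityProductionLitres(10, 1.2, 2.0);
        System.out.println(cooling + " " + electricity);
        System.out.println(stressAreaLitres(cooling, electricity, 4));
        System.out.println(stressAreaLitres(cooling, electricity, 1));
    }
}
```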

aws.Networking¶

Provides an estimate of energy used for networking in and out of data centres. The module distinguishes between three transfer types with separate coefficients (in kWh/GB):

Transfer type	Key	Default	Description
Intra-region	`intra`	0.001	Traffic within the same region
Inter-region	`inter`	0.0015	Traffic between AWS regions
External	`extra`	0.059	Traffic to/from the internet (AWS Inbound / Outbound)

The coefficients are taken from the Boavizta Cloud Emissions Working Group and can be overridden via the network_coefficients_kwh_gb configuration map.
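A minimal sketch of how these coefficients translate into energy is shown below. The class and method names are hypothetical, and merging an override map over the defaults is an assumption about how the network_coefficients_kwh_gb setting behaves, not a description of the actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

class NetworkingSketch {
    // Default coefficients from the table above, in kWh per GB.
    static Map<String, Double> defaultCoefficients() {
        Map<String, Double> coefficients = new HashMap<>();
        coefficients.put("intra", 0.001);
        coefficients.put("inter", 0.0015);
        coefficients.put("extra", 0.059);
        return coefficients;
    }

    // Assumption: overrides replace individual defaults and leave the others untouched.
    static double energyKwh(String transferType, double gigabytes, Map<String, Double> overrides) {
        Map<String, Double> coefficients = defaultCoefficients();
        coefficients.putAll(overrides);
        Double coefficient = coefficients.get(transferType);
        if (coefficient == null) {
            throw new IllegalArgumentException("Unknown transfer type: " + transferType);
        }
        return gigabytes * coefficient;
    }

    public static void main(String[] args) {
        // 100 GB of external traffic with the default coefficients.
        System.out.println(energyKwh("extra", 100, new HashMap<>()));
    }
}
```

With the defaults, 100 GB of external traffic comes to about 5.9 kWh, against about 0.1 kWh for the same volume within a region.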

The relevance and usefulness of attributing emissions to networking based on usage is open to debate, as the energy use of networking equipment is largely constant regardless of traffic. Reducing network traffic would therefore probably have a negligible effect, but since the approach in SPRUCE is attributional, we apply it to networking as well for consistency.

Output column: operational_energy_kwh.

azure.Networking¶

Same as above but for Azure.

Output column: operational_energy_kwh.

aws.Serverless¶

Provides an estimate of energy for the memory and vCPU usage of serverless services like Fargate or EMR. The default coefficients are taken from the Tailpipe methodology and can be overridden via configuration:

Key	Default	Description
`memory_coefficient_kwh`	0.0000598	kWh per GB of memory
`arm_cpu_coefficient_kwh`	0.00191015625	kWh per vCPU (ARM)
`x86_cpu_coefficient_kwh`	0.0088121875	kWh per vCPU (x86)

Output column: operational_energy_kwh.
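The sketch below applies the default coefficients to a memory and a vCPU usage amount. The class and method names are hypothetical, and it assumes the usage amounts are already expressed in the units the coefficients expect.

```java
class ServerlessEnergySketch {
    // Default coefficients from the table above.
    static final double MEMORY_COEFFICIENT_KWH = 0.0000598;
    static final double ARM_CPU_COEFFICIENT_KWH = 0.00191015625;
    static final double X86_CPU_COEFFICIENT_KWH = 0.0088121875;

    // Energy attributed to memory usage.
    static double memoryEnergyKwh(double memoryUsageGb) {
        return memoryUsageGb * MEMORY_COEFFICIENT_KWH;
    }

    // Energy attributed to vCPU usage; the coefficient depends on the CPU architecture.
    static double cpuEnergyKwh(double vcpuUsage, boolean arm) {
        return vcpuUsage * (arm ? ARM_CPU_COEFFICIENT_KWH : X86_CPU_COEFFICIENT_KWH);
    }

    public static void main(String[] args) {
        System.out.println(memoryEnergyKwh(4));
        System.out.println(cpuEnergyKwh(2, false));
        System.out.println(cpuEnergyKwh(2, true));
    }
}
```

For the same vCPU usage, the default x86 coefficient yields roughly 4.6 times the energy of the ARM one.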

OperationalEmissions¶

Computes operational emissions from the energy usage, the average carbon intensity factor for the region, and the power_usage_effectiveness estimated by the preceding modules. It also accounts for two additional overheads:

Power Supply Efficiency: The power lost between the data centre mains electricity and the server (default 1.04).
Power Transmission Losses: The power lost between the power station and the data centre mains electricity (default 1.08).

These two values can be overridden via configuration (powerSupplyEfficiency and powerTransmissionLosses).

operational_emissions_co2eq_g is equal to operational_energy_kwh * carbon_intensity * power_usage_effectiveness * powerSupplyEfficiency * powerTransmissionLosses.
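The formula can be checked with a short worked example. The class and method names are hypothetical; the carbon intensity and PUE used in `main` are invented inputs, while 1.04 and 1.08 are the defaults given above.

```java
class OperationalEmissionsSketch {
    static final double DEFAULT_POWER_SUPPLY_EFFICIENCY = 1.04;
    static final double DEFAULT_POWER_TRANSMISSION_LOSSES = 1.08;

    // operational_energy_kwh * carbon_intensity * power_usage_effectiveness
    //   * powerSupplyEfficiency * powerTransmissionLosses
    static double emissionsGrams(double energyKwh, double carbonIntensity, double pue,
                                 double powerSupplyEfficiency, double powerTransmissionLosses) {
        return energyKwh * carbonIntensity * pue * powerSupplyEfficiency * powerTransmissionLosses;
    }

    public static void main(String[] args) {
        // Invented inputs: 1 kWh, 400 gCO2eq/kWh, PUE of 1.15.
        System.out.println(emissionsGrams(1.0, 400.0, 1.15,
                DEFAULT_POWER_SUPPLY_EFFICIENCY, DEFAULT_POWER_TRANSMISSION_LOSSES));
    }
}
```

With these inputs the overheads multiply to about 1.29, so 400 g of emissions at the server becomes roughly 516.7 g.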

Output column: operational_emissions_co2eq_g.