Enrichment modules
SPRUCE generates its estimates by chaining EnrichmentModules.
An EnrichmentModule is the unit of extension in SPRUCE. Each module reads columns from
the CUR input row and/or from values set by earlier modules, then writes its results into a
shared map. The pipeline materialises one output row per CUR row at the end, avoiding
per-module row copies.
For instance, the AverageCarbonIntensity.java module applies average carbon intensity factors to energy estimates based on the region in order to generate operational emissions.
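The read-then-write contract described above can be sketched as follows. This is a minimal illustration only: `SketchModule`, `ModuleChainSketch`, the input column `product_region_code` and the intensity values are all hypothetical, and the real `EnrichmentModule` interface in SPRUCE may look quite different.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for SPRUCE's EnrichmentModule; the real interface may differ. */
interface SketchModule {
    /** Reads from the input row and/or the shared map, then writes results into the shared map. */
    void enrich(Map<String, String> inputRow, Map<String, Object> shared);
}

final class ModuleChainSketch {

    /** Copies the region from an assumed input column into the standard "region" key. */
    static final SketchModule REGION = (row, shared) -> {
        String region = row.get("product_region_code"); // assumed input column name
        if (region != null) {
            shared.put("region", region);
        }
    };

    /** Looks up a carbon intensity for the region written by the previous module. */
    static final SketchModule INTENSITY = (row, shared) -> {
        // Placeholder values for illustration only, not real grid data.
        Map<String, Double> placeholder = Map.of("eu-west-1", 300.0, "us-east-1", 400.0);
        Object region = shared.get("region");
        if (region != null && placeholder.containsKey(region)) {
            shared.put("carbon_intensity", placeholder.get(region));
        }
    };

    /** Runs every module in order over one row and returns the values to materialise. */
    static Map<String, Object> run(Map<String, String> inputRow, List<SketchModule> modules) {
        Map<String, Object> shared = new HashMap<>();
        for (SketchModule module : modules) {
            module.enrich(inputRow, shared);
        }
        return shared;
    }

    public static void main(String[] args) {
        Map<String, String> row = Map.of("product_region_code", "eu-west-1");
        System.out.println(run(row, List.of(REGION, INTENSITY)));
    }
}
```

Order matters in this sketch: if `INTENSITY` runs before `REGION`, it finds no region in the shared map and writes nothing.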
The list of columns generated by the modules can be found in the SpruceColumn class.
The enrichment modules are listed and configured in a configuration file. If no configuration is specified, the default one is used. See Configure the modules for instructions on how to modify the enrichment modules.
Cloud Carbon Footprint
The following modules implement the heuristics from the Cloud Carbon Footprint project.
ccf.Storage
Provides an estimate of energy used for storage by applying a flat coefficient per GB, following the approach used by the Cloud Carbon Footprint project. Service-specific replication factors are applied. See methodology for more details.
Output column: operational_energy_kwh.
ccf.Networking
Provides an estimate of energy used for networking in and out of data centres. Applies a flat coefficient of 0.001 kWh/GB by default; see methodology for more details. The coefficient can be changed via configuration as shown in Configure the modules.
Note: in the default configuration, this module has been replaced by Networking, which distinguishes between transfer types.
Output column: operational_energy_kwh.
ccf.Accelerators
Provides an estimate of energy used by accelerators, following the approach used by the Cloud Carbon Footprint project. See methodology for more details.
Output column: operational_energy_kwh.
Boavizta
The following modules make use of the BoaviztAPI.
boavizta.BoaviztAPI
Provides an estimate of final energy used for computation (EC2, OpenSearch, RDS) as well as the related embodied emissions using the BoaviztAPI.
Output columns: operational_energy_kwh, embodied_emissions_co2eq_g and embodied_adp_sbeq_g.
From https://doc.api.boavizta.org/Explanations/impacts/
Abiotic Depletion Potential (ADP) is an environmental impact indicator. This category corresponds to mineral and resources used and is, in this sense, mainly influenced by the rate of resources extracted. The effect of this consumption on their depletion is estimated according to their availability stock at a global scale. This impact category is divided into two components: a material component and a fossil fuels component (we use a version of ADP which include both). This impact is expressed in grams of antimony equivalent (gSbeq).
Source: sciencedirect
boavizta.BoaviztAPIstatic
Similar to the previous module, but reads the information from a static file generated from the BoaviztAPI instead of querying a running instance. This makes SPRUCE simpler to use.
Output columns: operational_energy_kwh, embodied_emissions_co2eq_g and embodied_adp_sbeq_g.
EcoLogits
The following modules estimate the energy consumption and embodied emissions of LLM inference using static coefficients derived from the EcoLogits project.
ecologits.BedrockEcoLogits
Provides an estimate of energy consumption and embodied emissions for LLM inference on AWS Bedrock, based on static per-model coefficients from the EcoLogits project. This follows the same pattern as boavizta.BoaviztAPIstatic: a static data file bundled in the JAR is loaded at initialisation time, and the module matches Bedrock CUR rows to per-model coefficients to compute energy usage and embodied emissions.
The module reads the model identifier from the product map in the CUR row and normalises the token count from pricing_unit (handling real-world values such as 1K tokens or 1M tokens). It uses the line_item_usage_type field to distinguish between input and output tokens, falling back to a ratio split when the usage type is ambiguous.
Output columns: operational_energy_kwh and embodied_emissions_co2eq_g.
Batch size assumption: EcoLogits hardcodes a batch size of B=64 concurrent requests. The resulting coefficients are a mid-batch estimate: they underestimate energy for low-traffic scenarios and overestimate it for high-throughput batch inference (e.g. Bedrock Batch mode). Making B dynamic requires provider telemetry not available in billing data.
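The pricing-unit normalisation mentioned above could look roughly like the sketch below. It is not the module's actual code: the class and method names are hypothetical, and it assumes that the usage amount is expressed in multiples of pricing_unit, so that a unit of 1K tokens scales the amount by 1,000.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenUnitSketch {

    // Optional leading number, optional K or M suffix, then "token" or "tokens".
    private static final Pattern UNIT = Pattern.compile(
            "^\\s*(\\d+(?:\\.\\d+)?)?\\s*([KM])?\\s*tokens?\\s*$",
            Pattern.CASE_INSENSITIVE);

    /** Number of tokens represented by one pricing unit, or -1 if the unit is not recognised. */
    static double tokensPerUnit(String pricingUnit) {
        if (pricingUnit == null) {
            return -1;
        }
        Matcher m = UNIT.matcher(pricingUnit);
        if (!m.matches()) {
            return -1;
        }
        double count = (m.group(1) == null) ? 1.0 : Double.parseDouble(m.group(1));
        String suffix = (m.group(2) == null) ? "" : m.group(2).toUpperCase(Locale.ROOT);
        double scale = suffix.equals("K") ? 1_000.0 : suffix.equals("M") ? 1_000_000.0 : 1.0;
        return count * scale;
    }

    /** Assumption: the usage amount is expressed in multiples of the pricing unit. */
    static double totalTokens(double usageAmount, String pricingUnit) {
        double perUnit = tokensPerUnit(pricingUnit);
        if (perUnit < 0) {
            throw new IllegalArgumentException("Unrecognised pricing unit: " + pricingUnit);
        }
        return usageAmount * perUnit;
    }

    public static void main(String[] args) {
        System.out.println(totalTokens(2.5, "1K tokens"));
    }
}
```

Returning -1 for an unrecognised unit lets a caller skip rows that are not token-based rather than failing the whole run; that choice is part of the sketch, not a documented behaviour of the module.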
electricitymaps.AverageCarbonIntensity
Adds average carbon intensity factors generated from ElectricityMaps’ 2024 datasets. The life-cycle emission factors are used.
Output column: carbon_intensity.
RegionExtraction
Extracts the region information from the input and stores it in a standard location.
Output column: region.
PUE
Applies the 2024 Power Usage Effectiveness (PUE) data published by AWS to rows for which energy usage has been estimated. This is more accurate and up to date than the flat-rate approach in the CCF methodology.
Output column: power_usage_effectiveness.
Water
Estimates water consumption associated with cloud usage, producing three columns:
- water_cooling_l: the volume of water (in litres) used for data centre cooling. Computed as operational_energy_kwh × power_usage_effectiveness × WUE, where WUE (Water Usage Effectiveness) is the ratio of litres of water consumed for cooling per kWh of IT energy. The per-region WUE values come from the 2024 data published by AWS.
- water_electricity_production_l: the volume of water (in litres) consumed during electricity generation to power the data centre. Computed as operational_energy_kwh × power_usage_effectiveness × WCF, where WCF (Water Consumption Factor) represents the litres of water consumed per kWh of electricity generated. The WCF values per electricity grid zone are sourced from the WRI methodology for calculating water use embedded in purchased electricity.
- water_consumption_stress_area_l: the total water consumption (water_cooling_l + water_electricity_production_l) attributed to regions under high or extremely high water stress (Aqueduct 4.0 baseline water stress category ≥ 3). This field is only populated when the electricity grid zone for the region has a water stress category of 3 (High) or 4 (Extremely High); it is absent otherwise. Water stress categories are derived from the WRI Aqueduct 4.0 dataset. The World Resources Institute’s Aqueduct tool is licensed through Creative Commons. The data has been extracted and mapped to the ElectricityMaps region code.
Output columns: water_cooling_l, water_electricity_production_l, and water_consumption_stress_area_l.
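The three formulas above can be sketched as a single function. The class and method names are hypothetical, and the WUE, WCF and stress category are passed in directly here, whereas the module looks them up per region or grid zone.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class WaterSketch {

    /**
     * Computes the water columns for one row.
     *
     * @param energyKwh      operational_energy_kwh
     * @param pue            power_usage_effectiveness
     * @param wue            litres of cooling water per kWh (per-region value)
     * @param wcf            litres of water per kWh of electricity generated (per grid zone)
     * @param stressCategory Aqueduct baseline water stress category for the grid zone
     */
    static Map<String, Double> estimate(double energyKwh, double pue, double wue,
                                        double wcf, int stressCategory) {
        double scaledEnergy = energyKwh * pue;
        double cooling = scaledEnergy * wue;
        double electricity = scaledEnergy * wcf;

        Map<String, Double> out = new LinkedHashMap<>();
        out.put("water_cooling_l", cooling);
        out.put("water_electricity_production_l", electricity);
        // Only populated for High (3) or Extremely High (4) water stress; absent otherwise.
        if (stressCategory >= 3) {
            out.put("water_consumption_stress_area_l", cooling + electricity);
        }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder inputs for illustration only, not published AWS or WRI values.
        System.out.println(estimate(8.0, 1.5, 0.25, 2.0, 3));
    }
}
```

With the placeholder inputs in `main`, the scaled energy is 12 kWh, giving 3 litres for cooling and 24 litres for electricity production; the stress-area column holds their sum because the category is 3.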
Networking
Provides an estimate of energy used for networking in and out of data centres. Unlike ccf.Networking, which applies a single flat coefficient, this module distinguishes between three transfer types with separate coefficients (in kWh/GB):
| Transfer type | Key | Default | Description |
|---|---|---|---|
| Intra-region | intra | 0.001 | Traffic within the same region |
| Inter-region | inter | 0.0015 | Traffic between AWS regions |
| External | extra | 0.059 | Traffic to/from the internet (AWS Inbound / Outbound) |
The coefficients are taken from the Boavizta Cloud Emissions Working Group and can be overridden via the network_coefficients_kwh_gb configuration map.
The relevance and usefulness of attributing networking emissions based on usage is open to debate, as the energy use of networking is largely constant regardless of traffic. The consequences of reducing network traffic are probably negligible, but since the approach in SPRUCE is attributional, we apply the same approach to networking for consistency.
Output column: operational_energy_kwh.
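As a rough illustration of the per-transfer-type coefficients, the sketch below multiplies a data volume by the coefficient for a given key, with optional overrides layered on top of the defaults from the table. The class and method names are hypothetical, and the sketch takes the transfer key as an input; how the module derives that key from a CUR row is not covered here.

```java
import java.util.HashMap;
import java.util.Map;

final class NetworkingSketch {

    /** Default coefficients in kWh per unit of data transferred, keyed as in the table above. */
    static final Map<String, Double> DEFAULTS = Map.of(
            "intra", 0.001,
            "inter", 0.0015,
            "extra", 0.059);

    /**
     * Estimates networking energy for one row.
     *
     * @param transferKey one of "intra", "inter" or "extra"
     * @param volume      amount of data transferred
     * @param overrides   coefficients replacing the defaults (may be empty)
     */
    static double energyKwh(String transferKey, double volume, Map<String, Double> overrides) {
        Map<String, Double> coefficients = new HashMap<>(DEFAULTS);
        coefficients.putAll(overrides);
        Double coefficient = coefficients.get(transferKey);
        if (coefficient == null) {
            throw new IllegalArgumentException("Unknown transfer type: " + transferKey);
        }
        return volume * coefficient;
    }

    public static void main(String[] args) {
        System.out.println(energyKwh("extra", 100.0, Map.of()));
        System.out.println(energyKwh("intra", 100.0, Map.of("intra", 0.002)));
    }
}
```

The override map plays the role of the network_coefficients_kwh_gb configuration map: any key it contains replaces the default, and the other keys keep their default values.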
Serverless
Provides an estimate of energy for the memory and vCPU usage of serverless services like Fargate or EMR. The default coefficients are taken from the Tailpipe methodology.
Output column: operational_energy_kwh.
OperationalEmissions
Computes operational emissions from the energy usage, the region's average carbon intensity factor and the power_usage_effectiveness estimated by the preceding modules. It also accounts for two additional overheads:
- Power Supply Efficiency: The power lost between the data centre mains electricity and the server (default 1.04).
- Power Transmission Losses: The power lost between the power station and the data centre mains electricity (default 1.08).
These two values can be overridden via configuration (powerSupplyEfficiency and powerTransmissionLosses).
operational_emissions_co2eq_g is equal to operational_energy_kwh * carbon_intensity * power_usage_effectiveness * powerSupplyEfficiency * powerTransmissionLosses.
Output column: operational_emissions_co2eq_g.
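The formula above translates directly into code. The sketch below uses hypothetical class and method names, and assumes that carbon_intensity is expressed in grams of CO2eq per kWh, which is what the output unit suggests.

```java
final class OperationalEmissionsSketch {

    static final double DEFAULT_POWER_SUPPLY_EFFICIENCY = 1.04;
    static final double DEFAULT_POWER_TRANSMISSION_LOSSES = 1.08;

    /** Applies the formula with explicit overhead factors. */
    static double emissionsGrams(double energyKwh, double carbonIntensity, double pue,
                                 double powerSupplyEfficiency, double powerTransmissionLosses) {
        return energyKwh * carbonIntensity * pue * powerSupplyEfficiency * powerTransmissionLosses;
    }

    /** Applies the formula with the default overhead factors. */
    static double emissionsGrams(double energyKwh, double carbonIntensity, double pue) {
        return emissionsGrams(energyKwh, carbonIntensity, pue,
                DEFAULT_POWER_SUPPLY_EFFICIENCY, DEFAULT_POWER_TRANSMISSION_LOSSES);
    }

    public static void main(String[] args) {
        // Placeholder inputs: 2 kWh, an intensity of 400 and a PUE of 1.15.
        System.out.println(emissionsGrams(2.0, 400.0, 1.15));
    }
}
```

With the placeholder inputs in `main`, the two default overheads multiply to 1.1232, so 2 × 400 × 1.15 = 920 becomes roughly 1033.3 g.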