Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Explore the results

Automated report

The easiest way to explore SPRUCE output is report.py, a Python script that reads enriched Parquet files, runs all the analyses described on this page automatically, and writes a formatted report — no SQL required.

Installation

pip install -r requirements-report.txt   # duckdb (+ markdown, weasyprint for html/pdf)

Usage

# Markdown to stdout
python report.py -i output/

# Write to a file — format is inferred from the suffix
python report.py -i output/ -o report.md
python report.py -i output/ -o report.html
python report.py -i output/ -o report.pdf

# Read directly from S3 (uses ambient AWS credentials)
python report.py -i s3://my-bucket/spruced/ -o report.html
FlagDefaultDescription
-i / --inputrequiredLocal directory, glob, or S3 URI
-o / --outputstdoutOutput file; format inferred from suffix (.md, .html, .pdf)
--top-tags10Maximum number of resource tags offered for interactive breakdown

What it produces

The report covers the following sections, drawn from the queries documented below:

SectionWhat it shows
Summary by Billing PeriodEnergy (kWh), operational CO₂ (kg), embodied CO₂ (kg), water usage (l)
Top Emitters by ServiceTop 20 product/service/operation combinations by operational CO₂
Top Instance TypesTop 20 instance families by operational + embodied CO₂
Coverage% of unblended costs that have emissions data; top 20 uncovered services by cost
Regional AnalysisCO₂, energy, carbon intensity, water, PUE, and gCO₂/$ per AWS region
Tag BreakdownInteractive: emissions split by any resource tag present in the data
RecommendationsAutomatically generated from coverage gaps, regional carbon intensity, instance families, and billing trends

After the fixed sections, the script scans resource_tags and presents an interactive menu of the most consistently used tag keys, ordered by the percentage of line items that carry a non-empty value. Select a tag to see the emissions breakdown by tag value; press Enter when done. If no tags are found, the report states this clearly.

Recommendations are generated automatically from the data:

SignalConditionMessage
Coverage gap< 80 % of costs coveredLists top uncovered services by cost
Carbon intensityRegion > 300 gCO₂/kWh with a lower-intensity alternative presentSuggests migration
Instance familyTop emitter is an x86 family with a Graviton equivalentNames the Graviton replacement
Emissions trend≥ 2 billing periods, last > first × 1.10Flags the percentage increase

Manual queries

As an alternative to the script above, you can also write equivalent SQL queries using DuckDB locally (or Athena if the output was written to S3):

Breakdown by billing period

create table enriched_curs as select * from 'output/**/*.parquet';

select 
       BILLING_PERIOD,
       round(sum(operational_emissions_co2eq_g) / 1000, 2) as co2_usage_kg,
       round(sum(embodied_emissions_co2eq_g) / 1000, 2) as co2_embodied_kg,
       round(sum(operational_energy_kwh),2) as energy_usage_kwh
       from enriched_curs
       group by BILLING_PERIOD
       order by BILLING_PERIOD;

This should give an output similar to

BILLING_PERIODco2_usage_kgco2_embodied_kgenergy_usage_kwh
2025-05863.5189.732029.15
2025-06774.0185.011811.98
2025-07812.0787.191901.13
2025-08848.788.151982.56
2025-09866.2486.762017.36

Breakdown per product, service and operation

select line_item_product_code, product_servicecode, line_item_operation,
       round(sum(operational_emissions_co2eq_g)/1000,2) as co2_usage_kg,
       round(sum(embodied_emissions_co2eq_g)/1000, 2) as co2_embodied_kg,
       round(sum(operational_energy_kwh),2) as energy_usage_kwh
       from enriched_curs where operational_emissions_co2eq_g > 0.01
       group by all
       order by 4 desc, 5 desc, 6 desc, 2;

This should give an output similar to

line_item_product_codeproduct_servicecodeline_item_operationco2_usage_kgco2_embodied_kgenergy_usage_kwh
AmazonECSAmazonECSFargateTask1499.93NULL3784.91
AmazonEC2AmazonEC2RunInstances1365.7433.573068.87
AmazonS3AmazonS3GlacierInstantRetrievalStorage554.85NULL1224.13
AmazonS3AmazonS3OneZoneIAStorage249.22NULL548.48
AmazonS3AmazonS3StandardStorage210.57NULL469.54
AmazonEC2AmazonEC2CreateVolume-Gp3102.27NULL230.39
AmazonEC2AmazonEC2RunInstances:SV00166.413.27146.15
AmazonDocDBAmazonDocDBCreateCluster49.93NULL109.89
AmazonS3AmazonS3IntelligentTieringAIAStorage17.02NULL37.47
AmazonEC2AmazonEC2CreateVolume-Gp211.03NULL34.37
AmazonS3AmazonS3StandardIAStorage9.02NULL19.85
AmazonEC2AmazonEC2CreateSnapshot8.05NULL20.6
AmazonECRAmazonECRTimedStorage-ByteHrs6.96NULL15.31
AmazonEC2AWSDataTransferRunInstances2.8NULL6.25
AmazonS3AmazonS3OneZoneIASizeOverhead2.23NULL4.9
AmazonS3AWSDataTransferGetObjectForRepl1.84NULL4.06
AmazonS3AWSDataTransferUploadPartForRepl1.66NULL3.64
AmazonS3AmazonS3DeleteObject1.45NULL3.2
AmazonMQAmazonMQCreateBroker:00011.16NULL2.55
AmazonECRAmazonECREUW2-TimedStorage-ByteHrs0.43NULL2.16
AmazonS3AmazonS3StandardIASizeOverhead0.3NULL0.65
AmazonS3AWSDataTransferPutObjectForRepl0.19NULL0.42
AWSBackupAWSBackupStorage0.13NULL0.64
AmazonS3AmazonS3GlacierDeepArchiveDeepArchiveStorage0.1NULL0.22
AmazonS3AWSDataTransferPutObject0.08NULL0.17
AmazonECRAWSDataTransferdownloadLayer0.07NULL0.19
AmazonEC2AWSDataTransferPublicIP-In0.06NULL0.14
AmazonCloudWatchAmazonCloudWatchHourlyStorageMetering0.02NULL0.05
AmazonEFSAmazonEFSStorage0.01NULL0.03

Cost coverage

To measure the proportion of the costs for which emissions were calculated

select
  round(covered * 100 / "total costs", 2) as percentage_costs_covered
from (
  select
    sum(line_item_unblended_cost) as "total costs",
    sum(line_item_unblended_cost) filter (where operational_emissions_co2eq_g is not null) as covered
  from
    enriched_curs
  where
    line_item_line_item_type like '%Usage'
);

The figure will vary depending on the services you use. We have measured up to 77% coverage for some users.

Breakdown per region

with agg as (
    select
        region as region_code,
        sum(operational_emissions_co2eq_g) as operational_emissions_g,
        sum(embodied_emissions_co2eq_g) as embodied_emissions_g,
        sum(operational_energy_kwh) as energy_kwh,
        sum(pricing_public_on_demand_cost) as public_cost,
        avg(carbon_intensity) as avg_carbon_intensity,
        avg(power_usage_effectiveness) as pue
    from enriched_curs
    where operational_emissions_co2eq_g is not null
    group by 1
)
select
    region_code,
    round(operational_emissions_g / 1000, 2) as co2_usage_kg,
    round(energy_kwh, 2) as energy_usage_kwh,
    round(avg_carbon_intensity, 2) as carbon_intensity,
    round(pue,2) as pue,
    round((operational_emissions_g + embodied_emissions_g) / public_cost, 2) as g_co2_per_dollar
from agg
order by energy_usage_kwh desc, co2_usage_kg desc, region_code desc;

g_co2_per_dollar being the total emissions (usage + embodied) divided by the public on demand cost.

Below is an example of what the results might look like.

region_codeco2_usage_kgenergy_usage_kwhcarbon_intensitypueg_co2_per_dollar
us-east-19292.0517969.69400.331.1533.94
us-east-21569.793089.49400.331.13164.47
eu-west-2583.542674.09175.031.1122.32
eu-north-10.3513.9520.421.16.84

Breakdown per user tag

User tags are how environmental impacts can be allocated to a business unit, team, product, environment etc… It is as fundamental for a GreenOps practice as it is for FinOps. By enriching data at the finest possible level, SPRUCE allows to aggregate the impacts by the tags that are relevant for a given organisation. The syntax to do so for a tag cost_category_top_level would be for instance

select resource_tags['cost_category_top_level'],
       round(sum(operational_energy_kwh),2) as energie_kwh,
       round(sum(operational_emissions_co2eq_g) / 1000, 2) as operational_kg,
       round(sum(embodied_emissions_co2eq_g) / 1000, 2)    as embodied_kg
       from enriched_curs
       group by 1
       order by 3 desc;