Modules configuration

The enrichment modules are configured in a per-provider JSON file bundled in the JAR. The file used at runtime is selected according to the cloud provider (-p / --provider CLI flag, defaulting to AWS):

Provider flag    Resource file
AWS              default-config-aws.json
GOOGLE           default-config-google.json (when available)
AZURE            default-config-azure.json (when available)

The AWS default looks like this:

{
  "modules": [
    {
      "className": "com.digitalpebble.spruce.modules.RegionExtraction"
    },
    {
      "className": "com.digitalpebble.spruce.modules.ccf.aws.Storage",
      "config": {
        "hdd_coefficient_tb_h": 0.65,
        "ssd_coefficient_tb_h": 1.2
      }
    },
    {
      "className": "com.digitalpebble.spruce.modules.aws.Networking",
      "config": {
        "network_coefficients_kwh_gb": {
          "intra": 0.001,
          "inter": 0.0015,
          "extra": 0.059
        }
      }
    },
    {
      "className": "com.digitalpebble.spruce.modules.boavizta.aws.BoaviztAPIstatic"
    },
    {
      "className": "com.digitalpebble.spruce.modules.aws.Serverless",
      "config": {
        "memory_coefficient_kwh": 0.0000598,
        "x86_cpu_coefficient_kwh": 0.0088121875,
        "arm_cpu_coefficient_kwh": 0.00191015625
      }
    },
    {
      "className": "com.digitalpebble.spruce.modules.ccf.aws.Accelerators",
      "config": {
        "gpu_utilisation_percent": 50
      }
    },
    {
      "className": "com.digitalpebble.spruce.modules.ecologits.BedrockEcoLogits"
    },
    {
      "className": "com.digitalpebble.spruce.modules.PUE",
      "config": {
        "default": 1.15
      }
    },
    {
      "className": "com.digitalpebble.spruce.modules.Water"
    },
    {
      "className": "com.digitalpebble.spruce.modules.ember.AverageCarbonIntensity"
    },
    {
      "className": "com.digitalpebble.spruce.modules.OperationalEmissions",
      "config": {
        "powerSupplyEfficiency": 1.04,
        "powerTransmissionLosses": 1.08
      }
    }
  ]
}

This file determines which modules are used and in what order, and it also configures their behaviour. For instance, the Networking module uses different coefficients for intra-region, inter-region, and external data transfers, all configurable via the network_coefficients_kwh_gb map.

Change the configuration

To use a different configuration, for instance to replace one module with another or to change a module's settings (such as the network coefficients above), write a JSON file with your changes and pass it as an argument to the Spark job with -c. A custom config passed via -c overrides the per-provider default.
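
As an illustration, a custom configuration that changes the network coefficients and the default PUE might look like the sketch below. The class names and keys are copied from the AWS default shown earlier; the numeric values for inter, extra, and the PUE default are made up for the example and are not recommendations. The sketch assumes the custom file replaces the bundled default as a whole, so any module left out of the list would not run; in practice you would probably start from the full default and edit only the entries you care about.

```json
{
  "modules": [
    {
      "className": "com.digitalpebble.spruce.modules.RegionExtraction"
    },
    {
      "className": "com.digitalpebble.spruce.modules.aws.Networking",
      "config": {
        "network_coefficients_kwh_gb": {
          "intra": 0.001,
          "inter": 0.002,
          "extra": 0.06
        }
      }
    },
    {
      "className": "com.digitalpebble.spruce.modules.PUE",
      "config": {
        "default": 1.2
      }
    }
  ]
}
```

Saved under a name of your choosing, for example custom-config.json (a hypothetical file name), the file would be passed to the job by adding -c ./custom-config.json to the spark-submit arguments.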

-p still applies when you pass -c: provider-aware modules (such as Water or AverageCarbonIntensity) need it to pick the correct region-keyed lookups. If your custom config targets Azure, pass -p AZURE alongside -c so those modules don’t fall back to the AWS default.

Selecting a provider

If you do not pass -c, SPRUCE picks the bundled config matching the provider:

spark-submit --class com.digitalpebble.spruce.SparkJob ./target/spruce-*.jar \
  -i ./curs -o ./output -p AWS

-p defaults to AWS, so existing AWS workflows do not need to change.