With the release of Kedro v0.19, OmegaConfigLoader became the default configuration loader. It adds variable interpolation and support for more complex configurations. With it, values defined in a global configuration file can be referenced from your catalog.yml instead of being hardcoded there. Values in parameters.yml, by contrast, are not visible to the catalog, because the loader resolves plain references only within the same configuration type.
The concept is straightforward: you define shared variables in one place (e.g., globals.yml) and reference them using an interpolation syntax. This reduces redundancy and hardcoding, and makes environment-specific and runtime configuration easier to manage.
To resolve the InterpolationKeyError raised when your catalog.yml tries to reference parameters, follow these steps:
Begin by ensuring that your project is configured to use OmegaConfigLoader. From v0.19 it is the default, so this step matters mainly on earlier releases. In your project's settings.py file, configure the loader explicitly:
```python
# settings.py
from kedro.config import OmegaConfigLoader

# Setting the config loader to OmegaConfigLoader enables advanced features, including interpolation.
CONFIG_LOADER_CLASS = OmegaConfigLoader
```
This snippet ensures that Kedro uses OmegaConfigLoader. Once it is in place, you can use the interpolation syntax provided by OmegaConf, the library the loader is built on.
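Because OmegaConfigLoader is the default from v0.19 onward, the explicit setting is mostly relevant on older releases. The following is a minimal sketch for checking which case applies, assuming plain dotted version strings; `needs_explicit_loader` is a hypothetical helper name and not part of Kedro.

```python
from importlib.metadata import PackageNotFoundError, version


def needs_explicit_loader(kedro_version):
    """Return True when the version string is older than 0.19."""
    major, minor = (int(part) for part in kedro_version.split(".")[:2])
    return (major, minor) < (0, 19)


try:
    installed = version("kedro")
    print(installed, "needs explicit setting:", needs_explicit_loader(installed))
except PackageNotFoundError:
    # Nothing to check when Kedro is not installed in this environment.
    print("kedro is not installed in this environment")
```

Only the first two version components are compared, so pre-release suffixes in the patch component do not affect the result.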
Ensure your parameters are defined in the correct location:
Kedro typically expects the parameters file in either conf/base/parameters.yml or conf/local/parameters.yml. Your parameters.yml file might look like this:
```yaml
# parameters.yml
mode: air
observation_date: '2024-01-01'
```
This file contains key-value pairs and should be formatted correctly to avoid any accidental parsing errors.
A common mistake is to write ${parameters:key} or ${params:key} in the catalog. Neither is an interpolation form that OmegaConfigLoader provides: the params: prefix belongs to node inputs in pipeline definitions, not to YAML files. The loader resolves plain ${key} references only within the same configuration type, so a catalog file cannot see keys defined in parameters.yml. There are two approaches to consider, and only the second one works in the catalog:
Referencing values straight from parameters.yml in your catalog.yml looks like this, and it is what raises the InterpolationKeyError:

```yaml
# catalog_actuals.yml (does not resolve)
mode: ${parameters.mode}
observation_date: ${parameters.observation_date}
```

The catalog is loaded separately from the parameters, so there is no parameters key for OmegaConf to look up when it resolves these placeholders.
Another recommended method is to use a globals.yml file. Define your global configuration settings that are accessible across multiple configuration parts:
```yaml
# globals.yml (placed in conf/base)
mode: air
observation_date: '2024-01-01'
```
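Then, in your catalog.yml, reference these variables through the globals resolver, using the ${globals:key} form. Do not repeat them as top-level keys in the catalog: every top-level key that does not start with an underscore is treated as a dataset definition.

```yaml
# catalog_actuals.yml
dbo_vActuals:
  type: kedro_datasets.pandas.SQLQueryDataset
  # a credentials entry with the database connection is also required
  sql: >
    SELECT *
    FROM table1
    WHERE mode = ?
    AND statDate >= DATEADD(month, -25, ?)
  load_args:
    params:
      - ${globals:mode}
      - ${globals:observation_date}
```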
This approach keeps shared values in one place and relies on OmegaConfigLoader's globals resolver. The file must match the loader's globals pattern (globals.yml by default) and sit in a configuration environment directory such as conf/base.
Debugging interpolation errors in YAML files, such as the InterpolationKeyError, can be challenging since traditional breakpoints do not work in YAML files. However, here are some troubleshooting suggestions:
One way to diagnose parameter issues is to log the parameter values within your Kedro pipeline nodes. Insert logging statements at the start of your node functions to print out the loaded parameters:
```python
# Example in a Kedro node
import logging

logger = logging.getLogger(__name__)


def some_node_function(params):
    logger.info("Loaded mode: %s", params["mode"])
    logger.info("Loaded observation_date: %s", params["observation_date"])
    # proceed with node logic
```
This practice can help you verify if the parameters have been correctly assigned from the configuration files.
When running your pipeline, include the --verbose option:

```shell
kedro run --verbose
```

Where your Kedro version accepts this flag, it outputs additional debugging information, making it easier to spot a configuration file that was not read or an interpolation key that is missing. Running kedro run --help lists the options your installation supports.
Ensure that your configuration files are stored correctly:
Parameters belong in conf/base/parameters.yml or conf/local/parameters.yml, globals in conf/base/globals.yml, and the catalog file in conf/base/catalog_actuals.yml or another appropriate directory.
Even small typos, extra spaces, or indentation errors can cause the loader to fail during interpolation.
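The locations above can be checked with a short script before running the pipeline. This is a minimal sketch using only the standard library; `missing_config_files` is a hypothetical helper name, and the path list simply repeats the files named above, so adjust it to your project.

```python
from pathlib import Path

# Paths taken from the locations listed above; parameters may instead
# live in conf/local, in which case edit this list accordingly.
EXPECTED_FILES = [
    "conf/base/parameters.yml",
    "conf/base/globals.yml",
    "conf/base/catalog_actuals.yml",
]


def missing_config_files(project_root, expected=EXPECTED_FILES):
    """Return the expected config files that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run from the project root; prints nothing when every file is present.
    for rel in missing_config_files("."):
        print(f"missing: {rel}")
```

The check only confirms that the files exist. It does not validate their YAML content or indentation.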
Kedro allows you to override parameters at runtime using the CLI. For example, you might run:

```shell
kedro run --params="mode=production,observation_date=2024-02-01"
```

Runtime parameters override the values defined in parameters.yml, which you can confirm from the node logs. They reach the catalog only where an entry explicitly uses the ${runtime_params:key} resolver, so an unchanged catalog value after this command does not by itself mean that interpolation is broken.
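The effect of a --params string on configured defaults can be pictured with plain dictionaries. The sketch below only illustrates the override order; it is not Kedro's parser and skips any type conversion, and `parse_params` and `apply_overrides` are hypothetical names.

```python
def parse_params(raw):
    """Split a 'key=value,key=value' string into a dict of strings."""
    pairs = (item.split("=", 1) for item in raw.split(",") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def apply_overrides(defaults, raw_params):
    """Return a copy of defaults with any runtime values laid on top."""
    merged = dict(defaults)
    merged.update(parse_params(raw_params))
    return merged


defaults = {"mode": "air", "observation_date": "2024-01-01"}
print(apply_overrides(defaults, "mode=production"))
```

Keys that are absent from the runtime string keep their configured value, which is the behaviour to look for when checking an override in the logs.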
To better understand the differences and use cases for parameter interpolation, review the table below which contrasts the two configurations.
Aspect | Using parameters.yml | Using globals.yml |
---|---|---|
File Location | conf/base/parameters.yml | conf/base/globals.yml |
Interpolation Syntax in catalog.yml | Not available: ${parameters.mode} raises InterpolationKeyError | ${globals:mode} and ${globals:observation_date} |
Usage Context | Directly for node-level parameters | More global, accessible across multiple config files |
Override Capability | Supports runtime modifications using --params | Often combined with runtime parameters, but managed centrally |
Complexity | Simpler invocation | Requires a proper globals file and the globals: resolver prefix |
Here’s how you can structure your Kedro project's configuration to avoid hardcoding and ensure effective interpolation:
```yaml
# conf/base/globals.yml
mode: air
observation_date: '2024-01-01'
```

Place this file in your conf/base directory so that it is automatically loaded as part of the global configuration.
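```yaml
# conf/base/catalog_actuals.yml
dbo_vActuals:
  type: kedro_datasets.pandas.SQLQueryDataset
  # a credentials entry with the database connection is also required
  sql: >
    SELECT *
    FROM table1
    WHERE mode = ?
    AND statDate >= DATEADD(month, -25, ?)
  load_args:
    params:
      - ${globals:mode}
      - ${globals:observation_date}
```

Notice how the ${globals:mode} and ${globals:observation_date} references read the values from the global configuration. The catalog has no top-level mode or observation_date keys, since any top-level key without a leading underscore would be read as a dataset.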
If you need to override these values without editing configuration files, execute your pipeline with runtime parameters:

```shell
kedro run --params="mode=production,observation_date=2024-02-01"
```

Note that --params overrides parameters, not globals. For a catalog entry to pick up the runtime values, reference them through the runtime_params resolver, for example ${runtime_params:mode, ${globals:mode}}, which falls back to the global value when no runtime parameter is supplied.
In addition to setting up your configuration correctly, consider these further best practices:
When errors such as InterpolationKeyError occur without obvious reasons, adopt the following debugging measures:
Make sure that all configuration files (parameters.yml, globals.yml, and catalog.yml) are in their expected locations. Verify that there are no naming conflicts or misplaced files.
Adding logging statements inside your Kedro nodes can help identify whether the loaded configuration parameters match your expectations. This way, you can see the resolved parameters in action.
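In some setups, environment variables are used as a source for configuration values. OmegaConfigLoader reads them through the oc.env resolver, which Kedro enables only for credentials by default. If your project registers it for other configuration types, verify that a missing or conflicting environment variable is not what triggers the interpolation error.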
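As a further debugging aid, it can help to list every placeholder in a catalog file together with its resolver prefix, then compare the list against what your loader is set up to resolve. The sketch below is a plain text scan with the standard library, not a YAML parser or part of Kedro; `list_placeholders` is a hypothetical helper, and the sample catalog text is made up for illustration.

```python
import re

# Matches innermost ${...} placeholders; for nested placeholders only the
# inner one is reported.
PLACEHOLDER = re.compile(r"\$\{([^${}]+)\}")


def list_placeholders(yaml_text):
    """Return (line_number, resolver, key) for each placeholder found.

    resolver is None for a plain ${key} reference, otherwise the text
    before the first colon, e.g. "globals" in ${globals:mode}.
    """
    found = []
    for line_number, line in enumerate(yaml_text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            body = match.group(1).strip()
            resolver, sep, key = body.partition(":")
            if sep:
                found.append((line_number, resolver.strip(), key.strip()))
            else:
                found.append((line_number, None, body))
    return found


# Illustrative catalog fragment, not taken from a real project.
catalog_text = """\
dbo_vActuals:
  load_args:
    params:
      - ${globals:mode}
      - ${parameters.observation_date}
"""

for entry in list_placeholders(catalog_text):
    print(entry)
```

Entries reported with a resolver of None are plain references, which must point at a key defined within the same configuration type to resolve.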