Config Schema

Purpose

This page defines the YAML contract used by:

scripts/dsambayes.R
DSAMbayes::run_from_yaml()

It documents defaults, allowed values, and cross-field constraints enforced by the runner.

Schema processing order

The runner processes config in this order:

1. Parse YAML.
2. Coerce YAML infinity tokens (.Inf, -.Inf).
3. Apply defaults (resolve_config_defaults()).
4. Resolve relative paths against the config file directory (resolve_config_paths()).
5. Validate values and cross-field constraints (validate_config()).
6. Validate formula safety unless security.allow_unsafe_formula: true.
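
As a rough illustration of how the early stages interact, the fragment below marks which values would be touched by infinity coercion, default resolution, and path resolution. The file location `config/my_run.yaml` and the data path are hypothetical, and the comments only restate the order described above.

```yaml
# Hypothetical file location: config/my_run.yaml
schema_version: 1

data:
  # Relative path: resolved against the config file directory (config/),
  # not the working directory.
  path: ../data/sales.csv
  # format is omitted: the default is inferred from the .csv extension.

model:
  formula: y ~ x1 + x2

diagnostics:
  model_selection:
    # YAML infinity token: coerced before defaults and validation run.
    pareto_k_fail: .Inf
```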

Root sections

Key	Type/Default	Notes
`schema_version`	`1`	Must be `1`.
`data`	object	Input data path, format, date handling, dictionary metadata.
`model`	object	Formula, model type, scaling, compile behaviour.
`cre`	object	Correlated random effects settings (Mundlak).
`pooling`	object	Pooled model map and grouping settings.
`transforms`	object	Transform mode and sensitivity scenarios.
`priors`	object	Default priors plus sparse overrides.
`boundaries`	object	Parameter boundary overrides.
`time_components`	object	Managed holiday feature generation.
`fit`	object	MCMC or optimisation fit arguments.
`allocation`	object	Post-fit budget optimisation settings.
`outputs`	object	Run directory and artefact save flags.
`forecast`	object	Reserved forecast stage toggle.
`diagnostics`	object	Policy mode, identifiability, model selection, time-series selection.
`security`	object	Formula safety bypass flag.

Minimal valid config

schema_version: 1

data:
  path: data/your_data.csv
  format: csv

model:
  formula: y ~ x1 + x2

Expected outcome: the runner applies defaults for every omitted section, and validation passes provided the data file exists and every variable in the formula is present in the data.

Section reference

`schema_version`

Key	Type	Default	Rules
`schema_version`	integer	`1`	Only `1` is supported.

`data`

Key	Type	Default	Rules
`data.path`	string	none	Required. File must exist. Relative path resolves from config directory.
`data.format`	string	inferred from file extension	Must be `csv`, `rds`, or `long`.
`data.date_var`	string or null	`null`	Required for `data.format: long`. Required when holidays are enabled. Required at runtime for time-series selection.
`data.date_format`	string or null	`null`	Optional date parser format for date columns.
`data.na_action`	string	`omit`	Must be `omit` or `error` during formula/data checks.
`data.long_id_col`	string or null	`null`	Required when `data.format: long`.
`data.long_variable_col`	string or null	`null`	Required when `data.format: long`.
`data.long_value_col`	string or null	`null`	Required when `data.format: long`.
`data.dictionary_path`	string or null	`null`	Optional CSV. Must exist if provided. Relative path resolves from config directory.
`data.dictionary`	mapping	`{}`	Optional inline metadata keyed by term name. Allowed fields: `unit`, `cadence`, `source`, `transform`, `rationale`.

Long-format-specific rules:

data.long_id_col, data.long_variable_col, data.long_value_col, and data.date_var must all be set.
These four column names must be distinct.
Long data is reshaped wide before modelling and duplicate key rows are rejected.
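
A minimal sketch of a long-format `data` section that satisfies the rules above. The file path and the four column names (`week`, `region`, `variable`, `value`) are hypothetical; what matters is that all four keys are set and the names are distinct.

```yaml
data:
  path: data/long_data.csv     # hypothetical file
  format: long
  date_var: week               # required for long format
  long_id_col: region          # required for long format
  long_variable_col: variable  # required for long format
  long_value_col: value        # required for long format
```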

`model`

Key	Type	Default	Rules
`model.name`	string	config filename stem	Used in run folder slug.
`model.formula`	string	none	Required. Must parse to `y ~ ...`.
`model.type`	string	`auto`	`auto`, `blm`, `re`, `cre`, `pooled`.
`model.kpi_type`	string	`revenue`	`revenue` or `subscriptions`.
`model.scale`	boolean	`true`	Controls internal scaling before fit.
`model.force_recompile`	boolean	`false`	Forces Stan recompile when `true`.

Model type resolution rules:

auto resolves to pooled if pooling.enabled: true.
auto resolves to cre if cre.enabled: true.
auto resolves to re if formula contains bar terms (for example (1 | group)).
auto resolves to blm otherwise.
re or cre requires bar terms in formula.
blm or pooled cannot be used with bar terms.
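
To make the resolution rules concrete, here is a small sketch in which `model.type` is left at `auto`. The variable names and the grouping column `region` are hypothetical.

```yaml
model:
  type: auto
  # Quoted because the bar term contains characters that are easy to mistype in YAML.
  formula: "y ~ x1 + x2 + (1 | region)"

# pooling.enabled and cre.enabled are left at their default (false),
# so the bar term (1 | region) makes auto resolve to re.
# Setting model.type: blm with this formula would be rejected,
# because blm cannot be used with bar terms.
```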

`cre`

Key	Type	Default	Rules
`cre.enabled`	boolean	`false` (or `true` when `model.type: cre`)	Cannot be true for `model.type: blm` or `pooled`.
`cre.vars`	list of strings	`[]`	Required and non-empty when `cre.enabled: true`.
`cre.group`	string or null	`null`	Grouping column used in CRE construction.
`cre.prefix`	string	`cre_mean_`	Prefix for generated CRE mean terms.
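
A sketch of a CRE configuration under the rules in this table. The terms `tv` and `search` and the grouping column `region` are hypothetical.

```yaml
model:
  type: cre                      # cre.enabled then defaults to true
  formula: "y ~ tv + search + (1 | region)"   # cre requires bar terms

cre:
  vars: [tv, search]             # must be non-empty when CRE is enabled
  group: region
  prefix: cre_mean_              # default, shown for clarity
```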

`pooling`

Key	Type	Default	Rules
`pooling.enabled`	boolean	`false` (or `true` when `model.type: pooled`)	Cannot be true for `model.type: re` or `cre`.
`pooling.grouping_vars`	list of strings	`[]`	Required and non-empty when pooling is enabled.
`pooling.map_path`	string or null	`null`	Required when pooling is enabled. File must exist. Relative path resolves from config directory.
`pooling.map_format`	string or null	inferred from `map_path` extension	Must be `csv` or `rds`.
`pooling.min_waves`	integer or null	`null`	If set, must be positive integer.

Pooling map requirements at model-build time:

Must include a variable column.
Must include every column named in pooling.grouping_vars.
variable values must be unique.
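
A sketch of a pooled configuration together with a map file that would meet the requirements above. The term names, grouping columns, and `pooling_map.csv` are hypothetical, and the exact way the runner uses the map beyond these checks is not described here.

```yaml
model:
  type: pooled                   # pooling.enabled then defaults to true
  # No bar terms: pooled cannot be combined with them.
  formula: y ~ tv_brand_a + tv_brand_b

pooling:
  grouping_vars: [channel, brand]
  map_path: pooling_map.csv      # relative to the config file directory
  # map_format omitted: inferred from the .csv extension

# Hypothetical contents of pooling_map.csv:
#   variable,channel,brand
#   tv_brand_a,tv,a
#   tv_brand_b,tv,b
# It has a variable column, one column per grouping variable,
# and no repeated variable values.
```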

`transforms`

Key	Type	Default	Rules
`transforms.mode`	string	`fixed_formula`	Currently only `fixed_formula` is supported.
`transforms.sensitivity.enabled`	boolean	`false`	If true, requires `fit.method: optimise`.
`transforms.sensitivity.scenarios`	list	`[]`	Required and non-empty when sensitivity is enabled.

Each sensitivity scenario requires:

name (unique, non-empty, and not the reserved name base)
formula (safe formula string unless unsafe mode is enabled)
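
A sketch of a sensitivity block with two scenarios. The scenario names and formulas are hypothetical; note that `fit.method` is set to `optimise`, as sensitivity requires.

```yaml
fit:
  method: optimise               # required when sensitivity is enabled

transforms:
  mode: fixed_formula
  sensitivity:
    enabled: true
    scenarios:
      - name: log_x1             # unique, non-empty, not "base"
        formula: "y ~ log(x1) + x2"
      - name: sqrt_x1
        formula: "y ~ sqrt(x1) + x2"
```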

`priors`

Key	Type	Default	Rules
`priors.use_defaults`	boolean	`true`	Must be `true` in current runner version.
`priors.overrides`	list	`[]`	Optional sparse overrides.

Prior override row contract:

parameter (string, must exist in model prior table)
family (normal or lognormal_ms, default normal)
mean (numeric)
sd (numeric, > 0)

lognormal_ms extra constraints:

mean > 0
allowed only for noise_sd and parameters matching sd_<index>[<term>]
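
A sketch of two prior overrides following the row contract above. The parameter name `x1` is hypothetical and would have to exist in the model prior table; `noise_sd` is the parameter named in the lognormal_ms rule.

```yaml
priors:
  use_defaults: true
  overrides:
    - parameter: x1              # hypothetical coefficient name
      family: normal             # default family, shown for clarity
      mean: 0.5
      sd: 0.25                   # must be > 0
    - parameter: noise_sd
      family: lognormal_ms       # allowed for noise_sd
      mean: 1                    # must be > 0 for lognormal_ms
      sd: 0.5
```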

`boundaries`

Key	Type	Default	Rules
`boundaries.overrides`	list	`[]`	Optional parameter boundary overrides.

Boundary override row contract:

parameter (string, must exist in model boundary table)
lower (numeric, default -Inf)
upper (numeric, default Inf)
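
A sketch of boundary overrides, assuming hypothetical parameters `x1` and `x2` that exist in the model boundary table. Omitted bounds fall back to the defaults listed above.

```yaml
boundaries:
  overrides:
    - parameter: x1
      lower: 0                   # upper omitted, defaults to Inf
    - parameter: x2
      lower: -.Inf               # explicit YAML infinity token
      upper: 0
```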

`time_components`

Key	Type	Default	Rules
`time_components.enabled`	boolean	`false`	Master toggle.
`time_components.holidays.enabled`	boolean	`false`	Enables holiday feature generation.
`time_components.holidays.calendar_path`	string or null	`null`	Required when holidays are enabled. CSV or RDS. Relative path resolves from config directory.
`time_components.holidays.date_col`	string or null	`null`	Optional calendar date column override.
`time_components.holidays.label_col`	string	`holiday`	Holiday label column.
`time_components.holidays.date_format`	string or null	`null`	Optional parser format for character dates.
`time_components.holidays.week_start`	string	`monday`	One of `monday` … `sunday`.
`time_components.holidays.timezone`	string	`UTC`	Timezone used in week alignment.
`time_components.holidays.prefix`	string	`holiday_`	Prefix for generated terms.
`time_components.holidays.window_before`	integer	`0`	Must be non-negative.
`time_components.holidays.window_after`	integer	`0`	Must be non-negative.
`time_components.holidays.aggregation_rule`	string	`count`	`count` or `any`.
`time_components.holidays.overlap_policy`	string	`count_all`	`count_all` or `dedupe_label_date`.
`time_components.holidays.add_to_formula`	boolean	`true`	Auto-add generated terms to formula.
`time_components.holidays.overwrite_existing`	boolean	`false`	If false, generated name collisions abort.
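
A sketch of a holiday configuration. The calendar file `holidays.csv` and the date column `week` are hypothetical; `data.date_var` is set because it is required when holidays are enabled.

```yaml
data:
  path: data/your_data.csv
  date_var: week                 # required when holidays are enabled

time_components:
  enabled: true
  holidays:
    enabled: true
    calendar_path: holidays.csv  # CSV or RDS, relative to the config directory
    label_col: holiday           # default
    week_start: monday           # default
    window_before: 1             # non-negative integer
    window_after: 0
    aggregation_rule: any        # count or any
    add_to_formula: true         # generated holiday_ terms join the formula
```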

`fit`

Key	Type	Default	Rules
`fit.method`	string	`mcmc`	`mcmc` or `optimise`.
`fit.seed`	numeric or null	`null`	Optional scalar seed.
`fit.optimise.n_runs`	integer	`10`	Multi-start retries for `fit_map()`.
`fit.mcmc.chains`	integer	`4`	MCMC chains.
`fit.mcmc.iter`	integer	`2000`	Total iterations per chain.
`fit.mcmc.warmup`	integer	`1000`	Warmup iterations per chain.
`fit.mcmc.cores`	integer	`1`	Number of cores used to run chains in parallel.
`fit.mcmc.refresh`	integer	`0`	Stan progress refresh interval.
`fit.mcmc.parameterization.positive_priors`	string	`centered`	`centered` or `noncentered`.

Allowed keys under fit.optimise:

n_runs, iter, seed, init, algorithm, hessian, as_vector

Allowed keys under fit.mcmc:

chains, iter, warmup, thin, cores, refresh, seed, init, control, parameterization

Allowed keys under fit.mcmc.parameterization:

positive_priors
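
A sketch of an MCMC `fit` section that uses only keys from the allowed lists above. The specific values are illustrative, not recommendations.

```yaml
fit:
  method: mcmc
  seed: 123
  mcmc:
    chains: 4
    iter: 2000                   # total iterations per chain
    warmup: 1000                 # warmup iterations per chain
    cores: 4
    refresh: 100
    parameterization:
      positive_priors: noncentered
```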

`allocation`

Key	Type	Default	Rules
`allocation.enabled`	boolean	`false`	Enables budget optimisation stage.
`allocation.scenario`	string	`max_response`	`max_response` or `target_efficiency`.
`allocation.target_value`	numeric or null	`null`	Required and `> 0` for `target_efficiency`.
`allocation.n_candidates`	integer	`2000`	Must be integer `>= 10`.
`allocation.seed`	numeric or null	`null`	Optional allocator seed.
`allocation.budget.total`	numeric or null	`null`	Required and `> 0` for `max_response`. Optional for `target_efficiency`, but must be `> 0` when set.
`allocation.channels`	list	`[]`	Required and non-empty when allocation is enabled.
`allocation.reference_spend`	numeric/list or null	`null`	Optional baseline spend vector.
`allocation.currency_scale`	numeric or null	`null`	Optional positive scaling factor.
`allocation.posterior.draws`	integer	`500`	Must be positive integer when provided.
`allocation.objective.target`	string	`kpi_uplift`	`kpi_uplift` or `profit`.
`allocation.objective.value_per_kpi`	numeric or null	`null`	Required at optimisation runtime for `objective.target: profit`.
`allocation.objective.kpi_baseline`	numeric or null	`null`	Must be `> 0` when provided.
`allocation.objective.allow_relative_log_uplift`	boolean	`false`	Allows relative uplift output for log-response runs without baseline.
`allocation.objective.risk.type`	string	`mean`	`mean`, `mean_minus_sd`, or `quantile`.
`allocation.objective.risk.lambda`	numeric	`0`	Must be `>= 0` when `risk.type: mean_minus_sd`.
`allocation.objective.risk.quantile`	numeric	`0.1`	Must be in `(0, 1)`.

Channel row contract (allocation.channels[]):

term required, scalar string, unique across channels.
name optional, defaults to term, must be unique.
spend_col optional, defaults to name.
bounds.min optional, defaults to 0, must be finite and >= 0.
bounds.max optional, defaults to Inf, but operationally must be finite and >= bounds.min.
currency_col optional.
response optional mapping:
- type: identity (default)
- type: atan requires positive scale
- type: log1p requires positive scale
- type: hill requires positive k and positive n

Log-response runtime rules:

scenario: target_efficiency requires allocation.objective.kpi_baseline.
Other scenarios require kpi_baseline unless allow_relative_log_uplift: true.
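
A sketch of an allocation block for the `max_response` scenario. Channel terms, column names, and all numeric values are hypothetical. Placing `k`, `n`, and `scale` directly under `response` is an assumption based on the channel row contract above.

```yaml
allocation:
  enabled: true
  scenario: max_response
  budget:
    total: 100000                # required and > 0 for max_response
  n_candidates: 2000
  channels:
    - term: tv
      name: TV
      spend_col: tv_spend
      bounds:
        min: 0
        max: 60000               # finite, >= bounds.min
      response:
        type: hill
        k: 20000                 # assumed key placement; must be positive
        n: 1.5                   # assumed key placement; must be positive
    - term: search               # name and spend_col default from term
      bounds:
        max: 50000
      response:
        type: log1p
        scale: 1000              # assumed key placement; must be positive
  objective:
    target: profit
    value_per_kpi: 2.5           # required at runtime for target: profit
    risk:
      type: quantile
      quantile: 0.1              # must be in (0, 1)
```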

`outputs`

Path and layout keys:

Key	Type	Default	Rules
`outputs.root_dir`	string	`results`	Relative path resolves from config directory.
`outputs.run_dir`	string or null	`null`	If relative, resolves under `outputs.root_dir`.
`outputs.overwrite`	boolean	`false`	Existing run dir can be reused only when true and contents are recognised runner artefacts.
`outputs.layout`	string	`staged`	`staged` or `flat`.
`outputs.decomp_top_n`	integer	`8`	Must be positive integer.

Save toggles (all booleans):

Key	Default
`outputs.save_model_rds`	`true`
`outputs.save_posterior_rds`	`false`
`outputs.save_posterior_summary_csv`	`true`
`outputs.save_fitted_csv`	`true`
`outputs.save_observed_csv`	`true`
`outputs.save_chain_diagnostics_txt`	`true`
`outputs.save_diagnostics_report_csv`	`true`
`outputs.save_diagnostics_summary_txt`	`true`
`outputs.save_session_info_txt`	`true`
`outputs.save_transform_sensitivity_summary_csv`	`true`
`outputs.save_transform_sensitivity_parameters_csv`	`true`
`outputs.save_transform_assumptions_txt`	`true`
`outputs.save_data_dictionary_csv`	`true`
`outputs.save_allocator_csv`	`true`
`outputs.save_allocator_png`	`true`
`outputs.save_allocator_json`	`false`
`outputs.save_decomp_csv`	`true`
`outputs.save_decomp_png`	`true`
`outputs.save_spec_summary_csv`	`true`
`outputs.save_design_matrix_manifest_csv`	`true`
`outputs.save_vif_report_csv`	`true`
`outputs.save_predictor_risk_register_csv`	`true`
`outputs.save_fit_png`	`true`
`outputs.save_residuals_csv`	`true`
`outputs.save_diagnostics_png`	`true`
`outputs.save_model_selection_csv`	`true`
`outputs.save_model_selection_pointwise_csv`	`true`

Run directory precedence (highest first):

1. CLI --run-dir / run_from_yaml(..., run_dir=...)
2. outputs.run_dir
3. Timestamped directory under outputs.root_dir
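
A sketch of an `outputs` section showing how the path keys combine. The run directory name `baseline_v1` is hypothetical.

```yaml
outputs:
  root_dir: results              # relative to the config file directory
  run_dir: baseline_v1           # relative, so it resolves under results/
  overwrite: false
  layout: staged
  save_posterior_rds: true       # off by default
  save_allocator_json: true      # off by default

# With no CLI --run-dir and no run_dir argument, artefacts land in
# results/baseline_v1. Passing --run-dir would take precedence over
# outputs.run_dir; removing run_dir would fall back to a timestamped
# directory under results/.
```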

`forecast`

Key	Type	Default	Rules
`forecast.enabled`	boolean	`false`	Reserved stage toggle (`70_forecast`).

`diagnostics`

Key	Type	Default	Rules
`diagnostics.enabled`	boolean	`true`	Enables diagnostics pipeline.
`diagnostics.policy_mode`	string	`publish`	`explore`, `publish`, or `strict`.
`diagnostics.enforce_publish_gate`	boolean	`false`	If true, run aborts only when overall status is `fail`.

diagnostics.model_selection:

Key	Default	Rules
`enabled`	`true`	Toggle PSIS-LOO check pipeline.
`method`	`psis_loo`	Currently only `psis_loo` is supported.
`max_draws`	`null`	If set, must be `> 0`.
`pareto_k_warn`	`0.7`	Finite value must be in `[0, 1]`, or `Inf`.
`pareto_k_fail`	`Inf`	Finite value must be in `[0, 1]` and strictly greater than `pareto_k_warn`, or `Inf`.
`moment_match`	`false`	Boolean.
`reloo`	`false`	Boolean.
`top_n`	`10`	Must be `>= 0`.
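
A sketch of a `model_selection` block with a finite fail threshold. The values are illustrative only; the fail threshold sits inside `[0, 1]` and above the warn threshold, as the rules require.

```yaml
diagnostics:
  model_selection:
    enabled: true
    method: psis_loo
    max_draws: 1000              # must be > 0 when set
    pareto_k_warn: 0.5
    pareto_k_fail: 0.7           # finite, in [0, 1], greater than pareto_k_warn
    top_n: 10
```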

diagnostics.time_series_selection:

Key	Default	Rules
`enabled`	`false`	When true, runs refit-and-score time-series selection.
`method`	`blocked_cv`	`blocked_cv` or `leave_future_out`.
`horizon_weeks`	`13`	Positive integer.
`n_folds`	`4`	Positive integer.
`stride_weeks`	`horizon_weeks`	Positive integer.
`min_train_weeks`	`52`	Positive integer.
`save_pointwise`	`false`	Boolean.
`save_png`	`true`	Boolean.

Time-series selection constraints:

Requires fit.method: mcmc.
Not supported for pooled runs.
Requires data.date_var set and present in data.

diagnostics.identifiability:

Key	Default	Rules
`enabled`	`true`	Boolean.
`media_terms`	`[]`	Character list.
`baseline_terms`	`[]`	Character list.
`baseline_regex`	`["^t(_|$)", "^sin[0-9]", "^cos[0-9]", "^holiday_"]`	Character list of regular expressions.
`abs_corr_warn`	`0.80`	Must be in `[0, 1)`.
`abs_corr_fail`	`0.95`	Finite value must be `> abs_corr_warn` and `<= 1`, or `Inf`.
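
A sketch of an `identifiability` block with explicit term lists and tightened thresholds. The term names are hypothetical, and the two regex entries are taken from the defaults.

```yaml
diagnostics:
  identifiability:
    enabled: true
    media_terms: [tv, search]    # hypothetical media terms
    baseline_terms: [price]      # hypothetical baseline term
    baseline_regex:
      - '^holiday_'
      - '^sin[0-9]'
    abs_corr_warn: 0.7           # must be in [0, 1)
    abs_corr_fail: 0.9           # > abs_corr_warn and <= 1, or .Inf
```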

`security`

Key	Type	Default	Rules
`security.allow_unsafe_formula`	boolean	`false`	If `false`, formula safety checks are enforced.

Safe formula calls when unsafe mode is disabled:

Operators: ~, +, -, *, /, ^, :, |, (
Functions: log, exp, sqrt, atan, sin, cos, tan, I, offset, pmax, pmin, abs
Namespaced calls: dplyr::lag, dplyr::lead
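
A sketch of a formula that stays within the allow-list above, so it should pass with the default `security` setting. The variable names are hypothetical.

```yaml
model:
  # Uses only listed operators, functions, and namespaced calls.
  formula: "y ~ log(x1) + I(x2^2) + dplyr::lag(x3) + offset(z)"

security:
  allow_unsafe_formula: false    # default: safety checks are enforced
```

A call outside the list, such as a user-defined function, would be expected to fail the safety check unless `security.allow_unsafe_formula` is set to `true`.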

Recommended authoring workflow

Generate a template:

Rscript scripts/dsambayes.R init --template master --out config/my_run.yaml

Validate before running:

R_LIBS_USER="$PWD/.Rlib" \
  Rscript scripts/dsambayes.R validate --config config/my_run.yaml

Execute:

R_LIBS_USER="$PWD/.Rlib" \
  Rscript scripts/dsambayes.R run --config config/my_run.yaml