Config Schema

Purpose

This page defines the YAML contract used by:

  • scripts/dsambayes.R
  • DSAMbayes::run_from_yaml()

It documents defaults, allowed values, and cross-field constraints enforced by the runner.

Schema processing order

The runner processes config in this order:

  1. Parse YAML.
  2. Coerce YAML infinity tokens (.Inf, -.Inf).
  3. Apply defaults (resolve_config_defaults()).
  4. Resolve relative paths against the config file directory (resolve_config_paths()).
  5. Validate values and cross-field constraints (validate_config()).
  6. Validate formula safety unless security.allow_unsafe_formula: true.

Root sections

| Key | Default | Notes |
| --- | --- | --- |
| schema_version | 1 | Must be 1. |
| data | object | Input data path, format, date handling, dictionary metadata. |
| model | object | Formula, model type, scaling, compile behaviour. |
| cre | object | Correlated random effects settings (Mundlak). |
| pooling | object | Pooled model map and grouping settings. |
| transforms | object | Transform mode and sensitivity scenarios. |
| priors | object | Default priors plus sparse overrides. |
| boundaries | object | Parameter boundary overrides. |
| time_components | object | Managed holiday feature generation. |
| fit | object | MCMC or optimisation fit arguments. |
| allocation | object | Post-fit budget optimisation settings. |
| outputs | object | Run directory and artefact save flags. |
| forecast | object | Reserved forecast stage toggle. |
| diagnostics | object | Policy mode, identifiability, model selection, time-series selection. |
| security | object | Formula safety bypass flag. |

Minimal valid config

schema_version: 1

data:
  path: data/your_data.csv
  format: csv

model:
  formula: y ~ x1 + x2

Expected outcome: this config resolves with defaults for all omitted sections and passes schema validation, provided the data file exists and every formula variable is a column in the data.

Section reference

schema_version

| Key | Type | Default | Rules |
| --- | --- | --- | --- |
| schema_version | integer | 1 | Only 1 is supported. |

data

| Key | Type | Default | Rules |
| --- | --- | --- | --- |
| data.path | string | none | Required. File must exist. Relative path resolves from config directory. |
| data.format | string | inferred from file extension | Must be csv, rds, or long. |
| data.date_var | string or null | null | Required for data.format: long. Required when holidays are enabled. Required at runtime for time-series selection. |
| data.date_format | string or null | null | Optional date parser format for date columns. |
| data.na_action | string | omit | Must be omit or error during formula/data checks. |
| data.long_id_col | string or null | null | Required when data.format: long. |
| data.long_variable_col | string or null | null | Required when data.format: long. |
| data.long_value_col | string or null | null | Required when data.format: long. |
| data.dictionary_path | string or null | null | Optional CSV. Must exist if provided. Relative path resolves from config directory. |
| data.dictionary | mapping | {} | Optional inline metadata keyed by term name. Allowed fields: unit, cadence, source, transform, rationale. |

Long-format-specific rules:

  • data.long_id_col, data.long_variable_col, data.long_value_col, and data.date_var must all be set.
  • These four column names must be distinct.
  • Long data is reshaped wide before modelling and duplicate key rows are rejected.
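
A long-format data block that satisfies these rules might look like the sketch below. The file path and the four column names (week, market, variable, value) are hypothetical placeholders, not names the runner expects.

```yaml
data:
  path: data/long_panel.csv      # hypothetical file, resolved from the config directory
  format: long
  date_var: week                 # hypothetical column names; all four must be distinct
  long_id_col: market
  long_variable_col: variable
  long_value_col: value
```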

model

| Key | Type | Default | Rules |
| --- | --- | --- | --- |
| model.name | string | config filename stem | Used in run folder slug. |
| model.formula | string | none | Required. Must parse to y ~ .... |
| model.type | string | auto | auto, blm, re, cre, pooled. |
| model.kpi_type | string | revenue | revenue or subscriptions. |
| model.scale | boolean | true | Controls internal scaling before fit. |
| model.force_recompile | boolean | false | Forces Stan recompile when true. |

Model type resolution rules:

  • auto resolves to pooled if pooling.enabled: true.
  • auto resolves to cre if cre.enabled: true.
  • auto resolves to re if formula contains bar terms (for example (1 | group)).
  • auto resolves to blm otherwise.
  • re or cre requires bar terms in formula.
  • blm or pooled cannot be used with bar terms.
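
As an illustration of these rules, the fragment below leaves model.type at auto and includes a bar term in the formula. Assuming pooling.enabled and cre.enabled keep their false defaults, auto should resolve to re. The column name group is a placeholder.

```yaml
model:
  formula: y ~ x1 + x2 + (1 | group)   # "group" is a hypothetical grouping column
  type: auto                           # bar term present, so this should resolve to re
```

Setting type: blm with the same formula would be rejected, since blm cannot be combined with bar terms.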

cre

| Key | Type | Default | Rules |
| --- | --- | --- | --- |
| cre.enabled | boolean | false (or true when model.type: cre) | Cannot be true for model.type: blm or pooled. |
| cre.vars | list of strings | [] | Required and non-empty when cre.enabled: true. |
| cre.group | string or null | null | Grouping column used in CRE construction. |
| cre.prefix | string | cre_mean_ | Prefix for generated CRE mean terms. |
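
A CRE configuration consistent with the table above could be sketched as follows. The variable and grouping column names are hypothetical; the formula keeps a bar term because model.type: cre requires one.

```yaml
model:
  formula: y ~ x1 + x2 + (1 | group)   # hypothetical columns
  type: cre
cre:
  enabled: true
  vars: [x1, x2]        # must be non-empty when cre.enabled is true
  group: group          # hypothetical grouping column
  prefix: cre_mean_     # the documented default, shown for clarity
```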

pooling

| Key | Type | Default | Rules |
| --- | --- | --- | --- |
| pooling.enabled | boolean | false (or true when model.type: pooled) | Cannot be true for model.type: re or cre. |
| pooling.grouping_vars | list of strings | [] | Required and non-empty when pooling is enabled. |
| pooling.map_path | string or null | null | Required when pooling is enabled. File must exist. Relative path resolves from config directory. |
| pooling.map_format | string or null | inferred from map_path extension | Must be csv or rds. |
| pooling.min_waves | integer or null | null | If set, must be positive integer. |

Pooling map requirements at model-build time:

  • Must include a variable column.
  • Must include every column named in pooling.grouping_vars.
  • variable values must be unique.
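
A pooled configuration might be sketched as below. The map path and the grouping column brand are hypothetical; the comments outline what a matching map file would need to contain under the requirements above.

```yaml
model:
  formula: y ~ x1 + x2        # no bar terms, as required for pooled
  type: pooled
pooling:
  enabled: true
  grouping_vars: [brand]      # hypothetical grouping column
  map_path: pooling_map.csv   # hypothetical file, resolved from the config directory
  map_format: csv
# Hypothetical pooling_map.csv layout:
#   variable,brand
#   x1,brand_a
#   x2,brand_b
# One row per variable (values unique), plus one column per grouping variable.
```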

transforms

| Key | Type | Default | Rules |
| --- | --- | --- | --- |
| transforms.mode | string | fixed_formula | Currently only fixed_formula is supported. |
| transforms.sensitivity.enabled | boolean | false | If true, requires fit.method: optimise. |
| transforms.sensitivity.scenarios | list | [] | Required and non-empty when sensitivity is enabled. |

Each sensitivity scenario requires:

  • name (unique, non-empty, not base)
  • formula (safe formula string unless unsafe mode is enabled)
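
For example, a single sensitivity scenario could be declared as in the sketch below. The scenario name and the alternative formula are illustrative only; log is used because it is one of the functions permitted by the formula safety check.

```yaml
fit:
  method: optimise            # required when sensitivity is enabled
transforms:
  mode: fixed_formula
  sensitivity:
    enabled: true
    scenarios:
      - name: log_x1                    # hypothetical name; must be unique and not "base"
        formula: y ~ log(x1) + x2       # hypothetical alternative specification
```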

priors

| Key | Type | Default | Rules |
| --- | --- | --- | --- |
| priors.use_defaults | boolean | true | Must be true in current runner version. |
| priors.overrides | list | [] | Optional sparse overrides. |

Prior override row contract:

  • parameter (string, must exist in model prior table)
  • family (normal or lognormal_ms, default normal)
  • mean (numeric)
  • sd (numeric, > 0)

lognormal_ms extra constraints:

  • mean > 0
  • allowed only for noise_sd and parameters matching sd_<index>[<term>]
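
A pair of override rows following this contract might look like the sketch below. The parameter name x1 and all numeric values are placeholders; an override only applies if the parameter exists in the model prior table.

```yaml
priors:
  use_defaults: true
  overrides:
    - parameter: x1             # hypothetical coefficient name
      family: normal
      mean: 0
      sd: 1                     # must be > 0
    - parameter: noise_sd
      family: lognormal_ms      # allowed for noise_sd; requires mean > 0
      mean: 1
      sd: 0.5
```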

boundaries

| Key | Type | Default | Rules |
| --- | --- | --- | --- |
| boundaries.overrides | list | [] | Optional parameter boundary overrides. |

Boundary override row contract:

  • parameter (string, must exist in model boundary table)
  • lower (numeric, default -Inf)
  • upper (numeric, default Inf)
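
As a sketch, constraining one coefficient to be non-negative could be written as follows. The parameter name is hypothetical. The upper bound uses the YAML infinity token .Inf, which the runner coerces early in schema processing; it could equally be omitted, since Inf is the default.

```yaml
boundaries:
  overrides:
    - parameter: x1     # hypothetical; must exist in the model boundary table
      lower: 0
      upper: .Inf       # YAML infinity token
```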

time_components

| Key | Type | Default | Rules |
| --- | --- | --- | --- |
| time_components.enabled | boolean | false | Master toggle. |
| time_components.holidays.enabled | boolean | false | Enables holiday feature generation. |
| time_components.holidays.calendar_path | string or null | null | Required when holidays are enabled. CSV or RDS. Relative path resolves from config directory. |
| time_components.holidays.date_col | string or null | null | Optional calendar date column override. |
| time_components.holidays.label_col | string | holiday | Holiday label column. |
| time_components.holidays.date_format | string or null | null | Optional parser format for character dates. |
| time_components.holidays.week_start | string | monday | monday or sunday. |
| time_components.holidays.timezone | string | UTC | Timezone used in week alignment. |
| time_components.holidays.prefix | string | holiday_ | Prefix for generated terms. |
| time_components.holidays.window_before | integer | 0 | Must be non-negative. |
| time_components.holidays.window_after | integer | 0 | Must be non-negative. |
| time_components.holidays.aggregation_rule | string | count | count or any. |
| time_components.holidays.overlap_policy | string | count_all | count_all or dedupe_label_date. |
| time_components.holidays.add_to_formula | boolean | true | Auto-add generated terms to formula. |
| time_components.holidays.overwrite_existing | boolean | false | If false, generated name collisions abort. |
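
A holiday configuration could be sketched as below. The file paths and the date column name are hypothetical. Setting time_components.enabled: true alongside holidays.enabled is an assumption based on its description as the master toggle. data.date_var is set because it is required when holidays are enabled.

```yaml
data:
  path: data/your_data.csv
  date_var: week                        # hypothetical date column
time_components:
  enabled: true                         # assumed necessary as the master toggle
  holidays:
    enabled: true
    calendar_path: data/holidays.csv    # hypothetical calendar file
    label_col: holiday
    week_start: monday
    window_before: 1                    # must be non-negative
    window_after: 0
    aggregation_rule: count
    add_to_formula: true
```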

fit

| Key | Type | Default | Rules |
| --- | --- | --- | --- |
| fit.method | string | mcmc | mcmc or optimise. |
| fit.seed | numeric or null | null | Optional scalar seed. |
| fit.optimise.n_runs | integer | 10 | Multi-start retries for fit_map(). |
| fit.mcmc.chains | integer | 4 | MCMC chains. |
| fit.mcmc.iter | integer | 2000 | Total iterations per chain. |
| fit.mcmc.warmup | integer | 1000 | Warmup iterations per chain. |
| fit.mcmc.cores | integer | 1 | Parallel chains. |
| fit.mcmc.refresh | integer | 0 | Stan progress refresh interval. |
| fit.mcmc.parameterization.positive_priors | string | centered | centered or noncentered. |

Allowed keys under fit.optimise:

  • n_runs, iter, seed, init, algorithm, hessian, as_vector

Allowed keys under fit.mcmc:

  • chains, iter, warmup, thin, cores, refresh, seed, init, control, parameterization

Allowed keys under fit.mcmc.parameterization:

  • positive_priors
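
An MCMC fit block that uses only the keys listed above might look like this sketch. The seed and core count are arbitrary illustrative values.

```yaml
fit:
  method: mcmc
  seed: 123                 # arbitrary example seed
  mcmc:
    chains: 4
    iter: 2000              # total iterations per chain, including warmup
    warmup: 1000
    cores: 4                # example value; the default is 1
    refresh: 0
    parameterization:
      positive_priors: noncentered
```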

allocation

| Key | Type | Default | Rules |
| --- | --- | --- | --- |
| allocation.enabled | boolean | false | Enables budget optimisation stage. |
| allocation.scenario | string | max_response | max_response or target_efficiency. |
| allocation.target_value | numeric or null | null | Required and > 0 for target_efficiency. |
| allocation.n_candidates | integer | 2000 | Must be integer >= 10. |
| allocation.seed | numeric or null | null | Optional allocator seed. |
| allocation.budget.total | numeric or null | null | Required and > 0 for max_response. Optional > 0 for target_efficiency. |
| allocation.channels | list | [] | Required and non-empty when allocation is enabled. |
| allocation.reference_spend | numeric/list or null | null | Optional baseline spend vector. |
| allocation.currency_scale | numeric or null | null | Optional positive scaling factor. |
| allocation.posterior.draws | integer | 500 | Must be positive integer when provided. |
| allocation.objective.target | string | kpi_uplift | kpi_uplift or profit. |
| allocation.objective.value_per_kpi | numeric or null | null | Required at optimisation runtime for objective.target: profit. |
| allocation.objective.kpi_baseline | numeric or null | null | Must be > 0 when provided. |
| allocation.objective.allow_relative_log_uplift | boolean | false | Allows relative uplift output for log-response runs without baseline. |
| allocation.objective.risk.type | string | mean | mean, mean_minus_sd, or quantile. |
| allocation.objective.risk.lambda | numeric | 0 | Must be >= 0 when risk.type: mean_minus_sd. |
| allocation.objective.risk.quantile | numeric | 0.1 | Must be in (0, 1). |

Channel row contract (allocation.channels[]):

  • term required, scalar string, unique across channels.
  • name optional, defaults to term, must be unique.
  • spend_col optional, defaults to name.
  • bounds.min optional, defaults 0, must be finite and >= 0.
  • bounds.max optional, defaults Inf, but operationally must be finite and >= bounds.min.
  • currency_col optional.
  • response optional mapping:
    • type: identity (default)
    • type: atan requires positive scale
    • type: log1p requires positive scale
    • type: hill requires positive k and positive n

Log-response runtime rules:

  • scenario: target_efficiency requires allocation.objective.kpi_baseline.
  • Other scenarios require kpi_baseline unless allow_relative_log_uplift: true.
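
Putting the allocation rules together, a max_response setup with two channels might be sketched as follows. All names and numbers are hypothetical. Placing k and n directly under response, next to type, is an assumption about the key layout rather than something this page states.

```yaml
allocation:
  enabled: true
  scenario: max_response
  budget:
    total: 100000             # required and > 0 for max_response
  channels:
    - term: x1                # must be unique across channels
      name: tv                # hypothetical display name
      spend_col: tv_spend     # hypothetical spend column
      bounds:
        min: 0
        max: 60000            # must be finite in practice
      response:
        type: hill
        k: 20000              # assumed key placement; must be positive
        n: 1.5                # assumed key placement; must be positive
    - term: x2                # name defaults to term, spend_col defaults to name
      bounds:
        min: 0
        max: 60000
  objective:
    target: kpi_uplift
    risk:
      type: mean_minus_sd
      lambda: 0.5             # must be >= 0
```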

outputs

Path keys:

| Key | Type | Default | Rules |
| --- | --- | --- | --- |
| outputs.root_dir | string | results | Relative path resolves from config directory. |
| outputs.run_dir | string or null | null | If relative, resolves under outputs.root_dir. |
| outputs.overwrite | boolean | false | Existing run dir can be reused only when true and contents are recognised runner artefacts. |
| outputs.layout | string | staged | staged or flat. |
| outputs.decomp_top_n | integer | 8 | Must be positive integer. |

Save toggles (all booleans):

| Key | Default |
| --- | --- |
| outputs.save_model_rds | true |
| outputs.save_posterior_rds | false |
| outputs.save_posterior_summary_csv | true |
| outputs.save_fitted_csv | true |
| outputs.save_observed_csv | true |
| outputs.save_chain_diagnostics_txt | true |
| outputs.save_diagnostics_report_csv | true |
| outputs.save_diagnostics_summary_txt | true |
| outputs.save_session_info_txt | true |
| outputs.save_transform_sensitivity_summary_csv | true |
| outputs.save_transform_sensitivity_parameters_csv | true |
| outputs.save_transform_assumptions_txt | true |
| outputs.save_data_dictionary_csv | true |
| outputs.save_allocator_csv | true |
| outputs.save_allocator_png | true |
| outputs.save_allocator_json | false |
| outputs.save_decomp_csv | true |
| outputs.save_decomp_png | true |
| outputs.save_spec_summary_csv | true |
| outputs.save_design_matrix_manifest_csv | true |
| outputs.save_vif_report_csv | true |
| outputs.save_predictor_risk_register_csv | true |
| outputs.save_fit_png | true |
| outputs.save_residuals_csv | true |
| outputs.save_diagnostics_png | true |
| outputs.save_model_selection_csv | true |
| outputs.save_model_selection_pointwise_csv | true |

Run directory precedence:

  1. CLI --run-dir / run_from_yaml(..., run_dir=...)
  2. outputs.run_dir
  3. Timestamped directory under outputs.root_dir
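
To illustrate the path keys and the precedence list, the sketch below pins a run directory in the config. The directory name is hypothetical. With these values the run would be expected to write under results/my_run relative to the config directory, unless a run directory is passed on the command line or to run_from_yaml(), which takes precedence.

```yaml
outputs:
  root_dir: results
  run_dir: my_run             # hypothetical; relative, so it resolves under root_dir
  overwrite: false
  layout: staged
  save_posterior_rds: true    # off by default; enabled here as an example toggle
```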

forecast

| Key | Type | Default | Rules |
| --- | --- | --- | --- |
| forecast.enabled | boolean | false | Reserved stage toggle (70_forecast). |

diagnostics

| Key | Type | Default | Rules |
| --- | --- | --- | --- |
| diagnostics.enabled | boolean | true | Enables diagnostics pipeline. |
| diagnostics.policy_mode | string | publish | explore, publish, or strict. |
| diagnostics.enforce_publish_gate | boolean | false | If true, run aborts only when overall status is fail. |

diagnostics.model_selection:

| Key | Default | Rules |
| --- | --- | --- |
| enabled | true | Toggle PSIS-LOO check pipeline. |
| method | psis_loo | Currently only psis_loo is supported. |
| max_draws | null | If set, must be > 0. |
| pareto_k_warn | 0.7 | Finite value must be in [0, 1], or Inf. |
| pareto_k_fail | Inf | Finite value must be in [0, 1] and strictly greater than pareto_k_warn. |
| moment_match | false | Boolean. |
| reloo | false | Boolean. |
| top_n | 10 | Must be >= 0. |
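
As a sketch, tightening the Pareto-k thresholds could look like the fragment below. The threshold values are illustrative; the finite fail threshold is chosen to be strictly greater than the warn threshold, as the rules require.

```yaml
diagnostics:
  model_selection:
    enabled: true
    method: psis_loo
    pareto_k_warn: 0.5        # illustrative value in [0, 1]
    pareto_k_fail: 0.7        # finite, in [0, 1], and > pareto_k_warn
    top_n: 10
```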

diagnostics.time_series_selection:

| Key | Default | Rules |
| --- | --- | --- |
| enabled | false | When true, runs refit-and-score time-series selection. |
| method | blocked_cv | blocked_cv or leave_future_out. |
| horizon_weeks | 13 | Positive integer. |
| n_folds | 4 | Positive integer. |
| stride_weeks | horizon_weeks | Positive integer. |
| min_train_weeks | 52 | Positive integer. |
| save_pointwise | false | Boolean. |
| save_png | true | Boolean. |

Time-series selection constraints:

  • Requires fit.method: mcmc.
  • Not supported for pooled runs.
  • Requires data.date_var set and present in data.
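
A configuration that meets these constraints might be sketched as follows. The date column name is hypothetical, and the run is assumed to be non-pooled.

```yaml
data:
  path: data/your_data.csv
  date_var: week                  # hypothetical; must be a column in the data
fit:
  method: mcmc                    # required for time-series selection
diagnostics:
  time_series_selection:
    enabled: true
    method: leave_future_out
    horizon_weeks: 13
    n_folds: 4
    min_train_weeks: 52
```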

diagnostics.identifiability:

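| Key | Default | Rules |
| --- | --- | --- |
| enabled | true | Boolean. |
| media_terms | [] | Character list. |
| baseline_terms | [] | Character list. |
| baseline_regex | see below | Character list of regular expressions. |
| abs_corr_warn | 0.80 | Must be in [0, 1). |
| abs_corr_fail | 0.95 | Finite value must be > abs_corr_warn and <= 1, or Inf. |

Default baseline_regex: `["^t(_|$)", "^sin[0-9]", "^cos[0-9]", "^holiday_"]`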

security

| Key | Type | Default | Rules |
| --- | --- | --- | --- |
| security.allow_unsafe_formula | boolean | false | If false, formula safety checks are enforced. |

Safe formula calls when unsafe mode is disabled:

  • Operators: ~, +, -, *, /, ^, :, |, (
  • Functions: log, exp, sqrt, atan, sin, cos, tan, I, offset, pmax, pmin, abs
  • Namespaced calls: dplyr::lag, dplyr::lead

Typical workflow

  1. Generate a template:

Rscript scripts/dsambayes.R init --template master --out config/my_run.yaml

  2. Validate before running:

R_LIBS_USER="$PWD/.Rlib" \
  Rscript scripts/dsambayes.R validate --config config/my_run.yaml

  3. Execute:

R_LIBS_USER="$PWD/.Rlib" \
  Rscript scripts/dsambayes.R run --config config/my_run.yaml