Config Schema
Purpose
This page defines the YAML contract used by:
- `scripts/dsambayes.R`
- `DSAMbayes::run_from_yaml()`
It documents defaults, allowed values, and cross-field constraints enforced by the runner.
Schema processing order
The runner processes config in this order:
- Parse YAML.
- Coerce YAML infinity tokens (`.Inf`, `-.Inf`).
- Apply defaults (`resolve_config_defaults()`).
- Resolve relative paths against the config file directory (`resolve_config_paths()`).
- Validate values and cross-field constraints (`validate_config()`).
- Validate formula safety unless `security.allow_unsafe_formula: true`.
Root sections
| Key | Default | Notes |
|---|---|---|
| `schema_version` | `1` | Must be `1`. |
| `data` | object | Input data path, format, date handling, dictionary metadata. |
| `model` | object | Formula, model type, scaling, compile behaviour. |
| `cre` | object | Correlated random effects settings (Mundlak). |
| `pooling` | object | Pooled model map and grouping settings. |
| `transforms` | object | Transform mode and sensitivity scenarios. |
| `priors` | object | Default priors plus sparse overrides. |
| `boundaries` | object | Parameter boundary overrides. |
| `time_components` | object | Managed holiday feature generation. |
| `fit` | object | MCMC or optimisation fit arguments. |
| `allocation` | object | Post-fit budget optimisation settings. |
| `outputs` | object | Run directory and artefact save flags. |
| `forecast` | object | Reserved forecast stage toggle. |
| `diagnostics` | object | Policy mode, identifiability, model selection, time-series selection. |
| `security` | object | Formula safety bypass flag. |
Minimal valid config
Expected outcome: a config that sets only the required keys (`data.path` and `model.formula`) resolves with defaults for all omitted sections and passes schema validation, provided the data file exists and the formula variables exist in the data.
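
As an illustration, a minimal config might look like the sketch below. It sets only the two keys this page marks as required, plus the optional `schema_version`. The file path and column names (`kpi`, `tv`, `search`) are hypothetical placeholders, not values shipped with the package.

```yaml
# Hypothetical minimal config; path and variable names are placeholders.
schema_version: 1                  # optional, defaults to 1
data:
  path: data/weekly_sales.csv      # relative paths resolve from the config directory
model:
  formula: "kpi ~ tv + search"     # variables must exist in the data
```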
Section reference
schema_version
| Key | Type | Default | Rules |
|---|---|---|---|
| `schema_version` | integer | `1` | Only `1` is supported. |
data
| Key | Type | Default | Rules |
|---|---|---|---|
| `data.path` | string | none | Required. File must exist. Relative path resolves from config directory. |
| `data.format` | string | inferred from file extension | Must be `csv`, `rds`, or `long`. |
| `data.date_var` | string or null | `null` | Required for `data.format: long`. Required when holidays are enabled. Required at runtime for time-series selection. |
| `data.date_format` | string or null | `null` | Optional date parser format for date columns. |
| `data.na_action` | string | `omit` | Must be `omit` or `error` during formula/data checks. |
| `data.long_id_col` | string or null | `null` | Required when `data.format: long`. |
| `data.long_variable_col` | string or null | `null` | Required when `data.format: long`. |
| `data.long_value_col` | string or null | `null` | Required when `data.format: long`. |
| `data.dictionary_path` | string or null | `null` | Optional CSV. Must exist if provided. Relative path resolves from config directory. |
| `data.dictionary` | mapping | `{}` | Optional inline metadata keyed by term name. Allowed fields: `unit`, `cadence`, `source`, `transform`, `rationale`. |
Long-format-specific rules:
- `data.long_id_col`, `data.long_variable_col`, `data.long_value_col`, and `data.date_var` must all be set.
- These four column names must be distinct.
- Long data is reshaped wide before modelling and duplicate key rows are rejected.
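
The long-format rules can be sketched as a `data` block like the one below. The file path and the four column names are assumptions chosen for illustration; the only requirement taken from this page is that all four are set and distinct.

```yaml
# Hypothetical long-format input; all names below are placeholders.
data:
  path: data/weekly_long.csv
  format: long
  date_var: week                 # required for long format
  long_id_col: region            # identifies the unit each row belongs to
  long_variable_col: variable    # holds the variable names to spread wide
  long_value_col: value          # holds the values for those variables
```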
model
| Key | Type | Default | Rules |
|---|---|---|---|
| `model.name` | string | config filename stem | Used in run folder slug. |
| `model.formula` | string | none | Required. Must parse to `y ~ ...`. |
| `model.type` | string | `auto` | `auto`, `blm`, `re`, `cre`, `pooled`. |
| `model.kpi_type` | string | `revenue` | `revenue` or `subscriptions`. |
| `model.scale` | boolean | `true` | Controls internal scaling before fit. |
| `model.force_recompile` | boolean | `false` | Forces Stan recompile when `true`. |
Model type resolution rules:
- `auto` resolves to `pooled` if `pooling.enabled: true`.
- `auto` resolves to `cre` if `cre.enabled: true`.
- `auto` resolves to `re` if formula contains bar terms (for example `(1 | group)`).
- `auto` resolves to `blm` otherwise.
- `re` or `cre` requires bar terms in formula.
- `blm` or `pooled` cannot be used with bar terms.
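
To make the resolution rules concrete, here is a small sketch with hypothetical variable names. Assuming `pooling.enabled` and `cre.enabled` are left at their defaults, the bar term in the formula should cause `auto` to resolve to `re`.

```yaml
# Hypothetical example; kpi, tv, search and region are placeholder columns.
model:
  formula: "kpi ~ tv + search + (1 | region)"
  type: auto    # bar term present, pooling and cre not enabled: resolves to re
```

Removing `(1 | region)` from the same formula would, by the rules above, resolve to `blm` instead.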
cre
| Key | Type | Default | Rules |
|---|---|---|---|
| `cre.enabled` | boolean | `false` (or `true` when `model.type: cre`) | Cannot be `true` for `model.type: blm` or `pooled`. |
| `cre.vars` | list of strings | `[]` | Required and non-empty when `cre.enabled: true`. |
| `cre.group` | string or null | `null` | Grouping column used in CRE construction. |
| `cre.prefix` | string | `cre_mean_` | Prefix for generated CRE mean terms. |
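
A CRE setup consistent with the table above might look like this sketch. Column names are hypothetical. Because `model.type: cre` requires bar terms, the formula includes one.

```yaml
# Hypothetical CRE config; column names are placeholders.
model:
  formula: "kpi ~ tv + search + (1 | region)"
  type: cre               # cre.enabled then defaults to true
cre:
  vars: [tv, search]      # must be non-empty when CRE is enabled
  group: region
  prefix: cre_mean_       # default shown explicitly
```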
pooling
| Key | Type | Default | Rules |
|---|---|---|---|
| `pooling.enabled` | boolean | `false` (or `true` when `model.type: pooled`) | Cannot be `true` for `model.type: re` or `cre`. |
| `pooling.grouping_vars` | list of strings | `[]` | Required and non-empty when pooling is enabled. |
| `pooling.map_path` | string or null | `null` | Required when pooling is enabled. File must exist. Relative path resolves from config directory. |
| `pooling.map_format` | string or null | inferred from `map_path` extension | Must be `csv` or `rds`. |
| `pooling.min_waves` | integer or null | `null` | If set, must be positive integer. |
Pooling map requirements at model-build time:
- Must include a `variable` column.
- Must include every column named in `pooling.grouping_vars`.
- `variable` values must be unique.
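
The pooling requirements can be illustrated with the sketch below. The map file name, the `channel_group` column, and the variable names are assumptions; the commented CSV shows one layout that would satisfy the map requirements listed above.

```yaml
# Hypothetical pooled config; file and column names are placeholders.
model:
  formula: "kpi ~ tv + search"     # no bar terms allowed for pooled models
  type: pooled                     # pooling.enabled then defaults to true
pooling:
  grouping_vars: [channel_group]
  map_path: config/pooling_map.csv
  min_waves: 3

# Illustrative contents of config/pooling_map.csv:
#   variable,channel_group
#   tv,offline
#   search,online
```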
transforms
| Key | Type | Default | Rules |
|---|---|---|---|
| `transforms.mode` | string | `fixed_formula` | Currently only `fixed_formula` is supported. |
| `transforms.sensitivity.enabled` | boolean | `false` | If `true`, requires `fit.method: optimise`. |
| `transforms.sensitivity.scenarios` | list | `[]` | Required and non-empty when sensitivity is enabled. |
Each sensitivity scenario requires:
- `name` (unique, non-empty, not `base`)
- `formula` (safe formula string unless unsafe mode is enabled)
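
A sensitivity run might be configured as in the following sketch. Scenario names and formulas are hypothetical; they only use functions from the safe list documented under `security`.

```yaml
# Hypothetical sensitivity setup; scenario names and formulas are placeholders.
fit:
  method: optimise            # required when sensitivity is enabled
transforms:
  mode: fixed_formula
  sensitivity:
    enabled: true
    scenarios:
      - name: sqrt_tv         # must be unique and must not be "base"
        formula: "kpi ~ sqrt(tv) + search"
      - name: log_tv
        formula: "kpi ~ log(tv) + search"
```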
priors
| Key | Type | Default | Rules |
|---|---|---|---|
| `priors.use_defaults` | boolean | `true` | Must be `true` in current runner version. |
| `priors.overrides` | list | `[]` | Optional sparse overrides. |
Prior override row contract:
- `parameter` (string, must exist in model prior table)
- `family` (`normal` or `lognormal_ms`, default `normal`)
- `mean` (numeric)
- `sd` (numeric, `> 0`)
lognormal_ms extra constraints:
- `mean > 0`
- allowed only for `noise_sd` and parameters matching `sd_<index>[<term>]`
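
The override contract could be exercised as sketched below. The parameter name `tv` and all numeric values are hypothetical. `noise_sd` is used for the `lognormal_ms` row because it is one of the parameters this page allows for that family.

```yaml
# Hypothetical prior overrides; parameter names and values are placeholders.
priors:
  use_defaults: true
  overrides:
    - parameter: tv             # must exist in the model prior table
      family: normal
      mean: 0.2
      sd: 0.1                   # must be > 0
    - parameter: noise_sd
      family: lognormal_ms      # requires mean > 0
      mean: 0.5
      sd: 0.2
```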
boundaries
| Key | Type | Default | Rules |
|---|---|---|---|
| `boundaries.overrides` | list | `[]` | Optional parameter boundary overrides. |
Boundary override row contract:
- `parameter` (string, must exist in model boundary table)
- `lower` (numeric, default `-Inf`)
- `upper` (numeric, default `Inf`)
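
As a sketch, a non-negativity constraint on a single coefficient might be written as follows. The parameter name is hypothetical. The `.Inf` token relies on the infinity coercion step described in the processing order.

```yaml
# Hypothetical boundary override; the parameter name is a placeholder.
boundaries:
  overrides:
    - parameter: tv     # must exist in the model boundary table
      lower: 0
      upper: .Inf       # YAML infinity token, coerced by the runner
```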
time_components
| Key | Type | Default | Rules |
|---|---|---|---|
| `time_components.enabled` | boolean | `false` | Master toggle. |
| `time_components.holidays.enabled` | boolean | `false` | Enables holiday feature generation. |
| `time_components.holidays.calendar_path` | string or null | `null` | Required when holidays are enabled. CSV or RDS. Relative path resolves from config directory. |
| `time_components.holidays.date_col` | string or null | `null` | Optional calendar date column override. |
| `time_components.holidays.label_col` | string | `holiday` | Holiday label column. |
| `time_components.holidays.date_format` | string or null | `null` | Optional parser format for character dates. |
| `time_components.holidays.week_start` | string | `monday` | One of `monday` … `sunday`. |
| `time_components.holidays.timezone` | string | `UTC` | Timezone used in week alignment. |
| `time_components.holidays.prefix` | string | `holiday_` | Prefix for generated terms. |
| `time_components.holidays.window_before` | integer | `0` | Must be non-negative. |
| `time_components.holidays.window_after` | integer | `0` | Must be non-negative. |
| `time_components.holidays.aggregation_rule` | string | `count` | `count` or `any`. |
| `time_components.holidays.overlap_policy` | string | `count_all` | `count_all` or `dedupe_label_date`. |
| `time_components.holidays.add_to_formula` | boolean | `true` | Auto-add generated terms to formula. |
| `time_components.holidays.overwrite_existing` | boolean | `false` | If `false`, generated name collisions abort. |
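
A holiday setup might look like the sketch below. The calendar file, the date column, and the window values are assumptions. `data.date_var` is included because this page states it is required when holidays are enabled.

```yaml
# Hypothetical holiday feature config; paths and column names are placeholders.
data:
  path: data/weekly_sales.csv
  date_var: week                       # required when holidays are enabled
time_components:
  enabled: true                        # master toggle
  holidays:
    enabled: true
    calendar_path: data/holidays.csv   # required when holidays are enabled
    label_col: holiday
    week_start: monday
    window_before: 1                   # non-negative integer
    window_after: 0
    aggregation_rule: any
    add_to_formula: true
```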
fit
| Key | Type | Default | Rules |
|---|---|---|---|
| `fit.method` | string | `mcmc` | `mcmc` or `optimise`. |
| `fit.seed` | numeric or null | `null` | Optional scalar seed. |
| `fit.optimise.n_runs` | integer | `10` | Multi-start retries for `fit_map()`. |
| `fit.mcmc.chains` | integer | `4` | MCMC chains. |
| `fit.mcmc.iter` | integer | `2000` | Total iterations per chain. |
| `fit.mcmc.warmup` | integer | `1000` | Warmup iterations per chain. |
| `fit.mcmc.cores` | integer | `1` | Parallel chains. |
| `fit.mcmc.refresh` | integer | `0` | Stan progress refresh interval. |
| `fit.mcmc.parameterization.positive_priors` | string | `centered` | `centered` or `noncentered`. |
Allowed keys under `fit.optimise`:
- `n_runs`, `iter`, `seed`, `init`, `algorithm`, `hessian`, `as_vector`
Allowed keys under `fit.mcmc`:
- `chains`, `iter`, `warmup`, `thin`, `cores`, `refresh`, `seed`, `init`, `control`, `parameterization`
Allowed keys under `fit.mcmc.parameterization`:
- `positive_priors`
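
For reference, an MCMC `fit` block that uses only keys from the allowed lists above might look like this sketch. The seed and core count are arbitrary illustrative values.

```yaml
# Illustrative MCMC settings; the seed and cores values are arbitrary.
fit:
  method: mcmc
  seed: 123
  mcmc:
    chains: 4
    iter: 2000          # total iterations per chain, including warmup
    warmup: 1000
    cores: 4
    refresh: 0
    parameterization:
      positive_priors: noncentered
```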
allocation
| Key | Type | Default | Rules |
|---|---|---|---|
| `allocation.enabled` | boolean | `false` | Enables budget optimisation stage. |
| `allocation.scenario` | string | `max_response` | `max_response` or `target_efficiency`. |
| `allocation.target_value` | numeric or null | `null` | Required and `> 0` for `target_efficiency`. |
| `allocation.n_candidates` | integer | `2000` | Must be integer `>= 10`. |
| `allocation.seed` | numeric or null | `null` | Optional allocator seed. |
| `allocation.budget.total` | numeric or null | `null` | Required and `> 0` for `max_response`. Optional `> 0` for `target_efficiency`. |
| `allocation.channels` | list | `[]` | Required and non-empty when allocation is enabled. |
| `allocation.reference_spend` | numeric/list or null | `null` | Optional baseline spend vector. |
| `allocation.currency_scale` | numeric or null | `null` | Optional positive scaling factor. |
| `allocation.posterior.draws` | integer | `500` | Must be positive integer when provided. |
| `allocation.objective.target` | string | `kpi_uplift` | `kpi_uplift` or `profit`. |
| `allocation.objective.value_per_kpi` | numeric or null | `null` | Required at optimisation runtime for `objective.target: profit`. |
| `allocation.objective.kpi_baseline` | numeric or null | `null` | Must be `> 0` when provided. |
| `allocation.objective.allow_relative_log_uplift` | boolean | `false` | Allows relative uplift output for log-response runs without baseline. |
| `allocation.objective.risk.type` | string | `mean` | `mean`, `mean_minus_sd`, or `quantile`. |
| `allocation.objective.risk.lambda` | numeric | `0` | Must be `>= 0` when `risk.type: mean_minus_sd`. |
| `allocation.objective.risk.quantile` | numeric | `0.1` | Must be in `(0, 1)`. |
Channel row contract (`allocation.channels[]`):
- `term` required, scalar string, unique across channels.
- `name` optional, defaults to `term`, must be unique.
- `spend_col` optional, defaults to `name`.
- `bounds.min` optional, defaults to `0`, must be finite and `>= 0`.
- `bounds.max` optional, defaults to `Inf`, but operationally must be finite and `>= bounds.min`.
- `currency_col` optional.
- `response` optional mapping:
  - `type: identity` (default)
  - `type: atan` requires positive `scale`
  - `type: log1p` requires positive `scale`
  - `type: hill` requires positive `k` and positive `n`
Log-response runtime rules:
- `scenario: target_efficiency` requires `allocation.objective.kpi_baseline`.
- Other scenarios require `kpi_baseline` unless `allow_relative_log_uplift: true`.
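
Putting the allocation table and channel contract together, a `max_response` run might be sketched as follows. Channel terms, budget figures, and response-curve parameters are all hypothetical values chosen only to satisfy the documented constraints.

```yaml
# Hypothetical allocation config; all terms and numbers are placeholders.
allocation:
  enabled: true
  scenario: max_response
  budget:
    total: 100000            # required and > 0 for max_response
  objective:
    target: kpi_uplift
    risk:
      type: mean
  channels:
    - term: tv               # name and spend_col fall back to defaults
      bounds:
        min: 0
        max: 60000           # finite and >= bounds.min
      response:
        type: hill
        k: 20000             # must be positive
        n: 1.5               # must be positive
    - term: search
      bounds:
        min: 0
        max: 60000           # response omitted: identity is the default
```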
outputs
Path keys:
| Key | Type | Default | Rules |
|---|---|---|---|
| `outputs.root_dir` | string | `results` | Relative path resolves from config directory. |
| `outputs.run_dir` | string or null | `null` | If relative, resolves under `outputs.root_dir`. |
| `outputs.overwrite` | boolean | `false` | Existing run dir can be reused only when `true` and contents are recognised runner artefacts. |
| `outputs.layout` | string | `staged` | `staged` or `flat`. |
| `outputs.decomp_top_n` | integer | `8` | Must be positive integer. |
Save toggles (all booleans):
| Key | Default |
|---|---|
| `outputs.save_model_rds` | `true` |
| `outputs.save_posterior_rds` | `false` |
| `outputs.save_posterior_summary_csv` | `true` |
| `outputs.save_fitted_csv` | `true` |
| `outputs.save_observed_csv` | `true` |
| `outputs.save_chain_diagnostics_txt` | `true` |
| `outputs.save_diagnostics_report_csv` | `true` |
| `outputs.save_diagnostics_summary_txt` | `true` |
| `outputs.save_session_info_txt` | `true` |
| `outputs.save_transform_sensitivity_summary_csv` | `true` |
| `outputs.save_transform_sensitivity_parameters_csv` | `true` |
| `outputs.save_transform_assumptions_txt` | `true` |
| `outputs.save_data_dictionary_csv` | `true` |
| `outputs.save_allocator_csv` | `true` |
| `outputs.save_allocator_png` | `true` |
| `outputs.save_allocator_json` | `false` |
| `outputs.save_decomp_csv` | `true` |
| `outputs.save_decomp_png` | `true` |
| `outputs.save_spec_summary_csv` | `true` |
| `outputs.save_design_matrix_manifest_csv` | `true` |
| `outputs.save_vif_report_csv` | `true` |
| `outputs.save_predictor_risk_register_csv` | `true` |
| `outputs.save_fit_png` | `true` |
| `outputs.save_residuals_csv` | `true` |
| `outputs.save_diagnostics_png` | `true` |
| `outputs.save_model_selection_csv` | `true` |
| `outputs.save_model_selection_pointwise_csv` | `true` |
Run directory precedence:
- CLI `--run-dir` / `run_from_yaml(..., run_dir=...)`
- `outputs.run_dir`
- Timestamped directory under `outputs.root_dir`
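
An `outputs` block that pins the run directory and flips the two save toggles that default to `false` might look like this sketch. The directory name is hypothetical.

```yaml
# Illustrative outputs config; the run_dir name is a placeholder.
outputs:
  root_dir: results            # resolves from the config directory
  run_dir: baseline_v1         # relative, so it resolves under root_dir
  overwrite: false
  layout: staged
  decomp_top_n: 8
  save_posterior_rds: true     # default is false
  save_allocator_json: true    # default is false
```

Per the precedence list above, a run directory passed on the CLI or as a function argument would take priority over `outputs.run_dir`.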
forecast
| Key | Type | Default | Rules |
|---|---|---|---|
| `forecast.enabled` | boolean | `false` | Reserved stage toggle (`70_forecast`). |
diagnostics
| Key | Type | Default | Rules |
|---|---|---|---|
| `diagnostics.enabled` | boolean | `true` | Enables diagnostics pipeline. |
| `diagnostics.policy_mode` | string | `publish` | `explore`, `publish`, or `strict`. |
| `diagnostics.enforce_publish_gate` | boolean | `false` | If `true`, run aborts only when overall status is `fail`. |
diagnostics.model_selection:
| Key | Default | Rules |
|---|---|---|
| `enabled` | `true` | Toggle PSIS-LOO check pipeline. |
| `method` | `psis_loo` | Currently only `psis_loo` is supported. |
| `max_draws` | `null` | If set, must be `> 0`. |
| `pareto_k_warn` | `0.7` | Finite value must be in `[0, 1]`, or `Inf`. |
| `pareto_k_fail` | `Inf` | Finite value must be in `[0, 1]` and strictly greater than `pareto_k_warn`. |
| `moment_match` | `false` | Boolean. |
| `reloo` | `false` | Boolean. |
| `top_n` | `10` | Must be `>= 0`. |
diagnostics.time_series_selection:
| Key | Default | Rules |
|---|---|---|
| `enabled` | `false` | When `true`, runs refit-and-score time-series selection. |
| `method` | `blocked_cv` | `blocked_cv` or `leave_future_out`. |
| `horizon_weeks` | `13` | Positive integer. |
| `n_folds` | `4` | Positive integer. |
| `stride_weeks` | `horizon_weeks` | Positive integer. |
| `min_train_weeks` | `52` | Positive integer. |
| `save_pointwise` | `false` | Boolean. |
| `save_png` | `true` | Boolean. |
Time-series selection constraints:
- Requires `fit.method: mcmc`.
- Not supported for pooled runs.
- Requires `data.date_var` set and present in data.
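
A config that satisfies the three constraints above might be sketched as follows. The data path, date column, and formula are hypothetical. The model is deliberately non-pooled.

```yaml
# Hypothetical time-series selection setup; names are placeholders.
data:
  path: data/weekly_sales.csv
  date_var: week                 # must be set and present in the data
model:
  formula: "kpi ~ tv + search"   # non-pooled model
fit:
  method: mcmc                   # required for time-series selection
diagnostics:
  time_series_selection:
    enabled: true
    method: leave_future_out
    horizon_weeks: 13
    n_folds: 4
    min_train_weeks: 52
    save_pointwise: true
```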
diagnostics.identifiability:
| Key | Default | Rules |
|---|---|---|
| `enabled` | `true` | Boolean. |
| `media_terms` | `[]` | Character list. |
| `baseline_terms` | `[]` | Character list. |
| `baseline_regex` | `["^t(_\|$)", "^sin[0-9]", "^cos[0-9]", "^holiday_"]` | Character list of regular expressions. |
| `abs_corr_warn` | `0.80` | Must be in `[0, 1)`. |
| `abs_corr_fail` | `0.95` | Finite value must be `> abs_corr_warn` and `<= 1`, or `Inf`. |
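
As an illustration, tighter identifiability thresholds with explicit term lists might be configured like this. Term names and threshold values are hypothetical, chosen to respect the ordering rule that `abs_corr_fail` exceeds `abs_corr_warn`.

```yaml
# Hypothetical identifiability settings; terms and thresholds are placeholders.
diagnostics:
  identifiability:
    enabled: true
    media_terms: [tv, search]
    baseline_terms: [t]
    baseline_regex: ["^t(_|$)", "^holiday_"]
    abs_corr_warn: 0.70          # must be in [0, 1)
    abs_corr_fail: 0.90          # must be > abs_corr_warn and <= 1, or .Inf
```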
security
| Key | Type | Default | Rules |
|---|---|---|---|
| `security.allow_unsafe_formula` | boolean | `false` | If `false`, formula safety checks are enforced. |
Safe formula calls when unsafe mode is disabled:
- Operators: `~`, `+`, `-`, `*`, `/`, `^`, `:`, `|`, `(`
- Functions: `log`, `exp`, `sqrt`, `atan`, `sin`, `cos`, `tan`, `I`, `offset`, `pmax`, `pmin`, `abs`
- Namespaced calls: `dplyr::lag`, `dplyr::lead`
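
For example, a formula built only from the calls listed above would be expected to pass the safety check with the default setting. The variable names here are hypothetical.

```yaml
# Hypothetical formula using only calls from the safe list.
model:
  formula: "kpi ~ log(tv) + sqrt(search) + dplyr::lag(promo) + (1 | region)"
security:
  allow_unsafe_formula: false    # default; safety checks stay enforced
```

A formula calling any function outside that list would presumably be rejected unless `security.allow_unsafe_formula: true` is set.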
Recommended authoring workflow
- Generate a template:
- Validate before running:
- Execute: