Documentation for DSAMbayes v1.2.2 — a Bayesian marketing mix modelling toolkit for R, built on Stan.
DSAMbayes provides a unified interface for building, fitting, and interpreting MMM models. It supports single-market regression (BLM), multi-market hierarchical models with partial pooling, and pooled models with structured media coefficients. All model types share the same post-fit interface for posterior extraction, diagnostics, decomposition, and budget optimisation.
Key changes since v1.2.0 (see CHANGELOG.md for full details):
KPI back-transform correction — log-response models now default to the conditional-mean estimator exp(mu + sigma²/2) rather than the median exp(mu). Use log_response = "median" to retain the old behaviour. See Response Scale Semantics.
Composite hierarchy keys — hierarchical models now support composite grouping keys (e.g. market:brand).
Pooled lognormal_ms support — pooled models accept noise_sd ~ lognormal_ms(...) priors.
Strict pre-flight validation — runner-driven fits now abort on structural data-quality failures instead of warning silently.
Stan cache hardening — stale compiled models are recompiled automatically.
Subsections of DSAMbayes Documentation
Getting Started
Purpose
Onboard a new user from install to first successful DSAMbayes run.
Audience
New DSAMbayes users.
Analysts running DSAMbayes through R scripts or CLI.
# 1. Create repo-local library and cache directories
mkdir -p .Rlib .cache

# 2. Set environment variables (add to .bashrc/.zshrc for persistence)
export R_LIBS_USER="$PWD/.Rlib"
export XDG_CACHE_HOME="$PWD/.cache"

# 3. Install DSAMbayes from the local checkout
R_LIBS_USER="$PWD/.Rlib" R -q -e 'install.packages(".", repos = NULL, type = "source")'
This keeps all package libraries and Stan compilation caches inside the repo, avoiding permission issues with system library paths.
Verify the installation
1. Confirm DSAMbayes loads
R_LIBS_USER="$PWD/.Rlib" R -q -e 'library(DSAMbayes); cat("Version:", as.character(utils::packageVersion("DSAMbayes")), "\n")'
Expected: prints Version: 1.2.2 (or current version).
Installation fails
Symptom: install.packages(".", repos = NULL, type = "source") errors.
Actions:
Confirm you are in the repository root directory.
Confirm .Rlib exists and is writable: ls -la .Rlib.
Check for missing system dependencies in the error output.
Stale Stan cache
Symptom: unexpected model behaviour after updating the package.
Actions:
Clear the cache: rm -rf .cache/dsambayes.
Re-run with model.force_recompile: true in your config (or leave default false — v1.2.2 auto-detects stale caches).
Permission issues
Symptom: write failures for library, cache, or run outputs.
Actions:
Ensure .Rlib, .cache, and results/ are writable.
Keep R_LIBS_USER and XDG_CACHE_HOME set in your shell session.
Run all commands from the repository root.
Concepts
What is DSAMbayes?
DSAMbayes is an R package that fits Bayesian marketing mix models (MMM) using Stan. It provides a familiar lm()-style interface for specifying models, adds prior and boundary controls, and delegates estimation to Stan’s Hamiltonian Monte Carlo (HMC) sampler. The result is a full posterior distribution over model parameters — not just point estimates — enabling rigorous uncertainty quantification for media contribution and budget allocation decisions.
Why Bayesian MMM?
Classical OLS regression gives point estimates and confidence intervals that assume the model is correctly specified. In MMM, where media variables are collinear, sample sizes are small (often 100–200 weeks), and the functional form is uncertain, these assumptions are routinely violated.
Bayesian estimation addresses this by:
Regularisation through priors — weakly informative priors stabilise estimates when the data alone cannot separate correlated effects. This is particularly valuable for media channels with overlapping campaign timing.
Hard parameter constraints — boundary constraints (e.g. media coefficients must be non-negative) are enforced directly in the posterior, rather than post-hoc.
Full uncertainty propagation — every downstream output (fitted values, decomposition, budget allocation) carries the full posterior uncertainty, not just a point estimate ± standard error.
Principled model comparison — leave-one-out cross-validation (LOO-CV) via Pareto-smoothed importance sampling provides predictive model comparison without refitting.
Model classes
DSAMbayes supports three model classes, all sharing the same post-fit interface:
BLM (Bayesian Linear Model)
The simplest class. One market, one response variable, one set of predictors. Equivalent to lm() but with Bayesian estimation.
Use when: single-market modelling with sufficient data (typically 100+ weeks).
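The BLM pathway above can be sketched end to end. This is a minimal sketch, not the shipped example: the column names (kpi_value, m_tv, m_search, m_social, trend, seasonality) are borrowed from the hierarchical example dataset and may differ in the wide synthetic file.

```r
library(DSAMbayes)

# Assumed column names -- check names(df) against your data first
df <- read.csv("data/synthetic_dsam_example_wide_data.csv")

model <- blm(kpi_value ~ m_tv + m_search + m_social + trend + seasonality,
             data = df)
fitted_model <- fit(model)   # full HMC posterior
```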
Hierarchical
Panel data with multiple groups (e.g. markets, regions, brands). Random effects allow each group to deviate from the population average while sharing information across groups (partial pooling).
kpi ~ population_terms + (varying_terms | group)
Use when: multi-market data where you want to borrow strength across markets while allowing market-specific effects.
Pooled
Single-market data where media variables have a nested structure (e.g. campaign > channel > platform). Coefficients are pooled across labelled dimensions.
Decide — optimise_budget() translates estimates into budget allocation recommendations.
Key concepts for practitioners
Priors
A prior distribution encodes what you believe about a parameter before seeing the data. In DSAMbayes:
Default priors are normal(0, 5) — weakly informative, centred at zero.
Informative priors can be set when domain knowledge justifies it (e.g. price_index ~ normal(-0.2, 0.1) if you know price has a small negative effect).
Priors are specified using formula notation: set_prior(model, m_tv ~ normal(0, 10)).
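Putting these together, a sketch of overriding two defaults (assuming a model object is already constructed; price_index follows the informative-prior example above):

```r
# Replace the default normal(0, 5) priors for two terms
model <- set_prior(model, m_tv ~ normal(0, 10))             # wider media prior
model <- set_prior(model, price_index ~ normal(-0.2, 0.1))  # informative control prior
```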
Boundaries
Hard constraints on parameter values. The posterior density is zero outside the boundary:
set_boundary(model, m_tv > 0) forces the TV coefficient to be non-negative.
Default boundaries are unconstrained (-Inf, Inf).
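A common MMM pattern is to constrain media terms only, leaving control variables at the unconstrained default (a sketch; the model object is assumed to exist):

```r
# Non-negative media effects; trend/seasonality keep (-Inf, Inf)
model <- set_boundary(model, m_tv > 0)
model <- set_boundary(model, m_search > 0)
```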
MCMC vs MAP
MCMC (fit()) draws samples from the full posterior distribution. Slower but gives complete uncertainty quantification. Use for final reporting.
MAP (optimise()) finds the single most probable parameter vector. Much faster but gives only a point estimate. Use for rapid iteration during model development.
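Both pathways take the same model object, so switching between them during development is a one-line change (sketch):

```r
map_fit  <- optimise(model)   # fast point estimate for iteration
mcmc_fit <- fit(model)        # full posterior for final reporting
```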
Response scale
Models can operate on the original KPI scale (identity response) or the log scale:
Identity: kpi ~ ... — coefficients represent unit changes in KPI.
Log: log(kpi) ~ ... — coefficients represent multiplicative (percentage) effects.
Log-response models require careful back-transformation to the KPI scale. DSAMbayes handles this automatically via fitted_kpi(). See Response Scale Semantics.
Diagnostics
After fitting, DSAMbayes runs a battery of diagnostic checks:
Boundary monitoring — share of draws hitting constraints.
Each check produces a pass, warn, or fail status. See Diagnostics Gates.
The YAML runner
For reproducible, configuration-driven runs, DSAMbayes provides a CLI runner:
Rscript scripts/dsambayes.R run --config config/my_model.yaml
The runner reads a YAML file specifying the data, formula, priors, boundaries, fit settings, diagnostics policy, and optional budget optimisation. It writes structured artefacts (CSVs, plots, model objects) to a timestamped directory under results/. See CLI Usage and Config Schema.
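A minimal config might look like the following sketch. Per the Config Schema, omitted sections resolve to defaults; the model name and formula here are illustrative.

```yaml
schema_version: 1
data:
  path: data/synthetic_dsam_example_wide_data.csv
model:
  name: quickstart_blm
  formula: kpi_value ~ m_tv + m_search + trend
```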
This walkthrough uses the synthetic dataset shipped at data/synthetic_dsam_example_wide_data.csv. It contains weekly observations for a single market with columns for:
First-time Stan compilation takes 1–3 minutes. Subsequent runs use a cached binary. With 2 chains on synthetic data, sampling typically completes in under 2 minutes.
Step 5: Sampler diagnostics
chain_diagnostics(fitted_model)
| Metric | Good | Concern |
| --- | --- | --- |
| Max Rhat | < 1.01 | > 1.05 means chains have not converged |
| Min ESS (bulk) | > 400 | < 200 means too few effective samples |
| Divergences | 0 | Any non-zero count warrants investigation |
Step 6: Extract the posterior
post <- get_posterior(fitted_model)
post is a tibble with one row per draw containing coef (named coefficient list), yhat (fitted values), noise_sd, r2, rmse, and smape.
For a well-specified MMM on weekly data, in-sample R² above 0.85 is typical.
Step 8: Response decomposition
decomp_tbl <- decomp(fitted_model)
head(decomp_tbl)
Shows each term’s contribution (coefficient × design-matrix column) to the predicted KPI at each time point — the foundation for media contribution and ROI reporting.
Step 9: MAP for rapid iteration
During development, use MAP for fast point estimates:
Understanding of random-effects / mixed-model concepts.
Dataset
This walkthrough uses data/synthetic_dsam_example_hierarchical_data.csv — a panel dataset with weekly observations across multiple markets. Key columns:
Response: kpi_value — weekly KPI per market.
Group: market — market identifier.
Media: m_tv, m_search, m_social — media exposure variables.
Controls: trend, seasonality, brand_metric.
Date: date — weekly date index.
library(DSAMbayes)
panel_df <- read.csv("data/synthetic_dsam_example_hierarchical_data.csv")
table(panel_df$market)  # Check group counts
Step 1: Construct the hierarchical model
The (term | group) syntax tells DSAMbayes to fit random effects. Terms inside the parentheses get group-specific deviations from the population mean:
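Following the kpi ~ population_terms + (varying_terms | group) pattern, a construction might look like this sketch (whether media terms also belong in the population part depends on your specification):

```r
model <- blm(
  kpi_value ~ trend + seasonality + (m_tv + m_search + m_social | market),
  data = panel_df
)
```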
Boundaries apply to the population-level coefficients.
Step 3: (Optional) Add CRE / Mundlak correction
If you suspect that group-level spending patterns are correlated with unobserved market characteristics (e.g. high-spend markets also have higher baseline demand), CRE controls for this:
This adds cre_mean_m_tv, cre_mean_m_search, cre_mean_m_social as fixed effects — the group-level means of each media variable. The within-group coefficients then represent purely temporal variation, controlling for between-group confounding.
Hierarchical models are slower than BLM — expect 10–30 minutes depending on group count and data size. First-time Stan compilation of the hierarchical template adds 2–3 minutes.
Step 5: Check diagnostics
chain_diagnostics(fitted_model)
Pay special attention to Rhat and ESS for sd_* parameters (group-level standard deviations), which are often harder to estimate than population coefficients.
Step 6: Extract the posterior
post <- get_posterior(fitted_model)
For hierarchical models, coefficient draws from get_posterior() return vectors (one value per group) rather than scalars. The population-level (fixed-effect) estimates are averaged across groups.
Step 7: Group-level results
Fitted values and decomposition are returned per group:
# Fitted values — one row per observation, grouped by market
fit_tbl <- fitted(fitted_model)
head(fit_tbl)

# Decomposition — per-group predictor contributions
decomp_tbl <- decomp(fitted_model)
Step 8: Budget optimisation (population level)
Budget optimisation uses population-level (fixed-effect) beta draws, not group-specific totals:
# See Budget Optimisation docs for full scenario specification
result <- optimise_budget(fitted_model, scenario = my_scenario)
Key differences from BLM
| Aspect | BLM | Hierarchical |
| --- | --- | --- |
| Data structure | Single market | Panel (multiple groups) |
| Coefficient draws | Scalars | Vectors (one per group) |
| Fit time | 2–5 min | 10–30 min |
| Decomposition | Direct | May fail gracefully for formulas containing bar terms |
| Forest/prior-posterior plots | Direct | Group-averaged population estimates |
| Stan template | bayes_lm_updater_revised.stan | general_hierarchical.stan (templated per group count) |
Common pitfalls
| Pitfall | Symptom | Fix |
| --- | --- | --- |
| Too few groups | Weak partial pooling; group SDs poorly estimated | Need 4+ groups for meaningful hierarchical structure |
Run from YAML — reproducible hierarchical runs via the runner
Quickstart (YAML Runner)
Goal
Complete one reproducible DSAMbayes runner execution from validation to artefact inspection, then load the fitted model in R to explore the results interactively.
How long does first-time Stan compilation take?
1–3 minutes on most machines. Subsequent runs use a cached binary and start sampling immediately. If compilation seems stuck, check your C++ toolchain — see Install and Setup.
Do I need to set R_LIBS_USER every time?
Yes, unless you add it to your shell profile (.bashrc, .zshrc, or equivalent). The repo-local .Rlib path keeps DSAMbayes and its dependencies isolated from your system R library.
Can I use renv instead of .Rlib?
Yes. The repository includes a renv.lock file. Run renv::restore() to install exact dependency versions. See the renv section in Install and Setup.
Modelling
How many weeks of data do I need?
There is no hard minimum, but as a rough guide:
BLM: 100+ weeks for a model with 10–15 predictors. Below 80 weeks, most media effects will be prior-driven.
Hierarchical: 80+ weeks per group, ideally with 4+ groups for meaningful partial pooling.
More data is always better. Short series with many predictors will lean heavily on priors.
Should I use identity or log response?
Identity (kpi ~ ...) when the KPI is naturally additive and variance is roughly constant across levels. Coefficients represent absolute unit changes.
Log (log(kpi) ~ ...) when the KPI is strictly positive, variance scales with level, or you want multiplicative (percentage) effects. Common for revenue and sales.
If unsure, fit both and compare diagnostics.
How many MCMC iterations do I need?
The defaults (iter = 2000, warmup = 1000, chains = 4) are a reasonable starting point. Check diagnostics after fitting:
Rhat < 1.01 and ESS > 400 → iterations are sufficient.
Rhat > 1.05 or ESS < 200 → increase iter and warmup (e.g. double both).
For rapid iteration during development, use MAP estimation (optimise()) instead of MCMC.
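Assuming fit() exposes the defaults as arguments of the same names (iter, warmup, chains — an assumption based on the stated defaults), doubling them looks like:

```r
# Double iterations and warmup when Rhat/ESS flag convergence problems
fitted_model <- fit(model, iter = 4000, warmup = 2000, chains = 4)
```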
When should I set boundaries on media coefficients?
Set m_channel > 0 when you are confident that additional media exposure cannot decrease the KPI. This is the most common boundary specification in MMM. Do not set boundaries on control variables (trend, seasonality, price) unless you have a clear structural reason — see the Minimal-Prior Policy.
When should I use CRE (Mundlak)?
Use CRE when fitting a hierarchical model where:
Time-varying regressors (e.g. media spend) have group-level means correlated with unobserved group heterogeneity.
You want to separate within-group (temporal) effects from between-group (cross-sectional) effects.
CRE adds group-mean terms to the population formula. See CRE / Mundlak.
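Using set_cre() (documented under Model Classes), adding CRE terms is a one-liner; the variable names below are from the hierarchical example dataset:

```r
# Adds cre_mean_m_tv / cre_mean_m_search / cre_mean_m_social as fixed effects
model <- set_cre(model, vars = c("m_tv", "m_search", "m_social"))
```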
What does scale = TRUE do?
It standardises the response and predictors (centre and divide by SD) before passing them to Stan. This improves sampler efficiency by putting all coefficients on a comparable scale. Post-fit, get_posterior() back-transforms coefficients to the original data scale automatically. Leave it on (the default) unless you have a specific reason to disable it.
Runner and outputs
How long does a typical run take?
| Model type | Fit method | Data size | Typical time |
| --- | --- | --- | --- |
| BLM | MCMC (2 chains, 2000 iter) | 150 weeks | 2–5 minutes |
| BLM | MAP (10 starts) | 150 weeks | 10–30 seconds |
| Hierarchical | MCMC (4 chains, 2000 iter) | 150 weeks × 5 groups | 10–30 minutes |
| Pooled | MCMC (4 chains, 2000 iter) | 150 weeks | 5–15 minutes |
First-time Stan compilation adds 1–3 minutes.
What is the difference between validate and run?
validate checks config structure, data paths, formula validity, and cross-field constraints — without compiling or fitting Stan models. Use it as a pre-run gate.
run does everything validate does, then compiles, fits, runs diagnostics, and writes artefacts.
Always validate before run when you change config or data.
Where do outputs go?
By default, under results/<timestamp>_<model_name>/ with numbered stage folders. See Output Artefacts for the full contract.
How do I compare two model runs?
Use compare_runs() in R or compare 50_model_selection/loo_summary.csv files manually. See Compare Runs.
Diagnostics
What does “Pareto-k > 0.7” mean?
It means the PSIS-LOO approximation is unreliable for that observation — the observation is highly influential. A few amber (0.5–0.7) points are normal. Red (> 0.7) points warrant investigation. See Model Selection Plots.
My diagnostics say “warn” — should I worry?
It depends on the policy mode:
explore — warnings are expected during development. Continue iterating.
publish — review warnings before sharing results. Most warns are acceptable if you understand the cause.
How does budget optimisation work?
It generates feasible spend allocations within channel bounds, evaluates each against the posterior, and selects the allocation that maximises the chosen objective (KPI uplift or profit). It is a Monte Carlo search, not an analytical optimiser. See Budget Optimisation.
Can I use budget optimisation with MAP-fitted models?
Yes, but the results will be based on a single point estimate rather than the full posterior distribution. Uncertainty intervals will not be meaningful.
Runner
Purpose
Document CLI and YAML runner contracts for reproducible DSAMbayes runs.
Audience
Users operating DSAMbayes through scripts/dsambayes.R.
Engineers maintaining runner config and artefact contracts.
Expected outcome: this resolves with defaults for all omitted sections and passes schema validation if the file exists and formula variables exist in data.
Section reference
schema_version
| Key | Type | Default | Rules |
| --- | --- | --- | --- |
| schema_version | integer | 1 | Only 1 is supported. |
data
| Key | Type | Default | Rules |
| --- | --- | --- | --- |
| data.path | string | none | Required. File must exist. Relative path resolves from config directory. |
| data.format | string | inferred from file extension | Must be csv, rds, or long. |
| data.date_var | string or null | null | Required for data.format: long. Required when holidays are enabled. Required at runtime for time-series selection. |
| data.date_format | string or null | null | Optional date parser format for date columns. |
| data.na_action | string | omit | Must be omit or error during formula/data checks. |
| data.long_id_col | string or null | null | Required when data.format: long. |
| data.long_variable_col | string or null | null | Required when data.format: long. |
| data.long_value_col | string or null | null | Required when data.format: long. |
| data.dictionary_path | string or null | null | Optional CSV. Must exist if provided. Relative path resolves from config directory. |
| data.dictionary | mapping | {} | Optional inline metadata keyed by term name. Allowed fields: unit, cadence, source, transform, rationale. |
Long-format-specific rules:
data.long_id_col, data.long_variable_col, data.long_value_col, and data.date_var must all be set.
These four column names must be distinct.
Long data is reshaped wide before modelling and duplicate key rows are rejected.
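A long-format data section satisfying the rules above might look like this sketch (the path and column names are hypothetical):

```yaml
data:
  path: data/panel_long.csv
  format: long
  date_var: date
  long_id_col: market
  long_variable_col: variable
  long_value_col: value
```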
model
| Key | Type | Default | Rules |
| --- | --- | --- | --- |
| model.name | string | config filename stem | Used in run folder slug. |
| model.formula | string | none | Required. Must parse to y ~ .... |
| model.type | string | auto | auto, blm, re, cre, pooled. |
| model.kpi_type | string | revenue | revenue or subscriptions. |
| model.scale | boolean | true | Controls internal scaling before fit. |
| model.force_recompile | boolean | false | Forces Stan recompile when true. |
Model type resolution rules:
auto resolves to pooled if pooling.enabled: true.
auto resolves to cre if cre.enabled: true.
auto resolves to re if formula contains bar terms (for example (1 | group)).
auto resolves to blm otherwise.
re or cre requires bar terms in formula.
blm or pooled cannot be used with bar terms.
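For example, the following sketch needs no explicit type: auto resolves to re because the formula contains a bar term (formula illustrative).

```yaml
model:
  formula: kpi_value ~ trend + (m_tv | market)
  type: auto   # resolves to re
```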
cre
| Key | Type | Default | Rules |
| --- | --- | --- | --- |
| cre.enabled | boolean | false (or true when model.type: cre) | Cannot be true for model.type: blm or pooled. |
| cre.vars | list of strings | [] | Required and non-empty when cre.enabled: true. |
| cre.group | string or null | null | Grouping column used in CRE construction. |
| cre.prefix | string | cre_mean_ | Prefix for generated CRE mean terms. |
pooling
| Key | Type | Default | Rules |
| --- | --- | --- | --- |
| pooling.enabled | boolean | false (or true when model.type: pooled) | Cannot be true for model.type: re or cre. |
| pooling.grouping_vars | list of strings | [] | Required and non-empty when pooling is enabled. |
| pooling.map_path | string or null | null | Required when pooling is enabled. File must exist. Relative path resolves from config directory. |
| pooling.map_format | string or null | inferred from map_path extension | Must be csv or rds. |
| pooling.min_waves | integer or null | null | If set, must be positive integer. |
Pooling map requirements at model-build time:
Must include a variable column.
Must include every column named in pooling.grouping_vars.
variable values must be unique.
transforms
| Key | Type | Default | Rules |
| --- | --- | --- | --- |
| transforms.mode | string | fixed_formula | Currently only fixed_formula is supported. |
| transforms.sensitivity.enabled | boolean | false | If true, requires fit.method: optimise. |
| transforms.sensitivity.scenarios | list | [] | Required and non-empty when sensitivity is enabled. |
Each sensitivity scenario requires:
name (unique, non-empty, not base)
formula (safe formula string unless unsafe mode is enabled)
priors
| Key | Type | Default | Rules |
| --- | --- | --- | --- |
| priors.use_defaults | boolean | true | Must be true in current runner version. |
| priors.overrides | list | [] | Optional sparse overrides. |
Prior override row contract:
parameter (string, must exist in model prior table)
family (normal or lognormal_ms, default normal)
mean (numeric)
sd (numeric, > 0)
lognormal_ms extra constraints:
mean > 0
allowed only for noise_sd and parameters matching sd_<index>[<term>]
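A priors section exercising both families under the contract above (parameter names and values illustrative):

```yaml
priors:
  use_defaults: true
  overrides:
    - parameter: m_tv
      family: normal
      mean: 0
      sd: 10
    - parameter: noise_sd
      family: lognormal_ms   # mean must be > 0 for this family
      mean: 1.5
      sd: 0.5
```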
boundaries
| Key | Type | Default | Rules |
| --- | --- | --- | --- |
| boundaries.overrides | list | [] | Optional parameter boundary overrides. |
Boundary override row contract:
parameter (string, must exist in model boundary table)
lower (numeric, default -Inf)
upper (numeric, default Inf)
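A boundaries section following this contract (parameter name illustrative; omitted bounds keep their defaults):

```yaml
boundaries:
  overrides:
    - parameter: m_tv
      lower: 0    # upper omitted, defaults to Inf
```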
time_components
| Key | Type | Default | Rules |
| --- | --- | --- | --- |
| time_components.enabled | boolean | false | Master toggle. |
| time_components.holidays.enabled | boolean | false | Enables holiday feature generation. |
| time_components.holidays.calendar_path | string or null | null | Required when holidays are enabled. CSV or RDS. Relative path resolves from config directory. |
Includes DSAMbayes version, schema version, model/fit metadata, and sessionInfo().
10_pre_run
| File | Controlled by | Written when | Notes |
| --- | --- | --- | --- |
| transform_assumptions.txt | outputs.save_transform_assumptions_txt | flag is true | Written even if transform sensitivity scenarios are disabled. |
| transform_sensitivity_summary.csv | outputs.save_transform_sensitivity_summary_csv | sensitivity object exists with rows | Requires transforms.sensitivity.enabled: true and successful scenario execution. |
| transform_sensitivity_parameters.csv | outputs.save_transform_sensitivity_parameters_csv | sensitivity object exists with rows | Parameter means/SD by scenario. |
| dropped_groups.csv | none | groups dropped by pooling.min_waves filter | Written only when sparse groups are excluded. |
| holiday_feature_manifest.csv | none | managed holidays enabled and features generated | Documents generated holiday terms and active-week counts. |
| design_matrix_manifest.csv | outputs.save_design_matrix_manifest_csv | flag is true and manifest non-empty | Per-term design metadata. |
| data_dictionary.csv | outputs.save_data_dictionary_csv | flag is true and dictionary table non-empty | Merges inline YAML metadata and optional CSV dictionary metadata. |
| spec_summary.csv | outputs.save_spec_summary_csv | flag is true and table available | Single-row model/spec summary. |
| vif_report.csv | outputs.save_vif_report_csv | flag is true and predictors available | VIF diagnostics for non-intercept predictors. |
20_model_fit
| File | Controlled by | Written when | Notes |
| --- | --- | --- | --- |
| model.rds | outputs.save_model_rds | flag is true | Fitted model object. |
| posterior.rds | outputs.save_posterior_rds | flag is true and MCMC fit | Raw posterior object for MCMC runs only. |
| fit_metrics_by_group.csv | implicit | fitted summary is computed | Written when any of save_fitted_csv, save_fit_png, save_residuals_csv, save_diagnostics_png is true. |
| fit_timeseries.png | outputs.save_fit_png | flag is true and ggplot2 installed | Observed vs fitted over time. |
| fit_scatter.png | outputs.save_fit_png | flag is true and ggplot2 installed | Observed vs fitted scatter. |
30_post_run
| File | Controlled by | Written when | Notes |
| --- | --- | --- | --- |
| observed.csv | outputs.save_observed_csv | flag is true | Observed response on model response scale. |
| observed_kpi.csv | outputs.save_observed_csv | flag is true and response scale is log | KPI-scale observed values (exp) with conversion_method = point_exp. |
| fitted.csv | outputs.save_fitted_csv | flag is true | Fitted summaries on model response scale. |
| fitted_kpi.csv | outputs.save_fitted_csv | flag is true and response scale is log | KPI-scale fitted summaries (exp). |
| posterior_summary.csv | outputs.save_posterior_summary_csv | flag is true and MCMC fit | Posterior summaries for coefficients and scalar diagnostics. |
| optimisation_runs.csv | none | fit.method: optimise | All optimisation starts. |
| optimisation_best.csv | none | fit.method: optimise | Best run by RMSE. |
Implementation note:
decomp_predictor_impact.csv, decomp_predictor_impact.png, decomp_timeseries.csv, and decomp_timeseries.png are present in stage mapping and config flags, but are not currently invoked by write_run_artifacts() in the active pipeline.
40_diagnostics
| File | Controlled by | Written when | Notes |
| --- | --- | --- | --- |
| chain_diagnostics.txt | outputs.save_chain_diagnostics_txt | flag is true and MCMC fit | Chain diagnostics text output. |
| diagnostics_report.csv | outputs.save_diagnostics_report_csv | flag is true and diagnostics object exists | One row per diagnostic check. |
| diagnostics_summary.txt | outputs.save_diagnostics_summary_txt | flag is true and diagnostics object exists | Counts by status and overall status. |
| residuals.csv | outputs.save_residuals_csv | flag is true and fitted summary is computed | Residual table on response scale. |
| residuals_timeseries.png | outputs.save_diagnostics_png | flag is true and ggplot2 installed | Residuals over time. |
| residuals_vs_fitted.png | outputs.save_diagnostics_png | flag is true and ggplot2 installed | Residuals vs fitted. |
| residuals_hist.png | outputs.save_diagnostics_png | flag is true and ggplot2 installed | Residual histogram. |
| residuals_acf.png | outputs.save_diagnostics_png | flag is true and ggplot2 installed | Residual autocorrelation plot. |
| residual_diagnostics.csv | none | diagnostics residual checks available | Ljung-Box / ACF check outputs. |
| residuals_latent.csv | none | diagnostics latent residuals available | Latent residual series from diagnostics object. |
| residuals_latent_acf.png | outputs.save_diagnostics_png | latent residuals available and ggplot2 installed | Latent residual ACF plot. |
| boundary_hits.csv | none | boundary-hit table available | Boundary-hit rates per parameter. |
| boundary_hits.png | outputs.save_diagnostics_png | boundary-hit table available and ggplot2 installed | Boundary-hit visualisation. |
| within_variation.csv | none | within-variation table available | Within-variation diagnostics for hierarchical terms. |
| within_variation.png | outputs.save_diagnostics_png | within-variation table available and ggplot2 installed | |
Decision-layer budget allocation, objectives, risk scoring, and response transforms
Subsections of Modelling
Model Classes
Purpose
DSAMbayes provides three model classes for Bayesian marketing mix modelling. Each class targets a different data structure and pooling strategy. This page describes the constructor pathways, fit support, and practical limitations of each class so that an operator can select the appropriate model for a given dataset.
Class summary
| Class | S3 class chain | Constructor | Data structure | Grouping | Typical use case |
| --- | --- | --- | --- | --- | --- |
| BLM | blm | blm(formula, data) | Single market/brand | None | One-market regression with full prior and boundary control |
| Hierarchical | hierarchical, blm | blm(formula, data) with (term \| group) syntax | Panel (long format) | Random effects by group | Multi-market models sharing strength across groups |
| Pooled | pooled, blm | pool(blm_obj, grouping_vars, map) | Single market | Structured coefficient pooling via dimension map | Single-market models with media coefficients pooled across labelled dimensions |
blm() dispatches on the first argument. When passed a formula, it creates a blm object with default priors and boundaries. When passed an lm object, it creates a bayes_lm_updater whose priors are initialised from the OLS coefficient estimates and standard errors.
Terms to the left of | become random slopes; the variable to the right defines the grouping factor. Multiple grouping terms are supported.
CRE / Mundlak extension
For correlated random effects, call set_cre() after construction:
model <- set_cre(model, vars = c("m_tv", "m_search"))
This augments the population formula with group-mean terms (cre_mean_*) and updates priors and boundaries accordingly. See CRE / Mundlak for details.
Fit support
| Method | Function | Backend |
| --- | --- | --- |
| MCMC | fit(model, ...) | rstan::sampling() |
| MAP | fit_map(model, n_runs, ...) | rstan::optimizing() (repeated starts) |
Post-fit accessors
Same as BLM. Coefficient draws from get_posterior() return vectors (one value per group) rather than scalars. Budget optimisation uses the population-level (fixed-effect) coefficient draws from the beta parameter.
Limitations
Stan template compilation uses a templated source (general_hierarchical.stan) rendered per number of groups and parameterisation mode. First compilation is slow; subsequent runs use a cached binary.
Response decomposition via model.matrix() may fail for formulas containing | syntax. The runner wraps this in tryCatch and skips gracefully.
Posterior forest and prior-vs-posterior plots average group-specific draws to produce a single population-level estimate.
Offset support in the hierarchical Stan template is handled via stats::model.offset() within build_hierarchical_frame_data().
Pooled (pooled)
Construction
The pooled class is created by converting an existing BLM object with pool():
The map is a data frame with a variable column mapping formula terms to pooling dimension labels. Priors and boundaries are reset to defaults when pool() is called.
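As a sketch (the channel dimension and its labels are hypothetical), a map and conversion might look like:

```r
# Map each media term to a pooling dimension label via the variable column
map <- data.frame(
  variable = c("m_tv", "m_search", "m_social"),
  channel  = c("offline", "online", "online")
)

# blm_model is an existing BLM object; argument names assumed from the
# pool(blm_obj, grouping_vars, map) signature
pooled_model <- pool(blm_model, grouping_vars = "channel", map = map)
```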
Fit support
| Method | Function | Backend |
| --- | --- | --- |
| MCMC | fit(model, ...) | rstan::sampling() |
MAP fitting (fit_map) is not currently implemented for pooled models.
Post-fit accessors
Same as BLM. The design matrix is split into base terms (intercept + non-pooled) and media terms (pooled). The Stan template uses a per-dimension coefficient structure.
Limitations
MAP fitting is not available.
extract_stan_design_matrix() may return a zero-row matrix, which causes VIF computation to be skipped.
The pooled Stan cache key includes sorted grouping variable names to avoid collisions between different pooling configurations.
Time-series cross-validation is not supported for pooled models (rejected by config validation).
Class selection guide
| Scenario | Recommended class | Rationale |
| --- | --- | --- |
| Single market, sufficient data | BLM | Simplest pathway; full accessor and optimisation support |
| Single market, OLS baseline available | BLM via blm(lm_obj, data) | Priors initialised from OLS; Bayesian updating |
| Multi-market panel | Hierarchical | Partial pooling shares strength across markets |
| Multi-market panel with confounding concerns | Hierarchical + CRE | Mundlak terms control for between-group confounding |
| Single market with structured media dimensions | Pooled | Coefficient pooling across labelled media categories |
Fit method selection
| Criterion | MCMC (fit) | MAP (fit_map) |
| --- | --- | --- |
| Full posterior | Yes | No (point estimate only) |
| Credible intervals | Yes | Approximate via repeated starts |
| Diagnostics (Rhat, ESS, divergences) | Yes | Not applicable |
| LOO-CV / model selection | Yes | Not supported |
| Speed | Minutes to hours | Seconds to minutes |
| Budget optimisation | Full posterior-based | Point-estimate-based |
For production runs where diagnostics and uncertainty quantification matter, MCMC is the recommended fit method. MAP is useful for rapid iteration during model development.
DSAMbayes model objects (blm, hierarchical, pooled) are mutable S3 lists that progress through a well-defined sequence of states. Understanding these states helps avoid calling post-fit accessors on an unfitted object, or forgetting to compile before fitting.
This page defines how DSAMbayes specifies, defaults, overrides, and scales coefficient priors and parameter boundaries for all model classes. It covers the prior schema, supported families, default-generation logic, YAML override contract, and the interaction between priors, boundaries, and the scale=TRUE pathway.
Prior schema
Each model object carries a .prior tibble with one row per parameter. The columns are:
| Column | Type | Meaning |
| --- | --- | --- |
| parameter | character | Parameter name (matches design-matrix column or special name) |
| description | character | Human-readable label |
| distribution | call | R distribution call, e.g. normal(0, 5) |
| is_default | logical | Whether the row was generated by default_prior() |
Supported prior families
| Family | Stan encoding | Use case |
| --- | --- | --- |
| normal(mean, sd) | Default (prior_family_noise_sd = 0) | Coefficient priors (location–scale) |
| lognormal_ms(mean, sd) | Encoded as prior_family_noise_sd = 1 with log-transformed parameters | noise_sd prior when positive support is desired |
All coefficient priors use normal(). The lognormal_ms family is available only for the noise_sd parameter and is parameterised by the mean and standard deviation on the original (non-log) scale; DSAMbayes converts these internally to log-space parameters.
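Assuming the internal conversion is the standard lognormal moment-matching (the page states only that a mean and standard deviation on the original scale are converted to log-space parameters; the package's internal function is not shown here), the arithmetic can be sketched as:

```r
# Moment-match a lognormal to a given original-scale mean (m) and sd (s):
#   sigma^2 = log(1 + s^2 / m^2),  mu = log(m) - sigma^2 / 2
# (illustrative sketch, not DSAMbayes source code)
lognormal_ms_to_log <- function(m, s) {
  sigma2 <- log(1 + s^2 / m^2)
  c(meanlog = log(m) - sigma2 / 2, sdlog = sqrt(sigma2))
}
lognormal_ms_to_log(1, 0.5)  # log-space parameters for mean 1, sd 0.5
```

With these parameters, exp(meanlog + sdlog^2 / 2) recovers the original-scale mean, which is what makes the mean/sd parameterisation interpretable.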
Default prior generation
BLM and hierarchical (population terms)
default_prior.blm() calls standard_prior_terms(), which produces normal(0, 5) for each population-formula term (intercept and slope terms) plus a noise_sd entry.
Hierarchical (group-level standard deviations)
default_prior.hierarchical() additionally generates sd_<idx>[<term>] rows for each group factor. The prior standard deviation is set to the between-group standard deviation of the response, rounded to two decimal places.
BLM from lm (Bayesian updating)
default_prior.bayes_lm_updater() initialises coefficient priors from the OLS point estimates (mean) and standard errors (sd), enabling informative Bayesian updating.
Pooled
default_prior.pooled() uses the BLM defaults for non-pooled terms (intercept, base regressors, noise_sd) and normal(0, 5) for each dimension-level pooled coefficient.
Boundary schema
Each model object carries a .boundaries tibble with one row per parameter:
| Column | Type | Meaning |
| --- | --- | --- |
| parameter | character | Parameter name |
| description | character | Human-readable label |
| boundary | list-column | List with $lower and $upper (numeric scalars) |
| is_default | logical | Whether the row was generated by default_boundary() |
Default boundaries are lower = -Inf, upper = Inf for all terms. No sign constraints are imposed by default.
Each override replaces the distribution call for the named parameter with normal(mean, sd). Overrides are sparse: only the listed parameters are changed; all other parameters keep their defaults.
When use_defaults: false, the default prior table is not generated. This is not recommended for typical use.
Each override replaces the boundary entry for the named parameter. YAML infinity tokens (.Inf, -.Inf) are coerced during config resolution.
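As an illustration only, a sparse override block might look like the following; the exact YAML key names and value syntax are assumptions, not taken from this page:

```yaml
# Hypothetical shape of sparse prior and boundary overrides
priors:
  overrides:
    tv: normal(0.5, 0.25)          # only listed parameters change
boundaries:
  overrides:
    tv: {lower: 0, upper: .Inf}    # .Inf / -.Inf coerced during config resolution
```

All parameters not listed keep their default prior and boundary rows.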
Scale semantics (scale = TRUE)
When model.scale: true (the default), the response and predictors are standardised before Stan fitting. This affects both priors and boundaries.
Coefficient prior scaling
Prior standard deviations for slope terms are scaled by the ratio sx / sy, and the intercept prior by 1 / sy. The noise_sd prior standard deviation is divided by sy (the response standard deviation) so that it remains interpretable in the scaled space.
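A worked arithmetic example of the slope rule above (plain R, no package code; the numbers are illustrative):

```r
sy <- 200   # response standard deviation
sx <- 50    # predictor standard deviation
slope_prior_sd <- 4              # specified on the original data scale
slope_prior_sd * sx / sy         # prior sd in the standardised space: 1
```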
Boundary scaling
Zero boundaries (0) are invariant under scaling.
Infinite boundaries (±Inf) are invariant under scaling.
Finite non-zero boundaries for slope terms are scaled using scale_boundary_for_parameter(), which applies the same sx / sy ratio used for slope priors.
If a finite non-zero boundary is specified for a parameter without a matching scale factor in the design matrix, DSAMbayes aborts with a validation error.
Practical implication
Users specify priors and boundaries on the original (unscaled) data scale. DSAMbayes converts them internally before passing data to Stan. Post-fit, coefficient draws are back-transformed to the original scale by get_posterior().
Interaction with model classes
| Behaviour | BLM | Hierarchical | Pooled |
| --- | --- | --- | --- |
| Default priors | normal(0, 5) per term | Population: same as BLM; group SD: data-derived | Non-pooled: BLM defaults; pooled: normal(0, 5) per dimension |
The recommended operating profile for MMM is documented in Minimal-Prior Policy. The policy keeps priors weak by default and uses hard constraints only when there is structural business knowledge.
Cross-references
Model Classes — constructor and fit support per class
Do override when the domain mechanism is stable and defensible.
Do not override only to improve one run's fit metrics.
Do not add bounds if the sign can plausibly flip under promotion, pricing, or substitution effects.
Review checklist
Are overrides fewer than the number of major business assumptions?
Is each bound tied to a concrete causal rationale?
Did diagnostics indicate a real identifiability problem before tightening?
Response Scale Semantics
Purpose
DSAMbayes models can operate on an identity (level) or log response scale. This page defines how response scale is detected, stored, and used for post-fit reporting, so that operators understand which scale their outputs are on and how KPI-scale conversions work.
Response scale detection
Response scale is determined at construction time by detect_response_scale(), which inspects the left-hand side of the formula:
| Formula LHS | Detected transform | Response scale label |
| --- | --- | --- |
| kpi ~ ... | identity | response_level |
| log(kpi) ~ ... | log | response_log |
The detected value is stored in two model-object fields:
.response_transform — "identity" or "log". Describes the mathematical transform applied to the response before modelling.
.response_scale — "identity" or "log". Used as a label when reporting whether outputs are on the model scale or the KPI scale.
Both fields are set by the constructor and confirmed by pre_flight_checks().
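A minimal sketch of LHS-based detection (illustrative only; not the source of detect_response_scale()):

```r
# Inspect the left-hand side of a model formula for a log() wrapper
detect_scale_sketch <- function(formula) {
  lhs <- formula[[2]]  # the LHS expression of the formula
  if (is.call(lhs) && identical(lhs[[1]], as.name("log"))) "log" else "identity"
}
detect_scale_sketch(log(kpi) ~ media)  # "log"
detect_scale_sketch(kpi ~ media)       # "identity"
```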
Model scale vs KPI scale
| Concept | Identity response | Log response |
| --- | --- | --- |
| Model scale | Raw KPI units | Log of KPI units |
| KPI scale | Same as model scale | exp() of model scale |
| Coefficient interpretation | Unit change in KPI per unit change in predictor | Approximate percentage change in KPI per unit change in predictor |
For identity-response models, model scale and KPI scale are identical. For log-response models, fitted values and residuals on the model scale are in log units and must be exponentiated to obtain KPI-scale values.
Post-fit accessors and scale behaviour
fitted() — model scale
fitted() returns predicted values on the model scale. For identity-response models this is the KPI scale. For log-response models this is the log scale.
fit_tbl <- fitted(model)  # fit_tbl$fitted is on model scale
fitted_kpi() — KPI scale
fitted_kpi() applies the inverse transform draw-wise before summarising. For log-response models the default conversion (since v1.2.2) uses the conditional-mean estimator exp(mu + sigma²/2), where mu is the model-scale prediction and sigma² the log-scale noise variance.
This is the bias-corrected back-transform that accounts for the log-normal variance term. The previous behaviour (v1.2.0) used the simpler exp(mu) estimator, which corresponds to the conditional median on the KPI scale. To retain that behaviour, pass log_response = "median":
# Default (v1.2.2): conditional mean, bias-corrected
kpi_tbl <- fitted_kpi(model)
# Explicit median: equivalent to pre-v1.2.2 behaviour
kpi_tbl <- fitted_kpi(model, log_response = "median")
The output includes source_response_scale (the model’s response scale), response_scale = "kpi", and conversion_method ("conditional_mean" or "point_exp") to label the result.
observed() — model scale
observed() returns the observed response on the model scale after unscaling (if scale=TRUE).
observed_kpi() — KPI scale
observed_kpi() returns the observed response on the KPI scale. For log-response models, this applies exp() to the model-scale observed values.
to_kpi_scale() helper
The internal function to_kpi_scale(x, response_scale) implements the conversion:
If response_scale == "log": returns exp(x).
Otherwise: returns x unchanged.
This function is used consistently by fitted_kpi(), observed_kpi(), and runner artefact writers.
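The described behaviour amounts to a one-line conversion; a sketch:

```r
# Sketch of the stated to_kpi_scale() behaviour (not the package source)
to_kpi_scale_sketch <- function(x, response_scale) {
  if (identical(response_scale, "log")) exp(x) else x
}
to_kpi_scale_sketch(c(0, 1), "log")       # exp() applied element-wise
to_kpi_scale_sketch(c(0, 1), "identity")  # returned unchanged
```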
Runner artefact scale conventions
Runner artefact writers use the response scale metadata to determine which scale to report:
| Artefact | Scale | Notes |
| --- | --- | --- |
| fitted.csv | Model scale | Direct output from fitted() |
| observed.csv | Model scale | Direct output from observed() |
| posterior_summary.csv | Model scale | Coefficient summaries on model scale |
| Fit time series plot | KPI scale | Uses fitted_kpi() and observed_kpi() for visual comparison |
| Fit scatter plot | KPI scale | Same as fit time series |
| Diagnostics (residuals) | Model scale | Residuals computed on model scale |
| Budget optimisation outputs | KPI scale | Response curves and allocations reported on KPI scale |
Interaction with scale = TRUE
The scale flag and response scale are orthogonal:
scale = TRUE standardises predictors and response by centring and dividing by standard deviation before Stan fitting. Coefficients and fitted values are back-transformed to the original scale by get_posterior().
Response scale determines whether the original scale is levels (identity) or logs (log).
Both transformations compose: a log-response model with scale=TRUE first takes the log of the response (via the formula), then standardises the logged values. Post-fit, draws are first unscaled, then (for KPI-scale outputs) exponentiated.
Jensen’s inequality and draw-wise conversion
When converting log-scale posterior draws to KPI scale, DSAMbayes applies exp() to each draw individually before computing summaries (mean, median, credible intervals). This is the correct Bayesian approach because:
E[exp(X)] ≠ exp(E[X]) when X has non-zero variance (Jensen’s inequality).
Draw-wise conversion preserves the full posterior distribution on the KPI scale.
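A small simulation makes the difference concrete (plain R, independent of any package code):

```r
set.seed(42)
draws <- rnorm(1e5, mean = 2, sd = 0.5)  # posterior draws on the log scale
mean(exp(draws))  # draw-wise conversion: near exp(2 + 0.5^2 / 2) ≈ 8.37
exp(mean(draws))  # naive conversion:     near exp(2)             ≈ 7.39
```

The gap between the two numbers is exactly the Jensen's-inequality bias that draw-wise conversion avoids.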
The correlated random effects (CRE) pathway, implemented as a Mundlak device, augments hierarchical DSAMbayes models with group-mean terms. This separates within-group variation from between-group variation for selected regressors, reducing confounding bias when group-level means are correlated with the random effects.
When to use CRE
Use CRE when:
The model is hierarchical (panel data with (term | group) syntax).
Time-varying regressors (e.g. media spend) have group-level means that may be correlated with the group intercept or slope.
You want to decompose effects into within-group (temporal) and between-group (cross-sectional) components.
Do not use CRE when:
The model is BLM or pooled (CRE requires hierarchical class).
The panel has only one group (no between-group variation exists).
All regressors of interest are time-invariant (CRE mean terms would be constant).
Construction
CRE is applied after model construction via set_cre():
1. Resolves the grouping variable. If the formula has one group factor, it is used automatically. If multiple group factors exist, the group argument must be specified explicitly.
2. Generates group-mean column names. For each variable in vars, a mean-term column is named cre_mean_<variable> (configurable via prefix).
3. Augments the data. apply_cre_data() computes group-level means of each CRE variable and joins them back to the panel data as new columns.
4. Updates the formula. The generated mean terms are appended to the population formula as fixed effects.
5. Extends priors and boundaries. Default prior and boundary entries are added for each new mean term, matching the existing prior schema.
The runner calls set_cre() during model construction if cre.enabled: true.
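A hypothetical usage sketch of the steps above (the constructor call is shown schematically; variable and column names are illustrative):

```r
# Hierarchical panel model with one group factor (market)
model <- hierarchical(kpi ~ tv + digital + (1 | market), data = panel)

# Add Mundlak mean terms for the two media regressors;
# `group` can be omitted because the formula has a single group factor
model <- set_cre(model, vars = c("tv", "digital"))

# The data now carries cre_mean_tv and cre_mean_digital, and the
# population formula includes them as fixed effects
```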
Mundlak decomposition
For a regressor $x_{gt}$ (group $g$, time $t$), the Mundlak device decomposes the effect into:
Within-group effect: the coefficient on $x_{gt}$ in the population formula captures temporal variation after conditioning on the group mean.
Between-group effect: the coefficient on $\bar{x}_g$ (the CRE mean term) captures cross-sectional variation in group-level averages.
The original coefficient on $x_{gt}$ in a standard random-effects model conflates both sources. Adding $\bar{x}_g$ as a fixed effect separates them.
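In equation form, the augmented model can be sketched as follows (notation follows this page; $u_g$ denotes the group random effect):

$$y_{gt} = \alpha + \beta_w\, x_{gt} + \beta_b\, \bar{x}_g + u_g + \varepsilon_{gt}$$

where $\beta_w$ is the within-group effect on $x_{gt}$ and $\beta_b$ is the coefficient on the CRE mean term $\bar{x}_g$.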
Validation and identification warnings
Input validation
set_cre() validates:
The model is hierarchical (aborts for BLM or pooled).
All vars are present in the data and are numeric.
The group variable exists in the formula’s group factors.
No CRE mean terms appear in random-slope blocks (would cause double-counting).
Identification warnings
warn_cre_identification() checks two conditions after CRE setup:
More CRE variables than groups. If length(vars) > n_groups, between-effect estimates may be weakly identified. The function emits a warning.
Near-zero within-group variation. For each CRE variable, the within-group residual ($x_{gt} - \bar{x}_g$) standard deviation is checked. If it is effectively zero, within-effect identification is weak. The function emits a per-variable warning.
Zero-variance CRE mean terms
If a CRE mean term has zero variance across all observations (possible when the underlying variable has identical group means), calculate_scaling_terms() in R/scale.R will abort when scale=TRUE. The error message identifies the constant CRE columns and suggests using model.type: re (without CRE) or model.scale: false as workarounds.
Panel assumptions
Balanced panels are not required. apply_cre_data() computes group means using dplyr::group_by() and mean(), which handles unequal group sizes.
Missing values in CRE variables are excluded from the group-mean calculation (na.rm = TRUE).
Group-mean recomputation. CRE mean columns are recomputed each time apply_cre_data() is called, including during prep_data_for_fit.hierarchical(). Existing CRE mean columns are dropped and regenerated to prevent stale values.
Decomposition and reporting
CRE mean terms appear as ordinary fixed-effect terms in the population formula. This means:
Posterior summary includes CRE mean-term coefficients alongside other population coefficients.
Response decomposition via decomp() attributes fitted-value contributions to CRE mean terms separately from their within-group counterparts.
Plots (posterior forest, prior-vs-posterior) include CRE mean terms.
Interpretation note: the CRE mean-term coefficient represents the between-group effect conditional on the within-group variation. It does not represent the total effect of the underlying variable.
DSAMbayes provides managed time-component generation through the time_components config section. When enabled, the runner deterministically generates holiday feature columns from a calendar file and optionally appends them to the model formula. This page defines the configuration contract, generation logic, naming conventions, and audit properties.
Overview
Time components in DSAMbayes cover:
Holidays — deterministic weekly indicator features derived from an external calendar file.
Trend and seasonality — specified directly in the model formula (e.g. t_scaled, sin52_1, cos52_1). These are not generated by the time-components system; they are user-supplied columns in the data.
The time_components system is responsible only for holiday feature generation.
YAML configuration
time_components:
  enabled: true
  holidays:
    enabled: true
    calendar_path: data/holidays.csv
    date_col: null          # auto-detected: date, ds, or event_date
    label_col: holiday
    date_format: null       # null = ISO 8601; or e.g. "%d/%m/%Y"
    week_start: monday
    timezone: UTC
    prefix: holiday_
    window_before: 0
    window_after: 0
    aggregation_rule: count       # count | any
    overlap_policy: count_all     # count_all | dedupe_label_date
    add_to_formula: true
    overwrite_existing: false
Key definitions
| Key | Default | Description |
| --- | --- | --- |
| enabled | false | Master toggle for the time-components system |
| holidays.enabled | false | Toggle for holiday feature generation |
| holidays.calendar_path | null | Path to the holiday calendar CSV (resolved relative to the config file) |
| holidays.date_col | null | Date column in the calendar; auto-detected from date, ds, or event_date |
| holidays.label_col | holiday | Column containing holiday event labels |
| holidays.date_format | null | Date parse format; null assumes ISO 8601 |
| holidays.week_start | monday | Day-of-week anchor for weekly aggregation |
| holidays.timezone | UTC | Timezone used when parsing POSIX date-time inputs |
| holidays.prefix | holiday_ | Prefix prepended to generated feature column names |
| holidays.window_before | 0 | Days before each event date to include in the holiday window |
| holidays.window_after | 0 | Days after each event date to include in the holiday window |
| holidays.aggregation_rule | count | Weekly aggregation: count sums event-days per week; any produces a binary indicator |
| holidays.overlap_policy | count_all | Overlap handling: count_all counts every event-day; dedupe_label_date deduplicates per label and date |
| holidays.add_to_formula | true | Whether generated holiday terms are appended to the model formula automatically |
| holidays.overwrite_existing | false | Whether existing columns with matching names are overwritten |
Calendar file contract
The holiday calendar is a CSV (or data frame) with at minimum:
| Column | Required | Content |
| --- | --- | --- |
| Date column | Yes | Daily event dates (one row per event occurrence) |
| Label column | Yes | Human-readable event name (e.g. Christmas, Black Friday) |
Date column detection
If date_col is null, the system tries column names in order: date, ds, event_date. If none is found, validation aborts.
Label normalisation
Holiday labels are normalised to lowercase, alphanumeric-plus-underscore form via normalise_holiday_label(). For example:
Black Friday → black_friday
New Year's Day → new_year_s_day
Empty labels → unnamed
The generated feature column name is {prefix}{normalised_label}, e.g. holiday_black_friday.
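A sketch of the stated rule (illustrative only; not the source of normalise_holiday_label()):

```r
normalise_label_sketch <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)    # collapse non-alphanumeric runs to "_"
  x <- gsub("^_+|_+$", "", x)        # trim leading/trailing underscores
  ifelse(nchar(x) == 0, "unnamed", x)
}
normalise_label_sketch("Black Friday")    # "black_friday"
normalise_label_sketch("New Year's Day")  # "new_year_s_day"
```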
Generation pipeline
The runner calls build_weekly_holiday_features() with the following steps:
1. Parse and validate the calendar. validate_holiday_calendar() checks column presence, date parsing, and label completeness.
2. Expand holiday windows. expand_holiday_windows() replicates each event row across the [event_date - window_before, event_date + window_after] range.
3. Align to weekly index. Each expanded event-day is mapped to its containing week using week_floor_date() with the configured week_start.
4. Aggregate per week. Events are counted per week per feature. Under aggregation_rule: any, counts are collapsed to binary (0/1). Under overlap_policy: dedupe_label_date, duplicate label-date pairs within a week are removed before counting.
5. Join to model data. The generated feature matrix is left-joined to the model data by the date column. Weeks with no events receive zero.
6. Append to formula. If add_to_formula: true, generated feature columns are appended as additive terms to the population formula.
Weekly anchoring
All weekly alignment uses week_floor_date(), which computes the most recent occurrence of week_start on or before each date. The model data’s date column must contain week-start-aligned dates; normalise_weekly_index() validates this and aborts if dates are not aligned.
Calendar dates are parsed using the configured timezone (default UTC).
If the calendar contains POSIXt values, they are coerced to Date in the configured timezone.
Character dates are parsed as ISO 8601 by default, or using date_format if specified.
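Week flooring with a Monday anchor can be sketched in base R (illustrative; not the package's week_floor_date()):

```r
# Floor a date to the most recent occurrence of the anchor weekday
week_floor_sketch <- function(d, week_start = 1L) {  # 1 = Monday (ISO day number)
  d <- as.Date(d)
  d - (as.integer(format(d, "%u")) - week_start) %% 7
}
week_floor_sketch("2024-01-03")  # "2024-01-01", the preceding Monday
```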
Generated-term audit contract
Generated holiday terms are tracked for downstream diagnostics and reporting:
The list of generated term names is stored in model$.runner_time_components$generated_terms.
The identifiability gate in R/diagnostics_report.R uses this list to auto-detect baseline terms (via detect_baseline_terms()), so generated holiday terms are included in baseline-media correlation checks without requiring explicit configuration.
Feature naming collision
If two different holiday labels normalise to the same feature name, build_weekly_holiday_features() aborts with a collision error. Ensure calendar labels are distinct after normalisation.
Interaction with existing data columns
If overwrite_existing: false (default), the runner aborts if any generated column name already exists in the data.
If overwrite_existing: true, existing columns with matching names are replaced by the generated features.
Practical guidance
Start with aggregation_rule: count to capture multi-day holiday effects (e.g. a holiday spanning two days in one week produces a count of 2).
Use window_before and window_after for events with known anticipation or lingering effects (e.g. window_before: 7 for pre-Christmas shopping).
Use aggregation_rule: any when you want binary holiday indicators regardless of how many event-days fall in a week.
Check generated terms in the resolved config (config.resolved.yaml) and posterior summary to confirm which holidays entered the model.
DSAMbayes runs a deterministic diagnostics framework after model fitting. Each diagnostic check produces a pass, warn, or fail status. The policy mode controls how lenient or strict the thresholds are. This page defines the check taxonomy, threshold tables, policy modes, identifiability gate, and the overall status aggregation rule.
Policy modes
The diagnostics framework supports three policy modes, configured via diagnostics.policy_mode in YAML:
| Mode | Intent | Threshold behaviour |
| --- | --- | --- |
| explore | Rapid iteration during model development | Relaxed fail thresholds; many checks can only warn, not fail |
| publish | Default production mode for shareable outputs | Balanced thresholds; condition-number fail is downgraded to warn |
| strict | Audit-grade gating for release candidates | Tightest thresholds; rank deficit fails rather than warns |
The mode is resolved by diagnostics_policy_thresholds(mode) in R/diagnostics_report.R.
In explore mode, fail thresholds are substantially relaxed (e.g. rhat_fail = 1.10, ess_bulk_fail = 50). In strict mode, warn thresholds match publish fail thresholds.
P1 residual checks
| Check ID | Metric | Direction | Warn | Fail |
| --- | --- | --- | --- | --- |
| resid_ljung_box_p | resid_lb_p | Higher is better | 0.05 | 0.01 |
| resid_acf_max | resid_acf_max | Lower is better | 0.20 | 0.40 |
Mode adjustments for residual checks
| Mode | resid_lb_p warn | resid_lb_p fail | resid_acf warn | resid_acf fail |
| --- | --- | --- | --- | --- |
| explore | 0.05 | 0.00 (cannot fail) | 0.20 | ∞ (cannot fail) |
| publish | 0.05 | 0.01 | 0.20 | 0.40 |
| strict | 0.10 | 0.05 | 0.15 | 0.30 |
P1 boundary hit check
| Check ID | Metric | Direction | Warn | Fail |
| --- | --- | --- | --- | --- |
| boundary_hit_fraction | boundary_hit_frac | Lower is better | 0.05 | 0.20 |
In explore mode, boundary hits cannot fail. In strict mode, thresholds tighten to warn > 0.02, fail > 0.10.
P1 within-group variation check
| Check ID | Metric | Direction | Warn | Fail |
| --- | --- | --- | --- | --- |
| within_var_ratio | within_var_min_ratio | Higher is better | 0.10 | 0.05 |
This check applies to hierarchical models and flags groups where within-group variation is extremely low relative to between-group variation. In explore mode, the fail threshold is zero (cannot fail).
Identifiability gate
The identifiability gate measures the maximum absolute correlation between baseline terms and media terms in the design matrix. It is configured via diagnostics.identifiability in YAML:
DSAMbayes provides a decision-layer budget optimisation engine that operates on fitted model posteriors. Given a channel scenario with spend bounds, response-transform specifications, and an objective function, the engine searches for the allocation that maximises the chosen objective while respecting channel-level constraints. This page defines the inputs, objectives, risk scoring, response-scale handling, and output structure.
Overview
Budget optimisation is separate from parameter estimation. It takes a fitted model and a scenario specification, then:
Extracts posterior coefficient draws for the scenario’s channel terms.
Generates feasible candidate allocations within channel bounds that sum to the total budget.
Evaluates each candidate across all posterior draws to obtain a distribution of KPI outcomes.
Ranks candidates by the configured objective and risk scoring function.
Returns the best allocation, channel-level summaries, response curves, and impact breakdowns.
optimize_budget() is also available as an American-English alias.
Scenario specification
The scenario is a structured list with the following top-level keys:
channels
A list of channel definitions, each containing:
| Key | Required | Default | Description |
| --- | --- | --- | --- |
| term | Yes | — | Model formula term name for this channel |
| name | No | Same as term | Human-readable channel label |
| spend_col | No | Same as name | Data column used for reference spend lookup |
| bounds.min | No | 0 | Minimum allowed spend for this channel |
| bounds.max | No | Inf | Maximum allowed spend for this channel |
| response | No | {type: "identity"} | Response transform specification |
| currency_col | No | null | Data column for currency-unit conversion |
Channel names and terms must be unique across the scenario.
budget_total
Total budget to allocate across all channels. All feasible allocations sum to this value.
reference_spend
Optional named list of per-channel reference spend values. If not provided, reference spend is estimated from the mean of the spend_col in the model’s original data.
objective
Defines the optimisation target and risk scoring:
| Key | Values | Description |
| --- | --- | --- |
| target | kpi_uplift, profit | What to maximise |
| value_per_kpi | numeric (required for profit) | Currency value of one KPI unit |
| risk.type | mean, mean_minus_sd, quantile | Risk scoring function |
| risk.lambda | numeric ≥ 0 (for mean_minus_sd) | Penalty weight on posterior standard deviation |
| risk.quantile | (0, 1) (for quantile) | Quantile level for pessimistic scoring |
Response transforms
Each channel can specify a response transform that maps raw spend to the transformed value used in the linear predictor. Supported types:
| Type | Formula | Parameters |
| --- | --- | --- |
| identity | spend | None |
| atan | atan(spend / scale) | scale (positive scalar) |
| log1p | log(1 + spend / scale) | scale (positive scalar) |
| hill | spend^n / (spend^n + k^n) | k (half-saturation), n (shape) |
The response transform is applied within response_transform_value() and determines the shape of the channel’s response curve.
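The transform shapes can be reproduced directly from the formulas above (plain R; these sketches are not the internal response_transform_value() implementation):

```r
hill_t  <- function(spend, k, n)  spend^n / (spend^n + k^n)
atan_t  <- function(spend, scale) atan(spend / scale)
log1p_t <- function(spend, scale) log1p(spend / scale)

hill_t(50, k = 50, n = 2)  # 0.5: half-saturation at spend == k for any n
```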
Objective functions
kpi_uplift
Maximises the expected change in KPI relative to the reference allocation. The metric for each candidate is:
where $\Delta\text{spend} = \text{candidate total} - \text{reference total}$.
Risk-aware scoring
The risk scoring function determines how the distribution of objective draws is summarised into a single score for ranking candidates:
| Risk type | Score formula | Use case |
| --- | --- | --- |
| mean | $\bar{m}$ | Risk-neutral; maximises expected value |
| mean_minus_sd | $\bar{m} - \lambda \cdot \sigma$ | Penalises uncertainty; higher $\lambda$ is more conservative |
| quantile | $Q_\alpha(m)$ | Optimises the $\alpha$-quantile; directly targets worst-case outcomes |
Coefficient extraction
BLM and pooled models
Coefficient draws are extracted via get_posterior() and indexed by the scenario’s channel terms.
Hierarchical models
For hierarchical MCMC models, the population-level (fixed-effect) beta draws are extracted directly from the Stan posterior. If the model was fitted with scale=TRUE, draws are back-transformed to the original scale before optimisation. This ensures that optimisation operates on the population effect rather than group-specific random-effect totals.
Draw thinning
If max_draws is specified, a random subsample of posterior draws is used for computational efficiency. The subsampling uses the configured seed for reproducibility.
Response-scale handling
Budget optimisation handles both identity and log response scales:
Identity response: $\Delta\text{KPI}$ is the difference in linear-predictor draws between candidate and reference allocations.
Log response: $\Delta\text{KPI}$ is computed via kpi_delta_from_link_levels(), which correctly accounts for the exponential back-transformation. If kpi_baseline is available, the delta is expressed in absolute KPI units; otherwise, it is expressed as a relative change.
The delta_kpi_from_link() and kpi_delta_from_link_levels() functions ensure Jensen-safe conversions by operating draw-wise.
Feasible allocation generation
sample_feasible_allocation() generates random allocations that:
Respect per-channel lower bounds.
Respect per-channel upper bounds.
Sum exactly to budget_total.
Allocation is performed by distributing remaining budget (after lower bounds) using exponential random weights, iteratively filling channels until the budget is exhausted. project_to_budget() ensures exact budget equality via proportional adjustment.
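The generation scheme described above can be sketched as follows (assumed names; the package's sample_feasible_allocation() and project_to_budget() may differ in detail):

```r
sample_alloc_sketch <- function(lower, upper, budget_total) {
  stopifnot(sum(lower) <= budget_total)     # lower bounds must be feasible
  w <- rexp(length(lower))                  # exponential random weights
  alloc <- lower + (budget_total - sum(lower)) * w / sum(w)
  # A full implementation would also clip at `upper` and redistribute the
  # excess iteratively so the result still sums exactly to budget_total.
  alloc
}
set.seed(1)
a <- sample_alloc_sketch(c(10, 0, 5), rep(Inf, 3), 100)
sum(a)  # 100
```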
Output structure
optimise_budget() returns a budget_optimisation object containing:
| Field | Content |
| --- | --- |
| best_spend | Named numeric vector of optimal per-channel spend |
| best_score | Objective score of the best allocation |
| channel_summary | Tibble with per-channel reference vs optimised spend, response, ROI, CPA, and deltas |
| curves | List of per-channel response curve tibbles (spend grid × mean/lower/p50/upper) |
| points | Tibble of reference and optimised points per channel with confidence intervals |
| impact | Waterfall-style tibble of per-channel KPI contribution and interaction residual |
| objective_cfg | Echo of the objective configuration |
| scenario | Echo of the input scenario |
| model_metadata | Model class, response scale, and scale flag |
Runner integration
When allocation.enabled: true in YAML, the runner calls optimise_budget() after fitting and writes artefacts under 60_optimisation/:
| Artefact | Content |
| --- | --- |
| allocation_summary.csv | Channel summary table |
| response_curves.csv | Response curve data for all channels |
| allocation_impact.csv | Waterfall impact breakdown |
| Plot PNGs | Response curves, ROI/CPA panel, allocation waterfall, and other visual outputs |
Constraints and guardrails
Budget feasibility: if channel lower bounds sum to more than budget_total, the engine aborts.
Upper bound capacity: if channel upper bounds cannot accommodate the full budget, the engine aborts.
Missing terms: if a scenario term is not found in the posterior coefficients, the engine aborts with a descriptive error.
Offset + scale combination: for bayes_lm_updater models, optimise_budget() aborts if scale=TRUE and an offset is present.
Cross-references
Model Classes — fit support and posterior extraction per class
This section documents every plot the DSAMbayes runner produces. Each page covers one pipeline stage, describes what the plot shows, explains when and why the runner generates it, and gives practical interpretation guidance. The target reader is a modelling operator or analyst who needs to assess run quality without reading source code.
Pipeline stages
The runner writes artefacts into timestamped directories under results/. Plots are organised into six stages, each with its own subdirectory:
| Stage | Directory | Role | Page |
| --- | --- | --- | --- |
| Pre-run | 10_pre_run/ | Data quality and input sanity checks before fitting | Pre-run Plots |
R/run_artifacts_enrichment.R — wiring for fit-stage and pre-run plots
R/run_artifacts_diagnostics.R — wiring for diagnostics and model selection plots
Subsections of Plots
Pre-run Plots
Purpose
Pre-run plots are generated before the model is fitted. They visualise the input data and flag structural problems — multicollinearity, missing spend periods, implausible KPI–media relationships — that could compromise inference. Treat these as a data quality gate: review them before interpreting any downstream output.
All pre-run plots are written to 10_pre_run/ within the run directory. They require ggplot2 and are generated by write_pre_run_plots() in R/run_artifacts_enrichment.R. The runner produces them whenever an allocation.channels block is present in the configuration, the data contains the referenced spend columns, and a design matrix is extractable with more than one predictor and more than one row.
Media spend time series
Filename: media_spend_timeseries.png
What it shows
A stacked area chart of weekly media spend by channel, drawn from the raw spend_col columns declared in the allocation configuration. The x-axis is the date variable; the y-axis is spend in model units.
When it is generated
The runner generates this plot when:
The configuration includes an allocation.channels block.
At least one declared spend_col exists in the input data.
If no valid spend columns are found, the plot is silently skipped.
How to interpret it
Look for three things. First, check that each channel has plausible seasonal patterns and no unexpected gaps — zero-spend weeks in the middle of a campaign period suggest data ingestion problems. Second, verify that the relative magnitudes make sense: if TV dominates the stack but the brand has historically been digital-first, the data may be mislabelled or aggregated incorrectly. Third, confirm that the date range matches the modelling window declared in the configuration.
Warning signs
Flat channels: A channel with constant spend across all weeks contributes no variation and cannot be identified by the model. The coefficient will be driven entirely by the prior.
Sudden jumps or drops: Step changes in spend that do not correspond to known campaign events may indicate data joins across sources with different reporting conventions.
Missing periods: Gaps where spend drops to zero mid-series can distort adstock calculations if the model applies geometric decay.
Action
If a channel shows no variation, consider removing it from the formula or fixing the upstream data. If gaps are genuine (e.g. a seasonal channel), confirm the adstock specification handles zero-spend periods correctly.
Related artefacts
data_dictionary.csv in 10_pre_run/ provides summary statistics for every input column.
KPI–media overlay
Filename: kpi_media_overlay.png
What it shows
A dual-axis time series with the KPI response variable on the left axis (blue) and total media spend (sum of all declared spend_col values) on the right axis (red, rescaled to share the vertical space). This is a visual correlation check, not a causal claim.
When it is generated
The runner generates this plot when:
The configuration includes an allocation.channels block with at least one valid spend_col.
The response variable exists in the data.
If the total spend has zero variance, the plot is skipped.
How to interpret it
The overlay reveals whether KPI and aggregate spend move together over time. A rough co-movement is expected in MMM data — media drives response — but the relationship need not be tight. Seasonal KPI peaks that precede or lag media bursts suggest confounding (e.g. demand-driven spend timing). Divergences where spend rises but KPI falls (or vice versa) are worth investigating: they may reflect diminishing returns, competitor activity, or a structural break in the data.
Warning signs
Perfect alignment: If the two series track each other almost exactly, the model may be fitting spend timing rather than incremental media effects.
Opposite trends: A persistent negative relationship between total spend and KPI suggests reverse causality or omitted-variable bias.
Scale artefacts: The dual-axis rescaling can exaggerate or suppress visual correlation. Do not draw quantitative conclusions from this plot.
Action
Use this plot as a sanity check only. If the relationship looks implausible, investigate the data and consider whether the formula includes adequate controls for seasonality, trend, and external factors.
Variance inflation factor (VIF) bar chart
Filename: vif_bar.png
What it shows
A horizontal bar chart of variance inflation factors for each predictor in the model’s design matrix. Bars are colour-coded by severity: green (VIF < 5), amber (5 ≤ VIF < 10), and red (VIF ≥ 10). Dashed vertical lines mark the 5 and 10 thresholds.
When it is generated
The runner generates this plot when:
The design matrix has more than one predictor column and more than one row.
The VIF computation does not encounter a singular or degenerate correlation matrix.
For pooled models, the design matrix extraction may return zero rows, in which case the plot is skipped.
How to interpret it
VIF measures how much the variance of a coefficient estimate inflates due to correlation with other predictors. A VIF of 1 means no multicollinearity; a VIF of 10 means the standard error is roughly three times larger than it would be with orthogonal predictors. In Bayesian MMM, high VIF does not break inference the way it does in OLS — priors regularise the estimates — but it does reduce the data’s ability to inform the posterior, making results more prior-dependent.
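To make the definition concrete, here is a minimal R sketch — not the package's internal implementation — computing each VIF as the corresponding diagonal element of the inverse correlation matrix of the predictors:

```r
# Minimal sketch: VIF_j = j-th diagonal of the inverse predictor correlation matrix.
vif_from_design <- function(X) {
  X <- X[, apply(X, 2, stats::sd) > 0, drop = FALSE]  # drop constant columns
  diag(solve(stats::cor(X)))
}

set.seed(8)
n  <- 104
x1 <- rnorm(n)
x2 <- 0.95 * x1 + 0.05 * rnorm(n)   # nearly collinear with x1
x3 <- rnorm(n)                      # roughly orthogonal to both
v  <- vif_from_design(cbind(x1 = x1, x2 = x2, x3 = x3))
# v["x1"] and v["x2"] land deep in the red band (>10); v["x3"] stays near 1
```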
Warning signs
VIF > 10 on media channels: The model cannot reliably separate the effects of those channels. Posterior estimates will lean heavily on the prior. Consider whether the channels can be combined or whether one should be dropped.
VIF > 10 on seasonality terms: Common and usually harmless if the terms are included as controls rather than as interpretive outputs.
All terms moderate or high: The overall collinearity structure may be too severe for the data length. Consider increasing the sample size or simplifying the formula.
Action
Review the top-VIF terms. If two media channels are highly collinear (e.g. search and affiliate), consider whether they can be meaningfully separated given the available data. If not, combine them or use informative priors to anchor the split.
Related artefacts
design_matrix_manifest.csv in 10_pre_run/ lists all design matrix columns with variance and uniqueness statistics.
spec_summary.csv in 10_pre_run/ summarises the model specification.
Model fit plots summarise the posterior and compare fitted values against observed data. They answer two questions: does the model track the response variable adequately, and are the estimated coefficients plausible? These plots are written to 20_model_fit/ within the run directory.
The runner generates them via write_model_fit_plots() in R/run_artifacts_enrichment.R. All four plots require ggplot2 and the fitted model object. Each is wrapped in tryCatch so that a failure in one does not prevent the others from being written.
Plot catalogue
| Filename | What it shows | Conditions |
| --- | --- | --- |
| fit_timeseries.png | Observed vs fitted over time with 95% credible band | Always generated after a successful fit |
| fit_scatter.png | Observed vs fitted scatter | Always generated after a successful fit |
| posterior_forest.png | Coefficient point estimates with 90% CIs | Posterior draws available via get_posterior() |
| prior_posterior.png | Prior-to-posterior density shift for media terms | Model has a .prior table with media (m_*) parameters |
Fit time series
Filename: fit_timeseries.png
What it shows
The observed KPI (orange) and posterior mean fitted values (blue) plotted over time, with a shaded 95% credible interval band. The subtitle reports in-sample fit metrics: R², RMSE, MAE, mean error (bias), sMAPE, 95% prediction interval coverage, lag-1 ACF of residuals, and sample size. For hierarchical models the plot facets by group.
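The subtitle metrics follow standard formulas. The sketch below shows one common set of definitions (illustrative only — in particular, the sMAPE variant and interval construction used by DSAMbayes internally may differ in scaling):

```r
# Illustrative definitions of the subtitle metrics from observed y, posterior-mean
# fitted yhat, and the 95% interval bounds. Not the package source.
fit_metrics <- function(y, yhat, lower, upper) {
  res <- y - yhat
  list(
    r2       = 1 - sum(res^2) / sum((y - mean(y))^2),
    rmse     = sqrt(mean(res^2)),
    mae      = mean(abs(res)),
    me       = mean(res),                          # bias: positive = under-prediction
    smape    = mean(2 * abs(res) / (abs(y) + abs(yhat))),
    coverage = mean(y >= lower & y <= upper),      # share of points inside the band
    acf1     = stats::acf(res, lag.max = 1, plot = FALSE)$acf[2],
    n        = length(y)
  )
}

set.seed(1)
y    <- 100 + rnorm(52, sd = 5)   # one year of weekly KPI
yhat <- y + rnorm(52, sd = 2)     # a deliberately close fit
m    <- fit_metrics(y, yhat, yhat - 6, yhat + 6)
```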
When it is generated
Always, provided the model has been fitted successfully and the fit table (observed, mean, percentiles) can be computed.
How to interpret it
The fitted line should track the general level and seasonal pattern of the observed series. The 95% credible band should contain most observed points — the subtitle reports the actual coverage, which should be close to 95%. Systematic departures reveal model misspecification: if the fitted line consistently overshoots during holidays or undershoots during quiet periods, the formula may lack appropriate seasonal or event terms.
Warning signs
Coverage well below 95%: The model underestimates uncertainty. Common when the noise prior is too tight or the model is overfit to a subset of the data.
Coverage well above 95%: The credible interval is too wide. The model is underfit or the noise prior is too diffuse.
Persistent bias (ME far from zero): The model systematically over- or under-predicts. Check for missing structural terms (trend, level shifts, intercept misspecification).
High lag-1 ACF (> 0.3): Residuals are autocorrelated. The model is missing temporal structure — consider adding lagged terms or checking adstock specifications.
Action
If coverage or bias is unacceptable, revisit the formula (missing controls, wrong functional form) or the prior specification (overly tight noise SD). Cross-reference with the residuals diagnostics for a more detailed picture.
Related artefacts
fit_metrics_by_group.csv in 20_model_fit/ provides the same metrics in tabular form, broken down by group for hierarchical models.
Fit scatter
Filename: fit_scatter.png
What it shows
A scatter plot of observed values (y-axis) against posterior mean fitted values (x-axis), with a 45-degree reference line. Points on the line indicate perfect fit. For hierarchical models the plot facets by group.
When it is generated
Always, provided the fit table is available.
How to interpret it
Points should cluster tightly around the diagonal. Curvature away from the line suggests a systematic misfit — for instance, if the model underpredicts at high KPI values, the response may need a nonlinear term or a log transformation. Outliers far from the line warrant investigation: they may correspond to anomalous weeks (data errors, one-off events) that the model cannot capture.
Warning signs
Fan shape (wider scatter at higher values): Heteroscedasticity. A log-scale model or a variance-stabilising transform may be more appropriate.
Systematic curvature: The mean function is misspecified. Consider adding polynomial or interaction terms.
Isolated outliers: Check the dates of extreme residuals against the residuals time series and the input data for data quality issues.
Action
If the scatter reveals non-constant variance, consider fitting on the log scale (model.scale or a log-transformed formula). If curvature is evident, review the functional form of media transforms and control variables.
Posterior forest plot
Filename: posterior_forest.png
What it shows
A horizontal forest plot of posterior coefficient estimates. Each row is a model term (excluding the intercept). The point marks the posterior median; the horizontal bar spans the 5th to 95th percentile (90% credible interval). Terms whose interval excludes zero are drawn in colour; those consistent with zero are grey.
For hierarchical models, the plot displays population-level (group-averaged) estimates.
When it is generated
The runner generates this plot when posterior draws are available via get_posterior(). It is skipped if the posterior extraction fails.
How to interpret it
Focus on the media coefficients. Positive values indicate that higher media exposure is associated with higher KPI, which is the expected direction for most channels. The width of the interval reflects estimation precision: a narrow interval means the data informed the estimate strongly; a wide interval means the prior dominates.
Terms ordered by absolute magnitude (bottom to top) give a quick ranking of effect sizes, but note that these are on the model’s internal scale. For models fitted on the log scale, coefficients represent approximate percentage effects; for levels models, they represent absolute KPI units per unit of the transformed media input.
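The plotted quantities can be recovered from a draws matrix with rows as MCMC draws and columns as terms. A sketch under that assumption (the fake draws below stand in for get_posterior() output):

```r
# Forest-plot summaries: posterior median plus the 5th-95th percentile interval.
forest_summary <- function(draws) {
  data.frame(
    term   = colnames(draws),
    median = apply(draws, 2, stats::median),
    lo5    = apply(draws, 2, stats::quantile, probs = 0.05),
    hi95   = apply(draws, 2, stats::quantile, probs = 0.95)
  )
}

set.seed(2)
draws <- cbind(m_tv     = rnorm(4000, mean = 0.8,  sd = 0.10),   # well-identified
               m_search = rnorm(4000, mean = 0.05, sd = 0.20))   # consistent with zero
s <- forest_summary(draws)
# A term is drawn grey when its 90% interval spans zero:
s$grey <- s$lo5 < 0 & s$hi95 > 0
```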
Warning signs
Media coefficient crosses zero: The model cannot confidently distinguish the channel’s effect from noise. This is not necessarily wrong — some channels may genuinely have weak effects — but it warrants scrutiny, especially if the prior was informative.
Implausibly large coefficients: Check for scaling issues. If model.scale: true, coefficients are on the standardised scale and must be interpreted accordingly.
All intervals very wide: The data may not have enough variation to identify individual effects. Review the VIF bar chart for multicollinearity.
Action
If a media coefficient is unexpectedly negative, investigate whether the data supports it (e.g. counter-cyclical spend) or whether multicollinearity is pulling the estimate. Cross-reference with the prior vs posterior plot to see how far the data moved the estimate from its prior.
Prior vs posterior
Filename: prior_posterior.png
What it shows
Faceted density plots for each media coefficient (m_* parameters). The grey distribution is the prior (Normal, as specified in the model’s .prior table); the blue distribution is the posterior (estimated from MCMC draws). Overlap indicates that the data did not strongly inform the estimate; separation indicates data-driven updating.
For hierarchical models, posterior draws are averaged across groups to show the population-level density.
When it is generated
The runner generates this plot when:
The model has a .prior table (i.e. it is a requires_prior model).
The prior table contains at least one m_* parameter.
Posterior draws are available.
If the model has no prior table (e.g. a pure OLS updater), the plot is skipped.
How to interpret it
A well-identified coefficient shifts noticeably from prior to posterior. If the two densities sit on top of each other, the data provided little information for that channel — the estimate is prior-driven. This is not inherently wrong (the prior may be well-calibrated from previous studies), but it does mean the current dataset alone cannot validate the estimate.
Warning signs
No shift at all: The channel has insufficient variation or is too collinear with other terms for the data to update the prior. The resulting coefficient is essentially assumed, not estimated.
Posterior much narrower than prior: Expected and healthy. The data concentrated the estimate.
Posterior shifted to the boundary: If a boundary constraint is active (e.g. non-negativity), the posterior may pile up at zero. Cross-reference with the boundary hits plot to confirm.
Action
If key media channels show no prior-to-posterior shift, consider whether the prior is appropriate, whether the data period is long enough, or whether multicollinearity prevents identification. For channels where the prior dominates, document this clearly when reporting ROAS or contribution estimates — the output reflects an assumption, not a data-driven finding.
Cross-references
Pre-run plots — VIF and data quality checks that contextualise fit results
Diagnostics plots — residual analysis that complements the fit overview
Post-run plots decompose the fitted response into its constituent parts. They answer the question: how much does each predictor contribute to the modelled KPI, and how do those contributions evolve over time? These plots are written to 30_post_run/ within the run directory.
The runner generates them via write_response_decomposition_artifacts() in R/run_artifacts_enrichment.R, which calls runner_response_decomposition_tables() to compute per-term contributions from the design matrix and posterior coefficient estimates. For hierarchical models with random-effects formula syntax (|), the decomposition may fail gracefully — the runner logs a warning and continues to downstream stages.
Plot catalogue
| Filename | What it shows | Conditions |
| --- | --- | --- |
| decomp_predictor_impact.png | Total contribution per model term (bar chart) | Decomposition tables computed successfully |
| decomp_timeseries.png | Stacked media channel contribution over time | Decomposition tables computed successfully; media terms present |
Predictor impact
Filename: decomp_predictor_impact.png
What it shows
A horizontal bar chart of the total contribution of each model term to the response, computed as the sum of coefficient × design-matrix column across all observations. Terms are sorted by absolute contribution magnitude. The intercept and total rows are excluded.
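The contribution arithmetic described above can be sketched directly with synthetic inputs (this mirrors the description, not the runner_response_decomposition_tables() source):

```r
# contribution = coefficient x design-matrix column, summed over observations.
set.seed(4)
X    <- cbind(tv = runif(52, 0, 5), search = runif(52, 0, 3))  # transformed media inputs
beta <- c(tv = 0.8, search = 1.2)                              # posterior mean coefficients

contrib <- sweep(X, 2, beta, `*`)   # weekly per-term contributions (the time series view)
total   <- colSums(contrib)         # bar lengths in the predictor impact chart
total[order(abs(total), decreasing = TRUE)]   # the chart's sort order
```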
When it is generated
The runner generates this plot when runner_response_decomposition_tables() returns a valid predictor-level summary table. This requires that stats::model.matrix() can parse the model formula against the original input data — a condition that holds for BLM and pooled models but may fail for hierarchical models whose formulas contain random-effects syntax.
How to interpret it
The bar lengths represent total modelled impact over the data period. Media channels with large positive bars drove the most KPI in the model’s account of the data. Control variables (trend, seasonality, holidays) often dominate in absolute terms because they capture baseline demand — this is expected and does not diminish the media findings.
Negative contributions can arise for terms with negative coefficients (e.g. price sensitivity) or for seasonality harmonics where the net effect over the year partially cancels.
Warning signs
A media channel with negative total contribution: Unless the coefficient is intentionally unconstrained (no lower boundary at zero), a negative contribution suggests the model is absorbing noise or confounding through that channel. Review the posterior forest plot and check whether the coefficient’s credible interval excludes zero.
Intercept-dominated decomposition (not shown here, but visible in the CSV): If the intercept accounts for >90% of the total, media effects are negligible relative to baseline demand. This may be correct, but it limits the utility of the model for budget allocation.
Missing plot: If the decomposition failed (logged as a warning), the model type likely does not support direct model.matrix() decomposition. The CSV companions will also be absent.
Action
Use this plot to prioritise which channels to scrutinise. Cross-reference large contributors with the prior vs posterior plot to confirm they are data-driven rather than prior-driven.
Related artefacts
decomp_predictor_impact.csv in 30_post_run/ contains the same data in tabular form.
posterior_summary.csv in 30_post_run/ provides the coefficient summary underlying the decomposition.
Decomposition time series
Filename: decomp_timeseries.png
What it shows
A stacked area chart of media channel contributions over time. Each layer represents one media term’s weekly contribution (coefficient × transformed media input). Non-media terms (intercept, controls, seasonality) are excluded to focus the view on the media mix.
When it is generated
The runner generates this plot alongside the predictor impact chart, provided the decomposition tables include at least one media term.
How to interpret it
The height of each band at a given week represents how much that channel contributed to the modelled response. Seasonal patterns in the stack reflect campaign timing and adstock carry-over. The total height of the stack is the aggregate media contribution — the gap between this and the observed KPI is accounted for by non-media terms and noise.
Warning signs
A channel with near-zero contribution throughout: The model assigns negligible effect to that channel. This could be correct (low spend, weak signal) or a sign that multicollinearity is suppressing the estimate.
Implausibly large single-channel dominance: If one channel accounts for the vast majority of the media stack, verify the coefficient is plausible and not inflated by collinearity with a correlated channel.
Abrupt jumps unrelated to spend changes: Check whether the design matrix term (adstock/saturation output) is well-behaved. Sudden spikes in contribution without corresponding spend changes suggest a data or transform issue.
Action
Compare the relative channel contributions here with the business’s spend allocation. Channels that receive large spend but show small contributions may have diminishing returns or weak effects. This comparison motivates the budget optimisation stage.
Related artefacts
decomp_timeseries.csv in 30_post_run/ contains the weekly decomposition in long format.
Cross-references
Model fit plots — posterior estimates that drive the decomposition
Optimisation plots — budget allocation informed by these contribution estimates
Diagnostics plots assess whether the fitted model’s assumptions hold and whether any structural problems warrant remedial action. They cover residual behaviour, posterior predictive adequacy, and boundary constraint monitoring. These plots are written to 40_diagnostics/ within the run directory.
The runner generates residual plots via write_residual_diagnostics() in R/run_artifacts_diagnostics.R, the PPC plot via write_model_fit_plots() in R/run_artifacts_enrichment.R, and the boundary hits plot via write_boundary_diagnostics() in R/run_artifacts_diagnostics.R. Each plot is wrapped in tryCatch so that individual failures do not block the remaining outputs.
Plot catalogue
| Filename | What it shows | Conditions |
| --- | --- | --- |
| ppc.png | Posterior predictive check fan chart | Posterior draws (yhat) extractable from fitted model |
| residuals_timeseries.png | Residuals over time | Fit table available |
| residuals_vs_fitted.png | Residuals vs fitted values | Fit table available |
| residuals_hist.png | Residual distribution histogram | Fit table available |
| residuals_acf.png | Residual autocorrelation function | Fit table available |
| residuals_latent_acf.png | Latent-scale residual ACF | Model uses log-scale response (response_scale != "identity") |
| boundary_hits.png | Posterior draw proximity to coefficient bounds | Boundary hit rates computable from posterior and bound specifications |
Posterior predictive check (PPC)
Filename: ppc.png
What it shows
A fan chart of posterior predictive draws overlaid with observed data. The blue line is the posterior mean of the predicted response; the dark band spans the 25th–75th percentile (50% CI) and the light band spans the 5th–95th percentile (90% CI). Red dots mark observed values.
When it is generated
The runner generates this plot whenever posterior predictive draws (yhat) can be extracted from the fitted model via runner_yhat_draws(). This works for BLM, hierarchical, and pooled models fitted with MCMC.
How to interpret it
Well-calibrated models produce bands that contain roughly 50% and 90% of observed points in the respective intervals. The key diagnostic is whether observed values fall systematically outside the bands during specific periods — this reveals time-localised misfit that aggregate metrics like RMSE can mask.
Warning signs
Observed points consistently outside the 90% band: The model underestimates uncertainty or misses a structural feature (holiday, promotion, regime change).
Bands that widen dramatically in specific periods: The model is uncertain about those periods, possibly because the training data lacks similar observations.
Bands that are uniformly very wide: The noise prior may be too diffuse, or the model has too many weakly identified parameters.
Action
If the PPC reveals localised misfit, check whether the affected periods correspond to missing control variables (holidays, events). If the bands are too wide overall, consider tightening the noise prior or simplifying the formula. Cross-reference with the LOO-PIT histogram for an aggregate calibration assessment.
Residuals over time
Filename: residuals_timeseries.png
What it shows
A line chart of residuals (observed minus posterior mean) over time. A horizontal reference line at zero marks perfect fit. For hierarchical models, the plot facets by group.
When it is generated
Always, provided the fit table is available.
How to interpret it
Residuals should scatter randomly around zero with no discernible trend or seasonal pattern. Any structure in the residuals indicates that the model has failed to capture a systematic component of the data.
Warning signs
Trend in residuals: The model’s trend specification is inadequate. Consider adding a higher-order polynomial or a structural-break term.
Seasonal oscillation: The Fourier harmonics or holiday dummies are insufficient. Add more harmonics or specific event indicators.
Clusters of large residuals: Localised misfit — check corresponding dates for data anomalies.
Action
Residual structure that persists across multiple weeks warrants a formula revision. Short isolated spikes are often data outliers and may not require model changes.
Residuals vs fitted
Filename: residuals_vs_fitted.png
What it shows
A scatter plot of residuals (y-axis) against posterior mean fitted values (x-axis), with a horizontal reference at zero. For hierarchical models, the plot facets by group.
When it is generated
Always, provided the fit table is available.
How to interpret it
The scatter should form a horizontal band centred on zero with roughly constant vertical spread across the fitted-value range. Patterns in this plot diagnose specific model violations.
Warning signs
Funnel shape (wider spread at higher fitted values): Heteroscedasticity. A log-scale model would be more appropriate.
Curvature: The mean function is misspecified. The model under- or over-predicts at the extremes.
Discrete clusters: May indicate grouping structure that the model does not account for.
Action
Heteroscedasticity in a levels model is the most common finding. If the funnel pattern is pronounced, re-fit on the log scale and compare diagnostics. Cross-reference with the fit scatter plot, which shows the same information from a different angle.
Residual distribution
Filename: residuals_hist.png
What it shows
A histogram of residuals across all observations (40 bins). For hierarchical models with six or fewer groups, the histogram facets by group.
When it is generated
Always, provided the fit table is available.
How to interpret it
The distribution should be approximately symmetric and unimodal if the Normal noise assumption holds. Heavy tails or skewness indicate departures from normality.
Warning signs
Strong right skew: Common in levels models when the response is strictly positive and has occasional large values. A log transform may help.
Bimodality: Suggests a mixture or an omitted grouping variable. Check whether the data contains distinct regimes.
Extreme outliers: Individual residuals several standard deviations from the mean warrant data inspection.
Action
Moderate departures from normality in the residuals are tolerable in Bayesian inference — the posterior is still valid if the model is otherwise well-specified. Severe skewness or heavy tails, however, can distort credible intervals and predictive coverage. Consider robust likelihood specifications or transformations.
Residual autocorrelation (ACF)
Filename: residuals_acf.png
What it shows
A bar chart of the sample autocorrelation function of residuals, computed up to lag 26 (roughly half a year of weekly data). Red dashed lines mark the 95% significance bounds (±1.96/√n). For hierarchical models, the plot facets by group.
When it is generated
Always, provided the fit table is available.
How to interpret it
Bars within the significance bounds indicate no serial correlation at that lag. Significant autocorrelation — especially at low lags (1–4 weeks) — means the model misses short-run temporal dependence. Significant spikes at lag 52 (if the series is long enough) suggest residual annual seasonality.
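The ACF values and significance bounds follow the standard formulas. A minimal R sketch, using stats::acf:

```r
# Residual ACF up to lag 26 with the +/- 1.96/sqrt(n) significance bounds.
check_residual_acf <- function(res, lag_max = 26) {
  a     <- stats::acf(res, lag.max = lag_max, plot = FALSE)$acf[-1]  # drop lag 0
  bound <- 1.96 / sqrt(length(res))
  data.frame(lag = seq_len(lag_max), acf = a, significant = abs(a) > bound)
}

set.seed(3)
res_acf <- check_residual_acf(rnorm(104))   # two years of white-noise "residuals"
subset(res_acf, significant)                # white noise: few, if any, rows
```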
Warning signs
Lag-1 ACF > 0.3: Strong short-run autocorrelation. The model’s uncertainty estimates are anti-conservative (credible intervals too narrow), and coefficient estimates may be biased if lagged effects are present.
Decaying positive ACF: Suggests an omitted AR component or insufficient adstock decay modelling.
Spike at lag 52: Residual annual seasonality not captured by the Fourier terms.
Action
If lag-1 ACF is material, consider adding lagged response terms or increasing the number of Fourier harmonics. For adstock-driven channels, verify that the decay rate is not too fast (underfitting carry-over) or too slow (overfitting noise).
Latent-scale residual ACF
Filename: residuals_latent_acf.png
What it shows
The same ACF plot as above, but computed on the latent (log) scale when the model’s response scale is not identity. This is relevant for models fitted with model.scale: true or log-transformed response variables.
When it is generated
The runner generates this plot when response_scale != "identity". It is skipped for levels-scale models.
How to interpret it
Interpretation is identical to the standard ACF plot. The latent-scale version is preferred for log models because autocorrelation in the log residuals is more directly interpretable as a model adequacy check on the scale where inference is performed.
Warning signs
Same as the standard ACF. Compare both plots if both are generated — discrepancies may indicate that the log transformation introduces or removes autocorrelation artefacts.
Boundary hits
Filename: boundary_hits.png
What it shows
A horizontal chart showing, for each constrained coefficient, the share of posterior draws that fall within a tolerance of the finite lower or upper bound. Bars are colour-coded: green (0% hit rate), amber (1–10%), red (≥10%). When all hit rates are zero, the plot displays green dots with explicit “0.0%” labels.
When it is generated
The runner generates this plot when the model has finite boundary constraints set via set_boundary() and boundary hit rates can be computed from the posterior draws. It is written by write_boundary_diagnostics() in R/run_artifacts_diagnostics.R.
How to interpret it
A zero hit rate for all parameters means no posterior draws approached any boundary — the constraints are not binding and the posterior is effectively unconstrained. This is the ideal outcome.
A non-zero hit rate means the boundary is influencing the posterior shape. Moderate rates (1–10%) suggest the data mildly conflicts with the constraint; high rates (≥10%) mean the data wants the coefficient outside the allowed range and the boundary is actively truncating the posterior.
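A hit rate of this kind can be sketched as the share of draws falling within a small tolerance of a finite bound. The tolerance rule below (0.1% of the draw range) is an assumption for illustration, not the package default:

```r
# Share of posterior draws within a tolerance of a finite lower/upper bound.
boundary_hit_rate <- function(draws, lower = -Inf, upper = Inf, tol = 1e-3) {
  eps <- tol * max(diff(range(draws)), 1)   # assumed tolerance rule
  mean((is.finite(lower) & draws <= lower + eps) |
       (is.finite(upper) & draws >= upper - eps))
}

set.seed(5)
free   <- rnorm(4000, mean = 2, sd = 0.3)           # posterior well away from zero
pinned <- pmax(rnorm(4000, mean = 0, sd = 0.3), 0)  # truncated: piles up at the bound
hit_free   <- boundary_hit_rate(free,   lower = 0)  # ~0: constraint not binding
hit_pinned <- boundary_hit_rate(pinned, lower = 0)  # large: boundary actively truncating
```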
Warning signs
Hit rate ≥10% on a media coefficient: The non-negativity constraint is binding. The true effect may be zero or negative, but the boundary forces a positive estimate. This inflates the channel’s apparent contribution.
Hit rate ≥10% on many parameters simultaneously: The overall constraint specification may be too tight for the data. Consider widening bounds or reviewing the formula.
Lower-bound hits on a coefficient with strong prior mass at zero: The prior and boundary together may create a “pile-up” at the bound. The posterior is not reflecting the data faithfully.
Action
For channels with high boundary hit rates, critically assess whether the non-negativity constraint is justified by domain knowledge. If the constraint is essential (e.g. media cannot destroy demand), document that the estimate is boundary-driven. If it is not essential, consider relaxing the bound and re-fitting to see whether the unconstrained estimate is materially different.
Related artefacts
boundary_hits.csv in 40_diagnostics/ provides the per-parameter hit rates in tabular form.
diagnostics_report.csv in 40_diagnostics/ includes a summary check for boundary binding.
Hierarchical-specific: within variation
Filename: within_variation.png (generated only for hierarchical models)
This plot shows the within-group variation ratio for each non-CRE (correlated random effects) term: Var(x − mean_g(x)) / Var(x). Low ratios indicate that most variation in a predictor is between groups rather than within groups, making it difficult to identify the coefficient from within-group variation alone. Dashed lines at 5% and 10% mark conventional concern thresholds.
This plot is generated only for hierarchical models and is not included in the standard BLM image set.
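The ratio can be computed per term as follows (hypothetical data — the markets DE/FR/UK and the weekly layout are invented for illustration):

```r
# Within-group variation ratio: Var(x - mean_g(x)) / Var(x).
within_ratio <- function(x, g) {
  group_mean <- ave(x, g)                      # mean of x within each group
  stats::var(x - group_mean) / stats::var(x)
}

set.seed(6)
g  <- rep(c("DE", "FR", "UK"), each = 52)
x1 <- rnorm(156)                                             # varies within groups
x2 <- rep(c(10, 20, 30), each = 52) + rnorm(156, sd = 0.1)   # almost purely between-group
r1 <- within_ratio(x1, g)   # close to 1: identifiable from within-group variation
r2 <- within_ratio(x2, g)   # far below the 5% threshold: a concern flag
```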
Cross-references
Model fit plots — fit overview that the residual diagnostics refine
Model selection plots provide leave-one-out cross-validation (LOO-CV) diagnostics that assess predictive adequacy and calibration. They help answer: does the model generalise to unseen observations, and are any individual data points unduly influencing the fit? These plots are written to 50_model_selection/ within the run directory.
The runner generates them via write_model_selection_artifacts() in R/run_artifacts_diagnostics.R. LOO-CV is computed using Pareto-smoothed importance sampling (PSIS-LOO) from the loo package, which approximates exact leave-one-out predictive densities from a single MCMC fit. All three plots depend on the pointwise LOO table (loo_pointwise.csv), which contains per-observation ELPD contributions, Pareto-k diagnostics, and influence flags.
Plot catalogue
| Filename | What it shows | Conditions |
| --- | --- | --- |
| pareto_k.png | Pareto-k diagnostic scatter over time | Pointwise LOO table available with pareto_k column |
| loo_pit.png | LOO-PIT calibration histogram | Posterior draws (yhat) extractable from fitted model |
| elpd_influence.png | Pointwise ELPD contributions over time | Pointwise LOO table available with elpd_loo and pareto_k columns |
Pareto-k diagnostic
Filename: pareto_k.png
What it shows
A scatter plot of Pareto-k values over time, one point per observation. Points are colour-coded by severity:
Green (k < 0.5): PSIS approximation is reliable.
Amber (0.5 ≤ k < 0.7): Approximation is acceptable but warrants monitoring.
Red (0.7 ≤ k < 1.0): Approximation is unreliable. The observation is influential.
Purple (k > 1.0): PSIS fails entirely. The observation dominates the posterior.
Dashed horizontal lines mark the 0.5, 0.7, and 1.0 thresholds. The legend always displays all four severity levels regardless of whether points exist in each category.
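The severity banding amounts to a simple cut over the thresholds listed above (a sketch; the labels mirror the plot colours and k = 1.0 lands in the top band here, a boundary the text leaves unassigned):

```r
# Classify Pareto-k values into the four severity bands using [lo, hi) intervals.
pareto_k_band <- function(k) {
  cut(k,
      breaks = c(-Inf, 0.5, 0.7, 1.0, Inf),
      labels = c("green", "amber", "red", "purple"),
      right  = FALSE)
}

bands <- pareto_k_band(c(0.1, 0.55, 0.8, 1.3))
# green, amber, red, purple
```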
When it is generated
The runner generates this plot whenever the pointwise LOO table contains a pareto_k column. This requires a successful PSIS-LOO computation, which in turn requires the fitted model to produce log-likelihood values.
How to interpret it
Most points should be green. A small number of amber points is typical and does not invalidate the LOO estimate. Red and purple points identify observations where the posterior changes substantially when that observation is excluded — these are influential data points.
Influential observations concentrated in a specific time period (e.g. a cluster of red points around a holiday) suggest that the model struggles with those conditions. Isolated influential points may correspond to data anomalies or outliers.
Warning signs
More than 10% of points above 0.7: The overall PSIS-LOO estimate is unreliable. The loo package will issue a warning. Consider moment-matching or exact refitting for affected observations.
Purple points (k > 1): These observations are so influential that removing them would substantially change the posterior. Investigate whether they represent data errors, one-off events, or genuine but rare conditions.
Influential points at the start or end of the series: Edge effects in adstock transforms can create artificial influence at series boundaries.
Action
For isolated red/purple points, inspect the corresponding dates and data values. If they are data errors, correct the data. If they are genuine but extreme, consider whether the model’s likelihood (Normal) is appropriate — heavy-tailed alternatives (Student-t) are more robust to outliers. If influential points are numerous, the model may be misspecified more broadly: revisit the formula, priors, and functional form.
Related artefacts
loo_pointwise.csv in 50_model_selection/ contains the per-observation Pareto-k, ELPD, and influence flags.
loo_summary.csv in 50_model_selection/ reports the aggregate ELPD with standard error.
LOO-PIT calibration histogram
Filename: loo_pit.png
What it shows
A histogram of leave-one-out probability integral transform (LOO-PIT) values across all observations. The PIT value for observation t is the proportion of posterior predictive draws that fall below the observed value: PIT_t = Pr(ŷ_t ≤ y_t | y_{-t}). The histogram uses 20 equal-width bins from 0 to 1. A dashed red horizontal line marks the expected count under a perfectly calibrated model (n/20).
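The PIT construction can be sketched as follows, assuming an (S, T) matrix of posterior predictive draws. The toy data, variable names, and well-calibrated generator are illustrative, not DSAMbayes internals.

```python
# Sketch of PIT values and the histogram reference line described above.
# `draws` stands in for posterior predictive draws (S draws x T observations);
# here both draws and observations come from the same distribution, so the
# PIT histogram should be approximately uniform.

import numpy as np

rng = np.random.default_rng(0)
S, T = 4000, 104
y = rng.normal(0.0, 1.0, size=T)           # toy observed series
draws = rng.normal(0.0, 1.0, size=(S, T))  # toy predictive draws

pit = (draws <= y).mean(axis=0)            # PIT_t = Pr(yhat_t <= y_t)
counts, _ = np.histogram(pit, bins=20, range=(0.0, 1.0))
expected = T / 20                          # dashed reference line (n/20)
```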
When it is generated
The runner generates this plot whenever posterior predictive draws can be extracted via runner_yhat_draws(). It does not require the pointwise LOO table — it computes PIT values directly from the posterior predictive distribution. The plot is written by write_model_fit_plots() in R/run_artifacts_enrichment.R and filed under 50_model_selection/.
How to interpret it
A well-calibrated model produces a uniform PIT distribution — all bins should be roughly equal in height, close to the dashed reference line. Departures from uniformity reveal specific calibration failures:
U-shape (excess mass at 0 and 1): The model is overdispersed — its predictive intervals are too narrow. Observed values fall in the tails of the predictive distribution more often than expected.
Inverse U-shape (excess mass in the centre): The model is underdispersed — its predictive intervals are too wide. The model is more uncertain than it needs to be.
Left-skewed (excess mass near 0): The model systematically overpredicts. Observed values tend to fall below the predictive distribution.
Right-skewed (excess mass near 1): The model systematically underpredicts.
Warning signs
Strong U-shape: The noise variance is underestimated or the model is missing a source of variation. This is the most concerning pattern because it means the credible intervals are anti-conservative — reported uncertainty is too low.
One bin dramatically taller than others: A single bin containing many more observations than expected suggests a discrete cluster of misfits. Check the dates of those observations.
Monotone slope: A systematic bias that the model has not captured. Check the residuals time series for trend.
Action
U-shaped PIT histograms call for wider predictive intervals: increase the noise prior, add missing covariates, or allow for heavier tails. Inverse-U patterns suggest the noise prior is too diffuse — tighten it. Skewed patterns indicate systematic bias that should be addressed through formula changes (missing controls, trend, level shifts). Cross-reference with the PPC fan chart for a visual complement.
ELPD influence plot
Filename: elpd_influence.png
What it shows
A lollipop chart of pointwise expected log predictive density (ELPD) contributions over time. Each vertical stem connects the observation’s ELPD value to zero; the dot marks the ELPD value. Blue points and stems indicate non-influential observations (Pareto-k ≤ 0.7); red indicates influential ones (Pareto-k > 0.7). Larger red dots draw attention to the problematic observations.
When it is generated
The runner generates this plot whenever the pointwise LOO table contains both elpd_loo and pareto_k columns. It is written by write_model_selection_artifacts() in R/run_artifacts_diagnostics.R, immediately after the Pareto-k scatter.
How to interpret it
ELPD values quantify each observation’s contribution to the model’s out-of-sample predictive performance. Values near zero indicate observations that the model predicts well. Large negative values indicate observations where the model assigns low predictive probability — these are the worst-predicted points.
The combination of ELPD magnitude and Pareto-k severity is informative:
Large negative ELPD + low k: The model predicts this observation poorly, but the PSIS estimate is reliable. The model genuinely struggles with this data point.
Large negative ELPD + high k: Both the prediction and the LOO approximation are unreliable. This observation is highly influential and poorly fit — it warrants the closest scrutiny.
Near-zero ELPD + high k: The observation is influential but well-predicted. It may be a leverage point (extreme in predictor space) that happens to lie on the fitted surface.
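These three combinations can be collapsed into a simple triage rule. The helper below is hypothetical (DSAMbayes does not ship it), and the ELPD cut-off in particular is an illustrative assumption; only the k > 0.7 threshold comes from the loo conventions.

```python
# Hypothetical triage of (pointwise ELPD, Pareto-k) combinations, mirroring
# the three cases described above. elpd_cut is illustrative; a sensible value
# depends on the scale of the data.

def triage(elpd: float, k: float, elpd_cut: float = -3.0, k_cut: float = 0.7) -> str:
    poorly_predicted = elpd < elpd_cut
    influential = k > k_cut
    if poorly_predicted and influential:
        return "inspect first"   # both prediction and LOO estimate unreliable
    if poorly_predicted:
        return "genuine misfit"  # reliable estimate; model struggles here
    if influential:
        return "leverage point"  # influential but well predicted
    return "ok"
```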
Warning signs
Cluster of large negative values in a specific period: The model systematically fails during that period. Check for missing events, structural breaks, or data quality problems.
Many red (influential) points with large negative ELPD: The model’s aggregate LOO estimate is unreliable, and the worst-fit observations are also the most influential. This combination makes model comparison results untrustworthy.
Monotone trend in ELPD values: Suggests time-varying model adequacy — the model may fit the training period well but degrade towards the edges.
Action
Investigate the dates of the worst ELPD observations. If they correspond to known anomalies (data errors, one-off events), consider excluding or down-weighting them. If they correspond to regular conditions that the model should handle, the model needs revision. Use the Pareto-k plot to confirm which observations are both poorly predicted and influential, and prioritise those for investigation.
Related artefacts
loo_pointwise.csv in 50_model_selection/ contains the full pointwise table with ELPD, Pareto-k, and influence flags.
loo_summary.csv in 50_model_selection/ reports the aggregate ELPD estimate and standard error for model comparison.
Cross-references
Diagnostics plots — residual-level checks that complement LOO diagnostics
Model fit plots — posterior summaries and fitted-vs-observed views
Optimisation plots visualise the outputs of the budget allocator. They translate model estimates into actionable budget decisions by showing response curves, efficiency comparisons, and the sensitivity of recommendations to budget changes. These are decision-layer artefacts: they sit downstream of all modelling and diagnostics, and their quality depends entirely on the credibility of the upstream fit.
All optimisation plots are written to 60_optimisation/ within the run directory. The runner generates them via write_budget_optimisation_artifacts() in R/run_artifacts_enrichment.R, which calls the public plotting APIs in R/optimise_budget_plots.R. They require a successful call to optimise_budget() that produces a budget_optimisation object with a plot_data payload.
Plot catalogue

| Filename | What it shows | Conditions |
| --- | --- | --- |
| budget_response_curves.png | Channel response curves with current/optimised points | Optimisation completed with response curve data |
| budget_roi_cpa.png | ROI or CPA comparison by channel | Optimisation completed with ROI/CPA summary |
| budget_impact.png | Spend reallocation and response impact (diverging bars) | Optimisation completed with ROI/CPA summary |
| budget_contribution.png | Absolute response comparison by channel | Optimisation completed with ROI/CPA summary |
| budget_confidence_comparison.png | Posterior credible intervals for current vs optimised | Optimisation completed with response points |
| budget_sensitivity.png | Total response change when each channel varies ±20% | Optimisation completed with response curve data |
| budget_efficient_frontier.png | Optimised response across budget levels | Efficient frontier computed via budget_efficient_frontier() |
| budget_kpi_waterfall.png | Waterfall decomposition of the predicted KPI | Waterfall data computable from model coefficients and data means |
| budget_marginal_roi.png | Marginal ROI (or marginal response) curves by channel | Optimisation completed with response curve data |
| budget_spend_share.png | Current vs optimised spend allocation as percentage | Optimisation completed with ROI/CPA summary |
Response curves
Filename: budget_response_curves.png
What it shows
Faceted line charts of the estimated response curve for each media channel. The x-axis is raw spend (model units); the y-axis is expected response. A shaded band shows the posterior credible interval around the mean curve. Two marked points per channel indicate the current (reference) and optimised spend allocations.
The subtitle notes which media transforms were applied (e.g. Hill saturation, adstock). A caption reports the marginal response at the optimised point for each channel.
When it is generated
The runner generates this plot whenever optimise_budget() returns response curve data in the plot_data payload. This requires at least one media channel in the allocation configuration with a computable response function.
How to interpret it
The curve shape encodes diminishing returns. Steep initial slopes indicate high marginal response at low spend; flattening curves indicate saturation. The gap between the current and optimised points shows the direction of the recommended reallocation: if the optimised point sits to the right (higher spend) of the current point, the allocator recommends increasing that channel’s budget.
The credible band width reflects posterior uncertainty about the response function. Wide bands mean the shape is poorly identified — the recommendation is sensitive to modelling assumptions. Narrow bands indicate data-informed estimates.
Warning signs
Very wide credible bands: The response curve shape is uncertain. Budget recommendations based on it carry substantial risk.
Optimised point near the flat part of the curve: The channel is saturated at the recommended spend. Further increases yield negligible marginal returns.
Current and optimised points nearly identical: The allocator found little room for improvement on that channel. The current allocation is already near-optimal (or the response function is too uncertain to justify a change).
Action
Compare the marginal response values across channels. The allocator equalises marginal response at the optimum — if marginal values differ substantially, the optimisation may have hit a constraint (spend floor/ceiling). Cross-reference with the budget sensitivity plot to assess how robust the recommendation is.
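The equal-marginal-response property can be demonstrated with a minimal greedy allocator. This is an illustrative sketch, not the DSAMbayes optimiser: with concave (here Hill-shaped) response curves, repeatedly handing the next budget increment to the channel with the highest marginal response approximately equalises marginal response at the unconstrained optimum.

```python
# Minimal greedy allocator sketch (not the DSAMbayes optimiser). All curve
# parameters and budget values are illustrative.

def hill(spend, k):
    """Toy saturating response curve with half-saturation point k."""
    return spend / (spend + k)

def greedy_allocate(total, halfsats, step=1.0):
    spend = [0.0] * len(halfsats)
    for _ in range(int(total / step)):
        # marginal response of spending one more step on each channel
        marg = [hill(s + step, k) - hill(s, k) for s, k in zip(spend, halfsats)]
        spend[marg.index(max(marg))] += step
    return spend

alloc = greedy_allocate(total=100.0, halfsats=[20.0, 50.0])
print(alloc)  # marginal response is approximately equal across both channels
```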
Related artefacts
budget_response_curves.csv in 60_optimisation/ contains the curve data.
budget_response_points.csv in 60_optimisation/ contains the current and optimised point coordinates.
ROI/CPA comparison
Filename: budget_roi_cpa.png
What it shows
A grouped bar chart comparing ROI (or CPA, for subscription KPIs) by channel under the current and optimised allocations. If currency_col is defined per channel, bars show financial ROI; otherwise they show response-per-unit-spend in model units. A TOTAL bar summarises the portfolio-level metric.
The metric choice is automatic: the allocator uses ROI for revenue-type KPIs and CPA for subscription-type KPIs.
When it is generated
The runner generates this plot whenever the optimisation result includes a roi_cpa summary table.
How to interpret it
Channels where the optimised bar exceeds the current bar gain efficiency from the reallocation. Channels where the optimised bar is lower have had spend reduced — their marginal efficiency was below the portfolio average. The TOTAL bar shows the net portfolio improvement.
Warning signs
Optimised ROI lower than current for most channels: The allocator redistributed spend towards higher-response channels, which may have lower per-unit efficiency but larger absolute contribution. This is not necessarily wrong — the allocator maximises total response, not per-channel ROI.
TOTAL bar shows negligible improvement: The current allocation is already near-optimal, or the model’s response functions are too flat to support meaningful reallocation.
Very large ROI values on low-spend channels: Small denominators inflate ROI. These channels may have high marginal returns at low spend but limited capacity to absorb budget.
Action
Do not interpret this plot in isolation. Cross-reference with the contribution comparison and the response curves to distinguish efficiency improvements from scale effects.
Related artefacts
budget_roi_cpa.csv in 60_optimisation/ contains the per-channel ROI/CPA values.
budget_summary.csv in 60_optimisation/ provides the top-level allocation summary.
Allocation impact
Filename: budget_impact.png
What it shows
A horizontal diverging bar chart in two facets. The left facet shows spend reallocation (positive = increase, negative = decrease) per channel. The right facet shows the corresponding response impact. Bars are coloured green for increases and red for decreases. A TOTAL row at the bottom summarises the net change with muted styling.
Channels are sorted by response impact magnitude — the channels most affected by the reallocation appear at the top.
When it is generated
The runner generates this plot whenever the optimisation result includes a roi_cpa summary with delta_spend and delta_response columns.
How to interpret it
The spend facet shows where the allocator moves budget. The response facet shows the expected consequence. A useful pattern is a channel that receives a spend decrease (red bar, left) but shows a small response decrease (small red bar, right) — that channel was inefficient and the freed budget drives larger gains elsewhere.
Warning signs
Large spend increase on a channel with modest response gain: Diminishing returns may be steep. Verify against the response curve.
Response decreases that exceed response gains: The allocator expects a net negative outcome. This should not happen with a correctly specified max_response objective, and suggests a configuration or constraint issue.
Action
Use this chart to brief stakeholders on the “where and why” of reallocation. Pair it with the confidence comparison to communicate whether the expected gains are statistically distinguishable from zero.
Response contribution
Filename: budget_contribution.png
What it shows
A grouped bar chart comparing absolute expected response (contribution) by channel under the current and optimised allocations. Delta annotations above each pair show the change. A TOTAL bar with muted styling shows the portfolio-level gain. The subtitle reports the percentage total response gain from optimisation.
When it is generated
The runner generates this plot whenever the optimisation result includes mean_reference and mean_optimised columns in the roi_cpa summary.
How to interpret it
This chart answers the question: in absolute terms, how much more (or less) response does each channel deliver under the optimised allocation? Unlike the ROI chart, this view is not distorted by small denominators — it shows the quantity the allocator actually maximises.
Warning signs
Negative delta on a channel with high current contribution: The allocator is pulling spend from a channel that currently contributes a great deal. This is rational if the marginal return on that channel is below the portfolio average, but it requires careful communication to stakeholders accustomed to interpreting total contribution as “importance”.
TOTAL gain is small: The reallocation may not justify the operational cost of implementing it. Consider whether the confidence intervals overlap (see confidence comparison).
Action
Report the TOTAL percentage gain as the headline number. Caveat it with the credible interval width from the confidence comparison. If the gain is within posterior uncertainty, the recommendation is suggestive rather than conclusive.
Related artefacts
budget_allocation.csv in 60_optimisation/ contains the per-channel spend and response values.
Confidence comparison
Filename: budget_confidence_comparison.png
What it shows
A horizontal forest plot (dodge-positioned point-and-errorbar) showing the posterior mean response and 90% credible interval for each channel under the current (grey) and optimised (red) allocations. Channels where the intervals overlap suggest that the reallocation gain may not be statistically meaningful.
When it is generated
The runner generates this plot whenever the optimisation result includes response point data with mean, lower, and upper columns for both reference and optimised allocations.
How to interpret it
Focus on channels where the optimised interval (red) does not overlap with the current interval (grey). These are the channels where the reallocation produces a distinguishable change in expected response. Overlapping intervals mean the posterior cannot confidently distinguish the two allocations — the gain exists in expectation but falls within sampling uncertainty.
Warning signs
All intervals overlap: The data is too uncertain to support a confident reallocation recommendation. The allocator’s point estimate suggests improvement, but the posterior cannot distinguish it from noise.
One channel shows a clear gain while others overlap: The headline portfolio gain may be driven by a single channel. Verify that channel’s response curve and prior-posterior shift.
Action
Use this plot to calibrate the confidence of the recommendation. If intervals overlap for most channels, present the allocation as “directionally suggestive” rather than “statistically supported”. If key channels show clear separation, the recommendation is stronger.
Budget sensitivity
Filename: budget_sensitivity.png
What it shows
A spider chart (line plot) showing how total expected response changes when each channel’s spend is varied ±20% from its optimised level, while all other channels are held fixed. Steeper lines indicate channels whose budgets have the most influence on total response. A horizontal dashed line at zero marks the optimised baseline.
When it is generated
The runner generates this plot whenever the optimisation result includes response curve data. The ±20% range and 11 evaluation points per channel are defaults set in plot_budget_sensitivity().
How to interpret it
Channels with steep lines are the most sensitive: small deviations from their optimised spend produce large response changes. Flat lines indicate channels where modest budget deviations have little impact — the response function is either saturated (on the flat part of the curve) or nearly linear (constant marginal return).
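The sweep behind the plot can be sketched as follows. The Hill response function, channel names, and spend values are illustrative assumptions; only the ±20% range and the 11 evaluation points come from the plot defaults described above.

```python
# Sketch of the ±20% sensitivity sweep: vary one channel's spend across an
# 11-point grid while holding the others at their optimised levels, recording
# the change in total response relative to the optimised baseline.

import numpy as np

def hill(spend, k):
    return spend / (spend + k)

optimised = {"tv": 60.0, "search": 40.0}   # illustrative optimised spends
halfsat = {"tv": 30.0, "search": 25.0}     # illustrative curve parameters

grid = np.linspace(0.8, 1.2, 11)           # ±20%, 11 evaluation points
baseline = sum(hill(s, halfsat[c]) for c, s in optimised.items())

sensitivity = {}
for channel in optimised:
    deltas = []
    for m in grid:
        spends = dict(optimised)
        spends[channel] *= m
        total = sum(hill(s, halfsat[c]) for c, s in spends.items())
        deltas.append(total - baseline)    # zero at m = 1.0 by construction
    sensitivity[channel] = deltas
```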
Warning signs
A channel with an asymmetric slope (steep downward, flat upward): Cutting this channel’s spend is costly, but increasing it yields little. It is at or near its saturation point.
All lines nearly flat: The optimisation surface is plateau-like. The allocator’s recommendation is robust to implementation imprecision, but also implies limited upside from optimisation.
Lines that cross: Channels swap in relative importance at different budget perturbations. This complicates simple priority rankings.
Action
Use this chart to communicate implementation risk. If the recommended allocation is operationally difficult to achieve exactly, the sensitivity chart shows which channels require precise execution and which have margin for error.
Efficient frontier
Filename: budget_efficient_frontier.png
What it shows
A line-and-point chart of total optimised response as a function of total budget. Each point represents the optimal allocation at that budget level (expressed as a percentage of the current total budget). A red diamond marks the current budget level. The curve shows how much additional response is achievable by increasing the total budget — and the diminishing returns of doing so.
When it is generated
The runner generates this plot when budget_efficient_frontier() produces a budget_frontier object with at least two feasible points. This requires a valid optimisation result and a set of budget multipliers (configured in allocation.efficient_frontier).
How to interpret it
The frontier’s shape reveals the budget’s overall productivity. A concave curve (steepening, then flattening) is the classic diminishing-returns shape: each additional unit of budget buys less incremental response. The gap between the current point and the curve above it shows the unrealised potential at the same budget — the difference between the current allocation and the optimal one.
Warning signs
Frontier is nearly linear: Returns are approximately constant across the budget range. The model may not have enough data to identify saturation, or the budget range is too narrow to reveal it.
Frontier flattens early: The portfolio saturates at a budget well below the current level. The current spend may be wastefully high.
Only 2–3 feasible points: The optimiser could not find feasible allocations at most budget levels. Constraints may be too tight.
Action
Use the frontier to frame budget conversations. The curve shows what is achievable at each budget level. If a stakeholder proposes a budget cut, the frontier quantifies the response cost. If they propose an increase, it quantifies the expected gain. Present the frontier alongside the spend share comparison to show how the allocation shifts at each level.
Related artefacts
budget_efficient_frontier.csv in 60_optimisation/ contains the frontier data.
KPI waterfall
Filename: budget_kpi_waterfall.png
What it shows
A horizontal waterfall bar chart decomposing the predicted KPI into its constituent components: base (intercept), trend, seasonality, holidays, controls, and individual media channels. Each bar shows the mean posterior coefficient multiplied by the mean predictor value — the average contribution of that component to the predicted KPI. A red TOTAL bar anchors the sum.
When it is generated
The runner generates this plot when build_kpi_waterfall_data() can extract posterior coefficients and match them to predictor means in the original data. This requires that the model’s .formula and .original_data are both accessible. For hierarchical models with random-effects syntax, the waterfall may fail gracefully and be skipped.
How to interpret it
The waterfall answers: “of the total predicted KPI, how much comes from each source?” The base (intercept) typically dominates, representing baseline demand independent of media and controls. Media channels sit at the bottom, showing their individual incremental contributions. The relative sizes of the media bars correspond to the decomposition impact chart (decomp_predictor_impact.png), but are computed slightly differently (mean × mean vs sum over time).
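The distinction between the two computations can be made concrete. The coefficient and predictor values below are illustrative, not DSAMbayes output.

```python
# Illustrative comparison of the waterfall's "mean coefficient x mean
# predictor" bar against the decomposition chart's sum over time.

import numpy as np

beta_tv = 0.8                              # posterior mean coefficient
x_tv = np.array([10.0, 12.0, 8.0, 14.0])   # predictor series (mean = 11)

waterfall_bar = beta_tv * x_tv.mean()      # mean x mean: 0.8 * 11 = 8.8
decomp_total = (beta_tv * x_tv).sum()      # sum over time: 0.8 * 44 = 35.2

# For a linear term the two agree up to a factor of T (the series length):
assert np.isclose(decomp_total, waterfall_bar * len(x_tv))
```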
Warning signs
Negative media contributions: A channel with a negative bar reduces predicted KPI. Unless the coefficient is intentionally unconstrained, this suggests a fitting or identification problem.
Intercept dwarfs all other terms: The model attributes nearly all KPI to baseline demand. Media effects are marginal. This may be realistic for low-spend brands but limits the value of budget optimisation.
Missing plot (skipped with warning): The model type does not support direct waterfall decomposition.
Action
Use the waterfall to contextualise media contributions within the total predicted KPI. For stakeholder reporting, it provides a clear answer to “what drives our KPI?” — while emphasising that media is one factor among several.
Related artefacts
budget_kpi_waterfall.csv in 60_optimisation/ contains the waterfall data.
Marginal ROI curves
Filename: budget_marginal_roi.png
What it shows
Faceted line charts of marginal ROI (or marginal response, if no currency conversion is configured) as a function of spend for each channel. The marginal value is computed as the first difference of the response curve: the additional response per additional unit of spend. Current and optimised points are marked.
When it is generated
The runner generates this plot whenever the optimisation result includes response curve data with at least two points per channel.
How to interpret it
The marginal ROI curve is the derivative of the response curve. At the optimised allocation, the allocator equalises marginal ROI across channels (subject to constraints). If one channel’s marginal ROI at the optimised point is substantially higher than another’s, a constraint (spend floor or ceiling) is preventing further reallocation.
Diminishing returns appear as a downward-sloping marginal curve: each additional unit of spend yields less incremental response than the last. Channels with steeper slopes saturate faster.
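The first-difference construction is straightforward to sketch. The spend grid and response curve below are toy values standing in for one channel's columns in the response curve data, not DSAMbayes output.

```python
# Marginal response as the first difference of a (toy) response curve:
# the additional response per additional unit of spend.

import numpy as np

spend = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
response = spend / (spend + 15.0)              # toy saturating curve

marginal = np.diff(response) / np.diff(spend)  # extra response per unit spend

# Diminishing returns: the marginal curve slopes downward.
assert np.all(np.diff(marginal) < 0)
```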
Warning signs
Marginal ROI near zero at the optimised point: The channel is at or near saturation. Additional spend yields negligible incremental response.
Marginal ROI that increases with spend: This implies increasing returns, which is unusual for media. It may indicate a response curve misspecification or insufficient data in the high-spend region.
Large differences in marginal ROI at the optimised points across channels: Constraints are binding. The allocator cannot equalise marginal returns because spend bounds prevent it.
Action
Use marginal ROI to identify which channels have headroom (high marginal ROI at the optimised point) and which are saturated (marginal ROI near zero). This informs not just the current allocation but also the value of relaxing spend constraints.
Spend share comparison
Filename: budget_spend_share.png
What it shows
Two horizontal stacked bars showing the percentage allocation of total budget across channels: one for the current allocation and one for the optimised allocation. Percentage labels appear within each segment (for segments ≥ 4% of total). The subtitle reports the total budget in currency or model units for both allocations.
When it is generated
The runner generates this plot whenever the optimisation result includes a roi_cpa summary with spend_reference and spend_optimised columns.
How to interpret it
This is the most intuitive optimisation output for non-technical stakeholders. It answers: “how should we split the budget?” Segments that grow from current to optimised represent channels the allocator recommends investing more in; segments that shrink represent channels to reduce.
Warning signs
A channel disappears (0% share) in the optimised allocation: The allocator has hit the channel’s spend floor (which may be zero). If this is unintended, raise the minimum spend constraint.
Allocations are nearly identical: The current mix is already near-optimal, or the model cannot distinguish channel effects well enough to justify reallocation.
Very small segments in both allocations: Channels with negligible spend share contribute little to the optimisation. Consider whether they should be included or grouped.
Action
Present this chart as the primary recommendation visual. Accompany it with the confidence comparison to communicate the certainty of the recommendation and the allocation impact chart to show the expected consequence.
Cross-references
Post-run plots — decomposition that informs the optimisation inputs
Model selection plots — LOO diagnostics that validate the model underlying these recommendations
Provide task-oriented recipes for common DSAMbayes operational workflows. Each guide starts from a user objective, gives minimal reproducible steps, and includes expected output artefacts and quick verification checks.
Audience
Users who know the concepts but need execution steps.
Read and act on the diagnostics report produced by a DSAMbayes runner execution, understanding which checks matter most and what remediation steps to take.
Prerequisites
A completed runner run execution with artefacts under 40_diagnostics/.
Compare multiple DSAMbayes runner executions and select a candidate model for reporting or decision-making, using predictive scoring and diagnostic summaries.
Prerequisites
Two or more completed runner run executions (MCMC fit method).
Artefacts under 50_model_selection/ for each run (LOO summary, ELPD outputs).
Observations with k > 0.7 indicate unreliable LOO estimates.
If many observations have high Pareto-k values, the LOO approximation is unreliable for that run. Consider time-series cross-validation as an alternative.
4. Review time-series CV (if available)
If diagnostics.time_series_selection.enabled: true was configured, check:
This provides expanding-window blocked CV scores (holdout ELPD, RMSE, SMAPE) that are more appropriate for time-series data than standard LOO.
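The expanding-window scheme and SMAPE can be sketched as follows. The fold sizes and the exact SMAPE variant (mean of 2|y − ŷ| / (|y| + |ŷ|)) are illustrative assumptions, not the DSAMbayes defaults.

```python
# Sketch of expanding-window blocked CV splits and a SMAPE metric. Fold
# sizing and the SMAPE formula are illustrative assumptions.

import numpy as np

def expanding_window_splits(n, initial, horizon):
    """Yield (train_idx, test_idx) pairs with a growing training window."""
    start = initial
    while start + horizon <= n:
        yield np.arange(start), np.arange(start, start + horizon)
        start += horizon

def smape(y, yhat):
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return np.mean(2 * np.abs(y - yhat) / (np.abs(y) + np.abs(yhat)))

# 104 weekly observations, one year of initial training, quarterly holdouts
splits = list(expanding_window_splits(n=104, initial=52, horizon=13))
print(len(splits))  # 4
```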
5. Cross-reference diagnostics
For each candidate run, check the diagnostics overall status:
head -1 results/<run_dir>/40_diagnostics/diagnostics_report.csv
A model with better ELPD but failing diagnostics should not be preferred over a model with slightly lower ELPD and passing diagnostics.
6. Compare fit quality visually
Review the fit time series and scatter plots in 20_model_fit/ for each run:
Fit time series — does the model track the observed KPI?
Fit scatter — is the predicted-vs-observed relationship close to the diagonal?
Posterior forest — are coefficient estimates reasonable and well-identified?
7. Selection decision matrix
| Criterion | Weight | Run A | Run B |
| --- | --- | --- | --- |
| ELPD (higher is better) | High | value | value |
| Pareto-k reliability (fewer high-k) | High | value | value |
| Diagnostics overall status | High | pass/warn/fail | pass/warn/fail |
| TSCV holdout RMSE (if available) | Medium | value | value |
| Coefficient plausibility | Medium | judgement | judgement |
| Fit visual quality | Low | judgement | judgement |
8. Record the selection
Document the selected run directory and rationale. If using the runner for release evidence, the selected run’s artefacts form part of the evidence pack.
Caveats
ELPD is not causal validation. Predictive scoring measures out-of-sample predictive accuracy, not whether the model identifies causal media effects correctly.
Pooled models do not support time-series CV (rejected by config validation).
MAP-fitted models do not produce LOO diagnostics. Use MCMC for model comparison.
| Term | Definition | Reference |
| --- | --- | --- |
| quality gates | Canonical release quality gates for lint, style, tests, package check, runner smoke, and docs build. | docs/internal/quality-gates.md |
| response scale | Scale used inside the fitted model (identity or log). | docs/modelling/response-scale-semantics.md |
| Rhat | Convergence diagnostic comparing within- and between-chain variance. | Chain diagnostics outputs |
| runner | YAML/CLI execution layer around core DSAMbayes APIs. | scripts/dsambayes.R, R/run_from_yaml.R |
| run_dir | Output directory used by a runner validate/run execution. | docs/runner/output-artifacts.md |
| staged layout | Structured artefact layout with numbered folders (00_ to 70_). | docs/runner/output-artifacts.md |
| Stan cache | Compiled model cache location, typically under XDG_CACHE_HOME. | install/setup docs |
| SMAPE | Symmetric mean absolute percentage error metric used in fit summaries. | R/stats.R |
| time-components | Managed time control features, including holiday-derived regressors. | R/holiday_calendar.R |
| tscv | Time-series selection artefact prefix for blocked CV outputs. | 50_model_selection/tscv_*.csv |
| warmup | Initial MCMC iterations used for adaptation and excluded from posterior draws. | fit.mcmc.warmup |
| Hill transform | Saturation function spend^n / (spend^n + k^n) used in budget optimisation response curves; k is the half-saturation point, n is the shape parameter. | R/optimise_budget.R |
| atan transform | Saturation function atan(spend / scale) mapping spend to a bounded response. | R/optimise_budget.R |
| log1p transform | Saturation function log(1 + spend / scale) providing diminishing-returns concavity. | R/optimise_budget.R |
| adstock | Media carry-over transform that spreads a spend effect over subsequent periods via geometric decay; applied as a pre-transform in the data, not estimated within DSAMbayes. | Formula transforms |
| conditional mean | Bias-corrected back-transform for log-response models: exp(mu + sigma^2/2); default in v1.2.2 for fitted_kpi(). | R/fitted.R |
| Jensen's inequality | Mathematical property that E[exp(X)] != exp(E[X]) when X has non-zero variance; DSAMbayes avoids this bias by applying exp() draw-wise before summarising. | |
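The saturation formulas and the log-response back-transform defined above can be sketched in base R. This is illustrative only; the function names (`hill`, `atan_sat`, `log1p_sat`) are invented for this example and are not the DSAMbayes API.

```r
# Saturation transforms, as defined in the glossary (illustrative names).
hill      <- function(spend, k, n) spend^n / (spend^n + k^n)
atan_sat  <- function(spend, scale) atan(spend / scale)
log1p_sat <- function(spend, scale) log(1 + spend / scale)

hill(100, k = 100, n = 2)  # 0.5: spend at the half-saturation point k

# Jensen's inequality for log-response models: exponentiating the posterior
# mean understates E[KPI]; exponentiating draw-wise (or using the analytic
# correction exp(mu + sigma^2/2) for normal errors) does not.
set.seed(1)
mu <- 5; sigma <- 0.8
draws <- rnorm(1e5, mu, sigma)
exp(mean(draws))        # median-style back-transform, biased low
mean(exp(draws))        # draw-wise conditional mean
exp(mu + sigma^2 / 2)   # analytic conditional mean, the v1.2.2 default
```

The gap between the last two lines and the first one grows with sigma, which is why the v1.2.2 back-transform change matters most for noisy log-response fits.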
Provide a traceability reference that maps DSAMbayes issues and recommendations to implementation status and evidence.
Authoritative data source
The single source of truth for all issue and recommendation status is:
code_review/audit_report/issue_register.csv
This register contains every ENG, INF, and GOV issue and recommendation with columns for status, severity, owner, linked IDs, notes, and a long-form explanation field.
Two stakeholder-facing summary CSVs are published alongside this page under docs/appendices/traceability-data/:
CI deployment: .github/workflows/pkgdown.yaml builds and deploys to gh-pages on push to main or release events.
Output: generated site under docs/ (pkgdown output, not the Markdown docs).
2. Markdown documentation site (this site)
The docs/ directory contains hand-authored Markdown files organised into sections (Getting Started, Runner, Modelling, Plots, How-To, Appendices). These are configured via docs/docs-config.json.
Site generator: the configuration structure (docs-config.json with navigation, branding, and search settings) is designed for a static site generator. The specific generator and hosting are configured in the deployment environment.
Preview locally: open Markdown files directly, or use any Markdown preview tool. The inter-page links use root-relative paths (e.g. /getting-started/install-and-setup).
Deploy target: https://dsambayes.docs.wppma.space/ (as referenced in README.md).
Configuration
docs/docs-config.json defines:
metadata — site name, description, version.
branding — logo, favicon, primary colour.
navigation — navbar links and sidebar structure.
features — math rendering (enabled), search (local).
Adding a new page
Create the Markdown file in the appropriate section directory (e.g. docs/modelling/new-page.md).
Add a sidebar entry in docs/docs-config.json under the appropriate section.
Add a row to the section’s index.md page table.
Update docs/_plan/content-map.md if tracking authoring status.
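For step 2, a sidebar entry might look like the fragment below. The field names (`title`, `path`) are illustrative assumptions only; mirror the existing entries in docs/docs-config.json rather than this sketch.

```json
{
  "navigation": {
    "sidebar": {
      "Modelling": [
        { "title": "Existing Page", "path": "/modelling/existing-page" },
        { "title": "New Page", "path": "/modelling/new-page" }
      ]
    }
  }
}
```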
Related pages
CI Workflows — automated build and deploy workflows
Recommended environment setup before running gates:
```sh
# Navigate to your local DSAMbayes checkout
cd /path/to/DSAMbayes
mkdir -p .Rlib .cache
export R_LIBS_USER="$PWD/.Rlib"
export XDG_CACHE_HOME="$PWD/.cache"
```
Optional consolidated local gate (does not replace all release gates):
```sh
# Navigate to your local DSAMbayes checkout
cd /path/to/DSAMbayes
mkdir -p .Rlib .cache
export R_LIBS_USER="$PWD/.Rlib"
export XDG_CACHE_HOME="$PWD/.cache"
```
Expected outcome: checks run in a repo-scoped environment with reproducible library and cache paths.
Define the minimal reproducible smoke-test matrix for the YAML runner validate and run commands, including exact commands and expected artefacts.
Audience
Maintainers preparing release evidence
Engineers triaging runner regressions
Reviewers confirming gate QG-5 and QG-6
Test scope
This smoke suite is intentionally small. It proves:
CLI argument handling for validate and run
Config resolution and runner pre-flight path
End-to-end artefact writing for one full run
This smoke suite does not replace unit tests or full package checks.
Preconditions
Run from repository root:
```sh
# Navigate to your local DSAMbayes checkout
cd /path/to/DSAMbayes
mkdir -p .Rlib .cache
export R_LIBS_USER="$PWD/.Rlib"
export XDG_CACHE_HOME="$PWD/.cache"
```
Expected outcome: commands resolve local package/library paths and use repo-scoped Stan cache.
Install DSAMbayes locally if needed:
R_LIBS_USER="$PWD/.Rlib" R -q -e 'install.packages(".", repos = NULL, type = "source")'
Expected outcome: library(DSAMbayes) succeeds in the same shell session.
Smoke-test matrix
| Test ID | Command | Config | Run directory | Expected result |
| --- | --- | --- | --- | --- |
| SMK-VAL-01 | validate | config/blm_synthetic_mcmc.yaml | results/smoke_validate_blm | Exit code 0; metadata artefacts written. |
| SMK-VAL-02 | validate | config/hierarchical_re_synthetic_mcmc.yaml | results/smoke_validate_hier_re | Exit code 0; metadata artefacts written. |
| SMK-VAL-03 | validate | config/pooled_synthetic_mcmc.yaml | results/smoke_validate_pooled | Exit code 0; metadata artefacts written. |
| SMK-RUN-01 | run | config/blm_synthetic_mcmc.yaml | results/smoke_run_blm | Exit code 0; core fit, post-run, and diagnostics artefacts written. |
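After SMK-RUN-01, the expected artefacts follow the staged layout (numbered folders 00_ to 70_). A minimal shell check along these lines can confirm that layout is present; the function name and messages are invented for this sketch, and the exact folder suffixes should be taken from docs/runner/output-artifacts.md.

```shell
#!/bin/sh
# Sketch: confirm a runner output directory uses the staged layout
# (numbered top-level folders matching 00_* .. 70_*).
check_staged_layout() {
  dir="$1"
  count=$(find "$dir" -maxdepth 1 -type d -name '[0-7]0_*' | wc -l)
  count=$((count))  # normalise wc padding on BSD systems
  if [ "$count" -gt 0 ]; then
    echo "staged layout detected: $count numbered folders"
    return 0
  fi
  echo "no numbered artefact folders found in $dir" >&2
  return 1
}

# Example: check_staged_layout results/smoke_run_blm
```

A non-zero exit code from the check makes it easy to wire into a CI smoke step alongside the exit-code assertions in the matrix above.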
Define the operational release flow from freeze to tag for DSAMbayes v1.2.2, including rollback and hotfix handling.
Audience and roles
Release owner (cshaw): drives the checklist, evidence pack, and final go or no-go call.
Maintainers: execute gates, review failures, and approve release PRs.
Reviewers: verify evidence and sign off risk acceptance.
Release inputs
Release work starts only when these inputs are ready:
CHANGELOG.md reflects the release candidate contents.
DESCRIPTION has the intended release version.
Required docs pages for quality and runner workflows are in place.
Candidate commit hash is identified on main or master.
Step-by-step flow
1. Freeze the release candidate
Actions:
Announce code freeze window and candidate commit hash.
Stop merging non-release changes until gate outcome is known.
Confirm release scope is documentation and refactor changes planned for v1.2.2.
Expected outcome: one stable candidate commit is selected for gate execution.
2. Prepare local release environment
Actions:
Create repo-local paths and environment variables.
Install local package into .Rlib.
Commands:
```sh
# Navigate to your local DSAMbayes checkout
cd /path/to/DSAMbayes
mkdir -p .Rlib .cache
export R_LIBS_USER="$PWD/.Rlib"
export XDG_CACHE_HOME="$PWD/.cache"
R_LIBS_USER="$PWD/.Rlib" R -q -e 'install.packages(".", repos = NULL, type = "source")'
```
Expected outcome: release checks run in a reproducible local environment.
3. Run mandatory quality gates
Execute gates QG-1 to QG-7 from Quality Gates in order.