Concepts

What is DSAMbayes?

DSAMbayes is an R package that fits Bayesian marketing mix models (MMM) using Stan. It provides a familiar lm()-style interface for specifying models, adds prior and boundary controls, and delegates estimation to Stan’s Hamiltonian Monte Carlo (HMC) sampler. The result is a full posterior distribution over model parameters — not just point estimates — enabling rigorous uncertainty quantification for media contribution and budget allocation decisions.

Why Bayesian MMM?

Classical OLS regression gives point estimates and confidence intervals whose validity depends on the model being correctly specified. In MMM, media variables are collinear, sample sizes are small (often 100–200 weeks), and the functional form is uncertain, so OLS estimates are often unstable and their intervals unreliable.

Bayesian estimation addresses this by:

  • Regularisation through priors — weakly informative priors stabilise estimates when the data alone cannot separate correlated effects. This is particularly valuable for media channels with overlapping campaign timing.
  • Hard parameter constraints — boundary constraints (e.g. media coefficients must be non-negative) are enforced directly in the posterior, rather than post-hoc.
  • Full uncertainty propagation — every downstream output (fitted values, decomposition, budget allocation) carries the full posterior uncertainty, not just a point estimate ± standard error.
  • Principled model comparison — leave-one-out cross-validation (LOO-CV) via Pareto-smoothed importance sampling provides predictive model comparison without refitting.
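
The regularisation point can be illustrated without the package. The following is a minimal base R sketch, not DSAMbayes code: it simulates two nearly collinear media variables and compares OLS with the posterior mean under independent normal(0, prior_sd) priors. The variable names, the prior scale of 0.5, and the assumption that the residual standard deviation is known are all chosen for illustration; with a known residual SD the posterior mean has the closed ridge-like form used below.

```r
# Sketch: a normal(0, prior_sd) prior on each coefficient acts like ridge shrinkage.
# Simulated data; the residual SD is treated as known to keep the algebra closed-form.
set.seed(1)
n     <- 120                               # roughly 2.3 years of weekly data
tv    <- rnorm(n)
video <- tv + rnorm(n, sd = 0.1)           # nearly collinear with tv
X     <- cbind(tv = tv, video = video)
sigma <- 1
y     <- drop(X %*% c(0.5, 0.5)) + rnorm(n, sd = sigma)

# OLS without an intercept, because the simulated data have none
beta_ols <- solve(crossprod(X), crossprod(X, y))

# Posterior mean under independent normal(0, prior_sd) priors and known sigma:
# (X'X + (sigma^2 / prior_sd^2) I)^{-1} X'y
prior_sd  <- 0.5                           # illustrative, not the package default
lambda    <- sigma^2 / prior_sd^2
beta_post <- solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))

print(round(cbind(ols = drop(beta_ols), posterior_mean = drop(beta_post)), 3))
```

The prior cannot tell the two channels apart any better than the data can, but it does pull the coefficient vector towards zero, which damps the large offsetting estimates that collinearity tends to produce.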

Model classes

DSAMbayes supports three model classes, all sharing the same post-fit interface:

BLM (Bayesian Linear Model)

The simplest class. One market, one response variable, one set of predictors. Comparable to lm(), but estimated with Bayesian methods.

kpi ~ trend + seasonality + holidays + media_channels

Use when: single-market modelling with sufficient data (typically 100+ weeks).

Hierarchical

Panel data with multiple groups (e.g. markets, regions, brands). Random effects allow each group to deviate from the population average while sharing information across groups (partial pooling).

kpi ~ population_terms + (varying_terms | group)

Use when: multi-market data where you want to borrow strength across markets while allowing market-specific effects.
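
Partial pooling can be sketched with the textbook normal-normal shrinkage formula. This is a base R illustration rather than the DSAMbayes implementation: the market names, effect sizes, and the within-market and between-market standard deviations are hypothetical and treated as known, whereas a fitted hierarchical model estimates them from the data.

```r
# Sketch of partial pooling for group-level effects (normal-normal shrinkage).
# All numbers are hypothetical; sigma and tau are treated as known.
group_mean <- c(uk = 0.80, de = 0.30, fr = 0.60)   # per-market estimates fitted separately
n_obs      <- c(uk = 150,  de = 20,   fr = 60)     # weeks of data per market
sigma <- 1.0                                       # within-market residual SD
tau   <- 0.15                                      # between-market SD of the true effects
mu    <- mean(group_mean)                          # stand-in for the population effect

# The weight on a market's own estimate grows with the amount of data it has
w      <- tau^2 / (tau^2 + sigma^2 / n_obs)
pooled <- w * group_mean + (1 - w) * mu

print(round(rbind(weight = w, separate = group_mean, partially_pooled = pooled), 3))
```

Markets with little data are pulled strongly towards the population effect, while well-observed markets keep estimates close to their own data.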

Pooled

Single-market data where media variables have a nested structure (e.g. campaign > channel > platform). Coefficients are pooled across labelled dimensions.

blm(formula, data) %>% pool(grouping_vars, variable_map)

Use when: single-market data with structured media hierarchies.

The estimation pipeline

Every DSAMbayes model follows the same lifecycle:

  1. Construct: blm() creates an unfitted model object with default priors and boundaries.
  2. Configure: set_prior(), set_boundary(), set_cre() adjust the specification.
  3. Fit: fit() (MCMC) samples from the posterior; optimise() (MAP) finds its mode.
  4. Extract: get_posterior(), fitted(), decomp() retrieve results.
  5. Decide: optimise_budget() translates estimates into budget allocation recommendations.

Key concepts for practitioners

Priors

A prior distribution encodes what you believe about a parameter before seeing the data. In DSAMbayes:

  • Default priors are normal(0, 5) — weakly informative, centred at zero.
  • Informative priors can be set when domain knowledge justifies it (e.g. price_index ~ normal(-0.2, 0.1) if you know price has a small negative effect).
  • Priors are specified using formula notation: set_prior(model, m_tv ~ normal(0, 10)).
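
To see what these priors imply in practice, it can help to look at the range of values each one considers plausible. The sketch below uses only base R; prior_interval() is a hypothetical helper written for this example, not a DSAMbayes function.

```r
# Central 95% prior intervals implied by the two priors mentioned above.
prior_interval <- function(mean, sd, level = 0.95) {
  alpha <- 1 - level
  q <- qnorm(c(alpha / 2, 1 - alpha / 2), mean = mean, sd = sd)
  setNames(q, c("lower", "upper"))
}

default_prior <- prior_interval(0, 5)        # normal(0, 5)
price_prior   <- prior_interval(-0.2, 0.1)   # normal(-0.2, 0.1)

print(round(rbind(default = default_prior, price_index = price_prior), 3))
```

Whether an interval of this width counts as weakly informative depends on the scale of the predictor and the response, so it is worth checking the implied range against the units of the data.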

Boundaries

Hard constraints on parameter values. The posterior density is zero outside the boundary:

  • set_boundary(model, m_tv > 0) forces the TV coefficient to be non-negative.
  • Default boundaries are unconstrained (-Inf, Inf).
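
The effect of a boundary on a prior can be sketched in base R. The example assumes a normal(0, 5) prior combined with a lower bound at zero, which gives a half-normal distribution; it illustrates truncation in general and is not a description of how DSAMbayes or Stan implement constraints internally.

```r
# Sketch: a lower bound at 0 turns a normal(0, 5) prior into a half-normal.
set.seed(42)
prior_sd <- 5
lower    <- 0

# Inverse-CDF sampling from normal(0, prior_sd) truncated to (lower, Inf)
p_lower <- pnorm(lower, mean = 0, sd = prior_sd)
u       <- runif(10000, min = p_lower, max = 1)
draws   <- qnorm(u, mean = 0, sd = prior_sd)

analytic_mean <- prior_sd * sqrt(2 / pi)     # mean of a half-normal with scale prior_sd

cat("share of draws below the bound:", mean(draws < lower), "\n")
cat("simulated mean:", round(mean(draws), 2),
    " analytic mean:", round(analytic_mean, 2), "\n")
```

A boundary therefore changes the prior mean as well as the support: a zero-centred prior with a positivity constraint no longer expects an effect of zero.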

MCMC vs MAP

  • MCMC (fit()) draws samples from the full posterior distribution. Slower but gives complete uncertainty quantification. Use for final reporting.
  • MAP (optimise()) finds the single most probable parameter vector. Much faster but gives only a point estimate. Use for rapid iteration during model development.
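
The difference between the two approaches can be shown on a one-parameter toy model. The sketch below is base R only and does not call DSAMbayes: it assumes y ~ normal(mu, 1) with a normal(0, 5) prior on mu, uses optimize() for the MAP estimate, and uses a random-walk Metropolis sampler as a dependency-free stand-in for Stan's HMC.

```r
# Sketch: MAP versus MCMC on a one-parameter model (not DSAMbayes code).
set.seed(123)
y <- rnorm(50, mean = 2, sd = 1)

# Log posterior up to a constant: likelihood plus normal(0, 5) prior
log_post <- function(mu) {
  sum(dnorm(y, mean = mu, sd = 1, log = TRUE)) +
    dnorm(mu, mean = 0, sd = 5, log = TRUE)
}

# MAP: maximise the log posterior, which returns a single number
map_est <- optimize(log_post, interval = c(-10, 10), maximum = TRUE)$maximum

# MCMC: random-walk Metropolis, which returns a set of posterior draws
n_iter     <- 6000
draws      <- numeric(n_iter)
current    <- 0
current_lp <- log_post(current)
for (i in seq_len(n_iter)) {
  proposal    <- current + rnorm(1, sd = 0.3)
  proposal_lp <- log_post(proposal)
  if (log(runif(1)) < proposal_lp - current_lp) {   # accept or keep the current value
    current    <- proposal
    current_lp <- proposal_lp
  }
  draws[i] <- current
}
draws <- draws[-(1:1000)]                           # discard warm-up iterations

cat("MAP estimate:  ", round(map_est, 3), "\n")
cat("Posterior mean:", round(mean(draws), 3), "\n")
cat("95% interval:  ", round(quantile(draws, c(0.025, 0.975)), 3), "\n")
```

In this symmetric toy posterior the MAP estimate and the posterior mean nearly coincide, but only the draws provide an interval. In skewed or constrained posteriors the two summaries can differ noticeably.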

Response scale

Models can operate on the original KPI scale (identity response) or the log scale:

  • Identity (kpi ~ ...): coefficients represent unit changes in KPI.
  • Log (log(kpi) ~ ...): coefficients represent approximate proportional changes; a coefficient of 0.05 is roughly a 5% change in KPI per unit increase in the predictor.

Log-response models require careful back-transformation to the KPI scale. DSAMbayes handles this automatically via fitted_kpi(). See Response Scale Semantics.
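
One reason back-transformation needs care is that exponentiating an average is not the same as averaging exponentiated values. The base R sketch below uses hypothetical posterior draws of log(kpi) for a single week; it does not reproduce what fitted_kpi() does internally, only the pitfall it is meant to avoid.

```r
# Sketch: why back-transforming a log-response model needs care.
# Hypothetical posterior draws of log(kpi) for a single week.
set.seed(7)
log_kpi_draws <- rnorm(4000, mean = log(1000), sd = 0.3)

naive     <- exp(mean(log_kpi_draws))    # back-transform the posterior mean
draw_wise <- mean(exp(log_kpi_draws))    # back-transform each draw, then average

cat("exp(mean of log draws):", round(naive), "\n")
cat("mean of exp(draws):    ", round(draw_wise), "\n")

# A log-scale coefficient b is only approximately a percentage change
b <- c(0.02, 0.10, 0.50)
print(round(cbind(coefficient = b, exact_pct_change = 100 * (exp(b) - 1)), 1))
```

By Jensen's inequality the draw-wise average is never smaller than the naive value, and the gap widens as posterior uncertainty grows. The second table shows that reading a coefficient directly as a percentage works well for small values and increasingly understates the change for larger ones.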

Diagnostics

After fitting, DSAMbayes runs a battery of diagnostic checks:

  • Sampler quality — Rhat, effective sample size, divergences.
  • Residual behaviour — autocorrelation, normality, Ljung-Box test.
  • Identifiability — baseline-media correlation.
  • Boundary monitoring — share of draws hitting constraints.

Each check produces a pass, warn, or fail status. See Diagnostics Gates.
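
Two of these checks can be approximated in base R to show what they measure. The sketch below is not the DSAMbayes implementation: rhat() is the classic, non-split Gelman-Rubin statistic, gate() is a hypothetical helper, and the warn and fail thresholds are illustrative values rather than the package's actual gates. The residual check uses Box.test() from the stats package.

```r
# Sketch of two diagnostic checks in base R (not the DSAMbayes implementation).
set.seed(99)

# Classic (non-split) Gelman-Rubin Rhat for an iterations x chains matrix of draws
rhat <- function(draws) {
  n       <- nrow(draws)
  within  <- mean(apply(draws, 2, var))      # W: mean within-chain variance
  between <- n * var(colMeans(draws))        # B: between-chain variance
  sqrt(((n - 1) / n * within + between / n) / within)
}

# Hypothetical traffic-light rule; thresholds are illustrative only
gate <- function(value, warn_at, fail_at) {
  if (value >= fail_at) "fail" else if (value >= warn_at) "warn" else "pass"
}

good_chains <- matrix(rnorm(4000), ncol = 4)                   # four chains that agree
bad_chains  <- good_chains + rep(c(0, 0, 3, 3), each = 1000)   # two chains are offset

cat("Rhat, agreeing chains:", round(rhat(good_chains), 3),
    gate(rhat(good_chains), 1.01, 1.05), "\n")
cat("Rhat, offset chains:  ", round(rhat(bad_chains), 3),
    gate(rhat(bad_chains), 1.01, 1.05), "\n")

# Ljung-Box test for residual autocorrelation on simulated residuals
white_noise <- rnorm(150)
ar_resid    <- as.numeric(arima.sim(list(ar = 0.7), n = 150))
lb_white    <- Box.test(white_noise, lag = 10, type = "Ljung-Box")$p.value
lb_ar       <- Box.test(ar_resid, lag = 10, type = "Ljung-Box")$p.value

cat("Ljung-Box p-value, white noise:", signif(lb_white, 3), "\n")
cat("Ljung-Box p-value, AR(1) 0.7:  ", signif(lb_ar, 3), "\n")
```

Chains that sample the same distribution give an Rhat close to 1, while chains stuck in different regions push it well above 1. A small Ljung-Box p-value indicates autocorrelation left in the residuals, which often points to missing trend, seasonality, or carry-over terms.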

The YAML runner

For reproducible, configuration-driven runs, DSAMbayes provides a CLI runner:

Rscript scripts/dsambayes.R run --config config/my_model.yaml

The runner reads a YAML file specifying the data, formula, priors, boundaries, fit settings, diagnostics policy, and optional budget optimisation. It writes structured artefacts (CSVs, plots, model objects) to a timestamped directory under results/. See CLI Usage and Config Schema.

Further reading