```bash
# 1. Create repo-local library and cache directories
mkdir -p .Rlib .cache

# 2. Set environment variables (add to .bashrc/.zshrc for persistence)
export R_LIBS_USER="$PWD/.Rlib"
export XDG_CACHE_HOME="$PWD/.cache"

# 3. Install DSAMbayes from the local checkout
R_LIBS_USER="$PWD/.Rlib" R -q -e 'install.packages(".", repos = NULL, type = "source")'
```
This keeps all package libraries and Stan compilation caches inside the repo, avoiding permission issues with system library paths.
Verify the installation
1. Confirm DSAMbayes loads
R_LIBS_USER="$PWD/.Rlib" R -q -e 'library(DSAMbayes); cat("Version:", as.character(utils::packageVersion("DSAMbayes")), "\n")'
Expected: prints Version: 1.2.2 (or current version).
Install failure
Symptom: install.packages(".", repos = NULL, type = "source") errors.
Actions:
Confirm you are in the repository root directory.
Confirm .Rlib exists and is writable: ls -la .Rlib.
Check for missing system dependencies in the error output.
Stale Stan cache
Symptom: unexpected model behaviour after updating the package.
Actions:
Clear the cache: rm -rf .cache/dsambayes.
Re-run with model.force_recompile: true in your config (or leave default false — v1.2.2 auto-detects stale caches).
Permission issues
Symptom: write failures for library, cache, or run outputs.
Actions:
Ensure .Rlib, .cache, and results/ are writable.
Keep R_LIBS_USER and XDG_CACHE_HOME set in your shell session.
Run all commands from the repository root.
Concepts
What is DSAMbayes?
DSAMbayes is an R package that fits Bayesian marketing mix models (MMM) using Stan. It provides a familiar lm()-style interface for specifying models, adds prior and boundary controls, and delegates estimation to Stan’s Hamiltonian Monte Carlo (HMC) sampler. The result is a full posterior distribution over model parameters — not just point estimates — enabling rigorous uncertainty quantification for media contribution and budget allocation decisions.
Why Bayesian MMM?
Classical OLS regression gives point estimates and confidence intervals that assume the model is correctly specified. In MMM, where media variables are collinear, sample sizes are small (often 100–200 weeks), and the functional form is uncertain, these assumptions are routinely violated.
Bayesian estimation addresses this by:
Regularisation through priors — weakly informative priors stabilise estimates when the data alone cannot separate correlated effects. This is particularly valuable for media channels with overlapping campaign timing.
Hard parameter constraints — boundary constraints (e.g. media coefficients must be non-negative) are enforced directly in the posterior, rather than post-hoc.
Full uncertainty propagation — every downstream output (fitted values, decomposition, budget allocation) carries the full posterior uncertainty, not just a point estimate ± standard error.
Principled model comparison — leave-one-out cross-validation (LOO-CV) via Pareto-smoothed importance sampling provides predictive model comparison without refitting.
Model classes
DSAMbayes supports three model classes, all sharing the same post-fit interface:
BLM (Bayesian Linear Model)
The simplest class. One market, one response variable, one set of predictors. Equivalent to lm() but with Bayesian estimation.
Use when: single-market modelling with sufficient data (typically 100+ weeks).
Hierarchical
Panel data with multiple groups (e.g. markets, regions, brands). Random effects allow each group to deviate from the population average while sharing information across groups (partial pooling).
kpi ~ population_terms + (varying_terms | group)
Use when: multi-market data where you want to borrow strength across markets while allowing market-specific effects.
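For illustration, a formula of that shape using the panel columns from the hierarchical walkthrough later in this document (kpi_value, market, media and control variables) might look like the following — which terms vary by group is a modelling choice, not a package requirement:

```r
# Hypothetical hierarchical spec: population-level effects for every term,
# with market-specific deviations for the media channels only
f <- kpi_value ~ m_tv + m_search + trend + seasonality +
  (m_tv + m_search | market)

all.vars(f)  # the formula is plain R syntax and can be inspected before fitting
```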
Pooled
Single-market data where media variables have a nested structure (e.g. campaign > channel > platform). Coefficients are pooled across labelled dimensions.
Decide — optimise_budget() translates estimates into budget allocation recommendations.
Key concepts for practitioners
Priors
A prior distribution encodes what you believe about a parameter before seeing the data. In DSAMbayes:
Default priors are normal(0, 5) — weakly informative, centred at zero.
Informative priors can be set when domain knowledge justifies it (e.g. price_index ~ normal(-0.2, 0.1) if you know price has a small negative effect).
Priors are specified using formula notation: set_prior(model, m_tv ~ normal(0, 10)).
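The regularising effect of a weakly informative prior can be seen without DSAMbayes at all. For a linear model with Gaussian noise, the posterior mean under a normal(0, tau) coefficient prior is the ridge estimator, which stabilises collinear coefficients — a minimal base-R sketch:

```r
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- 0.95 * x1 + 0.05 * rnorm(n)   # two highly collinear "channels"
y  <- 1.0 * x1 + 1.0 * x2 + rnorm(n)
X  <- cbind(x1, x2)

# OLS is unstable when predictors are this collinear
b_ols <- solve(t(X) %*% X, t(X) %*% y)

# Posterior mean under coef ~ normal(0, tau) with noise sd 1:
# (X'X + (1 / tau^2) I)^{-1} X'y — exactly the ridge estimator
tau    <- 1
b_post <- solve(t(X) %*% X + (1 / tau^2) * diag(2), t(X) %*% y)

cbind(ols = b_ols, posterior = b_post)
```

The prior pulls the two correlated coefficients toward zero and toward each other, at the cost of a small, controlled bias.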
Boundaries
Hard constraints on parameter values. The posterior density is zero outside the boundary:
set_boundary(model, m_tv > 0) forces the TV coefficient to be non-negative.
Default boundaries are unconstrained (-Inf, Inf).
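To build intuition for what a boundary does to the posterior, here is a base-R illustration using rejection of draws. (Stan enforces the constraint inside the sampler rather than by rejection; this only pictures the resulting truncated distribution.)

```r
set.seed(42)
# Unconstrained normal(0.5, 1) draws standing in for a media coefficient's posterior
draws <- rnorm(10000, mean = 0.5, sd = 1)

# A boundary like m_tv > 0 sets the posterior density to zero below 0;
# the constrained posterior is the same shape renormalised on (0, Inf)
constrained <- draws[draws > 0]

c(unconstrained_mean = mean(draws), constrained_mean = mean(constrained))
```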
MCMC vs MAP
MCMC (fit()) draws samples from the full posterior distribution. Slower but gives complete uncertainty quantification. Use for final reporting.
MAP (optimise()) finds the single most probable parameter vector. Much faster but gives only a point estimate. Use for rapid iteration during model development.
Response scale
Models can operate on the original KPI scale (identity response) or the log scale:
Identity: kpi ~ ... — coefficients represent unit changes in KPI.
Log: log(kpi) ~ ... — coefficients represent multiplicative effects. Log-response models require careful back-transformation to the KPI scale; DSAMbayes handles this automatically via fitted_kpi(). See Response Scale Semantics.
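The back-transformation subtlety is easy to demonstrate in base R: exponentiating a log-scale prediction gives the median of the KPI, not its mean — the lognormal mean needs a sigma²/2 correction. (This sketch illustrates the principle only; it says nothing about fitted_kpi()'s internal implementation.)

```r
set.seed(7)
mu <- 2; sigma <- 0.5

# If log(kpi) ~ normal(mu, sigma), exp(mu) is the MEDIAN of kpi, not the mean
kpi <- exp(rnorm(1e5, mu, sigma))

naive     <- exp(mu)                # back-transform without correction
corrected <- exp(mu + sigma^2 / 2)  # lognormal mean

c(empirical = mean(kpi), naive = naive, corrected = corrected)
```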
Diagnostics
After fitting, DSAMbayes runs a battery of diagnostic checks:
Boundary monitoring — share of draws hitting constraints.
Each check produces a pass, warn, or fail status. See Diagnostics Gates.
The YAML runner
For reproducible, configuration-driven runs, DSAMbayes provides a CLI runner:
Rscript scripts/dsambayes.R run --config config/my_model.yaml
The runner reads a YAML file specifying the data, formula, priors, boundaries, fit settings, diagnostics policy, and optional budget optimisation. It writes structured artefacts (CSVs, plots, model objects) to a timestamped directory under results/. See CLI Usage and Config Schema.
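A hedged sketch of what such a config might look like. Only model.force_recompile (mentioned in the troubleshooting section) and the section topics listed above (data, formula, priors, boundaries) come from this page; every key name and value below is an illustrative assumption — the authoritative schema is in Config Schema.

```yaml
# Illustrative only — consult Config Schema for the real field names
data:
  path: data/synthetic_dsam_example_wide_data.csv
formula: kpi ~ m_tv + m_search + trend + seasonality
priors:
  - m_tv ~ normal(0, 10)
boundaries:
  - m_tv > 0
model:
  force_recompile: false   # v1.2.2 auto-detects stale caches
```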
This walkthrough uses the synthetic dataset shipped at data/synthetic_dsam_example_wide_data.csv. It contains weekly observations for a single market with columns for:
First-time Stan compilation takes 1–3 minutes. Subsequent runs use a cached binary. With 2 chains on synthetic data, sampling typically completes in under 2 minutes.
Step 5: Sampler diagnostics
chain_diagnostics(fitted_model)
| Metric | Good | Concern |
|---|---|---|
| Max Rhat | < 1.01 | > 1.05 means chains have not converged |
| Min ESS (bulk) | > 400 | < 200 means too few effective samples |
| Divergences | 0 | Any non-zero count warrants investigation |
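For intuition, split-Rhat can be computed from scratch in a few lines of base R. (This is the classic Gelman–Rubin split-Rhat; Stan's current default is a rank-normalised variant, so the exact numbers reported by chain_diagnostics() may differ slightly.)

```r
# Split-Rhat for one parameter; chains is an iterations x chains matrix
split_rhat <- function(chains) {
  n    <- nrow(chains)
  half <- floor(n / 2)
  # Split each chain in half so within-chain trends show up as disagreement
  sp    <- cbind(chains[1:half, ], chains[(n - half + 1):n, ])
  len   <- nrow(sp)
  means <- colMeans(sp)
  B <- len * var(means)          # between-chain variance
  W <- mean(apply(sp, 2, var))   # within-chain variance
  sqrt(((len - 1) / len * W + B / len) / W)
}

set.seed(1)
good <- matrix(rnorm(4000), ncol = 4)            # 4 well-mixed chains
bad  <- good + rep(c(0, 0, 0, 3), each = 1000)   # one chain stuck elsewhere

split_rhat(good)  # close to 1
split_rhat(bad)   # far above the 1.05 concern threshold
```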
Step 6: Extract the posterior
post <- get_posterior(fitted_model)
post is a tibble with one row per draw containing coef (named coefficient list), yhat (fitted values), noise_sd, r2, rmse, and smape.
For a well-specified MMM on weekly data, in-sample R² above 0.85 is typical.
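Typical post-processing is to summarise draws into means, credible intervals, and sign probabilities. A self-contained sketch, using a simulated draws table as a stand-in for the get_posterior() output:

```r
set.seed(3)
# Simulated stand-in for coefficient draws from get_posterior()
post_draws <- data.frame(
  m_tv     = rnorm(2000, 0.8, 0.1),
  m_search = rnorm(2000, 0.3, 0.2)
)

# Posterior mean, 95% credible interval, and P(coef > 0) per coefficient
summarise_draws <- function(x) {
  c(mean       = mean(x),
    q2.5       = unname(quantile(x, 0.025)),
    q97.5      = unname(quantile(x, 0.975)),
    p_positive = mean(x > 0))
}
summary_tbl <- t(sapply(post_draws, summarise_draws))
summary_tbl
```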
Step 8: Response decomposition
```r
decomp_tbl <- decomp(fitted_model)
head(decomp_tbl)
```
Shows each term’s contribution (coefficient × design-matrix column) to the predicted KPI at each time point — the foundation for media contribution and ROI reporting.
Step 9: MAP for rapid iteration
During development, use MAP for fast point estimates:
Understanding of random-effects / mixed-model concepts.
Dataset
This walkthrough uses data/synthetic_dsam_example_hierarchical_data.csv — a panel dataset with weekly observations across multiple markets. Key columns:
Response: kpi_value — weekly KPI per market.
Group: market — market identifier.
Media: m_tv, m_search, m_social — media exposure variables.
Controls: trend, seasonality, brand_metric.
Date: date — weekly date index.
```r
library(DSAMbayes)
panel_df <- read.csv("data/synthetic_dsam_example_hierarchical_data.csv")
table(panel_df$market)  # Check group counts
```
Step 1: Construct the hierarchical model
The (term | group) syntax tells DSAMbayes to fit random effects. Terms inside the parentheses get group-specific deviations from the population mean:
Boundaries apply to the population-level coefficients.
Step 3: (Optional) Add CRE / Mundlak correction
If you suspect that group-level spending patterns are correlated with unobserved market characteristics (e.g. high-spend markets also have higher baseline demand), CRE controls for this:
This adds cre_mean_m_tv, cre_mean_m_search, cre_mean_m_social as fixed effects — the group-level means of each media variable. The within-group coefficients then represent purely temporal variation, controlling for between-group confounding.
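The mechanics of the Mundlak correction are easy to reproduce in base R: adding a regressor's group means as a fixed effect removes between-group confounding from the within-group slope. A simulated sketch (all variable names here are illustrative, not DSAMbayes API):

```r
set.seed(5)
g <- rep(1:10, each = 50)          # 10 markets, 50 weeks each
u <- rnorm(10)[g]                  # unobserved market-level demand
x <- 0.8 * u + rnorm(500)          # spend correlated with that demand
y <- 0.5 * x + 2 * u + rnorm(500)  # true within-group effect is 0.5

naive <- unname(coef(lm(y ~ x))["x"])  # confounded by group-level demand

x_bar   <- ave(x, g)               # group means of spend: the CRE terms
mundlak <- unname(coef(lm(y ~ x + x_bar))["x"])

c(naive = naive, mundlak = mundlak)
```

The Mundlak slope recovers the purely temporal (within-group) effect, which is what the main text describes for cre_mean_* terms.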
Hierarchical models are slower than BLM — expect 10–30 minutes depending on group count and data size. First-time Stan compilation of the hierarchical template adds 2–3 minutes.
Step 5: Check diagnostics
chain_diagnostics(fitted_model)
Pay special attention to Rhat and ESS for sd_* parameters (group-level standard deviations), which are often harder to estimate than population coefficients.
Step 6: Extract the posterior
post <- get_posterior(fitted_model)
For hierarchical models, coefficient draws from get_posterior() return vectors (one value per group) rather than scalars. The population-level (fixed-effect) estimates are averaged across groups.
Step 7: Group-level results
Fitted values and decomposition are returned per group:
```r
# Fitted values — one row per observation, grouped by market
fit_tbl <- fitted(fitted_model)
head(fit_tbl)

# Decomposition — per-group predictor contributions
decomp_tbl <- decomp(fitted_model)
```
Step 8: Budget optimisation (population level)
Budget optimisation uses population-level (fixed-effect) beta draws, not group-specific totals:
```r
# See Budget Optimisation docs for full scenario specification
result <- optimise_budget(fitted_model, scenario = my_scenario)
```
Key differences from BLM

| Aspect | BLM | Hierarchical |
|---|---|---|
| Data structure | Single market | Panel (multiple groups) |
| Coefficient draws | Scalars | Vectors (one per group) |
| Fit time | 2–5 min | 10–30 min |
| Decomposition | Direct | May fail gracefully for ` |
| Forest/prior-posterior plots | Direct | Group-averaged population estimates |
| Stan template | bayes_lm_updater_revised.stan | general_hierarchical.stan (templated per group count) |
Common pitfalls

| Pitfall | Symptom | Fix |
|---|---|---|
| Too few groups | Weak partial pooling; group SDs poorly estimated | Need 4+ groups for meaningful hierarchical structure |
Run from YAML — reproducible hierarchical runs via the runner
Quickstart (YAML Runner)
Goal
Complete one reproducible DSAMbayes runner execution from validation to artefact inspection, then load the fitted model in R to explore the results interactively.
First-time Stan compilation takes 1–3 minutes on most machines. Subsequent runs use a cached binary and start sampling immediately. If compilation seems stuck, check your C++ toolchain — see Install and Setup.
Do I need to set R_LIBS_USER every time?
Yes, unless you add it to your shell profile (.bashrc, .zshrc, or equivalent). The repo-local .Rlib path keeps DSAMbayes and its dependencies isolated from your system R library.
Can I use renv instead of .Rlib?
Yes. The repository includes a renv.lock file. Run renv::restore() to install exact dependency versions. See the renv section in Install and Setup.
Modelling
How many weeks of data do I need?
There is no hard minimum, but as a rough guide:
BLM: 100+ weeks for a model with 10–15 predictors. Below 80 weeks, most media effects will be prior-driven.
Hierarchical: 80+ weeks per group, ideally with 4+ groups for meaningful partial pooling.
More data is always better. Short series with many predictors will lean heavily on priors.
Should I use identity or log response?
Identity (kpi ~ ...) when the KPI is naturally additive and variance is roughly constant across levels. Coefficients represent absolute unit changes.
Log (log(kpi) ~ ...) when the KPI is strictly positive, variance scales with level, or you want multiplicative (percentage) effects. Common for revenue and sales.
If unsure, fit both and compare diagnostics.
How many MCMC iterations do I need?
The defaults (iter = 2000, warmup = 1000, chains = 4) are a reasonable starting point. Check diagnostics after fitting:
Rhat < 1.01 and ESS > 400 → iterations are sufficient.
Rhat > 1.05 or ESS < 200 → increase iter and warmup (e.g. double both).
For rapid iteration during development, use MAP estimation (optimise()) instead of MCMC.
When should I set boundaries on media coefficients?
Set m_channel > 0 when you are confident that additional media exposure cannot decrease the KPI. This is the most common boundary specification in MMM. Do not set boundaries on control variables (trend, seasonality, price) unless you have a clear structural reason — see the Minimal-Prior Policy.
When should I use CRE (Mundlak)?
Use CRE when fitting a hierarchical model where:
Time-varying regressors (e.g. media spend) have group-level means correlated with unobserved group heterogeneity.
You want to separate within-group (temporal) effects from between-group (cross-sectional) effects.
CRE adds group-mean terms to the population formula. See CRE / Mundlak.
What does scale = TRUE do?
It standardises the response and predictors (centre and divide by SD) before passing them to Stan. This improves sampler efficiency by putting all coefficients on a comparable scale. Post-fit, get_posterior() back-transforms coefficients to the original data scale automatically. Leave it on (the default) unless you have a specific reason to disable it.
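For a single predictor the back-transformation is just a rescaling by the ratio of standard deviations, which can be checked in base R (a sketch of the principle, not DSAMbayes' internal code):

```r
set.seed(9)
x <- rnorm(200, mean = 50, sd = 10)  # predictor on its raw scale
y <- 3 + 0.4 * x + rnorm(200)

# Fit on standardised data (what scale = TRUE does before calling Stan)
zx <- (x - mean(x)) / sd(x)
zy <- (y - mean(y)) / sd(y)
b_std <- coef(lm(zy ~ zx))["zx"]

# Back-transform the coefficient to the original data scale
b_orig <- b_std * sd(y) / sd(x)

c(standardised     = unname(b_std),
  back_transformed = unname(b_orig),
  direct_fit       = unname(coef(lm(y ~ x))["x"]))
```

The back-transformed coefficient matches a direct fit on the raw data, while the standardised fit keeps all parameters near unit scale for the sampler.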
Runner and outputs
How long does a typical run take?
| Model type | Fit method | Data size | Typical time |
|---|---|---|---|
| BLM | MCMC (2 chains, 2000 iter) | 150 weeks | 2–5 minutes |
| BLM | MAP (10 starts) | 150 weeks | 10–30 seconds |
| Hierarchical | MCMC (4 chains, 2000 iter) | 150 weeks × 5 groups | 10–30 minutes |
| Pooled | MCMC (4 chains, 2000 iter) | 150 weeks | 5–15 minutes |
First-time Stan compilation adds 1–3 minutes.
What is the difference between validate and run?
validate checks config structure, data paths, formula validity, and cross-field constraints — without compiling or fitting Stan models. Use it as a pre-run gate.
run does everything validate does, then compiles, fits, runs diagnostics, and writes artefacts.
Always validate before run when you change config or data.
Where do outputs go?
By default, under results/<timestamp>_<model_name>/ with numbered stage folders. See Output Artefacts for the full contract.
How do I compare two model runs?
Use compare_runs() in R or compare 50_model_selection/loo_summary.csv files manually. See Compare Runs.
Diagnostics
What does “Pareto-k > 0.7” mean?
It means the PSIS-LOO approximation is unreliable for that observation — the observation is highly influential. A few amber (0.5–0.7) points are normal. Red (> 0.7) points warrant investigation. See Model Selection Plots.
My diagnostics say “warn” — should I worry?
It depends on the policy mode:
explore — warnings are expected during development. Continue iterating.
publish — review warnings before sharing results. Most warns are acceptable if you understand the cause.
How does budget optimisation work?
It generates feasible spend allocations within channel bounds, evaluates each against the posterior, and selects the allocation that maximises the chosen objective (KPI uplift or profit). It is a Monte Carlo search, not an analytical optimiser. See Budget Optimisation.
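The search idea fits in a few lines of base R. This toy version scores allocations against a fixed diminishing-returns curve; DSAMbayes evaluates candidates against posterior draws instead, so treat this purely as an illustration of the Monte Carlo loop:

```r
set.seed(11)
total  <- 100
bounds <- list(tv = c(10, 70), search = c(10, 70))

# Toy objective with diminishing returns per channel (illustrative only)
response <- function(tv, search) 5 * sqrt(tv) + 3 * sqrt(search)

best <- list(tv = NA, search = NA, value = -Inf)
for (i in 1:5000) {
  tv     <- runif(1, bounds$tv[1], bounds$tv[2])  # sample a candidate split
  search <- total - tv                            # spend the full budget
  if (search < bounds$search[1] || search > bounds$search[2]) next
  v <- response(tv, search)
  if (v > best$value) best <- list(tv = tv, search = search, value = v)
}
unlist(best)
```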
Can I use budget optimisation with MAP-fitted models?
Yes, but the results will be based on a single point estimate rather than the full posterior distribution. Uncertainty intervals will not be meaningful.