FAQ
Installation and setup
How long does the first Stan compilation take?
1–3 minutes on most machines. Subsequent runs use a cached binary and start sampling immediately. If compilation seems stuck, check your C++ toolchain — see Install and Setup.
Do I need to set `R_LIBS_USER` every time?
Yes, unless you add it to your shell profile (`.bashrc`, `.zshrc`, or equivalent). The repo-local `.Rlib` path keeps DSAMbayes and its dependencies isolated from your system R library.
Can I use renv instead of `.Rlib`?
Yes. The repository includes a `renv.lock` file. Run `renv::restore()` to install exact dependency versions. See the renv section in Install and Setup.
Modelling
How many weeks of data do I need?
There is no hard minimum, but as a rough guide:
- BLM: 100+ weeks for a model with 10–15 predictors. Below 80 weeks, most media effects will be prior-driven.
- Hierarchical: 80+ weeks per group, ideally with 4+ groups for meaningful partial pooling.
More data is always better. Short series with many predictors will lean heavily on priors.
Should I use identity or log response?
- Identity (`kpi ~ ...`) when the KPI is naturally additive and variance is roughly constant across levels. Coefficients represent absolute unit changes.
- Log (`log(kpi) ~ ...`) when the KPI is strictly positive, variance scales with level, or you want multiplicative (percentage) effects. Common for revenue and sales.
If unsure, fit both and compare diagnostics.
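The interpretation difference can be illustrated on toy data. This sketch uses plain `lm()` on a simulated multiplicative process (the data and variable names are assumptions for illustration, not the DSAMbayes interface):

```r
# Simulated multiplicative process: each unit of spend lifts the KPI by ~10%
set.seed(1)
spend <- runif(100, 1, 10)
kpi   <- exp(0.5 + 0.1 * spend + rnorm(100, sd = 0.05))

fit_id  <- lm(kpi ~ spend)       # slope = absolute KPI units per unit spend
fit_log <- lm(log(kpi) ~ spend)  # slope = proportional change per unit spend

coef(fit_log)[["spend"]]  # close to the true 0.1, i.e. roughly +10% per unit
```

The log fit recovers a stable multiplicative effect; the identity slope would drift with the KPI level, which is the variance-scaling symptom described above.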
How many MCMC iterations do I need?
The defaults (`iter = 2000`, `warmup = 1000`, `chains = 4`) are a reasonable starting point. Check diagnostics after fitting:
- Rhat < 1.01 and ESS > 400 → iterations are sufficient.
- Rhat > 1.05 or ESS < 200 → increase `iter` and `warmup` (e.g. double both).
- Divergences > 0 → increase `adapt_delta` (e.g. 0.95 → 0.99) before increasing iterations.
For rapid iteration during development, use MAP estimation (`optimise()`) instead of MCMC.
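For reference, the Rhat quantity gated above can be computed by hand for a single parameter. This is a minimal base-R sketch of split-Rhat (the standard formula, not the package's diagnostic code):

```r
# Split-Rhat for one parameter; draws is an iterations x chains matrix.
# Each chain is split in half so within-chain trends inflate Rhat.
split_rhat <- function(draws) {
  half <- nrow(draws) %/% 2
  m <- cbind(draws[1:half, ], draws[(half + 1):(2 * half), ])
  n <- nrow(m)
  W <- mean(apply(m, 2, var))   # within-chain variance
  B <- n * var(colMeans(m))     # between-chain variance
  sqrt(((n - 1) / n * W + B / n) / W)
}

set.seed(2)
draws <- matrix(rnorm(4000), nrow = 1000, ncol = 4)  # 4 well-mixed chains
split_rhat(draws)  # very close to 1, comfortably under the 1.01 threshold
```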
When should I set boundaries on media coefficients?
Set `m_channel > 0` when you are confident that additional media exposure cannot decrease the KPI. This is the most common boundary specification in MMM. Do not set boundaries on control variables (trend, seasonality, price) unless you have a clear structural reason — see the Minimal-Prior Policy.
When should I use CRE (Mundlak)?
Use CRE when fitting a hierarchical model where:
- Time-varying regressors (e.g. media spend) have group-level means correlated with unobserved group heterogeneity.
- You want to separate within-group (temporal) effects from between-group (cross-sectional) effects.
CRE adds group-mean terms to the population formula. See CRE / Mundlak.
What does `scale = TRUE` do?
It standardises the response and predictors (centre and divide by SD) before passing them to Stan. This improves sampler efficiency by putting all coefficients on a comparable scale. Post-fit, `get_posterior()` back-transforms coefficients to the original data scale automatically. Leave it on (the default) unless you have a specific reason to disable it.
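The back-transform itself is the standard standardisation identity (coefficient times `sd(y) / sd(x)`), shown here with `lm()` on toy data; this illustrates the algebra, not the internals of `get_posterior()`:

```r
set.seed(3)
x <- rnorm(200, mean = 50, sd = 5)
y <- 3 + 2 * x + rnorm(200)

b_orig <- coef(lm(y ~ x))[["x"]]              # slope on the original scale
b_std  <- coef(lm(scale(y) ~ scale(x)))[[2]]  # slope after standardising both
b_back <- b_std * sd(y) / sd(x)               # undo the standardisation

all.equal(b_orig, b_back)  # TRUE: both routes give the same coefficient
```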
Runner and outputs
How long does a typical run take?
| Model type | Fit method | Data size | Typical time |
|---|---|---|---|
| BLM | MCMC (2 chains, 2000 iter) | 150 weeks | 2–5 minutes |
| BLM | MAP (10 starts) | 150 weeks | 10–30 seconds |
| Hierarchical | MCMC (4 chains, 2000 iter) | 150 weeks × 5 groups | 10–30 minutes |
| Pooled | MCMC (4 chains, 2000 iter) | 150 weeks | 5–15 minutes |
First-time Stan compilation adds 1–3 minutes.
What is the difference between `validate` and `run`?
`validate` checks config structure, data paths, formula validity, and cross-field constraints — without compiling or fitting Stan models. Use it as a pre-run gate. `run` does everything `validate` does, then compiles, fits, runs diagnostics, and writes artefacts.
Always validate before run when you change config or data.
Where do outputs go?
By default, under `results/<timestamp>_<model_name>/` with numbered stage folders. See Output Artefacts for the full contract.
How do I compare two model runs?
Use `compare_runs()` in R or compare `50_model_selection/loo_summary.csv` files manually. See Compare Runs.
Diagnostics
What does “Pareto-k > 0.7” mean?
It means the PSIS-LOO approximation is unreliable for that observation — the observation is highly influential. A few amber (0.5–0.7) points are normal. Red (> 0.7) points warrant investigation. See Model Selection Plots.
My diagnostics say “warn” — should I worry?
It depends on the policy mode:
- `explore`: warnings are expected during development. Continue iterating.
- `publish`: review warnings before sharing results. Most warns are acceptable if you understand the cause.
- `strict`: warnings require documented justification.
See Diagnostics Gates for threshold details.
Budget optimisation
How does the allocator work?
It generates feasible spend allocations within channel bounds, evaluates each against the posterior, and selects the allocation that maximises the chosen objective (KPI uplift or profit). It is a Monte Carlo search, not an analytical optimiser. See Budget Optimisation.
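That search loop can be sketched in a few lines of base R. Everything here (channel names, bounds, and the toy diminishing-returns curve) is an assumption for illustration, not the package's internals:

```r
set.seed(4)
budget <- 100
lower  <- c(tv = 10, search = 5, social = 0)    # per-channel spend floors
upper  <- c(tv = 70, search = 50, social = 40)  # per-channel spend caps

# Toy objective: diminishing returns per channel
uplift <- function(a) sum(c(5, 3, 2) * log1p(a))

best <- NULL
best_val <- -Inf
for (i in 1:5000) {
  w <- runif(3)                                    # random split of the
  a <- lower + w / sum(w) * (budget - sum(lower))  # budget above the floors
  if (any(a > upper)) next                         # discard infeasible draws
  val <- uplift(a)
  if (val > best_val) { best_val <- val; best <- a }
}
round(best, 1)  # best allocation found; sums exactly to the budget
```

Because each candidate is evaluated against the objective directly, the same loop works for any posterior-derived uplift function; the trade-off is that precision grows only with the number of candidate draws.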
Can I use budget optimisation with MAP-fitted models?
Yes, but the results will be based on a single point estimate rather than the full posterior distribution. Uncertainty intervals will not be meaningful.