Your First Hierarchical Model
Goal
Build, fit, and interpret a multi-market hierarchical model with partial pooling and optional CRE (Mundlak) correction using the DSAMbayes R API.
Prerequisites
- DSAMbayes installed locally (see Install and Setup).
- Familiarity with the BLM workflow (see Your First BLM Model).
- Understanding of random-effects / mixed-model concepts.
Dataset
This walkthrough uses data/synthetic_dsam_example_hierarchical_data.csv — a panel dataset with weekly observations across multiple markets. Key columns:
- Response:
kpi_value— weekly KPI per market. - Group:
market— market identifier. - Media:
m_tv,m_search,m_social— media exposure variables. - Controls:
trend,seasonality,brand_metric. - Date:
date— weekly date index.
Step 1: Construct the hierarchical model
The (term | group) syntax tells DSAMbayes to fit random effects. Terms inside the parentheses get group-specific deviations from the population mean:
This specifies:
- Population (fixed) effects for all terms — the average effect across markets.
- Random intercepts and slopes for media terms by market — each market can deviate from the population average.
Step 2: Set boundaries
Boundaries apply to the population-level coefficients.
Step 3: (Optional) Add CRE / Mundlak correction
If you suspect that group-level spending patterns are correlated with unobserved market characteristics (e.g. high-spend markets also have higher baseline demand), CRE controls for this:
This adds cre_mean_m_tv, cre_mean_m_search, cre_mean_m_social as fixed effects — the group-level means of each media variable. The within-group coefficients then represent purely temporal variation, controlling for between-group confounding.
See CRE / Mundlak for when and why to use this.
Step 4: Fit with MCMC
Hierarchical models are slower than BLM — expect 10–30 minutes depending on group count and data size. First-time Stan compilation of the hierarchical template adds 2–3 minutes.
Step 5: Check diagnostics
Pay special attention to Rhat and ESS for sd_* parameters (group-level standard deviations), which are often harder to estimate than population coefficients.
Step 6: Extract the posterior
For hierarchical models, coefficient draws from get_posterior() return vectors (one value per group) rather than scalars. The population-level (fixed-effect) estimates are averaged across groups.
Step 7: Group-level results
Fitted values and decomposition are returned per group:
Step 8: Budget optimisation (population level)
Budget optimisation uses population-level (fixed-effect) beta draws, not group-specific totals:
Key differences from BLM
| Aspect | BLM | Hierarchical |
|---|---|---|
| Data structure | Single market | Panel (multiple groups) |
| Coefficient draws | Scalars | Vectors (one per group) |
| Fit time | 2–5 min | 10–30 min |
| Decomposition | Direct | May fail gracefully for ` |
| Forest/prior-posterior plots | Direct | Group-averaged population estimates |
| Stan template | bayes_lm_updater_revised.stan |
general_hierarchical.stan (templated per group count) |
Common pitfalls
| Pitfall | Symptom | Fix |
|---|---|---|
| Too few groups | Weak partial pooling; group SDs poorly estimated | Need 4+ groups for meaningful hierarchical structure |
| Too few obs per group | High Rhat on sd_* parameters |
Increase iterations; simplify random-effect structure |
| CRE with too many vars | More CRE variables than groups | Reduce CRE variable set; see identification warnings |
| CRE mean has zero variance | scale=TRUE aborts with constant column error |
Use model.type: re (without CRE) or model.scale: false |
Next steps
- CRE / Mundlak — full CRE documentation
- Model Classes — detailed class comparison
- Diagnostics Gates — hierarchical-specific checks (within-variation)
- Run from YAML — reproducible hierarchical runs via the runner