Your First Hierarchical Model

Goal

Build, fit, and interpret a multi-market hierarchical model with partial pooling and optional CRE (Mundlak) correction using the DSAMbayes R API.

Prerequisites

DSAMbayes installed locally (see Install and Setup).
Familiarity with the BLM workflow (see Your First BLM Model).
Understanding of random-effects / mixed-model concepts.

Dataset

This walkthrough uses data/synthetic_dsam_example_hierarchical_data.csv — a panel dataset with weekly observations across multiple markets. Key columns:

Response: kpi_value — weekly KPI per market.
Group: market — market identifier.
Media: m_tv, m_search, m_social — media exposure variables.
Controls: trend, seasonality, brand_metric.
Date: date — weekly date index.

library(DSAMbayes)
panel_df <- read.csv("data/synthetic_dsam_example_hierarchical_data.csv")
table(panel_df$market)  # Check group counts

Step 1: Construct the hierarchical model

The (term | group) syntax tells DSAMbayes to fit random effects. Terms inside the parentheses get group-specific deviations from the population mean:

model <- blm(
  kpi_value ~
    trend + seasonality + brand_metric +
    m_tv + m_search + m_social +
    (1 + m_tv + m_search + m_social | market),
  data = panel_df
)

This specifies:

Population (fixed) effects for all terms — the average effect across markets.
Random intercepts and slopes for media terms by market — each market can deviate from the population average.

Step 2: Set boundaries

model <- model %>%
  set_boundary(m_tv > 0, m_search > 0, m_social > 0)

Boundaries apply to the population-level coefficients.

Step 3: (Optional) Add CRE / Mundlak correction

If you suspect that group-level spending patterns are correlated with unobserved market characteristics (e.g. high-spend markets also have higher baseline demand), CRE controls for this:

model <- model %>%
  set_cre(vars = c("m_tv", "m_search", "m_social"))

This adds cre_mean_m_tv, cre_mean_m_search, cre_mean_m_social as fixed effects — the group-level means of each media variable. The within-group coefficients then represent purely temporal variation, controlling for between-group confounding.

See CRE / Mundlak for when and why to use this.

Step 4: Fit with MCMC

fitted_model <- model %>%
  fit(cores = 4, iter = 2000, warmup = 1000, seed = 42)

Hierarchical models are slower than BLM — expect 10–30 minutes depending on group count and data size. First-time Stan compilation of the hierarchical template adds 2–3 minutes.

Step 5: Check diagnostics

chain_diagnostics(fitted_model)

Pay special attention to Rhat and ESS for sd_* parameters (group-level standard deviations), which are often harder to estimate than population coefficients.

Step 6: Extract the posterior

post <- get_posterior(fitted_model)

For hierarchical models, coefficient draws from get_posterior() return vectors (one value per group) rather than scalars. The population-level (fixed-effect) estimates are averaged across groups.

Step 7: Group-level results

Fitted values and decomposition are returned per group:

# Fitted values — one row per observation, grouped by market
fit_tbl <- fitted(fitted_model)
head(fit_tbl)

# Decomposition — per-group predictor contributions
decomp_tbl <- decomp(fitted_model)

Step 8: Budget optimisation (population level)

Budget optimisation uses population-level (fixed-effect) beta draws, not group-specific totals:

# See Budget Optimisation docs for full scenario specification
result <- optimise_budget(fitted_model, scenario = my_scenario)

Key differences from BLM

Aspect	BLM	Hierarchical
Data structure	Single market	Panel (multiple groups)
Coefficient draws	Scalars	Vectors (one per group)
Fit time	2–5 min	10–30 min
Decomposition	Direct	May fail gracefully for `
Forest/prior-posterior plots	Direct	Group-averaged population estimates
Stan template	`bayes_lm_updater_revised.stan`	`general_hierarchical.stan` (templated per group count)

Common pitfalls

Pitfall	Symptom	Fix
Too few groups	Weak partial pooling; group SDs poorly estimated	Need 4+ groups for meaningful hierarchical structure
Too few obs per group	High Rhat on `sd_*` parameters	Increase iterations; simplify random-effect structure
CRE with too many vars	More CRE variables than groups	Reduce CRE variable set; see identification warnings
CRE mean has zero variance	`scale=TRUE` aborts with constant column error	Use `model.type: re` (without CRE) or `model.scale: false`

Next steps

CRE / Mundlak — full CRE documentation
Model Classes — detailed class comparison
Diagnostics Gates — hierarchical-specific checks (within-variation)
Run from YAML — reproducible hierarchical runs via the runner