Your First Hierarchical Model

Goal

Build, fit, and interpret a multi-market hierarchical model with partial pooling and optional CRE (Mundlak) correction using the DSAMbayes R API.

Prerequisites

Dataset

This walkthrough uses data/synthetic_dsam_example_hierarchical_data.csv — a panel dataset with weekly observations across multiple markets. Key columns:

  • Response: kpi_value — weekly KPI per market.
  • Group: market — market identifier.
  • Media: m_tv, m_search, m_social — media exposure variables.
  • Controls: trend, seasonality, brand_metric.
  • Date: date — weekly date index.
library(DSAMbayes)
panel_df <- read.csv("data/synthetic_dsam_example_hierarchical_data.csv")
table(panel_df$market)  # Check group counts

Step 1: Construct the hierarchical model

The (term | group) syntax tells DSAMbayes to fit random effects. Terms inside the parentheses get group-specific deviations from the population mean:

model <- blm(
  kpi_value ~
    trend + seasonality + brand_metric +
    m_tv + m_search + m_social +
    (1 + m_tv + m_search + m_social | market),
  data = panel_df
)

This specifies:

  • Population (fixed) effects for all terms — the average effect across markets.
  • Random intercepts and slopes for media terms by market — each market can deviate from the population average.

Step 2: Set boundaries

model <- model %>%
  set_boundary(m_tv > 0, m_search > 0, m_social > 0)

Boundaries apply to the population-level coefficients.

Step 3: (Optional) Add CRE / Mundlak correction

If you suspect that group-level spending patterns are correlated with unobserved market characteristics (e.g. high-spend markets also have higher baseline demand), CRE controls for this:

model <- model %>%
  set_cre(vars = c("m_tv", "m_search", "m_social"))

This adds cre_mean_m_tv, cre_mean_m_search, cre_mean_m_social as fixed effects — the group-level means of each media variable. The within-group coefficients then represent purely temporal variation, controlling for between-group confounding.

See CRE / Mundlak for when and why to use this.

Step 4: Fit with MCMC

fitted_model <- model %>%
  fit(cores = 4, iter = 2000, warmup = 1000, seed = 42)

Hierarchical models are slower than BLM — expect 10–30 minutes depending on group count and data size. First-time Stan compilation of the hierarchical template adds 2–3 minutes.

Step 5: Check diagnostics

chain_diagnostics(fitted_model)

Pay special attention to Rhat and ESS for sd_* parameters (group-level standard deviations), which are often harder to estimate than population coefficients.

Step 6: Extract the posterior

post <- get_posterior(fitted_model)

For hierarchical models, coefficient draws from get_posterior() return vectors (one value per group) rather than scalars. The population-level (fixed-effect) estimates are averaged across groups.

Step 7: Group-level results

Fitted values and decomposition are returned per group:

# Fitted values — one row per observation, grouped by market
fit_tbl <- fitted(fitted_model)
head(fit_tbl)

# Decomposition — per-group predictor contributions
decomp_tbl <- decomp(fitted_model)

Step 8: Budget optimisation (population level)

Budget optimisation uses population-level (fixed-effect) beta draws, not group-specific totals:

# See Budget Optimisation docs for full scenario specification
result <- optimise_budget(fitted_model, scenario = my_scenario)

Key differences from BLM

Aspect BLM Hierarchical
Data structure Single market Panel (multiple groups)
Coefficient draws Scalars Vectors (one per group)
Fit time 2–5 min 10–30 min
Decomposition Direct May fail gracefully for `
Forest/prior-posterior plots Direct Group-averaged population estimates
Stan template bayes_lm_updater_revised.stan general_hierarchical.stan (templated per group count)

Common pitfalls

Pitfall Symptom Fix
Too few groups Weak partial pooling; group SDs poorly estimated Need 4+ groups for meaningful hierarchical structure
Too few obs per group High Rhat on sd_* parameters Increase iterations; simplify random-effect structure
CRE with too many vars More CRE variables than groups Reduce CRE variable set; see identification warnings
CRE mean has zero variance scale=TRUE aborts with constant column error Use model.type: re (without CRE) or model.scale: false

Next steps