CRE / Mundlak

Purpose

The correlated random effects (CRE) pathway, implemented as a Mundlak device, augments hierarchical DSAMbayes models with group-mean terms. This separates within-group variation from between-group variation for selected regressors, reducing confounding bias when group-level means are correlated with the random effects.

When to use CRE

Use CRE when:

The model is hierarchical, i.e. panel data specified with (term | group) syntax.
Time-varying regressors (e.g. media spend) have group-level means that may be correlated with the group intercept or slope.
You want to decompose effects into within-group (temporal) and between-group (cross-sectional) components.

Do not use CRE when:

The model is BLM or pooled (CRE requires the hierarchical class).
The panel has only one group (no between-group variation exists).
All regressors of interest are time-invariant (each CRE mean term would equal its own regressor, so the two would be perfectly collinear).

Construction

CRE is applied after model construction via set_cre():

model <- blm(
  kpi ~ m_tv + m_search + trend + (1 + m_tv + m_search | market),
  data = panel_df
)
model <- set_cre(model, vars = c("m_tv", "m_search"))

What `set_cre()` does

Resolves the grouping variable. If the formula has one group factor, it is used automatically. If multiple group factors exist, the group argument must be specified explicitly.
Generates group-mean column names. For each variable in vars, a mean-term column is named cre_mean_<variable> (configurable via prefix).
Augments the data. apply_cre_data() computes group-level means of each CRE variable and joins them back to the panel data as new columns.
Updates the formula. The generated mean terms are appended to the population formula as fixed effects.
Extends priors and boundaries. Default prior and boundary entries are added for each new mean term, matching the existing prior schema.
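The naming and data-augmentation steps above can be sketched in base R. This is an illustration only, not the DSAMbayes implementation: the toy panel_df is made up, and aggregate() plus merge() stand in for whatever apply_cre_data() does internally.

```r
# Toy panel: two markets, three periods each (illustrative values only).
panel_df <- data.frame(
  market = rep(c("A", "B"), each = 3),
  m_tv   = c(1, 2, 3, 10, 20, 30)
)

vars   <- "m_tv"
group  <- "market"
prefix <- "cre_mean_"

# Mean-term column names follow the <prefix><variable> pattern.
mean_cols <- paste0(prefix, vars)

# One row per group, holding the group mean of each CRE variable.
group_means <- aggregate(panel_df[vars], by = panel_df[group],
                         FUN = mean, na.rm = TRUE)
names(group_means)[match(vars, names(group_means))] <- mean_cols

# Join the means back so every panel row carries its group's mean.
augmented <- merge(panel_df, group_means, by = group, sort = FALSE)
print(augmented)
```

Every row of a market ends up with the same cre_mean_m_tv value, which is what lets the mean term enter the population formula as an ordinary fixed effect.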

YAML runner configuration

When using the runner, CRE is configured via:

cre:
  enabled: true
  vars: [m_tv, m_search, m_social]
  group: market
  prefix: cre_mean_

The runner calls set_cre() during model construction when cre.enabled is true.

Mundlak decomposition

For a regressor $x_{gt}$ (group $g$, time $t$), the Mundlak device decomposes the effect into:

Within-group effect: the coefficient on $x_{gt}$ in the population formula captures temporal variation after conditioning on the group mean.
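Between-group effect: the coefficient on $\bar{x}_g$ (the CRE mean term) captures cross-sectional variation in group-level averages. Because $x_{gt}$ remains in the formula, this coefficient is the difference between the between-group and within-group effects, not the between-group effect alone.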

The original coefficient on $x_{gt}$ in a standard random-effects model conflates both sources. Adding $\bar{x}_g$ as a fixed effect separates them.
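The separation can be written out for a single regressor. The symbols $\alpha_g$ (group intercept), $\beta_W$ (within-group effect), $\beta_B$ (between-group effect) and $\varepsilon_{gt}$ (residual) are notation introduced here for illustration; they are not DSAMbayes parameter names.

$$
\begin{aligned}
y_{gt} &= \alpha_g + \beta_W\,(x_{gt} - \bar{x}_g) + \beta_B\,\bar{x}_g + \varepsilon_{gt} \\
       &= \alpha_g + \beta_W\,x_{gt} + (\beta_B - \beta_W)\,\bar{x}_g + \varepsilon_{gt}
\end{aligned}
$$

The second line is the form obtained when the raw regressor is kept and its group mean is appended: the coefficient on $x_{gt}$ is $\beta_W$, and the coefficient on $\bar{x}_g$ is $\beta_B - \beta_W$.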

Validation and identification warnings

Input validation

set_cre() validates:

The model is hierarchical (aborts for BLM or pooled).
All vars are present in the data and are numeric.
The group variable exists in the formula’s group factors.
No CRE mean terms appear in random-slope blocks (this would cause double-counting).

Identification warnings

warn_cre_identification() checks two conditions after CRE setup:

More CRE variables than groups. If length(vars) > n_groups, between-effect estimates may be weakly identified. The function emits a warning.
Near-zero within-group variation. For each CRE variable, the standard deviation of the within-group residual ($x_{gt} - \bar{x}_g$) is checked. If it is effectively zero, the within effect is weakly identified. The function emits a per-variable warning.
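The two checks above can be approximated in base R. This is a hypothetical re-creation, not the warn_cre_identification() source: the helper name check_cre_identification(), the message wording and the tolerance tol are all assumptions, since the threshold for "effectively zero" is not stated here.

```r
# Hypothetical helper: returns one message per detected identification issue.
check_cre_identification <- function(data, vars, group, tol = 1e-8) {
  msgs <- character(0)
  n_groups <- length(unique(data[[group]]))

  # Check 1: more CRE variables than groups.
  if (length(vars) > n_groups) {
    msgs <- c(msgs, sprintf(
      "%d CRE variables but only %d groups: between effects may be weakly identified.",
      length(vars), n_groups
    ))
  }

  # Check 2: near-zero within-group variation, one message per variable.
  for (v in vars) {
    group_mean <- ave(data[[v]], data[[group]],
                      FUN = function(x) mean(x, na.rm = TRUE))
    within_sd <- sd(data[[v]] - group_mean, na.rm = TRUE)
    if (is.na(within_sd) || within_sd < tol) {
      msgs <- c(msgs, sprintf(
        "'%s' has near-zero within-group variation: within effect is weakly identified.",
        v
      ))
    }
  }
  msgs
}

toy <- data.frame(
  market = rep(c("A", "B"), each = 3),
  m_tv   = c(1, 2, 3, 10, 20, 30),  # varies within each market
  price  = c(4, 4, 4, 9, 9, 9)      # constant within each market
)
issues <- check_cre_identification(toy, vars = c("m_tv", "price"), group = "market")
print(issues)
```

In this toy panel only price is flagged: it never moves within a market, so all of its variation is between groups.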

Zero-variance CRE mean terms

If a CRE mean term has zero variance across all observations (possible when the underlying variable has the same mean in every group), calculate_scaling_terms() in R/scale.R aborts when scale=TRUE. The error message identifies the constant CRE columns and suggests two workarounds: model.type: re (without CRE) or model.scale: false.
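A small base R example shows how a constant mean term can arise even though the underlying variable varies. The z-score at the end is an assumption about what scaling involves, used only to show why a zero standard deviation is a problem; the actual logic in calculate_scaling_terms() is not reproduced here.

```r
# Two markets whose m_tv series differ over time but share the same mean.
toy <- data.frame(
  market = rep(c("A", "B"), each = 3),
  m_tv   = c(1, 2, 3, 3, 2, 1)
)
toy$cre_mean_m_tv <- ave(toy$m_tv, toy$market)  # group mean on every row

# The mean term is identical on every row, so its standard deviation is zero.
sd_mean_term <- sd(toy$cre_mean_m_tv)

# A plain z-score (assumed scaling) would then divide zero by zero.
z <- (toy$cre_mean_m_tv - mean(toy$cre_mean_m_tv)) / sd_mean_term
print(sd_mean_term)
print(z)
```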

Panel assumptions

Balanced panels are not required. apply_cre_data() computes group means using dplyr::group_by() and mean(), which handles unequal group sizes.
Missing values in CRE variables are excluded from the group-mean calculation (na.rm = TRUE).
Group-mean recomputation. CRE mean columns are recomputed each time apply_cre_data() is called, including during prep_data_for_fit.hierarchical(). Existing CRE mean columns are dropped and regenerated to prevent stale values.
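The drop-and-regenerate behaviour, together with unbalanced groups and missing values, can be sketched as follows. refresh_group_means() is a hypothetical base R helper written for this example; it is not apply_cre_data(), and it uses ave() rather than dplyr.

```r
# Hypothetical helper: drop any existing mean columns, then recompute them.
refresh_group_means <- function(data, vars, group, prefix = "cre_mean_") {
  mean_cols <- paste0(prefix, vars)
  data <- data[, setdiff(names(data), mean_cols), drop = FALSE]
  for (v in vars) {
    # NA values are ignored when averaging, but NA rows still get the mean.
    data[[paste0(prefix, v)]] <- ave(
      data[[v]], data[[group]],
      FUN = function(x) mean(x, na.rm = TRUE)
    )
  }
  data
}

# Unbalanced toy panel: market A has 3 rows (one missing), market B has 2.
toy <- data.frame(
  market = c("A", "A", "A", "B", "B"),
  m_tv   = c(1, NA, 3, 10, 20)
)
first <- refresh_group_means(toy, "m_tv", "market")

# Change the data, then refresh: the stale mean for market B is replaced.
first$m_tv[5] <- 40
second <- refresh_group_means(first, "m_tv", "market")
print(second)
```

Calling the helper twice never leaves a duplicate or outdated mean column, which is the property the recomputation rule is meant to guarantee.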

Decomposition and reporting

CRE mean terms appear as ordinary fixed-effect terms in the population formula. This means:

Posterior summary includes CRE mean-term coefficients alongside other population coefficients.
Response decomposition via decomp() attributes fitted-value contributions to CRE mean terms separately from their within-group counterparts.
Plots (posterior forest, prior-vs-posterior) include CRE mean terms.

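Interpretation note: the raw regressor stays in the formula alongside its group mean, so the CRE mean-term coefficient measures the between-group effect over and above the within-group effect (often called the contextual effect). It does not represent the total effect of the underlying variable.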
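How the mean-term coefficient relates to the within and between effects can be checked on simulated data. The sketch below uses ordinary least squares via lm() as a stand-in, not a DSAMbayes hierarchical fit, so it illustrates the algebra only; all data and variable names are invented for the example.

```r
set.seed(1)
n_groups <- 30
n_times  <- 20
g <- rep(seq_len(n_groups), each = n_times)

# Group-level mean of x, and a group intercept correlated with it.
mu_g    <- rnorm(n_groups)
alpha_g <- 2 * mu_g + rnorm(n_groups, sd = 0.5)

x <- mu_g[g] + rnorm(n_groups * n_times)
y <- alpha_g[g] + 1 * x + rnorm(n_groups * n_times, sd = 0.5)  # true within effect = 1

xbar <- ave(x, g)  # sample group mean of x, repeated on every row

fit_mundlak <- lm(y ~ x + xbar)       # raw regressor plus its group mean
fit_within  <- lm(y ~ x + factor(g))  # group dummies (fixed-effects benchmark)
fit_between <- lm(y ~ xbar)           # group mean only

b_within  <- coef(fit_mundlak)[["x"]]
b_mean    <- coef(fit_mundlak)[["xbar"]]
b_between <- coef(fit_between)[["xbar"]]

print(c(within = b_within,
        fixed_effects = coef(fit_within)[["x"]],
        mean_term = b_mean,
        between_minus_within = b_between - b_within))
```

In this OLS setting the coefficient on x matches the fixed-effects estimate, and the coefficient on xbar equals the between slope minus the within slope. A hierarchical Bayesian fit with priors and random slopes will not reproduce these identities exactly, but the interpretation of the two coefficients carries over.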

Cross-references

Model Classes — hierarchical class construction
Priors and Boundaries — prior schema for CRE terms
Config Schema — cre.* YAML keys
Diagnostics Gates — within-group variation check