CRE / Mundlak
Purpose
The correlated random effects (CRE) pathway, implemented as a Mundlak device, augments hierarchical DSAMbayes models with group-mean terms. This separates within-group variation from between-group variation for selected regressors, reducing confounding bias when group-level means are correlated with the random effects.
When to use CRE
Use CRE when:
- The model is hierarchical (panel data with (term | group) syntax).
- Time-varying regressors (e.g. media spend) have group-level means that may be correlated with the group intercept or slope.
- You want to decompose effects into within-group (temporal) and between-group (cross-sectional) components.
Do not use CRE when:
- The model is BLM or pooled (CRE requires the hierarchical class).
- The panel has only one group (no between-group variation exists).
- All regressors of interest are time-invariant (each CRE mean term would equal its regressor, $\bar{x}_g = x_{gt}$, leaving no within-group variation).
Construction
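CRE is applied after model construction via set_cre(). It takes the regressors to transform (vars) and, where needed, the grouping variable (group) and a column-name prefix (prefix).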
What set_cre() does
- Resolves the grouping variable. If the formula has one group factor, it is used automatically. If multiple group factors exist, the group argument must be specified explicitly.
- Generates group-mean column names. For each variable in vars, a mean-term column is named cre_mean_<variable> (configurable via prefix).
- Augments the data. apply_cre_data() computes group-level means of each CRE variable and joins them back to the panel data as new columns.
- Updates the formula. The generated mean terms are appended to the population formula as fixed effects.
- Extends priors and boundaries. Default prior and boundary entries are added for each new mean term, matching the existing prior schema.
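The naming and formula steps above can be sketched outside R. This is a minimal Python illustration, not DSAMbayes code. The function names resolve_group, cre_column_names and append_mean_terms are hypothetical. The default prefix "cre_mean_" is an assumption inferred from the cre_mean_<variable> pattern. The formula is treated as a plain string, and the prior and boundary step is left out because the prior schema is not shown here.

```python
from typing import Optional, Sequence


def resolve_group(group_factors: Sequence[str], group: Optional[str] = None) -> str:
    """Pick the grouping variable using the rule described for set_cre()."""
    if group is not None:
        if group not in group_factors:
            raise ValueError(f"group '{group}' is not a group factor in the formula")
        return group
    if not group_factors:
        raise ValueError("no group factors: CRE needs a hierarchical formula")
    if len(group_factors) == 1:
        # A single group factor is used automatically.
        return group_factors[0]
    raise ValueError("multiple group factors found; pass `group` explicitly")


def cre_column_names(variables: Sequence[str], prefix: str = "cre_mean_") -> list[str]:
    """One mean-term column name per CRE variable (hypothetical default prefix)."""
    return [f"{prefix}{v}" for v in variables]


def append_mean_terms(formula: str, mean_terms: Sequence[str]) -> str:
    """Append mean terms to the population formula as extra fixed effects.

    Term order does not matter in a Wilkinson-style formula, so adding the
    new terms at the end leaves any (term | group) blocks untouched.
    """
    return formula + "".join(f" + {term}" for term in mean_terms)


group = resolve_group(["market"])
names = cre_column_names(["tv_spend", "price"])
print(group)  # market
print(names)  # ['cre_mean_tv_spend', 'cre_mean_price']
print(append_mean_terms("sales ~ tv_spend + price + (1 | market)", names))
# sales ~ tv_spend + price + (1 | market) + cre_mean_tv_spend + cre_mean_price
```

The mean terms are appended at the end rather than spliced next to their source variables. Position in a formula has no effect on the fitted model, so appending is the simplest way to leave the random-effect blocks unchanged.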
YAML runner configuration
When using the YAML runner, CRE is configured through the cre.* keys (see Config Schema).
The runner calls set_cre() during model construction if cre.enabled: true.
Mundlak decomposition
For a regressor $x_{gt}$ (group $g$, time $t$), the Mundlak device decomposes the effect into:
- Within-group effect: the coefficient on $x_{gt}$ in the population formula captures temporal variation after conditioning on the group mean.
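- Between-group effect: the coefficient on $\bar{x}_g$ (the CRE mean term) captures the part of the cross-sectional variation in group-level averages that the within-group coefficient does not already explain. In the standard Mundlak parameterisation it equals the between-group effect minus the within-group effect.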
The original coefficient on $x_{gt}$ in a standard random-effects model conflates both sources. Adding $\bar{x}_g$ as a fixed effect separates them.
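A worked form of the decomposition helps here. It uses one regressor and a random intercept only, in the usual Mundlak notation. The symbols $\alpha$, $\beta_W$, $\gamma$, $u_g$, $\varepsilon_{gt}$ and $T_g$ (the number of observations in group $g$) are introduced for illustration and are not DSAMbayes parameter names.

$$
y_{gt} = \alpha + \beta_W\, x_{gt} + \gamma\, \bar{x}_g + u_g + \varepsilon_{gt},
\qquad
\bar{x}_g = \frac{1}{T_g} \sum_{t=1}^{T_g} x_{gt}
$$

$$
\beta_W\, x_{gt} + \gamma\, \bar{x}_g
= \beta_W \left(x_{gt} - \bar{x}_g\right) + \left(\beta_W + \gamma\right) \bar{x}_g
$$

Read this way, the within-group effect is $\beta_W$ and the between-group effect is $\beta_W + \gamma$. The mean-term coefficient $\gamma$ on its own is therefore the gap between the two, not the total effect of the variable.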
Validation and identification warnings
Input validation
set_cre() validates:
- The model is hierarchical (aborts for BLM or pooled).
- All vars are present in the data and are numeric.
- The group variable exists in the formula's group factors.
- No CRE mean terms appear in random-slope blocks (this would cause double-counting).
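The four documented checks can be mirrored in a short Python sketch. It makes several assumptions. validate_cre and _is_numeric are hypothetical names. The model class is passed as a string such as "hierarchical". The data is a dict of equal-length column lists, with None marking a missing value. The random-slope terms arrive as a flat list of names.

```python
from typing import Mapping, Sequence


def _is_numeric(value) -> bool:
    # None stands in for a missing value; bool is excluded even though it subclasses int.
    return value is None or (isinstance(value, (int, float)) and not isinstance(value, bool))


def validate_cre(model_class: str, data: Mapping[str, Sequence], variables: Sequence[str],
                 group: str, group_factors: Sequence[str],
                 random_slope_terms: Sequence[str], prefix: str = "cre_mean_") -> None:
    """Raise ValueError if any of the documented set_cre() preconditions fails."""
    if model_class != "hierarchical":
        raise ValueError(f"CRE requires a hierarchical model, got '{model_class}'")
    for v in variables:
        if v not in data:
            raise ValueError(f"CRE variable '{v}' not found in data")
        if not all(_is_numeric(x) for x in data[v]):
            raise ValueError(f"CRE variable '{v}' must be numeric")
    if group not in group_factors:
        raise ValueError(f"group '{group}' is not a group factor in the formula")
    # A mean term in a random-slope block would be counted twice.
    clashes = [t for t in random_slope_terms if t.startswith(prefix)]
    if clashes:
        raise ValueError(f"CRE mean terms found in random-slope blocks: {clashes}")


# Passes silently when every precondition holds.
validate_cre("hierarchical", {"market": ["A", "B"], "tv": [1.0, 2.0]},
             ["tv"], "market", ["market"], [])
```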
Identification warnings
warn_cre_identification() checks two conditions after CRE setup:
- More CRE variables than groups. If length(vars) > n_groups, between-effect estimates may be weakly identified, and the function emits a warning.
- Near-zero within-group variation. For each CRE variable, the standard deviation of the within-group residual ($x_{gt} - \bar{x}_g$) is checked. If it is effectively zero, within-effect identification is weak, and the function emits a per-variable warning.
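Both identification checks reduce to a few lines of arithmetic, sketched here in Python. This is not warn_cre_identification() itself. The function name warn_identification and the tolerance tol=1e-8 are hypothetical choices, because the actual threshold for "effectively zero" is not stated. Missing values (None) are skipped.

```python
import statistics
import warnings
from typing import Mapping, Sequence


def warn_identification(data: Mapping[str, Sequence], variables: Sequence[str],
                        group: str, tol: float = 1e-8) -> list[str]:
    """Warn on the two documented identification risks.

    Returns the variables whose within-group residual SD is below `tol`.
    """
    groups = data[group]
    n_groups = len(set(groups))
    if len(variables) > n_groups:
        warnings.warn(f"{len(variables)} CRE variables but only {n_groups} groups: "
                      "between effects may be weakly identified")
    flagged = []
    for v in variables:
        pairs = [(g, x) for g, x in zip(groups, data[v]) if x is not None]
        sums: dict = {}
        counts: dict = {}
        for g, x in pairs:
            sums[g] = sums.get(g, 0.0) + x
            counts[g] = counts.get(g, 0) + 1
        # Within-group residual x_gt - xbar_g for every non-missing row.
        residuals = [x - sums[g] / counts[g] for g, x in pairs]
        # Population SD (divides by n); the convention matters little for a near-zero check.
        if not residuals or statistics.pstdev(residuals) < tol:
            warnings.warn(f"'{v}' has near-zero within-group variation")
            flagged.append(v)
    return flagged


panel = {"market": ["A", "A", "B", "B"],
         "tv": [1.0, 3.0, 2.0, 6.0],
         "store_size": [5.0, 5.0, 7.0, 7.0]}
print(warn_identification(panel, ["tv", "store_size"], "market"))  # ['store_size']
```

Here store_size is constant within each market, so its residuals are all zero and it is flagged. tv varies over time within both markets and is not flagged.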
Zero-variance CRE mean terms
If a CRE mean term has zero variance across all observations (possible when the underlying variable has identical group means), calculate_scaling_terms() in R/scale.R will abort when scale=TRUE. The error message identifies the constant CRE columns and suggests using model.type: re (without CRE) or model.scale: false as workarounds.
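The condition behind this error can be sketched as a pre-scaling check. This is a Python illustration, not calculate_scaling_terms(). It assumes scaling standardises each column by its spread, so a column whose values are all identical cannot be scaled. The names constant_cre_columns and check_scalable are hypothetical.

```python
from typing import Mapping, Sequence


def constant_cre_columns(data: Mapping[str, Sequence], prefix: str = "cre_mean_") -> list[str]:
    """CRE mean columns whose non-missing values are all identical."""
    constant = []
    for name, column in data.items():
        if not name.startswith(prefix):
            continue
        values = [x for x in column if x is not None]
        if not values or max(values) == min(values):
            constant.append(name)
    return constant


def check_scalable(data: Mapping[str, Sequence], scale: bool) -> None:
    """Fail early, mirroring the documented abort, when scaling is on."""
    bad = constant_cre_columns(data)
    if scale and bad:
        raise ValueError(f"constant CRE columns {bad}: use model.type: re (no CRE) "
                         "or model.scale: false")


scaled_panel = {"cre_mean_tv": [2.0, 2.0, 4.0], "cre_mean_store_size": [6.0, 6.0, 6.0]}
print(constant_cre_columns(scaled_panel))  # ['cre_mean_store_size']
```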
Panel assumptions
- Balanced panels are not required. apply_cre_data() computes group means using dplyr::group_by() and mean(), which handles unequal group sizes.
- Missing values in CRE variables are excluded from the group-mean calculation (na.rm = TRUE).
- Group-mean recomputation. CRE mean columns are recomputed each time apply_cre_data() is called, including during prep_data_for_fit.hierarchical(). Existing CRE mean columns are dropped and regenerated to prevent stale values.
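These three panel assumptions can be shown together in a stdlib-Python analogue of the group-mean step: unequal group sizes, missing values skipped, and stale columns dropped. The name apply_group_means is hypothetical, and dropping every column that starts with the prefix is an assumption about what counts as an "existing CRE mean column". One difference from R: a group whose values are all missing gets None here, whereas mean(..., na.rm = TRUE) returns NaN.

```python
import math
from typing import Mapping, Sequence


def _missing(x) -> bool:
    return x is None or (isinstance(x, float) and math.isnan(x))


def apply_group_means(data: Mapping[str, Sequence], variables: Sequence[str],
                      group: str, prefix: str = "cre_mean_") -> dict:
    """Return a copy of `data` with freshly computed group-mean columns."""
    # Drop every existing mean column so no stale values survive a re-run.
    out = {k: list(v) for k, v in data.items() if not k.startswith(prefix)}
    groups = out[group]
    for v in variables:
        sums: dict = {}
        counts: dict = {}
        for g, x in zip(groups, out[v]):
            if _missing(x):  # analogue of mean(..., na.rm = TRUE)
                continue
            sums[g] = sums.get(g, 0.0) + x
            counts[g] = counts.get(g, 0) + 1
        means = {g: sums[g] / counts[g] for g in sums}
        # Join each group's mean back onto every row of that group;
        # groups may have different numbers of rows (unbalanced panel).
        out[prefix + v] = [means.get(g) for g in groups]
    return out


panel = {"market": ["A", "A", "A", "B"],
         "tv": [1.0, None, 3.0, 10.0],
         "cre_mean_tv": [99.0, 99.0, 99.0, 99.0]}  # stale values from an earlier run
print(apply_group_means(panel, ["tv"], "market")["cre_mean_tv"])  # [2.0, 2.0, 2.0, 10.0]
```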
Decomposition and reporting
CRE mean terms appear as ordinary fixed-effect terms in the population formula. This means:
- Posterior summary includes CRE mean-term coefficients alongside other population coefficients.
- Response decomposition via decomp() attributes fitted-value contributions to CRE mean terms separately from their within-group counterparts.
- Plots (posterior forest, prior-vs-posterior) include CRE mean terms.
Interpretation note: the CRE mean-term coefficient represents the between-group effect conditional on the within-group variation. It does not represent the total effect of the underlying variable.
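A small Python sketch shows how these coefficients combine. It assumes the standard Mundlak parameterisation, with $\beta_W$ on $x_{gt}$ and $\gamma$ on $\bar{x}_g$. Under that assumption, a decomposition credits $\beta_W x_{gt}$ and $\gamma \bar{x}_g$ to separate terms, and the implied between-group effect is $\beta_W + \gamma$. The function names and the numbers are illustrative, not DSAMbayes output.

```python
from typing import Sequence


def term_contributions(x: Sequence[float], x_bar: Sequence[float],
                       beta_within: float, gamma: float) -> tuple[list[float], list[float]]:
    """Per-row fitted contributions of x_gt and of its CRE mean term."""
    within = [beta_within * xi for xi in x]
    mean_term = [gamma * xb for xb in x_bar]
    return within, mean_term


def implied_between_draws(beta_within_draws: Sequence[float],
                          gamma_draws: Sequence[float]) -> list[float]:
    """Between-group effect beta_W + gamma, computed draw by draw."""
    if len(beta_within_draws) != len(gamma_draws):
        raise ValueError("draw vectors must have the same length")
    return [b + g for b, g in zip(beta_within_draws, gamma_draws)]


w, m = term_contributions([1.0, 3.0], [2.0, 2.0], beta_within=0.5, gamma=0.25)
print(w, m)  # [0.5, 1.5] [0.5, 0.5]
print(implied_between_draws([0.5, 0.75], [0.25, -0.25]))  # [0.75, 0.5]
```

The between-group effect is summed per posterior draw before it is summarised. Adding the separate interval bounds of $\beta_W$ and $\gamma$ would ignore their posterior correlation.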
Cross-references
- Model Classes — hierarchical class construction
- Priors and Boundaries — prior schema for CRE terms
- Config Schema — cre.* YAML keys
- Diagnostics Gates — within-group variation check