Diagnostics Gates
Purpose
DSAMbayes runs a deterministic diagnostics framework after model fitting. Each diagnostic check produces a pass, warn, or fail status. The policy mode controls how lenient or strict the thresholds are. This page defines the check taxonomy, threshold tables, policy modes, identifiability gate, and the overall status aggregation rule.
Policy modes
The diagnostics framework supports three policy modes, configured via diagnostics.policy_mode in YAML:
| Mode | Intent | Threshold behaviour |
|---|---|---|
| explore | Rapid iteration during model development | Relaxed fail thresholds; many checks can only warn, not fail |
| publish | Default production mode for shareable outputs | Balanced thresholds; condition-number fail is downgraded to warn |
| strict | Audit-grade gating for release candidates | Tightest thresholds; rank deficit fails rather than warns |
The mode is resolved by diagnostics_policy_thresholds(mode) in R/diagnostics_report.R.
Check taxonomy
Checks are organised into phases:
| Phase | Scope | When evaluated |
|---|---|---|
| P0 | Data integrity and design matrix validity | Pre-fit (design matrix available) |
| P1 | Sampler quality, residual behaviour, identifiability | Post-fit (posterior available) |
Each check row includes:
| Field | Meaning |
|---|---|
| check_id | Unique identifier |
| phase | P0 or P1 |
| severity | Priority rating (P0 = critical, P1 = important) |
| status | pass, warn, fail, or skipped |
| metric | Metric name |
| value | Observed value |
| threshold | Applied threshold description |
| message | Human-readable explanation |
P0 design checks
| Check ID | Metric | Pass | Warn | Fail |
|---|---|---|---|---|
| pre_response_finite | non_finite_response_count | == 0 | n/a | > 0 |
| pre_design_constants_duplicates | constant_plus_duplicate_columns | == 0 | n/a | > 0 |
| pre_design_rank_deficit | rank_deficit | == 0 | > 0 (publish) | > 0 (strict) |
| pre_design_condition_number | kappa_X | ≤ warn | > warn | > fail |
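The first two P0 checks have no warn band: any non-zero count fails. The sketch below illustrates that rule in Python. It is not the package's R implementation; the function names are hypothetical, and the way duplicate columns are counted (each repeat of an earlier non-constant column counts once) is an assumption. Intercept handling is not shown.

```python
import math

def non_finite_response_count(y):
    """Count NaN and +/-inf entries in the response vector."""
    return sum(1 for v in y if not math.isfinite(v))

def constant_plus_duplicate_columns(columns):
    """Count constant columns plus columns that repeat an earlier one.

    `columns` maps column name -> list of values. The counting rule is an
    assumption for illustration only.
    """
    constant = {name for name, vals in columns.items() if len(set(vals)) <= 1}
    seen = set()
    duplicates = 0
    for name, vals in columns.items():
        if name in constant:
            continue
        key = tuple(vals)
        if key in seen:
            duplicates += 1  # exact repeat of an earlier non-constant column
        seen.add(key)
    return len(constant) + duplicates

def zero_count_status(count):
    """Pass only when the count is exactly zero; otherwise fail."""
    return "pass" if count == 0 else "fail"
```

For example, a response containing one NaN and one infinite value gives a count of 2, so pre_response_finite would report fail under this rule.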
Condition number thresholds by mode
| Mode | Warn | Fail |
|---|---|---|
| explore | 10,000 | ∞ (cannot fail) |
| publish | 10,000 | 1,000,000 (downgraded to warn) |
| strict | 10,000 | 1,000,000 |
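As an illustration of how the three modes differ, the following Python sketch encodes the table above, including the publish-mode downgrade. The names KAPPA_THRESHOLDS and condition_number_status are hypothetical, and this is not the package's R code.

```python
import math

# Thresholds from the table above. "downgrade" marks the publish-mode rule
# that a condition-number fail is reported as warn.
KAPPA_THRESHOLDS = {
    "explore": {"warn": 10_000, "fail": math.inf, "downgrade": False},
    "publish": {"warn": 10_000, "fail": 1_000_000, "downgrade": True},
    "strict": {"warn": 10_000, "fail": 1_000_000, "downgrade": False},
}

def condition_number_status(kappa, mode):
    """Classify kappa_X: pass if <= warn, warn if > warn, fail if > fail."""
    t = KAPPA_THRESHOLDS[mode]
    if kappa > t["fail"]:
        return "warn" if t["downgrade"] else "fail"
    if kappa > t["warn"]:
        return "warn"
    return "pass"
```

Under these rules a condition number of 5,000,000 yields warn in explore and publish, and fail only in strict.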
P1 sampler checks (MCMC only)
| Check ID | Metric | Direction | Warn | Fail |
|---|---|---|---|---|
| sampler_rhat_max | max_rhat | Lower is better | 1.01 | 1.05 |
| sampler_ess_bulk_min | min_ess_bulk | Higher is better | 400 | 200 |
| sampler_ess_tail_min | min_ess_tail | Higher is better | 200 | 100 |
| sampler_ebfmi_min | min_ebfmi | Higher is better | 0.30 | 0.20 |
| sampler_treedepth_frac | treedepth_hit_fraction | Lower is better | 0.00 | 0.01 |
| sampler_divergences | divergent_fraction | Lower is better | 0.00 | 0.00 |
Mode adjustments for sampler checks
In explore mode, fail thresholds are substantially relaxed (e.g. rhat_fail = 1.10, ess_bulk_fail = 50). In strict mode, warn thresholds match publish fail thresholds.
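The direction column decides which side of a threshold counts as bad. A minimal Python sketch of that logic follows, using the thresholds listed in the sampler table. It assumes strict comparisons (a value exactly on a threshold does not trip it), which matches the "> warn" and "> fail" wording of the condition-number check but is not confirmed for the sampler checks. The function and dictionary names are hypothetical.

```python
# (direction, warn, fail) per check, as listed in the sampler table.
SAMPLER_CHECKS = {
    "sampler_rhat_max": ("lower", 1.01, 1.05),
    "sampler_ess_bulk_min": ("higher", 400, 200),
    "sampler_ess_tail_min": ("higher", 200, 100),
    "sampler_ebfmi_min": ("higher", 0.30, 0.20),
    "sampler_treedepth_frac": ("lower", 0.00, 0.01),
    "sampler_divergences": ("lower", 0.00, 0.00),
}

def classify(value, direction, warn, fail):
    """Return pass/warn/fail; the fail threshold is tested first."""
    if direction == "lower":  # lower is better: large values are bad
        if value > fail:
            return "fail"
        if value > warn:
            return "warn"
    else:  # higher is better: small values are bad
        if value < fail:
            return "fail"
        if value < warn:
            return "warn"
    return "pass"

def sampler_statuses(observed):
    """Map check_id -> status for the observed metric values."""
    return {
        check_id: classify(value, *SAMPLER_CHECKS[check_id])
        for check_id, value in observed.items()
    }
```

Because warn and fail are both 0.00 for divergences, any non-zero divergent fraction goes straight to fail under this reading.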
P1 residual checks
| Check ID | Metric | Direction | Warn | Fail |
|---|---|---|---|---|
| resid_ljung_box_p | resid_lb_p | Higher is better | 0.05 | 0.01 |
| resid_acf_max | resid_acf_max | Lower is better | 0.20 | 0.40 |
Mode adjustments for residual checks
| Mode | resid_lb_p warn | resid_lb_p fail | resid_acf warn | resid_acf fail |
|---|---|---|---|---|
| explore | 0.05 | 0.00 (cannot fail) | 0.20 | ∞ (cannot fail) |
| publish | 0.05 | 0.01 | 0.20 | 0.40 |
| strict | 0.10 | 0.05 | 0.15 | 0.30 |
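The "cannot fail" entries follow from the threshold values themselves: a p-value is never below 0.00 and an autocorrelation is never above ∞. The Python sketch below shows this with the mode table above. It assumes strict comparisons, and the names are hypothetical rather than taken from the package.

```python
import math

# Per-mode residual thresholds, copied from the table above.
RESIDUAL_THRESHOLDS = {
    "explore": {"lb_warn": 0.05, "lb_fail": 0.00, "acf_warn": 0.20, "acf_fail": math.inf},
    "publish": {"lb_warn": 0.05, "lb_fail": 0.01, "acf_warn": 0.20, "acf_fail": 0.40},
    "strict": {"lb_warn": 0.10, "lb_fail": 0.05, "acf_warn": 0.15, "acf_fail": 0.30},
}

def residual_statuses(lb_p, acf_max, mode):
    """Classify the Ljung-Box p-value and the maximum residual ACF for one mode."""
    t = RESIDUAL_THRESHOLDS[mode]

    # Ljung-Box p-value: higher is better.
    if lb_p < t["lb_fail"]:
        lb = "fail"
    elif lb_p < t["lb_warn"]:
        lb = "warn"
    else:
        lb = "pass"

    # Maximum residual autocorrelation: lower is better.
    if acf_max > t["acf_fail"]:
        acf = "fail"
    elif acf_max > t["acf_warn"]:
        acf = "warn"
    else:
        acf = "pass"

    return {"resid_ljung_box_p": lb, "resid_acf_max": acf}
```

With a p-value of 0.03 and a maximum ACF of 0.35, both checks warn in explore and publish, and both fail in strict.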
P1 boundary hit check
| Check ID | Metric | Direction | Warn | Fail |
|---|---|---|---|---|
| boundary_hit_fraction | boundary_hit_frac | Lower is better | 0.05 | 0.20 |
In explore mode, boundary hits cannot fail. In strict mode, thresholds tighten to warn > 0.02, fail > 0.10.
P1 within-group variation check
| Check ID | Metric | Direction | Warn | Fail |
|---|---|---|---|---|
| within_var_ratio | within_var_min_ratio | Higher is better | 0.10 | 0.05 |
This check applies to hierarchical models and flags groups where within-group variation is extremely low relative to between-group variation. In explore mode, the fail threshold is zero (cannot fail).
Identifiability gate
The identifiability gate measures the maximum absolute correlation between baseline terms and media terms in the design matrix. It is configured under the diagnostics.identifiability key in YAML.
Term detection
- Media terms: explicitly listed in media_terms.
- Baseline terms: union of baseline_terms, generated time-component terms, and matches from baseline_regex patterns.
- Both sets are intersected with actual design-matrix columns and filtered to remove constant columns.
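The resolution steps above can be sketched in Python as follows. This is an illustration only: resolve_terms and its arguments are hypothetical names, the design matrix is represented as a plain dict of columns, and the use of re.search for baseline_regex matching is an assumption about how patterns are applied.

```python
import re

def resolve_terms(design_columns, media_terms, baseline_terms, time_terms, baseline_regex):
    """Resolve media and baseline term sets against the design matrix.

    design_columns maps column name -> list of values.
    """
    def usable(name):
        # Keep only columns that exist in the design matrix and are not constant.
        return name in design_columns and len(set(design_columns[name])) > 1

    # Columns whose names match any baseline regex pattern.
    regex_hits = [
        col for col in design_columns
        if any(re.search(pattern, col) for pattern in baseline_regex)
    ]

    # Union of configured, generated, and regex-matched baseline terms
    # (dict.fromkeys removes repeats while keeping order).
    baseline_candidates = dict.fromkeys([*baseline_terms, *time_terms, *regex_hits])

    media = [t for t in dict.fromkeys(media_terms) if usable(t)]
    baseline = [t for t in baseline_candidates if usable(t)]
    return media, baseline
```

A configured media term that is missing from the design matrix, or constant within it, simply drops out of the resolved set.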
Thresholds by mode
| Mode | Warn | Fail |
|---|---|---|
| explore | 0.80 | ∞ (cannot fail) |
| publish | 0.80 | 0.95 |
| strict | 0.70 | 0.85 |
Skip conditions
The identifiability gate reports skipped when:
- identifiability.enabled: false
- No configured media terms found in the design matrix
- No baseline terms detected from configured terms/regex
- All resolved baseline or media terms are constant
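Putting the thresholds and skip conditions together, a gate of this shape could look like the Python sketch below. It assumes the statistic is the largest absolute pairwise Pearson correlation between a media column and a baseline column, and that the term lists passed in are already resolved (present in the design matrix and non-constant). All names are hypothetical, and this is not the package's R implementation.

```python
import math

# (warn, fail) thresholds by mode, from the table above.
ID_THRESHOLDS = {
    "explore": (0.80, math.inf),
    "publish": (0.80, 0.95),
    "strict": (0.70, 0.85),
}

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def identifiability_gate(design, media, baseline, mode, enabled=True):
    """Return the status and the maximum absolute baseline-media correlation."""
    # Skip when disabled or when either resolved term set is empty.
    if not enabled or not media or not baseline:
        return {"status": "skipped", "value": None}

    max_abs = max(
        abs(pearson(design[m], design[b])) for m in media for b in baseline
    )
    warn, fail = ID_THRESHOLDS[mode]
    if max_abs > fail:
        status = "fail"
    elif max_abs > warn:
        status = "warn"
    else:
        status = "pass"
    return {"status": status, "value": max_abs}
```

A media column that moves in lockstep with a trend term would fail in publish and strict but only warn in explore.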
Overall status aggregation
The overall diagnostics status is determined by diagnostics_overall_status():
- If any check has status == "fail" → overall status is fail.
- If any check has status == "warn" (and none fail) → overall status is warn.
- Otherwise → overall status is pass.
Checks with status == "skipped" do not affect the overall status.
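The aggregation rule is a worst-status-wins reduction. A minimal Python restatement of the rule as described here (not the R source of diagnostics_overall_status(); the function name is hypothetical):

```python
def overall_status(statuses):
    """Aggregate per-check statuses; "skipped" never changes the result."""
    if "fail" in statuses:
        return "fail"
    if "warn" in statuses:
        return "warn"
    return "pass"
```

For example, overall_status(["pass", "skipped", "warn"]) returns "warn", and a list containing only "skipped" entries returns "pass".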
Runner artefact output
The diagnostics framework produces:
| Artefact | Location | Content |
|---|---|---|
| diagnostics_report.csv | 40_diagnostics/ | Full check table with all fields |
| diagnostics_summary.txt | 40_diagnostics/ | Human-readable summary of overall status and failing checks |
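For downstream tooling, the report can be filtered with any CSV reader. The Python sketch below assumes the CSV header uses the field names from the check-row table (check_id, status, message, and so on), which this page does not confirm. The sample rows are made up for illustration.

```python
import csv
import io

def failing_checks(csv_text):
    """Return (check_id, message) pairs for rows whose status is "fail"."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [(row["check_id"], row["message"]) for row in rows if row["status"] == "fail"]

# Made-up sample content; real reports come from 40_diagnostics/diagnostics_report.csv.
SAMPLE = """check_id,phase,severity,status,metric,value,threshold,message
sampler_rhat_max,P1,P1,fail,max_rhat,1.08,fail > 1.05,Rhat too high
resid_acf_max,P1,P1,warn,resid_acf_max,0.25,warn > 0.20,Residual ACF elevated
"""
```

To apply this to a real run, read the file contents into a string first and pass that string to failing_checks.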
Interpretation guidance
- pass: no remediation needed; model is suitable for the configured policy mode.
- warn: review recommended; the model may have quality concerns but does not block the configured policy.
- fail: remediation required before the model can be considered production-ready under the configured policy.
Common remediation actions
| Diagnostic area | Warning signs | Actions |
|---|---|---|
| High Rhat | > 1.01 | Increase MCMC iterations or warmup; simplify model |
| Low ESS | < 400 bulk or < 200 tail | Increase iterations; check for multimodality |
| Divergences | Any non-zero fraction | Increase adapt_delta; reparameterise model |
| High condition number | kappa > 10,000 | Reduce collinearity; remove redundant terms |
| Residual autocorrelation | High ACF or low Ljung-Box p | Add time controls (trend, seasonality, holidays) |
| Boundary hits | > 5% of draws | Review boundary specification; widen or remove constraints |
| High baseline-media correlation | > 0.80 | Add controls to separate baseline from media; consider alternative model specifications |
Cross-references
- Model Classes: which diagnostics apply to each class
- Response Scale Semantics: residuals are computed on model scale
- Config Schema: diagnostics.* YAML keys
- Output Artefacts: diagnostics artefact paths
- Diagnostics Plots: visual diagnostic outputs