Diagnostics Gates

Purpose

DSAMbayes runs a deterministic diagnostics framework after model fitting. Each diagnostic check produces a pass, warn, or fail status, or is reported as skipped when it does not run. The policy mode controls how lenient or strict the thresholds are. This page defines the check taxonomy, threshold tables, policy modes, identifiability gate, and the overall status aggregation rule.

Policy modes

The diagnostics framework supports three policy modes, configured via diagnostics.policy_mode in YAML:

| Mode | Intent | Threshold behaviour |
| --- | --- | --- |
| explore | Rapid iteration during model development | Relaxed fail thresholds; many checks can only warn, not fail |
| publish | Default production mode for shareable outputs | Balanced thresholds; condition-number fail is downgraded to warn |
| strict | Audit-grade gating for release candidates | Tightest thresholds; rank deficit fails rather than warns |

The mode is resolved by diagnostics_policy_thresholds(mode) in R/diagnostics_report.R.

Check taxonomy

Checks are organised into phases:

| Phase | Scope | When evaluated |
| --- | --- | --- |
| P0 | Data integrity and design matrix validity | Pre-fit (design matrix available) |
| P1 | Sampler quality, residual behaviour, identifiability | Post-fit (posterior available) |

Each check row includes:

| Field | Meaning |
| --- | --- |
| check_id | Unique identifier |
| phase | P0 or P1 |
| severity | Priority rating (P0 = critical, P1 = important) |
| status | pass, warn, fail, or skipped |
| metric | Metric name |
| value | Observed value |
| threshold | Applied threshold description |
| message | Human-readable explanation |
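
As a rough illustration of the row shape described above, the following Python sketch models one check row as a record and rejects unknown phase or status values. It is not the package's R code: the class name `CheckRow`, the field types, and the example values (including the wording of the threshold and message strings) are assumptions made for illustration.

```python
from dataclasses import dataclass

VALID_PHASES = {"P0", "P1"}
VALID_STATUSES = {"pass", "warn", "fail", "skipped"}


@dataclass(frozen=True)
class CheckRow:
    """One diagnostics check row (hypothetical Python mirror of the field table)."""
    check_id: str
    phase: str
    severity: str
    status: str
    metric: str
    value: float
    threshold: str
    message: str

    def __post_init__(self):
        # Reject values outside the sets listed in the field table.
        if self.phase not in VALID_PHASES:
            raise ValueError(f"unknown phase: {self.phase!r}")
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")


# Illustrative values only; the threshold and message wording is made up.
row = CheckRow(
    check_id="sampler_rhat_max",
    phase="P1",
    severity="P1",
    status="warn",
    metric="max_rhat",
    value=1.02,
    threshold="warn > 1.01, fail > 1.05",
    message="max_rhat exceeds the warn threshold",
)
```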

P0 design checks

| Check ID | Metric | Pass | Warn or fail |
| --- | --- | --- | --- |
| pre_response_finite | non_finite_response_count | == 0 | > 0 |
| pre_design_constants_duplicates | constant_plus_duplicate_columns | == 0 | > 0 |
| pre_design_rank_deficit | rank_deficit | == 0 | > 0: warn (publish), fail (strict) |
| pre_design_condition_number | kappa_X | ≤ warn threshold | > warn threshold: warn; > fail threshold: fail |

Condition number thresholds by mode

| Mode | Warn | Fail |
| --- | --- | --- |
| explore | 10,000 | ∞ (cannot fail) |
| publish | 10,000 | 1,000,000 (downgraded to warn) |
| strict | 10,000 | 1,000,000 |
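
The mode-dependent behaviour of the condition-number check can be sketched as follows. This is an illustrative Python sketch, not the package's R implementation: the function name `classify_condition_number` and the way the publish downgrade is coded are assumptions, while the thresholds and the "greater than" comparisons come from the tables above.

```python
import math

# Warn/fail thresholds for kappa_X, copied from the table above.
KAPPA_THRESHOLDS = {
    "explore": {"warn": 10_000.0, "fail": math.inf},
    "publish": {"warn": 10_000.0, "fail": 1_000_000.0},
    "strict": {"warn": 10_000.0, "fail": 1_000_000.0},
}


def classify_condition_number(kappa, mode):
    """Return 'pass', 'warn', or 'fail' for a design-matrix condition number."""
    t = KAPPA_THRESHOLDS[mode]
    if kappa > t["fail"]:
        # In publish mode a condition-number fail is downgraded to warn.
        return "warn" if mode == "publish" else "fail"
    if kappa > t["warn"]:
        return "warn"
    return "pass"
```

With these thresholds, a condition number of 5,000,000 is a fail only in strict mode; publish downgrades it to warn, and explore cannot fail because its fail threshold is infinite.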

P1 sampler checks (MCMC only)

| Check ID | Metric | Direction | Warn | Fail |
| --- | --- | --- | --- | --- |
| sampler_rhat_max | max_rhat | Lower is better | 1.01 | 1.05 |
| sampler_ess_bulk_min | min_ess_bulk | Higher is better | 400 | 200 |
| sampler_ess_tail_min | min_ess_tail | Higher is better | 200 | 100 |
| sampler_ebfmi_min | min_ebfmi | Higher is better | 0.30 | 0.20 |
| sampler_treedepth_frac | treedepth_hit_fraction | Lower is better | 0.00 | 0.01 |
| sampler_divergences | divergent_fraction | Lower is better | 0.00 | 0.00 |

Mode adjustments for sampler checks

In explore mode, fail thresholds are substantially relaxed (e.g. rhat_fail = 1.10, ess_bulk_fail = 50). In strict mode, warn thresholds match publish fail thresholds.
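
The Direction column determines which side of a threshold counts as a breach. The Python sketch below illustrates this with the base thresholds from the sampler table (taken here to be the publish values, which is an assumption). It is not the package's R code. The comparisons are assumed to be strict, because a warn threshold of 0.00 on a "lower is better" metric would otherwise flag a value of exactly zero; the function names are hypothetical.

```python
# Base thresholds from the sampler table: (direction, warn, fail).
SAMPLER_THRESHOLDS = {
    "sampler_rhat_max": ("lower", 1.01, 1.05),
    "sampler_ess_bulk_min": ("higher", 400, 200),
    "sampler_ess_tail_min": ("higher", 200, 100),
    "sampler_ebfmi_min": ("higher", 0.30, 0.20),
    "sampler_treedepth_frac": ("lower", 0.00, 0.01),
    "sampler_divergences": ("lower", 0.00, 0.00),
}


def classify(value, direction, warn, fail):
    """Classify a metric value against warn/fail thresholds (strict comparisons)."""
    if direction == "lower":
        # Lower is better: larger values breach the threshold.
        breached_fail, breached_warn = value > fail, value > warn
    else:
        # Higher is better: smaller values breach the threshold.
        breached_fail, breached_warn = value < fail, value < warn
    if breached_fail:
        return "fail"
    if breached_warn:
        return "warn"
    return "pass"


def classify_sampler(check_id, value):
    direction, warn, fail = SAMPLER_THRESHOLDS[check_id]
    return classify(value, direction, warn, fail)
```

Under this reading, any non-zero divergent fraction fails because warn and fail share the threshold 0.00, while a tree-depth hit fraction between 0 and 0.01 only warns.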

P1 residual checks

| Check ID | Metric | Direction | Warn | Fail |
| --- | --- | --- | --- | --- |
| resid_ljung_box_p | resid_lb_p | Higher is better | 0.05 | 0.01 |
| resid_acf_max | resid_acf_max | Lower is better | 0.20 | 0.40 |

Mode adjustments for residual checks

| Mode | resid_lb_p warn | resid_lb_p fail | resid_acf_max warn | resid_acf_max fail |
| --- | --- | --- | --- | --- |
| explore | 0.05 | 0.00 (cannot fail) | 0.20 | ∞ (cannot fail) |
| publish | 0.05 | 0.01 | 0.20 | 0.40 |
| strict | 0.10 | 0.05 | 0.15 | 0.30 |
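
To show how the same residuals can receive different statuses under each mode, here is a small Python sketch built from the table above. It is illustrative only and not the package's R code: the function name `residual_statuses` and the dictionary keys are hypothetical, and strict inequalities are assumed. Note how "cannot fail" is encoded in the table: a fail threshold of 0.00 for a "higher is better" p-value and ∞ for a "lower is better" autocorrelation can never be breached.

```python
import math

# Residual thresholds by policy mode, copied from the table above.
RESIDUAL_THRESHOLDS = {
    "explore": {"lb_p_warn": 0.05, "lb_p_fail": 0.00,
                "acf_warn": 0.20, "acf_fail": math.inf},
    "publish": {"lb_p_warn": 0.05, "lb_p_fail": 0.01,
                "acf_warn": 0.20, "acf_fail": 0.40},
    "strict": {"lb_p_warn": 0.10, "lb_p_fail": 0.05,
               "acf_warn": 0.15, "acf_fail": 0.30},
}


def residual_statuses(mode, resid_lb_p, resid_acf_max):
    """Return the status of both residual checks for one policy mode."""
    t = RESIDUAL_THRESHOLDS[mode]

    # Ljung-Box p-value: higher is better, so small values breach.
    if resid_lb_p < t["lb_p_fail"]:
        lb = "fail"
    elif resid_lb_p < t["lb_p_warn"]:
        lb = "warn"
    else:
        lb = "pass"

    # Maximum residual ACF: lower is better, so large values breach.
    if resid_acf_max > t["acf_fail"]:
        acf = "fail"
    elif resid_acf_max > t["acf_warn"]:
        acf = "warn"
    else:
        acf = "pass"

    return {"resid_ljung_box_p": lb, "resid_acf_max": acf}
```

For example, a Ljung-Box p-value of 0.03 with a maximum ACF of 0.35 yields two warnings in explore and publish modes but two failures in strict mode.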

P1 boundary hit check

| Check ID | Metric | Direction | Warn | Fail |
| --- | --- | --- | --- | --- |
| boundary_hit_fraction | boundary_hit_frac | Lower is better | 0.05 | 0.20 |

In explore mode, boundary hits cannot fail. In strict mode, thresholds tighten to warn > 0.02, fail > 0.10.

P1 within-group variation check

| Check ID | Metric | Direction | Warn | Fail |
| --- | --- | --- | --- | --- |
| within_var_ratio | within_var_min_ratio | Higher is better | 0.10 | 0.05 |

This check applies to hierarchical models and flags groups where within-group variation is extremely low relative to between-group variation. In explore mode, the fail threshold is zero (cannot fail).

Identifiability gate

The identifiability gate measures the maximum absolute correlation between baseline terms and media terms in the design matrix. It is configured via diagnostics.identifiability in YAML:

diagnostics:
  identifiability:
    enabled: true
    media_terms: [m_tv, m_search, m_social]
    baseline_terms: [trend, seasonality]
    baseline_regex: ["^h_", "^sin", "^cos"]
    abs_corr_warn: 0.80
    abs_corr_fail: 0.95

Term detection

  • Media terms: explicitly listed in media_terms.
  • Baseline terms: union of baseline_terms, generated time-component terms, and matches from baseline_regex patterns.
  • Both sets are intersected with actual design-matrix columns and filtered to remove constant columns.
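
The term-detection rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the package's R code: the design matrix is represented as a plain dict of column name to values, the function name `resolve_terms` is hypothetical, and the generated time-component terms are passed in through a hypothetical `time_terms` argument because this page does not say how they are identified. Regex patterns are applied with search semantics.

```python
import re


def resolve_terms(design, media_terms, baseline_terms, baseline_regex, time_terms=()):
    """Resolve media and baseline columns present in the design matrix.

    design: dict mapping column name -> list of values (assumed representation).
    """
    def non_constant(name):
        return len(set(design[name])) > 1

    # Baseline candidates: explicit terms, time-component terms, regex matches.
    baseline_candidates = set(baseline_terms) | set(time_terms)
    for pattern in baseline_regex:
        rx = re.compile(pattern)
        baseline_candidates |= {col for col in design if rx.search(col)}

    media_candidates = set(media_terms)

    # Intersect with actual columns (in design order) and drop constant columns.
    media = [c for c in design if c in media_candidates and non_constant(c)]
    baseline = [c for c in design if c in baseline_candidates and non_constant(c)]
    return media, baseline


# Made-up design matrix for illustration.
design = {
    "m_tv": [1.0, 2.0, 3.0, 4.0],
    "m_search": [5.0, 5.0, 5.0, 5.0],   # constant, so it is filtered out
    "trend": [0, 1, 2, 3],
    "h_easter": [0, 1, 0, 0],
    "sin1": [0.0, 1.0, 0.0, -1.0],
    "price": [9.0, 9.5, 9.2, 9.9],
}
media, baseline = resolve_terms(
    design,
    media_terms=["m_tv", "m_search", "m_social"],
    baseline_terms=["trend", "seasonality"],
    baseline_regex=["^h_", "^sin", "^cos"],
)
```

Here `media` resolves to `["m_tv"]` (`m_social` is not a column and `m_search` is constant), and `baseline` resolves to `["trend", "h_easter", "sin1"]`.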

Thresholds by mode

| Mode | Warn | Fail |
| --- | --- | --- |
| explore | 0.80 | ∞ (cannot fail) |
| publish | 0.80 | 0.95 |
| strict | 0.70 | 0.85 |

Skip conditions

The identifiability gate reports skipped when:

  • identifiability.enabled: false
  • No configured media terms found in the design matrix
  • No baseline terms detected from configured terms/regex
  • All resolved baseline or media terms are constant
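
Putting the skip conditions and mode thresholds together, the gate can be sketched in Python as below. This is a minimal illustration, not the package's R implementation. It assumes Pearson correlation, strict inequalities, and that the `media` and `baseline` lists have already been intersected with the design-matrix columns and stripped of constant columns, so "not found" and "all constant" both show up as an empty list. It also uses the per-mode table only; how the YAML `abs_corr_warn` and `abs_corr_fail` values interact with the mode is not covered here. All function names are hypothetical.

```python
import math

# (warn, fail) thresholds on the maximum absolute correlation, by mode.
ABS_CORR_THRESHOLDS = {
    "explore": (0.80, math.inf),
    "publish": (0.80, 0.95),
    "strict": (0.70, 0.85),
}


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def identifiability_gate(design, media, baseline, mode, enabled=True):
    """Return (status, max_abs_corr); max_abs_corr is None when skipped."""
    if not enabled or not media or not baseline:
        return "skipped", None
    max_abs = max(
        abs(pearson(design[m], design[b])) for m in media for b in baseline
    )
    warn, fail = ABS_CORR_THRESHOLDS[mode]
    if max_abs > fail:
        status = "fail"
    elif max_abs > warn:
        status = "warn"
    else:
        status = "pass"
    return status, max_abs


# Made-up columns: m_tv is almost collinear with trend.
design = {
    "m_tv": [1.0, 2.0, 3.0, 4.0, 5.0],
    "trend": [1.0, 2.0, 3.0, 4.0, 6.0],
    "sin1": [0.0, 1.0, 0.0, -1.0, 0.0],
}
status, max_abs = identifiability_gate(design, ["m_tv"], ["trend", "sin1"], "publish")
```

In this made-up example the correlation between `m_tv` and `trend` is about 0.986, so the gate fails in publish and strict modes and warns in explore mode.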

Overall status aggregation

The overall diagnostics status is determined by diagnostics_overall_status():

  1. If any check has status == "fail" → overall status is fail.
  2. If any check has status == "warn" (and none fail) → overall status is warn.
  3. Otherwise → overall status is pass.

Checks with status == "skipped" do not affect the overall status.
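
The aggregation rule is simple enough to state in a few lines. The Python sketch below mirrors the three steps above; it is not the R function `diagnostics_overall_status()` itself, and the name `overall_status` is hypothetical.

```python
def overall_status(statuses):
    """Aggregate per-check statuses into one overall status.

    'skipped' entries are ignored: they are neither 'fail' nor 'warn'.
    """
    statuses = list(statuses)
    if "fail" in statuses:
        return "fail"
    if "warn" in statuses:
        return "warn"
    return "pass"
```

Under this rule a report containing only pass and skipped checks aggregates to pass, and a single fail outweighs any number of warnings.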

Runner artefact output

The diagnostics framework produces:

| Artefact | Location | Content |
| --- | --- | --- |
| diagnostics_report.csv | 40_diagnostics/ | Full check table with all fields |
| diagnostics_summary.txt | 40_diagnostics/ | Human-readable summary of overall status and failing checks |
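
For downstream tooling, the report can be filtered for failing checks with a few lines of Python. This sketch assumes the CSV column names match the field names listed under the check taxonomy (`check_id`, `status`, `metric`, `value`, and so on); the sample rows are made-up data and the function name `failing_checks` is hypothetical.

```python
import csv
import io


def failing_checks(report_file):
    """Return (check_id, metric, value) for every row with status 'fail'."""
    reader = csv.DictReader(report_file)
    return [
        (row["check_id"], row["metric"], row["value"])
        for row in reader
        if row["status"] == "fail"
    ]


# Made-up report content; assumed column names follow the field table.
SAMPLE_REPORT = (
    "check_id,phase,severity,status,metric,value,threshold,message\n"
    "sampler_rhat_max,P1,P1,pass,max_rhat,1.004,warn > 1.01; fail > 1.05,ok\n"
    "sampler_divergences,P1,P1,fail,divergent_fraction,0.004,fail > 0,divergent transitions present\n"
    "resid_acf_max,P1,P1,warn,resid_acf_max,0.27,warn > 0.20; fail > 0.40,residual autocorrelation\n"
)

failures = failing_checks(io.StringIO(SAMPLE_REPORT))
```

Against a real run, the same function would take a file handle opened on diagnostics_report.csv inside the 40_diagnostics/ directory, for example `open(path, newline="")`; values come back as strings because the `csv` module does not convert types.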

Interpretation guidance

  • pass — no remediation needed; model is suitable for the configured policy mode.
  • warn — review recommended; the model may have quality concerns but does not block the configured policy.
  • fail — remediation required before the model can be considered production-ready under the configured policy.

Common remediation actions

| Diagnostic area | Warning signs | Actions |
| --- | --- | --- |
| High Rhat | > 1.01 | Increase MCMC iterations or warmup; simplify model |
| Low ESS | < 400 bulk or < 200 tail | Increase iterations; check for multimodality |
| Divergences | Any non-zero fraction | Increase adapt_delta; reparameterise model |
| High condition number | kappa > 10,000 | Reduce collinearity; remove redundant terms |
| Residual autocorrelation | High ACF or low Ljung-Box p | Add time controls (trend, seasonality, holidays) |
| Boundary hits | > 5% of draws | Review boundary specification; widen or remove constraints |
| High baseline-media correlation | > 0.80 | Add controls to separate baseline from media; consider alternative model specifications |

Cross-references