Diagnostics Gates

Purpose

DSAMbayes runs a deterministic diagnostics framework after model fitting. Each diagnostic check produces a status of pass, warn, fail, or skipped. The policy mode controls how lenient or strict the thresholds are. This page defines the check taxonomy, threshold tables, policy modes, identifiability gate, and the overall status aggregation rule.

Policy modes

The diagnostics framework supports three policy modes, configured via `diagnostics.policy_mode` in YAML:

Mode	Intent	Threshold behaviour
`explore`	Rapid iteration during model development	Relaxed fail thresholds; many checks can only warn, not fail
`publish`	Default production mode for shareable outputs	Balanced thresholds; condition-number fail is downgraded to warn
`strict`	Audit-grade gating for release candidates	Tightest thresholds; rank deficit fails rather than warns

The mode is resolved by `diagnostics_policy_thresholds(mode)` in `R/diagnostics_report.R`.

Check taxonomy

Checks are organised into two phases:

Phase	Scope	When evaluated
`P0`	Data integrity and design matrix validity	Pre-fit (design matrix available)
`P1`	Sampler quality, residual behaviour, identifiability	Post-fit (posterior available)

Each check row includes:

Field	Meaning
`check_id`	Unique identifier
`phase`	`P0` or `P1`
`severity`	Priority rating (`P0` = critical, `P1` = important)
`status`	`pass`, `warn`, `fail`, or `skipped`
`metric`	Metric name
`value`	Observed value
`threshold`	Applied threshold description
`message`	Human-readable explanation
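As a rough illustration, a check row can be pictured as a small record with the fields listed above. The sketch below is written in Python purely for illustration (DSAMbayes itself is implemented in R); the class name `CheckRow`, the validation logic, and the example values are hypothetical and not part of the package.

```python
from dataclasses import dataclass, asdict

VALID_PHASES = {"P0", "P1"}
VALID_STATUSES = {"pass", "warn", "fail", "skipped"}


@dataclass(frozen=True)
class CheckRow:
    """One row of the check table; field names follow the table above."""
    check_id: str
    phase: str
    severity: str
    status: str
    metric: str
    value: float
    threshold: str
    message: str

    def __post_init__(self):
        # Reject values outside the documented vocabularies.
        if self.phase not in VALID_PHASES:
            raise ValueError(f"unknown phase: {self.phase!r}")
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")


# Illustrative row; severity and message text are made up for this example.
row = CheckRow(
    check_id="pre_design_rank_deficit",
    phase="P0",
    severity="P0",
    status="pass",
    metric="rank_deficit",
    value=0.0,
    threshold="== 0",
    message="Design matrix has full column rank.",
)
print(asdict(row)["status"])
```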

P0 design checks

Check ID	Metric	Pass	Warn	Fail
`pre_response_finite`	`non_finite_response_count`	`== 0`	—	`> 0`
`pre_design_constants_duplicates`	`constant_plus_duplicate_columns`	`== 0`	—	`> 0`
`pre_design_rank_deficit`	`rank_deficit`	`== 0`	`> 0` (publish)	`> 0` (strict)
`pre_design_condition_number`	`kappa_X`	`≤ warn`	`> warn`	`> fail`

Condition number thresholds by mode

Mode	Warn	Fail
`explore`	10,000	∞ (cannot fail)
`publish`	10,000	1,000,000 (downgraded to warn)
`strict`	10,000	1,000,000
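The table above can be read as a small decision rule. The following Python sketch is one possible reading of it, not the package's R implementation: the function name `condition_number_status` and the `fail_downgraded` flag are hypothetical, and the comparison follows the `≤ warn` / `> warn` / `> fail` columns of the P0 table.

```python
import math

# Thresholds copied from the table above; "fail_downgraded" encodes the
# publish-mode rule that a condition-number fail is reported as warn.
KAPPA_THRESHOLDS = {
    "explore": {"warn": 1e4, "fail": math.inf, "fail_downgraded": False},
    "publish": {"warn": 1e4, "fail": 1e6, "fail_downgraded": True},
    "strict": {"warn": 1e4, "fail": 1e6, "fail_downgraded": False},
}


def condition_number_status(kappa, mode):
    """Classify a design-matrix condition number under a policy mode."""
    t = KAPPA_THRESHOLDS[mode]
    if kappa > t["fail"]:
        return "warn" if t["fail_downgraded"] else "fail"
    if kappa > t["warn"]:
        return "warn"
    return "pass"


for mode in ("explore", "publish", "strict"):
    print(mode, condition_number_status(5e6, mode))
```

Under this reading, the same condition number of 5,000,000 is a warn in `explore` and `publish` and a fail only in `strict`.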

P1 sampler checks (MCMC only)

Check ID	Metric	Direction	Warn	Fail
`sampler_rhat_max`	`max_rhat`	Lower is better	1.01	1.05
`sampler_ess_bulk_min`	`min_ess_bulk`	Higher is better	400	200
`sampler_ess_tail_min`	`min_ess_tail`	Higher is better	200	100
`sampler_ebfmi_min`	`min_ebfmi`	Higher is better	0.30	0.20
`sampler_treedepth_frac`	`treedepth_hit_fraction`	Lower is better	0.00	0.01
`sampler_divergences`	`divergent_fraction`	Lower is better	0.00	0.00
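The Direction column determines which side of a threshold counts as a problem. The Python sketch below shows one way to apply the thresholds in the table above; it is an illustration rather than the package's R code. The names `classify` and `SAMPLER_THRESHOLDS` are hypothetical, strict inequalities are an assumption, and per-mode adjustments are not modelled.

```python
# (direction, warn, fail) per check, copied from the table above.
SAMPLER_THRESHOLDS = {
    "sampler_rhat_max": ("lower", 1.01, 1.05),
    "sampler_ess_bulk_min": ("higher", 400, 200),
    "sampler_ess_tail_min": ("higher", 200, 100),
    "sampler_ebfmi_min": ("higher", 0.30, 0.20),
    "sampler_treedepth_frac": ("lower", 0.00, 0.01),
    "sampler_divergences": ("lower", 0.00, 0.00),
}


def classify(value, direction, warn, fail):
    """Return pass/warn/fail, testing the fail threshold first."""
    if direction == "lower":  # lower is better: large values are bad
        if value > fail:
            return "fail"
        if value > warn:
            return "warn"
    elif direction == "higher":  # higher is better: small values are bad
        if value < fail:
            return "fail"
        if value < warn:
            return "warn"
    else:
        raise ValueError(f"unknown direction: {direction!r}")
    return "pass"


def sampler_status(check_id, value):
    direction, warn, fail = SAMPLER_THRESHOLDS[check_id]
    return classify(value, direction, warn, fail)


print(sampler_status("sampler_rhat_max", 1.03))
print(sampler_status("sampler_divergences", 0.001))
```

Because the warn and fail thresholds for `sampler_divergences` are both 0.00, any non-zero divergent fraction goes straight to fail under this reading.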

Mode adjustments for sampler checks

In `explore` mode, fail thresholds are substantially relaxed (e.g. `rhat_fail = 1.10`, `ess_bulk_fail = 50`). In `strict` mode, warn thresholds match `publish` fail thresholds.

P1 residual checks

Check ID	Metric	Direction	Warn	Fail
`resid_ljung_box_p`	`resid_lb_p`	Higher is better	0.05	0.01
`resid_acf_max`	`resid_acf_max`	Lower is better	0.20	0.40

Mode adjustments for residual checks

Mode	`resid_lb_p` warn	`resid_lb_p` fail	`resid_acf` warn	`resid_acf` fail
`explore`	0.05	0.00 (cannot fail)	0.20	∞ (cannot fail)
`publish`	0.05	0.01	0.20	0.40
`strict`	0.10	0.05	0.15	0.30
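To see how the mode changes the outcome for the same residual metrics, the table above can be encoded as a lookup. This Python sketch is illustrative only; `RESIDUAL_THRESHOLDS` and `residual_statuses` are hypothetical names, and strict inequalities are assumed.

```python
import math

# Per-mode thresholds copied from the table above.
RESIDUAL_THRESHOLDS = {
    "explore": {"lb_p_warn": 0.05, "lb_p_fail": 0.00,
                "acf_warn": 0.20, "acf_fail": math.inf},
    "publish": {"lb_p_warn": 0.05, "lb_p_fail": 0.01,
                "acf_warn": 0.20, "acf_fail": 0.40},
    "strict": {"lb_p_warn": 0.10, "lb_p_fail": 0.05,
               "acf_warn": 0.15, "acf_fail": 0.30},
}


def residual_statuses(resid_lb_p, resid_acf_max, mode):
    """Classify both residual checks under the given policy mode."""
    t = RESIDUAL_THRESHOLDS[mode]

    # Ljung-Box p-value: higher is better.
    if resid_lb_p < t["lb_p_fail"]:
        lb = "fail"
    elif resid_lb_p < t["lb_p_warn"]:
        lb = "warn"
    else:
        lb = "pass"

    # Maximum absolute residual autocorrelation: lower is better.
    if resid_acf_max > t["acf_fail"]:
        acf = "fail"
    elif resid_acf_max > t["acf_warn"]:
        acf = "warn"
    else:
        acf = "pass"

    return {"resid_ljung_box_p": lb, "resid_acf_max": acf}


for mode in ("explore", "publish", "strict"):
    print(mode, residual_statuses(0.03, 0.35, mode))
```

With a Ljung-Box p-value of 0.03 and a maximum ACF of 0.35, both checks warn in `explore` and `publish` and both fail in `strict`.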

P1 boundary hit check

Check ID	Metric	Direction	Warn	Fail
`boundary_hit_fraction`	`boundary_hit_frac`	Lower is better	0.05	0.20

In explore mode, boundary hits cannot fail. In strict mode, thresholds tighten to warn > 0.02, fail > 0.10.

P1 within-group variation check

Check ID	Metric	Direction	Warn	Fail
`within_var_ratio`	`within_var_min_ratio`	Higher is better	0.10	0.05

This check applies to hierarchical models and flags groups where within-group variation is extremely low relative to between-group variation. In explore mode, the fail threshold is zero (cannot fail).

Identifiability gate

The identifiability gate measures the maximum absolute correlation between baseline terms and media terms in the design matrix. It is configured via `diagnostics.identifiability` in YAML:

```yaml
diagnostics:
  identifiability:
    enabled: true
    media_terms: [m_tv, m_search, m_social]
    baseline_terms: [trend, seasonality]
    baseline_regex: ["^h_", "^sin", "^cos"]
    abs_corr_warn: 0.80
    abs_corr_fail: 0.95
```

Term detection

Media terms: explicitly listed in `media_terms`.
Baseline terms: union of `baseline_terms`, generated time-component terms, and matches from `baseline_regex` patterns.
Both sets are intersected with actual design-matrix columns and filtered to remove constant columns.
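The term-detection rules above can be sketched as follows. This is a Python illustration under stated assumptions, not the package's R logic: the function name `resolve_terms`, the dict-of-columns representation of the design matrix, and the `generated_time_terms` argument are all hypothetical, and Python's `re` module stands in for R's regex engine (the simple anchored patterns in the example behave the same in both).

```python
import re


def resolve_terms(design, media_terms, baseline_terms, baseline_regex,
                  generated_time_terms=()):
    """Resolve media and baseline columns from a design matrix.

    `design` maps column name -> list of values (an assumed representation).
    """
    def non_constant(name):
        return len(set(design[name])) > 1

    patterns = [re.compile(p) for p in baseline_regex]
    wanted_media = set(media_terms)
    wanted_baseline = set(baseline_terms) | set(generated_time_terms)

    # Iterating over design columns performs the intersection with the
    # actual design matrix; constant columns are then filtered out.
    media = [c for c in design if c in wanted_media and non_constant(c)]
    baseline = [
        c for c in design
        if (c in wanted_baseline or any(p.search(c) for p in patterns))
        and non_constant(c)
    ]
    return media, baseline


# Toy design matrix; column names echo the YAML example above.
design = {
    "intercept": [1, 1, 1, 1],
    "trend": [1, 2, 3, 4],
    "h_easter": [0, 1, 0, 0],
    "h_none": [0, 0, 0, 0],       # matches "^h_" but is constant
    "sin1": [0.0, 1.0, 0.0, -1.0],
    "m_tv": [10, 12, 9, 14],
    "m_search": [5, 5, 5, 5],     # configured but constant
}
media, baseline = resolve_terms(
    design,
    media_terms=["m_tv", "m_search", "m_social"],
    baseline_terms=["trend", "seasonality"],
    baseline_regex=["^h_", "^sin", "^cos"],
)
print(media)
print(baseline)
```

In this toy example `m_social` and `seasonality` are dropped because they are not design-matrix columns, while `m_search` and `h_none` are dropped because they are constant.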

Thresholds by mode

Mode	Warn	Fail
`explore`	0.80	∞ (cannot fail)
`publish`	0.80	0.95
`strict`	0.70	0.85

Skip conditions

The identifiability gate reports `skipped` when:

`identifiability.enabled: false`
No configured media terms found in the design matrix
No baseline terms detected from configured terms/regex
All resolved baseline or media terms are constant
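Putting the metric, the per-mode thresholds, and the skip conditions together, the gate can be sketched in Python as below. This is an illustration, not the package's implementation: `identifiability_gate` and `pearson` are hypothetical names, the term lists are assumed to be already resolved and free of constant columns, strict inequalities are assumed, and any interaction between the YAML `abs_corr_warn` / `abs_corr_fail` keys and the mode table is not modelled.

```python
import math

# (warn, fail) thresholds per mode, copied from the table above.
IDENT_THRESHOLDS = {
    "explore": (0.80, math.inf),
    "publish": (0.80, 0.95),
    "strict": (0.70, 0.85),
}


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def identifiability_gate(design, media, baseline, mode, enabled=True):
    """Return (status, max_abs_corr); the value is None when skipped."""
    # Skip when disabled or when either resolved term set is empty.
    if not enabled or not media or not baseline:
        return "skipped", None
    max_abs = max(
        abs(pearson(design[m], design[b])) for m in media for b in baseline
    )
    warn, fail = IDENT_THRESHOLDS[mode]
    if max_abs > fail:
        return "fail", max_abs
    if max_abs > warn:
        return "warn", max_abs
    return "pass", max_abs


design = {"trend": [1, 2, 3, 4, 5], "m_tv": [1, 2, 4, 3, 5]}
for mode in ("explore", "publish", "strict"):
    print(mode, identifiability_gate(design, ["m_tv"], ["trend"], mode)[0])
```

The toy columns have a correlation of 0.9, which sits between the `publish` warn and fail thresholds but above the `strict` fail threshold.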

Overall status aggregation

The overall diagnostics status is determined by `diagnostics_overall_status()`:

If any check has status == "fail" → overall status is fail.
If any check has status == "warn" (and none fail) → overall status is warn.
Otherwise → overall status is pass.

Checks with status == "skipped" do not affect the overall status.
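The aggregation rule is a worst-status-wins reduction that ignores skipped checks. A minimal Python sketch of the rule as described above follows; the function name `overall_status` is hypothetical, and the behaviour of the R function `diagnostics_overall_status()` on edge cases such as an empty check table is an assumption.

```python
def overall_status(statuses):
    """Aggregate per-check statuses into one overall status."""
    statuses = list(statuses)
    if "fail" in statuses:
        return "fail"
    if "warn" in statuses:
        return "warn"
    # Only "pass" and "skipped" remain (or no checks at all).
    return "pass"


print(overall_status(["pass", "warn", "skipped"]))
print(overall_status(["pass", "fail", "warn"]))
```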

Runner artefact output

The diagnostics framework produces:

Artefact	Location	Content
`diagnostics_report.csv`	`40_diagnostics/`	Full check table with all fields
`diagnostics_summary.txt`	`40_diagnostics/`	Human-readable summary of overall status and failing checks
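For downstream tooling, the report can be filtered with any CSV reader. The Python sketch below assumes the CSV header uses the field names from the check taxonomy (`check_id`, `status`, `metric`, `value`, and so on); that column layout is an assumption, the function name `failing_checks` is hypothetical, and the sample rows are invented for illustration.

```python
import csv
import io


def failing_checks(report_file):
    """Return (check_id, metric, value) for every row with status 'fail'."""
    reader = csv.DictReader(report_file)
    return [
        (row["check_id"], row["metric"], row["value"])
        for row in reader
        if row["status"] == "fail"
    ]


# In-memory stand-in for diagnostics_report.csv; contents are illustrative.
sample = io.StringIO(
    "check_id,phase,severity,status,metric,value,threshold,message\n"
    "sampler_rhat_max,P1,P0,fail,max_rhat,1.08,fail > 1.05,illustrative\n"
    "resid_acf_max,P1,P1,warn,resid_acf_max,0.25,warn > 0.20,illustrative\n"
)
failures = failing_checks(sample)
print(failures)
```

For a real run, the same function could be given a file opened with `open(path, newline="")`, where `path` points at `diagnostics_report.csv` inside the run's `40_diagnostics/` directory.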

Interpretation guidance

pass — no remediation needed; model is suitable for the configured policy mode.
warn — review recommended; the model may have quality concerns, but they do not block its use under the configured policy.
fail — remediation required before the model can be considered production-ready under the configured policy.

Common remediation actions

Diagnostic area	Warning signs	Actions
High Rhat	`> 1.01`	Increase MCMC iterations or warmup; simplify model
Low ESS	`< 400 bulk` or `< 200 tail`	Increase iterations; check for multimodality
Divergences	Any non-zero fraction	Increase `adapt_delta`; reparameterise model
High condition number	`kappa_X > 10,000`	Reduce collinearity; remove redundant terms
Residual autocorrelation	High ACF or low Ljung-Box p	Add time controls (trend, seasonality, holidays)
Boundary hits	`> 5%` of draws	Review boundary specification; widen or remove constraints
High baseline-media correlation	`> 0.80`	Add controls to separate baseline from media; consider alternative model specifications

Cross-references

Model Classes — which diagnostics apply to each class
Response Scale Semantics — residuals are computed on model scale
Config Schema — diagnostics.* YAML keys
Output Artefacts — diagnostics artefact paths
Diagnostics Plots — visual diagnostic outputs