How-To Guides

Purpose

Provide task-oriented recipes for common DSAMbayes operational workflows. Each guide starts from a user objective, gives minimal reproducible steps, and includes expected output artefacts and quick verification checks.

Audience

  • Users who know the concepts but need execution steps.
  • Engineers debugging run and artefact issues.

Pages

Guide | Objective
Run from YAML | Execute a complete runner workflow and verify staged outputs
Interpret Diagnostics | Read and act on diagnostics gate results
Compare Runs | Compare multiple runs and select a candidate model
Debug Run Failures | Diagnose and resolve common runner failure modes

Subsections of How-To Guides

Run from YAML

Objective

Execute a complete DSAMbayes model run from a YAML configuration file and verify the staged output artefacts.

Prerequisites

  • DSAMbayes installed locally (see Install and Setup).
  • A YAML config file (see Config Schema for structure).
  • Data file(s) referenced by the config are accessible.

Steps

1. Set up the environment

mkdir -p .Rlib .cache
export R_LIBS_USER="$PWD/.Rlib"
export XDG_CACHE_HOME="$PWD/.cache"

2. Validate the configuration (dry run)

Rscript scripts/dsambayes.R validate --config config/my_config.yaml

Expected outcome:

  • Exit code 0.
  • A timestamped run directory under results/ containing 00_run_metadata/config.resolved.yaml and 00_run_metadata/session_info.txt.
  • No Stan compilation or sampling occurs.

If validation fails:

  • Check the error message for missing data paths, invalid YAML keys, or formula errors.
  • Fix the config and re-run validate before proceeding.
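
The validate-then-run pattern can be scripted so that a failed validation never triggers a full run. A minimal sketch relying only on the documented exit codes (the wrapper name is ours, not part of the CLI):

```shell
# Hypothetical wrapper: start a full run only if validation succeeds.
# Relies on the documented exit codes (0 = success, non-zero = failure).
validate_then_run() {
  cfg="$1"
  if Rscript scripts/dsambayes.R validate --config "$cfg"; then
    Rscript scripts/dsambayes.R run --config "$cfg"
  else
    echo "validation failed for $cfg; fix the config before running" >&2
    return 1
  fi
}
```

Invoke it as `validate_then_run config/my_config.yaml`; the run step is skipped whenever validate exits non-zero.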

3. Run the model

Rscript scripts/dsambayes.R run --config config/my_config.yaml

Expected outcome:

  • Exit code 0.
  • Full staged artefact tree under the run directory.

4. Locate the run directory

The runner prints the run directory path during execution. It follows the pattern:

results/YYYYMMDD_HHMMSS_<run_label>/
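
Because the directory name begins with a sortable timestamp, the most recent run can be picked lexically. A small sketch, assuming runs live under results/:

```shell
# The YYYYMMDD_HHMMSS prefix sorts lexically in chronological order,
# so the last sorted entry is the most recent run directory.
latest_run=$(ls -d results/*/ 2>/dev/null | sort | tail -1)
echo "latest run: $latest_run"
```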

5. Verify artefacts

Check that the following stage folders are populated:

Stage | Folder | Key files
Metadata | 00_run_metadata/ | config.resolved.yaml, session_info.txt
Pre-run | 10_pre_run/ | Media spend plots, VIF bar chart
Model fit | 20_model_fit/ | model.rds, fit plots
Post-run | 30_post_run/ | posterior_summary.csv, decomposition plots
Diagnostics | 40_diagnostics/ | diagnostics_report.csv, diagnostic plots
Model selection | 50_model_selection/ | LOO summary, Pareto-k plot (if MCMC)
Optimisation | 60_optimisation/ | Allocation summary, response curves (if enabled)

6. Quick verification commands

# Check diagnostics overall status
head -1 results/<run_dir>/40_diagnostics/diagnostics_report.csv

# View posterior summary
head results/<run_dir>/30_post_run/posterior_summary.csv

# Count artefact files
find results/<run_dir> -type f | wc -l
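
These spot checks can be rolled into one loop that flags missing or empty stage folders. A sketch using the stage names from the table above (the helper name is ours):

```shell
# Report any expected stage folder that is missing or contains no files.
# Returns non-zero if anything is absent, so it composes in scripts.
check_stages() {
  run_dir="$1"; missing=0
  for stage in 00_run_metadata 10_pre_run 20_model_fit 30_post_run 40_diagnostics; do
    if [ -z "$(find "$run_dir/$stage" -type f 2>/dev/null | head -1)" ]; then
      echo "missing or empty: $stage"
      missing=1
    fi
  done
  return $missing
}
```

50_model_selection/ and 60_optimisation/ are conditional (MCMC fit, allocation enabled), so they are deliberately left out of the mandatory list.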

Failure handling

Symptom | Likely cause | Action
Exit code 1 during validate | Config or data error | Read error message; fix config
Exit code 1 during run | Stan compilation or sampling failure | Check Stan cache; increase iterations or warmup
Missing 20_model_fit/model.rds | Fit did not complete | Review runner log for Stan errors
Missing 40_diagnostics/ | Diagnostics writer failed | Check for upstream fit failures; review tryCatch messages

Interpret Diagnostics

Objective

Read and act on the diagnostics report produced by a DSAMbayes runner execution, understanding which checks matter most and what remediation steps to take.

Prerequisites

  • A completed runner execution with artefacts under 40_diagnostics/.
  • Familiarity with Diagnostics Gates definitions.

Steps

1. Open the diagnostics report

cat results/<run_dir>/40_diagnostics/diagnostics_report.csv

Each row is one diagnostic check. The key columns are:

Column | What to look at
check_id | Identifies the specific diagnostic
status | pass, warn, fail, or skipped
value | The observed metric value
threshold | The threshold that was applied
message | Human-readable explanation

2. Check the overall status

The overall status follows a simple rule:

  • Any fail → overall fail.
  • Any warn (no fails) → overall warn.
  • All pass → overall pass.

If the overall status is pass, no further action is required for the configured policy mode.
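
This rule can be applied directly to the report CSV. A sketch, assuming a comma-separated file with status in the second column as in the table above (the sample rows are fabricated for illustration):

```shell
# Illustrative sample report mirroring the column layout above.
cat > /tmp/diagnostics_report_sample.csv <<'EOF'
check_id,status,value,threshold,message
sampler_rhat_max,pass,1.002,1.01,ok
resid_ljung_box_p,warn,0.03,0.05,residual autocorrelation suspected
EOF

# Derive the overall status: any fail > any warn > pass.
awk -F, 'NR > 1 { if ($2 == "fail") f = 1; else if ($2 == "warn") w = 1 }
         END { print f ? "fail" : (w ? "warn" : "pass") }' \
  /tmp/diagnostics_report_sample.csv
# prints "warn" for this sample
```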

3. Triage failing checks

Focus on fail rows first, then warn rows. Use the check phase to prioritise:

Phase | Priority | Meaning
P0 | Highest | Data integrity issues; fix before interpreting model results
P1 | High | Sampler quality or residual issues; may affect inference reliability

4. Common diagnostics and actions

Design matrix issues (P0)

Check | Symptom | Action
pre_response_finite fails | Non-finite values in response | Clean data; remove or impute NA/Inf rows
pre_design_constants_duplicates fails | Constant or duplicate columns | Remove redundant terms from formula
pre_design_condition_number warns/fails | High collinearity | Reduce correlated predictors; simplify formula

Sampler quality (P1, MCMC only)

Check | Symptom | Action
sampler_rhat_max warns/fails | Poor convergence | Increase fit.mcmc.iter and fit.mcmc.warmup; simplify model
sampler_ess_bulk_min or sampler_ess_tail_min warns/fails | Insufficient effective samples | Increase iterations; check for multimodality
sampler_divergences fails | Divergent transitions | Increase fit.mcmc.adapt_delta (e.g. 0.95 → 0.99); consider reparameterisation
sampler_treedepth_frac warns/fails | Max treedepth saturation | Increase fit.mcmc.max_treedepth
sampler_ebfmi_min warns/fails | Low energy diagnostic | Indicates difficult posterior geometry; simplify model or increase warmup
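
Several of these remediations are plain config changes. A fragment sketch combining the keys named above (the values are illustrative starting points, not recommendations):

```yaml
fit:
  mcmc:
    iter: 4000          # more post-warmup draws to raise ESS
    warmup: 2000
    adapt_delta: 0.99   # raise from e.g. 0.95 to reduce divergences
    max_treedepth: 12   # lift the cap if treedepth saturation is reported
```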

Residual behaviour (P1)

Check | Symptom | Action
resid_ljung_box_p warns/fails | Significant residual autocorrelation | Add time controls (trend, seasonality, holidays)
resid_acf_max warns/fails | High residual ACF at early lags | Same as above; check for missing structural components

Boundary and variation checks (P1)

Check | Symptom | Action
boundary_hit_fraction warns/fails | Posterior draws hitting parameter bounds | Review boundary specification; widen constraints or remove unnecessary bounds
within_var_ratio warns/fails | Low within-group variation (hierarchical) | Check group structure; some groups may have insufficient temporal variation

Identifiability gate (P1)

Check | Symptom | Action
pre_identifiability_baseline_media_corr warns/fails | High baseline-media correlation | Add controls to separate baseline from media effects; review formula specification

5. Review diagnostic plots

Cross-reference the numeric report with visual diagnostics in 40_diagnostics/:

  • Residual diagnostics plot — check for patterns in residuals over time.
  • Boundary hits plot — identify which parameters are constrained.
  • Latent residual ACF plot — confirm autocorrelation structure.

See Diagnostics Plots for interpretation guidance.

6. Decide on next steps

Overall status | Action
pass | Proceed to post-run analysis and reporting
warn | Review warnings; proceed if acceptable for the use case
fail | Remediate failing checks before using model results for decisions

7. Change policy mode if appropriate

If you are in early model development, consider switching to explore mode to relax thresholds:

diagnostics:
  policy_mode: explore

For production or audit runs, use publish (default) or strict.

Compare Runs

Objective

Compare multiple DSAMbayes runner executions and select a candidate model for reporting or decision-making, using predictive scoring and diagnostic summaries.

Prerequisites

  • Two or more completed runner executions (MCMC fit method).
  • Artefacts under 50_model_selection/ for each run (LOO summary, ELPD outputs).
  • Familiarity with Diagnostics Gates and Model Selection Plots.

Steps

1. Collect run directories

Identify the run directories to compare:

results/20260228_083808_blm_synth_kpi_os_hfb01/
results/20260228_084410_blm_synth_kpi_os_hfb01/
results/20260228_084602_blm_synth_kpi_os_hfb01/

2. Compare ELPD scores

The compare_runs() helper ranks runs by expected log predictive density (ELPD):

library(DSAMbayes)
comparison <- compare_runs(
  run_dirs = c(
    "results/20260228_083808_blm_synth_kpi_os_hfb01",
    "results/20260228_084410_blm_synth_kpi_os_hfb01"
  )
)
print(comparison)

The output ranks runs by ELPD (higher is better) and reports Pareto-k diagnostics.

3. Check Pareto-k reliability

Examine the loo_summary.csv in each run’s 50_model_selection/ folder:

cat results/<run_dir>/50_model_selection/loo_summary.csv

Key metrics:

Metric | Interpretation
elpd_loo | Expected log predictive density; higher is better
p_loo | Effective number of parameters
looic | LOO information criterion; lower is better
Pareto-k counts | Observations with k > 0.7 indicate unreliable LOO estimates

If many observations have high Pareto-k values, the LOO approximation is unreliable for that run. Consider time-series cross-validation as an alternative.
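
The compare_runs() helper does this ranking in R; when a quick shell-side comparison is enough, elpd_loo can be pulled from each loo_summary.csv directly. A sketch, with a fabricated column layout that mirrors the metrics listed above (it is a guess at the file's exact schema):

```shell
# Two illustrative run directories (fabricated numbers, demonstration only).
mkdir -p /tmp/demo_results/runA/50_model_selection /tmp/demo_results/runB/50_model_selection
printf 'elpd_loo,p_loo,looic\n-812.4,14.2,1624.8\n' > /tmp/demo_results/runA/50_model_selection/loo_summary.csv
printf 'elpd_loo,p_loo,looic\n-798.1,16.8,1596.2\n' > /tmp/demo_results/runB/50_model_selection/loo_summary.csv

# Rank runs by elpd_loo (higher is better), highest first.
for d in /tmp/demo_results/*/; do
  elpd=$(awk -F, 'NR==1 { for (i=1;i<=NF;i++) if ($i=="elpd_loo") c=i }
                  NR==2 { print $c }' "$d/50_model_selection/loo_summary.csv")
  echo "$elpd $d"
done | sort -rn
# runB (-798.1) ranks above runA (-812.4)
```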

4. Review time-series CV (if available)

If diagnostics.time_series_selection.enabled: true was configured, check:

cat results/<run_dir>/50_model_selection/tscv_summary.csv

This provides expanding-window blocked CV scores (holdout ELPD, RMSE, SMAPE) that are more appropriate for time-series data than standard LOO.

5. Cross-reference diagnostics

For each candidate run, check the diagnostics overall status:

head -1 results/<run_dir>/40_diagnostics/diagnostics_report.csv

A model with better ELPD but failing diagnostics should not be preferred over a model with slightly lower ELPD and passing diagnostics.

6. Compare fit quality visually

Review the fit time series and scatter plots in 20_model_fit/ for each run:

  • Fit time series — does the model track the observed KPI?
  • Fit scatter — is the predicted-vs-observed relationship close to the diagonal?
  • Posterior forest — are coefficient estimates reasonable and well-identified?

7. Selection decision matrix

Criterion | Weight | Run A | Run B
ELPD (higher is better) | High | value | value
Pareto-k reliability (fewer high-k) | High | value | value
Diagnostics overall status | High | pass/warn/fail | pass/warn/fail
TSCV holdout RMSE (if available) | Medium | value | value
Coefficient plausibility | Medium | judgement | judgement
Fit visual quality | Low | judgement | judgement

8. Record the selection

Document the selected run directory and rationale. If using the runner for release evidence, the selected run’s artefacts form part of the evidence pack.

Caveats

  • ELPD is not causal validation. Predictive scoring estimates how well the model predicts the observed data, not whether the model identifies causal media effects correctly.
  • Pooled models do not support time-series CV (rejected by config validation).
  • MAP-fitted models do not produce LOO diagnostics. Use MCMC for model comparison.

Debug Run Failures

Objective

Diagnose and resolve the most common failure modes encountered when running DSAMbayes via the YAML/CLI runner.

Prerequisites

  • A failed runner execution (non-zero exit code or missing artefacts).
  • Access to the terminal output or log from the failed run.
  • Familiarity with CLI Usage and Config Schema.

Triage by failure stage

Stage 0: Config resolution failures

Symptoms: runner exits immediately after validate or at the start of run; no run directory created or only 00_run_metadata/ is present.

Error pattern | Cause | Fix
data_path not found | Data file path is wrong or missing | Check data.path in YAML; use absolute path or path relative to config file
Unknown YAML key | Typo or unsupported config key | Compare against Config Schema; fix spelling
Formula parse error | Invalid R formula syntax | Check model.formula for unmatched parentheses, missing ~, or invalid operators
holidays.calendar_path not found | Holiday calendar file missing | Check time_components.holidays.calendar_path; ensure file exists

Quick check:

Rscript scripts/dsambayes.R validate --config config/my_config.yaml

If validate passes, the config is structurally valid.

Stage 1: Stan compilation failures

Symptoms: runner reports compilation errors after “Compiling model”; may reference C++ or Stan syntax errors.

Error pattern | Cause | Fix
Stan syntax error in generated template | Template rendering issue | Clear the Stan cache (rm -rf .cache/dsambayes/) and retry
C++ compiler not found | Toolchain not installed | Install a C++ toolchain (see Install and Setup)
Permission denied on cache directory | Cache path not writable | Set XDG_CACHE_HOME to a writable directory

Quick check:

mkdir -p .cache
export XDG_CACHE_HOME="$PWD/.cache"
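
The writability of the cache location can be confirmed before retrying a compile. A sketch, assuming the dsambayes cache lives under the path used by the `rm -rf .cache/dsambayes/` command elsewhere in this guide:

```shell
# Confirm the Stan cache location is writable before retrying a compile.
cache_dir="${XDG_CACHE_HOME:-$HOME/.cache}/dsambayes"
if mkdir -p "$cache_dir" 2>/dev/null && touch "$cache_dir/.write_test" 2>/dev/null; then
  rm -f "$cache_dir/.write_test"
  echo "cache writable: $cache_dir"
else
  echo "cache NOT writable: $cache_dir" >&2
fi
```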

Stage 2: Data preparation failures

Symptoms: runner fails after compilation but before sampling; error messages reference prep_data_for_fit, model.frame, or scaling.

Error pattern | Cause | Fix
“Cannot scale model data with zero variance” | A column in the model frame is constant | Remove constant terms from formula, or set model.scale: false
“Constant CRE mean terms” | CRE variable has identical group means | Use model.type: re (without CRE) or add variation
“non-finite values” in model frame | NA or Inf values in data | Clean data before running; remove rows with missing values
“Offset vector length does not match” | NA handling created length mismatch | Ensure offset column has no NA values, or report as a bug

Stage 3: Sampling failures

Symptoms: runner fails during rstan::sampling() or rstan::optimizing(); may report Stan runtime errors.

Error pattern | Cause | Fix
“Exception: validate transformed params” | Parameter hits boundary during sampling | Widen boundaries; check for overly tight constraints
“Initialization failed” | Poor initial values | Increase fit.mcmc.init range or simplify model
Timeout or very slow sampling | Model too complex for data size | Reduce iterations for initial testing; simplify formula
All chains fail | Severe model misspecification | Review formula, priors, and data for fundamental issues

Stage 4: Post-fit artefact failures

Symptoms: runner completes sampling but some artefact folders are empty or missing files.

Error pattern | Cause | Fix
Missing 30_post_run/ files | Decomposition failed | Check formula compatibility with model.matrix(); hierarchical formulas with `|` terms may not be supported by model.matrix()
Missing 40_diagnostics/ files | Diagnostics writer error | Check for upstream issues in model object; review tryCatch messages in log
Missing 50_model_selection/ files | LOO computation failed | Ensure MCMC fit (not MAP); check for valid posterior
Missing 60_optimisation/ files | Allocation not enabled or failed | Check allocation.enabled: true in config; review scenario specification

Quick check:

find results/<run_dir> -type f | sort

Compare against the expected artefact list in Output Artefacts.

Stage 5: Plot generation failures

Symptoms: CSV artefacts are present but PNG plot files are missing.

Error pattern | Cause | Fix
“cannot open connection” for PNG | Graphics device issue | Check that grDevices is available; ensure sufficient disk space
Plot function error for hierarchical model | Group-level coefficient draws are vectors, not scalars | This was fixed in v1.2.0; ensure you are running the latest version

General debugging steps

  1. Read the full error message. DSAMbayes uses cli::cli_abort() with descriptive messages that identify the failing function and parameter.

  2. Check the resolved config. If a run directory was created, inspect 00_run_metadata/config.resolved.yaml to see what defaults were applied.

  3. Check session info. Inspect 00_run_metadata/session_info.txt for package version mismatches.

  4. Clear the Stan cache. Stale compiled models can cause unexpected failures:

    rm -rf .cache/dsambayes/

  5. Run validate before run. Always validate first to catch config errors before committing to a full MCMC run.

  6. Reduce iterations for debugging. Use a minimal config with fit.mcmc.iter: 200 and fit.mcmc.warmup: 100 to iterate quickly on formula and data issues.
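
The fast-iteration settings above can live in a stripped-down copy of the config. A sketch, using only keys that appear elsewhere in this guide (the data path is a placeholder):

```yaml
# Minimal debugging config: short chains, relaxed gates. Not for inference.
data:
  path: data/my_data.csv      # placeholder; point at your real data file
fit:
  mcmc:
    iter: 200                 # just enough to surface formula/data errors
    warmup: 100
diagnostics:
  policy_mode: explore        # relax thresholds during development
```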