Pre-run Plots

Purpose

Pre-run plots are generated before the model is fitted. They visualise the input data and flag structural problems — multicollinearity, missing spend periods, implausible KPI–media relationships — that could compromise inference. Treat these as a data quality gate: review them before interpreting any downstream output.

All pre-run plots are written to 10_pre_run/ within the run directory. They require ggplot2 and are generated by write_pre_run_plots() in R/run_artifacts_enrichment.R. The runner produces them whenever an allocation.channels block is present in the configuration and the data contains the referenced spend columns.

Plot catalogue

| Filename | What it shows | Conditions |
| --- | --- | --- |
| media_spend_timeseries.png | Stacked channel spend over time | Allocation channels defined with valid spend_col |
| kpi_media_overlay.png | KPI and total spend on dual axes | Allocation channels defined; response variable present |
| vif_bar.png | VIF per predictor with severity thresholds | Design matrix extractable with >1 predictor and >1 row |

Media spend time series

Filename: media_spend_timeseries.png


What it shows

A stacked area chart of weekly media spend by channel, drawn from the raw spend_col columns declared in the allocation configuration. The x-axis is the date variable; the y-axis is spend in model units.

When it is generated

The runner generates this plot when:

  • The configuration includes an allocation.channels block.
  • At least one declared spend_col exists in the input data.

If no valid spend columns are found, the plot is silently skipped.

How to interpret it

Look for three things. First, check that each channel has plausible seasonal patterns and no unexpected gaps — zero-spend weeks in the middle of a campaign period suggest data ingestion problems. Second, verify that the relative magnitudes make sense: if TV dominates the stack but the brand has historically been digital-first, the data may be mislabelled or aggregated incorrectly. Third, confirm that the date range matches the modelling window declared in the configuration.

Warning signs

  • Flat channels: A channel with constant spend across all weeks contributes no variation and cannot be identified by the model. The coefficient will be driven entirely by the prior.
  • Sudden jumps or drops: Step changes in spend that do not correspond to known campaign events may indicate data joins across sources with different reporting conventions.
  • Missing periods: Gaps where spend drops to zero mid-series can distort adstock calculations if the model applies geometric decay.
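The flat-channel and missing-period checks above can be automated before plotting. A minimal sketch in Python (the actual pipeline is R; the function name and input shape here are hypothetical, assuming each channel's weekly spend is available as a list of numbers):

```python
def spend_warnings(spend_by_channel):
    """Flag zero-variance channels and mid-series zero-spend gaps."""
    warnings = []
    for channel, spend in spend_by_channel.items():
        if len(set(spend)) <= 1:
            # Constant spend: no variation for the model to identify
            warnings.append((channel, "flat"))
            continue
        # Trim leading/trailing zeros, then look for zero weeks
        # inside the active spend window
        active = [i for i, s in enumerate(spend) if s > 0]
        if active:
            interior = spend[active[0]:active[-1] + 1]
            if any(s == 0 for s in interior):
                warnings.append((channel, "gap"))
    return warnings
```

For example, a channel spending `[0, 3, 0, 4, 0]` is flagged as having a gap (the zero at week 3 sits inside the active window), while leading and trailing zero weeks alone are not flagged.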

Action

If a channel shows no variation, consider removing it from the formula or fixing the upstream data. If gaps are genuine (e.g. a seasonal channel), confirm the adstock specification handles zero-spend periods correctly.
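Why zero-spend gaps matter for adstock can be seen from the common recursive geometric form, adstock[t] = spend[t] + decay × adstock[t−1]. A sketch (assuming this form; the actual adstock specification in the codebase may differ):

```python
def geometric_adstock(spend, decay=0.5):
    """Recursive geometric adstock: carryover decays by `decay` each week."""
    out, carry = [], 0.0
    for s in spend:
        carry = s + decay * carry
        out.append(carry)
    return out

# A mid-series gap does not reset carryover: the zero-spend weeks
# still receive decayed effect from earlier spend.
with_gap = geometric_adstock([100, 0, 0, 100], decay=0.5)
# with_gap = [100.0, 50.0, 25.0, 112.5]
```

If the gap reflects a data ingestion error rather than genuinely zero spend, the model attributes carryover effect to weeks where real exposure is unknown, biasing the decay estimate.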

  • data_dictionary.csv in 10_pre_run/ provides summary statistics for every input column.

KPI–media overlay

Filename: kpi_media_overlay.png


What it shows

A dual-axis time series with the KPI response variable on the left axis (blue) and total media spend (sum of all declared spend_col values) on the right axis (red, rescaled to share the vertical space). This is a visual correlation check, not a causal claim.
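The right-axis rescaling can be sketched as a linear map of total spend onto the KPI's observed range. A minimal illustration, assuming simple min–max rescaling (the plot code may use a different transform):

```python
def rescale_to(series, target):
    """Linearly map `series` onto the min-max range of `target`."""
    s_min, s_max = min(series), max(series)
    t_min, t_max = min(target), max(target)
    if s_max == s_min:
        # Zero variance: nothing to rescale (the plot is skipped)
        return None
    scale = (t_max - t_min) / (s_max - s_min)
    return [t_min + (s - s_min) * scale for s in series]
```

Note that this map forces the two series to share their extremes, which is exactly why the overlay can exaggerate or suppress visual correlation: the apparent fit depends on the range chosen, not on any statistical relationship.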

When it is generated

The runner generates this plot when:

  • The configuration includes an allocation.channels block with at least one valid spend_col.
  • The response variable exists in the data.

If the total spend has zero variance, the plot is skipped.

How to interpret it

The overlay reveals whether KPI and aggregate spend move together over time. A rough co-movement is expected in MMM data — media drives response — but the relationship need not be tight. Seasonal KPI peaks that precede or lag media bursts suggest confounding (e.g. demand-driven spend timing). Divergences where spend rises but KPI falls (or vice versa) are worth investigating: they may reflect diminishing returns, competitor activity, or a structural break in the data.

Warning signs

  • Perfect alignment: If the two series track each other almost exactly, the model may be fitting spend timing rather than incremental media effects.
  • Opposite trends: A persistent negative relationship between total spend and KPI suggests reverse causality or omitted-variable bias.
  • Scale artefacts: The dual-axis rescaling can exaggerate or suppress visual correlation. Do not draw quantitative conclusions from this plot.

Action

Use this plot as a sanity check only. If the relationship looks implausible, investigate the data and consider whether the formula includes adequate controls for seasonality, trend, and external factors.


Variance inflation factor (VIF) bar chart

Filename: vif_bar.png


What it shows

A horizontal bar chart of variance inflation factors for each predictor in the model’s design matrix. Bars are colour-coded by severity: green (VIF < 5), amber (5 ≤ VIF < 10), and red (VIF ≥ 10). Dashed vertical lines mark the 5 and 10 thresholds.

When it is generated

The runner generates this plot when:

  • The design matrix has more than one predictor column and more than one row.
  • The VIF computation does not encounter a singular or degenerate correlation matrix.

For pooled models, the design matrix extraction may return zero rows, in which case the plot is skipped.

How to interpret it

VIF measures how much the variance of a coefficient estimate inflates due to correlation with other predictors. A VIF of 1 means no multicollinearity; a VIF of 10 means the standard error is roughly three times larger than it would be with orthogonal predictors. In Bayesian MMM, high VIF does not break inference the way it does in OLS — priors regularise the estimates — but it does reduce the data’s ability to inform the posterior, making results more prior-dependent.
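The relationship is VIF_j = 1 / (1 − R²_j), where R²_j comes from regressing predictor j on the other predictors, and the standard-error inflation is √VIF (hence √10 ≈ 3.2 for a VIF of 10). A minimal sketch for the two-predictor case, where both VIFs reduce to 1 / (1 − r²) with r the pairwise correlation (in R this computation is typically delegated to car::vif()):

```python
def vif_two_predictors(x1, x2):
    """VIF for a two-predictor design: 1 / (1 - r^2), r = corr(x1, x2)."""
    n = len(x1)
    mean1, mean2 = sum(x1) / n, sum(x2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(x1, x2))
    var1 = sum((a - mean1) ** 2 for a in x1)
    var2 = sum((b - mean2) ** 2 for b in x2)
    r2 = cov * cov / (var1 * var2)  # squared correlation
    return 1.0 / (1.0 - r2)
```

Orthogonal predictors give a VIF of exactly 1; as r² approaches 1 the VIF diverges, which is why a singular correlation matrix causes the plot to be skipped.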

Warning signs

  • VIF > 10 on media channels: The model cannot reliably separate the effects of those channels. Posterior estimates will lean heavily on the prior. Consider whether the channels can be combined or whether one should be dropped.
  • VIF > 10 on seasonality terms: Common and usually harmless if the terms are included as controls rather than as interpretive outputs.
  • All terms moderate or high: The overall collinearity structure may be too severe for the data length. Consider increasing the sample size or simplifying the formula.

Action

Review the top-VIF terms. If two media channels are highly collinear (e.g. search and affiliate), consider whether they can be meaningfully separated given the available data. If not, combine them or use informative priors to anchor the split.

  • design_matrix_manifest.csv in 10_pre_run/ lists all design matrix columns with variance and uniqueness statistics.
  • spec_summary.csv in 10_pre_run/ summarises the model specification.

Cross-references