Time Components
Purpose
DSAMbayes provides managed time-component generation through the time_components config section. When enabled, the runner deterministically generates holiday feature columns from a calendar file and optionally appends them to the model formula. This page defines the configuration contract, generation logic, naming conventions, and audit properties.
Overview
Time components in DSAMbayes cover:
- Holidays: deterministic weekly features (counts or binary indicators) derived from an external calendar file.
- Trend and seasonality: specified directly in the model formula (e.g. t_scaled, sin52_1, cos52_1). These are not generated by the time-components system; they are user-supplied columns in the data.
The time_components system is responsible only for holiday feature generation.
YAML configuration
Key definitions
| Key | Default | Description |
|---|---|---|
| enabled | false | Master toggle for the time-components system |
| holidays.enabled | false | Toggle for holiday feature generation |
| holidays.calendar_path | null | Path to the holiday calendar CSV (resolved relative to the config file) |
| holidays.date_col | null | Date column in the calendar; auto-detected from date, ds, or event_date |
| holidays.label_col | holiday | Column containing holiday event labels |
| holidays.date_format | null | Date parse format; null assumes ISO 8601 |
| holidays.week_start | monday | Day-of-week anchor for weekly aggregation |
| holidays.timezone | UTC | Timezone used when parsing POSIX date-time inputs |
| holidays.prefix | holiday_ | Prefix prepended to generated feature column names |
| holidays.window_before | 0 | Days before each event date to include in the holiday window |
| holidays.window_after | 0 | Days after each event date to include in the holiday window |
| holidays.aggregation_rule | count | Weekly aggregation: count sums event-days per week; any produces a binary indicator |
| holidays.overlap_policy | count_all | Overlap handling: count_all counts every event-day; dedupe_label_date deduplicates per label and date |
| holidays.add_to_formula | true | Whether generated holiday terms are appended to the model formula automatically |
| holidays.overwrite_existing | false | Whether existing columns with matching names are overwritten |
Calendar file contract
The holiday calendar is a CSV (or data frame) with at minimum:
| Column | Required | Content |
|---|---|---|
| Date column | Yes | Daily event dates (one row per event occurrence) |
| Label column | Yes | Human-readable event name (e.g. Christmas, Black Friday) |
Date column detection
If date_col is null, the system tries column names in order: date, ds, event_date. If none is found, validation aborts.
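The detection rule is small enough to sketch in Python. resolve_date_col is a hypothetical name, not a DSAMbayes function. The sketch also assumes that an explicitly configured date_col missing from the calendar is rejected, in line with the column-presence check described under the generation pipeline.

```python
# Hypothetical sketch of the date-column auto-detection rule; not the DSAMbayes API.
CANDIDATE_DATE_COLS = ("date", "ds", "event_date")


def resolve_date_col(columns, date_col=None):
    """Return the calendar's date column name, or raise if none is usable."""
    if date_col is not None:
        # Assumption: an explicit date_col that is absent is treated as an error.
        if date_col not in columns:
            raise ValueError(f"date_col {date_col!r} not found in calendar")
        return date_col
    # Try the candidate names in the documented order; the first match wins.
    for candidate in CANDIDATE_DATE_COLS:
        if candidate in columns:
            return candidate
    raise ValueError("no date column found; tried date, ds, event_date")
```

Because the candidates are tried in order, a calendar with both event_date and date columns resolves to date.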
Label normalisation
Holiday labels are normalised to lowercase, alphanumeric-plus-underscore form via normalise_holiday_label(). For example:
- Black Friday → black_friday
- New Year's Day → new_year_s_day
- Empty labels → unnamed
The generated feature column name is {prefix}{normalised_label}, e.g. holiday_black_friday.
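The following Python function approximates the normalisation and naming rule. normalise_label and feature_name are hypothetical stand-ins for normalise_holiday_label(). Three details are assumptions, not confirmed behaviour:

- each run of non-alphanumeric characters becomes a single underscore;
- leading and trailing underscores are trimmed;
- non-ASCII letters are not transliterated.

```python
import re


def normalise_label(label):
    """Approximation of normalise_holiday_label(): lowercase, non-alphanumerics to '_'."""
    # Lowercase, then replace each run of characters outside [a-z0-9] with one underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", str(label).strip().lower())
    # Assumption: underscores left at either end by punctuation are trimmed.
    slug = slug.strip("_")
    # Empty labels fall back to "unnamed", as documented.
    return slug or "unnamed"


def feature_name(label, prefix="holiday_"):
    """Build the generated column name as {prefix}{normalised_label}."""
    return f"{prefix}{normalise_label(label)}"
```

For example, feature_name("Black Friday") yields holiday_black_friday. The apostrophe in "New Year's Day" becomes its own underscore, which gives new_year_s_day.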
Generation pipeline
The runner calls build_weekly_holiday_features(), which performs the following steps:
- Parse and validate the calendar. validate_holiday_calendar() checks column presence, date parsing, and label completeness.
- Expand holiday windows. expand_holiday_windows() replicates each event row across the [event_date - window_before, event_date + window_after] range.
- Align to weekly index. Each expanded event-day is mapped to its containing week using week_floor_date() with the configured week_start.
- Aggregate per week. Events are counted per week per feature. Under aggregation_rule: any, counts are collapsed to binary (0/1). Under overlap_policy: dedupe_label_date, duplicate label-date pairs within a week are removed before counting.
- Join to model data. The generated feature matrix is left-joined to the model data by the date column. Weeks with no events receive zero.
- Append to formula. If add_to_formula: true, generated feature columns are appended as additive terms to the population formula.
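A compact Python sketch of steps 2 to 6 follows. It is an illustration, not the R implementation, and all function names are hypothetical. It assumes:

- labels are already normalised;
- weeks are represented as week-start dates;
- the formula is a plain string;
- dedupe_label_date is applied after window expansion, so overlapping windows of the same label are not double-counted.

```python
from collections import Counter
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def week_floor(d, week_start="monday"):
    # Most recent week_start day on or before d (date.weekday(): Monday == 0).
    offset = (d.weekday() - WEEKDAYS.index(week_start)) % 7
    return d - timedelta(days=offset)


def weekly_holiday_features(events, model_weeks, window_before=0, window_after=0,
                            week_start="monday", aggregation_rule="count",
                            overlap_policy="count_all", prefix="holiday_"):
    """events: list of (event_date, normalised_label); model_weeks: aligned week dates."""
    # Expand each event across [event_date - window_before, event_date + window_after].
    expanded = [
        (event_date + timedelta(days=k), label)
        for event_date, label in events
        for k in range(-window_before, window_after + 1)
    ]
    # dedupe_label_date: keep one row per (day, label) pair, preserving order.
    if overlap_policy == "dedupe_label_date":
        expanded = list(dict.fromkeys(expanded))
    # Map each event-day to its week and count per (week, feature).
    counts = Counter((week_floor(day, week_start), prefix + label) for day, label in expanded)
    features = sorted({prefix + label for _, label in events})
    # Left-join onto the model weeks; weeks with no events get 0.
    table = {}
    for week in model_weeks:
        row = {}
        for feat in features:
            n = counts.get((week, feat), 0)
            # aggregation_rule "any" collapses counts to a 0/1 indicator.
            row[feat] = min(n, 1) if aggregation_rule == "any" else n
        table[week] = row
    return table, features


def append_terms(formula, terms):
    # Add generated terms as additive terms on the right-hand side of the formula.
    return formula + "".join(f" + {t}" for t in terms)
```

In this sketch, a Christmas event on Wednesday 2024-12-25 with window_before=1 places two event-days in the week starting Monday 2024-12-23. That week gets a count of 2 under count and 1 under any.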
Weekly anchoring
All weekly alignment uses week_floor_date(), which computes the most recent occurrence of week_start on or before each date. The model data’s date column must contain week-start-aligned dates; normalise_weekly_index() validates this and aborts if dates are not aligned.
Supported week-start values
monday, tuesday, wednesday, thursday, friday, saturday, sunday.
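The flooring rule and the alignment check can be sketched in Python as follows. week_floor and check_weekly_index are hypothetical stand-ins for week_floor_date() and normalise_weekly_index(). The sketch covers only the validation behaviour described above, not any other normalisation the R function may perform.

```python
from datetime import date, timedelta

# Supported week_start values, in the order used by date.weekday() (Monday == 0).
WEEK_STARTS = ("monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday")


def week_floor(d, week_start="monday"):
    """Most recent occurrence of week_start on or before d."""
    if week_start not in WEEK_STARTS:
        raise ValueError(f"unsupported week_start: {week_start!r}")
    offset = (d.weekday() - WEEK_STARTS.index(week_start)) % 7
    return d - timedelta(days=offset)


def check_weekly_index(dates, week_start="monday"):
    """Raise if any model date is not already on a week_start boundary."""
    misaligned = [d for d in dates if week_floor(d, week_start) != d]
    if misaligned:
        raise ValueError(f"{len(misaligned)} date(s) not aligned to {week_start}: {misaligned[:3]}")
    return list(dates)
```

A date already on the boundary floors to itself, which is exactly the property the alignment check tests.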
Timezone handling
- Calendar dates are parsed using the configured timezone (default UTC).
- If the calendar contains POSIXt values, they are coerced to Date in the configured timezone.
- Character dates are parsed as ISO 8601 by default, or using date_format if specified.
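The timezone rules can be sketched in Python, where datetime plays the role of POSIXt and date the role of Date. to_calendar_date is a hypothetical helper with these assumptions:

- it takes a tzinfo object rather than a zone name, to stay independent of the tz database;
- naive date-times are treated as UTC;
- date_format uses Python strptime codes.

```python
from datetime import date, datetime, timedelta, timezone

# Example fixed-offset zone (UTC-5), used to avoid depending on the tz database.
UTC_MINUS_5 = timezone(timedelta(hours=-5))


def to_calendar_date(value, tz=timezone.utc, date_format=None):
    """Coerce one calendar value to a date (sketch of the documented rules)."""
    # Check datetime before date: datetime is a subclass of date.
    if isinstance(value, datetime):
        # Assumption: naive date-times are treated as UTC before conversion.
        if value.tzinfo is None:
            value = value.replace(tzinfo=timezone.utc)
        # Convert to the configured zone, then drop the time of day.
        return value.astimezone(tz).date()
    if isinstance(value, date):
        return value
    # Character input: ISO 8601 by default, otherwise the configured strptime format.
    if date_format is None:
        return date.fromisoformat(value)
    return datetime.strptime(value, date_format).date()
```

The timezone matters near midnight. An event stamped 2024-12-25 03:00 UTC falls on 2024-12-25 under UTC but on 2024-12-24 under UTC-5. That shift can move it into a different week.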
Generated-term audit contract
Generated holiday terms are tracked for downstream diagnostics and reporting:
- The list of generated term names is stored in model$.runner_time_components$generated_terms.
- The identifiability gate in R/diagnostics_report.R uses this list to auto-detect baseline terms (via detect_baseline_terms()), so generated holiday terms are included in baseline-media correlation checks without requiring explicit configuration.
Feature naming collision
If two different holiday labels normalise to the same feature name, build_weekly_holiday_features() aborts with a collision error. Ensure calendar labels are distinct after normalisation.
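The collision check can be sketched in Python. check_feature_collisions is hypothetical, and its normalisation rule only approximates normalise_holiday_label() (lowercase, non-alphanumeric runs to "_"). The key design point is that the same raw label repeated across years is not a collision. Only distinct raw labels that map to one column name are.

```python
import re
from collections import defaultdict


def normalise(label):
    # Approximation of normalise_holiday_label(): lowercase, non-alphanumerics to "_".
    slug = re.sub(r"[^a-z0-9]+", "_", str(label).lower()).strip("_")
    return slug or "unnamed"


def check_feature_collisions(labels, prefix="holiday_"):
    """Raise if two distinct raw labels map to the same generated column name."""
    sources = defaultdict(set)
    for label in labels:
        # A set per column name, so repeated occurrences of one label are harmless.
        sources[prefix + normalise(label)].add(label)
    clashes = {name: sorted(raw) for name, raw in sources.items() if len(raw) > 1}
    if clashes:
        raise ValueError(f"holiday feature name collision: {clashes}")
    return sorted(sources)
```

For example, "Black Friday" and "black-friday" both normalise to black_friday, so a calendar containing both labels would be rejected.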
Interaction with existing data columns
- If overwrite_existing: false (default), the runner aborts if any generated column name already exists in the data.
- If overwrite_existing: true, existing columns with matching names are replaced by the generated features.
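The two branches can be expressed as a short Python function. merge_generated_columns is hypothetical, and tables are modelled as dicts mapping column name to values. The input data is copied rather than mutated, which is an assumption of the sketch.

```python
def merge_generated_columns(data_columns, generated, overwrite_existing=False):
    """data_columns / generated: dicts mapping column name -> list of values."""
    clashes = sorted(set(generated) & set(data_columns))
    # Default behaviour: abort on any name clash.
    if clashes and not overwrite_existing:
        raise ValueError(f"generated columns already exist in data: {clashes}")
    # Copy, then let generated columns replace any clashing ones.
    merged = dict(data_columns)
    merged.update(generated)
    return merged
```

With the default setting, a model data set that already contains a hand-built holiday_christmas column causes an error instead of being silently replaced.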
Practical guidance
- Start with aggregation_rule: count to capture multi-day holiday effects (e.g. a holiday spanning two days in one week produces a count of 2).
- Use window_before and window_after for events with known anticipation or lingering effects (e.g. window_before: 7 for pre-Christmas shopping).
- Use aggregation_rule: any when you want binary holiday indicators regardless of how many event-days fall in a week.
- Check generated terms in the resolved config (config.resolved.yaml) and the posterior summary to confirm which holidays entered the model.
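The count-versus-any choice is easier to judge with concrete numbers. The Python calculation below handles a single event with a Monday week start. event_days_per_week is a hypothetical helper, and Christmas 2024 (a Wednesday) is used only as an example date.

```python
from collections import Counter
from datetime import date, timedelta


def event_days_per_week(event_date, window_before=0, window_after=0):
    """Count window days per Monday-start week for one event (illustrative)."""
    days = [event_date + timedelta(days=k) for k in range(-window_before, window_after + 1)]
    # date.weekday() is 0 for Monday, so subtracting it floors to the week's Monday.
    return Counter(d - timedelta(days=d.weekday()) for d in days)


christmas = date(2024, 12, 25)
# window_before: 7 spreads 8 event-days over two weeks: 5 in the week of Dec 16, 3 in the week of Dec 23.
anticipation = event_days_per_week(christmas, window_before=7)
# Under aggregation_rule: any, both weeks would simply show 1.
as_any = {week: min(n, 1) for week, n in anticipation.items()}
```

Under count, the model sees a stronger signal in the pre-Christmas week (5) than in Christmas week itself (3). Under any, the two weeks are indistinguishable. A window_after of 1 instead keeps both days in the week of 2024-12-23, giving the count of 2 mentioned above.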
Cross-references
- Config Schema: time_components.* YAML keys
- Diagnostics Gates: identifiability gate uses generated terms
- Priors and Boundaries: priors for generated holiday terms follow the default schema
- Output Artefacts: resolved config records generated terms