Time Components

Purpose

DSAMbayes provides managed time-component generation through the time_components config section. When enabled, the runner deterministically generates holiday feature columns from a calendar file and optionally appends them to the model formula. This page defines the configuration contract, generation logic, naming conventions, and audit properties.

Overview

Time components in DSAMbayes cover:

  • Holidays — deterministic weekly indicator features derived from an external calendar file.
  • Trend and seasonality — specified directly in the model formula (e.g. t_scaled, sin52_1, cos52_1). These are not generated by the time-components system; they are user-supplied columns in the data.

The time_components system is responsible only for holiday feature generation.

YAML configuration

time_components:
  enabled: true
  holidays:
    enabled: true
    calendar_path: data/holidays.csv
    date_col: null          # auto-detected: date, ds, or event_date
    label_col: holiday
    date_format: null       # null = ISO 8601; or e.g. "%d/%m/%Y"
    week_start: monday
    timezone: UTC
    prefix: holiday_
    window_before: 0
    window_after: 0
    aggregation_rule: count # count | any
    overlap_policy: count_all # count_all | dedupe_label_date
    add_to_formula: true
    overwrite_existing: false

Key definitions

Key | Default | Description
enabled | false | Master toggle for the time-components system
holidays.enabled | false | Toggle for holiday feature generation
holidays.calendar_path | null | Path to the holiday calendar CSV (resolved relative to the config file)
holidays.date_col | null | Date column in the calendar; auto-detected from date, ds, or event_date
holidays.label_col | holiday | Column containing holiday event labels
holidays.date_format | null | Date parse format; null assumes ISO 8601
holidays.week_start | monday | Day-of-week anchor for weekly aggregation
holidays.timezone | UTC | Timezone used when parsing POSIX date-time inputs
holidays.prefix | holiday_ | Prefix prepended to generated feature column names
holidays.window_before | 0 | Days before each event date to include in the holiday window
holidays.window_after | 0 | Days after each event date to include in the holiday window
holidays.aggregation_rule | count | Weekly aggregation: count sums event-days per week; any produces a binary indicator
holidays.overlap_policy | count_all | Overlap handling: count_all counts every event-day; dedupe_label_date deduplicates per label and date
holidays.add_to_formula | true | Whether generated holiday terms are appended to the model formula automatically
holidays.overwrite_existing | false | Whether existing columns with matching names are overwritten

Calendar file contract

The holiday calendar is a CSV (or data frame) with at minimum:

Column | Required | Content
Date column | Yes | Daily event dates (one row per event occurrence)
Label column | Yes | Human-readable event name (e.g. Christmas, Black Friday)

Date column detection

If date_col is null, the system tries column names in order: date, ds, event_date. If none is found, validation aborts.
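
The detection order above can be sketched in base R. This is a minimal illustration, not the DSAMbayes implementation: detect_date_col() is a hypothetical helper, and the behaviour when an explicit date_col is missing from the calendar is an assumption.

```r
# Hypothetical helper (not part of DSAMbayes): choose the calendar's date column.
detect_date_col <- function(calendar, date_col = NULL) {
  if (!is.null(date_col)) {
    # Assumption: an explicitly configured column must exist in the calendar.
    if (!date_col %in% names(calendar)) {
      stop("date_col '", date_col, "' not found in calendar", call. = FALSE)
    }
    return(date_col)
  }
  candidates <- c("date", "ds", "event_date")  # tried in this order
  found <- candidates[candidates %in% names(calendar)]
  if (length(found) == 0L) {
    stop("no date column found; tried: ",
         paste(candidates, collapse = ", "), call. = FALSE)
  }
  found[[1L]]
}

calendar <- data.frame(ds = "2024-12-25", holiday = "Christmas")
detect_date_col(calendar)  # "ds"
```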

Label normalisation

Holiday labels are normalised to lowercase, alphanumeric-plus-underscore form via normalise_holiday_label(). For example:

  • Black Friday → black_friday
  • New Year's Day → new_year_s_day
  • Empty labels → unnamed

The generated feature column name is {prefix}{normalised_label}, e.g. holiday_black_friday.
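
One way to reproduce the examples above in base R is sketched below. sketch_normalise_label() is an illustrative stand-in, not normalise_holiday_label() itself; the exact rules in the package (for instance how leading or trailing separators are handled) may differ.

```r
# Illustrative stand-in for normalise_holiday_label(); real rules may differ.
sketch_normalise_label <- function(label) {
  out <- tolower(trimws(as.character(label)))
  out[is.na(out)] <- ""
  out <- gsub("[^a-z0-9]+", "_", out)  # each run of other characters becomes "_"
  out <- gsub("^_+|_+$", "", out)      # assumption: trim edge underscores
  out[out == ""] <- "unnamed"
  out
}

labels <- c("Black Friday", "New Year's Day", "")
paste0("holiday_", sketch_normalise_label(labels))
# "holiday_black_friday" "holiday_new_year_s_day" "holiday_unnamed"
```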

Generation pipeline

The runner calls build_weekly_holiday_features() with the following steps:

  1. Parse and validate the calendar. validate_holiday_calendar() checks column presence, date parsing, and label completeness.

  2. Expand holiday windows. expand_holiday_windows() replicates each event row across the [event_date - window_before, event_date + window_after] range.

  3. Align to weekly index. Each expanded event-day is mapped to its containing week using week_floor_date() with the configured week_start.

  4. Aggregate per week. Events are counted per week per feature. Under aggregation_rule: any, counts are collapsed to binary (0/1). Under overlap_policy: dedupe_label_date, duplicate label-date pairs within a week are removed before counting.

  5. Join to model data. The generated feature matrix is left-joined to the model data by the date column. Weeks with no events receive zero.

  6. Append to formula. If add_to_formula: true, generated feature columns are appended as additive terms to the population formula.
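
Steps 2 to 4 can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the code behind build_weekly_holiday_features(): the function names, the label and date column names, and the long (one row per week and label) output shape are all hypothetical, labels are assumed to be normalised already, and the join and formula steps are left out.

```r
# Hypothetical helper: most recent week_start on or before each date.
week_floor <- function(dates, week_start = "monday") {
  days <- c("sunday", "monday", "tuesday", "wednesday",
            "thursday", "friday", "saturday")
  target <- match(week_start, days) - 1L  # POSIXlt convention: 0 = Sunday
  stopifnot(!is.na(target))
  dates - ((as.POSIXlt(dates)$wday - target) %% 7L)
}

# Hypothetical sketch of steps 2-4 for a calendar with `label` and `date`.
sketch_weekly_holidays <- function(calendar, window_before = 0L,
                                   window_after = 0L, week_start = "monday",
                                   aggregation_rule = "count",
                                   overlap_policy = "count_all") {
  # Step 2: replicate each event across its inclusive day window.
  offsets <- seq(-window_before, window_after)
  expanded <- data.frame(
    label = rep(calendar$label, each = length(offsets)),
    date  = rep(calendar$date, each = length(offsets)) + offsets
  )
  # Drop repeated label-date pairs before counting, if requested.
  if (overlap_policy == "dedupe_label_date") expanded <- unique(expanded)
  # Step 3: map each event-day to the start of its week.
  expanded$week <- week_floor(expanded$date, week_start)
  # Step 4: count event-days per week and label.
  expanded$n <- 1L
  counts <- aggregate(n ~ week + label, data = expanded, FUN = sum)
  if (aggregation_rule == "any") counts$n <- as.integer(counts$n > 0)
  counts[order(counts$week, counts$label), ]
}

# A two-day holiday inside one Monday-anchored week.
cal <- data.frame(label = "christmas",
                  date = as.Date(c("2024-12-25", "2024-12-26")))
sketch_weekly_holidays(cal)                            # week 2024-12-23, n = 2
sketch_weekly_holidays(cal, aggregation_rule = "any")  # week 2024-12-23, n = 1
```

With window_before = 7 and a single event on 2024-12-25, the same sketch spreads eight event-days over two weeks: five in the week of 2024-12-16 and three in the week of 2024-12-23.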

Weekly anchoring

All weekly alignment uses week_floor_date(), which computes the most recent occurrence of week_start on or before each date. The model data’s date column must contain week-start-aligned dates; normalise_weekly_index() validates this and aborts if dates are not aligned.
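
The flooring rule and the alignment check can be illustrated in base R. The helper below is a hypothetical sketch written for this page; it is not week_floor_date() or normalise_weekly_index(), and it only reports misaligned dates rather than aborting.

```r
# Hypothetical sketch: most recent occurrence of week_start on or before each date.
sketch_week_floor <- function(dates, week_start = "monday") {
  days <- c("sunday", "monday", "tuesday", "wednesday",
            "thursday", "friday", "saturday")
  target <- match(week_start, days) - 1L  # POSIXlt convention: 0 = Sunday
  if (is.na(target)) stop("unsupported week_start: ", week_start, call. = FALSE)
  wday <- as.POSIXlt(dates)$wday
  dates - ((wday - target) %% 7L)         # step back 0 to 6 days
}

sketch_week_floor(as.Date("2024-12-25"))            # 2024-12-23 (Monday)
sketch_week_floor(as.Date("2024-12-25"), "sunday")  # 2024-12-22 (Sunday)

# A date is week-start-aligned when flooring leaves it unchanged.
model_dates <- as.Date(c("2024-12-23", "2024-12-30", "2025-01-07"))
model_dates == sketch_week_floor(model_dates)       # TRUE TRUE FALSE
```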

Supported week-start values

monday, tuesday, wednesday, thursday, friday, saturday, sunday.

Timezone handling

  • Calendar dates are parsed using the configured timezone (default UTC).
  • If the calendar contains POSIXt values, they are coerced to Date in the configured timezone.
  • Character dates are parsed as ISO 8601 by default, or using date_format if specified.
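
The effect of the timezone and date_format settings can be seen with base R date functions alone. The snippet below is an illustration of why these settings matter, not a trace of the package's parsing code; the timestamp and the Australia/Sydney zone are made-up example values.

```r
# A date-time close to midnight falls on different calendar days by timezone.
stamp <- as.POSIXct("2024-12-31 23:30:00", tz = "UTC")

as.Date(stamp, tz = "UTC")               # 2024-12-31
as.Date(stamp, tz = "Australia/Sydney")  # 2025-01-01 (UTC+11 in late December)

# Character dates: ISO 8601 by default, or an explicit format string.
iso_date <- as.Date("2024-12-25")
uk_date  <- as.Date("25/12/2024", format = "%d/%m/%Y")
iso_date == uk_date                      # TRUE
```

Because the weekly alignment works on calendar days, a one-day shift near a week boundary can move an event into the neighbouring week.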

Generated-term audit contract

Generated holiday terms are tracked for downstream diagnostics and reporting:

  • The list of generated term names is stored in model$.runner_time_components$generated_terms.
  • The identifiability gate in R/diagnostics_report.R uses this list to auto-detect baseline terms (via detect_baseline_terms()), so generated holiday terms are included in baseline-media correlation checks without requiring explicit configuration.

Feature naming collision

If two different holiday labels normalise to the same feature name, build_weekly_holiday_features() aborts with a collision error. Ensure calendar labels are distinct after normalisation.
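
A calendar can be checked for such collisions before a run. The sketch below is a hypothetical pre-check in base R; the normalisation expression approximates the rule described under Label normalisation and is an assumption, so the package may flag a slightly different set of labels.

```r
# Hypothetical pre-check; the normalisation rule is an assumed approximation.
labels <- c("Black Friday", "black-friday", "Christmas")
normalised <- gsub("^_+|_+$", "", gsub("[^a-z0-9]+", "_", tolower(labels)))

# Group the distinct raw labels by the name they normalise to.
keep <- !duplicated(labels)
groups <- split(labels[keep], normalised[keep])
collisions <- groups[lengths(groups) > 1L]
collisions
# $black_friday
# [1] "Black Friday" "black-friday"
```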

Interaction with existing data columns

  • If overwrite_existing: false (default), the runner aborts if any generated column name already exists in the data.
  • If overwrite_existing: true, existing columns with matching names are replaced by the generated features.
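
The two behaviours can be sketched in base R as follows. attach_features() is a hypothetical helper written for this page, not a DSAMbayes function, and it assumes the feature rows are already aligned one-to-one with the data rows.

```r
# Hypothetical sketch of the overwrite_existing check; names are illustrative.
attach_features <- function(data, features, overwrite_existing = FALSE) {
  clash <- intersect(names(data), names(features))
  if (length(clash) > 0L && !overwrite_existing) {
    stop("generated columns already exist in data: ",
         paste(clash, collapse = ", "), call. = FALSE)
  }
  data[names(features)] <- features  # replaces clashing columns, adds new ones
  data
}

data <- data.frame(date = as.Date("2024-12-23"), holiday_christmas = 9)
features <- data.frame(holiday_christmas = 2L)

# Default: the name clash is an error.
result <- try(attach_features(data, features), silent = TRUE)
inherits(result, "try-error")  # TRUE

# With overwrite_existing = TRUE the generated values replace the old column.
attach_features(data, features, overwrite_existing = TRUE)$holiday_christmas  # 2
```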

Practical guidance

  • Start with aggregation_rule: count to capture multi-day holiday effects (e.g. a holiday spanning two days in one week produces a count of 2).
  • Use window_before and window_after for events with known anticipation or lingering effects (e.g. window_before: 7 for pre-Christmas shopping).
  • Use aggregation_rule: any when you want binary holiday indicators regardless of how many event-days fall in a week.
  • Check the generated terms in the resolved config (config.resolved.yaml) and in the posterior summary to confirm which holidays entered the model.

Cross-references