Forecasting algorithm
AutoARIMA
Classical seasonal ARIMA with exogenous regressors (SARIMAX) for a single time series. Searches for the best ARIMA structure automatically, then produces a point forecast plus a native, statistically-derived uncertainty band for every period in the horizon.
It is the classical baseline alongside lightgbm-v1 — the model every forecaster is expected to run as a yardstick. Pick it when your series is regular and reasonably stationary, when you want honest model-derived prediction intervals, or as a sanity check against the gradient-boosted forecaster.
What it does
You point it at a DataSource and pick a date column, a target column, and (optionally) some drivers. It outputs a new DataSource with one row per future period, in the same shape as the output of every other forecast algorithm:
| Column | Type | Meaning |
|---|---|---|
| <date_col> | date | Echoes the input date column name |
| yhat | float | Point prediction |
| yhat_lower | float | Lower bound at chosen interval |
| yhat_upper | float | Upper bound at chosen interval |
The output DataSource is a first-class table the rest of the app can plot, pin to dashboards, or include in reports.
How it works
Automatic ARIMA search
An ARIMA model describes a series through its own past values (the autoregressive terms), its past forecast errors (the moving-average terms), and as many rounds of differencing as it takes to make the series stationary. A seasonal ARIMA adds a second set of those terms at the seasonal lag: month-of-year for monthly data, day-of-week for daily, and so on.
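To make differencing concrete, here is a minimal plain-Python sketch. It is an illustration only, not the service's implementation, and the `difference` helper is a hypothetical name. A lag-1 difference removes a steady trend, and a difference at the seasonal lag removes a repeating pattern:

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# A series with a steady upward trend: one round of lag-1 differencing
# leaves a constant series, which is trivially stationary.
trend = [10, 12, 14, 16, 18]
trend_diff = difference(trend)

# A series that repeats every 4 periods (quarterly seasonality):
# differencing at the seasonal lag removes the repeating pattern.
seasonal = [5, 9, 3, 7, 5, 9, 3, 7]
seasonal_diff = difference(seasonal, lag=4)

print(trend_diff)     # [2, 2, 2, 2]
print(seasonal_diff)  # [0, 0, 0, 0]
```

The `d` and `D` in an ARIMA order count how many times the regular and seasonal differences are applied before the autoregressive and moving-average terms are fitted.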
AutoARIMA does the hard part for you: it searches over candidate (p, d, q)(P, D, Q) structures and keeps the one with the best information criterion (AICc). You do not pick orders by hand. The chosen structure is recorded on the run as arima_order (e.g. ARIMA(1,1,1)(0,1,0)[12]) so the fit is transparent.
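The selection step can be sketched as follows. This is a simplified illustration under stated assumptions: the candidate list, log-likelihoods, and parameter counts are made-up numbers, the helper names are hypothetical, and the real search may explore candidates stepwise rather than scoring a fixed list. Only the AICc formula itself is standard.

```python
def aicc(log_likelihood, n_params, n_obs):
    """Corrected Akaike information criterion; lower is better."""
    aic = -2.0 * log_likelihood + 2.0 * n_params
    correction = (2.0 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
    return aic + correction


def format_order(order, seasonal_order, period):
    """Render an order the way the arima_order example is written."""
    p, d, q = order
    P, D, Q = seasonal_order
    return f"ARIMA({p},{d},{q})({P},{D},{Q})[{period}]"


# Illustrative, made-up fit results for three candidate structures.
# "k" counts the estimated parameters, including the error variance.
candidates = [
    {"order": (0, 1, 1), "seasonal": (0, 1, 0), "loglik": -212.4, "k": 2},
    {"order": (1, 1, 1), "seasonal": (0, 1, 0), "loglik": -209.8, "k": 3},
    {"order": (2, 1, 2), "seasonal": (1, 1, 1), "loglik": -208.9, "k": 7},
]
n_obs = 60  # assumed number of observations used in the fit

best = min(candidates, key=lambda c: aicc(c["loglik"], c["k"], n_obs))
arima_order = format_order(best["order"], best["seasonal"], period=12)
print(arima_order)  # ARIMA(1,1,1)(0,1,0)[12]
```

In this made-up example the largest model has the highest likelihood but loses on AICc, because the penalty for its extra parameters outweighs the small gain in fit.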
The seasonal period is derived from the cadence you choose — 7 (daily), 52 (weekly), 12 (monthly), 4 (quarterly), 1 (yearly; no seasonal terms).
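As a lookup, the mapping above amounts to something like the sketch below. The cadence labels (`"daily"`, `"weekly"`, and so on) are assumptions for illustration; the values actually accepted by spec.frequency may be spelled differently.

```python
# Hypothetical cadence labels; only the period values come from the text above.
SEASONAL_PERIOD = {
    "daily": 7,
    "weekly": 52,
    "monthly": 12,
    "quarterly": 4,
    "yearly": 1,
}


def seasonal_period(cadence):
    """Return the seasonal period for a cadence label."""
    try:
        return SEASONAL_PERIOD[cadence]
    except KeyError:
        raise ValueError(f"unsupported cadence: {cadence!r}") from None


def uses_seasonal_terms(cadence):
    """A period of 1 means the model has no seasonal terms."""
    return seasonal_period(cadence) > 1


print(seasonal_period("monthly"))     # 12
print(uses_seasonal_terms("yearly"))  # False
```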
Native prediction intervals
Unlike the LightGBM forecaster — which fits three quantile models and then conformally widens the band — ARIMA's interval is analytic: it falls straight out of the fitted model's error variance and propagates forward through the forecast horizon. The band naturally widens the further out you forecast, because the model's own uncertainty compounds.
There is no calibration split and no CQR step: the interval_level you choose is the model's nominal coverage by construction, and pi_coverage reports how often the backtest actuals actually fell inside the band. The forecast visualization never shows "(uncalibrated)".
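To see why the band widens, consider the simplest possible case, a random walk (ARIMA(0,1,0)), where the forecast variance h steps ahead is h times the one-step error variance. The sketch below is illustrative only: it assumes Gaussian errors, assumes interval_level is expressed as a fraction such as 0.95, and uses a hypothetical helper name. Richer ARIMA structures follow the same idea with a different variance formula.

```python
from math import sqrt
from statistics import NormalDist


def random_walk_band(last_value, sigma, horizon, interval_level=0.9):
    """Point forecast and analytic band for a random walk, ARIMA(0,1,0).

    The h-step-ahead standard error is sigma * sqrt(h), so the band
    widens with the square root of the forecast step.
    """
    # Two-sided normal quantile for the requested coverage.
    z = NormalDist().inv_cdf(0.5 + interval_level / 2.0)
    rows = []
    for h in range(1, horizon + 1):
        half_width = z * sigma * sqrt(h)
        rows.append({
            "step": h,
            "yhat": last_value,
            "yhat_lower": last_value - half_width,
            "yhat_upper": last_value + half_width,
        })
    return rows


band = random_walk_band(last_value=100.0, sigma=2.0, horizon=4, interval_level=0.95)
widths = [row["yhat_upper"] - row["yhat_lower"] for row in band]
for row, width in zip(band, widths):
    print(row["step"], round(width, 2))
```

Here the band at step 4 is exactly twice as wide as at step 1, because the standard error grows with the square root of the step.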
Exogenous regressors
Driver columns you select enter the model as exogenous regressors — the X in SARIMAX. ARIMA fits a linear coefficient for each, on top of the time-series structure. As with the LightGBM forecaster, the model assumes the drivers' future values are knowable (calendar events, planned spend); the backtest reads them straight from the held-out window.
ARIMA's use of drivers is linear and additive — it does not capture non-linear driver effects or interactions. If a driver matters non-linearly, the LightGBM forecaster will use it better.
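"Linear and additive" can be illustrated with a short sketch. The driver names, coefficients, and baseline values are invented for illustration, and `add_driver_effects` is a hypothetical helper, not part of the product. Each driver contributes its coefficient times its value, and the contributions are summed on top of the time-series component:

```python
def add_driver_effects(baseline, drivers, coefficients):
    """Add linear driver contributions to a baseline forecast.

    baseline:     list of floats, one per future period
    drivers:      dict of driver name -> list of known future values
    coefficients: dict of driver name -> fitted linear coefficient
    """
    forecast = []
    for t, base in enumerate(baseline):
        effect = sum(coefficients[name] * values[t] for name, values in drivers.items())
        forecast.append(base + effect)
    return forecast


# Made-up example: a flat time-series component plus two drivers.
baseline = [100.0, 100.0, 100.0]
drivers = {"promo": [0, 1, 0], "spend": [10, 10, 20]}
coefficients = {"promo": 15.0, "spend": 0.5}

forecast = add_driver_effects(baseline, drivers, coefficients)
print(forecast)  # [105.0, 120.0, 110.0]
```

Doubling spend always adds the same fixed amount per unit, regardless of whether a promo is running. A model of this form cannot express a promo that works better at high spend.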
Direct forecast
ARIMA forecasts the whole horizon directly — there is no recursive feeding of one step's prediction into the next step's inputs. Inference on fresh data re-applies the discovered ARIMA order to the new history (a fast fixed-order re-fit, no re-search) and reports the model's one-step-ahead fitted values.
Configuration
The form maps directly to the spec — same inputs as the LightGBM forecaster:
| Form input | Stored as |
|---|---|
| Source | PredictionModel.source_id |
| Date column | spec.date_col |
| Target column | spec.target_col |
| Drivers (multi-select) | spec.exogenous_cols |
| Cadence | spec.frequency |
| Horizon | spec.horizon (in periods) |
| Interval level | spec.interval_level |
| Validation horizon | spec.validation_horizon |
algorithm, version, task, and hyperparams are server-defaulted. ARIMA's fit is deterministic — the same input and spec always produce the same model.
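For orientation, a filled-in spec might look like the sketch below, written here as a Python dict. The field names come from the table above; every value is a made-up example, and the exact value formats (for instance whether frequency is a word like `"monthly"` or whether interval_level is a fraction) are assumptions.

```python
# Hypothetical example values; only the keys are taken from the table above.
spec = {
    "date_col": "month",
    "target_col": "revenue",
    "exogenous_cols": ["marketing_spend", "is_holiday"],  # optional drivers
    "frequency": "monthly",
    "horizon": 6,             # in periods
    "interval_level": 0.9,    # assumed to be a fraction
    "validation_horizon": 6,
}
```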
Metrics
A successful run writes these into PredictionRun.metrics:
| Metric | Meaning |
|---|---|
| mae | Mean absolute error against the held-out backtest period |
| mape | Mean absolute % error (rows with target=0 excluded) |
| smape | Symmetric MAPE — robust to zero/near-zero targets |
| pi_coverage | Fraction of backtest rows where actual fell inside band |
| arima_order | The ARIMA structure the search selected |
smape is the primary "is this any good" number. There is no feature_importances — ARIMA has no engineered feature set; the arima_order string is the interpretability surface instead.
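The four numeric metrics can be sketched in plain Python as below. This shows the usual definitions, not necessarily the exact formulas the service uses. In particular, the sMAPE variant (2|y - yhat| / (|y| + |yhat|)), the percentage scaling of mape and smape, and the inclusive band check are assumptions, and the sample data is invented.

```python
def backtest_metrics(actual, yhat, lower, upper):
    """Compute mae, mape, smape, and pi_coverage over a backtest window."""
    n = len(actual)
    abs_err = [abs(a - f) for a, f in zip(actual, yhat)]

    mae = sum(abs_err) / n

    # MAPE: rows where the target is exactly 0 are excluded.
    pct = [e / abs(a) for e, a in zip(abs_err, actual) if a != 0]
    mape = 100.0 * sum(pct) / len(pct) if pct else float("nan")

    # sMAPE: the denominator uses both actual and forecast, so a zero
    # target does not divide by zero unless the forecast is also zero.
    sym = []
    for a, f, e in zip(actual, yhat, abs_err):
        denom = abs(a) + abs(f)
        sym.append(0.0 if denom == 0 else 2.0 * e / denom)
    smape = 100.0 * sum(sym) / n

    # Coverage: share of rows whose actual lies inside the band (inclusive).
    inside = sum(1 for a, lo, hi in zip(actual, lower, upper) if lo <= a <= hi)
    pi_coverage = inside / n

    return {"mae": mae, "mape": mape, "smape": smape, "pi_coverage": pi_coverage}


# Made-up backtest rows, including one zero target.
metrics = backtest_metrics(
    actual=[100, 0, 50, 80],
    yhat=[90, 5, 50, 100],
    lower=[80, -5, 40, 85],
    upper=[110, 10, 60, 115],
)
print(metrics["mae"], metrics["pi_coverage"])  # 8.75 0.75
```

In this sample the zero-target row is dropped from mape but still counts towards smape, where it contributes the maximum per-row error.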
Good for
- Regular, reasonably stationary series with clear seasonality. Monthly KPIs, weekly demand, quarterly figures — the classic ARIMA home ground.
- An honest baseline. Running AutoARIMA next to the LightGBM forecaster is the single most useful habit in forecasting — "we replaced our fancy model with ARIMA and it got better" is a common story.
- Calibrated uncertainty. When the width of the band matters (capacity planning, risk), ARIMA's analytic interval is derived from the model rather than conformally patched on.
- Small data. ARIMA is well-behaved on short series where a tree model would overfit.
Limitations
- Linear, single-series. ARIMA models one series and treats drivers linearly. Non-linear effects, regime shifts, threshold behaviour, and driver interactions are better served by the LightGBM forecaster.
- No multi-series support. One series in, one forecast out — split per series upstream.
- Stationarity assumptions. ARIMA expects structure that differencing can stabilise. Series with abrupt level shifts or changing seasonality can defeat it; check the backtest band before trusting it.
- No feature importance. There is no drivers panel; arima_order is the only structural readout.
- No "project beyond the data" yet. AutoARIMA models are backtested but are not wired into the future_forecast analysis in this version; use the LightGBM forecaster when you need a forward projection past the dataset.
See also
lightgbm-v1.md: gradient-boosted sister; non-linear, handles regime shifts and non-linear drivers, usually higher accuracy on rich data.
Not sure which to pick?
Choosing a forecasting algorithm: LightGBM forecasting vs AutoARIMA. Covers when gradient-boosted forecasting wins, when the classical SARIMAX model is enough, and why it is worth running both.