Regression algorithm
Ridge regression
A linear model for predicting a numeric value from feature columns. Trains a scikit-learn Ridge regressor (least squares with L2 regularization) behind a preprocessing pipeline (impute → scale numeric, impute → one-hot categorical), scores on a random hold-out split, and surfaces the standard regression metric set plus coefficient-based feature importance.
It is the interpretable linear baseline alongside lightgbm-regressor-v1 — pick it when you want a fast, transparent model whose coefficients you can read, or as a yardstick for judging whether a more complex model is earning its keep.
What it does
You point it at a DataSource and pick:
- a numeric target column you want to predict, and
- one or more feature columns the model gets to look at.
Feature columns may be numeric, boolean, or string/categorical. Unlike the LightGBM regressor, which consumes categoricals natively, a linear model needs every feature to be numeric, so the model performs that conversion internally: there are no encoder or scaler nodes to wire. (You can still wire preprocessing upstream if you want explicit control.)
The output is a trained model + an eval_result carrying the metrics, predictions on the test rows, and a feature-importance chart.
How it works
The pipeline shape is identical to the LightGBM regressor — the same regressor_train / regressor_eval nodes:

    data_source → random_split → train_data    +    test_data
                                      │                 │
                                      ▼                 ▼
    model ───────────────────► regressor_train   regressor_eval
                                      │                 ▲
                                      ▼                 │
                                trained_model ──────────┘
                                                        │
                                                        ▼
                                                   eval_result

regressor_train is fit-only; regressor_eval runs the real prediction pass on the held-out test frame and emits the final scored result. The evaluation step is algorithm-agnostic — ridge regression and LightGBM share the exact same scoring + metric code.
The preprocessing pipeline
The model is a scikit-learn Pipeline. Inside it:
| Feature kind | Steps |
|---|---|
| Numeric / boolean | impute missing values with the median → standardize to zero mean, unit variance |
| String / categorical | impute missing values with the most frequent value → one-hot encode |
Standardizing the numeric features is not cosmetic for ridge: the L2 penalty is scale-sensitive. Without scaling, a feature with large raw values (which needs only a small coefficient) would be regularized far more weakly than a feature with small raw values. Scaling puts every coefficient on a comparable footing, both for the penalty and for the importance chart.
The whole fitted pipeline — imputers, scaler, encoder, and coefficients — is serialized as one unit, so inference replays exactly what was fit.
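As a rough illustration of the table above, the sketch below builds an equivalent scikit-learn Pipeline. The column names, the toy data, and the step names are made up for the example, and `handle_unknown="ignore"` is an assumption based on the unseen-category behavior described on this page; the product's actual internals may differ.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column lists; a real run would derive them from the DataSource.
numeric_cols = ["sqft", "has_garage"]
categorical_cols = ["city"]

preprocess = ColumnTransformer([
    # Numeric / boolean: median imputation, then zero mean and unit variance.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    # String / categorical: most-frequent imputation, then one-hot encoding.
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])

model = Pipeline([("preprocess", preprocess), ("ridge", Ridge(alpha=1.0))])

# Toy training frame with one missing value in a numeric and a categorical column.
train = pd.DataFrame({
    "sqft": [800.0, 1200.0, np.nan, 2000.0],
    "has_garage": [0, 1, 1, 0],
    "city": ["A", "B", np.nan, "A"],
})
y = [100.0, 150.0, 160.0, 240.0]

model.fit(train, y)
predictions = model.predict(train)  # one prediction per input row
```

Because the imputers, scaler, and encoder live inside the same Pipeline object as the regressor, serializing `model` captures all of them together.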
Why ridge, not plain least squares
Ridge adds an L2 penalty, weighted by alpha, on the coefficient magnitudes. Plain ordinary least squares has no penalty: it overfits readily once there are many one-hot-expanded columns or correlated features, and because rescaling a feature does not change an unpenalized least-squares fit, the scaler would be a no-op. The penalty keeps the fit stable and the coefficients well-behaved; alpha is the one knob that trades fit against regularization.
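To make the trade-off concrete: scikit-learn's Ridge minimizes the squared error plus alpha times the squared L2 norm of the coefficients. The sketch below uses synthetic data (two nearly identical features, invented for this example) and records how the coefficient norm shrinks as alpha grows. It is an illustration only, not output from the product.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)  # nearly a copy of x1 (strong correlation)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)

# Fit the same data at three regularization strengths and record ||coef||.
coef_norms = {}
for alpha in [1e-6, 1.0, 100.0]:
    ridge = Ridge(alpha=alpha).fit(X, y)
    coef_norms[alpha] = float(np.linalg.norm(ridge.coef_))

for alpha, norm in coef_norms.items():
    print(f"alpha={alpha:g}  coefficient norm={norm:.3f}")
```

A very small alpha behaves almost like plain least squares, where correlated features can receive large offsetting coefficients; larger values pull the coefficients toward zero.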
Missing values & unseen categories
- Missing values are imputed inside the pipeline (median for numeric, most-frequent for categorical). LightGBM tolerates NaN natively; a linear model does not, so this step is required. The model handles it so you don't have to.
- Unseen categories at inference: a category value that never appeared in training becomes an all-zero indicator rather than an error, matching how the LightGBM regressor treats an unseen level as missing.
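The all-zero behavior for unseen categories matches what scikit-learn's `OneHotEncoder` does with `handle_unknown="ignore"`. Whether the product uses exactly this setting is an assumption; the color values below are made up for the example.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Fit on three known levels; categories are stored in sorted order:
# blue, green, red.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(np.array([["red"], ["green"], ["blue"]], dtype=object))

# A known level lights up exactly one indicator column.
known = encoder.transform(np.array([["green"]], dtype=object)).toarray()

# A level never seen during fit yields a row of zeros instead of an error.
unseen = encoder.transform(np.array([["purple"]], dtype=object)).toarray()
```

With the encoder's default `handle_unknown="error"`, the second `transform` call would raise instead.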
Metric set
Same as the LightGBM regressor — the eval step is shared:
| Metric | Meaning |
|---|---|
| MAE | Mean absolute error — average prediction error in target units |
| RMSE | Root mean squared error — penalizes large errors more heavily |
| R² | Coefficient of determination — fraction of variance explained (1.0 = perfect, 0.0 = no better than predicting the mean) |
| MAPE | Mean absolute percentage error — relative error, None if any test row has target == 0 |
The runs panel surfaces RMSE as the headline number; all four are visible on the eval result detail.
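For reference, the four metrics can be computed as in the minimal sketch below. The function name is hypothetical and this is not the shared eval code. MAPE is returned here as a fraction; whether the product reports a fraction or a percentage is not stated on this page. Returning None for R² on a constant target is also an assumption of this sketch.

```python
import math
from typing import Dict, Optional, Sequence


def regression_metrics(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> Dict[str, Optional[float]]:
    """Compute MAE, RMSE, R² and MAPE for paired targets and predictions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # R² compares the model against always predicting the mean target.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else None  # assumption: None if constant

    # MAPE divides by the target, so it is undefined when any target is zero.
    if any(t == 0 for t in y_true):
        mape = None
    else:
        mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    return {"mae": mae, "rmse": rmse, "r2": r2, "mape": mape}


metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

R² can also go below 0.0 when the model does worse than predicting the mean.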
Feature importance
The chart shows standardized-coefficient magnitude: |coefficient| for each (one-hot-expanded) feature. Because numeric features are scaled to unit variance before the fit (one-hot indicators stay 0/1), these magnitudes are roughly comparable across columns.
These are linear coefficients, not split gains — they are not numerically comparable to the LightGBM regressor's importance bars, and they describe linear effects only.
Hyperparameters
Pass these on the model node's hyperparams (all optional):
| Key | Default | Meaning |
|---|---|---|
| alpha | 1.0 | L2 regularization strength: larger means stronger regularization (coefficients shrink toward zero); smaller approaches plain least squares |
| solver | "auto" | Linear-system solver: "auto" picks a direct method for dense data; "sag" / "saga" are iterative and honor the run seed |
Ridge's default solver is direct and deterministic, so no iteration cap is needed.
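One plausible way these keys could map onto the scikit-learn estimator is sketched below. `build_ridge`, `DEFAULTS`, and the `seed` argument are hypothetical names for this illustration, not the product's API.

```python
from typing import Optional

from sklearn.linear_model import Ridge

# Defaults taken from the hyperparameter table above.
DEFAULTS = {"alpha": 1.0, "solver": "auto"}


def build_ridge(hyperparams: Optional[dict] = None, seed: int = 0) -> Ridge:
    """Hypothetical helper: merge user hyperparams over the defaults."""
    params = {**DEFAULTS, **(hyperparams or {})}
    # scikit-learn only uses random_state for the "sag" and "saga" solvers.
    return Ridge(alpha=params["alpha"], solver=params["solver"], random_state=seed)


default_model = build_ridge()
seeded_model = build_ridge({"alpha": 10.0, "solver": "saga"}, seed=7)
```

With a direct solver the seed has no effect on the result; it matters only for the stochastic solvers.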
Limitations
- Linear relationship only. Ridge models a linear relationship between features and the target. On its own it cannot capture feature interactions or non-linear effects — if accuracy lags the LightGBM regressor badly, that is usually why. Engineer interaction features upstream, or use the LightGBM regressor.
- One-hot blow-up on high-cardinality columns. A categorical feature with hundreds of distinct values becomes hundreds of indicator columns. Prefer the LightGBM regressor (native categorical splits) for high-cardinality features, or reduce cardinality upstream.
- No prediction intervals. This is point regression — the model outputs a single number per row. If you need uncertainty bands, the time-series forecaster ships them.
- Random split assumes IID rows. If your data has temporal structure, so that the model should train on earlier rows and be tested on later ones, use the forecast template instead.
See also
- lightgbm-regressor-v1.md: gradient-boosted sister; non-linear, native categorical handling, usually higher accuracy.
- logreg-classifier-v1.md: the categorical-target linear baseline with the same pipeline shape.
Not sure which to pick?
Choosing a regression algorithm: LightGBM vs Ridge vs OLS vs Random forest for predicting a number, including why to start with a linear baseline and when to reach for a tree-based model.