Models#
- class legateboost.models.Tree(max_depth: int = 8, split_samples: int = 256, alpha: float = 1.0)#
Decision tree model for gradient boosting.
Instead of an exhaustive search over all possible split values, a random sample of size split_samples is drawn from the dataset and used as split candidates. The tree learner closely matches the histogram-based algorithms of XGBoost/LightGBM, where the split_samples parameter can be tuned much like the number of bins.
- Parameters:
max_depth (int) – The maximum depth of the tree.
split_samples (int) – The number of data points to sample for each split decision. Max value is 2048 due to constraints on shared memory in GPU kernels.
alpha (float) – The L2 regularization parameter.
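For context, a minimal usage sketch is shown below. It assumes the top-level LBRegressor estimator and its base_models argument, which are not documented in this section; the Tree arguments mirror the parameters above.

```python
import numpy as np
import legateboost as lb

X = np.random.random((1000, 10))
y = np.random.random(1000)

# Assumed API: Tree instances are passed to the estimator via base_models.
# split_samples is capped at 2048 (shared-memory limit in the GPU kernels).
model = lb.LBRegressor(
    n_estimators=50,
    base_models=(lb.models.Tree(max_depth=8, split_samples=256, alpha=1.0),),
).fit(X, y)
pred = model.predict(X)
```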
- fit(X: ndarray, g: ndarray, h: ndarray) Tree #
Fit the model to a second order Taylor expansion of the loss function.
- Parameters:
X – The training data.
g – The first derivative of the loss function with respect to the predicted values.
h – The second derivative of the loss function with respect to the predicted values.
- Returns:
The fitted model.
- Return type:
BaseModel
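To make the g/h interface concrete, here is a hedged sketch of fitting a single Tree directly on the gradients and Hessians of a squared error loss expanded around a zero prediction (g = prediction − y = −y, h = 1). The (n_samples, n_outputs) shape convention is an assumption, and depending on the version the model may require additional setup (for example a random state) before fit is called directly.

```python
import numpy as np
import legateboost as lb

X = np.random.random((500, 5))
y = np.random.random((500, 1))

# Squared error L = 0.5 * (pred - y)**2 expanded around pred = 0:
#   g = dL/dpred   = pred - y = -y
#   h = d2L/dpred2 = 1
g = -y
h = np.ones_like(y)

tree = lb.models.Tree(max_depth=4).fit(X, g, h)
raw_pred = tree.predict(X)  # predictions of this single tree
```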
- predict(X: ndarray) ndarray #
Predict class labels for samples in X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The input samples.
- Returns:
y_pred – The predicted class labels.
- Return type:
ndarray of shape (n_samples,)
- update(X: ndarray, g: ndarray, h: ndarray) Tree #
Update the model with new training data.
- Parameters:
X – The training data to update the model with.
g – The first derivative of the loss function with respect to the model’s predictions.
h – The second derivative of the loss function with respect to the model’s predictions.
- Returns:
The updated model.
- Return type:
BaseModel
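Continuing the sketch above, a new batch of gradients and Hessians can be passed to update; this is a hedged illustration of the call signature only.

```python
# New batch, same squared error expansion around a zero prediction.
X_new = np.random.random((500, 5))
y_new = np.random.random((500, 1))
g_new = -y_new
h_new = np.ones_like(y_new)

tree = tree.update(X_new, g_new, h_new)
```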
- class legateboost.models.Linear(alpha: float = 1e-05, solver: str = 'direct')#
Generalised linear model. Boosting linear models is equivalent to fitting a single linear model, where each boosting iteration is a Newton step. Note that the L2 penalty is applied to the weights of each model, as opposed to the sum of all models. This can lead to different results when compared to fitting a linear model with sklearn.
It is recommended to normalize the data before fitting. This ensures regularization is applied evenly to all features and prevents numerical issues.
Two solvers are available: a direct numerical solver, which can be faster but uses more memory, and an iterative L-BFGS solver, which uses less memory but can be slower.
- Parameters:
alpha (float) – The L2 regularization parameter.
solver ("direct" or "lbfgs") – If "direct", use the direct numerical solver. If "lbfgs", use the iterative L-BFGS solver.
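As the normalization recommendation above suggests, features can be standardized before fitting. The sketch below assumes the same LBRegressor / base_models interface as the Tree example and uses scikit-learn's StandardScaler purely for illustration.

```python
import numpy as np
import legateboost as lb
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.random((1000, 10))
y = X @ rng.random(10) + 0.1 * rng.random(1000)

# Scale features so the L2 penalty (alpha) is applied evenly to all of them.
X_scaled = StandardScaler().fit_transform(X)

# Boosting linear models is equivalent to a single linear model fitted by
# Newton steps, so relatively few iterations are needed.
model = lb.LBRegressor(
    n_estimators=10,
    base_models=(lb.models.Linear(alpha=1e-5, solver="direct"),),
).fit(X_scaled, y)
```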
- bias_#
Intercept term.
- Type:
ndarray of shape (n_outputs,)
- betas_#
Coefficients of the linear model.
- Type:
ndarray of shape (n_features, n_outputs)
- fit(X: ndarray, g: ndarray, h: ndarray) Linear #
Fit the model to a second order Taylor expansion of the loss function.
- Parameters:
X – The training data.
g – The first derivative of the loss function with respect to the predicted values.
h – The second derivative of the loss function with respect to the predicted values.
- Returns:
The fitted model.
- Return type:
BaseModel
- predict(X: ndarray) ndarray #
Predict class labels for samples in X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The input samples.
- Returns:
y_pred – The predicted class labels.
- Return type:
ndarray of shape (n_samples,)
- update(X: ndarray, g: ndarray, h: ndarray) Linear #
Update the model with new training data.
- Parameters:
X – The training data to update the model with.
g – The first derivative of the loss function with respect to the model’s predictions.
h – The second derivative of the loss function with respect to the model’s predictions.
- Returns:
The updated model.
- Return type:
BaseModel
- class legateboost.models.KRR(n_components: int = 100, alpha: float = 1e-05, sigma: float | None = None, solver: str = 'direct')#
Kernel Ridge Regression model using the Nyström approximation. The accuracy of the approximation is governed by the parameter n_components <= n_samples. Effectively, n_components rows are randomly sampled (without replacement) from X in each boosting iteration.
The kernel is fixed to be the RBF kernel:
\(k(x_i, x_j) = \exp(-\frac{||x_i - x_j||^2}{2\sigma^2})\)
Standardising data is recommended.
The sigma parameter, if not given, is estimated using the method described in: Allerbo, Oskar, and Rebecka Jörnsten. “Bandwidth Selection for Gaussian Kernel Ridge Regression via Jacobian Control.” arXiv preprint arXiv:2205.11956 (2022).
See the following reference for more details on gradient boosting with kernel ridge regression: Sigrist, Fabio. “KTBoost: Combined kernel and tree boosting.” Neural Processing Letters 53.2 (2021): 1147-1160.
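To illustrate the building blocks, the sketch below evaluates the RBF kernel between the full data and the n_components sampled rows, which is the core ingredient of the Nyström approximation. This is a NumPy illustration only, not the library's internal implementation, and the fixed sigma is an arbitrary choice.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    # k(a, b) = exp(-||a - b||^2 / (2 * sigma^2))
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.random((1000, 5))
n_components = 100

# Sample rows without replacement to form the Nystrom components.
idx = rng.choice(X.shape[0], n_components, replace=False)
K_nm = rbf_kernel(X, X[idx], sigma=1.0)       # (n_samples, n_components)
K_mm = rbf_kernel(X[idx], X[idx], sigma=1.0)  # (n_components, n_components)
```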
- Parameters:
n_components – Number of components to use in the model.
alpha – Regularization parameter.
sigma – Kernel bandwidth parameter. If None, it is estimated using the method described above.
solver – Solver to use for solving the linear system. Options are ‘lbfgs’ and ‘direct’.
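A brief usage sketch, mirroring the earlier examples and assuming the same base_models interface:

```python
import numpy as np
import legateboost as lb

X = np.random.random((2000, 8))
y = np.sin(X).sum(axis=1)

# sigma=None triggers the automatic bandwidth estimate referenced above.
model = lb.LBRegressor(
    n_estimators=20,
    base_models=(lb.models.KRR(n_components=100, alpha=1e-5, sigma=None),),
).fit(X, y)
```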
- betas_#
Coefficients of the regression model.
- Type:
ndarray of shape (n_train_samples, n_outputs)
- X_train#
Training data used to fit the model.
- Type:
ndarray of shape (n_components, n_features)
- indices#
Indices of the training data used to fit the model.
- Type:
ndarray of shape (n_components,)
- fit(X: ndarray, g: ndarray, h: ndarray) KRR #
Fit the model to a second order Taylor expansion of the loss function.
- Parameters:
X – The training data.
g – The first derivative of the loss function with respect to the predicted values.
h – The second derivative of the loss function with respect to the predicted values.
- Returns:
The fitted model.
- Return type:
BaseModel
- predict(X: ndarray) ndarray #
Predict class labels for samples in X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The input samples.
- Returns:
y_pred – The predicted class labels.
- Return type:
ndarray of shape (n_samples,)
- update(X: ndarray, g: ndarray, h: ndarray) KRR #
Update the model with new training data.
- Parameters:
X – The training data to update the model with.
g – The first derivative of the loss function with respect to the model’s predictions.
h – The second derivative of the loss function with respect to the model’s predictions.
- Returns:
The updated model.
- Return type:
BaseModel