Models#

class legateboost.models.Tree(*, max_depth: int = 8, split_samples: int = 256, feature_fraction: float | Callable[[], array] = 1.0, l1_regularization: float = 0.0, l2_regularization: float = 1.0, min_split_gain: float = 0.0, alpha: Any = 'deprecated')#

Decision tree model for gradient boosting.

Instead of an exhaustive search over all possible split values, a random sample of size split_samples is taken from the dataset and used as split candidates. The tree learner closely matches the histogram-type algorithms of XGBoost/LightGBM, where the split_samples parameter can be tuned much like the number of bins.

Parameters:
  • max_depth (int) – The maximum depth of the tree.

  • split_samples (int) – The number of data points to sample for each split decision. Max value is 2048 due to constraints on shared memory in GPU kernels.

  • feature_fraction – If a float, the fraction of features considered when building this model. Features are sampled without replacement; the number of features is rounded up and is at least 1. Alternatively, users may supply a function returning a cupynumeric boolean array of shape (n_features,) to specify the feature subset (see the sketch after this list).

  • l1_regularization (float) – The L1 regularization parameter applied to leaf weights.

  • l2_regularization (float) – The L2 regularization parameter applied to leaf weights.

  • alpha (deprecated) – Deprecated, use l2_regularization instead.

  • min_split_gain (float) – The minimum improvement in the loss function required to make a split. Increasing this value produces smaller trees. Equivalent to the gamma parameter in XGBoost. The threshold is applied on a per-output basis, e.g. with 3 output classes the gain must exceed 3 * min_split_gain.
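
A minimal construction sketch, assuming a 10-feature dataset: the even_features callable is a hypothetical example of a custom feature_fraction, and the LBRegressor wrapper with its base_models argument comes from the wider legateboost API rather than from this page.

   import cupynumeric as cn
   import legateboost as lb

   def even_features() -> cn.ndarray:
       # Hypothetical callable: restrict splits to even-indexed features.
       # Assumes the dataset has 10 features.
       return cn.arange(10) % 2 == 0

   tree = lb.models.Tree(
       max_depth=6,                     # limit tree depth
       split_samples=512,               # split candidates per decision (max 2048)
       feature_fraction=even_features,  # or simply a float such as 0.5
       l2_regularization=1.0,
       min_split_gain=0.1,
   )

   # Base models are usually handed to a boosting estimator rather than used directly.
   model = lb.LBRegressor(base_models=(tree,), n_estimators=50)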

static batch_predict(models: Sequence[BaseModel], X: ndarray) ndarray#

Predict labels for samples in X with the given list of models. This is implemented as a static method taking a list of models so that the underlying implementation can parallelise or otherwise optimise over the whole batch.

Parameters:
  • models (list of BaseModel)

  • X (array-like of shape (n_samples, n_features)) – The input samples.

Returns:

y_pred – The predicted labels.

Return type:

ndarray of shape (n_samples,)
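
A minimal call sketch, assuming tree_a and tree_b are already-fitted Tree instances and X is an array of shape (n_samples, n_features):

   import legateboost as lb

   # Predict over a batch of fitted models in one call; the result is a single
   # ndarray of shape (n_samples,) as documented above.
   y_pred = lb.models.Tree.batch_predict([tree_a, tree_b], X)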

fit(X: ndarray, g: ndarray, h: ndarray) Tree#

Fit the model to a second order Taylor expansion of the loss function.

Parameters:
  • X – The training data.

  • g – The first derivative of the loss function with respect to the predicted values.

  • h – The second derivative of the loss function with respect to the predicted values.

Returns:

The fitted model.

Return type:

BaseModel
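
The sketch below fits a single tree to the gradients and hessians of a squared error loss, for which g = prediction - y and h = 1 for every sample. The (n_samples, n_outputs) shapes of g and h are an assumption, and calling fit directly skips setup that the boosting estimators normally perform, so this illustrates the interface rather than a recommended workflow.

   import cupynumeric as cn
   import legateboost as lb

   X = cn.arange(1000.0).reshape(1000, 1) / 1000.0  # one feature
   y = 2.0 * X                                      # one output column
   prediction = cn.zeros_like(y)                    # initial base score of zero

   g = prediction - y    # first derivative of 0.5 * (prediction - y) ** 2
   h = cn.ones_like(y)   # second derivative is constant for squared error

   tree = lb.models.Tree(max_depth=4).fit(X, g, h)
   y_pred = tree.predict(X)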

predict(X: ndarray) ndarray#

Predict for samples in X.

Parameters:

X (array-like of shape (n_samples, n_features)) – The input samples.

Returns:

y_pred – The predicted values.

Return type:

ndarray of shape (n_samples,)

update(X: ndarray, g: ndarray, h: ndarray) Tree#

Update the model with new training data.

Parameters:
  • X – The training data to update the model with.

  • g – The first derivative of the loss function with respect to the model’s predictions.

  • h – The second derivative of the loss function with respect to the model’s predictions.

Returns:

The updated model.

Return type:

BaseModel
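
Continuing the fit sketch above, the same tree can be refreshed with gradients computed on new data; X_new, g_new and h_new are assumed to be prepared by the caller in the same way as for fit.

   # update returns the updated model, so the result can be reassigned.
   tree = tree.update(X_new, g_new, h_new)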

class legateboost.models.Linear(*, l2_regularization: float = 1e-05, alpha: Any = 'deprecated', solver: str = 'direct')#

Generalised linear model. Boosting linear models is equivalent to fitting a single linear model, where each boosting iteration is a Newton step. Note that the L2 penalty is applied to the weights of each model, as opposed to the sum of all models, which can lead to results that differ from fitting a single linear model with sklearn.

It is recommended to normalize the data before fitting. This ensures regularisation is evenly applied to all features and prevents numerical issues.

Two solvers are available: a direct numerical solver, which can be faster but uses more memory, and an iterative L-BFGS solver, which uses less memory but can be slower.

Parameters:
  • l2_regularization – An L2 penalty applied to the coefficients.

  • alpha (deprecated) – Deprecated, use l2_regularization instead.

  • solver ("direct" or "lbfgs") – If “direct”, use a direct solver. If “lbfgs”, use the lbfgs solver.

bias_#

Intercept term.

Type:

ndarray of shape (n_outputs,)

betas_#

Coefficients of the linear model.

Type:

ndarray of shape (n_features, n_outputs)
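
A usage sketch that standardises the features before fitting, as recommended above. The LBRegressor wrapper, its base_models argument and the models_ attribute used to reach the fitted coefficients are assumptions drawn from the wider legateboost API; X and y are assumed to be cupynumeric arrays.

   import legateboost as lb

   # Standardise so the L2 penalty acts evenly on every coefficient.
   X_std = (X - X.mean(axis=0)) / X.std(axis=0)

   linear = lb.models.Linear(l2_regularization=1e-4, solver="lbfgs")
   model = lb.LBRegressor(base_models=(linear,), n_estimators=20).fit(X_std, y)

   # Each boosting round keeps its own coefficients (models_ is an assumption).
   first = model.models_[0]
   coefficients, intercept = first.betas_, first.bias_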

static batch_predict(models: Sequence[BaseModel], X: ndarray) ndarray#

Predict labels for samples in X with the given list of models. This is implemented as a static method taking a list of models so that the underlying implementation can parallelise or otherwise optimise over the whole batch.

Parameters:
  • models (list of BaseModel)

  • X (array-like of shape (n_samples, n_features)) – The input samples.

Returns:

y_pred – The predicted labels.

Return type:

ndarray of shape (n_samples,)

fit(X: ndarray, g: ndarray, h: ndarray) Linear#

Fit the model to a second order Taylor expansion of the loss function.

Parameters:
  • X – The training data.

  • g – The first derivative of the loss function with respect to the predicted values.

  • h – The second derivative of the loss function with respect to the predicted values.

Returns:

The fitted model.

Return type:

BaseModel

predict(X: ndarray) ndarray#

Predict for samples in X.

Parameters:

X (array-like of shape (n_samples, n_features)) – The input samples.

Returns:

y_pred – The predicted values.

Return type:

ndarray of shape (n_samples,)

update(X: ndarray, g: ndarray, h: ndarray) Linear#

Update the model with new training data.

Parameters:
  • X – The training data to update the model with.

  • g – The first derivative of the loss function with respect to the model’s predictions.

  • h – The second derivative of the loss function with respect to the model’s predictions.

Returns:

The updated model.

Return type:

BaseModel

class legateboost.models.KRR(*, n_components: int = 100, alpha: Any = 'deprecated', l2_regularization: float = 1e-05, sigma: float | None = None, solver: str = 'direct')#

Kernel Ridge Regression model using the Nyström approximation. The accuracy of the approximation is governed by the parameter n_components <= n_samples. Effectively, n_components rows are randomly sampled (without replacement) from X in each boosting iteration.

The kernel is fixed to be the RBF kernel:

\(k(x_i, x_j) = \exp(-\frac{||x_i - x_j||^2}{2\sigma^2})\)
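
Written out in cupynumeric for reference, the kernel above can be computed as in the following sketch; this is illustrative, not the library's internal implementation.

   import cupynumeric as cn

   def rbf_kernel(X: cn.ndarray, Y: cn.ndarray, sigma: float) -> cn.ndarray:
       # Squared Euclidean distance between every row of X and every row of Y.
       sq_dist = (
           (X * X).sum(axis=1)[:, None]
           + (Y * Y).sum(axis=1)[None, :]
           - 2.0 * (X @ Y.T)
       )
       return cn.exp(-sq_dist / (2.0 * sigma ** 2))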

Standardising data is recommended.

The sigma parameter, if not given, is estimated using the method described in: Allerbo, Oskar, and Rebecka Jörnsten. “Bandwidth Selection for Gaussian Kernel Ridge Regression via Jacobian Control.” arXiv preprint arXiv:2205.11956 (2022).

See the following reference for more details on gradient boosting with kernel ridge regression: Sigrist, Fabio. “KTBoost: Combined kernel and tree boosting.” Neural Processing Letters 53.2 (2021): 1147-1160.

Parameters:
  • n_components – Number of components to use in the model.

  • l2_regularization – l2 regularization parameter on the weights.

  • alpha – Deprecated. Use l2_regularization instead.

  • sigma – Kernel bandwidth parameter. If None, use the mean squared distance.

  • solver – Solver to use for solving the linear system. Options are ‘lbfgs’ and ‘direct’.
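
A construction sketch combining kernel ridge and tree base models, in the spirit of the KTBoost reference above. The LBRegressor wrapper and its base_models argument are assumed from the wider legateboost API.

   import legateboost as lb

   krr = lb.models.KRR(n_components=64, l2_regularization=1e-4, sigma=0.5)

   # Base models can be mixed, e.g. alternating kernel and tree components.
   model = lb.LBRegressor(
       base_models=(krr, lb.models.Tree(max_depth=5)),
       n_estimators=10,
   )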

betas_#

Coefficients of the regression model.

Type:

ndarray of shape (n_train_samples, n_outputs)

X_train#

Training data used to fit the model.

Type:

ndarray of shape (n_components, n_features)

indices#

Indices of the training data used to fit the model.

Type:

ndarray of shape (n_components,)

static batch_predict(models: Sequence[BaseModel], X: ndarray) ndarray#

Predict labels for samples in X with the given list of models. This is implemented as a static method taking a list of models so that the underlying implementation can parallelise or otherwise optimise over the whole batch.

Parameters:
  • models (list of BaseModel)

  • X (array-like of shape (n_samples, n_features)) – The input samples.

Returns:

y_pred – The predicted labels.

Return type:

ndarray of shape (n_samples,)

fit(X: ndarray, g: ndarray, h: ndarray) KRR#

Fit the model to a second order Taylor expansion of the loss function.

Parameters:
  • X – The training data.

  • g – The first derivative of the loss function with respect to the predicted values.

  • h – The second derivative of the loss function with respect to the predicted values.

Returns:

The fitted model.

Return type:

BaseModel

predict(X: ndarray) ndarray#

Predict for samples in X.

Parameters:

X (array-like of shape (n_samples, n_features)) – The input samples.

Returns:

y_pred – The predicted values.

Return type:

ndarray of shape (n_samples,)

update(X: ndarray, g: ndarray, h: ndarray) KRR#

Update the model with new training data.

Parameters:
  • X – The training data to update the model with.

  • g – The first derivative of the loss function with respect to the model’s predictions.

  • h – The second derivative of the loss function with respect to the model’s predictions.

Returns:

The updated model.

Return type:

BaseModel

class legateboost.models.NN(*, max_iter: int = 100, hidden_layer_sizes: Tuple[int] = (100,), alpha: Any = 'deprecated', l2_regularization: float = 1e-05, verbose: bool = False, m: int = 10, gtol: float = 1e-05)#

A multi-layer perceptron base model.

Parameters:
  • max_iter (int, default=100) – Maximum number of lbfgs iterations.

  • hidden_layer_sizes (Tuple[int], default=(100,)) – The ith element represents the number of neurons in the ith hidden layer.

  • alpha (str, default="deprecated") – Deprecated parameter for L2 regularization. Use l2_regularization instead.

  • l2_regularization (float, default=1e-5) – L2 regularization term.

  • verbose (bool, default=False) – Whether to print progress messages to stdout.

  • m (int, default=10) – L-BFGS optimization parameter: the number of previous steps to store.

  • gtol (float, default=1e-5) – Gradient norm tolerance for L-BFGS optimization.
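
A construction sketch for a small network. As with the other base models, the LBRegressor wrapper and its base_models argument are assumed from the wider legateboost API.

   import legateboost as lb

   nn = lb.models.NN(hidden_layer_sizes=(32, 32), max_iter=50, gtol=1e-5)
   model = lb.LBRegressor(base_models=(nn,), n_estimators=10)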

static batch_predict(models: Sequence[BaseModel], X: ndarray) ndarray#

Predict labels for samples in X with the given list of models. This is implemented as a static method taking a list of models so that the underlying implementation can parallelise or otherwise optimise over the whole batch.

Parameters:
  • models (list of BaseModel)

  • X (array-like of shape (n_samples, n_features)) – The input samples.

Returns:

y_pred – The predicted labels.

Return type:

ndarray of shape (n_samples,)

fit(X: ndarray, g: ndarray, h: ndarray) Any#

Fit the model to a second order Taylor expansion of the loss function.

Parameters:
  • X – The training data.

  • g – The first derivative of the loss function with respect to the predicted values.

  • h – The second derivative of the loss function with respect to the predicted values.

Returns:

The fitted model.

Return type:

BaseModel

predict(X: ndarray) ndarray#

Predict for samples in X.

Parameters:

X (array-like of shape (n_samples, n_features)) – The input samples.

Returns:

y_pred – The predicted values.

Return type:

ndarray of shape (n_samples,)

update(X: ndarray, g: ndarray, h: ndarray) NN#

Update the model with new training data.

Parameters:
  • X – The training data to update the model with.

  • g – The first derivative of the loss function with respect to the model’s predictions.

  • h – The second derivative of the loss function with respect to the model’s predictions.

Returns:

The updated model.

Return type:

BaseModel