Implementation of a gradient boosting algorithm for regression problems.
Learns component models that iteratively minimise a loss function. A usage sketch follows the parameter list.
Parameters:
n_estimators – The number of boosting stages to perform.
objective – The loss function to optimize. Possible values are [‘squared_error’].
metric – Metric for evaluation. ‘default’ indicates that the objective function should choose
the accompanying metric. Possible values: [‘mse’] or instance of BaseMetric. Can
be a list of multiple metrics.
learning_rate – The learning rate shrinks the contribution of each model.
subsample – The fraction of samples to be used for fitting the individual base models.
init – The initial prediction of the model. If None, the initial prediction
is zero. If ‘average’, the initial prediction minimises a second-order
approximation of the loss function (simply the mean label in the case of
regression).
base_models – The base models to use for each iteration. The model used in each iteration
i is base_models[i % len(base_models)].
callbacks – List of callbacks to apply during training e.g. early stopping.
See callbacks module for more information.
verbose – Controls the verbosity when fitting and predicting.
random_state – Controls the randomness of the estimator. Pass an int for reproducible
results across multiple function calls.
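A minimal usage sketch. The package and class names (legateboost, LBRegressor) are assumptions, not guaranteed by this reference:

    import numpy as np
    import legateboost as lb  # assumed import name

    rng = np.random.RandomState(0)
    X = rng.normal(size=(200, 5))
    y = X[:, 0] + 0.1 * rng.normal(size=200)

    # Parameters mirror those documented above.
    model = lb.LBRegressor(
        n_estimators=50,
        objective="squared_error",
        learning_rate=0.1,
        subsample=0.8,
        init="average",
        random_state=0,
    ).fit(X, y)
    print(model.predict(X[:3]))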
Compute global feature attributions for the model. Global
attributions show the effect of a feature on a model’s loss function.
We use a Shapley value approach to compute the attributions:
\(Sh_i(v)=\frac{1}{|N|!} \sum_{\sigma \in \mathfrak{S}_d} \big[ v([\sigma]_{i-1} \cup\{i\}) - v([\sigma]_{i-1}) \big],\)
where \(v\) is the model’s loss function, \(N\) is the set of features, and \(\mathfrak{S}_d\) is the set of all permutations of the features.
\([\sigma]_{i-1}\) represents the set of players ranked lower than \(i\) in the ordering \(\sigma\).
In effect, the Shapley value shows the effect of adding a feature to the model, averaged over all possible orderings of the features. In our case the above function is approximated using an antithetic-sampling method [1], where n_samples corresponds to pairs of permutation samples. This method also returns the standard error, which decreases according to \(1/\sqrt{n\_samples}\).
This definition of attributions requires removing a feature from the active set. We use a random sample of values from X to fill in the missing feature values. This choice of background distribution corresponds to an ‘interventional’ Shapley value approach discussed in [2].
The method uses memory (and time) proportional to \(n\_samples \times n\_features \times n\_background\_samples\). Reduce the number of background samples or the size of X to speed up computation and reduce memory usage. X does not need to be the entire training set to get useful estimates.
See the method local_attributions() for the effect of features on individual prediction outputs.
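For intuition, a minimal NumPy sketch of the antithetic permutation-sampling estimator described above. Here v is a hypothetical callable returning the loss with a given set of features active; it is not part of this API:

    import numpy as np

    def antithetic_shapley(v, n_features, n_samples, rng):
        # Each sample draws one permutation and pairs it with its
        # reverse (the antithetic pair), reducing estimator variance.
        estimates = np.zeros((n_samples, n_features))
        for s in range(n_samples):
            sigma = rng.permutation(n_features)
            for perm in (sigma, sigma[::-1]):
                active = set()
                prev = v(active)
                for i in perm:
                    active = active | {i}
                    curr = v(active)
                    # Marginal contribution of feature i, averaged
                    # over the two permutations in the pair.
                    estimates[s, i] += 0.5 * (curr - prev)
                    prev = curr
        shapley = estimates.mean(axis=0)
        stderr = estimates.std(axis=0, ddof=1) / np.sqrt(n_samples)
        return shapley, stderr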
Parameters:
X (cn.array) – The input data.
y (cn.array) – The target values.
metric (BaseMetric, optional) – The metric to evaluate the model. If None, the model default metric is used.
random_state (int, optional) – The random state for reproducibility.
n_samples (int, optional) – The number of sample pairs to use in the antithetic sampling method.
check_efficiency (bool, optional) – If True, check that Shapley values + null coalition add up to the final loss for X, y (the so-called efficiency property of Shapley values).
Returns:
cn.array – The Shapley value estimates for each feature. The last value is the null coalition loss. The sum of this array gives the loss for X, y.
cn.array – The standard error of the Shapley value estimates, with respect to n_samples. The standard error decreases according to \(1/\sqrt{n\_samples}\).
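A hypothetical call, reusing the fitted model from the regressor sketch above; the argument values are illustrative only:

    values, se = model.global_attributions(
        X, y, n_samples=100, random_state=0, check_efficiency=True
    )
    # values[:-1] holds the per-feature attributions and values[-1]
    # the null coalition loss; their sum is the loss for (X, y).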
Local feature attributions for model predictions. Shows the effect
of a feature on each output prediction. See the definition of Shapley
values in global_attributions(); here \(v\) is the model prediction
instead of the loss function.
Parameters:
X (cn.array) – The input data.
X_background (cn.array) – The background data to use for missing feature values. This could be a random sample of training data (e.g. 10 to 100 instances).
random_state (int, optional) – The random state for reproducibility.
n_samples (int) – The number of sample pairs to use in the antithetic sampling method.
check_efficiency (bool) – If True, check that Shapley values + null prediction add up to the final predictions for X (the so-called efficiency property of Shapley values).
Returns:
cn.array – The Shapley value estimates for each feature. The final value is the ‘null prediction’, where all features are turned off. The sum of this array gives the model prediction.
cn.array – The standard error of the Shapley value estimates, with respect to n_samples. The standard error decreases according to \(1/\sqrt{n\_samples}\).
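A hypothetical call; the background set is a small random subset of the training data, as suggested above:

    background = X[rng.choice(len(X), size=50, replace=False)]
    contribs, se = model.local_attributions(X[:10], background, n_samples=100)
    # Summing contribs over its feature axis, including the final
    # 'null prediction' entry, recovers model.predict(X[:10]).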
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as
\((1 - \frac{u}{v})\), where \(u\) is the residual
sum of squares ((y_true-y_pred)**2).sum() and \(v\)
is the total sum of squares ((y_true-y_true.mean())**2).sum().
The best possible score is 1.0 and it can be negative (because the
model can be arbitrarily worse). A constant model that always predicts
the expected value of y, disregarding the input features, would get
an \(R^2\) score of 0.0.
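The definition can be checked directly with a short NumPy sketch:

    import numpy as np

    y_true = np.array([3.0, -0.5, 2.0, 7.0])
    y_pred = np.array([2.5, 0.0, 2.0, 8.0])

    u = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
    v = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
    r2 = 1.0 - u / v                           # ~0.949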
Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed
kernel matrix or a list of generic objects instead with shape
(n_samples,n_samples_fitted), where n_samples_fitted
is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
Returns:
score – \(R^2\) of self.predict(X) w.r.t. y.
Return type:
float
Notes
The \(R^2\) score used when calling score on a regressor uses
multioutput='uniform_average' from version 0.23 to keep consistent
with default value of r2_score().
This influences the score method of all the multioutput
regressors (except for
MultiOutputRegressor).
Request metadata passed to the fit method.
Note that this method is only relevant if
enable_metadata_routing=True (see sklearn.set_config()).
Please see the User Guide on how the routing
mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the
existing request. This allows you to change the request for some
parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a
sub-estimator of a meta-estimator, e.g. used inside a
Pipeline. Otherwise it has no effect.
Parameters:
eval_result (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_result parameter in fit.
eval_set (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_set parameter in fit.
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
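A sketch of the routing pattern this enables, following the standard scikit-learn recipe; w is a hypothetical weight array:

    import sklearn
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    sklearn.set_config(enable_metadata_routing=True)
    pipe = make_pipeline(
        StandardScaler().set_fit_request(sample_weight=False),
        model.set_fit_request(sample_weight=True),
    )
    # sample_weight is routed only to the final estimator's fit.
    pipe.fit(X, y, sample_weight=w)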
The method works on simple estimators as well as on nested objects
(such as Pipeline). The latter have
parameters of the form <component>__<parameter> so that it’s
possible to update each component of a nested object.
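For example, updating a nested parameter of the pipeline built above (step names are derived from class names by make_pipeline; lbregressor is hypothetical):

    pipe.set_params(lbregressor__learning_rate=0.05)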
Request metadata passed to the partial_fit method.
Note that this method is only relevant if
enable_metadata_routing=True (see sklearn.set_config()).
Please see the User Guide on how the routing
mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to partial_fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the
existing request. This allows you to change the request for some
parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a
sub-estimator of a meta-estimator, e.g. used inside a
Pipeline. Otherwise it has no effect.
Parameters:
eval_result (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_result parameter in partial_fit.
eval_set (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_set parameter in partial_fit.
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in partial_fit.
Request metadata passed to the score method.
Note that this method is only relevant if
enable_metadata_routing=True (see sklearn.set_config()).
Please see the User Guide on how the routing
mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the
existing request. This allows you to change the request for some
parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a
sub-estimator of a meta-estimator, e.g. used inside a
Pipeline. Otherwise it has no effect.
Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Update a gradient boosting model from the training set (X, y). This
method does not add any new models to the ensemble, only updates
existing models to fit the new data.
Parameters:
X – The training input samples.
y – The target values as floating point numbers.
sample_weight – Sample weights. If None, then samples are equally weighted.
eval_set – A list of (X, y) or (X, y, w) tuples.
The metric will be evaluated on each tuple.
eval_result – Dictionary that is populated with evaluation results on training completion.
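A hypothetical usage; X_new, y_new and the validation pair are placeholder names:

    eval_result = {}
    model.update(
        X_new, y_new,
        eval_set=[(X_valid, y_valid)],
        eval_result=eval_result,
    )
    # The ensemble size is unchanged; existing models are refit
    # to the new data, and eval_result now holds the metric values.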
Implements a gradient boosting algorithm for classification problems. A usage sketch follows the parameter list.
Parameters:
n_estimators – The number of boosting stages to perform.
objective – The loss function to be optimized. Possible values: [‘log_loss’, ‘exp’]
or instance of BaseObjective.
metric – Metric for evaluation. ‘default’ indicates that the objective function should
choose the accompanying metric. Possible values: [‘log_loss’, ‘exp’] or
instance of BaseMetric. Can be a list of multiple metrics.
learning_rate – The learning rate shrinks the contribution of each model.
subsample – The fraction of samples to be used for fitting the individual base models.
init – The initial prediction of the model. If None, the initial prediction
is zero. If ‘average’, the initial prediction minimises a second-order
approximation of the loss function.
base_models – The base models to use for each iteration. The model used in each iteration
i is base_models[i % len(base_models)].
callbacks – List of callbacks to apply during training e.g. early stopping.
See callbacks module for more information.
verbose – Controls the verbosity of the boosting process.
random_state – Controls the randomness of the estimator. Pass an int for reproducible output
across multiple function calls.
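A minimal usage sketch; as with the regressor, the class name LBClassifier and the import are assumptions:

    import numpy as np
    import legateboost as lb  # assumed import name

    rng = np.random.RandomState(0)
    X = rng.normal(size=(200, 5))
    y_labels = (X[:, 0] > 0).astype(int)  # toy binary target

    clf = lb.LBClassifier(
        n_estimators=50,
        objective="log_loss",
        learning_rate=0.1,
        random_state=0,
    ).fit(X, y_labels)
    print(clf.predict(X[:3]))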
Compute global feature attributions for the model. Global
attributions show the effect of a feature on a model’s loss function.
We use a Shapley value approach to compute the attributions:
\(Sh_i(v)=\frac{1}{|N|!} \sum_{\sigma \in \mathfrak{S}_d} \big[ v([\sigma]_{i-1} \cup\{i\}) - v([\sigma]_{i-1}) \big],\)
where \(v\) is the model’s loss function, \(N\) is the set of features, and \(\mathfrak{S}_d\) is the set of all permutations of the features.
\([\sigma]_{i-1}\) represents the set of players ranked lower than \(i\) in the ordering \(\sigma\).
In effect, the Shapley value shows the effect of adding a feature to the model, averaged over all possible orderings of the features. In our case the above function is approximated using an antithetic-sampling method [3], where n_samples corresponds to pairs of permutation samples. This method also returns the standard error, which decreases according to \(1/\sqrt{n\_samples}\).
This definition of attributions requires removing a feature from the active set. We use a random sample of values from X to fill in the missing feature values. This choice of background distribution corresponds to an ‘interventional’ Shapley value approach discussed in [4].
The method uses memory (and time) proportional to \(n\_samples \times n\_features \times n\_background\_samples\). Reduce the number of background samples or the size of X to speed up computation and reduce memory usage. X does not need to be the entire training set to get useful estimates.
See the method local_attributions() for the effect of features on individual prediction outputs.
Parameters:
X (cn.array) – The input data.
y (cn.array) – The target values.
metric (BaseMetric, optional) – The metric to evaluate the model. If None, the model default metric is used.
random_state (int, optional) – The random state for reproducibility.
n_samples (int, optional) – The number of sample pairs to use in the antithetic sampling method.
check_efficiency (bool, optional) – If True, check that Shapley values + null coalition add up to the final loss for X, y (the so-called efficiency property of Shapley values).
Returns:
cn.array – The Shapley value estimates for each feature. The last value is the null coalition loss. The sum of this array gives the loss for X, y.
cn.array – The standard error of the Shapley value estimates, with respect to n_samples. The standard error decreases according to \(1/\sqrt{n\_samples}\).
Local feature attributions for model predictions. Shows the effect
of a feature on each output prediction. See the definition of Shapley
values in global_attributions(); here \(v\) is the model prediction
instead of the loss function.
Parameters:
X (cn.array) – The input data.
X_background (cn.array) – The background data to use for missing feature values. This could be a random sample of training data (e.g. 10 to 100 instances).
random_state (int, optional) – The random state for reproducibility.
n_samples (int) – The number of sample pairs to use in the antithetic sampling method.
check_efficiency (bool) – If True, check that Shapley values + null prediction add up to the final predictions for X (the so-called efficiency property of Shapley values).
Returns:
cn.array – The Shapley value estimates for each feature. The final value is the ‘null prediction’, where all features are turned off. The sum of this array gives the model prediction.
cn.array – The standard error of the Shapley value estimates, with respect to n_samples. The standard error decreases according to \(1/\sqrt{n\_samples}\).
Incrementally fit the model on a batch of samples. Requires the
classes to be provided up front, as they may not be inferred from
the first batch.
Parameters:
X – The training input samples.
y – The target values.
classes – The unique labels of the target. Must be provided at the first call.
sample_weight – Weights applied to individual samples (1D array). If None, then
samples are equally weighted.
eval_set – A list of (X, y) or (X, y, w) tuples.
The metric will be evaluated on each tuple.
eval_result – Dictionary that is populated with evaluation results on training completion.
Returns:
Returns self.
Return type:
self
Raises:
ValueError – If the provided classes are not whole numbers, or
if the provided classes do not match those of a previous fit.
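A hypothetical incremental-training loop; batches is a placeholder iterable of (X, y) pairs:

    import numpy as np

    classes = np.array([0, 1, 2])
    for X_batch, y_batch in batches:
        # classes must be supplied at least on the first call, since
        # the first batch may not contain every label.
        clf.partial_fit(X_batch, y_batch, classes=classes)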
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy
which is a harsh metric since you require for each sample that
each label set be correctly predicted.
Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True labels for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
Returns:
score – Mean accuracy of self.predict(X) w.r.t. y.
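For single-output classification this is equivalent to comparing predictions directly; X_test and y_test are placeholder names:

    acc = clf.score(X_test, y_test)
    assert acc == (clf.predict(X_test) == y_test).mean()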
Request metadata passed to the fit method.
Note that this method is only relevant if
enable_metadata_routing=True (see sklearn.set_config()).
Please see the User Guide on how the routing
mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the
existing request. This allows you to change the request for some
parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a
sub-estimator of a meta-estimator, e.g. used inside a
Pipeline. Otherwise it has no effect.
Parameters:
eval_result (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_result parameter in fit.
eval_set (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_set parameter in fit.
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
The method works on simple estimators as well as on nested objects
(such as Pipeline). The latter have
parameters of the form <component>__<parameter> so that it’s
possible to update each component of a nested object.
Request metadata passed to the partial_fit method.
Note that this method is only relevant if
enable_metadata_routing=True (see sklearn.set_config()).
Please see the User Guide on how the routing
mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to partial_fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the
existing request. This allows you to change the request for some
parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a
sub-estimator of a meta-estimator, e.g. used inside a
Pipeline. Otherwise it has no effect.
Parameters:
classes (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for classes parameter in partial_fit.
eval_result (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_result parameter in partial_fit.
eval_set (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_set parameter in partial_fit.
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in partial_fit.
Request metadata passed to the score method.
Note that this method is only relevant if
enable_metadata_routing=True (see sklearn.set_config()).
Please see the User Guide on how the routing
mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the
existing request. This allows you to change the request for some
parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a
sub-estimator of a meta-estimator, e.g. used inside a
Pipeline. Otherwise it has no effect.
Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Update a gradient boosting model from the training set (X, y). This
method does not add any new models to the ensemble, only updates
existing models to fit the new data.
Parameters:
X – The training input samples.
y – The target values (class labels) as integers or as floating point numbers.
sample_weight – Sample weights. If None, then samples are equally weighted.
eval_set – A list of (X, y) or (X, y, w) tuples.
The metric will be evaluated on each tuple.
eval_result – Dictionary that is populated with evaluation results on training completion.