legate-boost#
A gradient boosting machine (GBM) implementation on Legate. The primary goal of legate-boost is to provide a state-of-the-art distributed GBM implementation on Legate, capable of running on CPUs or GPUs at supercomputer scale.
Developers: see contributing.
Installation#
Install using conda.
# stable release
conda install -c legate -c conda-forge -c nvidia legate-boost
# nightly release
conda install -c legate/label/experimental -c legate -c conda-forge -c nvidia legate-boost
On systems without a GPU, the CPU-only package should automatically be installed. On systems with a GPU and compatible CUDA version, the GPU package should automatically be installed.
To force conda to prefer one, pass the build strings *_cpu* or *_gpu*, for example:
# nightly release (CPU-only); --dry-run only previews the result, drop it to actually install
conda install --dry-run -c legate/label/experimental -c legate -c conda-forge -c nvidia \
'legate-boost=*=*_cpu*'
For more details on building from source and setting up a development environment, see contributing.md.
Simple example#
Save the script below as example_script.py and run it with the legate launcher:
legate example_script.py
import cupynumeric as cn
import legateboost as lb
X = cn.random.random((1000, 10))
y = cn.random.random(X.shape[0])
model = lb.LBRegressor(verbose=1, n_estimators=100, random_state=0, max_depth=2).fit(
X, y
)
Features#
Model ensembling#
legate-boost can create models from linear combinations of other models. Ensembling is as easy as:
model_a = lb.LBClassifier().fit(X_train_a, y_train_a)
model_b = lb.LBClassifier().fit(X_train_b, y_train_b)
model_c = (model_a + model_b) * 0.5
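To make "linear combination of models" concrete, here is a minimal, self-contained sketch in plain Python. `ToyAdditiveModel` is a hypothetical stand-in and not the legate-boost implementation: it assumes a boosted model can be viewed as a weighted sum of base-learner functions, so adding two models concatenates their terms and multiplying by a scalar rescales the weights.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


# Hypothetical toy class for illustration only; not part of legate-boost.
@dataclass
class ToyAdditiveModel:
    # Each term is (weight, function); the prediction is the weighted sum.
    terms: List[Tuple[float, Callable[[float], float]]] = field(default_factory=list)

    def predict(self, x: float) -> float:
        return sum(w * f(x) for w, f in self.terms)

    def __add__(self, other: "ToyAdditiveModel") -> "ToyAdditiveModel":
        # Summing two additive models concatenates their terms.
        return ToyAdditiveModel(self.terms + other.terms)

    def __mul__(self, scalar: float) -> "ToyAdditiveModel":
        # Scaling a model rescales every term weight.
        return ToyAdditiveModel([(w * scalar, f) for w, f in self.terms])


toy_a = ToyAdditiveModel([(1.0, lambda x: 2.0 * x)])
toy_b = ToyAdditiveModel([(1.0, lambda x: x + 4.0)])
toy_c = (toy_a + toy_b) * 0.5

# The combined prediction is the average of the two individual predictions.
print(toy_a.predict(2.0), toy_b.predict(2.0), toy_c.predict(2.0))
```

In this toy setting, `(toy_a + toy_b) * 0.5` predicts the mean of the two models' outputs, which is the same averaging pattern as `model_c` above.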
Probabilistic regression#
legate-boost can learn distributions for continuous data. This is useful in cases where simply predicting the mean does not carry enough information about the training data. A full example can be found here: examples/probabilistic_regression.
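The following sketch illustrates why a mean alone can be insufficient. It uses only the Python standard library and assumes a Gaussian predictive distribution, which is an assumption made for illustration rather than a statement about which distributions legate-boost fits. The data values and the helper `gaussian_nll` are made up for this example.

```python
import math
from statistics import fmean, pvariance


def gaussian_nll(y, mean, var):
    """Average negative log-likelihood of the values y under N(mean, var)."""
    return fmean(
        [0.5 * (math.log(2.0 * math.pi * var) + (yi - mean) ** 2 / var) for yi in y]
    )


# Two made-up target samples with the same mean but very different spread.
narrow = [4.9, 5.0, 5.1]
wide = [1.0, 5.0, 9.0]

for name, sample in (("narrow", narrow), ("wide", wide)):
    mean, var = fmean(sample), pvariance(sample)
    print(name, "mean:", round(mean, 3), "variance:", round(var, 3))

# A mean-only prediction cannot tell the samples apart; a (mean, variance)
# prediction can. Scoring the wide sample with the narrow variance is heavily
# penalised by the likelihood.
nll_fitted = gaussian_nll(wide, fmean(wide), pvariance(wide))
nll_too_narrow = gaussian_nll(wide, fmean(wide), pvariance(narrow))
print(nll_fitted < nll_too_narrow)
```

Both samples share the same mean, so only the variance term distinguishes them. A model that predicts distribution parameters can capture that difference.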
Batch training#
legate-boost can train on datasets that do not fit into memory by splitting the dataset into batches and training the model with partial_fit.
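total_estimators = 100
estimators_per_batch = 10  # assumed value; not specified in the original snippet
# train_batches is assumed to be a sequence of (X, y) pairs, with n_batches = len(train_batches)
model = lb.LBRegressor(n_estimators=estimators_per_batch)
for i in range(total_estimators // estimators_per_batch):
    # cycle through the batches; each partial_fit call trains on one batch
    X_batch, y_batch = train_batches[i % n_batches]
    model.partial_fit(
        X_batch,
        y_batch,
    )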
The above example can be found here: examples/batch_training.
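The snippet above relies on `train_batches` and `n_batches` without showing how they are built. As a rough sketch, the batches could be prepared and scheduled as follows. The helper `make_batches`, the toy data, and the value of `estimators_per_batch` are assumptions for illustration. In a real out-of-memory setting each batch would be loaded from storage rather than sliced from an in-memory list.

```python
def make_batches(X, y, n_batches):
    """Split rows into at most n_batches contiguous (X_batch, y_batch) pairs."""
    size = -(-len(X) // n_batches)  # ceiling division
    return [(X[i:i + size], y[i:i + size]) for i in range(0, len(X), size)]


# Toy in-memory data standing in for a dataset too large to load at once.
X = [[float(i)] for i in range(10)]
y = [2.0 * i for i in range(10)]

train_batches = make_batches(X, y, 3)
n_batches = len(train_batches)

total_estimators = 100
estimators_per_batch = 10  # assumed value for illustration

# Index of the batch that each training round would use (round-robin).
schedule = [i % n_batches for i in range(total_estimators // estimators_per_batch)]
print([len(Xb) for Xb, _ in train_batches])
print(schedule)
```

With ten rounds and three batches, the schedule wraps around so every batch is visited several times. That wrap-around is what the `i % n_batches` indexing achieves.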
Different model types#
legate-boost supports tree models, linear models, kernel ridge regression models, custom user models, and any combination of these.
The following example shows a model combining linear and decision tree base learners on a synthetic dataset.
model = lb.LBRegressor(base_models=(lb.models.Linear(), lb.models.Tree(max_depth=1),), **params).fit(X, y)
The second example shows a model combining kernel ridge regression and decision tree base learners on the wine quality dataset.
model = lb.LBRegressor(base_models=(lb.models.KRR(sigma=0.5), lb.models.Tree(max_depth=5),), **params).fit(X, y)
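To give some intuition for why mixing base learners can help, here is a small standard-library sketch of gradient boosting for squared error on one-dimensional data. It is not legate-boost code: the functions `fit_linear`, `fit_stump`, and `boost` are hypothetical, and the round-robin choice of learner per boosting round is an assumption made for illustration that may differ from how legate-boost schedules its `base_models`.

```python
from statistics import fmean


def fit_linear(x, r):
    """Least-squares line through (x, r); returns a predictor."""
    mx, mr = fmean(x), fmean(r)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (ri - mr) for xi, ri in zip(x, r)) / sxx if sxx else 0.0
    return lambda v: mr + slope * (v - mx)


def fit_stump(x, r):
    """Depth-1 tree: best single threshold split by squared error."""
    best = None
    for t in sorted(set(x))[1:]:
        left = [ri for xi, ri in zip(x, r) if xi < t]
        right = [ri for xi, ri in zip(x, r) if xi >= t]
        lm, rm = fmean(left), fmean(right)
        err = sum((ri - lm) ** 2 for ri in left) + sum((ri - rm) ** 2 for ri in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # all x identical: fall back to a constant
        c = fmean(r)
        return lambda v: c
    _, t, lm, rm = best
    return lambda v: lm if v < t else rm


def boost(x, y, learners, n_rounds, learning_rate=0.5):
    """Boosting for squared error, cycling through `learners` round-robin."""
    models = []
    pred = [0.0] * len(x)
    for i in range(n_rounds):
        residual = [yi - pi for yi, pi in zip(y, pred)]
        f = learners[i % len(learners)](x, residual)
        models.append(f)
        pred = [pi + learning_rate * f(xi) for pi, xi in zip(pred, x)]
    return lambda v: sum(learning_rate * f(v) for f in models)


def mse(model, x, y):
    return fmean([(model(xi) - yi) ** 2 for xi, yi in zip(x, y)])


# Synthetic data: a linear trend plus a step, so each learner type helps.
x = [i / 10 for i in range(40)]
y = [1.5 * xi + (3.0 if xi >= 2.0 else 0.0) for xi in x]

linear_only = boost(x, y, [fit_linear], n_rounds=20)
mixed = boost(x, y, [fit_linear, fit_stump], n_rounds=20)
print(mse(linear_only, x, y), mse(mixed, x, y))
```

On this synthetic data the linear learner captures the trend and the stump captures the step, so the mixed ensemble reaches a lower training error than the linear-only one in the same number of rounds.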