Contributing to legate-boost#

legate-boost depends on some libraries that are not easily installable with pip.

Use conda to create a development environment that includes them.

# CUDA 12.2
conda env create \
    --name legate-boost-dev \
    -f ./conda/environments/all_cuda-122.yaml

source activate legate-boost-dev

The easiest way to develop is to compile the shared library separately, then build and install an editable wheel that uses it.

./build.sh legate-boost --editable

Running tests#

CPU:

ci/run_pytests_cpu.sh

GPU:

ci/run_pytests_gpu.sh

Add new tests#

Test cases should go in legateboost/test.

Utility code re-used by multiple tests should be added in legateboost/testing.

Change default CUDA architectures#

By default, builds use CMAKE_CUDA_ARCHITECTURES=native (the architecture of whatever GPU is present on the system where the build runs).

If installing with pip, set the CUDAARCHS environment variable, as described in the CMake documentation for that variable.

CUDAARCHS="70;80" \
    pip install --no-build-isolation --no-deps .

For CMake-based builds, pass CMAKE_CUDA_ARCHITECTURES.

cmake -B build -S . -DCMAKE_CUDA_ARCHITECTURES="70;80"
cmake --build build -j

Pre-commit hooks#

The pre-commit package is used for linting, formatting and type checks. This project uses strict mypy type checking.

Install pre-commit.

pip install pre-commit

Run all checks manually.

pre-commit run --all-files

Change the project version#

The VERSION file at the root of the repo is the single source for legate-boost’s version. Modify that file to change the version for wheels, conda packages, the CMake project, etc.
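
As a minimal sketch of how other tooling could consume that single source, assuming the file holds one version string on a single line. The read_version helper is hypothetical and not part of legate-boost.

```python
from pathlib import Path


def read_version(repo_root: Path) -> str:
    """Return the version string stored in <repo_root>/VERSION.

    Hypothetical helper: assumes the file contains a single line of text.
    """
    return (repo_root / "VERSION").read_text(encoding="utf-8").strip()
```

Calling read_version(Path(".")) from the root of the repo would then return whatever string the VERSION file currently contains.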

Work with the conda packages#

Run the commands in this section in a container using the same base image as CI.

# NOTE: remove '--gpus' to test the CPU-only version
docker run \
  --rm \
  --gpus 1 \
  -v $(pwd):/opt/legate-boost \
  -w /opt/legate-boost \
  -it rapidsai/ci-conda:cuda12.5.1-ubuntu22.04-py3.11 \
  bash

Build conda packages locally#

Before doing this, remove any leftover build artifacts. The command below deletes all files and directories that git ignores.

git clean -d -f -X

Build the packages.

CMAKE_GENERATOR=Ninja \
CONDA_OVERRIDE_CUDA="${RAPIDS_CUDA_VERSION}" \
rapids-conda-retry mambabuild \
    --channel legate \
    --channel conda-forge \
    --channel nvidia \
    --no-force-upload \
    conda/recipes/legate-boost

Download conda package created in CI#

Packages built in CI are hosted on the GitHub Artifact Store.

To start, authenticate with the GitHub CLI. By default, this requires interactively entering a code in a browser window. To avoid that, set the GH_TOKEN environment variable, as described in the GitHub docs.

# authenticate with the GitHub CLI
# (can skip this by providing GH_TOKEN environment variable)
gh auth login

Next, select a CI run whose artifacts you want to test. The run IDs can be found in the URLs at https://github.com/rapidsai/legate-boost/actions/workflows/github-actions.yml. For example, given a URL like

https://github.com/rapidsai/legate-boost/actions/runs/10566116913

the run ID is 10566116913.

# choose a specific CI run ID
RUN_ID=10566116913

It’s possible to omit the run ID and have these commands download the most recently produced artifact instead. For details, see the GitHub docs.

Download the packages. This will download and unpack a single artifact which contains all of the conda packages built for a particular combination of CUDA version, CPU architecture, and Python version.

gh run download \
    --dir "${RAPIDS_CONDA_BLD_OUTPUT_DIR}" \
    --repo rapidsai/legate-boost \
    --name "legate-boost-conda-cuda${RAPIDS_CUDA_VERSION}-amd64-py${PYTHON_VERSION}" \
    "${RUN_ID}"
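
The steps above (take the run ID from a run URL, build the artifact name, call gh run download) can be sketched in Python as follows. This is only an illustration: run_id_from_url and download_command are hypothetical helpers, and the version strings passed in are placeholders rather than values guaranteed to match a real artifact. The function only builds the argument list and does not run it.

```python
import re


def run_id_from_url(url: str) -> str:
    """Extract the numeric run ID from a GitHub Actions run URL."""
    match = re.search(r"/actions/runs/(\d+)", url)
    if match is None:
        raise ValueError(f"no run ID found in {url!r}")
    return match.group(1)


def download_command(
    url: str, out_dir: str, cuda_version: str, python_version: str
) -> list[str]:
    """Build the 'gh run download' argument list shown above (hypothetical helper)."""
    # Artifact name pattern copied from the shell command above.
    name = f"legate-boost-conda-cuda{cuda_version}-amd64-py{python_version}"
    return [
        "gh", "run", "download",
        "--dir", out_dir,
        "--repo", "rapidsai/legate-boost",
        "--name", name,
        run_id_from_url(url),
    ]
```

The resulting list could be passed to subprocess.run(..., check=True) once the GitHub CLI is authenticated.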

Work with conda packages locally#

After using either of the above approaches, use the tips in this section to work with those local conda packages.

The RAPIDS_CONDA_BLD_OUTPUT_DIR environment variable points to a directory that holds the packages and all the data needed to use it as a full conda channel.

# list the package contents
cph list \
    "$(echo ${RAPIDS_CONDA_BLD_OUTPUT_DIR}/linux-64/legate-boost-*_gpu.tar.bz2)"

# check that the dependency metadata is correct
conda search \
    --override-channels \
    --channel "${RAPIDS_CONDA_BLD_OUTPUT_DIR}" \
    --info \
        legate-boost

# create an environment with the package installed
conda create \
    --name legate-boost-test \
    -c legate \
    -c conda-forge \
    -c "${RAPIDS_CONDA_BLD_OUTPUT_DIR}" \
        legate-boost

Development principles#

The following general principles should be followed when developing legate-boost.

Coding style#

  • Strive for simple and clear design, appropriate for a reference implementation.

  • Algorithm accuracy and reliability are more important than speed.

    • E.g. do not replace double-precision floats with single precision in order to achieve small constant-factor speedups.

    • Do not be afraid to use 64-bit integers for indexing if it means avoiding any possible overflow issues.

  • Avoid optimisation where possible in favour of a clear implementation.

  • Favour cunumeric implementations where appropriate, e.g. for elementwise or matrix operations.

  • Use mypy type annotations if at all possible. The typing can be checked by running the following command under the project root:

ci/run_mypy.sh
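
To illustrate the points about precision and index width, here is a small standard-library sketch. It is not legate-boost code: single precision is emulated by rounding through struct after every addition, and the helper names are hypothetical. It compares the accumulated error of a long sum in both precisions and shows what a large flat index looks like once it wraps around a signed 32-bit integer.

```python
import struct


def to_float32(x: float) -> float:
    """Round a double-precision Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def repeated_sum(value: float, count: int, single_precision: bool) -> float:
    """Add `value` to an accumulator `count` times.

    With single_precision=True the accumulator is rounded to float32
    after every addition, emulating a single-precision running sum.
    """
    step = to_float32(value) if single_precision else value
    total = 0.0
    for _ in range(count):
        total = total + step
        if single_precision:
            total = to_float32(total)
    return total


def wrap_int32(i: int) -> int:
    """Reinterpret the low 32 bits of `i` as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", i & 0xFFFFFFFF))[0]


exact = 10_000.0
err64 = abs(repeated_sum(0.1, 100_000, single_precision=False) - exact)
err32 = abs(repeated_sum(0.1, 100_000, single_precision=True) - exact)
print(f"float64 error: {err64:.3g}, float32 error: {err32:.3g}")

# Flat index of the last element of a 50,000 x 50,000 matrix:
# it exceeds the signed 32-bit maximum of 2,147,483,647.
flat_index = 50_000 * 50_000 - 1
print(flat_index, wrap_int32(flat_index))
```

The single-precision error is much larger than the double-precision one, and the wrapped index is negative, which is the kind of silent failure the guidelines above aim to rule out.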

Performance#

  • Memory usage is more often a limiting factor than computation time in large distributed training runs. E.g. a proposal that improves runtime by 2x but increases memory usage by 1.5x is likely to be rejected.

  • legate-boost should support CPUs and GPUs as first-class citizens.

  • legate-boost will strive for acceptable to good performance on a single machine and state-of-the-art performance in a distributed setting.

  • Accepting performance improvements will depend on how maintainable the changes are versus the improvement for a single machine and distributed setting, with a heavier weighting towards the distributed setting.

  • In deciding what level of performance optimisation is appropriate, see the performance guidelines below:

    • legate-boost should be expected to run faster than equivalent Python-based implementations on a single machine, e.g. scikit-learn.

    • legate-boost should not be expected to run faster than highly optimised native implementations on a single machine, e.g. LightGBM/XGBoost.

    • legate-boost should compete with the above implementations in a distributed setting.

Testing#

  • High-level interfaces (e.g. estimators) should be tested using property-based testing (e.g. the hypothesis library in Python). These tests will automatically test a wide range of inputs.

  • Test run times should be optimised. Minimise the number of boosting rounds or dataset size required to achieve a test result. Cache datasets, preprocessing or other commonly used functionality.

  • Enable sanitisers in CI to check for various C++ errors.
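
The first two points can be sketched with the standard library alone. This is a hand-rolled stand-in for the hypothesis library, not an example of its API, and everything in it is hypothetical: make_dataset caches a tiny synthetic dataset so repeated tests reuse it, fit_constant is a toy model standing in for an estimator, and the check asserts a property that should hold for any dataset rather than a single hard-coded result.

```python
import functools
import random


@functools.lru_cache(maxsize=None)
def make_dataset(
    n_rows: int, seed: int
) -> tuple[tuple[float, ...], tuple[float, ...]]:
    """Build (and cache) a tiny synthetic regression dataset: y = 2x + noise."""
    rng = random.Random(seed)
    x = tuple(rng.uniform(-1.0, 1.0) for _ in range(n_rows))
    y = tuple(2.0 * xi + rng.gauss(0.0, 0.1) for xi in x)
    return x, y


def fit_constant(y: tuple[float, ...]) -> float:
    """Toy stand-in for an estimator: always predicts the mean target."""
    return sum(y) / len(y)


def check_prediction_within_target_range(n_cases: int = 50, seed: int = 0) -> None:
    """Property: the prediction lies between the smallest and largest target."""
    rng = random.Random(seed)
    for _ in range(n_cases):
        n_rows = rng.randint(1, 20)    # keep datasets small so tests stay fast
        data_seed = rng.randint(0, 4)  # few seeds, so the cache gets reused
        _, y = make_dataset(n_rows, data_seed)
        prediction = fit_constant(y)
        # Small tolerance for floating-point rounding in the mean.
        assert min(y) - 1e-9 <= prediction <= max(y) + 1e-9


check_prediction_within_target_range()
```

With hypothesis, the random choice of n_rows and data_seed would instead come from strategies passed to its given decorator, which also shrinks failing inputs to a minimal example.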

Supported platforms#

  • Platform support:

    • legate-boost will support the same platforms as the legate ecosystem.

    • legate-boost will also support installation via conda or pip, following the legate ecosystem.

  • Installation should be as simple as possible, e.g. pip install legate-boost or conda install legate-boost.

  • Dependency minimisation will facilitate the above.

Data science considerations#

TODO: review by experts

  • Minimise the number of hyperparameters. e.g. XGBoost supports 3 different types of column sampling, where columns are sampled for each tree, each layer or each node. Are the differences between these methods statistically significant? Do we only need one?

  • Some functionality (e.g. preprocessing data) can be deferred to other libraries, although direct implementation can sometimes significantly improve usability or performance (e.g. cross validation).

  • Categorical support is important.

  • Support for sparse data is important. Applications such as NLP involve very sparse data.
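
To make the column sampling example above concrete, the following sketch shows the three levels applied cumulatively, in the way XGBoost's colsample_bytree, colsample_bylevel and colsample_bynode parameters combine. It is an illustration only, not legate-boost or XGBoost code: sample_columns is a hypothetical helper, and rounding the sample size down is an assumption.

```python
import random


def sample_columns(
    columns: list[int], fraction: float, rng: random.Random
) -> list[int]:
    """Sample a fraction of `columns` without replacement (at least one column)."""
    k = max(1, int(len(columns) * fraction))
    return sorted(rng.sample(columns, k))


rng = random.Random(0)
all_columns = list(range(64))

tree_columns = sample_columns(all_columns, 0.5, rng)    # drawn once per tree
level_columns = sample_columns(tree_columns, 0.5, rng)  # drawn once per depth level
node_columns = sample_columns(level_columns, 0.5, rng)  # drawn once per split

print(len(tree_columns), len(level_columns), len(node_columns))  # 32 16 8
```

Each level draws from the set chosen by the level above, so three hyperparameters interact to decide how many columns a single split can see. That interaction is part of the cost weighed by the question of whether one sampling method would suffice.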

Non-goals#

  • Federated learning or privacy preserving machine learning. The literature is not advanced enough to indicate what the best approach is here.

  • External memory.

    • legate-boost will defer data management to legate/legion.

    • legate-boost will not implement its own external memory algorithms, unless the functionality is already implemented in legate.