Developer Documentation
=======================

Code Organization
-----------------

This section describes the basic code organization.
Currently, the repo is essentially flat.
All implementations are directly under the ``cudf/`` directory.
All tests are in the ``cudf/tests/`` directory.

Here's a quick map showing which file contains which feature:

- ``DataFrame``:

  - ````

- ``Series``:

  - ````

- ``Column`` and its subclasses:

  - ````
  - ````
  - ```` for numeric columns
  - ```` for categorical columns

- ``Buffer``:

  - ````

- ``.apply()`` and similar functions:

  - ````

- ``.query()`` and similar functions:

  - ````

- GPU helper functions:

  - ````

- Docstring helpers:

  - ````

- Output formatting:

  - ````

- Arrow:

  - ````

- Groupby:

  - ````

- Dask serialization helpers:

  - ````

- ``Index``:

  - ````

- Operations on multiple DataFrames, Series or Indices:

  - ````

- Other general helper functions:

  - ````

Code that should move to libgdf
-------------------------------

The following code should be re-implemented in libgdf in CUDA-C for better
reusability and performance:

- ``cudf/`` contains a lot of GPU helper functions that are jitted by
  numba with ``@cuda.jit`` into CUDA kernels. All CUDA kernels in this file
  should be moved to libgdf if possible.
- Some logic in ``cudf/`` should be moved to libgdf to make groupby
  operations faster. Some groupby aggregations are implemented with
  ``@cuda.jit`` here.

Code that cannot move to libgdf
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some features require the JIT to be useful, e.g. features that use
user-defined functions. These features cannot be moved to libgdf.
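
To make the distinction concrete, here is a minimal plain-Python sketch of why
features built on user-defined functions depend on the JIT. It is an
illustration only: it uses neither numba nor the GPU, and the names
``fixed_sum`` and ``apply_udf`` are hypothetical stand-ins, not cudf or libgdf
functions. The first function has a body that is fully known ahead of time, so
a precompiled library could ship it. The second receives its per-element logic
as an argument, so there is nothing to compile until the user supplies it.

```python
from typing import Callable, List


def fixed_sum(values: List[float]) -> float:
    """Fixed operation: the loop body is known ahead of time, so a
    compiled library could provide it (hypothetical stand-in)."""
    total = 0.0
    for v in values:
        total += v
    return total


def apply_udf(values: List[float], udf: Callable[[float], float]) -> List[float]:
    """UDF-driven operation: the per-element logic arrives at call time,
    so it cannot be compiled before the user provides it
    (hypothetical stand-in)."""
    return [udf(v) for v in values]


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    print(fixed_sum(data))
    # The lambda below plays the role of a user-defined function.
    print(apply_udf(data, lambda x: x * x))
```

In the real code base the second case is where a JIT compiler such as numba
comes in: it can compile the user's function once that function is available.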