libcudf  0.8
cudf Namespace Reference

Invokes an instance of a functor template with the appropriate type determined by a gdf_dtype enum value. More...

Classes

struct  csv_read_arg
 Input arguments to the read_csv interface. More...
 
struct  cuda_error
 Exception thrown when a CUDA error is encountered. More...
 
struct  DeviceAnd
 
struct  DeviceCount
 
struct  DeviceMax
 
struct  DeviceMin
 
struct  DeviceOr
 
struct  DeviceProduct
 
struct  DeviceSum
 
struct  DeviceXor
 
struct  json_read_arg
 Input arguments to the read_json interface. More...
 
struct  logic_error
 Exception thrown when logical precondition is violated. More...
 
struct  meanvar
 Intermediate struct used to calculate mean and variance. This is an example of producing a struct from column input. More...
 
struct  orc_read_arg
 Input arguments to the read_orc interface. More...
 
struct  pair_accessor
 Pair accessor for columns with or without a null bitmask. A unary function that returns thrust::pair<ResultType, bool>. If the element at index i is valid, it returns ResultType{data[i]} and true, indicating the value was valid. If the element at i is null, it returns ResultType{identity} and false, indicating the element was null. More...
 
struct  pair_accessor< ElementType, ResultType, false >
 
struct  pair_accessor< ElementType, ResultType, true >
 
struct  parquet_read_arg
 Input arguments to the read_parquet interface. More...
 
struct  source_info
 Input source info for xxx_read_arg arguments. More...
 
struct  table
 A wrapper for a set of gdf_columns of equal number of rows. More...
 
struct  transformer_meanvar
 Uses a scalar value to construct a meanvar object. This transforms thrust::pair<ElementType, bool> into ResultType = meanvar<ElementType> form. More...
 
struct  transformer_squared
 Transforms a scalar by first casting to another type, and then squaring the result. More...
 
struct  value_accessor
 Value accessor for columns with or without a null bitmask. A unary function that returns the scalar value at id: operator() (gdf_index_type id) computes the data value and valid flag at id. More...
 
struct  value_accessor< ElementType, ResultType, false >
 
struct  value_accessor< ElementType, ResultType, true >
 

Typedefs

using category = detail::wrapper< gdf_category, GDF_CATEGORY >
 
using nvstring_category = detail::wrapper< gdf_nvstring_category, GDF_STRING_CATEGORY >
 
using timestamp = detail::wrapper< gdf_timestamp, GDF_TIMESTAMP >
 
using date32 = detail::wrapper< gdf_date32, GDF_DATE32 >
 
using date64 = detail::wrapper< gdf_date64, GDF_DATE64 >
 
using bool8 = detail::wrapper< gdf_bool8, GDF_BOOL8 >
 

Enumerations

enum  duplicate_keep_option { KEEP_FIRST = 0, KEEP_LAST, KEEP_NONE }
 Options for the drop_duplicates API that control which duplicate rows are retained. More...
 

Functions

void binary_operation (gdf_column *out, gdf_scalar *lhs, gdf_column *rhs, gdf_binary_operator ope)
 Performs a binary operation between a gdf_scalar and a gdf_column. More...
 
void binary_operation (gdf_column *out, gdf_column *lhs, gdf_scalar *rhs, gdf_binary_operator ope)
 Performs a binary operation between a gdf_column and a gdf_scalar. More...
 
void binary_operation (gdf_column *out, gdf_column *lhs, gdf_column *rhs, gdf_binary_operator ope)
 Performs a binary operation between two gdf_columns. More...
 
void binary_operation (gdf_column *out, gdf_column *lhs, gdf_column *rhs, const std::string &ptx)
 Performs a binary operation between two gdf_columns using a user-defined PTX function. More...
 
rmm::device_vector< bit_mask::bit_mask_t > row_bitmask (cudf::table const &table, cudaStream_t stream=0)
 Computes a bitmask indicating the presence of NULL values in rows of a table. More...
 
gdf_column empty_like (gdf_column const &input)
 
gdf_column allocate_like (gdf_column const &input, bool allocate_mask_if_exists=true, cudaStream_t stream=0)
 Allocates a new column of the same size and type as the input. More...
 
gdf_column copy (gdf_column const &input, cudaStream_t stream=0)
 Creates a new column that is a copy of input. More...
 
table empty_like (table const &t)
 Creates a table of empty columns with the same types as the inputs. More...
 
table allocate_like (table const &t, bool allocate_mask_if_exists=true, cudaStream_t stream=0)
 Creates a table of columns with the same type and allocation size as the input. More...
 
table copy (table const &t, cudaStream_t stream=0)
 Creates a table of columns and deep copies the data from an input table. More...
 
void copy_range (gdf_column *out_column, gdf_column const &in_column, gdf_index_type out_begin, gdf_index_type out_end, gdf_index_type in_begin)
 Copies a range of elements from one column to another. More...
 
void gather (table const *source_table, gdf_index_type const gather_map[], table *destination_table)
 Gathers the rows (including null values) of a set of source columns into a set of destination columns. More...
 
void scatter (table const *source_table, gdf_index_type const scatter_map[], table *destination_table)
 Scatters the rows (including null values) of a set of source columns into a set of destination columns. More...
 
std::vector< gdf_column * > slice (gdf_column const &input_column, gdf_index_type const *indices, gdf_size_type num_indices)
 Slices a column (including null values) into a set of columns according to a set of indices. More...
 
std::vector< gdf_column * > split (gdf_column const &input_column, gdf_index_type const *indices, gdf_size_type num_indices)
 Splits a column (including null values) into a set of columns according to a set of indices. More...
 
void fill (gdf_column *column, gdf_scalar const &value, gdf_index_type begin, gdf_index_type end)
 Fills a range of elements in a column with a scalar value. More...
 
gdf_column point_in_polygon (gdf_column const &polygon_latitudes, gdf_column const &polygon_longitudes, gdf_column const &query_point_latitudes, gdf_column const &query_point_longitudes)
 Determine whether or not coordinates (query points) are completely inside a static polygon. More...
 
gdf_dtype convertStringToDtype (const std::string &dtype)
 
std::string inferCompressionType (const std::string &compression_arg, gdf_input_type source_type, const std::string &source, const std::map< std::string, std::string > &ext_to_compression)
 Infer the compression type from the compression parameter and the input data. More...
 
table read_csv (csv_read_arg const &args)
 
table read_json (json_read_arg const &args)
 
table read_orc (orc_read_arg const &args)
 
table read_parquet (parquet_read_arg const &args)
 
template<bool has_nulls, typename ElementType , typename ResultType = ElementType, typename Iterator_Index = thrust::counting_iterator<gdf_index_type>>
auto make_iterator (const ElementType *data, const bit_mask::bit_mask_t *valid=nullptr, ResultType identity=ResultType{0}, Iterator_Index const it=Iterator_Index(0))
 Constructs an iterator over the elements of a column. More...
 
template<bool has_nulls, typename ElementType , typename ResultType = ElementType, typename Iterator_Index = thrust::counting_iterator<gdf_index_type>>
auto make_iterator (const ElementType *data, const gdf_valid_type *valid=nullptr, ResultType identity=ResultType{0}, Iterator_Index const it=Iterator_Index(0))
 
template<bool has_nulls, typename ElementType , typename ResultType = ElementType, typename Iterator_Index = thrust::counting_iterator<gdf_index_type>>
auto make_iterator (const gdf_column &column, ResultType identity=ResultType{0}, Iterator_Index const it=Iterator_Index(0))
 
template<bool has_nulls, typename ElementType , typename ResultType = ElementType, typename Iterator_Index = thrust::counting_iterator<gdf_index_type>>
auto make_pair_iterator (const ElementType *data, const bit_mask::bit_mask_t *valid=nullptr, ResultType identity=ResultType{0}, Iterator_Index const it=Iterator_Index(0))
 Constructs an iterator over the elements of a column. The result is an input iterator that can be used with cub and thrust. More...
 
template<bool has_nulls, typename ElementType , typename ResultType = ElementType, typename Iterator_Index = thrust::counting_iterator<gdf_index_type>>
auto make_pair_iterator (const ElementType *data, const gdf_valid_type *valid=nullptr, ResultType identity=ResultType{0}, Iterator_Index const it=Iterator_Index(0))
 
template<bool has_nulls, typename ElementType , typename ResultType = ElementType, typename Iterator_Index = thrust::counting_iterator<gdf_index_type>>
auto make_pair_iterator (const gdf_column &column, ResultType identity=ResultType{0}, Iterator_Index const it=Iterator_Index(0))
 
gdf_scalar reduction (const gdf_column *col, gdf_reduction_op op, gdf_dtype output_dtype)
 Computes the reduction of the values in all rows of a column. This function does not detect overflows in reductions; using a higher-precision dtype may prevent overflow. Only min and max ops are supported for reduction of non-arithmetic types (date32, timestamp, category...). Null values are skipped by the operation. If the column is empty, the is_valid member of the output gdf_scalar will be false. More...
 
void scan (const gdf_column *input, gdf_column *output, gdf_scan_op op, bool inclusive)
 Computes the scan (a.k.a. prefix sum) of a column. The null values are skipped for the operation, and if an input element at i is null, then the output element at i will also be null. More...
 
gdf_column find_and_replace_all (const gdf_column &input_col, const gdf_column &values_to_replace, const gdf_column &replacement_values)
 Replace elements from input_col according to the mapping values_to_replace to replacement_values, that is, replace all values_to_replace[i] present in input_col with replacement_values[i]. More...
 
gdf_column replace_nulls (const gdf_column &input, const gdf_column &replacement)
 Replaces all null values in a column with corresponding values of another column. More...
 
gdf_column replace_nulls (const gdf_column &input, const gdf_scalar &replacement)
 Replaces all null values in a column with a scalar. More...
 
gdf_column rolling_window (const gdf_column &input_col, gdf_size_type window, gdf_size_type min_periods, gdf_size_type forward_window, gdf_agg_op agg_type, const gdf_size_type *window_col, const gdf_size_type *min_periods_col, const gdf_size_type *forward_window_col)
 
gdf_column apply_boolean_mask (gdf_column const &input, gdf_column const &boolean_mask)
 Filters a column using a column of boolean values as a mask. More...
 
gdf_column drop_nulls (gdf_column const &input)
 Filters a column to remove null elements. More...
 
cudf::table drop_duplicates (const cudf::table &input_table, const cudf::table &key_columns, const duplicate_keep_option keep, const bool nulls_are_equal=true)
 Create a new table without duplicate rows. More...
 
std::vector< gdf_dtype > column_dtypes (cudf::table const &table)
 Returns vector of the dtypes of the columns in a table. More...
 
bool has_nulls (cudf::table const &table)
 Indicates if a table contains any null values. More...
 
void validate (const gdf_column &column)
 Ensures a gdf_column is valid, i.e. that its fields are consistent with each other, and logical in themselves, in representing a proper column. More...
 
void validate (const gdf_column *column_ptr)
 
bool have_same_type (const gdf_column &validated_column_1, const gdf_column &validated_column_2, bool ignore_extra_type_info=false) noexcept
 Ensures two (valid!) columns have the same type. More...
 
bool have_same_type (const gdf_column *validated_column_ptr_1, const gdf_column *validated_column_ptr_2, bool ignore_extra_type_info) noexcept
 
template<typename T >
T * get_data (const gdf_column &column) noexcept
 
template<typename T >
T * get_data (const gdf_column *column) noexcept
 
constexpr bool is_an_integer (gdf_dtype element_type) noexcept
 
constexpr bool is_integral (const gdf_column &column) noexcept
 
constexpr bool is_integral (const gdf_column *column) noexcept
 
constexpr bool is_nullable (const gdf_column &column) noexcept
 
constexpr bool has_nulls (const gdf_column &column) noexcept
 
constexpr std::size_t size_of (gdf_dtype element_type)
 Returns the size in bytes of values of a column element type.
 
std::size_t byte_width (const gdf_column &col) noexcept
 Returns the size in bytes of each element of a column (a.k.a. the column's width)
 
template<typename T , typename BinaryOp >
__forceinline__ __device__ T genericAtomicOperation (T *address, T const &update_value, BinaryOp op)
 Computes an atomic binary operation: reads the old value located at the address in global or shared memory, computes 'BinaryOp'('old', 'update_value'), and stores the result back to memory at the same address. These three operations are performed in one atomic transaction. More...
 
template<typename BinaryOp >
__forceinline__ __device__ cudf::bool8 genericAtomicOperation (cudf::bool8 *address, cudf::bool8 const &update_value, BinaryOp op)
 
template<class functor_t , typename... Ts>
CUDA_HOST_DEVICE_CALLABLE decltype(auto) constexpr type_dispatcher (gdf_dtype dtype, functor_t f, Ts &&... args)
 
template<typename T >
constexpr gdf_dtype gdf_dtype_of ()
 Maps a C++ type to its corresponding gdf_dtype. More...
 
template<>
constexpr gdf_dtype gdf_dtype_of< int8_t > ()
 
template<>
constexpr gdf_dtype gdf_dtype_of< int16_t > ()
 
template<>
constexpr gdf_dtype gdf_dtype_of< int32_t > ()
 
template<>
constexpr gdf_dtype gdf_dtype_of< int64_t > ()
 
template<>
constexpr gdf_dtype gdf_dtype_of< float > ()
 
template<>
constexpr gdf_dtype gdf_dtype_of< double > ()
 
template<>
constexpr gdf_dtype gdf_dtype_of< cudf::bool8 > ()
 
template<>
constexpr gdf_dtype gdf_dtype_of< cudf::date32 > ()
 
template<>
constexpr gdf_dtype gdf_dtype_of< cudf::date64 > ()
 
template<>
constexpr gdf_dtype gdf_dtype_of< cudf::timestamp > ()
 
template<>
constexpr gdf_dtype gdf_dtype_of< cudf::category > ()
 
template<>
constexpr gdf_dtype gdf_dtype_of< cudf::nvstring_category > ()
 
template<>
constexpr gdf_dtype gdf_dtype_of< NVStrings > ()
 

Detailed Description

Invokes an instance of a functor template with the appropriate type determined by a gdf_dtype enum value.

This helper function accepts any object with an "operator()" template, e.g., a functor. It will invoke an instance of the template by passing in as the template argument an appropriate type determined by the value of the gdf_dtype argument.

The template may have 1 or more template parameters, but the first parameter must be the type dispatched from the gdf_dtype enum. The remaining template parameters must be able to be automatically deduced.

There is a 1-to-1 mapping of gdf_dtype enum values and dispatched types. However, different gdf_dtype values may have the same underlying type. Therefore, in order to provide the 1-to-1 mapping, a wrapper struct may be dispatched for certain gdf_dtype enum values in order to emulate a "strong typedef".

A strong typedef provides a new, concrete type unlike a normal C++ typedef which is simply a type alias. These "strong typedef" structs simply wrap a single member variable of a fundamental type called 'value'.

The standard arithmetic operators are defined for the wrapper structs and therefore the wrapper struct types can be used as if they were fundamental types.

See wrapper_types.hpp for more detail.
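
As a rough illustration of the "strong typedef" idea described above, the following self-contained sketch wraps a fundamental type in a struct with a single member called value. The names dtype_tag, wrapper, date32_like and category_like are hypothetical stand-ins chosen for this example; they are not the definitions found in wrapper_types.hpp.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for the gdf_dtype enum; not the real libcudf definition.
enum class dtype_tag { DATE32, CATEGORY };

// A "strong typedef": one distinct type per tag, wrapping a single member `value`.
template <typename T, dtype_tag Tag>
struct wrapper {
  T value;
};

// Operators forward to the underlying values, so a wrapper behaves like T.
template <typename T, dtype_tag Tag>
constexpr bool operator==(wrapper<T, Tag> lhs, wrapper<T, Tag> rhs) {
  return lhs.value == rhs.value;
}

template <typename T, dtype_tag Tag>
constexpr wrapper<T, Tag> operator+(wrapper<T, Tag> lhs, wrapper<T, Tag> rhs) {
  return wrapper<T, Tag>{static_cast<T>(lhs.value + rhs.value)};
}

using date32_like   = wrapper<std::int32_t, dtype_tag::DATE32>;
using category_like = wrapper<std::int32_t, dtype_tag::CATEGORY>;

// Same underlying type, yet distinct C++ types; a plain alias could not do this.
static_assert(!std::is_same<date32_like, category_like>::value,
              "strong typedefs are distinct types");
```

Because the two aliases are different types, a dispatcher can select between them even though both store an int32_t.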

Example usage with a functor that returns the size of the dispatched type:

struct example_functor {
  template <typename T>
  int operator()() {
    return sizeof(T);
  }
};

cudf::type_dispatcher(GDF_INT8, example_functor{});   // returns 1
cudf::type_dispatcher(GDF_INT64, example_functor{});  // returns 8

Example usage of a functor for checking if element "i" in column "lhs" is equal to element "j" in column "rhs":

struct elements_are_equal {
  template <typename ColumnType>
  bool operator()(void const* lhs, int i, void const* rhs, int j) {
    // Cast the void* data buffer to the dispatched type and retrieve elements
    // "i" and "j" from the respective columns
    ColumnType const i_elem = static_cast<ColumnType const*>(lhs)[i];
    ColumnType const j_elem = static_cast<ColumnType const*>(rhs)[j];

    // operator== is defined for wrapper structs such that it performs the
    // operator== on the underlying values. Therefore, the wrapper structs
    // can be used as if they were fundamental arithmetic types
    return i_elem == j_elem;
  }
};

The return type must be the same for all template instantiations of the functor's "operator()"; otherwise compilation fails, because a single function would be trying to return different types.

NOTE: It is undefined behavior if an unsupported or invalid gdf_dtype is supplied.

Parameters
dtype  The gdf_dtype enum that determines which type will be dispatched
f  The functor with a templated "operator()" that will be invoked with the dispatched type
args  A parameter pack (i.e., an arbitrary number of arguments) that will be perfectly forwarded as the arguments of the functor's "operator()".
Returns
Whatever is returned by the functor's "operator()".
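
To make the dispatch mechanism concrete, here is a minimal host-only sketch of the same idea: a switch maps a runtime enum value to a compile-time type and invokes the functor's templated "operator()" with it. The enum mini_dtype and the function mini_type_dispatcher are hypothetical and cover only three types; the sketch also throws on an unknown value, whereas the documentation above states that an unsupported gdf_dtype is undefined behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <utility>

// Hypothetical miniature of gdf_dtype with three values only.
enum mini_dtype { MINI_INT8, MINI_INT64, MINI_FLOAT64 };

// Maps a runtime enum value to a compile-time type and invokes
// f.operator()<T>(args...) with that type. Requires C++14 (decltype(auto)).
template <typename Functor, typename... Ts>
decltype(auto) mini_type_dispatcher(mini_dtype dtype, Functor f, Ts&&... args) {
  switch (dtype) {
    case MINI_INT8:
      return f.template operator()<std::int8_t>(std::forward<Ts>(args)...);
    case MINI_INT64:
      return f.template operator()<std::int64_t>(std::forward<Ts>(args)...);
    case MINI_FLOAT64:
      return f.template operator()<double>(std::forward<Ts>(args)...);
  }
  // Assumption of this sketch: fail loudly instead of invoking undefined behavior.
  throw std::invalid_argument("unsupported dtype");
}

// Returns the size in bytes of the dispatched type.
struct size_of_functor {
  template <typename T>
  int operator()() const { return static_cast<int>(sizeof(T)); }
};

// Compares element i of lhs with element j of rhs after casting both buffers
// to the dispatched type.
struct elements_are_equal {
  template <typename ColumnType>
  bool operator()(void const* lhs, int i, void const* rhs, int j) const {
    return static_cast<ColumnType const*>(lhs)[i] ==
           static_cast<ColumnType const*>(rhs)[j];
  }
};
```

Every case returns the result of the same functor, which is why all instantiations of "operator()" must share one return type: decltype(auto) has to deduce a single type for the whole function.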

Enumeration Type Documentation

◆ duplicate_keep_option

Options for the drop_duplicates API that control which duplicate rows are retained.

Enumerator
KEEP_FIRST 

Keeps the first duplicate row and unique rows.

KEEP_LAST 

Keeps the last duplicate row and unique rows.

KEEP_NONE 

Keeps no duplicate rows; only unique rows are retained.

Function Documentation

◆ allocate_like() [1/2]

gdf_column cudf::allocate_like ( gdf_column const &  input,
bool  allocate_mask_if_exists = true,
cudaStream_t  stream = 0 
)

Allocates a new column of the same size and type as the input.

Parameters
input  The input column to emulate
allocate_mask_if_exists  Optional. Whether or not to allocate a bitmask if one exists in input
stream  Optional stream in which to perform copies
Returns
gdf_column An allocated column of the same size and type as input

◆ allocate_like() [2/2]

table cudf::allocate_like ( table const &  t,
bool  allocate_mask_if_exists = true,
cudaStream_t  stream = 0 
)

Creates a table of columns with the same type and allocation size as the input.

Creates the gdf_column objects, and allocates underlying device memory for each column matching the input columns.

Note
It is the caller's responsibility to free each column's device memory allocation in addition to deleting the gdf_column object for every column in the new table.
Parameters
t  The table to emulate
allocate_mask_if_exists  Optional. Whether or not to allocate the bitmask for each column if it exists in the corresponding input column
stream  Optional stream in which to perform allocations
Returns
table A table of columns with the same type and allocation size as the input

◆ apply_boolean_mask()

gdf_column cudf::apply_boolean_mask ( gdf_column const &  input,
gdf_column const &  boolean_mask 
)

Filters a column using a column of boolean values as a mask.

Given an input column and a mask column, an element i from the input column is copied to the output if the corresponding element i in the mask is non-null and true. This operation is stable: the input order is preserved.

The input and mask columns must be of equal size.

The output column has size equal to the number of elements in boolean_mask that are both non-null and true. Note that the output column memory is allocated by this function but must be freed by the caller when finished.

Note
The boolean_mask may have just boolean data (no valid bitmask), or just a valid bitmask (no boolean data), or it may have both. The filter adapts to these three situations.
If input.size is zero, there is no error, and an empty column is returned.
Parameters
[in]  input  The input column to filter
[in]  boolean_mask  A column of type GDF_BOOL8 used as a mask; element i determines whether element i of the input column passes the filter.
Returns
gdf_column Column containing a copy of all elements of input passing the filter defined by boolean_mask.
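
The filtering rule can be sketched on the host with plain std::vector containers. This is an illustration of the semantics only: host_mask and apply_mask_sketch are hypothetical names, and modelling a missing data buffer or a missing bitmask as an empty vector is an assumption of the sketch, not how gdf_column represents them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for a boolean mask column. An empty `data`
// or `valid` vector models a mask without boolean data or without a bitmask.
struct host_mask {
  std::vector<bool> data;   // boolean values (may be empty)
  std::vector<bool> valid;  // true where the mask element is non-null (may be empty)
};

// Keeps input[i] when mask element i is both non-null and true.
// Stable: surviving elements keep their relative order.
template <typename T>
std::vector<T> apply_mask_sketch(std::vector<T> const& input, host_mask const& mask) {
  // The input and the mask must be of equal size.
  assert(mask.data.empty() || mask.data.size() == input.size());
  assert(mask.valid.empty() || mask.valid.size() == input.size());
  std::vector<T> out;
  for (std::size_t i = 0; i < input.size(); ++i) {
    bool const is_true  = mask.data.empty() || mask.data[i];
    bool const is_valid = mask.valid.empty() || mask.valid[i];
    if (is_true && is_valid) out.push_back(input[i]);
  }
  return out;
}
```

An empty input simply yields an empty output, matching the note that a zero-size input is not an error.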

◆ binary_operation() [1/4]

void cudf::binary_operation ( gdf_column *  out,
gdf_scalar *  lhs,
gdf_column *  rhs,
gdf_binary_operator  ope 
)

Performs a binary operation between a gdf_scalar and a gdf_column.

The desired output type must be specified in out->dtype.

If the valid field in the gdf_column output is not nullptr, then it will be filled with the bitwise AND of the valid mask of the rhs gdf_column and the is_valid bool of the lhs gdf_scalar.

Parameters
out  (gdf_column) Output of the operation.
lhs  (gdf_scalar) First operand of the operation.
rhs  (gdf_column) Second operand of the operation.
ope  (enum) The binary operator to use

◆ binary_operation() [2/4]

void cudf::binary_operation ( gdf_column *  out,
gdf_column *  lhs,
gdf_scalar *  rhs,
gdf_binary_operator  ope 
)

Performs a binary operation between a gdf_column and a gdf_scalar.

The desired output type must be specified in out->dtype.

If the valid field in the gdf_column output is not nullptr, then it will be filled with the bitwise AND of the valid mask of the lhs gdf_column and the is_valid bool of the rhs gdf_scalar.

Parameters
out  (gdf_column) Output of the operation.
lhs  (gdf_column) First operand of the operation.
rhs  (gdf_scalar) Second operand of the operation.
ope  (enum) The binary operator to use

◆ binary_operation() [3/4]

void cudf::binary_operation ( gdf_column *  out,
gdf_column *  lhs,
gdf_column *  rhs,
gdf_binary_operator  ope 
)

Performs a binary operation between two gdf_columns.

The desired output type must be specified in out->dtype.

If the valid field in the gdf_column output is not nullptr, then it will be filled with the bitwise AND of the valid masks of the lhs and rhs gdf_columns.

Parameters
out  (gdf_column) Output of the operation.
lhs  (gdf_column) First operand of the operation.
rhs  (gdf_column) Second operand of the operation.
ope  (enum) The binary operator to use
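
The validity rule shared by these overloads (an output element is valid only when both operands are valid) can be sketched on the host as follows. host_column and binary_op_sketch are hypothetical names; the sketch stores one bool per element instead of a packed bitmask and takes the output type as a template argument rather than reading it from out->dtype.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side column: values plus one validity flag per element.
template <typename T>
struct host_column {
  std::vector<T> data;
  std::vector<bool> valid;
};

// Applies `op` element-wise and casts each result to Out.
// An output element is valid only when both input elements are valid.
template <typename Out, typename L, typename R, typename BinaryOp>
host_column<Out> binary_op_sketch(host_column<L> const& lhs,
                                  host_column<R> const& rhs,
                                  BinaryOp op) {
  assert(lhs.data.size() == rhs.data.size());
  host_column<Out> out;
  for (std::size_t i = 0; i < lhs.data.size(); ++i) {
    out.valid.push_back(lhs.valid[i] && rhs.valid[i]);
    // The value stored at a null position is unspecified in this sketch.
    out.data.push_back(static_cast<Out>(op(lhs.data[i], rhs.data[i])));
  }
  return out;
}
```

For the scalar overloads the same rule applies, with the scalar's is_valid flag standing in for one of the two masks.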

◆ binary_operation() [4/4]

void cudf::binary_operation ( gdf_column *  out,
gdf_column *  lhs,
gdf_column *  rhs,
const std::string &  ptx 
)

Performs a binary operation between two gdf_columns using a user-defined PTX function.

Accepts a user-defined PTX function to apply between the lhs and rhs.

The desired output type must be specified in out->dtype.

If the valid field in the gdf_column output is not nullptr, then it will be filled with the bitwise AND of the valid masks of the lhs and rhs gdf_columns.

Parameters
out  (gdf_column) Output of the operation.
lhs  (gdf_column) First operand of the operation.
rhs  (gdf_column) Second operand of the operation.
ptx  String containing the PTX of a binary function to apply between lhs and rhs

◆ column_dtypes()

std::vector< gdf_dtype > cudf::column_dtypes ( cudf::table const &  table)

Returns vector of the dtypes of the columns in a table.

Parameters
table  The table to get the column dtypes from
Returns
std::vector<gdf_dtype> The dtypes of the columns in the table

◆ copy() [1/2]

gdf_column cudf::copy ( gdf_column const &  input,
cudaStream_t  stream = 0 
)

Creates a new column that is a copy of input.

Parameters
input  The input column to copy
stream  Optional stream in which to perform copies
Returns
gdf_column A copy of input

◆ copy() [2/2]

table cudf::copy ( table const &  t,
cudaStream_t  stream = 0 
)

Creates a table of columns and deep copies the data from an input table.

Note
It is the caller's responsibility to free each column's device memory allocation in addition to deleting the gdf_column object for every column in the new table.
Parameters
t  The table to copy
stream  Optional stream in which to perform allocations and copies
Returns
table A table that is an exact copy of t

◆ copy_range()

void cudf::copy_range ( gdf_column *  out_column,
gdf_column const &  in_column,
gdf_index_type  out_begin,
gdf_index_type  out_end,
gdf_index_type  in_begin 
)

Copies a range of elements from one column to another.

Copies N elements of in_column starting at in_begin to the N elements of out_column starting at out_begin, where N = (out_end - out_begin)

The datatypes of in_column and out_column must be the same.

If the input and output columns are the same and ranges overlap, the behavior is undefined.

Parameters
[out]  out_column  The preallocated column to copy into
[in]  in_column  The column to copy from
[in]  out_begin  The starting index of the output range
[in]  out_end  The index one past the end of the output range
[in]  in_begin  The starting index of the input range
Returns
void
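
The index arithmetic of copy_range can be sketched on the host with std::vector. copy_range_sketch is a hypothetical name, and the sketch ignores null bitmasks; it only shows how N is derived from the output range and applied to the input range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Copies N = out_end - out_begin elements of `in`, starting at in_begin,
// into `out`, starting at out_begin. `in` and `out` must not overlap.
template <typename T>
void copy_range_sketch(std::vector<T>& out, std::vector<T> const& in,
                       std::size_t out_begin, std::size_t out_end,
                       std::size_t in_begin) {
  assert(out_begin <= out_end && out_end <= out.size());
  std::size_t const n = out_end - out_begin;
  assert(in_begin + n <= in.size());
  std::copy_n(in.begin() + static_cast<std::ptrdiff_t>(in_begin), n,
              out.begin() + static_cast<std::ptrdiff_t>(out_begin));
}
```

Only the output range has an explicit end; the end of the input range is implied by N.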

◆ drop_duplicates()

cudf::table cudf::drop_duplicates ( const cudf::table &  input_table,
const cudf::table &  key_columns,
const duplicate_keep_option  keep,
const bool  nulls_are_equal = true 
)

Create a new table without duplicate rows.

Given an input table, each row is copied to output table if the corresponding row of key column table is unique, where the definition of unique depends on the value of keep:

  • KEEP_FIRST: only the first of a sequence of duplicate rows is copied
  • KEEP_LAST: only the last of a sequence of duplicate rows is copied
  • KEEP_NONE: no duplicate rows are copied

The input table and key columns table should have same number of rows. Note that the memory for the output table columns is allocated by this function, so it must be freed by the caller when finished.

Parameters
[in]  input_table  input table to copy only unique rows
[in]  key_columns  columns to consider to identify duplicate rows
[in]  keep  keep first entry, last entry, or no entries if duplicates found
[in]  nulls_are_equal  flag to denote nulls are equal if true, nulls are not equal if false
Returns
out_table with only unique rows
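
The effect of the three keep options can be sketched on the host by computing which row indices survive, given one key per row. Everything here is hypothetical (keep_option_sketch, rows_to_keep); the sketch treats any rows with equal keys as duplicates, reports surviving rows in input order, and does not model nulls_are_equal. None of these choices is confirmed by the documentation above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical mirror of duplicate_keep_option.
enum keep_option_sketch { SKETCH_KEEP_FIRST = 0, SKETCH_KEEP_LAST, SKETCH_KEEP_NONE };

// Returns the indices of the rows that would be copied, given one key per row.
template <typename Key>
std::vector<std::size_t> rows_to_keep(std::vector<Key> const& keys,
                                      keep_option_sketch keep) {
  // Group row indices by key; indices within a group are in input order.
  std::map<Key, std::vector<std::size_t>> groups;
  for (std::size_t i = 0; i < keys.size(); ++i) groups[keys[i]].push_back(i);

  std::vector<std::size_t> kept;
  for (auto const& group : groups) {
    std::vector<std::size_t> const& rows = group.second;
    if (rows.size() == 1) {
      kept.push_back(rows.front());  // unique row: kept under every option
    } else if (keep == SKETCH_KEEP_FIRST) {
      kept.push_back(rows.front());
    } else if (keep == SKETCH_KEEP_LAST) {
      kept.push_back(rows.back());
    }  // SKETCH_KEEP_NONE: all rows of a duplicated key are dropped
  }
  std::sort(kept.begin(), kept.end());  // assumption: report in input order
  return kept;
}
```

For keys {5, 7, 5, 9, 7, 5}, only the row with key 9 is unique, so it is the sole survivor under SKETCH_KEEP_NONE.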

◆ drop_nulls()

gdf_column cudf::drop_nulls ( gdf_column const &  input)

Filters a column to remove null elements.

Given an input column, element i of the input is copied to the output if the corresponding bit i in the input's valid bitmask marks it as non-null.

The output column has size equal to the number of non-null elements in the input. Note that the output column memory is allocated by this function but must be freed by the caller when finished.

If the input column is not nullable, this function just copies the input to the output.

Note
If input.size is zero, there is no error, and an empty column is returned.
Parameters
[in]  input  The input column to filter
Returns
gdf_column Column containing a copy of all non-null elements of input.
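
A host-side sketch of filtering by a packed validity bitmask is shown below. The bit layout (bit i % 8 of byte i / 8, least significant bit first) is an assumption of the sketch, as are the names is_valid_bit and drop_nulls_sketch; an empty bitmask is used to model a non-nullable column.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: bit (i % 8) of byte (i / 8) is set when element i is non-null.
inline bool is_valid_bit(std::vector<std::uint8_t> const& bitmask, std::size_t i) {
  return ((bitmask[i / 8] >> (i % 8)) & 1u) != 0;
}

// Copies every non-null element of `input`, preserving order.
// An empty bitmask models a non-nullable column: everything is copied.
template <typename T>
std::vector<T> drop_nulls_sketch(std::vector<T> const& input,
                                 std::vector<std::uint8_t> const& bitmask) {
  if (bitmask.empty()) return input;
  assert(bitmask.size() * 8 >= input.size());
  std::vector<T> out;
  for (std::size_t i = 0; i < input.size(); ++i) {
    if (is_valid_bit(bitmask, i)) out.push_back(input[i]);
  }
  return out;
}
```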

◆ empty_like()

table cudf::empty_like ( table const &  t)

Creates a table of empty columns with the same types as the inputs.

Creates the gdf_column objects, but does not allocate any underlying device memory for the column's data or bitmask.

Note
It is the caller's responsibility to delete the gdf_column object for every column in the new table.
Parameters
t  The input table to emulate
Returns
table A table of empty columns of the same type as the input

◆ fill()

void cudf::fill ( gdf_column *  column,
gdf_scalar const &  value,
gdf_index_type  begin,
gdf_index_type  end 
)

Fills a range of elements in a column with a scalar value.

Fills N elements of column starting at begin with value, where N = (end - begin)

The datatypes of column and value must be the same.

Parameters
[out]  column  The preallocated column to fill into
[in]  value  The scalar value to fill
[in]  begin  The starting index of the fill range
[in]  end  The index one past the end of the fill range
Returns
void
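
The half-open range convention used by fill can be sketched on the host with std::fill. fill_sketch is a hypothetical name and the sketch does not model null bitmasks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Writes `value` into the N = end - begin elements of `column` in [begin, end).
template <typename T>
void fill_sketch(std::vector<T>& column, T const& value,
                 std::size_t begin, std::size_t end) {
  assert(begin <= end && end <= column.size());
  std::fill(column.begin() + static_cast<std::ptrdiff_t>(begin),
            column.begin() + static_cast<std::ptrdiff_t>(end), value);
}
```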

◆ find_and_replace_all()

gdf_column cudf::find_and_replace_all ( const gdf_column &  input_col,
const gdf_column &  values_to_replace,
const gdf_column &  replacement_values 
)

Replaces elements of input_col according to the mapping from values_to_replace to replacement_values: every occurrence of values_to_replace[i] in input_col is replaced with replacement_values[i], and the result is returned as a new gdf_column.

Parameters
[in]  input_col  gdf_column with the data to be modified
[in]  values_to_replace  gdf_column with the old values to be replaced
[in]  replacement_values  gdf_column with the new replacement values
Returns
output gdf_column with the modified data
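
The mapping described above can be sketched on the host with a hash map from old values to new values. find_and_replace_all_sketch is a hypothetical name; the sketch applies each replacement at most once per element and, if a value is listed twice in values_to_replace, lets the last entry win, which is an assumption rather than documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Returns a copy of `input` in which every occurrence of values_to_replace[i]
// has been replaced with replacement_values[i].
template <typename T>
std::vector<T> find_and_replace_all_sketch(std::vector<T> const& input,
                                           std::vector<T> const& values_to_replace,
                                           std::vector<T> const& replacement_values) {
  assert(values_to_replace.size() == replacement_values.size());
  std::unordered_map<T, T> mapping;
  for (std::size_t i = 0; i < values_to_replace.size(); ++i) {
    mapping[values_to_replace[i]] = replacement_values[i];
  }
  std::vector<T> out(input);
  for (T& v : out) {
    auto const it = mapping.find(v);
    if (it != mapping.end()) v = it->second;
  }
  return out;
}
```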

◆ gather()

void cudf::gather ( table const *  source_table,
gdf_index_type const  gather_map[],
table *  destination_table 
)

Gathers the rows (including null values) of a set of source columns into a set of destination columns.

The two sets of columns must have equal numbers of columns.

Gathers the rows of the source columns into the destination columns according to a gather map such that row "i" in the destination columns will contain row "gather_map[i]" from the source columns.

The datatypes of corresponding columns in the source and destination columns must be the same.

The number of elements in the gather_map must equal the number of rows in the destination columns.

If any index in the gather_map is outside the range [0, num rows in source_columns), the result is undefined.

If the same index appears more than once in gather_map, the result is undefined.

Parameters
[in]  source_table  The input columns whose rows will be gathered
[in]  gather_map  An array of indices that maps the rows in the source columns to rows in the destination columns.
[out]  destination_table  A preallocated set of columns with a number of rows equal in size to the number of elements in the gather_map that will contain the rearrangement of the source columns based on the mapping. Can be the same as source_table (in-place gather).
Returns
GDF_SUCCESS upon successful completion
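
The gather mapping (row i of the destination receives row gather_map[i] of the source) can be sketched for a single column on the host. gather_sketch is a hypothetical name; unlike the function documented above, this sketch requires the source and destination to be different vectors and checks indices instead of leaving out-of-range values undefined.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// destination[i] = source[gather_map[i]] for every i.
// The destination must be preallocated with gather_map.size() elements.
template <typename T>
void gather_sketch(std::vector<T> const& source,
                   std::vector<std::size_t> const& gather_map,
                   std::vector<T>& destination) {
  assert(&source != &destination);  // in-place gather is not handled here
  assert(gather_map.size() == destination.size());
  for (std::size_t i = 0; i < gather_map.size(); ++i) {
    assert(gather_map[i] < source.size());
    destination[i] = source[gather_map[i]];
  }
}
```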

◆ gdf_dtype_of()

template<typename T >
constexpr gdf_dtype cudf::gdf_dtype_of ( )
inline

Maps a C++ type to its corresponding gdf_dtype.

When explicitly passed a template argument of a given type, returns the appropriate gdf_dtype for the specified C++ type.

For example:

return gdf_dtype_of<int32_t>(); // Returns GDF_INT32
return gdf_dtype_of<cudf::category>(); // Returns GDF_CATEGORY

Template Parameters
T  The type to map to a gdf_dtype
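
A compile-time mapping of this kind is typically built from a primary function template plus one explicit specialization per supported type. The sketch below shows that pattern with hypothetical names (sketch_dtype, sketch_dtype_of); returning an "invalid" enumerator from the primary template is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical miniature of gdf_dtype.
enum sketch_dtype { SKETCH_INVALID, SKETCH_INT32, SKETCH_FLOAT64 };

// Primary template: types without a mapping resolve to SKETCH_INVALID (assumption).
template <typename T>
constexpr sketch_dtype sketch_dtype_of() { return SKETCH_INVALID; }

// One explicit specialization per supported C++ type.
template <>
constexpr sketch_dtype sketch_dtype_of<std::int32_t>() { return SKETCH_INT32; }

template <>
constexpr sketch_dtype sketch_dtype_of<double>() { return SKETCH_FLOAT64; }

// Because the functions are constexpr, the mapping is usable at compile time.
static_assert(sketch_dtype_of<std::int32_t>() == SKETCH_INT32, "int32_t maps to SKETCH_INT32");
```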

◆ genericAtomicOperation()

template<typename T , typename BinaryOp >
__forceinline__ __device__ T cudf::genericAtomicOperation ( T *  address,
T const &  update_value,
BinaryOp  op 
)

Computes an atomic binary operation: reads the old value located at address in global or shared memory, computes 'BinaryOp'('old', 'update_value'), and stores the result back to memory at the same address. These three operations are performed in one atomic transaction.

The supported cudf types for genericAtomicOperation are: int8_t, int16_t, int32_t, int64_t, float, double, cudf::date32, cudf::date64, cudf::timestamp, cudf::category, cudf::nvstring_category, cudf::bool8

Parameters
[in]  address  The address of the old value in global or shared memory
[in]  update_value  The value to be computed
[in]  op  The binary operator used for the computation
Returns
The old value at address
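
The read, compute, store sequence can be illustrated with a host-side analog built on std::atomic and a compare-and-swap loop. This is a sketch of the general technique, not the device implementation: generic_atomic_sketch is a hypothetical name, and whether the libcudf device function uses the same loop structure is not stated here.

```cpp
#include <atomic>
#include <cassert>

// Atomically replaces the value in `target` with op(old, update_value)
// and returns the old value, using a compare-and-swap retry loop.
template <typename T, typename BinaryOp>
T generic_atomic_sketch(std::atomic<T>& target, T update_value, BinaryOp op) {
  T old_value = target.load();
  // On failure, compare_exchange_weak refreshes old_value with the current
  // contents, so op is re-applied to the fresh value on the next iteration.
  while (!target.compare_exchange_weak(old_value, op(old_value, update_value))) {
  }
  return old_value;
}
```

The loop makes any binary operator atomic, including ones such as max or min for which no dedicated atomic instruction may exist for a given type.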

◆ has_nulls()

bool cudf::has_nulls ( cudf::table const &  table)

Indicates if a table contains any null values.

Parameters
table  The table to check for null values
Returns
true If the table contains one or more null values

false If the table contains zero null values

◆ have_same_type()

bool cudf::have_same_type ( const gdf_column &  validated_column_1,
const gdf_column &  validated_column_2,
bool  ignore_extra_type_info = false 
)
noexcept

Ensures two (valid!) columns have the same type.

Parameters
validated_column_1  A column which would pass validate().
validated_column_2  A column which would pass validate().
ignore_extra_type_info  For some column element types, a column carries some qualifying information which applies to all elements (and is thus not repeated for each one). Generally, this information should not be ignored, so that for two columns to have the same type, they must also share it. However, for potential practical reasons (with this being a utility rather than an API function), we allow the extra information to be ignored by setting this parameter to true.

◆ inferCompressionType()

std::string cudf::inferCompressionType ( const std::string &  compression_arg,
gdf_input_type  source_type,
const std::string &  source,
const std::map< std::string, std::string > &  ext_to_compression 
)

Infer the compression type from the compression parameter and the input data.

The compression type is inferred from the compression parameter and the input file extension. Returns "none" if the input is not compressed. Throws if the input is not valid.

Parameters
[in] compression_arg Input string that is potentially describing the compression type. Can also be "none" or "infer".
[in] source_type Enum describing the type of the data source.
[in] source If source_type is FILE_PATH, contains the filepath. If source_type is HOST_BUFFER, contains the input data.
[in] ext_to_compression Map between file extensions and compression types.
Returns
A string representing the compression type.
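
The extension lookup described above can be illustrated with a small host-side sketch. It assumes that "infer" means looking up the text after the last '.' of the file path in ext_to_compression, and that any other value of compression_arg is returned unchanged. infer_compression_sketch is a hypothetical helper, not the libcudf implementation, and it does not model the HOST_BUFFER case or the documented exception.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper (not libcudf code).
// Assumption: "infer" selects the compression type by file extension.
std::string infer_compression_sketch(
    std::string const& compression_arg, std::string const& file_path,
    std::map<std::string, std::string> const& ext_to_compression) {
  assert(!compression_arg.empty());
  if (compression_arg != "infer") return compression_arg;  // explicit type or "none"
  auto const dot = file_path.rfind('.');
  if (dot == std::string::npos) return "none";  // no extension to inspect
  auto const it = ext_to_compression.find(file_path.substr(dot + 1));
  return it == ext_to_compression.end() ? "none" : it->second;
}
```

With a map of {"gz" -> "gzip"}, the path data.csv.gz yields "gzip" and data.csv yields "none".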

◆ make_iterator()

template<bool has_nulls, typename ElementType , typename ResultType = ElementType, typename Iterator_Index = thrust::counting_iterator<gdf_index_type>>
auto cudf::make_iterator ( const ElementType *  data,
const bit_mask::bit_mask_t *  valid = nullptr,
ResultType  identity = ResultType{0},
Iterator_Index const  it = Iterator_Index(0) 
)

Constructs an iterator over the elements of a column.

If the column contains no null values (indicated by has_nulls == false), then dereferencing an iterator it returned by this function as *(it + n) will return ResultType{ static_cast<ElementType*>(data)[n] }.

If the column contains null values (indicated by has_nulls == true), then the result of dereferencing an iterator it returned by this function as *(it + n) depends on whether the element is valid or null. If the element is valid, it will return ResultType{ static_cast<ElementType*>(data)[n] }. If the element is null, it will return ResultType{identity}.

Template Parameters
has_nulls Indicates if the column contains null values (null_count > 0)
ElementType The cudf data type of the input array
ResultType The cudf data type of the output and of the identity value, which is used when the null bitmask flag is false
Iterator_Index The base iterator which gives the index into the array. The default is thrust::counting_iterator
Parameters
[in] data Pointer to the column data array
[in] valid Pointer to the null bitmask of the column
[in] identity The identity value used when the mask value is false
[in] it The index iterator, thrust::counting_iterator by default
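
The dereferencing rule above (value if valid, identity if null) can be sketched on the host as a small functor. This is not libcudf code: value_or_identity is a hypothetical name, and the bit layout (one validity bit per element, least-significant bit first within 32-bit words) is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side accessor (not libcudf code).
// Assumption: one validity bit per element, least-significant bit first
// within each 32-bit mask word; a null mask pointer means "no nulls".
template <typename ElementType, typename ResultType = ElementType>
struct value_or_identity {
  ElementType const* data;
  std::uint32_t const* valid;
  ResultType identity;

  // Returns the element at index n, or identity if that element is null.
  ResultType operator()(std::size_t n) const {
    assert(data != nullptr);
    bool const is_valid =
        (valid == nullptr) || (((valid[n / 32] >> (n % 32)) & 1u) != 0u);
    return is_valid ? static_cast<ResultType>(data[n]) : identity;
  }
};
```

Because null elements map to the identity of the operation (0 for a sum), a reduction over such an accessor needs no separate null handling.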

◆ make_pair_iterator()

template<bool has_nulls, typename ElementType , typename ResultType = ElementType, typename Iterator_Index = thrust::counting_iterator<gdf_index_type>>
auto cudf::make_pair_iterator ( const ElementType *  data,
const bit_mask::bit_mask_t *  valid = nullptr,
ResultType  identity = ResultType{0},
Iterator_Index const  it = Iterator_Index(0) 
)

Constructs an iterator over the elements of a column. The result is an input iterator which can be used with cub and thrust.

The iterator returns thrust::pair<ResultType, bool>. This is useful for more complex logic that depends on validity, e.g. group_by.count, mean_var, and sort algorithms.

Template Parameters
has_nulls True if the data has a valid bit mask, false otherwise
ElementType The cudf data type of the input array
ResultType The cudf data type of the output and of the identity value, which is used when the null bitmask flag is false
Iterator_Index The base iterator which gives the index into the array. The default is thrust::counting_iterator
Parameters
[in] data Pointer to the column data array
[in] valid Pointer to the null bitmask of the column
[in] identity The identity value used when the mask value is false
[in] it The index iterator, thrust::counting_iterator by default
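
To illustrate why the validity flag is useful, the host-side sketch below computes a mean over valid elements only, using a (value, is_valid) pair per element. The names value_and_validity and mean_of_valid are hypothetical, std::pair stands in for thrust::pair, and the bit layout (least-significant bit first within 32-bit words) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical host-side analogue: yields (value, is_valid) for element n.
// A null element yields (identity, false).
template <typename T>
std::pair<T, bool> value_and_validity(T const* data, std::uint32_t const* valid,
                                      T identity, std::size_t n) {
  bool const is_valid =
      (valid == nullptr) || (((valid[n / 32] >> (n % 32)) & 1u) != 0u);
  return std::make_pair(is_valid ? data[n] : identity, is_valid);
}

// Mean over valid elements only: the bool lets the caller count valid rows.
double mean_of_valid(std::vector<int> const& data, std::uint32_t const* valid) {
  long long sum = 0;
  std::size_t count = 0;
  for (std::size_t i = 0; i < data.size(); ++i) {
    std::pair<int, bool> const p = value_and_validity(data.data(), valid, 0, i);
    sum += p.first;          // nulls contribute the identity (0)
    if (p.second) ++count;   // only valid rows are counted
  }
  assert(count > 0);
  return static_cast<double>(sum) / static_cast<double>(count);
}
```

With a value-only iterator the sum would be the same, but the number of valid rows would be lost; the pair keeps both.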

◆ point_in_polygon()

gdf_column cudf::point_in_polygon ( gdf_column const &  polygon_latitudes,
gdf_column const &  polygon_longitudes,
gdf_column const &  query_point_latitudes,
gdf_column const &  query_point_longitudes 
)

Determine whether or not coordinates (query points) are completely inside a static polygon.

Note: The polygon must not have holes or intersect with itself, but it is not required to be convex.

The polygon is defined by a set of coordinates (latitudes and longitudes), where the first and last coordinates must have the same value (closed).

This function supports clockwise and counter-clockwise polygons.

If a query point is collinear with two contiguous polygon coordinates, then this query point is not considered inside.

polygon_latitudes and polygon_longitudes must have equal size.

query_point_latitudes and query_point_longitudes must have equal size.

All input columns must have the same data type (for numeric operations).

Parameters
[in] polygon_latitudes Column with latitudes of a polygon
[in] polygon_longitudes Column with longitudes of a polygon
[in] query_point_latitudes Column with latitudes of query points
[in] query_point_longitudes Column with longitudes of query points
Returns
gdf_column of type GDF_BOOL8 indicating whether the i-th query point is inside (true) or not (false)
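
One standard way to test containment in a simple, closed, possibly non-convex polygon is the even-odd (ray casting) rule. The host-side sketch below shows that technique for a single query point. It is an illustration only: the source does not state which algorithm libcudf uses, point_in_polygon_sketch is a hypothetical name, and points lying exactly on an edge are not treated specially here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Even-odd (ray casting) test on a closed polygon (first vertex == last vertex).
// Hypothetical host-side sketch; boundary points are not handled specially.
bool point_in_polygon_sketch(std::vector<double> const& poly_lat,
                             std::vector<double> const& poly_lon,
                             double lat, double lon) {
  assert(poly_lat.size() == poly_lon.size());
  bool inside = false;
  for (std::size_t i = 0; i + 1 < poly_lat.size(); ++i) {
    double const y0 = poly_lat[i], y1 = poly_lat[i + 1];
    double const x0 = poly_lon[i], x1 = poly_lon[i + 1];
    // Does this edge straddle the line of constant latitude through the point?
    if ((y0 > lat) != (y1 > lat)) {
      // Longitude at which the edge crosses that line.
      double const x_cross = x0 + (lat - y0) * (x1 - x0) / (y1 - y0);
      if (lon < x_cross) inside = !inside;  // one more crossing to the east
    }
  }
  return inside;
}
```

The crossing count does not depend on vertex order, so the sketch gives the same answer for clockwise and counter-clockwise polygons.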

◆ reduction()

gdf_scalar cudf::reduction ( const gdf_column col,
gdf_reduction_op  op,
gdf_dtype  output_dtype 
)

Computes the reduction of the values in all rows of a column.

This function does not detect overflows in reductions. Using a higher precision dtype may prevent overflow. Only min and max ops are supported for reduction of non-arithmetic types (date32, timestamp, category...). Null values are skipped by the operation. If the column is empty, the is_valid member of the output gdf_scalar will be false.

Parameters
[in] col Input column
[in] op The operator applied by the reduction
[in] output_dtype The computation and output precision. output_dtype must be a data type that is convertible from the input dtype. If the input column has arithmetic type, any arithmetic type can be specified. If the input column has non-arithmetic type (date32, timestamp, category...), the same type must be specified.
Returns
gdf_scalar holding the result value. If the reduction fails, the is_valid member of the output gdf_scalar will be false.
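
The interplay of null skipping, output precision and is_valid can be sketched on the host for a sum reduction. The types and names below (scalar_sketch, sum_reduction_sketch) are hypothetical, and a std::vector<bool> stands in for the null bitmask. The result for an all-null column is not specified above; this sketch returns the identity in that case as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a scalar result with a validity flag.
struct scalar_sketch {
  long long value;
  bool is_valid;
};

// Sums 32-bit inputs into a 64-bit accumulator, skipping null elements.
// An empty column yields is_valid == false, as documented above.
scalar_sketch sum_reduction_sketch(std::vector<int> const& col,
                                   std::vector<bool> const& valid) {
  assert(col.size() == valid.size());
  if (col.empty()) return {0, false};
  long long acc = 0;  // wider than the input type, so this sum cannot wrap
  for (std::size_t i = 0; i < col.size(); ++i) {
    if (valid[i]) acc += col[i];
  }
  return {acc, true};
}
```

Summing 2147483647 and 1 into a 64-bit accumulator gives 2147483648, a value a 32-bit signed accumulator cannot represent.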

◆ replace_nulls() [1/2]

gdf_column cudf::replace_nulls ( const gdf_column input,
const gdf_column replacement 
)

Replaces all null values in a column with corresponding values of another column.

Returns a column output such that if input[i] is valid, its value will be copied to output[i]. Otherwise, replacement[i] will be copied to output[i].

The input and replacement columns must be of same size and have the same data type.

Parameters
[in] input A gdf_column containing null values
[in] replacement A gdf_column whose values will replace null values in input
Returns
gdf_column Column with nulls replaced

◆ replace_nulls() [2/2]

gdf_column cudf::replace_nulls ( const gdf_column input,
const gdf_scalar replacement 
)

Replaces all null values in a column with a scalar.

Returns a column output such that if input[i] is valid, its value will be copied to output[i]. Otherwise, replacement will be copied to output[i].

replacement must have the same data type as input.

Parameters
[in] input A gdf_column containing null values
[in] replacement A gdf_scalar whose value will replace null values in input
Returns
gdf_column Column with nulls replaced
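
Both overloads follow the same element-wise rule, which the host-side sketch below illustrates. replace_nulls_sketch is a hypothetical name, and a std::vector<bool> stands in for the column's null bitmask.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column overload: output[i] = input[i] if valid, else replacement[i].
std::vector<int> replace_nulls_sketch(std::vector<int> const& input,
                                      std::vector<bool> const& valid,
                                      std::vector<int> const& replacement) {
  assert(input.size() == valid.size());
  assert(input.size() == replacement.size());  // columns must have equal size
  std::vector<int> output(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    output[i] = valid[i] ? input[i] : replacement[i];
  }
  return output;
}

// Scalar overload: every null element receives the same replacement value.
std::vector<int> replace_nulls_sketch(std::vector<int> const& input,
                                      std::vector<bool> const& valid,
                                      int replacement) {
  return replace_nulls_sketch(input, valid,
                              std::vector<int>(input.size(), replacement));
}
```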

◆ row_bitmask()

rmm::device_vector< bit_mask::bit_mask_t > cudf::row_bitmask ( cudf::table const &  table,
cudaStream_t  stream = 0 
)

Computes a bitmask indicating the presence of NULL values in rows of a table.

If a row i in table contains one or more NULL values, then bit i in the returned bitmask will be 0.

Otherwise, bit i will be 1.

Parameters
table The table to compute the row bitmask of.
Returns
rmm::device_vector< bit_mask::bit_mask_t > The bitmask indicating the presence of NULLs in a row
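
The rule above amounts to a logical AND of the validity of each column at row i. A host-side sketch, with std::vector<bool> standing in for the per-column bitmasks and row_bitmask_sketch as a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bit i of the result is 1 only if every column is valid at row i.
std::vector<bool> row_bitmask_sketch(
    std::vector<std::vector<bool>> const& column_valid, std::size_t num_rows) {
  std::vector<bool> row_valid(num_rows, true);
  for (auto const& col : column_valid) {
    assert(col.size() == num_rows);  // all columns have the same number of rows
    for (std::size_t r = 0; r < num_rows; ++r) {
      if (!col[r]) row_valid[r] = false;  // a single null clears the row's bit
    }
  }
  return row_valid;
}
```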

◆ scan()

void cudf::scan ( const gdf_column input,
gdf_column output,
gdf_scan_op  op,
bool  inclusive 
)

Computes the scan (a.k.a. prefix sum) of a column. The null values are skipped for the operation, and if an input element at i is null, then the output element at i will also be null.

Parameters
[in] input The input column for the scan
[out] output The pre-allocated output column
[in] op The operation of the scan
[in] inclusive The flag for applying an inclusive scan if true, an exclusive scan if false.
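
The null handling and the inclusive flag can be illustrated with a host-side sum scan. The sketch assumes that "skipped" means a null element leaves the running total unchanged. column_sketch and sum_scan_sketch are hypothetical names, and the value stored at a null output position is arbitrary (0 here).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a column: values plus one validity flag per row.
struct column_sketch {
  std::vector<int> data;
  std::vector<bool> valid;
};

// Prefix sum that skips nulls; a null input row produces a null output row.
column_sketch sum_scan_sketch(column_sketch const& in, bool inclusive) {
  assert(in.data.size() == in.valid.size());
  column_sketch out{std::vector<int>(in.data.size(), 0), in.valid};
  int running = 0;
  for (std::size_t i = 0; i < in.data.size(); ++i) {
    if (!in.valid[i]) continue;  // null in, null out; running total unchanged
    if (inclusive) {
      running += in.data[i];     // include the current element
      out.data[i] = running;
    } else {
      out.data[i] = running;     // only the elements before the current one
      running += in.data[i];
    }
  }
  return out;
}
```

For the input {1, null, 3, 4}, the inclusive scan gives {1, null, 4, 8} and the exclusive scan gives {0, null, 1, 4}.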

◆ scatter()

void cudf::scatter ( table const *  source_table,
gdf_index_type const  scatter_map[],
table destination_table 
)

Scatters the rows (including null values) of a set of source columns into a set of destination columns.

The two sets of columns must have equal numbers of columns.

Scatters the rows of the source columns into the destination columns according to a scatter map such that row "i" from the source columns will be scattered to row "scatter_map[i]" in the destination columns.

The data types of corresponding columns in the source and destination columns must be the same.

The number of elements in the scatter_map must equal the number of rows in the source columns.

If any index in scatter_map is outside the range of [0, num rows in destination_columns), the result is undefined.

If the same index appears more than once in scatter_map, the result is undefined.

Parameters
[in] source_table The columns whose rows will be scattered
[in] scatter_map An array that maps rows in the input columns to rows in the output columns.
[out] destination_table A preallocated set of columns with a number of rows equal in size to the maximum index contained in scatter_map

Returns
GDF_SUCCESS upon successful completion
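
The mapping "row i of the source goes to row scatter_map[i] of the destination" is sketched below for a single column on the host. scatter_sketch is a hypothetical name. Out-of-range indices are caught by an assertion here, whereas the text above leaves them undefined.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Writes source[i] to destination[scatter_map[i]] for every source row.
void scatter_sketch(std::vector<int> const& source,
                    std::vector<std::size_t> const& scatter_map,
                    std::vector<int>& destination) {
  assert(source.size() == scatter_map.size());  // one map entry per source row
  for (std::size_t i = 0; i < source.size(); ++i) {
    assert(scatter_map[i] < destination.size());
    destination[scatter_map[i]] = source[i];
  }
}
```

Destination rows that no map entry points to keep their previous contents.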

◆ slice()

std::vector< gdf_column * > cudf::slice ( gdf_column const &  input_column,
gdf_index_type const *  indices,
gdf_size_type  num_indices 
)

Slices a column (including null values) into a set of columns according to a set of indices.

The "slice" function divides part of the input column into multiple intervals of rows using the indices values and it stores the intervals into the output columns. Regarding the interval of indices, a pair of values are taken from the indices array in a consecutive manner. The pair of indices are left-closed and right-open.

The pairs of indices in the array are required to comply with the following conditions:
a, b belong to Range[0, input column size]
a <= b, where the position of a is less than or equal to the position of b.

Exceptional cases for the indices array are:
When the values in the pair are equal, the function returns an empty column.
When the values in the pair are 'strictly decreasing', the outcome is undefined.
When any of the values in the pair don't belong to the range[0, input column size), the outcome is undefined.
When the indices array is empty, an empty vector of columns is returned.

The output columns will be allocated by the function.

Example:
input: {10, 12, 14, 16, 18, 20, 22, 24, 26, 28}
indices: {1, 3, 5, 9, 2, 4, 8, 8}
output: {{12, 14}, {20, 22, 24, 26}, {14, 16}, {}}

Parameters
[in] input_column The input column whose rows will be sliced.
[in] indices A device array of indices that are used to take 'slices' of the input column.
Returns
A std::vector of gdf_column*, each of which may have a different number of rows, equal to the difference of two consecutive indices in the indices array.
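
The pairing of indices can be reproduced with a short host-side sketch that operates on std::vector instead of gdf_column. slice_sketch is a hypothetical name, and the sketch assumes the indices array holds an even number of entries, read as consecutive [a, b) pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Takes indices as consecutive [a, b) pairs and copies each interval.
std::vector<std::vector<int>> slice_sketch(std::vector<int> const& input,
                                           std::vector<std::size_t> const& indices) {
  assert(indices.size() % 2 == 0);
  std::vector<std::vector<int>> out;
  for (std::size_t i = 0; i + 1 < indices.size(); i += 2) {
    std::size_t const a = indices[i], b = indices[i + 1];
    assert(a <= b && b <= input.size());
    out.emplace_back(input.begin() + a, input.begin() + b);  // left-closed, right-open
  }
  return out;
}
```

Applied to the example input with indices {1, 3, 5, 9, 2, 4, 8, 8}, the sketch produces {{12, 14}, {20, 22, 24, 26}, {14, 16}, {}}.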

◆ split()

std::vector< gdf_column * > cudf::split ( gdf_column const &  input_column,
gdf_index_type const *  indices,
gdf_size_type  num_indices 
)

Splits a column (including null values) into a set of columns according to a set of indices.

The "split" function divides the input column into multiple intervals of rows using the indices values and it stores the intervals into the output columns. Regarding the interval of indices, a pair of values are taken from the indices array in a consecutive manner. The pair of indices are left-closed and right-open.

The indices array ('indices') is required to be monotonically non-decreasing. The indices in the array are required to comply with the following conditions:
a, b belong to Range[0, input column size]
a <= b, where the position of a is less than or equal to the position of b.

The split function will take a pair of indices from the indices array ('indices') in a consecutive manner. For the first pair, the function will take the value 0 and the first element of the indices array. For the last pair, the function will take the last element of the indices array and the size of the input column.

Exceptional cases for the indices array are:
When the values in the pair are equal, the function returns an empty column.
When the values in the pair are 'strictly decreasing', the outcome is undefined.
When any of the values in the pair don't belong to the range[0, input column size), the outcome is undefined.
When the indices array is empty, an empty vector of columns is returned.

The output columns are required to be preallocated. The columns may differ in size. The number of columns must be equal to the number of indices in the array plus one. The data types of the input column and the output columns must be the same.

Example:
input: {10, 12, 14, 16, 18, 20, 22, 24, 26, 28}
indices: {2, 5, 9}
output: {{10, 12}, {14, 16, 18}, {20, 22, 24, 26}, {28}}

Parameters
[in] input_column The input column whose rows will be split.
[in] indices A device array of indices that are used to divide the input column into multiple columns.
Returns
A std::vector of gdf_column*, each of which may have a different number of rows.
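
The implicit first pair (0, indices[0]) and last pair (indices[n-1], column size) can be reproduced with a short host-side sketch on std::vector. split_sketch is a hypothetical name. Unlike the description above, the sketch allocates its own outputs to stay self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Cuts the input at each index; n indices produce n + 1 pieces.
std::vector<std::vector<int>> split_sketch(std::vector<int> const& input,
                                           std::vector<std::size_t> const& indices) {
  std::vector<std::vector<int>> out;
  if (indices.empty()) return out;  // documented: empty indices, empty result
  std::size_t begin = 0;            // the first pair starts at 0
  for (std::size_t const end : indices) {
    assert(begin <= end && end <= input.size());  // non-decreasing, in range
    out.emplace_back(input.begin() + begin, input.begin() + end);
    begin = end;
  }
  out.emplace_back(input.begin() + begin, input.end());  // last pair ends at size
  return out;
}
```

Applied to the example input with indices {2, 5, 9}, the sketch produces {{10, 12}, {14, 16, 18}, {20, 22, 24, 26}, {28}}.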

◆ validate()

void cudf::validate ( const gdf_column column)

Ensures a gdf_column is valid, i.e. that its fields are consistent with each other, and logical in themselves, in representing a proper column.

Ensures the input is in a valid state representing a proper column. Specifically, ensures that all fields have valid values (rather than junk, uninitialized or declared-invalid values), and that they are consistent with each other.