Column functions
- legate_dataframe.lib.unaryop.cast(LogicalColumn col, dtype) → LogicalColumn
Cast a logical column to the desired data type.
- Parameters:
col – Logical column as input
dtype – The data type of the result.
- Return type:
Logical column of the same size as col, with data type dtype.
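The contract is that a cast keeps the column length and only changes the type. The following minimal pure-Python model illustrates this. It is not legate_dataframe code. It assumes a column can be represented as a Python list with None marking nulls, and cast_model is a hypothetical helper that takes a Python type in place of a dtype.

```python
def cast_model(values, target_type):
    """Elementwise cast: same length as the input, nulls (None) left as None."""
    return [None if v is None else target_type(v) for v in values]


print(cast_model([1, None, 3], float))  # [1.0, None, 3.0]
```

Whether nulls survive a cast unchanged in the real library is assumed here, not taken from the documentation above.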
- legate_dataframe.lib.unaryop.round(LogicalColumn col, int32_t digits, mode='half_to_even') → LogicalColumn
Round the values of a logical column to the given number of decimal places.
- Parameters:
col – Logical column as input
digits – Number of decimal places to round to.
mode – Rounding mode; currently “half_to_even” and “half_away_from_zero” are supported.
- Return type:
Logical column of the same size as col containing the rounded values.
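The two modes give different results only for ties (values exactly halfway between two candidates). The sketch below is a pure-Python model of that difference, not the library's implementation. It assumes a column is a list with None for nulls, and round_column is a hypothetical helper. Python's decimal module supplies both tie-breaking rules. Its ROUND_HALF_UP constant rounds ties away from zero.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

# Map the documented mode strings to decimal-module rounding rules.
# ROUND_HALF_UP in the decimal module rounds ties away from zero.
_MODES = {
    "half_to_even": ROUND_HALF_EVEN,
    "half_away_from_zero": ROUND_HALF_UP,
}


def round_column(values, digits, mode="half_to_even"):
    """Round each non-null value to `digits` decimal places; None stays None."""
    rounding = _MODES[mode]
    quantum = Decimal(1).scaleb(-digits)  # digits=2 -> 1E-2, i.e. 0.01
    out = []
    for v in values:
        if v is None:
            out.append(None)
        else:
            # Going through str() avoids binary float artifacts (2.675 is stored as 2.67499...).
            out.append(float(Decimal(str(v)).quantize(quantum, rounding=rounding)))
    return out


print(round_column([0.5, 1.5, 2.5, -2.5, None], 0))  # [0.0, 2.0, 2.0, -2.0, None]
print(round_column([0.5, 1.5, 2.5, -2.5, None], 0, mode="half_away_from_zero"))  # [1.0, 2.0, 3.0, -3.0, None]
```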
- legate_dataframe.lib.unaryop.unary_operation(LogicalColumn col, str op) → LogicalColumn
Performs a unary operation on all values in the column.
Note: For decimal32 and decimal64, only abs, ceil and floor are supported.
- Parameters:
col – Logical column as input
op – Name of the operation to perform; see the Arrow compute functions for the available names.
- Return type:
Logical column of the same size as col containing the result of the operation.
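The op argument is a function name that is applied independently to each value. The sketch below models this in plain Python. It is not the library's code. The name table is a hypothetical subset whose keys mirror a few Arrow compute function names. The model assumes a column is a list with None for nulls, and that nulls pass through unchanged.

```python
import math

# Hypothetical subset of operation names, modeled on Arrow compute function names.
_UNARY_OPS = {
    "abs": abs,
    "ceil": math.ceil,
    "floor": math.floor,
    "negate": lambda x: -x,
    "sqrt": math.sqrt,
}


def unary_operation_model(values, op):
    """Apply the named operation to each non-null value; None is passed through."""
    func = _UNARY_OPS[op]
    return [None if v is None else func(v) for v in values]


print(unary_operation_model([-1.5, None, 2.25], "ceil"))  # [-1, None, 3]
```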
- legate_dataframe.lib.binaryop.binary_operation(lhs: LogicalColumn | ScalarLike, rhs: LogicalColumn | ScalarLike, str op, output_type: DTypeLike) → LogicalColumn
Performs a binary operation between two columns or a column and a scalar.
The output contains the result of op(lhs[i], rhs[i]) for all 0 <= i < lhs.size(), where either lhs[i] or rhs[i] (but not both) can be replaced with a scalar value. Regardless of the operator, the validity of each output value is the logical AND of the validity of the two operands, except for NullMin and NullMax, which use a logical OR.
- Parameters:
lhs – The left operand
rhs – The right operand
op – Name of the Arrow compute function, e.g. “add” or “multiply”
output_type – The desired data type of the output column
- Returns:
Output column of type output_type containing the result of the binary operation
- Raises:
ValueError – if lhs and rhs are both scalars
RuntimeError – if lhs and rhs are different sizes
RuntimeError – if output_type dtype isn’t boolean for comparison and logical operations.
RuntimeError – if output_type dtype isn’t fixed-width
RuntimeError – if the operation is not supported for the types of lhs and rhs
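The broadcasting and validity rules above can be modeled in a few lines of plain Python. This is a sketch, not the library's implementation. It assumes a column is a list with None for nulls, and it treats any non-list operand as a scalar. The operator table is a hypothetical subset named after Arrow functions. The NullMin/NullMax exception and output_type are not modeled.

```python
import operator

# Hypothetical subset of op names, modeled on Arrow compute function names.
_BINARY_OPS = {"add": operator.add, "subtract": operator.sub, "multiply": operator.mul}


def binary_operation_model(lhs, rhs, op):
    """Model of op(lhs[i], rhs[i]): lists are columns, anything else is a scalar."""
    lhs_is_col = isinstance(lhs, list)
    rhs_is_col = isinstance(rhs, list)
    if not (lhs_is_col or rhs_is_col):
        raise ValueError("lhs and rhs cannot both be scalars")
    if lhs_is_col and rhs_is_col and len(lhs) != len(rhs):
        raise RuntimeError("lhs and rhs must have the same size")
    size = len(lhs) if lhs_is_col else len(rhs)
    func = _BINARY_OPS[op]
    out = []
    for i in range(size):
        a = lhs[i] if lhs_is_col else lhs  # a scalar is broadcast to every row
        b = rhs[i] if rhs_is_col else rhs
        # The output is valid only if both operands are valid (logical AND).
        out.append(None if a is None or b is None else func(a, b))
    return out


print(binary_operation_model([1, None, 3], [10, 20, None], "add"))  # [11, None, None]
print(binary_operation_model([1, 2], 10, "multiply"))  # [10, 20]
```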
- legate_dataframe.lib.copying.copy_if_else(LogicalColumn cond, lhs: LogicalColumn | ScalarLike, rhs: LogicalColumn | ScalarLike) → LogicalColumn
Performs a ternary if/else operation along the columns.
The result will contain the values of lhs[i] if cond[i] else rhs[i]. Both lhs and rhs may be scalar columns, in which case they are broadcast against cond. lhs and rhs must have the same type.
- Parameters:
cond – Boolean column deciding which column each result element is taken from.
lhs – The left operand, used where cond is true
rhs – The right operand, used where cond is false
- Return type:
Output column containing the result of the ternary if/else operation
- Raises:
ValueError – If lhs and rhs do not have the same type or cond is not boolean.
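The selection rule with scalar broadcasting can be modeled in plain Python as shown below. This is a sketch, not the library's code. It assumes columns are lists, and it treats any non-list operand as a broadcast scalar. The documentation above does not say how null conditions are handled, so this model rejects them. The type check between lhs and rhs is not modeled.

```python
def copy_if_else_model(cond, lhs, rhs):
    """Model of `lhs[i] if cond[i] else rhs[i]` with scalar broadcasting.

    Columns are lists; any non-list operand is treated as a scalar and
    broadcast against cond. Null (None) conditions are rejected rather than
    guessing how the library treats them.
    """
    if not all(isinstance(c, bool) for c in cond):
        raise ValueError("cond must be a non-null boolean column")

    def at(operand, i):
        # A scalar is broadcast: the same value is used for every row.
        return operand[i] if isinstance(operand, list) else operand

    return [at(lhs, i) if c else at(rhs, i) for i, c in enumerate(cond)]


print(copy_if_else_model([True, False, True], [1, 2, 3], 0))  # [1, 0, 3]
```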
- legate_dataframe.lib.copying.concatenate(columns)
Concatenate columns into a single long column.
Creates a new column by concatenating all input columns in order. At least one column must be given, and all columns must have the same type.
- Parameters:
columns – Iterable of logical columns.
- Return type:
Output column with as many rows as all input columns combined.
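The preconditions and the resulting row count can be illustrated with the following plain-Python sketch. It is not the library's implementation. It assumes columns are lists with None for nulls. The hypothetical helper concatenate_model uses a comparison of Python value types as a rough stand-in for the "same type" check.

```python
from itertools import chain


def concatenate_model(columns):
    """Join the rows of several columns (lists) end to end, in order."""
    columns = list(columns)  # accept any iterable of columns
    if not columns:
        raise ValueError("at least one column is required")
    # Rough stand-in for "same type": compare Python types of non-null values.
    types = {type(v) for col in columns for v in col if v is not None}
    if len(types) > 1:
        raise ValueError("all columns must have the same type")
    return list(chain.from_iterable(columns))


print(concatenate_model([[1, 2], [None], [3]]))  # [1, 2, None, 3]
```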
- legate_dataframe.lib.timestamps.to_timestamps(LogicalColumn col, timestamp_type: DTypeLike, str format_pattern) → LogicalColumn
Convert a strings column into timestamps using the provided format pattern.
The format pattern can include the following specifiers: %Y, %y, %m, %d, %H, %I, %p, %M, %S, %f, %z.
Please see to_timestamps() for details.
Warning
Invalid formats are not checked; the format pattern must be well defined as per the C++ API.
- Parameters:
col – Strings column to convert
timestamp_type – The timestamp type used for creating the output column
format_pattern – String specifying the timestamp format of the input strings
- Return type:
New datetime column
- Raises:
RuntimeError – if timestamp_type is not a timestamp type.
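Python's datetime.strptime accepts the same specifier letters. It is therefore a convenient way to test what a given pattern means before passing it to to_timestamps. The sketch below is only an analogy. It is not the library's parser, and edge cases such as how many digits %f consumes may differ from the C++ API. to_timestamps_model is a hypothetical helper that treats a column as a list with None for nulls.

```python
from datetime import datetime


def to_timestamps_model(strings, format_pattern):
    """Parse each non-null string with format_pattern; None stays None."""
    return [None if s is None else datetime.strptime(s, format_pattern) for s in strings]


parsed = to_timestamps_model(["2024-03-01 12:30:05", None], "%Y-%m-%d %H:%M:%S")
print(parsed[0])  # 2024-03-01 12:30:05
```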
- legate_dataframe.lib.timestamps.extract_timestamp_component(LogicalColumn col, str component) → LogicalColumn
Extract a component of each timestamp as an integer.
- Parameters:
col (LogicalColumn) – Column of timestamps
component – The component to extract: a string such as “year”, “month”, “day”, or “millisecond”. See the Arrow documentation on “Temporal component extraction” for the full list.
- Return type:
New int64 column
Notes
Unlike pandas and cudf, this function numbers the days of the week Monday through Sunday as 1 through 7, and microsecond_fraction does not include milliseconds.
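The two conventions in the Notes can be reproduced with the standard datetime module, as in the pure-Python model below. It is not the library's implementation. The component keys are illustrative, and only microsecond_fraction is taken from the Notes. A column is assumed to be a list with None for nulls.

```python
from datetime import datetime

# Illustrative component keys; only "microsecond_fraction" comes from the Notes above.
_GETTERS = {
    "year": lambda t: t.year,
    "month": lambda t: t.month,
    "day": lambda t: t.day,
    # Monday=1 ... Sunday=7, as described in the Notes.
    "day_of_week": lambda t: t.isoweekday(),
    "millisecond": lambda t: t.microsecond // 1000,
    # Only the sub-millisecond digits; milliseconds are excluded.
    "microsecond_fraction": lambda t: t.microsecond % 1000,
}


def extract_component_model(timestamps, component):
    """Extract one component from each non-null timestamp; None stays None."""
    get = _GETTERS[component]
    return [None if t is None else get(t) for t in timestamps]


ts = datetime(2024, 3, 4, 10, 0, 0, 123456)  # a Monday
print(extract_component_model([ts], "day_of_week"))  # [1]
print(extract_component_model([ts], "microsecond_fraction"))  # [456]
```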
- legate_dataframe.lib.reduction.reduce(LogicalColumn col, str op, output_type, *, initial=None)
Apply a reduction along a column.
- Parameters:
col – The column to reduce.
op – The operation to apply; must be one of “sum”, “mean”, “min”, “max”, “product”, or “count_valid”.
output_type – The result dtype, must be specified.
initial – Scalar column containing an initial value for the reduction.
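The six operations can be modeled in plain Python as shown below. This is a sketch under stated assumptions and not the library's implementation. It assumes a column is a list with None for nulls and that nulls are skipped. It also assumes initial takes part in sum, product, min, and max like one extra value. The documentation above does not spell out these behaviors, and output_type is not modeled. reduce_model is a hypothetical helper.

```python
import math

# Ops that accept an initial value in this model (an assumption).
_FOLDABLE = {"sum": sum, "product": math.prod, "min": min, "max": max}


def reduce_model(values, op, initial=None):
    """Reduce a column (list, None = null); nulls are skipped (assumed)."""
    if initial is not None and op not in _FOLDABLE:
        raise ValueError(f"this model does not support initial for {op!r}")
    valid = [v for v in values if v is not None]
    if op == "count_valid":
        return len(valid)
    if op == "mean":
        return sum(valid) / len(valid)
    if op not in _FOLDABLE:
        raise ValueError(f"unsupported op: {op!r}")
    if initial is not None:
        # Assumed semantics: initial participates like one extra value.
        valid = [initial] + valid
    return _FOLDABLE[op](valid)


print(reduce_model([1, None, 3], "sum"))  # 4
print(reduce_model([2, 3], "product", initial=10))  # 60
```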
- legate_dataframe.lib.replace.replace_nulls(LogicalColumn col, replacement: ScalarLike) → LogicalColumn
Return a new column with NULL entries replaced by replacement.
- Parameters:
col – Operand column
replacement – Value to replace NULLs with (currently limited to scalars).
- Return type:
Output column of the same type as col without NULL entries.
- Raises:
ValueError – if replacement is not of the correct scalar type.
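The behavior described above is shown in the following minimal plain-Python model. It is not the library's code. It assumes a column is a list with None for nulls. The hypothetical helper replace_nulls_model uses a Python type comparison as a rough stand-in for the scalar type check.

```python
def replace_nulls_model(values, replacement):
    """Replace each None with `replacement` after a rough type check."""
    column_types = {type(v) for v in values if v is not None}
    if column_types and type(replacement) not in column_types:
        raise ValueError("replacement is not of the column's type")
    return [replacement if v is None else v for v in values]


print(replace_nulls_model([1, None, 3], 0))  # [1, 0, 3]
```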
- legate_dataframe.lib.search.contains(LogicalColumn haystack, LogicalColumn needles) → LogicalColumn
Check if haystack contains the values in needles.
The result will contain boolean values indicating whether each element of needles exists in haystack, i.e. the elementwise needles[i] in haystack.
- Parameters:
haystack – Column of values to search against. This column is currently broadcast to all workers and assumed to be small.
needles – Column of values to check if they exist in the haystack.
- Returns:
Boolean column indicating which values of needles exist in haystack; it has the same size and nullability as needles.
- Raises:
ValueError – If the input columns have different types.
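Because haystack is assumed to be small, the check amounts to building one lookup set and probing it once per needle. The pure-Python model below illustrates this and is not the library's implementation. It assumes columns are lists with None for nulls. It also assumes that a null needle yields a null result, which the documentation above does not state. contains_model is a hypothetical helper.

```python
def contains_model(haystack, needles):
    """Elementwise `needles[i] in haystack`; a null needle gives None (assumed)."""
    lookup = {v for v in haystack if v is not None}  # small haystack: build the set once
    return [None if n is None else n in lookup for n in needles]


print(contains_model([1, 2, 3], [3, 4, None, 1]))  # [True, False, None, True]
```

The output has one entry per needle, regardless of the size of haystack.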
- legate_dataframe.lib.strings.match(str match_func, LogicalColumn column, str pattern) → LogicalColumn
Check if strings match a given pattern.
- Parameters:
match_func – The type of matching to perform: “starts_with”, “ends_with”, “match_substring”, or “match_substring_regex”. (Note that the “match_substring*” functions check for containment, not full matches.)
column – The column of string values to check
pattern – The pattern string to check for. A regular expression for “match_substring_regex”.
- Return type:
A boolean column indicating which values match the pattern
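The four match_func values correspond to familiar Python string operations. The plain-Python model below spells out the mapping. It is not the library's implementation. It assumes a column is a list with None for nulls. It uses Python's re syntax, which may differ in details from the regex engine the library uses. match_model is a hypothetical helper.

```python
import re


def match_model(match_func, column, pattern):
    """Model of the four matching modes; None stays None."""
    funcs = {
        "starts_with": lambda s: s.startswith(pattern),
        "ends_with": lambda s: s.endswith(pattern),
        # Containment, not a full match.
        "match_substring": lambda s: pattern in s,
        # re.search (not re.fullmatch), so this also tests for containment.
        "match_substring_regex": lambda s: re.search(pattern, s) is not None,
    }
    func = funcs[match_func]
    return [None if s is None else func(s) for s in column]


print(match_model("match_substring_regex", ["abc123", "xyz", None], r"\d+"))  # [True, False, None]
```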