Table functions

legate_dataframe.lib.stream_compaction.apply_boolean_mask(LogicalTable tbl, LogicalColumn boolean_mask)

Filter a table using a boolean mask.

Select all rows from the table where the boolean mask column is true (non-null and not false). The operation is stable.

Parameters:

tbl – The table to filter.
boolean_mask – The boolean mask to apply.

Returns:

The LogicalTable containing only the rows where the boolean_mask was true.
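
The masking rule described above (keep a row only when its mask entry is non-null and true, and preserve row order) can be illustrated with a small pure-Python model. This is a sketch of the semantics only, not the library's implementation: the table is modeled as a dict of equal-length lists, `None` stands in for a null, and `apply_boolean_mask_model` is a hypothetical helper name.

```python
def apply_boolean_mask_model(table, mask):
    """Keep rows whose mask entry is exactly True.

    `table` is a dict mapping column names to equal-length lists
    (an assumed stand-in for a LogicalTable); None models a null.
    """
    if any(len(col) != len(mask) for col in table.values()):
        raise ValueError("mask length must match the number of rows")
    # Null (None) and False entries are both dropped; order is preserved.
    keep = [i for i, m in enumerate(mask) if m is True]
    return {name: [col[i] for i in keep] for name, col in table.items()}


table = {"id": [1, 2, 3, 4], "name": ["a", "b", "c", "d"]}
mask = [True, None, False, True]
filtered = apply_boolean_mask_model(table, mask)
print(filtered)  # {'id': [1, 4], 'name': ['a', 'd']}
```

Row 2 is dropped because its mask entry is null, and row 3 because it is false; the surviving rows keep their original relative order.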

legate_dataframe.lib.groupby_aggregation.groupby_aggregation(LogicalTable table, keys: Iterable[str], column_aggregations: Iterable[Tuple[str, AggregationKind, str]]) → LogicalTable

Perform a groupby and aggregation in a single operation.

Warning

Non-default cudf::aggregation arguments are ignored; the default constructor is always used. As a result, only aggregations that have a default constructor are supported.

Parameters:

table – The table to group and aggregate.
keys – The names of the columns whose rows act as the groupby keys.
column_aggregations – A list of column aggregations to perform. Each column aggregation produces a column in the output table by performing an AggregationKind on a column in table. It consists of a tuple: (<input-column-name>, <aggregation-kind>, <output-column-name>). For example, ("x", SUM, "sum-of-x") produces a column named “sum-of-x” in the output table that contains, for each groupby key, one row holding the sum of the values in column “x”. Multiple column aggregations can share the same input column, but all output column names must be unique and must not conflict with the names of the key columns.

Returns:

A new logical table that contains the key columns and the aggregated columns
using the output column names and order specified in column_aggregations.
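
To make the (<input-column-name>, <aggregation-kind>, <output-column-name>) convention concrete, here is a minimal pure-Python model. It is a sketch under stated assumptions, not the library's code: tables are dicts of lists, the strings "SUM", "MIN" and "MAX" are hypothetical stand-ins for AggregationKind values, and `groupby_aggregation_model` is an illustrative name. Groups appear in first-seen order here; the section above does not state the row order of the real output.

```python
# Hypothetical stand-ins for AggregationKind values; only three are modeled.
AGGREGATIONS = {"SUM": sum, "MIN": min, "MAX": max}


def groupby_aggregation_model(table, keys, column_aggregations):
    keys = list(keys)
    column_aggregations = list(column_aggregations)
    out_names = [out for _, _, out in column_aggregations]
    # Output names must be unique and must not clash with the key columns.
    if len(set(out_names)) != len(out_names) or set(out_names) & set(keys):
        raise ValueError("output column names must be unique and differ from the keys")

    # Collect the row indices that belong to each distinct key tuple.
    n_rows = len(next(iter(table.values())))
    groups = {}
    for i in range(n_rows):
        groups.setdefault(tuple(table[k][i] for k in keys), []).append(i)

    # Key columns come first, then one column per aggregation, in the given order.
    result = {k: [] for k in keys}
    result.update({out: [] for out in out_names})
    for key, rows in groups.items():
        for k, value in zip(keys, key):
            result[k].append(value)
        for in_name, kind, out_name in column_aggregations:
            values = [table[in_name][i] for i in rows]
            result[out_name].append(AGGREGATIONS[kind](values))
    return result


table = {"k": ["a", "b", "a", "b"], "x": [1, 2, 3, 4]}
aggregated = groupby_aggregation_model(
    table, ["k"], [("x", "SUM", "sum-of-x"), ("x", "MAX", "max-of-x")]
)
print(aggregated)  # {'k': ['a', 'b'], 'sum-of-x': [4, 6], 'max-of-x': [3, 4]}
```

Both aggregations read the same input column "x", which is allowed as long as their output names differ from each other and from the key column "k".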

legate_dataframe.lib.join.join(LogicalTable lhs, LogicalTable rhs, *, lhs_keys: Iterable[str], rhs_keys: Iterable[str], JoinType join_type, lhs_out_columns: Optional[Iterable[str]] = None, rhs_out_columns: Optional[Iterable[str]] = None, null_equality compare_nulls=null_equality.EQUAL, BroadcastInput broadcast=BroadcastInput.AUTO)

Perform a join between the specified tables.

By default, the returned Table includes the columns from both lhs and rhs. In order to select the desired output columns, please use the lhs_out_columns and rhs_out_columns arguments. This can be useful to avoid duplicate key names and columns.

Parameters:

lhs – The left table
rhs – The right table
lhs_keys – The column names of the left table to join on
rhs_keys – The column names of the right table to join on
join_type – The JoinType such as INNER, LEFT, FULL
lhs_out_columns – Left table column names to include in the result. If None, all columns are included. All names in lhs_out_columns and rhs_out_columns must be unique.
rhs_out_columns – Right table column names to include in the result. If None, all columns are included. All names in lhs_out_columns and rhs_out_columns must be unique.
compare_nulls – Controls whether null join-key values should match or not
broadcast (BroadcastInput) – Can be RIGHT or LEFT to indicate that the corresponding input table is “broadcast” to all workers (i.e. copied in full). This can be much faster, as it avoids the more complex all-to-all communication. Defaults to AUTO, which may broadcast an input based on the data size.

Returns:

The result of the join, which includes the columns specified in lhs_out_columns
and rhs_out_columns (in that order).

Raises:

ValueError – If the number of elements in lhs_keys and rhs_keys does not match, or if the column names in lhs_out_columns and rhs_out_columns are not unique.
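
The interaction between the key lists, the output-column selection, and the ValueError conditions can be sketched with a pure-Python inner join. This is an illustrative model, not the library's implementation: tables are dicts of lists, only the INNER case is modeled, `None` keys compare equal (mirroring the null_equality.EQUAL default), and treating `None` output selections as "all columns" before the uniqueness check is an assumption. The name `inner_join_model` is hypothetical.

```python
def inner_join_model(lhs, rhs, *, lhs_keys, rhs_keys,
                     lhs_out_columns=None, rhs_out_columns=None):
    lhs_keys, rhs_keys = list(lhs_keys), list(rhs_keys)
    if len(lhs_keys) != len(rhs_keys):
        raise ValueError("lhs_keys and rhs_keys must have the same length")
    # None selects every column of that side (assumed to precede the check).
    lhs_out = list(lhs) if lhs_out_columns is None else list(lhs_out_columns)
    rhs_out = list(rhs) if rhs_out_columns is None else list(rhs_out_columns)
    out_names = lhs_out + rhs_out
    if len(set(out_names)) != len(out_names):
        raise ValueError("output column names must be unique")

    # Index the right-hand rows by their key tuple.
    n_lhs = len(next(iter(lhs.values())))
    n_rhs = len(next(iter(rhs.values())))
    index = {}
    for j in range(n_rhs):
        index.setdefault(tuple(rhs[k][j] for k in rhs_keys), []).append(j)

    # Emit one output row per matching (left, right) pair: left columns first.
    result = {name: [] for name in out_names}
    for i in range(n_lhs):
        for j in index.get(tuple(lhs[k][i] for k in lhs_keys), []):
            for name in lhs_out:
                result[name].append(lhs[name][i])
            for name in rhs_out:
                result[name].append(rhs[name][j])
    return result


lhs = {"key": [1, 2, 3], "a": ["x", "y", "z"]}
rhs = {"key": [2, 3, 4], "b": [20, 30, 40]}
# Selecting only "b" from the right side avoids a duplicate "key" column.
joined = inner_join_model(lhs, rhs, lhs_keys=["key"], rhs_keys=["key"],
                          rhs_out_columns=["b"])
print(joined)  # {'key': [2, 3], 'a': ['y', 'z'], 'b': [20, 30]}
```

In this model, calling the function without `rhs_out_columns` raises ValueError because both inputs contribute a column named "key", which is the situation the output-column arguments are meant to resolve.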

legate_dataframe.lib.sort.sort(LogicalTable tbl, list keys, *, list column_order=None, list null_precedence=None, stable=False)

Perform a sort of the table based on the given columns.

Parameters:

tbl – The table to sort
keys – The column names to sort by.
column_order – An Order.ASCENDING or Order.DESCENDING for each key denoting the final order for that column. Defaults to all ascending.
null_precedence – A NullOrder.BEFORE or NullOrder.AFTER for each key denoting whether NULL values are considered smaller (before) or larger (after) than any other value. By default nulls are sorted “after”, meaning they come last in an ascending sort and first in a descending sort.
stable – Whether to perform a stable sort (default False). Stable sort currently uses a less efficient merge and may not perform as well as it should.

Returns:

A new sorted table.
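
The way column_order and null_precedence combine can be checked with a small pure-Python model. It is a sketch of the documented semantics, not the library's algorithm: tables are dicts of lists, `None` models a null, the strings "ASCENDING"/"DESCENDING" and "BEFORE"/"AFTER" are stand-ins for the Order and NullOrder values, and `sort_model` is a hypothetical name. Unlike the real function with stable=False, this model is always stable because Python's sort is.

```python
def sort_model(table, keys, *, column_order=None, null_precedence=None):
    keys = list(keys)
    column_order = column_order or ["ASCENDING"] * len(keys)
    null_precedence = null_precedence or ["AFTER"] * len(keys)
    if not (len(keys) == len(column_order) == len(null_precedence)):
        raise ValueError("need one order and one null precedence per key")

    n_rows = len(next(iter(table.values())))
    order = list(range(n_rows))
    # Sort by the last key first; stable sorts make earlier keys take priority.
    for key, direction, nulls in reversed(list(zip(keys, column_order, null_precedence))):
        col = table[key]
        null_rank = 0 if nulls == "BEFORE" else 1  # nulls smaller or larger

        def sort_key(i, col=col, null_rank=null_rank):
            value = col[i]
            # Nulls get their own rank so they never compare against real values.
            return (null_rank, 0) if value is None else (1 - null_rank, value)

        order.sort(key=sort_key, reverse=(direction == "DESCENDING"))
    return {name: [col[i] for i in order] for name, col in table.items()}


table = {"x": [3, None, 1, 2], "tag": ["c", "n", "a", "b"]}
ascending = sort_model(table, ["x"])
descending = sort_model(table, ["x"], column_order=["DESCENDING"])
print(ascending["x"])   # [1, 2, 3, None]
print(descending["x"])  # [None, 3, 2, 1]
```

With the default "after" precedence the null is treated as the largest value, so it lands last in the ascending result and first in the descending one.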

Related options/enums