The names of columns in this DataFrame
A map of this DataFrame's Series names to their DataTypes
Return a new DataFrame with new columns added.
Casts each selected Series in this DataFrame to a new dtype (similar to static_cast in C++).
The map from column names to new dtypes.
Optional memoryResource: MemoryResource
The optional MemoryResource used to allocate the result Series's device memory.
DataFrame of Series cast to the new dtype
import {DataFrame, Series, Int32, Float32} from '@rapidsai/cudf';
const df = new DataFrame({
a: Series.new({type: new Int32, data: [0, 1, 1, 2, 2, 2]}),
b: Series.new({type: new Int32, data: [0, 1, 2, 3, 4, 4]})
});
df.cast({a: new Float32}); // returns df with a as Float32Series and b as Int32Series
Casts all the Series in this DataFrame to a new dtype (similar to static_cast in C++).
The new dtype.
Optional memoryResource: MemoryResource
The optional MemoryResource used to allocate the result Series's device memory.
DataFrame of Series cast to the new dtype
import {DataFrame, Series, Int32, Float32} from '@rapidsai/cudf';
const df = new DataFrame({
a: Series.new({type: new Int32, data: [0, 1, 1, 2, 2, 2]}),
b: Series.new({type: new Int32, data: [0, 1, 2, 3, 4, 4]})
});
df.castAll(new Float32); // returns df with a and b as Float32Series
Compute the smallest integer value not less than arg for all NumericSeries in the DataFrame
Optional memoryResource: MemoryResource
A DataFrame with the operation performed on all NumericSeries
Concat DataFrame(s) to the end of the caller, returning a new DataFrame.
The DataFrame(s) to concat to the end of the caller.
import {DataFrame, Series} from '@rapidsai/cudf';
const df = new DataFrame({
a: Series.new([1, 2, 3, 4]),
b: Series.new([1, 2, 3, 4]),
});
const df2 = new DataFrame({
a: Series.new([5, 6, 7, 8]),
});
df.concat(df2);
// return {
// a: [1, 2, 3, 4, 5, 6, 7, 8],
// b: [1, 2, 3, 4, null, null, null, null],
// }
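The null padding in the example above can be modeled without a GPU. The following is a minimal sketch in plain TypeScript, assuming that a column missing from one input is filled with one null per row of that input, as the example output suggests. `ColumnMap` and `concatColumns` are hypothetical names for illustration and are not part of `@rapidsai/cudf`.

```typescript
// Plain-TypeScript model of the null padding; ColumnMap and concatColumns are
// hypothetical names, not part of @rapidsai/cudf.
type ColumnMap = Record<string, (number | null)[]>;

function concatColumns(frames: ColumnMap[]): ColumnMap {
  // Collect the union of column names in first-seen order.
  const names: string[] = [];
  for (const frame of frames) {
    for (const key of Object.keys(frame)) {
      if (names.indexOf(key) < 0) { names.push(key); }
    }
  }
  const out: ColumnMap = {};
  for (const key of names) {
    const merged: (number | null)[] = [];
    for (const frame of frames) {
      // Row count of this frame, taken from its longest column.
      const numRows = Object.values(frame).reduce((n, col) => Math.max(n, col.length), 0);
      const column = frame[key];
      for (let row = 0; row < numRows; ++row) {
        // A frame that lacks the column contributes null for each of its rows.
        merged.push(column === undefined ? null : column[row]);
      }
    }
    out[key] = merged;
  }
  return out;
}

const concatResult = concatColumns([
  {a: [1, 2, 3, 4], b: [1, 2, 3, 4]},
  {a: [5, 6, 7, 8]},
]);
console.log(JSON.stringify(concatResult));
// {"a":[1,2,3,4,5,6,7,8],"b":[1,2,3,4,null,null,null,null]}
```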
Drops duplicate rows from a DataFrame
Determines whether to keep the first, last, or none of the duplicate items.
Determines whether nulls are handled as equal values.
Determines whether null values are inserted before or after non-null values.
List of columns to consider when dropping rows (all columns are considered by default).
Optional memoryResource: MemoryResource
Memory resource used to allocate the result Column's device memory.
A DataFrame without duplicate rows
Drops rows (or columns) containing NaN, provided the columns are of type float
Whether to drop rows (axis=0, default) or columns (axis=1) containing NaN
Drops every row (or column) containing fewer than thresh non-NaN values.
thresh=1 (default) drops rows (or columns) containing only NaN values (non-NaN < thresh(1)).
If axis=0, thresh=df.numColumns drops rows containing at least one NaN value (non-NaN values in a row < thresh(df.numColumns)).
If axis=1, thresh=df.numRows drops columns containing at least one NaN value (non-NaN values in a column < thresh(df.numRows)).
Optional subset: (string & keyof T)[] | Series<R>
List of float columns to consider when dropping rows (all float columns are considered by default).
Alternatively, when dropping columns, subset is a Series
DataFrame
import {DataFrame, Series} from '@rapidsai/cudf';
const df = new DataFrame({
a: Series.new([0, NaN, 2, NaN, 4, 4]),
b: Series.new([0, NaN, 2, 3, NaN, 4]),
c: Series.new([NaN, NaN, NaN, NaN, NaN, NaN])
});
// delete rows with all NaNs (default thresh=1)
df.dropNaNs(0);
// return {
// a: [0, 2, NaN, 4, 4], b: [0, 2, 3, NaN, 4],
// c: [NaN, NaN, NaN, NaN,NaN]
// }
// delete rows with at least one NaN
df.dropNaNs(0, df.numColumns);
// returns empty df, since each row contains at least one NaN
// delete columns with all NaNs (default thresh=1)
df.dropNaNs(1);
// returns {a: [0, NaN, 2, NaN, 4, 4], b: [0, NaN, 2, 3, NaN, 4]}
// delete columns with at least one NaN
df.dropNaNs(1, df.numRows);
// returns empty df, since each column contains at least one NaN
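The thresh rule for rows (axis=0) can be checked by hand with a small model. This is a sketch in plain TypeScript, not the library's implementation: `rowsKept` is a hypothetical helper that returns the indices of the rows that survive, treating a row as dropped when it has fewer than thresh non-NaN values. The same counting applies to dropNulls with null in place of NaN.

```typescript
// Hypothetical helper, not a cudf API: indices of the rows a given thresh keeps.
function rowsKept(columns: number[][], thresh: number): number[] {
  const numRows = columns.reduce((n, col) => Math.max(n, col.length), 0);
  const kept: number[] = [];
  for (let row = 0; row < numRows; ++row) {
    let nonNaN = 0;
    for (const col of columns) {
      if (!Number.isNaN(col[row])) { nonNaN += 1; }
    }
    // A row is dropped when it has fewer than `thresh` non-NaN values.
    if (nonNaN >= thresh) { kept.push(row); }
  }
  return kept;
}

// Same data as the dropNaNs example.
const colA = [0, NaN, 2, NaN, 4, 4];
const colB = [0, NaN, 2, 3, NaN, 4];
const colC = [NaN, NaN, NaN, NaN, NaN, NaN];

const keptDefault = rowsKept([colA, colB, colC], 1); // [0, 2, 3, 4, 5]: only the all-NaN row 1 goes
const keptStrict = rowsKept([colA, colB, colC], 3);  // []: every row has at least one NaN
console.log(keptDefault, keptStrict);
```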
Drops rows (or columns) containing nulls (*Note: only null values are dropped and not NaNs)
Whether to drop rows (axis=0, default) or columns (axis=1) containing nulls
Drops every row (or column) containing fewer than thresh non-null values.
thresh=1 (default) drops rows (or columns) containing only null values (non-null < thresh(1)).
If axis=0, thresh=df.numColumns drops rows containing at least one null value (non-null values in a row < thresh(df.numColumns)).
If axis=1, thresh=df.numRows drops columns containing at least one null value (non-null values in a column < thresh(df.numRows)).
Optional subset: Series<R> | (string & keyof T)[]
List of columns to consider when dropping rows (all columns are considered by default).
Alternatively, when dropping columns, subset is a Series
DataFrame
import {DataFrame, Series} from '@rapidsai/cudf';
const df = new DataFrame({
a: Series.new([0, null, 2, null, 4, 4]),
b: Series.new([0, null, 2, 3, null, 4]),
c: Series.new([null, null, null, null, null, null])
});
// delete rows with all nulls (default thresh=1)
df.dropNulls(0);
// return {
// a: [0, 2, null, 4, 4], b: [0, 2, 3, null, 4],
// c: [null, null, null, null, null]
// }
// delete rows with at least one null
df.dropNulls(0, df.numColumns);
// returns empty df, since each row contains at least one null
// delete columns with all nulls (default thresh=1)
df.dropNulls(1);
// returns {a: [0, null, 2, null, 4, 4], b: [0, null, 2, 3, null, 4]}
// delete columns with at least one null
df.dropNulls(1, df.numRows);
// returns empty df, since each column contains at least one null
Return sub-selection from a DataFrame from the specified boolean mask.
Optional memoryResource: MemoryResource
Names of List Columns to flatten. Defaults to all list Columns.
Optional includeNulls: boolean = true
Whether to retain null entries and map empty lists to null.
Optional memoryResource: MemoryResource
An optional MemoryResource used to allocate the result's device memory.
Names of List Columns to flatten. Defaults to all list Columns.
Optional includeNulls: boolean = true
Whether to retain null entries and map empty lists to null.
Optional memoryResource: MemoryResource
An optional MemoryResource used to allocate the result's device memory.
Compute the largest integer value not greater than arg for all NumericSeries in the DataFrame
Optional memoryResource: MemoryResource
A DataFrame with the operation performed on all NumericSeries
A Series of 8/16/32-bit signed or unsigned integer indices to gather.
If true, coerce rows that correspond to out-of-bounds indices
in the selection to null. If false, skips all bounds checking for selection values. Pass
false if you are certain that the selection contains only valid indices for better
performance. If false and there are out-of-bounds indices in the selection, the behavior
is undefined. Defaults to false.
Optional memoryResource: MemoryResource
An optional MemoryResource used to allocate the result's device memory.
Gathers the rows of the source columns according to selection, such that row "i"
in the resulting Table's columns will contain row selection[i] from the source columns. The
number of rows in the result table will be equal to the number of elements in selection. A
negative value i in the selection is interpreted as i+n, where n is the number of rows in
the source table.
For dictionary columns, the keys column component is copied and not trimmed if the gather results in abandoned key elements.
import {DataFrame, Series, Int32} from '@rapidsai/cudf';
const df = new DataFrame({
a: Series.new({type: new Int32, data: [0, 1, 2, 3, 4, 5]}),
b: Series.new([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
});
const selection = Series.new({type: new Int32, data: [2,4,5]});
df.gather(selection); // {a: [2, 4, 5], b: [2.0, 4.0, 5.0]}
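The index rules described above (negative wrap-around and optional nullification of out-of-bounds indices) can be sketched for a single column in plain TypeScript. `gatherColumn` and its `nullifyOutOfBounds` parameter are hypothetical names used for illustration. Since the documented behavior without bounds checking is undefined, the sketch throws instead, which is an assumption made so that mistakes are visible.

```typescript
// Hypothetical model of gather on one column; not the cudf implementation.
function gatherColumn<T>(
  source: T[], selection: number[], nullifyOutOfBounds = false): (T | null)[] {
  const n = source.length;
  return selection.map((i) => {
    // A negative index i is interpreted as i + n.
    const idx = i < 0 ? i + n : i;
    if (idx < 0 || idx >= n) {
      if (nullifyOutOfBounds) { return null; }
      // The documented behavior here is undefined; this sketch throws instead.
      throw new RangeError(`index ${i} is out of bounds for ${n} rows`);
    }
    return source[idx];
  });
}

const gatherSource = [0, 1, 2, 3, 4, 5];
const gatherPlain = gatherColumn(gatherSource, [2, 4, 5]);      // [2, 4, 5]
const gatherNegative = gatherColumn(gatherSource, [-1, 0]);     // [5, 0]
const gatherNullified = gatherColumn(gatherSource, [2, 9], true); // [2, null]
console.log(gatherPlain, gatherNegative, gatherNullified);
```

The result always has one entry per element of the selection, regardless of how many rows the source has.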
Return a group-by on a single column.
configuration for the groupby
Return a group-by on multiple columns.
configuration for the groupby
import {DataFrame, Series} from '@rapidsai/cudf';
const df = new DataFrame({
a: Series.new([0, 1, 1, 2, 2, 2]),
b: Series.new([0, 1, 2, 3, 4, 4]),
c: Series.new([1, 2, 3, 4, 5, 6])
})
df.groupby({by: ['a', 'b']}).max()
// {
// "a_b": [{"a": [2, 1, 1, 2, 0], "b": [4, 2, 1, 3, 0]}],
// "c": [6, 3, 2, 4, 1]
// }
Return whether the DataFrame has a Series.
Name of the Series to check for.
Returns the first n rows as a new DataFrame.
The number of rows to return.
import {DataFrame, Series, Int32} from '@rapidsai/cudf';
const df = new DataFrame({
a: Series.new({type: new Int32, data: [0, 1, 2, 3, 4, 5, 6]}),
b: Series.new([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
});
df.head();
// {a: [0, 1, 2, 3, 4], b: [0.0, 1.0, 2.0, 3.0, 4.0]}
df.head(1);
// {a: [0], b: [0.0]}
df.head(-1);
// throws index out of bounds error
Optional dataType: R | null
The dtype of the result Series (required if the DataFrame has mixed dtypes).
Optional memoryResource: MemoryResource
An optional MemoryResource used to allocate the result's device memory.
Series representing a packed row-major matrix of all the source DataFrame's Series.
import {DataFrame, Series} from '@rapidsai/cudf';
new DataFrame({
a: Series.new([1, 2, 3]),
b: Series.new([4, 5, 6]),
}).interleaveColumns()
// Float64Series [
// 1, 4, 2, 5, 3, 6
// ]
new DataFrame({
b: Series.new([ [0, 1, 2], [3, 4, 5], [6, 7, 8]]),
c: Series.new([[10, 11, 12], [13, 14, 15], [16, 17, 18]]),
}).interleaveColumns()
// ListSeries [
// [0, 1, 2],
// [10, 11, 12],
// [3, 4, 5],
// [13, 14, 15],
// [6, 7, 8],
// [16, 17, 18],
// ]
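The "packed row-major" layout in the examples above means that all values of row 0 come first, then all values of row 1, and so on. A minimal sketch in plain TypeScript, assuming equal-length columns; `interleave` is a hypothetical helper, not a cudf API.

```typescript
// Hypothetical model of row-major interleaving; assumes equal-length columns.
function interleave<T>(columns: T[][]): T[] {
  const numRows = columns.reduce((n, col) => Math.max(n, col.length), 0);
  const packed: T[] = [];
  for (let row = 0; row < numRows; ++row) {
    // Emit this row's value from every column before moving to the next row.
    for (const col of columns) { packed.push(col[row]); }
  }
  return packed;
}

const interleavedNumbers = interleave([[1, 2, 3], [4, 5, 6]]);
// [1, 4, 2, 5, 3, 6]
const interleavedLists = interleave([[[0, 1, 2], [3, 4, 5]], [[10, 11, 12], [13, 14, 15]]]);
// [[0, 1, 2], [10, 11, 12], [3, 4, 5], [13, 14, 15]]
console.log(interleavedNumbers, interleavedLists);
```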
Creates a DataFrame replacing any FloatSeries with a Bool8Series where true indicates the
value is NaN and false indicates the value is valid.
Optional memoryResource: MemoryResource
A DataFrame replacing instances of FloatSeries with a Bool8Series where true indicates the value is NaN
Creates a DataFrame of BOOL8 Series where true indicates the value is null and
false indicates the value is valid.
Optional memoryResource: MemoryResource
A DataFrame containing Series of 'BOOL8' where 'true' indicates the value is null
Join columns with other DataFrame.
the joined DataFrame
Return a Series containing the unbiased kurtosis result for each Series in the DataFrame.
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
A Series containing the unbiased kurtosis result for all Series in the DataFrame
Convert NaNs (if any) to nulls.
Optional subset: (keyof T)[]
List of float columns to consider to replace NaNs with nulls.
Optional memoryResource: MemoryResource
DataFrame
import {DataFrame, Series, Int32, Float32} from '@rapidsai/cudf';
const df = new DataFrame({
a: Series.new({type: new Int32, data: [0, 1, 2, 3, 4, 4]}),
b: Series.new({type: new Float32, data: [0, NaN, 2, 3, 4, 4]})
});
df.get("b").nullCount; // 0
const df1 = df.nansToNulls();
df1.get("b").nullCount; // 1
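NaN and null are distinct: NaN is a floating-point value stored in the column, while null marks a missing entry. The following plain-TypeScript sketch models the conversion on a single column under that assumption. `nansToNullsModel` and `countNulls` are hypothetical helpers, not cudf APIs.

```typescript
type FloatCell = number | null;

// Hypothetical model: replace every NaN with null and leave other entries unchanged.
function nansToNullsModel(values: FloatCell[]): FloatCell[] {
  return values.map((v) => (v !== null && Number.isNaN(v) ? null : v));
}

// Count the missing entries (NaN does not count as missing).
function countNulls(values: FloatCell[]): number {
  return values.filter((v) => v === null).length;
}

const floatColumn: FloatCell[] = [0, NaN, 2, 3, 4, 4];
const nullsBefore = countNulls(floatColumn);                  // 0
const nullsAfter = countNulls(nansToNullsModel(floatColumn)); // 1
console.log(nullsBefore, nullsAfter);
```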
Generate an ordering that sorts DataFrame columns in a specified way
mapping of column names to sort order specifications
Optional memoryResource: MemoryResource
An optional MemoryResource used to allocate the result's device memory.
Series containing the permutation indices for the desired sort order
import {DataFrame, Series, Int32, NullOrder} from '@rapidsai/cudf';
const df = new DataFrame({a: Series.new([null, 4, 3, 2, 1, 0])});
df.orderBy({a: {ascending: true, null_order: 'before'}});
// Int32Series [0, 5, 4, 3, 2, 1]
df.orderBy({a: {ascending: true, null_order: 'after'}});
// Int32Series [5, 4, 3, 2, 1, 0]
df.orderBy({a: {ascending: false, null_order: 'before'}});
// Int32Series [1, 2, 3, 4, 5, 0]
df.orderBy({a: {ascending: false, null_order: 'after'}});
// Int32Series [0, 1, 2, 3, 4, 5]
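The result is a permutation: entry k is the index of the source row that belongs at position k. The sketch below, in plain TypeScript, reproduces the two ascending outputs from the example and shows how such a permutation could be applied to a column. It covers ascending order only, and `ascendingOrder` and `takeRows` are hypothetical helpers, not the library's implementation.

```typescript
type SortCell = number | null;

// Hypothetical argsort for ascending order; nulls go first ('before') or last ('after').
function ascendingOrder(values: SortCell[], nullOrder: 'before' | 'after'): number[] {
  const rank = (i: number): number => {
    const v = values[i];
    if (v == null) { return nullOrder === 'before' ? -Infinity : Infinity; }
    return v;
  };
  const indices = values.map((_, i) => i);
  // Ties (including several nulls) keep their original relative order.
  return indices.sort((x, y) => {
    const rx = rank(x);
    const ry = rank(y);
    return rx < ry ? -1 : rx > ry ? 1 : x - y;
  });
}

// Apply a permutation: position k of the result holds row order[k] of the column.
function takeRows<T>(column: T[], order: number[]): T[] {
  return order.map((i) => column[i]);
}

const sortColumn: SortCell[] = [null, 4, 3, 2, 1, 0];
const orderNullsFirst = ascendingOrder(sortColumn, 'before'); // [0, 5, 4, 3, 2, 1]
const orderNullsLast = ascendingOrder(sortColumn, 'after');   // [5, 4, 3, 2, 1, 0]
const sortedColumn = takeRows(sortColumn, orderNullsLast);    // [0, 1, 2, 3, 4, null]
console.log(orderNullsFirst, orderNullsLast, sortedColumn);
```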
Return a new DataFrame with specified columns renamed.
Object mapping old to new Column names.
import {DataFrame, Series, Int32, Float32} from '@rapidsai/cudf';
const df = new DataFrame({
a: Series.new({type: new Int32, data: [0, 1, 1, 2, 2, 2]}),
b: Series.new({type: new Float32, data: [0, 1, 2, 3, 4, 4]})
});
df.rename({a: 'c'}) // returns df {b: [0, 1, 2, 3, 4, 4], c: [0, 1, 1, 2, 2, 2]}
Return a Series containing the unbiased skew result for each Series in the DataFrame.
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
A Series containing the unbiased skew result for all Series in the DataFrame
Generate a new DataFrame sorted in the specified way.
Optional memoryResource: MemoryResource
A new DataFrame of sorted values
import {DataFrame, Series, Int32} from '@rapidsai/cudf';
const df = new DataFrame({
a: Series.new([null, 4, 3, 2, 1, 0]),
b: Series.new([0, 1, 2, 3, 4, 5])
});
df.sortValues({a: {ascending: true, null_order: 'after'}})
// {a: [0, 1, 2, 3, 4, null], b: [5, 4, 3, 2, 1, 0]}
df.sortValues({a: {ascending: true, null_order: 'before'}})
// {a: [null, 0, 1, 2, 3, 4], b: [0, 5, 4, 3, 2, 1]}
df.sortValues({a: {ascending: false, null_order: 'after'}})
// {a: [4, 3, 2, 1, 0, null], b: [1, 2, 3, 4, 5, 0]}
df.sortValues({a: {ascending: false, null_order: 'before'}})
// {a: [null, 4, 3, 2, 1, 0], b: [0, 1, 2, 3, 4, 5]}
Compute the sum for all Series in the DataFrame.
Optional subset: readonly P[]
List of columns to select (all columns are considered by default).
Optional skipNulls: if true, NA and null values are dropped before the reduction is computed; if false, the reduction is computed directly.
Optional memoryResource: MemoryResource
Memory resource used to allocate the result Column's device memory.
A Series containing the sum of all values for each Series
Returns the last n rows as a new DataFrame.
The number of rows to return.
import {DataFrame, Series, Int32} from '@rapidsai/cudf';
const df = new DataFrame({
a: Series.new({type: new Int32, data: [0, 1, 2, 3, 4, 5, 6]}),
b: Series.new([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
});
df.tail();
// {a: [2, 3, 4, 5, 6], b: [2.0, 3.0, 4.0, 5.0, 6.0]}
df.tail(1);
// {a: [6], b: [6.0]}
df.tail(-1);
// throws index out of bounds error
Copy a Series to an Arrow vector in host memory
Serialize this DataFrame to CSV format.
Options controlling CSV writing behavior.
A node ReadableStream of the CSV data.
Write a DataFrame to ORC format.
File path or root directory path.
Options controlling ORC writing behavior.
Write a DataFrame to Parquet format.
File path or root directory path.
Options controlling Parquet writing behavior.
Return a string with a tabular representation of the DataFrame, pretty-printed according to the options given.
Read a CSV file from disk and create a cudf.DataFrame
Optional options: ReadCSVOptionsCommon<T>
Read a CSV file from disk and create a cudf.DataFrame
A GPU Dataframe object.