Series containing the utf8 characters of each string
Whether the Series contains null elements.
The number of elements in this Series.
The DeviceBuffer for the validity bitmask in GPU memory.
The number of null elements in this Series.
A boolean indicating whether a validity bitmask exists.
The number of child columns in this Series.
The offset of elements in this Series' underlying Column.
Series of integer offsets for each string
The data type of elements in the underlying data.
Copy the underlying device memory to host, and return an Iterator of the values.
Returns an Int32 series containing the number of bytes of each string in the Series.
The optional MemoryResource used to allocate the result Series's device memory.
Casts the values to a new dtype (similar to static_cast in C++).
The new dtype.
The optional MemoryResource used to allocate the result Series's device memory.
Series of the same size as the current Series containing the result of the cast operation.
Concat a Series to the end of the caller, returning a new Series of a common dtype.
The Series to concat to the end of the caller.
Returns a boolean series identifying rows which match the given regex pattern.
Regex pattern to match to each string.
The optional MemoryResource used to allocate the result Series's device memory.
The regex pattern strings accepted are described here:
https://docs.rapids.ai/api/libcudf/stable/md_regex.html
A RegExp may also be passed; however, all flags are ignored (only pattern.source is used).
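The note about RegExp flags can be illustrated in plain TypeScript, without a GPU. This is a minimal sketch of the described behavior; `patternSource` is a hypothetical helper introduced here for illustration and is not part of the library.

```typescript
// Hypothetical helper (not a library function): mimics "only pattern.source
// is used" by discarding any flags attached to a RegExp argument.
function patternSource(pattern: string | RegExp): string {
  return typeof pattern === "string" ? pattern : pattern.source;
}

// The "g" and "i" flags are dropped; only the pattern text survives.
const fromRegExp = patternSource(/^abc\d+/gi);
// The same pattern written as a plain string.
const fromString = patternSource("^abc\\d+");
```

In this sketch both calls produce the same pattern text, so a case-insensitive match requested through the `i` flag would have to be written into the pattern itself.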
Return a copy of this Series.
Return the number of non-null elements in the Series.
The number of non-null elements
Returns an Int32 series containing the number of times the given regex pattern matches in each string.
Regex pattern to match to each string.
The optional MemoryResource used to allocate the result Series's device memory.
The regex pattern strings accepted are described here:
https://docs.rapids.ai/api/libcudf/stable/md_regex.html
A RegExp may also be passed; however, all flags are ignored (only pattern.source is used).
Returns a new Series with the duplicate values of the original removed.
Determines whether or not to keep the duplicate items.
Determines whether nulls are handled as equal values.
Determines whether null values are inserted before or after non-null values.
Memory resource used to allocate the result Column's device memory.
series without duplicate values
Drops null values from the Series.
Memory resource used to allocate the result Column's device memory.
Series without null values.
Encode the Series values into integer labels.
The optional Series of values to encode into integers. Defaults to the unique elements in this Series.
The optional integer DataType to use for the returned Series. Defaults to Uint32.
The optional value used to indicate missing category. Defaults to -1.
The optional MemoryResource used to allocate the result Column's device memory.
A sequence of encoded integer labels with values between 0 and n-1 (where n is the number of categories), and nullSentinel for any null values.
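The label encoding described above can be sketched on the CPU with plain arrays. This is an illustration of the semantics only, not the library's GPU implementation; `encodeLabelsSketch` is a hypothetical name, and both the first-seen ordering of the default categories and the mapping of unknown values to the sentinel are assumptions.

```typescript
// CPU-side sketch of label encoding (hypothetical helper, not the library API).
function encodeLabelsSketch(
  values: (string | null)[],
  categories?: string[],
  nullSentinel: number = -1,
): number[] {
  // Assumption: default categories are the unique non-null values in
  // first-seen order; the library may order them differently.
  const cats =
    categories ??
    Array.from(new Set(values.filter((v): v is string => v !== null)));
  const index = new Map<string, number>();
  cats.forEach((c, i) => index.set(c, i));
  // Nulls map to the sentinel. Assumption: values missing from the
  // categories also map to the sentinel.
  return values.map((v) =>
    v === null ? nullSentinel : (index.get(v) ?? nullSentinel),
  );
}

const encoded = encodeLabelsSketch(["b", "a", null, "b"], ["a", "b"]);
```

With two categories the labels fall in the range 0 to 1, and the null row receives the sentinel.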
Fills a range of elements in a column out-of-place with a scalar value.
The scalar value to fill.
The starting index of the fill range (inclusive).
The index of the last element in the fill range (exclusive). Defaults to this.length.
The optional MemoryResource used to allocate the result Column's device memory.
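The half-open fill range (inclusive start, exclusive end) can be sketched with a plain array. This is a CPU-side illustration under stated assumptions, not the library's implementation; `fillSketch` is a hypothetical name.

```typescript
// Out-of-place fill over the half-open range [begin, end).
// Hypothetical helper working on a plain array instead of a GPU column.
function fillSketch<T>(
  data: T[],
  value: T,
  begin: number = 0,
  end: number = data.length,
): T[] {
  const out = data.slice(); // copy first, so the input stays untouched
  out.fill(value, begin, end); // Array.prototype.fill also excludes `end`
  return out;
}

const original = ["a", "b", "c", "d"];
const filled = fillSketch(original, "x", 1, 3);
```

Here only indices 1 and 2 are overwritten, and `original` keeps its contents because the fill happens on a copy.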
Fills a range of elements in-place in a column with a scalar value.
The scalar value to fill
The starting index of the fill range (inclusive)
The index of the last element in the fill range (exclusive)
Return a sub-selection of this Series using the specified boolean mask.
A Series of boolean values indicating whether the corresponding element in this Series is selected or ignored.
An optional MemoryResource used to allocate the result's device memory.
A Series of 8/16/32-bit signed or unsigned integer indices to gather.
If true, coerce rows that correspond to out-of-bounds indices in the selection to null. If false, skips all bounds checking for selection values. Pass false if you are certain that the selection contains only valid indices for better performance. If false and there are out-of-bounds indices in the selection, the behavior is undefined. Defaults to false.
An optional MemoryResource used to allocate the result's device memory.
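The effect of the out-of-bounds flag on a gather can be sketched with plain arrays. This is an illustration only; `gatherSketch` is a hypothetical name, and because the unchecked case is undefined behavior in the description, the sketch throws instead so the problem is visible.

```typescript
// CPU-side sketch of gathering rows by index (hypothetical helper).
function gatherSketch<T>(
  data: T[],
  indices: number[],
  nullifyOutOfBounds: boolean = false,
): (T | null)[] {
  return indices.map((i) => {
    if (i >= 0 && i < data.length) return data[i] as T;
    if (nullifyOutOfBounds) return null; // out-of-bounds rows become null
    // The description calls this case undefined behavior; the sketch
    // raises an error rather than reading invalid memory.
    throw new RangeError(`index ${i} is out of bounds`);
  });
}

const gathered = gatherSketch(["a", "b", "c"], [2, 0, 5], true);
```

With the flag set to true, the row for index 5 is null instead of triggering an invalid read.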
Applies a JSONPath string to each row of the Series, where each row is a valid JSON string. Returns a new StringSeries containing the retrieved JSON object strings.
The JSONPath string to be applied to each row of the input column
The optional MemoryResource used to allocate the result Series's device memory.
Returns the value at the specified index, copied to host memory.
The index in this Series to return a value for.
Returns the first n rows.
The number of rows to return.
Returns a new integer numeric series parsing hexadecimal values.
Any null entries will result in corresponding null entries in the output series.
Only characters [0-9] and [A-F] are recognized. When any other character is encountered, the parsing ends for that string. No interpretation is made on the sign of the integer.
Overflow of the resulting integer type is not checked. Each string is converted using an int64 type and then cast to the target integer type before storing it into the output series. If the resulting integer type is too small to hold the value, the stored value will be undefined.
Type of integer numeric series to return.
The optional MemoryResource used to allocate the result Series' device memory.
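The parsing rules above (stop at the first unrecognized character, no sign handling, nulls pass through) can be sketched per string. This is a CPU-side illustration that follows the description literally, so lowercase digits are not recognized here; `hexToIntegerSketch` is a hypothetical name, and the sketch uses a JavaScript number rather than an int64, so it does not model overflow.

```typescript
// Parses leading uppercase hexadecimal digits of one string.
// Hypothetical helper; not the library's GPU implementation.
function hexToIntegerSketch(s: string | null): number | null {
  if (s === null) return null; // null in, null out
  const digits = "0123456789ABCDEF";
  let result = 0;
  for (const ch of s) {
    const digit = digits.indexOf(ch);
    if (digit < 0) break; // parsing ends at the first unrecognized character
    result = result * 16 + digit;
  }
  return result;
}

const parsedHex = ["1F", "FFxyz", "7G9", null].map(hexToIntegerSketch);
```

In this sketch a string with no leading hexadecimal digit parses to 0, which is an assumption about how an empty prefix is treated.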
Converts IPv4 addresses into integers.
The IPv4 format is four groups of 1-3 digits [0-9] separated by 3 dots (e.g. 123.45.67.89). Each group can have a value in the range [0-255].
The four sets of digits are converted to integers and placed in 8-bit fields inside the resulting integer.
i0.i1.i2.i3 -> (i0 << 24) | (i1 << 16) | (i2 << 8) | (i3)
No checking is done on the format. If a string is not in IPv4 format, the resulting integer is undefined.
The resulting 32-bit integer is placed in an int64_t to avoid setting the sign-bit in an int32_t type. This could be changed if cudf supported a UINT32 type in the future.
Any null entries will result in corresponding null entries in the output column.
Returns a new Int64 numeric series parsing IPv4 addresses from the provided string series.
The optional MemoryResource used to allocate the result Series' device memory.
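The packing formula i0.i1.i2.i3 -> (i0 << 24) | (i1 << 16) | (i2 << 8) | (i3) can be sketched for a single address. This is a CPU-side illustration, not the library's implementation; `ipv4ToIntegerSketch` is a hypothetical name. Like the described function, it performs no format checking.

```typescript
// Packs the four octets of an IPv4 string into one non-negative integer.
// Hypothetical helper; assumes the input is already well-formed.
function ipv4ToIntegerSketch(address: string): number {
  const [i0, i1, i2, i3] = address.split(".").map(Number);
  // JavaScript's << works on signed 32-bit integers, so (192 << 24) would be
  // negative. Multiplying by 2^24 keeps the result positive, which mirrors
  // the reason the description gives for storing the value in an int64.
  return i0 * 0x1000000 + (i1 << 16) + (i2 << 8) + i3;
}

const packed = ipv4ToIntegerSketch("192.168.0.1");
```

The multiplication in place of `i0 << 24` is the only departure from the formula as written, and it exists purely to avoid the sign bit.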
Returns a boolean column identifying strings in which all characters are valid for conversion to integers from hex.
The output row entry will be set to true if the corresponding string element has at least one character in [0-9A-Za-z]. Also, the string may start with '0x'.
The optional MemoryResource used to allocate the result Series's device memory.
Returns a boolean column identifying strings in which all characters are valid for conversion to integers from IPv4 format.
The output row entry will be set to true if the corresponding string element has the following format xxx.xxx.xxx.xxx where xxx is integer digits between 0-255.
The optional MemoryResource used to allocate the result Series's device memory.
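The xxx.xxx.xxx.xxx check can be sketched for a single string. This is a CPU-side illustration of the described rule, not the library's implementation; `isIpv4Sketch` is a hypothetical name, and the exact treatment of edge cases such as leading zeros is an assumption.

```typescript
// Returns true when the string has four dot-separated groups of
// 1-3 digits, each with a value between 0 and 255.
// Hypothetical helper; edge-case handling may differ from the library.
function isIpv4Sketch(s: string): boolean {
  const parts = s.split(".");
  return (
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255)
  );
}

const ipv4Checks = ["192.168.0.1", "256.1.1.1", "1.2.3", "a.b.c.d"].map(isIpv4Sketch);
```

Running a check like this before converting addresses to integers is useful because the conversion itself performs no format checking.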
Creates a Series of BOOL8 elements where true indicates the value is valid and false indicates the value is null.
Memory resource used to allocate the result Column's device memory.
A non-nullable Series of BOOL8 elements with false representing null values.
Creates a Series of BOOL8 elements where true indicates the value is null and false indicates the value is valid.
Memory resource used to allocate the result Column's device memory.
A non-nullable Series of BOOL8 elements with true representing null values.
Returns an Int32 series containing the length of each string in the Series.
The optional MemoryResource used to allocate the result Series's device memory.
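The byte count and the length of a string differ whenever a string contains multi-byte UTF-8 characters. The sketch below illustrates the difference on the CPU; both helper names are hypothetical, and treating "length" as the number of Unicode code points is an assumption about the library's definition.

```typescript
// Number of bytes the string occupies when encoded as UTF-8.
function utf8ByteCountSketch(s: string): number {
  let bytes = 0;
  for (const ch of s) {
    const cp = ch.codePointAt(0) as number;
    // Standard UTF-8 encoding widths by code point range.
    bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return bytes;
}

// Number of characters, counted as Unicode code points (an assumption).
function charCountSketch(s: string): number {
  return Array.from(s).length;
}

const accented = "h\u00e9llo"; // contains one 2-byte character
const accentedBytes = utf8ByteCountSketch(accented);
const accentedChars = charCountSketch(accented);
```

For pure ASCII strings the two counts are equal; they diverge only for characters outside the ASCII range.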
Returns a boolean series identifying rows which match the given regex pattern only at the beginning of the string.
Regex pattern to match to each string.
The optional MemoryResource used to allocate the result Series's device memory.
The regex pattern strings accepted are described here:
https://docs.rapids.ai/api/libcudf/stable/md_regex.html
A RegExp may also be passed; however, all flags are ignored (only pattern.source is used).
Returns the n largest element(s).
The number of values to retrieve.
Determines whether to keep the first or last of any duplicate values.
Returns the n smallest element(s).
The number of values to retrieve.
Determines whether to keep the first or last of any duplicate values.
Generate an ordering that sorts the Series in a specified way.
whether to sort ascending (true) or descending (false)
whether nulls should sort before or after other values
An optional MemoryResource used to allocate the result's device memory.
Series containing the permutation indices for the desired sort order
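The idea of returning permutation indices rather than sorted values can be sketched with plain arrays. This is a CPU-side illustration, not the library's implementation; `orderBySketch` is a hypothetical name, and placing nulls strictly before or after all other values regardless of sort direction is an assumption.

```typescript
// Returns the indices that would sort `values` in the requested order.
// Hypothetical helper operating on a plain array.
function orderBySketch(
  values: (number | null)[],
  ascending: boolean,
  nullOrder: "before" | "after",
): number[] {
  const indices = values.map((_, i) => i);
  return indices.sort((a, b) => {
    const x = values[a];
    const y = values[b];
    if (x === null && y === null) return 0;
    // Assumption: null placement does not depend on the sort direction.
    if (x === null) return nullOrder === "before" ? -1 : 1;
    if (y === null) return nullOrder === "before" ? 1 : -1;
    const cmp = x < y ? -1 : x > y ? 1 : 0;
    return ascending ? cmp : -cmp;
  });
}

const data = [3, null, 1, 2];
const order = orderBySketch(data, true, "after");
// Applying the permutation yields the sorted values.
const sortedByOrder = order.map((i) => data[i]);
```

Gathering the original values with the returned indices produces the sorted sequence, which is why the permutation can also be used to reorder other columns consistently.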
Add padding to each string using a provided character.
If the string is already width or more characters, no padding is performed. No strings are truncated.
Null string entries result in null entries in the output column.
The minimum number of characters for each string.
Where to place the padding characters. Default is pad right (left justify).
Single UTF-8 character to use for padding. Default is the space character.
The optional MemoryResource used to allocate the result Column's device memory.
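The padding rules above (minimum width, no truncation, nulls preserved) can be sketched per string. This is a CPU-side illustration, not the library's implementation; `padSketch` is a hypothetical name that models only left and right padding, and it counts UTF-16 code units, which matches character counts for ASCII input only.

```typescript
// Pads one string to at least `width` characters with `fill`.
// Hypothetical helper; mirrors the described defaults (pad right, space).
function padSketch(
  s: string | null,
  width: number,
  side: "left" | "right" = "right",
  fill: string = " ",
): string | null {
  if (s === null) return null; // null entries stay null
  // padStart/padEnd never truncate, matching "No strings are truncated".
  return side === "left" ? s.padStart(width, fill) : s.padEnd(width, fill);
}

const padded = [
  padSketch("ab", 5),
  padSketch("ab", 5, "left", "0"),
  padSketch("abcdef", 3),
  padSketch(null, 4),
];
```

Left padding with '0', as in the second call, is the special case that a dedicated zero-fill helper would cover.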
Returns a set of 3 columns by splitting each string using the specified delimiter.
The number of rows in the output columns will be the same as the input column. The first column will contain the first tokens of each string as a result of the split. The second column will contain the delimiter. The third column will contain the remaining characters of each string after the delimiter.
Any null string entries return corresponding null output columns.
UTF-8 encoded string indicating where to split each string. Default of empty string indicates split on whitespace.
The optional MemoryResource used to allocate the result Column's device memory.
3 new string columns representing before the delimiter, the delimiter, and after the delimiter.
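The three-way split can be sketched for a single string. This is a CPU-side illustration, not the library's implementation; `partitionSketch` is a hypothetical name, it requires a non-empty delimiter (the whitespace default is not modeled), and returning the whole string followed by two empty strings when the delimiter is absent is an assumption.

```typescript
// Splits one string at the first occurrence of `delimiter` into
// [before, delimiter, after]. Hypothetical helper.
function partitionSketch(
  s: string | null,
  delimiter: string,
): [string | null, string | null, string | null] {
  if (s === null) return [null, null, null]; // null row in every output column
  const at = s.indexOf(delimiter);
  // Assumption: with no delimiter present, the first part holds everything.
  if (at < 0) return [s, "", ""];
  return [s.slice(0, at), delimiter, s.slice(at + delimiter.length)];
}

const parts = partitionSketch("a_b_c", "_");
```

Only the first delimiter splits the string; later occurrences remain in the third part.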
Replace null values with a scalar value.
The scalar value to use in place of nulls.
The optional MemoryResource used to allocate the result Column's device memory.
Replace null values with the corresponding elements from another Series.
The Series to use in place of nulls.
The optional MemoryResource used to allocate the result Column's device memory.
Replace null values with the non-null value following the null value in the same series.
The optional MemoryResource used to allocate the result Column's device memory.
Replace null values with the non-null value preceding the null value in the same series.
The optional MemoryResource used to allocate the result Column's device memory.
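The two directional null replacements can be sketched with plain arrays. This is a CPU-side illustration, not the library's implementation; both helper names are hypothetical, and leaving a null in place when no non-null neighbor exists in the given direction is an assumption.

```typescript
// Replaces each null with the closest non-null value before it.
function fillFromPrecedingSketch<T>(values: (T | null)[]): (T | null)[] {
  let last: T | null = null;
  return values.map((v) => {
    if (v !== null) last = v;
    return last; // stays null until the first non-null value is seen
  });
}

// Replaces each null with the closest non-null value after it,
// implemented by running the preceding fill over the reversed input.
function fillFromFollowingSketch<T>(values: (T | null)[]): (T | null)[] {
  return fillFromPrecedingSketch([...values].reverse()).reverse();
}

const gaps = [null, 1, null, null, 4, null];
const filledPreceding = fillFromPrecedingSketch(gaps);
const filledFollowing = fillFromFollowingSketch(gaps);
```

The leading null survives the preceding fill and the trailing null survives the following fill, since neither has a donor value in the required direction.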
For each string in the column, replaces any character sequence matching the given pattern with the provided replacement string.
Null string entries will return null output string entries.
Position values are 0-based meaning position 0 is the first character of each string.
This function can be used to insert a string at a specific position by specifying the same position value for start and stop. The repl string can be appended to each string by specifying -1 for both start and stop.
The regular expression pattern to search within each string.
The string used to replace the matched sequence in each string. Default is an empty string.
The maximum number of times to replace the matched pattern within each string. Default replaces every substring that is matched.
The optional MemoryResource used to allocate the result Column's device memory.
New strings column with matching elements replaced.
Replaces each string in the column with the provided repl string within the [start,stop) character position range.
Null string entries will return null output string entries.
Position values are 0-based meaning position 0 is the first character of each string.
This function can be used to insert a string at a specific position by specifying the same position value for start and stop. The repl string can be appended to each string by specifying -1 for both start and stop.
Replacement string for specified positions found.
Start position where repl will be added. Default is 0, first character position.
End position (exclusive) to use for replacement. Default of -1 specifies the end of each string.
The optional MemoryResource used to allocate the result Column's device memory.
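The [start,stop) replacement, including the insert and append cases, can be sketched for a single string. This is a CPU-side illustration, not the library's implementation; `replaceSliceSketch` is a hypothetical name, and clamping out-of-range positions to the string length is an assumption.

```typescript
// Replaces the characters in [start, stop) of one string with `repl`.
// Hypothetical helper; -1 stands for the end of the string.
function replaceSliceSketch(
  s: string | null,
  repl: string,
  start: number = 0,
  stop: number = -1,
): string | null {
  if (s === null) return null; // null entries stay null
  const chars = Array.from(s); // index by character, not UTF-16 code unit
  // Assumption: positions past the end are clamped to the string length.
  const begin = start < 0 ? chars.length : Math.min(start, chars.length);
  const end = stop < 0 ? chars.length : Math.min(stop, chars.length);
  return (
    chars.slice(0, begin).join("") +
    repl +
    chars.slice(Math.max(begin, end)).join("")
  );
}

const replaced = replaceSliceSketch("hello", "XY", 1, 3); // replace "el"
const inserted = replaceSliceSketch("hello", "-", 2, 2); // insert at position 2
const appended = replaceSliceSketch("hello", "!", -1, -1); // append
```

Equal start and stop values select an empty range, so nothing is removed and `repl` is simply inserted at that position.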
Returns a new series with reversed elements.
An optional MemoryResource used to allocate the result's device memory.
Scatters a single value into this Series according to the provided indices.
A column of values to be scattered into this Series
A column of integral indices that indicate the rows in this Series to be replaced by value.
An optional MemoryResource used to allocate the result's device memory.
Scatters a column of values into this Series according to provided indices.
A column of integral indices that indicate the rows in this Series to be replaced by value.
An optional MemoryResource used to allocate the result's device memory.
The null-mask. Valid values are marked as 1; otherwise 0. The mask bit given the data index idx is computed as:
(mask[idx // 8] >> (idx % 8)) & 1
The number of null values. If None, it is calculated automatically.
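The mask-bit formula (mask[idx // 8] >> (idx % 8)) & 1 can be sketched on a host-side byte array. This is an illustration of the bit layout only; the helper names are hypothetical, and the real mask lives in device memory.

```typescript
// Reads the validity bit for element `idx` from a packed bitmask.
// Bit value 1 means valid, 0 means null.
function isValidSketch(mask: Uint8Array, idx: number): boolean {
  return ((mask[Math.floor(idx / 8)] >> (idx % 8)) & 1) === 1;
}

// Counts nulls by inspecting the first `length` bits of the mask.
function countNullsSketch(mask: Uint8Array, length: number): number {
  let nulls = 0;
  for (let idx = 0; idx < length; idx++) {
    if (!isValidSketch(mask, idx)) nulls++;
  }
  return nulls;
}

// Bits are numbered from the least significant bit of each byte:
// byte 0 = 0b00001101 marks elements 0, 2 and 3 as valid,
// byte 1 = 0b00000001 marks element 8 as valid.
const validityMask = new Uint8Array([0b00001101, 0b00000001]);
const nullsInNine = countNullsSketch(validityMask, 9);
```

Counting zero bits over the first `length` positions is one way the null count could be derived when it is not supplied.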
Sets the value at the specified index.
The index in this Series to set a value for.
The value to set at index.
Sets the values at the specified indices.
The indices in this Series to set values for.
The values to set at the Series of indices.
Generate a new Series that is sorted in a specified way.
Whether to sort ascending (true) or descending (false). Default: true.
Whether nulls should sort before or after other values. Default: before.
An optional MemoryResource used to allocate the result's device memory.
Sorted values
Splits a StringSeries along the delimiter.
Optional delimiter.
Series with new splits determined by the delimiter.
Returns the last n rows.
The number of rows to return.
Copy the underlying device memory to host and return an Array (or TypedArray) of the values.
Copy a Series to an Arrow vector in host memory
Return a string with a tabular representation of the Series, pretty-printed according to the options given.
Returns a new Series with only the unique values that were found in the original
Determines whether nulls are handled as equal values.
Memory resource used to allocate the result Column's device memory.
series without duplicate values
Returns an object with keys "value" and "count" whose respective values are new Series containing the unique values in the original series and the number of times they occur in the original series.
object with keys "value" and "count"
Add '0' as padding to the left of each string.
If the string is already width or more characters, no padding is performed. No strings are truncated.
This is equivalent to pad(width, 'left', '0') but is more optimized for this special case.
Null string entries result in null entries in the output column.
The minimum number of characters for each string.
The optional MemoryResource used to allocate the result Column's device memory.
Row-wise concatenates the given list of strings series and returns a single string series result.
List of string series to concatenate.
Options for the concatenation
New series with concatenated results.
Create a new cudf.Series from an Apache Arrow Vector
Create a new cudf.Series from SeriesProps or a cudf.Column
Create a new cudf.Int8Series
Create a new cudf.Int16Series
Create a new cudf.Int32Series
Create a new cudf.Uint8Series
Create a new cudf.Uint16Series
Create a new cudf.Uint32Series
Create a new cudf.Uint64Series
Create a new cudf.Float32Series
Create a new cudf.StringSeries
Create a new cudf.Float64Series
Create a new cudf.Int64Series
Create a new cudf.Bool8Series
Create a new cudf.TimestampMillisecondSeries
Create a new cudf.ListSeries that contains cudf.StringSeries elements.
Create a new cudf.ListSeries that contains cudf.Float64Series elements.
Create a new cudf.ListSeries that contains cudf.Int64Series elements.
Create a new cudf.ListSeries that contains cudf.Bool8Series elements.
Create a new cudf.ListSeries that contains cudf.TimestampMillisecondSeries elements.
Constructs a Series from a text file path.
Path of the input file.
Optional delimiter.
StringSeries from the file, split by delimiter.
Constructs a Series with a sequence of values.
Options for creating the sequence
Series with the sequence
A Series of utf8-string values in GPU memory.