Input/Output#

Parquet files#

legate_dataframe.lib.parquet.parquet_read(files, *, columns=None, ignore_row_groups=None) → LogicalTable#

Read Parquet files into a logical table

Parameters:
  • files (str, Path, or iterable of paths) – If a string, glob.glob is used to conveniently load multiple files; otherwise, must be a path or an iterable of paths (or strings).

  • columns – List of strings selecting a subset of columns to read.

  • ignore_row_groups

    If set to True, the read operation will not be chunked into row groups. When row groups are large, this may lead to better resource use and more efficient reads. Note that temporary resource use may be higher due to the different approach to reading the data.

    Note

    The Python-side default is currently set to True when LDF_PREFER_EAGER_ALLOCATIONS is used, as it helps with streaming. We expect future improvements to ignore_row_groups; as of now, at least on the CPU, it may not be beneficial even with large row groups.

Returns:

The read logical table.

See also

parquet_write

Write parquet data
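
A minimal usage sketch; the file pattern and column names below are hypothetical placeholders:

    from legate_dataframe.lib.parquet import parquet_read

    # A string is passed through glob.glob, so every matching file is
    # read into a single logical table; "a" and "b" are placeholder
    # column names.
    tbl = parquet_read("/path/to/output/part.*.parquet", columns=["a", "b"])

    # Read without chunking into row groups; see the note above for the
    # current caveats of this option.
    tbl = parquet_read("/path/to/output/part.*.parquet", ignore_row_groups=True)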

legate_dataframe.lib.parquet.parquet_read_array(files, *, columns=None, null_value=None, type=None) → LogicalArray#

Read Parquet files into a logical array

To successfully read the files, all selected columns must share the same type, and that type must be compatible with legate (currently only numeric types).

Parameters:
  • files (str, Path, or iterable of paths) – If a string, glob.glob is used to conveniently load multiple files; otherwise, must be a path or an iterable of paths (or strings).

  • columns – List of strings selecting a subset of columns to read.

  • null_value (legate.core.Scalar or None) – If given (not None), the result will not have a null mask; null values are instead replaced with this value.

  • type (legate.core.Type or None) – The desired result legate type. If given, columns are cast to this type. If not given, the dtype is inferred, but all columns must then share the same type.

Returns:

The read logical array.

See also

parquet_read

Read parquet data into a table
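
A usage sketch; it assumes legate.core.types exposes float64 and that legate.core.Scalar can be constructed from a value and a legate type, and the file pattern and column names are placeholders:

    import legate.core.types as ty
    from legate.core import Scalar
    from legate_dataframe.lib.parquet import parquet_read_array

    # All selected columns must share one numeric type; "x" and "y" are
    # placeholder column names.
    arr = parquet_read_array("/path/to/output/part.*.parquet", columns=["x", "y"])

    # Cast everything to float64 and replace nulls with 0.0, so the
    # resulting array carries no null mask.
    arr = parquet_read_array(
        "/path/to/output/part.*.parquet",
        null_value=Scalar(0.0, ty.float64),
        type=ty.float64,
    )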

legate_dataframe.lib.parquet.parquet_write(LogicalTable tbl, path: pathlib.Path | str) → None#

Write logical table to Parquet files

Each partition will be written to a separate file.

Parameters:
  • tbl (LogicalTable) – The table to write.

  • path (str or pathlib.Path) – Destination directory for data.

Files will be created in the specified output directory using the convention part.0.parquet, part.1.parquet, part.2.parquet, and so on for each partition in the table:

    /path/to/output/
    ├── part.0.parquet
    ├── part.1.parquet
    ├── part.2.parquet
    └── …

See also

parquet_read

Read parquet data
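
A round-trip sketch using only the calls documented on this page; the paths are placeholders:

    from legate_dataframe.lib.parquet import parquet_read, parquet_write

    tbl = parquet_read("/path/to/input/*.parquet")
    # Writes one part.N.parquet file per partition of the table into
    # the destination directory.
    parquet_write(tbl, "/path/to/output/")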

CSV files#

Legate-dataframe supports reading and writing CSV files via:

legate_dataframe.lib.csv.csv_read(files, *, dtypes, na_filter=True, delimiter=',', usecols=None, names=None)#

Read csv files into a logical table

Parameters:
  • files (str, Path, or iterable of paths) – If a string, glob.glob is used to conveniently load multiple files, otherwise must be a path or an iterable of paths (or strings).

  • dtypes (iterable of arrow dtype-likes) – The arrow dtypes to extract for each column (or a single one for all).

  • na_filter (bool, optional) – Whether to detect missing values; set to False to improve performance.

  • delimiter (str, optional) – The field delimiter.

  • usecols (iterable of str or int or None, optional) – If given, must match dtypes in length and denotes the column names to extract from the file. If passed as integers, the file is assumed to have no header and names must be passed.

  • names (iterable of str) – The names of the read columns; must be used together with integer usecols.

Returns:

The read logical table.

See also

csv_write

Write csv data

lib.parquet.parquet_write

Write parquet data
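
A usage sketch; the dtypes are given as pyarrow types here, and the file pattern and column names are placeholders:

    import pyarrow as pa
    from legate_dataframe.lib.csv import csv_read

    # Read two named columns, giving one arrow dtype per column.
    tbl = csv_read(
        "/path/to/data/*.csv",
        dtypes=[pa.int64(), pa.float64()],
        usecols=["a", "b"],
    )

    # For header-less files, select columns by index and supply names.
    tbl = csv_read(
        "/path/to/data/*.csv",
        dtypes=[pa.int64(), pa.float64()],
        usecols=[0, 1],
        names=["a", "b"],
    )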

legate_dataframe.lib.csv.csv_write(LogicalTable tbl, path, delimiter=', ')#

Write logical table to csv files

Each partition will be written to a separate file.

Parameters:
  • tbl (LogicalTable) – The table to write.

  • path (str or pathlib.Path) – Destination directory for data.

  • delimiter (str) – The field delimiter.

Files will be created in the specified output directory using the convention part.0.csv, part.1.csv, part.2.csv, and so on for each partition in the table:

    /path/to/output/
    ├── part.0.csv
    ├── part.1.csv
    ├── part.2.csv
    └── …

See also

csv_read

Read csv data

lib.parquet.parquet_read

Read parquet data
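
A round-trip sketch mirroring the Parquet example above; the paths are placeholders and the dtypes are given as pyarrow types:

    import pyarrow as pa
    from legate_dataframe.lib.csv import csv_read, csv_write

    tbl = csv_read("/path/to/input/*.csv", dtypes=[pa.int64(), pa.float64()])
    # Writes one part.N.csv file per partition of the table into the
    # destination directory.
    csv_write(tbl, "/path/to/output/", delimiter=",")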