Input/Output#
Parquet files#
- legate_dataframe.lib.parquet.parquet_read(glob_string: pathlib.Path | str, *, columns=None) → LogicalTable#
Read Parquet files into a logical table
- Parameters:
glob_string (str or pathlib.Path) – The glob string to specify the Parquet files. All glob matches must be valid Parquet files and have the same LogicalTable data types. See <https://linux.die.net/man/7/glob>.
columns – List of strings selecting a subset of columns to read.
- Returns:
The read logical table.
- Return type:
LogicalTable
See also
parquet_write
Write parquet data
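A minimal usage sketch, assuming a Legate runtime is initialized; the glob and column names are placeholders, not part of the API:

from legate_dataframe.lib.parquet import parquet_read

# Read all matching Parquet files into a single LogicalTable,
# selecting a subset of columns. Path and names are hypothetical.
tbl = parquet_read("data/part.*.parquet", columns=["a", "b"])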
- legate_dataframe.lib.parquet.parquet_read_array(glob_string: pathlib.Path | str, *, columns=None, null_value=None, type=None) → LogicalArray#
Read Parquet files into a logical array
To successfully read the files, all selected columns must have the same type that is compatible with legate (currently only numeric types).
- Parameters:
glob_string (str or pathlib.Path) – The glob string to specify the Parquet files. All glob matches must be valid Parquet files and have the same LogicalTable data types. See <https://linux.die.net/man/7/glob>.
columns – List of strings selecting a subset of columns to read.
null_value (legate.core.Scalar or None) – If given (not None), the result will not have a null mask; null values are instead replaced with this value.
type (legate.core.Type or None) – The desired result legate type. If given, columns are cast to this type. If not given, the dtype is inferred, but all columns must have the same one.
- Returns:
The read logical array.
- Return type:
LogicalArray
See also
parquet_read
Read parquet data into a table
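A hedged sketch of reading a numeric column into an array; the glob, the column name, and the exact spelling of the legate type constant are illustrative assumptions:

from legate.core import types  # assumes legate.core exposes a `types` module
from legate_dataframe.lib.parquet import parquet_read_array

# Read one numeric column across all matching files into a LogicalArray,
# casting to float64 so every selected column shares a single type.
arr = parquet_read_array(
    "data/part.*.parquet",   # placeholder glob
    columns=["x"],           # hypothetical column name
    type=types.float64,      # columns are cast to this type
)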
- legate_dataframe.lib.parquet.parquet_write(LogicalTable tbl, path: pathlib.Path | str) → None#
Write logical table to Parquet files
Each partition will be written to a separate file.
- Parameters:
tbl (LogicalTable) – The table to write.
path (str or pathlib.Path) – Destination directory for data.
Files will be created in the specified output directory using the convention part.0.parquet, part.1.parquet, part.2.parquet, and so on for each partition in the table:
/path/to/output/
├── part.0.parquet
├── part.1.parquet
├── part.2.parquet
└── …
See also
parquet_read
Read parquet data
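A round-trip sketch using only the functions documented above; both paths are placeholders:

from legate_dataframe.lib.parquet import parquet_read, parquet_write

tbl = parquet_read("input/part.*.parquet")  # placeholder input glob
# Each partition of the table is written as part.N.parquet
# inside the destination directory.
parquet_write(tbl, "output/")               # placeholder output directory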
CSV files#
Legate-dataframe also supports reading and writing CSV files via:
- legate_dataframe.lib.csv.csv_read(glob_string, *, dtypes, na_filter=True, delimiter=',', usecols=None, names=None)#
Read CSV files into a logical table
- Parameters:
glob_string (str) – The glob string to specify the CSV files. All glob matches must be valid CSV files and have the same LogicalTable data types. See <https://linux.die.net/man/7/glob>.
dtypes (iterable of cudf dtype-likes) – The cudf dtypes to extract for each column (or a single one for all).
na_filter (bool, optional) – Whether to detect missing values; set to False to improve performance.
delimiter (str, optional) – The field delimiter.
usecols (iterable of str or int or None, optional) – If given, must match dtypes in length and denotes the column names to be extracted from the file. If passed as integers, the file is assumed to have no header and names must be passed.
names (iterable of str) – The names of the read columns; must be used together with integer usecols.
- Returns:
The read logical table.
See also
csv_write
Write csv data
lib.parquet.parquet_write
Write parquet data
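A minimal sketch for a hypothetical headerless file; the glob, dtypes, and column names are illustrative, and plain dtype strings are assumed to count as cudf dtype-likes:

from legate_dataframe.lib.csv import csv_read

# Read two columns with explicit dtypes. Integer usecols imply the
# file has no header, so names must be supplied.
tbl = csv_read(
    "data/part.*.csv",            # placeholder glob
    dtypes=["int64", "float64"],  # one cudf dtype-like per column
    usecols=[0, 1],               # positions of the columns to extract
    names=["id", "value"],        # hypothetical column names
)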
- legate_dataframe.lib.csv.csv_write(LogicalTable tbl, path, delimiter=', ')#
Write logical table to CSV files
Each partition will be written to a separate file.
- Parameters:
tbl (LogicalTable) – The table to write.
path (str or pathlib.Path) – Destination directory for data.
delimiter (str) – The field delimiter.
Files will be created in the specified output directory using the convention part.0.csv, part.1.csv, part.2.csv, and so on for each partition in the table:
/path/to/output/
├── part.0.csv
├── part.1.csv
├── part.2.csv
└── …
See also
csv_read
Read csv data
lib.parquet.parquet_read
Read parquet data
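And a round-trip sketch mirroring the Parquet example; paths and dtypes are placeholders:

from legate_dataframe.lib.csv import csv_read, csv_write

tbl = csv_read("input/part.*.csv", dtypes=["int64", "float64"])
# Each partition is written as part.N.csv in the output directory.
csv_write(tbl, "output/", delimiter=",")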