Input/Output
Parquet files
- legate_dataframe.lib.parquet.parquet_read(glob_string: pathlib.Path | str) -> LogicalTable
Read Parquet files into a logical table
- Parameters:
glob_string (str or pathlib.Path) – The glob string to specify the Parquet files. All glob matches must be valid Parquet files and have the same LogicalTable data types. See <https://linux.die.net/man/7/glob>.
- Returns:
A LogicalTable holding the data read from the matched files.
See also
parquet_write
Write parquet data
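Because every file matched by the glob string has to be a valid Parquet file, it can help to preview the matches before calling parquet_read. The sketch below uses Python's standard glob module for that preview. It assumes the pattern expands the same way inside parquet_read, which the text above does not state, and preview_matches is a hypothetical helper, not part of legate_dataframe.

```python
import glob
import tempfile
from pathlib import Path


def preview_matches(glob_string):
    """Return the sorted paths that Python's glob module matches.

    Hypothetical helper: it only approximates what parquet_read would see.
    """
    return sorted(glob.glob(str(glob_string)))


# Demo with empty placeholder files (no real Parquet data is written).
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    for name in ("part.0.parquet", "part.1.parquet", "notes.txt"):
        (out / name).touch()

    matches = preview_matches(out / "*.parquet")
    matched_names = [Path(m).name for m in matches]
    print(matched_names)  # ['part.0.parquet', 'part.1.parquet']
```

In this demo the pattern excludes notes.txt, so only the two part files would be handed to parquet_read.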
- legate_dataframe.lib.parquet.parquet_write(LogicalTable tbl, path: pathlib.Path | str) -> None
Write logical table to Parquet files
Each partition will be written to a separate file.
- Parameters:
tbl (LogicalTable) – The table to write.
path (str or pathlib.Path) – Destination directory for data.
Files will be created in the specified output directory using the convention part.0.parquet, part.1.parquet, part.2.parquet, ... and so on for each partition in the table:
/path/to/output/
├── part.0.parquet
├── part.1.parquet
├── part.2.parquet
└── …
See also
parquet_read
Read parquet data
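The naming convention makes the output layout predictable. As a minimal sketch, the hypothetical helpers below (not part of legate_dataframe) list the file names that a write with a given number of partitions would be expected to produce, and build a glob string that could be handed back to parquet_read. The partition count is an input here, since the text above does not say how many partitions a table has.

```python
from pathlib import PurePosixPath


def expected_part_files(path, num_partitions, suffix="parquet"):
    """List part.<i>.<suffix> paths under `path` (hypothetical helper)."""
    base = PurePosixPath(path)
    return [base / f"part.{i}.{suffix}" for i in range(num_partitions)]


def readback_glob(path, suffix="parquet"):
    """Build a glob string that matches every part file in `path`."""
    return str(PurePosixPath(path) / f"part.*.{suffix}")


# PurePosixPath keeps the demo output identical on every platform.
files = expected_part_files("/path/to/output", 3)
names = [f.name for f in files]
pattern = readback_glob("/path/to/output")

print(names)    # ['part.0.parquet', 'part.1.parquet', 'part.2.parquet']
print(pattern)  # /path/to/output/part.*.parquet
```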
CSV files
Legate-dataframe supports reading and writing data in Parquet or CSV files. The CSV functions are:
- legate_dataframe.lib.csv.csv_read(glob_string, *, dtypes, na_filter=True, delimiter=',', usecols=None)
Read csv files into a logical table
- Parameters:
glob_string (str) – The glob string to specify the csv files. All glob matches must be valid csv files and have the same LogicalTable data types. See <https://linux.die.net/man/7/glob>.
dtypes (iterable of cudf dtype-likes) – The cudf dtypes to extract for each column (or a single one for all).
na_filter (bool, optional) – Whether to detect missing values. Set to False to improve performance.
delimiter (str, optional) – The field delimiter.
usecols (iterable of str or None, optional) – If given, must match dtypes in length and denotes column names to be extracted from the file.
- Returns:
A LogicalTable holding the data read from the matched files.
See also
csv_write
Write csv data
lib.parquet.parquet_write
Write parquet data
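Since usecols must match dtypes in length, one way to keep the two aligned is to derive both from a single mapping. The sketch below does that and also uses the standard csv module to check that the requested columns appear in a header line. It assumes the files start with a header row and that dtype strings such as "int64" are acceptable dtype-likes; neither is stated above. split_schema and missing_columns are hypothetical helpers, not part of legate_dataframe.

```python
import csv
import io


def split_schema(schema):
    """Turn {column: dtype} into parallel usecols and dtypes lists."""
    usecols = list(schema)
    dtypes = [schema[name] for name in usecols]
    return usecols, dtypes


def missing_columns(header_line, usecols, delimiter=","):
    """Return requested columns that are absent from a CSV header line."""
    header = next(csv.reader(io.StringIO(header_line), delimiter=delimiter))
    return [name for name in usecols if name not in header]


# Invented example schema; the dtype strings are an assumption.
schema = {"id": "int64", "price": "float64"}
usecols, dtypes = split_schema(schema)

print(usecols, dtypes)  # ['id', 'price'] ['int64', 'float64']
print(missing_columns("id,price,comment\n", usecols))          # []
print(missing_columns("id;comment\n", usecols, delimiter=";"))  # ['price']
```

The resulting lists could then be passed on as csv_read(glob_string, dtypes=dtypes, usecols=usecols).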
- legate_dataframe.lib.csv.csv_write(LogicalTable tbl, path, delimiter=u', ')
Write logical table to csv files
Each partition will be written to a separate file.
- Parameters:
tbl (LogicalTable) – The table to write.
path (str or pathlib.Path) – Destination directory for data.
delimiter (str) – The field delimiter.
Files will be created in the specified output directory using the convention part.0.csv, part.1.csv, part.2.csv, ... and so on for each partition in the table:
/path/to/output/
├── part.0.csv
├── part.1.csv
├── part.2.csv
└── …
See also
csv_read
Read csv data
lib.parquet.parquet_read
Read parquet data
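When inspecting the written files outside of legate_dataframe, note that a plain lexicographic sort places part.10.csv before part.2.csv. The sketch below sorts by the numeric partition index instead and reads the files with the standard csv module. It assumes each file begins with a header row and that the index in the file name reflects the intended order, neither of which is stated above. The file contents are invented for the demo, and both helpers are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def partition_index(path):
    """Extract N from a file named part.N.csv."""
    return int(Path(path).name.split(".")[1])


def read_parts(directory, delimiter=","):
    """Read all part.N.csv files in numeric order, skipping each header."""
    rows = []
    parts = sorted(Path(directory).glob("part.*.csv"), key=partition_index)
    for part in parts:
        with part.open(newline="") as f:
            reader = csv.reader(f, delimiter=delimiter)
            next(reader, None)  # drop the header row (assumed present)
            rows.extend(reader)
    return rows


with tempfile.TemporaryDirectory() as tmp:
    # Invented demo data: one row per "partition".
    for idx in (0, 2, 10):
        (Path(tmp) / f"part.{idx}.csv").write_text(f"a,b\n{idx},x\n")
    rows = read_parts(tmp)
    print(rows)  # [['0', 'x'], ['2', 'x'], ['10', 'x']]
```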