Input/Output#
Parquet files#
- legate_dataframe.lib.parquet.parquet_read(glob_string: pathlib.Path | str, *, columns=None) LogicalTable #
Read Parquet files into a logical table
- Parameters:
glob_string (str or pathlib.Path) – The glob string to specify the Parquet files. All glob matches must be valid Parquet files and have the same LogicalTable data types. See <https://linux.die.net/man/7/glob>.
columns – List of strings selecting a subset of columns to read.
- Returns:
The read logical table.
- Return type:
LogicalTable
See also
parquet_write
Write parquet data
- legate_dataframe.lib.parquet.parquet_write(LogicalTable tbl, path: pathlib.Path | str) None #
Write logical table to Parquet files
Each partition will be written to a separate file.
- Parameters:
tbl (LogicalTable) – The table to write.
path (str or pathlib.Path) – Destination directory for data.
Files will be created in the specified output directory using the convention part.0.parquet, part.1.parquet, part.2.parquet, and so on for each partition in the table:
- /path/to/output/
├── part.0.parquet
├── part.1.parquet
├── part.2.parquet
└── …
See also
parquet_read
Read parquet data
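A minimal usage sketch of the two functions above, assuming a working legate_dataframe installation; the input and output paths and the column names are hypothetical, not part of the API reference:

```python
import os

# Hypothetical locations and column names, for illustration only.
indir = "/data/events"
outdir = "/data/events_subset"
glob_string = os.path.join(indir, "*.parquet")

try:
    from legate_dataframe.lib.parquet import parquet_read, parquet_write

    # Read every matching file into one logical table; all matches
    # must be valid parquet files with identical column dtypes.
    tbl = parquet_read(glob_string, columns=["user_id", "ts"])

    # Each partition is written to its own file in outdir:
    # part.0.parquet, part.1.parquet, and so on.
    parquet_write(tbl, outdir)
except ImportError:
    # legate_dataframe (and its legate runtime) is not installed here;
    # the calls above are the documented entry points.
    pass
```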
CSV files#
Legate-dataframe supports reading and writing CSV files via:
- legate_dataframe.lib.csv.csv_read(glob_string, *, dtypes, na_filter=True, delimiter=',', usecols=None, names=None)#
Read csv files into a logical table
- Parameters:
glob_string (str) – The glob string to specify the csv files. All glob matches must be valid csv files and have the same LogicalTable data types. See <https://linux.die.net/man/7/glob>.
dtypes (iterable of cudf dtype-likes) – The cudf dtypes to extract for each column (or a single one for all).
na_filter (bool, optional) – Whether to detect missing values; set to False to improve performance.
delimiter (str, optional) – The field delimiter.
usecols (iterable of str or int or None, optional) – If given, must match dtypes in length and denotes the column names to be extracted from the file. If passed as integers, implies the file has no header and names must be passed.
names (iterable of str) – The names of the read columns; must be used with integral usecols.
- Returns:
The read logical table.
- Return type:
LogicalTable
See also
csv_write
Write csv data
lib.parquet.parquet_write
Write parquet data
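A hedged sketch of reading a header-less file by column position, per the usecols/names rules above; the path, dtypes, and column names are illustrative assumptions:

```python
# Keyword arguments mirroring the parameters documented above; the
# concrete values are assumptions for the sake of the example.
read_kwargs = dict(
    dtypes=["int64", "float64"],  # one cudf dtype-like per column read
    usecols=[0, 2],               # integers imply the file has no header...
    names=["id", "score"],        # ...so names must be given, same length
    na_filter=False,              # skip missing-value detection for speed
    delimiter="|",
)

try:
    from legate_dataframe.lib.csv import csv_read

    tbl = csv_read("/data/raw/*.csv", **read_kwargs)  # hypothetical path
except ImportError:
    # legate_dataframe requires a legate runtime; skip when unavailable.
    pass
```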
- legate_dataframe.lib.csv.csv_write(LogicalTable tbl, path, delimiter=', ')#
Write logical table to csv files
Each partition will be written to a separate file.
- Parameters:
tbl (LogicalTable) – The table to write.
path (str or pathlib.Path) – Destination directory for data.
delimiter (str) – The field delimiter.
Files will be created in the specified output directory using the convention part.0.csv, part.1.csv, part.2.csv, and so on for each partition in the table:
- /path/to/output/
├── part.0.csv
├── part.1.csv
├── part.2.csv
└── …
See also
csv_read
Read csv data
lib.parquet.parquet_read
Read parquet data
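A round-trip sketch under the glob semantics described above: sample input files are created with the standard library, and the legate_dataframe calls are guarded since they need a legate runtime to actually run:

```python
import csv
import glob
import os
import tempfile

# Create two small csv files that the glob below will match.
tmpdir = tempfile.mkdtemp()
for i in range(2):
    with open(os.path.join(tmpdir, f"part.{i}.csv"), "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["a", "b"])
        writer.writerow([i, i * 0.5])

pattern = os.path.join(tmpdir, "*.csv")
matches = sorted(glob.glob(pattern))  # both files match the pattern

try:
    from legate_dataframe.lib.csv import csv_read, csv_write

    # Read both files, then write the table back out, one file per
    # partition (part.0.csv, part.1.csv, ...).
    tbl = csv_read(pattern, dtypes=["int64", "float64"])
    csv_write(tbl, os.path.join(tmpdir, "out"))
except ImportError:
    # No legate runtime available here; the calls follow the reference.
    pass
```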