Input/Output#

Parquet files#

legate_dataframe.lib.parquet.parquet_read(glob_string: pathlib.Path | str) → LogicalTable#

Read Parquet files into a logical table

Parameters:

glob_string (str or pathlib.Path) – The glob string to specify the Parquet files. All glob matches must be valid Parquet files and have the same LogicalTable data types. See <https://linux.die.net/man/7/glob>.

Returns:

The read logical table.

See also

parquet_write

Write parquet data

legate_dataframe.lib.parquet.parquet_write(LogicalTable tbl, path: pathlib.Path | str) → None#

Write logical table to Parquet files

Each partition will be written to a separate file.

Parameters:
  • tbl (LogicalTable) – The table to write.

  • path (str or pathlib.Path) – Destination directory for data.

Files will be created in the specified output directory using the convention part.0.parquet, part.1.parquet, part.2.parquet, and so on for each partition in the table:

    /path/to/output/
    ├── part.0.parquet
    ├── part.1.parquet
    ├── part.2.parquet
    └── …

See also

parquet_read

Read parquet data
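Because parquet_write names its output files part.0.parquet, part.1.parquet, and so on, a directory it produced can be read back with a single glob string. A minimal standard-library sketch of that round-trip naming convention (the placeholder files here are empty, not real Parquet data, and the paths are made up):

```python
# Sketch of the part.N.parquet naming convention and the glob string
# that parquet_read would use to pick the files back up.
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as out_dir:
    # What parquet_write would leave behind for a table with three partitions:
    for i in range(3):
        open(os.path.join(out_dir, f"part.{i}.parquet"), "w").close()

    # The glob string to hand to parquet_read:
    glob_string = os.path.join(out_dir, "part.*.parquet")
    n_matched = len(glob.glob(glob_string))

print(n_matched)  # all three partition files match
```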

CSV files#

Legate-dataframe supports reading and writing CSV files via:

legate_dataframe.lib.csv.csv_read(glob_string, *, dtypes, na_filter=True, delimiter=',', usecols=None)#

Read CSV files into a logical table

Parameters:
  • glob_string (str) – The glob string to specify the CSV files. All glob matches must be valid CSV files and have the same LogicalTable data types. See <https://linux.die.net/man/7/glob>.

  • dtypes (iterable of cudf dtype-likes) – The cudf dtypes to extract for each column (or a single one for all).

  • na_filter (bool, optional) – Whether to detect missing values, set to False to improve performance.

  • delimiter (str, optional) – The field delimiter.

  • usecols (iterable of str or None, optional) – If given, must match dtypes in length and denotes column names to be extracted from the file.

Returns:

The read logical table

See also

csv_write

Write csv data

lib.parquet.parquet_write

Write parquet data
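The usecols and dtypes arguments pair up positionally, so they must have the same length. A small illustration of that pairing (the column names and dtype strings below are hypothetical, not taken from a real file):

```python
# Hypothetical column selection for a csv_read call: each entry of
# ``usecols`` lines up with the entry of ``dtypes`` at the same position.
usecols = ["timestamp", "price", "volume"]
dtypes = ["datetime64[ns]", "float64", "int64"]

# csv_read requires the two iterables to match in length.
assert len(usecols) == len(dtypes)

# The positional pairing, made explicit:
column_dtypes = dict(zip(usecols, dtypes))
```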

legate_dataframe.lib.csv.csv_write(LogicalTable tbl, path, delimiter=',') → None#

Write logical table to CSV files

Each partition will be written to a separate file.

Parameters:
  • tbl (LogicalTable) – The table to write.

  • path (str or pathlib.Path) – Destination directory for data.

  • delimiter (str) – The field delimiter.

Files will be created in the specified output directory using the convention part.0.csv, part.1.csv, part.2.csv, and so on for each partition in the table:

    /path/to/output/
    ├── part.0.csv
    ├── part.1.csv
    ├── part.2.csv
    └── …

See also

csv_read

Read csv data

lib.parquet.parquet_read

Read parquet data
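To illustrate what the delimiter parameter controls, here is a sketch using Python's standard-library csv module rather than legate_dataframe itself; csv_write's delimiter plays the same role for the files it emits:

```python
# Writing two rows with a non-default field delimiter ("|" instead of ",").
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf, delimiter="|")
writer.writerow(["a", "b", "c"])
writer.writerow([1, 2, 3])

lines = buf.getvalue().splitlines()
# Each field within a row is separated by the chosen delimiter.
```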