DataFrame
Lazy query plan builder for batch and bounded streaming workloads.
Overview
DataFrame is a lazy plan builder. Transformation methods only compose a logical plan; nothing runs until a terminal method such as collect(), show(), or execute_stream() is called.
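The lazy-builder idea can be sketched with a toy model. This is only an illustration under stated assumptions: `Plan`, `LazyFrame`, `filter_at_least`, and `execute` are hypothetical stand-ins over plain integers, not the library's real types or signatures.

```rust
// Toy model of a lazy plan builder. Every type here is a hypothetical
// stand-in, not the library's real DataFrame or logical plan.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan(Vec<i64>),
    Filter { input: Box<Plan>, min: i64 },
    Limit { input: Box<Plan>, n: usize },
}

#[derive(Debug, Clone)]
struct LazyFrame {
    plan: Plan,
}

impl LazyFrame {
    fn scan(rows: Vec<i64>) -> Self {
        LazyFrame { plan: Plan::Scan(rows) }
    }

    // Builder methods wrap the current plan in a new node; no rows are read.
    fn filter_at_least(self, min: i64) -> Self {
        LazyFrame { plan: Plan::Filter { input: Box::new(self.plan), min } }
    }

    fn limit(self, n: usize) -> Self {
        LazyFrame { plan: Plan::Limit { input: Box::new(self.plan), n } }
    }

    // Terminal method: only here is the plan walked and rows produced.
    fn collect(&self) -> Vec<i64> {
        execute(&self.plan)
    }
}

fn execute(plan: &Plan) -> Vec<i64> {
    match plan {
        Plan::Scan(rows) => rows.clone(),
        Plan::Filter { input, min } => {
            execute(input).into_iter().filter(|v| v >= min).collect()
        }
        Plan::Limit { input, n } => execute(input).into_iter().take(*n).collect(),
    }
}
```

Calling `LazyFrame::scan(rows).filter_at_least(5).limit(2)` produces only a nested `Plan` value; rows are touched when `collect()` is invoked on it.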
| Method | Returns | Description |
| --- | --- | --- |
| select(exprs) | Result<DataFrame> | Project columns or expressions. |
| select_columns(names) | Result<DataFrame> | Project by column name. |
| filter(expr: Expr) | Result<DataFrame> | Apply a boolean filter predicate. |
| group_by(exprs) | Result<GroupedDataFrame> | Group rows for aggregation. |
| sort(exprs) | Result<DataFrame> | Sort by expressions. |
| limit(n: usize) | Result<DataFrame> | Retain at most n rows. |
| join(right, kind, left_cols, right_cols) | Result<DataFrame> | Join with another DataFrame. |
| join_on(right, kind, expr) | Result<DataFrame> | Join using an arbitrary ON expression. |
| union(other) | Result<DataFrame> | UNION ALL of two DataFrames. |
| union_distinct(other) | Result<DataFrame> | UNION DISTINCT (deduplicated). |
| intersect(other) | Result<DataFrame> | INTERSECT DISTINCT. |
| except(other) | Result<DataFrame> | EXCEPT DISTINCT. |
| distinct() | Result<DataFrame> | Remove duplicate rows. |
| with_column(name, expr) | Result<DataFrame> | Add or replace a column. |
| drop_columns(names) | Result<DataFrame> | Remove named columns. |
| rename(old, new) | Result<DataFrame> | Rename a column. |
| alias(name) | Result<DataFrame> | Alias the DataFrame as a subquery. |
| repartition(n) | Result<DataFrame> | Set the output partition count. |
| cache() | Future<Result<DataFrame>> | Materialise and cache results in-memory. |
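
The set operations in the table differ mainly in how they treat duplicates. The sketch below models rows as plain integers and follows standard SQL semantics. The helper names are hypothetical and this is not the library's implementation; a real engine also gives no guarantee about output row order, whereas this sketch happens to preserve first-occurrence order.

```rust
use std::collections::HashSet;

// Rows are modelled as plain integers; all helper names are hypothetical.

// UNION ALL: concatenate both inputs and keep every duplicate.
fn union_all(a: &[i32], b: &[i32]) -> Vec<i32> {
    a.iter().chain(b.iter()).copied().collect()
}

// DISTINCT: keep the first occurrence of each value.
fn distinct(rows: &[i32]) -> Vec<i32> {
    let mut seen = HashSet::new();
    // `insert` returns false for a value already seen, so repeats are dropped.
    rows.iter().copied().filter(|v| seen.insert(*v)).collect()
}

// UNION DISTINCT: UNION ALL followed by deduplication.
fn union_distinct(a: &[i32], b: &[i32]) -> Vec<i32> {
    distinct(&union_all(a, b))
}

// INTERSECT DISTINCT: distinct rows of `a` that also appear in `b`.
fn intersect_distinct(a: &[i32], b: &[i32]) -> Vec<i32> {
    let right: HashSet<i32> = b.iter().copied().collect();
    distinct(a).into_iter().filter(|v| right.contains(v)).collect()
}

// EXCEPT DISTINCT: distinct rows of `a` that do not appear in `b`.
fn except_distinct(a: &[i32], b: &[i32]) -> Vec<i32> {
    let right: HashSet<i32> = b.iter().copied().collect();
    distinct(a).into_iter().filter(|v| !right.contains(v)).collect()
}
```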
Execution Methods
| Method | Returns | Description |
| --- | --- | --- |
| collect() | Future<Result<Vec<RecordBatch>>> | Execute and collect all batches into memory. |
| collect_partitioned() | Future<Result<Vec<Vec<RecordBatch>>>> | Collect preserving partition boundaries. |
| show() | Future<Result<()>> | Execute and print a formatted table to stdout. |
| execute_stream() | Future<Result<SendableRecordBatchStream>> | Execute as a streaming batch iterator. |
| explain(verbose: bool) | Future<Result<DataFrame>> | Return a DataFrame containing the plan text. |
| explain_mode(mode: ExplainMode) | Future<Result<DataFrame>> | Return plan text in the requested explain format. |
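
The return types of collect(), collect_partitioned(), and execute_stream() differ in shape, and a small model may make that easier to see. In this sketch `Batch` is a hypothetical stand-in for a record batch, and the partitioned data is made up for illustration. Whether the real collect() equals the flattened partitions in this exact order is an assumption, not something stated above.

```rust
// Hypothetical stand-in for a record batch: just a vector of row values.
type Batch = Vec<i64>;

// Shape of a partition-preserving collect: one Vec<Batch> per partition.
// The data is invented for the example.
fn collect_partitioned_demo() -> Vec<Vec<Batch>> {
    vec![vec![vec![1, 2], vec![3]], vec![vec![4, 5, 6]]]
}

// Flattening the outer level yields the shape of a plain collect
// (assumption: partitions are concatenated in order).
fn flatten_partitions(parts: Vec<Vec<Batch>>) -> Vec<Batch> {
    parts.into_iter().flatten().collect()
}

// Stream-style consumption: visit one batch at a time and keep only a
// running total, so the full result never has to sit in memory at once.
fn count_rows<I: Iterator<Item = Batch>>(stream: I) -> usize {
    stream.map(|batch| batch.len()).sum()
}
```

The streaming form is the one to reach for when the result may not fit in memory, since each batch can be dropped as soon as it has been processed.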
Schema / Metadata
| Method | Returns | Description |
| --- | --- | --- |
| schema() | &Schema | Return the logical output schema. |
| columns() | Vec<&str> | Return column names. |
| is_bounded() | bool | True if the plan is finite/batch; false if unbounded/streaming. |
| boundedness() | Boundedness | Enum: Bounded or Unbounded. |
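
One plausible use of the boundedness check is to decide how to consume a plan, since collecting an unbounded plan into memory would never complete. The sketch below is self-contained: the `Boundedness` enum mirrors the two variants named in the table, while `Strategy` and `choose_strategy` are hypothetical names introduced only for illustration.

```rust
// Mirrors the two variants listed in the table; defined locally so the
// sketch compiles on its own.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Boundedness {
    Bounded,
    Unbounded,
}

impl Boundedness {
    fn is_bounded(self) -> bool {
        matches!(self, Boundedness::Bounded)
    }
}

// Hypothetical: how a caller might consume the plan.
#[derive(Debug, PartialEq, Eq)]
enum Strategy {
    CollectAll, // finite input, safe to gather every batch
    Stream,     // unbounded input, process batch by batch
}

fn choose_strategy(b: Boundedness) -> Strategy {
    if b.is_bounded() {
        Strategy::CollectAll
    } else {
        Strategy::Stream
    }
}
```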
Write Methods
| Method | Description |
| --- | --- |
| write_parquet(path, opts) | Write results to a local Parquet file. |
| write_csv(path, opts) | Write results to a CSV file. |
| write_json(path) | Write results as NDJSON. |
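
NDJSON (newline-delimited JSON) stores one complete JSON object per line. The sketch below shows that layout using only the standard library. `Row`, `escape`, and `write_ndjson` are hypothetical names, the escaping is deliberately minimal, and nothing here reflects how write_json is actually implemented.

```rust
use std::io::{self, Write};

// Hypothetical row type; a real writer would operate on record batches.
struct Row {
    id: u32,
    name: String,
}

// Escapes only backslash and double quote. A production writer must
// handle every JSON escape (control characters, and so on).
fn escape(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

fn write_ndjson<W: Write>(mut out: W, rows: &[Row]) -> io::Result<()> {
    for row in rows {
        // One JSON object per line, each terminated by '\n'.
        writeln!(out, "{{\"id\":{},\"name\":\"{}\"}}", row.id, escape(&row.name))?;
    }
    Ok(())
}
```

Because each line is an independent JSON value, a reader can process the file line by line without parsing the whole document first.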