ProductDocsArchitectureBlogGitHubGitHubGet Started
Available

DataFrame

Lazy query plan builder for batch and bounded streaming workloads.

Overview

DataFrame is a lazy plan builder. Operations compose a logical plan; execution only happens on collect(), show(), or execute_stream().

Transformation Methods

MethodReturnsDescription
select(exprs)Result<DataFrame>Project columns or expressions.
select_columns(names)Result<DataFrame>Project by column name.
filter(expr: Expr)Result<DataFrame>Apply a boolean filter predicate.
group_by(exprs)Result<GroupedDataFrame>Group rows for aggregation.
sort(exprs)Result<DataFrame>Sort by expressions.
limit(n: usize)Result<DataFrame>Retain at most n rows.
join(right, kind, left_cols, right_cols)Result<DataFrame>Join with another DataFrame.
join_on(right, kind, expr)Result<DataFrame>Join using an arbitrary ON expression.
union(other)Result<DataFrame>UNION ALL of two DataFrames.
union_distinct(other)Result<DataFrame>UNION DISTINCT (deduplicated).
intersect(other)Result<DataFrame>INTERSECT DISTINCT.
except(other)Result<DataFrame>EXCEPT DISTINCT.
distinct()Result<DataFrame>Remove duplicate rows.
with_column(name, expr)Result<DataFrame>Add or replace a column.
drop_columns(names)Result<DataFrame>Remove named columns.
rename(old, new)Result<DataFrame>Rename a column.
alias(name)Result<DataFrame>Alias the DataFrame as a subquery.
repartition(n)Result<DataFrame>Set the output partition count.
cache()Future<Result<DataFrame>>Materialise and cache results in-memory.

Execution Methods

MethodReturnsDescription
collect()Future<Result<Vec<RecordBatch>>>Execute and collect all batches into memory.
collect_partitioned()Future<Result<Vec<Vec<RecordBatch>>>>Collect preserving partition boundaries.
show()Future<Result<()>>Execute and print a formatted table to stdout.
execute_stream()Future<Result<SendableRecordBatchStream>>Execute as a streaming batch iterator.
explain(verbose: bool)Future<Result<DataFrame>>Return a DataFrame containing the plan text.
explain_mode(mode: ExplainMode)Future<Result<DataFrame>>Return plan text in the requested explain format.

Schema / Metadata

MethodReturnsDescription
schema()&SchemaReturn the logical output schema.
columns()Vec<&str>Return column names.
is_bounded()boolTrue if the plan is finite/batch; false if unbounded/streaming.
boundedness()BoundednessEnum: Bounded or Unbounded.

Write Methods

MethodDescription
write_parquet(path, opts)Write results to a local Parquet file.
write_csv(path, opts)Write results to a CSV file.
write_json(path)Write results as NDJSON.