Introducing Krishiv: One Engine for Batch, Streaming, and Incremental Data Processing
Data teams often split closely related work across batch jobs, streaming systems, and separate incremental pipelines. Krishiv is being built around a different approach: one Rust-native compute framework that brings Arrow data, DataFusion planning, runtime routing, scheduler/executor behavior, state, shuffle, checkpoints, and connectors together in a single coherent architecture.
What exists now
Krishiv currently includes batch SQL foundations, streaming APIs and windowing examples, explicit embedded, single-node, and distributed runtime modes, scheduler and executor crates, shuffle, state, and checkpoint abstractions, and Python bindings. Iceberg is the primary lakehouse target. Kafka, Parquet, S3/object-store, and catalog integrations are labeled as preview wherever certification is still pending.
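To make the windowing idea concrete, here is a minimal sketch of tumbling-window assignment in plain Rust, using only the standard library. It is not taken from the Krishiv codebase: the function names window_start and count_per_window, the millisecond timestamps, and the fixed window size are all assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Start of the tumbling window that contains `ts_ms`, for windows of
/// `size_ms` milliseconds. Hypothetical helper, not a Krishiv API.
/// `rem_euclid` keeps the result correct for negative timestamps too.
pub fn window_start(ts_ms: i64, size_ms: i64) -> i64 {
    assert!(size_ms > 0, "window size must be positive");
    ts_ms - ts_ms.rem_euclid(size_ms)
}

/// Count events per tumbling window.
/// Returns a map from window start to event count, ordered by window start.
pub fn count_per_window(timestamps_ms: &[i64], size_ms: i64) -> BTreeMap<i64, u64> {
    let mut counts = BTreeMap::new();
    for &ts in timestamps_ms {
        *counts.entry(window_start(ts, size_ms)).or_insert(0u64) += 1;
    }
    counts
}
```

With 1000 ms windows, events at 0, 999, 1000, and 2500 fall into the windows starting at 0, 0, 1000, and 2000. A real streaming engine would also have to handle late data and watermarks, which this sketch leaves out.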
Why Rust, Arrow, and DataFusion
Rust and Tokio provide the runtime foundation. Apache Arrow RecordBatch is the columnar data model and IPC format. DataFusion gives Krishiv SQL parsing, planning, expressions, and local execution, so the project can focus its engine work on runtime placement, dataflow, state, shuffle, checkpoints, and connector boundaries.
Batch, streaming, and delta batch / IVM
Batch SQL and streaming workloads are available as public concepts in the repository. DeltaBatch and IncrementalFlow provide experimental incremental view maintenance (IVM) based on weighted Arrow rows. Distributed executor-side IVM and end-to-end connector certifications are intentionally labeled as in progress rather than complete.
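The idea behind weighted rows can be sketched in a few lines of standard-library Rust. This is an illustration of the general technique, not Krishiv's DeltaBatch or IncrementalFlow API: the WeightedRow and GroupedSumView types are hypothetical, and the convention that weight +1 means insert and -1 means delete is an assumption. The view maintains the equivalent of a grouped COUNT and SUM by applying only the changes, without rescanning the full input.

```rust
use std::collections::HashMap;

/// A row tagged with a weight: +1 models an insert, -1 a delete.
/// Hypothetical type for illustration, not a Krishiv API.
#[derive(Debug, Clone, PartialEq)]
pub struct WeightedRow {
    pub key: String,
    pub value: i64,
    pub weight: i64,
}

/// Incrementally maintained equivalent of
/// `SELECT key, COUNT(*), SUM(value) ... GROUP BY key`.
#[derive(Debug, Default)]
pub struct GroupedSumView {
    // key -> (row count, sum of values)
    groups: HashMap<String, (i64, i64)>,
}

impl GroupedSumView {
    /// Fold a batch of weighted rows into the view.
    pub fn apply_delta(&mut self, delta: &[WeightedRow]) {
        for row in delta {
            let entry = self.groups.entry(row.key.clone()).or_insert((0, 0));
            entry.0 += row.weight;
            entry.1 += row.weight * row.value;
            // A group whose row count returns to zero leaves the view.
            let is_empty = entry.0 == 0;
            if is_empty {
                self.groups.remove(&row.key);
            }
        }
    }

    /// Current (count, sum) for a key, if the group exists.
    pub fn get(&self, key: &str) -> Option<(i64, i64)> {
        self.groups.get(key).copied()
    }
}
```

Because each delta row contributes weight to the count and weight * value to the sum, a delete exactly cancels the matching insert. In this sketch the cost of an update depends on the size of the delta, not the size of the table.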
Where to go next
Read the docs for current APIs and the architecture page for the system boundaries. The website copy is deliberately conservative: it avoids benchmarks, competitor comparisons, and unsupported guarantees.