Architecture

A unified engine boundary for local and distributed data work.

Krishiv routes its SQL, Rust, and Python APIs through a single planning and runtime model, keeping scheduler, executor, state, shuffle, checkpoint, metadata, and connector behavior behind explicit crate APIs.

System layers

Interfaces

SQL · Rust · Python

Unified Runtime

Batch · Streaming · Incremental Processing

Execution Foundation

DataFusion · Apache Arrow

Distributed Primitives

Scheduling · Shuffle · State · Checkpoints

Data Ecosystem

Iceberg · Kafka · Parquet · Object Storage · Catalogs


Request flow

APIs accept SQL, Rust, or Python calls and create sessions or dataframes.

DataFusion parses and plans SQL; Krishiv's plan and policy modules add typed runtime contracts on top of those plans.

ExecutionRuntime selects embedded, single-node, or remote placement and does not silently fall back between local and distributed execution.

Coordinators own job/task lifecycle; executors run replaceable data-plane work.

State, checkpoints, shuffle, and connectors use durability profiles and explicit capabilities.

Results are returned as Arrow RecordBatch values or as a stream of batches, depending on the API.
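The placement step can be illustrated with a small sketch of "no silent fallback": a request for remote placement that lacks an endpoint fails with an error instead of quietly running in process. The types `Placement`, `PlacementRequest`, `PlacementError`, and the function `select_placement` are hypothetical and are not ExecutionRuntime's actual API; the sketch only assumes that a remote placement needs a configured endpoint.

```rust
use std::fmt;

/// Hypothetical resolved placement (illustrative, not Krishiv's API).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Placement {
    Embedded,
    SingleNode,
    Remote { endpoint: String },
}

/// Hypothetical placement requested by the caller.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PlacementRequest {
    Embedded,
    SingleNode,
    Remote,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PlacementError {
    MissingRemoteEndpoint,
}

impl fmt::Display for PlacementError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PlacementError::MissingRemoteEndpoint => {
                write!(f, "remote placement requested but no endpoint configured")
            }
        }
    }
}

impl std::error::Error for PlacementError {}

/// Resolve a requested placement. A remote request without a usable endpoint
/// returns an error; it is never downgraded to embedded or single-node.
pub fn select_placement(
    request: PlacementRequest,
    remote_endpoint: Option<&str>,
) -> Result<Placement, PlacementError> {
    match request {
        PlacementRequest::Embedded => Ok(Placement::Embedded),
        PlacementRequest::SingleNode => Ok(Placement::SingleNode),
        PlacementRequest::Remote => match remote_endpoint {
            Some(ep) if !ep.is_empty() => Ok(Placement::Remote {
                endpoint: ep.to_string(),
            }),
            // No fallback: surface the misconfiguration to the caller.
            _ => Err(PlacementError::MissingRemoteEndpoint),
        },
    }
}
```

Returning a `Result` makes the caller decide explicitly whether to retry with a local placement, which keeps a misconfigured distributed job from appearing to succeed on a single process.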

Batch, streaming, and delta / IVM

Batch SQL, streaming windows, and delta-oriented incremental view maintenance (IVM) share the same Arrow and DataFusion foundations. IVM is experimental: IncrementalFlow exists, but distributed executor-side IVM is still in progress.
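To make "delta-oriented" concrete, here is a generic sketch of incremental view maintenance for a `COUNT(*) ... GROUP BY key` view: each input change carries a weight (+1 for insert, -1 for delete), and the view is updated from those deltas instead of being recomputed. This is the textbook technique, not IncrementalFlow's API; `Delta` and `CountByKey` are hypothetical names.

```rust
use std::collections::HashMap;

/// A weighted change to an input row: +1 for insert, -1 for delete.
#[derive(Debug, Clone)]
pub struct Delta {
    pub key: String,
    pub weight: i64,
}

/// Incrementally maintained `SELECT key, COUNT(*) FROM t GROUP BY key`.
#[derive(Debug, Default)]
pub struct CountByKey {
    counts: HashMap<String, i64>,
}

impl CountByKey {
    /// Apply a batch of input deltas and return the changed groups as
    /// (key, new_count), in first-touched order. A count of 0 means the
    /// group disappeared from the view.
    pub fn apply(&mut self, batch: &[Delta]) -> Vec<(String, i64)> {
        let mut touched: Vec<String> = Vec::new();
        for d in batch {
            *self.counts.entry(d.key.clone()).or_insert(0) += d.weight;
            if !touched.contains(&d.key) {
                touched.push(d.key.clone());
            }
        }
        let mut changes = Vec::new();
        for key in touched {
            let count = self.counts.get(&key).copied().unwrap_or(0);
            if count == 0 {
                // Drop empty groups so the view matches a full recomputation.
                self.counts.remove(&key);
            }
            changes.push((key, count));
        }
        changes
    }

    /// Current count for a key (0 if the group is absent).
    pub fn get(&self, key: &str) -> i64 {
        self.counts.get(key).copied().unwrap_or(0)
    }
}
```

The work per batch is proportional to the number of changed rows rather than the size of the full input, which is the main reason to maintain views from deltas.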

State, checkpoints, scheduling, and shuffle

Krishiv exposes dev-local, single-node-durable, and distributed-durable profiles. Each profile selects the storage used for metadata, shuffle, state, and checkpoints; choosing a durable profile does not imply exactly-once behavior across every connector and operator.
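One way to picture a durability profile is as a bundle of per-subsystem storage choices. In the sketch below, the profile names follow the text, but `Backend`, `StorageChoices`, and every backend assignment are hypothetical placeholders chosen for illustration, not Krishiv's documented defaults.

```rust
/// Illustrative storage backends (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    InMemory,
    LocalDisk,
    SharedObjectStore,
}

/// The three profiles named in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DurabilityProfile {
    DevLocal,
    SingleNodeDurable,
    DistributedDurable,
}

/// Storage selected for each subsystem.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StorageChoices {
    pub metadata: Backend,
    pub shuffle: Backend,
    pub state: Backend,
    pub checkpoints: Backend,
}

impl DurabilityProfile {
    /// Map a profile to storage choices. These assignments are assumptions
    /// made for the example only.
    pub fn storage(self) -> StorageChoices {
        use Backend::*;
        match self {
            DurabilityProfile::DevLocal => StorageChoices {
                metadata: InMemory,
                shuffle: InMemory,
                state: InMemory,
                checkpoints: InMemory,
            },
            DurabilityProfile::SingleNodeDurable => StorageChoices {
                metadata: LocalDisk,
                shuffle: LocalDisk,
                state: LocalDisk,
                checkpoints: LocalDisk,
            },
            DurabilityProfile::DistributedDurable => StorageChoices {
                metadata: SharedObjectStore,
                // Assumption: shuffle files stay executor-local.
                shuffle: LocalDisk,
                state: SharedObjectStore,
                checkpoints: SharedObjectStore,
            },
        }
    }

    /// Whether checkpoints outlive the process under this hypothetical
    /// mapping. This says nothing about exactly-once delivery.
    pub fn checkpoints_survive_restart(self) -> bool {
        self.storage().checkpoints != Backend::InMemory
    }
}
```

Keeping the mapping in one place means a profile changes where data lives, while delivery guarantees remain a separate, per-connector capability.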

Storage, catalogs, and topology

Storage and catalogs

Iceberg is the primary lakehouse table format. REST-catalog-compatible paths, Hive, Glue, Parquet, Kafka, S3/object storage, and ADLS are documented as preview or feature-gated.

Local versus distributed

Embedded mode runs in process. Single-node mode runs all components on one host. Distributed mode requires explicit remote endpoints, bearer-token authentication on production control-plane paths, and replaceable executors.
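The distributed-mode requirements can be expressed as up-front validation: a missing endpoint or token is rejected before any job is submitted. `DistributedConfig`, its fields, and `validate` are hypothetical names for this sketch; the only protocol detail assumed is the standard HTTP `Authorization: Bearer <token>` header form.

```rust
/// Hypothetical distributed-mode settings (field names are illustrative).
#[derive(Debug, Clone)]
pub struct DistributedConfig {
    pub coordinator_endpoint: Option<String>,
    pub bearer_token: Option<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ConfigError {
    MissingEndpoint,
    MissingToken,
}

impl DistributedConfig {
    /// Check that both an endpoint and a token are present and non-empty.
    /// On success, return the endpoint and the Authorization header value.
    pub fn validate(&self) -> Result<(String, String), ConfigError> {
        let endpoint = self
            .coordinator_endpoint
            .as_deref()
            .filter(|e| !e.is_empty())
            .ok_or(ConfigError::MissingEndpoint)?;
        let token = self
            .bearer_token
            .as_deref()
            .filter(|t| !t.is_empty())
            .ok_or(ConfigError::MissingToken)?;
        Ok((endpoint.to_string(), format!("Bearer {token}")))
    }
}
```

Failing at configuration time keeps an unauthenticated or endpoint-less setup from ever reaching the control plane.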
