A unified engine boundary for local and distributed data work.
Krishiv routes its APIs through a single planning and runtime model, and keeps scheduler, executor, state, shuffle, checkpoint, metadata, and connector behavior behind explicit crate APIs.
System layers
Interfaces: SQL · Rust · Python
Unified Runtime: Batch · Streaming · Incremental Processing
Execution Foundation: DataFusion · Apache Arrow
Distributed Primitives: Scheduling · Shuffle · State · Checkpoints
Data Ecosystem: Iceberg · Kafka · Parquet · Object Storage · Catalogs
Request flow
APIs accept SQL, Rust, or Python calls and create sessions or dataframes.
DataFusion parses and plans SQL; Krishiv's plan and policy modules then add typed runtime contracts.
ExecutionRuntime selects embedded, single-node, or remote placement without silent distributed fallback.
Coordinators own job/task lifecycle; executors run replaceable data-plane work.
State, checkpoints, shuffle, and connectors use durability profiles and explicit capabilities.
Results return as Arrow RecordBatch values or streaming batches depending on the API.
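The placement step above can be illustrated with a small sketch. It is not Krishiv's ExecutionRuntime API: the `Placement`, `Available`, and `PlacementError` types and the `select_placement` function are hypothetical names chosen for illustration. The point it shows is that a requested placement is either honored or rejected with an error, and is never silently swapped for a different one.

```rust
/// Hypothetical placement options mirroring the three modes named in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Placement {
    Embedded,
    SingleNode,
    Remote,
}

/// Hypothetical error: the requested placement cannot be served.
#[derive(Debug, PartialEq, Eq)]
pub enum PlacementError {
    Unavailable(Placement),
}

/// Hypothetical description of what this process is able to serve.
pub struct Available {
    pub embedded: bool,
    pub single_node: bool,
    pub remote: bool,
}

/// Returns the requested placement if it is available, otherwise an error.
/// There is deliberately no branch that substitutes another placement.
pub fn select_placement(
    requested: Placement,
    available: &Available,
) -> Result<Placement, PlacementError> {
    let ok = match requested {
        Placement::Embedded => available.embedded,
        Placement::SingleNode => available.single_node,
        Placement::Remote => available.remote,
    };
    if ok {
        Ok(requested)
    } else {
        Err(PlacementError::Unavailable(requested))
    }
}
```

With this shape, a caller that asks for `Placement::Remote` when no remote runtime is configured gets `Err(PlacementError::Unavailable(Placement::Remote))` and has to decide explicitly what to do next.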
Batch, streaming, and delta / IVM
Batch SQL, streaming windows, and delta-oriented incremental view maintenance (IVM) share the same Arrow and DataFusion foundations. IVM is experimental: IncrementalFlow exists today, but distributed, executor-side IVM is still in progress.
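To make "delta-oriented" concrete, here is a minimal sketch of the general IVM idea for a grouped sum. It does not use or describe IncrementalFlow; `Delta` and `SumView` are hypothetical types, and the weight convention (+1 for an insert, -1 for a retraction) is an assumption of this example. The view is updated from changes alone instead of being recomputed from the full input.

```rust
use std::collections::HashMap;

/// Hypothetical change record: weight +1 inserts a row, -1 retracts one.
#[derive(Debug, Clone)]
pub struct Delta {
    pub key: String,
    pub value: i64,
    pub weight: i64,
}

/// Hypothetical materialized view of `SELECT key, SUM(value) GROUP BY key`.
/// Each group stores (running sum, live row count).
#[derive(Debug, Default)]
pub struct SumView {
    groups: HashMap<String, (i64, i64)>,
}

impl SumView {
    /// Applies a batch of deltas without rescanning earlier input.
    pub fn apply(&mut self, deltas: &[Delta]) {
        for d in deltas {
            let row_count = {
                let entry = self.groups.entry(d.key.clone()).or_insert((0, 0));
                entry.0 += d.value * d.weight;
                entry.1 += d.weight;
                entry.1
            };
            // Drop the group once every contributing row has been retracted.
            if row_count == 0 {
                self.groups.remove(&d.key);
            }
        }
    }

    /// Current sum for a group, or None if the group has no live rows.
    pub fn sum(&self, key: &str) -> Option<i64> {
        self.groups.get(key).map(|g| g.0)
    }
}
```

Tracking the row count next to the sum is what lets the view tell a group whose values sum to zero apart from a group that no longer exists.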
State, checkpoints, scheduling, and shuffle
Krishiv exposes three durability profiles: dev-local, single-node-durable, and distributed-durable. Each profile selects the metadata, shuffle, state, and checkpoint storage choices; none of them implies universal exactly-once behavior.
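One way to picture a profile is as a named bundle of storage choices. The sketch below is an assumption-heavy illustration, not Krishiv's configuration API: only the three profile names come from the text, while the `Backend` variants, the `StorageChoices` struct, and the specific mapping from profile to backend are placeholders invented for this example.

```rust
use std::str::FromStr;

/// The three profile names come from the text above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DurabilityProfile {
    DevLocal,
    SingleNodeDurable,
    DistributedDurable,
}

/// Hypothetical storage backends; the real choices are not specified here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    InMemory,
    LocalDisk,
    Shared,
}

/// The four storage areas a profile selects, per the text above.
#[derive(Debug, PartialEq, Eq)]
pub struct StorageChoices {
    pub metadata: Backend,
    pub shuffle: Backend,
    pub state: Backend,
    pub checkpoint: Backend,
}

impl FromStr for DurabilityProfile {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "dev-local" => Ok(Self::DevLocal),
            "single-node-durable" => Ok(Self::SingleNodeDurable),
            "distributed-durable" => Ok(Self::DistributedDurable),
            other => Err(format!("unknown durability profile: {other}")),
        }
    }
}

impl DurabilityProfile {
    /// Placeholder mapping: one backend for all four areas per profile.
    /// A real profile could mix backends across areas.
    pub fn storage_choices(self) -> StorageChoices {
        let b = match self {
            Self::DevLocal => Backend::InMemory,
            Self::SingleNodeDurable => Backend::LocalDisk,
            Self::DistributedDurable => Backend::Shared,
        };
        StorageChoices { metadata: b, shuffle: b, state: b, checkpoint: b }
    }
}
```

Note that nothing in `StorageChoices` models delivery guarantees, which matches the text: a profile picks where data is kept and does not by itself promise exactly-once behavior.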
Storage, catalogs, and topology
Storage and catalogs
Iceberg is the primary lakehouse platform. REST-catalog-compatible paths, Hive, Glue, Parquet, Kafka, S3/object store, and ADLS are documented as preview or feature-gated.
Local versus distributed
Embedded mode runs in process. Single-node mode runs the components on one host. Distributed mode requires explicit remote endpoints, bearer-token authentication on production control-plane paths, and replaceable executors.
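The distributed-mode requirements can be read as preconditions checked before any work is scheduled. The following sketch assumes a hypothetical `Topology` enum and `validate` function; neither is taken from Krishiv, and the `production` flag is an assumed way of expressing that the bearer-token requirement applies to production control-plane paths.

```rust
/// Hypothetical topology description for the three modes in the text.
#[derive(Debug, PartialEq, Eq)]
pub enum Topology {
    Embedded,
    SingleNode,
    Distributed {
        endpoints: Vec<String>,
        bearer_token: Option<String>,
    },
}

/// Hypothetical validation failures for a distributed topology.
#[derive(Debug, PartialEq, Eq)]
pub enum TopologyError {
    NoEndpoints,
    MissingBearerToken,
}

/// Checks the distributed-mode preconditions named in the text:
/// explicit endpoints always, and a bearer token when `production` is set.
pub fn validate(topology: &Topology, production: bool) -> Result<(), TopologyError> {
    match topology {
        // Local modes need no remote configuration.
        Topology::Embedded | Topology::SingleNode => Ok(()),
        Topology::Distributed { endpoints, bearer_token } => {
            if endpoints.is_empty() {
                return Err(TopologyError::NoEndpoints);
            }
            let has_token = bearer_token.as_deref().map_or(false, |t| !t.is_empty());
            if production && !has_token {
                return Err(TopologyError::MissingBearerToken);
            }
            Ok(())
        }
    }
}
```

For example, a `Topology::Distributed` value with an empty `endpoints` list fails with `TopologyError::NoEndpoints` regardless of the token, so a distributed run cannot start without explicitly named remote endpoints.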