Capability status, backed by codebase evidence.
Each status below is derived from inspecting Rust sources, tests, examples, and public APIs in the Krishiv workspace. Statuses reflect what is implemented, not what is intended.
Available
Implemented, tested, and used in core workflows. APIs are stable within minor versions.
Batch SQL
DataFusion-backed SQL over Apache Arrow RecordBatches and registered sources.
Apache Arrow data model
RecordBatch is the internal and IPC columnar format across all runtime paths.
Rust Session / DataFrame API
Session, DataFrame, and Stream types are the primary Rust-facing API surface.
DataFusion SQL planning
SQL parsing, logical planning, expression evaluation, and local execution via DataFusion.
Embedded runtime mode
Runs all components in-process; no network endpoints required. Used in tests and local API calls.
Single-node runtime mode
Runs coordinator, executor, and Flight/gRPC endpoints on one host with local filesystem and RocksDB.
Python bindings (core)
PyO3 bindings expose Session, DataFrame, and streaming APIs. Connector bindings are optional and feature-gated.
Explicit durability profiles
The dev-local, single-node-durable, and distributed-durable profiles control how metadata, shuffle data, state, and checkpoints are stored.
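Here is a minimal Rust sketch of what explicit profile selection can look like. The names DurabilityProfile, UnknownProfile, CONTROLLED, and is_durable are hypothetical and are not Krishiv's API. Only the three profile names and the four storage concerns come from this page. Treating dev-local as non-durable is an assumption drawn from the names alone. The design point is that an unrecognized name is an error, not a silent fallback to some default profile.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical mirror of the three documented profile names.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DurabilityProfile {
    DevLocal,
    SingleNodeDurable,
    DistributedDurable,
}

/// The storage concerns this page says a profile controls.
pub const CONTROLLED: [&str; 4] = ["metadata", "shuffle", "state", "checkpoint"];

#[derive(Debug, PartialEq, Eq)]
pub struct UnknownProfile(pub String);

impl fmt::Display for UnknownProfile {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unknown durability profile: {:?}", self.0)
    }
}

impl FromStr for DurabilityProfile {
    type Err = UnknownProfile;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "dev-local" => Ok(Self::DevLocal),
            "single-node-durable" => Ok(Self::SingleNodeDurable),
            "distributed-durable" => Ok(Self::DistributedDurable),
            // No Default impl and no catch-all: typos are rejected, not silently mapped.
            other => Err(UnknownProfile(other.to_string())),
        }
    }
}

impl DurabilityProfile {
    /// Assumption inferred from the names only: dev-local makes no durability promise.
    pub fn is_durable(self) -> bool {
        !matches!(self, Self::DevLocal)
    }
}

fn main() {
    let p: DurabilityProfile = "single-node-durable".parse().unwrap();
    assert!(p.is_durable());
    assert!("single-node".parse::<DurabilityProfile>().is_err());
    for concern in CONTROLLED {
        println!("{concern} storage is governed by {p:?}");
    }
}
```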
Experimental
Implemented and functional. APIs and semantics may change. Not certified for production use.
Delta Batch / IVM
DeltaBatch (weighted Arrow rows) and IncrementalFlow (incremental view maintenance across ticks) are implemented, with partitioning, snapshots, and checkpoint hooks. Executor-side IVM execution in distributed mode is deferred; see Distributed IVM under Planned.
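The idea behind weighted rows can be sketched without Arrow. This hypothetical example uses Delta, SumByKey, and plain tuple rows for illustration. These are not Krishiv's DeltaBatch or IncrementalFlow API. The sketch assumes the common convention that a positive weight inserts a row and a negative weight retracts it. On each tick, the delta is consolidated and folded into a maintained SUM ... GROUP BY view. The work per tick is therefore proportional to the size of the delta rather than the base data.

```rust
use std::collections::BTreeMap;

/// Illustrative row: (group key, value). A real DeltaBatch holds Arrow columns instead.
pub type Row = (String, i64);

/// Hypothetical delta: rows paired with weights (assumed: +n inserts, -n retracts).
#[derive(Debug, Default)]
pub struct Delta(pub Vec<(Row, i64)>);

impl Delta {
    /// Sum the weights of identical rows and drop rows whose weights cancel out.
    pub fn consolidate(self) -> Delta {
        let mut acc: BTreeMap<Row, i64> = BTreeMap::new();
        for (row, w) in self.0 {
            *acc.entry(row).or_insert(0) += w;
        }
        Delta(acc.into_iter().filter(|(_, w)| *w != 0).collect())
    }
}

/// Hypothetical maintained view: SELECT key, SUM(value) ... GROUP BY key.
#[derive(Debug, Default)]
pub struct SumByKey {
    groups: BTreeMap<String, (i64, i64)>, // key -> (running sum, live row count)
}

impl SumByKey {
    /// Fold one tick's delta into the view; a group disappears when its row count hits 0.
    pub fn apply(&mut self, delta: &Delta) {
        for ((key, value), w) in &delta.0 {
            let g = self.groups.entry(key.clone()).or_insert((0, 0));
            g.0 += value * w;
            g.1 += w;
            if g.1 == 0 {
                self.groups.remove(key);
            }
        }
    }

    pub fn get(&self, key: &str) -> Option<i64> {
        self.groups.get(key).map(|(sum, _)| *sum)
    }
}

fn main() {
    let mut view = SumByKey::default();
    // Tick 1: two inserts.
    view.apply(&Delta(vec![(("a".to_string(), 5), 1), (("b".to_string(), 3), 1)]).consolidate());
    // Tick 2: update a from 5 to 7 (retract + insert); an insert and a retract of b cancel.
    let tick2 = Delta(vec![
        (("a".to_string(), 5), -1),
        (("a".to_string(), 7), 1),
        (("b".to_string(), 3), 1),
        (("b".to_string(), 3), -1),
    ])
    .consolidate();
    view.apply(&tick2);
    println!("a = {:?}, b = {:?}", view.get("a"), view.get("b")); // a = Some(7), b = Some(3)
}
```

Keeping a live row count next to each sum lets the view tell a group that sums to zero apart from a group that no longer exists.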
Python connector features
Kafka, Iceberg, and vector-sink bindings are available as optional Cargo features. The API surface is not yet stable.
Preview
Scaffolding and an initial implementation exist, and end-to-end certification is ongoing. Use with caution.
Distributed runtime mode
Remote coordinator and executor transport with bearer-token authentication. An explicit Flight endpoint is required; there is no silent fallback to local execution.
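This sketch shows the two distributed-mode rules stated above. Configuration fails when no Flight endpoint is given. Requests must also carry a matching bearer token. The names DistributedConfig, ConfigError, distributed, and bearer_matches are hypothetical, as is the endpoint string. None of them are Krishiv's configuration or auth API.

```rust
/// Hypothetical distributed-mode settings.
#[derive(Debug, PartialEq, Eq)]
pub struct DistributedConfig {
    pub flight_endpoint: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ConfigError {
    MissingFlightEndpoint,
}

/// A missing or blank endpoint is an error; there is deliberately no fallback
/// to embedded or single-node execution.
pub fn distributed(endpoint: Option<&str>) -> Result<DistributedConfig, ConfigError> {
    match endpoint.map(str::trim) {
        Some(e) if !e.is_empty() => Ok(DistributedConfig { flight_endpoint: e.to_string() }),
        _ => Err(ConfigError::MissingFlightEndpoint),
    }
}

/// Check an Authorization header value of the form "Bearer <token>".
/// HTTP auth scheme names are case-insensitive, so "bearer" is accepted too.
pub fn bearer_matches(header: &str, expected: &str) -> bool {
    match header.split_once(' ') {
        Some((scheme, token)) => {
            scheme.eq_ignore_ascii_case("bearer") && tokens_equal(token.as_bytes(), expected.as_bytes())
        }
        None => false,
    }
}

/// Compares every byte instead of returning at the first mismatch. Illustrative only:
/// production code should use a vetted constant-time comparison.
fn tokens_equal(a: &[u8], b: &[u8]) -> bool {
    a.len() == b.len() && a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

fn main() {
    assert_eq!(distributed(None), Err(ConfigError::MissingFlightEndpoint));
    let cfg = distributed(Some("grpc://coordinator.example:50051")).unwrap();
    println!("using Flight endpoint {}", cfg.flight_endpoint);
    assert!(bearer_matches("Bearer s3cret", "s3cret"));
    assert!(!bearer_matches("Basic s3cret", "s3cret"));
}
```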
Iceberg catalog integration
REST, Hive, and Glue catalog paths. Iceberg is the primary lakehouse target; certification work continues.
Kafka connector
Source and transactional sink built on rdkafka. End-to-end exactly-once delivery holds only for certified checkpoint combinations.
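Exactly-once delivery through a transactional sink usually depends on tying sink commits to checkpoint completion. The sketch below models a common two-phase-commit pattern in memory. TxnSink and its methods are hypothetical. The sketch does not use rdkafka or Krishiv's connector API. It only shows why output becomes visible solely after the checkpoint that covers it is durable.

```rust
use std::mem;

/// Hypothetical transactional sink; an in-memory model, not an rdkafka producer.
#[derive(Debug, Default)]
pub struct TxnSink {
    open: Vec<String>,                // records in the transaction currently being written
    pending: Vec<(u64, Vec<String>)>, // pre-committed transactions, tagged with a checkpoint id
    pub committed: Vec<String>,       // what a read_committed consumer would see
}

impl TxnSink {
    pub fn write(&mut self, record: &str) {
        self.open.push(record.to_string());
    }

    /// Phase 1, at a checkpoint barrier: seal the open transaction under that checkpoint id.
    pub fn pre_commit(&mut self, checkpoint_id: u64) {
        let txn = mem::take(&mut self.open);
        self.pending.push((checkpoint_id, txn));
    }

    /// Phase 2, once the checkpoint is durable: commit every transaction up to that id.
    pub fn notify_checkpoint_complete(&mut self, checkpoint_id: u64) {
        let (ready, rest): (Vec<_>, Vec<_>) = mem::take(&mut self.pending)
            .into_iter()
            .partition(|(id, _)| *id <= checkpoint_id);
        self.pending = rest;
        for (_, records) in ready {
            self.committed.extend(records);
        }
    }

    /// On failure: drop uncommitted output; replay from the last completed checkpoint
    /// regenerates it. Simplified: real recovery must also finish pre-committed
    /// transactions whose checkpoint did complete before the crash.
    pub fn abort_uncommitted(&mut self) {
        self.open.clear();
        self.pending.clear();
    }
}

fn main() {
    let mut sink = TxnSink::default();
    sink.write("r1");
    sink.pre_commit(1);
    sink.write("r2"); // lands in the next transaction
    sink.notify_checkpoint_complete(1);
    println!("visible: {:?}", sink.committed); // visible: ["r1"]
}
```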
Parquet / S3 / ADLS connectors
Connector contracts and implementations exist; end-to-end guarantees depend on certified combinations.
Shuffle service
In-memory, local disk, object-store, and Flight-oriented shuffle paths behind the krishiv-shuffle crate API.
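Most shuffle implementations share one step: they route each row to an output partition by hashing its key. This ensures that equal keys from every map task reach the same reducer. This std-only sketch shows the in-memory case. partition_for and shuffle_write are hypothetical names, not the krishiv-shuffle API. Disk, object-store, and Flight-oriented paths change where buckets are stored, and this sketch does not model them.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Pick the output partition for a key. Every map task must use the same hasher so that
/// equal keys from different tasks land in the same partition. DefaultHasher::new() is
/// consistent within one build, but its algorithm is not guaranteed across Rust releases.
pub fn partition_for<K: Hash>(key: &K, num_partitions: usize) -> usize {
    assert!(num_partitions > 0, "need at least one partition");
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    (h.finish() % num_partitions as u64) as usize
}

/// In-memory shuffle write: split one task's rows into per-partition buckets.
pub fn shuffle_write<K: Hash, V>(rows: Vec<(K, V)>, num_partitions: usize) -> Vec<Vec<(K, V)>> {
    let mut buckets: Vec<Vec<(K, V)>> = (0..num_partitions).map(|_| Vec::new()).collect();
    for (k, v) in rows {
        let p = partition_for(&k, num_partitions);
        buckets[p].push((k, v));
    }
    buckets
}

fn main() {
    let n = 4;
    // Two map tasks write their output independently.
    let task_a = shuffle_write(vec![("x", 1), ("y", 2)], n);
    let task_b = shuffle_write(vec![("x", 10), ("z", 3)], n);
    // The reducer that owns the partition of "x" reads that bucket from every task.
    let p = partition_for(&"x", n);
    let x_values: Vec<i32> = task_a[p]
        .iter()
        .chain(task_b[p].iter())
        .filter(|(k, _)| *k == "x")
        .map(|(_, v)| *v)
        .collect();
    println!("partition {p} holds x = {x_values:?}"); // x = [1, 10]
}
```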
Checkpoint storage
Async checkpoint primitives, with synchronous compatibility wrappers. The scheduler's gRPC checkpoint acknowledgements use the async path.
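This sketch shows one way to pair async checkpoint primitives with synchronous wrappers, using only the standard library. CheckpointStore, write, read, and write_blocking are hypothetical names. block_on is a minimal single-future executor built on std::task::Wake. It stands in for whatever runtime Krishiv actually uses.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the blocked thread when the future signals progress.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal executor: poll one future on the current thread, parking between polls.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical checkpoint store keyed by checkpoint id; keeps snapshots in memory.
#[derive(Default)]
pub struct CheckpointStore {
    snapshots: Mutex<HashMap<u64, Vec<u8>>>,
}

impl CheckpointStore {
    /// Async primitive. A durable backend would await disk or object-store I/O here.
    pub async fn write(&self, checkpoint_id: u64, bytes: Vec<u8>) -> Result<(), String> {
        let mut map = self.snapshots.lock().map_err(|e| e.to_string())?;
        map.insert(checkpoint_id, bytes);
        Ok(())
    }

    pub async fn read(&self, checkpoint_id: u64) -> Option<Vec<u8>> {
        let map = self.snapshots.lock().ok()?;
        map.get(&checkpoint_id).cloned()
    }

    /// Sync compatibility wrapper for callers that have no async runtime.
    pub fn write_blocking(&self, checkpoint_id: u64, bytes: Vec<u8>) -> Result<(), String> {
        block_on(self.write(checkpoint_id, bytes))
    }
}

fn main() {
    let store = CheckpointStore::default();
    // A synchronous caller goes through the wrapper.
    store.write_blocking(1, b"state-v1".to_vec()).unwrap();
    // An async caller awaits the primitives directly.
    let latest = block_on(async {
        store.write(2, b"state-v2".to_vec()).await.unwrap();
        store.read(2).await
    });
    println!("{:?}", latest.map(|b| String::from_utf8(b).unwrap())); // Some("state-v2")
}
```

Calling a blocking wrapper from inside an async task stalls that task's thread. This is one reason latency-sensitive callers, such as the scheduler's checkpoint acknowledgements, are better served by the async path.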
State management
In-memory and RocksDB-backed keyed state, TTL, migration, and incremental state behind the krishiv-state crate API.
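Here is a hypothetical sketch of keyed state with a time-to-live. It is not the krishiv-state API, and TtlState, put, get, and purge_expired are illustrative names. Time is passed in explicitly as milliseconds so that behavior is deterministic. Reads hide expired entries, and a separate purge pass physically removes them, which keeps reads cheap.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical keyed state with a per-entry TTL.
pub struct TtlState<K, V> {
    ttl_ms: u64,
    entries: HashMap<K, (V, u64)>, // key -> (value, expiry timestamp in ms)
}

impl<K: Eq + Hash, V> TtlState<K, V> {
    pub fn new(ttl_ms: u64) -> Self {
        Self { ttl_ms, entries: HashMap::new() }
    }

    /// Write a value; every write refreshes the entry's expiry.
    pub fn put(&mut self, key: K, value: V, now_ms: u64) {
        self.entries.insert(key, (value, now_ms.saturating_add(self.ttl_ms)));
    }

    /// Read a value, treating expired entries as absent.
    pub fn get(&self, key: &K, now_ms: u64) -> Option<&V> {
        match self.entries.get(key) {
            Some((v, expires)) if now_ms < *expires => Some(v),
            _ => None,
        }
    }

    /// Physically drop expired entries; returns how many were removed.
    pub fn purge_expired(&mut self, now_ms: u64) -> usize {
        let before = self.entries.len();
        self.entries.retain(|_, (_, expires)| now_ms < *expires);
        before - self.entries.len()
    }
}

fn main() {
    let mut state = TtlState::new(1_000);
    state.put("user-1", 3, 0);
    assert_eq!(state.get(&"user-1", 500), Some(&3));
    assert_eq!(state.get(&"user-1", 1_000), None); // expired once now reaches put time + TTL
    println!("purged {}", state.purge_expired(1_000)); // purged 1
}
```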
Kubernetes operator / CRD
CRD and operator integration in the krishiv-operator crate. Manifests live in k8s/.
Scheduler fault tolerance
Job/task lifecycle, metadata stores, and leadership coordination via krishiv-scheduler. Failure handling foundations are in place.
Planned
On the roadmap but not yet implemented. Do not rely on these without maintainer confirmation.
Distributed IVM
Executor-side incremental view maintenance across a distributed cluster. Requires the design of a distributed IVM protocol.
Full exactly-once guarantees
End-to-end exactly-once delivery across arbitrary source, sink, and checkpoint combinations. Exactly-once is currently limited to certified combinations.
Krishiv Cloud
Managed compute offering. Not yet implemented.