Available

Checkpointing

Checkpoint and savepoint configuration, paths, and recovery.

Overview

Krishiv uses barrier-based checkpointing similar to Apache Flink. A barrier event is injected into all input streams simultaneously. When all operators have processed the barrier, the coordinator saves a consistent snapshot of all keyed state and source offsets.
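The completion rule described above (snapshot only once every operator has processed the barrier) can be sketched as a small coordinator-side tracker. This is an illustrative sketch, not Krishiv's actual implementation: `BarrierTracker`, `Snapshot`, and `ack` are hypothetical names, and keyed state is left out so that only source offsets are collected.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical coordinator-side tracker for one checkpoint barrier.
/// Names and fields are assumptions for illustration only.
#[derive(Debug)]
pub struct BarrierTracker {
    checkpoint_id: u64,
    /// Operators that have not yet reported the barrier as processed.
    pending: HashSet<String>,
    /// Offsets reported by source operators together with their ack.
    offsets: HashMap<String, u64>,
}

/// What the coordinator would persist once the barrier is fully processed.
#[derive(Debug, PartialEq)]
pub struct Snapshot {
    pub checkpoint_id: u64,
    pub source_offsets: Vec<(String, u64)>,
}

impl BarrierTracker {
    pub fn new(checkpoint_id: u64, operators: &[&str]) -> Self {
        BarrierTracker {
            checkpoint_id,
            pending: operators.iter().map(|s| s.to_string()).collect(),
            offsets: HashMap::new(),
        }
    }

    /// Record that `operator` has processed the barrier. Sources pass their
    /// current offset; other operators pass `None`. Returns `Some(snapshot)`
    /// only on the ack that completes the checkpoint.
    pub fn ack(&mut self, operator: &str, source_offset: Option<u64>) -> Option<Snapshot> {
        if !self.pending.remove(operator) {
            return None; // unknown operator or duplicate ack: ignore
        }
        if let Some(offset) = source_offset {
            self.offsets.insert(operator.to_string(), offset);
        }
        if !self.pending.is_empty() {
            return None; // still waiting on at least one operator
        }
        let mut source_offsets: Vec<(String, u64)> = self
            .offsets
            .iter()
            .map(|(name, offset)| (name.clone(), *offset))
            .collect();
        source_offsets.sort(); // deterministic order for the snapshot
        Some(Snapshot {
            checkpoint_id: self.checkpoint_id,
            source_offsets,
        })
    }
}
```

In this sketch a snapshot is produced exactly once per barrier, which is what keeps the saved state and source offsets consistent with each other.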

Durability Profiles

| Profile | State Backend | Checkpoint Storage |
| --- | --- | --- |
| dev-local | In-memory | Ephemeral temp dir; not restart-durable. |
| single-node-durable | RocksDB (local) | Local filesystem. Survives process restarts on the same host. |
| distributed-durable | RocksDB (restored from checkpoint) | Object store (S3/ADLS/GCS). Etcd tracks checkpoint metadata. |

Configuration

| Config Key | Default | Description |
| --- | --- | --- |
| krishiv.checkpoint.interval_ms | 60000 | Checkpoint interval in milliseconds. |
| krishiv.checkpoint.storage.path | | Local path or object-store URI (e.g. s3://bucket/checkpoints/). |
| krishiv.checkpoint.retain | 3 | Number of completed checkpoints to retain. |
| krishiv.checkpoint.min_pause_ms | 500 | Minimum gap between successive checkpoint barriers. |
| krishiv.savepoint.path | | Manual savepoint output path. Triggered via session.take_savepoint(). |
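To make the interaction between the interval, the minimum pause, and the retention count concrete, here is a minimal sketch using the defaults listed above. The `CheckpointConfig` struct and both helper functions are hypothetical. The sketch also assumes the minimum pause is measured from the end of the previous checkpoint, as in Flink; the source does not confirm that this is how Krishiv measures it.

```rust
/// Hypothetical in-memory mirror of the krishiv.checkpoint.* keys.
#[derive(Debug, Clone, PartialEq)]
pub struct CheckpointConfig {
    pub interval_ms: u64,
    pub min_pause_ms: u64,
    pub retain: usize,
}

impl Default for CheckpointConfig {
    /// Defaults copied from the configuration table.
    fn default() -> Self {
        CheckpointConfig {
            interval_ms: 60_000,
            min_pause_ms: 500,
            retain: 3,
        }
    }
}

/// Earliest time (in ms) at which the next barrier may be injected.
/// Assumption: the pause is counted from the end of the previous checkpoint,
/// so a slow checkpoint pushes the next barrier past the regular interval.
pub fn next_barrier_at(cfg: &CheckpointConfig, last_start_ms: u64, last_end_ms: u64) -> u64 {
    (last_start_ms + cfg.interval_ms).max(last_end_ms + cfg.min_pause_ms)
}

/// Given the ids of completed checkpoints, return the ids to delete so that
/// only the newest `retain` remain. Assumes larger ids are newer.
pub fn to_prune(cfg: &CheckpointConfig, completed: &[u64]) -> Vec<u64> {
    let mut ids = completed.to_vec();
    ids.sort_unstable();
    let prune_count = ids.len().saturating_sub(cfg.retain);
    ids.truncate(prune_count);
    ids
}
```

With the defaults, a checkpoint that starts at 0 ms and finishes at 1000 ms leaves the next barrier at 60000 ms, while one that finishes at 59900 ms delays it to 60400 ms.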

Recovery

# Resume from latest checkpoint
cargo run -p krishiv -- start --job-id <id> --resume-from latest

# Resume from specific checkpoint
cargo run -p krishiv -- start --job-id <id> --resume-from <checkpoint-uri>

# Resume from savepoint
cargo run -p krishiv -- start --job-id <id> --resume-from savepoint://<path>

Warning: Savepoints are operator-topology-aware. If operators are added, removed, or reordered between taking a savepoint and resuming from it, recovery may fail or restore state incorrectly.
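The three forms of the --resume-from value shown above can be told apart with a small parser. The following is a sketch under assumptions, not Krishiv's CLI code: `ResumeTarget`, `parse_resume_from`, and `topology_matches` are hypothetical names, and the topology check simply compares ordered operator id lists, which is one plausible way to detect the changes the warning describes.

```rust
/// Hypothetical classification of a --resume-from argument.
#[derive(Debug, PartialEq)]
pub enum ResumeTarget {
    /// The literal value `latest`.
    Latest,
    /// A `savepoint://<path>` value, with the scheme stripped.
    Savepoint(String),
    /// Anything else is treated as a checkpoint URI or local path.
    Checkpoint(String),
}

pub fn parse_resume_from(arg: &str) -> Result<ResumeTarget, String> {
    let arg = arg.trim();
    if arg.is_empty() {
        return Err("empty --resume-from value".to_string());
    }
    if arg == "latest" {
        return Ok(ResumeTarget::Latest);
    }
    if let Some(path) = arg.strip_prefix("savepoint://") {
        if path.is_empty() {
            return Err("savepoint:// requires a path".to_string());
        }
        return Ok(ResumeTarget::Savepoint(path.to_string()));
    }
    Ok(ResumeTarget::Checkpoint(arg.to_string()))
}

/// Order-sensitive comparison of operator ids recorded in a savepoint with
/// those of the job being started. Any added, removed, or reordered operator
/// makes this return false.
pub fn topology_matches(saved: &[&str], current: &[&str]) -> bool {
    saved == current
}
```

A job launcher built along these lines could refuse to start when `topology_matches` returns false, rather than risk restoring state into the wrong operator.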