Sinks
Sink classes for writing data to Parquet, Cassandra, Elasticsearch, HBase, and vector stores.
Overview
Sink classes accept Arrow RecordBatch data and write it to external systems. All sinks take &RecordBatch (a borrow) and do not take ownership of the data. Sinks have no flush() method; batches are committed when write_batches() completes.
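The contract above can be illustrated with a small stand-in class. This is a sketch, not krishiv code: `RecordingSink` is a hypothetical pure-Python class, and plain lists stand in for Arrow batches. It only shows the two properties described: the caller still holds its batches after the call, and everything is committed by the time write_batches() returns, with no separate flush() step.

```python
class RecordingSink:
    """Hypothetical stand-in for a sink; not part of krishiv."""

    def __init__(self):
        self.committed = []  # rows considered "written"

    def write_batches(self, batches):
        for batch in batches:
            # Read-only access: the batch is borrowed, never modified or consumed.
            self.committed.extend(batch)
        # Returning from this method is the commit point; there is no flush().


batches = [[1, 2], [3]]  # plain lists standing in for RecordBatch objects
sink = RecordingSink()
sink.write_batches(batches)

print(sink.committed)  # [1, 2, 3]
print(batches)         # [[1, 2], [3]] (the caller's data is untouched)
```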
ParquetSink
ParquetSink(path: str)
| Method | Description |
|---|---|
| path() -> str | Return the target file path. |
```python
import krishiv as ks

sink = ks.ParquetSink("/tmp/output.parquet")
# Write is triggered by session.sql(...).write_parquet(sink)
```
CassandraSink
CassandraSink(hosts: list[str], keyspace: str, table: str)
| Method | Description |
|---|---|
| write_batches(batches: list[RecordBatch]) | Write a list of Arrow batches to Cassandra. |
ElasticsearchSink
ElasticsearchSink(url: str, index: str)
| Method | Description |
|---|---|
| write_batches(batches) | Bulk-index batches into Elasticsearch. |
HBaseSink
HBaseSink(zookeeper_quorum: str, table: str)
| Method | Description |
|---|---|
| write_batches(batches) | Write batches to an HBase table. |
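CassandraSink, ElasticsearchSink, and HBaseSink all expose write_batches(batches), so one helper can drive any of them. The sketch below assumes only that method. `write_in_chunks`, `chunk_size`, and `FakeSink` are hypothetical names used for illustration and are not part of krishiv. Since batches are committed when write_batches() completes, splitting a long list into several calls is one possible way to limit how much data a single commit covers.

```python
def write_in_chunks(sink, batches, chunk_size=2):
    """Call sink.write_batches() on successive slices of `batches`.

    Returns the number of write_batches() calls made.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    calls = 0
    for start in range(0, len(batches), chunk_size):
        sink.write_batches(batches[start:start + chunk_size])
        calls += 1
    return calls


class FakeSink:
    """Hypothetical stand-in that records each write_batches() call."""

    def __init__(self):
        self.calls = []

    def write_batches(self, batches):
        self.calls.append(list(batches))


fake = FakeSink()
n_calls = write_in_chunks(fake, ["b0", "b1", "b2"], chunk_size=2)
print(n_calls)     # 2
print(fake.calls)  # [['b0', 'b1'], ['b2']]
```

In real use, an instance of any of the three sinks above would take the place of `FakeSink`, and the list elements would be Arrow RecordBatch objects rather than strings.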
Vector Sinks
Vector sinks implement a shared interface: sink_name(), upsert_batch(batch), delete_by_ids(ids), and query_nearest(vector, k). Apart from InMemoryVectorSink, which is always available, each requires either the vector-sinks Cargo feature or a platform-specific one (qdrant, pgvector).
| Sink | Constructor | Feature |
|---|---|---|
| InMemoryVectorSink | InMemoryVectorSink(dim: int) | Always available |
| LanceDbSink | LanceDbSink.open(path, table) | vector-sinks |
| PineconeSink | PineconeSink(api_key, index_name) | vector-sinks |
| QdrantSink | QdrantSink.connect(url, collection) | qdrant |
| PgvectorSink | PgvectorSink.connect(conn_str, table) | pgvector |
| WeaviateSink | WeaviateSink(url, class_name) | vector-sinks |
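The shared interface can be written down as a structural type, which makes it easier to write code that works with any of the sinks in the table. The sketch below is an illustration under stated assumptions, not krishiv's implementation: `VectorSink` and `ToyVectorSink` are hypothetical names, the batch is assumed to be an iterable of (id, vector) pairs rather than an Arrow RecordBatch, query_nearest is assumed to return ids, and Euclidean distance is an arbitrary choice of metric.

```python
import math
from typing import Protocol


class VectorSink(Protocol):
    """Structural type for the four methods of the shared interface."""

    def sink_name(self) -> str: ...
    def upsert_batch(self, batch) -> None: ...
    def delete_by_ids(self, ids) -> None: ...
    def query_nearest(self, vector, k): ...


class ToyVectorSink:
    """Hypothetical brute-force sink; not krishiv's InMemoryVectorSink."""

    def __init__(self, dim: int):
        self.dim = dim
        self._rows = {}  # id -> vector

    def sink_name(self) -> str:
        return "toy"

    def upsert_batch(self, batch) -> None:
        # Assumed batch shape: iterable of (id, vector) pairs.
        for row_id, vector in batch:
            if len(vector) != self.dim:
                raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
            self._rows[row_id] = list(vector)  # insert or overwrite

    def delete_by_ids(self, ids) -> None:
        for row_id in ids:
            self._rows.pop(row_id, None)  # unknown ids are ignored

    def query_nearest(self, vector, k):
        # Rank stored ids by Euclidean distance to the query vector.
        ranked = sorted(self._rows, key=lambda i: math.dist(self._rows[i], vector))
        return ranked[:k]


sink: VectorSink = ToyVectorSink(dim=2)
sink.upsert_batch([("a", [0.0, 0.0]), ("b", [1.0, 1.0]), ("c", [5.0, 5.0])])
sink.delete_by_ids(["c"])
nearest = sink.query_nearest([0.9, 0.9], k=1)
print(nearest)  # ['b']
```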
Vector sinks that depend on a disabled Cargo feature raise a RuntimeError that names the missing feature and the maturin develop --features command needed to enable it.
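One way to cope with a build that lacks a feature is to try sinks in order of preference and fall back to one that is always available. The sketch below assumes the error surfaces as a raised Python RuntimeError. `first_available`, `qdrant_disabled`, and `in_memory` are hypothetical names, and the error message text is made up for the example; krishiv's actual wording may differ.

```python
def first_available(factories):
    """Return (sink, errors) for the first factory that does not raise RuntimeError."""
    errors = []
    for factory in factories:
        try:
            return factory(), errors
        except RuntimeError as exc:
            errors.append(str(exc))  # keep the message naming the missing feature
    raise RuntimeError("no vector sink available: " + "; ".join(errors))


# Hypothetical stand-in for a constructor such as QdrantSink.connect(...)
# in a build where the qdrant feature is disabled.
def qdrant_disabled():
    raise RuntimeError("feature 'qdrant' is not enabled")  # made-up message text


# Hypothetical stand-in for a sink that is always available.
def in_memory():
    return "in-memory sink"  # placeholder object


sink, skipped = first_available([qdrant_disabled, in_memory])
print(sink)     # in-memory sink
print(skipped)  # ["feature 'qdrant' is not enabled"]
```

Collecting the skipped messages keeps the feature name and the suggested maturin command visible in logs even when the fallback succeeds.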