ProductDocsArchitectureBlogGitHubGitHubGet Started
Preview

Sinks

Sink classes for writing data to Parquet, Cassandra, Elasticsearch, HBase, and vector stores.

Overview

Sink classes accept Arrow RecordBatch data and write it to external systems. All sinks take &RecordBatch (borrow) — do not transfer ownership. They have no flush() method; batches are committed on write_batches() completion.

ParquetSink

ParquetSink(path: str)
MethodDescription
path() -> strReturn the target file path.
import krishiv as ks
sink = ks.ParquetSink("/tmp/output.parquet")
# Write is triggered by session.sql(...).write_parquet(sink)

CassandraSink

CassandraSink(hosts: list[str], keyspace: str, table: str)
MethodDescription
write_batches(batches: list[RecordBatch])Write a list of Arrow batches to Cassandra.

ElasticsearchSink

ElasticsearchSink(url: str, index: str)
MethodDescription
write_batches(batches)Bulk-index batches into Elasticsearch.

HBaseSink

HBaseSink(zookeeper_quorum: str, table: str)
MethodDescription
write_batches(batches)Write batches to an HBase table.

Vector Sinks

Vector sinks implement a shared interface: sink_name(), upsert_batch(batch), delete_by_ids(ids), query_nearest(vector, k). They require the vector-sinks or platform-specific Cargo feature.

SinkConstructorFeature
InMemoryVectorSinkInMemoryVectorSink(dim: int)Always available
LanceDbSinkLanceDbSink.open(path, table)vector-sinks
PineconeSinkPineconeSink(api_key, index_name)vector-sinks
QdrantSinkQdrantSink.connect(url, collection)qdrant
PgvectorSinkPgvectorSink.connect(conn_str, table)pgvector
WeaviateSinkWeaviateSink(url, class_name)vector-sinks
Vector sinks that depend on a disabled Cargo feature return a friendly RuntimeError naming the missing feature and the maturin develop --features command.