Sinks | Krishiv

Sink classes for writing data to Parquet, Cassandra, Elasticsearch, HBase, and vector stores.

Overview

Sink classes accept Arrow RecordBatch data and write it to external systems. All sinks take &RecordBatch (borrow) — do not transfer ownership. They have no flush() method; batches are committed on write_batches() completion.

ParquetSink

ParquetSink(path: str)

Method	Description
`path() -> str`	Return the target file path.

import krishiv as ks
sink = ks.ParquetSink("/tmp/output.parquet")
# Write is triggered by session.sql(...).write_parquet(sink)

CassandraSink

CassandraSink(hosts: list[str], keyspace: str, table: str)

Method	Description
`write_batches(batches: list[RecordBatch])`	Write a list of Arrow batches to Cassandra.

ElasticsearchSink

ElasticsearchSink(url: str, index: str)

Method	Description
`write_batches(batches)`	Bulk-index batches into Elasticsearch.

HBaseSink

HBaseSink(zookeeper_quorum: str, table: str)

Method	Description
`write_batches(batches)`	Write batches to an HBase table.

Vector Sinks

Vector sinks implement a shared interface: sink_name(), upsert_batch(batch), delete_by_ids(ids), query_nearest(vector, k). They require the vector-sinks or platform-specific Cargo feature.

Sink	Constructor	Feature
`InMemoryVectorSink`	`InMemoryVectorSink(dim: int)`	Always available
`LanceDbSink`	`LanceDbSink.open(path, table)`	`vector-sinks`
`PineconeSink`	`PineconeSink(api_key, index_name)`	`vector-sinks`
`QdrantSink`	`QdrantSink.connect(url, collection)`	`qdrant`
`PgvectorSink`	`PgvectorSink.connect(conn_str, table)`	`pgvector`
`WeaviateSink`	`WeaviateSink(url, class_name)`	`vector-sinks`

Vector sinks that depend on a disabled Cargo feature return a friendly RuntimeError naming the missing feature and the maturin develop --features command.