Session
Session class — entry point for all Krishiv Python workloads.
Constructors
| Method | Description |
|---|---|
Session.embedded() | Create an in-process embedded session. No daemon or cluster needed. |
Session.local() | Create a single-node local session. |
Session.from_env() | Create a session using KRISHIV_COORDINATOR environment variable. |
Session.connect(url: str) | Connect to a remote Krishiv coordinator at url. |
Session() | Create a default embedded session (same as Session.embedded()). |
SQL Methods
| Method | Returns | Description |
|---|---|---|
sql(query: str) -> DataFrame | DataFrame | Plan and return a lazy DataFrame for the given SQL. |
sql_as(query: str, name: str) -> DataFrame | DataFrame | Plan SQL and alias the result as a temp view. |
sql_with_timeout(query: str, timeout_ms: int) -> DataFrame | DataFrame | SQL with execution timeout in milliseconds. |
prepare(query: str) -> PreparedStatement | PreparedStatement | Create a parameterised prepared statement. |
execute_local(query: str) -> QueryResult | QueryResult | Execute SQL immediately and return all results. |
execute_remote(query: str) -> QueryHandle | QueryHandle | Submit SQL to a remote coordinator asynchronously. |
Data Registration
| Method | Returns | Description |
|---|---|---|
read_parquet(path: str) -> DataFrame | DataFrame | Read a local Parquet file. |
read_parquet_with_options(path, opts) -> DataFrame | DataFrame | Read Parquet with typed options (batch_size, etc.). |
read_csv(path: str) -> DataFrame | DataFrame | Read a local CSV file (auto-detects header). |
read_csv_with_options(path, opts) -> DataFrame | DataFrame | Read CSV with options (delimiter, has_header). |
read_json(path: str) -> DataFrame | DataFrame | Read a local NDJSON file. |
read_file(path: str) -> DataFrame | DataFrame | Read a file (format inferred from extension). |
register_parquet(name, path) | None | Register a Parquet file as a named SQL table. |
register_record_batches(name, batches) | None | Register PyArrow batches as a SQL table. |
register_unbounded(name, schema) | None | Register an unbounded streaming table. Returns a push handle. |
register_kafka_source(name, schema, brokers, topic, group) | None | Register a Kafka topic as a streaming table. |
table(name: str) -> DataFrame | DataFrame | Reference a registered table as a DataFrame. |
dataframe(record_batches) -> DataFrame | DataFrame | Create a DataFrame from a list of PyArrow RecordBatches. |
deregister_table(name) | None | Remove a registered table. |
table_exists(name: str) -> bool | bool | Check if a table is registered. |
list_tables() -> list[str] | list[str] | Return registered table names. |
UDF Registration
| Method | Description |
|---|---|
register_udf(name, fn, input_types, return_type) | Register a Python callable as a scalar UDF. |
register_aggregate_udf(name, accumulator_class) | Register a Python class as an aggregate UDF. |
register_table_udf(name, fn, schema) | Register a Python callable as a table-valued UDF. |
register_function(name, fn) | Register a generic Python function (type-inferred). |
list_udfs() -> list[str] | List registered scalar UDF names. |
list_aggregate_udfs() -> list[str] | List registered aggregate UDF names. |
list_table_udfs() -> list[str] | List registered table UDF names. |
Streaming Methods
| Method | Returns | Description |
|---|---|---|
stream(name: str) -> Stream | Stream | Reference a registered unbounded table as a Stream. |
memory_stream(schema) -> tuple[Stream, Sender] | (Stream, Sender) | Create an in-memory stream and its push handle. |
memory_stream_collect(schema, batches) -> Stream | Stream | Create a bounded stream pre-loaded with batches. |
from_bounded_stream(schema, batches) -> Stream | Stream | Create a bounded (finite) stream. |
from_source(source) -> Stream | Stream | Create a stream from a source connector object. |
submit_stream_job(plan, name) -> JobStatus | JobStatus | Submit a streaming plan to the scheduler. |
push_stream_job_input(job_id, batch) | None | Push a batch to a running stream job's unbounded input. |
poll_stream_job(job_id) -> JobStatus | JobStatus | Poll status of a submitted stream job. |
close_unbounded_input(table_name) | None | Signal end-of-stream for an unbounded input table. |
read_stream() -> DataStreamReader | DataStreamReader | Get a Spark-style structured streaming reader. |
submit_async(plan) -> QueryHandle | QueryHandle | Submit a streaming plan and get an async handle. |
Config and Auth
| Method | Description |
|---|---|
get_config(key: str) -> str | None | Read a session config value. |
set_config(key: str, value: str) | Set a session config value. |
unset_config(key: str) | Remove a session config override. |
configs() -> dict[str, str] | Return all current session configs. |
mode() -> str | Return the current execution mode string. |
with_auth_token(token: str) -> Session | Return a session copy with a bearer token attached. |
with_oidc_provider(provider) -> Session | Return a session copy with an OIDC auth provider. |
with_policy(hook) -> Session | Return a session copy with a governance policy hook. |
operation_registry() -> OperationRegistry | Access the operation cancellation/progress registry. |
jobs() -> list[JobStatus] | List running and recent jobs. |
live_table(name: str) -> LiveTable | Access a registered live table. |
is_streaming_query(query: str) -> bool | Check if a SQL string references a streaming source. |