SlumberChunker
Agentic chunking powered by generative models via the Genie interface
Meet the SlumberChunker, Chonkie’s first agentic chunker! This isn’t your average chunker; it uses the reasoning power of large language models (LLMs) to understand your text deeply and create truly S-tier chunks.
API Reference
To use the SlumberChunker via the API, check out the API reference documentation.
Introducing Genie! 🧞
The magic behind SlumberChunker is Genie, Chonkie’s new interface for integrating generative models and APIs (like Gemini, OpenAI, Anthropic, etc.). Genie allows SlumberChunker to intelligently analyze text structure, identify optimal split points, and even summarize or rephrase content for the best possible chunk quality.
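To make the idea of a model-backed interface concrete, here is a minimal offline sketch. It does not use Chonkie itself: `GenieLike`, `EchoSplitGenie`, `ask_for_split`, and the prompt format are hypothetical names invented for illustration, and the "model" is a simple rule so the example runs without an API key.

```python
from typing import List, Protocol


class GenieLike(Protocol):
    """Hypothetical minimal interface: prompt in, model text out."""

    def generate(self, prompt: str) -> str: ...


class EchoSplitGenie:
    """Offline stand-in for a real model backend (assumption, for illustration).

    It "decides" on a split point by returning the ID of the first passage
    that starts with a Markdown heading, or "0" if there is none.
    """

    def generate(self, prompt: str) -> str:
        for line in prompt.splitlines():
            # Passages are presented to the model as "ID <n>: <text>".
            if line.startswith("ID ") and ": #" in line:
                return line[3:line.index(":")]
        return "0"


def ask_for_split(genie: GenieLike, passages: List[str]) -> int:
    """Number the passages, ask the genie where the topic changes, parse the reply."""
    numbered = "\n".join(f"ID {i}: {p}" for i, p in enumerate(passages))
    prompt = f"Return the ID of the first passage that starts a new topic.\n{numbered}"
    return int(genie.generate(prompt))


split_at = ask_for_split(
    EchoSplitGenie(),
    ["Intro text.", "More intro.", "# Installation", "Run the installer."],
)
```

Swapping `EchoSplitGenie` for a class that calls a hosted model would leave `ask_for_split` unchanged, which is the point of putting an interface between the chunker and the model.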
Requires [genie] Install
To unleash the power of SlumberChunker and Genie, you need the [genie] optional install. This includes the necessary libraries to connect to various generative model APIs.
Installation
As mentioned, SlumberChunker requires the [genie] optional install, for example with pip install "chonkie[genie]".
Initialization
Parameters
An instance of a Genie interface (e.g., GeminiGenie, OpenAIGenie). If None, tries to load a default Genie configuration. Required for operation.
Tokenizer or token counting function used for initial splitting and size estimation.
The target maximum number of tokens per chunk. Genie will try to adhere to this.
Initial recursive rules used to generate candidate split points before Genie refines them. See RecursiveChunker for details.
The number of tokens around a potential split point that Genie examines to make its decision.
Minimum number of characters required for a chunk to be considered valid.
Whether to return chunks as SlumberChunk objects or plain text strings.
If True, prints detailed information about Genie’s decision-making process during chunking. Useful for debugging!
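The parameters above describe a two-stage process: rules produce candidate split points, then a model decides which candidates to keep within a token budget. The sketch below illustrates that flow under stated assumptions. It is not Chonkie's implementation: every function name is hypothetical, the token counter is a word count, the "rules" are a blank-line split, and the model is replaced by a heading-detection rule. The candidate window parameter is not modeled; the window here is capped only by the chunk size.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Toy token counter: whitespace-separated words (assumption)."""
    return len(text.split())


def candidate_splits(text: str, min_characters: int) -> List[str]:
    """Stand-in for the recursive rules: split on blank lines, then fold
    pieces shorter than `min_characters` into the piece before them."""
    pieces: List[str] = []
    for part in text.split("\n\n"):
        part = part.strip()
        if not part:
            continue
        if pieces and len(part) < min_characters:
            pieces[-1] += "\n\n" + part
        else:
            pieces.append(part)
    return pieces


def refine(
    pieces: List[str],
    chunk_size: int,
    decide: Callable[[List[str]], int],
) -> List[str]:
    """Group candidate pieces into chunks.

    `decide` plays the role of the model: given a window of pieces, it
    returns how many of them belong together.
    """
    chunks: List[str] = []
    start = 0
    while start < len(pieces):
        # Build the window the "model" may look at, capped by chunk_size tokens.
        end, total = start, 0
        while end < len(pieces) and (
            end == start or total + count_tokens(pieces[end]) <= chunk_size
        ):
            total += count_tokens(pieces[end])
            end += 1
        window = pieces[start:end]
        # Clamp the decision so each step consumes at least one piece.
        take = max(1, min(decide(window), len(window)))
        chunks.append("\n\n".join(window[:take]))
        start += take
    return chunks


def split_before_heading(window: List[str]) -> int:
    """Hypothetical decision rule standing in for a model call:
    stop just before the next piece that looks like a heading."""
    for i, piece in enumerate(window[1:], start=1):
        if piece.startswith("#"):
            return i
    return len(window)


text = (
    "# Intro\n\nChonkie chunks text.\n\nIt is fast.\n\n"
    "# Usage\n\nCall the chunker on a string."
)
pieces = candidate_splits(text, min_characters=5)
chunks = refine(pieces, chunk_size=50, decide=split_before_heading)
```

With this input the two headings each start a new chunk, and joining the chunks with blank lines reproduces the original text.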
Usage
Single Text Chunking
Batch Chunking
Using as a Callable
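The three usage modes named above (single text, batch, and calling the chunker directly) can be sketched with a toy stand-in. This is not SlumberChunker: `ToyChunker` is hypothetical, it splits on blank lines only, and the method names `chunk` and `chunk_batch` are assumptions that this page does not confirm.

```python
from typing import List, Sequence, Union


class ToyChunker:
    """Hypothetical stand-in chunker; it splits on blank lines only."""

    def chunk(self, text: str) -> List[str]:
        """Chunk a single text."""
        return [p.strip() for p in text.split("\n\n") if p.strip()]

    def chunk_batch(self, texts: Sequence[str]) -> List[List[str]]:
        """Chunk several texts, returning one list of chunks per input."""
        return [self.chunk(t) for t in texts]

    def __call__(self, text: Union[str, Sequence[str]]):
        """Dispatch on input type so the instance can be called directly."""
        if isinstance(text, str):
            return self.chunk(text)
        return self.chunk_batch(text)


chunker = ToyChunker()

# Single text chunking
single = chunker.chunk("First idea.\n\nSecond idea.")

# Batch chunking
batch = chunker.chunk_batch(["One.\n\nTwo.", "Three."])

# Using as a callable: a string goes to chunk, a list goes to chunk_batch
called_single = chunker("First idea.\n\nSecond idea.")
called_batch = chunker(["One.\n\nTwo.", "Three."])
```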
Return Type
SlumberChunker returns chunks as Chunk objects, potentially with extra metadata attached depending on the configuration and Genie’s output.
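As a rough picture of what a chunk object with attached metadata might carry, here is a minimal sketch. `ChunkSketch` and its field names (`text`, `start_index`, `end_index`, `token_count`, `metadata`) are assumptions chosen for illustration, not the documented Chunk class, and the token count is a plain word count.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ChunkSketch:
    """Hypothetical chunk record; field names are assumptions."""

    text: str
    start_index: int
    end_index: int
    token_count: int
    metadata: Dict[str, Any] = field(default_factory=dict)


def to_chunks(source: str, pieces: List[str]) -> List[ChunkSketch]:
    """Locate each piece in the source and record its character offsets."""
    chunks: List[ChunkSketch] = []
    cursor = 0
    for piece in pieces:
        start = source.index(piece, cursor)  # search from the previous chunk's end
        end = start + len(piece)
        chunks.append(ChunkSketch(piece, start, end, len(piece.split())))
        cursor = end
    return chunks


source = "Alpha beta.\n\nGamma delta epsilon."
chunks = to_chunks(source, ["Alpha beta.", "Gamma delta epsilon."])
```

Keeping character offsets alongside the text means each chunk can be traced back to its position in the source, since `source[c.start_index:c.end_index]` equals `c.text`.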