Agentic chunking powered by generative models via the Genie interface
Meet the SlumberChunker – Chonkie’s first agentic chunker! This isn’t your average chunker; it uses the reasoning power of large language models (LLMs) to understand your text deeply and create truly S-tier chunks.
The magic behind SlumberChunker is Genie, Chonkie’s new interface for integrating generative models and APIs (like Gemini, OpenAI, Anthropic, etc.). Genie allows SlumberChunker to intelligently analyze text structure, identify optimal split points, and even summarize or rephrase content for the best possible chunk quality.
Requires [genie] Install
To unleash the power of SlumberChunker and Genie, you need the [genie] optional install. This includes the necessary libraries to connect to various generative model APIs.
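Assuming Chonkie is published on PyPI with a `[genie]` extra (as the heading above suggests), the install would look like:

```shell
pip install "chonkie[genie]"
```

The quotes keep shells like zsh from interpreting the square brackets.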
```python
from chonkie import SlumberChunker
from chonkie.genie import GeminiGenie

# Optional: Initialize Genie
genie = GeminiGenie("gemini-2.5-flash-preview-04-17")

# Basic initialization
chunker = SlumberChunker(
    genie=genie,                        # Genie interface to use
    tokenizer_or_token_counter="gpt2",  # Tokenizer or token counter to use
    chunk_size=1024,                    # Maximum chunk size
    candidate_size=128,                 # How many tokens Genie looks at for potential splits
    min_characters_per_chunk=24,        # Minimum number of characters per chunk
    verbose=True                        # See the progress bar for the chunking process
)

# You can also rely on the default Genie setup if configured globally
# chunker = SlumberChunker()  # Uses the default Genie if available
```
The `genie` parameter accepts an instance of a Genie interface (e.g., `GeminiGenie`). If `None`, SlumberChunker tries to load a default Genie configuration, which is `GeminiGenie("gemini-2.5-pro-preview-03-25")`.
```python
text = """Complex document with interwoven ideas. Section 1 introduces concept A.
Section 2 discusses concept B, but references A frequently.
Section 3 concludes by merging A and B. Traditional chunkers might struggle here."""

# Assuming 'chunker' is initialized as shown above
chunks = chunker.chunk(text)

for chunk in chunks:
    print(f"Chunk text: {chunk.text}")
    print(f"Token count: {chunk.token_count}")
    print(f"Start index: {chunk.start_index}")
    print(f"End index: {chunk.end_index}")
    # SlumberChunk might have additional metadata from Genie
```
SlumberChunker returns chunks as Chunk objects, potentially with extra metadata attached depending on the configuration and Genie’s output.
```python
from dataclasses import dataclass
from typing import Optional

# Definition similar to TokenChunker's return type
@dataclass
class Context:
    text: str
    token_count: int
    start_index: Optional[int] = None
    end_index: Optional[int] = None

@dataclass
class Chunk:
    text: str                          # The chunk text
    start_index: int                   # Starting position in original text
    end_index: int                     # Ending position in original text
    token_count: int                   # Number of tokens in chunk
    context: Optional[Context] = None  # Contextual information if any
```
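To illustrate how the index fields relate back to the source text, here is a minimal sketch using a simplified version of the `Chunk` dataclass above (not the library itself; the indices and token count are illustrative):

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    start_index: int  # Offset of the chunk's first character in the original text
    end_index: int    # Offset one past the chunk's last character
    token_count: int

text = "Section 1 introduces concept A. Section 2 discusses concept B."

# A chunk covering the first sentence; indices are character offsets into `text`
chunk = Chunk(text=text[0:31], start_index=0, end_index=31, token_count=7)

# Slicing the original text with the chunk's indices recovers the chunk text
assert text[chunk.start_index:chunk.end_index] == chunk.text
print(chunk.text)  # → Section 1 introduces concept A.
```

This start/end convention (end-exclusive, like Python slicing) is what lets downstream code map chunks back to exact spans of the original document.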