Chunkers Overview
Overview of the different chunkers available in Chonkie
Chonkie provides multiple chunking strategies to handle different text-processing needs. Every chunker follows the same core principles outlined in the concepts page; a short usage sketch follows the list below.
TokenChunker
Splits text into fixed-size token chunks. Best for maintaining consistent chunk sizes and working with token-based models.
SentenceChunker
Splits text at sentence boundaries. Perfect for maintaining semantic completeness at the sentence level.
RecursiveChunker
Recursively splits documents into progressively smaller chunks. Best for long documents with well-defined structure.
SemanticChunker
Groups content based on semantic similarity. Best for preserving context and topical coherence.
SDPMChunker
Chunks text using the Semantic Double-Pass Merging (SDPM) algorithm. Best for maintaining topical coherence when text has frequent breaks.
LateChunker
Chunks text using the Late Chunking algorithm. Best for higher recall in RAG applications.
CodeChunker
Splits code based on its structure using ASTs. Ideal for chunking source code files.
NeuralChunker
Uses a fine-tuned BERT model to split text based on semantic shifts. Great for topic-coherent chunks.
SlumberChunker
Agentic chunking that uses generative models (LLMs) via the Genie interface for S-tier chunk quality.
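As a quick illustration, here is a minimal sketch of constructing two of the chunkers above. The parameter names and the embedding model shown are examples only and may differ between Chonkie versions; see each chunker's page for the exact signature.

```python
from chonkie import RecursiveChunker, SemanticChunker

# Structure-aware chunking for long, well-organized documents.
recursive_chunker = RecursiveChunker(chunk_size=512)

# Similarity-based grouping for topical coherence; requires an embedding model
# (the model name below is only an example).
semantic_chunker = SemanticChunker(
    embedding_model="minishlab/potion-base-8M",
    threshold=0.5,
    chunk_size=512,
)
```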
Availability
Different chunkers are available depending on your installation:
Chunker | Default | embeddings | 'all' | Chonkie Cloud |
---|---|---|---|---|
TokenChunker | ✅ | ✅ | ✅ | ✅ |
SentenceChunker | ✅ | ✅ | ✅ | ✅ |
RecursiveChunker | ✅ | ✅ | ✅ | ✅ |
CodeChunker | ❌ | ❌ | ✅ | ✅ |
SemanticChunker | ❌ | ✅ | ✅ | ✅ |
SDPMChunker | ❌ | ✅ | ✅ | ✅ |
LateChunker | ❌ | ✅ | ✅ | ✅ |
NeuralChunker | ❌ | ❌ | ✅ | ✅ |
SlumberChunker | ❌ | ❌ | ✅ | ✅ |
Common Interface
All chunkers share a consistent interface:
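As a rough sketch (assuming the TokenChunker and the chunk fields shown below, which may vary by version), every chunker can be called directly on a string or through its chunk method and returns a list of chunk objects carrying the chunk text plus metadata:

```python
from chonkie import TokenChunker

chunker = TokenChunker(chunk_size=256, chunk_overlap=32)

text = "Some long document text that needs to be split into pieces..."

# Chunkers are callable; chunker(text) behaves like chunker.chunk(text).
chunks = chunker(text)

for chunk in chunks:
    # Each chunk carries its text along with positional and size metadata.
    print(chunk.text, chunk.start_index, chunk.end_index, chunk.token_count)

# Chunk several documents in one call.
batch_chunks = chunker.chunk_batch(["first document...", "second document..."])
```

The same call pattern applies to the other chunkers; only the constructor arguments change.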