TokenChunker
Split text into fixed-size token chunks with configurable overlap
The TokenChunker splits text into chunks based on token count, ensuring each chunk stays within specified token limits.
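The windowing idea behind fixed-size chunking with overlap can be sketched in a few lines of plain Python. This is a minimal illustration, not Chonkie's implementation: the function name `chunk_tokens` and its parameter names are assumptions made for this example, and whitespace-separated words stand in for real tokenizer output.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap=0):
    """Split a token list into windows of at most chunk_size tokens.

    Consecutive windows share chunk_overlap tokens.
    Hypothetical helper for illustration; not part of Chonkie.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


# Whitespace-separated words stand in for real tokens here.
tokens = "the quick brown fox jumps over the lazy dog".split()
windows = chunk_tokens(tokens, chunk_size=4, chunk_overlap=1)
```

With these inputs each window advances by three tokens, so the last token of one window is repeated as the first token of the next.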
API Reference
To use the TokenChunker via the API, check out the API reference documentation.
Installation
TokenChunker is included in the base installation of Chonkie. No additional dependencies are required.
Initialization
Parameters
Tokenizer to use. Can be a string identifier or a tokenizer instance
Maximum number of tokens per chunk
Number or percentage of overlapping tokens between chunks
Whether to return chunks as Chunk objects or plain text strings
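Since the overlap can be given as either a number or a percentage, it has to be resolved to a token count at some point. The sketch below shows one plausible way to do that, assuming an integer means an absolute token count and a float means a fraction of the chunk size. The helper `resolve_overlap` and the names `chunk_overlap` and `chunk_size` are assumptions for this example, not confirmed library API.

```python
def resolve_overlap(chunk_overlap, chunk_size):
    """Turn an overlap setting into an absolute number of tokens.

    Assumed convention (not confirmed by the source):
      - int   -> absolute token count
      - float -> fraction of chunk_size, e.g. 0.25 means 25%
    """
    if isinstance(chunk_overlap, bool):
        raise TypeError("chunk_overlap must be an int or a float")
    if isinstance(chunk_overlap, float):
        if not 0.0 <= chunk_overlap < 1.0:
            raise ValueError("fractional overlap must be in [0, 1)")
        overlap = int(chunk_overlap * chunk_size)
    else:
        overlap = int(chunk_overlap)
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    return overlap


absolute = resolve_overlap(128, 512)     # given directly in tokens
fractional = resolve_overlap(0.25, 512)  # a quarter of the chunk size
```

Rejecting an overlap that is equal to or larger than the chunk size matters because the window would otherwise never advance.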
Usage
Single Text Chunking
Batch Chunking
Using as a Callable
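The relationship between single-text chunking, batch chunking, and calling the chunker directly can be sketched with a small stand-in class. `SketchChunker` is hypothetical and exists only for this example. It assumes that calling the object dispatches to single-text or batch chunking depending on the input type, and it uses whitespace splitting in place of a real tokenizer.

```python
class SketchChunker:
    """Hypothetical stand-in for illustration; not Chonkie's TokenChunker."""

    def __init__(self, chunk_size=4):
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        self.chunk_size = chunk_size

    def chunk(self, text):
        """Chunk one string into pieces of at most chunk_size tokens."""
        tokens = text.split()  # placeholder for a real tokenizer
        return [
            " ".join(tokens[i:i + self.chunk_size])
            for i in range(0, len(tokens), self.chunk_size)
        ]

    def chunk_batch(self, texts):
        """Chunk several strings, returning one list of chunks per input."""
        return [self.chunk(text) for text in texts]

    def __call__(self, text_or_texts):
        """Dispatch on input type so the object can be used like a function."""
        if isinstance(text_or_texts, str):
            return self.chunk(text_or_texts)
        return self.chunk_batch(text_or_texts)


chunker = SketchChunker(chunk_size=2)
single = chunker("a b c")            # same as chunker.chunk("a b c")
batch = chunker(["a b c", "d e"])    # same as chunker.chunk_batch([...])
```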
Supported Tokenizers
TokenChunker supports multiple tokenizer backends:
- TikToken (Recommended)
- AutoTikTokenizer
- Hugging Face Tokenizers
- Transformers
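One way to think about support for several backends is that a token chunker only needs two operations from a tokenizer: turning text into tokens and turning tokens back into text. The sketch below illustrates that idea with a toy byte-level tokenizer. `TokenizerLike`, `ByteTokenizer`, and `chunk_text` are hypothetical names for this example and do not describe how Chonkie wraps its backends.

```python
from typing import List, Protocol


class TokenizerLike(Protocol):
    """Minimal interface assumed for this sketch: encode and decode."""

    def encode(self, text: str) -> List[int]: ...

    def decode(self, tokens: List[int]) -> str: ...


class ByteTokenizer:
    """Toy backend: one token per UTF-8 byte."""

    def encode(self, text: str) -> List[int]:
        return list(text.encode("utf-8"))

    def decode(self, tokens: List[int]) -> str:
        # A chunk boundary may split a multi-byte character, hence "replace".
        return bytes(tokens).decode("utf-8", errors="replace")


def chunk_text(text: str, tokenizer: TokenizerLike, chunk_size: int) -> List[str]:
    """Encode, slice into fixed-size windows, and decode each window."""
    ids = tokenizer.encode(text)
    return [
        tokenizer.decode(ids[i:i + chunk_size])
        for i in range(0, len(ids), chunk_size)
    ]


pieces = chunk_text("hello world", ByteTokenizer(), chunk_size=4)
```

Any object with compatible `encode` and `decode` methods could be passed in place of `ByteTokenizer` in this sketch.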
Return Type
TokenChunker returns chunks as Chunk objects.
Chunk objects include a custom Context class for additional metadata alongside other attributes.
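A chunk record of this kind might look roughly like the dataclasses below. Every field name here (`text`, `start_index`, `end_index`, `token_count`, `context`) is an assumption made for illustration, and the classes are deliberately named `SketchChunk` and `SketchContext` to avoid suggesting they mirror the library's actual definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchContext:
    """Hypothetical metadata holder; field names are assumptions."""

    text: str
    token_count: int


@dataclass
class SketchChunk:
    """Hypothetical chunk record; field names are assumptions."""

    text: str          # the chunk's text
    start_index: int   # character offset where the chunk starts in the source
    end_index: int     # character offset one past the chunk's last character
    token_count: int   # number of tokens in the chunk
    context: Optional[SketchContext] = None  # optional extra metadata


source = "alpha beta gamma"
chunk = SketchChunk(text="beta", start_index=6, end_index=10, token_count=1)
```

Storing character offsets alongside the text lets a caller map each chunk back to its position in the original document.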