Split text into fixed-size token chunks with configurable overlap
`TokenChunker` splits text into chunks based on token count, ensuring each chunk stays within the specified token limit. It is ideal for preparing text for models with token limits, or for consistent chunking across different texts.
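The fixed-size-with-overlap idea can be sketched roughly as follows. This is an illustration only, not the library's implementation: the whitespace "tokenizer" and the function names are assumptions, and a real tokenizer would be used in practice.

```typescript
// Sketch only: a whitespace split stands in for a real tokenizer.
function tokenize(text: string): string[] {
  return text.split(/\s+/).filter((t) => t.length > 0);
}

// Split tokens into windows of at most `chunkSize` tokens, where each
// window overlaps the previous one by `chunkOverlap` tokens.
// Requires chunkOverlap < chunkSize, or the loop would not advance.
function chunkTokens(
  tokens: string[],
  chunkSize: number,
  chunkOverlap: number,
): string[][] {
  const stride = chunkSize - chunkOverlap; // tokens advanced per chunk
  const chunks: string[][] = [];
  for (let start = 0; start < tokens.length; start += stride) {
    chunks.push(tokens.slice(start, start + chunkSize));
    if (start + chunkSize >= tokens.length) break; // last window reached the end
  }
  return chunks;
}
```

With `chunkSize = 4` and `chunkOverlap = 1`, each chunk starts on the last token of the previous chunk, so no token-boundary context is lost between chunks.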
To use `TokenChunker` via the API, check out the API reference documentation.
By default, `TokenChunker` uses the `Xenova/gpt2` tokenizer. It can return either `Chunk` objects (with metadata) or plain text strings, and returns `Chunk` objects by default. Each chunk includes metadata such as its text, its token count, and its start and end indices.
When `returnType` is set to `'texts'`, only the chunked text strings are returned.
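The two return modes can be illustrated with a minimal sketch. The `Chunk` field names below are assumptions that mirror the metadata described above, not the library's exact API:

```typescript
// Illustrative shape only; field names are assumptions, not the library's API.
interface Chunk {
  text: string;       // the chunk's text
  startIndex: number; // character offset where the chunk begins
  endIndex: number;   // character offset where the chunk ends
  tokenCount: number; // number of tokens in the chunk
}

// Mimics the returnType switch: full Chunk objects vs plain text strings.
function finalize(
  chunks: Chunk[],
  returnType: "chunks" | "texts",
): Chunk[] | string[] {
  return returnType === "texts" ? chunks.map((c) => c.text) : chunks;
}
```

Keeping the metadata (the `"chunks"` mode here) is useful when downstream code needs to map a chunk back to its position in the source text; the texts-only mode is convenient when the strings alone are fed to a model.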