Late Chunker
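Implements the late chunking strategy: the entire text is encoded in a single pass, then split recursively into chunks, and each chunk's embedding is derived from the full document's token embeddings, so it retains context from the surrounding text.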
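As a minimal sketch of the pooling step, assume the embedding model returns one contextual vector per token (from a single pass over the whole document) together with each token's character span. A chunk embedding can then be taken as the mean of the token vectors that fall inside the chunk. Mean pooling is the usual choice in late chunking, but the function `late_chunk_embeddings` and its inputs are hypothetical and not part of the Chonkie Cloud API.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def late_chunk_embeddings(
    token_embeddings: Sequence[Vector],
    token_spans: Sequence[Tuple[int, int]],
    chunk_spans: Sequence[Tuple[int, int]],
) -> List[Vector]:
    """Mean-pool full-document token embeddings over each chunk span.

    token_embeddings[i] is the contextual vector for token i, computed once
    over the whole document (the "late" part of late chunking).
    token_spans[i] is token i's (start, end) character span, end exclusive.
    chunk_spans are (start, end) character spans, end exclusive.
    """
    dim = len(token_embeddings[0])
    results: List[Vector] = []
    for c_start, c_end in chunk_spans:
        # Keep only tokens lying entirely inside the chunk; tokens that
        # straddle a chunk boundary are ignored in this sketch.
        members = [
            emb
            for emb, (t_start, t_end) in zip(token_embeddings, token_spans)
            if t_start >= c_start and t_end <= c_end
        ]
        if not members:
            raise ValueError(f"no tokens fall inside chunk span {(c_start, c_end)}")
        # Average each dimension across the member tokens.
        pooled = [sum(emb[d] for emb in members) / len(members) for d in range(dim)]
        results.append(pooled)
    return results


tokens = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0]]
spans = [(0, 5), (6, 11), (12, 17)]
print(late_chunk_embeddings(tokens, spans, [(0, 11), (12, 17)]))
# [[2.0, 1.0], [0.0, 4.0]]
```

Because every token vector was computed with the whole document in view, the pooled chunk vectors carry cross-chunk context that encoding each chunk separately would lose.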
Authorizations
Your API key from the Chonkie Cloud dashboard.
Body
The file to chunk.
SentenceTransformer model identifier to use for embedding.
Maximum number of tokens per chunk.
Pre-defined recursive rules for splitting. Find all recipes on our Hugging Face Hub.
Language of the text, used with recipes. Must match the language of the recipe.
Minimum number of characters per chunk.
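The body parameters describe a recursive splitter bounded by a token budget and a character floor. The sketch below illustrates that idea under stated assumptions: the separator list `DEFAULT_RULES` stands in for a recipe, tokens are counted as whitespace-separated words rather than with the embedding model's tokenizer, and chunks shorter than `min_chars` are merged into the previous chunk, which can push it past the token budget. `recursive_chunk` is a hypothetical helper, not the service's implementation.

```python
from typing import List, Sequence

# Hypothetical splitting rules, coarse to fine: paragraphs, sentences, words.
# A real recipe from the Hugging Face Hub would define its own rules.
DEFAULT_RULES: Sequence[str] = ("\n\n", ". ", " ")


def count_tokens(text: str) -> int:
    # Stand-in tokenizer (assumption): count whitespace-separated words.
    return len(text.split())


def split_keep(text: str, sep: str) -> List[str]:
    """Split on sep, keeping sep attached to each piece so pieces rejoin exactly."""
    parts = text.split(sep)
    pieces = [part + sep for part in parts[:-1]] + [parts[-1]]
    return [piece for piece in pieces if piece]


def recursive_chunk(
    text: str,
    chunk_size: int,
    min_chars: int = 0,
    rules: Sequence[str] = DEFAULT_RULES,
    level: int = 0,
) -> List[str]:
    """Split text into chunks of at most chunk_size tokens, coarse rules first."""
    # Stop when the text fits, or when no finer rule is left to try.
    if count_tokens(text) <= chunk_size or level >= len(rules):
        return [text]

    chunks: List[str] = []
    current = ""
    for piece in split_keep(text, rules[level]):
        if count_tokens(piece) > chunk_size:
            # Piece is too large by itself: flush the buffer, recurse with a finer rule.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_chunk(piece, chunk_size, 0, rules, level + 1))
        elif count_tokens(current + piece) <= chunk_size:
            current += piece  # greedily grow the current chunk
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)

    # Assumed min_chars behavior: merge undersized chunks into the previous one.
    merged: List[str] = []
    for chunk in chunks:
        if merged and len(chunk) < min_chars:
            merged[-1] += chunk
        else:
            merged.append(chunk)
    return merged


sample = "One two three.\n\nFour five six seven eight. Nine ten."
print(recursive_chunk(sample, chunk_size=4))
# ['One two three.\n\n', 'Four five six seven ', 'eight. ', 'Nine ten.']
```

Separators stay attached to their pieces, so joining the returned chunks reproduces the input exactly, which is what makes start and end character offsets recoverable.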
Response
The actual text content of the chunk.
The starting character index of the chunk within the original input text.
The ending character index (exclusive) of the chunk within the original input text.
The number of tokens in this specific chunk, according to the tokenizer used.
List of standard Sentence objects contained within this chunk.
Represents a single sentence with metadata, used within sentence-based chunks.
Optional embedding vector (list of floats) for the entire chunk, derived from the full document's token embeddings rather than from encoding the chunk on its own.
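To show how the response fields fit together, here is a hypothetical client-side model of one returned chunk. The field names (`text`, `start_index`, `end_index`, `token_count`, `sentences`, `embedding`) are assumptions, as is the premise that sentence offsets are measured against the original input. `check_offsets` verifies the documented invariant that slicing the input from the start index up to the exclusive end index reproduces the chunk text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sentence:
    # Hypothetical field names; offsets assumed relative to the original input.
    text: str
    start_index: int
    end_index: int  # exclusive
    token_count: int


@dataclass
class LateChunk:
    # Hypothetical mirror of one chunk in the response body.
    text: str
    start_index: int
    end_index: int  # exclusive
    token_count: int
    sentences: List[Sentence] = field(default_factory=list)
    embedding: Optional[List[float]] = None  # optional, may be absent


def check_offsets(chunk: LateChunk, source: str) -> bool:
    """True if the chunk and every sentence slice back to their own text."""
    if source[chunk.start_index:chunk.end_index] != chunk.text:
        return False
    return all(
        chunk.start_index <= s.start_index <= s.end_index <= chunk.end_index
        and source[s.start_index:s.end_index] == s.text
        for s in chunk.sentences
    )


source = "Late chunking keeps context. It pools token embeddings."
text = "It pools token embeddings."
start = source.index(text)
# token_count values are illustrative; the service reports its tokenizer's count.
sentence = Sentence(text, start, start + len(text), token_count=4)
chunk = LateChunk(text, start, start + len(text), token_count=4, sentences=[sentence])
print(check_offsets(chunk, source))  # True
```

Because the end index is exclusive, `end_index - start_index` equals the chunk's character length, and adjacent chunks can share a boundary value without overlapping.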