Semantic Chunker
Splits text into chunks based on semantic similarity, ensuring that related content stays together in the same chunk.
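To make the idea of grouping by semantic similarity concrete, here is a minimal sketch. It is not the service's implementation: the function names, the toy two-dimensional vectors, and the "compare against the mean of the preceding window" rule are all assumptions chosen for illustration. A real embedding model would supply the vectors.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def group_by_similarity(sentences, vectors, threshold=0.5, window=1):
    """Hypothetical helper: start a new group when similarity to the
    preceding `window` sentences of the current group drops below `threshold`."""
    if not sentences:
        return []
    groups = [[0]]  # each group holds sentence indices
    for i in range(1, len(sentences)):
        prev = groups[-1][-window:]
        dim = len(vectors[i])
        # Mean vector of the preceding sentences in the current group.
        mean = [sum(vectors[j][k] for j in prev) / len(prev) for k in range(dim)]
        if cosine(mean, vectors[i]) >= threshold:
            groups[-1].append(i)
        else:
            groups.append([i])
    return [" ".join(sentences[j] for j in g) for g in groups]

# Toy data: hand-made vectors standing in for real sentence embeddings.
sentences = ["Cats purr.", "Cats nap often.", "Stocks fell today.", "Bond yields rose."]
vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
chunks = group_by_similarity(sentences, vectors, threshold=0.5, window=1)
```

With these toy vectors the two cat sentences land in one group and the two finance sentences in another, which is the behavior the description above refers to as keeping related content together.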
Authorizations
Your API key from the Chonkie Cloud dashboard.
Body
The file to chunk.
Model identifier or embedding model instance to use for semantic analysis.
Similarity threshold for grouping sentences. Accepts a float in [0, 1], used directly as the threshold; an integer in (1, 100], interpreted as a percentile; or 'auto', which calculates the threshold automatically.
Default: "auto"
Maximum tokens per chunk.
Number of preceding sentences to consider for similarity comparison.
Minimum number of sentences per chunk.
Minimum tokens per chunk (optional).
Minimum number of characters per sentence.
Step size used when automatically calculating the similarity threshold.
Delimiters to split sentences on.
Include delimiters in the chunk text. If so, specify whether to include the previous or next delimiter.
Available options: prev, next
Return type for chunking. If 'chunks', returns a list of SemanticChunk objects. If 'texts', returns a list of strings.
Available options: texts, chunks
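The similarity threshold described in this section accepts three different forms, which can be easy to mix up on the client side. The sketch below shows one way to validate a value before sending a request. The function name `interpret_threshold` and the returned labels are hypothetical, and treating integers outside (1, 100] as errors is an assumption based on the ranges stated above, not documented server behavior.

```python
def interpret_threshold(value):
    """Hypothetical client-side check: classify a threshold value
    as 'auto', 'absolute' (float in [0, 1]) or 'percentile' (int in (1, 100])."""
    if isinstance(value, str):
        if value == "auto":
            return ("auto", None)
        raise ValueError(f"unsupported threshold string: {value!r}")
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise ValueError("threshold must not be a boolean")
    if isinstance(value, int):
        if 1 < value <= 100:
            return ("percentile", value)
        raise ValueError("integer threshold must be in (1, 100]")
    if isinstance(value, float):
        if 0.0 <= value <= 1.0:
            return ("absolute", value)
        raise ValueError("float threshold must be in [0, 1]")
    raise ValueError(f"unsupported threshold type: {type(value).__name__}")

examples = [interpret_threshold(v) for v in ("auto", 0.7, 95)]
```

Note the type distinction: `0.7` is read as a direct similarity cutoff, while `95` is read as a percentile, so passing `1` versus `1.0` is not equivalent under this reading.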
Response
The actual text content of the chunk.
The starting character index of the chunk within the original input text.
The ending character index (exclusive) of the chunk within the original input text.
The number of tokens in this specific chunk, according to the tokenizer used.
List of SemanticSentence objects contained within this chunk.
Represents a single sentence within a semantic chunk, including an optional embedding vector.
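The response fields above can be modeled on the client as plain data classes. This is a sketch only: the field names (`text`, `start_index`, `end_index`, `token_count`, `sentences`, `embedding`) are assumptions inferred from the descriptions, since this page does not show the actual keys, and the token count in the example is a placeholder. It also shows what the exclusive end index implies: slicing the original input with the two indices should reproduce the chunk text.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SemanticSentence:
    # Field names are assumed for illustration.
    text: str
    embedding: Optional[List[float]] = None  # optional embedding vector

@dataclass
class SemanticChunk:
    text: str          # the chunk's text content
    start_index: int   # starting character index in the original input
    end_index: int     # ending character index, exclusive
    token_count: int   # token count according to the tokenizer used
    sentences: List[SemanticSentence] = field(default_factory=list)

def chunk_matches_source(chunk: SemanticChunk, source: str) -> bool:
    """Check that the chunk's indices slice back to its text."""
    return source[chunk.start_index:chunk.end_index] == chunk.text

source = "Cats purr. Stocks fell."
chunk = SemanticChunk(
    text="Cats purr.",
    start_index=0,
    end_index=10,      # exclusive: source[0:10] == "Cats purr."
    token_count=3,     # placeholder value, not computed by a real tokenizer
    sentences=[SemanticSentence("Cats purr.")],
)
ok = chunk_matches_source(chunk, source)
```

A check like `chunk_matches_source` can be a useful sanity test when mapping chunks back to positions in the original document.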