SDPM Chunker
Extends semantic chunking with a double-pass merging approach (Semantic Double-Pass Merging, SDPM), which connects related content even when it is not consecutive.
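To make the double-pass idea concrete, here is a toy sketch in Python. It is not the service's implementation: the helper names (`first_pass`, `second_pass`), the mean-vector comparison, and the fixed look-ahead of one group are all assumptions chosen for illustration. The first pass groups consecutive sentences whose embeddings are similar; the second pass merges a group with the one two positions ahead when they are similar, absorbing the group in between.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mean(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def first_pass(embeddings, threshold):
    """Group consecutive sentence indices whose embeddings stay similar."""
    if not embeddings:
        return []
    groups = [[0]]
    for i in range(1, len(embeddings)):
        group_vec = mean([embeddings[j] for j in groups[-1]])
        if cosine(group_vec, embeddings[i]) >= threshold:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups

def second_pass(groups, embeddings, threshold):
    """Merge a group with the one two positions ahead if they are similar.

    The group in between is absorbed, so non-consecutive related content
    ends up in a single chunk (assumed look-ahead of exactly one group).
    """
    merged = []
    i = 0
    while i < len(groups):
        current = list(groups[i])
        while i + 2 < len(groups):
            a = mean([embeddings[j] for j in current])
            b = mean([embeddings[j] for j in groups[i + 2]])
            if cosine(a, b) < threshold:
                break
            current += groups[i + 1] + groups[i + 2]
            i += 2
        merged.append(current)
        i += 1
    return merged

# Toy 2-D "embeddings": sentences 0, 1 and 3 are related, sentence 2 is not.
toy_embeddings = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [1.0, 0.05]]
groups = first_pass(toy_embeddings, threshold=0.9)
chunks = second_pass(groups, toy_embeddings, threshold=0.9)
```

With these toy vectors the first pass leaves sentence 3 separated from sentences 0 and 1 by the unrelated sentence 2, and the second pass reconnects them.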
Authorizations
Your API Key from the Chonkie Cloud dashboard
Body
The file to chunk.
Model identifier or embedding model instance to use for semantic analysis.
Similarity threshold for grouping sentences. Accepts a float in [0, 1] as a direct threshold, an int in (1, 100] as a percentile, or 'auto' for automatic calculation.
"auto"
Mode for grouping sentences, either 'cumulative' or 'window'.
Maximum tokens per chunk.
Number of preceding sentences to consider for similarity comparison.
Minimum number of sentences per chunk.
Minimum number of characters per sentence.
Step size used when automatically calculating the similarity threshold.
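The similarity threshold described above accepts three forms: a float in [0, 1], an int in (1, 100], or 'auto'. The sketch below shows how a client might validate a value before sending a request. `classify_threshold` is a hypothetical helper written for illustration, not part of the Chonkie API.

```python
from typing import Optional, Tuple, Union

def classify_threshold(value: Union[float, int, str]) -> Tuple[str, Optional[float]]:
    """Return (kind, value), where kind is 'direct', 'percentile', or 'auto'."""
    if isinstance(value, str):
        if value == "auto":
            return ("auto", None)
        raise ValueError("the only accepted string is 'auto'")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        raise ValueError("threshold must be a float, an int, or 'auto'")
    if isinstance(value, float):
        if 0.0 <= value <= 1.0:
            return ("direct", value)
        raise ValueError("float threshold must lie in [0, 1]")
    if isinstance(value, int):
        if 1 < value <= 100:
            return ("percentile", value)
        raise ValueError("int threshold must lie in (1, 100]")
    raise ValueError("threshold must be a float, an int, or 'auto'")
```

Note that the int `1` is rejected while the float `1.0` is accepted, which follows from the half-open interval (1, 100] for percentiles.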
Delimiters to split sentences on.
Include delimiters in the chunk text. If so, specify whether to include the previous or next delimiter.
Available options: prev, next
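The difference between the two delimiter options can be illustrated with a small splitter. This is a sketch under one assumed reading of the options: 'prev' keeps each delimiter at the end of the sentence before it, and 'next' moves it to the start of the sentence after it. `split_sentences` is a hypothetical helper, not the service's own code.

```python
def split_sentences(text, delimiters, include_delim="prev"):
    """Split text on delimiters, keeping each delimiter on one side."""
    if include_delim not in ("prev", "next"):
        raise ValueError("include_delim must be 'prev' or 'next'")
    # Try longer delimiters first so that e.g. '...' wins over '.'.
    ordered = sorted(delimiters, key=len, reverse=True)
    pieces = []
    current = ""
    i = 0
    while i < len(text):
        match = next((d for d in ordered if d and text.startswith(d, i)), None)
        if match is None:
            current += text[i]
            i += 1
            continue
        if include_delim == "prev":
            pieces.append(current + match)  # delimiter closes this piece
            current = ""
        else:
            pieces.append(current)          # delimiter opens the next piece
            current = match
        i += len(match)
    if current:
        pieces.append(current)
    return [p for p in pieces if p]

sample = "Hi. Bye."
prev_pieces = split_sentences(sample, ["."], "prev")
next_pieces = split_sentences(sample, ["."], "next")
```

In both modes, concatenating the pieces reproduces the original text, so character offsets into the input remain meaningful.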
Return type for the chunking process. If 'chunks', returns a list of SemanticChunk objects. If 'texts', returns a list of strings.
Available options: texts, chunks
Response
The actual text content of the chunk.
The starting character index of the chunk within the original input text.
The ending character index (exclusive) of the chunk within the original input text.
The number of tokens in this specific chunk, according to the tokenizer used.
List of SemanticSentence objects contained within this chunk.
Represents a single sentence within a semantic chunk, including an optional embedding vector.
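Because the ending character index is exclusive, a chunk's start and end indices can be used directly as a Python slice on the original input. The sketch below checks that property on hand-built data. The field names `text`, `start_index`, and `end_index` are assumptions for illustration, since this page lists the response fields by description only.

```python
def offsets_match(source, chunks):
    """True if every chunk's text equals source[start_index:end_index]."""
    return all(
        source[c["start_index"]:c["end_index"]] == c["text"]
        for c in chunks
    )

# Hand-built example data, not real API output.
first = "First sentence."
second = " Second sentence."
source = first + second
example_chunks = [
    {"text": first, "start_index": 0, "end_index": len(first)},
    {"text": second, "start_index": len(first), "end_index": len(source)},
]
ok = offsets_match(source, example_chunks)
```

With an exclusive end index, the end of one chunk equals the start of the next when chunks are contiguous, and `end_index - start_index` gives the chunk's length in characters.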