Authorizations
Your API Key from the Chonkie Cloud dashboard
Body
The file to chunk.
Model identifier or embedding model instance to use for semantic analysis.
Similarity threshold for grouping sentences. Accepts a float in [0, 1] (used directly as the threshold), an integer in (1, 100] (interpreted as a percentile), or 'auto' (calculated automatically).
"auto"
Mode for grouping sentences, either 'cumulative' or 'window'.
Maximum tokens per chunk.
Number of preceding sentences to consider for similarity comparison.
Minimum number of sentences per chunk.
Minimum number of characters per sentence.
Step size used when automatically calculating the similarity threshold.
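The similarity threshold described above accepts three kinds of value, and the ranges are easy to mix up (for example, the float 1.0 is a direct threshold while the integer 1 falls outside the percentile range). The following is a minimal client-side sketch of that rule, not the service's own validation logic; the function name `classify_threshold` is hypothetical, and how the API treats out-of-range values is not stated in this reference.

```python
from typing import Union


def classify_threshold(threshold: Union[float, int, str]) -> str:
    """Return 'direct', 'percentile', or 'auto' for a threshold value.

    Hypothetical helper: mirrors the documented ranges only.
    """
    if isinstance(threshold, str):
        if threshold == "auto":
            return "auto"
        raise ValueError("the only documented string value is 'auto'")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(threshold, bool):
        raise ValueError("threshold must be a float, an int, or 'auto'")
    if isinstance(threshold, float):
        if 0.0 <= threshold <= 1.0:
            return "direct"
        raise ValueError("float thresholds must lie in [0, 1]")
    if isinstance(threshold, int):
        if 1 < threshold <= 100:
            return "percentile"
        raise ValueError("int thresholds must lie in (1, 100]")
    raise ValueError("threshold must be a float, an int, or 'auto'")
```

Checking the value before sending a request gives an early, local error instead of relying on whatever the server returns for an invalid value.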
Delimiters to split sentences on.
Include delimiters in the chunk text. If so, specify whether to include the previous or next delimiter.
Available options: prev, next
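To illustrate the difference between the two delimiter options, here is a small sketch that splits text on single-character delimiters and attaches each delimiter either to the text before it or to the text after it. It reflects one reading of "prev" and "next" and is not the service's implementation; the function name `split_with_delims` is hypothetical.

```python
import re
from typing import List, Optional


def split_with_delims(
    text: str, delims: List[str], include_delim: Optional[str]
) -> List[str]:
    """Split text on delims, keeping each delimiter per include_delim.

    'prev' attaches a delimiter to the piece before it, 'next' to the
    piece after it, and None drops delimiters. Hypothetical helper.
    """
    if not delims:
        return [text] if text else []
    pattern = "(" + "|".join(re.escape(d) for d in delims) + ")"
    # With one capturing group, re.split returns
    # [piece, delim, piece, delim, ..., piece].
    parts = re.split(pattern, text)
    pieces: List[str] = []
    carry = ""  # delimiter waiting to be prepended in 'next' mode
    for i in range(0, len(parts), 2):
        delim = parts[i + 1] if i + 1 < len(parts) else ""
        if include_delim == "prev":
            piece = parts[i] + delim
        elif include_delim == "next":
            piece = carry + parts[i]
            carry = delim
        else:
            piece = parts[i]
        if piece:
            pieces.append(piece)
    return pieces
```

For the input "One. Two! Three" with delimiters "." and "!", the "prev" option keeps "One." together, while "next" moves the period to the start of the following piece.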
Return type for the chunking process. If 'chunks', returns a list of SemanticChunk objects. If 'texts', returns a list of strings.
Available options: texts, chunks
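As a rough illustration of how the body parameters above fit together, the sketch below assembles the non-file form fields for a request and rejects values outside the documented options. This reference does not show the parameter names, so every field name here (`threshold`, `mode`, `chunk_size`, `include_delim`, `return_type`) is an assumption, as is the choice to send values as strings; the endpoint URL and the authorization header format are not covered.

```python
from typing import Dict, Optional, Union


def build_form_fields(
    *,
    mode: str,
    chunk_size: int,
    return_type: str,
    include_delim: Optional[str],
    threshold: Union[float, int, str] = "auto",
) -> Dict[str, str]:
    """Build form fields for a chunking request.

    All field names are assumed for illustration; check them against
    the actual request schema before use.
    """
    if mode not in ("cumulative", "window"):
        raise ValueError("mode must be 'cumulative' or 'window'")
    if return_type not in ("texts", "chunks"):
        raise ValueError("return_type must be 'texts' or 'chunks'")
    if include_delim not in (None, "prev", "next"):
        raise ValueError("include_delim must be 'prev', 'next', or None")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive token count")

    fields = {
        "threshold": str(threshold),
        "mode": mode,
        "chunk_size": str(chunk_size),
        "return_type": return_type,
    }
    # Omit the field entirely when delimiters should not be included.
    if include_delim is not None:
        fields["include_delim"] = include_delim
    return fields
```

The resulting dictionary could then be passed alongside the uploaded file by whichever HTTP client is in use.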
Response
Successful Response: A list of SemanticChunk objects.
A list containing SemanticChunk objects (as SDPM uses semantic chunking), detailing segments and sentences with optional embeddings.
The actual text content of the chunk.
The starting character index of the chunk within the original input text.
The ending character index (exclusive) of the chunk within the original input text.
The number of tokens in this specific chunk, according to the tokenizer used.
List of SemanticSentence objects contained within this chunk.
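Because the ending index is exclusive, the start and end indices of a chunk should map directly onto a Python slice of the original text. The sketch below checks that property for a parsed response. The field names `text`, `start_index`, and `end_index` are assumptions (this reference lists only the descriptions), and it further assumes that the indices count characters the same way Python strings do.

```python
from typing import Any, Dict, List


def find_offset_mismatches(
    source_text: str, chunks: List[Dict[str, Any]]
) -> List[int]:
    """Return positions of chunks whose offsets do not reproduce their text.

    Field names are assumed for illustration.
    """
    mismatches: List[int] = []
    for position, chunk in enumerate(chunks):
        # The end index is exclusive, so a plain slice is the right comparison.
        sliced = source_text[chunk["start_index"]:chunk["end_index"]]
        if sliced != chunk["text"]:
            mismatches.append(position)
    return mismatches


source = "Hello world. Bye."
response = [
    {"text": "Hello world.", "start_index": 0, "end_index": 12},
    {"text": " Bye.", "start_index": 12, "end_index": 17},
]
print(find_offset_mismatches(source, response))  # prints []
```

An empty result means every chunk can be located in the original text by its indices alone, which is useful when highlighting chunks in a source document.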