POST /v1/chunk/late
curl --request POST \
  --url https://api.chonkie.ai/v1/chunk/late \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form embedding_model=all-MiniLM-L6-v2 \
  --form chunk_size=512 \
  --form recipe=default \
  --form lang=en \
  --form min_characters_per_chunk=24 \
  --form 'file=@<path-to-file>'
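For reference, here is a Python sketch of the same request that uses only the standard library. It assumes the document is sent in the `file` form field described under Body. The helper names `encode_multipart` and `build_late_chunk_request` and the file name `document.txt` are hypothetical, and the request is only built here, not sent.

```python
import urllib.request
import uuid

API_URL = "https://api.chonkie.ai/v1/chunk/late"


def encode_multipart(fields: dict[str, str], filename: str, file_bytes: bytes) -> tuple[bytes, str]:
    """Encode text fields plus a single `file` part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    body = bytearray()
    for name, value in fields.items():
        body += f"--{boundary}\r\n".encode()
        body += f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode()
        body += f"{value}\r\n".encode()
    # The uploaded document goes in its own part, with a filename.
    body += f"--{boundary}\r\n".encode()
    body += f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    body += b"Content-Type: application/octet-stream\r\n\r\n"
    body += file_bytes + b"\r\n"
    body += f"--{boundary}--\r\n".encode()
    return bytes(body), boundary


def build_late_chunk_request(token: str, filename: str, file_bytes: bytes, **params) -> urllib.request.Request:
    """Build (but do not send) a POST to the late-chunking endpoint."""
    body, boundary = encode_multipart({k: str(v) for k, v in params.items()}, filename, file_bytes)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            # urllib does not generate a boundary, so it is set explicitly here.
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


req = build_late_chunk_request(
    "<token>",
    "document.txt",  # hypothetical file name
    b"Text to chunk goes here.",
    embedding_model="all-MiniLM-L6-v2",
    chunk_size=512,
    recipe="default",
    lang="en",
    min_characters_per_chunk=24,
)
```

To send it, pass `req` to `urllib.request.urlopen` and decode the JSON body of the response.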
[
  {
    "text": "<string>",
    "start_index": 123,
    "end_index": 123,
    "token_count": 123,
    "sentences": [
      {
        "text": "<string>",
        "start_index": 123,
        "end_index": 123,
        "token_count": 123
      }
    ],
    "embedding": [
      123
    ]
  }
]
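A minimal Python sketch for consuming this response, assuming the JSON shape shown above. The dataclasses mirror the documented `LateChunk` and Sentence objects, but they and the helpers `parse_late_chunks` and `offsets_match` are hypothetical, not part of a Chonkie client library. `offsets_match` assumes `original` is the decoded text of the uploaded file.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Sentence:
    text: str
    start_index: int
    end_index: int
    token_count: int


@dataclass
class LateChunk:
    text: str
    start_index: int
    end_index: int
    token_count: int
    sentences: list[Sentence] = field(default_factory=list)
    # Documented as number[] | null, so None is a valid value.
    embedding: Optional[list[float]] = None


def _sentence(obj: dict) -> Sentence:
    return Sentence(obj["text"], obj["start_index"], obj["end_index"], obj["token_count"])


def parse_late_chunks(payload: str) -> list[LateChunk]:
    """Convert the endpoint's JSON array into LateChunk objects."""
    return [
        LateChunk(
            text=item["text"],
            start_index=item["start_index"],
            end_index=item["end_index"],
            token_count=item["token_count"],
            sentences=[_sentence(s) for s in item.get("sentences", [])],
            embedding=item.get("embedding"),
        )
        for item in json.loads(payload)
    ]


def offsets_match(original: str, chunk: LateChunk) -> bool:
    """end_index is exclusive, so a Python slice should reproduce the chunk text."""
    return original[chunk.start_index:chunk.end_index] == chunk.text
```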

Authorizations

Authorization (string, header, required)

Your API key from the Chonkie Cloud dashboard, sent as a bearer token: `Authorization: Bearer <token>`.

Body

multipart/form-data
file (file)

The file to chunk.

embedding_model (string, default: all-MiniLM-L6-v2)

SentenceTransformer model identifier used to embed the document.

chunk_size (integer, default: 512)

Maximum number of tokens per chunk.

recipe (string, default: default)

Name of a predefined set of recursive splitting rules. All available recipes are listed on our Hugging Face Hub.

lang (string, default: en)

Language of the text, used together with `recipe`. It must match the language of the chosen recipe.

min_characters_per_chunk (integer, default: 24)

Minimum number of characters per chunk.
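Omitted fields take the defaults listed above. The sketch below makes those defaults explicit on the client side and rejects misspelled parameter names before they reach the server. `build_form_fields` is a hypothetical helper, and its range checks are assumptions rather than documented server-side validation; the rule that `lang` must match the recipe cannot be checked locally and is left to the server.

```python
DEFAULTS = {
    "embedding_model": "all-MiniLM-L6-v2",
    "chunk_size": 512,
    "recipe": "default",
    "lang": "en",
    "min_characters_per_chunk": 24,
}


def build_form_fields(**overrides) -> dict[str, str]:
    """Merge caller overrides into the documented defaults, as form-ready strings."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        # Catches typos such as chunksize=256 before the request is sent.
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    fields = {**DEFAULTS, **overrides}
    # Assumed sanity checks; the server's own limits are not documented here.
    if int(fields["chunk_size"]) <= 0:
        raise ValueError("chunk_size must be a positive number of tokens")
    if int(fields["min_characters_per_chunk"]) < 0:
        raise ValueError("min_characters_per_chunk must not be negative")
    # multipart/form-data sends every value as text.
    return {name: str(value) for name, value in fields.items()}
```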

Response

200 - application/json
Successful response: a list of `LateChunk` objects.

text (string)

The text content of the chunk.

start_index (integer)

Character index in the original input text where the chunk starts.

end_index (integer)

Character index in the original input text where the chunk ends (exclusive).

token_count (integer)

Number of tokens in the chunk, as counted by the tokenizer in use.

sentences (object[])

List of Sentence objects contained in the chunk. Each Sentence represents a single sentence with its own metadata (text, start_index, end_index, token_count), as used in sentence-based chunks.

embedding (number[] | null)

Optional embedding vector (list of floats) for the entire chunk, derived from the full-document embedding.
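The description of `embedding` reflects how late chunking works in general: the document is embedded once as a whole, and each chunk's vector is derived from that document-level pass. The sketch below shows one common way to do the derivation, mean pooling the token embeddings that fall inside each chunk's span. It is an illustration under assumptions: the pooling Chonkie actually uses is not specified on this page, `token_spans` is a hypothetical input (the API returns character offsets, not token offsets), and the toy vectors stand in for real model output.

```python
import math


def late_chunk_embeddings(
    token_embeddings: list[list[float]],
    token_spans: list[tuple[int, int]],
) -> list[list[float]]:
    """Mean-pool document-level token vectors into one vector per chunk.

    token_embeddings[i] is token i's vector, computed with the whole document
    as context; token_spans[k] = (start, end) is chunk k's token range, end exclusive.
    """
    pooled = []
    for start, end in token_spans:
        if not 0 <= start < end <= len(token_embeddings):
            raise ValueError(f"invalid token span ({start}, {end})")
        window = token_embeddings[start:end]
        dim = len(window[0])
        pooled.append([sum(vec[d] for vec in window) / len(window) for d in range(dim)])
    return pooled


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy document-level token vectors (not real model output) for a 4-token text.
tokens = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
chunk_vectors = late_chunk_embeddings(tokens, [(0, 2), (2, 4)])
# chunk_vectors == [[2.0, 0.0], [0.0, 3.0]]
```

To search returned chunks, a query vector can be compared with each non-null `embedding` using `cosine`, assuming the query was embedded with the same `embedding_model`.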