Python
from chonkie.cloud import RecursiveChunker 

chunker = RecursiveChunker(api_key="{api_key}") 

chunks = chunker(text="YOUR_TEXT")

200 - application/json
[
  {
    "text": "<string>",
    "start_index": 123,
    "end_index": 123,
    "token_count": 123,
    "level": 123
  }
]
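
The call above sends raw text through the Python SDK and returns a list shaped like the example response. A minimal follow-up sketch, assuming the API key lives in a CHONKIE_API_KEY environment variable (an illustrative name, not part of this reference) and treating each chunk as a dict with the fields shown; if the SDK returns RecursiveChunk objects instead, swap the subscripting for attribute access:

import os

from chonkie.cloud import RecursiveChunker

# Assumed environment variable; "{api_key}" above is just a placeholder.
chunker = RecursiveChunker(api_key=os.environ["CHONKIE_API_KEY"])

with open("report.txt", encoding="utf-8") as f:
    text = f.read()

chunks = chunker(text=text)

for chunk in chunks:
    # token_count and level are documented in the Response section below.
    print(f"level={chunk['level']} tokens={chunk['token_count']}: {chunk['text'][:40]!r}")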

Authorizations

Authorization
string
header
required

Your API key from the Chonkie Cloud dashboard.
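
When calling the endpoint directly instead of through the SDK, the key goes in the Authorization header. A minimal sketch using requests; the CHONKIE_API_KEY variable name is an assumption, and this reference does not say whether a "Bearer " prefix is expected, so confirm that against your dashboard:

import os

import requests

api_key = os.environ["CHONKIE_API_KEY"]  # assumed variable name

session = requests.Session()
# The reference only states that the header carries your API key; prepend
# "Bearer " here if your deployment expects it.
session.headers["Authorization"] = api_key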

Body

multipart/form-data
file
file

The file to chunk.

tokenizer_or_token_counter
string
default:gpt2

Tokenizer or token counting function to use. Can be a string identifier or a tokenizer instance.

chunk_size
integer
default:2048

Maximum number of tokens per chunk.

recipe
string
default:default

Pre-defined rules for chunking. You can find all available recipes on our Hugging Face Hub.

lang
string
default:en

Language of the text, used with recipes. Must match the language of the recipe.

min_characters_per_chunk
integer
default:12

Minimum number of characters per chunk.

return_type
enum<string>
default:chunks

Whether to return chunks as text strings or as RecursiveChunk objects.

Available options:
texts,
chunks
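
Putting the body parameters together: a hedged sketch of the multipart/form-data request, reusing the session from the Authorizations sketch above. The endpoint URL is an assumption (use the URL shown in your Chonkie Cloud dashboard); the form field names and defaults are the ones documented in this section:

# Assumed endpoint path; substitute the URL from your dashboard.
API_URL = "https://api.chonkie.ai/v1/chunk/recursive"

with open("report.txt", "rb") as f:
    response = session.post(
        API_URL,
        files={"file": ("report.txt", f)},           # the file to chunk
        data={
            "tokenizer_or_token_counter": "gpt2",    # defaults shown above
            "chunk_size": "2048",
            "recipe": "default",
            "lang": "en",
            "min_characters_per_chunk": "12",
            "return_type": "chunks",                 # or "texts" for plain strings
        },
        timeout=60,
    )

response.raise_for_status()
chunks = response.json()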

Response

200 - application/json

Successful Response: A list of RecursiveChunk objects.

A list containing RecursiveChunk objects, each detailing a segment of the original text and its level in the recursive tree.

text
string

The actual text content of the chunk.

start_index
integer

The starting character index of the chunk within the original input text.

end_index
integer

The ending character index (exclusive) of the chunk within the original input text.

token_count
integer

The number of tokens in this specific chunk, according to the tokenizer used.

level
integer

The level of this chunk in the recursive splitting process (starts from 0).
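
Because start_index and end_index are character offsets into the original input (with end_index exclusive), each chunk can be mapped back onto the source string. A minimal sketch over the parsed JSON response; original_text and chunks stand for the input string and the decoded response list:

def check_chunks(original_text: str, chunks: list) -> None:
    """Verify offsets and summarize the chunks returned by the endpoint."""
    for chunk in chunks:
        # end_index is exclusive, so slicing reproduces the chunk text.
        assert original_text[chunk["start_index"]:chunk["end_index"]] == chunk["text"]

    total_tokens = sum(chunk["token_count"] for chunk in chunks)
    deepest_level = max(chunk["level"] for chunk in chunks)
    print(f"{len(chunks)} chunks, {total_tokens} tokens, deepest level {deepest_level}")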