POST /v1/chunk/code
Example request

curl --request POST \
  --url https://api.chonkie.ai/v1/chunk/code \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form 'file=@<path-to-code-file>' \
  --form tokenizer_or_token_counter=gpt2 \
  --form chunk_size=1500 \
  --form 'language=<string>' \
  --form include_nodes=false \
  --form return_type=chunks

Example response (200)
[
  {
    "text": "<string>",
    "start_index": 123,
    "end_index": 123,
    "token_count": 123,
    "nodes": [
      {
        "text": "<string>",
        "start_byte": 123,
        "end_byte": 123,
        "type": "<string>"
      }
    ]
  }
]
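For clients without curl, the request above can be reproduced with Python's standard library. This is a minimal sketch, not an official client: it assumes the service accepts a standard multipart body with the code in the `file` part, and the helper names `build_chunk_request` and `send` are hypothetical.

```python
import json
import urllib.request
import uuid

API_URL = "https://api.chonkie.ai/v1/chunk/code"


def build_chunk_request(api_key: str, code: bytes, filename: str, language: str,
                        chunk_size: int = 1500,
                        return_type: str = "chunks") -> urllib.request.Request:
    """Build, but do not send, a multipart/form-data POST like the curl example.

    Hypothetical helper: mirrors the documented form fields and defaults.
    """
    boundary = uuid.uuid4().hex
    fields = {
        "tokenizer_or_token_counter": "gpt2",
        "chunk_size": str(chunk_size),
        "language": language,
        "include_nodes": "false",
        "return_type": return_type,
    }
    body = b""
    for name, value in fields.items():
        # Each text field is its own part, introduced by a boundary line.
        body += (f"--{boundary}\r\n"
                 f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                 f"{value}\r\n").encode("utf-8")
    # The source code is uploaded as the "file" part.
    body += (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
             "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8")
    # Close the file part, then terminate the multipart body.
    body += code + b"\r\n" + f"--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            # The boundary must be declared so the server can split the parts.
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def send(request: urllib.request.Request) -> list:
    """Send the request and decode the JSON list in the response (needs network and a valid key)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `send(build_chunk_request("<token>", open("example.py", "rb").read(), "example.py", "python"))` would upload `example.py`. Building the body by hand keeps the sketch dependency-free, though an HTTP library with built-in multipart support is usually more convenient.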

Authorizations

Authorization (string, header, required)

Your API key from the Chonkie Cloud dashboard, sent as `Bearer <token>`.
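One common pattern is to keep the key out of source code and read it from the environment when building the header. The environment variable name `CHONKIE_API_KEY` and the helper `auth_header` are hypothetical choices for this sketch, not something the API defines.

```python
import os
from typing import Dict, Optional


def auth_header(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return the Authorization header for a Chonkie Cloud request.

    Falls back to the CHONKIE_API_KEY environment variable, a name chosen
    here for illustration only.
    """
    key = api_key or os.environ.get("CHONKIE_API_KEY")
    if not key:
        raise ValueError("No API key given and CHONKIE_API_KEY is not set")
    return {"Authorization": f"Bearer {key}"}
```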

Body

multipart/form-data
language (string, required)

The programming language of the code. Accepts any language name supported by tree-sitter-language-pack.
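Since `language` is required, a client that uploads arbitrary files has to choose a value for each one. The sketch below guesses it from the file extension; the mapping is a hypothetical, partial example, and the authoritative set of names is whatever tree-sitter-language-pack supports.

```python
from pathlib import Path

# Hypothetical, partial mapping: file extension -> tree-sitter-language-pack name.
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".js": "javascript",
    ".ts": "typescript",
    ".rs": "rust",
    ".go": "go",
    ".java": "java",
    ".c": "c",
    ".cpp": "cpp",
}


def guess_language(filename: str) -> str:
    """Guess the `language` form value from a file extension (case-insensitive)."""
    suffix = Path(filename).suffix.lower()
    try:
        return EXTENSION_TO_LANGUAGE[suffix]
    except KeyError:
        raise ValueError(
            f"No language mapping for extension {suffix!r}; pass language explicitly"
        ) from None
```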

file (file)

The file containing the code to be chunked.

tokenizer_or_token_counter (string, default: gpt2)

The tokenizer or token counter used to measure chunk size.

chunk_size (integer, default: 1500)

Maximum number of tokens per chunk.

include_nodes (boolean, default: false)

Whether to include the list of corresponding AST node objects within each `CodeChunk`.

return_type (enum<string>, default: chunks)

Whether to return chunks as `CodeChunk` objects (`chunks`) or as plain text strings (`texts`).

Available options: texts, chunks
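The non-file fields and their defaults can be collected in one place before upload. This sketch (hypothetical helper `code_chunk_form`) fills in the documented defaults, rejects a `return_type` other than `texts` or `chunks`, and serializes booleans as `true`/`false` as in the curl example; the positive `chunk_size` check is a client-side assumption, not a documented server rule.

```python
from typing import Dict

RETURN_TYPES = ("texts", "chunks")


def code_chunk_form(language: str,
                    tokenizer_or_token_counter: str = "gpt2",
                    chunk_size: int = 1500,
                    include_nodes: bool = False,
                    return_type: str = "chunks") -> Dict[str, str]:
    """Collect the non-file form fields for POST /v1/chunk/code with documented defaults.

    Multipart form fields are text, so every value is serialized as a string.
    """
    if not language:
        raise ValueError("language is required")
    if chunk_size <= 0:
        # Assumption: a non-positive budget is treated as a client error here.
        raise ValueError("chunk_size must be a positive integer")
    if return_type not in RETURN_TYPES:
        raise ValueError(f"return_type must be one of {RETURN_TYPES}, got {return_type!r}")
    return {
        "language": language,
        "tokenizer_or_token_counter": tokenizer_or_token_counter,
        "chunk_size": str(chunk_size),
        "include_nodes": "true" if include_nodes else "false",
        "return_type": return_type,
    }
```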

Response

200 - application/json
Successful response: a list of `CodeChunk` objects, or a list of plain text strings when return_type is texts.
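Because the response shape depends on `return_type`, code that only needs the chunk text can normalize both shapes. This sketch assumes `texts` yields a JSON list of strings, which is how the `return_type` description reads; `chunk_texts` is a hypothetical helper.

```python
from typing import Dict, List, Union


def chunk_texts(response: List[Union[str, Dict]]) -> List[str]:
    """Return chunk texts from either response shape.

    Strings (return_type=texts) pass through; CodeChunk objects
    (return_type=chunks) contribute their "text" field.
    """
    return [item if isinstance(item, str) else item["text"] for item in response]
```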
text (string)

The code text content of the chunk.

start_index (integer)

The starting character index of the chunk within the original input code.

end_index (integer)

The ending character index (exclusive) of the chunk within the original input code.
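Since `end_index` is exclusive, each chunk's text should equal a Python slice of the original code, and `end_index - start_index` should equal the length of the text. The sketch checks that property; it assumes the indices are character offsets into the decoded file contents, and the sample chunks are hand-written for illustration, not real API output. `check_spans` is a hypothetical helper.

```python
from typing import Dict, List


def check_spans(code: str, chunks: List[Dict]) -> bool:
    """Check that every chunk's text equals code[start_index:end_index].

    An exclusive end index matches Python slice semantics directly.
    """
    return all(code[c["start_index"]:c["end_index"]] == c["text"] for c in chunks)


# Hand-written example input and chunks (illustrative values only).
code = "def a():\n    return 1\n\ndef b():\n    return 2\n"
chunks = [
    {"text": "def a():\n    return 1\n", "start_index": 0, "end_index": 22},
    {"text": "\ndef b():\n    return 2\n", "start_index": 22, "end_index": 45},
]
```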

token_count (integer)

The number of tokens in this chunk, as counted by the tokenizer used.
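`token_count` lets a client confirm that chunks fit a downstream budget. The sketch flags any chunk whose reported count exceeds `chunk_size`; whether the service can ever return such a chunk is not stated here, so treat this as a defensive check. `oversized_chunks` is a hypothetical helper.

```python
from typing import Dict, List


def oversized_chunks(chunks: List[Dict], chunk_size: int = 1500) -> List[int]:
    """Return positions of chunks whose reported token_count exceeds chunk_size.

    token_count comes from the tokenizer named in tokenizer_or_token_counter,
    so compare it only against a chunk_size sent with that same tokenizer.
    """
    return [i for i, c in enumerate(chunks) if c["token_count"] > chunk_size]
```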

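nodes (object[] | null)

Optional list of AST node objects corresponding to this code chunk; present only if include_nodes is true.

Each node represents a node in the Abstract Syntax Tree (AST) of the code, as used by CodeChunker, and carries the text, start_byte, end_byte, and type fields shown in the example response.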
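Node positions use byte offsets (`start_byte`, `end_byte`), while chunks use character offsets (`start_index`, `end_index`), and the two diverge as soon as the code contains non-ASCII text. The sketch converts a node's byte span to a character span, assuming the byte offsets refer to the UTF-8 encoding of the original code, as tree-sitter byte offsets do; `byte_span_to_char_span` is a hypothetical helper.

```python
from typing import Tuple


def byte_span_to_char_span(code: str, start_byte: int, end_byte: int) -> Tuple[int, int]:
    """Convert a UTF-8 byte span into a character span over the same text.

    Assumption: offsets index code.encode("utf-8") and fall on character
    boundaries; otherwise decode() raises UnicodeDecodeError.
    """
    data = code.encode("utf-8")
    start = len(data[:start_byte].decode("utf-8"))
    end = start + len(data[start_byte:end_byte].decode("utf-8"))
    return start, end
```

For instance, in `name = "café"\nprint(name)\n` a `print(name)` node would span bytes 15 to 26 but characters 14 to 25, because `é` takes two bytes in UTF-8.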