Ever found yourself building a RAG pipeline yet again (your 2,342,148th one), only to realize you’re stuck writing chunking code with bloated software library X or the painfully featureless library Y? WHY CAN’T THIS JUST BE SIMPLE, UGH?

Well, look no further than Chonkie! (chonkie boi is a gud boi 🦛)

Feature-rich

All the CHONKs you’d ever need for your RAG applications

Easy to use

Install, Import, CHONK - it’s that simple!

Lightning Fast

CHONK at the speed of light! zooooom

Wide Support

Supports all your favorite tokenizer, model and API CHONKs

Lightweight

No bloat, just CHONK - only 9.7MB base installation

Cute Mascot

psst it’s a pygmy hippo btw! Moto Moto approved


Quick Start

Get started with Chonkie in three simple steps: Install, Import and CHONK!

pip install chonkie

Want more features?

pip install "chonkie[all]"

Chonkie follows a special approach to dependencies, keeping the base installation lightweight while allowing you to add extra features as and when needed. Please check the Installation page for more details.

Release the CHONK! 🦛✨

# First import the chunker you want from Chonkie 
from chonkie import TokenChunker

# Initialize the chunker
chunker = TokenChunker() # defaults to using GPT2 tokenizer

# Here's some text to chunk
text = """Woah! Chonkie, the chunking library is so cool!"""

# Chunk some text
chunks = chunker(text)

# Access chunks
for chunk in chunks:
    print(f"Chunk: {chunk.text}")
    print(f"Tokens: {chunk.token_count}")
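
Curious what a token chunker is doing in spirit? Here is a minimal, self-contained sketch of fixed-size token chunking with overlap. It is not Chonkie's implementation: it assumes whitespace-separated words stand in for tokens (TokenChunker uses a real tokenizer), and `SimpleChunk`, `chunk_by_tokens`, `chunk_size` and `chunk_overlap` are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass
class SimpleChunk:
    """Hypothetical stand-in for a chunk: its text plus a token count."""
    text: str
    token_count: int


def chunk_by_tokens(text: str, chunk_size: int = 8, chunk_overlap: int = 2) -> list[SimpleChunk]:
    """Split text into windows of at most chunk_size tokens.

    Assumption: a "token" here is just a whitespace-separated word.
    Consecutive windows share chunk_overlap tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    tokens = text.split()
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(SimpleChunk(text=" ".join(window), token_count=len(window)))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


text = "Woah! Chonkie, the chunking library is so cool!"
for chunk in chunk_by_tokens(text, chunk_size=4, chunk_overlap=1):
    print(f"Chunk: {chunk.text}")
    print(f"Tokens: {chunk.token_count}")
```

The overlap means neighbouring chunks repeat a little context, so a sentence that straddles a boundary is not cut off cold.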

Chonkie Cloud

Don’t wanna chunk locally? No problem! Chonkie Cloud is here to save the day!

  1. Make a free account on Chonkie Cloud
  2. Get your API key
  3. Send your CHONK requests!

curl -X POST \
   -H "Content-Type: application/json" \
   -H "Authorization: Bearer <YOUR_API_KEY>" \
   -d '{
         "text": "The tiny hippo lives in the clouds!",
         "args": {}
       }' \
   https://api.chonkie.ai/v1/chunk/<your-favorite-chunker>
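
Prefer Python over curl? The same request can be sketched with nothing but the standard library. This mirrors the curl call above (same URL, headers and JSON body); the helper name `build_chunk_request` and the chunker name `"token"` are assumptions made for this example, so swap in whichever chunker you actually want. The response format is not described here, so the sketch stops at building the request.

```python
import json
import urllib.request

# Base URL taken from the curl example; the chunker name is appended to it.
API_BASE = "https://api.chonkie.ai/v1/chunk"


def build_chunk_request(text, chunker, api_key, args=None):
    """Build (but do not send) a POST request equivalent to the curl call.

    Hypothetical helper for illustration; it is not part of Chonkie.
    """
    payload = {"text": text, "args": args or {}}
    return urllib.request.Request(
        url=f"{API_BASE}/{chunker}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_chunk_request(
    text="The tiny hippo lives in the clouds!",
    chunker="token",  # assumption: replace with your favorite chunker
    api_key="YOUR_API_KEY",
)
print(request.get_method(), request.full_url)

# To actually send it (needs a real API key and network access):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```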

Build With Chonkie

Ready to learn more about Chonkie?


Support

Got questions? We’re here to help!