The no-nonsense RAG chunking library that’s lightweight, fast, and ready to CHONK your texts!
Ever found yourself building a RAG pipeline yet again (your 2,342,148th one), only to realize you're stuck writing chunking code with bloated library X or the painfully feature-less library Y? WHY CAN'T THIS JUST BE SIMPLE, UGH?
Well, look no further than Chonkie! (chonkie boi is a gud boi 🦛)
- **Feature-rich**: All the CHONKs you'd ever need for your RAG applications
- **Easy to use**: Install, Import, CHONK - it's that simple!
- **Lightning Fast**: CHONK at the speed of light! zooooom
- **Wide Support**: Supports all your favorite tokenizer, model, and API CHONKs (see the sketch after this list)
- **Lightweight**: No bloat, just CHONK - only a 9.7MB base installation
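As an illustration of that "Wide Support" point, here is a minimal sketch of handing the chunker your own tokenizer instead of the default. It assumes `TokenChunker` accepts a Hugging Face `tokenizers.Tokenizer` instance along with `chunk_size`/`chunk_overlap` arguments; check the API reference for the exact parameter names.

```python
# A minimal sketch of using a custom tokenizer with TokenChunker.
# Assumption: TokenChunker accepts a `tokenizers.Tokenizer` instance and
# `chunk_size` / `chunk_overlap` arguments; confirm in the API reference.
from tokenizers import Tokenizer
from chonkie import TokenChunker

# Load whichever Hugging Face tokenizer matches your embedding model
tokenizer = Tokenizer.from_pretrained("gpt2")

# Hand it to the chunker instead of relying on the default
chunker = TokenChunker(tokenizer, chunk_size=512, chunk_overlap=128)

chunks = chunker("CHONK your texts with whichever tokenizer you prefer!")
for chunk in chunks:
    print(chunk.token_count, chunk.text)
```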
Get started with Chonkie in three simple steps: Install, Import and CHONK!
```bash
pip install chonkie
```
Want more features? Install with:
```bash
pip install "chonkie[all]"
```
Chonkie takes a modular approach to dependencies: the base installation stays lightweight, and extra features ship as optional extras that you add only when you need them.
Please check the Installation page for more details.
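For example, you can opt into a single feature set instead of everything. The extra name below is an assumption used for illustration; confirm the available extras on the Installation page.

```bash
# Hypothetical example: install only one feature set instead of [all].
# The "semantic" extra name is an assumption; see the Installation page
# for the actual list of extras.
pip install "chonkie[semantic]"
```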
Release the CHONK! 🦛✨
```python
# First import the chunker you want from Chonkie
from chonkie import TokenChunker

# Initialize the chunker
chunker = TokenChunker()  # defaults to using the GPT2 tokenizer

# Here's some text to chunk
text = """Woah! Chonkie, the chunking library is so cool!"""

# Chunk some text
chunks = chunker(text)

# Access chunks
for chunk in chunks:
    print(f"Chunk: {chunk.text}")
    print(f"Tokens: {chunk.token_count}")
```
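If you are chunking more than one document, a batch call keeps things tidy. Below is a minimal sketch that assumes the chunker exposes a `chunk_batch` method returning one list of chunks per input text; if it does not, a plain loop over `chunker(text)` does the same job.

```python
# A sketch of chunking several documents at once.
# Assumption: the chunker provides `chunk_batch`, returning one list of
# chunks per input document; otherwise, loop over chunker(text) instead.
from chonkie import TokenChunker

chunker = TokenChunker()

docs = [
    "Woah! Chonkie, the chunking library is so cool!",
    "CHONK your texts before you embed them.",
]

batch = chunker.chunk_batch(docs)  # assumed: List[List[Chunk]]
for doc_chunks in batch:
    for chunk in doc_chunks:
        print(f"{chunk.token_count:>4} tokens | {chunk.text}")
```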