Installing Chonkie and its various components
Chunker | Default | embeddings | 'all' |
---|---|---|---|
TokenChunker | ✅ | ✅ | ✅ |
RecursiveChunker | ✅ | ✅ | ✅ |
SentenceChunker | ✅ | ✅ | ✅ |
SemanticChunker | ❌ | ✅ | ❌ |
SDPMChunker | ❌ | ✅ | ❌ |
LateChunker | ❌ | ✅ | ❌ |
CodeChunker | ❌ | ❌ | ✅ |
NeuralChunker | ❌ | ✅ | ✅ |
SlumberChunker | ❌ | ✅ | ✅ |

Embeddings Provider | Default | 'model2vec' | 'st' | 'openai' | 'semantic' | 'all' |
---|---|---|---|---|---|---|
Model2VecEmbeddings | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ |
SentenceTransformerEmbeddings | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ |
OpenAIEmbeddings | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ |
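
The table above can be read as a lookup from an embeddings provider to the installation options that include it. The snippet below is a minimal sketch of that lookup. It assumes the options are exposed as pip extras of a package named `chonkie` (an assumption based on the bracketed `[genie]` notation used on this page), and the helper name `requirement_for` is hypothetical, not part of Chonkie.

```python
# Transcription of the embeddings table above:
# provider class name -> installation options that include it.
PROVIDER_OPTIONS = {
    "Model2VecEmbeddings": ["model2vec", "semantic", "all"],
    "SentenceTransformerEmbeddings": ["st", "all"],
    "OpenAIEmbeddings": ["openai", "all"],
}


def requirement_for(provider: str) -> str:
    """Return a pip requirement string for the narrowest option listed.

    Hypothetical helper: assumes the options are pip extras of `chonkie`.
    """
    options = PROVIDER_OPTIONS[provider]
    # Each list is ordered from most specific to broadest ('all' is last).
    return f"chonkie[{options[0]}]"


if __name__ == "__main__":
    for name in PROVIDER_OPTIONS:
        print(f'pip install "{requirement_for(name)}"')
```

Choosing the narrowest option keeps the install small; 'all' also covers every provider in the table, at the cost of pulling in every optional dependency.
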

Installation Option | Additional Dependencies |
---|---|
Default | autotiktokenizer |
'hub' | + huggingface-hub, jsonschema |
'viz' | + rich |
'model2vec' | + model2vec, numpy |
'st' | + sentence-transformers, numpy, accelerate |
'openai' | + openai, tiktoken, numpy |
'cohere' | + cohere, numpy |
'jina' | + numpy |
'semantic' | + model2vec, numpy |
'code' | + tree-sitter, tree-sitter-language-pack, magika |
'neural' | + transformers, torch (or tensorflow/flax), sentencepiece |
'genie' | + pydantic, google-genai |
'all' | all above dependencies |

(Note: dependencies for [genie] might vary slightly depending on implementation details and the chosen models/APIs.)

The 'semantic' and 'all' options install pre-packaged dependency sets that may overlap with other installation options. This redundancy is intentional: it gives users the freedom to choose their preferred way of installing. The contents of the 'semantic' and 'all' optional installs may change in future versions, so what you install today may not match what you get tomorrow.
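
Because the dependency sets can change between versions, it can help to check which packages are actually present in the current environment. The following is a rough sketch using only the standard library's `importlib.metadata`. The distribution names are copied from the dependency table above (a subset, for brevity), and the helper names `installed_version` and `missing_for` are hypothetical, not part of Chonkie.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names copied from the dependency table above (subset).
OPTION_DEPENDENCIES = {
    "default": ["autotiktokenizer"],
    "hub": ["huggingface-hub", "jsonschema"],
    "viz": ["rich"],
    "model2vec": ["model2vec", "numpy"],
    "st": ["sentence-transformers", "numpy", "accelerate"],
    "openai": ["openai", "tiktoken", "numpy"],
    "code": ["tree-sitter", "tree-sitter-language-pack", "magika"],
}


def installed_version(dist_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def missing_for(option):
    """List the dependencies of an installation option that are not installed."""
    return [d for d in OPTION_DEPENDENCIES[option] if installed_version(d) is None]


if __name__ == "__main__":
    for option in OPTION_DEPENDENCIES:
        missing = missing_for(option)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{option:10s} {status}")
```

This only reports whether a distribution is installed; it does not check that the installed version is one Chonkie supports.
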