Chonkie CLI
Chonkie provides a powerful Command Line Interface (CLI) to perform chunking and run pipelines directly from your terminal.
Installation
The CLI is included with the default chonkie installation (for example, pip install chonkie).
Basic Usage
The CLI provides a single chonkie command with two primary subcommands:
- chunk: Quickly chunk text or files.
- pipeline: Run full Chonkie pipelines (fetch → chef → chunk → refine → handshake).
Chunking Texts or Files
Use the chunk command to quickly chunk text or a single file.
Available options:
- --chunker: The chunking method to use (default: semantic). Options: semantic, token, sentence, recursive, etc.
- --chunk-size: Maximum number of tokens per chunk (e.g., 512, 1024).
- --chunk-overlap: Number of tokens to overlap between chunks (e.g., 50, 100).
- --threshold: Threshold for semantic similarity (0-1), used by semantic chunkers.
- --chunker-params: Additional chunker parameters as key=value pairs. Can be used multiple times.
- --handshaker: Optional storage backend to export chunks.
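As a rough illustration of how these options combine, the sketch below assembles a chunk invocation from Python and prints it without running it. It assumes the text or file path is passed as a positional argument right after the subcommand, and document.txt is a hypothetical input file; chonkie chunk --help remains the authority on the exact syntax.

```python
import shlex

# Assumption: the input (text or file path) is a positional argument
# placed right after the subcommand. "document.txt" is a hypothetical file.
cmd = [
    "chonkie", "chunk", "document.txt",
    "--chunker", "token",        # one of the chunkers listed above
    "--chunk-size", "512",       # maximum tokens per chunk
    "--chunk-overlap", "50",     # tokens shared between neighbouring chunks
]

# Build a shell-ready command line; nothing is executed here.
command_line = shlex.join(cmd)
print(command_line)
```

To actually run the command, the same list can be handed to subprocess.run(cmd, check=True).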
Running Pipelines
The pipeline command is more powerful and supports processing directories, applying chefs/refiners, and exporting data.
Available options:
- --d: Directory to process (mutually exclusive with text/file argument).
- --ext: File extensions to include when processing a directory (e.g., .md, .txt). Can be used multiple times.
- --chef: Preprocessor to use (e.g., text, markdown).
- --chef-params: Parameters for the chef as key=value pairs. Can be used multiple times.
- --chunker: Chunking method (default: semantic).
- --chunk-size: Maximum number of tokens per chunk.
- --chunk-overlap: Number of tokens to overlap between chunks.
- --threshold: Threshold for semantic similarity (0-1).
- --chunker-params: Additional chunker parameters as key=value pairs. Can be used multiple times.
- --refiner: Optional refinement strategy (e.g., overlap).
- --refiner-params: Parameters for the refiner as key=value pairs. Can be used multiple times.
- --handshaker: Optional destination storage.
- --handshaker-params: Parameters for the handshaker as key=value pairs. Can be used multiple times.
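The repeatable options above are given once per value. The following sketch shows one way to assemble such a pipeline command from Python, building the argument list without executing it. The helper pipeline_command is hypothetical, and the min_characters_per_chunk key is only an illustrative assumption; valid keys depend on the chosen chunker.

```python
import shlex

def pipeline_command(directory, extensions, chunker_params):
    """Assemble a `chonkie pipeline` argument list (built only, not executed)."""
    cmd = ["chonkie", "pipeline", "--d", directory]
    # Repeatable options are passed once per value.
    for ext in extensions:
        cmd += ["--ext", ext]
    cmd += ["--chef", "markdown", "--chunker", "recursive", "--chunk-size", "512"]
    # Each extra chunker setting becomes its own key=value pair.
    for key, value in chunker_params.items():
        cmd += ["--chunker-params", f"{key}={value}"]
    return cmd

# Illustrative key only; check the chunker's documentation for real parameters.
cmd = pipeline_command("docs", [".md", ".txt"], {"min_characters_per_chunk": 24})
print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string avoids quoting problems when a directory name or parameter value contains spaces.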
1. Process a Directory
Process all markdown and text files in the docs directory.
2. Process a Single File
Run a pipeline on a single file.
3. Pipeline with Custom Chunking Parameters
Use explicit parameters and additional chunker options.
4. Pipeline with Multiple Component Parameters
Configure chef, chunker, and refiner with custom parameters.
5. Full RAG Pipeline
Run a full RAG pipeline: fetch from directory → process markdown → chunk recursively → export to ChromaDB.
Parameter Configuration
Explicit Parameters
For commonly used parameters, you can use dedicated options:
- --chunk-size: Set the maximum tokens per chunk
- --chunk-overlap: Set overlap between chunks
- --threshold: Set semantic similarity threshold
Key-Value Parameters
For additional or component-specific parameters, use the *_params options with key=value syntax. Values are parsed automatically:
- true/false → boolean
- none/null → None
- Numeric strings → int or float
- Other strings → string
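The conversion rules above can be approximated in a few lines of Python. This is a minimal sketch of the behaviour as described, not Chonkie's actual implementation: parse_value and parse_params are hypothetical names, the case-insensitive keyword matching is an assumption, and the keys in the example are illustrative.

```python
def parse_value(raw):
    """Convert one raw string to bool, None, int, float, or str."""
    lowered = raw.lower()  # assumption: keywords match case-insensitively
    if lowered in ("true", "false"):
        return lowered == "true"
    if lowered in ("none", "null"):
        return None
    # Try the narrower numeric type first so "2" stays an int.
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw  # anything else is kept as a plain string

def parse_params(pairs):
    """Turn ["key=value", ...] into a dict, splitting on the first "=" only."""
    params = {}
    for pair in pairs:
        key, sep, raw = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        params[key] = parse_value(raw)
    return params

parsed = parse_params(["lang=en", "min_sentences=2", "threshold=0.5", "verbose=true", "seed=none"])
print(parsed)
```

Splitting on the first "=" only means a value may itself contain "=", for example a URL with query parameters.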
Explicit parameters (such as --chunk-size) override values in --chunker-params if both are provided.
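The precedence rule can be pictured as a dictionary merge in which explicit options are applied last. The sketch below is only an illustration under assumptions: merge_chunker_options is a hypothetical helper, and it assumes that --chunk-size corresponds to a chunk_size key.

```python
def merge_chunker_options(chunker_params, **explicit):
    """Overlay explicit options onto key=value params; unset (None) options are skipped."""
    merged = dict(chunker_params)
    merged.update({k: v for k, v in explicit.items() if v is not None})
    return merged

# Models: --chunker-params chunk_size=256 --chunker-params min_sentences=2 --chunk-size 512
options = merge_chunker_options(
    {"chunk_size": 256, "min_sentences": 2},
    chunk_size=512,       # explicit flag wins over the key=value entry
    chunk_overlap=None,   # flag not given, so nothing is added or overridden
)
print(options)
```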
Tips
- Use --help on any command to see full options: chonkie pipeline --help.
- Directory processing recursively walks subdirectories.
- Output is printed to stdout by default unless a handshaker is specified.
- Combine explicit parameters with *_params for maximum flexibility.
- Check component documentation for available parameters for each chunker, chef, refiner, or handshaker.
