Split text into chunks based on semantic similarity with advanced features
SemanticChunker splits text into chunks based on semantic similarity, ensuring that related content stays together in the same chunk. This chunker now includes advanced features like Savitzky-Golay filtering for smoother boundary detection and skip-window merging for connecting related content that may not be consecutive. This chunker is inspired by the work of Greg Kamradt.
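To make the idea concrete, here is a minimal, library-free sketch of similarity-based splitting with a smoothing step. It is not the SemanticChunker implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the smoothing uses the standard fixed 5-point quadratic Savitzky-Golay kernel, and the names `smooth`, `semantic_split`, and `threshold` are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter

# Standard 5-point quadratic Savitzky-Golay smoothing coefficients (they sum to 1).
SG5 = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def embed(sentence):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def smooth(values):
    """Apply the 5-point filter; the two points at each edge are left as-is."""
    out = list(values)
    for i in range(2, len(values) - 2):
        out[i] = sum(c * values[i + k - 2] for k, c in enumerate(SG5))
    return out


def semantic_split(sentences, threshold=0.2):
    """Group consecutive sentences, starting a new group where similarity dips."""
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    # Similarity between each sentence and the next, smoothed to reduce noise.
    sims = smooth([cosine(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)])
    groups = [[sentences[0]]]
    for sentence, sim in zip(sentences[1:], sims):
        if sim < threshold:
            groups.append([sentence])  # similarity dropped: open a new group
        else:
            groups[-1].append(sentence)
    return groups
```

Smoothing the similarity curve before thresholding keeps a single noisy sentence pair from forcing a split in the middle of an otherwise coherent passage. With fewer than five similarity values, this sketch leaves the curve unsmoothed.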
To use SemanticChunker via the API, check out the API reference documentation.
skip_window values:
0 (default): No skip-and-merge; uses standard semantic grouping.
1 or higher: Enables merging of semantically similar groups within the skip window.
Basic Semantic Chunking
Skip-Window Merging
Fine-tuned Similarity Control
Batch Document Processing
Custom Embeddings Integration
Advanced Filtering Options
Sentence Configuration
RAG Pipeline Integration
When skip_window > 0, the chunker can merge semantically similar groups that are not consecutive. This is useful for:
Chunk objects:
skip_window > 0 (recommended)
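The skip-window behavior described earlier can be sketched in plain Python. This is an illustrative assumption about how such merging might work, not the library's actual algorithm: the function name `skip_window_merge`, the `(texts, vector)` group representation, and the `threshold` parameter are all hypothetical, and merged text is simply appended to the earlier group.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def skip_window_merge(groups, skip_window=1, threshold=0.8):
    """Merge each group with similar non-adjacent groups inside the skip window.

    `groups` is a list of (texts, vector) pairs, where `vector` is a
    representative embedding for the group (hypothetical structure).
    With skip_window=0 no candidates are examined, so the groups come
    back unchanged.
    """
    merged = []
    used = set()
    for i, (texts, vector) in enumerate(groups):
        if i in used:
            continue
        combined = list(texts)
        # Look past the immediate neighbour: candidates start at i + 2.
        for j in range(i + 2, min(i + 2 + skip_window, len(groups))):
            if j not in used and cosine(vector, groups[j][1]) >= threshold:
                combined.extend(groups[j][0])
                used.add(j)
        merged.append(combined)
    return merged


# Usage: a topic (A) is interrupted by an unrelated aside (B).
groups = [
    (["A1"], [1.0, 0.0]),
    (["B1"], [0.0, 1.0]),
    (["A2"], [0.9, 0.1]),
]
print(skip_window_merge(groups, skip_window=1))
print(skip_window_merge(groups, skip_window=0))
```

Starting the candidate range at `i + 2` reflects the assumption that adjacent groups were already judged dissimilar when the boundaries were detected, so only groups separated by at least one other group are worth comparing.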