Semantic Double-Pass Merging chunker - now integrated into SemanticChunker
Recommended migration: use SemanticChunker.

SDPMChunker extends semantic chunking by using a double-pass merging approach. It first groups content by semantic similarity, then merges similar groups within a skip window, allowing it to connect related content that may not be consecutive in the text.
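The two passes described above can be illustrated with a small, library-free sketch. This is not the library's implementation: the function names (`first_pass`, `second_pass`), the use of group mean vectors, and the rule for choosing which group to merge with are all assumptions made for illustration, and the 2-D vectors stand in for real sentence embeddings.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def first_pass(embeddings, threshold):
    """Pass 1: group consecutive sentences that stay similar to the running group mean."""
    groups = [[0]]
    for i in range(1, len(embeddings)):
        centroid = mean_vector([embeddings[j] for j in groups[-1]])
        if cosine(centroid, embeddings[i]) >= threshold:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def second_pass(groups, embeddings, threshold, skip_window):
    """Pass 2: merge a group with a similar group up to skip_window groups further on.

    Groups lying between the two are absorbed into the merged result (an
    assumption of this sketch), which is how non-consecutive content gets connected.
    """
    merged = []
    i = 0
    while i < len(groups):
        current = list(groups[i])
        current_vec = mean_vector([embeddings[s] for s in current])
        farthest = min(i + skip_window + 1, len(groups) - 1)
        target = None
        # Check the farthest candidate first so the widest valid merge wins.
        for j in range(farthest, i, -1):
            other_vec = mean_vector([embeddings[s] for s in groups[j]])
            if cosine(current_vec, other_vec) >= threshold:
                target = j
                break
        if target is None:
            merged.append(current)
            i += 1
        else:
            for j in range(i + 1, target + 1):
                current.extend(groups[j])
            merged.append(current)
            i = target + 1
    return merged


# Toy embeddings: sentences 0, 1 and 3 point in a similar direction; 2 and 4 do not.
toy = [[1, 0], [0.9, 0.1], [0, 1], [1, 0.05], [-1, 0]]
groups = first_pass(toy, threshold=0.8)
print(groups)
print(second_pass(groups, toy, threshold=0.8, skip_window=1))
```

With `skip_window=0` the second pass only compares adjacent groups, so in this toy input nothing further is merged; with `skip_window=1` the first group can reach past the unrelated sentence 2 to the similar sentence 3.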
SemanticChunker includes all SDPM functionality, along with additional capabilities.
Parameters:

- embedding_model: Model identifier or embedding instance
- mode: "cumulative" or "window" (removed in new version)
- threshold: Similarity threshold (0-1) or "auto"
- chunk_size: Maximum tokens per chunk
- similarity_window: Number of sentences used for threshold calculation
- min_sentences: Minimum sentences per chunk (now min_sentences_per_chunk)
- min_chunk_size: Minimum tokens per chunk (removed in new version)
- min_characters_per_sentence: Minimum characters per sentence
- threshold_step: Step size for threshold calculation (removed in new version)
- skip_window: Number of chunks to skip when merging

Return type: SDPMChunker returned SemanticChunk objects with sentence details; SemanticChunker returns Chunk objects.
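The parameter changes listed above (three removals and one rename) can be applied mechanically to an existing configuration. The helper below is a hypothetical convenience, not part of the library: `migrate_sdpm_kwargs` and the two lookup tables are names introduced here, and the sketch assumes all other parameters keep their names unchanged.

```python
# Parameters marked "removed in new version" in the list above.
REMOVED = {"mode", "min_chunk_size", "threshold_step"}

# Parameters that were renamed: old name -> new name.
RENAMED = {"min_sentences": "min_sentences_per_chunk"}


def migrate_sdpm_kwargs(old_kwargs):
    """Translate SDPMChunker-style keyword arguments to the new names.

    Returns (new_kwargs, dropped), where dropped lists the removed
    parameters that were present in old_kwargs.
    """
    new_kwargs = {}
    dropped = []
    for name, value in old_kwargs.items():
        if name in REMOVED:
            dropped.append(name)
            continue
        new_kwargs[RENAMED.get(name, name)] = value
    return new_kwargs, dropped


old = {
    "mode": "window",
    "threshold": 0.7,
    "chunk_size": 512,
    "min_sentences": 2,
    "skip_window": 1,
}
new_kwargs, dropped = migrate_sdpm_kwargs(old)
print(new_kwargs)
print("dropped:", dropped)

# Assumption: the resulting names match the SemanticChunker constructor, e.g.
#   chunker = SemanticChunker(**new_kwargs)
# Check the installed version's signature before relying on this.
```

Reporting the dropped names separately, rather than discarding them silently, makes it easier to notice when an old configuration relied on behaviour that no longer exists.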