NeuralChunker
Split text using a fine-tuned BERT model to detect semantic shifts
The NeuralChunker leverages the power of deep learning! It uses a fine-tuned BERT model specifically trained to identify semantic shifts within text, allowing it to split documents at points where the topic or context changes significantly. This provides highly coherent chunks ideal for RAG.
API Reference
To use the NeuralChunker via the API, check out the API reference documentation.
Installation
NeuralChunker requires specific dependencies for its deep learning model. You can install it with:
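For example, with pip. The `neural` extra name below is an assumption; check Chonkie's installation guide for the exact extra that pulls in the model dependencies:

```bash
pip install "chonkie[neural]"
```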
Initialization
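A minimal initialization sketch. The parameter names and the default model identifier shown here are assumptions based on Chonkie's public API; verify them against the API reference linked above and the parameter descriptions below.

```python
from chonkie import NeuralChunker

# Parameter names and defaults below are assumptions; see the API reference.
chunker = NeuralChunker(
    model="mirth/chonky_modernbert_base_1",  # fine-tuned model for detecting semantic shifts (assumed default)
    device_map="cpu",                        # inference device; omit to let Chonkie auto-detect
    min_characters_per_chunk=10,             # minimum characters for a segment to count as a chunk
    return_type="chunks",                    # return chunk objects ("chunks") or plain strings ("texts")
)
```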
Parameters
The identifier or path to the fine-tuned BERT model used for detecting semantic shifts.
The device to run inference on (e.g., “cpu”, “cuda”, “mps”). Chonkie will try to auto-detect the best available device if not specified.
The minimum number of characters required for a text segment to be considered a valid chunk.
Whether to return chunks as NeuralChunk objects or plain text strings.
Usage
Single Text Chunking
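A short sketch of chunking a single document; the sample text and printed fields are illustrative:

```python
text = """Neural networks learn hierarchical representations of data.
Meanwhile, the history of espresso begins in 19th-century Italy."""

# chunk() splits the text wherever the model detects a semantic shift
chunks = chunker.chunk(text)

for chunk in chunks:
    print(f"Chunk text: {chunk.text}")
    print(f"Start index: {chunk.start_index}")
    print(f"End index: {chunk.end_index}")
```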
Batch Chunking
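For multiple documents, a batch call is assumed to return one list of chunks per input text:

```python
texts = [
    "Document one covers a topic before drifting to another...",
    "Document two has its own shifts in subject matter...",
]

# chunk_batch() processes several texts in one call (assumed batch API)
batch_chunks = chunker.chunk_batch(texts)

for doc_chunks in batch_chunks:
    for chunk in doc_chunks:
        print(chunk.text)
```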
Using as a Callable
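The chunker instance can also be called directly; calling it is assumed to dispatch to the single-text or batch path depending on the input type:

```python
# Single text: equivalent to chunker.chunk(...)
chunks = chunker("A single document to split at semantic shifts...")

# List of texts: equivalent to chunker.chunk_batch(...)
batch_chunks = chunker(["Document one...", "Document two..."])
```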
Return Type
NeuralChunker returns chunks as Chunk objects.
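For reference, a sketch of the fields typically exposed on a Chonkie Chunk; exact attributes may vary by release:

```python
chunk = chunks[0]
print(chunk.text)         # the chunk's text content
print(chunk.start_index)  # character offset where the chunk begins in the source text
print(chunk.end_index)    # character offset where the chunk ends
print(chunk.token_count)  # number of tokens in the chunk (assumed field)
```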