The MongoDBHandshake class provides integration between Chonkie’s chunking system and MongoDB, a popular NoSQL database. Embed and store your Chonkie chunks in MongoDB directly from the Chonkie SDK.

Installation

Before using the MongoDB handshake, install the required dependencies (the quotes keep shells such as zsh from interpreting the square brackets):
pip install "chonkie[mongodb]"

Basic Usage

Initialization

from chonkie import MongoDBHandshake

# Initialize with default settings (auto-generated database/collection)
handshake = MongoDBHandshake(uri="mongodb://localhost:27017")

# Or connect to an existing MongoDB client
import pymongo
client = pymongo.MongoClient("mongodb://localhost:27017")
handshake = MongoDBHandshake(client=client, db_name="my_db", collection_name="my_collection")

# Specify embedding model and connection details
handshake = MongoDBHandshake(
    uri="mongodb://localhost:27017",
    db_name="my_db",
    collection_name="my_collection",
    embedding_model="minishlab/potion-retrieval-32M"
)

Writing Chunks to MongoDB

from chonkie import MongoDBHandshake, SemanticChunker

# Initialize the handshake (collection_name is left at its default, so a unique name is generated)
handshake = MongoDBHandshake(uri="mongodb://localhost:27017", db_name="my_documents")

# Create some chunks
chunker = SemanticChunker()
chunks = chunker.chunk("Chonkie loves to chonk your texts!")

# Write chunks to MongoDB
handshake.write(chunks)
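
To confirm that a write landed, the target collection can be inspected with pymongo directly. The sketch below is a hypothetical helper, not part of Chonkie: `summarize_collection` is an illustrative name, and because this page does not specify the layout of the stored documents, it only reports the document count and whichever field names it finds in a few samples.

```python
from typing import Any, Dict, List


def summarize_collection(collection: Any, sample_size: int = 3) -> Dict[str, Any]:
    """Return the document count and the field names seen in a few sample documents.

    `collection` is expected to behave like a pymongo Collection; only
    `count_documents` and `find` are called on it.
    """
    total = collection.count_documents({})
    field_names: List[str] = []
    # Look at a handful of documents instead of scanning the whole collection.
    for doc in collection.find({}, limit=sample_size):
        for key in doc:
            if key not in field_names:
                field_names.append(key)
    return {"count": total, "fields": sorted(field_names)}


# Usage against a live server (assumes MongoDB is reachable on localhost):
#
#   import pymongo
#   client = pymongo.MongoClient("mongodb://localhost:27017")
#   db = client["my_documents"]
#   for name in db.list_collection_names():
#       print(name, summarize_collection(db[name]))
```

Listing the collection names first is useful here because the example above leaves collection_name at "random", so the generated name is not known in advance.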

Parameters

client (Optional[pymongo.MongoClient], default: None)
MongoDB client instance. If not provided, a new client will be created based on other parameters.

uri (Optional[str], default: None)
MongoDB connection URI.

username (Optional[str], default: None)
MongoDB username for authentication.

password (Optional[str], default: None)
MongoDB password for authentication.

hostname (Optional[str], default: None)
MongoDB host address.

port (Optional[Union[int, str]], default: None)
MongoDB port number.

db_name (Union[str, Literal['random']], default: "random")
Name of the database to use. If "random", a unique name will be generated.

collection_name (Union[str, Literal['random']], default: "random")
Name of the collection to use. If "random", a unique name will be generated.

embedding_model (Union[str, BaseEmbeddings], default: "minishlab/potion-retrieval-32M")
Embedding model to use. Can be a model name or a BaseEmbeddings instance.

**kwargs (Dict[str, Any], default: {})
Additional keyword arguments to pass to the MongoDB client or collection creation.
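
As a rough illustration of how the separate connection parameters relate to a single uri, and of what resolving a "random" name could look like, here is a minimal sketch. Both helpers (`build_mongo_uri` and `resolve_name`) are hypothetical and are not Chonkie's internal logic; the defaults of localhost and 27017 and the name format are assumptions made for the example. Credentials are percent-escaped with `urllib.parse.quote_plus`, which MongoDB connection strings require for characters such as `@` and `/`.

```python
from typing import Optional, Union
from urllib.parse import quote_plus
from uuid import uuid4


def build_mongo_uri(
    hostname: str = "localhost",
    port: Union[int, str] = 27017,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> str:
    """Assemble a mongodb:// URI from its parts (hypothetical helper)."""
    credentials = ""
    if username is not None and password is not None:
        # Escape reserved characters so they are not read as URI delimiters.
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"
    return f"mongodb://{credentials}{hostname}:{port}"


def resolve_name(name: str, prefix: str) -> str:
    """Return `name` unchanged unless it is the literal "random" (hypothetical helper)."""
    if name == "random":
        # Assumed format: a prefix plus eight hex characters from a UUID.
        return f"{prefix}_{uuid4().hex[:8]}"
    return name


uri = build_mongo_uri("db.example.com", 27018, "app_user", "p@ss/word")
print(uri)
print(resolve_name("random", "collection"))
```

The resulting string can be passed as the uri argument shown in the initialization examples, which keeps credential escaping in one place.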