The TeraflopAIChunker uses the TeraflopAI Segmentation API to split text into semantically meaningful segments. It is especially useful for domain-specific segmentation such as legal documents.

Installation

TeraflopAIChunker requires the teraflopai Python package, which is installed through the chonkie extra:
pip install "chonkie[teraflopai]"
For general installation instructions, see the Installation Guide.

Initialization

from chonkie import TeraflopAIChunker

# Using an API key (or set the TERAFLOPAI_API_KEY environment variable)
chunker = TeraflopAIChunker(api_key="your_api_key_here")

Parameters

client
Optional[TeraflopAI]
default:"None"
An existing TeraflopAI client instance. If provided, url and api_key are ignored.
url
str
The URL for the TeraflopAI segmentation API endpoint.
api_key
Optional[str]
default:"None"
The API key for authentication. If not provided, it will be read from the TERAFLOPAI_API_KEY environment variable.
tokenizer
Union[str, TokenizerProtocol]
default:"character"
The tokenizer used to compute token counts for returned chunks.
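
The api_key fallback described above can be pictured with a small standalone sketch. This is not Chonkie's code: resolve_api_key is a hypothetical helper, and raising ValueError when no key is found is an assumption about how a missing key might be reported.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Return the explicit key if given, else fall back to TERAFLOPAI_API_KEY."""
    # An explicitly passed key takes precedence over the environment variable.
    key = api_key or os.environ.get("TERAFLOPAI_API_KEY")
    if not key:
        # Assumption: the real chunker may signal a missing key differently.
        raise ValueError("No API key: pass api_key or set TERAFLOPAI_API_KEY.")
    return key
```

Calling resolve_api_key("explicit") returns the explicit key regardless of the environment, while resolve_api_key() returns whatever TERAFLOPAI_API_KEY holds.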

Usage

Single Text Chunking

text = """
Global warming refers to the long-term increase in Earth’s average surface temperature due to human activities, primarily the emission of greenhouse gases such as carbon dioxide and methane. These gases trap heat in the atmosphere, leading to significant changes in climate patterns across the globe.

Scientists have observed rising temperatures, melting polar ice caps, and increasing sea levels, all of which pose serious risks to ecosystems and human societies. Extreme weather events such as hurricanes, droughts, and heatwaves are becoming more frequent and intense as a result of these changes.

Governments and organizations around the world are working to reduce emissions, transition to renewable energy sources, and promote sustainable practices. However, global cooperation and immediate action are essential to mitigate the long-term impacts and protect future generations from the most severe consequences of climate change.

Public awareness and individual responsibility also play a crucial role in addressing global warming. Simple actions like reducing energy consumption, minimizing waste, and supporting environmentally friendly initiatives can collectively make a meaningful difference in slowing down this global crisis.
"""

chunks = chunker.chunk(text)

for chunk in chunks:
    print(f"Chunk text: {chunk.text}")
    print(f"Token count: {chunk.token_count}")
    print(f"Start index: {chunk.start_index}")
    print(f"End index: {chunk.end_index}")

Batch Chunking

texts = [
    "First document to segment.",
    "Second document with more content to segment.",
]

batch_results = chunker(texts)

for i, chunks in enumerate(batch_results):
    print(f"Document {i}: {len(chunks)} chunks")

Using with Environment Variable

export TERAFLOPAI_API_KEY="your_api_key_here"

from chonkie import TeraflopAIChunker

# No need to pass api_key — it will be read from the environment
chunker = TeraflopAIChunker()
chunks = chunker.chunk("Your text here.")

How It Works

  1. The text is sent to the TeraflopAI Segmentation API endpoint.
  2. The API returns a list of text segments.
  3. Each segment is converted into a Chonkie Chunk object with proper start_index, end_index, and token_count fields.

The TeraflopAI Segmentation API performs the segmentation on the server side. This chunker requires an active internet connection and a valid API key.
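
Step 3 can be sketched locally, without calling the API. The code below is an illustration, not the library's implementation. It assumes the API returns segments that appear verbatim and in order in the original text. It also assumes that the default "character" tokenizer counts one token per character. SketchChunk and segments_to_chunks are hypothetical names, not part of Chonkie.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SketchChunk:
    """Hypothetical stand-in for a Chonkie Chunk; not the library class."""
    text: str
    start_index: int
    end_index: int
    token_count: int


def segments_to_chunks(text: str, segments: List[str]) -> List[SketchChunk]:
    """Map ordered segments back to character offsets in the source text."""
    chunks = []
    cursor = 0
    for segment in segments:
        # Search from the end of the previous segment so repeated
        # substrings resolve to the correct occurrence.
        start = text.find(segment, cursor)
        if start == -1:
            # Assumption: if the segment is not found verbatim,
            # place it at the current cursor position.
            start = cursor
        end = start + len(segment)
        # Character-level token count: one token per character.
        chunks.append(SketchChunk(segment, start, end, len(segment)))
        cursor = end
    return chunks


text = "First sentence. Second sentence."
for chunk in segments_to_chunks(text, ["First sentence.", "Second sentence."]):
    print(chunk.start_index, chunk.end_index, chunk.token_count)
# prints:
# 0 15 15
# 16 32 16
```

Under these assumptions, text[chunk.start_index:chunk.end_index] equals chunk.text for every chunk, which is a useful sanity check on real output as well.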