The TeraflopAIChunker uses the TeraflopAI Segmentation API to split text into semantically meaningful segments. It is especially useful for domain-specific segmentation such as legal documents.
from chonkie import TeraflopAIChunker

# Using an API key (or set the TERAFLOPAI_API_KEY environment variable)
chunker = TeraflopAIChunker(api_key="your_api_key_here")
text = """Global warming refers to the long-term increase in Earth’s average surface temperature due to human activities, primarily the emission of greenhouse gases such as carbon dioxide and methane. These gases trap heat in the atmosphere, leading to significant changes in climate patterns across the globe.

Scientists have observed rising temperatures, melting polar ice caps, and increasing sea levels, all of which pose serious risks to ecosystems and human societies. Extreme weather events such as hurricanes, droughts, and heatwaves are becoming more frequent and intense as a result of these changes.

Governments and organizations around the world are working to reduce emissions, transition to renewable energy sources, and promote sustainable practices. However, global cooperation and immediate action are essential to mitigate the long-term impacts and protect future generations from the most severe consequences of climate change.

Public awareness and individual responsibility also play a crucial role in addressing global warming. Simple actions like reducing energy consumption, minimizing waste, and supporting environmentally friendly initiatives can collectively make a meaningful difference in slowing down this global crisis."""

chunks = chunker.chunk(text)

for chunk in chunks:
    print(f"Chunk text: {chunk.text}")
    print(f"Token count: {chunk.token_count}")
    print(f"Start index: {chunk.start_index}")
    print(f"End index: {chunk.end_index}")
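The chunk attributes printed above can also be used for a quick sanity check. The sketch below assumes that start_index and end_index are character offsets into the original text with an exclusive end; the source does not state this, so treat it as an assumption to verify against real output. ChunkStandIn and mismatched_offsets are hypothetical names used only for illustration and are not part of chonkie.

```python
from dataclasses import dataclass


@dataclass
class ChunkStandIn:
    """Hypothetical stand-in exposing the four attributes printed above."""
    text: str
    token_count: int
    start_index: int
    end_index: int


def mismatched_offsets(source: str, chunks: list) -> list:
    """Return the chunks whose offsets do not slice back to their own text.

    Assumption: start_index/end_index are character offsets, end exclusive.
    """
    return [c for c in chunks if source[c.start_index:c.end_index] != c.text]


source = "First sentence. Second sentence."
good = ChunkStandIn("First sentence. ", 3, 0, 16)   # offsets match the text
bad = ChunkStandIn("Second sentence.", 3, 15, 31)   # offsets shifted by one
print(mismatched_offsets(source, [good, bad]))
```

With real output, passing the original text and the list returned by chunker.chunk(text) to the same helper would show whether the offset assumption holds for this chunker.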
texts = [
    "First document to segment.",
    "Second document with more content to segment.",
]

batch_results = chunker(texts)

for i, chunks in enumerate(batch_results):
    print(f"Document {i}: {len(chunks)} chunks")
from chonkie import TeraflopAIChunker

# No need to pass api_key; it is read from the environment
chunker = TeraflopAIChunker()

chunks = chunker.chunk("Your text here.")
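When the key comes from the environment, it can help to fail early with a clear message if the variable is missing. The following is a minimal sketch, not the library's own lookup logic: resolve_api_key is a hypothetical helper, and the precedence it applies (explicit argument first, then TERAFLOPAI_API_KEY) is an assumption.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "TERAFLOPAI_API_KEY"  # variable name taken from the text above


def resolve_api_key(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: prefer an explicit key, else read the environment."""
    env = os.environ if env is None else env
    key = explicit or env.get(ENV_VAR)
    if not key:
        # Fail before any network call is attempted.
        raise ValueError(f"Set {ENV_VAR} or pass api_key explicitly.")
    return key


# Example with an injected mapping so the sketch runs without a real key.
print(resolve_api_key(env={ENV_VAR: "demo-key"}))
```

The resolved value can then be passed as TeraflopAIChunker(api_key=resolve_api_key()), which keeps the missing-key error local to your own code.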