
Hacker News · Feb 28, 2026 · Collected from RSS
I built a document chunking library for RAG pipelines with a Rust core and Python bindings. The problem: LangChain's chunker is pure Python and becomes a bottleneck at scale, slow and memory-hungry on large document sets.

What Krira Chunker does differently:

- Rust-native processing: 40x faster than LangChain's implementation
- O(1) space complexity: memory stays flat regardless of document size
- Drop-in Python API: works with any existing RAG pipeline
- Production-ready: 17 versions shipped, 315+ installs

```
pip install krira-augment
```

Would love brutal feedback from anyone building RAG systems: what chunking problems are you running into that this doesn't solve yet?

Comments URL: https://news.ycombinator.com/item?id=47196069
# Krira Augment Presents Krira Chunker (beta)

High-Performance Rust Chunking Engine for RAG Pipelines. Process gigabytes of text in seconds. 40x faster than LangChain with O(1) memory usage.

## Installation

```
pip install krira-augment
```

## Quick Usage

```python
from krira_augment.krira_chunker import Pipeline, PipelineConfig, SplitStrategy

config = PipelineConfig(
    chunk_size=512,
    strategy=SplitStrategy.SMART,
    clean_html=True,
    clean_unicode=True,
)

pipeline = Pipeline(config=config)
result = pipeline.process("sample.csv", output_path="output.jsonl")

print(result)
print(f"Chunks Created: {result.chunks_created}")
print(f"Execution Time: {result.execution_time:.2f}s")
print(f"Throughput: {result.mb_per_second:.2f} MB/s")
print(f"Preview: {result.preview_chunks[:3]}")
```

## Performance Benchmark

Processing 42.4 million chunks in 113.79 seconds (47.51 MB/s):

```
============================================================
✅ KRIRA AUGMENT - Processing Complete
============================================================
📊 Chunks Created:  42,448,765
⏱️ Execution Time: 113.79 seconds
🚀 Throughput:     47.51 MB/s
📁 Output File:    output.jsonl
============================================================
📝 Preview (Top 3 Chunks):
------------------------------------------------------------
[1] event_time,event_type,product_id,category_id,category_code,brand,price,user_id,user_session
[2] 2019-10-01 00:00:00 UTC,view,44600062,2103807459595387724,,shiseido,35.79,541312140,72d76fde-8bb3-4e00-8c23-a032dfed738c
[3] 2019-10-01 00:00:00 UTC,view,3900821,2053013552326770905,appliances.environment.water_heater...
```

## Architecture

[Figure: Krira-Chunker architecture] [Figure: Working of Krira-Chunker]

## Complete Example: Local (ChromaDB) - FREE

No API keys required. Runs entirely on your machine.

```
pip install sentence-transformers chromadb
```

```python
from krira_augment.krira_chunker import Pipeline, PipelineConfig
from sentence_transformers import SentenceTransformer
import chromadb
import json

# Step 1: Chunk the file (Rust core)
config = PipelineConfig(chunk_size=512, chunk_overlap=50)
pipeline = Pipeline(config=config)
result = pipeline.process("sample.csv", output_path="chunks.jsonl")

print(f"Chunks Created: {result.chunks_created}")
print(f"Execution Time: {result.execution_time:.2f}s")
print(f"Throughput: {result.mb_per_second:.2f} MB/s")
print(f"Preview: {result.preview_chunks[:3]}")

# Step 2: Embed and store (local)
print("Loading model...")
model = SentenceTransformer('all-MiniLM-L6-v2')

client = chromadb.Client()
# Note: in newer versions of Chroma, use get_or_create_collection
collection = client.get_or_create_collection("my_rag_db")

with open("chunks.jsonl", "r") as f:
    for line_num, line in enumerate(f, 1):
        chunk = json.loads(line)
        embedding = model.encode(chunk["text"])

        # Handle empty metadata
        meta = chunk.get("metadata")
        collection.add(
            ids=[f"chunk_{line_num}"],
            embeddings=[embedding.tolist()],
            metadatas=[meta] if meta else None,
            documents=[chunk["text"]],
        )

        if line_num % 100 == 0:
            print(f"Processed {line_num} chunks...")

print("Done! All chunks stored in ChromaDB.")
```
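Retrieval is the natural next step, though the example above stops at ingestion. A minimal query sketch, continuing from the `model` and `collection` objects above; the query text and `n_results` are illustrative, not part of the library:

```python
# Retrieval sketch (illustrative): embed a question with the same model
# and ask Chroma for the nearest stored chunks.
query_embedding = model.encode("water heater prices")  # example query

results = collection.query(
    query_embeddings=[query_embedding.tolist()],
    n_results=3,  # top-3 nearest chunks
)
for doc in results["documents"][0]:
    print(doc[:120])
```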
## Cloud Integrations (OpenAI, Pinecone, Cohere)

If you have API keys, you can swap Step 2 for one of these integrations.

### OpenAI + Pinecone

```
pip install openai pinecone-client
```

```python
import json

from openai import OpenAI
from pinecone import Pinecone

# API keys
OPENAI_API_KEY = "sk-..."
PINECONE_API_KEY = "pcone-..."
PINECONE_INDEX_NAME = "my-rag"

client = OpenAI(api_key=OPENAI_API_KEY)
pc = Pinecone(api_key=PINECONE_API_KEY)
index = pc.Index(PINECONE_INDEX_NAME)

with open("chunks.jsonl", "r") as f:
    for line_num, line in enumerate(f, 1):
        chunk = json.loads(line)
        response = client.embeddings.create(
            input=chunk["text"],
            model="text-embedding-3-small",
        )
        embedding = response.data[0].embedding
        index.upsert(vectors=[(f"chunk_{line_num}", embedding, chunk.get("metadata", {}))])

        if line_num % 100 == 0:
            print(f"Processed {line_num} chunks...")
```

### OpenAI + Qdrant

```
pip install openai qdrant-client
```

```python
import json

from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

client = OpenAI(api_key="sk-...")
qdrant = QdrantClient(url="https://xyz.qdrant.io", api_key="qdrant-...")

with open("chunks.jsonl", "r") as f:
    for line_num, line in enumerate(f, 1):
        chunk = json.loads(line)
        response = client.embeddings.create(input=chunk["text"], model="text-embedding-3-small")
        embedding = response.data[0].embedding
        qdrant.upsert(
            collection_name="my-chunks",
            points=[PointStruct(id=line_num, vector=embedding, payload=chunk.get("metadata", {}))],
        )

        if line_num % 100 == 0:
            print(f"Processed {line_num} chunks...")
```

### OpenAI + Weaviate

```
pip install openai weaviate-client
```

```python
import json

import weaviate
import weaviate.classes as wvc
from openai import OpenAI

# Connect to Weaviate Cloud
client_w = weaviate.connect_to_wcs(
    cluster_url="https://xyz.weaviate.network",
    auth_credentials=weaviate.auth.AuthApiKey("weaviate-..."),
)
client_o = OpenAI(api_key="sk-...")

# Get collection
collection = client_w.collections.get("Chunk")

with open("chunks.jsonl", "r") as f:
    for line_num, line in enumerate(f, 1):
        chunk = json.loads(line)
        response = client_o.embeddings.create(input=chunk["text"], model="text-embedding-3-small")
        embedding = response.data[0].embedding

        # Insert with vector
        collection.data.insert(
            properties={"text": chunk["text"], "metadata": str(chunk.get("metadata", {}))},
            vector=embedding,
        )

        if line_num % 100 == 0:
            print(f"Processed {line_num} chunks...")

client_w.close()
```

### Cohere + Pinecone

```
pip install cohere pinecone-client
```

```python
import json

import cohere
from pinecone import Pinecone

co = cohere.Client("co-...")
pc = Pinecone(api_key="pcone-...")
index = pc.Index("my-rag")

with open("chunks.jsonl", "r") as f:
    for line_num, line in enumerate(f, 1):
        chunk = json.loads(line)
        # input_type is required for v3 embedding models
        response = co.embed(texts=[chunk["text"]], model="embed-english-v3.0", input_type="search_document")
        embedding = response.embeddings[0]
        index.upsert(vectors=[(f"chunk_{line_num}", embedding, chunk.get("metadata", {}))])

        if line_num % 100 == 0:
            print(f"Processed {line_num} chunks...")
```

### Cohere + Qdrant

```
pip install cohere qdrant-client
```

```python
import json

import cohere
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

co = cohere.Client("co-...")
qdrant = QdrantClient(url="https://xyz.qdrant.io", api_key="qdrant-...")

with open("chunks.jsonl", "r") as f:
    for line_num, line in enumerate(f, 1):
        chunk = json.loads(line)
        # input_type is required for v3 embedding models
        response = co.embed(texts=[chunk["text"]], model="embed-english-v3.0", input_type="search_document")
        embedding = response.embeddings[0]
        qdrant.upsert(
            collection_name="my-chunks",
            points=[PointStruct(id=line_num, vector=embedding, payload=chunk.get("metadata", {}))],
        )

        if line_num % 100 == 0:
            print(f"Processed {line_num} chunks...")
```
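The Qdrant snippets above assume the `my-chunks` collection already exists. For completeness, a one-time setup sketch using stock `qdrant-client` calls; the vector size is an assumption that must match your embedding model (1536 for `text-embedding-3-small`, 1024 for `embed-english-v3.0`):

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

qdrant = QdrantClient(url="https://xyz.qdrant.io", api_key="qdrant-...")

# One-time setup: create the collection the upsert calls write into.
# size must match the embedding model's output dimension (assumed here:
# 1536 for OpenAI text-embedding-3-small; use 1024 for Cohere v3 models).
qdrant.create_collection(
    collection_name="my-chunks",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)
```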
### Hugging Face + FAISS (FREE)

```
pip install transformers torch faiss-cpu
```

```python
import json

import faiss
import numpy as np
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# Helper for mean pooling
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

print("Loading model...")
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

index = faiss.IndexFlatL2(384)
batch_embeddings = []
BATCH_SIZE = 64

with open("chunks.jsonl", "r") as f:
    for line_num, line in enumerate(f, 1):
        chunk = json.loads(line)

        # Tokenize
        encoded_input = tokenizer(chunk["text"], padding=True, truncation=True, max_length=512, return_tensors='pt')

        # Compute token embeddings
        with torch.no_grad():
            model_output = model(**encoded_input)

        # Pooling & normalization
        sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
        sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)
        batch_embeddings.append(sentence_embeddings.squeeze().numpy())

        if len(batch_embeddings) >= BATCH_SIZE:
            index.add(np.vstack(batch_embeddings).astype('float32'))
            batch_embeddings = []

        if line_num % 100 == 0:
            print(f"Processed {line_num} chunks...")

if batch_embeddings:
    index.add(np.vstack(batch_embeddings).astype('float32'))

faiss.write_index(index, "my_vectors.index")
print("Done! Vectors saved to my_vectors.index")
```

## Streaming Mode (No Files)

Process chunks without saving to disk for maximum efficiency in real-time pipelines.

### Complete Example: OpenAI + Pinecone (Streaming)

```
pip install openai pinecone-client
```

```python
from krira_augment.krira_chunker import Pipeline, PipelineConfig
from openai import OpenAI
from pinecone import Pinecone

# API keys
OPENAI_API_KEY = "sk-..."       # https://platform.openai.com/api-keys
PINECONE_API_KEY = "pcone-..."  # https://app.pinecone.io/
PINECONE_INDEX_NAME = "my-rag"

# Initialize
client = OpenAI(api_key=OPENAI_API_KEY)
pc = Pinecone(api_key=PINECONE_API_KEY)
index = pc.Index(PINECONE_INDEX_NAME)

# Configure pipeline
config = PipelineConfig(chunk_size=512, chunk_overlap=50)
pipeline = Pipeline(config=config)

# Stream and embed (no file created)
chunk_count = 0
print("Starting streaming pipeline...")

for chunk in pipeline.process_stream("data.csv"):
    chunk_count += 1

    # Embed
    response = client.embeddings.create(
        input=chunk["text"],
        model="text-embedding-3-small",
    )
    embedding = response.data[0].embedding

    # Store immediately
    index.upsert(vectors=[(
        f"chunk_{chunk_count}",
        embedding,
        chunk["metadata"],
    )])

    # Progress
    if chunk_count % 100 == 0:
        print(f"Processed {chunk_count} chunks...")

print(f"Done! Embedded {chunk_count} chunks. No intermediate file created.")
```

### Other Streaming Integrations

Replace the embedding/storage logic with any of these:

#### OpenAI + Qdrant (Streaming)

```
pip install openai qdrant-client
```

```python
from krira_augment.krira_chunker import Pipeline, PipelineConfig
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

# Initialize
client = OpenAI(api_key="sk-...")
qdrant = QdrantClient(url="https://xyz.qdrant.io", api_key="qdrant-...")

# Configure and stream
config = PipelineConfig(chunk_size=512, chunk_overlap=50)
pipeline = Pipeline(config=config)

chunk_count = 0
for chunk in pipeline.process_stream("data.csv"):
    chunk_count += 1
    response = client.embeddings.create(input=chunk["text"], model="text-embedding-3-small")
    embedding = response.data[0].embedding
    qdrant.upsert(
        collection_name="my-chunks",
        points=[PointStruct(id=chunk_count, vector=embedding, payload=chunk["metadata"])],
    )
    if chunk_count % 100 == 0:
        print(f"Processed {chunk_count} chunks...")
```
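The examples above all end at ingestion, so here, as a hedged illustration, is a nearest-neighbour search over the FAISS index saved earlier. It assumes the query is embedded the same way as the stored chunks (`all-MiniLM-L6-v2`, mean-pooled and L2-normalized; `sentence-transformers` reproduces that pipeline) and that row `i` of the index corresponds to line `i + 1` of `chunks.jsonl`:

```python
import json

import faiss
from sentence_transformers import SentenceTransformer

# Load the index written by the Hugging Face + FAISS example.
index = faiss.read_index("my_vectors.index")

# Assumption: all-MiniLM-L6-v2 with normalize_embeddings=True matches the
# mean-pooled, L2-normalized vectors stored above.
model = SentenceTransformer("all-MiniLM-L6-v2")
query_vec = model.encode(["water heater prices"], normalize_embeddings=True)  # example query

# Top-3 nearest chunks; distances are squared L2 between unit vectors.
distances, indices = index.search(query_vec.astype("float32"), 3)

# Map FAISS row numbers back to chunk text (row i == line i + 1 of chunks.jsonl).
with open("chunks.jsonl", "r") as f:
    lines = f.readlines()
for rank, (i, dist) in enumerate(zip(indices[0], distances[0]), 1):
    text = json.loads(lines[i])["text"]
    print(f"[{rank}] dist={dist:.3f} :: {text[:100]}")
```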