
Hacker News · Feb 28, 2026 · Collected from RSS
I built a document chunking library for RAG pipelines with a Rust core and Python bindings. The problem: LangChain's chunker is pure Python and becomes a bottleneck at scale, slow and memory-hungry on large document sets.

What Krira Chunker does differently:

- Rust-native processing: 40x faster than LangChain's implementation
- O(1) space complexity: memory stays flat regardless of document size
- Drop-in Python API: works with any existing RAG pipeline
- Production-ready: 17 versions shipped, 315+ installs

```
pip install krira-augment
```

Would love brutal feedback from anyone building RAG systems: what chunking problems are you running into that this doesn't solve yet?

Comments URL: https://news.ycombinator.com/item?id=47196069
# Krira Augment Presents Krira Chunker (beta)

High-Performance Rust Chunking Engine for RAG Pipelines. Process gigabytes of text in seconds. 40x faster than LangChain with O(1) memory usage.

## Installation

```
pip install krira-augment
```

## Quick Usage

```python
from krira_augment.krira_chunker import Pipeline, PipelineConfig, SplitStrategy

config = PipelineConfig(
    chunk_size=512,
    strategy=SplitStrategy.SMART,
    clean_html=True,
    clean_unicode=True,
)

pipeline = Pipeline(config=config)
result = pipeline.process("sample.csv", output_path="output.jsonl")

print(result)
print(f"Chunks Created: {result.chunks_created}")
print(f"Execution Time: {result.execution_time:.2f}s")
print(f"Throughput: {result.mb_per_second:.2f} MB/s")
print(f"Preview: {result.preview_chunks[:3]}")
```

## Performance Benchmark

Processing 42.4 million chunks in 113.79 seconds (47.51 MB/s):

```
============================================================
✅ KRIRA AUGMENT - Processing Complete
============================================================
📊 Chunks Created:  42,448,765
⏱️ Execution Time: 113.79 seconds
🚀 Throughput:     47.51 MB/s
📁 Output File:    output.jsonl
============================================================
📝 Preview (Top 3 Chunks):
------------------------------------------------------------
[1] event_time,event_type,product_id,category_id,category_code,brand,price,user_id,user_session
[2] 2019-10-01 00:00:00 UTC,view,44600062,2103807459595387724,,shiseido,35.79,541312140,72d76fde-8bb3-4e00-8c23-a032dfed738c
[3] 2019-10-01 00:00:00 UTC,view,3900821,2053013552326770905,appliances.environment.water_heater...
```

## Architecture

[Figure: Krira-Chunker architecture] [Figure: Working of Krira-Chunker]

## Complete Example: Local (ChromaDB) - FREE

No API keys required. Runs entirely on your machine.

```
pip install sentence-transformers chromadb
```

```python
from krira_augment.krira_chunker import Pipeline, PipelineConfig
from sentence_transformers import SentenceTransformer
import chromadb
import json

# Step 1: Chunk the file (Rust core)
config = PipelineConfig(chunk_size=512, chunk_overlap=50)
pipeline = Pipeline(config=config)
result = pipeline.process("sample.csv", output_path="chunks.jsonl")

print(f"Chunks Created: {result.chunks_created}")
print(f"Execution Time: {result.execution_time:.2f}s")
print(f"Throughput: {result.mb_per_second:.2f} MB/s")
print(f"Preview: {result.preview_chunks[:3]}")

# Step 2: Embed and store (local)
print("Loading model...")
model = SentenceTransformer('all-MiniLM-L6-v2')

client = chromadb.Client()
# Note: in newer versions of Chroma, use get_or_create_collection
collection = client.get_or_create_collection("my_rag_db")

with open("chunks.jsonl", "r") as f:
    for line_num, line in enumerate(f, 1):
        chunk = json.loads(line)
        embedding = model.encode(chunk["text"])

        # Handle empty metadata
        meta = chunk.get("metadata")
        collection.add(
            ids=[f"chunk_{line_num}"],
            embeddings=[embedding.tolist()],
            metadatas=[meta] if meta else None,
            documents=[chunk["text"]],
        )

        if line_num % 100 == 0:
            print(f"Processed {line_num} chunks...")

print("Done! All chunks stored in ChromaDB.")
```
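Retrieval is the natural next step, though the example above stops at ingestion. A minimal query sketch, continuing from the `model` and `collection` objects above; the query text and `n_results` are illustrative, not part of the library:

```python
# Retrieval sketch (illustrative): embed a question with the same model
# and ask Chroma for the nearest stored chunks.
query_embedding = model.encode("water heater prices")  # example query

results = collection.query(
    query_embeddings=[query_embedding.tolist()],
    n_results=3,  # top-3 nearest chunks
)
for doc in results["documents"][0]:
    print(doc[:120])
```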
## Cloud Integrations (OpenAI, Pinecone, Cohere)

If you have API keys, you can swap Step 2 for one of these integrations.

### OpenAI + Pinecone

```
pip install openai pinecone-client
```

```python
import json

from openai import OpenAI
from pinecone import Pinecone

# API keys
OPENAI_API_KEY = "sk-..."
PINECONE_API_KEY = "pcone-..."
PINECONE_INDEX_NAME = "my-rag"

client = OpenAI(api_key=OPENAI_API_KEY)
pc = Pinecone(api_key=PINECONE_API_KEY)
index = pc.Index(PINECONE_INDEX_NAME)

with open("chunks.jsonl", "r") as f:
    for line_num, line in enumerate(f, 1):
        chunk = json.loads(line)
        response = client.embeddings.create(
            input=chunk["text"],
            model="text-embedding-3-small",
        )
        embedding = response.data[0].embedding
        index.upsert(vectors=[(f"chunk_{line_num}", embedding, chunk.get("metadata", {}))])

        if line_num % 100 == 0:
            print(f"Processed {line_num} chunks...")
```

### OpenAI + Qdrant

```
pip install openai qdrant-client
```

```python
import json

from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

client = OpenAI(api_key="sk-...")
qdrant = QdrantClient(url="https://xyz.qdrant.io", api_key="qdrant-...")

with open("chunks.jsonl", "r") as f:
    for line_num, line in enumerate(f, 1):
        chunk = json.loads(line)
        response = client.embeddings.create(input=chunk["text"], model="text-embedding-3-small")
        embedding = response.data[0].embedding
        qdrant.upsert(
            collection_name="my-chunks",
            points=[PointStruct(id=line_num, vector=embedding, payload=chunk.get("metadata", {}))],
        )

        if line_num % 100 == 0:
            print(f"Processed {line_num} chunks...")
```

### OpenAI + Weaviate

```
pip install openai weaviate-client
```

```python
import json

import weaviate
import weaviate.classes as wvc
from openai import OpenAI

# Connect to Weaviate Cloud
client_w = weaviate.connect_to_wcs(
    cluster_url="https://xyz.weaviate.network",
    auth_credentials=weaviate.auth.AuthApiKey("weaviate-..."),
)
client_o = OpenAI(api_key="sk-...")

# Get collection
collection = client_w.collections.get("Chunk")

with open("chunks.jsonl", "r") as f:
    for line_num, line in enumerate(f, 1):
        chunk = json.loads(line)
        response = client_o.embeddings.create(input=chunk["text"], model="text-embedding-3-small")
        embedding = response.data[0].embedding

        # Insert with vector
        collection.data.insert(
            properties={"text": chunk["text"], "metadata": str(chunk.get("metadata", {}))},
            vector=embedding,
        )

        if line_num % 100 == 0:
            print(f"Processed {line_num} chunks...")

client_w.close()
```

### Cohere + Pinecone

```
pip install cohere pinecone-client
```

```python
import json

import cohere
from pinecone import Pinecone

co = cohere.Client("co-...")
pc = Pinecone(api_key="pcone-...")
index = pc.Index("my-rag")

with open("chunks.jsonl", "r") as f:
    for line_num, line in enumerate(f, 1):
        chunk = json.loads(line)
        # input_type is required for v3 embedding models
        response = co.embed(texts=[chunk["text"]], model="embed-english-v3.0", input_type="search_document")
        embedding = response.embeddings[0]
        index.upsert(vectors=[(f"chunk_{line_num}", embedding, chunk.get("metadata", {}))])

        if line_num % 100 == 0:
            print(f"Processed {line_num} chunks...")
```

### Cohere + Qdrant

```
pip install cohere qdrant-client
```

```python
import json

import cohere
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

co = cohere.Client("co-...")
qdrant = QdrantClient(url="https://xyz.qdrant.io", api_key="qdrant-...")

with open("chunks.jsonl", "r") as f:
    for line_num, line in enumerate(f, 1):
        chunk = json.loads(line)
        # input_type is required for v3 embedding models
        response = co.embed(texts=[chunk["text"]], model="embed-english-v3.0", input_type="search_document")
        embedding = response.embeddings[0]
        qdrant.upsert(
            collection_name="my-chunks",
            points=[PointStruct(id=line_num, vector=embedding, payload=chunk.get("metadata", {}))],
        )

        if line_num % 100 == 0:
            print(f"Processed {line_num} chunks...")
```
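The Qdrant snippets above assume the `my-chunks` collection already exists. For completeness, a one-time setup sketch using stock `qdrant-client` calls; the vector size is an assumption that must match your embedding model (1536 for `text-embedding-3-small`, 1024 for `embed-english-v3.0`):

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

qdrant = QdrantClient(url="https://xyz.qdrant.io", api_key="qdrant-...")

# One-time setup: create the collection the upsert calls write into.
# size must match the embedding model's output dimension (assumed here:
# 1536 for OpenAI text-embedding-3-small; use 1024 for Cohere v3 models).
qdrant.create_collection(
    collection_name="my-chunks",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)
```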
### Hugging Face + FAISS (FREE)

```
pip install transformers torch faiss-cpu
```

```python
import json

import faiss
import numpy as np
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# Helper for mean pooling
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

print("Loading model...")
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

index = faiss.IndexFlatL2(384)
batch_embeddings = []
BATCH_SIZE = 64

with open("chunks.jsonl", "r") as f:
    for line_num, line in enumerate(f, 1):
        chunk = json.loads(line)

        # Tokenize
        encoded_input = tokenizer(chunk["text"], padding=True, truncation=True, max_length=512, return_tensors='pt')

        # Compute token embeddings
        with torch.no_grad():
            model_output = model(**encoded_input)

        # Pooling & normalization
        sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
        sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)
        batch_embeddings.append(sentence_embeddings.squeeze().numpy())

        if len(batch_embeddings) >= BATCH_SIZE:
            index.add(np.vstack(batch_embeddings).astype('float32'))
            batch_embeddings = []

        if line_num % 100 == 0:
            print(f"Processed {line_num} chunks...")

if batch_embeddings:
    index.add(np.vstack(batch_embeddings).astype('float32'))

faiss.write_index(index, "my_vectors.index")
print("Done! Vectors saved to my_vectors.index")
```

## Streaming Mode (No Files)

Process chunks without saving to disk for maximum efficiency in real-time pipelines.

### Complete Example: OpenAI + Pinecone (Streaming)

```
pip install openai pinecone-client
```

```python
from krira_augment.krira_chunker import Pipeline, PipelineConfig
from openai import OpenAI
from pinecone import Pinecone

# API keys
OPENAI_API_KEY = "sk-..."       # https://platform.openai.com/api-keys
PINECONE_API_KEY = "pcone-..."  # https://app.pinecone.io/
PINECONE_INDEX_NAME = "my-rag"

# Initialize
client = OpenAI(api_key=OPENAI_API_KEY)
pc = Pinecone(api_key=PINECONE_API_KEY)
index = pc.Index(PINECONE_INDEX_NAME)

# Configure pipeline
config = PipelineConfig(chunk_size=512, chunk_overlap=50)
pipeline = Pipeline(config=config)

# Stream and embed (no file created)
chunk_count = 0
print("Starting streaming pipeline...")

for chunk in pipeline.process_stream("data.csv"):
    chunk_count += 1

    # Embed
    response = client.embeddings.create(
        input=chunk["text"],
        model="text-embedding-3-small",
    )
    embedding = response.data[0].embedding

    # Store immediately
    index.upsert(vectors=[(
        f"chunk_{chunk_count}",
        embedding,
        chunk["metadata"],
    )])

    # Progress
    if chunk_count % 100 == 0:
        print(f"Processed {chunk_count} chunks...")

print(f"Done! Embedded {chunk_count} chunks. No intermediate file created.")
```

### Other Streaming Integrations

Replace the embedding/storage logic with any of these:

#### OpenAI + Qdrant (Streaming)

```
pip install openai qdrant-client
```

```python
from krira_augment.krira_chunker import Pipeline, PipelineConfig
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

# Initialize
client = OpenAI(api_key="sk-...")
qdrant = QdrantClient(url="https://xyz.qdrant.io", api_key="qdrant-...")

# Configure and stream
config = PipelineConfig(chunk_size=512, chunk_overlap=50)
pipeline = Pipeline(config=config)

chunk_count = 0
for chunk in pipeline.process_stream("data.csv"):
    chunk_count += 1
    response = client.embeddings.create(input=chunk["text"], model="text-embedding-3-small")
    embedding = response.data[0].embedding
    qdrant.upsert(
        collection_name="my-chunks",
        points=[PointStruct(id=chunk_count, vector=embedding, payload=chunk["metadata"])],
    )
    if chunk_count % 100 == 0:
        print(f"Processed {chunk_count} chunks...")
```
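The examples above all end at ingestion, so here, as a hedged illustration, is a nearest-neighbour search over the FAISS index saved earlier. It assumes the query is embedded the same way as the stored chunks (`all-MiniLM-L6-v2`, mean-pooled and L2-normalized; `sentence-transformers` reproduces that pipeline) and that row `i` of the index corresponds to line `i + 1` of `chunks.jsonl`:

```python
import json

import faiss
from sentence_transformers import SentenceTransformer

# Load the index written by the Hugging Face + FAISS example.
index = faiss.read_index("my_vectors.index")

# Assumption: all-MiniLM-L6-v2 with normalize_embeddings=True matches the
# mean-pooled, L2-normalized vectors stored above.
model = SentenceTransformer("all-MiniLM-L6-v2")
query_vec = model.encode(["water heater prices"], normalize_embeddings=True)  # example query

# Top-3 nearest chunks; distances are squared L2 between unit vectors.
distances, indices = index.search(query_vec.astype("float32"), 3)

# Map FAISS row numbers back to chunk text (row i == line i + 1 of chunks.jsonl).
with open("chunks.jsonl", "r") as f:
    lines = f.readlines()
for rank, (i, dist) in enumerate(zip(indices[0], distances[0]), 1):
    text = json.loads(lines[i])["text"]
    print(f"[{rank}] dist={dist:.3f} :: {text[:100]}")
```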