Archive

103 posts · Page 7 of 9

Engineering · 5 min read

HyDE — generate a hypothetical answer to improve retrieval

A question and its answer often land far apart in embedding space, even when they cover the same topic. HyDE generates a hypothetical answer to the question, embeds *that*, and retrieves on it. Retrieval quality often improves noticeably for the cost of one extra generation call.

Engineering · 5 min read

Multilingual RAG for India — Bhashini hooks and cross-lingual retrieval

An Indian banking deployment needs to handle Hindi, Marathi, Tamil, Bengali, and English in the same retrieval pipeline. Bhashini (the Indian government's language stack) plus cross-lingual embeddings make it tractable.

Engineering · 5 min read

Cost-aware agent dispatch — when the cheap agent is enough

Not every query needs the production agent. A cost-aware dispatcher decides whether to route to the cheap-and-fast agent or the expensive-and-thorough one. Same UX, dramatically lower bill.

Engineering · 6 min read

The case for boring stack choices in regulated AI

Postgres over the latest vector DB. Go stdlib over the framework du jour. A single binary over a Kubernetes operator. The choices that bore reviewers and delight on-call engineers.

Engineering · 6 min read

Default-to-Prototype as a culture, not just a flag

An agent that doesn't declare a tier defaults to Prototype, not Production. The flag is the code; the culture is what enforces "new code is not production until someone says so."

Engineering · 6 min read

GOMEMLIMIT and the soft GC pacing change every Go service should set

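GOMEMLIMIT tells the Go runtime to try to keep memory below a soft cap by running GC harder as usage approaches it. For containers with hard memory limits, setting it somewhat below the container limit makes OOM kills much less likely, though the cap is soft and the runtime can still exceed it. It is the setting every Go service in K8s should have.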