/ 01 — The problem
What was broken.
Every team building a RAG feature kept asking the same question: 'what chunk size should I use?' The honest answer was 'depends on your data' — but nobody had time to find out.
An internal tool that finds the optimal chunk size, overlap, and reranker for any corpus — one click, sweep done in minutes.
Built a tool that takes a corpus and a set of eval queries, sweeps across chunk sizes, overlap settings, and rerankers, and outputs a ranked recommendation. It visualized recall@k so engineers could see the trade-off instead of taking a number on faith.
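To make the idea of a sweep concrete, here is a minimal sketch in Python. It is not the tool's actual code: the function names (`chunk_words`, `score`, `recall_at_k`, `sweep`) are hypothetical, a toy word-overlap scorer stands in for a real retriever, and the reranker axis is left out to keep it short. The corpus and eval queries are made up for illustration.

```python
from itertools import product


def chunk_words(text, size, overlap):
    """Split text into windows of `size` words that share `overlap` words."""
    words = text.split()
    step = size - overlap  # caller guarantees overlap < size
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def score(query, chunk):
    """Toy relevance score: count of distinct words shared by query and chunk."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))


def recall_at_k(chunks, eval_set, k):
    """Fraction of queries whose expected answer appears in a top-k chunk."""
    hits = 0
    for query, answer in eval_set:
        top_k = sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]
        hits += any(answer in chunk for chunk in top_k)
    return hits / len(eval_set)


def sweep(corpus, eval_set, sizes, overlaps, k=5):
    """Evaluate every valid (size, overlap) pair and rank by recall@k."""
    results = []
    for size, overlap in product(sizes, overlaps):
        if overlap >= size:
            continue  # such a window would never advance through the text
        chunks = chunk_words(corpus, size, overlap)
        results.append({"size": size, "overlap": overlap,
                        "recall": recall_at_k(chunks, eval_set, k)})
    return sorted(results, key=lambda r: r["recall"], reverse=True)


# Made-up corpus and eval queries, for illustration only.
corpus = ("the cache expires after ten minutes retries use exponential "
          "backoff with jitter logs are shipped to the archive bucket nightly")
eval_set = [
    ("when does the cache expire", "ten minutes"),
    ("how do retries back off", "exponential backoff"),
]

for row in sweep(corpus, eval_set, sizes=[4, 8], overlaps=[0, 2], k=2):
    print(row)
```

Each printed row is one configuration with its recall@k, best first. In a real setup the scorer would be replaced by the retriever under test, and a reranker would be added as a third axis of the grid.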
Used by 4 internal teams. Average recall@5 rose from 0.74 (default config) to 0.91 (tuned), a gain of 0.17.