I’ve been building retrieval-based agents on Replit and kept hitting issues with unstable chunking and hallucinated matches. After several failed attempts with vector stores (FAISS and the like), I started experimenting with a semantic firewall approach that prioritizes chunk boundary coherence over token limits.
This led to a solution I call semantic-tension scoring: placing chunk boundaries where the semantic tension between adjacent passages is highest, so a chunk never slices meaning mid-sentence. I logged the results in a small paper + demo (open source):
GitHub: onestardao/WFGY (Semantic Reasoning Engine for LLMs / WFGY 推理引擎 / 萬法歸一)
It includes:
- RAG failure analysis examples
- Semantic chunking vs token window chunking
- Simple scoring algorithm for inter-paragraph tension
- Fully FOSS implementation (MIT licensed)
The idea is: “If your chunk forgets why the question is asked, the best answer won’t matter.”
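The actual scoring algorithm lives in the repo; as a rough illustration of the boundary idea, here's a minimal sketch. It uses bag-of-words cosine similarity as a cheap stand-in for real embeddings, and the `tension`/`chunk` names and the 0.75 threshold are my own illustrative choices, not WFGY's:

```python
# Sketch of inter-paragraph "tension" chunking (illustrative, not the WFGY code).
# Tension = 1 - cosine similarity between adjacent paragraphs' bag-of-words
# vectors; a production system would use sentence embeddings instead.
from collections import Counter
import math
import re

def bow(text):
    """Lowercased bag-of-words vector as a Counter."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def tension(p1, p2):
    """Semantic tension between adjacent paragraphs: 1 = unrelated, 0 = identical."""
    return 1.0 - cosine(bow(p1), bow(p2))

def chunk(paragraphs, threshold=0.75):
    """Start a new chunk only where tension exceeds the threshold,
    so boundaries fall between topics rather than mid-thought."""
    if not paragraphs:
        return []
    chunks = [[paragraphs[0]]]
    for prev, cur in zip(paragraphs, paragraphs[1:]):
        if tension(prev, cur) > threshold:
            chunks.append([cur])   # high tension: topic shift, cut here
        else:
            chunks[-1].append(cur) # low tension: same thought, keep together
    return ["\n\n".join(c) for c in chunks]

paras = [
    "Retrieval agents need coherent chunks to ground answers.",
    "Coherent chunks keep the retrieval grounded in the question.",
    "Unrelated topic: the cafeteria menu changes every Tuesday.",
]
print(chunk(paras))  # first two paragraphs merge; the third gets its own chunk
```

Compare this with a fixed token window, which would happily cut between the first two paragraphs and glue the third onto whatever came before it.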
Would love feedback from anyone who has tried similar approaches on Replit’s LLM infra.
