[Guide] Solving Semantic Chunking Limits in AI Agents (RAG Troubleshooting Log)

I’ve been building retrieval-based agents on Replit and kept running into unstable chunking and hallucinated matches. After several failed attempts with vector stores (FAISS, etc.), I started experimenting with a "semantic firewall" approach that prioritizes chunk-boundary coherence over fixed token limits.

This led to an interesting solution: designing chunk boundaries using what I call semantic-tension scoring, which avoids slicing meaning mid-sentence. I logged the results in a short paper + demo (open source):

:memo: GitHub - onestardao/WFGY: Semantic Reasoning Engine for LLMs · WFGY 推理引擎 / 萬法歸一

It includes:

  • RAG failure analysis examples
  • Semantic chunking vs token window chunking
  • Simple scoring algorithm for inter-paragraph tension
  • Fully FOSS implementation (MIT licensed)
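To make the third bullet concrete, here is a minimal sketch of tension-based chunking. It is my own illustration, not the repo's actual algorithm: "tension" between adjacent paragraphs is approximated as 1 minus cosine similarity over bag-of-words vectors (a real setup would use sentence embeddings), and a new chunk starts wherever tension crosses a threshold.

```python
# Hypothetical sketch of semantic-tension chunking; the WFGY repo's
# actual scoring is not reproduced here. Tension between adjacent
# paragraphs is taken as 1 - cosine similarity over bag-of-words
# vectors, so chunk boundaries fall between topics rather than at a
# fixed token count.
import math
import re
from collections import Counter


def _bow(text: str) -> Counter:
    """Lowercased bag-of-words vector for a paragraph."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def tension(a: str, b: str) -> float:
    """1 - cosine similarity between two paragraphs (0.0 = same topic)."""
    va, vb = _bow(a), _bow(b)
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def chunk_by_tension(paragraphs: list[str], threshold: float = 0.7) -> list[list[str]]:
    """Start a new chunk wherever inter-paragraph tension exceeds the
    threshold, keeping coherent runs of paragraphs together."""
    chunks: list[list[str]] = [[paragraphs[0]]]
    for prev, cur in zip(paragraphs, paragraphs[1:]):
        if tension(prev, cur) > threshold:
            chunks.append([cur])  # topic shift: open a new chunk
        else:
            chunks[-1].append(cur)  # same topic: extend current chunk
    return chunks
```

With three paragraphs where the first two share vocabulary and the third is unrelated, the first two land in one chunk and the third starts a new one; the `0.7` threshold is an arbitrary placeholder you would tune per corpus.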

The idea is: “If your chunk forgets why the question is asked, the best answer won’t matter.”

Would love feedback if others have tried similar approaches on Replit’s LLM infra.