Building My First Intelligent Knowledge base
I built a Retrieval-Augmented Generation Tool using Open Source Everything. This is the story
I built a Retrieval-Augmented Generation Tool using Open Source Everything. Originally published in Medium. This is the story —
I Watched Engineers Waste Hours Searching 5G Specs. So I Built an AI System to Fix It.
How a real-world pain point in telecom led me to build a fully local, zero-cost RAG system and what it taught me about building AI products for specialized industries.
If you’ve ever worked in telecom, you know the ritual.
An engineer has a question about 5G network architecture such as, how the gNB-CU and gNB-DU split actually works. They open a 3GPP technical specification. Then another one. Then a third. Two hours later, they’ve skimmed hundreds of pages across multiple documents and they’re still not confident in the answer. I have been doing this myself. I still do.
Senior engineers, people with decades of experience, spending their mornings buried in PDFs instead of solving actual problems. New hires (me during covid) were even worse off! I would be completely clueless and either I would interrupt senior teammates with “basic” questions or spend entire days lost in the documentation maze.
The thing is, the answers are there. 3GPP specifications are the authoritative source for everything in cellular standards. But they were never designed to be searchable, conversational, or fast. And that gap between “the knowledge exists” and “people can actually access it” is exactly the kind of product opportunity that excites me.
So I built something about it.
The Problem Is Bigger Than It Looks
3GPP, the standards body behind 4G LTE, 5G NR, and the emerging 6G standards, publishes thousands of technical specifications. The ones most critical to 5G alone TS 38.300, TS 38.401, TS 23.501, the physical layer specs in TS 38.211 through 38.215 each run hundreds to thousands of pages.
Why did I only choose the small sample-size, it is because my computer has space constraints.
The terminology is dense and highly specialized. Answers to a single question often span multiple documents. And the only search available is keyword matching inside PDFs which is useless for conceptual questions like “How does the 5G core network handle session management?” where the relevant passage might not contain any of those exact words.
I’ve talked to engineers, researchers, standards professionals, and technical writers who all face some version of this pain. The pattern is remarkably consistent: high-expertise people, spending low-value time on information retrieval, because the tools don’t match the complexity of the domain.
When I mapped out user needs, four distinct personas emerged. The core user is a 5G RAN engineer who just wants to ask a question in plain English and get a cited answer in under five seconds. But there’s also the standards researcher tracking feature evolution across spec releases, the new engineer who needs a judgment-free way to ramp up, and the technical writer hunting for authoritative excerpts with proper references.
Each of these users shares a common requirement: the answer has to be traceable. In standards work, an uncited answer is barely better than a guess. You need to know which document, which section, and how closely the source matches your question.
Why RAG and Why Local
When you’re building an AI product for a specialized domain, the first architectural decision is also the most consequential: do you fine-tune a model on your data, or do you use Retrieval-Augmented Generation?
For this problem, RAG won decisively and the reasoning reveals something important about product thinking in AI.
Fine-tuning would require labeled question-answer pairs that simply don’t exist for 3GPP specifications. Nobody has curated a dataset of “here’s the question, here’s the correct spec excerpt.” You’d be starting from scratch, and you’d need to redo the work every time 3GPP publishes a new release cycle. The maintenance burden alone would kill the product.
RAG, on the other hand, lets you re-index new documents and immediately serve answers from them. The retrieval layer provides built-in citations — every answer comes with a source document and a similarity score. And structured technical text with consistent terminology, which is exactly what 3GPP specs are, happens to be the sweet spot for vector search.
But the decision that I’m most proud of is the fully local architecture.
No API keys. No cloud dependencies. No per-query costs.
This wasn’t just a technical choice. It was a product choice rooted in who the users actually are.
Telecom engineers at large carriers often work under strict data governance policies. Sending proprietary network architecture questions to a cloud API might not fly. Graduate students researching 5G standards shouldn’t need a $20/month subscription to get help with their thesis. And a DevOps team evaluating the system for production needs to run it in their own environment.
The entire pipeline, embedding generation with sentence-transformers, vector storage with ChromaDB, and answer generation with Ollama running open-source LLMs runs on a single machine. Once the models are downloaded, it works offline. That constraint shaped every subsequent design decision and, I believe, made the product better.
The Decisions Behind the Architecture
Building a RAG system is straightforward in theory. Making it work well for a specific domain requires a series of informed trade-offs.
Embedding model selection was one of the most researched decisions. I evaluated four models across the spectrum of size and performance: all-MiniLM-L6-v2 as a baseline, all-mpnet-base-v2 for more capacity, and the BAAI General Embeddings family (bge-small and bge-base). BGE models consistently outperform alternatives on technical and scientific text retrieval benchmarks. The small variant at 130MB offered the best accuracy-to-size ratio for this use case, while the base variant is the recommendation for production deployments that can afford 440MB.
Chunking strategy matters more than people think. I settled on 1,000-character chunks with 200-character overlap after testing against the structure of actual 3GPP documents. The chunking is sentence-boundary aware you don’t want a chunk to cut off mid-sentence when that sentence contains the key definition you need. The pipeline also strips 3GPP-specific headers, footers, page numbers, and revision markers that would otherwise pollute the embeddings. Tables, which are everywhere in spec documents, get extracted as structured text rows so they’re actually searchable.
LLM selection was optimized for the constraint of running locally. Llama 3.2 at 2GB became the default solid instruction-following, fast enough for interactive use. Mistral at 4GB is the alternative for users who want stronger technical text performance. For complex reasoning queries that require synthesizing information from multiple retrieved passages, DeepSeek-R1 at 4.7GB is available.
Conversation memory was a feature that user research made non-negotiable. Engineers don’t ask one question in isolation — they drill down. “What’s the gNB-CU/DU split?” is usually followed by “How does the F1 interface handle that?” and then “What about dual connectivity scenarios?” Without conversation history, each follow-up loses the thread. The system maintains configurable multi-turn context so follow-ups work naturally.
What I’d Build Next
The current system proves the concept and solves the core retrieval problem. But treating this as a product rather than a project reveals clear opportunities for the next iteration.
Cross-release comparison is the most requested feature from standards researchers. They want to ask “How has the handover procedure changed between Release 15 and Release 17?” and get a structured diff with citations from both versions. The vector store already supports document-level filtering, so the retrieval layer is ready and the generation prompt just needs to be redesigned for comparative queries.
Confidence scoring is another gap.
Right now, the system returns similarity scores from the vector search, but it doesn’t distinguish between “I found an exact match” and “I found something vaguely related.” A calibrated confidence indicator would help users know when to trust the answer and when to go verify manually.
And the observability story needs work. There’s a health endpoint and per-query timing metrics, but production deployment in a telecom environment would demand structured logging, alerting on retrieval quality degradation, and usage analytics to understand which spec sections get queried most data that could inform how the index is built and maintained.
The Bigger Picture
This project is small in scope a RAG system for a specific document corpus but it illustrates a pattern I think about constantly: the gap between “AI can do this” and “AI is actually doing this for real users in real workflows.”
The 3GPP spec problem isn’t unique to telecom. Every regulated, standards-heavy industry has its own version. Aviation has FAA regulations and airworthiness directives. Pharma has FDA guidance documents and clinical trial protocols. Finance has a labyrinth of regulatory filings. In each case, highly paid professionals spend significant time on manual information retrieval from authoritative but unwieldy document sets.
RAG is well suited to all of these not because it’s the fanciest approach, but because it offers citations (critical in regulated domains), it handles evolving document sets without retraining, and it can run in environments with strict data governance requirements.
The product opportunity isn’t the AI model. It’s understanding who the users are, what workflow the system fits into, what constraints actually matter (cost, latency, privacy, traceability), and which trade-offs serve the user best. That’s the work that turns a technical capability into something people use.
The 3GPP Technical Specification RAG Assistant is open source and available on GitHub. It runs fully locally, requires no API keys, and is covered by 82 unit tests. If you work with complex technical documentation and want to explore what RAG can do for your domain, I’d love to hear from you.


