01
Why Agentic RAG?
Standard RAG performs a single retrieval and generates an answer, so it fails on multi-hop questions that need evidence from multiple sources. Agentic RAG classifies question complexity first, then decides how to retrieve and answer.
❌ Standard RAG
→ Single retrieval per query
→ Fails on multi-hop questions
→ No reasoning about what to retrieve
→ Missing cross-document synthesis
✅ Agentic RAG
→ Classifies simple vs complex
→ Decomposes complex questions
→ Retrieves evidence per sub-question
→ Synthesizes final grounded answer
5 pipeline components
3 agent step types
7 API endpoints
18 unit tests
02
System Pipeline
01
DocumentLoader
Loads .txt, .md, .pdf from files or directories
02
TextChunker
Splits into 512-token chunks with 64-token overlap — prevents information loss at boundaries
03
Embedder
sentence-transformers/all-MiniLM-L6-v2 (local, free) or OpenAI text-embedding-ada-002
04
VectorStore (ChromaDB)
Persistent cosine similarity search — stores chunks with embeddings and metadata
05
AgenticRAG
Classify → simple path (single retrieve+generate) or complex path (decompose→retrieve×N→synthesize)
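The TextChunker's 512-token windows with 64-token overlap can be sketched as a simple sliding window over a token list. This is an illustrative sketch only — the project's chunker is paragraph-aware, and the function name and toy tokens here are assumptions:

```python
def chunk_tokens(tokens, chunk_size=512, overlap=64):
    """Slide a fixed-size window over a token list, overlapping
    neighbouring chunks so content at boundaries is not lost."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # last window already covers the tail
    return chunks

# Toy example with word-like "tokens":
chunks = chunk_tokens([f"t{i}" for i in range(1000)])
# Each chunk shares its first 64 tokens with the tail of the previous one.
```

The overlap means a sentence split at a chunk boundary still appears whole in at least one chunk, which is what the "prevents information loss at boundaries" note above refers to.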
Agentic Reasoning Flow
# Step 1: Classify
complexity = classify(question)  # "simple" | "complex"

if complexity == "simple":
    # Standard RAG — single retrieval
    chunks = vector_store.search(question, top_k=5)
    answer = llm.complete(context=chunks, question=question)
else:
    # Step 2: Decompose into sub-questions
    sub_questions = decompose(question)  # LLM returns JSON array

    # Step 3: Retrieve per sub-question
    evidence = []
    for sub_q in sub_questions:
        chunks = vector_store.search(sub_q, top_k=3)
        evidence.append({"question": sub_q, "context": chunks})

    # Step 4: Synthesize final answer
    answer = llm.complete(evidence=evidence, original=question)
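Step 2 depends on the LLM actually returning a JSON array. A minimal sketch of the parsing side, with a fallback to single-retrieval behaviour on malformed output — the fallback and function name are assumptions, not the project's documented behaviour:

```python
import json

def parse_sub_questions(raw: str, original: str) -> list[str]:
    """Parse an LLM completion expected to be a JSON array of strings.
    If the model returns anything else, degrade gracefully by treating
    the original question as its own single sub-question."""
    try:
        parsed = json.loads(raw)
        if isinstance(parsed, list) and all(isinstance(q, str) for q in parsed):
            return parsed
    except json.JSONDecodeError:
        pass
    return [original]  # degrade to standard single-retrieval RAG

subs = parse_sub_questions(
    '["What is the refund window for physical products?", '
    '"What is the refund policy for digital products?"]',
    "How does the refund policy differ for digital vs physical products?",
)
```

Guarding the decompose step this way keeps a malformed completion from crashing the complex path; the worst case is simply standard RAG.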
03
Code Structure
agentic-rag-knowledge-system/
├── src/
│   ├── rag_pipeline.py   # Document, Loader, Chunker, Embedder, VectorStore, RAGPipeline
│   └── agentic_rag.py    # AgentStep, AgentResponse, AgenticRAG
├── api.py                # FastAPI — /ingest, /query, /agent/query endpoints
├── demo.py               # CLI demo with 3 sample docs + example queries
├── tests/test_rag.py     # 18 unit tests — all mocked, no API key needed
├── requirements.txt
└── .env.example
| Class | File | Responsibility |
|---|---|---|
| DocumentLoader | rag_pipeline.py | Load txt, md, pdf — file or directory |
| TextChunker | rag_pipeline.py | Paragraph-aware chunking with overlap |
| Embedder | rag_pipeline.py | Local or OpenAI embeddings — swappable |
| VectorStore | rag_pipeline.py | ChromaDB wrapper — add, search, clear |
| RAGPipeline | rag_pipeline.py | End-to-end ingest + query orchestration |
| AgenticRAG | agentic_rag.py | Classify → decompose → retrieve → synthesize |
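The AgentStep and AgentResponse shapes can be approximated from the /agent/query response fields. A hedged sketch — the exact field names and types in agentic_rag.py may differ:

```python
from dataclasses import dataclass, field

@dataclass
class AgentStep:
    """One reasoning step, e.g. classify, decompose, retrieve, synthesize."""
    step_type: str
    description: str

@dataclass
class AgentResponse:
    """Mirrors the fields of the /agent/query JSON response."""
    answer: str
    query: str
    is_complex: bool
    reasoning_steps: list[AgentStep] = field(default_factory=list)
    latency_ms: float = 0.0

resp = AgentResponse(
    answer="...",
    query="How does the refund policy differ for digital vs physical products?",
    is_complex=True,
    reasoning_steps=[AgentStep("classify", "Classified as: complex")],
)
```

Dataclasses like these serialize cleanly to the JSON shape shown in the REST API section, which is why the response carries the same field names.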
04
REST API
| Method | Endpoint | Description |
|---|---|---|
| POST | /ingest/text | Add raw text to the knowledge base — returns chunk count |
| POST | /ingest/file | Upload a .txt, .md, or .pdf file — auto-chunked and indexed |
| POST | /query | Standard RAG query — returns answer + sources + latency |
| POST | /agent/query | Agentic query — includes reasoning steps, is_complex flag, all sources |
| GET | /health | Service health check |
| GET | /stats | Vector store stats — chunk count, embedder, model info |
| DELETE | /knowledge | Clear all documents from the knowledge base |
Example — Agentic Query Response
{
  "answer": "Physical products have a 30-day return window while digital products are non-refundable after download. Physical refunds take 5-7 business days...",
  "query": "How does the refund policy differ for digital vs physical products?",
  "is_complex": true,
  "reasoning_steps": [
    "Classified as: complex",
    "Decomposed into 2 sub-questions",
    "Retrieved 3 chunks for sub-question 1 (score: 0.91)",
    "Retrieved 3 chunks for sub-question 2 (score: 0.88)",
    "Synthesized final answer from all evidence"
  ],
  "latency_ms": 842.3
}
"answer": "Physical products have a 30-day return window while digital products are non-refundable after download. Physical refunds take 5-7 business days...",
"query": "How does the refund policy differ for digital vs physical products?",
"is_complex": true,
"reasoning_steps": [
"Classified as: complex",
"Decomposed into 2 sub-questions",
"Retrieved 3 chunks for sub-question 1 (score: 0.91)",
"Retrieved 3 chunks for sub-question 2 (score: 0.88)",
"Synthesized final answer from all evidence"
],
"latency_ms": 842.3
}
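A client consuming this response might surface the reasoning trace alongside the answer. A minimal sketch using only the fields shown above (the truncated answer string is abbreviated here):

```python
import json

# A /agent/query response body, abbreviated from the example above.
raw = '''{
  "answer": "Physical products have a 30-day return window...",
  "query": "How does the refund policy differ for digital vs physical products?",
  "is_complex": true,
  "reasoning_steps": [
    "Classified as: complex",
    "Decomposed into 2 sub-questions"
  ],
  "latency_ms": 842.3
}'''

resp = json.loads(raw)
path = "complex (decompose + multi-retrieve)" if resp["is_complex"] else "simple"
trace = "\n".join(f"  {i + 1}. {step}" for i, step in enumerate(resp["reasoning_steps"]))
print(f"Path: {path}\n{trace}")
```

Exposing reasoning_steps this way is what makes the agentic path auditable: each retrieval and synthesis decision is visible to the caller, not just the final answer.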
05
Setup
# Clone and install
git clone https://github.com/sadhanageddam27/agentic-rag-knowledge-system.git
cd agentic-rag-knowledge-system
pip install -r requirements.txt
# Configure (OpenAI or Ollama)
cp .env.example .env
# Run demo
python demo.py
# Or start API server
python api.py # → http://localhost:8000/docs
# Run tests (no API key needed)
pytest tests/ -v
Run fully offline with Ollama — set LLM_BACKEND=ollama and LLM_MODEL=llama3. Local embeddings via sentence-transformers are used by default — no API key needed for either component.
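Assuming .env.example follows the variables named above, an offline configuration might look like this — any variable beyond LLM_BACKEND and LLM_MODEL is an assumption about what the file contains:

```shell
# .env — fully offline setup: Ollama for generation,
# local sentence-transformers embeddings (the default) for retrieval.
LLM_BACKEND=ollama
LLM_MODEL=llama3

# Only needed when switching to the OpenAI backend (assumed variable name):
# OPENAI_API_KEY=...
```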