Python · ChromaDB · FastAPI · LLM

Agentic RAG
Knowledge System

Agent-driven retrieval pipeline that grounds every LLM response in real documents. For complex multi-hop questions, the agent decomposes, retrieves evidence independently per sub-question, then synthesizes — giving far better results than single-retrieval RAG.

RAG · retrieval core
Agent · multi-step
REST · FastAPI
18 unit tests
Python · LLMs · ChromaDB · Vector Search · FastAPI · REST API · OpenAI · Ollama · Prompt Engineering · sentence-transformers
01

Why Agentic RAG?

Standard RAG does one retrieval and generates an answer. It fails on multi-hop questions that need evidence from multiple sources. The agent classifies complexity first, then decides how to answer.

❌ Standard RAG
Single retrieval per query
Fails on multi-hop questions
No reasoning about what to retrieve
Missing cross-document synthesis
✅ Agentic RAG
Classifies simple vs complex
Decomposes complex questions
Retrieves evidence per sub-question
Synthesizes final grounded answer
5 pipeline components
3 agent step types
7 API endpoints
18 unit tests
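The "classifies simple vs complex" step can be sketched as one constrained LLM call plus a defensive parse of the reply. A minimal sketch — the function names and prompt wording here are illustrative, not the project's actual code:

```python
def build_classify_prompt(question: str) -> str:
    """Ask the LLM to label a question as 'simple' or 'complex'."""
    return (
        "Classify the question below. Answer with exactly one word:\n"
        "'simple' if a single retrieval can answer it, or\n"
        "'complex' if it needs evidence from multiple sources.\n\n"
        f"Question: {question}"
    )

def parse_complexity(raw: str) -> str:
    """Normalize the LLM reply; default to 'simple' on anything unexpected,
    so a malformed reply degrades to standard single-retrieval RAG."""
    label = raw.strip().lower().strip(".'\"")
    return label if label in ("simple", "complex") else "simple"

print(parse_complexity(" Complex.\n"))  # → complex
```

Defaulting to "simple" on garbage output is a deliberate choice: the cheap path still produces a grounded answer, while a bad "complex" guess would waste several LLM calls.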
02

System Pipeline

01 · DocumentLoader
Loads .txt, .md, and .pdf documents from single files or directories
02 · TextChunker
Splits text into 512-token chunks with 64-token overlap — prevents information loss at boundaries
03 · Embedder
sentence-transformers/all-MiniLM-L6-v2 (local, free) or OpenAI text-embedding-ada-002
04 · VectorStore (ChromaDB)
Persistent cosine similarity search — stores chunks with embeddings and metadata
05 · AgenticRAG
Classify → simple path (single retrieve + generate) or complex path (decompose → retrieve ×N → synthesize)
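The chunking step above amounts to a sliding window with overlap. A toy sketch, assuming whitespace-separated words stand in for real model tokens (the actual chunker is also paragraph-aware, which this version omits):

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into ~chunk_size-token chunks where consecutive chunks
    share `overlap` tokens, so a fact straddling a boundary survives intact.
    Toy version: a 'token' here is a whitespace-separated word."""
    tokens = text.split()
    step = chunk_size - overlap          # advance 448 tokens per chunk
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break                        # last chunk reached the end
    return chunks
```

With a 1000-token input and the defaults, windows start at tokens 0, 448, and 896, and the last 64 tokens of each chunk reappear at the start of the next.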

Agentic Reasoning Flow

agentic_rag.py
# Step 1: Classify
complexity = classify(question)  # "simple" | "complex"

if complexity == "simple":
    # Standard RAG — single retrieval
    chunks = vector_store.search(question, top_k=5)
    answer = llm.complete(context=chunks, question=question)

else:
    # Step 2: Decompose into sub-questions
    sub_questions = decompose(question)  # LLM returns JSON array

    # Step 3: Retrieve per sub-question
    evidence = []
    for sub_q in sub_questions:
        chunks = vector_store.search(sub_q, top_k=3)
        evidence.append({"question": sub_q, "context": chunks})

    # Step 4: Synthesize final answer
    answer = llm.complete(evidence=evidence, original=question)
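Step 2 depends on the LLM actually returning a JSON array. A defensive parse is worth sketching, since models often wrap JSON in markdown fences or return malformed text; the helper name and fallback behavior here are my assumptions, not the project's code:

```python
import json

def parse_sub_questions(raw: str, original: str) -> list[str]:
    """Parse the LLM's decomposition output into a list of sub-questions.
    On any malformed output, fall back to treating the original question
    as the sole sub-question, so the pipeline still produces an answer."""
    cleaned = raw.strip()
    cleaned = cleaned.removeprefix("```json").removeprefix("```")
    cleaned = cleaned.removesuffix("```").strip()
    try:
        parsed = json.loads(cleaned)
        if isinstance(parsed, list) and all(isinstance(q, str) for q in parsed):
            return parsed or [original]
    except json.JSONDecodeError:
        pass
    return [original]
```

The fallback keeps the agent loop total: a bad decomposition degrades gracefully to single-retrieval behavior instead of crashing the request.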
03

Code Structure

agentic-rag-knowledge-system/
├── src/
│   ├── rag_pipeline.py   # Document, Loader, Chunker, Embedder, VectorStore, RAGPipeline
│   └── agentic_rag.py    # AgentStep, AgentResponse, AgenticRAG
├── api.py                # FastAPI — /ingest, /query, /agent/query endpoints
├── demo.py               # CLI demo with 3 sample docs + example queries
├── tests/
│   └── test_rag.py       # 18 unit tests — all mocked, no API key needed
├── requirements.txt
└── .env.example
Class · File · Responsibility
DocumentLoader · rag_pipeline.py · Load .txt, .md, .pdf — file or directory
TextChunker · rag_pipeline.py · Paragraph-aware chunking with overlap
Embedder · rag_pipeline.py · Local or OpenAI embeddings — swappable
VectorStore · rag_pipeline.py · ChromaDB wrapper — add, search, clear
RAGPipeline · rag_pipeline.py · End-to-end ingest + query orchestration
AgenticRAG · agentic_rag.py · Classify → decompose → retrieve → synthesize
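Under the hood, VectorStore's search boils down to ranking stored embeddings by cosine similarity against the query embedding. A from-scratch sketch of that ranking (illustrative only; the project delegates this to ChromaDB):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product over the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec: list[float], store: list[tuple[str, list[float]]],
           top_k: int = 5) -> list[tuple[float, str]]:
    """store holds (chunk_text, embedding) pairs. Score every chunk against
    the query vector and return the top_k best matches, highest first."""
    scored = [(cosine(query_vec, emb), text) for text, emb in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

This linear scan is O(n) per query; ChromaDB adds persistence and approximate-nearest-neighbor indexing on top of the same similarity measure.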
04

REST API

POST /ingest/text: Add raw text to knowledge base — returns chunk count
POST /ingest/file: Upload .txt, .md, or .pdf file — auto-chunked and indexed
POST /query: Standard RAG query — returns answer + sources + latency
POST /agent/query: Agentic query — includes reasoning steps, is_complex flag, all sources
GET /health: Service health check
GET /stats: Vector store stats — chunk count, embedder, model info
DELETE /knowledge: Clear all documents from knowledge base

Example — Agentic Query Response

POST /agent/query
{
  "answer": "Physical products have a 30-day return window while digital products are non-refundable after download. Physical refunds take 5-7 business days...",
  "query": "How does the refund policy differ for digital vs physical products?",
  "is_complex": true,
  "reasoning_steps": [
    "Classified as: complex",
    "Decomposed into 2 sub-questions",
    "Retrieved 3 chunks for sub-question 1 (score: 0.91)",
    "Retrieved 3 chunks for sub-question 2 (score: 0.88)",
    "Synthesized final answer from all evidence"
  ],
  "latency_ms": 842.3
}
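A client can surface the agent's reasoning trace from that payload with plain json handling; a small sketch using the field names from the example above:

```python
import json

# Abbreviated copy of the /agent/query response shown above
response_body = '''{
  "answer": "Physical products have a 30-day return window...",
  "is_complex": true,
  "reasoning_steps": [
    "Classified as: complex",
    "Decomposed into 2 sub-questions",
    "Synthesized final answer from all evidence"
  ],
  "latency_ms": 842.3
}'''

data = json.loads(response_body)
if data["is_complex"]:
    # Show how the agent arrived at the answer before the answer itself
    for i, step in enumerate(data["reasoning_steps"], 1):
        print(f"{i}. {step}")
print(data["answer"])
```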
05

Setup

terminal
# Clone and install
git clone https://github.com/sadhanageddam27/agentic-rag-knowledge-system.git
cd agentic-rag-knowledge-system
pip install -r requirements.txt

# Configure (OpenAI or Ollama)
cp .env.example .env

# Run demo
python demo.py

# Or start API server
python api.py   # → http://localhost:8000/docs

# Run tests (no API key needed)
pytest tests/ -v

Run fully offline with Ollama — set LLM_BACKEND=ollama and LLM_MODEL=llama3. Local embeddings via sentence-transformers are used by default — no API key needed for either component.
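A plausible .env for the fully offline setup; only LLM_BACKEND and LLM_MODEL are named above, so check .env.example for the project's actual keys:

```
# Offline: Ollama for generation, sentence-transformers for embeddings
LLM_BACKEND=ollama
LLM_MODEL=llama3
```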