Retrieval Augmented Generation (RAG)

In this exercise, we will learn how to build a Retrieval Augmented Generation (RAG) system from scratch.

In this tutorial, you will learn:

  • what RAG is and why it is needed
  • how the ingestion and retrieval pipelines work
  • the roles of chunking, embeddings, and vector databases
  • how to structure a RAG system into modules

Note: RAG underpins many of today's real-world AI applications.

RAG pipeline diagram showing data ingestion, embeddings, vector search, and LLM response generation

What is RAG?

Definition: Retrieval Augmented Generation (RAG) improves LLM responses by retrieving relevant information from an external knowledge base before generating an answer.

In simple words:

LLM + your data = accurate answers

Why RAG is Needed

Problem 1 — Hallucination

LLMs may generate confident but incorrect answers when the required information is missing from their training data.

Example: ask an LLM about your company's internal leave policy; the model was never trained on that document, so it may invent a plausible-sounding but wrong answer.

Problem 2 — No Access to Private Data

Your company data may include:

  • internal wikis and HR policies
  • support tickets and emails
  • product documentation
  • databases and spreadsheets

None of this was in the LLM's training data, so the model cannot answer questions about it.

First Solution: Fine-Tuning

What is Fine-Tuning? Fine-tuning means further training a pretrained model on domain-specific data.

Goal: Add domain knowledge to the model.

Example Analogy

Fine-tuning is like sending an employee back to training to memorize new material; RAG is like handing them the reference manual and letting them look things up.

Problems with Fine-Tuning

  • expensive and compute-intensive
  • must be repeated every time the data changes
  • risk of degrading the model's general abilities
  • still no guarantee of accurate factual recall

So we need a better solution.

RAG Solves Both

RAG addresses both problems: retrieved context grounds the answer, reducing hallucination, and the knowledge base can hold your private, up-to-date data.

Traditional LLM vs RAG

Traditional LLM Flow

User Query → Prompt → LLM → Answer

Problems:

  • knowledge is frozen at the training cutoff
  • no access to private data
  • prone to hallucination

RAG Flow

User Query → Retrieve relevant data → Provide context to LLM → Generate accurate answer

RAG Architecture Overview

RAG has two main pipelines:

  1. Data Ingestion Pipeline

    This pipeline prepares documents and stores them in a searchable format.

  2. Retrieval Pipeline

    This pipeline retrieves relevant information when a user asks a question.

Pipeline 1: Data Ingestion Pipeline

Step-by-step:

  1. Data Sources
    • PDF
    • HTML
    • Excel
    • SQL
    • JSON
    • text files
  2. Data Parsing
    • extract readable text
    • improves retrieval accuracy
    • handles structured and unstructured data
  3. Chunking (Very Important)
    • large documents are divided into smaller chunks
    • fits the LLM context size
    • improves retrieval precision
    • reduces memory usage
  4. Embeddings
    • convert text into vectors
    • allows similarity search
    • provides semantic understanding
    • popular providers: OpenAI, Gemini, Hugging Face, and other open-source models
  5. Vector Database
    • stores vector embeddings
    • popular choices: ChromaDB, FAISS, Pinecone, Weaviate

Result: You now have a searchable knowledge base.
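The pipeline steps above can be sketched end to end in a few lines. The bag-of-words "embedding" and the plain list used as a vector database are toy stand-ins for a real embedding model and a real vector store:

```python
# Toy ingestion pipeline: parse -> chunk -> embed -> store.
# The embedding is a simple bag-of-words count vector, a stand-in for a
# real model such as an OpenAI or Hugging Face embedding (assumption).

def chunk(text: str, size: int = 40) -> list[str]:
    """Split text into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text: str, vocab: list[str]) -> list[float]:
    """Count vocabulary words in the text (toy embedding)."""
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]

vocab = ["leave", "policy", "days", "salary"]
doc = "Employees get 20 leave days per year. The leave policy resets in January."

index = []  # our "vector database": (vector, chunk) pairs
for c in chunk(doc):
    index.append((embed(c, vocab), c))

print(len(index), "chunks stored")
```

In a real system each stage is swapped for production components (a PDF parser, a token-aware splitter, a hosted embedding model, ChromaDB or FAISS), but the data flow stays the same.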

Retrieval Pipeline

When a user asks a question, the retrieval pipeline works as follows:

  1. Convert query into embedding
  2. Search the vector database
  3. Retrieve the most relevant context
  4. Send context and prompt to the LLM
  5. LLM generates the final answer
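The five steps above can be sketched with a toy embedder and cosine similarity. The vocabulary and mini knowledge base are illustrative assumptions; a real system would call an embedding model and a vector database, then send the prompt to an LLM:

```python
import math

# Toy retrieval pipeline: embed the query, rank stored chunks by cosine
# similarity, and build a context-augmented prompt for the LLM.

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def embed(text, vocab):
    """Bag-of-words count vector (stand-in for a real embedding model)."""
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]

vocab = ["leave", "policy", "salary", "office"]
kb = [
    "the leave policy allows 20 days",
    "salary is paid monthly",
    "the office opens at 9",
]
index = [(embed(t, vocab), t) for t in kb]   # step 0: ingested earlier

query = "what is the leave policy"
qv = embed(query, vocab)                      # step 1: embed the query
best = max(index, key=lambda p: cosine(qv, p[0]))[1]  # steps 2-3: search

# step 4: context + prompt; step 5 would be the LLM call
prompt = f"Answer using only this context:\n{best}\n\nQuestion: {query}"
print(best)
```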

Example

User asks: What is the leave policy?

System:

  1. Embeds the query
  2. Finds the HR policy chunks closest to it in the vector database
  3. Passes those chunks plus the question to the LLM
  4. The LLM answers from the retrieved policy text

Core RAG Formula

Answer = LLM(User Query + Retrieved Context)

Workflow

User Query
   ↓
Embedding
   ↓
Vector Search
   ↓
Relevant Context
   ↓
Prompt + Context
   ↓
LLM Response

Key Concept: Context Augmentation

Context augmentation means adding retrieved context before the LLM generates the response.

Without context, the model may hallucinate. With context, the answer becomes more accurate.
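A context-augmented prompt can be as simple as string formatting. The retrieved text below is a hypothetical chunk; the instruction to say "I don't know" when the context is silent is a common guard against hallucination:

```python
# Sketch of context augmentation: the retrieved text is prepended to the
# prompt so the model answers from it instead of guessing.

retrieved_context = "Leave policy: employees receive 20 paid leave days per year."
question = "How many leave days do employees get?"

augmented_prompt = (
    "Use ONLY the context below to answer. If the answer is not in the "
    "context, say you don't know.\n\n"
    f"Context:\n{retrieved_context}\n\n"
    f"Question: {question}"
)
print(augmented_prompt)
```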

Document Structure (LangChain Concept)

Documents usually contain two important parts:

  • page_content: the text itself
  • metadata: information about the text

Metadata may include:

  • source file name
  • page number
  • section or chapter
  • creation date

Why metadata matters: it lets you filter search results, trace an answer back to its source, and show citations to the user.
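A minimal sketch of a LangChain-style Document follows. The field names `page_content` and `metadata` mirror LangChain's; the dataclass itself is a stand-in, not the real class:

```python
from dataclasses import dataclass, field

# Stand-in for LangChain's Document: text plus a metadata dict.

@dataclass
class Document:
    page_content: str
    metadata: dict = field(default_factory=dict)

doc = Document(
    page_content="Employees receive 20 paid leave days per year.",
    metadata={"source": "hr_handbook.pdf", "page": 12},  # hypothetical source
)

# Metadata lets you filter results and cite sources in the final answer.
print(doc.metadata["source"])
```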

Chunking Strategies

Chunking divides documents into smaller pieces.

Common methods:

  • fixed-size chunking (by characters or tokens)
  • recursive character splitting (paragraphs, then sentences, then words)
  • sentence- or paragraph-based splitting
  • overlapping windows

Benefits:

  • chunks fit within the LLM context window
  • retrieval returns focused, relevant passages
  • less irrelevant text ends up in the prompt
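Fixed-size chunking with overlap is the simplest strategy to implement. The window slides by `chunk_size - overlap` characters, so a sentence that straddles one boundary still appears whole in the neighboring chunk (sizes here are illustrative):

```python
# Fixed-size chunking with overlap: overlapping windows keep text that
# straddles a chunk boundary retrievable from at least one chunk.

def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Slide a chunk_size window over text, stepping by chunk_size - overlap."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "a" * 120
chunks = chunk_text(text, chunk_size=50, overlap=10)
print([len(c) for c in chunks])  # → [50, 50, 40]
```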

Embeddings Explained

Embeddings convert text into numbers.

Example: "machine learning" → [0.12, 0.98, …]

They are used for:

  • semantic similarity search
  • clustering related documents
  • ranking retrieved chunks
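The key property is that related texts map to nearby vectors. A bag-of-words count vector (a toy stand-in for a real embedding model such as a sentence-transformer) is enough to show the idea:

```python
import math

# Toy demonstration that embeddings make texts comparable as vectors:
# related sentences score higher on cosine similarity.

def embed(text, vocab):
    """Bag-of-words count vector (toy stand-in for a real embedding)."""
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

vocab = ["machine", "learning", "cooking", "recipe"]
a = embed("machine learning is fun", vocab)
b = embed("i study machine learning", vocab)
c = embed("a cooking recipe", vocab)

print(cosine(a, b) > cosine(a, c))  # related texts score higher → True
```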

Vector Database Role

A vector database stores embeddings for fast retrieval.

It supports:

  • fast similarity (nearest-neighbor) search
  • metadata filtering
  • scaling to millions of vectors
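The interface of a vector store is small: add vectors with their text and metadata, then search by similarity. The in-memory class below is a sketch of that interface, not a substitute for ChromaDB or FAISS (which add persistence and fast approximate indexes):

```python
import math

# Minimal in-memory vector store: add() stores (vector, text, metadata),
# search() returns the top-k rows by cosine similarity.

class VectorStore:
    def __init__(self):
        self.rows = []

    def add(self, vector, text, metadata=None):
        self.rows.append((vector, text, metadata or {}))

    def search(self, query_vector, k=1):
        def score(row):
            v = row[0]
            dot = sum(x * y for x, y in zip(query_vector, v))
            norm = math.hypot(*query_vector) * math.hypot(*v)
            return dot / norm if norm else 0.0
        return sorted(self.rows, key=score, reverse=True)[:k]

store = VectorStore()
# hypothetical 2-d vectors and sources, for illustration only
store.add([1.0, 0.0], "leave policy text", {"source": "hr.pdf"})
store.add([0.0, 1.0], "salary policy text", {"source": "pay.pdf"})

top = store.search([0.9, 0.1], k=1)
print(top[0][1])  # → leave policy text
```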

RAG Reduces Hallucination

RAG does not eliminate hallucination completely.

But:

  • answers are grounded in retrieved documents
  • the model can cite its sources
  • the system can say "not found" instead of guessing

Real-World Example

Perplexity AI uses RAG for:

  • answering questions grounded in live web search results
  • showing citations for the sources behind each answer

Implementation Workflow

  1. Phase 1 — Basic
    • build simple RAG
    • load documents
    • chunk and embed
  2. Phase 2 — Intermediate
    • modular code
    • vector search
    • context retrieval
  3. Phase 3 — Advanced
    • agentic RAG
    • optimization
    • context engineering

Modular RAG Architecture

Module Structure

📁 RAG System
├── 📁 Data Loader
├── 📁 Chunk and Embedding Module
├── 📁 Vector Store Module
├── 📁 Retriever
└── 📁 LLM Generator

Production systems usually split RAG into modules like these so each stage can be developed, tested, and swapped independently.
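The module tree above can be expressed as one class per stage with a narrow interface. Everything below is a stub sketch (the embedding module is omitted for brevity, and all names are hypothetical); the point is the wiring, not the implementations:

```python
# Modular RAG skeleton: each stage is its own class, so e.g. the
# retriever can be swapped for a real vector database without touching
# the loader or the generator. All method bodies are stubs.

class DataLoader:
    def load(self, path: str) -> str:
        return f"contents of {path}"  # stub: would parse PDF/HTML/etc.

class Chunker:
    def split(self, text: str) -> list[str]:
        return [text[i:i + 30] for i in range(0, len(text), 30)]

class Retriever:
    def __init__(self, chunks: list[str]):
        self.chunks = chunks

    def retrieve(self, query: str) -> str:
        # stub: real code would embed the query and run a vector search
        return max(self.chunks, key=lambda c: sum(w in c for w in query.split()))

class Generator:
    def answer(self, query: str, context: str) -> str:
        return f"[LLM answer to '{query}' using: {context}]"  # stub LLM call

text = DataLoader().load("hr_handbook.pdf")
chunks = Chunker().split(text)
context = Retriever(chunks).retrieve("hr_handbook contents")
print(Generator().answer("What is the leave policy?", context))
```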

Optimization Topics Covered

The tutorial also covers the advanced topics from Phase 3: agentic RAG, retrieval optimization, and context engineering.

Why RAG is Important Today

According to industry trends, many enterprise AI applications now use RAG.

Common use cases include:

  • customer-support chatbots grounded in help-center docs
  • internal knowledge assistants over company wikis
  • document question answering and summarization
  • search engines with AI-generated, cited answers