Lesson 2: How RAG Actually Works

The Open-Book Exam

Remember taking exams in school? There were two kinds: closed-book (rely entirely on memory) and open-book (reference materials allowed). Which type did you prefer? Most people say open-book, and for good reason. When you can look things up, you give better, more accurate answers. You’re not guessing or trying to reconstruct facts from fuzzy memories. You’re working with real information. RAG gives AI the same advantage. Instead of relying solely on what it “remembers” from training, it can look up relevant information before answering. Let’s break down exactly how this works.

Core Concepts

The Three-Step Dance: Retrieve, Augment, Generate

RAG stands for Retrieval-Augmented Generation, and those three words describe exactly what happens.
Step 1: Retrieve. When you ask a question, the system searches through a knowledge base to find information relevant to your query. This isn’t like typing keywords into Google; it’s a semantic search that understands meaning (more on this in Lesson 3).
Step 2: Augment. The retrieved information gets added to your original question. Think of it as giving the AI a cheat sheet along with your question: “Here’s what the user asked, and here’s some relevant information that might help you answer.”
Step 3: Generate. The AI generates a response, but now it’s informed by the retrieved context. Instead of relying only on training data, it can reference the specific information that was just retrieved.
That’s it. Three steps. The magic is in how well each step is executed.
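The three steps can be sketched in a few lines of Python. This is a minimal, illustrative pipeline, not production code: the retrieve function here uses naive word overlap as a stand-in for the semantic search covered in Lesson 3, and generate is a placeholder for a real LLM call.

```python
def retrieve(query: str, knowledge_base: list[str], top_k: int = 2) -> list[str]:
    """Toy retrieval: rank chunks by how many words they share with the query.
    A real system would use embedding-based semantic search instead."""
    query_words = set(query.lower().split())
    ranked = sorted(
        knowledge_base,
        key=lambda chunk: len(query_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def augment(question: str, chunks: list[str]) -> str:
    """Combine the retrieved chunks and the user's question into one prompt."""
    context = "\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\nAnswer:"
    )

def generate(prompt: str) -> str:
    """Placeholder for an LLM call (an API request in a real system)."""
    return f"[LLM response grounded in a {len(prompt)}-character prompt]"

# Walk the three steps end to end with a tiny knowledge base.
kb = [
    "New employees receive 15 days of paid vacation in their first year.",
    "The office is closed on national holidays.",
]
question = "How many vacation days do new employees get?"
chunks = retrieve(question, kb)
prompt = augment(question, chunks)
answer = generate(prompt)
```

The pipeline is just three function calls chained together; everything interesting lives inside each step.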

A Concrete Example

Let’s walk through a real scenario. Imagine you have a RAG system connected to your company’s HR documentation, and an employee asks:
“How many vacation days do new employees get?”
Without RAG (traditional AI): The AI would try to answer from its general training data. It might say something like “Vacation policies vary by company, typically ranging from 10-20 days for new employees.” Technically true, but completely unhelpful for your specific situation. With RAG:
  1. Retrieve: The system searches the HR documents and finds a chunk of text that says “New employees receive 15 days of paid vacation in their first year, increasing to 20 days after two years of employment.”
  2. Augment: This retrieved text gets combined with the original question into a prompt like: “Based on the following company policy: ‘New employees receive 15 days of paid vacation in their first year, increasing to 20 days after two years of employment.’ Please answer this question: How many vacation days do new employees get?”
  3. Generate: The AI responds: “New employees receive 15 days of paid vacation in their first year. After two years with the company, this increases to 20 days.”
See the difference? The answer is specific, accurate, and grounded in your actual company policy.

The Library Analogy

Here’s another way to think about it. Imagine you’re a reference librarian, and someone asks you a question. Bad librarian approach: Try to answer from memory, even if you’re not sure. Good librarian approach:
  1. Listen to the question
  2. Walk to the relevant section of the library
  3. Pull out the books that might help
  4. Read the relevant passages
  5. Synthesize an answer based on what you found
RAG is the good librarian. It doesn’t pretend to know everything from memory. It finds the right information first, then crafts a helpful response.

What Makes Retrieval “Smart”?

You might be wondering: how does the system know which documents are relevant? This is where things get interesting. Traditional search (like old-school Google) looks for keyword matches. If you search for “vacation days,” it finds documents containing those exact words. RAG typically uses semantic search, which understands meaning rather than just matching keywords. So if you ask about “time off for new hires,” it can still find the document about “vacation days for new employees” because it understands these concepts are related. We’ll dive deep into how this works in Lesson 3 when we explore embeddings. For now, just know that the retrieval step is smarter than simple keyword matching.

The Augmented Prompt

The “augment” step is where retrieval meets everything you learned about prompting in previous courses. The retrieved information needs to be presented to the AI in a way that’s clear and useful. A typical augmented prompt might look like this:
Use the following context to answer the question. If the context doesn't contain enough information to answer fully, say so.

Context:
[Retrieved document chunks go here]

Question: [User's original question]

Answer:
The structure matters. You’re essentially telling the AI: “Here’s some information that should help. Use it to answer this question.”
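In code, a template like the one above is usually kept separate from the retrieval logic so it can be tuned independently. Here is one way to fill it using Python’s built-in string.Template; the placeholder names are our own choice, not a standard.

```python
from string import Template

# The instruction line tells the model what to do when retrieval falls short,
# which reduces the temptation to invent an answer.
AUGMENTED_PROMPT = Template(
    "Use the following context to answer the question. "
    "If the context doesn't contain enough information to answer fully, say so.\n\n"
    "Context:\n$context\n\n"
    "Question: $question\n\n"
    "Answer:"
)

prompt = AUGMENTED_PROMPT.substitute(
    context="New employees receive 15 days of paid vacation in their first year.",
    question="How many vacation days do new employees get?",
)
```

Keeping the template as a named constant makes it easy to experiment with different instructions without touching the rest of the pipeline.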

Why “Augmented” and Not “Replaced”?

An important nuance: RAG augments the AI’s capabilities; it doesn’t replace them. The AI still uses its general knowledge and reasoning abilities. The retrieved context provides specific information, but the AI synthesizes, summarizes, and presents that information using its language capabilities. This is powerful because:
  • The AI can combine retrieved facts with general knowledge
  • It can explain concepts in accessible language
  • It can draw connections the source documents don’t explicitly make
  • It can format responses appropriately for the question
RAG is a collaboration between retrieval and generation, not retrieval alone.

Try It Yourself

Exercise 1: Trace the RAG Steps

Think of a question you might ask a company’s internal knowledge base. For example: “What’s the process for requesting a new laptop?” Now trace through what would happen at each step:
  1. Retrieve: What documents or sections would be relevant? (IT policies? Equipment request forms? Procurement procedures?)
  2. Augment: What would the combined prompt look like?
  3. Generate: How would the AI synthesize this into a helpful answer?

Exercise 2: Compare With and Without

Take any factual question about a specific organization or domain. Write out:
  1. What a general AI might say (from training data only)
  2. What specific information would need to be retrieved to improve the answer
  3. What the improved answer might look like
This helps you internalize the value RAG provides.

Exercise 3: Design the Context

Imagine you’re building a RAG system for a recipe website. A user asks: “What can I make with chicken and broccoli that’s ready in under 30 minutes?” What information would you want retrieved? How would you structure the augmented prompt? Think about what context would be most helpful for generating a good response.

Common Pitfalls

Pitfall 1: Retrieving Too Much

If you retrieve every vaguely related document and stuff it all into the prompt, you create problems. The AI might get confused by contradictory information, miss the most relevant details in the noise, or hit token limits. The fix: Quality over quantity. Retrieve the most relevant chunks, not everything that might be related.
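One common way to enforce “quality over quantity” is to cap the context by a rough token budget rather than including every candidate chunk. The sketch below uses the common rule of thumb of roughly four characters per token; it is an estimate, not an exact tokenizer.

```python
def select_chunks(ranked_chunks: list[str], max_tokens: int = 500) -> list[str]:
    """Take the highest-ranked chunks until the token budget is spent.
    Assumes ranked_chunks is already sorted by relevance, best first."""
    selected: list[str] = []
    used = 0
    for chunk in ranked_chunks:
        estimated = len(chunk) // 4  # rough tokens-per-chunk estimate
        if used + estimated > max_tokens:
            break  # stop rather than dilute the context with weaker matches
        selected.append(chunk)
        used += estimated
    return selected
```

Because the loop stops at the first chunk that would exceed the budget, the most relevant material always makes it in and the marginal material is what gets dropped.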

Pitfall 2: Retrieving Too Little

The opposite problem: if retrieval is too narrow, you might miss important context. The AI can only work with what it’s given. The fix: Find the right balance. Test your retrieval to ensure it captures the necessary information without overwhelming the system.

Pitfall 3: Ignoring the Augmentation Step

Some people focus entirely on retrieval or generation and treat augmentation as an afterthought. But how you structure the prompt matters enormously. The fix: Pay attention to your prompt templates. Clear instructions about how to use the context lead to better outputs.

Pitfall 4: Forgetting That AI Can Still Hallucinate

Even with RAG, the AI can make mistakes. It might misinterpret the retrieved context, combine information incorrectly, or fill gaps with plausible-sounding fiction. The fix: RAG reduces hallucinations but doesn’t eliminate them. Critical information should still be verified.
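Verification can be partially automated. The sketch below is one crude post-hoc check, flagging any number in the answer that never appears in the retrieved context; it catches only one narrow class of hallucination and is a cheap sanity check, not a verification system.

```python
import re

def numbers_in(text: str) -> set[str]:
    """Extract all digit sequences from a piece of text."""
    return set(re.findall(r"\d+", text))

def flag_ungrounded_numbers(answer: str, context: str) -> set[str]:
    """Return numbers that appear in the answer but nowhere in the context."""
    return numbers_in(answer) - numbers_in(context)

context = "New employees receive 15 days of paid vacation in their first year."
flag_ungrounded_numbers("You get 15 days.", context)   # empty: grounded
flag_ungrounded_numbers("You get 25 days.", context)   # {"25"}: flag for review
```

Checks like this don’t replace human review of critical answers, but they can route suspicious responses to a reviewer automatically.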

Level Up

Here’s your challenge: Design a RAG workflow on paper. Scenario: You’re creating a RAG-powered assistant for a university library that helps students find and understand research papers. For each of these questions, outline what the three RAG steps would look like:
  1. “What are the main findings of the Smith 2023 paper on climate migration?”
  2. “Find me papers that compare machine learning approaches for medical diagnosis.”
  3. “Explain the methodology used in recent behavioral economics studies.”
Consider:
  • What would be retrieved for each?
  • How would the augmented prompt be structured?
  • What would make the generated response useful?

Key Takeaway

RAG follows a simple three-step process: Retrieve relevant information from a knowledge base, Augment the user’s question with that information, then Generate a response that’s informed by the retrieved context. It’s like giving AI an open-book exam instead of a closed-book one. The system is most effective when all three steps work well together: smart retrieval, clear augmentation, and grounded generation.

What’s Next

You now understand the three-step RAG process. But there’s a piece of technology that makes the retrieval step truly powerful: embeddings. In Lesson 3: Embeddings: How AI Understands Meaning, we’ll explore how AI converts text into numbers that capture meaning, enabling the semantic search that makes RAG so effective. This is the “secret sauce” that lets RAG find relevant information even when the exact words don’t match.