
Lesson 7: The Generation Step

Where Everything Comes Together

You’ve retrieved relevant information. You’ve crafted a well-structured augmented prompt. Now the AI does what it does best: generates a response. But this isn’t just any generation. This is grounded generation, where the AI’s response is anchored to real information you’ve provided. Done well, you get accurate, verifiable answers. Done poorly, the AI might still wander off into hallucination territory. Let’s understand how to make this final step work.

Core Concepts

Grounded vs. Ungrounded Generation

When an AI generates without RAG (ungrounded generation), it’s drawing entirely from patterns learned during training. This can produce fluent, helpful-sounding text, but there’s no guarantee it’s accurate or relevant to your specific situation. Grounded generation is different. The AI has specific context to work with. A well-tuned RAG system produces responses that:
  • Are based on provided source material
  • Can be traced back to specific documents
  • Reflect the actual information in your knowledge base
  • Are more likely to be accurate for your use case
The key word is “grounded.” The response has roots in real information, not just statistical patterns.
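Mechanically, grounding just means the retrieved text becomes part of the prompt before the model generates. A minimal sketch in Python (the function name and prompt wording are illustrative, not any particular library’s API):

```python
def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Assemble an augmented prompt that anchors the model to retrieved text."""
    # Number each chunk so the response can refer back to its sources.
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using ONLY the context below.\n"
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_grounded_prompt(
    "What is the refund period?",
    ["Refunds are accepted within 30 days of purchase."],
)
```

The same question sent without the context block would be ungrounded generation: the model would have to answer from training data alone.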

Faithfulness to Source Material

One measure of RAG quality is faithfulness: does the response accurately reflect what was in the retrieved context? A faithful response:
  • Doesn’t add information that wasn’t in the context
  • Doesn’t contradict the context
  • Represents the context’s meaning accurately
  • Maintains appropriate nuance and uncertainty
An unfaithful response might:
  • Embellish facts with invented details
  • State things more strongly than the source supports
  • Mix retrieved information with hallucinated additions
  • Misrepresent what the source actually said
Getting faithfulness right requires both good prompt instructions (from Lesson 6) and, sometimes, post-generation checks.
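A post-generation check can be as simple as flagging response terms that never appear in the retrieved context. The sketch below is a deliberately crude lexical probe, assuming a production system would use an NLI model or an LLM judge instead; it only surfaces candidates for human review:

```python
import re

def unsupported_terms(response: str, context: str, min_len: int = 4) -> set[str]:
    """Crude faithfulness probe: content words in the response that never
    appear in the retrieved context. Short words are ignored to cut noise."""
    def tokenize(text: str) -> set[str]:
        return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= min_len}
    return tokenize(response) - tokenize(context)

context = "Standard shipping takes 5-7 business days."
faithful = "Shipping takes 5-7 business days."
embellished = "Shipping takes 5-7 business days, or overnight for a fee."
```

Here `unsupported_terms(embellished, context)` would flag "overnight", a detail the context never mentions, while the faithful response produces no flags.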

When the AI Should Say “I Don’t Know”

One of the most valuable things a RAG system can do is admit when it doesn’t have enough information to answer. This is counterintuitive. We often think AI is “better” when it always provides an answer. But consider what’s actually helpful:

Unhelpful response: A confident answer that’s wrong because the knowledge base didn’t contain the relevant information.

Helpful response: “I don’t have information about that topic in my available resources. You might want to check [alternative resource] or contact [relevant person].”

Teaching the AI to recognize and acknowledge gaps is crucial. Your prompt instructions should explicitly guide this behavior:
  • “If the context doesn’t address the question, clearly state that”
  • “It’s better to say you don’t have the information than to guess”
  • “Distinguish between what the context says and what you’re inferring”
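One practical way to apply this is to keep the abstention guidance as a reusable block that gets appended to every system prompt. A hedged sketch (the constant and function names are my own, not a standard API):

```python
# Reusable abstention instructions, mirroring the guidance above.
ABSTENTION_RULES = (
    "- If the context doesn't address the question, clearly state that.\n"
    "- It's better to say you don't have the information than to guess.\n"
    "- Distinguish between what the context says and what you're inferring.\n"
)

def make_system_prompt(task: str) -> str:
    """Prepend the task description to the shared abstention rules."""
    return f"{task}\n\nWhen answering:\n{ABSTENTION_RULES}"

system_prompt = make_system_prompt(
    "You answer questions using only the provided context."
)
```

Centralizing the rules this way keeps abstention behavior consistent across every query rather than relying on each prompt author to remember it.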

Synthesizing Multiple Sources

Often, the answer to a user’s question requires combining information from multiple retrieved chunks. The AI’s job is to synthesize this into a coherent response. Good synthesis:
  • Integrates information smoothly
  • Notes when different sources provide complementary details
  • Acknowledges when sources disagree
  • Creates a unified answer without losing important nuances
Poor synthesis:
  • Just lists information from each source separately
  • Misses connections between related pieces of information
  • Ignores contradictions
  • Forces false consistency when sources actually differ
The generation step is where the AI’s language abilities shine. It can weave together disparate pieces of information into a response that’s more useful than any single source chunk alone.

Citations and Verifiability

Here’s a principle that makes RAG genuinely trustworthy: responses should be verifiable. When the AI cites its sources, users can:
  • Check the original for more detail
  • Verify the AI interpreted the source correctly
  • Understand where the information came from
  • Have appropriate confidence in the answer
Citations can be simple or sophisticated:

Inline references: “According to the Employee Handbook, new employees receive 15 days of vacation.”

End citations: “[Source: Employee Handbook, Section 3.2, Updated January 2024]”

Linked citations: In interfaces that support it, citations can link directly to the source document.

The level of citation detail depends on your use case. Legal research might need precise references. Casual customer support might just need a general source indication.
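If your retrieval step carries source metadata alongside each chunk, rendering an end citation is a small formatting task. A sketch, assuming a hypothetical `Source` record (field names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class Source:
    """Metadata carried alongside a retrieved chunk (hypothetical schema)."""
    title: str
    section: str
    updated: str

def end_citation(src: Source) -> str:
    """Render an end citation in the bracketed style shown above."""
    return f"[Source: {src.title}, Section {src.section}, Updated {src.updated}]"

handbook = Source("Employee Handbook", "3.2", "January 2024")
```

Keeping citations as structured data rather than free text also makes linked citations possible later: the same record can render as a bracketed string or a hyperlink.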

Tone and Format

The generation step also determines how the response is presented. Your prompt instructions (from Lesson 6) shape this, but it’s worth emphasizing that RAG responses should be:

Appropriate to context: A medical information system should be more formal and careful than a casual FAQ bot.

Accessible to the audience: If your users aren’t experts, the response shouldn’t be filled with jargon, even if the source documents are technical.

Properly formatted: Lists, paragraphs, headers, or whatever format best serves the user’s need.

The AI can transform dense technical content into accessible explanations. This is part of the value RAG provides: not just finding information, but presenting it usefully.

Confidence and Hedging

Not all information is equally certain. A good RAG response calibrates its confidence:

High confidence (information clearly stated in context): “The refund period is 30 days from purchase.”

Medium confidence (information implied or requires inference): “Based on the documentation, it appears that refunds are processed within 5-7 business days, though this isn’t explicitly stated.”

Low confidence (limited information, some uncertainty): “The documentation mentions a refund policy but doesn’t specify exact timelines. You may want to contact support for detailed information.”

This calibration helps users know how much to trust each part of the response.
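If your pipeline tracks a confidence level for each answer, the hedging language itself can be applied mechanically. A minimal sketch, assuming the confidence label comes from elsewhere in the system (the prefixes and function name are illustrative):

```python
# Hedging prefixes keyed by confidence level (illustrative wording).
HEDGES = {
    "high": "",
    "medium": "Based on the documentation, it appears that ",
    "low": "The documentation doesn't fully specify this, but ",
}

def hedge(answer: str, confidence: str) -> str:
    """Prefix an answer with hedging language matched to its confidence."""
    prefix = HEDGES[confidence]
    if not prefix:
        return answer
    # Lowercase the first letter so the answer reads as a continuation.
    return prefix + answer[0].lower() + answer[1:]

plain = hedge("The refund period is 30 days.", "high")
hedged = hedge("Refunds are processed within 5-7 business days.", "medium")
```

In practice the confidence label usually comes from the model itself (via prompt instructions) rather than a post-hoc wrapper; this sketch just shows that calibration is a distinct, testable step.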

Try It Yourself

Exercise 1: Evaluate Faithfulness

Here’s a retrieved context:
“Our standard shipping takes 5-7 business days. Express shipping is available for an additional $15 and delivers within 2 business days. Free shipping is available for orders over $50.”
And here’s a generated response to “How long does shipping take?”:
“Standard shipping takes 5-7 business days. If you need it faster, we offer 2-day express shipping for $15 extra. We also offer overnight shipping for $25. Plus, orders over $50 ship free!”
Identify the faithfulness problem in this response. What did the AI add that wasn’t in the context?

Exercise 2: Rewrite for Uncertainty

Given this context:
“The company was founded in 2010 by Jane Smith. The company has grown to over 500 employees.”
The user asks: “How many offices does the company have?” The context doesn’t answer this question. Write two possible responses:
  1. A bad response that makes something up
  2. A good response that appropriately handles the missing information

Exercise 3: Design Citation Formats

For each of these use cases, design an appropriate citation format:
  1. Academic research assistant: Needs precise, formal citations
  2. Customer support bot: Needs to indicate source without being heavy-handed
  3. Internal company knowledge base: Needs to help employees find the original document
What level of detail is appropriate for each?

Common Pitfalls

Pitfall 1: Accepting Fluent but Unfaithful Responses

The AI might generate a beautifully written response that sounds great but adds or distorts information from the context. Fluency isn’t the same as accuracy. The fix: Spot-check responses against the retrieved context. Build evaluation processes that check for faithfulness, not just readability.

Pitfall 2: Never Admitting Uncertainty

If the AI always provides confident answers, some of those answers will be wrong. Users won’t know which to trust. The fix: Explicitly instruct the AI to express appropriate uncertainty. Reward honesty over false confidence.

Pitfall 3: Losing Source Material in Synthesis

When synthesizing multiple chunks, the AI might oversimplify or lose important nuances from individual sources. The fix: For high-stakes applications, consider including key quotes directly from sources, not just the AI’s synthesis.

Pitfall 4: Inconsistent Response Quality

Generation quality can vary based on how well the retrieved context matches the question. Great matches lead to great responses; poor matches lead to awkward attempts. The fix: Monitor response quality systematically. Identify patterns in when generation succeeds or fails, and address root causes (often in retrieval or knowledge base quality).

Level Up

Here’s a challenge that brings together everything from the generation step.

Scenario: A user asks: “What are the key differences between your Basic and Premium plans?”

Retrieved context includes:
  • Chunk 1: “Basic Plan: $10/month, includes up to 5 users, 10GB storage, email support.”
  • Chunk 2: “Premium Plan: $25/month, includes unlimited users, 100GB storage, priority phone support, advanced analytics.”
  • Chunk 3: “All plans include our core features: project management, file sharing, and mobile app access.”

Your task:
  1. Write an ideal generated response that synthesizes this information clearly
  2. Include appropriate source references
  3. Note any information a user might want that ISN’T in the context
  4. Show how the response would handle a follow-up question: “Can I try Premium before buying?”

Key Takeaway

Generation is where RAG delivers value to the user. Grounded generation produces responses that are faithful to source material, appropriately uncertain when information is limited, and verifiable through citations. The AI’s job isn’t just to produce fluent text but to accurately synthesize and present the information that was retrieved. Good generation makes RAG trustworthy; poor generation undermines the entire system.

What’s Next

You now understand the complete RAG pipeline: retrieve, augment, generate. But understanding the mechanics is different from seeing the impact. In Lesson 8: RAG in the Real World, we’ll explore practical applications of RAG across different domains. You’ll see how organizations use RAG to transform customer support, research, knowledge management, and more, including how platforms like Gloo AI Studio make RAG accessible with a focus on safety and values-aligned results.