How Do I Tell AI What I Need It To Do?
So now we know what AI models are and how they’re trained. But how do we actually use them? This section covers how we interact with models through prompts, how generation works behind the scenes, and what affects the quality of a model’s response. At Gloo AI, we believe one of the most empowering things you can learn is how to write clear prompts and understand what influences model behavior.

Prompt
What it means: A prompt is the input or question you give to an AI model. It tells the model what to generate or how to respond.

Examples:
- “Summarize this paragraph.”
- “Write a prayer for a new mother.”
- “What are 5 ways a church could use data in outreach?”
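Most chat-style model APIs accept a prompt as a list of role-tagged messages, which is also where the system prompt described in the next entry fits in. A minimal sketch (the exact field names vary by provider, so treat these as illustrative):

```python
# Illustrative sketch: how a prompt is commonly packaged for a
# chat-style model API. Field names vary by provider.
def build_request(user_prompt, system_prompt=None):
    """Assemble a role-tagged message list for a chat-style model."""
    messages = []
    if system_prompt:
        # The system prompt sets overall behavior before the user speaks.
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return {"messages": messages}

request = build_request(
    "Write a prayer for a new mother.",
    system_prompt="You are a friendly assistant who answers concisely.",
)
```

The model sees the whole message list at once, which is why instructions placed in the system slot can shape every later reply.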
System Prompt
What it means: A system prompt is a special instruction given to the model to set its overall behavior, tone, or rules before it interacts with a user.

Use case: A system prompt might say: “You are a friendly assistant who always answers concisely and avoids strong opinions.” This tells the model how to behave for the entire session.

Note: Users don’t always see the system prompt, but it often shapes how the model responds.

How it shows up in Gloo: Gloo uses carefully designed system prompts to enforce organizational voice, theological alignment, safety expectations, and content boundaries. These system instructions guide model behavior consistently, even when users ask open-ended questions.

Prompt Engineering
What it means: Prompt engineering is the skill of designing prompts to get better, more accurate, or more useful responses from an AI model.

Why it’s useful: Sometimes a vague prompt gives a vague answer. Prompt engineering helps you guide the model clearly by setting expectations in the prompt.

Example: Instead of saying “Explain photosynthesis,” a better prompt might be “Explain photosynthesis in simple terms for a middle school science student.”

How it shows up in Gloo: Prompt engineering helps organizations get the most out of Chat for Teams and our APIs. Clear prompts improve retrieval accuracy, content grounding, and response alignment, especially when working with complex or sensitive ministry topics.

Inference
What it means: Inference is the process the model goes through to generate a response based on your prompt. It happens after training, during actual usage.

Analogy: Training is like studying for an exam. Inference is answering the question during the test.

Why it matters: Inference speed and accuracy affect how useful the model is in real-time settings like chat apps or search tools.

How it shows up in Gloo: Inference happens every time users interact with Gloo AI. The system retrieves relevant content from the Data Engine and generates aligned responses in real time. Fast inference ensures smooth chat experiences and accurate document enrichment.

Tokens
What it means: Tokens are chunks of text the model reads or generates. They can be as small as a character or as large as a word.

Example: The sentence “Hello there!” is about 3 tokens: “Hello,” “there,” and “!”

Why it matters: Models have token limits. Longer prompts or responses use more tokens. Most models process somewhere between 2,000 and 100,000 tokens depending on their size.

How it shows up in Gloo: Token usage impacts request size and model behavior within the Gloo API. When documents are uploaded, they are chunked into token-sized sections for embedding.

Context Window
What it means: The context window is the total number of tokens a model can “remember” at once. It includes both your prompt and the model’s reply.

Analogy: Think of it like a whiteboard. If you write too much, older stuff gets erased to make room. Bigger context windows mean the model can reference more of the conversation.

How it shows up in Gloo: Gloo uses models with large context windows to allow better grounding in an organization’s uploaded content. This improves RAG accuracy, ensures longer documents can be analyzed, and helps maintain conversation continuity in Chat for Teams.

Temperature
What it means: Temperature is a setting that controls how creative or focused a model’s response is.

How it works:
- Lower temperature (like 0.2): More focused and predictable answers
- Higher temperature (like 0.9): More random and creative responses
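Under the hood, temperature rescales the model’s raw scores (logits) before they are turned into probabilities. A minimal sketch, using toy scores rather than output from a real model:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # toy scores for three candidate words

low = softmax_with_temperature(logits, 0.2)   # focused: top word dominates
high = softmax_with_temperature(logits, 0.9)  # softer: more randomness
```

At low temperature the gap between scores is amplified, so the most likely word wins almost every time; at high temperature the distribution flattens and less likely words get real chances.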
Top-k / Top-p Sampling
What it means: These are technical settings that influence which words the model chooses from when generating text. They help balance randomness and coherence.

Quick comparison:
- Top-k looks at the top K likely words and picks from them
- Top-p looks at the smallest set of words whose probabilities add up to P (like 90 percent), then chooses from those
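Both strategies trim the candidate list before the model samples a word. A rough sketch, assuming a toy word-to-probability table rather than real model output:

```python
def top_k_filter(probs, k):
    """Keep only the k most likely words."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])

def top_p_filter(probs, p):
    """Keep the smallest set of top words whose probabilities reach p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = {}, 0.0
    for word, prob in ranked:
        kept[word] = prob
        total += prob
        if total >= p:
            break
    return kept

# Toy distribution over four candidate next words
probs = {"faith": 0.5, "hope": 0.3, "love": 0.15, "joy": 0.05}

top_k_filter(probs, 2)    # keeps "faith" and "hope"
top_p_filter(probs, 0.9)  # keeps "faith", "hope", "love" (0.5 + 0.3 + 0.15 = 0.95 >= 0.9)
```

Notice that top-k always keeps a fixed number of words, while top-p keeps more words when the distribution is flat and fewer when one word dominates.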
Stop Sequence
What it means: A stop sequence is a specific signal that tells the model to stop generating text.

Use case: You might set “###” as a stop sequence so the model stops there and doesn’t keep talking after a section.

How it shows up in Gloo: Gloo uses stop sequences behind the scenes to structure outputs, enforce formatting rules, and keep responses within safe boundaries. This helps ensure that generated content does not run on too long or produce unintended sections.

Next Up: How AI Uses and Stores Knowledge
In the next section, we’ll answer: “What are vectors, embeddings, and retrieval systems, and how do they help AI remember or reason?”

