A comprehensive reference for business and product teams navigating the world of artificial intelligence. From foundational concepts to agent architectures, voice AI and business metrics: every term you need to understand and deploy AI in your organization.
1. Fundamental concepts
Artificial General Intelligence (AGI) AGI refers to hypothetical AI systems that can perform a wide range of cognitive tasks at least as well as an average human, across many domains, not just one. It usually implies broad reasoning, adaptation and autonomy, rather than being limited to a single use case like translation or image recognition.
Artificial Intelligence (AI) AI is the broad field of building computer systems that can perform tasks which typically require human intelligence, such as understanding language, recognizing patterns, making predictions or decisions. In business, AI usually refers to applied machine learning models embedded into products, workflows or services.
Machine Learning (ML) Machine learning is a subset of AI where models learn patterns from data rather than following manually coded rules. Instead of if/then logic, ML systems adjust their internal parameters to improve performance on tasks like classification, prediction or recommendation.
Deep Learning Deep learning is a subset of machine learning that uses multi-layer neural networks to learn complex patterns from large volumes of data. These models automatically discover useful features (e.g. shapes in images, patterns in audio, structures in text) instead of relying on hand-crafted rules. Deep learning powers most modern speech recognition, image generation and large language models.
Neural Network A neural network is a layered mathematical structure inspired (loosely) by the brain, made of interconnected "neurons" that transform input data step by step. Each connection has a weight that determines how strongly one unit influences another; training adjusts these weights so the network produces better outputs over time.
2. Large models, tokens and training
Large Language Model (LLM) A large language model is a deep neural network trained on vast amounts of text to predict the next token in a sequence. In practice, LLMs can chat, summarize, translate, write code and act as the reasoning core of many AI products. "GPT", "Claude", "Gemini", "Llama" or "Mistral" are LLM families; "ChatGPT", "Copilot" or "Le Chat" are assistant products built on top of them.
Tokens Tokens are the basic units of text a model processes, such as words, sub-words or characters. Billing, context limits and usage metrics are usually expressed in tokens (not characters). In practice, about 100 tokens correspond to roughly 75 words of English text.
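As a rough illustration, a common back-of-envelope heuristic for English text is about four characters per token. This is an assumption for estimation only; real tokenizers vary by model and language, and the helper name here is hypothetical:

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token for English text.
    Real tokenizers (BPE, SentencePiece, etc.) give exact counts."""
    return max(1, round(len(text) / 4))

prompt = "Summarize the attached quarterly report in three bullet points."
print(estimate_tokens(prompt), "tokens (approx.)")
```

Estimates like this are useful for quick budget planning; for billing-accurate counts, use the tokenizer published by your model provider.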
Weights Weights are the numerical parameters inside a model that define how much importance is given to different features of the input. During training, the learning algorithm iteratively adjusts these weights to reduce the gap between the model's prediction and the desired output.
Training Training is the process of teaching a model to perform a task by exposing it to data and adjusting its weights to reduce errors. It is compute-intensive, often requires very large datasets, and is typically done once or a few times at scale before deployment.
Inference Inference is the process of running a trained model to generate outputs (predictions, text, audio, decisions) for new inputs. From a business perspective, training is a fixed, capital-intensive cost, while inference is the recurring cost that grows with usage.
Compute "Compute" refers to the computational power required to train and run AI models, usually provided by GPUs, TPUs or specialized accelerators. For modern LLMs, compute capacity is often the main bottleneck driving costs, latency and scalability.
Memory cache / KV caching Caching reuses intermediate computations so the model does not need to recompute everything at each step. In transformer models, key-value (KV) caching stores past tokens' representations so generating the next tokens becomes faster and cheaper. This is a key optimization for real-time and high-traffic applications.
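The saving can be made concrete with a toy sketch (not a real transformer, just an illustration of the reuse pattern): without a cache, every generation step recomputes a representation for every prefix token; with a cache, each token's representation is computed once and reused.

```python
# Counter for how many times the "expensive" computation runs.
calls = {"n": 0}

def kv(token):
    """Stand-in for the expensive per-token key/value computation."""
    calls["n"] += 1
    return (token, len(token))  # dummy "keys/values"

def run_without_cache(tokens):
    calls["n"] = 0
    for step in range(1, len(tokens) + 1):
        for t in tokens[:step]:          # all prefix tokens recomputed each step
            kv(t)
    return calls["n"]

def run_with_cache(tokens):
    calls["n"] = 0
    cache = {}
    for step in range(1, len(tokens) + 1):
        for i, t in enumerate(tokens[:step]):
            if i not in cache:           # computed once, then reused
                cache[i] = kv(t)
    return calls["n"]

sentence = ["The", "cat", "sat", "on", "the", "mat"]
print(run_without_cache(sentence), "vs", run_with_cache(sentence))  # 21 vs 6
```

For a sequence of n tokens, the uncached version does on the order of n²/2 computations while the cached one does n, which is why KV caching matters so much for long, real-time conversations.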
Context window The context window is the maximum amount of text (tokens) the model can consider in a single request, including prompt, system instructions and previous messages. A larger context window allows the model to keep longer conversations, process longer documents or handle more complex multi-step tasks.
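A common engineering consequence: when a conversation outgrows the context window, the application must trim it, typically keeping the system prompt plus the most recent messages. A minimal sketch, using a chars/4 token estimate (the function and heuristic are illustrative assumptions, not a provider API):

```python
def fit_to_context(system: str, messages: list[str], budget_tokens: int) -> list[str]:
    """Keep the system prompt plus as many *recent* messages as fit the budget.
    Token counts are estimated with a rough chars/4 heuristic."""
    est = lambda s: max(1, len(s) // 4)
    kept, used = [], est(system)
    for msg in reversed(messages):       # walk from newest to oldest
        cost = est(msg)
        if used + cost > budget_tokens:
            break                        # oldest messages are dropped
        kept.append(msg)
        used += cost
    return [system] + list(reversed(kept))
```

Real systems often summarize dropped history instead of discarding it outright, but the budget logic is the same.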
Cost per token Cost per token is the unit price AI providers use to bill model usage. It often differs for input tokens and output tokens, and may also differ for "reasoning tokens" in advanced models. Optimizing prompts, responses and caching strategies can significantly reduce total cost per token for a product.
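The arithmetic is simple but worth making explicit, since input and output prices usually differ. The prices below are hypothetical, chosen only to illustrate the calculation:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 price_in_per_m: float, price_out_per_m: float) -> float:
    """Cost of one request given per-million-token prices."""
    return (input_tokens * price_in_per_m
            + output_tokens * price_out_per_m) / 1_000_000

# Hypothetical prices: $3 per 1M input tokens, $15 per 1M output tokens.
cost = request_cost(input_tokens=2_000, output_tokens=500,
                    price_in_per_m=3.0, price_out_per_m=15.0)
print(f"${cost:.4f} per request")  # $0.0135
```

At scale the asymmetry matters: here the 500 output tokens cost more than the 2,000 input tokens, which is why trimming verbose responses often saves more than trimming prompts.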
3. Learning techniques and optimization
Fine-tuning Fine-tuning means continuing the training of an existing model on a more specific dataset to adapt it to a domain or task (e.g. customer support for veterinarians, internal knowledge of a company). It typically improves accuracy and tone in that niche while reusing all the general capabilities learned before.
Transfer learning Transfer learning uses a model trained on one task as the starting point for another, related task, reusing learned representations instead of training from scratch. Fine-tuning is a common form of transfer learning applied to large foundation models.
Distillation Distillation is a "teacher-student" technique where a smaller model learns to imitate a larger one by training on the larger model's outputs. The goal is to keep most of the quality while reducing size, latency and cost, which is crucial for edge devices or high-volume inference.
4. Generative AI, diffusion and GANs
Generative AI (GenAI) Generative AI refers to models that can create new content: text, images, audio, video, code or 3D assets. They do not just classify inputs; they produce original outputs that follow patterns learned from training data.
Diffusion model A diffusion model gradually adds noise to training data and then learns to reverse this process, denoising random noise back into coherent images, audio or other media. This "reverse diffusion" is behind many state-of-the-art image and video generation systems.
GAN (Generative Adversarial Network) A GAN uses two neural networks: a generator that produces synthetic data and a discriminator that tries to distinguish real from fake. By competing, both networks improve, leading to highly realistic images, videos or audio. GANs are widely used for deepfakes and realistic media synthesis.
Hallucination A hallucination occurs when a model produces confident but incorrect or fabricated information. Hallucinations are inherent to current LLMs, especially on topics not well covered in their training or when prompts are ambiguous. Product teams mitigate them via retrieval (RAG), constraints and better evaluation.
5. Embeddings, RAG and vector databases
Embedding An embedding is a numerical vector representation of text, audio, images or other data that captures semantic meaning. Similar content ends up with similar vectors. Embeddings are the backbone of semantic search, recommendation, clustering and retrieval-augmented generation.
Vector database A vector database is optimized to store and search embeddings efficiently using similarity metrics (e.g. cosine similarity). It enables fast "find me the most similar documents" queries, which are essential for RAG systems, recommendation engines and personalization.
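The core operation behind both entries above can be shown in a few lines: represent items as vectors, then rank them by cosine similarity. The three-dimensional vectors below are toy stand-ins; real embeddings have hundreds or thousands of dimensions and come from an embedding model.

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy "embeddings" of three help-center documents.
docs = {
    "refund policy":  [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.8, 0.2],
    "office address": [0.0, 0.2, 0.9],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of "how do I get my money back?"

best = max(docs, key=lambda name: cosine(query, docs[name]))
print(best)  # refund policy
```

A vector database does exactly this ranking, but with approximate-nearest-neighbor indexes so it stays fast over millions of vectors.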
RAG (Retrieval-Augmented Generation) Retrieval-augmented generation combines a generative model with an external knowledge base. Before answering, the system retrieves relevant documents (using embeddings and a vector database) and feeds them into the model so it can ground its answer in up-to-date or private data. RAG is critical for enterprise use cases where accuracy and freshness matter.
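The retrieve-then-generate loop can be sketched end to end. Here the corpus vectors and the prompt template are illustrative; in production the vectors come from an embedding model, retrieval runs against a vector database, and the prompt is sent to an LLM API.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a))
                  * math.sqrt(sum(y * y for y in b)))

def retrieve(query_vec, corpus, k=2):
    """corpus: list of (text, vector) pairs; returns the k most similar texts."""
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, snippets):
    """Ground the model's answer in the retrieved snippets."""
    context = "\n".join(f"- {s}" for s in snippets)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The key design point: the model never needs to have memorized the documents; freshness and accuracy come from what the retrieval step puts into the prompt.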
6. Agents, tools and automation
AI Agent An AI agent is a system that not only generates text but can plan, decide and take actions to achieve a goal, often across multiple steps and tools. Unlike a simple chatbot, an agent can call APIs, interact with databases, update CRMs, schedule meetings or launch workflows autonomously under constraints defined by the business.
Agentic AI Agentic AI refers to architectures that give models structured autonomy: the ability to set sub-goals, choose tools, monitor progress and adapt plans in real time. In practice, this means AI systems that behave more like digital coworkers than static chat interfaces, operating inside defined safety and compliance boundaries.
Tool use / Function calling Tool use is the ability of a model to call external functions (APIs, internal services, databases) from within a conversation. The model decides when and how to call a tool (e.g. "create_lead", "book_meeting", "check_inventory"), receives the result, and then continues the interaction with up-to-date information or completed actions.
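On the application side, the pattern usually looks like this: the model emits a structured call (typically JSON with a tool name and arguments), and the application dispatches it to real code. The tool names and registry below are hypothetical; a real system would map them to CRM, calendar or inventory APIs.

```python
import json

# Hypothetical tool registry; keys match the names exposed to the model.
TOOLS = {
    "check_inventory": lambda sku: {"sku": sku, "in_stock": sku == "A-100"},
    "create_lead":     lambda name, email: {"lead_id": 42, "name": name},
}

def handle_tool_call(raw_call: str):
    """Execute a model-emitted call like
    {"name": "check_inventory", "arguments": {"sku": "A-100"}}
    and return the result, which is then fed back to the model."""
    call = json.loads(raw_call)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

result = handle_tool_call(
    '{"name": "check_inventory", "arguments": {"sku": "A-100"}}')
print(result)  # {'sku': 'A-100', 'in_stock': True}
```

Production systems add validation, permissions and error handling around this dispatch step, since the model's arguments are untrusted input.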
Multi-agent system A multi-agent system orchestrates several specialized AI agents that collaborate to complete a process. For example, one agent qualifies a lead, another negotiates a meeting time, and a third handles post-call follow-up and CRM updates. This mirrors human teams and is powerful for complex workflows.
Workflow automation (AI) AI-driven workflow automation uses models and agents to execute end-to-end business processes: from inbound lead capture to qualification, appointment scheduling, routing to the right team and CRM enrichment. Instead of just answering questions, the system actually progresses the business process to completion.
Benchmark (AI models) Benchmarking AI models means systematically comparing them on standard tasks (e.g. reasoning, coding, multilinguality, latency, cost) to choose the best fit for a given product. In practice, teams benchmark models from OpenAI, Anthropic, Google, Mistral and others on their real business use cases, not only on public leaderboards.
7. Conversational and voice AI
Conversational AI Conversational AI covers systems that can understand and generate natural language in an interactive way across channels (web chat, WhatsApp, voice, email). Modern conversational AI goes beyond scripted chatbots, using LLMs, memory and tool use to provide more fluid, context-aware experiences.
Chatbot (modern) A chatbot is a conversational interface that interacts with users via text or messaging apps. Modern chatbots can rely on LLMs, RAG and tools to answer questions, guide users and trigger actions. Compared to an AI agent, a chatbot is often more constrained to Q&A and guided flows, with less autonomy over external systems.
Voicebot A voicebot is an AI agent that talks with users over the phone or other voice channels in real time. It handles speech recognition, language understanding, reasoning and speech synthesis under tight latency constraints, enabling use cases like inbound call routing, appointment booking or support triage without a human operator.
Speech-to-Text (STT) Speech-to-text converts spoken audio into written text. It is the first step in most voice AI systems and must be accurate, fast and robust to noise, accents and domain-specific vocabulary.
Text-to-Speech (TTS) Text-to-speech converts written text into natural-sounding audio. Modern TTS can generate expressive, low-latency voices that feel close to human speech, which is critical for customer experience in voicebots and virtual agents.
Latency Latency is the time it takes for a system to respond to a user action. In voice AI, latency must be very low (often under a few hundred milliseconds) to keep conversations natural and avoid people talking over the bot or abandoning the call.
8. Product, business and safety
Prompt engineering Prompt engineering is the practice of designing and structuring inputs to a model to obtain better, more reliable outputs. It includes instructions, examples, constraints, roles and formatting, and is a key lever for improving quality without changing the underlying model.
Lead qualification (AI) AI-driven lead qualification uses models and agents to assess how valuable an inbound prospect is based on their answers, behavior and context. It can score leads, ask follow-up questions, and decide whether to route them to sales, propose a meeting or handle them via self-service.
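A deliberately simplified sketch of the scoring-and-routing step. Real systems combine LLM judgments of free-text answers with behavioral signals; the field names, weights and thresholds here are purely illustrative assumptions.

```python
def score_lead(answers: dict) -> int:
    """Toy rules-based lead scorer (0-100)."""
    score = 0
    if answers.get("budget_eur", 0) >= 10_000:
        score += 40                       # budget signal
    if answers.get("timeline_months", 99) <= 3:
        score += 30                       # urgency signal
    if answers.get("decision_maker"):
        score += 30                       # authority signal
    return score

def route(score: int) -> str:
    """Decide the next step based on the score."""
    if score >= 70:
        return "sales"
    if score >= 40:
        return "nurture"
    return "self-service"
```

In an agent setup, the interesting part is upstream: the agent asks the follow-up questions needed to fill in `answers` before this routing decision is made.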
AI scheduling AI scheduling refers to agents that automatically find and book meeting slots across calendars, time zones and constraints. Combined with conversational AI, it allows leads or customers to confirm appointments directly in chat or over the phone without human intervention.
Omnichannel AI Omnichannel AI delivers consistent, connected experiences across multiple channels (phone, chat, email, social messaging). The same underlying agent or knowledge base can follow the user from one channel to another, preserving context and history.
AI safety AI safety focuses on preventing AI systems from causing harm, intentionally or unintentionally. It covers topics like misuse prevention, robustness against attacks, bias reduction, and ensuring systems behave within acceptable norms, especially when they are autonomous or high-impact.
Alignment Alignment refers to making AI systems' behavior match human values, goals and constraints. In practice, this means training and governing models so they follow policy, respect regulation, and act in ways that serve users and organizations, not just optimize a technical objective.
RAMageddon "RAMageddon" is an informal term describing the global memory chip shortage driven in part by the AI boom. Large-scale training and inference require massive amounts of RAM, which also affects availability and prices for gaming, consumer electronics and traditional IT.