What Is RAG (Retrieval-Augmented Generation)?

Osman Yılmaz
3 min read · Dec 17, 2024

Nowadays, everyone is talking about RAG. I want to break it down simply — what RAG is and how we can use it effectively.

Retrieval-Augmented Generation (RAG) makes AI smarter and more useful by combining the ability to search for information with the power to generate natural, helpful responses. It’s a simple idea that’s changing how we interact with technology.

RAG was first introduced in a 2020 paper co-authored by Patrick Lewis, now director of Machine Learning at enterprise AI startup Cohere.

RAG improves the answers given by large language models (LLMs). It does this by adding relevant, up-to-date information to the model’s input at the moment a question is asked, without needing to change the original AI model itself. This extra information can be very specific to a company, an industry, or even the latest news, making the AI’s answers more accurate and useful.

Imagine an AI model as a student taking an exam. RAG helps this AI student by letting it access textbooks (like company documents or news articles) during the exam. This helps the AI give more accurate answers, make fewer mistakes, and stay up to date with the latest information.

Many big tech companies like Microsoft, Google, Amazon, and Nvidia are now using RAG in their AI products.

Because the retriever narrows the search to a few relevant passages before the model answers, RAG can work over a very large amount of data, even millions of documents.

Components of RAG Architecture

  1. Input (Question or Prompt)
    This is where the user asks a question or provides a statement for the AI to respond to.
  2. Retriever
  • The retriever searches a large database (or “knowledge base”) to find relevant pieces of information.
  • These could be documents, snippets, or facts related to the user’s question.
  • Common retrievers include:
      • Dense Retrieval: Uses neural network embeddings to find passages that are close in meaning to the question, even when the wording differs.
      • Sparse Retrieval: Uses traditional keyword-based methods like TF-IDF or BM25.
  3. Generator (LLM)
  • The generator (usually a large language model like GPT or Gemini) combines the retrieved information with the user’s question.
  • It generates a response based on both the question and the extra facts provided by the retriever.
  4. Output (Response)
    The final output is a coherent and accurate answer that reflects both the user’s input and the retrieved knowledge.
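
To make the retriever step a bit more concrete, here is a minimal sketch of sparse retrieval using a simplified TF-IDF score, written in plain Python. The function names (`tokenize`, `build_index`, `retrieve`), the exact scoring formula, and the three-sentence knowledge base are all assumptions made up for this illustration; a real system would more likely rely on a search library or a BM25 implementation.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Count terms per document and how many documents contain each term."""
    doc_tokens = [Counter(tokenize(d)) for d in documents]
    doc_freq = Counter()
    for counts in doc_tokens:
        doc_freq.update(counts.keys())
    return doc_tokens, doc_freq

def retrieve(query, documents, top_k=2):
    """Return up to top_k documents ranked by a simplified TF-IDF score."""
    doc_tokens, doc_freq = build_index(documents)
    n_docs = len(documents)
    query_terms = set(tokenize(query))
    scores = []
    for i, counts in enumerate(doc_tokens):
        score = 0.0
        for term in query_terms:
            if term in counts:
                # Smoothed inverse document frequency: rarer terms weigh more.
                idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
                score += counts[term] * idf
        scores.append((score, i))
    # Highest score first; ties keep the original document order.
    scores.sort(key=lambda s: (-s[0], s[1]))
    # Documents that share no term with the query are dropped.
    return [documents[i] for score, i in scores[:top_k] if score > 0]

# Hypothetical knowledge base, invented for this example.
KNOWLEDGE_BASE = [
    "The warranty covers manufacturing defects for two years.",
    "Shipping takes three to five business days.",
    "Accidental damage is excluded from the warranty.",
]

results = retrieve("What does the warranty cover?", KNOWLEDGE_BASE)
print(results)
```

In this toy run the two warranty sentences are returned and the shipping sentence is not, because it shares no words with the question. A dense retriever would keep the same overall shape but replace the keyword score with a similarity between embedding vectors produced by a neural model.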

You might ask: “Is RAG the same as generative AI?”

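No, it’s not. Generative AI is the broader category of models that produce text and other content. RAG is a technique applied on top of such a model: it gives more accurate answers by using external knowledge, not just the information already inside the AI model.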
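
One way to see the difference is to look at what the model actually receives. The sketch below builds a plain prompt and an augmented prompt for the same question. Everything in it is an assumption for illustration: `build_prompt`, `answer`, and `fake_generate` are hypothetical names, the prompt wording is just one possible template, and `fake_generate` is a stand-in so the example runs without calling any real LLM.

```python
def build_prompt(question, passages):
    """Combine retrieved passages and the user's question into one prompt."""
    if not passages:
        # With no retrieved context, this is ordinary generation.
        return f"Question: {question}\nAnswer:"
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )

def answer(question, passages, generate):
    """Send the augmented prompt to a generator.

    `generate` is a placeholder for any function that maps a prompt string
    to a completion string, such as a wrapper around an LLM API.
    """
    return generate(build_prompt(question, passages))

def fake_generate(prompt):
    """Stand-in generator that only reports the prompt size."""
    return f"(model output for a prompt of {len(prompt.splitlines())} lines)"

# Hypothetical retrieved passage, invented for this example.
passages = ["The warranty covers manufacturing defects for two years."]

plain = build_prompt("How long is the warranty?", [])
augmented = build_prompt("How long is the warranty?", passages)

print(plain)
print("---")
print(augmented)
print(answer("How long is the warranty?", passages, fake_generate))
```

The model itself is unchanged in both cases. Only the prompt differs, which is why RAG can add fresh or private knowledge without retraining.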

Some Examples of Retrieval-Augmented Generation

Example 1: Customer Support Systems

Scenario: A customer asks for the warranty terms of a product.
RAG in Action: The system retrieves the latest warranty policy document or FAQ section and generates a concise, customized explanation based on the document.
Example Output:
“Your product has a 2-year warranty that covers manufacturing defects but excludes accidental damage. For more details, see the full terms here…”

Example 2: E-commerce Search and Recommendation

Scenario: A user searches for “best laptops under $1,000.”
RAG in Action: The system retrieves product reviews, specifications, and ratings from various e-commerce sites and generates a summary.
Example Output:
“The top laptops under $1,000 include the Dell XPS 13, MacBook Air M1, and Lenovo Yoga Slim 7. Each offers strong performance and long battery life...”
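
For a query like this one, the retrieval step may also need to respect the price limit, not just match keywords. The following is a rough sketch of that idea, assuming a small in-memory catalog. The catalog entries, prices, ratings, and the helper names (`parse_price_cap`, `retrieve_products`, `format_context`) are all hypothetical and invented for illustration; they do not describe real products or any particular e-commerce API.

```python
import re

# Hypothetical catalog entries, invented for illustration only.
CATALOG = [
    {"name": "Laptop A", "price": 899.0, "rating": 4.6, "review": "Long battery life."},
    {"name": "Laptop B", "price": 1299.0, "rating": 4.8, "review": "Very fast, but pricey."},
    {"name": "Laptop C", "price": 749.0, "rating": 4.2, "review": "Solid for everyday work."},
]

def parse_price_cap(query):
    """Extract a price ceiling from phrases like 'under $1,000'; None if absent."""
    match = re.search(r"under \$?([\d,]+(?:\.\d+)?)", query.lower())
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))

def retrieve_products(query, catalog, top_k=2):
    """Filter by the price cap (if any), then rank by rating, highest first."""
    cap = parse_price_cap(query)
    candidates = [p for p in catalog if cap is None or p["price"] < cap]
    return sorted(candidates, key=lambda p: p["rating"], reverse=True)[:top_k]

def format_context(products):
    """Turn retrieved products into context lines for the generator's prompt."""
    return "\n".join(
        f"{p['name']} (${p['price']:.0f}, rated {p['rating']}): {p['review']}"
        for p in products
    )

hits = retrieve_products("best laptops under $1,000", CATALOG)
print(format_context(hits))
```

The formatted lines would then be passed to the generator as context, so the summary it writes only mentions products that actually fit the budget.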
