Inside the RAG Engine: How AI Finds and Tells the Right Story 📚🔍
Hello Hello Hello! 👋

Welcome everyone! Previously, we discussed why RAG is such a game-changer for AI. You can check that out here -> Retrieval-Augmented Generation (RAG): Not Just a Cloth, But a Game-Changer in AI! 🧠✨

Now, let’s understand the components that make RAG truly what it is. Think of each component as a member of the Avengers—every component has a special superpower that, when combined, can save the day! 🦸🦸

We will take an example of a Basic RAG System:

[Diagram: Basic RAG System. Source: My own writing pad!]

Cool, right? 😎 Chillax, I’ll explain each component in detail! 💡

Step 1: User Query & Embedding 🕵️

Beginning the Journey: Everything starts with a user query. This could be anything from "What's the capital of Tripura?" to "How to make Kadhai Paneer?"

Transforming Words into Numbers: The query is transformed into a vector embedding—a numerical representation that captures the essence of the query. Think of this as translating a sentence into a secret language only the AI understands.

📌 Notice how the user's query converts into an embedding vector [0.2, 0.7, 0.8, 0.9]. This vector is key to finding relevant information.
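To make the "words into numbers" idea concrete, here's a tiny sketch. Real systems use a trained embedding model (sentence-transformers is a popular choice); this hash-based toy only illustrates the shape of the transformation, not real semantic similarity.

```python
# Toy sketch: turning a query into a fixed-size vector.
# A trained model would place similar sentences near each other;
# this hash version just shows "text in, list of floats out".
import hashlib

def toy_embed(text: str, dims: int = 4) -> list[float]:
    """Map text to a deterministic vector of `dims` floats in [0, 1]."""
    vec = []
    for i in range(dims):
        digest = hashlib.sha256(f"{i}:{text.lower()}".encode()).digest()
        # Use the first 4 bytes of each per-dimension hash as a pseudo-feature.
        vec.append(int.from_bytes(digest[:4], "big") / 2**32)
    return vec

query = "What's the capital of Tripura?"
print(toy_embed(query))  # four floats, deterministic for the same query
```

The key property to notice: the same text always maps to the same vector, and the vector has a fixed size regardless of how long the text is.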

Step 2: Chunking & Embedding External Data 📚

Preparing the External Data: Before the query even comes in, the system prepares by chunking external data into manageable pieces. This could be documents, web pages, or any textual content.

Embedding the External Data: Each chunk is then turned into its own embedding, just like the query embedding, and stored in a Vector Database. One important thing to consider here -> the embedding model used for the chunks should be the same one used for the query.

💭 Now, one doubt you might have is, “Why do we even need Chunking?” Can’t we just use the whole document as-is? 🤔

Here’s the deal:

1. Documents can vary in length; managing large documents would be challenging.

2. Chunking improves contextual relevance, ensuring the model focuses on specific, relevant parts.

3. LLMs have context window constraints—so we need to work within those limits.

We will talk more about chunking in detail in the upcoming articles.

📌 You can see in the image how documents are processed into embeddings and stored—ready to retrieve the moment a query matches their content.
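The chunking step above can be sketched in a few lines. The chunk size and overlap values here are illustrative choices (there is no single right number); the overlap just keeps context from being cut mid-thought at chunk boundaries.

```python
# Sketch: splitting a document into overlapping character chunks
# before each chunk is embedded and stored in the Vector DB.
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split `text` into ~chunk_size-character chunks sharing `overlap` chars."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the document
    return chunks

doc = "India has 28 states. " * 30  # stand-in for a long document
pieces = chunk_text(doc)
print(len(pieces), "chunks; first chunk starts:", pieces[0][:30])
```

Character-based splitting is the simplest strategy; smarter splitters break on sentences or paragraphs instead, which we'll get to in the chunking article.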

Step 3: Query Matching 🔍

The Matchmaker: This is where the magic happens. ✨ The query embedding is compared against all document embeddings in the Vector Database to find the best matches based on similarity.

Fetching Relevant Text: The system then selects the top-k best matches (k can be any number). These matches are handed to the LLM as added/specialised knowledge to frame an answer, ensuring the response is based on accurate and relevant information.

👉 In simple terms -> It looks at the user query (example: “What is the capital of Tripura?”) and decides which documents (or chunks of data) might contain relevant info (example: data containing information about the state capitals of India). Then it feeds those documents back to the LLM.

💡 Cool Tip: We’ll explore the HNSW algorithm in a future article—it’s an amazing method for doing query matching in the Vector DB efficiently. 🛠️
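Here's a minimal sketch of the matchmaking step: score every stored chunk against the query vector with cosine similarity and keep the top-k. The chunk texts and vectors below are made up for illustration (a real Vector DB would also use an index like HNSW rather than scanning everything).

```python
# Sketch: brute-force top-k retrieval by cosine similarity.
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 for identical directions, ~0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, store, k=2):
    """store: list of (chunk_text, vector). Return the k most similar chunks."""
    scored = [(cosine(query_vec, vec), text) for text, vec in store]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]

store = [
    ("Agartala is the capital of Tripura.",      [0.2, 0.7, 0.8, 0.9]),
    ("Kadhai Paneer needs paneer and capsicum.", [0.9, 0.1, 0.2, 0.1]),
    ("Dispur is the capital of Assam.",          [0.3, 0.6, 0.7, 0.8]),
]
query_vec = [0.2, 0.7, 0.8, 0.9]  # the example vector from Step 1
print(top_k(query_vec, store))
```

Notice the Kadhai Paneer chunk scores low and drops out, while the two capital-city chunks survive into the top-2: that filtered set is what gets passed to the LLM.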

Step 4: LLM Processing 🧠

The Storyteller: Finally, we have the LLM, the brilliant narrator that takes the curated information from the retriever and crafts a coherent, context-aware answer, turning separate pieces of data into a helpful explanation.

📌 Fun Fact: You can use smaller LLMs instead of massive ones to save costs, especially when you have a solid retrieval setup. 💸

Here's a small example:

User Query:

"What are some natural ways to manage anxiety during pregnancy?"

Retrieved Text:

"Natural methods for managing anxiety during pregnancy include practicing yoga, meditation, and maintaining a balanced diet. Regular physical activity, such as walking or swimming, can also alleviate stress. It's important for expectant mothers to have regular sleep patterns and engage in social activities or support groups that can provide emotional support."

LLM Generated Response:

"Yoga and meditation are highly beneficial for managing anxiety during pregnancy as they enhance relaxation and mental balance. Including foods rich in omega-3 fatty acids and ensuring consistent sleep can also significantly reduce stress levels. Engaging in gentle exercises like walking can further help maintain emotional well-being."

Wrapping It Up 🎁

I hope it wasn’t boring, and you guys had fun reading it! 😄 In the upcoming articles, we'll uncover more about chunking strategies, vector databases, and much more.

Till then, Happy Learning! Cheers! 🙌
