Hands-On: How to Build a LangGraph Retrieval Agent (Step-by-Step Tutorial)

I’ve been fascinated by the ever-growing capabilities of Large Language Models (LLMs), especially when it comes to connecting them with external information sources. Over the last few weeks, I’ve been experimenting with a library called LangGraph and its unique way of orchestrating LLMs, tools, and states in a graph-like workflow. Today, I want to share a detailed, hands-on tutorial on how to implement a LangGraph Retrieval Agent from scratch.

If you’ve heard about RAG (Retrieval Augmented Generation), you know how it can significantly enhance your LLM-based applications by injecting relevant external content into the conversation. Below, I’ll walk you through the entire code and thought process. By the end of this article, you’ll have a working example you can adapt to your own use cases.


1. Introduction

Working with Large Language Models (LLMs) often means you need to pull in external knowledge. We achieve this by retrieving relevant data from various sources (like blog posts, PDFs, or databases) before generating a response.

Not long ago, building advanced AI agents with sophisticated retrieval and decision-making logic felt like something only huge tech companies could do. But thanks to open-source libraries such as LangChain and LangGraph, it’s more accessible than ever.

In this tutorial, I’m focusing on LangGraph because it lets us define each step in our AI pipeline as a node in a graph, clearly specifying how data flows and under what conditions each node gets triggered. This approach is intuitive, modular, and highly flexible, something that’s crucial if you want to scale or customize your retrieval pipeline later.

This tutorial shows you how to build a LangGraph Retrieval Agent with just two files:

  • retriever.ts – Creates a system that loads, splits, and embeds Lilian Weng’s blog posts into vector representations for easy searching.
  • index.ts – Sets up the state graph (with nodes and edges) that orchestrates user queries, calls the retrieval tool, checks relevance, and produces the final answer with a Google Generative AI model.

By the end, you’ll have a complete “agentic” workflow that can automatically fetch and incorporate external information into its responses.


2. Why LangGraph?

Before we dive into the actual code, let me highlight a few reasons why you might choose LangGraph for your retrieval agent:

  1. Graph-Based Workflows: It provides a stateful graph approach, where each node (function) can modify the state, and edges define the logic of your agent’s decision-making.
  2. Customizable: You can easily add or remove nodes and tools, define your own prompt templates, or integrate new embeddings or vector stores without rewriting your entire setup.
  3. Clear Abstraction: States, edges, and nodes make it straightforward to visualize how your data and decisions move through the system.

If you’ve been using LangChain alone, LangGraph’s approach can be a bit of a mental shift. However, I find it clarifies the agent’s control flow and makes debugging significantly easier.


3. Project Setup

  • Install dependencies (make sure you have Node.js and npm):

npm init
npm install dotenv @langchain/langgraph @langchain/core @langchain/community @langchain/google-genai langchain @langchain/textsplitters zod zod-to-json-schema

  • Set up your .env file in the project root:

GOOGLE_API_KEY="YOUR_GOOGLE_API_KEY"        

Replace YOUR_GOOGLE_API_KEY with your actual API key for Google Generative AI. With everything in place, you’re ready to start coding.


Next, I'll explain, step by step, each piece of code used to implement the agent. You can find the full reference code in this GitHub repository.


4. The Retriever: retriever.ts

This file handles document loading, splitting, embedding, and retrieval. The core idea is to transform Lilian Weng's articles into vector embeddings so we can later query them semantically. Let's walk through the file section by section.



Detailed Explanation of retriever.ts

a) Importing and Configuring


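Modulo package-version differences, this section boils down to a handful of imports (the module paths below are the current ones and may shift between releases):

import "dotenv/config"; // makes GOOGLE_API_KEY available via process.env
import { CheerioWebBaseLoader } from "@langchain/community/document_loaders/web/cheerio";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { GoogleGenerativeAIEmbeddings } from "@langchain/google-genai";
import { MemoryVectorStore } from "langchain/vectorstores/memory";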

  • Loads environment variables from .env, including your GOOGLE_API_KEY.
  • Pulls in the necessary classes from LangChain for loading pages, splitting text, embedding, and vector storage.


b) Function createRetriever()

  • Define URLs: We specify three blog posts from Lilian Weng, covering LLM agents, prompt engineering, and adversarial attacks.


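As a sketch, assuming the three canonical posts on those topics:

const urls = [
  "https://lilianweng.github.io/posts/2023-06-23-agent/",
  "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
  "https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
];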

  • Load Documents:


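In code, this step looks something like:

const docs = await Promise.all(
  urls.map((url) => new CheerioWebBaseLoader(url).load())
);
const docsList = docs.flat(); // one flat array of Documents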

  • Uses CheerioWebBaseLoader to scrape the HTML content from each URL.
  • Promise.all loads all pages in parallel, and flat() consolidates them into a single array of documents.


  • Split the Documents:

The docSplits step is essential because language models have a limited context window, meaning they can only process a certain amount of text at once. By splitting documents into smaller, overlapping chunks, we ensure each piece stays within this limit, enabling the model to handle them effectively. This also improves relevance and retrieval efficiency, as each chunk can be indexed and compared individually, allowing the system to return only the most pertinent information for a given query. Overlapping chunks help maintain context across sections, preventing important information from being lost between splits. In short, chunking allows for precise, scalable, and context-aware retrieval, forming a foundation for accurate and efficient RAG-based responses.

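A minimal version, using the chunk sizes described below:

const textSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 500,   // ~500 characters per chunk
  chunkOverlap: 50, // overlap keeps continuity between neighboring chunks
});
const docSplits = await textSplitter.splitDocuments(docsList);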

  • RecursiveCharacterTextSplitter breaks long articles into smaller chunks of ~500 characters.
  • Overlapping 50 characters helps maintain context across chunks so you don’t lose continuity between splits.


  • Embeddings and Vector Storage:


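Something along these lines (GoogleGenerativeAIEmbeddings picks up GOOGLE_API_KEY from the environment):

const vectorStore = await MemoryVectorStore.fromDocuments(
  docSplits,
  new GoogleGenerativeAIEmbeddings()
);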

  • This uses GoogleGenerativeAIEmbeddings to transform each chunk into a vector.
  • Stores those vectors in a MemoryVectorStore, which resides in RAM. (You can swap it out for a more persistent store if needed.)


  • Return a Retriever:


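The function then ends with a single line:

return vectorStore.asRetriever();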

  • Converts the in-memory vector store into a retriever, which you’ll use later to query for relevant context.


c) Exporting the Retriever


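A sketch (note the top-level await, which assumes your project runs as an ES module):

// "type": "module" in package.json is assumed for top-level await
export const retriever = await createRetriever();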

  • Immediately calls createRetriever() so that your agent has an off-the-shelf retriever.
  • Makes it easy to import and use in other files, specifically in our index.ts.



Why This Matters

  • Semantic Search: By embedding the text, queries can find related concepts without relying on exact keyword matches.
  • RAG Workflow: This is the “retrieval” half of a Retrieval-Augmented Generation pipeline, feeding context to an LLM for more accurate answers.
  • Reusable: Because it’s self-contained, you can point the same retrieval logic at different docs or even different embedding models with minimal changes.


5. The Orchestrator: index.ts

In this file, we define our LangGraph workflow. We create nodes (functions) that handle tasks like deciding whether to retrieve documents, grading relevance, rewriting the query, and ultimately generating an answer.

Let's break it down into its major pieces.



Detailed Explanation of index.ts

a) Imports and Initial Configuration

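For example:

import "dotenv/config"; // loads GOOGLE_API_KEY before anything else runs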

  • Loads environment variables from your .env file (including GOOGLE_API_KEY).

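Again modulo version differences, these imports look roughly like:

import { Annotation, StateGraph, START, END } from "@langchain/langgraph";
import { BaseMessage, HumanMessage, AIMessage } from "@langchain/core/messages";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { z } from "zod";
import { ChatGoogleGenerativeAI } from "@langchain/google-genai";
import { ToolNode } from "@langchain/langgraph/prebuilt";
import { createRetrieverTool } from "langchain/tools/retriever";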

  • Annotation: used to define GraphState fields.
  • BaseMessage / HumanMessage / AIMessage: for storing different message types (the user’s query, the AI’s response, etc.).
  • StateGraph, START, END: components from LangGraph that build and manage the graph-based workflow.
  • ChatPromptTemplate: to create prompt templates for the AI model.
  • z (Zod): library for schema validation.
  • ChatGoogleGenerativeAI: the client to interact with Google’s gemini-1.5-pro LLM.
  • ToolNode & createRetrieverTool: help create a tool node that can be added to the state graph.


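A one-liner (the ".js" extension is an ESM convention; adjust the path to wherever your retriever.ts lives):

import { retriever } from "./retriever.js";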

  • Pulls in the retriever (from your retriever.ts) that loads, splits, and indexes Lilian Weng’s blog posts.

b) Defining the Graph State

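A minimal sketch:

const GraphState = Annotation.Root({
  messages: Annotation<BaseMessage[]>({
    // Reducer: append new messages to the running conversation
    reducer: (x, y) => x.concat(y),
    default: () => [],
  }),
});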

  • The messages array in GraphState stores the entire conversation: user queries, AI responses, tool outputs, etc.
  • The reducer function concatenates new messages onto the existing list.


c) Creating the Retrieval Tool

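For instance (the description text is free-form; the tool name matters because the agent refers to it later):

const tool = createRetrieverTool(retriever, {
  name: "retrieve_blog_posts",
  description:
    "Search and return information about Lilian Weng's blog posts on LLM agents, prompt engineering, and adversarial attacks.",
});
const tools = [tool];
const toolNode = new ToolNode<typeof GraphState.State>(tools);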

  • createRetrieverTool wraps the retriever so the agent can call it by name.
  • This node is added to the graph, enabling retrieval logic to be invoked during the workflow.

d) Node Functions

Each node corresponds to a function that either modifies the conversation state or decides which path the graph should take next.

  1. shouldRetrieve: Checks whether the agent decided to call the retrieval tool.
  2. gradeDocuments: Asks the AI to assess if the retrieved documents are relevant to the user’s query.
  3. checkRelevance: Reads the tool call result from gradeDocuments.
  4. agent: The main AI call that decides whether to retrieve or just answer.
  5. rewrite: If the documents weren’t relevant, the agent tries to refine or improve the user’s original query to get better results next time.
  6. generate: Once the agent has relevant documents, this node calls the AI to merge the user’s question with the retrieved context and produce a final answer.

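To give a feel for the pattern, here's a sketch of two of the six functions, shouldRetrieve and agent (gemini-1.5-pro is the model used throughout; the remaining nodes follow the same shape):

// Conditional edge: route to "retrieve" if the model requested a tool call.
function shouldRetrieve(state: typeof GraphState.State): string {
  const lastMessage = state.messages[state.messages.length - 1];
  if ("tool_calls" in lastMessage && Array.isArray(lastMessage.tool_calls) && lastMessage.tool_calls.length > 0) {
    return "retrieve";
  }
  return END;
}

// Main agent node: the model, with the retrieval tool bound, decides what to do.
async function agent(state: typeof GraphState.State) {
  const model = new ChatGoogleGenerativeAI({
    model: "gemini-1.5-pro",
    temperature: 0,
  }).bindTools(tools);
  const response = await model.invoke(state.messages);
  return { messages: [response] };
}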

e) Defining the Graph

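Roughly:

const workflow = new StateGraph(GraphState)
  .addNode("agent", agent)
  .addNode("retrieve", toolNode)
  .addNode("gradeDocuments", gradeDocuments)
  .addNode("rewrite", rewrite)
  .addNode("generate", generate);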

  • Creates a new graph with the above node functions.

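The wiring mirrors the list below (gradeDocuments, checkRelevance, rewrite, and generate are the node functions defined in the full file):

workflow.addEdge(START, "agent");
workflow.addConditionalEdges("agent", shouldRetrieve);
workflow.addEdge("retrieve", "gradeDocuments");
workflow.addConditionalEdges("gradeDocuments", checkRelevance, {
  yes: "generate",
  no: "rewrite",
});
workflow.addEdge("rewrite", "agent");
workflow.addEdge("generate", END);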

  • START → agent: the conversation begins.
  • agent → shouldRetrieve: if the AI calls the tool, go to retrieve; otherwise, end.
  • retrieve → gradeDocuments: once we retrieve docs, the agent judges relevance.
  • gradeDocuments → checkRelevance: returns yes or no, leading to generate or rewrite.
  • rewrite → agent: if the docs were irrelevant, refine the question and try again.
  • generate → END: final answer is produced; workflow ends.


f) Compiling and Running

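A sketch of the run loop, using the example question from the execution flow below:

const app = workflow.compile();

const inputs = {
  messages: [
    new HumanMessage("What are the types of agent memory described by Lilian Weng?"),
  ],
};

let finalState;
for await (const output of await app.stream(inputs)) {
  for (const [node, values] of Object.entries(output)) {
    console.log(`--- Output from node: ${node} ---`);
    finalState = values;
  }
}
console.log(JSON.stringify(finalState, null, 2)); // pretty-print the final state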

  • workflow.compile() turns the node/edge definitions into an executable object.
  • app.stream(inputs) runs the conversation, yielding outputs from each node.
  • Each step’s output is logged, then the final state is printed in JSON for clarity.

The full execution flow looks like this:

  1. User Input: A HumanMessage asks about the types of agent memory described in Lilian Weng’s blog.
  2. Agent Node: Decides whether to retrieve info.
  3. Retrieval: If needed, calls the retrieve_blog_posts tool to grab relevant chunks.
  4. Relevance Check: The agent uses gradeDocuments to see if the chunked text is helpful.
  5. Rewrite or Generate: If the documents aren't relevant, the rewrite node refines the query and loops back to the agent; if they are, the flow moves on to generate.
  6. Answer: Once relevant snippets are found, the agent merges them with the user query to produce the final response.

This approach is more robust than simple retrieval because it validates the retrieved data and can refine user queries, making your system far more reliable and intelligent.

You should see logs indicating how the graph moves from agent → retrieve → gradeDocuments → ... etc. In the end, you’ll get a final answer referencing the relevant sections of Lilian Weng’s blog. And that’s it: you have a working LangGraph Retrieval Agent!


6. Conclusion

Building a retrieval-augmented workflow for your AI agent no longer has to be a daunting task. By leveraging LangGraph’s node/edge model, you keep your logic organized and highly adaptable. Whenever you need new functionalities, like adding a custom tool to query an external API or integrating a different vector store, it’s as easy as defining a new node and wiring it up with your existing graph.

If you found this tutorial helpful, I’d love to hear about your own experiences and experiments with retrieval agents. Which vector stores have you tried, or what custom tools did you add to your workflow? Share your thoughts in the comments below or tag a friend who’s exploring advanced LLM solutions!


#LLM #LangGraph #AI #RAG #DevTutorial

Thanks for reading! I hope this step-by-step guide helps you implement your own Retrieval Agent using LangGraph. Feel free to connect and share your feedback or questions!
