# LangGraph and Research Agents

> LangGraph is a graph-based agentic framework that allows us to build more flexible and controllable AI agents. Here, we'll see how to build an AI Research Agent with LangGraph.

James Briggs · 2024-07-11

**Research agents** are multi-step AI agents that aim to provide in-depth answers to user queries. These agents differ from typical conversational agents, where we typically find fast response times and short AI answers.


There are several good reasons our conversational agents respond in the way they do. The first reason is that we aim to imitate the style of human conversation. A typical conversation between two people consists of short back-and-forths — it's rare (and often not wanted) that one of those people suddenly breaks out into a multi-page monologue.


Another typical quality of human conversations is a short wait time between responses. When someone says "hi," it would be odd to wait even a second before getting a response. We expect low-latency responses from people, and the same applies to our conversational AI agents.


Now, let's move away from conversational agents and consider what qualities we might want in a *research agent*. A longer, more researched, and cited response can be much more helpful if we ask a question in research or study. In this scenario, many might choose to delay their response by a few (or more) seconds *if* they're getting a more detailed response. In these scenarios, a user prefers to speak to a research agent rather than the typical conversational agent.


Because we have less time pressure with research agents, we can design them to work through a multi-step research pipeline and synthesize information from different places. Given a user's query, a research agent should be able to search the web, read papers, and access other relevant sources of information for its particular use case.


That doesn't mean a research agent cannot be conversational. In most cases, users prefer a conversational interface, and we can design agents that do both.


![Graph-based research agent that will be built](https://cdn.sanity.io/images/vr8gru94/production/6d0cc1cda220a85ac5ff1272dffd2e53f1a98daa-4140x2160.png)


We'll focus on building a research agent and how we can achieve that using a graph-based approach to agent construction.


[Video](https://www.youtube.com/watch?v=usOmwLZNVuM)


## Graphs for Agents


Agents are still a relatively new concept. The modern form of agents, primarily LLM-powered, only became popular in early 2022, shortly after the release of ChatGPT.


These earlier agents focused on providing an LLM with a decision-making framework allowing iterative decision-making and tool-use steps. The most popular of these being the **Re**ason **Act**ion (ReAct) method [1].


ReAct encourages an LLM to generate steps broken into iterative *reasoning* and *action* steps. The reasoning step  encourages the LLM to outline what steps it needs to take to answer a question. The agent then follows up with an action where our tool/function call executes a step like searching the internet.


![ReAct agent process](https://cdn.sanity.io/images/vr8gru94/production/c938bab8bc051cf2bf62eb3a411c60065544f082-3360x3036.png)


After these two steps, the agent receives an observation, i.e., the result of our action step. This reasoning-action loop can be repeated with various actions, all culminating in a final answer.


The ReAct framework is powerful because it allows us to combine LLMs with a wide range of data sources and executable programs.


Popular implementations of ReAct (or a similar method called "MRKL") made their way to libraries like LangChain, Llama-Index, and Haystack. These implementations allowed us to built ReAct-like agents using an object oriented approach.


The object-oriented approach functions well but leaves little flexibility or transparency in our agent. It's very possible to use ReAct agents via LangChain and have no idea how any of it is functioning, given the framework's abstractions.


![Graph-based ReAct-like agent](https://cdn.sanity.io/images/vr8gru94/production/46be5878f4b07b2d259c434dfcceec18e037e404-1857x1272.png)


An interesting solution to this is to construct agents as graphs. When building an agent as a graph, you can still produce the iterative ReAct-like loop, but it is much easier to customize that loop. We can add more deterministic flows and hierarchical decision-making and simply build with more flexibility and transparency.


More deterministic flows in our agent allow us to address two key limitations of ReAct-like agents: interpretability and entropy.


The **interpretability** of LLMs can often be hard to understand or measure. LLMs lack hard-coded rules, in contrast to traditional software which is *very* interpretable as it is constructed *solely* from hard-coded rules. This lack of deterministic behavior in LLMs is both a blessing and a curse. They can generate a seemingly infinite number of outputs, resulting in seemingly infinite potential — but those outputs are hard to engineer and impossible to fully predict.


To deploy LLM-based software into the world, we need confidence that it will behave a certain way. However, LLMs are compressed representations of an entire world, making it impossible (for now) to predict every possible output of an LLM.


Because of the lack of interpretability, LLM-based software needs as much determinstic logic as possible. Anything that does *not* require an LLM, should *not* use an LLM.


**Entropy** is the tendency of a system to become more chaotic over time.When executed iteratively without human guidance, an LLM can hallucinate or simply struggle to keep track of an increasingly long set of prior thoughts, actions, and observations. Due to this, LLMs typically become increasingly divergent from their original purpose as the number of iterative generations increases.


## LangGraph


LangGraph is LangChain's graph-based agent framework and one of the most popular frameworks for building graph-based agents. It focuses on providing more "fine-grained" control over an agent's flow and state.


### State


At the core of a graph in LangGraph is the agent *state*. The state is a mutable object where we track the current state of the agent execution as we pass through the graph. We can include different parameters within the state, in our research agent we will use a minimal example containing three parameters:


- `input`: This is the user's most recent query. Usually, this is a question that we want to answer with our research agent.
- `chat_history`: We are building a conversational agent that can support multiple interactions. To allow previous interactions to provide additional context throughout our agent logic, we include the chat history in the agent state.
- `intermediate_steps` provides a record of all steps the research agent will take between the user asking a question via `input` and the agent providing a final answer. These can include "search arxiv", "perform general purpose web search," etc. These intermediate steps are crucial to allowing the agent to follow a path of coherent actions and ultimately producing an informed final answer.

---

##### Follow Along with Code!

*The remainder of this article will be focused on the implementation of a LangGraph research agent using Python. You can find a copy of the [code here](https://github.com/pinecone-io/examples/blob/master/learn/generation/langchain/langgraph/01-gpt-4o-research-agent.ipynb).*

_**If following along, make sure to install prerequisite packages first:**_

```
!pip install -qU \
    datasets==2.19.1 \
    langchain-pinecone==0.1.1 \
    langchain-openai==0.1.9 \
    langchain==0.2.5 \
    langchain-core==0.2.9 \
    langgraph==0.1.1 \
    semantic-router==0.0.48 \
    serpapi==0.1.5 \
    google-search-results==2.4.2 \
```

---


We define our minimal agent state object like so:


```
from typing import TypedDict, Annotated
from langchain_core.agents import AgentAction
from langchain_core.messages import BaseMessage
import operator


class AgentState(TypedDict):
   input: str
   chat_history: list[BaseMessage]
   intermediate_steps: Annotated[list[tuple[AgentAction, str]], operator.add]
```

### Creating our Tools


The agent graph consists of several nodes, *five* of which are custom tools we need to create. Those tools are:


- **ArXiv paper fetch**: Given an arXiv paper ID, this tool provides our agent with the paper's abstract.


- **Web search**: This tool provides our agent access to Google search for more generalized queries.


- **RAG search**: We will create a knowledge base containing AI arXiv papers. This tool provides our agent with access to this knowledge.


- **RAG search with filter**: Sometimes, our agent may need more information from a specific paper. This tool allows our agent to do just that.


- **Final answer**: We create a custom final answer tool that forces our agent to output information in a specific format like:


 ```markdown
 INTRODUCTION
 ------------
 <some intro to our report>
  RESEARCH STEPS
 --------------
 <the steps the agent took during research>
  REPORT
 ------
 <the report main body>
  CONCLUSION
 ----------
 <the report conclusion>
  SOURCES
 -------
 <any sources the agent used>
 ```


We'll setup each of these tools, which our LLM and graph-logic will later be able to execute.


#### ArXiv Paper Fetch Tool


The `fetch_arxiv` tool will allow our agent to get the summary of a specific paper given an ArXiv paper ID. We will simply send a GET request to arXiv and use regex to extract the paper abstract. That GET request will return something like this:


[Colab File](https://cdn.sanity.io/files/vr8gru94/production/3b9e1653f2ed56535c6b7d7366768a482f884716.ipynb)


A lot is going on there. Fortunately, we can a small regex to find the paper abstract.


[Colab File](https://cdn.sanity.io/files/vr8gru94/production/d213fa69fc9e71123e859989bc9f265564888d2e.ipynb)


Now we pack all of this logic into a tool for our agent.


```
from langchain_core.tools import tool


@tool("fetch_arxiv")
def fetch_arxiv(arxiv_id: str):
   """Gets the abstract from an ArXiv paper given the arxiv ID. Useful for
   finding high-level context about a specific paper."""
   # get paper page in html
   res = requests.get(
       f"https://export.arxiv.org/abs/{arxiv_id}"
   )
   # search html for abstract
   re_match = abstract_pattern.search(res.text)
   # return abstract text
   return re_match.group(1)
```

We can test the tool to confirm it works using the `.invoke` method like so:


[Colab File](https://cdn.sanity.io/files/vr8gru94/production/ebb513434607044fd806627ee5e007b1e80749ec.ipynb)


#### Web Search Tool


The web search tool will provide the agent with access to web search. We will instruct our LLM to use this for more general knowledge queries. To implement the search component of the tool, we will be using the SerpAPI. You can get a [free API key for the service here](https://serpapi.com/manage-api-key). Once you have the API key we run a query for `"coffee"` like so:


[Colab File](https://cdn.sanity.io/files/vr8gru94/production/ed737001152255e9ac71bad5901a3ea6d626101a.ipynb)


We place all of this logic into another tool:


```
@tool("web_search")
def web_search(query: str):
   """Finds general knowledge information using Google search. Can also be used
   to augment more 'general' knowledge to a previous specialist query."""
   search = GoogleSearch({
       **serpapi_params,
       "q": query,
       "num": 5
   })
   results = search.get_dict()["organic_results"]
   contexts = "\n---\n".join(
       ["\n".join([x["title"], x["snippet"], x["link"]]) for x in results]
   )
   return contexts
```

#### RAG Tools


We provide two RAG-focused tools for our agent. The `rag_search` allows the agent to perform a simple RAG search for some information across *all* indexed research papers. The `rag_search_filter` also searches, but *within* a specific paper filtered for via the `arxiv_id` parameter.


We also define the `format_rag_contexts` function to handle the transformation of our Pinecone results from a JSON object to a readble plaintext format. *Note: you can find the code for setting up the index we use [here](https://github.com/pinecone-io/examples/blob/master/learn/generation/langchain/langgraph/01-gpt-4o-research-agent.ipynb).*


```
def format_rag_contexts(matches: list):
   contexts = []
   for x in matches:
       text = (
           f"Title: {x['metadata']['title']}\n"
           f"Content: {x['metadata']['content']}\n"
           f"ArXiv ID: {x['metadata']['arxiv_id']}\n"
           f"Related Papers: {x['metadata']['references']}\n"
       )
       contexts.append(text)
   context_str = "\n---\n".join(contexts)
   return context_str


@tool("rag_search_filter")
def rag_search_filter(query: str, arxiv_id: str):
   """Finds information from our ArXiv database using a natural language query
   and a specific ArXiv ID. Allows us to learn more details about a specific paper."""
   xq = encoder([query])
   xc = index.query(vector=xq, top_k=6, include_metadata=True, filter={"arxiv_id": arxiv_id})
   context_str = format_rag_contexts(xc["matches"])
   return context_str


@tool("rag_search")
def rag_search(query: str):
   """Finds specialist information on AI using a natural language query."""
   xq = encoder([query])
   xc = index.query(vector=xq, top_k=2, include_metadata=True)
   context_str = format_rag_contexts(xc["matches"])
   return context_str
```

#### Final Answer "Tool"


Finally, we define a "final answer" tool. This tool isn't a tool in the usual sense, instead we use it to force a particular output format from our  LLM via the function/tool calling.


```
@tool("final_answer")
def final_answer(
   introduction: str,
   research_steps: str,
   main_body: str,
   conclusion: str,
   sources: str
):
   """Returns a natural language response to the user in the form of a research
   report. There are several sections to this report, those are:
   - `introduction`: a short paragraph introducing the user's question and the
   topic we are researching.
   - `research_steps`: a few bullet points explaining the steps that were taken
   to research your report.
   - `main_body`: this is where the bulk of high quality and concise
   information that answers the user's question belongs. It is 3-4 paragraphs
   long in length.
   - `conclusion`: this is a short single paragraph conclusion providing a
   concise but sophisticated view on what was found.
   - `sources`: a bulletpoint list provided detailed sources for all information
   referenced during the research process
   """
   if type(research_steps) is list:
       research_steps = "\n".join([f"- {r}" for r in research_steps])
   if type(sources) is list:
       sources = "\n".join([f"- {s}" for s in sources])
   return ""
```

We have finished defining all our tools and can move onto our next components.


### Defining the Oracle


The **Oracle** LLM is our graph's decision maker. It  decides which path we should take down our graph. It functions similarly to an agent but is much simpler; consisting of a single prompt, LLM, and tool instructions. When executed the oracle will only run once before we move to another node in our graph.


The Oracle consists of an LLM provided with a set of potential function calls (i.e., our tools) that it can decide to use. We force it to use *at least* one of those tools using the `tool_choice="any"` setting (see below). Our Oracle only makes the decision to use a tool; it doesn't execute the tool code itself (we do that separately in our graph).


Our prompt for the Oracle will emphasize its decision-making ability within the `system_prompt`. We will also add a placeholder for us to later insert `chat_history`, and provide a place for us to insert the intermediate steps (`scratchpad`) and user `input`.


```
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder


system_prompt = """You are the oracle, the great AI decision maker.
Given the user's query you must decide what to do with it based on the
list of tools provided to you.


If you see that a tool has been used (in the scratchpad) with a particular
query, do NOT use that same tool with the same query again. Also, do NOT use
any tool more than twice (ie, if the tool appears in the scratchpad twice, do
not use it again).


You should aim to collect information from a diverse range of sources before
providing the answer to the user. Once you have collected plenty of information
to answer the user's question (stored in the scratchpad) use the final_answer
tool."""


prompt = ChatPromptTemplate.from_messages([
   ("system", system_prompt),
   MessagesPlaceholder(variable_name="chat_history"),
   ("user", "{input}"),
   ("assistant", "scratchpad: {scratchpad}"),
])
```

Next, we must initialize our `llm` (for this we use `gpt-4o`) and then create the *runnable* pipeline of our Oracle.


The runnable connects our inputs (the user `input` and `chat_history`) to our `prompt`, and our `prompt` to our `llm`. It is also where we *bind* our tools to the LLM and enforce function calling via `tool_choice="any"`.


```
from langchain_core.messages import ToolCall, ToolMessage
from langchain_openai import ChatOpenAI


llm = ChatOpenAI(
   model="gpt-4o",
   openai_api_key=os.environ["OPENAI_API_KEY"],
   temperature=0
)


tools=[
   rag_search_filter,
   rag_search,
   fetch_arxiv,
   web_search,
   final_answer
]


# define a function to transform intermediate_steps from list
# of AgentAction to scratchpad string
def create_scratchpad(intermediate_steps: list[AgentAction]):
   research_steps = []
   for i, action in enumerate(intermediate_steps):
       if action.log != "TBD":
           # this was the ToolExecution
           research_steps.append(
               f"Tool: {action.tool}, input: {action.tool_input}\n"
               f"Output: {action.log}"
           )
   return "\n---\n".join(research_steps)


oracle = (
   {
       "input": lambda x: x["input"],
       "chat_history": lambda x: x["chat_history"],
       "scratchpad": lambda x: create_scratchpad(
           intermediate_steps=x["intermediate_steps"]
       ),
   }
   | prompt
   | llm.bind_tools(tools, tool_choice="any")
)
```

We can quickly test the agent to confirm it is functional:


[Colab File](https://cdn.sanity.io/files/vr8gru94/production/806db7f4d9e86a9b318adfde6501578213a2e1f6.ipynb)


It is running, but we are returning a lot of output here. We can narrow this down to what we need—i.e., the chosen tool name and generated input args for the tool.


[Colab File](https://cdn.sanity.io/files/vr8gru94/production/8d263a1a2f0b92569593296abefca412dc9efd85.ipynb)


We can see now that our Oracle decided to use the `web_search` tool with a `query` of `"interesting facts about dogs"` — a good choice.


### Preparing the Graph Components


Our graph needs functions that consume the `AgentState` we defined earlier, and output `intermediate_steps`. To do this, we wrap our oracle and tool functions to use the `AgentState`. We'll also define a `router` which will *route* our state to different tool nodes based on the output from our oracle. We get this value from the `out.tool_calls[0]["name"]` value.


```
from typing import TypedDict


def run_oracle(state: TypedDict):
   print("run_oracle")
   print(f"intermediate_steps: {state['intermediate_steps']}")
   out = oracle.invoke(state)
   tool_name = out.tool_calls[0]["name"]
   tool_args = out.tool_calls[0]["args"]
   action_out = AgentAction(
       tool=tool_name,
       tool_input=tool_args,
       log="TBD"
   )
   return {
       "intermediate_steps": [action_out]
   }


def router(state: TypedDict):
   # return the tool name to use
   if isinstance(state["intermediate_steps"], list):
       return state["intermediate_steps"][-1].tool
   else:
       # if we output bad format go to final answer
       print("Router invalid format")
       return "final_answer"
```

Our tools can run using the same function logic we define with `run_tool`. We add the input parameters to our tool call and the resultant output to our graph state's `intermediate_steps` parameter.


```
tool_str_to_func = {
   "rag_search_filter": rag_search_filter,
   "rag_search": rag_search,
   "fetch_arxiv": fetch_arxiv,
   "web_search": web_search,
   "final_answer": final_answer
}


def run_tool(state: TypedDict):
   # use this as helper function so we repeat less code
   tool_name = state["intermediate_steps"][-1].tool
   tool_args = state["intermediate_steps"][-1].tool_input
   print(f"{tool_name}.invoke(input={tool_args})")
   # run tool
   out = tool_str_to_func[tool_name].invoke(input=tool_args)
   action_out = AgentAction(
       tool=tool_name,
       tool_input=tool_args,
       log=str(out)
   )
   return {"intermediate_steps": [action_out]}
```

### Constructing the Graph


The `graph` in LangGraph consists of several components, we will be using the following:


* **Nodes**: these are the steps in the graph where we execute some logic. These could be an LLM call, tool execution, or our final answer output. We add them via `graph.add_node(...)`.
* **Edges**: these are the connecting flows/paths in our graph. If we connect a node to another via an *edge*, we can move between those nodes. An edge only allows travel in a single direction, so from node A -> node B we would write `graph.add_edge("node A", "node B")` . However, we can allow bidirectional travel by adding a second edge from node B -> node A with `graph.add_edge("node B", "node A")`.
* **Entry point**: this specifies the node from which we would enter the graph, i.e. our starting node. If starting at *node A* we write `graph.set_entry_point("node A")`.
* **Conditional edge**: a conditional edge allows us to specify multiple *potential* edges from a `source` node to a set of destination nodes defined by a mapping passed to the `path` parameter. If adding conditional edges from *node B* to nodes *C*, *D*, and *E* — all decided using a `router` function (more on this later), we would write `graph.add_conditional_edges(source="node A", path=router)`.


```
from langgraph.graph import StateGraph, END


# initialize the graph with our AgentState
graph = StateGraph(AgentState)


# add nodes
graph.add_node("oracle", run_oracle)
graph.add_node("rag_search_filter", run_tool)
graph.add_node("rag_search", run_tool)
graph.add_node("fetch_arxiv", run_tool)
graph.add_node("web_search", run_tool)
graph.add_node("final_answer", run_tool)


# specify the entry node
graph.set_entry_point("oracle")


# add the conditional edges which use the router
graph.add_conditional_edges(
   source="oracle",  # where in graph to start
   path=router,  # function to determine which node is called
)


# create edges from each tool back to the oracle
for tool_obj in tools:
   if tool_obj.name != "final_answer":
       graph.add_edge(tool_obj.name, "oracle")


# if anything goes to final answer, it must then move to END
graph.add_edge("final_answer", END)


# finally, we compile our graph
runnable = graph.compile()
```

From all of this we will have an agent graph that looks like this:


![Research agent graph](https://cdn.sanity.io/images/vr8gru94/production/6d0cc1cda220a85ac5ff1272dffd2e53f1a98daa-4140x2160.png)


### Building Research Reports


Our research agent is now ready, and we can begin testing it. First, we'll start with something simple that is out of our agent's scope but allows us to test it quickly.


[Colab File](https://cdn.sanity.io/files/vr8gru94/production/53ff009acccd9bdbfbc12b219e06644daf139ba4.ipynb)


Let's create a function to consume the agent output and format it into our report. *Note: we could add this into the final answer tool logic if preferred.*


```
def build_report(output: dict):
   research_steps = output["research_steps"]
   if type(research_steps) is list:
       research_steps = "\n".join([f"- {r}" for r in research_steps])
   sources = output["sources"]
   if type(sources) is list:
       sources = "\n".join([f"- {s}" for s in sources])
   return f"""
INTRODUCTION
------------
{output["introduction"]}


RESEARCH STEPS
--------------
{research_steps}


REPORT
------
{output["main_body"]}


CONCLUSION
----------
{output["conclusion"]}


SOURCES
-------
{sources}
"""
```

Now we run our `build_report` function:


[Colab File](https://cdn.sanity.io/files/vr8gru94/production/d998fdb221bd5cc18c54891c3f760facd8fb701e.ipynb)


Now let's try with an on-topic question regarding AI.


[Colab File](https://cdn.sanity.io/files/vr8gru94/production/4424988be3260cab582766ef99b81dc211aeac38.ipynb)


Let's try one more, we can ask about RAG:


[Colab File](https://cdn.sanity.io/files/vr8gru94/production/d1d7657d8664b3143bbddebdcf06a47f78ec9b13.ipynb)


There we go, we're seeing our research agent use multiple tools such as web search and arxiv search, synthesize all of the information it retrieves, and finally produce a mini-research report!


---


That's it for this article covering LangGraph and research agents. We've looked at the essentials behind conversational agents and research agents and where they share similarities and differences—particularly on the engineering front. We've discussed the idea behind graph-based agent frameworks and the additional visibility and flexibility they provide. Finally, we built our *own* research agent using LangGraph.


Graphs and agents are very promising concepts, and we expect pairing both to become an increasingly popular choice for engineers when building AI applications—particularly as AI becomes an increasingly critical component of reliable and trustworthy software.


---


## References


[1] S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, Y. Cao, [ReAct: Synergizing Reasoning and Acting in Language Models](https://arxiv.org/pdf/2210.03629) (2023), ICLR

