LangChain Founder: The Next Wave of AI Will Be Dominated by AI Agents

06/26/2024

Recently, Harrison Chase, founder of the AI startup LangChain, shared his views on the current state, challenges, and prospects of agents in a podcast interview hosted by Sonya Huang and Pat Grady of Sequoia Capital.

Harrison said that an agent means letting a large language model make decisions within the control flow of an application. He predicted that the next wave of AI may be dominated by agents, with the entire industry shifting from copilots to agents. Although first-generation autonomous agent architectures such as AutoGPT garnered significant attention, they still lack practicality and fall short of enterprise-level demands.

Addressing this, Harrison believes the solution is the coexistence of general cognitive architectures (such as planning and reflection) with customized, domain-specific ones. Cognitive architectures are like the system architectures of large language model applications, and they are expected to help with early agents' difficulties in planning, completing tasks, and judging whether a task is actually done. Planning and reflection are currently the most popular general cognitive architectures.

Looking ahead, Harrison believes that as agents grow and develop, they can automate a significant amount of repetitive work, allowing humans to focus on more creative or efficient tasks. Currently, customer support and programming are relatively mature application areas for agents.

Here is the content directory for this episode:

01 What are agents

02 Cognitive architectures in agents

03 Imagining the future of agent development

04 Real-world issues still to be resolved

/ 01 / What are agents?

Sonya Huang: Agents are a topic that everyone is very interested in right now. Since the start of the large language model (LLM) wave, you have been at the core of agent construction. To help everyone better understand, what are agents?

Harrison Chase: I think defining agents can be a bit tricky, and everyone may have a different definition, especially as this is still the early stage of large language models and agent-related things.

In my view, an agent means letting a large language model make decisions within the control flow of an application. If you have a more traditional retrieval-augmented generation (RAG) chain, the steps are typically predetermined: first you might generate a search query, then retrieve some documents, generate an answer, and finally return it to the user. It's a very fixed sequence of events.
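
As an illustration of that fixed sequence, here is a minimal Python sketch of such a RAG chain. The llm and retrieve helpers are placeholder stubs standing in for a real model call and a real vector-store lookup; the point is simply that the order of steps is hard-coded.

```python
from typing import List

def llm(prompt: str) -> str:
    """Placeholder for a real LLM call (an API client in practice)."""
    return "stub response"

def retrieve(query: str) -> List[str]:
    """Placeholder for a vector-store or search-engine lookup."""
    return [f"stub document for: {query}"]

def rag_chain(question: str) -> str:
    # Step 1: always rewrite the question as a search query.
    query = llm(f"Rewrite as a search query: {question}")
    # Step 2: always retrieve documents for that query.
    docs = retrieve(query)
    # Step 3: always generate the final answer from the retrieved documents.
    return llm(f"Answer {question!r} using these documents: {docs}")

# The control flow never changes: query -> retrieve -> answer.
```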

When I think about something starting to exhibit agent-like characteristics, it means you put the large language model in a central position and let it decide what to do. So maybe sometimes it looks up a search query, and other times it skips the lookup and responds directly to the user. Maybe it runs one search query, gets results, then runs another, or even two more, and then responds. You have the large language model deciding the control flow.
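
By contrast, a minimal agent-style sketch (continuing from the stubs in the previous example) moves that decision into the model: on each iteration the LLM chooses whether to run another search or answer the user, so the control flow is no longer fixed in code.

```python
def agent(question: str, max_steps: int = 5) -> str:
    findings: List[str] = []
    for _ in range(max_steps):
        # The LLM, not the surrounding code, decides the next step.
        decision = llm(
            f"Question: {question}\nFindings so far: {findings}\n"
            "Reply with 'SEARCH: <query>' to look something up, "
            "or 'ANSWER: <answer>' to respond to the user."
        )
        if decision.startswith("SEARCH:"):
            findings.extend(retrieve(decision.removeprefix("SEARCH:").strip()))
        else:
            return decision.removeprefix("ANSWER:").strip()
    # Safety valve: force an answer after max_steps so the loop always ends.
    return llm(f"Give your best answer to {question!r} using: {findings}")
```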

I think there are also some other popular terms related to this, such as tool use, which is often associated with agents, and I think that makes sense because when you have a large language model deciding what to do, it primarily decides through tool use. So I think these two things are complementary. Memory is also an aspect that is often associated with agents, and I think that makes sense too, because when the large language model decides what to do, it needs to remember what it did before. So tool use and memory are loosely related. But for me, when I think about agents, it's about letting the large language model decide the control flow of the application.

Pat Grady: Harrison, a lot of what you just said is about decision-making. And I've always thought of agents as being about action. Are these two inextricably linked? Is agent behavior more about decision-making or action? What's your take on this?

Harrison Chase: I think they are complementary. Many of the things we see agents doing are deciding what action to take. And the biggest challenge in taking action is deciding what the right action is. So I do think that solving one problem naturally leads to another. After you decide on the action to take, there is usually a system around the large language model that goes to execute that action and feeds back to the agent. So, I think they are indeed complementary.

Sonya Huang: So, Harrison, it seems the main difference between agents and chains is that the large language model itself decides which steps to take and which operations to perform next, rather than being pre-coded like a chain. Is this a fair way to distinguish agents?

Harrison Chase: Yes, I think that's correct. And there are degrees of gradation along the way. As an extreme example, you can have a router that essentially decides which path to take, so you might only have a classification step in your chain. The LLM is still deciding what to do, but it's a very simple way of making decisions. The other extreme is a fully autonomous agent, and there is a spectrum of degrees in between. So I would say that's largely correct, though like most things in the LLM space, there are many nuances and gray areas.
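
For concreteness, the "router" end of that spectrum can be as small as a single classification call. The sketch below (again using the earlier llm/retrieve stubs) lets the LLM pick between two fixed paths and nothing more.

```python
def router_chain(question: str) -> str:
    # One LLM call classifies the request; everything after that is fixed code.
    route = llm(f"Classify this request as 'search' or 'chitchat': {question}")
    if "search" in route.lower():
        docs = retrieve(question)
        return llm(f"Answer {question!r} using: {docs}")
    return llm(f"Reply conversationally to: {question}")
```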

Sonya Huang: I see. So there's a spectrum from control to fully autonomous decision-making and logic, and everything else falls on the spectrum of agents. Very interesting. What role do you think LangChain plays in the agent ecosystem?

Harrison Chase: I think our current focus is on enabling people to easily create things that sit in the middle of this spectrum. For various reasons, we see this as the best place to build agents right now. So we've seen a lot of interest in and rapid prototyping of more fully autonomous things, which have many benefits and are actually relatively simple to build, but we've also seen them often go off-track, and people want something more constrained but more flexible and powerful than a chain. Therefore, much of our recent focus has been on being this coordination layer, supporting the creation of these agents, especially those that sit between chains and autonomous agents. I can explain in detail what we've specifically done in this regard, but overall, we want to be part of this coordination framework.

Sonya Huang: I see. So there's a chain, there's an autonomous agent, and in the middle is a spectrum. And your sweet spot is the middle part, enabling people to create agents that are in that middle state.

Harrison Chase: Obviously, this has changed over time. It's interesting to look back at the evolution of LangChain. When LangChain first launched, it was essentially a collection of chains, and then we had a class, the agent executor class, which was essentially that autonomous agent thing. We started adding more control into that class, and ultimately we realized people wanted more flexibility and control than we were providing through it.

So, recently we've put a lot of effort into LangGraph, which is an extension of LangChain specifically focused on customizable agents that sit in the middle. Therefore, our focus has evolved as the field has developed.
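
To make that concrete, here is a rough sketch of the kind of middle-of-the-spectrum agent LangGraph targets: the developer fixes the graph of nodes and allowed transitions, while an LLM decision inside the routing function chooses which edge to follow. It reuses the llm and retrieve stubs from the earlier sketches, and the LangGraph API surface shown (StateGraph, add_conditional_edges, and so on) is quoted from memory, so exact names may differ across versions.

```python
from typing import List, TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    question: str
    docs: List[str]
    answer: str

def retrieve_node(state: AgentState) -> dict:
    # Deterministic node: always fetch more documents for the question.
    return {"docs": state["docs"] + retrieve(state["question"])}

def answer_node(state: AgentState) -> dict:
    # Deterministic node: compose the final answer from what was gathered.
    return {"answer": llm(f"Answer {state['question']!r} using: {state['docs']}")}

def route(state: AgentState) -> str:
    # The one place the LLM steers the flow: retrieve again, or go answer.
    if len(state["docs"]) >= 3:  # hard cap so the sketch always terminates
        return "answer"
    verdict = llm(f"Is this enough to answer {state['question']!r}? {state['docs']} (yes/no)")
    return "answer" if "yes" in verdict.lower() else "retrieve"

builder = StateGraph(AgentState)
builder.add_node("retrieve", retrieve_node)
builder.add_node("answer", answer_node)
builder.set_entry_point("retrieve")
builder.add_conditional_edges("retrieve", route, {"retrieve": "retrieve", "answer": "answer"})
builder.add_edge("answer", END)
app = builder.compile()

result = app.invoke({"question": "How do I reset my password?", "docs": [], "answer": ""})
```

The graph constrains what the agent is allowed to do, while the conditional edge leaves room for the model to decide how many retrieval passes it needs.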

Sonya Huang: I'd like to ask a question to set the stage for the discussion. One of our core points is that the next wave of AI will be dominated by agents, and the entire industry is transitioning from copilots to agents. Do you agree with this view? Why?

Harrison Chase: Overall, I agree with your point. What excites me about this is that copilots still rely on a human being in the loop, so there is an upper limit to how much work the system can take off your hands. In that sense, copilots have real limits on what they can do.

I do think there are some very interesting questions to consider in terms of the right user experience and human-agent interaction patterns, but I think these interaction patterns should be more like the agent performing an action and then occasionally confirming with you, rather than being in a constant loop like a copilot. I just think that having agents accomplish more tasks can lead to greater effectiveness and leverage, which may seem contradictory because the more you let an agent operate autonomously, the greater the risk of it making mistakes or going off-track. So I think finding the right balance will be very interesting.

Sonya Huang: I remember around March 2023 there were some autonomous agents that really captured people's imagination. BabyAGI, AutoGPT, and similar projects sparked a lot of interest on Twitter. However, I feel that the first generation of agent architectures did not fully meet people's expectations. Why do you think that is? Where do you think we are in the agent hype cycle?

Harrison Chase: Yes, I think we can start by discussing the agent hype cycle. I think AutoGPT was definitely a starting point. It was probably one of the most popular GitHub projects ever, so it represented a peak in the hype cycle. I would say that peak started in the spring of 2023 and lasted roughly through the summer of 2023. Then from late summer to early 2024, I personally felt it was a bit of a trough or downward trend. However, starting in 2024, we're gradually seeing more realistic use cases come online. For example, we're working with Elastic at LangChain, and they have an Elastic Assistant and agent in production. We've also seen Klarna's customer support bot go live and gain a lot of attention. Companies like Devin and Sierra are also starting to emerge in the agent space.

As for why AutoGPT-style architectures didn't succeed: they were very general and unconstrained, which made them very exciting and sparked people's imagination. From a practical standpoint, though, people looking to deliver immediate business value through automation actually want agents to perform very specific tasks. They also want agents to follow particular rules, or to complete tasks in the way they expect. So in practice we've seen these agents adopt more of what we call customized cognitive architectures, where the actions the agent typically performs are predefined. Of course there is still some flexibility, otherwise people could just write code to do it all. But it's a very targeted way of thinking, and it's the pattern followed by most of the agents and assistants we see today. It requires more engineering work and more trial and error to see what works and what doesn't, which is why it's harder to achieve, and why such agents didn't exist a year ago.

/ 02 / Cognitive architectures in agents

Sonya Huang: Can you explain what cognitive architectures are? Is there a good mental framework for us to understand them?

Harrison Chase: I think cognitive architectures are like the system architectures of large language model applications. If you're building an application that uses large language models, what do you plan to use those models for? Is it just to generate the final answer? Or is your architecture more like a loop? These are all different variations of cognitive architectures; the term is just a fancy way of describing the flow of information, data, and large language model calls from user input to user output.

We are increasingly seeing, especially as people try to put agents into production, that this information flow is customized for their application and domain. They might start by wanting to do some specific checks, followed by three specific steps. Then each step might include an option to loop back, or have two independent sub-steps. So if you imagine it drawn as a graph, we'll see more and more customized graphs as people try to constrain and guide agents along the paths that fit their applications.

The reason I call it a cognitive architecture is that the power of large language models lies in reasoning and thinking about what to do. In other words, I might have a cognitive mental model of how to complete a task. I'm just encoding that mental model into some kind of software system, some kind of architecture.

Pat Grady: Do you think this is the direction the world is heading? Because I hear two things from you: one is very customized, and the other is quite primitive, with a lot of things hardcoded. Do you think this is the way forward, or is it a stopgap measure, and at some point, there will be more elegant architectures or a set of default reference architectures?

Harrison Chase: That's a great question, and one that I've spent a lot of time thinking about. You could argue at one extreme that if models become very good and reliable at planning, then the best architecture you could have is a loop: call the LLM, decide what to do, execute the action, and loop back. All the constraints about how I want the model to act, I just put in the prompt, and the model follows them explicitly.

I do think models will get better at planning and reasoning. But I don't think they'll be the best way to handle everything. There are a few reasons:

One is efficiency. If you know you always want to do step A after step B, you can just arrange them sequentially. The other is reliability. These things are not deterministic, and especially in enterprise environments you often want more assurance than a simple but general cognitive architecture running in a loop can give you. What we actually see in production is customized, complex cognitive architectures. There is also a different direction: complex but general cognitive architectures, with very elaborate planning steps, reflection loops, or trees of thought. I think that type of architecture may gradually disappear over time, because a lot of general planning and reflection will be trained into the models. But there will always be a lot of application-specific planning, reflection, and control loops that will never be in the models. So of those directions, I'm optimistic about both the simple general loop and customized architectures.

Sonya Huang: I think I can understand it this way: large language models do very general agent reasoning, but you need domain-specific reasoning. That can't be built into a general model.

Harrison Chase: Exactly. I think one way to see customized cognitive architectures is that they shift the responsibility for planning from the large language model to the human building the architecture. Over time, more of that planning will shift to the model and the prompts. But many tasks involve genuinely complex planning, so it will take some time before we have models that can reliably handle that planning on their own.

Sonya Huang: You shared a Bezos quote with me before, focusing on making your beer taste better. He was referring to the early 20th century, when many breweries tried to generate their own electricity instead of focusing on brewing. Many companies today are thinking about similar issues: do you think controlling your own cognitive architectures can really make your "beer" taste better, or do you think you should hand over control to the model and focus on building UI and products?

Harrison Chase: I think it might depend on the type of cognitive architecture you're building. Going back to the earlier discussion, if you're building a general cognitive architecture, I don't think it will make your "beer" taste better. I think model providers will work on those general plans. But if your cognitive architecture is essentially codifying the way your support team thinks, your internal business processes, or the best way for you to develop a specific type of code or application, then it will definitely make your "beer" taste better. Especially as we move towards these applications actually getting work done, those customized business logic or mental models are very important. Of course, user experience (UX) and user interface (UI) as well as distribution are also very important, but I would make a distinction between general and customized.

/ 03 / Imagining the future of agent development

Pat Grady: Harrison, before we dive into how people are building these things, can we take a high-level look first? Our founder Don Valentine was known for asking "so what?" So my question is: assuming autonomous agents worked perfectly, what would that mean for the world? How would life be different?

Harrison Chase: At a high level, it means we humans would focus on different things. I think many industries currently have a lot of repetitive, mechanical work. The idea of agents is to automate those tasks, allowing us to think at a higher level about what these agents should do and leverage their output for more creative or efficient work. You can imagine an entrepreneur who can outsource many functions that would otherwise require hiring people, like marketing and sales, so he can focus on strategic thinking and product development. Overall, this would allow us to focus on what we want to do and are good at, while automating unnecessary work.

Pat Grady: Are you seeing any interesting examples now? Like things that are already running in production.

Harrison Chase: I think there are two main categories of agents that are gradually gaining more attention: one is customer support, and the other is programming. I think customer support is a great example. Programming is also interesting because some programming work is very creative and requires a lot of product thinking and positioning. But there's also some programming work that constrains people's creativity. If my mom has an idea for a website but doesn't know how to code, having an agent that can do the work would allow her to focus on the concept of the website and automate the rest. So customer support has already had a significant impact, while programming, though not yet mature, has also piqued the interest of many.

Pat Grady: Programming is indeed interesting because it fills us with optimism about AI. It can shorten the distance from idea to execution or from dream to reality. You might have a very creative idea but don't have the tools to realize it, and AI seems perfectly suited to solve this problem. Dylan from Figma also talked about this.

Harrison Chase: Yes, this goes back to the idea of automating things you don't necessarily know how to do or don't want to do but have to. I've been thinking a lot about what it means to be a builder in the era of generative AI and agents. Today's software builders are typically either engineers
