Intent-based API Discovery
How can AI agents implicitly discover what APIs to consume?
Discovering APIs is almost impossible. "API Discovery is a combination of pure luck and word-of-mouth," I wrote in 2015. Have things changed since then? Well, there have been a few attempts to improve API discovery. Without much success, I have to say, because I still can't find the APIs I need in an easy way. But perhaps now, with the advent of AI-orchestrated workflows, it will be possible to match API operations with the user's intent. Let's explore the possibilities.
The "search engine" approach to finding APIs doesn't work. First, you need to know what to search for. Then, the APIs indexed by the search engine need to have well-written metadata. Finally, the number of indexed APIs can dramatically change the quality of the results: if there are too few APIs, the result is almost always the same for any search query. If there are too many APIs, it's hard to find the ones you really need among all the noise. Algorithms such as PageRank tackled this problem by ranking indexed items on one or more metrics. But, even with such a ranking system in place, you still need to know what to search for. And what you search for is what APIs can do, not what you want to do.
The distinction between command-based and intent-based search is the key to solving API discovery. Command-based API search requires users to know the names or descriptions of the operations they want to discover. On the other hand, with intent-based API search, users can explain what they want and obtain the corresponding operations. There are two big differences to note. One, command-based interactions need to have a list of all the API operations—the commands—to perform. Two, during intent-based interactions, users don't have to know the names or descriptions of any operation. It's the difference between saying "Go fetch a glass, fill it with water, and bring it to me" and simply "I'm thirsty." Intent-based interactions completely abstract the functional details of the operations they execute.
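The contrast can be made concrete with a toy sketch. All the operation and function names below are invented for illustration; the point is only that the intent-based caller never sees the operation names, while the command-based caller must know every one of them:

```python
# Invented operation names; each returns a string describing its effect.
OPERATIONS = {
    "fetch_glass": lambda: "glass",
    "fill_with_water": lambda: "glass of water",
    "bring": lambda: "delivered: glass of water",
}

# Command-based: the caller spells out each operation, by name, in order.
def command_based():
    return [OPERATIONS[op]() for op in ["fetch_glass", "fill_with_water", "bring"]]

# Intent-based: a resolver maps a goal to the command sequence;
# the caller only states the goal.
INTENT_PLANS = {"I'm thirsty": ["fetch_glass", "fill_with_water", "bring"]}

def intent_based(intent):
    return [OPERATIONS[op]() for op in INTENT_PLANS[intent]]

print(intent_based("I'm thirsty")[-1])  # delivered: glass of water
```

The hard part, of course, is the `INTENT_PLANS` lookup: in a real system, translating "I'm thirsty" into that sequence of operations is exactly the problem the rest of this article is about.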
The challenge is to make intent-based interactions work flawlessly. Doing that calls for understanding how to convert an intent into a series of chained commands. One way, which is what people are doing now, is to make machines learn by trial and error. We've been using AI systems with reinforcement learning to translate users' intent into a set of API operations. Solutions such as LangChain let AI systems match intent with external API operations. To those solutions, external API operations are called tools. The idea is that AI agents are able to find what tools are available and use them whenever necessary. However, finding which tool to use is done in a precarious way, often employing text matching. We're back to the challenge we had before with the "search engine" approach to finding APIs. Have too few available tools and you end up using the same ones over and over. Have too many tools and it becomes impossible to determine which one to use because there are too many search results. Something is clearly missing.
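To see why plain text matching is precarious, consider a minimal sketch (tool names and descriptions invented) that picks a tool by counting word overlap between the user's request and each tool's description, roughly the kind of matching criticized above:

```python
# Invented tool catalog: name -> description.
TOOLS = {
    "get_weather": "Get the current weather for a city",
    "send_email": "Send an email to a recipient",
    "search_flights": "Search for flights between two airports",
}

def pick_tool(user_request):
    """Naive text matching: count shared words, return the best tool."""
    words = set(user_request.lower().split())
    best, best_score = None, 0
    for name, description in TOOLS.items():
        score = len(words & set(description.lower().split()))
        if score > best_score:
            best, best_score = name, score
    return best

print(pick_tool("what's the weather in Lisbon"))  # get_weather
print(pick_tool("make my day brighter"))          # None: no word overlap
```

The first request happens to share the word "weather" with a description, so it works; the second expresses a perfectly valid intent but shares no vocabulary with any description, so nothing matches. Scale the catalog up and the opposite failure appears: many descriptions share words with the request and the score no longer discriminates.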
Another solution is the Model Context Protocol or, in short, MCP. Instead of offering a full solution like LangChain does, it focuses on the communication between the LLM and the available tools. It does that by adding a layer of context that the LLM uses to infer which tool, or tools, to use during runtime. Now, deciding which tool to use isn't just a text match as before. Now you have structured information that the LLM can use to identify which tool to use. That's one of the differences. The other differences include matching the prompt with the tool input parameters, aligning the choice of tool with the user's intent, and even going back to the chat history to see if there's a preferred tool. In short, there's a semantic alignment behind the choice of the best tool to use to fulfill a specific intent.
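The structured information MCP adds looks roughly like this. The sketch below shows the shape of a tool entry as exposed by an MCP server's `tools/list` operation (simplified; the real protocol carries more fields), with the tool name and parameters invented for illustration:

```python
import json

# Simplified shape of one MCP tool description: a name, a human-readable
# description, and a JSON Schema describing the tool's inputs.
tool = {
    "name": "get_weather",
    "description": "Get the current weather for a given city",
    "inputSchema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
        },
        "required": ["city"],
    },
}

print(json.dumps(tool, indent=2))
```

Because the inputs are declared as a schema rather than buried in free text, the LLM can match a prompt like "what's the weather in Lisbon?" against structured fields: it knows which parameters exist, which are required, and what types they take.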
So, why do we need to have an MCP between the LLM and the APIs? In theory, it should be possible to offer the semantic alignment needed to discover tools directly using OpenAPI. It's possible, but it's not efficient. And it's not efficient by design. Web APIs are meant to be flexible. You should be able to be flexible about your input parameters, the names of the resources you're exposing, and the response payload structure. All this flexibility makes it possible to have completely different APIs. The result is that any system that needs to interface with several APIs is not as efficient as it would be if it had to interface with just one API. Interfacing with one kind of API means the input parameters can be hard-coded into the client, the response payload is well known, and the names of the operations are always the same. And that is exactly what MCP does. It offers a JSON-RPC API as the communication layer between the LLM and any external third-party APIs.
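That uniformity is visible in the wire format. Whatever the downstream third-party API looks like, an MCP client always sends the same JSON-RPC 2.0 envelope; the sketch below shows a simplified `tools/call` exchange (tool name and payload invented):

```python
import json

# The request envelope is always the same shape: jsonrpc, id, method, params.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "get_weather", "arguments": {"city": "Lisbon"}},
}

# The response envelope is equally uniform: the tool's output comes back
# as a list of content items, regardless of the API behind it.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "18°C, clear sky"}]},
}

print(json.dumps(request))
```

A client only ever needs to understand this one envelope, which is why hard-coding the interface is possible in a way it never is across heterogeneous Web APIs.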
In summary, it's now possible to have efficient intent-based API discovery. MCP is the glue that was missing before. It connects the LLM to any API through an efficient, standardized JSON-RPC interface. The LLM, in turn, is the bridge between end users and the catalog of available APIs: it's the piece that captures the user's intent and translates it into the language of the machine to then connect to the most fitting API.
This post was originally published on the API Changelog newsletter as "Intent-based API Discovery."