APIs - the ears, eyes and hands of the AI
Exploring the relationship between AI and APIs in application design
AI is often depicted by a brain symbol. It makes sense — AI systems try to mimic human intelligence.
However, we often forget that the human brain is so powerful because it does not exist in isolation - it does not exist in a bubble of pure thoughts and ideas. It is connected to the world around it, from which it receives input and feedback, and it acts in the world based on its thoughts and plans. Human intelligence comprises the perception of input through the senses and output in the form of actions. The nervous system connects the brain to the organs of input and output: the eyes, ears, mouth, hands, and legs.
Just like the human intelligence it tries to mimic, a powerful AI needs to be connected to the world around it. It receives input from its (digital) environment and acts in the (digital) world.
As we shall see, APIs play a very important role in connecting an AI service to the world around it. In fact, connecting systems is what APIs are very good at. There are at least two basic patterns for the AI - API relation, and we will explore them in the following sections.
1. APIs are the eyes and ears of the AI
When we see or hear an interesting quote, we start to think about it. Our brain receives input via eyes and ears, gets triggered, and starts to process.
How does the AI receive input from its environment? It needs a prompt. The API plays the role of the “eyes and ears”: it delivers a prompt as input into the AI system for processing. Via the API, applications can send prompts, tasks, and input to the AI.
AI models are often packaged in the form of an API. For example, the AI model behind the well-known ChatGPT is available as an API, and many other AI models are served the same way. The API allows us to trigger the AI and send our prompts as input. Via APIs, developers can integrate AI into their applications and make them “smart”.
If this sounds too abstract and philosophical, here is the code for an API that triggers the AI…
Code for “APIs are the eyes and ears”
Here is an API call that delivers input to the AI for processing.
The following API request (shown as a curl call) triggers the AI model with the input prompt “Say this is a test”.
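A sketch of what such a curl call against OpenAI's completions endpoint looks like. Note the assumptions: `$OPENAI_API_KEY` must hold a valid API key, and the model name (`text-davinci-003` was the documented example at the time of writing) changes over time as models are deprecated.

```shell
curl https://api.openai.com/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "text-davinci-003",
    "prompt": "Say this is a test",
    "max_tokens": 7,
    "temperature": 0
  }'
```

The prompt travels in the request body; the completion comes back in the JSON response.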
The AI receives the prompt and computes its completion as output. The response to this API call contains the completion: “This is a test”.
But the analogy does not stop here.
2. APIs are the hands of the AI
When an AI model computes its response to a prompt, the output is typically some text. I like to compare the text output of the AI to an idea or a thought we might have in our head — an idea is quite abstract, and we have a lot of ideas. We humans evaluate our ideas and put some of them into action because we know that good ideas only have an impact when they are followed up by actions.
For us humans, this means that the brain tells us to act, e.g., to do something with our hands in order to change our environment. For example, if we feel warm, we move into the shade or switch on the air conditioning. We act. Ideas are nice, but they alone don’t change or improve the situation. Only by acting on our ideas can we improve it. Quite simple. What is the corresponding construct that would allow an AI to act?
How can an AI service put the ideas computed by an AI model into action?
In short: By calling APIs.
APIs are the hands of an AI service; they allow it to act.
APIs make it possible to trigger actions in the real or digital world, such as making a payment, sending a text message, or adjusting a thermostat to change the room temperature. If the AI service can call APIs, it can break out of its bubble of text output, translate ideas into actions, and act in its environment.
If this sounds too philosophical, here is the code…
Code for “APIs are the hands of the AI”
Here I will show you how ChatGPT can call APIs. The mechanism is called ChatGPT Plugins and consists of (1) an API implementation, (2) an OpenAPI specification, and (3) a manifest file that links it all together and provides an overall description of the API, so the AI can determine for which requests it should call the API.
In the following, I show some sample code snippets for a ChatGPT Plugin that manages a TODO list, modeled on the example published by OpenAI.
The API implementation:
The API specification (in OpenAPI 3):
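A minimal excerpt of what the OpenAPI document might look like; only the GET operation is shown, and the server URL (a plugin served locally on port 5002) is an assumption for illustration.

```yaml
openapi: 3.0.1
info:
  title: TODO Plugin
  description: A plugin that allows the user to manage a TODO list.
  version: "v1"
servers:
  - url: http://localhost:5002
paths:
  /todos/{username}:
    get:
      operationId: getTodos
      summary: Get the list of TODOs
      parameters:
        - in: path
          name: username
          schema:
            type: string
          required: true
          description: The name of the user.
      responses:
        "200":
          description: OK
```

The `operationId` and the parameter descriptions matter: they are part of what the AI reads to decide how to call the API.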
The plugin manifest file binds implementation, spec, and descriptions together:
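The manifest is conventionally served at `/.well-known/ai-plugin.json`. The sketch below follows that format; the URLs and contact details are placeholders.

```json
{
  "schema_version": "v1",
  "name_for_human": "TODO Plugin",
  "name_for_model": "todo",
  "description_for_human": "Plugin for managing a TODO list.",
  "description_for_model": "Plugin for managing a TODO list. You can add, remove and view your TODOs.",
  "auth": { "type": "none" },
  "api": {
    "type": "openapi",
    "url": "http://localhost:5002/openapi.yaml"
  },
  "logo_url": "http://localhost:5002/logo.png",
  "contact_email": "support@example.com",
  "legal_info_url": "http://example.com/legal"
}
```

Note the two descriptions: `description_for_human` is shown to users, while `description_for_model` is what the AI uses to decide whether a request should trigger this plugin.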
When the plugin is activated and a prompt mentions anything around managing the TODO list (adding, removing, or updating a TODO item), the API will be called with the appropriate parameters.
With this API, the AI service gains the capability to act in the digital world by adding, removing, or updating TODO list items. As a result, the API can be considered the hands of the AI.
Safeguarding API usage
When an AI can call APIs that trigger actions in the real or digital world, we need to put some safeguards in place. The API, or an API management system, is probably a good place to define and enforce such safeguards. Safeguarding API usage in an AI context needs to be studied carefully - it likely exceeds the capabilities that are typically available today. It is an interesting field of research!
Summary: Two AI-API Patterns
APIs and AI technologies need to be applied synergistically to create powerful applications. In fact, we uncovered two patterns for using AI and APIs: (1) When an application integrates AI functionality, it typically calls an AI API and passes input in the form of a prompt. (2) When an AI service needs to get an action done, or simply needs some real-time data, it might call an API.
When building applications that leverage AI, these two patterns might be used individually, in combination, or even orchestrated. I will explore this in one of my next posts. Stay tuned.