Many of you may not realize it, but behind the 'simple' AI chats you converse with, extremely important changes have taken place in recent months. They are so significant, and yet so seamless, that you may not notice that what you are interacting with is, most likely, not a simple generative AI but an AI Agent.

But before you get scared off by such an advanced topic, let me say that this article does require some background, though there is no reason to be discouraged. AI is not magic that happens all at once; it is a resource that invites us to grow and develop new skills. The impact it can have on our personal and professional lives is too significant to ignore.
I promise to be as straightforward as possible and take you step by step on this journey. You don't have to be an expert to start discovering how AI can transform your world; just take a little time to read this article, maybe one level a day.
Ready to explore?
Level 0 - AI Chat
A 'simple' LLM is the language model at the base of modern Generative AI. It is a system that, in a nutshell:
Is trained on vast amounts of data
Undergoes a phase of fine-tuning, validation, and optimization
Once ready, it can:
Understand the text, images, and words we provide in a prompt
Reason about our requests through inference
Produce content (text, audio, images, video, computer code)
We normally interact with it through a chat interface, such as the now-classic ChatGPT. You probably already know a lot about this, so I will not discuss it further here.

In practice
A good way to use these assistants is to give them a role or specialization in the initial prompt to help them produce better results. We can start a conversation by asking the chat to assume the role of a typical writer, editor, or reader, and then interact with it while it plays that role. With a lot of switch time (i.e., time spent moving between applications), we would do things like:
new conversation: "You are a content writer for ACME Company, with this tone of voice..., this target readership... etc... Produce a text on [topic]"
Take the output and start a new conversation with a new role: "You are an expert editor, capable of making a text fluid and fluent, correcting grammatical errors, syntax errors ... "
New conversation with final test "You are a reader of the ACME site, you like reading things of type X with style Y, what do you think of this article: [article]" And at this point possibly start again from scratch if the 'typical reader' does not appreciate the content.
Lots of manual work, risk of error, little automation.
Level 1 - Workflow with AI
Among the various opportunities that emerged with LLMs, one of the first was to insert them into existing workflows, to exploit their potential and automate existing or new processes.
In essence, with a bit of computer code, you can automate all the steps between the different phases of a dialogue with an AI or with other software tools.
In practice
In this example, given a link to an article, you can automate the creation of posts on LinkedIn, Facebook, and X through three separate conversations, each specialized in writing for a different social network, thus avoiding having to repeat the initial prompts each time.
Here the AI model is treated as a piece of software inside a process, one whose job is to generate text (which is supposed to be good on the first try... not a trivial topic at all).
The process is clear: we designed it, we can repeat it 1,000 times, and it will always be the same; what changes are the input and the output. And we chose a platform (in this case make.com) to orchestrate everything.
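For readers who prefer code to diagrams, here is a minimal sketch of the same pipeline in Python, assuming the OpenAI Python SDK (any LLM provider would do, and make.com builds the equivalent flow visually, without writing code). The prompts and model name are illustrative, not a recipe.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# One specialized "conversation" per social network, so the initial prompt
# never has to be retyped by hand.
SOCIAL_PROMPTS = {
    "linkedin": "You are a LinkedIn content writer for ACME Company. Professional tone, short paragraphs, end with a question.",
    "facebook": "You are a Facebook content writer for ACME Company. Friendly, conversational, add a call to action.",
    "x": "You are an X (Twitter) content writer for ACME Company. Max 280 characters, punchy, one or two hashtags.",
}

def write_post(system_prompt: str, article_text: str) -> str:
    """Run one specialized conversation: a system role plus the article to promote."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Write a post promoting this article:\n\n{article_text}"},
        ],
    )
    return response.choices[0].message.content

def article_to_posts(article_text: str) -> dict[str, str]:
    """Produce one post per network from the same input article."""
    return {network: write_post(prompt, article_text) for network, prompt in SOCIAL_PROMPTS.items()}
```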
An advantage? It does not necessarily have to start from a user request: it could start automatically at set intervals, or after previous actions (e.g., I receive an email, a row is updated in a spreadsheet, etc.).
But...
when we fell in love with ChatGPT and its peers because they had acquired the ability to speak, we perhaps did not realize that the technology was also learning to do. It was starting to become not only a generator of dialogue and content but an entity capable of performing actions: it was becoming an AI Agent.
Level 2 - AI Agents
I'll start by saying that if you look for definitions on the internet, there are plenty of them: there is no universal taxonomy yet. Some see them as autonomous entities capable of acting in the digital space with enormous powers. Others associate them with the simple IT workflows that have existed for decades, just with a little more reasoning capacity. If you want an 'almost wonderful' explanation of what they are, Anthropic recently provided one.
Having said that, I will try to give my non-technical definition, which I may evolve over time:
An AI Agent is a system based on generative artificial intelligence, capable of using tools and real-time data, thinking, and interacting autonomously with the environment. It can do this using traditional software or other types of AI to complete assigned tasks.

And here comes the interesting part: Besides responding to our request (the prompt), an AI Agent can use external tools - software or other AI - to interact and act in a completely new way.
First, a Level 2 AI Agent goes beyond the traditional view of LLM models, which considered them only as text generators within externally controlled workflows.
Level 2 Agents have built-in tools such as web search, internal memory access, code interpreters, or image generators.
Different AI players have responded with different approaches, which I will try to describe below, very briefly and certainly skipping many considerations. But I would like to be very practical and raise your awareness of what agents can do, alongside real examples that you can try yourself, including custom tools you can give the agent to do almost whatever you like in the software realm.
Level 3 - AI agents equipped with custom tools
Since OpenAI introduced customizable GPTs, we can plug these assistants into almost any kind of software and let them do the work for us. LLMs not only understand what we want; they already know how to interact with external software via APIs (interfaces that allow different programs to talk to each other). So they can translate our intent into actions by making calls to those programs in their own language.
In practice
Look at this example: I set up a GPT to manage my calendars. I ask it when I have free slots; it accesses three different calendars (already connected), searches through my commitments, and answers with incredible clarity. No clicks, no effort. And, of course, it can create new events too!
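To give an idea of what sits behind a GPT like this, here is a hedged sketch of the pattern: the agent is handed a tool description and decides when to call it, while our own code performs the real calendar lookup. The function and field names below are hypothetical; an actual custom GPT exposes the same capability through an Action described with an OpenAPI schema.

```python
# Hypothetical tool description, in the JSON-schema style used for function calling.
FIND_FREE_SLOTS_TOOL = {
    "type": "function",
    "function": {
        "name": "find_free_slots",
        "description": "Return free time slots across the user's connected calendars.",
        "parameters": {
            "type": "object",
            "properties": {
                "start_date": {"type": "string", "description": "ISO date, e.g. 2025-03-01"},
                "end_date": {"type": "string", "description": "ISO date"},
                "min_minutes": {"type": "integer", "description": "Minimum slot length in minutes"},
            },
            "required": ["start_date", "end_date"],
        },
    },
}

def find_free_slots(start_date: str, end_date: str, min_minutes: int = 30) -> list[dict]:
    """Hypothetical implementation: query each connected calendar and merge the gaps."""
    # ...call the Google / Outlook / Apple calendar APIs here and compute the free gaps...
    return [{"start": f"{start_date}T10:00", "end": f"{start_date}T11:00"}]
```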

By default, OpenAI GPTs come with a set of built-in tools: web search, access to internal file storage, image generation, and a code interpreter. But you can add as many as you want. You can go wild with a simple ChatGPT premium license and build your own starting from here, or explore a huge catalog of ready-made agents to try for free. But each manufacturer has its own strategies.
Memory
Every conversation with an AI Agent is available to the agent (i.e., it knows what you asked and what it answered). I would call this Short-Term Memory. In addition, each Agent can, through various tools, have a Long-Term Memory: a set of documents that you provide and that it will refer to when necessary. Finally, by giving an agent access to external software, be it web search or, as in the example above, a calendar system, the agent also gains a Real-Time Memory.
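Seen from the code side (a conceptual sketch with names of my own, not any product's actual API), the three kinds of memory simply become different pieces of the context sent to the model on every request:

```python
def build_context(instructions: str,
                  history: list[dict],          # Short-Term Memory: this conversation so far
                  retrieved_docs: list[str],    # Long-Term Memory: documents you gave the agent
                  tool_results: list[str]) -> list[dict]:  # Real-Time Memory: fresh data from external software
    """Assemble the message list the agent's LLM actually sees (exact format varies by provider)."""
    messages = [{"role": "system", "content": instructions}]
    if retrieved_docs:
        messages.append({"role": "system",
                         "content": "Reference documents:\n" + "\n---\n".join(retrieved_docs)})
    messages.extend(history)
    for result in tool_results:
        messages.append({"role": "system", "content": f"Live tool result: {result}"})
    return messages
```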
But we are still in a situation where a USER makes a request, and an LLM processes it, decides whether to use one of the available tools, and provides a response.

Essentially each agent has instructions (which give it a role and describe how to behave), access to different types of memory, and access to software tools configured within it.
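Put together, the request, decision, optional tool call, and response form a small loop. The sketch below assumes the OpenAI Python SDK's function-calling interface and reuses the hypothetical find_free_slots tool from the calendar sketch a few paragraphs above; other providers follow the same pattern under different names.

```python
import json
from openai import OpenAI

client = OpenAI()
TOOL_FUNCTIONS = {"find_free_slots": find_free_slots}   # the Python functions we expose to the agent
TOOL_SPECS = [FIND_FREE_SLOTS_TOOL]                      # their JSON-schema descriptions

def agent_turn(messages: list[dict]) -> str:
    """One agent turn: the model decides whether to answer directly or call a tool first."""
    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=TOOL_SPECS)
    message = response.choices[0].message
    if not message.tool_calls:                           # no tool needed: answer directly
        return message.content
    messages.append(message)                             # keep the model's tool request in context
    for call in message.tool_calls:                      # execute each requested tool
        result = TOOL_FUNCTIONS[call.function.name](**json.loads(call.function.arguments))
        messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})
    final = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=TOOL_SPECS)
    return final.choices[0].message.content
```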
Level 4 - AI agents capable of executing code
An LLM is able to generate code, that is, to CREATE ON THE FLY the tools needed to solve the requested task. On this very broad topic I run a column on 01Net called AI Cookbook, which I invite you to follow if you are passionate about coding (and speak Italian).
But it's one thing to write code, and another to execute it. Well, some agents also have the ability to execute, in special work environments, the code they just generated. And that's a big evolutionary leap!
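To make the difference concrete, here is a deliberately naive sketch of the "write it, then run it" loop. It is not production-safe: real products execute generated code inside isolated sandboxes (containers, micro-VMs, or, in the case of Claude's artifacts, a runtime inside the page), while this toy simply runs the code in a subprocess with a timeout.

```python
import subprocess
import sys
import tempfile

def run_generated_code(code: str, timeout_s: int = 5) -> str:
    """Execute model-generated Python in a subprocess and return whatever it printed."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    completed = subprocess.run([sys.executable, path], capture_output=True, text=True, timeout=timeout_s)
    return completed.stdout or completed.stderr

# generated_code = ask_llm("Write Python that prints the sum of a triangle's three angles.")  # hypothetical call
generated_code = "print(60 + 70 + 50)"      # stand-in for what the model might produce
print(run_generated_code(generated_code))   # -> 180
```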

In practice
Anthropic has been following its own path since it introduced, with Claude 3.5 Sonnet, the ability to create artifacts: computer code that runs inside a workspace available right in the conversation.
Let's take an elementary example: suppose we ask an LLM to prove that the sum of the three angles of a triangle is always equal to 180°. A traditional LLM would answer the question with a lot of more or less simple text.
If I ask Claude... well, it will understand that the best way to answer is to create a small piece of software that helps me understand it visually.

But what do we do with computer code if we are not technicians? Claude is also able to RUN it, as you can see by clicking here.
(Claude is not the only one to provide this possibility but, again, I don't want to go into too much detail to avoid very, very technical ramblings.)
Level 5 - Agentic AI
Yes, only the ending of the name has changed, but this level ties everything together, combining all the capabilities seen so far.
So far we have understood that an AI agent is able to generate text or computer code, use the tools we give it, and execute code, whether already written or just generated. It does all this autonomously, based on how it believes our goal should be achieved.
The definition given above for an AI Agent is narrow, since these systems bring together all the capabilities seen so far. So, try to see all these capabilities as if they were different notes:
A. Understanding requests (text, images, videos, sounds)
B. Accessing memory (Short-Term, Long-Term, Real-Time)
C. Producing text (or images, videos, sounds)
D. Producing structured reasoning, which can become processes
E. Calling other software (traditional software, or other AI agents)
F. Producing computer code
G. Executing computer code
Are there seven of them? Good. How many symphonies have been composed to date with only seven notes? How many solutions can be produced by putting together these seven capabilities of an Agentic AI system?
So what is an Agentic AI system? Here is another definition (again, a personal one):
Advanced collaboration systems between AI agents, each equipped with software tools and with interaction, reasoning, and memory capabilities. These systems are able to analyze an assigned task, autonomously decide which tools to use, define the most effective process, and coordinate the available resources to achieve the set goal.

Take a moment for yourself and begin to imagine.
Autonomous interaction and workflow design
By putting together the same notes seen in the previous levels, an AI agent can solve specific tasks and design workflows autonomously. This means that the agent is able to combine tools and different software, and even involve other AI agents in the decision-making or operational process.
Imagine an AI agent that, to respond to a request, analyzes the environment in which it finds itself:
It identifies what tools, memories, or software it has available.
It considers whether there are other agents already configured, each with their own role and skill set.
It decides what steps to take, involving internal or external resources, and, if necessary, writes computer code to create a new tool (which in turn could be an AI agent).
It brings everything together in an interactive, task-specific workflow.
This approach eliminates the need for continuous manual intervention, transforming the agent into a sort of digital orchestrator: a system capable of creating dynamic connections and optimizing complex processes, even in real time, which will then provide a very sophisticated response after executing a workflow created for the occasion.
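A framework-agnostic way to picture this digital orchestrator is a two-step planner and executor: the model is asked to output its own workflow as JSON, and a thin runner executes it step by step. The ask_llm function and the tool registry below are hypothetical placeholders, not a specific product's API.

```python
import json

# Toy tool registry: in a real system each entry would be real software or another agent.
TOOLS = {
    "web_search": lambda text: f"(search results for {text!r})",
    "summarize": lambda text: f"(summary of: {text[:40]}...)",
    "draft_post": lambda text: f"(LinkedIn draft based on: {text})",
}

def ask_llm(prompt: str) -> str:
    """Placeholder: call your model of choice here and return its text reply."""
    raise NotImplementedError

def run(goal: str) -> str:
    # The model designs the workflow: which tools, in which order, with which inputs.
    plan_json = ask_llm(
        "Available tools: " + ", ".join(TOOLS) + "\n"
        f"Goal: {goal}\n"
        'Reply ONLY with a JSON list of steps: [{"tool": "...", "input": "..."}]'
    )
    result = goal
    for step in json.loads(plan_json):                            # the workflow was decided by the model...
        result = TOOLS[step["tool"]](step.get("input") or result)  # ...our code only executes it
    return result
```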
In practice
Continuing the previous example, suppose we ask an AI agent:
"Create a LinkedIn post to launch my article 'Why an AI project fails (and how to avoid it)'."
The agent, leveraging a system like AutoGen, generates a workflow on the fly, orchestrating different entities with specific roles:
Content Creator: The main agent starts by analyzing the article provided and identifies the key points to communicate. This agent is responsible for generating a first draft of the post, focusing on what to say.
Editor Agent: The draft is passed to a second agent, configured with the editor role. This assistant refines the text, correcting any syntax errors and improving the tone to make it suitable for LinkedIn.
Model Customer: Finally, the optimized text is submitted to a third agent, which simulates the feedback of a typical customer or ideal LinkedIn reader, providing ratings on the effectiveness and clarity of the message.
Final Output: Once the steps are completed, the workflow concludes with the presentation of the refined post, ready for publication.
Thanks to this structure, the system combines multiple agents, each with a specific expertise, and decides the workflow itself, producing a high-quality result without the need for human intervention during the process.
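Sketched against AutoGen's 0.2-style Python API (class names and options change between versions, so treat this as a blueprint rather than copy-paste code), the example above could look roughly like this:

```python
import autogen

llm_config = {"config_list": [{"model": "gpt-4o", "api_key": "YOUR_KEY"}]}

# The three roles from the example, each with its own instructions.
content_creator = autogen.AssistantAgent(
    name="content_creator",
    system_message="Read the article and draft a LinkedIn post with its key points.",
    llm_config=llm_config,
)
editor = autogen.AssistantAgent(
    name="editor",
    system_message="Polish the draft: fix grammar and tighten the tone for LinkedIn.",
    llm_config=llm_config,
)
model_customer = autogen.AssistantAgent(
    name="model_customer",
    system_message="You are the ideal reader. Judge clarity and impact; approve or request changes.",
    llm_config=llm_config,
)
user_proxy = autogen.UserProxyAgent(
    name="user", human_input_mode="NEVER", code_execution_config=False,
)

# A group chat lets the manager decide who speaks next, i.e. the workflow is not hard-coded.
group = autogen.GroupChat(
    agents=[user_proxy, content_creator, editor, model_customer],
    messages=[], max_round=8,
)
manager = autogen.GroupChatManager(groupchat=group, llm_config=llm_config)

user_proxy.initiate_chat(
    manager,
    message="Create a LinkedIn post to launch my article 'Why an AI project fails (and how to avoid it)'.",
)
```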
I find this to be one of the most powerful, fast, and impactful innovations ever made! But, like any great innovation, it requires great control. Why? To avoid the famous Paperclip Maximizer problem, conceived by philosopher and scientist Nick Bostrom.
Here’s what it’s all about: Imagine designing an AI with a very simple task, say producing paperclips. If not managed properly, this AI could pursue its goal with such dedication that it would consume all available resources, even those vital to us, just to maximize the number of paperclips produced. It could turn the entire planet – or even the universe – into a giant paperclip factory.
Sounds crazy, right? But it's an extreme example of what can happen when a system that is too powerful pursues a goal without considering context or ethical limits. Want to see what I'm talking about? I've created a simulator, built entirely with AI tools, that will let you see the Paperclip Maximizer in action. Check it out!
So what...
The post would continue... I have already written much more, but I decided to split it into two parts because I realize that, even at this point, there is already a lot to think about.
Agentic AI applications are moving at the speed of light, and many manufacturers are focused on making them happen.
They are not a panacea for all problems; in fact, they are not the default option for every interaction with LLMs. But they are an important milestone on the path toward a more general AI...
In the next part I will talk about reasoning skills, how to coordinate these kinds of tools in an organization that also includes us humans, and how to deal with the concept of agency in order to maintain control.
If you haven't already, I also suggest checking out my post on AI roles to better prepare yourself.
See you in part 2. And as always, enjoy AI responsibly!
Max