A trace is worth a thousand logs // David de la Iglesia Castro // Agents in Production 2025

SUMMARY
As AI Agents move from simple chatbots to complex, multi-step autonomous systems, our methods for understanding their behavior must evolve. We'll explore how a single, structured trace—capturing the full chain of thought, tool calls, and model interactions—provides a complete narrative of an agent's execution. These traces can be leveraged to rapidly debug failures, build robust evaluation suites, and create golden datasets for regression testing, ultimately enabling you to build more reliable and predictable agents.
TRANSCRIPT
David de la Iglesia Castro [00:00:00]: So thank you everyone. My name is David and I am currently working as an AI engineer at Mozilla AI, and the talk today is going to be about what's a trace and why it is worth a thousand logs. To get things started, I'll give you a little bit of background on what I am currently doing at Mozilla AI. This is not to try to be pretentious or to sell you on why you should listen to me; it's just to explain where I am coming from when I make the following statements about what's an agent and what's a trace. This is the context: I am currently working on any-agent. This is an open source project where you can use a single interface to interact with the many different agent frameworks that are currently out there.
David de la Iglesia Castro [00:00:59]: Think of it maybe as LiteLLM, which provides a single interface to multiple LLM providers; kind of the same, but for agent frameworks. As part of this work I have been contributing minor contributions, and some major contributions, to some projects: Agno, Google ADK, LangChain, LlamaIndex, OpenAI Agents, smolagents. Basically every framework that we use under the hood in any-agent has some sort of bug or feature that we needed. It's very early days for agent frameworks overall, so it's an exciting time to work on this. There is still a lot of work to be done, and we have also been contributing to some side projects around the ecosystem of agent frameworks, like OpenInference for generating traces, the MCP protocol, the Agent2Agent protocol, and LiteLLM, which we use to connect to different LLM providers. So this is what I have been doing, and this is where I am coming from: what I have used to make this definition. So, starting off with what's an agent? I really like to simplify it to three very simple concepts that I feel best define how an agent behaves. The first is the instructions that you give to the agent.
David de la Iglesia Castro [00:02:19]: This is also sometimes called the system prompt, and this is where you tell the agent how you want it to behave. Then there are the tools that are available for the agent to perform actions beyond just outputting text. You expect the agent to understand the tool calling syntax, so you expect the agent to respond with a specific syntax whenever it wants to call a tool. And finally there is the actual model, the LLM powering all this. In my mind, a very simple view of what's an agent is these three components stitched together. That's exactly what we are trying to do when we define the any-agent API. Every framework has a different API, and we try to create this abstraction layer on top where we focus on this very simple concept.
David de la Iglesia Castro [00:03:12]: Here's a snippet of how you would create an agent with any-agent. You can see you define the model, so the underlying LLM, you give instructions, and you give some tools, and then you just run it with a query. This matches the three components that I mentioned before. Under the hood, this is just calling whatever agent framework you pick. This first argument, where we are choosing tinyagent, could be Google ADK, OpenAI, or another framework. And under the hood, I can tell you, having seen the code of every framework in some way or another: they all implement this very simple loop.
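The slide snippet itself isn't captured in the transcript. As a stand-in, here is a self-contained sketch with the same shape described above: a framework choice, a model, instructions, and tools. The class and field names here are illustrative, not the actual any-agent API:

```python
from dataclasses import dataclass, field
from typing import Callable

def search_web(query: str) -> str:
    """Stand-in tool: a real agent would call a search API here."""
    return f"results for {query!r}"

@dataclass
class AgentConfig:
    """The three components the talk identifies: model, instructions, tools."""
    model: str                                        # the underlying LLM
    instructions: str                                 # the system prompt
    tools: list[Callable] = field(default_factory=list)  # actions beyond text output

# The first argument on the slide picks the framework used under the hood;
# "tinyagent" could be swapped for another supported framework.
framework = "tinyagent"
config = AgentConfig(
    model="gpt-4o-mini",
    instructions="Answer briefly.",
    tools=[search_web],
)
```

The point of the abstraction is that `config` stays the same while `framework` varies, which is what lets the same agent definition run on different frameworks.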
David de la Iglesia Castro [00:03:54]: Some frameworks put more abstractions in between, some keep the loop very simple, but that's basically it. This is how agents are implemented in these frameworks. You have an initial history of messages with just the instructions and the user prompt. Then you have an infinite loop, a while loop, where you start calling the LLM with the current history and you check whether the LLM wants to execute tools or not. If it wants to execute tools, you execute them, you append the results to the history of messages, and you call the LLM again. You just repeat this until the LLM says, okay, I think I got the final answer.
David de la Iglesia Castro [00:04:39]: So you can just stop the loop because there are no more tools left to be called. This is how agents are implemented. All the framework that I listed before, I might be not familiar with some framework that does something completely different to this. But to me today, this is what some people might ask questions about. Some other things that some frameworks implement, like memory knowledge on multi agent systems. In my mind and what I have seen in the code, those things are basically just tools. If you think about it, the way they are implemented in the framework, they could be just exposed as external tools. Similar to how you give the agent a tool to search the websites.
David de la Iglesia Castro [00:05:28]: You could give it a tool to access some external memory, to access some external knowledge or RAG system, and you can even expose other agents just as tools. So this doesn't break the assumption of the previous loop. For all three of these things, you can try to be smart and implement them inside the framework, or you can just implement them as tools and let the agent figure out when to use them. And I think this is the right way to go: to implement everything you can as a tool. So with this, that's an agent. Now, what's a trace? Again, all the different frameworks have a different opinion on what's a trace. What we do in any-agent is use this core concept under the hood where a trace is just what OpenTelemetry calls a trace.
David de la Iglesia Castro [00:06:22]: It's a sequence of JSON objects. Each individual object is called a span, and it represents an operation. In this while loop, we have a span for calling LLMs, and we record what the input arguments were, what the output arguments were, and some additional metadata, and we do the same for every operation that we make in that loop. To simplify it, here I am just showing two spans: one to call an LLM and one to execute a tool. We have this core structure under the hood that we construct as we are running through the loop. Every framework does the same thing but implements a different Python object. What we are trying to do in any-agent is just to abstract that and construct a single OpenTelemetry trace.
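To make the shape concrete, here are two illustrative spans of the kind described above, loosely modeled on the OpenTelemetry generative-AI attribute naming. The exact attribute set is an assumption and varies by convention version:

```python
# A trace as a sequence of spans: one LLM call, one tool execution.
trace = [
    {
        "name": "call_llm gpt-4o-mini",
        "attributes": {
            "gen_ai.operation.name": "chat",
            "gen_ai.request.model": "gpt-4o-mini",
            # inputs and outputs recorded as serialized message lists
            "input.messages": '[{"role": "user", "content": "Look it up."}]',
            "output.messages": '[{"role": "assistant", "tool_calls": "..."}]',
        },
    },
    {
        "name": "execute_tool search_web",
        "attributes": {
            "gen_ai.tool.name": "search_web",
            "tool.args": '{"query": "otel"}',
            "tool.output": "results for 'otel'",
        },
    },
]
```

One full run of the loop produces one such list, with one span per LLM call and per tool execution, in order.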
David de la Iglesia Castro [00:07:12]: We try to follow the OpenTelemetry semantic conventions for generative AI. This is a work in progress by the OpenTelemetry organization, and we try to do that so it's consistent across frameworks. A single LLM call looks like this for Google ADK, for OpenAI Agents, for LangChain: it all looks the same. That's one of the core things that we are trying to focus on in any-agent: standardizing that trace. When you run it, you are going to see in the console just a more beautiful representation of the trace, but the core thing is still the same JSON object. You just take that JSON object and you export it to different formats.
David de la Iglesia Castro [00:07:58]: In this case we're exporting it to the console and using rich to print it in a more visually pleasant way, but the core object is still that JSON span that I have shown before. There are plenty of open source and non open source platforms that allow you to visualize these traces in a much fancier way, with many more details. I'm listing some of them here: Weights & Biases has a trace visualization, Arize Phoenix, Langtrace, LangSmith. So there are a lot of them, but under the hood the concept is the same: a trace is an OpenTelemetry object that contains information about the different steps that the agent took to reach an answer, and you can render it with different UI components.
David de la Iglesia Castro [00:08:49]: They may look better, they might expose some more information, but the important thing is that under the hood, in your Python code that is running the agent, these are just the JSON objects that I showed a couple of slides before. I think this is really important to understand, because in the end, if that's an internal JSON object that you are exporting to these external services, why not just return that object to the user so you can manipulate and inspect it in your Python code as well? This is another thing that we are trying to do. Whenever you run an agent in any-agent, we return the actual OpenTelemetry trace wrapped in a Pydantic object called AgentTrace. You can manipulate and inspect the same things that you are looking at in these fancy UIs, where you see the trace represented as a tree and can inspect things. I think this is really powerful because it opens up the possibility to run assertions, to run whatever you want in Python, in the same context where you're executing your agent, without needing to interact with external services.
David de la Iglesia Castro [00:10:10]: So I think the platforms and external services are great for visualizing traces, but I think it's also very powerful to have access to that same underlying data in your Python code. That's what any-agent does: it returns the same trace that you're looking at in these fancy UIs. I think that's it for the lightning talk. Yeah, I don't have much time.
Adam Becker [00:10:43]: David, thank you very much. We do have a couple of questions if you can stick around for that one. Ravi is asking, have you considered traces being part of context?
David de la Iglesia Castro [00:10:57]: No, I do not. I think that traces just register the operations that run in that loop. I think it is possible that you could alter the context to inject some information from the trace, but I see those as two separate things. I just see the trace as a view of the steps the agent is taking. We do provide some functionality in any-agent, in the form of callbacks, where you can actually try to fetch information from the trace to inject it into the context, but that's more advanced. I like to think of those as two separate concepts.
Adam Becker [00:11:43]: Another one from Bruce Richards. What is the rationale behind supporting various third party agent frameworks instead of just building one or choosing one to build on?
David de la Iglesia Castro [00:11:53]: Yeah, so when we started, what we wanted to do was just to evaluate different agent frameworks. That was the original idea. The problem we found is that they were using way too many different APIs and ways to define and run agents to do the same thing. So we built this abstraction layer on top just to be able to compare them. Then eventually we did write our own framework: this tinyagent thing is basically just the while loop that I showed, implemented. I think it's still valuable to have a single way to define an agent and try to run it in different frameworks, to understand what each of the frameworks does differently. Without that, it's hard to spot, because you have different code.
David de la Iglesia Castro [00:12:45]: But having this single interface allows you to realize, for example, that small agents does actually a lot of things to create a very elaborated system prompt. Even if the definition of the agent is the same, it allows you to surface these things that the frameworks are doing under the hood and understand the difference better.
Adam Becker [00:13:07]: David, one more question from me. You mentioned that being able to ingest those traces back in a pythonic way and exposing that to the developers might be very interesting. What do you have in mind? What are the directions that people can take this in?
David de la Iglesia Castro [00:13:22]: I think in the final slide, what I showed is that the agent run returns a trace. I really like the idea that with that, you can write tests for agents, like you run simple assertions for Python workflows. Using this trace object that you get returned, it is super easy to just make an assertion that this tool was used, or that these arguments were the correct ones. I really like the concept of building tests on top of this agent trace. That's personally one of the things I like, but we are exploring other things.
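As a sketch of that idea, a test can filter the spans of a returned trace and assert on tool usage. The trace shape and attribute names below are illustrative stand-ins, not the actual AgentTrace schema:

```python
import json

def assert_tool_called(trace: list[dict], tool_name: str, **expected_args) -> None:
    """Fail unless some span records a call to `tool_name` with the expected args."""
    for span in trace:
        attrs = span.get("attributes", {})
        if attrs.get("tool.name") == tool_name:
            args = json.loads(attrs.get("tool.args", "{}"))
            if all(args.get(k) == v for k, v in expected_args.items()):
                return  # found a matching tool-call span
    raise AssertionError(f"no span found calling {tool_name} with {expected_args}")

# Example: a trace containing one tool-execution span (hypothetical attributes).
trace = [{"attributes": {"tool.name": "search_web", "tool.args": '{"query": "otel"}'}}]
assert_tool_called(trace, "search_web", query="otel")  # passes silently
```

Because the trace is returned in the same Python process that ran the agent, assertions like this fit naturally into an ordinary pytest suite, with no external tracing service involved.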
Adam Becker [00:14:02]: Yeah. David, what's a good way to connect with you and to follow this journey of yours? Are you on Twitter or on LinkedIn? Where can we send people?
David de la Iglesia Castro [00:14:11]: I can send you the LinkedIn, and you can also find me at the any-agent GitHub project. Create issues, try the project. I constantly work on that project, so I keep track of the issues and stuff.
Adam Becker [00:14:27]: Awesome, David, thank you very much. I'll drop the links below and it was wonderful having you.
David de la Iglesia Castro [00:14:36]: Thank you.
Adam Becker [00:14:37]: Best of luck.
