Hundreds of Users Love Our Data Analyst AI Agent // Ioannis and Donné
Data Science Leader focuses on delivering data science products that maximize positive impact on all stakeholders. Empower and assemble teams to start with the customer first and solve pain points responsibly and ethically. Passionate also about presenting our AI-by-Design framework for developing human-centered and resilient AI. Public speaker and co-founder of www.AI-by-design.com.
Currently, also helping the amazing founding team of TheyDo on starting their journey on AI.
Personally, I am interested in discussing and learning about (A)Intelligence, the human mind, system design thinking, and what it even means to be human.
Focused on building AI-powered products that give companies the tools and expertise needed to harness the power of AI in their respective fields.
At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.
In our journey from concept to production, we focused on delivering consistent behaviors to build user trust. This talk will cover the design and refinement of AI systems using agentic frameworks and deterministic components, emphasizing the integration of continuous learning and human oversight.
Demetrios Brinkmann [00:00:20]: We're back. Look at that. Hopefully that gave you enough time to fill out the questionnaire we have on AI agents in production. Now I have the great pleasure of introducing our next guests, one of whom is amazing; the other I can't say the same for, because if you're ever in a place and you want to eat and you're around this speaker, I highly recommend you protect your food. I was here last year doing the same conference and I got off the stage only to find that he had eaten my sandwich. So let me welcome to the stage the thief himself, Ioannis. Here I am, here I am.
Ioannis Zempekakis [00:01:02]: That's yours, right? Hi. Hi.
Demetrios Brinkmann [00:01:04]: Hello.
Ioannis Zempekakis [00:01:04]: Hello.
Demetrios Brinkmann [00:01:05]: Thanks. Get out of here.
Ioannis Zempekakis [00:01:07]: I didn't know, but actually he's Greek, so I like his food. So, what are we going to talk about now? In the previous discussion, Euro and Paul talked about OLX Magic and Toqan, and I would like to focus on bringing the Toqan Data Analyst agent into production and our journey through this. So I will start with... where's the presentation? Oh, perfect. Right. So first of all, I would like to start by introducing Donné. Hi, Donné. Nice.
Ioannis Zempekakis [00:01:51]: All right.
Donné Stevenson [00:01:51]: Hi.
Ioannis Zempekakis [00:01:53]: So let's start the presentation. Right, okay, good. So: from agents to product, bringing Toqan Data Analyst to hundreds of users. Who we are: well, I'm Ioannis, and you just met Donné. She will take care of the presentation a little bit later, so I will invite her again. First of all, what is Toqan Data Analyst? I think Euro and Paul described it before, but very, very plainly: it gives you the opportunity to ask a question in English and get back an answer. In essence, it does all the work for you.
Ioannis Zempekakis [00:02:37]: It translates your question into SQL, you get back the answer, and then you can even visualize and see the results. So that's what we have been trying to take live over the last couple of months, and what we want to share with you now is our journey. Before we dive in: agents are really magical, right? I think everybody has this kind of moment where they say, wow, this is very good. This is what we have also experienced. You ask a question, it understands the question, it finds the relevant data source, it can generate and execute SQL, and it recovers if it fails. That's one of the advantages of using an agent. It gives you the results and then you can even get a visualization. So it's very, very nice when it works. But truth be told, it never works out of the box. And you can go through many moments like Donné here, where she asks the agent to stop and the agent simply doesn't do anything.
Ioannis Zempekakis [00:03:51]: It just continues again and again, and you stop it again, you ask again, and then you get the same answer. Another failure mode is that, instead of doing something, it starts giving the same answer 59 or 58 times. So in essence, it can get very, very, very frustrating. What we would like to share with you, and why I want Donné to explain what we have learned, is in essence how you get from a product which keeps failing to a product which actually has adoption, as you can see here on the screen. There are four things that we have learned and would like to share with you. First, when you are putting an agent in production, you need to solve behavioral problems using deterministic flows. Second, you need to make sure that you leverage your experts to set up and monitor your agent. Third, you need to have a system that is resilient enough to handle any input.
Ioannis Zempekakis [00:05:05]: And fourth, you need to make sure that you simplify the work for the agent by optimizing some of the tools. These are four lessons that we have learned working over the last six months with this particular agent in production. And I would like to invite Donné again to walk us through each of these four lessons and help us understand how we actually implemented this. Donné.
Donné Stevenson [00:05:34]: Hello.
Ioannis Zempekakis [00:05:35]: Hey. Hi.
Donné Stevenson [00:05:37]: Okay. Am I on?
Ioannis Zempekakis [00:05:39]: Yes, we can hear you.
Donné Stevenson [00:05:42]: Cool. Great. So I can see the presentation now. Great. So thanks, Ioannis. Hi, everyone. I'm Donné and I'm one of the machine learning engineers who worked on developing the Toqan Data Analyst along with some of our development partners in the Prosus portfolio, like iFood and Glovo and OLX.
Donné Stevenson [00:06:06]: Yeah. So these four lessons that Ioannis has laid out, that's kind of the framework for the talk. For each of these lessons, I'm going to go through the challenges we were facing and the solutions we put in place to deal with them, and how that ties back to an overall lesson for an agentic framework and, more specifically, for developing these more verticalized agents. So, yeah, let's just go to the first one. To solve behavioral problems, consider deterministic workflows. Agents are great, right? Especially large language models: they can generate quite a lot of text, but it can be a little bit unpredictable. And when we first started building the agent, we had this idea in our heads that when users asked a question, the agent would be able to determine whether it could answer it.
Donné Stevenson [00:06:53]: And this is specifically for when you're thinking about something like SQL, where there's a lot of context. Context matters when you're writing SQL queries because you need to know the rules of the business and you need to understand how the different data is used. Ideally the agent should know and be able to say, oh, I don't actually have enough information. But what we found in reality was that users would ask vague questions and the agent, and this is a large language model thing in general, would produce an answer. And it wasn't always a good answer. These answers could be just straight up wrong, or different runs would result in different answers because there was ambiguity, perhaps multiple ways to interpret the question. So we had to find a way to solve this. It was a required behavior: we knew the agent needed to ask, can I answer this question given the information I have right now? And because it was required, we could hard code it. So we implemented a pre-processing step which would evaluate the question in isolation.
Donné Stevenson [00:07:55]: So just the question and the context, before the agent ever got to try and solve the problem. By doing this we create a more consistent experience for the users and also a more reliable system, because we know that it will stop when it doesn't know enough. These sorts of deterministic workflows can help in a lot of different ways. Here we've managed to make the experience more consistent, but there are also other ways we could use them. One of them is when generating the actual SQL queries. In an ideal world, the agent would produce perfectly valid SQL that only referred to information that was actually available in the table documentation. But think about this as a normal data analyst: you might go over the schema and the table documentation once and then try to write the different queries from memory. You might misremember the name of a column, maybe put in a spelling error, or maybe even assume the existence of a column because it would make sense given what you know. No one's going to go back and check the schema immediately.
Donné Stevenson [00:09:05]: You run it and you wait to see what happens. Our analyst was doing exactly the same thing and getting back these "column does not exist" errors. But this was a predictable error. We knew that this would happen, so why were we waiting to run it against our users' servers? That meant, one, we were making a call to our users' systems, putting unnecessary load over there, and two, we were wasting quite a few cycles of time trying to get an answer back before we would try and correct it. It was a case where we could predict that this would happen, so again we put one of these predetermined steps in place: when a query is written, the first thing we do is check whether the columns exist. We can then bypass an entire set of agent flows to say, okay, you can't answer this question with this query, and immediately start problem solving from there.
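Illustrative sketch (not from the talk): the column check described above could look roughly like this. It assumes the documented schema is available as a plain dictionary and plans the query against an empty in-memory SQLite replica, so unknown tables or columns are reported without touching the user's database. The function name `find_schema_error` and the sample schema are hypothetical, and SQLite will also reject syntax that is specific to other SQL dialects, so a real system would likely use a dialect-aware parser instead.

```python
import sqlite3
from typing import Dict, List, Optional


def find_schema_error(query: str, schema: Dict[str, List[str]]) -> Optional[str]:
    """Return an error message if the query cannot be planned against the schema, else None.

    Nothing is sent to the user's database: the query is only planned
    (EXPLAIN) against empty local tables that mirror the documented schema.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Build empty tables with the documented column names.
        for table, columns in schema.items():
            column_list = ", ".join(f'"{name}"' for name in columns)
            conn.execute(f'CREATE TABLE "{table}" ({column_list})')
        try:
            conn.execute(f"EXPLAIN {query}")
        except sqlite3.Error as exc:
            # Covers unknown columns and tables, and also plain syntax errors.
            return str(exc)
        return None
    finally:
        conn.close()


# Hypothetical schema for illustration only.
SCHEMA = {"orders": ["order_id", "revenue", "created_at"]}

ok = find_schema_error("SELECT SUM(revenue) FROM orders", SCHEMA)
bad = find_schema_error("SELECT SUM(revnue) FROM orders", SCHEMA)
```

Here `ok` is `None`, while `bad` carries an error naming the misspelled column, which can be handed straight back to the agent instead of waiting for the remote database to fail.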
Donné Stevenson [00:10:00]: So the idea is basically that the agent can produce a lot of content, but this LLM content can be evaluated and it can be simplified if we're using good functional problem solving to check it before we start applying it to the outside world. This will reduce the number of unnecessary cycles that the agent is going to have to do to problem solve, and overall just make the process a lot smoother. Thank you. The next one is leveraging your experts to set up and monitor your systems. I think with agents, or LLMs, we are so impressed by how much they can do and, once you have agents, how quickly they can problem solve, and the answers are always sort of human understandable. And so I think we sometimes forget that these still need to be treated as data products. They need to be tested, and they need to be tested rigorously when you're building very specific and very specialized agents like a SQL analyst.
Donné Stevenson [00:11:02]: When we were initially building out the idea for this agent, we did a one-off transfer with our data experts. They would give us documentation, they would give us schema, and we would take that in, ingest it, figure out how to store it and how to feed it to the agent, and then try and solve the problems with the test sets we were provided. We would send the results back, and then our data experts would come back and say, no, there's a problem, the answer is wrong, it's missed this detail. We would then take that feedback and try to fix it. But we weren't experts. We couldn't figure this out for all the problems on our own, and we couldn't make general solutions that would fix such specific problems, because it was different for every use case. So we had to find a flow that would let us include our experts in the process of developing the agent. We've created pipelines and UIs for the data experts who are using the agent to update their documentation, so they can move really quickly in their development cycles. They can move without our intervention, because they can optimize the documentation that they're building up to be consumed by an agent rather than by a human.
Donné Stevenson [00:12:12]: And by doing that, their agent is also becoming really specialized and really good at their specific use case. So, ready for production. We'd done a lot of testing. I think we'd gone through a lot of processes of figuring out how to optimize certain things. Part of the process was for us to understand what the Toqan Data Analyst's role was, what it could do and what it couldn't do. To some degree you kind of assume everyone knows that when you call it a data analyst, it's built to do data extraction and it's built to evaluate that data. But the moment we gave it to users who hadn't been involved, but who were familiar with Toqan, they started asking questions we hadn't anticipated, that we weren't prepared to deal with, and just queries we really didn't know what to do with. And occasionally there was also a little bit of abuse for poor Toqan.
Donné Stevenson [00:13:06]: But realistically, when we're building products like this, users pushing the boundaries of what your product can do shouldn't be a problem. There's Postel's law. It says be conservative in what you do, be liberal in what you accept from others. This is out in the world and users are going to push. And it's good, because it's also how we start to see new ways of using these things. But we also have to have the system be resilient enough to handle it when they're pushing past either what we expect or what the agent can actually do. The big lesson here for us was how to mix hard limits, so really specific rules about when the agent has to stop, with soft limits, which are more about how we can point it in the right direction. How can we nudge the agent onto the path we want it to take without removing its ability to do creative problem solving and to self-correct? Because that's why we use agents the way we do.
Donné Stevenson [00:14:06]: And by finding a way to balance these two, we did find a way to keep the agent capable of problem solving, but not get trapped in infinite loops with itself or in rabbit holes that ultimately wouldn't answer the question that the user had asked. Cool. We'll hop to the next one, the last one: optimizing tools to simplify work for the agents. A lot of the lessons we learned here were about how to set up the main agent flow and how to set up control mechanisms within that flow to help the agent as it moves through the problem solving. But one of the big parts of an agentic framework is the tools. When you're trying to optimize an agent to do good work, optimizing the tools is a good way to start putting good control mechanisms in place. One of those for us was the schema tool. The schema tool is used to read the documentation and then determine what's relevant for the agent to know in the main context.
Donné Stevenson [00:15:16]: The schema tool initially was relatively broad because we had very small use cases with only a couple of tables, and they were relatively small tables. What we found as we developed and got more and more use cases was that this tool needed to get better at determining what was relevant for the agent to know. The value of doing this was that the agent was getting very concentrated information, so it couldn't get distracted by wrong information and didn't have to parse the context for information that wasn't relevant to it. And so this tool was reducing the effort and the complexity of the problem that the main agent had to solve. We are now trialing a similar thing with the tool that does the SQL execution. If you've ever written any SQL, you'll know that the first couple of SQL queries you write generally have errors that are not very context specific. They're general errors. If you're moving between different SQL dialects, things like date functions will sometimes get confused.
Donné Stevenson [00:16:16]: You don't need to know too much about what's happened before. You don't need to know what's happening outside of really anything other than the SQL you're writing to fix it. And we're letting the SQL executor try and fix that itself before going back to the analyst. This optimization is again saving these cycles between the agent and the tools, which makes it faster, but it also means that the agent's context is not being cluttered with bad information. And yeah, so the key takeaways here are basically when these tools were being built, it made sense to move fast and limit the complexity of them so that we could check the feasibility. But as we scaled, it was important that these tools were growing with the agent to become smarter at how they shared information.
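Illustrative sketch (not from the talk): one very simple way a schema tool could narrow the documentation to what is relevant for a question is to score each table by word overlap with the question. The talk does not say how the real tool decides relevance; the function `select_relevant_tables`, the scoring rule, and the sample documentation below are all assumptions made for illustration, and a production tool would more plausibly use embeddings or a small LLM call.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_relevant_tables(question: str, docs: Dict[str, str], top_k: int = 2) -> List[str]:
    """Return up to top_k table names whose documentation overlaps most with the question."""
    question_words = tokenize(question)
    scored = []
    for table, description in docs.items():
        overlap = len(question_words & tokenize(f"{table} {description}"))
        if overlap > 0:
            scored.append((overlap, table))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [table for _, table in scored[:top_k]]


# Hypothetical table documentation for illustration only.
DOCS = {
    "orders": "one row per order with revenue and order date",
    "couriers": "courier profiles and vehicle type",
    "restaurants": "restaurant name and city",
}

relevant = select_relevant_tables("What was total revenue per order date last week?", DOCS)
```

Only the documentation for the tables in `relevant` would then be placed in the main agent's context, which is the "concentrated information" effect described in the talk. Plain word overlap is easily fooled by common words, which is one reason to treat this as a sketch only.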
Ioannis Zempekakis [00:17:01]: How much more to really allow the.
Donné Stevenson [00:17:02]: Agent to focus on the.
Ioannis Zempekakis [00:17:03]: Is this real? Because it changed the time. It's changed the time changed.
Donné Stevenson [00:17:14]: Did I miss something?
Ioannis Zempekakis [00:17:15]: No.
Donné Stevenson [00:17:18]: Cool. Not sure. I'll wrap up quickly. Yeah, so those are kind of the challenges. And then just to reiterate, they turned into these four lessons: using deterministic workflows, always keeping experts in the loop, making sure the agent system is resilient (even though it can problem solve, it doesn't have to problem solve everything itself), and tools being a good way to optimize agent workflows. Yeah, yeah.
Donné Stevenson [00:17:49]: One last statement. AI first doesn't mean AI only. And I think that is it. Onto the questions.
Ioannis Zempekakis [00:17:58]: Yeah. Perfect. Thank you. Thanks, Donné. My Greek friend, you want...
Demetrios Brinkmann [00:18:05]: Yeah. Give me this.
Ioannis Zempekakis [00:18:06]: Okay. Okay.
Demetrios Brinkmann [00:18:07]: You also want that before you steal it. Like. Oh, my sandwiches. All right. So luckily Donné was here to give some really quality.
Ioannis Zempekakis [00:18:16]: I know. That's why we invited Donné. But of course it was a team effort, right? It was Floris.
Demetrios Brinkmann [00:18:24]: So many. Yeah, yeah, yeah.
Ioannis Zempekakis [00:18:25]: Right.
Demetrios Brinkmann [00:18:26]: So except for you.
Ioannis Zempekakis [00:18:27]: Except.
Demetrios Brinkmann [00:18:27]: And so now there's a lot of great questions coming through. You can see them here. I'm looking at them here. First one coming up is, can you give us an example for the reflection tool and how you built the logic in there?
Ioannis Zempekakis [00:18:39]: Who wants to go?
Demetrios Brinkmann [00:18:41]: Donné?
Donné Stevenson [00:18:43]: Yeah, good. The reflection tool was actually the idea of Floris, who is also working on the SQL agent. It exists within the execution tool. Sorry, there's a bit of feedback. It's essentially just an additional call to an LLM, but it's a much, much smaller call, because all we have to share is the query and the error, and it can immediately return a response that we can then try and fix from there. So it's an extra step in the existing tool.
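Illustrative sketch (not from the talk): a reflection step inside an execution tool might be structured like this. The `reflect` callable stands in for the small LLM call that sees only the failing query and the error message; here it is a hard-coded fake so the example runs. The function name `execute_with_reflection`, the `max_repairs` parameter, and the use of SQLite are assumptions for illustration.

```python
import sqlite3
from typing import Callable, List, Tuple


def execute_with_reflection(
    conn: sqlite3.Connection,
    query: str,
    reflect: Callable[[str, str], str],
    max_repairs: int = 2,
) -> List[Tuple]:
    """Run the query; on failure, ask `reflect` for a fixed query and retry.

    Only the query text and the error message are passed to `reflect`,
    so the main agent's context is not involved in these repair cycles.
    """
    attempt = query
    repairs_used = 0
    while True:
        try:
            return conn.execute(attempt).fetchall()
        except sqlite3.Error as exc:
            if repairs_used == max_repairs:
                raise RuntimeError(
                    f"query still failing after {max_repairs} repairs: {exc}"
                ) from exc
            attempt = reflect(attempt, str(exc))
            repairs_used += 1


reflect_calls: List[str] = []


def fake_reflect(query: str, error: str) -> str:
    """Stand-in for a small LLM call: fixes one known dialect mismatch."""
    reflect_calls.append(error)
    return query.replace("NOW()", "datetime('now')")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, created_at TEXT)")
conn.execute("INSERT INTO orders VALUES (1, '2020-01-01 00:00:00')")

rows = execute_with_reflection(
    conn, "SELECT COUNT(*) FROM orders WHERE created_at <= NOW()", fake_reflect
)
```

The first attempt fails because SQLite has no `NOW()` function, the fake reflection step rewrites the date function, and the second attempt succeeds without the main agent ever seeing the error.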
Demetrios Brinkmann [00:19:24]: That was your amigo here talking while you were trying to talk. Just to let you know. Just keep talking, Donné.
Donné Stevenson [00:19:34]: This is just standard for him, right?
Demetrios Brinkmann [00:19:36]: It's not. Not anything off par. Right. So there's another great question coming through here. That is how did you. And this is specifically for you. Yeah. So obviously I don't know what he's still doing up here, but how did you minimize abuse of the agent? Did you include potential tool abuse in the testing phase prior to the release to prod.
Demetrios Brinkmann [00:19:58]: Good question.
Donné Stevenson [00:19:59]: So I'm assuming tool abuse is looking at things like running many, many tools or running tools infinitely long, especially when you're running SQL queries. We run SQL queries against user databases. One of the big risks is that we run queries that ultimately lock up those databases. There's a couple of mechanisms in place. We have timeouts, so tools can only run for so long, and we have pretty strict requirements with the guys we build with: when they give us credentials, we have relatively limited access, so that any mistakes would not bring down their broader systems.
Donné Stevenson [00:20:38]: And up to this point the biggest issue we've had is infinite tool calls. And that's easy, right? That's a hard limit. If you're running too many tools, stop. So there's a couple of places you put it in, but I think especially when you're dealing with SQL, it's important that your access is limited so that the agent can't sort of DDoS it.
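Illustrative sketch (not from the talk): the "if you're running too many tools, stop" hard limit can be expressed as a small budget object that every tool call goes through. The class names `ToolBudget` and `ToolLimitExceeded` and the specific limits are hypothetical. Note that this only refuses to start a new call once the call count or the run's time budget is exhausted; it does not interrupt a tool that is already running, so per-query timeouts would still need to be enforced at the database or driver level.

```python
import time
from typing import Any, Callable


class ToolLimitExceeded(RuntimeError):
    """Raised when an agent run exhausts one of its hard limits."""


class ToolBudget:
    """Hard limits for one agent run: a maximum call count and a time budget."""

    def __init__(self, max_calls: int, max_seconds: float) -> None:
        self.max_calls = max_calls
        self.deadline = time.monotonic() + max_seconds
        self.calls = 0

    def call(self, tool: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        """Invoke the tool if the budget allows it, otherwise stop the run."""
        if self.calls >= self.max_calls:
            raise ToolLimitExceeded(f"tool call limit of {self.max_calls} reached")
        if time.monotonic() > self.deadline:
            raise ToolLimitExceeded("time budget for this run is used up")
        self.calls += 1
        return tool(*args, **kwargs)


budget = ToolBudget(max_calls=3, max_seconds=30.0)
results = [budget.call(len, "abc") for _ in range(3)]

try:
    budget.call(len, "abc")  # fourth call exceeds the limit
    stopped = False
except ToolLimitExceeded:
    stopped = True
```

Because the limit lives outside the agent's own reasoning, a loop of repeated tool calls ends deterministically no matter what the model decides to do next.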
Demetrios Brinkmann [00:20:57]: Yeah. So before I ask my amigo here one more, the big question that I have is how do you make sure that the agent has proper access to the data and the database? Like how do you have the role based access with the agents?
Ioannis Zempekakis [00:21:12]: That's a good question.
Donné Stevenson [00:21:13]: So this is difficult, because every user has different databases and the way they're managing access is very different for each of them. So the access is managed on a case-by-case basis, and we will work with the guys who are in charge of the access to figure out the best way to let us connect. And then, yeah, there's obviously secret management that we have to do on our side as well. But every case is different and we do have to have quite a few different setups for that.
Demetrios Brinkmann [00:21:45]: All right, so this one's for you. Oh, ready? What framework have you used?
Ioannis Zempekakis [00:21:54]: Actually, from the beginning for Toqan, we started developing our own framework. So we haven't used any of the frameworks that are out there. We have tried a couple of them. I think a colleague also gave a talk yesterday about the different kinds of frameworks. But we have found that, for what we actually want, it fits our goals better to use our own framework. That doesn't mean that there are no frameworks out there that work well. It's just that it's a very exploratory space, and I think it also pays off to explore what you can build in terms of your own capabilities, because you also don't know how you will need to shape it in the future.
Demetrios Brinkmann [00:22:40]: But did you play with any frameworks to see how they work and what they do?
Ioannis Zempekakis [00:22:44]: We have tried in the beginning, but it was very early so we didn't really stick to it. And now we are only trying as a kind of like a swarm for education.
Demetrios Brinkmann [00:22:54]: Right.
Ioannis Zempekakis [00:22:55]: So to try things and see how it works. What are some of the pros, what are some of the cons? But so far I think that the framework we've built is very powerful.
Demetrios Brinkmann [00:23:05]: Yeah. Also you started Toqan two years ago, right?
Ioannis Zempekakis [00:23:07]: Yeah, exactly.
Demetrios Brinkmann [00:23:08]: The frameworks two years ago are much different than probably what you're getting today. So you already invested.
Ioannis Zempekakis [00:23:13]: Exactly. We've already invested and we have tested it at scale in terms of numbers and requests. And now going and testing another framework, I don't think there is a payoff at the moment.
Demetrios Brinkmann [00:23:25]: Yeah. So which mechanisms do you use to determine when the agent should ask for clarifying information, Donné?
Ioannis Zempekakis [00:23:33]: You want to take it? I can also take this, but I think it's good to take it.
Donné Stevenson [00:23:38]: Sure. So the mechanism is what we call a pre-processing step. These pre-processing steps can be pure functional work, or they can be smaller requests to LLMs with much shorter context and much smaller expectations on the kinds of responses. In this case it's the latter. We share the available context, or a condensed version of the context, with the question and say: just try to determine whether the question is answerable given the context. And a small shout-out to ProLLM: the dataset that we use to analyze the model for this is also being tested in ProLLM.
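Illustrative sketch (not from the talk): a pre-processing step of this kind could be a single small LLM call whose reply is parsed deterministically before the main agent is ever invoked. The prompt wording, the `ANSWERABLE` / `CLARIFY:` reply convention, and the names `precheck` and `ask_llm` are all assumptions; `ask_llm` is any callable that takes a prompt string and returns the model's text, and a fake is used here so the example runs.

```python
from typing import Callable, Tuple

PRECHECK_TEMPLATE = (
    "You are checking a data question before any SQL is written.\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n\n"
    "Reply with exactly 'ANSWERABLE' or 'CLARIFY: <one clarifying question>'."
)

DEFAULT_CLARIFICATION = "Could you rephrase the question with more detail?"


def precheck(question: str, context: str, ask_llm: Callable[[str], str]) -> Tuple[bool, str]:
    """Return (answerable, clarifying_question) for a question and condensed context.

    Any reply that does not follow the expected format is treated as
    "not answerable", so the check fails closed.
    """
    prompt = PRECHECK_TEMPLATE.format(context=context, question=question)
    reply = ask_llm(prompt).strip()
    upper = reply.upper()
    if upper.startswith("ANSWERABLE"):
        return True, ""
    if upper.startswith("CLARIFY:"):
        return False, reply[len("CLARIFY:"):].strip()
    return False, DEFAULT_CLARIFICATION


def fake_llm(prompt: str) -> str:
    """Stand-in for a small model: only time-bounded questions are answerable."""
    if "last week" in prompt:
        return "ANSWERABLE"
    return "CLARIFY: Which time period do you mean?"


CONTEXT = "Table orders: order_id, revenue, created_at"

clear = precheck("What was revenue last week?", CONTEXT, fake_llm)
vague = precheck("What was revenue?", CONTEXT, fake_llm)
```

If the first element is `False`, the system returns the clarifying question to the user and the main agent never runs, which is what makes the stopping behavior consistent.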
Demetrios Brinkmann [00:24:17]: Yeah, ProLLM is a gem of a resource on the Internet. So huge shout-out. The next question, and potentially the last question depending on how verbose you are: is Toqan based on a single agent, or are there multiple agents communicating amongst each other?
Ioannis Zempekakis [00:24:37]: I think it's a single agent that can very easily be transformed into a multi-agent. In essence, Toqan Data Analyst is a part of Toqan, right? So in that way it's a vertical agent of a horizontal agent, because Toqan can do many different tasks. Inside this Toqan Data Analyst we have separate tools that we invoke. Right. That we are calling.
Ioannis Zempekakis [00:25:07]: So strictly speaking it is a single agent. If you loosen the definition of an agent a bit, then you can also consider it to have some flavors of multi-agent.
Demetrios Brinkmann [00:25:24]: So for the inference are you using, you're using how many different LLMs and what are you looking at? Open, closed source? Are you primarily hitting one first? And if it can't do it then you go to something else. Do you have a proxy that will tell you for like the scaling and rate limits, all of that fun stuff.
Ioannis Zempekakis [00:25:44]: For Toqan we are using different models. We have tried a lot; some are closed, some are open. We generally choose the model that performs the best.
Demetrios Brinkmann [00:26:01]: So this is the best non-answer. You're a diplomat right now: we use some open, we use some closed.
Ioannis Zempekakis [00:26:07]: I mean I can tell you we.
Demetrios Brinkmann [00:26:08]: Like the ones that from the best.
Ioannis Zempekakis [00:26:10]: We Use the ones from the best.
Demetrios Brinkmann [00:26:11]: All right. You use the best models. There we go.
Ioannis Zempekakis [00:26:13]: We use the best models, which are most of the time OpenAI models. But it doesn't mean that we use the same model for the whole flow.
Demetrios Brinkmann [00:26:26]: Okay, tell me more about that.
Ioannis Zempekakis [00:26:27]: Yes. So we have separate tool calls, right? For example, you have the reflection tool, like Donné said before. You have maybe a pre-processing step, and that can be deterministic, but if it is not deterministic, you might use another LLM there to make a call. So depending on the complexity of each tool, we can call a different type of model: we can call mini, we can call o1, we can call 4o, we can even call GPT-4 Turbo. So it depends on the complexity and it depends on the cost. These are the two main requirements.
Demetrios Brinkmann [00:27:08]: Which are more complex versus less complex? Which of these steps? When can you get away with the small models?
Ioannis Zempekakis [00:27:16]: I would say that the execution component is the one that is more complex. For smaller, more sliced tools it is much easier to use smaller models. Generally speaking, the data analyst doesn't use very light models, so it's quite heavy in terms of computation. And also the cost of error is very high, because people are depending on the analyst to get back data and they take decisions based on this data. So there is the risk of making a wrong decision. Right.
Ioannis Zempekakis [00:28:01]: So we are optimizing for precision, and that's why we don't want to take big risks on which models we use, since they might underperform versus the state of the art.
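Illustrative sketch (not from the talk): choosing a model per step by complexity and cost can be reduced to "the cheapest model that is capable enough for this step". The model names, capability levels, costs, and per-step requirements below are invented placeholders, not the models or numbers used by the team.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ModelOption:
    name: str            # placeholder identifier, not a real model name
    capability: int      # higher means better suited to complex steps
    cost_per_call: float


def pick_model(required_capability: int, options: List[ModelOption]) -> ModelOption:
    """Return the cheapest model whose capability meets the requirement."""
    eligible = [m for m in options if m.capability >= required_capability]
    if not eligible:
        raise ValueError(f"no model with capability >= {required_capability}")
    return min(eligible, key=lambda m: m.cost_per_call)


# Hypothetical catalogue and per-step requirements for illustration only.
OPTIONS = [
    ModelOption("small-model", capability=1, cost_per_call=0.001),
    ModelOption("medium-model", capability=2, cost_per_call=0.010),
    ModelOption("large-model", capability=3, cost_per_call=0.050),
]

STEP_REQUIREMENTS: Dict[str, int] = {
    "precheck": 1,        # short context, yes/no style answer
    "reflection": 2,      # sees only a query and an error
    "sql_generation": 3,  # high cost of error, needs the strongest model
}

chosen = {step: pick_model(level, OPTIONS).name for step, level in STEP_REQUIREMENTS.items()}
```

Setting a high requirement for the steps where a wrong answer is costly reflects the "optimizing for precision" point made in the talk, while cheaper models are only used where the task is narrow.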
Demetrios Brinkmann [00:28:15]: Brilliant, man. This is very, very cool. I think we just hit time. So that's it, sadly.
Ioannis Zempekakis [00:28:24]: I'm going to eat the pizza.
Demetrios Brinkmann [00:28:25]: Yeah, Actually there's a party going on here upstairs of the next museum with everybody that came live. And if you're in Amsterdam, you still got time come. If you're in. Even if you're in Belgium, come. And we can make it. So we're here all night, but for those that are virtual, we've got you set up with a video that's going to happen right now and then we're going to get right into the next talk shortly, I guess is the best way to describe that time frame.
Ioannis Zempekakis [00:28:53]: Thank you. Thanks, Donné, and thanks to everyone.