MLOps Community

From Chat Fatigue to Instant Action: Transforming Dealer Engagement Through Intelligent UI // Donné Stevenson

Posted Nov 25, 2025 | Views 41
# Agents in Production
# Prosus Group
# Dealer Engagement

Speaker

Donné Stevenson
Machine Learning Engineer @ Prosus Group

Focused on building AI-powered products that give companies the tools and expertise needed to harness the power of AI in their respective fields.


SUMMARY

This presentation discusses the evolution of AI agent interaction, focusing on transitioning from low-engagement text-based chat to more intuitive, GUI-driven experiences. It outlines critical challenges in creating an intuitive and impactful experience for busy dealers, proposing solutions that include quick actions, efficient data streaming, and agent interactivity to create a great user experience.


TRANSCRIPT

Donné Stevenson [00:00:05]: Hi, guys. Thank you for joining. Yeah, I'm Donné, I'm one of the machine learning engineers at Prosus, and today I'm going to do a quick discussion on the journey we've taken as we've built yet another agent, and the biggest learning that we've taken from it, which is the idea that as we're moving more and more into an agentic world, we need to move past just the chat interface and start looking at more instant action and a more dynamic experience with these smart agents. Now you might ask yourself why. Really, if you start thinking about the users that need to use it, there's a process to go from building a really intelligent agent to getting to the moment where your users will do anything, even say hi to it. And when you're starting out with new users, users who aren't used to using agentic platforms for the work they're doing, that open chat, just having a text box where you can ask anything, while it might seem really great, is overwhelming, it's intimidating. That little blinking cursor is really unintuitive, because the user is now expected to understand what the agent can do. They have to understand how to ask the agent to do that, and then how to use the information that's coming back to them, with very little to guide them through that process. Which is really, if you're busy, if you're overwhelmed, if you have a lot of stuff to interact with, then that's just another thing to do, so you don't want to do it. This is really just the journey we've taken to create an agent that dealers are interested in, and the lessons that we learned while we did that. So, briefly.

Donné Stevenson [00:01:53]: The project. So we are working with Automotive. They are a platform for secondhand cars, mostly in Poland. They're actually the biggest one, and they have thousands of dealers and millions of users on the platform. So you're probably thinking, ah, yet another agent for the users to ask FAQ questions. That's not what we wanted to do. What we wanted to do was build an agent that started solving problems for the dealers, for the people who are trying to sell their cars on this platform. And the reason we wanted to help them is that Automotive as a platform is incredibly rich.

Donné Stevenson [00:02:32]: It has a lot of data, a lot of information that the dealers can use to really help them amplify their listings, who they're reaching, how they're selling. But there's so much information that it can be really overwhelming for them. And that's what we're seeing in their feedback. And even if they can get through all that information, it's so much work for them to organize their own listings, to use those insights, to take actions, that we felt there really was a space here where some agentic experience could help them do this more efficiently. That's where we started to focus. What this looked like for us was to build something like a business intelligence assistant. This is meant to help them parse this data, read this data, and then, most importantly, take an action from that, give them a suggestion, tell them what to do next. And this was a Disrupt project.

Donné Stevenson [00:03:28]: So when we talk about Disrupt projects, these are projects where you need to show value quickly. So the approach was, let's move fast, let's show that there's some potential for adoption, right? These dealers, they've been doing this for years. They know their product, they know their business, they know their process. So we needed to check, could we get them to engage with an agent in any meaningful way and what that looked like. To do that, we really need to get feedback quickly. We decided to go for a limited skillset agent. So we didn't try and solve every problem for them, just a little bit of information, and then we set it up so that we could still, in this limited capacity, get their feedback, figure out what it is that they were expecting the agent to do. And this first agent we released, I think just a couple months in, it was pretty basic.

Donné Stevenson [00:04:19]: It's a ReAct agent. It had just a couple of tools that did basic data retrieval and basic data analysis. That was it. We released this, but we still allowed users to ask questions on any feature, any information. We just didn't always answer them. Again, this idea that we wanted feedback quickly. These were our results. So within two weeks we had reached 100% of the users. So at this point, we had saturated it.

Donné Stevenson [00:04:49]: We know that everyone has seen it and decided whether or not they're going to try it. And they were pretty unfortunate results. Only 10% of the dealers that we showed the agent to actually engaged with it in any sort of measurable way. And the repeat usage was somewhat negligible. So overall it was a disappointing result. But there were a lot of learnings that we gained from this, and that is really what the point of this first experiment was. So what we saw: users didn't know what to ask.

Donné Stevenson [00:05:17]: They were opening this platform and they were going, what can you do? How do I ask you questions? Why can't you do this? And they got frustrated at the limited abilities of the agent. And that's a pretty positive signal, right? They wanted it to do more. And then there were two other, sort of smaller, things. The agent had those preset tools, so we kind of knew what questions we were expecting to answer with those tools. We showed users just a snippet of questions they could ask, and they could click on a button. This is important: they clicked on it rather than having to type it. They would click on those more than they would ask their own questions.

Donné Stevenson [00:05:55]: 10% is not high, but it's some interaction, it's some potential to grab their attention. We took what we'd learned here and we went back to the drawing board with this information, right: they'll engage because they want the insights; the buttons, being able to click on something, were more engaging than open questions; and there was quick loss of interest. I think everyone wants to move quickly, and if things don't get interesting or exciting often, then we lose interest. So our conclusion, the next hypothesis we wanted to test, was: can we ease them into using the agent? Can we somehow guide them to use the agent without having to explicitly do onboarding, without having to do trainings or have tooltips with large explanations of what was happening? And could we do this by using some form of a dynamic UI? And that's how we get to experiment two. And this is more where I'll discuss the learnings that we had along the way and how we came to the agent as it exists now, and this idea that chat fatigue exists and we need to move more to instant action, more to a dynamic UI.

Donné Stevenson [00:07:11]: What does iteration two look like? The agent doesn't change much. We still have this ReAct agent. It is obviously an improvement; we don't want to put out the same one. It's got a lot more tools, a lot more access to data, it's a lot smarter, it's a lot more useful to the users. We've improved the prompting: your basic iteration cycle on an agent. Then the two really interesting things we focused on were the buttons and the UX. The UX covers, I think, pretty straightforward things: interactive answers and streaming. And the buttons are this idea.

Donné Stevenson [00:07:51]: So when we talk about AI agents, particularly in a chat setting, we're always thinking it's just this little icon in the corner. They click it, they start typing. But what if we could expand that experience so that the chat part of the agent was only a part of the service we offered, or only a part of the platform that they were interacting with? We have what we call a navbar. They see this on certain pages. The first two, upload and sell and extend, don't actually interact with our agents at all. They're completely front-end based. They are shortcuts for clicking around the platform, to take you to very specific parts of the platform that we're seeing are quite useful for them to see. Recent changes is a button.

Donné Stevenson [00:08:43]: But what it will do is open the AI assistant with a preset question that brings back a summary of, for example, the dealer's inventory movements: which ones have sold, which ads are about to expire, that sort of information. And then your standard AI assistant, which will open that chat window that we're all quite familiar with. What we learned when we were designing this, and it was a feedback cycle, was that while your AI agent can get really smart, you really need the UX solutions to make that intelligence accessible to users. And we saw this in kind of three ways. LLMs do not read tabular data very well. The users are going to interact with the agent in a way that indicates that they assume, and rightfully so, that it knows what they're looking at. And they want the answers to be quick, but unfortunately, parsing large amounts of data is quite slow. Starting with the tabular data: our agent, we've said, is a business intelligence agent. What's its primary function going to be? It's going to be data analysis, and turning that analysis into recommendations. But that means its primary tool set is going to be data retrieval.

Donné Stevenson [00:10:03]: Designing tools for data retrieval means that these tools need to do it in a way that is safe and reliable, because we want to make sure that the data we're returning is correct. Tool design: when designing tools for any agent, especially for us, when we're looking at data, there's a spectrum. We could have very stringent tools, tools that each did exactly one thing. They had zero flexibility, zero interpretability, and that's the hammer. It can do exactly one thing and it can only do that thing. It's great in that it's reliable, it's safe, but it's not so great in that if you move past it or you need to do anything else, you don't have the tools to do it. On the other end of the spectrum is, for example, the idea that you just ask it to write the SQL queries to talk to your database. You end up with this giant toolbox where, sure, it can do anything and everything, it's super flexible, but it's also incredibly complex to use a tool like that. You have to know what every tool does.

Donné Stevenson [00:11:11]: You have to know where every tool is applicable, which tool you should use instead of that one. And that creates a level of complexity, and also a challenge in building and context building, that was really going to slow us down. And we really did have this need to move quickly. And so, as with all spectrums, we landed somewhere in the middle, with this idea of a Swiss army knife. It's got a fair number of tools, so it's not super limited, but it is a fixed number, and we know what each of them does. So we don't have this massive amount of complexity in exchange for some flexibility. And so our tool design consolidated on what we call purpose-built aggregation tools.

Donné Stevenson [00:11:55]: So, data retrieval: each tool is purpose-built with some concept of aggregation. Every tool is related to a specific concept, and it will aggregate the data for the use case. It will then explain any data concepts in the returned data in plain text, in case that's necessary, and return a small snippet of raw data if you want to see what that looks like. More literally, this is an answer from the tool, to get an idea of the promotions the dealer has applied to some of their listings. So first we have this aggregation, literally the summary statistics of their portfolio. Then we have some data explanation: what certain words mean, how to interpret certain details in the data. And then this raw data snippet, which is just some broader output that is less specific, but that the agent could potentially use if it needed to answer something that isn't directly available in the summary statistics. And with that, we had our tools in place, and now we're seeing how the users are interacting with it, and the way they're doing this.
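The three-part tool answer described here, aggregation, explanation, and raw snippet, can be sketched roughly as follows. The tool name, fields, and the promotions example are hypothetical illustrations, not the team's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class ToolResult:
    """Three-part payload from the talk: aggregate stats, plain-text
    explanations of data concepts, and a small raw-data snippet."""
    summary: dict        # pre-computed summary statistics
    explanations: list   # plain-text notes on how to read the data
    raw_snippet: list    # a few raw rows the agent may fall back on

def get_promotions_overview(listings: list) -> ToolResult:
    """Hypothetical purpose-built aggregation tool: one concept
    (promotions applied to listings), aggregated for the use case."""
    promoted = [l for l in listings if l.get("promotion")]
    summary = {
        "total_listings": len(listings),
        "promoted_listings": len(promoted),
        "promotion_rate": round(len(promoted) / len(listings), 2) if listings else 0.0,
    }
    explanations = [
        "promotion_rate is the share of listings with any paid promotion applied.",
    ]
    # Return only a small snippet of raw rows to keep token cost low.
    return ToolResult(summary=summary, explanations=explanations, raw_snippet=listings[:3])
```

The agent answers most questions from `summary`, and only digs into `raw_snippet` when the statistic it needs isn't there.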

Donné Stevenson [00:13:10]: There's some expectation that the agent is aware of what they're looking at. But we're not building a web agent, so it can't actually see your UI. This is what it looks like when a dealer is on the platform. They have this huge number of tabs they can click between, and then they have the bar. And this bar is going to be maintained as they're clicking through the platform. So when they ask about something, they're expecting that the agent can see it. Now, building full context awareness is, again, a very big project all on its own. And we really, more than anything, wanted to give them the experience that it was happening, because the value-add in the intelligence maybe didn't need that level of complexity.

Donné Stevenson [00:13:51]: So we created this by having the navbar be flexible, or dynamic, to the tab that they're on. So depending on where they are on the platform, they get a different navbar. But it's mostly just those preset questions, those preset functions, that are going to change. So for an announcement you'll filter to see the ads that might expire, but on the inquiries page you'll see all the messages that need a reply. And those are the kinds of buttons we gave them. Rather than having the full agent change, we just changed the navbar and how that looked. This is a really interesting thing that we've done, and it's getting a lot of usage, these buttons. There's definitely an interest in preempting actions the user might want to take on the page, and then letting them do that through this agent.
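A minimal sketch of this dynamic-navbar idea: the agent stays the same, and only a page-to-buttons mapping changes what the dealer sees. The page names and preset questions here are illustrative, not the real product's.

```python
# Hypothetical mapping from platform page to navbar buttons. Only the
# preset questions shown differ per tab; the agent itself is unchanged.
NAVBAR_PRESETS = {
    "announcements": [
        {"label": "Expiring ads", "preset_question": "Which of my ads expire soon?"},
    ],
    "inquiries": [
        {"label": "Needs reply", "preset_question": "Which messages still need a reply?"},
    ],
}
DEFAULT_PRESETS = [
    {"label": "Recent changes", "preset_question": "Summarize my recent inventory movements."},
]

def navbar_for(page: str) -> list:
    """Return the context-specific buttons for the tab the dealer is on."""
    return NAVBAR_PRESETS.get(page, DEFAULT_PRESETS)
```

Clicking a button simply feeds its `preset_question` into the same chat endpoint, which is what lets button clicks ease users toward open questions later.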

Donné Stevenson [00:14:42]: Then finally, agents are going to respond in text. But you really want the users to have a really good experience of the responses that are coming through. For us that meant resolving, always, the same one: latency. They need to have a good experience, and part of that is getting an answer as soon as possible. The other part for us was really trying to think about how to give them an answer that was more than a wall of text. Starting with the latency: our P99 is almost 20 seconds. I think this is from the last month.

Donné Stevenson [00:15:21]: That's pretty long if you have to wait that long to get the full answer. One of the quick wins for us was to implement streaming. We didn't change the agent, we didn't make it any smarter. But by creating the impression, or creating the experience, that the agent was answering them faster, there's much more traction with the users to keep talking to the agent. And then one of the really interesting challenges, and I think it's coming up more and more for people, is how to represent large amounts of data to the agents without exploding the context. You can see once we released we had this huge spike, and that was slowing it down, and we really did have to find a way to deal with that. And there's a lot of conversation happening now around data representations, with the, I think it's TOON, I'm not sure how to pronounce it, library. I won't get into that one too much.
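The streaming quick win can be illustrated with a tiny server-sent-events wrapper: the client renders chunks as they arrive instead of waiting out the full latency. This is a sketch assuming an SSE-style transport, not the team's actual stack.

```python
from typing import Iterator

def stream_answer(chunks: Iterator[str]) -> Iterator[str]:
    """Wrap model output chunks as SSE frames so the client can render
    tokens as they arrive rather than waiting for the complete answer."""
    for chunk in chunks:
        yield f"data: {chunk}\n\n"   # one SSE frame per chunk
    yield "data: [DONE]\n\n"         # sentinel so the client can close

# Simulated model output; a real agent would yield tokens incrementally.
frames = list(stream_answer(iter(["Your ", "top ad ", "expires tomorrow."])))
```

Nothing about the agent changes; only the delivery becomes incremental, which is exactly why it was a quick win.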

Donné Stevenson [00:16:15]: But just for us, what this looked like was: initially we were returning data as JSON, because it is much more interpretable on an individual level. Each record is self-contained and you can see the details. But it's an incredibly expensive way to share data in terms of tokens; I think it's almost double the number of tokens as for something like CSV. For us, we've moved away from the JSON representation to the CSV representation, because we're adding this aggregate at the top with the summary statistics. That's helping a lot with the comprehensibility of the data without having to pay the cost of having this many tokens in the response. Then just this idea of an interactive response. So LLMs, they produce text, right? That's what they're for. They just produce plain text. Text is not super easy to consume if you have a lot of it.
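The JSON-versus-CSV cost is easy to see even with character counts as a crude proxy for tokens (real tokenizers differ, so treat this as indicative only): JSON repeats every key on every row, while CSV states the header once.

```python
import csv
import io
import json

rows = [
    {"ad_id": 101, "calls": 3, "views": 400},
    {"ad_id": 102, "calls": 1, "views": 250},
]

# JSON: keys repeated per record, so size grows with every row.
as_json = json.dumps(rows)

# CSV: one header line, then bare values.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
as_csv = buf.getvalue()

# Character count as a rough stand-in for token count.
savings = len(as_json) - len(as_csv)
```

The trade-off discussed later in the Q&A applies: CSV values are harder to tie back to their headers, which is why the summary statistics on top matter more.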

Donné Stevenson [00:17:15]: And also, if you're trying to give users information, the most important part of that information is that it's actionable, and text is not super actionable. So to really add value with the agent, the response needed to be interactive. How we started looking at this was how to start replacing the response with dynamic elements. On the left is plain text as it would come from the agent. On the right is how we show it on the UI. Now, of course, any product is going to try and build a really beautiful UX and have the answers formatted in a way that's super readable, and that sort of thing. But we wanted it to be actionable.

Donné Stevenson [00:17:54]: We wanted them to be able to do something with the response as soon as they got it. So that underlining represents a clickable element, right? You click on that, you go to the advert, you see the full advert. And that's really the point: we created these interactive responses, and this is more and more what we're trying to do with the responses the agent gives. And then, a funny story about clickables. When we were designing this, there's this contract that has to be agreed upon between the agent in the back end, which is producing the text, and the front end, which is rendering that text. And when we introduced this idea of a clickable, we were returning the clickable link from the back end, from the agent. But that meant we were relying on the agent to correctly maintain the link, which is this URL-encoded content, and hoping that it set it correctly, because if this is wrong, the link won't work. Right? And it blew up our tokens.

Donné Stevenson [00:19:01]: We had 66 tokens in this example, and this is quite a short title. It blew up our tokens and it was really making it quite hard to manage the context. One of our engineers came up with what I think is quite a clever token to represent an ad, which includes its ID and its name. And the front end knows to look for that specific token and then replace it with a clickable link. Now, because we've done it in this way, we're changing how the front end renders this sort of thing, but we haven't actually had to change our back end, because the front end handles how to represent it. And it saved us a ton of tokens. Maybe a bit of a rookie mistake to include something like this here, but I think it's an interesting conversation around how the front end and the back end agree to share this communication.
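The contract might look something like the following. The `[[ad:id|name]]` format is an invented placeholder, since the talk doesn't show the actual token, and the rendering pass is sketched in Python for brevity even though in practice it lives in the front end: the agent emits only the compact token, never a full URL.

```python
import re

# Hypothetical compact token the agent emits instead of a full href,
# e.g. "[[ad:42|Mercedes C200]]". The agent never builds URLs itself.
AD_TOKEN = re.compile(r"\[\[ad:(\d+)\|([^\]]+)\]\]")

def render_clickables(text: str) -> str:
    """Front-end-side pass: swap each ad token for a clickable link, so
    correctness of the URL is the renderer's job, not the model's."""
    return AD_TOKEN.sub(
        lambda m: f'<a href="/ads/{m.group(1)}">{m.group(2)}</a>', text
    )
```

Because only the renderer knows the URL scheme, changing link formats later means updating the front end alone, and malformed URLs from the model become impossible by construction.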

Donné Stevenson [00:19:51]: I'll just close off with some brief results. This is our graph of movements, but I won't drag you through all of them. I'll just take you briefly through what I think is maybe the most interesting part, and that's what I said from the beginning: when we gave them buttons to click on, they were more likely to ask a question later on. The yellow and the blue lines, those are buttons they could click on. And then the purple line that you can barely see, that's them asking an open question at the opening, the first part of that session. They barely, barely would do it. But if we could get them to click on a button, that could feed into the open questions, and that's when they started to be able to ask follow-ups. And I think that's a really interesting learning to take from this. And something we're really going to push as we move on is to make this UI even more dynamic.

Donné Stevenson [00:20:44]: So maybe even the open questions become less and less relevant, because we're going to preempt and be able to predict what comes next. And yeah, just briefly: we care a little bit about personalization, but figuring out what that means for dealers is interesting. All agents have this need for personalization, but what does it mean when the user is a business? And so that's sort of our next hurdle. And that is everything. I'm sorry, I don't actually know what the time is.

Demetrios Brinkmann [00:21:14]: That was perfect, Donné. Thank you. There are so many pieces that I wanted to get into with that. And the first one: on your second or third slide, it's like, chat fatigue is real. I feel you on that so hard. That is so true. So with that in mind, I really appreciate the idea of that toolbar. I wanted to ask how you decide what actions get put in that toolbar. You said that, hey, we want to do it eventually dynamically.

Demetrios Brinkmann [00:21:46]: But right now, is it just that the majority of the folks, that's what they want to do when they're on that page?

Donné Stevenson [00:21:54]: Yeah. So sort of based on user research that was done sort of before the project started, a little bit of interviews and then a little bit of just intuition, we're building those out and seeing which ones click and we're developing some ideas for additional actions. It's also what can we do with the front end without blowing up the rest of the platform? But yeah, I think mostly it's just coming from user research.

Demetrios Brinkmann [00:22:21]: All right, next question that I have is the hyperlink that you showed there, does that not mess up the evals? Because potentially one of your evals is like the person keeps talking or that you get some kind of an action done. Or is that part of the eval is if something is clicked that's a positive signal.

Donné Stevenson [00:22:46]: Yeah. So there are sort of two levels of evals. There are our evals on what the agent is responding, and that doesn't care too much about which ones they're clicking on. If they were to ask a follow-up, we would see that more in them asking a specific question about the Mercedes; they would offer that in a follow-up. The clicking on it is more interesting for us from an engagement perspective, so we were doing that more on the events-tracking part of it.

Demetrios Brinkmann [00:23:12]: Okay. So it doesn't even really matter about the. The eval side. It's more about the engagement. I can see that.

Donné Stevenson [00:23:19]: And so then it might. We just haven't framed it that way. So if you have some thoughts there.

Demetrios Brinkmann [00:23:24]: No. Well, the thoughts that I had were that it was going to totally mess everything up. So it seems like you just said if we don't think about it, it's even better.

Donné Stevenson [00:23:34]: Sometimes the problems you don't know about are not problems.

Demetrios Brinkmann [00:23:36]: Yeah, that's like I heard. Yeah. If you don't track it. Somebody told me we have no bugs if we build no software.

Donné Stevenson [00:23:49]: There you go. You sound just like cursor. Nice.

Demetrios Brinkmann [00:23:54]: There's some questions coming through in the chat. What was your solution for streaming multiple UI components to the client? Any experience with AI, SDK and TypeScript?

Donné Stevenson [00:24:07]: I did not build most of this because. No, I don't know anything about TypeScript. It's mostly the front end responsibility. I can't tell you too much technically about how that is working, unfortunately.

Demetrios Brinkmann [00:24:22]: Well, then we'll go to the next question.

Donné Stevenson [00:24:24]: Sorry.

Demetrios Brinkmann [00:24:25]: On tool design, have you ever considered implementing a way for agents to automatically generate their own tools?

Donné Stevenson [00:24:34]: Yes, it's an interesting idea. It again runs into this complexity of monitoring and maintenance, because if it's generating its own tool, you have to know that it did it. You have to be able to test, when it's doing this, that the tools are correct and they're safe. That correctness is hard to ensure, because generating your own tools is not so different to writing your own query. I've seen from other projects that it can be quite hard to get it to understand data in that way. But we do recognize that the agent is going to see stuff, and if we give it some skills to do this, then it could help. So it does have a parallel process that runs and tracks.

Donné Stevenson [00:25:23]: If we didn't answer a question, what was the question and what was the tool that would have helped? And this will eventually feed into a pipeline that lets us, helps us build the next tool we need. And we're also playing with the idea of having a fallback tool that says, okay, none of the safe, reliable ones work. Let's try a query and then see what happens. Maybe we can answer it, but it would really. This is a fallback. We want to make sure that we're aware of when this is happening and doing this as carefully as possible.
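The fallback-plus-tracking flow described above can be sketched as follows. All names here are hypothetical, and the "safe tools return None when they can't answer" convention is an illustrative simplification of whatever routing the real agent does.

```python
def answer(question: str, safe_tools: dict, fallback_query, unanswered_log: list):
    """Try the safe, purpose-built tools first. Only fall back to a raw
    query as a last resort, and record the miss so it can feed the
    pipeline that decides which tool to build next."""
    for name, tool in safe_tools.items():
        result = tool(question)
        if result is not None:       # a safe tool could answer
            return result
    unanswered_log.append(question)  # track the gap for tool-building
    return fallback_query(question)  # carefully monitored last resort
```

The key design point is that the fallback path is both rare and observable: every use of it is logged, so unanswered questions become the backlog for new purpose-built tools.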

Demetrios Brinkmann [00:25:55]: Okay, how or where did you aggregate? Let me start over. Where did the aggregate data come from? A smaller LLM? Handcrafted summary queries?

Donné Stevenson [00:26:11]: Yeah, so we were quite fortunate that because the analyst is sort of meant to help them parse the data that's already available on the platform, in many cases the data is there, it just needs to be retrieved in a way that is accessible. The first level of querying is done for us and then we do the actual aggregations. The next level of aggregations we do manually for each tool. That's why we have this tool design conversation. We didn't say here's your SQL tool or here's your, I don't know, your ads tool. We said, okay, here's your tool to get ads for review. We understand what that concept is, and so we aggregate the data in a way that is respectful of that concept.

Demetrios Brinkmann [00:27:05]: Oh, so you're almost trying to preemptively make the shape of the data easier for the LLM to consume.

Donné Stevenson [00:27:14]: Yes.

Demetrios Brinkmann [00:27:16]: Because you have very clear paths.

Donné Stevenson [00:27:20]: Because we've seen a little bit about the data that's available, we've seen a little bit of the questions that they're asking, we're able to say, okay, well, if we Give it these numbers, it can probably answer a lot of the questions that the dealers have anyway. But we give it that raw data again as a way to say if that doesn't work out, here's another way you could try. Ideally, at some point we would like to be able to share the data and then let it do other operations on that data. So if it doesn't have the statistic it needs. But the experiment we ran, the compute was too slow. So we're just trying to figure out how to plug in a compute that's fast enough.

Demetrios Brinkmann [00:28:01]: And while we're on the topic of data, there was one big question that I had with the CSV. You said you went to CSV to try and basically help the context. Right. And not have this token explosion happening. What were the trade offs though? I understand that CSV, you consume less tokens, but what's the bad part about it?

Donné Stevenson [00:28:28]: It's harder to read, right? So if the LLM needs to answer a question about a specific element in it, it's not like JSON, which is much easier to interpret because it's much more self-contained. So for example, we have the number of clicks, the number of views, and the number of phone calls. Now in JSON it says calls: 3, views: 400. But in a CSV you're going to have the ad ID and then just commas with data in between. So it's a little bit harder for it to tie that number back to the header.

Donné Stevenson [00:29:06]: So you lose a little bit of the sort of comprehension of the data. And that's the, the trade off for us. And that's why these summary statistics start to matter a lot more.

Demetrios Brinkmann [00:29:21]: Okay, there's a ton more I want to ask you, but I'm going to get to Joao in the chat. In your implementation, does the BE, I'm sure you understand what that means, I don't know what BE stands for, does the BE expose different endpoints for different actions?

Donné Stevenson [00:29:38]: I think that means the back end. Yes, it does. So the basic one is just a single endpoint for the agent and then for other workflows, then we expose other endpoints. And we do that because we don't want the agent necessarily to always be aware of the other parts because they don't always contribute to the conversation that's happening.

Demetrios Brinkmann [00:30:01]: The first agent version had a small set of tools. The second version had way more tools. How have you defined how many tools to create?

Donné Stevenson [00:30:12]: Yeah, that's an ongoing conversation and that's something we iterate on. We have had someone join the team recently who's focused on evals, and that's turning into a really productive way to figure out what tools are there when tools are being confused. And again, that parallel process we have that tracks questions we couldn't answer that helps us also define when we need to add tools or when we need to review which tools we have and where there's confusion happening. It's a combination of proper evaluations and then also reviewing this data that the agent is collecting for us.

Demetrios Brinkmann [00:30:50]: When you say evaluations, you're doing tool evaluations.

Donné Stevenson [00:30:54]: It's a pretty comprehensive set of evals that she's built. In the case we're talking about here, it will do an evaluation of whether the correct tool was called and whether the tool output was used correctly. Both of these can be flags, if something is going wrong, that the tools need to be reviewed.
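The two eval flags mentioned here, correct tool called and tool output used, could be checked along these lines. The trace fields and the crude containment check are hypothetical stand-ins for whatever the real evaluator does.

```python
def eval_trace(trace: dict, expected_tool: str) -> dict:
    """Score a single agent turn on the two flags from the talk:
    did it call the right tool, and did the answer use the output."""
    correct_tool = trace["tool_called"] == expected_tool
    # Crude proxy: the tool's output text appears in the final answer.
    output_used = trace["tool_output"] in trace["final_answer"]
    return {"correct_tool": correct_tool, "output_used": output_used}
```

Either flag going False across many traces is the signal, per the talk, that the tool set needs review, whether because tools are being confused or their outputs ignored.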

Demetrios Brinkmann [00:31:14]: Awesome. Donné, I think there's maybe one more question for you, from Jonas in the chat. Are you considering TOON, token object notation?

Donné Stevenson [00:31:24]: Nope.

Demetrios Brinkmann [00:31:25]: No? Just a straight no. There you have it, folks. Donné, I appreciate you so much. Thanks for coming on.

Donné Stevenson [00:31:32]: Thank you for having me. Have a good evening.

Demetrios Brinkmann [00:31:34]: Ciao.
