MLOps Community

Dynamic Contextual Retrieval in Enterprise Analytics // Dirk Petzoldt

Posted Nov 27, 2025 | Views 15
# Agents in Production
# Prosus Group
# Enterprise Analytics

SPEAKER
Dirk Petzoldt
Co-Founder @ Explai.com
  • Ran Zalando's pricing and marketing platform.
  • Glovo's Chief Data Officer until 2023
  • Co-Founder explai.com

SUMMARY

We reflect on how the complexity of an agent analytics project at an international pharma company taught us to move from prompt engineering to context engineering, empowering agents with interactive tooling to build their context dynamically.


TRANSCRIPT

Dirk Petzoldt [00:00:05]: Wonderful, thank you so much. I want to make this session super practical. It's been a long day, so my goal is that whoever is listening really gets three practical tips for how you can improve your prompt engineering tomorrow. I'm a data scientist by training and have been in the data space for 20 years. I worked hands-on for a long time, and I also led big data teams of a few hundred people at companies like Zalando and Delivery Hero. I co-founded Explai two years ago.

Dirk Petzoldt [00:00:41]: So I put "recovering tech executive" on my profile, because I love being hands-on again and I love the agentic revolution and what it means for AI. And I'm a learner, so it has been a great day today; I loved it from the first keynote. Please connect on LinkedIn if you want to share experiences or connect on projects. I'm on a journey to understand how we can really apply AI to analytics and data, and this feels like the opportunity of a lifetime, to be part of such an industry shift. Okay, so this is my informal mission: data science for dummies. That's not what you'll find on the website if you go to our company Explai or to my own website.

Dirk Petzoldt [00:01:22]: You see this guy who has all this data, and he cannot do SQL, he cannot do Python, he's not a trained statistician. But he's not the dummy, because he has instinct, he has tremendous business knowledge, he's a smart guy, he has all this latent context of the workplace. The real dummy here is this sweet fellow we all as AI builders love so much: the agent who, having read the whole Internet, often fails doing math on a few numbers. If we think about who the dummy is, it's not the user, who has this tremendous untapped knowledge that we really need to tap into when we do analytics. The real question is how we can make the agent behave and learn to handle tabular data well. That's the first mind shift I had to make. I mean, I made it in my career before agents, but when I worked with agents I really saw: okay, these agents have tremendous potential, but they're really dumb unless you teach them otherwise. The other misconception, where I feel we as an industry sometimes apply AI to data in the wrong way, is that instead of sticking to building data companions, we build just another BI system. When I talk with fellow founders or people in my network, a lot of them try to build natural-language-to-SQL, or produce plots, or sprinkle AI on BI tools.

Dirk Petzoldt [00:03:00]: Those can be super useful for people to get data. But if we do AI like this, it's just another tool. "Tool" is a useful metaphor in the sense that we are augmenting humans, and a tool only becomes powerful in the hands of the human. I also think we largely limit ourselves. I've led more than 500 analysts in my career and was one myself, and great data analytics was never about who is the greatest SQL coder. Analysts before AI have been doing natural language to SQL, right? And of course you need to do it to be an analyst. But the great analyst is not the one who can do SQL the best or make the nicest plot. It's really about not building yet another tool, but thinking about this end-to-end process, which is high-context. It has all those different elements which I've listed here, and it's really a process with all those steps, and each of those steps is different every time.

Dirk Petzoldt [00:04:06]: So you see I call out that the science in data science doesn't mean PhD-level fancy formulas and math. In my experience, the path to value is unknown: the question is unknown, maybe the question isn't even phrased the right way, maybe there is no data, maybe there's no pattern, right? So it's not about the single query. And to be honest, if a single query can answer it, then it wasn't an interesting question to begin with. If you ask "what is my number of customers this week", it gives you a number and you're just fine with it, then there was no signal in that answer. Because if you're not surprised, if you have no follow-up question like "what is the trend?", "is it larger than last week?", "what can I do to increase it?", "what is the uplift of that?", then you were probably expecting it anyway, because it was the weekly pattern, and there was no signal. If there's a signal, then there's always a follow-up. So you have this sociocultural process that's always multi-step, always human-in-the-loop, and the agent is a companion in the sense of a consultant, like a data analyst as a consultant.

Dirk Petzoldt [00:05:21]: I just want to call it out because I think as an industry it's so important. Data analytics is big, right? We at Explai build a product here, but it's a huge landscape and there's space for a lot of products, and we really should learn from how analysts worked over the last decades and how they helped the business, not just do another natural language interface to BI reports. Great. Of course, once you have this job to be done figured out, then you have this deeper layer: you need to think about how to implement all those activities. You quickly figure out you need a multi-agent system, because you will have a plotter, you will have a SQL writer, you might have an agent doing forecasting, some agent doing causal inference, and some of these we haven't cracked yet, right? SQL is soon becoming a commodity, so you cannot expect to differentiate yourself there. But if it's about what the best forecasting model or endpoint to use is, there's still a lot of opportunity. So you need to coordinate these agents, you need to make sure they're skilled, you need to verify results, because data analytics is a lot about accuracy, so you can't have errors accumulate. And then you have to inform the agents and make sure they get the right information.

Dirk Petzoldt [00:06:48]: That's what I focus on in this talk: the prompt engineering challenge, which, as many say, is really the core challenge when building AI systems. And I have to say personally: because agents don't really learn, we don't have reinforcement learning or fine-tuning in most of our workloads, prompt engineering and what you put in the context window is really how you manufacture learning and end-to-end abilities in your system. So, when we started our journey two years ago... well, this is a quite recent picture from Anthropic which visualizes this: you have this large collection of things you could put in the context, and you really need to curate it carefully. The picture is new, but I think the practices have been out for a while. And to be honest, we did it wrong when we started, because that was the wisdom of the time in our first 12 months as a company.

Dirk Petzoldt [00:07:47]: So we would do a lot of custom prompts for our customers. We felt a differentiator is that we know the data science process so we would put a lot of domain knowledge and we would feed it to the agents proactively. So that would be about the data science process. Of course we would have rag on all the table info, we would have documents that would give business knowledge, so we'd Kind of preload the agent with all the information we got. And then once the analysis starts, of course you have intermediate results and then you have, I mean you might, you do SQL, you get results, they're very long, there might be thousands of rows. So you do previews, so you get actual real data into the context. So samples of it. You might be super smart how you sample it and how you collect those snapshots, but they do accumulate, right, and kind of pollute the context in a sense.

Dirk Petzoldt [00:08:43]: And then we were doing trimming and summarization to kind of then reduce noise and make sure there's enough signal. So we felt very smart doing it, but it didn't work very well. And the reason was that data isn't small even when you, I mean even when you sample it. And then as we saw before, there is business data process, there's all those different, there's all this different disciplines and techniques and contexts that you want to bring in. And so we really saw instruction following degrading once we scaled it, once we had our real enterprise use cases and scaled it to real level data. So I want to give you the shortcut or the hacks how you can avoid this. Maybe you already passed this, then it's a recap for you. But for us it was a hard learned journey and I want to share with you how you can avoid it if you haven't been there.

Dirk Petzoldt [00:09:43]: So this is a great picture from the LangChain website. It shows the many different ways you can engineer context. One way is how you even create context: that's the red box here. You see there are different ways you might want to persist or commit information to long-term or short-term memory, and write to it with a reasoning agent. Of course there are even more ways to do it. Then, once you have things in context, you want to select them: the blue box. Again, there are several patterns here. I won't go through all of them, but I will focus on three tactics as examples.

Dirk Petzoldt [00:10:24]: Then when context gets too big, you want to compress. I already mentioned summarization, trimming and then at the end if it's still big, then the same like with compute workloads in data you begin to think about distribution and how you can partition your work. So here's what works for us. So you see the blue boxes of where our current focus or our strategy is, if you will, for prompt engineering. Then you see those three little stars and those are tactics which I will deep dive in the next three slides you can see writing context. We have started so we want to double down and do more. For example, we don't do a lot of scratch pad but we definitely see value here. Select context.

Dirk Petzoldt [00:11:11]: I want to give you an example that I think the way it's recommended, it doesn't work. And that relates to the previous slide where I showed that when we were preloading the context with too much information and that didn't work. So you have to resync from pull to push and I will give an example. Summarizing is great, so I think you should prefer it before trimming whenever time to token allows you to have an extra thinking cycle to spend on summarization. That can be super powerful, especially between handovers of agents. And then keep doing is isolation. I think some people don't do it at all and I think doing don't do it at all and I think it's a, it's a missed opportunity. Okay, so there's a lot to discover here.

Dirk Petzoldt [00:11:53]: This is only a 15-minute presentation, so I just picked three examples which we saw working for us and which I think you can apply in a very lean and effective way. Let me go in reverse direction: first reversing and doubling down, then writing context, then isolating context. So this is reversing the RAG. I mentioned how preloading the agent context really broke the reasoning, because it was just too much information. We do it in a very different way now. Here you see an example of how to compute a growth metric.

Dirk Petzoldt [00:12:32]: So it's a cagr. It has a mathematical formula and it's a standard metric in marketing and controlling and sales and a few other disciplines. So we used to have a tool for this that would do it and we used to have prompt guidance to do it. So how we do it now is we say we just have a document, but we do make sure that the document is very carefully structured. So you see it has this. I mean first it's version controlled and it can be accessed by the system in a controlled way. But it does has, I mean you see it in the four boxes here. It has a trigger message which is very short.

Dirk Petzoldt [00:13:08]: So you can preload it in every context because it's just a simple sentence, right? So it would say here, when is kegr actually useful? This is just one sentence. So you can put it in every agent context where you think that could be potentially useful. Then you would have tooling for the agents to actually pull the full document when this trigger is relevant and we have seen agents were very good to run a lot of those tools in parallel. So with the frontier models you would actually have it's no problem to have 5, 10, 15 calls in parallel. So you can really query a lot of those documents without lengthening syncing. So then the next section would be prerequisites for the agent. So the agency is okay, if I want to do this, then first I have to pull data that has two date ranges which is required here for this formula. It would then have related content to say, okay, well if you only have two consecutive years, then this is actually not a good metric to use.

Dirk Petzoldt [00:14:04]: Go over to the year over year metric because that's a better metric. And then it has an example because agents learn much better from inductively from examples than if you give deductive abstract instructions. So the point here is you can be document based as long as you have a clear structure and then you have good tooling for the agents. If you think about the primitives that the agent needs, you have good tooling for the agent to explore those. And they do it very efficiently, actively. But you need the discipline to set up in a structured way. And there's this recent anthropic post on they have skills. And you cannot just do it for skills, you can do it for domain knowledge too.

Dirk Petzoldt [00:14:46]: Okay, tactic two is writing artifacts. This is about writing to context, and we do it whenever we produce intermediate tables. Here you see our product, but this is not a sales pitch; you can think about your own agentic system. When you say "show the smartphone table", it will give you a sample of the data. And what we do is never actually put this data in the agent context. I show the data here, and then there's a follow-up question where I say: now generate some order data for this master data.

Dirk Petzoldt [00:15:25]: It's two step and you see you have two tables that are related. None of this table would be pushed into agent context. But each of this table is generated as a materialized view in the backend. It can be in postgres, it can be Pandas data frame. The agent would just see there is a table artifact that have a certain schema and table name. Then it has tools and endpoints to actually read and write. It could head and tail, it, it could grab, could see metadata like summary statistics and scales of the columns. And it wouldn't have useful context like lineage.

Dirk Petzoldt [00:15:59]: Like is this a result of maybe that's a result of regression analysis or it has a SQL so it will give information about the lineage in other context. But the agencies is one line, so very few tokens. But it has this infrastructure and it's not hard to do right, but be disciplined about this infrastructure. So the agent can explore this artifact. Then we can even render it on the front end. Even the front end here would actually see this as a table artifact. It would use an endpoint, send schema and table and then can actually page it with this endpoint. You see there's 200 data points here and we just see the first five and it can be paged interactively.

Dirk Petzoldt [00:16:39]: Tactic three is to actually write full code. In the previous tactics you saw there's still a lot of tool use, and depending on how well it works, you might of course have subsequent calls. It can be brittle, or add some friction, to have many tool calls. Sometimes you want that constraint. We don't go freestyle SQL, for example, because we want to make sure there's data protection, PII data and so on; we have a workflow with a lot of guardrails there.

Dirk Petzoldt [00:17:20]: But creating a plot, for example, is something we are fully comfortable to say. When we started in a very constrained way, for example, we would think, okay, if plotly is a library, then the agent, there's a JSON format that you can feed into Plotly. And so we would have the agent create a JSON declaration that would then be handed over to the Python runtime to create the plot. But we saw actually you can have the agent write the full code and it will do just fine. This is much more flexible because it might pre aggregate data, for example, in our case, it's actually looking at the plot and seeing if there are labels overlapping or it's too crowded, it will actually rerun and the plot again. So it's much more flexible. If the user says give me fancy plot xy, it will just do it. And it's not constrained by the grammar of.

Dirk Petzoldt [00:18:11]: And if a user doesn't want. If a customer doesn't want plotly by something else, we can also do it right. So this is low risk. We feel a sandbox and so the flexibility of the sandbox is great.

Adam Becker [00:18:22]: Dirk, we're running short on time.

Dirk Petzoldt [00:18:25]: That's fair. Let me wrap up 30 seconds. But that's a fair. Sorry, I got carried away.

Adam Becker [00:18:31]: I wanted to hear that last tactics. So that's why. Yeah.

Dirk Petzoldt [00:18:34]: Okay, so now you see here this chart. Okay, but let me wrap up to be fair to subsequent speakers. So there's the second to last slide. So the key point to take away is if you apply those tactics, you can actually give the agent more autonomy because it has a more more powerful, potent infrastructure. So you might start with workflows. You can go to React. That's really with those strong primitives and tool adjustment and then you can bring it to code. Right.

Dirk Petzoldt [00:18:58]: And you can do it based on the jobs to be done and give more autonomy to the agent here. And I said workflow is a goal. And actually I saw once you have those primitives, then React in code works just fine being overtime. Let me wrap up. Here's three of us in the company that love to talk to you. So reach out just if you want to help us learn, if you want to join us, if you want to recommend a project. I just love to connect to people if you're in Barcelona, reach out please. Adam, back to you, Dirk.

Adam Becker [00:19:29]: Thank you very much. I will be connecting with you for sure. I have a lot of thoughts about all of this and it's fascinating work that you're doing. Saad in the chat is saying this is mind blowing. Dirk. This solves a lot of enterprise grade agents, so I think there's interest. Stick around the chat.

Dirk Petzoldt [00:19:47]: Too good to be true. I hope that's not an agent writing. Thank you.

Adam Becker [00:19:50]: Nah, he laughed at some of my jokes, so I hope that he's human.

Dirk Petzoldt [00:19:54]: It might be good actually. Might use all my tactics. Thank you, Adam.

Adam Becker [00:19:58]: Thank you, Derek.

Dirk Petzoldt [00:19:59]: You too.
