MLOps Community

Expanding context engineering to the tooling layer - Lessons from production systems for Fortune 100 tech // Frank Wittkampf

Posted Nov 25, 2025 | Views 18
# Agents in Production
# Prosus Group
# Context Engineering

SPEAKER

Frank Wittkampf
VP Applied AI @ Databook

Frank is Head of Applied AI at Databook. He previously co-founded VM-X (AI infrastructure) and Thru.ai (an agentic platform for healthcare). Before that, he was VP of Innovation & Strategy at Western Digital and Chief Data Officer at Uplift (a fintech startup). At McKinsey & Company, he was an Engagement Manager in the Silicon Valley tech practice. Frank holds a Master of Science in Physics (thesis on cognitive neuroscience) and an MBA from Kellogg.


SUMMARY

MCP and similar protocols solved tool discovery, but not tool execution. Raw exposure of APIs pollutes model context, bloats prompts, and degrades performance. This talk draws on lessons from enterprise-scale deployments at Fortune 100 tech companies and shows how deploying a thin layer within your tooling service improves accuracy, cost, and performance for AI in a real enterprise setting.


TRANSCRIPT

Frank Wittkampf [00:00:05]: Awesome. Very nice to meet you. I'm Frank. I'm the head of Applied AI at Databook. We serve large tech companies like Microsoft, Salesforce, AWS, etc., automating their sales forces. The work that we do is, for example, making presentations or running intelligence flows, basically helping people move forward. And in doing that, we're learning a lot about how agentic frameworks should work when you do this at scale for big companies.

Frank Wittkampf [00:00:38]: And that's what I'd like to talk about. So, should I just get started? I think I should. Let me get in. The main thing I want to talk about today is the tooling layer. When we're making long production flows for these big tech companies, how do you make sure that those work, that they are reliable, that they can produce, and that they can do it at volume, with quality and reliability, while optimizing our cost? So today's topic, let me switch my slides, is tool masking. We're all very excited about MCP.

Frank Wittkampf [00:01:25]: MCP is really awesome. It has helped us connect a lot of services to our agents, and it makes it significantly easier to run on a bunch of services that would otherwise be harder to connect. However, an LLM works best when you give it a very clean, focused input and a clear expectation of what comes out. If you do that properly, then your consistency goes up, your accuracy and quality go up, and your speed and cost get better. However, working with MCP also has a downside. When you connect any service, it fully exposes the whole API or service that you're running to the LLM. And what that often does is pollute your LLM execution. There's a lot of information that comes out that you might not want to use for your agent.

Frank Wittkampf [00:02:22]: Plus, the input object that you're providing might not actually be tuned to the context or the agent that you're using. I'll show you some examples of how you deal with that: how do you actually optimize, and how do you prompt engineer the agent together with its tools, instead of just the prompt of the agent? What MCP does really well is give you a proper connection and standardize how you connect a lot of things. It allows better auth, and there are obviously a lot of auth things still to solve; people have been talking about that today. And it's starting to get widely adopted. So MCP is awesome. I'm on the MCP train.

Frank Wittkampf [00:03:08]: MCP is great. What it doesn't do is filter the tool surface that gets to your agent. An input object that you present to an agent could contain a few or even dozens of fields that need to be figured out, and the naming, the descriptions, the way those are shaped might not actually fit the agent that you're trying to run. You really start running into this when you run large production agents that need to do a lot of things and have more than just a few tools. How do you optimize the running of many tools by an agent? If you were designing things from scratch, you would never design MCP exactly the way it is on purpose. A typical remote, and I'm trying to visualize a tool here, comes with lots of buttons and lots of ways you can customize it, but what you want to surface to your agent is the simplest thing possible that actually gets it to the outcome it needs. How do you avoid unnecessary tokens? How do you keep unrelated information out? And how do you avoid this thing that I call choice entropy: more and more ways for the agent to screw up? The net effect of introducing a tool that looks like these remotes on the right is that you're going to reduce the reliability of your agent.

Frank Wittkampf [00:04:39]: And the goal is to make things as simple, easy, and workable by your agent as possible, so that it will not screw up and will get you to the outcome you're trying to reach. Let me also check my screen to see whether any other people are asking questions, yes or no. There we go. All right, so here's a bit of an example. We'll start with the output, then go to the input, and then get to how you could work with this. So take a sample API; Yahoo Finance is a fun example.

Frank Wittkampf [00:05:11]: And you're trying to get output from that. The thing it will reply with, say for a specific quote that you're requesting, is a hundred different fields about a specific stock. I've listed a few here on the left. But if you get that many fields of output, and you're trying to combine that with other tools that have been called in parallel, you get irrelevant data, your prompt gets bloated, and there are unnecessary tokens that you're processing. And it's obvious that accuracy goes down as this grows. There are a bunch of articles online about how this degrades your quality; a few are listed here if you want to dive deeper later. A lot of people are writing about how to actually work with tools more efficiently. So what you introduce is choice entropy.
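The output-side trimming described here can be sketched in a few lines of Python. The field names below are illustrative stand-ins for a Yahoo Finance-style quote response, not the API's actual schema:

```python
# A minimal sketch of output filtering: a raw quote response (field names
# are illustrative, loosely modeled on a Yahoo Finance-style API) is
# trimmed to the fields the agent needs before it reaches the model.

RAW_QUOTE = {  # dozens of fields in reality; only a handful shown here
    "symbol": "MSFT",
    "regularMarketPrice": 415.2,
    "currency": "USD",
    "fiftyTwoWeekHigh": 468.4,
    "trailingPE": 35.1,
    "sharesOutstanding": 7_430_000_000,
}

ALLOWED_FIELDS = {"symbol", "regularMarketPrice", "currency"}

def filter_output(raw: dict, allowed: set) -> dict:
    """Keep only whitelisted fields so irrelevant data never bloats the prompt."""
    return {k: v for k, v in raw.items() if k in allowed}

print(filter_output(RAW_QUOTE, ALLOWED_FIELDS))
# -> {'symbol': 'MSFT', 'regularMarketPrice': 415.2, 'currency': 'USD'}
```

The whitelist itself becomes something a prompt engineer can tune per agent, rather than a code change.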

Frank Wittkampf [00:06:06]: When you give more choices to the model, there are more ways it can misfire, accidentally fill in the wrong field, or misunderstand part of your prompt. Extra tokens mean higher cost. It also adds latency and gives you lower performance and less consistency. And when you're doing this at large scale, like we are at Databook, you are definitely pressured on your token budget. Again, there's a set of articles you can dive into. Now, the input schema is actually even more important. When you look at the same Yahoo Finance example, consider the number of variations of objects that you can put in: you can select what profile I want, what financial data I want, what income history I want, etc.

Frank Wittkampf [00:06:58]: That is all extra information that your LLM would have to provide accurately to get to a proper outcome. Without properly reshaping this, or shaping it so that it doesn't conflict with the other tools you have, you are again making your LLM execution less reliable. So how do you approach this? The way we approach it is that when we have an agent use a tool, we also put masks on top of the tool. The layer on the bottom, the tool handler, is what MCP exposes, or what your service exposes. It has the full raw surface of what comes out of your tool. That could be the full API input object and output object, and it could be MCP or not. On top of that, we run masks. And these masks basically define what the interface is that goes to my agent, and what the translation is that has to go down to this handler.
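One way to picture that layering is as a thin wrapper object: the raw handler keeps its full surface, while the mask holds the narrow agent-facing schema, baked-in defaults, and an output projection. This is a hypothetical sketch of the pattern, not Databook's actual implementation:

```python
from dataclasses import dataclass
from typing import Callable

def raw_finance_handler(params: dict) -> dict:
    """Stand-in for the full MCP/API surface with many parameters and fields."""
    return {"symbol": params["symbol"], "regularMarketPrice": 101.5,
            "currency": "USD", "trailingPE": 22.3}  # ...plus many more in reality

@dataclass
class ToolMask:
    name: str
    description: str
    agent_schema: dict    # the narrow interface the agent is shown
    defaults: dict        # values the agent never has to supply
    output_fields: list   # projection applied to the raw output
    handler: Callable

    def call(self, agent_args: dict) -> dict:
        full_args = {**self.defaults, **agent_args}      # translate up to the raw surface
        raw = self.handler(full_args)                    # full raw response
        return {k: raw[k] for k in self.output_fields}   # project down for the agent

stock_price = ToolMask(
    name="stock_price",
    description="Get the current market price for a stock symbol.",
    agent_schema={"symbol": "string"},
    defaults={"modules": "price"},
    output_fields=["symbol", "regularMarketPrice", "currency"],
    handler=raw_finance_handler,
)

print(stock_price.call({"symbol": "MSFT"}))
```

The agent only ever sees `name`, `description`, and `agent_schema`; the translation to the raw surface stays in the mask.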

Frank Wittkampf [00:08:14]: And what's interesting is that this mask is part of what my agent editors, agent builders, and the people prompt engineering these agents are editing as well. So instead of just editing the prompt of the agent and the context the agent gets, we also edit the mask of the tool. This gets into the more modern definition of prompt engineering; this is context engineering, where context engineering also includes engineering the top of the tool. The bottom of the tool is constant: I have an API that's exposed, that API has a lot of objects, and that stays the same. But I might put a mask on it, or multiple masks, for different contexts and different agents, because that will make my agent run better. I'll show you some more examples in a second. What does this do? One, it actually allows you to expose an API in multiple ways.

Frank Wittkampf [00:09:13]: So for example, with this Yahoo Finance API I've referenced here, you might say, I'm going to make a little get-revenue tool, and it uses the finance API to just return the revenue for a specific company. Or you might make one that gives you a stock ticker response for just a few months, or for the last five days. Or you might make a variant that just looks at a margin profile, or something like that. The useful thing is that now you have the choice of what you surface. And it is up to the prompt engineer who is making the actual agent to define it exactly such that it is as efficient as possible for the agent they're making, and also such that the agent gets exactly what it needs. So you can tune it; you can set specific values that you just put in. For example, in the API call to the given service, you might say, okay, I'm always going to need the revenue, I'm always going to need these few things.

Frank Wittkampf [00:10:28]: These are now values that the AI does not need to provide when it calls the tool. It might be able to call the tool by only defining the company ID and get out what it needs. What's interesting is that this also allows you to react much more quickly to client requests, or to things you need to change, because it now lives in the prompt layer. You don't have to go fully deploy a whole bunch of new code; you can do this in your configuration layer. And this allows you to adapt much more nimbly to whatever your customer or your situation requires. So it lets you ship cleaner, leaner, more robust agents that are both faster and more reliable in what they get. And in the end you are going to save costs with that as well.
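Exposing one API as several narrow tools, each with its non-essential parameters pre-filled, might look roughly like this. Function and parameter names are invented for illustration, not taken from a real Yahoo Finance client:

```python
# Sketch: one raw API surface, several narrow tool variants. Each variant
# bakes in the parameters the agent should never supply, so the model only
# fills in the company identifier.

def raw_finance_api(company_id: str, modules: str, range_: str = "1d") -> dict:
    """Stand-in for the full underlying API surface."""
    return {"company_id": company_id, "modules": modules, "range": range_,
            "revenue": 211_900_000_000, "closes": [99.1, 100.4, 101.5]}

def get_revenue(company_id: str) -> dict:
    """Narrow tool: just revenue; everything else is a baked-in default."""
    raw = raw_finance_api(company_id, modules="financialData")
    return {"company_id": company_id, "revenue": raw["revenue"]}

def get_price_history_5d(company_id: str) -> dict:
    """Narrow tool: last five days of closing prices."""
    raw = raw_finance_api(company_id, modules="price", range_="5d")
    return {"company_id": company_id, "closes": raw["closes"]}
```

Adding a customer-specific variant is then a new small wrapper (or mask configuration), not a redeploy of the underlying integration.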

Frank Wittkampf [00:11:20]: All right, so an example. If we keep on this Yahoo example, which is an easy public API that a lot of people know, which is why I picked it, here on the right side of the screen you see an example of how you can do this. I give this tool a name, stock price, and a description, and the handler says: I'm going to pass just a few specific objects that need to go to this API for it to function. And in the output I just have a very structured output that my AI agent is always going to get. I'm just going to give it the symbol, the market price, and the currency. That's all it gets. It's very cleanly formatted, and it's always the same. So when I use this agent, I can either integrate this directly into my prompt, because it comes out very structured, or I can just present this object and run with that. The fact that I can now define exactly what my output is going to look like, depending on how I need it in my agent, makes prompt engineering significantly easier on the input side.

Frank Wittkampf [00:12:30]: We've just simplified this to only need one little symbol. So when the agent calls this, it's very easy to do. And then this bottom part here is actually quite important. A thing that we also use, you don't have to, is validation templates, so that when we get an input, this tool can respond with the types of errors that actually help the AI agent self-correct when it makes a wrong call. If the symbol doesn't fit the format, you can immediately return that, even without calling the underlying API or tool. The agent gets a custom-made error message defined by you, which is also part of your prompt engineering, that allows it to self-correct, immediately call the tool again, and find the right thing, without it actually having to get 404s or 500s. You have more influence on the self-correcting behavior an agent can take. Let's see, what else did I want to say about this? Again, I made this point, but I want to emphasize it: the fact that you can throw multiple of these masks on top of the same tool makes it significantly easier to work with. It means that when I have a customer that needs one more element, I don't have to go recode this whole tool.
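A validation template along these lines can be sketched as a check that runs before the underlying API is ever called, returning an error message written for the agent rather than a raw HTTP status. The ticker format rule and error wording here are assumptions for illustration:

```python
import re

# Sketch of a validation template: the mask checks the input before any
# API call, and on failure returns an agent-facing error that tells the
# model exactly how to self-correct and retry.

SYMBOL_RE = re.compile(r"^[A-Z]{1,5}$")  # assumed ticker format

def validate_symbol(args: dict):
    """Return an agent-facing error dict, or None if the input is valid."""
    symbol = args.get("symbol", "")
    if not SYMBOL_RE.fullmatch(symbol):
        return {
            "error": "invalid_symbol",
            "message": (
                f"'{symbol}' is not a valid ticker. Use 1-5 uppercase "
                "letters, e.g. 'MSFT'. Call stock_price again with a "
                "corrected symbol."
            ),
        }
    return None

print(validate_symbol({"symbol": "msft"}))  # caught before any API call is made
```

Because the error text lives in the mask configuration, it can be tuned like any other prompt.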

Frank Wittkampf [00:14:05]: The only thing I have to do is make a slight variation of this configuration, and then my customer has what they need. So, starting to get to the end of what I want to say here: tools are prompts, and we're generally overlooking the engineering of tools quite a lot. A situation I've run into personally is agents that use somewhere between 10 and 25 tools and also have a fairly large prompt. It is highly likely that at some point, the description of what your tool does uses words or phrases that conflict with what another tool says. One of the examples I had: I used the word "notes" a lot. I used "notes" for notes that the agent took, to be able to write down things it needed to recall a little later, like memory. I used "notes" in a footnote type of fashion, and I used "notes" for another tool that we had. And those tools start conflicting with each other.

Frank Wittkampf [00:15:21]: And that might be totally fine in one agent, but in another agent where these tools come together, I should present the tool differently. Now, with a few clicks, changing one of the prompts I have on top of my tools, I can make it perform. Otherwise, I would have to figure out how to deploy this and how to make sure it's consistent across all of my infrastructure. This makes it significantly easier to solve those types of problems. So tools are basically prompts, and the description of your tool should fit with the rest of your prompt context. You need to make sure that you tune that tool. Naming matters a lot; I think people have already figured that out.

Frank Wittkampf [00:16:13]: I'm just making sure I emphasize that a little bit. The input and output surfaces of a tool add tokens and complexity; you should manage that. And the framing and phrasing of tool errors really matter. If you phrase them in the right way and ensure that you have the right type of error responses on top of your tooling layer, your agents can self-correct, and a process still lands on its feet. If you don't do that, there's a much bigger chance that your process flow or agent flow results in some error it doesn't recover from, which the user will have to go solve. One more thing to consider: when I see a lot of people making tools and agents, I see a lot of descriptions of "hey, you should use this tool in this fashion."

Frank Wittkampf [00:17:14]: A lot of people put that in the actual main prompt of the agent, or of the many agents, that they're running. Something that Anthropic is actually evangelizing is that you should try to put more of that type of description, of how to use this tool, in the tool description itself. So when you put a tool in your agent, don't put all of the description of how to use it in your main prompt; throw it into the tool description. The one problem you might run into is that if you use that tool in a lot of other agentic contexts, you might actually want the tool to be used slightly differently by one agent than by another. Instructions like "always call this tool when..." or "make sure you call this other tool first" are only really useful when you know that the other tool is present. So this again makes the case: what if I can put things in a tool description, but make that tool description variable, so that in one agent context or another, this tool shows up slightly differently?
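The per-context description idea can be sketched as a lookup of description overrides keyed by agent context, so that guidance like "call this other tool first" only appears where that other tool exists. All names here are illustrative:

```python
# Sketch: the same tool carries a different description depending on the
# agent it is mounted in, so cross-tool usage guidance is only shown
# where it applies.

BASE_TOOL = {"name": "get_revenue",
             "description": "Return a company's latest revenue."}

CONTEXT_OVERRIDES = {
    "research_agent": "Return a company's latest revenue. Use freely while exploring.",
    "report_agent": ("Return a company's latest revenue. Always call "
                     "resolve_company first to obtain a valid company_id."),
}

def tool_for_context(agent_name: str) -> dict:
    """Render the tool spec for a given agent, falling back to the base description."""
    desc = CONTEXT_OVERRIDES.get(agent_name, BASE_TOOL["description"])
    return {**BASE_TOOL, "description": desc}
```

The base tool never changes; only the mask's description varies per agent.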

Frank Wittkampf [00:18:34]: You would want some way to properly edit the prompt engineering that comes with your tool, which is where these masks come in very handy. I also wanted to talk a little bit about, if you choose to use this type of pattern, what design patterns you would use. The main things we focus on: one is shrinking the schema. Can we limit parameters to things that are actually relevant to what the agent needs to do? How can we constrain the number of types being used? How can we make the arrays or enums smaller so that there is less choice? The less choice, the better your agent performs. Second is role scoping: we have different masks for different agents. There are agents we use in a more exploratory mode, and there are agents that are bound to specific rails and need to get somewhere, and presenting the tool in a different fashion helps a lot with performance. Third, we have a capability gate: how do you split tools into single-purpose tools and ensure that some of them are only allowed at specific stages? A good example: you might have tools that you can only call after a user authorization has happened, but that run on a similar API. How do you ensure that the API call that is fine to make unauthorized can be split and safely taken apart from the other? One of the ways is by presenting that same surface as two different tool sets, where one mask has the actual user authorization hard-coded in, or at least passed on in a hard-coded way, while the other allows a public query to the part that you're allowed to access.
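That authorization split might be sketched as two masks over the same raw surface: one that can never pass a token, and one that has the session's token baked in so the model never handles it. Everything here is an illustrative stand-in:

```python
# Sketch of a capability gate: one underlying API, exposed as two separate
# tools. The public variant cannot pass credentials; the authorized variant
# has the session token hard-coded by the system, never by the LLM.

def raw_account_api(query: str, auth_token=None) -> dict:
    """Stand-in for an API whose surface mixes public and private access."""
    if auth_token is None:
        return {"query": query, "data": "public results only"}
    return {"query": query, "data": "private account results"}

def make_public_search():
    """Mask exposed before authorization: no token can ever be supplied."""
    def public_search(query: str) -> dict:
        return raw_account_api(query)  # token argument is unreachable
    return public_search

def make_account_search(session_token: str):
    """Mask exposed only after the user authorizes; token is baked in."""
    def account_search(query: str) -> dict:
        return raw_account_api(query, auth_token=session_token)
    return account_search
```

Which of the two tools is mounted into the agent then becomes a stage decision in the flow, not something the model can get wrong.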

Frank Wittkampf [00:21:02]: In that way you can still shield off specific parts of an API by presenting it in a different way. Fourth, default arguments: the main thing we're trying to do, as I said with schema shrinking, is present less. So the more defaults we can throw into our arguments, the more we ensure that the underlying API gets the right values, and we hide away anything that's non-essential. Point number five, system-provided arguments: we have agents running in a session and a larger context. In the actual context around the agent, there's a lot of information about the ongoing session. We might have the actual tenant this is running in, what region it's in, what user it's for, and what information has been gathered before. That information we have the system provide into the actual call being made to the underlying API, to the MCP-exposed input object, so that the LLM does not have to provide it. And if the LLM does not have to provide it, it will make fewer mistakes.
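System-provided arguments can be sketched as a merge step in the tool runtime: session fields are injected by the system, and anything the model tries to set for those fields is discarded. The session fields shown are invented examples:

```python
# Sketch of system-provided arguments: the runtime injects session context
# (tenant, region, user) into the underlying call, so the LLM only supplies
# what it genuinely has to decide, and cannot override system-owned fields.

SESSION = {"tenant_id": "acme-corp", "region": "eu-west-1", "user_id": "u-42"}

def call_with_session(handler, llm_args: dict, session: dict) -> dict:
    """Merge system-owned fields in; drop any model attempt to set them."""
    reserved = set(session)
    clean = {k: v for k, v in llm_args.items() if k not in reserved}
    return handler({**clean, **session})

def raw_report_api(params: dict) -> dict:
    """Stand-in for the MCP-exposed input object of the underlying API."""
    return {"report_for": params["company_id"], "tenant": params["tenant_id"]}

result = call_with_session(raw_report_api, {"company_id": "MSFT"}, SESSION)
```

Even if the model tried to pass its own `tenant_id`, the merge order guarantees the session value wins.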

Allegra Guinan [00:22:24]: Hey Frank, just wanted to take a pause here, because I know you're rounding out your time, and I want to make sure that we get to one of the questions before we break. Is that okay with you?

Frank Wittkampf [00:22:36]: Yeah, this is my last page, so I'm very ready to take any question.

Allegra Guinan [00:22:40]: Perfect. I'll leave this up here for a bit while we answer this one, so people can also see what you didn't get to yet. So we had one that was: is the tool mask dependent on the tool version, since you don't have control over the tool?

Frank Wittkampf [00:22:56]: Yeah, so we version our tool masks. So yes: with anything that you expose to an agent, you obviously need to make sure that you can version the underlying surface. If you're depending on somebody else's surface that dynamically updates through MCP, I think you're generally in a fairly fragile situation. If you're making something for a large enterprise, you need to ensure that anything you're anchoring behavior on, you have a way to lock in. Versioning is absolutely critical. So the answer to that is a very strong yes. If you're making something that is a 20-step process, and in step seven there is a tool call that might change or that has variants, you're now introducing something that might break this larger process.

Frank Wittkampf [00:24:01]: Having versions when possible is. Is absolutely critical.

Allegra Guinan [00:24:06]: Yeah, perfect. And then one last question to round us out here. Do you think forcing the LLM to always perform tool calls, and having specific tools to report to the user and set idle, like in Manus, improves LLMs' ability to use the correct tools more reliably?

Frank Wittkampf [00:24:23]: So, yes. I think there's a big difference here. When you read about anything agent-specific online, 80% of the time it's about some POC, or some new trick that someone's showing. The big difference with doing things in the enterprise is that it has to be reliable across thousands of executions. It needs to produce a similar, expected result. And when you automate a process, you cannot have things fall apart.

Frank Wittkampf [00:25:01]: So, same as the versioning comment earlier: having really good control over what your tool is and how it presents, and locking that in, is what allows you to actually automate. For exploratory processes, I think that's very different. If you're exploring something, trying to find information, and things are very dynamic, then I would go the other direction. But largely the type of work that I do is automating things for larger companies, and there, variability is in general your enemy. You try to make sure that you stay on top of what happens.

Frank Wittkampf [00:25:41]: So yes, locking that in is absolutely critical for reliable behavior. And thank you for allowing me this long monologue to talk about something that I care about.

Allegra Guinan [00:25:54]: It was fantastic. Thank you so much for joining us today, and thank you for answering those questions. I think there were a couple of others in the chat, which you can find in the actual event link, so you can go ahead and join the rest of Agents in Production and chat with people there. Enjoy the rest of your day. Thank you so much, Frank.

Frank Wittkampf [00:26:11]: Thank you for having me.
