Building Deterministic Context Layers for AI Agents: Harnessing mloda // Tom Kaltofen
Speaker

Tom has held diverse data roles in large organizations, where he led the creation of feature stores, databases and BI tools. This experience exposed him to the complexities of ML and AI data lifecycles, inspiring him to build mloda as an open‑source solution to make data, feature and context engineering shareable. Based in Berlin, Tom is passionate about enabling teams to innovate faster by reducing repetitive work and improving data confidence.
SUMMARY
Modern AI agents depend on vast amounts of context (data, features, and intermediate states) to make correct decisions. In practice, this context is often tied to specific datasets or infrastructure, leading to brittle pipelines and unpredictable behaviour when agents move from prototypes to production. This talk introduces mloda, an open‑source Python framework that makes data, feature, and context engineering shareable. By separating what you compute from how you compute it, mloda provides the missing abstraction layer for AI pipelines, allowing teams to build deterministic context layers that agents can rely on. Attendees will learn how mloda's plugin‑based architecture (minimal dependencies, BYOB design) enables clean separation of transformation logic from execution environments. We'll explore how built‑in input/output validation and test‑driven development will help you build strong contexts. The session will demonstrate how mloda can generate production‑ready data flows. Real‑world examples will show how mloda enables deterministic context layers from laptop prototypes to cloud deployments.
TRANSCRIPT
Tom Kaltofen [00:00:05]: Hello everyone. I'm going to talk about how we can make context layers more deterministic. What that means is that we can be more sure about what's really going to happen. Because AI agents are themselves quite non-deterministic, or it's sometimes very hard to make them deterministic. So if we make more parts of the whole chain deterministic, then maybe we can live with more non-deterministic parts. We will show some approaches for how to do that and what we can consider. And we will do this with mloda, but that's maybe also a side thing. So first about me, where do I come from? I think that the blame usually goes to unclean data in organizations.
Tom Kaltofen [00:00:56]: And this was already before AI, this was even before ML. This is really coming from basic data work. You have data quality issues, so then you have quality teams going after them. But to be honest, in 2025 this should not be the case anymore. Nowadays we already have the modern data stack on the data side, and we have all the data science stacks like scikit-learn and PyTorch, and now we have RAG pipelines, some of which are getting really mature. Why do we still have these issues? And why do a lot of AI systems die here? Well, it's quite simple: these systems are not built for the intersection. One hint can be that a data engineer works on the data side, and on the model side you have the data scientist, maybe now also the AI engineer.
Tom Kaltof [00:01:49]: But who is responsible for the middle of this? There we see data engineer, the data scientist, but also we see software engineers. We sit ML engineers, ML ops engineers, and now with AI, sometimes even product people. I mean, okay, that's another thing to maybe think about and why I'm really looking at this intersectional interface is because I was building a lot of these tools in my past in this intersection, meaning the feature store, which is not really relevant for AI, but maybe we can learn from that. Pipelines, BI rack, chat bots. I was also building other stuff like databases. And I think we should look at this and this I'm calling here with the Amruda approach, but maybe it will open some discussion into this direction. But if you want to start to discuss, maybe you look at parts which work right? So why is the data site working for coding agents better than for example for classic ML, even though that should be for classic ML much easier. And actually if you think about it, that makes sense.
Tom Kaltofen [00:03:02]: The code is accessible, so the whole context in the code base is there. The dynamic data state issue is not really there in the code base, because it's managed: we have Git, we have the whole lifecycle with all kinds of branching strategies. So here we don't have the typical data drift or time travel problems of AI and ML. Then the next thing: is the data unstructured? Yes, in a coding agent, but actually we have structure there, the function names, for example, and the file and folder structure. And it's not really decentralized. In most AI systems, the files and databases and RAGs are flying around in organizations.
Tom Kaltof [00:03:47]: But this is not valid for a coding agent. So we need to overcome this in the context layer and the coding agent. We worked on the RAC systems, maybe the longest from those which are publicly available. In the previous presentation you heard that Cursor already is working on the strong system to find specific parts of the code. But I remember last year, maybe September, October, November, there were a lot of people just putting the whole code base into Germany. Gemini Flask 2.0, Flash 2.0. So we had here development, which we haven't seen in the context layer. So somewhere else governance and audit.
Tom Kaltof [00:04:30]: I mean we all know code basis, it's there. And the last one, the evaluation, well the human is the evaluation actually in the coding agent. And we don't have that as easy in the normal one. So what does it mean? Well, we need to somehow figure this out. And here we have a good news. We know these patterns, these challenges really are not new. We have feature stores, right? We have versions of them. We have semantic layers, data catalogues, the software engineers, for example, systems which we can use for simple agents.
Tom Kaltof [00:05:08]: There are workflow tools like Airflow. We have now reinventions like N8N now which make it able that even business people create just basic data engineering pipelines. We have data contracts, but there I think we're still lacking on the data side. And this is also valid for I think classic machine learning. But then again, let's go back to the gap. I think the tools are close, but they are not at the moment built for agents. They have features we don't need for agents. And other side we are missing agent specific needs.
Tom Kaltof [00:05:48]: In that sense we need to build something purpose built. But how do we get there? First of all we need to identify the properties, meaning we should have a visual list of things we expect one to have or even need. This list here is not complete or one could even argue one should not be on this. Right? I would even argue that even the categories reliability, velocity and operational government might not even be correct. Maybe 4 5th, maybe reliability don't even need to be there if the agents are strong enough in future. But we use these properties for our idea for our approach. But I'd be really open to discuss this right offline or later online, maybe by LinkedIn or on future conferences. This here is basically MLODA approach.
Tom Kaltof [00:06:48]: As I said earlier, we want to introduce a handover layer between essentially data producers, which are the data engineers, SoftEngine which prepares structured and unstructured data and all the tooling which exists, which are feature stores, semantic layers, APIs, CSV files, streams and this way we have a clear boundary. But I think you need to realize something that we need to separate what you compute from how you compute it. And the way simple an AI agent has also development life cycle. And in this life cycle you don't want to develop in production data. So you need to be able to have a feature or a feature group or some way to build things on a very low level where you just have a test data, for example. That means you need to be able to just exchange the test data with production data. This is somewhat familiar to feature store, but it's not one to one. But we really should have it when we want to be fast and reliable deploying, for example in a RAC pipeline, in a RAG pipeline, you really usually do some re ranking from document.
Tom Kaltof [00:08:04]: So we say basically these are the most important ones. Then we do PII cleaning. And you don't want to always test the whole pipeline. So for that you need to separate what you compute from how you compute it. Finally, I think if you do something like that at the moment, due to how the community set up, it should be in Python. Maybe the core will move at some point to rust. But I think that's too early for now to do this idea in a sensible way. I think we need to make plugins interchangeable.
Tom Kaltof [00:08:41]: You just want to switch from the single docs to production database. Should be easy. There's a second thing what you now can do. If you have a plugin system you you can go away from pipelines which are defined like A, B and C. To that you just say hey, this is how the result should look. Line please. Core engine, please figure out how ABC should be combined. So the feature groups compute framework extender, so essentially plugins how they should be combined, that it makes sense.
Tom Kaltof [00:09:15]: And at the end we have validated context. This is essentially closer to that we don't program right. We rather declarative describe the result at the end of the day. This way we can Build scalable systems much easier. And here I want to go in the demo. I hope it works live. Let me see if the screen sharing, if this works as I hope. Okay, fantastic, it's there.
Tom Kaltof [00:09:49]: Okay, so the demo here is very simple, right? It's also something what I think a lot of people have seen in the past already. So there is possible, right? The idea here is however that we just can say something like this. Can you give me the last five, can you give me the last five emits which are for example from medium.com and we want to redact them using mlota at the end of the day. What you will get here now is that first the cloud agent will want to find out how the syntax works and then it will actually try to define how the feature will look like. Let's look into this server. So at the moment we are seeing that it tries to figure out how a feature group should look like, how something should be defined. We will see at the end how it looks like here you can see now how it looks like. As you can see here we have request features.
Tom Kaltof [00:11:01]: So. So basically what it requested was PII redacted Gmail support inbox. So basically it found. Oh, the email, what I said in the task is Gmail because it's only access, only available and there is a tool which has PII reduction and it learned, okay, how can I actually configure it? I need to put it to noreplyedium.com and five items because I asked it. So essentially it was able to configure basically any technology, any data source, as long as the access and further, the plugins are available. And yeah, okay, that is of course, yeah, here is results. So basically it got the. It got the image here.
Tom Kaltof [00:11:56]: Let's switch back again because I think that was a short demo. So we've seen this, we've seen this, it worked. Okay, what you've seen is actually that in the background this was defined before. So you need these Python classes and these Python classes are shareable. You can just give them to everyone. They don't have business secrets or something. And the data user in this case, this was cloud code, but it could be a data scientist or an OpenAI model. Does not matter that they describe their requirements or what they want and mloda resolves it intelligently.
Tom Kaltof [00:12:34]: And that way you don't need to rebuild all the features each time, you don't need to build the pipelines at the end. This way you can invest more time into, for example, security, you can have Very good plugins for security and check all the data which flows into the model, for example. But as the presentation was about deterministic layers, deterministic context layer, we need to go back to this and how this can help. Well, the plugins are classes which have fixed APIs. That means you can enable it to have the whole suite of audit governance language from the first to last plugin. Imagine something what I think at the moment I've not seen somewhere working real well, if at all. That means if you have this, you basically can track down the first data point up to the user. And now if you have this fixed API, you can basically put a send to or a dollar or euro on each data point now, because now you have the start to the end chain available and you don't have this handover layer missing where a lot of information is lost.
Tom Kaltof [00:13:51]: Usually in data engineer world you have the data catalogs and in the user side you had maybe feature stores with meta information. But a lot of systems that doesn't. But now we have the whole chain. The next thing. Plugins are very testable. You have for example at the moment, 50 unit integration test. The number itself doesn't say a lot, right? Can be also bad design test, but you can use best practices there. This is what I want to highlight.
Tom Kaltof [00:14:23]: And validations is a part of design because at the end you just have a function which where you inject the validation right? And there you can use any validation tool as you want. And with this you reduce the, you reduce the issues or unknowns. On the data side, the enterprise integration is very important. I will go a bit faster through it. But what's important is that this framework is basically something on top of existing stuff. It's on top of a feature store, data collect databases, semantic layers, and you can have multiple of them just in use. And this way you just reuse existing data infrastructure. This also helps migrations, by the way, when you just can migrate one system or another step by step.
Tom Kaltof [00:15:19]: This also still allows centralized governance with your LDAPs or ENTRA IDs or whatever you're using. And the teams can really collaborate on the same code base, those who are basically producing the classes, the Python classes, which are the plugins. So in that sense we're not replacing the infrastructure. We're starting to connect actually all the unstructured data sources. Because I think before AI we had way less sources to connect in the organizations than we have nowadays to do. So this is one approach to enable it. In that sense we should always also at this stage of development, we should always look also where not to use it. And it's pretty obvious.
Tom Kaltof [00:16:05]: If you have just a single database access and you don't really have a model lifecycle issue, then just use an orm, just look at what software engineers then and just do it. Or if you have one off scripts, just use the coding agents, they're really good. And the third one is we haven't yet tested a very major setup where we really need like sub 5 minus microseconds for this kind of engineering is in AI also I think not so the common use case. So if you in that space, maybe it's not the best. However, it really shines when you have any repetition, meaning you have a lot of data sources, you need some more portability. Maybe you have racks where you want to try out different algorithms. When you have actually not so easy processes when going to dev2 prod, which is the normal case of data, then this is something you want to use. When you have also a team which wants to collaborate and be faster.
Tom Kaltof [00:17:15]: And when you are having coding agents, plugins are really small units at the end of the day and if you're allowed to use them, they're really helping you. And you will have longer coffee breaks. And in that sense, let's build deterministic context layers together. mluda is open source and I really appreciate any feedback. Give me, hook me up with a call, we can talk. And I think today we can really make something great. Thank you very much. I'm eager to listen to your questions.
Allegra Guinan [00:17:51]: Thank you. That was fantastic. So awesome. Yeah. Everybody that's tuning in, please drop your questions in the chat. We can get to those. We have around 10 minutes to take questions, so feel free to think about some and add them in there. I can kick us off with a question.
Allegra Guinan [00:18:11]: Is there a common mistake that teams usually make when building context layers for agents? And how do you help avoid them, including with mloda?
Tom Kaltof [00:18:23]: Yeah, I think what I've seen is that I've seen it at the start is that we have a data issue again, typically, which is also solved in classic machine learning, is that you have data coming from for example, Jira and you put it into a rack and the racks are updated nightly, for example. But how do you get the last messages from the last 10 minutes? And in Lora, could there easily help that you say, hey, we just have two plugins, one which connects to rack and the other one just gets updates from the last 10 minutes. And this would be an easy way to get this Issue of recency. Right, Fix the recency issue of this data. Yeah, that's one way to do it.
Allegra Guinan [00:19:16]: Thank you. And is there a real world example you could walk through where these deterministic context layers were able to prevent some production failures that were not possible before?
Tom Kaltof [00:19:28]: So I think this is in general, I hope the misconception. So I think the tooling all exists, right? So all the tooling, that's what I had earlier, this chart, right? The graphic with the six tools, different grades. So this all exists so you can avoid any production issues also nowadays. But the thing is, how easy can you avoid it? Therefore, I would argue that this tool did not yet avoid really in itself a production issue because it's only making it easier to avoid. And that's I think what I want to highlight here. So we are just rethinking a bit how we approach these pipelines. We are not really making something that we are saying this was avoided because of that. Got it.
Allegra Guinan [00:20:32]: Thank you. I can keep going here. And of course everybody feel free to add in more questions. How does mluda integrate with other existing agent frameworks?
Tom Kaltof [00:20:45]: Luda itself is just a Python library, so you can plug it in. Right? So it's really just a PIP install and then you have a module we haven't specialized it on or for an agent framework. I think that is also currently too early. Right. I mean the software due to that you need the resolver is already Quite advanced with 1000 unit integration test at the end of the day. And if we want to specialize it for an agent, then I think we need to have a real, real use case at the end of the day. If you have a framework and you only use one RAC pipeline, maybe that's also at the moment too early to use Mruda. But if you for example plan that you need 2, 3, 4, 5 or more and you need to iterate over, then it makes sense because then it is a natural handover layer.
Tom Kaltof [00:21:44]: So that there, I think this also was the pre last slide. We need to be a bit here at the moment clever about the level of design because we also need to learn still what agents really want. I think that is at the moment, yeah, they are getting better. So meaning something what we didn't believe work half a year ago works now. So that's a bit, I think still open.
Allegra Guinan [00:22:12]: Got it. And maybe, you know, depending on the scale and sort of repetition, as you mentioned, this can be context for the next question. But would you say a good use case would be information retrieval and prioritization of answer or does that again depend on sort of the scale of that?
Tom Kaltof [00:22:31]: Sorry, I didn't.
Allegra Guinan [00:22:32]: For information retrieval, is this a good use case for information retrieval?
Tom Kaltof [00:22:40]: I think this is the. If the information retrieval has the necessity that you have some repetition or you need to add, for example security layers on top of. If you have, for example the need that you need to reduce PII and there you're actually developing. You have Presidio for example from Microsoft, the open source one, but there are also two others, then maybe you want to reduce the data, you want to filter the data. If you re rank the data, then this becomes a good use case. However, the question of prioritization itself is not solved by mluda because this would be a specific implementation inside of a plugin. And that is then I would say not anymore in the handover layer. This clearly either already done on the database algorithmic side or it's clearly done by the algorithm, by the data science side or AI engineer side.
Tom Kaltof [00:23:43]: So I think here we need to look out when we define a good use case, what belongs really handover and what is. Yeah, it works mloda, but then again un quite flexible. So the question yes, it would work, but is it a good use case?
Allegra Guinan [00:23:58]: Yeah, I think that section that you showed around the complexity and sort of when it makes sense and when it doesn't is really helpful to understand. It really is dependent on the complexity as you've called out. And then one other question that we had. What is the CPU and memory requirement for this layer?
Tom Kaltof [00:24:18]: I mean it's a Python module, so I think it's never like if you have Rust or something, but at the end of the day it's technology agnostic, meaning you can run it on. So it's normal Python framework, right. It's actually agnostic from CPU memory with the in mind that this will never be as small as Rust or Go application can be. Right. Also you have limitations with its cpu, right? Like multi threading, multi processing. So there are limitations because of Python. But the implementation you can run, for example, you can run Numpy there, no problem. Polaris is also implemented, but also Spark can connect to Spark or DuckDB and so on.
Tom Kaltof [00:25:05]: So there's open, right? There's. I didn't go too much into this detail due to time, but actually the what I said earlier, the split between the what and the how is really the how that you can use essentially any technology right at the end you just need to somehow structure the plugins that it works.
Allegra Guinan [00:25:27]: Thank you. Looks like those are all of our questions for now. Really appreciate you coming on and talking about what you're building. Everybody make sure you connect with Tom and find him on LinkedIn and check out what he's doing at mloda. So awesome to have you on. Thank you so much. And enjoy the the rest of the Agents of Production conference.
Tom Kaltof [00:25:48]: Thank you all. It was nice to have you all. I hope you had fun and I would like to hear from you all.
Allegra Guinan [00:25:56]: Thank you.
Tom Kaltof [00:25:58]: Bye.

