MLOps Community

Accelerating Production Agent Development with Community-Driven Stacks // Panel 1

Posted Nov 27, 2025
# Agents in Production
# Prosus Group
# Agent Development

SPEAKERS

Ben Epstein
Co-Founder & CTO @ GrottoAI

Ben was the machine learning lead for Splice Machine, leading the development of their MLOps platform and Feature Store. He is now the Co-founder and CTO at GrottoAI, focused on supercharging multifamily teams and reducing vacancy loss with AI-powered guidance for leasing and renewals. Ben also works as an adjunct professor at Washington University in St. Louis, teaching concepts in cloud computing and big data analytics.

Adel El Hallak
Senior Director Of Product @ NVIDIA

Adel El Hallak is Senior Director of Software Product Management for NVIDIA AI Enterprise, a suite of APIs, libraries, and runtimes that simplify the development, deployment, and scaling of AI applications. He focuses on microservices and blueprints for building and operationalizing production-grade AI agentic systems through NVIDIA's partner ecosystem.

Adel holds a Bachelor's in Computer Science from McGill University and an MBA from Warwick Business School.

Laurel Orr
AI Staff Software Engineer @ Stacklok

Laurel Orr is an AI Staff Software Engineer at Stacklok working on applying generative AI to data tasks. She graduated with a PhD in Databases and Data Management from the Paul G. Allen School for Computer Science and Engineering at the University of Washington and then was a postdoc at Stanford working with Chris Ré in the Hazy Research lab. Her research interests are broadly at the intersection of artificial intelligence, foundation models, and data management. She focuses on how to train, customize, and deploy foundation models for data tasks such as data cleaning, record matching, and generating code snippets for deterministic data transformations. This includes problems around data curation for training, efficient model training and inference for batch workloads, and prompting paradigms for high-performing, personalized models.

Olga Pavlov
Head of Product @ OLX Group

Life motto: Live, learn, excel!

Product-oriented, goal-driven, versatile professional with a passion for engaging products.


SUMMARY

The rapid evolution of AI agents is fueled by the collaborative power of the open-source community. This panel explores how the democratization of foundational components—including open model weights, permissively licensed datasets, and open RL/planning algorithms—is dramatically reducing the barrier to entry for production-ready agents. When these assets are combined with open communication protocols, development velocity skyrockets. We'll discuss the practical benefits of this approach: developers can fine-tune specialized agent models without massive proprietary costs, researchers can transparently audit the full stack for safety and reliability, and organizations gain the agility to adapt algorithms and protocols to unique business needs. The panel makes the case for the strategic necessity of an entirely open-source approach to ensure the future of production agents is trustworthy, accessible, and fast-moving.


TRANSCRIPT

Ben Epstein [00:00:05]: Alrighty. Thanks everybody for jumping on. Obviously a phenomenal intro from Demetrios. I honestly enjoy his videos even maybe more than his live songs. I think they're very good. He's a professional in the craft. Today, at least in this panel, we are talking about open source, community-driven agentic stacks, which is a very exciting topic, because for people out there building agentic workflows and actually pushing these things into production, there is a huge difference between working with open source stacks and closed source stacks — both in the interfaces and in the capabilities and the way you monitor these tools.

Ben Epstein [00:00:41]: There are kind of more open source tooling options now than, I think, foundational models that you can leverage to build agentic systems, so it can be a little bit overwhelming. We have three experts in this field that we're going to ask a bunch of questions to get their feedback, see how they think about deploying these things, and hopefully dive into some really nitty-gritty technical details. I will let them introduce themselves. As Demetrios said, that's the best way to do it. So we'll do some intros, we have a couple of questions to walk through, and if there's any extra time, we'll ask some questions from the community. So, Adel, why don't we kick it off with you?

Adel El Hallak [00:01:21]: All right. Sweet. You had me worried for a sec. I thought I was going to have to sing my way into my intro.

Adel El Hallak [00:01:27]: My name is Adel — "Adele" if you grew up in French Canada, or "Adel" if you're native English speaking. I cannot sing like Adele. Trust me, you don't want me to do that. So I'm at NVIDIA. I lead product management in our enterprise product group, and what that entails, more often than not, is libraries, APIs and microservices for agents. We're a full stack company. A lot of the ecosystem knows us for, obviously, our GPUs, but a significant portion of our engineers here work on software. And so our job is to take the best of the ecosystem, as I said, package those up and make them easy to consume through what we call blueprints for the enterprise.

Adel El Hallak [00:02:12]: And so, yeah, excited to be here.

Ben Epstein [00:02:17]: Sweet. Olga, you want to introduce yourself?

Olga Pavlov [00:02:20]: Hi. Pleasure to be here. Thanks for inviting me. I'm Olga. I lead product analytics at OLX Group. Actually, I started my data journey heading up an internal team that was building our internal data stack tooling — the things we use for data collection and experimentation — and then evolving towards things like AI solutions and systems that allow easy management of machine learning and data. I'm a data aficionado.

Olga Pavlov [00:02:57]: I try to look at data from the perspective of both the internal consumers — how we can facilitate the journey for them — and the internal integrity of the data. So happy to be here. Over to Laurel, I guess.

Laurel Orr [00:03:15]: Yeah, thanks.

Ben Epstein [00:03:17]: Yeah.

Laurel Orr [00:03:17]: So my name is Laurel. I've been in this space for a while. I was a founding engineer at a startup called Numbers Station, based out of Seattle, where we basically deployed agents on top of structured data. We got acquired by a company called Alation, where we also rolled out an agentic platform for them to do AI. And I'm now transitioning into becoming kind of a leading AI engineer at a startup called Stacklok, which is also doing a lot of MCP hosting for the agentic space. So I have been kind of on the front lines, working from the engineering side, trying to go from little demo apps and fun code on your laptop to getting it rolled out to customers.

Laurel Orr [00:03:59]: And so have kind of been dealing with in that space for a couple of years now. So yeah. So I'm excited to get started. I have a lot of opinions.

Ben Epstein [00:04:08]: We love opinions. I'll give a little intro to round it out. My name is Ben, CTO and co-founder of a company called Grotto. We focus on reducing vacancy loss for multifamily owners and operators. So we are a consumer of MLOps tools rather than a creator of MLOps tools. Before this I was a founding engineer at a company called Galileo, where we did build MLOps tools — kind of observability tech stacks. Laurel, we can definitely talk about it.

Laurel Orr [00:04:32]: I also, I know Galileo pretty well. I know the founders.

Adel El Hallak [00:04:35]: Me too. That's why I was talking.

Laurel Orr [00:04:37]: That's so cool. Yeah. Say hello to our 10 boys.

Ben Epstein [00:04:40]: I will. That's very cool. And then after Galileo I was a staff engineer at a company called EvolutionIQ until they got acquired back in December, and I was leading their internal LLM tooling operations. So I've been in this space for quite a while now, but have transitioned from a builder of tooling to a consumer of tooling, which has been a really fun transition. I think we'll jump off with a question that a lot of people get asked, especially people who are new to the space: when do you decide to turn to open source tooling and open source stacks? You can interpret that in any way you want — in terms of open source models, or in terms of open source agentic stacks. There are a million different places where you can use foundational models really quickly, get up with an API and chain them together, and the levels of abstraction are infinite in terms of the tools that are provided to you.

Ben Epstein [00:05:33]: So how do you guys think about it when you start a new project? Where do you start? When do you transition to open source tooling, if ever — presumably, with this panel, it's at some point. Feel free to jump in, whoever wants to go first, and we can just kick it off with that.

Laurel Orr [00:05:50]: I guess I'll be the one to kick this off. In my mind, one of the huge benefits of having things in the open is that it really challenges and improves the flow of ideas and paradigms. Like, you saw a ton of these agentic platforms come out in the open, and communication protocols just took off — good and bad ones, to be clear, not everything's a winner — but the community latched onto this and realized it was something they could quickly build and iterate on and really challenge what worked and what didn't. So in my mind, use things like open source agentic platforms to start, really because it is pretty easy to get started. Not necessarily on something complex, but something pretty simple.

Laurel Orr [00:06:34]: There are maybe 5 to 20 kind of common agentic platforms out there. Getting started on them is pretty easy, and I feel like it really teaches you within a couple of weeks what works and what doesn't, what's going to work for your use case. And so when I recommend people getting started, I would say, for one, probably start with a closed source model. Unless you are comfortable hosting something yourself, or using something like Together or one of those kinds of open model platforms, hosting a model locally is just kind of a pain if you haven't done it a lot before. And so if it's something you're not super familiar with, I'd probably skip that step and start with the models that you know are going to work, or at least that the community has landed on as the main status quo, and then deal with open source models later — I feel like that's kind of a different can of worms. But play, experiment — think of it as a fun task, right? Your job is to play with a lot of things and understand: hey look, here's which platforms and frameworks are working for me, here's where they're not. Eval is a whole other separate can of worms

Laurel Orr [00:07:29]: that we can talk about as well. But the idea is, at the end of the day, you want to come to whoever has this task — your supervisor, your boss — and say, in essence: here's the breakdown of where agents are failing, what they're doing well, what they're not doing well. And then you can make a more informed decision going forward. I loved it when customers came to us and they were like, hey look, we tried LangChain, we tried CrewAI. Here's what they did, so you better do that too. And here's where they fell short, right? It gives people a bar: if you're going to go to proprietary software, you know now what it has to do better. And if you don't, you can kind of be blindsided by where things work and where things don't.

Laurel Orr [00:08:06]: That's kind of my overall two cents.

Adel El Hallak [00:08:10]: Yeah. I'm going to build a bit on what Laurel said. I don't see them as mutually exclusive, open source and closed source. I believe agentic systems are made up of both.

Adel El Hallak [00:08:22]: It's a system of AIs, of all sorts of different models, but it starts with the use case. And more often than not, some of these frontier models made it super easy for us to try them and demonstrate that we can actually execute and solve for our given task. And so, yes, the early tendency was to utilize Anthropic and OpenAI and to just demonstrate whether we can solve for this with GenAI or agentic AI. But I think a couple of watershed moments happened for us: when Meta released Llama 3, and then subsequently in January, when you had your first open reasoning model from DeepSeek. It really showed that we were closing the gap to these frontier models. Once you've demonstrated your use case, you start thinking about, okay, well, how do I scale this into production? It's one thing to be doing it with test data, or not really your production data. From a compliance perspective, we couldn't send any of our prompts anywhere.

Adel El Hallak [00:09:26]: We quickly had to figure out how to leverage some of these open source models and deploy them internally here, for compliance reasons. So I'd say, building on Laurel: yes, you want to demonstrate your use case first, that you can solve for it, and then look to leverage open source either to optimize or to meet the compliance requirements that you have.

Olga Pavlov [00:09:54]: I would build on that. When looking at this problem, I wear the product management hat. What is the problem that we're trying to solve? Do we build, do we customize, or do we actually buy? A closed system offers a very curated experience that's really nice for PoC-ing your case and seeing what you actually need and where it satisfies your requirements. But then the more the case evolves and the more you build on it, the more economies of scale come into play. You understand where you need to fine-tune, where the curated solution is not exactly fitting your needs. And that's probably where you want to start testing the open source, where you also have the community support, which is super important, especially when you're fresh on this journey.

Ben Epstein [00:10:43]: Yeah, so I have a follow-up to that. I think those are really interesting thoughts. One thing I hear a lot from people concerns a supposed huge benefit of open source — and, like Laurel said, let's separate the model and the framework for a moment, because there are different reasons to move from foundation models to open source models. Those are questions of scale, determinism, privacy, and if you don't have those problems up front, that's a different conversation. But in terms of the stack, something I hear a lot is: well, it's open source, so you can sort of make anything work. Like, if you have a problem, you can kind of fix it.

Ben Epstein [00:11:18]: I have not found that to be the case. I'm not going to call out any particular open source stacks, but I've certainly found in my experience that there are plenty of open source tools that are so abstract that, like you said Laurel, it's really quick to get up and running, but then you sort of immediately hit a wall, which can maybe be even more painful because you haven't actually learned or understood what is happening yet. You're so far in that you feel like there's this lost-cause fallacy where you just kind of have to continue pushing forward with it. So how do you guys think about that? Do you ever think about suggesting that people start really simple? Like, well, an agent can sort of just be state and a while loop, right? While the state isn't done, give it some options and let it do a thing — and kind of build that out from scratch before starting with a tool? Or do you suggest that people start with the tool despite potentially not understanding what's happening as deeply?
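
(For illustration: a minimal sketch of the "state and a while loop" agent Ben describes. This is not from the panel — the model name, the JSON protocol, and the stub tool are all assumptions, just to make the idea concrete.)

```python
import json
from openai import OpenAI  # any chat-completions client would do; OpenAI is an assumption

client = OpenAI()

# One illustrative tool -- in practice this is whatever your use case needs.
def search_docs(query: str) -> str:
    return f"(stub) top results for: {query}"

TOOLS = {"search_docs": search_docs}

def run_agent(task: str, max_steps: int = 10) -> str:
    # "State" is just the running message history plus an implicit done flag.
    messages = [
        {"role": "system", "content": (
            "Solve the user's task. Reply with JSON only: "
            '{"action": "search_docs", "input": "..."} to use a tool, '
            'or {"action": "finish", "answer": "..."} when done.')},
        {"role": "user", "content": task},
    ]
    for _ in range(max_steps):  # "while the state isn't done", with a step budget
        reply = client.chat.completions.create(
            model="gpt-4o-mini",  # model name is an assumption
            messages=messages,
        ).choices[0].message.content
        messages.append({"role": "assistant", "content": reply})
        step = json.loads(reply)  # real code would validate / retry malformed JSON
        if step["action"] == "finish":
            return step["answer"]
        result = TOOLS[step["action"]](step["input"])  # give it options, let it do a thing
        messages.append({"role": "user", "content": f"Tool result: {result}"})
    return "Stopped: step budget exhausted."
```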

Laurel Orr [00:12:11]: Yeah, so about our personal choice of frameworks: we actually used some more abstracted frameworks and then, as you said, ran into issues where you had basically no idea what's going on. Especially in the early days, before some of the o3-style advanced reasoning capabilities — if you think about it, when GPT-4 came out there was no advanced reasoning. People realized that having these orchestrator communication protocols in frameworks helped; it was a cheap form of reasoning, because you could split up tasks and do more complex things with more advanced agentic protocols. But then once all these reasoning models came out, our overall philosophy was: hey, look, if you have a good model — maybe one orchestrator, but in essence a good model with curated tools — it's going to do far better than a very advanced framework. Mainly because it's really hard to debug, build, and maintain something when you have eight agents talking to each other, and if you mess up the message history in one thing, everything gets really confused. And so, going off of that, we ended up at one point rolling out our own agent framework, which was probably not worth it in the long run.

Laurel Orr [00:13:22]: We ended up choosing one, now that we're in production, that is very simple, right? It kind of abstracts model calls for you, it abstracts some communication for you. But in essence, it was the simplest framework we could find that gave us some of the nuts and bolts that are just painful to build, without any of these high-level layers of abstraction, because the minute you run into an issue, it's hard — you can work around it, but you end up building a Frankenstein, which I'm sure many companies have now. It just is complex. So I don't know that I'd necessarily recommend rebuilding everything from scratch. As a learning exercise it's great. But stick with the lowest level of abstraction that you can, because you're right, it can cause a lot of headache later if there's, you know...

Laurel Orr [00:14:04]: Yeah, too much boilerplate that you don't understand. But you want to do this with production, you know, with like proprietary systems as well sometimes.

Adel El Hallak [00:14:11]: Yeah. We recognized this quickly looking internally. Frankly, we're all about enabling the ecosystem, and we quickly recognized that at NVIDIA we had all sorts of different teams building their very first ReAct-based agents, and frankly using all sorts of different tooling. You had the LangChain/LangGraph crew, you obviously had the CrewAI crew. You had folks using AutoGen, which I think has since been rebranded to Semantic Kernel. You have some folks straight up doing it in Python, right? And I don't think you want to stifle that innovation.

Adel El Hallak [00:14:44]: To your point earlier, Laurel and Olga, each framework has its strengths and, frankly, its limitations. I think CrewAI was super easy for a lot of folks to get started with, right? And so the approach we took was: we don't want to stifle folks or limit them in deciding what framework to utilize once they get started. We kind of wanted to enable them. But then how do you augment an enterprise that has a system of different agents utilizing different frameworks? And so our approach to that was what we call the NeMo Agent Toolkit, which is a framework, frankly, that works with everybody else's framework. It doesn't replace a LangGraph, it doesn't replace a CrewAI. But what it does is enable three things which we needed to solve for here at NVIDIA. And frankly, that's one of the premises I operate under: if it helps us as an enterprise, it's going to help the ecosystem externally and other enterprises as well.

Adel El Hallak [00:15:42]: And so we built the NeMo Agent Toolkit, which we've open sourced, obviously. And it does three things. Number one, it enables interoperability through decoration of all these agents built across all these different frameworks. And when you have that kind of interoperability, that enables you to observe a system of agents built, again, with these different frameworks. Being able to collect those traces, being able to have that wide observability, allows us to also profile — going back to operationalizing these agents — to profile the system as a whole. And that allows us to do the one thing that we're unique at doing, which is full stack acceleration: getting the agents to run more efficiently and making smarter decisions about whether you do disaggregated computing versus aggregated computing. And so we wanted to enable kind of the innovation.

Adel El Hallak [00:16:39]: You couldn't stop it. Like I said, there are still more coming — since then, you've got the ADKs of the world, you've got Strands coming out, right? And frankly, there's a data gravity that sometimes forces a choice of what framework you're going to utilize: hey, my data resides in XYZ, so I'm going to use, you know, AgentCore for AWS with Strands, right? And so we didn't want to stifle that innovation, because folks utilize the different frameworks for their different use cases. We just wanted to make sure we provide them with the tooling so they can observe them as a whole and drive that full stack acceleration.
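
(For illustration: the "decorate agents from any framework so they all emit traces" idea Adel describes, sketched with plain OpenTelemetry. This is not the NeMo Agent Toolkit API — just a generic pattern; the exporter choice and attribute names are assumptions.)

```python
# Generic sketch: wrap any agent entry point (a LangGraph node, a CrewAI task, or a
# plain Python function) so it emits an OpenTelemetry span. Swap ConsoleSpanExporter
# for an OTLP exporter to ship spans to whatever backend you already run.
from functools import wraps
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("agent-observability")

def traced_agent(framework: str):
    """Decorate an agent callable so every run shows up as a span, tagged by framework."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            with tracer.start_as_current_span(fn.__name__) as span:
                span.set_attribute("agent.framework", framework)  # attribute name is an assumption
                result = fn(*args, **kwargs)
                span.set_attribute("agent.output_preview", str(result)[:200])
                return result
        return wrapper
    return decorator

@traced_agent(framework="plain-python")
def triage_ticket(text: str) -> str:
    return "route-to-billing"  # stand-in for a real agent call

triage_ticket("My invoice is wrong")
```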

Laurel Orr [00:17:15]: I actually have a follow-up question for you: what do you think about this trend — and I've seen this, I think, three or four times — where you're getting observability platforms built on top of agent frameworks? So LangSmith has one, Anthropic does, there's a startup in Seattle with, like, a bamboo agent language, and they're rolling out one. I think what's happening right now — I don't know if it's contrary to your point, but it's almost locking people even further into the ecosystem, because now their entire logging stack is built on it. And not that all of them don't say, oh, you know, technically we support OTel, we can support any agent framework, but then there's not a single engineer who's like, woohoo, let's go do this.

Ben Epstein [00:17:57]: Right.

Laurel Orr [00:17:58]: Like, you're just like, well, we're stuck with this now. So just your point on data lock-in. And the other thing I thought was really interesting is that you're seeing people try this too — like, even Databricks has their own Genie and similar stuff. Does this hurt or support what you guys are building?

Adel El Hallak [00:18:17]: We don't force any of it, right? All we do is spit out OTel traces. Because, to your point, we've got Datadog internally, we've got Weights & Biases internally, we've got LangSmith internally. I'm not going to force them off the platform. And it doesn't mean that, for their specific agent, they have to visualize and manage those agents through whatever their agent framework is.

Adel El Hallak [00:18:40]: And so, again, we kind of want to let the ecosystem make choices in terms of what it is they want to adopt. And yeah, I mean, you still want to enable the ecosystem to realize some value. LangGraph is doing great; LangSmith is their observability platform. We don't want to stop folks from utilizing that.

Olga Pavlov [00:19:11]: Yeah, very similar approach here. Internally, in our GenAI platform, we offer different solutions as long as the internal users are very clear on the trade-offs — compatibility, cost, scalability, the ability to fine-tune — and they take it from there.

Ben Epstein [00:19:34]: Yeah, the evaluation stack is an interesting thing. I don't think the agent stack is solved yet, but I think the observability and eval stack is even less solved. For the last maybe three or four years, all the LLM tooling I've built has been internal, for a B2B product. And part of the reason I did that was so that I could see what enterprises needed. My experience thus far has been that every enterprise's needs have been so custom that there was no generic solution. And so we essentially just do OTel.

Ben Epstein [00:20:19]: I don't even know if it's a problem to drop names, but we use BAML. That's my agent framework of choice. It's the lowest abstraction I could find. I'm a huge fan; I think they're thinking about it in a really cool way. We just trace those logs — they output those logs in JSON — and we actually just have a lakehouse storing all those logs, and we query them with SQL. We're a four-person company, and that works really, really well for us. I think that changes as the scale of the company changes. Right.
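
(For illustration: a minimal sketch of the "dump JSON traces into a lakehouse, query them with SQL" pattern Ben describes. The file path and column names here are hypothetical — adapt them to whatever your tracer actually emits.)

```python
# Query newline-delimited JSON trace records directly with SQL via DuckDB.
import duckdb

con = duckdb.connect()
slow_calls = con.sql("""
    SELECT trace_id, prompt_name, duration_ms
    FROM read_json_auto('traces/*.jsonl')   -- hypothetical trace files
    WHERE duration_ms > 2000                -- surface slow LLM calls
    ORDER BY duration_ms DESC
    LIMIT 20
""").df()
print(slow_calls)
```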

Ben Epstein [00:20:51]: One of my follow-ups for you, actually: you were saying that you let different teams pick different tools. One of the arguments I've heard from companies I really respect — ZenML, for example — is that that's a great and fun idea for me as a founder of a startup that has six people, but it does not work for an enterprise of 40, 50, 100, 300 people. And I think he's totally right about that. How do you think about that? NVIDIA being a huge company, how do you think about many teams using different frameworks? Do you think NeMo solves that?

Adel El Hallak [00:21:27]: Yeah. So I think you're spot on. And by the way, we talk about AI factories, and we can't talk sincerely about an AI factory unless we stand one up and operate it ourselves. A factory is basically your on-premises data center to generate tokens, which are the intelligence that you get out of agents. I think you have to differentiate between a sandbox that allows for exploration versus how you operationalize those agents. And to your point, we did and continue to do tons of different evaluations, because we're talking now as if we're engineers and scientists. But the realization is we have a lot of other disciplines at the company.

Adel El Hallak [00:22:18]: Disciplines that can take full advantage of building and utilizing agents without writing a single line of code. And so, you know, you can name the Coheres of Canada of the world, the Lovables, and everybody in between. And so, to your point earlier, there are some things that we've standardized on.

Adel El Hallak [00:22:42]: Hey, how are we going to do this? Obviously, for scale and orchestration, we run Kubernetes — so what's the downstream Kubernetes distribution that we've standardized on for everybody? We standardize on Red Hat internally. Where's the artifactory where you're storing your actual artifacts? I can say those publicly because those are things that we've said publicly in the past.

Adel El Hallak [00:23:01]: We use JFrog Artifactory. So there are some common building blocks that you're going to utilize across the stack that everybody harmonizes on, but then there's another set. You still have a sandbox that allows for exploration of different frameworks and tooling. But once you operationalize those agents — I mean, that's a big lift, from "I've tried something" to "now I want to run it," and do it in a way where compliance gives you the thumbs up. We talk about observability, man — collecting traces is difficult while maintaining privacy.

Adel El Hallak [00:23:36]: It's not just, yeah, spit them into a data lake. We can't do that here, because somebody in a certain discipline — I'm not talking about marketing, but say finance or chip design — we can't collect their traces, we can't collect their prompts. How do you then get into differential

Adel El Hallak [00:23:55]: privacy, right? So there are some things you can standardize — those are the common building blocks: your AIOps platform, that kind of tooling — I think, Ben, the Galileo-likes, right, the MLOps tooling — your artifactory, your downstream Kubernetes. You can get to a few different frameworks, but you still want to allow for exploration. And that's kind of the approach we've taken.

Ben Epstein [00:24:21]: Laurel, Olga — I want to take on one thing that Adel was talking about: building agents for people who don't write code. This is a thing that we've seen a lot of. There are tools coming out — drag and drop, all these things. How do you think about that? In the hierarchy of needs of your company, where does that fall? If you can make a decision that makes it harder for non-technical people to put agents into production but easier, more standardized, and safer for engineers to put into production — or vice versa —

Ben Epstein [00:24:58]: How do you make that decision?

Laurel Orr [00:25:01]: I mean, my experience is it really depends on the kind of product that you're rolling out and who your customers are.

Ben Epstein [00:25:08]: Right.

Laurel Orr [00:25:09]: When we were doing something where, hey look, we basically owned the AI — which is a very hard game to be in nowadays — in essence, we were responsible for quality. In that case, none of this low-code stuff really mattered, because we needed complete, precise control over basically everything. And again, that's a different battle that you fight as an engineer. From that perspective, I didn't care if it was easier for the engineer — it just had to work and be the best quality it could be. However, at the second place I was at, we were trying to roll things out more so that customers could build things themselves — we had this whole low-code agent builder. I would say my biggest overall hunch in the space is that

Laurel Orr [00:25:50]: everyone likes to see some sort of low-code agent builder. But our experience was that people kind of wanted to tell you their problem and have it done for them. Most of the time it was hard — people didn't know what to do. They'd be like, okay, I have a box for a system prompt — now what? It's the black box problem. Everyone understands that AI is incredibly powerful and then they seg fault when it comes to actually doing it, just because it's so foreign.

Laurel Orr [00:26:16]: I mean, if you think about it, many people are still using tools from circa the pre-BERT era, right? And now you're giving them this magical text box and asking them to debug it. From an engineer who's done it — you know how you learn to debug AI, it's a whole new skill set, right? To understand how to read prompts, understand what it's doing, have conversations with it, all this stuff. And for many users it's completely above what they even have time for. They're like, I'm paid for something else; I don't want to spend time on this. So the things that I feel have worked a little better involve thinking a lot about user experience and templates.

Laurel Orr [00:26:52]: Again, I don't know how well that's going to go. I really like different kinds of workflow builders. I feel like it's much easier for people to understand and explain — it almost feels like a deterministic flow with little hints of AI in there. I think it's more relatable to people. So my overall read is I don't actually think we're at the point where fully no-code stuff can go out to most customers. You kind of have to be holding their hands through a lot of it. Which is also kind of the one real reason why open source is hard to fully roll out: you oftentimes really need to hold customers' hands the whole time, and they're not going to get that in an open source community.

Laurel Orr [00:27:28]: The community is great and very supportive, but they're not going to sit there for a week trying to help somebody roll out something in their prod. The incentives just aren't aligned there. And again, this was my experience — I was in the BI space, so you get a lot of customers who are brand new to AI trying to turn it on very quickly.

Olga Pavlov [00:27:46]: Yeah, I would say it really depends on the type of problem we're trying to solve. If we're talking about something that will influence the end user, of course we are a lot more careful, and no-code solutions are not exactly a fit. But if it's about enabling users to use or build agents internally to improve their own productivity, I'm actually all for it, because it's a fantastic feeling when you are in control of improving your own workflow, your own task day, your own performance of sorts — enabling a high-performance culture. That comes with a bit of a cost: investment in time and guidance, and sometimes also governance, educating these internal users. But I think it's also a very gratifying initiative.

Ben Epstein [00:28:50]: Yeah, I think that's exactly the right split — you said internal versus external facing. The other way to think about it, or the same thought in a different framing, is: what's your tolerance for failure? What actually is the precision required for the system? We have plenty of tools that we use internally that have, like, 60, 70, 80% precision, and they're great — we use them and they're really fun. But only our engineering and data science teams are putting out actual agents, prompts, whatever you want to call them, that go to customers, and those have 90-plus, 95-plus, 99-plus percent precision. And the way we do it is the most bare-bones thing of all time. We all use BAML. And so it becomes really easy to move to production, because if you have BAML and you have just a spreadsheet of data that you can evaluate against, and everyone's aligned that if it's getting a certain score on this spreadsheet, then it's good enough to move to production, then it's really easy to move to production — it's just a BAML prompt. And so you're already kind of in the system.
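
(For illustration: a sketch of the "evaluate against a spreadsheet, gate on an agreed score" workflow Ben describes. The CSV layout, threshold, and extract_lease_terms() placeholder are hypothetical; in Ben's setup the call under test would be a BAML function.)

```python
import csv

THRESHOLD = 0.95  # the score everyone agreed means "good enough for production"

def extract_lease_terms(message: str) -> str:
    return "renewal"  # placeholder for the real prompt / agent call

def run_eval(path: str = "evals/lease_terms.csv") -> float:
    # Each row has an "input" column and the "expected" answer the team agreed on.
    rows = list(csv.DictReader(open(path)))
    correct = sum(extract_lease_terms(r["input"]) == r["expected"] for r in rows)
    return correct / len(rows)

if __name__ == "__main__":
    score = run_eval()
    print(f"precision: {score:.2%}")
    if score < THRESHOLD:
        raise SystemExit("Below the agreed threshold -- don't promote this prompt.")
```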

Ben Epstein [00:29:52]: But yeah, I think if you're doing internal tooling, it's really fun and really powerful to give non-technical people ways to automate kind of every part of their job that is automatable. Good timing.

Allegra Guinan [00:30:05]: Yeah, thank you so much. I'm going to pause us there. There are a ton of questions in the chat, so I would love for all of you, if you're able, to go into the chat and maybe respond to some of those. We won't be doing any Q&A right now, as we are going to move on to our next speaker. But thank you so much, Adel, Laurel, Olga, and of course Ben as well, for this amazing panel. And I hope you enjoy the rest of Agents in Production as well.

Ben Epstein [00:30:30]: Thanks so much. Thanks everyone for coming on.

Adel El Hallak [00:30:33]: Thanks for hosting. Laurel, Olga, pleasure meeting you guys.

Olga Pavlov [00:30:36]: Pleasure meeting.

Ben Epstein [00:30:38]: Thank you all so much.
