Sign in or Join the community to continue

Architecting Modern AI Systems: Platforms, Agents, and Integration

Posted May 28, 2026 | Views 45

# AI Systems

# Agentic AI

# BuzzHPC

Share

Speakers

Allen Roush

Head of AI Research @ BuzzHPC

Allen has held senior technical and AI leadership roles at companies like Oracle and Intel. He's very active in the AI research space and open source communities. He's passionate about improving the creativity and coherence of AI systems.

+ Read More

Frédéric Bénard

Senior Director, AI Applications Development @ Mila

Frédéric Bénard is Senior Director of AI Applications Development at Mila (Quebec AI Institute), where he leads a team focused on building the engineering foundations for applied AI systems. His work centers on translating cutting-edge research into scalable applications, including AI-driven platforms and agent-based systems used across research and industry collaborations.

Over the past 20+ years, he has led engineering teams in both startups and large technology companies, including BlackBerry and Motorola, and held executive roles at venture-backed companies such as OMsignal and Tungle, both of which were successfully acquired. He has also founded and led an applied data science firm, delivering advanced analytics and machine learning solutions across multiple industries.

Frédéric holds a PhD in Physics from the University of Toronto and an MBA from McGill University.

+ Read More

Shuo Wang

Senior Manager, Responsible AI @ Bell Canada

Shuo leads the Responsible AI Office for Bell Canada where all AI use cases are reviewed and assessed for potential harm and bias. Previously he led a team of data scientists to expand a large scale ML program to improve customer support effectiveness.

+ Read More

Demetrios Brinkmann

Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.

+ Read More

SUMMARY

As AI systems evolve into more autonomous, agent-driven architectures, the way we design platforms, tools, and infrastructure is rapidly changing. In this session with BuzzHPC, we explore the shifting boundary between platforms and tools, what developers expect platform providers to handle versus what they want to control and build themselves.

We unpack what modern agentic stacks look like today, how teams are structuring them in production, and where these architectures are heading as systems become more complex and distributed. A key focus will also be on agent interoperability, how different agents communicate, coordinate, and operate within shared environments.

Finally, we share insights and lessons from a recent AI hackathon delivered in partnership with Bell, Buzz, Mila, and KHP, highlighting how these concepts are being tested and applied by builders in real-world scenarios.

+ Read More

TRANSCRIPT

Demetrios: [00:00:00] All right. We are officially live. What is going on, everyone? I am excited to have this conversation with some incredible people today. We're gonna be talking about architecting modern AI systems. Let me get right to the point. I wanna jump in and chat with them. So first, I will [00:01:00] bring out Frederic to the stage.

Demetrios: Where are you at? Hey, there he is. How's it going, sir?

Frédéric Bénard: Hi. Hi, everybody.

Demetrios: Let's also bring our next guest out here, Shiao. Where are you? Hey, there we go. Hey, everybody.

Frédéric Bénard: How's it going?

Demetrios: And then last but not least, our l- final panelist, we've got Alan. And so let me set the stage here and then we'll start chewing the fat.

Demetrios: We really wanna talk about moditect... moditecting, I almost said. Architecting modern AI systems. And to do that, what that comes along with, with the platforms, the agents, the integrations. So when we explore this specifically, I wanted to jump into some of these things that are the most important pieces.

Demetrios: But before we do, maybe it's cool to hear a little update. You all just did a hackathon, I think. Who was in the hackathon? Frederic, [00:02:00] I know you were, right?

Frédéric Bénard: Yes, along with, uh, Shiao from Bell and Kids Help Phone. So it was a-

Demetrios: Tell me more about that before we jump into the questions and I grill you.

Frédéric Bénard: Sure. So it was a hackathon we organized with, uh, Bell Canada, um, and Kids Help Phone around, uh, mental health and how can we, how can we support, uh, how c- how c- can we build guardrails and around m- uh, conversational agents and making them more secure a- around mental health.

Frédéric Bénard: So we had, uh, over 100 teams participating in a hackathon. They were developing their own models or building- Wow ... some, some LLM prompts to decide when to, um, pass a conversation to a human because it, it became sensitive or there was, uh, issues around, uh, suicide ideation and things like that. Um, so it was, um, a great collaboration.

Frédéric Bénard: Um, and my team specifically was, um, was- [00:03:00] Help build the, um, evaluation pipeline. So we had to evaluate the propo- the submissions from all these teams, and we wanted to do it quickly, so we built this, uh, infrastructure on top of Kubernetes running in the Buzz, uh, HPC environment, uh, where we had access to LLMs that were hosted there, GPUs, CPUs.

Frédéric Bénard: Um, and, uh, we, we, we ended up building a self-service, um, infrastructure so teams could just trigger jobs and see how they fared, uh, and then we displayed those on a leaderboard. Uh, and in- interestingly, it was, um, something we built at the last minute, but it was super used. We had 1,000 submissions throughout the week.

Frédéric Bénard: Wow. Uh, so it ended up being a game of, you know, where, where do I fare, uh, one team against the other? So that was super exciting.

Demetrios: 1,000 [00:04:00] submissions is quite a lot, so congrats on that success. Shao, you got anything else you wanna mention before we jump in?

Shuo Wang: Yeah, just to add on to what Frederic said, I think sort of I wanna highlight, like, a small interesting elements, uh, element to it.

Shuo Wang: Uh, like mid-hackathon we actually introduced a new dataset that almost, like, sent everybody into a scramble because they... You know, previously some teams thought they did great, and then we would have a new hidden dataset shared, and then people are like, "Oh, you know, we're not scoring as much as we did earlier."

Shuo Wang: Um, so I think that just, uh, generated a lot of excitement for, uh, for the hackathon. Um, I wanna just take a brief, brief, uh, sort of step back 'cause obviously mental health is sort of, I would say, very big these days, you know, just given the, the sort of crappy economy that we've got, uh, you know, some geo-political tensions.

Shuo Wang: So these companies forgot together we all care about mental health, um, and that we wanna make sure that, you know, we have the right infrastructure in Canada to support, um, some of these conversations. So have, like, like, like Frederic said, a lot, lots of participants, and then we had to sort make sure that, um, you know, you know, we, we built a good [00:05:00] support structure, um, to, to support, uh, our, our, our developers.

Shuo Wang: Um, and I think we actually spent a quite bit of time, you know, making sure that when we evaluate some of the submissions, they're all, uh ... We're, we're looking at both, like, all the presentations, the scores, the documentation. We look for innovation in all of that. Um, but, but yeah, it was a, it was a great time, um, you know, collaborating across these different, like, companies from different di- diverse background to sort of address a common cause, um, of mental health.

Demetrios: Excellent. Yeah, it is quite an important topic, and I think that- The idea of platforms and what platforms can give you. You mentioned Buzz was helping the teams along in this journey of the hackathon. I also would love to probably kick off this conversation with the idea of where you draw the line between what you expect from a platform versus what your team needs to [00:06:00] own, and then maybe on the next step we can go a little bit further and talk potentially about agent harnesses and agents where, what you need to own on the agent side versus what you would expect someone else to own.

Demetrios: So Alan, I saw you shaking your head on that one. Maybe I'll kick it over to you first.

Allen Roush: Yeah, um, I can definitely talk about agent harnesses and why Buzz is, is a great place for, uh, running your, uh, LLM, uh, model inference services to power agents where, you know, you're able to avoid having, uh, somebody looking over your shoulder and, you know, training on your really great ideas.

Allen Roush: Uh, I'll talk about that in a moment, but I do just want to kind of talk a q- quick moment about Buzz and, and what we were able to do, uh, in the context of that hackathon. Uh, for those who don't know, BuzzHPC is Canada's largest sovereign AI cloud, and it's built to give enterprises, researchers, governments, uh, and builders, uh, secure access to, like, [00:07:00] high-performance GPU infrastructure without, uh, leaving Canadian juris-jurisdiction.

Allen Roush: Uh, and they're building Canada's largest AI gigafactory to support, uh, the next generation of, like, training, fine-tuning, inference, and production AI workloads, uh, at industrial scale. Uh, because their, uh, foundation is Canadian sovereignty, their reach is global, uh, and, and is now, you know, they're operating, uh, you know, obviously in Canada, but also in Sweden as a result of a, of a cloud partnership.

Allen Roush: Uh, and they were even the first to launch a native AI cloud region in South America, which means that they b- you know, to answer parts of your question, you know, uh, Buzz is really great because it gives people both local control, data residency, and when they need it, the trusted Canadian infrastructure, um, and, uh, renewable even powered compute because sometimes, uh, these kind of, uh, things are important.

Allen Roush: Uh, so in simple terms, you know, Buzz is building AI factories that turn [00:08:00] energy, GPUs, data, and MLOps into real world intelligence. Now, in the context of agent harnesses, um, I think that a lot of people don't know that Codex, Claude Code, Cursor, all, uh, OpenCode, OpenClaw, any, any of these names, they sound like Pokémon sometimes.

Allen Roush: You can actually change what underlying model and what infrastructure is powering it. So I can go using, uh, any of BuzzHPC's, uh, model inference services or even just provisioning the GPUs myself and installing the LLM. I can set up any model available on Hugging Face, so basically any local open source model in existence.

Allen Roush: And a lot of them are good now for AI agent, agentic coding, like the Qwen series, the Mimax models, mostly coming out of China these days, but, uh, there's a few good Western ones as well from, uh- Like Mistral, for example, uh, and even, uh, you know, uh, Canada's own Cohere, right? Which, which is still making models.

Allen Roush: And you can [00:09:00] power Claude Code using any of these models. Uh, a-and it's cheaper, you know, you don't-- your cache doesn't, uh, exit five minutes in, uh, leaving you to, to come back and pay 10X the token cost that you would've just because, uh, that's the cost of zero data re- you know, retention policies from somebody like Anthropic and OpenAI.

Allen Roush: Uh, you have a lot more control over the diversity and creativity of your outputs, and you can start applying all of these cool research-based techniques like, uh, low-rank adapters, uh, steering vectors, uh, f- uh, advanced sampling algorithms, which can give you so much more value in so many places, um, uh, than, than, uh, uh, traditional, uh, uh, closed source models.

Allen Roush: And I'll finalize this by giving a, a slight example of this in a case study. Uh, recently, there was a Twitter, uh, thing where somebody took a painting from the famous French painter whose last name is Monet. Uh, I don't know much about art, so I'm, I'm not really an [00:10:00] expert. But somebody took one of the, uh, lesser-known, like, water lily paintings that they did, and they posted it on Twitter and said it was created by AI.

Allen Roush: And everybody reacted and said, "Oh, this is horrible. This is not a good Monet at all." But it, but it was a real one. And so I think this illustrates this idea that, you know, people actually like the outputs of AI until they know that something was written by AI. And if you use ChatGPT or Claude or Gemini, um, uh, you know, your models are gonna have all these tells, like, uh, uh, phrases of the form, "It's not X, it's Y," or overuse of your semicolons and em dashes- Oh, yeah

Allen Roush: that make it really obvious you used AI to generate something, and people tune out, and they don't like that kind of writing. So Buzz has a lot of these really cool AI research innovations, especially around reducing and eliminating slop, uh, and repetitive phraseology from your outputs, which means that, uh, we're, we're definitely, like, a unique, uh, differentiator.

Allen Roush: But I also just wanna quickly, and I'm sorry I'm, I'm meandering, but I just wanna quickly touch on, in the [00:11:00] context of that hackathon, Buzz provided, uh, GPUs and then the underlying access to, uh, many instances of, like, Nemotron series models, uh, and, and I think a few other, like, agentic-capable models that we provided to the hackathon participants and kept up.

Allen Roush: Uh, and, and, uh, also the, the J- uh, Jupyter Notebook infrastructure. Basically, every part of the compute and the user interface and user experience that the participants needed in order to deliver the, the really cool, uh, uh, projects that, that they did.

Shuo Wang: Yeah, I just wanna, uh, uh, thanks Alan for, for that. I, what I learned from me was I didn't know Cloud Code can, can actually plug into like an open source model.

Shuo Wang: That was really interesting. I'm gonna talk to my team about that. Um, we're al- always running r- out of, uh, credits. Um, so I just wa- yeah, just on the, uh, the, what the AI collab platform provider could provide, I think, uh, you know, with our collaboration on the hackathon, [00:12:00] Alan, um, uh, you know, I, the team was really looking for reliability.

Shuo Wang: Uh, reliability, scalability, the, being able to scale out ve- rap- rapidly to support, uh, your number of users. Ha- having great support, um, when s- seem, uh, when things go wrong, like being able to reach out to you guys and, and, and get answers really quickly. And then lastly, um, you know, uh, we haven't delved too much into that I think with you guys, but generally if there are some built in, um, you know, sort of guard railing capabilities of these platforms, um, I think of, you know, Model Armor from Google, for example.

Shuo Wang: Um, it takes a lot of sort of, um, governance related, uh, consideration requirements in the enterprise space, uh, that developers would have to worry about Yeah.

Demetrios: Awesome. Yeah, there is, uh, something that you both touched on that potentially is a nice thread we can pull on, and that is the rise of the term and thought, how much more cycles we are now having on the, uh, tokenomics of [00:13:00] AI, and how we're thinking more about, do I really wanna use the beefiest model for this task right now?

Demetrios: And how we really need to be a bit smarter about what we're using and when, because we get rate limited, or we just use a lot more than we had budgeted for. You've all heard that. And, and one way of combating this is really having a open source model that you own and you control. And so potentially we can talk with you, Frederic, about the idea of tokenomics and how you're seeing it, what you're thinking about it.

Frédéric Bénard: Yeah, my, my experience, uh, around gen AI is, uh, it, it's very easy to get an API key from an LLM provider and get started and, [00:14:00] and that's, that's how most people start. You get your OpenAI key or your Anthropic key. Um, and, and this is great, but, uh, there are, there are time when you, when you wanna optimize, um, what you're, what you're building and, and you don't need the super duper models.

Frédéric Bénard: If you host something that's open source, it can do the job for a specific, uh, use case. Um, and there's also concerns, sometimes you don't wanna send your data to these providers. You wanna keep it local in Canada or whatever country you're in. So, so it's a natural next step as you get more proficient in this technology to want to host your own models.

Frédéric Bénard: And I find that there's- There's a difficulty in people don't necessarily have these skills, so it, it requires you to research and, and understand how to host and serve these models, how to manage GPUs. And that's where having, you know, a local provider li- like Buzz to help you out [00:15:00] and, and, and provide technology to host this locally is super useful.

Frédéric Bénard: So, um, so I find, like you start, you don't really care about tokens initially, but eventually when you get your... When you roll out products and you get your first, you know, your, your new invoices is when you, uh-

Shuo Wang: Fre- Frederic, remember when we worked on the hackathon and I was talking to your team, um, and they were using, I think, some of the biggest frontier models.

Shuo Wang: Uh, they were just, they were trying to do, you know, evaluation and data synthetic data generation, and they were telling that they can't do it at, uh, at the scale they wanted because they'd just run out of, uh, you know, token usage. Um, and when we had to do it for, for the hackathon, you know, we worked with Alan's team to, to, to, to use our, you know, Buzz platform.

Shuo Wang: All of a sudden we have these open source models. We just talk to them any, any time we want and however we want, and there's no, you know, consideration limitations whatsoever other than sort of the underlying sort of com- uh, sort of compute cost on the burden on your, on your team, Alan. But, but yeah, I think having that freedom of, um, [00:16:00] you know, being able to, you know, work with LM in any ways you want, uh, you want is, is very valuable

Demetrios: Now there's this whole idea too on how you are taking into account much different calculations.

Demetrios: Like potentially a larger GPU is going to end up costing you less because you only need it for, say, an hour, versus if you have a smaller GPU and you need it to run for many hours. So there's those types of things that I hear folks talking about. But then the other thing is, uh, and I wanted to ask about this, I wanted to ask you particularly, Alan, 'cause it's more of a, a Buzz-specific question.

Demetrios: Do you have the ability to scale down to zero?

Allen Roush: Yes. Um, so, so i-in terms of like, uh, going down to zero and then warm starting or cold starting, um, back up, uh, and being able to, to use other [00:17:00] concepts from the like function as a service, uh, and, and even serverless kind of world of cloud, yeah, we support all of this. Um, and indeed, we can also scale up, uh, to many GPUs and even many instances of the model inference, um, engines such as vLLM or SGLang or, or in, in the case of people who wanna be, uh, NVIDIA exclusive, which we also allow, uh, and support really well TensorRT LLM.

Allen Roush: Um, as far as some of these other questions about like, uh, what, what is the correct choice of GPU and even correct choices of models, the first thing I wanna point out is that there's model size and then there's model generate-- or sorry, GPU size, and then there's GPU generation. So what I mean by this is that things that are in Blackwell generation, they support hardware efficiency improvements, especially for, uh, highly quantized models down to like FP4, uh, and supporting like NVIDIA exclusive variants of the FP4 format or INT4.

Allen Roush: Um, and, and these, uh, will give you [00:18:00] significant performance improvements and model inference optimization improvements over, in some cases, even larger GPUs back from two generations ago, Ampere, like the A100 But the amount of VRAM you have, uh, for example, the A100 has 80 gigs of VRAM, and I forget the name of the Blackwell inference class ones, but the-- Like, Blackwell's gonna have 24 or 48 gig VRAM GPUs, and there are just models that cannot run on those GPUs that can run on A100s.

Allen Roush: And I wanna point out that, uh, Michael Burry, who keeps on-- the guy from "The Big Short" who keeps on claiming that old GPUs don't make money, that guy's wrong. Uh, A100s GPUs, which are now, uh, almost seven years old, uh, they were-- they hit a low price somewhere around, I think, the middle of last year of, like, $1.50 an hour on average across cloud service providers.

Allen Roush: Go ask for quotes on A100 GPUs right now from anybody, and, uh, you're gonna have a hard time finding them under $2 an hour in [00:19:00] general. I think, I think we might be able to do something, but, uh, uh, I, I don't claim to be an expert on up-to-the-minute pricing. But we're seeing extreme hardware inflation across the industry, so anything we're talking about right now with calculations around tokenomics, we have to understand that it could change a little bit as, uh, the hardware story gets even more constrained.

Allen Roush: Now, as far as models go, I'll point out that in general, I think that, uh, there's this trade-off between the kinds of tasks where you can get done versus the cost. So in general, uh, you do want the very best model for tasks where you're not sure if it can be completed or not and where it really requires the very best version of, of your outputs.

Allen Roush: But, uh, for so-called grunt work, I mean, the, the kinds of things where we don't have to spend huge amounts of our own brain power in general as humans, um, smaller, uh, less sophisticated models are often better. Now, unfortunately, s- agent swarms and agent-agent communication is [00:20:00] still relatively underexplored.

Allen Roush: And so, uh, what I would, what I would really recommend to people is that they, in general, try to run the biggest model possible with open source just because, um, going down to especially the 27 billion to 35 billion parameter range, which is what you can run on your MacBook Pro, it will be very obvious the limitations of local models.

Allen Roush: Whereas when you start getting to that 200 billion, 300 billion, to even one trillion parameter range, which starts requiring a lot of, you know, Blackwells or H100s, like high-end GPUs, that's when, you know, you really start seeing the models be able to handle the much longer contexts and feel very competitive and oft- and still a lot cheaper when-- with a lot more flexibility, especially around, uh, diverse generation than Sonnet, Haii or, or Opus.

Demetrios: Z- you said two things there. I wanna just kind of like prime you that folks in the chat are asking about pricing. So [00:21:00] potentially you could grab some numbers for us. Uh, maybe make a few Slack messages and see what it's at right now. Oh. And- Oh,

Allen Roush: gosh. I'll try.

Demetrios: Yeah. While you're doing that... Oh, I know it's not the easiest thing for you because you're also on your phone right now, but, uh, while you're doing that, you did mention something else that I've been hearing a lot of gripes about, which is that these models are behind an API, like you were mentioning earlier, Frederick, and you don't have the control over them.

Demetrios: And Alan, you were saying, "Oh, yeah, well, some GPUs are primed for quantization or f- are primed for quantized models." The labs know that, and after these beefy models are out for a little bit, it feels like they get nerfed, and then we have no say in our model performance or the APIs because everything is behind the API, and we just have to deal with lower quality output.

Demetrios: And so [00:22:00] I, I fully am on board with this. Like, a lot of times you want that controllability and that steerability, and so I just wanted to mention that other point that I've heard from a lot of folks

Allen Roush: Um, I, I, I just wanna quickly... So, sorry Go ahead, Al We at least, we d- if we're gonna give you a quantized model at Buzz, we're gonna tell you, and we're not gonna do any of the, like- Yeah

Allen Roush: quiet nerfing. I just wanna put that out there, sorry.

Demetrios: No, I figured. I imagined as much, and I also imagine that it's, it's kinda like it's my model, so I get to decide in a way. But yeah, so-

Allen Roush: Exactly. Exactly

Demetrios: Sorry.

Shuo Wang: Yeah, I was gonna add, like operationally, you know, as the a- uh, agentical LLM models keep, kept getting, getting better over the last couple of years, you know, we operationally, we definitely ran into a situation where, you know, there is a- an automatic pr- model upgrade, and the prompts that you, the, the, the prompts that you had, the, the use cases that you had all of a sudden doesn't work.

Shuo Wang: All of a sudden the tests don't, don't pass. [00:23:00] So like I, and we don't, like you said, we don't have full control over what models are being made available to us by some of these, uh, frontier platforms, right? So I think there's, there's this value for, I guess, some of the things where you want a more reliability, more predictability on what you get back from the model.

Shuo Wang: Being able to talk to a sort of free, a frozen version of an open source model, you know, through Hugging Face, I think that's a really interesting option

Demetrios: So, uh, I wanna change gears real fast and talk for a minute about this idea of you all had a hackathon. It's always fun to see what people create in hackathons, but the gap from, like, demo at a hackathon to actually productionizing whatever that is, is great, and there's a large delta there.

Demetrios: There's a lot of products that have died in that delta. What kind of things do you think about when you think about what needs to be done from hackathon to production? [00:24:00]

Frédéric Bénard: Yeah, that's... It's a very interesting question, and I've... My experience with gen AI over the last three years is, uh, it's, it's very easy to get a quick prototype going.

Frédéric Bénard: Um, and it gets you about 80% of, of what you need. Uh, and pe- people get excited, "Wow, that was so quick. I could develop that." But what's hard is to go from the 80% to the 95%, which you need if you wanna deploy this in production, and that part is tough. And that's why a lot of products and a lot of early prototypes get abandoned.

Frédéric Bénard: Uh, people need to put in the work of, you know, evaluating this properly, um, tweaking, iterating, getting user feedback, seeing what's the problem, what... So it's, it, it, it's... And it, it's, it's different than other technology 'cause y- you get to 80% really [00:25:00] quick, but then to get to 95, this is where you get, you get to do the, the hard work.

Frédéric Bénard: Um, and it's, um... Yeah, so you, you gotta do it, do the hard work, and, uh, evaluations is not easy. Uh, we tend to over-rely on LLM as a judge, so, you know, we're not sure if it's working or not. Let's ship this decision to another LLM. But then who evaluates the LLM judge? Uh, so it's just kicking the can, uh, later.

Frédéric Bénard: So in, in practice, I find that you still need to spend often, you know, manual evaluations, getting experts to look at the, the, the results, the getting user feedback, um, and, and being willing to spend the effort to get there.

Shuo Wang: Yeah, it feels like going back to, you know, building software for, uh, building solutions in general for internal team. Like, having good connection to the business, to, to the operations and the business team, really truly [00:26:00] understanding their requirements and their needs. Um, and for us, you know, you know, being sort of afforded in a large organization, you know, you have people that are managing products and people that are managing now, um, the adoption of, uh, new solutions.

Shuo Wang: So, um, you know, if you're sort of on your own, then I think with AI, you can sort of, uh, it can help you take on the sort of perspective of, like, a product manager. You know, think about growth.

Frédéric Bénard: Mm.

Shuo Wang: Think about, you know, um, uh, getting that, that, that product market fit. I think all of those traditional sort of software sort of, um, uh, mindset would also apply to, you know, uh, LLM-generated demos and POCs.

Allen Roush: Yeah, and, and I even wanna add that, um, in AI research land, uh, LLM as judge quickly became seen as, uh, something where during peer review they would say, "Okay, that's great, but show me, like, real human evaluations." Mm. Yeah. And, and what, what we've also found, uh, to be really cool for, for, uh, using agents, uh, is controlling, uh, [00:27:00] API access even-- or programmatic access to platforms that can get you, um, uh, manual data labelers.

Allen Roush: And this is something that I, I'm pretty sure to my knowledge, Buzz does not offer manual data labeling service. Um, but it is important for giving true gold standard outputs. And there are many of them, you know, many more beyond, uh, uh, Mechanical Turk, and a lot of people have heard, "Oh, you know, it's just like asking people from the Third World to, you know, answer things for a dollar an answer."

Allen Roush: You-- That's-- I mean, obviously that exists, but there's also people, you know, doing very high-end data labeling for like $200 an hour, where if you really, really need like truly gold standard answers in, in many domains, you can get that. And even a little bit of that kind of data, when used effectively with, uh, reinforcement learning with human feedback, uh, and, and variants of it on top of, uh, good foundation models, can lead to some really cool, uh, outcomes for like [00:28:00] personalized models that can deliver a lot of value cheaply.

Demetrios: Yeah. Have you played around with any of these RL gyms, Alan? Have you seen those?

Allen Roush: Um, do you mean like offerings from any, like, particular services, for example, like Fireworks Together, uh, these kind of things?

Demetrios: Y- I was thinking more about how you can set up RL, RL environments. There's now starting to become-- That's like one of the popular things that helps on specifically what you were saying, like if you have these environments that you set up so you can train the agents properly.

Demetrios: Mm-hmm. It is much more valuable than just telling it like, "Don't say that. Don't, don't do this."

Allen Roush: Yeah, yeah, and that gets back to the concept of verifiable rewards as well, and even also gets to the, uh, kind of next frontier of AI right now, which is world models, right? Which is in the co- ca- case of world models or video models, where you simulate a particular [00:29:00] world, like popcorn world or being on the moon, and then you have an agent that can control, like, and move, uh, in, in this video generated world.

Allen Roush: And the idea is that you would synthetically generate 100 million, billion, you know, examples of trying to take a particular action in as a form of robustification. Um, and I, I even, you know, have experience with old school RL gyms. Uh, OpenAI actually created some, uh, simple little games about making like a t- taking a little robot and teaching it how to hop, where I had actually written some code clear back in like 2018 to, to kind of use, uh, um- Uh, some, some, uh, r- old school reinforcement learning techniques to, to get my little robot to hop.

Allen Roush: And that was a fun way to kind of learn about neural networks back in that era. Um, I have-- I, I'm lucky that the current AI research I've been doing has not really coincided much with reinforcement learning in the past year, so I have not played with the most recent ones. But I want [00:30:00] to emphasize that anything that has some notion of verifiability, and that, by the way, includes your code editor and code debugger, right?

Allen Roush: Like, uh, these things, when you hook them into, you know, even an agent loop, I mean, you watch as, uh, it uses effectively what people... I, I don't like the term in-context learning, because there's no learning there. But I would call it many shot prompting. Basically, your model trying to do something, failing, reading the, uh, error message, and doing something differently.

Allen Roush: That's the inference version of your model during that conversation. Learn, I, I, I- Yeah ... use quotes because it's not a weight update, right? But then, uh, the r- the RLHF version of that, where you actually update the weights and do real learning, that's, it's the same thing, right? So, so in this case, uh, it has became a reinforcement learning gym.

Allen Roush: And so any concept of verifiable reward, like if you have, for example, in the math community, they have, uh, proof checking assistants like Lean, and they're finding that when you combine that with an LLM, [00:31:00] it's leading to, like, solving Erdős problems, which are, like, these unsolved math problems that, you know, get a lot of attention when a single one gets solved.

Allen Roush: Uh, so, so I, I, I think that that's a huge and, and important component of it. Uh, and Buzz does-- Like, you can absolutely run these, like, RL gyms on Buzz hardware. Uh, we probably need to expand support for that in terms of managed services to give you, like, a one-click turnkey solution for, for that. But we definitely have people like myself and others on the team who can pretty quickly architect, uh, some very effective, uh, reinforcement learning loops on Buzz.

Demetrios: Awesome. So Shu, I'm gonna ask you next, what does your agent stack look like, and where do you feel like it is missing or lacking?

Shuo Wang: Um, I, I feel like we're in a reasonable space right now for, for the sort of industry and for the scales that we operate. So there are a couple f- [00:32:00] a couple of things happening. So at the lowest level and the most sort of people can relate level are sort of non-technical user level. Um, you know, we've res- we've decided to launch enterprise-wide sort of platform where everybody would have sort of that turnkey solution when it comes to being able to a- interact with, like, a baseline level of large lang- language model.

Shuo Wang: Um, they can sort of build their own simple, you know, simple agent, uh, with custom prompts, with, uh, sort of a rag on the fly type of deal, um, to actually improve their own sort of work productivity and to sort of, uh, build the, their business processes, uh, with AI embedded and, and get value from an enterprise, um, setting.

Shuo Wang: Um, another sort of stream would be that, um, as you'd imagine for an enterprise, you do have existing enterprise level software, and all these companies, they don't wanna be left behind. They don't wanna be eliminated by, by competition and, and be left behind, uh, in, in this AI wave. So all of them have started to create, you know, embedded, uh, agentic [00:33:00] AI solutions within their platforms and is, it is very low cost.

Shuo Wang: It's a, it's a pretty straightforward, uh, ROI for us to be able to leverage some of these built-in capabilities and, and integrate, um, the, the AI intelligence into our existing workflows. Um, you know, low friction, low cost, low hurdle. Um, and then, um, on the sort of the, at the highest level, you've got, um, the, the sort of the custom build, um, that are, um, most of the time these days leveraging some of the cloud providers.

Shuo Wang: Um, and we do have those, uh, flexibilities in, in terms of, um, you know, leveraging, um, frontier models, leveraging open source models of our choice, um, and leveraging our own sort of internal AI and ML, uh, capabilities to build production, uh, production, uh, nice, uh, software and solutions, um, that s- support businesses.

Shuo Wang: Um, what we're I guess doing less these days, um, is, um, you know, for our enterprise is to, to go out and, and [00:34:00] fish for new SaaS. 'Cause, uh, we're almost, like, already overwhelmed by existing SaaS with AI functionality, right? So- Right ... we're trying to get our most values there.

Demetrios: So that's a fascinating piece.

Demetrios: It's like the AI or agentification of SaaS that you already are paying for is now upgrading, and you wanna make sure that you're getting the most out of what you already have, and that can be an overwhelming experience.

Shuo Wang: Yeah. And so I think we're, we're all, uh, in that journey, and that's, that's sort of the, the fastest path to value, right?

Shuo Wang: And, and as I mentioned earlier, alluded to earlier, you know, the economy and all that, so, you know, we're trying to get the most bang of our buck given that we're repaying for them. Um- Yeah.

Demetrios: Yeah. And how about you, Frederic? What does your agentic stack look like, and where do you feel like it's lacking?

Frédéric Bénard: Yeah, that's a good, um... I've, I-- It still feels like the Far West to me, um, [00:35:00] in the sense that it, it, the, the technology is still evolving very fast. The, the, um... We started where the, the main block was the LLM, and we were focusing on how do we prompt the LLM to do what it, what we want it to do. And then, you know, we, we added RAG.

Frédéric Bénard: We have agents with tools, memory, context, uh, MCP servers, Harness. So it's the, the-- There's more and more technology around the LLM to, to build a solution right now. And, uh, you know, early, and, and I w- we, we build, we often build early on, early versions of, uh, applications for different partners. Um, we're-- At Mila, we're bridging the gap between applied research, AI research, and, you know, the real industry.

Frédéric Bénard: Um, and- Nice ... and, uh, you know, what we're building today, you know, in six months we probably would build [00:36:00] it differently because there's new concepts going, n- new, new best practices emerging. So, um, I really don't wanna be attached to a particular stack. Um, so it's really building something based on what do we need for this particular product, but knowing that, you know, it's gonna evolve and don't get too attached.

Frédéric Bénard: What you build today probably will be not used in a couple years.

Demetrios: That's an important point on making sure that you are very clear on what the exit ramp is for whatever you have and are using, because the only thing that you're for sure of is that you're probably not gonna be using the same thing in a year from now.

Frédéric Bénard: But it, it needs to bring value today, like, 'cause you need to, you need to ship products today. Uh, but know that you will need to evolve this thing and to become competitive in the future.

Demetrios: Yeah. So [00:37:00] Alan, do you have any things where your agentic stack is lacking?

Allen Roush: Uh, yeah, and, and I guess I should go even through what I would even call my agentic stack.

Allen Roush: Um, I, I think there's, uh, uh, agent building tools and APIs, for example, when I'm trying to build for, from scratch agents, uh, I would use something like Autogen/AG2, which are originally Microsoft projects. There's also Hugging Face has a small one called Small Agents. CrewAI has a framework which I think is pretty nice.

Allen Roush: Um, and then of course, there's kind of the more applied stack, which is your tooling for writing code like Claude Code or Codex or OpenCode or Cursor or Windsurf, et cetera. Right now, uh, though, to answer your question directly about like where, where are-- is it lacking, um, I'm always frustrated at the complete lack of control I have over, uh, all of those coding tools that I just mentioned.

Allen Roush: Um, [00:38:00] and, and this is where, um, open source tools can be extraordinarily powerful because, uh, the specific functionality you have to take control of models is called constrained or structured generation. An example of this is, um, I could give my model a prompt that says, "Rate from zero to 10," uh, something on a scale of how boring it is.

Allen Roush: I don't know, just a random thing. And, um- It, it-- often your answers, it will spell out, do some thinking and spell out the word T-E-N or Z-E-R-O, or give you outputs in formats that are not directly just zero or ten. And you can ask in the prompt, but as you get more complicated, uh, adherence to the prompt, especially around the structure of an output, uh, becomes harder and harder, and you've spent tokens in the prompt that if you ran it millions of times are themselves expensive.

Allen Roush: So we have this feature where you can just ban the model's vocabulary outputs of everything that violates a particular set of constraints. So we can [00:39:00] just encode into the model if you're, you're not f- the zero through ten numbers in the token, in your token vocabulary, you just don't, don't even generate that.

Allen Roush: And, uh, that, that approach is massively powerful. It's used heavily in enterprises, um, but it's also problematic for the closed source model providers from a safety and alignment perspective because you can ask a model like Claude, "Hey, how do you build a bomb?" And it would normally respond with, "I'm sorry, Hal, I can't tell you how to do that."

Allen Roush: But then you can prepend its output using structured generation to, to start the output with, "Here's how to build a bomb," colon, and then have the model continue kind of unfettered from there. And that, uh, prefix editing, uh, approach actually dramatically raises the risk of, uh, even aligned and safe models becoming unaligned and unsafe.

Allen Roush: Things like this, the fact that these risks exist are why closed source model providers don't give you any control. Mm. And it, you know, this is one thing where obviously [00:40:00] we have stuff in our terms of service at Buzz that's gonna b- make it where like, no, you probably cannot come to us to, to, you know, figure out how to make bombs with local models.

Allen Roush: But in terms of not being, uh, like in terms of, of us restricting controller access to features like that, we don't do anything like that because we know that our customers are, are gonna be re- you know, responsible. Uh, and, uh, uh, uh, and, and, and thus that we can give them features that can give them, you know, outputs, uh, including in their agent stack, uh, where they get complete control.

Allen Roush: Because you can use the same techniques to tell a model always call a tool or always do something in a particular order And thus, I don't have to deal with Claude Code in, in this hypothetical world, uh, where, where everything is all spick and span. I don't have to deal with Claude Code forgetting to read one of the 10 markdown files I've told it to always look at before executing things.

Allen Roush: And so this is where I'd say the biggest weakness in my current agentic stack is I [00:41:00] haven't really gone through and, and used a whole lot of structures to enforce an order of operations in my day-to-day yet. Uh, but I claim that people who do this w- are able to effectively create agent workflows such as deep research, which is one that I think some people know about within Google Gemini and OpenAI ChatGPT.

Allen Roush: Those workflows, because they take advantage of a deterministic, quasi-deterministic loop, they go above and beyond what Claude Code or Codex can do in the context of report generation because they spend millions and millions of tokens and go through hundreds and hundreds of, uh, sources because it just uses techniques like this.

Allen Roush: And so you can build deep whatever your domain is, like deep tax preparation, by saying, "Okay, we're gonna, uh, interweave, uh, deterministic logic enforced with constraints and structure with allowing the LLM to do its own thing when necessary."

Demetrios: I wasn't quite clear on why you want to do that at the LLM level versus using something like Pydantic [00:42:00]

Allen Roush: So Pydantic, uh, just to be clear, um, Pydantic under the hood generates a schema that then gets interpreted by a closed source-- or by, sorry, not closed source, by a, uh, constrained generation framework, uh, such as X grammar, outlines, guidance, et cetera.

Allen Roush: But Pydantic will be used for generating the schemas if you don't wanna write it out in JSON. So we love Pydantic, and I use Pydantic all the time. Uh, so yes, you will use it. It's just, uh, it's one component of that stack of, of schema and structure generations. I think they might have launched their own, like, direct AI stuff beyond the, the traditional Pydantic, um- Yeah, they have Pydantic AI.

Allen Roush: Yeah, yeah. I-- Yeah, that, that whole thing, I mean, that's what, what their company's doing. I'm just referring to, like, in terms of using Pydantic for schema generation- Oh, okay. Big to, you know, thumbs up.

Demetrios: Yeah. Awesome.

Shuo Wang: And- I wanna add element, element just in the enterprise, in the, in the enterprise [00:43:00] setting. You know, as, as you know, Bell, you know, not only does, does do your internet stuff and, and does your, you know, phone plans, but, uh, we also serve enterprise customers.

Shuo Wang: Um, so for any serious enterprise, you know, uh, I think a critical element of agentic, uh, would be the governance piece and the observability piece. Any serious conversation, any serious, um, companies would wanna make sure that, you know, they do, um, manage agents properly. They have the same level of obser-observability to agents as to, I'd say, maybe tra-traditionally their employees.

Shuo Wang: Maybe not, not to the same level, but, but, but you get my point. Uh, you hear about horror stories of, uh, you know, agentic AI develop-- uh, deleting databases, right? So I think, uh, the risk is real and, uh, and, and companies do pay very much attention in this area, uh, which means that not only, you know, you're looking at a particular solution that should, should do the job, but also around the processes o-of, of, of just doing that.

Shuo Wang: So, uh, that's a hot topic. Um, you, you could pay attention to that. [00:44:00]

Demetrios: Yeah, and there's a few different pieces of governance too. It's like one is making sure that the agent doesn't delete databases. The other is just making sure that the agent knows when it creates a sandbox or when it has access to a database or what databases it has access to, or when it spins up resources on AWS, it does it in the way that is in tune with the policy of the company.

Demetrios: So there's all of those things, like agents need to know the processes on spinning up resources as much as they need to know don't delete databases, or they just don't have the access to delete the databases.

Shuo Wang: I heard a story of, uh, our... one of our agents sort of paying Jira like 10,000 times. Something to that effect.

Shuo Wang: Um, so- Mm-hmm ... um, so yeah, there are I think, uh, operational horrors that, uh, and, and good learnings that we can learn from. [00:45:00]

Demetrios: Yeah. Well, I wanna, uh, I've got a few great questions coming through here in the chat that I want to make sure we hit on, and Rajiv is talking about verification of agents. And I, I actually have one for you, Rajiv, on this, and I'll...

Demetrios: and then I'll open it up to the panel. But, um, basically he's asking about QA agents that test the coding output that, uh, uh, his agents are creating something, and it's very hard to verify what they're creating. And so he wants to know, like, "Hey, how do I test this before I put it into production?" I've seen a few really good ways around this.

Demetrios: One is you have like a QA agent that will go and take screenshots of whatever the product is or the PR that is happening, and it will verify that everything in the screenshot [00:46:00] is correct. And then it will click on the links, and it will click on the buttons, and then take another screenshot and write a whole report as to why everything is correct.

Demetrios: And so I've seen that set up as a skill that you can have, and so it's this QA agent skill type thing that runs anytime there's a PR. That's one to test out, Rajiv. And then the other one that I was thinking about, uh, let me see if I can remember what it was. In the meantime, I'm not sure if either or any of you have any tricks for deploying that.

Demetrios: Oh, and w- the other one that I was gonna say, I saw a cloud simulation environment tool, so it can simulate environments that you can push your code to and see if anything breaks, um, s- before you push it to main, or you merge it to main and then push it out into production Frédéric, I think you, you might have something, huh?

Demetrios: I saw you nod your head.

Frédéric Bénard: Yeah. I've ... So what [00:47:00] I would say is that building software with agents is not that different than building software in the real world. So, you know, be- good, good software development practices still apply. So what are the requirements? What are you trying to build? Uh, what is the architecture you're putting together?

Frédéric Bénard: Uh, make sure that it's not just big, one big function or spaghetti code and duplicated code. Um, and m- so you can ask your coding agent or another agent to, to review the code, to, you know, validate the architecture, uh, define test cases, run test cases. So everything that we know about software development still applies, and I find that, you know, d- doing these things as you do, as, as you use a, a coding agent really helps improve the quality of what you're building.

Frédéric Bénard: Do it incrementally, define, define what you're gonna build. Be really clear on the requirements before building it. Um, and then ask the LLM to, or the agent to [00:48:00] reflect on what, what they just built. Often just asking the question, like, "Can you check if everything is good?" Uh, will have the L- the agent pick up problems, and then you can, you can, you can fix those before you, before you move.

Demetrios: Yeah. Uh, Rajiv was saying he was using Puppeteer, and I know Playwright is a really popular skill or MCP server you can use. And, um, so-

Frédéric Bénard: The, the j- one, one thing to add is- Yeah You know, LLMs are ba- are, are trained on text and, and code is text, so often, you know, their coding agents are good at building code because of the way they understand text.

Frédéric Bénard: What they're not so good at is, um, UX and design. Like they, they don't necessarily make the link between the, the CSS code that they write and what they will look like. So often you have need, you need to close the loop, you know, have the [00:49:00] LLM or the agent be able to view the, the, the UI of what it's building- Yeah

Frédéric Bénard: and be able to, you know, refine it. Uh, and, and if you don't have that loop, you know, just take... I, I f- I find myself taking screen captures and saying, "Look-"

Demetrios: It's wrong.

Frédéric Bénard: Yeah "... bad UI. Please fix this."

Demetrios: Yeah, I think that's why sandboxes have gotten so popular, really, like giving an agent a sandbox and letting it go around and play, right?

Demetrios: Uh, MCV also, same, same kind of vibe. Uh, there's another question coming through here that I wanna get to before we gotta jump. It is, how much of the underlying cloud orchestration, so like scaling, failover, or observability, should a platform abstract away for agent developers, and where does that abstraction become a liability?

Demetrios: Sounds like a bit of a DevOps-y SRE [00:50:00] question, but also I do like this notion, and correct me if I'm wrong here, on the agents might be able to deal with those types of orchestration issues

Shuo Wang: I'll, I'll really talk about the asset piece. I can't comment on the liability. I think it's an asset. I think the underlying platform should be in a good position, um, to, to sort of offer that, uh, observ- observability baseline.

Shuo Wang: Like, it, it may not have the best features, it may not have all the features that you want, but, but you wanna take away as much sort of accountability from the developer as possible when something can be sort of shifted down y- onto the platform, right? So, um, but I, I imagine that could become a liability at some point, but I, I, I, I can't comment on that.

Demetrios: Yeah. We're not liable for anything we say here either. If you do put it into practice, get [00:51:00] back to us on how it goes. There's another one coming through. Lav is asking, uh, "When an agent goes off the rails in production, what telemetry has actually been most useful for figuring out what went wrong? Have you all seen anything that is useful?"

Demetrios: I will say from my experience, it's a lot of the tool use telemetry, like what tools were called, what MCP servers were called. All of that is quite helpful to know because then you can see the, the problems and where it's getting potentially stuck into loops or just inventing things

Allen Roush: Yeah, I'll just quickly jump in. Uh, there were two tools that I liked for this, uh, Agent Ops and Arize Phoenix, which both, uh, implement, I think it's called OpenTelemetry, which is I think the way that tries to hook and, and detect all those tool calls, detect everything that your agents did. Now, [00:52:00] in practice, uh, especially w- you know, when you're getting confused about what happened, it's probably over hundreds of millions of tokens, and that means that it's, it can be expensive just to even try to untangle what's going on if you're using LLM-based assistance.

Allen Roush: So a lot of what you're gonna want in these, um, uh, observability tools is the ability to filter out information and quickly flag with like heuristic-based and hopefully non-AI-based strategies what may have caused a particular problem as well. So, uh, but, but ultimately, it, it does just come down to writing everything down and using tools like this, make sure everything gets written down so you can eventually, uh, triage it.

Shuo Wang: A-and before failure detection, you probably want failure prevention. So I, I think this is, is also related to an earlier, uh, discussion point around, um, just, um, uh, validating sort of the va- uh, the, the sort of the accuracy or the output of the agents. I think you just wanna start by, you know, having the right controls in the first place.

Shuo Wang: Like, maybe have additional [00:53:00] wrappers on the, one of the available MCPs to really limit the kind of actions, the non-irreversible actions an agent can do, right? So if it, you know, it fails, like sure, you know, you, you can get it to try again, but it, you know that it's not gonna make any sort of irreversible damage to whatever database that you have or whatever decision that you're trying to make.

Demetrios: Yeah. Great points. There is another question in here from Alexander asking about easiest way to create a sandbox for coding agents. Is there sandbox capabilities with Buzz, Alan?

Allen Roush: Uh, my understanding is yes. Um, I do wanna point out though that sandboxes in general, like, uh, it's probably impossible for anybody to guarantee that an agent can't escape for an increasingly sophisticated agent.

Allen Roush: You'll hear this all across the industry that sandboxes, you know, for Codex 5.5, Opus 4.7, et cetera, are, [00:54:00] are often brittle. And if you even look at like NVIDIA's NeMo claw, which you can also deploy on Buzz, a lot of what NeMo claw was, was just trying to be OpenClaw with a better sandbox. And if, you know, I can go find people online that are like, "Hey, why is it that my agent yet again broke out of it?"

Allen Roush: So sandboxes are really important in a lot of ways, but, uh, they're, they are one level of protection. But unfortunately, with systems that are in certain cases super intelligent on, uh, code generation, uh, it's more like, uh, delaying the inevitable w- if, if you've got an agent that really wants to break out of a particular environment.

Allen Roush: Obviously, there are, you know, real hardware limitations that can kind of prevent, uh, serious problems. But when, when you're talking about just like software sandboxing, um, and then giving models control over like the command line, for example, it's, it's can get pretty not as sandboxed as you'd think.

Demetrios: Well, I don't like to think about that.

Demetrios: Uh

Allen Roush: Yeah.

Demetrios: That [00:55:00] is, yeah, not what I was expecting, but I appreciate you mentioning it so that we are not all happy-go-lucky and just throwing a bunch of sandbox at the problem. The, uh Official time has ended. I am sure there are people that have to drop, but I also wanna see, do you guys have, like, two more minutes to hang out, or do you also have hard stops right now?

Allen Roush: I, I unfortunately have a very hard stop right now. I need to go.

Demetrios: Very hard, so you're already late. All right. Well- Yes. ... folks, this has been awesome. I am very appreciative for you coming on here and doing that. We will send everyone that joined this session an email to follow up with some of these key ideas.

Demetrios: And also, I know there was questions about pricing in the chat. We're gonna let

Allen Roush: Alan- Yeah, yeah, real- Go in ... yeah, real quick. I do have an answer on that. Yeah, I do have an answer. Oh, no way. Uh, uh, yeah. So, um, uh, we don't actually have A100s right now, uh, [00:56:00] at Buzz. Mm. But we do have H100s, which is the better version, which is, uh, those are $250 an hour right now.

Allen Roush: Uh, and I think you get volume discounts, uh, for, for, you know, renting many of them or for long-term commitments. H200s are at $350 an hour. Uh, A40s, uh, which are, like, those cheaper inference GPUs, are at, uh, uh, 50 cents an hour. Really great for small- Right ... model inference. Uh, and then for, uh, Blackwell, uh, you'll, you'll wanna contact them later, and then we can talk pricing on Blackwell.

Allen Roush: Let's see. Sorry, I just ... But no, I do have some numbers.

Demetrios: Yeah. I appreciate that, 'cause somebody was like, "Oh, I don't wanna fill out the whole form just to find out that it's way out of my league for my little private project." So, um, yeah, the talk to sales one if you want the Blackwell. Other than that, I will see you all later.

Demetrios: Thank you, and talk to you. Bye.

Frédéric Bénard: Bye. Thanks

+ Read More

Watch More

Generative AI Agents in Production: Best Practices and Lessons Learned // Patrick Marlow // Agents in Production

Posted Nov 15, 2024 | Views 6.5K

# Generative AI Agents

# Vertex Applied AI

# Agents in Production

The Future of ML and Data Platforms, The Future of ML and Data Platforms

Posted Sep 29, 2021 | Views 899

# Tecton

# Tecton.ai

# Machine Learning Engineering

# Operational Data Stack

# MLOps Practices

Create Multi-Agent AI Systems in JavaScript // Dariel Vila // Agents in Production

Posted Nov 26, 2024 | Views 1.2K

# Javascript

# Multi-agent

# AI Systems