Small Language Model - From Experiments to Production // Joshua Alphonse // AI in Production 2025

Posted Mar 14, 2025 | Views 294
# LLM
# AI Apps
# PremAI

SPEAKER

Joshua Alphonse
Head of Product @ PremAI

Joshua is a seasoned Developer Advocate who leads Developer Relations at PremAI. He has spent his career empowering developers to create innovative solutions using cutting-edge open-source technologies. Previously, Joshua worked at Wix, leading Product and R&D engagements for their Developer Relations team, and at ByteDance, where he created content, tutorials, and curated events for the developer community.


SUMMARY

As LLMs become widespread, enterprises building AI apps often expose sensitive data to centralized service providers and get locked into their models, while smaller, specialized models can cut costs by up to 70%. In this talk, I'll quickly walk you through what goes into building production-ready SLMs.


TRANSCRIPT

Adam: [00:00:00] He's, uh, he's inspiring. He's painting the perfect picture. Joshua, very nice to have you here.

Joshua Alphonse: Pleasure to be here. Thank you so much, Adam. Appreciate it.

Adam: Yeah, thanks for coming. And I hope you've enjoyed the conversation so far.

Joshua Alphonse: Absolutely.

Adam: So Joshua, you're going to be talking about small language models, right?

Joshua Alphonse: Absolutely. I'm going to be talking about small language models, testing, and inference, and the findings we've come across on our journey of creating a platform that really specializes in small language models.

Adam: It's very interesting, because I think they've already been mentioned multiple times just today, right? When it comes to guardrails, when it comes to voice agents, they come up in so many different places. I'm going to leave all my questions for the end, but [00:01:00] Joshua, I'll be back in 20 minutes. If folks have more questions, drop them in the chat, and I'll see you very soon.

Joshua Alphonse: Absolutely. Thank you, Adam. Appreciate it.

Adam: Do you need to share your screen, by the way?

Joshua Alphonse: Yeah, my screen is there waiting to go.

Adam: Awesome. All right.

Joshua Alphonse: Thank you guys so much, and thank you, Adam. Thank you, Jacques, as well. I hope to paint as good a picture as Jacques did on what we're talking about: small language models from experimentation to production, or from testing to inference. So, I'm Joshua Alphonse. I'm Director of Developer Relations at Prem AI. Previously I was at ByteDance, working on open source, and at Wix as well, but I've been diving deep into AI and small language models with the folks at Prem now.

For those who are not familiar with what we do at Prem AI, I'll give you a quick overview, because I think it frames the rest of the conversation and why I'm even talking about this. [00:02:00] Prem facilitates fine-tuning and optimization of models with your data in a safe and secure environment, meaning we work with companies that have proprietary data and want to make models they can own and run on their own infrastructure. At Prem AI we have a testing environment called the Lab, which lets you simultaneously test four different models at a time before you put them in production. We also have a place to collect traces and monitor the performance of those models, both before you put them out and while they're in production. And we have an autonomous fine-tuning agent that lets you pick a base model, choose the training depth, drag and drop in a dataset, and let the agent do the rest of the work for you. So you can be a software engineer, not a machine learning engineer or data scientist, and still fine-tune a model and put it into your apps. [00:03:00] Then we also have the Launchpad, which helps you easily launch your model to our API and SDK, so you can put it into any application, in any language you name.
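To give a sense of what consuming a deployed, fine-tuned SLM can look like from application code, here is a minimal sketch. It assumes the deployment exposes an OpenAI-compatible chat endpoint; the base URL, model slug, and environment variable are placeholders for illustration, not Prem's documented API.

```python
import os
from openai import OpenAI  # pip install openai

# Hypothetical OpenAI-compatible endpoint for a deployed fine-tuned SLM.
# The base_url and model name below are placeholders, not a documented API.
client = OpenAI(
    base_url="https://api.example-slm-host.com/v1",
    api_key=os.environ["SLM_API_KEY"],
)

response = client.chat.completions.create(
    model="my-finetuned-slm",  # the model slug assigned at deployment
    messages=[{"role": "user", "content": "Summarize our Q3 compliance report."}],
    temperature=0.2,  # keep outputs conservative for domain-specific tasks
)
print(response.choices[0].message.content)
```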
All of this is taking a larger model, distilling it into something much smaller, keeping it accurate and secure, and giving you something you can run on your own infrastructure by downloading the weights, or the whole dataset itself.

So, LLMs are not as accessible as you may think. This slide is a little outdated, I'd say, because LLMs are getting a lot more accessible, but there is still this struggle over resources: LLMs require a lot of resources, complex integration, and complex deployment. And can we trust large companies with our data? We often come across companies interested in working with us at Prem, and the thing we always hear is: can we trust [00:04:00] OpenAI? Can we trust Google? Can we trust Amazon? Can we trust all these hyperscalers with our data, especially proprietary data in financial compliance, healthcare, entertainment, you name it? At Prem, we've come across folks from all these different industries, and companies that sometimes trust us and our method over what some of these large companies offer.

So where do SLMs actually shine? SLMs are great offline, on-device, and on-premise, where local inference [00:05:00] may actually be required. They handle latency very well, so in applications where near-instant responses are crucial, like real-time conversational AI, or where time is sensitive, SLMs really shine. Their compact size is efficient across different architectures, minimizes latency, and provides a seamless experience. By processing data locally, SLMs can eliminate the need for network communication and reduce response times.

SLMs are also great under cost limitations and constraints. Their smaller size translates to lower computational requirements, as we talked about before, reducing the need for expensive hardware and cloud resources. I've seen a lot of people now with Mac minis, turning them into Mac mini towers, and now we have the new Mac Studio. So you don't have to buy a bunch of NVIDIA GPUs to run, fine-tune, or train models anymore, especially if you're working with SLMs. And during the development and testing phase, SLMs are [00:06:00] a developer's best friend, because they can easily be integrated into resource-constrained development environments like local machines, as I mentioned, or low-powered servers. This lets developers quickly iterate, test their applications, and continue to fine-tune models over time without costly infrastructure.

So while LLMs are impressive out of the box, performance-wise, they may not always fit a specific task or need. That's where we specialize at Prem, because fine-tuning SLMs on domain-specific data can yield superior results compared to generic, pre-trained LLMs. We can narrow the tasks, leverage relevant training data, and let fine-tuned SLMs capture the nuances you're looking for.
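As an illustration of the domain-specific fine-tuning being described, here is a minimal parameter-efficient (LoRA) sketch using Hugging Face transformers and peft. The base model and adapter settings are common illustrative choices, not Prem's actual pipeline.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model  # pip install peft transformers

# TinyLlama is used here purely as an example of a small base model.
base = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA trains a small set of adapter weights instead of all parameters,
# which is what makes fine-tuning feasible on modest hardware.
lora = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections, a common choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of total weights

# From here, train on your domain dataset with transformers' Trainer or
# trl's SFTTrainer; the training loop is omitted for brevity.
```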
Fine-tuned SLMs pick up the terminology and patterns unique to the target domain, leading to improved accuracy and relevance when you interact with these models.

So, in summary, small language models offer a powerful, versatile solution for a wide range of applications. They enable offline inference, handle latency [00:07:00] constraints well, and provide a cost-effective option. As the field of NLP continues to evolve, SLMs will play an increasingly crucial role in bringing AI to a broader spectrum of uses. You're starting to see a lot of companies create AI agent-building platforms now. We have competitors like Arcee as well, who have taken the stance of a drag-and-drop environment for putting agents together, giving them different tasks and different models, and they're utilizing SLMs to the fullest. And we can see the same thing happening with Langflow from DataStax.

So overall, there has been this evolution of language models over time. [00:08:00] Now, things are subject to change, because just today we've seen the announcement of Gemma 3, so that's changing; we'll go over that in the next slide. But the evolution of language models sparks the question: can smaller models, with significantly less parametric memory, emulate the emergent abilities of large language models? Language models have been shown to exhibit a range of emergent abilities, such as summarization, arithmetic, translation, and common-sense reasoning, as well as what all the reasoning models that have come out recently can do. As they're scaled up in size and trained on more diverse data, these abilities suggest that language models are not only learning the surface patterns of language, but also acquiring some degree of semantic and logical understanding of the world and of text.

Although researchers have had different breakthroughs, and mechanisms like CNNs and attention have contributed significant [00:09:00] advancements, the recent narrative of AI can be encapsulated in one simple concept, as you can see from this graph and all the other graphs you'll see: scale. Increasing data, model size, and computational resources has led to this enhancement in performance. But the notion that bigger models are better is frequently accepted without people actually questioning it. It's essential to consider the consequences of developing applications using the largest available models; sometimes we don't need to. Small language models rely more on their pre-existing knowledge and the semantic priors gained during pre-training, even when given examples that contradict that knowledge. In contrast, large language models can overcome their pre-existing knowledge when provided with examples that go against it, despite potentially having stronger pre-existing knowledge than the smaller models. So we can continue to fine-tune these smaller models [00:10:00] to work within the constraints we need. In other words, small language models stick to what they learned during pre-training even when shown conflicting examples, while large language models can adapt to new information from examples, even if it contradicts what they learned during pre-training.
So, today we got an announcement about Gemma 3. This is another thing that shows how model sizes are becoming a lot more compact, especially on the open-source side. Gemma 3 is just 27 billion parameters, compared to some of the others: Qwen 2.5, Llama 70B, DeepSeek R1. I definitely want to try Gemma 3 out today as well; it has [00:11:00] a much smaller parameter count and is performing really well on the Elo score.

So let's put it in plain words. SLMs are fast: compact architectures with fewer parameters enable quick inference times, and local deployment eliminates network latency, resulting in near-instant responses, ideal for applications that require real-time processing. SLMs are cheap: the reduced model size lowers the computational resources required, and hardware costs are much lower, so the lower demand for expensive cloud resources makes SLMs more cost-effective and attractive to certain industries and companies. SLMs are customizable: they offer consistent performance across various deployment methods and environments, and deterministic outputs help ensure reliability and reproducibility of results, so they're easier to test, easier to debug, and easier to maintain than larger, more complex [00:12:00] models. You can keep fine-tuning SLMs on domain-specific tasks to get superior performance on those targeted tasks, which ties into the fine-tuning process we focus on a lot at Prem: you can adapt to different industries, jargon, and writing styles however you want, and add a personal user experience. And SLMs are great for privacy, because their parameter size lets them fit on devices like edge devices, phones, cars, and tablets. As long as the model fits locally on these devices, it eliminates the need to send data to external servers or cloud services, which helps ensure compliance with data privacy regulations and reduces security risk. [00:13:00]

So SLMs have advantages that make them super attractive, and I also want to point out that size doesn't always matter. SLMs are streamlined variants of LLMs: boasting fewer parameters and simpler architectures, they're designed for local data processing, making them ideal for on-device implementations, like I mentioned. SLMs strike a balance of performance and practicality, bringing language AI to a wider range of devices and applications. SLMs can also be effectively trained on smaller, curated datasets, leading to greater explainability: smaller datasets allow for a different kind of understanding of, and control over, the model's behavior and outputs. This is also really good for retrieval-augmented generation when you combine it with SLMs, because retrieval can significantly boost an SLM's performance and experience: RAG pulls in relevant information [00:14:00] from external sources and documents.
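Here is a minimal sketch of the RAG pattern being described: embed a handful of documents, retrieve the closest match by cosine similarity, and ground the SLM's prompt with it. The embedding model and in-memory store are illustrative stand-ins for a real vector database.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

# Tiny in-memory "document store"; in practice this would be a vector database.
docs = [
    "Refunds are processed within 5 business days of approval.",
    "Enterprise plans include on-premise deployment and weight downloads.",
    "Support is available 24/7 via the community Discord.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # a small, common embedding model
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query by cosine similarity."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q  # cosine similarity, since vectors are normalized
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

query = "How long do refunds take?"
context = "\n".join(retrieve(query))
# The grounded prompt below would be passed to a small language model;
# retrieval supplies knowledge the SLM's limited parameters don't store.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```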
And with all the recent developments around MCP as well, this all comes together: by combining SLMs, RAG, external data sources, and external tools, developers can create powerful AI language solutions that are not only efficient but highly effective and user-friendly.

Large language models can lack precise information, especially because they're trained on so much generic data. Despite their impressive capabilities, they suffer from this lack of precise control. The unsupervised training approach used for LLMs results in models that can generate fluent but sometimes inconsistent or biased outputs, and without explicit guiding fine-tuning, and as Adam mentioned with the earlier talks about guardrails, LLMs may produce responses that are inappropriate, offensive, or just factually incorrect. This creates the black-box [00:15:00] nature of LLMs, making it difficult to understand and explain their decision-making processes and raising concerns about transparency and accountability. In contrast, SLMs trained on carefully curated datasets offer greater control, explainability, and alignment with specific tasks and values.

But SLMs are not perfect either, and they have a lot of work to do, though I believe that as you combine SLMs and agents together, you'll continue to see improvement. SLMs straight off pre-training can be limited in context and understanding. So while SLMs offer a lot of advantages, they face certain challenges and limitations. Contextual understanding: with fewer parameters, SLMs may struggle to generate responses that require deep contextual awareness, and with a shorter context window, they may have a limited ability to maintain context over long [00:16:00] conversations; this is where the fine-tuning process comes in. Reduced accuracy and performance: smaller language models often have lower-quality outputs, again in their pre-trained versions, particularly on nuanced, complex tasks and generalizations. Limits on creativity and variability: less creative responses due to the pre-training process and data, and to the fewer parameters, so there can be repetitive outputs and less of the creativity their larger counterparts have no problem producing. Data dependency: the performance of SLMs relies heavily on high-quality training data. As we say at Prem, better data, better model; inaccuracies or biases in the training data can significantly impact [00:17:00] performance. Limited knowledge bases: SLMs might lack the extensive knowledge bases found in LLMs. And scalability issues: while SLMs can be fine-tuned for specific tasks, their general applicability across a wide range of tasks may be limited. There are some other limitations too, but we can keep working on token generation speed optimization, response time requirements, streaming and batch inference, CPU versus GPU trade-offs, and edge deployment considerations. There's a lot to think about when it comes to the scalability piece of small language models.
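To ground the response-time point, here is a minimal local-inference timing sketch using Hugging Face transformers. The model choice is illustrative (any small open chat model works), and the measurement is indicative, not a benchmark.

```python
import time
from transformers import pipeline  # pip install transformers torch

# Qwen2.5-0.5B-Instruct is used purely as an example of a small open model
# that runs comfortably on laptop-class hardware.
generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

start = time.perf_counter()
out = generator(
    "Summarize the benefits of on-device inference in one sentence.",
    max_new_tokens=48,
    do_sample=False,  # deterministic output, easier to test and reproduce
)
elapsed = time.perf_counter() - start

# Everything ran locally: no data left the machine, no network round-trip.
print(out[0]["generated_text"])
print(f"Local generation took {elapsed:.2f}s")
```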
But a lot of other solutions are emerging now that help with all those different pieces and make small language models better and more performant. So here are some emerging use cases for SLMs, and this list will continue to grow, but these are some things I like to see: agents, of course, using function calling and MCP or however [00:18:00] you want to do it; SQL generation; data labeling; few-shot prompting; and of course RAG. By combining all of these, you can make a very powerful small language model.

Now, at Prem we also have a few side projects, like BitGPT, which is actually powered by small language models. It's a decentralized AI agent Web3 wallet that can instantly connect to any decentralized network. It uses fine-tuned AI agents built on the Prem platform to make Web3 more accessible to users, offering an agentic chat interface where you can chat with your wallet and connect to different chains. It has been a proven use case showing that SLMs work for secure payments and transfers of cryptocurrencies, [00:19:00] with function calling and local deployment being big parts of that, of course.

When it comes to function calling, small language models are fast and accurate. Traditionally, if we look to the past, large language models like GPT-3.5 were used for this purpose, and of course all the others after it. This is where small language models are making a significant impact now with tool calling. Prime examples in the past came from closed-source models built by really big research teams, and since then we've seen good function-calling results from Functionary, Mistral, Llama 3, and so forth. You can run a lot of this on a single GPU, and you can do the same now with Gemma 3.

And of course, with retrieval-augmented generation: LLMs like GPT-4 have huge knowledge embedded in them, [00:20:00] which isn't required for RAG, so we can potentially build SLMs that handle RAG use cases better. We've seen a lot of cool things happen with Command R and Command R+, and a lot of the other Llama models that are doing really well with retrieval-augmented generation for certain industries.

Recently, we released a small language model called Prem-1B, open source, outside of what you can do with our platform; you can find it on Hugging Face, and we continue to research. Our small language model was up against TinyLlama and a few others. From our understanding, having started from Llama 2 a while ago, it didn't necessarily perform as well as the others. But we decided to go to the next step and create Prem-1B-SQL, which has 26.5K+ downloads on Hugging [00:21:00] Face. We kept it really specific to SQL, generating SQL from text, and this solution has become really popular. It's growing in popularity, and we're working on a newer version that will be even more accurate.
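As a sketch of the text-to-SQL use case, here is how you might prompt a small SQL-specialized model with transformers. The model id is assumed to be the Hugging Face repo for Prem-1B-SQL, and the prompt format is illustrative; check the model card for the exact schema-prompting format the model expects.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer  # pip install transformers torch

# Assumed Hugging Face model id for Prem's SQL model; verify against the model card.
model_id = "premai-io/prem-1B-SQL"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Text-to-SQL works best when the schema is in the prompt (see the audience
# question later about the model knowing the queried data architecture).
schema = "CREATE TABLE orders (id INT, customer TEXT, total REAL, created_at DATE);"
question = "What is the total revenue per customer this year?"
prompt = f"### Schema:\n{schema}\n### Question:\n{question}\n### SQL:\n"  # illustrative format

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=96, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```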
And as we started to narrow down how our small language model operated in certain use cases, that's when we started to get better results, and we're pushing out more of these open-source models as well. So we've learned that small language models, of course, do really well with specific tasks.

Just some quick success stories from what we're doing at Prem: we've worked with Marvel Studios on a few different projects, utilizing our platform and other technologies, and our small language models and fine-tuning process have been used for some of the new and upcoming projects coming out of the [00:22:00] Marvel Studios camp. Here are some quick quotes from the founder of Marvel Studios, who's working closely with us, on the benefits small language models have brought to the table: reducing costs, reducing time, and augmenting their IP portfolio by extending their proprietary models as core assets, not just the superheroes.

So why Prem? We focus on data sovereignty and ownership. We offer on-premise solutions, customization, and of course small language models. Thank you all so much for sticking with me through this talk. You can find me on Twitter, LinkedIn, GitHub, or X, right? You can scan the QR code, join us on Discord, leave me some feedback, and learn about the new things we're releasing at Prem. Thank you all so much; I'm ready for some questions.

Adam: Nice. Joshua, thank you very much. Thanks for joining us and for sharing, of course. Let's see if we have any questions. I'm not sure we have any in the audience yet; give me one second. [00:23:00] Okay, I'm not seeing any questions in the audience yet, but I kind of want to pick your brain on a couple of things. First of all, it looks like we already have this tendency in the market to build more capable yet smaller models, right? That's one trend we're certainly seeing in the open-source space. But on the other hand, we also have the capability of just distilling larger models into smaller models, where we're in control of the ultimate size. Do you see those converging in some way? To what extent do you see people already using smaller models that are more capable, versus starting with a large model, fine-tuning it on very specific tasks, and creating the model that way? Because I feel like those could be two different versions of the world that we get [00:24:00] to see.

Joshua Alphonse: Yeah, that's a great question, because some of the things we focus on at Prem, like our autonomous fine-tuning agent, take care of that distilling process for you. Some of the use cases from our customers and users are very industry-specific. We work with financial compliance companies and different banks across Europe to create small language models that operate in tandem, as agents together, to help build an environment for accurate financial compliance and checks, and also within healthcare. So there is this convergence happening.
And of course, a lot of the companies we work with have proprietary data that they don't want handled by some of these larger companies. That's why we give folks the ability to download their own weights and run them on their own infrastructure, and we also have a self-hosted version of the [00:25:00] platform. So, just from my experience with what we're doing at Prem, we've come across a lot of companies that signify this convergence you're speaking of, and we're making it more accessible so that developers who may not be machine learning engineers or data scientists can just focus on putting their app in production, taking the right data, and shipping something really fast while keeping it accurate and keeping it small.

Adam: Speaking about the right data, we have a question here from Sarah, but before that, Walton asks: can you see domain-specific SLMs coming in the future?

Joshua Alphonse: Oh yeah, for sure, I can see domain-specific SLMs coming in the future. We've had some other competitors out there, like Writer and, what is it, Jasper. I feel like they're still in that whole LLM landscape, but they do have solutions for specific [00:26:00] industries, again like financial services, healthcare, and so forth. And if we're looking to lower the cost of AI, and we've seen this advancement happen with what's going on with Gemma right now, with DeepSeek, and with a bunch of other efforts to lower costs and keep things more specific, then I definitely see SLMs taking a big leap forward in the industry. Things are just advancing and getting smaller, to be honest.

Adam: We got a question here from Sarah: the SQL specialization only helps when the model knows the queried data architecture. How is that taken into account? That seems to be a missing step for Sarah.

Joshua Alphonse: Yeah, that's a good question. So yes, you would have to know the whole SQL architecture that you have. This is where we're experimenting now with our own fine-tuning agent. The Prem-1B-SQL we [00:27:00] released was more of a general-purpose text-to-SQL model that you kind of start from scratch with, but of course we want to add more functionality so it can understand an existing database as well. So please keep up to date with us on all of our different channels, and you can test out Prem-1B-SQL on Hugging Face and leave us some feedback, because that's something we're aiming for in the next version.

Adam: Yeah, maybe you can stick around in the chat; folks may want the link to the repo. Let me ask you, along the same lines: it seems relatively clear to me that if a company wants to start investing in SLMs, they should start with data that's already representative of their domain in some way, right? Because I imagine the more specific the task, the better the SLM will perform. Do you find that [00:28:00] the challenge here, the bottleneck for getting companies to use SLMs, is for them to even figure out what the data ought to be in the first place?

Joshua Alphonse: Yeah, I definitely think so. As I mentioned before, it's the question we always come across, and we always say: better data, better model. The higher the quality of the data, the better the model. One thing we've been able to accomplish really well at Prem is starting from a small dataset that's still high quality and getting you pretty good results, and then we have synthetic data generation on top of that, which can create a larger dataset and so forth.

Adam: Any examples, or some ideas or principles, for how to start with that small data in the first place? What is that data?

Joshua Alphonse: That really depends on how specific you want to get and what industry it [00:29:00] is. Know your industry the best that you can, and take advantage of synthetic data generation, because it's a way to fast-forward and see the performance of the model. So I would start small, get the data as high quality as possible, and then use as much synthetic data as you can, just to see results faster, because I think sometimes people want that instant gratification to see how they can advance further.

Adam: Small, as in even hundreds?

Joshua Alphonse: Right now we're even starting people at 50 data points, so you can go from 50 all the way up. That's roughly what we suggest, based on how our platform operates.
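To make "start small with high-quality data" concrete, here is a sketch of a tiny instruction-tuning seed set written as JSONL, a format most fine-tuning pipelines accept. The field names and records are illustrative, not a required schema.

```python
import json

# A handful of hand-written, domain-specific seed examples. Per the talk,
# ~50 high-quality records is a reasonable starting point before using
# synthetic data generation to expand the set.
seed_examples = [
    {"instruction": "Flag any compliance risk in this transaction note.",
     "input": "Wire of EUR 45,000 to a new beneficiary with no KYC record.",
     "output": "High risk: missing KYC for a new beneficiary above threshold."},
    {"instruction": "Flag any compliance risk in this transaction note.",
     "input": "Recurring EUR 200 payment to a verified utility provider.",
     "output": "Low risk: verified counterparty, amount within normal range."},
]

# JSONL (one JSON object per line) is a common interchange format for
# fine-tuning datasets; exact field names vary by platform.
with open("seed_dataset.jsonl", "w") as f:
    for record in seed_examples:
        f.write(json.dumps(record) + "\n")
```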
Adam: There are so many people now using agents, building agents, and investing in agent infrastructure, and they're trying to conceptualize what the relationship needs to be between all these different concepts. For example, do you have in mind that [00:30:00] multiple agents would be sharing one SLM? Or are we moving towards a place where there's really an agent-to-SLM, one-to-one relationship? How do you see the boundaries?

Joshua Alphonse: I personally think it will be that one-to-one relationship. Just in terms of how the architecture works, having different agents that may even use different SLMs for a specific part of the task is something I've also seen be popular with folks at Arcee and Langflow. When I used to work at ByteDance, we had another agent framework called Coze, and it operates the same way. I think that one-to-one relationship will just produce a better-performing agent architecture, in my opinion.

Adam: Joshua, thank you very much for coming and sharing with us. This has been a lot of fun, and I see people in the audience writing that this was a wonderful introduction to SLMs for people who don't know about them. [00:31:00] Joshua, thanks again.

Joshua Alphonse: Thank you, Adam, and thank you, MLOps Community. I appreciate you guys. Can't wait to come back next time.
