Exploring the Impact of Agentic Workflows

Posted Oct 15, 2024 | Views 7.8K

# AI agents in production

# LLMs

# AI

speaker

Raj Rikhy

Principal Product Manager @ Microsoft

Raj is a Senior Product Manager at Microsoft AI + R, enabling deep reinforcement learning use cases for autonomous systems. Previously, Raj was the Group Technical Product Manager in the CDO for Data Science and Deep Learning at IBM. Prior to joining IBM, Raj has been working in product management for several years - at Bitnami, Appdirect and Salesforce.

SUMMARY

In this MLOps Community podcast, Demetrios chats with Raj Rikhy, Principal Product Manager at Microsoft, about deploying AI agents in production. They discuss starting with simple tools, setting clear success criteria, and deploying agents in controlled environments for better scaling. Raj highlights real-time uses like fraud detection and optimizing inference costs with LLMs while stressing human oversight during early deployment to manage LLM randomness. The episode offers practical advice on deploying AI agents thoughtfully and efficiently, avoiding over-engineering and integrating AI into everyday applications.

TRANSCRIPT

Raj Rikhy [00:00:00]: My name is Raj Rikhy. I'm a principal product manager at Microsoft in Azure Data, and I usually do shots of espresso over ice with oat milk.

Demetrios [00:00:16]: Agents. Agents. Agents. Agents. Agents. Today we are talking agents, and it's just in time because our next virtual conference is coming up on November 13. And guess what that's gonna be about.

Raj Rikhy [00:00:29]: Yep.

Demetrios [00:00:30]: AI agents in production. It's not just agents. It's agents in production. So if you want to join us, we'll leave a link to sign up, as we do, in the description. For now, talking with Raj, I had one gigantic takeaway, and it was around the question of, how do I know if something is not possible or if I'm just not using the right tools to accomplish the task? Do I need to try a different agent framework? Do I need to try something different in my stack to get the desired outcome? And his response, I thought, was right on the money. We got to start with this, because it feels like there's the agents that you can have that are inside of certain programs that will work that program, but then there's, like, open ended agents, then there's agents, like I just said, that are dealing with pixels on screens. So when you look at the plethora of agents that are out there, how do you categorize them and bucket them?

Raj Rikhy [00:01:49]: I think it's helpful to take a step back. And before we talk about, like, because we do want to talk about it in the context of actions, but let's. Let's take a step back and figure out, like, what are we actually talking about here? Right? So, agents, actually, as a matter of practice, have been in. In the academic discipline since, believe it or not, the fifties. Like, reinforcement learning has, you know, the field of reinforcement learning in both, you know, things like psychology as well as things like dynamic programming, it's really been about understanding within the scope of trying to take actions on a user's behalf, what is the universe of things I can do and why, and why do I want them to do those things, and how do I make them do it better? These are actually the principles that underlie when we talk about agents. Forget about AI when we talk about agents, right? These are very fundamentally the things that we look for. And if we even peel back another layer, like, what is an agent? Forget about what we're trying to do with them. What is an agent? And at the base level, an agent is some kind of an entity, software or otherwise.

Raj Rikhy [00:03:06]: It's some kind of an entity that takes actions autonomously, that has decision making power and capabilities. That's what an agent is. And I think the key thing here, to your point, around differentiating agents, if we think about an agent as some kind of an entity that can act autonomously, the reason why it's useful to understand the lineage, the provenance, the history of agents, is that we can learn things that have been studied since then. So in reinforcement learning, when you're using agents, just by virtue of the fact that we've been trying to get things done for a long time. And just for some context, I was part of the product team that led developing the first deep reinforcement learning platform as a service for Microsoft, the first one that really achieved a high level of scale. And our customers were asking at that time, there's not really a whole lot of folks that are, I would say it's being done in research, but it's not really like, okay, I'm going to go deploy it in my business kind of a thing. And there are these principles that we had to educate folks on and I think it's worthwhile to talk about them. So one of them is we talk about our environment.

Raj Rikhy [00:04:33]: We kind of sit pen to paper and figure out what is the universe that we're operating in. We're not even talking about start coding things up. We're not even talking about what model you're using. This is verboten stuff at the moment. We just want to get a sense of what are we trying to accomplish. When you sit down and begin the process of ideating, you know, what is your actual end goal and what is the intended behavior that you want the agent to undergo. You've taken that very first step. And really it's literally as simple as sitting down.

Raj Rikhy [00:05:11]: And this is why I think some PMs are drawn, and even data scientists who are capable of visualizing those products, they're drawn to try and figure out, okay, what I want to get done with agents. Because in a very real way, what you're doing by practice of writing things down with agents is you're creating an MVP for an agent, okay, and when you think about like, okay, what are the different types of things I can do with agents? Or what are the different types of agents? It really is a function of, okay, well, now you've written down what it is that you want. Now figure out how you're going to get it done. And so whether that means in the context of a video game, right. Your purpose, agent, is to be an enemy, to be an antagonist and light the main character on fire. Right? Like that's, you know, that's your purpose. That's what you need to do. Figure out a way to do that.

Raj Rikhy [00:06:06]: Okay. And that's one way to think about it, another way to think about it in terms of what you want to accomplish and how it is. It's like, okay, how am I going to figure out how to get this guy to light him on fire? Well, he needs a flamethrower, right? Obviously, he needs to figure out how to understand the environment. He needs a map. And these are tools. This is how an agent accomplishes its goal. It's a tool, a toolset. Practically speaking, when we talk about agentic implementation, we talk about in the form of it's either a function call or a tool call.

Raj Rikhy [00:06:37]: But ultimately, if we're connecting the dots here, that is the capability of an agent. You're describing the things that they can do. They can drive to the grocery store. They can pick up the bag of flour. Right? These are the things they can do. So you've gone from defining your goal to defining what it is that they're capable of and how you can offer them things to do that.

Demetrios [00:07:00]: Okay, I just want to pause for a second because I think you said something that is worth noting, too, on the environment piece. If we're talking about the game the agent is playing in the game, that is their environment. If we're talking about an agent looking at pixels on a screen, that is their environment. If we're talking about an agent working inside of Microsoft Fabric or Microsoft Office, that is their environment. And so almost like you're putting the guardrails on, here's your environment, here's where you play. And this is one other piece, one other principle. Now we'll give you the tools. Now we'll give you the why or what we want you to accomplish, but it's going to be within this framework.
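The framing so far (write down the goal, name the environment, list the tools) can be captured in a few lines before any model is chosen. The following is only a minimal sketch in Python: `AgentSpec`, `drive_to`, and `pick_up` are hypothetical names invented for this illustration, loosely following the grocery-store example in the conversation, and they are not part of any agent framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class AgentSpec:
    """Pen-and-paper description of an agent, written before any model is picked."""

    goal: str          # what the agent is meant to accomplish
    environment: str   # where the agent is allowed to operate
    tools: Dict[str, Callable[..., str]] = field(default_factory=dict)

    def call_tool(self, name: str, **kwargs) -> str:
        # The declared tool set acts as the guardrail: anything else is refused.
        if name not in self.tools:
            raise PermissionError(
                f"tool {name!r} is not available in {self.environment!r}"
            )
        return self.tools[name](**kwargs)


# Hypothetical tools for the grocery example (assumptions for illustration only).
def drive_to(place: str) -> str:
    return f"arrived at {place}"


def pick_up(item: str) -> str:
    return f"picked up {item}"


spec = AgentSpec(
    goal="bring home a bag of flour",
    environment="grocery run",
    tools={"drive_to": drive_to, "pick_up": pick_up},
)

print(spec.call_tool("drive_to", place="grocery store"))
print(spec.call_tool("pick_up", item="bag of flour"))
```

In this sketch, a request for a tool that was never declared raises an error instead of being attempted, which is one simple way to keep an agent inside the environment it was given.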

Raj Rikhy [00:07:39]: Right. And I think that that's important not just because, you know, you're thinking about the universe that they're playing in, not just because you're thinking about what capabilities they have and what they can get done. There's this other principle about agents that in the scope of what we're talking about is really important, which is how well can they do it? In reinforcement learning, we ascribe this principle basically around reward. Folks are kind of familiar with positive reinforcement. Negative reinforcement. You want to do positive reinforcement because you want to continually reinforce the positive things. And what's the reward? The reward of positive reinforcement is praise. That's how you offer, that's quite frankly, the reward you're offering, the increase in praise which will lead to another outcome.

Raj Rikhy [00:08:31]: Well, you can't necessarily praise software, although you can try. But what you can do is you can define parameters for success. So in a game that is, as an example, like let's use our flamethrower guy or flamethrower gal, right, light them on fire more, you know, decrease their health faster, right. Get them, you know, zero to fire in less than 3 seconds. You know what I'm saying? It's, what are the parameters for success? Make sure you deliver that. You know, it's these parameters of success that matter in the scope of what you're accomplishing. And honestly, you do not need to program anything to accomplish what I just said. And that is important because as you know, as we know, programming and code development is a means to an end, right? Like it is a tool that is used that has, you know, other types of libraries and, you know, anything that you want off the shelf to go and accomplish something, but you don't necessarily need to program to accomplish the objectives of a successful agent, whether it's in the scope of a game or scope of.

Raj Rikhy [00:09:48]: Anyway, I've been talking a lot, but let me pause there. Does that make sense? Is that all?

Demetrios [00:09:53]: Yeah, for sure. And one thing that came to mind when you were saying this is the trade off, I think, that you get between how narrowly you define these success criteria and how much you leave open for interpretation. And I can see how sometimes we can inadvertently shoot ourselves in the foot by defining too much. That's right. But I also see that you want to define more because it gives you a little bit more reliability. So there's that trade off. There's that line that you have to walk.

Raj Rikhy [00:10:33]: You know, this is so important, right? Like this is where, you know, the pedal meets the metal in terms of agentic development. Do you over define? Do you not over define? Right. And the tools that we use to make these determinations are getting more sophisticated, more mature, and more abstract, which is a good thing for everybody involved. I think about six months ago, as an example, one of the most popular repos, AutoGPT, they had this interface called Forge. It was like this UX, and they might still have it, but the point of it is that it basically functioned as this high level abstraction where you can put in some text and the text becomes the metaprompt. The agent is supposed to be defined by that, but there's no feedback. It's not exactly the easiest UX to work with, then there's this other step level thing. Now we have some nicer interfaces, now we have cleaner libraries, and now the code is more mature, there's more pull requests and things like that.

Raj Rikhy [00:11:47]: The other thing that's at play here is, and this is kind of a big word, but it's used intentionally, is your orchestration. What is your scheduler? Who is doing the planning? Are you offloading that planning to somebody else, in the form of Azure OpenAI doing the planning for you, because it has a planner behind the scenes and it's doing that on your behalf? Or OpenAI, I say Azure OpenAI, that's an interface. Like OpenAI has an Assistants API right there. Claude, as an example, has a function calling API and it has a planner to figure out how it calls those functions. A lot of that logic is there, but if you're running it by yourself, you have to figure out what that plan looks like, whether you're defining it in, like, Semantic Kernel or LangChain or LangGraph or whatever it is. That's your responsibility. If you're not coding it, are you leaving it up to an algorithm? Right? Like there are planning algorithms out there too that can do this, and maybe those are good for you.

Raj Rikhy [00:12:48]: The point of it is, you know, the definition that you put into the environment and the criteria for success should start upfront on paper and you should have a well oiled understanding of what success looks like before you get to the point where you're running wild. And by the way, there's one more thing that I want to, like, just call out here. LLMs are great, right? But LLMs are random, and that is by nature, right? There was about a year ago, you know, and you probably heard this before, Demetrios, but like, you know, people like, oh, yeah, we ought to reduce hallucinations. It's hallucinating all the time. It's always, hallucinations are the product. That is the product, right? Like that is what an LLM does. It fabricates. Your goal, right, your goal is to figure out how to effectively fabricate. And I know that sounds ridiculous, but by definition a large language model is probabilistic and it's stochastic. It is designed to be random, okay? It is.

Raj Rikhy [00:13:59]: You can create affordances, you can create capabilities that are more deterministic, that are more planned, but that is up to you to define and to govern as the, quote unquote. And I'm saying this only because like, it's important to think of yourself this way as the architect of that system. Yeah, right. You may not be an architect. That's okay. You don't have to be an architect. What you do have to think about is that you are acting as an architect. You are, you know, the puppet master.

Raj Rikhy [00:14:32]: You are pulling the strings. That is who you are. Whether you want to accept that role or not. It is important because anything that happens in that system, good or bad, ultimately, are things that you can affect, not control, but you can affect. Right? Because remember, agents have autonomy.

Demetrios [00:14:52]: The idea here with defining all these different pieces and recognizing how many variables you have and that you can, how many knobs you can tweak, right, is very valuable. And so maybe it's worth talking through different knobs that you want to be looking at and focusing on or ways that you can. I think I'm thinking about it as how can we? For my own use, I've gone and pounded my head against the wall because I don't know if what I'm trying to get done is not possible or if it's just I'm not able to make it work because of my architecture or my puppeteering, as you will. And so I'm wondering if you've seen any good ways to just get that fast feedback loop and recognize and debug quickly.

Raj Rikhy [00:15:54]: That's an excellent point. Because like I said, right? If you are in a position where you are architecting that system, you want to figure out what failure states exist before your users or whatever know, forget about the users. Like, let's say you're not designing for somebody else. Let's say you're designing for yourself, right? You want to figure out what those failure states are before you put it out in the wild, right? And you're never going to be able to discover every single one. But there are some tried and true patterns to help you figure that out, right? And I'll give you an example of how not to do it before I tell you how to do it. Like, one of the things, you know, people, again, we're talking about the hallucinations thing because people tend to be like, you know, people tend to be angry about it. They're like, I don't know why. It's so, you know, making fake crap up.

Raj Rikhy [00:16:49]: I don't get it. You know, it's. I just wanted to write a legal brief. Stop making up fake law cases that, you know, it's like, okay, listen, listen, right. The way that, you know, the way that you can explore what these failure states are is actually pretty straightforward. Here's pro tip. Never give your agent raw web access out of the box. Just don't do it.

Raj Rikhy [00:17:21]: It's a bad idea. It doesn't matter what safeguards you put into place, just don't do it. You need to give it some kind of directive. Okay. Like it's dumb.

Demetrios [00:17:37]: Yeah. Too much agency.

Raj Rikhy [00:17:40]: Yes, yes. It's like, it's like, here's a tablet, five year old, you know, go get addicted to find out. Yeah, yeah. Gosh, you know, I found YouTube. You know, it's like the key here is vet the outcomes yourself first. And we always, you know, even before agents came along, we would say this like with generative AI, always at the first iteration for the MVP, keep the human in the loop. Agents are designed to act autonomously. That does not mean that they have to action that autonomy.

Raj Rikhy [00:18:19]: You have the opportunity to stopgap before that action is taken and ask it to explain the next steps that it is in fact going to go do. And in fact, actually it's quite good at outputting the text kind of in like a debug log or a debug stream. What it's going to do and what it's supposed to do, you could ask it to simulate its own activities. So that's one way to do it, like a human in the loop interpretation of what it's actually supposed to do. Another way to do it is to constrain the state space, constrain the environment. Right. See how successful you are with a small landing space before you go out and do a broader thing. And I'll give you two examples of this.
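The human-in-the-loop stopgap described here can be sketched as a small approval gate: the agent proposes an action and explains it, and nothing runs until a reviewer says yes. This is only an illustrative sketch in Python; `ProposedAction`, `run_with_approval`, and the two example tools are hypothetical names, and the reviewer is a stand-in function where a real setup would ask a person.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProposedAction:
    tool: str
    argument: str
    explanation: str  # the agent's own description of what it is about to do


def run_with_approval(
    actions: List[ProposedAction],
    execute: Callable[[ProposedAction], str],
    approve: Callable[[ProposedAction], bool],
) -> List[str]:
    """Log every proposed action, but execute only the ones a reviewer approves."""
    log: List[str] = []
    for action in actions:
        log.append(f"PLAN {action.tool}({action.argument}): {action.explanation}")
        if approve(action):
            log.append(f"DONE {execute(action)}")
        else:
            log.append(f"SKIPPED {action.tool}")
    return log


# Hypothetical plan an agent might propose (made up for this example).
plan = [
    ProposedAction("lookup_order", "A-17", "Check the order status before replying"),
    ProposedAction("send_email", "customer", "Tell the customer the order shipped"),
]

# Stand-in reviewer: allows read-only lookups and holds back anything that sends.
log = run_with_approval(
    plan,
    execute=lambda a: f"{a.tool} ok",
    approve=lambda a: a.tool == "lookup_order",
)

for line in log:
    print(line)
```

The `PLAN` lines play the role of the debug stream mentioned above, so the reviewer can read what the agent intends to do before deciding whether to let it act.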

Raj Rikhy [00:19:00]: The first is I have kids, I'm assuming, I don't know you have kids, but you mentioned a five year old. So you have kids. Okay, so if I give my five year old chocolate milk, the first thing that he's going to do, if I don't, if I'm not looking, is he's going to go run toward the couch and sit somewhere comfy. And invariably what's going to happen? It's going to get all over the couch. That's what happens. Okay. And you're like, okay, I've got to clean this up. What do I do? Well, you're not going to use a random cleaner on the most visible spots on your couch.

Raj Rikhy [00:19:32]: You're going to spray it in an unaffected area first, make sure it doesn't, you know, mess up the whole couch. And then if you're okay, then use it on the chocolate milk. Right. You don't want to make a problem worse just because you think it's going to solve it. Right. Like the cleaner is the agent in this case, right?

Demetrios [00:19:49]: Yeah.

Raj Rikhy [00:19:50]: So as an example, let's say that you want to create a support agent. You're not going to give it the whole compendium of your support documentation and start firing it at support tickets. That is a recipe for failure. Instead, pick, like I would say, cherry pick some of the lower priority support cases that are more clearly defined and provide it with that support documentation context and understand the behavior of how it would address those. And whether you have a relatively high NPS on ticket closure or not, right. Try and create a smaller constrained environment to start with. And then the third tool that I would say for agents, okay, is don't try and get fancy out of the box too fast. I mentioned there are things like planning algorithms, and you have an opportunity to try and offload some of that stuff.

Raj Rikhy [00:20:59]: If your MVP is pen and paper, then the step beyond MVP is not like, I'm going to code this up in LangChain. Please don't. The reason why I say that is because you want to try as much as possible to understand how to action success fastest, getting stuff off the shelf to do that, for example, using a function calling API or tool API or even if you're not a code first person, go find a no code tool after you've done your pen and paper and see if that solves your problem. And if it's not solving a problem, understand why it's not solving it. But don't get to the point immediately where you start going, I need to quite literally open up the hood and start changing my transmission before you've even gotten to Safeway, right? Like that makes no sense to anybody and it shouldn't make sense to you, right? Like these same principles apply to software development, right? You know, you wouldn't start writing your own database just because your relational database is not sufficient for your use case. It's crazy, right? It's literally crazy. Don't do it.

Raj Rikhy [00:22:13]: But people are like, whatever, SQL, I can just do this, it's fine, right? But these things, these tools exist for a reason. Don't bring the burden on yourself, and try and iterate from there.
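The support-agent pilot described earlier in this answer (start with lower-priority, clearly defined cases) amounts to a filter over the ticket queue. A minimal sketch in Python follows; the `Ticket` fields, the priority labels, and the category names are assumptions made for illustration, not a real ticketing schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Ticket:
    ticket_id: str
    priority: str   # "low", "medium", or "high" (assumed labels)
    category: str   # assumed to come from an existing triage step


# Hypothetical categories that the support documentation covers clearly.
WELL_DEFINED = {"password_reset", "billing_address_change"}


def select_pilot_tickets(tickets: List[Ticket]) -> List[Ticket]:
    """Keep only low-priority tickets in clearly documented categories."""
    return [
        t for t in tickets
        if t.priority == "low" and t.category in WELL_DEFINED
    ]


# Made-up tickets, used only to show the filter.
tickets = [
    Ticket("T-1", "low", "password_reset"),
    Ticket("T-2", "high", "password_reset"),
    Ticket("T-3", "low", "data_loss"),
    Ticket("T-4", "low", "billing_address_change"),
]

pilot = select_pilot_tickets(tickets)
print([t.ticket_id for t in pilot])
```

Only the tickets that pass this filter would be handed to the agent at first; the scope can widen once its behavior on that small set is understood.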

Demetrios [00:22:26]: One thing, going along the lines of that last point you made, that I often think about is, you mentioned AutoGPT earlier, right?

Raj Rikhy [00:22:38]: Totally.

Demetrios [00:22:39]: If you were playing with AutoGPT six months ago or a year ago and you were trying to use agents and you felt like, dang, this just isn't working, these aren't reliable, what's going on? And you were trying to tailor your environment and do these things, at the end of the day, you probably would land on the conclusion of, it might be the wand, not the wizard here. Like, I wish I could make this work, but I don't think I have the proper tools. And so then maybe you jump to another one. And now today, there are so many different agent frameworks and tools that you could use. Tools, not in the sense of agents using tools, but us as the orchestrators being able to use tools or a product, I should call it. How do you know when the product has hit its maximum? How do you kind of evaluate that? Like, oh, this may be because this. Exactly.

Raj Rikhy [00:23:42]: I think so that's, it's, it's. It's useful to evaluate it in the context of the scope of what you're trying to accomplish. And I think the AutoGPT example is actually a good one, because, let's be real. Like, a lot of the tools that are surrounding agentic development today are designed for code native folks. Like, full stop. Yeah. That is not only because the industry and I think the applications in the field are moving very quickly, and it's much faster to iterate when you're just working in a code native environment to try and figure out, okay, how do you land something with these new capabilities that are coming off the shelf when people are constantly innovating? It does not really permit experimentation very well. That's why, by the way, this is not unique to, this is not unique to agents, okay? It wasn't that long ago that people were just training models everywhere off the shelf, and they were like, yeah, no, here you go.

Raj Rikhy [00:24:46]: Like, it's a paper. Here's the code. Like, you know, look at what he did. Look at what I did in this paper. I did that. You know, and people are like, I have no idea how to reproduce this. This makes no sense. This code is spaghetti, and I can't stitch.

Raj Rikhy [00:25:02]: Like, the variable naming doesn't even make any sense. I can't interpret it, let alone contribute, let alone reproduce it. So it's like, okay, and then Hugging Face came along, and now they have, like, a whole way to serve these models with inference to test them out in a playground. So I think it's a very, very good question if you're trying to evaluate wands, you know, and you don't know if it's the wand or maybe it's the wizard, right? Yeah, don't go for, you know, I'm going to go deep here. Don't go for Voldemort's twin wand out of the box. Okay? Like, there is nothing wrong with the training wand. No one is asking you to bond with Severus Snape's leftover wand.

Raj Rikhy [00:26:00]: Figure out how to start with the tools that are not bleeding edge. Even among the tools that are not leading edge, go with the most common denominator. And there's nothing wrong with that. If you're a code native person and you're having trouble with AutoGPT, first of all, you're not alone. I was struggling six months ago to set up AutoGPT myself too. The documentation is good, but you know, sometimes you end up in challenging spots. Okay.

Raj Rikhy [00:26:32]: Environment variables aren't exactly clearly documented in some places. Anyway, it's fine, right? That's not to say that it's not a great library. It's fantastic. I've used it before. That's an excellent library. It has incredible contributors. I really think that it's a great piece of software that's driven by the community. It's open source, and as you mentioned, there are several other libraries out there that are equally as high quality.

Raj Rikhy [00:26:57]: But the reality is that if you were struggling with it, don't ask yourself to do hard labor and explore a cavern without a flashlight, because there are people that are doing that. Figure out a way to pare back until you can cobble together a solution that looks pretty close to what you want to accomplish, even if that's something that's like a no code solution. And that's a great way to start. Like Copilot Studio as an example. You know, there are tools off the shelf to do this stuff. Don't twist yourself into knots unnecessarily. Now, you asked a very important question, which is, how do you evaluate the success or failure, given that you've settled on a tool? And the answer is, quite simply, understand what success looks like. Go back to what I was talking about at the very beginning and be clear.

Raj Rikhy [00:27:58]: Are you able to accomplish, with a clear mind and a clear intent, to the best of your ability with that tool, that success criteria that you spelled out on paper? It's a very easy way of understanding the problem. And if you come all the way back to the no code thing that has everything, all the bells and whistles out of the box, and you just can't get there, try the next step up, see if that gets you where you want to be. But be clear about that success before you get started, because otherwise you're getting lost in this vortex of tools.
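One way to keep the success criteria from paper in front of you while trying out a tool is to write them as explicit thresholds and check each trial run against them. The sketch below is an assumption-laden illustration in Python: the `Criterion` class, the metric names, and all the numbers are placeholders invented for this example, not measurements from the conversation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Criterion:
    name: str
    metric: str
    threshold: float
    higher_is_better: bool = True

    def passed(self, observed: Dict[str, float]) -> bool:
        value = observed[self.metric]
        if self.higher_is_better:
            return value >= self.threshold
        return value <= self.threshold


def evaluate(criteria: List[Criterion], observed: Dict[str, float]) -> Dict[str, bool]:
    """Report, per criterion, whether the observed run met what was written down."""
    return {c.name: c.passed(observed) for c in criteria}


# Placeholder criteria, written down before choosing a tool.
criteria = [
    Criterion("resolves most pilot tickets", "resolution_rate", 0.80),
    Criterion("answers quickly", "median_seconds_to_reply", 60.0, higher_is_better=False),
]

# Placeholder observations from a trial run with one tool.
observed = {"resolution_rate": 0.85, "median_seconds_to_reply": 95.0}

results = evaluate(criteria, observed)
print(results)
```

If a criterion keeps failing after an honest attempt with the simplest tool, that is the signal to try the next step up rather than to rewrite the criteria.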

Demetrios [00:28:29]: That's so good. That is. And it, I have to make the point that knowing what success looks like is also not something that is unique to agents, right? That is something. And it's so easy to forget. It's so easy to get wrapped up in all the cool stuff and the shiny bells and whistles, and then next thing you know, you're sitting there going like, oh, weren't we just supposed to do that? I think if you, if we may have defined this a little better, we wouldn't have gone completely off the rails.

Raj Rikhy [00:29:09]: You know, as I, as a product manager, I, you know, I, whatever you want to call me, code fluent, but don't get to code, you know, like just the person that's in front of the customer, whatever, whatever you want to call me, right? Like, I subscribe to the school of thought around customer development, and the principles of customer development and lean startup are really focused on what is that granular piece of value that you're delivering. And when you talk about thinking and architecting an agentic system, like it or not, you are, in fact, creating a product. And the reason why I say that is a product is designed to accomplish a task through some combination of systems, whether that is software that you're writing for yourself, whether it's web software, you have a database, you have an application, you have whatever other ancillary services, right? The point is that people are using those services to accomplish something. The difference with agents is they're designed to accomplish those tasks autonomously. You're giving them the tools to go accomplish something. So in a way, each agent is itself a product, and that gets, you know, it can get, it can get difficult if you aren't thinking about it with first principles in mind, right. Because it can get impossibly complex if you put yourself in a situation where you say, I'm going to have an agent for literally every single task, and they're all going to work together and they're going to have a hierarchy, and you're going to have a planner and you want to put all this stuff together, it's going to be beautiful. And you're thinking about it as a conductor, and you have this symphony and this orchestra, and you're just like, da da da da da da da da da.

Raj Rikhy [00:31:10]: Please keep it simple. Please keep it simple. Please keep it simple. I will say that the best systems, whether they are stitched out of AI or whether they are stitched out of software, solve problems simply. And if you can manage to get an agent to solve a problem simply, you won. Yeah.

Demetrios [00:31:43]: Yeah. And it's especially exacerbated because if it can do something in one hop that it in other times has taken five hops to do. You're gonna get more reliability. You're gonna get, it's gonna be faster. Everything is going to be better because it is simple.

Raj Rikhy [00:32:02]: Right, right. And I think to make it accessible to folks because I do think actually that this is, and if you'll excuse me, Demetrios, this is where I kind of go a bit hyperbolic, but I do think, yeah. Oh, indeed. I do think that agentic workflows are going to be fundamentally transformative to software at some level. Like Steve Jobs when he said the good technology is indistinguishable from magic. Right. I think one of the things that has happened, particularly with generative AI, is that when we first started interacting with ChatGPT, I'm talking about like layperson, right? Like just anyone on the show, we immediately jump to, like the first thing you do is you ask like a dumb question, what's, you know, what do you think about, you know, Abraham Lincoln's, you know, beard speech or something, whatever. What theater? Like, you ask it something dumb, just to test how it goes, and it gives you this educated, genuine response, and you see it and you're like, wait, that was pretty good.

Demetrios [00:33:19]: Yeah.

Raj Rikhy [00:33:21]: The next thing that you do is you leap to, what can this do for me? And it wasn't until the introduction, really, in my opinion, of function calls, of tools, of things that you can offer to the agent, right? Like access to the web, access to a database, access to documentation, the ability to interact with an API, the ability to perform inference of another model and obtain that and use it for the purpose of predicting something else, right? Like these are all capabilities that are inherent in agents. It wasn't until we got to the point where we can create that action space for these transformer architecture, generative AI models, these large language models that we were able to make the leap from what can it do for me? To how can I get it to do it for me? And that is, I think, where agentic AI has such promise. But I'm going to put this caveat that it is very early. That doesn't mean that you shouldn't educate yourself. It doesn't mean you shouldn't try. Just because the iPhone was the first model doesn't mean it didn't have its hiccups. I remember working with the first iPhone. I was like, wow, this is magic, this is really cool.

Raj Rikhy [00:34:43]: I was like, I still don't understand why can't, I can't figure out how to get the calling to work while I'm texting somebody and I just want to text. It had its flaws. That's why software is iterative. We are early in the agentic software development life cycle. Give yourself and give the tools that are offered in its stage of maturity some grace, but don't try it and then be like, oh, it's too early. I can't make sense of it. Stay persistent, because that piece of magic that happened after you first interacted with the LLM, it's there. You just have to dig a little bit and work on it.

Demetrios [00:35:30]: Yeah, you mentioned support agents. I think that's a great use case for agents. And there's some companies that are just absolutely blowing up right now that are doing that type of thing for agent workflows. What are some other workflows that you've been seeing and you appreciate?

Raj Rikhy [00:35:52]: You know what, I'm going to go rapid fire here. There are so many of them. So many of them. Okay, so, all right, here we go. You ready? I'm just going to start going off the top of my head.

Demetrios [00:36:07]: Hit me, hit me.

Raj Rikhy [00:36:08]: All right, so look, you know, you can do fraud detection in finance, right? Identifying suspicious transactions and stopping them in their tracks and contacting the user on that behalf, like intercept the fraud at that moment, right? Supply chain optimization, bin packing is an age old problem. Everybody deals with it. It doesn't matter what industry you're in, okay? But in supply chain, it's so much more acute because there's always real time information that is throwing things awry. And being able to act reflexively is the name of the game. Okay? Patient monitoring. Okay, let's say, you know, I'm just, I'm going all over the place here because there's so many applications. Like, let's say you have an EHR. If you know anything about healthcare, there's all kinds of information and telemetry in that EHR.

Raj Rikhy [00:37:03]: There's charts, there's doctor's notes, there's intake forms, there's patient history, right? Like, if there is an issue, why are we waiting until somebody recognizes there is an issue? Why not have proactive monitoring to see if any of that data flags, to see if there should be a follow up. Totally asynchronous, but a huge opportunity for agents. I'll give you another one offline, because people always think about online, they never think about the offline applications of agents. I got a farm, right? I've got sensors. I got telemetry in that farm. What am I doing with those sensors? What am I doing with that telemetry? I'm doing nothing with it because I have to monitor it, I have to sit on it, and I have to figure out where I want, like, ridiculous. What's the fertilizer content? What's the nitrogen content of the soil? What am I supposed to do with that? No, make it actionable. Hand the agency over to an agent that can turn on or off the water, you know, apply nitrogen through kind of.

Raj Rikhy [00:38:06]: Some kind of a drone or some kind of a distribution system, you know? Cool. I'll give you another one that I really love. Here's another one I love. Okay. Personal shopping assistance. Take a snapshot of your closet. Take a sa. Forget your closet.

Raj Rikhy [00:38:25]: Load up your social media. Okay? Connect your Instagram account. Yeah, yeah, totally. Whatever. Okay. Get, get your, get your LLM some idea your multimodal LLM some idea what you're wearing.

Demetrios [00:38:40]: A.

Raj Rikhy [00:38:42]: Have it scour the web for your brands and recommend you the clothes. Why are you looking for the clothes? Is it in look, sales and the sales. Okay, yeah, forget about the clothes. What about the sales? Right? You don't have a personal shopper. Why, you know, nobody's saying, don't go into the store, but don't you want to know, like, why are, like, you know what our alternative here is to be true? It's RSS feeds. It's RSS feeds. That's the closest approximation you have. People are parsing through emails and pamphlets to figure out, like, how do I get the best deal with the thing that I like? Why are you doing that? Don't do it.

Demetrios [00:39:28]: That is such a good one. I actually saw the other day somebody created, like a Tinder, but for clothing items, and you could swipe right and swipe left. And so if you swipe right, it saves it. Or it will take you automatically to that. But it is. That's novel. That's not your shopping assistant that can say, it would be even cooler if you could say, hey, I just got you five shirts because you're thinking about getting a new shirt. These are all on sale.

Demetrios [00:39:57]: You've showed interest in them in some way, shape or form, or you have one of them in your closet. You have been watching videos on TikTok about these types of shirts. But going back to, there is something that I wanted to point out with those first two use cases that you mentioned, which is that they're very real time. The fraud and the supply chain are very real time. And in my mind, the use of LLM and real time never really went together because it is so slow.

Raj Rikhy [00:40:31]: Right.

Demetrios [00:40:32]: And so, like a fraud, you can't be sitting around for two minutes waiting to see if your credit card passes.

Raj Rikhy [00:40:38]: So there's. I'm actually really glad you brought this point up, to be honest. I think that there is a huge investment opportunity that has yet to be fully realized and movement around small language models. I think we've looked at LLMs as being a generalized set of capabilities that offer a wide variety of possibilities. But I think that as the maturity lifecycle of. As an example, do you know how many possible computer vision models, specifically models have been trained? I mean, innumerable, like, yeah, here we are talking about like Llama 3.1, Llama 3.2, you know, Claude 3.5, you know, like get down to and get, you know, will, you know, will 7.53, you know, it's like, look, the point here is that the accessibility that small language models represent for latency and inference, and said plainly, the smaller you get, the less expensive it is, the more accurate it potentially is, and the more powerful it is. And the faster it is, fundamentally, and the faster it is, the cheaper it is, the faster it is. There's also a parallel movement that's been going on around quantization and parameter efficient fine tuning.

Raj Rikhy [00:42:26]: And I think that as we start exploring a lot of these use cases, the inference cost, whether that is done in terms of tokens, whether that's done in terms of latency, whether that's done in terms of RAM size, the actual compute resources that are required for inference, we are going to continue to see that decrease and decline. This is the part where I have to say that this is on more of the bleeding edge in terms of the return on investment. But the key here, the important thing here is if you can get yourself and your application to the point where inference speed and cost is the problem, you've already won, because you've already solved the problem of how do you achieve your goals? Have you delivered success? And there are tools out there that will allow you to do it. And even if you have to go and hire data scientists to figure out how do I train an SLM on this dataset, or how do I provide a GraphRAG interface that will make this more efficient and quicker, how do I reduce the amount of resources that's associated with inference because I'm getting too much consumption downstream on my API. These are great problems to have, especially if you deliver in real time. If you can prove that the value of supply chain optimizations happens in a time series. If you simulate the outcomes and you can showcase that you're more efficiently allocating resources in your supply chain and you can do that effectively. And the only barrier to entry is that you need to make that faster.

Raj Rikhy [00:44:13]: You are in a very good spot, my friend. You are at a very good spot.
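
As a rough illustration of why smaller and quantized models are cheaper to serve in terms of RAM, the memory needed for the weights alone is approximately the parameter count times the bits per parameter, divided by eight. The sketch below is a back-of-the-envelope estimate only: the 7-billion and 1-billion parameter sizes are hypothetical examples, and the figure ignores activations, the KV cache, and runtime overhead.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Hypothetical model sizes, chosen only to show the scaling.
for label, params in [("7B", 7e9), ("1B", 1e9)]:
    for bits in (16, 8, 4):
        gb = weight_memory_gb(params, bits)
        print(f"{label} model at {bits}-bit: about {gb:.1f} GB of weights")
```

Halving the bits per parameter halves the weight footprint, which is one reason quantization and small language models tend to be discussed together when inference cost is the bottleneck.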

Demetrios [00:44:19]: Yeah, that's, that is a great point. The speed is the last thing. There's so many other questions that you need to tackle or so many other obstacles we could say, and hurdles that you need to get through. By the time you get to speed as the hurdle, you're looking good. And so instead of jumping, I guess I kind of jump to the end and say, whoa, is it ever going to be that fast? Yeah, I understand there's distilling. I understand small language models, but I also see the side of if I'm doing fraud and it has to be quick. I do think there is a world, though a little tangent, where you get these workflows that are using a combination of small language models, plus just traditional machine learning, plus you've got these workflows that are plugging in. And again, going back to that orchestration, each step on the orchestration, whether synchronous or asynchronous, it doesn't need to be.

Demetrios [00:45:25]: Every time there's a step on the orchestration or agentic workflow, it has to be an LLM call or just a language model call. It can be a regular regex, it can be whatever you want it to be.
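
One way to picture a workflow in which not every step is a language model call is a chain of checks ordered from cheapest to most expensive, where later steps run only if earlier ones cannot decide. The sketch below is an assumption-laden illustration, not a production fraud system: the transaction fields, the thresholds, and the step names are hypothetical, the "scoring" step is a simple rule standing in for a traditional ML model, and `language_model_step` is a stub where a small language model call would go.

```python
import re
from typing import Callable, List, Optional, Tuple

CARD_PATTERN = re.compile(r"\d{16}")


def regex_step(txn: dict) -> Optional[str]:
    # Malformed card numbers are rejected without any model call.
    if not CARD_PATTERN.fullmatch(txn["card"]):
        return "reject"
    return None  # cannot decide, pass to the next step


def score_step(txn: dict) -> Optional[str]:
    # Stand-in for a traditional ML model: a simple amount threshold.
    if txn["amount"] < 100:
        return "approve"
    if txn["amount"] > 10_000:
        return "reject"
    return None  # ambiguous range, pass to the next step


def language_model_step(txn: dict) -> Optional[str]:
    # Placeholder for a small language model call; always flags for review here.
    return "review"


PIPELINE: List[Callable[[dict], Optional[str]]] = [
    regex_step,
    score_step,
    language_model_step,
]


def decide(txn: dict) -> Tuple[str, str]:
    """Return (decision, name of the step that made it)."""
    for step in PIPELINE:
        decision = step(txn)
        if decision is not None:
            return decision, step.__name__
    return "review", "default"


print(decide({"card": "1234", "amount": 50}))
print(decide({"card": "4" * 16, "amount": 50}))
print(decide({"card": "4" * 16, "amount": 500}))
```

Because the regex and the threshold rule handle the clear-cut cases, the slower model step is reached only for the ambiguous middle, which is where the latency budget is spent.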

Raj Rikhy [00:45:40]: Right.

Demetrios [00:45:40]: And so whatever you find as the most optimized and also then going to that, because I know people when they go into production, one of the very difficult parts of, again, walking that tightrope of the trade offs is, yeah, we can get it fast, but we can't get it reliable. Yeah, once you start trying to shave off time, then you start seeing the accuracy go down. And that's a classic problem that is as old as machine learning. If we're going back to these, the fifties, as you mentioned, since agents have been around, that's kind of been the name of the game.

Raj Rikhy [00:46:17]: Yep. And I think, look, here's the deal. You're not alone in this. Like, it's human nature to confront something that is new and make a justification why that new thing will fail, and whether that is intentional or unintentional, whether that is justified or not justified, I think it really behooves us to kind of see past those initial objections and figure out, A, is the value worth it? And quite frankly, am I just saying it's not going to work because I don't really know how it would work?

Demetrios [00:46:56]: Or am I just saying on the braid?

Raj Rikhy [00:47:00]: Yeah, or I just want to discount the potential. It's okay. Nobody's going to fault you for calling out what are, quite frankly, the main impediments to rolling this thing out at, you know, mega cluster scale. Right. Nobody's, you know, if you go to somebody on the street and they're like, you know, why don't I have ChatGPT everywhere I look? Well, yeah, I mean, it's a problem. I mean, there's other stuff. There's work you got to do. But the truth is that, you know, all of.

Raj Rikhy [00:47:39]: I wouldn't say all, like, there are an incredible, there is an incredible concentration of minds that is focused in trying to figure out alternatives to these objections. And just because they're working on it doesn't mean they're going to solve it, but it also doesn't mean that you can outright cast it out.

Demetrios [00:48:04]: Yeah, that put a very not scary but fun picture in my head of, yeah, like these brilliant folks with their candlelight just working on figuring out how to make this, how to get this working, and it's like this army of smart people, and then me, the naysayer, being like, nah, that shit ain't gonna work.

Raj Rikhy [00:48:27]: Yeah, right.

Demetrios [00:48:28]: It's like, dude, you know how many smart people are working on this problem?

Raj Rikhy [00:48:31]: Yeah, but it's, it's fine. It's fine. You know, like, we've seen this happen everywhere. You know, people said the same, people said the same thing about. About the web, if I'm being totally honest. Look where we are today.

Demetrios [00:48:48]: Yep.

Raj Rikhy [00:48:50]: Yeah.

Demetrios [00:48:50]: We're having a call. You're halfway across the world, and we're having a call, video call, nonetheless, on the web. Right in the browser.

Raj Rikhy [00:48:59]: In the browser.
