MLOps Community
+00:00 GMT
Sign in or Join the community to continue

Agents in Action: Real-World Applications // Zach Wallace // Agent Hour

Posted Jan 24, 2025 | Views 133
# Agents
# real world
# AI agents in production
Share
speaker
avatar
Zach Wallace
Engineering Manager @ Nearpod Inc

Software Engineer with 10 years of experience. Started my career as an Application Engineer, but I have transformed into a Platform Engineer. As a Platform Engineer, I have handled the problems described below

  • Localization across 6-7 different languages
  • Building a custom local environment tool for our engineers
  • Building a Data Platform
  • Building standards and interfaces for Agentic AI within ed-tech.
+ Read More
SUMMARY

Agents are transforming how we approach problem-solving, automation, and user interaction. In this talk, I will explore the practical applications of agents, focusing on how they can deliver value. We'll discuss when agents are the right tool for the job, scenarios where they are not the right tool for the job, and strategies for deploying them to production with confidence and reliability. Whether you're new to agents or looking to refine your approach, this session offers actionable insights grounded in real-world experience.

+ Read More
TRANSCRIPT

Demetrios [00:00:00]: Zach, I'll hand it off to you to tell us about some real world use cases that you've seen. These are always fun. I've actually been steeped in real world use cases this week, so it'll be good to exchange notes. What do you got for us, man?

Zach Wallace [00:00:15]: Nice. Yeah. So first, thank you so much for this opportunity. I'm excited to share and honestly excited to hear thoughts of others, especially others with any experience. It was amazing to hear your talk Nirodya.

Zach Wallace [00:00:29]: I probably said that wrong. I'm sorry, please correct me afterwards. But yeah, I'm gonna dive in. And just as a heads up, I am a huge proponent of eating your own dog food. So I am going to share a presentation that is typically used by. Actually, let me get the link. Sorry, I copied the wrong URL. This is typically used by teachers to provide lessons in real time to students in throughout 75% of US schools and districts.

Zach Wallace [00:00:59]: This is also the product that I'm building. So I'm going to. I'm going to use this as a presentation tool to really talk about real world agents. We are exploring the practical applications of agents. I'm going to dive briefly into that. There's so much to talk about. I'm hoping we could hit on more during the roundtable, but I want to make sure we had everyone in alignment with kind of what I'm talking about, where we are and so some of the objectives that I'm going to hit. We're going to talk about agents in their applications, some examples, when to use them, when to not use them, evaluating and some future trends that I'm seeing.

Zach Wallace [00:01:33]: This is much more high level, just kind of giving an insight from engineering aha moments, if you will, and also business aha moments. I'm in an interesting point in my career where I'm sort of making a transition to a manager role. So I wanted to give and reflect both sides of this. So with this, you know, what are agents and how are they used in production? While we're going through this, a brief concept of agents we have purpose oriented and independent reasoning and adaptive. This is exactly what you just heard in some of the talks there were. It was a little more technical saying, like showing you exactly how we think about the data transfer, how we think about the different areas. Again, this is a little more high level, maybe taking, you know, not trying to, not trying to differentiate too much but just, just trying to say there's a lot of agreement from the talk I just heard and this definition. Now there's an interesting Space coming up in this field where integrating with an LLM and calling in an agent is not always the right term per se.

Zach Wallace [00:02:35]: To a lot of the folks who are writing the prompts and building out the Python code for an agent, we're now seeing what's being introduced as a workflow. So these are much more focused, they are generally task oriented, they're predefined logic paths. So when you think about an agent, you think about agency, right? When you think about a workflow, you think about, okay, this is a deterministic set of logic that I want you to go through. So for instance, we just did something at nearpod where we can actually generate questions for teachers that are using our platform and the teachers will enter in a prompt. We're going to generate those questions. What we're asking the LLM to do is really just generate the question. All that matters is that it's generating the question. We're going to let it build that content, but we're not going to let it define things like, should we check to see if this is safe content? Are they trying to hack us? Do we want to try to create a different slide in a different area? That's not what we're allowing it to ask.

Zach Wallace [00:03:41]: We just want it to be a predefined flow where you enter something in and we'll generate a question. You, you'll see these workflows actually predominantly used and discussed as agents throughout. But where, and I'm going to go back a slide here, but where it's really important to understand is that agents have so much many more possibilities. They have independent reasoning, they're adaptive, you can think about feedback loops. When you think about. We'll talk about a blog editor in a second. I'll go a little bit more into this. So that's the next slide.

Zach Wallace [00:04:14]: Nice. So for a blog editor, this is a real world example that I built in my free time because one, I'm a nerd and two, it's super interesting and three, I need to completely separate what I'm actually doing at work from what I think is super interesting. And this was how I built it. So what comes of this is that my main task, the main purpose of this agent is to edit and review a blog, but there's subtle workflows that it has to do throughout this, where it may need to review blog posts to any content guidelines, make sure that anything you're posting is relevant to the content guidelines, where this has agency in saying this. And again, agent agency. There's sort of A play on words there, but it's super helpful because that independent reasoning aspect of agency. And so as we're going through this, you look at these blog posts and the main orchestration agent can determine if we even need to call this workflow. Do we need to use any of these workflows for generate recommendations? Do we need to generate recommendations? Was there anything wrong with this? We're letting that orchestration aspect of the agent determine whether or not to do this.

Zach Wallace [00:05:25]: Should we check for plagiarism? Arguably you should always do that, right? Maybe there's some places even within your independently reasoning agent that you actually do want to add some determinism, right? Maybe we always want to check for plagiarism. Maybe it's Reddit and you don't care if anyone plagiarizes anything, right? So there's just a lot of different areas here where you have to think through, okay, what are we trying to do? What is the purpose of the agent? Do we need any independent reasoning or agency in this? Or can it be a workflow? And what I'll say is that you can now compound these. You can now say, okay, I want to have two agents. And exactly as we just heard, you can have multi agent layers where they're having independent agents reasoning at each layer. I thought that was a brilliant way of stating that. And as you're going through these layers, if you flip it almost to a tree structure, right? So you think about it. In a tree structure, you can now have agents at the root level who are calling either workflows or agents or both. And then those agents will continue to have child nodes potentially or no nodes at all.

Zach Wallace [00:06:32]: Because again, independent reasoning but the workflows, that's the end, right? That's the leaf node of this tree. They don't, they won't go down any further because they, they may, you know, I'm, I'm keeping it very broad here, but like at the end of the day, that's the whole purpose of defining workflows versus agents. What I found, for someone who's just trying to enter into this space, focus on generating workflows. Workflows are much easier to analyze. Now with that, what are some of the realizations that I had going through this? Proof of concept is so quick. Refining your agent is very tricky and releasing to prod is extremely scary. What do I mean by this? Well, proof of concept to the development lifecycle of something going from, you know, conceptually in your head to a proof of concept of saying, hey, this is working, right? Take for instance, this blog editor. It took me probably a couple of days, three days maybe max, to have an entire LLM working, going through a blog that I wrote, providing edits and potential things I should do related to.

Zach Wallace [00:07:36]: Does my blog flow well? Do I have the right audience that I'm trying to hit? Do I have, you know, what tag should I use? Well, for SEO purposes, am I plagiarizing anything or could it be deemed plagiarism? So many different things. It's checking for grammar. It took three days to build. It takes about an hour or so for me to build each workflow and maybe let's say five hours to build an actual agent. Right. When you put that in terms of what's possible, think about how long that would have taken just to write a blog editor without LLMs, without AI. That would have taken a long time. You have to go through all of this data.

Zach Wallace [00:08:11]: You have to, you know, write out serially all of the different grammar fluctuations in different languages, etc. This just doesn't. Right. However, please remember this just does. It means it's doing about 70 to 80% accuracy. Getting it to 90% accuracy is extremely difficult. And why is that? Right? Well, one prompt engineering is sort of new. That's part of this trick.

Zach Wallace [00:08:38]: Also there are feedback loops you have to consider and things you have to try to implement. And that's tricky to reason with. It's tricky to realize in these new paradigms how it's going to work. But what really takes the longest time is integrating with SMEs. Do you have someone you can go to and say, hey, please tell me why this, this AI is, you know, only working 70%, 80%, which grammar mistakes is it commonly making? Which one should I, should I put in there? Right. There's a lot there. Talking about releasing to prod, that's extremely scary. Your risk just shot up through the roof because you're allowing a non deterministic LLM to now generate content for you, right? In this, in this proof of concept, what I was doing, I'm sorry I don't have an image of it here, but what I was getting it to do is just generate recommendations that text.

Zach Wallace [00:09:31]: I have no idea what it's gonna, what it's gonna say. Could it be psychologically harmful? Could I be sued for liability in that area? Those are things you need to think about from this, this angle. Some of the complexity though, and these are really important to think about. Observability patterns, when you're talking about workflows, the observability is pretty standard and easy to be completely honest. You, you know, just with new relic or datadogs or something like that, you can, you can identify patterns pretty quickly when we talk about it and one of the best things of the talk prior was the state data management and I loved how we talked about graphs can be used to connect the data across layers. That's one of the key areas that's extremely complex when you're using multi agent frameworks to solve some given problem. And more to come on that in hopefully a couple months from me as we're about to release something pretty cool. I can't.

Zach Wallace [00:10:21]: But yeah, so we'll have, we'll have a lot more, more insight there soon. But state data management is extremely complex. You have, you can have multiple agents running concurrently, potentially generating data and then how do you integrate that data? What do you do with that data? Can you stream that data via, you know, via API streams and whatnot? And then additionally, what are the feedback loops? How do you understand which, like which responses are going to be useful for you from the user perspective? How do you integrate SMEs to make sure that they're able to review this check over the complexity of the agent. I mean we're essentially, and I spoke here a little bit with Demetrius, but we're generating three year old agents, right? Three year old consultants, these three year olds. I have a three year old myself and she believes that she is the smartest person in the world and I love her for it. But sometimes it goes off on tangents, right? Sometimes an agent can go off tangents, sometimes a three year old can go off on tangents. Bringing that back and saying okay, now let's take this agent and consultant through elementary school, through middle school or however your education system works there you're starting to refine, right? You're starting to use feedback loops to generate and refine the thinking models and thinking capacity of this agent along, along the side, you know, behind the hood, just as a three year old grows up there, you know, their brain gets larger. And not a biologist but you know all of the different traditional growth that you would see in both brain function but also in the body like things are growing, you have more capacity for learning as under the hood Nvidia and AMD are all sparking out with a ton of new LLM capable devices, processors, you're going to see our capacity for learning is going to continue to grow in agents.

Zach Wallace [00:12:06]: And that's really interesting because you really are trying to start with this three year old and you're learning as the agent has more capacity to learn. So it's super interesting, but very complex. What are the business realizations? I briefly spoke, you know, we'd have some, some sort of business insight here, hopefully. Well, bottlenecks are shifting. If it takes me three days to build out 11 different workflow slash agents. What's possible for you to build out in three days? Right. What's possible for you to build out in two months? In the past, whenever we're working on large projects and you know, I've been an Engineer for about 10 plus years, you, you'll start to see that the engineering bottleneck is shifting so longer. The engineers that are, that are sitting there trying to put two wires together, connect the dots between two different code paradigms, now all of a sudden it just kind of works.

Zach Wallace [00:12:55]: 70 to 80%. How do we get it up to that? 90%, 95, 98%. Well, we're going to need to bring in SMEs. We're going to need to start bringing in other departments and have them provide feedback to enable us to really build out a consultant and an SME. Maybe that's an external consultant agency that helps with this. Right. There's a lot to it. And that's sort of the interesting paradigm shift that I've seen.

Zach Wallace [00:13:20]: Possibilities are endless. When we start bringing this up, I've gone around, I've had to, you know, provide workshops to multiple departments to get them on board with what's happening with the business. Dynamic shifts. The possibilities are endless. So you'll start to see product is coming in with these features that they never thought were possible. Right. There's so many more possibilities now that we can think dynamically in some regard rather than statically. But risk is also increased.

Zach Wallace [00:13:48]: And that's something that's absolutely important to highlight. As you're releasing this, you're going to have to start thinking about, maybe it's legal guidelines on what you're allowed to say and when you're allowed to say it. If you're an international corporation like our company is, you have to be careful about the cultural dynamics across many different nations. What are we allowed to say and when are we allowed to generate this sort of content? When are we not? How do we know? Right. Those are the sort of questions where there's so much risk to this because it's not. It's coming as our brand, but it is not necessarily our content that we're generating. It is content generated by an LLM. And that's a weird Paradigm shift beyond that business complexity, collaboration between SMEs kind of covered that.

Zach Wallace [00:14:37]: Sharing new paradigms across departments. I told you, one of the things I've been doing is I've been hosting workshops with entire departments across our organization about, hey, here's what's possible, here's how we're thinking about this. How would we apply this to the problems you're seeing? How can we provide you with the ability to upskill your area of expertise? And when should we use an agent? Right? That gets into a fun question. When do we want to avoid agents? Right? Well, what we found, especially in that multi layer approach, right? And again, using a graph or some sort of state management really will help solve this problem. But we don't want to use orchestration agents or some sort of agent to integrate data. It is right about 90% of the time, maybe it's 60% of the time if you're using that data in some sort of JSON schema that goes through directly into, you know, your website or something like that, you have to rely that the data is going to be where you think it is going to be. So you need to make sure that any secondary agent is not integrating data between and across other agents or manipulating data. Depth of accuracy is the other area when you start thinking about depth of accuracy.

Zach Wallace [00:15:50]: What I really mean by this is we can be accurate to, to a point, right? In this blog editing example, what I did was I was able to highlight specific words in a real text editor and provide, you know, whatever recommendation for the blog editing on the side. But what I found out is that the LLM was really poor and maybe it was the model that we're using, but regardless, it was really poor at getting precision or sorry, getting accuracy on where we were going to add that in. It was actually a software engineering problem, right? That's a super simple software engineering problem that's been solved 8 trillion times. All you have to do is take the recommendation and actually apply it where you need to apply it. That was one of the real interesting areas of when to avoid agent is that depth of accuracy, right? Are you able, do you need it to be precise on one index in an entire paragraph or do you need it to say, hey, in this paragraph do something? It's really good at, you know, specifying which paragraph. It's not really good at specifying which index within that paragraph. So depth of accuracy is interesting. When do you want to use the agents? Right? Well, well defined tasks are really useful for doing this.

Zach Wallace [00:17:00]: Generating content is another area. Analyzing Content and I'm going a little faster here because I want to make sure I stay on time. It does end in about two minutes. So let me go through. So just, you know, the typical LLM integrate integrations, well defined tasks are super important. It's good at ocr, you know, for analyzation. It's good at analyzing trends and providing content. It's really good at generating content.

Zach Wallace [00:17:27]: Evals are the heart of everything. Deterministic evals and non deterministic evals. So try to keep this short, but it's a super interesting conversation. Feel free to reach out if you, if you want to talk about it. We have two different types of evals. For deterministic evals. We're able to essentially have some sort of binary response from our agent that is telling us, you know, yes or no. We'll say it that way and when we provide it some sort of prompt, it answers in yes or no.

Zach Wallace [00:17:54]: And we're able to reason that, you know, this prompt is giving us the right answer. Let's for instance, I'll say input validation agent. Right. So we have this agent that's validating all of our input, making sure we're not getting hacked, making, making sure that, you know, we're not susceptible to psychological harm. We're not, it's not being racist or anything like that, you know, for this sensitive topic. So when, when that happens, we have to pass these prompts to that agent and we're expecting a yes or no answer. Is this safe or is this not safe? Right. When that happens, we're able to deterministically evaluate that.

Zach Wallace [00:18:30]: However, in a lot of cases when you're generating content, you actually need to assess the quality of the agent response. So you know your agent, you sent a user prompt to this agent, agent response. We actually then take that and assign our own criteria as the prompt with the output of the first one as the text to evaluate based on the criteria. And we send that to another model and the model will go in and it will evaluate whether or not the prompt that was returned from the first LLM has enough quality for us to actually release this to production. And that's all about building confidence, future trends. And I'm right at times I'll go super quick. Organizational dynamic shifts. Talked a little bit about that state management across multi age agent modals.

Zach Wallace [00:19:15]: Again that was really covered with the graph analysis earlier in the different layers. I thought that was a really amazing insight. And then additionally summary key points. I'll go just kind of flip through here and thank you. Thank you, everybody. I know this is time out of your day, so I just want to say thank you. I hope you enjoyed this talk. Feel free to reach out to me.

Zach Wallace [00:19:35]: I'm more than happy to talk about this to the depths of the world. So let me know. And. And I appreciate.

+ Read More
Sign in or Join the community

Create an account

Change email
e.g. https://www.linkedin.com/in/xxx or https://xx.linkedin.com/in/xxx
I agree to MLOps Community’s Code of Conduct and Privacy Policy.

Watch More

Real World AI Agent Stories
Posted Jan 14, 2025 | Views 322
# AI Agents
# LLMs
# Nearpod Inc
AI Agents as Neuro-Symbolic Systems // Nirodya Pussadeniya // Agent Hour
Posted Jan 24, 2025 | Views 494
# Agents
# neuro
# symbolic
# neuro-symbolic systems
Lessons From Building Replit Agent // James Austin // Agents in Production
Posted Nov 26, 2024 | Views 1.3K
# Replit Agent
# Repls
# Replit