Building Better Agents, Fast: For Accounting
SPEAKER

Data Scientist with extensive experience driving high-impact innovation through analytics of large datasets and computational modelling of complex systems.
I designed and clinically validated multiple models and algorithms for various biomedical devices and systems, spanning the enteric nervous system (inflammatory bowel disease), vision science (retinal prosthesis), and epilepsy (EEG).
I enjoy interdisciplinary and collaborative environments with experience engaging diverse stakeholders across research institutes, hospitals, universities, and industry. My work has resulted in multiple publications, awards, and a patent.
SUMMARY
Building better agents fast: real stories, lean workflows, and practical tips for building trustworthy, human-friendly agents in accounting and beyond.
TRANSCRIPT
Charlie [00:00:04]: The MLOps Community has chapters in major cities globally, and that includes Melbourne. We're a group that connects ML practitioners in this continuously evolving field. This is a community open to learning together, sharing knowledge and, of course, asking questions. So we're excited, and we love to have you here on board. Tonight's meetup is actually a very special one. It's part of a global MLOps Community series on a very timely and relevant hot topic: agents. How to build them, how to deploy them reliably. Tonight we want to demystify AI agents, and hopefully that will help us engineers take a real step towards deploying them to production.
Charlie [00:01:06]: I hope you're excited. I'm very excited. I'm actually working on agents myself right now, so I'm really keen and excited to learn from our speakers tonight. That being said, before we start with the talks: if you need the restroom facilities at any point tonight, they're just across the hall. Go out the door and turn to your right; both the male and female restrooms are in there. Also, we still have more pizzas later, I believe, so if you haven't eaten yet.
Charlie [00:01:53]: We'll resume food and drinks after the talks, along with networking. And lastly, to respect our speakers, just a gentle reminder to keep your phones on silent. If you have to answer a call, please take it outside. For our first talk, we are honored to have Elma O'Sullivan-Greene. Elma is a principal machine learning engineer at MYOB. She specializes in driving high-impact innovation and has designed and clinically validated multiple models for biomedical devices, with applications in vision science, epilepsy and inflammatory bowel disease. Her work, which has resulted in multiple publications, awards and a patent, highlights her passion for interdisciplinary collaboration.
Charlie [00:02:53]: Elma's topic tonight will be about agents and how to build them effectively and efficiently. She will share some practical approaches for developing agents in production. So please join me in giving Elma a warm welcome to the stage.
Elma [00:03:12]: Thank you. Thank you. There we go. Thank you for the lovely introduction, Charlotte. So I'm going to talk about building better agents, fast, for accounting and for regulated industries. The talk is going to be nuggets of wisdom, or just tales, from our very early learning days of doing agent stuff. So I'm Elma, principal machine learning engineer at MYOB, who make accounting software. So let's start by mentioning the elephant.
Elma [00:03:56]: So, agents, agents everywhere. Everything will be autonomous AI agents seems to be the message in every feed, everywhere. And so there's quite a hype machine, and I have some quotes. I have "everything that moves will be autonomous" from Jensen Huang of Nvidia. "Tomorrow you'll spin up organizations of operators for long-running tasks that will basically run a whole company," says Andrej Karpathy, of OpenAI and Tesla.
Elma [00:04:34]: So it feels like there's this giant, giant elephant. And so I hope in this talk to answer: how do we take the first few bites out of this giant, elephant-sized thing that is agentic? And then, for balance, I also like these quotes. So there's Fei-Fei Li from Stanford: AI agents will transform the way we interact with technology, making it more natural and intuitive. That sounds very nice. And Melanie Perkins from Canva: AI should feel like a natural part of your creative process, not a separate system.
Elma [00:05:12]: And when it's thoughtfully integrated, it'll empower deeper, more focused work. So those are really human-centered AI experience quotes, which I think are a nice balance to the giant elephant. So in this talk it'll be: how do we take the first bites out of the enormous elephant-sized thing that is agentic, and how do we keep human experience at the core of that? So I'm going to heavily plug that the user experience is 50% of the whole value proposition of agentic AI. So I think, unless you've been under a rock, you've heard that 2025 is going to be the year of the agent. We're halfway through, beyond halfway through. It's all very exciting and a lot of fun. And then last month we had Andrej Karpathy share a talk where he called it the decade of the agent, which I kind of liked because, you know, we can pace ourselves, give ourselves a bit of permission to breathe. We can do this over the course of a decade.
Elma [00:06:16]: And I've got a story to go with that from one of our executives at MYOB, Dean Chadwick, who told this great story about his first graduate interviews in the 1990s. And there was this question: can you do the Internet? Which in 2025 seems like a really crazy question. It's more, you know, can you do digital marketing, e-commerce, content creation? Those wouldn't have been a thing back then. It's about what you can do with it, not whether you can do it. And so, in the vein of the decade of the agent, it's less "can you do agentic AI" or "can you do AI agents" now. It's: are you doing agents? Everyone is doing agents. And so hopefully we will switch to talking more about the products we're building for people that are powered by agentic AI, and maybe we won't even use the word agents for AI agents in a few years' time. Okay, so the core of this talk is little bits of information, or things that might be interesting, that we learned while still starting on this agentic AI journey. So a little bit of light bulbs, maybe some glue to get me started, maybe this is a stretch, maybe some owl for wisdom. But just for fun, I'm going to connect all this new stuff to some really old stuff and bring some transfer learnings from some really core software engineering concepts that are hopefully familiar: things like functional programming, test-driven development and just some basic stats.
Elma [00:08:01]: Because it's nice sometimes, with all of the new and exciting, to link it back to: oh yeah, we do know some things. All right. But first, a shared understanding of words. It seems like every day you go on LinkedIn or any feed or newsletter, there are new words and new definitions for things. So I thought I would do a little bit of signposting for some of the things I'm saying, for the context of this talk. So the first one is which flavor of agentic we're picking, like it's an ice cream: agentic AI versus AI agents.
Elma [00:08:43]: So they both have a sort of agency in the word; it's AI which has the agency to act or do something. But then how do we categorize it? Think of it as a spectrum of things that are happening, and look at it on a few axes. So the first is autonomy. On one side we've got AI agent, and that's got a minus, and on the other side we've got agentic AI, and that's got a plus. So one is less and one is more.
Elma [00:09:11]: So for autonomy, AI agent is a little lower: low-level autonomy. The autonomy is there, but it's controlled a bit more by some sort of programmed scope. For example, from an accounting world: maybe we process all of these documents with this particular set of tools. You've got some autonomy, but not a lot. And then on the other side of the spectrum we've got agentic AI, which is more proactive. So maybe it's detecting and responding to new anomalies or threats without being explicitly instructed. So that was autonomy.
Elma [00:09:49]: And then we can also look at, say, task complexity. On the minus side, the agents often handle more repetitive tasks with more predictable outcomes. For example: run OCR to get the text from my documents and try to reconcile them to the accounting books. And then on the plus side, we've got agentic AI, where it's a more multi-step, complex process: do all my accounting with minimal human intervention, and adjust to new tax law while you're at it, off you go by yourself. And so there might be an undercurrent that minus is lower and plus is better, but I'm going to position that they're really not, they're just different for different things. Because if we bring up system complexity, it looks a little different: lower system complexity, for all of our engineering brains, would be, oh yes, I want less complexity please, not more. So on system complexity, AI agents could be a component within a larger system, whereas agentic AI is more the umbrella system that coordinates many AI agents.
Elma [00:11:11]: And so I really don't want to give the impression that the plus is better in any way. They're just suited to different things, and it's not a merit system. It's not like the AI agent, by not being an enormous agentic system, falls short of some ideal. Ultimately we're trying to solve customer problems, and to bring it back to Melanie Perkins: we want to transform the way we interact with technology, make it more natural and intuitive and useful for people. It doesn't quite matter which flavor of agentic we're doing to get there. So, some more signposting on words.
Elma [00:11:49]: Architecture complexity. It seems like I read about a new agentic architecture daily, so it's all very new; I don't think it's settled to any steady state. For this slide I've largely gone with some images Anthropic published about six months ago, which is already old but still, you know, valid. So you've got your AI agent, which is an augmented LLM, a glorified LLM that's got maybe some data retrieval, some tools and some memory attached. And then there are other workflow-type things where, say, an LLM generates something and something else evaluates it, and you've got that feedback loop that's starting to be a little more autonomous. This is an example of Anthropic's prompt chaining workflow, and the thing of interest here is the gates: you've got a linear process going from A to B to C, and you have gates in between the steps.
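To make the gated, linear shape of that prompt-chaining workflow concrete, here is a minimal Python sketch under stated assumptions: `call_llm` is a stand-in for whichever model client you use, and the step prompts and gate rules are illustrative, not the pipeline described in the talk.

```python
# A minimal sketch of a prompt-chaining workflow with gates between steps.
# `call_llm` is a placeholder for whatever model client you use; the step prompts
# and gate rules here are illustrative, not the pipeline described in the talk.

def call_llm(prompt: str) -> str:
    """Placeholder: send a prompt to your LLM of choice and return its text output."""
    raise NotImplementedError

def gate(output: str, must_contain: str) -> str:
    """A simple programmatic gate: fail fast rather than passing bad output downstream."""
    if must_contain not in output.lower():
        raise ValueError(f"Gate failed: expected '{must_contain}' in step output")
    return output

def chained_workflow(document_text: str) -> str:
    # Step A: extract fields from the document.
    extracted = call_llm(f"Extract the supplier, date and total from:\n{document_text}")
    extracted = gate(extracted, "total")          # gate before moving on

    # Step B: map the extraction to an accounting category.
    categorised = call_llm(f"Suggest a ledger category for this transaction:\n{extracted}")
    categorised = gate(categorised, "category")   # gate again

    # Step C: draft a summary for a human reviewer.
    return call_llm(f"Summarise this for a reviewer:\n{categorised}")
```

The point of the sketch is only that each step's output passes a programmatic check before the next LLM call ever sees it.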
Elma [00:12:54]: And then you move to an autonomous agent, which can keep going until it decides itself that it wants to stop. And then, well, Anthropic didn't have this one; I've just extrapolated out another six months. Most of the rest of the diagrams are some form of fleets of autonomous agents: lots of agents together, maybe with some sort of orchestrator or synthesizer around them, but it's just more, more complex. So now let's talk about memory in terms of complexity.
Elma [00:13:29]: One of the little blue boxes there. So I call this slide "agentic AI context overload". Agentic systems are highly stateful: they like to remember lots of things and lots of context. There are stateful webs of prompts and tools and tool descriptions and agents and execution history and execution logic, and it's just a lot. And Anthropic had some numbers that matched what we were finding in our work at MYOB too: if you consider the number of tokens it takes for a chat interaction with an LLM, doing the same thing with an agent took four times as many tokens, and doing the same thing with a multi-agent system took 15 times as many tokens as chat.
Elma [00:14:20]: So that's a lot of chatting. And Anthropic, in one of their blogs, published a note saying that in their multi-agent systems the agents seemed to be distracting each other with lots of excessive updates. You don't want to personify these things, but that also rang true: we were looking at context flying everywhere and going, there's so much, so much stuff. And so, yeah, that context clogs up context windows pretty quickly. And then let's talk about MCP, the Model Context Protocol: the glue, the connectors, the new API-like standard that makes it really easy to connect lots of things to your agent systems.
Elma [00:15:06]: This is a quote from an old colleague who now works at AWS. I want you to think about your internal company's API; I don't think any company has a beautifully clean API system that's just ready to be connected to things with MCP. Tim's comment was: maybe it's not the best idea to expose the current firehose that is your company's API to agents. That probably rings true in terms of context overload, and maybe we need to be quite intentional about which parts we connect and when. So to add some color, I got ChatGPT to make me a picture of a context-overloaded agent. And I just want to highlight something on it, because ultimately these things are powered by LLMs, which are predicting the next likely token. So there's a fun thing: I'm talking about agents, but it's gone and written "an", because apparently that was the more likely bit of common English to produce.
Elma [00:16:08]: And I think that grounds us back in the fact that underneath these things are LLMs. So particularly for regulated industries like accounting, and anywhere with money, making it up isn't really an option. So how do we keep agents on a leash? That phrase is also Andrej Karpathy's, and it rings through this whole idea of how you keep these things on a leash. I'm guessing a lot of people in the room are playing with coding assistants, and sometimes they're great, and sometimes they're like, I just need you to stop, you're running away and doing too much. How do you keep them on a leash? Karpathy's comment on agents on a leash was particularly in the context of keeping your AI coding agents on a leash, but I think it applies to building customer products as well. So we talked about injecting every tool and its full description and the detailed history. All of this clogs up prompts and creates that bloated context, as well as being expensive; it's a lot of tokens you're paying for.
Elma [00:17:10]: It also bloats the context, so clarity is diluted, the key information is harder to find, and it slows things down. It's slow for the agents to work their way through this giant sea of text and come up with something useful, and preferably the right kind of useful. So what we found to get started with building this, and to keep it lean, is that workflows work, and they work nicely. Workflows have an advantage in that you can set them up as a chain of functional agents and save your results at gate points in between the functions, before passing them to the next agent. So you can take the bloated context and go: well, this is the key stuff I need; I don't need to remember all the ways you decided to get here. That's the key stuff, let's move forward.
Elma [00:18:02]: And I kind of like this because there are echoes of functional programming patterns in there, which is handy. Things like defined inputs and outputs; it's modular, it's testable. It also links to functional core, imperative shell, in the sense that it's contained and you're limiting the damage, or the imperative actions, that a particular agent can do, which is nice and good. And they're stateless and idempotent, so you can reuse them in a more reliable way. So I have an example, and it's lots of text on the slide; you're not supposed to read that, it's just ChatGPT-generated inter-agent chat coming to a decision. But hypothetically, what if this were banking? So this is a banking example: there's a bank transaction, and a banking system wants to work out if it's suspicious or not.
Elma [00:19:00]: So there's a transfer of 12 grand to someone, and that goes into a functional agent, which has sub-agent components, that must decide whether or not this is suspicious. It does lots of reasoning and chatting and reasoning, but ultimately it says: it is suspicious, is_suspicious is true, and the reason is an unusually large amount, for example. Maybe that's all the info we need, and we can sink that to a data store and drop all the rest of the stuff that's clogging up our agents before we move to the next functional agent. And one of those nice idempotent reuse things: let's say the next thing you do is go and ask a human to confirm in some way, and the human confirms there's a reason, "I bought a car". You can run it through the same functional code: now that you know what was transferred, how much, and to whom, and you have a human explanation, do you still think it's suspicious or not? Which is a very old thing, functional programming, but perhaps useful for the new.
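A hedged sketch of that functional-agent idea, with hypothetical names (`Transaction`, `SuspicionResult`, `run_suspicion_agent`): defined inputs and outputs, only the key decision persisted at the gate point, and an idempotent re-run once a human adds an explanation.

```python
# A hedged sketch of the functional-agent idea from the banking example. The names
# (Transaction, SuspicionResult, run_suspicion_agent) are hypothetical; the agent's
# internal reasoning is stubbed out so that only the key decision crosses the gate.

from dataclasses import dataclass, asdict
from typing import Optional
import json

@dataclass(frozen=True)
class Transaction:
    amount: float
    payee: str
    description: str

@dataclass(frozen=True)
class SuspicionResult:
    is_suspicious: bool   # the only state kept at the gate point
    reason: str

def run_suspicion_agent(txn: Transaction, human_explanation: Optional[str] = None) -> SuspicionResult:
    """Stub for the sub-agent reasoning; in practice this wraps your LLM calls.
    All the inter-agent chatter stays inside this function and is dropped afterwards."""
    if human_explanation is not None:
        # Idempotent re-run with the extra context a human provided.
        return SuspicionResult(False, f"Explained by human: {human_explanation}")
    if txn.amount > 10_000:
        return SuspicionResult(True, "Unusually large amount")
    return SuspicionResult(False, "Within normal range")

def save_gate_point(result: SuspicionResult) -> None:
    """Persist only the key decision, not the blow-by-blow agent chat."""
    print(json.dumps(asdict(result)))   # stand-in for a real data store write

txn = Transaction(amount=12_000, payee="someone", description="transfer")
save_gate_point(run_suspicion_agent(txn))                                      # suspicious: large amount
save_gate_point(run_suspicion_agent(txn, human_explanation="I bought a car"))  # no longer suspicious
```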
Elma [00:20:12]: And it stops the agents, I suppose, distracting each other with their excessive updates, which is nice. So the second one is eval, eval, eval; I can't say it enough. You need to evaluate this stuff while you're developing it. The agentic promise is that it's tackling really nebulous stuff that needs judgment. And I came from NLP and machine learning and fine-tuning models, and I had the perception that this was going to be hard to evaluate: nebulous problems, we're going to need tons of data.
Elma [00:20:53]: And so I read an Anthropic article that kind of unblocked that. It said that for evaluating their multi-agent system, they started evaluating immediately with small samples. So that's, yeah, okay: if you don't have huge data, you start small. But the unlock is that they said you can make meaningful progress with error-based iteration on those small samples. And I'm like, oh, I wouldn't have expected this. And that rang true for us as well; it's something that just makes it easier to get started. In their context, it was a researcher agent system that was going through academic papers and pulling out information. And this error-based iteration was actually looking at
Elma [00:21:44]: what it was doing in certain contexts and seeing patterns on small data sizes, and then going: okay, our error is so large when we start out that we can make it meaningfully smaller by looking at a few samples. Which is a really, really nice way to take a bite out of this elephant thing. And I think there are echoes of test-driven development in there, which is nice; that's a familiar thing. Solving test cases incrementally, one at a time, in manageable development steps towards better AI agent systems. So then there's the question of where you hunt for the errors. And I'm going to plug getting it in front of customers, end to end, as early as possible. End-to-end teams are fantastic with this stuff because the UX is 50% of the value proposition. And we mentioned error-based iteration on the last slide.
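As a rough illustration of error-based iteration on a small sample, the loop is simply: run a handful of labelled cases, collect the failures, read them by hand, fix the biggest pattern, and rerun. The sample invoices and the stand-in `run_agent` below are invented for the example, not MYOB's pipeline.

```python
# A rough sketch of error-based iteration on a small, hand-labelled sample.
# The sample invoices and the stand-in run_agent are invented for illustration.

import re

def run_agent(invoice_text: str) -> float:
    """Placeholder for the agent or workflow under test; a trivial regex stands in here."""
    match = re.search(r"\$([\d,]+\.\d{2})", invoice_text)
    return float(match.group(1).replace(",", "")) if match else 0.0

# A deliberately small sample is enough to start; a dozen cases, not thousands.
samples = [
    {"invoice_text": "Invoice #1 ... Total: $1,250.00", "expected_total": 1250.00},
    {"invoice_text": "Invoice #2 ... Amount due: $89.90", "expected_total": 89.90},
]

failures = []
for case in samples:
    got = run_agent(case["invoice_text"])
    if abs(got - case["expected_total"]) > 0.01:   # end-state check: did we get the dollars right?
        failures.append({"case": case, "got": got})

print(f"{len(failures)}/{len(samples)} failed")
for failure in failures:
    print(failure)   # read the failures by hand, spot the pattern, fix it, rerun
```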
Elma [00:22:36]: So this is like, well, where do you hunt for that? A good place to start is end-state evaluation, starting at the level of customers. Rather than trying to optimize one of the pieces in micro, maybe optimize the whole thing at the level of the customer, in that big macro environment. And that's convenient: I'm calling it end-state evaluation, and it's conveniently product- and customer-focused, and understandable too. So for example: how are we improving whether our AI is getting the dollar value right, that bottom-line thing? It's convenient: in regulated industries you have really concrete stuff, like money, that you can use to ground your evaluations, versus something like precision or recall or toxicity in a small part of the system. Much easier to communicate.
Elma [00:23:30]: That's not to say you don't do that; this is just what we learned getting started with this stuff. And another motivation for starting with macro, end-state evaluation is a really old thing: compounding error. Even if you had this awesome, fully agentic thing with lots of sub-agents, and each of those agents has a high accuracy of, say, 80%, you put them all together... now there's a bit of imagination in the maths here, let's just assume everything's independent, but generally, the more things with error that you put together, the more error you have. So we found success, I suppose, by focusing on end-state evaluation rather than turn-by-turn analysis, particularly at the start of an agentic journey. And another plug for agent workflows, for keeping agents on a leash: because you've got a checkpoint of something you're expecting out midway, you can put that in front of a human in the UI, get feedback, and lift the accuracy of some of the components so that the overall accuracy of the system is high.
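The compounding-error point can be made with two lines of arithmetic, under the same simplifying assumption as the talk that each step is independent and about 80% accurate:

```python
# Worked example of compounding error, assuming independent steps at ~80% accuracy each.
per_step_accuracy = 0.80
for n_steps in (1, 3, 5, 10):
    print(n_steps, round(per_step_accuracy ** n_steps, 2))
# prints: 1 0.8   3 0.51   5 0.33   10 0.11
```

Ten chained 80%-accurate steps land around 11% end-to-end, which is why checkpoints and human feedback that lift component accuracy matter so much.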
Elma [00:24:55]: So I'll land the plane. Okay, I've got one and a half minutes. So, conclusions on how you build better agents fast: starting lean, and building, shipping and learning for the decade of agents. So strap yourselves in. First, solve real customer problems, and the user experience is 50% of that value proposition, so really think about how you show it to users, how they interact with it, and evaluate at that level. Workflows work, especially for something like a highly regulated industry, because they allow a lot more control.
Elma [00:25:36]: No one type of agentic system is holier than another, and skipping straight to the most complex one is, well, complex. So if you want to get started, maybe don't start with the most complex one. We can reduce context overload in agents by storing well-modelled data at gate points; that's been really useful. And this is one I didn't say, but I'm just going to plug it at the end: not everything is an agent. Some things are better done in functional code, straight-up deterministic things. And end-state, error-based evaluation over turn-by-turn analysis allows you to go really fast from having brand new things with huge error to getting down to something that's quite nice for a human, removing some of the big error leaks.
Elma [00:26:29]: Yeah. So shout out to all the other people who were learning this with me at MYOB. Yeah, thank you so much, and over to questions.
Frank Liu [00:26:46]: Yeah, I think we'll open for questions.
Elma [00:26:51]: Oh wow.
Charlie [00:26:53]: All the questions.
Q1 [00:26:55]: Hi, thank you, that was wonderful. You were talking about how much context everything is swimming in all of the time, and you also mentioned working in regulated industries. How do you walk the line between auditability of output and not storing a bunch of agent chatter, when you know you do need, at some point, to go back and trace: okay, this is the point where it looks like that decision got made?
Elma [00:27:21]: Yeah, that's a great question, and that's something we've definitely been thinking about. I suppose it comes down to, like that example: is it suspicious, yes or no, and why? And there are only a few other things: did some human say something, and what did they say? We log that against it. Because even from an auditing point of view, looking at the logs that come from agent systems, particularly for MYOB, I'm imagining it's tax and the ATO and compliance and logging, so you almost want things that are easy for them to audit. And the blow-by-blow account of how an agent came up with something is not very auditable. It's just.
Elma [00:28:09]: Yeah. So maybe really good data modeling and design, so that you are storing the key information in ways that are useful, and letting go of the things that aren't needed.
Charlie [00:28:20]: Thank you.
Elma [00:28:23]: Hello.
Q2 [00:28:23]: There was a news article, I've forgotten exactly where from, but they let an LLM do accounting and it committed fraud. Yes. There are several cases now where lawyers have been writing, you know, filings that end up citing cases that don't exist.
Elma [00:28:46]: Yeah.
Q2 [00:28:46]: How are you handling the hallucination problem with LLMs?
Elma [00:28:51]: That's an excellent question. There was also, I don't know if it was based on the same one, but there was something recently on an attempt to do full accounting with agents, no human in the loop. And the first time it did it, it was okay; it was one of these compounding-error stories, it wasn't too bad. But then there was no human correction in any of it.
Elma [00:29:12]: So, you know, on the next year's data it got worse, and then it got worse again. So I suppose it's partly being really conscious that when it is a regulated industry, there is definitely a human in the loop in part of those systems, and there needs to be for quite some time. And there are ways, like workflows; it's all about agents on a leash, that's probably my answer. Leashing things in, and having a human in the loop in a way that's a really nice experience for that human, so it seems intuitive and helpful, but they're also leashing the agents while they're at it. So it's partly engineering design and partly user experience
Elma [00:29:58]: design, and being really conscious of what you're doing in regulated industries.
Q3 [00:30:09]: Thank you, Elma, for the talk, that was really good. I've got a two-part question. One is: when you're comparing agentic AI to AI agents, are we saying that an AI agent is closer to generative AI, or is it still in the realm of agentic AI but not as smart as agentic AI?
Elma [00:30:32]: It's augmented, I suppose: GenAI that's got access to tools and memory. Because this is evolving so much, it's very hard to pigeonhole things into specific boxes. There's definitely a spectrum where at one end, you know, no human touches it and it's going to solve every problem on the planet, and at the other end it's a scoped thing for a specific job. So on those axes, like task complexity and autonomy, my thought is that it's a spectrum and the labels are evolving. That's a very hand-wavy answer, but yeah.
Q3 [00:31:17]: Yes. The second part was around having guardrails. So when you're talking about agentic AI, especially with tax, where there's personal information and private information, how do you avoid exposing your personal data, or control what you ask it? How can you have those guardrails, especially in accounting?
Elma [00:31:40]: Yeah. So we work with cloud providers and have our own locked-down versions of things. In the first case, we are heavily informed by the, sort of, Victorian, no, I'm going to get this as VAISS, the Voluntary AI Safety Standard, the Australian one, thank you. When I say it, it sounds like a vice, like smoking or something. But yeah, that's really useful because it brings things very much to the consumer level around these things.
Elma [00:32:18]: But also, all of the existing standards are the same: all of the ISO compliance for data and privacy, those things haven't changed. Yeah.
Charlie [00:32:31]: Can I put in a plug for Elma's colleagues Nigel and Mangesh?
Elma [00:32:36]: Yeah.
Charlie [00:32:40]: They spoke recently, if you are interested in PII management.
Elma [00:32:46]: Yeah. So just in case it didn't come up on the mic: yeah, Nigel and Manish from MYOB talked at DataEngBytes on removing PII from data systems, and I think those talks are available online. Yeah. Thank you.
Q4 [00:33:10]: Thank you, Elma, for your talk. I have a question about any changes you may have made to the foundation models. As agents, they're interacting with the foundation LLMs, and as you mentioned, there are different types of agents. So just wondering, for the models, did you do any fine-tuning, or some RAG, to make them better suited to the typical agent use case? And what's the cost of that? Thank you.
Elma [00:33:41]: That's a great question: which models did we fine-tune? So, because we're in the early days of it, we've just used some of the generally available models and prompted them to our needs. But at the same time, not everything is an agent, so there are things where we've used other models because it makes more sense, some of which are more fine-tuned to specific cases. But thank you. You're welcome.
Q5 [00:34:19]: Hi Elma, thank you for the talk. I've just got a multi-part question, again related to evaluation.
Elma [00:34:25]: So.
Q5 [00:34:26]: So when you were doing your evaluations, did you find LLMs as a judge to be useful in the long run over, you know, having an expert or a human in the loop or even, you know, end user feedback? How did you kind of manage that and how well did they work for you?
Elma [00:34:39]: Yeah, they did. Yes, they did. So we have used some LLM-as-a-judge; they work better if you constrain them, like: answer yes or no to these specific things I want to know for my use case, as opposed to "tell me the vibe of it", which is a bit too open. But then there are also things we use beyond LLM-as-a-judge, like dollar values: how close is it to the dollar value? That's a really easy thing for any sort of finance-y thing.
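A minimal sketch of those two evaluation styles, assuming a placeholder `call_llm` client and illustrative prompt wording: a judge constrained to a single yes/no question, and a plain numeric check where a known dollar value gives you ground truth without any LLM.

```python
# A sketch of the two evaluation styles mentioned: a constrained yes/no LLM judge
# and a plain numeric check against a known dollar value. `call_llm` and the prompt
# wording are placeholders, not MYOB's actual evaluators.

def call_llm(prompt: str) -> str:
    raise NotImplementedError   # stand-in for your model client

def judge_yes_no(question: str, agent_output: str) -> bool:
    """Constrain the judge to one specific yes/no question instead of 'what's the vibe'."""
    prompt = (
        "Answer strictly 'yes' or 'no'.\n"
        f"Question: {question}\n"
        f"Output to judge:\n{agent_output}"
    )
    return call_llm(prompt).strip().lower().startswith("yes")

def dollar_value_correct(predicted: float, expected: float, tolerance: float = 0.01) -> bool:
    """No LLM needed: money gives you concrete ground truth to check against."""
    return abs(predicted - expected) <= tolerance
```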
Q5 [00:35:14]: And just again with the evaluation, how did you handle the stochastic nature of the model outputs? Did you do hundreds of runs with the same prompts and then just look at the average values? How did you handle that?
Elma [00:35:29]: Yeah, so actually a lot of how we handle that is at the user experience level: it's never going to be exactly the same, and there are going to be strange things, so we need to design for them occurring. It sort of makes sense that those things are going to happen; we need to design for them. And in terms of workflows and picking and choosing, agents on a leash: where are the places it can be a bit more creative, maybe in how it's labeling things, and where are the places it can't be creative, like the dollar values of money? Keep it on a tight leash there.
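For completeness, the repeated-runs check the question describes could look like the generic sketch below; this is the questioner's suggestion rather than what was described as MYOB's approach, which leans on designing the UX for variation, and `run_agent_once` is a placeholder.

```python
# Generic sketch of the repeated-runs check described in the question: run the same
# prompt N times and look at the spread of answers. `run_agent_once` is a placeholder.

from collections import Counter

def run_agent_once(prompt: str) -> str:
    raise NotImplementedError   # stand-in for a single agent or LLM run

def variability_check(prompt: str, n_runs: int = 20) -> Counter:
    """Repeat the same prompt and count distinct answers to see how stable the output is."""
    return Counter(run_agent_once(prompt) for _ in range(n_runs))

# counts = variability_check("Categorise this transaction: ...")
# print(counts.most_common())   # how often does the top answer dominate?
```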
Charlie [00:36:08]: Yeah. Thanks, Elma.
Elma [00:36:24]: Hello.
Q6 [00:36:26]: Thanks, Elma, that was an awesome talk, thank you so much. Just again on the evaluations, and you really drove that home, which is fantastic. But you also talked about how you involved the users and the domain experts, and I just wondered if there were any surprises or insights that came out of that way of working.
Elma [00:36:46]: Yeah, that's a good question: what surprises there might have been. The most surprising thing is that, on one hand, you have "everything will be autonomously doing its own stuff and we won't even have to think about it", and then there's this interim of a decade where, yes, people do need to know, and we're trying to work out how much information people actually want and how much control they might want. Even thinking of metrics: we've talked about things like a nag ratio as a metric. So if the AI is insufficiently able to come to a conclusion, we can ask for further information. So how much nagging is okay, and how much.
Elma [00:37:36]: Yeah, yeah, that was interesting. Yeah.
Charlie [00:37:42]: All right, one more.
Elma [00:37:44]: Yeah, one more. Probably.
Charlie [00:37:47]: Just a question about.
Elma [00:37:50]: Thank you. Hi there.
Q7 [00:37:51]: Just a question about the models you were running. You mentioned you used off-the-shelf models predominantly. Did you find you had better results with a fixed seed, or keeping with the stochastic nature, with a randomly generated one, for your agents?
Elma [00:38:04]: Sorry, say that again? A randomly generated, like.
Q7 [00:38:07]: Oh, with a fixed seed, so you have a repeatable output every time for the same input tokens. Did you have better results with your agents with a fixed seed, or with a randomly generated one?
Elma [00:38:20]: I didn't catch it, a fixed something, or seed? Oh. We did not experiment with that; we're at such an early point in the journey that we have not tested to that level yet.
Q7 [00:38:30]: A second question about how you would audit that trail if it's randomly generated, if you're relying on the stochastic nature of the outputs to give you a nonlinear output?
Elma [00:38:41]: That is interesting. Yeah. So we're looking at the very high-level metrics at the moment, but then when you start to go in and optimize each of the little boxes in the workflows, yeah, then that gets really interesting, for sure. I don't have an answer, but it's interesting. Thank you. All right, thank you very much.
Elma [00:39:00]: Thank you.