MLOps Community

Illogical Logic: Why Agents Are Stupid & What We Can Do About It // Dan Jeffries // Agents in Production

Posted Nov 15, 2024 | Views 955
# Logical Agents
# Kentauros AI
# Agents in Production
SPEAKERS
Daniel Jeffries
Chief Executive Officer @ Kentauros AI

I'm the Managing Director of the AI Infrastructure Alliance, CEO of Kentauros AI, and the former Chief Intelligence Officer at Stability AI. I've also spent time at wonderful companies like Red Hat and Pachyderm. Over the years, I've worn many hats in business, from IT infrastructure engineering, to evangelism, to solutions architecture, to marketing and management.

But primarily, I think of myself as an author, engineer, and futurist whose current obsession is Artificial Intelligence and Machine Learning. More than anything I'm relentlessly curious. I love to learn and think about how things work and how things can be done better.

I've given talks all over the world and virtually on AI and cryptographic platforms. With more than 50K followers on Medium and a rapidly growing following on Substack, my articles have been read by more than 5 million people worldwide.

Adam Becker
IRL @ MLOps Community

I'm a tech entrepreneur and I spent the last decade founding companies that drive societal change.

I am now building Deep Matter, a startup still in stealth mode...

I was most recently building Telepath, the world's most developer-friendly machine learning platform. Throughout my previous projects, I had learned that building machine learning powered applications is hard - especially hard when you don't have a background in data science. I believe that this is choking innovation, especially in industries that can't support large data teams.

For example, I previously co-founded Call Time AI, where we used Artificial Intelligence to assemble and study the largest database of political contributions. The company powered progressive campaigns from school board to the Presidency. As of October 2020, we helped Democrats raise tens of millions of dollars. In April of 2021, we sold Call Time to Political Data Inc. Our success, in large part, is due to our ability to productionize machine learning.

I believe that knowledge is unbounded, and that everything that is not forbidden by laws of nature is achievable, given the right knowledge. This holds immense promise for the future of intelligence and therefore for the future of well-being. I believe that the process of mining knowledge should be done honestly and responsibly, and that wielding it should be done with care. I co-founded Telepath to give more tools to more people to access more knowledge.

I'm fascinated by the relationship between technology, science and history. I graduated from UC Berkeley with degrees in Astrophysics and Classics and have published several papers on those topics. I was previously a researcher at the Getty Villa where I wrote about Ancient Greek math and at the Weizmann Institute, where I researched supernovae.

I currently live in New York City. I enjoy advising startups, thinking about how they can make for an excellent vehicle for addressing the Israeli-Palestinian conflict, and hearing from random folks who stumble on my LinkedIn profile. Reach out, friend!

SUMMARY

If you think AGI is just around the corner, try building an agent. Today's frontier models often feel like idiot children: capable of brilliant, superhuman replies while making absurd logical errors and lacking any common sense. Even worse, those errors pile up, making them unreliable. AI agents hold incredible potential to change how we work, learn and play, but any company working on agents quickly realizes they're incredibly hard to build, to make reliable and to generalize. Instead, most teams have shifted to building narrow agents that they can scope to a particular problem with lots of glue code, heuristics and prompt engineering. In this talk we'll look at why agents are so challenging to build, what we can do about it, and whether and when they'll live up to the hype of doing complex, open-ended tasks in the real world.

TRANSCRIPT

Adam Becker [00:00:01]: I am here with Daniel Jeffries. Daniel, you can hear me okay?

Dan Jeffries [00:00:06]: I can hear you just fine.

Adam Becker [00:00:08]: So as we started out, you were telling me that you've been a science fiction author. Is that true? Can you tell me a little bit about that? How long have you been writing science fiction?

Dan Jeffries [00:00:18]: About 30 years. So it's been a long time. I wrote my first story about the robots taking all the jobs 25 years ago, and I no longer believe it's actually real. That's a good thing about writing stuff down: you disabuse yourself of the notion when you have enough time to think about it.

Adam Becker [00:00:35]: Was that, like, a gradual process where you began to sort of, like, stop fearing it? Would you say that robots taking over our jobs has dropped in your mind to, like, a 0% chance? Or did it go from 100 to, like, a 50? Like, where would you place it?

Dan Jeffries [00:00:47]: Nothing's a 0% chance, other than living forever. So I would say no, it's pretty close to zero, though. Actually, when I started looking into historical fears, I found we've had the same kind of fears about the end of the world a bazillion different times. Since the beginning of stone tools, we've been worrying about all the jobs disappearing.

Dan Jeffries [00:01:14]: That was a huge part of the Luddite movement, smashing looms and all these kinds of things. It never really plays out. And the general retort to that is, this time is different. This time is always different. It's never different.

Adam Becker [00:01:28]: But is it the fact that perhaps, right now, we're dealing with, let's say, autonomous agents, agents that are incredibly intelligent? Does that change the calculus in some way?

Dan Jeffries [00:01:40]: Well, we're certainly not dealing with agents that are incredibly intelligent yet. Maybe when they arrive, then we can think about it. But at that point in time, when people started thinking about cars or trains going above 10 miles an hour, they started to wonder whether their heads would explode or they'd just drop dead. And the point is, we're generally pretty bad at seeing around the corner and understanding the actual threats, and we tend to miss them. A good example is something like ChatGPT. They spent a ton of time worrying about political manipulation. They spent all this time, as they were releasing it, thinking about that in the early days, and we heard all that stuff up to the election: it's going to be a huge source of misinformation.

Dan Jeffries [00:02:21]: Meanwhile, one of the biggest problems they faced was spammers using it, right? And they didn't think about that at all. And that's really what ends up happening. We tend to focus on big, flashy threats because that's what humans are wired for, and we miss little threats that are actually worse. More people die of cancer, from smoking and eating poorly, because that plays out over a long time, versus terrorism, or lawnmowers. But we tend to focus on the terrorism because it's big and flashy. And even though five people tragically die in that situation, the fact is many more people are going to die of the slower things over the long run. We're just not good at seeing that. We're a really terrible species at it.

Adam Becker [00:03:02]: Well, perhaps agents could help us understand it better, or at least better calibrate the actual risks that they themselves pose. Let me just lead right into your talk, because I think it's probably the best way to set the stage: Illogical Logic: Why Agents Are Stupid and What We Can Do About It. I'm stoked to hear what you have to say about this. I'll be back in 20 minutes. Folks can be writing their questions in the chat below, and in 20 minutes I'll start to read them out loud for Daniel to react to. Daniel, you're sharing your screen.

Adam Becker [00:03:39]: Let me just put this up and the floor is yours.

Dan Jeffries [00:03:44]: Thank you, sir. Let's get into it. So, the illogical logic of agents: why do they suck and what can we do about it? If you think that AGI is just around the corner, well then try to build an agent. And I don't mean a talk-to-your-PDF or scrape-this-website-with-Playwright agent. Those are good, they're useful tools, but they do not qualify as complex thinking and reasoning systems. I mean, try to build a real one. And what is a real agent? An AI system that's capable of doing complex, open-ended tasks in the real world. And even more complicated than that is building one that can do long-running, open-ended tasks.

Dan Jeffries [00:04:24]: And I'm talking about tasks that can be done over many hours, many days or many weeks with no unrecoverable errors, like a person. So the real world is messy. And what do I mean by the real world? It could be the physical world of a robot, and you can see Boston Dynamics' Atlas facing the reality of the real world over there. These are excellent robots, by the way, but they just can't do anything perfectly. Neither can anything else on the planet, because the real world is super complex. Or it could be the digital world, the online world. And when you start to face the never-ending complexity of real life, then you start to realize just how dumb these systems currently are.

Dan Jeffries [00:05:07]: You start to realize, when you're working with agents every day like my team does, the limitations of today's frontier models. Now, we tend to classify the problems that we see colloquially into big brain, little brain and tool brain. That's what we affectionately call them, and we tend to see the same problems again and again. Big brain is anything to do with higher-level reasoning, strategic long-term planning, abstraction, expert understanding and common sense. So we hear things like, it's going to be a PhD-level intelligence. But that's just the accumulation of knowledge. Knowing how to abstract that knowledge to apply it to domains you've never seen before, knowing when to use that knowledge, understanding it, is completely different, which is why I don't like that analogy at all.

Dan Jeffries [00:05:54]: As we're going to see a little bit later when I get back to this, these are basically the biggest problems that we run into. And what's interesting is we're building GUI navigation agents, agents that can do complex tasks by interfacing with a GUI in a multimodal way. And then we saw Claude computer use come out, and we quickly saw that it had the same big brain problems that we had faced. There was a great example where it just got bored and was looking at different photos instead of doing the task that it was sent to do. Or I sent Claude out to try to go to Google Flights and it got lost on the European cookie pop-up: it couldn't figure out that it needed to scroll down to find the submit button. Or it couldn't figure out how to get to Google Flights, so it went to Kayak, and then the Google login pop-up was covering part of the screen that it needed to put information into and it couldn't figure that out. So I was pretty happy to see that an organization with billions more in the bank than we have was still facing some of the same challenges.

Dan Jeffries [00:06:57]: So here's an example of a problem that's trivial for you to solve, but hard for a machine. Okay, so I'm thirsty. There's a cascade of steps if you decide you want to go out and get some type of drink, whether at a bar or, let's say, at the grocery store, because you don't want to just drink water from the tap. That means you've got to get dressed, take your keys, take money, close the door, lock up, head to the store, find the water, buy it, drink it. Now, Moravec's paradox says that reasoning for robots is relatively easy and sensorimotor perception is hard, and that's just wrong. They're both hard, especially when you try to generalize reasoning. And in particular, I think we've made more progress on sensorimotor perception, with the Atlas robots and the robots you're starting to see come out from Tesla, et cetera, and self-driving cars, than we have on reasoning. So the secondary problems are little brain, and this is the tactical, in-the-moment actions where you have to decide. So if you think about when you're checking out, there's a sub-cluster of actions that are happening.

Dan Jeffries [00:08:00]: Okay, I'm going to pick the shortest line, I'm going to wait on that line, I'm going to step forward and answer questions like, do you want a bag or not? How are you going to pay? Small talk with the checkout person. All of these are a sub-cluster of tasks, and you can think of those as tactical kinds of reasoning that are incredibly challenging. So you can mess up on the long-term reasoning and you can mess up on the tactical reasoning as well. The last bit is tool brain. That's really the quality and precision of the tooling and the appendages, the APIs, the interfaces that you've given it. We have our hands, our fingers, some of the most highly evolved appendages on the planet, and we'll talk more about those in a little bit. But for a model or a robot to execute the action: if the big brain reasoning and little brain reasoning are dialed in but the tool brain is broken, then it's not going to be able to work. In other words, if my hand is broken, I can't pick up this banana. It doesn't matter if I know how to pick up the banana; I can't pick up the bottle in front of me either, even if I know the steps, if my hand is broken.

Dan Jeffries [00:09:13]: So reasoning, let's talk a little bit about reasoning. And when we're talking about reasoning, we're talking about the thinking capabilities of the frontier models: o1, which I love, o1-preview and o1-mini, GPT-4, Claude 3.5, Llama. How do they make decisions? And the answer is: amazingly, and then absolutely horribly, in an endless loop. Even worse, when it comes to agents, it's not a big deal if it makes one mistake in a chat. If you get a prompt and you type something in and it's wrong, you can reprompt it, and it's not necessarily connected to the last prompt. In an agent, that can be disastrous, because the errors can cascade. It is a compounding problem.

Dan Jeffries [00:09:56]: A mistake in one part of the task can carry to the steps down the chain. There are all kinds of ways that this starts to show up. A great example is its built-in memory, right? Not even talking about a RAG database or anything more advanced than that, just what's built into its weights. If you've ever coded with AI, which I'm sure you have, it always remembers the old-school openai.ChatCompletion.create call versus the new client.chat.completions.create version of the API, with lowercase. And it'll occasionally just go back to that old version, because that's what it remembers. That's a mistake that cascades down. Suddenly you're hunting for errors in the code, because 10 steps in, it decided to revert back to this older version of the API that doesn't make sense anymore. Here's a more complex example.
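For reference, a minimal sketch of the API drift described above, assuming the current OpenAI Python SDK: the legacy module-level call that models keep reproducing from their training data, versus the client-based call that replaced it.

```python
# Legacy pattern (openai < 1.0) that models often reproduce from memory:
#
#   import openai
#   openai.ChatCompletion.create(model="gpt-4", messages=[...])
#
# Current pattern (openai >= 1.0):
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(response.choices[0].message.content)
```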

Dan Jeffries [00:10:42]: So we have these GUI navigation agents, and here's a screenshot from one. I wanted to show a video, but you can see them on the Kentauros YouTube. In this case, I gave it a very simple task that it had done probably 10 times, which is: go to Wikipedia. And I have variations of these prompts, like, tell me about a specific king and what he was king of. Sometimes they're all kings of England, but they might also be kings of Ireland and some other things. So it gets to the page for Edward the Confessor, and all it has to do to complete the task is to read the page, the answer is right in the first paragraph, and tell me who he was the king of. Instead, it decided it had to click the language button, for reasons unknown. Then it scrolls around a bit, it decides to try to switch the page to English even though it's already in English, and it spends about two minutes trying to figure that out before it finally returns the answer.

Dan Jeffries [00:11:41]: And so this is a more complex example of the kinds of things we see all the time in the agents, and it's primarily because the agents just don't have any common sense. If you've read A Brief History of Intelligence, which is one of my absolute favorite books on neurobiology, neurohistory and artificial intelligence: incredible book. I read it cover to cover. I highlighted almost every page; I just stopped highlighting at some point because I was basically highlighting every page. He gives a great example: I threw a baseball 100 feet above my head.

Dan Jeffries [00:12:08]: I reached up to catch it and jumped. And if you give that prompt to GPT, 99 out of 100 times it would get it wrong. Maybe they've fine-tuned it in by now, because I put it out on Twitter, but 99 out of 100 times it would say something like, you jumped up into the air and caught the ball and felt the roar of the crowd. Because of your common sense, you know the ball's too high to catch. Now if I change that and I say, look, this is a fantasy and you have superpowers and you can jump that high, then you know it works. The thing is, GPT just doesn't know this type of thing. And there are all kinds of ways that these errors cascade into the agents themselves. Another thing is there's no abstract reasoning.

Dan Jeffries [00:12:47]: Common sense is a built-in understanding of abstract patterns. It's a world model, and there's no world model; we've seen papers recently about the fractured world models these systems develop. If you get cut one time by a knife, you abstract the concepts of pain and sharpness. Now you can see a jagged rock, a spike on a fence, a pike, a sword, and you immediately know that it's dangerous. A multimodal model has to see all these different pictures of knives, then it has to see the other types of things, the jagged rock. It doesn't generalize that shape, that concept of pain and sharpness, to other things easily.

Dan Jeffries [00:13:22]: Another example is a fish, and this came out of that book, actually, where a comparative psychologist showed the built-in understanding of three dimensions that fish and people and gorillas and everyone else have, but that the models don't have. It's a built-in understanding of the world: if you show a picture of a frog to a fish or another animal and train it to touch its nose to it to get food, and you then show that frog from a totally different angle that it's never seen before, it'll still know that it's the frog, without ever having seen it from 100 different directions the way a multimodal diffusion model needs to. So that's also our built-in understanding. Our tools are incredibly refined. Your eyes can detect 10 million colors; they can process 36,000 pieces of information. I could talk endlessly about the beauty of our hands, the most beautiful and exquisite fine-grained object manipulators in the known universe.

Dan Jeffries [00:14:17]: They can throw a ball, they can paint a picture, they can do precision cutting, they can do surgery. The hand is an incredible tool, and we take this tool for granted when we're building things. I'll give an example of how the tools break down in an agent. We found frontier models were not very good at returning exact coordinates. That's starting to change now with the Molmo model, which is the Allen Institute for AI's model, with our fine-tuned model based on PaliGemma, and with Claude starting to be trained on this.

Dan Jeffries [00:14:49]: But in general we found they were good, for instance, at approximately telling me where things were. So we decided to layer a grid over an image and we'd say, okay, tell me where the compose button is. And they would say, okay, it's number seven, and then we could zoom in. Well, that usually worked pretty well, except we found on calendars it was a disaster, because as we zoomed in too far, you could say, okay, where's the 29, or where's the 30, and the problem is it's now covered by those 14 and 15 numbers. So we had to create a new type of calendar recognizer, where we would use OCR to get the position of the text.
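A minimal sketch of that zoom-in grid idea, with hypothetical helper names rather than Kentauros' actual code: overlay a coarse numbered grid on a screenshot, ask the model which cell contains the target, then crop that cell and repeat at higher resolution.

```python
from PIL import Image, ImageDraw

def draw_numbered_grid(img: Image.Image, rows: int = 4, cols: int = 4) -> Image.Image:
    """Overlay a numbered grid so a model can answer 'the compose button is in cell 7'."""
    annotated = img.copy()
    draw = ImageDraw.Draw(annotated)
    cell_w, cell_h = img.width // cols, img.height // rows
    for r in range(rows):
        for c in range(cols):
            x0, y0 = c * cell_w, r * cell_h
            draw.rectangle([x0, y0, x0 + cell_w, y0 + cell_h], outline="red", width=2)
            draw.text((x0 + 4, y0 + 4), str(r * cols + c), fill="red")
    return annotated

def crop_cell(img: Image.Image, cell: int, rows: int = 4, cols: int = 4) -> Image.Image:
    """Zoom into the cell the model picked, ready for another grid-and-ask round."""
    r, c = divmod(cell, cols)
    cell_w, cell_h = img.width // cols, img.height // rows
    return img.crop((c * cell_w, r * cell_h, (c + 1) * cell_w, (r + 1) * cell_h))

# Usage sketch: screenshot -> annotate -> ask the model for a cell number -> zoom and repeat.
# screenshot = Image.open("screen.png")
# annotated = draw_numbered_grid(screenshot)   # send this image to the model
# zoomed = crop_cell(screenshot, cell=7)       # model answered "cell 7"; recurse on the crop
```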

Dan Jeffries [00:15:28]: But oftentimes, for reasons unknown, even the best OCR engines on the planet that you can pay for would miss two or three of the numbers. So it wouldn't have the coordinates for the 2, the 7 and the 9, randomly, on a page. So we had to basically figure out mathematically where everything was placed and what numbers were showing, and then mathematically recreate the grid so it would know where to click. That's giving it a more precise tool, and it's the type of thing you have to do all the time with agents. So how do we fix this? There are three keys to making progress: reinforcement learning, learnable, generalizable algorithms, and scalable data. We're going to go into that reinforcement system. If you really look at Strawberry, when it came out, a bunch of people were saying, this is AGI.

Dan Jeffries [00:16:09]: Really? It was exactly what I said it would be. It's a deterministic policy that comes from a ten-year-old paper by DeepMind where they taught it to play Atari games. It's a deterministic policy, so once it learns to go left or right, it never goes in the other direction. If this is the ideal way to climb in Q*bert, it starts on the left every time. So that makes it good at hard reasoning, like math, which has a definitive outcome.

Dan Jeffries [00:16:39]: Or science that has a definitive outcome. It doesn't make it good at fuzzy reasoning, but it is good for being able to build larger workflows with reinforcement learning. And there are two kinds of reinforcement learning, sort of reinforcement-light and reinforcement-strong, that I'm going to talk about. The bitter lesson is also really not understood very well. Generally people just think it's throw more compute at it, and that is correct, but on its own that's a bad misreading. It's also about what happens when people try to build these expert systems in. We're even guilty of it.

Dan Jeffries [00:17:09]: I think it's necessary as a step to get to generalized agents. But what you find is that as soon as you take out all that expert knowledge and just let the machine learn on its own, and you have algorithms that are good enough to allow it, general-purpose algorithms, it wins. Those are really hard to build. Backpropagation is one. Reinforcement learning is one. The transformer. These are sort of generalized algorithms.

Dan Jeffries [00:17:29]: These are challenging. But once you have them, you get something like AlphaGo Zero, with no prior knowledge, no expert training built in, and it blows away the version of AlphaGo that beat Lee Sedol four games to one. It beats it 100 games to zero after three days of training. So that's where you always want to get to over time: a generalized system. But sometimes it's hard.

Dan Jeffries [00:17:56]: So we think there's an eightfold path to better agents: tool interface models, parallel processing, shared memory, task-specific reasoning, hot-swappable skills, generalized task reasoning and generalized reasoning. Okay, we're going to look at each one of them. The good news is, once you're down some of these roads, you start tackling other pieces of the stack. It helps you understand the tasks that you're facing. You can make mistakes, adjust and move forward. That's applied AI. That's different from pure research.

Dan Jeffries [00:18:23]: It's where you meet the real world. With better memory, you have better workflow and tactical data that you can now loop back into the reasoning engine. All these things create a positive feedback loop; they compound. Once you have a rock-solid agentic memory platform that can store any arbitrary task structure, it applies to all the agents in your domain. Once you have better memories, that gives you better workflow data, which you can leverage to do better fine-tuning, et cetera, so you can cross-pollinate with different domains. We have several versions of our agent. They've been released open source.

Dan Jeffries [00:18:57]: We've dropped Robby G2, which is pretty good. We were proud of it, but it's not production ready. Now we're working on G3, and we're almost done with it. As part of it, we've created a five-part model of the brain: the strategist, the tactician, the critic, the translator and the arm. What's interesting is the strategist is like the big brain. The tactician is like the little brain.

Dan Jeffries [00:19:19]: It figures out the little steps that need to be done. The critic checks in and says, I don't know about this, maybe we need to go in a different direction, here's a problem, or, I think you're stuck in a loop, let's break out of it. The translator takes these semantic actions, like, I think I need to click the blue submit button, and turns them into coordinates through various tool brain interfaces, and the arm takes the action. It can also be considered a multi-agent system, but we tend to think of them as parts of the brain. Now, we have been working on a tool brain SOTA click model. Initially we started with PaliGemma. We released the WaveUI dataset on Hugging Face and the original PaliGemma-based model.
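A minimal sketch of how those five roles could hand off to one another in a loop. The structure and names below are illustrative assumptions, not Kentauros' actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    goal: str
    history: list[str] = field(default_factory=list)

def strategist(state: AgentState) -> list[str]:
    """Big brain: break the goal into high-level steps (stubbed here)."""
    return [f"step toward: {state.goal}"]

def tactician(step: str, state: AgentState) -> str:
    """Little brain: pick the next concrete action, e.g. 'click the blue submit button'."""
    return f"click element for '{step}'"

def critic(action: str, state: AgentState) -> bool:
    """Spot loops and dead ends; here, refuse any action already tried twice."""
    return state.history.count(action) < 2

def translator(action: str) -> tuple[int, int]:
    """Tool brain: turn a semantic action into screen coordinates (stubbed)."""
    return (640, 360)

def arm(coords: tuple[int, int]) -> None:
    """Execute the click or keystroke against the GUI (stubbed)."""
    print(f"clicking at {coords}")

def run(goal: str, max_steps: int = 10) -> None:
    state = AgentState(goal=goal)
    for step in strategist(state)[:max_steps]:
        action = tactician(step, state)
        if not critic(action, state):
            continue                      # skip or re-plan instead of looping
        arm(translator(action))
        state.history.append(action)

run("Find out who Edward the Confessor was king of on Wikipedia")
```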

Dan Jeffries [00:19:54]: We took that PaliGemma model to about 63% accuracy at clicking on things. We found that the Molmo model, the 7B version, is about 83% accurate, and the 72B version, which they trained for robotics on 2 million pairs, is about 90% accurate. They never really trained it on GUIs, but it seems to be really good at GUIs. We're in an open fine-tune right now and we're going to be releasing this as a model that people can utilize, also on our interface, et cetera. But now it powers the tool brain of our next-generation model that we're releasing soon. Memory is turning out to be a really important point. Most memory work has focused on RAG systems, where you pump a bunch of data in, and then developers make the mistake of thinking that if you just pump a bunch of data into it, it's going to do everything.

Dan Jeffries [00:20:40]: It will not. Storing the data is only the most basic level of memory, and it's not even the most important. The retrieval system is what makes a good memory store. When you are talking to somebody, your brain is running a bunch of routines: where do I know this person from? Is there an analogy for what they're saying? Do I understand it? It's hunting through your memory for related, associated memories, right? That is a retrieval algorithm, and that is where the real rubber meets the road when it comes to agents, where they learn things and can pull things from memory. We built a system, this is a sort of command line interface to it, where we can build synthetic memories or annotate the memories.

Dan Jeffries [00:21:20]: We can tell it whether a run was a success or a failure. We can human-annotate it and say, hey, stop clicking the language button, you don't need to, just go ahead and read the screen that's there; that one was a failure. And then it learns from these synthetic memories, or it learns from the annotated memories of sequences it's already run through. And then the AI annotates them as well. And we have a number of packages that help us deal with this too. And that kind of takes us to G4, which is going to be shared memory.
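A minimal sketch of an annotated memory store along the lines Dan describes: task memories, synthetic or recorded, tagged as success or failure and retrieved by similarity before the agent plans. The class and the crude word-overlap scoring are hypothetical stand-ins; a real system would use embeddings and richer task structure.

```python
from dataclasses import dataclass

@dataclass
class Memory:
    task: str
    steps: list[str]
    outcome: str          # "success" or "failure"
    annotation: str = ""  # human or AI note, e.g. "stop clicking the language button"

class MemoryStore:
    def __init__(self) -> None:
        self.memories: list[Memory] = []

    def add(self, memory: Memory) -> None:
        self.memories.append(memory)

    def retrieve(self, task: str, k: int = 5) -> list[Memory]:
        """Crude relevance: shared-word overlap. A real store would use embeddings."""
        def score(m: Memory) -> int:
            return len(set(task.lower().split()) & set(m.task.lower().split()))
        ranked = sorted(self.memories, key=score, reverse=True)
        successes = [m for m in ranked if m.outcome == "success"][:k]
        failures = [m for m in ranked if m.outcome == "failure"][:k]
        return successes + failures   # feed both into the prompt before planning

store = MemoryStore()
store.add(Memory(
    task="Look up Edward the Confessor on Wikipedia",
    steps=["open wikipedia.org", "search 'Edward the Confessor'", "read first paragraph"],
    outcome="failure",
    annotation="Don't click the language button; the answer is already on screen.",
))
for m in store.retrieve("Look up a king on Wikipedia"):
    print(m.outcome, "-", m.annotation or m.task)
```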

Dan Jeffries [00:21:48]: When one agent learns something, all of the agents in the swarm start to learn it. It's also task-specific reasoning. So that's where we're building an agent learning system where you can create and demonstrate things to the agent and store synthetic memories for the agent. And we're giving it more precision tooling, where you keep improving the SOTA reasoning. You can then start to fine-tune it with workflow data, and that makes the big brain better and smarter, right? So you have this fine-tuning pipeline that starts to come out of it, not just the memories in its system. That also gets us to G5, where we start talking about hot-swappable skills.

Dan Jeffries [00:22:24]: So this is where, well, actually, we don't call that RL-strong; I'll get to RL-strong. Hot-swappable skills are where you can load up a skill, like, hey, I want to be good at Amazon in particular, so I'm going to load up the skill for that. Or I want to be good at clicking through, you know, an old database or Salesforce, so I'm going to load up the skill for that. RL-strong, though, is where we unleash the agent to go out into the world on a bunch of cloned websites and start learning. But we don't want it to just learn randomly.

Dan Jeffries [00:22:59]: Like many RL agents do. If you look at the LLM4Teach paper, it uses the LLM to set the reward policy and tell the agent what to do initially. So it learns very quickly, and then it starts to blow past the broken world model of the LLM as it begins to explore and exploit. So that's where it begins: you have this reward-driven agent as well. And coming up to the end here, once you have these kinds of hot-swappable skills, we're building a real-time inference engine so you can swap them based on the entropy. You can swap the skill as it's generating the token.

Dan Jeffries [00:23:36]: Say, you know what, I need to do a lookup, or I need to load this skill pack in order to do this part better, before I generate any more tokens. And so that allows the agent to make better decisions, in the same way you might suddenly open up Google. So that's actually it; we're almost there and I'm almost out of time. None of this is easy. Right now, in the world of AI agents, it's often like we wanted to build a website but then found out we had to invent PHP, MySQL, Linux and Apache, then we have to build WordPress, and then we can build a nice drag-and-drop editor or whatever. So as the old spiritual joke goes, if you want to make a pizza, first you have to create the universe. But you know, each day we're making progress, and slowly but surely we're making our way to Rosie the Robot, and the world will never be the same.
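A minimal sketch of the entropy-triggered skill swapping described at the end: watch the model's next-token uncertainty during generation and, when it spikes, pause to load a skill pack before continuing. The model and adapter interfaces here are hypothetical.

```python
import math
from typing import Optional

def token_entropy(probs: list[float]) -> float:
    """Shannon entropy of the next-token distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def pick_skill(prompt: str, skills: dict) -> Optional[str]:
    """Toy router: pick the first skill whose name appears in the prompt."""
    return next((name for name in skills if name in prompt.lower()), None)

def generate_with_skill_swapping(model, prompt: str, skills: dict, threshold: float = 3.0) -> str:
    """Decode token by token; when entropy crosses the threshold, try loading a skill pack.

    `model` is assumed to expose next_token_probs(), sample() and load_adapter(),
    and `skills` maps a skill name to an adapter (e.g. a LoRA). All hypothetical.
    """
    output = []
    for _ in range(512):
        probs = model.next_token_probs(prompt + "".join(output))
        if token_entropy(list(probs.values())) > threshold:
            skill = pick_skill(prompt, skills)
            if skill is not None:
                model.load_adapter(skills[skill])   # hot-swap before generating more tokens
        token = model.sample(probs)
        if token == "<eos>":
            break
        output.append(token)
    return "".join(output)
```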

Adam Becker [00:24:23]: Daniel, I think I can say from reading the chat and from listening to the talk, this was an incredible, incredible talk. Thank you very much. I also got myself a copy of A Brief History of Intelligence already.

Dan Jeffries [00:24:39]: Awesome book. Everyone should read it, it's just fantastic.

Adam Becker [00:24:42]: It's amazing. Well, thanks; I hadn't heard of it, but once you mentioned it with such a recommendation, I went and snatched a copy as you were speaking. So I have a few questions myself, but I see that people in the audience do too. Let me just start by reading off a few of them; we have just a few minutes for them. Let's see how far up I can go. So Vaibhav is asking: LLM agents are hyped, but can small teams with limited resources compete with big players? And if not, are there any strategies that can help them adapt and succeed in a rapidly evolving field?

Dan Jeffries [00:25:17]: Yeah, you've got to be a creative team. You've got to be an applied AI team, which means you've got to get your hands dirty and you have to try stuff every day, including a lot of stuff that's going to fail. It's a scientific process where you have to try stuff and fail a lot, and then you can absolutely fine-tune the larger models. We invented a number of algorithms, for instance, to make the agents better at clicking on things. When you are forced to adapt with limited resources, you often come up with brand new ideas that people who have $10 billion in the bank never did. That's why they found a different treatment for lung cancer in Cuba based on just antibiotics, because they didn't have access to the sort of bigger Western medicine system. They were forced to adapt, and they had a serious problem, but necessity is the mother of invention, so they were able to compete in that way.

Adam Becker [00:26:06]: You spoke a little bit about cascading errors here. We have Tony asking: given cascading errors, does using a mixture-of-agents type approach for the big brain help? I know that even when you broke the brain down into multiple kinds of modules, you made a point to say these are not different agents, at least you don't consider them that; you consider them different parts of a single brain. How do you think about the relationship between mixture of agents and cascading errors?

Dan Jeffries [00:26:34]: It's essentially the same thing, right? You're just dividing up the task into multiple things. You can think of it as parts of the brain or as multiple agents, and they could be different models. On the back end, we end up having a swarm of models. We have two OCR models, we have the Molmo 7B model for clicking, and we have the large frontier model, whether that's Claude or GPT-4o or Molmo 72B, running as the larger agent. And then you could split that even further if you wanted.
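A minimal sketch of what that swarm-of-models split can look like: a thin router dispatching each sub-task to a specialist. The function names and stubs are hypothetical, not Kentauros' implementation.

```python
from typing import Callable

# Hypothetical specialists standing in for the swarm described above: OCR models,
# a click-grounding model, and a frontier model doing the higher-level planning.
def ocr_model(image_path: str) -> str:
    return "ocr text for " + image_path            # stub

def click_model(image_path: str, target: str) -> tuple[int, int]:
    return (0, 0)                                  # stub: pixel coordinates of the target

def planner_model(goal: str, context: str) -> list[str]:
    return [f"first step toward: {goal}"]          # stub: high-level plan

ROUTES: dict[str, Callable] = {
    "read_screen": ocr_model,
    "locate": click_model,
    "plan": planner_model,
}

def dispatch(task_type: str, *args):
    """Route each sub-task to the specialist best suited for it."""
    handler = ROUTES.get(task_type)
    if handler is None:
        raise ValueError(f"no specialist registered for {task_type!r}")
    return handler(*args)

# e.g. dispatch("locate", "screen.png", "blue submit button")
```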

Dan Jeffries [00:26:58]: It helps to a certain degree, but it is not a magical fix. Everything that you see on Twitter as a magical fix is not a magical fix. As soon as you hear that something's a magical fix and the first reply is, we're all cooked, AGI tomorrow, you can safely discount that by about 75%. It's very useful, but it doesn't solve all the problems, because sometimes the agents can get into a mode where they're just talking to each other and they don't come up with a better solution. Just like when we tried actually letting the agent have more epochs to think about things.

Dan Jeffries [00:27:29]: In an o1 style, it sometimes would think itself out of the answer. So, like, five steps in, I'm thinking, it's got it, break out of the loop, and it thinks for another 10 steps and gets the answer wrong. It's like a group of people talking themselves out of the right answer. It sometimes helps, but it's not perfect.

Adam Becker [00:27:44]: Tony's also asking: with respect to these agentic reasoning challenges, do graph-based neurosymbolic AI approaches help at all?

Dan Jeffries [00:27:55]: Potentially, I think. But that's really just a retrieval mechanism. You really want to think about, how am I qualitatively getting a memory that is useful to me? For example, we're experimenting with: show me the same screen. Did I click on that spot already? Great, just return it, then I don't have to think about it, I just need to click the submit button. Is it the same screen? That's a specific retrieval. There's a sub-layer of that, of which mechanism am I using to do it.

Dan Jeffries [00:28:27]: But it's really about what is the correct information you're trying to retrieve for the thing that you're doing. That's the real hard part. It does help to have some level of neurosymbolic reasoning, but I would say it's not perfect. So we're taking two approaches. One is inserting memories into the brain, so synthetic memories or memories that are annotated. And then, for instance, Robby G3 will grab the memories, both successes and failures.

Dan Jeffries [00:28:51]: It'll grab five and five. And I found sometimes it'll take it from 25 steps down to eight, but sometimes it's still a different problem, and so actually fine-tuning on that data is going to be more valuable; it might not be a retrieval problem. So it's helpful. All these different graph and RAG approaches are very useful, but ultimately it's about the knowledge you build in for how it retrieves the information relevant to the task that you're doing.

Adam Becker [00:29:19]: So we have a bunch more questions, and I'm not sure if I should just go through them very quickly, because I think we do have to go in a couple of minutes. But let me maybe take a stab at a more abstract reading of some of the questions that are here. On one hand, I hear you saying, well, this is ultimately a scientific approach: you have to be as much in dialogue with the reality of the situation as you can, you should just ship and continue to improve over time, and there isn't a silver bullet; you're just going to have to keep trying lots of different tactics to continue to improve the agents. On the other hand, we have the bitter lesson, which says: let us nevertheless invest in generalizable functions and general-purpose algorithms that can allow us to actually learn. Is there a trade-off between those two?

Adam Becker [00:30:16]: And if so, how do you feel about where to land in making that trade-off?

Dan Jeffries [00:30:23]: If you're a small team, you're going to have a lot of trouble competing with the frontier labs for generalizable intelligence. So what you have to do is plan for the fact that you're going to get upgraded intelligence at various points, and you're going to have to probe that black box to figure out whether it's useful or not. You can think of it as a graphics card upgrade, or a Linux kernel upgrade that you get for free. So you don't want to build stuff that's going to be outmoded by that; you want to build around it. The rest of it is sort of like middleware for those agents, and that's where the friction happens, integrating it into your stack. It's the same kind of friction you see in any sort of development. So the challenge is, you want to build the expert-system stuff, but you want to be willing to let it go as soon as there's a generalized version.

Dan Jeffries [00:31:03]: That said, there are some generalized things that I think the open source community can really start to excel at. If you look at the stuff from Entropix and some of the other groups that are working on using entropy to predict the next token, I think that's the kind of thing where the smaller labs, because they can experiment with a million different people trying things in different ways, can start to outpace the frontier models. As long as they have access to good open source models, and the AI doomers don't end up restricting all models from us while allowing unfettered war machines and surveillance bots at the government level.

Adam Becker [00:31:37]: Let's cross our fingers that doesn't happen. Daniel, thank you very much. If you can please linger in the chat and perhaps respond to some of the folks that had asked questions, that would be wonderful.

Dan Jeffries [00:31:47]: Sounds good.

Adam Becker [00:31:49]: Thank you very much for joining us.

