MLOps Community

Planning is the New Search // Fabian Jakobi // Agents in Production

Posted Nov 26, 2024 | Views 1.2K
# streamline workflows
# Agentic
# memoryrank
speaker
Fabian Jakobi
CEO & Co-Founder @ Interloom

Fabian is the CEO and Co-Founder of Interloom. Over the last 15 years, he has founded three companies in the enterprise automation and AI space. In 2020, Fabian bootstrapped Boxplot, a knowledge graph company, which was acquired by Hyperscience during their $100m Series E round. He then served as Managing Director for Germany & Global VP Product at Hyperscience. After completing his vesting period, Fabian and his team left to launch Interloom, focused on building the next generation of agentic automation software for human and machine workflow orchestration. Interloom has recently raised their 5.5m Euro Seed Round from Air Street Capital.

SUMMARY

In an era where knowledge workers are inundated with manual tasks and corporate memory systems, agents hold massive potential to streamline and document their workflows. Agents also hold the potential to unlock a vast amount of skills and knowledge from the front-line employees of global businesses. This talk explores how dynamic agents can capture work patterns, from inboxes to task sequences, using clustering and process mining techniques. These methods ground agentic planning and provide more robust, company-specific workflow orchestration. We’ll also discuss how knowledge graphs power effective memory management, and how Interloom’s innovative MemoryRank™ system leverages reflective agents to create and serve agentic memories, optimizing task management for both human workers and specialized agents. Attendees will gain insights into how agentic memory systems can drive efficiency and productivity for end-to-end processes and workflows.

TRANSCRIPT

Skylar Payne [00:00:01]: All right, I think we're just about ready to get started. So everyone, welcome Fabian, CEO and co-founder at Interloom. Today he's going to talk about "Planning is the New Search". I just wanted to call out that I was looking at the website, wondering what Interloom was, and I love this tagline of being workflows and automation for humans and AI. So definitely excited to hear about what you've been cooking. With that, I'll go ahead and bring your screen on, drop into the background, and we can jump into it.

Fabian Jakobi [00:00:41]: Perfect, thanks. Yeah, super excited to talk about Interloom. There's also going to be a small giveaway: we're going to try to demo today, which we've never done publicly, and we will even be showing some exclusive things. So I'm going to jump right in. We have chosen a bit of a provocative headline with "Planning is the New Search". But what we are actually working on is, as you just said, exactly that: how can we bring humans and AI agents together in workflows at companies? Yeah, perfect.

Fabian Jakobi [00:01:26]: So maybe just two words on Interloom. We're about 17 people across nine cities. As we said, we're working on workflows for humans and AI agents and how to actually bring that into production. We've recently, at the beginning of 2024, raised our seed round with Nathan from Air Street and are very happy with that choice. And yeah, let's dig into what we have prepared today. Planning is the new search. What does that even mean? Let's start with what search actually means. Search is one of the largest software markets on the planet, right? We all know Google and all of the other search engines. For 2023, the estimate is about 167 billion just for search engines.

Fabian Jakobi [00:02:08]: But there is a much, much larger number behind this, because we all effectively search all day, right? We search when we look into our inbox, we search when we open Google, we search when we try to find a file on our computer. And so search is essentially what makes up one of the largest job descriptions on the planet, which effectively is just knowledge work. Right. And for lack of a better definition, we came up with one. Essentially, knowledge work is nothing else but working with information retrieval, which is nothing else but search, then taking that information, somehow refining it by making a decision or reasoning over it, then enriching it and pushing it along towards a productive outcome. Right. And obviously all of that happens in a very open-ended world. There's a great post by Air Street about open-endedness and how AI lives in that world.

Fabian Jakobi [00:03:04]: But today we really want to show how we bring that into production, because that's the title of the conference, and because we are an application layer company that really tries to rethink how knowledge workers work alongside agents in these company processes. But what does knowledge work look like? It's essentially the backbone of our society if you think about it. It's every back office, and every single time we apply for anything: from getting our passports or filing our taxes to the taxes actually being processed, from claims to insurance to finance to IT support. Essentially, knowledge work is what makes up work itself. We always think of automation in terms of this machine-to-machine automation. But the reality is, even after 30 years of software development, we still do, we estimate, more than

Fabian Jakobi [00:03:56]: 70% of all work manually, and that's just out of our experience as knowledge workers. So there's a massive component to this target group. And obviously, with large language models coming about, a lot of people looked at, okay, what can we innovate? I think it was just said in the Stack Overflow talk that a lot of the first companies and the first use cases are around software engineering. And I think there are two good reasons for it. One is that there's a lot of public data available, like open source code on GitHub and other platforms. And the second thing is that it's quite easy to test, right? If the code was generated, you can literally test it against a test that was also written by an AI, and then you see if the code works or it doesn't work. Because computers are very strict and structured and they have this yes-or-no component. But the interesting part is, even in the most conservative estimates, there's about 30 million software engineers. I think it's between 30 and 60 million on the planet.

Fabian Jakobi [00:05:02]: But that's against like 1 to 2 billion knowledge workers, right, who essentially work at all these small, medium and large companies, up to enterprises that sometimes have hundreds of thousands of employees who are essentially all doing knowledge work. So it's definitely a massive potential. If you think about what all of that knowledge work is made up of, right? If I say manual work, it effectively means nothing else but language, right? We talk, we write emails, we Slack, we work in Teams, we work in case management solutions, we work in Salesforce and other platforms. So it's language based. And obviously now, with large language models hitting the mainstream and with their increased capabilities, we actually believe that this entire group of people will see one of the largest increases in productivity in history. Because if you think about even just optimizing or augmenting 10% of productivity in that group of knowledge workers, that's essentially one of the largest increases in productivity that we have ever seen, dating back to the Industrial Revolution. So our thesis is that natural-language-based task orchestration, so how do we actually split this knowledge work into individual steps and tasks, and how can knowledge workers themselves, using just language, orchestrate all of that, will be a massive potential for society as a whole.

Fabian Jakobi [00:06:36]: But the start of all of that, as I've just introduced, is what we call planning. And we've all kind of talked about it. I've heard many great talks today from other companies about what an agent is and how it works, and there's always a planning step. But when we talk about planning, we actually believe that you can effectively take any job on the planet and split it down into a chain of specific tasks that you need to do, right? It might be very complex, and we'll see some examples later, but it's just proposing a set of tasks that each aim to reach a certain destination or objective. But it's not just any plan that you need to look at, it needs to be the right plan, because there might be many wrong plans to follow. And so finding the right feasible plan, given a specific job or a specific destination I want to reach, is essentially what we believe is the first step towards getting all of that productivity into production. The problem is that LLMs are just not really good at planning.

Fabian Jakobi [00:07:50]: That's also not the way they think, right? Going back to Alan Turing saying that both machines and humans think, they just think very differently. That's just not what LLMs are built for. There's actually a lot of research backing this up. This is, for example, Blocks World, a game that is also used to test the intelligence of monkeys. And LLMs just struggle with autonomous planning. They're actually quite bad, down to an accuracy level where we effectively can't rely on it at all. Another example of this is SimpleBench, which shows that LLMs, when it comes to autonomous planning tasks, don't even meet basic human-level capabilities in this example. So what can we do about it? And the answer to this gets even harder when you think about the fact that all of these tests are predefined data sets or predefined problems that are actually solvable logically.

Fabian Jakobi [00:08:55]: While a lot of the real-world open-endedness and complexity is actually much more driven by context. Consider a support request for a purchase order where the original order was sent 12 days ago but the customer has experienced previous delays, or responding to a request for proposal, which is one of our early customers, to build a railroad system of 54 km between two cities here in Germany. Those things are sometimes weeks-long or months-long cases, and they require hundreds if not thousands of tasks and subtasks. And they're also not single-player mode. It's not like one person can do them individually. So from our perspective, the way to go about it, and that's what is kind of unique about the approach we take, is that we need to learn from what actually happens in reality. Because there is not even one process that fits all. The processes and the patterns of how knowledge workers work at that railroad company or support desk, that's corporate memory, right? That's corporate intelligence and know-how, and that is actually very valuable to the company, so they don't want anybody to know about it.

Fabian Jakobi [00:10:11]: Which is also why that data doesn't exist right now in the public domain, which means we could not have trained models on the way that a specific company does an RFP, because that's a competitive advantage. And even in the future, companies will hopefully be very protective of that, because it's essentially the core of their competitive advantage. On top of that, there might not be just one way to go. There are many ways of solving this problem with the customer. Some of them might be wrong for a specific company, so somebody would say after the fact, well, telling the customer that we don't care that they experienced delays is probably not a really good business practice. But the bigger problem is that there might even be more than one right answer, right? Coming back to the Blocks World example or to SimpleBench, these things are easily testable because I can say they were right or they were wrong. In this case, we don't even know whether there are many outcomes that might be good, and we might even need to QA them in some way, right? And there's also a trade-off: sometimes a good plan could be the fastest way to get it resolved. Another one could be the most valuable way to get it resolved.
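
To make the point about several acceptable plans and the fastest-versus-most-valuable trade-off concrete, here is a minimal Python sketch. It is only an illustration and not Interloom's implementation: the `Plan` fields, the scores, and the example plans are all assumptions invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    tasks: tuple[str, ...]
    est_days: float    # hypothetical estimate of time to resolution
    est_value: float   # hypothetical business-value score, higher is better
    acceptable: bool   # would the company consider this plan valid at all?


def pick_plan(plans: list[Plan], objective: str) -> Plan:
    """Return the best acceptable plan under the given objective."""
    feasible = [p for p in plans if p.acceptable]
    if not feasible:
        raise ValueError("no acceptable plan")
    if objective == "fastest":
        return min(feasible, key=lambda p: p.est_days)
    if objective == "most_valuable":
        return max(feasible, key=lambda p: p.est_value)
    raise ValueError(f"unknown objective: {objective}")


# Invented example plans for the delayed-order support request.
PLANS = [
    Plan("ignore_delay", ("tell customer delays do not matter",), 0.5, 0.0, False),
    Plan("apologize", ("check order", "send apology"), 1.0, 3.0, True),
    Plan(
        "apologize_discount",
        ("check order", "triage to colleague", "send apology with discount"),
        2.0,
        8.0,
        True,
    ),
]

print(pick_plan(PLANS, "fastest").name)
print(pick_plan(PLANS, "most_valuable").name)
```

The quickest plan overall is filtered out because it is not acceptable to the company, and the two objectives then select different plans, which is why a single right-or-wrong test does not fit this kind of work.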

Fabian Jakobi [00:11:27]: So there's a lot of complexity around how that actually happens in the context of case work or knowledge work. Right. So how do we get that one plan that we can suggest, given a very specific context like in this case, and that we can then serve to a knowledge worker, a human, laying out what to do in order to reach that destination and resolve the support request to the customer's satisfaction? And there's actually a product that we have all probably used in the last 12 hours that does something very similar. And it's not just a marketing analogy, it's actually a technical analogy as well. And that is Google Maps. Google Maps takes a destination, like in this case on the right, where I'm going from Munich to Rome, and within milliseconds it gives me three feasible ways to go. And it already surfaces them with the estimated time

Fabian Jakobi [00:12:27]: that it might take me, and so on. But obviously there are way more ways to go. Those are just the feasible ones given the specific context. If, for example, I say I'm going on my bike, obviously there will be a whole different set of feasible ways. Or if along the way I say I would like to see Venice, obviously the left one that's currently selected is not that feasible any more because it's a detour. So Google already does something that is almost superhuman, right? Within milliseconds. I mean, we are so used to it, and it even influences traffic, right? Because we look at it and then we follow the plan. And while we are still free to drive where we want, it heavily influences traffic in general.

Fabian Jakobi [00:13:08]: Right? So from our perspective, it's now possible for the first time to actually build a navigation system for work, so to speak. I will show you now how that looks and give you a very, very short demo. Again, it's the first one we've done; we've just deployed to the first customers in the last weeks. And so I'm pretty excited to dive into it. What does a work navigation system look like? It also explains how we do it, so it will be very visual in terms of how we actually solve that planning problem. Right. So as I said, work is nothing else but a chain of tasks, which means every job to be done, or every destination I want to go to, is nothing else but a case or a top-level task.

Fabian Jakobi [00:13:57]: In this case, it might have been created from an order system. As you can see, there was a new order, it was shipped on a certain date, and there's some context, maybe some of it missing, and already an email or some system information, or a colleague of mine tagged me with some information about the fact that the customer already experienced some delays. And as you can see, it's collaborative with humans, because a lot of that knowledge is implicit. So we can't rely on structured data and the leading systems to have all the information. A lot of this is actually known by the employees. So we believe that we need to capture the data where it currently is, which is the frontier of these processes. And so we believe in a very knowledge-worker-centric, human-centric experience. And we also believe that the UI is actually a massive component of how we get that productivity increase into production.

Fabian Jakobi [00:14:51]: And so if I, for example, were to now solve this, I could do it manually, which is as simple as a to-do list. But obviously that's not the fun part. The fun part is that the system can plan automatically. What it now does in the background is essentially go through all the feasible ways that were there. It will contextualize them using a combination of a knowledge graph, something very similar to process mining, and an LLM for the last mile to collate this information. And as you can see, it proposes as a first step to triage this to my colleague Jaime, who in this case would then be automatically tagged. And he told me, because I was missing this information, that this customer was actually Carpenter Solutions. So what if I change the context? I will reject this plan and replan. Given the new context, it will now go through and maybe come up with a different plan, because it clusters differently.

Fabian Jakobi [00:15:48]: As you can see, there is now a new task in here, which is to give the customer a 10% discount. Now this only happens because there was a precedent case for this. By the way, you can see here that it is clustering. We're using a rerank model here as well as hierarchical clustering to come up with the most similar precedents to a case like that. Then it will take the pathways, and it will see that on the pathways of all those similar cases there was a discount for specifically this customer. I will show you in a second how exactly that works on the back end, because we actually visualize that. We believe showing what the software does is going to be a key component of actually getting into production at massive insurance companies that run their claims on it. From our experience, they don't even want an immediate black box where everything is agentic, machine-to-machine automation, but rather an iterative approach. And we also strongly believe that that's the way to go.
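
The replanning step in the demo (new context, different precedents, a discount task appearing) can be sketched roughly as follows. This is a simplified stand-in and not the system shown in the talk: plain token overlap (Jaccard similarity) replaces the rerank model and hierarchical clustering, and the precedent cases, task names, `k`, and `min_share` are all made-up assumptions.

```python
from collections import Counter


def tokens(text: str) -> set[str]:
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-overlap similarity; a crude stand-in for a learned rerank score."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def propose_plan(context: str, precedents: list[dict],
                 k: int = 2, min_share: float = 0.6) -> list[str]:
    """Propose tasks seen in at least `min_share` of the k most similar precedents."""
    query = tokens(context)
    ranked = sorted(precedents,
                    key=lambda c: jaccard(query, tokens(c["context"])),
                    reverse=True)
    top = ranked[:k]
    if not top:
        return []
    # Count each task once per precedent case.
    counts = Counter(t for case in top for t in dict.fromkeys(case["tasks"]))
    keep = {t for t, n in counts.items() if n / len(top) >= min_share}
    # Follow the task order of the most similar case, then append any others.
    ordered = [t for t in top[0]["tasks"] if t in keep]
    ordered += sorted(t for t in keep if t not in ordered)
    return ordered


# Hypothetical precedent cases, invented for this example.
PRECEDENTS = [
    {"context": "late order carpenter solutions previous delays",
     "tasks": ["triage", "check shipment", "offer 10% discount", "send apology"]},
    {"context": "order delay carpenter solutions complaint",
     "tasks": ["triage", "offer 10% discount", "send apology"]},
    {"context": "late order acme first delay",
     "tasks": ["triage", "check shipment", "send apology"]},
    {"context": "invoice question acme",
     "tasks": ["triage", "resend invoice"]},
]

print(propose_plan("late order previous delays", PRECEDENTS))
print(propose_plan("late order previous delays carpenter solutions", PRECEDENTS))
```

With this toy data, the discount task only shows up once the customer name is part of the context, because the set of nearest precedents changes. A production system would presumably use learned similarity and far richer case data.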

Fabian Jakobi [00:16:52]: Because as I said before, a lot of the frontier data and company-specific data isn't even collected. It might still be in that 70% of manual work. Right. So let's go ahead and accept that plan. I could now go in and do the same thing for each subtask, right? That also means that we can work on more complex things over time. You could yet again change the context here, which would mean maybe there is another deviation from that plan. And going forward, although we haven't built that yet, we will be able to dynamically replan over time.

Fabian Jakobi [00:17:29]: So as I said, all of this is based on the actual precedents, found with a rerank model and a clustering algorithm, which obviously is pretty good because it's built on top of a knowledge graph. Which also means that oftentimes we can use the relationships and not the LLMs, to make it more reliable on the input side. But the one thing we want to present today for the first time is that we have built what we call the flowchart, or we actually call it Pathfinder internally, which effectively visualizes exactly these points. So let's, for example, go to a similar case here that we might have seen before. Right? Let me choose one. Maybe we just take the one that we were just in, for example this one. It's a case that's been resolved. As you can see, all the tasks are marked.

Fabian Jakobi [00:18:18]: It was completed. I could go back to the thread and read how the humans or the agents actually worked inside this case. But let's look at the flowchart. What the flowchart essentially does is visualize the different patterns of the similar processes. There's one process that we are currently in, which is this case 61429, and we can actually see where this went. We still have to work on the layout a little bit. Right now it goes all over the place. In the future it will go from top to bottom.

Fabian Jakobi [00:18:50]: But what we can already see is that it follows certain clusters. The size of the dots, by the way, shows the size of the cluster. Which means many of the precedent cases that we saw before actually run through a cluster. By the way, this is using GPT mini. We're completely model and cloud agnostic, so you can choose the models for each of the sub-agents that do most of this work. But in this case, GPT mini is also dynamically labeling these clusters for us. So after we have created them, we just dynamically label them, which means suddenly they have a lot of meaning, and it will say, well, there's invoice management and client follow-up. I mean, that's not really a surprise given that it's an order case. But there are also some things that might be more interesting.
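
One plausible way to derive the numbers behind such a flowchart is the directly-follows counting used in process mining. The sketch below assumes each precedent case has already been reduced to an ordered list of labeled task clusters; the traces are invented (only the case number 61429 and the cluster names come from the talk), and this is not a description of how Pathfinder is actually built.

```python
from collections import Counter


def build_flow(traces: dict[str, list[str]]) -> tuple[Counter, Counter]:
    """Count cases per cluster (dot size) and cluster-to-cluster transitions (edge weight)."""
    node_sizes: Counter = Counter()
    edges: Counter = Counter()
    for path in traces.values():
        node_sizes.update(set(path))        # each case counts once per cluster
        edges.update(zip(path, path[1:]))   # directly-follows pairs
    return node_sizes, edges


# Hypothetical traces: case id -> ordered cluster labels the case ran through.
TRACES = {
    "61429": ["case triage", "invoice management", "client follow-up"],
    "case_a": ["case triage", "invoice management", "client follow-up"],
    "case_b": ["case triage", "client follow-up"],
}

node_sizes, edges = build_flow(TRACES)
for cluster, size in node_sizes.most_common():
    print(cluster, size)
for (src, dst), weight in edges.most_common():
    print(f"{src} -> {dst}: {weight}")

# Highlighting the current case is then just a lookup of its own path.
current_path = TRACES["61429"]
```

Labeling the clusters with a language model, as described in the talk, would be a separate step that is not shown here.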

Fabian Jakobi [00:19:35]: For example, there is a case triage cluster. And as I said, you can highlight the current case we're on, so you'll see it runs through this cluster. But let's expand that, for example, and we'll also see that there's another case triage in here. And as you can see from the title, these ones all went to Fabian, which by the way the current case did not, while all of these ones went to Jaime. The same thing with the invoice generation tasks, right? I can go all the way down to where this specific case went. But more importantly, for the first time, it will be possible for possibly thousands of employees to just work in a case management solution, unstructured, the way they would work with Teams and Slack. And it would essentially, out of that reality, infer the right plans that are grounded in precedent, right? And even make all of that 70% of manual work accessible both to humans.

Fabian Jakobi [00:20:31]: Which is great, because that's why we built it: we believe humans will really like that control, that they see what is actually behind it. But the reality obviously is that this also just shows exactly what we're feeding to the planning agent, which then will come up with the plan. So that was essentially just showing you the product and that approach, and to finish off the whole presentation, that also ties me back to the actual title of the presentation, right? We believe that if we solve the planning piece, we actually have a shot at 10x, 20x, 30x productivity and above for knowledge work in end-to-end processes, even the complicated ones. For the simple reason that if you find the right plan based on precedent, it's grounded, it's not just made up, right? We use agents everywhere, but effectively it's still a plan rooted in reality. As we all know, models are much better if they have really good context and the right input data than if they just make it up on the fly, specifically if it's data that's not even in the public domain. And once you have each of the steps, we can assign human agents and AI agents to each of the sub-steps, to show you that.

Fabian Jakobi [00:21:48]: That's also how we go from, for example, search, which means collecting exactly the right memories and information and entities, like, for example, who worked this task before in similar cases. We can not just tag humans, we can also go back and tag agents on each of those sub-cases. Now I have to log in again. For example, I could go in here and tag an agent that's called Odin. We just heard from Stack Overflow that they are offering an API. So we could also take external agents that have access to other knowledge and literally ask questions very specific to that case. Just like with humans: for example, send an email to the customer with an apology and the 10% discount. Right.

Fabian Jakobi [00:22:42]: And Odin, who is our general agent, essentially the project planner, will now... obviously behind it, it's just a plain vanilla LLM, GPT right now, and obviously you could choose that. But as we add more agents to the platform, these agents can be assigned to any of these subtasks, already starting to search the Internet, search the internal repository, and search the knowledge base of those companies to collate as much information as we can. And the second we have the right precedent information and knowledge for each of those subtasks, essentially we call them mini knowledge graphs, right, like a knowledge graph around each of those task clusters we just saw, we will be able, from our perspective, to automate a lot at that singular task level, even with machine-to-machine automation. Because the smaller the problem and the higher the fidelity of the context, the more accuracy we get out of the agents. And while there is massive value in trying to make the models larger and larger and give them more capabilities, we believe they're already good enough if you make the problem small enough and assign it to the right agents with the right capabilities.

Fabian Jakobi [00:23:48]: Just as you assign tasks to the right humans with the right capabilities and skills, to essentially get the highest amount of throughput and the highest amount of productivity for end-to-end processes at medium and large companies. And so that's kind of our approach: grounding it in reality and inferring it from precedent. And yeah, that was it. Skylar, I'm happy to take questions, but thanks for listening, and yeah, super excited to also connect later on LinkedIn or anywhere and chat about agents and the future of planning.

Skylar Payne [00:24:24]: Yeah, definitely. Thanks for coming. Thanks for sharing your knowledge. I'm going to switch over and see if we have any questions or anything going on in the chat. I'm seeing lots of thumbs up, and it seems some folks want to connect. I don't see any specific questions. A reminder to everybody: if you have questions, you can go ahead and throw them in the Q&A section. All right.

Fabian Jakobi [00:24:58]: I mean you can always reach out. Happy to connect to our AI teams and we're always happy to chat to whoever is currently building agents or providing APIs for agents to come into these orchestrated workflows. And I'm super happy to also connect afterwards.

Skylar Payne [00:25:16]: Awesome. Yeah, I'm sure some folks will take you up on that. All right, we have one now. Do you manually review all the tasks and resolutions suggested by the agent?

Fabian Jakobi [00:25:28]: So we believe that this should be up to the company. Right. We're planning to build what we call accuracy control into the platform sometime around the end of next year, and essentially it will be a combination of looking at precedent cases that were very similar, where we also know that they're not very complex. Right. And then we're actually going to do output QA and decide whether or not we think we can reach a certain accuracy level. That's a statistical prediction that we use to decide if we just let the agent go ahead. Right. But from my experience working with very large customers here in Europe and in the States, we signed about 20 co-design partners, which are anywhere from 500 employees to, I think the largest one has 80,000.

Fabian Jakobi [00:26:17]: We feel that they don't want to have a full self-driving platform yet. I think it's absolutely fine to augment a lot of it and just speed up the end-to-end process. And as you saw in the Pathfinder, we will be able to show how long that process took as well, just like Google does. So we'll be able to show very quickly that it already yields a lot of value, and then over the next, let's say, one, two, three years, all of that precedent is collated, because yet again, most of the data doesn't exist in the current form. Right. And we don't believe it's going to be as simple as taking the inbox and just importing it. I mean, we will do this as well to get a first step, but we believe it needs to be an iterative approach, with knowledge workers and agents working together to come up with enough precedent for us to then make that prediction and essentially try to literally dark-automate some of those simpler tasks. So we actually believe the customers want to.

Fabian Jakobi [00:27:11]: Obviously we could now just turn this on and say just do it. Right. So it's up to the use case and to the customer, depending on how much risk there is, because we're not doing this for stuff like sending outbound emails or cold emails. Right. We actually want to run the claims processes or the import-export processes at logistics companies. So I hope that answered the question a bit.
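
The planned accuracy control is only described at a high level in this answer. As one possible reading of "a statistical prediction in order to decide if we just let the agent go ahead", the sketch below gates automation on a conservative estimate of the success rate observed on similar, QA-checked precedent cases. The Wilson score lower bound, the 95% target, and the function names are assumptions chosen for illustration, not details from the talk.

```python
import math


def wilson_lower_bound(successes: int, n: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for an observed success rate."""
    if n == 0:
        return 0.0
    p = successes / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom


def route(successes: int, n: int, target: float = 0.95) -> str:
    """Let the agent proceed only if the conservative estimate meets the target."""
    if wilson_lower_bound(successes, n) >= target:
        return "agent"
    return "human_review"


# Ten perfect precedents are not enough evidence; five hundred are.
print(route(10, 10))
print(route(500, 500))
```

The design choice here is that a small number of flawless precedents still routes to human review, which matches the iterative approach described above: automation of a task type only switches on once enough precedent has accumulated.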

Skylar Payne [00:27:32]: Awesome. Yeah, thank you so much for sharing. We're at time now, but really appreciate it. Yeah, take care.

