
Managing Small Knowledge Graphs for Multi-agent Systems

Posted May 28, 2024
# Knowledge Graphs
# Generative AI
# RAG
# WhyHow.ai
SPEAKERS
Tom Smoker
Technical Founder @ WhyHow.ai

Technical Founder of WhyHow.ai. Did Masters and PhD in CS, specializing in knowledge graphs, embeddings, and NLP. Worked as a data scientist to senior machine learning engineer at large resource companies and startups.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.

SUMMARY

RAG is one of the more popular use cases for generative models, but there can be issues with repeatability and accuracy. This is especially applicable when it comes to using many agents within a pipeline, as the uncertainty propagates. For some multi-agent use cases, knowledge graphs can be used to structurally ground the agents and selectively improve the system to make it reliable end to end.

TRANSCRIPT

Join us at our first in-person conference on June 25 all about AI Quality: https://www.aiqualityconference.com/

Tom Smoker [00:00:00]: There. So, name? Tom Smoker. I was on a call yesterday where someone said, is that a fake Zoom name? I was like, no, this is real. This is my genuine family name. Company is WhyHow.ai. Title? Yeah. Technical founder, CTO. How I take my coffee? It's coffee-flavored water.

Tom Smoker [00:00:18]: It's just boring and black. It's, yeah, about as basic as it gets. But I am from Australia, so I actually do love really good, kind of like, single origin stuff. I'm very, very into coffee, but always black.

Demetrios [00:00:33]: Welcome back to the MLOps Community podcast. I am your host, Demetrios. And today, talking with Tom Smoker, I had a blast just thinking about knowledge graphs, learning how he is using them to make his RAG systems more reliable. And really, what he said that stuck with me and will stick with me, I'll probably end up quoting this every two out of three episodes, so you heard it here first: be exactly wrong. And that's what he's going for when he's creating his systems. He's trying to make sure that he can be exactly wrong instead of exactly right, which he breaks down in more detail.

Demetrios [00:01:17]: He talks about how he wants to be exactly wrong because, for the types of systems that he is breaking down, it is the start of creating a deterministic system with AI and with LLMs. And one way that he's found to do that is by using knowledge graphs. One piece of wisdom that he dropped on me with the knowledge graphs is that rice to a farmer is not the same thing as rice to a chef. And so you want to have a knowledge graph that can understand how these two things are linked but different. For those of you that have been wondering how to bring your RAG up a notch, this is the conversation for you. However, he did warn at the end of the conversation that he's seen people dive into trying to optimize their RAG systems a little bit blindly. So he breaks down how you can make your RAG system 100% accurate and perfect, flawless.

Demetrios [00:02:30]: But what kind of business value does that actually bring? What is the "so what" at the end, basically? Yeah, you created this chatbot that can tell you how many days of holiday you have left, but so what? And how much engineering time or how many resources went into having to really optimize it, so you have this chatbot that can tell you the HR policies, when maybe just a search bar is sufficient? Let's get into it with Tom. If you liked it, as always, share with one friend or give us a star. Give us some feedback on Spotify. I'm reading all of them. See you on the other side.

Demetrios [00:03:23]: I recently read a paper on how to mitigate hallucinations, and it was basically 33 different techniques for mitigating hallucinations with different strategies. And one of the strategies was using knowledge graphs. So it's probably good for us just to start with what got you down the rabbit hole of knowledge graphs. I think it was academia, and we were also talking about your shirt. For anybody that's just listening, you've got on the arXiv shirt, you know, showing it proud. So give us a quick rundown on how you got deep down that rabbit hole.

Tom Smoker [00:03:59]: Yep. Yeah, yeah. So, academia, absolutely. I did a master's in electrical engineering, and I got exposed to a specific thesis with a large public mining company called BHP when I was working in Australia. My supervisors, Tim French, Wei Liu, and Melinda Hodkiewicz, were amazing, and they really started to suggest it. Tim specifically was my primary supervisor, and he's a logician, so he works in computer science, works in math, but he's a logician. So this was 2016, 2017. And he was like, there's representation, there's ontology, there's these things called knowledge graphs. Neo4j was starting to pop off, and it was like, let's understand and see what representation looks like. This is when computer vision was really starting.

Tom Smoker [00:04:43]: So from there, and I won't bore you too much with the details, but from there, I really enjoyed it. Did a master's comparing expert systems and ontologies and knowledge graphs, enjoyed it. Published my research, got myself into a PhD program. Same supervisors, now doing a PhD in CS, and did the same: knowledge graphs and knowledge graph embeddings, when embeddings were really starting. So I've been doing this for a minute. Yeah, the PhD started in 2018.

Demetrios [00:05:07]: Can I just point out that logician may be my new favorite word? Cause it's basically like a magician, but for logic. Is that how you rationalize it?

Tom Smoker [00:05:16]: That's how I think about it. He's got some great quotes, too many to fit into this podcast. I got a lot of respect for him, but one of his best is that he doesn't code. He doesn't code in a language; he codes in pen and paper. And I always think about that a lot. It's the purest form. Wow.

Tom Smoker [00:05:30]: But, yeah, absolutely. I think logic is a constantly relevant, you know, art and science, but very much, how do we think about what happens? And so I did an undergraduate master's in electrical engineering, but I actually minored in philosophy. And so the combination there was really common: philosophy of logic, philosophy of science, and representation.

Demetrios [00:05:53]: So tell me about knowledge graphs and how they are incorporated into RAG, because it feels like everybody's talking about them. It feels like something that is coming up because of this mitigation of hallucinations, where you can get better results in some cases. And I want to go down which cases those are. But just give us, like, the breakdown of what you've been seeing lately.

Tom Smoker [00:06:14]: Yeah. So the context for WhyHow is, my co-founders and I were working in this space early last year, building legal RAG, right, which was information retrieval, but really, like, hey, you can type a question in natural language and you get an answer back. And we started to get these hallucinations. And I want to say off the top, and I said this in my talk at KGC recently, LLMs only hallucinate, or models only hallucinate. Like, the amazing thing about generative AI is that they're creative. And I think Andrej Karpathy says it really well.

Tom Smoker [00:06:44]: It's a much broader quote, and there's a tweet about it, you can find it. But LLMs don't have a hallucination problem; LLM assistants have a hallucination problem. And I think at the time he said that, it was really right: we're happy with the hallucination until we're not. And so we would get people saying it's hallucinating, all these edge cases. And I think at the time, misattribution was a really common hallucination, so someone quoting someone else.

Tom Smoker [00:07:06]: But when you look at the models, we had to rethink what we were doing, because we were working in very specific, very high value legal cases, which were like, if I offer this person this amount of money or this percent in this fund, does this trigger any clauses? And so these questions can't be wrong. So we were like, look, the risk of hallucination, or the risk of error if everything's a hallucination, is a problem. So we had to think about, well, how do we represent, more specifically, how do we create determinism for ourselves, even if that's scoped in? And again, because of my background, we came to knowledge graphs, we came to representation and some level of, like, visibility and explainability in the answer. So then you can adjust, constrain, and then repeatably produce the results that you want.

Demetrios [00:07:53]: So this is the fascinating thing. The repeatability I think everyone is going to be on board with. Now, when you talk about knowledge graphs, I had been vaguely familiar with things like Neo4j, or I think there's other ones like, uh, ArangoDB. And you get these, basically, knowledge graph database type tools that are out there. Is it just replacing a vector database with a knowledge graph database? Is that how you see it, or do you add both? What do you look at when you're designing the system?

Tom Smoker [00:08:29]: Yeah, so at WhyHow, we're workflow tooling on top of infrastructure, and we work with a bunch of different infrastructure people. So there are graph databases; Neo4j we work with. I'd like to give a shout out to KùzuDB out of Waterloo as well. They've got an amazing product; they're like the DuckDB for graphs, and I think what they're doing is really cool. Use them. But we use vector as well. I think there's semantic similarity in vectorization, especially embeddings, and different types of embeddings I think are really underexplored.

Tom Smoker [00:08:57]: And so for us, information retrieval is a process. It's not a one-off; it's a system. In our case, it's a multi-agent system. And some of that agency is going to use vectors and semantic similarity and representation. Some of it's going to use graphs and knowledge graphs. So at its simplest form, you can replace a vector database with a graph database, with a relational database. They're all similar in that you need to get information and you want to augment the call that you have. But I think if you look at it as a pipeline, as a system, different databases apply in different ways.

Tom Smoker [00:09:31]: And at WhyHow we use all of them, and I personally use all of them, in terms of where they're most applicable.

Demetrios [00:09:38]: So how is this working? Do you have like a proxy that decides which database to go to?

Tom Smoker [00:09:44]: Right? Yeah, so I mean, there are ways to do that. I mean, LangChain has awesome stuff around function calling and tools and how best to call from each one. And I think conditionally doing things, you know, conditional logic in agents, is starting to be really promising. But for us it's actually a sequential pipeline. So we chunk, embed, and then store things in vector databases, and then we use that vector database to retrieve and then build a graph. So it's actually in sequence, in the way that we like, as kind of an opinionated workflow. And so it's all part of a larger system, and it's taking advantage of each of the benefits. Ultimately, what I want to be querying is something that gives me the same answer every time.
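[Editor's note: to make the sequential pipeline above concrete, here is a minimal, illustrative Python sketch of a chunk, embed, retrieve, then build-a-graph flow. The toy hash "embedding" and the hardcoded triples stand in for a real embedding model and an LLM extraction step; none of the names are WhyHow's actual API.]

```python
# Minimal sketch of the sequential pipeline described above:
# chunk -> embed -> store in a vector index -> retrieve -> build a graph.
import hashlib
import math

def embed(text: str, dim: int = 8) -> list[float]:
    """Toy deterministic 'embedding': hash words into a fixed-size vector."""
    vec = [0.0] * dim
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# 1. Chunk and embed, then "store" in an in-memory vector index.
chunks = [
    "Rex is a labrador, age 9, treated for hip dysplasia.",
    "Hip dysplasia is common in large breeds.",
    "The clinic is open Monday to Friday.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2. Retrieve the top-k chunks for a question.
query = embed("what disease does Rex the labrador have?")
top = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)[:2]

# 3. Build a small graph (subject, relation, object triples) from the
#    retrieved chunks only. In practice this extraction step is an LLM call.
graph = [("Rex", "is_a", "labrador"), ("Rex", "treated_for", "hip dysplasia")]
print([c for c, _ in top])
print(graph)
```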

Tom Smoker [00:10:27]: We spend a lot of time trying to be exactly wrong, which I think is a cool term in this, because I think you can be almost or probably right really quickly. But being exactly wrong is the first step to being exactly right. And so putting it in a graph for us was a good way to store and then see; also, you can kind of see the schema and then change the schema. And I think that's really powerful.

Demetrios [00:10:49]: What do you mean by exactly wrong? That is a funny phrase. I instinctively laugh, but I'm not sure I fully understand it.

Tom Smoker [00:10:56]: I don't know if it's a broader term. It's what we talk about internally. When people talk about determinism in anything, in LLMs, again, they're generative, right? I don't think you can get determinism out of an LLM, and I also don't necessarily think you'd want to. They're amazing. I went to a talk the other night at Freepik around creativity and the creator economy when you're supported by these generative models, and I think it's amazing what can be done. But I think getting things exactly right is a really, really hard ask, and I think it's difficult to guarantee. It's much easier to get it exactly wrong; I think you can prove that really quickly. So for us, we can make a graph and look at it and be like, this does not represent what I want it to represent, right? And that's a step towards progress, because even if I get it exactly wrong, that means my answer's going to be wrong every time.

Tom Smoker [00:11:45]: And for us, at least, if you walk back, that's the step you need to be at. Because you can make it probably right, but then what does that mean for a customer? What does that mean for a product? What does that mean for reliability and consistency? So exactly right is obviously the goal. I think it's lofty, and I think it's harder to prove. Exactly wrong is like, okay, now we have a framework for determinism, and now we can iteratively improve it to get it where we want to.

Demetrios [00:12:09]: Oh, okay, I see where you're going with that. Now, this is fascinating stuff to think about. But how does the data and what you're doing in the system change? Because I'm very used to this idea of, all right, we've chunked it, we've got the embeddings, we throw it in the vector store and all that. Do you need to now be collecting new data? Do you need to have different sources? How can you make sure that the knowledge graph is as robust as it can be?

Tom Smoker [00:12:47]: Yeah, we just put out an article about this. Broadly, there's a four-square matrix about this, which is: does the underlying data change or not? Is the underlying data static or dynamic? And then, is the schema static or dynamic? Right. And so knowledge graph schemas are really interesting. You can call them ontologies if you want; I think the lines blur with knowledge graphs. I think the term has been taken. There isn't a standard definition. There's a pretty good one from Ontotext that I like. But I think people use it in such a broad way because it's so accessible.

Tom Smoker [00:13:19]: People can just see a graph and be like, oh, I get it. And then it's very difficult to do in practice. So it's very accessible at a high level, very difficult at a low level. So what changes there is all the specifics around it. But the way we think about it is: create the schema, which is kind of like the types of what we call nodes and edges, and that can change. For example, that's like people, place, price, those broad categories. And then there's the underlying data; as you can imagine, there's different people in your database, right? There's different prices in your database. So that's an example of it changing. And underlying data almost always changes. To give an example, we work with a veterinary radiology clinic, right?

Demetrios [00:14:01]: Yeah.

Tom Smoker [00:14:03]: The diseases that can happen to an animal don't change that often. That's in the literature, right? The patient-specific data, that can change a lot. So those two are in combination. And so the schema, if a new disease is discovered, sure; if a new type of disease is discovered, then that's different. And so it's constantly moving and shifting. All of this is to say that why we like using graphs for this particular problem is that the schemas are extensible. The data is extensible.

Tom Smoker [00:14:33]: It loads in, queries back pretty easily. You have some level of structure that you don't necessarily get if you embed in latent space. And then you get some level of flexibility and change that you can't necessarily create easily in a relational database. And so that's where it's kind of the middle ground, which is useful. I hope that answered it. I'm not sure if that answers it.

Demetrios [00:14:53]: If I'm understanding what you're saying correctly, it's basically that a knowledge graph over-performs, or performs best, when you have almost like this structured data, with things that are going to be changing but also things that aren't changing. So that's where, if you talk about this veterinarian, you have the diseases, but the patients can get new dogs or new cats, they can change their house, whatever it may be. Their dogs get older, and so all of those things are changing. But then there's certain things, like the way that this disease manifests itself, that's not really changing.

Tom Smoker [00:15:37]: Yes. And so this is why we like the idea, we're promoting this idea, but it also just makes our life a lot easier: it's easier to make a smaller graph than it is to make a big one. And I spent a lot of time in academia and in industry working on enormous large-scale graphs, in conferences and then in practice. And in the two situations I just described to you, if our goal is to create some structure that we can change when we're doing our retrieval, right, that's difficult in just pure embeddings, and that's difficult in relational databases. Not undoable, it's just difficult. What we like is being like, well, I'd rather just make two graphs. I'd rather make one graph that has the diseases, that I can iteratively bring in over time, because I don't want to bring in all of them, I just want to bring in the ones that matter to me. And then I'd also rather have a separate graph that is the patient information. Two different schemas, two different graphs, and then use them to kind of logically compare, because again, they have different dynamics.

Tom Smoker [00:16:39]: One is changing, the age is changing, much more frequently than the diseases that can happen change.
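[Editor's note: a minimal sketch of the two-graph pattern described here, one slow-moving "literature" graph and one fast-moving "patient" graph with separate schemas, compared logically at query time. The names and structures are illustrative, not WhyHow's API.]

```python
# Two graphs with different dynamics, each with its own small schema.
literature_schema = {"nodes": ["Disease", "Species"], "edges": ["affects"]}
patient_schema = {"nodes": ["Patient", "Species"], "edges": ["is_a", "diagnosed_with"]}

# Static graph: the diseases in the literature rarely change.
literature = [
    ("hip dysplasia", "affects", "dog"),
    ("feline leukemia", "affects", "cat"),
]

# Dynamic graph: patient records change all the time (new visits, new ages).
patients = [
    ("Rex", "is_a", "dog"),
    ("Rex", "diagnosed_with", "hip dysplasia"),
]
patients.append(("Whiskers", "is_a", "cat"))  # underlying data grows freely

# Logically compare the two: which literature diseases could apply to Rex?
species = next(o for s, r, o in patients if s == "Rex" and r == "is_a")
candidates = [d for d, r, sp in literature if r == "affects" and sp == species]
print(candidates)  # ['hip dysplasia']
```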

Demetrios [00:16:48]: So are you using knowledge graphs mainly to sanity check what's retrieved from the vector database?

Tom Smoker [00:16:55]: In a lot of ways. And so that's one of the ways. We can also go straight from CSV to graph, right? That's a level of structured data we can go straight from. I think LinkedIn recently put out a paper around the Jira API, like that semi-structured data; we can go straight to graph from that. But when it comes to sanity checking, yes, I really like that as a term. Because if you're looking at unstructured data, if you're looking at a PDF, it's not templated, right? It's just raw text. If you're also looking at combining that PDF with the output of the LLM, which is really useful, yes, it's very useful to look at what it made and be like, hey, that's almost right, but here's where it's wrong.

Tom Smoker [00:17:32]: And then most graphs are full ACID or full CRUD, right? So it's like, whatever I want to do, I can just delete this one if I want to. And now I have the repeatability, but I got rid of the thing that was off.

Demetrios [00:17:47]: Yeah. Because, if I remember correctly, the way that I saw knowledge graphs being used was to check for facts in the output. And so there was, I think, a smaller model that was trained to check for facts and then query a knowledge graph to see if those facts were true. It was query a knowledge graph or search the Internet, I think it was both of those. But basically, if you said, you know, this person was alive in 1920 in your output, the smaller model plus the knowledge graph is going to be able to fact-check that really easily.

Tom Smoker [00:18:24]: Yeah, absolutely. I think there's a broader term called structured grounding, which I have seen recently, which I really like, which is like, hey, we have this external fact set, basically like structured Wikipedia. So when I ask this question, I'm getting the exact answer back that I want, right? Like, we're working with a customer at the moment who has, like, an HR or customer care chatbot, right? And it's like, when someone asks, I want to call someone about this problem, which phone number do you bring in? And that can be a very difficult thing to find from raw text, right? And an interesting hallucination, and again, I use that word, I think it's difficult to use that word, as I've said, but an interesting hallucination that I've noticed is mean reversion, which I see a lot. I think a lot of people think about hallucination as the edge, right? You lead an LLM into a dark room and then say, figure your way out. And then it comes up with this, whatever, this magical new thing, whatever it may be.

Tom Smoker [00:19:25]: But if you look at something like code generation, and you look at the language Solidity, which is heavily based on JavaScript, a lot of the hallucinations in generating Solidity that I've noticed, and I haven't seen research on this, is JavaScript syntax, because it's reverting to the mean of what's most likely. And that seems really common with these models.

Demetrios [00:19:44]: Oh, fascinating to think about that. Like it's just going back to what it knows.

Tom Smoker [00:19:49]: Yeah. Or its likelihood, right. And so it's like, let's just find the most likely thing. And so in the veterinary example, instead of finding the rare disease that could be the value of the diagnosis, it naively finds a disease that is common to a similar animal. And again, I want to be specific about the vectorization. I mean, LlamaIndex has some amazing stuff around this. Like you've said, it's probably an infographic of theirs or a post of theirs that you're referring to, all the types of hallucination.

Tom Smoker [00:20:20]: There are many ways that you can improve this process. Knowledge graphs are one of them, and one that we like, but I don't want to suggest it's the only way; what I described there is quite naive retrieval, and there's a lot of different ways to do it. But for us, yes, the control and the specifics and the scope are really important, so that the LLM is only hallucinating within the bounds of what we told it. One of the best hallucinations I've seen, I don't know if you've seen this, but it was a TensorFlow hallucination. You know if you've used the Adam optimizer before in TensorFlow? Right.

Tom Smoker [00:20:56]: I saw a post on LinkedIn, it got a bit of attention, I think, but I don't know which model it is, and I don't want to say which model just in case, but it hallucinated Adam as John, and then it got the error. It got a compiler error that was like, oh hey, there's no John, you know, method or property. And I thought that was really funny. But it was funny on the surface and then quite worrying in the absolute, because then an entirely unbounded latent space with mean reversion is what we're dealing with. And so for us, we're like, look, let's just keep the scope really specific. These things are going to hallucinate, that's their job. Let's let them still be the bumper car that they are, but put the guardrails in really specifically, and graphs for us are those guardrails.

Demetrios [00:21:44]: I really like this idea of basically tightening up the scope as much as possible so that you don't allow the knowledge graph or the LLM or the embeddings to play in this infinite space. You're really trying to say, here's where it is, just go look over here, it's going to be there, I promise, type thing. What are other ways that you can do that? Or rather, what are other examples of you doing that, I guess, is a better way of putting that question?

Tom Smoker [00:22:17]: Yeah, no, I mean, I can talk to a bunch of ways outside of graphs. I'm not sure if it's in the context of this, but, you know, reranking from Cohere is great. There's the agentic workflows, which I'm happy to talk about; we do a lot of multi-agent stuff internally. Having checker agents, having conditional agents, I think works really well. Again, basically anything advanced RAG from LlamaIndex, and I said this in my talk, is a really good way to start with this, because they've structured it out really well: what does pre-processing look like, what's optimal chunking. LangChain has similar stuff. Both of those groups have really explored this and seen this really well. So again, it's a process, not a one-off.

Tom Smoker [00:22:52]: And so there's a lot of incremental ways you can improve. But yeah, for us, with the scope, to give further examples of scope: why we think about smaller graphs rather than one big one is that a motivating question for us is, rice to a farmer is different to rice to a chef. So how do you put rice in a database if you want to create some level of consumer interaction? Because there's a buyer and a seller there, right? And so it's like, well, I can't just have a rice node or a rice cell, I have to have some other context. How do I represent it? And so for us, scoping in, this is the farmer schema and this is the chef schema, then rice within that is in context, which I think is really good. So scoping in contextually is something that we work on a lot for this stuff.
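[Editor's note: a small sketch of the context-scoping idea: the same "rice" label carries different relations under a farmer schema versus a chef schema, so each scoped graph stays coherent. Purely illustrative.]

```python
# The same entity label under two scoped schemas: each graph answers
# questions about "rice" within its own context, never both at once.
farmer_graph = {
    ("rice", "grown_in"): "paddy field",
    ("rice", "harvested_in"): "autumn",
    ("rice", "sold_per"): "tonne",
}
chef_graph = {
    ("rice", "cooked_by"): "steaming",
    ("rice", "pairs_with"): "curry",
    ("rice", "portioned_per"): "cup",
}

def describe(graph: dict, entity: str) -> list[str]:
    """Answer questions about an entity only within one scoped context."""
    return [f"{entity} {rel} {obj}" for (e, rel), obj in graph.items() if e == entity]

print(describe(farmer_graph, "rice"))  # the farmer's rice
print(describe(chef_graph, "rice"))    # the chef's rice
```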

Demetrios [00:23:43]: Give me more examples. I love that, the rice to the farmer and rice to the chef. And of course, when you think about it in a knowledge graph way, it makes complete sense, because you understand the relation between those two. And it also makes sense why it would break down in a traditional relational database.

Tom Smoker [00:24:03]: Yeah, I probably should have said this at the top. I mean, I've been a part of the graph and representation community for a long time, both in practice and in research. And there are great use cases for large, ontologically aligned, well-described knowledge graphs, and I used them a lot in my PhD, around, like, Freebase and YAGO. I think they're great. I'm specifically talking about trying to solve RAG problems, and I'm trying to use this architecture to solve RAG problems. So I want that to be clear, because I think it's difficult to maintain those worldviews in this context.

Tom Smoker [00:24:35]: But another example would be, I worked with a legal document, and they had definitions. A lot of legal documents have a definitions page. And they would say "vehicular capacity," right? And just reading that, what do you think vehicular capacity is going to mean?

Demetrios [00:24:50]: How many people fit in the vehicle.

Tom Smoker [00:24:53]: Cool. It's how many cars fit on a road.

Demetrios [00:24:55]: Oh, yeah.

Tom Smoker [00:24:56]: And so it has to be in context. You can figure this out, but how do you put that context in? So it's like you need to tie that definition to that context every single time. And then as you start to try and represent the whole world, there's logical inconsistencies that happen really quickly. Plus, if you want to update over time, how does that work? So, yeah, it's those sorts of problems where we're like, scope it, right? Scope in, and then now you can talk about it as if you're in this context.

Demetrios [00:25:28]: Have you seen when a knowledge graph is too small, or is it ever too small?

Tom Smoker [00:25:34]: Yeah, I think that's a great point. So when I talk about small versus big, it's really a very accessible way for people to think, because I've done this before, I know all the pain points. Making a big knowledge graph is great in theory and so difficult in practice. The ROI, like, there are some great use cases for it, but I think that some people stop before they build it properly. And so for me, when I say a small graph, it isn't a small domain. It's really like, start with a well-connected subgraph first, almost like a just-in-time graph, and then build that out as you go. So when is a graph too small is an interesting question, because the answer is, when there isn't enough context to answer your problem, I would say. But I think that's often the point of the graph: efficient navigation.

Tom Smoker [00:26:15]: And traversal. But really, you only need a couple of nodes to answer your question. We think a lot about completeness and why graphs are useful for completeness. If you look at diagnoses, this would be an example where vector databases may struggle, because it's not necessarily a quantitative use, right? It will give you the best diagnoses back. But if your job as a medical professional is to look at all the available diagnoses and then make a decision based on patient history, then you actually want all of them back, not just the most likely or the best ones. And so with something like a graph, you can represent the entire thing, and then you can get back the full scope of information. So that's how small the graph is: as long as it has all the diagnoses in it, that's really as big as it needs to be.

Tom Smoker [00:27:04]: Again, just in time. It's just whatever answers your question.
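[Editor's note: a toy illustration of the completeness point: a graph traversal can return every linked diagnosis, whereas a top-k similarity search returns only the k most similar and can silently drop a rare but critical one. Data and names are invented.]

```python
# Exhaustive graph traversal versus top-k similarity retrieval.
edges = [
    ("dog", "has_possible_diagnosis", "hip dysplasia"),
    ("dog", "has_possible_diagnosis", "parvovirus"),
    ("dog", "has_possible_diagnosis", "rare disease X"),
]

def all_diagnoses(species: str) -> list[str]:
    """Return *all* diagnoses linked to a species, not the most likely k."""
    return [o for s, r, o in edges if s == species and r == "has_possible_diagnosis"]

print(all_diagnoses("dog"))
# A top-k=2 similarity search over the same facts could drop
# "rare disease X", which is exactly the case a clinician must not miss.
```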

Demetrios [00:27:07]: And are you creating new knowledge graphs for each use case?

Tom Smoker [00:27:13]: Yeah. So a lot of times, yes. And again, I'm more than willing for people to check us out, but very quickly, our job is just to make this easy: how quickly can you make a graph, and how quickly can you change it? But that's the whole process, which is, don't sit off and wait. We can get it down to, like, a minute. Just toss in a PDF and a schema, however many pages you need, immediately see it, be like, this sucks, change it, fix it, and then iterate. And so, yes, graph management is something that we do. But when you look at trying to solve the edge cases, I shouldn't call them that.

Tom Smoker [00:27:45]: But, like, the difficult problems in RAG, the ones that aren't immediately solvable, they are point solutions, right? It's not one-size-fits-all. It's usually at the bounds of what is represented, at granularity or at condition or at specificity. So, yes, making a graph per thing. So our job is to make that as quick as possible.

Demetrios [00:28:01]: All right, real quick, let's talk for a minute about our sponsors of this episode, making it all happen: LatticeFlow AI. Are you grappling with stagnant model performance? Gartner reveals a staggering statistic: 85% of models never make it into production. Why? Well, reasons can include poor data quality, labeling issues, overfitting, underfitting, and more. But the real challenge lies in uncovering blind spots that lurk around until models hit production. Even with an impressive aggregate performance of 90%, models can plateau. Sadly, many companies optimize model performance for perfect scenarios while leaving safety as an afterthought.

Demetrios [00:28:45]: Introducing LatticeFlow AI, the pioneer in delivering robust and reliable AI models at scale. They are here to help you mitigate these risks head on during the AI development stage, preventing any unwanted surprises in the real world. Their platform empowers your data scientists and ML engineers to systematically pinpoint and rectify data and model errors, enhancing predictive performance at scale. With LatticeFlow AI, you can accelerate time to production with reliable and trustworthy models at scale. Don't let your models stall. Visit LatticeFlow AI and book a call with the folks over there right now. Let them know you heard about it from the MLOps Community podcast.

Demetrios [00:29:30]: Let's get back into the show.

Demetrios [00:29:33]: Yeah, so now let's go into the points that you mentioned on when knowledge graphs aren't the most useful in RAG situations.

Tom Smoker [00:29:43]: Yeah, yeah. So, I mean, I use vector database RAG; the vector database and semantic similarity work really well, right? And I'd also say that graphs at the moment are really textual representations. And I think there's some amazing work; I'm good friends with the team at Marqo, and they're doing some awesome work around custom embeddings and multimodality, especially images. So I think that stuff is really useful for those sorts of embeddings. I would also say, yeah, it depends on how specific you want to be. Because there's also a situation in which, like, you can change a graph schema, right? But if you already have a relational database with this information, and you've already spent time describing that schema perfectly, and your problem is, refer to this database really specifically, I think it is worth just using a relational database for that case. But pretty much all of the problems we work with, graphs can be used for, and I think vector databases can be used for; you just can't guarantee the reliability of it. And I think relational databases can be used for a lot of that.

Tom Smoker [00:30:43]: It's just sometimes difficult, and there's issues around natural language into query language, whether graph or relational or otherwise. So I'd say that the problems where you need to be specific, reliable, and able to change and update the schema, those are what graphs work really well for. And then problems where you just need to be high level, like customer responses, or help me book a trip somewhere, or give me a recipe, those are great use cases for vectorization. So I'd say that if you're okay with the average being really good, then you're okay with a vector database in a lot of ways.

Demetrios [00:31:21]: And if I'm understanding this correctly, I've got my PDF documents, I'm throwing them into an embedding model and then putting them in my vector database, but I'm also throwing them into a knowledge graph.

Tom Smoker [00:31:36]: Yeah, that's a great question. What we actually do with the knowledge graphs is, they're a layer on top. So every node that we have in our graph, whether that's, like, type person, name Tom, has a connection back to the chunk, the embedded chunk text, where it came from. So there's provenance from the embedding back to the representation. So they're not necessarily distinct; they're connected. And so you can think of the graph as an indexing layer on top of the stored data. And the value of that is, again, that scope. If you interact with the graph first in a deterministic way, then you get back just the chunks that you want. I can't remember who it was, it was either a LangChain post or a LangChain talk, I think it might have been Will from LangChain, or it could have been Harrison, talking about context poisoning, and how if you set a top-k of ten, so you return back ten chunks, and three of those chunks have the answer in it and seven don't, you're at high risk of making it worse. Because if we believe that few-shotting a model works, which I absolutely do, then you can few-shot it wrong as easily as you can few-shot it right.

Tom Smoker [00:32:49]: And so for us, indexing on top of the information that we actually care about means that the graph can direct it down to just those three chunks, or at least a much more concentrated retrieval process. So again, in our process, you can get back the graph, you can get back a summary of each node, you can get back the properties, but we also just have optional flags if you want the chunks as well, if you want the provenance, if you want the full document, if you want the document name. That's also a process, so they're not necessarily distinct. And we make different types of graphs for different people depending on their use case, but to help them do the same thing. I think if you think about it as an indexing layer on top, that's a useful way to think of it.
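[Editor's note: an illustrative sketch of the "graph as an indexing layer" idea: each node keeps provenance pointers back to the chunks it was extracted from, so a graph-first query returns only those chunks. All names are hypothetical, not WhyHow's actual API.]

```python
# Graph-first retrieval: query the graph, then fetch only provenance chunks.
chunk_store = {
    "c1": "Rex, a 9-year-old labrador, was diagnosed with hip dysplasia.",
    "c2": "Hip dysplasia treatment options include surgery and physiotherapy.",
    "c3": "The clinic cafe serves coffee, always black.",
}

# Nodes carry type, name, and chunk provenance.
nodes = [
    {"type": "Patient", "name": "Rex", "chunks": ["c1"]},
    {"type": "Disease", "name": "hip dysplasia", "chunks": ["c1", "c2"]},
]

def retrieve(entity: str, with_chunks: bool = True) -> dict:
    """Deterministic graph-first retrieval: node, plus chunks only on request."""
    node = next(n for n in nodes if n["name"] == entity)
    result = {"node": node["name"], "type": node["type"]}
    if with_chunks:  # optional provenance flag, like the flags described above
        result["chunks"] = [chunk_store[c] for c in node["chunks"]]
    return result

print(retrieve("hip dysplasia"))
# Only c1 and c2 come back; the irrelevant c3 never enters the context window.
```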

Demetrios [00:33:27]: And so, because of the order of operations that you were talking about, if we take that top-k of ten example and you get the seven back that are kind of shit, you're able to catch that with the knowledge graph and be like, all right, well, let's filter out these seven. The knowledge graph layer is like a filter, in a way.

Tom Smoker [00:33:52]: Yes. Yeah. That's a use case that we use a lot, and I think is really useful. I think LLMs are the coolest thing. And I've been using LLMs, I like to say, since before they were cool. I don't necessarily think they're cool now. I think ChatGPT is probably cool now. I don't think LLMs necessarily are.

Tom Smoker [00:34:09]: I've been using them since they were just models, since before the terms came out. But I do not trust them. There's a whole conversation about whether they reason or not, but when I build these systems and want them to be reliable, I don't want to just give an LLM a bunch of information and be like, figure it out. There's mitigating strategies, but I just don't really trust it like that. So I just want to give the LLM only the information it needs and no more. At that point, I'm pretty confident that it's going to fill in a really necessary part of a retrieval pipeline.

Demetrios [00:34:42]: Yeah, it's so funny that you mentioned that, because how many times do we get the wrong answer because we've given it too much random context that has nothing to do with the actual output that we're looking for? And then, of course, it's going to hallucinate, right? Of course it's going to go off on something, especially if you're giving it, in this case, seven out of ten, 70 percent, actual crap that you don't need. That is where that context poisoning comes in.

Tom Smoker [00:35:18]: That's me obviously exaggerating for the sake of demonstration. I think if you use reranking, if you chunk really well, I think you can mitigate that to a point. It's just that I know, even in theory and definitely in practice, in my experience, I can't trust it. I'm trying to give people reliable systems, right? So when I say, you know, three out of ten, the reality is, if you spend time really improving your RAG process, it's probably much better than that. But it's not 100%, right? And so the graph being exactly wrong, bringing it back to that: cool, if it's exactly wrong, then I know that I'm getting back the same four wrong ones every time, right? How do I improve that process? Instead of being like, I have these mitigating strategies, it's, I have some level of determinism to fix it.

Demetrios [00:36:04]: So let's talk about agents for a minute. What have you been seeing on that level and those use cases?

Tom Smoker [00:36:10]: Yeah, so we've been doing multi-agents for a while, and we've actually been working on a few of them. But there's a really interesting question that we talk about internally: what is an agent? Because is it a Python wrapper around an OpenAI API call? Probably, yeah. But also, if you run a script, is that an agent, even without an LLM? And so it's like, what does it mean to be this? And I think the popularity is interesting. We work a lot on structured grounding per agent, and this is how we solve a lot of our problems. Because I'm not of the opinion that you just give a natural language instruction to a system and then it just goes away and does that and nails it every time. You want to break it down into steps that you understand, you want to decompose it down into steps and then orchestrate those steps. I think there's some cool frameworks that help you do this, but a lot of the time we build what are effectively functional programs, because we want to I/O test. Frameworks like FastAPI, Pydantic, and Instructor are really useful to build these systems out. But ultimately, every time I make a call, I want to know, or at least I want to point test.

Tom Smoker [00:37:19]: And so breaking it down into agentic systems is actually a really good way to test and produce well-structured software, which we think about a lot. So breaking it down into its components, making LLMs do as little as possible, and having many LLM calls, is how we like to structure it. And I think that, at its core, is a multi-agent system. There are these awesome frameworks; CrewAI obviously are doing really cool work around how you do natural language instruction. And we definitely use LangChain in some pieces; we don't use LangChain for everything, and that's fine. I love to use it for prototyping. There are some great parts of LangChain that we use in production, around LangSmith, and their Bedrock implementation is a great way to test the different LLMs. But when it comes to wrappers, I think they're really great for testing. When I put stuff out in production, I'm not yet ready to use the wrappers that are available.

Tom Smoker [00:38:10]: So I think that we have multi-agent systems, but a lot of that is just well-typed, well I/O-tested, and well integration-tested, like different Python modules that wrap calls to specific LLMs.

Demetrios [00:38:23]: And when you say it's well tested, is that basically, again, going back to the reliability of it, that every time I ask this one question, I'm going to get the same output?

Tom Smoker [00:38:36]: It means a few things. So shout out to LangSmith and how well they do provenance. That as a product, I think, is underrated, in terms of how well you can see the different LLM calls and how that works. But I also mean literally tested, as in, it's pytest-tested. So it's a process of, I need certain coverage over my system, I've tested all of this, well-structured server code. So it's testing the I/O based on questions and then breaking those down into modules.

Tom Smoker [00:39:13]: So then, for example, GPT-4 Omni comes out, right? And it's like, hey, GPT-4 is too expensive, 3.5 isn't consistent enough, this thing is right in the middle. How do we know if it works? And then I can just change the Pydantic model name, and now I can just go into a dashboard and see every single step of my pipeline, conditional or otherwise. Is this thing as performant? What's the latency like? So that's an important process that we do. But we're in production, right? So that requires a level of tested software. So when I say tested, there are I/O tests to it, but there's also literal full test suites and coverage around it.
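[Editor's note: a minimal sketch of the kind of per-step I/O testing described here, with the model name held in a config object so swapping models is a one-line change. The agent is a stub and all names are invented for illustration; this is not WhyHow's actual test suite.]

```python
# Per-step I/O testing: fixed inputs, expected outputs, model swapped via config.
from dataclasses import dataclass

@dataclass
class AgentConfig:
    model_name: str = "gpt-4o"   # swap models here; the tests stay the same

def extract_phone_number(config: AgentConfig, text: str) -> str:
    """Stub agent step: in production this would be one scoped LLM call."""
    for token in text.split():
        if token.replace("-", "").isdigit() and len(token) >= 7:
            return token
    return "NOT_FOUND"

def test_extract_phone_number():
    # I/O test: same question in, same answer out, for any model config.
    config = AgentConfig()
    assert extract_phone_number(config, "Call support on 555-0199 today") == "555-0199"
    assert extract_phone_number(config, "No number here") == "NOT_FOUND"

if __name__ == "__main__":
    test_extract_phone_number()
    print("ok")
```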

Demetrios [00:39:42]: Yeah, I like it. And when it comes to the agents, you mentioned how you want to have this structured grounding per agent, or per agentic workflow, I think is what you mentioned. Yeah, break down exactly what that is.

Tom Smoker [00:40:01]: I think this is really cool, and I think about this a lot. If I have a RAG system, if I need to give every agent a little bit of information it doesn't already have, and I can get each of those agents 95% accurate, and I have a sequential workflow of five agents, that's 77% performance on expectation, right? And I know that's not linearly distributed, I know some cases fail every time, et cetera, et cetera. But if you look at that, 77% is awful, right? And if you think about it, people use the example of, what if my Slack message didn't show up every now and then? Or what if I didn't trust the email protocols? The level of reliability is really low, and 95 is a big ask, even per agent, right? So what I think about a lot is, with that rice to a farmer, rice to a chef example: well, if I can get both of those exactly wrong, and I can spend enough time kind of fiddling with the inputs, outputs, and the CRUD of each structured grounded agent, with a schema and a graph, then I can iteratively improve both of those to, you know, I guess maybe 100 is impossible, but asymptotically, like, 99. And then now the propagated uncertainty is significantly less.
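[Editor's note: the arithmetic here compounds per-agent accuracy across a sequential pipeline, assuming independent failures. A two-line check:]

```python
# Five sequential agents at 95% each: roughly 77% end-to-end on expectation.
per_agent = 0.95
n_agents = 5
print(per_agent ** n_agents)  # 0.7737... -> ~77%

# Grounding each agent and pushing it toward 99% changes the picture:
print(0.99 ** n_agents)       # 0.9509... -> ~95% end to end
```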

Tom Smoker [00:41:07]: And so when I think about multi-agent systems, I think they are the future in a lot of ways, because I work with workflows and agentic systems a lot, which is like, hey, I have a task that I need to do, break it down. And once you've broken it down, you have a multi-agent system, right? So yes, I think the difficulty is, we're about to see an explosion of this. We just saw a year of Medium posts and no enterprises using this stuff, right? And I think it's awesome, and I love the exploratory nature of it, but I think we might see the same with agents, which is, it's already not reliable enough per call to a model, and you're propagating that uncertainty. Or it might actually be decision making under uncertainty, à la Joseph Halpern, which I think is a phenomenal thing that I care about a lot. But I feel like it's a massive risk, right? The turning point for me is going to be: do I give my credit card to a booking agent, and am I willing to just show up at the airport? Right? Once that happens.

Demetrios [00:42:08]: I need to put more zeros in my bank account if I'm gonna be doing that kind of shit.

Tom Smoker [00:42:11]: Yeah. So that's it. Or am I gonna show up and they're like, you booked the flight for tomorrow? Because what I want to say is, hey, I'm not doing red-eyes anymore, but I will get up early, I have this alliance membership, right? And I prefer one stop over here if I can. I should be able to write that as an instruction. The reason I don't is that I don't trust the systems yet, because I don't think each system, each agent, is accurate enough. So what we work on is grounding each agent and then iteratively improving each agent, because then you can break it down. A really good example, one that we're a long way away from ever getting as a demo, let alone in practice, is a global supply chain of any sort. You have a shipping agent, you have a procurement agent, you have a provenance agent, you have an employee agent, and then you have those.

Tom Smoker [00:42:57]: Those are also different depending on whereabouts you are in the world, and how you produce and how you supply. That should be many different agents doing many different things. That is a completely automatable workflow in theory; it's just so difficult in practice. So that I look at as the ultimate multi-agent system: international physical goods moving around as part of a supply chain. But each of those agents will have to be so specific and so accurate for you to be able to trust it at all.

Demetrios [00:43:28]: Yeah, exactly. Talk about high risk. If shit doesn't work out, all of a sudden you've got a bunch of ships and shipping containers going to the wrong port. That's not good.

Tom Smoker [00:43:45]: Hey, we're about to see it, I think. I mean, the Air Canada thing was already really interesting, but I think that if 2024 is the year of agents, which I've seen a few people say, I think we're going to see some mistakes before we see some really good systems. Not physical systems, right? Because physical systems are easy to prove. But why I like agentic systems, and it aligns with the small graphs that we have, is that then the specific person or specific domain expert who matters, who's in charge of procurement, just deals with the procurement agent or agents, and just a procurement graph or schema or grounding or knowledge, and they don't have to deal with the rest of it. I think modularizing those systems is a really good way to build these processes and automate these processes. But we're about to see people be like, I'll book this holiday for you, and you show up and there isn't a booking, right? That already happens. But these systems are old. If you talk to an old head about this, multi-agent...

Tom Smoker [00:44:49]: The multi-agent system textbook that I started using three years ago to build these systems is from 2000. This is not a new thing, right? This has been around for a long time. And "history doesn't repeat itself, but it rhymes" keeps coming up, or "what's old is new again" keeps coming up in this. And when I speak to my supervisors about this, they're like, yeah, it's already not good enough. But the financial opportunity of automating larger workflows is such that people will do it. And it's easier to build more agents than it is to build a better agent.

Tom Smoker [00:45:24]: Right now.

Demetrios [00:45:26]: It's going to get messy. Yeah, real quick.

Tom Smoker [00:45:28]: Sorry, I could talk about this for too long. We can move on.

Demetrios [00:45:30]: Yeah, well, I love this, because I've had that feeling, too. I made the joke when AutoGPT came out; everybody was talking about it, it was this huge thing. I think it's easy for people to project themselves into the future and what life is going to be like when all of this works flawlessly. But to get there, that last 10% could take us longer than it took us to get here. And so I was making the joke of, I've been playing with AutoGPT, and all that happened was I got, like, a $300 OpenAI bill.

Tom Smoker [00:46:07]: Yeah, yeah, yeah, yeah. You had a good post about this, right? Which is, more use actually equals more cost, in a way that I think people aren't talking about. And, like, I test a lot of different models locally. I use OpenAI a lot through their API just because it's very easy to use. I think the models are great, but there are so many different models, so many different embeddings, so many opportunities to fine tune.

Tom Smoker [00:46:29]: I think there's different levels of models right now, for sure. But yes, a thought experiment for you, because I've asked a few people about this. Multi-agent systems, I think in theory and currently in prototype, they don't break down in different places. They often break down in the same place, right? Or the same place is the most uncertain.

Demetrios [00:46:49]: Yeah.

Tom Smoker [00:46:50]: With the booking example, right, there would be an agent that goes and gets the latest flights. There'd be an agent that goes and gets the latest hotels. There'd be an agent that goes and gets the latest car booking or whatever. There's an agent that, you know, maybe takes your nationality and makes sure you can go there without a visa, whatever it may be. It automates this whole process. I wouldn't trust it, because I don't think it's going to make the best decision with all that information. But if I could write my own agent and bring it to that system, maybe I would.

Tom Smoker [00:47:18]: Would you, if you could control your agent, and you could see it, right? So we think a lot about this in AI, like the master-apprentice model. Jason Liu had a really good post about this recently, and we actually do this: a lot of the people we work with, it's report generation that they can review. So would you bring your own agent? If you could bring your own agent, do you think you would be more likely to use that system?

Demetrios [00:47:40]: And just almost, like, train the other agents how to do it?

Tom Smoker [00:47:45]: The other agents are all fully described, as an API maybe; it's perfectly retrievable, it's small. Is the problem that you don't get to control the middle of the workflow? Do you trust yourself more than you trust them?

Demetrios [00:48:00]: Yeah, no, I still am very much like you. Like, I don't trust them at all.

Tom Smoker [00:48:06]: Yeah.

Demetrios [00:48:08]: And so I would much rather, even if, it's funny because it is very much like the time versus comfort or convenience trade-off. And right now, I would rather spend the time and do it right than make that trade-off. It's the same with, like, self-driving cars, right? I feel like with self-driving cars, I am a little bit more apt to let the computer take over the wheel, but with the agents, I'm like, I need more zeros in my bank account.

Tom Smoker [00:48:42]: Cool. Okay, that's the answer I want. So a friend of mine called Linus Ekenstam, who's been a friend of mine for a long time, but he's very popular on Twitter in this stuff, was recently doing a talk here, and I got to see him. Not to tell too many tales out of school, but he was talking about his children. And he was like, they went in a self-driving car, right? And they were like, why are you driving? They're young, young children, right? And he's like, well, I need a steering wheel.

Tom Smoker [00:49:10]: Like, I need the control. And they were like, you don't need the control. So what does the future look like with agents, if you and I are like, well, I'm not willing to give the wheel up, but the new generation is willing to give the wheel up? Then what do these systems look like? Because I don't trust that system, but does someone ten years younger than me trust that system? And are they a market? It's like, yeah. So I actually think expectation is really important with this stuff. And I think we may be at the stage where people are willing to do that as a booking, I don't know. I think so. Tim French, who was my main supervisor, who I talk about a lot, I don't know if this quote is his, but he was the first.

Tom Smoker [00:49:47]: I first heard it from him, which is: in academia, you write drunk and you edit sober. It's the way to think about it, right? You go out for a beer and you write a bunch, so you remove inhibition. Wake up the next morning, cold light of day, you look at it and you're like, this is the worst thing I've ever seen. It's just much easier to edit than it is to write, in a lot of ways, right? And I think of LLMs as drunk writers a lot. They'll produce content no matter what, through prompting, and then I'm happy to edit it. And so if you carry that through to a multi-agent system, to a booking, there's a level of drunk writing that I'm willing to edit, right? There is a level of information retrieval, of cost of flights, cost of et cetera. So I would be interested in a multi-agent framework that is two parts: one is my control, and the second is their control. And then personally, I give and take. Because right now nothing's automatable, because no one trusts it.

Tom Smoker [00:50:44]: Right? As a broad point. But some people trust it. There is a generation coming next who's going to trust it. There is a generation who has the most wealth who won't trust any of it, right? So what does it look like to personalize that? And I think that's a fascinating question: the sliding scale, personally, of how much you trust these systems.

Demetrios [00:51:03]: It's funny that you mentioned that, because I was literally just today listening to another podcast about how sometimes removing yourself from your business can be the best thing to make the business grow. And it's almost like the parallel here is that removing myself from this agent, or all of these agents, can be what actually makes it work, and work better, in that thought experiment. So I find that fascinating. I do think that you're always going to have to have the, well, maybe not always. You're going to want to see what's happening as it's happening, just to kind of double check and know that it happened.

Demetrios [00:51:54]: But I guess you could, one part of seeing it could just be that you get the confirmation in your email and you look at it and you go, all right, cool flights tomorrow.

Tom Smoker [00:52:04]: Then you show up to your first one and it works, right? Again, I don't know how many of these are going to work, but we all have a personal level of risk tolerance, and I think there are interesting products and interesting systems that are going to be built. A CEO friend of mine who is post-Series A, and I'm going to leave them nameless just in case, said: my daily job is just being a traffic controller. I don't do anything anymore. And they're deeply technical, but they've gotten to the point where they're like, I'm really just a traffic controller. I'm the most important person in the company, because if I leave, everything crashes, but I don't do anything. And I think about that with these systems: what level of personal traffic control are you comfortable with? So when you say removing yourself from your business, there are tiers to it. You remove yourself from the operations, but you're still very necessary in terms of direction and control, whether that direction is immediate and in the moment, or whether it was set months ago through some level of SOP. I still think it's necessary to have that control, and I think these systems are very similar.

Demetrios [00:53:11]: And where have you seen agents in your work actually working? Or have you found one?

Tom Smoker [00:53:18]: The best example was from GPT-3, and I think about this over and over and over again. Our North Star is getting people to production, and we want to be the factor for that. I think the answer, a lot of the time, is some level of services and scoping to get to that point. But what does it mean to put LLMs into an existing revenue stream and speed it up? For example, when we talk about radiology, if you produce a report faster and you get paid per report, then, if you have the master-apprentice model, speed up the apprentices, right? There was already some work being done in the legal example: speed up the apprentices. It was GPT-3, it was a post by a user whose name I can't remember, and it was called writing GPT. This would have been early last year. It wasn't a super popular post on Medium, but it's going to be somewhere, I'm sure.

Tom Smoker [00:54:11]: Maybe, maybe it's gotten really popular since I read it. But they were a photographer. They got early access to GPT-3 they were able to develop, and they said, I'm going to build several different, they didn't call them agents, they called it like almost like a board of directors. They were like, I'm going to prompt it and say, write me an article about this thing. Then I'm going to have a editor. An editor, right? And they're like, why can't I extend this? I'm going to have a grammar editor, I'm going to have a colloquialism editor, I'm going to have an SEO editor. And then what I'm going to do is I'm going to say I'm going to have another agent that is going to tell me where to put a photo. Where does a photo exist? So then they could be like, what are the top ten washing machines of 2023? Right? And 3 hours later, and it costs a bit of money, but 3 hours later, at the time you'd have these reinforced systems that would communicate with each other, and at the end they get a piece of writing that is colloquial, that is SEO optimized, that is well edited and has good grammar.

Tom Smoker [00:55:08]: Then they would go out and take the photos. So it's an original piece of content now.

Demetrios [00:55:11]: Oh, nice.

Tom Smoker [00:55:12]: And then they would get themselves to the top of the SEO blog rankings, and now they're making ad revenue. And they can do this with anything: washing machines, tents. That was the coolest example. I still haven't seen an example better than that, because it was such a simple use, and even if it doesn't work fully, it doesn't need to be perfect, they've created it. I think about this so much: the board of directors. How do you do this? How do you reinforce it? How often do you do it? That, to me, was a real trigger. And this was before AutoGPT, so when that came out, I was like, this is going to explode. And then we never really saw it, because of the probabilistic nature.

Tom Smoker [00:55:50]: But that was the coolest to me. That was like a real unlock.

Demetrios [00:55:54]: So talk to me about WhyHow in these last couple of minutes, and just what it is you're doing. From this conversation, I imagine a lot of people can gather that you know a bit about knowledge graphs, but I think the really interesting thing you just said is that you're highly focused on getting people to production.

Tom Smoker [00:56:15]: Yeah. Yes. So, hot take: I don't know if people writing natural language and getting an answer back is the future of this stuff. A lot of people are like, I want to get my RAG perfect. I don't think anyone's really thought about what happens if they do; I think everyone's stuck on that problem. And even if they get there, I'm not even sure what happens. But there are valuable business problems to solve.

Tom Smoker [00:56:38]: Right. And as a business, I want to work on those problems. I've worked in tech for a long time, but I've also worked in heavy industry, as a technical person, as a machine learning engineer in a large mining company. Those industries have a high reliability requirement, and there are valuable, interesting problems to solve there. So when we talk about getting people into production, we use knowledge graphs as a means to an end, as part of a broader workflow process, to reliably get back the information that you asked for.

Tom Smoker [00:57:10]: Consistently and reliably and over time. And we do that through. Currently we have a closed beta, but it's going pretty well. We've had that out for a little while. We're releasing features like a couple times a week. We're just listening to people talking to in our discord. And so we're create helping people. It's developer tooling for knowledge graphs.

Tom Smoker [00:57:28]: So it's helping people create these systems. It can give them more reliability. There's kind of broadly two types of production at the moment. One is like, I want you to improve my existing systems, but I don't know if there's that many people who actually have rag use cases as existing systems. I think a lot of people see us as a means to. There's a difficult thing to say, but it means to production. Not to jump on the historical reference of that term, but they're a way to get to a level of production, because without a level of reliability. Without a, without a low bar of reliability, we can't put this system in the wild.

Tom Smoker [00:58:00]: And so we work with graphs because we see them as the weakest part of the overall pipeline. And so we allow people to unlock that part, which then raises everything to the point that now we have a reliable system. And so that's what we're working on.

Demetrios [00:58:19]: And that's how you can help people get into production. That makes a lot of sense. Man, it's like you're preaching to the choir here, because I've felt so much of that too, where it's like, all right, cool, the RAG is fully optimized, but it turns out nobody actually uses the chatbot on our website. So, great job.

Tom Smoker [00:58:39]: People haven't thought about that. Like, what does it mean to build the perfect chatbot? And it's like, really? What does it mean? And it's like, so, for example, I worked with a large oil and gas company, but back when I was a consultant, before I was doing this business, and I was helping them build a leave policy, like a leave bot. Someone could be like, how many weeks do I get off a paternity leave? It's like, give me the answer. Hallucinated, like, immediately, right? And they're like. And then everyone's like, oh, this sucks. And so then it got scrapped. As an internal project, they spent a bunch of money. We caught it early because I came in as a consultant when it was working, and I was like, this is never going to work, by the way, if it did, like, what did they solve? Yeah, it's like, a little bit like, not much like.

Tom Smoker [00:59:17]: It's not like a massive pro. So I think that's an interesting way to think about this. Like, if rag worked really well. So this is why we talk about multi agents, because if they just ask a question, get an answer, and retrieval, I think he's like, can be augmented. Really well. I love some of the search stuff coming out. I'm really looking forward to the urban AI products that come out as a, as a Mac user. But I do think that that's why agentic systems are the future.

Tom Smoker [00:59:42]: Because if you can reliably create module after module after module after module, then you can just like my CEO, Chia, likes to say this. It's like he was a lawyer for a while, and he's like, most white collar jobs are just like information retrieval and then reasoning and then information retrieval and then reasoning. And it's like that. You can kind of like, you go away, you read some stuff, you make a thought, you give it as a summary. And so, like, it's, it's workflow orchestration, and it's these systems of creation that I think that if Rag was 100% right, which means the agent is reliable, what does it look like to make a agent by agent composed workflow? That I think is really interesting. And I think that part is why people are saying, agents of the future, those modular boxes.

Demetrios [01:00:27]: Yeah. There's a lot that you said here that I'm trying to take in. I think I'm going to have to listen to this one a few times and just soak it up, because it does feel like some knowledge was dropped on me here, and I really appreciate all the wisdom you're bringing to this. It's super cool to see that idea: a chatbot can be useful, but how much business value is it actually bringing, and what does it accomplish? Versus when you were talking about the doctor getting their process sped up. There are a ton of tools out there right now that record the conversation between the patient and the doctor, and half or 90% of the doctor's time after seeing patients is just filling in all that paperwork they have to do for the insurance claims. So if you can speed that up, they can see more patients and make more money. There's a very clear return on investment there.

Demetrios [01:01:38]: Not like this idea of, oh, yeah, I now know that I have eleven holidays left.

Tom Smoker [01:01:45]: Yeah, it's like, I don't know, like, I just, I think there are really valuable business. There's a lot of, like, win win automation opportunities in heavy industry. Definitely. I think healthcare is a really, it seems to be an obvious one for, it's a well regulated industry, but, you know, the people we speak to that we get purely inbound, like the variety of, like, you know, earthquake preparation for hospitals in California and stuff. And like, how valuable those problems are and how difficult they are to solve, but how, you know, in theory, automatable they are and how modular they are as problems is like, very interesting with this stuff. So, yeah, I think those are the, those are the interesting problems that we really like to work on. I appreciate you saying wisdom as well. I'm not going to pretend that I have a.

Tom Smoker [01:02:26]: I've just, I work on this stuff a lot, so I have a bunch of comments. I want to thank Paco as well. Who's the one who introduced us. And so that's genuine, genuine wisdom there, I think is the type of person that I really enjoy speaking to. But yeah, that's how I see this shaking out. Maybe not loudly, because I think it's difficult to write a post about it, but I think quietly and valuably is these processes. It's like what was Linux going to be when Linux released it? And it was like the new operating system, what does it do now? Power every server that powers every gpu that powers every. When you think about these agent systems, those are the interesting problems, I think.

Demetrios [01:03:05]: You said, maybe quietly, I'm not going to write a post about it, and I just respond with: hold my beer. I'll get back to you in a week and let you know what I can do here. There's a lot of fodder for me to write some shit on LinkedIn about this. This is great, man, I really appreciate it. And I am going to take away this idea of being exactly wrong, because if you can be exactly wrong, and I really like that, that's where determinism starts.

Demetrios [01:03:33]: And you get out of the weeds of this probabilistic. Well, it could be right, it might not be, but we'll see. That is huge.

Tom Smoker [01:03:42]: Some amazing use cases for probability, some awesome stuff happening. But the problems I'm working on, they can't be probabilistic problems.

Demetrios [01:03:49]: Excellent. Well, I appreciate this, Tom, thanks so much for coming on here. If anybody wants to reach out to you, I know that you're on LinkedIn, so we'll leave your LinkedIn link in the description and, or just go to why, how AI and yeah, find us, have a discord server.

Tom Smoker [01:04:06]: We got a discord. We have a newsletter. If you're not a discord person, newsletter is going well. CEO Chi is cranking out content. We're writing as much as we can about our learnings, doing our best to kind of build and show people what we're working on.

Demetrios [01:04:18]: So yeah, that is awesome. I love it. Well, you've got yourself one new subscriber and me, so thank you very much. Thanks a ton for this.

Tom Smoker [01:04:27]: No worries, man.
