MLOps Community

Advancing GraphRAG: Multimodal Integration with Associative Intelligence // Amy Hodler & David Hughes // AI in Production 2025

Posted Mar 14, 2025 | Views 2
# RAG
# GenAI
# GitHub

SPEAKERS

Amy Hodler
Executive Director @ GraphGeeks

Amy Hodler is an evangelist for graph analytics and responsible AI. She's the co-author of O'Reilly books on Graph Algorithms and Knowledge Graphs, as well as a contributor to the Routledge book Massive Graph Analytics and the Bloomsbury book AI on Trial. Amy has decades of experience in emerging tech at companies such as Microsoft, Hewlett-Packard (HP), Hitachi IoT, Neo4j, Cray, and RelationalAI. Amy is the founder of GraphGeeks.org, promoting connections everywhere.

David Hughes
Principal Solution Architect - Engineering & AI @ Enterprise Knowledge

David Hughes is the Principal Data & AI Solution Architect at Enterprise Knowledge. He has 10 years of experience designing and building graph solutions which surface meaningful insights. His background includes clinical practice, medical research, software development, and cloud architecture. David has worked in healthcare and biotech within the intensive care, interventional radiology, oncology, cardiology, and proteomics domains.


SUMMARY

Integrating graphs with RAG processes has demonstrated clear benefits in improving the accuracy and explainability of GenAI. Graphs enhance the semantic capability of vector searches with more global enrichment and domain-specific grounding. The increasing adoption of GraphRAG reflects its value, as shown in numerous blogs, GitHub projects, research, and formal articles. But graphs are fiddly, and an iterative approach is almost always required. Today's GraphRAG approaches focus on text and lexical graphs. However, non-text data is dense with latent signals that we currently just toss out. Integrating information from images and audio would provide an extremely rich layer of context for agentic workflows. The next major advance in GraphRAG will be incorporating all the semantic signals latent in images and audio.

This session focuses on multimodal GraphRAG, or mmGraphRAG, a transformative step forward in bridging multimodal data through innovative search and analytics frameworks. We'll demonstrate how, by integrating the semantic richness of images and text with the contextual reasoning power of graphs, mmGraphRAG provides a comprehensive, explainable, and actionable approach to solving complex data challenges. You'll learn how to incorporate images into GraphRAG and customize graph schemas, as well as search that combines visual elements. We'll walk you through the high-level architecture and the use of associative intelligence to transform search and analytics. Notebooks that illustrate creating embeddings and building a multimodal graph from image decomposition will be provided so you can explore how mmGraphRAG can be applied to specific domains. We'll also leave time to discuss the implications of adding graph pattern analytics to images.


TRANSCRIPT

Ben Epstein [00:00:04]: We're going to jump right into our next talk. We're taking a little bit of a shift, but we're very excited about it. This is actually one of my favorite topics in the entire LLM space right now: graph systems, graph structures. It's a space where I probably have the least amount of knowledge and have been looking to learn the most about. I've listened to a bunch of talks, been trying to gather a bunch of knowledge here, because anybody who's working with LLMs has tried building RAG systems, has tried building retrieval systems. It's always really cool at first. It doesn't really scale that well.

Ben Epstein [00:00:36]: You end up building these connections and nodes. The classic story. A lot of people have tried this and a lot of people have failed. So we're very stoked. We have with us David Hughes from Enterprise Knowledge and we have Amy Hodler from GraphGeeks. As you can imagine, a lot of graph stuff there. A lot of knowledge information. Welcome both.

Ben Epstein [00:00:55]: Thank you guys for coming on.

Amy Hodler [00:00:58]: Awesome. Thank you for having us, and thank you to AI in Production. Super excited, and so excited that we have another graph fan with us. I'm Amy Hodler, founder of GraphGeeks, I've been in the space about 10 years, and I'm here with my colleague David Hughes from Enterprise Knowledge. And so we're...

Amy Hodler [00:01:21]: Oh, and David, you're on mute, so I'm going to keep rolling. Take yourself off mute so that we can hear you as we go forward. We're going to talk about advancing GraphRAG. A lot of us are familiar with GraphRAG, but what's the next evolution of it? We're going to be looking at multimodal integration. David, can you check to make sure your mic's on? Can we hear you?

Ben Epstein [00:01:45]: Not quite.

Amy Hodler [00:01:46]: We can't hear you yet. Okay, so I have the first section anyhow, so I'll go ahead and get started and then we'll pick up David here in a moment. This is one of my favorite pictures, and I hope all of you get a chuckle out of it as well; I did the first time I saw it. There's a lot going on here, and there are things in this image that we can't see but that as humans we can actually infer. I don't know how many of you have kids, or maybe have been kids, but you know that there's something else going on here that an adult wouldn't do. But what if we just wanted to find a similar image? So we're trying to find something that's kind of similar to this unusual picture.

Amy Hodler [00:02:26]: And I'll say that we could search, we could do a Google search, but AI really doesn't get this image or what is so funny about it. We could iterate on this with an LLM over and over again, but that would be expensive and the results wouldn't exactly be that great. When I did do a Google search, I got a whole bunch of pictures of a sink and, you know, no dinosaurs. But even if I were to search "dinosaur and sink," I probably wouldn't get something exactly like this either. The other thing is just thinking about the human ability to understand what else is going on here, what's not being stated. When I looked at Claude, it thought there was a joke here, so it got the humor, but Claude actually thought it was a joke about a T. rex having short arms and not being able to reach the faucet, which is kind of humorous in and of itself, but not exactly what we're looking for. So LLMs and GraphRAG together were supposed to help us with search and solve this problem, but so far they haven't.

Amy Hodler [00:03:44]: So our systems today, and GraphRAG today, are really overwhelmed by multiple types of data. We have visual and textual data that are often siloed; they're not really interwoven together. There's some description of this, plus there's a visual, and it would be nice to get those interwoven so we get image, text, and relationships when we search. If we can't have that, we get incomplete, less accurate, or maybe less relevant results. And if you're a decision maker, that really means your decisions become siloed and you don't get the holistic context you're looking for. We also lack tools that can really explain the multimodal query that we want to do.

Amy Hodler [00:04:28]: There's text, there's image, and someday audio that we would want to query together. And if we have vision search, for example, without interpretability, if we can't really explain what's going on and why things are found, the results also lack explainability. And then there's the basic inability to identify trends and patterns within images and within multimodal data as well. For example, what's the relationship between the dinosaur and the toothbrush? What's the relationship between the dinosaur and the running water, which is an interesting element here? This all means that we don't really have a systematic way to understand the full context of multimodal data, to search associative detail, how one thing is associated with another, and to then infer meaning, which is really what humans are doing. When I laugh at this image, I'm inferring that there's a kid who stuffed this in the dinosaur's mouth and ran out because they didn't want to brush their teeth. Humans do this, and without that ability to understand context, we're not going to have systems that can really improve search.

Amy Hodler [00:05:47]: So what we are talking about here is multimodal GraphRAG to help get associative intelligence. For example, if you do a Google search on bananas, you'll get lots of bananas. Bananas on a table. Okay, we get lots of that. But which of these three images is a better match? Actually, the two on the edges are pretty good matches: they're images of bananas on a table. The one in the middle just happens to be a banana within a whole bunch of other stuff that maybe I don't care about. So how do you evaluate what the better match is? And this is where current GraphRAG falls short: it doesn't really break down the components of a match that you could then explore, like texture, spatial quality, or placement.

Amy Hodler [00:06:33]: What is the edges, you know, of the, of the image that I'm looking at. And it doesn't work well for anything that Google doesn't actually have access to. And so this idea of being able to reason across multiple levels of abstraction is really important for associations and multi map. Multimodal graph rag breaks down these components so that they can be explored by, you know, spatial placement and texture and different elements, both individually the subcomponents, but also together. And it's really the blending of these, this context that's semantically loaded with data, whether it's visual or textual or other, that is enabling that reasoning across those multiple different levels, which is how humans actually reasons, which is why we know there's a lot going on in the image of the dinosaur than. Than what we're just looking at. So, David, I'm hoping your mic is back on and we're ready to roll with you. And you can, you can jump in here.

David Hughes [00:07:36]: I don't think so.

Amy Hodler [00:07:38]: Awesome. Great. And I think, I think that's all we've got here. So, David, do you want to actually talk a little bit about the role of images in rag?

David Hughes [00:07:51]: Can you hear me?

Amy Hodler [00:07:53]: We've got you. You sound great.

David Hughes [00:07:55]: Okay, good. Well, I'm glad everyone can hear me. When we started looking at how we could create a system that allows us to do associative search in the domain of video or images, and what we're working on right now, which is audio, the first thing that came to mind was: let's take that first naive approach of creating an embedding for images and looking at those results. We used the CLIP model from OpenAI as a base, and we used language models (in this case we have a preference for small language models, as you'll see in some of our architectural diagrams coming up) to create a caption from a given image. Here we were using the LLaVA model, and you'll see why in a moment. And we created embeddings; these are about 523-dimensional embeddings.
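
A minimal sketch of the caption-plus-embedding step described here, assuming a local LLaVA model served through Ollama and the Hugging Face CLIP model. The fusion choice (averaging normalized image and text vectors), file name, and helper names are illustrative, not the speakers' actual code.

```python
# Sketch: caption an image with a local vision model, then embed image + caption with CLIP.
import ollama
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def caption_image(image_path: str) -> str:
    """Ask a locally served LLaVA model for a short caption."""
    response = ollama.chat(
        model="llava",
        messages=[{
            "role": "user",
            "content": "Describe this image in one short sentence.",
            "images": [image_path],
        }],
    )
    return response["message"]["content"].strip()

def embed_image_and_caption(image_path: str, caption: str) -> torch.Tensor:
    """Produce one vector by averaging CLIP's normalized image and text embeddings."""
    image = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        img_emb = clip_model.get_image_features(
            **clip_processor(images=image, return_tensors="pt"))
        txt_emb = clip_model.get_text_features(
            **clip_processor(text=[caption], return_tensors="pt", truncation=True))
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    return ((img_emb + txt_emb) / 2).squeeze(0)  # 512-dim for ViT-B/32

caption = caption_image("dinosaur_sink.jpg")      # hypothetical image file
vector = embed_image_and_caption("dinosaur_sink.jpg", caption)
```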

David Hughes [00:08:52]: And then we looked at those in a three-dimensional space, and here you see on the right-hand side a representation of some clustering of images and their captions. It's really tiny to see; you'll have to trust me when I say that it says "a bunch of bananas" and that all of the images around it are truly about bananas. So, next slide. From there, the next intuitive step for us was: could we decompose an image into its component parts and start thinking about that as a graph? We used that vision language model, LLaVA, to start experimenting with segmentation. Could we pull out the dinosaur in this image, the faucet, the toothbrush? Could we start looking at things like color and predominance of color, textures, and complexity based on the edges found in an image for the objects in that segmentation? Could we create spatial analytics, a small-world network of how the objects inside an image relate to each other? The dinosaur in this one is to the left of this faucet.
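
A sketch of how segmentation output could become a spatial graph, assuming each segmented object arrives as a label plus a bounding box. The object list below is hard-coded for illustration rather than produced by a real segmentation model.

```python
# Sketch: turn segmented objects (label + bounding box) into a spatial-relationship graph.
# Bounding boxes are (x_min, y_min, x_max, y_max); the objects below are hypothetical.
import networkx as nx

objects = [
    {"label": "dinosaur",   "bbox": (40, 60, 220, 300)},
    {"label": "faucet",     "bbox": (260, 80, 330, 200)},
    {"label": "toothbrush", "bbox": (150, 120, 210, 160)},
]

def center(bbox):
    x_min, y_min, x_max, y_max = bbox
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)

def overlaps(a, b):
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

spatial = nx.MultiDiGraph()
for obj in objects:
    spatial.add_node(obj["label"], bbox=obj["bbox"])

for a in objects:
    for b in objects:
        if a is b:
            continue
        (ax, ay), (bx, by) = center(a["bbox"]), center(b["bbox"])
        if overlaps(a["bbox"], b["bbox"]):
            spatial.add_edge(a["label"], b["label"], relation="OVERLAPS")
        if ax < bx:
            spatial.add_edge(a["label"], b["label"], relation="LEFT_OF")
        if ay < by:
            spatial.add_edge(a["label"], b["label"], relation="ABOVE")

print(list(spatial.edges(data=True)))  # e.g. dinosaur LEFT_OF faucet
```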

David Hughes [00:10:09]: The intent is to create a UI for users to put in a natural language question, to find the semantically similar images in the embedding space and then to use those as the root nodes for a query into a graph space brought back to a language model that then can reason on all of that rich latent signal found in an image and not only provide these are the images that you're looking for, but more importantly the interpretability and the explainability. This is an image that indeed does have a dinosaur in it, but the dinosaur is very predominant and is to the left of the object you asked for. So that's foreshadowing into what we'll go into architecturally and what we can accomplish with this.

Amy Hodler [00:10:54]: Yeah, so let's talk a little bit more about multimodal GraphRAG in particular, and take a step back to say what's really going on here. Multimodal GraphRAG, which we call mmGraphRAG, in a nutshell has four-ish steps. The first one is really to decompose the image: we're going to shred it, break it down into subcomponents, and generate some captions. Then we actually put it together in a graph, so each of the components and subcomponents is loaded into a graph to preserve the context, because how things are related is obviously pretty important. Then we have an embedding space with anchors for global search, and this is really the basis of our semantic search.

Amy Hodler [00:11:45]: And then we have the multimodal context, so another layer of context. So this provides blended results between text and image data that is then presented to LLMs so that they can do reasoning with it. And those things together are what give us the associated intelligence. So with MM Graph Rag we really end up providing this kind of almost a hybrid knowledge space which you can then search and reason within. And what does that really give us? One of the I. Lots of things, but I get really excited about high fidelity reasoning within components here. We're talking about images. Are there ships similar with similar markings or surface wear nearby at this image? Can I look for other images with ships that are moving together or apart? Can I understand that? Can I get not just contextual understanding, but more object reasoning and recognition? Can I get more nuanced similarity? Can I add in graph reasoning which would would say things like what's the most important subcomponent in these images? And then as we talked about before, explainability as well.

Amy Hodler [00:13:04]: So that high fidelity reasoning within components and then among sub components as well. And what does that allow us to do? There's just a ton of use cases. So we have things like geospatial analysis, as you would imagine. If I want to find all the buildings with red roofs near water, near a port, you know, I want to be able to search satellite information and aerial images with specific features, maybe specific time or locations. Obvious some surveillance implications as well. If I want to find a similar object based on context or, you know, some kind of spatial relationship, a medical diagnosis. So perhaps I'm looking for images with specific markers or abnormalities based on, you know, annotations and the features, maybe the Roughness of an edge and then things like IP search, intellectual property search, which I find really fascinating. So can I compare new designs against existing patents and images in existing patents to look for similarities in the past? Recommendations? I like this, this quote here.

Amy Hodler [00:14:14]: Can I find cheap shovels with a yellow handle that perhaps I can get shipped nearby? And being able to understand that the handle's yellow if it's not already described, maybe it's just in the image. And then as you can imagine, historical and archival data management. So maybe I want to catalog images from archives based on complex visual features. I think this is particular important right now as think about, you know, pre and post actions that we might have to do, rebuilding in wartime and things like that. So lots of use cases there and many more, you know, such as gaming, virtual reality, education, research, you know, design, you know, students as well looking, you know, looking for patterns. So lots to do there. And I think David is going to dive quickly into architectural diagram as well.

David Hughes [00:15:08]: Quickly indeed. But let's get into the architecture of the system that was built here, starting from the left-hand side. As I mentioned, we have a vision model that is helping us not only create those embeddings, but the first thing it needs to do is help us understand the image by creating a caption for it. So a language model creates a caption, and then that is used downstream with an embedding model, which is CLIP in our case, to take both the caption and the embedding of the image, create that as a single embedding, and store it in a vector database. Here you see that we're using LanceDB; we're very big fans of embedded systems for most use cases. And you'll see that we're also using an embedded graph database.

David Hughes [00:15:58]: Kuzuk just both of those are great systems to use for not only prototyping but then going to production. Once we have that embedding in the vector database, going back to the left hand side, we use that same vision model to decompose the image. And it's important that we're not just pulling out things like the colors, the predominance in that we are pulling out textures, things like that. But those spatial analytics that comes from the fact that we get bounding boxes from the segmentation of objects in the image. We can do two things with those, or at least the two that we're using. One is we can use the bounding boxes center point to determine what is to the left, right, above, below, overlapping. We have that capability to build a small World Network. But there's another thing you can do with bounding boxes.

David Hughes [00:16:51]: You know, the overall dimension of your image. You can take the bounding box of any of the segmented objects and now empower a language model to understand is this a picture that has bananas in it? Or because it takes up the majority of the overall dimension of the image, it is a picture of bananas. So we take all of that and we take that decomposition, we do some further processing on predominance and scoring, both quantitative and qualitative, and we store all of that in an embedding or into a graph database. Kuzu and then we we leverage the power of agents using BAML and we'll see what that looks like in the next slide. So here we have orchestrated an entire system, an agentic workflow using baml. I highly recommend that you learn more about this language for working with AI. And that is BAM link at the 12 o'clock talk on this same stage, I think by Bhaibov, the CEO of baml. But here we believe in the the architectural paradigm where agents should be composable into overall workflows.

David Hughes [00:18:06]: Agents can use small language models that have an affinity for the task and the atomic tasks that they are working on. There's a collection of agents and that is your gentic workflow. There should be things like tool usage, dynamic tuning. Explainability is going to come from the graph rag portion of this and that. It is an entire ecosystem. That is the paradigm of how you can take something from a prototype and actually move it into production. In the next slide I think what we'll do is we'll hand off and look at what is that semantic layer. And here what we're looking at is the semantic layer is coming both from the fact that we have an embedding space that was used with clip and that is giving us the ability to ask natural language questions that do have that semantic reasoning capability.

David Hughes [00:19:01]: Here you can see that we have selected a bunch of bananas on a wooden table and in fact we can see other images that are also related to wooden tables and bananas. In the next slide what we'll start seeing is what does that look like in that graph component though here the yellow nodes are actual images and the brown nodes are the small world network of the objects found in those images. The pink nodes just happen to be a representation here of what that instance is of an object. More importantly though, on each node in our graph we have a rich collection of properties. You can see some of those on the left, but even on the Relationships, we have predominant score. So, for example, an image is using the color red, but we actually have both a quantitative and a qualitative measure of how much is it using that color red. So that is where we are with the richness of semantics. Giving all of that information back to a language model, though, is where the power of this whole system comes.

David Hughes [00:20:15]: If a language model not only sees that these are the most similar objects, but now can start to understand our images, it can now start to understand these are the objects in the images, how they relate to each other, the colors that are used, the complexity, the textures. It now can generate a response that says, not only are these the images, but of the top 10 that you asked for, these three are the best, and here's why. And that's what multimodal graph rag is doing. So where are we going moving forward? I mentioned that we're talking about audio and that's, that's one thing that's more at this higher level, but you can actually decompose an image one more level and you can decompose an image into its pixels. If you think about a picture has pixels, you can then, in this case a medical image, start to think about medical images, have three dimensional pixels called voxels. You can do community detection and represent lesions or tumors as a collection of voxels that are next to each other based on their intensity in the image. Then you can start to ask questions about the neuroanatomical location of a lesion or a tumor. And if you have a graph that then says this structure, the cerebellum affects balance.

David Hughes [00:21:38]: And now you have this community of nodes representing a tumor that is encroaching upon the cerebellum. You are likely to see symptoms of balance in this particular patient. You can do temporal evaluation of is this lesion progressing and getting worse, all as a matter of turning a image of a brain into a graph or a brain graph. All right, so that's our future direction and we'll close out and open this up for some questions and hopefully some good answers. Here is our contact information. Amy and I do a lot of collaborative work. Here's our independent groups that we work with, but also just reach out to us and we can just set up a time to talk to you as both Amy and David. All right, what questions do you have, Amy?

Ben Epstein [00:22:28]: David, I have to say, I guess I'm not surprised because this is the AI in Production conference, but this was without a doubt the best graph talk I've ever heard. Every single time I've listened to a graph talk, GraphRAG, graph anything, any type of system, I have consistently walked away and thought, okay, I get it, but I don't get it. I don't get how it works, how it actually gets embedded in the system. And this time I think I actually get it. So that was really phenomenal. I also just want to call out that my company's architecture is really similar to yours.

Ben Epstein [00:23:03]: I also love LanceDB, one of my favorite tools. BAML is my single favorite tool that I've ever worked with in my entire life. They're going to be on our track like David said, later today, track three, so make sure you check it out. It is a game-changing tool for working with LLMs. I have a lot of questions, but I want to make sure the community gets their questions first, and mine are kind of related. The top one is: are there any tips? You actually covered this a little bit, David, but are there any tips for prompting vision models for image segmentation like you did for multimodal search? For example, did you use different prompts to extract different aspects of the image: colors, charts, fruits? I know you sort of covered this, but maybe a little bit more detail.

David Hughes [00:23:46]: Sure. So my first approach was prompt engineering, much like you may be experimenting with now. I found that two choices really accelerated the quality of what I was getting back. One, I didn't see a difference with a lot of the big language models, so I reverted back to using LLaVA, which is a local model I can run on Ollama. I'm a big fan of small atomic models for small atomic tasks, so that would be my first recommendation. Second, I would migrate away from prompt engineering and start using constrained outputs in discrete, deterministic systems using BAML. That doesn't go into a lot of detail, but BAML gives you the ability to describe types for the data shape and topology that you want back from your language model, and language models can reason about and understand that better.

David Hughes [00:24:42]: And plus BAM will implement something called schema aligned parsing, which guarantees that whatever the language model gives you, it's going to be in the shape or topology that you need. There's so much more it does, but those are the two recommendations I would make or reach out to me and we can talk.

Ben Epstein [00:25:00]: Yeah, the transition for me from prompting to schema-defined outputs was so obvious, it was always intuitive. But then I built a system like it, so much worse, at a company before, and then seeing BAML, I was like, this is everything I was thinking it would be, but built, you know, correctly. So I agree. But there's a follow-up there, and the answer is kind of yes, but maybe you have extra thoughts around it: does it make sense to build an ontology of an image and to maybe relate around that? One of my questions was, in my system, especially when using BAML, I'm being as strict as I can to define the entire scope of what types I want, what I'm trying to extract, how I think about those structures. How much are you letting the models determine what those structures are? Obviously in production, if you're using BAML, you aren't. But in the experimentation, almost like the engineering phase, how much are you letting the models tell you what that ontology is versus strictly defining it at the beginning?

David Hughes [00:26:02]: I think that's actually a bigger graph question, and it's one of the areas that Enterprise Knowledge, and I know Amy in her work with GraphGeeks, focus on a lot: when starting with any particular challenge, an ontology should come first. It often doesn't, and I think that's where a lot of projects go from prototype, move into trying to go to production, and begin to really struggle. It's because they didn't make that investment in understanding the ontology first. So to the question, I would respond: develop an ontology. From there, yes, create your types in BAML based on your ontology, and the composability of those is going to be driven by that ontology. Language models can attempt to do ontology extraction.

David Hughes [00:26:53]: There's even some great companies like Lytra who are, and I think I pronounced that right in apologies if I didn't, who are trying to use language models to extract ontologies. But I really do think it's the subject matter experts in the work of ontologists and taxonomists, if you can get access to them, really are worth the investment for any kind of AI project.

Amy Hodler [00:27:13]: And I think we're going to see ontologies for both images and sound develop for graphs, very much like schema.org has done in a lot of different areas. I think we'll have image ontologies for medical and, you know, image ontologies for, I don't know, vacation parks, whatever it might be. And we'll probably see that happen in audio. The more we integrate this stuff into graphs, the more we're going to need to do that, so we can then collaborate and do cool things like graph algorithms on image subcomponents. So I think that's just kind of a natural evolution.

Ben Epstein [00:27:47]: I love that. I'm going to keep this short because there are so many more questions. I love that framing of it, because the way that I've been thinking about this for a really long time is that people are getting a little bit lazy with LLMs, where they're like, oh, give the LLM the problem, let it solve it, and we've forgotten to do product work. So from where I play, it's really the same as the way you define the graph ontology. I'm saying, well, define the product problem: what is the job to be done? Find that out first, build those structures, do the really hard thinking, and then the LLMs become a lot more performant, a lot easier to test, and a lot easier to evaluate, because you know what you're looking for. It seems like the same thing here. You can't just... we saw, you know, a year, a year and a half ago those open source projects where it was like, give the LLM an episode of Friends and it'll build a graph around it.

Ben Epstein [00:28:35]: It's like cool. But what, like, what does that do? So I like that you're, that you're still pushing really hard on subject matter experts and building ontologies and that upfront work which makes the downstream work so much more, you know, performant and impactful speaking.

David Hughes [00:28:51]: Go ahead, go for it.

Ben Epstein [00:28:52]: No, no, please, no.

David Hughes [00:28:53]: I was just gonna say that what you just said is actually what defines a prototype, a project, an experiment, versus building something to go to production.

Ben Epstein [00:29:02]: Yeah, exactly. I mean, and for anybody building startups who's afraid that you're an LLM wrapper or a ChatGPT wrapper: if you're doing that product work and you're thinking about the problem or the vertical or the system, you're not a wrapper. You're being empowered by the tools underneath, but you're doing all this product work that other people aren't doing, and that really becomes a differentiator. A cool question that I didn't consider, but it's a great one, is: how computationally expensive are these graphs to query? And I would also ask, similarly, how expensive are they to build?

David Hughes [00:29:33]: Well, let me go in reverse to build them. If you're using embedded systems, you're saving a lot of time and work and engineering using something like Kuzu for the graph and Lance DB for the vector database because they're running as part of your program, so they're embedded. You're not running them on a separate service or a separate vm. So there's a lot of acceleration and mitigation of cost and engineering complexity. The query complexity because you first are doing that associative search into a vector database to pull back some top K, let's say five images. You then get the metadata on those results which are the IDs for the nodes in your graph and those become the root or the anchor of where you start your query. So you're actually getting a very strategic and atomic approach to your graph query. And that limits the complexity of trying to do this big graph global search.

David Hughes [00:30:37]: You can do follow on queries on that like these are the top images. Let the language model understand and explain why and then the follow on query might be. And if you liked these images because you took that graph out, put it into network X, did some community detection and then brought all that back and did scene analysis, you could say you might like these other images even though they don't contain bananas. They have the same kind of scene composition. So there's so much more you can do with that. The last thing I'll say is I do think ontologies are critical and yes people have used in the past this concept of just, just let the language models do it all. I think with ontologies, graph structures and associative search, even some of the self optimizing systems that are trying to figure out how to solve more complex problems using some of these new architectures like do new IO if you have an ontology and all of this structure in place, those are always going to be more successful than if you just do the naively. I want to do this.

David Hughes [00:31:37]: Go do it.

Ben Epstein [00:31:41]: Yeah, that's really interesting. I have to think a lot more deeply about this; I have to keep thinking harder about it. Very interesting. I want to make sure that we have time for what I think is the last question we'll get to. I'm almost certain that the answer is going to be yes, but I'm curious how you think about it: does it make sense to combine all of the things that you're extracting, all this information, with physical knowledge like laws of motion, thermodynamics, maybe as an option to connect the embodied world with the cognitive one?

Ben Epstein [00:32:10]: Very cool question, Amy.

David Hughes [00:32:12]: You want to start off with that one?

Amy Hodler [00:32:15]: I think the answer is yes, at least in my opinion. Of course there are some challenges there, but David and I occasionally talk about whether we should be calling this sensory RAG or sensory intelligence, because when we talk about going from text to image, the next step is audio, which I believe David is working on a bit, I'll just say that. And then you're going to talk about motion and video, and being able to understand that is going to be important. The more senses you can add in, the richer your graph is going to become, and the richer and more nuanced your associative search is going to become. So I think that's a natural evolution.

Amy Hodler [00:32:59]: I think there's a lot to be done to get there, but it is definitely directionally where you would want to go. Embodiment is a whole big topic in and of itself and I'm not sure we could tackle that ourselves, but, but I think going in that, that direction. David, what are your thoughts there?

David Hughes [00:33:18]: I would agree entirely with everything you just said, Amy. I do think looking at that whole sensory intelligence component missing right now and there's so much signal that is just being left on the table because it's not incorporated. And that's the intent of efforts like Multimodal Graph Rag is to take in and try and do a better world representation of information and data than just the, the, I think very constrained, narrow view of text.

Amy Hodler [00:33:50]: Yeah. Graphs are masters of context. That's what they do. And so the ability you add in another data type another time element and you just get more.

Ben Epstein [00:34:05]: I'm putting your QR codes back up on the screen for anybody who's joining a little bit late and wants to learn a little bit more. This was an amazing conversation. I hope to see both of you in the Bamble discord. I'm incredibly loud over there, asking so many questions and bothering by Bob all at all hours of the day. Dave and Amy, thank you both so much for coming on. This was, I mean, a really fantastic talk.

David Hughes [00:34:27]: It was a pleasure. Thank you so much.
