
Integrating Knowledge Graphs & Vector RAG for Efficient Information Extraction

Posted Sep 30, 2024 | Views 205
# RAG
# Knowledge Graphs
# Vector Retrieval
SPEAKERS
Nehil Jain
MLE Consultant @ TBA

Hey! I’m Nehil Jain, an Applied AI Consultant in the SF area. I specialize in enhancing business performance with AI/ML applications. With a solid background in AI engineering and experience at QuantumBlack, McKinsey, and Super.com, I transform complex business challenges into practical, scalable AI solutions. I focus on GenAI, MLOps, and modern data platforms. I lead projects that not only scale operations but also reduce costs and improve decision-making. I stay updated with the latest in machine learning and data engineering to develop effective, business-aligned tech solutions. Whether it’s improving customer experiences, streamlining operations, or driving AI innovation, my goal is to deliver tangible, impactful value. Interested in leveraging your data as a key asset? Let’s chat.

Sonam Gupta
Sr. Developer Relations @ aiXplain

Sonam is a data scientist turned developer advocate.

Matt Squire
CTO and Co-founder @ Fuzzy Labs

Matt is CTO and co-founder at Fuzzy Labs, a consultancy dedicated to using MLOps to help technical teams get the most out of AI and ML. He enjoys AI, bio-inspired computing, and functional programming.

Valdimar Eggertsson
AI Development Team Lead @ Snjallgögn (Smart Data inc.)
Binoy Pirera
Community Operations @ MLOps Community
SUMMARY

In our September 12th MLOps Community Reading Group session, live-streamed at the Data Engineering for AI/ML Virtual Conference, we covered the paper "HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction." The panel discussed using hybrid RAG to pull information from unstructured financial documents. The paper's approach combines knowledge graphs with vector-based retrieval for better results. We critiqued the context-mixing method and the lack of graph pruning techniques. Overall, it was a solid session with great insights on improving financial data extraction using hybrid models.

TRANSCRIPT

Binoy Pirera [00:00:00]: Okay, let's get started. So once again, everybody, welcome back to the September edition of the MLOps Community reading group. If you weren't here last time, we meet once a month to discuss and, you know, dive deep into the latest research in the AI and ML space. And we had a really good time last time. We have people from all over the world, covering pretty much all the time zones, from various different fields and various different experience levels, coming here to share their opinions and insights. It's really awesome to have you all here. And today we'll be discussing another cool research paper. I've just dropped the link.

Binoy Pirera [00:00:40]: Yeah, and we're also live streaming to the Data Engineering for AI/ML Virtual Conference that's happening right now. So if you're watching us from there, come say hello in our Slack channel. We have a reading group channel; just come say hello. We usually keep the conversation rolling after the session, as usual. We have a bunch of amazing community members joining us to host the session. And, well, obviously we've got Sonam, as usual. Sonam is from Developer Relations at aiXplain.

Binoy Pirera [00:01:15]: We also got Matt Squire; he's the CTO and co-founder at Fuzzy Labs. Matt, thank you for being here. And Valdimar Eggertsson, he's an AI development team lead at Smart Data. And also, Nehil's running a few minutes late. Pretty early for him. Nehil is the co-founder at an AI startup that's still in stealth mode. So we have a solid lineup of presenters for you today. So at any point, if you guys want to share your opinions, insights, whatever it is, please unmute at any time.

Binoy Pirera [00:01:44]: And we'd like to keep things fairly open ended. So without further ado, over to you, Sonam.

Sonam Gupta [00:01:52]: So, well, first of all, welcome, everyone. Good morning, good evening, wherever you guys are from. It's early morning for me, I can promise that. Okay, so I don't know if you guys got a chance to read the paper, but, you know, it's RAG, and RAG has been the talk of the town. And since it came out, there have been a variety of versions.

Sonam Gupta [00:02:15]: Versions of RAG that companies and researchers have built around it. So now the question is, does everybody understand RAG? I mean, I'll be honest, I'm still learning about it. And with this paper, in today's session, we are discussing the paper called "HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction." Now, in this paper, HybridRAG combines the strengths of both vector retrieval and knowledge graphs to create a more accurate and efficient system for information extraction. So now, going on to the motivation: what was the motivation behind this research? It basically arises from the challenges faced by financial analysts when they try to extract valuable insights from unstructured documents, such as earnings call transcripts or reports. The good thing about the paper that I found is that they took a very domain-specific example and worked through it. In my opinion, it's easier to understand when you take a concrete example, but it has its own disadvantages, and we can discuss that later. Yeah, okay.

Sonam Gupta [00:03:39]: So what was the challenge? Why did they come up with hybrid RAG? Traditional methods or models, even with the help of RAG techniques, struggle to handle the specialized terminology or the complex structure found in these documents. And financial documents are definitely crucial for industries when it comes to decision making. But LLMs face issues like hallucination, as we are all aware, and lack of context. So while vector RAG helps by retrieving text, helps with the whole similarity, the contextual component of it, it doesn't fully account for the hierarchies that are involved in the document or the data, and the nuances of these documents. So this is where knowledge graphs come in. Knowledge graphs, on the other hand, represent information as entities and relationships, as we discussed earlier, which can help overcome some of these limitations. Now, combining all of this, the authors introduced HybridRAG to improve this process by using both knowledge graphs and vector retrieval to enhance the accuracy of results from a question answering system, for example, especially when you're dealing with financial data. And specifically, they demonstrate this using earnings call transcripts and the format they are in.

Sonam Gupta [00:05:16]: They are structured as Q&A pairs, and this is where they show how HybridRAG could outperform traditional methods in extracting relevant and accurate information. Now, this is also around the motivation, why they wanted to solve this, and they took this domain-specific example. But going deeper into the methodology, starting with vector RAG: I briefly introduced what it is, but if you see the paper, in the methodology section they describe vector RAG in more detail. It's the approach where external documents are divided into chunks and then converted into embeddings, which are, of course, numerical representations. The system retrieves the most relevant chunks based on their similarity to the user query, and these chunks are then used as context for the LLM to generate responses. We could take a whole other session talking about chunking, I'm sure, but for this one I'm just going to talk about it briefly.
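
To make the flow Sonam describes concrete, here is a minimal sketch of vector RAG in Python. It is not the paper's code: `embed` is a placeholder for any sentence-embedding model, and the fixed 1024-character chunking mirrors the setup mentioned later in the discussion.

```python
# Minimal vector RAG sketch (illustrative, not the paper's implementation).
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: plug in a real embedding model here."""
    raise NotImplementedError

def chunk(document: str, size: int = 1024) -> list[str]:
    # Fixed-size chunks, as in the paper's setup.
    return [document[i:i + size] for i in range(0, len(document), size)]

def retrieve(query: str, chunks: list[str], k: int = 5) -> list[str]:
    q = embed(query)
    # Rank chunks by cosine similarity to the query embedding.
    scores = [
        float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
        for v in (embed(c) for c in chunks)
    ]
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]
```

The retrieved chunks are then concatenated into a context string and handed to the LLM along with the question.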

Sonam Gupta [00:06:26]: So now, what is the limitation? If you're just using vector RAG, it often loses the hierarchical structure, especially if you're talking about financial documents, and that could lead to less accurate retrieval of the context. And moving on to how you can construct the knowledge graph: there's not much mentioned in the paper, in my opinion, about how they're constructing it. But what I understood from it is that this process involves knowledge extraction and knowledge improvement. Now, what is knowledge extraction? It's basically identifying the entities and the relationships in this unstructured data using different NLP techniques. And what is knowledge improvement? It's basically refining the graph by removing any redundancies, completing the missing links, and fusing information from multiple sources. Now, the authors describe how they used LLMs to extract entities like companies or financial metrics, etcetera.

Sonam Gupta [00:07:37]: And relationships like company-CEO or product launches and so on, from these transcripts, which were then structured into what I think is knowledge-graph-specific terminology: subject, predicate, object. Now, they talk about graph RAG as the next step, but I'm going to pass on to Valdimar to discuss further.
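
As a rough illustration of the extraction step Sonam describes, the sketch below asks an LLM for subject-predicate-object triplets. The prompt, the `llm` helper, and the JSON output format are assumptions, not the paper's actual prompts.

```python
# Hypothetical LLM-based triplet extraction (the paper's exact prompts are not given).
import json

EXTRACTION_PROMPT = """Extract (subject, predicate, object) triplets from the text
below. Entities may be companies, people, products, or financial metrics.
Reply as a JSON list of 3-element lists.

Text:
{chunk}"""

def llm(prompt: str) -> str:
    """Placeholder: call your chat model of choice here."""
    raise NotImplementedError

def extract_triplets(chunk: str) -> list[tuple[str, str, str]]:
    reply = llm(EXTRACTION_PROMPT.format(chunk=chunk))
    # e.g. [["Company A", "has_CEO", "Alice"], ["Company A", "launched", "Product X"]]
    return [tuple(t) for t in json.loads(reply)]
```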

Valdimar Eggertsson [00:08:06]: Yes, thank you. Okay, so I was going to continue where Sonam left off. We have graph RAG. We talked about vector RAG. But graph RAG is a hot new topic that I've noticed recently. It's one of the reasons I was interested in this paper, because I haven't really tried it. I've worked with knowledge graphs a bunch. I'm really interested in this intersection of structured knowledge.

Valdimar Eggertsson [00:08:33]: And these statistical methods; it can help with reliability and, yeah, to have some kind of symbolic knowledge behind it. And that's basically the essence of this paper: just injecting knowledge from a knowledge graph into the context of a language model, to get slightly better answers, effectively. I'm sharing my screen here. You see my mouse; there is a schematic here showing how they constructed the graph. We start with these earnings reports, which I'll get into in a minute. I'm just going through the paper section by section. So they've got a bunch of documents, not that many though. And the goal is to get the graph out of it, and once you've got the graph, you can do lots of things.

Valdimar Eggertsson [00:09:27]: You can use graph algorithms, like measuring properties of the network and whatnot. You start with texts. They have a prompt for text preprocessing, which is the first step, to get a processed report, which is just an abstract version of the text without as many redundancies. And then finally they process that again to get the actual triplets, the subject-predicate-object statements, which are entity-relationship connections, which build up the graph. And that can help with answering questions, which is the ultimate goal: just to ask the LLM questions about the data and get a reliable answer back. LLMs can sometimes struggle with these kinds of relationship questions, like "the parent of the son of...", which are easily answered in the graph by just hopping a couple of hops. If you're asking a question about a particular company, you can fetch all of the owners of the company, or the partners, employees or whatever, and have all the connections there in the context. That's effectively what they do. So a subgraph, which consists of the relevant nodes and edges, is extracted from the full knowledge graph to provide context. And then that's just put into the LLM.

Valdimar Eggertsson [00:10:57]: So it's a different way of searching, based on the entities and relationships instead of similarity search. And then their technique is just: hey, let's try combining the traditional RAG and the graph RAG into the same context for the LLM, what they call the amalgamation of the two contexts, which allows them to leverage the strengths of both approaches. The vector RAG component provides a broad, similarity-based retrieval of relevant information, which is what everybody's been doing, while the graph RAG component contributes structured relationship and contextual data. It's pretty good. And they had a few different metrics they used to measure. These are all kind of automatic metrics, I guess of the sort that have been run through GPTs before.

Valdimar Eggertsson [00:11:56]: So they have faithfulness, which is a measure for checking hallucinations, basically. As they say, faithfulness is a crucial metric that measures the extent to which the generated answer can be inferred from the provided context. So an LLM can answer a question seemingly perfectly, because it wants to be super helpful, but then it turns out it doesn't come from any data, and it's just telling you what you want to hear. So they have to break the answer into fundamental statements. Say it's about the earnings of a company and who owns the company and whatnot; they break apart a complicated question and answer, and then they just prompt an LLM to check each statement against the evidence, which is the retrieved info from the vectors or from the graph, and say yes or no, or how much it fits, and get some kind of score.
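
A rough sketch of the faithfulness idea as Valdimar describes it: split the answer into atomic statements, then ask an LLM judge whether each one is supported by the retrieved context. The prompts are illustrative, `llm` is the assumed helper from the earlier sketch, and RAGAS implements a more careful version of this.

```python
# Illustrative faithfulness scoring, not RAGAS's exact implementation.
def faithfulness(answer: str, context: str) -> float:
    statements = llm(
        f"Break this answer into simple standalone statements, one per line:\n{answer}"
    ).splitlines()
    supported = 0
    for s in statements:
        verdict = llm(
            f"Context:\n{context}\n\nStatement: {s}\n"
            "Can the statement be inferred from the context? Answer yes or no."
        )
        supported += verdict.strip().lower().startswith("yes")
    # Fraction of statements grounded in the retrieved context.
    return supported / max(len(statements), 1)
```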

Valdimar Eggertsson [00:12:59]: It's good to have this. I'm a bit skeptical of using LLMs to evaluate LLMs, but still, it should spot some hallucinations. Then we have answer relevance, which is another metric used to evaluate the system and show that their approach is better in some aspects than the other ones. This one was a bit confusing. It's about how well the answer matches the question. They generate different questions that could fit the answer, then use a similarity method to compute it. I don't want to go into this much, but it's like they generate questions for a given answer, and if many different questions could fit this answer and they're all similar to the original, then it's a relevant answer. I've never seen this before.
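
The answer-relevance metric as described might look roughly like this: generate hypothetical questions for the answer and measure their similarity to the original question. `llm` and `embed` are the assumed helpers from the earlier sketches.

```python
# Illustrative answer relevance: mean similarity of generated questions to the original.
import numpy as np

def answer_relevance(question: str, answer: str, n: int = 3) -> float:
    q = embed(question)
    sims = []
    for _ in range(n):
        g = embed(llm(
            f"Write one question that this would be a good answer to:\n{answer}"
        ))
        sims.append(float(np.dot(q, g) / (np.linalg.norm(q) * np.linalg.norm(g))))
    # High mean similarity suggests the answer actually addresses the question asked.
    return float(np.mean(sims))
```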

Valdimar Eggertsson [00:13:57]: Seems kind of cool. Then there's precision and recall, but with a different definition. It's something used in RAGAS, this automated framework for evaluating LLMs, but it's effectively just good old precision and recall. What really matters for me, and I've been doing lots of retrieval, is just recall. You want to see how often the true answer is in the retrieved documents. So you fetch 20 chunks or ten chunks or five chunks; how often is it there? Then there's precision, where we want to know how many of the retrieved documents are relevant. There's usually a trade-off between those. And actually, they were using GPT-3.5. But for me, the precision doesn't really matter, because GPT-4 can kind of filter out the noise in the retrieved info and just find what matters.

Valdimar Eggertsson [00:15:00]: So you just really want the important parts to be somewhere in the retrieved info. Does anyone have any questions or comments at this point? I'll talk briefly about the data set next before handing the word over.
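
For reference, the classical retrieval definitions being discussed look like this. Note that RAGAS's context precision and context recall are LLM-judged variants of the same idea, so this sketch is only the intuition.

```python
# Classical recall@k and precision@k over retrieved chunks (intuition only).
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / max(len(relevant), 1)

def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / max(k, 1)

# Fetching more chunks (larger k) tends to raise recall but lower precision,
# which is the trade-off discussed above.
```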

Nehil Jain [00:15:13]: Yeah, I wanted to comment on the precision part. I think that makes sense; you should focus on recall before precision. But it's not black and white, in the sense that, even in the last paper we read, as the context gets longer and longer, at least in long-context problems, there's a lost-in-the-middle problem. So that is one problem where you might have too much context: if you don't have high precision, the LLM will get confused. And the second problem can be that if you're working with SLMs, which have smaller contexts, then you need to be able to get the relevant stuff into a smaller context window. Otherwise, again, you may not be able to fit it in.

Nehil Jain [00:15:56]: So that's how I see it. But definitely you need to do recall first before you do precision.

Valdimar Eggertsson [00:16:01]: Yeah, for sure. And also, I mean, if you're retrieving like 30,000 tokens for every question, it's much better to just get 5,000 tokens and save time and compute. So, for sure.

Bruno Lannoo [00:16:16]: Another remark or question, if someone noticed: I was very interested in the faithfulness metric. I think it's a very interesting idea to split up the answer into statements and then have it evaluate whether the statements are covered by the data. However, I've been experimenting myself a little bit with splitting things up into statements for an exercise I was doing, and I noticed that sometimes the LLM creates longer statements when it extracts statements, and sometimes shorter ones. And I do think there might still be some room there, because currently it seems like the fewer statements are extracted from the answer, the better the score will be; as long as the couple of statements that are in there contain some amount of truth, it will have a very high faithfulness. So it would be interesting to integrate something there with the number of words per statement or something like that, so that the faithfulness is kind of normalized on the amount of words used.

Bruno Lannoo [00:17:23]: It also depends a little bit on how they define being right, depending on exactly how they query the statement verification. If there is something true in a longer statement, is it right even if there is also something wrong? Yeah, that depends. I don't think those details are provided here.

Valdimar Eggertsson [00:17:39]: Yeah, that's a good point. Just a thought I had: because we're talking about knowledge graphs and these logical triplets, we could maybe break it into very simple statements, like, let's say, "X is Y."

Valdimar Eggertsson [00:17:56]: I find it quite interesting that they've had to construct their own evaluation metrics here. Evaluation of LLMs as a general field is not very well developed, so sure, they've had to figure out some techniques here. But at the same time, none of these techniques seem to be graph-specific, I don't think.

Bruno Lannoo [00:18:19]: But I do think that what Valdimar says is kind of right. These statements could have been done with graph extraction, and that would also be an interesting variant. You could do a regular text statement extraction and a graph statement extraction and have these two faithfulness metrics next to each other, maybe weighted or compared. That could be really interesting. And I do think what you're saying is also right: evaluation seems to be very young and still developing, so it's normal that people have to invent their own. But these are very creative metrics, I think, that are interesting to see.

Valdimar Eggertsson [00:18:54]: Okay, I'll continue with the data description. Great to have these discussions; it's probably more interesting than the content. So they basically needed to create a dataset for their purpose: financial documents from which you can also construct a knowledge graph. I guess constructing the knowledge graph was maybe the tricky part. So, what they said, I'll highlight here if you can see it: in short, there's no publicly available benchmark dataset to compare vector RAG and graph RAG techniques, either for financial or general domains, to the best of their knowledge. I'm kind of guessing that there's something somewhere; you just need something that can be linked to a graph. There's Wikipedia and Wikidata, for example.

Valdimar Eggertsson [00:19:52]: Anyway, this was all done for the financial purpose; they were working for a hedge fund or something. So what they did was take transcripts from earnings calls of the Nifty 50, which is like the top companies on the Indian stock exchange, and took one quarter's worth. Yeah, it's kind of a small dataset; it's like 50 transcripts, long documents, one for each company. We can see most of it here in this table. There were 50 companies, one document for each company. And it's kind of relevant to keep in mind that this is not a very large dataset, because even if it were like 5,000, it's still not that large. But when we look at the results later, it's kind of relevant.

Valdimar Eggertsson [00:20:51]: They have 16 questions per document. They constructed the graph from it. They go into some details here about the scraping stuff and blah, blah, blah; it doesn't matter that much. But they say it has diverse domains. It was finance, so let's talk about money, but the companies span healthcare, oil, telecommunications, et cetera. And I think my part is done. Maybe I had a couple of comments. Well, yeah, about the metrics: I would have liked to see some kind of human review.

Valdimar Eggertsson [00:21:29]: And instead of using GPT-3.5 for everything else, I don't know what they used for the auto-evaluation; I'm guessing GPT-4. And, well, yeah, that's mostly it. All the other comments are for the other sections. I guess we'll have a round later to tell us what you think. So, Matt, are you going to say some words about the implementation?

Matt Squire [00:21:53]: Yes. Surprising, no human review. I think there's a number of surprises in this paper, which are not in a good way, I'd suggest. So maybe we'll get into that. Similar to Valdimar, I was really excited by the idea of this paper. We were doing a lot of work internally to try and understand how knowledge graphs and vector-based RAG intersect. The paper claims that they have a novel method here that hasn't appeared in the literature before, which I'm not entirely sure is true, but again, that's something we can debate.

Matt Squire [00:22:32]: In any case, let's talk about how they built it and how they implemented this. So, firstly, they needed to do two things at a very high level. They needed to construct a knowledge graph, and they needed to chunk the text up into a more traditional vector database so that we can do so-called standard RAG on that. In fact, they needed to chunk the data in both cases, but for different reasons. So, as they explain here, they used a library called PyPDFLoader to import these documents, and then they break each document down into chunks. Now, the purpose for which they are chunking here is so that they can examine those chunks one by one and build up the graph. Then those chunks are going to be discarded. I just wanted to disambiguate that, because we're going to talk about chunks later in the context of vector RAG, and it's for a different purpose in each case.

Matt Squire [00:23:31]: And so hopefully that makes sense. There are little bits of technical detail here that don't particularly matter for our purposes, like the specific chunker that they're using. They are using LangChain here as well, so this chunking strategy is part of LangChain. It's notable that there are more advanced ways to do chunking that they're not considering in this paper, things like semantic chunking, that sort of thing. So that's notable. In any case, they break it down into these chunks, and their intention here is to scan through the chunks and extract entities that are relevant to this financial domain. They have a predefined list of entities they're interested in, so they want to pull out companies, financial metrics, products, services, locations and so on.
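
The loading and chunking step Matt describes would look roughly like this in LangChain. Import paths vary across LangChain versions, the file name is hypothetical, and the 1024 chunk size comes from the paper as discussed later.

```python
# Sketch of document loading + chunking with LangChain (paths/names illustrative).
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = PyPDFLoader("earnings_call_transcript.pdf")  # hypothetical file
pages = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=0)
chunks = splitter.split_documents(pages)
# For graph construction, each chunk is scanned for entities and relationships;
# for vector RAG, each chunk is embedded and stored in a vector index.
```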

Matt Squire [00:24:23]: Right. But they don't just want the entities, they want the relationships between the entities. So they want to be able to say that this company here, company A, has a board member, Bob, and another board member, Alice. And maybe Alice is also a board member of company B. And maybe the earnings for company A are such and such this quarter; last quarter it was some other number, so on and so on. I'm trying here to describe to you and draw a picture in your minds of what this graph would look like. But I think one of the shortcomings of this paper is that they don't do that for us.

Matt Squire [00:25:04]: So I would really have liked for them to give an example here, to say: here is our earnings report, here is what we extract from it, here is what the graph looks like. Because that would really help in terms of intuition building, to help us as readers understand why, as they claim, the combination of graph plus vector database actually adds value here. In any case, we'll move on. The other thing they highlight here is the verbs. So when they say verbs, they're talking about relations between these entities. They'll have a predefined set of things they're interested in, like whether someone is an investor in something, or whether somebody has a directorship on the board of some company. Those are the sorts of verbs they are interested in. Again, I speculate, because they don't tell us what those verbs are, unfortunately. They then talk about building this pipeline in LangChain to extract those features.

Matt Squire [00:26:08]: Well, I guess I've kind of covered that. But they're pulling out entities, pulling out verbs, and they ultimately persist all of that as a pickle file. So you think about a graph, you think about this grandiose thing where lots of entities are connected to other entities, which are connected to other entities, and so on. But you can simplify that as these triplets which have been mentioned before. So we have subject, predicate, object, or however we want to describe it. But the idea is that we have a thing which is connected to another thing. That's three items that we need to persist.

Matt Squire [00:26:44]: So they store these triplets as Python data structures. They persist them into a pickle file, and later on you'll see they load that back in. So they go on to describe the two approaches, vector RAG and graph RAG. In both cases, they're using this GPT-3.5 model, a perhaps somewhat dated model; something to consider there. But in any case, with the vector approach, they are doing what you would usually do for RAG. And for those who don't know what you would usually do, I'll just briefly describe that; Sonam has provided a wonderful intuition for it already. So what we're going to do is take the text and split it up into chunks of a fixed size, they're using 1024, and then each of those chunks gets assigned a vector that represents that chunk.
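
Persisting the triplets the way Matt describes is just standard Python pickling of a list of 3-tuples; a minimal sketch, with made-up entities:

```python
# Triplets as plain Python tuples, persisted to a pickle file and reloaded later.
import pickle

triplets = [
    ("Company A", "has_board_member", "Alice"),
    ("Alice", "is_board_member_of", "Company B"),
]

with open("triplets.pkl", "wb") as f:
    pickle.dump(triplets, f)

# Later, when building the graph:
with open("triplets.pkl", "rb") as f:
    loaded = pickle.load(f)
```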

Matt Squire [00:27:36]: It represents the meaning of that chunk. That allows us, later on, to say: I have a question, and I want to find chunks of text that are semantically related to that question. We retrieve those from what's called a vector database and then we perform our RAG query; we try to answer that question. So in their pipeline for vector RAG, what they want to do, given a question, which will be a question about the financial statements, is retrieve the chunks. They want to build what we call a context, and that is just a concatenation of the relevant chunks. And then usually the next step would be to generate a response using a large language model. So we ask a large language model: given the user's question, given the context, can you answer the question? However, they stop there.

Matt Squire [00:28:28]: They stop there because later on they need to combine these two approaches together. So let's talk about the second approach, which is graph RAG. Here the goal is to build up a graph. Well, we've already kind of done that; we've saved it as a pickle file. So actually all they do here is load up all of those entities and build a graph. They're using a Python library called NetworkX to represent that graph in memory, and that allows them to do a traversal. All traversal means is that we can pick a starting entity in that graph and we can explore the graph.

Matt Squire [00:29:03]: So we can say: from here, I'm going to go to my neighbors, then to the next neighbors, and so on. So we'll explore the graph, we'll traverse it. Their pipeline is then to traverse the graph to find things that are relevant to the question that's been asked. Rather than finding chunks of text that are relevant, we'll find these entities that are relevant, and those entities have ultimately been derived from the text. But we're looking not just at the entities, but at the relationships between them. Hopefully all that makes sense so far. I'll just pause if there are any questions before I go on.
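
A minimal sketch of that build-and-traverse step with NetworkX, reusing the pickled triplets from the sketch above. The one-hop neighborhood radius is an assumption, since the paper doesn't spell out the traversal depth.

```python
# Build a graph from triplets and extract a neighborhood subgraph as context.
import networkx as nx

G = nx.DiGraph()
for subj, pred, obj in loaded:  # triplets reloaded from the pickle file
    G.add_edge(subj, obj, predicate=pred)

def graph_context(entity: str, radius: int = 1) -> str:
    # Subgraph of everything within `radius` hops of the starting entity.
    sub = nx.ego_graph(G, entity, radius=radius)
    lines = [f"{u} {d['predicate']} {v}" for u, v, d in sub.edges(data=True)]
    return "\n".join(lines)

print(graph_context("Company A"))
```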

Bruno Lannoo [00:29:41]: There was a small question I did have around there. In these numbers, they talk about the number of triplets in the graph and the number of edges, and those are almost identical, but I would have thought they would be identical. Any idea what the difference is?

Matt Squire [00:29:55]: That's a really good question. I don't know. I suppose I'm going to speculate a little bit. Are there things that don't have any relationships in that graph? I don't know if anyone else has any thoughts on that or anyone who's read the paper.

Sophia Skowronski [00:30:13]: Is it a directed or undirected graph? Do they specify what type of graph?

Matt Squire [00:30:19]: See, they don't actually specify. That's a good point. Yeah.

Sophia Skowronski [00:30:23]: And is it one big graph? It's not separate graphs for each document or each company. So how do they manage entity resolution and duplicated entities, like Orange County versus OC, and references like that?

Valdimar Eggertsson [00:30:41]: I'd say that's one of the major shortcomings, in my mind, because I used to deal with this; I did a couple of papers in entity disambiguation. And it's easy to construct a graph with all the relationships in the text, but that's not a good graph. You need to have reliable entities, I think.

Matt Squire [00:31:00]: I mean, I feel like these are all really good questions that are basically not answered in the paper, and we can speculate about it, but that is in itself a shortcoming of the paper, I feel.

Nehil Jain [00:31:11]: Yeah, Matt, I was just asking if they mentioned how big the graph in the end was, how many entities there were, or how many nodes there were.

Matt Squire [00:31:20]: Yeah. Well, I suppose this number here should be how many nodes it adds up to. Yes, it's 11,400-ish. But then the number of triplets and the number of edges, they don't... this doesn't quite line up. Well, yeah, we need to see the graph. Right.

Matt Squire [00:31:39]: We need to know how dense this is. Yeah, so it looks like it's a very highly connected graph, which, to Valdimar's point, is a questionable thing, isn't it? Okay, so I'm not able to add any clarity there, but I'm going to blame the paper myself on that one. Okay. So the final thing that I want to cover in describing the implementation is this last section, 4.4. And I'm going to sound like I'm leaning on the paper quite heavily here, but in a kind of critical way. Again, I looked at this and I thought it's kind of short for what it's doing, because this is the bit where they talk about how they combine the two techniques. And essentially all it says is: well, we take the context that we built from the vector search and the context which we built from the graph search, and we just combine them together, we concatenate them. So it says here: we concatenate the two contexts to form a unified context. And they place the vector results first and the graph results second, which is kind of arbitrary.

Matt Squire [00:32:55]: So firstly, that feels kind of naive, in terms of just concatenating two things together. There are things they could do, like applying re-ranking, or other things to combine the results in a more sophisticated way, but they don't do that. The other thing they note is that the precision is impacted by the ordering. And this echoes the lost-in-the-middle issue, which we discussed in the previous reading group a little bit, where the amount of stuff you have in the context impacts how a large language model pays attention to it. It doesn't weight everything in that context equally. And that's a problem I feel they're probably coming up against with this concatenation. If they swapped the order of concatenation, they'd probably see different results, so it's kind of interesting that they didn't try both ways, at the very least. So, yeah, anyway, that hopefully covers the implementation.
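
As described, the combination step in section 4.4 amounts to string concatenation, vector context first, graph context second. A sketch, where `vector_context`, `graph_context_str`, `question`, and `llm` are illustrative names from the earlier sketches rather than the paper's code:

```python
# Hybrid RAG context assembly as described: plain concatenation, vector first.
unified_context = vector_context + "\n\n" + graph_context_str
answer = llm(f"Context:\n{unified_context}\n\nQuestion: {question}\nAnswer:")
# Swapping the order, or re-ranking the merged results, could plausibly change
# the outcome because of the lost-in-the-middle effect discussed above.
```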

Matt Squire [00:33:53]: I'll stop sharing my screen, but we'll continue the discussion if there are more questions.

Nehil Jain [00:33:59]: Yeah, so I had some thoughts, as Matt shared, about how the paper has been written: the size of the data, the way they approach certain things, the models they use, et cetera. And so I was searching other literature that has been published recently around using graphs or graph-based neural networks to do RAG. And actually, Matt briefly touched on all of the points that I found. So, one is that it's pretty clear you need to do some kind of disambiguation and deduplication of the context. And there are papers now which are saying, hey, we used a special type of prompt or something else to take the graph and flatten it into some kind of hierarchical text structure. And so that would be an interesting thing to read. I mean, I was just reading the abstracts and kind of only understanding the insights. I didn't read all the papers, because then I might be writing my own survey paper at that point.

Nehil Jain [00:35:02]: But, yeah, so people are spending time trying to understand how to actually combine them more effectively, which is not done in this paper. The second thing was, there is a paper which is already claiming that re-ranking with graphs is really helpful. So if you use GNNs, which are graph neural networks, and again, I don't know much about them, so I'd have to learn what they are, but they're using GNNs as a step after retrieval, which is now a common step. Re-ranking has become a common step in advanced RAG, as they call it, where you do the retrieval, then you re-rank the retrieved results, and then do the generation, and so on and so forth. And so they're saying: can you re-rank the different chunks and have a better context using GNNs? And then there's a lot of conversation in those papers, which is not covered here, about how you go about pruning the graph. Once you have a given query, how do you improve the retrieval, given that this is the query and this is the graph?

Nehil Jain [00:36:10]: How do you make sure that you've covered the most relevant pieces? And how do you work with a graph database, or whatever, using the stored graph to retrieve better? So those were some things we should definitely dive deeper into in other papers. This paper kind of lacks a bunch of detail around that, which I think we're all kind of wanting, so we're speculating here, saying, oh, maybe this is what will happen. Anyway, in the results piece, I mean, there isn't anything really new, in my opinion. I read through the results, and this table basically covers most of the stuff they're talking about here, and it's kind of short and sweet to look at. These are the metrics that Valdimar talked about.

Nehil Jain [00:37:04]: And so faithfulness is kind of the same. And then in terms of recall, they're saying that the recall of graph RAG is not the best, but they're not fully describing why that is. But when you combine them, because they're just concatenating, you will always have all the context available in the hybrid RAG, so you will always have a good recall. You can, because of concatenation, get the best recall. And then in terms of precision: because the chunks retrieved from the vector side are not the best, that hurts the overall precision as well. But as we discussed, I think recall is more important in RAG than precision in most practical cases. And then, yeah, for faithfulness there isn't that big of a difference, I would say.

Nehil Jain [00:37:54]: And the interesting thing is the... I forget what this was.

Valdimar Eggertsson [00:38:02]: Yeah. It's the similarity of hypothetical questions that could correspond to this answer. So it's like they make...

Nehil Jain [00:38:10]: Oh, yeah, yeah. So that's a new metric, which is kind of creative and interesting. They are not using any graph-related metrics either, and so it's hard for me to reason about it, because they came up with the metric and we haven't really looked at the data. So what does it really mean? If they had shared at least a transcript and a comparison to show how they're doing it, that would help internalize it. Faithfulness and precision/recall are common in RAGAS and other eval frameworks nowadays, so that's definitely also useful. Any thoughts or questions around this, guys?

Keith Trnka [00:38:48]: Actually, just a clarification. I thought the four eval metrics were all from RAGAS. Are they not? Or do they differ in some way?

Nehil Jain [00:38:58]: Is this answer relevance also part of RAGAS?

Keith Trnka [00:39:02]: I think so, yeah.

Valdimar Eggertsson [00:39:04]: At least there's a reference, like a footnote.

Nehil Jain [00:39:06]: Okay, so only that might be amiss.

Valdimar Eggertsson [00:39:08]: Maybe it's all from RAGAS. Yeah, they just cite RAGAS for the context precision and context recall, but not the other ones.

Keith Trnka [00:39:16]: Oh, but not the others. I'm not saying I necessarily like the metrics, just that I don't think they pulled them out of a hat.

Nehil Jain [00:39:23]: What do you think about all the metrics being RAG-related? Is there anything we should also look at in how they traverse the graphs, the pieces around retrieval related to graphs, that this doesn't capture? And yeah, I think that's all I have from a results perspective, if you don't have any other questions.

Valdimar Eggertsson [00:39:46]: Yeah, so in the results, they're comparing having either the vector context, or the graph context, or both. But that's not really fair, because if you have both, you have twice as much stuff in there. So this is one of the things that kind of stood out to me, because maybe if they had a much larger vector context, or a bigger part of the graph included, then you wouldn't... Yeah, basically the number of tokens used isn't really fair. And then about the concatenation, I mean, we wanted to touch upon this, but you could use which entities are in the graph to somehow filter the documents found by the vector search, or vice versa. Some kind of...

Valdimar Eggertsson [00:40:39]: Something more elaborate than just concatenation.

Matt Squire [00:40:44]: It's kind of weird, right? They're not taking advantage, at the end, of the fact that it's a graph. They take advantage of the fact that it's a graph when traversing it, clearly, but they don't take advantage of the graph nature of it when they construct the context. And as you say, if all they're doing is... suppose there's a lot of overlap between the two contexts, then could they just repeat the same context twice and get the same result? For that matter, they've not done a good enough job of convincing me that the endpoint where they combine the graph result with the vector result adds value because it's a graph, versus because it's duplication of context or additional information, like you suggested.

Valdimar Eggertsson [00:41:31]: Yeah, because they have these two steps of how to generate the graph. They first take the report and make a more abstract version of it, and then they take that abstract version and make it into a graph. But maybe it was enough to just have the abstract version of the reports in a textual way. Maybe you don't need to have a graph; you just need to have the relational info, like "this company is owned by this company". All these statements are in the graph, ultimately. But then we have these graph algorithms. I wish I knew them better; I hope maybe someone in the room knows more about them. But instead of just taking all the one-hop neighborhoods, we could actually search the graph based on the question.

Valdimar Eggertsson [00:42:21]: We could do graph algorithms. It's a fantastic opportunity, and I can imagine it's being done somehow. I mean, knowledge graphs are part of Google Search, and all these big companies have a knowledge graph behind what they do. And now they have LLMs too. And I bet they're combining those two in a smart way somehow.

Thomas Kwok [00:42:45]: Yeah, I feel like they are not utilizing the graph structure. They could have used SPARQL queries, maybe used an LLM to generate SPARQL queries for better graph traversal. But I was also concerned about the scalability of this method, because the graph here is rather small compared to a lot of real-life enterprise graphs. And touching on the precision-recall trade-off: I wonder, if the graph gets to millions or billions of nodes, whether simply concatenating the contexts could be problematic for precision and recall if applied to a real-life application.

Bruno Lannoo [00:43:25]: Yeah, I was also wondering, if we normalized by the amount of tokens provided, whether you would almost be able to put the whole graph into the context. Because I imagine that the graph will be a lot more condensed, and if you compare it to the vector database, maybe ten reference texts might be as much as almost the full graph from the graph database. Normalizing might actually be a very interesting thing to do, to really see how the one compares to the other.

Matt Squire [00:43:55]: Thinking out loud here: what I might be interested to see, if anyone's done it or seen it, is making a large language model itself traverse the graph. In all of these approaches, we're taking a graph and ultimately condensing it into a context; whether it's a subgraph or whatever it is, the context can contain the same structural information as the graph contains. But there's no reason the large language model has to follow that structure. Ultimately, it's going to do what it's going to do according to its text generation model. Whereas imagine if you had a model with chain-of-thought reasoning, so you have a model that's been told: look at this node, this is your starting point, and you have a bunch of options. And one of your options is to go to the database and get the nearest neighbors.

Matt Squire [00:44:50]: And it's therefore able to traverse the graph through a sort of reasoning cycle, if you see what I mean. I don't know what you'd do with it, I don't know what applications you might have, but you've basically given the model the power to traverse the graph directly and forced it to work within the graph structure.
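
Matt's thought experiment might look something like the loop below. It is entirely hypothetical: `llm` and the NetworkX graph `G` are the assumed helpers from the earlier sketches, not anything from the paper.

```python
# Hypothetical agentic traversal: the LLM picks which hop to take next.
def agentic_traverse(question: str, start: str, max_hops: int = 3) -> str:
    node, visited = start, [start]
    for _ in range(max_hops):
        neighbors = list(G.neighbors(node))
        if not neighbors:
            break
        choice = llm(
            f"Question: {question}\nYou are at node '{node}'. "
            f"Neighbors: {neighbors}. Reply with the single most useful "
            "neighbor to visit next, or DONE."
        ).strip()
        if choice == "DONE" or choice not in neighbors:
            break
        node = choice
        visited.append(node)
    # Answer using the path the model chose to explore.
    return llm(f"Question: {question}\nPath explored: {visited}\nAnswer:")
```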

Valdimar Eggertsson [00:45:07]: I've heard of some of these agents that are able to traverse the web; that's similar. I haven't looked into it much.

Bruno Lannoo [00:45:16]: So to me it sounds like it's obviously a very good idea, and that would be much more successful. But I think it takes a lot more engineering to actually implement that idea than what was done here. So I think this is just a first approach, where you're doing it the straightforward, easy way, but what you're describing is most likely way more powerful.

Matt Squire [00:45:34]: If anyone builds it, let me know.

Valdimar Eggertsson [00:45:35]: Yeah.

Binoy Pirera [00:45:38]: Gaps in the paper.

Valdimar Eggertsson [00:45:39]: I think.

Binoy Pirera [00:45:40]: I think everybody saw "hybrid RAG" and everybody was like, yeah, let's do this one. And fair enough, but I feel like there were a lot of gaps in the paper. So if you guys have any suggestions for what paper we should do next time, please feel free to email us, and we're happy to go through it and see if there are any gaps, regardless of whether it's about hybrid RAG or anything that's trending. But yeah, it was a fun session. There were a lot of diverse opinions. Does anybody have any thoughts? You know, how's everybody feeling?

Valdimar Eggertsson [00:46:10]: I had one last thought, maybe. So we have the documents in the vector store and we have the graph, but, usually when I think of a graph, I think of Wikipedia, where each entity has a document as well, like the Wikipedia article. So we could have the knowledge articles on the graph somehow. I don't know, not a very concrete thought, but it's my last note to throw out there. And otherwise, about the hybrid combining: it's also a bit of a disappointment that we didn't see some crazy new method, but graph methods and knowledge graphs can definitely be used to improve RAG. And I think of it sometimes: we often have hierarchically organized documents, like a document tree; that's a kind of graph. And I just look forward to actually trying out graph RAG.

Valdimar Eggertsson [00:47:08]: I haven't done it yet.

Bruno Lannoo [00:47:09]: When you're talking about attaching documents to the nodes, I don't know if that's necessary, because I think you can model it fully with a graph too. But I feel like if you go to Wikipedia or Wikidata, there is also a distinction between relations between entities and attributes of the entities. And you can model this as, oh, this person has an age, and it has a relationship "age" to a number. In theory you can get away with just a pure graph, but I don't see them talking about these attributes, which are normally modeled more as something separate. Which is a little bit like what you're saying: it's like having this text next to each entity. And I wonder if that would also improve it quite a bit, explicitly modeling attributes as something more separate than just relationships, for sure.

Binoy Pirera [00:47:59]: Awesome. I guess we can wrap it up then, right? If anybody else has any thoughts, please feel free to just unmute and let us know. Or we always keep the conversation rolling in our Slack; I put a link to join our Slack workspace in the chat. So if you want to join us and suggest what papers we should cover next time, or if you want to host one of these sessions like Matt, Nehil, Sonam, and Valdimar, just say hi to me. And thank you so much for joining. It's always fun to see people from all over the world, from different time zones, joining us and giving us all these diverse opinions. So stay tuned. We will discuss a much cooler, better paper next time. Lovely to see you all here.

Binoy Pirera [00:48:39]: Thank you so much.

