MLOps Community

How LlamaIndex Can Bring the Power of LLMs to Your Data

Posted Apr 27, 2023 | Views 2.8K
# LLM
# LLM in Production
# LlamaIndex
# Rungalileo.io
# Snorkel.ai
# Wandb.ai
# Tecton.ai
# Petuum.com
# mckinsey.com/quantumblack
# Wallaroo.ai
# Union.ai
# Redis.com
# Alphasignal.ai
# Bigbraindaily.com
# Turningpost.com
speaker
Jerry Liu
CEO @ LlamaIndex

Jerry is the co-founder and CEO of LlamaIndex, the data framework for building LLM applications. Before this, he spent his career at the intersection of ML, research, and startups. He led the ML monitoring team at Robust Intelligence, did self-driving AI research at Uber ATG, and worked on recommendation systems at Quora.

SUMMARY

Large Language Models (LLMs) are starting to revolutionize how users can search for, interact with, and generate new content. There is one challenge though: how do users easily apply LLMs to their own data? LLMs are pre-trained with enormous amounts of publicly available natural language data, but they don't inherently know about your personal or organizational data.

LlamaIndex solves this by providing a central data interface for your LLMs. This talk covers the tools that LlamaIndex offers (both simple and advanced) to ingest and index your data for LLM use.

TRANSCRIPT


Thanks so much for the opportunity to present here. I'm super excited to give this talk. The goal of this talk is to show how LlamaIndex can connect your language models with your external data. My name is Jerry, I'm one of the creators and co-founders of LlamaIndex, and I'm super excited to be here.

So the basic context here is that LLMs are a phenomenal piece of technology for knowledge generation and reasoning. They're pretrained on large amounts of publicly available data. You've all seen the amazing capabilities of LLMs just by playing around with stuff like ChatGPT: they can answer questions, they can generate new pieces of content, they can summarize stuff for you. You can even use them as planning agents, so basically you can have them perform actions, get a response, and perform more actions over time.

Pretty much every application developer that's using LLMs thinks to themselves: how do we best augment language models with our own private sources of data?
And so whether you're an individual or an enterprise, you're going to have a bunch of raw files lying around, for instance PDFs, PowerPoints, Excel sheets, pictures, audio. You might use a few workplace apps like Notion, Slack, Salesforce. If you're an enterprise, you might have heterogeneous collections of data from data lakes, structured data, vector DBs, object stores, all these different things. And so the key question is: how do we best augment language models with all this data?

There's a few paradigms for injecting knowledge into the weights of the network these days. Probably the most classic machine learning example is through some sort of fine-tuning, training, or distillation process. The idea is that you take this data and run some sort of optimization algorithm on top of it that actually changes the weights of the network itself to try to learn the new content that you're feeding it.
And so if you look at a pretrained model like ChatGPT or GPT-4, they've already internalized a huge amount of knowledge. If you ask about anything on Wikipedia, they'll be able to understand it and give you information about any Wikipedia article on there.

However, I think another paradigm that has emerged these days is in-context learning. For a lot of users, fine-tuning tends to be a bit inaccessible for both performance and cost reasons. And so these days, a lot of application developers are using this paradigm where they combine a pretrained language model with some sort of retrieval model to retrieve context, and they manage the interactions between the language model itself and this retrieval model.

So the way it works is, imagine you have some knowledge corpus of data. Imagine you have a Notion database of various text files.
Say this one is the biography of an author; it's about Paul Graham. The idea is that you have this knowledge corpus and you have an input prompt, which would look something like the following. It would basically say "here's some context", and the retrieval model is responsible for retrieving the right context from the knowledge corpus and putting it into the prompt. Then it says "given the context, answer the following question", and you put the question in here, and you send this entire prompt over to the language model in order to get back a response.

So some of the key challenges in in-context learning are: How do you actually retrieve the right context for the problem? We'll talk about some common paradigms as well as less common paradigms that might solve more advanced use cases. How do you deal with long amounts of context? How do you deal with source data that's potentially very large and also very heterogeneous? You might have unstructured data, semi-structured data, structured data.
And how do you actually trade off between all these different factors, like performance, latency, and cost?

This is basically what LlamaIndex's entire mission is focused on. Imagine you're building some sort of knowledge-intensive language model application, whether it's a sales tool, marketing tool, recruiting tool, et cetera. Your input is going to be some rich query description. It could be just a simple task, like a simple question that you want an answer to, or it could be a complex task that you feed in. The idea is that it's basically something that you would normally feed into ChatGPT, but here you're feeding it into us as an overall system.

LlamaIndex itself is a central data interface for language model application development. We sit on top of your existing data sources or databases, and we manage the interaction between your data and your language model.
And the response is basically a synthesized response with references, actions, sources, et cetera.

So, going a little bit into some of the components of LlamaIndex. I think the goal of this talk is to talk about some of the additional use cases that it really solves, so we'll get into that in just a few slides. Our goal is to make this interface fast, cheap, efficient, and performant. We have three components. First, we have data connectors, which you can find on LlamaHub. They're basically just a set of data loaders from all these different data sources into a document format that you can use with LlamaIndex or even LangChain.

The next step is data indexes. Once you ingest this data, how do you actually structure it to solve all the different use cases of knowledge-augmented generation?
And then the last component is the query engine, which basically completes this black box. Once you have these data structures under the hood, you have a query engine that takes in the query and is able to route it to these data structures to give you back the response that you want.

So the goal of this part is really just to give you a few concrete use cases that LlamaIndex solves. Probably the simplest one, which pretty much everybody is doing these days, whether using LlamaIndex or another kind of vector DB stack, is semantic search. Imagine you load in some text documents, right? So first line of code: just load in some text documents. Second line of code: you build this simple vector index from these documents. Typically what that looks like is you chunk up the text, embed it, put it in a vector database, and store it for later.

Then during query time, you ask some question like "What did this author do growing up?"
Imagine the text data is about the author. You would just do embedding-similarity-based retrieval from this vector database or document store, and then you would take the top-k relevant chunks, inject them into the input prompt, and get back a response.

For simple fact-based questions like "What did the author do growing up?", where there are semantics in the query that map well to semantics in your knowledge corpus, this works pretty well. You can see that, using this paradigm, the answer is that the author grew up writing short stories and programming on an IBM 1401. This is retrieving the relevant chunks from your knowledge corpus in order to generate this final response.

There's also summarization. Summarization is actually a little bit different, because instead of doing top-k retrieval, you want to go through all the context in a document in order to synthesize a final response or summary of the document.
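
The semantic search flow described in the talk (embed the chunks at index time, then at query time retrieve the top-k chunks by embedding similarity and inject them into the prompt) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not LlamaIndex's implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, the sample chunks are made up, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks):
    """Index time: embed every chunk once and store it next to its text."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, question, k=2):
    """Query time: rank chunks by similarity to the question, keep the top k."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(context_chunks, question):
    """Assemble the 'context, then question' prompt sent to the language model."""
    context = "\n".join(context_chunks)
    return (
        "Here is some context:\n"
        f"{context}\n\n"
        "Given the context, answer the following question:\n"
        f"{question}"
    )


# Made-up chunks, for illustration only.
chunks = [
    "The author wrote short stories as a kid.",
    "The accelerator funded its first batch of startups.",
    "The essay ends with a list of thanks.",
]
index = build_index(chunks)
question = "What did the author write as a kid?"
prompt = build_prompt(retrieve(index, question, k=1), question)
print(prompt)
```

In a real stack the prompt would then be sent to a language model; that call is left out here so the sketch runs without any external service.
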
So for instance, we load in some documents through the first line. And here is another data structure, called the list index. Instead of storing each node with an embedding associated with it, a list index just creates a kind of flat data structure of all the nodes within a document, so that when you ask any sort of query or input prompt over it, you explicitly go through every node within your list in order to use that as context to generate the final response.

So for instance, for a query like "Could you give me a summary of this article in newline-separated bullet points?", this is the response that you would get. This is an example of the answer.

There's also connecting to structured data, where you can run text-to-SQL queries on top of your structured data. That's a different paradigm from unstructured data, because you inherently want to convert the question into a query language that can run over a structured database.
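
The text-to-SQL idea mentioned above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: `fake_llm` is a hypothetical stand-in that returns a canned query where a real language model call would go, the `city` table and its rows are made up, and none of the function names come from LlamaIndex.

```python
import sqlite3


def schema_description(conn):
    """Collect the CREATE TABLE statements so the model can see the schema."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return "\n".join(row[0] for row in rows)


def build_text_to_sql_prompt(schema, question):
    """Prompt asking the model to translate a question into SQL."""
    return (
        f"Given this schema:\n{schema}\n\n"
        f"Write a SQL query that answers: {question}\nSQL:"
    )


def fake_llm(prompt):
    """Hypothetical stand-in for a language model; returns a canned query."""
    return "SELECT name FROM city ORDER BY population DESC LIMIT 1"


def answer(conn, question, llm=fake_llm):
    """Translate the question to SQL, then run it on the structured database."""
    prompt = build_text_to_sql_prompt(schema_description(conn), question)
    sql = llm(prompt)
    return sql, conn.execute(sql).fetchall()


# Made-up table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO city VALUES (?, ?)",
    [("Alpha", 100), ("Beta", 300), ("Gamma", 200)],
)
sql, rows = answer(conn, "Which city has the largest population?")
print(sql, rows)
```

Unlike the retrieval case, the model's output here is executed rather than read, so a real system would want to validate or sandbox the generated SQL before running it.
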
Some of our advanced constructs include stuff like being able to synthesize across heterogeneous data sources. If you have Notion documents as well as Slack documents, how can you best synthesize information, not just by treating everything as one combined vector store, but by explicitly going through and combining information across your Notion and Slack documents? Here is an example diagram that shows this use case.

Probably the last thing I'll talk about is multi-step queries. If you have a complex question, how do you break it down into multiple smaller ones over an existing data source in order to best get back the results that you want? For instance, let's say you have some existing index or knowledge corpus about the author, and you ask a complex question like "Who was in the first batch of the accelerator program?"
You can break this down into smaller questions, use those to get back retrieval-based answers, and then combine everything together to synthesize a final answer.

Long story short, there's a lot of stuff. There's more in these slides, and a lot of this can be found in the docs. The idea is that LlamaIndex solves both the simple use cases of semantic search and more interesting interactions between the retrieval model and your language model. Awesome. Thanks for your time.
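
As a companion to the multi-step query idea from the talk (break a complex question into smaller ones, answer each through retrieval, then combine the answers), here is a minimal control-flow sketch. Everything in it is an assumption made for illustration: the three callables stand in for language model and retrieval calls, the scripted sub-questions and placeholder answers are made up, and the function names are hypothetical rather than LlamaIndex's API.

```python
def multi_step_query(question, next_subquestion, answer_subquestion,
                     synthesize, max_steps=3):
    """Ask sub-questions one at a time, then combine their answers."""
    steps = []
    for _ in range(max_steps):
        # In a real system a language model would propose the next
        # sub-question, given the original question and the answers so far.
        sub_q = next_subquestion(question, steps)
        if sub_q is None:
            break
        # Each sub-question would normally go through retrieval plus the model.
        steps.append((sub_q, answer_subquestion(sub_q)))
    return synthesize(question, steps), steps


# Scripted stand-ins so the sketch runs without a model (illustration only).
PLAN = [
    "Which accelerator program did the author start?",
    "Who was in the first batch of that program?",
]
TOY_ANSWERS = {
    PLAN[0]: "<program name found in the corpus>",
    PLAN[1]: "<founders named in the corpus>",
}


def scripted_subquestion(question, steps):
    """Return the next planned sub-question, or None when the plan is done."""
    return PLAN[len(steps)] if len(steps) < len(PLAN) else None


def lookup_answer(sub_q):
    """Stand-in for a retrieval-based answer to one sub-question."""
    return TOY_ANSWERS.get(sub_q, "unknown")


def join_answers(question, steps):
    """Stand-in for the final synthesis step."""
    return " ".join(ans for _, ans in steps)


final, steps = multi_step_query(
    "Who was in the first batch of the accelerator program?",
    scripted_subquestion, lookup_answer, join_answers,
)
print(final)
```

The `max_steps` cap is a design choice in this sketch: it bounds cost and latency if the component proposing sub-questions never signals that it is done.
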
