Image created using DALL·E 2
1. 🌟 What is DREAM?
2. 🚶 Code Walkthrough
3. 📍 Conclusion
Given the myriad of options for LLMs, embedding models, retrieval methods, re-ranking methods and so on, it can be challenging to determine which combination will work best for your use case. Who has the time to explore each combination one by one?
DREAM Architecture
The Distributed RAG Experimentation Framework (DREAM) is a blueprint, comprising a Kubernetes-native architecture and sample code, that demonstrates how Retrieval Augmented Generation (RAG) experiments, evaluation and tracking can be conducted in a distributed manner using Ray, LlamaIndex, Ragas, MLflow and MinIO on Kubernetes.
By setting up the necessary K8s tooling and running the experimentation, evaluation and tracking in a distributed manner, we ultimately want to be able to compare and contrast the different combinations of RAG parameters and pick the one that works best for our use case.
Parallel coordinates plot illustrating the different combinations attained from distributed RAG experimentation and their performance on various evaluation metrics
As shown in the architecture diagram above, DREAM uses the following technologies:
For installing all these components, you can follow the steps outlined in the installation guide. You might notice that DREAM is part of a larger project I'm calling GOKU (GenAIOps on Kubernetes), which is coming soon!
Here you go: DREAM GitHub :)
The steps in this notebook are quite straightforward:
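The first of those steps pushes the source PDFs into a MinIO bucket. Here's a minimal sketch of what that can look like with the `minio` Python client — the endpoint, credentials and bucket name are placeholders, not the values DREAM actually uses:

```python
from pathlib import Path

def pdf_objects(data_dir):
    """Map local PDF paths to the object names they will get in the bucket."""
    return [(p.name, str(p)) for p in sorted(Path(data_dir).glob("*.pdf"))]

if __name__ == "__main__":
    # Hypothetical endpoint/credentials -- swap in your own MinIO setup.
    from minio import Minio
    client = Minio("minio.minio.svc:9000",
                   access_key="minioadmin", secret_key="minioadmin", secure=False)
    bucket = "rag-source-pdfs"  # placeholder bucket name
    if not client.bucket_exists(bucket):
        client.make_bucket(bucket)
    for object_name, path in pdf_objects("./pdfs"):
        client.fput_object(bucket, object_name, path)
```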
Screengrab of MinIO after pushing the PDFs
This is where the fun begins!
Workflow for distributed Golden Dataset generation
This is about to get a little complicated, so here's the overall workflow visualised:
Workflow for distributed experimentation, evaluation and tracking
Before we get to the juicy bits, let me describe the search space and evaluation metrics. In the sample code, our search space spans 3x RAG methods, 2x LLMs and 2x embedding models. We use 3 RAG methods native to LlamaIndex: chunks with overlap, sentence window retrieval and hierarchical auto-merging retrieval. We use OpenAI's gpt-3.5-turbo and gpt-4 as our LLMs, with text-embedding-3-small and text-embedding-3-large as our embedding models. For evaluation, we use the ragas framework's faithfulness, answer_relevancy, context_precision, context_recall, answer_correctness and answer_similarity metrics. To understand the RAG methods and ragas metrics in depth, you can check out my previous article on Advanced RAG:
https://www.linkedin.com/pulse/performing-evaluating-tracking-advanced-rag-ft-azureml-prabhat-i1rkc/
Cue Ray Tune!
Code for invoking Ray Tune for searching over the param search space
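Conceptually, the call is a grid search over the 3x2x2 space described above, i.e. 12 combinations in total. A hedged sketch of the idea — in the real code the value lists would be wrapped in `tune.grid_search`, and the stub `experiment` below stands in for the actual trainable:

```python
from itertools import product

# The 3x2x2 search space described in the article (12 combinations).
search_space = {
    "rag_method": ["chunks_with_overlap", "sentence_window", "auto_merging"],
    "llm": ["gpt-3.5-turbo", "gpt-4"],
    "embed_model": ["text-embedding-3-small", "text-embedding-3-large"],
}

def all_combinations(space):
    """Enumerate every parameter combination a grid search would visit."""
    keys = list(space)
    return [dict(zip(keys, vals)) for vals in product(*(space[k] for k in keys))]

if __name__ == "__main__":
    from ray import tune

    def experiment(config):
        # Stand-in trainable: build the RAG pipeline from config,
        # evaluate it, and return the ragas metrics to Tune.
        return {"faithfulness": 0.0}

    param_space = {k: tune.grid_search(v) for k, v in search_space.items()}
    results = tune.Tuner(experiment, param_space=param_space).fit()
```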
Code for experiment() function
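Structurally, each trial does three things: build a query engine for the sampled config, answer every golden-dataset question with it, and hand the results to the evaluator. A stubbed sketch of that shape — the function names mirror the article's, but the signature and wiring here are illustrative, not the exact sample code:

```python
def experiment(config, golden_dataset, query_engine_picker, evaluator):
    """One trial: build the pipeline for this config, answer the golden
    questions, evaluate the answers, and return the metrics dict."""
    engine = query_engine_picker(config)
    answers = [engine(row["question"]) for row in golden_dataset]
    return evaluator(golden_dataset, answers)
```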
Code for query_engine_picker()
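The picker is essentially a dispatch on the sampled RAG method name. A sketch of that dispatch — the builder functions below are placeholders for the LlamaIndex index/query-engine construction the real code performs:

```python
def _placeholder_builder(name):
    def build(documents, llm, embed_model):
        # Placeholder: the real builder constructs a LlamaIndex query engine
        # (e.g. a VectorStoreIndex with the method's node parser/retriever).
        return {"method": name, "llm": llm, "embed_model": embed_model}
    return build

def query_engine_picker(config, documents):
    """Pick the query-engine builder matching config['rag_method']."""
    builders = {
        "chunks_with_overlap": _placeholder_builder("chunks_with_overlap"),
        "sentence_window": _placeholder_builder("sentence_window"),
        "auto_merging": _placeholder_builder("auto_merging"),
    }
    return builders[config["rag_method"]](documents, config["llm"], config["embed_model"])
```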
Code for model initialising helper functions
Code for evaluator() function
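The evaluator hands each (question, answer, contexts, ground-truth) row to Ragas and averages the per-question scores into one number per metric. A sketch, assuming the `ragas` and `datasets` packages plus OpenAI credentials are available; the column names follow Ragas conventions, which have shifted between versions:

```python
def mean_scores(per_row_scores):
    """Average per-question ragas scores into one value per metric."""
    metrics = per_row_scores[0].keys()
    return {m: sum(r[m] for r in per_row_scores) / len(per_row_scores)
            for m in metrics}

if __name__ == "__main__":
    from datasets import Dataset
    from ragas import evaluate
    from ragas.metrics import (faithfulness, answer_relevancy, context_precision,
                               context_recall, answer_correctness, answer_similarity)

    # One illustrative row; the real dataset is the golden dataset + answers.
    ds = Dataset.from_dict({
        "question": ["..."], "answer": ["..."],
        "contexts": [["..."]], "ground_truth": ["..."],
    })
    result = evaluate(ds, metrics=[faithfulness, answer_relevancy, context_precision,
                                   context_recall, answer_correctness, answer_similarity])
```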
Finally, we leverage the amazing experiment tracking capability of MLflow to record experiment results, establish lineage with the golden dataset and visualise experiment results. Here's a flurry of screenshots that speak for themselves!
Code for logging experiment results to MLFlow
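Logging a trial boils down to `log_params` for the RAG config, `log_metrics` for the ragas scores, and an artifact for lineage with the golden dataset. A sketch of that pattern — the tracking URI, experiment name, and the example config/score values are placeholders:

```python
def flatten_for_mlflow(config, scores):
    """Split one trial into MLflow params (strings) and metrics (floats)."""
    params = {k: str(v) for k, v in config.items()}
    metrics = {k: float(v) for k, v in scores.items()}
    return params, metrics

if __name__ == "__main__":
    import mlflow
    mlflow.set_tracking_uri("http://mlflow.mlflow.svc:5000")  # placeholder URI
    mlflow.set_experiment("dream-rag-experiments")            # placeholder name

    config = {"rag_method": "sentence_window", "llm": "gpt-4",
              "embed_model": "text-embedding-3-large"}
    scores = {"faithfulness": 0.9, "answer_relevancy": 0.8}   # placeholder values
    params, metrics = flatten_for_mlflow(config, scores)

    with mlflow.start_run():
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
        mlflow.log_artifact("golden_dataset.csv")  # lineage with the golden dataset
```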
Tabular view of experiments
Parallel coordinates plot illustrating the different combinations attained from distributed RAG experimentation and their performance on various evaluation metrics
Another tabular view of the experiments
In this article, we took a look at DREAM, a blueprint for tooling and code that demonstrates how distributed RAG experimentation, evaluation and tracking can be done using open-source technologies including Ray, LlamaIndex, Ragas, MLflow & MinIO on Kubernetes.
Overall DREAM architecture and workflow
This is only a rough first draft of how far the distributed nature of the experimentation exercise can be optimised and exploited. For instance, it might make sense to use Ray Data for reading and writing the CSV files. We could take things a step further and use distributed calls to the embedding model to create the VectorStoreIndex! I hope you use this as a building block and go nuts with optimization in your own projects :)
Another interesting idea to consider is how to turn this into a reusable no-code/low-code workflow. Notice how the steps running in the Jupyter notebooks can be neatly organised into a linear DAG. If we fix the parameters of the RAG search space, we could package up the steps in an Argo Workflow and trigger the distributed experimentation, evaluation and tracking as a low-code/no-code pipeline, on any arbitrary unstructured data in S3!
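As a rough illustration of that idea, the notebook steps could become tasks in an Argo `Workflow` DAG. All names below are hypothetical, and the container templates themselves are omitted:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: dream-rag-
spec:
  entrypoint: dream
  templates:
    - name: dream
      dag:
        tasks:
          - name: ingest-pdfs            # pull unstructured data from S3/MinIO
            template: ingest
          - name: golden-dataset         # distributed QA generation with Ray
            template: generate
            dependencies: [ingest-pdfs]
          - name: experiment-and-track   # Ray Tune sweep + ragas + MLflow logging
            template: experiment
            dependencies: [golden-dataset]
```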
Originally posted on: