Evaluating and Debugging Generative AI: A Deep Dive Into the Second Lesson
August 11, 2023
Generative AI (GenAI) is having a moment. In just the past few months, diffusion and large language models have revolutionized the field of machine learning. From creating realistic images to generating human-like text, not a month goes by where there isn’t a new, powerful model. We’ve come a long way from the avocado chair of the first DALL-E.
However, training and evaluating these models can be quite complex. That’s why DeepLearning.AI, in collaboration with Weights & Biases, has launched a new course, **Evaluating and Debugging Generative AI**. In this blog post, we’ll give you a sneak peek into the second lesson of the course, taught by Carey Phelps, founding Product Manager at Weights & Biases.
Specifically, we’ll learn about diffusion models, how they’re trained, and how to evaluate them using best-in-class tools.
The Magic of Diffusion Models
Diffusion models are denoising models: the primary task of the model is not to generate images directly, but to remove noise from images. The model is trained by adding noise to images and having it predict the noise that was added.
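To make the training objective above concrete, here is a minimal NumPy sketch of one loss computation. It is not the course's code: the linear noise schedule, the 500 timesteps, the `beta_start`/`beta_end` values, and the zero-predicting placeholder model are all assumptions chosen for illustration.

```python
import numpy as np

def make_alpha_bar(timesteps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear noise schedule."""
    betas = np.linspace(beta_start, beta_end, timesteps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """Noise a clean batch x0 to timesteps t; return the noised batch and the noise used."""
    noise = rng.standard_normal(x0.shape)
    ab = alpha_bar[t].reshape(-1, 1, 1, 1)  # broadcast over (channels, height, width)
    x_t = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * noise
    return x_t, noise

def noise_prediction_loss(predict_noise, x0, alpha_bar, rng):
    """Mean squared error between the true noise and the model's prediction of it."""
    t = rng.integers(0, len(alpha_bar), size=x0.shape[0])  # one random timestep per image
    x_t, noise = add_noise(x0, t, alpha_bar, rng)
    return float(np.mean((predict_noise(x_t, t) - noise) ** 2))

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar(timesteps=500)
images = rng.uniform(-1.0, 1.0, size=(8, 3, 16, 16))  # stand-in for a batch of sprites

# Placeholder "model" that always predicts zero noise; a real one would be a neural network.
zero_model = lambda x_t, t: np.zeros_like(x_t)
loss = noise_prediction_loss(zero_model, images, alpha_bar, rng)
```

In a real training loop, `predict_noise` would be the network being optimized, and this loss would be minimized with gradient descent.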
If you are interested in diffusion models, we recommend checking out our Guide to Using Stable Diffusion XL with HuggingFace Diffusers and W&B.
During the training phase, we’ll monitor relevant metrics like the loss curve, but there’s an interesting twist. The loss curve tends to flatten out quite early, which could mislead you into thinking that your model is fully trained. However, by sampling from the model regularly during training, we observe that the quality of generated images continues to improve despite the plateau in loss. This means it’s crucial to log not only the loss but also samples at regular intervals.
Hands-on with Weights & Biases
One of the highlights of the course is the integration with Weights & Biases, a powerful platform for experiment tracking, dataset versioning, and model management. We train our diffusion model on the sprites dataset (Fruits&Veg + Game Icons) using the training notebook from the DeepLearning.AI course and log the results to Weights & Biases.
Let’s take a look at some of the steps involved. Here is the code snippet you will be using in the lesson after setting everything up.
That code initiates a Weights & Biases run for our training process and logs the loss, learning rate, and the current epoch at each iteration. We also log the image samples generated at the end of each epoch, allowing us to track the model’s progression visually.
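The logging pattern described above might look roughly like the following sketch. It is not the snippet from the lesson: the `train` loop, the `fake_train_step` and `fake_sample_images` stand-ins, and the project name `sprite-diffusion-demo` are hypothetical. The loop takes a `log` callback so it can run without a W&B account; `run_with_wandb` shows how the same loop could be wired to `wandb.init`, `run.log`, and `wandb.Image`.

```python
import math
import numpy as np

def train(num_epochs, steps_per_epoch, train_step, sample_images, log):
    """Run a training loop, logging scalars every step and samples every epoch."""
    step = 0
    for epoch in range(num_epochs):
        for _ in range(steps_per_epoch):
            loss, lr = train_step(step)
            log({"loss": loss, "learning_rate": lr, "epoch": epoch})
            step += 1
        # The loss alone can plateau, so also record what the model generates.
        log({"samples": sample_images(epoch), "epoch": epoch})
    return step

def fake_train_step(step):
    """Hypothetical stand-in for one optimizer step: returns (loss, learning rate)."""
    return math.exp(-0.1 * step) + 0.05, 1e-3

def fake_sample_images(epoch, n=4):
    """Hypothetical stand-in for sampling: returns n random 16x16 RGB images."""
    rng = np.random.default_rng(epoch)
    return [rng.integers(0, 256, size=(16, 16, 3), dtype=np.uint8) for _ in range(n)]

def run_with_wandb():
    """Same loop, logged to Weights & Biases (requires `pip install wandb` and a login)."""
    import wandb
    run = wandb.init(project="sprite-diffusion-demo")  # hypothetical project name

    def log(payload):
        # Wrap raw arrays so they render as images in the workspace.
        if "samples" in payload:
            payload = {**payload, "samples": [wandb.Image(img) for img in payload["samples"]]}
        run.log(payload)

    train(2, 5, fake_train_step, fake_sample_images, log)
    run.finish()

# Dry run that collects the log payloads in a list instead of sending them anywhere.
history = []
total_steps = train(2, 5, fake_train_step, fake_sample_images, log=history.append)
```

Calling `run_with_wandb()` instead of the dry run would send the same payloads to a W&B workspace.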
Sign up for the free course and run the code alongside the lessons in the new DeepLearning.AI platform!
Diving Deeper into Training a Diffusion Model
When we train our diffusion model, we set up a training loop. At the start, we initialize a Weights & Biases run to keep track of the training. As the training proceeds, we log the loss, learning rate, and current epoch. Here, the model checkpoints are saved every four epochs, providing a snapshot of the model’s state at that point in time.
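As a rough illustration of periodic checkpointing, the sketch below saves a snapshot every four epochs. The exact rule (0-indexed epochs, plus a save on the final epoch), the JSON file standing in for model weights, and the artifact name `sprite-diffusion-model` are assumptions, not details taken from the course notebook. `log_checkpoint_to_wandb` is an optional helper showing how such a file could be versioned with `wandb.Artifact`.

```python
import json
import tempfile
from pathlib import Path

def should_checkpoint(epoch, num_epochs, every=4):
    """True every `every` epochs (0-indexed) and on the final epoch (assumed rule)."""
    return epoch % every == 0 or epoch == num_epochs - 1

def save_checkpoint(state, directory, epoch):
    """Write a JSON stand-in for model weights; a real loop would save the model's state."""
    path = Path(directory) / f"checkpoint_epoch_{epoch}.json"
    path.write_text(json.dumps(state))
    return path

def log_checkpoint_to_wandb(run, path, epoch):
    """Optionally version the file as a W&B artifact; `run` comes from wandb.init()."""
    import wandb
    artifact = wandb.Artifact(name="sprite-diffusion-model", type="model",
                              metadata={"epoch": epoch})  # hypothetical artifact name
    artifact.add_file(str(path))
    run.log_artifact(artifact)

checkpoint_dir = tempfile.mkdtemp()
num_epochs = 10
saved = []
for epoch in range(num_epochs):
    state = {"epoch": epoch, "weights": [0.1 * epoch]}  # placeholder for real weights
    if should_checkpoint(epoch, num_epochs):
        saved.append(save_checkpoint(state, checkpoint_dir, epoch))
```

With ten epochs, this rule keeps snapshots for epochs 0, 4, 8, and 9.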
Still, the most rewarding part is observing the model’s progress visually. The initially noisy and grainy images generated by the model become clearer and more recognizable as training progresses. This is logged to your Weights & Biases workspace, where you can view the image generations in real time. You can visit the result of the code from this lesson’s training here!
Finally, once our model starts generating high-quality images, we make it available for the rest of the team using the Model Registry feature in Weights & Biases. This allows team members to view all the best model versions, the lineage of the model, and get back to the training run that produced the model.
Key Takeaways
The “Evaluating and Debugging Generative AI” course offers a blend of theory and hands-on experience. Lesson two focuses on training. Sign up for the course today and practice:
- Managing hyperparameter config
- Collecting artifacts for dataset and model versioning
- Logging experiment results
- Tracing prompts and responses to LLMs over time in complex interactions