Sharing the Wheel: Guiding LLMs While Staying in the Driver's Seat

Posted Apr 27, 2023 | Views 770

# LLM

# LLM in Production

# Adept AI

# Rungalileo.io

# Snorkel.ai

# Wandb.ai

# Tecton.ai

# Petuum.com

# mckinsey.com/quantumblack

# Wallaroo.ai

# Union.ai

# Redis.com

# Alphasignal.ai

# Bigbraindaily.com

# Turningpost.com

speaker

Jacob van Gogh

Member of Technical Staff @ Adept AI

Jacob is an ML Engineer at Adept AI, where he's focused on creating LLMs that assist you with your favorite software tools. Prior to Adept, Jacob developed ML systems spanning a variety of problem spaces including Computer Vision for document understanding, Reinforcement Learning for rideshare dispatching, and Continual Learning for ETA prediction models.

SUMMARY

Adept AI is developing a natural language software collaborator that utilizes LLMs to perform software tasks described in natural language. However, LLMs can suffer from overconfidence, hallucinations, and a lack of self-awareness, which can lead to incorrect actions. Jacob highlights an example of how the model can make a wrong action and emphasizes the importance of implementing safety checks such as action reversibility and content filters. By incorporating safety checks, Adept AI aims to improve the model's capabilities and ensure that it moves in the right direction.

TRANSCRIPT

That works now. Cool. Can we try playing this video? So what Adept is building is a natural language software collaborator. If we could try full-screening this video, maybe.

So what we can see here is that someone has entered a natural language request to our model, and our model is now controlling the mouse and keyboard to execute this task on Airbnb. The person here was trying to find a weekend getaway in San Luis Obispo, California.

We're training our model to use a variety of software tools, so here we're also doing something on Salesforce. And really the goal is for everyone to be able to have Adept carry out software tasks for them by describing things the same way you might describe them to an assistant. So we want you to feel like you have this universal software tool expert as your personal assistant or collaborator, and it's as if you were looking over their shoulder as Adept performs these tasks for you. And we can go ahead and go on to the next slide.
So in terms of how we're achieving this, it's not anything revolutionary, and we can step through this a little bit. We start by training this foundation model, so something that has this general understanding of language, or increasingly more inputs than that, and then fine-tuning the model to learn specifically how to do software tool actions.

And ultimately, this means that our product is powered by large language models. And so it faces a lot of the same challenges that other products are facing, such as not knowing when to abstain, our model being a little bit overconfident, or having hallucinations in terms of inventing actions that maybe weren't appropriate given the context of the software tool, and all of the other problems that people have brought up today. The difference, though, is maybe that the stakes are a little higher, because the model is taking actions for you. So we can go ahead and go to the next slide.
So imagine a scenario where you were on Amazon doing a little bit of shopping. Maybe you wanted to buy a couple of books, and imagine you were on this page and you requested that Adept "add to cart". Now, you'll have to take my word for this, but this seems like a pretty straightforward thing for our model to get right. There's this clear button that says "Add to Cart", and we can probably see how our model would map this request to that particular action really well.

But if we were to go to the next slide, imagine that you used our model to navigate to this other book, and you gave it the same request, "add to cart", maybe before realizing that "Add to Cart" is not a part of this page. And now this is problematic, right? Because our model, in an ideal world, would maybe ask you a clarifying question, because "add to cart" doesn't seem to be relevant to this page.
But at the same time, as we all know, machine learning models can get things wrong, and there are now a number of wrong actions that it could take here. But one in particular is especially painful, because where there used to be an "Add to Cart" button, there is now a "Buy now with one click" button. And semantically, those are even sort of similar, right? You could imagine that "add to cart" and "buy now with one click" maybe have similar sentence embeddings, or however you want to think about it.

The difference, though, is that in addition to "buy now with one click" not being the intended action of this request, it's a much more serious action. It has much more serious consequences: it's purchasing the book, and that is not what the user intended.

And so how do we try to prevent these failures from being so painful to our users, again, in this world where large language models have this huge failure space due to the nature of their incredibly versatile output space? We can go ahead and go to the next slide. So again, we're not doing anything revolutionary here.
But the issue lies with the fact that we're sending our user requests to the model and, without any sort of safety checks, just taking the actions from the LLMs as given. So one of the things that we're exploring, or that we think is a good idea, if we could go to the next slide, is surrounding our large language model with this guiding system. So this is a number of checks that we can perform, such as checking action reversibility, or maybe a content filter, to add a little bit more color to the output of our large language model.

And the key insight, I think, is that these are not necessarily large language model or ML-based checks. You know, we could do simple word filters to start with. And by doing simple things that are not dependent on our large language model, we gain the ability to iterate really quickly, because we don't have to incorporate that into our model to begin with.
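As a rough illustration of the kind of non-ML check described here, the sketch below wraps a proposed action in a simple word filter. It is not Adept's implementation: the names `ProposedAction`, `Verdict`, and `check_action`, and both word lists, are hypothetical and chosen only for this example. The idea is that a click whose label looks irreversible is routed to the user for confirmation instead of being executed directly.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    EXECUTE = "execute"  # run the action without interrupting the user
    CONFIRM = "confirm"  # pause and ask the user before acting
    BLOCK = "block"      # refuse the action outright


@dataclass(frozen=True)
class ProposedAction:
    kind: str         # hypothetical action type, e.g. "click" or "key_down"
    target_text: str  # visible label of the UI element the model chose


# Hypothetical word lists (assumptions); a real system would tune these per tool.
IRREVERSIBLE_WORDS = {"buy", "purchase", "pay", "delete", "send", "submit"}
BLOCKED_WORDS = {"password"}


def check_action(action: ProposedAction) -> Verdict:
    """Classify a model-proposed action using plain word matching, no ML."""
    # Split the label into lowercase alphanumeric words ("1-Click" -> "1", "click").
    words = set(re.findall(r"[a-z0-9]+", action.target_text.lower()))
    if words & BLOCKED_WORDS:
        return Verdict.BLOCK    # content filter
    if action.kind == "click" and words & IRREVERSIBLE_WORDS:
        return Verdict.CONFIRM  # reversibility check
    return Verdict.EXECUTE


print(check_action(ProposedAction("click", "Add to Cart")))
print(check_action(ProposedAction("click", "Buy now with 1-Click")))
```

Because the check lives outside the model, the word lists can be changed and redeployed without retraining anything, which is the fast-iteration property the talk points to.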
Simple checks like that give us the ability to iterate on what the product might feel like if we had these sorts of checks in place, and to make sure that we're moving in the right direction before we spend a lot of time up-leveling the ability of our model.

And so by doing these things, we could maybe say that if the model is trying to do an action that we deem irreversible, maybe we should also ask the user for confirmation before we do that. And ultimately, the goal is to use all of this to feed into our large language model and improve its capabilities.

So yeah, now is a good time to go to the next slide. Adept cares very deeply about our data engine.
Hopefully, data engines are also not a novel concept, but the idea is that we have some model that we can deploy into our product, and we use that to gain user feedback. And we use that user feedback to then inform our data collection strategy, such that we're always collecting the data that is most impactful in terms of improving our model and improving the experience that our users are having when using our product that is driven by this model.

Some things that we care about in particular, if we go to the next slide: we really value fast cycles at Adept. At Adept, we strive for 24-hour cycles for internal builds. What this means is that every day, we're building a new model, deploying it to the product that we use internally, gaining user feedback on that model that day, and potentially updating our data collection strategy that day. These short cycles ensure that we're getting fresh user feedback. Imagine interacting with a model that was maybe two weeks old: the user feedback is old.
And so what that means is that any insights that you're gaining might not be relevant, because maybe a model that you train today would not have those problems anymore.

It also enforces a high standard of internal tooling. We care about our workflow reliability and the visibility and instrumentation that we need for fast insights. That way we're able to inform our data collection strategy every day. And it requires this really tight integration between engineering and data collection. We're always chasing that best data point that we could get to improve our model in the most impactful way.

And so if we go to the next slide, there are three things that we've learned early on when building a product that's powered by LLMs and taking actions for you. They can fail in many ways, and you really want to focus on those failures which are most painful to your users: again, these actions that are maybe irreversible or have high consequences if you get them wrong. I don't think we should shy away from non-ML checks on LLM outputs.
We don't need to throw out all of the types of technologies and strategies that we've used in the past for controlling these automated decision-making systems that we've been building. And keeping a tight integration between data collection and product and engineering efforts, I think, really makes the data collection efforts that you have more optimal and more efficient. And so we really value that as well.

So those are the takeaways. Thank you for listening. Here at Adept, we think that we're working on really difficult problems, and so if these sorts of problems excite you in any way, feel free to give our careers page a look. We'd love to chat.

Awesome. Thank you so much. And a couple of people said, "I didn't see the notebook. I can't see the notebook." Was that on one of the slides?
I think that might have been the previous talk.

Oh, maybe it was. OK. And then someone asked, how would you integrate with something like Mac Automator?

Mac Automator? Well, I don't know if I could speak to that explicitly. I'm not super familiar with that tech, but if I could imagine what it might look like: we spend a lot of time trying to collect data that really maps a user request into the action space that you would need to execute it. So in the browser, it's like, given a user request, what sorts of things do we click or key down in order to achieve that task? So I imagine in Mac Automator there might be a similar thing that we would try to map to if we wanted to leverage that technology. Right now, we're doing everything in the browser, but we think that the idea in general extends to things that take control and execute tasks for you.

Awesome. Well, thank you so much, Jacob.
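The mapping mentioned in the Q&A, from a user request to a sequence of clicks and key presses, could be represented with a structure along these lines. This is only a sketch under assumptions: the talk does not describe Adept's data format, and the names `BrowserAction` and `Demonstration`, the field names, and the example values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Literal


@dataclass(frozen=True)
class BrowserAction:
    # The two action kinds named in the talk: clicking and keying down.
    kind: Literal["click", "key_down"]
    # Hypothetical field: element label for a click, text for key presses.
    target: str


@dataclass
class Demonstration:
    """One training example: a natural language request and the actions that fulfil it."""
    request: str
    actions: List[BrowserAction] = field(default_factory=list)


# Illustrative example only; the element labels are invented for this sketch.
demo = Demonstration(
    request="Find a weekend getaway in San Luis Obispo",
    actions=[
        BrowserAction("click", "Search destinations"),
        BrowserAction("key_down", "San Luis Obispo"),
        BrowserAction("click", "Search"),
    ],
)

for step, action in enumerate(demo.actions, start=1):
    print(step, action.kind, action.target)
```

Targeting a different automation surface would, under this framing, mean swapping the set of action kinds while keeping the request-to-actions shape the same.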
