MLOps Community

Using LLMs to Punch Above your Weight!

Posted Apr 27, 2023 | Views 1.2K
# LLM
# LLM in Production
# Anzen
SPEAKERS
Cameron Feenstra
Principal Engineer @ Anzen

With a strong background in data-related startups, Cam brings invaluable expertise to Anzen, where he leads the development of innovative products and technology infrastructure. Previously, as a Staff Engineer at People.ai, he played a key role in creating some of the company's crucial AI products.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building Lego houses with his daughter.

SUMMARY

As a small business, competing with large incumbents can be a daunting challenge. They have more money, more people, and more data, but they can also be inflexible and slow to adopt new technologies. In this talk, we will explore how small businesses can use the power of large language models (LLMs) to compete with large incumbents, particularly in industries like insurance. We will present two examples of how we are using LLMs at Anzen, to streamline insurance underwriting and to analyze employment agreements, and we will discuss ideas for future applications. By harnessing the power of LLMs, small businesses can level the playing field and compete more effectively with larger companies.

TRANSCRIPT

Link to slides

All right, thanks to anybody who's listening. My name is Cam, and today I'm going to talk about how you can use large language models to punch above your weight, whether that means doing things faster, doing more with less, or, even as a small company, potentially competing with bigger companies where you might not have been able to otherwise.

A little bit about my background: when I first came out of college looking to go into the software industry, I was mostly interested in machine learning. I wanted to go train models, and I had some reasonable skills with Python data analysis, less so with training models. At my first job out of college I spent a lot of time training machine learning models for various NLP tasks, and I actually started to gravitate more toward the software side. I got more interested in the infrastructure, and a big reason for that was just that the feedback loop was so much faster writing software than training models, at least as I saw it. Now, that's very much not necessarily the case anymore, which is a big part of what I'm going to talk about today.

Let me tell you briefly about Anzen, which is the company I've been at for about the past year and a half. The problem Anzen is tackling is pretty big: $23 billion every year is spent on, basically, employees suing their employers, and that's rising fairly consistently year over year. Today there actually aren't that many good approaches available to businesses. They'll buy insurance most of the time, and separately they'll rely on lawyers for legal advice on a lot of things. Eventually they might hire a head of HR who brings a lot of domain knowledge about compliance.

The solution we've come up with, which we think is pretty interesting, is in two parts. Number one, we actually sell insurance: we write various different lines of coverage to protect businesses if they get sued. But we also package a software product that aims to help companies avoid being sued in the first place. Ultimately, time is our most valuable asset, and even if you're covered when you get sued, it's still not something you want to go through. A great side effect is that the things that help companies avoid being sued are often also much better for their employees, so it can help them offer a better employee experience as well.

For context, we're still quite a small team, under 20 people, which was sort of the inspiration for this talk. Basically, I'm going to talk about a couple of specific ways we've been able to deploy large language models to solve some of our own problems very quickly, and I'll share a few things we've learned along the way that you might be able to apply to your own work.

It's probably not a big surprise that it's hard to compete with a big company as a small company. They have more money, they have bigger teams, they can throw people at a problem. Oftentimes they have valuable data sets that you can't really get otherwise, and they have network effects: in the case of insurance, if I know a friend has had a good experience with a particular company, I might be more likely to just go with them. But on the other side, they're also less agile and slower to adopt new technologies, whether that's an "if it's not broke, don't fix it" attitude, bureaucracy, or whatever it is.

Specifically on the insurance side of our business, we are competing with really big companies, and the first example I'm going to talk about is how we were able to use a large language model to streamline our underwriting process. Underwriting is basically the process of taking a bunch of information about a company, financials, whether they've ever been sued before, a whole bunch of different things, and then a person decides: is this company too risky, or do we want to write them an insurance policy? And if we do want to write them a policy, how much should we charge for it?

What I have here is a diagram of how this normally looks for the kind of insurance we write. There's someone called a retail insurance broker who typically, about 98% of the time, sits between the actual client, the company that wants insurance, and the insurance company. Generally the company will approach the broker, the broker will have them fill out a long application, which I'll talk about in a second, and then the broker will go to any number of different insurance carriers and say, hey, can you give me a quote on this? The carriers take their quotes back to the broker, who ultimately takes them to the client, and if any of them are good, the policy gets issued.

One thing I want to point out, and maybe you can see it from this diagram, is that each one of these brokers has many clients, and for each of those clients they're corresponding with many different companies. So it's really important for us to make the process for the broker as simple as it can possibly be, or else they're just not going to do business with us.

To zoom in a little on what traditionally happens inside the insurance company: they get this application emailed, or maybe faxed in some cases. They would have a team of people who waits for those applications to come in, reads them, and determines whether they're duplicates, because sometimes a company might be working with multiple insurance brokers and you only want to process it once. They'll also manually extract a bunch of information to make the underwriter's job easier. Then they hand it off to the underwriting team, who decide whether they want to write the policy and how much to charge, and send a quote.

And just a note about the applications: they are really complicated. There are tons of different formats; generally insurance companies will accept any format you give them, as long as it includes the appropriate information. They're long and dense and have all sorts of questions, which makes extracting that information in an automated way quite a challenge.
What we implemented at Anzen looks something like this. The broker sends the application, and as soon as it hits our inbox an automated classifier determines whether it's an application, and a separate component extracts information from it. If the application is a duplicate, it ends there. Otherwise we send an alert to our underwriting team, who do more or less the same process as in the previous slide: they decide whether it's a company we want to insure and how much to charge.

A couple of details about how we put this together. There are two parts to the solution: the classification piece and the extraction piece. For the classification piece, we were able to put together a classifier that performs pretty well with very little work on our part. We built a classifier based on Google's BERT model, whose baseline performance was essentially the same as picking randomly. Within about an afternoon we were able to pull down all the attachments from our team inbox, which was about 300 at the time, manually label them, and come up with a classifier with about 95% accuracy, 90% precision, and 100% recall. We did test configurations that had better precision, but there was always a trade-off with recall, and for this use case we were okay with a little bit of noise as long as we captured as much as we could. We also tested a bunch of different models, all open source on Hugging Face, and a bunch of different training configurations, and chose the one with the characteristics we liked the most.
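The talk doesn't include code, but as a rough illustration of the approach described above, here is a minimal sketch of fine-tuning a pretrained BERT classifier on a few hundred hand-labeled attachments with Hugging Face. The file name, label scheme, and hyperparameters are assumptions made for the example, not details from Anzen's system.

```python
# Sketch: fine-tune BERT to classify email attachments as "application" vs. "other".
import numpy as np
from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_score, recall_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# ~300 manually labeled rows with columns "text" and "label" (hypothetical file).
dataset = load_dataset("csv", data_files="labeled_attachments.csv")["train"]
dataset = dataset.train_test_split(test_size=0.2, seed=42)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    # Truncate long application text to BERT's 512-token limit.
    return tokenizer(batch["text"], truncation=True, padding="max_length")

tokenized = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

def compute_metrics(eval_pred):
    # Track precision and recall explicitly, since the precision/recall
    # trade-off is what drives the model choice described in the talk.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": accuracy_score(labels, preds),
            "precision": precision_score(labels, preds),
            "recall": recall_score(labels, preds)}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="app-classifier",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())
```

Comparing the different open-source checkpoints and training configurations mentioned above is then mostly a matter of swapping the model name and rerunning the evaluation.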
For the extraction piece, we ended up using an API: AWS Textract, and in particular the question answering feature of Textract, which lets you add a query like "What is the company applying for insurance?" and it will extract the answer from a PDF document. It works very well. They hold their cards a little close to the chest in terms of how it's implemented, but it's safe to assume, based on the performance, that there's some kind of large language model powering it under the hood.
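For reference, Textract's Queries feature is invoked roughly as in the sketch below, assuming a single-page document passed as bytes (multi-page PDFs go through the asynchronous StartDocumentAnalysis API instead). The query texts and file name are illustrative, not Anzen's actual queries.

```python
# Sketch: ask Textract's question answering ("Queries") feature to pull specific
# answers out of an insurance application document.
import boto3

textract = boto3.client("textract")

with open("application.pdf", "rb") as f:
    doc_bytes = f.read()

response = textract.analyze_document(
    Document={"Bytes": doc_bytes},
    FeatureTypes=["QUERIES"],
    QueriesConfig={"Queries": [
        {"Text": "What is the name of the company applying for insurance?",
         "Alias": "company_name"},
        {"Text": "How many employees does the company have?",
         "Alias": "employee_count"},
    ]},
)

# Each QUERY block links to its QUERY_RESULT block through an ANSWER relationship.
blocks_by_id = {block["Id"]: block for block in response["Blocks"]}
for block in response["Blocks"]:
    if block["BlockType"] != "QUERY":
        continue
    for rel in block.get("Relationships", []):
        if rel["Type"] == "ANSWER":
            for answer_id in rel["Ids"]:
                answer = blocks_by_id[answer_id]
                print(f'{block["Query"]["Alias"]}: {answer.get("Text")} '
                      f'(confidence {answer.get("Confidence", 0):.1f}%)')
```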
I really want to stress, especially with the classification model, that this is something we couldn't pull off the shelf; there weren't any good options. But as I said, the training data came together in an afternoon, and end to end we got the whole thing working in under a week. Compared to training a model from scratch, that's at least an order of magnitude less effort and less data than we would have needed to get similar performance otherwise.

Now, one more example of something we use large language models for at Anzen. One feature we wanted to build for our risk management platform, or rather a hypothesis we wanted to validate by building a feature, is this: right now, entrepreneurs and startups rely a lot on their lawyers to review individual documents and let them know if there's anything potentially wrong with them, things like offer letters, or separation agreements if someone gets laid off. We thought it would be a really interesting feature to build a system where someone could just upload a document and get immediate feedback on potential compliance issues. And even though we're not providing legal advice, and that's clearly spelled out on our website, for this type of feature we really wanted the accuracy to be as high as it could possibly be, because we certainly don't want to tell someone something is wrong, or produce completely bogus outputs like you might occasionally get from something like ChatGPT.

The first step was to gather a bunch of domain knowledge from attorneys. We learned what kinds of issues you tend to see in offer letters, for example. Then we put together a system that looks a little bit like this. For each type of document, we have a number of features we want to extract; for an offer letter, we might want to know the salary and whether the person is an exempt or non-exempt employee. For each of those features we have a two-step process. First, we use sentence embeddings to extract the most relevant portions of the document, more or less something like semantic search. Second, we use a question answering model to extract the exact bit we're looking for. So after this step, for the salary feature we would have $40,000, and we would have exempt or non-exempt. Based on that, we run some business logic and come up with a set of positive and negative insights grounded in the domain knowledge we gathered from attorneys.

In terms of how we actually implemented this, we were able to just pull models off the shelf: specifically a question answering model and a sentence embedding model. The question answering model is again based on Google's BERT model. The sentence embedding model is actually not really a large language model; it's based on Microsoft's MiniLM, but for this particular task it was the best performing model we tried. It also didn't require a ton of data: we had a few dozen examples of each type of document we wanted to handle, and based on those we were able to manually write prompts to help us extract each feature.
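Below is a minimal sketch of that two-step retrieve-then-answer flow, using a publicly available MiniLM sentence-embedding model and a BERT-style extractive QA model. The specific checkpoints, the question, and the input file are assumptions for illustration rather than the exact pieces Anzen uses.

```python
# Sketch: step 1 narrows the document to relevant passages with sentence
# embeddings (semantic search); step 2 runs extractive QA over those passages.
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
qa = pipeline("question-answering", model="deepset/bert-base-cased-squad2")

def extract_feature(document_text: str, question: str, top_k: int = 3) -> dict:
    # Step 1: split the document into passages and keep those most similar
    # to the question.
    passages = [p.strip() for p in document_text.split("\n") if p.strip()]
    passage_embeddings = embedder.encode(passages, convert_to_tensor=True)
    question_embedding = embedder.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(question_embedding, passage_embeddings, top_k=top_k)[0]
    context = " ".join(passages[hit["corpus_id"]] for hit in hits)

    # Step 2: extract the exact answer span from the relevant passages.
    return qa(question=question, context=context)

# Example: pull the salary out of an uploaded offer letter (hypothetical file).
offer_letter_text = open("offer_letter.txt").read()
result = extract_feature(offer_letter_text, "What is the annual salary?")
print(result["answer"], result["score"])
```

The business-logic layer described above would then take answers like these and map them to positive or negative compliance insights.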
For this particular feature, I think at this point it's too early to say whether our hypothesis is true. We're still developing it and giving it to people for feedback, and the feedback so far has been really good. But the main point is that we were able to put this together in about a week, at least as a prototype. So rather than training a model from scratch, where it would have been a big investment to validate that hypothesis, we were able to do it very quickly, and the end result actually feels fairly magical.

Now I quickly want to go over a couple of learnings; obviously this is not a complete list. First, probably not a surprise, but if you want to host these models on your own, it's important to have someone who knows a lot about infrastructure to do it well. These models are really resource intensive, and you have to understand how they're going to perform, or whether it's all going to fall over under high load. Many of the same things you've always had to do when deploying production systems still apply, and they might be even more important when you're deploying LLMs.

Evaluation metrics are also very important. Ideally, you want to be able to quantify whether your model is performing as well as it did in testing, and whether performance is going up or down over time.

The third point: if you're running massive production workloads, it does make sense to use the best hardware. But if you're just working on a prototype, it's definitely possible to deploy pretty large models on modest machines. Maybe when you get to many billions of parameters it wouldn't work so well, but at least for prototypes we were able to deploy models on the normal CPU instances we use for other things, and they worked.

The last thing I want to call out is something to factor into your costs if you're relying on an API. You probably have some layer of business logic on top of the outputs you get from the models, and software is all about iteration, so ultimately you're going to want to change that business logic, and you're probably also going to want to test how different prompts or inputs to the model behave. If you're relying on an API, making those changes is very expensive, and in some cases it might even be cost prohibitive, so it's definitely something you have to factor into your expectation of cost.

Now, maybe anyone watching is wondering why I'm not talking about ChatGPT or something like that. We definitely do have a lot of cool ideas based on ChatGPT; I've presented the things we've been able to get into production that are working well. But we think there's a lot of potential for generative models, GPT-4 or one of these models, to be a really good underwriting assistant. It might go fetch data about a company, it might make an initial recommendation, and our experiments have shown that it actually seems to be pretty good at it.

The second idea: one of the really difficult things about compliance is that the information you need to know is scattered all over the place, which is one of the reasons people rely on lawyers so much. So one thing we think could be interesting would be the ability to ask a question about a specific compliance issue in natural language and get a really great answer, or at least be pointed to the right place. But that's just an idea at this point.

So, thanks everybody for listening. It's definitely a really amazing time to be alive and a really cool time to be in software. I'm super excited about all the advances we've seen in AI, but even without all of the brand new models, large language models can still be a great asset, like they have been for us at Anzen. And the last thing: we are a startup and we're always hiring, so if any of these problems sound interesting to you, or you want to learn more about what we do, please reach out. We're always looking for great people. Thanks to the organizers and thanks to anyone listening; it's really an honor to speak. If we have time, I guess I can take a question or two.

For you? There's all kinds of time. There's just a little bit of a delay between the stream and the chat, so we can just shoot the shit here for a minute until a question comes in. Nothing too bad, guys. Yeah, man, while we're waiting: there's so much good stuff in what you talked about, but I want to go to a more philosophical level. If you're working with these large language models and you're dealing with problems that you've never seen before, what's your main way of going about just debugging? Like, "wow, I can't Google this because I'm probably the only one in the world seeing this right now," right?

Yeah, that's a really good question. Trial and error, to some extent; that's always the fallback. For example, if you have a situation with a bunch of possible parameters, it actually feels kind of similar to training a model sometimes, because you're like, okay, let me run all of these different things and come up with some metric. I guess that's part of the answer: you really just need some evaluation metric. No one on earth can explain what the LLM is thinking, and at least today there aren't tools where you can see the flow of information through it and use that to debug what's going on. But traditional software engineering techniques can also work reasonably well.

Awesome. Hold on, let me share this with you; speaking of which, I don't know if you've seen this yet.

That's pretty cool.

That's the shirt I made, because, yeah, man, we've got to call this out. A lot of this comes down to reliability, and there's this big question in my mind: no amount of prompting, no matter how good of a "prompt engineer," in quotation marks, you are, you can't force the model to give you the output you want. So how do you get past those kinds of problems?

Yeah, it's funny, because you've probably seen all of this AI agent stuff out there, putting one of these models in a loop. Around the time that was blowing up, it was an idea I was thinking about too, and I experimented with it, and it's really funny, because you give ChatGPT instructions like "give me this exact format, do not give me anything else," and it's like, "yeah, sure, let me help you." And ultimately the models are nondeterministic too, which is a lot different from most software: you give them an input and you don't even know that the output is going to be the same, especially when you're using an API where they might have completely changed the model under the hood without you knowing. So it's definitely not an easy problem. I follow all sorts of people on Twitter, and a lot of people are doing kind of ad hoc research on specific techniques; I find myself frequently bookmarking things, like, "oh, that's an interesting idea, maybe I'll try that sometime." It's an emerging field, I would say, and I would guess that within the next couple of years it will become a formal discipline, or something of that nature.

Yeah, we'll see, we will see. So there are some awesome questions coming through in the chat that are getting a bit more specific. Have you checked with SHAP which parts of the documents get used most in the classification?

I guess the answer is no. Mainly we just evaluated the outputs. We're a small startup, we have a million things to do, and so we monitor how it works as it goes, but we haven't really dug into what you're describing. That's actually a really interesting idea, though; maybe we should at least evaluate whether, say, the first two pages usually yield the same classification result. I just don't know the answer at the moment.

That's awesome. Sebastian, great question; reach out to me, I've got some socks for you. That's what happens when you ask these awesome questions. All right, next up: what are the core parts of underwriting content from LLMs that would need to be evaluated to ensure the output is what's required?

Sure. That's one of the interesting problems. I think I had one slide with one of these nine-page applications, and on top of that, whenever we're underwriting we get financials, we get their employee handbook. Ideally we'd just give all of that to the LLM and it would do our job for us. Maybe with GPT-4 and 32K tokens it'll be doable, but context size is definitely one of the problems we have to think about as we get deeper into it. One idea we've been thinking about is, on a first pass, having a large language model, ChatGPT or whatever, summarize each document, and then putting those summaries into the final prompt that makes the evaluation, something like that. But I guess the answer to the question is that there's a lot of information: the application, financials, the history of whether they've been sued. And based on all of those things we might even ask for more.
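As a very rough sketch of that summarize-then-combine idea (and only an idea, since it isn't described as a shipped system), the first pass might produce one summary per document so everything fits in a single final prompt. The model name, prompts, and variable names here are assumptions.

```python
# Sketch: per-document summaries feed into one final underwriting-assistant prompt.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def summarize(name: str, text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": f"Summarize the underwriting-relevant facts in this {name}:\n\n{text}"}],
    )
    return response.choices[0].message.content

documents = {
    "application": application_text,       # hypothetical variables holding the
    "financials": financials_text,          # raw text of each submitted document
    "employee handbook": handbook_text,
}

# First pass: one short summary per document.
summaries = "\n\n".join(f"{name.upper()}:\n{summarize(name, text)}"
                        for name, text in documents.items())

# Second pass: ask for an initial assessment over the combined summaries.
final = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": "Given these document summaries, list the main risk factors "
                          "a human underwriter should review:\n\n" + summaries}],
)
print(final.choices[0].message.content)
```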
Um, you know, and based on all of those things, we might even ask for more and and like knowing all this, do you feel like like knowing all this, do you feel like the method that you're currently using is the only method that could actually do this or are you also looking for? Like, is there another way? the method that you're currently using is the only method that could actually do this or are you also looking for? Like, is there another way? So doing what specifically So doing what specifically like trying to, I guess the idea is the end goal that you're, you're getting at? Right? And with, with feeding all these L L MS or feeding these L L MS all this data. like trying to, I guess the idea is the end goal that you're, you're getting at? Right? And with, with feeding all these L L MS or feeding these L L MS all this data. Like, I just wonder if there's, and I didn't mean to get so philosophical. I think it's probably because it's late here. But Like, I just wonder if there's, and I didn't mean to get so philosophical. I think it's probably because it's late here. But you caught me at a good time. I need my, my scotch and cigar and this is, uh, it's what you get for being the last one. I get to chat with you on this kind of stuff. you caught me at a good time. I need my, my scotch and cigar and this is, uh, it's what you get for being the last one. I get to chat with you on this kind of stuff. But yeah, I, I mean, I, I love, uh, pontificating so, you know, you won't, won't find more. But yeah, I, I mean, I, I love, uh, pontificating so, you know, you won't, won't find more. Look at that Boca Word man. Oh my God, bringing out the real guns. All right. But at the end of the day, it's like the, Look at that Boca Word man. Oh my God, bringing out the real guns. All right. But at the end of the day, it's like the, the thing that I wonder about is the thing that I wonder about is do you need to use because there's a lot of times that we talk about in the M Os community that machine learning might be over engineering something. And then I feel like now with these new large language models potentially using them. do you need to use because there's a lot of times that we talk about in the M Os community that machine learning might be over engineering something. And then I feel like now with these new large language models potentially using them. And in my own experience when I've played around with them, sometimes I spend so much time trying to get what I want out of it that I'm like, dude, I, if I just did this, it would have been faster than actually having to do it. So I just wonder, do you ever want, do you ever think about, are you And in my own experience when I've played around with them, sometimes I spend so much time trying to get what I want out of it that I'm like, dude, I, if I just did this, it would have been faster than actually having to do it. So I just wonder, do you ever want, do you ever think about, are you inventing a nail for a hammer or is this the only way in your mind or in your eyes right now that this, this can happen? inventing a nail for a hammer or is this the only way in your mind or in your eyes right now that this, this can happen? Uh I mean, I think, yeah, it's a really good question and I mean, I think, I, you know, the traditional wisdom with machine learning has been like, use it as little as possible only when you really need to. 
Um And I think that that's a lot of the reason that for uh the use cases we put into production that far, thus far, we weren't like, let's just throw a bunch of info at chat GP T and like, let them handle it. Um Uh I mean, I think, yeah, it's a really good question and I mean, I think, I, you know, the traditional wisdom with machine learning has been like, use it as little as possible only when you really need to. Um And I think that that's a lot of the reason that for uh the use cases we put into production that far, thus far, we weren't like, let's just throw a bunch of info at chat GP T and like, let them handle it. Um Because yeah, I mean, I do think, but also there's things like, you know, I mean, if we were able, if we tried to create a system to like do underwriting, uh you know, I mean, there's actually a lot of examples of unsuccessful insurance companies that have tried to do that. And so I think that for unsolved problems, it, it can work really well. Um You know, and, Because yeah, I mean, I do think, but also there's things like, you know, I mean, if we were able, if we tried to create a system to like do underwriting, uh you know, I mean, there's actually a lot of examples of unsuccessful insurance companies that have tried to do that. And so I think that for unsolved problems, it, it can work really well. Um You know, and, but yeah, I mean, I agree, I think that, you know, what will be really interesting is seeing, you know, good software built on top of the large language models. Like I think a lot of people say, you know, text is the new U I like, I mean, you know, I love my terminal but I don't think that's gonna happen. Uh but yeah, I mean, I agree, I think that, you know, what will be really interesting is seeing, you know, good software built on top of the large language models. Like I think a lot of people say, you know, text is the new U I like, I mean, you know, I love my terminal but I don't think that's gonna happen. Uh You know, uh and so, you know, there's, and, and using large language models presents new U X issues, right? Like often time, you know, I think in the like performance optimization discussion and they were talking, they, they got a little bit into like U X fixes for, for slow models and stuff like that. So, um, You know, uh and so, you know, there's, and, and using large language models presents new U X issues, right? Like often time, you know, I think in the like performance optimization discussion and they were talking, they, they got a little bit into like U X fixes for, for slow models and stuff like that. So, um, yeah, I think that, uh, you know, it, it is a really cool new primitive. Um, you know, and, and we'll see kind of what, what it works for and what it doesn't, undoubtedly there's gonna be a lot of things made that are way over engineered and, you know, don't actually make any sense. And I mean, I think yeah, I think that, uh, you know, it, it is a really cool new primitive. Um, you know, and, and we'll see kind of what, what it works for and what it doesn't, undoubtedly there's gonna be a lot of things made that are way over engineered and, you know, don't actually make any sense. And I mean, I think if you start using large language models where they aren't necessary, it can actually potentially make your software worse because something you could have just written code for and it's so slow only works 80% of the time or something. 
if you start using large language models where they aren't necessary, it can actually potentially make your software worse because something you could have just written code for and it's so slow only works 80% of the time or something. Yeah, it only works 80% of the time. It's super slow. I mean, yeah, you're just, you're degrading the whole system. So there is, there's another question that came through. How are you getting it? Evaluation metrics? Yeah, it only works 80% of the time. It's super slow. I mean, yeah, you're just, you're degrading the whole system. So there is, there's another question that came through. How are you getting it? Evaluation metrics? Um That's a good question. Um I would say that uh at the moment, um for uh our internal, for, for our internal classification system, the evaluation metric is that everything goes to a slack channel and I take note when I see things that are wrong. Uh, you know, and, and people give me a heads up if it misses something. Um, and then, Um That's a good question. Um I would say that uh at the moment, um for uh our internal, for, for our internal classification system, the evaluation metric is that everything goes to a slack channel and I take note when I see things that are wrong. Uh, you know, and, and people give me a heads up if it misses something. Um, and then, uh periodically I've been kind of going back and seeing if I can improve the model. Um And, you know, I always have this original labeled data set that it was trained on. Um uh periodically I've been kind of going back and seeing if I can improve the model. Um And, you know, I always have this original labeled data set that it was trained on. Um And yeah, I mean, I would say at, at the moment we're, we're doing things mostly at a small scale. And so um the evaluation metrics are still on the manual side. I think that we will certainly over time get to the point where we uh have things like regression testing and stuff like that And yeah, I mean, I would say at, at the moment we're, we're doing things mostly at a small scale. And so um the evaluation metrics are still on the manual side. I think that we will certainly over time get to the point where we uh have things like regression testing and stuff like that uh in a more automated way. But at the moment, it's just something that we, we think is important. So we're doing uh doing it relatively manually. uh in a more automated way. But at the moment, it's just something that we, we think is important. So we're doing uh doing it relatively manually. Excellent, dude. Cam. Thank you so much, man. You've finished us, you've finished strong. And this has been incredible. I'm gonna kick you out now, loving you. This is the end of the party. You don't got to go home, but you can't stay here. And so thank you. Excellent, dude. Cam. Thank you so much, man. You've finished us, you've finished strong. And this has been incredible. I'm gonna kick you out now, loving you. This is the end of the party. You don't got to go home, but you can't stay here. And so thank you. We will see you later, man. So much for We will see you later, man. So much for having me. Yeah, it was, it was like I said, talks were amazing. Uh super honored to, to be able to talk and um having me. Yeah, it was, it was like I said, talks were amazing. Uh super honored to, to be able to talk and um yeah.
