Extending AI: From Industry to Innovation
Sophia Rowland is a Senior Product Manager focusing on ModelOps and MLOps at SAS. In her previous role as a data scientist, Sophia worked with dozens of organizations to solve a variety of problems using analytics. As an active speaker and writer, Sophia has spoken at events like All Things Open, SAS Explore, and SAS Innovate, and has written dozens of blogs and articles. A staunch North Carolinian, Sophia holds degrees from both UNC-Chapel Hill and Duke, including bachelor's degrees in computer science and psychology and a Master of Science in Quantitative Management: Business Analytics from the Fuqua School of Business. Outside of work, Sophia enjoys reading an eclectic assortment of books, hiking throughout North Carolina, and trying to stay upright while ice skating.
David joined SAS in 2020 as a solutions architect. He helps customers to define and implement data-driven solutions. Previously, David was a SAS administrator/developer at a German insurance company working with the integration capabilities of SAS, Robotic Process Automation and more.
At the moment, Demetrios is immersing himself in machine learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building Lego houses with his daughter.
Organizations worldwide invest hundreds of billions into AI, but they do not see a return on their investments until they are able to leverage their analytical assets and models to make better decisions. At SAS, we focus on optimizing every step of the Data and AI lifecycle to get high-performing models into a form and location where they drive analytically driven decisions. Join experts from SAS as they share learnings and best practices from implementing MLOps and LLMOps at organizations across industries, around the globe, and using various types of models and deployments, from IoT CV problems to composite flows that feature LLMs.
Sophia Rowland [00:00:00]: My name is Sophia. I'm a senior product manager at SAS, and I just like my drip coffee black, and I can drink almost a pot of it in a day.
David Weik [00:00:10]: Hi, I'm David. I'm a solutions architect at SAS and I like to mix things up so I don't become a coffee snob.
Demetrios [00:00:19]: What is happening, MLOps community? We are back with everyone's favorite podcast. I am your host as usual, Demetrios. And today, talking with Sophia and David from SAS, we got into so many different use cases and so many different things that they've had the pleasure or displeasure of seeing out in the wild and how you can hopefully take steps to not fall into the same holes as others have done. There were a few horror stories. We talked about zombie models being just out there and companies not even realizing it. I've heard about this. We talked a few podcasts ago to someone who actually had that very same story. And Sophia talked about how she has met people with no governance in place.
Demetrios [00:01:11]: So let your ears perk up and hopefully you get a good laugh out of that or a good cry, because ideally you're not in that situation. But if you are, I understand, I empathize, and you can have your cry. So this was an awesome breakdown of some of the best practices that they've learned over their years implementing MLOps in different industries. There's a plethora of use cases out there for machine learning and generative AI, and we talked about many of them and what the downfalls or pitfalls are with these different use cases. Let's get into it with Sophia and David. All right, and a huge thank you to SAS, our sponsor of this episode. If anyone is curious about what they are innovating on these days, you can find a link in the description. Go check it out.
Demetrios [00:02:05]: They're doing some really cool stuff, and Sophia and David attest to it in this episode. Oh, and by the way, like, subscribe, whatever. Leave us a comment, share it with a friend, you know, all that good stuff. Have some fun. We'll see you on the other side. So I've got a confession to make, and this is going to tell you a lot about the vibe of my day today, which is I said, you know what? My normal dose of vitamins, I'm just gonna double that because I knew I was talking to you both today. So I hopefully am extra super powered right now for this conversation.
Demetrios [00:02:49]: So, David, I know you're in Columbia right now. You're traveling. Sophia, you're at home. I presume both of you are doing some really cool work. When it comes to working in the weeds on MLOps and different AI products, I would love to know, like, what are you getting your hands dirty with day in, day out? And maybe, Sophia, you can kick us off.
Sophia Rowland [00:03:17]: Thanks. So I currently work as a product manager, and I'm focused on our solution for MLOps. I get to work with a wide variety of different customers and a variety of use cases. Some of the things I get to do are speak to our executives and our users and figure out what they are trying to do with MLOps, what their gaps are, and what problems they are trying to solve. Those conversations give me really great insight into the space and where people want to go. I can then take that, translate it into requirements, and start working alongside our developers to bake that into one of our products. So I get to work with dozens of customers across industries worldwide, of different sizes, to really start to map out what the MLOps landscape looks like to these different organizations.
Demetrios [00:04:14]: Love that. How about you, David? What do you get your hands dirty with?
David Weik [00:04:18]: I think my attention is currently kind of split. One part is on the, I would say, ground-level MLOps stuff, where we are talking about real-time integration of models into a fraud process. I mean, if you're standing at the checkout, you don't want to wait for your transaction to be processed. So there needs to be real-time or even stream processing, and we need to make sure that we don't charge you or flag you as fraudulent when you clearly aren't. And on the other hand, I talk a lot about how you put generative AI into processes and make sure they run great in production. It's a real split that I see right now, where we are still talking a lot and trying to face the challenges of traditional MLOps and then trying to adapt how these new technologies fit into that as well.
Demetrios [00:05:13]: Yes. And that is a little bit of a path that I wanted to go down, which was: what do traditional MLOps problems and challenges look like? What do the new GenAI problems and challenges look like? And then what about the use cases where you have both of them stacked on top of each other? I know that's becoming more and more common these days, so I would love to hear what you're seeing in that realm also. Now, before we get into it, let's talk a little bit about what some of these use cases are. You mentioned fraud, David. Are there other ones that you're commonly seeing as, now, this is a very clear ML problem? I can think of a few off the top of my head, like loan scoring. I think we have recommender systems. Those are very clear.
Demetrios [00:06:10]: There's different pieces that I imagine you're seeing. What are they?
David Weik [00:06:16]: Yeah, I think one of the classics is still churn prediction, right? It's cross-industry; it's a topic you will always have. And the question there, especially in MLOps, is starting to change. It was very batch focused in the past, and it's moving more and more into real-time processes, where the data scientists now have to talk to a different set of people. Before, they talked maybe with the data engineers, and they knew them already because that's where they got their data from and they knew how to build a process together. Now they are talking to application developers all of a sudden, who want things like an OpenAPI 3 spec, or who tell them, yeah, you have to integrate into my Java application, or even worse, sometimes COBOL applications or whatever. Navigating that space, how you go from batch into this more application-focused world, is really something that a lot of data scientists just get thrown into, and they need help along the way because they might not have traditionally worked in that space.
Demetrios [00:07:24]: What a great point. That is something that I've been seeing a ton, and we were actually just speaking a few days ago to Catherine, who wrote a book geared towards data scientists on software engineering best practices. And it was very much that: she wrote the book because she did not feel that she was equipped as a data scientist with these software engineering best practices. It sounds like you've seen the same thing. One thing from this book is that she said the biggest takeaway is understanding the common language. A lot of times data scientists don't even know what an API is, or CI/CD and why it's very important. So have you seen ways that these data scientists, now having to think about it in real time as an application, have been successful?
David Weik [00:08:20]: So for me it was always a kind of tool-based approach: to show them how they can translate a batch-based approach into an API-based approach and what kind of difference that makes for them, and trying to help them understand. So maybe building a small JavaScript application together with them, and for that, that's a great use case for GenAI, to just help pull something together, getting them to test it to see what it feels like, and getting them to understand that every additional input they require in a process is an additional input field for the frontend application. Unlike in batch, where everything is just available or just an SQL query away, in the API world, from the frontend developer's perspective, it's much different. And if you require 100 inputs into your model, that might be a hurdle to getting it into production. When they see they have to fill out each individual field, so making it really explicit, they start to rethink and go, oh yeah, let me go back to variable selection. Maybe I need to trim some of them out because they don't make any sense, right? And that only becomes apparent if you make it a pain for them.
David Weik [00:09:34]: So that has been a great thing to showcase and talk about.
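To make the batch-to-API shift David describes concrete, here is a minimal sketch of a real-time scoring endpoint. It assumes a scikit-learn churn model saved with joblib; the framework choice (FastAPI), file name, and feature names are illustrative assumptions, not anything from SAS or the customer projects discussed here. The point is that every model input becomes an explicit field in the request schema, and the OpenAPI 3 spec the application developers ask for is generated automatically.

```python
# Minimal sketch of turning a batch-scored churn model into a real-time
# scoring endpoint. Assumes a scikit-learn model saved as churn_model.joblib
# and pydantic v2; every name here is illustrative.
import joblib
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="churn-scoring")        # /docs serves the generated OpenAPI 3 spec
model = joblib.load("churn_model.joblib")   # trained offline, loaded once at startup

class ChurnInput(BaseModel):
    # Every feature the model needs becomes an explicit field that the
    # front-end developer has to supply -- 100 of these gets painful fast.
    tenure_months: int
    monthly_charges: float
    num_support_calls: int

@app.post("/score")
def score(payload: ChurnInput) -> dict:
    row = pd.DataFrame([payload.model_dump()])           # one-row frame for the model
    probability = float(model.predict_proba(row)[0, 1])  # probability of churn
    return {"churn_probability": probability}
```

In batch, those same features would simply be columns in a table; here each one is something a front-end developer has to collect and send, which is exactly what pushes people back to variable selection.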
Sophia Rowland [00:09:41]: And, I mean, just to jump in from the organizational perspective, we are seeing shifts in how data scientists work within the team, because you've really hit home for me there with your earlier point. When I went through my master's program in data science, what we were taught was the age-old adage of: let's throw the model over the fence. It is not our problem once we've trained it. And our capstone project was a great example of where this went wrong, because we went in and we were working with a local healthcare provider, and to date, that was the most depressing data set I've ever seen in my life. It was 300 individuals, two thirds of whom had passed away, unfortunately.
Demetrios [00:10:30]: Oh no.
Sophia Rowland [00:10:31]: And we were trying to figure out who would survive a particular procedure. And so we went through all of this effort. The data was so messy: 1,200 columns, 300 rows. We put forth all of this effort doing the cleaning. And what did we hand off to this healthcare organization? Training code. We weren't taught to give them a trained model object or the scoring code. It was: here's your training code, best of luck.
Sophia Rowland [00:10:58]: And I sit with that regret to this day about how awful what we did was for that organization. Moving into my career, I figured out that that's not acceptable. In a lot of organizations, you want your data scientists to be integrated inside that team, where we're all working together with that sole focus of: how do we get this model in a form and location where it can be used to really affect that decision making? Because it's not just the data scientists. It's not just the MLOps engineer or IT resource. We also need to work with our data engineers, with our business, with our end users, with model risk, so that we can have an efficient system that is actually valuable for organizations and so that we are, hopefully, moving past throwing the model over the fence.
Demetrios [00:11:46]: Yeah, yeah. But now these days, I'm guessing you take a different approach.
Sophia Rowland [00:11:54]: Absolutely. I've taken a hard left turn ever since I started working with customers. Before moving into product management, I worked in a similar role to David's. I focused on helping financial services customers with their advanced analytics needs from that customer advisory perspective, where I focused a lot on leveraging open source models, text analytics, and optimization, and MLOps became one of my unique niches, where I got to figure out how to take what the data scientists are doing and put it into production, where it can be used, where it can be monitored, and try to solve the problems that they encounter. It's very interesting. As you talk to these customers, it feels like the quote where every happy family is happy in their own way and every unhappy family is unhappy in their own way. Everyone started bringing up all of these very unique problems at their organization. I started to batch them down into a few different ones, but you still see folks who recode their models for production.
Sophia Rowland [00:13:03]: The data scientist develops it in one language, and to productionalize it, it has to be in something else. And of course that's an opportunity to introduce errors when the data scientist isn't the one doing the recoding. You have to test that your outputs match the original model. You have dependency management: if you've developed your model using one version of Python and one version of packages, and that's not what's available in production, you get some very interesting errors. I've worked quite a bit with cases where the development environment doesn't match the production environment, and you have to start to interpret those errors yourself.
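One hedged way to catch the dev/prod mismatch Sophia describes is to record the training environment and fail fast when the scoring environment drifts from it. The expected-versions dictionary and the hard failure below are illustrative choices for a sketch, not a SAS feature.

```python
# Hedged sketch: fail fast when the scoring environment drifts from the one
# the model was trained in. The expected versions are illustrative.
import sys
from importlib import metadata

EXPECTED = {"python": "3.10", "scikit-learn": "1.3.2", "pandas": "2.1.4"}

def check_environment(expected: dict) -> list[str]:
    """Return a list of mismatches between this runtime and the training environment."""
    problems = []
    if not sys.version.startswith(expected["python"]):
        problems.append(f"python {sys.version.split()[0]} != {expected['python']}")
    for package, wanted in expected.items():
        if package == "python":
            continue
        installed = metadata.version(package)
        if installed != wanted:
            problems.append(f"{package} {installed} != {wanted}")
    return problems

if __name__ == "__main__":
    mismatches = check_environment(EXPECTED)
    if mismatches:
        # Better to stop deployment here than to hit "very interesting errors" at scoring time.
        raise RuntimeError("environment drift: " + "; ".join(mismatches))
```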
Demetrios [00:13:43]: But that's funny you mentioned that because there is another person that came on here recently, and he was talking about how he was working at Yandex, and one of the big projects that he did was go and analyze all the models that were in production and cut out like 10% of them. And he said he was able to save so much money because they were just basically like zombie models hanging out there. They weren't getting hit, but people were afraid to take them offline because they didn't know if they were actually driving revenue or not. And so you think about those scenarios and you're like, wow, that feels like they were missing a piece of the circle. You know, it's like you put it out there and then you just kind of forget about it and you don't recognize if you're getting ROI on it. You don't have any clear metrics on what that model is doing. And this seems like it's even a step further. You don't even know the model exists and it's out there.
Sophia Rowland [00:14:40]: Exactly. Exactly. I think you'll find a very interesting number of customers or organizations that just have models hanging out in production. And of course, that's going to incur a cost if no one's using them: paying for the compute, for the storage. You're opening yourself up to potential security vulnerabilities or attacks by just leaving them hanging out there.
Sophia Rowland [00:15:08]: So I've started to add little reminders in some of my presentations about the analytics lifecycle to say, hey, check your models in production. Are they actually being used? If not, you are allowed to retire those.
Demetrios [00:15:22]: Oh, that is classic. Yeah, we're giving you permission right now.
Sophia Rowland [00:15:26]: Yeah, go on.
Demetrios [00:15:26]: It's okay to take them offline. Anybody listening? You do not need that model out there, especially if it's not actually bringing any value to the company. And people will probably thank you for it, because, like this situation at Yandex, it'll save money, and it also...
David Weik [00:15:44]: Saves you the time your talent invests in keeping those models alive, right? They could be doing something much more productive or interesting to them. Keeping things on life support is fun for a time, but if it doesn't drive anything, you don't get any great feedback on it. It will also lead to them maybe looking elsewhere, where they get more interesting challenges to tackle. So you also need to think about how you manage all of these models, even the productive ones, to keep the talent that can create them engaged in your company, looking for new use cases or thinking about how they can improve the models that are now in production and driving value, before they leave or go back to doing something completely different.
Demetrios [00:16:36]: Yeah, it's funny you mention that, because I was reading the Evolution of Michelangelo blog post, and one of the big things that they said when they went from Michelangelo 1.0 to Michelangelo 2.0 was that they had certain use cases that were first-class citizens because they were driving so much revenue. Beforehand, any ML model that went out into production, any use case, was treated the same. You would have someone who is an engineer babysitting a model that may or may not actually be driving revenue, and it would have the same SLAs as a model that is driving millions in revenue, and so it just doesn't make sense at all.
David Weik [00:17:26]: Yup. And this is going on at any scale of company, right? It's not just the big hitters like Uber that have these types of problems. I was working with a company, and they had a student work on a project for them, and the model was actually quite good. And then the student left, of course, after some time. And how did they operate this model in production before we came along and tried to solve that pain point for them? They had to import a CSV file, send it off to that specific laptop where the environment was all set up, then rerun the Jupyter notebook, export the CSV file, send it per email back to the production system, and then they were good to go. I was like, I can't even imagine doing any of these steps.
Demetrios [00:18:18]: I love hearing about these horror stories.
David Weik [00:18:21]: It's painful sometimes.
Demetrios [00:18:23]: Yeah. It just shows you the lengths that we'll go to and the capabilities that we have to hack something together. It's amazing.
David Weik [00:18:34]: Yep.
Demetrios [00:18:35]: Humans are resourceful, that's for sure. Sophia, I wanted to ask you, I know that you have been getting into some other use cases that we talked about before we hit record.
Sophia Rowland [00:18:48]: Yeah, there have been some very fun ones, some very helpful ones. In my chats with folks throughout SAS, there have been folks working on a worker safety use case. There's this large shipping and logistics company that they've been working alongside, and they've had a problem with worker injury. They've seen individuals injured on the assembly line, and they're trying to figure out a solution: how do we make sure that people follow proper procedures? For example, if you're working on an assembly line and you're trying to repair it or take it down for maintenance, you have something called a lockout tagout procedure. This ensures that no one can start that assembly line while you're working on it, because should it get started while you're in there, you really are exposed to a lot of risk.
Sophia Rowland [00:19:38]: And so they've been working with computer vision models that run on the cameras in this warehouse to identify worker safety risks specifically. So making sure, for instance, that no one's leaning over an assembly line that's running, no one's putting their hands on parts of the assembly line where they shouldn't be, and also ensuring that that lockout tagout procedure is followed, where you physically cannot start that assembly line until someone comes in and undoes that lock, because that's where they've seen a lot of workers injured. So they're having these computer vision models run alongside the cameras in this warehouse so that they can identify and alert a manager or someone on the floor whenever a potentially dangerous situation or scenario is occurring, to, again, prevent people from hurting themselves on the assembly line. There's another, much more positive one that we've been working on with one of our partners, ClearBlade. ClearBlade works in IoT, edge AI, and digital twins, and we work with them on solar farms. We are looking at how we can make solar farms more efficient, making adjustments as needed based on where the sun is and where the clouds are to maximize the power that we get from the sun.
Sophia Rowland [00:20:57]: And I saw a very interesting LinkedIn post from one of the folks working on this project, because recently, in the last few months, we had a partial solar eclipse here in North Carolina. From here, you couldn't see it without the glasses, and you could only see part of the sun covered. But he actually had the data to show the drop in the distribution, and it corresponded perfectly to how much of the sun was blocked during that partial eclipse. He pulled it right from the solar farm data from our headquarters, because we have a solar farm out back. That's a lot of fun. And we have sheep that mow it, so it's a really nice perk of being at HQ, seeing the sheep mow the solar farm.
Demetrios [00:21:41]: Oh, that's awesome. Now, going back to that first example, not that I don't want to talk all day about solar farms and sheep, but the first example that you're mentioning seems like there are some challenges with integrating the vision model with the camera and then the camera being able to alert people. Is it that the system cannot turn on if there is some dangerous scenario? Or is it that someone just gets alerted? Break that down a little bit more.
Sophia Rowland [00:22:20]: So with the lockout tagout procedure, it is physical: I go in and there's a guard over the assembly button that I pull down and physically lock. So if I'm going to go on the line, I physically lock it so no one else can start it. But they're seeing that workers aren't following that procedure and physically locking it, which means someone can walk by and see that, hey, the assembly line is off, the button's open, I'm just going to press the button. Because the lock is physical, all you can really do is send out that alert. But with the way that this team has been working, you can actually start to stream together different pieces of information inside of that flow to know when something is addressable. So you can say, I only want to look at things in this part of the screen.
Sophia Rowland [00:23:11]: I can compare that with the computer vision model to see whether this is appropriate based on the other information in the scenario, and from there I can combine that with logic to send that alert down to the line.
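A hedged sketch of the kind of flow Sophia outlines: restrict detections to one region of the frame, combine them with other signals such as whether the line is running and whether the lockout is engaged, and only then alert. The detection format, zone coordinates, and signal names are invented for illustration and are not SAS APIs.

```python
# Hedged sketch: region-of-interest filtering plus simple logic on top of
# computer vision detections before raising a worker-safety alert.
from dataclasses import dataclass

@dataclass
class Detection:
    label: str          # e.g. "person", "hand"
    x: float            # bounding-box center, in pixels
    y: float

DANGER_ZONE = {"x_min": 200, "x_max": 800, "y_min": 100, "y_max": 400}

def in_danger_zone(d: Detection) -> bool:
    return (DANGER_ZONE["x_min"] <= d.x <= DANGER_ZONE["x_max"]
            and DANGER_ZONE["y_min"] <= d.y <= DANGER_ZONE["y_max"])

def should_alert(detections: list[Detection], line_running: bool, lockout_engaged: bool) -> bool:
    person_in_zone = any(d.label == "person" and in_danger_zone(d) for d in detections)
    # Alert if someone is in the zone while the line runs, or if someone is
    # working in the zone without the lockout physically engaged.
    return person_in_zone and (line_running or not lockout_engaged)

# Example: a person leaning into a running line triggers the alert.
print(should_alert([Detection("person", 450, 250)], line_running=True, lockout_engaged=False))
```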
Demetrios [00:23:24]: And what are some of the hard challenges that you face with that?
Sophia Rowland [00:23:27]: Well, with computer vision models, you do have several different frameworks that you are working with. A lot of different people have their own preferences, from storing in an ONNX format to building out their TensorFlow, PyTorch, or even their SAS models. So there are a lot of different models coming in that data scientists develop based on their preferences. From there, you have to compare to see which is the most efficient model you want to use for the framework and what gives you the most accurate results, and you have to actually have the infrastructure and the ecosystem to run that model, regardless of what it was developed in, making sure that we can put it where we need it to be in just a few clicks. With IoT, you'll also see folks do things like develop models for a specific machine. So this is one model for this one machine, or one model for this one sensor, and from that they get thousands of models that they have to manage. So the scale becomes really crazy. We have one organization that we work with, a manufacturer, and I think they're probably one of the most analytically mature organizations that I've worked with.
Sophia Rowland [00:24:41]: And they're currently running 10,000 models in production for their different machines, and they're hoping to scale that up to 20,000 really quickly. So they are really churning out those models. And at that scale, it's really hard to identify errors without proper monitoring, without alerting to know when to focus your attention, because you can't hire 20,000 people to each sit and watch a single model. You have to have some way to know: hey, these are the rules and the thresholds for what an error would look like, and hey, send that alert so that someone can take action as needed.
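At the scale Sophia mentions, monitoring has to be rule- and threshold-driven so that only breaches reach a human. The sketch below is illustrative: the metric names, thresholds, and the idle-model check (a nod to the zombie models discussed earlier) are assumptions, not the manufacturer's actual setup.

```python
# Hedged sketch of threshold-based alerting across a large model fleet.
from dataclasses import dataclass

@dataclass
class ModelHealth:
    model_id: str
    error_rate: float              # e.g. fraction of failed scoring calls
    drift_score: float             # e.g. PSI between training and live inputs
    seconds_since_last_call: float

THRESHOLDS = {"error_rate": 0.05, "drift_score": 0.2, "idle_seconds": 7 * 24 * 3600}

def check(model: ModelHealth) -> list[str]:
    alerts = []
    if model.error_rate > THRESHOLDS["error_rate"]:
        alerts.append(f"{model.model_id}: error rate {model.error_rate:.2%}")
    if model.drift_score > THRESHOLDS["drift_score"]:
        alerts.append(f"{model.model_id}: input drift {model.drift_score:.2f}")
    if model.seconds_since_last_call > THRESHOLDS["idle_seconds"]:
        # The "zombie model" case from earlier in the episode: nothing is
        # calling it, so flag it as a retirement candidate.
        alerts.append(f"{model.model_id}: idle, consider retiring")
    return alerts

fleet = [ModelHealth("machine-0042", 0.01, 0.35, 120.0),
         ModelHealth("machine-7781", 0.00, 0.02, 30 * 24 * 3600)]
for m in fleet:
    for a in check(m):
        print("ALERT:", a)   # in practice this would page someone or open a ticket
```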
Demetrios [00:25:20]: But at least it's semi-standardized. Like, you don't need custom metrics for each of the 20,000, or do you?
Sophia Rowland [00:25:28]: Not for each of the 20,000. It's pretty similar metrics for similar machines; it's just that they want to be very specific for each one, so it's this model for this one machine. For some of these manufacturers, they are working with machinery that is decades old, and they are seeking out ways to be more efficient with the machinery they have available, because buying a new machine might be tens of millions of dollars. How much does a model cost to monitor that machine, to look at predictive maintenance for it, to understand when it's time to maybe switch some of the adjustments on that machine, so that you get value out of it without having to actually replace that multimillion-dollar machine that's running their manufacturing plant?
Demetrios [00:26:15]: 100%. Yeah, and actually, I was just talking to Richard on here a few weeks ago about doing ML in heavy industry, and he was saying how if you make the wrong prediction and say that a machine needs servicing and take it off the line, but really it doesn't, then you're losing out on that too. And so the cost of error is very high.
Sophia Rowland [00:26:41]: Exactly. And I've been working pretty closely with our model risk team here at SAS, and they are experts at knowing how much a bad model costs, because it's not just monetary value. Sometimes a bad model causes reputational risk, or perhaps it even causes harm to individuals. And so how do you quantify what that bad model costs so that you know whether it's worthwhile? Can I weigh that in my cost analysis for the value of this analytical project? Am I even considering that? Because a lot of organizations don't think they need to worry about model risk or plan for when things go wrong.
Demetrios [00:27:24]: Unfortunately, AI models causing reputational harm, that is a perfect segue into our next topic that I wanted to get into, which is GenAI and how people are utilizing it. What I feel is that they are very quick to put something into production. And I would love to hear what you've been seeing out there. David, I want to bring you into this too. Where has it been successful? Where has it not been? As people are trying to bring GenAI into the fold, what does it look like?
David Weik [00:28:00]: To me, still the coolest thing is how many new people it has enabled to play with AI in general. Because now that we have multimodal models as well, they can easily use one for a traditional computer vision use case, and they use omni for that. They are not even aware that they could maybe train a traditional computer vision model for that, but they just found this tool and they can now validate their own use cases and ideas without having to find someone on their data science team that they can get excited to work with them, and they don't really need a budget. So that's where I've seen a lot of new brainstorming and ideas come into the data science teams: from business users. They just get an email: so I tried this, and this was my prompt, and here's the result.
David Weik [00:28:57]: I really liked it. Can we put it into production? Right? And then the discussion really starts to become more interesting. But just seeing people get enabled by a technology like that is so cool to me. And it's still amazing what kind of crazy use cases business users can come up with, because we just don't know what their challenges are. Sometimes it's something that everybody knows about, but oftentimes it's the smaller issues. In Germany, for example, everything is standardized. We get all these forms from the government that we have to fill out and oftentimes print out, and then somebody has to retype them. And we would think, yeah, let's train a computer vision model for that. They don't even think about it in the business units.
David Weik [00:29:50]: They're just like, let's put the image in and see what it gives me, right? Then they come back and say, look how great this is. So a lot of the work, I also find, is helping them understand which things we should rely on GenAI for, and take on the additional investments that we have to make as an organization to introduce it into our architectural stack in general, versus where we could maybe shift that use case to more traditional machine learning applications, just because they didn't know about them or thought it was a crazy hard task to do. So that's my first point on GenAI. And then I would say the thing I see most is really people trying to utilize large language models in either this typical document Q&A style thing, which they mostly use internally, or trying to drive up dark processing. So how do we automate our processes more with the help of LLMs? How can we maybe give it data and then let it decide on the best next action, or ask follow-up questions, or decide where to route, for example, emails? You might have an info@ address at your company, and you just get anything in there. And traditional machine learning or natural language processing might have failed there, because, I mean, especially in Europe, but also in the US, you get tens of languages in there.
David Weik [00:31:24]: So you first have to understand that, then you have to classify them, and then route them to the right place. And LLMs are just great at that. And because in this routing example you also always have kind of a human in the loop, the person that gets the email can then decide: yup, that was correct, or nope, and I know where to send it along, and we can track that and understand how it's behaving. So those are the two main areas, I would say, where I see companies trying to invest in this. But I also have to say that I see a lot of nuanced thought around it. Of course, everybody starts out using the Anthropics, the OpenAIs, the Geminis of this world, because it's an API you can easily integrate and try out. But then the follow-up questions come around.
David Weik [00:32:17]: Costs, but also: wait, what if OpenAI changes the model version? I just use the default endpoint, and they change the model version; do I now have to retest everything? What happens? Or back in February, when they changed something in their GPU configuration and you only got gibberish back from ChatGPT. These are the types of risks: if you really want to start relying on it, can you? So oftentimes they start to figure out which large language models they want to use as an organization and introduce kind of a common repository of large language models. And then from there, in the actual use cases, they start to discover which language model fits best and have a cost-to-performance metric attached to it. Usually the big models are the best, but maybe if I invest more into prompt engineering or invest more into RAG, I can get down to a smaller-scale model and thus save cost or save on round-trip time. I think round-trip time is something that a lot of people don't talk about.
David Weik [00:33:30]: If you have an application that you want to feel interactive, the response has to come back to you in less than 800 milliseconds. The average call to the big players out there is around a second; that might feel sluggish to people. And then you introduce RAG, a third-party API call; you introduce more and more stuff to it and it feels sluggish. So even there, hosting your own open source models can become a value add. Even though it's a quote-unquote worse model if you look at all the benchmarks, it might be better for that specific use case. So that's what I see a lot of discussions around: how do you integrate it? It's not like there's one clear use case winner. To me, it's really about the question: okay, we have a lot of ideas, but how do we get them into production and keep them in production?
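A hedged sketch of the cost-versus-round-trip-time comparison David raises: time the same prompts against each candidate model, compare the median against an interactivity budget of roughly 800 ms, and attach a rough per-call cost. The call_model stub stands in for whichever hosted or self-hosted endpoint is being tested, and the prices and model names are invented for illustration.

```python
# Hedged sketch: latency and cost benchmarking of candidate LLM endpoints.
import statistics
import time

LATENCY_BUDGET_MS = 800
PRICE_PER_1K_TOKENS = {"frontier-model": 0.010, "small-self-hosted": 0.001}  # invented prices

def call_model(name: str, prompt: str) -> str:
    """Placeholder: send the prompt to the model and return its answer."""
    time.sleep(0.2 if name == "small-self-hosted" else 0.9)  # pretend inference time
    return "stub answer"

def benchmark(name: str, prompts: list[str], avg_tokens: int = 600) -> dict:
    latencies = []
    for p in prompts:
        start = time.perf_counter()
        call_model(name, p)
        latencies.append((time.perf_counter() - start) * 1000)
    p50 = statistics.median(latencies)
    return {
        "model": name,
        "p50_ms": round(p50),
        "p95_ms": round(sorted(latencies)[int(0.95 * (len(latencies) - 1))]),
        "within_budget": p50 <= LATENCY_BUDGET_MS,
        "approx_cost_per_call": avg_tokens / 1000 * PRICE_PER_1K_TOKENS[name],
    }

prompts = ["Summarize this claim and suggest the next action."] * 10
for name in PRICE_PER_1K_TOKENS:
    print(benchmark(name, prompts))   # compare quality separately, per use case
```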
Demetrios [00:34:31]: Great points. So many tangents that I want to go on off of what you were saying, because of this idea that the easiest and most common way to start is just by hitting an API. And even before that, like you were saying, you have the business side of the house using ChatGPT and just prompting, and so there's an easy win. The problem I think I've also heard people talk about is not only the cost, but when they really start looking into it, they go back to that governance piece and say, wait a minute, our team is doing what and giving what data to whom? I know that's come up quite a few times. And if you're in a big enterprise, it's a much bigger problem than if you're in a small, fast-moving startup that doesn't necessarily care that much about their data, or it's not like they have PII or any sensitive data. But if you're in these enterprises and you're using it to help you with spreadsheets or to ask questions on things, and it's not been sanctioned by the company, and it's not its own enterprise instance, then you can get into some hot water.
David Weik [00:35:52]: Even for startups, I would say, if you think you don't have privacy issues around the data, then the question almost becomes: do you have valuable data? Is it just something anybody can scoop up and use to replicate your use case? So I would always be careful on that front. But also, having good privacy policies, even as a startup, can be a differentiator for you. The big enterprises have to do it for reputation's sake and because they are very mature, and they would get slandered much more easily in the press. But even as a startup, I don't think you should be too lax with that. If we look at the AI Incident Database, that's a public database, there are also more and more startups showing up in there because of that more lax kind of look and feel with data. And I don't want my data to be just out there, right?
David Weik [00:36:56]: It might make me hesitant to try the next startup's product. So thinking about that is always critical.
Sophia Rowland [00:37:04]: Yeah. A few months ago I was talking to a healthcare system that was diving into ChatGPT and large language models because they wanted to better support their doctors by reading through all that documentation, all of the patient records. But they had so many security concerns. They couldn't simply use an externally hosted large language model because of those privacy concerns. They had to look and see: what can we host internally? What can we firewall around? What can we do to make sure that the private, personally identifiable information for our patients does not leave our local system? And even with some organizations, I know where my husband works, he's not allowed to start asking questions about the code. He can't use Stack Overflow to address questions, because they're that concerned about their code getting out. And so for folks like that, how do you use a copilot? How can you ensure that the copilot keeps your code private? Because people are protecting their code, too, because that is their IP that they want to safeguard.
Demetrios [00:38:10]: How do you think governance in GenAI compares to governance with, like, traditional ML?
Sophia Rowland [00:38:19]: I think it's still emerging. I think folks are still getting their heads wrapped around what they actually need to do for generative AI purposes. Trustworthy AI is still a growing field, especially here in the US; we don't have a lot of trustworthy AI regulations on the books. We're still working through what the standards and best practices are. Our National Institute of Standards and Technology fairly recently released their AI Risk Management Framework. That is our first go at saying: here are some standards around how you can build trustworthy AI systems. A lot of it does apply to generative AI as well, but there are unique considerations.
Sophia Rowland [00:39:08]: I know there's a lot with hallucinations. Generative AI can sometimes be confidently wrong. And I think this is just an emerging societal issue, where if someone has a platform and they say something with confidence, someone out there is going to believe them. And we just aren't really at a place where we are saying, hey, that's wrong: you need to not blindly trust everything that a machine learning model tells you, and you also can't blindly trust everything that you read on the Internet. So there's definitely a need for literacy, or analytical literacy, and some understanding around that. There are some unique considerations, and there are some general considerations in the trustworthy AI space that still apply, but it's definitely still emerging as people move along.
David Weik [00:40:02]: On the technical side, you have simple questions like: now we want to introduce a vector database, right? Who's going to do the SLA for that? How are you going to do the backup? What if we have two departments sharing the same one? How do we ensure the data doesn't cross-pollinate? Or even if you go away from embeddings to something where you want to enhance the RAG from a data catalog or an SQL query: how do we make sure that the calling application can only retrieve the data that it's supposed to, and not just everything within the organization, and doesn't leak stuff? These are questions we have been dealing with. In the MLOps space, we know how to document training data and how to move from one step to the next, but in the GenAI space it's much more Wild West right now. And then there's the monitoring side: how do you monitor just text? Do you use an LLM that then scores it? Or do you try to break it down like most chat interfaces, with thumbs up, thumbs down, so you break it down into a classical ML binary target variable? Or do you have actions that a user can take based on what the model recommends, and you track that? How do you take this hard-to-grasp text and put it back into something you can understand and track over time, to see that the model is actually performing the way you want it to perform, even outside of all the ethical concerns?
David Weik [00:41:44]: Just from a pure technical aspect, how do you monitor random text, basically?
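One concrete version of the thumbs-up/thumbs-down option David lists: reduce free-text quality to a binary signal and track its rate over a rolling window, alerting when it dips. The window size and threshold below are illustrative assumptions.

```python
# Hedged sketch: turning chat feedback into a classical, trackable metric.
from collections import deque

class FeedbackMonitor:
    def __init__(self, window: int = 500, min_positive_rate: float = 0.8):
        self.events = deque(maxlen=window)   # rolling window of recent feedback
        self.min_positive_rate = min_positive_rate

    def record(self, thumbs_up: bool) -> None:
        self.events.append(1 if thumbs_up else 0)

    def positive_rate(self) -> float:
        return sum(self.events) / len(self.events) if self.events else 1.0

    def needs_attention(self) -> bool:
        # Only alert once enough feedback has accumulated to be meaningful.
        return len(self.events) >= 100 and self.positive_rate() < self.min_positive_rate

monitor = FeedbackMonitor()
for vote in [True] * 90 + [False] * 30:
    monitor.record(vote)
print(monitor.positive_rate(), monitor.needs_attention())
```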
Demetrios [00:41:50]: 100%. I think the evaluation piece is so unsolved right now, and what you get is a lot of companies popping up because there's this gigantic gap. Nobody really has it figured out. And so you get companies coming up and saying, all right, we do evaluation. We've done a few surveys in the community about evaluation, and what I've seen is that sometimes you'll get people who say, yeah, evaluation is important to me, but I'm not doing it; I have more important things on the priority list that I need to do before I do that. And then other times, you just get the thumbs up, thumbs down, which is okay, but that's not really that good of a signal. And so it feels very flawed in so many different ways.
Demetrios [00:42:51]: And then also, you're probably doing batch evaluation, so you're not really doing it in real time, so you can't tell if somebody just got a horrible answer. You can see that maybe the day after, or a few hours after, whenever you're doing your batch, maybe a week after, who knows? But I was recently in San Francisco, funnily enough, talking with one of these people who created an evaluation company. Her name was Brooke, and she was saying that for her, it's very important to understand what the metrics are that you're trying to evaluate on. She comes from the autonomous vehicles world, and she was saying that you want to look at these evaluation sets as corpuses of data, as opposed to a one-to-one type thing saying, oh, was this answer good or was this answer bad? You want to try to see: okay, what is our target answer, or what are we trying to go for? Can we make that as pointed as possible? And then can we look at all of the answers and see whether there were anomalies or statistical outliers? As opposed to what I think a lot of people are doing right now, which is saying, okay, here's the output, here's what I was going for, I guess that looks all right. And then you bring up another one: here's the output. And just going off of vibes.
Demetrios [00:44:23]: Right.
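A hedged sketch of the corpus-level idea Brooke describes: score every answer against its target, then look at the whole distribution and flag statistical outliers rather than eyeballing answers one by one. The embed() stub stands in for whatever embedding model is actually used, and the z-score cutoff is an arbitrary illustration.

```python
# Hedged sketch: evaluate answers against targets as a corpus, not one by one.
import math
import statistics

def embed(text: str) -> list[float]:
    """Placeholder embedding; swap in a real embedding model here."""
    return [float(ord(c)) for c in text[:32].ljust(32)]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def evaluate(pairs: list[tuple[str, str]]) -> list[dict]:
    scores = [cosine(embed(answer), embed(target)) for answer, target in pairs]
    mean, stdev = statistics.mean(scores), statistics.pstdev(scores) or 1.0
    # Flag answers whose similarity to the target is far from the corpus norm.
    return [{"score": s, "outlier": abs(s - mean) / stdev > 2.0} for s in scores]

eval_set = [("The claim was approved on May 3rd.", "Claim approved May 3."),
            ("I like turtles.", "Claim approved May 3.")]
for row in evaluate(eval_set):
    print(row)   # review flagged rows first, instead of going off vibes
```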
David Weik [00:44:24]: The vibes one is even worse when people go off these standard benchmarks that are out there, right? They just pick the winner in three benchmarks, and it's so random. Yeah, it's a wild space, and it's so use case dependent. As you mentioned, you really have to drill it down into your specific use case. And then it also becomes a question: are you also evaluating your embedding model, or are you just evaluating your LLM?
Demetrios [00:44:55]: Yeah, exactly. Are you evaluating the information that's retrieved from the vector database? You have to evaluate things at every step along the way.
David Weik [00:45:05]: Yeah. And you always have to keep those three chatbot questions in mind, I would say. They are traditional, right? Even when we were doing the more old-school, rule-based chatbots, it was always: is my chatbot talking about something that it has no idea about? Because marketing started a new campaign, now users ask the chatbot, and it has no clue about that new campaign, but it will answer anyway. Or is it something where the user asks a question the chatbot should have the answer to, but it didn't deliver it? Or is it the third and worst category, where we think we delivered the correct answer, but the user didn't accept that answer and didn't take it, or didn't do something with it? And how do you keep track of those three? The first one is rather easy, I would say, but the other ones become more and more complex as you have a mature system.
Sophia Rowland [00:46:07]: And then you start to wonder, as you get deeper into that need for truth, at what point does the number of guardrails you put around the large language model's output just start to become its own rules-based approach? I've seen examples where individuals say: our chatbot, our large language model output, is 100% true, and we know it because we've compared it against this and this, and we know that it is correct, so there are no hallucinations. And at that point, you've almost just rebuilt a rules-based approach to confirm that my chatbot will never be wrong.
David Weik [00:46:49]: Yeah, that is.
Demetrios [00:46:50]: That is a great point, where you just have to hard-code in all of these different pieces or all these guardrails, and you end up realizing that maybe it was easier if we just changed the interface so it wasn't a chatbot and it was just point and click, and then you wouldn't even have to deal with all of the unwieldy actions of a large language model. That's one of the things that, I don't know about you all, but an honest question: are you bored of chatbots yet? Because I 100% am.
Sophia Rowland [00:47:26]: It's definitely everywhere. I think everyone's scrambling to find a way to make business value out of generative AI, and they're throwing every use case that they have at it. And sometimes a traditional machine learning or rules-based approach would have given them much more lift. But I've been chatting with so many users who are scrambling to say, hey, my executives want a generative AI system, this is what they need, and I'm trying to figure out how to implement it based on this particular use case. And you're looking at it like, that's a little bit of overkill. I'm so glad to see that so many people are interested in AI, but I want to tell people that it's not just generative AI.
Sophia Rowland [00:48:09]: That's not the only AI there is. I've seen on TikTok where someone's claiming that they're older than AI. It's these 20-something-year-olds saying, I'm going to tell my kids I'm older than AI, and everyone's just like, you're not.
Demetrios [00:48:27]: Yeah, they need to hear about the AI winter.
David Weik [00:48:31]: The worst offender I have seen is where they basically built out the chatbot to ask for variable inputs that could have been just a slider or a button, right, that you could change around. But they were like, yeah, you can now talk to it. I'm like, no, just give me a slider. The value is from one to ten. Just give me that slider range. I don't care.
Demetrios [00:48:55]: 100%. That's what I feel like. You're making it worse sometimes. Yeah. Just because we can use AI doesn't mean we should.
David Weik [00:49:05]: Yep.
Demetrios [00:49:05]: And a lot of times, and I guess that's what happened at the beginning, people got really excited about the technology and the potential, and so they thought, wow, this is gonna... and also you have a lot of onlookers saying this is going to change the world, this is gonna change how we do business itself, this is gonna change the world of work. All of that type of narrative is not helping one bit. And so the byproduct is that you have developers putting GenAI into whatever they can. And I think the epitome of that was somebody commenting in the community how Meta now has Meta AI, right, in Instagram and WhatsApp. And somebody was saying in the community Slack that this was just some PM having to hit their KPIs. Let's all be honest about it, because nobody's going to Meta and being like, all right, GenAI search type thing. I don't know who's using that.
Demetrios [00:50:10]: If you are, please tell me. And I would love to hear about your use case, but I definitely am not.
David Weik [00:50:16]: Yeah, I mean, I turned off the AI features in Google and Bing search. Just give me a straight-up search, please. Thank you.
Demetrios [00:50:27]: Yeah, I often wonder, am I the weird one? Because I did the same, and I'm like, am I missing out on something here? When I want to do the AI stuff, I'll go to Perplexity, but when I want ten links, or I just want links in general, then I go to Google, because half the time I need to navigate to a website and do something on that website. For example, today, as you know, David, being a German resident, Telekom is one of the mobile phone providers, and I had to go to telekom.de today. I don't need AI to tell me what is on that website. If in the future we can have agents that will ideally go and deal with all the crap that I had to deal with with Telekom, that will be splendid. But for now, I don't think we're there. And so that's kind of the segue into my next question: have you seen people trying to implement agents, and have you seen any success there?
Sophia Rowland [00:51:34]: I've seen some interesting use cases with agents. Some of them are very well guardrailed, to the point that it does seem very rules-based. I know that is something that some folks are exploring for how to leverage inside software that's for advanced users, so that it's a little bit easier for folks to figure out. So I don't have to know how to use this advanced software; I just type in what I want to do, it's done it for me, and it's given me some suggestions for what I should do next. So I've seen folks trying to use it to make things even more user-friendly, since we all know how to type in questions; Google's really good at teaching us that piece. But it seems to still be ongoing, and it seems to be for very specific tasks.
Sophia Rowland [00:52:27]: So it's almost like you have to think ahead of time to understand what an individual might want to do and what we want to make easier for that individual. So I have to think as a newbie, as someone who's never seen this software before, how do I make this an easier workflow for them so that I can think about it ahead of time, almost like a new way or a new user interface into using some software.
David Weik [00:52:50]: I was in talks with a customer where, again trying to automate more of their processes around incoming mail, they reverted back to a simple agent that just had a choice between two or three actions, did those actions, and then was done. They had a much more sophisticated approach with multiple agents that handed things off, but it just broke too many times. They were like, we can't babysit this thing. So when you get into multi-step agents or multi-step reasoning things, at least in the things I have seen in actual production, it just broke down once some complexity was introduced. I had one demo built internally that worked great for me because I understood how all the agents were laid out and how they could interact, and when I was chatting with the system, it worked flawlessly every time. I handed it over to a colleague, and he came back to me five minutes later and was like, this thing is garbage. It doesn't work at all.
David Weik [00:54:00]: He had no understanding of how to talk to that thing. And then I realized, maybe I was wrong.
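A hedged sketch of the "simple agent" pattern David says survived in production: the LLM chooses among a small, fixed set of actions, each action is plain code, and anything unrecognized falls back to a human. The call_llm stub and action names are illustrative, not the customer's system.

```python
# Hedged sketch: a constrained agent that only picks from a fixed action list.
def call_llm(prompt: str) -> str:
    """Placeholder for whichever LLM endpoint you use; returns its raw text."""
    return "forward_to_claims"

ACTIONS = {
    "forward_to_claims": lambda mail: f"forwarded {mail['id']} to claims",
    "request_missing_info": lambda mail: f"asked sender of {mail['id']} for details",
    "escalate_to_human": lambda mail: f"queued {mail['id']} for manual review",
}

def handle_mail(mail: dict) -> str:
    prompt = (
        "Choose exactly one action for this email: "
        + ", ".join(ACTIONS) + ".\nEmail:\n" + mail["body"]
    )
    choice = call_llm(prompt).strip()
    # No multi-step handoffs, no open-ended tool use: anything unrecognized
    # goes to a person instead of another agent.
    action = ACTIONS.get(choice, ACTIONS["escalate_to_human"])
    return action(mail)

print(handle_mail({"id": "42", "body": "My claim form is attached, please process it."}))
```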
Demetrios [00:54:09]: Yeah, just go hand it over to Reddit to stress test it. They'll come back with a lot. Oh, great. That's what I think about: of course, in this walled garden it works beautifully with a user n of one, right? And then you give it to the next person.
Demetrios [00:54:31]: And since it is open-ended, I think that's where a lot of times we fall flat, because it is so open-ended. And that's why I like the fact that there are a lot of people trying to change the interface, so it's not a chatbot; it's more point and click, and then potentially you have those very clear verticals or task lists where the agent knows what to do. And like you were saying, Sophia, about making a very complex program a bit more user-friendly: it feels like it's just a slightly more advanced help button.
Sophia Rowland [00:55:17]: It's almost like we're bringing back Clippy.
Demetrios [00:55:20]: Yeah, good old Clippy. We totally should. That is so funny. But I do see the need for that. Especially for me, when I was trying to learn how to use Photoshop years ago, I would have an idea of what I wanted to do, but Photoshop is so powerful that you can spend hours just clicking on different options or different effects that you can add to the photo. And I would end up having to type into Google or YouTube, usually YouTube, and get a tutorial on how to do XYZ. But the hard part wasn't really figuring out how to do it.
Demetrios [00:56:05]: The hard part was trying to put into words what I wanted to be done. And because that was the hard part, this new interface that we have, or this new way of dealing with it, if it's through natural language and a chatbot, doesn't solve that hard problem of: I have this idea in my head, I don't know what it's called, I don't know the special language for it, I just want this to happen. And because I can't reach into the screen and shape it how I want, it's still going to be very difficult with even the most advanced LLMs out there.
David Weik [00:56:45]: I often use it kind of like a brainstorming thing, trying to get it to give me the thing I want to then go look up, right? What's that specific industry term for X?
Demetrios [00:56:56]: And I'm like, just give me the term, please. Oh, totally, 100%. Now that we've discussed a few of these challenges, I would love to know: when you've seen people excelling on their AI or ML journey, are there common themes that come up again and again? Have there been some things that have cropped up where you've said, wow, I notice that this tends to happen in very mature teams or very high-performing ML/AI teams?
Sophia Rowland [00:57:39]: I think what I've seen is excellent lines of communication. They're working together. They aren't tossing things over the fence saying this isn't my problem. They are really working as a team to bring in these different individuals to share their knowledge. Hate to use the corporate lingo, but breaking down those silos so that we do have a cohesive team. They are experimenting. They are doing proof of concepts. They are looking to see what has potential value.
Sophia Rowland [00:58:13]: How can we start small, bite off something that we can actually chew, become really good at that, and then start moving into seeing how we can make this something that has business value? What are the next steps? They have really nice processes that are flexible but offer some level of standardization. So they have a series of steps that they feel comfortable following, but they're happy to adjust those steps over time to meet specific needs. They're not building all of that out at once, but they are trying to grow over time. They're not automating everything all at once; they're looking to see what phases they need to go through and how they grow over time, instead of just biting off everything at once. And they are looking at their successes, celebrating them, but continuing to look ahead and see what's down the line.
Sophia Rowland [00:59:08]: What else can we try? What other experiment can we move forward with? And sometimes it doesn't work out. We don't have to stay attached to that sunk cost. We can take those models out of production and retire them if they're not valuable.
Demetrios [00:59:25]: Nice callback to the beginning of the episode. I like it.
David Weik [00:59:29]: I think Sophia already hit on the major points for me. One thing that I always see is when the data scientists work closely with the data engineers, and they think not just about the cool model that they're going to build, but about how they can actually run it in production, and they have those points where, if they have to retrain the model, it's not like they have to start from scratch again; they have a process for that in place. If they are scoring the model in batch, they have a point of integration with the data engineer. So it's not something completely novel, but there is a clear point where this handoff happens and where the integration then actually works in production. If those two teams sit on the same floor or share a common team structure, then I always see them be successful, not just across one or two use cases, but really scaling up to more of them without getting bogged down because they have to reinvent the wheel every time. It's more of a platform approach that they take to these problems, and they have clear points where those teams interact and then deliver.
Demetrios [01:00:48]: Well put. Well, thank you, David and Sophia, for coming on here. This has been awesome. I love chatting with you, just to hear all of your diverse experiences, and also to hear that it feels like we have been talking to the same people almost. These problems that I've been hearing a lot of people having are not unique. You all see them out there, and we get to triangulate them on this call.
Demetrios [01:01:20]: So it's been a pleasure.
Sophia Rowland [01:01:23]: Thank you for having us.