MLOps Community

AI in Healthcare

Posted Jul 19, 2024 | Views 2.1K
# Healthcare
# Continual.ai
# Zeteo Health
SPEAKERS
Eric Landry
CTO/CAIO @ Zeteo Health

Eric Landry is a technology veteran with 25+ years of experience in the healthcare, travel, and computer industries, specializing in machine learning engineering and AI-based solutions. He holds a Master's in software engineering (with an NLP thesis topic) from the University of Texas at Austin (2005). He has showcased his expertise and leadership in the field with three US patents, published articles on machine learning engineering, and speaking engagements at Applied Intelligence Live 2023, the 2020 KDD conference, and Data Science Salon 2024, and he formerly led Expedia's MLE guild. Previously, Eric was the director of AI Engineering and Conversation Platform at Babylon Health, and before that he worked at Expedia. He is currently CTO/CAIO at Zeteo Health.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building Lego houses with his daughter.

SUMMARY

Eric Landry discusses the integration of AI in healthcare, highlighting use cases like patient engagement through chatbots and managing medical data. He addresses benchmarking and limiting hallucinations in LLMs, emphasizing privacy concerns and data localization. Landry maintains a hands-on approach to developing AI solutions and navigating the complexities of healthcare innovation. Despite necessary constraints, he underscores the potential for AI to proactively engage patients and improve health outcomes.

TRANSCRIPT

Eric Landry [00:00:00]: Okay. My name is Eric Landry, CTO/CAIO at Zeteo Health. And I love my coffee Americano. My espresso machine is one of my prized possessions in my house.

Demetrios [00:00:16]: Welcome back, MLOps community. We are here with another podcast. I am your host, as usual, Demetrios. And we talked all about the healthcare space, bringing AI into the healthcare space. High, high value uses of a RAG chatbot, not this HR bullshit that you get. High value use cases and the nuances of it, the ethics of it, the conversational design, and how it can be so difficult. I really appreciate it. He's been doing AI for so long in healthcare, and I love the fact that he still is hands-on building.

Demetrios [00:01:00]: You can tell when we talk to him, he's going in there, he's hacking around. It probably would have been easy for him to sit back and become a manager, become an executive by now. But no, he's having fun with it. So let's get into this conversation, and as always, if you liked it, make sure to tell one friend so we can keep this gravy train rolling. So I think we should start with what we were just talking about, and that is you coming into this field in 2005. Explain to me, paint the picture of what things were like trying to do ML and AI in 2005.

Eric Landry [00:01:53]: Yeah. So I entered the field through my school work at the University of Texas. And back then, my thesis was about document clustering of scientific articles about genetics. My experimentation was basically testing different algorithms. A lot of it is processing the data. NLP is usually pretty compute intensive, so finding the compute and the resources and the memory was always constrained. I was at that time working at Sun Microsystems. You know, at home, I had a little MacBook, and I would start the algorithm running overnight, and then half the time wake up in the morning with a stack overflow or some other crash on my computer.

Eric Landry [00:03:01]: So eventually, I wound up taking a lot of my work, a lot of the schoolwork, with me to work at Sun Microsystems. We had a lab full of the latest, greatest servers. You know, what would take overnight at home would take a couple hours over lunchtime. It still took a couple of hours, right? Yeah, but, you know, all the algorithms, as I had mentioned before, I wrote in Java. So there was a lot of attention paid to understanding, reading a research paper and then writing the proof in Java. And sometimes I would find that the proofs were actually incorrect in the papers. Not a lot of times, but sometimes. And so, fast forward, being able to, like, iterate through different algorithms quite easily now is kind of mind-blowing to me. But, yeah, the constraints around the compute, the memory, just data processing, data transformation and cleaning, et cetera, were all a lot more challenging back then.

Demetrios [00:04:18]: Yeah, people don't realize how good they have it with PyTorch these days.

Eric Landry [00:04:23]: Yeah, I got real good at Java, figuring out what the optimal data structures were and how to optimize data compression and things like this. For some of the algorithms, you need all the data in memory. So that was a lot of times a challenge.

Demetrios [00:04:43]: So, yeah, it feels like you, throughout your career, have been attracted to very hairy problems, because now you're in the healthcare space working in AI, and there's a lot of constraints, and there's a lot of difficulties in healthcare, whether it is with the data itself or it is with just how you can use AI, the different use cases. And so I feel like we're going to touch on a lot of these different pieces, but it would be nice to know about what you were doing at your last job, Babylon Health, because I know that you were the director of AI engineering and the conversation platform. That was like the prequel to what you're doing now at the current job. And what did that look like?

Eric Landry [00:05:35]: Those are two different roles. The AI platform: my team was building a cloud agnostic platform, because at that time, we were attempting to be cloud agnostic. So we built the infrastructure around being able to deploy our models regardless of whether it was Azure or AWS or whatnot. So all the challenges around how to optimize the different types of deployments. We had streaming, inferencing on a stream, a Kafka stream. We had inference as a service, and then batch-type inference. So we were supporting all those.

Eric Landry [00:06:21]: We were also supporting, you know, cloud based training.

Demetrios [00:06:25]: And these were NLP use cases.

Eric Landry [00:06:27]: Still, it was everything. It was NLP. It was, you know, structured data. We had a lot of medical data we were training models on. So that was that piece. And then also the conversational AI, the conversation platform.

Eric Landry [00:06:50]: We built chatbots. They were intent-based chatbots. Back in the day... does anybody remember intent-based chatbots? So we were using open source frameworks to build the chatbots. We were building a conversation platform which included live chat as well as, you know, automated responses with the chatbot. And the user could request to be handed off to a live agent if they weren't satisfied with the response from the chatbot. So we built a whole system around all those features. And, unfortunately, we released our chatbot probably three or four months before Babylon closed shop in the US. But our early indications were good. Our whole focus was to deflect users, patients, from the live agents with an automated agent.

Eric Landry [00:07:59]: And we were able to deflect up to 40% of the users. So that's the real use case and the real value of those chatbots.

Demetrios [00:08:10]: Yeah. And again, this is something that, it's like history rhymes, right? Now everybody understands that, because I think everybody's tried to throw a support chatbot with an LLM at their product.

Eric Landry [00:08:25]: Yeah, it's interesting to kind of see the progress so far and kind of trying to, I guess, get through the hype and really understand what the real value is. I'm kind of slowly evolving. My thinking has started to evolve to where you can't just release an LLM based chatbot without appropriate controls. And so the reality is almost something kind of in between, maybe. I don't know. I'm not quite there yet. But I'm experimenting with some technologies that would enable kind of a, I guess a hybrid type chat assistant.

Demetrios [00:09:13]: And when you say hybrid, you mean like rules-based or intent-based plus LLM?

Eric Landry [00:09:18]: Yes.

Demetrios [00:09:19]: That makes sense. Yeah. You can't let it be anything, or else you wind up on the front page of the Internet.

Eric Landry [00:09:25]: Yeah, because. Yeah, exactly. Yeah.

Demetrios [00:09:28]: Because that chatbot said something it shouldn't have, or it's just basically a wrapper for GPT-4.

Eric Landry [00:09:37]: Yeah. I mean, and understanding healthcare... you know, okay, so somebody's chatbot sold a truck for a dollar. In healthcare, the stakes are a lot higher, right? So we kind of have to get it right.

Demetrios [00:09:50]: Yeah. And that goes back to the hairy problems that I was talking about. Like, there's real ethical issues, there's human lives potentially at risk when you're dealing with your use cases, as opposed to. I do understand that selling a truck for a dollar is a bummer, but it's not like there's human lives at risk.

Eric Landry [00:10:10]: Right. Yeah, exactly. That's the punchline I used to tell my teams when I was in travel at Expedia: engineers tend to engineer, and a lot of times over-engineer, things. And so my punchline a lot of times was, well, you know, if this thing breaks, planes aren't going to fall out of the sky. You know, I had to rethink that in healthcare; the stakes are a little bit higher than ruining somebody's vacation.

Demetrios [00:10:44]: Exactly. At Expedia, it's like, well, somebody's vacation, maybe you're not even ruining it, maybe you're just giving them an opportunity to do something else more interesting. But, yeah, with healthcare, it's a different story. And going back to some of these difficult issues with data, I mean, we all probably are familiar with PII and how big of a pain in the butt that is. So that's, like, first off. But I remember when I was talking to a friend that I mentioned before who was working on data at Babylon, Jeremy, and he was saying, when I talked to him back in 2020, he mentioned how one of the engineering issues that he was looking at in that moment was how to keep all of the data that is created in Canada, in Canada. He didn't want anybody else to be able to touch that because that's illegal.

Demetrios [00:11:42]: But he still wanted all of the ML engineers to be able to train their models and have the most robust datasets that they possibly could have without, like, jeopardizing that Canada dataset. And so those are, yeah, data access issues that are not easy to figure out.

Eric Landry [00:12:05]: Yeah, that's a real challenge. And, you know, the issue is that you cannot transfer data between regions, so you can't transfer data from the UK to the US to train a model or run inference on a model. So you had to come up with strategies around how to keep the data in place while you developed your solution. Sometimes we had different models, and a lot of times it makes sense, because the data is different; it's a different shape, and different rules can be applied to it. We also had experimented with federated learning, where the data stays in place but you still train the model. So you're not transferring the data between regions, but you have agents, I guess you would call them, that live in the different regions, and then what's transferred are the weights of the model, to come up with the optimal trained model.
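
For readers who want to see the shape of that idea, here is a minimal federated averaging sketch in plain NumPy: each region trains locally on data that never leaves it, and only the weights travel back for aggregation. This is an illustrative toy under those assumptions, not Babylon's implementation; the model, data, and hyperparameters are all placeholders.

```python
# Minimal federated averaging sketch: raw data never leaves a region;
# only model weights are shared and averaged centrally.
import numpy as np

def local_train(weights, X, y, lr=0.1, epochs=5):
    """One region's update: plain logistic-regression gradient steps."""
    w = weights.copy()
    for _ in range(epochs):
        preds = 1 / (1 + np.exp(-X @ w))       # sigmoid
        grad = X.T @ (preds - y) / len(y)      # gradient of the log loss
        w -= lr * grad
    return w

def federated_round(global_w, regional_data):
    """Average regional weight updates, weighted by local sample counts."""
    updates, counts = [], []
    for X, y in regional_data:                 # the data stays "in place"
        updates.append(local_train(global_w, X, y))
        counts.append(len(y))
    return np.average(updates, axis=0, weights=np.array(counts, dtype=float))

# Toy data standing in for three regional datasets.
rng = np.random.default_rng(0)
regions = [(rng.normal(size=(100, 3)), rng.integers(0, 2, 100)) for _ in range(3)]

w = np.zeros(3)
for _ in range(10):                            # several communication rounds
    w = federated_round(w, regions)
print(w)
```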

Demetrios [00:13:13]: And did that go forward? I think you mentioned that wasn't the most successful POC.

Eric Landry [00:13:20]: It didn't go forward. I think in these enterprises, different things take priority, and that actually never surfaced as a priority. But it was a great experiment, and we were quite pleased with the work we'd done, which is pretty cool, because it's not just between regions; you can also train a model on your phone. So if you've got a hundred users, thousands of users, their data never leaves their device, but we can train a model based on all that data, which is much.

Demetrios [00:14:00]: More appealing for things like healthcare or if it's like things that I'm saying with my therapist, I prefer that data to be my data and not have to send it out to get models trained and whatnot.

Eric Landry [00:14:15]: Yeah. Our PoC was on skin cancer images, images that were taken... we tested it on a mobile device, training to detect if there's skin cancer or nothing. So you can see, you can imagine the uses for this kind of technology.

Demetrios [00:14:33]: Yeah, 100%. And you did bring up something there on how it didn't really become a priority. In your experience leading AI and ML teams, what are the biggest priorities, and how do you champion them? How do you know which ones should be going to the top of the line, and how are you able to say with confidence: yes, here are the metrics that we're looking at; we know that if we're doing this, we can have success?

Eric Landry [00:15:07]: Yeah. You know, a lot of these priorities come from the top down, right? So you've got product people basically doing the market research and understanding where the best use of the resources is. I always treated it as we were equal partners in it. You know, I've been in the industry for a long time; I'm not a product manager, but I built a lot of products, right? So together, if you have the right relationship with a product manager, you can decide what the priorities are. And the challenge is how to balance that: you want your team to feel enabled to experiment and explore creative solutions.

Eric Landry [00:16:03]: So how do you balance what the enterprise wants you to do versus what you think you can do and what you can build? So it's a tricky balance, and of course you've got to play some politics to get through it all. And at the end of the day, hopefully it all works out. I always felt like my team should have agency to decide what they build, within boundaries. But it can work in reverse too: the engineering team should be able to propose new products and features.

Demetrios [00:16:39]: What are you working on now?

Eric Landry [00:16:41]: Currently working in a startup called Zeteo Health. Our focus is on bringing healthcare to underserved communities. The why behind that is that we want to give these communities information about healthcare to keep them engaged in their healthcare. There are studies out that show people who are engaged in their health tend to live healthier lives, right? Yeah. And the reason why it's important in these communities is because these communities, like Black and Hispanic communities, at least in the US, don't trust the healthcare system. I think something like 50% of Black and Hispanic people don't trust the healthcare system. And this kind of lack of trust in the system leads to them not being engaged in their healthcare. There's a study I quote often; it's posted on the National Institutes of Health site, and I think the year was 2018.

Eric Landry [00:17:53]: There were 100,000 preventable deaths, at a cost of $100 billion to the healthcare system, because people were not engaged in their health; they basically weren't taking their medications. So to me, this feels like pretty low hanging fruit. Everybody looks for shiny objects: how do we solve cancer and whatnot? That's super important, but keeping people engaged in their health seems like a solvable problem. So that's what our focus is on. And the technology we're building is basically what I'm calling a conversational AI framework. Kind of.

Eric Landry [00:18:40]: The lead technology, of course, is a RAG-based chatbot, but it's built within a system. So I'm thinking about it holistically: what can these conversations tell us, not just about individuals, but about certain populations and segments within the populations? And from this, we can kind of begin to make inferences about how we can keep these people healthy, right? Not just static analytics, which we will do on these conversations, but in real time: what are the trending topics, what are the trending terms, what can we learn from that, and how can we action it? So, just a simple example, in Austin, Texas. We can build a window... I used to do this at Expedia.

Eric Landry [00:19:45]: We were using Elasticsearch to make real-time inferences about certain things. And the way you do that is put one window against another in time. So the last week, compared to the last month: what are the trending topics or trending terms that stand out? So you can imagine, in healthcare in Austin, Texas, this week coughs and fevers are trending, and it happens to be flu season. Maybe we should send a notification to the people in Austin that they should go get their flu shot, right? That's a very simple example, but you can imagine the kind of features and things that we can create out of this.
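
As a rough illustration of that window-against-window idea, here is a hedged sketch using Elasticsearch's significant_terms aggregation, with the last week of conversations as the foreground and the last month as the background filter. The endpoint, index, and field names are hypothetical, not Zeteo's or Expedia's actual setup.

```python
# Sketch of "one window against another": foreground = last 7 days,
# background = last 30 days; significant_terms surfaces what is spiking.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

resp = es.search(                            # elasticsearch-py 8.x style kwargs
    index="conversations",                   # hypothetical index of chat turns
    size=0,
    query={"range": {"timestamp": {"gte": "now-7d"}}},          # foreground window
    aggs={
        "trending": {
            "significant_terms": {
                "field": "topic.keyword",                       # hypothetical field
                "background_filter": {
                    "range": {"timestamp": {"gte": "now-30d"}}  # background window
                },
            }
        }
    },
)

for bucket in resp["aggregations"]["trending"]["buckets"]:
    print(bucket["key"], bucket["score"])    # e.g. "cough", "fever" spiking this week
```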

Demetrios [00:20:36]: So, if I understand correctly, engaging in your health is primarily just going to the doctor, taking the medication that's prescribed to you, and that kind of stuff, which some of us might think, like, yeah, of course that's what you do. But you're saying that it's not as common as you would think.

Eric Landry [00:21:03]: Yeah, I mean, you know, some people have, for example, diabetes, and some people need a daily reminder to, you know, test their blood glucose. Some people need a reminder... even me. Like, I generally stick to my annual checkups, but it slips. I usually do it in the summer, but guess what? Last year, I did it in December. Who knows what could happen in those few months, you know? Yeah, but it's this kind of engagement, not just in, like, daily thinking about your health, but also annual checkups. And for, you know, prostate cancer, there's the notion of people who have been diagnosed with prostate cancer.

Eric Landry [00:21:59]: There's a notion that some of them can be monitored. You know, there's no treatment. They're just monitoring their health, and they need to stick to the plan. Right. To keep them healthy.

Demetrios [00:22:12]: Okay. And I have been known to talk a little bit of shit on RAG chatbots, because I think they, for the most part, aren't fully thought through, and the use cases are not that high value. In your use case, if it is keeping people engaged and getting them to take their pills or go to the doctor, and it's saving lives, like, the stakes are very high there. I think that is a great use of a chatbot.

Eric Landry [00:22:49]: So you just scratched an itch right there. Like, yeah, I'm with you, man. There's so much hype around this. There is value, that's undoubtable, but there's so much hype. And I feel like there's a whole lot of attention paid to different types of retrieval, LangChain and all these technologies, and I feel like we haven't really figured out how to get the most out of what we have. Let's step back and understand what it is we have and how we can apply it to a real use case. I'm with you. Like, people think it's magic, man.

Eric Landry [00:23:40]: It's not. And we discussed earlier about guardrails. There have to be guardrails, especially in something as consequential as your health, right? You know, we're primarily seeking engagement with our population through notifications and information. A lot of my experimentation has been around how to limit hallucinations, how to treat people, and how to engage them in a conversation where they are curious about their health.

Eric Landry [00:24:18]: How do we lead them in a conversation about their health? We have healthcare providers and professionals on our staff, and I talk to them, and it's really interesting to me how doctors think: when somebody walks in the door, how do they size them up, and how do they understand how best to help them and support them in their health? And so a lot of the experimentation I've been doing, aside from how to limit hallucinations, is how to engage them in a conversation, especially these underserved populations that don't trust the healthcare system. How do we bring trust to them? So first of all, our knowledge base is a curated, trusted knowledge base. Any information we give them is from that knowledge base, and we limit it to that knowledge. Basically, I'm not using the LLM for its knowledge. I don't want its knowledge; I may as well go and search Google, right? I want the knowledge for a solution to come from the knowledge base of our curated information, the high quality data.

Eric Landry [00:25:29]: High quality data, right, that our healthcare professionals have advised us on. And how do we give them that information in a way that's meaningful to them, so their eyes don't gloss over when we give them a whole bunch of technical information? The doctors on our staff that I've talked to basically say: give them the information they need a bit at a time. Somebody asks, how do I get tested for prostate cancer? Or how do I get tested for diabetes? You don't give them three paragraphs. You give them the basic information.

Eric Landry [00:26:12]: There are three types of tests you can do for prostate cancer. Allow them to then ask more detailed questions about the PSA test, and then lead them through it. Prompt them to the next question to ask. That's more meaningful to the patient than just giving them three paragraphs, you know?
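
A minimal sketch of the two ideas above: constrain the model to answer only from the curated context, keep the answer bite-sized, and prompt the next question. Everything here, the function name, prompt wording, and sample passages, is hypothetical, not Zeteo's actual prompt.

```python
# Constrained-RAG prompt sketch: answer only from curated, clinician-reviewed
# context, keep it short, and suggest one follow-up question.
def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    context = "\n\n".join(retrieved_chunks)   # from the curated knowledge base
    return f"""You are a health-engagement assistant.
Answer ONLY from the context below. If the answer is not in the context,
say you don't know and offer to connect the user with a professional.
Keep the answer to 2-3 sentences, then suggest ONE natural follow-up question.

Context:
{context}

Question: {question}
Answer:"""

# Example: the retriever (not shown) returned two curated passages.
print(build_prompt(
    "How do I get tested for prostate cancer?",
    ["There are three common tests, including the PSA blood test ...",
     "Screening guidance differs by risk group; discuss timing with a clinician."],
))
```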

Demetrios [00:26:36]: Yeah. Try and make it bite sized and easily digestible.

Eric Landry [00:26:41]: Yeah. And that's, to me, a lot of the challenge. You know, a lot of the information I see is like: how do you respond to a question? Well, cool. Like, we have a really good way to optimize question and answer, but how do you optimize the conversation? So think of it as a conversation, not just a question and answer, right?

Demetrios [00:27:05]: Yeah. And that's where the theories on how much to give someone come in. And you don't wanna just give them this blast of information, which is hard with LLMs because they're constantly being super verbose.

Eric Landry [00:27:20]: So, yeah, so that's a challenge. That's actually also something I've done a lot of experimentation on: tone of voice, like how you speak to them.

Demetrios [00:27:30]: I was going to ask because of the different colloquialisms in the underserved communities or underrepresented communities. Are you trying to put on different slang or accents or is that like a step too far? Is it seen as cheesy? Have you experimented with it at all?

Eric Landry [00:27:56]: Yeah, not in that sense. What I've experimented with is how to detect basically who that person is, right? Not necessarily... we intend to have a profile of the user, so we'll know their race, we'll know their age, but it's how you can make inferences about not just level of education, but how they want to be conversed with, right? And we don't.

Eric Landry [00:28:30]: Right now, our tone of voice is basically a standard tone of voice represented by our branding, right, the Zeteo branding. But we intend to do experimentation about how we converse with people. Like, do you want to talk with yourself? Like, do you want to talk to somebody that has the same tone of voice as you? Is that how you best respond and how you are best engaged in a conversation, or somebody opposite? That's kind of the experimentation that we want to do, right?

Demetrios [00:29:07]: Yeah, I wouldn't trust it. If it was talking like me, it.

Eric Landry [00:29:12]: Would be like, wait a minute. Most people would probably be freaked out by talking to themselves, you know?

Demetrios [00:29:18]: Yeah, completely. But that's fascinating to think about. Like, does it give you more credibility depending on the tone? And if so, do you want to be more formal? Because that's going to give you more credibility. Or do you want to be more informal and speak like people do from the different accents or communities that they're coming from?

Eric Landry [00:29:44]: Yeah, exactly. And how do they best respond? You know, maybe somebody without an accent, let's say a hispanic accent, responds better to somebody with an american accent or something else, who knows? Right. That's kind of the experimentation that we need to do. Right.

Demetrios [00:30:04]: And how are you evaluating this?

Eric Landry [00:30:07]: Yeah, so I've built a framework for evaluation based on Ragas, and I've done an integration with MLflow to track the experimentation. So, baseline, there are certain things that we're looking for: obviously a lot of the faithfulness, for detecting hallucinations, and bias. What I found is that a lot of these kind of basic bias detection mechanisms don't catch all bias. I found a dataset on Hugging Face; it was basically a labeled set for bias against different cultures. I tested some of those utterances against an LLM and then the bias detection, and it didn't catch some fairly heinous things that were spoken, you know, so...

Eric Landry [00:31:08]: So I wound up writing some of my own bias detections for those cultures, right?
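
As a hedged illustration of hand-rolled checks like these: run labeled utterances through the chatbot and flag responses that trip custom patterns the off-the-shelf detectors missed. The patterns and the chatbot callable below are placeholders, not the actual rules Eric wrote.

```python
# Toy custom bias screen, tested against a labeled dataset of utterances.
import re

BIAS_PATTERNS = [                              # placeholder patterns
    re.compile(r"\bthose people\b", re.I),
    re.compile(r"\b(lazy|criminal)\b.*\bcommunit(y|ies)\b", re.I),
]

def flags_bias(text: str) -> bool:
    """True if any culture-specific pattern matches the response."""
    return any(p.search(text) for p in BIAS_PATTERNS)

def run_bias_suite(chatbot, labeled_utterances):
    """labeled_utterances: (prompt, expect_flag) pairs from a labeled set.
    Returns the cases where the screen disagreed with the label."""
    failures = []
    for prompt, expect_flag in labeled_utterances:
        response = chatbot(prompt)             # hypothetical chatbot callable
        if flags_bias(response) != expect_flag:
            failures.append((prompt, response))
    return failures
```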

Demetrios [00:31:15]: And that's just hard coded in there as guardrails.

Eric Landry [00:31:18]: Yeah. It's not yet, but it will be. What I do, my process, is that I'm testing against a ground truth set of data I have, and before I deploy, I do a test to make sure it's good against that set. I will also be monitoring in production so we can send alerts if somebody starts screaming crazy shit, like, you know...

Demetrios [00:31:49]: Funny you say that, because I was at a conference last week, our AI quality conference, and somebody showed me a screenshot of a support bot interaction that their company was having, and the person literally was writing in all capital letters, give me a fucking human, you dumb fuck.

Eric Landry [00:32:13]: Computer.

Demetrios [00:32:14]: And then the LLM answered, like, something that was not getting them the human, and that just infuriated the customer. Right. It's like the last thing you want to do is not get them the human and they're just going apeshit on it. And so, yeah, that's what you want to try and avoid. And I like the fact that you set up monitoring so in case that does happen and something slips through the cracks, you know, right away, and you can. Presumably you can step in and say, hey, sorry about that. Let's take a few steps back.

Eric Landry [00:32:54]: Yeah, yeah. I'll also add, I don't know if I've mentioned it yet, but I view it as at least two types of bias. There's this kind of subjective type of bias, which is kind of what we're talking about. I mean, some of it's obviously subjective, the cultural sensitivity kind of things, right? But there's also bias that is clearly objective. For example, the medical community advises the general population to start getting tested for prostate cancer at the age of 50. I keep saying prostate cancer because our MVP is going to be basically focused on prostate cancer, and we'll build out from there for different types of disease. We don't want to tell a Black male to start getting tested at 50, because for them the guidance is the age of 40 to 45, because they have a much higher incidence.

Eric Landry [00:34:00]: The Black and Hispanic communities have a much higher incidence of prostate cancer. So there are these kinds of objective measurements that we have to take, also data quality type measurements.
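
This kind of objective check lends itself to a hard-coded rule rather than an LLM judgment. A minimal sketch, with age thresholds taken from the guidance Eric quotes and the group names as hypothetical placeholders:

```python
# Objective-bias check: screening guidance must vary by risk group,
# so assert it directly instead of asking a model to judge it.
HIGHER_RISK_GROUPS = {"black", "hispanic"}      # per the guidance quoted above

def prostate_screening_start_age(risk_group: str) -> int:
    """40 (to 45) for higher-risk groups, 50 for the general population."""
    return 40 if risk_group.lower() in HIGHER_RISK_GROUPS else 50

def check_response(response: str, risk_group: str) -> bool:
    """Eval helper: fail if the chatbot's answer omits the age that
    applies to this user's risk group."""
    return str(prostate_screening_start_age(risk_group)) in response
```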

Demetrios [00:34:14]: It seems like you decided to play the game on hard mode. Not only is the healthcare space very complicated and very hard, but now you're working in a space that is catering to these underserved communities. And there's a lot of difficult areas in that too, that you have to navigate through.

Eric Landry [00:34:38]: Yeah, yeah. So a little bit about that. For me, when Babylon shut down last summer, I was kind of in a burnout. And by the way, this is probably about the third one I've been through like this, right? If you're in technology, guess what? Prepare yourself. So the CEO, Kevin, had contacted me, and there had been numerous contacts looking to fill a role.

Eric Landry [00:35:20]: Basically, this one checked several boxes for me. Healthcare is where I'd kind of decided I want to play. You know, the unfairness in our healthcare system, certainly in the US, speaks to me. It kind of scratched several itches, and that's why I decided to go with this. And aside from that, okay, it's a pretty interesting technical challenge as well. And that was kind of the trigger. When I started thinking about it, I was like, man, there are datasets out there that would be kind of cool to evaluate and apply against this bias problem.

Eric Landry [00:35:58]: And that's the one that kind of put me over. I'm like, okay, yeah, this is a real cool problem to solve, too.

Demetrios [00:36:05]: Well, what's fascinating also is that it's not like you have the technical challenge of going out there and solving cancer. It's like the technical challenge, you know, that if you can just get people to take their pills or scan their heart rate monitor or whatever it may be, these kinds of low lifts for people, then it can lead to a much longer life and a much better outcome. And so the technical challenge is more on the implementation of what is already in the world, right? It's not like this is cutting-edge bioscience, or you have to go discover a new drug, any of that. So that's also another piece that I find fascinating. And, yeah, now you get to navigate more nuances of what kind of tone of voice are we using to get people to do things. And I imagine... are you bringing in psychologists too, to try and help with this stuff?

Eric Landry [00:37:12]: Or.

Demetrios [00:37:12]: Or good copywriters, marketers that can sell ice to an Eskimo?

Eric Landry [00:37:17]: Yeah. Yeah. I should mention we do have... I don't really know what her title is, but she's a writer, and she's advising on tone of voice. Basically, I gave her a whole set of questions and answers from our chatbot, and she made a lot of corrections on it. So now I've got to go back and see if I can fix it. That's the challenge, trying to figure it out.

Eric Landry [00:37:43]: And by the way, I've said this before, but, um, I think prompt engineering is an oxymoron, because to me, there are not a whole lot of engineering principles, at least that I can find, to apply to it.

Demetrios [00:37:59]: You know, that's a great point. It's throwing spaghetti at a wall, really.

Eric Landry [00:38:03]: Maybe I'm doing it wrong. I don't know. But, like.

Demetrios [00:38:07]: Yeah, so it's.

Eric Landry [00:38:08]: Yeah, I mean... see, you know, the interesting thing is, it does seem like, technically, it should be an easy solve, but changing people's behavior is really, really hard.

Demetrios [00:38:21]: Yeah.

Eric Landry [00:38:22]: You know, there you go. And it brings to my mind one of the other things we're looking into: all these wearable devices, using those to monitor people remotely and then sending them a notification and entering them into the chatbot to kind of inform them about the metrics that we see from their monitoring device. This is actually something that my team had done at Babylon. We were looking at remote monitoring for blood pressure, and we had a stream of data, and if we detected some metric out of bounds for blood pressure, we would send a notification on the app, and then that would put them into a chat with our chatbot and inform them about, you know, what it means, and give them information, and ask if they would like to speak to a live agent to schedule an appointment.

Eric Landry [00:39:26]: So we're actually looking at something similar here at Zeteo Health as well, using a remote monitoring device to create a new conversation, so that there's, again, engagement in their healthcare, right?
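
A minimal sketch of that monitoring-to-conversation flow, assuming a stream of blood pressure readings. The bounds are illustrative, not clinical guidance, and send_notification/start_chat are hypothetical stand-ins for the app's notification and chat services.

```python
# Watch a stream of readings; when one is out of bounds, push a notification
# that drops the user into a chatbot conversation about it.
SYSTOLIC_RANGE = (90, 140)    # illustrative bounds only
DIASTOLIC_RANGE = (60, 90)

def out_of_bounds(reading: dict) -> bool:
    sys_lo, sys_hi = SYSTOLIC_RANGE
    dia_lo, dia_hi = DIASTOLIC_RANGE
    return not (sys_lo <= reading["systolic"] <= sys_hi
                and dia_lo <= reading["diastolic"] <= dia_hi)

def monitor(readings, send_notification, start_chat):
    for reading in readings:   # e.g. consumed from a Kafka stream
        if out_of_bounds(reading):
            send_notification(reading["user_id"],
                              "One of your readings looks unusual. Tap to learn more.")
            start_chat(reading["user_id"], context=reading)  # opens the chatbot
```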

Demetrios [00:39:44]: Yeah, I wear a Whoop, and I find it a little bit annoying when I get the notifications from Whoop that will tell me how my sleep went. And I also have a, like, 13-month-old daughter, and so my sleep isn't the best, actually. My daughter wakes up probably, like, three, four times a night still. She's got really bad sleep, and my wife takes the brunt of it. She's the champion here. But I still am waking up every once in a while. And so Whoop will send me these notifications.

Demetrios [00:40:20]: Like, your sleep could have been better last night, and it's like, these passive aggressive things, like, you might want to take a nap today. Yeah, it's like, man, I already feel like shit. You're not helping the scenario here. Like, what's your deal?

Eric Landry [00:40:33]: Whoop, you're giving me...

Demetrios [00:40:34]: So many, like, trying to just break me down. And so I feel like I am mentally stronger because of all of the abuse that I've had to take from these Whoop notifications over the last year.

Eric Landry [00:40:48]: Yeah. Hey. Yeah. A good friend of mine, his mother has diabetes, and he was monitoring her, and he kept getting woken up in the middle of the night because her glucose level was dropping below some threshold. And so he had to kind of figure out how to set her diet up, but he was losing sleep because his mom wasn't paying attention to her health.

Demetrios [00:41:09]: Oh, man. So these are the things. Now, I just wanted to touch on this evaluation again, because I think one thing that I've thought about, and I don't know if this is possible, I imagine you've looked at it, is when it comes to guardrails. I know you mentioned you're using Ragas right now. There are also other options out there, like one called Guardrails AI. And I know there's NeMo Guardrails from Nvidia.

Demetrios [00:41:40]: There's a ton, right? Have you not thought about, hey, let's just throw everything at it just in case, so it can catch every single edge case? Maybe Ragas is stronger in this area, and then with Guardrails we can set up the specific guardrails we want, or, I think they have, like, templates for different verticals, and then NeMo Guardrails has this strength. Or is that just too redundant and overkill?

Eric Landry [00:42:09]: Well, I think so. I'll just kind of describe my thinking and my thought process. So I'm using Ragas in the development phase. So, for example, I create a baseline of how this thing is behaving using Ragas. I'm logging those in MLflow, so I have a history of those metrics, right? And so I have a baseline. And just to be specific, I started out using OpenAI ChatGPT 3.5.

Eric Landry [00:42:54]: I kind of did some testing anecdotally with different models, but this one just seemed easy, and it was the best for me at that time. Then I started doing this study on hallucinations. So now I have a baseline, right? And I started with the LLM. Okay, so I ran the same tests against literally every LLM I could find. Well, I should say that's not quite right. That's a bit of hyperbole.

Eric Landry [00:43:30]: Every LLM in Amazon Bedrock, right?

Demetrios [00:43:34]: Yeah.

Eric Landry [00:43:36]: So now I have a baseline on a dozen different models, and the metrics that I care about with regards to hallucinations are the ones I'm paying attention to, specifically faithfulness, right? And so that helped me decide which LLM to use. And it turns out, I think Anthropic's Sonnet is the one, based on my test. And I'll say this: don't trust me, test it with your own data, everybody out there. This is not an easy pass. You still have to do the hard work. Use it on your data, because it may behave differently depending on your data. That created a baseline based on that model.

Eric Landry [00:44:25]: Then the next thing I did was test against different embedding models for the retrieval, and different parameters on the retrieval, the k values, et cetera. What I've been able to do is create a baseline and incrementally move it based on the metrics that I'm paying attention to. That's how I'm using Ragas. I'm also using it to monitor, and I'm not going to say production... we're not in production yet. We haven't delivered the MVP yet. We intend to do a pilot in October with Mount Sinai Medical Center. So we'll start monitoring then. We're probably not going to have much throughput to really tell us a whole lot, but at least it will start to give us data.
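
A hedged sketch of that baseline loop: score a fixed ground-truth set with Ragas (faithfulness for hallucinations) and log the run to MLflow, so each change of LLM, embedding model, or retrieval k can be compared against the baseline. Column names and the result handling vary across Ragas versions, and the example rows and run names are placeholders.

```python
# Baseline loop: Ragas metrics on a fixed eval set, tracked in MLflow.
import mlflow
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

# One ground-truth row, shaped the way Ragas expects (names vary by version):
# the chatbot's answer, the retrieved chunks, and the curated reference answer.
eval_set = Dataset.from_dict({
    "question": ["How do I get tested for prostate cancer?"],
    "answer": ["There are three common tests: ..."],             # chatbot output
    "contexts": [["PSA blood test, digital rectal exam, ..."]],  # retrieved chunks
    "ground_truth": ["The three standard tests are ..."],        # curated answer
})

with mlflow.start_run(run_name="sonnet-k4-baseline"):            # hypothetical name
    mlflow.log_params({"llm": "anthropic-sonnet", "retrieval_k": 4})
    # evaluate() calls an LLM judge under the hood, so credentials are required.
    scores = evaluate(eval_set, metrics=[faithfulness, answer_relevancy])
    for name, value in dict(scores).items():                     # Result is dict-like
        mlflow.log_metric(name, value)
```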

Eric Landry [00:45:23]: So I'm using Ragas to measure against a baseline and to monitor in place. And then, to take it to the next part of that discussion about the guardrails: I'm experimenting with NeMo Guardrails. This is the Nvidia open source project. I'm looking at that for several things, and I'll tell you one of the reasons why I like it. So the guardrail, basically, I'm using it to test for prompt injection and security violations, to make sure the chatbot doesn't say something stupid. And so, you know, the guardrail will capture that, and I can either change the response or make some statement like, sorry, I can't respond to that, whatever, to be decided. But what I also like about it is something we had discussed earlier, I don't remember if it was on or offline, about the need for something between intent-based chatbots and LLM-based, generative chatbots.

Eric Landry [00:46:41]: Because in some cases, I need to know what the response will be. Now, mind you, the intent matching is going to be predictive, but at least I know what the response will be, because I'm writing it out myself, or whoever our product people or clinical people are, they're writing out how to respond to a specific intent. So that's how I intend to use guardrails: not just to protect, as guardrails, but also for an intent-based type of response.
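
A minimal sketch of that hybrid pattern with NeMo Guardrails: a Colang flow pins a hand-written response to a recognized intent, while anything unmatched falls through to normal LLM generation. The Colang wording, model config, and response text here are hypothetical simplifications; check the nemoguardrails docs for your installed version.

```python
# NeMo Guardrails: fixed, vetted answer for one intent; LLM for the rest.
from nemoguardrails import LLMRails, RailsConfig

colang = """
define user ask screening age
  "When should I get tested for prostate cancer?"
  "At what age should prostate screening start?"

define bot give screening age answer
  "General guidance is to discuss screening around 50, or 40 to 45 if you are at higher risk. Your clinician can advise on timing."

define flow
  user ask screening age
  bot give screening age answer
"""

yaml = """
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo
"""

config = RailsConfig.from_content(colang_content=colang, yaml_content=yaml)
rails = LLMRails(config)

# The matched intent returns the hand-written answer verbatim.
reply = rails.generate(messages=[
    {"role": "user", "content": "When should I get tested for prostate cancer?"}
])
print(reply["content"])
```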

Demetrios [00:47:16]: Yeah, it does feel like anytime anyone wants to know information about other people and not themselves, that's ripe for guardrails. But then it gets tricky again, because it's like, well, what are other people my age having problems with? Those kinds of questions can come up. But if it's like, what is Eric having problems with? Then the chatbot should be like, oh, well, Eric's fine, and you can ask him yourself; I'm not telling you anything about Eric. But if it's like, oh yeah, people 40 to 45 generally are looking after these things, or these are the number one causes of health troubles in your area, or of people in your demographic with your weight, height, that type of thing, and your background, whatever it may be. So, yeah, it's a fascinating problem to think about. And I'm sure once you put it into production, you're going to see all kinds of ways that people are trying to mess with it.

Eric Landry [00:48:25]: Dude, I've already warned everybody, man. Like, I've done all my testing. We've got our ground truth that is, like, medically certified. But I guarantee you, having done this for a long, long time: once we put it out there with real people, if you think it can go wrong, it certainly will. I've learned that the hard way. Now, mind you, we're not going to have the throughput like we had at Expedia; I don't know, some of the models we were exposing took up to 2,000 requests per second. We're not going to have that. If we have 2,000 requests in a week...

Eric Landry [00:49:03]: Cool.

Demetrios [00:49:04]: You know, so you can kind of babysit it, especially in the beginning.

Eric Landry [00:49:08]: Yeah.

Demetrios [00:49:09]: And especially if you're working with one hospital or healthcare center, then it's really easy to be making sure that, like, all right, cool. This hasn't done what we didn't want it to do.

Eric Landry [00:49:23]: Yeah, yeah. We'll be monitoring. And in fact, when we start the pilot, what we intend to do is start with a subset. Start with 50 users. See how it behaves, see how they behave, see how they react to it. They may hate it. We might have to start all over again. Who knows? That's an argument for getting exposure as early as possible.

Eric Landry [00:49:50]: They may love it. I'm expecting a lot of it open.

Demetrios [00:49:55]: Fingers crossed.

Eric Landry [00:49:56]: Yeah. But we'll learn from that incrementally, you know, have several cycles where we'll learn. We'll incrementally increase the traffic exposed, and we'll learn more with each round of incrementally adding new users to it, right?

Demetrios [00:50:11]: Yeah. Well, I think it is awesome to hear. And you're adding so much value with a RAG chatbot, when I usually will talk about how it adds no value to companies, because you get the HR experiences where it's like: great, now you spent all that time, all that effort, on that RAG chatbot, and your employees know if they have twelve or 15 days of holidays, which you could have probably solved with search. But this proactive experience of trying to get people to be more engaged with their healthcare resonates with me. I love the vision, and I thank you for coming on here, Eric.

Eric Landry [00:50:52]: Appreciate it.

