Building Agents for Healthcare // Lars Maaløe // Agents in Production 2025
SPEAKER

Lars Maaløe is co-founder and CTO of Corti. Maaløe holds an MS and a PhD in Machine Learning from the Technical University of Denmark. He was awarded PhD of the Year by the Department of Applied Mathematics and Computer Science and has published at top machine learning venues such as ICML and NeurIPS. His primary research domain is semi-supervised and unsupervised machine learning. In the past, Maaløe has worked with companies such as Issuu and Apple.
SUMMARY
Healthcare is one of the most vital and far-reaching sectors in our society, touching every individual at some point in their lives. Yet, it faces mounting challenges: rising administrative burdens, increasingly complex disease patterns, and growing patient volumes strain already stretched systems. In this talk, Lars will explore the untapped potential of AI agents to address some of healthcare’s most pressing real-world problems. He will present Corti’s unique approach to developing domain-specific agents equipped with healthcare-relevant skills—engineered not only for impact, but within a framework that places governance and safety at its core. Join us to learn how AI can be responsibly and powerfully deployed to support the future of care.
TRANSCRIPT
Lars Maaløe [00:00:08]: Thank you for inviting me and thank you for wanting to listen to me here. I am not coming so much from the ops community. My background is machine learning research and building big, beefy models, and I've been researching machine learning since before generative AI was a thing. At least, I wrote my research in generative AI and had to explain to a lot of people what it was all about; then ChatGPT came out, and that explained it much better. So I'm just a big fan of machine learning and a big fan of what these models can bring. And that made me one of the co-founders of a company that wanted to bring machine learning models to healthcare.
Lars Maaløe [00:00:47]: And that is because I have an engineer in me, in terms of building cool technology that actually works in production. And secondly, I also want to see if we can contribute to something good out there in the world, and I believe that healthcare is definitely a place that could use, and benefit from, some insightful and intelligent technologies. So first, a little bit on healthcare. I don't want to bore you too much with this, but I think these are important statistics, just to make sure we know why healthcare is such an interesting field to approach with agentic technologies. First and foremost, there are a lot of patients for only very few clinicians. And the number of patients keeps growing, because we have an elderly population that is getting older and older, and this elderly population is therefore also in need of more healthcare treatments. We see a healthcare services market that is strained.
Lars Maaløe [00:01:44]: They are under massive burnouts. They're under massive burnouts because of their workload, the many patients and also the treatments and so on that are becoming more and more complex by the year and more and more complex. Treatments is of course good for us as patients, but it also takes more and more time and it takes more and more training and so forth to actually make sure that you can onboard the newest, newest knowledge still in the healthcare system today. Even though that people many years ago started to use computers, et cetera, they are facing a massive burnout due to. Due to the administrative tasks that they're facing. They're. There are numerous studies on this. We picked this number from one of the studies.
Lars Maaløe [00:02:27]: But the studies are reporting everything from 25 up to 50% of the time spent by a clinician is spent on documenting and the administrative tasks. And these are tasks that they're not trained up on. They're trained upon facing the patient and having that direct conversation with the patient and helping the patient Furthermore, there is a massive increase in terms of administrators versus the physicians in the healthcare system and that is due to this, this increasing administrative burden that is on the system and it seems like no digital system so far has really helped on limiting that. It has of course, a big effect on the conditions in the market and where a large percentage are contemplating on leaving healthcare for a more peaceful job within the private industry or similar to the adjacent industries to the healthcare provider market where they can maybe also make the same amount of money and have a normal work distribution. We are missing healthcare professionals going forward and it doesn't seem like we are able to attract all of the healthcare professionals that we need for the future. So we as a company, we believe in providing foundation models and infrastructure to build the safe healthcare AI. And the big promise for this safe healthcare AI is first and foremost to focus on patient safety and secondly to ensure that the administrative burden is lowered. A lot of the administrative tasks should be able to be automated away by smart AI systems and then finally that the documentation integrity is as high as possible.
Lars Maaløe [00:04:18]: We have been in business for quite a while now; we were founded back in 2016. We started out as a research project, and we are now at a state where we are covering 92 million patient interactions each year, and that is ever increasing, especially in Europe and the US, where we're getting more and more coverage. We are getting coverage in everything from the provider space, with hospitals and inpatient and outpatient care, to individual clinics, to emergency medical services (when someone is calling in to an emergency number), all the way to the payer space: the insurers and so forth. So what is it that we provide? At the core, we provide a big foundation model. That foundation model has been trained on audio data, textual data, and the classification tasks at hand. It's a compilation of a lot of different functionality, and we train it in segments. This foundation model is structured from everything from text-based large language models, to audio-based models, all the way to classifiers on top, and so forth. It is quite an intricate orchestration of different model parameters that are communicating in various shapes and forms.
Lars Maaløe [00:05:42]: We are training a lot of the model parameters on top of also open source parameters. So if a new large language model is coming out there in the open source environment, then we're looking at those parameters, we can distill, we can do so on, so on in terms of actually making sure that we are building on top of the best parameters in the market within our healthcare specific foundation model. Here we are latching this foundation model onto a big pile of memory. This is first and foremost contextual patient information. So there can be historical patient engagements, interactions and so forth to structured information about diagnosis, about lab results, et cetera. And then we are also latching on to guideline data, general guideline data about best practices, triaging guides and stuff like that. That interaction between a really, really beefy set of machine learning models in the foundation model layer, together with that memory is making for the core recipe of the rest of our system. And the rest of our system are a lot of workflows and tools that are built in terms of supporting with a lot of different use cases within healthcare.
Lars Maaløe [00:06:55]: Here are some examples of all of the use cases that we can support with. And that's everything from building triage agents. So for instance, for a nurse that sits there and asks you some questions and want to make sure that he gets the right answers towards those questions in terms of escalating a patient or not towards dictation use cases for a physician that is sitting there and wanting to dictate the patient node to ambient scribing. Maybe some of you have heard about these ambient documentation cases where you open a microphone in the room and when you have that live conversation with your doctor, then a system is listening in and transcribing and structuring the documentation. At the end of the day, to save that administrative burden all the way to research agents that are going and understanding what is being said and so forth, and can research in a large database of everything from research articles to historical guidelines and so on. So a lot of use cases are embedded in this and of course all of this packaged as a part of our API where we have streaming interfaces and async interfaces for whatever application you want to build. So for the ones that want to build text based agents or voice based agents, they can actually tap into our API. At the core of what we do and at the core of our innovation in terms of these, these agents is that healthcare is difficult.
Lars Maaløe [00:08:24]: And for ensuring that we get the right information out of a conversation, you need to have, you need to be able to extract the right clinical information from that interaction. And there we have invented what is called facts R, which is a reasoning agent that is going into any type of healthcare dialogue material. Patient journal, like patient records can be a hundred pages, hundreds of pages long. And Faxara can go in and find out the clinical findings and extract those clinical findings for downstream tasks. That gives the ability to for instance.
Speaker B [00:09:05]: Most healthcare AI apps process transcripts after the visit, leaving clinicians with bloated notes and hours of edits. That's because they are built from general purpose models that don't understand the complex needs of healthcare. Facts R is different. It's the latest part of our API infrastructure for healthcare builders. Designed to power smarter, more trusted healthcare AI apps. It helps apps extract, vet and refine clinical facts in real time, surfacing vitals, medications, history and more. As the conversation unfolds, clinicians stay in the loop, guiding the AI, not cleaning up after it. The result? Clear EHR ready summaries with 49% less missing content and 86% less noise.
Speaker B [00:09:50]: Building with facts R makes healthcare AI apps faster, safer and more clinically useful. Start building with Facts R today. Head to Corti AI.
Lars Maaløe [00:10:01]: So basically FAXR is the ability to extract these clinical findings from any source of information and then you can start building any kind of agent on top of that. And to give you some examples of how our agentic workflow is happening and working, I'm just going to make it very, very a simple run down here. So basically instead of, instead of just having a user that prompts in a limb or a user that prompts in a limb similar to this interface has its its limitations. First foremost the LM will never say I don't know which is kind of a problem inside healthcare because you want an LM2 or you want the response to the user to be to. To be as accurate as possible. And hence you don't want the big wall of text from GPT4 that seems very convincing, but it can be one large hallucination at the end of the day. Instead we have built what we call an orchestrator here, which is a supervisor that can go in and understand the prompt and from understanding that prompt it can first foremost wrap back governance layer and making sure that everything that it says that is coming in and out of this orchestrator is vetted against the governance layer. All of that is audited.
Lars Maaløe [00:11:18]: So hence we can live up to the regulatory demands of the healthcare system and make sure that nothing is coming out of the orchestrator that should not come out thereby. A user can ask for instance for find references in the transcripts to diabetes and the transcript here could be from a dialogue with a patient. An orchestrator can then go and say I don't have any information about the transcript. I should retrieve the transcript. I found two mentions of diabetes and it can do that by latching on to some of our experts. In this instance it would latch on to our interaction expert, which is another LEM that is coupled together with a database that would go into the historical interactions and being able to extract the the findings hints or the findings given the prompt. So the orchestrator is thereby able to reason on the user query. It's also able to go back and ask the user for more information in order to actually find the relevant piece of information.
Lars Maaløe [00:12:26]: When the user then can or the user can then add more information or more specific prompts to this and what our orchestrator can then do, it can latch several prompts onto it and it has an agent to agent protocol interface with these experts and it can now tap into another expert. In this case the user is asking about predicting the diagnosis code. So the orchestrator would know that this is about clinical coding and it can then go into the revenue cycle management expert that is another set of has actually several different LLMs that are looking into several different databases of clinical guidelines, etc. In order to actually make sure that it can code as well as possible. This coding space is approximately 140,000 thousand unique codes that it's going into and it's finding the given the interaction and the clinical node and then try to code it as good as possible and report back with these codes to the user. We have a large set of these experts based on our API, based on our functionality in terms of going into references and references here is a lot of medical research IT guidelines databases to these RCM experts. They can go into the coding data databases all the way down to form experts that are also specialized in building out or writing out different forms. For anyone that has been in contact with the healthcare system.
Lars Maaløe [00:13:56]: They know how many different forms there are to EHR experts that can go into the electronic health record system and find the useful information from that system, talking to the protocols of that system in terms of actually interacting with that. So real integrated AG218 interfaces here and everything can also be latched on through an MCP protocol here in order to for anyone that wants to latch onto also our system, they can latch onto that system through mtp. And we're also using MTP embedded in our system. So what does all of that mean? That means that you can start building quite interesting and exciting healthcare specific agents that can help with various things here. In this example I'm preloading an example around diarrhe. We blooded the stool. I'm sorry. And then first foremost I'm asking to summarize it and there it goes into the interaction context and then I'M asking to code it and there it goes into the.
Lars Maaløe [00:15:00]: The medical, medical coding agents that are then coding this interaction.
Demetrios [00:15:05]: Just one thing. Is it possible to make it a little bit bigger?
Lars Maaløe [00:15:09]: I can try.
Demetrios [00:15:10]: Oh, or maybe it's a video. All right then, never mind.
Lars Maaløe [00:15:14]: Don't worry. I cannot. I can also go and do it live instead. But yeah, let me know if that becomes the questions here. But it shows you can do quite a lot of things here at the end of the sentence or the secrets here. I'm asking it to come up with some questions to ask the patient. So you can start making an agent that is also supporting the condition in terms of the dialogue with the patient. And finally making it.
Lars Maaløe [00:15:44]: You can start building like real automated agents where people are more and more people have started to use voice or build voice based agents based on our systems and also chat based agents based on our system. So back to being hopefully more or less on time here. So basically I think that one of the things that we have learned in the last period is that building healthcare AI is first and foremost difficult. You need these very tailored models that are fine tuned and structured towards the healthcare domain. They need to be as accurate as possible. Hence facts are. And that introduction in terms of being able to extract clinical findings as good as possible, you need to have the best speech recognition systems in terms of actually recognize all of the medical terms that are set with. Within the, within the domain.
Lars Maaløe [00:16:41]: And for the, for the agents, you need very specialized agents within healthcare. They can. That can tap into specialized domain knowledge. You need to make sure that those agents are fine tuned in terms of making sure that they can find the right clinical information. Anyone that has played around with making a rag system, they know how difficult or how easy it is to get the first rack system up and running, but they also know how bad it is, especially when it needs to tap into a lot of information. And everyone that has worked with the context of a large language models then they know that the advertised size of the context is very limited in the real world. Especially when you have some more complex domains like healthcare, where you have a medical vocabulary that easily exceeds 100,000 different terms with different medication names and new medication names coming all of the time, new research coming all the time, etc. So building these agents is a lot of fun because, because now we can modularize it or we can build these different agents, these different tools and, and there are a lot to build for different use cases.
Lars Maaløe [00:17:51]: I think that's it. That's it from me. And if there's any questions, I'm happy to stick around.
Demetrios [00:17:56]: I've got questions. I've got a lot of questions for you. First and foremost, probably the question you get hit with with most frequently is how did you get the data? How did you clean it from pii? How are you like, is it a small sample of data? Because if you're training in a large language model you need a lot of data. So.
Lars Maaløe [00:18:23]: Yeah, yeah. So first, foremost we built our first products to, to, to our, our first go to market was to go into the virtual communication of. Because we knew that we would, no matter what, need a lot of data. And virtual communication is smart because you have one node that you can tap into and then you can get access to a lot of data points at the same time. So that was a way to get access to a lot of the data that we've worked a lot on, anonymization and derivation of data and that we're really good at so we can strip it from the sensitive data. And then finally we are working more and more on building synthetic data based on our learnings from the real world. So when you have a lot of examples from the real world, when you have a really, really good. For instance, this clinical coding system is enabling us to look at the full distribution of data.
Lars Maaløe [00:19:17]: I was talking about these 140,000 codes and we can then see where we want to build more data and we can build more data from close to and then resenders to real world cases and data that allows us to scale even more so we can start building a lot of data out there.
Demetrios [00:19:33]: There are a few regulation type questions. First is, is the product registered as a medical device according to FDA requirements And the other one is, is it HIPAA compliant?
Lars Maaløe [00:19:49]: Yeah, it's, it's definitely HIPAA compliant and it's living up to all the standards that it needs to. Also in terms of data processing and so on. On the medical device. No, it's not a medical device in the US and it shouldn't be in the US we're considering right now on applying for a medical device because we want to market some more of our capabilities. On the diagnostic side. We have done a lot of research on the fact that our models can be good at least assisting with diagnosis. You need to be careful on your words here and that would need to be regulated if we want to market that. So our research is showing some, some good results in terms of that.
Lars Maaløe [00:20:32]: So that would be a good requirements for the fda. Europe and the UK is a little bit different in, in the UK it's based on the MDD and in Europe it's ndr. And, and, and there we are. For the, for the uk it has been now a, a requirement to, for it to be a Class 1 medical device. So that is there in terms of these 18 scribes and so forth. So this regulatory environment is moving constantly. We have a quality management system in place and all of that. So, so ready for the regulatory requirements that are, that are coming at us.
Demetrios [00:21:04]: You know they're coming, they're just fucking sitting there licking their fingers. They're so happy to find you. That's hilarious. So there is, there's a great question here around. Can you speak to the security protocols and, and can you speak to the security protocols across experts, I. E. Access and patient protections?
Lars Maaløe [00:21:37]: Yeah. So we are, we are capable of deploying our entire, first of all, everything that is encrypted in transit and at risk. Right. And so forth. Want to go too. That's following the standards in terms of what it needs to be following. Then secondly, everything is locked and so we have audit locks on all interactions, which is super powerful within healthcare because you want to know, given a prompt X, what answer was actually posted and for what experts was it posted from so you can have a full trail and full traceability of the interaction that happens. And secondly, on the security, if I understand the question correctly, it is probably the right answer to the question or else please direct me is we are able to deploy our technology on any environment, for instance, and in any tenant, so to speak, and we can segregate the tenant completely and that's quite powerful for the use case of healthcare.
Lars Maaløe [00:22:47]: We've just come live with our sovereign cloud offering, where we have a wonderful case in Switzerland, where Switzerland have a relation that no healthcare data can come outside of the borders of Switzerland and therefore we can deploy our entire stack within Switzerland and we can deploy our entire stack for a different company that is then running that stack within their own firewalls, et cetera. And those firewalls doesn't need to be connected to the rest of the Internet. So a lot of flexibility in this and all of that flexibility. It's a lot of, through some engineering, but it's also needed within the healthcare market where there are a lot of requirements from big healthcare providers that want services to run without Internet access, for instance, and so forth.
Demetrios [00:23:37]: So there's a few questions that are kind of circling the same topic and almost like people are asking about those experts and actually when you brought up that slide, I Had a question too on the experts and. And how you go about creating the different experts. Is it the same base model that you're fine tuning? Are you doing some kind of graph rag on top of it? Is it just changing the prompts and then giving it access to different forms? Like that form one, which I totally relate to. There's a million different forms in healthcare, so I can imagine what kind of a boon that is for the medical professional channels. Can you give us a bit more context and understanding on how you went about creating the experts?
Lars Maaløe [00:24:28]: Yeah, I can give you a really, really complex answer and then help me out on getting less complex because I would love. It was just one element that we could set another prompt for and then we could solve the use case. To give you an example, just what I call the revenue cycle management expert, what we have in that flow is that we have classification models and these classification models are trained on top of an LLM body. So a parameter set that we have used for another training on documentation. So we've taken that parameter set that we have built a classification model on top in terms of predicting into this large corpus of 140,000 codes that LLM that is predicting those codes can then be changed to another reasoning LLM. That reasoning LLM is in a system where it can go into the indexes and other guidelines towards this coding interface and find the relevant information. And that reasoning can then go in and say the classification results look right or they look wrong or this is the reason why you should code differently because you can't code these two codes together because the new guidelines that were updated in 2025, it tells us that we are not allowed to do that, etc. Etc.
Lars Maaløe [00:25:53]: Etc. So a lot of complexity to that. So just from those two modules that could be in the entire like the presentation in itself to go into that, the first module that I was explaining on the coding prediction models here, they have spawned approximately five research publications out of that one. And the second one is two. And going on how we have built these coding systems and how they benchmark on the coding systems and how they benchmark against other machine learning models and whatnot.
Demetrios [00:26:24]: Those classification ones are just classical ML models.
Lars Maaløe [00:26:29]: Yeah, yeah. So the classification model is basically, we like to use an LLM as buddy for these classification models because we get a lot for free. So we can reuse some of that. And we can reuse it because a lot of this medical language are learned inside these large language models and there are all of these limp to big kind of papers and so on. We utilizing things similar to that in terms of making a proper ranking model. But yes, in the sense of if you refer to categorical cross entity in terms of optimizing it almost kind of.
Demetrios [00:27:05]: But yeah, okay, so sorry I cut you off. Keep going. We got the two models so far.
Lars Maaløe [00:27:10]: Yeah. So that's one of the flows of showing that RCM flow. So in that RCM flow, if you, if you want to, if you, if you're going into that tool and those or those experts and tools here, then then you can start having a jet based interface with that. And then you ask our classification model to predict the codes. You're asking our reason reasoning model to reason about those codes. You can also ask about getting information about the guidelines etc. We optimize those flows constantly and we have different parameter like we have a lot of model parameters inside those that flow. For our reference, like the other FL that I showed you on the slide here, I can also show that again was the.
Lars Maaløe [00:27:50]: If you want research agents, if you want to build a research agent within that we have tailored elements that are also looking into different research publications and that's also based in the same way. We can also utilize some out of the box LLMs for that and prompts. And we are quantifying constantly whether we're performing better or whether a new out of the box LLM is performing better. And if a new out of the box box LLM is coming out and suddenly performs better, for instance in German or French or something similar to that, then we have a good target to beat that one within that domain. So see the question unfortunately not too simple because there is so much to gain in terms of the accuracy that we can build within these experts and we want to build that accuracy. That's our ip.
Demetrios [00:28:38]: Yeah. Yeah. It's fascinating to think about that and also it's cool that you can can be benchmarking it against the different ones so you know, if there's space to, to run and I imagine that's what you're constantly doing. I think I understood you correctly when you said that you're really looking at you have a whole suite of LLMs. What I didn't understand is are these base models that you've gone and you've trained yourself or is it it you're grabbing some base model and then fine tuning off of it or both depend?
Lars Maaløe [00:29:19]: That depends. We're doing both. And we're also doing distillation. We're doing all of the tricks in the book and we're testing it out and so forth and I think that that's how everyone should work with it. For exact coding model that I was referencing before like all of our parameters has at some point point been trained as a base model and whether it has been distilled or whether it has been so on so on it would be dumb of us I believe being harsh to yet again have to train on the entire corpus of the Internet. So that's our analogy but where we are taking the parameters that we are training and putting them into a new configuration, a new model setting and so on. That depends on the expert and the service.
Demetrios [00:30:06]: Excellent Lars. This was awesome dude. I really appreciate you coming on here and you all are doing such great work. I love the fact that you're helping out and making such an impact in this space. Thanks for coming on. I look forward to us getting to hopefully meet in person one day.
Lars Maaløe [00:30:24]: Likewise. Thank you so much.
Demetrios [00:30:26]: All right man.
