MLOps Community

ML Engineers Who Ignore LLMs Are Voluntarily Retiring Early

Posted Jun 27, 2025
# LLM
# AI infrastructure
# Typedef

SPEAKERS

Kostas Pardalis
Founder @ Typedef

Kostas is an engineer-turned-entrepreneur with a passion for building products and companies in the data space. He’s currently the co-founder of Typedef. Before that, he worked closely with the creators of Trino at Starburst Data on some exciting projects. Earlier in his career, he was part of the leadership team at Rudderstack, helping the company grow from zero to a successful Series B in under two years. He also founded Blendo in 2014, one of the first cloud-based ELT solutions.

Yoni Michael
Co-Founder @ Typedef

Yoni is the Co-Founder of typedef, a serverless data platform purpose-built to help teams process unstructured text and run LLM inference pipelines at scale. With a deep background in data infrastructure, Yoni has spent over a decade building systems at the intersection of data and AI — including leading infrastructure at Tecton and engineering teams at Salesforce.

Yoni is passionate about rethinking how teams extract insight from massive troves of text, transcripts, and documents — and believes the future of analytics depends on bridging traditional data pipelines with modern AI workflows. At Typedef, he’s working to make that future accessible to every team, without the complexity of managing infrastructure.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.


SUMMARY

LLMs are reshaping the future of data and AI—and ignoring them might just be career malpractice. Yoni Michael and Kostas Pardalis unpack what’s breaking, what’s emerging, and why inference is becoming the new heartbeat of the data pipeline.


TRANSCRIPT

Yoni Michael [00:00:00]: And our view is inference is the new transform.

Kostas Pardalis [00:00:02]: Data started slowly to become more of the product and not just a back-office thing that the company needed to run the business.

Yoni Michael [00:00:09]: The platforms that exist out there don't treat these as first-class citizens.

Kostas Pardalis [00:00:13]: It's not like, okay, you're not getting a bonus this month because your prompt was not good.

Demetrios [00:00:17]: Dude, who's going to be the pets.com of the LLM bubble? I'm excited because I was the metaphorical cupid of you two.

Yoni Michael [00:00:34]: Yes. You did it. Ultimate matchmaker.

Kostas Pardalis [00:00:36]: Yeah, you did it. Building a company is always a journey. Usually people talk about what they've done after the fact, so they tend to like to remove a lot of the gory details.

Demetrios [00:00:47]: Suffering.

Kostas Pardalis [00:00:47]: Yeah. So both me and Yoni came from similar experiences of the market, but from different angles. I was working mainly on more, let's say, traditional data infrastructure, so I saw a lot of the data engineering work there, especially in the enterprise. Yoni came from data infrastructure again, but more from the ML side. And we started working on that: okay, how can we rethink and build new tooling around working with data infrastructure for the problems that we have today? Because keep in mind that the dominant tools that we are using were created 12, 13 years ago, for completely different use cases.

Kostas Pardalis [00:01:37]: Spark, Trino, even the commercial tools like Snowflake, they are kind of similar. At the end of the day it's all about how we move into the cloud and how we do this big data thing, where the dominant use case is BI, it's analytics. There is ML, but ML was always more of a niche thing. At the end of the day nothing compares to the BI market if we just take the numbers out there. But in 12, 13 years many things have changed. Many new workloads came in. There's ML, which is what Yoni was working on. Today we have AI. We'll talk more about that later.

Kostas Pardalis [00:02:13]: There were things like embedded analytics. A lot of product-related data started slowly to become more of the product and not just a back-office thing that the company needed to run the business. And of course these tools were not built for that. We could kind of patch them to do it. But as the market was demanding more and more and more, the pain was becoming bigger and bigger and bigger, and we felt that, okay, this is the right time to go out there and start building something in this space.

Yoni Michael [00:02:46]: Yeah, no, I think you're right on there. I think now more than ever what we were seeing is teams saw the value in putting data products into production as quickly as possible, because they realized the direct correlation with business outcomes: the more effectively they can put data products into production, the better business outcomes they get. And it's not just the Silicon Valley companies now, right? If you think about it, all the non-tech-first companies also saw that value, right? But the tooling didn't enable them to do that, right? I could tell you how much time we spent, thinking back to my time at Tecton, helping customers debug through their Spark logs, right? And working with them to figure it out. We would go and onboard them, and these weren't Silicon Valley companies that had run Spark clusters before. So here's Spark 101, this is a Spark config, this is how you go through and...

Demetrios [00:03:35]: Oh boy, you're in for a treat.

Yoni Michael [00:03:38]: And at the end, what we saw is that the main thing that's evolved in the 15 years since the brilliant guys from Databricks went out and built Spark is that there's a couple of things that have really changed since that time. One thing: hardware has evolved tremendously, which Kostas and I always talk about. At that time distributed workloads were a lot more necessary, right? Because you were a lot more limited in the resources you could get through AWS. Now you can go and check out a huge instance on EC2, and DuckDB has come in and DataFusion has come in, and they've kind of flipped the analytics world on its head, where you can do a lot more on a single node, right? And we saw that at Tecton too, where we'd have these clusters and you'd be running hundreds of gigabytes of workloads that didn't really need distributed Spark clusters, right? You can really do a lot on a single node. So that's one thing. And I think one of the catalysts that we were identifying, the other thing, is that AI has come into the picture too, which has totally changed the nature of workloads, right? It's no longer only about structured tabular data, which we have great platforms for right now and which is more or less solved. There's still a lot of challenges there, but now there's unstructured data, there's a lot of text data, there's images, there's videos, all the different modalities that these platforms just weren't built for, right? Their first-class citizens are structured tabular data. They do that really, really well.

Yoni Michael [00:05:09]: But they don't have the ergonomics, the capabilities. They weren't built from the ground up, from first principles, with the characteristics of AI workloads in mind.
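A minimal sketch of the single-node shift Yoni describes: DuckDB scanning a Parquet dataset directly on one machine, no cluster required. The dataset path and column names here are hypothetical placeholders.

```python
# Single-node analytics sketch: DuckDB querying Parquet files in place.
# The dataset path and column names are hypothetical.
import duckdb

top_customers = duckdb.sql("""
    SELECT customer_id,
           COUNT(*)         AS orders,
           SUM(order_total) AS revenue
    FROM 'orders/*.parquet'        -- hypothetical local dataset
    GROUP BY customer_id
    ORDER BY revenue DESC
    LIMIT 10
""").df()

print(top_customers)
```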

Kostas Pardalis [00:05:18]: The other thing that AI did was act like a catalyst for the market and the industry. Because if we take a look at what was happening in the past 12, 13 years, we have the analytics market, which is huge, but it's still back-office kind of work, right? You have reports that you need to build and deliver. These reports are consumed by people who make decisions, blah, blah, blah. But it's not a product, it's not the customer-facing product of every company out there. Now these things started changing gradually, but it was changing slowly. We had ML. ML was part of that, because when you have recommenders, the data becomes the product itself. You need to use the data in order to recommend and increase your revenue.

Kostas Pardalis [00:06:07]: Right. And so like BI doesn't do that. But the link between, let's say how the data is used and the value that's created is like kind of hard to identify. Right?

Demetrios [00:06:20]: That's been a traditional problem. And I know that there's so many teams that have talked about that. Like how do you champion for the work that you're doing if you can't draw a direct line from what you are doing to revenue generated or saved?

Kostas Pardalis [00:06:35]: A hundred percent. And especially the way that data teams are built. I think in the successful teams, take a company like Lyft for example, or Uber, any Silicon Valley data-driven company, if you see the data teams in there, you have these layers of people: from the SREs who take care of your AWS accounts, to the data platform people who are making sure that capacity is always there for your Spark clusters, to the data engineers who are making sure that the data is delivered, and then the analysts and the ML people and the data scientists on top. The guy at the bottom has no idea how his work is actually used to make a decision about how to improve the revenue that you have. That's part of what Yoni was saying also: the model these technologies were built on, and the requirements in terms of the talent needed to maintain and scale them, were not sustainable if you want to make it accessible to everyone in the market out there. Contrast that to how we build apps. If we were building apps, SaaS applications, the same way that we did with data, the industry would be, I don't know, one tenth of what it is today, and the people working in it would probably be 2% of what they are today. What AI did, because data was becoming more and more of a product, is that AI came in and said, everything is about data now. Whatever product we are building, one way or another, consumes data and spits out data. And it's not just the data we're used to. As Yoni said, now we also have unstructured data that we can process in ways that we couldn't before.

Kostas Pardalis [00:08:31]: And there is an opportunity there to make these technologies even more accessible to more people.

Demetrios [00:08:36]: Yeah, it's easier than ever.

Kostas Pardalis [00:08:38]: Yeah, exactly. So it kind of accelerated things. That's what I'm saying. It's not like it wasn't happening already. The industry was already leaving SaaS, selling software as a service, behind as the way to build value, and getting more into data as the driving force of the next iteration of the industry. But AI came in and accelerated that 100x.

Demetrios [00:09:05]: What do you mean by that exactly? The leaving the SaaS? Because I've heard so many different versions and viewpoints of that idea of like SaaS is not the way forward.

Kostas Pardalis [00:09:16]: Yeah. So if we think about it, okay, you need to be old like me to do that, okay, because you have to have seen how the market developed over the past 20 years. Okay, I'm revealing a little bit of my age now. So what happened from, let's say, 2010 to 2020 is that we had all these companies like Salesforce become really big, the Workdays of the world out there.

Kostas Pardalis [00:09:47]: Right. What was actually happening is that we took all the activities that humans were doing and we tried to automate them using software and deliver that software over the cloud. The cloud was important because the cloud allowed for efficiencies in the financial point of view that you couldn't do it with other delivery methods. And that helped grow a lot and commoditize the software at the end. Right. So we are using taxis. Right. You go out there, hey, hello.

Kostas Pardalis [00:10:22]: Or call someone to come and pick you up. No, now you can use an app to do that. Right. So we kind of created platforms, or e-shops, for example; we have Shopify. Right. For each activity that humans do, in business and in their personal life, we created software platforms to make the process more efficient. And that was delivered through SaaS.

Kostas Pardalis [00:10:49]: We kind of saturated that. Okay. I don't think there's much left out there that we haven't turned into a software platform one way or another. So if we'd like to keep the growth of the industry the way it was, okay, what's next? Now, the magic of turning every process into software is that you capture a lot of data, and this data is sitting somewhere. And now the question is, what do you do with this data? So the first thing: do BI reporting. Do your financials. BI, by the way, for people who don't know, the first owner of BI in the business was the CFO, because the primary first need was to go and build financial reporting, right? Then we came to, okay, now let's do marketing reporting, let's do sales reporting, blah, blah, blah, all that stuff. And then, well, you know what, now that we have all the interactions of these people with our products, maybe we can build recommenders.

Kostas Pardalis [00:11:51]: And now we can automate and send an email and be like, hey, you know what? I saw that you were looking into that stuff. I found this one that you might be interested in. Right? And this can drive, let's say, the behavior of the world to go and buy more. That's based on data. So we have all this data, and we also have a lot of unstructured data. We have all these customer support calls, thousands and thousands of people calling to tell us their issues and all that stuff. But we couldn't work on that before, or at least not easily enough. There is the opportunity out there to take all this data.

Kostas Pardalis [00:12:28]: It's an untapped opportunity to turn more and more value out of that. Now the question is, okay, can we do it with the technologies that we had? And actually it was like, it's kind of hard. And then AI came in. Actually, LLMs came in. Right. And they offered a number of tools and most importantly, in my opinion, to a much, much broader audience out there than ever before. So people now, they can work and do things with their data that previously was really, really hard to do. You had to be in Lyft or Uber or Apple.

Demetrios [00:13:10]: Yeah, I like that. Highlighting how mature the company's data teams and just overall team and vision of how to use their data needed to be before AI came. And then you relate that to app builders and how simple it is to build an app versus how simple it is to put a recommender system into production.

Kostas Pardalis [00:13:31]: Yeah, yeah. And I can give an example for people to understand how big of a gap there is between the maturity that, let's say, the app-building part of the industry has and the data one. Right. So imagine today you have a front-end engineer who builds the front end; they work in React, blah blah blah, whatever they are doing there. You have a back-end engineer doing their stuff with Firebase or whatever, and you have an app that you can put out there. Now, these two people are each working on their own thing, and somehow they can merge their work and have a working app at the end. You two probably have more experience than me on that. Try to contrast that with what has happened with data teams.

Kostas Pardalis [00:14:23]: The data scientist is going to build a notebook, and what happens next? You hand this notebook to a data engineer, and what is the data engineer doing? Rewriting the code to put it into production. Now take this and put it into app building. How many Salesforces would we have if the front-end developer literally had to hand the code to the back-end developer and the back-end developer had to rewrite the thing in order to release it? Right. So that's the difference between the level of maturity that one side of the industry has compared to the other.

Demetrios [00:14:58]: Yeah. And AI is that great equalizer.

Kostas Pardalis [00:15:03]: Actually, it's the catalyst. I would say it's not there yet. And I think a big problem that we are seeing today with AI is that getting things into production is really, really hard. It's great for demos, we can build very impressive demos, but getting into production is really hard. And the reason for that is that the tooling that we have and the engineering practices that we have are not there yet to deliver software at the same capacity, and with the quality that we need, as we do when building applications.

Demetrios [00:15:40]: It's funny. Yeah, you mentioned that, and how you now have a new persona that is able to access that. So that is that catalyst. And it's like the market is demanding that ease of use, because you have all these back-end and front-end engineers who are able to use AI, and they're coming in and they're like, okay, I got something 80% there, I can show a demo, I'm good. But then you're like, so let's use this in our app.

Yoni Michael [00:16:10]: Yeah. And I think part of that is like kind of what you're describing is if we take a look at the AI journey that companies go through, Right. Chat interfaces, that's the primary thing that came out. That is still, I think, the dominant way that people interface with LLMs and AI. Right. And so you have, let's say interactive AI. That's where people tend to start building apps, building AI products. Right.

Yoni Michael [00:16:32]: They'll go and grab LangChain or they'll go and grab LlamaIndex, and really expect to either have a human in the loop or, you know, something interactive on the other end, like an agent waiting for an AI response that comes from these LLMs. Right. And I think that's where companies tend to start, primarily because that's where most of the tooling exists now. Then, as they try it, they already start seeing value. Right? Okay. We're building this product, it's adding some value. But now I really want to go and scale this. How do I now do AI at scale? How do I actually build a product in production? And the challenge there is that all these models are non-deterministic by nature.

Yoni Michael [00:17:14]: Right. And so a lot of the concepts that were once used by data teams for structured and tabular data don't necessarily apply to unstructured data, because of that non-deterministic nature. And so our thesis is that we really want to build a platform that helps companies create deterministic pipelines, using very familiar interfaces and concepts that they're already used to, on top of non-deterministic models. Right. And that's really the problem set that we're excited about helping teams with: how do you build a lot of clarity, stability, and reliability, and help companies actually take that next step, where they're thinking, how do we scale our AI products and how do we put these pipelines into production in a resilient and stable fashion? And that's where a lot of the tooling is kind of missing. Right.

Yoni Michael [00:18:09]: They kind of start doing the one off interactive AI. And then when it comes to putting this actually in production at scale, you know, you're talking about concepts like context windows and tokenizing and partitioning and chunking, all things that are new muscles that data teams really need to build to be able to run these things reliably in production.

Demetrios [00:18:28]: Even just: is Anthropic working right now? Is the API up?

Yoni Michael [00:18:35]: Exactly. So what are the latencies like? Right. And then all the new versions of the models coming in introduce new things for cost optimization. Right. Inference is expensive. Right.

Demetrios [00:18:44]: Time to first token. That's so true. There's so many new metrics that you're looking at that you previously had not been exposed to.

Yoni Michael [00:18:54]: And the platforms that exist out there don't treat these as first-class citizens. Which means that teams that have great engineering teams are DIY-ing it. They're saying, okay, how can I go and build something around these constraints that I have, and around these properties of non-deterministic models, to try and simulate how we're currently interacting with our structured and tabular pipelines? Right. So if you're a great Spark shop, for example, you'll go and write these complex UDFs, which are brittle, hard to maintain, and sitting on top of Spark without taking advantage of any of the performance that Spark has to offer, because it's not running in a distributed fashion. Or you'll go and build some complex logic on top of Lambda functions, where you're trying to hit the APIs of these models. But a lot of engineering cost goes into that, and it's hard to maintain. And so teams are trying to piece things together to try and help them run AI at scale in production. And so that's where we see that there's a really nice opportunity: how do you build a lot of these properties natively into an engine and a platform that helps companies go and scale their AI workloads?
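To make the DIY pattern Yoni describes concrete, here is a minimal sketch of an LLM call wrapped in a Spark UDF. The endpoint URL and response shape are hypothetical; the point is that every row triggers a blocking HTTP call, the executors' CPUs sit idle while they wait, there is no shared rate limiting or checkpointing, and one transient API failure can fail the whole task.

```python
# Sketch of the "DIY" pattern: a blocking LLM API call inside a Spark UDF.
# The endpoint URL and response shape are hypothetical.
import requests
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("llm-udf-sketch").getOrCreate()

def classify_ticket(text: str) -> str:
    # Blocking, per-row call to a hypothetical inference endpoint.
    resp = requests.post(
        "https://example-inference-endpoint/v1/complete",  # hypothetical URL
        json={"prompt": f"Classify this support ticket: {text}", "max_tokens": 16},
        timeout=30,
    )
    resp.raise_for_status()  # a transient 429/500 here fails the whole Spark task
    return resp.json()["text"]

classify_udf = udf(classify_ticket, StringType())

tickets = spark.createDataFrame([("My card was charged twice",)], ["body"])
tickets.withColumn("label", classify_udf("body")).show(truncate=False)
```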

Yoni Michael [00:20:03]: And as much as everyone says, and I think it's true, agents are here and 2025 is the year of agents. We're going to see a lot more of these come into production. But I also think you have a lot of teams that are starting to think about, okay, I'm collecting all these transcripts, right? I have a call center of 60 people that are doing thousands of calls per day, and I want to perform semantic analytics on it. Right. And our view is inference is the new transform. Right. It can be used very much in the same way as what teams are used to when they're building pipelines for structured data. Right.

Yoni Michael [00:20:39]: It's a very powerful form of transform. So if you're trying to do semantic analytics at scale, where there's no human in the loop, how do we enable teams to really take advantage of this new powerful form of transform that's there?

Demetrios [00:20:51]: I like that. I've heard that before from the head of AI at Wise. He was saying we should be thinking about these different LLM calls as just a way to take unstructured data and turn it into structured data.

Yoni Michael [00:21:07]: That's what you see, and that's what we're trying to help teams really do. And we think that teams are doing that already on their own. Right. But can you build something that treats this as really the first-class problem that you're trying to solve?
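A minimal sketch of that "LLM call as a transform from unstructured to structured" idea: a batch step that maps raw call transcripts to fixed-schema rows, pinning temperature to 0 and demanding JSON so the non-deterministic model behaves as much like an ordinary transform as possible. The llm_complete helper and the field names are hypothetical stand-ins for whatever inference API and schema a team actually uses.

```python
# "Inference as a transform": map unstructured transcripts to fixed-schema rows.
# llm_complete() is a hypothetical stand-in for a real inference API.
import json

SCHEMA_PROMPT = (
    "Extract these fields from the call transcript and reply with JSON only: "
    '{"intent": str, "product": str, "sentiment": "pos|neu|neg", '
    '"escalation_needed": bool}\n\nTranscript:\n'
)

def llm_complete(prompt: str, temperature: float = 0.0) -> str:
    # Hypothetical inference call; replace with your provider's SDK.
    return ('{"intent": "cancel_policy", "product": "auto", '
            '"sentiment": "neg", "escalation_needed": true}')

def transcript_to_row(transcript: str) -> dict:
    # Temperature 0 plus a fixed JSON schema pushes the non-deterministic model
    # toward repeatable, table-shaped output that downstream SQL can consume.
    return json.loads(llm_complete(SCHEMA_PROMPT + transcript, temperature=0.0))

def transform(transcripts: list[str]) -> list[dict]:
    # The "transform" stage of an ordinary batch pipeline, except the worker is
    # an LLM call instead of a hand-written deterministic function.
    return [transcript_to_row(t) for t in transcripts]

print(transform(["Hi, I want to cancel my auto policy, this is my third call..."]))
```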

Demetrios [00:21:20]: And let's take a moment real fast to highlight the difficulty of that scale in production. Because I remember I was talking to my buddy who works at Decagon, and he was saying, you know, getting the agents working for the customer service stuff, that's not really the hard part. We had a working demo up and we were able to sell it within two weeks of when we generated the idea. What's really hard, and what kept us up at night, is how do we do that for a company that is receiving thousands of customer service requests per minute?

Yoni Michael [00:21:58]: Yep. And that's where batch inference becomes really relevant. Right. And really prominent there. But then also what you were saying is like you're taking all of these transcripts at scale and there's a lot of things that you need to think about that are properties of, let's say the AI models. Right. So the context windows. Right.

Yoni Michael [00:22:17]: So if I have this transcript, is it enough for me to take each individual message and pass it to the LLM? No, because it turns out that you need to have the context of the entire conversation. So are you going and applying an arbitrary buffer around each individual message? Well, that's going to increase your input token count, which is going to make things more expensive. So there's a lot of these nuances and complexities when you're trying to run things at scale. You need to give people the expressivity to be able to, one, experiment before pushing to production. So how do we build this in kind of a notebook environment and test out the different prompts that we have, to make sure the data quality coming back is good? But then also what you're saying is true: how do we create structure out of the unstructured? Right. So really long transcripts, blobs of text, into nice, neat structured tables. Right. And that's where the true power lies.

Yoni Michael [00:23:09]: Because if you can take things and create structure out of them, there's a lot more that you can do to control input token count, and you can do a lot more around understanding what context window you want to send to the LLM. And then you get into things like rate limiting and model latencies and model cascading and things of that sort that are also really important when you're running things at scale.
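A minimal sketch of the chunking trade-off Yoni mentions: grouping consecutive messages into windows with a small overlap so each chunk keeps surrounding context, plus a rough token estimate to watch input cost. The four-characters-per-token rule is a crude assumption, not a real tokenizer.

```python
# Chunk a transcript into windows with overlapping context.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # crude heuristic; use a real tokenizer in practice

def chunk_messages(messages: list[str], max_tokens: int = 2000, overlap: int = 2) -> list[list[str]]:
    chunks: list[list[str]] = []
    current: list[str] = []
    current_tokens = 0
    for msg in messages:
        t = estimate_tokens(msg)
        if current and current_tokens + t > max_tokens:
            chunks.append(current)
            # Carry the last `overlap` messages forward so the next chunk keeps
            # conversational context; this is exactly the buffer that inflates
            # input token count (and cost) mentioned above.
            current = current[-overlap:]
            current_tokens = sum(estimate_tokens(m) for m in current)
        current.append(msg)
        current_tokens += t
    if current:
        chunks.append(current)
    return chunks

print(len(chunk_messages([f"message {i}: " + "hello " * 50 for i in range(100)])))
```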

Demetrios [00:23:31]: It does feel like too, if you're doing that kind of stuff in Spark, you're over engineering it.

Yoni Michael [00:23:37]: Yeah, yeah, 100%. And if you're building it on Spark, right, I think it's just the architecture it was built on. It's a brilliant, super powerful platform. Right. But it wasn't built with unstructured data in mind. Right. And so you're not getting a lot of the guarantees that you could get if you had a platform that was built for and around unstructured data processing.

Demetrios [00:24:04]: I also remember there was this guy, Zach, that was putting an agent platform into his company and he was talking about how one of the things that they did was they were trying to allow everyone at the company to build their own agents for whatever the use case was. And I know that's a common thing that most people are dealing with. They're saying, we know that each individual department needs agents and we know that these agents are going to be better served if the subject matter expert builds them. So how do we create a platform that can allow folks to build their own agents? And one thing that he did was he created a metric for when someone is building an agent so that, hey, if you put this into production and wherever you're going to expose it, the expected traffic is going to be this much and this is the expected cost. And so. So that's a fascinating piece. But then the other side of that is I wonder how much you have been encountering folks like the people that we talk to at process that say we don't really look at cost in the beginning. We just want to make something that is working.

Demetrios [00:25:19]: So first: is it possible, can we delight the users? And then we go into, how do we optimize it, can we actually optimize that?

Kostas Pardalis [00:25:28]: Yeah, that's a great point. Because there is also a hidden complexity that we didn't have before. Again, the thing with AI is that we really have to rethink how we're building software. So one of the things that AI enabled is that the people who are actually the domain experts now have to be part of building the technology itself. Think about how we were building software before: we have our customer, we have a product guy, we have a persona. Let's say our persona is salespeople. We try to understand what the salespeople need. Then we'll create some PRDs, we'll go to the engineering team, like, hey, now you have to build that.

Demetrios [00:26:15]: How many story points?

Kostas Pardalis [00:26:16]: Yeah, how many story points? Roll out the software, give it to the salespeople, somehow it will work. Okay, but now that's not enough, because the behavior of the software depends on the salesperson directly, right? So how do we involve that person in the process? That is something I don't think we have figured out yet. And also, how do we let the engineers keep doing engineering? Because we still need engineering. It's not like we can live in a world where everything is kind of random at the end of the day. And the salespeople, at the end of the day, are still salespeople. They get paid for selling, not for building software, right?

Demetrios [00:27:13]: Yeah, yeah, yeah, that's fine.

Kostas Pardalis [00:27:15]: So it's not like, okay, you're not getting a bonus this month because your prompt was not good. Dude, we can't do that. Right. So this is I think like a big challenge, especially for the people who are building like more customer facing products.

Yoni Michael [00:27:32]: But thankfully the salespeople don't have to write SQL, though, right? They can just write prompts, which is easier for them to go and, let's say, iterate with than having to go through and do it the old business-analyst way.

Demetrios [00:27:45]: But also it brings up the point that this is built for engineers, it's not built for sales folks. And so, like you were saying, playing around in a Jupyter notebook, trying to give that to a salesperson is not going to work. How do you create an environment that someone is able to natively go into? Do you focus on the lowest common denominator of someone who is technically apt enough to do something, or are you trying to push for how do we enable our power users?

Yoni Michael [00:28:17]: The role of the heads of AI that we see now coming in is becoming very common at a lot of companies. Most companies that we talk to have a head of AI at this point. Right. And really, I think the interesting part about that role is doing exactly what you're talking about: trying to bridge that gap. They're essentially the AI sherpas within the company, right? If you think about it, they're going to the head of product marketing, they're going to the head of sales, they're going to all the other business functions there. And they're saying, hey, I'm the gatekeeper for AI. What do you want to try and build for this? How can we try and add value so your teams can leverage AI? And so the head of sales will say, hey look, we're losing out on a lot of deals, and all of the information we have is in this sales Slack channel. There's a lot of data there and context around

Yoni Michael [00:29:07]: why did we not hit our numbers over the last week, over the last month and quarter? Right. Can you build me some pipelines that are going to go through and, leveraging LLMs, try and identify the patterns in there, so that we can then go and improve and increase conversion rates, whatever it is. Right.

Demetrios [00:29:24]: If I know this sales team, I know the answer and it's probably marketing. Marketing fucking sucks.

Yoni Michael [00:29:28]: But then you go to the marketing guys and they're like, okay, like look at the sales guys, right?

Demetrios [00:29:33]: What, the product?

Yoni Michael [00:29:35]: With marketing, they're like, we're giving them qualified leads, right? That's kind of the thing. Right. And I think there will be more advances there. But at the end of the day, do you want your head of sales to be going in and trying to work in a platform and experimenting with different prompts? I think there's a balance there. Right. But that's what these heads of AI are trying to do: create the AI roadmap based on all the feedback they're getting from other business functions within the company. Which is really interesting.

Demetrios [00:30:06]: Sherpa is a great term.

Kostas Pardalis [00:30:08]: So there is the UX part, which is a big thing that needs to be figured out, but there is also the platform, the infrastructure side of things. Right. And going back a little bit to why you shouldn't do that on Spark, for example. Right. The thing with AI and LLMs is that the characteristics of the workloads changed dramatically, because of LLMs and inference and GPUs. So in the past, again, 12, 13 years, everything was pretty much CPU-bound. Spark is an amazing tool if you want to crunch numbers and make sure that your CPUs are always operating at 100%.

Kostas Pardalis [00:30:56]: That's what they're trying to do first. And second, okay, how we move data around, how we shuffle, because we need to move from one server to the other. And if you have thousands of them, how do you make this reliable, so that if something breaks, we can resume? Now put an LLM in the equation. CPU doesn't matter anymore, because your CPU is going to sit idle there waiting for the GPU to return a result. Right. So from a CPU-bound workload, we go into more of an I/O-bound workload. Then, okay, you're building your UDF, you're running these things, but guess what? There's zero reliability when you're talking to GPUs, right?

Kostas Pardalis [00:31:40]: Because the systems are not mature yet. It's kind of, I would say, like how the Internet was before 2000. Literally, back then you would connect to the Internet and if your mom picked up the phone, you would lose your connection. Right. Does this happen today? No. Right.

Kostas Pardalis [00:31:59]: But that's kind of what happens with LLMs.

Yoni Michael [00:32:01]: You're aging yourself again.

Kostas Pardalis [00:32:02]: It's okay. It's okay. We can discriminate on names. We're in San Francisco. But then with the LLM, okay, you send a request. Let's say, I don't know, let's say you're using Gemini, which has a 1-million-token context window.

Kostas Pardalis [00:32:22]: You send 1 million tokens there. Right. It starts doing its magic, and in the middle of that, it breaks. What happens? You can't resume that. Right. There's no...

Demetrios [00:32:35]: That literally happened to me yesterday. I did that exact same thing. I'm pairing up all of the attendees for the event tomorrow and I'm sending an email to them saying, hey, you should know each other, because I think it would be great. And so I asked you.

Yoni Michael [00:32:49]: Trying to get a new company built. I see.

Demetrios [00:32:51]: I'm trying to be that matchmaker again. Exactly that. And so, actually, Gemini, without me prompting it to, says why it paired up each one, but it's taking a long time. And halfway through, I said, all right, well, I'm going to go get lunch and it should be done by the time I get back. I got back and the whole thing was just not there anymore. It magically disappeared. So I had to prompt it again and then ask it to create it again.

Yoni Michael [00:33:18]: And you've already wasted cost on the input tokens for the first run that didn't work. Right.

Kostas Pardalis [00:33:22]: Even before you reach the cost, it's the reliability thing. Right? Because think: now you are doing this 12 hours before your event and it breaks and you can't send the emails. You're not going to be in a very happy position. Right. Imagine you are AT&T and you are going to process all the transcripts of the previous day, right, to create tickets for your engineers to work on the next day.

Kostas Pardalis [00:33:50]: And we are talking about probably tens of thousands of hours of transcripts that you have to process there. And that breaks. Right. Where's the SLA? There's no SLA. There's no SLA right now with LLMs.

Yoni Michael [00:33:59]: And data teams are used to SLAs, right? That's how they get rated and qualified: are they doing a good job, are they hitting their SLAs, are these pipelines getting executed in the amount of time that they need? Throw LLMs into the loop and it's a whole...

Demetrios [00:34:15]: It's the Wild west.

Yoni Michael [00:34:16]: Yeah, it's the wild west. Right, exactly.

Kostas Pardalis [00:34:17]: So you have this super reliable, CPU-focused technology that is Spark. You put LLMs in the equation and it doesn't work anymore. The reliability that you expect from something like Spark is not going to be there, and it's not Spark's fault. Right. Or your clusters are going to just sit there waiting for the GPUs to return something back. And the reason I'm saying that is because people might say, okay, why not just go and build on top of that for our LLMs? And the reason is that very, very soon, even if, let's say, you have infinite money and don't have a problem with that, you will have an extremely hard time creating reliable systems that will reliably deliver value to your company.

Kostas Pardalis [00:35:11]: And the moment that happens, people will lose faith in the technology itself. And we see that with AI, right? People do lose faith, because they don't care. At the end of the day, they are not doing it just for the technology. If the technology is making their lives harder, why would they care? Right. Not everyone is a tech junkie in Silicon Valley who has to work towards artificial general intelligence. Right?

Yoni Michael [00:35:38]: Yeah, I think that's an important thing too. Right. You talk about that stat that 70% of AI projects never make it into production. Everyone that's building or trying to innovate in the AI space always famously claims that stat, and then they fit themselves in somewhere under that umbrella, with the problem that they're trying to solve as the main reason for it. And it's all based on conversations that they're having with these heads of AI and their teams, to try and figure out and pinpoint what the problem is. And there are multiple layers to that.

Kostas Pardalis [00:36:09]: Right.

Yoni Michael [00:36:09]: The one that we're very focused on is that taking AI and running it at scale in production is a very hard, unsolved problem that teams really struggle with, because they're used to a world where things work relatively well. Right. They don't always hit their SLAs, but when they have these pipelines running across large data sets, they have on-calls, they have processes for running them, they have retries, they have orchestrators like Airflow that are pretty good at helping run that retry logic and build all of that. And that's where we think that, with the new workloads that AI has introduced and the new paradigm that we live in now, the infrastructure behind it needs to be rethought.

Demetrios [00:36:52]: Well, it's funny you mention that, because I remember back in the day my friend Diego was telling me about how he felt like there was going to be this new term for folks that focused on reliability, but specifically for the ML systems. And he was like, maybe we could coin it as the MLRE instead of the SRE. But now it feels like what you're talking about is the AIRE, the AI reliability engineer, who is going to be 100% heads-down focused on: can we make this reliable? And that goes into all of these new metrics that you're talking about. It's not just, can we look at the logs and traces and pipe it into Datadog, or can we have that kind of analysis.

Yoni Michael [00:37:40]: Yeah, they want to be able to feel confident about their pipelines and be able to even think about the next step: can we put an SLA on this? Right. And given the current tooling, it's very, very hard to do that, based on the ergonomics and the tooling and platforms that exist for working with these models. Right. And so that's the AI infrastructure opportunity that we're thinking about.

Kostas Pardalis [00:38:05]: Something about the roles that exist already, because I think that there is a gap there, and someone has to fill this gap. I don't know if it's going to be the AI engineer, although my experience so far with AI engineers is that I feel like they're coming more from the data scientist side, which is great. You need people who can model and understand how to work with models and, at the end of the day, create something that delivers what has to be delivered there. But I think there's a huge opportunity there for people, especially from the data engineering background and ML engineering background, to step in and actually take this role. Because at the end of the day, the reason that they existed was to add reliability to working with data. Because working with data was always an unreliable business.

Kostas Pardalis [00:38:58]: It was always really, really hard. And the job of these people was to make sure that everything is delivered on time and with the quality that we need. We have SLAs, we have all these things. And if you want to understand a group of people, you have to look into the language that they are using. Right. And one of the most commonly used terms for data engineers is idempotency. Idempotency is the attribute of a system where, if you put in the same inputs, the output will always be the same. Right. That's great, because when you have that, you can ensure that there is a reliable system there.

Kostas Pardalis [00:39:35]: Now, that says something about data engineers: they literally breathe and live and exist for reliability at the end of the day. The same with ML engineers. But ML engineers, I think, have an added advantage: they've already had to take into account that we are working with systems that are not deterministic. Right. And LLMs take this to the extreme. So I think these two groups of people have an amazing opportunity to transform themselves into something that will be extremely important for the future of the industry. And if they don't do it, someone else will. Right.
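A minimal sketch of idempotency in the sense data engineers use it: a daily job that writes its output keyed by the run date and overwrites that partition, so running the same day twice (after a failure, or by accident) leaves the warehouse exactly as if it had run once. The paths and the toy transform are illustrative only.

```python
# Idempotent daily job: overwrite a date-keyed partition, never append.
import json
import pathlib

def run_daily_job(run_date: str, records: list[dict], out_dir: str = "warehouse/events") -> None:
    partition = pathlib.Path(out_dir) / f"dt={run_date}"
    partition.mkdir(parents=True, exist_ok=True)
    transformed = [{**r, "dt": run_date} for r in records]  # deterministic transform
    # Overwrite, never append: a duplicate or retried run is harmless.
    (partition / "part-000.json").write_text(json.dumps(transformed))

# Same inputs, run twice: the partition ends up identical either way.
run_daily_job("2025-06-27", [{"user": "a", "amount": 10}])
run_daily_job("2025-06-27", [{"user": "a", "amount": 10}])
```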

Demetrios [00:40:20]: It might be the SRE. Yeah, I was thinking that, 100%. A lot of folks that were in the MLOps community in the beginning came from that SRE background, and they were tasked with figuring out the reliability of the ML systems and the ML platform. And then you had the rise of the platform engineer, and a lot of those folks were SREs that were rebranded. And now you have the platform engineer. You are probably going to start seeing more and more of these AI platforms. And one of the jobs that the AI platform engineer is going to be tasked with is exactly that.

Demetrios [00:40:57]: How can we make sure that whoever's building with AI can do that confidently?

Kostas Pardalis [00:41:03]: Yeah, And I hear a lot from people, especially from the data engineering world, because, yeah, like if your whole existence is around determinism, right? Like when something comes that's like so different to what you are doing, like your initial reaction is like to reject it. Right. And that's like the biggest, I think, danger for them right now. You see a lot of rejection. It's like, oh, we're data engineers, leave us alone with that LLM stuff. We are going to do our thing. You crazy AI engineer, go do whatever you want. I don't want to know about it.

Kostas Pardalis [00:41:37]: No. Because if you don't just focus on that, and you are okay with feeling a little bit uncomfortable, you have a tremendous amount of value that you can deliver, because reliability is literally what is missing to turn LLMs and AI into what they are promised to be. So I think there is, for both data engineers and ML engineers, a huge opportunity here. They need tooling. I mean, the existing tooling is not enough for that stuff, but the tooling is a different conversation. It's not their job to build the tooling; the industry should build the tooling. But the most important thing is the mindset that they bring and the experience that they bring. And that's something that no tooling can replace. Right.

Kostas Pardalis [00:42:31]: So these people are literally sitting on gold, but they have to do something with it, otherwise they will miss a huge, huge opportunity, in my opinion.

Demetrios [00:42:40]: Yeah, I've heard some people talk about how they can't connect the dots. You hear everyone banging and screaming from the rooftops about how AI is only as good as the data that you give it, and garbage in, garbage out. These are the tropes that are so common. And then I saw someone say, but explain to me how that's possible, because right now I just go in and I give a prompt and there's no data that's going into that, that's just the prompt. And so I was trying to put two and two together to really encapsulate why it is like that. And on one hand you have me just going in, doing one-off tasks with ChatGPT or Gemini, and that's used as a bit of a productivity tool, or I'm talking to it, I'm trying to learn something new, I'm trying to understand something, or I'm using it more like a browser, asking it to tell me these different things.

Demetrios [00:43:38]: That's not necessarily a data product. But then you have products that the company uses, and, like you were talking about with the support calls, all of the data that's going to be going into the context window. That is not something that you're doing one-offs with. That's something that should be very operationalized.

Yoni Michael [00:43:58]: Yeah, yeah. And so here's an example from one of our early design partners to help kind of crystallize this, a concrete use case. Right. Like we talked about, this team has, let's say, 60 call center folks. It's, for all intents and purposes, an insurance tech company. Right. And whenever you go and get a new policy for insurance, you get this thing called a dec page, as they call it in insurance.

Yoni Michael [00:44:24]: It's a declarations page. So it's a summary of all your coverages and all of your policies there. Now, the problem here is if a call center representative misrepresents what's in the dec page. So let's say, and this is an example that Kostas recently built out, I have roadside assistance and the dec page says it's only for 15 miles, but the call center representative, during the call with the customer, tells them that it's for 50 miles. That's a liability that they're then taking on, and they could potentially get sued for it, because they misrepresented what the actual policy is. Right. And so how do you build these pipelines that are going to, let's say...

Yoni Michael [00:45:07]: And this is where the data quality portion comes in too. You want to have the declarations page structured in a way that you can manipulate, using some nice text, chunking, and partitioning capabilities, with the transcript side by side, and be able to go through and filter all of the questions that were asked by the customer and the answers that the support person gave them, and then match that to the portion of the dec page that they're actually talking about. Right. You don't really care about the niceties, the hi-how-are-you. You don't want to feed all that shit (shit, it's okay to say, right?) into the LLM, because it's going to be more expensive. You don't want to just take the whole transcript in and of itself. You want to be able to partition and chunk it and only send the relevant information that you need in order to understand whether the customer support agent represented what was in the declarations page accurately.

Yoni Michael [00:46:04]: So you're taking the question and the answer, and you have the expressivity and the tooling to be able to do that, and then feed it, and that's where the data quality portion comes in, feed it into the LLM to make a decision as to whether, yes, what the customer support representative said is correct, or no, they actually misrepresented it: the actual amount of roadside assistance this customer had was 15 miles, as stated in this portion of the declarations page. Right. So then you want to create that report very quickly, that there was something they misspoke about or misrepresented, and send that up and escalate it to the team that can then be proactive about handling the case, so that they don't end up getting sued in court and having to pay out for this misrepresentation. Right. And so this is an example of what teams are thinking about and trying to do at scale, right, where it's not like I'm sending this response immediately back to an AI agent, or there's a human in the loop.

Yoni Michael [00:47:00]: Sure, you want it to be somewhat real-time, which is an overloaded term, but, you know, within the next few hours is totally fine. But you're getting these transcripts coming in, thousands a day. How do you actually build these pipelines in a way that creates some determinism on top of working with these models? And so context windows, chunking, partitioning, all these things: we need to arm engineers, AI engineers, data engineers, with the ability to actually build these in a robust manner and then give them some of the guarantees, like Kostas is talking about, that they're already used to. Right. Like, I need the mean time for each transcript to be reviewed to be three hours. Right.

Yoni Michael [00:47:48]: And so be able to give these teams those guarantees that within three hours of a customer conversation happening, we'll know if the representative did well or if we need to go and fix things on the back end.

Demetrios [00:48:00]: Right?

Yoni Michael [00:48:01]: Yeah.

Kostas Pardalis [00:48:01]: And I want to add something here. There's always data. You can't say that there's no data, even if you want to keep it just to the, oh, but I'm asking the LLM a question about, I don't know, how to change my baby's diaper. Yeah, right.

Demetrios [00:48:20]: In a proper way.

Yoni Michael [00:48:21]: Yeah.

Kostas Pardalis [00:48:23]: Your prompt is the data, actually, and the structure of the whole dialogue on its own is important, just because of how LLMs work. Right. With LLMs, you also take the previous conversation and you feed it back in there, so you create data that you feed to it. The LLM itself is built on data. And then anything that is, let's say, outside of the trivial things we would ask on Google, for example, requires extra data. There is a reason that tools are becoming so important.

Kostas Pardalis [00:49:02]: We wouldn't have agents if we didn't have tools. A lot of the work that you're doing with tools is actually fetching data. Let's say you are using Cursor: if Cursor didn't have the context of your code base, it wouldn't help you. And what is your code base? In this case, it's data. Right. When you use Claude Code, or whatever it's called, and you're like, hey, find me the file that does this, this, or that in my code base, it runs a tool that does a find and a grep, and it gets data back.

Kostas Pardalis [00:49:40]: Right. Your code base again. And the outputs of these tools are the data. You see that already we are creating something that, I think, data engineers and ML engineers, again, would find very familiar: we are building pipelines of feeding data in, getting data out, and using that for the next step, blah blah, blah blah blah. The deep research functionality is pretty much: okay, I'll search on Google or Bing, I'll get the raw HTML that is returned from the queries that I sent there, and I'll work on that. That's data again. Anything non-trivial using an LLM requires data.

Kostas Pardalis [00:50:18]: So I think the way that we think of LLMs as, let's say, this kind of oracle is not accurate. Yeah, you can use them that way, and that's part of what made them so successful, because it was so easy for people to experience something by just talking to it. But at the end of the day, what we are doing with LLMs is this, and let's go back to the SaaS example: what we were doing before was building software and forcing people to learn how to think and operate the way that machines do, right? And now we've changed the equation, because we made the machines think and work more like humans do, right? That's what makes it so accessible. But at the end of the day, we still have a machine that has to do something. We tell it with natural language what to do. These models are very generic, they can do many different things, but still they are going to do it in a vacuum, and everything is driven by data at the end of the day.

Kostas Pardalis [00:51:25]: So even in the online use case, where you are just chatting with the bot and asking about things, you will copy-paste something, you will take a picture from somewhere and be like, hey, the CSS here doesn't look good, look at this picture. That's data. The difference is between the things that you do, just you as Demetrios with Gemini and your Excel sheets with the attendees, and the company that has to do that at scale every other day for all the new leads that they have, right? That's a different approach. You can't do it with chatbots anymore. And that's what we are talking about when we talk about how you put these things into production.

Demetrios [00:52:14]: One thing that I feel like we need to hit on, if we're talking about reliability, is evals, and how you think about reliability in the context of evals. Where do they fit in in your worldview?

Kostas Pardalis [00:52:32]: One of the problems that we have is that the first iteration of eval platforms was inspired primarily by, I would say, the kind of engineering that happens at the application layer. So if you think about evals in the common case, you have a model, an input, and an output, and that's what we care about. So you're saying, okay, I put this into this model, I get this output; is this output what I expect to get? Now, the problem with this model, in my opinion, is that there's a lot of context that is actually missing, right? Especially in cases where you have to invoke many different models to achieve a goal. So let's take the case of processing a transcript, right? What most people do is: okay, the first part is I'm getting my audio file, and I'm going to use something like Whisper. Then we get a transcript that has some issues, right? I mean, the output is usually really good, but still there are things that need to be corrected.

Kostas Pardalis [00:53:40]: You get this big chunk of text that you have there, and then what people will do is, okay, clean it up. Maybe use an LLM to go and fix some of these issues. So let's say something is misspelled, right? LLMs are great at finding these things and fixing them, but they can still make a mistake. And this is at the very, very first stage of processing. Now, the next step is, okay, let's start creating some summaries. So we will break it down into pieces, create a summary for each, and store the summaries. Then, at another level, we'll take all these summaries and create a summary of the whole thing, and we will end up, let's say, with the summary that you put on your website when you publish the podcast episode. Now, if you consider the eval as just an individual step there, you have a problem, because you can't evaluate the whole pipeline of going from the audio to the end result, which is the summary that you have there, right? To do that, you have to trace all the calls and you have to consider all the calls, and you have to see whether maybe the LLM that corrected some references in there made a huge mistake that completely changed all the summaries, right? I'm exaggerating, but the thing is that a step at the beginning can affect the result at the end. But if you take a call in the middle, it might still look perfect, right? But the data was wrong, so of course the output was wrong.

Kostas Pardalis [00:55:22]: So the question is, okay, how can we work on that, how can we build these more complicated workflows? And if you take agents into consideration, for example, it's even worse, because agents can make tens of different calls, go back, run calls again. How do you evaluate the output at the end? Because that's what I see at the end: I ask Claude to write some code for me. It can take a few minutes; who knows what it is doing there. But definitely there are many back-and-forths and calls to the LLM. So what do I eval? Right. So I think this is an important thing that is missing, and we'll get there.

Kostas Pardalis [00:56:14]: But I think we also need to rethink the infrastructure that we are using for that. Because now we are talking about a lot of data, and it's not like a unit test; it's more like how, in the observability world, we were doing traces over distributed systems, right? That's what I want to see out there. And I think that is going to change a lot how people work and how they can build actually reliable systems.

Yoni Michael [00:56:49]: I think, going back to that theme of 70% of projects not making it into production, that's one thing that we hear from the AI leaders out there, right? It's: great, I built a lot, and you tell me inference is the new transform, so now I'm counting on the results of these pipelines to be mission critical. They're making business decisions for me. So they say, how do I know that this multi-stage pipeline, like Kostas is saying, this multi-stage pipeline that's going through a bunch of different inputs and outputs into LLMs, ended up making the right decision for me? That's really what is top of mind for them. And these eval platforms that have come out, there are lots of them, and they all have different flavors and different angles.

Yoni Michael [00:57:31]: I would say they're a very important part of building trust in your AI pipelines. But really, if we have these multi-stage pipelines running in production, we need traceability all the way up. How can you traverse up from the final decision to what the input was for getting there, and what the stages before that were? How do we provide visibility and observability to AI leaders to really build confidence and understand which step of the pipeline was actually the wrong one? Did the model not perform well, which then propagated down to the next stage? Same thing for AI agents, right? They have ten steps that they need to go and do. Which step of that AI agent's workflow was not good? Do I need to go and tweak the prompt, do I need to go and modify and iterate on it? So that's where I think the next stage is. It's for agents, and also for these multi-stage production pipelines that we think are very important for businesses to run on.

Yoni Michael [00:58:33]: And you need to provide them that level of confidence. You can do things like surface the confidence scores that models have, but what Kostas is saying is true: you need to have all of the outputs and the reasoning behind these models and how they made decisions, and give people a really nice interface and tooling to be able to go and review it very quickly and know what they need to iterate on in order to get that output data quality to where it feels good. Maybe it's not 100% of the time that they have confidence, but when they're running these things in batch at scale, if we can get 99% confidence that the output for this mission-critical pipeline is good, that's probably good enough. But there's tooling and infra that needs to get built for that.

Kostas Pardalis [00:59:13]: Add something here because I think that is relevant also with the previous generation, let's say, of data infrastructure. So a term that every data engineer and probably every data practitioner is familiar with is data lineage. Right. So everyone's thinking a very important part of ensuring the quality of our data is keeping track of the lineage. Like how, okay, I have this end result here, this report, how these reports came into what it is when there are literally hundreds of tables that we have to operate on to get that. And it appears that when we are working in a fully deterministic world, just keeping track of the column level is enough, because knowing the data type is enough to reason about what is happening. Right. And it fits also well, like with the column number, let's say, nature of these systems, the YOLAP systems that we have, but you can't do that anymore.

Kostas Pardalis [01:00:19]: Small differences in the input, and in the prompt that you add, which you didn't have before as an additional piece of data, can dramatically change the output that you get on the other side. So what you need now is more like row-level lineage, which is a very hard problem. It hasn't been developed, primarily because it was hard and it wasn't needed enough for people to invest in it. But that's something we are also working on ourselves, and I think it's part of how you can create traceability there, which will completely change the quality of the evals that you are doing.

Demetrios [01:01:11]: I'm not sure I fully understand row-level lineage.

Kostas Pardalis [01:01:15]: So row-level lineage is this. Let's take again the example of the transcript. The output of your Whisper model is going to be a blob of text, right? Now you might do a few things: break it down into pieces, chunk it, and so on. The next step is that each one of these chunks, each chunk being a row, is going to be fed into an LLM with a prompt that says: do you find any references here to the names of the participants that were mistakenly transcribed, and if yes, fix them. So your first row goes into the LLM, the second does the same, the third does the same, and so on. Then the next step is that you are going to take each one of them again and create a summary of each chunk. So you see that each row goes through steps of processing, but because of the differences in the data that you feed at the row level, and how the prompt might change for each one of them, you might have different results.

Kostas Pardalis [01:02:34]: So you want to be able to track that, something that you didn't need to do before.

Yoni Michael [01:02:39]: Yeah, like a tree traversal exercise. You wind up in this leaf node, but what were the nodes before that led to it? That's what you want to traverse back up this multi-stage pipeline and see.

Kostas Pardalis [01:02:54]: So if, let's say, you have your end summary, right? Because at some point you get all these mini summaries, you put them all together into an LLM, and the output is your summary. You see the summary and you're like, no, I don't like that. So now you want to go backwards: okay, what data contributed to this particular output that I have here? And then you say, oh, it's these five mini summaries. You want to be able to track these five and recall them, so that you, as the human, or whatever eval machinery you are using, can have access to this particular data. With the lineage as it was before, you can't do that, because that lineage is just keeping track of the metadata of the columns that participate. You can't do the same thing here, because the actual data has a big effect. It's not just the data type; it's not that a join breaks because I was expecting an integer and it was a string, for example, something like that. Okay, I'm just making things up now.

Kostas Pardalis [01:04:06]: It's much more that you have to get into much more detailed views of the data itself to understand how the LLM at each stage operates, to give a good or a bad result. You have to be able to navigate that, and the data infrastructure does not keep track of it, so it's not information that you can recall. And that's one of the things I'm saying is a big problem to solve, because it's not just the evals from the point of view of how to scientifically do an eval that has validity. It's also how to capture all the data, which adds overhead; store all the data, which adds overhead; and process all these evals, which now explode in terms of the number of evals that you have to do, right?
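As a rough illustration of the row-level lineage idea, here is a small Python sketch in which every row produced by a step keeps pointers to the rows it was derived from, plus the prompt that produced it, so a bad final summary can be walked back to the exact chunks and prompts that fed it. The class and field names are illustrative only, not a description of Typedef's system.

```python
# A minimal sketch of row-level lineage, assuming hypothetical Row/Lineage types.
from dataclasses import dataclass, field
import itertools

_ids = itertools.count()

@dataclass
class Row:
    step: str           # e.g. "chunk", "mini_summary", "final_summary"
    text: str           # the actual data, not just its type -- that's the difference
    prompt: str = ""    # prompt used to produce this row, if any
    parents: list[int] = field(default_factory=list)
    id: int = field(default_factory=lambda: next(_ids))

class Lineage:
    def __init__(self) -> None:
        self.rows: dict[int, Row] = {}

    def add(self, row: Row) -> Row:
        self.rows[row.id] = row
        return row

    def trace_back(self, row_id: int) -> list[Row]:
        """Walk parents recursively: from the final summary back to raw chunks."""
        row = self.rows[row_id]
        ancestors = [row]
        for pid in row.parents:
            ancestors.extend(self.trace_back(pid))
        return ancestors

# Usage: chunks -> mini summaries -> final summary.
lineage = Lineage()
chunks = [lineage.add(Row("chunk", t)) for t in ["part one ...", "part two ..."]]
minis = [lineage.add(Row("mini_summary", f"summary of {c.text}",
                         prompt="Summarize this chunk.", parents=[c.id]))
         for c in chunks]
final = lineage.add(Row("final_summary", "combined summary ...",
                        prompt="Combine these mini summaries.",
                        parents=[m.id for m in minis]))

# If the final summary looks wrong, recall exactly the rows that contributed.
for ancestor in lineage.trace_back(final.id):
    print(ancestor.step, "->", ancestor.text[:40])
```

Unlike column-level lineage, each record here carries the actual text and prompt, which is what makes the storage and processing overhead Kostas mentions a real engineering problem.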

Demetrios [01:04:52]: What you're talking about is the logs, and this data lineage that we have is not sufficient. It's not painting a good enough picture for us, because even if we know that, yeah, this call went through successfully, or this data was transformed in this way, we can't get the granularity that you're talking about. So do you not see that as the job of an observability tool?

Kostas Pardalis [01:05:24]: It's the same thing as we had before. If you think about the data platforms, right, you have something like Databricks or Spark that would go and actually execute whatever logic you deploy there. Then you most probably would have another tool that analyzes the lineage or does the quality checks, whatever. Now take the QA of data and the lineage there as inputs and substitute the names: evals instead of QA, and the lineage still has to be there. I think one of the problems is that we've been building applications around LLMs primarily for chat modalities. So of course when you do that, it's all about: that's the prompt of the user, that's the output, is this a good one? But when you start getting into an environment where the interactions with the LLMs for an outcome become much more complicated and have dependencies between them, you have to expand your understanding to the whole pipeline that is built to do that. You might need a different tool to do that, but still, this different tool needs access to data that the engine doing the processing has to capture, which on its own is a hard problem.

Kostas Pardalis [01:07:04]: And then it will face the same problems that observability in the app world has, which is that there's a lot of data and the value per piece of data is not that high. So we have to be extremely good at storing this data to make it affordable. And then you have the additional problem of evals being slow and expensive. So again, how do we pick the right evals to do, and what kind of tooling do we have to give to the users to build the Splunk of LLMs at the end of the day, right? I don't know. I'd love to see how this is going to come out, because I think it is a pretty lucrative space to build in, although, as with any other LLM-related activity, there are thousands of companies trying to do it.

Kostas Pardalis [01:08:05]: But I still believe there's a lot of noise and not that much signal. And again, what I'm trying to say is that I think there's tremendous value for the data people to go and build solutions for LLMs, because right now it's driven primarily by application engineering people, and there's a lot of foundational stuff that comes from the data world that is needed if we are going to be building with LLMs.

Demetrios [01:08:34]: And I wonder, in your time talking with folks: it feels like right now, because there's so much open space and so many new pieces that we're trying to add to our platform and our ecosystem of putting this into production, there are a lot of things that you could do. Where have you seen people focusing, and what absolutely needs to be done before we can do anything else? What are the main bottlenecks? Do I need to go out and get an evals tool? Do I need to go out and get a proxy, or an AI gateway?

Yoni Michael [01:09:08]: Yeah, the eval tools are important, right? You need that feedback loop to help you understand if you're building effective AI. If the output of what you're doing isn't what you're expecting, why isn't it? These eval tools provide you that feedback loop. Something that I always think about from the tabular and structured data world is that engineering teams were really good at building canary builds, for example, that go back and try to sense any sort of regression in pipelines that are more or less deterministic, because you're dealing with structured and tabular data. So we'd run these nightly canary builds and the output would be, oh, there's 4% or 5% drift, because you introduced some regression by adding application code that actually caused drift from what we were expecting the output of these pipelines to be. That same kind of mindset needs to be applied and thought about now. And that's where the opportunity is: how do we take that same concept and allow teams to build these kinds of canary pipelines on top of the output of non-deterministic models? That's a very big challenge, because the nature of the data is totally different.

Yoni Michael [01:10:25]: You're dealing with lots of text, right? These evals are very expensive to store, process, and build insight on top of. Now, the more complex the problem, the bigger the opportunity from an engineering standpoint, I think, and the more fun it is to go and solve it. That's why I think it's kind of wide open for helping teams build that. Like, imagine the transcripts example: you're able to assign certain scores and confidence levels to the output of the pipelines that you ran on a daily basis, based on certain properties that you consider to be successful outcomes of the pipelines, right? Which is what canary builds were built for.

Yoni Michael [01:11:03]: It's a very deterministic way of being able to evaluate your software and your pipelines. So applying that same concept to unstructured data processing is, I think, going to be very important and a huge unlock for how AI and data teams feel confident about putting these in production and leveraging inference as the new transform in production.
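Here is a hedged sketch of what such a "canary pipeline" for a non-deterministic LLM workflow could look like: re-run a fixed set of canary inputs every night, score each output against properties you consider a successful outcome, and flag drift against a stored baseline. `run_pipeline` and the property checks are hypothetical placeholders chosen for the transcript-summary example, not a real API.

```python
# A minimal sketch of canary-style drift detection for LLM pipelines,
# assuming a user-supplied `run_pipeline` callable and illustrative checks.
from typing import Callable, Mapping

def property_pass_rate(outputs: list[str],
                       checks: Mapping[str, Callable[[str], bool]]) -> dict[str, float]:
    """Fraction of outputs that satisfy each property check."""
    return {
        name: sum(check(o) for o in outputs) / len(outputs)
        for name, check in checks.items()
    }

def canary_run(canary_inputs: list[str],
               run_pipeline: Callable[[str], str],
               checks: Mapping[str, Callable[[str], bool]],
               baseline: dict[str, float],
               max_drift: float = 0.05) -> dict[str, float]:
    """Nightly job: re-run the pipeline on fixed inputs and report drift."""
    outputs = [run_pipeline(x) for x in canary_inputs]
    rates = property_pass_rate(outputs, checks)
    drift = {name: baseline.get(name, 1.0) - rate for name, rate in rates.items()}
    for name, d in drift.items():
        if d > max_drift:
            print(f"ALERT: '{name}' pass rate dropped {d:.1%} vs baseline")
    return drift

# Example property checks for the transcript-summary use case (illustrative).
checks = {
    "non_empty": lambda s: len(s.strip()) > 0,
    "reasonable_length": lambda s: 200 <= len(s) <= 2000,
    "mentions_speakers": lambda s: "Kostas" in s or "Yoni" in s,
}
```

The property checks here are deliberately cheap and deterministic; in practice a team might add an LLM-as-judge score as one more check, which is where the storage and cost concerns raised above come back in.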

Kostas Pardalis [01:11:24]: Everyone would like to do something with AI, right? Everyone's like, oh, okay, we should invest in that. Why? I think that's one of the big reasons behind having the role of the head of AI: there is a mandate from the board or whatever that we need to do things with AI, but we have no idea what to do.

Yoni Michael [01:11:42]: But it's a race because all of our competitors are doing it. We need to be on that too.

Kostas Pardalis [01:11:46]: So we'll bring this poor guy in whose job is to go and find use cases. So he will go, as Yoni said, to the marketing folks and be like, hey guys, how can I help you? I have budget, by the way, so that's great, right? You don't, so let's do it. Salespeople, the same thing; engineering, the same thing; product, the same thing. And then they have to go and build something. So the first thing is that many companies are still at the stage of: from all the different things that we can do, where should we focus, which project should we run? And these are typically companies that are early in the journey. Then you have cases where they've built the demos.

Kostas Pardalis [01:12:37]: But it's hard to deliver value consistently to whoever the stakeholder is, because again, salespeople don't care about your shiny technology. If you are going to help them, you have to help them. And you are talking about business lines here that are as quantitative as they can be. It's like: I have a quota, dude, or I'm going to lose my job. So can you help me with my quota or not? That's it. If you help me 30% of the time, I have to think: is that worth the time of doing it? I don't know.

Demetrios [01:13:10]: And potentially you can mess up what I'm doing 30% of the time, because you're sending the wrong email or you're sending some hallucinated jargon.

Kostas Pardalis [01:13:19]: And then, we had some interesting conversations with folks who were saying, well, we ran a few experiments, and our experience was... I'll give an example. Let's say we have support tickets, and we want to be able to, first of all, label them somehow into categories, and then extract some information from them that can drive our product decisions. What they usually do is find a company that provides labeling with LLMs, or whatever, as a service. They go there, they're like, okay, I have to take all my tickets, upload the tickets here, run this thing. Something comes out.

Kostas Pardalis [01:14:05]: Well, it's not exactly what I expect.

Yoni Michael [01:14:07]: Kind of a black box.

Kostas Pardalis [01:14:09]: Yeah, so iterate, iterate, iterate. As they put it, we did a lot of prompting gymnastics to make it work. And one of the very interesting pieces of feedback was this: when you are doing classification, you still have the problem that you have to tell the LLM what the classification scheme is, or use an LLM to figure it out, but you still have to figure out what the classes are. It's not an oracle that will just come out and say, yeah, that's what you should do, and shut up, you don't have an opinion, human.

Kostas Pardalis [01:14:41]: So we did that, it took some time, and we ended up getting an output which was a dataset in CSV that we could download. And I was like, okay, that's too much work. Because here's the thing: when you get your labels, that's when the real work starts, right? You still have to go and figure out insights from these labels. So the guy was like, okay, I can't do that. I can't be in a process of downloading, uploading, getting CSVs, putting them somehow into my Snowflake, and then also having an analyst go and analyze these things. Okay, that's not going to work.

Demetrios [01:15:25]: Extra work.

Kostas Pardalis [01:15:26]: Exactly. So I think it's probably more of a product mistake. But again, people need to understand that they need to meet the users where they are and not try to force them to do things that do not fit their workflows. Because a product person is a product person. Again, they're getting judged by the business based on the product work they are doing, not the labeling they are doing.

Demetrios [01:16:01]: Yeah, it's potentially a huge distraction.

Kostas Pardalis [01:16:03]: Oh yeah, it is, it is. And then you have the companies that manage to get to the point where they have things that work, but then they're like, okay, how do we put these things into production? Which on its own is a big, big conversation: what that means, and what risk it puts on the project itself, because the people on the other side are waiting for results, right? And then there are very few companies, usually either Fortune 20 type companies or very Silicon Valley high-tech companies, many of which, by the way, say they are AI but are not really AI, right? But some of them are doing it; they put things in production, seriously. But we are talking about maybe tens or low hundreds of companies out there that have successfully done that.

Yoni Michael [01:16:56]: And the other thing we see too, the common theme, is everyone gets wide-eyed when the head of AI comes to them and is like, what can I build for you with AI? Just tell me, I have a whole team, we'll go and build this, it's going to be great. I got budget, I got everything going, whatever you want, we'll top it off with a cherry for you. And they'll come up with these ideas, they'll create their AI roadmap for the quarter, the team will go and execute on it. And then he turns around and says, yeah, we built it, we spent the time on it, we delivered it to them with a cherry on top and a nice bow, and they don't use it. So I think a lot of it also has to do with the fact that you really need to go in and identify the high-value use cases that are business critical for you.

Yoni Michael [01:17:38]: If you build something that's kind of supplementary and not easy to integrate into the day-to-day workflows of, say, a product marketing manager, they're probably not going to use it. So I think it's very important, when you're considering and taking on new AI projects, that the business outcomes are clear and that there are key success metrics, right? Content moderation, for example, is another big example besides the transcript thing. These companies building communities where users are posting content are spending tons of money on content moderation teams, which means humans are going through and reading every single message in order to understand whether the message was safe to be published into the community. It's all about safety: if there's any racist undertone or sexism, they immediately want to disqualify it, and profanity, things like that too, right? And so when you have a use case like that, and you're spending hundreds of thousands, sometimes millions of dollars on hiring a workforce that's literally sitting there reading messages manually, now there's a business outcome associated: how can I reduce my cost? Because that model doesn't scale. Now let's say I want to expand to different locales. I want to offer this in Portuguese, and I want to offer the same community in Spanish or French.

Yoni Michael [01:19:13]: I have to go and build out content moderation teams that are French speaking and all of that, right? LLMs are great at that kind of use case too, and the value is very apparent there. No longer do I have to go and spend millions of dollars hiring these folks; I can dedicate that cost towards inference, build really robust pipelines that help me do content moderation, and even get better performance outcomes and metrics. My mean time to review a message or a conversation is no longer eight hours because I need to wait for the content moderator shift to start in France, or whatever it is. So I think that's one of the key lessons we're seeing a lot too: as a company thinking about building AI use cases, make sure that there are key success metrics and business metrics that you're targeting and that you're able to track as you're putting AI into production.
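As a rough sketch of the content-moderation shape described here: each message gets classified by an LLM, and only low-confidence or flagged items are routed to a much smaller human review queue, which is where the cost reduction and faster review time come from. `classify_with_llm`, the labels, and the threshold are all hypothetical placeholders, not any vendor's API.

```python
# A minimal sketch of LLM-assisted content moderation with a human fallback,
# assuming a user-supplied `classify_with_llm` callable.
from dataclasses import dataclass
from typing import Callable

LABELS = ["safe", "hate_speech", "sexism", "profanity", "other_unsafe"]

@dataclass
class Moderation:
    label: str
    confidence: float  # model-reported or calibrated score in [0, 1]

def moderate(messages: list[str],
             classify_with_llm: Callable[[str, list[str]], Moderation],
             auto_threshold: float = 0.95) -> tuple[list[str], list[str]]:
    """Split messages into auto-published and human-review queues."""
    published, review_queue = [], []
    for msg in messages:
        result = classify_with_llm(msg, LABELS)
        if result.label == "safe" and result.confidence >= auto_threshold:
            published.append(msg)      # publish without human review
        else:
            review_queue.append(msg)   # a human moderator takes over
    return published, review_queue
```

A design choice worth noting: because the same model call works across languages, expanding to a new locale changes the prompt and the review queue staffing, not the pipeline itself, which is the scaling argument Yoni is making.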

Demetrios [01:20:11]: I heard it as an X and Y axis, and I think the guy's name was Sergio. He said it when I was at the Gen AI conference in Zurich. Basically he said: put the impact on one axis and the confidence that you can actually implement it on the other, and whatever ends up highest in that top right quadrant, that's what you should start with.
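In code terms, that prioritization is just a ranking over two scores. Here is a tiny sketch with made-up project names and numbers, purely for illustration.

```python
# A small sketch of impact-vs-confidence prioritization; the candidates and
# scores below are invented for illustration only.
candidates = [
    {"name": "support ticket triage", "impact": 8, "confidence": 7},
    {"name": "content moderation",    "impact": 9, "confidence": 8},
    {"name": "auto-generated emails", "impact": 5, "confidence": 3},
]

# Rank by impact * confidence; the top of the list is the top-right quadrant.
for project in sorted(candidates, key=lambda p: p["impact"] * p["confidence"], reverse=True):
    print(project["name"], project["impact"] * project["confidence"])
```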

Yoni Michael [01:20:39]: And the confidence that you can implement it comes down to the tooling and infrastructure that you have to do it, right? And that's where I think we're seeing a lot of innovation going now. It's helping build that confidence for teams to be able to say, I know I can put this into production.

Demetrios [01:20:53]: And this is a very common.

Yoni Michael [01:20:54]: Now the only other question is: is there business value? And that's potentially up to the stakeholder to decide, right?

Kostas Pardalis [01:21:02]: Yeah, there's something else that I want to add to that, because we see that having budget is kind of a blessing and a curse at the same time for these teams, especially in the market as it is right now. One of the patterns we see is that, usually, the head of AI is not necessarily a technical person themselves, so they rely a lot on their teams to find the tooling they need. And obviously there are a lot of offerings out there. We are still at the stage of the industry where there are so many verticalized, very specific tools that do one thing, and they're pretty much a black box. They have money, so they go and get that stuff. And it's good, because in a way it helps you kickstart your projects. But there's a huge trap in that: you can't engineer systems with black boxes. You can't do that.

Kostas Pardalis [01:22:05]: And people need to understand that no matter what LLMs are, there's true engineering that needs to happen to make them robust and deliver value at the end. So my advice is that you should invest more in the infrastructure and in your knowledge of how to build the right things and what practices will get you there, instead of saying, oh, you know what, I need to OCR something here, let's go and use every black-box OCR thing out there that claims to be 5% better than Mistral or whatever. Do that for prototyping, 100%. But I guarantee you that if it's successful, you'll get to a point where you'll be like, okay, now what do we do with this thing? Because it's either too slow, or it's not reliable, or now we are getting outputs that we can't really understand why we are getting, or now we are getting into a different use case with different documents where the models they are using are probably not as good as they used to be. So now are we adding another tool that we have to manage, and who's going to do that?

Kostas Pardalis [01:23:18]: So I think engineers should keep thinking like engineers: invest in tools that are, let's say, good infrastructure for their work, and build the value on top of them. And one last thing on that: there is a reason that this AI revolution goes through the engineering practices first. The reason is that the work we are doing as engineers is easy to validate, right? If I ask the LLM to build me a function, you can almost automate figuring out if it's working or not: you just run it, you compile it. Now there are many super impactful problems to be solved with LLMs where you don't have that. If you are doing the things that Yoni was describing with the transcripts, how do you validate with 100% confidence that this thing is going to work? Data. And that's why, again, I'll go back to the data practitioners and how important their knowledge and experience is to making this successful, because they know that data drifts.

Kostas Pardalis [01:24:37]: That was always the case. There's nothing that you can just build once, put out there, and have it work forever. And that's going to become even more true with LLMs. So you have to build systems. You can't just throw black boxes in there and make things work. You have to be an engineer, you have to engineer this, and you have to keep iterating on it as the data drifts and as the needs of the users drift. And with LLMs, this is going to happen at a much more accelerated pace. So again, focus on core skills and infrastructure.

Demetrios [01:25:15]: I want to connect the dots on two things that it feels like we're dancing around. That business value, and the skill of being able to sniff out that business value and understand how to properly implement it, is one thing. But then, going back to 2020, I've heard almost everyone who has presented at an MLOps community event talk about, in some way, shape or form, in different words: I built the best model I could build, with the highest accuracy score, I spent five weeks tuning it so it went from 98.1 accuracy to 95 or whatever that metric is, but then I gave it to the people who were going to be using this model and they didn't use it, and it fell flat on its face, and all of that time I had spent on it was for nothing. And so the whole idea of making sure that what you're building is the right thing, and that you're getting it into production as quickly as possible to know if it is the right thing or where you need to tune it, is so important. And that's what you're saying, Kostas: get it in there, start working with it, engineer it in a way that you can then go and debug it when you need to figure out why it's not working, because we've absolutely missed the mark on the product, or we missed the mark on one of the steps in this pipeline.

Yoni Michael [01:26:52]: Yeah, it's not about the models anymore. The models are going to keep getting better, keep becoming more accurate, probably cheaper and a lot more performant, with less hallucination, all that kind of stuff. It's all about the infrastructure you have around them now, and your ability to have that really tight feedback loop, to be able to know and build confidence around the outputs of the pipelines that you have, and to be able to trace all the way back, to iterate and make improvements incrementally. That's where we are now in the AI innovation space. There's lots of great innovation happening towards that, and I think in the next couple of years it's going to be a lot more prominent, where we're starting to see teams...

Yoni Michael [01:27:33]: ...feel, like you were talking about with the X and Y axis, that the confidence level is going to go way up because of the infrastructure and tooling that's being produced now. I want to give a quick shout out, Demetrios: to me, you're the ultimate community builder. Kostas and I are always super impressed and admire how you just seem to be everywhere all at once, and we're like, oh my God, I don't know how he does it. It's amazing seeing the rise of the community, how you've been able to grow it, and all the conferences that you're doing. So, lots of respect, man. It's been really fun over the last few years seeing all of it happen.

Demetrios [01:28:12]: I'm glad that we got to make this happen.

Yoni Michael [01:28:14]: Yeah. And thank you, a big thank you, for making Typedef and our company happen. One of the things that's super hard when you're building a startup is finding co-founders, and you know, we're both second-time founders, so we put even more emphasis on it. So we don't take for granted the fact that you made us meet up at some random Blue Bottle in San Francisco and then, five minutes into the conversation, decided that you had better things to do.

Demetrios [01:28:39]: It rarely works, but this time it did. So I'm going to put that on my resume now.

Yoni Michael [01:28:44]: Yeah. So thanks for everything, man.

Kostas Pardalis [01:28:48]: Communities have always been very important in building technology, and they always will be. So that's another kind of service that you provide, bringing people together, especially when you are solving problems that are not even well defined. At the end of the day, it's all about patterns emerging through people who are passionate about what they are doing interacting and trying to find solutions. So that's probably the most important thing.

Demetrios [01:29:21]: And I think it's beautiful too that people come from different backgrounds: the data engineering background, the modeling background, data science, the SRE background. And in this space, specifically this space, getting to see how each one of these folks is attacking the problem is really cool, and it makes for fertile ground for innovation.

Kostas Pardalis [01:29:44]: Oh, 100%. And I think if we want to succeed, we need to somehow increase the cross-pollination of these communities. And that's your job to do, obviously. But there is tremendous value in bringing these diverse engineering disciplines together. Because here's what is very interesting, and kind of why I'm excited about LLMs: we talked a lot about my age, but LLMs, in a way, make me feel young, because they remind me of how technology was when technology was young. People complain today that, oh, this thing is not reliable, blah, blah, blah. But they forget that getting our databases to be transactional, and probably kids today don't even know that there are transactions that ensure that when you write something, and I write, and Yoni writes, it's going to be the correct thing, took decades of research and development.

Kostas Pardalis [01:30:51]: So we are again at this early stage of a new, potentially very transformative technology. And it feels nice. I mean, it's not easy, but you get back to how it was, hacking with networks around, networks not being reliable, not having fiber at home like each one of us has now, without even having to think that there is a router somewhere. And one of the bad things that SaaS did was that it managed to hide from the vast majority of engineers out there the complexity and the effort it takes to make things reliable. But it was always like that: there was no technology that was reliable the day it was introduced. It took a long time to make it reliable. The same thing will happen with LLMs.

Kostas Pardalis [01:31:45]: Right? And that's where engineering comes in. But we are not like in 1995 anymore, we are in 2025. And there's so much experience with all these different disciplines and bringing these people together can really, really accelerate and make the things that took decades to happen before now happen in just a few years. So go out there, bring them together.

Yoni Michael [01:32:09]: It's top of mind in innovation, especially being here in the Valley, thinking about AI and infrastructure and the new world that we live in now. But building as a community is also very important: having the community, but also being able to contribute together and innovate together. So part of our launch for Typedef is that we're open sourcing one of our libraries, which is really great at helping people build in, let's say, Jupyter notebooks and interface with LLMs very nicely. That's one of the things I think is a huge help in moving the pace of innovation forward: once you have a project, and multiple projects, and everyone's contributing, there's a lot of excitement, and it helps build a lot of momentum in the space.

Demetrios [01:32:53]: Who's going to be the pets.com of the LLM bubble?

Kostas Pardalis [01:32:58]: Okay, I'm not going to say like a name.

Demetrios [01:33:02]: No, that's not fun then.

Kostas Pardalis [01:33:03]: But I do think that the prompt website companies are going to have a rude awakening. Yeah, I think so. And I think we need them; we need the pets.com of this world to happen. But you know, if you reflect back on the dot-com bubble and what happened back then, it was extremely verticalized solutions built for pretty much everything, and then people realized that we need platforms.

Demetrios [01:33:44]: And then Amazon, which is basically pets.com, but more.

Kostas Pardalis [01:33:48]: Yeah. And you have Shopify, and you...

Yoni Michael [01:33:51]: Have, you have Chewy though too. That's doing pretty well.

Demetrios [01:33:54]: Oh yeah.

Yoni Michael [01:33:54]: Wait, are they Amazon? No.

Demetrios [01:33:56]: No, I don't think so.

Yoni Michael [01:33:57]: Anyways, I digress.

Demetrios [01:33:59]: Yeah. My pets.com is whoever has that fucking billboard that says don't hire humans. That is on the... whoever those guys are. There are so many things that you have to do to get a billboard, and the fact that it got so many marketers and top-level people to sign off on that billboard... I don't even know who it is, but I know that I don't like them.

Kostas Pardalis [01:34:25]: I don't know. I personally feel that anything that feels too easy cannot be real. I might be wrong; I'd love to be wrong, for myself at least. But problems that are valuable tend to be hard. You need to put in effort, you need to work hard to make them successful. So there's no easy path. It's not like, just because you're doing LLMs, you are going to be rich.

Kostas Pardalis [01:34:55]: Like it doesn't work like that.

Yoni Michael [01:34:56]: I think there are tons of eval companies out there now too, and I think they're solving very hard problems. But we might start seeing some consolidation there too, as you think about the main observability players and the big players out there, like the Datadogs of the world, that are also very much thinking about how they integrate into AI. So that's going to be an interesting thing to see. They're still cropping up; there are lots of different eval platforms out there. But the ones that are solving the hardest problems, kind of what Kostas was describing, I think are the ones that are going to be able to really stand on their own and substantiate themselves.

Yoni Michael [01:35:33]: But it's a very important part of the AI lifecycle, the problem set that they're going after. I don't know that we need hundreds of them, but it'll be interesting to see what happens in that space generally and to track that over time.

Kostas Pardalis [01:35:47]: Yeah. Although I think the good thing with the eval companies is that building an eval company requires a baseline of technical competency, so at the end of the day the teams that build something will have some kind of valuable exit, let's say, or at least the value is not going to be destroyed. There are companies that will destroy value. There are companies that will end up like... what was that company? It wasn't a dot-com era company, it was much more recent. The one that was doing the one-click checkout.

Yoni Michael [01:36:24]: Oh, Fastly? Or Fast?

Kostas Pardalis [01:36:26]: One of those. The one that had, like, the record in burn.

Demetrios [01:36:31]: It wasn't.

Yoni Michael [01:36:32]: No, Bolt was the one... the other one that was doing well. I think it was fast.com, or Fast.

Kostas Pardalis [01:36:38]: Yeah, the one where they broke the record for how much money they burned in a year or something like that.

Demetrios [01:36:46]: And the founder went on Twitter and was saying stuff about Stripe and how it was the mafia.

Kostas Pardalis [01:36:52]: Yeah, of course. There's always someone to blame if you want to. But still, there's value that has been destroyed, right? So there's always that in the industry. I think it's part of any fast-paced and high-reward space. It doesn't mean that everyone is a scam or anything like that.

Yoni Michael [01:37:17]: Will you have us on again, Demetrios?

Demetrios [01:37:19]: Or is this like a one-time thing?
