Iceberg, MCP, and MLOps: Bridging the gaps for Enterprise // Mini Summit #11 // Snowflake
SPEAKERS

Caleb is a Principal Architect for AI/ML in the Applied Field Engineering office at Snowflake. He is the global field lead for the Snowpark and Snowflake ML ecosystem. His background is in theoretical mathematics and philosophy. Prior to Snowflake, he was a practicing data scientist and machine learning engineer primarily in the defense and intelligence contracting industry. Caleb is based out of Denver, CO.

Hamza Tahir is a software developer turned ML engineer. An indie hacker at heart, he loves ideating, implementing, and launching data-driven products. His previous projects include PicHance, Scrilys, BudgetML, and you-tldr.
Based on his learnings from deploying ML in production for predictive maintenance use-cases in his previous startup, he co-created ZenML, an open-source MLOps framework to create reproducible ML pipelines.

Simba Khadder is the Founder & CEO of Featureform. After leaving Google, Simba founded his first company, TritonML. His startup grew quickly and Simba and his team built ML infrastructure that handled over 100M monthly active users. He instilled his learnings into Featureform’s virtual feature store. Featureform turns your existing infrastructure into a Feature Store. He’s also an avid surfer, a mixed martial artist, a published astrophysicist for his work on finding Planet 9, and he ran the SF marathon in basketball shoes.

SUMMARY
Best Practices for Enterprise-Grade MLOps and Governance with Snowflake
Catch this ML expert-led session to learn the best practices for designing and managing enterprise-grade ML development and production systems at scale. You'll learn how Snowflake ML makes it easy to rapidly develop and prototype new ML projects while ensuring production ML systems deploy and operate in a secure, governed manner.

Laying the Foundation for Enterprise MLOps: Workflow Orchestration with ZenML
Effective ML orchestration is the foundation of successful enterprise AI systems, connecting data processing, training, and deployment into reproducible workflows. This session explores how ZenML provides the critical pipeline infrastructure that enables teams to standardize their ML processes while maintaining flexibility.

Operationalizing Data for Agents and Models with Featureform, MCP, and Iceberg
For years, feature platforms like Featureform have powered classical ML systems: serving features to models, productionizing transformations, and helping ML teams scale. But the rise of LLMs and agentic workflows is fundamentally expanding the surface area of the ML platform, introducing new patterns for how data is consumed and acted on. In this session, we'll explore the next evolution of the feature platform: one that supports both real-time and batch pipelines, bridges traditional ML and agentic systems, and makes data accessible through interfaces like MCP.
TRANSCRIPT
Ben Epstein [00:00:00]: Thanks for joining everybody. Very exciting. We have probably the hottest topic in the industry right now. We're going to talk about MCP, we're going to talk about building production data systems, we're going to talk about Snowflake, we're going to talk about Iceberg, we're going to talk about every word that you've seen on LinkedIn, but how to actually do them in production with real engineers. We have some of the best companies with us today. Super stoked to have everybody on. A huge thank you to Snowflake for sponsoring this particular event. This allows us to bring on people like Snowflake and also open source contributors like Featureform and ZenML.
Ben Epstein [00:00:43]: So we're really excited about this. Thanks all three of you for joining.
Caleb Baechtold [00:00:47]: Yeah, happy to be here.
Ben Epstein [00:00:48]: We're going to try to get everybody to talk and get as many questions from the audience as possible. I'll be monitoring those questions as we go, but we're going to jump into it quickly because we have three pretty good and pretty technical talks today. First we're going to talk about Snowflake, then we're going to talk about ZenML, and then we're going to talk about Featureform and MCP and Iceberg. So yeah, Caleb, why don't you kick us off whenever you're ready, and we won't interrupt, but at the end we'll come in with questions from the audience as they come.
Caleb Baechtold [00:01:19]: Cool. So thanks for having me. Just by way of introduction, I'm Caleb Baechtold. I'm a principal architect for AI and machine learning at Snowflake. My job at Snowflake is basically twofold. One, make sure that our enterprise customers understand how to be successful with AI and ML on top of our platform. And I always joke that the second part of my job is to work with product and engineering to make sure they're building the things that our customers need to be successful on our platform. So today what I wanted to talk through a little bit is the way that we're seeing customers adopt enterprise-grade MLOps practices on Snowflake as a platform, and in particular some of the process and organizational challenges that we see most enterprises struggle with when it comes to maintaining production MLOps infrastructure and systems.
Caleb Baechtold [00:02:10]: My kind of take over my career has always been that MLOps is only maybe half a technology problem; a lot of it boils down to organizational governance processes and things like that. So we're going to talk a little bit about some of the technical capabilities in Snowflake and why they support good MLOps practices, but also some best practices for scaling MLOps across a large enterprise organization. Most customers and organizations that I work with, when it comes to their ML platform and tooling, are in the state that's reflected in this top box here. Obviously at Snowflake we're working with Snowflake customers and prospects, and so Snowflake typically exists as a central data platform in the organization, and the ML platform is usually something separate. Whether that's your preferred cloud service provider's ML solutions or another managed service doesn't totally matter, but this is the state that most of our customers initially start in.
Caleb Baechtold [00:03:07]: And that's because Snowflake was born as a data platform first and has since expanded into the AI and machine learning space. What we typically see more than anything is that customers have built these complex, stitched-together production ML systems that require a couple of different things: pretty complex engineering processes to orchestrate across both ML development and production ML deployments that span a whole suite of various tools, with data moving across a whole bunch of disparate services. And one of the big issues that we see at the enterprise in particular is that this really exacerbates challenges around governance, compliance, auditability, all those kinds of things, which become super important at enterprise scale. So what we try to build on the Snowflake side of things is a consistent experience in terms of tooling that ML teams would be accustomed to from any best-of-breed ML platform. Part of that is embracing open source compatibility and providing compute abstractions to do ML at scale easily, but also a fully integrated and streamlined system, so that you maintain a single governance, security, and compliance lens over top of ML projects, the same way enterprises do that today over top of data. And that includes a couple of different things, even simple things like extending role-based access controls in your data platform to encompass ML processes, whether that's feature engineering and feature store capabilities, or managing model objects and deployments and who can modify, use, and interact with these different capabilities, but also providing full lineage and auditability from source data all the way through to model deployments and model outputs, giving you a consistent view across the entire ML lifecycle.
Caleb Baechtold [00:04:59]: So we built this platform to be fully integrated first and foremost. From the development and ops side of things, the full suite of capabilities interoperates and fully integrates with one another. So if you're building with a feature store on top of Snowflake, then training models, then deploying models and performing inference within this platform, you have full visibility into how those different pieces stitch together and interoperate with one another. But importantly, too, you're not having to orchestrate complex processes to make these things work together. There's tons of bespoke tooling out in the ML ecosystem, and they do very, very specific things very, very well. But what we increasingly see is that as that ecosystem within an organization grows, managing and orchestrating across this multitude and proliferation of different services gets to be very, very complex. Scalability is hugely important for Snowflake all up and for our customers at large. The largest companies in the world are building on top of Snowflake as a data platform, and they're looking for scalable solutions to machine learning as well.
Caleb Baechtold [00:06:06]: So that cuts across, again, the entire model lifecycle, from feature preprocessing and data preparation, to the model training itself with distributed processing, to large-scale inference. There's a scalability question in terms of performance and cost, but also in terms of just managing hundreds of potential ML users and dozens of different lines of business that are working with ML, all these sorts of things. Flexibility is also super important for us. With managed service providers, there's always been a debate around proprietary differentiation versus embracing open source flexibility, and our approach has been to support open source first and foremost, to allow customers and users to build ML models the way they do when they're first learning data science, with all the typical open source frameworks, libraries, packages, et cetera that people are building for ML. We want to embrace all of that community development and leadership that comes out of open source, while providing the compute infrastructure and the governance layers to scale to the enterprise. Cost effectiveness then is obviously super important too.
Caleb Baechtold [00:07:19]: This includes everything from the infrastructure optimizations to run enterprise-grade machine learning at scale cost effectively, but also providing really good cost visibility for platform owners and executive stakeholders to see what the ML footprint is from a cost standpoint too. And trust, this is the big thing. When we talk enterprise MLOps, governance and security are really paramount, and they get into those organizational challenges that I mentioned before: the same way that data and the data lifecycle get governed in enterprise organizations should be extensible to support machine learning, model development, model inference, production pipelines, that entire model lifecycle. The governance and the lineage and auditability and traceability throughout that lifecycle are super important to the enterprise customer as well. There's a whole bunch of features and capabilities in the platform that support end-to-end ML workflows. I'm not going to dive into these in tremendous detail today, but that spans everything, again, from machine learning development to operational deployment and the corresponding compute infrastructure that's needed in the platform to power these things at scale. What I would highlight here from an enterprise MLOps perspective: I mentioned before that with other platforms, your access control policies and governance layer get extremely complex as they span a multitude of different services within the ML stack itself. What we decided to do on the Snowflake side is provide the ML-specific abstractions within the existing governance frameworks that our customers build for their data assets and their data pipelines, to encapsulate all of the ML-specific components that they're building with as well. This includes everything from feature definitions and feature store concepts within the platform: who can create, manage, and modify different feature definitions, what data assets they can build features on top of, and which feature views that have been created in the platform can be consumed by different users or by themselves.
Caleb Baechtold [00:09:25]: This is all managed first and foremost through Snowflake's role-based access control model, and that extends to the Model Registry: being able to create, version, and deploy models, and being able to consume those models, providing a really clear and granular lens into who can do what within the ML platform in the enterprise setting, without being too constraining either. And I'll talk about that more in just a second. So there are specific tools around MLOps, when we talk production-grade pipelines, that our customers are really adopting en masse. Those are of course things like the Feature Store, for supporting scalability, discoverability, and versioning of input features for models, the Model Registry itself, and the model serving capabilities. The single biggest way that customers get started with enterprise ML in Snowflake today is that they're building models somewhere and they need to land inference results into their data ecosystem so that the business can consume those, derive insights, and make decisions. Model Registry makes that very, very seamless by supporting any arbitrary ML framework that's out there and bringing it into a data ecosystem to support large-scale, compliant, governed ML pipelines that produce inference results and ultimately drive some business value as well. As well as observability: doing automated metric capturing, input/output feature drift, and model prediction drift alerting over top of these, so that you have really good visibility into how models are performing in their actual production inference setting.
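For illustration, here is a minimal sketch of what that Model Registry flow can look like in Python with the snowflake-ml-python package. The connection parameters, database, schema, and model names are placeholders, and the exact API surface may vary by package version:

```python
# A minimal sketch of logging and running a model through the Snowflake
# Model Registry; connection details and object names are placeholders.
import pandas as pd
from snowflake.snowpark import Session
from snowflake.ml.registry import Registry
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Authenticated Snowpark session (fill in your own account details).
session = Session.builder.configs({
    "account": "<account>", "user": "<user>", "password": "<password>",
    "warehouse": "<warehouse>", "database": "ML_DEV", "schema": "MODELS",
}).create()

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_df = pd.DataFrame(X, columns=[f"F{i}" for i in range(8)])
model = RandomForestClassifier().fit(X_df, y)  # any supported framework

# The registry lives in a database/schema, so normal Snowflake RBAC applies.
registry = Registry(session=session, database_name="ML_DEV", schema_name="MODELS")

mv = registry.log_model(
    model,
    model_name="churn_classifier",
    version_name="v1",
    sample_input_data=X_df.head(10),  # lets the registry infer the signature
)

# Warehouse-side inference: results land directly in the data ecosystem.
preds = mv.run(X_df.head(10), function_name="predict")
```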
Caleb Baechtold [00:10:55]: But fundamentally, when it comes to the enterprise, there are two core objectives in place. One, get models to production in a way that delivers some sort of business value; ML projects sit on shelves and die when they can't make it to a healthy production state that delivers on some actual business objective. And two, ensure that ML practitioners have the flexibility they need to rapidly prototype, develop, and then promote ML capabilities for the enterprise. One of the ways that we uniquely guide customers to do this in Snowflake today is through very clear environment separation in the Snowflake platform itself. And this can come in two different models. We have customers who operate with multiple Snowflake accounts, where they have sandbox dev accounts, maybe test accounts, and then their prod accounts. But even easier than that is doing environment-level separation within a single account.
Caleb Baechtold [00:11:52]: Because all of these Snowflake ML objects, whether those are feature views, the notebooks themselves for model experimentation and development, the registry, inference results, or the monitors over top of model objects, are all permissioned database- and schema-level objects. So it becomes very straightforward to have a sandbox schema within your Snowflake environment where data scientists have very flexible permission models that allow them to iterate, develop quickly, prototype, and experiment without running into the constraints that they would have, from a security standpoint, in the prod ecosystem itself. Then, leveraging things like CI/CD pipelines, we're basically mapping and promoting capabilities built in the sandbox out to the production database and schemas within the Snowflake environment itself. Not only does this give you rapid prototyping and flexibility, which is important for supporting iteration and allowing ML teams to push the boundary of what they're doing, it also makes sure that what happens in prod is within the compliance and governance constraints that enterprises face, and it gives you really good cost visibility as well, which is hugely important for the enterprise. With most traditional ML platforms, it becomes very difficult to attribute costs associated with ML projects. When you have this enterprise-level separation of dev, test, and prod within the Snowflake ecosystem, it extends not only to the objects that are being built, the models, the features, et cetera, but also to the compute infrastructure that's being used to build them. And so it gives you a really granular way to attach resource monitors and budgets, which are super important for enterprise data platforms and ecosystems, and to constrain the dev sandbox environment in a specific way, or attribute costs in a way that is different from the prod ecosystem itself. It also gives you a way to potentially be cost effective through different SaaS pricing tiers between these.
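As a rough sketch of that sandbox-to-prod promotion pattern, a CI/CD job might re-log a validated model from a loosely permissioned dev schema into the governed production schema. The database, schema, and model names below are hypothetical, and this reuses the Snowpark session from the earlier sketch:

```python
# Hypothetical CI/CD promotion step: the same Registry API pointed at
# differently permissioned schemas. All object names are illustrative.
from snowflake.ml.registry import Registry

dev_registry = Registry(session=session, database_name="ML_SANDBOX", schema_name="DEV")
prod_registry = Registry(session=session, database_name="ML_PROD", schema_name="SERVING")

# Fetch the candidate version that passed validation in the sandbox.
candidate = dev_registry.get_model("churn_classifier").version("v1")

# Promote by re-logging the underlying model object into the prod schema,
# where stricter role-based access controls and resource monitors apply.
prod_registry.log_model(
    candidate.load(),
    model_name="churn_classifier",
    version_name="v1",
)
```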
Caleb Baechtold [00:13:57]: So when you go with a multi-account strategy, for example, you can operate dev at Standard or Enterprise edition while your prod environment may be Business Critical, so you're paying a different price per credit across those ecosystems, which lets you do dev in a cheaper way than what's happening in prod, for example. So this is, I think, the most important thing for us: the capabilities are in the platform, and we give you these governance and security frameworks to manage these assets at enterprise scale, so that developers, data scientists, and ML engineers can prototype and iterate quickly. They're not hamstrung by access to data or things like that. But at the same time, you can govern and manage production deployments in a very seamless, straightforward, and fully integrated way that gives you full visibility, traceability, and lineage across the entire ML lifecycle. I'll just throw out, for folks who are interested in exploring and learning more about this, that we have a handful of different quickstarts, some YouTube demos, blogs, et cetera, that go into a lot more detail around some of the specific capabilities for production ML on the platform. So I encourage all of you to check these out, and we have the QR codes and the links here if you're interested.
Ben Epstein [00:15:09]: Amazing. That was awesome. I'm curious, Hamza and Simba, do you guys have any initial questions?
Hamza Tahir [00:15:16]: Yeah, I mean it looks super cool. I think one of the appeals of having something like Snowflake also orchestrate the MLOps piece of the whole, let's call it the entire data engine, is that your data is very close to your model development practitioners. What I'm curious about is: are there some integrations that come out of the box already? Have you already integrated with some MLOps components? MLflow would be the standard one, I guess.
Caleb Baechtold [00:15:48]: Yeah, yeah, it's a good question. So we do. It's extremely common for our customers, again, to be in a place where they have Snowflake plus other tooling solutions, right? And MLflow is a very common one. Prior to Snowflake, every ML project that I ever built, we were using MLflow; it's ubiquitous out there. Again, I mentioned this, but the way most customers start adopting ML in Snowflake is by bringing models and inference pipelines to the data, because right now they have these engineering processes where data needs to move to some other service and platform just to be able to do fancy math and churn through data, when it makes a lot more sense instead to bring those models to where the data exists and basically translate ML pipelines into typical data engineering tasks. So what we've done with Model Registry is add direct support for basically any MLflow pyfunc spec.
Caleb Baechtold [00:16:46]: So there's native support for all the typical frameworks that most customers would build with. But we have plenty of customers that use MLflow as their experiment tracking engine, whether that's self-hosted or through another SaaS provider, and then basically use Snowflake as a deployment destination for those MLflow models. That's extremely common. We have similar integrations throughout the stack, just to meet customers where they are, where they start to get value out of Snowflake as a platform for the production lifecycle but are still iterating or integrating or experimenting or developing in different toolsets. That's been one of the big things for us, as I mentioned, in terms of open source compatibility and interoperability: we try not to dictate the way that data science and ML development work happens, but to give enterprise customers a seamless way to bring ML into the enterprise ecosystem.
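A hedged sketch of that pattern, tracking in MLflow and deploying to Snowflake, might look like the following. The run ID is a placeholder, the session is assumed from the earlier sketch, and whether log_model accepts an MLflow pyfunc model directly depends on your snowflake-ml-python version:

```python
# Illustrative only: MLflow as the experiment tracking engine, Snowflake as
# the deployment destination. The run ID and object names are placeholders.
import mlflow
from snowflake.ml.registry import Registry

# Load the winning run's model from your MLflow tracking server.
pyfunc_model = mlflow.pyfunc.load_model("runs:/<run_id>/model")

registry = Registry(session=session, database_name="ML_PROD", schema_name="SERVING")

# Hand the MLflow model to the Snowflake registry for governed serving.
registry.log_model(
    pyfunc_model,
    model_name="mlflow_churn_model",
    version_name="v1",
)
```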
Hamza Tahir [00:17:38]: Yeah, that resonates a lot with what I'm going to talk about as well. So that's great.
Simba Khadder [00:17:43]: Yeah.
Ben Epstein [00:17:44]: Okay Caleb, thanks so much. I'm going to go ahead and share Hamza's screen and we'll jump into workflow orchestration.
Hamza Tahir [00:17:53]: Thank you so much for the invite. Always great to be back at the MLOps community. I'm Hamza, I'm the co-founder of ZenML. Today I'm going to be talking about laying the foundations for enterprise MLOps and why I feel like workflows are at the center of it, and I love the Snowflake presentation because it transitions really nicely. A lot of the things that Caleb said, we basically agree with at ZenML, and we have a way of exposing those patterns to our users as well, which I'm going to talk about today. But first, just a bit about me for those of you who haven't met me yet. I come from a software engineering background. I've been doing machine learning engineering for close to a decade now and have worked with hundreds of MLOps platform teams trying to build MLOps platforms for their internal customers, data scientists and data analysts. Based on those learnings, before and even after we built ZenML, we open sourced the framework, and we've been going strong for four years now and have built up quite a community, which you're welcome to join if you're interested in what I say today.
Hamza Tahir [00:18:56]: So maybe I wanted to start a bit with talking about why MLOps in the enterprise is still broken. Because we're talking about things like MCP and GenAI, people seem to have forgotten that MLOps is sort of the foundation of all of those things, or at least a lot of them; the things we learned there, if not directly applicable, can be taken and lifted into your LLMOps stack. I just wanted to highlight a few of the latest figures. One of the things I find very interesting: in the Tecton report from 2023, and I think it got worse from there, 26% of ML teams said that having a coherent MLOps deployment is still a blocker. And 99% of IT leaders want to have things like responsible AI controls, because of things like the EU AI Act, but only 35% feel ready. This is more of a cautionary tale: while, rightly so, it's such an exciting time for GenAI, and startups and medium companies, SMBs, are moving so fast, the enterprise is still yet to catch up to all of those things and is still figuring out its fundamental MLOps stack. It is a multifaceted problem in the enterprise.
Hamza Tahir [00:20:23]: So, for a lot of the things I'm going to talk about today, I want to focus on the enterprise specifically, because I think the enterprise context is very particular. This is really targeted towards ML practitioners and MLOps people who are building for the enterprise. And if you look at the enterprise, the scale dictates a few different problems. We have ML on one side and ops on the other, and you have data on one side and infrastructure on the other. You might have your data lying around on some sort of rack running on-prem, some OCI stuff, some open cluster stuff, maybe a Kubernetes-managed thing. And maybe you're not able to move that data into the cloud to process some of those things, but the GPUs that are available to you are in the cloud. Those sorts of problems really arise at scale, and when you have that mishmash of different facets of problems across the MLOps workflow, then you run into trouble.
Hamza Tahir [00:21:25]: And just maybe a brief summary of some of the things Caleb also mentioned already: heterogeneous compute requirements. As I said, you might have some L4s in one place and H200s somewhere else, and the question is how you get access to those things to your data scientists in a reasonable way. Also, you have to understand it's a human problem. Not everybody knows everything. You have many different departments, spread out all over the world, sometimes globally for big companies, and there are silos, right? So you end up wasting a lot of resources. There's also a legacy of tools that already exist. We're not talking about greenfield, we're talking really, really brownfield.
Hamza Tahir [00:22:04]: So, probably 20, 30 years of legacy, and you have different sorts of data compliance concerns. And at that scale you're also afraid of being locked in, right? You don't want to get locked into one particular stack, one particular provider, one particular infrastructure. And this is the setting where the ML platform teams arrive eventually. Platform teams, whether it's a data platform team, an ML platform team, whatever, their mandate internally is to empower ML teams in this MLOps jungle. Across the entire pipeline, from preprocessing data, maybe running on Snowflake somewhere, to deploying, and again, now that Snowflake offers it, you have these capabilities to deploy the models there as well, you have a lot of steps, and you have a lot of tools and infrastructure that could potentially be used, a lot of data that could be moved, a lot of code that could be moved. And this is where I think the complexity really arises. For a lot of the folks listening, this is not news.
Hamza Tahir [00:23:06]: I just want to emphasize how difficult it still is in the enterprise. So platform teams arrive at this question eventually, most of them: either they have to enforce the stack, or they have to deal with stack sprawl. Either they say you only use, I don't know, Airflow and MLflow and that's it, and nobody is allowed to do anything else, or they start being benevolent and saying, oh, you can use everything you want, we have a distributed setting. And both things are problems, right? But I'm going to talk about the first option first, which I think is maybe not right for most contexts. In the enterprise, I don't think you can enforce a stack. I don't think you can just say this is the one tool, the one stack, because the context is different. It's very different running a real-time streaming service on the edge in a car versus having a classification model or a fraud detection model or a recommendation system model. It's very difficult to have the same exact way of doing those things with different people. Some people come from PhD backgrounds, some people come from software engineering backgrounds. So you have mixed expertise.
Hamza Tahir [00:24:16]: Again, data is disparate, spread out all over the place, and you don't want to be locked into vendors; maybe your business logic remains the same, but you don't want to be locked into vendors. So for me, if I was given this choice today, I wouldn't enforce stacks. I have seen people do that, but at a certain scale that does fail, and people end up bypassing the platform teams a lot of the time. On the other hand, if you go towards stack sprawl, that has its own problems. But I think what we must acknowledge is that there's a certain separation of concerns that I think everybody would still agree with. So for example, perhaps you don't want to moderate giving people access to compute, but you might want to govern who has access to what compute.
Hamza Tahir [00:25:08]: Same with data, right? You might want to govern what data people have access to, in what context, and where it can be deployed. But on the other hand, you don't want to have so many hurdles in your data access that nobody can actually do anything with it, even when it would create tremendous value for the enterprise. Same with tooling. So I think the way to think about it is that there's a certain governance envelope surrounding your machine learning development in the enterprise. And the softer you can make this envelope, like a blanket, the more people feel comforted by it rather than choked by it. That's really the empathy with which I think we need to approach this problem, and that's what we've been trying to do at ZenML for the last four years; we have worked a lot with enterprises in that time. Best described, ZenML is a central framework that standardizes how workflows are deployed on your infrastructure.
Hamza Tahir [00:26:05]: Workflows, or pipelines, are really the center of ZenML. So in a lot of ways it's like a classical workflow orchestration tool, but it adds the stack component on top, and that makes us, I think, a bit different from what you might see in the market. It lies in the middle of machine learning and ops. Oftentimes we end up working with ML engineers or MLOps people, but sometimes data scientists pick us up and just roll it upwards. So whenever you have a problem where there's friction between infrastructure and the internal users who are deploying workflows, I think ZenML can create tremendous value for you. How is that? In ZenML we have this concept, which we codify as stacks. A stack is basically the description of, and the access to, your infrastructure. It could be an Argo cluster running on Kubernetes on-prem.
Hamza Tahir [00:27:03]: It could be SageMaker running on a certain AWS account where the data needs to be stored in a certain bucket. Or it could even be your experiment tracking tool, like Weights & Biases or MLflow, where your model registry models are stored. The platform engineers create, provision, and govern these stacks in ZenML, and the data scientists write workflows that consume these stacks. So here, for example, again, we are classically exposed as a workflow orchestration and pipelining tool: you can see the step decorator with the configuration of the infrastructure in the decorator itself, in this case MLflow. So you can instantiate the experiment and link into it in your code. You're returning a machine learning model. Maybe you want that persisted in a model registry.
Hamza Tahir [00:27:56]: Maybe that model registry is, I don't know, ClearML or something else. And maybe you want this actual step to be orchestrated on a Kubernetes cluster where you have two GPUs lying around, but maybe the next step needs to run somewhere else, right? Those sorts of questions, I think, get easier when you use an abstraction tool like ZenML. The goal is that ZenML handles all the rest: building the Docker images, compiling the environments, deploying these things, creating the experiments in these underlying tools. And that's really where we see a lot of the magic happening. The reason I want to highlight some of these aspects today is that I do think the workflow underpins this whole thing.
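To give a flavor of that step decorator pattern, here is a minimal ZenML pipeline sketch. The experiment tracker name "mlflow_tracker" is a placeholder for whatever component your platform team has registered in the active stack:

```python
# A minimal sketch of a ZenML pipeline; "mlflow_tracker" is a placeholder
# for an experiment tracker component registered in your ZenML stack.
import mlflow
from sklearn.base import ClassifierMixin
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from zenml import pipeline, step


@step(experiment_tracker="mlflow_tracker")
def train_model() -> ClassifierMixin:
    X, y = load_iris(return_X_y=True)
    model = RandomForestClassifier().fit(X, y)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # The returned model is versioned and persisted by ZenML's artifact
    # store, so later steps (or a registry) can consume it.
    return model


@pipeline
def training_pipeline():
    train_model()


if __name__ == "__main__":
    # Runs on whatever stack is currently active (local, Kubernetes, etc.).
    training_pipeline()
```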
Hamza Tahir [00:28:37]: When we thought about workflow orchestration four or five years ago, when we started developing ZenML, these were the things we wanted to enable. We think of ourselves as a framework rather than a platform. I think there's a subtle difference; maybe some agree with me, some don't. A framework for me is more like a Ruby on Rails, or, you could argue, a Next.js: something which is opinionated in certain abstractions but not opinionated in the implementation of those abstractions. And for me, a platform is the concrete implementation of that framework. So for me that's the subtle difference, and that's how we approach this game. So we have an open architecture.
Hamza Tahir [00:29:24]: So, all of the components you can bring in as stack components. Let's say tomorrow you have an HPC cluster that's processing volumetric models somewhere, or a different context, and you want to plug that into your orchestration tooling. You can do that. Maybe you want to have Slurm orchestration for some reason, or maybe you want to store your data in a MinIO cluster rather than an S3 bucket. Those things you can plug in to ZenML, and you can also extend it by adding a few classes inside the program. That way you have strategic flexibility: all those things I talked about, you can switch between them, you're not locked in, and you have governance, right? And I think governance is a big part of the enterprise. I think we saw that in the Snowflake presentation as well.
Hamza Tahir [00:30:09]: That's why we keep seeing roles, access, permissions. These things might sound boring to a startup person who's just training a zillion-parameter model, which is equally valuable, but you can't just do that in the enterprise, when you have regulators and other things that you need to be held accountable for in front of your stakeholders. So, a small example of this: you can register a component, and we support many different types of components. Orchestration is just one. This is an Airflow orchestrator that I registered, and you can specify where the DAGs run.
Hamza Tahir [00:30:45]: You can even have feature stores, model deployers, all those things. We have a lot of integrations that we ship with out of the box, but as it's an open source standard, you can plug into it as you will. So if we don't support a particular experiment tracker, you can bring your own, and then you can combine these things like LEGO bricks inside modular stacks. So here, for example, I just made a particular stack, which I called mystack, with a particular AWS orchestration piece, and I shared it with my colleagues. And this sharing piece is very critical, right? Because you don't want this stack to be accessible to everyone. Maybe you have a pool of 20 H200s that cost you a couple million dollars; you don't want that exposed to all of your data scientists. Maybe you just want to expose it to a certain project for a certain period of time, and it expires afterwards.
Hamza Tahir [00:31:34]: So those things you can start governing through ZenML's interface. Today, many people run ZenML in the cloud and in production, sometimes with single clouds, because even single clouds have multiple stacks, sometimes on-prem, sometimes with a combination. So you can combine those things as you wish. If some teams are using different things, you can plug those in the same way; you still have the same dashboard, the same way of looking at it. And we've seen massive enterprise deployments now after four years, so we have experience dealing with multiple workspaces.
Hamza Tahir [00:32:06]: So here's an example where you might want to give certain accesses, like Snowflake schemas; maybe your Airflow DAG has access to your Snowflake, so you want to give that stack access to Workspace 1. Workspace 2 is maybe in another region, and the third one maybe is Databricks or something, I don't know. Everybody has different clouds, multi-cloud, multi-tenant. And all of your pipelines and your models are structured around that, so you have a clear link between infrastructure and your machine learning development. And then you can expand that across teams, of course. We have had examples of people who have hundreds of data scientists using it, and then you have different teams that you stack together.
Hamza Tahir [00:32:46]: And this is also very interesting because you can really govern in a central way. An example of this is Adeo Leroy Merlin, one of the biggest retailers in Europe, in e-commerce and in the real world as well. They decreased their time to market from two months to two weeks with ZenML and rapidly increased the number of models they have in production, because they ended up standardizing the way they produce, I believe in their case on Vertex, and they're running on Kubernetes. They've standardized the way these things are done, and they've found it very easy to deploy these different workloads, depending on their data contexts, to different stacks. Data scientists have become more autonomous in doing that, and the ML platform team is very happy because they have a central place where they can manage it. And that's it for me. So if you are building MLOps in the enterprise and you resonate with some of the challenges we talked about today, please scan this QR code. We can do a personalized demo for you.
Hamza Tahir [00:33:43]: I'd love to talk to you in any case, even if you don't want to buy the product or use the open source product. I'd just love to get to know you and learn more about these challenges in depth. I always love talking to more platform teams. And yeah, at this point, open to questions.
Ben Epstein [00:33:57]: I have like a side question that's maybe not related to the core of ZenML, but I'm curious. One thing you had said in the beginning was that enforcing a specific paradigm of MLOps doesn't scale. With ZenML, maybe what you're saying is that it helps create a standardization, or different sets of standardizations, across enterprises. In that enterprise space, you showed creating an Airflow orchestrator with what we'd call click-ops. Do teams typically use ZenML in the enterprise with Terraform, or with some set of YAML or scripts? Or are they creating it in a click, UI-forward way?
Hamza Tahir [00:34:38]: Mostly people go with Terraform. The UI click stuff is mostly just to get started quickly. We have a ZenML Terraform provider which exposes our entire API, so if you go to the Terraform registry, you can very easily plug in our provider. For example, one of the things I haven't talked about at all, which I should have now that I think about it, is access management. We have these things called service connectors that broker access to underlying things, like service accounts in GCP, to your end users.
Hamza Tahir [00:35:12]: And you can plug all of those things in, and it's basic HCL, you can use that. And I think this is the most common paradigm. But I have seen other approaches: Eric Riddoch built a Pulumi provider for ZenML. So we have different ways of doing it. Yeah.
Ben Epstein [00:35:30]: Does anybody use ZenML as almost like a, what's the right term, a migration tool from one engine to another? Like, ZenML connects to everything, and a team is on Kubernetes but they're maybe moving off of Kubernetes, or they're on Databricks and they're moving to Snowflake, or whatever. Has anybody used ZenML just as that pipe in the middle: leverage ZenML, abstract out the database, and then just rip out one and put in the other?
Hamza Tahir [00:35:58]: Yeah, yeah, it has happened. I don't think the intent was to migrate when they went for ZenML; they had a future eye on the lock-in thing. But we have seen massive migrations, to be honest, when people have moved between providers. So yes, people have used us like this. I wouldn't say the main purpose of the tool is just to migrate, but I think it does help.
Simba Khadder [00:36:21]: Yeah.
Ben Epstein [00:36:21]: It's very funny. Definitely not the intention, but if you have connectors to everything, it's funny if people would ever use it for that purpose. Simba and Caleb, do you guys have any questions for Hamza?
Simba Khadder [00:36:34]: What's an abstraction that you have that is like a hot take, if that makes sense. Like what's an opinion you believe that maybe isn't as obvious?
Hamza Tahir [00:36:45]: What a great question. It's the sort of question I would ask in an interview or something. I think the best abstraction honestly still ends up being orchestration, because at the end of the day, and it feels very obvious, right, you have so many orchestration workflow engines. But it's the combination of orchestration and artifact storage, and the way we do it. I can get into details: how, for example, the Docker settings compile into the Docker image, what parent images to use, how you expose those, or how you store, materialize, and version artifacts. I think those two we have built in a way that resonates with a lot of data science profiles, at least. I'm not sure it would resonate with the data engineering side so much. But if you simplify it down: you have this model artifact, you want to transport it into a registry, you want to track metadata around it, and you want to orchestrate compute, which is heterogeneous. I think that problem we solve really well in our abstraction.
Hamza Tahir [00:37:47]: So you can check those out. I think we do it slightly differently than Airflow has thought about it.
Ben Epstein [00:37:54]: Hamza, can you talk a little bit about the security and compliance features of ZenML? Somebody was asking in the community.
Hamza Tahir [00:38:02]: Well, ZenML Pro, the paid version, is SOC 2 and ISO 27001 compliant, et cetera. But I think the question maybe was geared towards all the governance stuff that I was talking about. So yes, you can set up roles; we have a full permission and RBAC system that integrates with things like Auth0 or all the auth protocols. So typically what happens when we do an enterprise deployment is you map the groups in your identity provider into ZenML groups, with teams and all those things. And then if you, for example, kick somebody out of your Google group, they get kicked out of the team, so they lose access to certain stacks. In that way you have a very centralized way of managing access, which is a big part of security. And then we have other things like sharing visualizations, and tokens that expire in certain ways.
Hamza Tahir [00:38:56]: We have the service connectors that I hinted at, which give people secure access to underlying infrastructure, like a Kubernetes cluster, for a short period of time. So yeah, there's a whole bunch of security features.
Ben Epstein [00:39:10]: Sweet. Amazing. If we want to let Simba talk and also give him time for questions, I think we have to jump over to Simba's talk.
Hamza Tahir [00:39:21]: I'll just transition with my Featureform sunglasses.
Simba Khadder [00:39:25]: Nice.
Ben Epstein [00:39:26]: Oh, perfect. Phenomenal.
Hamza Tahir [00:39:29]: Simba was very kind to give our team some. So, over to Featureform. We're looking forward to it.
Ben Epstein [00:39:33]: I was trying to figure out which of these three products is the tool that combines the other ones, but they all actually combine each other in very weird ways, and I can't figure out which one is at the bottom. They're all sort of at the bottom, which is kind of hilarious. There's no good order for this talk because they all fit together very well.
Hamza Tahir [00:39:55]: All right, thank you for having me on.
Ben Epstein [00:39:57]: Yeah, thanks. Okay, so the tool that combines the tools that combine the tools. Let's hear about Featureform.
Simba Khadder [00:40:04]: Yeah, exactly. We love abstractions in the MLOps world.
Caleb Baechtold [00:40:11]: Yeah.
Simba Khadder [00:40:12]: Good morning, evening, afternoon. Today I'm going to be talking about feature platforms and feature stores, but I'm going to give you a little twist. I'm going to talk about why we believe that Featureform is going to be the next generation of feature platform, which is really structured around MCP. Let's start with the basics: what is a feature store, what is a feature platform, how we think about things. I'm going to start here, and I'll get into what I think the next generation is. So this is where I think most people are today, which is, data science 101: most ML is just data engineering.
Simba Khadder [00:40:45]: There's a ton of it, very, very clearly. And even though we have production-grade systems, we have Snowflake, we have other tools that I won't mention, a lot of people just use Pandas, Polars, DuckDB, et cetera, locally for a lot of things. That's just the process we learned as data scientists, and it's how we do experimentation. After that, we have to get these things into production. And so you start having to worry about real-time features and incremental updates; you might need to start worrying about low-latency serving with an inference store; you might need to worry about on-demand features, which are feature transformations that happen on the client, in the last mile.
Simba Khadder [00:41:31]: So there's this big gap, this chasm, between experimentation and production for features and data pipelines. And at Featureform, our view is that you should define your features as code. Featureform's approach is that you, as a data scientist, define your feature transformations in Python, whether as SQL, as data frames, or as pure Python. You can do that in a notebook, apply it, and work locally. Everything can transition to and from; we use Arrow and Arrow Flight, so you can easily take things even from Snowflake, bring them local very quickly, and run data frame transformations on them.
Simba Khadder [00:42:18]: And also, we take a Terraform-like approach. For production, you can define your features as code, perhaps in a directory of Python files; you can see a plan, and Featureform will actually deploy it on your infrastructure. So Featureform provides the framework where you define your features as code, but it still runs on Snowflake, on Dynamo, on all this stuff. It's running on top of your existing data infrastructure, but it lets your data scientists experiment as they would locally with Pandas or something else, and then we turn that into production-grade data pipelines wherever it is. In Snowflake, for example, all of the tables that we create become Iceberg-backed dynamic tables. And we also handle materialization of those updates, so that every time those tables update, we can copy them into Dynamo. We also support Databricks, we support ClickHouse, we support most data infrastructure providers.
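As a rough sketch of that features-as-code idea, a definition file might look something like the following. The provider credentials, table, entity, and feature names are placeholders, and the exact Featureform API may differ by version:

```python
# Illustrative Featureform definitions; providers and names are placeholders.
import featureform as ff

snowflake = ff.register_snowflake(
    name="snowflake",
    username="<user>", password="<password>", account="<account>",
    organization="<org>", database="PROD", warehouse="COMPUTE_WH",
)
redis = ff.register_redis(name="redis", host="<host>", port=6379)

orders = snowflake.register_table(name="orders", table="ORDERS")

@snowflake.sql_transformation()
def avg_order_value():
    # Runs as a managed transformation on Snowflake itself.
    return "SELECT user_id, AVG(amount) AS avg_amount FROM {{ orders }} GROUP BY user_id"

@ff.entity
class User:
    avg_amount = ff.Feature(
        avg_order_value[["user_id", "avg_amount"]],
        type=ff.Float32,
        inference_store=redis,  # materialized for low-latency serving
    )

# Applying shows a terraform-style plan, then deploys onto your infra:
#   featureform apply definitions.py
```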
Simba Khadder [00:43:19]: Iceberg is a core piece of Featureform. About a year ago now, we rewrote all of Featureform to run on Iceberg, so every table we create becomes an Iceberg table. Why this is really nice is that, in production, all of the generic data providers read and write Iceberg, so they work quite well with it. And the same goes for other workflows, if you're working locally, if you're doing some other thing, et cetera. Our view is that the Iceberg table is kind of the revolution that no one's talking about. Everyone talks about AI and LLMs, but I actually think that one of the most impactful things happening right now is that almost every company I'm talking to is in the midst of, has just finished, or is planning a migration of all of their tables to Iceberg. And so with Featureform, as a data scientist, you don't have to worry about and understand all the complexity of Iceberg.
Simba Khadder [00:44:11]: You also don't have to learn how to take full advantage of Iceberg. Iceberg has a lot of functionality: it has copy-on-write, it has the ability to do merge-on-read, and it has the ability to do snapshots, so you can roll back datasets. There's a lot of really powerful functionality in Iceberg, which Featureform kind of just gives you for free. You just define your features as code, and Featureform will turn them into production-grade data pipelines. It fits into a normal CI/CD workflow.
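For a sense of what that snapshot and rollback functionality looks like underneath, here is a small sketch using the pyiceberg library against a hypothetical catalog and table (Featureform would manage this for you):

```python
# A sketch of Iceberg snapshot/time-travel via pyiceberg; the catalog
# configuration and table name are hypothetical.
from pyiceberg.catalog import load_catalog

catalog = load_catalog("default")  # reads config from env / .pyiceberg.yaml
table = catalog.load_table("features.avg_order_value")

# Every write produces an immutable snapshot you can inspect...
for snap in table.snapshots():
    print(snap.snapshot_id, snap.timestamp_ms)

# ...and read from, enabling point-in-time reproduction of feature values.
old_snapshot_id = table.snapshots()[0].snapshot_id
df = table.scan(snapshot_id=old_snapshot_id).to_pandas()
```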
Simba Khadder [00:44:40]: Again, you can use a notebook in development; you can experiment, you can play around with things, you can use DuckDB, you can use Pandas, you can use these things to quickly iterate in dev. You can edit your Python feature definitions in a repo and run a typical CI/CD workflow to get your feature pipelines into production. So, that was quick by design. We've spoken a lot in the past about feature stores; we have a lot of content written if you go to our website, featureform.com, and we also do a lot of webinars around it, so if you haven't caught any of those, you can find them on YouTube. Today, though, I wanted to change it up a bit. Feature platforms are key, and I think at this point they're pretty well understood as a critical part of any MLOps stack.
Simba Khadder [00:45:27]: Featureform is an open source solution for it. It works nicely with providers like Snowflake. But our view, and maybe this is our unique hot take at Featureform, is that we're about to be in the midst of a massive transformation. Every feature platform that exists today, except for Featureform, is oriented towards classical models, with features that are kind of hard-coded into traditional models: forests, linear regressions, deep learning models. We don't really see feature platforms used today in agentic workflows, and I think that's going to change. Here's our view why. When you're building an AI agent, you need it to be an expert on your users, and the secret to doing this is actually structured data. Now, if you've built anything, or read really anything, about LLMs in production in the past year, you will have read about RAG and unstructured data, and how that's the key. That definitely has a place in the production workflow.
Simba Khadder [00:46:31]: I think that side is more clear and has been really hashed out. It's very rare to see people talk about and use structured data, and you might just wonder why. To show you, I want to talk about what RAG might look like if you're building a customer support bot. If you're building a customer support bot with RAG, you typically first begin by indexing some sort of massive text corpus. For most customer support bots, that tends to be a help center, because it needs to be generic; you don't want to do it per user, right? You do it across everything.
Simba Khadder [00:47:09]: So when someone asks, hey, my DoorDash order is taking forever, I've been waiting for an hour, it's going to go to the help center, find relevant snippets, and respond with something like: Hi, I'm really sorry your order is taking so long. Here are some typical reasons a Dasher may be delayed. Now, I want you to imagine for a minute that you ordered some food on DoorDash, you've been waiting for an hour, and this is the response the AI gives. You would be terribly unhappy. And you can imagine this generalizes to a ton of use cases; I like to use this one because it's very clear. Now, where would structured data fit into this? When the user asks, the agent, or the LLM, should be able to go through structured data the same way a human would.
Simba Khadder [00:48:01]: You're going to click on the DoorDash order, you're going to click on the restaurant, you're going to look through all the data, the restaurant status, the estimated time, et cetera, and you're going to give a personalized answer to the intent. So structured data is actually the key to truly personalized workflows with agents and LLMs. The funny thing about this problem I just talked about is that DoorDash can already predict how long until the Dasher arrives. So clearly it knows why, or some model in DoorDash knows why. Why don't we provide that same information to the agent? It's in the feature store. Well, they haven't been able to, but with Featureform, with MCP and how we've implemented it specifically, it's now possible. Featureform's MCP interface enables agents to discover and use relevant, authorized structured data to solve for an intent.
Simba Khadder [00:49:00]: So, for example, if a user asks that question, the agent will use Featureform kind of like a directory, where it can look across Redis, Dynamo, wherever all your production feature views live, and find relevant attributes to solve the problem at hand. It can look and say: here, I have some features on the Dasher, on the order, I have some features about the user, I have some features about the restaurant, whatever else. All of that becomes available to it, and it can decide what it needs to solve the problem at hand. We think that this is fundamentally how things are changing. Things are moving to a place where people are going to give their intent in text, and MCP is just a key function of that. We were actually really surprised, though, given all the hype MCP has, how impossible it was to build an actual production-grade MCP implementation. So we open sourced something called MCPEngine. We actually make no money off of this. It's a way to build MCP servers.
Simba Khadder [00:50:04]: It's stateless, it has auth, it has package management, and it's fully backwards compatible. So please check that out. We actually just launched support for AWS Lambda today. That's what enabled us at Featureform to build this interface. And the thing is, it's not just that you put an MCP server on Featureform and you're done; that doesn't really get you quite there. To get this to work, what we needed, one, was a semantic catalog. The agent needs to be able to not just know, hey, I have these 55 attributes; it needs to know what each of these attributes really is, what they really mean semantically, so it can decide what to use.
Simba Khadder [00:50:42]: Once it decides what it wants, it needs low-latency serving. And you need user propagation. If you use text-to-SQL, for example, it can query everything, so it can accidentally query things
Caleb Baechtold [00:50:52]: Good.
Simba Khadder [00:50:52]: It shouldn't. With user propagation and Featureform, we make sure that the features the agent gets are only relevant to the intent at hand. We don't trust the agent not to look at data it shouldn't look at; we make it impossible for it to do so. We have agent-level governance, real-time updating features, audit logs, and RBAC. We're the only place in the enterprise that collectively understands the meaning of all these attributes and features and provides them in an accessible way, via MCP, with production quality. So Featureform brings MCP to the enterprise. That's all I have today.
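To make the shape of that interface concrete, here is a hedged sketch of an MCP server exposing governed feature lookup, written with the FastMCP helper from the official MCP Python SDK. The catalog entries and the serve_features helper are hypothetical stand-ins, not Featureform's actual implementation:

```python
# A hypothetical MCP server for governed feature lookup, using the official
# MCP Python SDK's FastMCP helper. Catalog entries and the serving call are
# illustrative stand-ins, not Featureform's actual implementation.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("feature-platform")

# A toy semantic catalog: the agent needs meanings, not just attribute names.
CATALOG = {
    "order.eta_minutes": "Predicted minutes until the order arrives",
    "dasher.distance_km": "Dasher's current distance from the customer",
    "restaurant.status": "Whether the restaurant is open, busy, or closed",
}

def serve_features(names: list[str], entity: str) -> dict[str, float]:
    # Stub standing in for a low-latency inference-store lookup (e.g. Redis).
    return {name: 0.0 for name in names}

@mcp.tool()
def list_features() -> dict[str, str]:
    """Describe available features semantically so the agent can pick."""
    return CATALOG

@mcp.tool()
def get_features(names: list[str], user_id: str) -> dict[str, float]:
    """Serve feature values, scoped to the authenticated user's own data."""
    # In a real system, user propagation and RBAC would be enforced here,
    # so the agent physically cannot read another user's features.
    return serve_features(names, entity=user_id)

if __name__ == "__main__":
    mcp.run()
```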
Simba Khadder [00:51:26]: Featureform is also open source. We're starting to release a lot more about our MCP implementation. If you're interested in Featureform, either for your traditional models or your agents, feel free to reach out or join our community. Thank you.
Ben Epstein [00:51:40]: I thought you were going to show us that demo of the LLM knowing exactly what was wrong with the DoorDash order.
Simba Khadder [00:51:48]: It does, yeah. There's a slide with this one. You can see it's accessing live order details, it's checking the Dasher location. The agent itself is saying, I'm going to go into Featureform now and find all the information I need to solve the intent, versus in the RAG situation.
Ben Epstein [00:52:06]: Yeah. I think next time we have you on, you're gonna give a live demo of that flow happening.
Simba Khadder [00:52:14]: Yeah, would love to.
Ben Epstein [00:52:15]: That's very cool. So, I mean, you gave a lot of big hot takes, which I always love from you and from the Featureform team. I'm curious how you're thinking about this. It's almost a different, and maybe parallel, vision to Iceberg becoming the universal data store, right? Iceberg becoming a data store, and MCP becoming the thing people leverage to let agents go out in the open, are kind of parallel tracks, and you're sort of playing in both of them. How are you thinking about that? How are you balancing these two seemingly gigantic level shifts in the industry?
Simba Khadder [00:53:02]: Yeah, I think what is true is that data has never been more valued. The power that data scientists and AI engineers, or whatever you call them, bring to the enterprise is massive. And so the question becomes, well, how do we do this well? Because, unfortunately, on top of all that is the amount of complexity we've added to this stuff, where now you have LLMs, you have traditional models doing recommendation systems and fraud, you have your BI companies, you know, the C-level person's building an agent for X, and then this person is saying, hey, governance doesn't let you use ChatGPT for anything. It's kind of chaos. And so what we lean on is standards. When a standard emerges, we try to find a way to bring it to our users. Iceberg, I think, is a more obvious one at this point. I think it's very clear that everyone has accepted Iceberg as, like, the proper way to do things, which is very good for us, because we had actually committed fully to Iceberg before that.
Simba Khadder [00:54:04]: And then MCP is probably more of a hot take today. But Iceberg was a hot take a year and a half ago when we started doing it, and now it's obvious. I think people are kind of wrapping their heads around it. And there's a bit too much hype for what it really is, which is just, like, REST for agents. It's not even as good as REST, in my opinion. But everyone uses it, so it's supported, and that's all that matters. But yeah, at our core we're just empowering AI and ML teams to leverage their data in a way that is efficient, is iterative, has monitoring, is governed. That's what we do.
Simba Khadder [00:54:40]: These are just, I like to throw out the buzzwords that people pay attention to.
Caleb Baechtold [00:54:46]: I think that's super important, though, because even on our side, the customer demand around gen AI is, you know, the C-level board saying we've got to do gen AI somewhere, and people think very user-centric about their initial gen AI initiatives without thinking critically about the data side, like how do we actually make the data usable by these things? I think it's a natural thing, just because of the way ChatGPT captured people's imagination and the way normal people interacted with it; it's like, that's where this goes. But the enterprises, I think, are coming around to the idea that there's a lot of backend data work that still needs to happen to really get to the future people are envisioning when they're thinking about these user experiences.
Ben Epstein [00:55:39]: Caleb, how are you and Snowflake thinking about Iceberg? I know that Snowflake supports Iceberg, but how are you thinking about your customers leveraging Iceberg, with internal Snowflake tables versus external tables on Iceberg, and how do those two things play together today?
Caleb Baechtold [00:55:54]: Yeah, I think what we're seeing right now, the goal, product-wise in particular, is that a Snowflake user should not see any difference in the platform or in performance, whether they're using native Snowflake tables or Iceberg tables. There will, I think, be a case for native table formats in Snowflake for some time. But to your point, Simba, Iceberg is becoming the standard, and so from a product standpoint the end user shouldn't see any difference in the way they're using Snowflake with those table formats, or in the way other tools operate on top of them. I think the bigger challenge, or the big topic, then becomes: where does your Iceberg catalog sit, who does that management, and what's the right way to embrace open catalogs that operate across the tool set? So we've introduced Polaris as an open source catalog project to try to start solving for that, and to let it be the structure Snowflake uses when Snowflake manages the catalog, but also something other tools can start to interoperate with. So at the bare bones, Iceberg as the table format is the clear direction, and from a performance standpoint we want to make sure all things are equal. Now the big question is: what's the catalog story that sits on top of that, where does it live, and what's the standard for it?
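(A minimal sketch of what that interoperability looks like from the client side, using PyIceberg's REST catalog support; the endpoint, warehouse, and credentials below are hypothetical placeholders, not a documented Polaris deployment.)

```python
# Sketch: any client that speaks the Iceberg REST catalog spec
# (as Polaris does) sees the same tables. Values are hypothetical.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "prod",
    type="rest",
    uri="https://polaris.example.com/api/catalog",  # hypothetical endpoint
    warehouse="analytics",                          # hypothetical warehouse
    credential="client-id:client-secret",           # hypothetical OAuth2 creds
)

table = catalog.load_table("orders_db.orders")
print(table.schema())
```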
Ben Epstein [00:57:23]: So yeah, the catalog has been interesting, because I've been watching from the sidelines, preparing to start leveraging Iceberg, and I didn't even realize, no one was talking about it, I didn't even realize that you could just use a Postgres table as a catalog. Like, that's an official option. I've always been curious why that wasn't the thing we were all doing. Why was everyone going to Glue? I'm almost curious why, from the experts, because I'm definitely not an expert in Iceberg at all. I'm curious why you think Iceberg won over Delta in such a deep way, especially with Delta having, at least at this moment, whatever, April 2025, better Python support right now. Like, DuckDB has predicate pushdown in Delta,
Ben Epstein [00:58:13]: And I don't think it does yet. An iceberg. And delta's super easy to get started with because there's no catalog. Like, why did iceberg win that battle seemingly so vehemently?
Simba Khadder [00:58:25]: I can speak to it a bit, at least my view. I think there are two parts. One is a bit technical, which is that there are a lot of things we love about Iceberg. One simple one, for example: if I want to partition a table in Delta, I have to actually create a column and say, this is a column I'm partitioning by. In Iceberg, I can say, hey, hash this to partition, or use the day of this date to partition. There are all these little things that Iceberg is just so much nicer for, in my experience. And the merge-on-read support, I think, is a lot better; there are a lot of specific tactical things. But I also think there's a higher-level thing, which is that Delta became so attached to Databricks that they became inseparable. I won't speak for Snowflake, but I could imagine, as a third party, there's probably a bit of, hey, this is, you know, Switzerland.
Simba Khadder [00:59:25]: Like, you know, this thing's really strong, we can all agree that this is something that's good. And I think the enterprises kind of had the same uptake as well.
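(To make Simba's partitioning example concrete, here's a rough PyIceberg sketch of hidden partitioning: the table is partitioned by a day transform of a timestamp and a hash bucket of an id, with no extra materialized partition column. Table and column names are hypothetical, and the catalog is assumed to be configured as in the earlier sketches.)

```python
# Sketch: Iceberg "hidden partitioning" via transforms, rather than
# creating and maintaining a separate partition column as in classic
# Delta. Names are hypothetical.
from pyiceberg.catalog import load_catalog
from pyiceberg.schema import Schema
from pyiceberg.types import NestedField, LongType, TimestampType
from pyiceberg.partitioning import PartitionSpec, PartitionField
from pyiceberg.transforms import DayTransform, BucketTransform

catalog = load_catalog("default")  # assumes a configured catalog

schema = Schema(
    NestedField(field_id=1, name="order_id", field_type=LongType(), required=True),
    NestedField(field_id=2, name="ordered_at", field_type=TimestampType(), required=False),
)

spec = PartitionSpec(
    # "use the day of the date to partition"
    PartitionField(source_id=2, field_id=1000, transform=DayTransform(), name="order_day"),
    # "hash this to partition"
    PartitionField(source_id=1, field_id=1001, transform=BucketTransform(num_buckets=16), name="id_bucket"),
)

table = catalog.create_table("orders_db.orders", schema=schema, partition_spec=spec)
```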
Caleb Baechtold [00:59:31]: Yeah, I think the trend was open table formats, and what it boiled down to is, to your point exactly, that Delta became less and less of a truly open table format because of the prerequisite and dependency and attachment to the Databricks side. I'm a Snowflake employee saying that, so I know it carries a certain connotation a lot of the time, but I think that's the reality a lot of enterprises saw too. Open table formats are exciting because they avoid vendor lock-in, and if my open table format really locks me into a particular vendor, it's not doing what I want. Iceberg provided the alternative to that, on top of all the technical stuff that Simba mentioned.
Ben Epstein [01:00:10]: So I personally, as somebody starting something new, am very excited about open table formats. I've been waiting, everyone's been waiting, but I've been so stoked to just be able to see data stored in an object store, with compute and transformations that can run anywhere on top of it. On the Iceberg side, I'm still waiting for predicate pushdown in DuckDB querying, and ideally also DuckDB writing to these sources. I don't know why this hasn't happened, but I'm dying for it. And then also Python: Python with delta-rs is just actually so easy, and I wish Iceberg had that. I'm sure it will catch up, there's no question in my mind.
Ben Epstein [01:00:54]: I'm just, I'm just, I'm just waiting for it like I'm waiting for it to be the similar three lines of code to get delta tables built in in S3 and GCS to happen the same way in in Iceberg and just connect to my supabase or whatever as my catalog. Like that world is I feel like so close.
Simba Khadder [01:01:13]: I agree. Yeah. I think maybe the one downside of Iceberg is that they committed really hard to the JVM ecosystem. Python was such an afterthought, and it's very painful. Obviously now Iceberg is so widely used that it doesn't matter; their hand has, you know, been pushed, and PyIceberg is getting a lot better. But yeah, it's still...
Simba Khadder [01:01:33]: We use it all over the place, and it's not our favorite, but it's getting better. Yeah.
Ben Epstein [01:01:39]: I mean, Polars has been pretty huge in that ecosystem for Python. It's not really... I mean, it's Rust, but it's Python on top of Rust, and that's been pretty huge. The ability to write from Polars, even just opening a stream and streaming through a Polars data frame into GCS, has been enormous. Yeah, just a great conversation. We'll definitely have everybody back on for some deep Iceberg and MCP conversations. Thank you both, and also Hamza for joining, and thanks again to Snowflake for sponsoring. It's a great event.
Ben Epstein [01:02:11]: Thanks guys.
Caleb Baechtold [01:02:12]: Thanks for having us.
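(A rough sketch of the streaming pattern Ben mentions above, assuming a recent Polars version where lazy sinks support cloud paths; the bucket paths are hypothetical and credentials come from the environment.)

```python
# Sketch: stream a larger-than-memory scan through Polars' lazy
# engine straight into GCS. Paths are hypothetical.
import polars as pl

(
    pl.scan_parquet("gs://my-bucket/raw/*.parquet")       # hypothetical source
    .filter(pl.col("total") > 0)
    .sink_parquet("gs://my-bucket/clean/orders.parquet")  # hypothetical sink
)
```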