MLOps Community

Why You Need More Than Airflow

Posted Jul 21, 2022 | Views 793
# Airflow
# Orchestration
# ML Engineering
# Union
# UnionAI
speakers
Ketan Umare
Co-Founder and CEO @ Union.ai

Ketan Umare is the CEO and co-founder of Union.ai. Previously, he held multiple senior roles at Lyft, Oracle, and Amazon, spanning cloud, distributed storage, mapping (map-making), and machine learning systems. He is passionate about building software that makes engineers' lives easier and provides simplified access to large-scale systems. Besides software, he is a proud father and husband and enjoys traveling and outdoor activities.

Demetrios Brinkmann
Chief Happiness Engineer @ MLOps Community

At the moment Demetrios is immersing himself in Machine Learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building lego houses with his daughter.

George Pearse
Lead Machine Learning Engineer @ Visia
SUMMARY

Airflow is beloved by data engineers and machine learning engineers alike. But what are its shortcomings when doing ML, and why doesn't an orchestration tool like it always offer the best developer experience? In this episode, we break down some of the key drivers for using an ML-specific orchestration tool.

TRANSCRIPT

Ketan Umare [00:00:00]: Hey, my name is Ketan, and I've been having my coffee with oat milk recently. I love oat milk lattes.

Demetrios [00:00:09]: George, we got Ketan Umare on today, and I'm gonna confess something to you before we even start talking about that awesome chat we just had with him. I've been going into the forest recently, and where I live, there are wild raspberries everywhere. So if you see any raspberry around my mouth right now, that's because I've been on a strict diet of raspberries. Maybe it's gonna be a new fad: the raspberry diet. Stay healthy, everybody out there, and eat your raspberries. Dude, it's amazing.

Demetrios [00:00:43]: Every time I get one, it's like, oh, so good, and I love it. This is the first time I've ever found something like this, and it's awesome. So, anyway, we talked with Ketan today, basically all about orchestrators in the ML space: Airflow, Flyte, and what it all looks like. What were your thoughts on it? Do you have any takeaways?

George Pearse [00:01:09]: Very knowledgeable guy. He knows his stuff in the space. It sounds like they're providing a framework to make sure you get things right in the ML workflow area. There's a lot of that on the analytics side, but it's rare for the same level of depth to go into how you orchestrate machine learning at scale.

Demetrios [00:01:26]: Yeah, that's so true. I love how he talked about how things can get messy a lot quicker than you'd expect. I feel like anyone who's ever reached a significant amount of scale knows that, and they're just saying, yeah, you're preaching to the choir, man. I can't remember his exact words, but it was something about how you start with something simple, a very, very tiny deployment, and then it quickly builds, and now you've got these dependencies, and next thing you know, what was it? I think it was 600 pipelines running together at Lyft, and you can't really tweak any of them because you don't know what depends on what. Things break all the time, and it's a headache to sort out where something broke, or why it broke, or whether changing or updating something will break it. And I feel like that's something you think about as you're starting out, but it's not really your reality.

Demetrios [00:02:36]: Right? It's not a problem you're facing right now; it's more something you keep on the back burner. And then, like he said, you can wake up one day and boom, it is your reality, and you've got to go figure it out. So it's cool that he has been thinking about that. The other part that I loved, and I want to hear what you think about it, was the signal idea that he had.

George Pearse [00:03:00]: Yeah, I mean, getting reinforcement learning into production is a famously difficult problem. You can't do offline validation to the same quality that you can with deep learning, because you need to actually be making decisions based on the inputs, on how your model is acting. So there's a whole new set of technical problems to solve there. And if Flyte distinguishes itself with that sort of feature set, I think it could go really far. It's probably the next big area for machine learning to innovate in.

Demetrios [00:03:34]: Hey, Ketan, my man. The CEO and founder of Union.ai, the team that brought you Flyte, is back with us again. We've also got George here. How are you all doing? What's going on, Ketan? It's been a long time. This is round two now. What's happening?

Ketan Umare [00:03:54]: Yeah, I'm looking forward to it. Last time was a lot of fun. Hopefully this one's gonna be even better. So, yeah, thank you for having me.

Demetrios [00:04:03]: This is what I tell all the guests to say when we first start, to build up the anticipation and excitement. So this is cool, because, George, I know you've got a bit of a forte in the dark arts of pipelines, and so we wanted to bring you on here. Ketan basically spends 24 hours a day thinking about pipelines, both for data engineering and for machine learning. Since you were last on here, Ketan, a lot has changed with Union. I mean, you guys weren't even... well, let's give a little context.

Demetrios [00:04:41]: Last time you were on here, you were still working at Lyft. Flyte was an open-source project, and you were debating the future. I remember we finished the call, and I asked, do you think you'll ever start a company around this? And you said, I don't know, maybe, it's kind of a lot of work. Fast forward two years. You've got your own company, and you are just killing it.

Demetrios [00:05:05]: So let's talk about what's going on and what has changed since we last chatted.

Ketan Umare [00:05:11]: I started a company. Funnily enough, I'm reading Zero to One by Peter Thiel, and I was just reading a chapter that basically says most individuals should not start companies because the power laws are against them. Which does make sense, right? Compare your chance of success from starting a company with your chance of success from joining a much more successful endeavor that already exists: you'd do much better working at an existing company. So the question is, why did I start a company? I've been working for 15-plus years across different fields: high-frequency trading, banking, map making, logistics, ride sharing, and cloud. And across all of these, while solving the real problems, I've always had to solve some infrastructure-related issue.

Ketan Umare [00:06:30]: And that has always been around something that isn't clearly defined as pipelines. We only think about pipelines in the data engineering sense, but they exist everywhere. How do you think you get an EC2 machine when you request one? It goes through a freaking big pipeline. What happens when you place an Amazon order? It goes through a massive pipeline. Pipelines are a very natural way for humans to think, and not only that, they're great organizational tools that really let you scale teams independently. So when I saw that, and when I saw what's happening with machine learning, I saw a big gap in the way we build software. Think about traditional databases. Take MongoDB as an example. If I tell you I just started a company that builds a new database, it's good, but give me two years and it'll be better.

Ketan Umare [00:07:34]: And you'll probably agree that if I put in enough resources and enough good experts, I can build a really good database given time. But let's take a machine learning model. December 2019: maybe DoorDash and Lyft are trying to target customers. March 2020: both those models don't work anymore. And why? Did something fundamentally change within those companies? No. What impacted them was something outside the companies' control: environmental and political factors. Right? The world was going crazy with the pandemic.

Ketan Umare [00:08:19]: And how do you react to this? Sometimes you have to change your assumptions: concept drift, or what we call model drift, and so on. Sometimes you have to change the data, sometimes something else. And these changes have to happen rapidly. You have to try new things, and you have to deliver the product very quickly. What I'd say, and this is just my point of view, is that software products and data-driven machine learning products are kind of diametrically opposite. And if they are diametrically opposite, the existing infrastructure we have for software tooling is just not well suited for machine learning. It's actually wrong, because it doesn't account for the fundamental assumption that things are going to change constantly, that they're in constant flux. And the moment that assumption kicks in, you're like, oh, hold on.

Ketan Umare [00:09:16]: Whatever we did for the last ten years is not going to work as is. We really have to rethink it. And that excited me; that is a challenge. And we are very early. Ten years ago, if I had asked you whether you'd need all this DevOps, you would have said no. When AWS was starting out, and I was there, companies had lots of

Ketan Umare [00:09:42]: on-prem clusters. Would they go to the cloud? They'd have said no. That's because the transition period is the hardest: you don't know which side is right. In my opinion, we are in that transition period. And I thought, let me take on that transition period. Let me take my own shot at how this world could be shaped to solve such a problem.

Demetrios [00:10:06]: Wait, so you said software and data infrastructure are diametrically opposed, right?

Ketan Umare [00:10:13]: Specifically when machine learning comes in.

Demetrios [00:10:15]: When it comes to machine learning? Dive into that a little bit more, because that's an interesting take. I like it.

Ketan Umare [00:10:22]: Yeah, "diametrically opposite" is an extreme statement, and I love to make extreme statements. But what I'm saying is that a machine learning product evolves with a different life cycle than traditional software. Right. Software is stateless. Usually, when you think about things like databases, services, or making a payment, it's pretty stateless; we know what we want to achieve. There are databases and transactions and distributed systems. Hard stuff.

Ketan Umare [00:11:04]: Don't get me wrong. I'm not saying we've solved it completely, but we kind of know how to do this stuff. We've built up a lot of experts in this field, and we've done a lot of good work. On the other hand, let's look at which companies have actually delivered fantastic machine learning products to their customers: Google, maybe Facebook, and then it quickly tails off; some Amazon and Microsoft, and maybe a few gen-two companies. Most companies have not been able to do this in an effective, efficient, and repeatable way.

Ketan Umare [00:11:50]: And oftentimes, even if it is deployed, it's fragile; it breaks. That's because the nature, as I talked about, is different. We are constantly in flux. I know teams where this is what happened: they would deploy a machine learning product, and then immediately they'd be told, okay, you've done this, can you move to a new project? That doesn't work. You can't just move to a new project; you have to keep working on it, because a pandemic could happen right after.

Demetrios [00:12:23]: Yeah.

Ketan Umare [00:12:23]: And all your models are screwed.

Demetrios [00:12:26]: Sorry, you got to babysit this bad boy.

Ketan Umare [00:12:29]: Exactly. And the way we do data science today, or the way machine learning is taught, this isn't part of the curriculum. That's not how we work; we kind of jump from problem to problem. There's the fantasy of just throwing data at a model and solving it, which may be fine for some people. Some people are builders, and some people have to maintain this stuff. How do you bring this in as a philosophy, as an infrastructure piece? That's what I'm interested in.

Ketan Umare [00:13:02]: Something that helps people do this. Right.

Demetrios [00:13:04]: Amazing. So I want to jump into the idea of pipelines. I love how you talked about how everything is basically a pipeline. It's very easy for us humans to understand pipelines and really wrap our heads around them. There's a pipelining, or orchestration, tool that we all know and love: Airflow. Right. And I think everyone's played around with it.

Demetrios [00:13:38]: It has a very data engineering spin to it. And then there's what you all are doing with Flyte, really trying to take it to the next level specifically for machine learning. What I'd love to chat about for the next however long is: where does Airflow fall short? Is using Flyte and Airflow together too many tools? Would you want to use them both? Should it just be Flyte? What are your thoughts and your vision, really, on how that looks as we move forward through this transition, like the one from on-prem to cloud, but in the machine learning world?

Ketan Umare [00:14:29]: Yeah, great question. As you can imagine, I've been debating this question with a lot of people for a while. There are people in the industry who think Airflow should not exist and some other tool should. There are some people who just love Airflow. I'm in the politically correct camp, though not really for political reasons; it's more from knowing the ground reality. If I'm a company that has existed for a few years, say three years, I potentially already have Airflow running, right? Maybe a couple of other tools, but I potentially have Airflow running already. Now the question is: should I migrate everything I've built on Airflow to a shiny new tool, and what do I get out of it? That's the reality of it.

Ketan Umare [00:15:41]: And in some cases, specifically with Airflow, you don't get much. You will get some things, of course, but it's a new way of thinking. You redo some of the work; you have to rewrite the code, and so on. If you just want to do exactly what you were doing with Airflow and migrate it to a new tool, you're probably getting very little, especially if Airflow is working. But if you want to do more compute-intensive work, if you want to build machine learning products, Airflow is lacking there. Let me give you a bunch of examples. We won't do a full debate on the differences, because that's kind of boring.

Ketan Umare [00:16:28]: But let's really try to jump to the essence of what we're trying to achieve, right? As I said, if you agree that machine learning products are in constant flux, constant change, and are different from software products (and I'm going to put ETL in the software product category, even though it really isn't, but let's put it there for a minute), then you need tools that allow you to iterate quickly. It has to happen quickly. It can't be, oh, let me coordinate with the world, get everybody onto TensorFlow 2.0, and then go and deploy. That's just not going to work. I need to give that power to the user. I have to push the power down to the user. That means they should be able to change whatever they feel like, try something new, get an output, and potentially find it's not going to work.

Ketan Umare [00:17:28]: So try again, and try again, and try again, right. Airflow doesn't support this modality; it's just the wrong tool for that. It was never designed for it. Secondly, in the same scenario, when you're trying things out, certain things are good, so I don't want to redo them again and again. You need the ability to reuse results from previous runs. Also, we've probably talked a lot about feature engineering and feature stores and all that. I have a slightly different take on this.

Ketan Umare [00:18:06]: Feature stores are very important in production, but they have a lot of problems in the early days of the adoption cycle. That's because you don't know which features you want, but somebody else may have built some features, and that's where feature store discovery and all that comes into play. But even those may not be materialized. Why should I materialize all the features in the world if I'm not going to use them today? So let's not materialize them; let's delay the materialization. But even then, discovery is required, and you may want to reuse parts of the feature generation pipelines, and that is very, very hard to do with Airflow because of its intrinsic assumptions about context and so on. Another piece is invocations: how invocations happen and how different people can work together. One thing I'll tell you about happened within the pricing team at Lyft, around learning supply chain elasticity. Basically, this is how we at Lyft controlled surge pricing.

Ketan Umare [00:19:20]: You need a bunch of models. You need to know the current supply chain elasticity, how many drivers there are, how they move, whether you can incentivize them: a bunch of different ways of controlling the market, really bringing balance to the market. There was a team of seven or eight people. They had about five models and 600 pipelines, all working together; five models that actually impacted production, at that scale. It's just madness. Things are getting reused and passed around. Some pipelines are deprecated, some are new, and all of them exist. They can't delete the code, because they don't know whether they'll want to revive some parts of it again. How do you handle this complexity in Airflow? It just doesn't work.
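The "you don't know what is dependent on the other" problem described here is, at bottom, a graph question: if one pipeline changes, which others sit downstream of it? The sketch below is a generic reverse-dependency walk, not a feature of Flyte or Airflow. It assumes the dependencies are already recorded in a plain dictionary, and every pipeline name in it is hypothetical.

```python
from collections import deque

# Hypothetical dependency map: each pipeline lists the pipelines whose outputs it consumes.
DEPENDS_ON: dict[str, list[str]] = {
    "raw_rides": [],
    "driver_supply": ["raw_rides"],
    "elasticity_features": ["raw_rides", "driver_supply"],
    "surge_model": ["elasticity_features"],
    "incentive_model": ["driver_supply"],
    "weekly_report": ["surge_model", "incentive_model"],
}


def downstream_of(changed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every pipeline that directly or transitively consumes `changed`."""
    # Invert the edges so we can walk from a producer to its consumers.
    consumers: dict[str, list[str]] = {name: [] for name in depends_on}
    for name, upstreams in depends_on.items():
        for upstream in upstreams:
            consumers.setdefault(upstream, []).append(name)

    # Breadth-first walk over consumers, visiting each pipeline once.
    affected: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for consumer in consumers.get(current, []):
            if consumer not in affected:
                affected.add(consumer)
                queue.append(consumer)
    return affected


print(sorted(downstream_of("driver_supply", DEPENDS_ON)))
# ['elasticity_features', 'incentive_model', 'surge_model', 'weekly_report']
```

Keeping such a map accurate across 600 (or 25,000) pipelines is likely the hard part; the traversal itself is cheap.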

Ketan Umare [00:20:10]: That's one team with 600 pipelines, and there are 80 other teams. You can imagine the complexity. Now multiply that to Spotify scale: something like 25,000 pipelines, thousands of GitHub repos with pipeline code in them, constantly changing. That's the reality of companies dealing with machine learning products and data products at scale, right? And the problem is that this happens quickly. You have no idea when it's going to happen. You start off with, oh no, I don't need all of this. I just need one machine, and I'm good. This is all good.

Ketan Umare [00:20:53]: And if you have optimism in the company (and we should all be optimists in that way, that the company will grow), immediately you'll have two repos, and two will become four. It follows the power law I mentioned. It escalates into hundreds of repositories. And again, that's where Airflow just isn't the right fit.
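Two requests from earlier in this answer, reusing results from previous runs and delaying feature materialization until something actually needs it, can be sketched together as a lazily evaluated, memoized feature registry. This is a minimal illustration assuming in-memory storage and JSON-serializable inputs; `FeatureRegistry` and the feature names are hypothetical and do not correspond to any Flyte or feature store API.

```python
import hashlib
import json
from typing import Any, Callable


class FeatureRegistry:
    """Hypothetical registry: features are discoverable up front, but each one
    is only materialized (computed) on first request and then reused."""

    def __init__(self) -> None:
        self._definitions: dict[str, tuple[str, Callable[..., Any]]] = {}
        self._cache: dict[str, Any] = {}
        self.computations = 0  # number of times a feature function actually ran

    def register(self, name: str, version: str, fn: Callable[..., Any]) -> None:
        self._definitions[name] = (version, fn)

    def available(self) -> list[str]:
        # Discovery is free: listing definitions computes nothing.
        return sorted(self._definitions)

    def get(self, name: str, **inputs: Any) -> Any:
        version, fn = self._definitions[name]
        # Key on name, version, and inputs, so bumping the version or
        # changing an input forces a fresh computation.
        payload = json.dumps(
            {"name": name, "version": version, "inputs": inputs}, sort_keys=True
        )
        key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.computations += 1
            self._cache[key] = fn(**inputs)
        return self._cache[key]


registry = FeatureRegistry()
registry.register("avg_trip_minutes", "v1", lambda trips: sum(trips) / len(trips))
registry.register("trip_count", "v1", lambda trips: len(trips))

first = registry.get("avg_trip_minutes", trips=[10, 20, 30])
second = registry.get("avg_trip_minutes", trips=[10, 20, 30])  # served from the cache
# "trip_count" is discoverable but has never been materialized.
```

A real system would persist the cache and key it on content hashes of the underlying data rather than on literal inputs; the in-memory dictionary only keeps the sketch short.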

George Pearse [00:21:15]: So in terms of the core abstractions you implement, what really differs from Prefect's flows or Airflow's DAGs? You've got your individual functions, which I think are the tasks in your world, and then you have your whole pipelines, something like jobs. It's the same set of abstractions, really, the same core structure that you pin everything to, or am I completely mistaken?

Ketan Umare [00:21:36]: It is the same abstraction. One thing we changed in the abstraction logic is that the smallest component is a task, as you said. A task in our case, and we derive these ideas from functional programming, so nothing here is our own invention, right, is a monad. What that means here is that it exists on its own. You can run a task on its own: in Flyte, when you register things, you can just hit a button and run the task by itself. So, let's say...

Ketan Umare [00:22:08]: And secondly, the task itself is versioned. You have all the versions showing how it has changed, so you can go back in time and run older ones, and you have tracking of which inputs triggered the older runs, and so on. And the last part is that tasks can be arbitrarily complex. That's a complex, weird way of saying it, so I'll give you an example. Think about the typical way you run a job in Airflow. Say you're running a Python script: in Airflow, the Python script runs on the worker, right? Now you say, oh, I want to run a Spark job.

Ketan Umare [00:22:51]: It won't run on the worker. When I get control in that task, I have to kick off something somewhere else: EMR, Databricks, Dataflow, or whatever it is. You have to pass your context to this arbitrary system and make it work. Or now you're doing SageMaker training, or you're just triggering things externally. That's one way of thinking. We think about it differently: we think this task modality can be arbitrarily complex.

Ketan Umare [00:23:24]: That means when you're triggered, you might be a simple Python script that runs locally or in one container. Or you could be a distributed Spark job that runs on multiple containers, but within the same modality: the expressiveness, the code writing, everything is the same. You write Python code, or Spark code, and it becomes a Spark cluster, right? And the same code goes to SageMaker or Databricks. It looks like a small difference, like it's just UX, but it's actually entrenched in the core of the system. The system had to be designed with specific points in place for this. I'll give you an example. We wanted to give users the capability to run their Spark jobs on Databricks, EMR, or Kubernetes. Kubernetes is free in our sense, while Databricks and EMR you have to pay for; assuming you have a Kubernetes cluster, it's free to run there.

Ketan Umare [00:24:31]: So we wanted to give that capability, and doing it just by this baton passing is very, very hard, specifically if users want to write Spark code directly within their consumption pattern. We also wanted it so that if you write a pandas DataFrame, you can just get it as a Spark DataFrame. It's a really complex idea. I'm curious... sorry, I'm really curious.
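The first two task properties described above, that a task can run on its own and that every version is kept and can be re-run with its inputs tracked, can be illustrated in a few lines of plain Python. This is not flytekit's API (there, tasks are decorated functions registered with a backend); `TaskRegistry` and the `surge_fare` task are hypothetical stand-ins used only to show the idea.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class TaskRegistry:
    """Hypothetical store of versioned tasks that can each run standalone."""

    versions: dict[str, dict[str, Callable[..., Any]]] = field(default_factory=dict)
    history: list[tuple[str, str, dict[str, Any], Any]] = field(default_factory=list)

    def register(self, name: str, version: str, fn: Callable[..., Any]) -> None:
        # A new version is added alongside older ones; nothing is overwritten.
        self.versions.setdefault(name, {})[version] = fn

    def run(self, name: str, version: str, **inputs: Any) -> Any:
        # No surrounding pipeline is needed to execute a single task.
        result = self.versions[name][version](**inputs)
        # Track which inputs produced which output for each version.
        self.history.append((name, version, dict(inputs), result))
        return result


tasks = TaskRegistry()
tasks.register("surge_fare", "v1", lambda fare: fare * 2)
tasks.register("surge_fare", "v2", lambda fare: fare * 3)

print(tasks.run("surge_fare", "v2", fare=5))  # 15
print(tasks.run("surge_fare", "v1", fare=5))  # 10, going back in time to the older version
```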

George Pearse [00:24:57]: Have you thought about integrating more heavily with machine-learning-specific packages? I was thinking the other day that it would be quite nice to set up a project up front with some hyperparameter tuning, something like Optuna, then set your success criteria for each step and spin off a new run for the next thing you might want to do. So if this works, try something else, etcetera. Typical data engineering workflow orchestration tools don't do this, but you make your tool specific to machine learning workflows. I just wondered if that's potentially on your roadmap.
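The workflow George describes, tune first and then let a success criterion decide which run to spin off next, has a simple shape. In the minimal sketch below, a seeded random search stands in for a tuner such as Optuna (which is not used here), the objective is a toy function, and the threshold and step names are hypothetical.

```python
import math
import random
from typing import Callable


def tune(objective: Callable[[float], float], trials: int, seed: int = 0) -> tuple[float, float]:
    """Random search over a log-uniform learning rate; returns (best_lr, best_score)."""
    rng = random.Random(seed)
    best_lr, best_score = float("nan"), float("-inf")
    for _ in range(trials):
        lr = 10 ** rng.uniform(-4, -1)
        score = objective(lr)
        if score > best_score:
            best_lr, best_score = lr, score
    return best_lr, best_score


def next_step(score: float, threshold: float) -> str:
    # The success criterion decides which follow-up run to launch.
    return "train_full_model" if score >= threshold else "try_new_features"


def toy_objective(lr: float) -> float:
    # Peaks (score 0) at lr = 0.01 and falls off with distance in log space.
    return -abs(math.log10(lr) + 2)


best_lr, best_score = tune(toy_objective, trials=20)
print(next_step(best_score, threshold=-0.5))
```

In an orchestrator, each branch would be its own task or workflow launch rather than a returned string.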

Ketan Umare [00:25:31]: Yeah, we're doing some of that already. For example, if you write PyTorch, scikit-learn, or Keras code, you get auto-checkpointing. Of course, you have to import one thing, but checkpointing is key. If you think about pipelines, pipelines are checkpointing, but you can't checkpoint within a task unless you know the task is running a training job. And a training task can't easily be split into multiple workflow steps. It just doesn't work correctly. It's not performant; it's not the right abstraction.

Ketan Umare [00:26:09]: So yeah, we support intra-task checkpointing without users having to know about it, and if you have retries, we automatically pass the checkpoint between them. Similarly, when you output a model, we understand the model and the context where you output it. For example, if you're on a GPU and you output the model, you want to store the weights in a device-invariant way, so that when you load it on a CPU it just works, without you having to think, oh, let me handle CPU versus GPU or whatever. Those are some of the features. Another one: if you output a TF tensor, the right way is not to pickle it; the right way is to convert it into its proto representation, serialize it, and send it to the next step, or store it if you want. These are intrinsic assumptions that the system knows about. Right. And the way we've layered it, it's fully extensible.

Ketan Umare [00:27:07]: All of these things can keep extending forever as things change and evolve in the machine learning ecosystem. So, yeah, I wasn't going to go into specific features, but thank you for pushing me in that direction.
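Intra-task checkpointing, with the checkpoint handed from one retry to the next as described above, can be sketched without any framework. Flyte's own checkpointing interface is not reproduced here; the toy training loop, the simulated `FlakyNodeError`, and the in-memory checkpoint dictionary are all assumptions made for illustration.

```python
from typing import Optional


class FlakyNodeError(RuntimeError):
    """Simulated infrastructure failure, such as a preempted node."""


def train(epochs: int, checkpoint: dict, fail_at_epoch: Optional[int] = None) -> float:
    """Toy training loop that resumes from `checkpoint` and updates it every epoch."""
    start = checkpoint.get("epoch", 0)
    weight = checkpoint.get("weight", 0.0)
    for epoch in range(start, epochs):
        if epoch == fail_at_epoch:
            raise FlakyNodeError(f"node lost during epoch {epoch}")
        weight += 0.5  # stand-in for one epoch of gradient updates
        # Persist progress inside the task, not only at task boundaries.
        checkpoint["epoch"] = epoch + 1
        checkpoint["weight"] = weight
    return weight


def run_with_retries(epochs: int, retries: int, fail_at_epoch: int) -> tuple[float, list[int]]:
    """Retry harness that hands the same checkpoint to every attempt."""
    checkpoint: dict = {}
    resumed_from: list[int] = []
    for attempt in range(retries + 1):
        resumed_from.append(checkpoint.get("epoch", 0))
        # Only the first attempt hits the simulated failure.
        fail = fail_at_epoch if attempt == 0 else None
        try:
            return train(epochs, checkpoint, fail), resumed_from
        except FlakyNodeError:
            continue
    raise RuntimeError("out of retries")


result, resumed_from = run_with_retries(epochs=10, retries=2, fail_at_epoch=6)
print(result, resumed_from)  # 5.0 [0, 6]
```

The second attempt starts at epoch 6 instead of 0, so a failure costs at most one epoch of work rather than the whole training task.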

George Pearse [00:27:18]: No, nice.

Ketan Umare [00:27:18]: Yeah, yeah. And we're actually working with the Ray team; that's coming out too. So Ray on Flyte, essentially. Flyte is an orchestrator; think of it as a meta layer on Kubernetes that lets you run arbitrary scheduling on Kubernetes and manage it. Part of the reason is that we're working with teams from Spotify, Shopify, et cetera, and they're asking: okay, I want to use Ray, but who brings up the Ray cluster? The user just wants to use Ray. Flyte can manage that and bring up the Ray cluster. Initially we're doing one Ray cluster per step, but across multiple steps we're going to reuse the same Ray cluster within the lifespan of a workflow. Ray has a Plasma store, so it can reuse the data it has already stored. All of these are intrinsic assumptions you need for machine learning, and that's what gets baked into Flyte.
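The planned lifecycle described here, one cluster per workflow run shared by every step so later steps can read what earlier steps stored, is essentially a scoped resource. A minimal sketch, assuming a hypothetical `FakeCluster` in place of a real Ray cluster (none of Ray's or Flyte's APIs are used):

```python
from contextlib import contextmanager
from typing import Iterator


class FakeCluster:
    """Hypothetical stand-in for an ephemeral Ray-style cluster (not the Ray API)."""

    started = 0  # class-level count of cluster start-ups

    def __init__(self) -> None:
        FakeCluster.started += 1
        self.object_store: dict[str, list[int]] = {}
        self.alive = True

    def shutdown(self) -> None:
        self.alive = False


@contextmanager
def workflow_cluster() -> Iterator[FakeCluster]:
    """Bring up one cluster for the whole workflow and always tear it down."""
    cluster = FakeCluster()
    try:
        yield cluster
    finally:
        cluster.shutdown()


def load_step(cluster: FakeCluster) -> None:
    # The first step leaves its data in the cluster's object store.
    cluster.object_store["rows"] = list(range(5))


def transform_step(cluster: FakeCluster) -> list[int]:
    # A later step reads the stored data instead of reloading it.
    return [row * 2 for row in cluster.object_store["rows"]]


def run_workflow() -> tuple[list[int], FakeCluster]:
    with workflow_cluster() as cluster:
        load_step(cluster)
        result = transform_step(cluster)
    return result, cluster
```

The `finally` clause is the design choice that matters: the cluster is torn down even if a step fails, so a shared cluster never outlives its workflow.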

George Pearse [00:28:25]: So who do you think your core target users are? Is it enterprise-level companies, where they might have a machine-learning-specific workflow orchestration tool and then a separate tool designed for more analytics-focused functionality? Or do you think it can work across the spectrum of company sizes?

Ketan Umare [00:28:41]: It can work for both, absolutely. For example, Spotify uses it across both their stacks. At Lyft, many, many of the workloads were analytical. That being said, we still used Airflow; some of our legacy stuff was on Airflow at Lyft. Many, many workloads ran on it, including from the analytical side, and some even from the ops side.

Ketan Umare [00:29:02]: For example, what we did is build an Airflow-to-Flyte operator. Flyte has an API; it was always API-first. It was API-first five years ago, and it's API-first today. So we built an operator that just goes and triggers work in Flyte, just as it would an AWS service. That's open source now; it's available. We were late in open-sourcing it, and we want to do that with everybody. If Prefect is open to it, we'd love to work with them. If Dagster is open to it, we'd love to work with them. There used to be a Dagster-on-Flyte execution as well, but I don't know where it's gone.

George Pearse [00:29:44]: Sorry, can you just double down on that? So you're integrating with the other workflow orchestration tools?

Ketan Umare [00:29:52]: Yeah, if they want to hand off the more complex, compute-intensive workloads to Flyte, absolutely, right? It should work. So if your organization is running simple workloads that aren't compute-intensive, that are just running queries externally, and you're happy with that, continue using it. Then, when you get to cases where you want to run distributed training or some kind of GPU-based workload, and you want all the versioning and all the other benefits, you can adopt Flyte just for that workload, and it should work seamlessly with your existing workloads. I don't think you have to redo all your work, at least not at the moment. And if you later decide, oh, I want to consolidate on one, that's your choice.

Demetrios [00:30:49]: So that brings up a great point I was getting at. You mentioned earlier that things get messy quickly without you even realizing it, and all of a sudden it's like, oh my God, we've got 25,000 pipelines, we don't know how to debug them, we don't know who's dependent on what. Say you're starting at that foundational level, and things grow and grow, and all of a sudden you need GPUs and more compute-intensive work. You're saying Airflow is great when you're starting out and for that base layer, and when you want to hand off some of the more compute-heavy work to Flyte, you can. What does the evolution look like in your eyes? You mentioned you can go fully over to Flyte, or you can keep some things in Airflow. But I feel like if you're mixing and matching tasks in Flyte with pipelines in Airflow, you're still going to run into the problem of things being a mess, and you can't really debug that easily, right?

Ketan Umare [00:32:02]: If you're already using Airflow, you're used to handing off work to some other system; that's not new. That's how you do it in Airflow, and that's probably what you should do with Airflow. If you start doing compute-intensive work within your Airflow cluster, I don't have to tell you guys that you probably won't end up in the right place. So I don't think handing work over to another system that handles that complexity better increases complexity. And for the parts that do increase complexity, you could, for example, use Flyte completely for those parts. But if you have ETL workloads running on Airflow, fine.

Ketan Umare [00:32:48]: At the end of my ETL pipeline, I want to run one training cycle. Boom. Just add one operator; it's like calling SageMaker or any other system. That's how it should be, in the simplest way, from an Airflow point of view. For example, at Lyft, all the Spark jobs used to run on Flyte. So we actually had this Airflow Spark operator: for Spark, it would just hand the job over to Flyte. And what difference does it make, as long as the system gives you an API that says start a job, and wait for a job, like poll?

Ketan Umare [00:33:21]: And those two APIs are what you need with Airflow. So it's pretty cool. Those are available today within Flyte, for tasks, for pipelines, for complex pipelines, all of that. So you can easily hand work over, and I don't think it should look any different from Batch or whatever other system you want to use.
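The two-call handoff described here (one API to start a job, one to poll it) can be sketched in plain Python. This is a hypothetical illustration, not Airflow's or Flyte's API: `FakeBackend`, `run_and_wait`, and the `"RUNNING"`/`"SUCCEEDED"`/`"FAILED"` states are invented stand-ins for whatever external system an operator calls.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class FakeBackend:
    """Hypothetical stand-in for an external system (e.g. a training service).

    Each job reports "RUNNING" until it has been polled polls_until_done times,
    then reports "SUCCEEDED".
    """
    polls_until_done: int = 3
    _jobs: Dict[str, int] = field(default_factory=dict)

    def start_job(self, name: str) -> str:
        job_id = f"{name}-{len(self._jobs) + 1}"
        self._jobs[job_id] = 0
        return job_id

    def get_status(self, job_id: str) -> str:
        self._jobs[job_id] += 1
        return "SUCCEEDED" if self._jobs[job_id] >= self.polls_until_done else "RUNNING"


def run_and_wait(start: Callable[[str], str], status: Callable[[str], str],
                 name: str, poll_interval_s: float = 0.0, max_polls: int = 100) -> str:
    """Start a job through one API, then poll a second API until a terminal state."""
    job_id = start(name)
    for _ in range(max_polls):
        state = status(job_id)
        if state in ("SUCCEEDED", "FAILED"):
            return state
        time.sleep(poll_interval_s)
    raise TimeoutError(f"{job_id} did not finish after {max_polls} polls")


backend = FakeBackend(polls_until_done=3)
print(run_and_wait(backend.start_job, backend.get_status, "train"))  # SUCCEEDED
```

In this sketch, pointing the operator at a different system only means passing two different callables with the same shapes. That property is why the pattern carries over to SageMaker, Batch, or Flyte.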

Demetrios [00:33:41]: Perfect. I thought George might have another one. He wanted to jump in, but I was getting excited.

George Pearse [00:33:48]: I've seen that you've done some nice work very recently on better visualization of the sort of thing you'd be looking at in a data science pipeline: support for Markdown rendering, various box plots, and so on. But I keep being frustrated by all these projects that get 90% of the way there. As soon as you can just render HTML, you're in a lovely space: run a bit of a Jupyter notebook, export it to HTML, and then whatever Python package you want to make use of, you're free to go. Then you're in the perfect Lego-block scenario, where I can take my favorites from each part of data engineering and data science and put them together as I wish. I was just wondering, is that something you have planned down the line? Is it just harder technically than I think it is? Why is that not on the way yet?

Ketan Umare [00:34:35]: Which part? So we do have visualization. Sorry, I didn't catch that part. Which part is missing?

George Pearse [00:34:41]: As far as I can tell, so far it's Markdown and some specific breakdowns of DataFrame statistics and so on, but not full HTML rendering.

Ketan Umare [00:34:54]: It is full HTML rendering, actually. Our team only added a few renderers in the beginning, and it's open to the community to add a bunch more. Adding a lot of these small renderers to HTML takes some time. And the thing is, I have to convince a lot of people to open source the work they have done on this. This part of the foundation is completely open, so people can do whatever they like and not open source it. I know there are Vertex AI integrations running in some companies, and people have done integrations like pandas profiling, which we also did, among other things. So it gives you a canvas, an HTML canvas. That's what it gives you. How you come up with that HTML is up to you: there's a simple plugin that you write, boom, and you should be able to generate it. We are slowly and surely adding renderers as we can within the core team at Union, but we are still a tiny team and we are doing 15 other things. We are building CD for ML, and we are doing external-signal approval workflows.
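The visualization layer is described here as an HTML canvas filled by small renderer plugins. A minimal sketch of that plugin shape follows. It assumes a hypothetical `Renderer` protocol with a single `to_html` method. It is not flytekit's actual Deck API, and `StatsTableRenderer` is an invented example renderer.

```python
from html import escape
from typing import Any, Mapping, Protocol


class Renderer(Protocol):
    """Hypothetical plugin interface: turn a Python value into an HTML string."""

    def to_html(self, value: Any) -> str: ...


class StatsTableRenderer:
    """Example plugin: renders {column: {stat_name: value}} as an HTML table."""

    def to_html(self, value: Mapping[str, Mapping[str, float]]) -> str:
        # Collect every stat name that appears for any column, in sorted order.
        stat_names = sorted({name for stats in value.values() for name in stats})
        header = "".join(f"<th>{escape(h)}</th>" for h in ["column", *stat_names])
        rows = []
        for column, stats in value.items():
            # Escape names and values so they cannot inject markup; missing stats are blank.
            cells = "".join(f"<td>{escape(str(stats.get(n, '')))}</td>" for n in stat_names)
            rows.append(f"<tr><td>{escape(column)}</td>{cells}</tr>")
        body = "".join(rows)
        return f"<table><tr>{header}</tr>{body}</table>"


def render(renderer: Renderer, value: Any) -> str:
    """The canvas only needs an HTML string; how it is produced is up to the plugin."""
    return renderer.to_html(value)


print(render(StatsTableRenderer(), {"age": {"mean": 41.5, "max": 90}}))
```

Because the canvas only consumes a string, an exported notebook, a profiling report, or a chart library's HTML output could plug in the same way under this sketch's assumptions.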

Ketan Umare [00:36:12]: Think about doing something like multi-armed bandits, or even a simple time-split experiment. How should you do that? You should be able to deploy, wait for something, wait for a signal, then go to the next deployment, and the next, and so on. We want to code all of this up within this one framework. We just want this to be the toolkit that data scientists know and learn. I love...

George Pearse [00:36:39]: Can we just double down on that? You're saying using Flyte for RL deployment? Because that sounds like a very specific set of technical problems, and very interesting ones. So what do you have in the works?

Ketan Umare [00:36:52]: Not just RL. Let's keep RL aside for a minute. Even for simple models, what usually happens is you have something running in production and you want a new model to be A/B tested. The way you do it is, let's say you deploy it to some set of machines and you say, okay, 10% of the traffic should go here. Traffic shaping happens outside of Flyte; that doesn't happen in Flyte. Assume you are using something like Envoy or Seldon or any of a bunch of other tools that do that shaping. But you need something to control it: the brain that says, let's push that out. And you want one place where you go and see all of it happening.

Ketan Umare [00:37:33]: So we are building that in Flyte, and we are building it in a more general-purpose way. We think about it as signals. For example, you run a pipeline. The pipeline enters a state where it's like, okay, I'm waiting for an input signal from an external system. It could be completely automated. Let's say you're doing data drift detection or model drift detection, some sort of drift detection based on the outputs, or maybe even output monitoring.

Ketan Umare [00:38:04]: Maybe ground truth comes in really quickly. You want to check whether your new model is performing correctly. You could send a signal saying abort, or roll back, or move forward to 50%. Our goal is that one day this should be fully automated, just like we do with DevOps. We're still far away, so we are building the base infrastructure to power such a thing. The same concept could also be used for labeling, and for a bunch of other things, like semi-supervised labeling.

Ketan Umare [00:38:34]: And so we're building that into the core layer.
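The signal idea can be shown with a pipeline step that blocks until an external system or a person sends a decision. The sketch below uses a standard-library `queue.Queue` as a stand-in for the signal channel. The 25/50/100% stages, the `"advance"`/`"rollback"` signal names, and the timeout behavior are assumptions made for illustration, not Flyte's signal API.

```python
import queue
import threading
from typing import List, Sequence


def staged_rollout(signals: "queue.Queue[str]",
                   stages: Sequence[int] = (25, 50, 100),
                   timeout_s: float = 1.0) -> List[int]:
    """Deploy the first traffic stage, then wait for a signal before each next stage.

    "advance" moves to the next stage; "rollback" sends traffic back to 0% and stops.
    If no signal arrives within timeout_s, this sketch stays at the current stage.
    Returns the traffic percentages applied, in order.
    """
    applied = [stages[0]]
    for next_stage in stages[1:]:
        try:
            signal = signals.get(timeout=timeout_s)  # blocks, like a paused pipeline
        except queue.Empty:
            break
        if signal == "advance":
            applied.append(next_stage)
        elif signal == "rollback":
            applied.append(0)
            break
        else:
            raise ValueError(f"unknown signal: {signal!r}")
    return applied


# A person (or a monitoring job) sends approvals from elsewhere, here another thread.
channel: "queue.Queue[str]" = queue.Queue()
threading.Timer(0.05, channel.put, args=("advance",)).start()
threading.Timer(0.10, channel.put, args=("advance",)).start()
print(staged_rollout(channel))  # [25, 50, 100]
```

The state stays inside one running pipeline. Nothing has to be handed between three separately triggered pipelines, which is the complexity this design tries to avoid.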

Demetrios [00:38:36]: Yeah, I didn't really get the signals part. Can you explain that again? It sounds really cool, but I'm not sure I've fully wrapped my head around it.

Ketan Umare [00:38:44]: Take a workflow, which is what we call pipelines in Flyte, but let's just say it's a pipeline. In the pipeline, you want to split your deployment into three steps: 25%; if things go well, 50%; if things go well, 100%. Very simple scenario. How do you do that? How do you tell the system that things are going well? There's no way to interact externally with a pipeline that's in progress. So what you do is deploy three pipelines. One pipeline does 25%, then you manually trigger another pipeline that does the 50%. And who passes the data in between? That context has to be passed and transferred.

Ketan Umare [00:39:33]: And this increases complexity. What we think about is: do 25%, and then say, I expect a signal. The signal could be a person coming and hitting a button saying, I approve going to 50%. Perfectly normal; it goes to 50%. Then it waits for a signal again, and now a person comes in and says 100%, and it goes to 100%. Another option:

Ketan Umare [00:40:02]: And this is where I'm talking about the future world, fully automated. You get metrics out of this. Ground truth is not always available immediately, but let's assume in this case it is. You look at it and you're like, oh, my accuracy has improved; it's actually improving in production with this model. Go to 50% automatically. Oh, accuracy has degraded; go back to 25%. Things like that. You could code these complex ideas in there and have experiments happening in production all the time, through the same workbench you started trying things out with.
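The fully automated variant replaces the human with a rule over production metrics. The following minimal sketch assumes ground-truth accuracy is available after each observation window. It uses a hypothetical one-step-up/one-step-down policy against a fixed baseline. A production policy would likely also need minimum sample sizes or significance checks, which are left out here.

```python
from typing import List, Sequence

STAGES = (25, 50, 100)  # traffic percentages from the rollout example


def next_traffic(current: int, new_acc: float, baseline_acc: float,
                 tolerance: float = 0.0, stages: Sequence[int] = STAGES) -> int:
    """Hypothetical policy: step up if the new model beats the baseline,
    step down if it is worse by more than `tolerance`, otherwise hold."""
    i = stages.index(current)
    if new_acc > baseline_acc:
        return stages[min(i + 1, len(stages) - 1)]
    if new_acc < baseline_acc - tolerance:
        return stages[max(i - 1, 0)]
    return current


def simulate(observed_acc: Sequence[float], baseline_acc: float, start: int = 25) -> List[int]:
    """Apply the policy once per observation window and record the traffic levels."""
    traffic, history = start, [start]
    for acc in observed_acc:
        traffic = next_traffic(traffic, acc, baseline_acc)
        history.append(traffic)
    return history


print(simulate([0.92, 0.88, 0.93, 0.95], baseline_acc=0.90))  # [25, 50, 25, 50, 100]
```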

Ketan Umare [00:40:41]: That's our goal. So this starts to deviate from the typical pipelining approach that people have taken. Last charge.

Demetrios [00:40:52]: There he is.

George Pearse [00:40:53]: Always at the crucial moments. I'm just going to change Wi-Fi provider. It's actually worth it at this point.

Demetrios [00:40:59]: Yeah.

George Pearse [00:41:00]: So, about that human-in-the-loop process: this is something that really, really interests me. Normally there's a hacky workaround. You can kind of get it working by dropping a file in S3 at the right time, because you have something waiting for it, or you have to create your own separate UI. Is that natively supported in Flyte?

Ketan Umare [00:41:17]: Not yet. It's coming in the next month. I definitely recommend people join our biweekly sync; this will all be in the next Flyte OpenSync, where we'll be showing the UX mocks for it. So a UI will also be supported natively for all of this within the pipeline. But of course, everything in Flyte, as always, runs through an API.

Ketan Umare [00:41:40]: So you can build your own amazing UIs and power this externally however you want. We have been talking to a few companies. I can't name all of them, but one I can name is Spotify. Spotify wants to use this for their royalty payouts. That's absolutely okay. It doesn't need to be purely machine learning deployment; that was just an example. In this case, they want to see the price, then hit a button, and boom, all the payouts happen, things like that. We would also love to help companies power semi-supervised learning.

Demetrios [00:42:21]: Yeah, that's something I think about a lot too. I've seen and interviewed people who are doing PhDs on this stuff, and it's just like, where does the human touch the loop? What part of the loop do you put the human in? And if you're going to automate it, how do you make sure you're automating it in the way that is most beneficial and fully understood? So when you're putting something like this together with Flyte, I imagine everyone using it is giving different signals, or saying, this is the signal we think is best here. Have you seen some really cool examples of these signals being used? Or do you feel there is already a best practice in place, besides the one you mentioned before, where if the 25% looks good, someone clicks a button and it goes to 50%, then 75%?

Ketan Umare [00:43:24]: That's where most people are today: manual intervention and/or timed intervention, meaning wait for five days. Those are core and intrinsic. But why build the platform in a way that doesn't allow automation in the future? It's the question of what comes first: the infrastructure or the solution? Usually the solution doesn't come first. And we've seen this automated in multiple ways in DevOps, so why shouldn't it be done here? That's where we think we are. We are building the infrastructure, and we are going to open it up.

Ketan Umare [00:44:12]: The API is open, so you can try multiple approaches. And within Union, that's why we started UnionML: to codify many of these practices as a higher level on top of Flyte, where we are working with partners, including all the folks who do monitoring and so on. Our goal is to build a way to do this automatically: one click, go for it, constantly, with something getting trained automatically every ten minutes. We actually used to do that. Every ten minutes, we deployed a new model to production, which I feel is kind of science fiction for many people now. We did it with the same practices.

Ketan Umare [00:44:50]: We had a lot of checks. All those checks had to pass; if they did not pass, we'd get an alert and the model would not go to production. So yes, we had tests and all kinds of checks to figure out whether things were okay or not. Again, this was 2016. We did that in 2016.
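The check-gated deployment described here reduces to a simple rule: run every check, deploy only if all of them pass, and alert with the failures otherwise. In the minimal sketch below, the check names, the thresholds, and the `deploy`/`alert` callbacks are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Metrics = Dict[str, float]
Check = Callable[[Metrics], bool]


def evaluate_checks(metrics: Metrics, checks: Dict[str, Check]) -> Tuple[bool, List[str]]:
    """Run every check and return (all_passed, names_of_failed_checks)."""
    failed = [name for name, check in checks.items() if not check(metrics)]
    return not failed, failed


def deploy_cycle(metrics: Metrics, checks: Dict[str, Check],
                 deploy: Callable[[], None], alert: Callable[[List[str]], None]) -> bool:
    """One scheduled cycle (e.g. every ten minutes): promote only if all checks pass."""
    ok, failed = evaluate_checks(metrics, checks)
    if ok:
        deploy()
    else:
        alert(failed)  # the candidate model never reaches production
    return ok


# Hypothetical checks and thresholds.
CHECKS: Dict[str, Check] = {
    "accuracy_floor": lambda m: m["accuracy"] >= 0.90,
    "latency_budget": lambda m: m["p95_latency_ms"] <= 50,
}

events: List[str] = []
for candidate in ({"accuracy": 0.93, "p95_latency_ms": 40},
                  {"accuracy": 0.85, "p95_latency_ms": 60}):
    deploy_cycle(candidate, CHECKS,
                 deploy=lambda: events.append("deployed"),
                 alert=lambda failed: events.append(f"alert: {failed}"))
print(events)
```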

Demetrios [00:45:11]: Wow. Yeah. You also mentioned something really interesting that I'd like to touch on, which is Union versus Flyte. What does that look like? I know people are probably very familiar with Flyte, but now Union has launched. What exactly is Union? It's on top of Flyte. What are the benefits? Why would I stop using Flyte and use Union? Is it just a managed service on top of it? What does it look like?

Ketan Umare [00:45:41]: Yeah, don't stop using Flyte if you're using it. This is a hard reality: I treat open source as a sacred space, and this is how it should be done. When I'm talking about open source, I should not be talking about Union, and I will not. So now I'm putting on my Union hat, as CEO.

Demetrios [00:46:05]: Like fiduciary from first.

Ketan Umare [00:46:07]: Yeah, exactly.

Demetrios [00:46:08]: Like a tax advisor or whatever has to say: "Okay, now I'm putting on my financial hat."

Ketan Umare [00:46:14]: These lines get blurred in many other places, and it gets confusing, and I don't want those lines to be blurred. Union exists in two modes. Today, number one, Union is the top contributor to Flyte. We don't own the trademark to Flyte. We don't own the rights to Flyte. We don't own anything. Flyte is a standalone entity.

Ketan Umare [00:46:40]: We lead, really, through our contributions. We lead the way by contributing the most code to Flyte. All these features are getting built into Flyte; we are not building them into Union. These are all for Flyte. And yes, anybody else can take advantage of them. I know of at least three or four companies that are built vertically on top of Flyte, and they are valued more than Union today. So...

Demetrios [00:47:11]: Oh, no. No way.

Ketan Umare [00:47:16]: But that's how outstanding products are built. I feel they should be built so that other companies get value from them, which is just amazing. It's great for them and great for the community. So that's the number one role of Union. The number two role of Union: over the past year, people have tried to launch Flyte.

Ketan Umare [00:47:41]: In fact, even today it's not that easy to get started with, and that saddens me. It's because of the complexities of people's deployment infrastructure, ingresses, security and the right ways of doing security, things like that. Those are very crucial, key things that people sometimes overlook. That's where Union's second motive comes in: we want people to be very efficient and effective in using Flyte, and to get there quickly. The third and final motive is that, as I said, we are trying to codify best practices to deliver a fantastic machine learning orchestration and MLOps solution on top. Machine learning orchestration is a superset of MLOps.

Ketan Umare [00:48:32]: It's required to achieve MLOps, but it's not MLOps. That's where the confusion lies for many people. But we are building a bunch of MLOps practices directly, for example if you use UnionML. What we are saying is, there's a big problem. We talk about training-serving skew. There's a big problem with the feature transforms and model prediction code in batch.

Ketan Umare [00:49:00]: When you're training, it's usually batch, right? It's working on a data set, and that code is different. Then once you go to production, the prediction code is often completely different. And the small differences cause a lot of problems. We wanted to eliminate that; we just want to get rid of it. So how do you do that? There's no way except to create a best-practices experience. And we think good intentions are not enough.

Ketan Umare [00:49:26]: So we are writing all of it in code. That's what Union is; that's the third practice. Over time, we are building layers within Union. It will take time, and that's how we think we'll expand the market beyond the folks who are comfortable with Flyte: engineers, data scientists, and machine learning engineers who are okay with coding and know a lot about Git, Python, Java, Scala, whatever their coding technology is. We want to expand it to more allied engineers and data scientists who do not really care about Git and so on. That's where Union exists. That's the third mission.
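One common way to address the batch-versus-serving skew described here is to route both the training path and the serving path through a single feature function. That way the two cannot drift apart. This is a minimal sketch: the feature names and the linear model are hypothetical, and it is not UnionML's API.

```python
import math
from typing import Dict, List, Sequence

Record = Dict[str, float]


def featurize(record: Record) -> List[float]:
    """Single source of truth for feature transforms (hypothetical features)."""
    return [
        math.log1p(record["purchase_amount"]),  # compress a skewed amount
        1.0 if record["is_weekend"] else 0.0,   # boolean flag as a float
    ]


def featurize_batch(records: Sequence[Record]) -> List[List[float]]:
    """Training path: the same function applied to every row of a data set."""
    return [featurize(r) for r in records]


def predict_one(record: Record, weights: Sequence[float], bias: float) -> float:
    """Serving path: one request, the same featurize() call, then a linear model."""
    x = featurize(record)
    return sum(w * xi for w, xi in zip(weights, x)) + bias


request = {"purchase_amount": 0.0, "is_weekend": 1.0}
print(predict_one(request, weights=[2.0, 3.0], bias=0.5))  # 3.5
```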

Demetrios [00:50:13]: Oh, that's awesome, and it makes a lot of sense. Data scientists who are getting their feet wet with Airflow then want something more up their alley, something geared toward machine learning, and ideally something easy to plug in, like you said. A lot of companies in this space are trying to crack that nut: how can we make this easy and lightweight? Because when you start dealing with infrastructure, especially legacy infrastructure, and with data, there are security issues around data, there's PII, there are just so many variables and so many dangers and security risks, or just risks in general. You really have to take the necessary precautions, so it makes complete sense.

Ketan Umare [00:51:09]: Yeah. And that's how we act as stewards. I'll tell you, because the CVE is now public: we had a security vulnerability in Flyte about two and a half to three months ago. We patched it in 4 hours and deployed the fix to all 35 of the top companies using Flyte within two days. That is not possible in a purely open source project; it wouldn't have been possible without funding for Union. So that's where we think our role is. We are stewards of this open source project.

Ketan Umare [00:51:44]: We have to help all our users make sure they are secure, and in turn that their customers are secure. That's our duty.

George Pearse [00:51:56]: A lot of my work has just been evaluating different tools in exactly this space, and I've started to think about it in terms of time to proof of concept. At what point can I show this to somebody and say, this is what it could be used for? And how long from that point does it take for the adoption to actually be worth the learning curve? Say you're going from Airflow to Dagster, or Airflow to Flyte, or something similar. Is it weeks? Is it months? At what point does that transition become worth it? Even if the second tool is definitely better, does it take a quarter to hit that point? And who knows what the ecosystem will look like when we get there. But it sounds like you're on top of both of those parts of the process.

Ketan Umare [00:52:37]: This is where Union Cloud comes in today. Number one, it speeds that up. Again, this might sound like a sales pitch, and I want to say that when I'm wearing my Flyte hat, that's not what this is. But if you are trying things out and want to get going quickly without having to set anything up, go to Union Cloud; it's really one click. And the data still resides in your own VPC, so you don't lose data gravity. We take care of the heavy lifting, getting things up and running and so on, which is the infrastructure stuff.

Ketan Umare [00:53:17]: For example, one of our users runs in three different clouds, with their data planes controlled by the same Union Cloud control plane (Flyte plus other stuff), and they don't have to think about it. Another user runs on-prem and in the cloud and is migrating their stuff live. This is happening. So that's how we build the system: essentially to make it really, really easy for you to start and achieve your goal. Maybe your goal is also to move to a cloud-native system; those are the places where we are the right fit. If you are happy running things on one or two machines, which many companies are, we are probably not the right fit.

Ketan Umare [00:54:07]: You have to use Kubernetes, or want to use Kubernetes. We've been working with large companies that are in the transition phase to Kubernetes. It's early days for them, and they are partnering with us as they transition. We help them whether they use us or not. We try to tell them what the right practices could be and how they could get there. So yeah, we are essentially an infrastructure partner.

Demetrios [00:54:37]: Speaking of Kubernetes, let's go back to your Flyte hat, not the Union hat. But everything about Union sounds great. It didn't sound like a sales pitch to me, so two thumbs up. I think it's awesome, and the value prop there is very true. What George said is so crucial: when an engineer brings on a SaaS product, how fast can they get value out of it? And how quickly you can then transition to doing more impactful work is so important for anyone at a company.

Demetrios [00:55:16]: Whatever you're doing, that, plus the ability to write well, to explain the way you're doing things, and to impact other engineers. Those are two life hacks a really smart friend of mine told me about, and I had never thought about it that way. It's just like, boom, that makes you really valuable. You become so valuable to the company right away.

Ketan Umare [00:55:44]: Yeah, yeah. People come to the open source today and sometimes struggle with setting up an ingress or whatever, and we help them even in that case. But it's a sincere request to them: maybe start off with the cloud. If you want to migrate to the open source later, that's okay, but get an impact out. The more quickly you get an impact out, the more you're able to push within the organization, and the more your respect and value grow. It's not about running infrastructure; it's about creating impact for your users.

Demetrios [00:56:19]: That's it. So I've got to ask, man. On the last two podcasts (George was on one a week ago, and we also had one with this guy Ryan), we talked about Kubeflow and Kubeflow Pipelines. And last time George was on, we talked about MLflow, and since that recording MLflow has come out with pipelines. How are you looking at these compared to what Flyte is doing? I know Kubeflow Pipelines is very Kubernetes-native. It's also got its pains and its downfalls. But do you feel like you're not even in the same ballpark, that what you're doing is very distinct from Kubeflow Pipelines?

Ketan Umare [00:57:06]: Great question. I have to think, because it's great that Kubeflow Pipelines is creating awareness and explaining the need for a system like this. Sadly, and I'm probably going to regret saying this, they are not doing it in a very user-friendly, user-focused way, and that can drive people away from the system. You can always start from first principles and take a radically different approach, and that's what we decided to do. We asked: what if you had a blank slate, and Kubernetes was a given? That's a reality, and it's becoming more and more of a reality.

Ketan Umare [00:58:02]: Kubernetes adoption is growing hugely. So how would you do it differently? That's what we did, and that's how we wrote the software, instead of tacking things onto an existing piece of software. For example, when you're running a SageMaker job, should you be running a pod just to start that SageMaker job? I don't think so. If you have run this at scale, you know that too many pods are not great for Kubernetes, especially at a very high creation rate. And that pod is not doing anything, yet it's expensive. What is that pod even for? Can you put it on a spot machine? You cannot, because it's tracking something else. So this is just not the right way to build it.

Ketan Umare [00:58:50]: If you run a SageMaker job within Flyte, for example, it doesn't run a pod; it runs directly from the engine. If you run a Snowflake query, which may run for days or hours or minutes, or a BigQuery query, or an Athena query, they all get run directly from the core engine, which means it's extremely lightweight. You won't pay a single penny in EC2 for running a Snowflake query; you should not be paying. Theoretically, 0.2 CPUs is okay, but in Kubernetes it doesn't really work. We've run 40 million to 100 million containers within Flyte clusters themselves, and at that scale we know all the things that break. We've learned from them and codified those principles into the system. So we know it scales, and reliably scales, to a very large number, and you won't have to redo things. But it comes at some penalty, and we keep improving that constantly.
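The point about not keeping an idle pod per external job can be illustrated with a single event loop that tracks many long-running external jobs at once. This is only an analogy using Python's `asyncio`. It does not show how Flyte's engine is actually implemented, and `FakeQueryService` is a hypothetical stand-in for a warehouse or training API.

```python
import asyncio
from typing import Dict, Iterable, Tuple


class FakeQueryService:
    """Hypothetical external service: each job finishes after a set number of polls."""

    def __init__(self, polls_needed: Dict[str, int]):
        self._remaining = dict(polls_needed)

    async def status(self, job_id: str) -> str:
        await asyncio.sleep(0)  # stand-in for a network call
        self._remaining[job_id] -= 1
        return "DONE" if self._remaining[job_id] <= 0 else "RUNNING"


async def track(service: FakeQueryService, job_id: str,
                interval_s: float = 0.0) -> Tuple[str, int]:
    """Poll one job until it is done; returns (job_id, number_of_polls)."""
    polls = 0
    while True:
        polls += 1
        if await service.status(job_id) == "DONE":
            return job_id, polls
        await asyncio.sleep(interval_s)


async def track_all(service: FakeQueryService, job_ids: Iterable[str]) -> Dict[str, int]:
    """Track every job concurrently from one process, with no per-job worker."""
    results = await asyncio.gather(*(track(service, j) for j in job_ids))
    return dict(results)


service = FakeQueryService({"q1": 1, "q2": 3, "q3": 2})
print(asyncio.run(track_all(service, ["q1", "q2", "q3"])))  # {'q1': 1, 'q2': 3, 'q3': 2}
```

The tracking cost here is one coroutine per job inside a single process, not one container per job. This is the kind of saving the passage argues for.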

Ketan Umare [00:59:57]: So we've added the concept of backend plugins, and they also improve the user experience. When you're running a SageMaker job... I would highly recommend looking at Spark on Flyte, SageMaker on Flyte, or MPI on Flyte. It doesn't look like you're doing the Airflow-style thing of calling something else. It looks like you're just writing your code, which runs locally. Without MPI, it runs locally on one machine; then you say, I want four workers, and it gets the four workers and sets up the MPI connection between them so they can all work together. And that's how it should be.

Ketan Umare [01:00:35]: It should be complexity as you need it. Spark, for example, should work the same way. I know so many people use Spark when their data sets don't call for it. They use Spark just for fun: "We have 10 GB of data and we're using Spark," and I'm like, don't use it, it's too much complexity. But build your system so that you can scale to it. Write it on one node, run Spark on that one node in local mode, and if you want to scale, add one thing and it becomes hundreds of nodes.

Ketan Umare [01:01:06]: And it has worked fine. Yes.
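The principle of "complexity as you need it" can be sketched as one function whose parallelism is a single parameter. With `workers=1` it runs in-process with no extra machinery, and raising the value fans the same code out. In this sketch a thread pool stands in for a cluster. It is an analogy, not how Spark or Flyte distribute work, and `run_partitions` and `word_count` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_partitions(fn: Callable[[T], R], partitions: Sequence[T], workers: int = 1) -> List[R]:
    """Run fn over partitions; workers=1 is plain local execution, more workers fan out."""
    if workers == 1:
        return [fn(p) for p in partitions]  # local mode: no extra machinery at all
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, partitions))  # map() preserves input order


def word_count(lines: Sequence[str]) -> int:
    """Example workload: count whitespace-separated words in one partition."""
    return sum(len(line.split()) for line in lines)


parts = [["a b c", "d"], ["e f"], ["g h i j"]]
print(run_partitions(word_count, parts))             # [4, 2, 4]
print(run_partitions(word_count, parts, workers=4))  # [4, 2, 4]
```

The user code (`word_count`) does not change when the scale does. Only the one argument controlling execution does, which mirrors "add one thing and that becomes hundreds of nodes."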

George Pearse [01:01:11]: How have you found your current work, going solo, compared to the days back at bigger companies?

Ketan Umare [01:01:18]: Oh.

Demetrios [01:01:21]: Problem sets now, huh?

Ketan Umare [01:01:24]: Yeah, you're constantly focused on one problem. Sometimes you get into your echo chamber, and I try not to do that. I try to be realistic, and I ask the team hard questions. To be honest, there are a lot of challenges with the current economic situation and so on. But it's fun. It's what I would do if given the choice. Also, Demetrios knows I had a baby last year; the baby is eleven months old, and I had another baby before the pandemic.

Ketan Umare [01:02:02]: So a three-year-old, an eleven-month-old, and a company. A lot of interesting challenges there. But I would still not make a different choice. I truly believe in this problem, and I think whether it's me or not, somebody's going to solve it, so why don't we take a crack at it?

George Pearse [01:02:26]: I actually think it's very noticeable: whether or not a certain open source package comes to dominate, other packages often learn from its API design choices in very constructive ways. You see that various orchestration engines have consolidated on roughly the same design. Behind the scenes they might be completely different, but they all share the same style of abstraction, which makes it very easy to move from one to another. It feels like the whole ecosystem grows together and learns from the mistakes and successes of the other packages. So you contribute one way or another.

Ketan Umare [01:02:55]: Yeah, we learned from Airflow, to be honest, and I've said this in the past too. I think Airflow got one thing right, which is connecting to other pieces: the idea of operators. That idea is key; it's crucial. So it's not a workflow engine. Some people ask, is it a workflow engine? I'm like, it's not; that's not the goal. The goal is to let you orchestrate different things. Now, yes, you can do that through workflows. But if you use a typical off-the-shelf workflow engine, like Zeebe and a couple of others, you have to write all of this on your own.

Ketan Umare [01:03:29]: And the user experience for that is not going to be the same. It's going to be: launch something, then write another state that does something, and add a failure state. It's going to be a mess. We want to take all of that complexity away. So yeah, we learned that from Airflow, to be honest. But we also feel some of our ideas have inspired others. As I've told you in the past, we have had chats with the Kubeflow Pipelines team, and Flyte existed well before that. I think our Python API is kind of...

Ketan Umare [01:04:02]: I don't want to say this, but a lot of people are getting inspired by it. We are probably the only one in this space that allows multiple languages, and people have not yet seen what can be done with that. But I am excited. Hopefully within this year you'll see something really, really fun, where you can write Java, Scala, and Python and have them all work together.

Demetrios [01:04:22]: Dude, so awesome to hear all of this. I'm excited about what you're doing. I've always been a fan of Flyte. I also love that we did the hackathon. If anyone has not seen the results of the hackathon we did with Flyte, you can check them out in the video we'll put in the description. Some really cool projects came through, and we ended up flying the winners out to the Toronto Machine Learning / MLOps World conference. That was fun. Got to meet a bunch of the participants.

Demetrios [01:04:57]: We got to do that. Ketan, I want to finish up with a little bit of a lightning round, if that's cool with you.

Ketan Umare [01:05:03]: Yes, let's do that.

Demetrios [01:05:07]: For some people, it's a lightning round that is not so lightning. We like to joke that I ask you these deep questions and expect a three-word answer. All right, here we go.

Ketan Umare [01:05:18]: The last book I read? I'm reading Zero to One.

Demetrios [01:05:23]: That's right, you're reading Zero to One. You've got to watch out quoting Peter Thiel, because he's gone off the rails. But it's a great book.

Ketan Umare [01:05:33]: It's a nice book. And I'm reading The Art of Action.

Demetrios [01:05:35]: Oh, what's that one?

Ketan Umare [01:05:40]: Another potentially controversial topic, but really interesting. It's by this guy who studied old-school military techniques, 100 or 200 years old, and applies them to companies to streamline operations and help the company achieve its goals. Hence The Art of Action. It's pretty nice; I like it so far.

Demetrios [01:06:09]: So if we ask Nils or anybody at Flyte, you're not a dictator, are you?

Ketan Umare [01:06:17]: They will say quite the opposite: I am an operator. I still write code, and I love writing code. Sometimes I delay the code because of other things, but I am an operator. I think the best way to lead is from the front, you know, taking a thousand cuts.

Demetrios [01:06:37]: Excellent. All right, next question I've got for you, what was the last bug that you smashed?

Ketan Umare [01:06:46]: Let me see. Oh yeah, last bug. I don't remember exactly. I remember the security bug from about two and a half months ago. I was reading the code and I was like, what? What a hole. And it was amazing: I was able to reproduce it and get a secret out of the system without being a big hacker or anything.

Ketan Umare [01:07:12]: So I was like, oh my God, this is dangerous. And then we were able to patch it within 4 hours.

Demetrios [01:07:17]: So what was the big learning from that?

Ketan Umare [01:07:19]: Oh my God. One, security bugs are the scariest. And second, they exist everywhere. So be careful when you use software, open source software especially. Also, open source software needs people who help. Even Linux needs people to constantly fix things and really push that stuff through. And we are thankful to all the companies who contribute to meaningful open source and keep companies safe.

Demetrios [01:07:50]: That is so true. So what's a piece of software you are bullish on that might surprise people?

Ketan Umare [01:08:05]: WebAssembly. Oh yeah, yeah. I am enthused by it, and my team will probably tell you the same. I think there are ways we can do magic with it in the next couple of years. I've been diving deep into it.

Demetrios [01:08:24]: Interesting.

Ketan Umare [01:08:24]: But really even within the ML and data world, it's going to change what we do.

Demetrios [01:08:31]: Interesting. Oh, I like that. Prophesying. Next time we have you on, we'll have you just talk about that for the whole hour.

Ketan Umare [01:08:39]: Yeah, and if it fails, I'll probably look back and say how wrong I was. But have you seen Python in the browser?

Demetrios [01:08:51]: What is it, PyTorch Lightning? Or no, py-lightning?

Ketan Umare [01:08:53]: No, Anaconda.

Demetrios [01:08:55]: Yeah, that's PyScript, that's true. Yeah, yeah.

Ketan Umare [01:09:01]: That runs on WebAssembly at its core, right?

Demetrios [01:09:05]: Ah, okay.

Ketan Umare [01:09:07]: It's fun stuff. Its properties are really amazing.

Demetrios [01:09:13]: Last one for you, man. How do you want to be remembered?

Ketan Umare [01:09:22]: I want to be remembered by my kids, first, as a loving father. That's number one. Then by my wife. And then, in the world, as an engineer who was not afraid to try something from first principles, not just going with the stream but finding my own path. That would be wonderful.

Demetrios [01:09:51]: So.

Ketan Umare [01:09:51]: And yet having the right path in the end would be amazing.

Demetrios [01:09:54]: Ketan the rebel. I like it, man. I appreciate you, dude. I appreciate you coming on here. This is awesome. I love talking to you, and I look forward to the WebAssembly chat we're going to have next time. Yeah, this has been great.

Ketan Umare [01:10:12]: That's going to be fun. So, yeah, hopefully before that, we can have some other chats. We can dive deeper into some other fun stuff we are doing, so.

Demetrios [01:10:21]: Oh, for sure, dude. So, everyone, if you've stayed with us this long and you have not checked out Union yet, I'm going to put on Ketan's sales hat for a moment and tell you to go check it out. We'll leave the link in the description below. We appreciate everything you're doing for the MLOps Community, Ketan. That is another thing I will say. Thank you to you and the Union team. It's awesome to have you all as sponsors, and it's great to see the work you're doing. So that's it.

Demetrios [01:10:48]: George, you got any last words?

George Pearse [01:10:49]: Man, thanks so much for the chance to ask some program questions. You've answered them all very well. I have more. I'm going to check it out more thoroughly. It might be worth the adoption cost.

Demetrios [01:10:59]: Yeah, there we go.

Ketan Umare [01:11:00]: Thank you. All right.

Demetrios [01:11:02]: There we go. Sweet. See you all later.
