Evolving Workflow Orchestration

Dr. Milowski is a serial entrepreneur and computer scientist with experience in a variety of data and machine learning technologies. He holds a PhD in Informatics (Computer Science) from the University of Edinburgh, where he researched large-scale computation over scientific data. He has spent many years working on various aspects of workflow orchestration in industry, in standardization, and in research.

At the moment Demetrios is immersing himself in machine learning by interviewing experts from around the world in the weekly MLOps.community meetups. Demetrios is constantly learning and engaging in new activities to get uncomfortable and learn from his mistakes. He tries to bring creativity into every aspect of his life, whether that be analyzing the best paths forward, overcoming obstacles, or building Lego houses with his daughter.
There seems to be a shift from workflow languages to code, mostly annotated Python, happening around us. It is a symptom of how complex workflow orchestration has gotten. Is it a dominant trend, or will we cycle back to "DAG specifications"? At Stitch Fix, we had our own DSL that "compiled" into Airflow DAGs, and at MicroByre, we used an external workflow language. Both had a batch task executor on Kubernetes, but at MicroByre, we had human-and-robot-in-the-loop workflows.
Alex Milowski [00:00:00]: I'm Alex Milowski. There are lots of ways to pronounce my name, which is a fun fact: Bolowski, or Kubiovski if you want the Polish version. And I usually take my coffee any way I can get it. I love a flat white, but it's very hard to find in the US, like a proper flat white, so I got hooked on them in Edinburgh.
Demetrios [00:00:21]: Welcome back to the MLOps community podcast. This is the definitive guide on workflows, aka DAGs, aka pipelines. We get into the nitty-gritty of why they're valuable and how you can use them. We're talking workflows within workflows, workflows that are dependent on other workflows. It's just a constant deep dive on what they are, why they're useful, and how you can best take advantage of them. My man Alex did a whole survey of different tools that are out there. He covered 79 of them. And we'll leave his blog post, with all of the insights that he gained from this survey of the open source workflow tools on GitHub, in the description so you can check it out.
Demetrios [00:01:19]: Let's get into the conversation with Alex. Workflow systems and what those mean, what it is, what got you interested in them. Where are we coming from with that? Because you've got lots of thoughts on it and there's this really cool like summary or survey that you showed me. So I want to dive into that. But let's just set the scene with workflow systems.
Alex Milowski [00:01:50]: Sure. The idea of a workflow, I had to look this up, goes back to really the 1920s, which is kind of wild. The term itself comes out of people looking at process engineering, manufacturing, and other kinds of business contexts. So that makes sense, right, and it is how we use it today. But the real sort of genesis of the current generation, I think, comes out of more like rule-based expert systems, the generation maybe from the 90s, where people were trying to build, instead of writing code that implements business processes, systems that have rules.
Alex Milowski [00:02:28]: And the rules would use a rules engine, and the rules engine would tell the application or whatever what's the next thing to do. And so there's this sort of implicit idea of a chain of things that somebody has specified: what's the first step, and if that's successful, and maybe some criteria, then what's the next step? The average user, or sort of average person, might think of those as like a filter, but in more computer-sciencey talk it's a big graph, and inside the graph is a bunch of steps. And the steps are basically tasks that do things and manipulate data and produce results or have side effects. And then when one successfully completes, some downstream task gets invoked, and that is the workflow itself. And there have been systems developed post the rules-engine world that think of that more like a graph, and they think it's more human-intuitive to draw out that flowchart of what they want to do. And there's a whole other side of this world that's not used in machine learning as much, which is this whole business process modeling, and there are actually standards around that.
Alex Milowski [00:03:33]: There's a thing called BPM, and there's a business process modeling notation, BPMN. And those are very cool things themselves that people use to describe processes inside their businesses for all kinds of purposes, and having an exchange format and notation for that is great. But that's sort of, I think, the bifurcation here: those kinds of systems that were built for business processes in general versus how we have used them in data, DevOps and data engineering, ML operations and so forth. And then there's a kind of small category of things that are workflow engines embedded in applications themselves, which kind of are all over the place. And so my interest was to look at the broader context, because I was revisiting things I've done in the past and all this new stuff and what is out there. And I spent some time on GitHub looking around; I found some people had some great lists, and I found some good ways to search for these things.
Alex Milowski [00:04:34]: And I basically built a big spreadsheet of workflow systems and the features that they have, trying to sort out these different systems: where and when did they start, are they active, and what features do they have versus others.
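To ground the graph-of-tasks picture Alex describes above, here is a minimal, hypothetical Python sketch (not any particular engine): tasks are nodes, dependencies are edges, and a downstream task only runs after everything upstream of it has completed.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# A tiny workflow graph: each task lists the tasks it depends on.
graph = {
    "ingest": set(),
    "clean": {"ingest"},
    "train": {"clean"},
    "report": {"train"},
}

# Placeholder task bodies; real steps would move data or call services.
tasks = {name: (lambda n=name: print(f"running {n}")) for name in graph}

# Run each step only after its upstream dependencies have completed.
for name in TopologicalSorter(graph).static_order():
    tasks[name]()
```

Everything a real workflow engine adds (retries, scheduling, state, UIs) sits on top of roughly this core idea.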
Demetrios [00:04:49]: So presumably the majority of them are geared towards technical users.
Alex Milowski [00:04:55]: Yes. As I said, there's a bifurcation. The business side of it does more of the business process modeling. They have nice interfaces. They are meant for your sort of average user inside a corporation or someplace to be able to describe a process. Some of them have a sort of no-code aspect to them. At the end of the day it's still a very technical thing to draw one of these big diagrams, even if you have a beautiful tool to do it with. You have to know a lot and you have to think a lot about, like, how do we actually do this? And then sometimes they're replacing or modeling things that are done by humans. And I have a whole story about that from the last company I was working at, with scientists, about how I was trying to...
Alex Milowski [00:05:38]: We actually used this BPMN notation just to help them, like, draft a picture. I needed a standard way to draw a picture of what they were doing. And so that business process modeling notation, in the scientific context, was actually a useful tool. Right.
Demetrios [00:05:54]: It is funny that you talk about how we can represent it as graphs, because I've heard this before. In this last year when we had the AI Quality Conference, one of the speakers, Solomon, who actually is one of the creators of Docker, talked about how everything is a graph. And then my friend that was sitting next to me, David, he said, wow, I never thought about that. Yeah, everything's a DAG. You can really represent anything as a DAG, kind of. And so it's almost like thinking about it like that.
Alex Milowski [00:06:27]: Yeah, graphs are a super useful notation. They also can get messy really, really quick. Right. And this is where you need tooling and notations, and it depends on what you're doing with it. What I found interesting is that we stumbled on this BPMN, and I had not been following that area for a while, but we were struggling with how to draw a picture of these process engineering flows, because we wanted to automate this.
Alex Milowski [00:06:55]: We have machine learning, we have robots, we have humans. Right. And they're all part of this process, and we have different interactions there. And when a person does something in the lab, they literally have a plate with wells in it, and inoculants and stuff in it, and they're putting it inside a liquid handler robot or putting it inside an incubator robot. We need to know that the task the human did is done before the robot is told to do its task, before the machine learning is told, with the resulting data, to do its thing. And so that modeling of that whole process let us build tools and user interfaces to make the lab more efficient. And one of my takeaways from that experience was that there's what we do in MLOps for just the part, the technical part, that we're concerned about.
Alex Milowski [00:07:47]: And then there's how it's used in the bigger organization, and that's also a workflow. And so it's like workflows inside workflows. And where you slice it is important to the result that you're trying to get. So in this case, we were trying to make the processes that the lab used more efficient, to get more throughput through the lab, as well as to get all the interesting results from the computational side of it. And so writing that workflow down as a whole thing, with all the technical bits in there too, so everybody understood all the parts, was a challenge. Right.
Alex Milowski [00:08:27]: So having a notation was first, like, how do we communicate this? How do I get, literally, a drawing? It doesn't have to turn into code, just a drawing that everybody says, yes, that's what we do. Right? And from all sides. Right. And even if their eyes glaze over for a part of it that's not their thing, we all have consensus and we're building around the same thing. So we stumbled across this mostly because there was a really cool web-based tool out there from a company called Camunda.
Demetrios [00:08:53]: It's.
Alex Milowski [00:08:53]: But they have an open source thing. It runs in the browser and you can drag and drop things and build the whole workflow, and then it spits out an artifact in a nice BPMN notation file format. So theoretically, down the road, you could do something with that. In our case it was simply just a diagram that we could use as a communication piece, as part of our sort of internal technical documentation. But it was a good starting point for that, just drawing a picture of what we are implementing with our workflow systems.
Demetrios [00:09:25]: You talk about the way that you slice it and being able to look at the workflows almost like from infinitely zoomed out to infinitely zoomed in and all of the different layers of workflows involved in each time that you're zooming in and then where one workflow starts and one stops and who owns which workflow and then you're getting into technical workflows versus non technical workflows. And do you have any experiences on what is useful in that regard? Like how to slice and dice these?
Alex Milowski [00:10:06]: Yeah, I think one of the things is to maybe not try to think of it as there being the one workflow. On the outside there's the process of your business, whatever it might be. So if you're a very in silico, technical kind of organization, like you have a digital product, right, and you're using machine learning, there are the aspects of the internal tech piece: how that model does inference or is trained or is evaluated. And that's a very technical workflow.
Alex Milowski [00:10:36]: That's something that a small team, probably somewhere in your organization, really understands well, and then it's a black box. And it's a black box in a larger workflow, which is how your organization uses that, how they make decisions around it. So, train a new model: the output of that whole process might be not just the model, but how good is it at a particular task. Somebody has to make a decision about what they do with that. That decision could be automated: if it passes certain criteria, we put it into some kind of production track. It might not be.
Alex Milowski [00:11:10]: There might be a human who has to go in there and make a decision and start the ball rolling. That's part of a bigger process, and that's sort of a business process workflow. And so I think there are opportunities here to have a layered model where you can be using different technologies, different workflow systems, right, but they're still interacting, because one is using the other. And in the case of my last company, the outer workflow of what the scientists' lab is doing, versus the technical bits of each tool that they're using: that was just a paper discussion. Right? It's a diagram, it's part of the documentation.
Alex Milowski [00:11:49]: It's so that we all understand what we're doing. But maybe the ultimate goal is that it's executed by some system. That's a long-term goal versus the short-term goal. So now, if you dig into the boxes in this thing: how do we accomplish that task? Maybe there's a big procedure, it's a wet lab thing, maybe it's a robot, maybe it's a whole machine learning workflow that runs a bunch of code and manipulates a bunch of data. So you can draw the workflow out, you can execute it, and you can have a system that actually implements it. And I think those are useful architectural ways to decompose the problem. And then you can choose where you implement and where you spend your time and money implementing a system. Right.
Alex Milowski [00:12:32]: And there are choices for each of those things. That's what's cool about this: you can go full on, like, I've got workflows for everything and I've got a system for everything, and there are tools for those things. It's a cool place where we are at this point in time, which was not true even a decade ago.
Demetrios [00:12:47]: Yeah. It does feel like if you are able to understand the different workflows and what happens after your workflow ends, in a way you are setting yourself up for greater success. Because if we take that example of, okay, there's your little piece, which is the model, and then exposing the model to the greater organization, that's great. And you can be optimizing the model for things that you think are cool, but if you really know what happens after you expose it to the organization and how other teams are using it, or what they're looking for when it comes to that model, then you're optimizing for something that's greater than just what you think is useful and you're understanding how it's being used in the greater context. Have you seen that being a case that when you notice all the different ways and dependencies that are being built from your specific workflow, you're setting yourself up for more success?
Alex Milowski [00:13:56]: I think that it depends on the scope there. So I think that workflow systems, whatever kind of technology you choose, have to be properly handled: you have to support them, they have to be vibrant, and they have to meet the users' needs. So, like, at Stitch Fix, when I was there, there were 140 data scientists running stuff for the organization. Some of these things were the sort of back-of-house types of things. Some of these things were more daily things that happened that populated their systems internally, and they were really critical. And some of them were research work and so forth.
Alex Milowski [00:14:31]: And so, all over the place, there was the ability to describe the steps of things and interact with it. Airflow was underneath, but they didn't actually interact with Airflow. They had a DSL, and we should define that term: a domain-specific language. They had a way to describe the workflow in a DSL, and then they could give it over to the system, and it would run it underneath with Airflow. And then they had a way to execute the tasks on their big sort of batch system. All that complexity was hidden from those 140 users, so they didn't have to become experts in that technology. And it did lots of good things for them.
Alex Milowski [00:15:08]: Like, the task executor auto-learned stuff: oh, your task needs more memory, so we're going to retry it with more memory, because it ran out of memory. And it learned the right parameters for the user, so again, they didn't have to be experts on deployment. You can do lots of cool things with workflow systems, and that makes users more productive, right? They are less frustrated, until the system breaks, and then they're frustrated with it. But happy users are silent like that. So I think that I've seen that kind of success.
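As a rough sketch of the pattern Alex describes, where a simple spec is "compiled" into an Airflow DAG so end users never touch Airflow directly, here is a hypothetical example; the spec format is invented for illustration, it is not Stitch Fix's actual DSL, and it assumes a recent Airflow 2.x install.

```python
import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Invented, simplified spec: roughly what a user-facing DSL might capture.
SPEC = {
    "name": "daily_scoring",
    "steps": [
        {"id": "extract", "command": "python extract.py"},
        {"id": "score", "command": "python score.py", "after": ["extract"]},
        {"id": "publish", "command": "python publish.py", "after": ["score"]},
    ],
}

# "Compile" the spec into an Airflow DAG; retries and resource handling
# could be layered in here without the spec's author ever seeing them.
with DAG(
    dag_id=SPEC["name"],
    start_date=datetime.datetime(2024, 1, 1),
    schedule=None,
) as dag:
    operators = {
        step["id"]: BashOperator(task_id=step["id"], bash_command=step["command"])
        for step in SPEC["steps"]
    }
    for step in SPEC["steps"]:
        for upstream in step.get("after", []):
            operators[upstream] >> operators[step["id"]]
```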
Alex Milowski [00:15:37]: What I haven't seen, and maybe somebody will tell me I'm wrong, is a jump across silos. Right. So that's a data science organization inside a big company. There, the tool is used internal to that part of the organization. But you don't see nesting of workflows as much, where you see...
Alex Milowski [00:15:55]: ...this is now part of a bigger workflow. I think those processes are more bespoke, in terms of: the data that's output by the machine learning workflow, the model that was updated, is used by some other system through some other connection that is specific to how we deploy that thing. But the idea of a workflow using a workflow, where there are different parts of the organization using workflow systems together, I haven't seen that sort of thing in real life. I'd like to see it. I think that's a vision, that we describe our organizations as processes.
Alex Milowski [00:16:31]: I like to think of it as process engineering. Right. In fact, that's a term I'm borrowing from manufacturing, where everything has to follow a procedure, and we describe what those procedures are and they fit together like puzzle pieces. And I think business and digital contexts can work the same way. But you have to do that process engineering, and it's hard, so people kind of skip that hard step. And where there's a payoff, people will use these systems. And so I think we see that with data engineering and MLOps and operations, where there is a payoff.
Alex Milowski [00:17:07]: Right. Because there's all this complexity about how we do our tasks, and we can hide that in the system and make the end user, who is a data scientist or machine learning engineer or some other kind of data person, a data engineer, just more productive, because most of those details are left to a very small handful of people in a corner making sure that workflow system works for them.
Demetrios [00:17:32]: Why do you think it is hard, as you mentioned, to do this process engineering? Is it just because it's time-consuming and cumbersome and there's a lot of friction there to describe the systems and explain the processes? And it's not really like you can just do it and it gets auto-documented for you. Right.
Alex Milowski [00:17:56]: I think my last company's experience is interesting because my main users were internal. They were scientists, and this is not a worldview that they're used to thinking in, in terms of, like, we're going to describe what we're doing in our lab as a graph of tasks. They're really writing, like, operating procedures, and it's a big document with hierarchies of things.
Alex Milowski [00:18:23]: And next you do whatever, and next you talk to this machine and set up these protocols. And so the idea of looking at it as a graph of tasks and how they interact, how you interact with that, and annotating it with: here's the data that I need, here are the pre-steps that have to happen before I can start this task, here are the controls I have, here are the failures that could happen in this situation. That level of detail, they're just not used to writing out. And you can understand that, because it's not usually useful to them in their day, because they know how to handle those situations, and they're familiar with the equipment, they're familiar with the work that they're doing. It's what they've been trained to do.
Alex Milowski [00:19:02]: That's what they've been doing in their career. But it doesn't scale. And this is where the problem comes in: when you try to scale it, when you try to understand more about the metrics of it, the data artifacts, the other things you might be able to get out of the system, then you need that process engineering. If I'm going to automate something, you have to be able to draw me a picture of what you do and tell me all the facets of it. And that's just a hard conversation to have.
Alex Milowski [00:19:30]: And so I had varied success when I did that. Some people really pushed back hard. They're like, I don't know, why would I have to do this? Here's the thing I'm doing, here's the procedure I have, it works fine for us. And others were like, sounds interesting, the diagram looks interesting, but I don't quite understand it. And then when they engaged with it at different points in time, all of a sudden they're like, oh, okay, I can see some value in this, maybe as a way to document what we're doing in a different way.
Alex Milowski [00:19:58]: And that's a great opener, right? If you see value in drawing the picture, and I can take the picture into one of these tools and do something with it, actually build a system, that's great. And so I still had that sort of range of responses from people. So I think that is the challenge: an organization has to be able to attach some kind of value to it. What are we getting out of it? What's the ROI of doing all of this process engineering? And so that's why, for people who are manufacturing things, it makes total sense that they do this kind of engineering, because the efficiency and quality measurement stuff, that's all about manufacturing. But when things are much more, I wouldn't call them boutique, that's probably not the right term, but basically when it's highly trained people building these things, running a lab, running the business side of your organization, they don't see that sort of technical benefit, and you're asking them to speak this weird language.
Alex Milowski [00:20:57]: Where's the benefit of that? And so you have to lead them with the why. I always like to build demos, right, or prototypes, or find some exemplar that's going to give them the why of why would I do this. Which is where we started with my last company and others: you've got to build something that's got value somebody can see. And then there's an institutional investment in going beyond that.
Demetrios [00:21:20]: Yeah, I noticed it with just this podcast, for example. I had a friend tell me really early on, you're spending too much time on that podcast. Why? What is your favorite part about it? What do you like doing? And then let's figure out how we can automate the parts, or get someone else to do the parts, that you don't like doing. Because there is a lot of intense work that goes into creating podcasts. And like you said, I was doing one a week and feeling overwhelmed because there's so much that goes into it. I was dropping the ball on a lot of stuff. So I recognized that. Sounds like a workflow system, right? Well, I did.
Demetrios [00:22:05]: So I was lucky enough, because my friend told me, hey, sit down with my buddy. He's really good at this type of thing and recognizing where you plug in and where you don't need to plug in. And so I sat down with a guy and he just said, so what do you do first? And I said, I find the guest. And then what do you do once you find the guest? I ask if they want to come on. Okay. And then what do you do? And we just went through that. And what if the guest says no? Or what if the guest says this? And three weeks later I had a very in-depth flowchart, and it helped me so much because it was the blueprint.
Demetrios [00:22:41]: And so I'm wondering, have you seen any trends in the ML world taking a bit of a left turn, or what are the most interesting trends that you've seen in workflows for machine learning?
Alex Milowski [00:22:55]: I think that one of the interesting trends here is that the workflow systems in the last, I would say, decade have really grown up. Ten years ago it was more of a niche thing, like, why would we do this thing? It's weird. I can just write all the code, or I'll write a bash script. Now it's much more common practice that, oh, you've gotten to this point from building whatever your prototype is, so you need to have a workflow system; here are your choices of systems out there that people are using. There are older ones like Airflow and newer ones like Metaflow, and everything in between. And then you pick your technologies and things. And so I think that is the change: this is not a hard decision where you have to convince people. No, it's a yes.
Alex Milowski [00:23:42]: People use workflow systems to train, to do inference, to do all kinds of tasks for them, and you should have one, right? And what's your deployment infrastructure? If you're using Kubernetes, there are all kinds of choices; there are other things; there are ones that do it more like service orchestration, and you pick amongst a menu of things. And there are SaaS services and there are things that you deploy. And so I think that's the great thing: we've moved from it being maybe a hard sell in a corner to this just being standard practice. And I think there are a handful of companies that are doing this as a business, which is great. There's a lot of open source here. Even for a lot of the ones that have a SaaS system, the core technology is open source, which is what I looked at. That's why I went to GitHub.
Alex Milowski [00:24:28]: It was easy to find a long history of these things over the last couple of decades, actually, of people building these systems. Some of them are still active and some of them have gone stale and are no longer active projects. But they fall into different categories. So there's the whole business process stuff, which we've been talking about a lot at the high level, and then there are these other categories. When I did my survey I mapped these out: there are things for business processes, things for the generic aspect of it, and then there are things that were built specifically for science, which have their own challenges around HPC computing. And then there's the track that starts with data engineering, and then there's the data science and ML side of it, and then there's a little bit around operations. And so you can see the sort of newer generation of tools came out of that data engineering side and then grew into data science and ML, and then a little bit of an offshoot for operations, which is more like system management: how do I add a new node to my Kubernetes cluster, how do I install software across a bunch of machines, et cetera, et cetera.
Alex Milowski [00:25:32]: But it's the same kind of workflow problem applied in the operations context. And of the 79 systems I looked at out there, something like 46% is business process automation. The other big chunk is the sort of data science and ML side of things, which is about 22%. And so that's a trend, right: these systems are growing and they're active. And it's not like we're not using the business process stuff; that is a whole healthy world, and it is growing as well. But the focus on MLOps is obvious in lots of contexts, right? These are multi-step processes, and this is where a workflow system of a certain sort can shine. And so there's activity. I spent some time looking at this; I actually went in and asked, when was this project created, and is it still active? And those are two dimensions, because some things are personal hobbies, some things are products that kind of came in and maybe somebody abandoned them, the company went out of business or something.
Alex Milowski [00:26:37]: There are lots of reasons why things stop getting developed, like being usurped by a new thing. And if you look at that, the business process stuff has the oldest repos out there; you can find various products that go back to almost 2005, 2006, somewhere in there. And there are still new projects being created, new things, up until sometime in the last year. So people are still innovating and building new things on the business side of things. But go look at data science and ML: that's more like 2015 and onward, and active through last year. And it shifts back a little bit for data engineering.
Alex Milowski [00:27:15]: Science had its heyday from the 2000s to about 2015. There are lots of reasons for that. Those systems are still actively being used, but the ones that solved a very specific, and this is my take, very specific HPC problem, like I am running some massive model, they're still in use, right. Whereas the things that are more like machine learning, data science, data engineering, they have new choices; they don't have to use these other systems. And I think there's a bifurcation there of use cases. So it's interesting to see that there are different trends here, but they're also in these different sort of columns of use. Right.
Demetrios [00:27:57]: It's fascinating that you break down the data engineering side. So besides the business side of the house, if we're looking at the technical stuff, I can probably rattle off three or four data engineering ones when it comes to the most popular. You've got Airflow, which has proliferated everywhere and most people are using, whether or not they like using it is another story, because it's been around the longest and it's had the most adoption. And then you have the Mages and the Dagsters and the Prefects out there that are attacking it in different ways.
Demetrios [00:28:41]: And almost like the data engineering workflow 2.0 type of thing, I would say, because they're a bit newer and they're taking a different approach to things. And then in the ML world you have the ZenMLs and the Metaflows, like you said, and even Flyte, I think, is another one that is in there. And those are all fascinating because they're going after the ML-specific type of use cases. And then in the DevOps world, you've got your Argos, and maybe you could consider Kubeflow in that world, which kind of plays in both the ML world and the DevOps world.
Alex Milowski [00:29:24]: So when I was surveying this, and this is a spreadsheet, I was trying to put in categories and things, and I had a whole column about machine learning and AI. And one of the most challenging things, because I'm looking at their documentation and looking at the code and the repo, is that everybody who's got an active project is adding "ML and AI, we do this" to their documentation, right? And so I marked those as neutral until I found actual evidence of: we have tasks for it, we have examples, here's a workflow that does whatever, something that shows you can actually do it. Then I marked it as a yes. And I think there's a nuance here, which is that any of these workflow systems that have some kind of model of a task executor, and Airflow is included in this, can do advanced machine learning workflows just fine.
Alex Milowski [00:30:26]: Because that task executor could be some complicated thing running on Kubernetes, it could be some other system that you're interacting with, it could be an inference endpoint that you're using at some inference provider like Baseten. They all have that capability. But I think the challenge here, and that's true for the people who have a business, or historically, say, an older product that was more mature and now they're saying, hey, we can do machine learning and AI, it's probably true there as well, is this question of how easy it is for the practitioner to use, right.
Alex Milowski [00:31:03]: To actually do that. Even though they said it, do I have to do all the heavy lifting to make it happen? Or do you have infrastructure for me? Do you have examples? Do you have documentation? Have you thought through the nuances of what I need for, say, a model training pipeline, and how I get access to the assets and GPUs and things that I need? Or is that stuff I have to figure out, even though I'm using your workflow engine? And I think that's the differentiator. And some of the newer systems are also kind of code-oriented; they're on that infrastructure-as-code track. And so if you're going to describe your workflow, you use Python annotations, and you don't have to deal with a DSL and all these other things, you just write code. And that's a trend right now: everything is code, and we just write it in Python, we use annotations, and that works for some people quite well.
Alex Milowski [00:31:55]: And there's nothing wrong with that, but it's not the only way. And with older systems like Airflow, there's also a differentiator there, that the DAG is stored in a database, right, and so there's not necessarily a representation of it other than talking to the system. And there's a bunch of systems that work like this. And then some things have DSLs, and it's a YAML file, it's a JSON file, it's some other custom language, and you write in that DSL. And some things are just code. Right.
Alex Milowski [00:32:27]: And so I tried to make a differentiation in the analysis of these different systems, looking at how you interact with these things. And the trend for data science and ML is more code, right? Annotations, fewer DSLs. But maybe we should talk about DSLs at some point. The older trend has been that there's a serialization format. There's a thing you could author, an artifact that is the workflow itself, right? And it's coded in JSON, XML, YAML, some custom language, and it's a piece of code itself that you can check in somewhere, but it's not Python. It's something else that describes all the metadata around it, and you treat it as such. And that has its use cases, right? But the trend for ML right now is away from that, I think. Whether that sticks, I don't know.
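For contrast with a serialized DSL, here is a minimal sketch of the annotation style Alex is describing, written in the spirit of Metaflow's decorator API; the step bodies are placeholders, and this is an illustration of the pattern rather than a recommended pipeline.

```python
from metaflow import FlowSpec, step


class TrainFlow(FlowSpec):
    """Workflow as code: steps are plain Python methods, and the graph
    is implied by the self.next(...) calls rather than a standalone file."""

    @step
    def start(self):
        # Placeholder data loading; a real step might read from a warehouse.
        self.rows = list(range(100))
        self.next(self.train)

    @step
    def train(self):
        # Placeholder "training"; stands in for a real model fit.
        self.model = sum(self.rows) / len(self.rows)
        self.next(self.end)

    @step
    def end(self):
        print(f"model artifact: {self.model}")


if __name__ == "__main__":
    TrainFlow()
```

The trade-off Alex raises is visible here: the graph exists only in the code, so inspecting or drawing it means running the tool rather than parsing a checked-in artifact.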
Demetrios [00:33:19]: You mentioned to me before we hit record that it feels like everything is moving towards code and this infrastructure-as-code world. And the big question there was: is that good?
Alex Milowski [00:33:34]: Yeah, I have mixed feelings about that. I understand how it is when you're writing something: it's very compact and it does a very technical thing and it's all sitting there in Python, it's a bunch of PyTorch and other things like that, and being able to wrap those up in functions and then organize those functions into a workflow as a sequence of steps, it's very elegant and useful. But the challenge with that is that the code is then the only way you can understand what the workflow looks like, right? So if you want to draw a picture of it from your code, you have to run the code somehow and get an artifact. What is that artifact? You don't have a DSL, right? How do you generate a diagram from that? A lot of these tools will do that for you; they will make a picture for you. And maybe you can save it as an SVG, maybe you can't, maybe you can take a screenshot, whatever.
Alex Milowski [00:34:27]: But you have a picture, right? And you can give somebody a picture and they can say, this is what we're doing, and you can have a discussion about it. And so I think the problem with the annotation side of things is that only people who can write code can understand that workflow. And that, I think, is a challenge. And then, if you're like me and your ultimate goal is that there are more workflows and there's nesting of workflows, there are people who don't write code who are just talking about your machine learning workflow as a part of a bigger system, and that's a bigger workflow, and we have pictures for all the rest, but yours is a black box where you don't understand how it works and what the different failure states are and so forth.
Alex Milowski [00:35:05]: And so I think it runs afoul of that. I don't think the infrastructure-as-code approach, the annotation approach, is bad, because it could produce artifacts that are the definition of the workflow. I just don't see a lot of evidence that that is where these tools are going right now. And maybe, as they go on their adventure of building their systems and services, something will come out. There's a wide variance in what these DSLs look like. There's a lot of history there. There have been some attempts to standardize it in various contexts. The science domain had a YAML-based thing called the Common Workflow Language.
Alex Milowski [00:35:46]: BPMN is something from the Object Management Group, I believe, as a standard for business process modeling, and they have a notation, which is another standard for the diagramming of them. Have those taken root in various communities? Sure. Are they widespread? Probably not. And so it's not clear that there's any real winner there, and it's not clear that we necessarily need a standard. But certainly within your organization, if you had 10 different formats, you'd probably be unhappy.
Demetrios [00:36:15]: Yeah, you'd be going crazy.
Alex Milowski [00:36:19]: Yeah, there's some work to be done there.
Demetrios [00:36:20]: But when it comes to these domain-specific languages, is it something where, in an organization, you choose one and you go with it, and you can abstract things away like you were doing at Stitch Fix? You mentioned you had the end users using the domain-specific language, and then underneath the hood you had almost that infrastructure-as-code layer. And it feels like that was working well for you all.
Alex Milowski [00:36:53]: Yeah, maybe, but maybe depending on the person. Right. It was working well for the system's longevity, so that as we changed how those tasks were interpreted by the system, as technology changed, the workflow was just metadata about what we would like to see happen and how the tasks are chained together. So I think there's value in that.
Alex Milowski [00:37:16]: But I think that there is definitely some pushback that comes with that as well, because it's another thing, another artifact. It's external to your code. There can be mismatches. So there are lots of challenges there too. And so I think it really depends on the particular user that you are interacting with. What's nice about most of these things is that there's a high prevalence of people coding them in YAML.
Alex Milowski [00:37:42]: You could like it or not. I don't find YAML a problem, but a lot of people don't like it. That's okay. But the structure is pretty much the same: there's a list of steps, each step has a bunch of metadata, they all have names, and they point to each other through some mechanism.
Alex Milowski [00:38:00]: So having that kind of common format lets you take a workflow from system A and a workflow from a different system B, and if they're both in YAML, you can think about how you would represent those. They are artifacts that you can check into source control. You could generate diagrams from them; maybe there's a tool and system for that. So having a DSL has benefits like that, where it's just something you can parse, and you don't have to run code, because that requires infrastructure, that requires environment setup. You can just parse it and understand what the steps are in this...
Alex Milowski [00:38:36]: ...thing. And then that's when people jump in and say, oh, we should have a standard. But standards are hard, right? Getting people to agree takes a very long time. I'm not saying that won't happen; it's just not something I see happening right now. But maybe there'll be a need for it in the future.
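To make the "you can just parse it" point concrete, here is a small hedged sketch of reading a YAML workflow definition without executing anything; the steps/depends_on field names are made up for illustration and do not follow any particular engine's schema.

```python
import yaml  # PyYAML

# A hypothetical DSL document: a list of named steps that point at each other.
WORKFLOW_YAML = """
steps:
  - name: extract
  - name: train
    depends_on: [extract]
  - name: publish
    depends_on: [train]
"""


def edges(document: str) -> list[tuple[str, str]]:
    """Return (upstream, downstream) pairs from the spec; no engine,
    runtime, or environment setup is needed to understand the graph."""
    spec = yaml.safe_load(document)
    pairs = []
    for step in spec["steps"]:
        for upstream in step.get("depends_on", []):
            pairs.append((upstream, step["name"]))
    return pairs


if __name__ == "__main__":
    for upstream, downstream in edges(WORKFLOW_YAML):
        print(f"{upstream} -> {downstream}")  # enough to diff in review or feed a diagram tool
```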
Alex Milowski [00:38:54]: Right. And then there's the whole custom side. That's another thing. People have created their own little languages, declarative languages, for describing these things. That is definitely on the decline. I don't think that people are doing that so much anymore.
Demetrios [00:39:07]: It's like that trend is fading.
Alex Milowski [00:39:11]: There's a reason why they do it: you can make a nicer thing, right? But then you have the problem that people have to learn your syntax and semantics, and maybe that's more trouble than it's worth. Yeah.
Demetrios [00:39:22]: Well, especially onboarding new folks. And everybody's got to go through that. And so you're now creating just a more cumbersome process to get someone up to speed.
Alex Milowski [00:39:32]: Yeah. And everybody likes to use the term DAG, which I don't always, because not everything's a DAG. Right. Some of these workflows have loops, right? So they're graphs, right.
Alex Milowski [00:39:44]: In general. And sometimes they're forests, in the sense that there could be workflows that have two different independent pieces. There are lots of complex things out there. Those are the edge cases, right; even things that have loops are edge cases. So the DAG term is a simplification, computer-science-wise, to make it nice to execute. But even with a DAG you can have meets and joins.
Alex Milowski [00:40:09]: Right. So you can have a little sort of loop in your thing, and you've got to cut that somewhere. If you're writing it in YAML, JSON, whatever, you've got to cut it, and how you make those choices is maybe easier or less easy as the thing gets more complex. And that's where you need tools at the end of the day. Or if...
Demetrios [00:40:28]: You just have thousands of different DAGs or workflows in the organization. It's. I've heard so many stories of folks who are like yeah, we started as a startup and then we had success and the airflow dags just kept growing and nobody really went back and sorted those out. And so you have that sprawl I.
Alex Milowski [00:40:49]: Looked at when I said cisfix. We had the airflow we had underneath there. We had the system with the DSL so I could pull all of the workflows or thousands that we had and look through them. And what was I found interesting is that there were some eyeballs in there like that did all kinds of crazy stuff. But most of them are a straight chain of steps. Right. Well A, B, C chained together. Right.
Alex Milowski [00:41:14]: That's most of what people are doing, and it's not a surprise. So for all this "it's a DAG, it has loops, it's not a DAG": most people's case, like the 90% case, is probably a straight-through chain of things. Those aren't exact numbers, I guess, but what I found, overwhelmingly the majority, way higher than 50%, and this was an organization that had been doing this for a while, was these straight-through chains. And it makes sense. There are some kind of preparation stages for what you're doing, then there's the main event: you're training a model, you're doing inference, you're upserting into a database, and then there's some cleanup maybe at the end.
Alex Milowski [00:41:53]: Yeah. And that's like most people's workflows, and that's, I think, how these ML and data engineering workflows differ from the business process workflows. Right. Business process workflows aren't a straight-through chain. There are decisions being made: did we answer this customer's question? If that system fails, we go over here to do something else. If there's a transaction and the transaction goes through, the order gets made. If it doesn't go through, there's some other process.
Alex Milowski [00:42:29]: Right. And so those business process workflows have much more complexity in them. They have a lot of branching and conditionals, and they have a lot of side effects. Like, if we succeed here, we're going to notify another system, we're going to send a message to the customer, or something else like that. So they have all these side effects that happen along the way, which are just not as much of a thing, although you can imagine them; they're just not in practice as much in MLOps. And so that's how these systems are different, and that's why there are different products for them.
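As a hedged illustration of that contrast, with made-up step names and no particular engine in mind, the two shapes might look like this side by side:

```python
# Straight-through ML chain: the common case Alex describes.
ml_workflow = {
    "prepare_data": ["train_model"],
    "train_model": ["upsert_predictions"],
    "upsert_predictions": ["cleanup"],
    "cleanup": [],
}

# Business-process flow: conditional branches and side-effect steps.
order_workflow = {
    "receive_order": ["charge_card"],
    "charge_card": ["confirm_order", "handle_payment_failure"],  # branch on outcome
    "confirm_order": ["notify_customer"],
    "handle_payment_failure": ["notify_customer", "open_support_ticket"],
    "notify_customer": [],
    "open_support_ticket": [],
}
```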
Demetrios [00:43:04]: It makes sense. I'm just thinking, what you get with your favorite DAG tool is a Slack message, right? That's like...
Alex Milowski [00:43:15]: Which are useful, right?
Demetrios [00:43:16]: Yeah, exactly. But it's not like what you're talking about, with this super complex logic, or loops happening, or then spitting out into another subset of a workflow. It is fascinating to think about. I can't believe it took us, whatever, 40, 45 minutes to get into this part of the workflow engines. But I have to ask about agents, and what your take on them is as almost like workflow engines, and on workflows in general. And I will preface this by saying we had Igor on here a few weeks ago, and he was talking about how agents, or even LLM calls, should just be seen as another step in a DAG: oh, we're taking messy data and making it neat and tabular, and that's one of the steps of the DAG. But with agents, I've seen so many different examples of folks who have tried to have the agents work as DAGs where they just make up the...
Demetrios [00:44:34]: They make up the graph on the fly. And now you are dealing with this workflow that was created by an agent.
Alex Milowski [00:44:42]: There are some companies out there in the mix that I looked at who are specifically focused on agentic systems. I put them in the category of sort of the business process ones, because their tools look exactly like that, and they're a little more generic in the sense that I would think of them as supporting more like your chatbot type of interface, where you're talking to your favorite airline and things are happening. And when you say, yeah, I'd like to buy that ticket, or I have this problem with my luggage, it interacts with a bunch of stuff and comes back to the agent interface. Right. And that's a workflow that they have to manage, and it's automated in some capacity. And so there are people who are building products for that kind of workflow.
Alex Milowski [00:45:29]: And I found them amongst the things that I surveyed. I think the interesting side that's not that is this kind of challenge of the LLM-based sort of agent system, where you have some inference happening and then there's a consequence: it's put something out there that is either code-ish or something that's going back to the user, and that's part of this bigger workflow. And that's more like an embedded workflow. And that could be dynamic, right; as you say, it could be generated by the system itself, it could be generated by another piece of code or some other model. And part of the mix here are these things that are more like workflow engines.
Alex Milowski [00:46:14]: They're not systems, necessarily, as much as a library to do this kind of workflow orchestration: you can write your thing, or give it one of these DAGs of things to do, and it will run it to some completion. And that's more like an embedded workflow engine. And there's a bunch of them like that out there. And that is interesting, because it goes back decades to what people were doing with rules engines for workflow systems, because those were engines that you put inside the product. And sometimes there were desktop applications that were doing this stuff, running the rules and acting on your behalf, with a user in front of them. So now we've got a chatbot interface or agent out there that somebody's interacting with on a website or through an app, and it's doing the same thing. Right.
Alex Milowski [00:47:02]: But at a different scale. Right. And it's not running in your desktop app, it's running out there in the cloud somewhere. But it's the same thing: it's an embedded engine running an embedded workflow, dynamic or not. And so I think there are some really cool possibilities there that I have not explored.
Alex Milowski [00:47:20]: This is a good research topic for some of you, or myself: where are we in that adventure? What have people done successfully? How is it architecturally different from what people are doing now with these interfaces? Where's the there, there? And what can't you do with the current systems that you could do with this sort of theoretical embedded, LLM-based thing or agent? There are a lot of possibilities.
Demetrios [00:47:48]: What I hadn't thought about before, that you just opened my mind to, is how the agent is almost a gateway to choosing the right workflow. So we talk a lot about agents being able to choose tools or have access to tools, and most people think, okay, now it can scrape the web or it can have access to my database. But the thing that you just said is, yeah, one of the tools might be that it kicks off a workflow, and then you don't have to worry about the agent spawning a new workflow every time, and maybe spawning the wrong workflow, or a workflow that isn't exactly what you need to happen. The agent just has to choose which workflow it needs to use. And that is very much like going back 20 years, but now we have a little bit looser way of having the end user interact with the agents, or with the if-this-then-that statements.
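A hedged sketch of that idea, with hypothetical names and no specific agent framework or orchestrator API: the model only picks from a fixed menu of pre-built workflows rather than composing a graph on the fly.

```python
from typing import Callable

# Hypothetical triggers for pre-built, reviewed workflows.
def trigger_refund_workflow(order_id: str) -> str:
    return f"started refund workflow for {order_id}"

def trigger_retraining_workflow(dataset: str) -> str:
    return f"started retraining workflow on {dataset}"

# The agent's "tools" are just named entry points to existing workflows.
WORKFLOW_TOOLS: dict[str, Callable[[str], str]] = {
    "refund": trigger_refund_workflow,
    "retrain": trigger_retraining_workflow,
}

def act(llm_choice: str, argument: str) -> str:
    """The model chooses a workflow by name; it never invents the DAG itself."""
    tool = WORKFLOW_TOOLS.get(llm_choice)
    if tool is None:
        return "no matching workflow; escalate to a human"
    return tool(argument)

print(act("retrain", "feedback_2024_q4"))
```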
Alex Milowski [00:48:53]: Yeah, I mean, that's kind of the interesting juncture that we're at, where we have large language models, we have agent systems, and we have workflow systems, so we have these kinds of pieces that are much more advanced than they were back then. And they're not all in one system pointed in one direction. And so we can take these puzzle pieces and put them together, they're like Legos, and make different things out of them. And since the technology is more advanced, we can do some really amazing stuff with it. And I think that's a nice juncture where we're at. I was surprised, going through the list of projects, how many there were that were still active in all of the categories that I had, and 79 is not a huge number. There's a lot of noise out there, but there are a lot of people actively developing these things and maintaining them.
Alex Milowski [00:49:48]: They're using them for things, and they're in a variety of contexts. And so I think, maybe contrary to what you hear out there right now in terms of the buzz in the industry, this is a very healthy, vibrant area of work that people are doing. They're using it for stuff, obviously, because these projects are active. Some of them are niche and in a corner, and some of them are commercial products that people are selling, and everything in between. And so I think that's good for users out there, because they can find the thing that matches their needs. The only challenge is that there's a little bit of a tyranny of choice. Right?
Demetrios [00:50:24]: Yeah.
Alex Milowski [00:50:25]: If you're new to this and you're like, I need a workflow system for X, you've got some choices that you're going to have to make.
Demetrios [00:50:31]: How do you categorize, like, the RPA systems? Would those be the business ones that you're talking about?
Alex Milowski [00:50:39]: RPA?
Demetrios [00:50:40]: RPA. What do they call it? Robotic Process Automation, I think is what it stands for.
Alex Milowski [00:50:46]: I don't know that one offhand. I'm trying to think of whether I ran into anything that was more on the automation, manufacturing side, and I'm going to say that I didn't see a lot there in what I surveyed. So that might be a whole different thread of this. Certainly there are lots of people doing things like we did at my last company, where we're sending a protocol to a liquid handling robot because it's part of our automation of what's happening in a lab. Right. And for that kind of thing, in biotech and in pharma in general, you can use these workflow systems that everybody's been talking about here, because it's a service call to something: the thing has an API, and you push a protocol to it and tell it to go. And so for that level of automation, I think people are using these kinds of tools. But for the more industrial things, there's...
Alex Milowski [00:51:38]: There's a whole other world there. I learned that in science: there's a whole other world of plant automation that uses a different technology. And it uses some really old technology, which is scary. Like, it uses OPC, which is based on Microsoft OLE from, like, 1995.
Demetrios [00:51:59]: What.
Alex Milowski [00:52:00]: And so, you know, that's why you can hack these things, like a power plant, for example. Yeah, yeah.
Demetrios [00:52:08]: Hold it for ransom.
Alex Milowski [00:52:09]: And so there are some areas there that are completely off the radar from this kind of group of workflow systems. And there might be really good tech there. I just think that's a whole other world. Yeah.
Demetrios [00:52:23]: So you've been seeing the trends of what's dying and what's growing and you did all this research on it. Do you feel like you have any bets or guesses on what the future of these systems holds?
Alex Milowski [00:52:41]: I think there are some good contenders in the SaaS realm for machine learning workflows in general, for MLOps, and they have some nice tools. I think the challenge in the machine learning and AI context is that we're just getting started, and so their customer base is all these sort of early adopters, startups and people like that. So if you take this thing and you walk into a large enterprise that's regulated and has, you know, air-gapped systems, right, they can't use a SaaS system. Yeah, yeah. And there are big applications in these places. Making that leap from "you can use our SaaS service and everything's cool and we're writing Python code and doing all this cool stuff with it" to these places where it's highly regulated...
Alex Milowski [00:53:31]: And there are all these other enterprise challenges, and you have to have certain kinds of certification to be operating there. That's where the revenue is for these companies, potentially, to go after. And so they sort of have to make that leap, and there are some that are doing that. So the evolution of some of these companies to be able to provide enterprise products that really meet these other needs, having certain security levels and certain compliance things, and being able to work in these sort of non-cloud-based environments or private clouds and so forth, that kind of is a growth area. Right. And you have to be able to survive that, because it's also expensive. So that's...
Alex Milowski [00:54:16]: I'm watching to see who matures in that realm. That's why some of the projects that are open source, good technology, and well supported work well in these places: their technology teams can take them, put them inside these environments, and deploy them, but then they have to manage them. So I think that's a trend. I was happy to see that. I think with the business process automation thing, it seems like there's a healthy community of people using that. I think agentic systems are an area of growth for them to some extent, and the ML and AI stuff is an area of growth for them. I'd be curious to know if they get traction there, versus just saying "we do ML and AI"; are people turning to these other products that are a little bit older? We'll see.
Alex Milowski [00:55:00]: Right.
Demetrios [00:55:01]: It's almost like, with the business process automation, I see now they're incorporating the capabilities of LLMs into their products. And so now, as one of the steps in your whole workflow, you can add whatever an LLM call is capable of doing, whether that is summarizing a bulk of text, or scraping a website and picking out the most important stuff from it so you have that data to pass on to the next step. So you've got new tools that you can work with. And I've seen it done really well with friends in marketing who are trying to create content. Because you've got, almost like, the easy way of creating AI slop, which is just saying to ChatGPT, create me a blog post about, whatever, GPU consumption in the US, and then it'll spit out whatever it has inside of it. Or you can start by saying, create me an outline, find three relevant blogs that talk about it, and then you, as a human, choose the blogs that you like. And then it uses the information from those blogs and creates an outline. And then you say, now create the intro paragraph, now create three body paragraphs.
Demetrios [00:56:35]: And you really prompt Engineer it to be a much more in depth type of workflow. And the tools are now giving you those capabilities by default because you have the LLM calls.
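To make that multi-step pattern a bit more concrete, here is a minimal sketch of what it might look like codified as a small pipeline. The call_llm and pick_references helpers are hypothetical placeholders, not any particular provider's or orchestrator's API; the human step is what keeps it from being one-shot slop generation.

```python
# Hypothetical sketch of the multi-step content workflow described above.
# call_llm and pick_references are placeholders, not a real provider's API.

def call_llm(prompt: str) -> str:
    """Placeholder for whatever LLM call your workflow tool exposes."""
    raise NotImplementedError("wire this to your provider of choice")

def pick_references(candidates: list[str]) -> list[str]:
    """Human-in-the-loop step: a person chooses which posts to draw from."""
    raise NotImplementedError("surface this choice in your UI or chat tool")

def draft_post(topic: str) -> str:
    # Step 1: find candidate source material instead of one-shot generation.
    candidates = call_llm(
        f"List three relevant blog posts about {topic}, with a one-line summary of each."
    ).splitlines()

    # Step 2: a human filters the sources before anything gets written.
    chosen = pick_references(candidates)

    # Step 3: build an outline grounded in the chosen sources.
    outline = call_llm(
        f"Using these sources: {chosen}\nWrite an outline for a post on {topic}."
    )

    # Step 4: draft section by section, each one a separate task in the workflow.
    intro = call_llm(f"Write the intro paragraph for this outline:\n{outline}")
    body = call_llm(f"Write three body paragraphs following this outline:\n{outline}")
    return intro + "\n\n" + body
```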
Alex Milowski [00:56:50]: I think one interesting possible evolution here is that you have engineering teams, which are often very expensive, taking things like generative AI technology and LLMs in general and running them through their quality checks and so on. Does this thing pass our tests? Does it pass the risk assessment to make sure it's not going to do something bad or spit out bad results? They have guardrails and all this stuff. So you can imagine these more technical workflows where they take a new model, run it through its paces, and say this is good, this is not good. Or maybe there's a score, because it's not a black-and-white type of thing; it's a score of how well it does in these different dimensions. But as there's more to do and more uses of these technologies, you can imagine there's a higher-level user, someone not on the technical team whose product this is. There's a new model, or there's a fix, or there's a better model that doesn't do the bad thing because they tested the new version, and there's a workflow for accepting that and rolling it out to their organization or into their application. If that always has to be a technical engineering problem, that's expensive. So I can imagine that part of the way people build and use workflow systems is that kind of multi-layered thing where there are technical workflows.
Alex Milowski [00:58:19]: They use very specific technology that's geared towards the task at hand: training a model, evaluating it, producing these risk scores. And then there's the business-level workflow, which says, how do I take that model and get it into my application, roll it out to my users, get it into production? And there's a gatekeeper. This is where human in the loop, which we haven't really talked about, comes in: there's a human-in-the-loop step there that says, do I want to do this? Because there's a business decision to roll this thing out. And when you codify that as a workflow, the only way it gets out the door is that somebody goes and does that human-in-the-loop step of saying yes, as a human decision maker. It's not just a technical team somewhere who, if they do the right thing in their DevOps, you know, it goes out the door.
Alex Milowski [00:59:10]: Maybe they even no longer have that ability; it's only done through the workflow system. And then there's a human who makes that decision, it's traceable in your organization, and it's maybe less prone to mistakes, like accidentally rolling out the wrong model or a model that doesn't pass your tests. That kind of level of control, bringing it out of the engineering organization and back into the hands of a product manager or a business user of some sort, I think that's going to make the thing cost less, and you're going to get better results in terms of quality. That's where workflow systems, and different layers of them, can be super helpful. I would love to see that kind of thing. These kinds of things already exist, but they're run by technical teams.
Alex Milowski [00:59:54]: Right. They're using the same tool to do their DevOps, their operations. So the tools are all there. But I think the business process automation side of it is not there as much, because again, for those people it's usually "yes, engineering team, roll out the new version." It's a Slack message to a person, and then a person goes and does a task.
Demetrios [01:00:17]: Looks good to me.
Alex Milowski [01:00:18]: Yeah. And that's where we are. I think the challenge with LLMs is they're squishy. Right. And somebody needs to look at these risk scores, and there are new benchmarks and cool things coming up, fantastic, and then they make a decision: is this risk acceptable, to roll out this new version of whatever model we're using from whatever provider? We've done our tests, we have our evaluations, and now I have to make a decision. That's my job as the product manager, or whoever, to decide whether to roll this thing out.
Alex Milowski [01:00:50]: And you don't want to skip that. You want to record it, so that you understand how this thing got out, and then you can change your process or change that workflow to deal with whatever issues your organization might have in terms of its use of these AI technologies. Right? Yeah, it's another guard, but it's a.
Demetrios [01:01:11]: Business guard, 100% in the FinTech or just financial or even just any regulated space that is a necessity because they're going to get audited. And so you got to have that explainability of what exactly you were thinking when you put that out there. And so it makes sense that this would only come in a workflow so that you have that specific area where someone pushed the button and said, yep, we're good with that and the logic behind why they chose that. But I, I do like the idea of taking it out of engineering's hands and just getting a different set of eyes on it because of the. As we had, we had Allegra on here a few weeks ago and she was talking about how she's a big proponent of density of diversity in a room. And so by doing that, by taking things out of engineering's hand, then it's not only engineers that are looking at it. And so by way of that, you have higher density of diversity.
Alex Milowski [01:02:18]: Yeah, that sounds great. And morphosystems are a good tool for defining that process and recording it and querying metadata and collecting information along the way. So you have a trail of information of what you did and, and then you can act on that trail in terms of making your processes better or when just somebody wants to know what versions of this model are we using out in our systems? They might be different. You have that true. Right. And people are doing this again, some other way. Maybe they have systems for this, but maybe they're not using these tools that are available and they don't have to build a special system for it. They can just use a workload system that exists out there.
Alex Milowski [01:03:04]: And that's. So there's that sort of build by choice then too. Right.
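As a rough illustration of the layered setup Alex describes, here is a minimal sketch of a business-level gate: the technical workflow hands over risk scores, a named human signs off against agreed thresholds, and the decision is recorded so you can later answer "which model versions are out in our systems?" The threshold names, score keys, and in-memory log are all assumptions for illustration, not any specific product's API.

```python
# Minimal sketch of a human-approval gate between a technical evaluation
# workflow and a business-level rollout workflow. Names, thresholds, and the
# in-memory "log" are illustrative assumptions, not a specific tool's API.
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical risk limits the organization has agreed to accept.
RISK_THRESHOLDS = {"toxicity": 0.05, "hallucination_rate": 0.10}

@dataclass
class RolloutDecision:
    model_version: str
    scores: dict
    approved: bool
    approved_by: str
    reason: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

DECISION_LOG: list[RolloutDecision] = []  # stand-in for a real metadata store

def gate_rollout(model_version: str, scores: dict, approver: str,
                 approve: bool, reason: str) -> bool:
    """A named human decides; the decision is recorded; and it is blocked
    anyway if the evaluation scores exceed the agreed thresholds."""
    within_limits = all(scores.get(k, 1.0) <= limit
                        for k, limit in RISK_THRESHOLDS.items())
    decision = RolloutDecision(model_version, scores,
                               approve and within_limits, approver, reason)
    DECISION_LOG.append(decision)  # the trail an auditor can ask for later
    return decision.approved

def deployed_versions() -> list[str]:
    """Answer 'what versions of this model are out in our systems?' from the trail."""
    return [d.model_version for d in DECISION_LOG if d.approved]

# Example: the evaluation workflow hands off scores, a product manager signs off.
if gate_rollout("model-2025-05", {"toxicity": 0.02, "hallucination_rate": 0.07},
                approver="pm@example.com", approve=True,
                reason="passes May eval suite"):
    print("approved for rollout:", deployed_versions())
```

The point of the sketch is that the approval lives in the workflow rather than in a Slack thread: nothing ships without a recorded decision, and the same trail answers the audit question later.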
Demetrios [01:03:09]: And you know what this also makes me think of is how a friend told me how wild any enterprise is right now when it comes to what AI they're using. Not just your ML teams: if you think about governance at the enterprise or organizational level, maybe the marketing team is using one business process software that has some LLM calls or AI capabilities within the tool, then you've got the actual software engineers using these AI coding helpers, and you've got the HR team using some SaaS software that has some kind of AI capabilities, and you don't realize it, but as an organization you are exposed. If your idea is "we're keeping all of our data inside," that is totally out the window, because everybody's using a different piece of SaaS tooling that is potentially sending the data wherever it needs to go. So from a governance perspective, bringing these workflows in, recognizing that if you have it documented in workflows, you also have a bit tighter control on what is being used, how it's being used, and how these things are happening, I would assume. But I can imagine that it also can slip through the cracks there too.
Alex Milowski [01:04:44]: Yeah, yeah, for sure. I mean, there are a lot of data governance challenges we already have today that are made worse, I think, and I'm not sure there's a clear way out of that box right now. Yeah, even with workflow systems.
Demetrios [01:05:03]: Yeah, it's not going to help that much. That's the truth, man. But just on the data governance piece, I had a friend tell me that their company did an audit, and they were expecting that folks would be using maybe 10 AI tools. After they did the whole audit, and this is a relatively small startup, like 200 folks, mid-size is what you would call it, what they found is that there were over 92 tools being used that had AI capabilities or were AI tools. They were just looking at each other thinking, wow, this is wild. And there were a lot of repeat tools.
Demetrios [01:05:53]: So you have a lot of the same workflows, but maybe in different parts of the organization, different branches. Even if you are okay with paying for the OpenAI enterprise edition, maybe one branch is paying for it and another branch is paying for individuals to be using it. And so all that governance is a mess.
Alex Milowski [01:06:17]: That was a challenge at my last company, partially because I am not a molecular biologist. Right. I don't know what standard tooling and resources they use online. And the way that we took that apart, this process engineering aspect, was having those discussions and drawing those pictures of: what do you do in your day to do this task, what systems do you use, where do you get that result from? Oh, we go online, we take that DNA, we stick it into this tool, and we run a BLAST search. And then we do this other thing with this other site, because they're good at this particular thing I'm really interested in. And you're like, okay.
Alex Milowski [01:07:01]: So there are all these touch points to different systems, and these were things like very unique isolate strains, stuff like that. That data, that genome, was our sort of bread and butter. You have this challenge of: if we take a little snippet of it, nobody knows where that came from, so we're okay with that going out. But there's a fine line, maybe a gray line, of how much is too much, and it comes down to knowing those interactions. It all comes from doing this sort of process narrative. If you can draw that picture, a flowchart, whatever, pick your favorite tool, then you can ask, okay, what am I talking to? It also helps you with the data artifacts problem: what data went in, what data comes out, do we care, do we store it, where does it go? Does it go into some knowledge graph? Does it need to be stored to have a full view of the experiment, or whatever we're doing here, for compliance in your industry? Maybe you need to record that.
Alex Milowski [01:08:00]: You may be required to record those interactions; that's essential to your business. People do this, but they don't necessarily do it in a uniform way. And so that's where I was pushing on using this thing called BPMN. It's just a nice visual notation where people have already considered all these problems, so we don't have to make one up, we can just use it. There are tools for it.
Alex Milowski [01:08:27]: But that also means you can say, okay, I'm going and talking to this SaaS service. What does that service do? And then you can decide whether you trust them. Yeah.
Demetrios [01:08:35]: What data are you giving it? What's your end goal? How are you trying to use it? Because maybe we already have another service that we're paying for, and you're paying extra for that. I know we had Maria on here probably two years ago now, and she did this with her company Ahoytz, because they did not have any centralized ML platform. So they just went to all the different teams and asked, what are you using? How are you using it? And they recognized that they already had a lot of usage on Databricks, so they were like, I think we should probably standardize on Databricks so that it's cleaner. Just recognizing that from talking to people and drawing it out, I really think there's so much value in that. But what you're saying is even more: it's like you're taking an Etch A Sketch and making a beautiful picture of what is happening at each step and what the goal of each step is. By having that, you're so far ahead of the curve, because you understand each person's tasks, what they're doing and how they're doing it, but you also understand how that fits into different workflows, as you were saying, not looking at it as one big workflow, but seeing all the different workflows and how they interact with each other.
Alex Milowski [01:10:07]: Because then you can ask all kinds of great questions. Is it worth automating this? Do we need to be worried about data leakage from this service that you're using? Is there a better provider for that? If we were to use some kind of new machine learning tech, generative AI or an LLM or whatever, some model, where does it provide the most value in this big workflow that we have? You can ask all these great questions, you can see what you know, and you can get samples of your data, because now you've done that analysis. It might seem old school and hard to do, which is the challenge, right? "I have this challenge, and oh, this seems like a massive undertaking." Let's start simple. Let's start with a little piece of it, or maybe a really big block diagram with big chunks of what we do on a daily basis, where they're just black boxes, and you dig into those when the time is right.
Alex Milowski [01:10:58]: But I think those kinds of different ways of making the problem smaller and more useful mean you can be very strategic in where you start. This is probably a bigger challenge for bigger organizations, but I also think smaller companies need to think about this, because if you're going to use machine learning tech, whether it's just inference or building models, it's not just those ML engineering teams in a corner that need the workflow systems and the process engineering, it's everything around it as well. You can go bottom up, but that's a hard sell. You can go more top down in terms of where this is going to provide value in the organization and what we, in our industry, whatever it is, need to be worried about. And again, a picture is an amazing thing, and it's a way to communicate outside of those technical groups that are used to graphs. Right.
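One lightweight way to start that process narrative, before reaching for a full notation like BPMN, is simply to enumerate each step, the external service it touches, and what data crosses the boundary, then flag the touch points worth a governance review. Everything in this sketch (step names, services, sensitivity labels) is invented for illustration.

```python
# Illustrative sketch: list a process's steps, the external services each one
# touches, and what data crosses the boundary, then flag the risky ones.
# All step names, services, and sensitivity labels here are made-up examples.
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    service: str      # external system this step talks to, if any
    data_out: str     # what we send it
    sensitivity: str  # "public", "internal", or "restricted"

PROCESS = [
    Step("annotate genome snippet", "public BLAST service", "short DNA snippet", "internal"),
    Step("summarize literature", "SaaS LLM provider", "paper abstracts", "public"),
    Step("store assembled genome", "in-house data lake", "full genome", "restricted"),
]

def review_touchpoints(steps: list[Step]) -> None:
    """Print the flowchart-as-a-list and flag anything sensitive leaving the building."""
    for step in steps:
        external = "in-house" not in step.service
        flag = "REVIEW" if external and step.sensitivity != "public" else "ok"
        print(f"{step.name:28s} -> {step.service:22s} [{step.sensitivity}] {flag}")

review_touchpoints(PROCESS)
```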
Demetrios [01:12:00]: So you know what I find the most difficult in these is when you update processes: updating them in all the documentation, and then making sure that, okay, this is the newest way we're doing things, even though maybe it's an experiment for a few weeks before you really know if it is a better way of doing it. Most of the time I'll update a process on the fly, and then later maybe I don't codify it as well, so it's updated in one place but not in another. The entropy of all of the stuff that's happening is really hard to wrangle. It's like workflow debt, I guess you could call it.
Alex Milowski [01:12:50]: Right. And that's where, if the workflow system is how you get things done, and it can produce documentation and diagrams and things like that, then that's where you go to look up how we're actually doing this. That's one of its core benefits: there's no out-of-sync, because it's how you do things. The challenge is when it fits into a bigger process that isn't automated. But I think it's a useful starting point for those discussions, and it makes whatever the technical task is easier to do. Even if it's just a documentation artifact, you draw this diagram, then you go back to the diagram and say, explain the change you want in terms of changing this diagram. You've started with the change to your description, your architecture, the process, the workflow, and then you go off and build the thing. And then there's sometimes a kind of reconciliation, where the reality I found as I went to go build it doesn't quite match what we thought at the beginning. But that's part of doing an agile process: you should be coming back to that thing and revisiting it, and having a little bit of structure and diligence there helps.
Alex Milowski [01:14:06]: But it doesn't have to be overwhelming. And if everything were a workflow, it would just all be up to date, right? That's the ideal version of this thing. It's not a reality, of course.
Demetrios [01:14:19]: It would be so much easier.
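As a small illustration of that "the workflow is the documentation" idea: if the DAG is defined as data, the diagram can be generated from it rather than maintained by hand in a wiki. This toy sketch assumes the workflow is just a mapping from each step to its upstream dependencies and is not tied to any real orchestrator.

```python
# Toy sketch: if the workflow is the source of truth, the diagram can be
# generated from it instead of drifting out of date somewhere else.
# The DAG below is a made-up example; real orchestrators expose similar graphs.

WORKFLOW = {
    "ingest_data": [],
    "train_model": ["ingest_data"],
    "evaluate_model": ["train_model"],
    "human_approval": ["evaluate_model"],
    "deploy": ["human_approval"],
}

def to_mermaid(dag: dict[str, list[str]]) -> str:
    """Render the DAG as a Mermaid flowchart that can be dropped into docs."""
    lines = ["flowchart TD"]
    for step, upstreams in dag.items():
        if not upstreams:
            lines.append(f"    {step}")
        for upstream in upstreams:
            lines.append(f"    {upstream} --> {step}")
    return "\n".join(lines)

print(to_mermaid(WORKFLOW))
```

Because the diagram is derived from the same definition the orchestrator runs, updating the process and updating the picture are the same action, which is exactly the out-of-sync problem discussed above.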