MLOps Community

The Role of Resource Management in MLOps

Posted May 07, 2022 | Views 1.1K
# Run:AI Atlas
# ML Inference
# Resource Management
speakers
Ronen Dar
CTO @ Run:AI

Run:ai co-founder and CTO Ronen Dar was previously a research scientist at Bell Labs and has worked at Apple and Intel in multiple R&D roles. As CTO, Ronen manages the research and product roadmap for Run:ai, a startup he co-founded in 2018. Ronen is the co-author of many patents in the fields of storage, coding, and compression. He received his B.S., M.S., and Ph.D. degrees from Tel Aviv University.

Gijsbert Janssen van Doorn
Director Technical Product Marketing @ Run:ai

Gijsbert Janssen van Doorn is Director of Technical Product Marketing at Run:ai. He is a passionate advocate for technology that will shape the future of how organizations run AI. Gijsbert comes from a technical engineering background, with six years in multiple roles at Zerto, a Cloud Data Management and Protection vendor.

SUMMARY

You know the old iceberg analogy, where the larger portion is hidden under the surface? Well, most of us in MLOps tend to focus on the visible, the models we need to deploy and run in production. But if we ignore resource management as our AI/ML initiatives grow, we’ll start to take on water, in the form of researchers fighting for resources, time-consuming manual workload rescheduling, and spiraling costs associated with ML inference.

In this talk, the experts at Run:AI show the role resource management plays in MLOps, what to strive for, and how to get buy-in from IT.

TRANSCRIPT

Quotes

Ronen Dar

"We started when we realized that there is a gap in the AI space: a gap in the technology stack, the software stack, for AI workloads."

"We saw that compute, on one side, is really, really important for machine learning, for deep learning. Compute is strategic: the more powerful your compute, the more workloads you can run, the more models you can train, the better your AI, the bigger and more difficult the problems you can solve, and essentially the more data you can process."

"It's about the ability to not be constrained by static quotas or by static machines. Once you have the ability to scale and replicate your machine very efficiently, you become better as a data scientist."

"Deploying models is different than deploying applications, deploying microservices. The requirements are different. The needs are different."

"As an MLOps engineer, I would like to deploy multiple models. Even for a team or an organization, it's typically about deploying a lot of models in production, so we're seeing customers who have this problem of needing to deploy many models."

"It was really, really important for us to stay open, in the sense that we can integrate with any tool that runs jobs, workloads, or notebooks on Kubernetes."

"I've been a researcher. You don't want to deal with infrastructure assets; you want to run your workload. You want to do your data science and your machine learning, and you want to do it quickly and do a lot of it. So we're really excited to help the data science world get there, iterate faster, and do things faster and better."

"I think that our system brings value even for an individual researcher, and when you have a team of data scientists, sharing resources is another benefit that we bring."

"We're bringing that management of resources. We're putting a lot of emphasis on GPUs because, technology-wise, we see a lot of missing pieces in the software stack that sits on top of the GPUs. A lot of tools are missing today in that sense."

Gijsbert Janssen van Doorn

“What I noticed is that whenever people talk about MLOps specifically, it always comes down to a lot of data verification. Data, of course, is very important, but the resources, the infrastructure that all of it actually runs on, are hidden somewhere; they're not top of mind. It needs to be there. That’s a given. The fact that it’s at the bottom of the iceberg means that it offers stability. It’s the foundation that everything runs on. So if you take out that foundation, the iceberg tips over. The MLOps iceberg goes down the drain.”

“Proper resource management of that infrastructure is crucial to everything that sits on top of it.”

“Compute makes things faster. Compute shouldn’t hold you back. It shouldn’t be a bottleneck. It needs to be instant.”

“The better and the more resources you have, the more impact you can make. The more work you can do, the more of a difference you can make with the work you do.”

“Your machine is your machine, and that’s why we’re trying to centralize infrastructure and get more infrastructure. But then resource management comes into play, and that is what makes it so important to stay on top of that infrastructure.”

“If you step back and look at it from an infrastructure perspective, the way infrastructure is used by machine learning and deep learning models, all the way from the development phase to the production phase, is so different from what regular IT people are used to. It requires a new skill set and a new practice.”

“MLOps was basically put out there to help productionize and operationalize models. That’s why we’re so passionate about the resource management side of it.”
