MLOps Community

DataOps is a Software Engineering Challenge

Posted May 17
# Maersk
# Software Engineering Challenge
# DataOps
SPEAKER
Micha Kunze
Lead Data Engineer @ Maersk

Micha has a background in physics and holds a Ph.D. in Biophysics. He has always been interested in crunching data, whether on HPC clusters or on his laptop.

Micha loves building and improving systems that provide value through data.

SUMMARY

Micha's team delivers millions of forecasts a day for the global operations of one of the largest ocean logistics companies in the world. They need systems that are reliable and that can also change quickly.

In this talk, Micha shares how they achieved this by following simple software engineering practices.

TRANSCRIPT

Quotes

“I think there are many practices from software engineers we can benefit from by applying them to data work.”

“We need to be excellent in operating data products as well. We don’t just develop models. We build the data, build the models, and operate.”

“Our complexity, a lot of it is coming from the business domain. It turns out to be very hard to run a global network of ships and utilize that to a good degree.”

“The amount of technology and data we have to integrate are very very different which makes it a bit harder. It’s not a greenfield. We have to integrate from systems that are really old to systems that are really new.”

“Customer experience is important but also customer behavior changes. The quality of the data because of that is a big big factor.”

“With as fast as possible, I don’t mean real-time. When we build new things and extract new features, that should be as fast as possible. It shouldn’t take six months or so to get new data.”

“We have a high change velocity so we live in continuous deployment. In production, we have roughly 20-25 changes per day. That’s important to us because we have to change quite quickly and react to a lot of things.”

“The model you built today won’t be good next year.”

“You can’t waste time on the bad experience that you set up. You want to do the right thing at scale.”

“If you want to have an industrialized setup, two things are really important: speed and reliability.”

“There are safe ways to test in production. You don’t have to go through different environments and create waste or create process steps that can increase errors.”

“We predict how much we have to do, when, and where.”

“Tests make you faster and that’s something I rarely see data teams doing.”

“Like a caveman, you can just make your changes and see what breaks. Do something about it or roll back or whatever you need to do.”

“To use generated data, you need to have a better understanding of the data. The connection of the data to the business is what creates value.”

“Observability tools cost you all the time. In that case, if you have an engineering team, I would highly recommend just sticking to open-source. It’s not worth it yet in my opinion.”

“The observability tools haven’t blown me away that I would really need them. I think the key-value store plus the metrics are already enough to get 80% of the value.”

“Run full pipelines, not unit tests.”

“Typically, if I want to test something, there is an investment upfront to write the test.”

“If the pipeline changes over time, of course, I have to maintain those tests as well, so it’s extra baggage. I want to make sure that when I write these, when I set up actual tests, I get a customer, not just schemas.”

“Test an interface that is rather stable. Spend extra time on the test.”

“It’s better to at least run the full job, not the pipeline.”

“Find one problem, fix one problem, find a new one.”
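
Several of the quotes above (generated data, running full pipelines rather than unit tests, testing a stable interface) can be illustrated with a small sketch. The following Python example is not taken from the talk; the pipeline stages, the port names, and the booking-volume data shape are all hypothetical, chosen only to show what an end-to-end test on generated data might look like when it checks the output contract instead of each internal step.

```python
import random
from statistics import mean


def generate_bookings(n_ports=3, n_days=14, seed=0):
    """Generate synthetic daily booking volumes per port (hypothetical shape)."""
    rng = random.Random(seed)
    rows = []
    for p in range(n_ports):
        for day in range(n_days):
            # Inject occasional missing values, as real sources tend to have.
            volume = None if rng.random() < 0.1 else rng.randint(50, 150)
            rows.append({"port": f"PORT{p}", "day": day, "volume": volume})
    return rows


def clean(rows):
    """Drop rows with missing volumes."""
    return [r for r in rows if r["volume"] is not None]


def build_features(rows, window=7):
    """Keep the last `window` observed volumes per port, ordered by day."""
    by_port = {}
    for r in sorted(rows, key=lambda r: (r["port"], r["day"])):
        by_port.setdefault(r["port"], []).append(r["volume"])
    return {port: vols[-window:] for port, vols in by_port.items()}


def forecast(features):
    """Naive forecast: the mean of the recent volumes for each port."""
    return {port: mean(vols) for port, vols in features.items()}


def run_pipeline(rows):
    """Run every stage end to end, as a full pipeline run would."""
    return forecast(build_features(clean(rows)))


def check_output_contract(result, expected_ports):
    """Assert on the stable output interface, not on internal steps."""
    assert set(result) == set(expected_ports), "one forecast per port expected"
    for port, value in result.items():
        assert 50 <= value <= 150, f"forecast out of range for {port}"


if __name__ == "__main__":
    result = run_pipeline(generate_bookings())
    check_output_contract(result, ["PORT0", "PORT1", "PORT2"])
    print(result)
```

Because the check only looks at the final output, the internal stages can be rewritten without touching the test, which is one way to read the advice about keeping test maintenance low.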
