Quotes
“I think there are many practices from software engineers we can benefit from by applying them to data work.”
“We need to be excellent in operating data products as well. We don’t just develop models. We build the data, build the models, and operate.”
“Our complexity, a lot of it is coming from the business domain. It turns out to be very hard to run a global network of ships and utilize that to a good degree.”
“The amount of technology and data we have to integrate is very, very different, which makes it a bit harder. It’s not a greenfield. We have to integrate from systems that are really old to systems that are really new.”
“Customer experience is important but also customer behavior changes. The quality of the data because of that is a big, big factor.”
“With as fast as possible, I don’t mean real-time. When we build new things and extract new features, that should be as fast as possible. It shouldn’t take six months or so to get new data.”
“We have a high change velocity so we live in continuous deployment. In production, we have roughly 20-25 changes per day. That’s important to us because we have to change quite quickly and react to a lot of things.”
“The model you built today won’t be good next year.”
“You can’t waste time on the bad experience that you set up. You want to do the right thing at scale.”
“If you want to have an industrialized setup, two things are really important: speed and reliability.”
“There are safe ways to test in production. You don’t have to go through different environments and create waste or create process steps that can increase errors.”
“We predict how much we have to do, when, and where.”
“Tests make you faster and that’s something I rarely see data teams doing.”
“Like a caveman, you can just make your changes and see what breaks. Do something about it or roll back or whatever you need to do.”
“Using generated data, you need to have a better understanding of the data. The connection of the data to the business is what creates value.”
“Observability tools cost you all the time. In that case, if you have an engineering team, I would highly recommend just sticking to open-source. It’s not worth it yet in my opinion.”
“The observability tools haven’t blown me away that I would really need them. I think the key-value store plus the metrics are already enough to get 80% of the value.”
“Run full pipelines, not unit tests.”
“Typically, if I want to test something, there is an investment upfront to write the test.”
“If the pipeline changes over time, of course, I have to maintain those tests as well, so it’s extra baggage. I want to make sure that when I write these, when I set up actual tests, I get a customer, not just schemas.”
“Test an interface that is rather stable. Spend extra time on the test.”
“It’s better to at least run the full job, not the pipeline.”
“Find one problem, fix one problem, find a new one.”