Model Maxims & Data Dogmas
This article was originally published on Leonard’s Substack.
In the same vein as Rob Pike’s Go Proverbs, engineers relish a good aphorism. I’ve tried to consolidate these frequent murmurs of the ML/AI community into somewhat tangible anchors:
Bad ingredients almost always lead to a bad dish. Be mindful of what you consume.
Two variables moving together doesn’t automatically make one the puppet master.
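To make this concrete, here is a minimal sketch (my own illustration, not from the article): two series are both driven by a hidden confounder, so they correlate strongly even though neither causes the other.

```python
# Illustrative sketch: a hidden confounder z drives both x and y.
# Neither x nor y is the other's "puppet master", yet they correlate strongly.
import random

random.seed(0)
z = [random.gauss(0, 1) for _ in range(10_000)]      # hidden driver
x = [zi + random.gauss(0, 0.3) for zi in z]          # observed series one
y = [zi + random.gauss(0, 0.3) for zi in z]          # observed series two

def pearson(a, b):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5

r = pearson(x, y)
print(f"corr(x, y) = {r:.2f}")  # strong correlation, zero direct causation
```

The correlation here is high purely because of the shared driver; removing z from the setup would make it vanish.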
Models that don’t adapt to the changing landscape will degrade.
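As a rough sketch of what watching for that degradation can look like (my own illustration, with an invented 0.5-sigma alert threshold, not a universal rule):

```python
# Illustrative sketch: a crude drift alarm that flags when a live feature's
# mean wanders away from its training-time baseline.
import random
import statistics

random.seed(0)
train = [random.gauss(50, 10) for _ in range(5_000)]  # training-time feature
live = [random.gauss(58, 10) for _ in range(5_000)]   # the landscape changed

def mean_shift(baseline, current):
    """Shift of the current mean, measured in baseline standard deviations."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return abs(statistics.mean(current) - mu) / sigma

shift = mean_shift(train, live)
if shift > 0.5:  # assumed alert threshold for this sketch
    print(f"drift detected: mean shifted by {shift:.2f} sigma")
```

Real monitoring would compare full distributions (e.g. via population stability index), but even a check this simple beats discovering drift from a customer complaint.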
Ensure that you understand the problem and metrics before jumping into optimization.
A simple, clear model or approach is always preferable to something clever but hard to understand and maintain. Beware of the Black Box.
If the results are too good to be true, it is highly likely that you made a mistake.
Every external dependency introduced adds a layer of complexity and potential points of failure to your system. Choose them wisely and clean up unused dependencies.
Meaningful incremental progress comes from spearheaded focus on the problem at hand. Make sure you know what to look for before you dive into the matrix.
In ML/AI it’s particularly easy to introduce changes that unexpectedly cause fires and performance degradation. Failures deeply entrenched in the model’s internals are subtle in their manifestations. Such unseen errors can compromise the system and even lead to significant real-world consequences. It’s of utmost importance to engage in purposeful testing across the whole system.
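What "purposeful testing" can mean in practice, in a minimal sketch (my own illustration; `normalize` is a hypothetical preprocessing step, not from the article): pin down known answers, edge cases, and output invariants so silent failures become loud ones.

```python
# Illustrative sketch: cheap invariant checks that catch silent ML failures.
def normalize(values):
    """Hypothetical preprocessing step: min-max scale values into [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:                        # guard: constant column, avoid /0
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

# Checks that run in CI on every change, not just once during development:
assert normalize([1, 2, 3]) == [0.0, 0.5, 1.0]       # known-answer test
assert normalize([5, 5, 5]) == [0.0, 0.0, 0.0]       # edge case: no variance
assert all(0.0 <= v <= 1.0 for v in normalize([-3, 7, 0]))  # output bounds
```

The same pattern extends upward: invariance tests on model predictions, distribution checks on features, and end-to-end smoke tests on the full pipeline.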
Data’s utility and role in computations should dictate its storage blueprint.
The inherent imprecision that accompanies machine learning and AI methodologies is something we have to embrace. It’s crucial to understand the limitations and to establish an acceptable margin of error tailored to each specific context or objective.
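One way to make that margin of error explicit rather than implicit, sketched below (my own illustration; the contexts and thresholds are invented for the example):

```python
# Illustrative sketch: the acceptable error is an explicit, per-context
# number, not an unstated hope. Thresholds here are invented examples.
import math

TOLERANCE = {
    "ad_ranking": 0.05,     # 5% relative error is fine for ranking scores
    "dosage_model": 0.001,  # a medical context demands far tighter bounds
}

def within_budget(predicted, actual, context):
    """Check a prediction against the error budget for its context."""
    return math.isclose(predicted, actual, rel_tol=TOLERANCE[context])

print(within_budget(102.0, 100.0, "ad_ranking"))    # True: within 5%
print(within_budget(102.0, 100.0, "dosage_model"))  # False: exceeds 0.1%
```

The same prediction error can be perfectly acceptable in one setting and disqualifying in another; writing the budget down forces that conversation to happen.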
In the famous words of Einstein, everything should be made as simple as possible, but not simpler. In software development, we constantly strive against the tide of complexity. This battle intensifies in the realm of machine learning/AI. To navigate these waters effectively, we must double down on our commitment to simplicity and clarity.
While the world of machine learning and AI might appear as a distinct frontier on the surface, it’s crucial to remember that the guiding principles of traditional software development, such as KISS (Keep It Simple, Stupid), DRY (Don’t Repeat Yourself), YAGNI (You Aren’t Gonna Need It), the tenet of Separation of Concerns, the emphasis on loose coupling and so on, still hold significant value here. After all, at its core an ML/AI system parallels the challenges and intricacies of any other software system endeavour.
This list is by no means exhaustive. I invite you to share your pearls of wisdom. Are there any guiding principles you believe deserve a spot on this list, or perhaps some that might not resonate as strongly? I’m eager to refine and expand upon these with your collective wisdom. After all, we’re all part of this journey, learning and iterating as we go. Please share your thoughts @substack or reach out to me directly.
These resources provide comprehensive insights into the broader principles of machine learning. I strongly recommend them for anyone looking to further their understanding:
Rules of Machine Learning: A robust guide offering a set of best practices for ML engineering. Explore it further here.
Reliable Machine Learning: This enlightening read steers you through the process of applying an SRE (Site Reliability Engineering) mindset to machine learning. Authored by Cathy Chen, Kranti Parisa, Niall Richard Murphy, D. Sculley, Todd Underwood, and other guest authors. You can find it here.
“You must unlearn what you have learned.” – Yoda