MLOps Community
September 5, 2021

What I learned after an hour with Jeremy Howard

It isn’t often you get to chat with a living legend

Demetrios Brinkmann

It isn’t often you get to chat with a living legend. But that’s exactly what happened this past July when fast.ai’s Jeremy Howard joined us for an hour-long interview. He’s been one of the most requested guests on the podcast, and we finally got him to say yes.

How I managed to make contact with him is another story entirely. The short version: big thanks to Sam Charrington from TWIML for making the intro.

For anyone who isn’t familiar with Jeremy and his long list of achievements as a data scientist and entrepreneur, here’s a quick rundown:

  1. Management consultant at McKinsey and A.T. Kearney
  2. Founder of FastMail
  3. President and Chief Scientist at Kaggle
  4. CEO and Founder at medical diagnostics company Enlitic
  5. Co-founder of fast.ai, a research institute dedicated to making deep learning more accessible

Jeremy also teaches data science at Singularity University. In 2014, the World Economic Forum named him one of their Young Global Leaders. He regularly advises some of the world’s biggest tech VCs on investing in data-driven startups. And that’s not nearly all.

This summer, while I was in the middle of the Greek islands (humble brag), Vishnu and I sat down and pelted him with questions. How did it all begin? What was the inspiration for fast.ai? And what’s all this about the power of laziness?

The power of laziness

‘I think it was Bill Gates who said, “If I have a hard problem to solve, I try and give it to a lazy person.”’

Larry Wall said laziness is one of the three virtues of a great programmer, along with impatience and hubris.

My interest in improving ML workflows comes from my extreme laziness. Like all humans, I’m forgetful, impatient, highly visual, and like to experiment, but I can’t always remember everything I’ve done.

So I want to write as little code as possible, and I want to write it as few times as possible, and I want to spend as little time as possible debugging it or going back and figuring out why I did something six months or a year ago.

I found that my tools make a huge difference, and being cognizant of my human foibles helps a great deal when it comes to setting up and creating tools.

I wrote my first code 40 years ago when I was seven, and I’ve been coding pretty much every day for 30 years. In the beginning, I was mainly using editors and IDEs, and I was always struggling to find this kind of iterative experimental approach to writing where I could figure out the answer gradually.

It was notebooks that made the most significant difference in my life in terms of productivity. I think it was the first tool to really treat me as a human being, a creature that needs to experiment and needs to explore.

I’m a big believer in the scientific process, and the best scientists in history have always been brilliant at keeping journals: fantastic, thoughtful, careful journals. But as computer programmers, we have a culture of throwing away the stuff we experimented with or iterated on. And I’m like, no, that’s terrible. We’re figuring out all this cool stuff and then throwing it away.

It’s like writing Fermat’s Last Theorem in the margin and ignoring how you got there.

So all these things led me to want to build workflows on top of really visual, iterative, exploratory tools. Notebooks are the best tool we have, so that’s where nbdev comes from, for example.
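As a rough illustration of that workflow, here’s a toy sketch of the nbdev idea (the export directive follows nbdev’s syntax, but the cell contents are invented for illustration): notebook cells tagged for export become library source, while the surrounding cells survive as experiments, documentation, and tests instead of being thrown away.

```python
# In nbdev, a notebook cell tagged with the export directive becomes
# part of the generated library source; untagged cells stay in the
# notebook as exploration, docs, and tests.

#| export
def accuracy(preds, targets):
    "Fraction of predictions that match their targets."
    return sum(p == t for p, t in zip(preds, targets)) / len(preds)

# An exploratory cell like this is kept as living documentation and a
# test, but is not exported into the library:
assert accuracy([1, 0, 1], [1, 1, 1]) == 2 / 3
```

Running `nbdev`'s export step then writes the tagged cells out as an ordinary Python module, so the journal and the library stay in sync.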

The idea for fast.ai

‘I thought, let’s create something like Keras but much more hackable and researcher friendly.’

fast.ai provides several layers of abstraction that machine learning practitioners can use to build, test, deploy, and maintain ML models.

How it started was driven by a couple of things. The first was when the PyTorch pre-release first came out. It was such an improvement on TensorFlow. And I just said to myself: I want to embrace this.

I literally rewrote an entire course from scratch to use PyTorch as soon as it came out. I was like, okay, this is just so much better — particularly for Part 2, where it’s all about exploring the cutting edge of research and reading implemented papers and stuff.

So I embraced PyTorch, but it still wasn’t enough on its own, so we needed to create something Keras-like to use with PyTorch because nothing existed at that time.

One of the things that bothered me about Keras was it only really operated well at one level of abstraction — you had to use what the Keras API gave you. And I didn’t like that. It was very often hard to use in our own research because it wasn’t really that hackable, and it was hard to show students how to dig deeper. Quite often, we just had to work manually because it just didn’t support what we wanted to do.

So I thought, okay, let’s create something like Keras but much more hackable and researcher friendly, allowing you to make changes to every part of the code but let’s do it in a way that’s based on really rigorous software engineering principles.

Because the other thing I found in a lot of machine learning libraries is that the things software engineers take as bread and butter didn’t exist. So make sure things are really well decoupled and really well layered, and that’s really where fast.ai came from.

Obsession and abstraction

How do you apply abstraction in the context of software engineering?

Human civilization and human scientific development have been about building on top of layers of abstraction.

If you think about the world of maths and the world of physics, you have arithmetic operations, and then you have powers on top of that. Then you expand things to handle non-integers, and as you learn more maths, you build on top of what you know—ditto with physics and ditto with computer science. We should endeavour to do the same thing with our APIs.

Over the last few decades, I’ve done a lot of reading about API design and about APIs I admire. When it came to building fast.ai, it was a case of being almost ridiculously obsessive about re-writing it again and again and again. The Data Block API, for example, went through 25 rewrites.

And it drives anybody I work with crazy. 24 times out of 25, we get to a point where I say, ‘No, that’s not good enough,’ and it’s kind of like, just keep trying until we get to a point where I can’t think of any way to make it better.

It’s important that you’re able to go into the lower levels of the API and just change a bit, rather than have to replace the entire foundation when you want to change things.

I often think about how AWS built their surface architecture. There are well-defined interface boundaries, and everybody on AWS has to talk to each other through these interface boundaries. As a result, you have to make sure the whole thing works together because everybody has to use that system.

So I make sure all of my mid-tier APIs are built using my low-tier APIs, and all of my high-tier APIs are created using my mid-tier APIs, and so on.

I’m my own customer. Having the layers built on top of each other forces me to provide a layered API which I’m confident developers are going to enjoy working with, because I’m enjoying using it.
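That layering can be sketched in miniature (a toy illustration, not fast.ai code; every name here is invented): each tier is built only from the tier below it, so a user who needs more control can drop down one layer instead of replacing the foundation.

```python
# Low tier: small, explicit building blocks.
def normalize(xs):
    "Rescale values to the range [0, 1]."
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def batch(xs, size):
    "Chop a list into consecutive chunks of the given size."
    return [xs[i:i + size] for i in range(0, len(xs), size)]

# Mid tier: composed only from the low tier.
def prepare(xs, batch_size):
    "Normalize, then batch."
    return batch(normalize(xs), batch_size)

# High tier: composed only from the mid tier, with defaults chosen for
# the common case. Dropping to prepare() or lower is always possible.
def pipeline(xs):
    return prepare(xs, batch_size=4)
```

Because the high tier is itself a customer of the mid tier, any awkwardness in the lower layers is felt immediately by the library author, which is the point Howard is making.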

Code is creative

‘What we call genius is usually just the outcome of a lot of hard work’

MLOps Community: If you tell a product manager or even an engineering manager that you have to do a re-write, the question often comes back, why didn’t you write it correctly the first time? That’s the kind of creative tension programmers have to live with.

The thing I love about fast.ai is the fact that so many different types of programmers can use it. But you are a top-tier engineer. How do you balance your own ability to create elegant and sophisticated solutions with the end user’s need for ease of use?

Jeremy Howard: If you admire T.S. Eliot’s poetry, you learn that he didn’t just write his poems in one sitting. He rewrote them dozens or hundreds of times. This is the thing people often don’t realize when they see a work of art they admire. You may think, ‘This is a work of genius,’ but it’s really the result of hundreds of hours of working really hard, of saying ‘this isn’t good enough’ 100 times.

I’ve been coding longer than most people, so I can often figure things out a bit more quickly than people who haven’t been coding as long.

But the difference between a smooth, straightforward interface and a clunky interface is still going to slow me down. The kind of stuff that makes it easy for a beginner to use will make it easy for me to use as well.

I can’t keep the whole thing in my head. There are still parts of the library that I have to spend a lot of time reminding myself about before I start working on them, and then other parts where I’m like, okay, it’s not good enough yet because it’s still too complicated.

So if it takes me a long time to kind of ‘page in’ the concepts and things I need to know, that same stuff will cause other people to find it difficult to get started. Error messages and stack tracing, and debuggability are important for everybody, regardless of expertise level. It’s just less overhead for our brains to have to think about.

Perfection as a barrier

When you’re writing and re-writing an API and trying to get it just right, do you ever look back and discover that you’ve actually made something less efficient?

Yes, all the time. You know there’s always a compromise in abstracting something because it could make it less transparent as to what’s going into it, and that can make it harder to debug. So if you can try to minimize the number of conceptual types of abstraction, that’s the critical thing.

You want to know that in every API, this is what it looks like, and it’s a case of stepping away and just thinking carefully about the design.

Looking at the Data Block API, for instance, in fast.ai version one, it drove my users and me crazy that we had shedloads of classes for image regression, image classification, and image bounding boxes, for every combination of input and output. Somebody would then come across a combination we hadn’t supported, and we had to build a new thing from scratch.

So the Data Block API came from me saying, ‘What are we actually trying to achieve with this?!’ We had thousands of lines of code in V1 data processing. But I looked at it again and realized there are just four steps we always do.

We pulled apart what had become an annoying multiplicative thing and made it possible to just say what the input type and the output type are, the independent and dependent variables. That removed most of our classes.

Then you have to ask, where do I get my data? So you start with some data source, like filenames or a network stream. Then you need something that converts that source into independent and dependent variables; something that splits those into validation and training sets; and then something that batches it all up to be sent to a model.

Once you split out each of those things, you realize, ‘that’s our API.’
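The four steps he walks through can be sketched schematically (a toy sketch, not fast.ai’s actual Data Block API; the filenames and parity labels are invented for illustration): a data source, a conversion into independent/dependent variables, a train/validation split, and batching.

```python
import random

def get_items():
    "Step 1: some data source, e.g. filenames or records."
    return [f"img_{i}.jpg" for i in range(10)]

def get_x_y(item):
    """Step 2: turn a source item into independent and dependent
    variables. Here the made-up label is just a parity tag."""
    idx = int(item.split("_")[1].split(".")[0])
    return item, "even" if idx % 2 == 0 else "odd"

def split(items, valid_frac=0.2, seed=42):
    "Step 3: split items into training and validation sets."
    rng = random.Random(seed)
    items = items[:]
    rng.shuffle(items)
    cut = int(len(items) * (1 - valid_frac))
    return items[:cut], items[cut:]

def batches(pairs, size=4):
    "Step 4: batch examples up to be sent to a model."
    return [pairs[i:i + size] for i in range(0, len(pairs), size)]

items = get_items()
train, valid = split(items)
train_batches = batches([get_x_y(i) for i in train])
```

Any combination of input and output types then only needs its own `get_x_y`, rather than a whole new class, which is what dissolves the multiplicative explosion described above.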

Start with the end point

‘You can’t always iterate and refactor your way to a solution.’

Sometimes you need to step back and ask, ‘Okay, what are we really doing here?’ Hopefully an API falls out of that process.

Something that’s become increasingly popular in recent years is what’s called documentation-driven programming, in which you write the ‘read me first’ text and pretend the thing is already complete. If you set out your intentions first, you’ll have these moments when you realize, ‘Sh*t, there’s a whole piece missing here. I’ve told the user they have to do this, but that means I’ll have to do this first.’

Another way is to write the index.html documentation for your API first. I tend to do that in a notebook. Now, nothing in that notebook works because it’s documenting an API that doesn’t exist. But it’s an excellent way just to go through and fill it in gradually, piece by piece, and see my documentation naturally appear.

It also forces you to write only what you need. Sometimes junior engineers will tend to think about every possible edge case or every possible extension. They might end up with this super-abstract thing but only ever use one tiny piece of it. If you only write what you actually need at the time, you’ll save a lot of effort.
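A miniature example of that docs-first habit (everything here is invented for illustration): the usage snippet is written before any implementation exists, which pins down the interface, and only then is the function filled in.

```python
# The "read me first" usage, written before any code existed:
#
#     counts = word_counts("the cat sat on the mat")
#     counts["the"]   # should be 2
#
# Writing it first pins down the interface: the input is a string, the
# output maps each word to its count. Only then is it implemented:

def word_counts(text):
    "Count occurrences of each whitespace-separated word."
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts
```

Nothing beyond what the documented usage demands gets built, which is exactly the discipline the paragraph above describes.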

The universe of tools is expanding

‘We won’t be finished until we get rid of the need to code for the vast majority of people’.

Something we talk about a lot in the community is how the range of tools we have to use is expanding and, in some ways, becoming more powerful. Going from TensorFlow V1 to V2, for example, it took fewer lines of code to create models. The same goes for moving from what PyTorch does to what fast.ai does. Tools like AutoML are being used more widely in corporate environments.

At fast.AI, our organizational goal is to make deep learning more accessible.

At the moment, our courses have a pre-requisite of at least a year of coding experience, and our key software product is itself a coding library, and the vast majority of people in the world can’t code. So we are clearly failing at our mission.

We won’t be finished until we get rid of the need to code for the vast majority of people.

When I started using the Internet, it was impossible to use if you couldn’t write your own kind of TCP/IP config files and write a lot of scripts. Nowadays, my 81-year-old mom uses the Internet on her phone every day. So that’s kind of where I want to get to with coding tools.

Anything which helps us get there is great. But AutoML has mainly been bull****. I dislike Google’s approach to it and particularly disliked Jeff Dean’s public comments about it, saying effectively that ‘with 1,000 times more compute power we can have a thousand times fewer data scientists.’

But all Google’s AutoML tends to do is execute things like neural architecture search. I mean, how many of us need to create our own architectures, you know? It’s just marketing BS.

The stuff they’re doing at Hugging Face with AutoML is entirely different, and actually super cool. The difference is that they have somebody there who has extensively studied real-world AutoML and won international competitions with it.

So I’m excited about where that’s going. It’s not just about hyperparameter sweeps or neural architecture search or something. It’s genuinely about helping an end-user with not too much expertise generate lines of code.

Something that bugs me about AutoML, though, is when people throw lots of compute at something that doesn’t need it: using AutoML to set your learning rate through some kind of hyperparameter sweep, for example, instead of just using the learning rate finder and spending like three seconds on the problem. Or, you know, checking every possible batch size and learning rate combination rather than just using the simple heuristics we know about. fast.ai is about minimizing the amount of compute and data you need, so we spend a lot of time coming up with heuristics and rules of thumb and configurations that work just about all the time. Ideally, we’ll reach a point where people don’t need AutoML at all.
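The learning rate finder he contrasts with a full sweep can be sketched as a toy loop (this illustrates the general heuristic from Leslie Smith’s work that fast.ai popularized, not fast.ai’s actual implementation): train briefly while growing the learning rate exponentially, record the loss at each rate, and pick a rate from just before the loss blows up.

```python
import math

def lr_finder(lr_min=1e-4, lr_max=10.0, steps=50):
    "Record (rate, loss) pairs while the learning rate grows exponentially."
    xs = [1.0, 2.0, 3.0]                      # tiny dataset for y = 2x
    factor = (lr_max / lr_min) ** (1 / (steps - 1))
    w, lr = 0.0, lr_min                       # single weight, starting rate
    lrs, losses = [], []
    for _ in range(steps):
        loss = sum((w * x - 2 * x) ** 2 for x in xs) / len(xs)
        lrs.append(lr)
        losses.append(loss)
        if not math.isfinite(loss) or loss > 1e6:
            break                             # the loss has diverged; stop
        grad = sum(2 * (w * x - 2 * x) * x for x in xs) / len(xs)
        w -= lr * grad                        # one SGD step at this rate
        lr *= factor                          # grow the rate exponentially
    return lrs, losses

lrs, losses = lr_finder()
# The loss falls while the rate is reasonable, then explodes once the
# rate is too large; you choose a rate from just before the explosion.
```

One short pass like this replaces an entire grid of training runs, which is why Howard calls a hyperparameter sweep over the learning rate a waste of compute.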

Generally, I think we should be trying to get rid of code entirely. One interesting approach is Copilot, which doesn’t get rid of code, but does write a lot of it for you. The code it writes is crappy, but for a lot of people who can’t code much at all, it’s probably better than what they could write themselves.

I guess that’s probably a good thing, but I’m on the fence about it. Things like CoPilot and AutoML are both double-edged swords. I haven’t decided yet whether they’re helping us move towards the goal of making machine learning more accessible.

Is it time to think about the environmental impact of ML?

‘Big iron will always get big attention.’

Even back when I was getting started, large datasets and large compute were what people got excited about. It isn’t fundamental algorithmic breakthroughs or new ideas that get attention; it’s the first person to take an idea and throw shedloads of compute and data at it.

That gets attention because it’s where a layperson can see the obvious benefits. Throwing lots of compute and data at things also gets attention because engineers love working with that stuff. I actually don’t know why; I find it very annoying to deal with this kind of big-iron compute stuff. There’s a lot of sys-admin work to do, DevOps, and so on. But engineers love working on big machines. It’s just the nature of the beast.

I just want to solve the problem

‘Generally, I think we should be trying to get rid of code entirely.’

I’m always on the lookout for other people doing good work, and I always try to highlight it. I don’t make any money out of fast.ai. It’s there because I want to solve the problem, or rather, I want the problem solved. It doesn’t have to be me who does it!

I wish there were a model where we could all focus on addressing problems rather than worrying about winning.

Thank you, Jeremy, for taking the time to chat with us and passing on your wisdom!
