What is deep learning good for

Attention Conservation Notice: This is just a barely edited copy of an old Twitter thread that collects a lot of my thoughts on deciding if, when, and how to use deep learning (which is why it reads so choppy). I think it’s advice that holds up, but given the rapid growth in the number of people who have applied deep learning to practical problems, I suspect a lot of this has graduated to “common knowledge” since I wrote it.

Why the hell even use Deep Learning?

(That was the original question that prompted the thread.)

Setting out terms: I’m taking “AI” to loosely mean “black box models with lots of parameters that are very hard to interpret and are used predictively.” Most people immediately go to deep learning, but I’m going to include things like GBDT (gradient-boosted decision trees) as well, since they come with similar issues and concerns.

To be able to use AI “for real,” as in use it to (eventually) save you money and not just make a nice paper for an academic conference, you usually need a problem with as many of the following attributes as possible:

The cost of errors needs to be extremely low. AI will inevitably make mistakes. You want those mistakes to be of the “I had to push the button twice” variety, not the “my car just rammed a school bus at 80mph” kind.
The decision needs to be possible but expensive for a human to double-check; “Is this blurry image a cat or a dog?” and not “is this specific network packet bad or good?” If you can’t check, you have no idea if your AI is working. If checking is cheap, just have the human do it and don’t use AI at all.
There needs to be a lot of the same kind of decision. The best zucchini-vs-volleyball detector in the world doesn’t matter if that decision is only relevant once a week.
There needs to be an actual benefit from making the right decision: if you put the right product in front of your customer and that increases sales, that’s good. If you can tell when a customer is grumpy but don’t have anything to do with that information, that’s bad.
The cumulative benefit from making the right decision has to outweigh the combined benefit of “I’m just using a dumb rule and it does ok-ish” plus “…but I didn’t commit to the lifetime care and feeding of a deep learning model.” Did you try picking the most common class?
You have to not care all that much about how it got to the answer, only aggregate performance statistics. AI models are famously difficult to interpret, and attempts to create post-hoc explanations generally haven’t worked out that well. (This last one is the thing that I’ve struggled with a lot in the security space: a lot of the time if the AI says “this is bad,” a human analyst has to do a completely parallel investigation into the thing, all while faced with the possibility that it’s a false positive.)
The base rates need to be as close to even as possible – the more skewed they are, the harder it is to evaluate performance in the wild. In malware detection, for example, where the base rate is close to zero, it’s almost impossible to evaluate false negatives, since you’d have to manually analyze and verify thousands if not millions of samples to find the one that your AI missed. You have to resort to complicated statistical tricks to ballpark that number.
You need to have a steady stream of relevant data to train and evaluate your AI against, and you desperately want that stream to be labelled or your problem to be susceptible to self-supervision.
And finally, you need to have a match between where the model has to run and the size of the model: if I need GPT-3 to do malware detection in real time on a mobile phone, I’m in for a bad time. There are more deployment points than just the cloud.
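To put rough numbers on the “most common class” and base-rate points above, here’s a minimal sketch in Python. The function names and the example rates (0.1% base rate, 95% recall, 1% false positive rate) are made up for illustration and are not measurements from any real detector. It also assumes you hand-check samples drawn at random from the ones the model called negative.

```python
def majority_baseline_accuracy(base_rate: float) -> float:
    """Accuracy of the 'dumb rule' that always predicts the more common class."""
    return max(base_rate, 1.0 - base_rate)


def expected_reviews_per_miss(base_rate: float, recall: float, fpr: float) -> float:
    """Expected number of model-negative samples to hand-check per false negative found.

    base_rate: fraction of all samples that are truly positive (e.g. malware).
    recall:    fraction of true positives the model flags.
    fpr:       fraction of true negatives the model wrongly flags.
    """
    missed = base_rate * (1.0 - recall)             # false negatives, as a share of all samples
    passed_clean = (1.0 - base_rate) * (1.0 - fpr)  # true negatives, in the same units
    if missed == 0.0:
        return float("inf")  # nothing to find, no matter how many you check
    # Share of model-negative samples that are really misses is
    # missed / (missed + passed_clean); invert it for the expected count.
    return (missed + passed_clean) / missed


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the skew.
    base_rate, recall, fpr = 0.001, 0.95, 0.01
    print(f"majority-class accuracy: {majority_baseline_accuracy(base_rate):.3%}")
    print(f"hand-checks per miss found: {expected_reviews_per_miss(base_rate, recall, fpr):,.0f}")
```

With these made-up rates, always guessing “benign” is already 99.9% accurate, and you’d expect to hand-check roughly 20,000 model-negative samples for every miss you find. Push the base rate lower and that count grows roughly in proportion.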

So, recap: AI models are expensive to build and maintain; they’re also only ever likely to be good at extremely narrow things, and even then they’ll make mistakes. If, having taken that into account, you still want to apply “AI” to your problem, then good luck and godspeed.

(Of course, the real answer to the original question is “raising VC money to do linear regression, and optionally spooking the ‘AI existential risk’ crowd”, but anyway.)

Written on September 24, 2024