Troubleshooting Deep Learning models

Attention Conservation Notice An old and barely-edited Twitter thread (so please forgive the choppy writing and any formatting glitches) that I rescued from the memory hole. It has my own tips and advice for troubleshooting deep learning models when they don’t want to train. Now that we’ve moved away from hand-written training loops and people making up new and weird architectures specific to their particular problems, and instead just throw LLMs/transformers at everything, these are probably less relevant.

Read More

What is deep learning good for

Attention Conservation Notice This is just a barely edited copy of an old twitter thread that collects a lot of my thoughts on deciding if, when, and how to use deep learning (which is why it reads so choppy). I think it’s advice that holds up, but given the rapid growth in the number of people who have applied deep learning to practical problems, I suspect a lot of this has graduated to “common knowledge” since I wrote it.

Read More

Paper readthrough: Rethinking the Value of Network Pruning

Attention Conservation Notice Notes on a paper that asks (of a particular kind of model pruning) “should we just start where we plan to end up, and train the pruned architecture from scratch?” The answer turns out to be that – once you adjust for the total number of FLOPs – just starting with the smaller model and random weights generally works fine. Plus other interesting observations about the structure-vs-initialization question.

Read More

Facial recognition roundup

Attention Conservation Notice Links and short commentary on recent news about facial recognition, mostly so I can track it and my reactions to it later; if you’re on Twitter you’ve probably seen them.

Read More

Paper readthrough: Towards Neural Decompilation

Attention Conservation Notice My notes on a paper that builds on previous efforts to apply neural machine translation (NMT) to decompiling binaries back into human-readable source code.
They focus on compiler-specific translation pairs, which allows them to use the compiler they target to a) act as an oracle, b) generate more source-translation pairs as needed, and c) generate error-specific training samples when their decompiler makes a mistake.
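The “compiler as oracle” trick is simple enough to sketch: since they target a specific compiler, any source snippet can be compiled to yield a ground-truth (assembly, source) training pair, and any candidate decompilation can be checked by recompiling it. Here’s a rough Python illustration (not the paper’s code; helper names are mine, and the naive textual-match check stands in for whatever equivalence test the authors actually use):

```python
# Minimal sketch of the "compiler as oracle" idea: compile small source
# snippets to get (assembly, source) pairs, and accept a candidate
# decompilation only if recompiling it reproduces the original assembly.
import os
import subprocess
import tempfile

def compile_to_asm(source: str, compiler: str = "gcc") -> str:
    """Compile a C snippet and return the generated assembly text."""
    with tempfile.TemporaryDirectory() as tmp:
        c_path = os.path.join(tmp, "snippet.c")
        asm_path = os.path.join(tmp, "snippet.s")
        with open(c_path, "w") as f:
            f.write(source)
        subprocess.run([compiler, "-S", "-O0", c_path, "-o", asm_path], check=True)
        with open(asm_path) as f:
            return f.read()

def make_training_pair(source: str) -> tuple[str, str]:
    # (input, target) pair for an NMT-style decompiler: assembly -> source.
    return compile_to_asm(source), source

def oracle_check(candidate_source: str, original_asm: str) -> bool:
    # The compiler acts as an oracle: recompile the candidate and compare.
    # A textual match is a deliberately crude stand-in for real equivalence.
    try:
        return compile_to_asm(candidate_source) == original_asm
    except subprocess.CalledProcessError:
        return False
```

The same loop also covers point (c): when `oracle_check` fails, the (original assembly, wrong output) pair tells you exactly which constructs the model mistranslates, so you can generate more training samples of that specific shape.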

Read More