Troubleshooting Deep Learning models
Attention Conservation Notice: An old and barely-edited Twitter thread (so please forgive choppy writing and any formatting glitches) that I rescued from the memory hole. It has my own tips and advice for troubleshooting deep learning models when they don’t want to train. Now that we’ve moved away from hand-written training loops and from people making up new and weird architectures specific to their particular problem, and mostly just throw LLMs/transformers at everything, these are probably less relevant.