This is really just a self-checklist because I think I have bad habits when it comes to training neural networks.
Check everything else
A neural network is a black box, and a lot of different things could be the reason it isn’t learning. Because of this, you should be absolutely sure that everything else is working correctly. Plot all your inputs right before they go into the network. Make sure that pixel values that were [0, 255] have been scaled to [0, 1]. Plot images again after preprocessing. Ensure batches look the way they’re meant to.
More and more, it feels like the most important part of training a neural network is making sure the rest of your pipeline is as airtight as possible.
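To make that concrete, here’s a minimal sketch of the kind of input check I mean. It builds synthetic uint8 data purely so the snippet runs on its own; in practice you’d point it at your real DataLoader.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import matplotlib.pyplot as plt

# Synthetic stand-in data so this runs on its own; swap in your real DataLoader.
fake_images = torch.randint(0, 256, (64, 1, 28, 28), dtype=torch.uint8)
fake_labels = torch.randint(0, 10, (64,))
train_loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=8)

images, labels = next(iter(train_loader))

# Shapes, dtypes, and value ranges: unscaled [0, 255] data shows up right here.
print(images.shape, images.dtype)
print("min:", images.min().item(), "max:", images.max().item())

# Eyeball a few samples exactly as the network will see them.
fig, axes = plt.subplots(1, 4, figsize=(10, 3))
for ax, img, label in zip(axes, images, labels):
    ax.imshow(img.squeeze(), cmap="gray")
    ax.set_title(int(label))
    ax.axis("off")
plt.show()
```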
Overfit one data piece
Sometimes the problem is that the network doesn’t have the ability to learn what you want it to. To confirm it’s even capable of learning, check that it can overfit a single piece of data. If it can, scale up to half the data, and then the whole thing.
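Here’s a sketch of what that test can look like, with a stand-in model and one fixed random batch; the architecture and hyperparameters are just placeholders.

```python
import torch
import torch.nn as nn

# Stand-in model and data for the single-batch overfit test.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 1, 28, 28)        # one fixed batch
y = torch.randint(0, 10, (8,))

for step in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

# If the network can't drive the loss to ~0 on 8 examples, it won't learn
# anything useful on the full dataset either.
print("final loss:", loss.item())
```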
Double check all types
Check your shapes
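One way to keep both of these honest is a small assertion helper run on every batch. This is only a sketch; the expected dtypes and shapes here are examples, so adjust them to your pipeline.

```python
import torch

def check_batch(images: torch.Tensor, labels: torch.Tensor) -> None:
    assert images.dtype == torch.float32, images.dtype    # not uint8 by accident
    assert images.dim() == 4, images.shape                # expecting (N, C, H, W)
    assert labels.dtype == torch.long, labels.dtype       # CrossEntropyLoss wants int64 targets
    assert images.shape[0] == labels.shape[0], (images.shape, labels.shape)

check_batch(torch.randn(8, 1, 28, 28), torch.randint(0, 10, (8,)))
```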
Check training
- Did you set it to `.eval()` before testing?
- Did you `.zero_grad()` before `.backward()`?
- Did you pass softmax’d values into something that expects raw logits?
- Did you set `bias=False` for layers when using BatchNorm?
- Did you credit Karpathy for the above list? (Yes, thanks!!)
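A minimal train/eval skeleton that puts those items in their usual places; the model and data are stand-ins, and the point is only where `eval()`, `zero_grad()`, raw logits, and `bias=False` show up.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(32, 64, bias=False),  # bias=False: the BatchNorm right after has its own shift
    nn.BatchNorm1d(64),
    nn.ReLU(),
    nn.Linear(64, 10),              # raw logits out, no softmax
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()     # expects raw logits, applies log-softmax itself

x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))

model.train()
for step in range(10):
    optimizer.zero_grad()           # zero_grad before backward, every step
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

model.eval()                        # eval() before testing: fixes BatchNorm/Dropout behaviour
with torch.no_grad():
    preds = model(x).argmax(dim=1)
    print("train-set accuracy:", (preds == y).float().mean().item())
```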