An argument in favor of strong scaling for deep neural networks with small datasets
The common advice to scale the batch size along with the number of workers when training your deep learning model doesn't hold when your dataset is small. Worse, your model may well diverge under such updates. Read on for a potentially better, guaranteed-to-converge alternative.
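To make the distinction concrete, here is a minimal sketch contrasting the two data-parallel regimes; the function names and numbers are illustrative, not from the post. Under weak scaling (the common advice), each worker keeps its batch size, so the effective batch grows with the worker count; under strong scaling, the effective batch stays fixed and each worker processes a smaller slice.

```python
# Illustrative sketch of the two data-parallel scaling regimes.
# All names and numbers here are hypothetical.

def weak_scaling(per_worker_batch: int, num_workers: int) -> tuple[int, int]:
    """Weak scaling: each worker keeps its batch size, so the
    effective (global) batch size grows with the worker count."""
    effective = per_worker_batch * num_workers
    return per_worker_batch, effective

def strong_scaling(global_batch: int, num_workers: int) -> tuple[int, int]:
    """Strong scaling: the effective batch size stays fixed, so each
    worker processes a smaller slice as workers are added."""
    per_worker = global_batch // num_workers  # assumes divisibility for simplicity
    return per_worker, global_batch

if __name__ == "__main__":
    for n in (1, 4, 16):
        pw_w, eff_w = weak_scaling(per_worker_batch=64, num_workers=n)
        pw_s, eff_s = strong_scaling(global_batch=64, num_workers=n)
        print(f"{n:>2} workers | weak: {pw_w}/worker, {eff_w} effective"
              f" | strong: {pw_s}/worker, {eff_s} effective")
```

With a small dataset, the weak-scaling effective batch quickly becomes a large fraction of the data, which is the regime the argument below is concerned with.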