Paper Reading Notes #03: meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting

The paper "meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting" was published at ICML 2017 by researchers at Peking University. It presents a technique to speed up model training through gradient sparsification.

The key idea, called minimal effort backpropagation (meProp), is to sparsify the backpropagation gradients by keeping only the top-k elements of the gradient vector with respect to each layer's output. As a result, only about 1-4% of the weights are updated in each backward pass for the small LSTM and MLP models studied.

An example with top-2 sparsification: only the shaded gradient elements are used to compute the gradients with respect to the weights and the input activation, roughly halving the backward computation.
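The mechanism can be sketched as a custom backward pass. Below is a minimal, hypothetical PyTorch sketch (the class name TopKLinear and all sizes are my own, not the authors' released code) that keeps only the k largest-magnitude entries of the output gradient before propagating it. Note that this mask-based version only illustrates the math; the actual speedup in the paper comes from carrying out the subsequent matrix multiplications with genuinely sparse vectors.

```python
import torch


class TopKLinear(torch.autograd.Function):
    """Linear layer whose backward pass keeps only the top-k output-gradient
    elements (by magnitude), in the spirit of meProp. Hypothetical sketch."""

    @staticmethod
    def forward(ctx, x, weight, k):
        ctx.save_for_backward(x, weight)
        ctx.k = k
        return x @ weight.t()

    @staticmethod
    def backward(ctx, grad_output):
        x, weight = ctx.saved_tensors

        # Keep the k largest-magnitude entries of the output gradient
        # per example; zero out everything else.
        _, topk_idx = grad_output.abs().topk(ctx.k, dim=1)
        mask = torch.zeros_like(grad_output)
        mask.scatter_(1, topk_idx, 1.0)
        sparse_grad = grad_output * mask

        # Gradients w.r.t. input and weight use the sparsified output gradient.
        grad_x = sparse_grad @ weight
        grad_w = sparse_grad.t() @ x
        return grad_x, grad_w, None


# Usage: y = TopKLinear.apply(x, W, 2) keeps the top-2 output-gradient entries,
# matching the top-2 example in the figure above.
```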

The researchers tested meProp on part-of-speech tagging with an LSTM, transition-based dependency parsing, and MNIST image classification with MLPs. Experiments were conducted on both CPU and GPU, with architectures containing up to 5 hidden layers.

Although up to a 70x speedup in backpropagation time is reported with meProp, it should be noted that the authors only evaluated small LSTM models with a single hidden layer and MLP models with up to 5 hidden layers. The speedup of meProp on larger, modern models has not been demonstrated. In addition, for large models such as ResNet-50, the overhead of computing the top-k gradient elements is no longer negligible and might hurt overall performance.
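To get a feel for that overhead, here is a rough, hypothetical micro-benchmark (sizes are illustrative, not from the paper) that times a dense gradient matmul against the extra top-k selection meProp would add on top of it:

```python
import time
import torch

batch, hidden, k = 128, 4096, 64
grad_out = torch.randn(batch, hidden)
weight = torch.randn(hidden, hidden)

start = time.perf_counter()
_ = grad_out @ weight              # dense gradient propagation
dense_time = time.perf_counter() - start

start = time.perf_counter()
_ = grad_out.abs().topk(k, dim=1)  # meProp's extra top-k selection
topk_time = time.perf_counter() - start

print(f"dense matmul: {dense_time:.4f}s, top-k selection: {topk_time:.4f}s")
```

Whether the saved multiplications outweigh the selection cost depends heavily on layer width, k, and the hardware, so results from a toy benchmark like this should not be read as evidence either way.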

Generated with ChatGPT and edited by a human.
