The paper "meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting" from ICML 2017 by researchers at Peking University. This paper presents a technique to speed up machine learning model training by gradient sparsification.

The key idea, called minimal effort backpropagation (meProp), is to sparsify the backward pass by keeping only the top-k elements (by magnitude) of the gradient with respect to each layer's output and treating the rest as zero. In the reported settings, this ends up updating only 1-4% of the weights in small LSTM and MLP models.

An example with top-2 sparsification: only the shaded gradient elements are used to compute the gradients with respect to the weights and the input activation, cutting the computation needed for this layer's backward pass by half.
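
To make the mechanics concrete, here is a minimal NumPy sketch of a meProp-style backward pass for a single fully connected layer. The function name `meprop_backward` and the toy shapes are illustrative assumptions, not the authors' code; the point is that once the top-k output gradients are selected, the weight gradient touches only k rows of W and the input gradient sums over only those k rows.

```python
import numpy as np

def meprop_backward(grad_y, W, x, k):
    """Sketch of a meProp-style backward pass for y = W @ x (single example).
    Only the k largest-magnitude entries of grad_y are kept; the rest are
    treated as zero, so grad_W is nonzero on only k rows and grad_x sums
    over only k rows of W.
    Shapes: grad_y (m,), W (m, n), x (n,)."""
    # Indices of the k largest-magnitude output gradients.
    top_idx = np.argpartition(np.abs(grad_y), -k)[-k:]

    # Gradient w.r.t. W: nonzero only on the selected rows.
    grad_W = np.zeros_like(W)
    grad_W[top_idx, :] = np.outer(grad_y[top_idx], x)

    # Gradient w.r.t. x: only the selected rows of W contribute.
    grad_x = W[top_idx, :].T @ grad_y[top_idx]

    return grad_W, grad_x

# Toy usage: a 6-output layer with top-2 sparsification, as in the figure.
rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))
x = rng.standard_normal(4)
grad_y = rng.standard_normal(6)
grad_W, grad_x = meprop_backward(grad_y, W, x, k=2)
print((grad_W != 0).any(axis=1))  # exactly two rows of grad_W receive updates
```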

The researchers tested meProp on part-of-speech tagging with an LSTM, transition-based dependency parsing, and MNIST digit classification with MLPs. Experiments were conducted on both CPU and GPU platforms, with architectures containing up to 5 hidden layers.

Although up to a 70x speedup in backpropagation time is reported with meProp, it should be noted that the authors only evaluated small models: an LSTM with a single hidden layer and MLPs with up to 5 hidden layers. The speedup on larger, modern models has not been demonstrated. In addition, for large models such as ResNet-50, the overhead of computing the top-k gradient elements cannot be ignored and might cancel out the gains.
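
A rough way to sanity-check that concern is to time the top-k selection against the matrix-vector products it replaces. The micro-benchmark below is only a CPU/NumPy sketch with made-up layer sizes (the paper's GPU setting may behave differently), but it illustrates that the selection cost scales with the layer width while the savings depend on k.

```python
import time
import numpy as np

# Hypothetical layer sizes for illustration; not taken from the paper.
m, n, k = 4096, 4096, 64
rng = np.random.default_rng(0)
W = rng.standard_normal((m, n)).astype(np.float32)
grad_y = rng.standard_normal(m).astype(np.float32)

t0 = time.perf_counter()
for _ in range(100):
    top_idx = np.argpartition(np.abs(grad_y), -k)[-k:]
topk_time = time.perf_counter() - t0

t0 = time.perf_counter()
for _ in range(100):
    _ = W[top_idx, :].T @ grad_y[top_idx]  # sparsified input-gradient product
sparse_time = time.perf_counter() - t0

t0 = time.perf_counter()
for _ in range(100):
    _ = W.T @ grad_y  # dense product that meProp avoids
dense_time = time.perf_counter() - t0

print(f"top-k: {topk_time:.4f}s  sparse matvec: {sparse_time:.4f}s  dense matvec: {dense_time:.4f}s")
```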

Generated with ChatGPT and edited by a human.