Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures
In this post, I am going to share the paper Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures, which is about pruning neural networks to reduce computation and memory usage.
The authors observed that a trained neural network contains a lot of zero activations: regardless of the inputs fed to the network, a significant fraction of neurons output zero. Based on this observation, they formulated a pruning criterion called Average Percentage of Zeros (APoZ). Neurons that are almost always zero are redundancy that can be removed from the network.
Average Percentage of Zeros (APoZ)
APoZ for a neuron (or channel) c of layer i of a neural network measures the percentage of zero activations that the neuron produces after the ReLU mapping.
- M: dimension of the output feature map of O_c^(i); N: total number of validation examples
- f(·) = 1 if the condition is true and f(·) = 0 otherwise
- O_c^(i): the output of neuron or channel c of layer i of the neural network
If the output (or the output feature map) of a neuron/channel is zero, f evaluates to 1, and to 0 otherwise. A neuron that always outputs zero (or a channel whose output feature map is always zero) therefore has an APoZ of 100%, while a neuron that always outputs non-zero values has an APoZ of 0%.
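For reference, this is the APoZ definition from the paper, written out with the symbols above:

$$
\mathrm{APoZ}_c^{(i)} = \mathrm{APoZ}\left(O_c^{(i)}\right) = \frac{\sum_{k=1}^{N}\sum_{j=1}^{M} f\left(O_{c,j}^{(i)}(k) = 0\right)}{N \times M}
$$

where $O_{c,j}^{(i)}(k)$ is the $j$-th activation of neuron/channel $c$ in layer $i$ for the $k$-th validation example.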
The following figure shows the APoZ distribution for a convolutional neural network.
A higher APoZ means higher redundancy. This redundancy can be removed without harming the overall accuracy.
The figure above illustrates the pruning procedure for Network Trimming:
- A neural network is first trained using the conventional training process.
- APoZ of each neuron is computed by running the trained network on a large number of validation examples; the larger the validation set, the more accurate the estimated APoZ. Neurons with high APoZ are then pruned. How do we prune a neuron? Remove the connections to and from the neuron, as shown in the following figure. This shrinks the weight (parameter) matrices, which saves memory and computation (see the sketch after this list).
- Pruning these connections results in some drop in performance/accuracy. The weights of the remaining connections are kept and used to initialize the network for retraining.
- The pruned network is retrained to recover the lost accuracy.
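Below is a minimal PyTorch sketch of the APoZ measurement step. The two-layer model, layer names, and the dummy validation loop are placeholders of my own, not the paper's setup; the idea is simply to attach a forward hook to each ReLU and accumulate per-channel zero counts over the validation data.

```python
import torch
import torch.nn as nn

# Toy stand-in for a trained network; the architecture here is a placeholder,
# not one of the networks used in the paper.
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
)

zero_counts = {}  # per ReLU layer: running count of zero activations per channel
totals = {}       # per ReLU layer: running count of all activations per channel (N x M)

def make_hook(name):
    def hook(module, inputs, output):
        # output has shape (batch, channels, H, W) after ReLU
        zeros = (output == 0).float().sum(dim=(0, 2, 3))   # zeros per channel in this batch
        per_channel = output.numel() / output.size(1)      # activations per channel in this batch
        zero_counts[name] = zero_counts.get(name, 0) + zeros
        totals[name] = totals.get(name, 0) + per_channel
    return hook

for name, module in model.named_modules():
    if isinstance(module, nn.ReLU):
        module.register_forward_hook(make_hook(name))

# Run the trained model over the validation set (random tensors here stand in
# for a real validation DataLoader).
model.eval()
with torch.no_grad():
    for _ in range(10):
        model(torch.randn(8, 3, 32, 32))

# APoZ per channel, in [0, 1]; multiply by 100 for a percentage.
apoz = {name: zero_counts[name] / totals[name] for name in zero_counts}
```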
The authors found that trimming a few layers with high APoZ first and then gradually trimming their neighboring layers results in a final pruned network with performance comparable to the original network.
How do we decide the APoZ threshold for pruning? The authors found that pruning the neurons whose APoZ is more than one standard deviation above the mean APoZ of their layer works well.
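Continuing the APoZ sketch above, here is one way the one-standard-deviation rule and the connection removal could look. The layer indices and the in-place weight slicing are illustrative assumptions, not the paper's actual implementation.

```python
# Pick the channels of the first ReLU (named '1' in the sketch above) to keep:
# prune anything more than one standard deviation above the layer's mean APoZ.
layer_apoz = apoz['1']                                    # per-channel APoZ in [0, 1]
threshold = layer_apoz.mean() + layer_apoz.std()
keep = (layer_apoz <= threshold).nonzero(as_tuple=True)[0]

# Removing a channel shrinks two weight tensors: the conv that produces it and
# the conv that consumes it (assumed to be model[0] and model[2] here).
conv_a, conv_b = model[0], model[2]
conv_a.weight.data = conv_a.weight.data[keep].clone()     # drop pruned output channels
conv_a.bias.data = conv_a.bias.data[keep].clone()
conv_b.weight.data = conv_b.weight.data[:, keep].clone()  # drop matching input channels
conv_a.out_channels = len(keep)
conv_b.in_channels = len(keep)

# The shrunk network is then retrained, initializing from these kept weights.
```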
Experimental Results
The following table lists the results of iteratively pruning LeNet (a CNN).
Initial accuracy: The accuracy after pruning (without retraining)
Final accuracy: The accuracy of pruned network after retraining
As more neurons are pruned (moving down the table), the compression rate increases, since more redundancy is removed at each iteration. The initial accuracy drops as pruning proceeds, but the pruned network at each stage can be retrained to a final accuracy comparable to that of the original model.
Similar behavior is observed with VGG-16 on a different dataset, ImageNet.
Conclusion:
Network Trimming can achieve:
- 2~3× fewer parameters
- Reduction in computational cost
- While maintaining comparable accuracy / performance
References:
- Hu, H., Peng, R., Tai, Y.-W., and Tang, C.-K. Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures. arXiv:1607.03250, 2016.