Neural Network Pruning: A Gentle Introduction

SoonChang
3 min read · Nov 26, 2021


This post will be updated from time to time.

Pruning is a technique for removing unimportant parameters (weights) from a deep neural network. It aims to achieve two main goals:

  • Reduction in storage (smaller file size)
  • Speed-up during inference (testing)

There are two main types of pruning techniques, namely structured pruning and unstructured pruning.

Structured Pruning

Structured pruning removes an entire structure (building block) of the target neural network, such as:

  • A neuron in a fully connected layer
  • A channel or filter in a convolutional layer
  • A self-attention head in a Transformer

Figure 1 shows a fully connected network before and after pruning one neuron.

Figure 1: Before and after pruning a neuron of a neural network [1]

Removing a structure from the network shrinks the corresponding parameter (weight) matrices:

Weight matrix size = number of output neurons × number of incoming connections

For example, in Figure 1 the connections to and from the pruned neuron are removed, so the weight matrix shrinks from a 3×2 matrix to a 2×2 matrix.
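As a minimal sketch of this shrinkage (using NumPy; the layer sizes are made up to match Figure 1):

```python
import numpy as np

# Hypothetical layer: 3 output neurons, each with 2 incoming connections.
W = np.random.randn(3, 2)                     # shape: (outputs, inputs)

# Prune neuron 1: delete its row of incoming weights.
W_pruned = np.delete(W, 1, axis=0)            # shape: (2, 2)

# The next layer must drop the column that received the pruned neuron's output.
W_next = np.random.randn(4, 3)                # next layer: 4 outputs, 3 inputs
W_next_pruned = np.delete(W_next, 1, axis=1)  # shape: (4, 2)

print(W_pruned.shape, W_next_pruned.shape)    # (2, 2) (4, 2)
```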

Figure 2: Before and after pruning channels of a convolutional layer [2]

Structured pruning can also be applied to Convolutional Neural Networks (CNNs). In Figure 2, some channels of the network's filters are pruned.

Pruning Criteria:

Okay, we have identified which parts of a neural network can be removed by structured pruning. But what are the criteria for deciding whether a neuron or channel is important or unimportant?

Here are some common pruning criteria for structured pruning:

  • Magnitude: the L1 or L2 norm of a neuron's or filter's weights (a smaller norm suggests lower importance)
  • Average Percentage of Zeros (APoZ) in a neuron's or channel's activations [1]
  • The learned batch normalization scaling factor of a channel [2]
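For example, ranking the filters of a convolutional layer by the L1 norm of their weights might look like this sketch (the layer sizes and keep ratio are made up):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)

# L1 norm of each filter: sum of |w| over (in_channels, height, width).
scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # shape: (32,)

# Keep the filters with the largest norms, e.g. the top 75%.
n_keep = int(0.75 * scores.numel())
keep = torch.topk(scores, n_keep).indices

print(sorted(keep.tolist()))  # indices of the filters that survive pruning
```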

Benefits of structured pruning:

  • Speed-up during inference (testing), since the pruned weight matrices are genuinely smaller and require no special sparse-matrix support

Pruning Strategies:

  1. One-shot pruning
  2. Iterative pruning

One-shot pruning: prunes a fully trained neural network to the target size in a single step.
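As an illustration, PyTorch ships a `torch.nn.utils.prune` module; a one-shot structured prune of one layer might look like this sketch (the layer and pruning ratio are made up):

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(16, 32, kernel_size=3)  # stand-in for a trained layer

# One-shot: prune 30% of the output channels (dim=0) by L2 norm (n=2).
# Note: PyTorch zeroes the channels via a mask; it does not shrink the tensor.
prune.ln_structured(conv, name="weight", amount=0.3, n=2, dim=0)

# Make the pruning permanent (drops the mask, bakes the zeros into the weights).
prune.remove(conv, "weight")
```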

Iterative pruning: gradually prunes a neural network over several rounds, retraining in between.

Figure 3: Steps for iterative pruning [1]

Figure 3 shows the steps for iterative pruning.

  1. Train a neural network up to a certain level of performance.
  2. Prune p% of the neurons/channels of the network.
  3. Fine-tune or retrain the pruned network for a few epochs.
  4. Repeat steps 2 and 3 until the target size is reached (a code sketch of this loop is shown below).
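A sketch of that loop in PyTorch (`fine_tune` is a hypothetical function that retrains the model for a few epochs):

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def iterative_prune(model, fine_tune, rounds=5, p=0.2):
    """Alternate pruning p% of each conv layer's channels with fine-tuning."""
    for _ in range(rounds):
        # Step 2: prune p% of the output channels of every conv layer (L2 norm).
        for module in model.modules():
            if isinstance(module, nn.Conv2d):
                prune.ln_structured(module, name="weight", amount=p, n=2, dim=0)
        # Step 3: recover accuracy by retraining for a few epochs.
        fine_tune(model)
    return model
```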

Unstructured Pruning

Unstructured pruning is often called magnitude pruning: it sets individual parameters (weights) with small magnitudes to zero.

Dense: lots of non-zero values
Sparse: lots of zeros

Unstructured pruning converts the original dense network into a sparse network. The parameter (weight) matrices keep the same size as in the original network; the sparse network simply has many more zeros in them.

Regularization:

For unstructured pruning, add an L1 or L2 regularization term to the loss function to penalize non-zero parameters. During training, the regularization pushes parameters toward zero.
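A minimal sketch of an L1-regularized loss in PyTorch (the regularization strength `lambda_l1` is a made-up value):

```python
import torch

lambda_l1 = 1e-4  # hypothetical regularization strength

def loss_with_l1(model, criterion, inputs, targets):
    loss = criterion(model(inputs), targets)
    # L1 penalty: sum of |w| over all parameters, which pushes
    # unimportant weights toward exactly zero during training.
    l1 = sum(p.abs().sum() for p in model.parameters())
    return loss + lambda_l1 * l1
```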

During pruning, set a threshold: parameters with magnitude below the threshold are pruned (set to zero and frozen), while parameters with magnitude above the threshold are kept.
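Thresholding itself is straightforward; here is a sketch (the layer and threshold value are made up):

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 64)  # stand-in for a trained network
threshold = 0.05            # hypothetical magnitude threshold

with torch.no_grad():
    for p in model.parameters():
        keep = p.abs() >= threshold  # True where the weight survives
        p.mul_(keep)                 # small-magnitude weights become zero
```

To keep the pruned weights frozen during further training, the same mask is typically reapplied after every optimizer step; `torch.nn.utils.prune.l1_unstructured` automates this masking.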

Like structured pruning, unstructured pruning can be applied in a one-shot or iterative fashion.

After pruning, there are fewer non-zero parameters, as shown in Figure 4. A sparse matrix with many zeros demands less storage space than the original dense matrix when stored in a compressed sparse format.

Figure 4: Weight distribution before and after parameter pruning [3]
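The storage saving can be checked directly; this sketch compares a dense matrix with its sparse (COO) representation (the matrix and pruning rule are made up):

```python
import torch

W = torch.randn(1000, 1000)
W[W.abs() < 1.0] = 0.0  # crude magnitude pruning: ~68% of entries become zero

sparsity = (W == 0).float().mean().item()
W_sparse = W.to_sparse()  # COO format stores only the non-zero entries

print(f"sparsity: {sparsity:.0%}")
print("dense entries stored:    ", W.numel())
print("non-zero entries stored: ", W_sparse.values().numel())
# Note: sparse formats also store indices, so real savings
# appear only at high sparsity or with compressed formats (e.g. CSR).
```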

Benefits of unstructured pruning:

  • Reduction in storage (smaller file size)

References:

  1. Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures. https://arxiv.org/abs/1607.03250
  2. Learning Efficient Convolutional Networks through Network Slimming. https://arxiv.org/abs/1708.06519
  3. Learning both Weights and Connections for Efficient Neural Networks. https://arxiv.org/abs/1506.02626
