This post will be updated from time to time.

**Pruning** is a technique for removing unimportant parameters (weights) from a deep neural network. It aims to achieve several goals:

- Reduction in storage (smaller file size)
- Speed-up during inference (testing)

There are two main types of pruning: Structured Pruning and Unstructured Pruning.

## Structured Pruning

Structured pruning removes a structure (building block) of the target neural network such as:

- Neuron for a fully connected layer
- Channel or filter for a convolutional layer
- Self-attention head for a Transformer

Figure 1 shows a fully connected network before and after pruning one neuron.

Removing a structure from a network shrinks the corresponding parameter (weight) matrices.

*Weight matrix size = Number of output connections × Number of incoming connections*

For example, in Figure 1 the connections to and from the pruned neuron are removed, and the weight matrix shrinks from a 3×2 matrix to a 2×2 matrix.
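This shrinkage can be sketched in a few lines of numpy (toy values; the next-layer matrix is a hypothetical addition to show both sides of the removal):

```python
import numpy as np

# Toy weights for Figure 1's layer: 3 output neurons x 2 inputs
# (weight matrix size = outgoing x incoming connections).
W = np.array([[ 0.5, -0.2],
              [ 0.1,  0.9],
              [-0.7,  0.3]])

# Pruning neuron 1 removes its row of weights...
W_pruned = np.delete(W, 1, axis=0)            # 3x2 -> 2x2

# ...and the matching column of the next layer's weight matrix.
W_next = np.ones((4, 3))                      # hypothetical next layer: 4 outputs x 3 inputs
W_next_pruned = np.delete(W_next, 1, axis=1)  # 4x3 -> 4x2
```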

Structured pruning can also be applied to Convolutional Neural Networks (CNNs). In Figure 2, some filters (channels) of a CNN are pruned.
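For CNNs, pruning a filter removes one output channel of its layer and the matching input channel of the layer after it. A toy numpy sketch (the shapes and pruned indices are illustrative, not from the post):

```python
import numpy as np

# Conv weights laid out as (out_channels, in_channels, kH, kW),
# the common convention in frameworks such as PyTorch.
rng = np.random.default_rng(0)
conv_w = rng.standard_normal((8, 3, 3, 3))
next_w = rng.standard_normal((16, 8, 3, 3))  # next layer consumes the 8 channels

# Prune filters 2 and 5: drop those output channels...
keep = [i for i in range(8) if i not in (2, 5)]
conv_w_pruned = conv_w[keep]       # (6, 3, 3, 3)
# ...and the matching input channels of the next layer.
next_w_pruned = next_w[:, keep]    # (16, 6, 3, 3)
```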

Pruning Criteria:

Okay, we have identified which parts of a neural network can be removed by structured pruning. What are the criteria for deciding whether a neuron or channel is important or unimportant?

Here are some pruning criteria for structured pruning:

- Saliency of parameters (approximated using the second derivative)
- Average Percentage of Zeros (APoZ)
- Batch normalization scaling factors
- Magnitude (norm) of each filter or channel
- Geometric median of filters
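As a concrete sketch of the magnitude criterion, each filter can be scored by the L1 norm of its weights, and the lowest-scoring filters pruned (toy numpy example; the 25% pruning ratio is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
conv_w = rng.standard_normal((8, 3, 3, 3))  # (out_channels, in_channels, kH, kW)

# L1 magnitude of each filter: sum of |w| over its (in, kH, kW) dimensions.
scores = np.abs(conv_w).sum(axis=(1, 2, 3))

# Prune the 25% of filters with the smallest L1 norm.
n_prune = len(scores) // 4
prune_idx = np.argsort(scores)[:n_prune]
keep_idx = np.setdiff1d(np.arange(len(scores)), prune_idx)
conv_w_pruned = conv_w[keep_idx]            # (6, 3, 3, 3)
```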

Benefits of structured pruning:

- Speed-up during inference (testing)

Pruning Strategies:

- One-shot pruning
- Iterative pruning

One-shot pruning: prunes a fully trained neural network in a single pass.

Iterative pruning: prunes a neural network gradually, alternating pruning with retraining.

Figure 3 shows the steps for iterative pruning.

- Train a neural network up to a certain level of performance
- Prune *p*% of the neurons / channels of the network
- Fine-tune or retrain the pruned network for a few epochs
- Repeat the prune and retrain/fine-tune steps
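The loop above can be sketched as follows, using the L1-magnitude criterion on neurons (rows) of a toy weight matrix; `fine_tune` is a placeholder where real retraining would happen:

```python
import numpy as np

def prune_neurons(w, p):
    """Drop the fraction p of neurons (rows) with the smallest L1 norm."""
    scores = np.abs(w).sum(axis=1)
    n_keep = max(1, int(round(w.shape[0] * (1 - p))))
    keep = np.sort(np.argsort(scores)[-n_keep:])  # indices of strongest neurons
    return w[keep]

def fine_tune(w):
    return w  # placeholder: a few epochs of retraining would happen here

rng = np.random.default_rng(0)
w = rng.standard_normal((100, 64))  # 100 neurons, 64 incoming connections
for _ in range(3):                  # repeat: prune -> fine-tune
    w = prune_neurons(w, 0.2)       # prune 20% of the remaining neurons
    w = fine_tune(w)
```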

## Unstructured Pruning

Unstructured pruning is also called **magnitude pruning**. Unstructured pruning converts some of the parameters or weights with *smaller magnitude* into zeros.

Dense: lots of non-zero values

Sparse: lots of zeros

Unstructured pruning converts an originally dense network into a sparse network. The parameter (weight) matrix of the sparse network has the same size as that of the original network, but contains more zeros.

Regularization:

For unstructured pruning, add an **L1** or **L2 regularization** term to the loss function, which penalizes non-zero parameters. During training, the regularization pushes the parameters toward zero.

During pruning, set a threshold: prune parameters with magnitude smaller than the threshold and keep parameters with magnitude larger than the threshold. The pruned parameters are set to zero and frozen.
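A minimal numpy sketch of both steps (the regularization strength `lam` and threshold are arbitrary toy values):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(10)

# L1 regularization term that would be added to the training loss:
lam = 1e-4
l1_penalty = lam * np.abs(w).sum()

# Pruning step: zero out weights below the threshold and keep a mask,
# so the pruned positions stay frozen at zero during later fine-tuning.
threshold = 0.5
mask = np.abs(w) >= threshold
w_pruned = np.where(mask, w, 0.0)
# During retraining, gradients would be multiplied by `mask`
# so that pruned weights remain zero.
```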

Unstructured pruning can be performed via **one-shot pruning** or **iterative pruning**.

After pruning, there are fewer non-zero parameters, as shown in Figure 4. A sparse matrix with many zeros requires less storage space than the original dense matrix.
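The storage saving comes from storing only the surviving values plus their positions rather than every entry. A simple COO-style sketch in numpy (the pruning threshold is an arbitrary toy value):

```python
import numpy as np

rng = np.random.default_rng(0)
dense = rng.standard_normal(10_000)
dense[np.abs(dense) < 1.5] = 0.0   # magnitude-pruned: mostly zeros now

# COO-style sparse storage: keep only the non-zero values and their indices.
idx = np.flatnonzero(dense).astype(np.int32)
vals = dense[idx]
sparse_bytes = idx.nbytes + vals.nbytes   # much smaller than dense.nbytes
```

In practice, formats such as CSR (e.g. in `scipy.sparse`) compress the index side further.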

Benefits of unstructured pruning:

- Reduction in storage (smaller file size)

References:

- Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures. https://arxiv.org/abs/1607.03250
- Learning Efficient Convolutional Networks through Network Slimming. https://arxiv.org/abs/1708.06519
- Learning both Weights and Connections for Efficient Neural Networks. https://arxiv.org/abs/1506.02626