Neural Network Pruning: A Gentle Introduction

SoonChang
3 min read · Nov 26, 2021


This post will be updated from time to time.

Pruning is a technique for removing unimportant parameters (weights) from a deep neural network. It aims to achieve two main goals:

  • Reduction in storage (smaller file size)
  • Speed-up during inference (testing)

There are two main types of pruning techniques, namely structured pruning and unstructured pruning.

Structured Pruning

Structured pruning removes an entire structure (building block) of the target neural network, such as:

  • A neuron in a fully connected layer
  • A channel or filter in a convolutional layer
  • A self-attention head in a Transformer

Figure 1 shows a fully connected network before and after pruning one neuron.

Figure 1: Before and after pruning a neuron of a neural network [1]

Removing a structure from the network shrinks the corresponding parameter (weight) matrices:

Weight matrix size = number of output neurons × number of incoming connections

For example, in Figure 1 the connections to and from the pruned neuron are removed, so the weight matrix shrinks from a 3×2 matrix to a 2×2 matrix.
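As a minimal sketch of this shrinkage (using NumPy; the layer sizes are made up to match Figure 1):

```python
import numpy as np

# Hypothetical layer: 3 output neurons, each with 2 incoming connections.
W = np.random.randn(3, 2)                     # shape: (outputs, inputs)

# Prune neuron 1: delete its row of incoming weights.
W_pruned = np.delete(W, 1, axis=0)            # shape: (2, 2)

# The next layer must drop the column that received the pruned neuron's output.
W_next = np.random.randn(4, 3)                # next layer: 4 outputs, 3 inputs
W_next_pruned = np.delete(W_next, 1, axis=1)  # shape: (4, 2)

print(W_pruned.shape, W_next_pruned.shape)    # (2, 2) (4, 2)
```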

Figure 2: Before and after pruning channels of a convolutional layer [2]

Structured pruning can also be applied to Convolutional Neural Networks (CNNs). In Figure 2, some channels of the network's filters are pruned.

Pruning Criteria:

Okay, we have identified which parts of a neural network can be removed by structured pruning. But what are the criteria for deciding whether a neuron or channel is important or unimportant?

Here are some common pruning criteria for structured pruning:

  • Magnitude: the L1 or L2 norm of a neuron's or filter's weights (a smaller norm suggests lower importance)
  • Average Percentage of Zeros (APoZ) in a neuron's or channel's activations [1]
  • The learned batch normalization scaling factor of a channel [2]
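For example, ranking the filters of a convolutional layer by the L1 norm of their weights might look like this sketch (the layer sizes and keep ratio are made up):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)

# L1 norm of each filter: sum of |w| over (in_channels, height, width).
scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # shape: (32,)

# Keep the filters with the largest norms, e.g. the top 75%.
n_keep = int(0.75 * scores.numel())
keep = torch.topk(scores, n_keep).indices

print(sorted(keep.tolist()))  # indices of the filters that survive pruning
```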

Benefits of structured pruning:

  • Speed-up during inference (testing), since the pruned weight matrices are genuinely smaller and require no special sparse-matrix support

Pruning Strategies:

  1. One-shot pruning
  2. Iterative pruning

One-shot pruning: prunes a fully trained neural network to the target size in a single step.
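As an illustration, PyTorch ships a `torch.nn.utils.prune` module; a one-shot structured prune of one layer might look like this sketch (the layer and pruning ratio are made up):

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(16, 32, kernel_size=3)  # stand-in for a trained layer

# One-shot: prune 30% of the output channels (dim=0) by L2 norm (n=2).
# Note: PyTorch zeroes the channels via a mask; it does not shrink the tensor.
prune.ln_structured(conv, name="weight", amount=0.3, n=2, dim=0)

# Make the pruning permanent (drops the mask, bakes the zeros into the weights).
prune.remove(conv, "weight")
```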

Iterative pruning: gradually prunes a neural network over several rounds, retraining in between.

Figure 3: Steps for iterative pruning [1]

Figure 3 shows the steps for iterative pruning.

  1. Train a neural network up to a certain level of performance.
  2. Prune p% of the neurons/channels of the network.
  3. Fine-tune or retrain the pruned network for a few epochs.
  4. Repeat steps 2 and 3 until the target size is reached (a code sketch of this loop is shown below).
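A sketch of that loop in PyTorch (`fine_tune` is a hypothetical function that retrains the model for a few epochs):

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def iterative_prune(model, fine_tune, rounds=5, p=0.2):
    """Alternate pruning p% of each conv layer's channels with fine-tuning."""
    for _ in range(rounds):
        # Step 2: prune p% of the output channels of every conv layer (L2 norm).
        for module in model.modules():
            if isinstance(module, nn.Conv2d):
                prune.ln_structured(module, name="weight", amount=p, n=2, dim=0)
        # Step 3: recover accuracy by retraining for a few epochs.
        fine_tune(model)
    return model
```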

Unstructured Pruning

Unstructured pruning is often called magnitude pruning: it sets individual parameters (weights) with small magnitudes to zero.

Dense: lots of non-zero values
Sparse: lots of zeros

Unstructured pruning converts the original dense network into a sparse network. The parameter (weight) matrices keep the same size as in the original network; the sparse network simply has many more zeros in them.

Regularization:

For unstructured pruning, add an L1 or L2 regularization term to the loss function to penalize non-zero parameters. During training, the regularization pushes parameters toward zero.
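A minimal sketch of an L1-regularized loss in PyTorch (the regularization strength `lambda_l1` is a made-up value):

```python
import torch

lambda_l1 = 1e-4  # hypothetical regularization strength

def loss_with_l1(model, criterion, inputs, targets):
    loss = criterion(model(inputs), targets)
    # L1 penalty: sum of |w| over all parameters, which pushes
    # unimportant weights toward exactly zero during training.
    l1 = sum(p.abs().sum() for p in model.parameters())
    return loss + lambda_l1 * l1
```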

During pruning, set a threshold: parameters with magnitude below the threshold are pruned (set to zero and frozen), while parameters with magnitude above the threshold are kept.
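Thresholding itself is straightforward; here is a sketch (the layer and threshold value are made up):

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 64)  # stand-in for a trained network
threshold = 0.05            # hypothetical magnitude threshold

with torch.no_grad():
    for p in model.parameters():
        keep = p.abs() >= threshold  # True where the weight survives
        p.mul_(keep)                 # small-magnitude weights become zero
```

To keep the pruned weights frozen during further training, the same mask is typically reapplied after every optimizer step; `torch.nn.utils.prune.l1_unstructured` automates this masking.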

Like structured pruning, unstructured pruning can be applied in a one-shot or iterative fashion.

After pruning, there are fewer non-zero parameters, as shown in Figure 4. A sparse matrix with many zeros demands less storage space than the original dense matrix when stored in a compressed sparse format.

Figure 4: Weight distribution before and after parameter pruning [3]
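The storage saving can be checked directly; this sketch compares a dense matrix with its sparse (COO) representation (the matrix and pruning rule are made up):

```python
import torch

W = torch.randn(1000, 1000)
W[W.abs() < 1.0] = 0.0  # crude magnitude pruning: ~68% of entries become zero

sparsity = (W == 0).float().mean().item()
W_sparse = W.to_sparse()  # COO format stores only the non-zero entries

print(f"sparsity: {sparsity:.0%}")
print("dense entries stored:    ", W.numel())
print("non-zero entries stored: ", W_sparse.values().numel())
# Note: sparse formats also store indices, so real savings
# appear only at high sparsity or with compressed formats (e.g. CSR).
```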

Benefits of unstructured pruning:

  • Reduction in storage (smaller file size)

References:

  1. Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures. https://arxiv.org/abs/1607.03250
  2. Learning Efficient Convolutional Networks through Network Slimming. https://arxiv.org/abs/1708.06519
  3. Learning both Weights and Connections for Efficient Neural Networks. https://arxiv.org/abs/1506.02626
