This paper addresses the problem
of image compression using sparse representations. We propose a
variant of auto-encoder called Stochastic Winner-Take-All Auto-Encoder
(SWTA AE). "Winner-Take-All" means that image patches compete with one another when
their sparse representation is computed, and "Stochastic" indicates that a stochastic
hyperparameter drives this competition during training. Unlike standard auto-encoders,
SWTA AE performs variable-rate image compression for images of any size after a single
training, which is essential for practical compression.
For comparison, we also propose a variant of Orthogonal Matching Pursuit
(OMP) called Winner-Take-All Orthogonal Matching Pursuit (WTA OMP).
In terms of rate-distortion trade-off, SWTA AE outperforms standard
auto-encoders but remains below WTA OMP. Moreover, SWTA AE is competitive with JPEG.
Stochastic Winner-Take-All Auto-Encoder (SWTA AE) is a type of auto-encoder. An auto-encoder is a neural
network that takes an input and provides a reconstruction of this input. SWTA AE is a fully-convolutional
auto-encoder, so it can compress images of various sizes. The central bottleneck of SWTA AE, denoted Z in
the figure below, is further processed to produce the bitstream. Z contains 64 sparse feature maps,
displayed in orange in the figure below, and one non-sparse feature map, displayed in red. The 64 sparse
feature maps are computed via a Winner-Take-All (WTA) competition, and a WTA parameter controls the coding
cost of Z. During training, the WTA parameter is drawn stochastically so that, after training, SWTA AE can
compress at any rate.
SWTA AE architecture
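To make the WTA competition concrete, here is a minimal sketch of what a Winner-Take-All sparsification step could look like: a WTA parameter sets the fraction of coefficients that survive in the sparse feature maps, and during training this parameter is drawn at random. The function name, tensor shapes, and the uniform draw are illustrative assumptions, not the exact layer used in SWTA AE.

```python
import numpy as np

def wta_sparsify(z, alpha):
    """Keep the fraction `alpha` of largest-magnitude coefficients in the
    sparse feature maps `z` (channels x height x width); zero out the rest.
    Illustrative sketch only, not the authors' exact WTA layer."""
    flat = np.abs(z).ravel()
    k = max(1, int(round(alpha * flat.size)))                # number of "winners"
    threshold = np.partition(flat, flat.size - k)[flat.size - k]
    return np.where(np.abs(z) >= threshold, z, 0.0)

# During training, the WTA parameter could be drawn at random for each
# mini-batch (assumption: uniform draw; the paper's exact scheme may differ).
rng = np.random.default_rng(0)
z = rng.standard_normal((64, 40, 40))                         # 64 sparse feature maps
alpha = rng.uniform(0.01, 0.1)                                # stochastic WTA parameter
z_sparse = wta_sparsify(z, alpha)
print(np.count_nonzero(z_sparse) / z.size)                    # roughly equal to alpha
```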
Winner-Take-All Orthogonal Matching Pursuit (WTA OMP)
The WTA competition is adapted to Orthogonal Matching Pursuit (OMP), giving rise to
Winner-Take-All Orthogonal Matching Pursuit (WTA OMP).
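For reference, a minimal numpy sketch of plain OMP on a single patch is given below. WTA OMP, as described above, additionally lets the patches of an image compete with one another for the non-zero coefficients; the exact competition rule is part of the paper and is not reproduced here. The function name, the dictionary `D`, and the sparsity level are illustrative assumptions.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Plain Orthogonal Matching Pursuit: greedily select atoms of the
    dictionary D (unit-norm columns) to approximate the patch y."""
    residual = y.copy()
    support = []
    coeffs = np.zeros(0)
    for _ in range(n_nonzero):
        # Greedy step: pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        support.append(j)
        # Jointly re-fit the coefficients of all selected atoms (least squares).
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x = np.zeros(D.shape[1])
    x[support] = coeffs
    return x
```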
Training
The training set of luminance images is built from the ILSVRC2012 ImageNet dataset. SWTA AE learns its
parameters on this training set. A dictionary for OMP and a dictionary for WTA OMP are learned using,
respectively, K-SVD and our proposed dictionary learning algorithm for WTA OMP.
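A minimal sketch of how luminance training data could be extracted from ImageNet images is shown below, e.g. for the patch-based dictionary learning methods. The 8-bit luminance conversion uses PIL's standard BT.601 weights, while the function name, patch size, and stride are assumptions for illustration; the paper's exact pre-processing may differ.

```python
import numpy as np
from PIL import Image

def luminance_patches(path, patch_size=8, stride=8):
    """Extract non-overlapping luminance patches from one training image."""
    luma = np.asarray(Image.open(path).convert("L"), dtype=np.float32)  # 8-bit luminance
    patches = []
    for i in range(0, luma.shape[0] - patch_size + 1, stride):
        for j in range(0, luma.shape[1] - patch_size + 1, stride):
            patches.append(luma[i:i + patch_size, j:j + patch_size].ravel())
    return np.stack(patches)   # shape: (number of patches, patch_size * patch_size)
```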
Visualization of the bottleneck feature maps of SWTA AE after training
Non-sparse feature map in the bottleneck of SWTA AE when SWTA AE is fed with LENA luminance 320x320
SWTA AE learns to store, in the non-sparse feature map of Z, a subsampled version of its input image.
The non-zero coefficients in the sparse feature maps of Z appear to cluster around the spatial locations
where the most complex textures are found in the input image.
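This kind of figure can be reproduced with a short visualization routine like the sketch below, assuming the bottleneck Z is available as a numpy array whose first 64 maps are the sparse ones and whose last map is the non-sparse one (this layout and the function name are assumptions for illustration).

```python
import numpy as np
import matplotlib.pyplot as plt

def show_bottleneck(Z):
    """Display the non-sparse feature map and, per spatial location,
    how many of the 64 sparse feature maps are non-zero."""
    non_sparse = Z[64]
    activity = np.count_nonzero(Z[:64], axis=0)
    fig, axes = plt.subplots(1, 2, figsize=(8, 4))
    axes[0].imshow(non_sparse, cmap="gray")
    axes[0].set_title("non-sparse feature map")
    axes[1].imshow(activity, cmap="viridis")
    axes[1].set_title("non-zeros per location (64 sparse maps)")
    for ax in axes:
        ax.axis("off")
    plt.show()
```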
Visualization of the learned dictionaries
Dictionary of 1024 atoms learned using K-SVD
Dictionary of 1024 atoms learned using the dictionary learning algorithm for WTA OMP
The two dictionaries look different. The dictionary learned using the dictionary learning algorithm
for WTA OMP contains more diverse high-frequency atoms. This supports the utility of the proposed
dictionary learning algorithm.
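A simple way to compare the two dictionaries visually is to tile their atoms into a grid, as in the sketch below. The 8x8 atom size (64-dimensional atoms) and the function name are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

def show_dictionary(D, patch_size=8):
    """Tile the atoms of a dictionary D (one atom per column) into a grid image."""
    n_atoms = D.shape[1]
    side = int(np.ceil(np.sqrt(n_atoms)))                 # e.g. a 32x32 grid for 1024 atoms
    canvas = np.zeros((side * patch_size, side * patch_size))
    for k in range(n_atoms):
        atom = D[:, k].reshape(patch_size, patch_size)
        # Normalize each atom to [0, 1] so all atoms are visible on the same scale.
        atom = (atom - atom.min()) / (atom.max() - atom.min() + 1e-8)
        i, j = divmod(k, side)
        canvas[i * patch_size:(i + 1) * patch_size,
               j * patch_size:(j + 1) * patch_size] = atom
    plt.imshow(canvas, cmap="gray")
    plt.axis("off")
    plt.show()
```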
Image compression experiment
Comparison of rate-distortion curves
We compare the rate-distortion curves of OMP, WTA OMP, SWTA AE, the non-sparse Auto-Encoder (AE),
JPEG, and JPEG-2000 on four standard luminance images. These images have different sizes, to check
that SWTA AE handles changes of input size. Note that, before a new point at a different rate can
be drawn on the rate-distortion curve of AE, AE must be re-trained with a different number of feature
maps in its bottleneck.
LENA luminance 320x320
BARBARA luminance 480x384
MANDRILL luminance 400x400
PEPPERS luminance 512x512
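For reference, the two quantities reported on these curves and in the reconstructions below are PSNR (in dB, for 8-bit luminance images) and rate (in bits per pixel). A minimal sketch of both measures, with illustrative function names:

```python
import numpy as np

def psnr(original, reconstruction):
    """PSNR in dB between two 8-bit luminance images."""
    mse = np.mean((original.astype(np.float64) - reconstruction.astype(np.float64)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)

def rate_bpp(bitstream_size_in_bytes, height, width):
    """Rate in bits per pixel (bpp) of the encoded image."""
    return 8.0 * bitstream_size_in_bytes / (height * width)
```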
Visualization of image reconstruction
LENA luminance 320x320 (original)
JPEG-2000 (PSNR = 41.46 dB, rate = 0.93 bpp)
SWTA AE (PSNR = 38.90 dB, rate = 0.91 bpp)
JPEG (PSNR = 39.06 dB, rate = 0.87 bpp)
BARBARA luminance 384x480 (original)
JPEG-2000 (PSNR = 34.09 dB, rate = 0.94 bpp)
SWTA AE (PSNR = 31.57 dB, rate = 0.91 bpp)
JPEG (PSNR = 31.60 dB, rate = 0.92 bpp)
For BARBARA, SWTA AE reconstructs the scarf, the wicker chair, and the tablecloth well, but it
struggles to reconstruct BARBARA's left arm and the shadow of the table.