Open Access

Sparse signal subspace decomposition based on adaptive over-complete dictionary

EURASIP Journal on Image and Video Processing 2017, 2017:50

Received: 13 September 2016

Accepted: 17 July 2017

Published: 28 July 2017


This paper proposes a subspace decomposition method based on an over-complete dictionary in sparse representation, called the “sparse signal subspace decomposition” (3SD) method. The method uses a novel criterion based on the occurrence frequency of the dictionary atoms over the data set. This criterion, well adapted to subspace decomposition over a dependent basis set, adequately reflects the intrinsic regularity of the signal. The 3SD method combines variance, sparsity, and component frequency criteria into a unified framework. It benefits both from an over-complete dictionary, which preserves details, and from subspace decomposition, which rejects strong noise. The 3SD method is very simple, with a linear retrieval operation, and requires no prior knowledge of distributions or parameters. When applied to image denoising, it performs well both at preserving fine details and at suppressing strong noise.


Subspace decomposition, Sparse representation, Frequency of components, PCA, K-SVD, Image denoising

1 Introduction

Signal subspace methods (SSMs) are efficient techniques to reduce the dimensionality of data and to filter out noise [1]. The fundamental idea underlying SSM is to project the data onto a basis spanning two subspaces, one containing mostly the signal and the other mostly the noise. The two subspaces are separated by a thresholding criterion based on some measure of information.

The two most popular methods of signal subspace decomposition are wavelet shrinkage [2] and principal component analysis (PCA) [3]. Both techniques have proved to be quite efficient. However, wavelet decomposition, which depends on signal statistics, is not equally well suited to all data, and it requires some prior knowledge of the distributions or parameters of the signals to choose the shrinkage thresholds efficiently. A significant advantage of PCA is its adaptability to data. Its separation criterion, however, is based on energy, which may be a limitation in some cases, as illustrated in the next section.

In recent years, sparse coding has attracted significant interest in the field of signal denoising [4]. A sparse representation is a decomposition of a signal on a very small set of components of an over-complete basis (called a dictionary) that is adapted to the processed data. A difficult aspect of signal subspace decomposition based on such a sparse representation is defining the most appropriate criterion to identify the principal components (called atoms) of the learned dictionary that build the principal subspace. Because the dictionary is not orthogonal, the energy criterion cannot be used for this purpose as it is with PCA.

To solve this problem, we introduce a new criterion to measure the importance of atoms and propose an SSM based on the occurrence frequency of atoms. We thus benefit both from the richness of over-complete dictionaries, which preserves details of information, and from signal subspace decomposition, which rejects strong noise.

The remainder of this paper is organized as follows: Section 2 reviews two related approaches to signal decomposition. Section 3 introduces the proposed sparse signal subspace decomposition based on the adaptive over-complete dictionary. Experimental results and analysis are presented in Section 4. Finally, conclusions are drawn in Section 5.

2 Review of PCA and sparse coding methods

We start with a brief description of two well-established approaches to signal decomposition that are relevant and related to the approach proposed in the next section.

2.1 PCA-based subspace decomposition

The basic tool of SSM is principal component analysis (PCA). PCA makes use of an orthonormal basis to capture on a small set of vectors (the signal subspace) as much energy as possible from the observed data. The other basis vectors are expected to contain noise only, and the signal projection on these vectors is rejected.

Consider a data set \(\left \{\mathbf {x}_{m} \in \mathbb {R}^{N \times 1}\right \}_{m=1}^{M}\) grouped in a matrix form \(\mathbf{X}\) of size N×M: \(\mathbf {X} = \{\mathbf {x}_{m}\}_{m=1}^{M}\). The PCA is based on singular value decomposition (SVD) with singular values \(\sigma_{i}\) in descending order obtained from:
$$ \mathbf{X} = \mathbf{U}\mathbf{A} = \mathbf{U}\mathbf{\Sigma} \mathbf{V}^{T} $$

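where \(\mathbf{U}\) and \(\mathbf{V}\) are unitary matrices of sizes N×N and M×M, respectively (\(\mathbf{U}^{T}\mathbf{U}=\mathbf{I}_{N}\), \(\mathbf{V}^{T}\mathbf{V}=\mathbf{I}_{M}\)), and \(\mathbf {\Sigma }= \left [\begin {array}{cc} \text {diag}\left [\sigma _{1},\cdots,\sigma _{r}\right ] & \mathbf {0}\\ \mathbf {0} & \mathbf {0} \end {array}\right ]\) is of size N×M with \(\sigma_{1} \geq \sigma_{2} \geq \cdots \geq \sigma_{r} > 0\). The values \( \{\sigma _{i}\}_{i=1}^{r} \) are positive real numbers known as the singular values of \(\mathbf{X}\), whose rank is r (\(r \leq N\)).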

Equation (1) can be rewritten in a vector form as:
$$ {} \begin{aligned} &\left[\mathbf{x}_{1} \quad \mathbf{x}_{2} \cdots \mathbf{x}_{m} \cdots \mathbf{x}_{M}\right]\\ &\quad= \left[\mathbf{u}_{1} \quad \mathbf{u}_{2} \cdots \mathbf{u}_{n} \cdots \mathbf{u}_{N} \right]. \left[\boldsymbol{\alpha}_{1} \quad \boldsymbol{\alpha}_{2} \cdots \boldsymbol{\alpha}_{m} \cdots \boldsymbol{\alpha}_{M} \right] \end{aligned} $$

where \(\mathbf {U}=\left \{\mathbf {u}_{n} \in \mathbb {R}^{N \times 1}\right \}_{n=1}^{N}\) and \(\mathbf {A}=\left \{{\alpha }_{m} \in \mathbb {R}^{N \times 1}\right \}_{m=1}^{M}\). Equation (2) means that the data set \(\{\mathbf {x}_{m}\}_{m=1}^{M}\) is expressed on the orthonormal basis \(\{\mathbf {u}_{n}\}_{n=1}^{N}\) as \(\{\alpha _{m}\}_{m=1}^{M}\).

In the SVD decomposition given in Eq. (1), the standard deviation \(\sigma_{i}\) is used as the measurement for identifying the meaningful basis vector \(\mathbf{u}_{i}\). PCA takes the first P (P<r) components \(\{\mathbf {u}_{n} \}_{n=1}^{P}\) to span the signal subspace, and the remaining vectors \(\{\mathbf {u}_{n}\}_{n=P+1}^{r}\) are assigned to a noise subspace orthogonal to the signal subspace. Therefore, projection on the signal subspace will hopefully filter out noise and reveal hidden structures. The reconstructed signal \(\hat {\mathbf {S}}_{\text {PCA}}\) of size N×M is obtained by projecting onto the signal subspace as:
$$ {}\begin{aligned} \hat{\mathbf{S}}_{\text{PCA}}=\left[\mathbf{u}_{1} \cdots \mathbf{u}_{P} \quad \mathbf{0}_{P+1} \cdots \mathbf{0}_{N} \right]. \left[\boldsymbol{\alpha}_{1} \quad \boldsymbol{\alpha}_{2} \cdots \boldsymbol{\alpha}_{m} \cdots \boldsymbol{\alpha}_{M}\right] \end{aligned} $$

The underlying assumption is that information in the data set is almost completely contained in a small linear subspace of the overall space of possible data vectors, whereas additive noise is typically distributed isotropically through the larger space. PCA, using the standard deviation as a criterion, implies that the components of the signal of interest in the data set have maximum variance and that the other components are mainly due to noise. However, in many practical cases, some components with low variances might actually be important because they carry information about the signal details. Conversely, when dealing with noise with non-Gaussian statistics, some noise components may actually have higher variances. Finally, note that it is often difficult to give a physical meaning to the orthonormal basis \(\{\mathbf {u}_{i} \}_{i=1}^{r}\) of the SVD decomposition (Eq. (2)), although its vectors have a very clear mathematical definition as orthogonal, independent, and normalized. It is therefore difficult to impose known constraints on the signal features, when they exist, after the principal component decomposition.
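The projection of Eq. (3) can be illustrated with a short numerical sketch. The code below is not taken from the paper: the function name `pca_subspace_projection`, the synthetic rank-3 signal, and the noise level are assumptions chosen only to show how truncating the SVD basis to P vectors removes the part of the noise lying outside the signal subspace.

```python
import numpy as np

def pca_subspace_projection(X, P):
    """Project the columns of X (N x M) onto the first P left singular vectors."""
    # Thin SVD, X = U diag(s) V^T, with singular values s in descending order.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_P = U[:, :P]        # basis u_1, ..., u_P of the signal subspace
    A_P = U_P.T @ X       # coefficients of every observation on that basis
    return U_P @ A_P, s   # signal estimate and the singular values

# Synthetic check (assumed data): a rank-3 signal observed in weak isotropic noise.
rng = np.random.default_rng(0)
N, M, rank = 16, 200, 3
S = rng.standard_normal((N, rank)) @ rng.standard_normal((rank, M))
X = S + 0.1 * rng.standard_normal((N, M))
S_hat, s = pca_subspace_projection(X, P=rank)
print("error of X:    ", np.linalg.norm(X - S))
print("error of S_hat:", np.linalg.norm(S_hat - S))
```

In this favorable setting the signal carries far more energy than the noise, so the energy criterion works; the limitations discussed above arise when weak signal components fall below the noise variance.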

2.2 Sparse decomposition

Recent years have shown a growing interest in research on the sparse decomposition of M observations \(\left \{\mathbf {x}_{m} \in \mathbb {R}^{N}\right \}_{m=1}^{M}\) based on a dictionary \(\mathbf {D}=\{\mathbf {d}_{k}\}_{k=1}^{K} \in \mathbb {R}^{N \times K}\). When K>N, the dictionary is said to be over-complete. \(\mathbf {d}_{k} \in \mathbb {R}^{N}\) is a basis vector, also called an atom since it is not necessarily independent. By learning from data set \(\{\mathbf {x}_{m} \}_{m=1}^{M}\), the sparse decomposition is the solution of Eq. (4) [4]:
$$ {}\begin{aligned} \{\mathbf{D},\boldsymbol{\alpha}_{m}\}&=\operatorname*{arg min}\limits_{\mathbf{D},{\boldsymbol{\alpha}_{m}}} \parallel {\boldsymbol{\alpha}_{m}} \parallel_{0}\\ &\quad\ + \parallel \mathbf{D}\boldsymbol{\alpha}_{m} - \mathbf{x}_{m}\parallel^{2}_{2} \leq \varepsilon, \quad 1 \leq m \leq M \end{aligned} $$
where \(\boldsymbol {\alpha }_{m}=\left [\alpha _{m}(1) \; \alpha _{m}(2) \; \dots \; \alpha _{m}({K}) \right ]^{T} \in \mathbb {R}^{K \times 1}\) is the sparse code of the observation \(\mathbf{x}_{m}\). The allowed error tolerance ε can be chosen according to the standard deviation of the noise. An estimate of the underlying signal \(\{\mathbf {s}_{m}\}_{m=1}^{M}\) embedded in the observed data set \(\{\mathbf {x}_{m} \}_{m=1}^{M}\) would be:
$$ {}\begin{aligned} &\left[\hat{\mathbf{s}}_{1} \quad \hat{\mathbf{s}}_{2} \cdots \hat{\mathbf{s}}_{m} \cdots \hat{\mathbf{s}_{M}} \right]\\ &\quad= \left[ \mathbf{d}_{1} \quad \mathbf{d}_{2} \cdots \mathbf{d}_{k} \cdots \mathbf{d}_{K} \right]. \left[\boldsymbol{\alpha}_{1} \quad \boldsymbol{\alpha}_{2}\cdots \boldsymbol{\alpha}_{m} \cdots \boldsymbol{\alpha}_{M} \right] \\ &\qquad\text{or equivalently} \quad \hat{\mathbf{S}} = \mathbf{D} \mathbf{A} \end{aligned} $$

where the matrix \(\mathbf{A}\) of size K×M is composed of M sparse column vectors \(\boldsymbol{\alpha}_{m}\).

The first term on the right side of Eq. (4) is a sparsity-inducing regularization that constrains the solution to have the fewest nonzero coefficients in each of the sparse code vectors \(\boldsymbol{\alpha}_{m}\) \((1 \leq m \leq M)\). The underlying assumption is that a meaningful signal can be represented by combining a few atoms. Such a learned dictionary, adapted to sparse signal descriptions, has proved to be more effective in signal reconstruction and classification tasks than the PCA method, as demonstrated in the next section. The second term in Eq. (4) is the residual of the reconstruction, based on the mean-square reconstruction error estimate in the same way as in the PCA method.
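For a fixed dictionary, the sparse coding part of Eq. (4) is usually approximated by a greedy pursuit. The sketch below uses orthogonal matching pursuit (OMP) with the error tolerance ε as the stopping rule. This is an assumption made for illustration: the text above does not say which pursuit is used, the dictionary update step of K-SVD is omitted, and the function name `omp` and the random dictionary are hypothetical.

```python
import numpy as np

def omp(D, x, eps, max_atoms=None):
    """Greedy sparse code of x over D (columns assumed to have unit norm).

    Stops once ||D alpha - x||_2^2 <= eps or max_atoms atoms are selected.
    """
    N, K = D.shape
    max_atoms = N if max_atoms is None else max_atoms
    alpha = np.zeros(K)
    support = []
    residual = x.astype(float).copy()
    while residual @ residual > eps and len(support) < max_atoms:
        # Atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break  # no further progress is possible
        support.append(k)
        # Least-squares refit of the coefficients on the selected atoms.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    if support:
        alpha[support] = coef
    return alpha

# Assumed toy data: a random over-complete dictionary (K=20 > N=8).
rng = np.random.default_rng(1)
D = rng.standard_normal((8, 20))
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
x = 2.0 * D[:, 3] - 1.5 * D[:, 11]      # observation built from two atoms
alpha = omp(D, x, eps=1e-10)
print(np.count_nonzero(alpha), np.sum((D @ alpha - x) ** 2))
```

Stacking the codes of all M observations as columns gives the coefficient matrix of Eq. (5). A larger ε yields sparser codes at the cost of a larger reconstruction residual.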

On the other hand, we note that the dictionary D, a basis in sparse decomposition, is produced by learning noisy data set \(\{\mathbf {x}_{m}\}_{m=1}^{M}\), so the basis vectors \(\{\mathbf {d}_{k}\}_{k=1}^{K}\) should be decomposed into a principal subspace and a residual subspace. However, it is impossible to exploit an energy-constrained subspace since \(\{\mathbf {d}_{k}\}_{k=1}^{K}\) are not necessarily orthogonal or independent.

3 The proposed sparse subspace decomposition

In this section, we introduce a novel criterion for subspace decomposition over a learned dictionary, together with a corresponding index of significance of the atoms. We then propose a sparse signal subspace decomposition (3SD) method based on this new criterion.

3.1 Weight vectors of learned atoms

First, we determine the weight of the atoms. In the sparse representation given in Eq. (5), the coefficient matrix \(\mathbf{A}\) is composed of M sparse column vectors \(\boldsymbol{\alpha}_{m}\), each \(\boldsymbol{\alpha}_{m}\) representing the weight of the observation \(\mathbf{x}_{m}\), a local parameter for the m-th observation. Let us consider the row vectors \(\{\boldsymbol {\upbeta }_{k}\}_{k=1}^{K}\) of the coefficient matrix \(\mathbf{A}\):
$$ \begin{aligned} \mathbf{A}& = \left[\boldsymbol{\alpha}_{1} \; \boldsymbol{\alpha}_{2} \; \cdots \; \boldsymbol{\alpha}_{M} \right]\\& = \left[\begin{array}{ll} \alpha_{1}(1) \quad \alpha_{2}(1) \quad \cdots \quad \alpha_{M}(1) \\ \alpha_{1}(2) \quad \alpha_{2}(2) \quad \cdots \quad \alpha_{M}(2)\\\vdots \quad \quad \quad \vdots \quad \quad \ddots \quad \quad \vdots \\ \alpha_{1}(K) \quad \alpha_{2}(K) \quad \cdots \quad \alpha_{M}(K) \end{array}\right] = \left[\begin{array}{ll} \boldsymbol{\upbeta}_{1} \\ \boldsymbol{\upbeta}_{2} \\ \vdots \\ \boldsymbol{\upbeta}_{K} \end{array}\right] \\&\text{where} \quad \boldsymbol{\upbeta}_{k}=\left[ \alpha_{1}(k) \; \alpha_{2}(k) \; \dots \; \alpha_{M}(k) \right] \in \mathbb{R}^{1 \times M} \end{aligned} $$
Note that the row vector \(\boldsymbol{\upbeta}_{k}\) is not necessarily sparse. Then Eq. (5) can be rewritten as:
$$ \begin{aligned} \hat{\mathbf{S}}& = \mathbf{D}\mathbf{A} \\ & = \left[\mathbf{d}_{1}\cdots\mathbf{d}_{k}\cdots\mathbf{d}_{K}\right]. \left[\boldsymbol{\upbeta}_{1}^{T}\cdots \boldsymbol{\upbeta}_{k}^{T} \cdots \boldsymbol{\upbeta}_{K}^{T} \right]^{T} \end{aligned} $$
Equation (7) means that the row vector \(\boldsymbol{\upbeta}_{k}\) is the weight of the atom \(\mathbf{d}_{k}\), which is a global parameter over the data set \(\mathbf{X}\). Let \(\|\boldsymbol{\upbeta}_{k}\|_{0}\) denote the \(\ell_{0}\) (zero) pseudo-norm of \(\boldsymbol{\upbeta}_{k}\); it is the number of occurrences of the atom \(\mathbf{d}_{k}\) over the data set \(\{\mathbf {x}_{m} \}_{m=1}^{M}\). We call it the frequency of the atom \(\mathbf{d}_{k}\), denoted by \(f_{k}\):
$$ \begin{aligned} f_{k}\triangleq \text{Frequency}(\mathbf{d}_{k} | \mathbf{X}) = \| \boldsymbol{\upbeta}_{k} \|_{0} \end{aligned} $$

In the sparse decomposition, the basis vectors \(\{\mathbf {d}_{k} \}_{k=1}^{K}\) are prototypes of signal segments, which allows us to treat them as signal patterns. Important features of these signal patterns can therefore serve as criteria to identify significant atoms. It has been shown [5] that \(f_{k}\) is a good description of the signal texture. Intuitively, a signal pattern must occur in meaningful signals with a higher frequency, even if its energy is lower. Conversely, a noise pattern would hardly be reproduced in the observed data, even if its energy is higher.

It is reasonable to take this frequency \(f_{k}\) as a relevance criterion to decompose the over-complete dictionary into a principal signal subspace and a remaining noise subspace. Here, we use the word “subspace,” but in fact, these two subspaces are not necessarily independent.
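As a small worked example of Eq. (8), the frequency of each atom is simply the number of nonzero entries in the corresponding row of the coefficient matrix. The matrix values below are invented for illustration, and the helper name `atom_frequencies` and its `tol` parameter are hypothetical.

```python
import numpy as np

def atom_frequencies(A, tol=0.0):
    """f_k = ||beta_k||_0: number of observations whose sparse code uses atom k."""
    return np.count_nonzero(np.abs(A) > tol, axis=1)

# Invented K=4 by M=4 coefficient matrix; each row is one beta_k.
A = np.array([[0.0,  1.2, 0.0, -0.7],
              [0.3,  0.0, 0.0,  0.0],
              [0.5, -2.0, 0.9,  0.4],
              [0.0,  0.0, 0.0,  0.0]])
f = atom_frequencies(A)
energy = np.linalg.norm(A, axis=1)   # ||beta_k||_2, the energy-type measure
print(f)        # [2 1 4 0]
print(energy)
```

The two measures can rank atoms differently: an atom used often with small coefficients has a high frequency but a low energy.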

3.2 Subspace decomposition based on over-complete dictionary

Taking the vectors \(\{\boldsymbol {\upbeta }_{k}\}_{k=1}^{K}\), we calculate their \(\ell_{0}\) pseudo-norms \(\{\|\boldsymbol {\upbeta }_{k} \|_{0}\}_{k=1}^{K}\) and rank them in descending order as follows. The indices k of the vectors \(\{\boldsymbol {\upbeta }_{k}\}_{k=1}^{K}\) belong to the set \(\mathbf{C}=\{1,2,\cdots,k,\cdots,K\}\). A one-to-one index mapping function π is defined as:
$$ {}\begin{aligned} & \quad \quad \quad \quad \pi(\mathbf{C} \rightarrow \mathbf{C}): k=\pi(\tilde{k}), \quad \quad k,\tilde{k} \in \mathbf{C} \\& s.t. \ \ \| \boldsymbol{\upbeta}_{\pi(1)} \|_{0} \!\geq\! \| \boldsymbol{\upbeta}_{\pi(2)} \|_{0} \geq \cdots \geq \!\| \boldsymbol{\upbeta}_{\pi(\tilde{k})} \|_{0} \geq \cdots \geq \!\| \boldsymbol{\upbeta}_{\pi(\!K\!)} \|_{0} \end{aligned} $$
By the permutation π of the row index k of matrix \(\mathbf {A} = \left [\boldsymbol {\upbeta }_{1}^{T} \cdots \boldsymbol {\upbeta }_{k}^{T} \cdots \boldsymbol {\upbeta }_{K}^{T} \right ]^{T}\), the reordered coefficient matrix \(\tilde {\mathbf {A}}\) becomes
$$ \begin{aligned} \tilde{\mathbf{A}}&= \left[\boldsymbol{\upbeta}_{\pi(1)}^{T} \quad \boldsymbol{\upbeta}_{\pi(2)}^{T} \cdots \boldsymbol{\upbeta}_{\pi(k)}^{T} \cdots \boldsymbol{\upbeta}_{\pi(K)}^{T} \right]^{T} \end{aligned} $$
With corresponding reordered dictionary \(\tilde {\mathbf {D}} = \{ \mathbf {d}_{\pi (k)} \}_{k=1}^{K}\), Eq. (7) can be written as:
$$ {}\begin{aligned} \hat{\mathbf{S}} &= \tilde{\mathbf{D}} \tilde{\mathbf{A}} \\&= \left[ \mathbf{d}_{\pi(1)} \cdots \mathbf{d}_{\pi(k)} \cdots \mathbf{d}_{\pi(K)} \right]. \left[\boldsymbol{\upbeta}_{\pi(1)}^{T} \cdots \boldsymbol{\upbeta}_{\pi(k)}^{T} \cdots \boldsymbol{\upbeta}_{\pi(K)}^{T} \right]^{T} \end{aligned} $$
Then, the span of the first P atoms can be taken as a principal subspace \(\mathbf {D}_{P}^{(\mathbf {S})}\), and the remaining atoms span a noise subspace \(\mathbf {D}_{K-P}^{(N)}\) as:
$$ \begin{aligned} \mathbf{D}_{P}^{(\mathbf{S})} = \text{span}\{\mathbf{d}_{\pi{(1)}},\mathbf{d}_{\pi{(2)}},\cdots,\mathbf{d}_{\pi{(P)}} \} \qquad \\ \mathbf{D}_{K-P}^{(N)} = \text{span}\{\mathbf{d}_{\pi{(P+1)}},\mathbf{d}_{\pi{(P+2)}},\cdots,\mathbf{d}_{\pi{(K)}} \} \end{aligned} $$
An estimate \(\hat {\mathbf {S}}_{P}\) of the underlying signal S embedded in the observed data set X can be obtained on the principal subspace \(\mathbf {D}_{P}^{(\mathbf {S})}\) simply by linear combination:
$$ {}\begin{aligned} \hat{\mathbf{S}}_{P} &= \mathbf{D}_{P}^{(\mathbf{S})}. \mathbf{A}_{P}^{(\mathbf{S})} \\ &= \left[\mathbf{d}_{\pi(1)} \cdots \mathbf{d}_{\pi(k)} \cdots \mathbf{d}_{\pi(P)} \right]. \left[\boldsymbol{\upbeta}_{\pi{(1)}}^{T} \cdots \boldsymbol{\upbeta}_{\pi{(k)}}^{T} \cdots \boldsymbol{\upbeta}_{\pi{(P)}}^{T} \right]^{T} \end{aligned} $$
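Equations (9) to (13) amount to sorting the rows of the coefficient matrix by their number of nonzero entries, keeping the first P atoms, and multiplying the two truncated matrices. A minimal sketch follows, assuming the dictionary and coefficient matrix are already available and that P is given as a number of atoms; the function name `three_sd_reconstruct` and the toy matrices are hypothetical.

```python
import numpy as np

def three_sd_reconstruct(D, A, P):
    """Keep the P most frequently used atoms and rebuild the signal linearly."""
    f = np.count_nonzero(A, axis=1)          # atom frequencies, Eq. (8)
    order = np.argsort(-f, kind="stable")    # permutation pi, Eq. (9)
    keep = order[:P]                         # indices pi(1), ..., pi(P)
    D_P = D[:, keep]                         # principal atoms, Eq. (12)
    A_P = A[keep, :]                         # their weight vectors beta_pi(k)
    return D_P @ A_P, keep                   # linear retrieval, Eq. (13)

# Toy data (assumed): K=6 atoms of dimension N=4, M=5 observations.
rng = np.random.default_rng(2)
D = rng.standard_normal((4, 6))
A = np.zeros((6, 5))
A[0, :] = 1.0          # atom 0 used by all 5 observations
A[1, 2] = 3.0          # atom 1 used once, with a large coefficient
A[2, :3] = 0.5         # atom 2 used 3 times
A[4, 1:] = -0.8        # atom 4 used 4 times
A[5, 0] = 2.0          # atom 5 used once
S_P, keep = three_sd_reconstruct(D, A, P=3)
print(keep)            # [0 4 2]
```

With P equal to K the function returns the full product of the dictionary and the coefficient matrix, that is, the ordinary sparse estimate of Eq. (5).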

3.3 Threshold of atom’s frequency

Determining the number P of atoms spanning the signal subspace \(\mathbf {D}_{P}^{(\mathbf {S})}\) is always a hard problem, especially for wide-band signals. Here, P is the threshold on the atom frequency \(f_{k}\) that distinguishes the signal subspace from the noise subspace. One of the advantages of 3SD is that this threshold P can be chosen easily, without any prior parameter.

For a noiseless signal, even one with some weak details such as the image in Fig. 1a, the atom frequencies \(f_{\pi (k)}^{\text {image}}\) shown in Fig. 1d (black line) are almost always high, apart from those equal to zero. For a signal consisting of strong noise, such as the example in Fig. 1b, the atom frequencies \(f_{\pi (k)}^{\text {noise}}\) shown in Fig. 1d (red line) are almost always equal to 1, with no zeros and very few values of 2 or 3. It is easy to set a threshold P on \(f_{k}\) (dotted line in Fig. 1d) to separate the signal atoms from the noise atoms. By contrast, with the atom energies \(\|\boldsymbol{\upbeta}_{k}\|_{2}\) of the two images, shown in Fig. 1c, it is hard to identify the principal atoms.
Fig. 1

Sparse signal subspaces with the criterion of atom frequency. a Image with details. b White noise. c 2-norm of \(\boldsymbol{\upbeta}_{k}\). d 0-norm of \(\boldsymbol{\upbeta}_{k}\)

For a noisy signal, such as the image in Fig. 2a, the adaptive over-complete dictionary (Fig. 2b) consists of atoms of principal signal patterns, strong noise patterns, and noisy signal patterns. Principal signal atoms should have higher frequencies, strong noise atoms lower frequencies, and noisy signal atoms moderate frequencies. Intuitively, the red line (Fig. 2c) should be a suitable threshold P on the frequencies \(f_{k}\). In a practical implementation, the value of P can be decided simply from the histogram of \(f_{k}\). As shown in Fig. 2d, one can set P to the value of \(f_{k}\) associated with the maximum point of its histogram, as follows:
$$ \begin{aligned} P =\operatorname*{arg \ max \ Hist}\limits_{k}{(\|\boldsymbol{\upbeta}_{k} \|_{0})} \end{aligned} $$
Fig. 2

The threshold P of the frequencies \(f_{k}\). a Noisy image. b Over-complete dictionary \(\mathbf{D}\). c Frequency of \(\mathbf{d}_{k}\). d Histogram of the frequency of \(\mathbf{d}_{k}\)

In fact, the performance of signal analysis by the 3SD method is not sensitive to the threshold P, owing to the dependence of the atoms. To illustrate this point, we take three images, Barbara, Lena, and Boat. Their histograms of \(f_{k}\) are shown in Fig. 3a, with the maximum points marked by dotted lines at 121, 97, and 92, respectively. Figure 3b reports the peak signal-to-noise ratio (PSNR) of the retrieved images \(\hat {\mathbf {S}}_{P}\) on the signal principal subspace \(\mathbf {D}_{P}^{(\mathbf {S})}\) with respect to P. We can see that the PSNRs remain the same over a large range around the maximum points (dotted lines). Consequently, taking the value of \(f_{k}\) associated with the maximum point of its histogram as the threshold P is a reasonable solution.
Fig. 3

The insensitivity of the threshold P. a Histograms of \(f_{k}\). b PSNR of \(\hat {\mathbf {S}}_{P}\) with respect to P
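One possible reading of Eq. (14) is sketched below: the threshold is the frequency value at which the histogram of atom frequencies peaks, and the atoms whose frequency reaches that value are kept. Both choices are assumptions made for this illustration, as is ignoring atoms that are never used; the text describes P both as a number of atoms and as a frequency threshold, so the sketch returns both quantities. The function name and the frequency values are hypothetical.

```python
import numpy as np

def frequency_threshold(f):
    """Frequency value at the peak of the histogram of atom frequencies."""
    f = np.asarray(f, dtype=int)
    used = f[f > 0]                 # assumption: never-used atoms are ignored
    if used.size == 0:
        return 0
    hist = np.bincount(used)        # hist[v] = number of atoms with frequency v
    return int(np.argmax(hist))     # ties resolve to the smallest frequency

# Invented frequencies for K=10 atoms.
f = np.array([0, 1, 2, 5, 5, 5, 7, 9, 12, 5])
t = frequency_threshold(f)
n_atoms = int(np.count_nonzero(f >= t))   # atoms kept under this reading
print(t, n_atoms)   # 5 7
```

Since the PSNR curves of Fig. 3b are reported to be flat around the histogram maximum, small differences between such readings should matter little in practice.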

4 Results and discussion

4.1 Signal decomposition methods

Taking a part of the noisy Barbara image (Fig. 4a), we show an example of the sparse signal subspace decomposition (3SD) and the corresponding retrieved image (Fig. 4b). For comparison, the traditional sparse decomposition and the PCA-based subspace decomposition are shown in Fig. 4c, d.
Fig. 4

Signal decompositions. a Image sample. b Sparse Subspace decomposition. c Sparse decomposition. d Subspace decomposition

We use the PSNR to assess the noise removal performance:
$$ {}\begin{aligned} \text{PSNR}=20 \cdot \log_{10} \left[\text{MAX}\{\mathbf{S}(i,j)\}\right] -10 \cdot \log_{10} \left[ \text{MSE} \right] \\ \text{MSE}=\frac{1}{IJ}{{\sum\nolimits}_{i=0}^{I-1}}{{\sum\nolimits}_{j=0}^{J-1}}\left[\mathbf{S}(i,j)-\hat{\mathbf{S}}(i,j) \right]^{2} \end{aligned} $$
and the structural similarity index metric (SSIM) between the denoised image and the noise-free one to evaluate the detail-preservation performance:
$$ \begin{aligned} \text{SSIM}(\mathbf{S},\hat{\mathbf{S}})=\frac{(2u_{\mathbf{S}}u_{\hat{\mathbf{S}}} + c_{1})(2\sigma_{\mathbf{S}\hat{\mathbf{S}}}+c_{2}) }{\left(u^{2}_{\mathbf{S}}+ u^{2}_{\hat{\mathbf{S}}} +c_{1}\right)\left(\sigma^{2}_{\mathbf{S}} + \sigma^{2}_{\hat{\mathbf{S}}}+ c_{2}\right)} \end{aligned} $$

where \(u_{x}\) is the average of x, \(\sigma _{x}^{2}\) is the variance of x, \(\sigma_{xy}\) is the covariance of x and y, and \(c_{1}\) and \(c_{2}\) are small constants that stabilize the division when the denominator is weak.
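The two quality measures can be computed directly from their definitions. The sketch below evaluates the SSIM formula once over the whole image, which is a simplification: common implementations average it over local windows. The default constants `c1` and `c2` are the values conventionally used for 8-bit images and are an assumption here, since the text only calls them small.

```python
import numpy as np

def psnr(S, S_hat):
    """PSNR in dB: 20 log10(max S) - 10 log10(MSE)."""
    S = np.asarray(S, dtype=float)
    S_hat = np.asarray(S_hat, dtype=float)
    mse = np.mean((S - S_hat) ** 2)
    return 20.0 * np.log10(S.max()) - 10.0 * np.log10(mse)

def ssim_global(S, S_hat, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """SSIM formula evaluated once on whole-image statistics (no local windows)."""
    S = np.asarray(S, dtype=float)
    S_hat = np.asarray(S_hat, dtype=float)
    mu_s, mu_h = S.mean(), S_hat.mean()
    var_s, var_h = S.var(), S_hat.var()
    cov = np.mean((S - mu_s) * (S_hat - mu_h))
    num = (2 * mu_s * mu_h + c1) * (2 * cov + c2)
    den = (mu_s ** 2 + mu_h ** 2 + c1) * (var_s + var_h + c2)
    return num / den

# Assumed test image: an 8x8 intensity ramp and a copy shifted by 5 gray levels.
S = np.linspace(0.0, 255.0, 64).reshape(8, 8)
S_shifted = S + 5.0                      # constant error, so MSE = 25
print(psnr(S, S_shifted))
print(ssim_global(S, S), ssim_global(S, S_shifted))
```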

Let us look at the proposed sparse signal subspace decomposition at the top of Fig. 4b. The 128 atoms \(\mathbf{d}_{k}\) of the learned over-complete dictionary \(\mathbf{D}\) are shown in descending order of their energies, measured by \(\|\boldsymbol{\upbeta}_{k}\|_{2}\). The 32 principal signal atoms are chosen from the dictionary \(\mathbf{D}\) under the frequency criterion. They are shown in descending order of their frequencies, measured by \(\|\boldsymbol{\upbeta}_{k}\|_{0}\), and compose a signal subspace \(\mathbf {D}_{32}^{(\mathbf {S})}\). We can see that some of the principal atoms are not among the 32 atoms with the largest energy in the over-complete dictionary \(\mathbf{D}\). The retrieved images are shown at the bottom of Fig. 4b. The image S on D is apparently denoised. The image \(\hat {\mathbf {S}}\) on the signal subspace \(\mathbf {D}_{32}^{(\mathbf {S})}\) is clearly improved, both in preserving fine details (SSIM=0.86) and in suppressing strong noise (PSNR=36.41). On the other hand, the residual image on the noise subspace \(\mathbf {D}_{96}^{(N)}\) contains some very noisy information. This is because the atoms of the over-complete dictionary are not independent.

For the same example, the classical sparse decomposition is shown in Fig. 4c, using the K-SVD algorithm [6] in which the allowed error tolerance ε (in Eq. (4)) is set to a larger value to filter out noise. The retrieved image S has a high PSNR=29.62, but it has obviously lost the weak information with SSIM=0.82. This is because signal distortion and residual noise cannot be minimized simultaneously at dictionary learning by Eq. (4).

In another comparison, the PCA-based subspace decomposition is shown in Fig. 4d. The 64 components are orthonormal and the 32 principal components are of the largest variance. The retrieved image by projecting on the signal subspace is rather noisy with PSNR=29.62. This is because it cannot suppress strong noise and preserve weak details of information only using the variance criterion.

4.2 Application to image denoising

The application of 3SD to image denoising is presented here. A major difficulty of denoising is to separate the underlying signal from the noise, and the proposed 3SD method is designed to address this challenge. In the 3SD method, the important components are selected from the over-complete dictionary according to their number of occurrences over the noisy image set. The number of occurrences is expected to be large for the signal, even for weak details such as edges or textures. By contrast, it is expected to be low for the various kinds of white Gaussian or non-Gaussian noise, even when the noise intensity is strong.

The 3SD algorithm for image denoising is presented as follows:

Input: Noisy image \(\mathbf{X}\)

Output: Denoised image \(\hat {\mathbf {S}}\)

  1. Sparse representation \(\{\mathbf{D},\mathbf{A}\}\): using the K-SVD algorithm [6] by (4)
  2. Identify principal atoms from \(\mathbf{D}\) based on \(\mathbf{A}\):
     ■ Compute the frequencies of atoms \(\{\|\boldsymbol {\upbeta }_{k} \|_{0}\}_{k=1}^{K}\) according to (6) and (8)
     ■ Get the permutation π sorting the index k of \(\{\|\boldsymbol {\upbeta }_{k} \|_{0}\}_{k=1}^{K}\) by (9)
     ■ Compute the threshold P by (14)
  3. Obtain the signal principal atoms \(\{\mathbf {d}_{\pi (k)}\}_{k=1}^{P}\) by (12)
  4. Reconstruct image \(\hat {\mathbf {S}}_{P}\) by (13)

In this application, we intend to preserve faint signal details under a situation of strong noise.

In the experiments, the dictionaries \(\mathbf{D}\) used are of size 64×256 (K=256 atoms), designed to handle image patches \(\mathbf{x}_{m}\) of size N=64=8×8 pixels.
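To make the data layout concrete, the sketch below turns an image into the N×M patch matrix used by the decomposition and maps a processed patch matrix back to an image. The fully overlapping patches (stride 1) and the averaging of overlaps are assumptions borrowed from common patch-based denoising practice; the text only specifies the 8×8 patch size. Both function names are hypothetical.

```python
import numpy as np

def image_to_patches(img, p=8):
    """Stack all overlapping p x p patches as columns of a (p*p) x M matrix."""
    H, W = img.shape
    cols = [img[i:i + p, j:j + p].reshape(-1)
            for i in range(H - p + 1) for j in range(W - p + 1)]
    return np.stack(cols, axis=1)

def patches_to_image(X, shape, p=8):
    """Put patches back in place and average the overlapping pixels."""
    H, W = shape
    acc = np.zeros(shape)
    weight = np.zeros(shape)
    m = 0
    for i in range(H - p + 1):
        for j in range(W - p + 1):
            acc[i:i + p, j:j + p] += X[:, m].reshape(p, p)
            weight[i:i + p, j:j + p] += 1.0
            m += 1
    return acc / weight

# Assumed toy image: 12 x 12 pixels gives 5 * 5 = 25 patches of 64 pixels each.
rng = np.random.default_rng(3)
img = rng.random((12, 12))
X = image_to_patches(img)
print(X.shape)   # (64, 25)
```

Each column of the patch matrix plays the role of one observation, so a 64×256 dictionary has one row per pixel of a patch and one column per atom.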

4.3 Image denoising

A noisy Lena image \(\mathbf{X}=\mathbf{S}+\mathbf{V}\) with additive zero-mean white Gaussian noise \(\mathbf{V}\) is used. The standard deviation of the noise is σ=35. A comparison is made between the 3SD method and the K-SVD method [6], which is one of the best denoising methods reported in the recent literature.

From the results shown in Fig. 5, the 3SD method outperforms the K-SVD method by about 1 dB in PSNR and by about 1% in SSIM (depending on how much detail the image contains and how faint the details are). In terms of subjective visual quality, the corner of the mouth and the nasolabial fold, which have weak intensities, are much better recovered by the 3SD method.
Fig. 5

Image denoising comparing the proposed 3SD method with the K-SVD method

4.4 SAR image despeckling

In the second experiment, a simulated SAR image with speckle noise is used. Speckle is often modeled as multiplicative noise as x(i,j)=s(i,j)v(i,j) where x, s, and v correspond to the contaminated intensity, the original intensity, and the noise level, respectively.
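The multiplicative model can be simulated in a few lines. The sketch below assumes that the noise term follows a unit-mean gamma distribution, which reduces to a unit-mean exponential distribution for a one-look intensity image; this is a common modeling choice and is not stated in the text. The function name `add_speckle` and its parameters are hypothetical.

```python
import numpy as np

def add_speckle(s, looks=1, rng=None):
    """Return x = s * v with unit-mean gamma speckle v (exponential if looks=1)."""
    rng = np.random.default_rng() if rng is None else rng
    # Gamma with shape L and scale 1/L has mean 1 and variance 1/L.
    v = rng.gamma(shape=looks, scale=1.0 / looks, size=s.shape)
    return s * v, v

# Assumed clean image: constant intensity 100 on a 200 x 200 grid.
rng = np.random.default_rng(4)
s = np.full((200, 200), 100.0)
x, v = add_speckle(s, looks=1, rng=rng)
print(v.mean(), x.std())
```

Because the noise scales with the intensity, bright regions are corrupted more strongly than dark ones, which is why criteria designed for additive Gaussian noise do not transfer directly.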

Figure 6 shows the despeckling results of a simulated one-look SAR scenario with a fragment of the Barbara image. A comparison is made between the 3SD method and a probabilistic patch-based (PPB) filter, based on the nonlocal means approach [7], which can cope with non-Gaussian noise. We can see that PPB removes speckle noise well. However, it also removes the low-intensity details. The 3SD method shows advantages in preserving fine details and in suppressing strong noise.
Fig. 6

SAR image despeckling comparing the proposed 3SD method with the PPB method

4.5 Comparison with BM3D method

Using a spatially complex image scene, we compare the 3SD-based denoising method with the BM3D algorithm [8], one of the best image denoising methods reported in the recent literature.

The effectiveness of any signal analysis method depends on the different conditions in different applications. For the image denoising application, the signals involved should be homogeneous. Therefore, a procedure of grouping is generally adopted to select homogeneous pixels. In the BM3D method, a block-matching grouping is taken before filtering. We adopt the same grouping technique and then filter each homogeneous group by the proposed 3SD method.
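The grouping step can be pictured as a nearest-patch search around a reference block. The sketch below is a bare version of that idea under stated assumptions: similarity is the plain squared Euclidean distance between patches, and the patch size, search radius, and group size are illustrative values. It is not the matching procedure of BM3D itself, which includes further refinements, and the function name `block_matching` is hypothetical.

```python
import numpy as np

def block_matching(img, ref_ij, p=8, search=16, n_similar=16):
    """Group the n_similar patches closest (in L2) to the reference patch."""
    H, W = img.shape
    i0, j0 = ref_ij
    ref = img[i0:i0 + p, j0:j0 + p]
    candidates = []
    # Scan every patch position inside the search window, clipped to the image.
    for i in range(max(0, i0 - search), min(H - p, i0 + search) + 1):
        for j in range(max(0, j0 - search), min(W - p, j0 + search) + 1):
            d = float(np.sum((img[i:i + p, j:j + p] - ref) ** 2))
            candidates.append((d, i, j))
    candidates.sort(key=lambda c: c[0])
    best = candidates[:n_similar]
    # Each matched patch becomes one column of the group matrix.
    group = np.stack([img[i:i + p, j:j + p].reshape(-1) for _, i, j in best],
                     axis=1)
    return group, [(i, j) for _, i, j in best]

# Assumed toy image with random content.
rng = np.random.default_rng(5)
img = rng.random((32, 32))
group, positions = block_matching(img, (10, 10), p=8, search=6, n_similar=5)
print(group.shape, positions[0])
```

The columns of such a group can then serve as the observations of a homogeneous data set to which the decomposition is applied.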

Firstly, we take a 256×256 Barbara image (Fig. 7a) with strong additive zero-mean white Gaussian noise, where σ=70 (Fig. 7b). The denoising result of the BM3D algorithm is shown in Fig. 7c and displays quite high performance. The denoising result of the 3SD-based method is shown in Fig. 7d. It achieves a higher PSNR, a higher SSIM, and a better subjective visual quality than the BM3D algorithm.
Fig. 7

Denoising of a spatially complex image scene, comparing the BM3D method with the 3SD-based method. a Original: Barbara (a fragment) 256×256. b Noisy: σ=70; PSNR=11.22; SSIM=0.142. c BM3D denoising: PSNR=24.08; SSIM=0.7026. d 3SD denoising: PSNR=24.21; SSIM=0.765

Secondly, we take a simulated one-look SAR image built from this 256×256 Barbara image (Fig. 8a), where PSNR=−0.1042. Figure 8b shows the despeckling result of the PPB method [7], in which grouping is based on nonlocal similarity and filtering is implemented by averaging each homogeneous group. The despeckled image is too smooth because of the average filtering. Figure 8c shows the despeckling result of the BM3D-based method [9], in which grouping is based on similar 2-D fragments and filtering is implemented by Wiener shrinkage, with coefficients derived from the energy of the 3-D transform coefficients. The despeckled image is much better in PSNR and in SSIM, but it still appears slightly noisy, because the energy criterion used is not effective enough to separate noise elements from the principal elements. Figure 8d shows the despeckling result of the proposed 3SD-based method, in which grouping is based on nonlocal similarity [7] and filtering is implemented by the proposed sparse subspace decomposition. The despeckled image demonstrates the advantages of the 3SD method in preserving fine details and suppressing speckle noise, which we attribute to the principal subspace decomposition.
Fig. 8

Despeckling of a spatially complex SAR image scene. a Noisy: 1-look; PSNR=−0.1042; SSIM=0.21. b PPB despeckling: PSNR=7.8943; SSIM=0.63. c SAR-BM3D despeckling: PSNR=9.9312; SSIM=0.73. d 3SD-based despeckling: PSNR=10.3316; SSIM=0.74

5 Conclusions

We proposed a method of sparse signal subspace decomposition (3SD). The central idea of the proposed 3SD is to identify principal atoms from an adaptive over-complete dictionary relying on the occurrence frequency of atoms over the data set (Eq. (8)). The atom frequency is measured by zero pseudo-norms of weight vectors of atoms (Eqs. (6) and (8)). The principal subspace is spanned by the maximum frequency atoms (Eq. (12)).

The 3SD method combines the variance criterion, the sparsity criterion, and the component frequency criterion into a unified framework. As a result, it can identify the principal atoms more effectively, using these three important signal features. By contrast, PCA uses only the variance criterion, and the sparse coding method uses only the variance and sparsity criteria. With those methods, it is more difficult to distinguish weak information from strong noise.

Another interesting asset of the 3SD method is that it benefits both from an over-complete dictionary, which preserves details of information, and from subspace decomposition, which rejects strong noise. By contrast, some undercomplete dictionary methods [10] and some sparse shrinkage methods [11, 12] might lose weak information when suppressing noise.

Moreover, the 3SD method is very simple, with a linear retrieval operation (Eq. (13)). It does not require any prior knowledge of distributions or parameters to determine a threshold (Eq. (14)). By contrast, some sparse shrinkage methods, such as [11], require non-linear processing with some prior distributions of signals.

The proposed 3SD could be interpreted as a PCA in sparse decomposition, so it admits straightforward extension to applications of feature extraction, inverse problems, or machine learning.



The idea of the sparse signal subspace decomposition arose from many in-depth discussions with Professor Henri Maitre at Telecom ParisTech in France; he also gave suggestions on the structure of the manuscript.


This work was supported by the National Natural Science Foundation of China (Grant No. 60872131).

Authors’ contributions

HS proposed the idea, designed the algorithms and experiments, and drafted the manuscript. CWS gave suggestions on the design of the algorithm, implemented the algorithms, and carried out the experiments. DLR gave suggestions on the mathematical expressions of the manuscript and on the analysis and explanation of the experiments, and helped draft the manuscript. All authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors’ Affiliations

School of Electronic Information, Wuhan University, Luojia Hill
Signal and Image Processing Department, Telecom ParisTech
CEDRIC Laboratory, CNAM


  1. K Hermus, P Wambacq, HV Hamme, A review of signal subspace speech enhancement and its application to noise robust speech recognition. EURASIP J. Adv. Signal Process. 888–896 (2007). doi:10.1155/2007/45821.
  2. DL Donoho, IM Johnstone, G Kerkyacharian, D Picard, Wavelet shrinkage: asymptopia? J. R. Stat. Soc. Ser. B. 57, 301–369 (1995).
  3. DW Tufts, R Kumaresan, I Kirsteins, Data adaptive signal estimation by singular value decomposition of a data matrix. IEEE Proc. 70(6), 684–685 (1982). doi:10.1109/PROC.1982.12367.
  4. M Elad, M Aharon, Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. Image Process. 15(12), 3736–3745 (2006). doi:10.1109/TIP.2006.881969.
  5. G Tartavel, Y Gousseau, G Peyré, Variational texture synthesis with sparsity and spectrum constraints. J. Math. Imaging Vis. 52(1), 124–144 (2015). doi:10.1007/s10851-014-0547-7.
  6. M Aharon, M Elad, A Bruckstein, K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process. 54(11), 4311–4322 (2006). doi:10.1109/TSP.2006.881199.
  7. CA Deledalle, L Denis, F Tupin, Iterative weighted maximum likelihood denoising with probabilistic patch-based weights. IEEE Trans. Image Process. 18(12), 2661–2672 (2009). doi:10.1109/TIP.2009.2029593.
  8. K Dabov, A Foi, V Katkovnik, K Egiazarian, Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans. Image Process. 16, 2080–2095 (2007). doi:10.1109/TIP.2007.901238.
  9. S Parrilli, M Poderico, CV Angelino, L Verdoliva, A nonlocal SAR image denoising algorithm based on LLMMSE wavelet shrinkage. IEEE Trans. Geosci. Remote Sens. 50, 606–616 (2012). doi:10.1109/TGRS.2011.2161586.
  10. F Porikli, R Sundaresan, K Suwa, SAR depeckling by sparse reconstruction on affinity nets. EUSAR2012. 18, 796–799 (2012).
  11. A Hyvarinen, P Hoyer, E Oja, Sparse code shrinkage for image denoising. IEEE World Congr. Comput. Intell. 2, 59–864 (1998). doi:10.1109/IJCNN.1998.685880.
  12. R Malutan, R Terebes, C Germain, Speckle noise removal in ultrasound images using sparse code shrinkage. 5th IEEE Conf. E-Health Bioeng. Conf. 2, 1–4 (2015). doi:10.1109/EHB.2015.7391394.


© The Author(s) 2017