# Sparse signal subspace decomposition based on adaptive over-complete dictionary

Hong Sun^{1, 2}, Cheng-wei Sang^{1}, and Didier Le Ruyet^{3}

**2017**:50

https://doi.org/10.1186/s13640-017-0200-7

© The Author(s) 2017

**Received:** 13 September 2016

**Accepted:** 17 July 2017

**Published:** 28 July 2017

## Abstract

This paper proposes a subspace decomposition method based on an over-complete dictionary in sparse representation, called the “sparse signal subspace decomposition” (3SD) method. The method uses a novel criterion based on the occurrence frequency of the dictionary atoms over the data set. This criterion, well adapted to subspace decomposition over a dependent basis set, adequately reflects the intrinsic regularity of the signal. The 3SD method combines variance, sparsity, and component frequency criteria into a unified framework. It benefits both from an over-complete dictionary, which preserves details, and from subspace decomposition, which rejects strong noise. The 3SD method is very simple, with a linear retrieval operation, and requires no prior knowledge of distributions or parameters. When applied to image denoising, it demonstrates high performance both in preserving fine details and in suppressing strong noise.

## 1 Introduction

Signal subspace methods (SSMs) are efficient techniques to reduce the dimensionality of data and to filter out noise [1]. The fundamental idea underlying SSMs is to project the data onto a basis made of two subspaces, one containing mostly the signal and the other mostly the noise. The two subspaces are separated by a thresholding criterion associated with some measure of information.

The two most popular methods of signal subspace decomposition are wavelet shrinkage [2] and principal component analysis (PCA) [3]. Both techniques have proved to be quite efficient. However, wavelet decomposition depends on signal statistics, is not equally well adapted to different data, and requires some prior knowledge of the distributions or parameters of the signals to choose the shrinkage thresholds efficiently. A significant advantage of PCA is its adaptability to the data. Its separation criterion is based on energy, however, which may be a limitation in some cases, as illustrated in the next section.

In recent years, sparse coding has attracted significant interest in the field of signal denoising [4]. A sparse representation is a decomposition of a signal on a very small set of components of an over-complete basis (called a dictionary) which is adapted to the processed data. A difficult aspect of signal subspace decomposition based on such a sparse representation is to define the most appropriate criterion for identifying the principal components (called atoms) of the learned dictionary that build the principal subspace. Because the dictionary is not orthogonal, the energy criterion cannot be used for this purpose as it is with PCA.

To solve this problem, we introduce a new criterion to measure the importance of atoms and propose an SSM based on the occurrence frequency of atoms. We thus benefit both from the richness of over-complete dictionaries, which preserves details of information, and from signal subspace decomposition, which rejects strong noise.

The remainder of this paper is organized as follows. Section 2 reviews two related approaches to signal decomposition. Section 3 introduces the proposed sparse signal subspace decomposition based on the adaptive over-complete dictionary. Experimental results and analysis are presented in Section 4. Finally, we draw conclusions in Section 5.

## 2 Review of PCA and sparse coding methods

We start with a brief description of two well-established approaches to signal decomposition that are relevant and related to the approach proposed in the next section.

### 2.1 PCA-based subspace decomposition

The basic tool of SSM is principal component analysis (PCA). PCA makes use of an orthonormal basis to capture on a small set of vectors (the signal subspace) as much energy as possible from the observed data. The other basis vectors are expected to contain noise only, and the signal projection on these vectors is rejected.

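Consider a data set of *M* observations arranged as the columns of a matrix **X** of size *N*×*M*: \(\mathbf {X} = \{\mathbf {x}_{m}\}_{m=1}^{M}\). PCA is based on the singular value decomposition (SVD), with singular values \(\sigma_{i}\) in descending order, obtained from:

\[ \mathbf{X} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^{T} \tag{1} \]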

where **U** and **V** are unitary matrices of sizes *N*×*N* and *M*×*M*, respectively (\(\mathbf{U}^{T}\mathbf{U}=\mathbf{I}_{N}\), \(\mathbf{V}^{T}\mathbf{V}=\mathbf{I}_{M}\)), and \(\mathbf {\Sigma }= \left [\begin {array}{cc} \text {diag}\left [\sigma _{1},\cdots,\sigma _{r}\right ] & \mathbf {0}\\ \mathbf {0} & \mathbf {0} \end {array}\right ]\) is of size *N*×*M* with \(\sigma_{1} \geq \sigma_{2} \geq \cdots \geq \sigma_{r} > 0\). The positive real numbers \( \{\sigma _{i}\}_{i=1}^{r} \) are known as the singular values of **X**, which has rank *r* (*r*≤*N*).

With \(\mathbf{A} = \mathbf{\Sigma}\mathbf{V}^{T}\), Eq. (1) can be rewritten as:

\[ \mathbf{X} = \mathbf{U}\mathbf{A} \tag{2} \]

where \(\mathbf {U}=\left \{\mathbf {u}_{n} \in \mathbb {R}^{N \times 1}\right \}_{n=1}^{N}\) and \(\mathbf {A}=\left \{{\alpha }_{m} \in \mathbb {R}^{N \times 1}\right \}_{m=1}^{M}\). Equation (2) means that the data set \(\{\mathbf {x}_{m}\}_{m=1}^{M}\) is expressed on the orthonormal basis \(\{\mathbf {u}_{n}\}_{n=1}^{N}\) as \(\{\alpha _{m}\}_{m=1}^{M}\).

In PCA, \(\sigma_{i}\) is used as the measure for identifying the meaningful basis vectors \(\mathbf{u}_{i}\). PCA takes the first *P* (*P*<*r*) components \(\{\mathbf {u}_{n} \}_{n=1}^{P}\) to span the signal subspace, and the remaining components \(\{\mathbf {u}_{n}\}_{n=P+1}^{r}\) are assigned to a noise subspace orthogonal to the signal subspace. Projection onto the signal subspace will therefore hopefully filter out noise and reveal hidden structures. The reconstructed signal \(\hat {\mathbf {S}}_{\text {PCA}}\) of size *N*×*M* is obtained by projecting onto the signal subspace as:

\[ \hat{\mathbf{S}}_{\text{PCA}} = \mathbf{U}_{P}\mathbf{U}_{P}^{T}\mathbf{X} = \sum_{n=1}^{P} \mathbf{u}_{n}\mathbf{u}_{n}^{T}\mathbf{X} \tag{3} \]

where \(\mathbf{U}_{P} = \left[\mathbf{u}_{1} \cdots \mathbf{u}_{P}\right]\).

The underlying assumption is that the information in the data set is almost completely contained in a small linear subspace of the overall space of possible data vectors, whereas additive noise is typically distributed isotropically through the larger space. By using the standard deviation as its criterion, PCA implies that the components of the signal of interest have maximum variance and that the other components are mainly due to noise. However, in many practical cases, some components with low variance may actually be important because they carry information about the signal details. Conversely, when the noise has non-Gaussian statistics, some noise components may have higher variances. Finally, note that it is often difficult to give a physical meaning to the orthonormal basis \(\{\mathbf {u}_{i} \}_{i=1}^{r}\) of the SVD (Eq. (2)), although its vectors have a very clear mathematical definition as orthogonal, independent, and normalized. It is therefore difficult to impose known constraints on the signal features, when such constraints exist, after the principal component decomposition.
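As a rough illustration of the PCA-based decomposition described in this subsection, the following NumPy sketch projects a data matrix onto its first *P* left singular vectors. The function name `pca_subspace_projection` and the toy data are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def pca_subspace_projection(X, P):
    """Project the columns of X (N x M) onto the span of its first P left singular vectors."""
    # Thin SVD; NumPy returns the singular values in descending order.
    U, sigma, _ = np.linalg.svd(X, full_matrices=False)
    U_P = U[:, :P]                      # orthonormal basis of the signal subspace
    S_hat = U_P @ (U_P.T @ X)           # orthogonal projection of every observation
    return S_hat, sigma

# Toy data (hypothetical): a rank-2 signal buried in weak white noise.
rng = np.random.default_rng(0)
N, M = 8, 200
signal = rng.standard_normal((N, 2)) @ rng.standard_normal((2, M))
X = signal + 0.05 * rng.standard_normal((N, M))

S_hat, sigma = pca_subspace_projection(X, P=2)
err_noisy = np.linalg.norm(X - signal)      # error before projection
err_pca = np.linalg.norm(S_hat - signal)    # error after projection
```

In this toy setting the signal has high variance and the noise is weak and isotropic, which is exactly the situation in which the variance criterion works well; the limitations discussed above concern cases where that assumption fails.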

### 2.2 Sparse decomposition

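Sparse decomposition represents *M* observations \(\left \{\mathbf {x}_{m} \in \mathbb {R}^{N}\right \}_{m=1}^{M}\) based on a dictionary \(\mathbf {D}=\{\mathbf {d}_{k}\}_{k=1}^{K} \in \mathbb {R}^{N \times K}\). When *K*>*N*, the dictionary is said to be over-complete. \(\mathbf {d}_{k} \in \mathbb {R}^{N}\) is a basis vector, also called an atom since the basis vectors are not necessarily independent. With the dictionary learned from the data set \(\{\mathbf {x}_{m} \}_{m=1}^{M}\), the sparse decomposition is the solution of Eq. (4) [4]:

\[ \{\mathbf{D}, \alpha_{m}\} = \underset{\mathbf{D},\, \alpha_{m}}{\arg\min} \; \|\alpha_{m}\|_{0} + \|\mathbf{D}\alpha_{m} - \mathbf{x}_{m}\|_{2}^{2} \leq \varepsilon, \quad 1 \leq m \leq M \tag{4} \]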

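where \(\alpha_{m} \in \mathbb{R}^{K}\) is the sparse code of the observation \(\mathbf{x}_{m}\). The allowed error tolerance *ε* can be chosen according to the standard deviation of the noise. An estimate of the underlying signal \(\{\mathbf {s}_{m}\}_{m=1}^{M}\) embedded in the observed data set \(\{\mathbf {x}_{m} \}_{m=1}^{M}\) would be:

\[ \hat{\mathbf{S}} = \mathbf{D}\mathbf{A} \tag{5} \]

where \(\hat{\mathbf{S}} = \{\hat{\mathbf{s}}_{m}\}_{m=1}^{M}\) and the matrix **A** of size *K*×*M* is composed of the *M* sparse column vectors \(\alpha_{m}\).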

The first term on the right side of Eq. (4) is a sparsity-inducing regularization that constrains the solution to have the fewest nonzero coefficients in each of the sparse code vectors \(\alpha_{m}\) (1≤*m*≤*M*). The underlying assumption is that a meaningful signal can be represented by combining a few atoms. Such a learned dictionary, adapted to sparse signal descriptions, has proved more effective than the PCA method in signal reconstruction and classification tasks, as demonstrated in Section 4. The second term in Eq. (4) is the residual of the reconstruction, based on the mean-square reconstruction error, in the same way as in the PCA method.

On the other hand, the dictionary **D**, which serves as the basis in sparse decomposition, is learned from the noisy data set \(\{\mathbf {x}_{m}\}_{m=1}^{M}\), so the basis vectors \(\{\mathbf {d}_{k}\}_{k=1}^{K}\) should themselves be decomposed into a principal subspace and a residual subspace. However, an energy-constrained subspace cannot be exploited, since \(\{\mathbf {d}_{k}\}_{k=1}^{K}\) are not necessarily orthogonal or independent.
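To make the sparse coding step concrete, here is a minimal sketch of how a sparse code could be computed for one observation when the dictionary is held fixed. It uses greedy orthogonal matching pursuit with an error tolerance, which is a common pursuit choice but is an assumption here: the text above does not specify the pursuit algorithm, and the dictionary update (for example by K-SVD) is not shown. The function name `omp`, the parameter `max_atoms`, and the toy dictionary are hypothetical.

```python
import numpy as np

def omp(D, x, eps, max_atoms=None):
    """Greedy sparse coding of x over D: add atoms until ||D @ alpha - x||_2^2 <= eps."""
    N, K = D.shape
    max_atoms = N if max_atoms is None else max_atoms
    alpha = np.zeros(K)
    support = []                         # indices of the selected atoms
    residual = x.astype(float).copy()
    while residual @ residual > eps and len(support) < max_atoms:
        k = int(np.argmax(np.abs(D.T @ residual)))   # atom most correlated with the residual
        if k in support:                 # no new atom helps; stop
            break
        support.append(k)
        # Least-squares refit of x on all selected atoms.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        alpha[:] = 0.0
        alpha[support] = coef
        residual = x - D @ alpha
    return alpha

# Toy example (hypothetical): random unit-norm dictionary with K > N.
rng = np.random.default_rng(0)
N, K = 16, 32
D = rng.standard_normal((N, K))
D /= np.linalg.norm(D, axis=0)
x = 2.0 * D[:, 3] - 1.5 * D[:, 17]       # observation built from two atoms
alpha = omp(D, x, eps=1e-8)
```

Stacking the codes of all observations as columns gives the coefficient matrix **A** of size *K*×*M*. In a noisy setting, `eps` would be tied to the noise standard deviation, as stated above for *ε*.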

## 3 The proposed sparse subspace decomposition

In this section, we introduce a novel criterion for subspace decomposition over a learned dictionary, together with a corresponding index of significance of the atoms. We then propose a sparse signal subspace decomposition (3SD) method under this new criterion.

### 3.1 Weight vectors of learned atoms

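The coefficient matrix **A** is composed of *M* sparse column vectors \(\alpha_{m}\), each \(\alpha_{m}\) representing the weights of the observation \(\mathbf{x}_{m}\), a local parameter for the *m*-th observation. Let us consider the row vectors \(\{\boldsymbol {\upbeta }_{k}\}_{k=1}^{K}\) of the coefficient matrix **A**:

\[ \mathbf{A} = \left[\alpha_{1} \cdots \alpha_{m} \cdots \alpha_{M}\right] = \left[\boldsymbol{\upbeta}_{1}^{T} \cdots \boldsymbol{\upbeta}_{k}^{T} \cdots \boldsymbol{\upbeta}_{K}^{T}\right]^{T} \tag{6} \]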

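Note that the row vector \(\boldsymbol{\upbeta}_{k}\) is not necessarily sparse. Then Eq. (5) can be rewritten as:

\[ \hat{\mathbf{S}} = \mathbf{D}\mathbf{A} = \sum_{k=1}^{K} \mathbf{d}_{k} \boldsymbol{\upbeta}_{k} \tag{7} \]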

The row vector \(\boldsymbol{\upbeta}_{k}\) is the weight of the atom \(\mathbf{d}_{k}\), which is a global parameter over the data set **X**. Let \(\|\boldsymbol{\upbeta}_{k}\|_{0}\) denote the \(\ell^{0}\) pseudo-norm of \(\boldsymbol{\upbeta}_{k}\), that is, the number of occurrences of the atom \(\mathbf{d}_{k}\) over the data set \(\{\mathbf {x}_{m} \}_{m=1}^{M}\). We call it the frequency of the atom \(\mathbf{d}_{k}\), denoted by \(f_{k}\):

\[ f_{k} = \|\boldsymbol{\upbeta}_{k}\|_{0}, \quad 1 \leq k \leq K \tag{8} \]

In the sparse decomposition, the basis vectors \(\{\mathbf {d}_{k} \}_{k=1}^{K}\) are prototypes of signal segments, which allows us to take them as signal patterns. Some important features of these signal patterns can therefore serve as a criterion to identify significant atoms. It has been shown [5] that \(f_{k}\) is a good descriptor of the signal texture. Intuitively, a signal pattern must occur in meaningful signals with high frequency, even if its energy is low. Conversely, a noise pattern is hardly ever reproduced in the observed data, even if its energy is high.
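The contrast between atom frequency and atom energy can be seen on a small example. The sketch below counts the nonzero entries in each row of a coefficient matrix and compares them with the row energies; the helper name `atom_frequencies` and the toy matrix are hypothetical and serve only as an illustration.

```python
import numpy as np

def atom_frequencies(A):
    """Return f_k = ||beta_k||_0, the number of nonzero entries in row k of A (K x M)."""
    return np.count_nonzero(A, axis=1)

# Toy coefficient matrix (hypothetical): K = 3 atoms, M = 4 observations.
A = np.array([[0.0, 1.2, 0.0, -0.3],
              [5.0, 0.0, 0.0,  0.0],
              [0.1, 0.2, 0.1,  0.3]])
f = atom_frequencies(A)               # occurrence frequency of each atom
energy = np.linalg.norm(A, axis=1)    # ||beta_k||_2, for comparison
```

Here the second atom has the largest energy but is used only once, whereas the third atom has the smallest energy but is used in every observation, so the two criteria rank the atoms in opposite orders.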

It is therefore reasonable to take this frequency \(f_{k}\) as a relevance criterion to decompose the over-complete dictionary into a principal signal subspace and a remaining noise subspace. We use the word “subspace” here, but these two subspaces are in fact not necessarily independent.

### 3.2 Subspace decomposition based on over-complete dictionary

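Taking the vectors \(\{\boldsymbol {\upbeta }_{k}\}_{k=1}^{K}\), we compute their \(\ell^{0}\)-norms \(\{\|\boldsymbol {\upbeta }_{k} \|_{0}\}_{k=1}^{K}\) and rank them in descending order as follows. The indices *k* of the vectors \(\{\boldsymbol {\upbeta }_{k}\}_{k=1}^{K}\) belong to the set **C**={1,2,⋯,*k*,⋯,*K*}. A one-to-one index mapping function *π* is defined as:

\[ \pi: \mathbf{C} \rightarrow \mathbf{C}, \quad \text{s.t.} \quad \|\boldsymbol{\upbeta}_{\pi(1)}\|_{0} \geq \|\boldsymbol{\upbeta}_{\pi(2)}\|_{0} \geq \cdots \geq \|\boldsymbol{\upbeta}_{\pi(K)}\|_{0} \tag{9} \]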

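With the permutation *π* of the row index *k* of matrix \(\mathbf {A} = \left [\boldsymbol {\upbeta }_{1}^{T} \cdots \boldsymbol {\upbeta }_{k}^{T} \cdots \boldsymbol {\upbeta }_{K}^{T} \right ]^{T}\), the reordered coefficient matrix \(\tilde {\mathbf {A}}\) becomes

\[ \tilde{\mathbf{A}} = \left[\boldsymbol{\upbeta}_{\pi(1)}^{T} \cdots \boldsymbol{\upbeta}_{\pi(k)}^{T} \cdots \boldsymbol{\upbeta}_{\pi(K)}^{T}\right]^{T} \tag{10} \]

With the correspondingly reordered dictionary \(\tilde{\mathbf{D}} = \left[\mathbf{d}_{\pi(1)} \cdots \mathbf{d}_{\pi(K)}\right]\), the estimate of Eq. (5) is unchanged:

\[ \hat{\mathbf{S}} = \tilde{\mathbf{D}}\tilde{\mathbf{A}} = \sum_{k=1}^{K} \mathbf{d}_{\pi(k)} \boldsymbol{\upbeta}_{\pi(k)} \tag{11} \]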

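The first *P* atoms can then be taken as a principal subspace \(\mathbf {D}_{P}^{(\mathbf {S})}\), and the remaining atoms span a noise subspace \(\mathbf {D}_{K-P}^{(N)}\) as:

\[ \mathbf{D}_{P}^{(\mathbf{S})} = \{\mathbf{d}_{\pi(k)}\}_{k=1}^{P}, \qquad \mathbf{D}_{K-P}^{(N)} = \{\mathbf{d}_{\pi(k)}\}_{k=P+1}^{K} \tag{12} \]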

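An estimate \(\hat{\mathbf{S}}_{P}\) of the underlying signal **S** embedded in the observed data set **X** can then be obtained on the principal subspace \(\mathbf {D}_{P}^{(\mathbf {S})}\) simply by linear combination:

\[ \hat{\mathbf{S}}_{P} = \mathbf{D}_{P}^{(\mathbf{S})} \mathbf{A}_{P}^{(\mathbf{S})} = \sum_{k=1}^{P} \mathbf{d}_{\pi(k)} \boldsymbol{\upbeta}_{\pi(k)} \tag{13} \]

where \(\mathbf{A}_{P}^{(\mathbf{S})} = \left[\boldsymbol{\upbeta}_{\pi(1)}^{T} \cdots \boldsymbol{\upbeta}_{\pi(P)}^{T}\right]^{T}\) collects the rows of **A** associated with the principal atoms.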
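The ranking and retrieval steps of this subsection can be sketched as follows, assuming the dictionary **D** and the coefficient matrix **A** are already available and that *P* counts the atoms to keep. The function name `principal_subspace_estimate` and the toy data are hypothetical; ties in frequency are broken by original atom order, which is a choice of this sketch rather than something the text specifies.

```python
import numpy as np

def principal_subspace_estimate(D, A, P):
    """Keep the P most frequently used atoms and return the linear retrieval D_P @ A_P."""
    f = np.count_nonzero(A, axis=1)          # atom frequencies f_k
    pi = np.argsort(-f, kind="stable")       # indices sorted by descending frequency
    keep = pi[:P]                            # principal atoms
    S_hat_P = D[:, keep] @ A[keep, :]        # linear combination on the principal subspace
    return S_hat_P, pi

# Toy example (hypothetical): N = 4, K = 6 atoms, M = 5 observations.
rng = np.random.default_rng(1)
N, K, M = 4, 6, 5
D = rng.standard_normal((N, K))
A = np.zeros((K, M))
A[0, :] = 1.0        # atom 0 is used in every observation
A[3, :3] = 0.5       # atom 3 is used in three observations
A[5, 4] = 9.0        # atom 5 is used once, with a large weight
S_hat_P, pi = principal_subspace_estimate(D, A, P=2)
```

With `P=2`, atoms 0 and 3 are kept, and atom 5 is rejected despite its large weight because it occurs only once.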

### 3.3 Threshold of atom’s frequency

Determining the number *P* of atoms spanning the signal subspace \(\mathbf {D}_{P}^{(\mathbf {S})}\) is always a difficult problem, especially for wide-band signals. Here, *P* is the threshold on the atom frequency \(f_{k}\) that separates the signal subspace from the noise subspace. One advantage of 3SD is that this threshold *P* can be chosen easily without any prior parameter.

A threshold *P* of \(f_{k}\) (dotted line in Fig. 1d) separates the signal atoms from the noise atoms. By contrast, with the atom energies \(\|\boldsymbol{\upbeta}_{k}\|_{2}\) for the two images, shown in Fig. 1c, it is rather puzzling to identify the principal bases.

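In practical implementation, the threshold *P* of the frequencies \(f_{k}\) can be decided simply from the histogram of \(f_{k}\). As shown in Fig. 2d, one can set *P* to the value of \(f_{k}\) associated with the maximum point of its histogram, as follows:

\[ P = \underset{f}{\arg\max} \; \text{Hist}(f) \tag{14} \]

where \(\text{Hist}(f)\) denotes the number of atoms whose frequency \(f_{k}\) equals *f*.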

The retrieved result is not sensitive to the threshold *P*, owing to the dependence of the atoms. To illustrate this point, we take three images, Barbara, Lena, and Boat. The histograms of their \(f_{k}\) are shown in Fig. 3a, with the maximum points in dotted lines at 121, 97, and 92, respectively. Figure 3b reports the peak signal-to-noise ratio (PSNR) of the retrieved images \(\hat {\mathbf {S}}_{P}\) on the signal principal subspace \(\mathbf {D}_{P}^{(\mathbf {S})}\) with respect to *P*. We can see that the PSNRs remain the same over a large range around the maximum points (in dotted lines). Consequently, taking the value of \(f_{k}\) associated with the maximum point of its histogram as the threshold *P* is a reasonable solution.
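The histogram-based choice of the threshold can be sketched in a few lines. The sketch assumes unit-width histogram bins over integer frequencies, which the text does not specify, and the helper name `histogram_peak` and the toy frequencies are hypothetical. If many atoms were never used, the bin at zero could dominate and might need to be excluded; that is a caveat of this sketch, not a statement from the paper.

```python
import numpy as np

def histogram_peak(f):
    """Return the frequency value shared by the largest number of atoms."""
    counts = np.bincount(f)          # counts[v] = number of atoms with f_k == v
    return int(np.argmax(counts))    # first maximum if several bins tie

# Toy atom frequencies (hypothetical).
f = np.array([3, 7, 7, 2, 7, 3, 9])
P = histogram_peak(f)
```

In this toy case three atoms share the frequency 7, so the peak of the histogram is at 7.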

## 4 Results and discussion

### 4.1 Signal decomposition methods

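The structural similarity (SSIM) index between two images *x* and *y* is computed as:

\[ \text{SSIM}(x, y) = \frac{\left(2 u_{x} u_{y} + c_{1}\right)\left(2 \sigma_{xy} + c_{2}\right)}{\left(u_{x}^{2} + u_{y}^{2} + c_{1}\right)\left(\sigma_{x}^{2} + \sigma_{y}^{2} + c_{2}\right)} \]

where \(u_{x}\) is the average of *x*, \(\sigma _{x}^{2}\) is the variance of *x*, \(\sigma_{xy}\) is the covariance of *x* and *y*, and \(c_{1}\) and \(c_{2}\) are small constants that stabilize the division when the denominator is weak.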
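For reference, the SSIM index can be computed from these statistics as in the sketch below. It evaluates the index once over the whole image, whereas common implementations average it over local windows; the constants `c1` and `c2` are set to the customary values for images scaled to [0, 1], which is an assumption since the values used in the experiments are not given. The function name `ssim_global` is hypothetical.

```python
import numpy as np

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """Single-window SSIM between two images of equal shape (assumed scaled to [0, 1])."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    u_x, u_y = x.mean(), y.mean()                 # averages
    var_x, var_y = x.var(), y.var()               # variances
    cov_xy = ((x - u_x) * (y - u_y)).mean()       # covariance
    num = (2.0 * u_x * u_y + c1) * (2.0 * cov_xy + c2)
    den = (u_x ** 2 + u_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

# Toy images (hypothetical): a random image and a noisy copy.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(32, 32))
y = np.clip(x + 0.2 * rng.standard_normal(x.shape), 0.0, 1.0)
same = ssim_global(x, x)         # identical images
degraded = ssim_global(x, y)     # noisy copy
```

The index equals 1 for identical images and decreases as the structure of the second image departs from the first.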

Let us look at the proposed sparse signal subspace decomposition at the top of Fig. 4b. The 128 atoms \(\mathbf{d}_{k}\) of the learned over-complete dictionary **D** are shown in descending order of their energies, measured by \(\|\boldsymbol{\upbeta}_{k}\|_{2}\). The 32 principal signal atoms are chosen from the dictionary **D** under the frequency criterion. They are shown in descending order of their frequencies, measured by \(\|\boldsymbol{\upbeta}_{k}\|_{0}\), and compose a signal subspace \(\mathbf {D}_{32}^{(\mathbf {S})}\). We can see that some of the principal atoms are not among the 32 atoms with the largest energy in the over-complete dictionary **D**. The retrieved images are shown at the bottom of Fig. 4b. The image **S** on **D** is apparently denoised. The image \(\hat {\mathbf {S}}\) on the signal subspace \(\mathbf {D}_{32}^{(\mathbf {S})}\) is clearly improved, both in preserving fine details, with a high SSIM=0.86, and in suppressing strong noise, with a high PSNR=36.41. On the other hand, the residual image on the noise subspace \(\mathbf {D}_{96}^{(N)}\) contains some very noisy information. This is because the atoms of the over-complete dictionary are not independent.

For the same example, the classical sparse decomposition is shown in Fig. 4c, using the K-SVD algorithm [6], in which the allowed error tolerance *ε* (in Eq. (4)) is set to a larger value to filter out noise. The retrieved image **S** has a high PSNR=29.62, but it has obviously lost weak information, with SSIM=0.82. This is because signal distortion and residual noise cannot be minimized simultaneously during dictionary learning by Eq. (4).

In another comparison, the PCA-based subspace decomposition is shown in Fig. 4d. The 64 components are orthonormal, and the 32 principal components are those with the largest variance. The image retrieved by projecting onto the signal subspace is rather noisy, with PSNR=29.62. This is because the variance criterion alone cannot both suppress strong noise and preserve weak details.

### 4.2 Application to image denoising

The application of 3SD to image denoising is presented here. A major difficulty of denoising is to separate the underlying signal from the noise. The proposed 3SD method addresses this difficulty: the important components are selected from the over-complete dictionary according to their number of occurrences over the noisy image set. The occurrence numbers are expected to be large for the signal, even for weak details such as edges or textures. On the other hand, they are expected to be low for various kinds of white Gaussian or non-Gaussian noise, even when the noise is strong in intensity.

**Input:** noisy image **X**

**Output:** denoised image \(\hat {\mathbf {S}}\)

- Sparse representation {**D**, **A**} by (4)
- Identify principal atoms from **D**:
  - Compute the frequencies of atoms \(\{\|\boldsymbol {\upbeta }_{k} \|_{0}\}_{k=1}^{K}\) according to (6) and (8)
  - Get the permutation *π* by (9)
  - Compute the threshold *P* by (14)
- Obtain the signal principal atoms \(\{\mathbf {d}_{\pi (k)}\}_{k=1}^{P}\) by (12)
- Reconstruct image \(\hat {\mathbf {S}}_{P}\) by (13)

In this application, we intend to preserve faint signal details under a situation of strong noise.

In the experiments, the dictionaries **D** used are of size 64×256 (*K*=256 atoms), designed to handle image patches \(\mathbf{x}_{m}\) of size *N*=64=8×8 pixels.
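As an illustration of how an image maps to the data matrix handled by such a dictionary, the sketch below stacks 8×8 patches as columns of length 64. Extracting fully overlapping patches with a step of one pixel is an assumption of this sketch; the text does not state the overlap. The function name `image_to_patch_matrix` and the toy image are hypothetical.

```python
import numpy as np

def image_to_patch_matrix(img, patch=8, step=1):
    """Stack all patch x patch blocks of img as columns of a (patch*patch) x M matrix."""
    H, W = img.shape
    cols = [img[i:i + patch, j:j + patch].reshape(-1)
            for i in range(0, H - patch + 1, step)
            for j in range(0, W - patch + 1, step)]
    return np.stack(cols, axis=1)

# Toy image (hypothetical): 16 x 16 ramp.
img = np.arange(16 * 16, dtype=float).reshape(16, 16)
X = image_to_patch_matrix(img)       # N = 64 rows, one column per patch
```

For a 16×16 image this gives (16−8+1)² = 81 observations of dimension 64.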

### 4.3 Image denoising

A noisy Lena image **X**=**S**+**V**, with additive zero-mean white Gaussian noise **V**, is used. The standard deviation of the noise is *σ*=35. A comparison is made between the 3SD method and the K-SVD method [6], which is one of the best denoising methods reported in the recent literature.
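The noise model and the PSNR measure used in this experiment can be sketched as follows. The peak value of 255 assumes 8-bit images, and a random array stands in for the clean image; both are assumptions of this sketch, and the function name `psnr` is hypothetical.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference image and an estimate."""
    diff = np.asarray(reference, dtype=float) - np.asarray(estimate, dtype=float)
    mse = np.mean(diff ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
S = rng.uniform(0.0, 255.0, size=(64, 64))     # stand-in for a clean image
V = rng.normal(0.0, 35.0, size=S.shape)        # zero-mean white Gaussian noise, sigma = 35
X = S + V                                      # noisy observation
noisy_psnr = psnr(S, X)
```

Before any denoising, the PSNR of the noisy observation is close to 20·log10(255/35) dB, which gives a baseline against which denoised results can be read.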

### 4.4 SAR image despeckling

In the second experiment, a simulated SAR image with speckle noise is used. Speckle is often modeled as multiplicative noise as *x*(*i*,*j*)=*s*(*i*,*j*)*v*(*i*,*j*) where *x*, *s*, and *v* correspond to the contaminated intensity, the original intensity, and the noise level, respectively.
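A simulation of the multiplicative model can be sketched as below. Drawing the speckle from a unit-mean gamma distribution with a given number of looks is a common choice for intensity images but is an assumption here, since the text does not state the distribution used. The function name `add_speckle`, the parameter `looks`, and the constant test image are hypothetical.

```python
import numpy as np

def add_speckle(s, looks=1, rng=None):
    """Multiply an intensity image s by unit-mean gamma-distributed speckle."""
    rng = np.random.default_rng() if rng is None else rng
    v = rng.gamma(shape=looks, scale=1.0 / looks, size=s.shape)   # mean 1, variance 1/looks
    return s * v, v

rng = np.random.default_rng(0)
s = np.full((128, 128), 100.0)            # stand-in for the original intensity
x, v = add_speckle(s, looks=4, rng=rng)   # contaminated intensity and speckle field
```

Because the noise is multiplicative, its strength scales with the local intensity, unlike the additive Gaussian case of the previous experiment.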

### 4.5 Comparison with BM3D method

With a spatially complex image scene, we compare the 3SD-based denoising method with the BM3D algorithm [8], one of the best image denoising methods reported in the recent literature.

The effectiveness of any signal analysis method depends on the different conditions in different applications. For the image denoising application, the signals involved should be homogeneous. Therefore, a procedure of grouping is generally adopted to select homogeneous pixels. In the BM3D method, a block-matching grouping is taken before filtering. We adopt the same grouping technique and then filter each homogeneous group by the proposed 3SD method.
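The grouping idea can be illustrated with a deliberately simplified sketch that selects, for a reference patch, the patches closest to it in squared Euclidean distance. This is only a stand-in for the block matching of BM3D, which also restricts the search to a window and applies thresholds; the function name `group_similar_patches` and the toy patch matrix are hypothetical.

```python
import numpy as np

def group_similar_patches(X, ref_index, group_size):
    """Return indices of the group_size columns of X closest to column ref_index."""
    d2 = np.sum((X - X[:, [ref_index]]) ** 2, axis=0)    # squared distance to the reference
    return np.argsort(d2, kind="stable")[:group_size]    # the reference itself comes first

# Toy patch matrix (hypothetical): five 2-pixel "patches" as columns.
X = np.array([[0.0, 0.1, 5.0, 0.05, 9.0],
              [0.0, 0.0, 5.0, 0.10, 9.0]])
group = group_similar_patches(X, ref_index=0, group_size=3)
```

Each such group of similar patches would then form the data set on which the dictionary is learned and the principal atoms are selected.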

The noisy image, with noise standard deviation *σ*=70, is shown in Fig. 7b. The denoising result of the BM3D algorithm is shown in Fig. 7c; it displays quite a high performance. The denoising result of the 3SD-based method is shown in Fig. 7d. It demonstrates a higher PSNR, a higher SSIM, and a better subjective visual quality than the BM3D algorithm.

## 5 Conclusions

We proposed a method of sparse signal subspace decomposition (3SD). The central idea of the proposed 3SD is to identify principal atoms from an adaptive over-complete dictionary relying on the occurrence frequency of atoms over the data set (Eq. (8)). The atom frequency is measured by zero pseudo-norms of weight vectors of atoms (Eqs. (6) and (8)). The principal subspace is spanned by the maximum frequency atoms (Eq. (12)).

The 3SD method combines the variance criterion, the sparsity criterion, and the component frequency criterion into a unified framework. As a result, it can identify the principal atoms more effectively, using these three important signal features. By contrast, PCA uses only the variance criterion, and the sparse coding method uses only the variance and sparsity criteria, which makes it more difficult to distinguish weak information from strong noise.

Another interesting asset of the 3SD method is that it benefits both from an over-complete dictionary, which preserves details of information, and from subspace decomposition, which rejects strong noise. By contrast, some undercomplete dictionary methods [10] and some sparse shrinkage methods [11, 12] might lose weak information when suppressing noise.

Moreover, the 3SD method is very simple, with a linear retrieval operation (Eq. (13)). It does not require any prior knowledge of distributions or parameters to determine the threshold (Eq. (14)). By contrast, some sparse shrinkage methods, such as [11], require non-linear processing with some prior distributions of signals.

The proposed 3SD could be interpreted as a PCA in sparse decomposition, so it admits straightforward extension to applications of feature extraction, inverse problems, or machine learning.

## Declarations

### Acknowledgements

The idea of sparse signal subspace decomposition arose from many in-depth discussions with Professor Henri Maitre at Telecom ParisTech in France, who also gave suggestions on the structure of the manuscript.

### Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 60872131).

### Authors’ contributions

HS proposed the idea, designed the algorithms and experiments, and drafted the manuscript. CWS gave suggestions on the design of the algorithm, implemented the algorithms, and carried out the experiments. DLR gave suggestions on the mathematical expressions of the manuscript and on the analysis and explanation of the experiments, and helped draft the manuscript. All authors read and approved the final manuscript.

### Competing interests

The authors declare that they have no competing interests.

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

## References

1. K Hermus, P Wambacq, HV Hamme, A review of signal subspace speech enhancement and its application to noise robust speech recognition. EURASIP J. Adv. Signal Process. 888–896 (2007). doi:10.1155/2007/45821.
2. DL Donoho, IM Johnstone, G Kerkyacharian, D Picard, Wavelet shrinkage: asymptopia? J. R. Stat. Soc. Ser. B. **57**, 301–369 (1995). http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.162.1643.
3. DW Tufts, R Kumaresan, I Kirsteins, Data adaptive signal estimation by singular value decomposition of a data matrix. IEEE Proc. **70**(6), 684–685 (1982). doi:10.1109/PROC.1982.12367.
4. M Elad, M Aharon, Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. Image Process. **15**(12), 3736–3745 (2006). doi:10.1109/TIP.2006.881969.
5. G Tartavel, Y Gousseau, G Peyré, Variational texture synthesis with sparsity and spectrum constraints. J. Math. Imaging Vis. **52**(1), 124–144 (2015). doi:10.1007/s10851-014-0547-7.
6. M Aharon, M Elad, A Bruckstein, K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process. **54**(11), 4311–4322 (2006). doi:10.1109/TSP.2006.881199.
7. CA Deledalle, L Denis, F Tupin, Iterative weighted maximum likelihood denoising with probabilistic patch-based weights. IEEE Trans. Image Process. **18**(12), 2661–2672 (2009). doi:10.1109/TIP.2009.2029593.
8. K Dabov, A Foi, V Katkovnik, K Egiazarian, Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans. Image Process. **16**, 2080–2095 (2007). doi:10.1109/TIP.2007.901238.
9. S Parrilli, M Poderico, CV Angelino, L Verdoliva, A nonlocal SAR image denoising algorithm based on LLMMSE wavelet shrinkage. IEEE Trans. Geosci. Remote Sens. **50**, 606–616 (2012). doi:10.1109/TGRS.2011.2161586.
10. F Porikli, R Sundaresan, K Suwa, SAR despeckling by sparse reconstruction on affinity nets. EUSAR2012. **18**, 796–799 (2012). https://www.researchgate.net/publication/232905473_SAR_Despeckling_by_Sparse_Reconstruction_on_Affinity_Nets_SRAN.
11. A Hyvarinen, P Hoyer, E Oja, Sparse code shrinkage for image denoising. IEEE World Congr. Comput. Intell. **2**, 59–864 (1998). doi:10.1109/IJCNN.1998.685880.
12. R Malutan, R Terebes, C Germain, Speckle noise removal in ultrasound images using sparse code shrinkage. 5th IEEE Conf. E-Health Bioeng. Conf. **2**, 1–4 (2015). doi:10.1109/EHB.2015.7391394.