Open Access

New inexact explicit thresholding/shrinkage formulas for inverse problems with overlapping group sparsity

EURASIP Journal on Image and Video Processing 2016, 2016:18

https://doi.org/10.1186/s13640-016-0118-5

Received: 13 July 2015

Accepted: 12 April 2016

Published: 29 April 2016

Abstract

The least-squares regression problems or inverse problems have been widely studied in many fields such as compressive sensing, signal processing, and image processing. To solve these ill-posed problems, a regularization term (i.e., a regularizer) must be introduced, under the assumption that the solutions have some specific properties, such as sparsity or group sparsity. Widely used regularizers include the ℓ1 norm, the total variation (TV) semi-norm, and so on. Recently, a new regularization term with overlapping group sparsity has been considered. Majorization minimization (MM) iteration methods or variable duplication methods are often applied to solve such problems. However, because of the overlapping, no direct methods have been available for solving them. In this paper, we propose new inexact explicit shrinkage formulas for one class of these problems, whose regularization terms have translation invariant overlapping groups. Moreover, we apply our results to TV deblurring and denoising problems with overlapping group sparsity, which we solve iteratively with the alternating direction method of multipliers (ADMM). Numerical results verify the validity and effectiveness of the new inexact explicit shrinkage formulas.

Keywords

Overlapping group sparsity; Total variation; ADMM; Image deblurring; Regularization; Explicit shrinkage formula

1 Introduction

The least-squares regression problems or inverse problems have been widely studied in many fields such as compressive sensing, signal processing, image processing, statistics, and machine learning. Regularization terms with sparse representations (for instance, the ℓ1 norm regularizer) have become an important tool in these applications, and many methods have been proposed [1–4]. These methods are based on the assumption that signals or images have a sparse representation, that is, they contain only a few nonzero entries. The corresponding task is to solve the following problem
$$ \min_{\mathbf{z}} \quad\|\mathbf{z}\|_{1} + \frac{\beta}{2}\|\mathbf{z} -\mathbf{x}\|_{2}^{2}, $$
(1)

where \(\mathbf{x}\in\mathbb{R}^{n}\) is a given vector, \(\mathbf{z}\in\mathbb{R}^{n}\), and \(\|\mathbf{z}\|_{p}=\left(\sum_{i=1}^{n}|z_{i}|^{p}\right)^{\frac{1}{p}}\) denotes the ℓp norm of z for p = 1, 2. The first term of (1) is called the regularization term, the second term is called the fidelity term, and β > 0 is the regularization parameter.

To further improve the solutions, recent studies have suggested going beyond sparsity and taking into account additional information about the underlying structure of the solutions [1, 2, 5]. In particular, a wide class of solutions with a specific “group sparsity” structure has been considered. In this case, a group sparse vector can be divided into groups of components such that (a) only a few groups contain nonzero values and (b) these groups need not be sparse themselves. If we stack such groups into a matrix as its rows, the matrix has only a few nonzero rows, and these rows may not be sparse. This property is called “group sparsity” or “joint sparsity,” and many works have considered these new sparse problems [1–6]. These problems can be formulated as follows.
$$ \min_{\mathbf{z}} \quad \sum_{i=1}^{r} \|\mathbf{z}[i]\|_{2}+ \frac{\beta}{2}\|\mathbf{z} -\mathbf{x}\|_{2}^{2}, $$
(2)

where \(|z_{i}|\) denotes the absolute value of \(z_{i}\), and z[i] is the ith group of z, with \(\mathbf{z}[i]\cap\mathbf{z}[j]=\emptyset\) for i ≠ j and \(\bigcup_{i=1}^{r}\mathbf{z}[i]=\mathbf{z}\). “Group sparsity” solutions from (2) have better representations and have been widely studied for both convex and nonconvex cases [1, 3, 6–9].

More recently, overlapping group sparsity (OGS) has been considered [1, 3, 10–19]. These methods assume that signals or images have a special sparse representation with OGS. The task is to solve the following problem
$$ \min_{\mathbf{z}} \quad \|\mathbf{z}\|_{2,1}+ \frac{\beta}{2}\|\mathbf{z} -\mathbf{x}\|_{2}^{2}, $$
(3)

where \(\|\mathbf{z}\|_{2,1}=\sum_{i=1}^{n}\|(z_{i})_{g}\|_{2}\) is the generalized ℓ2,1 norm. Here, each \((z_{i})_{g}\) is a group vector containing s elements (s is called the group size) that surround the ith entry of z. For example, \((z_{i})_{g}=(z_{i-1},z_{i},z_{i+1})\) with s = 3. In this case, \((z_{i})_{g}\), \((z_{i+1})_{g}\), and \((z_{i+2})_{g}\) all contain the (i+1)th entry \(z_{i+1}\) of z; this is what “overlapping” means, in contrast to the disjoint groups in (2). In particular, if s = 1, the generalized ℓ2,1 norm degenerates into the ℓ1 norm, and problem (3) degenerates into (1).

More generally, researchers have considered the weighted generalized ℓ2,1 norm \(\|\mathbf{z}\|_{w,2,1}=\sum_{i=1}^{n}\|w_{g}\circ(z_{i})_{g}\|_{2}\) instead of the generalized ℓ2,1 norm (we only consider the case in which every group uses the same weight vector, i.e., the groups are translation invariant). The task can then be extended to
$$ \min_{\mathbf{z}} \quad \|\mathbf{z}\|_{w,2,1}+ \frac{\beta}{2}\|\mathbf{z} -\mathbf{x}\|_{2}^{2}, $$
(4)

where \(w_{g}\) is a nonnegative real vector of the same size as \((z_{i})_{g}\), and “∘” denotes the point-wise (Hadamard) product. For instance, with s = 3 and \((z_{i})_{g}=(z_{i},z_{i+1},z_{i+2})\), we have \(w_{g}\circ(z_{i})_{g}=((w_{g})_{1}z_{i},(w_{g})_{2}z_{i+1},(w_{g})_{3}z_{i+2})\). In particular, the weighted generalized ℓ2,1 norm degenerates into the generalized ℓ2,1 norm if every entry of \(w_{g}\) equals 1.
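To make the definitions of \(\|\mathbf{z}\|_{2,1}\) and \(\|\mathbf{z}\|_{w,2,1}\) concrete, the following is a minimal pure-Python sketch under a periodic boundary condition. The function name `weighted_l21` is hypothetical, and the group alignment \((z_{i})_{g}=(z_{i-l},\ldots,z_{i+r})\) with \(l=\lfloor (s-1)/2\rfloor\) and \(r=\lfloor s/2\rfloor\) is an assumption (it mirrors the offsets used for the matrix case); with periodic wrap-around, shifting the alignment does not change the value of the sum.

```python
import math


def weighted_l21(z, w):
    """Weighted generalized l_{2,1} norm sum_i ||w o (z_i)_g||_2 (sketch, PBC).

    Assumed alignment: (z_i)_g = (z_{i-l}, ..., z_{i+r}) with l = (s-1)//2,
    r = s//2, indices taken modulo n (periodic boundary condition).
    With w = [1.0] * s this is the unweighted norm ||z||_{2,1} of (3).
    """
    n, s = len(z), len(w)
    l = (s - 1) // 2
    total = 0.0
    for i in range(n):
        # squared l2 norm of the weighted group attached to entry i
        sq = sum((w[k] * z[(i - l + k) % n]) ** 2 for k in range(s))
        total += math.sqrt(sq)
    return total


z = [1.0, -2.0, 0.0, 3.0]
print(weighted_l21(z, [1.0]))            # s = 1: the l1 norm, 6.0
print(weighted_l21(z, [0.0, 1.0, 0.0]))  # a single nonzero weight: also 6.0
print(weighted_l21(z, [1.0, 1.0, 1.0]))  # generalized l_{2,1} norm with s = 3
```

Setting one weight to 1 and the others to 0 recovers the ℓ1 norm, which matches the remark on zero weights in Section 3.2.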

Problems (3) and (4) have been considered in [1, 14–19], where they are solved by variable duplication methods (variable splitting, latent/auxiliary variables, etc.). For example, Deng et al. [1] introduced a diagonal matrix G in their method. However, the matrix G is not easy to find and breaks the structure of the coefficient matrix, which makes the problem difficult to solve for high-dimensional vectors. Moreover, it is difficult to extend this method to the matrix case.

The matrix version of problem (4) is
$$ \min_{A} \ \ \ \|A\|_{W,2,1}+ \frac{\beta}{2}\|A -X\|_{F}^{2}, $$
(5)
where \(X,A\in\mathbb{R}^{m\times n}\) and \(\|A\|_{W,2,1}=\sum_{i=1}^{m}\sum_{j=1}^{n}\|W_{g}\circ(A_{i,j})_{g}\|_{F}\). Here, each \((A_{i,j})_{g}\) is a group matrix containing \(K_{1}\times K_{2}\) elements (the group size) that surround the (i,j)th entry of A. The weighted group is defined by
$$ W_{g}\circ(A_{i,j})_{g} = \left[ \begin{array}{cccc} (W_{g})_{1,1}A_{i-l_{1},j-l_{2}} & (W_{g})_{1,2}A_{i-l_{1},j-l_{2}+1} & \cdots & (W_{g})_{1,K_{2}}A_{i-l_{1},j+r_{2}}\\ (W_{g})_{2,1}A_{i-l_{1}+1,j-l_{2}} & (W_{g})_{2,2}A_{i-l_{1}+1,j-l_{2}+1} & \cdots & (W_{g})_{2,K_{2}}A_{i-l_{1}+1,j+r_{2}}\\ \vdots & \vdots & \ddots & \vdots\\ (W_{g})_{K_{1},1}A_{i+r_{1},j-l_{2}} & (W_{g})_{K_{1},2}A_{i+r_{1},j-l_{2}+1} & \cdots & (W_{g})_{K_{1},K_{2}}A_{i+r_{1},j+r_{2}} \end{array}\right] \in \mathbb{R}^{K_{1}\times K_{2}}, $$
where \(l_{1}=\lfloor\frac{K_{1}-1}{2}\rfloor\), \(l_{2}=\lfloor\frac{K_{2}-1}{2}\rfloor\), \(r_{1}=\lfloor\frac{K_{1}}{2}\rfloor\), and \(r_{2}=\lfloor\frac{K_{2}}{2}\rfloor\) (so that \(l_{1}+r_{1}+1=K_{1}\) and \(l_{2}+r_{2}+1=K_{2}\)), and \(\lfloor x\rfloor\) denotes the largest integer less than or equal to x. In particular, if \(K_{2}=1\), this problem degenerates into the vector case (4). If \(K_{1}=K_{2}=1\), it degenerates into the ℓ1 regularization problem (1) for matrices.
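The index bookkeeping for \((A_{i,j})_{g}\) can be sketched as follows, assuming periodic wrap-around at the borders (boundary conditions are discussed in Section 3.1). The helper names are hypothetical; indices are 0-based in the code, so the first row \(i-l_{1}\) of a group at i = 0 wraps to the last row of A.

```python
import math


def group_offsets(K):
    """l = floor((K-1)/2) and r = floor(K/2), so that l + r + 1 == K."""
    return (K - 1) // 2, K // 2


def group_matrix(A, i, j, K1, K2):
    """The K1 x K2 group (A_{i,j})_g, with indices wrapped periodically (PBC)."""
    m, n = len(A), len(A[0])
    l1, _ = group_offsets(K1)
    l2, _ = group_offsets(K2)
    return [[A[(i - l1 + p) % m][(j - l2 + q) % n] for q in range(K2)]
            for p in range(K1)]


def weighted_l21_matrix(A, W):
    """||A||_{W,2,1} = sum_{i,j} ||W_g o (A_{i,j})_g||_F under PBC."""
    K1, K2 = len(W), len(W[0])
    total = 0.0
    for i in range(len(A)):
        for j in range(len(A[0])):
            G = group_matrix(A, i, j, K1, K2)
            total += math.sqrt(sum((W[p][q] * G[p][q]) ** 2
                                   for p in range(K1) for q in range(K2)))
    return total


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
print(group_matrix(A, 0, 0, 3, 2))      # [[7.0, 8.0], [1.0, 2.0], [4.0, 5.0]]
print(weighted_l21_matrix(A, [[1.0]]))  # K1 = K2 = 1: the entrywise l1 norm, 45.0
```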

Much more recently, Chen et al. [10] considered problem (5) with \((W_{g})_{i,j}\equiv 1\) for \(i=1,\ldots,K_{1}\), \(j=1,\ldots,K_{2}\). They used an iterative algorithm based on the principle of majorization minimization (MM). Their experiments showed that their method for solving problem (3) is more efficient than variable duplication methods (variable splitting, latent/auxiliary variables, etc.) [1, 14–19]. Although their method is efficient, it may still be time consuming, since the solution is obtained only after many iterations rather than by a direct computation.
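For comparison, the following is a minimal sketch of an MM iteration for problem (3) of the kind used in [10]; it is our own illustrative reconstruction, not the code of [10]. Majorizing each \(\|(z_{j})_{g}\|_{2}\) by a quadratic at the current iterate gives the componentwise update \(z_{t}\leftarrow x_{t}/(1+r_{t}/\beta)\), where \(r_{t}\) sums \(1/\|(z_{j})_{g}\|_{2}\) over the s groups containing \(z_{t}\). PBC, the group alignment, the name `ogs_mm`, and the iteration count are assumptions.

```python
import math


def ogs_mm(x, beta, s, n_iter=20):
    """MM iterations for min_z ||z||_{2,1} + beta/2 ||z - x||_2^2 (PBC sketch)."""
    n = len(x)
    l, r = (s - 1) // 2, s // 2
    z = list(x)  # start from the data
    for _ in range(n_iter):
        # ||(z_j)_g||_2 for every group j at the current iterate
        norms = [math.sqrt(sum(z[(j - l + k) % n] ** 2 for k in range(s)))
                 for j in range(n)]
        new_z = []
        for t in range(n):
            # groups j = t-r, ..., t+l contain entry t; a zero group pins z_t to 0
            r_t = sum(1.0 / norms[j % n] if norms[j % n] > 0 else math.inf
                      for j in range(t - r, t + l + 1))
            new_z.append(x[t] / (1.0 + r_t / beta))
        z = new_z
    return z


# With s = 1 the iteration converges to soft thresholding: approximately [2.0, 0.0]
print(ogs_mm([3.0, -0.5], beta=1.0, s=1, n_iter=50))
```

Each pass costs O(ns), and many passes are needed, which is the cost that a one-step explicit formula avoids.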

To our knowledge, there have been no direct methods for solving problems (3), (4), and (5). In this paper, we propose new inexact explicit shrinkage formulas that solve them directly; these direct methods are faster or more convenient than other methods, for instance, the recent MM iteration method [10]. The new direct method saves time compared with the MM method in [10] while producing very similar solutions. Numerical results are given to show the effectiveness of our new explicit shrinkage formulas. Moreover, the new method can be used to solve the subproblems arising in many other OGS problems, such as the wavelet regularizer with OGS in compressive sensing and the total variation (TV) regularizer with OGS in image restoration [11, 12]. For example, in this work we apply our results to image restoration using TV with OGS, similarly to [11, 12]. Moreover, we extend the works [11, 12], which consider only the anisotropic TV (ATV), to both ATV and the isotropic TV (ITV) (see Section 4 for details). Numerical results will show that our method saves time while obtaining very similar results.

The outline of the paper is as follows. In Section 2, we deduce our explicit shrinkage formulas in detail for the OGS problems (3), (4), and (5). In Section 3, we propose some extensions for these shrinkage formulas. In Section 4, we apply our results to image deblurring and denoising problems with OGS TV. Numerical results are given in Section 5. Finally, we conclude this paper in Section 6.

2 OGS shrinkage

2.1 Classic shrinkage

In this subsection, we review the classic shrinkage formulas and the principles behind them, since our new explicit OGS shrinkage formulas are based on them. First, we give the following definition.

Definition 1.

Define the shrinkage mappings Sh1 and Sh2 from \(\mathbb{R}^{N}\times\mathbb{R}^{+}\) to \(\mathbb{R}^{N}\) by
$$ {\text{Sh}_{1}(\mathbf{x},\beta)}_{i} = \text{sgn}(x_{i}) \max\left\{ |x_{i}| - \frac{1}{\beta},0\right\}, $$
(6)
$$ {\text{Sh}_{2}(\mathbf{x},\beta)} = \frac{\mathbf{x}}{\|\mathbf{x}\|_{2}} \max\left\{ \|\mathbf{x}\|_{2} - \frac{1}{\beta},0\right\}, $$
(7)

where the expression in (7) is taken to be zero when its second factor is zero, and “sgn” denotes the signum function: sgn(x) = 0 if x = 0, sgn(x) = −1 if x < 0, and sgn(x) = 1 if x > 0.

The shrinkage (6) is known as soft thresholding and occurs in many sparsity-related algorithms, since it is the proximal mapping of the ℓ1 norm. The shrinkage (7) is known as the high-dimensional shrinkage formula.
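Both mappings are one-liners in practice. The sketch below is a direct transcription of (6) and (7) into Python; the names `sgn`, `sh1`, and `sh2` are ours, and the zero-vector case of (7) returns zeros according to the convention stated above.

```python
import math


def sgn(v):
    """Signum: -1, 0, or 1."""
    return (v > 0) - (v < 0)


def sh1(x, beta):
    """Soft thresholding (6): sgn(x_i) * max(|x_i| - 1/beta, 0), componentwise."""
    return [sgn(v) * max(abs(v) - 1.0 / beta, 0.0) for v in x]


def sh2(x, beta):
    """High-dimensional shrinkage (7): shrink the whole vector toward 0 along x."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        return [0.0] * len(x)  # convention: the expression is taken to be zero
    scale = max(norm - 1.0 / beta, 0.0) / norm
    return [scale * v for v in x]


print(sh1([3.0, -0.5, -2.0], beta=1.0))  # each entry is shrunk separately
print(sh2([3.0, 4.0], beta=1.0))         # the norm 5 is reduced to 4, direction kept
```

The contrast between the two is the point used later: `sh1` acts entry by entry, while `sh2` keeps the output parallel to its input.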

Now, we consider solving the following problems
$$ \min_{\mathbf{z}} \quad \|\mathbf{z}\|_{p} + \frac{\beta}{2}\|\mathbf{z} -\mathbf{x}\|_{2}^{2}, \ \ p=1,2. $$
(8)
The minimizer of (8) with p = 1 is given by
$$ \arg\min\limits_{\mathbf{z}} \quad \|\mathbf{z}\|_{1} + \frac{\beta}{2}\|\mathbf{z} -\mathbf{x}\|_{2}^{2} = {\text{Sh}_{1}(\mathbf{x},\beta)}. $$
(9)
Because both the ℓ1 norm and the squared ℓ2 norm are additive and separable, Eq. (9) follows easily from the decomposition
$$ \min_{\mathbf{z}} \quad \|\mathbf{z}\|_{1} + \frac{\beta}{2}\|\mathbf{z} -\mathbf{x}\|_{2}^{2} = \sum_{i=1}^{n} \min_{z_{i}}|z_{i}| + \frac{\beta}{2}|z_{i}-x_{i}|^{2}. $$
(10)
The minimizer of (8) with p = 2 is given by
$$ \arg\min\limits_{\mathbf{z}} \quad \|\mathbf{z}\|_{2} + \frac{\beta}{2}\|\mathbf{z} -\mathbf{x}\|_{2}^{2} = {\text{Sh}_{2}(\mathbf{x},\beta)}. $$
(11)
This formula is derived from the Euler (first-order optimality) equation of (8) with p = 2. Setting aside the cases x = 0 and z = 0, we obtain the following Euler equation.
$$ \beta\left(\mathbf{z} -\mathbf{x}\right)+ \frac{\mathbf{z}}{\|\mathbf{z}\|_{2}} \ni \mathbf{0}, $$
(12)
$$ \left(1 + \frac{1}{\beta}\frac{1}{\|\mathbf{z}\|_{2}} \right)\mathbf{z}-\mathbf{x} \ni \mathbf{0}. $$
(13)

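From (13), a necessary condition is that z is parallel to x and points in the same direction, that is, \(\frac{\mathbf{z}}{\|\mathbf{z}\|_{2}}=\frac{\mathbf{x}}{\|\mathbf{x}\|_{2}}\). Taking norms in (13) then gives \(\|\mathbf{z}\|_{2}+\frac{1}{\beta}=\|\mathbf{x}\|_{2}\), i.e., \(\|\mathbf{z}\|_{2}=\|\mathbf{x}\|_{2}-\frac{1}{\beta}\). Combining this with the direction of x and treating the cases x = 0 and z = 0 separately, we obtain formula (7). More details can be found in [20, 21].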

Our new explicit OGS shrinkage formulas are based on these observations, especially the properties of additivity, separability, and parallelism.

Remarks 1.

Problem (2) can be solved easily by a simple shrinkage formula, which is not used in this work. More details can be found in [1, 22, 23].

2.2 The OGS shrinkage formulas

We first focus on problem (3). The difficulty of this problem is the “overlapping.” Therefore, special techniques are needed to avoid the “overlapping,” and this is the key point of our new explicit OGS shrinkage formulas.

Clearly, the first term of problem (3) is additive and separable. Therefore, if we can rewrite the second term of problem (3) so that it has the same properties with respect to the same variables as the first term, an approximate solution of (3) can be found easily, similarly to classic shrinkage.

Assuming that the periodic boundary condition (PBC) is used (some boundary condition (BC) is necessary in OGS functionals), we observe that each entry \(z_{i}\) of z appears exactly s times in the first term. Therefore, to keep the vectors z and x consistent, we multiply the second term by s, and to keep problem (3) unchanged we simultaneously divide β by s. After some manipulations, we have
$$ {}\begin{aligned} f_{m}(\mathbf{z})&=\min\limits_{\mathbf{z}} \|\mathbf{z}\|_{2,1}+ \frac{\beta}{2}\|\mathbf{z} -\mathbf{x}\|_{2}^{2}\\ &=\min\limits_{\mathbf{z}} {\sum\nolimits}_{i=1}^{n} \|(z_{i})_{g}\|_{2} + \frac{\beta}{2s}s\|\mathbf{z} -\mathbf{x}\|_{2}^{2}\\ &=\min\limits_{\mathbf{z}} {\sum\nolimits}_{i=1}^{n} \|(z_{i})_{g}\|_{2} + \frac{\beta}{2s}{\sum\nolimits}_{i=1}^{n}\|(z_{i})_{g} -(x_{i})_{g}\|_{2}^{2}\\ &=\min\limits_{\mathbf{z}} {\sum\nolimits}_{i=1}^{n} \left(\|(z_{i})_{g}\|_{2} + \frac{\beta}{2s}\|(z_{i})_{g} -(x_{i})_{g}\|_{2}^{2}\right),\\ \end{aligned} $$
(14)

where \((x_{i})_{g}\) is defined similarly to \((z_{i})_{g}\) in Section 1.

For example, let s = 3 and \((z_{i})_{g}=(z_{i},z_{i+1},z_{i+2})\). The generalized ℓ2,1 norm \(\|\mathbf{z}\|_{2,1}\) can be treated as the generalized ℓ1 norm of generalized points, whose entries \((z_{i})_{g}\) are themselves vectors, with the absolute value of each entry replaced by the ℓ2 norm of \((z_{i})_{g}\). See Fig. 1a, where the top line is the vector z, the rectangles with dashed lines are the original groups \((z_{i})_{g}\), and the rectangles with solid lines are the generalized points. Under PBC, each line of Fig. 1a is a cyclic shift of the vector z. We treat the vector x in the same way. Placing these generalized points (the rectangles with solid lines in the figure) as the columns of a matrix, we can regard \(s\|\mathbf{z}-\mathbf{x}\|_{2}^{2}\) as the squared Frobenius norm \(\|[(z_{i})_{g}]-[(x_{i})_{g}]\|_{F}^{2}\), where \([(z_{i})_{g}]\) is the matrix in Fig. 1a whose rows are the lines of the figure (\([(x_{i})_{g}]\) is defined similarly). This is why the second equality in (14) holds.
Fig. 1

Vector case. The top line is the vector z, the rectangles with dashed lines are the original group vectors \((z_{i})_{g}\), and the rectangles with solid lines are the generalized points, which turn the overlapping groups into non-overlapping ones. a Vector. b Weighted vector
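The Frobenius-norm identity behind the second equality in (14) is easy to check numerically. In this sketch (hypothetical helper names, PBC, and the alignment \((v_{i})_{g}=(v_{i},\ldots,v_{i+s-1})\) of the example above), row k of the matrix \([(v_{i})_{g}]\) is the cyclic shift of v by k positions, so each of the s rows contributes \(\|\mathbf{z}-\mathbf{x}\|_{2}^{2}\) exactly once.

```python
def group_columns(v, s):
    """s x n matrix [(v_i)_g] whose i-th column is (v_i, ..., v_{i+s-1}) under PBC.

    Row k is the cyclic shift of v by k positions, i.e. one line of Fig. 1a.
    """
    n = len(v)
    return [[v[(i + k) % n] for i in range(n)] for k in range(s)]


def frobenius_sq_diff(P, Q):
    """Squared Frobenius norm of P - Q for equally sized lists of rows."""
    return sum((p - q) ** 2 for rp, rq in zip(P, Q) for p, q in zip(rp, rq))


z = [1.0, 2.0, -1.0, 0.5, 3.0]
x = [0.0, 1.5, -2.0, 1.0, 2.0]
s = 3
lhs = s * sum((a - b) ** 2 for a, b in zip(z, x))
rhs = frobenius_sq_diff(group_columns(z, s), group_columns(x, s))
print(lhs, rhs)  # 10.5 10.5
```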

Therefore, in general, for each i in the last line of (14), Eqs. (7) and (11) give
$$ {}\begin{aligned} &\arg\min\limits_{(z_{i})_{g}} \|(z_{i})_{g}\|_{2} + \frac{\beta}{2s}\|(z_{i})_{g} -(x_{i})_{g}\|_{2}^{2} = {\text{Sh}_{2}\left((x_{i})_{g},\frac{\beta}{s}\right)},\\ &\qquad \quad{(z_{i})_{g}} = \max \left\{ \|{(x_{i})_{g}}\|_{2} - \frac{s}{\beta}, 0\right\} \frac{{(x_{i})_{g}}}{\|{(x_{i})_{g}}\|_{2}},\\ &\qquad \quad{(z_{i})_{g}} = \left(\left[{(z_{i})_{g}}\right]_{1},\left[{(z_{i})_{g}}\right]_{2},\ldots,\left[{(z_{i})_{g}}\right]_{s}\right). \end{aligned} $$
(15)

As in Fig. 1a, for each i, the ith entry \(x_{i}\) (or \(z_{i}\)) of x (or z) appears s times, so each \(z_{i}\) is computed s times, once in each of the s different groups.

However, the results from (15) cannot all be satisfied simultaneously, because the s values of \(z_{i}\) obtained from (15) in the s different groups generally differ. Moreover, for each i in the last line of (14), the result (15) relies on the vector \((z_{i})_{g}\) being parallel to the vector \((x_{i})_{g}\), as shown in Section 2.1.
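The inconsistency can be observed numerically. The sketch below (hypothetical helper names; PBC and the alignment \((x_{j})_{g}=(x_{j},x_{j+1},x_{j+2})\) for s = 3 are assumptions) applies (15) to every group and collects the s candidate values produced for each entry. A short calculation shows that averaging the s candidates for \(x_{i}\) gives \(\sum_{j}\max(1/s-1/(\beta\|(x_{j})_{g}\|_{2}),0)\,x_{i}\), where j runs over the groups containing \(x_{i}\); this is the expression that appears in Formula (II) below.

```python
import math


def sh2(v, beta):
    """High-dimensional shrinkage (7)."""
    norm = math.sqrt(sum(t * t for t in v))
    if norm == 0.0:
        return [0.0] * len(v)
    scale = max(norm - 1.0 / beta, 0.0) / norm
    return [scale * t for t in v]


def groupwise_candidates(x, beta, s):
    """Apply (15) to every group (x_j)_g = (x_j, ..., x_{j+s-1}) under PBC and
    collect, for each entry, the s values it receives from the s groups."""
    n = len(x)
    cands = [[] for _ in range(n)]
    for j in range(n):
        group = [x[(j + k) % n] for k in range(s)]
        zg = sh2(group, beta / s)  # (15): shrinkage with parameter beta/s
        for k in range(s):
            cands[(j + k) % n].append(zg[k])
    return cands


x = [4.0, 1.0, 0.0, 3.0, 2.0, 0.5]
cands = groupwise_candidates(x, beta=1.0, s=3)
print(cands[0])                          # three different values proposed for x_1
print([sum(c) / len(c) for c in cands])  # their averages
```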

Keeping this in mind and ignoring the cases \((z_{i})_{g}=\mathbf{0}\) or \((x_{i})_{g}=\mathbf{0}\), consider in particular s = 4 and \((z_{i})_{g}=(z_{i-1},z_{i},z_{i+1},z_{i+2})\). The vector z can then be split as follows:
$$ \begin{array}{rllrrrrrrrrl} \mathbf{z}=&&(z_{1},&z_{2},&z_{3},&z_{4},&z_{5},&\cdots,&z_{n-2},&z_{n-1},&z_{n}&)\\ =&+\frac{1}{4}&(z_{1},&z_{2},&z_{3},&0,&0,&\cdots,&0,&0,&z_{n}&)\\ &+\frac{1}{4}&(z_{1},&z_{2},&z_{3},&z_{4},&0,&\cdots,&0,&0,&0&)\\ &+\frac{1}{4}&(0,&z_{2},&z_{3},&z_{4},&z_{5},&\cdots,&0,&0,&0&)\\ &+\ &&\cdots&&&&&&&&\\ &+\frac{1}{4}&(z_{1},&0,&0,&0,&0,&\cdots,&z_{n-2},&z_{n-1},&z_{n}&)\\ &+\frac{1}{4}&(z_{1},&z_{2},&0,&0,&0,&\cdots,&0,&z_{n-1},&z_{n}&).\\ \end{array} $$
(16)

Let \((z_{i})'_{g}=(0,\ldots,0,z_{i-1},z_{i},z_{i+1},z_{i+2},0,\ldots,0)\) be the expansion of \((z_{i})_{g}\), with \((z_{1})'_{g}=(z_{1},z_{2},z_{3},0,\ldots,0,z_{n})\), \((z_{n-1})'_{g}=(z_{1},0,0,\ldots,0,z_{n-2},z_{n-1},z_{n})\), and \((z_{n})'_{g}=(z_{1},z_{2},0,\ldots,0,z_{n-1},z_{n})\). Let \((x_{i})'_{g}=(0,\ldots,0,x_{i-1},x_{i},x_{i+1},x_{i+2},0,\ldots,0)\) be the expansion of \((x_{i})_{g}\), defined similarly to \((z_{i})'_{g}\). Then \(\mathbf{z}=\frac{1}{4}\sum_{i=1}^{n}(z_{i})'_{g}\) and \(\mathbf{x}=\frac{1}{4}\sum_{i=1}^{n}(x_{i})'_{g}\). Moreover, \(\|(z_{i})'_{g}\|_{2}=\|(z_{i})_{g}\|_{2}\) and \(\|(x_{i})'_{g}\|_{2}=\|(x_{i})_{g}\|_{2}\) for every i.

On the one hand, the Euler equation of \(f_{m}(\mathbf{z})\) (with s = 4) is
$$ \beta\left(\mathbf{z}- \mathbf{x}\right) + \frac{(z_{1})'_{g}}{\|(z_{1})'_{g}\|_{2}} + \cdots + \frac{(z_{n})'_{g}}{\|(z_{n})'_{g}\|_{2}} \ni \mathbf{0}, $$
(17)
$$ \begin{aligned} {}\frac{\beta}{4}\sum_{i=1}^{n}\left((z_{i})'_{g} - (x_{i})'_{g} \right) &+ \frac{(z_{1})'_{g}}{\|(z_{1})'_{g}\|_{2}} +\cdots + \frac{(z_{i})'_{g}}{\|(z_{i})'_{g}\|_{2}} + \cdots \\ &+ \frac{(z_{n})'_{g}}{\|(z_{n})'_{g}\|_{2}}\ni \mathbf{0}. \end{aligned} $$
(18)
From the derivation of the high-dimensional shrinkage formula (7) in Section 2.1, we know that a necessary condition for minimizing the ith term of the last line in (14) is that \((z_{i})_{g}\) is parallel to \((x_{i})_{g}\). That is, \((z_{i})'_{g}\) is parallel to \((x_{i})'_{g}\) for every i. For example,
$$ (z_{2})'_{g} = (z_{1},z_{2},z_{3},z_{4},0,\ldots,0,0) \parallel (x_{2})'_{g} = (x_{1},x_{2},x_{3},x_{4},0,\ldots,0,0). $$
(19)
Then \(\frac{(z_{i})'_{g}}{\|(z_{i})'_{g}\|_{2}}=\frac{(x_{i})'_{g}}{\|(x_{i})'_{g}\|_{2}}\), and (18) can be rewritten as
$$ {\begin{aligned} \frac{\beta}{4}\sum_{i=1}^{n}\left((z_{i})'_{g} - (x_{i})'_{g} \right) + \frac{(x_{1})'_{g}}{\|(x_{1})'_{g}\|_{2}} +\cdots + \frac{(x_{i})'_{g}}{\|(x_{i})'_{g}\|_{2}}+ \cdots + \frac{(x_{n})'_{g}}{\|(x_{n})'_{g}\|_{2}}\ni \mathbf{0}, \end{aligned}} $$
(20)
$$ {\begin{aligned}\beta\left(\mathbf{z}- \mathbf{x}\right)+ \frac{(x_{1})'_{g}}{\|(x_{1})'_{g}\|_{2}} +\cdots + \frac{(x_{i})'_{g}}{\|(x_{i})'_{g}\|_{2}}+ \cdots + \frac{(x_{n})'_{g}}{\|(x_{n})'_{g}\|_{2}}\ni \mathbf{0}, \end{aligned}} $$
(21)
$$ {\begin{aligned}\mathbf{z} \ni \mathbf{x}-\frac{1}{\beta} \left(\frac{(x_{1})'_{g}}{\|(x_{1})'_{g}\|_{2}} +\cdots + \frac{(x_{i})'_{g}}{\|(x_{i})'_{g}\|_{2}}+ \cdots + \frac{(x_{n})'_{g}}{\|(x_{n})'_{g}\|_{2}}\right), \end{aligned}} $$
(22)
for each component, we obtain
$$ \begin{aligned} z_{i} \ni x_{i} &-\frac{1}{\beta} \left(\frac{x_{i}}{\|(x_{i-2})'_{g}\|_{2}} +\frac{x_{i}}{\|(x_{i-1})'_{g}\|_{2}} + \frac{x_{i}}{\|(x_{i})'_{g}\|_{2}}\right.\\ &\left.+ \frac{x_{i}}{\|(x_{i+1})'_{g}\|_{2}}\right). \end{aligned} $$
(23)

Therefore, when \((z_{i})_{g}\neq\mathbf{0}\) and \((x_{i})_{g}\neq\mathbf{0}\), we find a minimizer of (3) along the direction in which every vector \((z_{i})_{g}\) is parallel to \((x_{i})_{g}\). Conversely, if \(\frac{\beta}{4}\left((z_{i})'_{g}-(x_{i})'_{g}\right)+\frac{(z_{i})'_{g}}{\|(z_{i})'_{g}\|_{2}}=\mathbf{0}\) for every i, then (18) holds and \(\left(\frac{\beta}{4}+\frac{1}{\|(z_{i})'_{g}\|_{2}}\right)(z_{i})'_{g}=(x_{i})'_{g}\); hence \(\frac{(z_{i})'_{g}}{\|(z_{i})'_{g}\|_{2}}=\frac{(x_{i})'_{g}}{\|(x_{i})'_{g}\|_{2}}\) holds. Moreover, because \(f_{m}(\mathbf{z})\) is strictly convex, the minimizer is unique.

On the other hand, when \((z_{i})_{g}=\mathbf{0}\) or \((x_{i})_{g}=\mathbf{0}\), our method may not give good results. When \((x_{i})_{g}=\mathbf{0}\), the minimizer of the subproblem \(\min_{(z_{i})_{g}}\|(z_{i})_{g}\|_{2}+\frac{\beta}{2s}\|(z_{i})_{g}-(x_{i})_{g}\|_{2}^{2}\) is exactly \((z_{i})_{g}=\mathbf{0}\). When \((x_{i})_{g}\neq\mathbf{0}\) but the minimizer of this subproblem is \((z_{i})_{g}=\mathbf{0}\) (because the parameter β/s is too small), our method cannot obtain the exact minimizer. For example, \((z_{i})_{g}=\mathbf{0}\) together with \((z_{i+1})_{g}\neq\mathbf{0}\) forces the element \(z_{i}\) to take different values in different subproblems. However, we can still obtain an approximate minimizer in this case, in which the element \(z_{i}\) is a simple summation of the results of the corresponding subproblems containing \(z_{i}\). The experiments in Section 5 show that this approximate minimizer is also good. Moreover, when this problem is a subproblem of an image processing problem, we can choose β large enough to avoid this drawback.

In addition, (23) shows that the element \(z_{i}\) of the minimizer can be treated in s subproblems independently, with the results then combined. In summary, after some manipulations, we obtain the following two general formulas. (I)
$$ \arg\min\limits_{\mathbf{z}} \quad \|\mathbf{z}\|_{2,1} + \frac{\beta}{2}\|\mathbf{z} -\mathbf{x}\|_{2}^{2} = {\text{Sh}_{\text{OGS}}(\mathbf{x},\beta)}, $$
(24)
with
$$ {\text{Sh}_{\text{OGS}}(\mathbf{x},\beta)}_{i} =z_{i} =\max \left\{ 1 - \frac{1}{\beta}{F(x_{i})}, 0\right\} {x_{i}}. $$
(25)
Here, for instance, when the group size is s = 4, \((z_{i})_{g}=(z_{i-1},z_{i},z_{i+1},z_{i+2})\) in \(\|\mathbf{z}\|_{2,1}\), and \((x_{i})_{g}\) is defined similarly, we have \(F(x_{i})=\frac{1}{\|(x_{i-2})_{g}\|_{2}}+\frac{1}{\|(x_{i-1})_{g}\|_{2}}+\frac{1}{\|(x_{i})_{g}\|_{2}}+\frac{1}{\|(x_{i+1})_{g}\|_{2}}\). The term \(1/\|(x_{j})_{g}\|_{2}\) is included in \(F(x_{i})\) if and only if \((x_{j})_{g}\) contains the component \(x_{i}\), and we follow the convention 1/0 = 1 in \(F(x_{i})\), because \(\|(x_{j})_{g}\|_{2}=0\) for a group containing \(x_{i}\) implies \(x_{i}=0\), so the value of \(F(x_{i})\) is then irrelevant in (25). (II)
$$ \arg\min\limits_{\mathbf{z}} \ \ \ \|\mathbf{z}\|_{2,1} + \frac{\beta}{2}\|\mathbf{z} -\mathbf{x}\|_{2}^{2} = {\text{Sh}_{\text{OGS}}(\mathbf{x},\beta)}, $$
(26)
with
$$ {\text{Sh}_{\text{OGS}}(\mathbf{x},\beta)}_{i} =z_{i} ={G(x_{i})}\cdot{x_{i}}. $$
(27)

Here, the symbols are the same as in (I), and, again for s = 4, \(G(x_{i})=\max\left(\frac{1}{s}-\frac{1}{\beta\|(x_{i-2})_{g}\|_{2}},0\right)+\max\left(\frac{1}{s}-\frac{1}{\beta\|(x_{i-1})_{g}\|_{2}},0\right)+\max\left(\frac{1}{s}-\frac{1}{\beta\|(x_{i})_{g}\|_{2}},0\right)+\max\left(\frac{1}{s}-\frac{1}{\beta\|(x_{i+1})_{g}\|_{2}},0\right)\).
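Formula (I) can be transcribed directly. The sketch below is a minimal version under PBC, with the hypothetical name `ogs_formula_I` and the assumed alignment \((x_{j})_{g}=(x_{j-l},\ldots,x_{j+r})\), \(l=\lfloor (s-1)/2\rfloor\), \(r=\lfloor s/2\rfloor\), which for s = 4 reproduces the four groups listed in \(F(x_{i})\) above. The convention 1/0 = 1 is applied as stated.

```python
import math


def ogs_formula_I(x, beta, s):
    """Formula (I), (25): z_i = max(1 - F(x_i)/beta, 0) * x_i, under PBC (sketch).

    F(x_i) sums 1/||(x_j)_g||_2 over the s groups (x_j)_g that contain x_i,
    with (x_j)_g = (x_{j-l}, ..., x_{j+r}), l = (s-1)//2, r = s//2, and the
    convention 1/0 = 1 (such an x_i is 0 anyway).
    """
    n = len(x)
    l, r = (s - 1) // 2, s // 2
    norms = [math.sqrt(sum(x[(j - l + k) % n] ** 2 for k in range(s)))
             for j in range(n)]
    z = []
    for i in range(n):
        # groups j = i-r, ..., i+l are exactly those containing entry i
        F = sum(1.0 / norms[j % n] if norms[j % n] > 0 else 1.0
                for j in range(i - r, i + l + 1))
        z.append(max(1.0 - F / beta, 0.0) * x[i])
    return z


print(ogs_formula_I([3.0, -0.5, -2.0], beta=1.0, s=1))  # s = 1: soft thresholding
print(ogs_formula_I([1.0] * 8, beta=10.0, s=4))         # every entry scaled by 0.8
```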

We refer to these as Formula (I) and Formula (II), respectively, throughout this paper. When β is sufficiently large or sufficiently small, the two formulas give the same result. For other values of β, our experiments show that Formula (II) approximates the MM iteration method better than Formula (I), so we give the algorithm for Formula (II), Algorithm 1, for finding the minimizer of (3). To use Formula (I) instead, only the final three steps need to be changed, which does not change the computational cost.

Algorithm 1 needs only two convolutions, each with time complexity O(ns), which is the same time complexity as a single iteration of the MM method in [10]. Therefore, our method is much more efficient than the MM method and other variable duplication methods. Numerical comparisons between our method and the MM method are given in Section 5. Moreover, if s = 1, Algorithm 1 degenerates into classic soft thresholding, since formula (27) then degenerates into (6). Furthermore, when β is sufficiently large or sufficiently small, the results of our new explicit algorithm are almost the same as those of the MM method in [10] (after 20 or more iterations).
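Algorithm 1 itself is not reproduced in this text. The sketch below is our own reading of Formula (II) as two passes of circular moving sums (convolutions with a length-s box), each costing O(ns): the first pass gives all group norms, the second accumulates \(G(x_{i})\). PBC, the alignment \((x_{j})_{g}=(x_{j-l},\ldots,x_{j+r})\), and the name `ogs_formula_II` are assumptions; a zero-norm group contributes 0 to \(G\).

```python
import math


def ogs_formula_II(x, beta, s):
    """Formula (II), (27): z_i = G(x_i) * x_i, computed with two moving sums (PBC)."""
    n = len(x)
    l, r = (s - 1) // 2, s // 2
    # Pass 1: ||(x_j)_g||_2 = sqrt of a circular moving sum of x^2 over each group
    norms = [math.sqrt(sum(x[(j - l + k) % n] ** 2 for k in range(s)))
             for j in range(n)]
    # Per-group contribution max(1/s - 1/(beta*||(x_j)_g||), 0); 0 for empty groups
    h = [max(1.0 / s - 1.0 / (beta * v), 0.0) if v > 0 else 0.0 for v in norms]
    # Pass 2: G(x_i) sums h_j over the s groups containing x_i (j = i-r, ..., i+l)
    G = [sum(h[j % n] for j in range(i - r, i + l + 1)) for i in range(n)]
    return [g * xi for g, xi in zip(G, x)]


x = [0.0, 0.2, 1.0, 1.2, 0.9, 0.1, 0.0, 0.0]
print(ogs_formula_II(x, beta=2.0, s=3))
```

For large β every term of \(G(x_{i})\) is positive and the result coincides with \((1-F(x_{i})/\beta)\,x_{i}\), i.e., with Formula (I); for small β all factors vanish, which is consistent with the statement that the two formulas agree in both limits.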

Remarks 2.

Our new explicit algorithm can be viewed as an averaging estimate over the solutions of all the overlapping group subproblems, each solved independently. The numerical experiments in Section 5 show that the results of our two formulas are almost the same as those of the MM method in [10] when β is sufficiently large or sufficiently small, and close to them for other values of β. Moreover, this observation suggests that OGS regularizers retain the properties of classical sparse regularizers while also smoothing local regions. For instance, TV with OGS can preserve edges and simultaneously overcome staircase effects, as shown in our experiments and in [11, 12].

For problems (4) (see Fig. 1b) and (5), similar algorithms can be obtained. We omit the derivation details here and give them in the Appendix. Here, we directly give the algorithm for solving problem (5) under Formula (II).

Algorithm 2 likewise needs only two convolutions, each with time complexity \(O(mnK_{1}K_{2})\), which is the same time complexity as a single iteration of the MM method in [10]. Therefore, our method is much more efficient than the MM method.
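A corresponding sketch for the matrix problem (5) follows. Algorithm 2 and the weighted derivation in the Appendix are not reproduced here, so this sketch assumes \(W_{g}\equiv 1\), PBC, and groups positioned by \(l_{1},r_{1},l_{2},r_{2}\) as defined for (5); the function name is hypothetical. It applies the Formula (II) pattern with \(s=K_{1}K_{2}\) to the \(K_{1}\times K_{2}\) windows.

```python
import math


def ogs_formula_II_2d(X, beta, K1, K2):
    """Formula (II) for problem (5) with W_g = 1 and PBC (sketch).

    Each entry lies in s = K1*K2 groups; G sums the per-group factors
    max(1/s - 1/(beta*||group||_F), 0) over those groups, and Z = G o X.
    """
    m, n = len(X), len(X[0])
    s = K1 * K2
    l1, r1 = (K1 - 1) // 2, K1 // 2
    l2, r2 = (K2 - 1) // 2, K2 // 2
    # Pass 1: Frobenius norm of the group attached to every position (a, b)
    norms = [[math.sqrt(sum(X[(a - l1 + p) % m][(b - l2 + q) % n] ** 2
                            for p in range(K1) for q in range(K2)))
              for b in range(n)] for a in range(m)]
    H = [[max(1.0 / s - 1.0 / (beta * v), 0.0) if v > 0 else 0.0 for v in row]
         for row in norms]
    # Pass 2: (i, j) lies in groups (a, b) with i-r1 <= a <= i+l1, j-r2 <= b <= j+l2
    return [[X[i][j] * sum(H[a % m][b % n]
                           for a in range(i - r1, i + l1 + 1)
                           for b in range(j - r2, j + l2 + 1))
             for j in range(n)] for i in range(m)]


X = [[0.0, 0.1, 0.0, 0.0],
     [0.2, 1.0, 0.9, 0.0],
     [0.0, 1.1, 1.0, 0.1],
     [0.0, 0.0, 0.1, 0.0]]
print(ogs_formula_II_2d(X, beta=2.0, K1=2, K2=2))
```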

3 Several extensions

3.1 Other boundary conditions

In Section 2, we gave explicit shrinkage formulas for one class of OGS problems, (3), (4), and (5). To simplify the derivation, we assumed that PBC is used. One may wonder whether PBC is always appropriate for regularization problems in signal or image processing, since natural signals and images are often not periodic. However, in these problems, assuming some boundary condition is necessary to simplify the problem and make the computation possible [24]. There are several kinds of BCs, such as the zero BC (ZBC) and the reflective BC. PBC is often used in optimization because it allows fast computation, as above or via other methods, for example, computations with block circulant with circulant blocks (BCCB) matrices by fast Fourier transforms [20, 21, 24].

In this section, we consider other BCs such as ZBC and reflective BC. For simplicity, we only consider the vector case; the results extend easily to the matrix case as in Section 2. When ZBC is used, we extend the original vectors (signals) z (\(=(z_{i})_{i=1}^{n}\)) and x by two length-s vectors on both sides, obtaining \(\tilde{\mathbf{z}}\) (\(=[\tilde{z}_{-s},\ldots,\tilde{z}_{-1},(\tilde{z}_{i})_{i=1}^{n},\tilde{z}_{n+1},\ldots,\tilde{z}_{n+s}]=[\mathbf{0}_{s},(z_{i})_{i=1}^{n},\mathbf{0}_{s}]\)) and \(\tilde{\mathbf{x}}\), respectively. The results and algorithms of Section 2 then apply under PBC to \(\tilde{\mathbf{z}}\) and \(\tilde{\mathbf{x}}\). See Fig. 2a.
Fig. 2

Expanding vectors under other BCs so that the formulas derived under PBC can be used. a ZBC. b Reflective BC

Moreover, our numerical results show that different BCs give almost the same results in practice (see Section 5.1), and, given the definition of the weighted generalized ℓ2,1 norm, ZBC seems better than PBC. Therefore, we use ZBC to solve problems (3), (4), and (5) in the following sections.

When the reflective BC is assumed, the same approach applies: we only need to extend the original vector x to \(\hat{\mathbf{x}}\) with the reflective BC. See Fig. 2b.

3.2 Nonpositive weights and different weights in groups

In this section, we show that the weight vector \(w_{g}\) and matrix \(W_{g}\) in the former sections can contain an arbitrary number s of entries with arbitrary real values. On the one hand, any entries of the weight vector or matrix may be zero. For example, the ℓ1 regularization problem (1) can be seen as a special case of the weighted generalized ℓ2,1 norm regularization problem (4) with s = 1, or with s = 3 and \(w_{g}=(1,0,0)\) in the example of Fig. 1b.

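On the other hand, every norm is absolutely homogeneous, so \(\|(-1)\cdot w_{g}\|_{\text{any}}=\|w_{g}\|_{\text{any}}\), where \(\|\cdot\|_{\text{any}}\) can be any norm, such as \(\|\cdot\|_{1}\) or \(\|\cdot\|_{2}\). More precisely, for the ℓ2 and Frobenius norms used here, \(\|w_{g}\circ v\|_{2}=\||w_{g}|\circ v\|_{2}\) for any v, because each entry enters only through its square. Therefore, for the regularization problems (4) (or (5)), using the weight vector \(w_{g}\) (or matrix \(W_{g}\)) is equivalent to using \(|w_{g}|\) (or \(|W_{g}|\)), where |·| denotes the point-wise absolute value. Hence, in general, our results hold regardless of the number of entries in the weight vector \(w_{g}\) or matrix \(W_{g}\) and of the real values of these entries.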

However, if the weight depends on the group index i, for example, \(w_{i,g}\circ(z_{i})_{g}=((w_{i,g})_{1}z_{i},(w_{i,g})_{2}z_{i+1},(w_{i,g})_{3}z_{i+2})\), our method fails, and the relevant problems cannot be solved easily. As mentioned above, we only consider weights \(w_{g}\) that are independent of the group index i, that is, \(w_{i,g}=w_{g}\) for all i, which means that the overlapping groups are translation invariant.

4 Applications in TV regularization problems with OGS

The TV regularizer was first introduced by Rudin et al. [25] (ROF) and is widely used in many fields, e.g., denoising and deblurring problems [20, 21, 26–31]. Several fast algorithms, such as Chambolle's method [26] and fast TV deconvolution (FTVd) [20, 21], have been proposed. The corresponding minimization task is
$$ \min_{f} \|f \|_{\text{TV}} + \frac{\mu}{2} \|Hf-g\|_{2}^{2}, \ \ $$
(28)
where \(\|f\|_{\text{TV}}:=\sum_{1\leq i,j\leq n}\|(\nabla f)_{i,j}\|_{2}=\sum_{1\leq i,j\leq n}\sqrt{|(\nabla_{x}f)_{i,j}|^{2}+|(\nabla_{y}f)_{i,j}|^{2}}\) (called isotropic TV, ITV), or \(\|f\|_{\text{TV}}:=\sum_{1\leq i,j\leq n}\|(\nabla f)_{i,j}\|_{1}=\sum_{1\leq i,j\leq n}\left(|(\nabla_{x}f)_{i,j}|+|(\nabla_{y}f)_{i,j}|\right)\) (called anisotropic TV, ATV), H denotes the blur matrix, and g denotes the observed image with blur and noise. The operator \(\nabla:\mathbb{R}^{n^{2}}\rightarrow\mathbb{R}^{2\times n^{2}}\) denotes the discrete gradient operator (under PBC), defined by \((\nabla f)_{i,j}=((\nabla_{x}f)_{i,j},(\nabla_{y}f)_{i,j})\), with
$$ (\nabla_{x} f)_{i,j}=\left\{\begin{array}{ll} f_{i+1,j}-f_{i,j}, & \text{if } i<n,\\ f_{1,j}-f_{n,j}, & \text{if } i=n,\end{array}\right. \qquad (\nabla_{y} f)_{i,j}=\left\{\begin{array}{ll} f_{i,j+1}-f_{i,j}, & \text{if } j<n,\\ f_{i,1}-f_{i,n}, & \text{if } j=n,\end{array}\right. $$
for \(i,j=1,2,\ldots,n\), where \(f_{i,j}\) refers to the ((j−1)n+i)th entry of the vector f (the (i,j)th pixel of the n × n image; this notation is used throughout the paper unless otherwise specified). Note that H is a BCCB matrix when PBC is applied [24].
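As an illustration of the discrete gradient and the two TV seminorms, here is a minimal sketch for an n × n image stored as a list of rows, so that `f[i][j]` plays the role of \(f_{i,j}\) (with 0-based indices in the code); the function names are hypothetical.

```python
import math


def grad_pbc(f):
    """Forward differences with PBC: (grad_x f)_{i,j} = f_{i+1,j} - f_{i,j}, etc."""
    n = len(f)
    gx = [[f[(i + 1) % n][j] - f[i][j] for j in range(n)] for i in range(n)]
    gy = [[f[i][(j + 1) % n] - f[i][j] for j in range(n)] for i in range(n)]
    return gx, gy


def tv(f, kind="iso"):
    """Isotropic (ITV) or anisotropic (ATV) total variation of f."""
    gx, gy = grad_pbc(f)
    pairs = [(a, b) for ra, rb in zip(gx, gy) for a, b in zip(ra, rb)]
    if kind == "iso":
        return sum(math.sqrt(a * a + b * b) for a, b in pairs)
    if kind == "aniso":
        return sum(abs(a) + abs(b) for a, b in pairs)
    raise ValueError(f"unknown TV type: {kind!r}")


f = [[0.0, 1.0], [1.0, 0.0]]
print(tv(f, "aniso"), tv(f, "iso"))  # ATV >= ITV, pixel by pixel
```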

Recently, Selesnick et al. [13] proposed an OGS TV regularizer for one-dimensional signal denoising and applied the MM method to solve their model. Their numerical experiments showed that their method can effectively overcome staircase effects and obtain better results. However, their method is computationally slow and difficult to extend directly to the two-dimensional image case. More recently, Liu et al. [11] proposed an OGS TV regularizer for two-dimensional image denoising and deblurring under Gaussian noise, and Liu et al. [12] proposed an OGS TV regularizer for image deblurring under impulse noise. Both used a variable substitution method and the ADMM framework with inner MM iterations for solving subproblems similar to (3). Therefore, their methods may take more time than ours. Moreover, because the MM method is used in the inner iterations in [11, 12], they can only solve the ATV case with OGS, not the ITV case, while our methods can solve both the ATV and ITV cases.

First, similarly to [11, 12], we define the ATV models with OGS under Gaussian noise and under impulse noise. For the Gaussian noise case,
$$ \min_{f} \|(\nabla_{x} f)\|_{W,2,1} +\|(\nabla_{y} f)\|_{W,2,1} + \frac{\mu}{2} \|Hf-g\|_{2}^{2}. $$
(29)
For the impulse case,
$$ \min_{f} \|(\nabla_{x} f)\|_{W,2,1} +\|(\nabla_{y} f)\|_{W,2,1} + \mu \|Hf-g\|_{1}. $$
(30)

We refer to the former as the TV OGS \(L_{2}\) model and to the latter as the TV OGS \(L_{1}\) model.

Next, we define the ITV models with OGS. For both Gaussian and impulse noise, we replace the first two terms of (29) and (30), respectively, by \(\|A\|_{W,2,1}\). Here, A is a high-dimensional matrix (or tensor) whose entries are \(A_{i,j}=((\nabla_{x} f)_{i,j};(\nabla_{y} f)_{i,j})\).

Remarks 3.

In this work, A can be treated as \((\nabla_{x} f;\nabla_{y} f)\) for simplicity in vector and matrix computations, although \(\|A\|_{W,2,1}\) should be computed pointwise with \(A_{i,j}=((\nabla_{x} f)_{i,j};(\nabla_{y} f)_{i,j})\), since almost all computations on A are pointwise.
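The pointwise treatment in Remark 3 can be made concrete with a short NumPy sketch of \(\|\cdot\|_{W,2,1}\). It assumes odd-sized weight matrices W with each group centred on a pixel and zero boundary conditions (the convention Remark 4 states for gradients); the exact group indexing of Section 3.1 is not reproduced here, and the name `ogs_norm` is ours. For a 2-D input it returns \(\|v\|_{W,2,1}\); for a stacked array A of shape (2, n, n), the squared entries of both gradient components are pooled inside each group, which is the ITV variant.

```python
import numpy as np

def ogs_norm(X, W):
    """Sum over pixels of ||W o X_group||_2, with zero boundary conditions.

    X: (n1, n2) array, or (2, n1, n2) for A = (grad_x f; grad_y f).
    W: (K1, K2) weights with K1, K2 odd; each group is centred on a pixel.
    """
    X = np.asarray(X, dtype=float)
    sq = X**2 if X.ndim == 2 else np.sum(X**2, axis=0)   # ITV: pool both components
    K1, K2 = W.shape
    n1, n2 = sq.shape
    padded = np.pad(sq, ((K1 // 2, K1 // 2), (K2 // 2, K2 // 2)))   # zeros outside (ZBC)
    S = np.zeros_like(sq)
    for a in range(K1):
        for b in range(K2):
            S += W[a, b]**2 * padded[a:a + n1, b:b + n2]   # weighted squared group energy
    return float(np.sum(np.sqrt(S)))

W = np.ones((3, 3))
X = np.zeros((5, 5)); X[2, 2] = 1.0
print(ogs_norm(X, W))          # 9.0: the pixel lies in nine 3 x 3 groups
A = np.stack([3 * X, 4 * X])
print(ogs_norm(A, W))          # 45.0: each of those groups has norm 5
```

With a 1×1 weight the sketch reduces to the pointwise norms, i.e. the ordinary ATV (2-D input) or ITV (stacked input) terms.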

Moreover, we consider the constrained model as in [11, 12, 32]. For any true digital image, each pixel can attain only a finite number of values. Hence, it is natural to require all pixel values of the restored image to lie in a certain interval [a,b]; see [32] for more details. For ease of computation, and following the results in [32], we consider only images with values in the standard range [0,1]. Therefore, we define a projection operator \(\mathcal{P}_{\Omega}\) onto the set \(\Omega=\left\{f\in\mathbb{R}^{n\times n}\mid 0\leq f\leq 1\right\}\),
$$ \mathcal{P}_{\Omega}(f)_{i,j}= \left\{\begin{array}{ll} 0,&\,\,f_{i,j}<0,\\ f_{i,j},&\,\,f_{i,j}\in[0,1],\\ 1,&\,\,f_{i,j}>1. \end{array}\right. $$
(31)
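The projection (31) is a pixelwise clamp, and it is also what makes the u-updates (36) and (44) explicit: the objective \(\frac{\beta}{2}\|u-f-\lambda/\beta\|_{2}^{2}\) is a sum of independent one-dimensional quadratics and \(\Omega\) is a box, so each pixel is minimized by clamping its unconstrained minimizer \(f_{i,j}+\lambda_{i,j}/\beta\) to [0,1]. A minimal NumPy sketch; the function names are ours.

```python
import numpy as np

def project_omega(f, lo=0.0, hi=1.0):
    """P_Omega in (31): clamp every pixel to [lo, hi]."""
    return np.clip(f, lo, hi)

def u_update(f, lam, beta):
    """Minimiser of beta/2 * ||u - f - lam/beta||^2 over u in Omega, as in (36) and (44)."""
    return project_omega(f + lam / beta)

f = np.array([[-0.5, 0.3], [0.9, 2.0]])
print(u_update(f, np.zeros_like(f), beta=20.0))   # clamps -0.5 to 0 and 2.0 to 1
```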

4.1 Constrained TV OGS \(L_{2}\) model

For the constrained model in the ATV case (called CATVOGSL2), we have
$$ {{}\begin{aligned} \min_{u\in\Omega,v_{x},v_{y},f} &\left\{\|(v_{x})\|_{W,2,1} +\|(v_{y})\|_{W,2,1} + \frac{\mu}{2} \|Hf-g\|_{2}^{2}:\right.\\&\left.\quad v_{x}=\nabla_{x} f, v_{y}=\nabla_{y} f,u=f{\vphantom{\frac{\mu}{2}}}\right\}. \end{aligned}} $$
(32)
The augmented Lagrangian function [33] of (32) is
$$ {\begin{aligned} \mathcal{L}\left(v_{x},v_{y},u,f;\lambda_{1},\lambda_{2},\lambda_{3}\right)=&\,\|v_{x}\|_{W,2,1} - {\lambda_{1}^{T}} (v_{x}-\nabla_{x} f) +\frac{\beta_{1}}{2}\|v_{x}-\nabla_{x} f\|_{2}^{2}\\ &+\|v_{y}\|_{W,2,1} - {\lambda_{2}^{T}} (v_{y}-\nabla_{y} f) +\frac{\beta_{1}}{2}\|v_{y}-\nabla_{y} f\|_{2}^{2}\\ &-{\lambda_{3}^{T}} (u -f) + \frac{\beta_{2}}{2}\|u-f\|_{2}^{2} + \frac{\mu}{2} \|Hf-g\|_{2}^{2}, \end{aligned}} $$
(33)

where \(\beta_{1},\beta_{2}>0\) are penalty parameters and \(\lambda_{1},\lambda_{2},\lambda_{3}\in\mathbb{R}^{n^{2}}\) are the Lagrange multipliers.

We solve (32) following the ADMM scheme of Gabay [34]; for applications of ADMM in image processing, see, e.g., [35–43]. Given \(\left(v_{x}^{k},v_{y}^{k},u^{k},f^{k};\lambda_{1}^{k},\lambda_{2}^{k},\lambda_{3}^{k}\right)\), the next iterate \(\left(v_{x}^{k+1},v_{y}^{k+1},u^{k+1},f^{k+1};\lambda_{1}^{k+1},\lambda_{2}^{k+1},\lambda_{3}^{k+1}\right)\) is generated as follows: 1. Fix \(f=f^{k}\), \(\lambda_{1}=\lambda_{1}^{k}\), \(\lambda_{2}=\lambda_{2}^{k}\), \(\lambda_{3}=\lambda_{3}^{k}\), and minimize (33) with respect to \(v_{x}\), \(v_{y}\), and u. With respect to \(v_{x}\) and \(v_{y}\),
$$ {}\begin{array}{rl} v_{x}^{~k+1}&\!\,=\,\arg\min \|v_{x}\|_{W,2,1} \,-\, {{\lambda_{1}^{k}}}^{T} \!(v_{x}\,-\,\nabla_{x} f^{k}) \,+\,\frac{\beta_{1}}{2}\!\|v_{x}\,-\,\nabla_{x} f^{k}\!\|_{2}^{2}\\ &\!\,=\,\arg\min \|v_{x}\|_{W,2,1} \,+\,\frac{\beta_{1}}{2}\|v_{x}-\nabla_{x} f^{k} - \frac{{\lambda_{1}^{k}}}{\beta_{1}}\|_{2}^{2},\\ \end{array} $$
(34)
$$\begin{array}{r@{~}l} {}v_{y}^{~k+1}&\!\,=\,\arg\min \|v_{y}\|_{W,2,1} \,-\, {{\lambda_{2}^{k}}}^{T} \!(v_{y}\,-\,\nabla_{y} f^{k}) \,+\,\frac{\beta_{1}}{2}\|v_{y}\,-\,\nabla_{y} f^{k}\|_{2}^{2}\\ &\!\,=\,\arg\min \|v_{y}\|_{W,2,1} +\frac{\beta_{1}}{2}\|v_{y}-\nabla_{y} f^{k} - \frac{{\lambda_{2}^{k}}}{\beta_{1}}\|_{2}^{2}.\\ \end{array} $$
(35)

Problems (34) and (35) fit the framework of problem (5); thus, their minimizers can be obtained with the formulas in Section 2.2.

With respect to u,
$$\begin{array}{r@{~}l} u^{k+1}&=\arg\min -{{\lambda_{3}^{k}}}^{T} (u -f^{k}) + \frac{\beta_{2}}{2}\|u-f^{k}\|_{2}^{2}\\ &=\arg\min \frac{\beta_{2}}{2}\|u-f^{k} - \frac{{\lambda_{3}^{k}}}{\beta_{2}}\|_{2}^{2}.\\ \end{array} $$
The minimizer is given explicitly by
$$ u^{k+1}=\mathcal{P}_{\Omega}\left[f^{k} + \frac{{\lambda_{3}^{k}}}{\beta_{2}}\right]. $$
(36)
2. Compute \(f^{k+1}\) by solving the normal equation
$$ \begin{array}{l} (\beta_{1} (\nabla_{x}^{*} \nabla_{x}+\nabla_{y}^{*} \nabla_{y})+\mu H^{*} H + \beta_{2} I)f^{k+1} \\ =\nabla_{x}^{*}(\beta_{1} v_{x}^{k+1}-{\lambda^{k}_{1}})+{\nabla_{y}}^{*}(\beta_{1} v_{y}^{k+1}-{\lambda_{2}^{k}})+\mu H^{*} g \\ \quad+ \beta_{2} (u^{k+1} - \frac{{\lambda_{3}^{k}}}{\beta_{2}}),\\ \end{array} $$
(37)
where “*” denotes the conjugate transpose; see [44] for more details. Since all the parameters are positive, the coefficient matrix in (37) is always invertible and symmetric positive definite. In addition, H, \(\nabla_{x}\), \(\nabla_{y}\), and their conjugate transposes have BCCB structure under PBC. Since BCCB matrices are diagonalized by the two-dimensional discrete Fourier transform, (37) can be solved efficiently with fast Fourier transforms. 3. Update the multipliers via
$$ \left\{ \begin{array}{*{20}{l@{~}}} \lambda_{1}^{k+1}&=&{\lambda_{1}^{k}} - \gamma\beta_{1}(v_{x}^{k+1}-\nabla_{x} f^{k+1}),\\ \lambda_{2}^{k+1}&=&{\lambda_{2}^{k}} - \gamma\beta_{1}(v_{y}^{k+1}-\nabla_{y} f^{k+1}),\\ \lambda_{3}^{k+1}&=&{\lambda_{3}^{k}} - \gamma\beta_{2}(u^{k+1} -f^{k+1}).\\ \end{array} \right. $$
(38)
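Step 2 above can be sketched in NumPy as follows: every operator in (37) is BCCB under PBC, so the 2-D FFT diagonalizes the coefficient matrix and \(f^{k+1}\) costs a few FFTs. The sketch assumes an odd-sized PSF centred in its array (as with fspecial kernels); `psf2otf` is our own helper modelled on the MATLAB function of the same name, and `apply_lhs` applies the same operator in the spatial domain purely to check the solver. Replacing μ by \(\beta_{2}\) and the identity weight \(\beta_{2}\) by \(\beta_{3}\) gives the coefficient matrix of (45).

```python
import numpy as np

def dx(f):  return np.roll(f, -1, axis=0) - f   # periodic forward difference, rows
def dy(f):  return np.roll(f, -1, axis=1) - f   # periodic forward difference, columns
def dxt(v): return np.roll(v, 1, axis=0) - v    # adjoint of dx
def dyt(v): return np.roll(v, 1, axis=1) - v    # adjoint of dy

def psf2otf(psf, shape):
    """FFT eigenvalues of circular convolution with an odd-sized, centred PSF."""
    pad = np.zeros(shape)
    pad[:psf.shape[0], :psf.shape[1]] = psf
    pad = np.roll(pad, (-(psf.shape[0] // 2), -(psf.shape[1] // 2)), axis=(0, 1))
    return np.fft.fft2(pad)

def solve_f(rhs, psf, mu, beta1, beta2):
    """Solve (beta1 (dx*dx + dy*dy) + mu H*H + beta2 I) f = rhs with FFTs (PBC)."""
    delta = np.zeros(rhs.shape)
    delta[0, 0] = 1.0
    Dx, Dy = np.fft.fft2(dx(delta)), np.fft.fft2(dy(delta))   # eigenvalues of dx, dy
    Hh = psf2otf(psf, rhs.shape)                               # eigenvalues of H
    denom = beta1 * (np.abs(Dx)**2 + np.abs(Dy)**2) + mu * np.abs(Hh)**2 + beta2
    return np.real(np.fft.ifft2(np.fft.fft2(rhs) / denom))

def conv_circ(f, psf, adjoint=False):
    """Spatial-domain H f (or H^* f) via explicit circular shifts; for checking only."""
    c1, c2 = psf.shape[0] // 2, psf.shape[1] // 2
    s = -1 if adjoint else 1
    out = np.zeros(f.shape)
    for a in range(psf.shape[0]):
        for b in range(psf.shape[1]):
            out += psf[a, b] * np.roll(f, (s * (a - c1), s * (b - c2)), axis=(0, 1))
    return out

def apply_lhs(f, psf, mu, beta1, beta2):
    """Left-hand-side operator of (37) applied in the spatial domain."""
    Hf = conv_circ(f, psf)
    return beta1 * (dxt(dx(f)) + dyt(dy(f))) + mu * conv_circ(Hf, psf, adjoint=True) + beta2 * f

rng = np.random.default_rng(0)
f_true = rng.random((16, 16))
psf = np.ones((3, 3)) / 9.0                      # small average blur for the check
rhs = apply_lhs(f_true, psf, mu=1e3, beta1=35.0, beta2=20.0)
print(np.max(np.abs(solve_f(rhs, psf, 1e3, 35.0, 20.0) - f_true)))   # round-off level
```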

Based on the discussions above, we present the ADMM algorithm for solving the convex CATVOGSL2 model (32), which is shown as Algorithm 3.

Algorithm 3 is an application of ADMM with two blocks of variables, \((v_{x},v_{y},u)\) and f. Thus, if step 1 of Algorithm 3 were solved exactly, its convergence would be guaranteed by the theory of ADMM [34, 35, 37, 43]. Although step 1 cannot be solved exactly, convergence can still be ensured by requiring the subproblem errors to form a convergent series, as in [35]. In addition, our numerical experiments confirm the convergence of Algorithm 3.

Remarks 4.

For the image f, we use PBC for fast computation. For \(v_{x}\) and \(v_{y}\), however, we use ZBC: since \(v_{x}\) and \(v_{y}\) represent the image gradient, ZBC seems more suitable for defining the generalized norm \(\|\cdot\|_{2,1}\) on them, as mentioned in Section 3.1. Hence, the BCs for the image and for its gradient are different and independent. These conventions hold throughout the paper unless otherwise specified.
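Putting steps 1–3 together, Algorithm 3 can be sketched in NumPy as below. This is a sketch under stated assumptions, not the authors' code: the explicit OGS formulas of Section 2.2 are not reproduced in this section, so the hypothetical helper `ogs_prox_mm` stands in for them with five MM inner steps on all-ones 3×3 groups (the kind of inner iteration used in [11]); the gradients use PBC while the groups inside the OGS norm use ZBC, following Remark 4; and the loop stops on the relative objective change of (50). All function names are ours.

```python
import numpy as np

def dx(f):  return np.roll(f, -1, axis=0) - f    # PBC forward differences
def dy(f):  return np.roll(f, -1, axis=1) - f
def dxt(v): return np.roll(v, 1, axis=0) - v     # their adjoints
def dyt(v): return np.roll(v, 1, axis=1) - v

def psf2otf(psf, shape):
    pad = np.zeros(shape)
    pad[:psf.shape[0], :psf.shape[1]] = psf
    return np.fft.fft2(np.roll(pad, (-(psf.shape[0] // 2), -(psf.shape[1] // 2)), axis=(0, 1)))

def box3(x):
    """Sum over each pixel's 3x3 neighbourhood, zero outside the image (ZBC)."""
    p = np.pad(x, 1)
    return sum(p[a:a + x.shape[0], b:b + x.shape[1]] for a in range(3) for b in range(3))

def ogs_prox_mm(b, beta, iters=5, eps=1e-12):
    """STAND-IN for the explicit formulas: MM steps for
    min_v ||v||_{2,1} (3x3 all-ones groups) + beta/2 ||v - b||^2."""
    v = b.copy()
    for _ in range(iters):
        r = box3(1.0 / np.sqrt(box3(v**2) + eps))   # sum of 1/||group|| over groups holding each pixel
        v = b / (1.0 + r / beta)
    return v

def catvogsl2(g, psf, mu, beta1=35.0, beta2=20.0, gamma=1.618, max_iter=200, tol=1e-5):
    g = np.asarray(g, dtype=float)
    Hh = psf2otf(psf, g.shape)
    delta = np.zeros(g.shape); delta[0, 0] = 1.0
    denom = (beta1 * (np.abs(np.fft.fft2(dx(delta)))**2 + np.abs(np.fft.fft2(dy(delta)))**2)
             + mu * np.abs(Hh)**2 + beta2)                       # eigenvalues of the matrix in (37)
    Htg = np.real(np.fft.ifft2(np.conj(Hh) * np.fft.fft2(g)))   # H^* g

    def objective(f):                                            # model (29)
        Hf = np.real(np.fft.ifft2(Hh * np.fft.fft2(f)))
        return (np.sum(np.sqrt(box3(dx(f)**2))) + np.sum(np.sqrt(box3(dy(f)**2)))
                + 0.5 * mu * np.sum((Hf - g)**2))

    f = g.copy()
    l1, l2, l3 = np.zeros_like(g), np.zeros_like(g), np.zeros_like(g)
    F_old = objective(f)
    for _ in range(max_iter):
        vx = ogs_prox_mm(dx(f) + l1 / beta1, beta1)              # (34)
        vy = ogs_prox_mm(dy(f) + l2 / beta1, beta1)              # (35)
        u = np.clip(f + l3 / beta2, 0.0, 1.0)                    # (36)
        rhs = dxt(beta1 * vx - l1) + dyt(beta1 * vy - l2) + mu * Htg + beta2 * u - l3
        f = np.real(np.fft.ifft2(np.fft.fft2(rhs) / denom))      # (37)
        l1 -= gamma * beta1 * (vx - dx(f))                        # (38)
        l2 -= gamma * beta1 * (vy - dy(f))
        l3 -= gamma * beta2 * (u - f)
        F_new = objective(f)
        if abs(F_new - F_old) / max(abs(F_old), 1e-12) < tol:     # stopping rule (50)
            break
        F_old = F_new
    return f
```

For deblurring, pass a blur kernel such as the 9×9 average PSF as `psf`; a 1×1 PSF equal to 1 reduces the model to denoising. Replacing `ogs_prox_mm` with an implementation of the explicit formulas of Section 2.2 would remove the inner loop.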

For the ITV case, the constrained model (CITVOGSL2) is
$$ {}\min_{u\in\Omega,A,f} \left\{\|A\|_{W,2,1} \,+\, \frac{\mu}{2} \|Hf\,-\,g\|_{2}^{2}:\ A\,=\, (\nabla_{x} f;\nabla_{y} f),u=f\right\}. $$
(39)

The details are analogous to those presented in the next section for the constrained TV OGS \(L_{1}\) model. We call the corresponding algorithm CITVOGSL2.

4.2 Constrained TV OGS \(L_{1}\) model

For the constrained model in the ITV case (called CITVOGSL1), we have
$$ {}\min_{u\in\Omega,A,f}\! \left\{\!\|A\|_{W,2,1} \!\,+\,\! {\mu} \|z\|_{1}\!:z\,=\,Hf\,-\,g, A\,=\,(\nabla_{x} f;\nabla_{y} f),u\,=\,f \right\}. $$
(40)
The augmented Lagrangian function of (40) is
$$ {\begin{aligned} \mathcal{L}\left(A,z,u,f;\lambda_{1},\lambda_{2},\lambda_{3},\lambda_{4}\right)=\,&\|A\|_{W,2,1} - \left({\lambda_{1}^{T}},{\lambda_{2}^{T}}\right)\! \left(A-\left[ \begin{array}{c} \nabla_{x} f\\ \nabla_{y} f\\ \end{array}\right]\right)\\ & +\frac{\beta_{1}}{2}\left\|A-\left[ \begin{array}{c} \nabla_{x} f\\ \nabla_{y} f\\ \end{array}\right]\right\|_{2}^{2}\\ &+\mu \|z\|_{1} - {\lambda_{3}^{T}} \left(z-(Hf - g)\right) \\ &+\frac{\beta_{2}}{2}\|z-(Hf-g)\|_{2}^{2}\\ &-{\lambda_{4}^{T}} (u -f) + \frac{\beta_{3}}{2}\|u-f\|_{2}^{2}, \end{aligned}} $$
(41)

where \(\beta_{1},\beta_{2},\beta_{3}>0\) are penalty parameters and \(\lambda_{1},\lambda_{2},\lambda_{3},\lambda_{4}\in\mathbb{R}^{n^{2}}\) are the Lagrange multipliers.

Given \(\left(A^{k},z^{k},u^{k},f^{k};\lambda_{1}^{k},\lambda_{2}^{k},\lambda_{3}^{k},\lambda_{4}^{k}\right)\), the next iterate \(\left(A^{k+1},z^{k+1},u^{k+1},f^{k+1};\lambda_{1}^{k+1},\lambda_{2}^{k+1},\lambda_{3}^{k+1},\lambda_{4}^{k+1}\right)\) is generated as follows:

1. Fix \(f=f^{k}\), \(\lambda_{1}=\lambda_{1}^{k}\), \(\lambda_{2}=\lambda_{2}^{k}\), \(\lambda_{3}=\lambda_{3}^{k}\), \(\lambda_{4}=\lambda_{4}^{k}\), and minimize (41) with respect to A, z, and u. With respect to A,
$$ {\begin{array}{r@{~}l} A^{k+1}&=\arg\min \|A\|_{W,2,1} - \left({\lambda_{1}^{k}}^{T},{\lambda_{2}^{k}}^{T}\right) (A-\left[\begin{matrix}\nabla_{x} f^{k}\\ \nabla_{y} f^{k}\\ \end{matrix}\right]) +\frac{\beta_{1}}{2} \left\|A-\left[\begin{matrix}\nabla_{x} f^{k}\\ \nabla_{y} f^{k}\\ \end{matrix}\right]\right\|_{2}^{2}\!,\\ \left[\begin{matrix}A_{1}^{k+1}\\ A_{2}^{k+1}\\ \end{matrix}\right]&=\arg\min \|A\|_{W,2,1} +\frac{\beta_{1}}{2}\left\|\left[\begin{matrix}A_{1}\\ A_{2}\\ \end{matrix}\right]-\left[\begin{matrix}\nabla_{x} f^{k}\\ \nabla_{y} f^{k}\\ \end{matrix}\right]- \left[\begin{matrix}{{\lambda_{1}^{k}}}/{\beta_{1}}\\ {{\lambda_{2}^{k}}}/{\beta_{1}}\\ \end{matrix}\right]\right\|_{2}^{2}.\\ \end{array}} $$
(42)

Problem (42) fits the framework of problem (5); thus, its solution can be obtained with the formulas in Section 2.2.

With respect to z,
$${\begin{array}{r@{~}l} z^{k+1}&=\arg\min\mu \|z\|_{1} - {{\lambda_{3}^{k}}}^{T} \left(z-(Hf^{k} - g)\right) + \frac{\beta_{2}}{2}\|z-(Hf^{k}-g)\|_{2}^{2}\\ &=\arg\min\mu \|z\|_{1} + \frac{\beta_{2}}{2}\|z-(Hf^{k}-g) - \frac{{\lambda_{3}^{k}}}{\beta_{2}}\|_{2}^{2}.\\ \end{array}} $$
The minimizer with respect to z is given explicitly by (6) and (9), that is,
$$ {}z^{k+1}\,=\,\text{sgn}\!\left\{Hf^{k} \!- g \,+\,\frac{{\lambda_{3}^{k}}}{\beta_{2}}\right\}\circ \max\!\left\{\!|Hf^{k} \!-g\! +\!\frac{{\lambda_{3}^{k}}}{\beta_{2}}|-\frac{\mu}{\beta_{2}},0\right\}\!. $$
(43)
With respect to u,
$$\begin{array}{r@{~}l} u^{k+1}&=\arg\min -{{\lambda_{4}^{k}}}^{T} \left(u -f^{k}\right) + \frac{\beta_{3}}{2}\|u-f^{k}\|_{2}^{2}\\ &=\arg\min \frac{\beta_{3}}{2}\|u-f^{k} - \frac{{\lambda_{4}^{k}}}{\beta_{3}}\|_{2}^{2}.\\ \end{array} $$
The minimizer is given explicitly by
$$ u^{k+1}=\mathcal{P}_{\Omega}\left[f^{k} + \frac{{\lambda_{4}^{k}}}{\beta_{3}}\right]. $$
(44)
2. Compute \(f^{k+1}\) by solving the following normal equation, as in the previous section.
$$ \begin{array}{l} (\beta_{1} (\nabla_{x}^{*} \nabla_{x}+\nabla_{y}^{*} \nabla_{y})+\beta_{2} H^{*} H + \beta_{3} I)f^{k+1} \\ =\nabla_{x}^{*}(\beta_{1} A_{1}^{k+1}-{\lambda^{k}_{1}})+{\nabla_{y}}^{*}(\beta_{1} A_{2}^{k+1}-{\lambda_{2}^{k}})+\\ H^{*} (\beta_{2} z^{k+1} -{\lambda_{3}^{k}}) +\beta_{2} H^{*} g + \beta_{3} (u^{k+1} - \frac{{\lambda_{4}^{k}}}{\beta_{3}}).\\ \end{array} $$
(45)
3. Update the multipliers via
$$ \left\{ \begin{array}{*{20}{l@{~}}} \left[\begin{matrix}\lambda_{1}^{k+1}\\ \lambda_{2}^{k+1}\\ \end{matrix}\right] &=&\left[\begin{matrix}{\lambda_{1}^{k}}\\ {\lambda_{2}^{k}}\\ \end{matrix}\right]- \gamma\beta_{1} \left(\left[\begin{matrix}A_{1}^{k+1}\\ A_{2}^{k+1}\\ \end{matrix}\right]- \left[\begin{matrix}\nabla_{x} f^{k+1}\\ \nabla_{y} f^{k+1}\\ \end{matrix}\right]\right),\\ \lambda_{3}^{k+1}&=&{\lambda_{3}^{k}} - \gamma\beta_{2}(z^{k+1}-(Hf^{k+1}-g)),\\ \lambda_{4}^{k+1}&=&{\lambda_{4}^{k}} - \gamma\beta_{3}(u^{k+1} -f^{k+1}).\\ \end{array} \right. $$
(46)
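The z-update (43) in step 1 above is componentwise soft thresholding with threshold \(\mu/\beta_{2}\). A minimal NumPy sketch, assuming \(Hf^{k}\) has already been computed (for instance with FFTs, as for the normal equations); `shrink` and `z_update` are our names.

```python
import numpy as np

def shrink(b, t):
    """Soft thresholding: sgn(b) * max(|b| - t, 0), applied componentwise."""
    return np.sign(b) * np.maximum(np.abs(b) - t, 0.0)

def z_update(Hf, g, lam3, mu, beta2):
    """(43): minimiser of mu ||z||_1 + beta2/2 ||z - (Hf - g) - lam3/beta2||^2."""
    return shrink(Hf - g + lam3 / beta2, mu / beta2)

print(shrink(np.array([-2.0, 0.5, 3.0]), 1.0))   # entries -1, 0, 2
```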

Based on the discussions above, we present the ADMM algorithm for solving the convex CITVOGSL1 model (40), which is shown as Algorithm 4.

Algorithm 4 is an application of ADMM with two blocks of variables, (A,z,u) and f. Thus, if step 1 of Algorithm 4 were solved exactly, its convergence would be guaranteed by the theory of ADMM [34, 35, 37, 43]. Although step 1 cannot be solved exactly, convergence can still be ensured by requiring the subproblem errors to form a convergent series, as in [35]. In addition, our numerical experiments confirm the convergence of Algorithm 4.

For the ATV case, the constrained model (CATVOGSL1) is
$$ \min_{u\in\Omega,v_{x},v_{y},f} \left\{\begin{array}{l}\|(v_{x})\|_{W,2,1} +\|(v_{y})\|_{W,2,1} + {\mu} \|z\|_{1}:\\ z=Hf-g, v_{x}=\nabla_{x} f, v_{y}=\nabla_{y} f,u=f\end{array}\right\}. $$
(47)

The details are analogous to those presented in the previous section for the constrained TV OGS \(L_{2}\) model. We call the corresponding algorithm CATVOGSL1.

5 Numerical results

In this section, we present several numerical results to illustrate the performance of the proposed method. All experiments are carried out in MATLAB 2010a on a desktop computer equipped with an Intel Core i3-2130 CPU (3.4 GHz), 3.4 GB of RAM, and a 32-bit Windows 7 operating system.

5.1 Comparison with the MM method for one-dimensional signal denoising

As an illustrative example, we apply our direct formulas to one-dimensional signal denoising, comparing the results of our explicit shrinkage formulas only with the recent MM iteration method proposed in [10]. The one-dimensional group-sparse signal z is shown in the top left of Fig. 3. The noisy signal x in the top right of Fig. 3 is obtained by adding independent white Gaussian noise with standard deviation σ=0.5, as in [10]. The denoising model is
$$ \arg\min_{\mathbf{z}} \quad \text{Fun}(\mathbf{z}) = \|\mathbf{z}\|_{2,1}+ \frac{\beta}{2}\|\mathbf{z} -\mathbf{x}\|_{2}^{2}. $$
(48)
Fig. 3

Comparison between our method and the MM method for one-dimensional signal denoising. From the second row to the bottom row: results of the MM method, our Formula (I), and our Formula (II), the absolute error, and the function value history. From left to right, β=3, 10, 15, respectively

We tested many values of β (the integers from 1 to 50) in this comparison. For almost all of them, the results of our formulas are nearly identical to those of the MM method. Due to limited space, Fig. 3 shows only part of the results (β=3, 10, 15), using both our Formulas (I) and (II). The figure shows that, for each β, the results of both formulas are visually almost identical to those of the MM method. Moreover, in this practical setting, β=10 appears better than the other values shown. These results show that our formulas are feasible and effective: our method costs the same as a single iteration of the MM method, whereas the MM method uses 25 iterations, as in [10], so our method is about 25 times faster. The bottom row of Fig. 3 shows that the MM method may need about 5 iterations to converge, which explains why Liu et al. [11] and Liu et al. [12] use only 5 inner iterations. Even in that case, our method is still about 5 times faster than the MM method.
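For reference, an MM baseline of the kind compared above can be sketched as follows. Each step minimizes the standard quadratic majorizer \(\|v\|_{2}\leq\frac{\|v\|_{2}^{2}}{2\|u\|_{2}}+\frac{\|u\|_{2}}{2}\) of every group norm, which turns the update into a pointwise rescaling of x. This is a generic MM scheme under our own assumptions (groups of K consecutive samples, every group overlapping the signal included, zeros outside, hypothetical K=3); we do not claim it reproduces [10] line by line, and the paper's explicit formulas are not shown here. The function records Fun(z) from (48), which MM should never increase.

```python
import numpy as np

def ogs_mm_1d(x, beta, K=3, iters=25, eps=1e-12):
    """MM iterations for min_z sum_groups ||z_group||_2 + beta/2 ||z - x||^2,
    with groups of K consecutive samples and zeros outside the signal."""
    h = np.ones(K)

    def fun(z):   # objective (48) with the group structure described above
        return np.sum(np.sqrt(np.convolve(z**2, h))) + 0.5 * beta * np.sum((z - x)**2)

    z = x.copy()
    history = [fun(z)]
    for _ in range(iters):
        group_norms = np.sqrt(np.convolve(z**2, h) + eps)      # one entry per group ('full')
        r = np.convolve(1.0 / group_norms, h, mode="valid")     # sum over groups holding each sample
        z = x / (1.0 + r / beta)                                # exact minimiser of the majoriser
        history.append(fun(z))
    return z, history

rng = np.random.default_rng(0)
x = np.zeros(100); x[20:30] = 2.0; x[60:65] = -3.0
x_noisy = x + 0.5 * rng.standard_normal(100)
z, hist = ogs_mm_1d(x_noisy, beta=10.0)
print(hist[0], hist[-1])   # the objective decreases from the first to the last value
```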

Remarks 5.

Although this model is not the best one for signal denoising, the experiment demonstrates the advantage of our method. Similar experiments were also carried out in [10].

5.2 Comparison with the MM method on problem (5)

In this section, we give another comparison with the MM method by solving the matrix case of problem (5). Without loss of generality, we let A=rand(100,100) and set A(45:55,45:55)=0 (MATLAB notation). In particular, we set the weight matrix \(W_{g}\in\mathbb{R}^{3\times 3}\) with \((W_{g})_{i,j}\equiv 1\) and restate (5):
$$ f_{A}(X) = \min_{X} \, \text{Fun}(X)=\ \|X\|_{2,1}+ \frac{\beta}{2}\|X -A\|_{F}^{2}. $$
(49)
We test several values of β and run the MM method for 20 iterations. For comparison, we replicate the one-step result of our explicit shrinkage formulas over 20 iterations, the same number as the MM iterations. The values of the objective function Fun(X) versus iterations are shown in the top row of Fig. 4, and a cross-section of the minimizer X is shown in the bottom row. We use both ZBC and PBC for our method and for the MM method; there is only a small difference between the two kinds of BCs.
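The matrix experiment can be imitated with the MM sketch below: A is random with an 11×11 zero block (0-based slice 44:55, matching MATLAB's 45:55), the groups are 3×3 with unit weights, and the MM method runs 20 steps under either ZBC or PBC. As before, this is an MM baseline built on our own assumption about the group indexing (one group centred on each pixel), not the paper's explicit formula; `box3`, `fun`, and `mm_2d` are our names.

```python
import numpy as np

def box3(x, bc="zbc"):
    """Sum over each pixel's 3x3 neighbourhood under zero (ZBC) or periodic (PBC) BCs."""
    if bc == "pbc":
        return sum(np.roll(x, (a, b), axis=(0, 1)) for a in (-1, 0, 1) for b in (-1, 0, 1))
    p = np.pad(x, 1)
    return sum(p[a:a + x.shape[0], b:b + x.shape[1]] for a in range(3) for b in range(3))

def fun(X, A, beta, bc):
    """Objective (49): ||X||_{2,1} with 3x3 all-ones groups + beta/2 ||X - A||_F^2."""
    return np.sum(np.sqrt(box3(X**2, bc))) + 0.5 * beta * np.sum((X - A)**2)

def mm_2d(A, beta, bc="zbc", iters=20, eps=1e-12):
    X = A.copy()
    history = [fun(X, A, beta, bc)]
    for _ in range(iters):
        # groups are symmetric, so "groups containing a pixel" is the same 3x3 box sum
        r = box3(1.0 / np.sqrt(box3(X**2, bc) + eps), bc)
        X = A / (1.0 + r / beta)
        history.append(fun(X, A, beta, bc))
    return X, history

rng = np.random.default_rng(0)
A = rng.random((100, 100))
A[44:55, 44:55] = 0.0
for bc in ("zbc", "pbc"):
    X, hist = mm_2d(A, beta=10.0, bc=bc)
    print(bc, hist[0], hist[-1])
```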
Fig. 4

Comparison between our method and the MM method for two kinds of BCs, ZBC and PBC. Top row: the function value history. Bottom row: a cross-section of the final minimizer X. From left to right, β=1, 5, 10, 20, 30

Our method is much more efficient than the MM method because, thanks to the explicit shrinkage formula, its cost equals that of a single MM iteration. The top row of Fig. 4 shows that the MM method also converges quickly, needing fewer than 20 iterations (sometimes only 5). The relative error between the final function values of the two methods is below 1 % for all tested β. The bottom row of Fig. 4 shows that when β≤1 (sufficiently small) or β≥30 (sufficiently large), our result is almost identical to the final result of the MM method. When 1<β<30, the minimizer computed by our method approximates that of the MM method, both in function value and in the minimizer X.

In Table 1, we compare our method and the MM method numerically using three measures: the relative error of the function value (\(\text{ReE}_{f_{A}}\)), the relative error of the minimizer X (\(\text{ReE}_{X}\)), and the mean absolute error of X (\(\text{MAE}_{X}\)). They are defined by \(\text{ReE}_{f_{A}}=\frac{|{f_{A}}_{\text{MM}}-{f_{A}}_{\text{ours}}|}{{f_{A}}_{\text{ours}}}\), \(\text{ReE}_{X}=\frac{\|X_{\text{MM}}-X_{\text{ours}}\|_{F}}{\|X_{\text{ours}}\|_{F}}\), and \(\text{MAE}_{X}=\text{mean}\left(|X_{\text{MM}}-X_{\text{ours}}|\right)\), where \(X_{\text{MM}}\) and \(X_{\text{ours}}\) are the final solutions X of the MM method and our method, respectively, and \({f_{A}}_{\text{MM}}\) and \({f_{A}}_{\text{ours}}\) are the corresponding final function values.
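The three measures can be computed directly from their definitions; in this sketch `X_mm`, `X_ours` stand for the final minimizers and `f_mm`, `f_ours` for the final values of Fun. The function names are ours.

```python
import numpy as np

def ree_f(f_mm, f_ours):
    """Relative error of the final function value."""
    return abs(f_mm - f_ours) / f_ours

def ree_x(X_mm, X_ours):
    """Relative error of the minimiser (Frobenius norms)."""
    return np.linalg.norm(X_mm - X_ours) / np.linalg.norm(X_ours)

def mae_x(X_mm, X_ours):
    """Mean absolute error between the two minimisers."""
    return float(np.mean(np.abs(X_mm - X_ours)))

X_ours = np.ones((2, 2))
X_mm = X_ours + np.array([[0.1, 0.0], [0.0, 0.0]])
print(ree_x(X_mm, X_ours), mae_x(X_mm, X_ours))   # about 0.05 and 0.025
```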
Table 1

Numerical comparison of our method and the MM method for two kinds of BCs, ZBC and PBC

BC

β=

1

5

7

10

15

20

30

50

ZBC

\(\text {ReE}_{f_{A}}\)

5.9e-14

0.0055

0.0028

6.5e-4

1.4e-4

5.4e-5

1.4e-5

2.6e-6

 

ReE X

0.0207

0.0017

1.9e-4

4.7e-5

7.5e-6

7.8e-7

 

MAE X

3.4e-15

0.0094

0.0130

0.0067

0.0029

0.0016

6.9e-4

2.4e-4

PBC

\(\text {ReE}_{f_{A}}\)

6.5e-14

0.0053

0.0025

4.1e-4

5.5e-5

1.1e-5

6.3e-7

1.3e-6

 

ReE X

0.0193

0.0014

1.3e-4

2.9e-5

4.3e-6

4.5e-7

 

MAE X

3.8e-15

0.0097

0.0130

0.0063

0.0026

0.0014

5.9e-4

2.1e-4

\(\text{ReE}_{f_{A}}\) denotes the relative error of the final function value Fun(X), \(\text{ReE}_{X}\) the relative error of the minimizer X, and \(\text{MAE}_{X}\) the mean absolute error of X

Table 1 and Fig. 4 show that our formula gives almost the same results as the MM method when β is sufficiently large, and results close to those of the MM method for other β. This is further evidence of the feasibility of our formula.

In addition to the weight matrix \(W_{g}\in\mathbb{R}^{3\times 3}\) with \((W_{g})_{i,j}\equiv 1\), we tested other weight matrices in more than 1000 runs. For example, with A=rand(100,100) and A(A>=0.5)=0 (or A(A<=0.5)=1), so that every element lies in [0,1], we find that, in general, \(\beta\leq\frac{\|W_{g}\|_{2}}{\sqrt{s}}\) is sufficiently small and \(\beta\geq 30\cdot\frac{\|W_{g}\|_{2}}{\sqrt{s}}\) is sufficiently large. For \(W_{g}\in\mathbb{R}^{3\times 3}\) with \((W_{g})_{i,j}\equiv 1\), this recovers the earlier observation that β≥30 is sufficiently large. In practice, however, if β is too large, then X≈A and the minimization problem (49) becomes meaningless. Our experiments (more than 1000 tests) indicate that, when every element of A is in [0,1], \(30\cdot\frac{\|W_{g}\|_{2}}{\sqrt{s}}\leq\beta\leq 300\cdot\frac{\|W_{g}\|_{2}}{\sqrt{s}}\) is sufficiently large without making the problem meaningless; within this range, the parameter can be tuned in practice. When some elements of A are not in [0,1], we can first project or rescale A into [0,1] and then choose β. Moreover, even when β is chosen outside this range, the result can still be regarded as an approximate solution with a small error. Hence, our formulas are feasible and effective.
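The empirical rule above can be turned into a small helper. We assume that \(\|W_{g}\|_{2}\) is the Euclidean norm of all the weights taken as one vector (for the all-ones 3×3 matrix this coincides with the spectral norm; both equal 3) and that s is the number of entries in a group; both readings are our assumptions. Under them, the all-ones 3×3 case reproduces the thresholds β≤1 and β≥30 of Section 5.2, and multiplying every weight by a constant c multiplies the suggested range of β by the same c.

```python
import numpy as np

def beta_thresholds(W):
    """Return (small, large_low, large_high) = (c, 30c, 300c) with c = ||W||_2 / sqrt(s)."""
    W = np.asarray(W, dtype=float)
    c = float(np.linalg.norm(W.ravel()) / np.sqrt(W.size))   # s = number of weights per group
    return c, 30 * c, 300 * c

print(beta_thresholds(np.ones((3, 3))))       # c = 1: beta <= 1 is small, 30 <= beta <= 300
print(beta_thresholds(2 * np.ones((3, 3))))   # doubling every weight doubles all three values
```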

5.3 Comparison with TV methods and TV with OGS with inner iteration MM methods for image deblurring and denoising

In this section, we compare all our algorithms with other methods. The test images, shown in Fig. 5, are one 1024×1024 image, (a) Man, and three 512×512 images, (b) Car, (c) Parlor, and (d) Housepole.
Fig. 5

Original images. The images used in this paper are special-format versions converted with Photoshop from the sources downloaded from http://decsai.ugr.es/cvg/dbimagenes/. a Man. b Car. c Parlor. d Housepole

The quality of the restoration results is measured quantitatively by the peak signal-to-noise ratio (PSNR) in decibels (dB) and the relative error (ReE):
$$\textrm{PSNR} = 10\log_{10} \frac{n^{2} \text{Max}_{I}^{2}}{\|f-\bar{f}\|_{2}^{2}},\quad \textrm{ReE} = \frac{\|f -\bar{f}\|_{2}}{\|\bar{f}\|_{2}}, $$
where \(\bar{f}\) and f denote the original and restored images, respectively, and \(\text{Max}_{I}\) is the maximum possible pixel value of the image. In our experiments, \(\text{Max}_{I}=1\).
As in the other methods, we use the stopping criterion
$$ \frac{|\mathcal{F}^{k+1}-\mathcal{F}^{k}|}{|\mathcal{F}^{k}|}< 10^{-5}, $$
(50)

where \(\mathcal {F}^{k}\) is the objective function value of the respective model in the kth iteration.
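For reference, the two quality measures and the stopping rule (50) take a few lines of NumPy; the names are ours, images are arrays with values in [0,1], and \(\text{Max}_{I}=1\) as in the experiments. The number of pixels `f_bar.size` plays the role of \(n^{2}\).

```python
import numpy as np

def psnr(f, f_bar, max_i=1.0):
    """PSNR in dB; f_bar.size is the number of pixels n^2."""
    return 10.0 * np.log10(f_bar.size * max_i**2 / np.sum((f - f_bar)**2))

def rel_err(f, f_bar):
    """ReE = ||f - f_bar||_2 / ||f_bar||_2."""
    return np.linalg.norm(f - f_bar) / np.linalg.norm(f_bar)

def should_stop(F_new, F_old, tol=1e-5):
    """Criterion (50): relative change of the objective value."""
    return abs(F_new - F_old) / abs(F_old) < tol

f_bar = np.full((4, 4), 0.5)
f = f_bar + 0.1
print(psnr(f, f_bar), rel_err(f, f_bar))   # about 20 dB and 0.2
```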

We compare our methods with other methods, namely Chan et al.'s TV method proposed in [32], Liu et al.'s method proposed in [11], and Liu et al.'s method proposed in [12]. The latter two methods [11, 12] use inner MM iterations for OGS TV problems, with the number of inner iterations set to 5 by their authors. For a fair comparison, we set the weight matrix \(W_{g}\in\mathbb{R}^{3\times 3}\) with \((W_{g})_{i,j}\equiv 1\) in all experiments with our methods, as in [11, 12].

5.3.1 Experiments for the constrained TV OGS \(L_{2}\) model

In this section, we compare our methods (CATVOGSL2 and CITVOGSL2) with some other methods, such as Chan et al.’s method proposed in [32] (Algorithm 1 in [32] for the constrained TV-L2 model) and Liu et al.’s method proposed in [11].

We set the penalty parameters \(\beta_{1}=35\), \(\beta_{2}=20\) for the ATV case and \(\beta_{1}=100\), \(\beta_{2}=20\) for the ITV case, and the relaxation parameter γ=1.618. The 9×9 average blur kernel is generated by the MATLAB built-in function fspecial('average',9). All blurring is generated with the MATLAB built-in function imfilter(I,psf,'circular','conv') under PBC, where I is the original image and psf is the blur kernel. We first blurred images (a)–(c) with this average blur and then corrupted them with zero-mean Gaussian noise with BSNR=40. The BSNR is given by
$$ \textrm{BSNR} = 10\log_{10}\frac{\textmd{variance of } g}{\textmd{variance of }\eta}, $$
where g and η are the observed image and the noise, respectively.
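A NumPy analogue of this degradation is sketched below. `average_psf` builds the kernel that fspecial('average',9) produces, and `blur_circular` performs circular convolution through the FFT, which for an odd-sized, centred kernel should correspond to imfilter(I,psf,'circular','conv'); these are our stand-ins, not MATLAB code. For the noise level we divide the variance of the noise-free blurred image by \(10^{\text{BSNR}/10}\), the usual convention; at BSNR=40 the noise variance is \(10^{-4}\) of the signal variance, so using the observed image g instead, as in the formula above, changes σ only negligibly.

```python
import numpy as np

def average_psf(k):
    """k x k averaging kernel, like fspecial('average', k)."""
    return np.ones((k, k)) / k**2

def blur_circular(img, psf):
    """Circular (periodic) convolution of img with an odd-sized, centred psf via the FFT."""
    pad = np.zeros(img.shape)
    pad[:psf.shape[0], :psf.shape[1]] = psf
    otf = np.fft.fft2(np.roll(pad, (-(psf.shape[0] // 2), -(psf.shape[1] // 2)), axis=(0, 1)))
    return np.real(np.fft.ifft2(otf * np.fft.fft2(img)))

def add_gaussian_noise_bsnr(blurred, bsnr_db, rng):
    """Add zero-mean Gaussian noise so that var(blurred) / var(noise) = 10^(BSNR/10)."""
    sigma = np.sqrt(np.var(blurred) / 10.0 ** (bsnr_db / 10.0))
    return blurred + sigma * rng.standard_normal(blurred.shape)

rng = np.random.default_rng(0)
img = rng.random((256, 256))
blurred = blur_circular(img, average_psf(9))
g = add_gaussian_noise_bsnr(blurred, 40.0, rng)
print(10 * np.log10(np.var(blurred) / np.var(g - blurred)))   # close to 40
```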
The numerical results are shown in Table 2; the parameters of all methods were tuned as listed there. Table 2 shows that the PSNR and ReE of our methods (both ATV and ITV cases) are almost the same as those of [11], which applies MM inner iterations to solve subproblems (34) and (35) (ATV case only). However, in our experiments each outer iteration of our methods is nearly twice as fast as that of [11], and takes almost the same time as an outer iteration of the traditional TV method in [32]. Fig. 6 displays the “Parlor” images restored by the different algorithms. The OGS TV regularizers produce clearer edges on the “desk” in the image than the TV regularizer.
Fig. 6

Top row: blurred and noisy image (left), restoration images of Chan [32] (middle), Liu [11] (right). Bottom row: original image (left), restoration images of CATVOGSL2 (middle), CITVOGSL2 (right)

Table 2

Numerical comparison of Chan [32], Liu [11], CATVOGSL2, and CITVOGSL2 for images (a)–(c) in Fig. 5

Method | μ | (a) Man: Itr / PSNR / Time / ReE | (b) Car: Itr / PSNR / Time / ReE | (c) Parlor: Itr / PSNR / Time / ReE

Chan [32] | 0.5 | 15 / 30.34 / 6.02 / 0.0730 | 11 / 31.13 / 2.00 / 0.0426 | 12 / 31.70 / 2.01 / 0.0511

Liu [11] | 1 | 12 / 30.60 / 9.06 / 0.0708 | 12 / 31.68 / 3.93 / 0.0386 | 13 / 32.46 / 4.21 / 0.0464

CAOL2 | 1 | 7 / 30.62 / 3.37 / 0.0711 | 9 / 31.68 / 1.76 / 0.0399 | 7 / 32.40 / 1.34 / 0.0472

CIOL2 | 1 | 8 / 30.59 / 3.70 / 0.0716 | 11 / 31.60 / 2.04 / 0.0403 | 8 / 32.03 / 1.61 / 0.0492

PSNR dB, Time s, Itr iterations, μ regularization parameter (\(\times 10^{5}\)), CAOL2 CATVOGSL2, CIOL2 CITVOGSL2

We now estimate the cost of one outer iteration of our methods and of Liu et al.'s method [11] for a 512×512 image. Apart from the OGS subproblems, all methods cost 512×512×18×4 (four operations of cost \(N\log_{2}N\) with \(N=512^{2}\)). For the OGS subproblems, the MM method in [11] with five inner iterations costs 512×512×90×2 (two subproblems), whereas our methods cost 512×512×18×2 in CATVOGSL2 and 512×512×18 in CITVOGSL2. Therefore, the total cost is \(512^{2}\times 252\) for [11], \(512^{2}\times 108\) for CATVOGSL2, and \(512^{2}\times 90\) for CITVOGSL2; that is, each iteration of our methods is more than twice as fast as that of the inner-iteration method in [11]. In the next section, the computation shared by our methods and the inner-iteration method is larger, so our methods are only nearly twice as fast.
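The operation counts above can be re-derived in a few lines. We read the unit 18 as \(\log_{2}(512^{2})\), the per-pixel factor of one FFT on a 512×512 image; this reading is our interpretation. As in the text, each explicit OGS shrinkage and each MM inner step is charged one such unit.

```python
import math

unit = int(math.log2(512 * 512))   # 18
common = 4 * unit                  # parts shared by all methods (four FFT-type operations)
liu = common + 2 * (5 * unit)      # two OGS subproblems, five MM inner steps each
catv = common + 2 * unit           # CATVOGSL2: two explicit shrinkages (v_x and v_y)
citv = common + unit               # CITVOGSL2: one explicit shrinkage on A
print(liu, catv, citv)             # 252 108 90 (each times 512^2)
print(liu / catv, liu / citv)      # both ratios exceed 2 (about 2.33 and 2.8)
```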

Remarks 6.

We do not list the results for image (d) because they are almost the same as those for images (b) and (c). Moreover, when \(\beta_{1}<30\), which is not sufficiently large, the numerical results are also good, although we do not list them. This shows that the approximation in our shrinkage formula works well, and that with 5 inner steps the methods in [11, 12] converge in practice, although no sequence controlling convergence is given there. In addition, we find in our experiments that the ATV case gives better results than the ITV case, which differs slightly from the behavior of classic TV methods. Finally, if the parameters of [11] are chosen to be the same as ours, the solutions of our method and theirs are always the same in PSNR and visual quality, while our method requires less time (the same holds for [12]).

5.3.2 Experiments for the constrained TV OGS \(L_{1}\) model

In this section, we compare our methods (CATVOGSL1 and CITVOGSL1) with some other methods, such as Chan et al.’s method proposed in [32] (Algorithm 2 in [32] for the constrained TV-L1 model) and Liu et al.’s method proposed in [12].

As in the previous section, we set the penalty parameters \(\beta_{1}=80\), \(\beta_{2}=2000\), and \(\beta_{3}=1\) for both the ATV and the ITV cases, and the relaxation parameter γ=1.618. The 7×7 Gaussian blur kernel with standard deviation 5 is generated by the MATLAB built-in function fspecial('gaussian',7,5). We first blurred images (a)–(d) with this Gaussian blur and then corrupted them with salt-and-pepper noise at levels from 30 to 40 %. All noise is generated by the MATLAB built-in function imnoise(Bl,'salt & pepper',level), where Bl is the blurred image, and the same random matrix is used for all methods.
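The degradation of this experiment can be mimicked in NumPy as follows. `gaussian_psf` follows the usual definition behind fspecial('gaussian',k,σ), a sampled Gaussian on a centred k×k grid normalized to sum 1 (MATLAB's extra suppression of values below machine precision is omitted), and `salt_and_pepper` sets about level/2 of the pixels to 0 and about level/2 to 1, which is our understanding of imnoise's salt & pepper model rather than a copy of it. Fixing the generator seed plays the role of fixing the same random matrix for all methods.

```python
import numpy as np

def gaussian_psf(k, sigma):
    """Sampled, normalised k x k Gaussian kernel (k odd), like fspecial('gaussian', k, sigma)."""
    r = np.arange(k) - (k - 1) / 2.0
    x, y = np.meshgrid(r, r)
    h = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return h / h.sum()

def salt_and_pepper(img, level, rng):
    """Set about level/2 of the pixels to 0 and about level/2 to 1."""
    out = img.copy()
    u = rng.random(img.shape)
    out[u < level / 2] = 0.0
    out[(u >= level / 2) & (u < level)] = 1.0
    return out

psf = gaussian_psf(7, 5.0)
bl = np.full((200, 200), 0.5)                        # stand-in for a blurred image
noisy = salt_and_pepper(bl, 0.3, np.random.default_rng(42))
print(psf.sum(), np.mean(noisy != bl))               # 1 up to round-off, and roughly 0.3
```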

The numerical results are shown in Table 3. For Chan [32], we tuned the parameter μ manually for each image to give the best PSNR, as listed in Table 3. For Liu [12], we use the default values μ=100 and μ=80 for noise levels of 30 and 40 %, respectively. For our method CATVOGSL1, we set μ=180 and μ=140, and for CITVOGSL1, μ=140 and μ=100, for 30 and 40 %, respectively. In the experiments, we find that the parameters of our methods are robust and can be chosen from a wide range; therefore, we use the same μ for all images.
Table 3

Numerical comparison of Chan [32], Liu [12], CATVOGSL1, and CITVOGSL1 for images (a)–(d) in Fig. 5

Is | NL | Chan [32] μ/Itr | PSNR | Time | ReE | Liu [12] Itr | PSNR | Time | ReE

(a) | 30 | 25/130 | 29.59 | 61.57 | 0.0796 | 35 | 30.92 | 30.23 | 0.0683

(a) | 40 | 18/99 | 28.85 | 46.41 | 0.0866 | 37 | 30.11 | 31.54 | 0.0749

(b) | 30 | 25/128 | 29.71 | 22.90 | 0.0501 | 35 | 31.81 | 12.93 | 0.0393

(b) | 40 | 20/98 | 28.59 | 17.80 | 0.0570 | 40 | 30.70 | 14.99 | 0.0447

(c) | 30 | 26/138 | 30.21 | 25.65 | 0.0607 | 35 | 32.15 | 12.84 | 0.0485

(c) | 40 | 22/105 | 29.10 | 19.27 | 0.0689 | 37 | 31.04 | 13.67 | 0.0551

(d) | 30 | 26/127 | 30.41 | 23.17 | 0.0975 | 36 | 32.47 | 13.15 | 0.0769

(d) | 40 | 20/95 | 29.44 | 17.43 | 0.1091 | 39 | 31.43 | 14.27 | 0.0867

Is | NL | CATVOGSL1 Itr | PSNR | Time | ReE | CITVOGSL1 Itr | PSNR | Time | ReE

(a) | 30 | 32 | 31.17 | 17.97 | 0.0663 | 41 | 31.34 | 21.56 | 0.0651

(a) | 40 | 32 | 30.10 | 18.03 | 0.0751 | 37 | 30.06 | 19.95 | 0.0754

(b) | 30 | 26 | 31.77 | 5.97 | 0.0395 | 29 | 31.75 | 6.26 | 0.0396

(b) | 40 | 26 | 30.47 | 6.12 | 0.0459 | 29 | 30.22 | 6.38 | 0.0472

(c) | 30 | 25 | 32.30 | 5.82 | 0.0477 | 30 | 32.08 | 6.75 | 0.0489

(c) | 40 | 26 | 31.01 | 5.76 | 0.0553 | 27 | 30.50 | 6.19 | 0.0587

(d) | 30 | 25 | 32.54 | 5.68 | 0.0764 | 29 | 32.51 | 6.75 | 0.0766

(d) | 40 | 28 | 31.35 | 6.40 | 0.0875 | 32 | 31.06 | 7.33 | 0.0905

PSNR dB, Time s, Is images, NL noise level (%)

Table 3 also shows that the PSNR and ReE of our methods (both ATV and ITV cases) are almost the same as those of Liu [12], which applies MM inner iterations to subproblems of the form (34) and (35) (ATV case only). However, in our experiments each outer iteration of our methods is nearly twice as fast as that of Liu [12], and takes almost the same time as an outer iteration of the traditional TV method of Chan [32]. Moreover, for OGS TV, ATV is sometimes better than ITV and sometimes worse. Finally, Fig. 7 displays the degraded image, the original image, and the images restored by the four methods for image (d) at the 30 % noise level. Both our methods and Liu [12] recover better edges (handrail and window) than Chan [32].
Fig. 7

Top row: blurred and noisy image (left), restoration images of Chan [32] (middle), Liu [12] (right). Bottom row: original image (left), restoration images of CATVOGSL1 (middle), CITVOGSL1 (right)

6 Conclusions

In this paper, we proposed explicit shrinkage formulas for one class of OGS regularization problems with translation-invariant overlapping groups. These formulas can serve as subproblem solvers for several other OGS regularization problems in many fields. In this work, we applied them to OGS TV regularization problems, namely image deblurring and denoising. Furthermore, we extended the image deblurring models with OGS ATV in [11, 12] to both the ATV and the ITV cases. Since the formulas are very simple, they can easily be extended to many other applications, such as multichannel deconvolution and compressive sensing, which we will consider in the future. In addition, in the experiments we set all entries of the weight matrix \(W_{g}\) to 1; in future work, we will test other weights in more experiments in order to choose better weights for other applications.

7 Appendix

For problem (4), similar to (14), we can get
$$ {\begin{aligned} f_{w}(\mathbf{z})&=\min\limits_{\mathbf{z}} \|\mathbf{z}\|_{w,2,1}+ \frac{\beta}{2}\|\mathbf{z} -\mathbf{x}\|_{2}^{2}\\ &=\min\limits_{\mathbf{z}} {\sum\nolimits}_{i=1}^{n} \|w_{g}\circ(z_{i})_{g}\|_{2} + \frac{\beta}{2{\sum\nolimits}_{k=1}^{s}(w_{g})_{k}^{2}}{\sum\nolimits}_{k=1}^{s}(w_{g})_{k}^{2}\|\mathbf{z} -\mathbf{x}\|_{2}^{2}\\ &=\min\limits_{\mathbf{z}} {\sum\nolimits}_{i=1}^{n} \|w_{g}\circ(z_{i})_{g}\|_{2} + \frac{\beta}{2\|w_{g}\|_{2}^{2}}{\sum\nolimits}_{i=1}^{n}\|w_{g}\circ\left((z_{i})_{g} -(x_{i})_{g}\right)\|_{2}^{2}\\ &=\min\limits_{\mathbf{z}} {\sum\nolimits}_{i=1}^{n} \|w_{g}\circ(z_{i})_{g}\|_{2} + \frac{\beta}{2\|w_{g}\|_{2}^{2}}{\sum\nolimits}_{i=1}^{n}\|w_{g}\circ(z_{i})_{g} -w_{g}\circ(x_{i})_{g}\|_{2}^{2}\\ &=\min\limits_{\mathbf{z}} {\sum\nolimits}_{i=1}^{n} \left(\|w_{g}\circ(z_{i})_{g}\|_{2} + \frac{\beta}{2\|w_{g}\|_{2}^{2}}\|w_{g}\circ(z_{i})_{g} -w_{g}\circ(x_{i})_{g}\|_{2}^{2}\right).\\ \end{aligned}} $$
(51)
All the symbols are the same as before. As before, a necessary condition for minimizing the ith term in the last line of (51) is that \(w_g\circ(z_i)_g\) is parallel to \(w_g\circ(x_i)_g\), that is, \(\frac {w_{g}\circ (z_{i})_{g}}{\|w_{g}\circ (z_{i})_{g}\|_{2}} =\frac {w_{g}\circ (x_{i})_{g}}{\|w_{g}\circ (x_{i})_{g}\|_{2}}\) for every i. On the other hand, let \(W=\mathrm{diag}(w_g)\) be the diagonal matrix whose diagonal is the vector \(w_g\); then \(w_g\circ(x_i)_g=W(x_i)_g=W^T(x_i)_g\). If a vector \(\tilde{\mathbf{x}}\) (of the same size as \((x_i)_g\)) is parallel to a vector \(\tilde{\mathbf{z}}\), then \(\tilde{\mathbf{x}}=\alpha\tilde{\mathbf{z}}\) for some scalar \(\alpha\), so \(W\tilde{\mathbf{x}}=W\alpha\tilde{\mathbf{z}}=\alpha W\tilde{\mathbf{z}}\); that is, \(W\tilde{\mathbf{x}}\) is also parallel to \(W\tilde{\mathbf{z}}\). Therefore, \(\frac {W(z_{i})_{g}}{\|W(z_{i})_{g}\|_{2}} =\frac {W(x_{i})_{g}}{\|W(x_{i})_{g}\|_{2}}\), and both sides are unit vectors. Multiplying both sides by \(W^T\) gives \(\frac {W^{T} W(z_{i})_{g}}{\|W(z_{i})_{g}\|_{2}} =\frac {W^{T} W(x_{i})_{g}}{\|W(x_{i})_{g}\|_{2}}\). That is,
$$\frac{w_{g}\circ w_{g}\circ(z_{i})_{g}}{\|w_{g}\circ(z_{i})_{g}\|_{2}} =\frac{w_{g}\circ w_{g}\circ(x_{i})_{g}}{\|w_{g}\circ(x_{i})_{g}\|_{2}}. $$
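The parallelism condition can be checked numerically on a single group. The minimal NumPy sketch below is our illustration, not the paper's code: it uses the standard closed-form minimizer of \(\|\mathbf{u}\|_2+\frac{c}{2}\|\mathbf{u}-\mathbf{v}\|_2^2\) (block soft-thresholding, which this appendix does not state explicitly), with \(\mathbf{v}=w_g\circ(x_i)_g\) and \(c=\beta/\|w_g\|_2^2\) as in the last line of (51). The helper names `group_shrink` and `objective`, the weights, and the group values are hypothetical.

```python
import numpy as np

def group_shrink(v, c):
    """Minimizer of ||u||_2 + (c/2)||u - v||_2^2 (block soft-thresholding)."""
    norm_v = np.linalg.norm(v)
    if norm_v == 0.0:
        return np.zeros_like(v)
    return max(1.0 - 1.0 / (c * norm_v), 0.0) * v

def objective(u, v, c):
    """One term of the last line of (51), written in u = w_g o (z_i)_g."""
    return np.linalg.norm(u) + 0.5 * c * np.sum((u - v) ** 2)

w = np.array([1.0, 2.0, 0.5, 1.5])       # hypothetical weights w_g (s = 4)
x_g = np.array([0.8, -0.3, 0.5, 1.2])    # hypothetical group (x_i)_g
beta = 10.0
c = beta / np.sum(w ** 2)                # beta / ||w_g||_2^2, as in (51)

v = w * x_g                              # w_g o (x_i)_g
u = group_shrink(v, c)                   # optimal w_g o (z_i)_g for this term

# u is a positive multiple of v, so their unit vectors coincide
print(np.allclose(u / np.linalg.norm(u), v / np.linalg.norm(v)))   # True

# small random perturbations never lower the (strictly convex) objective
rng = np.random.default_rng(0)
best = objective(u, v, c)
print(all(objective(u + 1e-3 * rng.standard_normal(4), v, c) >= best
          for _ in range(1000)))                                   # True
```

If \(\|w_g\circ(x_i)_g\|_2\le 1/c\), the minimizer is the zero vector, in which case the unit-vector form of the condition is not defined.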
In particular, we first consider the case s=4 with \((z_i)_g=(z_{i-1},z_i,z_{i+1},z_{i+2})\) and \(w_g=(w_1,w_2,w_3,w_4)\). We denote by \((w_g\circ(z_i)_g)'\) the expansion of \(w_g\circ(z_i)_g\), defined in the same way as \(((z_i)_g)'\), and we obtain \(\mathbf{z}=\frac{1}{\sum_{k=1}^{4} w_{k}}\sum_{i=1}^{n}(w_{g}\circ (z_{i})_{g})'\). Then, the Euler equation of \(f_w(\mathbf{z})\) is given by
$$ {}\beta\left(\mathbf{z}- \mathbf{x}\right) + \frac{(w_{g}\circ w_{g}\circ(z_{1})_{g})'}{\|(w_{g}\circ(z_{1})_{g})'\|_{2}} + \cdots + \frac{(w_{g}\circ w_{g}\circ(z_{n})_{g})'}{\|(w_{g}\circ(z_{n})_{g})'\|_{2}} \ni \mathbf{0}, $$
(52)
$$ {}\begin{aligned}\frac{\beta}{\sum_{k=1}^{4} {w_{k}^{2}}}\sum_{i=1}^{n}\left(w_{g}\circ w_{g}\circ\left(\left(z_{i}\right)_{g} - \left(x_{i}\right)_{g}\right) \right)' + \frac{\left(w_{g}\circ w_{g}\circ\left(z_{1}\right)_{g}\right)'}{\|\left(w_{g}\circ\left(z_{1}\right)_{g}\right)'\|_{2}} \\ + \cdots + \frac{\left(w_{g}\circ w_{g}\circ\left(z_{n}\right)_{g}\right)'}{\|\left(w_{g}\circ\left(z_{n}\right)_{g}\right)'\|_{2}}\ni \mathbf{0}, \end{aligned} $$
(53)
$$ {}\beta\left(\mathbf{z}- \mathbf{x}\right)+ \frac{(w_{g}\circ w_{g}\circ(x_{1})_{g})'}{\|(w_{g}\circ(x_{1})_{g})'\|_{2}} + \cdots + \frac{(w_{g}\circ w_{g}\circ(x_{n})_{g})'}{\|(w_{g}\circ(x_{n})_{g})'\|_{2}}\!\ni\! \mathbf{0}, $$
(54)
$$ {}\mathbf{z} \ni\! \mathbf{x}-\frac{1}{\beta} \left(\frac{(w_{g}\circ w_{g}\circ(x_{1})_{g})'}{\|(w_{g}\circ(x_{1})_{g})'\|_{2}} + \cdots + \frac{(w_{g}\circ w_{g}\circ(x_{n})_{g})'}{\|(w_{g}\circ(x_{n})_{g})'\|_{2}}\right). $$
(55)
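The expansions \(((z_i)_g)'\) and \((w_g\circ(z_i)_g)'\) used in (52)-(55) place a group back at its original indices in an otherwise zero n-vector. The minimal NumPy sketch below checks, for s=4, two identities used in this derivation: summing the expanded weighted groups over all i returns \((\sum_k w_k)\,\mathbf{z}\), and \(\sum_i\|w_g\circ((z_i)_g-(x_i)_g)\|_2^2=\|w_g\|_2^2\|\mathbf{z}-\mathbf{x}\|_2^2\) as in (51). It assumes periodic (wrap-around) boundary conditions and 0-based indexing; the helpers `group` and `expand` and all test values are hypothetical.

```python
import numpy as np

OFFSETS = np.array([-1, 0, 1, 2])   # (z_i)_g = (z_{i-1}, z_i, z_{i+1}, z_{i+2})

def group(v, i):
    """(v_i)_g with periodic wrap-around (assumed boundary condition)."""
    return v[(i + OFFSETS) % v.size]

def expand(g, i, n):
    """(g)': put an s-entry group back at its indices in a zero n-vector."""
    out = np.zeros(n)
    out[(i + OFFSETS) % n] = g
    return out

n = 10
w = np.array([0.5, 1.0, 2.0, 1.5])      # hypothetical weights (w_1, ..., w_4)
z = np.linspace(-1.0, 2.0, n)           # arbitrary test vectors
x = np.cos(np.arange(n))

# Identity 1: sum_i (w_g o (z_i)_g)' = (w_1 + ... + w_4) z
total = sum(expand(w * group(z, i), i, n) for i in range(n))
print(np.allclose(total, w.sum() * z))                          # True

# Identity 2: sum_i ||w_g o ((z_i)_g - (x_i)_g)||^2 = ||w_g||^2 ||z - x||^2
lhs = sum(np.sum((w * (group(z, i) - group(x, i))) ** 2) for i in range(n))
print(np.isclose(lhs, np.sum(w ** 2) * np.sum((z - x) ** 2)))   # True
```

Both identities rely on every index lying in exactly s groups, once at each position of \(w_g\), which is what the periodic boundary guarantees.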
For each component, we obtain
$$ \begin{aligned} z_{i} \ni x_{i} -\frac{1}{\beta} \left(\frac{{w_{4}^{2}} x_{i}}{\|w_{g}\circ(x_{i-2})_{g}\|_{2}} +\frac{{w_{3}^{2}} x_{i}}{\|w_{g}\circ(x_{i-1})_{g}\|_{2}}\right. \\ \left. + \frac{{w_{2}^{2}} x_{i}}{\|w_{g}\circ(x_{i})_{g}\|_{2}}+ \frac{{w_{1}^{2}} x_{i}}{\|w_{g}\circ(x_{i+1})_{g}\|_{2}}\right). \end{aligned} $$
(56)
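Formula (56) can be evaluated entry by entry. The minimal NumPy sketch below computes its right-hand side literally for s=4. It assumes periodic boundary conditions and 0-based indices, and it adds a small `eps` floor on the group norms to avoid division by zero, which the formula itself does not specify. It does not reproduce any further step that Algorithm 2 may perform; the function name `inexact_shrink_56` is hypothetical.

```python
import numpy as np

def inexact_shrink_56(x, w, beta, eps=1e-12):
    """Right-hand side of (56) for s = 4, periodic boundary assumed.

    Group i is (x_{i-1}, x_i, x_{i+1}, x_{i+2}); w = (w_1, w_2, w_3, w_4).
    eps guards against division by zero for all-zero groups (our assumption).
    """
    n = x.size
    offsets = np.array([-1, 0, 1, 2])
    # ||w_g o (x_i)_g||_2 for every group i (indices taken mod n)
    norms = np.array([np.linalg.norm(w * x[(i + offsets) % n]) for i in range(n)])
    inv = 1.0 / np.maximum(norms, eps)
    z = np.empty(n)
    for i in range(n):
        # x_i is entry 4, 3, 2, 1 of groups i-2, i-1, i, i+1, respectively
        s = (w[3] ** 2 * inv[(i - 2) % n] + w[2] ** 2 * inv[(i - 1) % n]
             + w[1] ** 2 * inv[i] + w[0] ** 2 * inv[(i + 1) % n])
        z[i] = x[i] - s * x[i] / beta
    return z

x = np.array([0.3, 1.0, 0.2, 0.7, 0.9, 0.1, 0.5])
print(inexact_shrink_56(x, np.ones(4), beta=30.0))
```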
Equivalently, (56) can be written as follows.
$$ {}\begin{array}{r} z_{i} \ni\left(\frac{{w_{4}^{2}} x_{i}}{\|w_{g}\|_{2}^{2}} -\frac{1}{\beta}\frac{{w_{4}^{2}} x_{i}}{\|w_{g}\circ(x_{i-2})_{g}\|_{2}}\right) +\left(\frac{{w_{3}^{2}} x_{i}}{\|w_{g}\|_{2}^{2}} -\frac{1}{\beta}\frac{{w_{3}^{2}} x_{i}}{\|w_{g}\circ(x_{i-1})_{g}\|_{2}}\right) \\+ \cdots+ \left(\frac{{w_{1}^{2}} x_{i}}{\|w_{g}\|_{2}^{2}} -\frac{1}{\beta}\frac{{w_{1}^{2}} x_{i}}{\|w_{g}\circ(x_{i+1})_{g}\|_{2}}\right). \end{array} $$
(57)
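Expression (57) is only a regrouping of (56). Since \(\|w_g\|_2^2=w_1^2+w_2^2+w_3^2+w_4^2\), the first terms of the four brackets add up to \(x_i\):

$$ \sum_{k=1}^{4}\frac{w_{k}^{2}x_{i}}{\|w_{g}\|_{2}^{2}}=\frac{\left(w_{1}^{2}+w_{2}^{2}+w_{3}^{2}+w_{4}^{2}\right)x_{i}}{\|w_{g}\|_{2}^{2}}=x_{i}, $$

so (57) is obtained from (56) by splitting \(x_i\) into these four parts and pairing each part with the corresponding shrinkage term.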

Similarly to Algorithm 1, we obtain Algorithm 2 for finding the minimizer of (4).

Algorithm 2 requires only two convolutions, with time complexity \(O(ns)\), which is the same as the cost of a single iteration of the MM method in [10]. Since the MM method needs many such iterations, our method is much more efficient than the MM method.
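The two-convolution structure can be illustrated with a minimal NumPy sketch (our own, not the paper's Algorithm 2). A first pass, a circular correlation with kernel \(w_g\circ w_g\) applied to \(\mathbf{x}\circ\mathbf{x}\), gives all squared group norms \(\|w_g\circ(x_i)_g\|_2^2\) at once; a second pass, a circular convolution with the same kernel applied to their reciprocals, gives the bracketed sum in (56) for every i. Periodic boundaries, the group layout \((x_{i-1},x_i,\dots,x_{i+s-2})\) of the s=4 case, and the `eps` guard are assumptions; the function name is hypothetical.

```python
import numpy as np

def inexact_shrink_conv(x, w, beta, eps=1e-12):
    """Right-hand side of (56) using two circular filtering passes, O(n*s) each.

    Assumes periodic boundaries and groups (x_{i-1}, x_i, ..., x_{i+s-2}),
    which is the layout of the s = 4 case in the text.
    """
    w2 = w ** 2
    shifts = np.arange(w.size) - 1            # offset of group entry k from i
    # Pass 1 (correlation): ||w_g o (x_i)_g||_2^2 for all groups i at once
    sq_norms = sum(c * np.roll(x ** 2, -d) for c, d in zip(w2, shifts))
    inv = 1.0 / np.maximum(np.sqrt(sq_norms), eps)
    # Pass 2 (convolution): sum over the groups containing i of w_k^2 / ||group||_2
    weight_sum = sum(c * np.roll(inv, d) for c, d in zip(w2, shifts))
    return x * (1.0 - weight_sum / beta)

x = np.array([0.3, 1.0, 0.2, 0.7, 0.9, 0.1, 0.5])
z = inexact_shrink_conv(x, np.array([0.5, 1.0, 2.0, 1.5]), beta=7.0)
```

Each `np.roll` term is one O(n) pass over the data, so each of the two loops costs O(ns), in line with the complexity stated above.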

Here, without loss of generality, let \(\mathbf{x}\in[0,1]^n\). Then \(\|x_g\|_2\le\sqrt{s}\) for every group, and since \(\|w_g\circ x_g\|_2\le\|w_g\|_2\|x_g\|_2\), we obtain that if \(\beta \leq \frac {\|w_{g}\|_{2}}{\sqrt {s}}\leq \frac {\|w_{g}\|_{2}}{\|x_{g}\|_{2}}=\frac {\|w_{g}\|_{2}^{2}}{\|w_{g}\|_{2}\|x_{g}\|_{2}}\leq \frac {\|w_{g}\|_{2}^{2}}{\|w_{g}\circ x_{g}\|_{2}}\), then β is sufficiently small. However, we have no direct criterion for deciding when β is sufficiently large. In the experiments of Section 5 (more than 1000 tests), we found that \(\beta \geq 30\cdot \frac {\|w_{g}\|_{2}}{\sqrt {s}}\) is generally sufficiently large.
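As a worked example of these bounds, the sketch below computes \(\|w_g\|_2/\sqrt{s}\) and the empirical threshold \(30\|w_g\|_2/\sqrt{s}\); for unit weights with s=4 they are 1 and 30. It also checks, on random groups in \([0,1]^s\), that \(\beta=\|w_g\|_2/\sqrt{s}\) satisfies \(\beta\|w_g\circ x_g\|_2\le\|w_g\|_2^2\), the final inequality of the chain above. The function name, weights, and random data are hypothetical.

```python
import numpy as np

def beta_bounds(w):
    """(small, large): ||w_g||_2 / sqrt(s) and the empirical 30 * ||w_g||_2 / sqrt(s)."""
    small = float(np.linalg.norm(w) / np.sqrt(w.size))
    return small, 30.0 * small

print(beta_bounds(np.ones(4)))                   # (1.0, 30.0)

# Check the chain of inequalities on random groups x_g in [0, 1]^s
rng = np.random.default_rng(1)
w = np.array([0.5, 1.0, 2.0, 1.5])               # hypothetical weights
small, large = beta_bounds(w)
groups = rng.random((1000, w.size))
ok = small * np.linalg.norm(groups * w, axis=1) <= np.sum(w ** 2)
print(ok.all())                                  # True
```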

For problem (5), similar to (51), we can obtain
$$ {\begin{array}{r@{~}*{20}{l@{~}}} f_{W} (A)=&\min\limits_{A}\|A\|_{W,2,1}+ \frac{\beta}{2}\|A -X\|_{F}^{2}\\ =&\min\limits_{A}\sum_{i=1}^{m}\sum_{j=1}^{n} \|W_{g}\circ(A_{i,j})_{g}\|_{F} +\\ &\frac{\beta}{2\sum_{k_{1}=1}^{K_{1}}\sum_{k_{2}=1}^{K_{2}}(W_{g})_{k_{1},k_{2}}^{2}}\sum_{k_{1}=1}^{K_{1}}\sum_{k_{2}=1}^{K_{2}}(W_{g})_{k_{1},k_{2}}^{2}\|A -X\|_{F}^{2}\\ =&\min\limits_{A}\sum_{i=1}^{m}\sum_{j=1}^{n} \|W_{g}\circ(A_{i,j})_{g}\|_{F}+ \\ &\frac{\beta}{2\|W_{g}\|_{F}^{2}}\sum_{i=1}^{m}\sum_{j=1}^{n}\|W_{g}\circ\left((A_{i,j})_{g} -(X_{i,j})_{g}\right)\|_{F}^{2}\\ =&\min\limits_{A}\sum_{i=1}^{m}\sum_{j=1}^{n} \|W_{g}\circ(A_{i,j})_{g}\|_{F}+ \\ &\frac{\beta}{2\|W_{g}\|_{F}^{2}}\sum_{i=1}^{m}\sum_{j=1}^{n}\|W_{g}\circ(A_{i,j})_{g} -W_{g}\circ(X_{i,j})_{g}\|_{F}^{2}\\ =&\min\limits_{A} \sum_{i=1}^{m}\sum_{j=1}^{n} \left(\|W_{g}\circ(A_{i,j})_{g}\|_{F} + \frac{\beta}{2\|W_{g}\|_{F}^{2}}\|W_{g}\circ(A_{i,j})_{g} -W_{g}\circ(X_{i,j})_{g}\|_{F}^{2}\right).\\ \end{array}} $$
(58)
For example, we set \(K_1=2\), \(K_2=2\) and define \((A_{i,j})_g=(A_{i,j},A_{i,j+1};A_{i+1,j},A_{i+1,j+1})\) and \(W_g=(W_{1,1},W_{1,2};W_{2,1},W_{2,2})\). As in the vector case, each \((A_{i,j})_g\) is a matrix with \(\|(A_{i,j})_{g}\|_{F}=\sqrt {\sum _{k_{1}=1}^{K_{1}}\sum _{k_{2}=1}^{K_{2}}((A_{i,j})_{g})_{k_{1},k_{2}}^{2}}\). Note that the Frobenius norm of a matrix equals the \(\ell_2\) norm of the vector obtained by reshaping the matrix. Then, the Euler equation of \(f_W(A)\) is given by
$$ \begin{aligned}\beta\left(A- X\right) &+ \frac{(W_{g}\circ W_{g}\circ(A_{1,1})_{g})'}{\|(W_{g}\circ(A_{1,1})_{g})'\|_{2}} + \cdots \\&+ \frac{(W_{g}\circ W_{g}\circ(A_{n,n})_{g})'}{\|(W_{g}\circ(A_{n,n})_{g})'\|_{2}} \ni \mathbf{0}, \end{aligned} $$
(59)
where \(((A_{i,j})_g)'\) is defined analogously to \(((z_i)_g)'\), that is, it is the expansion of \((A_{i,j})_g\). This notation is used consistently throughout the paper.
$$ \begin{aligned} \beta\left(A- X\right) &+ \frac{(W_{g}\circ W_{g}\circ(X_{1,1})_{g})'}{\|(W_{g}\circ(X_{1,1})_{g})'\|_{2}} + \cdots \\&+ \frac{(W_{g}\circ W_{g}\circ(X_{n,n})_{g})'}{\|(W_{g}\circ(X_{n,n})_{g})'\|_{2}}\ni \mathbf{0}, \end{aligned} $$
(60)
$$ \begin{aligned} A \ni X&-\frac{1}{\beta} \left(\frac{(W_{g}\circ W_{g}\circ(X_{1,1})_{g})'}{\|(W_{g}\circ(X_{1,1})_{g})'\|_{2}} + \cdots\right. \\&+ \left.\frac{(W_{g}\circ W_{g}\circ(X_{n,n})_{g})'}{\|(W_{g}\circ(X_{n,n})_{g})'\|_{2}}\right). \end{aligned} $$
(61)
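The remark that the Frobenius norm of a matrix equals the \(\ell_2\) norm of the reshaped vector, and the group norms \(\|W_g\circ(X_{i,j})_g\|_F\) appearing in the denominators of (60) and (61), can be sketched in NumPy as follows. The sketch assumes periodic boundaries and 0-based indices, with group (i, j) being the \(K_1\times K_2\) block whose top-left entry is `X[i, j]`; the function name and test data are hypothetical.

```python
import numpy as np

def group_norms_2d(X, W):
    """||W_g o (X_{i,j})_g||_F for every (i, j), periodic boundaries assumed."""
    K1, K2 = W.shape
    acc = np.zeros(X.shape)
    for a in range(K1):
        for b in range(K2):
            # np.roll(...)[i, j] == X[(i + a) % m, (j + b) % n] ** 2
            acc += W[a, b] ** 2 * np.roll(X ** 2, shift=(-a, -b), axis=(0, 1))
    return np.sqrt(acc)

rng = np.random.default_rng(2)
X = rng.random((5, 6))
W = np.array([[1.0, 0.5], [2.0, 1.5]])        # hypothetical W_g with K1 = K2 = 2

B = W * X[1:3, 2:4]                           # W_g o (X_{1,2})_g, away from the border
print(np.isclose(np.linalg.norm(B, "fro"), np.linalg.norm(B.ravel())))   # True
print(np.isclose(group_norms_2d(X, W)[1, 2], np.linalg.norm(B, "fro")))  # True
```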
For each component, we obtain
$$\begin{array}{r} A_{i,j} \ni X_{i,j} -\frac{1}{\beta} \left(\frac{W_{2,2}^{2} X_{i,j}}{\|W_{g}\circ(X_{i-1,j-1})_{g}\|_{2}} +\frac{W_{2,1}^{2} X_{i,j}}{\|W_{g}\circ(X_{i-1,j})_{g}\|_{2}} +\right.\\ \left. \frac{W_{1,2}^{2} X_{i,j}}{\|W_{g}\circ(X_{i,j-1})_{g}\|_{2}}+ \frac{W_{1,1}^{2} X_{i,j}}{\|W_{g}\circ(X_{i,j})_{g}\|_{2}}\right).\end{array} $$
(62)

Therefore, based on formula (62), we can obtain a similar algorithm for finding the minimizer of (5).
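A minimal NumPy sketch of the core step of such an algorithm evaluates the right-hand side of (62) for all (i, j) with two passes over the \(2\times 2\) neighbourhood. It assumes periodic boundaries, 0-based indices, and an `eps` guard against zero group norms; it is our illustration, not the paper's algorithm, and the name `inexact_shrink_62` is hypothetical.

```python
import numpy as np

def inexact_shrink_62(X, W, beta, eps=1e-12):
    """Right-hand side of (62): 2 x 2 groups (X_{i,j})_g = X[i:i+2, j:j+2] (mod size)."""
    W2 = W ** 2
    # Pass 1: ||W_g o (X_{i,j})_g||_F for every group (i, j)
    acc = np.zeros(X.shape)
    for a in range(2):
        for b in range(2):
            acc += W2[a, b] * np.roll(X ** 2, shift=(-a, -b), axis=(0, 1))
    inv = 1.0 / np.maximum(np.sqrt(acc), eps)
    # Pass 2: X[i, j] is entry (a, b) of group (i - a, j - b), weighted by W_{a+1,b+1}^2
    S = np.zeros(X.shape)
    for a in range(2):
        for b in range(2):
            S += W2[a, b] * np.roll(inv, shift=(a, b), axis=(0, 1))
    return X * (1.0 - S / beta)

X = np.random.default_rng(3).random((4, 5))
A = inexact_shrink_62(X, np.array([[1.0, 0.5], [2.0, 1.5]]), beta=20.0)
```

The shift (1, 1) in the second pass pairs \(W_{2,2}^2\) with the group starting at \((i-1,j-1)\), matching the first term of (62).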

Declarations

Acknowledgements

The work of Gang Liu, Ting-Zhu Huang, and Jun Liu was supported by the 973 Program (2013CB329404), NSFC (61370147), and Fundamental Research Funds for the Central Universities (ZYGX2013Z005). The work of Xiao-Guang Lv was supported by NSFC (61401172).

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors’ Affiliations

(1)
College of Applied Mathematics, Chengdu University of Information Technology
(2)
Research Center for Image and Vision Computing/School of Mathematical Sciences, University of Electronic Science and Technology of China
(3)
School of Science, Huaihai Institute of Technology

References

  1. W Deng, W Yin, Y Zhang, in Proc. SPIE. Group sparse optimization by alternating direction method (SPIE, San Diego, US, 2013), pp. 88580R–15.
  2. J Huang, T Zhang, The benefit of group sparsity. Ann. Stat. 38(4), 1978–2004 (2010). doi:10.1214/09-AOS778.
  3. R Chartrand, B Wohlberg, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). A nonconvex ADMM algorithm for group sparsity with sparse groups (IEEE, Vancouver, BC, 2013), pp. 6009–6013.
  4. M Stojnic, F Parvaresh, B Hassibi, On the reconstruction of block-sparse signals with an optimal number of measurements. IEEE Trans. Signal Process. 57(8), 3075–3085 (2009).
  5. L Jacob, G Obozinski, J-P Vert, in Proceedings of the 26th Annual International Conference on Machine Learning. ICML ’09. Group lasso with overlap and graph lasso (ACM, New York, 2009), pp. 433–440.
  6. A Majumdar, RK Ward, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Classification via group sparsity promoting regularization (IEEE, Taipei, 2009), pp. 861–864.
  7. E Elhamifar, R Vidal, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Robust classification using structured sparse representation (IEEE, Providence, 2011), pp. 1873–1879.
  8. J Gao, Q Shi, TS Caetano, Dimensionality reduction via compressive sensing. Pattern Recognit. Lett. 33(9), 1163–1170 (2012).
  9. YN Liu, F Wu, YT Zhuang, Group sparse representation for image categorization and semantic video retrieval. Sci. China Inform. Sci. 54(10), 2051–2063 (2011).
  10. P-Y Chen, IW Selesnick, Translation-invariant shrinkage/thresholding of group sparse signals. Signal Process. 94(0), 476–489 (2014).
  11. J Liu, T-Z Huang, IW Selesnick, X-G Lv, P-Y Chen, Image restoration using total variation with overlapping group sparsity. Inform. Sci. 295(20), 232–246 (2015).
  12. G Liu, T-Z Huang, J Liu, X-G Lv, Total variation with overlapping group sparsity for image deblurring under impulse noise. Plos One. 10(4), 0122562 (2015).
  13. IW Selesnick, P-Y Chen, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Total variation denoising with overlapping group sparsity (IEEE, Vancouver, BC, 2013), pp. 5696–5700.
  14. M Figueiredo, J Bioucas-Dias, in Signal Processing with Adaptive Sparse Structured Representations - SPARS11. An alternating direction algorithm for (overlapping) group regularization (Edinburgh, Scotland, UK, 2011).
  15. R Jenatton, J-Y Audibert, F Bach, Structured variable selection with sparsity-inducing norms. J. Mach. Learn. Res. 12, 2777–2824 (2011).
  16. M Kowalski, Sparse regression using mixed norms. Appl. Comput. Harmonic Anal. 27(3), 303–324 (2009).
  17. G Peyré, JM Fadili, in EUSIPCO. Group sparsity with overlapping partition functions (Barcelona, Spain, 2011), pp. 303–307.
  18. F Bach, R Jenatton, J Mairal, G Obozinski, Structured sparsity through convex optimization. Stat. Sci. 27(4), 450–468 (2012).
  19. I Bayram, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Mixed norms with overlapping groups as signal priors (IEEE, Prague, 2011), pp. 4036–4039.
  20. Y Wang, J Yang, W Yin, Y Zhang, A new alternating minimization algorithm for total variation image reconstruction. SIAM J. Imaging Sci. 1(3), 248–272 (2008).
  21. J Yang, W Yin, Y Zhang, Y Wang, A fast algorithm for edge-preserving variational multichannel image restoration. SIAM J. Imaging Sci. 2(2), 569–592 (2009). doi:10.1137/080730421.
  22. B Wohlberg, R Chartrand, J Theiler, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Local principal component pursuit for nonlinear datasets (IEEE, Kyoto, 2012), pp. 3925–3928.
  23. P Sprechmann, I Ramirez, G Sapiro, YC Eldar, C-HiLasso: a collaborative hierarchical sparse modeling framework. IEEE Trans. Signal Process. 59(9), 4183–4198 (2011).
  24. PC Hansen, JG Nagy, DP O’Leary, Deblurring images: matrices, spectra, and filtering, 1st edn. (Society for Industrial and Applied Mathematics, Philadelphia, United States, 2006).
  25. LI Rudin, S Osher, E Fatemi, Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena. 60(1–4), 259–268 (1992).
  26. A Chambolle, An algorithm for total variation minimization and applications. J. Math. Imaging Vis. 20(1–2), 89–97 (2004).
  27. G Liu, T-Z Huang, J Liu, High-order TVL1-based images restoration and spatially adapted regularization parameter selection. Comput. Math. Appl. 67(10), 2015–2026 (2014).
  28. X-L Zhao, W Wang, T-Y Zeng, T-Z Huang, MK Ng, Total variation structured total least squares method for image restoration. SIAM J. Sci. Comput. 35(6), 1304–1320 (2013).
  29. X-L Zhao, F Wang, MK Ng, A new convex optimization model for multiplicative noise and blur removal. SIAM J. Imaging Sci. 7(1), 456–475 (2014).
  30. C Chen, MK Ng, X-L Zhao, Alternating direction method of multipliers for nonlinear image restoration problems. IEEE Trans. Image Process. 24(1), 33–43 (2015).
  31. L-J Deng, H Guo, T-Z Huang, A fast image recovery algorithm based on splitting deblurring and denoising. J. Comput. Appl. Math. 287, 88–97 (2015).
  32. R Chan, M Tao, X Yuan, Constrained total variation deblurring models and fast algorithms based on alternating direction method of multipliers. SIAM J. Imaging Sci. 6(1), 680–697 (2013).
  33. J Nocedal, SJ Wright, Numerical optimization. 1431–8598 (Springer, New York, 2006).
  34. D Gabay, B Mercier, A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976).
  35. J Eckstein, D Bertsekas, On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Programm. 55(1–3), 293–318 (1992). doi:10.1007/BF01581204.
  36. R Glowinski, Numerical methods for nonlinear variational problems. 1434–8322 (Springer, Berlin Heidelberg, 1984).
  37. B He, H Yang, Some convergence properties of a method of multipliers for linearly constrained monotone variational inequalities. Oper. Res. Lett. 23(3–5), 151–161 (1998).
  38. E Esser, Applications of Lagrangian-based alternating direction methods and connections to split Bregman. Los Angeles: UCLA CAM report, 9–31 (2009). ftp://arachne.math.ucla.edu/pub/camreport/cam09-31.pdf.
  39. M Ng, P Weiss, X Yuan, Solving constrained total-variation image restoration and reconstruction problems via alternating direction methods. SIAM J. Sci. Comput. 32(5), 2710–2736 (2010).
  40. X Zhang, M Burger, X Bresson, S Osher, Bregmanized nonlocal regularization for deconvolution and sparse reconstruction. SIAM J. Imaging Sci. 3(3), 253–276 (2010).
  41. J Yang, Y Zhang, W Yin, A fast alternating direction method for TVL1-L2 signal reconstruction from partial fourier data. IEEE J. Selected Topics Signal Process. 4(2), 288–297 (2010).
  42. X Zhang, M Burger, S Osher, A unified primal-dual algorithm framework based on Bregman iteration. J. Sci. Comput. 46(1), 20–46 (2011).
  43. S Boyd, N Parikh, E Chu, B Peleato, J Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations Trends Mach. Learn. 3(1), 1–122 (2011).
  44. C Wu, X Tai, Augmented Lagrangian method, dual methods, and split Bregman iteration for ROF, vectorial TV, and high order models. SIAM J. Imaging Sci. 3(3), 300–339 (2010).

Copyright

© Liu et al. 2016