The denoising method based on the wavelet transform has become an important branch of image denoising and restoration. Among the many wavelet-based image denoising methods, the wavelet threshold denoising method proposed by Donoho et al. [6] has been widely used because of its simple principle, easy implementation, and good denoising performance. However, its inherent shortcomings, such as the discontinuity of the hard threshold function, the constant deviation caused by the soft threshold function [7,8,9,10], and the lack of scale adaptability, limit its further development. Many researchers have since studied the method in depth, constructing new threshold functions and searching for optimal thresholds, and have proposed many adaptive image denoising methods. Hong and Yang [11] addressed the shortcomings of the soft and hard threshold functions in the traditional wavelet threshold denoising algorithm by constructing a new threshold function to denoise the signal, and compared the results in terms of signal-to-noise ratio (SNR) and root mean square error (RMSE). In practical processing, the new function largely overcame the poor continuity and inherent deviation of these two functions and made the denoising more effective. Juncheng and Qiang [12] applied translation invariance to the existing threshold denoising method based on the second-generation wavelet transform, which denoised seismic signals more rapidly and effectively and achieved good results in trial calculations on both simulated and actual data. Hong [13] improved the classical wavelet threshold denoising algorithm and proposed a new threshold denoising method for mine video monitoring images in the wavelet transform domain. 
The image was decomposed by a three-level wavelet transform, the low-frequency decomposition coefficients were filtered by a Wiener filter, and the high-frequency decomposition coefficients were denoised by an improved wavelet threshold denoising algorithm. Experiments showed that the method effectively reduces noise and yields a usable, complete image. Xiaofei and Xiaohui [14] combined the advantages of typical wavelet threshold functions, integrated some improved methods, and proposed an improved threshold function that addresses their shortcomings. This function is continuous at the threshold, the estimated wavelet coefficient is asymptotic to the original coefficient, and the function is differentiable, which makes adaptive learning with a gradient algorithm easy to implement. Fengbo, Changgeng, and Hongqiu [15] proposed an improved threshold algorithm based on the lifting wavelet transform. First, the advantages of the wavelet transform, namely multiresolution and the diversity of wavelet bases, were exploited. Second, a new threshold function was proposed to improve the signal-to-noise ratio and reduce the RMSE. Denoising with the improved threshold function outperforms the commonly used threshold functions in visual effect, mean square error, and peak signal-to-noise ratio. Xiaoyan and Abdukirimturki [16] combined the advantages of typical wavelet threshold functions with some improved methods and proposed a new improved threshold function. The experimental results show that the denoised image is better than that obtained with traditional soft and hard thresholding and with existing thresholding methods in terms of visual effect, mean square deviation, and peak signal-to-noise ratio. These methods not only open up broad prospects for exploiting the full advantages of the wavelet threshold denoising method but also provide a basis for further exploration of adaptive denoising methods. 
However, these studies are almost all based on the orthogonal wavelet transform computed with the Mallat algorithm. The Mallat algorithm [17] achieves satisfactory speed, but because it lacks translation invariance, the reconstruction accuracy after image denoising is not high, and the Gibbs phenomenon readily appears in the denoised image, which then fails to meet the requirements of human vision. The fast algorithm for the binary wavelet transform [18], the à trous algorithm [19], is translation invariant and also makes the representation of the image in the binary wavelet domain highly redundant, so that perturbing some of the coefficients does not seriously distort the reconstructed image. In addition, many researchers have proposed improvements [20,21,22,23] to reduce the Gibbs visual distortion that wavelet threshold denoising with the orthogonal wavelet transform introduces into a denoised image. Their results show that translation invariance is an important property for effectively suppressing the Gibbs phenomenon and improving the denoising effect. These conclusions provide a basis for research on adaptive threshold image denoising based on the binary wavelet transform.
Infrared image noise analysis
The meaning and classification of image noise
Image noise is generally defined from two perspectives: human perception and mathematics. From the mathematical point of view, the image information can be regarded as a spatial function f, and image noise is the factor that degrades the information expressed by this function; that is, under the influence of noise, the image is degraded to f_{n}. Image noise can be classified in different ways. Mathematically, according to the way in which the image is degraded, noise can be divided into additive noise and multiplicative noise. These two relations can be expressed by the following formulas:
$$ {f}_n=f+n $$
(1)
$$ {f}_n=f\cdot n $$
(2)
where n represents noise, formula (1) represents additive noise, and formula (2) represents multiplicative noise.
If it is divided according to the physical factors of noise generation, it can be classified into electronic noise, photoelectron noise, photosensitive particle noise, speckle noise, and so on.
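The difference between the additive model f_{n} = f + n and the multiplicative model f_{n} = f · n can be illustrated with a minimal simulation. The sketch below is only an illustration: the Gaussian noise distribution, the unit mean of the multiplicative noise, the test image, and the function names are assumptions, not part of the original method.

```python
import numpy as np

def add_additive_noise(f, sigma, rng):
    """Additive model f_n = f + n, with n ~ N(0, sigma^2) (Gaussian is an assumption)."""
    return f + rng.normal(0.0, sigma, size=f.shape)

def add_multiplicative_noise(f, sigma, rng):
    """Multiplicative model f_n = f * n, with n ~ N(1, sigma^2) (unit mean is an assumption)."""
    return f * rng.normal(1.0, sigma, size=f.shape)

rng = np.random.default_rng(0)

# Hypothetical test image: bright left half (100), dark right half (10).
f = np.full((64, 64), 100.0)
f[:, 32:] = 10.0

f_add = add_additive_noise(f, sigma=5.0, rng=rng)
f_mul = add_multiplicative_noise(f, sigma=0.05, rng=rng)

# Error spread in the bright and dark halves for each model.
add_bright = np.std((f_add - f)[:, :32])
add_dark = np.std((f_add - f)[:, 32:])
mul_bright = np.std((f_mul - f)[:, :32])
mul_dark = np.std((f_mul - f)[:, 32:])

print("additive       bright/dark error ratio:", add_bright / add_dark)
print("multiplicative bright/dark error ratio:", mul_bright / mul_dark)
```

Under these assumptions, the additive error has roughly the same spread in both halves, whereas the multiplicative error scales with the local intensity of the image.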
Measurement of image denoising effect
Because noise is generated randomly, the noise contained in an image can only be described statistically; that is, noise generation is regarded as a random process whose probability density function describes the overall behavior of the noise. In many cases, however, this probability density function is very difficult to obtain or to express in closed form. We then usually study numerical characteristics of the noise distribution, such as the mean m, the variance σ^{2}, and the correlation coefficient ρ. This paper uses the following indicators to measure the quality of image denoising:

1.
Normalized mean square error (NMSE)
$$ \mathrm{NMSE}=\frac{\sum \limits_m\sum \limits_n{\left[f\left(m,n\right)-{f}^{'}\left(m,n\right)\right]}^2}{\sum \limits_m\sum \limits_nf{\left(m,n\right)}^2} $$
(3)
where f(m, n) is the original image, and f^{'}(m, n) represents the restored image after processing.

2.
Standard deviation s
$$ s=\sqrt{\frac{\sum \limits_{i=1}^M\sum \limits_{j=1}^N{\left(f\left(i,j\right)-\overline{f}\right)}^2}{MN}} $$
(4)
where f(i, j) is an image, \( \overline{f}=\frac{\sum \limits_{i=1}^M\sum \limits_{j=1}^Nf\left(i,j\right)}{MN} \), and M and N are the total number of rows and columns of the image matrix.

3.
The ratio between signal and mean squared error (F/MSE)
$$ \mathrm{F}/\mathrm{MSE}=10\lg \frac{\sum \limits_{m=1}^M\sum \limits_{n=1}^N{\left[f\left(m,n\right)\right]}^2}{\sum \limits_{m=1}^M\sum \limits_{n=1}^N{\left[{f}^{'}\left(m,n\right)-f\left(m,n\right)\right]}^2}\ \left(\mathrm{dB}\right) $$
(5)

4.
Relative signal-to-noise ratio of the image
$$ \mathrm{SNR}=\frac{\overline{f}}{s} $$
(6)

5.
Smoothness: In most cases, the image after denoising should be at least as smooth as the original image.

6.
Similarity: The variance estimate of the denoised image with respect to the original image should be minimal in the worst case (minimax estimator).
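The four quantitative indicators above, Eqs. (3) to (6), can be computed directly from the image arrays. The following is a minimal NumPy sketch; the function names and the small test arrays are illustrative assumptions.

```python
import numpy as np

def nmse(f, f_hat):
    """Normalized mean square error, Eq. (3)."""
    f, f_hat = np.asarray(f, float), np.asarray(f_hat, float)
    return np.sum((f - f_hat) ** 2) / np.sum(f ** 2)

def std_dev(f):
    """Standard deviation s of an image, Eq. (4) (normalized by M*N)."""
    f = np.asarray(f, float)
    return np.sqrt(np.mean((f - f.mean()) ** 2))

def f_over_mse_db(f, f_hat):
    """Ratio between signal and mean squared error in dB, Eq. (5)."""
    f, f_hat = np.asarray(f, float), np.asarray(f_hat, float)
    return 10.0 * np.log10(np.sum(f ** 2) / np.sum((f_hat - f) ** 2))

def relative_snr(f):
    """Relative signal-to-noise ratio, Eq. (6): mean divided by standard deviation."""
    f = np.asarray(f, float)
    return f.mean() / std_dev(f)

# Hypothetical 2 x 2 "original" and "restored" images.
original = np.array([[1.0, 2.0], [3.0, 4.0]])
restored = original + 0.1

print("NMSE :", nmse(original, restored))
print("s    :", std_dev(original))
print("F/MSE:", f_over_mse_db(original, restored), "dB")
print("SNR  :", relative_snr(original))
```

Note that Eqs. (3) and (5) are reciprocal ratios, so F/MSE = -10 lg(NMSE); the two indicators carry the same information on different scales.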
Noise characteristic analysis of infrared image
Speckle noise [24] is widespread in the imaging process of infrared images. It arises mainly from the interaction of infrared waves during imaging and is also closely related to the roughness of the surface of the imaged tissue. Visually, the noise appears as speckles distributed over the image, and it can be divided into three types according to the density of the speckles.
From the nature of noise analysis, strictly speaking, speckle noise contains both the components of multiplicative noise and the components of additive noise, which can be expressed by Eq. (7):
$$ {f}_n=f\cdot {n}_m+{n}_a $$
(7)
However, in most cases, the influence of additive noise on the image is much smaller than that of the multiplicative noise, so the additive noise can be ignored. Finally, the speckle noise is considered to be a multiplicative noise.
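The approximation above can be checked numerically with a small sketch of Eq. (7). The noise distributions (Gaussian), the noise levels, and the constant test image below are assumptions chosen only to illustrate the case in which the additive component is weak.

```python
import numpy as np

def speckle(f, sigma_m, sigma_a, rng):
    """Simulate Eq. (7): f_n = f * n_m + n_a.

    n_m ~ N(1, sigma_m^2) and n_a ~ N(0, sigma_a^2) are assumed distributions.
    Returns the full model and the multiplicative-only part f * n_m.
    """
    n_m = rng.normal(1.0, sigma_m, size=f.shape)
    n_a = rng.normal(0.0, sigma_a, size=f.shape)
    mult_only = f * n_m
    return mult_only + n_a, mult_only

rng = np.random.default_rng(1)
f = np.full((128, 128), 120.0)  # hypothetical constant-intensity image

full, mult_only = speckle(f, sigma_m=0.1, sigma_a=0.5, rng=rng)

err_mult = np.std(mult_only - f)    # degradation due to multiplicative noise
err_add = np.std(full - mult_only)  # extra degradation due to additive noise

print("multiplicative error:", err_mult)
print("additive error      :", err_add)
```

With these assumed levels, the additive contribution is more than an order of magnitude smaller than the multiplicative one, which is the situation in which treating speckle as purely multiplicative is reasonable.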
Binary wavelet theory
In the Mallat algorithm, the downsampling operation easily loses the singular points at image edges, which produces the Gibbs phenomenon at the edges of the reconstructed image after denoising and leads to edge distortion or even blur. To overcome this drawback, a wavelet transform with translation invariance needs to be considered. The binary wavelet transform discretizes only the scale factor of the continuous wavelet transform and keeps the translation factor continuous, thus maintaining translation invariance. This property, coupled with a fast decomposition and reconstruction algorithm, the à trous algorithm, gives the binary wavelet transform certain advantages in image denoising and image feature extraction.
It should be noted that the à trous algorithm used in this paper is a modified version of the à trous algorithm described in Mallat's book, and its proof differs from the one given there.
Assuming that the sample value a_{0, k} of the input discrete signal is the local average of f in the neighborhood of t = k, it can be written as:
$$ {a}_{0,k}=\left\langle f(t),\varphi \left(t-k\right)\right\rangle $$
(8)
where φ and \( \tilde{\varphi} \) are the two scale functions generated by the infinite cascade calculation of filter (h, g) and its dual filter \( \left(\tilde{h},\tilde{g}\right) \), which can be expressed as:
$$ {\displaystyle \begin{array}{l}\varphi (t)=\sqrt{2}\sum \limits_k{h}_k\varphi \left(2t-k\right)\\ {}\tilde{\varphi}(t)=\sqrt{2}\sum \limits_k{\tilde{h}}_k\tilde{\varphi}\left(2t-k\right)\end{array}} $$
(9)
For any j ≥ 0, denote:
$$ {a}_{j,k}=\left\langle f(t),{\varphi}_{2^j}\left(t-k\right)\right\rangle $$
(10)
where \( {\varphi}_{2^j}(t)=\frac{1}{\sqrt{2^j}}\varphi \left(\frac{t}{2^j}\right) \).
The binary wavelet coefficients are calculated as follows:
$$ {d}_{j,k}= Wf\left({2}^j,k\right)=\left\langle f(t),{\psi}_{2^j}\left(t-k\right)\right\rangle $$
(11)
where \( {\psi}_{2^j}(t)=\frac{1}{\sqrt{2^j}}\psi \left(\frac{t}{2^j}\right) \) and ψ and \( \tilde{\psi} \) are the wavelet functions generated by the infinite cascade calculation of the filter (h, g) and its dual filter \( \left(\tilde{h},\tilde{g}\right) \), which can be expressed as:
$$ {\displaystyle \begin{array}{l}\psi (t)=\sqrt{2}\sum \limits_k{g}_k\varphi \left(2t-k\right)\\ {}\tilde{\psi}(t)=\sqrt{2}\sum \limits_k{\tilde{g}}_k\tilde{\varphi}\left(2t-k\right)\end{array}} $$
(12)
For the à trous algorithm, the binary wavelet decomposition algorithm is expressed as:
$$ {\displaystyle \begin{array}{l}{a}_{j+1,k}=\sum \limits_n{h}_n{a}_{j,k+{2}^jn},j=0,1,\cdots, \\ {}{d}_{j+1,k}=\sum \limits_n{g}_n{a}_{j,k+{2}^jn},j=0,1,\cdots .\end{array}} $$
(13)
The binary wavelet reconstruction algorithm is expressed as:
$$ {a}_{j,k}=\frac{1}{2}\left(\sum \limits_n{\tilde{h}}_n{a}_{j+1,k-{2}^jn}+\sum \limits_n{\tilde{g}}_n{d}_{j+1,k-{2}^jn}\right),\kern0.5em j=0,1,\cdots . $$
(14)
The proofs of formulas (13) and (14) are given below. For formula (13), the following can be obtained from the definition of a_{j, k} and the two-scale relation (9):
$$ {\displaystyle \begin{array}{l}{a}_{j+1,k}=\left\langle f(t),{\varphi}_{2^{j+1}}\left(t-k\right)\right\rangle ={\int}_Rf(t)\frac{1}{\sqrt{2^{j+1}}}{\varphi}^{\ast}\left(\frac{t-k}{2^{j+1}}\right) dt\\ {}\kern1.00em =\sum \limits_n{h}_n{a}_{j,k+{2}^jn},\kern0.5em j=0,1,\cdots .\end{array}} $$
(15)
In the same way, we have:
$$ {\displaystyle \begin{array}{l}{d}_{j+1,k}=\left\langle f(t),{\psi}_{2^{j+1}}\left(t-k\right)\right\rangle ={\int}_Rf(t)\frac{1}{\sqrt{2^{j+1}}}{\psi}^{\ast}\left(\frac{t-k}{2^{j+1}}\right) dt\\ {}\kern1.00em =\sum \limits_n{g}_n{a}_{j,k+{2}^jn},\kern0.5em j=0,1,\cdots .\end{array}} $$
(16)
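The intermediate step between the integral and the final sum in formulas (15) and (16) can be written out as follows. This is a sketch under the assumption that the filter coefficients h_n and g_n are real. Substituting t/2^{j+1} into the two-scale relation (9) gives:
$$ {\varphi}_{2^{j+1}}(t)=\frac{1}{\sqrt{2^{j+1}}}\varphi \left(\frac{t}{2^{j+1}}\right)=\frac{1}{\sqrt{2^j}}\sum \limits_n{h}_n\varphi \left(\frac{t}{2^j}-n\right)=\sum \limits_n{h}_n{\varphi}_{2^j}\left(t-{2}^jn\right) $$
Replacing t by t - k and taking the inner product with f then yields:
$$ {a}_{j+1,k}=\sum \limits_n{h}_n\left\langle f(t),{\varphi}_{2^j}\left(t-\left(k+{2}^jn\right)\right)\right\rangle =\sum \limits_n{h}_n{a}_{j,k+{2}^jn} $$
The same argument applied to the relation for ψ in (12), with g_n in place of h_n, gives formula (16).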
For formula (14), from the two-scale relationships \( \widehat{\tilde{\varphi}}(2w)=\frac{1}{\sqrt{2}}\widehat{\tilde{h}}(w)\widehat{\tilde{\varphi}}(w) \) and \( \widehat{\tilde{\psi}}(2w)=\frac{1}{\sqrt{2}}\widehat{\tilde{g}}(w)\widehat{\tilde{\varphi}}(w) \), together with the binary condition \( {\widehat{\psi}}^{\ast }(w)\widehat{\tilde{\psi}}(w)+{\widehat{\varphi}}^{\ast }(w)\widehat{\tilde{\varphi}}(w)={\widehat{\varphi}}^{\ast}\left(\frac{w}{2}\right)\widehat{\tilde{\varphi}}\left(\frac{w}{2}\right) \), we get:
$$ \sqrt{2}\widehat{\varphi}\left(\frac{w}{2}\right)={\widehat{\tilde{h}}}^{\ast}\left(\frac{w}{2}\right)\widehat{\varphi}(w)+{\widehat{\tilde{g}}}^{\ast}\left(\frac{w}{2}\right)\widehat{\psi}(w) $$
(17)
In the time domain, this is expressed as:
$$ 2\sqrt{2}\varphi (2t)=\sum \limits_k{\tilde{h}}_{2k}\varphi \left(t+k\right)+\sum \limits_k{\tilde{g}}_{2k}\psi \left(t+k\right) $$
(18)
Then, for formula (14), we have:
$$ {\displaystyle \begin{array}{l}{a}_{j,k}=\left\langle f(t),{\varphi}_{2^j}\left(t-k\right)\right\rangle ={\int}_Rf(t)\frac{1}{\sqrt{2^j}}{\varphi}^{\ast}\left(\frac{t-k}{2^j}\right) dt\\ {}\kern1.00em =\frac{1}{2}\left(\sum \limits_n{\tilde{h}}_n{a}_{j+1,k-{2}^jn}+\sum \limits_n{\tilde{g}}_n{d}_{j+1,k-{2}^jn}\right),\kern0.5em j=0,1,\cdots .\end{array}} $$
(19)
The above derivation describes the à trous algorithm for a one-dimensional signal. The structure of this algorithm is shown in Fig. 1. For a two-dimensional image signal, the one-dimensional à trous algorithm is applied along the rows and columns of the image to realize image decomposition and reconstruction.
Although the à trous algorithm has the same structure as the Mallat algorithm, there is a substantial difference: the à trous algorithm does not perform the downsampling operation. As a result, it preserves translation invariance, and the data length at each scale is the same as the original data length. Compared with the output of the Mallat algorithm, the representation is highly redundant, which is convenient for spectrum analysis of the details and approximations at each scale.
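The decomposition and reconstruction recursions of formulas (13) and (14) can be sketched for a one-dimensional signal as follows. This is a minimal illustration, not the implementation used in this paper: the orthonormal Haar filters (for which the dual filters equal the primal ones), the periodic boundary extension, and the function names are all assumptions.

```python
import numpy as np

# Assumed filters: orthonormal Haar, indexed n = 0, 1. For an orthonormal
# filter bank the dual filters coincide with the primal ones.
H = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low-pass h_n
G = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass g_n
H_DUAL, G_DUAL = H, G

def atrous_decompose(signal, levels):
    """Formula (13) with periodic boundaries: no downsampling is performed."""
    a = np.asarray(signal, dtype=float)
    details = []
    for j in range(levels):
        step = 2 ** j  # filter taps are spread 2^j samples apart ("holes")
        # np.roll(a, -s)[k] == a[k + s], matching the index k + 2^j n.
        a_next = sum(h * np.roll(a, -step * n) for n, h in enumerate(H))
        d_next = sum(g * np.roll(a, -step * n) for n, g in enumerate(G))
        details.append(d_next)  # details[j] holds d_{j+1}
        a = a_next
    return a, details

def atrous_reconstruct(approx, details):
    """Formula (14) with periodic boundaries, from the coarsest scale down."""
    a = np.asarray(approx, dtype=float)
    for j in reversed(range(len(details))):
        step = 2 ** j
        d = details[j]
        # np.roll(x, s)[k] == x[k - s], matching the index k - 2^j n.
        low = sum(h * np.roll(a, step * n) for n, h in enumerate(H_DUAL))
        high = sum(g * np.roll(d, step * n) for n, g in enumerate(G_DUAL))
        a = 0.5 * (low + high)
    return a

rng = np.random.default_rng(0)
x = rng.normal(size=64)  # hypothetical test signal

approx, details = atrous_decompose(x, levels=3)
x_rec = atrous_reconstruct(approx, details)
print("max reconstruction error:", np.max(np.abs(x_rec - x)))

# Translation invariance: shifting the input shifts every coefficient band.
shift = 5
approx_s, details_s = atrous_decompose(np.roll(x, shift), levels=3)
print("shift-equivariant:",
      all(np.allclose(ds, np.roll(d, shift)) for ds, d in zip(details_s, details)))
```

Every coefficient band has the same length as the input, which reflects the redundancy discussed above. For an image, the same one-dimensional routines would be applied along the rows and then along the columns.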