Image analysis using modified exponent-Fourier moments

He, Bing; Cui, Jiangtao; Xiao, Bin; Peng, Yanguo

doi:10.1186/s13640-019-0470-3

Published: 18 July 2019


EURASIP Journal on Image and Video Processing, volume 2019, Article number: 72 (2019)


Abstract

Classic exponent-Fourier moments (EFMs) have been widely used for image reconstruction and invariant classification. However, EFMs are not natively translation- or scale-invariant. In addition, they suffer from two drawbacks, namely numerical instability and reconstruction error, which degrade their reconstruction capability and image classification accuracy. This study addresses the challenge of defining modified EFMs (MEFMs) based on modified exponent polynomials. In our method, the basis functions of traditional EFMs are modified, and the modified basis functions replace the original ones. The basis function of the proposed moments is composed of piecewise modified exponent polynomials modulated by an exponential envelope with a variable parameter. Various types of optimal-order moments can be established by slightly adjusting the bandwidth of the modified basis functions. Finally, we extend the rotation-invariant feature of previous works and propose a new method of scaling- and rotation-invariant image recognition using the proposed moments in a log-polar coordinate domain. Translation invariance is then achieved by an image projection operation, which replaces the traditional approach based on image geometric moments. The experimental results demonstrate that the MEFMs perform better than traditional EFMs and other classic orthogonal moments, including the latest image moments, in terms of image reconstruction capability and invariant recognition accuracy under smoothing filters, in both noise-free and noisy conditions.

1 Introduction

Moments and moment invariants are global descriptors for image feature extraction and have become an active topic in image analysis. In recent years, various moments have been widely used in image reconstruction [1, 2], image detection [3, 4], target classification [5], digital watermarking [6, 7], image compression [8], and other applications [9, 10]. The study of moments mainly follows three directions. The first is establishing image moments in different coordinate spaces, such as the Cartesian coordinate space [11, 12], polar coordinate space [13, 14], and Radon transformation space [15, 16], among others. Moments defined in the Cartesian coordinate space reconstruct images better than those defined in the polar coordinate space and Radon transformation space, and their computational complexity is lower; however, rotation invariance is difficult to achieve. In contrast, image moments are natively rotation-invariant in the polar coordinate and Radon transformation spaces, and their geometric invariance can easily be achieved. Therefore, most existing image moments are established in polar coordinates. Figure 1 shows the various types of image moments in different coordinate systems. The second direction is studying the description ability of image moments under different basis functions, in search of the basis functions that yield the best image reconstruction and numerical stability. Generally speaking, traditional image moments do not have inherent geometric invariance; thus, they need to be restructured and designed to satisfy geometric invariance in pattern recognition. Accordingly, the construction of rotation invariants is becoming an active topic in the study of image moments; this is the third research direction.

As mentioned earlier, image moments are essentially a set of image transformations based on basis functions, and the properties of the basis functions directly affect the performance of the constructed moments. Depending on whether the basis set satisfies the orthogonality condition, image moments can be divided into orthogonal and non-orthogonal moments (also known as orthogonal and non-orthogonal transformations, respectively; for example, the discrete cosine transform [17], Fourier transform [18], Haar-wavelet transform [19], and Walsh transform [20] are orthogonal transformations). Non-orthogonal moments such as geometric moments [21], complex moments [22], and rotation moments [23] have been applied with some success. The basis functions of non-orthogonal moments are relatively simple, but image reconstruction from them is difficult to realize. In addition, non-orthogonal moments generally carry redundant information and are sensitive to noise. Orthogonal moments overcome these disadvantages and have therefore become a main focus in the field of image moments in recent years.

Orthogonal moments can be defined in different coordinate spaces. The basis functions of orthogonal moments defined in polar coordinates are composed of radial polynomials and Fourier complex exponential factors in the angular variable (also regarded as amplitude and phase coefficients); thus, they are called radial orthogonal moments. The radial orthogonal moments in Fig. 1 mainly include Zernike moments (ZMs) [13], pseudo-Zernike moments (PZMs) [14], orthogonal Fourier–Mellin moments (OFMMs) [6], Jacobi–Fourier moments (JFMs) [24], Tchebichef–Fourier moments (TFMs) [25], radial harmonic-Fourier moments (RHFMs) [26], Bessel–Fourier moments (BFMs) [18, 27], exponent-Fourier moments (EFMs) [7], and radial shifted Legendre moments (RSLMs) [28]. These radial orthogonal moments normally have the basic ability of image reconstruction. Moreover, their significant characteristic is that the radial polynomials satisfy the orthogonality condition in the unit circle and the moments natively possess a rotation-invariant feature. Thus, radial orthogonal moments have become the preferred descriptor for geometric invariant image recognition, especially for rotation-invariant recognition. Orthogonal moments whose basis functions are regular polynomials defined in Cartesian coordinates can be further divided into continuous and discrete orthogonal moments: Legendre moments (LMs) [29] and Gaussian–Hermite moments (GHMs) [2] are continuous orthogonal moments, whereas Tchebichef moments (TMs) [30], Krawtchouk moments (KMs) [31], Hahn moments (HMs) [32], and Racah moments (RMs) [33] are discrete orthogonal moments. Discrete orthogonal moments do not involve any numerical approximation; hence, their basis functions satisfy the orthogonality condition exactly. Consequently, their image reconstruction performance is better than that of traditional continuous orthogonal moments. In addition, different moments can be constructed in other spaces, such as Radon transform invariant moments in the Radon transform space and histogram invariant moments in the histogram space.

Shortcomings still exist in the abovementioned traditional orthogonal moments. On the one hand, the order of the existing orthogonal moments can only be taken as an integer value, which makes the development of orthogonal moments encounter bottlenecks caused by this constraint. To solve this problem, Xiao et al. [34] and Yang et al. [35] proposed fractional orthogonal moments. The integer-order can be extended to a real-order (also known as fractional order) using their proposed models. Further experimental results showed that the fractional order orthogonal moments were better than the traditional orthogonal moments based on the integer order in image reconstruction, noise robustness, and image recognition. Chen et al. [36, 37] recently extended the ZMs and PZMs to a quaternion and a fractional framework for color image feature extraction. The application of image moments has also been further improved. On the other hand, for image sets with larger distinctions, the classification effect is preferable using the lower-order moments constructed using the basis functions of traditional orthogonal moments. However, for the classification effect of the image sets with smaller discrimination, numerical instability will occur when higher-order moments are adopted. The reason is that the basis functions of traditional orthogonal moments are fixed either in lower- or higher-order moments, which can result in poor classification results in pattern recognition. Wang et al. [38] proposed a circularly semi-orthogonal moment that can maintain a good numerical stability in higher-order moments and can obtain a better visual effect in image reconstruction. This method only performs a simple and fixed modulation on the orthogonal basis functions, and the basis functions of different-order moments are still fixed; hence, the method lacks generality.

Classic orthogonal moments (e.g., EFMs) suffer from numerical instability and poor recognition accuracy in some image classification tasks, especially texture image recognition. A modified exponent-Fourier moment (MEFM) is proposed herein based on the concepts proposed in [34, 38]. Our contributions cover three aspects. First, we study the performance of semi-orthogonal basis functions, which lie between orthogonal and non-orthogonal moments, for image reconstruction and pattern recognition, and establish a general semi-orthogonal moment model suitable for different orders. Second, we propose a new theoretical model for analyzing image moments in the frequency domain, namely time–frequency correspondence analysis. Finally, we introduce a simple and useful algorithm for rotation, scaling, and translation (RST) invariant image recognition using the proposed moments.

The remainder of this paper is organized as follows: Section 2 provides some preliminaries on the classic exponent-Fourier moments for 2D images; Section 3 introduces the MEFMs in polar coordinates and discusses some of their properties; Section 4 describes the experiments on the computational complexity of the image moments, image reconstruction, optimal parameter selection, and RST-invariant image recognition under noise-free, noisy, and smoothing-filter conditions; and Section 5 presents the conclusions.

2 Preliminaries

This section briefly reviews the definition of the classic orthogonal exponent-Fourier moments (EFMs) [39] for an image along with some EFM properties.

2.1 Exponent-Fourier moments

The EFMs of order n with repetition m for a 2D image function f(r, θ) in polar coordinates are defined as

$$ {E}_{nm}=\frac{1}{2\pi }{\int}_0^{2\pi }{\int}_0^1f\left(r,\theta \right){R}_n^{\ast }(r){e}^{-\tilde{j} m\theta} rdrd\theta $$

(1)

where f(r, θ) denotes the 2D image function in polar coordinates; $ \tilde{j}=\sqrt{-1} $; $ n=0,1,2,\ldots $ and $ m=0,\pm 1,\pm 2,\ldots $ are the moment order and repetition, respectively; and $ {R}_n^{\ast }(r) $ is the conjugate of the radial function R_n(r), defined as

$$ {R}_n^{\ast }(r)=\frac{1}{\sqrt{r}}{e}^{-\tilde{j}2 n\pi r} $$

(2)

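Based on the theory of orthogonal functions, a 2D image function can be reconstructed over the unit circle from an infinite series of the basis functions $ {R}_n(r){e}^{\tilde{j} m\theta} $ weighted by the moments $ {E}_{nm} $; in practice, the series is truncated at the maximum orders n_max and m_max: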

$$ \overline{f}\left(r,\theta \right)\approx \sum \limits_{n=1}^{n_{\mathrm{max}}}\sum \limits_{m=1}^{m_{\mathrm{max}}}{E}_{nm}{R}_n(r){e}^{\tilde{j} m\theta} $$

(3)

2.2 Properties of EFMs and other radial orthogonal moments

For the existing radial orthogonal moments, the number of zeros of the orthogonal polynomials plays a significant role in describing the high-spatial-frequency components of an image. The real and imaginary parts of the radial polynomial of EFMs have 2n and 2n+1 zeros in the interval 0 ≤ r ≤ 1, respectively [39]. Meanwhile, the Bessel polynomials and the orthogonal Fourier–Mellin polynomials have n+2 and n zeros in the interval 0 ≤ r ≤ 1, respectively [6, 40]. Zernike polynomials have only (n − m)/2 zeros in the interval 0 ≤ r ≤ 1. Therefore, the degree n of EFMs required to represent an image is much lower than that of BFMs, OFMMs, and ZMs, which gives the EFMs a stronger capability for describing an image than these other orthogonal moments in polar coordinates. Additionally, classic EFMs and other radial orthogonal moments are rotation-invariant, which is useful for geometric invariant recognition. These properties show that the exponent-Fourier moments are potentially useful as feature descriptors for image analysis.
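The zero counts quoted for EFMs can be checked from the oscillatory factor of Eq. (2). The following is a minimal sketch rather than the authors' code: it assumes the zeros in question are those of cos(2nπr) and sin(2nπr) on the closed interval 0 ≤ r ≤ 1 (the factor 1/√r has no zeros), and the helper name `efm_radial_zeros` is hypothetical.

```python
from fractions import Fraction

def efm_radial_zeros(n):
    """Zeros on [0, 1] of cos(2*n*pi*r) and sin(2*n*pi*r) for order n >= 1."""
    # cos(2*n*pi*r) = 0 when r = (2k + 1) / (4n)
    real_zeros = [Fraction(2 * k + 1, 4 * n) for k in range(2 * n)]
    # sin(2*n*pi*r) = 0 when r = k / (2n); this includes the endpoints r = 0 and r = 1
    imag_zeros = [Fraction(k, 2 * n) for k in range(2 * n + 1)]
    return real_zeros, imag_zeros

real_zeros, imag_zeros = efm_radial_zeros(3)
print(len(real_zeros), len(imag_zeros))  # 6 7
```

Exact fractions are used so that the endpoint zeros are counted without floating-point tolerance issues.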

3 Methods

3.1 Analysis of the numerical instability involved in classic EFMs

Hu et al. [39] first proposed classic EFMs based on the radial function R_n(r) shown in Eq. (2), which satisfies the orthogonality condition over the interval 0 ≤ r ≤ 1. However, the radial function R_n(r) is numerically unstable, which can cause poor image reconstruction and imprecise image classification in practical applications. This instability is mainly attributed to two issues. First, when r is equal to 0, the real component of the radial function R_n(r) tends to infinity, and the imaginary part is not a number (NaN); neither value can be used in actual computation. Second, as shown in Fig. 2, the real component of R_n(r) becomes very large as r tends to 0. This results in numerical instability during the computation of image moments and makes the computed moment values inaccurate. The first issue can be avoided by setting r = Δr whenever r is equal to 0, where Δr is a small value close to 0 (e.g., Δr = 0.005). However, choosing a value of Δr that suits both lower- and higher-order moments is difficult. Furthermore, the second issue remains in the computation of the EFMs regardless.
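The behavior described above can be reproduced numerically. The sketch below evaluates the conjugate radial function of Eq. (2) with NumPy; the function name `efm_radial_conj` and the sample radii are illustrative assumptions, not part of the original work.

```python
import numpy as np

def efm_radial_conj(n, r):
    """Conjugate EFM radial function of Eq. (2): exp(-j*2*n*pi*r) / sqrt(r)."""
    r = np.asarray(r, dtype=float)
    # Suppress the divide-by-zero warnings so the r = 0 case can be inspected.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.exp(-2j * np.pi * n * r) / np.sqrt(r)

r = np.array([0.0, 0.005, 0.05, 0.5, 1.0])
values = efm_radial_conj(5, r)
print(np.isfinite(values))   # first entry is False: the value at r = 0 is not finite
print(np.abs(values[1:]))    # magnitude equals 1/sqrt(r), about 14.14 at r = 0.005
```

Because the magnitude grows as 1/√r, pixels near the image center dominate the sum even when r = 0 itself is avoided, which is the second issue noted in the text.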

3.2 Definition of MEFMs

We improve the EFMs and define their modified version, MEFMs, as follows to avoid the numerical instability of the EFMs:

$$ {M}_{nm}=\frac{1}{2\pi }{\int}_0^{2\pi }{\int}_0^1f\left(r,\theta \right){T}_n\left(\alpha, r\right){e}^{-\tilde{j} m\theta} rdrd\theta $$

(4)

where f(r, θ) is an image function in polar coordinates; n = 0, 1, 2, … and m = 0, ± 1, ± 2, … are the moment order and repetition, respectively; and T_n(α, r) denotes the radial basis functions of the image moments, defined as follows:

$$ {T}_n\left(\alpha, r\right)=\left\{\begin{array}{ll}{16}^{-\frac{\alpha_1}{4}r}{e}^{-\tilde{j}2 n\pi r},& n\in {N}_{\mathrm{low}}\\ {16}^{-\frac{\alpha_2}{4}r}{e}^{-\tilde{j}2 n\pi r},& n\in {N}_{\mathrm{high}}\end{array}\right. $$

(5)

where α₁, α₂ ∈ R are the values taken by the parameter α; n = 0, 1, 2, …; and N_low and N_high denote the ranges of lower and higher moment orders, respectively. The radial basis functions T_n(α, r) can be understood as the set of orthogonal exponent functions $ {R}_n^{\ast }(r) $ in Eq. (2) multiplied by the compound envelope factor $ \sqrt{r}\left({16}^{-\frac{\alpha }{4}r}\right) $. The basis function $ {R}_n^{\ast }(r){e}^{-\tilde{j} m\theta} $ is orthogonal over the interior of the unit circle:

$$ {\int}_0^{2\pi }{\int}_0^1{R}_n^{\ast }(r){e}^{-\tilde{j} p\theta}{\left[{R}_m^{\ast }(r){e}^{-\tilde{j} q\theta}\right]}^{\ast } rdrd\theta =2\pi {\delta}_{nm}{\delta}_{pq} $$

(6)

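where 2π is the normalization coefficient and δ_nm and δ_pq are Kronecker delta functions. Because the envelope factor in T_n(α, r) modifies this orthogonal basis, the radial functions of the MEFMs are no longer exactly orthogonal; thus, the MEFMs can also be called semi-orthogonal EFMs.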
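As an illustration of Eq. (5), the radial basis can be written in a few lines of NumPy. This is a sketch under stated assumptions: the text does not give the boundary between N_low and N_high, so the parameter `n_split` is hypothetical, and the defaults α₁ = 2 and α₂ = 0.2 are taken from the example values quoted in Section 3.5.

```python
import numpy as np

def mefm_radial(n, r, alpha_low=2.0, alpha_high=0.2, n_split=15):
    """Radial basis T_n(alpha, r) of Eq. (5).

    n_split is a hypothetical boundary: orders n <= n_split are treated as
    N_low (alpha_low is used), larger orders as N_high (alpha_high is used).
    """
    alpha = alpha_low if n <= n_split else alpha_high
    r = np.asarray(r, dtype=float)
    envelope = 16.0 ** (-alpha * r / 4.0)   # equals 2 ** (-alpha * r)
    return envelope * np.exp(-2j * np.pi * n * r)

r = np.linspace(0.0, 1.0, 5)
print(np.abs(mefm_radial(3, r)))    # decays from 1 at r = 0 to 0.25 at r = 1
print(np.abs(mefm_radial(50, r)))   # decays from 1 at r = 0 to about 0.87 at r = 1
```

Unlike the EFM radial function, the magnitude stays within (0, 1] over the whole interval, including r = 0.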

3.3 Calculation of MEFMs

In image analysis, all test images are digital images; thus, Eq. (4) must be replaced by a discrete form. Consider a digital image f(x_i, y_j) of M × N pixels, 0 ≤ i ≤ M − 1, 0 ≤ j ≤ N − 1. We normalize the M × N pixels onto the region [−1, 1] × [−1, 1] containing the unit circle. Eq. (4) can then be rewritten as

$$ {M}_{nm}=\frac{1}{2\pi}\sum \limits_{i=0}^{M-1}\sum \limits_{j=0}^{N-1}f\left({x}_i,{y}_j\right)\times {\int}_{x_i-\frac{\Delta x}{2}}^{x_i+\frac{\Delta x}{2}}{\int}_{y_j-\frac{\Delta y}{2}}^{y_j+\frac{\Delta y}{2}}{T}_n\left(\alpha, \sqrt{x^2+{y}^2}\right){e}^{-\tilde{j}m\left({\tan}^{-1}\left(y/x\right)\right)} dxdy $$

(7)

where $ {x}_i=\frac{2i+1-M}{M},\ {y}_j=\frac{2j+1-N}{N},\ \mathrm{and}\ \Delta x=\Delta y=\frac{2}{\sqrt{M^2+{N}^2}} $. A zero-order approximation (ZOA) method is used to calculate the double integral in Eq. (7), which allows a fair comparison with the classic EFMs in [39] in the following experiments:

$$ {\int}_{x_i-\frac{\Delta x}{2}}^{x_i+\frac{\Delta x}{2}}{\int}_{y_j-\frac{\Delta y}{2}}^{y_j+\frac{\Delta y}{2}}{T}_n\left(\alpha, \sqrt{x^2+{y}^2}\right){e}^{-\tilde{j}m\left({\tan}^{-1}\left(y/x\right)\right)} dxdy\approx \Delta x\,\Delta y\,{T}_n\left(\alpha, \sqrt{x_i^2+{y}_j^2}\right){e}^{-\tilde{j}m\left({\tan}^{-1}\left({y}_j/{x}_i\right)\right)} $$

(8)

Substituting Eq. (8) into Eq. (7), the modified EFM can be calculated by ZOA as

$$ {\tilde{M}}_{nm}=\frac{2}{\pi \left({M}^2+{N}^2\right)}\sum \limits_{i=0}^{M-1}\sum \limits_{j=0}^{N-1}f\left({x}_i,{y}_j\right){T}_n\left(\alpha, \sqrt{x_i^2+{y}_j^2}\right){e}^{-\tilde{j}m\left({\tan}^{-1}\left({y}_j/{x}_i\right)\right)} $$

(9)
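The ZOA sum of Eq. (9) maps directly onto array operations. The following is a minimal sketch, not the authors' MATLAB implementation. It makes several assumptions that are not stated in the text: pixels with r > 1 are excluded because the basis is defined on the unit circle, `np.arctan2` replaces tan⁻¹(y/x) so that the angle is resolved in all four quadrants, a single α is used for all orders, and the output layout (rows n = 0…n_max, columns m = −m_max…m_max) is an arbitrary choice. The prefactor follows from (1/2π)·Δx·Δy with Δx = Δy = 2/√(M² + N²).

```python
import numpy as np

def radial_basis(n, r, alpha):
    """T_n(alpha, r) of Eq. (5) for a single parameter alpha."""
    return 16.0 ** (-alpha * r / 4.0) * np.exp(-2j * np.pi * n * r)

def mefm_zoa(image, n_max, m_max, alpha=2.0):
    """Zero-order approximation of the MEFMs, following Eq. (9)."""
    image = np.asarray(image, dtype=float)
    M, N = image.shape
    # Pixel centers mapped to [-1, 1] x [-1, 1], as defined below Eq. (7).
    x = (2 * np.arange(M) + 1 - M) / M
    y = (2 * np.arange(N) + 1 - N) / N
    X, Y = np.meshgrid(x, y, indexing="ij")
    r = np.hypot(X, Y)
    theta = np.arctan2(Y, X)                 # assumption: four-quadrant angle
    f = np.where(r <= 1.0, image, 0.0)       # assumption: ignore pixels outside the unit circle
    scale = 2.0 / (np.pi * (M ** 2 + N ** 2))
    moments = np.zeros((n_max + 1, 2 * m_max + 1), dtype=complex)
    for n in range(n_max + 1):
        weighted = f * radial_basis(n, r, alpha)
        for col, m in enumerate(range(-m_max, m_max + 1)):
            moments[n, col] = scale * np.sum(weighted * np.exp(-1j * m * theta))
    return moments

image = np.ones((64, 64))
moments = mefm_zoa(image, n_max=2, m_max=2, alpha=0.0)
print(np.abs(moments[0]))   # magnitudes of the order-0 moments for m = -2 ... 2
```

For a constant image the moments with odd m vanish up to rounding error, because the sampling grid is symmetric about the origin; this is a quick sanity check on the angular factor.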

Similarly, the reconstructed image can be expressed by the following formula:

$$ \overline{f}\left({x}_i,{y}_j\right)\approx \sum \limits_{n=1}^{n_{\mathrm{max}}}\sum \limits_{m=1}^{m_{\mathrm{max}}}{\tilde{M}}_{nm}{T}_n\left(\alpha, \sqrt{x_i^2+{y}_j^2}\right){e}^{\tilde{j}m\left({\tan}^{-1}\left({y}_j/{x}_i\right)\right)} $$

(10)

3.4 Computation complexity and stability analysis of MEFMs

To compare the moments fairly and verify the properties of the MEFMs efficiently, all image moments, including the other moments used for comparison, are computed with the ZOA algorithm proposed in [38]; accurate and fast algorithms for computing image moments are not considered. Compared with the existing classic orthogonal moments based on higher-order polynomials (e.g., ZMs, LMs, OFMMs, and BFMs), the proposed radial polynomial of the MEFMs is simple (i.e., it is composed only of trigonometric and exponential functions with parameter variables). It does not involve the factorial and accumulative summation operations of classic orthogonal moments; thus, its computational complexity is lower. The radial polynomials of ZMs, OFMMs, and BFMs at lower orders (n = 10) [40] are close to uniformly distributed, and their amplitudes are stable (e.g., the amplitudes of the radial polynomials of ZMs and BFMs lie in the interval [−1, 1], while only a few lower-order OFMMs have amplitudes exceeding 2, and the rest lie in the interval [−2, 2]). However, as the order of the image moments increases, numerical instability appears in the calculation of these classic orthogonal moments (e.g., Fig. 3 shows the numerical distribution curves of the classic orthogonal moments at a higher order, n = 50). Figure 3 shows that the radial polynomials of ZMs and OFMMs are close to 0 in the interval [0, 0.8]. Their amplitudes gradually increase in the interval (0.8, 1), and the numerical values tend to be unstable (e.g., the amplitude of OFMMs is close to 1.7 × 10²⁰ when r = 0.95). The radial polynomial of BFMs also tends to decay in the interval [0, 1] (e.g., the amplitude is attenuated to [−0.1, 0.1] in the interval [0.5, 1]). In contrast, the radial polynomial of MEFMs is almost uniform at order n = 50, and its amplitude is stable. In addition, the classic EFMs [39] and RHFMs [26] introduce the factor $ \sqrt{1/r} $ into their radial basis functions to satisfy the orthogonality condition; however, the amplitudes of the EFM and RHFM polynomials become NaN (not a number) and Inf (infinity), respectively, when r = 0. This results in numerical instability in the constructed image moments. The proposed image moments avoid this phenomenon and are therefore more stable. Orthogonal moments constructed from orthogonal polynomials outperform non-orthogonal moments overall. However, this does not mean that the orthogonal polynomials are stable at every point of their domain; thus, properly correcting the unstable orthogonal basis functions can help the image moments reach their best performance. This is the main purpose of the image moments proposed in this paper.

3.5 Time–frequency analysis of MEFMs

From the time-domain point of view, the ZOA theory can effectively explain the properties of the constructed basis functions of the orthogonal moments: the locations and the number of zeros of the radial function represent the sampling positions and the sampling frequency of an image, respectively. The more zeros there are and the more evenly they are distributed in a region, the better the reconstructed image. For a given order n and repetition m, the radial polynomials of BFMs and OFMMs have n + 2 and n zeros in the interior of the unit circle, respectively, while the radial polynomial of the ZMs has only (n − m)/2 zeros in the interval 0 ≤ r ≤ 1 [40]. Among radial polynomials (or radial functions) with trigonometric functions as basis functions, the real and imaginary parts of the radial polynomials of EFMs [39] and polar harmonic Fourier moments (PHT) [26] have 2n and 2n + 1 zeros in the interior of the unit circle, respectively. The radial polynomials of the polar sine transforms (PST) [41], polar cosine transforms (PCT) [41], and circularly semi-orthogonal moments (SOMs) similarly have n + 2 zeros. Meanwhile, the real and imaginary parts of the radial polynomial of MEFMs have 2n and 2n + 1 zeros in the interior of the unit circle, respectively. As illustrated in Fig. 4, the curve of the real part of the radial polynomial of the MEFMs is smoother at lower orders. Compared with the classic orthogonal moments (i.e., ZMs, OFMMs, and BFMs) and other orthogonal moments with trigonometric basis functions (i.e., EFMs, PHT, and PCT), it is closer to a uniform distribution and its magnitude is more stable (e.g., the amplitude lies in the interval [−1, 1]).

For image recognition, most algorithms use lower-order moments as the features for classification. Lower-order moments are robust to noise in pattern recognition; however, the order of the image moments must be increased to obtain more feature points and to classify image sets with high similarity (e.g., texture images) more precisely. Therefore, the higher-order moments need to be studied in depth. The lower-order moments generally correspond to the low-frequency components of an image (e.g., contours or shapes), while the higher-order moments represent the detail components (i.e., high-frequency components). Time-domain analysis can be used for the quantitative analysis of lower-order moments, but it cannot provide a reasonable description of the high-frequency components of an image (corresponding to higher-order moments). For these reasons, a method of time–frequency correspondence is proposed from the frequency-domain perspective. This method can analyze and improve the stability of moments of different orders for image recognition. The basic idea is to regard the frequency-domain representation of the basis functions of the image moments as a 2D filter. We want the frequency bandwidth of the basis functions to be wider at lower orders, with the attenuation at the cutoff frequency as slow as possible, and the bandwidth to be narrower at higher orders, with the attenuation at the cutoff frequency as fast as possible. In this study, a parameter-modulated MEFM is proposed and used to verify this concept (Fig. 5). The main principle is to change the bandwidth in the frequency domain by adjusting the parameter α of the radial function of the MEFMs in the time domain. In the low-frequency region of the image (lower-order moments), we set the parameter α (e.g., α = 2 in the experiments) to make the bandwidth as wide as possible, such that more image features of the lower-order moments can be obtained. In the high-frequency region (higher-order moments), the bandwidth is made as narrow as possible by changing the parameter α (e.g., α = 0.2 in the experiments), such that more high-frequency components, especially noise interference, can be suppressed. Finally, the theoretical results are illustrated and verified by the image reconstruction experiments (Section 4.2).
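The link between α and bandwidth can be illustrated with a simplified one-dimensional calculation. The sketch below is an assumption-laden illustration and not the 2D filter analysis behind Fig. 5: it considers only the envelope 16^(−αr/4) on 0 ≤ r ≤ 1, uses the closed-form Fourier transform of that truncated exponential, and measures bandwidth as the first angular frequency at which the power falls to half its zero-frequency value. The function names and the half-power criterion are choices made here for illustration.

```python
import numpy as np

def envelope_spectrum(alpha, omega):
    """Magnitude of the Fourier transform of 16**(-alpha*r/4) on 0 <= r <= 1."""
    a = alpha * np.log(2.0)                      # 16**(-alpha*r/4) == exp(-a*r)
    s = a + 1j * np.asarray(omega, dtype=float)
    # Closed form of the integral of exp(-s*r) over r in [0, 1], valid for s != 0.
    return np.abs((1.0 - np.exp(-s)) / s)

def half_power_bandwidth(alpha, omega_max=20.0, step=1e-3):
    """Smallest sampled angular frequency where the power is at most half its DC value."""
    a = alpha * np.log(2.0)
    dc = 1.0 if a == 0.0 else (1.0 - np.exp(-a)) / a
    omega = np.arange(step, omega_max, step)     # starts above 0, so s is never 0
    power = envelope_spectrum(alpha, omega) ** 2
    below = np.nonzero(power <= 0.5 * dc ** 2)[0]
    return omega[below[0]]

for alpha in (0.0, 0.2, 2.0):
    print(alpha, half_power_bandwidth(alpha))
```

Under this measure the envelope with α = 2 has a wider half-power bandwidth than the envelope with α = 0.2, which is in line with the qualitative statement in the text; the size of the effect in the full 2D basis may differ.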

4 Results and discussion

In this section, the experimental results are used to validate the theoretical framework developed in the previous sections. This section includes four subsections. In the first subsection, we discuss the computational complexities of MEFMs as compared to those of BFMs, ZMs, OFMs, PST, and PCT. In the second subsection, the question of how well an image can be represented using MEFMs is addressed, and the image reconstruction capability of MEFMs is compared with those of BFMs, ZMs, OFMs, SOMs, PST, and PCT. In the third subsection, the question of optimal parameter selection for image reconstruction and recognition is discussed. A new method for the RST invariant image recognition using the proposed moments and the experimental study on the RST recognition accuracy of MEFMs is provided in the last subsection.

4.1 Computational complexities

In this section, we use computation time to show how much less complex the computation of the radial polynomial of the MEFMs is compared with those of BFMs, ZMs, and OFMMs. Table 1 summarizes the computation time of the radial polynomials for MEFMs and the other radial orthogonal moments. In the calculation, the order of the image moments is 5, 10, 15, ..., 30. The test image is the Lena gray-level image (Fig. 6) of size 128 × 128. The computation time averaged over the six moment orders is taken as the time measurement for each type of image moment. The test computer has a 3.2 GHz Intel(R) Core(TM) i5 CPU and 8 GB of memory. The software is MATLAB R2013a. Table 1 shows that the time consumed by the MEFMs is slightly higher than that of the PST and EFMs but significantly lower than that of the other classical orthogonal moments (i.e., ZMs, OFMMs, and BFMs).
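A rough way to see where the cost difference comes from is to time a factorial-based radial polynomial against the MEFM radial function. The sketch below is not a reproduction of Table 1, which was measured in MATLAB: it uses Python with NumPy, the standard factorial-sum definition of the Zernike radial polynomial as the comparison case, a single α for the MEFM basis, and an arbitrary repetition count. Absolute timings depend on the machine and are not asserted here.

```python
import timeit
from math import factorial

import numpy as np

def zernike_radial(n, m, r):
    """Zernike radial polynomial R_n^m(r); requires |m| <= n and n - |m| even."""
    m = abs(m)
    r = np.asarray(r, dtype=float)
    total = np.zeros_like(r)
    for s in range((n - m) // 2 + 1):
        coeff = ((-1) ** s * factorial(n - s)
                 / (factorial(s)
                    * factorial((n + m) // 2 - s)
                    * factorial((n - m) // 2 - s)))
        total += coeff * r ** (n - 2 * s)
    return total

def mefm_radial(n, r, alpha=2.0):
    """MEFM radial basis of Eq. (5) for a single parameter alpha."""
    r = np.asarray(r, dtype=float)
    return 16.0 ** (-alpha * r / 4.0) * np.exp(-2j * np.pi * n * r)

r = np.linspace(0.0, 1.0, 128 * 128)   # one radius per pixel of a 128 x 128 image
t_zm = timeit.timeit(lambda: zernike_radial(30, 0, r), number=20)
t_mefm = timeit.timeit(lambda: mefm_radial(30, r), number=20)
print(f"ZM: {t_zm:.4f} s, MEFM: {t_mefm:.4f} s")
```

The Zernike loop evaluates (n − |m|)/2 + 1 power terms per call, each with factorial coefficients, whereas the MEFM basis needs one exponential envelope and one complex exponential regardless of the order.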

Table 1 Basis functions computation time


4.2 Image reconstruction

In this subsection, the image representation capability of the MEFMs is presented. For the convenience of computing the image moments, the number of moments used in the image reconstruction and recognition experiments is limited based on n_max = m_max = N, N ∈ Z⁺. In addition, we use the statistical-normalization image reconstruction error (SNIRE) defined in [34] to measure the performance of the image reconstruction.

$$ \overline{\varepsilon^2}=\frac{\sum \limits_{x=1}^N\sum \limits_{y=1}^N{\left|f\left(x,y\right)-\overline{f}\left(x,y\right)\right|}^2}{\sum \limits_{x=1}^N\sum \limits_{y=1}^N{f}^2\left(x,y\right)} $$

(11)

where f(x, y) is the original image and $ \overline{f}\left(x,y\right) $ is the reconstructed image.
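Eq. (11) is straightforward to compute. A minimal sketch in NumPy follows; the function name `snire` is chosen here for illustration, and the absolute value is applied before squaring so that a complex-valued reconstruction is also handled.

```python
import numpy as np

def snire(original, reconstructed):
    """Statistical-normalization image reconstruction error of Eq. (11)."""
    f = np.asarray(original, dtype=float)
    f_rec = np.asarray(reconstructed)
    # Squared error summed over all pixels, normalized by the image energy.
    return np.sum(np.abs(f - f_rec) ** 2) / np.sum(f ** 2)

f = np.array([[1.0, 2.0], [3.0, 4.0]])
print(snire(f, f))                  # 0.0
print(snire(f, np.zeros_like(f)))   # 1.0
```

A value of 0 means a perfect reconstruction, and a value of 1 corresponds to an all-zero reconstruction, so smaller values indicate better reconstruction.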

Experiment 1

A set of binary images including digits from 0 to 9 and the uppercase English letters from A to J, and another set of gray-level images and color images including Lena, cameraman, woman and plane, baboon, and pepper, as shown in Fig. 7 are used as test images. The size of each image is 64 × 64. The proposed MEFMs are obtained from the images shown in Fig. 7, and the images are reconstructed using the maximum order of 35 and the parameter α of the radial function of the MEFMs is 2. The results are given in Fig. 8. It can be seen from Fig. 8 that by using the proposed MEFMs, either color, gray-level, or binary images can be reconstructed well.

Experiment 2

To demonstrate the validity of the time–frequency correspondence method proposed in Section 3.5, the image reconstruction ability of the proposed moments is compared under different parameters. A binary image of the uppercase English letter E, the gray-level image cameraman, and the color image baboon are considered in the experiment. The images reconstructed with two parameter settings (α = 0 and α = 2) at lower orders (e.g., N = 5, 7, 9, 11, and 13) and higher orders (e.g., N = 55, 60, 65, 70, ..., 120) are shown in Figs. 9, 10, and 11. Here, lower-order and higher-order are relative terms in image reconstruction; e.g., for orders N of 10 and 100, N = 10 is considered a lower order, while N = 100 is a higher order. The comparison of the images reconstructed using the MEFMs with the two parameters (α = 0 and α = 2) shows that, at lower orders (N = 5, 7, 9, 11, and 13), the reconstructed images are visually better with α = 2 than with α = 0. The objective measure of reconstruction performance confirms this: the SNIRE for α = 2 at lower orders is generally less than that for α = 0. However, as the moment order increases and N exceeds 15, the reconstruction performance of the MEFMs is the opposite of that at lower orders. As shown in Table 2, the SNIRE for α = 0 at higher orders (e.g., when N exceeds 25) is less than that for α = 2. When the moment order is N = 65, the SNIRE reaches its minimum, and the reconstructed binary image of the uppercase English letter “E” is almost identical to the original image. As can be seen from Figs. 9 and 10, the reconstructed gray-level and color images are visually better with α = 0 than with α = 2 at higher orders. These experimental results also verify the reliability and rationality of the time–frequency correspondence method proposed in Section 3.5; i.e., the bandwidths and cutoff frequencies of the radial function (or polynomial) of the proposed image moments (MEFMs) in the frequency domain affect the quality and numerical stability of image reconstruction. If lower-order moments are used to describe the image features, the bandwidth of the radial polynomial of the MEFMs can be adjusted to be slightly larger (e.g., α = 2), whereas to obtain more image features (images reconstructed using higher-order moments), the bandwidth of the radial polynomial of the MEFMs should be made as narrow as possible (e.g., α = 0).

Table 2 Reconstruction of the uppercase English letter “E” with parameters α = 0 and α = 2 at different moment orders


Experiment 3

Based on the frequency-domain analysis of the MEFMs’ radial function, we proposed an image projection transformation of the original image that uses a piecewise function (or polynomial) for reconstruction at lower and higher orders, respectively (see Eq. (5) in Section 3.2). To verify the validity of the piecewise function in Eq. (5), the proposed MEFMs are compared with the ZMs, OFMMs, BFMs, and EFMs by reconstructing the binary image of the uppercase English letter “E.” The results in Fig. 11 show that the proposed moments, whose basis functions consist of the piecewise polynomials of Section 3.2, outperform the other classical orthogonal moments at both lower and higher orders. In particular, as the order increases, the images reconstructed by the OFMMs become invalid at N = 40, and those reconstructed by the ZMs become invalid at N = 50. The BFMs and EFMs remain numerically stable at higher orders, but their reconstructed images are visibly worse than those of the proposed MEFMs.

To further verify the validity of the proposed moments, we also reconstruct images using moments whose radial basis functions are trigonometric (e.g., SOMs, PCT, and PST) and compare the results with those of the MEFMs. The experimental results show that, at lower orders, the SNIRE of the MEFMs, SOMs, PCT, and PST decreases approximately linearly as the order increases, and the quality of the reconstructed images gradually improves. Figure 12a shows that the reconstruction ability of the proposed moments is better than that of the EFMs, PCT, and PST. At higher orders, however, Fig. 12b shows that the SNIRE of the moments with trigonometric radial basis functions no longer decreases linearly, and numerical instability appears during image reconstruction. In contrast, the SNIRE of the MEFMs keeps decreasing gradually as the order increases, which indicates that the MEFMs reconstruct images better at higher orders than the moments with trigonometric radial basis functions.

4.3 Optimal parameter selection for image reconstruction and recognition

According to the time–frequency correspondence analysis in Section 3.5, the choice of the parameter α in Eq. (5) is crucial for the proposed MEFMs because it affects both the image reconstruction accuracy and the image recognition rate. A method for selecting α is therefore needed to obtain good image description ability. The selection of α is equivalent to an unconstrained optimization problem (i.e., $ \min \left\{\overline{\varepsilon^2}\left[f,\overline{f};\alpha, N\right]\right\} $) in which the two variables α and N are limited to α_min ≤ α ≤ α_max and N_min ≤ N ≤ N_max. The genetic algorithm (GA) has been a popular and effective method for such optimization problems in recent years, and applying a GA to the proposed moments would yield more precise values of α and N. However, considering the complexity of implementing a GA, a simpler algorithm is adopted herein to optimize α. If the order N of the proposed moments is fixed, the two-variable optimization problem reduces to a single-variable one. The specific implementation process is presented below.

First, to investigate the influence of the parameter α on the performance of the proposed method, we use 20 gray-level images selected from the Coil-20 database [42] (Fig. 16) and evaluate each candidate value of α with the measure D_g(α), defined as follows:

$$ {D}_g\left(\alpha \right)=\frac{1}{g}\sum \limits_{n=1}^g\overline{\varepsilon^2}\left[{f}_n,{\overline{f}}_n;\alpha \right] $$

(12)

where g = 20 denotes the number of gray-level images from the Coil-20 database, and f_n and $ {\overline{f}}_n $ represent the nth original and reconstructed images, respectively. A lower value of D_g indicates a better performance of the proposed image moments in image reconstruction or recognition.

We first note that, when α = 0, the basis function of the proposed moments (e.g., $ {T}_n\left(\alpha, r\right)={e}^{-\tilde{j}2 n\pi r} $) is orthogonal on the interval [0,1]. In the experiments, the search interval is therefore restricted to values around zero, $ -\frac{7}{2}\le \alpha \le \frac{7}{2} $, chosen empirically, with a step of 0.5. For a given order N (N = 10 for lower-order moments and N = 60 for higher-order moments in the experiments), the values of D_g obtained for each α are summarized in Table 3. Table 3 shows that {N = 10, α = 2} is optimal for lower-order moments and {N = 60, α = 0} is optimal for higher-order moments in the task of image reconstruction or recognition. This experiment can therefore guide the selection of α in image reconstruction and classification tasks. The optimal values of α in Table 3 are also consistent with the conclusion of the time–frequency correspondence method proposed in Section 3.5.
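The single-variable search over α for a fixed order N can be sketched as below. This is only an illustration under stated assumptions: `reconstruction_error` is a hypothetical callable returning the reconstruction error of one image for a given α and N, and `toy_error` is a made-up stand-in used solely to exercise the loop; neither comes from the paper.

```python
import numpy as np

def mean_error(images, alpha, order, reconstruction_error):
    """D_g(alpha) of Eq. (12): mean reconstruction error over the image set."""
    return float(np.mean([reconstruction_error(f, alpha, order) for f in images]))

def search_alpha(images, order, reconstruction_error,
                 alpha_min=-3.5, alpha_max=3.5, step=0.5):
    """Exhaustive search over a uniform alpha grid for a fixed order N."""
    n_steps = int(round((alpha_max - alpha_min) / step))
    alphas = alpha_min + step * np.arange(n_steps + 1)
    scores = [mean_error(images, a, order, reconstruction_error) for a in alphas]
    best = int(np.argmin(scores))
    return float(alphas[best]), scores[best]

# Hypothetical stand-in for the real error computation: its minimum is
# placed at alpha = 2 only so that the search has something to find.
def toy_error(image, alpha, order):
    return (alpha - 2.0) ** 2 + 0.01 * float(image.mean())

images = [np.full((4, 4), k, dtype=float) for k in range(3)]
best_alpha, best_score = search_alpha(images, order=10,
                                      reconstruction_error=toy_error)
```

With a real error function in place of `toy_error`, the same loop would be run once per order of interest (e.g., N = 10 and N = 60).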

Table 3 Search results of D_g for different values of the parameter α


4.4 Rotation, scaling, and translation invariant image recognition

In this section, a new RST invariant system for the MEFMs is proposed; it is implemented in two steps. First, translation invariance is obtained with the proposed image projection approach, which serves as an alternative to the traditional algorithm based on calculating the image geometric moments [28] and central moments [43]. Second, the basis functions of the MEFMs are extended from the polar coordinate space to the log-polar space, so that the MEFMs become invariant to scaling and rotation at the same time.

4.4.1 Scaling and rotation invariance of MEFMs

Log-polar mapping

In the image processing and recognition process, the original image is usually acquired in a Cartesian coordinate system. First, let f^sr(x, y) be the scaled and rotated image of an image function f(x, y) with the scaling factor σ⁻¹ and the rotation angle ϕ in the Cartesian coordinates. We then have

$$ {f}^{sr}\left(x,y\right)=f\left({\sigma}^{-1}\left(x\cos \phi +y\sin \phi \right),{\sigma}^{-1}\left(y\cos \phi -x\sin \phi \right)\right) $$

(13)

Using the conversion from the Cartesian coordinate system to the log-polar coordinate space, x = e^ρ cos θ, y = e^ρ sin θ, 0 ≤ θ ≤ 2π, ρ ∈ ℜ, we can rewrite Eq. (13) as

$$ {\displaystyle \begin{array}{c}{f}^{sr}\left({e}^{\rho}\cos \theta, {e}^{\rho}\sin \theta \right)=f\left({\sigma}^{-1}\left({e}^{\rho}\cos \theta \cos \phi +{e}^{\rho}\sin \theta \sin \phi \right),{\sigma}^{-1}\left({e}^{\rho}\sin \theta \cos \phi -{e}^{\rho}\cos \theta \sin \phi \right)\right)\\ {}=f\left({\sigma}^{-1}{e}^{\rho}\cos \left(\theta -\phi \right),{\sigma}^{-1}{e}^{\rho}\sin \left(\theta -\phi \right)\right)\\ {}=f\left({e}^{-\ln \sigma }{e}^{\rho}\cos \left(\theta -\phi \right),{e}^{-\ln \sigma }{e}^{\rho}\sin \left(\theta -\phi \right)\right)\\ {}=f\left({e}^{\rho -\ln \sigma}\cos \left(\theta -\phi \right),{e}^{\rho -\ln \sigma}\sin \left(\theta -\phi \right)\right)\end{array}} $$

(14)

The above equation can be simply expressed as

$$ {f}^{sr}\left(\rho, \theta \right)=f\left(\rho -\ln \sigma, \theta -\phi \right) $$

(15)
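The mapping x = e^ρ cos θ, y = e^ρ sin θ can be illustrated by resampling an image onto a (ρ, θ) grid. The sketch below is an assumption-laden illustration, not the authors' implementation: the function name `log_polar_map`, the nearest-neighbour sampling, the unit disk inscribed in the image, and the lower bound `rho_min` are all choices made here.

```python
import numpy as np

def log_polar_map(image, n_rho=64, n_theta=64, rho_min=-3.0):
    """Resample a 2D image onto a (rho, theta) grid by nearest neighbour.

    The unit disk is inscribed in the image, so rho = ln r covers
    [rho_min, 0] and theta covers [0, 2*pi).
    """
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    radius = min(cy, cx)                      # pixels per unit radius
    rho = np.linspace(rho_min, 0.0, n_rho)
    theta = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    rr, tt = np.meshgrid(np.exp(rho), theta, indexing="ij")
    x = cx + radius * rr * np.cos(tt)         # x = e^rho cos(theta)
    y = cy + radius * rr * np.sin(tt)         # y = e^rho sin(theta)
    xi = np.clip(np.rint(x).astype(int), 0, w - 1)
    yi = np.clip(np.rint(y).astype(int), 0, h - 1)
    return image[yi, xi]

# Test image whose pixel value equals its column index.
img = np.tile(np.arange(65.0), (65, 1))
lp = log_polar_map(img)
```

On such a grid, a scaling by σ moves content along the ρ axis by ln σ and a rotation by ϕ moves it along the θ axis, as Eq. (15) states.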

The Fourier transform (FT) of a 2D image function f(ρ, θ) in the log-polar coordinates can be denoted as follows:

$$ f\left(\rho, \theta \right)\leftrightarrow F\left(u,v\right) $$

(16)

According to the translation characteristic of 2D Fourier transform, for f^sr(ρ, θ) we have

$$ f\left(\rho -\ln \sigma, \theta -\phi \right)\leftrightarrow F\left(u,v\right){e}^{-\tilde{j}2\pi \left(u\ln \sigma + v\phi \right)} $$

(17)

Thus, it is straightforward that $ \left|F\left(u,v\right){e}^{-\tilde{j}2\pi \left(u\ln \sigma + v\phi \right)}\right|=\left|F\left(u,v\right)\right| $.

The above equations and Fig. 13 show that scaling and rotating an image in the Cartesian coordinate system correspond to a translation in the log-polar coordinate space. Because the magnitude of the 2D Fourier transform of f^sr(ρ, θ) is unaffected by this translation, invariance to image scaling and rotation can be achieved.
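The shift property behind this invariance can be checked numerically. The following sketch assumes that the log-polar image is already sampled on a uniform grid and that the shift is circular (implemented with `np.roll`); with a real image the shift along ρ is not circular, so the invariance would hold only approximately. The random array and the shift amounts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
f_lp = rng.random((64, 64))          # stand-in for f(rho, theta) on a grid

# Scaling and rotation become shifts along the rho and theta axes (Eq. (15)).
shift_rho, shift_theta = 5, 12       # shifts in grid samples (illustrative)
f_sr = np.roll(f_lp, (shift_rho, shift_theta), axis=(0, 1))

# The shift changes only the phase of the 2D DFT, not its magnitude.
mag = np.abs(np.fft.fft2(f_lp))
mag_sr = np.abs(np.fft.fft2(f_sr))
max_diff = float(np.max(np.abs(mag - mag_sr)))
```

Here `max_diff` is at the level of floating-point rounding, which is the discrete counterpart of the magnitude identity stated above.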

MEFMs invariant computing method in the log-polar coordinate space

Encouraged by the success of the log-polar mapping (LPM) approach and related works [44], we extend the basis functions of the MEFMs from the polar coordinates to the log-polar coordinate space, such that scaling and rotation invariance can be easily achieved for the proposed MEFMs. In light of Eq. (5), we have

$$ {T}_n\left(\alpha, r\right)=\left|{T}_n\left(\alpha, r\right)\right|{e}^{-\tilde{j}2 n\pi r} $$

(18)

We then let

$$ g\left(r,\theta \right)=w\left(\alpha, r\right)f\left(r,\theta \right) $$

(19)

where w(α, r) = |T_n(α, r)|r can be regarded as a weighted function, and g(r, θ) is a weighted image in the polar coordinate system.

Thus, Eq. (4) can be rewritten as follows:

$$ {\displaystyle \begin{array}{c}{M}_{nm}=\frac{1}{2\pi }{\int}_0^{2\pi }{\int}_0^1f\left(r,\theta \right){T}_n\left(\alpha, r\right){e}^{-\tilde{j} m\theta} rdrd\theta \\ {}=\frac{1}{2\pi }{\int}_0^{2\pi }{\int}_0^1f\left(r,\theta \right)\left|{T}_n\left(\alpha, r\right)\right|{e}^{-\tilde{j}2 n\pi r}{e}^{-\tilde{j} m\theta} rdrd\theta \\ {}=\frac{1}{2\pi }{\int}_0^{2\pi }{\int}_0^1g\left(r,\theta \right){e}^{-\tilde{j}2 n\pi r}{e}^{-\tilde{j} m\theta} drd\theta \end{array}} $$

(20)

Similarly, by using the conversion from the polar coordinate system to the log-polar coordinate space, ρ = ln r, θ = θ, 0 ≤ θ ≤ 2π, ρ ∈ (−∞, 0], Eq. (20) can be transformed from the polar coordinate domain to the log-polar domain. The modified version of the radial basis function of the MEFMs is redefined as

$$ {T}_n\left(\rho \right)={e}^{\tilde{j}2 n\pi \rho} $$

(21)

which satisfies the orthogonal condition:

$$ {\int}_{-1}^0{T}_n\left(\rho \right){T}_m^{\ast}\left(\rho \right) d\rho ={\delta}_{nm} $$

(22)

Hence, the basis function of MEFMs in the log-polar domain can be represented as

$$ {P}_{nm}\left(\rho, \theta \right)={T}_n\left(\rho \right){e}^{\tilde{j} m\theta} $$

(23)

P_nm also satisfies the following orthogonal condition:

$$ {\int}_0^{2\pi }{\int}_{-1}^0{P}_{nm}\left(\rho, \theta \right){P}_{lk}^{\ast}\left(\rho, \theta \right) d\rho d\theta ={\delta}_{nl}{\delta}_{mk} $$

(24)

In light of these conclusions, the modified version of the MEFMs in the log-polar domain is defined as

$$ {M}_{nm}^{LPM}={\int}_0^{2\pi }{\int}_{-1}^0g\left(\rho, \theta \right){P}_{nm}^{\ast}\left(\rho, \theta \right) d\rho d\theta $$

(25)

Let g^sr(ρ, θ) denote the scaled and rotated version of an image g(ρ, θ) with the scaling factor σ and rotation angle ϕ in the log-polar coordinates. We then have

$$ {g}^{sr}\left(\rho, \theta \right)=g\left(\ln \left(\sigma r\right),\theta +\phi \right)=g\left(\ln \sigma +\ln r,\theta +\phi \right)=g\left(\rho +\ln \sigma, \theta +\phi \right) $$

(26)

Thus, according to Eqs. (25) and (26), the MEFMs of g^sr(ρ, θ) are

$$ {\displaystyle \begin{array}{l}{\tilde{M}}_{nm}^{LPM}={\int}_0^{2\pi }{\int}_{-1}^0{g}^{sr}\left(\rho, \theta \right){P}_{nm}^{\ast}\left(\rho, \theta \right) d\rho d\theta ={\int}_0^{2\pi }{\int}_{-1}^0{g}^{sr}\left(\rho, \theta \right){e}^{-\tilde{j}2 n\pi \rho}{e}^{-\tilde{j} m\theta} d\rho d\theta \\ {}={\int}_0^{2\pi }{\int}_{-1}^0g\left(\rho +\ln \sigma, \theta +\phi \right){e}^{-\tilde{j}2 n\pi \rho}{e}^{-\tilde{j} m\theta} d\rho d\theta \end{array}} $$

(27)

Let $ \hat{\rho}=\rho +\ln \sigma $ and $ \hat{\theta}=\theta +\phi $, so that $ \rho =\hat{\rho}-\ln \sigma $ and $ \theta =\hat{\theta}-\phi $. Equation (27) can then be rewritten as

$$ {\displaystyle \begin{array}{c}{\tilde{M}}_{nm}^{LPM}={\int}_0^{2\pi }{\int}_{-1}^0g\left(\hat{\rho},\hat{\theta}\right){e}^{-\tilde{j}2 n\pi \left(\hat{\rho}-\ln \sigma \right)}{e}^{-\tilde{j}m\left(\hat{\theta}-\phi \right)}d\hat{\rho}d\hat{\theta}\\ {}=\left[{\int}_0^{2\pi }{\int}_{-1}^0g\left(\hat{\rho},\hat{\theta}\right){e}^{-\tilde{j}2 n\pi \hat{\rho}}{e}^{-\tilde{j}m\hat{\theta}}d\hat{\rho}d\hat{\theta}\right]{e}^{\tilde{j}2 n\pi \ln \sigma }{e}^{\tilde{j} m\phi}\\ {}={M}_{nm}^{LPM}{e}^{\tilde{j}2 n\pi \ln \sigma +\tilde{j} m\phi}\end{array}} $$

(28)

and we have

$$ \left|{\tilde{M}}_{nm}^{LPM}\right|=\left|{M}_{nm}^{LPM}{e}^{\tilde{j}2 n\pi \ln \sigma +\tilde{j} m\phi}\right|=\left|{M}_{nm}^{LPM}\right| $$

(29)

Equations (28) and (29) show that scaling an image by a factor σ and rotating it by an angle ϕ shift the image along the ρ-axis and θ-axis, respectively, which changes only the phase of the MEFMs. Consequently, the magnitudes of the MEFMs of the scaled and rotated image are identical to those of the original image. Thus, the magnitudes $ \left|{M}_{nm}^{LPM}\right| $ of the MEFMs can be taken as scaling and rotation invariant features for image recognition. For the discrete computation of the scaling and rotation invariant MEFMs, see the Appendix.
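The magnitude invariance of Eq. (29) can be illustrated with a simple Riemann-sum approximation of Eq. (25). This sketch is not the discretization given in the Appendix: the uniform grid on ρ ∈ [−1, 0) and θ ∈ [0, 2π), the function name `lpm_moments`, and the use of circular shifts to mimic scaling and rotation are assumptions made for the example.

```python
import numpy as np

def lpm_moments(g, n_max, m_max):
    """Riemann-sum approximation of Eq. (25) on a uniform (rho, theta) grid.

    Rows of g sample rho in [-1, 0); columns sample theta in [0, 2*pi).
    Returns an array indexed by n = -n_max..n_max and m = -m_max..m_max.
    """
    n_rho, n_theta = g.shape
    rho = -1.0 + np.arange(n_rho) / n_rho
    theta = 2.0 * np.pi * np.arange(n_theta) / n_theta
    n = np.arange(-n_max, n_max + 1)
    m = np.arange(-m_max, m_max + 1)
    radial = np.exp(-2j * np.pi * np.outer(n, rho))    # conjugate of T_n(rho)
    angular = np.exp(-1j * np.outer(m, theta))         # conjugate of e^{j m theta}
    d_area = (1.0 / n_rho) * (2.0 * np.pi / n_theta)   # d_rho * d_theta
    return (radial @ g @ angular.T) * d_area

rng = np.random.default_rng(1)
g = rng.random((32, 48))                     # stand-in weighted image g(rho, theta)
g_sr = np.roll(g, (3, 7), axis=(0, 1))       # circular shifts along rho and theta

M = lpm_moments(g, n_max=4, m_max=4)
M_sr = lpm_moments(g_sr, n_max=4, m_max=4)
```

The two moment arrays differ only by phase factors, so `np.abs(M)` and `np.abs(M_sr)` agree to rounding error.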

P_nm is a complete orthogonal basis function; hence, a 2D image can be reconstructed by $ {M}_{nm}^{LPM} $ and represented by the following formula:

$$ g\left(\rho, \theta \right)=\sum \limits_{n=-\infty}^{\infty}\sum \limits_{m=-\infty}^{\infty }{M}_{nm}^{LPM}{P}_{nm}\left(\rho, \theta \right)=\sum \limits_{n=-\infty}^{\infty}\sum \limits_{m=-\infty}^{\infty }{M}_{nm}^{LPM}{e}^{\tilde{j}2 n\pi \rho}{e}^{\tilde{j} m\theta} $$

(30)

If we maintain the constraints |n| ≤ n_max and |m| ≤ m_max, an approximate version of the 2D image function, denoted as $ \tilde{g}\left(\rho, \theta \right) $, can be calculated as

$$ \tilde{g}\left(\rho, \theta \right)=\sum \limits_{n=-{n}_{\mathrm{max}}}^{n_{\mathrm{max}}}\sum \limits_{m=-{m}_{\mathrm{max}}}^{m_{\mathrm{max}}}{M}_{nm}^{LPM}{P}_{nm}\left(\rho, \theta \right)=\sum \limits_{n=-{n}_{\mathrm{max}}}^{n_{\mathrm{max}}}\sum \limits_{m=-{m}_{\mathrm{max}}}^{m_{\mathrm{max}}}{M}_{nm}^{LPM}{e}^{\tilde{j}2 n\pi \rho}{e}^{\tilde{j} m\theta} $$

(31)
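A truncated reconstruction of this kind can be sketched as follows. Several choices here are assumptions rather than the paper's procedure: the index ranges are taken symmetric (−n_max to n_max and −m_max to m_max), the moments are normalized by the number of samples so that analysis and synthesis are consistent on the discrete grid, and the helper names are hypothetical.

```python
import numpy as np

def lpm_basis(n_max, m_max, n_rho, n_theta):
    """Sampled radial and angular factors of P_nm on a uniform grid."""
    rho = -1.0 + np.arange(n_rho) / n_rho
    theta = 2.0 * np.pi * np.arange(n_theta) / n_theta
    radial = np.exp(2j * np.pi * np.outer(np.arange(-n_max, n_max + 1), rho))
    angular = np.exp(1j * np.outer(np.arange(-m_max, m_max + 1), theta))
    return radial, angular

def lpm_analyze(g, n_max, m_max):
    """Discrete moments, normalized by the sample count (an assumption)."""
    radial, angular = lpm_basis(n_max, m_max, *g.shape)
    return radial.conj() @ g @ angular.conj().T / g.size

def lpm_reconstruct(moments, n_rho, n_theta):
    """Truncated synthesis: sum of M_nm * P_nm over the retained orders."""
    n_max = (moments.shape[0] - 1) // 2
    m_max = (moments.shape[1] - 1) // 2
    radial, angular = lpm_basis(n_max, m_max, n_rho, n_theta)
    return radial.T @ moments @ angular

# Band-limited test pattern containing only the orders (n, m) = (0, 0), (1, 3), (-1, -3).
n_rho, n_theta = 32, 32
rho = -1.0 + np.arange(n_rho) / n_rho
theta = 2.0 * np.pi * np.arange(n_theta) / n_theta
g = 1.0 + 0.5 * np.cos(2.0 * np.pi * rho[:, None] + 3.0 * theta[None, :])

moments = lpm_analyze(g, n_max=2, m_max=4)
g_rec = lpm_reconstruct(moments, n_rho, n_theta)
```

Because the test pattern contains only orders inside the retained range, `g_rec.real` reproduces `g` to rounding error; for a general image the truncation leaves a residual that shrinks as n_max and m_max grow.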

4.4.2 Projection approach for the image translation invariance

Existing moment-based methods mainly achieve translation invariance by calculating the image geometric moments or central moments [28, 43]. Their major drawbacks are that they are time-consuming and complicate the computation in image recognition (Fig. 14a). The main reason is that, if other orthogonal moments (e.g., ZMs, OFMMs, and BFMs) are used to extract the image features in the recognition process, the geometric moments or central moments must be calculated additionally to achieve translation invariance. To address these shortcomings, this study proposes a new translation invariant algorithm, referred to as the image projection-based method. The basic idea is to achieve translation invariance by separating the target object from the image background. The approach can be summarized as follows (for the main algorithm process, see Fig. 14b):

(1) If the original image is a color image, it is first converted to a gray-scale image; otherwise, this step is skipped.
(2) Otsu’s algorithm [45] is used to determine the global threshold of the gray-scale image, and the gray-scale image is binarized according to this threshold.
(3) A high-quality binary image is obtained via simple pre-processing of the binary image (e.g., denoising and filtering).
(4) The projection of the binary image in the horizontal direction is calculated, the positions of the troughs of the projection are located, and the whole image is segmented at the trough points.
(5) The projection operation in the vertical direction is performed in the same way as in Step 4.
(6) Finally, according to the segmentation positions in the binary image, the target is separated from the background in the original image.

The experiments are performed on cartoon cat color images (Fig. 15) selected from the Columbia University image library database [42]. Figure 15a shows the projection approach applied to the untranslated images, and Fig. 15b shows it applied to the translated images.
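The projection steps listed above can be sketched for the simplest case. The code assumes an 8-bit gray-level input containing a single object that is brighter than the background, skips the pre-processing of Step 3, and treats the outermost zero-valued troughs of each projection as the segmentation positions; the function names are illustrative, and this is not the authors' implementation.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu threshold of an 8-bit image (maximizes between-class variance)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                      # probability of class 0
    mu = np.cumsum(prob * np.arange(256))        # cumulative mean of class 0
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu[-1] * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0                    # undefined where one class is empty
    return int(np.argmax(sigma_b))

def crop_by_projection(gray):
    """Separate the target from the background using row/column projections."""
    binary = gray > otsu_threshold(gray)         # Step 2: binarize
    rows = np.flatnonzero(binary.sum(axis=1))    # Step 4: projection onto rows
    cols = np.flatnonzero(binary.sum(axis=0))    # Step 5: projection onto columns
    if rows.size == 0 or cols.size == 0:
        return gray                              # nothing detected
    return gray[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]   # Step 6: crop

# The same 10 x 15 object placed at two different positions.
img_a = np.full((64, 64), 20, dtype=np.uint8)
img_a[5:15, 7:22] = 200
img_b = np.full((64, 64), 20, dtype=np.uint8)
img_b[30:40, 40:55] = 200

crop_a = crop_by_projection(img_a)
crop_b = crop_by_projection(img_b)
```

Both crops contain the same object regardless of where it was placed, which is the sense in which the projection step removes the dependence on translation.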

4.4.3 Test of classification results for the RST invariance

This subsection presents a simulation study of the RST invariant image classification accuracy of the MEFMs under noise-free and noisy conditions and under smoothing filtering. The accuracy is compared with that of the classic radial orthogonal moments (e.g., ZMs, OFMMs, BFMs, and EFMs) and of recent orthogonal image moments (e.g., fractional order Legendre moments (Fr-LMs) [2017] and SOMs [2016]). Moments of order 10 (lower order) and order 60 (higher order) were adopted, and the magnitudes of the selected moments were used as features for the image classification task. A k-nearest neighbor classifier was used to perform the classification. To evaluate the classification results, the correct classification percentages (CCPs) are defined as follows:

$$ \mathrm{CCPs}=\frac{\mathrm{Number}\ \mathrm{of}\ \mathrm{correctly}\ \mathrm{classified}\ \mathrm{objects}}{\mathrm{The}\ \mathrm{total}\ \mathrm{number}\ \mathrm{of}\ \mathrm{classified}\ \mathrm{objects}}\times 100 $$

(32)
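Equation (32), together with a nearest-neighbor classifier, can be sketched as below. The value of k, the Euclidean distance, and the toy feature vectors are assumptions made for the illustration; the text does not specify them, and in the experiments the features would be the moment magnitudes.

```python
import numpy as np

def knn_predict(train_x, train_y, test_x, k=1):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    train_x = np.asarray(train_x, dtype=float)
    test_x = np.asarray(test_x, dtype=float)
    train_y = np.asarray(train_y)
    dists = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    preds = []
    for idx in nearest:
        labels, counts = np.unique(train_y[idx], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)

def ccp(predicted, actual):
    """Correct classification percentage of Eq. (32)."""
    predicted = np.asarray(predicted)
    actual = np.asarray(actual)
    return 100.0 * float(np.mean(predicted == actual))

# Toy two-class feature vectors (made up for the example).
train_x = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 6.0]]
train_y = [0, 0, 1, 1]
test_x = [[0.2, 0.1], [5.5, 5.4], [0.9, 1.2], [3.2, 3.2]]
test_y = [0, 1, 0, 0]

pred = knn_predict(train_x, train_y, test_x, k=1)
score = ccp(pred, test_y)
```

In this toy case the last test vector lies closer to class 1, so three of the four objects are classified correctly.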

Datasets

The image classification performance of the proposed methods was evaluated on three test datasets: D1, D2, and D3 (Fig. 16). D1 was produced by selecting pictures from Coil-100, a publicly available database from Columbia University (each image is 128 × 128; see [42]) (Fig. 17). D2 consists of 20 butterfly images collected from the internet; some of these images are shown in Fig. 18 and are available in [46]. D3 was taken from the Brodatz texture image database and includes 112 texture images (each image is 640 × 640). Figure 19 shows 35 typical pictures from the D3 dataset.