Multiscale texture retrieval based on low-dimensional and rotation-invariant features of curvelet transform
EURASIP Journal on Image and Video Processing volume 2014, Article number: 22 (2014)
Abstract
Multiscale-based texture retrieval algorithms generally use low-dimensional feature sets. However, their retrieval performance is not as good as that of the state-of-the-art techniques in the literature. The main motivation of this study is to use low-dimensional multiscale features to provide retrieval performance comparable with the state-of-the-art techniques. The proposed features are low-dimensional, robust against rotation, and perform better than the earlier multiresolution-based algorithms and the state-of-the-art techniques with low-dimensional feature sets. They are obtained through the curvelet transform and have considerably small dimensions. Rotation invariance is provided by applying a novel principal orientation alignment based on cross energies of adjacent curvelet blocks. The curvelet block pair with the highest cross energy is marked as the principal orientation, and the rest of the blocks are cycle-shifted around the principal orientation. Two separate rotation-invariant feature vectors are proposed and evaluated in this study. The first feature vector has 84 elements and contains the mean and standard deviation of curvelet blocks at each angle together with a weighting factor based on the spatial support of the curvelet coefficients. The second feature vector has 840 elements and contains the kernel density estimation (KDE) of curvelet blocks at each angle. The first and the second feature vectors are used in the classification of textures based on the nearest neighbor algorithm with Euclidean and Kullback-Leibler distance measures, respectively. The proposed method is evaluated on well-known databases such as Brodatz; TC10, TC12-t184, and TC12-horizon of Outex; UIUCTex; and KTH-TIPS. The best performance is obtained for the kernel density feature vector. The mean and standard deviation feature vector provides similar performance and has less complexity due to its smaller feature dimension.
The results are reported as both precision-recall curves and classification rates and are compared with the existing state-of-the-art texture retrieval techniques. It is shown through several experiments that the proposed rotation-invariant feature vectors outperform earlier multiresolution-based ones and provide performance comparable with the rest of the literature even though they have considerably small dimensions.
1. Introduction
Texture classification and retrieval have been investigated by many researchers. Recognizing textures is essential in content-based image retrieval (CBIR) applications since images are actually constructed of many texture combinations. Unfortunately, textures rarely exist in a fixed orientation and scale. Hence, defining rotation-invariant features is important, and rotation invariance has been an active research topic since the 1980s. In one of the early works [1], rotation-invariant matched filters are used for rotation-invariant pattern recognition. The authors of [2] applied a model-based approach, in which they used statistical features of textures for classification. Using the statistics of spatial features as in [1, 2] may provide good results; however, it may include great interclass variations depending on the recording conditions of textures such as contrast and illumination. Hence, multiscale techniques, which can represent a feature at one or more resolutions with less sensitivity to these recording conditions, have been used since the 1990s. The main idea behind multiscale analysis in image processing is to provide views of the same image at different resolutions to enhance features that are more apparent at a specific resolution. In this way, it is easier to analyze or classify the image based on certain properties and certain scales. Nonstationary structures such as images require their multiscale transforms to be well localized in both time and frequency. However, according to Heisenberg's uncertainty principle, it is impossible to have arbitrarily fine localization in both time and frequency simultaneously. In other words, one cannot find a particular frequency to represent a certain point in time. Hence, frequency localization requires the signal to be observed over a particular time window. It is also important that these localizations can be performed over orthogonal bases or tight frames. Wavelets [3] can address all these requirements.
They are generated from one mother wavelet through translations and scalings. In one of the earliest works [4], the authors used statistics of Gabor wavelets as features over the Brodatz database while performing multiscale analysis for texture retrieval. However, the effects of rotation are not considered in that work. Another drawback is that the wavelet transform captures only singularities around a point, so textures with curvature-like structures may not give good results with the wavelet transform. Other transforms, such as ridgelets [5], which extend wavelets to capture singularities along a line, and curvelets [6, 7], which can capture singularities along a curve, are proposed to overcome such issues. One promising property of curvelets is that they can represent an edge along a curve with very few coefficients. This creates new opportunities in the area of image processing. Curvelets are also used in texture retrieval [8]. However, rotation invariance is not considered in [8]. Rotation invariance in the multiscale framework was first investigated in [9] for Gabor wavelet features. In a similar work, the authors used Gaussianized steerable pyramids to provide rotation-invariant features in [10]. Wavelet-based rotation invariance is introduced in [11] using rotated complex wavelet filters and in [12] using wavelet-based hidden Markov trees. These works show the effectiveness of their methods on the average performance. The details of their work also reveal that textures with curvature-like structures perform worse than other textures. Hence, the curvelet is a good alternative to overcome such issues. However, the authors of [8] realized that the curvelet is actually very orientation-dependent and sensitive to rotation.
Then, they provided rotation-invariant curvelet features in [13, 14] based on comparing the energies of curvelet coefficients and realigning the curvelet blocks by cycle-shifting them with reference to the highest-energy curvelet block. They showed that this scheme gives a great advantage over rotation-variant curvelet features. They also showed that their features provide better results than wavelets and rotation-invariant Gabor filters. However, the authors of [15] indicated that the method of [13, 14] does not work for all images, and they proposed another method based on modeling the curvelet coefficients as generalized Gaussian distributions (GGD) and then providing a distance measure by using the Kullback-Leibler divergence between the statistical parameters of curvelets. It should be noted that they also use the highest-energy curvelet block for circular shifting, except that they use only one reference point instead of a different reference point for each scale. This approach may provide good fits for higher scales of curvelet coefficients; however, lower levels of curvelet coefficients tend not to behave as Gaussian. In this study, we investigate the distributions of curvelet coefficients and use kernel density estimation (KDE), which provides better fits for lower scales as well. Although the complexity increases with density estimation, better results are obtained. There are also some recent and comprehensive works in texture retrieval that try to address both scale invariance and rotation invariance. For instance, in [16], the Harris-Laplace detector [17] is used for salient region detection; then the scale-invariant feature transform (SIFT) [18] is used to provide scale invariance, and the rotation-invariant feature transform (RIFT) [19] is used for rotation invariance.
The results are good; however, feature vector sizes are considerably large: 5,120 (40 × 128) for the SIFT descriptor with earth mover's distance (EMD). In [20], local binary pattern (LBP) variance is used for rotation invariance, in which two principal orientations are found and local binary pattern variances are used for texture retrieval. The feature dimensions of [20] with feature reduction are in the thousands. In [21], both scale and rotation invariance are considered together using LBP, and it provides promising results, again with feature sizes similar to those of other LBP variants.
The main motivation of this study is to provide good retrieval performance with low-dimensional feature sets. The multiresolution methods in the literature have low-dimensional feature sets, but their performance is not in the desired range. In this study, we provide low-dimensional, rotation-invariant multiresolution features with good retrieval performance by using the curvelet transform. First, a novel method is introduced for obtaining rotation-invariant curvelet features. The proposed method is based on the cross-energy principle. Second, the low-dimensional feature set based on the mean and standard deviation of curvelet coefficients, used in the literature [13, 14], is modified to reflect its support region. The size of this feature vector is 84, and the increase in performance due to this modification is also shown. Third, we use the kernel density estimate, a nonparametric density estimation, of curvelet coefficients to estimate the densities and use the symmetric Kullback-Leibler distance as the distance measure. Although this feature set has a higher dimension, 840, it provides better results and still remains in the low-complexity region when compared with the other methods in the literature. It is shown through experiments that the results of the proposed feature sets are better than those of the state-of-the-art techniques for low-dimensional feature sets and comparable for medium-dimensional feature sets. The organization of the paper is as follows. First, multiresolution transforms are introduced in Section 2. Second, Section 3 explains the proposed texture retrieval scheme. Third, the proposed rotation invariance method is provided in Section 4, and classification is explained in Section 5. Fourth, the experimental results are presented in Section 6. Then, Section 7 includes discussions and comparisons with state-of-the-art texture retrieval techniques. Finally, Section 8 includes conclusions.
2. Background
Multiscale transforms are widely used in CBIR and texture retrieval. Hence, in order to better understand the multiscale transforms, especially the curvelet transform, we briefly define the wavelet, ridgelet, and curvelet transforms in this section.
2.1. Wavelets
Given that Ψ_{s,τ}(x, y) is a wavelet function for scale s and translation τ, the wavelet transform of a function f(x, y) and the inverse transform can be obtained by using Equations 1 and 2, respectively.
where Ψ is a two-dimensional mother wavelet. Other wavelets can be generated by scaling the mother wavelet function by s and shifting in the x or y direction by τ_{x} or τ_{y}, respectively, as given in Equation 3. In the wavelet transform, only the transformation framework is outlined and the wavelet functions are left to the choice of the designer. The commonly used Mexican hat wavelet is depicted in Figure 1. The isotropic shape of the wavelet can be seen from the figure. The projection of the function of interest (i.e., an image) onto this isotropic wavelet captures point singularities very well. However, the singularities in images are generally continuous along a line or a curve. In order to provide a better solution for the detection of line-shaped geometries, ridgelets are proposed.
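To make the point about the wavelet's shape concrete, the following is a minimal sketch of a two-dimensional Mexican hat wavelet. The closed form is the standard one up to a normalization constant; the function name, the 1/s factor, and the sampling radius are assumptions chosen for illustration and are not taken from Equations 1 to 3.

```python
import numpy as np

def mexican_hat_2d(x, y, s=1.0, tau=(0.0, 0.0)):
    """Isotropic 2-D Mexican hat wavelet, up to a normalization constant.

    s is the scale and tau the (x, y) translation. The 1/s factor is one
    common normalization choice and is an assumption here.
    """
    u = (np.asarray(x, dtype=float) - tau[0]) / s
    v = (np.asarray(y, dtype=float) - tau[1]) / s
    r2 = u**2 + v**2
    return (2.0 - r2) * np.exp(-r2 / 2.0) / s

# Sample the wavelet on a ring of radius 1.5 around its centre.
angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
ring = mexican_hat_2d(1.5 * np.cos(angles), 1.5 * np.sin(angles))
```

All entries of `ring` coincide because the response depends only on the distance from the centre, which is why such a wavelet responds to point singularities but carries no directional information.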
2.2. Ridgelets
Ridgelets are proposed for effectively describing anisotropic elements such as lines or curves with a small number of coefficients. In order to detect lines or curves, it is necessary to define functions with directional geometry. Such a function is constant along the lines x cos(θ) + y sin(θ) = const. A sample ridgelet is given in Figure 2. The ridgelet is obtained by scaling and translating the mother wavelet function Ψ(x, y). The ridgelet in Equation 4 is defined for angle θ, scale s, and translation τ. Ridgelets can be used to identify the singularities along lines.
Using the ridgelet functions defined in Equation 4, the ridgelet transform and inverse ridgelet transform can be performed using Equations 5 and 6, respectively.
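The directional geometry of a ridgelet can be illustrated with a short sketch. It assumes the commonly used form in which a one-dimensional wavelet profile is evaluated at (x cos(θ) + y sin(θ) − τ)/s and scaled by 1/√s; the profile and all names are illustrative choices and may differ in detail from Equation 4.

```python
import numpy as np

def mexican_hat_1d(t):
    """1-D Mexican hat profile, up to a normalization constant."""
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)

def ridgelet(x, y, s, tau, theta, profile=mexican_hat_1d):
    """Ridgelet value at (x, y) for scale s, translation tau, angle theta."""
    t = (x * np.cos(theta) + y * np.sin(theta) - tau) / s
    return profile(t) / np.sqrt(s)

theta = np.pi / 6
# Points on the line x cos(theta) + y sin(theta) = 0.7, parametrized by d.
d = np.linspace(-3.0, 3.0, 7)
x = 0.7 * np.cos(theta) - d * np.sin(theta)
y = 0.7 * np.sin(theta) + d * np.cos(theta)
values = ridgelet(x, y, s=2.0, tau=0.1, theta=theta)
```

Every entry of `values` is identical, since the ridgelet varies only across the ridge direction and stays constant along it.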
2.3. Curvelets
Curvelet transformation enables the detection of singularities along a curve, whereas ridgelets are not sufficient for identifying curves due to their line-directional geometry. Basically, a curvelet function is also a wavelet function that is rotated, scaled, and translated for different angles, scales, and shifts, respectively. A curvelet function can also be defined as a ridgelet function with various rotation angles. Figure 3 shows a curvelet function for a specific scale, rotation, and translation. If the translations on Z^{2} are defined by k = (k_{1}, k_{2}), rotations are given by θ_{ℓ} = 2π · 2^{-s} · ℓ, where ℓ = 0, 1, …, 2^{s} such that 0 ≤ θ_{ℓ} < 2π, the parabolic scaling matrix D_{s} is given by Equation 7, and the rotation operator is given by Equation 8, then the curvelet function is defined by Equation 9.
where Ψ_{s} is a mother wavelet function. Based on the above definitions, the curvelet coefficient is given by Equation 10.
A graphical explanation of the curvelet is given in Figure 4. Here, the image is represented by a red curve over which the curvelet transform is calculated, and the blue line in black ovals represents the cross-sectional magnitude of the curvelet operator. The dot product of the line, originally the image, and the curvelet function becomes maximum when the image and the curvelet are aligned, in other words, when they have the maximum number of common points (pixels). At the other extreme, the curvelet coefficients become zero if the two do not cross each other for any rotational and/or translational change. Hence, it is possible to follow the orientation and location of the image, the red line, by just determining the maximum of the curvelet coefficients. Due to this property, it is possible to use curvelets for edge detection, object detection, noise removal, texture identification, etc. Since orientation is an important feature of curvelet transformation, curvelet coefficients may vary significantly with rotation. Hence, the direct use of curvelet coefficients as image features introduces rotation dependency, and overall texture classification performance may deteriorate if a rotated replica of a texture exists in the database. So, it is necessary to utilize curvelet coefficients in a rotation-invariant manner to overcome this downside.
3. Proposed texture retrieval scheme
The proposed texture retrieval scheme is depicted in Figure 5. In the proposed scheme, first, the query and training images are selected from the image database. Second, the curvelet transform is applied to both sets of images. Third, the principal orientation (PO) of each image is detected by analyzing the cross energies of the curvelet coefficients. Then, the extracted features are realigned by cycle-shifting all the features around the PO. Finally, PO-aligned features are compared for classification. Each step of the algorithm is explained in the following subsections.
3.1. Feature extraction
A broad range of feature sets is used in the literature, such as entropy, energy, first- and second-order statistics, and many more. In this study, we propose and evaluate two different feature vectors. The first one is called the mean and standard deviation feature vector, F_{μσ}, and the second one is called the kernel density feature vector, F_{KDE}. F_{μσ} includes the mean and standard deviation of curvelet coefficients, which belong to different levels and angles, scaled with a support coefficient. Similar features were previously used in [13, 14] without a scaling factor. The first- and second-order statistics describe a distribution fully only if the distribution is Gaussian. However, as indicated in earlier works [15], the Gaussian probability density function (PDF) may not be a perfect fit for curvelet data. Moreover, the curvelet coefficients at lower levels deviate from the Gaussian distribution, as can be seen from Figure 6, which presents the second-level curvelet coefficients of an image. Hence, the kernel density feature, F_{KDE}, which estimates the PDF of curvelet coefficients using KDE, is also proposed. Better classification results are expected when the PDF of curvelet coefficients is used since it represents the full statistics. An alignment step is needed in both approaches to provide rotation invariance. Before going into the details of the alignment step, the feature vectors of this study are defined first.
3.2. Mean-standard deviation feature vector F_{μσ}
A feature vector which includes the first- and second-order statistics of curvelet coefficients for five levels is given by Equation 11.
where μ_{s,ℓ} and σ_{s,ℓ} are the mean and standard deviation of curvelet coefficients at scale s and angle ℓ, respectively. It should be noted that it is enough to consider only the first half plane of the curvelet coefficients since the curvelet transform is even symmetric around π. This feature vector is given for 5 scales and includes 84 elements. The feature vector of Equation 11 is used in [13] as well. Since the feature vector includes robust features such as the first- and second-order statistics, it can be used for comparison purposes. As can be seen from Figure 7, the number of wedges doubles every other scale going from the lower to the higher frequencies. This means that the spatial support is halved every other scale as well. In other words, the curvelet transform is applied over a narrower region going from the lower to the higher scales. A larger spatial support region means that dissimilarities are more likely. Thus, the statistics obtained from such dissimilarities should be penalized. A similar approach is also used in [22], where the authors use spatially obtained features for the classification of various scene categories. In order to reflect the size of the spatial support, we apply a weighting factor, α_{s}, given by Equation 12 and obtain the scaled mean-standard deviation feature vector, F_{μσ}, given by Equation 13.
where N is the total number of scales and s is the scale. The ‘ceil’ function rounds the number up to the nearest integer. If there are five scales, then the corresponding feature vector is given by the following:
The images we use are either 128 × 128 or converted to 128 × 128 in the preprocessing stage, and the feature vector used in this study has five scales. Considering 8 angles at the 2nd scale, 16 angles at the 3rd and 4th scales, and 1 block each for the 1st and 5th scales, the size of the feature vector is (1 + 8 + 16 + 16 + 1) × 2 = 84.
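The count of 84 elements can be reproduced with a short sketch. It assumes a hypothetical data layout (a list of scales, each holding one 2-D array per retained angle), takes statistics of coefficient magnitudes, and interleaves mean and standard deviation; none of these choices is confirmed by Equation 11. The optional `weights` argument only stands in for the per-scale factor α_{s}, whose exact form is not reproduced here.

```python
import numpy as np

def mean_std_features(blocks_per_scale, weights=None):
    """Concatenate (mean, std) of every curvelet block, scale by scale.

    blocks_per_scale: list over scales; each entry is a list of 2-D arrays,
        one per retained angle (hypothetical layout, not tied to a library).
    weights: optional per-scale factors standing in for alpha_s.
    """
    if weights is None:
        weights = [1.0] * len(blocks_per_scale)
    feats = []
    for w, blocks in zip(weights, blocks_per_scale):
        for block in blocks:
            mag = np.abs(block)  # assumption: statistics of magnitudes
            feats.extend([w * mag.mean(), w * mag.std()])
    return np.asarray(feats)

# Random stand-in blocks with the angle counts quoted in the text.
rng = np.random.default_rng(0)
angles_per_scale = [1, 8, 16, 16, 1]
blocks = [[rng.standard_normal((8, 8)) for _ in range(n)]
          for n in angles_per_scale]
f_mu_sigma = mean_std_features(blocks)
```

With 42 blocks and two statistics per block, `f_mu_sigma` has 84 entries, matching the dimension stated above.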
3.3. Kernel density feature vector F_{KDE}
The probability density of curvelet coefficients is very close to the normal distribution. However, earlier works have shown that the coefficients may not be modeled exactly by a normal PDF. It is shown in [15] that modeling curvelet coefficients by GGD provides a better fit than the normal PDF. In this study, we use a nonparametric approach for estimating the density of curvelet coefficients because the Gaussianity assumption gets even weaker for lower levels. One may notice the non-Gaussian behavior by observing Figure 6. Nonparametric estimation is widely used when parametric modeling of the distribution becomes infeasible. We obtain the proposed kernel density feature vector, F_{KDE}, through KDE. It is given by Equation 15.
where each element of F_{KDE}, which represents the density of curvelet coefficients at a particular scale and angle, is estimated through KDE. The feature vector of Equation 15 is given for five scales and can be extended to include a higher number of scales. In KDE, first, a kernel function is defined [23]. Then, using n data points (X_{1}, X_{2}, …, X_{n}) of a random variable x, the kernel estimator for the PDF p(x) is given by Equation 16:
where K is the kernel function and h is the smoothing parameter called the bandwidth. The kernel function used in this study is the normal kernel with zero mean and unit variance. A kernel is placed on each data point, and the sum is normalized over the data to obtain the kernel estimate. A more in-depth analysis of KDE is given in [23]. The histogram of the curvelet coefficients, the corresponding Gaussian fit, and the KDE are shown in Figure 6. As can be seen from the figure, KDE provides a much better fit than the Gaussian. The non-Gaussian structure of curvelet coefficients can be observed for the second-level coefficients of a sample image given in Figure 6. We have evaluated the kernel density at 20 bins, resulting in a feature vector dimension of 840 (42 × 20).
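The estimator described above can be sketched with the standard kernel density form, p(x) = (1/(n h)) Σ_i K((x − X_i)/h), using a standard normal kernel. The bandwidth rule (Silverman's rule of thumb), the evaluation grid, and the Laplace-distributed stand-in data are assumptions made for illustration; the text does not specify them.

```python
import numpy as np

def gaussian_kde_on_grid(samples, grid, h=None):
    """Kernel density estimate p(x) = 1/(n h) * sum_i K((x - X_i) / h)."""
    samples = np.asarray(samples, dtype=float).ravel()
    n = samples.size
    if h is None:
        # Assumption: Silverman's rule of thumb; the text does not state h.
        h = 1.06 * samples.std(ddof=1) * n ** (-1.0 / 5.0)
    u = (grid[:, None] - samples[None, :]) / h
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)  # standard normal
    return kernel.sum(axis=1) / (n * h)

rng = np.random.default_rng(1)
coeffs = rng.laplace(size=2000)        # heavy-tailed stand-in for one block
grid = np.linspace(-6.0, 6.0, 20)      # 20 evaluation points per block
density = gaussian_kde_on_grid(coeffs, grid)
n_blocks = 1 + 8 + 16 + 16 + 1         # 42 blocks over five scales
feature_dim = n_blocks * grid.size
```

Evaluating each of the 42 blocks at 20 points gives `feature_dim` equal to 840, the dimension of F_{KDE}.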
4. Rotation invariance
4.1. Effect of rotation on curvelet transform
Following the curvelet transformation, curvelet coefficients for different orientations and specific scales are obtained. Hence, the curvelet coefficients reflect the effect of rotation. Let us consider a particular scale s with rotation angles represented by {θ_{1}, θ_{2}, …, θ_{n}}. For each rotation angle, there exists a curvelet coefficient matrix. The elements of this matrix are obtained following a translation in the x and y directions. The curvelet transforms of two different images and their rotated versions are given in Figure 8. These images are 128 × 128 in size and have 5 scales in the curvelet domain. Four of those scales are shown in Figure 8. The fifth scale is the highest resolution and is not divided into angles. The innermost box and the outermost box represent the lowest and highest resolutions, respectively. We can see that the rotation is captured in all scales. It is difficult to notice the rotation by just looking at the curvelet domain image. However, high-energy areas are clearly noticeable. The authors of [13, 14] realized this feature and proposed to synchronize the features by aligning the highest-energy curvelet coefficients while cycle-shifting the others so as not to change the relative order among them. Since the curvelet coefficients are arranged in a cyclic fashion, applying this idea gave promising results. However, the obvious energy compaction is not valid for all images, as the authors of [15] pointed out. It is also possible that the high-energy area may appear at some other location in the rotated image after the curvelet transform is applied, especially in images where a nice uniform texture does not exist.
This nonuniformity can be observed in Figure 8c,d. In order to overcome this issue, we propose first to find the area of the image that is most robust against rotation based on the curvelet transform and mark that point as the principal orientation, and then to perform an alignment by cycle-shifting the feature vector with reference to the principal orientation. In order to find the least affected rotation angle, we perform a cross-correlation check for two adjacent curvelet coefficient blocks at each scale.
4.2. Principal orientation detection
In order to minimize the effect of rotation in the texture, it is necessary to find a reference point, namely, the principal orientation, so that all feature vectors can be synchronized by reordering the features. The rotation dependence is expected to be eliminated after the synchronization. The authors of [13, 14] suggest a synchronization routine by means of the curvelet block with the maximum energy. We propose to use the cross energy of adjacent curvelet blocks for the principal orientation detection, and the procedure is explained in the following subsection.
4.3. Cross-correlation and cross energy of curvelet coefficients at adjacent angles
The cross-correlation of two adjacent curvelet blocks for angles ℓ and ℓ + 1 is given as follows:
The cross-correlation function actually reflects the cross energies for different lags. In obtaining the latter curvelet coefficient on the right-hand side of Equation 17, only a rotation is applied to the curvelet operator while the image stands still. Also, as can be seen from Equation 9, this rotation operator is not supposed to cause a lag in the latter coefficient. Hence, the maximum value of the cross-correlation function is expected at the 0th lag, that is, R_{s,ℓ}(0, 0). As a result, Equation 17 can be used to detect the highest cross-energy blocks. Another view can be expressed as follows: by analyzing the adjacent blocks of the curvelet transform in terms of their cross-correlation quantities, one may find the orientation for each scale which is the least affected by rotation. In other words, a high correlation between two adjacent blocks means that the directional change has little effect on the curvelet coefficients for the two orientations at hand. In short, if the curvelet coefficients of two adjacent blocks of an image at a specific orientation give the highest values, they will also be the ones with the highest correlation values for the rotated version of the original texture. The proposed method is structured based on this approach. Since rotation of the curvelet operator and rotation of the image have the same effect, the observed angle between the curvelet operator and the image for the highest correlation value remains fixed. Based on this principle, we determine the fixed angle by searching for the highest cross-correlation, take the first of the highest cross-energy (correlated) blocks as the principal block (orientation), and then cycle-shift all the coefficients in reference to the principal orientation. Hence, this operation provides an alignment based on the highest cross-energy principle.
The cross-correlation functions are obtained for all scales except the coarsest and the finest, since there is only one coefficient matrix for each of them. The curvelet coefficients are then aligned with reference to the highest 0th-lag value of the cross-correlations in each scale. A dimension mismatch is generally encountered between the coefficient matrices of two adjacent orientations. If there are not enough coefficients to match the larger coefficient block, then the smaller coefficient block is padded with zero coefficients. This zero-filling solves the dimension mismatch problem and does not affect the cross energy.
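The alignment procedure described in this subsection can be sketched as follows for a single scale. The sketch assumes real-valued (or magnitude) blocks, pads at the bottom and right, and treats the last angle as adjacent to the first; these details and all function names are illustrative assumptions rather than the exact procedure of Equation 17.

```python
import numpy as np

def zero_pad_to(block, shape):
    """Zero-pad a 2-D block at the bottom/right up to the given shape."""
    out = np.zeros(shape, dtype=block.dtype)
    out[: block.shape[0], : block.shape[1]] = block
    return out

def cross_energy(a, b):
    """Zero-lag cross-correlation R(0, 0) of two blocks after zero-padding."""
    shape = (max(a.shape[0], b.shape[0]), max(a.shape[1], b.shape[1]))
    return float(np.sum(zero_pad_to(a, shape) * zero_pad_to(b, shape)))

def principal_orientation(blocks):
    """Index of the first block of the adjacent pair with highest cross energy."""
    n = len(blocks)
    # Assumption: the last angle is treated as adjacent to the first (cyclic).
    energies = [cross_energy(blocks[i], blocks[(i + 1) % n]) for i in range(n)]
    return int(np.argmax(energies))

def align_to_po(per_angle_features, po):
    """Cycle-shift per-angle features so that index `po` moves to position 0."""
    return np.roll(per_angle_features, -po, axis=0)

rng = np.random.default_rng(2)
gains = np.array([1.0, 1.2, 5.0, 6.0, 1.1, 0.9, 1.0, 1.3])
blocks = [g * np.abs(rng.standard_normal((6, 6))) for g in gains]
rotated = blocks[3:] + blocks[:3]   # idealized rotation: a pure cyclic shift
feats = np.array([b.mean() for b in blocks])
feats_rot = np.array([b.mean() for b in rotated])
aligned = align_to_po(feats, principal_orientation(blocks))
aligned_rot = align_to_po(feats_rot, principal_orientation(rotated))
```

In this idealized case the two aligned vectors coincide. For real textures, rotation changes the coefficients themselves, so the aligned features are only expected to be similar.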
4.4. Closer look at principal orientation alignment based on cross energy
In this subsection, we outline some examples to better illustrate the contribution of this study. In the first example, we consider an image taken from the Brodatz database as shown in Figure 9. The corresponding curvelet coefficients of this image and its 30° and 60° rotated versions are given in Figure 10. The yellow boxes on each scale show the principal orientations obtained by the proposed algorithm. Similarly, Figure 11 shows the same curvelet transforms with yellow boxes representing the reference points based on the algorithm of [13]. A close look reveals that both algorithms have common reference points. But it can also be observed that the proposed algorithm captures the boxes where the orientation at each scale is the same, whereas the algorithm of [13] may not detect the correct orientation at scale 2 for this particular example. This is due to the fact that the texture of this figure does not have a uniform pattern, and rotation may cause the curvelet transform to capture the most dominant edges for that orientation. Since the proposed algorithm focuses on the amount of change in the rotation, it manages to capture the correct orientation at each scale.
In the second example, we consider the image ‘000480.ras’ of the Outex TC12_t184 database and its rotated image ‘000649.ras’. The images and their corresponding kernel density estimations are given in Figure 12. As can be observed from the figure, the coefficients of the right column are cycle-shifted around the highest cross-energy coefficient block, second from the top and highlighted by a bold frame. As a result, this coefficient block, with level = 2 and angular parameter = 2, is reordered (cycle-shifted) so that it gets an angular parameter value of 1 (the one at the top of the middle column), and all the others move into the position of the prior angular parameter in a cyclic manner.
It should also be noted that the curvelet coefficients of unrotated and rotated images show some differences even after principal orientation alignment. This can also be observed by comparing the first and second columns of Figure 12. This is because the curvelet coefficients of these images may be similar, but they are unlikely to be identical. Hence, the purpose of the alignment is to make the curvelet coefficients of two images as comparable as possible.
4.5. PO-aligned feature vectors
The mean-standard deviation, F_{μσ}, and kernel density, F_{KDE}, feature vectors are aligned according to the principal orientation, following the principal orientation detection. The aligned feature vectors are cycle-shifted versions of the initial ones. The PO-aligned mean-standard deviation feature vector and kernel density feature vector are denoted as F_{μσ}^{PO} and F_{KDE}^{PO}, respectively. The rotation-invariant mean-standard deviation feature vector without scaling, F_{μσ}^{′,PO}, is also used in our simulations for comparison purposes. The proposed PO-aligned feature vectors are used in the classification process in this study.
5. Classification
The classification is performed with the nearest neighbor (NN) classifier. In NN, the query image is compared against the training images of all the classes, and the image is assigned to the class at minimum distance. A separate distance measure is used for each proposed feature vector: the Euclidean distance is used with the mean and standard deviation feature vector, and the Kullback-Leibler distance measure is used with the kernel density feature vector.
5.1. Distance measures
Euclidean distance
The PO-aligned feature vectors of training and query images are compared to find the best match based on the Euclidean distance measure. The Euclidean distance, d_{ij}^{euc}, between the ith query image and the jth database image is calculated by Equation 18.
where F_{i,μσ}^{PO} and F_{j,μσ}^{PO} are the feature vectors of the query image (the ith image of the database) and the training image (the jth image of the database), respectively.
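Nearest neighbor classification with the Euclidean distance can be sketched as below. The toy two-dimensional features and the class names are hypothetical; in the study, the vectors would be the 84-element PO-aligned features.

```python
import numpy as np

def nearest_neighbor_label(query, train_feats, train_labels):
    """Return the label of the training vector closest to `query` (Euclidean)."""
    dists = np.sqrt(np.sum((train_feats - query) ** 2, axis=1))
    return train_labels[int(np.argmin(dists))]

train_feats = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
train_labels = ["bark", "bark", "brick"]   # hypothetical class names
predicted = nearest_neighbor_label(np.array([3.5, 3.0]),
                                   train_feats, train_labels)
```

Here the query lies closest to the third training vector, so it is assigned to that vector's class.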
Symmetric Kullback-Leibler distance
The Kullback-Leibler divergence is a common method to measure the distance between two PDFs and is given by Equation 19:
Since d_{pp′}^{KL} is not necessarily equal to d_{p′p}^{KL}, it is more appropriate to use the symmetric Kullback-Leibler (SKL) distance, given by Equation 20:
The SKL distance between the kernel density feature vector of the query image, F_{i,KDE}^{PO}, and that of a training image, F_{j,KDE}^{PO}, is then given by Equation 21, in which n is the dimension of the feature vector.
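The asymmetry of the Kullback-Leibler divergence and its symmetrized form can be illustrated with a discrete sketch. It assumes the symmetric distance is the sum of the two directed divergences (some authors use half of this sum), renormalizes the inputs, and adds a small `eps` to avoid taking the logarithm of zero; these choices are assumptions and may differ from Equations 19 to 21.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Discrete Kullback-Leibler divergence sum(p * log(p / q))."""
    p = np.asarray(p, dtype=float) + eps   # eps is an assumption: avoids log(0)
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def symmetric_kl(p, q):
    """Symmetric KL distance: KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)

p = np.array([0.1, 0.4, 0.5])
q = np.array([0.3, 0.3, 0.4])
d_pq, d_qp = kl_divergence(p, q), kl_divergence(q, p)
skl = symmetric_kl(p, q)
```

The two directed values `d_pq` and `d_qp` differ, whereas `symmetric_kl` returns the same value for either argument order, which is the property needed for a distance measure in nearest neighbor matching.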
6. Experimental results
The proposed algorithm is evaluated over various databases: Brodatz [24], Outex TC10 [25], Outex TC12-horizon [25], Outex TC12-t184 [25], KTH-TIPS [26], and UIUCTex [19]. For each database, 100 simulations are run, and the average precision-recall and classification performances are reported for all the simulation setups. The setup for each database is as follows:

(a) Training images: These are selected randomly from each class of each database. The number of training images is varied from 10 to 70 in increments of 10. The results are reported separately for each number of training images.

(b) Query images: Training images are excluded from the database, and the remaining images are used as queries. The average classification and precision-recall results are reported.

(c) Brodatz database: The database is proposed in [24] and includes 112 classes. In order to create a large enough database with translations and rotations, non-rotated test images are first created by dividing each original 512 × 512 image into 16 non-overlapping 128 × 128 regions; then, 12 rotated test images are obtained from each region by rotating it in multiples of 30°. The 30° step is chosen to obtain results comparable with [24], which uses the same database with the same setup. A database of 21,504 images (112 × 16 × 12) is constructed in this way, so that each class includes 192 images (16 × 12).

(d) Outex TC10 database: The database is proposed in [25] and includes 24 classes, each with 180 images. The images are recorded under incandescent (inca) illumination. Each class consists of 20 non-overlapping portions of the same texture at 9 different orientations (0°, 5°, 10°, 15°, 30°, 45°, 60°, 75°, and 90°). The database includes a total of 4,320 images (24 × 20 × 9).

(e) Outex TC12-horizon database: The database is proposed in [25] and includes 24 classes, each with 180 images. The same setup as for the Outex TC10 database is used, except that the images are recorded under horizon (horizon sunlight) illumination.

(f) Outex TC12-t184 database: The database is proposed in [25] and includes 24 classes, each with 180 images. The same setup as for the Outex TC10 database is used, except that the images are recorded under t184 (fluorescent 184) illumination.

(g) KTH-TIPS database: The database is proposed in [26] and includes 10 classes, each with 81 images. The images are recorded under varying illumination, pose, and scale. The database includes a total of 810 images (10 × 81).

(h) UIUCTex database: The database is proposed in [19] and includes 25 classes, each with 40 images. The images include significant scale and viewpoint variations as well as rotations. The database includes a total of 1,000 images (25 × 40).
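The construction of the Brodatz test set in item (c) can be checked with a short sketch. It only enumerates tile positions and rotation angles and does not touch image data; taking the 12 rotations to be 0°, 30°, ..., 330° is an assumption, and the function names are hypothetical.

```python
def tile_origins(image_size=512, tile_size=128):
    """Top-left (row, col) corner of each non-overlapping tile."""
    steps = range(0, image_size - tile_size + 1, tile_size)
    return [(r, c) for r in steps for c in steps]


def rotation_angles(step_deg=30):
    """Rotation angles in degrees covering one full turn (assumed 0..330)."""
    return list(range(0, 360, step_deg))


tiles = tile_origins()
angles = rotation_angles()
images_per_class = len(tiles) * len(angles)   # 16 regions x 12 rotations
total_images = 112 * images_per_class         # 112 Brodatz classes
```

With these defaults the counts agree with the figures quoted in item (c): 192 images per class and 21,504 images in total.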
The experimental results are reported under two main performance measurement categories: precision-recall curves and classification accuracies. The studies in the literature make use of both performance measures. In order to make our work easily comparable with future works as well as with the literature, we provide our results under both categories. In order to isolate the effect of principal orientation alignment and the performance of the two feature vectors of this study, the results of the proposed methods are generally compared with only one reference from the literature. The results of [13] are used for this general comparison since the authors of [13] also use curvelet features. We make a broader comparison with the literature in the discussion section.
6.1. Precision-recall curves
Precision is the ratio of the number of relevant retrieved images to the number of all retrieved images, whereas recall is the ratio of the number of relevant retrieved images to the total number of relevant images in the database. The precision-recall curves for all the databases are provided in Figure 13. Figure 13a compares, on the Brodatz database, the performances of the proposed rotation-invariant \mathbf{F}_{\mathrm{KDE}}^{\mathrm{PO}} and \mathbf{F}_{\mu\sigma}^{\mathrm{PO}} features with the feature \mathbf{F}_{\mu\sigma}^{\prime,\mathrm{PO}}, where scaling is not used, the algorithm of [13], represented by F^{[13]}, wavelet features, and rotation-variant curvelet features. Since the algorithm of [13] has already been shown in detail in the literature to outperform the Gabor and ridgelet transforms, these are not included in this figure. As can be seen from this figure, the performance of \mathbf{F}_{\mathrm{KDE}}^{\mathrm{PO}} is better than that of the other methods. It should be kept in mind that using \mathbf{F}_{\mathrm{KDE}}^{\mathrm{PO}} instead of \mathbf{F}_{\mu\sigma}^{\mathrm{PO}} increases the complexity due to the increased feature size. Hence, the better performance over \mathbf{F}_{\mu\sigma}^{\mathrm{PO}} comes at the expense of complexity. The results for Outex TC10, TC12-t184, and TC12-horizon are given in Figure 13b,c,d, respectively. It can be observed from these figures that the proposed algorithm with the feature vector \mathbf{F}_{\mathrm{KDE}}^{\mathrm{PO}} provides the best results, followed by the feature vector \mathbf{F}_{\mu\sigma}^{\mathrm{PO}}.
Although the same performance order is preserved for the results of UIUCTex and KTH-TIPS, given in Figure 13e,f, respectively, a lower precision-recall performance is observed compared to that of the Outex database. The reason is that both the UIUCTex and KTH-TIPS databases include scale and viewpoint variations, and the proposed algorithm does not perform as well under viewpoint and scale variations as it does under rotation variations.
We now provide a more in-depth analysis based on the precision-recall curve for a particular image taken from the Brodatz database. Figure 14 shows the precision-recall curve of the D1 query image of the Brodatz database given in Figure 9. As can be seen from Figure 14, the proposed feature vector \mathbf{F}_{\mu\sigma}^{\mathrm{PO}} with rotation invariance provides a much better precision-recall curve on this particular image. Figure 15 includes intermediate results and shows the irrelevant classes that are mixed in with the class of the query image, together with the points at which they enter the precision-recall curve. Figure 15 shows that the first irrelevant image comes at the 26% recall and 100% precision point. It means that about 50 relevant images (192 × 0.26 ≈ 50) are retrieved before an irrelevant image is retrieved. This break point can also be seen on the blue line in Figure 14. Similarly, Figure 16 provides intermediate results for the algorithm of [13]. The first irrelevant image is retrieved at 7% recall and 100% precision. It means that about 13 relevant images (192 × 0.07 ≈ 13) are retrieved before an irrelevant one is retrieved.
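The relation between a ranked retrieval list and the recall and precision values quoted above can be sketched as follows. The ranking is a hypothetical one built to match the first break point described for Figure 15 (50 relevant images, then one irrelevant image, with 192 relevant images in the class); it is not actual retrieval output from the paper, and the function name is an assumption.

```python
def precision_recall_curve(ranked_relevance, total_relevant):
    """Return (recall, precision) after each retrieved image.

    ranked_relevance is a list of booleans, True where the image
    retrieved at that rank belongs to the query's class.
    """
    points = []
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
        points.append((hits / total_relevant, hits / rank))
    return points


# Hypothetical ranking: 50 relevant images followed by the first irrelevant one.
ranking = [True] * 50 + [False]
curve = precision_recall_curve(ranking, total_relevant=192)
recall_before_miss, precision_before_miss = curve[49]
recall_after_miss, precision_after_miss = curve[50]
```

Here the recall just before the first miss is 50/192, which rounds to 26%, with precision still at 100%; precision drops below 100% only once the irrelevant image is retrieved.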
6.2. Classification rates
In this section, classification rates are provided. If the query image is assigned to its own class, the classification is marked as true; otherwise, it is marked as false. The percentage of true classifications gives the classification rate. The training images are selected randomly from each class, and the remaining images are then used as queries to obtain the classification rate. This process is repeated 100 times, and the average results are reported in Table 1, which includes the classification rates of the proposed feature vectors \mathbf{F}_{\mathrm{KDE}}^{\mathrm{PO}} and \mathbf{F}_{\mu\sigma}^{\mathrm{PO}}, the non-scaled feature vector \mathbf{F}_{\mu\sigma}^{\prime,\mathrm{PO}}, and F^{[13]} of [13]. As can be seen from the table, \mathbf{F}_{\mathrm{KDE}}^{\mathrm{PO}} has the best performance, followed by \mathbf{F}_{\mu\sigma}^{\mathrm{PO}}. The Brodatz database provides the highest classification performance since it contains only rotated replicas of the cropped images. The Outex databases provide the next best results, in the order TC12-horizon, TC10, and TC12-t184, with slight differences among them. The overall performance for the Outex databases is good since they also include only rotations and do not have scale or viewpoint variations. The UIUCTex and KTH-TIPS databases give the worst results of all. This is due to the fact that both databases include scale and pose variations, as can be seen from Figure 17. Nevertheless, the proposed feature vectors perform well for these databases too, as can be seen from Table 1.
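The evaluation protocol described above (random training selection per class, remaining images as queries, average over repeated trials) can be sketched as follows. This is a simplified illustration: it uses the Euclidean nearest neighbor rule via `math.dist`, a fixed random seed for repeatability, and a toy data set; the function name and all data values are hypothetical.

```python
import math
import random


def classification_rate(dataset, n_train, n_trials=100, seed=0):
    """Average NN classification rate (in percent) over random splits.

    dataset maps each class label to a list of feature vectors;
    n_train must be smaller than the number of vectors in every class.
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(n_trials):
        training, queries = [], []
        for label, vectors in dataset.items():
            shuffled = vectors[:]
            rng.shuffle(shuffled)
            training += [(label, v) for v in shuffled[:n_train]]
            queries += [(label, v) for v in shuffled[n_train:]]
        # A query counts as correct when its nearest training vector
        # carries the query's own class label.
        correct = sum(
            1
            for label, q in queries
            if min(training, key=lambda t: math.dist(q, t[1]))[0] == label
        )
        rates.append(correct / len(queries))
    return 100.0 * sum(rates) / len(rates)


# Hypothetical toy data: two well-separated classes.
toy = {
    "A": [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]],
    "B": [[5.0, 5.0], [5.1, 5.0], [5.0, 5.1]],
}
rate = classification_rate(toy, n_train=1, n_trials=10)
```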
7. Discussion
In this section, a broader comparison with the most recent and successful works in the literature is provided. The proposed rotation-invariant texture retrieval algorithm is evaluated by using the proposed PO-aligned feature vectors \mathbf{F}_{\mathrm{KDE}}^{\mathrm{PO}} and \mathbf{F}_{\mu\sigma}^{\mathrm{PO}}, and it is observed that they perform well even though their feature dimensions are considerably lower than those in the literature. In [15], following an energy-based cycle shift based on only one level, GGD estimations of curvelet coefficients are used with the Kullback-Leibler distance (KLD) measure. Although the size of the feature dimensions is not elaborated in [15], we presume that it is close to the size of \mathbf{F}_{\mathrm{KDE}}^{\mathrm{PO}}, which is 840 in our case. As can be seen from Table 2, both proposed methods outperform KLD on the KTH-TIPS database. The precision-recall curve is also provided in Figure 18 for comparison on the Brodatz database. The superior performance of the proposed methods over KLD can be observed from this figure.
In [20], LBP variance features provide promising results. We compare our results with the results of [20] in Table 3. The classification results for \mathbf{F}_{\mathrm{KDE}}^{\mathrm{PO}} and \mathbf{F}_{\mu\sigma}^{\mathrm{PO}} reflect in-class training. That is, the training images and the query images belong to the same class. However, the authors of [20] use in-class training for TC10, while they use out-of-class training for TC12-horizon and TC12-t184, for which they choose 20 images from the TC10 database and use them for the queries of the other databases. Hence, we have run the simulations for these settings as well. '\mathbf{F}_{\mathrm{KDE}}^{\mathrm{PO}} out of class' and '\mathbf{F}_{\mu\sigma}^{\mathrm{PO}} out of class' reflect the results of these simulations. As can be seen from Table 3, variants of LBP with low feature sizes perform worse than the proposed algorithm, but for high feature sizes, they outperform our algorithm, especially in out-of-class classifications. The main reason for this outcome is that our algorithm is computationally efficient with its small feature size, whereas the good results of LBP come at the expense of increased computational complexity.
The authors of [16] use Laplace and Harris detectors for salient region detection. SIFT is also used for scale invariance, and RIFT is used for rotation invariance. Although the results are good, the feature vector dimensions are considerably large: 5,120 (40 × 128) for the SIFT descriptor with EMD. It should also be noted that support vector machine (SVM) classification, a strong classifier requiring learning effort, is used in [16]. Since we are not using SVM, we cannot tell exactly how much of the better performance is due to SVM. It is worth noting that using the rotation-invariant technique RIFT and decreasing the feature size in their work also cause a decrease in performance. This effect can be seen from HSR + LSR of RIFT [16] in Table 4, where our proposed algorithm has better performance on the KTH-TIPS and Brodatz databases.
Table 5 is included for easy comparison with the literature in terms of computational load and performance. The proposed algorithms, especially the mean and standard deviation feature vector, have small feature dimensions. This is important since the cost of the distance calculation at each comparison is proportional to the feature size. The computational complexities based on feature sizes are labeled as low, medium, and high in Table 5, and the table is arranged in order of increasing complexity. That is, the top rows have lower complexity and the bottom rows have higher complexity. Since the computational complexity of SVM is much higher than that of NN, SVM-based algorithms are placed at the bottom of Table 5. The proposed \mathbf{F}_{\mu\sigma}^{\mathrm{PO}} feature vector has a quite low dimension, 84, and it provides good results. The other proposed vector, \mathbf{F}_{\mathrm{KDE}}^{\mathrm{PO}}, also provides good results with a feature dimension of 840.
Table 5 gives a comparison of the proposed algorithm with the rest of the literature in terms of performance, dimension, and complexity on the related databases. The algorithms in the top three rows are based on the curvelet transform and have been shown to outperform earlier multiscale-based texture classification methods. It is clear from the table that although they have feature sizes and complexities similar to those of the proposed algorithms, the proposed algorithms outperform all of them. The variants of LBP proposed in [20] are given in the table as well. It is seen that for a dimension of 160, the \mathrm{LBP}_{8,1}^{\mathit{riu}2}/\mathrm{VAR}_{8,1} algorithm provides worse results than the proposed algorithms on the TC12-horizon and TC12-t184 databases but a better result on the TC10 database. The performance of the proposed algorithms is better than that of \mathrm{V}_{8,1}^{u2}\mathrm{GM}_{\mathrm{PD}2}, whose dimension is 227, on all of the compared databases. \mathrm{LBPV}_{8,1}^{u2}\mathrm{GM}_{\mathrm{PD}2} and \mathrm{LBP}_{24,3}^{u2}\mathrm{GM}_{\mathrm{ES}}, whose dimensions are 2,211 and 13,251, respectively, provide better results than the proposed algorithms at the high cost of increased feature size. The algorithms of [16] are provided in the 10th and 11th rows. It should be noted that their classification algorithm is based on SVM, an algorithm with higher computational complexity. Moreover, their feature vectors have higher dimensions than those of the proposed algorithms. Even though their algorithm provides better results for HSR + LSR + SIFT, the proposed algorithms outperform HSR + LSR + RIFT on the Brodatz and KTH-TIPS databases. This result is suspected to arise from the reduced feature size and the less satisfactory performance of RIFT on the scale-variant database (KTH-TIPS).
In general, the proposed algorithms of this study outperform all of the multiscale-based texture classification algorithms. They also outperform the LBP variants of [20] with small dimensions. The performance of the algorithm of [20] with high dimensions is better, as expected. The proposed algorithms also outperform the algorithms of [16] in smaller dimensions, especially on rotation-variant databases.
Finally, we mention one of the latest works on rotation- and scale-invariant texture retrieval, published by Li et al. [21]. They provide scale invariance by finding the optimal scale of each pixel. They modify the Outex and Brodatz databases to include enough scale and rotation variations and report their results on these databases. For the scale- and rotation-invariant feature, they report average precision rates of around 69% for the Brodatz database and 60% for the Outex database. Since they use a modified database, including it would extend the scope of this study considerably, and we leave scale invariance and the comparison on their database as future work.
8. Conclusions
Low-dimensional and rotation-invariant curvelet features for multiscale texture retrieval are proposed through two feature vectors in this study. This study is important since, to the best of our knowledge, it provides the best results for multiscale texture retrieval in the literature. Moreover, the results are comparable with the state-of-the-art techniques in the low and medium feature dimension ranges. Rotation invariance is provided by using the cross energies of curvelet blocks at adjacent orientations. The orientations with maximum cross energy are defined as the principal orientation of an image, which is the location least affected by rotation. The corresponding location is selected as the reference point for the image, and the feature vector is cycle-shifted based on this reference point. The feature vector \mathbf{F}_{\mu\sigma}^{\mathrm{PO}} has 84 elements. The other proposed feature vector, \mathbf{F}_{\mathrm{KDE}}^{\mathrm{PO}}, uses KDE and has 840 elements. It provides better results than \mathbf{F}_{\mu\sigma}^{\mathrm{PO}} at the expense of increased complexity. The texture retrieval results of the proposed method are better than those of earlier works which make use of other rotation-invariant curvelet features and are comparable with the state-of-the-art works in the literature, especially in the low and medium feature dimension ranges. As a result, we provide a novel rotation invariance method for curvelets and two separate feature vectors for texture retrieval in this study. The proposed methods show highly effective discriminative power for texture retrieval. The comparisons with the literature show the effectiveness of the proposed algorithms since they provide good performance with low complexity. The addition of scale invariance for curvelet features may provide better results. Thus, we plan to extend this study to scale-invariant features of the curvelet transform as our future work.
References
Arsenault HH, Hsu YN, Chalasinska-Macukow K: Rotation-invariant pattern-recognition. Opt. Eng. 1984, 23: 705-709.
Kashyap RL, Khotanzad A: A model-based method for rotation invariant texture classification. IEEE T. Pattern Anal. 1986, 8: 472-481.
Mallat SG: A theory for multiresolution signal decomposition - the wavelet representation. IEEE T. Pattern Anal. 1989, 11: 674-693. 10.1109/34.192463
Manjunath BS, Ma WY: Texture features for browsing and retrieval of image data. IEEE T. Pattern Anal. 1996, 18: 837-842. 10.1109/34.531803
Do MN, Vetterli M: The finite ridgelet transform for image representation. IEEE T. Image Process. 2003, 12: 16-28. 10.1109/TIP.2002.806252
Candes EJ, Donoho DL: Curvelets, multiresolution representation, and scaling laws. Wavelet Appl. Signal Image Process. VIII, Pts 1 and 2 2000, 4119: 1-12.
Candes EJ, Donoho DL: New tight frames of curvelets and optimal representations of objects with piecewise C2 singularities. Commun. Pur. Appl. Math. 2004, 57: 219-266. 10.1002/cpa.10116
Sumana IJ, Islam M, Zhang DS, Lu GJ: Content based image retrieval using curvelet transform, vol. 1 and 2 (2008 IEEE 10th Workshop on Multimedia Signal Processing, 2008). Queensland, Australia; 2008:11-16.
Haley GM, Manjunath BS: Rotation-invariant texture classification using a complete space-frequency model. IEEE T. Image Process. 1999, 8: 255-269. 10.1109/83.743859
Tzagkarakis G, Beferull-Lozano B, Tsakalides P: Rotation-invariant texture retrieval with Gaussianized steerable pyramids. IEEE T. Image Process. 2006, 15: 2702-2718.
Kokare M, Biswas PK, Chatterji BN: Rotation-invariant texture image retrieval using rotated complex wavelet filters. IEEE T. Syst. Man. Cy. B 2006, 36: 1273-1282.
Rallabandi VR, Rallabandi VPS: Rotation-invariant texture retrieval using wavelet-based hidden Markov trees. Signal Process. 2008, 88: 2593-2598. 10.1016/j.sigpro.2008.04.019
Zhang DS, Islam MM, Lu GJ, Sumana IJ: Rotation invariant curvelet features for region based image retrieval. Int. J. Comput. Vision 2012, 98: 187-201. 10.1007/s11263-011-0503-6
Islam MM, Zhang DS, Lu GJ: Rotation invariant curvelet features for texture image retrieval, in ICME 2009, vol. 1-3 (2009 IEEE International Conference on Multimedia and Expo). New York, NY, USA; 2009.
Gomez F, Romero E: Rotation invariant texture characterization using a curvelet based descriptor. Pattern Recogn. Lett. 2011, 32: 2178-2186.
Zhang J, Marszalek M, Lazebnik S, Schmid C: Local features and kernels for classification of texture and object categories: a comprehensive study. Int. J. Comput. Vision 2007, 73: 213-238. 10.1007/s11263-006-9794-4
Mikolajczyk K, Schmid C: Scale & affine invariant interest point detectors. Int. J. Comput. Vision 2004, 60: 63-86.
Lowe D: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 2004, 60: 91-110.
Lazebnik S, Schmid C, Ponce J: A sparse texture representation using local affine regions. IEEE T. Pattern Anal. Mach. Intel. 2005, 27: 1265-1278.
Guo ZH, Zhang L, Zhang D: Rotation invariant texture classification using LBP variance (LBPV) with global matching. Pattern Recogn. 2010, 43: 706-719. 10.1016/j.patcog.2009.08.017
Li Z, Liu GZ, Yang Y, You JY: Scale- and rotation-invariant local binary pattern using scale-adaptive texton and subuniform-based circular shift. IEEE T. Image Process. 2012, 21: 2130-2140.
Lazebnik S, Schmid C, Ponce J: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories, in Computer Vision and Pattern Recognition (IEEE Computer Society Conference on, 2006). New York, NY, USA; 2006:2169-2178.
Silverman B: Density Estimation for Statistics and Data Analysis. London: Chapman & Hall; 1986.
Brodatz P: Textures: A Photographic Album for Artists and Designers. New York: Dover; 1966.
Ojala T, Maenpaa T, Pietikainen M, Viertola J, Kyllonen J, Huovinen S: Outex - new framework for empirical evaluation of texture analysis algorithms, in Pattern Recognition, 2002, vol. 1 (Proceedings. 16th International Conference on, 2002). Quebec City, QC, Canada; 701-706.
Hayman E, Caputo B, Fritz M, Eklundh JO: On the significance of real-world conditions for material classification. In Computer Vision. Edited by: Pajdla T, Matas J. Berlin Heidelberg: Springer; 2004:253-266.
Additional information
Competing interests
The author declares that he has no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Cavusoglu, B. Multiscale texture retrieval based on low-dimensional and rotation-invariant features of curvelet transform. J Image Video Proc 2014, 22 (2014). https://doi.org/10.1186/1687-5281-2014-22
DOI: https://doi.org/10.1186/1687-5281-2014-22