 Open Access
Minimal residual ordinal loss hashing with an adaptive optimization mechanism
EURASIP Journal on Image and Video Processing volume 2020, Article number: 10 (2020)
Abstract
Binary coding techniques have been widely used in approximate nearest neighbor (ANN) search tasks. Traditional hashing algorithms treat all binary bits equally, which usually causes ambiguous rankings. To solve this issue, we propose an innovative bitwise weight method dubbed minimal residual ordinal loss hashing (MROLH). Unlike two-step mechanisms, MROLH simultaneously learns binary codes and bitwise weights through a feedback mechanism, so that when the algorithm converges, the binary codes and bitwise weights are well adapted to each other. Furthermore, we establish an ordinal relation preserving constraint based on quartic samples to enhance the power of preserving relative similarity. To reduce the training complexity, we use a tensor ordinal graph to represent the quartic ordinal relation, and the original objective function is approximated by one based on triplet samples. We also assign different weights to the training samples: the weight of each data point is initialized to the same value, and during training we iteratively boost the weights of the data points whose relative similarity is not well preserved. As a result, we can minimize the residual ordinal loss. Experimental results on three large-scale ANN search benchmark datasets, i.e., SIFT1M, GIST1M, and Cifar10, show that the proposed MROLH achieves superior ANN search performance in both the Hamming space and the weighted Hamming space compared with state-of-the-art approaches.
Introduction
The aim of hashing algorithms [1–6] is to learn binary representations of data that preserve their original similarity relationships in the Hamming space. Hashing algorithms can then retrieve the nearest neighbors of a query according to Hamming distances. Owing to their advantages in storage and computation, hashing algorithms have recently become popular in various computer vision and artificial intelligence applications, e.g., image retrieval, object detection, multi-task learning, linear classifier training, and active learning.
We roughly divide existing hashing algorithms into data-independent and data-dependent ones. Data-independent hashing, such as locality-sensitive hashing (LSH) [7], randomly generates hashing functions, and it typically requires long binary codes or multiple hash tables to achieve satisfactory performance. In contrast, data-dependent hashing algorithms, such as BDMFH [8] and ARE [9], use machine learning mechanisms to learn similarity preserving binary codes. Bidirectional discrete matrix factorization hashing (BDMFH) [8] alternates between two mutually reinforcing processes: learning binary codes from data and recovering data from the binary codes. To force the learned binary codes to inherit the intrinsic structure of the original data, BDMFH designs an inverse factorization model. The angular reconstructive embeddings (ARE) method [9] learns binary codes by minimizing the reconstruction error between the cosine similarities computed from the original data and from the binary embeddings. Usually, data-dependent hashing achieves excellent approximate nearest neighbor (ANN) search performance with compact binary codes. Furthermore, according to the similarity preserving restriction, data-dependent hashing can be divided into absolute similarity preserving hashing [10, 11] and relative similarity preserving hashing [6, 12]. The former requires the Hamming distances of similar data pairs to be sufficiently small, which suits semantic neighbor search tasks. The latter requires the ranking orders of data in the original space and in the Hamming space to be consistent with each other. Thus, relative similarity preserving hashing can achieve better ANN search performance.
Traditional hashing algorithms treat each binary bit equally, which can cause ambiguous rankings. For M-bit binary codes, there are \(C_{M}^{m}\) distinct codes sharing the same Hamming distance m to a query code, and all data mapped to these codes are tied. To further explain this phenomenon, we give a simple example in Fig. 1.
In Fig. 1, H={h_{1}(x),h_{2}(x)} represents a set of linear hashing functions that maps each of x_{1}, x_{2}, and x_{3} to a 2-bit binary code. If every binary bit is considered equally important, x_{2} and x_{3} have the same Hamming distance to x_{1}. As a result, x_{2} and x_{3} will be returned simultaneously when retrieving the nearest neighbors of x_{1} in the Hamming space. However, the similarity degrees of (x_{1},x_{2}) and (x_{1},x_{3}) are different in the Euclidean space. To avoid such ambiguity, bitwise weight methods have been proposed to assign a different value to each binary bit, so that the similarity degrees of data pairs with the same Hamming distance can be distinguished by weighted Hamming distances. In Fig. 1, according to the distribution of the query x_{1} and the hashing functions, a larger weight is assigned to h_{2}(x). As a result, the weighted Hamming distance of (x_{1},x_{2}) is larger than that of (x_{1},x_{3}), and x_{3} is returned first when retrieving the nearest neighbors of x_{1}. Therefore, to further distinguish the ranking orders of data with the same Hamming distance to a query, we should take the importance of each bit into consideration. However, bitwise weight methods such as QaRank [13, 14], QsRank [15], WhRank [16], and QRank [17] learn bitwise weights with a two-step mechanism: they first generate binary codes with an existing hashing method (e.g., LSH [7] or ITQ [10]) and then generate bitwise weights according to the learnt codes. This two-stage scheme separates the learning of binary codes from the learning of bitwise weights, so their performance cannot be iteratively boosted.
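To make the tie counting and the weighted disambiguation concrete, the following Python sketch enumerates the codes at a given Hamming distance and reproduces a Fig. 1-style situation. The 2-bit codes assigned to x_{1}, x_{2}, and x_{3} and the weight vector w are hypothetical values chosen only to mimic the described case (a larger weight on h_{2}(x)); they are not taken from the paper.

```python
from itertools import combinations
from math import comb

import numpy as np


def codes_at_distance(query, m):
    """Enumerate every {-1, +1} code that differs from `query` in exactly m bits."""
    out = []
    for flip in combinations(range(len(query)), m):
        code = list(query)
        for pos in flip:
            code[pos] = -code[pos]
        out.append(tuple(code))
    return out


def hamming(a, b):
    """Number of bits on which a and b disagree."""
    return int(np.sum(np.asarray(a) != np.asarray(b)))


def weighted_hamming(a, b, w):
    """Sum of the weights of the bits on which a and b disagree."""
    return float(np.sum(np.asarray(w) * (np.asarray(a) != np.asarray(b))))


# Tie count: with M = 8 bits, C(8, 2) = 28 codes sit at Hamming distance 2.
assert len(codes_at_distance((1,) * 8, 2)) == comb(8, 2)

# Hypothetical Fig. 1-style toy: x2 differs from x1 on h_2, x3 differs on h_1.
x1, x2, x3 = (1, 1), (1, -1), (-1, 1)
w = np.array([0.3, 0.7])                       # larger weight on h_2
print(hamming(x1, x2), hamming(x1, x3))        # 1 1  (a tie)
print(weighted_hamming(x1, x2, w), weighted_hamming(x1, x3, w))  # 0.7 0.3
```

With equal weights the two candidates are tied; the weighted distance ranks x_{3} ahead of x_{2}, matching the ordering described above.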
In this paper, we propose a novel bitwise weight method dubbed minimal residual ordinal loss hashing (MROLH), whose flowchart is shown in Fig. 2. To enhance the power of preserving relative similarity, we define the ordinal relation preserving objective function based on quartic samples in step (a). In step (b), we transform the constraint and use a tensor ordinal graph to reduce training time. Unlike most hashing methods, we simultaneously learn relative similarity preserving binary codes and bitwise weights with a feedback mechanism in steps (c), (e), and (f). During the iterative training process, we update the weights of the data points whose relative similarity is not well preserved in steps (d) and (g), which minimizes the residual performance loss. We compare the proposed MROLH against various state-of-the-art hashing methods on three widely used benchmarks, SIFT1M [18], GIST1M [19], and Cifar10 [20]. Quantitative experiments demonstrate that our algorithm achieves the best ANN search performance in both the Hamming space and the weighted Hamming space.
The main contributions of this paper include:
1. Both binary codes and bitwise weights are required to preserve the original relative similarity of the training data, and we establish a similarity preserving constraint based on quartic samples to enhance the power of preserving ordinal relations.
2. To reduce training time complexity, we embed the quartic ordinal relationship into a triplet one and use a tensor product graph to approximate the ordinal set.
3. During the iterative training process, we jointly learn binary codes and bitwise weights with a feedback mechanism so that they are well adapted to each other, and we address the residual performance loss by boosting the weights of the data points whose ordinal relation is not well preserved.
The rest of this paper is organized as follows. In Section 2, we briefly review relative similarity preserving hashing and bitwise weight methods. Section 3 describes the proposed MROLH and its three key components. In Section 4, we present and analyze comparative experiments on three large datasets. Finally, we conclude this paper in Section 5.
Related work
In this paper, we mainly focus on two issues: (a) how to preserve the original ordinal relation in both the Hamming space and the weighted Hamming space, and (b) how to ensure that bitwise weights and binary codes are well adapted to each other.
To solve problem (a), we require binary codes and bitwise weights to preserve relative similarity. However, almost all existing relative similarity preserving restrictions are defined on triplet samples, which leads to inferior ANN search performance. Minimal loss hashing [21] defines a hinge-like loss that penalizes similar (dissimilar) data pairs with large (small) Hamming distances, and it optimizes a convex-concave upper bound of the objective function with a perceptron-like learning procedure. Triplet loss hashing [22] and listwise supervision hashing [23] directly require the Hamming distances among similar data points to be smaller than those among dissimilar data points. Order preserving hashing (OPH) [12] divides all training data into clusters, and all cluster centers are involved in computing the performance loss; however, OPH requires the training samples to be uniformly distributed. Ordinal constraint hashing (OCH) [6] aims to minimize retrieval loss by preserving the ordinal relations of ranking tuples in the Hamming space. As the number of ranking tuples is quadratic or cubic in the size of the training set, it is difficult to build ranking tuples efficiently on a large-scale dataset. To fix this problem, OCH uses an embedding in which the original quartic order relation can be expressed as a triplet order relation.
Because Hamming distances are discrete integer values, many data pairs with different binary codes share the same distance value, which makes their relative similarity hard to distinguish. To fix this issue, bitwise weight methods assign a different weight to each bit. QaRank [13, 14] learns bitwise weights by minimizing the intra-class distance while preserving the inter-class relationship computed from the original training samples. The bitwise weights in QsRank [15] are learned according to the probability of mapping training samples to specified codes, and QsRank is designed for PCA hashing. WhRank [16] takes the distribution of query samples into consideration, which can effectively distinguish the similarity relationships among data pairs with the same binary codes. The bitwise weights in QRank [17] relate to the discriminative ability of the hashing functions and the distribution of query data. Most bitwise weight methods adopt a two-step mechanism, which first learns binary codes with a hashing method (such as LSH [7] or ITQ [10]) and then generates bitwise weights according to the learnt binary codes. As a result, the retrieval results obtained with weighted Hamming distances cannot be fed back into the procedure of learning binary codes, so binary codes and bitwise weights are not well adapted to each other, which is problem (b).
Methods
For x∈R^{d}, we can map it into an M-bit binary code B={b_{1},⋯,b_{M}} with the hash functions H(x)={h_{1}(x),⋯,h_{M}(x)}, where the mth bit is calculated as b_{m}(x)=sgn(h_{m}(x)). In this paper, each h_{m}(x) is a linear function.
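A minimal sketch of the linear encoding b_{m}(x)=sgn(h_{m}(x)), assuming h_{m}(x)=v_{m}^{T}x without a bias term and stacking the projection vectors v_{m} as the columns of a d×M matrix V (the same symbol used later in B(x_{i})=tanh(V^{T}x_{i})). The data and projections are random placeholders, and mapping a zero projection to +1 is our own convention.

```python
import numpy as np


def encode(X, V):
    """Map each row of X (n x d) to an M-bit code in {-1, +1}.

    Column m of V (d x M) is v_m, so h_m(x) = v_m^T x.
    A projection of exactly zero is mapped to +1 so every bit is defined.
    """
    H = X @ V                          # n x M matrix of projections h_m(x_i)
    return np.where(H >= 0, 1, -1)


rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))        # 5 placeholder points, d = 4
V = rng.standard_normal((4, 3))        # M = 3 placeholder projection vectors
B = encode(X, V)
print(B.shape)                         # (5, 3)
```

Encoding one point costs M inner products of length d, which is the online cost discussed later in the efficiency analysis.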
Generally, Hamming distances are used for ANN search, but they usually cause ambiguous ranking orders [15–17]. To avoid this situation, we learn the bitwise weights W(x)={w_{1}(x),⋯,w_{M}(x)} of data x, where w_{m}(x) represents the mth bit weight function.
To ensure that the hashing functions H(x) and the bitwise weight functions W(x) perform well, we propose three key components, which are described in Sections 3.1, 3.2, and 3.3.
The ordinal relation preserving constraint based on quartic samples
As discussed in previous works [6, 12], both absolute similarity preserving hashing and relative similarity preserving hashing based on triplet samples perform poorly in retrieving approximate nearest neighbors. In contrast, we require binary codes and bitwise weights to satisfy the ordinal relation preserving constraint defined on quartic samples as in Eq. (1), which directly maximizes the number of elements of the set C whose ordinal relation is preserved.
(x_{i},x_{j},x_{k},x_{l}) are quartic samples that satisfy the ordinal relationship defined in the Euclidean space, and I(·) is the indicator function: it returns 1 if the condition is satisfied and 0 otherwise.
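Because Eq. (1) is not reproduced in this text, the sketch below assumes one natural reading of it: C contains the quadruples (i,j,k,l) whose Euclidean distances satisfy D(x_{i},x_{j}) < D(x_{k},x_{l}), and the indicator I(·) checks whether the Hamming distances keep the same strict order. The function name and the brute-force O(n^{4}) enumeration are illustrative only; this cost is what motivates the graph-based construction that follows.

```python
from itertools import product

import numpy as np


def preserved_quadruples(X, B):
    """Return (kept, total) for the hypothetical quadruple set C.

    Assumes C = {(i, j, k, l): i != j, k != l, D_E(i, j) < D_E(k, l)}
    and counts I(D_H(i, j) < D_H(k, l)) over C. Brute force, O(n^4).
    """
    diff = X[:, None, :] - X[None, :, :]
    DE = np.sum(diff ** 2, axis=2)                        # squared Euclidean distances
    DH = np.sum(B[:, None, :] != B[None, :, :], axis=2)   # Hamming distances
    n = X.shape[0]
    kept = total = 0
    for i, j, k, l in product(range(n), repeat=4):
        if i == j or k == l or not DE[i, j] < DE[k, l]:
            continue
        total += 1
        kept += int(DH[i, j] < DH[k, l])                  # indicator I(.)
    return kept, total


# 1-D toy points 0, 1, 3 with a "thermometer" code that keeps their order.
X = np.array([[0.0], [1.0], [3.0]])
B_good = np.array([[-1, -1, -1], [1, -1, -1], [1, 1, 1]])
B_flat = np.ones((3, 3), dtype=int)                       # every point gets the same code
print(preserved_quadruples(X, B_good))   # (12, 12)
print(preserved_quadruples(X, B_flat))   # (0, 12)
```

A code that collapses all points scores zero, while an order-preserving code keeps every element of C.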
For the problem defined in Eq. (1), the primary question is how to construct the ordinal relation preserving set C. Generally, we could establish C by collecting similar and dissimilar data pairs, but the similarity relationship is hard to define. To fix this problem, we adopt a tensor product graph G to represent the ordinal relationship of quartic samples, as defined in Eq. (2).
The similarity graph S is defined in Eq. (3), which uses distance values to indicate the similarity relationship.
DS represents the dissimilarity graph, and the value of DS(i,j) is computed as in Eq. (4).
⊗ denotes the Kronecker product of matrices, so that G(ij,kl)=S(i,j)·DS(k,l). As a result, the values in G represent the similarity relationships of quartic samples as in Eq. (5).
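The index bookkeeping of the Kronecker product can be checked numerically. In this sketch S and DS are random placeholders (Eqs. (3) and (4) are not reproduced here). Note that numpy.kron stores S(i,j)·DS(k,l) at row (i,k) and column (j,l); a reshape and transpose rearranges the same n^{4} products into the pair-indexed layout G(ij,kl) used in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
S = rng.random((n, n))    # placeholder similarity graph; Eq. (3) is not reproduced
DS = rng.random((n, n))   # placeholder dissimilarity graph; Eq. (4) is not reproduced

G = np.kron(S, DS)        # (n*n) x (n*n) tensor product graph

# np.kron stores S[i, j] * DS[k, l] at row i*n + k and column j*n + l.
for i, j, k, l in [(0, 1, 2, 0), (2, 2, 1, 0)]:
    assert np.isclose(G[i * n + k, j * n + l], S[i, j] * DS[k, l])

# Reindex the same n^4 products by pair (i, j) and pair (k, l), i.e. G(ij, kl).
G_pairs = G.reshape(n, n, n, n).transpose(0, 2, 1, 3).reshape(n * n, n * n)
assert np.allclose(G_pairs, np.outer(S.ravel(), DS.ravel()))
```

Either layout holds n^{4} entries, which is why materializing G for many samples quickly becomes expensive.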
As described above, the ordinal relation preserving set C can be constructed from the tensor ordinal graph G. However, for massive samples, this construction has a high time complexity, so we further transform the ordinal relation constraint as shown in Eq. (6).
where \(M=\sum _{i}x_{i}x_{i}^{T}\) is a symmetric positive semidefinite matrix. It can therefore be conveniently decomposed by SVD as M=Z^{T}ΛZ with \(Z\in R^{d_{\text {svd}}\times d}\). A mapping function can then be defined as \(u_{i}=Zx_{i}\in R^{d_{\text {svd}}}\), and the ordinal relation constraint can be written as in Eq. (7).
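The decomposition of M can be sketched with NumPy. Stacking the samples as the rows of X gives M = X^{T}X = Σ_{i} x_{i}x_{i}^{T}; for a symmetric positive semidefinite matrix the symmetric eigendecomposition coincides with the SVD, so Z is taken as the transposed eigenvector matrix. We keep d_{svd}=d (no truncation) and use random placeholder data. The last check illustrates that bilinear forms in x carry over to u = Zx, which is one reason such a mapping is convenient; Eq. (7) itself is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 5))   # placeholder samples, one x_i^T per row (d = 5)

M = X.T @ X                         # sum_i x_i x_i^T, symmetric PSD, d x d

# For a symmetric PSD matrix the eigendecomposition M = Q diag(lam) Q^T is also
# an SVD, so Z = Q^T gives M = Z^T Lambda Z (here d_svd = d, no truncation).
lam, Q = np.linalg.eigh(M)
Z = Q.T
Lam = np.diag(lam)
assert np.allclose(Z.T @ Lam @ Z, M)

U = X @ Z.T                         # row i is u_i^T = (Z x_i)^T

# Bilinear forms carry over: x_i^T M x_j == u_i^T Lambda u_j.
assert np.isclose(X[0] @ M @ X[1], U[0] @ Lam @ U[1])
```

Truncating to the leading eigenvectors would give d_{svd} < d at the cost of an approximate reconstruction of M.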
Finally, the objective function defined in Eq. (8) is used to learn binary codes. The set \(\hat {C}\) can be easily constructed by selecting the elements of G whose values are smaller than 1.
Similarly, the ordinal relation preserving restriction for bitwise weights is redefined as in Eq. (9).
Minimal residual loss
In traditional algorithms, the weights of samples remain unchanged during training. As a result, each hash function and bitwise weight only tries to minimize its own performance loss, and the residual loss left by the preceding ones is ignored. To fix this problem, we propose to iteratively boost the weights of the data points whose similarity relationship is not well preserved.
Initially, we set the weight of each data point to \(\frac {1}{n}\) (n is the number of training samples), and we use Eq. (10) to update the weights during training.
\(\pi _{m}^{r}(x_{i})\) is the weight of x_{i} for the mth hash function or bitwise weight function in the rth training round. T(x_{i}) returns 0 if the similarity relationship between x_{i} and its nearest neighbors is preserved; otherwise, it returns 1. The definition of \(\xi _{m}^{r}\) is given in Eq. (12).
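Eqs. (10)–(12) are not reproduced in this text, so the following is only an AdaBoost-style stand-in for the reweighting step, not the paper's exact rule: weights of samples with T(x_{i})=1 are multiplied by exp(ξ), and all weights are then renormalized to sum to 1. The function name and the fixed step size are hypothetical.

```python
import numpy as np


def update_sample_weights(pi, violated, xi):
    """One boosting-style reweighting step (hypothetical stand-in for Eq. (10)).

    pi       : current weights pi_m^r(x_i), summing to 1
    violated : T(x_i) in {0, 1}; 1 if x_i's ordinal relation is not preserved
    xi       : positive step size standing in for xi_m^r (Eq. (12) not reproduced)
    """
    new = pi * np.exp(xi * violated)   # raise the weights of violated samples only
    return new / new.sum()             # renormalize to a distribution


n = 5
pi = np.full(n, 1.0 / n)               # initial weights 1/n
T = np.array([0, 1, 0, 0, 1])          # hypothetical violation indicators
pi = update_sample_weights(pi, T, xi=0.5)
```

After the update, samples 1 and 4 carry more weight, so the next hash function or bitwise weight function is pushed to fix them.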
After introducing the data weights, we redefine the objective functions for learning the hash functions and the bitwise weight functions as in Eqs. (13) and (14), respectively, where \(\pi _{m}^{r}(x_{i})\) is the sample weight when the algorithm converges.
Joint optimization
To make binary codes and bitwise weights well adaptive to each other, we propose a joint optimization mechanism, and the objective function is defined as in Eq. (15). During the training process, we iteratively optimize the parameters of hash functions and bitwise weight functions.
Because the sign function generates discrete values, optimizing the objective function becomes an NP-hard problem. To solve this issue, we use tanh(·) to approximate the sign(·) function, and the binary code is redefined as B(x_{i})=tanh(V^{T}x_{i}). We can then compute the Hamming distance and the weighted Hamming distance by Eqs. (16) and (17), respectively, where M is the number of binary bits and ⊙ is the bitwise product operation.
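Since Eqs. (16) and (17) are not reproduced here, the sketch below uses a common relaxation that is consistent with the surrounding definitions: for codes in {−1,+1}^{M}, the Hamming distance equals (M − b_{i}^{T}b_{j})/2, and a bitwise weighted version sums w_{m}(1 − b_{im}b_{jm})/2 using the bitwise product b_{i}⊙b_{j}. With tanh-relaxed codes the same formulas become differentiable. The function names are hypothetical.

```python
import numpy as np


def relaxed_codes(X, V):
    """B(x_i) = tanh(V^T x_i) for every row x_i of X; values lie in (-1, 1)."""
    return np.tanh(X @ V)


def hamming_relaxed(bi, bj):
    """(M - b_i^T b_j) / 2: equals the Hamming distance when codes are in {-1, +1}."""
    M = bi.shape[-1]
    return (M - np.dot(bi, bj)) / 2.0


def weighted_hamming_relaxed(bi, bj, w):
    """sum_m w_m (1 - b_im * b_jm) / 2, built from the bitwise product b_i * b_j."""
    return float(np.sum(w * (1.0 - bi * bj)) / 2.0)


bi = np.array([1.0, -1.0, 1.0])
bj = np.array([1.0, 1.0, -1.0])
print(hamming_relaxed(bi, bj))      # 2.0 (the codes differ in bits 2 and 3)
```

For exact ±1 codes the weighted version reduces to the sum of the weights of the differing bits; for relaxed codes both quantities vary smoothly with V.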
If we define \(\phi (\hat {c})\) and \(\phi (\tilde {c})\) as in Eqs. (18) and (19), the objective function can be rewritten as in Eq. (20).
When learning the mth hash function in the rth training round, the partial derivative of the objective function is shown in Eq. (21).
For the parameter v_{m}, the partial derivatives of the Hamming distance function and the weighted Hamming distance function can be computed as in Eqs. (22) and (23).
As a result, the parameter v_{m} can be updated by Eq. (24) in the rth training round.
Similarly, for the parameter w_{m}, the partial derivative of the objective function is shown in Eq. (25).
During the iterative training procedure, we can compute the value of w_{m} by Eq. (27).
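Assuming the relaxed Hamming distance takes the form (M − b_{i}^{T}b_{j})/2 with B(x)=tanh(V^{T}x), each bit contributes (1 − tanh(v_{m}^{T}x_{i})tanh(v_{m}^{T}x_{j}))/2 to the distance between x_{i} and x_{j}, and its gradient with respect to v_{m} follows from the chain rule with tanh′ = 1 − tanh². The sketch below implements this per-pair gradient, checks it against central finite differences, and takes one plain gradient step. It is not a transcription of Eqs. (21)–(24), which are not reproduced here, and the learning rate is arbitrary.

```python
import numpy as np


def bit_distance(v, xi, xj):
    """Relaxed contribution of one bit: (1 - tanh(v^T x_i) * tanh(v^T x_j)) / 2."""
    return 0.5 * (1.0 - np.tanh(v @ xi) * np.tanh(v @ xj))


def bit_distance_grad(v, xi, xj):
    """Analytic gradient of bit_distance w.r.t. v, using tanh'(a) = 1 - tanh(a)^2."""
    ti, tj = np.tanh(v @ xi), np.tanh(v @ xj)
    return -0.5 * ((1.0 - ti ** 2) * tj * xi + ti * (1.0 - tj ** 2) * xj)


def numeric_grad(f, v, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    g = np.zeros_like(v)
    for k in range(v.size):
        e = np.zeros_like(v)
        e[k] = eps
        g[k] = (f(v + e) - f(v - e)) / (2.0 * eps)
    return g


rng = np.random.default_rng(3)
v, xi, xj = rng.standard_normal((3, 4))     # hypothetical v_m, x_i, x_j with d = 4
g = bit_distance_grad(v, xi, xj)
assert np.allclose(g, numeric_grad(lambda u: bit_distance(u, xi, xj), v), atol=1e-6)

lr = 0.01                                   # arbitrary learning rate
v_new = v - lr * g                          # one gradient-descent step on v_m
```

Descending this gradient pulls the pair together on bit m; whether a given pair should be pulled together or pushed apart is decided by the ordinal objective, not by this per-pair term.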
The iterative process for learning hash functions and bitwise weight functions that preserve the ordinal relation is summarized in Algorithm 1.
Results and discussion
In this section, we describe the ANN search comparative experiments.
Experimental setting
We conduct comparative experiments on three large datasets, SIFT1M [18], GIST1M [19], and Cifar10 [20], which are widely used in ANN search experiments. The SIFT1M dataset contains 1 million 128-dimensional SIFT descriptors [24], of which 100,000 are used as training samples, and we randomly select 10,000 features from SIFT1M as query samples. The GIST1M dataset contains 1 million 320-dimensional GIST descriptors [25], from which we choose 50,000 and 10,000 points as training and query samples, respectively. The Cifar10 dataset contains 60,000 320-dimensional GIST features, of which 50,000 are used for training and 10,000 as queries.
The baseline methods fall into two categories: binary code methods and bitwise weight methods. Locality-sensitive hashing (LSH) [7], iterative quantization hashing (ITQ) [10], and k-means hashing (KMH) [11] generate absolute similarity preserving binary codes. In contrast, ordinal constraint hashing (OCH) [6] aims to preserve relative similarity in the Hamming space. QRank [17] and WhRank [16] assign different weights to each binary bit, which can further boost the ANN search performance of binary code methods.
We use mAP and recall to evaluate ANN search performance. As defined in Eq. (28), recall is the fraction of positive data that are successfully returned, i.e., N_{positive}/N_{all}, where N_{positive} is the number of positive data retrieved and N_{all} is the number of true nearest neighbors.
Recall does not reflect the positions at which positive data points appear in the ranked list. To address this, the mAP criterion defined in Eq. (29) is adopted, where Q is the number of query samples, K_{i} is the number of ground-truth neighbors of the ith query, and rank(j) is the ranking position of the jth true positive in the retrieval results.
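Both criteria can be sketched in a few lines of Python. Recall follows Eq. (28) as N_{positive}/N_{all}; for mAP we assume the standard form mAP = (1/Q) Σ_{i} (1/K_{i}) Σ_{j} j/rank(j), which matches the symbols defined for Eq. (29) but is not copied from it. The rankings and ground-truth lists in the example are hypothetical.

```python
def recall(retrieved, ground_truth):
    """Recall = N_positive / N_all for one query."""
    gt = set(ground_truth)
    n_positive = sum(1 for idx in retrieved if idx in gt)   # true neighbors returned
    return n_positive / len(gt)                              # N_all = |ground truth|


def average_precision(ranking, ground_truth):
    """AP = (1 / K) * sum_j j / rank(j), rank(j) being the 1-based position of the j-th hit."""
    gt = set(ground_truth)
    hits, total = 0, 0.0
    for pos, idx in enumerate(ranking, start=1):
        if idx in gt:
            hits += 1
            total += hits / pos
    return total / len(gt)       # positives missing from `ranking` contribute 0


def mean_average_precision(rankings, ground_truths):
    """mAP over Q queries: the mean of the per-query AP values."""
    aps = [average_precision(r, g) for r, g in zip(rankings, ground_truths)]
    return sum(aps) / len(aps)


ranking = [3, 1, 4, 0, 2]        # hypothetical retrieval order for one query
truth = [1, 0]                   # its hypothetical ground-truth neighbors
print(recall(ranking[:3], truth))          # 0.5
print(average_precision(ranking, truth))   # 0.5
```

Both lists contain the same two positives, but AP penalizes the fact that they appear at positions 2 and 4 rather than 1 and 2, which is the positional information recall lacks.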
Experimental results
In this section, the data are mapped into 32-, 64-, and 128-bit binary codes, and the corresponding bitwise weights are learnt.
The purpose of hashing algorithms is to ensure that the approximate nearest neighbors retrieved in the Hamming space are identical to those in the Euclidean space. Therefore, we take a data pair's Euclidean distance as its true similarity degree and, in two separate settings, define the 10 and the 100 samples with the smallest Euclidean distances to a query as its ground truth. The experimental results are shown in Tables 1, 2, and 3 and Figs. 3, 4, and 5. In these results, MROLB denotes the retrieval results obtained from the binary codes alone, and MROLH uses the bitwise weights to further improve the ANN search performance of MROLB. The results show that MROLB and MROLH obtain the best ANN search performance in the Hamming space and the weighted Hamming space, respectively.
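The Euclidean ground truth described above can be computed by brute force. The sketch below returns the indices of the k nearest base points for each query; it relies on the expansion ||q−b||² = ||q||² − 2q^{T}b + ||b||² and skips the square root, which does not change the ordering. For million-scale bases such as SIFT1M or GIST1M, queries would need to be processed in batches to bound memory; batching is omitted here.

```python
import numpy as np


def euclidean_ground_truth(queries, base, k):
    """Indices of the k nearest base points to each query, nearest first."""
    d2 = (np.sum(queries ** 2, axis=1)[:, None]
          - 2.0 * queries @ base.T
          + np.sum(base ** 2, axis=1)[None, :])          # squared distances, Q x N
    idx = np.argpartition(d2, kth=k - 1, axis=1)[:, :k]  # k smallest, unordered
    order = np.argsort(np.take_along_axis(d2, idx, axis=1), axis=1)
    return np.take_along_axis(idx, order, axis=1)


base = np.array([[0.0], [1.0], [5.0], [2.0]])   # hypothetical 1-D base set
query = np.array([[0.9]])
print(euclidean_ground_truth(query, base, k=2))  # [[1 0]]
```

With k = 10 or k = 100 this yields the two ground-truth settings used in the experiments.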
LSH [7] randomly generates hashing functions without a training process, and its performance does not improve noticeably as the number of bits increases. ITQ [10], KMH [11], and MROLB use machine learning mechanisms to generate compact binary codes, which achieve satisfactory ANN search performance. ITQ [10] maps data points to the vertices of a hypercube. However, the vertices in ITQ [10] are fixed, so the encoding results are not adapted to the data distribution. To fix this problem, KMH [11] learns encoding centers by simultaneously minimizing the quantization loss and the similarity loss. LSH [7], ITQ [10], and KMH [11] belong to absolute similarity preserving hashing. In contrast, OCH [6] establishes an ordinal constraint to preserve the relative similarity among data points in the Hamming space. In the above hashing methods, each binary bit is learned independently, so the residual performance loss accumulated by earlier bits cannot be eliminated. To solve this problem, we iteratively boost the weights of incorrectly encoded data during training. Furthermore, we establish an ordinal relation preserving constraint based on quartic samples, which clearly enhances the power of preserving relative similarity.
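For reference, the alternating optimization behind the ITQ baseline [10] can be sketched as follows: PCA-project the centered data to c dimensions, then alternately fix the rotation R and set B = sgn(VR), and fix B and solve the orthogonal Procrustes problem via the SVD of B^{T}V. This is a simplified illustration of ITQ, not of MROLH, and the iteration count and random initialization are our own choices. Each half-step cannot increase the quantization loss ||B − VR||², so the recorded losses are non-increasing.

```python
import numpy as np


def itq(X, n_bits, n_iter=50, seed=0):
    """Simplified iterative quantization: PCA, then alternate B and rotation R.

    Minimizes ||B - V R||_F^2 over B in {-1, +1}^{n x c} and orthogonal R.
    Returns the codes, the rotation, and the loss after each iteration.
    """
    Xc = X - X.mean(axis=0)
    _, _, Wt = np.linalg.svd(Xc, full_matrices=False)
    V = Xc @ Wt[:n_bits].T                               # PCA projection, n x c
    rng = np.random.default_rng(seed)
    R, _ = np.linalg.qr(rng.standard_normal((n_bits, n_bits)))  # random orthogonal start
    losses = []
    for _ in range(n_iter):
        B = np.where(V @ R >= 0, 1.0, -1.0)              # fix R, update B
        U, _, Wt2 = np.linalg.svd(B.T @ V)               # B^T V = U S W^T
        R = Wt2.T @ U.T                                  # fix B: R = W U^T (Procrustes)
        losses.append(np.linalg.norm(B - V @ R) ** 2)
    B = np.where(V @ R >= 0, 1.0, -1.0)
    return B, R, losses


rng = np.random.default_rng(4)
X = rng.standard_normal((200, 16))                       # hypothetical data
B, R, losses = itq(X, n_bits=8, n_iter=20)
```

Because the quantization vertices are fixed at {−1,+1}^{c}, ITQ only rotates the data toward them, which is the limitation that KMH addresses by learning the centers.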
WhRank [16] can distinguish the similarity degrees of data pairs that have the same Hamming distance. Furthermore, the bitwise weights in QRank [17] and the proposed MROLH are query-sensitive; as a result, QRank and MROLH can distinguish the similarity degrees of data pairs with the same binary code. WhRank [16] and QRank [17] require the bitwise weights to satisfy the absolute similarity preserving restriction and use fixed binary codes to learn them. Different from WhRank and QRank, we simultaneously learn binary codes and bitwise weights by minimizing the ordinal relation preserving loss. As a result, MROLH preserves relative similarity well in both the Hamming space and the weighted Hamming space, and its ANN search performance can be iteratively boosted by the feedback mechanism.
The efficiency and convergence
An excellent hashing method should encode new data efficiently online and have a reasonable offline training time complexity [10]. Below, we discuss the time complexity of all compared methods.
To encode a query online as an M-bit binary code, our algorithm, LSH [7], ITQ [10], OCH [6], and WhRank [16] compute the signs of M linear projections, so they share the same time complexity of O(M). By contrast, KMH [11] computes and compares the distances between a query and 2^{M} centers, so its time complexity is O(2^{M}). QRank [17] first transforms a query into an anchor representation and computes its similarities to 2^{M} landmarks in O(r+2^{M}), where r is the number of anchors, and then obtains query-adaptive weights by quadratic programming in polynomial time.
In the offline training stage, LSH [7] randomly generates M linear hashing functions in constant time. QRank [17] represents 2^{M} landmarks with r anchors in O(2^{M}r). The time complexity of WhRank [16] is O(Mdk), where k is the number of nearest neighbors. ITQ [10] iteratively optimizes a rotation matrix with linear time complexity. To reduce training time complexity, OCH [6] and KMH [11] select only n (n≪N) d-dimensional samples from all N data points for training. In each iteration, KMH [11] computes and compares the distances between n data points and 2^{M} centers, with a time complexity of O(2^{M}nd). In contrast, the overall training complexity of OCH [6] is O(tMn^{3}d+nN), where t is the number of iterations.
Our training process includes three stages, whose time complexities are as follows. First, we use the k-means algorithm to select n centers from N training samples, which requires comparing the distances between each training point and all cluster centers; the time complexity of this stage is therefore O(Nnd). Second, we use a gradient descent algorithm to minimize the performance loss, and the time complexity mainly depends on the number of training groups. Initially, each training group is a quadruple, and there are n^{4} of them. Since we project the original set to an approximate ordinal relation set based on triplets, the number of training groups reduces to n^{3}. In addition, mapping d-dimensional data to M-bit binary codes requires learning hash functions with Md parameters. As a result, the time complexity of the second stage is O(Mn^{3}d). Third, to minimize the residual error, we update the weights of the n training samples before learning each hashing function, and the time complexity of this stage is O(Mn). Overall, the training time complexity of our method is O(Nnd+Mn^{3}d+Mn).
To validate the above analysis, we measure the time consumed by online encoding and offline training on the GIST1M dataset, as shown in Table 4.
Generally, an algorithm is considered to have converged when its objective value remains unchanged or changes only slightly. In this paper, the objective value Θ is defined as the number of triplet elements whose ordinal relation is not well preserved. We conduct convergence experiments on the GIST1M dataset with 50,000 training samples. As shown in Table 5, the objective value decreases as the number of iterations increases, but it changes only slightly after 700 iterations, at which point we consider the algorithm to have converged.
Conclusion
In this paper, we propose a novel hashing algorithm dubbed minimal residual ordinal loss hashing (MROLH). Unlike traditional hashing algorithms, MROLH simultaneously learns binary codes and bitwise weights through a feedback mechanism, so when the algorithm converges, the encoding results and bitwise weights are well adapted to each other. We aim to preserve the original relative similarity of data pairs in both the Hamming space and the weighted Hamming space. Furthermore, we establish a relative similarity preserving constraint based on quartic samples that clearly enhances the power of preserving ordinal relations. During training, we iteratively boost the weights of the data points whose relative similarity is not preserved, so the residual performance loss can be minimized in later training rounds. Extensive experiments on three benchmark datasets demonstrate that the proposed MROLH is superior to many existing state-of-the-art approaches. In future work, we will investigate how to decrease the probability of ambiguous rankings occurring at the top positions of retrieval results.
Availability of data and materials
SIFT1M: [18]
GIST1M: [19]
Cifar10: [20], http://www.cs.toronto.edu/~kriz/cifar.html
Please contact author for data requests.
Abbreviations
ITQ: Iterative quantization hashing
KMH: k-means hashing
LSH: Locality-sensitive hashing
MROLH: Minimal residual ordinal loss hashing
QRank: Query-sensitive ranking method
WhRank: Ranking based on weighted Hamming distance
References
[1] X. Luo, P. Zhang, Z. Huang, L. Nie, X. Xu, Discrete hashing with multiple supervision. IEEE Trans. Image Process. 28(6), 2962–2975 (2019).
[2] M. Hu, Y. Yang, F. Shen, N. Xie, R. Hong, H.T. Shen, Collective reconstructive embeddings for cross-modal hashing. IEEE Trans. Image Process. 28(6), 2770–2784 (2019).
[3] Y. Cui, J. Jiang, Z. Lai, Z. Hu, W. Wong, Supervised discrete discriminant hashing for image retrieval. Pattern Recog. 78, 79–90 (2018).
[4] C. Yue, B. Liu, M. Long, J. Wang, in Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). HashGAN: deep learning to hash with pair conditional Wasserstein GAN (IEEE, Salt Lake City, 2018), pp. 1287–1296.
[5] C. Li, C. Deng, N. Li, W. Liu, X. Gao, D. Tao, in Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Self-supervised adversarial hashing networks for cross-modal retrieval (IEEE, Salt Lake City, 2018), pp. 4242–4251.
[6] H. Liu, R. Ji, J. Wang, C. Shen, Ordinal constraint binary coding for approximate nearest neighbor search. IEEE Trans. Pattern Anal. Mach. Intell. 41(4), 941–955 (2019).
[7] M. Datar, N. Immorlica, P. Indyk, V.S. Mirrokni, in Proceedings of the Twentieth Annual Symposium on Computational Geometry. Locality-sensitive hashing scheme based on p-stable distributions (ACM, Brooklyn, 2004), pp. 253–262.
[8] S. He, B. Wang, Z. Wang, Y. Yang, F. Shen, Z. Huang, H.T. Shen, Bidirectional discrete matrix factorization hashing for image search. IEEE Trans. Cybern., 1–12 (2019). https://ieeexplore.ieee.org/document/8863122.
[9] M. Hu, Y. Yang, F. Shen, N. Xie, H.T. Shen, Hashing with angular reconstructive embeddings. IEEE Trans. Image Process. 27(2), 545–555 (2018).
[10] Y. Gong, S. Lazebnik, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Iterative quantization: a procrustean approach to learning binary codes (IEEE, Colorado Springs, 2011), pp. 817–824.
[11] K. He, F. Wen, J. Sun, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. K-means hashing: an affinity-preserving quantization method for learning binary compact codes (IEEE, Portland, 2013), pp. 2938–2945.
[12] J. Wang, J. Wang, N. Yu, S. Li, in Proceedings of the 21st ACM International Conference on Multimedia. Order preserving hashing for approximate nearest neighbor search (ACM, Barcelona, 2013), pp. 133–142.
[13] Y.G. Jiang, J. Wang, S.F. Chang, in Proceedings of the 1st ACM International Conference on Multimedia Retrieval. Lost in binarization: query-adaptive ranking for similar image search with compact codes (ACM, Trento, 2011), pp. 16–1168.
[14] Y.G. Jiang, J. Wang, X. Xue, S.F. Chang, Query-adaptive image search with hash codes. IEEE Trans. Multimedia 15(2), 442–453 (2013).
[15] H.Y. Shum, L. Zhang, X. Zhang, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. QsRank: query-sensitive hash code ranking for efficient neighbor search (IEEE, Providence, 2012), pp. 2058–2065.
[16] L. Zhang, Y. Zhang, J. Tang, K. Lu, Q. Tian, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Binary code ranking with weighted Hamming distance (IEEE, Portland, 2013), pp. 1586–1593.
[17] T. Ji, X. Liu, C. Deng, L. Huang, B. Lang, in Proceedings of the 22nd ACM International Conference on Multimedia. Query-adaptive hash code ranking for fast nearest neighbor search (ACM, Orlando, 2014), pp. 1005–1008.
[18] H. Jegou, M. Douze, C. Schmid, Product quantization for nearest neighbor search. IEEE Trans. Pattern Anal. Mach. Intell. 33(1), 117–128 (2011).
[19] J. Wang, S. Kumar, S.F. Chang, Semi-supervised hashing for large-scale search. IEEE Trans. Pattern Anal. Mach. Intell. 33(12), 2393–2406 (2012).
[20] A. Krizhevsky, G. Hinton, Learning multiple layers of features from tiny images. Tech. Rep., Computer Science Department, University of Toronto (2009).
[21] M. Norouzi, D.J. Fleet, in Proceedings of the 28th International Conference on Machine Learning. Minimal loss hashing for compact binary codes (ACM, Bellevue, 2011), pp. 353–360.
[22] M. Norouzi, D.M. Blei, R. Salakhutdinov, in Proceedings of the Advances in Neural Information Processing Systems, Harrahs and Harveys, Lake Tahoe, USA. Hamming distance metric learning (Curran Associates Inc., Lake Tahoe, 2012), pp. 1070–1078.
[23] J. Wang, W. Liu, A.X. Sun, Y.G. Jiang, in Proceedings of the IEEE International Conference on Computer Vision. Learning hash codes with listwise supervision (IEEE, Sydney, 2013), pp. 3032–3039.
[24] D.G. Lowe, Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004).
[25] A. Oliva, A. Torralba, Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42(3), 145–175 (2001).
Acknowledgements
The authors would like to thank the editor and anonymous reviewers for their helpful comments and valuable suggestions.
Funding
This work is funded by the Natural Science Foundation of Shandong Province of China (Grant No. ZR2018PF005), and the National Natural Science Foundation of China (Grant No. 61841602).
Author information
Contributions
All authors took part in the discussion of the work described in this paper. ZW, LZ, and PL conceived and designed the experiments. ZW performed the experiments. ZW, LZ, and FS analyzed the data. ZW, LZ, and FS wrote the paper. All authors read and approved the final version of the manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wang, Z., Sun, F., Zhang, L. et al. Minimal residual ordinal loss hashing with an adaptive optimization mechanism. J Image Video Proc. 2020, 10 (2020). https://doi.org/10.1186/s13640-020-00497-4
DOI: https://doi.org/10.1186/s13640-020-00497-4
Keywords
 Binary codes
 Bitwise weights
 Ordinal relation preserving
 Joint optimization
 Minimal residual loss