 Research Article
 Open Access
Rigid Registration of Renal Perfusion Images Using a Neurobiology-Based Visual Saliency Model
 Dwarikanath Mahapatra^{1} and
 Ying Sun^{1}
https://doi.org/10.1155/2010/195640
© D. Mahapatra and Y. Sun. 2010
 Received: 19 January 2010
 Accepted: 6 July 2010
 Published: 21 July 2010
Abstract
General mutual information (MI) based registration methods treat all voxels equally. But each voxel has a different utility depending upon the task. Because of its robustness to noise, low computation time, and agreement with human fixations, the Itti-Koch visual saliency model is used to determine voxel utility of renal perfusion data. The model is able to match identical regions in spite of intensity change due to its close adherence to the center-surround property of the visual cortex. Saliency value is used as a pixel's utility measure in an MI framework for rigid registration of renal perfusion data exhibiting rapid intensity change and noise. We simulated varying degrees of rotation and translation motion under different noise levels, and a novel optimization technique was used for fast and accurate recovery of registration parameters. We also registered real patient data having rotation and translation motion. Our results show that saliency information improves registration accuracy for perfusion images and the Itti-Koch model is a better indicator of visual saliency than scale-space maps.
Keywords
 Human Visual System
 Salient Region
 Registration Accuracy
 Rigid Registration
 Saliency Model
1. Introduction
Image registration is the process of aligning two or more images that may be taken at different time instants, from different views, or by different sensors (or modalities, in medical imaging applications). Each floating image is then registered to a reference image by estimating a transformation between them. Image registration plays a vital role in many applications such as video compression [1], video enhancement [2], scene representation [3], and medical image processing [4].
Medical image registration has acquired immense significance in automated or semiautomated medical image analysis, intervention planning, guidance, and assessment of disease progression or effects of treatment. Some of the applications have been in the areas of brain imaging [5], kidney (renal) perfusion images [6], and radiological images [7]. Over the years, rigid registration algorithms have used mutual information (MI) [8, 9], Fourier transforms [10–12], correlation-based methods [13–15], and attribute vectors [16]. For registering dynamic kidney perfusion images three approaches were tested in [17], namely, template matching, Fourier transforms, and cross correlation, and the Fourier transform-based approach was found to give the best performance. A method for correcting image misregistration due to organ motion in dynamic magnetic resonance (MR) images combines mutual correspondence between images with transform invariant features [18]. Other methods for registration of renal perfusion MR images are based on a combination of wavelet and Fourier transforms [6] and a contrast invariant similarity measure [19].
In dynamic contrast enhanced (DCE) MRI, a contrast agent (e.g., Gd-DTPA) is injected into the blood stream. The resulting images exhibit rapid intensity change in an organ of interest. Apart from intensity change, images from a single patient are characterized by noise and movement of the organ due to breathing or patient motion. Registering images with such rapid intensity changes is a challenge for conventional registration algorithms. Although previous works [6, 17–19] demonstrate good results in registering renal perfusion MR images, they fail to incorporate the contribution of the human visual system (HVS) in such tasks. The HVS is adept at distinguishing objects in noisy images, a challenge yet to be completely overcome by object recognition algorithms. Humans are also highly capable of matching objects and regions between a pair of images in spite of noise or intensity changes. We believe it is worthwhile to investigate whether a model of the HVS can be used to register images in the presence of intensity change. In this paper, we use a neurobiology-based HVS model for rigid registration of kidney MRI in an MI framework. As we shall see later, MI is a suitable framework to include the contribution of the HVS.
Most MI-based registration methods treat all voxels equally. But a voxel's utility or importance would vary depending upon the registration task at hand. For example, in renal perfusion MRI a voxel in the renal cortex has greater significance in registration than a voxel in the background even though they may have the same intensity. Luan et al. in [20] have defined a voxel's importance based on its saliency and used it in a quantitative-qualitative mutual information (QMI) measure for rigid registration of brain MR images. Saliency refers to the importance ascribed to a voxel by the HVS. Different computational models have been proposed to determine saliency maps of images [21, 22]. An important characteristic of the HVS is its ability to match the same landmark in images exhibiting intensity change (as in DCE images). An accurate model of the HVS should be able to imitate this property and assign similar importance (or utility) values to corresponding landmarks in a pair of images. The entropy-based saliency model used in [20], called scale-space maps, fails to achieve the desired objectives for DCE images.
Scale-space maps [21] calculate the entropy over different scales around a pixel's neighborhood, and the maximum entropy at a particular scale is used to calculate the saliency value. When there is a change in intensity due to contrast enhancement, the entropy (and hence saliency) value of a pixel also changes. As a result, the same landmark in two different images has different utility measures. But it is desirable that a landmark have the same utility value in different images. In contrast, the neurobiology-based saliency model of [22] assigns the same importance to corresponding landmarks and has been shown to have a high correlation with human fixations [23]. Besides, it has advantages over scale-space maps in terms of robustness to noise and computational complexity. Therefore, we hypothesize that a neurobiological model of saliency would produce more accurate results than scale-space maps for rigid registration of kidney perfusion images. Saliency models have also been used for computer vision tasks like image retrieval [24] and image interpolation [25].
In this paper, we investigate the usefulness of a neurobiology-based saliency model for registering renal perfusion images. Our paper makes the following contributions. First, it investigates the effectiveness of a computational model of the HVS for image registration within the QMI framework proposed in [20]. Previously used saliency models are limited by their inaccurate correspondence with actual human fixations and their sensitivity to noise. Our work differs from [20] in the choice of saliency model. Second, we perform a detailed analysis of the effectiveness of different mutual information-based similarity measures, with and without saliency information, for the purpose of registering renal perfusion images. This gives an idea of the effectiveness of different saliency methods. Third, we use a randomized optimization scheme which evaluates a greater number of candidate solutions, which minimizes the possibility of being trapped in a local minimum and increases registration accuracy. The rest of the paper is organized as follows. In Section 2, we describe the neurobiology-based saliency model, the theoretical foundations of MI-based registration, and our optimization scheme. Sections 3 and 4, respectively, give details about our method and experimental results. Finally, we conclude with Section 5.
2. Theory
2.1. Saliency Model
The scale-space saliency model has the following drawbacks for our task:
(1) The changing intensity of perfusion images assigns different entropy, and hence saliency, values to corresponding pixels in an image pair exhibiting intensity change. This is undesirable when matching contrast-enhanced images.
(2) There is the inherent problem of choosing an appropriate scale. For every voxel, the neighborhood (scale) that maximizes the local entropy is chosen as its optimal scale, resulting in unnecessary computational cost.
(3) The presence of noise greatly affects the scale-space map, which results in erroneous saliency values. Since local entropy gives a measure of the information content in a region, noise can alter a region's saliency value.
(4) The scale-space saliency map does not truly determine what is salient to the human eye. An entropy-based approach takes into account the distribution of intensity in a local neighborhood only; thus the information derived is restricted to a small area in the vicinity of the pixel.
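The local-entropy computation underlying these drawbacks can be sketched as follows (a minimal Python illustration, not the implementation of [21]; the function names, the bin count, and the assumption that intensities are normalized to [0, 1] are our own choices):

```python
import numpy as np

def local_entropy(image, center, radius, bins=32):
    """Shannon entropy of the intensity histogram in a square
    neighbourhood of the given radius around `center`.
    Intensities are assumed normalized to [0, 1]."""
    y, x = center
    patch = image[max(0, y - radius):y + radius + 1,
                  max(0, x - radius):x + radius + 1]
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def scale_space_saliency(image, center, radii=(3, 5, 7, 9, 11)):
    """Saliency of a pixel as the maximum local entropy over a set
    of scales (neighbourhood radii), following the scale-selection
    idea of scale-space saliency maps."""
    entropies = [local_entropy(image, center, r) for r in radii]
    best = int(np.argmax(entropies))
    return entropies[best], radii[best]
```

Because the entropy depends directly on the local intensity distribution, a global contrast change alters the returned saliency value, which is exactly drawback (1) above.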
In contrast, the neurobiology-based Itti-Koch model [22] has the following advantages:
(1) An important aspect of the model is its center-surround principle, which determines how different a pixel is from its surroundings. As long as a pixel has feature values different from its surroundings, its saliency value is preserved, thus acting as a robust feature. This is better than the entropy model, where the intensity distribution leads to different saliency values when intensity changes due to contrast enhancement.
(2) By representing the image in the form of a Gaussian pyramid, the need to determine the appropriate scale for every voxel does not arise.
(3) Inherent to the model is the process of lateral inhibition, which greatly contributes to suppressing noise in the saliency map.
(4) The model, when used to identify salient regions in a scene, has a high correlation with actual human fixations [23].
The model calculates a saliency map by considering intensity and edge orientation information from a given image. Saliency at a given location is determined primarily by the contrast between this location and its surroundings with respect to the image features. The image formed on the fovea of the eye is the central object on which a person is focusing attention, resulting in a clear and sharp image. Regions surrounding the central object have a less clear representation on the retina. To simulate this biological mechanism, an image is represented as a Gaussian pyramid comprising layers of subsampled and low-pass filtered images. The central representation of the image on the fovea is equivalent to the image at higher spatial scales, and the surrounding regions are obtained from the lower spatial scales. The contrast is thus the difference between the various feature maps at these scales.
where ⊖ denotes the center-surround difference, the center is given by level c and the surround by level s in the Gaussian pyramid. Thus, several contrast maps are obtained for every feature. Although the original model uses three features, namely color, intensity, and edge information, we use only intensity and edge information because our datasets were in grayscale. The edge information is obtained from the image by using oriented Gabor filters [29] at four orientation angles. In total, one set of feature maps is obtained for edge orientation and another for intensity.
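A rough sketch of how such across-scale contrast maps can be computed (our own minimal Python illustration, not the authors' code; the pyramid depth and the choices of center and surround levels here are arbitrary):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def gaussian_pyramid(image, levels=5):
    """Low-pass filter and subsample to build a Gaussian pyramid;
    level 0 is the original image."""
    pyr = [image.astype(float)]
    for _ in range(levels - 1):
        smoothed = gaussian_filter(pyr[-1], sigma=1.0)
        pyr.append(smoothed[::2, ::2])
    return pyr

def center_surround(pyr, center_levels=(1, 2), deltas=(1, 2)):
    """Across-scale contrast maps: the absolute difference between a
    fine 'center' level c and a coarser 'surround' level s = c + delta,
    upsampled back to the center's resolution."""
    maps = []
    for c in center_levels:
        for d in deltas:
            surround = zoom(pyr[c + d], 2 ** d, order=1)
            h, w = pyr[c].shape
            maps.append(np.abs(pyr[c] - surround[:h, :w]))
    return maps
```

Each feature (intensity, each Gabor orientation) would be passed through the same pyramid and center-surround machinery to produce its own set of contrast maps.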
The contrast maps for each feature are combined after a normalization step, which proceeds as follows:
(1) Normalize the values in the map to a fixed range to eliminate modality- or feature-dependent amplitude differences. We fixed this range in our experiments.
(2) Find the location of the map's global maximum and calculate the average of its other local maxima.
(3) Globally multiply the map by the squared difference between the global maximum and the average of the other local maxima.
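These steps can be sketched as a map-normalization operator N(·) (our own Python illustration; the promotion factor (M − m̄)², with m̄ the mean of the local maxima other than the global one, is the factor used in the standard Itti-Koch formulation and is assumed here):

```python
import numpy as np
from scipy.ndimage import maximum_filter

def normalize_map(fmap, M=1.0, size=3):
    """Itti-Koch style map normalization N(.): rescale to [0, M],
    then promote maps with a few strong peaks by multiplying with
    (M - mbar)^2, where mbar is the mean of the local maxima other
    than the global one."""
    fmap = fmap - fmap.min()
    if fmap.max() > 0:
        fmap = fmap / fmap.max() * M
    # local maxima: nonzero points equal to the max of their neighbourhood
    peaks = (fmap == maximum_filter(fmap, size=size)) & (fmap > 0)
    vals = fmap[peaks]
    if len(vals) > 1:
        mbar = (vals.sum() - vals.max()) / (len(vals) - 1)
    else:
        mbar = 0.0
    return fmap * (M - mbar) ** 2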
2.1.1. Saliency Map in 3D
The gap between slices of the original volume is too large to provide sufficient information along the z-axis to extend each step of the saliency map computation to 3D. Intensity maps can be obtained directly from the data, but calculating orientation maps proves challenging, as 3D oriented Gabor filters are computationally intensive. Therefore, for each slice of the 3D volume, we calculate its 2D saliency map, which is subsequently used for registration.
2.2. Rigid Registration
Rigid registration requires us to align a floating image (or volume) with respect to a reference image (or volume) by correcting any relative motion between them. For simplicity, we describe the registration framework in terms of 2D images, but our experiments were on 3D volumes. Let the floating image (volume for 3D data) be the one that is to be registered to a reference image. For 3D volumes there are 6 degrees of freedom (i.e., translation along and rotation about each of the x-, y-, and z-axes), while 2D images have 3 degrees of freedom. The similarity between two images is determined from the value of a similarity measure, which depends upon the type of images being registered. The translation and rotation parameters that give the maximum value of the similarity measure are used to register the floating image.
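For a 2D image, applying one candidate set of rigid parameters can be sketched as follows (a hypothetical helper using SciPy, not part of the authors' pipeline; the parameter order is our own convention):

```python
import numpy as np
from scipy.ndimage import rotate, shift

def rigid_transform_2d(image, tx, ty, angle_deg):
    """Apply a 2D rigid-body transform (3 DOF: two translations and
    one in-plane rotation) to a floating image."""
    out = rotate(image, angle_deg, reshape=False, order=1)  # rotate about center
    return shift(out, (ty, tx), order=1)                    # then translate
```

A registration loop then evaluates the similarity measure between the reference image and `rigid_transform_2d(floating, tx, ty, angle)` for each candidate parameter triple.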
To determine the effectiveness of the neurobiology model of saliency, we used it in a QMI-based cost function for rigid registration. This cost function combines saliency information (or utility measure) with the MI of the two images to evaluate the degree of similarity between them. A joint saliency (or joint utility) histogram, similar to a joint intensity histogram, is used to determine the co-occurrence of saliency values in the saliency maps of the images under consideration. We follow the QMI definition and formulation of [20].
2.2.1. Quantitative-Qualitative Measure of Mutual Information
which is the relative entropy between the joint distribution p(a, b) and the product of the marginal distributions p(a) and p(b),
where the utility of an event can be any nonnegative real number, and
where u(a, b) is the joint utility of the events a and b.
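For reference, a sketch of the quantities these definitions refer to, written in standard notation (this reconstruction follows the usual formulations of mutual information, weighted entropy, and the QMI of [20]; the symbols p and u are ours):

```latex
% Mutual information as relative entropy:
I(A,B) = \sum_{a,b} p(a,b)\,\log\frac{p(a,b)}{p(a)\,p(b)}
% Weighted (utility-based) entropy:
H(A;u) = -\sum_{a} u(a)\,p(a)\,\log p(a), \qquad u(a) \ge 0
% Quantitative-qualitative mutual information:
\mathrm{QMI}(A,B) = \sum_{a,b} u(a,b)\,p(a,b)\,\log\frac{p(a,b)}{p(a)\,p(b)}
```

QMI thus reduces to ordinary MI when every joint utility u(a, b) is set to 1.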
2.3. Saliency-Based Registration
Joint Utility
where the summation is over all pairs of voxels with co-occurring intensity values in the reference and floating images. We use the multiplication operator to capture the joint occurrence of utility values. For example, to calculate the joint utility of the intensity pair (128, 58), we find all pairs of points such that the point in the reference image has intensity 128 and the corresponding point in the floating image has intensity 58. The joint utility is determined by multiplying the saliency values for each such pair of points and summing over all such pairs. A normalized saliency map is used so that the most salient regions in the two images have an equal importance of 1. However, the joint utility value can exceed 1, as it reflects the joint importance of intensity pairs and not just individual utility values.
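A direct sketch of this joint utility computation (our own Python illustration; intensities and saliency values are assumed normalized to [0, 1], and the bin count is arbitrary):

```python
import numpy as np

def joint_utility(img_r, img_f, sal_r, sal_f, bins=64):
    """Joint utility histogram: for every co-occurring intensity pair
    (a, b), sum the product of the two pixels' saliency values over
    all positions where img_r falls in bin a and img_f in bin b."""
    # quantize normalized intensities to histogram bins
    qr = np.clip((img_r * bins).astype(int), 0, bins - 1)
    qf = np.clip((img_f * bins).astype(int), 0, bins - 1)
    U = np.zeros((bins, bins))
    # accumulate saliency products into the joint histogram
    np.add.at(U, (qr.ravel(), qf.ravel()), (sal_r * sal_f).ravel())
    return U
```

Each entry U[a, b] is then used as the weight u(a, b) in the QMI cost function.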
2.4. Optimization
The optimization proceeds as follows:
(1) The original image is subsampled to three coarser levels, giving a four-level multiresolution pyramid from the original resolution to the coarsest.
(2) At the coarsest level, we perform an exhaustive search individually for each DOF, and the optimal parameters are used to transform the image. The search covers a fixed range of voxels for translation along the x-, y-, and z-axes and a fixed range of degrees for rotation about each axis.
(3) The registration parameters are interpolated to the next finer level, where they act as starting points. The DOFs are individually optimized in two passes: first the rotation parameters over a small search range of degrees, and then the x-, y-, and z-translations with search ranges of 5, 5, and 2 voxels. The optimal parameters are used to transform the volume, and a second pass with the same sequence of steps is performed. The volume is transformed only if the parameters from the second pass indicate a better match than the parameters from the first pass.
(4) The same process as in step (3) is repeated at the next finer resolution level of the image.
(5) The parameters are interpolated to the finest level, and an exhaustive search is carried out over 3 degrees for rotation and over 5, 5, and 2 voxels for translation along the three axes.
(6) The final parameters are used to get the registered image.
The above optimization scheme proves to be robust as we pick the DOF to be optimized at random and repeat the entire scheme.
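The randomized, coordinate-wise flavor of this scheme can be sketched as follows (a simplified single-resolution Python illustration with hypothetical search ranges, not the authors' full multiresolution procedure):

```python
import random

def optimize_rigid(cost, n_dof=6, search_ranges=None, passes=2, restarts=3):
    """Randomized coordinate search over rigid-body parameters:
    each DOF is exhaustively searched in turn over a small window,
    the order of the DOFs is shuffled, and the whole sweep is
    restarted, reducing the chance of stopping in a local optimum.
    `cost(params)` returns the similarity to MAXIMIZE."""
    if search_ranges is None:
        search_ranges = [range(-5, 6)] * n_dof  # voxels / degrees
    best = [0] * n_dof
    best_val = cost(best)
    for _ in range(restarts):
        order = list(range(n_dof))
        random.shuffle(order)            # randomized DOF order
        for _ in range(passes):
            for d in order:
                for step in search_ranges[d]:
                    cand = best[:]
                    cand[d] = step
                    v = cost(cand)
                    if v > best_val:
                        best, best_val = cand, v
    return best, best_val
```

With a well-behaved (e.g., separable) cost, one or two sweeps suffice; the shuffled order and restarts are what give the scheme its robustness.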
2.4.1. Results for Derivative-Based Optimizer
The Powell optimization routine that we adopt is highly suitable for cost functions whose derivatives are not available or are prohibitively costly to compute. It works by evaluating candidate solutions in the parameter space along straight lines, that is, linear combinations of parameters. Such combinations require a bracketing of the minimum before the optimization can be started [34]. As a result, several criterion evaluations have to be performed, which is inefficient when using a multiresolution strategy. Thévenaz et al. in [35] propose an optimization method based on the derivative of the similarity measure that makes better use of a multiresolution optimization setup.
where the utility values are the saliency values of the reference and floating images at the co-occurring intensity pairs. The utility measure is treated as a constant, although it depends upon the co-occurring intensity pairs of the two images. This is achieved by actually transforming the original saliency map of the floating image according to the current transformation, incurring a minor additional computational cost. Parzen windowing is not used because the joint utility histogram is not a distribution of saliency values but the sum of the products of saliency values of co-occurring intensity pairs.
To compute the QMI value at different transformations, we also calculate the second derivative of the cost function, that is, its Hessian. We refer the reader to [35] for details regarding the calculation of the Hessian and the derivative of the joint probability distribution in (17). Note that the utility is always treated as a constant and, as shown in (17), does not change the essence of how the derivatives of the cost functions are calculated.
A derivative-based cost function makes the method quite sensitive to the initial search parameters, and a wrong choice may even lead to nonconvergence. Therefore, a multiresolution framework is used to get good candidate parameters from the first step. A four-level image pyramid is created, with the fourth level denoting the coarsest resolution. The parameters from the coarsest level are used to find the optimal parameters at finer levels by using the derivative of mutual information. This results in a significant reduction of computation time as compared to Powell's method, where a greater number of parameters needs to be evaluated.
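The coarse-to-fine propagation of parameters can be sketched as follows (our own schematic Python; `refine` stands in for any local optimizer such as the derivative-based one, and the parameter layout (tx, ty, angle) is an assumption):

```python
def coarse_to_fine(pyramid, refine, init_params):
    """Propagate rigid parameters (tx, ty, angle) from the coarsest
    pyramid level to the finest. `pyramid[0]` is the finest level;
    translations double when moving down one resolution level, while
    the rotation angle is resolution-independent."""
    params = list(init_params)
    for level in reversed(range(len(pyramid))):
        params = refine(pyramid[level], params)   # local optimization here
        if level > 0:
            params = [2 * params[0], 2 * params[1], params[2]]
    return params
```

This is why a poor estimate at the coarsest level is so damaging: every subsequent level only refines locally around the propagated parameters.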
Details of the derivation of the different equations can be found in [35]. The optimization routine from the Insight Segmentation and Registration Toolkit (ITK) [36] was used. Each image was decomposed into multiple resolutions (similar to the scheme using Powell's method) and registered with each similarity measure using Thévenaz's optimization framework. To calculate the joint utility measure, the saliency maps of the reference and floating images are first calculated; for every candidate parameter, the floating image's saliency map is transformed to get a new map, and the two maps are used to calculate the joint utility measure at every step.
Although the computation time is significantly lower than with Powell's method, the registration results are sensitive to the initial conditions. If the optimal parameters determined from the coarsest image resolution are far away from the actual transformation parameters, then it is highly unlikely that Thévenaz's scheme will converge to the right solution. This problem is particularly acute when no multiresolution strategy is used; in that case, Powell's method is markedly superior. In a multiresolution setup with good initial conditions, Thévenaz's method converges in less time than Powell's method, with significantly fewer criterion evaluations but similar accuracy. Thévenaz's method can stop at any time and simultaneously optimizes all parameters from the first criterion evaluation, resulting in a reduction in the number of criterion evaluations.
A clear advantage of the Powell method is its robustness. This suggests a hybrid strategy: use Powell's method in the coarsest stage and Thévenaz's derivative-based method in the finer stages for faster convergence. The registration accuracy using such an approach is consistently close to the values reported in Table 2. Without Powell's method in the coarsest stage, the registration error for many of the volume pairs is greater than when Powell's method is used.
3. Experiments
3.1. Subjects
The volumes were obtained from healthy volunteers and from patients with renal insufficiency manifested by elevated serum creatinine. Written informed consent was obtained from all subjects. All the datasets were used for testing. Note that every dataset comprised both kidneys; the results for each dataset are the average errors over tests on both kidneys.
3.2. MRI Acquisition Protocol
Dynamic MRI was performed on a 1.5 T system (Avanto; Siemens, Erlangen, Germany) with a torso phased-array coil. 3D spoiled gradient-echo imaging was performed in the oblique coronal orientation to include the abdominal aorta and both kidneys, with standard dynamic acquisition parameters (TR, TE, flip angle, matrix size, bandwidth, and volume acquisition time). The original 5-mm coronal partitions were interpolated to thinner slices.
Five unenhanced acquisitions were performed during a single breath-hold. A bolus of Gd-DTPA (Magnevist; Berlex Laboratories, Wayne, NJ, USA) was then injected, followed by a saline flush, both at a constant injection rate. 3D volumes were then acquired using a variable sampling schedule: sets acquired at short intervals initially, followed by sets at progressively longer intervals, and ending with sets acquired at one-minute intervals. The first sets were attempted within a single breath-hold. Before each subsequent acquisition, the patients were instructed to suspend respiration at end-expiration. Oxygen via nasal cannula was routinely offered to the patients before the exam to facilitate breath-holding. For image processing, all 3D volumes (acquired both before and after contrast agent injection) were evaluated.
3.3. Registration Procedure
Two volumes of interest (VOIs), each encompassing one kidney, were selected from each volume. We test the effectiveness of our algorithm by registering the entire VOI sequence of each patient to a reference VOI. Each kidney had a different reference VOI, and for different cases, different pre- and postcontrast VOIs were chosen as the reference. Saliency maps were calculated for each slice of a VOI, and the saliency information from these maps was used to define the utility measure of each voxel. For every reference-floating VOI pair, the floating VOI is transformed according to the scheme outlined in Section 2.4, and for each candidate transformation parameter, the QMI-based similarity measure (6) is calculated. The candidate transformation parameters that give the maximum value of QMI determine the final transformation. We evaluate the performance of our algorithm using the ground truth for registration provided by a clinical expert.
To check the robustness and effectiveness of the proposed similarity measure, we determined its characteristics as the transformation parameters change. For this purpose, rotation and translation motion was simulated on the datasets, and in an attempt to recover the applied motion, the value of the similarity measure at different candidate transformation parameters was calculated. The characteristics thus obtained gave an idea of the suitability of the similarity measure for registering DCE images. The robustness of the different similarity measures was determined by first misaligning the images by different degrees of known translation and rotation. Three similarity measures were used in the tests: normalized mutual information (NMI) [37], the QMI of [20] based on scale-space saliency, and our proposed QMI based on the neurobiology-based saliency model. NMI is a popular similarity measure for registering multimodal images, that is, images of the same organ from different modalities such as MR and CT, and its performance can help us gauge the effectiveness of our method.
4. Results
We present results for different experiments that show the importance of using saliency in registering DCE images of the kidney. Datasets comprising 3D volumes were used, each volume consisting of multiple slices. Manual registration parameters by experts were available for each dataset, facilitating performance comparison. First, we present evidence of the suitability of saliency for registering contrast-enhanced images. Then we show the properties of the different similarity measures with respect to registration. These sets of results are similar to those presented in [20]. They highlight the fact that, although the scale-space QMI was a good measure for registering brain MR images, our neurobiology-based QMI shows better performance in registering renal perfusion images. This is reflected in the properties of the different similarity measures. Finally, we present registration results on real patient datasets and compare the relative performance of the different similarity measures with respect to manual registration parameters.
For simulated motion, registration was deemed to be accurate according to the criterion in (22).
4.1. Saliency Maps for Pre- and Postcontrast Enhanced Images
4.2. Registration Functions
A similarity measure for two images should have the following desirable properties: (a) it should be smooth and convex with respect to the transformation parameters; (b) the global optimum of the registration function should be close to the correct transformation that aligns the two images perfectly; (c) the capture range should be as large as possible; and (d) the number of local maxima should remain at a minimum. We can determine the registration function of a similarity measure by calculating its value under different transformations.
It is to be kept in mind that the profiles for the different similarity measures in Figure 3 are for a fixed noise level. For small misalignments, the performance of the two saliency-based measures is comparable; that is, the maximum of the similarity measures occurs mostly at zero relative error. For larger misalignments, our measure shows superior performance, demonstrating the efficacy of a neurobiology-based saliency model. The performance of NMI is comparable to the saliency-based measures only up to a smaller misalignment threshold, beyond which it degrades. The accuracy (from (22)) in recovering the correct transformation was lowest for NMI, higher for the scale-space QMI, and highest for our measure.
In most cases, NMI was unable to detect the right transformation between a pair of pre- and postcontrast images. Figure 4(a) shows two maxima for NMI at nonzero error, in addition to being noisy. Such characteristics are undesirable for registration. For the scale-space QMI, although there are no multiple maxima, the maximum is at nonzero relative error. It is observed that even though the scale-space QMI performs better than NMI due to its use of saliency, our neurobiology-based measure outperforms both of them.
The accuracy rates for registering DCE images again placed NMI last, the scale-space QMI second, and our measure first. The low registration accuracy of NMI makes it imperative that we investigate the reason behind it. We shall do this with the help of an example.
Consider registering the central patch of the image in Figure 5(a), similar to a region of interest. The intensity values of Figure 5(c) indicate contrast enhancement without any kind of motion. For an ideal registration, the central patch of Figure 5(a) should give the maximum value of NMI (from [37]) with the central patch of Figure 5(c). However, the maximum value is obtained for the image patch shown in bold in Figure 5(c), which corresponds to a displacement of one pixel to the left and one pixel down. Although there is no translation motion, the maximum value of NMI is obtained for parameters corresponding to such motion. The intensity change in the image patch is quite similar to what we observe for DCE images of the kidney. Consequently, the maximum value is obtained at nonzero relative error, and more than one maximum is observed in many cases. Thus, there is a significantly high number of misregistrations using NMI, which contributes to its high error rate.
From these observations, we infer that NMI performs well when a particular intensity in the first image is mapped to a distinct intensity in the second image. If two intensity values in the first image are mapped to the same intensity value in the second, or vice versa, then NMI leads to poor matching. Due to contrast enhancement, it is very common to find more than one intensity mapped to a single intensity. Consequently, NMI-based registration is prone to error, which is reflected in the error measures.
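This failure mode can be reproduced with a toy computation (our own Python illustration; the NMI form used is Studholme's (H(A) + H(B)) / H(A, B), and the intensity values are made up):

```python
import numpy as np

def nmi(a, b, bins=8):
    """Studholme's normalized mutual information,
    (H(A) + H(B)) / H(A, B), from a joint histogram."""
    h, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = h / h.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    def H(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))
    return (H(px) + H(py)) / H(pxy.ravel())

# a reference signal with four grey levels
a = np.repeat(np.arange(4), 4).astype(float)
bijective = a + 10.0                     # one-to-one intensity change
many_to_one = np.where(a < 2, 0.0, a)    # levels 0 and 1 merge into one
```

Running this gives NMI = 2 for the one-to-one mapping and a strictly lower value once two intensities merge, mirroring the many-to-one degradation described above.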
4.3. Robustness of Registration
A robust registration algorithm should be able to recover the true transformation between two images even if the initial misalignment between them is very large. We evaluate the robustness of the three similarity measures under various amounts of initial misalignment between two kidney MR images. Four sets of tests were performed, in which the initial misaligned rotation angles were randomly picked from four rotation ranges of increasing magnitude. Similarly, misalignment was simulated for translational motion in the x-, y-, and z-directions over several ranges. For each misalignment range, we performed registrations between different pairs of images. Zero-mean Gaussian noise of varying variance was added to the images.
The average translation and rotation errors for the simulated motion, along with the maximum errors, followed a consistent ordering: NMI produced the largest errors, the scale-space QMI reduced them considerably, and our measure gave the smallest average and maximum errors.
Average translation error and registration accuracy for different noise levels. The figures are for simulated motion studies on all volumes of the sequence. Translation errors are for values along the x-, y-, and z-axes.
Variance of added noise  Average registration error in mm (NMI / scale-space QMI [20] / proposed)  Registration accuracy in % (NMI / scale-space QMI [20] / proposed)
0  (5.3,5.2,0.5)  (1.9,1.7,0.2)  (1.2,1.1,0.2)  68.1  88.9  98.8 
0.01  (5.3,5.2,0.6)  (1.7,1.6,0.3)  (1.3,1.3,0.2)  67.2  88.1  98.3 
0.04  (5.5,5.5,0.8)  (1.8,1.8,0.4)  (1.4,1.4,0.3)  61.3  83.2  95.3 
0.06  (5.8,5.9,1.0)  (1.9,1.9,0.6)  (1.6,1.5,0.4)  47.1  78.2  92.1 
0.085  (6.2,6.3,1.1)  (2.2,2.2,0.7)  (1.7,1.7,0.50)  41.2  62.3  89.1 
0.1  (6.4,6.5,1.3)  (2.4,2.4,0.9)  (1.9,1.9,0.8)  40.1  57.4  75.6 
Average translation errors for rigid registration. The first results column is normalized mutual information (NMI); the second is the measure in [20] using scale-space maps; the third is our approach using the neurobiology-based saliency model. All values are in units of mm.
Dataset  NMI  Scale-space QMI [20]  Proposed
Dataset1  (4.8,4.3,0.5)  (2.0,1.7,0.3)  (1.2,1.3,0.2) 
Dataset2  (5.1,5.7,0.4)  (1.3,1.4,0.4)  (1.2,1.2,0.2) 
Dataset3  (5.0,4.7,0.6)  (1.7,1.7,0.3)  (1.3,1.2,0.3) 
Dataset4  (5.2,5.0,0.6)  (1.5,1.6,0.4)  (1.3,1.2,0.2) 
Dataset5  (4.7,4.8,0.7)  (1.7,1.7,0.4)  (1.2,1.3,0.2) 
Dataset6  (5.1,4.9,0.5)  (1.52,1.4,0.3)  (1.1,1.0,0.2) 
Dataset7  (5.2,5.9,0.4)  (1.4,1.5,0.2)  (1.3,1.4,0.1) 
Dataset8  (6.5,6.1,0.4)  (1.7,1.6,0.2)  (1.2,1.0,0.1) 
Dataset9  (4.9,4.2,0.5)  (1.7,1.5,0.3)  (1.2,1.1,0.1) 
Dataset10  (5.4,5.4,0.5)  (1.4,1.3,0.3)  (1.3,1.2,0.1) 
Average Error  (5.2,5.1,0.5)  (1.6,1.5,0.3)  (1.2,1.2,0.2) 
4.4. Registration Accuracy for Real Patient Data
Average rotation errors for rigid registration. The first results column is normalized mutual information (NMI); the second is the measure in [20] using scale-space maps; the third is our approach using the neurobiology-based saliency model. All values are in units of degrees.
Dataset  NMI  Scale-space QMI [20]  Proposed
Dataset1  (0,0,2.75)  (0,0,0.56)  (0,0,0.43) 
Dataset2  (0,0,2.71)  (0,0,0.50)  (0,0,0.44) 
Dataset3  (0,0,2.67)  (0,0,0.55)  (0,0,0.41) 
Dataset4  (0,0,2.66)  (0,0,0.53)  (0,0,0.39) 
Dataset5  (0,0,2.72)  (0,0,0.52)  (0,0,0.40) 
Dataset6  (0,0,4.81)  (0,0,0.53)  (0,0,0.32) 
Dataset7  (0,0,4.23)  (0,0,0.65)  (0,0,0.44) 
Dataset8  (0,0,3.98)  (0,0,0.75)  (0,0,0.29) 
Dataset9  (0,0,3.12)  (0,0,0.54)  (0,0,0.31) 
Dataset10  (0,0,3.33)  (0,0,0.58)  (0,0,0.24) 
Average Error  (0,0,3.31)  (0,0,0.57)  (0,0,0.36) 
For all datasets, NMI shows a higher error measure than the two saliency-based measures. This can be attributed to the errors incurred when registering pre- and postcontrast image pairs. For NMI, the maximum translation and rotation errors were substantially larger than for the saliency-based measures; such large errors are not desirable, especially in medical image registration. Moreover, the average error values for NMI were higher than those of the other two measures. For translation along the z-axis, there was no significant difference between the error values of the different similarity measures, as there is hardly any motion along that axis. For rotation, the error values for the x- and y-axes are all zero because there is no rotation about these axes. Rotational motion is observed only about the z-axis, with the average error measures for NMI much greater than those for the saliency-based measures.
4.5. Computation Time
The difference between our method and the one proposed in [20] is the choice of saliency model: while we use the saliency model of [22], Luan et al. use the scale-space method of [21]. The source code for both methods is available from the respective authors' websites. For a kidney image, the average time taken to calculate the scale-space map and identify salient regions was greater than the average time needed to compute the neurobiology-based saliency map. The difference in computing the saliency maps is not significant in itself, but in registering a large number of images by our method, the saving in computation time amounts to a few seconds.
Another difference from the method in [20] is an optimization scheme that incorporates a degree of randomness, thus reducing the chance of being trapped in a local minimum. This modification involves a marginally greater number of steps, leading to a slight increase in the computation time for registering a pair of volumes relative to the method in [20]. Using Thévenaz's derivative-based optimizer, the computation time is reduced further for both similarity measures.
5. Discussion and Conclusion
In this work, we have investigated a neurobiological model of visual saliency and its use in registering perfusion images. The motivation was to determine whether the HVS's ability to recognize and match images in the presence of noise and contrast enhancement can be simulated by a computational model. We register MR kidney perfusion volumes because they exhibit rapid intensity change and the acquired datasets also contain a significant amount of noise.
The neurobiology-based saliency model is used because it produces very similar saliency maps for a pair of images even when there is an intensity change between them, thus facilitating registration in the face of contrast enhancement. We carry out a comparative study of the effectiveness of different saliency models for registering renal perfusion images and find the neurobiology-based model to be better than scale-space maps.
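To make the role of per-voxel utility concrete, the sketch below folds a saliency weight into a histogram-based mutual-information estimate by letting each voxel pair contribute to the joint histogram in proportion to its weight. This is an illustrative simplification of the quantitative-qualitative MI idea of [20, 31], not the exact formulation used in either method; the function name and the binning scheme are our own assumptions.

```python
import numpy as np

def weighted_mutual_information(ref, flt, weights, bins=32):
    """Mutual information with per-voxel utility weights (illustrative).

    Each voxel pair contributes to the joint histogram in proportion to
    its weight, so salient regions dominate the similarity measure.
    """
    ref, flt, w = ref.ravel(), flt.ravel(), weights.ravel()

    def to_bins(v):
        # Quantize intensities into histogram bin indices in [0, bins-1].
        span = v.max() - v.min() + 1e-12
        return np.minimum(((v - v.min()) / span * bins).astype(int), bins - 1)

    joint = np.zeros((bins, bins))
    np.add.at(joint, (to_bins(ref), to_bins(flt)), w)  # weighted co-occurrence counts
    joint /= joint.sum()
    pr = joint.sum(axis=1, keepdims=True)  # marginal of the reference image
    pf = joint.sum(axis=0, keepdims=True)  # marginal of the floating image
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pr @ pf)[nz])))
```

With uniform weights this reduces to ordinary histogram-based MI; salient voxels simply count for more in the joint statistics.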
Several factors contribute to the superior performance of the neurobiological model of saliency. The scale-space method used in [20] to obtain saliency information has certain inherent shortcomings. First, the change in intensity assigns different saliency values to corresponding voxels in an image pair, which is undesirable for registration. Second, there is the problem of choosing an appropriate scale (neighborhood) for calculating the local entropy of a voxel: the scale giving the maximum entropy is chosen as the best scale, which makes the procedure computationally intensive. Third, since it is an entropy-based method, noise can greatly affect the entropy value, leading to erroneous results. Fourth, a scale-space saliency map of an image does not truly represent what is salient to the human eye. In the neurobiology model, by contrast, the center-surround approach assigns the same saliency value to corresponding pixels in an image pair, and a Gaussian pyramid representation of the image eliminates the need to determine the optimal scale for each voxel. An important part of the model is the process of lateral inhibition, which suppresses noise and gives rise to a saliency map with distinctly salient regions. Lastly, the neurobiology model has been used to predict human fixations in a scene, and there is a high degree of correlation between the predicted and actual fixations.
Our optimization technique also contributes to the improved performance of our method. Instead of following a set pattern for optimizing the DOFs, we introduce a degree of randomness into the optimization scheme, which is based on Powell's method. A multiresolution approach was adopted in which candidate transformation parameters for the different DOFs were first calculated at the coarsest level and the solution propagated to finer levels; the optimization routine was repeated at the finer levels to obtain the final transformation. The sequence in which the DOFs are optimized is random. By adopting this method, the optimization scheme avoids being trapped in local optima and reaches the global optimum, as determined by an exhaustive search, in most of the experiments. This approach also gives better performance than the optimization scheme outlined in [33]. We also use a derivative-based optimizer (Thévenaz's method) to determine the optimal registration parameters; if the starting point of the search is close to the actual optimum, this method gives accurate results in significantly less time. Using Powell's method for the search at the coarsest level followed by Thévenaz's method at finer levels gives registration accuracy close to that obtained using Powell's method at all levels, but in significantly less computation time.
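The coarse-to-fine search with a randomized DOF ordering can be illustrated with a simplified coordinate search standing in for Powell's full direction-set method. The step schedule, iteration counts, and function names below are illustrative assumptions, not the parameters used in our experiments.

```python
import random

def randomized_coordinate_search(cost, x0, step=1.0, levels=3, iters=10, seed=0):
    """Coordinate search with a randomized parameter ordering (illustrative).

    At each resolution level the degrees of freedom are optimized one at a
    time in a shuffled order, and the step size shrinks between levels,
    mimicking a coarse-to-fine schedule.
    """
    rng = random.Random(seed)
    x = list(x0)
    for level in range(levels):
        s = step / (2 ** level)              # finer steps at finer levels
        for _ in range(iters):
            order = list(range(len(x)))
            rng.shuffle(order)               # random DOF ordering, not a fixed pattern
            for i in order:
                for cand in (x[i] - s, x[i] + s):
                    trial = list(x)
                    trial[i] = cand
                    if cost(trial) < cost(x):  # greedy accept
                        x = trial
    return x
```

Shuffling the order in which the DOFs are visited changes the search trajectory from run to run, which is what reduces the chance of settling into the same local optimum that a fixed optimization order would produce.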
Thus, we conclude that the neurobiological model of saliency gives a fairly accurate working model of the HVS based on bottom-up cues alone. It is robust to varying degrees of noise and simulated motion. The original model in [22] uses color, intensity, and edge orientation as features for determining the saliency map, but for our work we use only intensity and edge orientation information since our datasets are in grayscale. The findings of our experiments provide a basis for investigating how saliency can be used in more challenging registration tasks and in other computer vision applications such as tracking.
Declarations
Acknowledgments
The authors would like to thank Dr. Vivian S. Lee, Professor of Radiology, Physiology, and Neuroscience, Vice-Dean for Science, Senior Vice-President, and Chief Scientific Officer, New York University Medical Center, for providing the datasets. This work was supported by NUS Grant R263000470112.
References
[1] Dufaux F, Konrad J: Efficient, robust, and fast global motion estimation for video coding. IEEE Transactions on Image Processing 2000, 9(3):497-501.
[2] Irani M, Peleg S: Motion analysis for image enhancement: resolution, occlusion, and transparency. Journal of Visual Communication and Image Representation 1993, 4(4):324-335.
[3] Irani M, Anandan P, Hsu S: Mosaic based representations of video sequences and their applications. Proceedings of the 5th International Conference on Computer Vision, June 1995, 605-611.
[4] Hill DLG, Batchelor PG, Holden M, Hawkes DJ: Medical image registration. Physics in Medicine and Biology 2001, 46(3):R1-R45.
[5] Lao Z, Shen D, Jawad A, Karacali B, Liu D, Melhem ER, Bryan RN, Davatzikos C: Automated segmentation of white matter lesions in 3D brain MR images, using multivariate pattern classification. Proceedings of the 3rd IEEE International Symposium on Biomedical Imaging, April 2006, 307-310.
[6] Song T, Lee VS, Rusinek H, Kaur M, Laine AF: Automatic 4D registration in dynamic MR renography based on over-complete dyadic wavelet and Fourier transforms. Proceedings of the 8th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI '05), October 2005, Palm Springs, Calif, USA, Lecture Notes in Computer Science 3750:205-213.
[7] Hawkes DJ: Algorithms for radiological image registration and their clinical application. Journal of Anatomy 1998, 193(3):347-361.
[8] Viola P, Wells WM III: Alignment by maximization of mutual information. International Journal of Computer Vision 1997, 24(2):137-154.
[9] Collignon A, Maes F, Delaere D, Vandermeulen D, Suetens P, Marchal G: Automated multimodality image registration based on information theory. Proceedings of the International Conference on Information Processing in Medical Imaging (IPMI '95), 1995, 263-274.
[10] Keller Y, Averbuch A, Israeli M: Pseudopolar-based estimation of large translations, rotations, and scalings in images. IEEE Transactions on Image Processing 2005, 14(1):12-22.
[11] Wolberg G, Zokai S: Robust image registration using log-polar transform. Proceedings of the International Conference on Image Processing (ICIP '00), September 2000, Vancouver, Canada, 493-496.
[12] Reddy BS, Chatterji BN: An FFT-based technique for translation, rotation, and scale-invariant image registration. IEEE Transactions on Image Processing 1996, 5(8):1266-1271.
[13] Lemieux L, Jagoe R, Fish DR, Kitchen ND, Thomas DGT: A patient-to-computed-tomography image registration method based on digitally reconstructed radiographs. Medical Physics 1994, 21(11):1749-1760.
[14] Keller Y, Averbuch A: A projection-based extension to phase correlation image alignment. Signal Processing 2007, 87(1):124-133.
[15] Wong A, Fieguth P: Fast phase-based registration of multimodal image data. Signal Processing 2009, 89(5):724-737.
[16] Shen D, Davatzikos C: HAMMER: hierarchical attribute matching mechanism for elastic registration. IEEE Transactions on Medical Imaging 2002, 21(11):1421-1439.
[17] Giele ELW, De Priester JA, Blom JA, Den Boer JA, Van Engelshoven JMA, Hasman A, Geerlings M: Movement correction of the kidney in dynamic MRI scans using FFT phase difference movement detection. Journal of Magnetic Resonance Imaging 2001, 14(6):741-749.
[18] Gupta SN, Solaiyappan M, Beache GM, Arai AE, Foo TKF: Fast method for correcting image misregistration due to organ motion in time-series MRI data. Magnetic Resonance in Medicine 2003, 49(3):506-514.
[19] Sun Y, Jolly MP, Moura JMF: Integrated registration of dynamic renal perfusion MR images. Proceedings of the International Conference on Image Processing (ICIP '04), October 2004, Singapore, 1923-1926.
[20] Luan H, Qi F, Xue Z, Chen L, Shen D: Multimodality image registration by maximization of quantitative-qualitative measure of mutual information. Pattern Recognition 2008, 41(1):285-298.
[21] Kadir T, Brady M: Saliency, scale and image description. International Journal of Computer Vision 2001, 45(2):83-105.
[22] Itti L, Koch C, Niebur E: A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 1998, 20(11):1254-1259.
[23] Itti L, Koch C: A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research 2000, 40(10-12):1489-1506.
[24] Feng S, Xu D, Yang X: Attention-driven salient edge(s) and region(s) extraction with application to CBIR. Signal Processing 2010, 90(1):1-15.
[25] Chen HY, Leou JJ: Saliency-directed image interpolation using particle swarm optimization. Signal Processing 2009, 90(5):1676-1692.
[26] Bergholm F: Edge focussing. IEEE Transactions on Pattern Analysis and Machine Intelligence 1987, 9(6):726-741.
[27] Deriche R, Giraudon G: A computational approach for corner and vertex detection. International Journal of Computer Vision 1993, 10(2):101-124.
[28] Renninger LW, Verghese P, Coughlan J: Where to look next? Eye movements reduce local uncertainty. Journal of Vision 2007, 7(3, article 6):1-17.
[29] Greenspan H, Belongie S, Goodman R, Perona P, Rakshit S, Anderson CH: Overcomplete steerable pyramid filters and rotation invariance. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 1994, Seattle, Wash, USA, 222-228.
[30] Cannon MW, Fullenkamp SC: A model for inhibitory lateral interaction effects in perceived contrast. Vision Research 1996, 36(8):1115-1125.
[31] Belis M, Guiasu S: A quantitative-qualitative measure of information in cybernetic systems. IEEE Transactions on Information Theory 1968, 14:593-594.
[32] Cover TM, Thomas JA: Elements of Information Theory. Wiley, New York, NY, USA; 1991.
[33] Jenkinson M, Smith S: A global optimisation method for robust affine registration of brain images. Medical Image Analysis 2001, 5(2):143-156.
[34] Press WH, Flannery BP, Teukolsky SA, Vetterling WT: Numerical Recipes in C. 2nd edition. Cambridge University Press, Cambridge, UK; 1992.
[35] Thévenaz P, Unser M: Optimization of mutual information for multiresolution image registration. IEEE Transactions on Image Processing 2000, 9(12):2083-2099.
[36] The Insight Segmentation and Registration Toolkit, http://www.itk.org/
[37] Studholme C, Hill DLG, Hawkes DJ: An overlap invariant entropy measure of 3D medical image alignment. Pattern Recognition 1999, 32(1):71-86.
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.