# Rigid Registration of Renal Perfusion Images Using a Neurobiology-Based Visual Saliency Model

- Dwarikanath Mahapatra^{1}
- Ying Sun^{1}

**2010**:195640

**DOI:** 10.1155/2010/195640

© D. Mahapatra and Y. Sun. 2010

**Received:** 19 January 2010

**Accepted:** 6 July 2010

**Published:** 21 July 2010

## Abstract

General mutual information- (MI-) based registration methods treat all voxels equally. But each voxel has a different utility depending upon the task. Because of its robustness to noise, low computation time, and agreement with human fixations, the Itti-Koch visual saliency model is used to determine voxel utility of renal perfusion data. The model is able to match identical regions in spite of intensity change due to its close adherence to the center-surround property of the visual cortex. Saliency value is used as a pixel's utility measure in an MI framework for rigid registration of renal perfusion data exhibiting rapid intensity change and noise. We simulated varying degrees of rotation and translation motion under different noise levels, and a novel optimization technique was used for fast and accurate recovery of registration parameters. We also registered real patient data having rotation and translation motion. Our results show that saliency information improves registration accuracy for perfusion images and the Itti-Koch model is a better indicator of visual saliency than scale-space maps.

## 1. Introduction

Image registration is the process of aligning two or more images that may be taken at different times, from different viewpoints, or by different sensors (or modalities, in medical imaging applications). One or more floating images are registered to a reference image by estimating a transformation between them. Image registration plays a vital role in many applications such as video compression [1], video enhancement [2], scene representation [3], and medical image processing [4].

Medical image registration has acquired immense significance in automated or semiautomated medical image analysis, intervention planning, guidance, and assessment of disease progression or effects of treatment. Some of the applications have been in the areas of brain imaging [5], kidney (renal) perfusion images [6], and radiological images [7]. Over the years, rigid registration algorithms have used mutual information (MI) [8, 9], Fourier transforms [10–12], correlation-based methods [13–15] and attribute vectors [16]. For registering dynamic kidney perfusion images three approaches were tested in [17], namely, template matching, Fourier transforms, and cross correlation, and the Fourier transform-based approach was found to give the best performance. A method for correcting image misregistration due to organ motion in dynamic magnetic resonance (MR) images combines mutual correspondence between images with transform invariant features [18]. Other methods for registration of renal perfusion MR images are based on a combination of wavelet and Fourier transforms [6] and a contrast invariant similarity measure [19].

In dynamic contrast-enhanced (DCE) MRI, a contrast agent (e.g., Gd-DTPA) is injected into the bloodstream. The resulting images exhibit rapid intensity change in the organ of interest. Apart from intensity change, images from a single patient are characterized by noise and by movement of the organ due to breathing or patient motion. Registering images with such rapid intensity changes is a challenge for conventional registration algorithms. Although previous works [6, 17–19] demonstrate good results in registering renal perfusion MR images, they do not incorporate the contribution of the human visual system (HVS) in such tasks. The HVS is adept at distinguishing objects in noisy images, a challenge yet to be completely overcome by object recognition algorithms. Humans are also highly capable of matching objects and regions between a pair of images in spite of noise or intensity changes. We believe it is worthwhile to investigate whether a model of the HVS can be used to register images in the presence of intensity change. In this paper, we use a neurobiology-based HVS model for rigid registration of kidney MRI in an MI framework. As we shall see later, MI is a suitable framework for including the contribution of the HVS.

Most MI-based registration methods treat all voxels equally. But a voxel's utility or importance would vary depending upon the registration task at hand. For example, in renal perfusion MRI a voxel in the renal cortex has greater significance in registration than a voxel in the background even though they may have the same intensity. Luan et al. in [20] have defined a voxel's importance based on its saliency and used it in a quantitative-qualitative mutual information (QMI) measure for rigid registration of brain MR images. Saliency refers to the importance ascribed to a voxel by the HVS. Different computational models have been proposed to determine saliency maps of images [21, 22]. An important characteristic of the HVS is its ability to match the same landmark in images exhibiting intensity change (as in DCE images). An accurate model of the HVS should be able to imitate this property and assign similar importance (or utility) values to corresponding landmarks in a pair of images. The entropy-based saliency model used in [20], called scale-space maps, fails to achieve the desired objectives for DCE images.

Scale-space maps [21] calculate the entropy over different scales around a pixel's neighborhood and the maximum entropy at a particular scale is used to calculate the saliency value. When there is a change in intensity due to contrast enhancement the entropy (and hence saliency) value of a pixel also changes. As a result, the same landmark in two different images has different utility measures. But it is desirable that a landmark have the same utility value in different images. In contrast, the neurobiology based saliency model of [22] assigns the same importance to corresponding landmarks and has been shown to have a high correlation with human fixations [23]. Besides, it has advantages over scale-space maps in terms of robustness to noise and computational complexity. Therefore, we hypothesize that a neurobiological model of saliency would produce more accurate results than scale-space maps for rigid registration of kidney perfusion images. Saliency models have also been used for computer vision tasks like image retrieval [24] and image interpolation [25].

In this paper, we investigate the usefulness of a neurobiology-based saliency model for registering renal perfusion images. Our paper makes the following contributions. First, it investigates the effectiveness of a computational model of the HVS for image registration within the QMI framework proposed in [20]. Previously used saliency models are limited by their inaccurate correspondence with actual human fixations and their sensitivity to noise; our work differs from [20] in the choice of saliency model. Second, we perform a detailed analysis of the effectiveness of different mutual information-based similarity measures, with and without saliency information, for registering renal perfusion images. This analysis also indicates the relative effectiveness of different saliency methods. Third, we use a randomized optimization scheme that evaluates a greater number of candidate solutions, which reduces the possibility of being trapped in a local minimum and increases registration accuracy. The rest of the paper is organized as follows. In Section 2, we describe the neurobiology-based saliency model, the theoretical foundations of MI-based registration, and our optimization scheme. Sections 3 and 4 describe our experiments and results, respectively. Finally, we conclude in Section 5.

## 2. Theory

### 2.1. Saliency Model

The scale-space saliency map [21] used in [20] has the following limitations:

1. The changing intensity of perfusion images assigns different entropy, and hence different saliency, values to corresponding pixels in an image pair exhibiting intensity change. This is undesirable when matching contrast-enhanced images.
2. There is the inherent problem of choosing an appropriate scale. For every voxel, the neighborhood (scale) that maximizes the local entropy is chosen as its optimal scale, which incurs unnecessary computational cost.
3. Noise greatly affects the scale-space map and results in erroneous saliency values. Since local entropy measures the information content of a region, noise can alter its saliency value.
4. The scale-space saliency map does not truly determine what is salient to the human eye. An entropy-based approach considers only the distribution of intensity in a local neighborhood, so the information derived is restricted to a small area in the vicinity of the pixel.

The neurobiology-based model of [22] addresses these limitations:

1. An important aspect of the model is its center-surround principle, which determines how different a pixel is from its surroundings. As long as a pixel has feature values different from its surroundings, its saliency value is preserved, which makes saliency a robust feature. This is better than the entropy model, in which the intensity distribution leads to different saliency values when intensity changes due to contrast enhancement.
2. By representing the image as a Gaussian pyramid, the need to determine the appropriate scale for every voxel does not arise.
3. Inherent to the model is the process of lateral inhibition, which greatly contributes to suppressing noise in the saliency map.
4. When used to identify salient regions in a scene, the model has high correlation with actual human fixations.

The model calculates a saliency map from the intensity and edge orientation information of a given image. Saliency at a given location is determined primarily by the *contrast* between this location and its surroundings with respect to the image features. The image formed on the fovea of the eye corresponds to the central object on which a person is focusing attention, and it is clear and sharp; regions surrounding the central object have a less clear representation on the retina. To simulate this biological mechanism, an image is represented as a Gaussian pyramid comprising layers of subsampled and low-pass filtered images. The central (foveal) representation corresponds to the finer scales of the pyramid, and the surrounding regions are obtained from the coarser scales. The contrast is thus the difference between the feature maps at these scales.

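Let $c$ and $s$ denote the center and surround levels of the Gaussian pyramid, respectively. For a feature $\mathcal{F}$, the contrast map is defined as

$$\mathcal{F}(c,s) = \left| \mathcal{F}(c) \ominus \mathcal{F}(s) \right|,$$

where $\ominus$ denotes the center-surround difference (the coarser surround map is interpolated to the resolution of the center level and subtracted point by point), the center is given by level $c$, and the surround is given by level $s$ (with $s > c$) in the Gaussian pyramid. Thus, we obtain one contrast map per center-surround pair $(c,s)$ for every feature. Although the original model uses three features (color, intensity, and edge orientation), we use only intensity and edge information because our datasets are grayscale. The edge information is obtained by filtering the image with oriented Gabor filters [29] at four orientation angles ($0^\circ$, $45^\circ$, $90^\circ$, and $135^\circ$). In total, contrast maps are thus obtained for each of the four edge orientations and for intensity.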
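As an illustration, the center-surround operation can be sketched for the intensity channel with NumPy. This is a simplified sketch, not the implementation of [22]: the 5-tap binomial low-pass filter, the nearest-neighbour upsampling used for the across-scale difference, and the default levels $c \in \{2,3,4\}$ and $s = c + \delta$ with $\delta \in \{3,4\}$ (the values reported by Itti et al. for their model) are all assumptions here.

```python
import numpy as np

# 5-tap binomial kernel approximating a Gaussian (assumed; any low-pass
# filter could be used to build the pyramid).
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def blur(img: np.ndarray) -> np.ndarray:
    """Separable low-pass filter with edge replication at the borders."""
    h, w = img.shape
    padded = np.pad(img, 2, mode="edge")
    rows = sum(KERNEL[k] * padded[:, k:k + w] for k in range(5))  # filter along x
    return sum(KERNEL[k] * rows[k:k + h, :] for k in range(5))    # filter along y


def gaussian_pyramid(img: np.ndarray, levels: int) -> list:
    """Level 0 is the input; each further level is blurred and subsampled by 2."""
    pyr = [img.astype(float)]
    for _ in range(levels - 1):
        pyr.append(blur(pyr[-1])[::2, ::2])
    return pyr


def upsample(img: np.ndarray, factor: int, shape: tuple) -> np.ndarray:
    """Nearest-neighbour upsampling by `factor`, cropped to `shape`."""
    up = np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)
    return up[:shape[0], :shape[1]]


def center_surround(feature: np.ndarray, centers=(2, 3, 4), deltas=(3, 4)) -> dict:
    """Contrast maps |F(c) - F(s)| for every center c and surround s = c + delta."""
    pyr = gaussian_pyramid(feature, max(centers) + max(deltas) + 1)
    maps = {}
    for c in centers:
        for d in deltas:
            s = c + d
            surround = upsample(pyr[s], 2 ** d, pyr[c].shape)  # bring s to c's grid
            maps[(c, s)] = np.abs(pyr[c] - surround)
    return maps
```

With these defaults a 2D feature map yields six contrast maps, one per $(c, s)$ pair. Adding a constant to the whole image leaves them unchanged, because the center and surround shift by the same amount; note that this covers a uniform offset only, not the spatially varying enhancement seen in DCE data.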

#### 2.1.1. Saliency Map in 3D

The large gap between slices of the original volume does not provide sufficient information along the $z$-axis to extend each step of the saliency computation to 3D. Intensity maps can be obtained directly from the data, but calculating orientation maps is challenging because 3D oriented Gaussian filters are computationally intensive. Therefore, for each slice of the 3D volume we calculate a 2D saliency map, which is subsequently used for registration.
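The slice-wise strategy can be sketched as follows. `saliency_2d` stands for any 2D saliency routine; the `toy_saliency` placeholder below is hypothetical, and dividing each slice's map by its maximum (so that the most salient voxel has utility 1) is an assumed normalization rather than a detail stated in the text.

```python
from typing import Callable

import numpy as np


def slicewise_saliency(volume: np.ndarray,
                       saliency_2d: Callable[[np.ndarray], np.ndarray],
                       axis: int = 2) -> np.ndarray:
    """Compute a 2D saliency map for every slice along `axis` of a 3D volume.

    Each slice's map is divided by its maximum (assumed normalization).
    """
    slices = np.moveaxis(volume, axis, 0)
    out = np.zeros(slices.shape, dtype=float)
    for k, sl in enumerate(slices):
        sal = np.asarray(saliency_2d(sl), dtype=float)
        peak = sal.max()
        if peak > 0:                      # all-zero maps stay zero
            out[k] = sal / peak
    return np.moveaxis(out, 0, axis)


# Hypothetical stand-in for a real 2D saliency model:
# absolute deviation of each pixel from the slice mean.
def toy_saliency(sl: np.ndarray) -> np.ndarray:
    return np.abs(sl - sl.mean())
```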

### 2.2. Rigid Registration

Rigid registration aligns a floating image (volume) with a reference image (volume) by correcting any relative motion between them. For simplicity, we describe the registration framework in terms of 2D images, but our experiments were performed on 3D volumes. Let $F$ be the floating image (volume for 3D data) that is to be registered to a reference image $R$. For 3D volumes there are 6 degrees of freedom (DOF), namely translation along and rotation about each of the $x$-, $y$-, and $z$-axes, while 2D images have 3 DOF. The similarity between two images is quantified by a similarity measure whose choice depends on the type of images being registered. The translation and rotation parameters that maximize the similarity measure are used to register the floating image.
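For concreteness, the six rigid-body parameters can be assembled into a 4×4 homogeneous matrix. The sketch assumes angles in degrees and the composition order $R_z R_y R_x$ followed by translation; the text does not specify a convention, so both choices are assumptions.

```python
import numpy as np


def rigid_matrix(tx, ty, tz, rx_deg, ry_deg, rz_deg):
    """4x4 homogeneous rigid transform: rotation R = Rz @ Ry @ Rx, then translation."""
    ax, ay, az = np.deg2rad([rx_deg, ry_deg, rz_deg])
    cx, sx = np.cos(ax), np.sin(ax)
    cy, sy = np.cos(ay), np.sin(ay)
    cz, sz = np.cos(az), np.sin(az)
    rot_x = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    rot_y = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rot_z = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = rot_z @ rot_y @ rot_x
    T[:3, 3] = [tx, ty, tz]
    return T
```

In 2D the same construction reduces to three parameters: two translations and one rotation angle.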

To determine the effectiveness of the neurobiology model of saliency, we used it in a QMI-based cost function for rigid registration. This cost function combines saliency information (or utility measure) with the MI of the two images to evaluate the degree of similarity between them. A joint saliency (or joint utility) histogram, similar to a joint intensity histogram, is used to determine the cooccurrence of saliency values in the saliency maps of the images under consideration. We follow the QMI definition and formulation of [20].

#### 2.2.1. Quantitative-Qualitative Measure of Mutual Information

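For a reference image $R$ and a floating image $F$ with intensity values $r$ and $f$, the mutual information is

$$MI(R,F) = \sum_{r,f} p(r,f) \log \frac{p(r,f)}{p(r)\,p(f)},$$

which is the relative entropy between the joint distribution $p(r,f)$ and the product of the marginal distributions $p(r)$ and $p(f)$. The QMI measure of [20] weights each term of this sum by a utility $u(r,f)$ of the intensity pair:

$$QMI(R,F) = \sum_{r,f} u(r,f)\, p(r,f) \log \frac{p(r,f)}{p(r)\,p(f)},$$

where the utility $u(r,f)$ can be any nonnegative real number.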
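Both MI and QMI can be estimated from a joint intensity histogram. The NumPy sketch below assumes 32 bins per axis (an arbitrary choice) and takes the utility as an array indexed by the same intensity bins $(r, f)$; with a utility of 1 everywhere, QMI reduces to MI.

```python
import numpy as np


def mi_and_qmi(ref, flt, utility=None, bins=32):
    """Return (MI, QMI) of two equally sized images from a joint histogram.

    `utility` is a (bins, bins) array u(r, f); None means u = 1 everywhere.
    """
    joint, _, _ = np.histogram2d(ref.ravel(), flt.ravel(), bins=bins)
    p = joint / joint.sum()                 # joint distribution p(r, f)
    p_r = p.sum(axis=1, keepdims=True)      # marginal p(r), shape (bins, 1)
    p_f = p.sum(axis=0, keepdims=True)      # marginal p(f), shape (1, bins)
    nz = p > 0                              # 0 * log 0 is taken as 0
    terms = np.zeros_like(p)
    terms[nz] = p[nz] * np.log(p[nz] / (p_r * p_f)[nz])
    u = np.ones_like(p) if utility is None else np.asarray(utility, dtype=float)
    return terms.sum(), (u * terms).sum()
```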

### 2.3. Saliency-Based Registration

The joint utility of an intensity pair $(r,f)$ is defined as

$$u(r,f) = \sum_{R(\mathbf{x}) = r,\; F(\mathbf{y}) = f} S_R(\mathbf{x})\, S_F(\mathbf{y}),$$

where $S_R$ and $S_F$ are the saliency maps of $R$ and $F$, the summation is over all pairs of corresponding voxels with intensity values $(r,f)$, and $\mathbf{x}$ and $\mathbf{y}$ are the voxels under consideration. We use the *multiplication* operator to capture the joint occurrence of utility values. For example, to calculate the joint utility of the intensity pair (128, 58), we find all pairs of corresponding points $(\mathbf{x},\mathbf{y})$ such that $\mathbf{x}$ has intensity 128 in image $R$ and $\mathbf{y}$ has intensity 58 in image $F$. The joint utility is obtained by multiplying the saliency values of each such pair of points and summing over all pairs. A normalized saliency map is used so that the most salient regions in the two images have an equal importance of 1. However, the joint utility can exceed 1 because it reflects the joint importance of intensity pairs, not just individual utility values.
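The joint utility histogram can be accumulated as a weighted 2D histogram. This sketch assumes that the floating image and its saliency map have already been resampled onto the reference grid, so that each voxel of $R$ corresponds to the voxel at the same array index in $F$. It bins intensities with the default per-axis ranges of `np.histogram2d`, so its output lines up bin for bin with a joint intensity histogram built the same way.

```python
import numpy as np


def joint_utility(ref, flt, sal_ref, sal_flt, bins=32):
    """u(r, f): sum of S_R * S_F over voxels whose intensities fall in bin pair (r, f).

    All four arrays must have the same shape; the saliency maps are assumed
    to be normalized so that their most salient voxels equal 1.
    """
    weights = (np.asarray(sal_ref, float) * np.asarray(sal_flt, float)).ravel()
    u, _, _ = np.histogram2d(ref.ravel(), flt.ravel(), bins=bins, weights=weights)
    return u
```

With unit saliency everywhere, $u(r,f)$ is simply the joint count of the intensity pair, which shows why the joint utility is not bounded by 1.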

### 2.4. Optimization

1. The original image is subsampled to three coarser levels, giving a four-level pyramid whose finest level is the original image.
2. At the coarsest level, we perform an exhaustive search individually for each DOF, and the optimal parameters are used to transform the image. The search ranges are given in voxels for translation along the $x$-, $y$-, and $z$-axes ($t_x$, $t_y$, $t_z$) and in degrees for rotation about the $x$-, $y$-, and $z$-axes ($\theta_x$, $\theta_y$, $\theta_z$).
3. The registration parameters are interpolated to the next finer level, where they act as starting points. The DOFs are individually optimized in two passes: first the rotation parameters, and then $t_x$, $t_y$, and $t_z$ with search ranges of 5, 5, and 2 voxels. The optimal parameters are used to transform the volume, and a second pass with the same sequence of steps is performed. The volume is transformed only if the parameters from the second pass indicate a better match than those from the first pass.
4. The resulting parameters are interpolated to the next finer level and optimized there.
5. The parameters from the second-finest level are interpolated to the finest (original) level, and an exhaustive search is carried out for the rotations (±3 degrees), $t_x$ and $t_y$ (±5 voxels), and $t_z$ (±2 voxels).
6. The final parameters are used to obtain the registered image.

This optimization scheme is robust because the DOF to be optimized is picked at random and the entire scheme is repeated.
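A simplified sketch of this randomized, coordinate-wise search is given below. It does not reproduce the exact schedule of the steps listed above: the cost function is a caller-supplied placeholder (in practice, QMI evaluated on a resampled volume), search offsets are integer steps, translations are assumed to double from one pyramid level to the next (factor-2 subsampling), and the per-DOF ranges used in the test (5, 5, and 2 voxels; 3 degrees) merely echo those of step 5.

```python
import numpy as np


def coordinate_search(cost, start, ranges, passes=2, rng=None):
    """Maximize `cost` by exhaustive 1D searches over integer offsets,
    one DOF at a time, visiting the DOFs in a random order each pass.
    A pass is accepted only if it improves on the best value so far."""
    rng = np.random.default_rng() if rng is None else rng
    best = np.asarray(start, dtype=float)
    best_val = cost(best)
    for _ in range(passes):
        cand = best.copy()
        for d in rng.permutation(len(cand)):
            offsets = np.arange(-ranges[d], ranges[d] + 1)   # always includes 0
            values = []
            for o in offsets:
                trial = cand.copy()
                trial[d] += o
                values.append(cost(trial))
            cand[d] += offsets[int(np.argmax(values))]
        cand_val = cost(cand)
        if cand_val > best_val:
            best, best_val = cand, cand_val
    return best, best_val


def coarse_to_fine(costs, ranges, rng=None):
    """`costs[0]` is the coarsest level; parameters are (tx, ty, tz, rx, ry, rz).
    Translations are doubled when moving to the next finer level."""
    params = np.zeros(6)
    for level, cost in enumerate(costs):
        if level > 0:
            params[:3] *= 2
        params, _ = coordinate_search(cost, params, ranges, rng=rng)
    return params
```

Because each 1D search includes the zero offset, a pass can never lower the cost, so the acceptance test simply discards passes that bring no improvement.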

#### 2.4.1. Results for Derivative-Based Optimizer

Powell's optimization routine, which we adopt, is well suited to cost functions whose derivatives are unavailable or prohibitively expensive to compute. It evaluates candidate solutions along straight lines in the parameter space, that is, along linear combinations of parameters. Each such line search requires the minimum to be bracketed before the optimization can start [34]. As a result, many additional criterion evaluations are needed, which is inefficient in a multiresolution strategy. Thévenaz et al. [35] propose an optimization method based on the derivative of the similarity measure that makes better use of a multiresolution optimization setup.

where $S_R$ and $S_F$ are the saliency values of the reference and floating images, and $u(r,f)$ denotes the joint utility of the cooccurring intensities $r$ and $f$. The utility measure is treated as a constant, although it depends on the cooccurring intensity pairs of $R$ and $F$. In practice, the utility for each candidate transformation $\mu$ is obtained by transforming the original saliency map of $F$ according to $\mu$, which incurs a minor additional computational cost. Parzen windowing is not applied to the utility because the joint utility histogram is not a distribution of saliency values but the sum of the products of saliency values of cooccurring intensity pairs.

To compute the QMI value at different transformations, we also calculate the second derivative of QMI with respect to the transformation parameters, that is, its Hessian. We refer the reader to [35] for details on calculating the Hessian and the derivative of the joint probability distribution with respect to the transformation parameters used in (17). Because the utility is treated as a constant, it does not change the essence of how the derivatives of the cost function are calculated, as shown in (17).

Using a derivative-based optimizer makes the method quite sensitive to the initial search parameters, and a poor choice may even lead to nonconvergence. Therefore, a multiresolution framework is used to obtain good candidate parameters in the first step. A four-level image pyramid is created, with the fourth level denoting the coarsest resolution. The parameters from the coarsest level are used to find the optimal parameters at finer levels using the derivative of the mutual information. This significantly reduces computation time compared with Powell's method, in which a greater number of candidate parameters must be evaluated.

Details of the derivation of the different equations can be found in [35]. The optimization routine from the Insight Segmentation and Registration Toolkit (ITK) [36] was used. Each image was decomposed into four resolutions (as in the scheme using Powell's method) and registered with each of the similarity measures using Thévenaz's optimization framework. To calculate the joint utility measure, the saliency maps $S_R$ of $R$ and $S_F$ of $F$ are calculated, and for every candidate parameter, $S_F$ is transformed to obtain a new map $S_F'$. $S_R$ and $S_F'$ are then used to calculate the joint utility measure at every step.

Although the computation time is significantly lower than with Powell's method, the registration results are sensitive to the initial conditions. If the optimal parameters determined at the coarsest image resolution are far from the actual transformation parameters, it is highly unlikely that Thévenaz's scheme will converge to the right solution. This problem is particularly acute when no multiresolution strategy is used, in which case Powell's method is markedly superior. In a multiresolution setup with good initial conditions, Thévenaz's method converges faster than Powell's method, with significantly fewer evaluations and similar accuracy. Thévenaz's method can stop at any time and optimizes all parameters simultaneously from the first criterion evaluation, which reduces the number of criterion evaluations.

A clear advantage of Powell's method is its robustness. This motivates a hybrid strategy that uses Powell's method at the coarsest stage and Thévenaz's derivative-based method at the finer stages for faster convergence. The registration accuracy obtained with such an approach is consistently close to the values reported in Table 2. Without Powell's method at the coarsest stage, the registration error for many of the volume pairs is greater than with it.

## 3. Experiments

### 3.1. Subjects

The volumes were obtained from healthy volunteers ( women and men, years) and patients ( women and men, years) with renal insufficiency manifested by serum creatinine mg/dl ( mg/dl). Written informed consent was obtained from all subjects. All the datasets were used for testing. Every dataset included both kidneys, and the results reported for each dataset are the average errors over the tests on both kidneys.

### 3.2. MRI Acquisition Protocol

Dynamic MRI was performed on a T system (Avanto; Siemens, Erlangen, Germany) with a maximum slew rate of T/m/s, maximum gradient strength of mT/m, and a torso phased-array coil. D -weighted spoiled gradient-echo imaging was performed in the oblique coronal orientation to include the abdominal aorta and both kidneys. The following parameters were used: ms, ms, flip , , , Hz/voxel, volume acquisition s. The original 5-mm coronal partitions were interpolated to mm slices.

Five unenhanced acquisitions were performed during a single breath-hold. A -ml bolus of Gd-DTPA (Magnevist; Berlex Laboratories, Wayne, NJ, USA) was then injected, followed by ml of saline, both at ml/s. Over min, 3D volumes were acquired using a variable sampling schedule: sets acquired at s intervals, followed by sets at intervals of s, followed by at s intervals, and ending with sets at one-minute intervals. An attempt was made to acquire the first sets within a single breath-hold. Before each subsequent acquisition, the patients were instructed to suspend respiration at end-expiration. Oxygen via nasal cannula was routinely offered to the patients before the exam to facilitate breath-holding. For image processing, all 3D volumes ( acquired before and after contrast agent injection) were evaluated.

### 3.3. Registration Procedure

Two volumes of interest (VOIs), each encompassing one kidney, were selected from each volume. We test the effectiveness of our algorithm by registering the entire VOI sequence of each patient to a reference VOI. Each kidney had a different reference VOI, and for different cases, different pre- or postcontrast VOIs were chosen as the reference. Saliency maps were calculated for each slice of a VOI, and saliency information from these maps was used to define the utility measure of each voxel. For every reference-floating VOI pair, the floating VOI is transformed according to the scheme outlined in Section 2.4, and for each candidate set of transformation parameters, the QMI-based similarity measure (6) is calculated. The candidate parameters that give the maximum QMI value are used to obtain the final transformation. We evaluate the performance of our algorithm against registration ground truth provided by a clinical expert.

To check the robustness and effectiveness of the proposed similarity measure, we examined how it varies with the transformation parameters. For this purpose, rotation and translation motion was simulated on the datasets, and the value of the similarity measure was calculated at different candidate transformation parameters in an attempt to recover the applied motion. The resulting characteristics indicate the suitability of the similarity measure for registering DCE images. The robustness of different similarity measures was determined by first misaligning the images by different known amounts of translation and rotation. Three similarity measures were used in the tests, namely, normalized mutual information (NMI) [37], the QMI of [20] with scale-space saliency, and our proposed QMI with neurobiology-based saliency. NMI is a popular similarity measure for registering multimodal images, that is, images of the same organ acquired with different modalities such as MR and CT, and its performance helps us gauge the effectiveness of our method.

## 4. Results

We present results for different experiments that show the importance of using saliency in registering DCE images of the kidney. The datasets comprise sequences of 3D volumes, each consisting of multiple slices. Manual registration parameters provided by experts were available for each dataset, facilitating performance comparison. First, we show that saliency is suitable for registering contrast-enhanced images. Then we show properties of the different similarity measures with respect to registration. These sets of results are similar to those presented in [20]. They highlight the fact that although the QMI of [20] was a good measure for registering brain MR images, our saliency-based QMI performs better than it in registering renal perfusion images. This is reflected in the properties of the different similarity measures. Finally, we present registration results on real patient datasets and compare the relative performance of the different similarity measures with respect to the manual registration parameters.

For simulated motion, registration was deemed to be accurate if .

### 4.1. Saliency Maps for Pre- and Postcontrast Enhanced Images

### 4.2. Registration Functions

A similarity measure for two images should have the following desirable properties: (a) it should be smooth and convex with respect to the transformation parameters; (b) the global optimum of the registration function should be close to the correct transformation that aligns the two images perfectly; (c) the capture range should be as large as possible; and (d) the number of local maxima should be as small as possible. We can determine the registration function of a similarity measure by calculating its value under different transformations.
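To make this concrete, the registration function of NMI can be traced over a range of candidate translations. The sketch uses the common definition $NMI(A,B) = (H(A) + H(B))/H(A,B)$, assumed here to match [37]. It shifts along one axis only and uses `np.roll`, which wraps pixels around the border; both are simplifications, and 32 histogram bins is an arbitrary choice.

```python
import numpy as np


def entropy(p: np.ndarray) -> float:
    """Shannon entropy (nats) of a probability array; zero entries are skipped."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


def nmi(a: np.ndarray, b: np.ndarray, bins: int = 32) -> float:
    """NMI(A, B) = (H(A) + H(B)) / H(A, B) from a joint histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = joint / joint.sum()
    return (entropy(p.sum(axis=1)) + entropy(p.sum(axis=0))) / entropy(p.ravel())


def registration_function(ref, flt, measure, shifts):
    """Value of `measure` after shifting `flt` along x by each candidate offset."""
    return np.array([measure(ref, np.roll(flt, int(s), axis=1)) for s in shifts])
```

For a measure with the properties listed above, the profile should peak sharply at zero relative error, which here is the shift that undoes the simulated misalignment, with no competing maxima.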

It is to be kept in mind that the profile for the different similarity measures in Figure 3 is for . For the performance of and is comparable, that is, the maximum of the similarity measures is mostly at zero relative error. When , shows a superior performance demonstrating the efficacy of a neurobiology based saliency model. Similarly, for , performance of is comparable to the other two saliency measures but degrades once . The corresponding threshold for is . The accuracy (from (22)) in recovering the correct transformation was for , for , and for .

In most cases, NMI was unable to detect the right transformation between a pair of pre- and postcontrast images. Figure 4(a) shows two maxima for NMI at nonzero error, in addition to the profile being noisy. Such characteristics are undesirable for registration. For the QMI of [20], although there are no multiple maxima, the maximum is at nonzero relative error. Even though the QMI of [20] performs better than NMI owing to its use of saliency, our method outperforms both of them.

The accuracy rate for registering DCE images was for , for , and for . The low registration accuracy of NMI makes it imperative that we investigate the reason behind it, which we do with the help of an example.

We want to register the central patch of the image in Figure 5(a), which plays the role of a region of interest and whose values are highlighted in bold. The intensity values of Figure 5(c) reflect only contrast enhancement, without any motion. For an ideal registration, the central patch of Figure 5(a) should give the maximum value of NMI (from [37]) for the central patch of Figure 5(c). The value in this case is . However, the maximum value is obtained for the image patch shown in bold in Figure 5(c), which corresponds to a displacement of one pixel to the left and one pixel down. Although there is no translation motion, the maximum value of NMI is obtained for parameters corresponding to such motion. The intensity change in this image patch is quite similar to what we observe in DCE images of the kidney. Consequently, for these images the NMI maximum is often obtained at nonzero relative error, and more than one maximum is observed in many cases. Thus, NMI produces a significantly high number of misregistrations, which contributes to its high error rate.

From these observations, we infer that NMI performs well when each intensity in the first image is mapped to a distinct intensity in the second image. If two intensity values in the first image are mapped to the same intensity value in the second image, or vice versa, NMI leads to poor matching. Because of contrast enhancement, it is very common for more than one intensity to be mapped to a single intensity. Consequently, NMI-based registration is prone to error, as reflected in the error measures.
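The effect of a many-to-one intensity mapping on NMI can be reproduced with a toy example. The sketch uses integer-valued images with four intensity levels (an arbitrary choice) and no misalignment at all. A one-to-one remapping of intensities leaves NMI at its maximum of 2, whereas merging two intensities lowers the value reached at the correct alignment, so the correct alignment no longer attains the maximum possible score.

```python
import numpy as np


def entropy_of_labels(labels: np.ndarray) -> float:
    """Shannon entropy (nats) of the empirical distribution of integer labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))


def nmi_discrete(a: np.ndarray, b: np.ndarray) -> float:
    """NMI = (H(A) + H(B)) / H(A, B) for nonnegative integer-valued images."""
    a, b = a.ravel().astype(np.int64), b.ravel().astype(np.int64)
    pair_codes = a * (int(b.max()) + 1) + b      # one code per (a, b) pair
    return (entropy_of_labels(a) + entropy_of_labels(b)) / entropy_of_labels(pair_codes)


rng = np.random.default_rng(3)
a = rng.integers(0, 4, size=(64, 64))            # "precontrast" labels 0..3
one_to_one = np.array([3, 0, 2, 1])[a]            # each level keeps a distinct value
many_to_one = np.array([0, 0, 1, 2])[a]           # levels 0 and 1 become identical
print(nmi_discrete(a, one_to_one))               # equals 2 up to floating-point rounding
print(nmi_discrete(a, many_to_one))              # strictly less than 2
```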

### 4.3. Robustness of Registration

A robust registration algorithm should be able to recover the true transformation between two images even if the initial misalignment between them is very large. We evaluate the robustness of NMI, the QMI of [20], and our saliency-based QMI under various amounts of initial misalignment between two kidney MR images. Four sets of tests were performed in which the initial misalignment rotation angles were randomly picked from four different rotation ranges (in degrees). Similarly, misalignment was simulated for translational motion in the $x$, $y$, and $z$ directions over different ranges (in mm). For each misalignment range, we performed registrations between different pairs of images. Zero-mean Gaussian noise was added to the images.
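The simulation protocol can be sketched as follows. The uniform sampling of magnitudes within a range and the random sign of each component are illustrative assumptions, and the ranges passed in the test are hypothetical (the ranges actually used are those of the four test sets described above). The noise is zero-mean Gaussian with a given variance, assumed to be applied to intensities normalized to [0, 1].

```python
import numpy as np


def sample_misalignment(rot_range, trans_range, rng):
    """Random rigid misalignment (tx, ty, tz, rx, ry, rz).

    Magnitudes are drawn uniformly from trans_range (mm) and rot_range
    (degrees); each component gets a random sign.
    """
    magnitudes = np.concatenate([rng.uniform(*trans_range, size=3),
                                 rng.uniform(*rot_range, size=3)])
    signs = rng.choice([-1.0, 1.0], size=6)
    return signs * magnitudes


def add_gaussian_noise(img, variance, rng):
    """Add zero-mean Gaussian noise with the given variance."""
    return img + rng.normal(0.0, np.sqrt(variance), size=img.shape)
```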

The average translation error along the axes was ( ) mm for , ( ) mm for , and ( ) mm for . The average rotation errors were ( ) degrees for , ( ) degrees for and ( ) degrees for . The maximum errors for simulated motion was mm and for , mm and for , and mm and for .

| Noise variance | Avg. registration error (mm), NMI | Avg. registration error (mm), QMI [20] | Avg. registration error (mm), proposed | Accuracy (%), NMI | Accuracy (%), QMI [20] | Accuracy (%), proposed |
|---|---|---|---|---|---|---|
| 0 | (5.3,5.2,0.5) | (1.9,1.7,0.2) | (1.2,1.1,0.2) | 68.1 | 88.9 | 98.8 |
| 0.01 | (5.3,5.2,0.6) | (1.7,1.6,0.3) | (1.3,1.3,0.2) | 67.2 | 88.1 | 98.3 |
| 0.04 | (5.5,5.5,0.8) | (1.8,1.8,0.4) | (1.4,1.4,0.3) | 61.3 | 83.2 | 95.3 |
| 0.06 | (5.8,5.9,1.0) | (1.9,1.9,0.6) | (1.6,1.5,0.4) | 47.1 | 78.2 | 92.1 |
| 0.085 | (6.2,6.3,1.1) | (2.2,2.2,0.7) | (1.7,1.7,0.50) | 41.2 | 62.3 | 89.1 |
| 0.1 | (6.4,6.5,1.3) | (2.4,2.4,0.9) | (1.9,1.9,0.8) | 40.1 | 57.4 | 75.6 |

Average translation errors for rigid registration, given as (x, y, z) components. NMI is normalized mutual information; QMI [20] is the measure in [20] using scale-space maps; Proposed is our approach using the neurobiology-based saliency model. All values are in units of mm.

| Dataset | NMI | QMI [20] | Proposed |
|---|---|---|---|
| Dataset1 | (4.8,4.3,0.5) | (2.0,1.7,0.3) | (1.2,1.3,0.2) |
| Dataset2 | (5.1,5.7,0.4) | (1.3,1.4,0.4) | (1.2,1.2,0.2) |
| Dataset3 | (5.0,4.7,0.6) | (1.7,1.7,0.3) | (1.3,1.2,0.3) |
| Dataset4 | (5.2,5.0,0.6) | (1.5,1.6,0.4) | (1.3,1.2,0.2) |
| Dataset5 | (4.7,4.8,0.7) | (1.7,1.7,0.4) | (1.2,1.3,0.2) |
| Dataset6 | (5.1,4.9,0.5) | (1.52,1.4,0.3) | (1.1,1.0,0.2) |
| Dataset7 | (5.2,5.9,0.4) | (1.4,1.5,0.2) | (1.3,1.4,0.1) |
| Dataset8 | (6.5,6.1,0.4) | (1.7,1.6,0.2) | (1.2,1.0,0.1) |
| Dataset9 | (4.9,4.2,0.5) | (1.7,1.5,0.3) | (1.2,1.1,0.1) |
| Dataset10 | (5.4,5.4,0.5) | (1.4,1.3,0.3) | (1.3,1.2,0.1) |
| Average Error | (5.2,5.1,0.5) | (1.6,1.5,0.3) | (1.2,1.2,0.2) |

### 4.4. Registration Accuracy for Real Patient Data

Average rotation errors for rigid registration, given as rotations about the (x, y, z) axes. NMI is normalized mutual information; QMI [20] is the measure in [20] using scale-space maps; Proposed is our approach using the neurobiology-based saliency model. All values are in units of degrees.

| Dataset | NMI | QMI [20] | Proposed |
|---|---|---|---|
| Dataset1 | (0,0,2.75) | (0,0,0.56) | (0,0,0.43) |
| Dataset2 | (0,0,2.71) | (0,0,0.50) | (0,0,0.44) |
| Dataset3 | (0,0,2.67) | (0,0,0.55) | (0,0,0.41) |
| Dataset4 | (0,0,2.66) | (0,0,0.53) | (0,0,0.39) |
| Dataset5 | (0,0,2.72) | (0,0,0.52) | (0,0,0.40) |
| Dataset6 | (0,0,4.81) | (0,0,0.53) | (0,0,0.32) |
| Dataset7 | (0,0,4.23) | (0,0,0.65) | (0,0,0.44) |
| Dataset8 | (0,0,3.98) | (0,0,0.75) | (0,0,0.29) |
| Dataset9 | (0,0,3.12) | (0,0,0.54) | (0,0,0.31) |
| Dataset10 | (0,0,3.33) | (0,0,0.58) | (0,0,0.24) |
| Average Error | (0,0,3.31) | (0,0,0.57) | (0,0,0.36) |

For all datasets, NMI shows higher errors than the QMI of [20] and our method. This can be attributed to errors in registering pre- and postcontrast image pairs. For NMI, the maximum error was as high as mm for translation and degrees for rotation. Such a large error is undesirable, especially in medical image registration. For the maximum error was mm and degrees and the corresponding values for were mm and degrees, respectively. Moreover, the average error values for were higher than that of and . For translation along the $z$-axis, there was no significant difference between the error values of the different similarity measures, as there is hardly any motion along the $z$-axis. For rotation, the error values about the $x$- and $y$-axes are all 0 because there is no rotation about these axes. Rotational motion is observed only about the $z$-axis, with the average errors for NMI much greater than those for the QMI of [20] and our method.

### 4.5. Computation Time

One difference between our method and the one proposed in [20] is the choice of saliency model. While we use the saliency model of [22], Luan et al. use the scale-space method of [21]. The source code for both methods is available from the websites of the respective authors. For a kidney image of dimension , the average time taken to calculate the scale-space map and identify salient regions was seconds, while the neurobiology-based saliency map could be computed in seconds on average. This difference in the time needed to compute saliency maps is not significant; when a large number of images is registered with our method, the saving in computation time amounts to only a few seconds.
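Because both methods feed a saliency map into the same MI-based similarity measure, swapping the saliency model changes only the per-voxel utility values. The pure-Python sketch below illustrates this with a saliency-weighted MI in the spirit of the quantitative-qualitative measure of [20, 31]. It is not the exact formulation of [20] or of this paper: the function name `saliency_weighted_mi`, the fixed-width binning, and the rule that a joint-histogram bin's utility is the mean product of the two voxels' saliency values are assumptions made for illustration.

```python
import math
from collections import defaultdict


def saliency_weighted_mi(ref, flt, sal_ref, sal_flt, bins=8, max_val=256):
    """Hypothetical saliency-weighted mutual information (illustrative sketch).

    ref, flt         : equal-length flat lists of intensities in [0, max_val)
    sal_ref, sal_flt : per-voxel saliency (utility) values for each image
    Assumption: the utility w(i, j) of a joint-histogram bin is the mean of
    sal_ref * sal_flt over the voxel pairs that fall into that bin.
    """
    n = len(ref)
    counts = defaultdict(float)   # voxel-pair counts per (i, j) bin
    utility = defaultdict(float)  # summed saliency products per (i, j) bin
    for a, b, sa, sb in zip(ref, flt, sal_ref, sal_flt):
        i, j = a * bins // max_val, b * bins // max_val
        counts[(i, j)] += 1.0
        utility[(i, j)] += sa * sb

    # Marginal probabilities from the joint histogram.
    p_ref, p_flt = defaultdict(float), defaultdict(float)
    for (i, j), c in counts.items():
        p_ref[i] += c / n
        p_flt[j] += c / n

    # Utility-weighted sum of p(i,j) * log(p(i,j) / (p(i) p(j))) over occupied bins.
    total = 0.0
    for (i, j), c in counts.items():
        p_ij = c / n
        w_ij = utility[(i, j)] / c
        total += w_ij * p_ij * math.log(p_ij / (p_ref[i] * p_flt[j]))
    return total


if __name__ == "__main__":
    ref = [0, 0, 200, 200]
    ones = [1.0] * 4
    print(saliency_weighted_mi(ref, ref, ones, ones))  # equals math.log(2) here
```

With uniform saliency (all ones) every bin weight is 1 and the measure reduces to ordinary MI; plugging in scale-space saliency [21] or Itti-Koch saliency [22] changes only `sal_ref` and `sal_flt`.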

Another difference from the method in [20] is an optimization scheme that incorporates a certain degree of randomness, thus reducing the chances of being trapped in a local minimum. This modification requires a marginally greater number of steps, leading to a slight increase in computation time. While the average time taken by our method (including the calculation of saliency maps) is s for registering a pair of volumes, the corresponding average time for the method in [20] is s. With Thévenaz's method, the computation time reduces to s using and s for .

## 5. Discussion and Conclusion

In this work, we have investigated a neurobiological model of visual saliency and its use in registering perfusion images. The motivation was to determine whether the ability of the human visual system (HVS) to recognize and match images in the presence of noise and contrast enhancement can be simulated by a computational model. We register MR kidney perfusion volumes because they exhibit rapid intensity change and the acquired datasets also contain a significant amount of noise.

The neurobiology-based saliency model is used because it produces very similar saliency maps for a pair of images with intensity change between them and thus facilitates registration in the face of contrast enhancement. We carried out a comparative study of the effectiveness of different saliency models for registering renal perfusion images and found the neurobiology-based model to perform better than scale-space maps.

Several factors contribute to the superior performance of the neurobiological model of saliency. The scale-space method used in [20] to obtain saliency information has certain inherent shortcomings. First, intensity change causes different saliency values to be assigned to corresponding voxels in an image pair, which is undesirable for registration. Second, an appropriate scale (neighborhood) must be chosen for calculating the local entropy of each voxel. The scale that gives the maximum entropy is chosen as the best scale, so the entropy has to be evaluated at every candidate scale for every voxel, which makes the procedure computationally intensive. Third, since it is an entropy-based method, noise can greatly affect the entropy value, leading to erroneous results. Fourth, a scale-space saliency map of an image does not truly represent what is salient to the human eye. In the neurobiology-based model, the center-surround approach assigns the same saliency value to corresponding pixels in an image pair, and a Gaussian pyramid representation of the image eliminates the need to determine the optimal scale for each voxel. An important part of the model is lateral inhibition, which suppresses noise and yields a saliency map with distinctly salient regions. Lastly, the neurobiology-based model has been used to predict human fixations in a scene, and there is a high degree of correlation between the predicted and actual fixations.
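The center-surround mechanism can be sketched in a few lines. The pure-Python example below builds a small Gaussian pyramid (a separable [1, 4, 6, 4, 1]/16 blur followed by 2x decimation) and computes an across-scale difference |center - surround| with nearest-neighbour upsampling of the coarser level. The kernel, the level pair (c, s), and the helper names are assumptions for illustration; [22] uses dyadic pyramids with center levels c ∈ {2, 3, 4} and surround levels s = c + δ, δ ∈ {3, 4}, on much larger images. Because the blur kernel sums to one, a uniform additive intensity change cancels exactly in the difference. Real contrast enhancement in perfusion data varies spatially, so this demonstrates only the idealized case.

```python
def blur_downsample(img):
    """One Gaussian-pyramid step: separable [1, 4, 6, 4, 1]/16 blur, then 2x decimation."""
    k = [1, 4, 6, 4, 1]
    h, w = len(img), len(img[0])

    def clamp(v, size):  # replicate border pixels
        return min(max(v, 0), size - 1)

    rows = [[sum(k[t] * img[y][clamp(x + t - 2, w)] for t in range(5)) / 16.0
             for x in range(w)] for y in range(h)]
    cols = [[sum(k[t] * rows[clamp(y + t - 2, h)][x] for t in range(5)) / 16.0
             for x in range(w)] for y in range(h)]
    return [row[::2] for row in cols[::2]]


def gaussian_pyramid(img, levels):
    """Level 0 is the input; each further level halves the resolution."""
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(blur_downsample(pyr[-1]))
    return pyr


def center_surround(pyr, c, s):
    """|center - surround| at the resolution of level c (requires s > c).

    The coarse surround level is upsampled by nearest-neighbour lookup.
    """
    center, surround = pyr[c], pyr[s]
    f = 2 ** (s - c)
    hs, ws = len(surround), len(surround[0])
    return [[abs(center[y][x] - surround[min(y // f, hs - 1)][min(x // f, ws - 1)])
             for x in range(len(center[0]))]
            for y in range(len(center))]


if __name__ == "__main__":
    img = [[(x * y) % 7 for x in range(8)] for y in range(8)]
    brighter = [[v + 50 for v in row] for row in img]  # uniform intensity change
    a = center_surround(gaussian_pyramid(img, 3), 0, 2)
    b = center_surround(gaussian_pyramid(brighter, 3), 0, 2)
    print(max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)))
```

The same construction also explains why no per-voxel scale search is needed: every scale is already present in the pyramid, and each center-surround map is a single pass over one level pair.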

Our optimization technique also contributes to the improved performance of our method. Instead of following a fixed pattern for optimizing the DOFs, we introduce a degree of randomness into the entire optimization scheme, which is based on Powell's method. A -level multiresolution approach was adopted, in which candidate transformation parameters for the different DOFs were first calculated at the coarsest level and the solution was propagated to finer levels. The optimization routine was repeated at the finer levels to obtain the final transformation. The sequence in which the DOFs are optimized is random. With this approach, the optimization scheme avoids being trapped in local optima and reaches the global optimum, as determined by an exhaustive search, in most of the experiments. This approach also performs better than the optimization scheme outlined in [33]. We also use a derivative-based optimizer (Thévenaz's method) to determine the optimal registration parameters. If the starting point of the search is close to the actual optimum, this method gives accurate results in significantly less time. Using Powell's method for the search at the coarsest level, followed by Thévenaz's method at the finer levels, gives registration accuracy close to that obtained using Powell's method at all levels, but in significantly less computation time.
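The random ordering of DOFs within a coarse-to-fine search can be sketched as follows. For brevity this uses plain cyclic coordinate line searches (golden-section search on each DOF in turn), not Powell's full conjugate-direction method, and a toy quadratic cost with a known minimum instead of a saliency-weighted MI. The function names, the search half-widths, and `levels=3` are hypothetical placeholders (the number of levels used in the paper is not given here); in a real registration `cost_at_level` would evaluate the negative similarity on the images of the given pyramid level.

```python
import random


def golden_section(f, lo, hi, tol=1e-4):
    """Minimise a unimodal 1-D function f on [lo, hi] by golden-section search."""
    g = (5 ** 0.5 - 1) / 2  # inverse golden ratio, about 0.618
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:          # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:                # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2


def random_order_descent(cost, x0, half_widths, sweeps=5, seed=0):
    """Line-search each DOF in turn, reshuffling the DOF order on every sweep."""
    rng = random.Random(seed)
    x = list(x0)
    order = list(range(len(x)))
    for _ in range(sweeps):
        rng.shuffle(order)   # random, not fixed, DOF sequence
        for k in order:
            def along(t):
                trial = list(x)
                trial[k] = t
                return cost(trial)
            x[k] = golden_section(along, x[k] - half_widths[k], x[k] + half_widths[k])
    return x


def coarse_to_fine(cost_at_level, x0, half_widths, levels=3):
    """Optimise at the coarsest level first, then refine with narrower search ranges."""
    x = list(x0)
    for level in reversed(range(levels)):
        widths = [h * 2 ** level for h in half_widths]
        x = random_order_descent(lambda p: cost_at_level(p, level), x, widths)
    return x


if __name__ == "__main__":
    # Toy stand-in for negative similarity over (tx, ty, tz, rx, ry, rz).
    target = [3.0, -2.0, 1.0, 0.5, -0.5, 0.2]

    def toy_cost(p, level):
        quad = sum((pi - ti) ** 2 for pi, ti in zip(p, target))
        return quad + 0.5 * (p[0] - target[0]) * (p[1] - target[1])

    print(coarse_to_fine(toy_cost, [0.0] * 6, [2.0] * 6))
```

Shuffling the visiting order on each sweep is the only change relative to a fixed-order scheme. On this convex toy cost it simply converges; any benefit in escaping local optima would appear only on non-convex similarity surfaces.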

Thus, we conclude that the neurobiological model of saliency provides a fairly accurate approximation of the working of the HVS based on bottom-up cues alone. It is robust to varying degrees of noise and simulated motion. The original model in [22] uses color, intensity, and edge orientation as features in determining the saliency map, but since our datasets are in gray scale, we use only intensity and edge orientation information. The findings of our experiments provide a basis for investigating how saliency can be used in more challenging registration tasks and in other computer vision applications such as tracking.
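For gray-scale data the color channel simply drops out of the final combination step of [22], which averages normalized conspicuity maps. The sketch below implements the map normalization operator N(·) described in [22] (rescale to a fixed range [0, M], then multiply by (M - m̄)², where m̄ is the mean of the local maxima other than the global one) and combines intensity and orientation maps with equal weights. The equal weighting, the 8-neighbour strict-maximum test, the handling of flat maps, and the function names are assumptions; computing the conspicuity maps themselves is omitted.

```python
def local_maxima(m):
    """Values of pixels strictly greater than all of their 8-connected neighbours."""
    h, w = len(m), len(m[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            nbrs = [m[yy][xx]
                    for yy in range(max(0, y - 1), min(h, y + 2))
                    for xx in range(max(0, x - 1), min(w, x + 2))
                    if (yy, xx) != (y, x)]
            if all(m[y][x] > v for v in nbrs):
                peaks.append(m[y][x])
    return peaks


def itti_normalize(m, M=1.0):
    """Sketch of the map normalisation N(.) described by Itti et al. [22]."""
    lo = min(min(row) for row in m)
    hi = max(max(row) for row in m)
    if hi == lo:  # assumption: a flat map contributes nothing
        return [[0.0 for _ in row] for row in m]
    scaled = [[M * ((v - lo) / (hi - lo)) for v in row] for row in m]
    peaks = sorted(local_maxima(scaled), reverse=True)
    # Exclude the global maximum (value M) and average the remaining local maxima.
    others = peaks[1:] if peaks and peaks[0] == M else peaks
    m_bar = sum(others) / len(others) if others else 0.0
    gain = (M - m_bar) ** 2
    return [[gain * v for v in row] for row in scaled]


def grayscale_saliency(intensity_map, orientation_map):
    """Equal-weight combination of N(intensity) and N(orientation); no colour term."""
    ni = itti_normalize(intensity_map)
    no = itti_normalize(orientation_map)
    return [[(a + b) / 2.0 for a, b in zip(ra, rb)] for ra, rb in zip(ni, no)]
```

N(·) leaves a map with one dominant peak unchanged and suppresses a map with many comparable peaks, which is how this operator favors distinctly salient regions over noise-like activity.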

## Declarations

### Acknowledgments

The authors would like to thank Dr. Vivian S. Lee, Professor of Radiology, Physiology, and Neuroscience, Vice-Dean for Science, Senior Vice-President, and Chief Scientific Officer, New York University Medical Center, for providing the datasets. This work was supported by NUS Grant R-263-000-470-112.


## References

1. Dufaux F, Konrad J: **Efficient, robust, and fast global motion estimation for video coding.** *IEEE Transactions on Image Processing* 2000, **9**(3):497-501. doi:10.1109/83.826785
2. Irani M, Peleg S: **Motion analysis for image enhancement: resolution, occlusion, and transparency.** *Journal of Visual Communication and Image Representation* 1993, **4**(4):324-335. doi:10.1006/jvci.1993.1030
3. Irani M, Anandan P, Hsu S: **Mosaic based representations of video sequences and their applications.** *Proceedings of the 5th International Conference on Computer Vision, June 1995*, 605-611.
4. Hill DLG, Batchelor PG, Holden M, Hawkes DJ: **Medical image registration.** *Physics in Medicine and Biology* 2001, **46**(3):R1-R45. doi:10.1088/0031-9155/46/3/201
5. Lao Z, Shen D, Jawad A, Karacali B, Liu D, Melhem ER, Bryan RN, Davatzikos C: **Automated segmentation of white matter lesions in 3D brain MR images, using multivariate pattern classification.** *Proceedings of the 3rd IEEE International Symposium on Biomedical Imaging, April 2006*, 307-310.
6. Song T, Lee VS, Rusinek H, Kaur M, Laine AF: **Automatic 4-D registration in dynamic MR renography based on over-complete dyadic wavelet and Fourier transforms.** *Proceedings of the 8th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI '05), October 2005, Palm Springs, Calif, USA, Lecture Notes in Computer Science* **3750:**205-213.
7. Hawkes DJ: **Algorithms for radiological image registration and their clinical application.** *Journal of Anatomy* 1998, **193**(3):347-361. doi:10.1046/j.1469-7580.1998.19330347.x
8. Viola P, Wells WM III: **Alignment by maximization of mutual information.** *International Journal of Computer Vision* 1997, **24**(2):137-154. doi:10.1023/A:1007958904918
9. Collignon A, Maes F, Delaere D, Vandermeulen D, Suetens P, Marchal G: **Automated multimodality image registration based on information theory.** *Proceedings of the International Conference on Information Processing in Medical Imaging (IPMI '95), 1995*, 263-274.
10. Keller Y, Averbuch A, Israeli M: **Pseudopolar-based estimation of large translations, rotations, and scalings in images.** *IEEE Transactions on Image Processing* 2005, **14**(1):12-22.
11. Wolberg G, Zokai S: **Robust image registration using log-polar transform.** *Proceedings of the International Conference on Image Processing (ICIP '00), September 2000, Vancouver, Canada*, 493-496.
12. Reddy BS, Chatterji BN: **An FFT-based technique for translation, rotation, and scale-invariant image registration.** *IEEE Transactions on Image Processing* 1996, **5**(8):1266-1271. doi:10.1109/83.506761
13. Lemieux L, Jagoe R, Fish DR, Kitchen ND, Thomas DGT: **A patient-to-computed-tomography image registration method based on digitally reconstructed radiographs.** *Medical Physics* 1994, **21**(11):1749-1760. doi:10.1118/1.597276
14. Keller Y, Averbuch A: **A projection-based extension to phase correlation image alignment.** *Signal Processing* 2007, **87**(1):124-133. doi:10.1016/j.sigpro.2006.04.013
15. Wong A, Fieguth P: **Fast phase-based registration of multimodal image data.** *Signal Processing* 2009, **89**(5):724-737. doi:10.1016/j.sigpro.2008.10.028
16. Shen D, Davatzikos C: **HAMMER: hierarchical attribute matching mechanism for elastic registration.** *IEEE Transactions on Medical Imaging* 2002, **21**(11):1421-1439. doi:10.1109/TMI.2002.803111
17. Giele ELW, De Priester JA, Blom JA, Den Boer JA, Van Engelshoven JMA, Hasman A, Geerlings M: **Movement correction of the kidney in dynamic MRI scans using FFT phase difference movement detection.** *Journal of Magnetic Resonance Imaging* 2001, **14**(6):741-749. doi:10.1002/jmri.10020
18. Gupta SN, Solaiyappan M, Beache GM, Arai AE, Foo TKF: **Fast method for correcting image misregistration due to organ motion in time-series MRI data.** *Magnetic Resonance in Medicine* 2003, **49**(3):506-514. doi:10.1002/mrm.10394
19. Sun Y, Jolly M-P, Moura JMF: **Integrated registration of dynamic renal perfusion MR images.** *Proceedings of the International Conference on Image Processing (ICIP '04), October 2004, Singapore*, 1923-1926.
20. Luan H, Qi F, Xue Z, Chen L, Shen D: **Multimodality image registration by maximization of quantitative-qualitative measure of mutual information.** *Pattern Recognition* 2008, **41**(1):285-298. doi:10.1016/j.patcog.2007.04.002
21. Kadir T, Brady M: **Saliency, scale and image description.** *International Journal of Computer Vision* 2001, **45**(2):83-105. doi:10.1023/A:1012460413855
22. Itti L, Koch C, Niebur E: **A model of saliency-based visual attention for rapid scene analysis.** *IEEE Transactions on Pattern Analysis and Machine Intelligence* 1998, **20**(11):1254-1259. doi:10.1109/34.730558
23. Itti L, Koch C: **A saliency-based search mechanism for overt and covert shifts of visual attention.** *Vision Research* 2000, **40**(10–12):1489-1506.
24. Feng S, Xu D, Yang X: **Attention-driven salient edge(s) and region(s) extraction with application to CBIR.** *Signal Processing* 2010, **90**(1):1-15. doi:10.1016/j.sigpro.2009.05.017
25. Chen H-Y, Leou J-J: **Saliency-directed image interpolation using particle swarm optimization.** *Signal Processing* 2009, **90**(5):1676-1692.
26. Bergholm F: **Edge focussing.** *IEEE Transactions on Pattern Analysis and Machine Intelligence* 1987, **9**(6):726-741.
27. Deriche R, Giraudon G: **A computational approach for corner and vertex detection.** *International Journal of Computer Vision* 1993, **10**(2):101-124. doi:10.1007/BF01420733
28. Renninger LW, Verghese P, Coughlan J: **Where to look next? Eye movements reduce local uncertainty.** *Journal of Vision* 2007, **7**(3, article 6):1-17.
29. Greenspan H, Belongie S, Goodman R, Perona P, Rakshit S, Anderson CH: **Overcomplete steerable pyramid filters and rotation invariance.** *Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 1994, Seattle, Wash, USA*, 222-228.
30. Cannon MW, Fullenkamp SC: **A model for inhibitory lateral interaction effects in perceived contrast.** *Vision Research* 1996, **36**(8):1115-1125. doi:10.1016/0042-6989(95)00180-8
31. Belis M, Guiasu S: **A quantitative-qualitative measure of information in cybernetic systems.** *IEEE Transactions on Information Theory* 1968, **14:**593-594. doi:10.1109/TIT.1968.1054185
32. Cover TM, Thomas JA: *Elements of Information Theory*. Wiley, New York, NY, USA; 1991.
33. Jenkinson M, Smith S: **A global optimisation method for robust affine registration of brain images.** *Medical Image Analysis* 2001, **5**(2):143-156. doi:10.1016/S1361-8415(01)00036-6
34. Press WH, Flannery BP, Teukolsky SA, Vetterling WT: *Numerical Recipes in C*. 2nd edition. Cambridge University Press, Cambridge, UK; 1992.
35. Thévenaz P, Unser M: **Optimization of mutual information for multiresolution image registration.** *IEEE Transactions on Image Processing* 2000, **9**(12):2083-2099. doi:10.1109/83.887976
36. The Insight Segmentation and Registration Toolkit. http://www.itk.org/
37. Studholme C, Hill DLG, Hawkes DJ: **An overlap invariant entropy measure of 3D medical image alignment.** *Pattern Recognition* 1999, **32**(1):71-86. doi:10.1016/S0031-3203(98)00091-0

## Copyright

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.