Automatic image-based segmentation of the heart from CT scans

The segmentation of the heart is usually demanded in the clinical practice for computing functional parameters in patients, such as ejection fraction, cardiac output, peak ejection rate, or filling rate. Because of the time required, the manual delineation is typically limited to the left ventricle at the end-diastolic and end-systolic phases, which is insufficient for computing some of these parameters (e.g., peak ejection rate or filling rate). Common computer-aided (semi-)automated approaches for the segmentation task are computationally demanding, and an initialization step is frequently needed. This work is intended to address the aforementioned problems by providing an image-driven method for the accurate segmentation of the heart from computed tomography scans. The resulting algorithm is fast and fully automatic (even the region of interest is delimited without human intervention). The proposed methodology relies on image processing and analysis techniques (such as multi-thresholding based on statistical local and global parameters, mathematical morphology, and image filtering) and also on prior knowledge about the cardiac structures involved. Segmentation results are validated through the comparison with manually delineated ground truth, both qualitatively (no noticeable errors found after visual inspection) and quantitatively (mass overlapping over 90%).


Introduction
Cardiovascular disease is the leading direct or contributing cause of non-accidental deaths in the world [1]. As a consequence, the current research is particularly focused on its early diagnosis and therapy. An example of this effort is the delineation of the left ventricle (LV) of the heart, which turns out to be an important tool in the assessment of cardiac functional parameters such as ejection fraction, myocardium mass, or stroke volume. Fully automatic and reliable segmentation methods are desirable for the quantitative and massive analysis of these clinical parameters, because the traditional practice of manual delineation of the heart's ventricles is subjective, prone to errors, tedious, hardly reproducible, and very time-consuming -typically between 1 and 2 h per cardiac study, thus exhausting the radiologist's capacity and resources. Even though the most relevant medical information can be extracted from the left heart, a segmentation of the whole heart (and eventually also the great vessels) can be useful to extract a model of the organ before surgery or to facilitate diagnosis [2,3].
Compared with other imaging modalities (such as ultrasound and magnetic resonance imaging), cardiac computed tomography (CT) can provide detailed anatomical information about the heart chambers, great vessels, and coronary arteries [4,5]. Actually, CT is often preferred by diagnosticians since it provides more accurate anatomical information about the visualized structures, thanks to its higher signal-to-noise ratio and better spatial resolution. Although computed tomography was at one time almost absent in cardiovascular examinations, recent technological advances in X-ray tubes, detectors, and reconstruction algorithms, along with the use of retrospectively gated spiral scanning, have opened the doors to new diagnostic opportunities [6], enabling the non-invasive derivation of the aforementioned functional parameters [7,8]. Therefore, computed tomography becomes an important imaging modality for diagnosing cardiovascular diseases [9].
In the recent literature, one can find many papers which tackle the (semi-)automated segmentation of the heart from CT or MRI scans. These works deal with different strategies for approaching the segmentation task, including image-driven algorithms [10][11][12][13], probabilistic atlases [14,15], fuzzy clustering [16], deformable models [17][18][19], neural networks [20], active appearance models [21,22], anatomical-based landmarks [23], or level set and its variations [24,25]. A comprehensive review of techniques commonly used in cardiac image segmentation can be found in Kang et al. [5]. Nevertheless, many published methods have various disadvantages for routine clinical practice: they are either computationally demanding [6,14,16,22], potentially unstable for subjects with pathology [25,26], limited to the left ventricle [11,24,25,27], require additional images to be acquired [28,29], or need complex shape and/or gray-level appearance models constructed (or 'learned') from many manually segmented images -which is labor intensive and of limited use due to both anatomical and image contrast inconsistencies [14,22,[26][27][28]. Moreover, most prior work has been devoted to segmenting cardiac data given a reasonable initialization [25,30] or an accurate manual segmentation of a subset of the image data [31,32]. For full automation, and with the purpose of eliminating the inter-and intra-observer variability, initialization should also be automatic.
In this work, we propose an efficient image-driven method for the automatic segmentation of the heart from CT scans. The methodology relies on image processing techniques such as multi-thresholding based on statistical local and global features, mathematical morphology, or image filtering, but it also exploits the available prior knowledge about the cardiac structures involved. The development of such a segmentation system comprises two major tasks: initially, a pre-processing stage in which the region of interest (ROI) is delimited and the statistical parameters are computed; and next, the segmentation procedure itself, which makes use of the data obtained during the previous stage. Our fully automatic approach improves on the state of the art through both computation speed and simplicity of implementation.
The paper is organized as follows: in Section 2, the proposed methodology is presented; along subsections 2.1 and 2.2, the pre-processing and segmentation stages, respectively, are detailed; Subsection 2.3 deals with the extraction of the left ventricle from the outcome of the previous segmentation. Next, the validity of our approach is tested through the segmentation of different cardiac CT scans and the subsequent comparison of the results with manually delineated ground truth. Finally, the conclusions close the paper.

Proposed methodology
As commented before, the segmentation algorithm is based on the information available about the cardiac structures and tissues. This knowledge allows us to separate the region of interest from the rest of the image (such as bones of the rib cage) and to obtain the statistically derived thresholds which are needed in order to define the binary masks that will be used along the procedure. In the following subsection is explained how to calculate these thresholds, which depend on the distribution of the image histogram. An important feature of the proposed algorithm is that it uses the same type of thresholds for all the slices of the scan, not an ad hoc set for each image.

Pre-processing stage
In this stage, all the variables needed to perform the segmentation (statistical parameters, position of the spine, etc.) are determined, and a preliminary cleaning of the images (which basically selects the ROI) is performed.

Statistical parameters
Let us consider the volume which results of the CT scan as a scalar function f(x,y,z), where x =1,…,N, y =1,…,M, and z =1,…,P, being N, M, and P the number of discrete elements (voxels) in each of the spatial dimensions. For each of the axial slices (i.e., for a fixed value k of the z coordinate) the following parameters are computed: a) Mean value of the intensity of the pixels, μ(k): This value allows us to automatically separate the air and the background from the rest of the image. Indeed, the histogram of images which result from a standard CT scan always present five to seven well-delimited distributions of gray levels. The lowest intensity levels are related to the air and the highest to the bones. Consequently, the first (i.e., leftmost) and second peaks of the histogram correspond to the image background and the air in the lungs, respectively. This can be seen in Figure 1, where the image is thresholded with an intensity value laying in the valley which separates the two leftmost maxima from the remaining peaks (five, in this example). This value is the parameter μ(k). b) Mean intensity value of the pixels with an intensity level higher than μ(k) in the kth slice, μ sup (k): where R k is the number of pixels (X i ,Y i ) in the kth slice which satisfy f(X i ,Y i ,k) > μ(k). This value is used when computing the global mean μ global , which is the parameter that the algorithm requires in the segmentation stage in order to separate cardiac structures from the rest of the image. Moreover, it is also used for obtaining a binary mask which determines the position of the spine in each image. The gray level represented by the parameter μ sup (k) belongs to the interval of intensities in which deoxygenated blood and bone marrow are included. Hence, masks obtained from this parameter would contain the outer layer of bones and tissues where oxygenated blood flows, whose intensity levels are higher than the value of μ sup (k). However, as shown in Figure 2, this parameter is not a suitable threshold for segmenting cardiac structures, since the resulting mask does not include some tissues where deoxygenated blood flows, such as right atrium and right ventricle. Therefore, in order to accomplish our goal, a lower threshold is needed. More precisely, the required threshold has to be located in the interval of gray levels which corresponds to muscular tissues.
c) Standard deviation of intensities of pixels in the kth slice with an intensity level higher than μ(k), σ(k): The threshold μ sup (k) + σ(k) allows us to obtain a binary mask which is used later in the segmentation stage in order to locate the descending aorta in all the slices of the volumetric scan. The resulting gray level is useful for separating the outer layer of the bones and the structures where oxygenated blood flows from the rest of the image, as shown in Figure 3. d) Mean of the parameter μ sup (k) minus the standard deviation of μ sup (k) (in the following global mean), μ global : This is a global parameter, since it depends on the whole CT scan. It belongs to the interval of intensities which characterize muscular tissues. The reason for not using the mean of μ sup (k) as a threshold is that this value is located on the edge of two distributions, one representing muscular tissues and the other representing deoxygenated blood, thus occasionally causing an overfitting to the  structures of interest and consequently yielding the appearance of holes in the mask. In order to avoid this problem, a less restrictive threshold, i.e., μ global , is used instead. Figure 4 shows the difference of thresholding with μ global and μ sup (k). Anyway, the resulting binary mask is yet inadequate for separating the structures of interest, since pulmonary veins and part of the bones are still present after the thresholding. This is addressed further in Section 2.2.

Position of the spine and the aorta
Once the statistical parameters are computed, a later step (which will be performed in the segmentation stage) is to remove the spine from the dataset. For doing so, we exploit the fact that both the spine and the descending aorta are present in all the slices of the (axial) scan. Firstly, P binary masks are obtained by thresholding each CT slice with its corresponding parameter μ sup (k). If the area which is common to all these masks is computed (e.g., by means of a logical AND), the resulting pixels with a value of 1 certainly belong to either the spine or the aorta. More precisely, the common object with the highest number of pixels should belong to the spine. Nevertheless, it is possible that the pixels which belong to the spine are nonconnected, and as a result, the object with the highest number of pixels actually represents the aorta, which would be falsely labeled as spine. In order to avoid such an error, a morphological dilation with a horizontal structuring element is previously performed, as shown in Figure 5b. The object of highest area after the dilation is used as the mask for selecting the spine in all the slices.
During the process of removing the spine, a portion of the descending aorta can also be incorrectly deleted (e.g., if it overlaps with the mask computed through  the dilation of the common area). Therefore it becomes necessary to previously locate the aorta in order to restore it after the deletion procedure. With this purpose, we first compute the common area to all the superimposed masks which are obtained by thresholding each slice with its corresponding value μ sup (k) + σ(k). As explained in the previous subsection, the threshold μ sup (k) + σ(k) allows us to select the structures where oxygenated blood flows: aorta and left atrium and ventricle. Among these structures, the only one which is common to all slices is the descending aorta. As shown in Figure 5e, the resulting image exclusively contains pixels belonging to the aorta, which will be used to select and restore the latter in the segmentation stage. It should be noted that the logical AND (Figure 5e) would likely result in an empty mask in cases of severe scoliosis or tortuous aorta. In order to prevent such a problem, the algorithm includes a rigid registration stage, which finds the relative displacement (in pixels) between each binary mask and the following one. The P masks are then correctly  aligned (i.e., shifted an integer number of pixels in the xand/or y-axis) prior to the computation of the logical AND.

Automatic selection of the region of interest
This procedure determines, through the analysis of the columns of each image (considered as a matrix of size N × M), which regions have to be removed. For each image, M one-dimensional profiles (i.e.; M arrays of N elements, corresponding to the M columns of the slice) are obtained from the binary mask computed by thresholding with μ(k); as commented before, this parameter is suitable for separating the air and the background from the rest of the image, as shown in Figure 1. Additionally, all the objects, but that with the highest number of pixels, are removed after the thresholding, as shown in Figure 6c.
Each profile (i.e., each column of the binary mask) consists in a number of 'pulses' of amplitude 1 (the number of pulses may vary from none to more than one), as shown in Figure 6d. These pulses represent the pixels with a value of 1 in the corresponding column of the binary mask. The proposed algorithm, which automatically selects the ROI depending on the number and width of the pulses which appear in each one-dimensional profile, is summarized in the following pseudo-code: 1. DO initialize the mean width: w mean =0.1*N 2. DO initialize the maximum width to be removed: w max =0.3*N 3. FOR j =1:M DO compute the jth one-dimensional profile IF width w j of the leftmost pulse of the jth profile satisfies w j < w max (i.e., the corresponding pixels belong to the rib cage) THEN update the mean width w mean with the value w j and remove (i.e., set to 0) the upmost w j pixels with a value of 1 in the jth column of the binary mask ELSE remove the upmost w mean pixels with a value of 1 in the jth column of the binary mask (i.e., remove only the pixels which belong to the rib cage, not the ones which belong to the heart) 4. IF after the processing there is more than one object in the resulting mask, select the largest one and discard the rest.
An example of the results obtained with this procedure can be seen in Figure 6.

Segmentation stage
In this stage, the segmentation itself is performed, using for this purpose the data collected through the previous subsection: the local and global statistical parameters (which will serve as thresholds), some pixels which belong to the spine and some pixels which belong to the descending aorta, and the particular region of interest which will be processed in each slice of the scan. In the following, the sequential steps of the proposed segmentation algorithm (whose flowchart is shown in Figure 7) are detailed.

Location of the aorta
This procedure consists of two tasks. Firstly, each one of the P slices of the scan is thresholded with its corresponding value μ sup (k) + σ(k). Next, the objects which appear in the resulting binary mask are labeled; the object which contains the pixels extracted in the process described in Subsection 2.1.2 is the descending aorta in the kth image. Figure 8 illustrates this procedure. The reason for locating the aorta is twofold: it is the only object of interest in the slices with too much liver (i.e., slices in which the liver takes up a large area), as shown Figure 8d; additionally, since there exists the possibility of deleting part or even the totality of the aorta during the removal of the spine (as explained in Subsection 2.1.2), it becomes necessary to know the position of this artery in order to restore it at the end of the following procedure.

Deletion of the spine
This process consists of four steps. First, the P slices of the scan are thresholded with their corresponding values μ sup (k), thus allowing us to isolate bones and tissues where oxygenated blood flows from the rest of the image. At this point, the objects of the resulting binary mask are labeled, and the spine is then selected as the object which contains the pixels obtained by the process described in Subsection 2.1.2. Next, the binary mask defined by the spine is dilated with a horizontal structuring element, and the outcome is used as a mask for separating cardiac structures from the posterior part of the chest wall (since the process described in Subsection 2.1.3 does not remove the lower part of the image). Finally, the descending aorta is added, and the object in which it is contained is selected as the resulting mask. Figure 9 illustrates this procedure.

Computation of the final mask
In order to segment the structures of interest (i.e., ventricles, atria, aorta, and vena cava vein), a threshold belonging to the interval of intensities which represent muscular tissues is needed. As explained in Subsection 2.1.1, this value is the parameter μ global . Obviously, the use of μ global as a threshold results in a binary mask which contains all the aforementioned structures, since the gray level of the cardiac muscles is lower than the gray level of the blood (either oxygenated or not). The bone marrow, which also has an intensity level higher than μ global , does not appear in this final mask (shown in Figure 10b) because of the cleaning process previously performed (i.e., selection of the ROI and deletion of the spine).

Post-processing of the final mask
As can be appreciated in Figure 10b, the outcome of the previous step still shows slight imperfections. Therefore, a post-processing of the binary mask is required. First, objects with a size lower than the minimum area a min (chosen as a min ≤ min{N,M}, which has the value of 500 pixels for all CT scans considered in this paper) are removed; the size of the objects can be easily determined after a labeling and pixel counting procedure. Next, objects with a size similar to that of the structures of interest but which do not represent cardiac tissues are also removed. For doing so, we exploit the fact that these undesirable objects are local, i.e., they only appear in a narrow range of slices in the z axis. For the kth image, the algorithm computes the common area between the 2 × r +1 binary masks from k − r to k + r, r being the axial range (a value of 5% the number of slices P performs well in all experiments); these masks are the ones obtained through the application of the threshold μ(k). Unless the computed common area is greater than 30% of its actual area (i.e., 30% of the number of pixels with a value of 1 in the kth slice), an object is removed from the mask. Lastly, a morphological closing by reconstruction is carried out in order to fill the tiny holes that may appear in the final mask. Figure 10c,d,e,f displays the result of this post-processing stage.

Left heart segmentation
As already commented in Section 1, the analysis of the LV is of great importance, since this structure supplies the oxygenated blood to distant tissues through the aorta. This subsection illustrates how the left heart (i.e., left ventricle and left atrium) and the aorta can be extracted from the outcome of the methodology presented in subsections 2.1 and 2.2. After the pre-processing and segmentation stages, the resulting images show a quasi-bimodal histogram (i.e., a histogram which consists in two main clusters of gray levels, corresponding to oxygenated and non-oxygenated blood), as shown in Figure 11c. This feature allows us to precisely segment the left heart by means of the algorithm Isodata [33], which provides an optimal result with a low computational cost if the two clusters of gray levels are nearly Gaussian distributions (an assumption which is true for virtually all CT scans). The particularization of the Isodata algorithm to our scenario is summarized in the following pseudo-code: 1. DO compute the initial threshold t 1 as the mean gray level of the segmented slice 2. DO compute μ 1 and μ 2 as the mean gray level of each of the two classes obtained after thresholding the segmented slice with the threshold t 1  3. DO compute the new threshold t 2 as the mean value of μ 1 and μ 2 : t 2 = (μ 1 + μ 2 )/2 4. IF t 1 and t 2 differ less than 1% THEN go to 5 ELSE t 1 = t 2 , go to 2 5. RETURN t 2 Once the left heart is separated from the right heart, the resulting mask has to be post-processed as explained in Subsection 2.2.4 (i.e., small objects are removed, contours are smoothed, and holes are filled). The outcome of this procedure is shown in Figure 11.

Results
Following the methodology described above, the segmentation algorithm introduced in this paper was applied to 32 clinical exams from randomly selected adult patients (source: Hospital Universitario Virgen de la Arrixaca -Murcia, Spain). The datasets were acquired during multiple breath holds as a stack of 2D + time grayscale axial slices, using two different CT scanners (Siemens Sensation 64 and Toshiba Aquilion). The imaging protocols are heterogeneous with diverse capture ranges and resolutions. A volume may contain 75 to 190 slices, while the size of each slice is the same with 512 × 512 pixels. The resolution inside a slice is isotropic and varies from 0.488 to 0.781 mm for different volumes (therefore, the FOV varies from 250 × 250 mm to 400 × 400 mm). The slice thickness (i.e., the distance between neighboring slices) is larger than the in-slice resolution and varies from 0.75 to 3 mm for different volumes. All the data is in DICOM 3.0 format. The experiments were carried out on a PC with Intel Core 2 Duo (2 × 2.4 GHz), 4 GByte of RAM, and the computations were performed under MATLAB 7.6 (R2008a). The mean running time for fully segmenting the cardiac structures varies from 23.1 s (512 × 512 × 75 voxels) to 110.9 s (512 × 512 × 190 voxels). The mean running time for segmenting only the left heart varies from 5.6 s (512 × 512 × 75 voxels) to 25.1 s (512 × 512 × 190 voxels). It should be noted that all these times could be significantly improved through an optimized implementation of the algorithms in C/C++.
The resulting contours were visually inspected by experienced cardiologists from Hospital Universitario Virgen de la Arrixaca (in the following, HUVA). According to their evaluation, our automatic approach generated acceptable results to clinicians. Noticeable errors were not found. Figure 12 presents some segmentation outputs from our method. More precisely, the outcome of the segmentation of the left heart is shown: left atrium (LA), left ventricle (LV), aorta (Ao), and descending aorta (DAo). Three-dimensional reconstructions of the full heart and the left heart, obtained from the corresponding segmentation (as explained in sections 2.2 and 2.3, respectively), are also displayed.
A quantitative validation was also performed. In this sense, a typically used performance metric is the correlation ratio (please refer to [34] for its mathematical definition), which is equivalent to a measure of mass overlapping between the segmentation results and the ground truth. In our case, the ground truth consists in a collection of contours manually delineated by an expert from HUVA. The mean correlation ratio (CR) was 94.42%, where a value of 100% means a perfect match. This value descends down to 87.64% if we consider the whole CT scan, i.e., if we include the slices with too much liver (such as e.g., Figure 12f, in which the expert did not delineate the cardiac structure labeled as LV, but only the descending aorta). The maximum computed CR value was 99.81%, and the minimum value was 46.95% (the latter corresponding to a slice in which the liver was present). Thus, the correlation ratio reveals a good agreement between the automatic and the manual segmentations. Another similarity measure which is broadly used when dealing with contours of segmented objects is the maximal surface distance (refer to [35] for a mathematical definition). The mean value of this measure was 2.36 mm (with a minimum of 1.11 mm and a maximum of 6.12 mm), where 0 mm would mean a perfect match of the compared contours.
Finally, an assessment of the left heart's volume-time curves was carried out. The temporal variation of the volume of the left heart (left atrium and left ventricle) was obtained for both the output of our method and the manually delineated ground truth. As can be appreciated in the example shown in Figure 13, the estimated volumes were very close to the ground truth with a mean error of 1.22% and a standard deviation of 0.68%. It should be noted that in all cases, the ground truth volumes were greater of equal than the computed volumes due to the fact that the output of our method was tightly adjusted to the boundary of the cardiac structures, while the contours delineated by the expert followed more loosely their overall shape.

Conclusions
We have developed a comprehensive image-driven segmentation methodology to segment the cardiac structures (or only the left heart) from CT scans by using a processing pipeline of multi-thresholding, image cleaning, mathematical morphology, and image filtering techniques. The algorithm we propose is simple; hence, it is easy to implement and validate. All the contours are delineated automatically, without any initialization or user interaction. Testing results on the data randomly selected from clinical exams demonstrated that our approach can be computed significantly faster than other automated techniques (especially if compared with model-based approaches). This makes it feasible to conveniently calculate online the left heart's volume for all the imaged cardiac phases (not only end-diastolic and end-systolic), which in turn enables the computation of additional quantitative clinical parameters such as peak ejection rate and filling rate. Moreover, this allows for the automatic identification of the imaged time points of the end-diastole and end-systole (which correspond to the maximum and minimum left heart's volume among all time points).
The complete cardiac segmentation methodology performed well on the validation set of 32 clinical datasets acquired on two different CT scanners from two manufacturers. Its accuracy is comparable to other approaches recently published. Additionally, visual inspection by experts showed that the proposed algorithm is overall robust and succeeds in segmenting the heart up to minor local corrections.
A limitation of our method is that it only provides a segmentation of the left heart's blood pool volume (i.e., endocardium). While this is sufficient for computing most of the common clinical quantitative parameters for cardiac function, a segmentation of the left heart's epicardium would provide additional clinical information.
Further directions of our research include the porting of the presented methodology to other modalities, such as cardiac cine magnetic resonance imaging (cMRI).