As mentioned before, the segmentation algorithm is based on the information available about the cardiac structures and tissues. This knowledge allows us to separate the region of interest from the rest of the image (e.g., from the bones of the rib cage) and to obtain the statistically derived thresholds needed to define the binary masks used throughout the procedure. The following subsection explains how to calculate these thresholds, which depend on the distribution of the image histogram. An important feature of the proposed algorithm is that it uses the same type of thresholds for all the slices of the scan, rather than an ad hoc set for each image.
2.1 Pre-processing stage
In this stage, all the variables needed to perform the segmentation (statistical parameters, position of the spine, etc.) are determined, and a preliminary cleaning of the images, which essentially consists of selecting the region of interest (ROI), is performed.
2.1.1 Statistical parameters
Let us consider the volume resulting from the CT scan as a scalar function f(x,y,z), where x = 1,…,N, y = 1,…,M, and z = 1,…,P, with N, M, and P being the number of discrete elements (voxels) along each spatial dimension. For each axial slice (i.e., for a fixed value k of the z coordinate), the following parameters are computed:
a) Mean value of the intensity of the pixels, μ(k):

$$\mu(k) = \frac{1}{NM}\sum_{x=1}^{N}\sum_{y=1}^{M} f(x,y,k) \tag{1}$$
This value allows us to automatically separate the air and the background from the rest of the image. Indeed, the histogram of an image resulting from a standard CT scan always presents five to seven well-delimited distributions of gray levels. The lowest intensity levels correspond to air and the highest to bone. Consequently, the first (i.e., leftmost) and second peaks of the histogram correspond to the image background and the air in the lungs, respectively. This can be seen in Figure 1, where the image is thresholded with an intensity value lying in the valley that separates the two leftmost maxima from the remaining peaks (five, in this example). This value is the parameter μ(k).
b) Mean intensity value of the pixels with an intensity level higher than μ(k) in the k-th slice, μsup(k):

$$\mu_{\mathrm{sup}}(k) = \frac{1}{R_k}\sum_{i=1}^{R_k} f(X_i,Y_i,k) \tag{2}$$

where $R_k$ is the number of pixels $(X_i, Y_i)$ in the k-th slice that satisfy $f(X_i, Y_i, k) > \mu(k)$. This value is used to compute the global mean μglobal, the parameter that the algorithm requires in the segmentation stage to separate cardiac structures from the rest of the image. It is also used to obtain a binary mask that determines the position of the spine in each image. The gray level represented by μsup(k) lies in the interval of intensities that includes deoxygenated blood and bone marrow. Hence, masks obtained with this threshold contain the outer layer of the bones and the tissues where oxygenated blood flows, whose intensity levels are higher than μsup(k). However, as shown in Figure 2, this parameter is not a suitable threshold for segmenting cardiac structures, since the resulting mask does not include some tissues where deoxygenated blood flows, such as the right atrium and the right ventricle. Therefore, a lower threshold is needed; more precisely, the required threshold has to lie in the interval of gray levels that corresponds to muscular tissues.
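c) Standard deviation of the intensities of the pixels in the k-th slice with an intensity level higher than μ(k), σ(k):

$$\sigma(k) = \sqrt{\frac{1}{R_k}\sum_{i=1}^{R_k}\left(f(X_i,Y_i,k) - \mu_{\mathrm{sup}}(k)\right)^2} \tag{3}$$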
The threshold μsup(k) + σ(k) allows us to obtain a binary mask which is used later in the segmentation stage in order to locate the descending aorta in all the slices of the volumetric scan. The resulting gray level is useful for separating the outer layer of the bones and the structures where oxygenated blood flows from the rest of the image, as shown in Figure 3.
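d) Mean of μsup(k) minus the standard deviation of μsup(k) (hereafter, the global mean), μglobal:

$$\mu_{\mathrm{global}} = \bar{\mu}_{\mathrm{sup}} - \sqrt{\frac{1}{P}\sum_{k=1}^{P}\left(\mu_{\mathrm{sup}}(k) - \bar{\mu}_{\mathrm{sup}}\right)^2}, \qquad \bar{\mu}_{\mathrm{sup}} = \frac{1}{P}\sum_{k=1}^{P}\mu_{\mathrm{sup}}(k) \tag{4}$$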
This is a global parameter, since it depends on the whole CT scan, and it lies in the interval of intensities that characterizes muscular tissues. The mean of μsup(k) is not used as a threshold because this value is located at the boundary between two distributions, one representing muscular tissues and the other representing deoxygenated blood; as a result, it occasionally fits the structures of interest too tightly and produces holes in the mask. To avoid this problem, a less restrictive threshold, μglobal, is used instead. Figure 4 shows the difference between thresholding with μglobal and with μsup(k). Nevertheless, the resulting binary mask is still inadequate for separating the structures of interest, since the pulmonary veins and part of the bones remain after thresholding. This is addressed in Section 2.2.
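As an illustration, the four parameters in Eqs. (1) to (4) can be computed with NumPy as follows. This is a minimal sketch, assuming the scan is stored as an array indexed as `volume[x, y, z]` and that both standard deviations use the population normalization (1/R_k and 1/P); the function name `slice_statistics` is hypothetical.

```python
import numpy as np

def slice_statistics(volume):
    """Return mu(k), mu_sup(k), sigma(k) for every slice and the global mean.

    `volume` is assumed to be indexed as volume[x, y, z], i.e. shape (N, M, P).
    """
    vol = np.asarray(volume, dtype=float)
    P = vol.shape[2]
    mu = np.empty(P)
    mu_sup = np.empty(P)
    sigma = np.empty(P)
    for k in range(P):
        s = vol[:, :, k]
        mu[k] = s.mean()                       # Eq. (1): mean of the whole slice
        above = s[s > mu[k]]                   # the R_k pixels with f > mu(k)
        if above.size == 0:                    # constant slice: nothing above the mean
            above = s.ravel()
        mu_sup[k] = above.mean()               # Eq. (2)
        sigma[k] = above.std()                 # Eq. (3), 1/R_k normalization
    mu_global = mu_sup.mean() - mu_sup.std()   # Eq. (4), 1/P normalization
    return mu, mu_sup, sigma, mu_global
```

The returned values provide the thresholds used in the rest of the method: μ(k) for the ROI, μsup(k) for the spine, μsup(k) + σ(k) for the aorta, and μglobal for the final mask.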
2.1.2 Position of the spine and the aorta
Once the statistical parameters are computed, a later step (performed in the segmentation stage) is to remove the spine from the dataset. To do so, we exploit the fact that both the spine and the descending aorta are present in all the slices of the (axial) scan. First, P binary masks are obtained by thresholding each CT slice with its corresponding parameter μsup(k). If the area common to all these masks is computed (e.g., by means of a logical AND), the resulting pixels with a value of 1 belong to either the spine or the aorta. More precisely, the common object with the highest number of pixels should belong to the spine. Nevertheless, the pixels belonging to the spine may not be connected, in which case the object with the highest number of pixels actually represents the aorta, which would then be falsely labeled as spine. To avoid such an error, a morphological dilation with a horizontal structuring element is performed beforehand, as shown in Figure 5b. The object with the highest area after the dilation is used as the mask for selecting the spine in all the slices.
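The spine-seed computation described above can be sketched with SciPy's `ndimage` module. The sketch assumes that axis 1 of each slice is the horizontal image axis; the structuring-element width (`dilation_width`) and the function name are hypothetical, since the text does not specify them.

```python
import numpy as np
from scipy import ndimage

def spine_seed(volume, mu_sup, dilation_width=15):
    """AND of the mu_sup(k) masks, dilated horizontally; the largest object is kept."""
    vol = np.asarray(volume, dtype=float)
    common = np.ones(vol.shape[:2], dtype=bool)
    for k in range(vol.shape[2]):
        common &= vol[:, :, k] > mu_sup[k]       # logical AND over all P slices
    # Horizontal structuring element (1 row, odd width) joins disconnected spine fragments.
    se = np.ones((1, dilation_width), dtype=bool)
    dilated = ndimage.binary_dilation(common, structure=se)
    labels, n = ndimage.label(dilated)
    if n == 0:
        return np.zeros(common.shape, dtype=bool)
    sizes = np.bincount(labels.ravel())[1:]      # pixel count per labeled object
    return labels == (np.argmax(sizes) + 1)      # largest object = spine mask
```

Without the dilation, two small spine fragments could each be smaller than the aorta and the aorta would be picked; the horizontal dilation merges the fragments without growing objects vertically.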
During the removal of the spine, a portion of the descending aorta can also be incorrectly deleted (e.g., if it overlaps with the mask computed through the dilation of the common area). It is therefore necessary to locate the aorta beforehand, in order to restore it after the deletion. For this purpose, we first compute the area common to all the superimposed masks obtained by thresholding each slice with its corresponding value μsup(k) + σ(k). As explained in the previous subsection, the threshold μsup(k) + σ(k) selects the structures where oxygenated blood flows: the aorta and the left atrium and ventricle. Among these structures, the only one common to all slices is the descending aorta. As shown in Figure 5e, the resulting image contains only pixels belonging to the aorta, which are used to select and restore the aorta in the segmentation stage. It should be noted that the logical AND (Figure 5e) would likely yield an empty mask in cases of severe scoliosis or a tortuous aorta. To prevent this problem, the algorithm includes a rigid registration stage, which finds the relative displacement (in pixels) between each binary mask and the next one. The P masks are then aligned (i.e., shifted by an integer number of pixels along the x- and/or y-axis) before the logical AND is computed.
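A possible implementation of the aorta seed with the rigid registration step is sketched below. The text does not specify how the displacement between consecutive masks is estimated; here, as an assumption, it is found by exhaustive search over integer shifts within ±`max_shift` pixels, keeping the shift that maximizes the overlap. Displacements are accumulated so that every mask is aligned with the first one before the logical AND. All names are hypothetical.

```python
import numpy as np

def shift_mask(mask, dx, dy):
    """Shift a 2-D boolean mask by (dx, dy) pixels, filling with False (no wrap-around)."""
    out = np.zeros_like(mask)
    n, m = mask.shape
    src_x = slice(max(0, -dx), min(n, n - dx))
    dst_x = slice(max(0, dx), min(n, n + dx))
    src_y = slice(max(0, -dy), min(m, m - dy))
    dst_y = slice(max(0, dy), min(m, m + dy))
    out[dst_x, dst_y] = mask[src_x, src_y]
    return out

def best_shift(ref, mov, max_shift):
    """Integer (dx, dy) maximizing the overlap of ref and shifted mov; ties keep (0, 0)."""
    best = (0, 0)
    best_score = np.count_nonzero(ref & mov)
    for dx in range(-max_shift, max_shift + 1):
        for dy in range(-max_shift, max_shift + 1):
            score = np.count_nonzero(ref & shift_mask(mov, dx, dy))
            if score > best_score:
                best, best_score = (dx, dy), score
    return best

def aorta_seed(volume, mu_sup, sigma, max_shift=10):
    """AND of the registered mu_sup(k) + sigma(k) masks."""
    vol = np.asarray(volume, dtype=float)
    masks = [vol[:, :, k] > mu_sup[k] + sigma[k] for k in range(vol.shape[2])]
    common = masks[0].copy()
    total_dx, total_dy = 0, 0
    for k in range(1, len(masks)):
        dx, dy = best_shift(masks[k - 1], masks[k], max_shift)  # slice k vs. slice k-1
        total_dx += dx                                          # accumulate to align with slice 0
        total_dy += dy
        common &= shift_mask(masks[k], total_dx, total_dy)
    return common
```

The exhaustive search is simple but costs O((2·max_shift + 1)²) mask comparisons per slice pair; a phase-correlation estimate would be a faster alternative under the same translation-only model.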
2.1.3 Automatic selection of the region of interest
This procedure determines which regions have to be removed through the analysis of the columns of each image (considered as a matrix of size N × M). For each image, M one-dimensional profiles (i.e., M arrays of N elements, corresponding to the M columns of the slice) are obtained from the binary mask computed by thresholding with μ(k); as mentioned before, this parameter is suitable for separating the air and the background from the rest of the image, as shown in Figure 1. Additionally, all objects except the one with the highest number of pixels are removed after thresholding, as shown in Figure 6c. Each profile (i.e., each column of the binary mask) consists of a number of ‘pulses’ of amplitude 1 (the number of pulses may vary from none to more than one), as shown in Figure 6d. These pulses represent the pixels with a value of 1 in the corresponding column of the binary mask. The proposed algorithm, which automatically selects the ROI depending on the number and width of the pulses in each one-dimensional profile, is summarized in the following pseudo-code:
1. DO initialize the mean width: w_mean = 0.1*N
2. DO initialize the maximum width to be removed: w_max = 0.3*N
3. FOR j = 1:M
   DO compute the j-th one-dimensional profile
   IF the width w_j of the leftmost pulse of the j-th profile satisfies w_j < w_max (i.e., the corresponding pixels belong to the rib cage)
   THEN update the mean width w_mean with the value w_j, and remove (i.e., set to 0) the uppermost w_j pixels with a value of 1 in the j-th column of the binary mask
   ELSE remove the uppermost w_mean pixels with a value of 1 in the j-th column of the binary mask (i.e., remove only the pixels that belong to the rib cage, not those that belong to the heart)
4. IF after the processing there is more than one object in the resulting mask, select the largest one and discard the rest.

An example of the results obtained with this procedure can be seen in Figure 6.
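The ROI-selection pseudo-code can be translated into Python as follows. This is a sketch under two assumptions not fixed by the text: "update the mean width" is taken to mean a running average seeded with the initial value 0.1·N, and the "leftmost pulse" of a column profile is its topmost run of ones (row index increasing downwards). Function names are hypothetical.

```python
import numpy as np
from scipy import ndimage

def largest_object(mask):
    """Keep only the connected component with the most pixels."""
    labels, n = ndimage.label(mask)
    if n == 0:
        return np.zeros(mask.shape, dtype=bool)
    sizes = np.bincount(labels.ravel())[1:]        # pixel count of each label
    return labels == (np.argmax(sizes) + 1)

def select_roi(slice_k, mu_k):
    """Column-profile ROI selection (hypothetical helper)."""
    mask = largest_object(slice_k > mu_k)          # threshold with mu(k), keep largest object
    N, M = mask.shape
    w_mean = 0.1 * N                               # step 1
    w_max = 0.3 * N                                # step 2
    n_seen = 1                                     # running mean starts from the initial value
    for j in range(M):                             # step 3
        col = mask[:, j]                           # view: edits write back into mask
        ones = np.flatnonzero(col)
        if ones.size == 0:
            continue                               # no pulse in this profile
        # Width of the leftmost (topmost) pulse = length of the first run of ones.
        gaps = np.flatnonzero(np.diff(ones) > 1)
        w_j = ones.size if gaps.size == 0 else gaps[0] + 1
        if w_j < w_max:                            # thin pulse: rib cage
            w_mean = (w_mean * n_seen + w_j) / (n_seen + 1)
            n_seen += 1
            col[ones[:w_j]] = False
        else:                                      # thick pulse: strip only w_mean pixels
            col[ones[:int(round(w_mean))]] = False
    return largest_object(mask)                    # step 4
```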
2.2 Segmentation stage
In this stage, the segmentation itself is performed, using the data collected in the pre-processing stage: the local and global statistical parameters (which serve as thresholds), some pixels belonging to the spine and some pixels belonging to the descending aorta, and the particular region of interest to be processed in each slice of the scan. The sequential steps of the proposed segmentation algorithm (whose flowchart is shown in Figure 7) are detailed below.
2.2.1 Location of the aorta
This procedure consists of two tasks. First, each of the P slices of the scan is thresholded with its corresponding value μsup(k) + σ(k). Next, the objects in the resulting binary mask are labeled; the object that contains the pixels extracted in the process described in Subsection 2.1.2 is the descending aorta in the k-th image. Figure 8 illustrates this procedure. The reason for locating the aorta is twofold: it is the only object of interest in slices where the liver takes up a large area, as shown in Figure 8d; additionally, since part or even all of the aorta may be deleted during the removal of the spine (as explained in Subsection 2.1.2), its position must be known in order to restore it at the end of the following procedure.
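Locating the aorta in a slice amounts to thresholding, labeling, and keeping the labeled object(s) that overlap the aorta seed from Subsection 2.1.2. A minimal sketch, with hypothetical names:

```python
import numpy as np
from scipy import ndimage

def locate_aorta(slice_k, mu_sup_k, sigma_k, aorta_seed):
    """Return the object(s) of the mu_sup(k) + sigma(k) mask that contain seed pixels."""
    mask = slice_k > mu_sup_k + sigma_k
    labels, _ = ndimage.label(mask)
    ids = np.unique(labels[aorta_seed & mask])   # labels touched by the seed (all > 0)
    return np.isin(labels, ids)                  # empty mask if the seed hits nothing
```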
2.2.2 Deletion of the spine
This process consists of four steps. First, the P slices of the scan are thresholded with their corresponding values μsup(k), thus allowing us to isolate bones and tissues where oxygenated blood flows from the rest of the image. At this point, the objects of the resulting binary mask are labeled, and the spine is then selected as the object which contains the pixels obtained by the process described in Subsection 2.1.2. Next, the binary mask defined by the spine is dilated with a horizontal structuring element, and the outcome is used as a mask for separating cardiac structures from the posterior part of the chest wall (since the process described in Subsection 2.1.3 does not remove the lower part of the image). Finally, the descending aorta is added, and the object in which it is contained is selected as the resulting mask. Figure 9 illustrates this procedure.
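The four steps of the spine deletion can be sketched as below. The text does not state which mask the dilated spine is subtracted from; this sketch assumes it is the ROI mask from Subsection 2.1.3 (`roi_mask`), and that `aorta_mask` is the output of the aorta-location step. The parameter `dilation_width` and all names are hypothetical.

```python
import numpy as np
from scipy import ndimage

def delete_spine(slice_k, mu_sup_k, spine_seed, roi_mask, aorta_mask, dilation_width=15):
    """Remove the spine from roi_mask and restore the descending aorta (hypothetical helper)."""
    # Step 1: threshold with mu_sup(k) and label the objects (bones, oxygenated blood).
    bright = slice_k > mu_sup_k
    labels, _ = ndimage.label(bright)
    # Step 2: the spine is the object(s) containing the seed pixels from Subsection 2.1.2.
    spine = np.isin(labels, np.unique(labels[spine_seed & bright]))
    # Step 3: dilate horizontally and cut the dilated spine out of the ROI mask.
    se = np.ones((1, dilation_width), dtype=bool)
    cleaned = roi_mask & ~ndimage.binary_dilation(spine, structure=se)
    # Step 4: add the aorta back and keep the object(s) that contain it.
    cleaned |= aorta_mask
    labels, _ = ndimage.label(cleaned)
    return np.isin(labels, np.unique(labels[aorta_mask]))
```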
2.2.3 Computation of the final mask
In order to segment the structures of interest (i.e., ventricles, atria, aorta, and vena cava), a threshold belonging to the interval of intensities that represents muscular tissues is needed. As explained in Subsection 2.1.1, this value is the parameter μglobal. Since the gray level of the cardiac muscle is lower than that of the blood (either oxygenated or not), thresholding with μglobal yields a binary mask that contains all the aforementioned structures. The bone marrow, whose intensity level is also higher than μglobal, does not appear in this final mask (shown in Figure 10b) because of the cleaning previously performed (i.e., selection of the ROI and deletion of the spine).
2.2.4 Post-processing of the final mask
As can be seen in Figure 10b, the outcome of the previous step still shows slight imperfections, so a post-processing of the binary mask is required. First, objects smaller than a minimum area a_min are removed (a_min is chosen such that a_min ≤ min{N,M}; a value of 500 pixels was used for all CT scans considered in this paper); the size of the objects is easily determined by labeling and pixel counting. Next, objects with a size similar to that of the structures of interest but which do not represent cardiac tissues are also removed. To do so, we exploit the fact that these undesirable objects are local, i.e., they appear only in a narrow range of slices along the z axis. For the k-th image, the algorithm computes the area common to the 2r + 1 binary masks from k − r to k + r, where r is the axial range (a value of 5% of the number of slices P performed well in all experiments); these masks are those obtained by applying the threshold μ(k). An object is removed from the mask unless its overlap with the computed common area is greater than 30% of its actual area (i.e., of its number of pixels with a value of 1 in the k-th slice). Lastly, a morphological closing by reconstruction is carried out to fill the tiny holes that may appear in the final mask. Figures 10c to 10f display the result of this post-processing stage.
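The three post-processing operations can be sketched with SciPy as follows. Assumptions: the 30% criterion is evaluated per labeled object, slices near the ends of the scan use a truncated window, and the closing by reconstruction uses a square structuring element of hypothetical size `size`. The closing is implemented as the dual of an opening by reconstruction: the background is eroded and then rebuilt with `ndimage.binary_propagation`, so holes too small to survive the erosion are filled while the outer contour is preserved.

```python
import numpy as np
from scipy import ndimage

def remove_small_objects(mask, a_min=500):
    """Drop connected objects with fewer than a_min pixels."""
    labels, n = ndimage.label(mask)
    if n == 0:
        return np.zeros(mask.shape, dtype=bool)
    sizes = np.bincount(labels.ravel())[1:]
    return np.isin(labels, np.flatnonzero(sizes >= a_min) + 1)

def remove_local_objects(masks, thr_masks, k, r, frac=0.3):
    """Keep objects of masks[k] whose overlap with the AND of thr_masks[k-r..k+r] exceeds frac."""
    lo, hi = max(0, k - r), min(len(thr_masks) - 1, k + r)   # truncated window at the ends
    common = np.logical_and.reduce(thr_masks[lo:hi + 1])
    labels, n = ndimage.label(masks[k])
    out = np.zeros(masks[k].shape, dtype=bool)
    for i in range(1, n + 1):
        obj = labels == i
        if np.count_nonzero(obj & common) > frac * np.count_nonzero(obj):
            out |= obj
    return out

def closing_by_reconstruction(mask, size=3):
    """Fill small holes: erode the background, rebuild it by propagation, complement."""
    se = np.ones((size, size), dtype=bool)
    background = ~mask
    # border_value=1 treats the outside of the image as background, so the outer region survives.
    marker = ndimage.binary_erosion(background, structure=se, border_value=1)
    rebuilt = ndimage.binary_propagation(marker, mask=background)
    return ~rebuilt
```

Following the 5% rule in the text, the axial range for a scan of P slices could be set as `r = max(1, round(0.05 * P))`.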
2.3 Left heart segmentation
As mentioned in Section 1, the analysis of the LV is of great importance, since this structure supplies oxygenated blood to distant tissues through the aorta. This subsection illustrates how the left heart (i.e., left ventricle and left atrium) and the aorta can be extracted from the outcome of the methodology presented in Subsections 2.1 and 2.2. After the pre-processing and segmentation stages, the resulting images show a quasi-bimodal histogram (i.e., a histogram consisting of two main clusters of gray levels, corresponding to oxygenated and deoxygenated blood), as shown in Figure 11c. This feature allows us to segment the left heart precisely by means of the Isodata algorithm [33], which provides an optimal result at a low computational cost if the two clusters of gray levels are nearly Gaussian (an assumption that holds for virtually all CT scans). The adaptation of the Isodata algorithm to our scenario is summarized in the following pseudo-code:
1. DO compute the initial threshold t1 as the mean gray level of the segmented slice
2. DO compute μ1 and μ2 as the mean gray levels of the two classes obtained after thresholding the segmented slice with t1
3. DO compute the new threshold t2 as the mean of μ1 and μ2: t2 = (μ1 + μ2)/2
4. IF t1 and t2 differ by less than 1%
   THEN go to 5
   ELSE t1 = t2, go to 2
5. RETURN t2

Once the left heart is separated from the right heart, the resulting mask is post-processed as explained in Subsection 2.2.4 (i.e., small objects are removed, contours are smoothed, and holes are filled). The outcome of this procedure is shown in Figure 11.
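The Isodata iteration in steps 1 to 5 can be sketched as follows, where `values` would be the gray levels of the pixels inside the final mask of a slice. The 1% criterion is interpreted as a difference relative to t1, and an iteration cap is added as a safeguard; both are assumptions, and `isodata_threshold` is a hypothetical name.

```python
import numpy as np

def isodata_threshold(values, tol=0.01, max_iter=100):
    """Two-class Isodata threshold on a 1-D set of gray levels."""
    v = np.asarray(values, dtype=float).ravel()
    t1 = v.mean()                                   # step 1: initial threshold
    for _ in range(max_iter):
        low, high = v[v <= t1], v[v > t1]           # step 2: split into two classes
        if low.size == 0 or high.size == 0:
            return t1                               # degenerate case: one class is empty
        t2 = 0.5 * (low.mean() + high.mean())       # step 3: midpoint of the class means
        if abs(t2 - t1) <= tol * abs(t1):           # step 4: relative change within 1 %
            return t2                               # step 5
        t1 = t2
    return t1
```

The left heart would then be selected as the pixels of the final mask whose gray level exceeds the returned threshold, e.g. `final_mask & (slice_k > t)` with `t = isodata_threshold(slice_k[final_mask])`.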