Retinal vessel segmentation with constrained-based nonnegative matrix factorization and 3D modified attention U-Net

Due to the complex morphology and characteristics of retinal vessels, it remains challenging for most existing algorithms to detect them accurately. This paper proposes a supervised retinal vessel extraction scheme using constrained-based nonnegative matrix factorization (NMF) and a three-dimensional (3D) modified attention U-Net architecture. The proposed method detects the retinal vessels in three major steps. First, we apply a Gaussian filter and gamma correction to the green channel of retinal images to suppress background noise and adjust image contrast. Then, we develop a new within-class and between-class constrained NMF algorithm to extract the neighborhood feature information of every pixel and reduce the dimension of the feature data. With these constraints, the method gathers similar features within a class and separates features between classes, which improves the feature description of each pixel. Next, we formulate the segmentation task as a classification problem and solve it with a 3D modified attention U-Net used as a two-label classifier to reduce computational cost. The proposed network applies an upsampling step to raise the image resolution before encoding and, after three max-pooling layers, restores the image to its original size with a downsampling step. In addition, the attention gates (AGs) set in these layers contribute to more accurate segmentation by preserving details while suppressing noise. Finally, experimental results on three publicly available datasets, DRIVE, STARE, and HRF, demonstrate better performance than most existing methods.

competent. Therefore, automatic detection of blood vessels, rather than manual delineation alone, is the most critical step for computer-aided diagnosis systems. In this paper, a hierarchical classification framework for retinal vessel extraction is developed using within-class and between-class constrained nonnegative matrix factorization (NMF) and a three-dimensional (3D) modified attention U-Net architecture. The proposed framework operates directly on full green channel (G-channel) images and contains three major steps: (i) Before feature extraction and classifier training, the input retinal images are pre-processed by Gaussian filtering, gamma correction, and region processing, in that order. This step reduces noise and outliers and adjusts image contrast to ease the detection of retinal vessels. (ii) We account for the spatial relationship of each pixel by generating a vector that collects its 9 × 9 neighborhood. Taking the elements of this vector as rows and the pixels as columns yields a nonnegative data matrix. We then incorporate within-class and between-class constraints into the standard NMF objective function to obtain a nonnegative low-dimensional representation of the neighborhood information of each pixel. The two constraints make features of different classes easier to discriminate: the within-class constraint pulls the feature vectors of the same class together, and the between-class constraint pushes the feature vectors of different classes apart. After NMF, the lower-dimensional coefficient matrix, with 20 channels of meaningful neighborhood information, is ready for the final retinal vessel segmentation by the network. (iii) We present a modified attention U-Net structure that aims to extract vessels more precisely with limited computation.
The proposed network is a symmetric U-shaped structure with an attention mechanism, so that the contraction and expansion paths can highlight the salient features useful for the segmentation task. Unlike the conventional U-Net [1], in which a large number of parameters must be trained over several input feature channels, our modified U-shaped structure aims to extract vessels more precisely. More specifically, we add an upsampling layer before the U-Net structure and encode information with only three max-pooling layers, operating on the higher-resolution feature maps produced by the upsampling layer. Symmetrically, three upsampling layers followed by one max-pooling layer are built in the decoder path to obtain an end-to-end classifier. In addition, an attention gate (AG) is set at each layer to record detail information and convey it to the decoder path, which leads to a more accurate identification of the vessels overall. Verified quantitatively and qualitatively on three public datasets, DRIVE, STARE, and HRF, the proposed approach achieves better performance than other related algorithms.
The rest of the paper is organized as follows: Section 2 introduces related works. Section 3 presents the details of the proposed framework. All experiments and the corresponding analyses are presented in Section 4. Finally, Section 5 outlines the concluding remarks and future research directions.

Related works
Existing retinal vessel segmentation approaches can be divided into two categories, supervised and unsupervised, according to whether they use manually labeled ground truth.
Unsupervised algorithms are designed according to the inherent features of retinal vessels and do not rely on manually labeled images. Recently proposed unsupervised approaches can be roughly divided into matched filter methods [2], vascular tracing methods [3], level set methods [4], model-based methods [5], hierarchical image matting models [6], etc. Although unsupervised algorithms have improved segmentation performance, thin vessels, which considerably affect overall performance, remain difficult to detect [7]. Supervised algorithms require samples of vessel and non-vessel pixels from training databases, labeled with the help of ophthalmologists, to classify pixels for vessel detection. Such an algorithm usually uses extracted feature vectors to train a classifier that decides whether each pixel is vascular or non-vascular. For example, Zhu et al. [8] designed a multi-dimensional discriminative feature vector that extracts local features for vessel detection. Other supervised segmentation approaches use the Gaussian mixture model (GMM) [5], support vector machine (SVM) [9], random forest [10], various clustering strategies [11], etc. Supervised methods rely on hand-designed feature extraction schemes predefined with prior knowledge. Features must be carefully defined before entering the classifier and must be redesigned when the dataset changes. Some algorithms combine several types of features into one feature vector, at which point a dimensionality problem may emerge [12]. Soares et al. [13] proposed a feature-based Bayesian extractor that builds a 7-D feature vector for every pixel by Gabor wavelet transform. Lupascu et al. [14] adopted a 41-D feature vector for classification. Therefore, NMF [15], a linear dimensionality reduction technique commonly used for extracting basic and latent features from high-dimensional data matrices, is widely adopted.
However, when high-dimensional data are represented by low-dimensional vectors, the basis vectors of classical NMF may not capture the latent semantic structure of the data set well [16]. In addition, since the features are extracted from similar images, inherent relations should exist among them, yet classical NMF sometimes fails to preserve these relations. To overcome these problems, local-based feature representation NMF algorithms that integrate sparseness constraints and graph constraints have been presented [17,18].
Recently, deep learning-based schemes have shown enormous success on pixel-wise classification problems owing to their good performance in feature learning [19]. A carefully designed convolutional neural network (CNN) can replace manual feature selection in the vessel detection task. For example, Szkulmowski et al. [20] trained a CNN for vessel detection using augmented retinal vessel data. Soomro et al. [7] proposed a strided-CNN model that is very effective for thin vessel detection. This model is an encoder-decoder architecture in which the pooling layers are replaced with strided convolutional layers. Another approach is a deeply supervised neural network with short connections that transfer semantic information between side-output layers [21]. In [22], Guo et al. formulated the retinal vessel extraction task as a classification problem and solved it using a CNN as a two-label classifier. Although CNN-based architectures can automatically learn features through convolution layers and pooling operations without prior knowledge, one of their main drawbacks is the large amount of training data required. Recently, U-Net, a symmetric encoder-decoder structure, was introduced and proved suitable for segmentation tasks with a small amount of data [1]. Wang et al. [23] introduced a modified U-Net architecture that captures more semantic information of fundus images through two encoders: a spatial path and a context path. Bhatkalkar et al. [24] integrated an attention module into the skip connections between the encoders and decoders of U-Net to highlight salient features. Recently, attention blocks have been widely applied to emphasize targets and reduce the

Methodology
The proposed approach for extracting retinal vessels from fundus images consists of three main phases: (i) pre-processing of fundus images, (ii) dimensionality reduction using constrained NMF, and (iii) segmentation via the 3D modified attention U-Net. Figure 1 shows the block diagram of the proposed approach.

Image preprocessing
The main purpose of image pre-processing is to suppress background noise with a Gaussian filter and to equalize the illumination of the optic disc and the fovea via gamma correction. Finally, the vascular features are emphasized by a region processing operator. In this paper, the G-channel of the retinal image is used since it shows the highest contrast, as shown in Fig. 2.
The first stage reduces background noise with a Gaussian filter, which is highly effective against random noise. The filter function is written as

$$G(x, y) = \frac{1}{2\pi\sigma^{2}} \exp\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right)$$

where σ denotes the standard deviation. The value of σ is about 0.8 in this paper.
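As a minimal sketch of the Gaussian smoothing stage, the following Python code builds a normalized kernel with σ = 0.8 and applies it separably to one image channel. The function names, the truncation of the kernel at three standard deviations, and the reflective border handling are assumptions made for illustration; the paper does not specify them.

```python
import numpy as np

def gaussian_kernel_1d(sigma=0.8, radius=None):
    """Sampled, normalized 1-D Gaussian; the 2-D filter is its outer product."""
    if radius is None:
        radius = max(1, int(np.ceil(3 * sigma)))  # assumed truncation at 3 sigma
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()  # weights sum to one, so flat regions are unchanged

def gaussian_smooth(channel, sigma=0.8):
    """Separable Gaussian smoothing of a 2-D array (assumed reflect padding)."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(np.asarray(channel, dtype=float), r, mode="reflect")
    # Filter along rows, then along columns; 'valid' removes the padding again.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)
```

For an RGB image stored as an array `rgb` of shape (rows, columns, 3), the G-channel would be `rgb[..., 1]`, so the call would be `gaussian_smooth(rgb[..., 1])`.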
In the second stage, gamma correction is used to adjust the contrast of the images and to enhance local details. It also reduces the impact of local shadows and lighting variations. The gamma correction formula follows [27]. In the experiment section, we discuss how to set the best value of γ. From Fig. 2, we can observe that gamma correction provides a high-contrast image.
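The gamma correction stage can be illustrated with a short sketch. It assumes the common power-law form applied to 8-bit intensities scaled to [0, 1]; the exact formula of [27] may differ, and the function name is hypothetical.

```python
import numpy as np

def gamma_correct(channel_u8, gamma):
    """Power-law transform x ** gamma on intensities scaled to [0, 1].

    Assumption: 8-bit input and the plain power-law form; this is not
    necessarily the exact formula used in the paper.
    """
    x = np.asarray(channel_u8, dtype=float) / 255.0
    return x ** gamma
```

With γ < 1, dark regions are brightened more strongly than bright ones, which is the regime of the value γ = 0.14 selected later in the paper.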
Region processing is the last stage of pre-processing: the gamma-corrected images are inverted, and the background around the retina is replaced with the intensity level of the same region in the mask image. The label image is a binary image in which the vessel area is 1 and the non-vessel area is 0. In the corrected G-channel image, however, vessel areas are close to 0, non-vessel areas approach 1, and the mask area equals 0. Therefore, we first invert the image so that vessel areas approach 1 and non-vessel areas approach 0, as in the label image. The mask area is then reset to zero according to the mask image.
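A minimal sketch of the region processing stage is given below. It assumes that the corrected image lies in [0, 1] and that the field-of-view mask is nonzero inside the retina and zero in the surrounding background; the function and argument names are illustrative.

```python
import numpy as np

def region_process(corrected, fov_mask):
    """Invert the corrected channel and zero everything outside the field of view.

    corrected: 2-D array in [0, 1] (assumed), vessels dark.
    fov_mask:  2-D array, nonzero inside the retina (assumed convention).
    """
    inverted = 1.0 - np.asarray(corrected, dtype=float)  # vessels now approach 1
    return np.where(np.asarray(fov_mask) > 0, inverted, 0.0)
```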

Within-class and between-class constraint NMF
1) Image coding: The algorithm first extracts features for every pixel from its surrounding information. It constructs a 9 × 9 window centered on the observed pixel and collects the G-channel intensity values of the nearest 80 pixels. Figure 3 displays the block diagram of the proposed encoding method. A column vector of 81 elements (including the pixel itself) serves as the original feature vector of every pixel. The original feature vectors are extracted from every pixel of all images (training and test images alike) and are then concatenated column by column into one matrix X of size m × n. In matrix X, each column holds the neighborhood information of one pixel.

2) Proposed constraint-based NMF: Given a data matrix $X = [x_{ij}] = [x_1, x_2, \cdots, x_n] \in \mathbb{R}^{m \times n}$, the standard NMF aims to find two nonnegative matrices $U = [u_{ik}] \in \mathbb{R}^{m \times t}$ and $V = [v_1, v_2, \cdots, v_n] \in \mathbb{R}^{t \times n}$ that approximate the given matrix X using

$$X \approx UV.$$

The objective function of classical NMF can be formulated as

$$\min_{U \ge 0,\; V \ge 0} \; \|X - UV\|_F^2,$$

where $\|\cdot\|_F$ denotes the Frobenius norm (F-norm). Lee et al. [28] showed that a local minimum can be found by using the following multiplicative update rules:

$$u_{ik} \leftarrow u_{ik} \frac{(XV^{T})_{ik}}{(UVV^{T})_{ik}}, \qquad v_{kj} \leftarrow v_{kj} \frac{(U^{T}X)_{kj}}{(U^{T}UV)_{kj}}.$$
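The image coding step and the classical NMF baseline can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the reflective padding at image borders, the small constant `eps` that guards against division by zero, the iteration count, and all function names are choices made here. The updates are the classical multiplicative rules of Lee et al. [28], without the within-class and between-class constraints proposed in this paper.

```python
import numpy as np

def encode_neighborhoods(image, w=9):
    """Return a (w*w, r*c) matrix whose columns are flattened w x w windows.

    Columns follow raster (row-major) pixel order; borders use reflect
    padding, which is an assumption of this sketch.
    """
    r, c = image.shape
    half = w // 2
    padded = np.pad(np.asarray(image, dtype=float), half, mode="reflect")
    X = np.empty((w * w, r * c))
    for dy in range(w):
        for dx in range(w):
            # Row dy*w+dx holds the neighbor at offset (dy-half, dx-half).
            X[dy * w + dx] = padded[dy:dy + r, dx:dx + c].ravel()
    return X

def classical_nmf(X, t=20, iters=200, eps=1e-9, seed=0):
    """Classical NMF X ~ U V with multiplicative updates (Frobenius objective)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, t)) + eps  # random nonnegative initialization
    V = rng.random((t, n)) + eps
    for _ in range(iters):
        U *= (X @ V.T) / (U @ V @ V.T + eps)
        V *= (U.T @ X) / (U.T @ U @ V + eps)
    return U, V
```

For a stack of images, the per-image matrices returned by `encode_neighborhoods` would be concatenated along the column axis, for example with `np.hstack`, to form the m × n matrix X with m = 81.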
In this paper, we propose a new objective function to obtain a better solution. Let r and c denote the numbers of rows and columns of an image I, and let N and T denote the total number of images in the dataset and in the training set, respectively. Let the feature matrix of the images be $X \in \mathbb{R}^{m \times n}$, where m = w × w and n = r × c × N, and let $x_i$ (i = 1, ..., n) represent the feature of the ith pixel. The feature of each pixel is an m-dimensional vector obtained with a square window of size w × w (w = 9 in this paper), so that m is the total number of pixels in the neighborhood of the current pixel, including the pixel itself. Thus, the spatial relationship between neighboring pixels is taken into account.
The coefficient matrix is given as $U \in \mathbb{R}^{m \times t}$, and the inner dimension t is set to 20 in this paper. The new feature matrix after dimension reduction is defined as $V \in \mathbb{R}^{t \times n}$. The proposed objective function (6) consists of three terms, in which α and β adjust the contributions of the two constrained terms. In this model, we randomly select T images from the dataset as the training set. According to the ground truth of these images, we split the matrix L by class: its first a columns, denoted $L_a$, correspond to the a pixels of the training images that belong to blood vessels, and the remaining columns correspond to the b pixels that belong to the background. After NMF decomposition, $G \in \mathbb{R}^{t \times \tau}$ is a t × τ matrix whose first a columns denote the features of the pixels that belong to blood vessels. The aim of the second regularized term on the right-hand side of (6) is to find two nonnegative matrices G and U that maintain, in the low-dimensional data space obtained from the matrix factorization, the data space structure of the selected training images. The goal of the third regularized term of (6) is to impose the within-class and between-class distances as an additional constraint on the proposed objective function. With the matrix A, the constrained term $\|G - GA\|_F^2$ is designed so that the within-class distance between the feature vector of each pixel and the mean value is expected to approach zero. We also apply the F-norm to the matrix GB to take the between-class scatter distance into account in our model. To do this, we define the vector $B \in \mathbb{R}^{\tau \times 1}$. In our method, the two constrained terms $\|G - GA\|_F^2$ and $\|GB\|_F^2$ are negatively correlated. Specifically, the ideal situation is that the within-class distance $\|G - GA\|_F^2$ decreases while the between-class distance $\|GB\|_F^2$ increases.
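To make the two constrained terms concrete, the sketch below assumes one construction of A and B that matches the properties stated in the text; it should not be read as the paper's exact definition. Here A is taken to be a block-diagonal averaging matrix, so that GA replaces every column of G by the mean of its class, and B is taken to be a vector for which GB equals the difference between the two class means. The relation τ = a + b and the function names are likewise assumptions.

```python
import numpy as np

def class_matrices(a, b):
    """Assumed A (tau x tau) and B (tau x 1) for a vessel and b background columns."""
    tau = a + b  # assumption: tau counts all labeled training pixels
    A = np.zeros((tau, tau))
    A[:a, :a] = 1.0 / a   # averages the vessel columns
    A[a:, a:] = 1.0 / b   # averages the background columns
    B = np.concatenate([np.full(a, 1.0 / a), np.full(b, -1.0 / b)]).reshape(tau, 1)
    return A, B

def constraint_terms(G, A, B):
    """Return (within-class term ||G - GA||_F^2, between-class term ||GB||_F^2)."""
    within = np.linalg.norm(G - G @ A, "fro") ** 2
    between = np.linalg.norm(G @ B, "fro") ** 2
    return within, between
```

Under this assumed construction, A is symmetric and satisfies AA = A, which is consistent with the identities $GA^{T} = GA$ and $GAA^{T} = GA$ used when the update of G is simplified.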
The constraint term $\|GB\|_F^2$ is multiplied by the coefficient τ to balance its weight against $\|G - GA\|_F^2$. For convenience of calculation, α in (6) is set to one and β is set to τ (α = 1, β = τ). Thus, the third term on the right-hand side of (6) becomes $\|G - GA\|_F^2 \, \|GB\|_F^{-2}$. To avoid a zero initial value at the beginning of the iteration, and considering that $\|GB\|_F^2 \gg 1$, this paper rewrites the objective function (6) into a modified objective J. The gradient expressions are then obtained by taking the partial derivatives of J with respect to the coefficient matrix U and the feature matrix V.
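The derivative of J with respect to G is obtained in the same way. According to the Karush-Kuhn-Tucker (KKT) conditions [29], $\psi_{ik} u_{ik} = 0$ and $\varphi_{kj} v_{kj} = 0$, where $\psi_{ik}$ and $\varphi_{kj}$ are the Lagrange multipliers of the constraints $u_{ik} \ge 0$ and $v_{kj} \ge 0$; these conditions yield equations for $u_{ik}$ and $v_{kj}$.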
Yu and Zhu, EURASIP Journal on Image and Video Processing (2021) 2021:6

Similarly, $\phi_{kl} g_{kl} = 0$ leads to an analogous expression for $g_{kl}$.
Then, the updating rules of U and V can be deduced from the above equations as follows.
Using a similar strategy, the multiplicative update of matrix G is obtained as (17). Because A is a symmetric positive semidefinite matrix satisfying $AA^{T} = A$, that is, $GA^{T} = GA$ and $GAA^{T} = GA$, (17) can be simplified further. The optimizing scheme of the proposed constrained-based NMF is summarized in Algorithm 1.

Algorithm 1 The optimizing scheme of constrained-based NMF
Input: X, L;
Output: V;
1: Initialize U randomly, and V and G with ones. Construct matrices A and B;

3) Image regeneration: The final feature matrix V has n columns, each representing the neighborhood information of one pixel, and only 20 rows (t = 20 in this paper) for feature description. To use this low-dimensional feature description further, we convert matrix V back to images, placing every pixel according to the same encoding sequence. Every pixel of the processed images thus carries 20-dimensional neighborhood feature information and is passed to the proposed 3D segmentation network for vessel classification. Figure 4 shows the block diagram of image set regeneration.
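The image regeneration step amounts to a reshape of V. The sketch below assumes that the columns of V are ordered image by image and, within each image, in raster (row-major) pixel order; the function name is illustrative.

```python
import numpy as np

def regenerate_images(V, N, r, c):
    """Turn V (t x n, with n = N*r*c) into N feature images of shape (r, c, t).

    Assumption: column j of V belongs to image j // (r*c) and to the pixel
    with raster index j % (r*c) inside that image.
    """
    t, n = V.shape
    if n != N * r * c:
        raise ValueError("V must have N * r * c columns")
    return V.T.reshape(N, r, c, t)
```

With t = 20, each regenerated image has 20 channels per pixel, which is the input format described for the segmentation network.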

Proposed 3D modified attention U-Net
Computer vision-based blood vessel detection requires algorithms with high accuracy and relatively low computational cost. Building on the comprehensive description of neighborhood information provided by our constrained-based NMF, this study designs an end-to-end 3D modified attention U-Net architecture as a trainable classifier for vessel extraction. The architecture of the proposed network is shown in Fig. 5. Whereas the current attention U-Net [24] serves only a basic function in classification, the proposed 3D modified attention U-Net aims to reduce computational complexity by devoting limited resources to the regions that most need classification. Specifically, the input data are obtained from N images, each stored as an r × c array with 20 channels representing neighborhood features. These data are first divided into patches of size 32 × 32 × 20 and fed into the network. The network places an upsampling layer, which doubles the input patch size, before the conventional max-pooling layers. Moreover, only three max-pooling layers compress the patch size, compared with the original four in U-Net. Symmetrically, three upsampling layers followed by one max-pooling layer recover the patch size in the decoder path. This design offers a good balance between computational cost and segmentation accuracy. Because retinal vessel segmentation is a highly specific task in which meaningful information is concentrated in few image regions, three AGs are inserted, as shown in Fig. 6. Here, the x signal, which carries attention maps preserving fine-grained details, is added to the g signal from the former layers to generate an output y, which is multiplied with the former feature maps. In this way, more attention is given to salient features on the more detailed feature maps: vessel areas gain more learning resources, while the large non-vessel areas of the retina are suppressed.
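Two aspects of this description can be illustrated with a short sketch: how the spatial patch size evolves through the network, and how an attention gate rescales feature maps. The code assumes a factor of 2 for every pooling and upsampling layer and uses a generic additive attention gate with hypothetical weight matrices `Wx`, `Wg`, and `psi` standing in for 1 × 1 convolutions; the layer details of the actual network in Figs. 5 and 6 may differ.

```python
import numpy as np

def trace_patch_sizes(size=32, factor=2):
    """List the spatial patch size after each resampling stage (assumed factor 2)."""
    stages = [("input patch", size)]
    size *= factor
    stages.append(("initial upsampling", size))
    for i in range(3):
        size //= factor
        stages.append((f"encoder max-pooling {i + 1}", size))
    for i in range(3):
        size *= factor
        stages.append((f"decoder upsampling {i + 1}", size))
    size //= factor
    stages.append(("final max-pooling", size))
    return stages

def attention_gate(x, g, Wx, Wg, psi):
    """Generic additive attention gate (a sketch, not the paper's exact layer).

    x, g: feature maps of shape (H, W, C) with equal spatial size.
    Wx, Wg: (C, F) channel-mixing weights; psi: (F,) projection vector.
    """
    q = np.maximum(x @ Wx + g @ Wg, 0.0)        # add both signals, then ReLU
    alpha = 1.0 / (1.0 + np.exp(-(q @ psi)))    # (H, W) attention map in (0, 1)
    return x * alpha[..., None]                 # rescale every channel of x
```

Under the factor-2 assumption, a 32 × 32 patch is processed at 64 × 64 after the initial upsampling, reaches 8 × 8 at the bottleneck, and returns to 32 × 32 at the output.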

Results and discussions
In this section, we first describe the datasets and metrics used in the experiment, and then detail the experimental results and performance analysis of the proposed method on some publicly used benchmark datasets.

Benchmark datasets
The framework is evaluated on high-resolution images from three publicly available datasets: DRIVE [30], STARE [31], and HRF.
The DRIVE dataset contains 40 fundus images, each of size 565 × 584 pixels with 8 bits per color channel. All of the images have been manually segmented to provide ground truth, and their field of view (FOV) binary masks are also provided. In this paper, the training set consists of 35 color retinal images, and the other 5 images are used for testing.
The STARE dataset contains 20 fundus images with a resolution of 700 × 605 pixels and 8 bits per color channel, of which 10 show pathologies and 10 do not. All images in this dataset were manually segmented by two observers, and the results of the first observer are regarded as the ground truth. In the experiment, we randomly select 16 images with hand-labeled results for training and leave 4 images for testing.
The HRF database [32] consists of 45 high-resolution eye fundus images of size 3504 × 2336, segmented by a group of experts working in the field of retinal image analysis and by clinicians from the cooperating ophthalmology clinics. Specifically, the dataset comprises 15 images of healthy patients, 15 images of patients with diabetic retinopathy, and 15 images of glaucomatous patients. A ground truth image and a mask that determines the field of view (FOV) are provided for each image. In this paper, we randomly select 41 images for training and the remaining 4 images for testing. Testing images contain

Experimental environment and evaluation metrics
This subsection evaluates the segmentation performance on the DRIVE and STARE datasets and compares the proposed method with state-of-the-art algorithms. All experiments are run on a small server with an Intel(R) Core(TM) i7-9700KF CPU (4.8 GHz) and an NVIDIA GeForce RTX 2080 Ti GPU. Our architecture was built on the publicly available Python 3.7 platform and implemented with the Keras deep learning library on the TensorFlow backend. The performance of the vessel segmentation is measured using sensitivity (SE), specificity (SP), accuracy (ACC), precision, and recall. They are defined as follows:

$$\mathrm{SE} = \frac{TP}{TP + FN}, \qquad \mathrm{SP} = \frac{TN}{TN + FP}, \qquad \mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN},$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN},$$

where TP and TN denote the numbers of pixels correctly classified as vessel and non-vessel, respectively. FN represents the number of vessel pixels incorrectly labeled as non-vessel, and FP is the number of non-vessel pixels incorrectly labeled as vessel. Precision and recall measure the exactness and completeness of the model. In addition, the performance is examined in terms of standard indexes, namely the receiver operating characteristic (ROC) curve and the area under the curve (AUC) [33]. The ROC curve is a plot of SE versus 1 − SP obtained by varying the threshold on the probability map. The AUC value is calculated using the trapezoidal rule. The closer the AUC value is to 1, the better the performance of the corresponding blood vessel segmentation algorithm.
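The metrics above can be computed directly from the four confusion counts, and the AUC from a list of ROC points. The following sketch uses hypothetical function names and assumes that all denominators are nonzero.

```python
def segmentation_metrics(tp, fp, tn, fn):
    """Pixel-wise metrics from confusion counts (denominators assumed nonzero)."""
    return {
        "SE": tp / (tp + fn),
        "SP": tn / (tn + fp),
        "ACC": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }

def auc_trapezoid(fpr, tpr):
    """Area under ROC points (1 - SP, SE) using the trapezoidal rule."""
    pts = sorted(zip(fpr, tpr))  # order the points by increasing 1 - SP
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
```

Note that recall and SE share the same definition, so the two values always coincide.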

Selection of parameter
This subsection discusses which value of γ in the gamma correction phase is most appropriate for the proposed approach. To this end, we evaluate the pre-processing on ten images randomly selected from the DRIVE dataset and compare the enhanced images with the ground truth in terms of Euclidean distance. Figure 7 shows the effect of varying γ on the Euclidean distance. It is experimentally found that the Euclidean distance has a clear minimum as γ increases, and the minimum is achieved at γ = 0.14. Based on this parameter study, we adopt γ = 0.14 in the following experiments, since our method produces the best results at this value.
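The parameter study can be sketched as a simple grid search. The code below assumes images scaled to [0, 1], the plain power-law form of gamma correction, and a comparison made after inverting the corrected image so that it shares the polarity of the label image; these details and the function name are assumptions, since the text only states that enhanced images are compared with the ground truth by Euclidean distance.

```python
import numpy as np

def select_gamma(images, truths, gammas):
    """Return the candidate gamma with the smallest mean Euclidean distance.

    images: list of 2-D arrays in [0, 1]; truths: matching label images.
    Assumption: the corrected image is inverted before the comparison.
    """
    def mean_distance(gamma):
        return np.mean([np.linalg.norm((1.0 - img ** gamma) - gt)
                        for img, gt in zip(images, truths)])
    return min(gammas, key=mean_distance)
```

A call such as `select_gamma(images, truths, np.arange(0.02, 1.0, 0.02))` would scan a grid of candidate values; the grid itself is hypothetical.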

Experiment on retinal vessel extraction
The training process is summarized as follows. After the network model is generated, several steps are carried out before training. The feature matrix, mask, and ground truth images of the original training set are read, and the pixels outside the mask area, which form the region of interest (ROI), are gathered into a label set. In this way, useless information is discarded before training. We train and test the proposed network model on both the STARE and DRIVE datasets and fine-tune the network with a learning rate of 5e-5 and a weight decay of 1e-6. A dropout rate of 0.2 is used between two convolutional layers. The batch size is set to 32, and 150 epochs are used to ensure convergence. In the training phase, we use the Adam optimizer [34]. The loss values versus epochs obtained during model training are given in Fig. 8. During the image prediction phase, the same data processing as in training, including pre-processing and constrained-based NMF, is carried out. The information of each pixel is read and located in the mask images of the original testing set, and the pixels that do not fall in the mask areas are fed into the proposed network model for testing. After the model outputs a new label set (the segmentation result), the pixels in the label set are filled back into the image in the order in which they were picked. Figure 9 shows some examples generated by the proposed method on the DRIVE dataset, from which we can observe that our method is able to extract abundant vascular branches of different thickness. To validate the proposed method, we quantitatively evaluate the retinal vessel segmentation results on the test sets of both DRIVE and STARE by comparing the average values of the predictions with the ground truth. Four evaluation metrics, ACC, SE, SP, and precision, are applied, all of which are computed from TP, FN, FP, and TN.
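The gathering of ROI pixels and the filling back of predicted labels can be sketched with boolean indexing. The sketch assumes a boolean ROI array that is True for pixels inside the retina and relies on the raster order in which NumPy visits the selected pixels; the function names are illustrative.

```python
import numpy as np

def gather_roi(features, roi):
    """Collect the feature vectors of ROI pixels in raster order.

    features: array of shape (r, c, t); roi: boolean array of shape (r, c).
    Returns an array of shape (k, t), where k is the number of ROI pixels.
    """
    return features[roi]

def fill_back(labels, roi):
    """Write one predicted label per ROI pixel back into an (r, c) image."""
    out = np.zeros(roi.shape, dtype=np.asarray(labels).dtype)
    out[roi] = labels  # same raster order as in gather_roi
    return out
```

Because both functions index with the same boolean array, the kth predicted label returns to the pixel from which the kth feature vector was taken.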
Tables 1 and 2 list the evaluation results obtained by the proposed framework on the different datasets. The ROC curves on the two databases, which quantify the proposed prediction results, are provided in Fig. 10. As can be observed, our method performs better in detecting vessels on STARE than on DRIVE. Our model also generates high AUC on two test tasks, at 0.9909 for

Comparison with other network models
To test the effectiveness of the proposed framework, we compared the output of our approach with several advanced algorithms, U-Net [1], AG-UNet [35], IterNet [26], DenseNet [36], and V-GAN [37], on STARE and DRIVE. Their segmentation results are obtained by running publicly available code. All these deep convolutional network-based algorithms are able to extract most of the vessels, while the proposed method performs well on most images, even when the image contrast is low. Four images from the DRIVE dataset, showing the results of the retinal vessel segmentation algorithms together with the ground truth, are presented in Fig. 11. Figure 12 shows enlarged views of the results of the six models on three images from the STARE dataset, scaled by bilinear interpolation to 200 × 200 pixels. It can be observed that crossing vessel branches and thick vessels are the two most significant causes of misclassification, to a degree that differs between algorithms. U-Net and AG-UNet achieve similar segmentation, although the inserted AG preserves vessel details slightly better. However, thin vessels still remain broken or blurred. In contrast, DenseNet and V-GAN capture almost all suspicious vessels compared with the other four methods. However, their predicted vascular networks appear exaggerated, so that differences in vessel thickness are not apparent and some vessels are over-detected. IterNet exhibits both of the above problems: the detected areas are all of similar thickness, and details are less clearly detected. Compared with all these methods, our model avoids results that are either too coarse or blurred. Interrupting strips are also less likely to be falsely captured, whereas faint blood vessels are identified in closer agreement with the ground truth. For a more convenient visual comparison, we list the six methods in a reasonable sequence, as shown in Fig. 12.
To further validate the proposed method, we compute the evaluation metrics of the resulting images against their corresponding ground truth. The SE, SP, ACC, and precision on the DRIVE and STARE datasets are shown in Tables 3 and 4. Higher sensitivity ensures that all potential vessel areas are detected, and higher specificity ensures correctness among the detected areas. Consistent with our visual analysis, the SP values of U-Net and AG-UNet are higher on both datasets, at about 0.97. This indicates that the identification is relatively conservative, so that uncertain areas such as thin and blurred vessels may be missed. In contrast, the SE of V-GAN is higher than that of the other networks, at about 0.95 on STARE and 0.85 on DRIVE. This means that the classification is relatively coarse and that suspicious areas are likely to be identified as vessels. The results of IterNet are also consistent with the earlier evaluation: its SEs are smaller than those of DenseNet and V-GAN but larger than those of U-Net and AG-UNet, and its SPs are either similar to those of the methods from both trends or at a moderate level. These methods thus have their own characteristics, but being either too specific or too sensitive does not meet the requirements of efficient real-world application. Our method shows a good tradeoff between both metrics, with competitive results compared with all methods. If we look at ACC and precision, which higher figure means

Comparison of segmentation results on high-resolution dataset
Since high-resolution images are becoming common in clinical use, we evaluate our method on the HRF dataset (image size 3504 × 2336 pixels) in this experiment. We compare the proposed approach with several state-of-the-art methods, and the retinal vessel segmentation results are displayed in Fig. 13. Among these methods, Soares et al. [13] is a standard segmentation algorithm, while the others are all based on CNNs. Some competitive methods proposed recently [37,38] are also included in the comparative experiment. It can be seen from Fig. 13 that the traditional segmentation method [13] yields more blurred vessel outlines than the CNN-based approaches. U-Net and AG-UNet still seem to miss thin vessels, and DenseNet and V-GAN depict relatively thick blood vessels in any suspicious area. Unlike on STARE and DRIVE, IterNet fails to capture vessels around the optic disc on images from the HRF dataset. M-GAN and our approach both achieve competitive results on these high-resolution images, with our approach slightly more sensitive to thin vessels. It can be observed from Table 5 that the traditional segmentation method [13] gives less satisfactory results in terms of precision and sensitivity, although its test time is noticeably shorter than that of the CNN-based methods. All methods achieve a similar level of specificity, with M-GAN 0.003 higher than ours. Unlike in the previous results, AG-UNet is slightly more sensitive than the other approaches on the HRF database, while the sensitivity of V-GAN drops to 0.8196, second only to AG-UNet. Our approach and M-GAN both achieve competitive results: M-GAN achieves the highest accuracy at 0.97, and our approach reaches the highest precision at 0.8947. Moreover, our approach trains in 89 s per epoch and tests in 17 s per image, which is faster than M-GAN. In general, all experimental results remain at the same level as on the previous datasets.
Our approach shows an advantage on the high-resolution database in terms of specificity and precision. The advantage of our method is mainly due to the proposed 3D modified attention U-Net architecture and the use of constrained-based NMF, which not only offers highly discriminative features that help to distinguish small vessel segments from non-vessel pixels, but also improves the global spatial consistency of the results. It can be seen from the experiments that our method outperforms these competitive methods in detecting vessels reasonably and accurately for application purposes.

Conclusions
In this paper, we proposed a 3D modified attention U-Net architecture together with constrained-based NMF to extract retinal vessels accurately, especially thin vessels. The pre-processing steps include gamma correction and region processing to obtain well-contrasted images for subsequent calculation. Next, we proposed a novel NMF algorithm with within-class and between-class constraints to encode and extract the neighborhood feature information of each pixel while reducing the image dimension. Our constrained-based NMF approach also provides a new option for computer vision research where dimension compression is necessary. Then, a 3D modified attention U-Net with an upsampling beforehand and a downsampling after the three max-pooling layers was proposed. In addition, the AGs used in the skip connections highlight useful feature information and suppress irrelevant content. Finally, to measure the effectiveness of the proposed framework, we tested the proposed model on three datasets, DRIVE, STARE, and HRF. The results and related comparisons show that the proposed scheme performs better than most existing methods. The proposed retinal vessel extraction scheme can be extended to other similar vessel segmentation tasks such as cardiovascular extraction.