PointPCA: point cloud objective quality assessment using PCA-based descriptors
EURASIP Journal on Image and Video Processing, volume 2024, Article number: 20 (2024)
Abstract
Point clouds denote a prominent solution for the representation of 3D photorealistic content in immersive applications. Similarly to other imaging modalities, quality predictions for point cloud contents are vital for a wide range of applications, enabling trade-off optimizations between data quality and data size in every processing step from acquisition to rendering. In this work, we focus on use cases that consider human end-users consuming point cloud contents and, hence, we concentrate on visual quality metrics. In particular, we propose a set of perceptually relevant descriptors based on principal component analysis (PCA) decomposition, which is applied to both geometry and texture data for full-reference point cloud quality assessment. Statistical features are derived from these descriptors to characterize local shape and appearance properties for both a reference and a distorted point cloud. The extracted statistical features are subsequently compared to provide corresponding predictions of visual quality for the distorted point cloud. As part of our method, a learning-based approach is proposed to fuse these individual predictors into a unified perceptual score. We validate the accuracy of the individual predictors, as well as the unified quality scores obtained after regression against subjectively annotated datasets, showing that our metric outperforms state-of-the-art solutions. Insights regarding design decisions are provided through exploratory studies, evaluating the performance of our metric under different parameter configurations, attribute domains, color spaces, and regression models. A software implementation of the proposed metric is made available at the following link: https://github.com/cwidis/pointpca.
1 Introduction
With the increasing popularity of extended reality technology and the adoption of depth-enhanced visual data in modern telecommunication and imaging systems, point clouds have emerged as a promising 3D content representation. However, a faithful rendition of 3D visual information using point clouds requires vast amounts of data, several orders of magnitude higher than what current transmission infrastructure can handle. Thus, reliable point cloud compression schemes are essential and have been a main focus of the Moving Picture Experts Group (MPEG) [1] and the Joint Photographic Experts Group (JPEG) [2] standardization bodies in the last few years. As a result of these efforts, MPEG has crafted two standards, namely Video-based Point Cloud Compression (V-PCC) [3] and Geometry-based Point Cloud Compression (G-PCC) [4], while the JPEG Pleno [5] Learning-based Point Cloud Coding standard [6] is under development. These milestones are crucial to establish interoperability and facilitate the integration of point cloud technology in daily use cases.
Compression schemes often offer size reduction at the cost of added visual distortions. Moreover, point cloud contents might undergo signal deformations during processing, transmission, and/or rendering, which may have an additional negative effect on their perceptual quality. Therefore, there is a need for mechanisms to quantify the induced visual impairments, enabling perceptually based optimizations and ensuring the best Quality of Experience (QoE) for the end-users. Ground-truth ratings for the amount of visual impairments in a stimulus are obtained through subjective quality assessments. However, these procedures are time-consuming, costly, and essentially impractical for real-life applications. Thus, objective quality methods that can automatically predict the visual quality of distorted stimuli are required.
Two main types of characterization are commonly used to distinguish approaches for objective quality metrics for point cloud contents. One characterization comes from image and video objective quality metrics, and distinguishes between full-reference, reduced-reference, and no-reference metrics, based on their requirement for the reference content, some reference data, and no reference information at execution time, respectively. An orthogonal characterization for point cloud quality metrics is based on the domain in which the metric is computed, differentiating them as projection-based and point-based [7]. The former refers to 2D solutions, capturing geometric and textural distortions as reflected upon rendering on planar arrangements. These methods commonly adopt or extend techniques that were devised for images in the past, and they are view- and rendering-dependent [7]. Conversely, point-based counterparts operate in the 3D point cloud domain and are rendering-agnostic. Both projection- and point-based schemes could rely on either conventional or learning-based approaches [8]. However, the latter are often treated as a separate category.
Full-reference metrics are widely used in scenarios such as rate-distortion optimization for efficient compression, in which there is a need for comparing and quantifying distortions added to a pristine reference to determine the best rate allocation. Point-based solutions are rendering-agnostic and offer better generalization for cases in which the final rendering parameters are not known. Thus, in this work, we focus on a full-reference, point-based solution.
Initial attempts at full-reference point-based metrics built on simple distances between individual points, whereas more recent algorithms utilize richer features that capture local patterns of geometric and textural information. The majority of modern point-based methods make use of small sets of geometric features, often focusing on specific surface properties, with normal vectors (e.g., [9,10,11]) and curvatures (e.g., [11,12,13]) being more widely used. Textural features typically rely on statistics of luminance or lightness (e.g., [11, 13]) and occasionally chromatic components (e.g., [13]), computed over spatial neighborhoods. Geometric and textural features are often linearly combined [13], while more recently, more advanced regression models, such as Random Forest [14, 15] and Support Vector Regression [16], are gaining ground. Employing learning-based frameworks to combine handcrafted features offers the advantage of interpretability, while still leveraging machine learning to effectively map predictions from the extracted features to a single quality score. Such methods have been successfully used in the field of image and video quality assessment, with VMAF being among the most renowned examples [17].
In this paper, we introduce PointPCA, an objective quality metric that makes use of handcrafted, interpretable descriptors of geometric and textural properties, based on principal component analysis (PCA), in a learning-based framework for visual quality assessment of point clouds. Subsets of the proposed geometric descriptors have already been used for urban classification [18], semantic interpretation [19], semantic segmentation [20], contour detection [21], and, more recently, no-reference objective quality assessment [16] of point cloud data. We complement the existing literature by proposing an enriched set of PCA-based geometric and a novel set of PCA-based textural descriptors, with corresponding predictors fused through Random Forest regression to a single perceptual quality score in a full-reference design. Our results show that PointPCA achieves high performance under all tested datasets, with substantial improvements over state-of-the-art metrics. Exploratory studies are performed under different parameter configurations, color spaces, attribute-specific descriptors, and regression models to showcase the effectiveness and performance stability of our metric. Our contributions can be summarized as follows:

We propose the use of statistical features computed from PCA-based descriptors to quantify point cloud geometric and textural distortions. The descriptors are obtained per point after applying PCA over spatial neighborhoods, and capture local geometric and textural properties, while the statistical features estimate average and dispersion trends, promoting interpretability.

We choose the Random Forest algorithm to produce a single perceptual quality score by fusing individual predictors obtained from the proposed statistical features in a nonlinear manner. We demonstrate the effect of the selected learning-based framework through comparison to other commonly used regression models. Our results show high robustness across all tested nonlinear methods.

We compare the performance of PointPCA to state-of-the-art metrics on a variety of datasets, showing gains in all datasets under consideration.
2 Related work
A brief description of point cloud objective quality assessment methods is provided below, after clustering them based on their operating principle. The interested reader may refer to [8] for a more detailed overview.
2.1 Point-based objective quality metrics
The point-to-point and point-to-plane [9] metrics denote the earliest attempts at the establishment of point-based objective quality metrics. The former measures the Euclidean distance between point coordinates, while the latter relies on the projected error of distorted points across reference normal vectors. In both metrics, the mean square error (MSE) or the Hausdorff distance is applied over the individual, per-point error values to deliver a global degradation score. In [22], the generalized Hausdorff distance is proposed to mitigate the sensitivity of the Hausdorff distance to outlying points, by excluding a percentage of the largest individual errors. The geometric peak-signal-to-noise ratio (PSNR), defined in [23] for both metrics to account for differently scaled contents, was revised in [24] to consider the content's intrinsic or rendering resolution. The plane-to-plane metric is described in [10] and estimates the angular similarity of tangent planes, as expressed through unoriented normals. The point-to-distribution metric, introduced in [25], computes the Mahalanobis distance between a distorted point and a reference neighborhood. The PC-MSDM [12] evaluates the similarity of local curvature statistics, extracted after quadratic fitting in support regions.
The previous metrics examine only geometric distortions. A few more recent attempts employ textural-only information, although the majority of metrics incorporate both geometric and textural information. Specifically, the first texture-only metric follows the point-to-point logic and measures the MSE or PSNR [26], analogously to the well-known 2D image counterpart. More sophisticated texture-only paradigms are proposed in [27], which compute histograms or correlograms of luminance and chrominance components to characterize color distributions.
Regarding metrics that consider both geometry and texture, the point-to-distribution metric was extended to capture color degradations in [28] by additionally applying the same formula to the luminance component. The PC-MSDM was extended to PCQM [13] by incorporating local statistical measurements from luminance, chrominance, and hue in order to evaluate textural impairments. The PointSSIM [11] relies on statistical dispersion of location, normal, curvature, and luminance data. An optional preprocessing step of voxelization is proposed to enable different scaling effects and reduce intrinsic geometric resolution differences across contents. The VQA-CPC [14] computes statistics upon Euclidean distances between every sample and the arithmetic mean of the point cloud, using geometric coordinates and color values. An extension is presented in [15], namely CPC-GSCT, which involves a point cloud partition stage before extraction of features per region.
A graph signal processing-based approach, namely GraphSIM, is described in [29] and evaluates statistical moments of color gradients on keypoints, after high-pass filtering on the pristine content's topology. A multi-scale version, namely MS-GraphSIM, is presented in [30]. In [31], local binary patterns are applied to the luminance component of neighboring points. This work is extended in [32], considering the point-to-plane distance between point clouds and the point-to-point distance between feature maps. A variant descriptor called local luminance pattern is proposed in [33], introducing a voxelization stage. A textural descriptor to compare neighboring color values using the CIEDE2000 distance is reported in [34]. The color differences are coded as bit-based labels, which denote frequency values of predefined intervals. An extension is presented in [35], namely BitDance, which incorporates bit-based labels from a geometric descriptor that relies on the comparison of neighboring normal vectors. The EPES, presented in [36], relies on potential energy; that is, the energy needed to move points of a local neighborhood from an origin to their current geometric and color status. The MPED [37] also utilizes the point potential energy, quantifying the spatial distribution of points and their color under a certain metric space to measure isometric distortion. The potential energy discrepancy is further extended to a multi-scale form.
The aforementioned are full-reference metrics. Fewer attempts have been reported for reduced-reference and no-reference metrics. In particular, the first reduced-reference objective quality metric, PCM_RR, is described in [38] and relies on global features that are extracted from location, color, and normal data. More recently, a reduced-reference metric for point clouds encoded with V-PCC is presented in [39]. It is based on a linear model of geometry and color quantization parameters, with the model's parameters determined by a local and a global color fluctuation feature. A no-reference method, namely BQE-CVP, is proposed in [40] that combines point-based geometric features, point-based and projection-based texture degradations, and a joint geometric-color feature. In [16], the logic of using natural scene statistics for no-reference quality assessment of 2D images is extended to 3D contents. Specifically, the authors propose statistical properties of geometric features and LAB color value distributions to evaluate the visual quality of both point clouds and meshes.
2.2 Projection-based objective quality metrics
The prediction accuracy of 2D quality metrics over images obtained after projecting point clouds on the six faces of a surrounding cube was initially examined in [41]. The influence of the number of viewpoints in denser camera arrangements and the exclusion of background pixels is explored in [42], which also proposes a weighting scheme based on user interactivity. In [43], a weighted combination of global and local features extracted from texture and depth images is defined. The Jensen–Shannon divergence on the luminance component serves as the global feature, whereas a depth-edge map, a texture similarity map, and an estimated content complexity factor account for the local features. In [44], color and curvature values are projected on planar surfaces. Color impairments are evaluated using probabilities of local intensity differences, together with statistics of their residual intensities, and similarity values between chromatic components. Geometric distortions are assessed based on statistics of curvature residuals. A hybrid approach using both projection- and point-based algorithms is proposed in [45], namely LP-PCQM. The point clouds are divided into non-overlapping partitions called layers, with a planarization process taking place at each layer, before applying the IW-SSIM [46] to assess geometric distortions. Color impairments are evaluated using RGB-based variants of similarity measurements defined in [13]. In [47], an image-based metric is proposed that tackles misalignment between the original and the distorted geometry. This is achieved by mapping the color of the distorted point cloud to the original geometry. The resulting and the original point clouds are then projected to the six faces of a surrounding cube, followed by cropping and padding to eliminate background pixels, before the execution of any 2D quality metric. The same process is repeated after mapping the original color to the distorted geometry, and a total quality score is obtained as a weighted average.
2.3 Learning-based objective quality metrics
In [48], a Convolutional Neural Network (CNN) pretrained for classification is evaluated in the task of no-reference point cloud quality assessment, after necessary adjustments. Geometric distances, mean curvatures, and luminance values are packed into patches, with patch quality indexes computed using a CNN, and a global score obtained after pooling. An extension of this metric for full-reference quality assessment is presented in [49]. In [50], the use of perceptual loss is extended to point clouds, represented as voxel grids or truncated signed distances. The perceptual loss is applied to the latent space, after a simple autoencoding architecture of convolution layers. In [51], a neural network architecture for no-reference quality assessment based on projected views is proposed, namely PQA-Net. Features are extracted after a series of CNN blocks and are shared between a distortion identifier and a quality prediction unit to obtain a final quality score. In [52], the PM-BVQA is proposed, which relies on a CNN-based joint color-geometric feature extractor that is fed with corresponding projection maps, followed by a two-stage multi-scale feature fusion step and a spatial pooling module. In [53], point clouds are split into sub-models for geometry representation and 2D image projections for texture representation, with the two modalities encoded using PointNet++ and ResNet50, respectively. Symmetric cross-modal attention is employed to fuse multi-modality quality-aware information. In [54], a graph convolution kernel (GPA-Conv) is introduced to capture the perturbation of structure and texture. Subsequently, the network employs a multi-task framework, with quality regression as the main task, and auxiliary tasks for predicting distortion type and degree. A coordinate normalization module is employed to enhance the stability of GPA-Conv results when confronted with shifts, scales, and rotations.
3 Description of PointPCA
The architecture of the proposed metric can be decomposed into seven stages, namely, (a) Duplicates Merging, (b) Correspondence, (c) Descriptors, (d) Statistical Features, (e) Comparison, (f) Predictors, and (g) Quality Score. A corresponding system diagram is presented in Fig. 1. The metric requires a reference during execution in order to provide a quality prediction for a point cloud under evaluation. Specifically, a correspondence between the two point clouds is obtained after merging points with identical coordinates that belong to the same point cloud. Then, 23 geometric and textural descriptors are computed per point, for both point clouds. For every descriptor, we capture local relations by applying statistical functions, leading to corresponding statistical features. Given the correspondence, the 46 statistical features extracted per point from the reference and the point cloud under evaluation are compared. The derived error samples are pooled together, resulting in a predictor of visual quality per statistical feature. The obtained 46 predictors are finally fused by means of a regression algorithm to obtain a total quality score for the point cloud under evaluation. Below, every stage is detailed separately.
3.1 Duplicates merging
Within a single point cloud, points that have identical coordinates are identified and merged; that is, only one point per coordinate set is kept [9, 11]. The color of the merged point is obtained by averaging the color of respective points with the same coordinates. This offers the advantage that points with unique locations form neighborhoods to compute descriptors and statistical features, eliminating bias due to duplicated values. Moreover, redundant correspondences between a reference and a point cloud under evaluation are avoided.
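The merging step above can be sketched in a few lines of NumPy; the function name and array layout are illustrative, not the released implementation:

```python
import numpy as np

def merge_duplicates(xyz, rgb):
    """Merge points with identical coordinates, averaging their colors.

    xyz: (N, 3) array of coordinates; rgb: (N, 3) array of color values.
    Returns unique coordinates and the per-coordinate mean color.
    """
    # Group points by their (x, y, z) triplet.
    uniq, inverse = np.unique(xyz, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # flatten for older/newer NumPy versions
    # Accumulate the colors of all points mapping to the same coordinate,
    # then divide by the group sizes to obtain the average color.
    sums = np.zeros((len(uniq), 3))
    np.add.at(sums, inverse, rgb.astype(float))
    counts = np.bincount(inverse, minlength=len(uniq)).reshape(-1, 1)
    return uniq, sums / counts

# Two duplicated points at the origin with colors 100 and 200 merge to 150.
xyz = np.array([[0, 0, 0], [0, 0, 0], [1, 0, 0]])
rgb = np.array([[100, 100, 100], [200, 200, 200], [50, 50, 50]])
pts, cols = merge_duplicates(xyz, rgb)
```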
3.2 Correspondence
Identifying matches between two sets of points is an ill-posed problem. To favor lower complexity, we use the nearest neighbor algorithm for the identification of correspondences between two point clouds, similar to the majority of existing metrics (e.g., [9,10,11]). For this purpose, one point cloud is set as the reference and the other as the point cloud under evaluation. Then, for every point \({\textbf{b}}_i\) that belongs to the point cloud under evaluation \({\mathcal {B}}\) (i.e., \({\textbf{b}}_{i} \in {\mathcal {B}}\)), a matching point \({\textbf{a}}_i \in {\mathcal {A}}\) is identified as its nearest neighbor in terms of Euclidean distance, and is registered as its correspondence. Formally, for the point cloud under evaluation, the correspondence function is defined as \(c^{{\mathcal {B}}, {\mathcal {A}}}: {\mathcal {B}} \xrightarrow {} {\mathcal {A}}\) with \(c^{{\mathcal {B}}, {\mathcal {A}}}({\textbf{b}}_i) = {\textbf{a}}_i\).
Note that different sets of matching points are obtained when iterating over the points of \({\mathcal {B}}\) to identify nearest neighbors in \({\mathcal {A}}\), with respect to starting from \({\mathcal {A}}\) to find matches in \({\mathcal {B}}\); that is, when setting \({\mathcal {A}}\) or \({\mathcal {B}}\) as reference, respectively. In our case, we set both the pristine and the impaired point clouds as reference, as further described in Sect. 3.6, and we use a max operation [9,10,11] to obtain a final prediction that is independent of the reference selection. This is commonly referred to in the literature as the symmetric error [8].
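As a sketch of this stage, a k-d tree makes the nearest-neighbor search efficient; the snippet below uses SciPy, with illustrative names, to realize the correspondence function \(c^{{\mathcal {B}}, {\mathcal {A}}}\):

```python
import numpy as np
from scipy.spatial import cKDTree

def correspondence(B, A):
    """For every point b_i in B, return its nearest neighbor in A
    (the correspondence function c^{B,A} of Sect. 3.2)."""
    tree = cKDTree(A)              # index the reference once
    _, idx = tree.query(B, k=1)    # Euclidean nearest neighbor per b_i
    return A[idx]                  # matched points a_i, one per b_i

A = np.array([[0., 0., 0.], [10., 0., 0.]])
B = np.array([[1., 0., 0.], [9., 0., 0.], [4., 0., 0.]])
matches = correspondence(B, A)
```

Note that `correspondence(B, A)` and `correspondence(A, B)` generally disagree, which is exactly why both directions are computed and fused with a max operation later on.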
3.3 Descriptors
A set of 15 geometric and 8 textural descriptors is defined per point, to reflect local properties of point cloud topology and appearance, respectively. The majority of these descriptors are extracted after applying PCA on spatial neighborhoods of geometric coordinates and textural values, correspondingly. Specifically, provided a query point \({\textbf{p}}_i\), we identify a surrounding support region that belongs to the same point cloud, forming a set \({\textbf{P}}_i\) that consists of points \({\textbf{p}}_{n} \in {\textbf{P}}_i\). The covariance matrix \(\mathbf {\Sigma }_i\) of this set is computed, as shown in Eq. (1):

\[\mathbf {\Sigma }_i = \frac{1}{|{\textbf{P}}_{i}|} \sum _{{\textbf{p}}_{n} \in {\textbf{P}}_i} \left( {\textbf{p}}_{n} - \mathbf {{\overline{p}}}_{i} \right) \left( {\textbf{p}}_{n} - \mathbf {{\overline{p}}}_{i} \right) ^T \qquad (1)\]

with \(|{\textbf{P}}_{i}|\) indicating the cardinality, and \(\mathbf {{\overline{p}}}_{i}\) the centroid of \({\textbf{P}}_i\), which is given in Eq. (2):

\[\mathbf {{\overline{p}}}_{i} = \frac{1}{|{\textbf{P}}_{i}|} \sum _{{\textbf{p}}_{n} \in {\textbf{P}}_i} {\textbf{p}}_{n} \qquad (2)\]
Eigendecomposition is then applied to the covariance matrix, which is symmetric and positive semi-definite and, thus, its eigenvalues exist, are non-negative, and correspond to an orthogonal system of eigenvectors. Eigenvectors indicate directions across which the data are mostly dispersed, while eigenvalues denote the variance of the transformed data across the principal axes.
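The centroid, covariance, and eigendecomposition steps can be illustrated as follows; this is a minimal NumPy sketch under the definitions of Eqs. (1) and (2), not the released implementation:

```python
import numpy as np

def local_pca(P):
    """Eigendecomposition of the covariance of a neighborhood P (Eqs. 1-2).

    P: (n, 3) array of either coordinates (geometry) or RGB values (texture).
    Returns eigenvalues in descending order and matching eigenvectors
    as columns.
    """
    centroid = P.mean(axis=0)            # Eq. (2)
    Q = P - centroid
    cov = Q.T @ Q / len(P)               # Eq. (1)
    lam, E = np.linalg.eigh(cov)         # symmetric PSD: real, >= 0 eigenvalues
    order = np.argsort(lam)[::-1]        # lambda_1 >= lambda_2 >= lambda_3
    return lam[order], E[:, order]

# A neighborhood spread mainly along x: lambda_1 dominates, and the first
# eigenvector aligns with the x axis.
P = np.array([[0., 0., 0.], [1., 0.1, 0.], [2., -0.1, 0.], [3., 0., 0.]])
lam, E = local_pca(P)
```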
3.3.1 Geometric descriptors
For the computation of geometric descriptors, the coordinates of the points that belong to \({\textbf{P}}_i\) are used; hence, in Eqs. 1 and 2, we set \({\textbf{p}}_i = (x_i, y_i, z_i)^T\). Let us assume that \({\textbf{e}}^{g}_{1}\), \({\textbf{e}}^{g}_{2}\), and \({\textbf{e}}^{g}_{3}\) denote the eigenvectors that correspond to the eigenvalues \(\lambda ^{g}_1\), \(\lambda ^{g}_2\) and \(\lambda ^{g}_3\), with \(\lambda ^{g}_1> \lambda ^{g}_2 > \lambda ^{g}_3\), obtained after eigendecomposition of the covariance matrix. Moreover, let us define \({\textbf{u}}_x = (1, 0, 0)^T\), \({\textbf{u}}_y = (0, 1, 0)^T\) and \({\textbf{u}}_z = (0, 0, 1)^T\) to depict unit vectors across the x, y and z axis, respectively. Eigenvalues, eigenvectors, and unit vectors are employed to construct the proposed geometric descriptors, \({\textbf{d}}^{g} \in {\mathbb {R}}^{1 \times 15}\), which are defined in Table 1. As can be seen, each descriptor corresponds to an interpretable shape property. Intuitively, \(d^{g}_{1-4}\) denote the individual (i.e., \(d^{g}_{1-3}\)) and the aggregated sum (i.e., \(d^{g}_{4}\)) of eigenvalues that indicate dispersion magnitudes for the point distribution across the principal axes. \(d^{g}_{5-7}\) reveal behaviors of a neighborhood's points arrangement, capturing the dimensionality of the local surface. \(d^{g}_{8}\) focuses on data variation across the \(1^{\text {st}}\) and the \(3^{\text {rd}}\) principal directions. \(d^{g}_{9-11}\) provide an estimate of spread, uncertainty, and variation of the underlying surface, respectively, considering all principal axes. \(d^{g}_{12}\) quantifies the projected error of a queried point from its neighborhood's centroid, across the estimated normal vector, \({\textbf{e}}^{g}_3\). Finally, \(d^{g}_{13-15}\) measure the projected error of \({\textbf{e}}^{g}_3\) across unit vectors parallel to the Cartesian coordinate system axes where a point cloud lies.
In summary, \(d^{g}_{1-11}\) capture patterns in data dispersion, \(d^{g}_{12}\) local roughness, and \(d^{g}_{13-15}\) the direction of data dispersion.
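For illustration, several classical eigenvalue-based shape descriptors of the kind collected in Table 1 can be computed as below; the exact set, names, and numbering used by the metric follow the paper's Table 1, so this is an indicative subset rather than a reproduction:

```python
import numpy as np

def shape_descriptors(lam):
    """A few common eigenvalue-based shape descriptors (indicative subset).

    lam: eigenvalues with lam[0] >= lam[1] >= lam[2] >= 0.
    """
    l1, l2, l3 = lam
    s = l1 + l2 + l3
    eps = np.finfo(float).eps  # guard against division by zero
    return {
        "sum": s,                                   # aggregated dispersion
        "linearity":  (l1 - l2) / (l1 + eps),       # 1D-like arrangement
        "planarity":  (l2 - l3) / (l1 + eps),       # 2D-like arrangement
        "sphericity": l3 / (l1 + eps),              # 3D-like arrangement
        "anisotropy": (l1 - l3) / (l1 + eps),
        "omnivariance": (l1 * l2 * l3) ** (1.0 / 3.0),
        "eigenentropy": -sum(l / (s + eps) * np.log(l / (s + eps) + eps)
                             for l in lam),         # uncertainty estimate
        "surface_variation": l3 / (s + eps),
    }

d = shape_descriptors(np.array([4.0, 1.0, 0.25]))
```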
3.3.2 Textural descriptors
The red, green, blue (RGB) color values serve as the first three descriptors of a point, noted as \(d^{t}_{1-3}\). For the computation of PCA-based textural descriptors, the RGB color values of the points that belong to \({\textbf{P}}_i\) are employed; hence, we set \({\textbf{p}}_i = (\text {R}_i, \text {G}_i, \text {B}_i)^T\) in Eqs. 1 and 2 and obtain the eigenvalues \(\lambda ^{t}_1\), \(\lambda ^{t}_2\) and \(\lambda ^{t}_3\), with \(\lambda ^{t}_1> \lambda ^{t}_2 > \lambda ^{t}_3\). The individual (i.e., \(d^{t}_{4-6}\)) and the aggregated sum (i.e., \(d^{t}_{7}\)) of eigenvalues, as well as the eigenentropy (i.e., \(d^{t}_{8}\)), are computed to estimate dispersion magnitudes and uncertainty of the color distribution across one or all principal axes of a local neighborhood, respectively. The formal definition of the textural descriptors, \({\textbf{d}}^{t} \in {\mathbb {R}}^{1 \times 8}\), is given in Table 1.
3.3.3 Support regions
A support region is required around every point sample in order to compute corresponding descriptors. Note that for both geometric and textural PCA-based descriptors (i.e., all excluding \(d^{t}_{1-3}\)), the same support region is used and is specified based on spatial vicinity. In general, there are two alternatives widely employed to specify point cloud neighborhoods: the k nearest neighbor and the range search algorithms, hereafter noted as knn and rsearch, respectively. The former leads to neighborhoods of arbitrary extent and a fixed population of points (k), whereas the latter identifies spherical volumes of the same radius r that enclose varying numbers of samples.
We choose the rsearch algorithm to estimate descriptors. This is justified by our requirement to represent properties of the same surface areas in both the reference and the distorted stimuli. This behavior is granted by the rsearch variant, as opposed to the knn algorithm, which is susceptible to different point densities. For example, in the presence of downsampling, there is no difference between the size of regions identified in the pristine and the impaired point clouds using the rsearch. However, when using knn, larger regions are considered in the impaired point cloud; thus, descriptor values represent properties of underlying surfaces of different sizes.
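The trade-off described above can be demonstrated on synthetic data: under 4x downsampling, rsearch returns fewer points for the same spherical volume, while knn keeps the population fixed and therefore enlarges the neighborhood. The snippet assumes SciPy and illustrative parameter values:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
dense = rng.random((1000, 3))          # stand-in for a reference point cloud
sparse = dense[::4]                    # 4x downsampled "distorted" version

r, k = 0.2, 32                         # illustrative radius and k
t_dense, t_sparse = cKDTree(dense), cKDTree(sparse)
q = np.array([0.5, 0.5, 0.5])          # a query point in the interior

# rsearch: the same spherical volume in both clouds, fewer points when sparser.
n_dense = len(t_dense.query_ball_point(q, r))
n_sparse = len(t_sparse.query_ball_point(q, r))

# knn: the same number of points, hence a larger volume in the sparser cloud.
d_dense, _ = t_dense.query(q, k=k)
d_sparse, _ = t_sparse.query(q, k=k)
```

With the seed above, the rsearch counts differ (fewer points in the sparse cloud) while the knn neighborhood radius, i.e., the distance to the k-th neighbor, grows in the sparse cloud.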
3.4 Statistical features
A set of 46 statistical features is computed per point, after applying two statistical functions to geometric and textural descriptor values that lie in the same neighborhood, to capture inter-point local relations (e.g., [11, 13]). In particular, the mean is computed to provide a smoother estimate of a surface property (i.e., either geometric or textural), accounting for a broader region. The standard deviation is also obtained, to quantify the level of variation of a surface property in the surrounding area. Considering a query point \({\textbf{p}}_i\), we identify a support region defined as a set \(\mathbf {{\widehat{P}}}_{i}\) that consists of neighboring points \({\textbf{p}}_{{\hat{n}}} \in \mathbf {{\widehat{P}}}_{i}\). The first statistical feature of point \({\textbf{p}}_{i}\) is computed per Eq. 3:

\[\mu _{i}(d_{u}^{\omega }) = \frac{1}{|\mathbf {{\widehat{P}}}_{i}|} \sum _{{\textbf{p}}_{{\hat{n}}} \in \mathbf {{\widehat{P}}}_{i}} d_{u}^{\omega }({\textbf{p}}_{{\hat{n}}}) \qquad (3)\]
where \(d_{u}^{\omega }({\textbf{p}}_{{\hat{n}}})\) denotes a descriptor relative to point \({\textbf{p}}_{{\hat{n}}}\) from either the geometry (\(g\)) or the texture (\(t\)) domain, \(\omega \in \lbrace g, t\rbrace\), with \(u \in \lbrace 1, 2, ..., 15 \rbrace\) if \(\omega = g\), and \(u \in \lbrace 1, 2, ..., 8 \rbrace\) if \(\omega = t\). The second statistical feature of point \({\textbf{p}}_i\) is then obtained from Eq. 4:

\[\sigma _{i}(d_{u}^{\omega }) = \sqrt{\frac{1}{|\mathbf {{\widehat{P}}}_{i}|} \sum _{{\textbf{p}}_{{\hat{n}}} \in \mathbf {{\widehat{P}}}_{i}} \left( d_{u}^{\omega }({\textbf{p}}_{{\hat{n}}}) - \mu _{i}(d_{u}^{\omega }) \right) ^2} \qquad (4)\]
For point \({\textbf{p}}_i\), we denote with \(\varvec{\mu }_i \in {\mathbb {R}}^{1\times 23}\) the concatenation of all \({\mu }_{i}(d_{u}^{\omega })\), for all descriptors from geometry followed by texture domain; analogously, we denote with \(\varvec{\sigma }_i \in {\mathbb {R}}^{1\times 23}\) the concatenation of all \({\sigma }_{i}(d_{u}^{\omega })\). A complete statistical features vector is given as \(\varvec{\phi }_i = [\varvec{\mu }_{i}, \varvec{\sigma }_{i}] \in {\mathbb {R}}^{1\times 46}\). In Fig. 2, indicative visual examples of statistical features are presented.
Statistical features are able to better capture dependencies within local neighborhoods, and provide measurements that are more perceptually coherent with respect to single points. Specifically, they are well-aligned with primary characteristics of the human visual system, such as low-pass filtering and sensitivity to high-frequency variations. Applying the mean in local regions mimics the former, whereas the standard deviation provides an estimate of the latter. Moreover, statistical features are computed per point and contain contributions from its surroundings, thus alleviating the negative effects of an erroneous correspondence or outlying descriptor values. That is, considering impaired stimuli that are characterized by point removal or displacement with respect to their pristine positions, errors might be introduced by the matching algorithm, or descriptor values might be poorly estimated. Hence, comparing means instead of raw descriptor values mitigates the error.
3.4.1 Support regions
We choose the knn algorithm to compute statistical features. We argue that, in this case, the operating principle of this approach is beneficial for revealing topological deformations. In particular, by appending neighboring samples until reaching k, we consider larger areas in a sparser impaired stimulus, and we recruit erroneous points in case of repositioning. Thus, larger differences will be observed in comparison to corresponding measurements taken from the pristine content. In simpler terms, using knn allows us to penalize point sparsity and displacement.
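Given knn support regions and precomputed per-point descriptors, Eqs. 3 and 4 amount to a mean and a standard deviation taken over neighbor indices; the sketch below uses SciPy, with illustrative names and array layout:

```python
import numpy as np
from scipy.spatial import cKDTree

def statistical_features(xyz, descriptors, k=8):
    """Per-point mean and standard deviation of each descriptor over a knn
    neighborhood (Eqs. 3-4); k is a free parameter here.

    xyz: (N, 3) coordinates; descriptors: (N, D) per-point descriptor values.
    Returns (N, 2*D): the concatenation [mu_i, sigma_i] for every point.
    """
    _, nn = cKDTree(xyz).query(xyz, k=k)   # (N, k) neighbor indices
    vals = descriptors[nn]                 # (N, k, D) neighborhood values
    mu = vals.mean(axis=1)                 # Eq. (3)
    sigma = vals.std(axis=1)               # Eq. (4), population std
    return np.concatenate([mu, sigma], axis=1)

xyz = np.random.default_rng(1).random((100, 3))
desc = np.ones((100, 5))                   # a constant descriptor field
phi = statistical_features(xyz, desc, k=8)
```

On a constant descriptor field, every neighborhood mean is the constant itself and every standard deviation is zero, which is a quick sanity check of the implementation.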
3.5 Comparison
Given the correspondence function \(c^{{\mathcal {B}}, {\mathcal {A}}}({\textbf{b}}_{i}) = {\textbf{a}}_i\) defined in Sect. 3.2, the \(j^{\text {th}}\) statistical feature of point \({\textbf{b}}_i \in {\mathcal {B}}\), namely \(\phi ^{{\mathcal {B}}}_{i,j}\), is compared to the \(j^{\text {th}}\) statistical feature of point \({\textbf{a}}_{i} \in {\mathcal {A}}\), namely \(\phi ^{{\mathcal {A}}}_{i,j}\), using the relative difference as in [11], per Eq. 5:

\[r_{i,j}^{{\mathcal {B}},{\mathcal {A}}} = \frac{\left| \phi ^{{\mathcal {B}}}_{i,j} - \phi ^{{\mathcal {A}}}_{i,j} \right| }{\max \left( \left| \phi ^{{\mathcal {B}}}_{i,j} \right| , \left| \phi ^{{\mathcal {A}}}_{i,j} \right| \right) + \varepsilon } \qquad (5)\]
where \(r_{i,j}^{{\mathcal {B}},{\mathcal {A}}}\) indicates the derived error sample that corresponds to \({\textbf{b}}_i\), with \(1 \le i \le |{\mathcal {B}}|\) and \(1 \le j \le 46\), while \(\varepsilon\) represents a small constant to avoid undefined operations; in this case, we use the machine rounding error for floating point numbers. This computation is repeated for all \({\textbf{b}}_i\), and corresponding error samples \(r_{i,j}^{{\mathcal {B}},{\mathcal {A}}}\) are obtained.
3.6 Predictors
For every statistical feature j, the error samples of \({\mathcal {B}}\) are pooled together, as shown in Eq. 6:

\[s_{j}^{{\mathcal {B}},{\mathcal {A}}} = \frac{1}{|{\mathcal {B}}|} \sum _{i=1}^{|{\mathcal {B}}|} r_{i,j}^{{\mathcal {B}},{\mathcal {A}}} \qquad (6)\]
The same computations are repeated after setting the point cloud \({\mathcal {B}}\) as the reference, provided the correspondence function \(c^{{\mathcal {A}}, {\mathcal {B}}}({\textbf{a}}_k) = {\textbf{b}}_k\), and a corresponding measurement \(s_{j}^{{\mathcal {A}},{\mathcal {B}}}\) is computed. Finally, for every statistical feature \(j\), a corresponding predictor \(s_j\), with \(1 \le j \le 46\), is obtained after applying the symmetric max operation similarly to [9,10,11], per Eq. 7:

\[s_{j} = \max \left( s_{j}^{{\mathcal {B}},{\mathcal {A}}}, s_{j}^{{\mathcal {A}},{\mathcal {B}}} \right) \qquad (7)\]
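Putting the comparison, pooling, and symmetric max stages together, a minimal sketch follows; pooling the error samples by averaging is an assumption here, and all names are illustrative:

```python
import numpy as np

EPS = np.finfo(float).eps  # machine rounding error, as in the comparison stage

def directional_score(phi_eval, phi_ref):
    """Relative differences (Eq. 5) pooled over all points (Eq. 6, here by
    averaging) for each statistical feature.

    phi_eval: (N, F) features of the evaluated cloud; phi_ref: (N, F) features
    of the corresponding (matched) reference points.
    """
    r = np.abs(phi_eval - phi_ref) / (
        np.maximum(np.abs(phi_eval), np.abs(phi_ref)) + EPS)   # Eq. (5)
    return r.mean(axis=0)                                      # Eq. (6)

# Symmetric predictors (Eq. 7): the max over the two reference choices,
# shown on a toy pair of two-feature vectors.
phi_BA_eval, phi_BA_ref = np.array([[1.0, 2.0]]), np.array([[1.0, 4.0]])
phi_AB_eval, phi_AB_ref = np.array([[1.5, 2.0]]), np.array([[1.0, 2.0]])
s = np.maximum(directional_score(phi_BA_eval, phi_BA_ref),
               directional_score(phi_AB_eval, phi_AB_ref))
```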
3.7 Quality score
Each predictor \(s_{j}\) provides a quality rating based on the \(j^{\text {th}}\) statistical feature. To combine all 46 predictors into a total quality score, q, any linear or non-linear regression model can be used. Machine learning-based regression models have been extensively used to tackle this problem in the domain of quality assessment. As part of our metric, we use the Random Forest algorithm, an ensemble learning method that can improve prediction performance with respect to single features while limiting overfitting. Note that we evaluate the impact of using different regression models on the performance of our method in Sect. 6.4.
3.8 Complexity
The total complexity of the algorithm is dominated by the operations that require the definition of a support region using the r-search and kNN algorithms for the computation of descriptors and statistical features, as described in Sects. 3.3 and 3.4, respectively. For a point cloud \({\mathcal {P}}\) with \(|{\mathcal {P}}|\) points, such operations generally have average complexity \(O(|{\mathcal {P}}| \log |{\mathcal {P}}|)\) for well-behaved cases, and \(O(|{\mathcal {P}}|^2)\) in the worst-case scenario. Thus, an upper bound of the complexity of the algorithm can be defined as \(O(N^2)\), in which \(N = \max (|{\mathcal {A}}|, |{\mathcal {B}}|)\).
4 Benchmarking setup
4.1 Selection of datasets
Three subjectively annotated datasets are used to evaluate the performance of the proposed and state-of-the-art quality metrics under consideration, namely, M-PCCD (D1) [7], SJTU (D2) [43], and WPC (D3) [55]. D1 consists of 8 colored static point clouds illustrating both human figures and inanimate objects, whose geometry and color are encoded using V-PCC and four G-PCC variants (i.e., Octree plus Lifting, Octree plus RAHT, TriSoup plus Lifting, and TriSoup plus RAHT), resulting in 232 distorted stimuli. D2 comprises 9 colored point clouds depicting both human figures and inanimate objects that are subject to octree-based compression, color noise, geometry Gaussian noise, downscaling, and a superposition of every combination of two of the aforementioned degradations excluding compression, for a total of 378 distorted stimuli. Finally, D3 contains 20 colored point clouds depicting inanimate objects that are subject to octree-based downsampling, a superposition of geometric and color Gaussian noise, and a superposition of geometric and color compression distortions using a TriSoup-based and an Octree-based G-PCC variant, as well as V-PCC, for a total of 740 distorted stimuli.
4.2 Computation of performance indexes
To evaluate the performance of an objective quality metric in predicting perceptual quality, Mean Opinion Scores (MOS) from subjects participating in dedicated experiments are employed as ground truth. The metrics are typically benchmarked after applying a fitting function to map the objective scores to the subjective quality range, while also accounting for biases, non-linearities, and saturations from subjective testing. Let us define a score obtained by the execution of an objective metric as a Predicted Quality Score (PQS). A predicted MOS, denoted as P(MOS), is estimated by applying the fitting function on the [PQS, MOS] data set. In our analysis, the Recommendation ITU-T J.149 [56] is followed, using the logistic function type II. Then, the Pearson Linear Correlation Coefficient (PLCC), the Spearman Rank Order Correlation Coefficient (SROCC), and the Root Mean Square Error (RMSE) are computed between the P(MOS) and MOS to draw conclusions on the linearity, monotonicity, and accuracy of the objective quality metrics, respectively.
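A sketch of this benchmarking procedure (assuming SciPy; a common four-parameter logistic is used here, which may differ in detail from the exact Rec. ITU-T J.149 type II form, and the PQS/MOS data are synthetic):

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr, spearmanr

def logistic(x, b1, b2, b3, b4):
    # a common 4-parameter logistic fitting function
    return b1 + b2 / (1.0 + np.exp(-b3 * (x - b4)))

rng = np.random.default_rng(2)
pqs = rng.uniform(0, 1, 40)                               # objective scores
mos = logistic(pqs, 1.0, 4.0, 8.0, 0.5) + rng.normal(0, 0.1, 40)

params, _ = curve_fit(logistic, pqs, mos, p0=[1, 4, 8, 0.5], maxfev=10000)
pmos = logistic(pqs, *params)                             # P(MOS)

plcc = pearsonr(pmos, mos)[0]                # linearity
srocc = spearmanr(pmos, mos)[0]              # monotonicity
rmse = np.sqrt(np.mean((pmos - mos) ** 2))   # accuracy
assert plcc > 0.9 and srocc > 0.9
```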
4.3 Configuration and execution of objective quality metrics
State-of-the-art objective quality metrics are employed in our performance evaluation analysis for comparison purposes. In particular, we use the point-to-point, point-to-plane [9], and color PSNR on the luminance component, which are used in the MPEG standardization activities for point cloud compression. We also use the plane-to-plane [10], the joint point-to-distribution metric [28] with logarithmic values, the BitDance [35], the PointSSIM [11] (on geometry, normal, curvature, and luminance), the PCQM [13], and the MPED [37].
To compute the point-to-point and point-to-plane metrics, the software version 0.13.5 [26] is used. For the latter, the normals are computed using quadric fitting with r-search and \(r = 0.01 \times B_{R}\), where \(B_{R}\) indicates the maximum length of the bounding box of the reference point cloud. For plane-to-plane, the normals are computed based on quadric fitting with r-search and \(r = 0.02 \times B_{R}\), following literature best practices [57]. In the point-to-distribution metric, neighborhoods consisting of \(k = 31\) point samples are considered. For BitDance, we use the recommended configurations, namely, \(k = 6\) for the target voxel edge size, while the neighborhood size is set to 6/12 and the label bits to 16/8 for the geometry/color histogram. For PointSSIM, the default parameters are employed, with the variance as the selected estimator of statistical dispersion, and \(k = 12\); for the computation of curvatures and normals, quadric fitting with r-search and \(r = 0.01 \times B_{R}\) was used. In PCQM, the default configurations are used. For MPED [37], the default settings are employed, with L defined as a fraction of the total number of points (i.e., 1/10000), and the square of the \(\ell ^2\) norm adopted as the distance function. For PointPCA, the PCA-based descriptors (i.e., all except \(d_{1-3}^{t}\)) are estimated using r-search with \(r = 0.008 \times B_{R}\), while for the statistical features, the kNN algorithm with \(k = 9\) is used. The Random Forest regression method is implemented using the scikit-learn Python framework [58] with the default configuration, namely, MSE as the split criterion and 100 trees. Note that results from the PSNR versions of point-to-point and point-to-plane are not reported, due to the presence of infinity values, which prevented correlation computations and fair comparison.
4.4 Evaluation of objective quality metrics
As part of our analysis, we evaluate the performance of each individual predictor on the datasets D1, D2, and D3; in this case, all contents of each dataset are considered. Moreover, we evaluate the performance of PointPCA after fusing individual predictors using learning-based regression. However, such a validation requires splitting the datasets into training and testing sets. In our analysis, performance indexes are computed and provided only for the testing counterparts. In particular, PointPCA quality prediction models obtained using either Random Forest (i.e., as part of our architecture, with results reported in Sect. 5.2) or other regression models (i.e., as part of our comparative analysis in Sect. 6.4) are validated both within and across datasets using the leave-p-out method. Specifically, each dataset is split into two partitions that contain 80% and 20% of the contents for training and testing, respectively, with all the distorted versions of a specific content placed in one partition. For D1, D2, and D3, we use 6/2, 7/2, and 16/4 contents for training/testing, respectively. Then, a quality prediction model is trained on the training data and tested on the corresponding testing data of the same dataset, for within-dataset validation. Moreover, the same quality prediction model is tested on each of the other two (entire) datasets for cross-dataset validation. This process is repeated for all possible 80%/20% splits of each dataset, leading to 28, 36, and 4845 testing partitions and an equal number of corresponding quality prediction models for D1, D2, and D3, respectively. The average and the standard deviation of the performance indexes across all testing partitions are reported for the within-dataset validation, while only the average is reported for the cross-dataset validation.
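The partition counts quoted above follow directly from binomial coefficients, since a testing partition is an unordered choice of p contents out of n:

```python
from itertools import combinations
from math import comb

# contents per dataset and test-set size for the 80%/20% content split
datasets = {"D1": (8, 2), "D2": (9, 2), "D3": (20, 4)}

for name, (n, p) in datasets.items():
    # enumerate every possible testing partition of p contents
    splits = list(combinations(range(n), p))
    assert len(splits) == comb(n, p)

# all possible 80%/20% splits per dataset: 28, 36, and 4845
assert [comb(n, p) for n, p in datasets.values()] == [28, 36, 4845]
```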
Finally, we compare PointPCA with state-of-the-art metrics. To enable a fair comparison between PointPCA quality models and non-learning-based metrics from the literature, performance indexes for the latter are computed over the same testing partitions; that is, on the same testing data obtained after applying the leave-p-out method with 80%/20% splits on each dataset, separately. Then, the average and the standard deviation of every performance index are computed across all testing partitions of each dataset (i.e., 28, 36, and 4845 testing partitions for D1, D2, and D3, respectively).
5 Results
5.1 Performance evaluation of predictors
In Fig. 3, the PLCC and SROCC of every predictor are illustrated in the form of bars grouped per descriptor, against subjectively annotated datasets. It can be noticed that the prediction accuracy of the proposed predictors reaches a different performance plateau per dataset; in particular, we observe high performance for D1 and D2, and substantially lower performance for D3. This can be explained by the different distortion characteristics of each dataset. Specifically, geometric-only and textural-only predictors cannot accurately capture combinations of different geometric and textural degradation levels (e.g., D3), whereas better trends are expected when the level of degradation in both geometry and texture is amplified simultaneously (e.g., D1 and D2).
Moreover, the standard deviation is found to perform better than the mean across all datasets, showing a certain level of consistency. Specifically, for \(d^{g}_{3, 7, 8, 9, 11, 14, 15}\) and \(d^{t}_{4, 5, 7, 8}\) the standard deviation performs steadily better than the mean, while the mean is superior only for \(d^{g}_{12}\). For the remaining descriptors, different behaviors are observed across datasets, although the differences are limited. For instance, for \(d^{g}_{1}\), the standard deviation exhibits higher accuracy in D1 compared to the mean, with the opposite being true for D2, while equivalent performance is observed in D3.
Finally, it is remarked that predictors using the textural descriptors \(d^{t}_{4, 7, 8}\) are consistently ranked among the top positions across all datasets. In general, they are found to be superior to every geometric predictor in D1 and D3, while in D2 they show high predictive power, despite the fact that geometric predictors perform overall better in this dataset. The high effectiveness of textural predictors can be justified by considering that they incorporate a spatial dimension through the usage of geometric neighborhoods for the computation of descriptors and statistical features. Therefore, they not only explicitly evaluate textural distortions, but also capture topological deformations in an implicit manner.
The above observations are in alignment with the results presented in Fig. 4, where the importance ranking scores of the proposed predictors are depicted. Specifically, the average ranking order of every predictor is computed across all datasets based on the average PLCC and SROCC. The average ranking order is then scaled to the range [1–100], with 1 indicating the minimum and 100 the maximum importance score, corresponding to the lowest and highest average ranking order, respectively. Importance ranking scores are grouped and stacked per descriptor (blue corresponds to the \({\mu }(d_{u}^{\omega })\) and red to the \({\sigma }(d_{u}^{\omega })\) statistical feature), before being sorted in descending order based on their aggregated sum. Thus, the final ranking scale ranges between [3–199]. The results show that the predictor based on \(\sigma (d^{t}_{4})\) achieves the highest score, with predictors based on \(\sigma (d^{t}_{7})\) and \(\sigma (d^{t}_{8})\) closely following. These results confirm the superiority of textural predictors based on \(d^{t}_{4, 7, 8}\), as already noted in Fig. 3.
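A hypothetical sketch of the scaling step, assuming a linear min-max mapping of average ranking orders to [1, 100]:

```python
import numpy as np

def importance_scores(avg_rank):
    """Linearly map average ranking orders to [1, 100]; the lowest
    average ranking order maps to 1 and the highest to 100."""
    lo, hi = avg_rank.min(), avg_rank.max()
    return 1.0 + 99.0 * (avg_rank - lo) / (hi - lo)

# toy average ranking orders for four predictors
ranks = np.array([3.0, 10.0, 46.0, 24.0])
scores = importance_scores(ranks)
assert scores.min() == 1.0 and scores.max() == 100.0
```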
5.2 Performance evaluation of PointPCA
Table 2 shows the performance of PointPCA over the three selected datasets, for both within- and cross-dataset validation as described in Sect. 4.4. Substantial improvements are observed when combining predictors compared to using them individually, as depicted in Fig. 3. In particular, significant performance boosts are observed for D3, which is the most populated dataset with the most diverse distortion types. Notable gains are also shown for D2, while smaller differences are noticed for D1.
As expected, within-dataset results generally achieve better performance than cross-dataset results. Considering cross-dataset validation results, training on D1 leads to poor generalization capabilities on D2 and D3, compared to training on D3 and D2, respectively. Training on D2 leads to better generalization on D1 with respect to training on D3, while the performance on D3 remains low. These results can be explained by the intrinsic characteristics of the datasets; D1 contains only compression distortions with both human and object models, D2 additionally employs geometric and color noise, while D3 is the most diverse in terms of distortion types, containing only objects (see Sect. 4.1).
5.3 Comparison with the state of the art
In Table 3, we show performance results of PointPCA and existing point cloud quality metrics across the selected datasets, for comparison purposes. Specifically, we report the performance indexes as obtained from the within-dataset validation of PointPCA and the evaluation of the alternative metrics on the same testing partitions, as described in Sect. 4.4. Our results suggest that the PointPCA metric achieves the best performance in all datasets with high scores. Considering D1, the luminance-based PointSSIM variant achieves the second-best performance in terms of PLCC and RMSE, followed by PCQM, which attains the second-best performance in terms of SROCC. The PCQM is consistently ranked as the second-best option in D2 and D3, followed by the MPED, and the normal- and curvature-based variants of the PointSSIM. It is evident that in D2 and D3, our proposed metric achieves substantial gains in terms of PLCC, SROCC, and RMSE with respect to alternative metrics.
6 Exploratory studies
In this section, we evaluate the impact of several parameters on the performance of the proposed metric to further understand their effect. In particular, we first analyze how the support region sizes influence the performance of individual predictors and total quality scores. Secondly, we explore the usage of different color spaces for the definition of textural descriptors. Thirdly, we study the effect of using predictors coming from only one out of the two attribute domains, namely, geometry or texture, to validate our selection of both. Lastly, we investigate the usage of different regression models to fuse individual predictors to total quality scores.
6.1 Support regions
In this first study, we aim to understand the impact of varying the size of the support regions over which we compute descriptors and statistical features on the performance of our metric. Please note that for the former, we use r-search to define a support region, whereas for the latter we employ kNN, as explained in Sects. 3.3 and 3.4, respectively.
It is worth noting that there is an interdependency between support regions for descriptors and for statistical features. For example, decreasing the descriptors' support region leads to descriptor values being more susceptible to noise; thus, neighboring descriptor values will exhibit greater differences, better capturing high-frequency components. On the contrary, increasing the descriptors' support region causes a loss of fine details and is equivalent to smoothing the surface properties, or applying a low-pass filter; in this case, neighboring descriptor values will be similar. At the same time, lowering the statistical features' support region implies that the descriptor values under consideration will be similar, given that they are adjacent and reflect surface properties from very close vicinities. Conversely, increasing the statistical features' support region decreases the error due to the larger sample size; yet, it increases the dispersion between descriptor values due to the recruitment of remote, spatially irrelevant samples. Thus, there is a need to evaluate the effect of their configuration on the performance of the proposed metric. For this purpose, we initially fix the descriptors' and alter the statistical features' support region size; then, we fix the statistical features' and alter the descriptors' support region size.
6.1.1 Support regions for statistical features
In this case, we compute the statistical features using the kNN algorithm with \(k = \lbrace 9, 25, 49, 81 \rbrace\), and the descriptors using r-search with \(r = 0.008 \times B_{R}\). Our selection of k values is based on the fact that the point clouds of the datasets under consideration are voxelized, dense, and represent large models; thus, we may assume that small point neighborhoods represent local regions, which in turn can be approximated by planar surfaces. The selected k values represent the number of vertices in fully occupied square planes of side length equal to 2, 4, 6, and 8 times the distance between two voxels. Figure 5 illustrates the SROCC values achieved by every predictor \(s_{j}\), \(1 \le j \le 46\), and the average SROCC values across all testing partitions attained by the total quality score q (i.e., the PointPCA metric), with different colors indicating the performance over different k values. Recall that predictors \(s_{1-23}\) make use of the mean, while predictors \(s_{24-46}\) employ the standard deviation. Moreover, for every statistic, the first 15 predictors refer to the geometry and the last 8 to the texture domain. Our results show that the selected neighborhood size for the computation of statistical features does not have a large impact on the performance, with the trends indicating that predictors perform better under smaller rather than larger neighborhoods. Moreover, it can be observed that the total quality scores q always outperform each individual predictor \(s_{j}\). Finally, different neighborhood sizes lead to minor differences in the performance of total quality scores, slightly favoring smaller neighborhoods.
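The correspondence between the selected k values and fully occupied square patches can be checked directly: a patch whose side spans s inter-voxel distances contains (s + 1)^2 vertices:

```python
# Vertices in a fully occupied square patch of a voxel grid: a patch
# whose side spans s inter-voxel distances contains (s + 1)^2 vertices.
sides = [2, 4, 6, 8]
ks = [(s + 1) ** 2 for s in sides]
assert ks == [9, 25, 49, 81]
```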
6.1.2 Support regions for descriptors
In this case, we compute the descriptors using the r-search algorithm with \(r = \lbrace 0.006 \times B_{R}, 0.008 \times B_{R}, 0.01 \times B_{R} \rbrace\), and the statistical features using kNN with \(k = 9\). Our selection of r values is inspired by the current literature (e.g., [11, 13]), where similar volume sizes have been used to compute point cloud features for objective quality assessment. Figure 6 shows the SROCC values achieved by every predictor \(s_{j}\) with \(1 \le j \le 46\), and the average SROCC values of the total quality score q. Our results indicate no clear pattern in the performance of mean-based predictors across all datasets, with geometric predictors (i.e., \(s_{1-15}\)) showing no consistent trends, and textural predictors (i.e., \(s_{16-23}\)) performing better in smaller neighborhoods. For the majority of predictors that employ the standard deviation (i.e., \(s_{24-46}\)), though, larger neighborhood sizes are preferable. Please note that no differences can be observed across different r values for the textural predictors \(s_{16-18}\) and \(s_{39-41}\), since they do not employ a support region for the computation of the corresponding descriptors (i.e., these are the non-PCA-based descriptors, \(d^{t}_{1-3}\), equal to the RGB color values). After fusing predictors into a total quality score q, we observe clear benefits with respect to individual predictors \(s_j\). Finally, considering total quality scores, marginal differences with slight gains for mid over smaller or larger neighborhood sizes are remarked.
6.1.3 Final selection
Our results confirm that the total quality scores lead to high prediction accuracy under all tested configurations for the descriptors' and statistical features' support region sizes. In the proposed settings of our metric, we set \(r = 0.008 \times B_R\) and \(k = 9\) for descriptors and statistical features, respectively.
6.2 Color spaces
In this study, we examine the performance achieved by the proposed metric when computing the same textural descriptors in alternative color spaces that are popular in the literature. In particular, alongside the RGB color space, we use YCbCr, which has been widely used for objective quality assessment; in our case, the color space conversion is performed following the ITU-R Recommendation BT.709 [59]. Moreover, we employ the GCM [60], which is reported to correlate well with human perception, and CIELAB [61], which was standardized by the International Commission on Illumination in 1976 and designed for perceptual uniformity. Note that in this analysis, we use all predictors from both geometric and textural domains. Specifically, instead of using textural predictors only, we additionally include geometric predictors to compute total quality scores, which are then compared to subjective ground truth ratings. This way, we do not explicitly assess the performance of the same textural predictors under different color spaces; rather, we explore the effect of different color spaces on the performance of the proposed metric and aim to identify the one that leads to the most beneficial interactions between geometric and textural predictors. Similarly to the analysis of Sect. 5.2, we learn optimal weights for all predictors per dataset and test the accuracy of the learned models in both within- and cross-dataset validation.
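For reference, the RGB-to-YCbCr step can be sketched as follows (assuming NumPy and full-range analog Y'CbCr derived from the BT.709 luma coefficients; the digital offset and quantization of the recommendation are omitted for brevity):

```python
import numpy as np

def rgb_to_ycbcr_bt709(rgb):
    """RGB in [0, 1] to analog Y'CbCr using the BT.709 luma
    coefficients (full-range, no digital offset/scaling)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b   # BT.709 luma
    cb = (b - y) / 1.8556                      # scaled blue-difference
    cr = (r - y) / 1.5748                      # scaled red-difference
    return np.stack([y, cb, cr], axis=-1)

colors = np.array([[1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
ycc = rgb_to_ycbcr_bt709(colors)
assert np.allclose(ycc[0], [1.0, 0.0, 0.0])   # white: full luma, zero chroma
assert np.allclose(ycc[1], [0.0, 0.0, 0.0])   # black
```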
In Table 4, we present the performance indexes obtained from our metric considering different color spaces. In general, small variations in performance can be observed. In the majority of cases, RGB has either equivalent or marginally better performance with respect to the other color spaces. In particular, RGB leads to better performance in within-dataset validation for D1 (PLCC = 0.938, SROCC = 0.942) and D3 (PLCC = 0.894, SROCC = 0.890), whereas it ranks second behind YCbCr for D2 (PLCC = 0.935, SROCC = 0.911 for YCbCr, against PLCC = 0.932, SROCC = 0.907 for RGB). For cross-dataset validation, YCbCr performs better when training on D1 and D2 and testing on D3 (training on D1, testing on D3: PLCC = 0.571, SROCC = 0.574; training on D2, testing on D3: PLCC = 0.690, SROCC = 0.679). GCM performs better when training on D2 and D3 and testing on D1 (training on D2, testing on D1: PLCC = 0.828, SROCC = 0.837; training on D3, testing on D1: PLCC = 0.802, SROCC = 0.835). On the other hand, RGB performs better when training on D1 and D3 and testing on D2 (training on D1, testing on D2: PLCC = 0.808, SROCC = 0.803; training on D3, testing on D2: PLCC = 0.862, SROCC = 0.842). However, as can be seen, the differences are rather small, showing the robustness of our metric with respect to the color space selection.
6.3 Geometric and textural predictors
In this study, we evaluate the impact of using predictors from different attribute domains (i.e., geometry or texture) on the proposed metric. To do so, we compute total quality scores considering geometry-only (i.e., \(s_{1-15}\) and \(s_{24-38}\)) and texture-only predictors (i.e., \(s_{16-23}\) and \(s_{39-46}\)), and we compare their performance with respect to using the whole set (i.e., \(s_{1-46}\)).
Results are shown in Table 5 for all datasets. It can be observed that for within-dataset validation, using both attribute domains leads to steadily better performance with respect to only using one. For D1 and D3, using textural information only leads to better performance than using geometry only (D1: PLCC = 0.930, SROCC = 0.941 for texture only, versus PLCC = 0.903, SROCC = 0.907 for geometry only; D3: PLCC = 0.823, SROCC = 0.812 for texture only, versus PLCC = 0.662, SROCC = 0.625 for geometry only), whereas for D2, the opposite is true (D2: PLCC = 0.911, SROCC = 0.868 for geometry only, versus PLCC = 0.882, SROCC = 0.864 for texture only). This can be explained considering the nature of the datasets, namely, while D1 and D3 contain compression distortions where geometry and texture are simultaneously affected, D2 contains several point clouds with only geometry or only texture distortions.
For cross-dataset validation, we can observe that when testing on D1, using texture-only predictors leads to better performance with respect to using the whole set, whereas when testing on D2, using the whole set leads to consistently better results. When training on D1 and testing on D3, textural information leads to the best performance; however, when training on D2, using the whole set is preferable. In general, we see that using predictors from both attribute domains leads to higher performance, followed by texture-only predictors, with geometry-only predictors denoting the least optimal solution.
6.4 Regression models
In this study, we evaluate the performance achieved by the proposed metric when using different regression models to fuse individual predictors into a total quality score. Specifically, Linear regression (R1), K-Nearest Neighbors (R2), Support Vector Regression (R3), XGBoost (R4), and Multi-Layer Perceptron (R5) are examined as alternatives to the proposed Random Forest (R6), as implemented in the scikit-learn Python package [58]. For R1–R4, we use the default parameters. For R5, we use 3 hidden fully connected layers with 128 neurons each; the input nodes are set equal to the number of predictors (i.e., 46) and the output nodes to one; the ReLU activation function and the MSE loss function are employed. For R6, we use MSE as the split criterion. Moreover, our experimentation on the number of trees indicates stable performance from 50 to 350 trees; hence, we keep the default configuration with 100 trees, as mentioned in Sect. 4.3.
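The model configurations can be sketched with scikit-learn as follows (R4 is provided by the separate xgboost package and is omitted here; the toy design matrix and target are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestRegressor

models = {
    "R1": LinearRegression(),
    "R2": KNeighborsRegressor(),
    "R3": SVR(),
    "R5": MLPRegressor(hidden_layer_sizes=(128, 128, 128),  # 3 hidden layers
                       activation="relu", random_state=0),  # MSE loss is default
    "R6": RandomForestRegressor(n_estimators=100, random_state=0),
}

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, (50, 46))   # toy design matrix: 50 stimuli x 46 predictors
y = X.mean(axis=1)                # synthetic target

for name, model in models.items():
    model.fit(X, y)
    assert model.predict(X[:5]).shape == (5,)
```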
Performance results for every quality prediction model are presented in Table 6. As can be seen, the performance remains high and stable for the majority of regression models when training and testing on the same dataset; drops are observed using R1 with D1 and D2, and also using R3 with D3. R3 is the best-performing model in D1 (PLCC = 0.941, SROCC = 0.942), whereas for D2 and D3, R6 is the best for within-dataset validation (D2: PLCC = 0.932, SROCC = 0.907; D3: PLCC = 0.894, SROCC = 0.890).
Regarding the performance of the tested regression models, R1 seems to be the weakest option, with limited generalization capabilities, independently of the dataset used for training. For the remaining regression models, the trends are similar, although different selections lead to the best generalization results per training dataset. For instance, when training on D1, R3 and R6 show higher generalization capabilities on D2 (PLCC = 0.813, SROCC = 0.793 for R3; PLCC = 0.808, SROCC = 0.803 for R6), while R2 is the best for D3 (PLCC = 0.621, SROCC = 0.623). When training on D2, R3 is the best option on D1 (PLCC = 0.910, SROCC = 0.922) and D3 (PLCC = 0.685, SROCC = 0.675), while R2 and R6 achieve second-best performances, respectively. Finally, when training on D3, R3 obtains the best performance on D1 by large margins (PLCC = 0.840, SROCC = 0.854), whereas on D2, R6 outperforms the rest (PLCC = 0.862, SROCC = 0.842) and is closely followed by R3 (PLCC = 0.862, SROCC = 0.830).
To see whether the difference in results between different regressors had statistical significance, we ran a two-tailed t-test on the performance indexes obtained when training and testing on the same dataset, for all regressor pairs, across all the splits. For D1, R1 had statistically significant differences with respect to all other regressors, according to all performance indexes (\(p < 0.001\) for all comparisons). In terms of PLCC, statistical differences were found between R3 and R5 (\(p = 0.0248\)), and between R6 and R5 (\(p = 0.0465\)); analogous results were obtained in terms of RMSE (R3–R5: \(p = 0.0055\); R6–R5: \(p = 0.0183\)), whereas for SROCC, statistical differences were only observed for R3 with respect to R5 (\(p = 0.0169\)). For D2, R1 was the only regressor exhibiting statistically significant differences with respect to all other regressors, according to all performance indexes (for PLCC and RMSE, \(p < 0.001\) for all comparisons; for SROCC, R1–R3: \(p = 0.0010\); R1–R4: \(p = 0.0028\); \(p < 0.001\) for all other comparisons). Finally, for D3, we found statistically significant differences between all the regressors under test, according to all performance metrics (\(p < 0.001\) for all comparisons). The latter is to be expected due to the large number of training/testing splits, which results in high degrees of freedom for the t-test. In general, the statistical test confirms our previous observations: with the exception of linear regression, all regressors under testing have similarly high performance, which demonstrates the robustness of the predictors with respect to the choice of regression models.
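The significance test can be sketched as follows (assuming SciPy and a paired test, since both regressors are evaluated on the same splits; the per-split PLCC values are synthetic):

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(4)
# hypothetical per-split PLCC values for two regressors over 36 splits
plcc_r6 = rng.normal(0.93, 0.02, 36)
plcc_r1 = plcc_r6 - rng.normal(0.08, 0.02, 36)   # clearly weaker regressor

# two-tailed paired t-test across the same training/testing splits
t, p = ttest_rel(plcc_r6, plcc_r1)
assert p < 0.001   # a difference this consistent is highly significant
```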
In conclusion, R3 and R6 lead to quality prediction models with the highest performance and generalization capabilities. In particular, R3 shows slightly better performance when testing on D1, whereas R6 achieves much better results on D3; on D2, R6 is the best, with R3 attaining comparable performance. Overall, the statistical analysis shows that differences between regressors are not significant for D1 and D2, except for R1, which was always found to be significantly different from the other regressors. It is worth noting that for both R3 and R6, performance indexes from within-dataset validation show improvements over state-of-the-art metrics. Finally, all regression models excluding R1 perform better than alternative metrics in D2 and D3, while in D1 they follow closely, if not surpass them.
7 Conclusion
In this paper, we propose a point cloud objective quality metric that relies on PCA-based shape and appearance descriptors to evaluate distortions in the geometry and color domains, respectively. Statistical functions are applied to the descriptor values in order to capture local relationships between point samples, which are compared between a reference and a point cloud under evaluation, producing predictions of visual quality for the latter. The proposed predictors are assessed individually, showing good overall performance, with some textural variants leading to higher accuracy consistently across all tested datasets. To boost the performance by leveraging the predictive potential of all the proposed predictors and return a single quality score, the Random Forest regression model is employed as part of our architecture. Alternative learning-based models are examined and evaluated, indicating that non-linear variants lead to similarly high performance. Moreover, the selection of parameter configurations, color space, and usage of descriptors from both geometry and texture domains are justified through a series of exploratory studies. Our results show that PointPCA outperforms existing metrics in all tested datasets. Considering that certain predictors are more efficient against particular types of contents and degradations, future work will focus on the identification and adoption of optimal subsets of predictors per use case. Moreover, ensembles of regressors will be tested to increase the prediction power of our predictors.
Availability of data and materials
The data that support the findings of this study are available from the parties that provided the data, as cited in the document [7, 43, 55]. Restrictions may apply to the availability of these data, which were used under license for the current study, and so they might not be publicly available. Data are however available from the authors upon reasonable request and with permission of the third parties. The software developed in this work is available at the following link: https://github.com/cwidis/pointpca_suite, under PointPCA.
References
S. Schwarz, M. Preda, V. Baroncini, M. Budagavi, P. Cesar, P.A. Chou, R.A. Cohen, M. Krivokuća, S. Lasserre, Z. Li, J. Llach, K. Mammou, R. Mekuria, O. Nakagami, E. Siahaan, A. Tabatabai, A.M. Tourapis, V. Zakharchenko, Emerging MPEG standards for point cloud compression. IEEE J. Emerg. Sel. Top. Circuits Syst. 9(1), 133–148 (2019). https://doi.org/10.1109/JETCAS.2018.2885981
T. Ebrahimi, S. Foessel, F. Pereira, P. Schelkens, JPEG Pleno: toward an efficient representation of visual reality. IEEE Multimedia 23(4), 14–20 (2016). https://doi.org/10.1109/MMUL.2016.64
ISO/IEC 23090-5: Information technology–Coded representation of immersive media–Part 5: Visual volumetric video-based coding (V3C) and video-based point cloud compression (V-PCC). International Organization for Standardization (2021)
ISO/IEC 23090-9: Information technology–Coded representation of immersive media–Part 9: Geometry-based point cloud compression. International Organization for Standardization (2023)
P. Astola, L.A. Silva Cruz, E.A. Da Silva, T. Ebrahimi, P.G. Freitas, A. Gilles, K.J. Oh, C. Pagliari, F. Pereira, C. Perra et al., JPEG Pleno: standardizing a coding framework and tools for plenoptic imaging modalities. ITU Journal: ICT Discoveries (2020)
ISO/IEC AWI 21794-6: Information technology–Plenoptic image coding system (JPEG Pleno)–Part 6: Learning-based Point Cloud Coding. International Organization for Standardization (2024)
E. Alexiou, I. Viola, T.M. Borges, T.A. Fonseca, R.L. de Queiroz, T. Ebrahimi, A comprehensive study of the rate-distortion performance in MPEG point cloud compression. APSIPA Trans. Signal Inf. Process. 8, 27 (2019)
E. Alexiou, Y. Nehmé, E. Zerman, I. Viola, G. Lavoué, A. Ak, A. Smolic, P. Le Callet, P. Cesar, Subjective and objective quality assessment for volumetric video. In: Valenzise, G., Alain, M., Zerman, E., Ozcinar, C. (eds.) Immersive Video Technologies (Academic Press, Cambridge, Massachusetts, 2023), pp. 501–552. https://doi.org/10.1016/B978-0-323-91755-1.00024-9
D. Tian, H. Ochimizu, C. Feng, R. Cohen, A. Vetro, Geometric distortion metrics for point cloud compression. In: 2017 IEEE International Conference on Image Processing (ICIP), pp. 3460–3464 (2017). https://doi.org/10.1109/ICIP.2017.8296925
E. Alexiou, T. Ebrahimi, Point cloud quality assessment metric based on angular similarity. In: 2018 IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6 (2018). https://doi.org/10.1109/ICME.2018.8486512
E. Alexiou, T. Ebrahimi, Towards a point cloud structural similarity metric. In: 2020 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), pp. 1–6 (2020). https://doi.org/10.1109/ICMEW46912.2020.9106005
G. Meynet, J. Digne, G. Lavoué, PC-MSDM: a quality metric for 3D point clouds. In: 2019 Eleventh International Conference on Quality of Multimedia Experience (QoMEX), pp. 1–3 (2019). https://doi.org/10.1109/QoMEX.2019.8743313
G. Meynet, Y. Nehmé, J. Digne, G. Lavoué, PCQM: a full-reference quality metric for colored 3D point clouds. In: 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX), pp. 1–6 (2020). https://doi.org/10.1109/QoMEX48832.2020.9123147
L. Hua, M. Yu, G. Jiang, Z. He, Y. Lin, VQA-CPC: a novel visual quality assessment metric of color point clouds. In: Dai, Q., Shimura, T., Zheng, Z. (eds.) Optoelectronic Imaging and Multimedia Technology VII, International Society for Optics and Photonics, vol. 11550 (SPIE, Bellingham, WA, 2020), pp. 244–252
L. Hua, M. Yu, Z. He, R. Tu, G. Jiang, CPC-GSCT: visual quality assessment for coloured point cloud based on geometric segmentation and colour transformation. IET Image Processing (2021)
Z. Zhang, W. Sun, X. Min, T. Wang, W. Lu, G. Zhai, No-reference quality assessment for 3D colored point cloud and mesh models. IEEE Trans. Circuits Syst. Video Technol. 32(11), 7618–7631 (2022). https://doi.org/10.1109/TCSVT.2022.3186894
Z. Li, A. Aaron, I. Katsavounidis, A. Moorthy, M. Manohara, Toward a practical perceptual video quality metric. Netflix Tech Blog 6(2) (2016)
N. Chehata, L. Guo, C. Mallet, Airborne lidar feature selection for urban classification using random forests. In: Laserscanning (2009)
M. Weinmann, B. Jutzi, S. Hinz, C. Mallet, Semantic point cloud interpretation based on optimal neighborhoods, relevant features and efficient classifiers. ISPRS J. Photogramm. Remote. Sens. 105, 286–304 (2015). https://doi.org/10.1016/j.isprsjprs.2015.01.016
T. Hackel, J.D. Wegner, K. Schindler, Fast semantic segmentation of 3D point clouds with strongly varying density. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 177–184 (2016)
T. Hackel, J.D. Wegner, K. Schindler, Contour detection in unstructured 3D point clouds. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1610–1618 (2016). https://doi.org/10.1109/CVPR.2016.178
A. Javaheri, C. Brites, F. Pereira, J. Ascenso, A generalized Hausdorff distance based quality metric for point cloud geometry. In: 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX), pp. 1–6 (2020). https://doi.org/10.1109/QoMEX48832.2020.9123087
D. Tian, H. Ochimizu, C. Feng, R. Cohen, A. Vetro, Evaluation metrics for point cloud compression. ISO/IEC JTC1/SC29/WG11 Doc. M39966, Geneva, Switzerland (2017)
A. Javaheri, C. Brites, F. Pereira, J. Ascenso, Improving PSNR-based quality metrics performance for point cloud geometry. In: 2020 IEEE International Conference on Image Processing (ICIP), pp. 3438–3442 (2020). https://doi.org/10.1109/ICIP40778.2020.9191233
A. Javaheri, C. Brites, F. Pereira, J. Ascenso, Mahalanobis based point to distribution metric for point cloud geometry quality evaluation. IEEE Signal Process. Lett. 27, 1350–1354 (2020). https://doi.org/10.1109/LSP.2020.3010128
D. Tian, H. Ochimizu, C. Feng, R. Cohen, A. Vetro, Updates and Integration of Evaluation Metric Software for PCC. ISO/IEC JTC1/SC29/WG11 Doc. MPEG2017/M40522, Hobart, Australia (2017)
I. Viola, S. Subramanyam, P. Cesar, A color-based objective quality metric for point cloud contents. In: 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX), pp. 1–6 (2020). https://doi.org/10.1109/QoMEX48832.2020.9123089
A. Javaheri, C. Brites, F. Pereira, J. Ascenso, A point-to-distribution joint geometry and color metric for point cloud quality assessment. In: 2021 IEEE 23rd International Workshop on Multimedia Signal Processing (MMSP), pp. 1–6 (2021). https://doi.org/10.1109/MMSP53017.2021.9733670
Q. Yang, Z. Ma, Y. Xu, Z. Li, J. Sun, Inferring point cloud quality via graph similarity. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–1 (2020). https://doi.org/10.1109/TPAMI.2020.3047083
Y. Zhang, Q. Yang, Y. Xu, MS-GraphSIM: inferring point cloud quality via multiscale graph similarity. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 1230–1238. Association for Computing Machinery, New York, NY, USA (2021). https://doi.org/10.1145/3474085.3475294
R. Diniz, P.G. Freitas, M.C.Q. Farias, Towards a point cloud quality assessment model using local binary patterns. In: 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX), pp. 1–6 (2020). https://doi.org/10.1109/QoMEX48832.2020.9123076
R. Diniz, P.G. Freitas, M.C.Q. Farias, Multi-distance point cloud quality assessment. In: 2020 IEEE International Conference on Image Processing (ICIP), pp. 3443–3447 (2020). https://doi.org/10.1109/ICIP40778.2020.9190956
R. Diniz, P.G. Freitas, M.C.Q. Farias, Local luminance patterns for point cloud quality assessment. In: 2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP), pp. 1–6 (2020). https://doi.org/10.1109/MMSP48831.2020.9287154
R. Diniz, P.G. Freitas, M. Farias, A novel point cloud quality assessment metric based on perceptual color distance patterns. Electron. Imaging 2021(9), 256-1–256-11 (2021). https://doi.org/10.2352/ISSN.2470-1173.2021.9.IQSP-256
R. Diniz, P.G. Freitas, M.C.Q. Farias, Color and geometry texture descriptors for point-cloud quality assessment. IEEE Signal Process. Lett. 28, 1150–1154 (2021). https://doi.org/10.1109/LSP.2021.3088059
Y. Xu, Q. Yang, L. Yang, J.N. Hwang, EPES: point cloud quality modeling using elastic potential energy similarity. IEEE Trans. Broadcast. 68(1), 33–42 (2022)
Q. Yang, Y. Zhang, S. Chen, Y. Xu, J. Sun, Z. Ma, MPED: quantifying point cloud distortion based on multiscale potential energy discrepancy. IEEE Trans. Pattern Anal. Mach. Intell. 45(5), 6037–6054 (2023)
I. Viola, P. Cesar, A reduced reference metric for visual quality evaluation of point cloud contents. IEEE Signal Process. Lett. 27, 1660–1664 (2020). https://doi.org/10.1109/LSP.2020.3024065
Q. Liu, H. Yuan, R. Hamzaoui, H. Su, J. Hou, H. Yang, Reduced reference perceptual quality model with application to rate control for video-based point cloud compression. IEEE Trans. Image Process. 30, 6623–6636 (2021). https://doi.org/10.1109/TIP.2021.3096060
L. Hua, G. Jiang, M. Yu, Z. He, BQE-CVP: blind quality evaluator for colored point cloud based on visual perception. In: 2021 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), pp. 1–6 (2021). https://doi.org/10.1109/BMSB53066.2021.9547070
E.M. Torlig, E. Alexiou, T.A. Fonseca, R.L. de Queiroz, T. Ebrahimi, A novel methodology for quality assessment of voxelized point clouds. In: Tescher, A.G. (ed.) Applications of Digital Image Processing XLI, International Society for Optics and Photonics, vol. 10752 (SPIE, Bellingham, WA, 2018), pp. 174–190
E. Alexiou, T. Ebrahimi, Exploiting user interactivity in quality assessment of point cloud imaging. In: 2019 Eleventh International Conference on Quality of Multimedia Experience (QoMEX), pp. 1–6 (2019). https://doi.org/10.1109/QoMEX.2019.8743277
Q. Yang, H. Chen, Z. Ma, Y. Xu, R. Tang, J. Sun, Predicting the perceptual quality of point cloud: a 3D-to-2D projection-based exploration. IEEE Transactions on Multimedia, 1–1 (2020). https://doi.org/10.1109/TMM.2020.3033117
Z. He, G. Jiang, Z. Jiang, M. Yu, Towards a colored point cloud quality assessment method using colored texture and curvature projection. In: 2021 IEEE International Conference on Image Processing (ICIP), pp. 1444–1448 (2021). https://doi.org/10.1109/ICIP42928.2021.9506762
T. Chen, C. Long, H. Su, L. Chen, J. Chi, Z. Pan, H. Yang, Y. Liu, Layered projection-based quality assessment of 3D point clouds. IEEE Access 9, 88108–88120 (2021). https://doi.org/10.1109/ACCESS.2021.3087183
Z. Wang, Q. Li, Information content weighting for perceptual image quality assessment. IEEE Trans. Image Process. 20(5), 1185–1198 (2011). https://doi.org/10.1109/TIP.2010.2092435
A. Javaheri, C. Brites, F. Pereira, J. Ascenso, Joint geometry and color projection-based point cloud quality metric. IEEE Access 10, 90481–90497 (2022). https://doi.org/10.1109/ACCESS.2022.3198995
A. Chetouani, M. Quach, G. Valenzise, F. Dufaux, Deep learning-based quality assessment of 3D point clouds without reference. In: 2021 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), pp. 1–6 (2021). https://doi.org/10.1109/ICMEW53276.2021.9455967
A. Chetouani, M. Quach, G. Valenzise, F. Dufaux, Convolutional Neural Network for 3D point cloud quality assessment with reference. In: 2021 IEEE 23rd International Workshop on Multimedia Signal Processing (MMSP), pp. 1–6 (2021). https://doi.org/10.1109/MMSP53017.2021.9733565
M. Quach, A. Chetouani, G. Valenzise, F. Dufaux, A deep perceptual metric for 3D point clouds. Electron. Imaging 2021(9), 257-1–257-7 (2021). https://doi.org/10.2352/ISSN.2470-1173.2021.9.IQSP-257
Q. Liu, H. Yuan, H. Su, H. Liu, Y. Wang, H. Yang, J. Hou, PQA-Net: deep no reference point cloud quality assessment via multi-view projection. IEEE Transactions on Circuits and Systems for Video Technology, 1–1 (2021). https://doi.org/10.1109/TCSVT.2021.3100282
W. Tao, G. Jiang, Z. Jiang, M. Yu, Point cloud projection and multi-scale feature fusion network based blind quality assessment for colored point clouds. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 5266–5272. Association for Computing Machinery, New York, NY, USA (2021)
Z. Zhang, W. Sun, X. Min, Q. Zhou, J. He, Q. Wang, G. Zhai, MM-PCQA: multi-modal learning for no-reference point cloud quality assessment. arXiv preprint arXiv:2209.00244 (2022)
Z. Shan, Q. Yang, R. Ye, Y. Zhang, Y. Xu, X. Xu, S. Liu, GPA-Net: no-reference point cloud quality assessment with multi-task graph convolutional network. IEEE Transactions on Visualization and Computer Graphics (2023)
Q. Liu, H. Su, Z. Duanmu, W. Liu, Z. Wang, Perceptual quality assessment of colored 3D point clouds. IEEE Transactions on Visualization and Computer Graphics, 1–1 (2022). https://doi.org/10.1109/TVCG.2022.3167151
ITU-T J.149: Method for specifying accuracy and cross-calibration of Video Quality Metrics (VQM). International Telecommunication Union (2004)
E. Alexiou, T. Ebrahimi, Benchmarking of the plane-to-plane metric (2020)
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
ITU-R BT.709-6: Parameter values for the HDTV standards for production and international programme exchange. International Telecommunication Union (2015)
J.M. Geusebroek, R. Boomgaard, A.W.M. Smeulders, H. Geerts, Color invariance. IEEE Trans. Pattern Anal. Mach. Intell. 23(12), 1338–1350 (2001)
ISO/CIE 11664-4:2019: Colorimetry — Part 4: CIE 1976 L*a*b* colour space. International Organization for Standardization (2019)
Acknowledgements
We thank Konstantinos Ntemos and Hermina Petric Maretic for the useful discussion around the complexity of our algorithm.
Funding
This work was partially supported through the NWO WISE grant and the European Commission Horizon Europe program, under the grant agreement 101070109, TRANSMIXR (https://transmixr.eu/). Funded by the European Union.
Author information
Contributions
EA provided the main idea for the work, defined the framework and theoretical basis of the metric, conducted the majority of the experiments and the experimental analysis, and drafted the manuscript. XZ aided in the running of the experiments, specifically the regression models, and with the comparison with the state of the art. IV aided in the definition of the theoretical framework and experimental analysis, as well as in the drafting of the manuscript. PC provided feedback on the idea and experimental setup and aided in the drafting of the manuscript.
Ethics declarations
Competing interests
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Alexiou, E., Zhou, X., Viola, I. et al. PointPCA: point cloud objective quality assessment using PCA-based descriptors. J Image Video Proc. 2024, 20 (2024). https://doi.org/10.1186/s13640-024-00626-3
DOI: https://doi.org/10.1186/s13640-024-00626-3