A perceptual quality metric for dynamic triangle meshes
 Zeynep Cipiloglu Yildiz^{1} and
 Tolga Capin^{2}
https://doi.org/10.1186/s13640-016-0157-y
© The Author(s) 2017
Received: 29 October 2015
Accepted: 19 December 2016
Published: 25 January 2017
Abstract
A measure for assessing the quality of a 3D mesh is necessary to determine whether an operation on the mesh, such as watermarking or compression, affects its perceived quality. Studies in this field are limited compared with those for 2D images. In this work, we propose a full-reference perceptual quality metric for animated meshes that predicts the visibility of local distortions on the mesh surface. The proposed visual quality metric is independent of connectivity and material attributes. Thus, it is not tied to a specific application and can be used to evaluate the effect of an arbitrary mesh processing method. We use a bottom-up approach incorporating both the spatial and temporal sensitivity of the human visual system (HVS). In this approach, the mesh sequences go through a pipeline that models the contrast sensitivity and channel decomposition mechanisms of the HVS. The output of the method is a 3D probability map representing the visibility of distortions. We validated our method with a formal user experiment and obtained a promising correlation between the user responses and the proposed metric. Finally, we provide a dataset of subjective user evaluations of the quality of public animation datasets.
1 Introduction
Recent advances in 3D mesh modeling, representation, and rendering have matured to the point that they are now widely used in several mass-market applications, including networked 3D games, 3D virtual and immersive worlds, and 3D visualization applications. Using more vertices and faces allows a more detailed representation of a mesh, increasing its visual quality. However, the additional computation reduces performance. Therefore, a trade-off often emerges between the visual quality of graphical models and processing time, which creates a need to estimate the quality of 3D graphical content.
Several operations on 3D models rely on a good estimate of 3D mesh quality. For example, network-based applications require 3D model compression and streaming, in which a trade-off must be made between visual quality and transmission speed. Several applications require level-of-detail (LOD) simplification of 3D meshes for fast processing and rendering optimization. Watermarking of 3D meshes requires quality evaluation because of the artifacts it produces. Indexing and retrieval of 3D models require metrics for judging the quality of the indexed meshes. Most of these operations modify the 3D shape. For example, compression and watermarking schemes may introduce aliasing or even more complex artifacts; LOD simplification and denoising smooth the input mesh and can also produce unwanted sharp features.
Quality assessment of 3D meshes is generally understood as the problem of evaluating a modified mesh with respect to its original form, based on the detectability of changes. Quality metrics take a reference mesh and its processed version, and compute geometric differences to reach a quality value. Furthermore, certain operations on the input mesh, such as simplification, reduce the number of vertices, which makes it necessary to handle connectivity changes between the meshes being compared.
2 Related work
Methods for quality assessment of triangle meshes can be categorized according to their approach to the problem and their solution space. Non-perceptual methods approach the problem geometrically, without taking human perception into account, whereas perceptual methods integrate properties of the human visual system into the computation. Solutions can further be divided into image-based and model-based ones. Model-based approaches work in 3D object space and use structural or attribute information of the mesh. Image-based solutions work in 2D image space and use rendered images to estimate the quality of the given mesh. Several quality metrics have been proposed; [6], [12], and [28] survey recently proposed 3D quality metrics.
2.1 Geometry-distance-based metrics
Several methods use purely geometric information to compute a quality value for a single mesh or to compare two meshes. Because they ignore how the mesh is perceived, methods in this category do not reflect its perceived quality.
Model-based metrics The most straightforward object-space solution is the Euclidean or root mean squared (RMS) distance between two meshes. This approach is limited to comparing two meshes with the same number of vertices and the same connectivity. To overcome this constraint, more flexible geometric metrics have been proposed. One of the most commonly used geometric measures is the Hausdorff distance [9], which defines the distance between two surfaces as the maximum of all pointwise distances from one surface to the other. In this one-sided form the measure is asymmetric: in general, D(A,B) ≠ D(B,A). Extensions to this approach have been proposed, such as taking the average or the root mean squared value of the pointwise distances, or combinations of these [34].
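To make the asymmetry concrete, the following Python sketch computes the one-sided and symmetric Hausdorff distances between two point sets. It is a brute-force illustration that treats each mesh as a set of sample points (for example, its vertices) rather than a continuous surface; the function names are hypothetical, and practical tools sample surfaces densely and use spatial acceleration structures.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]

def one_sided_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Largest distance from a point of A to its nearest point of B."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance: the larger of the two one-sided distances."""
    return max(one_sided_hausdorff(a, b), one_sided_hausdorff(b, a))

# A contains an extra point far from B, so the two directions differ.
A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
B = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(one_sided_hausdorff(A, B))  # 2.0
print(one_sided_hausdorff(B, A))  # 0.0
print(hausdorff(A, B))            # 2.0
```

The symmetric variant removes the dependence on argument order by taking the larger of the two one-sided values.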
Image-based metrics The simplest view-dependent approach is the root-mean-squared error between two rendered images, computed pixel by pixel. This metric is highly sensitive to luminance changes, shifts, and scaling, and is therefore not a good approach [6]. Peak signal-to-noise ratio (PSNR) is also a popular quality metric for natural images; it expresses the ratio of the peak signal value to the RMS error on a logarithmic (decibel) scale. Wang et al. [49] show that alternative, purely mathematical quality metrics do not perform better than PSNR, although their results indicate that PSNR performs poorly on pictures of artificial and human-made objects.
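The sketch below illustrates pixel-wise RMS error and PSNR on two tiny grayscale images stored as nested lists, assuming a peak value of 255 as for 8-bit data; the function names are our own. Because pixels are compared independently, a uniform brightness offset lowers the PSNR even though the image content is unchanged.

```python
import math
from typing import List

Image = List[List[float]]

def rmse(img_a: Image, img_b: Image) -> float:
    """Root mean squared error between two equally sized grayscale images."""
    sq = [(x - y) ** 2 for row_a, row_b in zip(img_a, img_b) for x, y in zip(row_a, row_b)]
    return math.sqrt(sum(sq) / len(sq))

def psnr(img_a: Image, img_b: Image, peak: float = 255.0) -> float:
    """PSNR in decibels, 20 * log10(peak / RMSE); infinite for identical images."""
    err = rmse(img_a, img_b)
    return math.inf if err == 0.0 else 20.0 * math.log10(peak / err)

ref = [[0.0, 0.0], [0.0, 0.0]]
one_pixel = [[0.0, 0.0], [0.0, 10.0]]               # one pixel changed by 10
brighter = [[v + 10.0 for v in row] for row in ref]  # uniform offset of 10

print(rmse(ref, one_pixel), round(psnr(ref, one_pixel), 2))  # 5.0 34.15
print(rmse(ref, brighter), round(psnr(ref, brighter), 2))    # 10.0 28.13
```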
2.2 Perceptually based metrics
Perceptually aware quality metrics or modification methods integrate computational models or characteristics of the human visual system into the algorithm. Lin and Kuo [31] present a recent survey on perceptual visual quality metrics; however, as this survey indicates, most of the studies in this field focus on 2D image or video quality. A large number of factors affect the visual appearance of a scene, and several studies only focus on a subset of features of the given mesh.
Model-based perceptual metrics Curvature is a good indicator of structure and roughness, both of which strongly affect visual experience. A number of studies focus on the relation between curvature-linked characteristics and perceived quality, and integrate curvature into quality assessment or modification algorithms. Karni and Gotsman [22] introduce a metric (GL1) for mesh compression that measures roughness using the geometric Laplacian of every vertex. The Laplacian operator takes both geometry and topology into account. This simplification scheme uses variances in dihedral angles between triangles to reflect local roughness and weighs mean dihedral angles according to the variance. Sorkine et al. [41] modify this metric with slightly different parameters to obtain the metric called GL2.
Following the widely used structural similarity concept in 2D image quality assessment, Lavoué [26] proposes a local mesh structural distortion measure called MSDM, which uses curvature as structural information. The MSDM2 method [25] improves this approach in several respects: the new metric is multiscale and symmetric, the curvature calculations are slightly different to improve robustness, and there are no connectivity constraints.
Spatial frequency is linked to variance in 3D discrete curvature, and studies have used this curvature as a 3D perceptual measure [24], [29]. Roughness of a 3D mesh has also been used to measure quality of watermarked meshes [19], [11]. In [11], two objective metrics (3DWPM1 and 3DWPM2) derived from two definitions of surface roughness are proposed as the change in roughness between the reference and test meshes. Pan et al. [37] use the vertex attributes in their proposed quality metric.
Another metric developed for 3D mesh quality assessment is FMPD, which is based on local roughness estimated from Gaussian curvature [48]. Torkhani et al. [44] propose another metric (TPDM) based on the curvature tensor difference between the meshes to be compared. Both of these metrics are independent of connectivity and designed for static meshes. Dong et al. [16] propose a novel roughness-based perceptual quality assessment method. Its novelty lies in combining structural similarity, visual masking, and the saturation effect, which had previously been used separately in quality assessment methods. This metric is similar to ours in that it uses an HVS pipeline, but it is designed for static meshes with connectivity constraints. In addition, it captures structural similarity, which our method does not handle.
Alternatively, Nader et al. [36] propose a just noticeable distortion (JND) profile for flat-shaded 3D surfaces to quantify the threshold at which a change in vertex position is detected by a human observer, by defining perceptual measures for local contrast and spatial frequency in the 3D domain. Guo et al. [20] evaluate the local visibility of geometric artifacts on static meshes through a series of user experiments. In these experiments, users paint the local distortions on the meshes, and the prediction accuracies of several geometric attributes (curvatures, saliency, dihedral angle, etc.) and quality metrics such as Hausdorff distance, MSDM2, and FMPD are calculated. According to the results, curvature-based features outperform the others. The authors also provide a local distortion dataset as a benchmark.
A perceptually based metric for evaluating dynamic triangle meshes is the STED error [46]. The metric is based on the idea that perception of distortion is related to local and relative changes rather than global and absolute changes [12]. The spatial part of the error metric is obtained by computing the standard deviation of relative edge lengths within a topological neighborhood of each vertex. Similarly, the temporal error is computed by creating virtual temporal edges that connect a vertex to its position in the subsequent frame. The STED error is then the hypotenuse of the spatial and temporal components. Another approach to perceptual quality evaluation of dynamic meshes is proposed by Torkhani et al. [45]. Their metric is a weighted mean square combination of three distances: a speed-weighted spatial distortion measure, a vertex-speed-related contrast, and a vertex-moving-direction-related contrast. Experimental studies show that the metric performs quite well; however, it requires meshes with fixed connectivity. The authors also provide a publicly available dataset and a comparative study benchmarking existing image-based and model-based metrics.
Image-based perceptual metrics Human visual system characteristics are also used in image-space solutions. These metrics generally use the contrast sensitivity function (CSF), an empirically derived function that maps human sensitivity to spatial frequency. Daly's widely used visible differences predictor (VDP) [14] gives the perceptual difference between two images. Longhurst and Chalmers [32] study the VDP and report favorable image-based results with rendered 3D scenes. Lubin proposes a similar approach, the Sarnoff Visual Discrimination Model (VDM) [33], which operates in the spatial domain, as opposed to the VDP's frequency-domain approach. Li et al. [30] compare the VDP and the Sarnoff VDM using their own implementations of the algorithms. Their analysis shows that the VDP operates in feature space and takes advantage of FFT algorithms, but the lack of evidence for such feature-space transformations in the HVS gives the VDM an advantage.
Bolin et al. [5] incorporate color properties into 3D global illumination computations. Studies show that this approach gives accurate results [50]. Minimum detectable difference has been studied as a perceptual metric [39] that handles luminance and spatial processing independently. Another approach for computer-generated images is the visual equivalence detector [38], which analyzes visual impressions of scene appearance and outputs a visual equivalence map.
Visual masking is taken into account in 3D graphical scenes with varying texture, orientation, and luminance values [18]. Albin et al. [1] introduce several approaches with a color emphasis, which predict differences in the LLAB color space. Dong et al. [15] exploit entropy masking, which accounts for the lower sensitivity of the HVS to distortions in unstructured signals, to guide adaptive rendering of 3D scenes and thereby accelerate rendering.
An important question is whether model-based metrics are superior to image-based solutions. Although several studies address this issue, it is not possible to state clearly that one group of metrics is superior to the other. Rogowitz et al. conclude that image quality metrics are not adequate for measuring the quality of 3D meshes, since lighting and animation affect the results significantly [40]. On the other hand, Cleju and Saupe claim that image-based metrics predict perceptual quality better than metrics working on 3D geometry, and discuss ways to improve geometric distances [10]. A recent study [27] investigates the best set of parameters for image-based metrics when evaluating the quality of 3D models and compares them with several model-based methods. This study implies that image-based metrics perform well for simple use cases, such as determining the best parameters of a compression algorithm, or when model-based metrics are not applicable.
Our work differs from current metrics as follows. First, our metric can handle dynamic meshes in addition to static meshes. Second, it produces a per-vertex error map instead of a single global quality value per mesh, which can guide perceptual geometry processing applications. Third, it can handle meshes with different connectivity. Lastly, the proposed metric is not application specific.
3 Background
In this section, we summarize and discuss the mechanisms of the human visual system on which our model is built.
3.1 Luminance adaptation
The luminance that falls on the retina may vary significantly, from a sunny day to a moonless night. The photoreceptor response to luminance forms a nonlinear S-shaped curve, which is centered at the current adaptation luminance and exhibits compressive behavior away from the center [2].
In this response model, R(i,j)/R_{max} is the normalized retinal response, L(i,j) is the luminance of the current pixel, and c_1 and b are constants.
3.2 Channel decomposition
The receptive fields in the primary visual cortex are selective to certain spatial frequencies and orientations [2]. Several alternatives exist for modeling this visual selectivity of the HVS, such as the Laplacian pyramid, the discrete cosine transform (DCT), and the cortex transform. Most studies in the literature choose the cortex transform [14] among these alternatives, since it offers a balanced trade-off between physiological plausibility and practicality [2].
3.3 Contrast sensitivity
Temporal contrast sensitivity Intensity changes over time constitute the temporal features of an image. In a user study conducted by Kelly [23], sensitivity as a function of temporal frequency was estimated by displaying a simple shape with alternating luminance as a stimulus. The results of the experiment are used to plot the temporal CSF shown in Fig. 2b.
Another issue to consider is the eye's tracking ability, known as smooth pursuit, which compensates for the loss of sensitivity due to motion by reducing the retinal speed of the object of interest to a certain degree. Daly [13] derives a heuristic for smooth pursuit from experimental measurements.
4 Approach
Our work shares some features of the VDP method [14] and recent related work. These methods have shown the ability to estimate the perceptual quality of static images [14] and 2D video sequences for animated walkthroughs [35].
4.1 Preprocessing
Calculation of the illumination, construction of the spatiotemporal volume, and estimation of vertex velocities are performed in the preprocessing step.
Illumination is computed per vertex with the ambient-plus-diffuse (Lambertian) model, \(I = I_a k_a + I_d k_d\,(N \cdot L)\), where I_a is the intensity of the ambient light, I_d is the intensity of the diffuse light, N is the vertex normal, L is the direction to the light source, and k_a and k_d are the ambient and diffuse reflection coefficients, respectively.
In this study, we aim at a general-purpose quality evaluation that is independent of shading and material properties. We therefore assume that information about material properties, light sources, etc. is not available. A directional light source from the upper left of the scene is assumed, in accordance with the assumptions the human visual system makes about lighting direction ([21], section 24.4.2).
For an extended illumination model with multiple lights and a specular term, n is the number of light sources, k_s is the specular reflection coefficient, and H is the halfway vector.
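A minimal sketch of the per-vertex ambient-plus-diffuse shading (without the specular extension), with the diffuse term clamped at zero for surfaces turned away from the light. The light direction (up and to the left, tilted toward the viewer) and the intensity and coefficient values are illustrative placeholders, not values given in the text.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]

def normalize(v: Vec3) -> Vec3:
    n = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / n, v[1] / n, v[2] / n)

# Placeholder direction *towards* the light: left (-x), above (+y), toward the viewer (+z).
LIGHT_DIR = normalize((-1.0, 1.0, 1.0))

def vertex_intensity(normal: Vec3, light_dir: Vec3 = LIGHT_DIR,
                     i_a: float = 0.2, i_d: float = 0.8,
                     k_a: float = 1.0, k_d: float = 1.0) -> float:
    """I = I_a*k_a + I_d*k_d*max(0, N.L) with placeholder intensities and coefficients."""
    n = normalize(normal)
    n_dot_l = n[0] * light_dir[0] + n[1] * light_dir[1] + n[2] * light_dir[2]
    return i_a * k_a + i_d * k_d * max(0.0, n_dot_l)

print(vertex_intensity((0.0, 0.0, 1.0)))    # normal facing the viewer
print(vertex_intensity((1.0, -1.0, -1.0)))  # facing away from the light: ambient term only
```

Because no material or light information is assumed to be available, fixing these values once for all meshes keeps the comparison between reference and test sequences consistent.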
Construction of the spatiotemporal volume We convert the object-space mesh sequences into an intermediate volumetric representation so that image-space operations can be applied. We construct a 3D volume for each frame, in which each voxel stores the luminance values of the vertices that fall into it. The values of empty voxels are determined by linear interpolation.
This spatiotemporal volume representation provides important flexibility: it removes connectivity constraints and allows us to compare meshes with different numbers of vertices. Moreover, the input model is not restricted to triangle meshes; the volumetric representation enables the algorithm to be applied to other representations such as point-based graphics. Another advantage is that the complexity of the algorithm is not much affected by the number of vertices.
At the end of this step, we obtain a 3D spatial volume for each frame; together, these volumes form a 4D (3D + time) representation for both the reference and test mesh sequences. We call this structure the spatiotemporal volume. An index structure is also maintained to keep the voxel indices of each vertex. The rest of the method operates on this 4D spatiotemporal volume.
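The sketch below shows one way to build a single frame of the volume together with the per-vertex index structure: each vertex is mapped to a voxel of a regular grid spanning the bounding box, and the voxel stores the vertex luminance. Averaging when several vertices share a voxel, the sparse dictionary storage, and the function name are our assumptions; the linear interpolation of empty voxels described above is omitted for brevity.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Voxel = Tuple[int, int, int]

def voxelize_frame(vertices: Sequence[Vec3], luminance: Sequence[float],
                   bbox_min: Vec3, bbox_max: Vec3, res: Tuple[int, int, int]
                   ) -> Tuple[Dict[Voxel, float], List[Voxel]]:
    """Return (sparse voxel -> luminance map, per-vertex voxel index) for one frame."""
    index: List[Voxel] = []
    sums: Dict[Voxel, float] = defaultdict(float)
    counts: Dict[Voxel, int] = defaultdict(int)
    for vertex, lum in zip(vertices, luminance):
        cell = []
        for axis in range(3):
            extent = bbox_max[axis] - bbox_min[axis]
            t = (vertex[axis] - bbox_min[axis]) / extent if extent > 0 else 0.0
            # Clamp so that vertices lying on the upper face fall into the last voxel.
            cell.append(min(max(int(t * res[axis]), 0), res[axis] - 1))
        voxel = (cell[0], cell[1], cell[2])
        index.append(voxel)
        sums[voxel] += lum
        counts[voxel] += 1
    # Average the luminance of vertices sharing a voxel (our assumption).
    volume = {voxel: sums[voxel] / counts[voxel] for voxel in sums}
    return volume, index

verts = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.1, 0.1, 0.1)]
volume, index = voxelize_frame(verts, [0.2, 0.8, 0.4], (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (4, 4, 4))
print(index)  # [(0, 0, 0), (3, 3, 3), (0, 0, 0)]
```

Calling such a function once per frame yields the sequence of 3D volumes that makes up the 4D structure, along with one index list per frame.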
In the following steps, we do not use the full spatiotemporal volume, for performance reasons. Instead, we define a time window as suggested by Myszkowski et al. [35, p. 362]. According to this heuristic, only a limited number of consecutive frames are considered when computing the visible difference prediction map of a specific frame. In other words, to calculate the probability map for the i-th frame, we process the frames between \(i-\lfloor t_w/2\rfloor\) and \(i+\lfloor t_w/2\rfloor\), where t_w is the length of the time window. We empirically set t_w = 3.
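As a small illustration of the windowing rule, the helper below returns the indices of the frames processed for frame i. The text does not specify how the window is handled at the first and last frames; here out-of-range frames are simply dropped, which is an assumption, and the function name is hypothetical.

```python
from typing import List

def window_frames(i: int, num_frames: int, tw: int = 3) -> List[int]:
    """Frames from i - floor(tw/2) to i + floor(tw/2), dropping indices outside [0, num_frames)."""
    half = tw // 2
    first = max(0, i - half)
    last = min(num_frames - 1, i + half)
    return list(range(first, last + 1))

print(window_frames(5, 42))   # [4, 5, 6]
print(window_frames(0, 42))   # [0, 1]
print(window_frames(41, 42))  # [40, 41]
```

With t_w = 3, an interior frame is evaluated together with its immediate neighbors, so each probability map depends on only three frames of the 4D volume.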
Velocity estimation Since our method also has a time dimension, we need the vertex velocities in each frame. Using the index structure, we compute the voxel displacement of each vertex i between consecutive frames, \(\Delta D_i = \lVert p_{i,t} - p_{i,t-1} \rVert\), where p_{i,t} denotes the voxel position of vertex i at frame t. The remaining empty voxels inside the bounding box are assumed to be static.
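The displacement computation can be sketched as follows, using the per-vertex voxel indices of two consecutive frames. Storing one displacement per occupied voxel, keeping the largest value when vertices share a voxel, and returning 0 for voxels that hold no vertex are our assumptions, as are the function names. Converting voxel units per frame into degrees of visual angle per second requires the viewing geometry and frame rate, which are not modeled here.

```python
import math
from typing import Dict, List, Sequence, Tuple

Voxel = Tuple[int, int, int]

def voxel_displacements(index_prev: Sequence[Voxel],
                        index_curr: Sequence[Voxel]) -> List[float]:
    """Delta D_i = ||p_{i,t} - p_{i,t-1}|| in voxel units, one value per vertex i."""
    return [math.dist(p_t, p_prev) for p_t, p_prev in zip(index_curr, index_prev)]

def velocity_lookup(index_curr: Sequence[Voxel],
                    displacements: Sequence[float]) -> Dict[Voxel, float]:
    """Map each occupied voxel to the largest displacement of the vertices it holds."""
    vel: Dict[Voxel, float] = {}
    for voxel, d in zip(index_curr, displacements):
        vel[voxel] = max(vel.get(voxel, 0.0), d)
    return vel

prev = [(0, 0, 0), (2, 2, 2)]
curr = [(3, 4, 0), (2, 2, 2)]
vel = velocity_lookup(curr, voxel_displacements(prev, curr))
print(vel.get((3, 4, 0), 0.0), vel.get((9, 9, 9), 0.0))  # 5.0 0.0 (empty voxel is static)
```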
In Daly's smooth pursuit model [13], v_R is the compensated velocity, v_I is the physical velocity, v_min is the drift velocity of the eye (0.15 deg/s), and v_max is the maximum velocity that the eye can track efficiently (80 deg/s). According to Daly [13], the eye tracks all objects in the visual field with an efficiency of 82%. We adopt the same efficiency value for our spatiotemporal volume. However, if a visual attention map is available, it can be substituted as the tracking efficiency [51].
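The compensation can be sketched with the form in which Daly's smooth pursuit heuristic is commonly cited: the eye velocity is the tracked fraction of the object velocity plus the drift velocity, capped at v_max, and the compensated velocity is the residual, \(v_R = \lvert v_I - \min(g_{sp} v_I + v_{min}, v_{max}) \rvert\) with g_sp = 0.82. Since the equation itself is not reproduced in this text, treat this form as our assumption; the constants are the values quoted above, and the names are hypothetical.

```python
G_SP = 0.82   # smooth pursuit tracking efficiency (82%)
V_MIN = 0.15  # drift velocity of the eye, deg/s
V_MAX = 80.0  # maximum efficiently tracked velocity, deg/s

def retinal_velocity(v_image: float, g_sp: float = G_SP,
                     v_min: float = V_MIN, v_max: float = V_MAX) -> float:
    """Compensated velocity v_R (deg/s) for a physical velocity v_I = v_image >= 0 (deg/s)."""
    v_eye = min(g_sp * v_image + v_min, v_max)  # eye velocity under smooth pursuit
    return abs(v_image - v_eye)

for v in (0.0, 10.0, 200.0):
    print(v, round(retinal_velocity(v), 2))
```

At rest, the drift term leaves a small residual velocity of 0.15 deg/s; at very high speeds, tracking saturates at v_max and most of the motion remains on the retina. A per-voxel attention map could be passed in place of the constant g_sp.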
4.2 Perceptual quality evaluation
In this section, the main steps of the perceptual quality evaluation system are explained in detail.
When the response model of Section 3.1 is applied to the spatiotemporal volume, x, y, z, and t are voxel indices, R(x,y,z,t)/R_{max} is the normalized response, L(x,y,z,t) is the value of the voxel, and b = 0.63 and c_1 = 12.6 are constants. In this step, voxel values are compressed by this amplitude nonlinearity.
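For illustration, the sketch below applies a compressive response of the form R/R_max = L/(L + (c_1 L)^b), which is one way Daly's VDP amplitude nonlinearity is often written, to a sparse voxel volume. The equation is not reproduced in this text, so this exact form is an assumption; only the constants b = 0.63 and c_1 = 12.6 come from the text, and the function names are hypothetical.

```python
from typing import Dict, Tuple

Voxel = Tuple[int, int, int]
B, C1 = 0.63, 12.6  # constants given in the text

def amplitude_nonlinearity(lum: float, b: float = B, c1: float = C1) -> float:
    """Normalized response R/R_max in [0, 1) for a non-negative voxel value (assumed form)."""
    if lum <= 0.0:
        return 0.0
    return lum / (lum + (c1 * lum) ** b)

def compress_volume(volume: Dict[Voxel, float]) -> Dict[Voxel, float]:
    """Apply the nonlinearity independently to every stored voxel value."""
    return {voxel: amplitude_nonlinearity(value) for voxel, value in volume.items()}

responses = [round(amplitude_nonlinearity(lum), 3) for lum in (1.0, 10.0, 100.0, 1000.0)]
print(responses)  # grows monotonically but saturates, compressing large inputs
```

Since b < 1, the response can be rewritten as 1/(1 + c_1^b L^(b-1)), which increases with L and approaches 1, giving the compressive behavior described in Section 3.1.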
In the contrast computation, C^{k} is the spatiotemporal volume of contrast values and I^{k} is the spatiotemporal volume of luminance values in frequency channel k.
Contrast sensitivity Filtering the input image with the contrast sensitivity function (CSF) forms the core of VDP-based models (Section 3.3). Since our model targets dynamic meshes, we use the spatiovelocity CSF (Fig. 3b), which describes the variation in visual sensitivity as a function of both spatial frequency and velocity, instead of the static CSF used in the original VDP.
Our method handles temporal distortions in two ways. First, smooth pursuit compensation handles the temporal masking effect, that is, the loss of sensitivity at high speeds. Second, we use the spatiovelocity CSF, in which contrast sensitivity depends on velocity, instead of the static CSF.
In the spatiovelocity CSF, ρ is the spatial frequency in cycles/degree, v is the velocity in degrees/second, and c_0 = 1.14, c_1 = 0.67, and c_2 = 1.7 are empirically set coefficients. A more principled approach would be to learn these parameters from data.
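The coefficients above appear to match those of Daly's spatiovelocity CSF, which extends Kelly's measurements. The sketch below implements that CSF as it is commonly written; since the equation is not reproduced in this text, we assume this is the form used, and the function name is hypothetical.

```python
import math

C0, C1, C2 = 1.14, 0.67, 1.7  # coefficients given in the text

def spatiovelocity_csf(rho: float, v: float,
                       c0: float = C0, c1: float = C1, c2: float = C2) -> float:
    """Sensitivity at spatial frequency rho (cycles/deg) and retinal velocity v (deg/s).

    CSF = c0 * k * c2 * v * (2*pi*c1*rho)^2 * exp(-4*pi*c1*rho / rho_max), with
    k = 6.1 + 7.3 * |log10(c2 * v / 3)|^3 and rho_max = 45.9 / (c2 * v + 2).
    """
    if rho <= 0.0 or v <= 0.0:
        # v = 0 would make log10 undefined; the eye's drift keeps v >= 0.15 deg/s in practice.
        raise ValueError("rho and v must be positive")
    k = 6.1 + 7.3 * abs(math.log10(c2 * v / 3.0)) ** 3
    rho_max = 45.9 / (c2 * v + 2.0)
    return (c0 * k * c2 * v * (2.0 * math.pi * c1 * rho) ** 2
            * math.exp(-4.0 * math.pi * c1 * rho / rho_max))

for v in (0.15, 10.0):
    # Setting d/drho of rho^2 * exp(-a * rho) to zero gives the peak at rho = 2 / a.
    peak = 45.9 / (2.0 * math.pi * C1 * (C2 * v + 2.0))
    print(f"v={v} deg/s: peak near {peak:.2f} cycles/deg, S={spatiovelocity_csf(peak, v):.1f}")
```

The loop shows the expected behavior of a spatiovelocity CSF: as retinal velocity increases, the frequency of peak sensitivity moves toward lower spatial frequencies.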
The resulting \(\hat {P}\) is a 4D volume that contains the detection probability of each voxel. It is then straightforward to convert this 4D volume into a per-vertex probability map for each frame using the index structure (Section 4.1). Lastly, to combine the probability maps of all frames into a single map, we take the per-vertex average over all frames. This gives a per-vertex visible difference prediction map for the animated mesh.
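The final aggregation can be sketched as follows. Combining per-channel probabilities by probability summation, 1 − ∏_k(1 − P_k), is the usual choice in VDP-style models and is assumed here, since this text does not spell out how \(\hat {P}\) is formed; the per-vertex lookup through the index structure and the average over frames follow the description above. Function and argument names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Voxel = Tuple[int, int, int]

def probability_summation(channel_probs: Sequence[float]) -> float:
    """Probability that at least one channel detects the difference: 1 - prod(1 - P_k)."""
    miss = 1.0
    for p in channel_probs:
        miss *= 1.0 - p
    return 1.0 - miss

def per_vertex_map(prob_volumes: Sequence[Dict[Voxel, float]],
                   indices: Sequence[Sequence[Voxel]]) -> List[float]:
    """Average each vertex's detection probability over all frames.

    prob_volumes[t] maps voxel -> probability at frame t, and indices[t][i]
    is the voxel of vertex i at frame t (the index structure).
    """
    totals = [0.0] * len(indices[0])
    for volume, index in zip(prob_volumes, indices):
        for i, voxel in enumerate(index):
            totals[i] += volume.get(voxel, 0.0)
    return [total / len(prob_volumes) for total in totals]

frames = [{(0, 0, 0): probability_summation([0.1, 0.1]), (1, 0, 0): 0.6},
          {(0, 0, 0): 0.4, (1, 0, 0): 1.0}]
idx = [[(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 0)]]
print([round(p, 3) for p in per_vertex_map(frames, idx)])
```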
5 Validation of the metric
In this section, we provide a twofold validation of our metric: a psychophysical user study designed for dynamic meshes, and a comparison with several standard objective metrics. We also report measurements of the computation time of the proposed method.
5.1 User evaluation
We conducted subjective user experiments to evaluate the fidelity of our quality metric. In this section, we explain the experimental design and analyze the results. The subjective evaluation results in this study are publicly available as supplementary material.
5.1.1 Data
Table 1 Information about the meshes
| | Camel | Elephant | Hand | Horse |
|---|---|---|---|---|
| # vertices | 21,885 | 42,321 | 7997 | 8431 |
| # frames | 42 | 48 | 45 | 48 |
5.1.2 Experimental design
In this experiment, our aim is to measure the correlation between the subjective evaluations and the results of the proposed metric. The subjects evaluated the perceived quality of the animated meshes by marking the perceived distortions on the mesh. For the experimental setup, we used the simultaneous double stimulus for continuous evaluation (SDSCE) methodology, one of the standards listed in [6]. In this design, presenting both stimuli simultaneously eliminates the need for memorization.
In the evaluation screen (Fig. 8b), the user was given a marking tool with variable tip intensity. The user's task was to mark the visible distortions. Annotating the mesh while it is animating would be very difficult; therefore, the users marked the visible distortions on a single, manually selected static frame (frames in Fig. 7). One may argue that marking the distortions in a static state may introduce bias. We tried to minimize this effect in two ways. First, the annotation was done on a sample frame of the reference animation instead of the modified animation, so the distortions were never seen statically by the observers. Second, during the evaluation, the user could still view both animations and manipulate the viewpoint simultaneously in the viewing screen, which removes the need for memorization.
At the beginning of the experiments, subjects were given the following instruction: “A distortion on the mesh is defined as the spatial artifacts, compared to the reference mesh. Consider the relative scale of distortions and mark the visible distortions accordingly, using the intensity tool.”

- Viewing parameters: The observers viewed the stimuli on a 19-inch display from a distance of 0.5 m.
- Lighting: We used stationary, center-directed lighting from the upper left [40].
- Materials and shading: To prevent highlighting effects from accentuating distortions unpredictably, we used Gouraud shading in the experiments. Moreover, we used meshes without texture.
- Animation and interaction: Free-viewpoint interaction was enabled for the viewers. Furthermore, since inspecting the mesh in a paused state would contradict the purpose of the experiment, two displays were used: the mesh was evaluated on one screen while the animation played on the other.
- Stimuli order: Each modified and reference mesh combination was presented in random order, allowing for more accurate comparisons. In other words, there was no specific ordering of the meshes, and subjects could pause their evaluation and continue whenever they wanted.
Subjects Twelve subjects with various levels of computer experience participated in the experiment. All of the subjects evaluated every animated mesh in the experiment.
5.1.3 Results and discussion
Next, we compare the mean subjective responses with our proposed method’s predictions. For this purpose, we use two common methods for correlation: Pearson linear correlation coefficient (r) for prediction accuracy, and Spearman rank order correlation coefficient (ρ) for monotonicity between the mean subjective response and estimated response [31].
Note that correlation coefficients vary in the range [−1, 1]; a negative coefficient indicates a negative correlation, while a positive coefficient indicates a positive correlation. To interpret the correlation analysis, we used the categorization in [43], in which correlation coefficients (in absolute value) ≤ 0.35 are considered low or weak, 0.36 ≤ r, ρ ≤ 0.67 modest or moderate, and 0.68 ≤ r, ρ ≤ 1 strong or high.
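The correlation analysis and the categorization above can be reproduced with a short sketch. Spearman's ρ is computed as the Pearson correlation of the ranks, with tied values given their average rank; values falling in the small gaps between the quoted bounds (e.g., 0.355) are assigned to the higher category, which is our assumption. The function names are hypothetical; libraries such as SciPy provide equivalent routines.

```python
import math
from typing import List, Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient r."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for j in range(start, end + 1):
            result[order[j]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return result

def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def strength(coef: float) -> str:
    """Label |coef| with the categories of the text (gaps go to the higher class)."""
    c = abs(coef)
    if c <= 0.35:
        return "Weak"
    if c <= 0.67:
        return "Modest"
    return "High"

print(strength(0.585), strength(0.712))  # Modest High
```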
When measuring the correlation, we considered the limitations of the paint tool: subjects may unintentionally mark regions near the region they actually target. To reduce the effect of this problem, we followed the approach used in image/video quality assessment validations, where the image or video frame is divided into a regular grid and the comparison is done tile by tile [2]. Based on this idea, we grouped nearby vertices and computed the correlation based on the average intensity of these regions. We asked a designer to segment the mesh manually using a paint-based interface, although any available mesh segmentation technique could also be used for this purpose [7]. The designer was instructed to create about 50 segments for each model.
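A sketch of the region-based comparison: per-vertex metric values and per-vertex marked intensities are averaged within each segment, producing paired per-segment scores that can then be passed to Pearson and Spearman correlation. Representing the designer's segmentation as one integer segment id per vertex is our assumption, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def segment_means(values: Sequence[float], segment_of: Sequence[int]) -> Dict[int, float]:
    """Mean of a per-vertex quantity within each segment id."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for value, seg in zip(values, segment_of):
        sums[seg] += value
        counts[seg] += 1
    return {seg: sums[seg] / counts[seg] for seg in sums}

def paired_segment_scores(metric: Sequence[float], marks: Sequence[float],
                          segment_of: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Aligned per-segment (mean metric value, mean marked intensity) lists."""
    m = segment_means(metric, segment_of)
    u = segment_means(marks, segment_of)
    keys = sorted(m)
    return [m[k] for k in keys], [u[k] for k in keys]

metric = [0.1, 0.3, 0.8, 1.0]  # per-vertex metric output
marks = [0.0, 0.0, 1.0, 1.0]   # per-vertex mean marked intensity
segments = [0, 0, 1, 1]        # segment id of each vertex
print(paired_segment_scores(metric, marks, segments))
```

Averaging within segments trades spatial resolution for robustness: a stroke that spills slightly past a distortion still lands in the same region.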
Pearson (r) and Spearman (ρ) correlation coefficients for each mesh
| Mesh | Pearson r | Spearman ρ | Strength |
|---|---|---|---|
| Camel | 0.835 | 0.829 | High |
| Elephant | 0.585 | 0.654 | Modest |
| Hand | 0.715 | 0.707 | High |
| Horse | 0.713 | 0.700 | High |
| Overall | 0.712 | 0.723 | High |
As the table indicates, the overall correlation is about 0.7 (r = 0.712, ρ = 0.723), which can be considered a promising result for local dynamic mesh quality assessment. Correlation coefficients for the Camel, Hand, and Horse meshes are high, while the Elephant mesh exhibits a moderate correlation.
One issue that likely affects the results negatively is that subjects tended to evaluate only certain views of the meshes; eight of the subjects reported that they had generally marked the meshes from the side views. In addition, since the meshes depict familiar objects, visual attention mechanisms may have come into play, which our metric does not model.
5.2 Comparison to state-of-the-art techniques
We also compare the performance of our method with current state-of-the-art techniques. We first compared our metric with static metrics using the public LIRIS/EPFL general-purpose dataset [26].
In this dataset, there are 88 models, between 40 K and 50 K vertices, which were generated from four reference objects: Armadillo, Venus, Dinosaur, and RockerArm. Two types of distortion, noise addition and smoothing, were applied with different strengths at four locations: on the whole model, on smooth areas, on rough areas, and on intermediate areas. The dataset also includes mean opinion scores (MOS) from 12 observers and 7 static metric results for these models.
Since our method is also applicable for static meshes, we ran our algorithm on these models by setting velocities to 0. Although our aim is to produce a 3D map as output, to be able to compare our metric to the other techniques, we used the average of the vertex probabilities in the output map as the overall score of the mesh quality. These scores are in the range of 0–1 and a high score indicates that the distortions on this mesh are highly visible.
To our knowledge, the only perceptual error metrics designed for dynamic meshes to date are [46] and [45]. However, the dynamic mesh datasets of [46] and [45] provide only one frame per animation, which is not sufficient to apply our metric. Our metric also differs from these metrics in two ways. First, we do not require the test and reference meshes to have the same connectivity; for example, the test mesh could be a simplified version of the reference mesh with a different number of vertices. Second, they are not directly comparable with our method, since we produce a 3D map of local visible distortions, while they give a global error per dynamic mesh. Although they also generate a 3D map in intermediate steps and accumulate it into a single value, we do not have access to those intermediate results. Hence, although producing a single error value per dynamic mesh is outside the scope of our work, to compare our metric we collapsed our 3D map into a single score by averaging the error values of all vertices. We then performed a second user experiment, following a design similar to that of [46].
In this experiment, we produced three modification levels for each dynamic mesh listed in Table 1, resulting in 12 animations. Using the MeshLab [8] tool, we applied the random vertex displacement filter with varying maximum displacement (the parameter was set to 0.1, 0.2, and 0.3 for modification levels 1, 2, and 3, respectively).
During the experiments, given the unmodified animation as reference, the subjects were asked to assign a score of 0, 1, 2, or 3 to the modified animation. In this evaluation scheme, 0 means that there is no perceptible difference between the reference and test animations. The evaluations of ten subjects were combined by calculating the mean opinion score (MOS) per modified mesh. Then, the correlation between the metric outputs and the MOS values was calculated.
Pearson (r) and Spearman (ρ) correlation coefficients for each mesh in the second user experiment
| Mesh | Pearson r | Spearman ρ |
|---|---|---|
| Camel | 0.926 | 0.937 |
| Elephant | 0.939 | 0.972 |
| Hand | 0.949 | 0.941 |
| Horse | 0.988 | 0.948 |
| Overall | 0.921 | 0.883 |
5.3 Performance evaluation
5.3.1 Resolution of the spatiotemporal volume
The resolution of the spatiotemporal volume in each dimension affects the accuracy of our method. To investigate this effect, we ran our algorithm with varying voxel resolutions and calculated the correlation coefficients for each run. We varied the minResolution parameter in Eq. 9, which determines the length of the spatiotemporal volume in each dimension in proportion to the length of the bounding box of the mesh.
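Table 5 Effect of the minResolution parameter on the correlation strengths of each mesh
| Mesh / minResolution | 30 | 60 | 90 | 120 | 150 |
|---|---|---|---|---|---|
| Camel | Weak | Modest | High | High | High |
| Elephant | Weak | Weak | Weak | Modest | Modest |
| Hand | Weak | Modest | High | High | High |
| Horse | Modest | High | High | High | High |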
Based on these experiments, we derived a new heuristic to calculate the minResolution parameter. A resolution that is too low is undesirable, because many vertices then fall into the same voxel; we therefore aim to distribute the vertices into different voxels as much as possible. We start with the assumption that vertices are distributed homogeneously. We also know that a mesh is generally represented by vertices located on its surface, with an empty interior. Hence, we can assume that the vertices are located on the facets of the axis-aligned bounding box (AABB). More conservatively, we take the facet of the AABB with the minimum area and obtain a resolution that allows all N vertices of the mesh to be distributed homogeneously on this facet. To do so, we first calculate the proportions of the facets of the AABB (w, h, and d in Eq. 9). We can then express each dimension as a function of a constant k (i.e., wk, hk, and dk). If we denote the two smallest of these proportions as min_1 and min_2, we can distribute N vertices on the facet of minimum area with \(k = \sqrt {N/({min}_{1}*{min}_{2})}\). We then use this k value as the minResolution parameter.
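The heuristic can be written in a few lines. Equation 9 is not reproduced in this text, so the normalization of the facet proportions is our assumption: here (w, h, d) are the bounding-box side lengths divided by the shortest side. The function name is hypothetical.

```python
import math
from typing import Tuple

def min_resolution(bbox_size: Tuple[float, float, float], num_vertices: int) -> float:
    """k = sqrt(N / (min_1 * min_2)) over the smallest facet of the AABB.

    Assumption: the proportions (w, h, d) are the side lengths divided by the
    shortest side, so the smallest proportion equals 1.
    """
    shortest = min(bbox_size)
    proportions = sorted(side / shortest for side in bbox_size)
    min_1, min_2 = proportions[0], proportions[1]
    return math.sqrt(num_vertices / (min_1 * min_2))

print(min_resolution((1.0, 1.0, 1.0), 10_000))  # 100.0
print(min_resolution((2.0, 1.0, 4.0), 200))     # 10.0
```

In the second call, the smallest facet spans k × 2k = 10 × 20 = 200 voxels, exactly one voxel per vertex, which is the homogeneous distribution the heuristic targets. The result depends only on the box proportions, not on its absolute size.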
This heuristic results in the following approximate minResolution values for the Camel, Elephant, Hand, and Horse meshes, respectively: 100, 200, 90, and 60. According to Table 5, these values provide high correlations.
In summary, the resolution of the spatiotemporal volume has a significant impact on both the estimation accuracy and the computational cost of our method. Our heuristic for calculating the volume resolution works well for the tested meshes. A more sophisticated algorithm that considers the distribution and density of the vertices within the mesh bounding box could produce better estimates.
5.3.2 Processing time
Processing times for several meshes
Mesh      # Vertices  minResolution  Time (s)
Horse     8 K         60             8
Camel     21 K        100            33
Elephant  42 K        200            274
Venus     100 K       300            915
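As a rough back-of-the-envelope check on the processing-time table (our own arithmetic, not an analysis from the paper), dividing each time by minResolution cubed gives nearly the same value for all four meshes, roughly 3.3e-5 to 3.7e-5 seconds. This is consistent with the cost growing with the number of voxels in the spatiotemporal volume. The sketch ignores the per-mesh proportions (w, h, d) and the number of frames, so it is only indicative; the vertex counts are the rounded "K" values from the table.

```python
# Values copied from the processing-time table: (mesh, vertices, minResolution, seconds).
# Vertex counts are the table's rounded "K" figures.
TIMINGS = [
    ("Horse", 8_000, 60, 8),
    ("Camel", 21_000, 100, 33),
    ("Elephant", 42_000, 200, 274),
    ("Venus", 100_000, 300, 915),
]


def seconds_per_cubic_voxel(rows):
    """Return time / minResolution**3 for each mesh (a crude voxel-count proxy)."""
    return {name: t / res ** 3 for name, _vertices, res, t in rows}


if __name__ == "__main__":
    for name, c in seconds_per_cubic_voxel(TIMINGS).items():
        print(f"{name:9s} {c:.2e} s per minResolution^3")
```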
6 Conclusions
In this paper, we aim to provide a general-purpose visual quality metric for dynamic triangle meshes, since subjective user evaluations are costly to conduct. To this end, we propose a full-reference perceptual quality estimation method based on the well-known VDP approach by Daly [14]. Our approach accounts for both the spatial and the temporal sensitivity of the HVS. The output of our algorithm is a 3D probability map of visible distortions. According to our formal experimental study, our perceptually aware quality metric produces promising results.
The most significant distinction of our method is that it handles animated 3D meshes, whereas most studies in the literature omit the effect of temporal variations. Our method is independent of connectivity, shading, and material properties, which makes it a general-purpose quality estimation method that is not application-specific. It can therefore measure the quality of 3D meshes distorted by a modification method that changes the connectivity or the number of vertices of the mesh. Moreover, the number of vertices in the mesh does not have a significant impact on the performance of the algorithm. The algorithm also handles static meshes, and it is even applicable to scenes containing multiple dynamic or static meshes. More importantly, the input representation is not limited to triangle meshes; the method can also be applied to point-based surface representations. Lastly, we provide an open dataset including subjective user evaluation results for 3D dynamic meshes.
The main drawback of our method is its computational complexity, due to the 4D nature of the spatiotemporal volume. We partially overcome this problem with a time-window approach that processes only a limited number of consecutive frames at a time. Furthermore, a significant speedup may be obtained by processing the spatiotemporal volume on the GPU.
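The time-window idea can be illustrated with a generic sliding window over frames. In this Python sketch, the function name and the `window` and `step` parameters are hypothetical (the window length the method uses is not given in this section), and the frames stand in for whatever per-frame data feeds the spatiotemporal volume. Only `window` frames are buffered at once; setting `step` equal to `window` gives non-overlapping windows, and whether the method's windows overlap is not stated here.

```python
from collections import deque
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def sliding_windows(frames: Iterable[T], window: int, step: int = 1) -> Iterator[List[T]]:
    """Yield lists of `window` consecutive frames, advancing by `step` frames.

    Only the most recent `window` frames are buffered, so memory stays bounded
    regardless of the animation length.
    """
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive")
    buf: deque = deque(maxlen=window)  # the oldest frame is dropped automatically
    for i, frame in enumerate(frames):
        buf.append(frame)
        start = i - window + 1  # index of the first frame in the current window
        if start >= 0 and start % step == 0:
            yield list(buf)


# Usage with placeholder frame labels; a real pipeline would pass per-frame
# vertex positions and build one spatiotemporal sub-volume per window.
for win in sliding_windows((f"frame{i}" for i in range(6)), window=3, step=1):
    print(win)
```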
As future work, we aim to perform a more comprehensive user study that investigates the effects of several parameters. Another possible research direction is to integrate visual attention and saliency mechanisms into the system.
7 Appendix
7.1 Subjective user evaluation dataset
Supplementary material consisting of the subjective user evaluation results can be downloaded from the following link: http://cs.bilkent.edu.tr/~zeynep/DynamicMeshVQA.zip.

The Metric output directory contains the results of our algorithm for each mesh used in the experiments.

The Reference directory contains the original mesh animations.

The Test directory contains the modified mesh animations.

The User responses directory contains the evaluations of twelve subjects and the mean subjective responses.
Declarations
Acknowledgements
We would like to thank all those who participated in the experiments for this study.
Authors’ contributions
ZCY and TC developed the methodology together. ZCY conducted the experimental analysis and drafted the manuscript. TC composed the Related Work section and performed the proofreading and editing of the overall manuscript. Both authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
[1] S Albin, G Rougeron, B Peroche, A Tremeau, Quality image metrics for synthetic images based on perceptual color differences. IEEE Trans. Image Process. 11(9), 961–971 (2002).
[2] TO Aydin, M Čadík, K Myszkowski, HP Seidel, Video quality assessment for computer graphics applications. ACM Transactions on Graphics (TOG), vol. 29 (ACM, New York, 2010).
[3] PG Barten, Contrast sensitivity of the human eye and its effects on image quality, vol. 21 (SPIE Optical Engineering Press, Washington, 1999).
[4] C Blakemore, FW Campbell, On the existence of neurones in the human visual system selectively sensitive to the orientation and size of retinal images. J. Physiol. 203(1), 237–260 (1969).
[5] MR Bolin, GW Meyer, in Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH ’98. A perceptually based adaptive sampling algorithm (ACM, New York, 1998), pp. 299–309.
[6] A Bulbul, TK Çapin, G Lavoué, M Preda, Assessing visual quality of 3D polygonal models. IEEE Signal Process. Mag. 28(6), 80–90 (2011).
[7] X Chen, A Golovinskiy, T Funkhouser, A benchmark for 3D mesh segmentation. ACM Trans. Graph. (Proc. SIGGRAPH) 28(3), 73 (2009).
[8] P Cignoni, M Corsini, G Ranzuglia, MeshLab: an open-source 3D mesh processing system. ERCIM News 73, 45–46.
[9] P Cignoni, C Rocchini, R Scopigno, Metro: measuring error on simplified surfaces. Comput. Graph. Forum 17(2), 167–174 (1998).
[10] I Cleju, D Saupe, in Proceedings of the 3rd Symposium on Applied Perception in Graphics and Visualization, APGV ’06. Evaluation of supra-threshold perceptual metrics for 3D models (ACM, New York, 2006), pp. 41–44.
[11] M Corsini, E Gelasca, T Ebrahimi, M Barni, Watermarked 3D mesh quality assessment. IEEE Trans. Multimed. 9(2), 247–256 (2007).
[12] M Corsini, MC Larabi, G Lavoué, O Petřík, L Váša, K Wang, Perceptual metrics for static and dynamic triangle meshes. Computer Graphics Forum (Wiley Online Library, 2012).
[13] S Daly, Engineering observations from spatiovelocity and spatiotemporal visual models. Human Vision and Electronic Imaging III 3299, 180–191 (1998).
[14] SJ Daly, in SPIE/IS&T 1992 Symposium on Electronic Imaging: Science and Technology. Visible differences predictor: an algorithm for the assessment of image fidelity (International Society for Optics and Photonics, 1992), pp. 2–15.
[15] L Dong, Y Fang, W Lin, C Deng, C Zhu, HS Seah, Exploiting entropy masking in perceptual graphic rendering. Signal Process. Image Commun. 33, 1–13 (2015).
[16] L Dong, Y Fang, W Lin, HS Seah, Perceptual quality assessment for 3D triangle mesh based on curvature. IEEE Trans. Multimed. 17(12), 2174–2184 (2015).
[17] R Eriksson, B Andren, KE Brunnstroem, in Photonics West ’98 Electronic Imaging. Modeling the perception of digital images: a performance study (International Society for Optics and Photonics, 1998), pp. 88–97.
[18] JA Ferwerda, P Shirley, SN Pattanaik, DP Greenberg, in Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH ’97. A model of visual masking for computer graphics (ACM Press/Addison-Wesley Publishing Co., New York, 1997), pp. 143–152.
[19] E Gelasca, T Ebrahimi, M Corsini, M Barni, in IEEE International Conference on Image Processing, ICIP 2005, vol. 1. Objective evaluation of the perceptual quality of 3D watermarking (IEEE, 2005), pp. I-241.
[20] J Guo, V Vidal, A Baskurt, G Lavoué, in Proceedings of the ACM SIGGRAPH Symposium on Applied Perception. Evaluating the local visibility of geometric artifacts (ACM, New York, 2015), pp. 91–98.
[21] I Howard, B Rogers, Seeing in Depth (Oxford University Press, 2008).
[22] Z Karni, C Gotsman, in Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques. Spectral compression of mesh geometry (ACM, New York, 2000), pp. 279–286.
[23] D Kelly, Motion and vision. II. Stabilized spatio-temporal threshold surface. JOSA 69(10), 1340–1349 (1979).
[24] SJ Kim, SK Kim, CH Kim, in Proceedings of the 10th Pacific Conference on Computer Graphics and Applications. Discrete differential error metric for surface simplification (IEEE, 2002), pp. 276–283.
[25] G Lavoué, in Computer Graphics Forum, vol. 30. A multiscale metric for 3D mesh visual quality assessment (Wiley Online Library, 2011), pp. 1427–1437.
[26] G Lavoué, ED Gelasca, F Dupont, A Baskurt, T Ebrahimi, in Optics & Photonics. Perceptually driven 3D distance metrics with application to watermarking (International Society for Optics and Photonics, 2006), p. 63120L.
[27] G Lavoué, MC Larabi, L Vasa, On the efficiency of image metrics for evaluating the visual quality of 3D models. IEEE Trans. Vis. Comput. Graph. 22(8), 1987–1999 (2015).
[28] G Lavoué, R Mantiuk, in Visual Signal Quality Assessment. Quality assessment in computer graphics (Springer, 2015), pp. 243–286.
[29] C Lee, A Varshney, D Jacobs, Mesh saliency (ACM, New York, 2005).
[30] B Li, GW Meyer, RV Klassen, in Photonics West ’98 Electronic Imaging. Comparison of two image quality models (1998), pp. 98–109.
[31] W Lin, CC Jay Kuo, Perceptual visual quality metrics: a survey. J. Vis. Commun. Image Represent. 22(4), 297–312 (2011).
[32] P Longhurst, A Chalmers, in Proceedings of the Theory and Practice of Computer Graphics 2004 (TPCG’04). User validation of image quality assessment algorithms (IEEE Computer Society, Washington, 2004), pp. 196–202.
[33] J Lubin, A visual discrimination model for imaging system design and evaluation. Vision Models for Target Detection and Recognition 2, 245–357 (1995).
[34] D Luebke, B Watson, JD Cohen, M Reddy, A Varshney, Level of Detail for 3D Graphics (Elsevier Science Inc., New York, 2002).
[35] K Myszkowski, P Rokita, T Tawara, Perception-based fast rendering and antialiasing of walkthrough sequences. IEEE Trans. Vis. Comput. Graph. 6(4), 360–379 (2000).
[36] G Nader, K Wang, F Hetroy-Wheeler, F Dupont, Just noticeable distortion profile for flat-shaded 3D mesh surfaces. IEEE Trans. Vis. Comput. Graph. 22(11), 2423–2436 (2015).
[37] Y Pan, LI Cheng, A Basu, Quality metric for approximating subjective evaluation of 3D objects. IEEE Trans. Multimed. 7(2), 269–279 (2005).
[38] G Ramanarayanan, J Ferwerda, B Walter, K Bala, in ACM SIGGRAPH 2007 Papers, SIGGRAPH ’07. Visual equivalence: towards a new standard for image fidelity (ACM, New York, 2007).
[39] M Ramasubramanian, SN Pattanaik, DP Greenberg, in Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH ’99. A perceptually based physical error metric for realistic image synthesis (ACM Press/Addison-Wesley Publishing Co., New York, 1999), pp. 73–82.
[40] BE Rogowitz, HE Rushmeier, in Photonics West 2001: Electronic Imaging. Are image quality metrics adequate to evaluate the quality of geometric objects? (International Society for Optics and Photonics, 2001), pp. 340–348.
[41] O Sorkine, D Cohen-Or, S Toledo, in Symposium on Geometry Processing. High-pass quantization for mesh encoding (Citeseer, 2003), pp. 42–51.
[42] RW Sumner, J Popović, in ACM Transactions on Graphics (TOG), vol. 23. Deformation transfer for triangle meshes (ACM, New York, 2004), pp. 399–405.
[43] R Taylor, Interpretation of the correlation coefficient: a basic review. J. Diagn. Med. Sonography 6(1), 35–39 (1990).
[44] F Torkhani, K Wang, JM Chassery, A curvature-tensor-based perceptual quality metric for 3D triangular meshes. Mach. Graph. Vis. 23(12), 59–82 (2014).
[45] F Torkhani, K Wang, JM Chassery, Perceptual quality assessment of 3D dynamic meshes: subjective and objective studies. Signal Process. Image Commun. 31, 185–204 (2015).
[46] L Vasa, V Skala, A perception correlated comparison method for dynamic meshes. IEEE Trans. Vis. Comput. Graph. 17(2), 220–230 (2011).
[47] I Wald, Utah 3D animation repository. http://www.sci.utah.edu/~wald/animrep/. Accessed 8 Jan 2017.
[48] K Wang, F Torkhani, A Montanvert, A fast roughness-based approach to the assessment of 3D mesh visual quality. Comput. Graph. 36(7), 808–818 (2012).
[49] Z Wang, HR Sheikh, AC Bovik, No-reference perceptual quality assessment of JPEG compressed images. Proceedings of IEEE International Conference on Image Processing 2002 1, 477–480 (2002).
[50] B Watson, A Friedman, A McGaffey, in Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH ’01. Measuring and predicting visual fidelity (ACM, New York, 2001), pp. 213–220.
[51] H Yee, S Pattanaik, DP Greenberg, Spatiotemporal sensitivity and visual attention for efficient rendering of dynamic environments. ACM Trans. Graph. (TOG) 20(1), 39–65 (2001).