Subjective assessment of HDTV with superresolution function
EURASIP Journal on Image and Video Processing volume 2014, Article number: 11 (2014)
Superresolution (SR) is a means of image enhancement, and some recent high-definition television (HDTV) sets and digital cameras are equipped with it. However, the resolution of such HDTV sets has not been tested as to whether it is actually better than that of HDTV sets without the function, in part because the resolution difference between HDTV sets is not always clearly visible. This paper proposes a subjective assessment for this purpose. The method is a combination of Scheffe’s paired comparison and part of BT.500. Using this method, we performed a subjective assessment on an HDTV set with the SR function and other sets. The assessment data was statistically analyzed, and the results prove that the HDTV set with the SR function was not superior in resolution to the others.
1 Introduction
Digital high-definition television (HDTV) broadcasting has started in many countries, and large LCD TV sets have become common. The image quality of these systems is much higher than that of analogue systems such as NTSC and PAL. LCD manufacturers are selling various HDTV systems, and their catalogues are filled with sales points such as superresolution (SR) [1–3] and 240-Hz frame rates.
There is a variety of SR technologies [5–9], and they obviously improve the resolution of still images [9–12]. However, SR technologies are complex, and it is not easy to develop a real-time SR function for HDTV. All of the SR proposals in the literature include only computer simulations and either do not work in real time or work only for a limited range of video sequence types [5–10, 12–14]. Nevertheless, we need to discuss real-time SR technology since it is now being used in commercial TV sets [1–3]. Note as well that there is another resolution enhancement method, called an enhancer or unsharp mask, that does not actually improve the resolution but instead enhances edges. Unlike an SR function, it is very easy to make real-time enhancer hardware for HDTV [15, 16].
All HDTV sets in use today receive broadcasting bit streams, such as MPEG-2 or MPEG-4, and decode them. After decoding, the sets apply different functions to produce better image quality on their LCDs. These functions, such as enhancers and noise reducers, vary depending on the manufacturer and the set in question. It is impossible to access the video signal inside an HDTV set and show it on another display. If we want to compare individual HDTV sets, we have to compare them on their own displays and with their functions. Although consumers want to buy HDTV sets with better image quality, the methods reported in the literature, including paired comparisons, are not useful for comparing more than two displays at the same time. For this comparison, HDTV sets should be assessed with all of their functions and on their own displays.
Most HDTV sets are equipped with some kind of image enhancement or image-improving technology. In fact, manufacturers do not always state that their sets are equipped with enhancement technologies. Recently, HDTV sets with SR have become available. According to the information provided by the manufacturers, the SR function is different from that of conventional enhancers. However, the SR function developed for HDTV has not been assessed yet. If HDTV sets with SR cannot actually create frequency elements higher than those of the conventional enhancers, it is questionable whether HDTV sets with SR show better resolution than the conventional HDTV sets without SR.
In this paper, therefore, HDTV sets with enhancement technologies other than SR will be categorized as sets without SR, and those that their manufacturers describe as equipped with SR functions will be categorized as sets with SR. This distinction mirrors the situation when shoppers go to an electronics store to buy an HDTV and find that it is not easy to tell the differences in resolution between HDTV sets with and without SR functions. They may end up relying on the claims of the manufacturers or question whether sets equipped with SR have better resolution than sets without it.
The goal of our study is to see whether SR functions actually improve the resolution of HDTV sets [17, 18]. Although there are several types of SR, so far, superresolution image reconstruction (SRR) is the only SR technology that has been embodied in a real-time system [1–3, 19]. As mentioned above, the details about SR on most HDTV sets are not released by their manufacturers, so there is no proof whether the sets are actually equipped with a genuine SR function. SR is an established research field. However, if the SR functions of HDTVs are not based on researchers’ understanding, they may cause confusion not only among researchers but also among consumers. We could theoretically analyze the resolution of the video if the HDTV set could be made to output video signals before and after the SR signal process. However, the SR processed image is only sent to the LCD, and there is no method to take the unprocessed image from the set. The only practical way to analyze the capability of SRR on HDTV sets is subjective assessment. There are many subjective assessment methods for evaluating image quality, and they give various results. There is no ‘standard’ method, but psychology and psychophysics provide plenty of methods to carry out this evaluation.
Subjective assessments are the alternative way to clarify the capabilities of HDTVs equipped with SR. These methods measure the reactions of volunteers who view television systems and are used to judge the performances of the systems. Although there are a couple of subjective assessment candidates [20–22], they are not appropriate for the purpose. All of them are designed to assess the image quality with a single display. P.910 is mainly designed for videophone systems and P.912 for surveillance systems, and the content for both is very limited. Videophone sequences mainly show a couple of people, and surveillance sequences tend to show people in corridors or vehicles on streets. Broadcasting has much more varied content, including news, dramas, and sports, whose images do not usually resemble the above.
One of the most common and useful subjective assessments is BT.500. However, BT.500 has been standardized to evaluate the relationship between the video stream bitrate and subjective image quality, and only one display can be used during the entire assessment test.
We have to use a number of HDTV sets showing the same bit streams to compare individual HDTV sets. The same bit stream was sent to the non-SR and SRR TV sets in order to compare their image qualities, but there is no standard for this sort of evaluation.
To be able to make a comparison, we decided that a couple of capabilities of BT.500 must be combined with other measuring factors.
We thought that a paired comparison would be useful. The notion of a paired comparison is exploited whenever we go to a store and do comparison shopping of similar items. Shoppers would likely want to compare the image qualities of two (or more) TV sets if all other features such as price, reliability, etc. are equivalent. The paired comparison does have an issue in that a lot of time would be consumed if we wanted to compare numerous HDTV sets. The number of TV manufacturers with established brands, however, is limited, and here, we only compare one HDTV set with an SRR function with four other HDTV sets. Although paired comparison methods are potentially useful and have been used to make video quality assessments, the ones described in the literature use only one display to compare individual signal processing methods or changes to parameters. Such paired comparisons have not been used to compare different displays [23–25].
This paper is organized as follows: Section 2 discusses the eligibility of the observers in the assessment and the length of the test video in reference to BT.500. Section 3 explains the subjective assessment. Section 4 describes the statistical analysis of the subjective assessment, and Section 5 is the conclusion.
2 Observers and length of test sequence
BT.500 was used when digital video coding technology was first implemented in broadcasting, and it was an important standard at the time digital broadcasting was just starting. Our study followed the guidelines laid out in BT.500 as to how we selected the observers and how we determined the length of the test video sequences. BT.500 specifies that the observers must be non-video experts who do not work in the video industry and the number of observers should be more than 15. It specifies that the number of sequences should be at least four. BT.500 is still widely used to assess the video image quality, and this means that non-specialist observers can recognize differences in quality. Many analysts and critics of HDTV image quality can easily recognize differences in image quality, but non-video experts can do so as well. BT.500 calls for the length of each video sequence for the assessment to be from 10 to 15 s long. The ITE/ARIB Hi-Vision Test Sequences (ITE sequences) were made for HDTV assessments, and the period of each sequence is 15 s [26, 27]. Although these sequences have been used for subjectively assessing HDTV video coding technologies, they are in YUV422 format. We have to process the sequences with the same MPEG-2 video encoder as broadcasting companies use, including the horizontal resolution conversion.
The sequences for the assessment should be selected from terrestrial broadcasting content because of the issues discussed above. However, it is not easy to find appropriate sequences in actual broadcasting content. Appropriate sequences must contain very high frequency elements, that is, fine details, in scenes that have no panning. Panning causes motion blur over the whole image on the display, so a panned image has no high-frequency elements at all. Blurry video sequences are thus of no use to assessments seeking to determine how high the resolution of an image is on a display. In accordance with the above considerations, we recorded various pieces of terrestrial HDTV broadcasting content onto a Blu-ray Disc (BD) player (capable of showing 1,920 × 1,080i/59.94-Hz HDTV video) and conducted many subjective assessments on them in order to select the appropriate sequences. Five sequences, each lasting from 10 to 15 s, were selected. The sequences are described in Section 3.2.
3 Subjective assessment
3.1 Scheffe’s paired comparison
Before describing the assessment procedure, we should note that the ‘pre-assessments’ conducted beforehand proved that the results of paired comparisons were reproducible. We selected Scheffe’s paired comparison, which is a round-robin comparison. The video signal paths are shown in Figure 1; a pair of HDTV sets is used in one assessment. The same HDTV video bit stream was used to make non-SR and SRR processed video sequences. Figure 2 shows an example of comparing an HDTV with an SRR function and one without SR. All signal paths are 1,920 × 1,080/59.94 Hz.
Commercial HDTV sets have several display modes, and the names of these modes vary from one manufacturer to another. Most commercial HDTV sets have a dynamic mode, cinema mode, and standard mode. The dynamic mode is used in stores, and it gives an excessive enhancement. The cinema mode is used for showing Blu-ray and DVD movies. The standard mode is for home use, and we chose this mode since it is recommended by HDTV manufacturers for viewing over long periods and reducing energy consumption. The standard mode includes all parameters such as contrast, sharpness, color mode, and those recommended by the manufacturer. Most consumers likely do not have sufficient knowledge to control the many parameters of recent HDTV sets. Hence, they tend to use HDTV sets only in the recommended standard mode. In each assessment, observers assess a pair of HDTV sets at a time (this is the basic rule of Scheffe’s paired comparison, that is, they do not compare the image quality of all five TV sets at once). Synchronized HDTV video is sent to both sets.
The observers we recruited for the experiment were allowed to move freely and check the image quality, since this gave them more chances to notice very small resolution differences between sets. People make such checks when they go to a shop to buy an HDTV set: they move freely around the sets and compare their image quality characteristics, including the resolutions. If they cannot find any difference in resolution regardless of how close they get to the screen, they need not worry about which set has the SR function. For this reason, we asked the observers to move freely and look for differences in resolution. Scheffe’s paired comparison is used for many purposes and has been used for the image quality assessment of HDTV sets [18, 28].
HDTV sets are usually placed in living rooms. Thus, normal lighting conditions for a living room were used. The same video sequence was repeated until the observer made the decision between the two sets. Five TV sets were selected for the experiment (the selection included the HDTV with SRR). A round-robin paired comparison was conducted since Scheffe’s paired comparison was used. That is, one sequence was assessed five times by each observer, using the pair of HDTV A and HDTV B, the pair of HDTV B and HDTV C, and the pair of HDTV C and HDTV A, etc. Each observer was shown two HDTV sets and asked to choose the one with the higher resolution. Odd numbers of grades, such as three or five, are commonly used when a paired comparison is conducted. Observers scored +2 for excellent, +1 for good, 0 for fair, −1 for poor, and −2 for bad resolution in each assessment. There was no time limit on an assessment; observers could assess the TV sets for as long as they wanted in order to make their decision. Each of the observers assessed the image quality on their own without anyone else in the test room. Figure 3 shows a photo of the experiment.
As the photo shows, the conditions were similar to those when someone goes to a shop to buy a TV. For each test sequence, the observer made 20 assessments since there were five TV set round robins per sequence. Since there were five test sequences, the total number of assessments that one observer had to make was 100.
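The counts above can be checked with a short sketch. It assumes that the 20 assessments per sequence are the ordered pairs of the five sets, which is consistent with each set taking a turn as the reference in Scheffe’s paired comparison; the function name is hypothetical.

```python
from itertools import permutations


def count_assessments(num_sets, num_sequences):
    """Return (assessments per sequence, total assessments per observer).

    Assumption: every ordered pair (reference set, evaluated set) is judged
    once per sequence, so the order within a pair matters.
    """
    per_sequence = len(list(permutations(range(num_sets), 2)))
    return per_sequence, per_sequence * num_sequences


per_sequence, total = count_assessments(num_sets=5, num_sequences=5)
print(per_sequence, total)
```

With five sets and five sequences this reproduces the 20 assessments per sequence and 100 assessments per observer stated in the text.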
3.2 Test sequences
Still images such as test patterns and high-resolution photos are usually used to assess the resolution of displays. However, the SRR function on an HDTV set is supposed to work on digital broadcasting content, not on still photos. Since SRR cannot improve resolution with a single image such as a test pattern, we have to use video sequences taken with an HDTV video camera. As described in Section 2, the lengths were from 10 to 15 s. The test video sequences were selected from terrestrial HDTV digital broadcasting content in Japan. Content was recorded on a BD at HDTV resolution, and the repeat function of the BD player was used to show the sequences to the observers during the assessment. Only a limited amount of the recorded broadcasting content was deemed appropriate for this assessment since most of it did not show any differences in the pre-assessment involving several observers. BT.500 recommends using at least four video sequences. Five sequences were selected. The test video sequences are shown in Figures 4, 5, 6, 7 and 8. The circles in each figure are high-resolution areas that were the objectives of the assessments.
Before starting the assessment, we prepared a couple of test video sequences to help non-video specialists understand what was meant by high-resolution HDTV video sequences. This training procedure is described in BT.500, and the recruited observers indicated that they understood the instructions after going through the training procedure. Dummy video sequences were shown to observers to stabilize their opinion, as specified in BT.500. Especially high resolution areas in the video sequences were pointed out by the experimenters who conducted the assessments with the observers. We did not use the assessment results of the training video sequences.
In tests such as these, it is generally difficult for non-experts to detect differences in resolution in the whole image. Thus, after the training, we asked the observers to concentrate on watching the texture in the circles and decide which HDTV set had the higher resolution. Furthermore, the observers were also asked to evaluate only the resolution and ignore other things. Sequences 1 (Figure 4) to 5 (Figure 8) were selected for the actual test. All of the circled areas have high-frequency elements and details.
4 Statistical analysis
Twenty-five observers participated in the assessment. All of them were university students ranging in age from 20 to 23 years old (average, 21). Prior to each test, a training session was held to introduce them to the test methodology of using broadcasting content that had high-resolution areas. The stimuli numbered five since five HDTV sets were used.
The outline of the analysis process of Scheffe’s comparison test is as follows: A round-robin is performed on the five samples by comparing a pair of samples each time. A cross table of all the results is made, an analysis of variance is conducted, and F0 is calculated. We go forward to the yardstick analysis only when the analysis of variance and F0 detect a significant difference. The results for sequence 1 (news) are shown in Table 1, where n is the number of stimuli (five HDTV sets) and N is the number of observers (25). The results in the deviations column and the biased deviations column were calculated in order to analyze the assessments [28, 29]. The F values at a probability of 1%, F1%, were derived from an F table using the degrees of freedom of the residual (369) and those of the second parameter (4, 96, and 6). The F0 values are obtained from the biased deviation values. The biased deviation of the stimuli (171.918) was divided by the biased deviation of the residual (2.2666), so the F0 value of the stimuli was 75.84872. The F0 value of the stimuli × observers row was calculated in a similar fashion: the biased deviation of the stimuli × observers row (0.3907668) was divided by the biased deviation of the residual (2.2666). The F0 value of the stimuli × observers was 0.3907668. If F0 is bigger than F1%, there is a significant difference, and indeed, the table shows that the F0 values of the stimuli are bigger than those of F. There is a significant difference in the stimuli × observers row (1%), which is indicated with ‘a’ in the F0 column.
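The degrees of freedom and the F0 ratio quoted above can be reproduced with a minimal sketch. The decomposition below is an assumption: it is the one commonly used when every observer judges every ordered pair (often called Ura’s variation of Scheffe’s method), which the text does not name. It does, however, yield the reported values 4, 96, 6, and 369 for n = 5 and N = 25. Function names are hypothetical.

```python
def degrees_of_freedom(n, N):
    """Degrees of freedom for a paired-comparison ANOVA.

    Assumed decomposition (not stated in the text): each of the N observers
    judges all n * (n - 1) ordered pairs once.
    """
    total = n * (n - 1) * N
    dof = {
        "stimuli": n - 1,
        "stimuli_x_observers": (n - 1) * (N - 1),
        "combination": (n - 1) * (n - 2) // 2,
        "order": 1,
        "order_x_observers": N - 1,
    }
    # The residual takes whatever the named effects leave over.
    dof["residual"] = total - sum(dof.values())
    dof["total"] = total
    return dof


def f_ratio(effect_mean_square, residual_mean_square):
    """F0 is the effect's biased deviation divided by the residual's."""
    return effect_mean_square / residual_mean_square


dof = degrees_of_freedom(n=5, N=25)
f0_stimuli = f_ratio(171.918, 2.2666)  # values quoted for sequence 1
print(dof["residual"], round(f0_stimuli, 2))
```

F0 is then compared with the tabulated F1% for the matching pair of degrees of freedom; the ratio for the stimuli row comes out at about 75.85, in line with the value given in the text.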
The yardstick method can only be used on significant differences in the analysis of variance [28, 29]. Although the details of the analysis cannot be discussed in full due to space limitations, it is a typical combination of Scheffe’s paired comparisons and a yardstick analysis of the results of the Scheffe’s paired comparisons.
The α values for the HDTV sets (αHDTV A, αHDTV B, αHDTV C, αHDTV D, and αHDTV E) were determined from the degrees of freedom and Figure 9. Figure 9 is called the cross table, and it is used in Scheffe’s paired comparison. There are two degrees of freedom: the number of observers (25) and the number of HDTV sets (5). αHDTV A is obtained by taking the value in the row (X·j·−Xi··) and the column (HDTV A) of Figure 9, which is −252, and dividing it by 2nN, where n = 5 is the number of stimuli (HDTV sets) and N = 25 is the number of observers [28, 29]:
\[ \alpha_{\mathrm{HDTV\,A}} = \frac{X_{\cdot j \cdot} - X_{i \cdot \cdot}}{2nN} = \frac{-252}{2 \times 5 \times 25} = -1.008 \]
αHDTV B, αHDTV C, αHDTV D, and αHDTV E are calculated in a similar way.
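As an illustration of how every α value follows from the cross table, here is a minimal sketch. The table layout is an assumption: `cross_table[ref][s]` holds the score of set `s` judged against reference `ref`, summed over observers. The toy table is hypothetical and is not the data of Figure 9; only the value −252 for HDTV A comes from the text.

```python
def main_effects(cross_table, num_observers):
    """Estimate alpha for each stimulus as (X.j. - Xi..) / (2 n N).

    cross_table[ref][s]: summed score of s when judged against reference ref
    (assumed layout and sign convention).
    """
    labels = list(cross_table)
    n = len(labels)
    alphas = {}
    for s in labels:
        # X.j.: scores s received while it was the evaluated set
        column_sum = sum(cross_table[ref][s] for ref in labels if ref != s)
        # Xi..: scores the other sets received while s was the reference
        row_sum = sum(cross_table[s][other] for other in labels if other != s)
        alphas[s] = (column_sum - row_sum) / (2 * n * num_observers)
    return alphas


# Hypothetical toy table: three sets, two observers.
toy = {
    "X": {"Y": 3, "Z": -2},
    "Y": {"X": -3, "Z": -4},
    "Z": {"X": 2, "Y": 4},
}
toy_alphas = main_effects(toy, num_observers=2)

# Value from the text: the difference for HDTV A is -252 with n = 5, N = 25.
alpha_hdtv_a = -252 / (2 * 5 * 25)
print(toy_alphas, alpha_hdtv_a)
```

Under this convention the α values sum to zero across the stimuli, so they express each set’s position relative to the average of the sets compared.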
In Scheffe’s paired comparison, all HDTVs become the reference. For example, HDTV A starts out as the reference, and HDTV B, C, D, and E are evaluated against it. Then, HDTV B becomes the reference, and all other HDTVs are evaluated against it. Although the evaluation of HDTV B is +2 with reference HDTV A, the evaluation might not be −2 but be −1 when HDTV A is assessed against the reference HDTV B. The reverse assessment is not always symmetrical.
The order of resolution is HDTV D, HDTV C, HDTV B, HDTV E, and HDTV A. HDTV A is equipped with SRR. According to the subjective assessments, the resolution of HDTV A is the lowest.
An analysis was conducted to see if the order had statistically significant differences. A Y value for 1% was used, and it can be derived using Equation 3:
Different tables give the probability at 1% of the F1% distribution; thus, Yα 0.01=0.44847.
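The reported yardstick value can be approximately reproduced with the form of the yardstick commonly used with Scheffe’s paired comparison, Y = q · sqrt(σ² / (2nN)), where σ² is the biased deviation of the residual and q is a studentized range value. This form, and the value q = 4.71 (the tabulated 1% value for five groups at 120 degrees of freedom, the nearest common table row below 369), are assumptions and are not taken from the text.

```python
import math


def yardstick(q_value, residual_variance, n, N):
    """Yardstick Y = q * sqrt(residual_variance / (2 * n * N)).

    q_value is a studentized range value read from a table; the 4.71 used
    below is an assumption, not a number given in the paper.
    """
    return q_value * math.sqrt(residual_variance / (2 * n * N))


y_001 = yardstick(q_value=4.71, residual_variance=2.2666, n=5, N=25)
print(round(y_001, 4))
```

Under these assumptions the result agrees with Yα 0.01 = 0.44847 to about four decimal places.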
The yardstick values are shown in Figure 10. The differences in the yardstick values in relation to Yα 0.01 are as follows. The difference between the lowest resolution HDTV A and the second lowest HDTV E is calculated as follows:
The 1% in Equation 5 is the probability that the result is false; that is, the result holds with a probability of 99%. The resolution of HDTV set A with SRR is thus inferior to that of HDTV set E, which has the second lowest resolution, with a probability of 99%.
All of these values are greater than Yα 0.01=0.44847, so the differences are statistically significant at the 1% level. These relations are marked with a double asterisk in Figure 10. Thus, the HDTV set with SRR was actually poorer in resolution than the other HDTV sets. Figures 11, 12, 13 and 14 show the assessment results for the sequences in Figures 5, 6, 7 and 8. All of them show similar tendencies. Our assessments have proven that the resolution of the HDTV set with SRR is the lowest of the manufacturers’ HDTV sets tested.
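The yardstick test itself is a simple comparison: two sets differ significantly when the gap between their α values exceeds Yα 0.01. The sketch below shows the check. Only α for HDTV A (−1.008) and the yardstick (0.44847) come from the text; the α values for HDTV B to E are hypothetical placeholders chosen to follow the reported order and do not reproduce Figure 10.

```python
from itertools import combinations


def significant_pairs(alphas, yardstick_value):
    """Return (better, worse) pairs whose alpha gap exceeds the yardstick."""
    ranked = sorted(alphas, key=alphas.get, reverse=True)
    return [
        (better, worse)
        for better, worse in combinations(ranked, 2)
        if alphas[better] - alphas[worse] > yardstick_value
    ]


# HYPOTHETICAL values for B, C, D, and E; only HDTV A is taken from the text.
hypothetical_alpha = {
    "HDTV D": 0.600,
    "HDTV C": 0.400,
    "HDTV B": 0.300,
    "HDTV E": -0.292,
    "HDTV A": -1.008,
}
pairs = significant_pairs(hypothetical_alpha, yardstick_value=0.44847)
for better, worse in pairs:
    print(better, ">", worse)
```

Because the pairs are built from the ranked list, the first element of each pair is always the set with the higher α.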
5 Conclusion
Our subjective assessment of HDTV with an SRR function used Scheffe’s paired comparison, observers who were not video experts, as called for by BT.500, and content chosen from terrestrial digital HDTV broadcasting. The assessment results were statistically analyzed (analysis of variance). A yardstick method was applied to the points of significant difference. It was statistically proven that the SRR function on the HDTV set did not improve the resolution.
The resolution of the HDTV set with SRR was found to be the lowest of the HDTV sets tested. This result accords with most observers’ opinions just after the assessment test. The assessment method described here can be used for other items such as frame rate conversion from 60 to 240 Hz and noise reduction on digital HDTV sets. It is necessary to conduct further validation of this method with various content and TV sets.
References
Tokumitsu S: From REGZA to Cell REGZA. Accessed 12 Feb 2014. http://www.chinacom.tw/ngn2009/pdf/Toshiba_Presentation_Web_REVISED.pdf
Matsumoto N, Ida T: Reconstruction-based super-resolution using self-congruency around image edges. J. IEICE 2010, J93-D(2):118-126. (in Japanese)
Toshiba (in Japanese). Accessed 12 Feb 2014 http://www.toshiba.co.jp/regza/detail/superresolution/resolution.html
Sony (in Japanese). Accessed 12 Feb 2014 https://www.sony.jp/bravia/technology/mf/
Farsiu S, Robinson D, Elad M, Milanfar P: Fast and robust multi-frame super-resolution. IEEE Trans. Image Process 2004, 13(10):1327-1344. 10.1109/TIP.2004.834669
Panda S, Prasad RS, Jena G: POCS based super-resolution image reconstruction using an adaptive regularization parameter. IJCSI Int. J. Comput. Sci. Issues 2011, 8(5):1694-0814.
Park SC, Park MK, Kang MG: Super-resolution image reconstruction: a technical overview. IEEE Signal Process. Mag. 2003, 20: 21-36. 10.1109/MSP.2003.1203207
Eekeren AWM, Schutte K, Vliet LJ: Multiframe super-resolution reconstruction of small moving objects. IEEE Trans. Image Process 2010, 19(11):2901-2912.
Katsaggelos A, Molina R, Mateos J: Synthesis Lectures on Images, Video and Multimedia Processing. Morgan & Claypool Publishers, San Rafael; 2007.
Glasner D, Bagon S, Irani M: Super-resolution from a single image. 2009 IEEE 12th International Conference on Computer Vision, Kyoto, October 2009 pp. 349-356.
Bannore V: Iterative-Interpolation Super-Resolution Image Reconstruction, Studies in Computational Intelligence. Springer, Milton, Keynes; 2010:77-103.
Chaudhuri S, Manjunath J: Motion-Free Super-Resolution. Springer, New York; 2005.
Baker S, Kanade T: Limits on super-resolution and how to break them. PAMI 2002, 24(9):1167-1183. 10.1109/TPAMI.2002.1033210
Shahar O, Faktor A, Irani M: Space-time super-resolution from a single video. In CVPR ’11 Proceedings of the 2011 IEEE Conference on, Computer Vision and Pattern Recognition. IEEE, Piscataway; 2011:3353-3360.
Schreiber WF: Wirephoto quality improvement by unsharp masking. J. Pattern Recognit 1970, 2: 111-121.
Lee J-S: Digital image enhancement and noise filtering by use of local statistics. In IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-2. IEEE, Piscataway; 1980:165-168.
Gohshi S: Limitation of super resolution image reconstruction for video. In Fifth International Conference on Computational Intelligence, Communication Systems and Networks. IEEE, Madrid; 5–7 June 2013:217-221.
Gohshi S, Echizen I: Subjective assessment for HDTV with super-resolution function. In Seventh International Workshop on Video Processing and Quality Metrics for Consumer Electronics - VPQM2013 2013. Scottsdale; 30 January–1 February 2013:32-36.
Matsumoto N, Ida T: A study on one frame reconstruction-based super-resolution using image segmentation. IEICE Technical Report 2008. (in Japanese)
Accessed 12 Feb 2014. http://www.itu.int/rec/R-REC-BT.500/en
Accessed 12 Feb 2014. http://www.itu.int/rec/T-REC-P.910/en
Lee J-S, Simone FD, Ebrahimi T: Subjective quality evaluation via paired comparison: application to scalable video coding. Multimedia IEEE Trans 2011, 13(5):882-893.
Silverstein DA, Farrell JE: Quantifying perceptual image quality. Proc. IS&T Image Process., Image Qual., Image Capture, Syst. Conf. 1998, 1: 242-246.
Li J, Barkowsky M, Callet PL: Subjective assessment methodology for preference of experience in 3DTV. In Proceedings of the 11th IEEE IVMSP Workshop: 3D Image/Video Technologies and Applications, Seoul. IEEE, Piscataway; 2013.
Accessed 12 Feb 2014. http://www.ite.or.jp/en/
Accessed 12 Feb 2014. http://www.nes.or.jp/gaiyo/pdf/ite_hyoujundouga_sample.pdf
Scheffe H: An analysis of variance for paired comparisons. J. Am. Stat. Assoc 1952, 47(259):381-400.
Fukuda T, Fukuda R: Ergonomics Handbook. Scientist Press Co. Ltd, Tokyo; 2009. (in Japanese)
The authors declare that they have no competing interests.
Gohshi, S., Hiroi, T. & Echizen, I. Subjective assessment of HDTV with superresolution function. J Image Video Proc 2014, 11 (2014). https://doi.org/10.1186/1687-5281-2014-11