Zoom motion estimation for color and depth videos using depth information
EURASIP Journal on Image and Video Processing volume 2020, Article number: 11 (2020)
Abstract
In this paper, two methods of zoom motion estimation for color and depth videos that use depth information are proposed. Zoom motion is estimated independently for the color and depth videos. The color video is scaled in the spatial domain, and the depth video is scaled in both the spatial and depth domains. For color video, instead of applying all possible zoom ratios to a current block as existing zoom motion estimation methods do, the proposed method determines the zoom ratio as the ratio of the average depth values of the current and reference blocks. Then, the reference block is resized according to the zoom ratio and mapped to the current block. For depth video, the reference block is first scaled in the spatial direction by the same method used for color video and then scaled in depth by the ratio of the distances from the camera to the objects. Compared to the conventional motion estimation method, the proposed method reduces MSE by up to about 30% for the color video and up to about 85% for the depth video.
1 Introduction
Intelligent surveillance systems for monitoring the behavior of objects are operated in various places for public safety. These systems can use not only conventional RGB videos but also infrared and depth videos to acquire new information. In order to operate the intelligent surveillance systems by transmitting the videos, an efficient encoding method is required for the various types of the videos.
In video coding standards such as H.264/AVC [1–4] and H.265/HEVC [5, 6], various methods for removing redundancies are used to compress color video. Temporal redundancy is one type of redundancy in video, and it is efficiently removed by motion estimation for objects in frames. The block matching algorithm (BMA) [7, 8] has been adopted as the motion estimation method in the video coding standards. BMA estimates object motion accurately when the object size is fixed across frames. However, conventional BMA-based motion estimation estimates object motion inaccurately when the object size changes, because the size of the reference block is equal to the size of the current block.
In order to estimate various types of object motion, including zoom, in which the object size changes, object motion models such as affine [9–11], perspective [12], polynomial [13], or elastic [14] models can be applied. However, motion estimation methods based on these models have high computational complexity because the model parameters must be computed for each object. An improved affine model in which the number of parameters is reduced from 6 to 4 has been introduced to solve this problem [15, 16]. Instead of computing the model parameters, a method of introducing a zoom ratio into the conventional BMA [17] has been proposed. However, the search range of zoom ratios needs to be limited since the number of possible zoom ratios is infinite. To reduce the search complexity, a diamond search method has been applied to the zoom ratio search [18]. Methods [19–21] that determine the zoom ratio instead of searching for it have also been studied. Superiori and Rupp [19] observe that the directions of motion vectors (MVs) tend to align with the direction from the border to the center of the object when the object has zoom motion. Takada et al. [20] propose a method of improving coding efficiency by calculating zoom ratios from an analysis of the MVs in the coded video and recoding the video. This method has the limitation that it can only be applied to already coded video. Shukla et al. [21] propose a method of finding warping vectors in the vertical and horizontal directions instead of using the conventional BMA. Shen et al. [22] propose a motion estimation method that extracts and matches scale-invariant feature transform (SIFT) features, which are robust to rotation and scaling. Luo et al. [23] propose a motion compensation method that detects feature points through the speeded-up robust features (SURF) algorithm in the reference and current frames and finds the corresponding image projections by the perspective-n-point method. Qi et al. [24] propose a 3D motion estimation method that predicts a future scene based on 3D motion decomposition. Wu et al. [25] introduce a K-means clustering algorithm to improve the performance of motion estimation.
In this paper, a zoom motion estimation method for color video that uses depth information is first proposed. Each pixel value in the depth video represents the distance from the depth camera to an object. Applications of depth video have been researched in various fields such as face recognition [26–28], simultaneous localization and mapping [29, 30], object tracking [31–35], and people tracking [36–38]. The proposed method determines the zoom ratio as the ratio of the representative depth value of the current block to that of the reference block. The representative depth value is set to the average of the depth values in each block. Then, the reference block size is determined by multiplying the current block size by the zoom ratio. The reference block is scaled to the current block size by spatial interpolation, and the two blocks are compared in order to find the optimal reference block.
A method of motion estimation for depth video is also proposed in this paper. In depth video coding, intra-prediction has been studied [39–43], but studies of inter-prediction are insufficient. When an object in depth video has zoom motion, not only the size but also the depth values of the object are scaled according to the zoom ratio. In order to accurately estimate the zoom motion for the depth video, we propose a 3D scaling method that simultaneously scales the 2D spatial size and the depth values of the reference block. The spatial scaling is similar to the method for the color video. After the spatial scaling, the depth values in the reference block are also scaled by multiplying them by the zoom ratio.
The contributions of the proposed method are as follows. For color video encoding, the proposed method reduces the computational complexity of determining the zoom ratio by calculating it from the ratio of depth values. For depth video encoding, the proposed method improves the accuracy of motion estimation by taking into account the changes of the pixel values in the depth video when the object has zoom motion.
This paper is organized as follows. The proposed method is described in Section 2. In Section 3, we present the simulation results to show the improvement in motion estimation accuracy achieved by the proposed method. Finally, Section 4 concludes the paper.
2 Proposed method
2.1 Relationship between depth values and object size
The size of an object and the distance from a camera appear to be inversely proportional. To clarify the relationship between the object size and the depth value of the depth frame, object widths in captured pictures are measured while moving a diamond-shaped object at intervals of 0.5 m from 1 m to 4 m as shown in Fig. 1.
The relationship between the width and distance of the object is described as shown in Fig. 2. The measured relationship can be approximated with a fitting equation as follows:
where P means the number of pixels of the object width indicated by the red arrow in Fig. 1, d means the distance from the camera, and α and β mean constant values. In the case of Fig. 2, α and β are measured as 0.965 and 214.59, respectively.
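The fitted curve is described here only through its constants α and β. A form consistent with the stated inverse relationship between width and distance, given as an assumption rather than as the authors' exact expression, is a power law. Taking logarithms turns it into a straight line whose slope and intercept give α and β:

```latex
% Assumed power-law form of Eq. (1): object width P (pixels) versus distance d
P = \frac{\beta}{d^{\alpha}}
\quad\Longleftrightarrow\quad
\log P = \log \beta - \alpha \log d
% With the reported constants \alpha = 0.965 and \beta = 214.59,
% d = 1 (in the distance unit used for the fit) gives P = \beta = 214.59 pixels.
```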
2.2 Zoom motion estimation for color video
When zoom motion of an object occurs between the current and reference pictures, the size of the object changes according to the distance it has moved relative to the camera. Therefore, the size of the reference block should be determined from the distance in order to estimate object motion that includes zooming. The depth information gives the distance from the camera at each pixel, so the zoom ratio between the current and reference blocks can be calculated from the depth information. The averages of the depth values in the current and reference blocks are taken as the distances of the respective blocks. If the zoom ratio s is defined as the ratio of the number of pixels between the current and reference blocks, s is calculated by substituting the numbers of pixels of the current and reference blocks into Eq. (1) as follows:
where \(\overline {d_{\text {cur}}}\) and \(\overline {d_{\text {ref}}}\) mean the representative depth values of the current and reference blocks, respectively, and P_{cur} and P_{ref} mean the number of pixels of the current and reference blocks, respectively. A simplified expression of Eq. (2) is as follows:
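Assuming Eq. (1) has the power-law form P = β/d^α (an inferred form, since only its constants are stated), and taking s as the reference-to-current pixel ratio so that the reference block size is sm×sn, the substitution and its simplification can be sketched as below. The constant β cancels, which is consistent with only α being needed in Eq. (3).

```latex
s = \frac{P_{\text{ref}}}{P_{\text{cur}}}
  = \frac{\beta / \overline{d_{\text{ref}}}^{\,\alpha}}
         {\beta / \overline{d_{\text{cur}}}^{\,\alpha}}
  = \left( \frac{\overline{d_{\text{cur}}}}{\overline{d_{\text{ref}}}} \right)^{\alpha}
```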
When the size of the current block is m×n, the size of the reference block is determined as sm×sn. The reference block is then scaled by interpolation so that its size equals the size of the current block. Figure 3 shows a flowchart of the proposed zoom motion estimation for the color video, and Fig. 4 shows the processes of the proposed method.
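The steps above (zoom ratio from average depths, reference block size, interpolation back to the current block size) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, the zoom ratio is assumed to be (mean d_cur / mean d_ref)^α, bilinear interpolation is one possible choice of interpolation, and truncating sm and sn to whole pixels is an assumption chosen because it yields the 7×7 reference size of the worked examples.

```python
import math

import numpy as np

ALPHA = 0.965  # value the text reports for alpha in Eq. (3)


def zoom_ratio(depth_cur, depth_ref, alpha=ALPHA):
    """Zoom ratio s from the average depths of the two blocks.

    Assumes s = (mean(d_cur) / mean(d_ref)) ** alpha.
    """
    return (float(np.mean(depth_cur)) / float(np.mean(depth_ref))) ** alpha


def reference_block_size(m, n, s):
    """Reference block size (s*m, s*n) in whole pixels.

    Truncation with a minimum of 1 pixel is an assumption of this sketch.
    """
    return max(1, math.floor(s * m)), max(1, math.floor(s * n))


def resize_bilinear(block, out_h, out_w):
    """Resize a 2D block to (out_h, out_w) by bilinear interpolation."""
    block = np.asarray(block, dtype=np.float64)
    in_h, in_w = block.shape
    # Sample positions in the input grid, aligned at the block corners.
    ys = np.linspace(0.0, in_h - 1, out_h)
    xs = np.linspace(0.0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = block[y0][:, x0] * (1 - wx) + block[y0][:, x1] * wx
    bottom = block[y1][:, x0] * (1 - wx) + block[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def mse(a, b):
    """Mean square error between two equally sized blocks."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.mean((a - b) ** 2))


if __name__ == "__main__":
    depth_cur = np.full((8, 8), 2322.312)
    depth_ref = np.full((8, 8), 2469.523)
    s = zoom_ratio(depth_cur, depth_ref)
    h, w = reference_block_size(8, 8, s)
    # Placeholder pixel values standing in for a reference color block.
    ref_color = np.arange(h * w, dtype=float).reshape(h, w)
    candidate = resize_bilinear(ref_color, 8, 8)
    print(s, (h, w), candidate.shape)
```

After resizing, the candidate has the same size as the current block, so it can be compared with `mse` exactly as in conventional block matching.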
Figure 5 shows an example of zoom motion estimation for the color video. The areas surrounded by the red rectangles in Fig. 5a and b are the 8×8 current and reference blocks, respectively, and Fig. 5c and d show the depth values in each block. \(\overline {d_{\text {cur}}}\) and \(\overline {d_{\text {ref}}}\) of the 8×8 current and reference blocks in the depth pictures are about 2322.312 and 2469.523, respectively, so s is calculated as about 0.940 if α is set to 0.965. Therefore, the size of the reference color block is determined as 7×7. Then, the 7×7 reference color block is scaled so that the reference block size is equal to the current block size. The mean square errors (MSEs) of the conventional and proposed motion estimation methods are about 169.734 and 74.609, respectively. These results show that the proposed zoom motion estimation method is more accurate when the object in the video has zoom motion.
2.3 Zoom motion estimation for depth video
In depth video, the distance of an object from the depth camera changes when the object has zoom motion, so the depth values of the object change as shown in Fig. 6. Therefore, not only the size but also the depth values of the object should be considered in zoom motion estimation for depth video.
A method of 3D scaling is introduced for the zoom motion estimation for depth video. 3D scaling means that depth-axis scaling is added to the 2D spatial scaling that scales the block size. The flowchart of 3D scaling is shown in Fig. 7.
In 3D scaling, the zoom ratio calculation and the size determination of the reference block are the same as in the zoom motion estimation for color video described above. Then, the depth values of the size-scaled reference block are scaled by the following equation:
where R(i,j) and R_{i}(i,j) mean original and scaled depth values in position (i,j), respectively.
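The two stages of 3D scaling can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: `scale_3d` and `resize_bilinear` are hypothetical names, bilinear interpolation is assumed for the spatial stage, and the depth factor is passed in by the caller because the exact factor of Eq. (4) (the zoom ratio s or the plain ratio of average depths) is not restated here.

```python
import numpy as np


def resize_bilinear(block, out_h, out_w):
    """Resize a 2D block to (out_h, out_w) by bilinear interpolation."""
    block = np.asarray(block, dtype=np.float64)
    in_h, in_w = block.shape
    ys = np.linspace(0.0, in_h - 1, out_h)
    xs = np.linspace(0.0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = block[y0][:, x0] * (1 - wx) + block[y0][:, x1] * wx
    bottom = block[y1][:, x0] * (1 - wx) + block[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def scale_3d(ref_depth_block, out_h, out_w, depth_scale):
    """Spatially resize a reference depth block, then scale its depth values.

    depth_scale multiplies every depth value R(i, j) of the resized block;
    choosing its value is left to the caller (assumption of this sketch).
    """
    resized = resize_bilinear(ref_depth_block, out_h, out_w)
    return resized * depth_scale


if __name__ == "__main__":
    # A flat surface that moved from depth 776 to depth 679 (made-up values).
    ref = np.full((7, 7), 776.0)
    scaled = scale_3d(ref, 8, 8, 679.0 / 776.0)
    print(scaled.shape, scaled[0, 0])
```

Without the second stage, a spatially matched block would still differ from the current block by the change in distance at every pixel, which is why the conventional method shows a much larger MSE on depth video.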
Figure 8 shows an example of zoom motion estimation for the depth video. The areas surrounded by the red rectangles in Fig. 8a and b are the 8×8 current and reference blocks, respectively, and Fig. 8c and d show the depth values in each block. \(\overline {d_{\text {cur}}}\) and \(\overline {d_{\text {ref}}}\) for each 8×8 block are about 679.625 and 776.969, respectively, so s is calculated as about 0.874. If α is set to 0.965, the reference block size is determined as 7×7 as shown in Fig. 8e when the current block size is 8×8. Then, the 7×7 reference block is scaled by the spatial scaling so that the reference block size is equal to the current block size. After that, the depth values in the 2D-scaled reference block are scaled as shown in Fig. 8g. The MSEs of the conventional and proposed methods are about 9482.97 and 3.48, respectively. These results show that the 3D scaling improves the accuracy of motion estimation for the depth video.
2.4 Zoom motion estimation for variable-size block
The video coding standard provides the variable-size block, which groups blocks that have similar MVs in order to reduce the number of coding blocks. In the motion estimation of H.264/AVC [1–4], the size of the variable-size block is allowed to be 16×16, 16×8, 8×16, and 8×8 when the macroblock size is 16×16, and 8×8, 8×4, 4×8, and 4×4 when the block size is 8×8. Figure 9 shows the division of a macroblock in the variable-size block. The modes of the variable-size block are determined by comparing the sums of absolute errors (SAEs) or the sums of square errors (SSEs) of each variable-size block.
In addition, introducing the variable-size block can solve the problem that it is difficult to determine the representative depth value of a mixed block that contains both a foreground object and background. For a mixed block, the representative depth value is determined as the average of the depth values of the background and the foreground, which leads to an inaccurate zoom ratio. This problem can be solved by dividing the block into smaller blocks so that each block contains only background or only the foreground object.
The proposed method supports estimation for the variable-size block. The variable-size block is applied independently to both color and depth videos. When the size of the sample block is 16×16, the SAE for the original 16×16 block and the sums of SAEs for the blocks partitioned into 16×8, 8×16, and 8×8 are computed. In motion estimation for a partitioned block, the coding of the MV of each partition should also be considered. In the case of comparing the 16×16 and 16×8 variable-size blocks, the equation for comparing SAEs is as follows:
where SAE_{16×16} and SAE_{16×8} mean the SAEs for the original block and for the block partitioned into 16×8, respectively, and T_{16×8} means a threshold that accounts for the MVs. If the 16×16 sample block satisfies Eq. (5), this block can be partitioned into 16×8.
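The partition test can be sketched as a simple comparison. The inequality used here, SAE of the whole block greater than the summed SAEs of the partitions plus the threshold, is an assumed reading of Eq. (5) based on the description of T as a penalty for the extra MVs; the function names are hypothetical.

```python
import numpy as np


def sae(block_a, block_b):
    """Sum of absolute errors between two equally sized blocks."""
    a = np.asarray(block_a, dtype=np.int64)
    b = np.asarray(block_b, dtype=np.int64)
    return int(np.abs(a - b).sum())


def should_partition(sae_whole, sae_parts, threshold):
    """Decide whether a block should be split into partitions.

    Assumed rule: split when SAE_whole > sum(SAE_parts) + T, so the gain
    in matching error must outweigh the cost of coding additional MVs.
    """
    return sae_whole > sum(sae_parts) + threshold


if __name__ == "__main__":
    t_16x8 = 16 ** 2 / 2  # threshold value stated later in the text for T_{16x8}
    # Made-up SAE values for a 16x16 block and its two 16x8 halves.
    print(should_partition(1000, [300, 400], t_16x8))
    print(should_partition(800, [350, 400], t_16x8))
```

Each partition's SAE is taken at its own best-matching position, so the sum over partitions can never exceed the SAE of the unsplit block; the threshold is what keeps the block from always being split.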
3 Results and discussion
In order to measure the accuracy of the proposed zoom motion estimation, we use depth video datasets [44] in which the camera moves back and forth, as shown in Fig. 10a and b, and we capture videos in which 1 or 2 people move back and forth while the position of the camera is fixed, as shown in Fig. 10c and d. The videos in Fig. 10a and b are captured by Microsoft Kinect, and the videos in Fig. 10c and d are captured by Intel RealSense D435. The resolution of the color and depth videos is 640 × 480. We use the 30 consecutive frames that have the most prominent zoom motion in each video. The reference picture is separated from the current picture by a picture gap. The full-search method is applied as the search method for BMA. The search range is set to ±15, and the sizes of the sample block are set to 8×8 and 16×16. α in Eq. (3) is set to 0.965. For the color videos, only a gray channel is used. The search precision is set to 1/2 pixel for the color video and 1 pixel for the depth video.
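For readers unfamiliar with the baseline, full-search block matching over a ±15 range can be sketched as below. The sketch works at integer-pixel precision only (the 1/2-pixel search used for color video would additionally require interpolating the reference picture), uses SSE as the matching cost, and `full_search` is a hypothetical name.

```python
import numpy as np


def full_search(cur, ref, top, left, m, n, search_range=15):
    """Exhaustive block matching at integer-pixel precision.

    Returns (best_mv, best_sse), where best_mv = (dy, dx) points from the
    current block position to the best-matching block in the reference picture.
    """
    cur = np.asarray(cur, dtype=np.float64)
    ref = np.asarray(ref, dtype=np.float64)
    h, w = ref.shape
    block = cur[top:top + m, left:left + n]
    best_mv, best_sse = (0, 0), np.inf
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference picture.
            if y < 0 or x < 0 or y + m > h or x + n > w:
                continue
            sse = float(np.sum((block - ref[y:y + m, x:x + n]) ** 2))
            if sse < best_sse:
                best_mv, best_sse = (dy, dx), sse
    return best_mv, best_sse


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.integers(0, 256, size=(64, 64))
    # Synthetic current picture: the reference shifted by a pure translation.
    cur = np.roll(ref, shift=(2, -3), axis=(0, 1))
    print(full_search(cur, ref, 24, 24, 8, 8))
```

In the zoom-aware variant described in Section 2, each candidate position would be examined with a reference block of size sm×sn that is resized to m×n before the cost is computed.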
In the proposed method, rate-distortion (RD) optimization can be used to determine the motion estimation mode. However, this paper does not discuss the coding method of depth video. Therefore, the estimation mode for each block is selected by the following equation:
where SSE_{ME} and SSE_{ZME} mean SSE for the conventional and proposed methods. If a block satisfies Eq. (6), then the motion estimation mode of this block is selected as the zoom motion estimation. In this simulation, T_{mode} is determined as the following equation:
where m and n mean the height and width of a current block, respectively.
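The mode decision can be sketched as a thresholded comparison of the two errors. The rule below, choosing zoom motion estimation when SSE_ME exceeds SSE_ZME by more than T_mode, is an assumed reading of Eq. (6). T_mode is passed in as a parameter because its dependence on m and n in Eq. (7) is not restated here; `select_mode` is a hypothetical name.

```python
def select_mode(sse_me, sse_zme, t_mode):
    """Choose between conventional ("ME") and zoom ("ZME") motion estimation.

    Assumed rule: pick ZME when SSE_ME - SSE_ZME > T_mode, i.e. only when
    the zoom-aware match is better by more than the threshold.
    """
    return "ZME" if sse_me - sse_zme > t_mode else "ME"


if __name__ == "__main__":
    # Made-up SSE values and threshold, for illustration only.
    print(select_mode(sse_me=5000.0, sse_zme=1200.0, t_mode=64.0))
    print(select_mode(sse_me=1250.0, sse_zme=1200.0, t_mode=64.0))
```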
Figures 11 and 12 show the MSEs of motion estimation for the color videos with the conventional and proposed methods. The picture gap between the current picture and the reference picture is 1. The accuracy of motion estimation is improved by the proposed method.
Tables 1 and 2 show the average MSEs according to the picture gap between the current picture and the reference picture. In Tables 1 and 2, \(\overline {\text {MSE}_{\text {ME}}}\) and \(\overline {\text {MSE}_{\text {ZME}}}\) mean the averages of the MSEs for the conventional and proposed motion estimation methods, and \(\Delta \overline {\text {MSE}}\) means the MSE improvement achieved by the proposed zoom motion estimation. As the picture gap between the current and reference pictures becomes larger, the number of blocks selected for the zoom motion estimation mode increases. In the color images, blocks that include an object boundary region are mainly selected for the zoom motion estimation mode. This means that when the color video has zoom motion, the regions of the object boundaries are particularly affected in the conventional motion estimation method.
Figures 13 and 14 show the MSEs of motion estimation for the depth videos with the conventional and proposed methods. The picture gap between the current picture and the reference picture is 1. The accuracy of motion estimation is improved by the proposed method more than in the case of the color videos. Figure 15 shows the zoom ratios in the proposed zoom motion estimation for the depth videos. The zoom motion estimation mode is selected for almost all the areas where the zoom motion occurs.
Tables 3 and 4 show the average MSEs according to the picture gap between the current picture and the reference picture. As in the case of the color images, the number of blocks selected for the zoom motion estimation mode increases as the picture gap between the current and reference pictures becomes larger.
Estimation accuracies and the reduction in the number of MVs through the variable-size block are reported in Tables 5, 6, 7, and 8. The thresholds of the block partition in Eq. (5) are set as follows: T_{16×8} and T_{8×16} are set to 16^{2}/2, T_{8×8} is set to 16^{2}, T_{8×4} and T_{4×8} are set to 8^{2}/2, and T_{4×4} is set to 8^{2}. Tables 5, 6, 7, and 8 show the MSEs and the number of blocks of each size for a variable-size block allowing block sizes of 16×16, 16×8, 8×16, and 8×8, and for a variable-size block allowing block sizes of 8×8, 8×4, 4×8, and 4×4. In Tables 5 and 6, MSE_{VB} means the MSE of the variable-size block, and MSE_{16×16}, MSE_{8×8}, and MSE_{4×4} mean the MSEs of the fixed-size blocks. In Tables 7 and 8, notations such as MV_{16×16} and MV_{16×8} mean the number of MVs for each block size in the variable-size block, MV_{fixed(8×8)} means the number of MVs in the fixed-size block, and \(\sum \text {MSE}_{\text {VB}}\) means the total number of MVs in the variable-size block. The MSEs of the variable-size block are similar to those of the fixed-size block whose size equals the smallest allowed size. The number of MVs is greatly reduced, to up to about 40% compared to the fixed-size block.
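The threshold values listed above follow a simple pattern in the side length of the parent block, which can be written as a small lookup. The helper name is hypothetical, and extending the pattern to parent sizes other than 16 and 8 would be an assumption beyond what the text states.

```python
def partition_thresholds(block_size):
    """Partition thresholds for a square parent block of side block_size.

    Keys are partition sizes; the two half splits use block_size**2 / 2
    and the quad split uses block_size**2, as stated for 16 and 8.
    """
    half = block_size // 2
    return {
        (block_size, half): block_size ** 2 / 2,
        (half, block_size): block_size ** 2 / 2,
        (half, half): block_size ** 2,
    }


if __name__ == "__main__":
    print(partition_thresholds(16))
    print(partition_thresholds(8))
```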
4 Conclusions
In this paper, we proposed a method of calculating the zoom ratio for the zoom motion estimation of color video by using the depth information. We also proposed a method of zoom motion estimation for the depth video. We measured the improvement in MSE when the proposed method was applied separately to the color and depth videos. The simulation results showed that MSE is reduced by up to about 30% for the color video and 85% for the depth video. Furthermore, zoom motion estimation for the variable-size block greatly reduces the number of motion vectors.
Some of the conventional methods for zoom motion estimation determine the zoom ratio by extracting and matching object features that are robust against zooming. There are also methods that determine the zoom ratio by searching for the pattern of zoom motion in the direction and size of the MVs. In another method, an optimal zoom ratio is found by scaling a reference block over the range of possible zoom ratios. However, these conventional methods of determining the zoom ratio have the limitation of high computational complexity. In contrast, the computation of the zoom ratio is simplified in the proposed method, since determining the zoom ratio requires only the calculation of a ratio of depth values between the reference and current blocks.
The motion estimation method proposed in this paper is expected to be applicable to the video coding standard. Also, a method to encode the zoom motion vector is to be studied more in the future. Further research to obtain optimal coding efficiency by considering both the number of bits for additional transmission of the zoom motion vector and the coding gain according to the reduced motion estimation difference signal is also required.
Availability of data and materials
The dataset used during the current study is the NYU Depth Dataset V2 [44] and is available at https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html.
Abbreviations
BMA: Block matching algorithm
MSE: Mean square error
MV: Motion vector
SAE: Sum of absolute error
SIFT: Scale-invariant feature transform
SSE: Sum of square error
SURF: Speeded-up robust features
References
[1] H.264: Advanced Video Coding for Generic Audiovisual Services. ITU-T Rec. H.264. https://www.itu.int/rec/T-REC-H.264/en. Accessed 1 June 2019.
[2] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-Generation Multimedia (Wiley, NJ, 2003).
[3] T. Wiegand, G. J. Sullivan, G. Bjontegaard, A. Luthra, Overview of the h.264/AVC video coding standard. IEEE Trans. Circ. Syst. Video Technol. 13(7), 560–576 (2003).
[4] S. K. Kwon, A. Tamhankar, K. R. Rao, Overview of h.264/MPEG-4 part 10. J. Vis. Commun. Image Represent. 17(2), 186–216 (2006).
[5] G. J. Sullivan, J. R. Ohm, W. J. Han, T. Wiegand, Overview of the high efficiency video coding (HEVC) standard. IEEE Trans. Circ. Syst. Video Technol. 22(12), 1649–1668 (2012).
[6] D. Patel, T. Lad, D. Shah, Review on intra-prediction in high efficiency video coding (HEVC) standard. Int. J. Comput. Appl. 132(13), 26–29 (2015).
[7] H. G. Musmann, P. Pirsch, H. Grallert, Advances in picture coding. Proc. IEEE 73(4), 523–548 (1985).
[8] J. Jain, A. Jain, Displacement measurement and its application in interframe image coding. IEEE Trans. Commun. 29(12), 1799–1808 (1981).
[9] H. Jozawa, K. Kamikura, A. Sagata, H. Kotera, H. Watanabe, Two-stage motion compensation using adaptive global mc and local affine mc. IEEE Trans. Circ. Syst. Video Technol. 7(1), 75–85 (1997).
[10] T. Wiegand, E. Steinbach, B. Girod, Affine multipicture motion-compensated prediction. IEEE Trans. Circ. Syst. Video Technol. 15(2), 197–209 (2005).
[11] R. C. Kordasiewicz, M. D. Gallant, S. Shirani, Affine motion prediction based on translational motion vectors. IEEE Trans. Circ. Syst. Video Technol. 17(10), 1388–1394 (2007).
[12] Y. Nakaya, H. Harashima, Motion compensation based on spatial transformations. IEEE Trans. Circ. Syst. Video Technol. 4(3), 339–356 (1994).
[13] M. Karczewicz, J. Nieweglowski, J. Lainema, O. Kalevo, in Proceedings of First International Workshop on Wireless Image/Video Communications. Video coding using motion compensation with polynomial motion vector fields, (1996), pp. 26–31. https://doi.org/10.1109/wivc.1996.624638.
[14] M. R. Pickering, M. R. Frater, J. F. Arnold, in 2006 International Conference on Image Processing. Enhanced motion compensation using elastic image registration, (2006), pp. 1061–1064. https://doi.org/10.1109/icip.2006.312738.
[15] L. Li, H. Li, D. Liu, Z. Li, H. Yang, S. Lin, H. Chen, F. Wu, An efficient four-parameter affine motion model for video coding. IEEE Trans. Circ. Syst. Video Technol. 28(8), 1934–1948 (2018). https://doi.org/10.1109/TCSVT.2017.2699919.
[16] N. Zhang, X. Fan, D. Zhao, W. Gao, Merge mode for deformable block motion information derivation. IEEE Trans. Circ. Syst. Video Technol. 27(11), 2437–2449 (2017). https://doi.org/10.1109/TCSVT.2016.2589818.
[17] L. Po, K. Wong, K. Cheung, K. Ng, Subsampled block-matching for zoom motion compensated prediction. IEEE Trans. Circ. Syst. Video Technol. 20(11), 1625–1637 (2010).
[18] H. S. Kim, J. H. Lee, C. K. Kim, B. G. Kim, Zoom motion estimation using block-based fast local area scaling. IEEE Trans. Circ. Syst. Video Technol. 22(9), 1280–1291 (2012).
[19] L. Superiori, M. Rupp, in 2009 10th Workshop on Image Analysis for Multimedia Interactive Services. Detection of pan and zoom in soccer sequences based on H.264/AVC motion information, (2009), pp. 41–44. https://doi.org/10.1109/wiamis.2009.5031427.
[20] R. Takada, S. Orihashi, Y. Matsuo, J. Katto, in 2015 IEEE International Conference on Consumer Electronics (ICCE). Improvement of 8k UHDTV picture quality for H.265/HEVC by global zoom estimation, (2015), pp. 58–59. https://doi.org/10.1109/icce.2015.7066317.
[21] D. Shukla, R. K. Jha, A. Ojha, Unsteady camera zoom stabilization using slope estimation over interest warping vectors. Pattern Recogn. Lett. 68, 197–204 (2015).
[22] X. Shen, J. Wang, Q. Yang, P. Chen, F. Liang, in 2017 IEEE Visual Communications and Image Processing (VCIP). Feature based inter prediction optimization for non-translational video coding in cloud, (2017), pp. 1–4. https://doi.org/10.1109/vcip.2017.8305066.
[23] G. Luo, Y. Zhu, Z. Weng, Z. Li, A disocclusion inpainting framework for depth-based view synthesis. IEEE Trans. Pattern Anal. Mach. Intell. (Early Access), 1–14 (2019). https://doi.org/10.1109/tpami.2019.2899837.
[24] X. Qi, Z. Liu, Q. Chen, J. Jia, in 2019 IEEE Conference on Computer Vision and Pattern Recognition. 3D motion decomposition for RGBD future dynamic scene synthesis, (2019), pp. 7673–7682. https://doi.org/10.1109/cvpr.2019.00786.
[25] M. Wu, X. Li, C. Liu, M. Liu, N. Zhao, J. Wang, X. Wan, Z. Rao, L. Zhu, Robust global motion estimation for video security based on improved k-means clustering. J. Ambient Intell. Humanized Comput. 10(2), 439–448 (2019).
[26] G. Fanelli, M. Dantone, L. Van Gool, in 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG). Real time 3d face alignment with random forests-based active appearance models, (2013), pp. 1–8. https://doi.org/10.1109/fg.2013.6553713.
[27] M. Dantone, J. Gall, G. Fanelli, L. Van Gool, in 2012 IEEE Conference on Computer Vision and Pattern Recognition. Real-time facial feature detection using conditional regression forests, (2012), pp. 2578–2585.
[28] R. Min, N. Kose, J. Dugelay, Kinectfacedb: A kinect database for face recognition. IEEE Trans. Syst. Man Cybernet. Syst. 44(11), 1534–1548 (2014).
[29] J. Sturm, N. Engelhard, F. Endres, W. Burgard, D. Cremers, in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. A benchmark for the evaluation of rgbd slam systems, (2012), pp. 573–580. https://doi.org/10.1109/iros.2012.6385773.
[30] F. Pomerleau, S. Magnenat, F. Colas, M. Liu, R. Siegwart, in 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems. Tracking a depth camera: Parameter exploration for fast ICP, (2011), pp. 3824–3829. https://doi.org/10.1109/iros.2011.6094861.
[31] M. Siddiqui, G. Medioni, in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops. Human pose estimation from a single view point, real-time range sensor, (2010), pp. 1–8. https://doi.org/10.1109/cvprw.2010.5543618.
[32] R. Muñoz Salinas, R. Medina Carnicer, F. J. Madrid Cuevas, A. Carmona Poyato, Depth silhouettes for gesture recognition. Pattern Recogn. Lett. 29(3), 319–329 (2008).
[33] P. Suryanarayan, A. Subramanian, D. Mandalapu, in 2010 20th International Conference on Pattern Recognition. Dynamic hand pose recognition using depth data, (2010), pp. 3105–3108. https://doi.org/10.1109/icpr.2010.760.
[34] J. Preis, M. Kessel, M. Werner, C. Linnhoff-Popien, in 1st International Workshop on Kinect in Pervasive Computing. Gait recognition with kinect (New Castle, UK, 2012), pp. 1–4.
[35] S. Song, J. Xiao, in 2013 IEEE International Conference on Computer Vision. Tracking revisited using RGBD camera: Unified benchmark and baselines, (2013), pp. 233–240. https://doi.org/10.1109/iccv.2013.36.
[36] J. Sung, C. Ponce, B. Selman, A. Saxena, in Workshops at the Twenty-fifth AAAI Conference on Artificial Intelligence. Human activity detection from RGBD images, (2011), pp. 47–55.
[37] L. Spinello, K. O. Arras, in 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems. People detection in RGBD data, (2011), pp. 3838–3843. https://doi.org/10.1109/iros.2011.6095074.
[38] M. Luber, L. Spinello, K. O. Arras, in 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems. People tracking in RGBD data with online boosted target models, (2011), pp. 3844–3849. https://doi.org/10.1109/iros.2011.6095075.
[39] K. Lai, L. Bo, X. Ren, D. Fox, in 2011 IEEE International Conference on Robotics and Automation. A large-scale hierarchical multiview RGBD object dataset, (2011), pp. 1817–1824. https://doi.org/10.1109/icra.2011.5980382.
[40] S. Gasparrini, E. Cippitelli, E. Gambi, S. Spinsante, J. Wåhslén, I. Orhan, T. Lindh, in International Conference on ICT Innovations. Proposal and experimental evaluation of fall detection solution based on wearable and depth data fusion (Springer, 2015), pp. 99–108. https://doi.org/10.1007/978-3-319-25733-4_11.
[41] P. Ammirato, P. Poirson, E. Park, J. Košecká, A. C. Berg, in 2017 IEEE International Conference on Robotics and Automation (ICRA). A dataset for developing and benchmarking active vision, (2017), pp. 1378–1385. https://doi.org/10.1109/icra.2017.7989164.
[42] M. Kraft, M. Nowicki, A. Schmidt, M. Fularz, P. Skrzypczyński, Toward evaluation of visual navigation algorithms on RGBD data from the first and second-generation kinect. Mach. Vis. Appl. 28(12), 61–74 (2016).
[43] D. S. Lee, S. K. Kwon, Intra prediction of depth picture with plane modeling. Symmetry 10(12), 715 (2018).
[44] N. Silberman, D. Hoiem, P. Kohli, R. Fergus, in European Conference on Computer Vision. Indoor segmentation and support inference from RGBD images (Springer, 2012), pp. 746–760. https://doi.org/10.1007/978-3-642-33715-4_54.
Acknowledgements
Not applicable.
Funding
Not applicable.
Contributions
All authors took part in the discussion of the work described in this paper. The authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Kwon, Sk., Lee, Ds. Zoom motion estimation for color and depth videos using depth information. J Image Video Proc. 2020, 11 (2020). https://doi.org/10.1186/s13640-020-00499-2
DOI: https://doi.org/10.1186/s13640-020-00499-2
Keywords
 Zoom motion Estimation
 Inter prediction
 Depth video
 Depth video coding