### Relationship between depth values and object size

The size of an object in a picture appears to be inversely proportional to its distance from the camera. To clarify the relationship between object size and the depth values of the depth frame, the width of a diamond-shaped object was measured in captured pictures while the object was moved from 1 m to 4 m in steps of 0.5 m, as shown in Fig. 1.

Figure 2 plots the measured object width against distance. The measured relationship can be approximated by the following fitting equation:

$$ P=\frac{\beta}{d^{\alpha}}, $$

(1)

where *P* is the object width in pixels (indicated by the red arrow in Fig. 1), *d* is the distance from the camera, and *α* and *β* are constants. For the measurements in Fig. 2, *α* and *β* are 0.965 and 214.59, respectively.
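
A fit of the form of Eq. (1) can be obtained by linear regression in log-log space, since ln *P* = ln *β* − *α* ln *d*. The following is a minimal sketch of that idea, not the fitting procedure used for Fig. 2 (which the text does not describe). The function name `fit_power_law` is hypothetical, and the widths are synthetic values generated from the reported constants because the measured data are not listed.

```python
import math

def fit_power_law(distances, widths):
    """Least-squares fit of P = beta / d**alpha in log-log space.

    Taking logarithms gives ln P = ln beta - alpha * ln d, a straight
    line in (ln d, ln P), so ordinary linear regression recovers both
    constants.
    """
    xs = [math.log(d) for d in distances]
    ys = [math.log(p) for p in widths]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)   # (alpha, beta)

# Synthetic widths generated from the constants reported for Fig. 2.
# These are NOT the measured data, which are not given in the text.
distances = [1.0 + 0.5 * k for k in range(7)]   # 1 m to 4 m in 0.5 m steps
widths = [214.59 / d ** 0.965 for d in distances]

alpha, beta = fit_power_law(distances, widths)
print(f"alpha = {alpha:.3f}, beta = {beta:.2f}")
```

With real measurements the recovered constants would only approximate the data, and the quality of the fit should be checked against a plot such as Fig. 2.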

### Zoom motion estimation for color video

When an object zooms between the current and reference pictures, its size in the picture changes according to how far it has moved toward or away from the camera. Therefore, to estimate motion that includes zooming, the size of the reference block should be determined from the distance. The depth information gives the distance from the camera at each pixel, so the zoom ratio between the current and reference blocks can be calculated from it. The average depth value of each block is taken as the distance of that block. If the zoom ratio *s* is defined as the ratio of the number of pixels of the reference block to that of the current block, *s* is obtained by substituting the pixel counts of the two blocks into Eq. (1) as follows:

$$ \left. s=\frac{P_{\text{ref}}}{P_{\text{cur}}}=\frac{\beta}{(\overline{d_{\text{ref}}})^{\alpha}} \middle/ \frac{\beta}{(\overline{d_{\text{cur}}})^{\alpha}} \right., $$

(2)

where \(\overline{d_{\text{cur}}}\) and \(\overline{d_{\text{ref}}}\) are the representative depth values of the current and reference blocks, respectively, and \(P_{\text{cur}}\) and \(P_{\text{ref}}\) are the numbers of pixels of the current and reference blocks, respectively. Because *β* cancels, Eq. (2) simplifies to:

$$ s=\left(\frac{\overline{d_{\text{cur}}}}{\overline{d_{\text{ref}}}}\right)^{\alpha}. $$

(3)

When the size of the current block is *m*×*n*, the size of the reference block is determined as *sm*×*sn*. The reference block is then scaled by interpolation so that its size equals that of the current block. Figure 3 shows a flowchart of the proposed zoom motion estimation for color video, and Fig. 4 illustrates the steps of the proposed method.
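
The zoom ratio and reference block size described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the zoom ratio is taken as the depth ratio raised to *α* (as obtained by cancelling *β* in Eq. (2)), and truncation to whole pixels is an assumed rounding rule because the text does not state one. The depth values in the usage lines are made up.

```python
import math

def mean_depth(block):
    """Average depth of a block given as a list of rows."""
    values = [v for row in block for v in row]
    return sum(values) / len(values)

def zoom_ratio(cur_depth_block, ref_depth_block, alpha=0.965):
    """Zoom ratio s = (mean d_cur / mean d_ref) ** alpha."""
    ratio = mean_depth(cur_depth_block) / mean_depth(ref_depth_block)
    return ratio ** alpha

def reference_block_size(s, m, n):
    """Reference block size (s*m, s*n) in whole pixels.

    Truncation is an assumption; the rounding rule is not given in the text.
    """
    return max(1, math.floor(s * m)), max(1, math.floor(s * n))

# Hypothetical flat 8x8 depth blocks (values chosen only for illustration).
cur_depth = [[2000.0] * 8 for _ in range(8)]
ref_depth = [[2200.0] * 8 for _ in range(8)]

s = zoom_ratio(cur_depth, ref_depth)
ref_size = reference_block_size(s, 8, 8)
print(f"s = {s:.3f}, reference block size = {ref_size[0]}x{ref_size[1]}")
```

Here the object is closer to the camera in the current picture than in the reference picture, so *s* is below 1 and a smaller reference block is selected before being enlarged to the current block size.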

Figure 5 shows an example of zoom motion estimation for color video. The areas surrounded by red rectangles in Fig. 5a and b are the 8×8 current and reference blocks, respectively, and Fig. 5c and d show the depth values in each block. \(\overline{d_{\text{cur}}}\) and \(\overline{d_{\text{ref}}}\) of the 8×8 current and reference blocks in the depth pictures are about 2322.312 and 2469.523, respectively, so *s* is calculated as about 0.940 if *α* is set to 0.965. Therefore, the size of the reference color block is determined as 7×7. The 7×7 reference color block is then scaled so that its size equals the current block size. The mean square errors (MSEs) of the conventional and proposed motion estimation methods are about 169.734 and 74.609, respectively. These results show that the proposed zoom motion estimation method is more accurate when the object in the video has zoom motion.
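
The scaling and error measurement steps in this example can be sketched as below. The text only says that the reference block is scaled "by interpolation", so bilinear interpolation is an assumption here, and `resize_bilinear` and `mse` are hypothetical helper names. The block contents are placeholder values, not the pixels of Fig. 5.

```python
def resize_bilinear(block, out_h, out_w):
    """Resample a 2D block (list of rows) to out_h x out_w.

    Bilinear interpolation is assumed; the source does not name the filter.
    """
    in_h, in_w = len(block), len(block[0])
    out = []
    for y in range(out_h):
        # Map the centre of each output sample back into input coordinates.
        fy = (y + 0.5) * in_h / out_h - 0.5
        fy = min(max(fy, 0.0), in_h - 1.0)
        y0 = int(fy)
        y1 = min(y0 + 1, in_h - 1)
        wy = fy - y0
        row = []
        for x in range(out_w):
            fx = (x + 0.5) * in_w / out_w - 0.5
            fx = min(max(fx, 0.0), in_w - 1.0)
            x0 = int(fx)
            x1 = min(x0 + 1, in_w - 1)
            wx = fx - x0
            top = (1 - wx) * block[y0][x0] + wx * block[y0][x1]
            bottom = (1 - wx) * block[y1][x0] + wx * block[y1][x1]
            row.append((1 - wy) * top + wy * bottom)
        out.append(row)
    return out

def mse(a, b):
    """Mean square error between two equally sized blocks."""
    h, w = len(a), len(a[0])
    total = sum((a[y][x] - b[y][x]) ** 2 for y in range(h) for x in range(w))
    return total / (h * w)

# Placeholder data: a flat 7x7 reference block and a flat 8x8 current block.
reference = [[100.0] * 7 for _ in range(7)]
current = [[100.0] * 8 for _ in range(8)]

scaled = resize_bilinear(reference, 8, 8)   # bring the reference to 8x8
error = mse(current, scaled)
print(f"scaled size = {len(scaled)}x{len(scaled[0])}, MSE = {error:.3f}")
```

In a motion search, this error would be evaluated for each candidate reference position and the candidate with the smallest error kept.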

### Zoom motion estimation for depth video

In depth video, zoom motion changes the distance between the object and the depth camera, so the depth values of the object also change, as shown in Fig. 6. Therefore, zoom motion estimation for depth video should consider not only the size but also the depth values of the object.

A 3D scaling method is introduced for zoom motion estimation in depth video. 3D scaling adds depth-axis scaling to the 2D spatial scaling that scales the block size. The flowchart of 3D scaling is shown in Fig. 7.

In 3D scaling, the zoom ratio calculation and the size determination of the reference block are the same as in the zoom motion estimation for color video described above. Then, the depth values of the size-scaled reference block are scaled by the following equation:

$$ R_{i}(i,j)=s\times R(i,j), $$

(4)

where \(R(i,j)\) and \(R_{i}(i,j)\) are the original and scaled depth values at position \((i,j)\), respectively.
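
The two stages of 3D scaling can be sketched as follows: spatial scaling of the reference block followed by the depth-axis scaling of Eq. (4). This is an illustrative sketch only. All function names are hypothetical, and nearest-neighbour resampling is used purely to keep the example short; the text does not specify the interpolation filter.

```python
def resize_nearest(block, out_h, out_w):
    """Spatial scaling by nearest-neighbour sampling (assumed for brevity)."""
    in_h, in_w = len(block), len(block[0])
    return [
        [block[min(in_h - 1, y * in_h // out_h)][min(in_w - 1, x * in_w // out_w)]
         for x in range(out_w)]
        for y in range(out_h)
    ]

def scale_depth_values(block, s):
    """Eq. (4): multiply every depth value of the size-scaled block by s."""
    return [[s * value for value in row] for row in block]

def scale_3d(ref_block, s, out_h, out_w, spatial_resize=resize_nearest):
    """2D spatial scaling followed by depth-axis scaling."""
    resized = spatial_resize(ref_block, out_h, out_w)
    return scale_depth_values(resized, s)

# Hypothetical flat 7x7 reference depth block and zoom ratio.
ref_depth = [[800.0] * 7 for _ in range(7)]
scaled = scale_3d(ref_depth, 0.875, 8, 8)
print(len(scaled), len(scaled[0]), scaled[0][0])
```

Passing the spatial filter as a parameter keeps the depth-axis step independent of whichever interpolation is chosen for the block resize.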

Figure 8 shows an example of zoom motion estimation for depth video. The areas surrounded by red rectangles in Fig. 8a and b are the 8×8 current and reference blocks, respectively, and Fig. 8c and d show the depth values in each block. \(\overline{d_{\text{cur}}}\) and \(\overline{d_{\text{ref}}}\) for each 8×8 block are about 679.625 and 776.969, respectively, so *s* is calculated as about 0.874. If *α* is set to 0.965, the reference block size is determined as 7×7, as shown in Fig. 8e, when the current block size is 8×8. The 7×7 reference block is then scaled by spatial scaling so that its size equals the current block size. After that, the depth values in the 2D-scaled reference block are scaled as shown in Fig. 8g. The MSEs of the conventional and proposed methods are about 9482.97 and 3.48, respectively. These results show that 3D scaling improves the accuracy of motion estimation for depth video.
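
A rough back-of-the-envelope check shows why depth-axis scaling matters so much here. The sketch below treats both blocks as perfectly flat at their quoted mean depths, which is a simplifying assumption and not the actual content of Fig. 8, so the numbers only indicate orders of magnitude. For flat blocks the MSE reduces to the squared difference of the two depth levels.

```python
# Block-mean depths and zoom ratio quoted in the text for Fig. 8.
d_cur = 679.625
d_ref = 776.969
s = 0.874

# Assumption: both 8x8 blocks are flat at their mean depth, so the MSE
# equals the squared difference between the two depth levels.
mse_without_depth_scaling = (d_cur - d_ref) ** 2        # about 9476
mse_with_depth_scaling = (d_cur - s * d_ref) ** 2       # well below 1

print(f"without depth scaling: {mse_without_depth_scaling:.1f}")
print(f"with depth scaling:    {mse_with_depth_scaling:.3f}")
```

Under this simplification, almost all of the conventional error comes from the constant depth offset between the two blocks, which depth-axis scaling removes.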

### Zoom motion estimation for variable-size block

The video coding standard provides variable-size blocks, which group blocks that have similar MVs in order to reduce the number of coding blocks. In the motion estimation of H.264/AVC [1–4], a 16×16 macroblock can be partitioned into 16×16, 16×8, 8×16, and 8×8 blocks, and each 8×8 block can be further partitioned into 8×8, 8×4, 4×8, and 4×4 blocks. Figure 9 shows the division of a macroblock into variable-size blocks. The variable-size block mode is determined by comparing the sums of absolute errors (SAEs) or sums of square errors (SSEs) of the candidate variable-size blocks.

In addition, introducing variable-size blocks addresses the difficulty of determining the representative depth value of a mixed block that contains both a foreground object and background. For a mixed block, the representative depth value is the average of the background and foreground depth values, which leads to an inaccurate zoom ratio. This problem can be solved by dividing the block into smaller blocks so that each block contains only background or only foreground.
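
The mixed-block problem can be illustrated with a toy example. The depth values below are hypothetical and the block layout (foreground on the left half, background on the right half) is chosen only to make the effect obvious.

```python
def block_mean(block):
    """Average value of a block given as a list of rows."""
    values = [v for row in block for v in row]
    return sum(values) / len(values)

FOREGROUND, BACKGROUND = 1000.0, 3000.0   # hypothetical depth values

# 8x8 mixed block: left four columns foreground, right four background.
mixed = [[FOREGROUND] * 4 + [BACKGROUND] * 4 for _ in range(8)]
mixed_mean = block_mean(mixed)            # matches neither surface

# Splitting the block into left and right halves separates the surfaces.
left = [row[:4] for row in mixed]
right = [row[4:] for row in mixed]
left_mean = block_mean(left)
right_mean = block_mean(right)

print(mixed_mean, left_mean, right_mean)
```

The mean of the whole block lies halfway between the two surfaces, so a zoom ratio derived from it describes neither the object nor the background, whereas each half yields a representative depth for a single surface.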

The proposed method also supports variable-size blocks, which are applied independently to the color and depth videos. When the sample block is 16×16, the SAE of the original 16×16 block is compared with the sums of the SAEs of its 16×8, 8×16, and 8×8 partitions. In motion estimation for partitioned blocks, the cost of coding the MV of each partition should also be considered. For example, the comparison between the 16×16 and 16×8 variable-size blocks is as follows:

$$ \text{SAE}_{16\times16}\geq \sum {\text{SAE}_{16\times8}}+T_{16\times8}, $$

(5)

where \(\text{SAE}_{16\times16}\) and \(\text{SAE}_{16\times8}\) are the SAEs of the original block and of each 16×8 partition, respectively, and \(T_{16\times8}\) is a threshold that accounts for the MVs. If the 16×16 sample block satisfies Eq. (5), it can be partitioned into 16×8 blocks.
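
The decision rule of Eq. (5) can be sketched as a small predicate. The function names and the numeric values in the usage lines are hypothetical, and how the threshold is chosen is not specified in the text, so it is simply passed in as a parameter.

```python
def sae(cur, ref):
    """Sum of absolute errors between two equally sized blocks."""
    return sum(
        abs(c - r)
        for cur_row, ref_row in zip(cur, ref)
        for c, r in zip(cur_row, ref_row)
    )

def should_partition(sae_whole, sae_parts, threshold):
    """Eq. (5): partition when SAE_16x16 >= sum(SAE_16x8) + T_16x8."""
    return sae_whole >= sum(sae_parts) + threshold

# Hypothetical SAE values for one 16x16 block and its two 16x8 partitions.
split_a = should_partition(1200, [400, 500], 150)   # 1200 >= 1050
split_b = should_partition(1000, [400, 500], 150)   # 1000 <  1050
print(split_a, split_b)
```

The threshold acts as a penalty for the extra MVs that the partitions require, so a block is split only when the reduction in SAE outweighs that cost.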