Integer fast lapped transforms based on direct-lifting of DCTs for lossy-to-lossless image coding

Discrete cosine transforms (DCTs) are widely used in image and video compression (image coding). DCT-based lapped transforms (LTs), called fast LTs (FLTs), overcome the blocking artifacts that DCT generates in low bit rate image coding while retaining a fast implementation. This paper presents an integer FLT (IntFLT) that is more effective than the conventional IntFLTs for lossy-to-lossless image coding, which unifies lossless and lossy image coding. It is composed of few operations and applies DCTs directly to lifting blocks, which is called direct-lifting of DCTs. Since direct-lifting can reuse any existing software/hardware for DCTs, the proposed IntFLTs have great potential for fast implementation, depending on the architecture design and the DCT algorithm. Furthermore, the proposed IntFLTs do not need any side information, unlike the integer DCT (IntDCT) based on direct-lifting in our previous work. Moreover, they can be easily extended to larger sizes, which have recently been required, as in the DCT for the H.26x standard series. As a result, the proposed method achieves better lossy-to-lossless image coding than the conventional IntFLTs.


Introduction
The most popular image/video compression (image coding) standards, JPEG [1,2] and the H.26x series [3,4], employ the discrete cosine transform (DCT) [5] at their transformation stages. DCT is basically classified into types I to IV (DCT-I to DCT-IV) and has numerous fast implementations [6][7][8][9][10] and applications in signal processing. Among them, DCT-II, usually called simply DCT, has excellent energy compaction capability, and DCT-III is its inverse transform, the so-called inverse DCT (IDCT). However, DCT generates annoying blocking artifacts at low bit rates because the DCT bases are short and do not overlap, which creates discontinuities at block boundaries. To overcome this drawback, lapped transforms (LTs), which are classified into the lapped orthogonal transform (LOT) and the lapped biorthogonal transform (LBT), have received much attention. DCT-based fast LTs (FLTs), which are classified into the fast LOT (FLOT) and the fast LBT (FLBT), are well known as fast and effective transforms for image coding [11]. FLTs are constructed by cascading DCT-II, DCT-III, DCT-IV, rotation matrices with π/4 angles, ±1 operations, scaling factors, a delay matrix, and permutation matrices. To further improve the coding performance and reduce the complexity, LiftLT with a VLSI-friendly implementation has been proposed by Tran [12] (endnote a). However, these LTs cannot be applied to the lossless mode.

*Correspondence: taizo@cs.tsukuba.ac.jp
1 Faculty of Engineering, Information and Systems, University of Tsukuba, Tsukuba 305-8573, Japan
Full list of author information is available at the end of the article
On the other hand, JPEG achieves the lossless mode by using differential pulse code modulation (DPCM) in place of DCT. JPEG 2000 [13] employs 9/7-tap and 5/3-tap discrete wavelet transforms (9/7-DWT and 5/3-DWT) for the lossy and lossless modes, respectively [14]. This means that neither JPEG nor JPEG 2000 is compatible between its lossy and lossless modes. Of course, a lossless transform such as 5/3-DWT is applicable to lossy-to-lossless image coding. However, its lossy performance is worse than that of 9/7-DWT because each transform is suited only to its own mode. The next standard, JPEG XR [15], has solved this problem by achieving lossy-to-lossless image coding, which unifies lossy and lossless image coding. JPEG XR employs only the hierarchical lapped transform (HLT) for both the lossy and lossless modes [16]. The HLT is composed of lifting structures [17][18][19] with rounding operations and achieves an integer-to-integer transform, but its coding performance is insufficient, especially for images with many high frequency components. Various lifting-based filter banks (L-FBs) [20][21][22][23][24][25][26][27][28], which include integer DCTs (IntDCTs) [29][30][31][32][33][34][35], have been studied to improve coding performance. However, except for IntDCTs, they are not practical because of their complexity.

http://jivp.eurasipjournals.com/content/2013/1/65

This paper presents a realization of the integer FLT (IntFLT), which is constructed from lifting structures with rounding operations, for lossy-to-lossless image coding. Although FLT can be easily applied to lossy-to-lossless image coding through simple lifting factorizations of the rotation matrices and scaling factors, the resulting integer transform is unsuitable because its many rounding operations cause a large rounding error. The conventional IntFLTs also have many operations, whereas the proposed IntFLTs have simple implementations with few operations and direct application of DCTs to lifting blocks, called direct-lifting of DCTs. Direct-lifting can reuse any existing software/hardware for DCTs (endnote b). As a result, although the proposed IntFLTs apparently sacrifice some complexity compared with LiftLT in order to achieve the lossless mode, they have great potential for fast implementation, depending on the architecture design and the DCT algorithm. Furthermore, the proposed IntFLTs do not need any side information, unlike the IntDCT based on direct-lifting in our previous work [35]. Moreover, they can be easily extended to larger sizes, which have recently been required, as in the DCT for the H.26x series. A similar IntFLT was already proposed in [36], but it cannot achieve sufficient coding performance because it is restricted to orthogonal transforms. This paper introduces an IntFLT without such a restriction. Finally, the proposed method achieves better lossy-to-lossless image coding than the conventional IntFLTs.

Fast lapped transform (FLT)
An $M$-channel ($M = 2^k$, $k \in \mathbb{N}$) FLT can be constructed in a polyphase structure from components with well-known fast algorithms. One of the most elegant solutions is the type-II FLOT, whose polyphase matrix $E(z)$ is given in [11] (Equation 1). In Equation 1, $z^{-1}$ is a delay, and $C_{II}$, $C_{III}$, $C_{IV}$, and $S_{IV}$ are the DCT-II, DCT-III, DCT-IV, and type-IV discrete sine transform (DST-IV) matrices, respectively, each specified by its $(m, n)$-elements. Since the DST-IV and DCT-IV matrices are related by $S_{IV} = D C_{IV} J$, Equation 1 can be easily rewritten with $C_{IV}$ in place of $S_{IV}$ (Equation 2).

On the other hand, the HLT for JPEG XR is based on FLOT with scaling factors [16]. Inspired by this, the FLT in this paper is defined by Equation 3, where $s_1 = s_0^{-1}$ is the restriction required for lifting factorization. This is called FLBT in this paper. Since the FLOT in Equation 2 is equal to the FLBT in Equation 3 with $s_0 = s_1 = 1$, we use Equation 3 as the representative expression of FLT. The FLT with this polyphase matrix is implemented as shown in the top half of Figure 1.
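The matrix relationships used in this section can be checked numerically. The sketch below assumes the usual orthonormal definitions of the DCT/DST matrices, and assumes that $D$ is the sign-alternating diagonal matrix diag{1, -1, 1, ...} and $J$ is the reversal (anti-identity) matrix; these conventions and the helper name `dct_matrices` are illustrative assumptions, not taken from the text.

```python
import numpy as np

def dct_matrices(M):
    """Return orthonormal C_II, C_III, C_IV and S_IV as M x M arrays
    (standard textbook definitions, assumed here)."""
    m = np.arange(M).reshape(-1, 1)   # row index
    n = np.arange(M).reshape(1, -1)   # column index
    c = np.where(m == 0, 1 / np.sqrt(2), 1.0)  # DC row normalization
    C2 = np.sqrt(2 / M) * c * np.cos(m * (n + 0.5) * np.pi / M)
    C3 = C2.T                          # DCT-III is the transpose of DCT-II
    C4 = np.sqrt(2 / M) * np.cos((m + 0.5) * (n + 0.5) * np.pi / M)
    S4 = np.sqrt(2 / M) * np.sin((m + 0.5) * (n + 0.5) * np.pi / M)
    return C2, C3, C4, S4

M = 8
C2, C3, C4, S4 = dct_matrices(M)
D = np.diag((-1.0) ** np.arange(M))   # assumed: diag{1, -1, 1, -1, ...}
J = np.fliplr(np.eye(M))              # assumed: reversal matrix

inverse_ok = np.allclose(C2 @ C3, np.eye(M))   # C_II C_III = I
dst_ok = np.allclose(S4, D @ C4 @ J)           # S_IV = D C_IV J
```

Under these assumptions both checks hold, which is why a DST-IV block can be replaced by a DCT-IV block with only sign changes and a reordering.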

Direct-lifting structure
In [35], we presented direct-lifting, which is a class of block-lifting [25]. Block-lifting is known as a more effective lifting structure for lossy-to-lossless image coding than the standard lifting structure [17][18][19] because it reduces rounding error by merging many rounding operations. Direct-lifting is the key technology for producing the novel IntFLTs. To derive the lifting, we suppose that two individual $M \times 1$ signals $x_i$ and $x_j$ are processed by an arbitrary $M \times M$ nonsingular matrix $T$ and its inverse $T^{-1}$, respectively, as shown at the left side of Figure 2. The input signals $x_i$ and $x_j$ are simultaneously transformed into the output signals $y_i$ and $y_j$ by $T$ and $T^{-1}$ as

$$\begin{bmatrix} y_i \\ y_j \end{bmatrix} = \begin{bmatrix} T & 0 \\ 0 & T^{-1} \end{bmatrix} \begin{bmatrix} x_i \\ x_j \end{bmatrix}.$$

This block diagonal matrix diag{$T$, $T^{-1}$} can be factorized into complete block-liftings (Equation 4). Thus, the parallel block system of $T$ and $T^{-1}$ can be efficiently implemented by block-liftings as shown at the right side of Figure 2. This is a breakthrough structure because any block $T$ and its inverse $T^{-1}$ can be applied directly as the block-lifting coefficients without breaking their forms. Although existing software/hardware for DCT cannot be directly reused in the conventional IntDCTs, any of it can serve as the lifting blocks when $T = C_{II}$.
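To make the reversibility argument concrete, the following sketch implements one three-step block-lifting factorization that satisfies the identity diag{T, T^-1} = [-I 0; T^-1 I] [I -T; 0 I] [0 I; I T^-1], with a rounding operation after each lifting block. This particular factorization, the use of `np.rint` as the rounding operation, and all function names are assumptions chosen for illustration; the exact form of Equation 4 may differ.

```python
import numpy as np

def dct2_matrix(M):
    """Orthonormal DCT-II matrix (standard definition, assumed)."""
    m = np.arange(M).reshape(-1, 1)
    n = np.arange(M).reshape(1, -1)
    c = np.where(m == 0, 1 / np.sqrt(2), 1.0)
    return np.sqrt(2 / M) * c * np.cos(m * (n + 0.5) * np.pi / M)

def direct_lifting_forward(xi, xj, T, Tinv):
    """Integer approximation of (T xi, Tinv xj) with three rounded lifting steps."""
    u = xi + np.rint(Tinv @ xj)   # step 1: [0 I; I T^-1]
    v = xj - np.rint(T @ u)       # step 2: [I -T; 0 I]
    yi = -v                       # step 3: [-I 0; T^-1 I]
    yj = u + np.rint(Tinv @ v)
    return yi, yj

def direct_lifting_inverse(yi, yj, T, Tinv):
    """Undo the three lifting steps in reverse order (exact for integers)."""
    v = -yi
    u = yj - np.rint(Tinv @ v)
    xj = v + np.rint(T @ u)
    xi = u - np.rint(Tinv @ xj)
    return xi, xj

M = 8
T = dct2_matrix(M)        # T = C_II, so the lifting blocks are plain DCT/IDCT
Tinv = T.T                # C_III = C_II^T is the inverse
rng = np.random.default_rng(0)
xi = rng.integers(0, 256, size=M)
xj = rng.integers(0, 256, size=M)

yi, yj = direct_lifting_forward(xi, xj, T, Tinv)
xi_rec, xj_rec = direct_lifting_inverse(yi, yj, T, Tinv)
max_err_i = np.max(np.abs(yi - T @ xi))       # deviation from the ideal T xi
max_err_j = np.max(np.abs(yj - Tinv @ xj))    # deviation from the ideal T^-1 xj
```

Each lifting step only adds a rounded quantity computed from the other channel, so the inverse subtracts exactly the same quantity and recovers the integer inputs regardless of the rounding error. The products `T @ u` and `Tinv @ v` are ordinary DCT/IDCT calls, which is where an existing fast DCT routine could be substituted.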

IntFLTs based on direct-lifting of DCTs
This section presents a realization of IntFLT for lossy-to-lossless image coding. The IntFLTs have simple implementations with few operations and direct-lifting of DCTs.

The FLT in Equation 3 is transformed into another form so that direct-lifting (Equation 4) can be applied. First, Equation 3 is rewritten as Equation 5 by using $C_{II} C_{III} = I$. By moving diag{$C_{II}$, $C_{II}$} to the postprocessing part, Equation 5 is rewritten as Equation 6, where $C_{III} C_{II} = I$ is used and the notation $\tilde{E}(z)$ distinguishes the result from the original $E(z)$ in Equation 3. Of course, $\tilde{E}(z)$ has the same transfer function as $E(z)$. The FLT with this polyphase matrix $\tilde{E}(z)$ is implemented as shown in the bottom half of Figure 1.
Next, as already mentioned, we consider the parallel processing of the two different types of FLTs in Equations 3 and 6, where $x_i$ and $x_j$ are individual input signals along the processing direction, and $y_i$ and $y_j$ are their output signals as shown in Figure 1.

Lifting structure of rotation matrix with π/4 angle
Since $W$ in Equations 3 and 6 and Figure 1 includes the scaling factor $1/\sqrt{2}$, we factorize it into a lifting structure. In [14], $W$ is simply factorized into lifting steps. Note that a lifting structure with coefficient 1/2 and a rounding operation can be replaced by one adder and one bit-shifter [37], i.e., multiplierless operations. Note also that the one-dimensionally transformed output signals are then scaled by $1/\sqrt{2}$ and $\sqrt{2}$ compared with the output signals transformed by the normal FLTs, as shown in the dashed line box in Figure 3. By taking these scales $1/\sqrt{2}$ and $\sqrt{2}$ into account for the next column process, Equations 3 and 6 are rewritten as Equations 9 and 10. Similarly, the $i$th column signals and the $j$th column signals, i.e., the red and blue areas in Figure 3, are processed by the FLTs in Equations 9 and 10, respectively. Consequently, the scales are changed temporarily for fast implementation and restored after the two-dimensional transform.
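The effect of a 1/2-coefficient lifting step can be illustrated with a two-channel butterfly. The sketch below is a common S-transform-style lifting, assumed here for illustration and not necessarily the exact factorization of $W$ used in [14]; the function names are hypothetical, and the rounding is a floor implemented by an arithmetic right shift.

```python
def butterfly_forward(a, b):
    """Multiplierless lifting of a pi/4-type butterfly on integers.

    Compared with the orthonormal outputs (a + b)/sqrt(2) and (a - b)/sqrt(2),
    s is scaled by about 1/sqrt(2) and d by exactly sqrt(2).
    """
    d = a - b            # difference channel: one adder
    s = b + (d >> 1)     # average channel: one bit-shift and one adder
    return s, d

def butterfly_inverse(s, d):
    """Exact inverse: subtract the same shifted value, then undo the difference."""
    b = s - (d >> 1)
    a = d + b
    return a, b

s, d = butterfly_forward(7, -4)
recovered = butterfly_inverse(s, d)
```

Here `s` equals floor((a + b) / 2) and `d` equals a - b, so no multiplication is needed, and the pair of scales $1/\sqrt{2}$ and $\sqrt{2}$ is exactly the temporary scale change that the text compensates for in the subsequent column process.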

Lifting structure of scaling part
In this subsection, we present the lifting structure of each scaling part diag{$s_0 I$, $s_1 I$} included in Equations 7 to 10. According to Equation 4, we define a simple integer realization of the scaling part.
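One way to read this step is to substitute $T = s_0 I$ and $T^{-1} = s_1 I$ into a block-lifting factorization of diag{$T$, $T^{-1}$}. The worked identity below assumes the three-step form shown on its left-hand side, which is one valid factorization and may differ from the exact form of Equation 4; it is included only to show why the restriction $s_1 = s_0^{-1}$ is needed.

```latex
\begin{align*}
\begin{bmatrix} -I & 0 \\ s_1 I & I \end{bmatrix}
\begin{bmatrix} I & -s_0 I \\ 0 & I \end{bmatrix}
\begin{bmatrix} 0 & I \\ I & s_1 I \end{bmatrix}
&=
\begin{bmatrix} -I & 0 \\ s_1 I & I \end{bmatrix}
\begin{bmatrix} -s_0 I & (1 - s_0 s_1) I \\ I & s_1 I \end{bmatrix} \\
&=
\begin{bmatrix} s_0 I & -(1 - s_0 s_1) I \\ (1 - s_0 s_1) I & \bigl(s_1 (1 - s_0 s_1) + s_1\bigr) I \end{bmatrix}
\;\overset{s_0 s_1 = 1}{=}\;
\begin{bmatrix} s_0 I & 0 \\ 0 & s_1 I \end{bmatrix}.
\end{align*}
```

The off-diagonal terms vanish only when $s_0 s_1 = 1$, so under this assumed form the scaling pair can be lifted with three rounding operations per block pair, and the lifting coefficients are just $s_0$ and $s_1$ themselves.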

Coding gain
This paper designed 8 × 16 and 16 × 32 IntFLTs. First, the coding gains of the ideal FLTs and the proposed IntFLTs are compared. The coding gain is one of the most important factors to be considered in compression applications. A transform with higher coding gain compacts more energy into fewer coefficients. As a result, higher objective performance, such as PSNR, can be expected after quantization. The biorthogonal coding gain is defined as [38]

$$\text{Coding gain [dB]} = 10 \log_{10} \frac{\sigma_x^2}{\left( \prod_{k=0}^{M-1} \sigma_{x_k}^2 \|f_k\|^2 \right)^{1/M}},$$

where $\sigma_x^2$ is the variance of the input signal, $\sigma_{x_k}^2$ is the variance of the $k$th subband, and $\|f_k\|^2$ is the norm of the $k$th synthesis filter. Although the coding gain does not completely determine all image coding results because of rounding error, Table 1 shows that the coding gain of the ideal FLTs is not lost in the proposed IntFLTs.
For comparison, the coding gain of LiftLT [12] is 9.5378 dB, which is higher than that of the proposed 8 × 16 IntFLTs because LiftLT is optimized for lossy coding.
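The coding gain definition above can be evaluated numerically as follows. This is a minimal sketch that assumes an AR(1) input model with unit variance and correlation coefficient 0.95, a common choice in the lapped-transform literature that is not stated explicitly in the text; the function names are hypothetical, and the 8-point DCT is used only as a simple test transform.

```python
import numpy as np

def coding_gain_db(analysis, synthesis, rho=0.95):
    """Biorthogonal coding gain in dB.

    analysis : M x L array, row k is the k-th analysis filter
    synthesis: L x M array, column k is the k-th synthesis filter
    rho      : AR(1) correlation coefficient (assumed input model, sigma_x^2 = 1)
    """
    M, L = analysis.shape
    idx = np.arange(L)
    R = rho ** np.abs(idx[:, None] - idx[None, :])        # input autocorrelation
    sub_var = np.einsum('kl,lm,km->k', analysis, R, analysis)  # subband variances
    syn_norm = np.sum(synthesis ** 2, axis=0)             # squared synthesis norms
    return 10 * np.log10(1.0 / np.prod(sub_var * syn_norm) ** (1.0 / M))

def dct2_matrix(M):
    """Orthonormal DCT-II matrix (standard definition, assumed)."""
    m = np.arange(M).reshape(-1, 1)
    n = np.arange(M).reshape(1, -1)
    c = np.where(m == 0, 1 / np.sqrt(2), 1.0)
    return np.sqrt(2 / M) * c * np.cos(m * (n + 0.5) * np.pi / M)

C2 = dct2_matrix(8)
gain_dct = coding_gain_db(C2, C2.T)   # orthogonal case: synthesis = analysis^T
```

Under this input model the 8-point DCT gives about 8.83 dB, the value usually quoted for it. For a lapped transform, `analysis` would be the M x 2M matrix of analysis filters and `synthesis` the corresponding 2M x M matrix of synthesis filters.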

Lossy-to-lossless image coding
Lossy-to-lossless image coding results for the designed IntFLTs are shown in this subsection. As targets for comparison, LiftLT [12], 5/3-DWT and 9/7-DWT for JPEG 2000 [14], HLT for JPEG XR [16], and the conventional 8 × 16 and 16 × 32 IntFLTs were applied. The conventional 8 × 16 and 16 × 32 IntFLTs are based on simple three-step lifting factorizations of the rotation matrices and scaling factors [14]. Periodic extension was used at image boundaries except for the DWTs and HLT. To evaluate transform performance fairly, the very common wavelet-based zerotree coder SPIHT [39] was adopted for all transforms (endnote c). The lossless image coding results are shown in Table 2.
If lossy compressed data is required, it can be obtained by truncating the lossless bitstream. The peak signal-to-noise ratio (PSNR), defined as

$$\text{PSNR [dB]} = 10 \log_{10} \frac{255^2}{\text{MSE}},$$

where MSE is the mean squared error, is compared in Table 3. Even though the proposed and conventional IntFLTs have the same transfer function, the proposed IntFLTs perform better than the conventional IntFLTs, and the improvement is especially clear in the lossy image coding results. We consider that this is mainly due to the reduction in rounding operations, as shown in Table 4, and the absence of large lifting coefficients (endnote d). Moreover, note that the proposed IntFLTs have a more effective implementation than the conventional IntFLTs because they are constructed with few operations and direct-lifting of DCTs. Direct-lifting can reuse any existing software/hardware for DCTs. On the other hand, LiftLT and 9/7-DWT often perform well in lossy image coding because they were designed for the lossy mode. However, they cannot preserve the high frequency components in the images as shown in Figure 4, whereas the proposed IntFLTs, especially the proposed 16 × 32 IntFLT, can preserve them.
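For reference, the PSNR definition above can be computed as in the sketch below for 8-bit images. The function name and the tiny test images are illustrative assumptions; the conversion to floating point matters because subtracting unsigned 8-bit arrays directly would wrap around.

```python
import numpy as np

def psnr_db(original, decoded, peak=255.0):
    """PSNR in dB between two images of the same shape (8-bit peak by default)."""
    diff = original.astype(np.float64) - decoded.astype(np.float64)
    mse = np.mean(diff ** 2)          # mean squared error
    if mse == 0:
        return float('inf')           # identical images, i.e. the lossless case
    return 10 * np.log10(peak ** 2 / mse)

original = np.zeros((4, 4), dtype=np.uint8)
decoded = np.ones((4, 4), dtype=np.uint8)     # every pixel off by one, MSE = 1
psnr_example = psnr_db(original, decoded)     # equals 20 * log10(255)
psnr_lossless = psnr_db(original, original)
```

When the full lossless bitstream is decoded, the MSE is zero and the PSNR is unbounded, which is why the lossless results are reported as bit rates and only the truncated (lossy) results as PSNR.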

Conclusions
This paper presented integer fast lapped transforms (IntFLTs) for effective lossy-to-lossless image coding, which were constructed with few operations and direct-lifting of discrete cosine transforms (DCTs). Because direct-lifting merges many rounding operations and keeps the lifting coefficients small, the proposed IntFLTs performed better than the conventional IntFLTs in lossy-to-lossless image coding. Also, the proposed IntFLTs can preserve the high frequency components in the images. Since direct-lifting can reuse any existing software/hardware for DCTs, the proposed IntFLTs have great potential for fast implementation, depending on the architecture design and the DCT algorithm. Furthermore, the proposed IntFLTs do not need any side information