Optimal SP frame selection and bit budget allocation for mobile H.264 video streaming

Mobile video streaming services are challenging, as they must satisfy several system constraints, such as random access facilities, efficient server storage, and flexible rate adaptation. Rate adaptation can be performed by means of seamless switching among different encoded bitstreams. The H.264 video coding standard explicitly supports bitstream switching using specific frame coding modes, namely switching pictures (SP). The locations of SP frames affect the overall bit rate and quality of the streamed video. In this study, we address the joint optimal selection of the SP frame locations and of the bit budget allocation at frame layer. The optimization is carried out via a game theoretic approach under assigned system constraints on the overall streaming rate and the maximum random access delay. Numerical simulations show that our frame layer optimal encoding procedure brings advantages in terms of several characteristics of the streamed video, encompassing enhanced rate-distortion performance, reduced transmission buffer occupancy, equalization of the transmission delays, and more efficient switching.


Introduction
Mobile video streaming services are experiencing a boost in cellular networks [1] and in multimedia wireless sensor networks [2], as well as in vehicular applications [3]. In mobile streaming services, the available bandwidth randomly varies, possibly due to changes in network conditions, terminal mobility, and/or handover in heterogeneous networks. The streaming server can react to these variations by streaming content extracted from differently pre-encoded versions of a given video sequence, i.e. by performing bitstream switching. Bitstream switching can be enabled by inserting INTRA frames, i.e. frames coded without reference to any other coded frame. This, however, would result in a significant coding cost, a less efficient bandwidth occupancy, an increased transmission buffer storage, and worsened frame transmission delay and jitter. One of the new features of H.264 is a coding mode named switching picture (SP), which allows drift-free bitstream switching [4,5].
Since their introduction, SP frames have attracted the attention of the research community due to their unique characteristics. Applications range from streaming services [6][7][8] to error control [9]. The optimal selection of SP frame location has recently been addressed in [10]. Within the framework of multiview video coding, the use of SP frames is currently under investigation to allow drift-free switching among different views [11,12].

*Correspondence: gaetano.scarano@uniroma1.it DIET, Universitá "La Sapienza" di Roma, via Eudossiana 18, 00184 Roma, Italy
Insertion of the so-called primary SP frames into a mobile streamed video offers a set of candidate switching locations. The switch among the differently encoded versions of the video sequence is realized at need via the transmission of a complementary encoded representation, named secondary SP frame. Since both primary and secondary SP frames encompass a motion compensation stage [13], bitstream switching is provided without resorting to the transmission of a dedicated INTRA frame. During the encoding phase, the locations for switching frames are selected, and both primary and secondary SP frames are pre-encoded and stored at the server side. During the streaming phase, primary or secondary SP frames are transmitted at user convenience, depending on whether a switching is performed or not. As a side effect, SP frames also provide error resilience, which is an important issue in mobile communications. In [4], for instance, SP frames are integrated in a framework where switching is performed within a single compressed stream to achieve both error resilience and rate scalability.
Theoretical and empirical rate-distortion curves of SP frames have been provided in [5]. The rate-distortion curve of SP frames compares unfavorably with that of PREDICTED (P) frames, thus limiting the adoption of the SP coding mode in mobile video streaming. In this respect, it is clear that the choice of the frame coding mode itself significantly affects the overall rate (and quality) of the streamed video. On the other hand, the maximum distance between two consecutive SP frames is usually assigned as a system constraint depending on the desired degree of accessibility. Still, there is a degree of freedom on where to locate the SP frames along the sequence. Large margins of quality improvement (or of bit saving) can be observed by allocating SP frames along the video sequence in accordance with a suitable optimization criterion, as well as by optimally allocating the available bit budget among the different frames.

http://jivp.eurasipjournals.com/content/2012/1/18
On account of these considerations, in this article, we consider a video streaming framework where different versions of the same sequence are encoded at different qualities and switching among these is realized only by means of SP frames, and we jointly address the problems of SP frame location and bit budget allocation via a game theoretic approach. An early application of game theory to video coding was presented in the pioneering work [14], where the authors optimize the perceptual quality of the decoded sequence while guaranteeing fairness in bit allocation among macroblocks via a game theoretic approach. Here, we select the optimal SP frame locations and the optimal bit allocation that maximize the overall quality of the encoded sequence. Specifically, we extend the preliminary results in [15], and we formulate a game to optimize: (i) the bit allocation between different frames of the video sequence and (ii) the frame coding mode selection. Optimization is carried out under high-level system constraints, such as the temporal distance between successive SP frames and the overall bit budget available to encode the sequence.

Related studies
State-of-the-art works on SP frames mostly focus on the realization of novel coding techniques aimed at reducing the allocated budget for SP frames. Sun et al. [16] describe a technique to improve the coding efficiency of the SP frames by limiting the mismatch between the prediction reference and the frames to be encoded. In [17], it is shown that by appropriately choosing reference pictures, the size of secondary SP frames can be reduced by up to 40% for random-access and up to 2% for rate-switching, without affecting the decoded sequence quality. In recent literature, the problem of coding mode selection has been extensively discussed. In [18], a low-complexity procedure to address INTRA mode selection is proposed, while in [19] a rate-distortion approach is employed to derive a coding mode assignment procedure for intra, predicted and bidirectional predicted slices. In [10], a scheme to select the best switching points among the encoded bitstream has been introduced in the framework of a specific bandwidth reservation scheme, namely the so-called downstairs reservation scheme. The downstairs reservation scheme is based on reserving the maximum bit rate of the encoded sequence until the frame corresponding to such a maximum is transmitted; then, the reserved rate is reduced to the next highest bit rate and so on. Altaf et al. [10] propose to select as switching points in the bitstream those frames where a change in the reserved bit rate is observed. The resulting SP frame allocation scheme allows the SP frame to be transmitted when the receiver buffer is supposed to be empty, with a minimization of the wasted bits after the bitstream switching [10]. The SP frame selection scheme proposed in [10] is not explicitly related to any optimality criterion on the encoded sequence quality.
Besides, in spite of its simplicity, the scheme in [10] is suitable only under a particular reservation scheme and, since the SP frames must coincide with the changes in the reserved bit rate, the degree of accessibility may be severely limited. Thereby, it is worth seeking a procedure for coding mode selection and bit allocation independent of the rate reservation scheme possibly implemented in the streaming system; besides, the procedure should allow the user to choose the desired degree of accessibility.

Organization of the paper
In this study, we introduce an optimization procedure for coding mode selection and bit allocation derived under a game theoretic framework. Specifically, we formulate the optimization problem by representing the frames of a sequence as players whose strategy is the choice of the coding mode and the allocated bits and whose goal is the maximization of the overall sequence quality. Such an encoding optimization procedure is beneficial in different respects, ranging from rate/distortion of the encoded sequences, to network-related issues such as equalization of the transmission delay and transmission/playout buffer load.

Optimal SP selection and bit budget allocation
Here, we carry out a frame layer optimization of SP frame selection and bit budget allocation for mobile H.264 video streaming resorting to a game theoretic approach. Since the video encoder controls the resource allocation among different frames in a joint fashion, we recast the problem of coding mode selection and bit allocation in terms of strategy selection in a cooperative game.
Let us then consider a reference streaming framework where the server is equipped with K versions of the same sequence. Each of these flows is encoded at a different quality. The server simultaneously transmits these flows in multicast to the clients. Each client automatically synchronizes to the flow that better matches the experienced channel conditions and the expected video quality. Seamless bitstream switching among the different flows is enabled only via the employment of primary and secondary SP frames.
Setting the maximum temporal distance $\tau_{\max}$ between two consecutive SP frames results in a system constraint on the random access delay. Thereby, $\tau_{\max}$ is chosen according to the desired degree of flexibility. The choice of the maximum distance between two consecutive SP frames corresponds to a maximum number of frames between switching points, $N_{\max} = f_0 \cdot \tau_{\max}$, $f_0$ being the video sequence frame rate. To satisfy this constraint, the video sequence is partitioned into shorter subsequences of $N = \lfloor N_{\max}/2 \rfloor$ frames. In each subsequence, exactly one frame shall be coded as a switching one; since the SP frames of two adjacent subsequences are then at most $2N - 1$ frames apart, this complies with the choice of $\tau_{\max}$. Here, we exploit a game theoretic approach to jointly address the problem of coding mode assignment, i.e. the selection of the frame where the switching is enabled to occur, and the problem of resource allocation, once the coding modes are correctly assigned.
The game is described as follows:
• the players of the game are the N frames within a subsequence;
• the player's strategy is given by its coding mode and by the number of bits allocated for coding;
• the player's utility is its visual quality after decoding.
We wish to encode the N frames in the subsequence at the target bit rate of $R$ (bit/s). The overall bit budget available for the N frames is $B = R N / f_0$ (bit).
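As a worked illustration of the two relations above, the following minimal Python sketch computes $N_{\max}$, $N$, and $B$. The function names are hypothetical, and the sample values ($f_0 = 10$ fps, $\tau_{\max} = 2$ s, $R = 90$ Kb/s read as 90000 bit/s) are assumptions chosen to resemble the settings used later in the experiments.

```python
def subsequence_length(frame_rate_fps, tau_max_s):
    """Return (N_max, N) for a given frame rate f_0 and maximum SP distance tau_max."""
    n_max = int(round(frame_rate_fps * tau_max_s))  # N_max = f_0 * tau_max
    n = n_max // 2                                  # N = floor(N_max / 2)
    return n_max, n


def bit_budget(rate_bps, n_frames, frame_rate_fps):
    """Return the bit budget B = R * N / f_0 (in bits) for one subsequence."""
    return rate_bps * n_frames / frame_rate_fps


# Assumed example values: f_0 = 10 fps, tau_max = 2 s, R = 90000 bit/s.
n_max, n = subsequence_length(10, 2.0)
budget = bit_budget(90_000, n, 10)
print(n_max, n, budget)  # 20 10 90000.0
```

With these values, each subsequence of 10 frames lasts one second and therefore receives exactly one second's worth of bits.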
To elaborate, let us denote by $c_i$ the coding mode assigned to the frame $i = 0, \dots, N-1$, and by $c = (c_0, \dots, c_{N-1}) \in M$ the N-tuple of the coding modes of the subsequence, $M$ being the set of the admissible coding mode assignments. Here, we consider the case where $c_i$ represents a binary choice between P and SP coding mode.^a Let us remark here that once the number of allowed frames for each coding mode in each subsequence has been set, the N-tuples $c \in M$ are different permutations of the same values.^b

Let $r_i$ be the number of bits allocated to the ith frame and let $u_i = u_i(c_i, r_i)$ denote the utility of the ith player, i.e. the visual quality of the ith frame, $i = 0, \dots, N-1$. Each player is characterized by the initial utility $u_i^0$, which measures the minimal visual quality that must be guaranteed, and by the corresponding number of allocated bits $r_i^0$ required to achieve the quality $u_i^0$. In assigning the minimal quality $u_i^0$ that must be guaranteed to each frame, $i = 0, \dots, N-1$, different priors can be adopted.
In the video coding framework, the Nash-Bargaining solution [20] can be found by maximizing the following objective function [21]:

$$\max_{c,\; r_0, \dots, r_{N-1}} \; \prod_{i=0}^{N-1} \left[ u_i(c_i, r_i) - u_i^0 \right] \qquad (1)$$

under the following three constraints:

$$r_i \ge r_i^0, \quad i = 0, \dots, N-1; \qquad \sum_{i=0}^{N-1} r_i \le B; \qquad c \in M. \qquad (2)$$

The visual quality $u_i = u_i(c_i, r_i)$ of the ith frame is a value related to subjective perception, possibly affected by interaction between different media, and it is therefore hardly captured by an analytical relation (see [22,23] for a comprehensive survey on the subject). The approach in [14] imposes a linear relation between the bits assigned to an image area, namely a macroblock, and its resulting visual quality after decoding. This choice has the merit of leading to an analytically tractable solution for the maximization in (1). Here, we recall the relation previously introduced in [14] and we extend it in order to take into account also the different coding efficiency corresponding to different frame coding modes. Towards this aim, we relate the visual quality $u_i$ of the ith frame to the root mean square value (RMSV) $\sigma_i$ of the innovation process between frame $i$ and frame $i-1$. Specifically, we assume

$$u_i(c_i, r_i) = \frac{r_i}{K(c_i)\, g(\sigma_i)} \qquad (3)$$

where $g(\sigma_i)$ is a non-decreasing function^c of $\sigma_i$. The factor $K(c_i)$ represents the coding efficiency of the coding mode option associated with the ith frame. The value of $K(c_i)$ affects $u_i$ as a quality penalty reflecting the fact that differently encoded frames will exhibit different quality under the same bit budget. The values for $K(c_i)$ can be directly derived from the rate-distortion curves [5] at a typical distortion value, or assigned through an a priori criterion.

Under such a quality model, and for a given coding mode assignment $c^{(m)} \in M$, the objective function (1) rewrites as follows

$$g^{(m)}(r_0, \dots, r_{N-1}) = \prod_{i=0}^{N-1} \frac{r_i - r_i^0}{K(c_i^{(m)})\, g(\sigma_i)}, \qquad r_i^0 = u_i^0\, K(c_i^{(m)})\, g(\sigma_i). \qquad (4)$$

By using Lagrange multipliers, the maximization of $g^{(m)}(r_0, \dots, r_{N-1})$ with respect to the $r_i$'s leads to the following optimal allocated rates:^d

$$r_i^{(m)} = r_i^0 + \frac{1}{N} \left( B - \sum_{j=0}^{N-1} r_j^0 \right), \qquad i = 0, \dots, N-1. \qquad (5)$$

Substituting (5) in (4) yields the following optimal value of the objective function corresponding to the coding mode assignment $c^{(m)}$:

$$g^{(m)}_{\mathrm{opt}} = \frac{\left[ \frac{1}{N} \left( B - \sum_{j=0}^{N-1} u_j^0\, K(c_j^{(m)})\, g(\sigma_j) \right) \right]^N}{\prod_{i=0}^{N-1} K(c_i^{(m)})\, g(\sigma_i)}. \qquad (6)$$

The maximum of (1) is then found as the supremum of the finite set

$$\left\{ g^{(m)}_{\mathrm{opt}} : c^{(m)} \in M \right\}. \qquad (7)$$

The optimal coding mode assignment $c^{(m_{\mathrm{opt}})}$ can be stated as follows

$$m_{\mathrm{opt}} = \arg\max_{m} \; g^{(m)}_{\mathrm{opt}}. \qquad (8)$$

Hence, the optimal allocated bits are obtained as $r_i^{(m_{\mathrm{opt}})}$, $i = 0, \dots, N-1$.

In order to find $m_{\mathrm{opt}}$, let us recall that, due to coding constraints, all the N-tuples $c^{(m)} \in M$ are permutations of the same values, let us say $c^{(0)}$. Hence, the N-tuple $k^{(m)} = (k_0^{(m)}, \dots, k_{N-1}^{(m)})$, with $k_i^{(m)} = K(c_i^{(m)})$, collecting the values of the weight function $K(\cdot)$ over the elements of the N-tuple $c^{(m)}$ is affected by different choices of $c^{(m)}$ only in the form of a permutation of its elements. As a consequence, the denominator in (6) takes the same value for all the possible choices of $c^{(m)}$, and hence has no effect in (7).

Moreover, since the numerator of (6) is non-negative because of the constraints (2), the optimal coding mode N-tuple is obtained as the coding mode assignment $c^{(m)}$ that minimizes

$$\sum_{i=0}^{N-1} u_i^0\, k_i^{(m)}\, g(\sigma_i). \qquad (9)$$

Since different choices of $c^{(m)}$ affect the weights $k_i^{(m)}$ only through a permutation, the minimization in (9) amounts to finding the best pairing between the weights and the frames. Under the hypothesis of uniform minimal quality all over the sequence, i.e. $u_i^0 = u_{\min}$, $i = 0, \dots, N-1$, the solution provided by (9) corresponds to the choice of progressively assigning the less efficient coding mode (higher values of the coding cost $k_i$) to the frames with lower amounts of innovation (smaller values of $g(\sigma_i)$).
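The pairing rule stated above can be checked numerically on a toy case. The following Python sketch is illustrative only: it assumes the linear quality model in which the initial bits of frame $i$ are $u_{\min}\, k_i\, g(\sigma_i)$, and it uses made-up values for $g(\sigma_i)$ and for the coding costs (one SP frame with cost 2, four P frames with cost 1). It compares the sorted pairing against an exhaustive search over all permutations of the costs.

```python
from itertools import permutations


def initial_budget(k, g, u_min=1.0):
    """Total initial bits sum_i u_min * k_i * g_i (assumed linear quality model)."""
    return sum(u_min * ki * gi for ki, gi in zip(k, g))


def sorted_assignment(costs, g):
    """Pair the largest coding cost with the smallest innovation term g_i."""
    frames_by_innovation = sorted(range(len(g)), key=lambda i: g[i])
    costs_descending = sorted(costs, reverse=True)
    k = [0.0] * len(g)
    for cost, frame in zip(costs_descending, frames_by_innovation):
        k[frame] = cost
    return k


# Hypothetical values: g(sigma_i) per frame and coding costs K(SP) = 2, K(P) = 1.
g = [4.0, 1.5, 3.0, 2.0, 5.0]
costs = [2.0, 1.0, 1.0, 1.0, 1.0]

best = sorted_assignment(costs, g)
brute_force = min(initial_budget(p, g) for p in set(permutations(costs)))

print(best)                     # [1.0, 2.0, 1.0, 1.0, 1.0]
print(initial_budget(best, g))  # 17.0
```

In this toy case the exhaustive search returns the same minimum (17.0), obtained by placing the SP frame on the frame with the smallest $g(\sigma_i)$.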
The condition (9) directly comes from the following Proposition, whose proof is reported in Appendix.

Proposition 1.
Given the finite set $A = \{a_1, \dots, a_n\}$ such that $a_i \in \mathbb{R}^+$ and $a_i \ge a_{i-1}$, $\forall i = 2, \dots, n$, and the finite set $P = \{p_1, \dots, p_n\}$ such that $p_i \in \mathbb{R}^+$ and $p_i \le p_{i-1}$, $\forall i = 2, \dots, n$, then, for all the possible permutations $\pi$ of the indices $\{1, \dots, n\}$,

$$\sum_{i=1}^{n} a_i\, p_i \le \sum_{i=1}^{n} a_i\, p_{\pi(i)}.$$

To recap, our optimal allocation and coding mode selection leads to the following criteria for maximization of the overall quality of the decoded sequence (1):

Coding mode selection criterion: the coding mode assignment used in the subsequence of N frames is performed by coding with the less efficient coding modes the frames with the smallest amount of innovation.
Bit budget allocation criterion: the bit allocation is performed in two steps: at first an initial allocation is performed in order to satisfy the minimum quality constraints, then the remaining bit budget after this first assignment is fairly redistributed among the frames in the subsequence of N frames.
The optimal coding procedure optimized according to these criteria is summarized in Appendix, where also a few implementation details are discussed.
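The two criteria can be combined in a few lines of code. The sketch below is a hedged illustration, not the procedure of the Appendix: it assumes one SP frame per subsequence, a uniform minimum quality `u_min`, the linear model in which the initial bits of frame $i$ equal `u_min * K(c_i) * g(sigma_i)`, and hypothetical coding costs `k_sp` and `k_p`. All function and parameter names are invented for this example.

```python
def optimal_subsequence_coding(g, budget, u_min, k_sp, k_p):
    """Return (modes, initial_bits, allocated_bits) for one subsequence.

    g      : list of innovation terms g(sigma_i), one per frame
    budget : total bit budget B of the subsequence
    u_min  : uniform minimum quality u_i^0 (assumption)
    k_sp   : coding cost of the SP mode, k_p: coding cost of the P mode
    """
    n = len(g)
    # Coding mode selection: the less efficient SP mode goes to the frame
    # with the smallest amount of innovation.
    sp_index = min(range(n), key=lambda i: g[i])
    modes = ["SP" if i == sp_index else "P" for i in range(n)]
    k = [k_sp if mode == "SP" else k_p for mode in modes]

    # Step 1: initial allocation satisfying the minimum quality constraint.
    initial_bits = [u_min * k[i] * g[i] for i in range(n)]
    spare = budget - sum(initial_bits)
    if spare < 0:
        raise ValueError("bit budget too small for the requested minimum quality")

    # Step 2: the remaining budget is shared equally among the frames.
    allocated_bits = [r0 + spare / n for r0 in initial_bits]
    return modes, initial_bits, allocated_bits


# Hypothetical example values.
modes, r0, r = optimal_subsequence_coding(
    g=[4.0, 1.5, 3.0, 2.0, 5.0], budget=100.0, u_min=2.0, k_sp=2.0, k_p=1.0
)
print(modes)
print(r0)
print(r)
```

In this example the SP mode is assigned to the second frame, the initial allocation uses 34 of the 100 available bits, and each frame receives the same extra share of the remaining 66.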
The above-summarized criteria resulting from the maximization of the objective function (1) basically lead to smoother instantaneous fluctuations of the video bit rate. The smooth traffic behavior is a result of the cooperative game underlying the optimized procedure of the coding mode selection and frame bit allocation. In fact, the constraint on the initial quality $u_i^0$ is satisfied with minimal initial budget $\sum_{i=0}^{N-1} r_i^0$ when the less efficient SP coding mode is assigned to the frame with the minimum amount of innovation. This assignment reduces the imbalance of the initial bit budgets $r_i^0$ required to guarantee the desired initial quality of the different frames. Besides, after the initial allocation, the remaining bit budget is fairly allocated among the frames. Hence, the fair allocation resulting from the joint quality improvement pursued by optimizing the objective function (1) in a cooperative fashion results in a smoother behavior of the video traffic.
Remarkably, the smoothness of the traffic of the encoded video resulting from such cooperative optimization is expected to be beneficial in a realistic network scenario, in terms both of transmission delay jitter of the video traffic and of load of network buffers. These benefits will be quantitatively assessed in "Experimental results" section.

Final remarks
The above detailed coding mode selection and bit budget allocation criteria depend on the linear visual quality model in (3). The selection of such a model for the visual quality u i of the ith frame corresponds to a linear approximation of the peak signal-to-noise ratio (PSNR) empirical curves around a selected value of the bit rate (see for instance Figure 9 in [24]). Interestingly enough, should the following different linear model for the visual quality penalty over u i due to the coding efficiency K(c i ) be adopted it can easily be proved that such a model results in exactly the same optimal criteria obtained with the quality model in (3). Let us finally observe that once the optimal bit budget per frame is allocated, any rate control scheme can be employed to encode the sequence; for instance, the algorithm described in [14] can further be applied at a macroblock layer for individual frame coding.
To sum up, while recent literature works refer to within-frame optimization techniques, the scope of our optimization includes different encoded frames, jointly taking into account system constraints and rate-distortion aspects. Besides, we have demonstrated that the optimization problem phrased in (1) can be solved separately w.r.t. the two tuples $c_0, \dots, c_{N-1}$ and $r_0, \dots, r_{N-1}$. Moreover, although the finite and discrete nature of the tuple $c_0, \dots, c_{N-1}$ would allow the optimal $c_i$'s to be found by exhaustive search, we have provided a closed form solution.
The so found novel resource allocation procedure extends to frame-level the macroblock-oriented resource allocation procedure found in [14], also accounting for the unbalance between coding mode efficiency.

Experimental results
In this section, we show some experimental results of the herein analyzed optimized method, obtained using the H.264 codec [25] on different test sequences in QCIF and CIF format at the reference frame rates of f 0 = 10 and 30 fps.
The optimization is performed by first partitioning the sequence into groups of N frames with $N = f_0 \cdot 1$ s, corresponding to at least one SP frame every 2 s, so as to achieve a good compromise between accessibility and compression efficiency. For every group of N frames, the switching coding mode $c_i$ = SP (SP frame) is assigned to the frame with minimum RMSV of the innovation process; the remaining frames in the subsequence are coded as $c_i$ = P (P frame). The optimization algorithm is then applied by evaluating the bit budget for each frame to be encoded. Following the guidelines in Step IV of "Optimal coding procedure" in Appendix, we compute the initial frame bit budget $r_i^0$, $i = 0, \dots, N-1$, by applying a coarse quantization coding stage with quantization parameters fixed so as to assure the desired average initial quality, set to PSNR$_0$ = 20 dB.^e The optimal frame bit budgets $r_i$, $i = 0, \dots, N-1$, are then evaluated by fairly redistributing the remaining bit budget according to (5). Finally, the sequence is encoded under the per-frame bit budget constraints $r_i$. A coarse rate control procedure is implemented using a constant within-frame QP; such a strategy can be further refined using a spatially varying QP as described in [14].
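The SP selection step of this experimental procedure can be sketched as follows. This is an illustration under stated assumptions: the innovation process is approximated here by the plain pixel-wise difference between consecutive frames (the paper does not specify whether motion compensation is applied at this stage), frames are represented as flat lists of luma samples, and the function names are hypothetical.

```python
import math


def rmsv_innovation(prev_frame, frame):
    """RMS value of the pixel-wise difference between two frames.

    Both frames are flat lists of luma samples of equal length. Using the
    plain frame difference as the innovation is an assumption of this sketch.
    """
    squared = sum((a - b) ** 2 for a, b in zip(frame, prev_frame))
    return math.sqrt(squared / len(frame))


def select_sp_frames(sigmas, group_size):
    """Return, for each group of `group_size` frames, the index of the frame
    with the smallest RMSV; that frame is coded as SP, the others as P."""
    sp_indices = []
    for start in range(0, len(sigmas), group_size):
        group = range(start, min(start + group_size, len(sigmas)))
        sp_indices.append(min(group, key=lambda i: sigmas[i]))
    return sp_indices


# Toy example with hypothetical RMSV values and groups of N = 3 frames.
sigmas = [3.0, 1.0, 2.0, 4.0, 0.5, 2.5]
print(select_sp_frames(sigmas, 3))                      # [1, 4]
print(rmsv_innovation([0, 0, 0, 0], [3, 4, 0, 0]))      # 2.5
```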
For comparison, we consider also a suboptimal coding approach with fixed SP periodicity (one SP frame each N frames), using quantization parameters chosen according to the analysis in [5].
We first apply the presented coding mode selection and allocation scheme to encode the test sequences "Foreman" and "Coastguard" in QCIF format and the test sequence "Mother and Daughter" in CIF format at $f_0$ = 10 fps. Figures 1, 2 and 3, respectively, plot the bit/frame budgets $r_i$ obtained employing the optimal encoding procedure on the QCIF test sequences "Foreman" and "Coastguard" at the nominal rate of 90 Kb/s and the CIF sequence "Mother and Daughter" at the nominal rate of 900 Kb/s. For comparison, we have evaluated the bit/frame budgets $r_i^{\mathrm{PER}}$ of the suboptimal (periodical SP insertion) strategy.
In all these cases, maximization of the objective function (1) leads to smoother instantaneous bit rate fluctuations; the effect is even more noticeable on the sequence in CIF format, showing that the higher the objective bit rate, the more effective the optimal allocation scheme is.
Notably, the fair allocation strategy leading to the maximization of (1) is always achieved together with an improvement of the PSNR or of the coding gain, as shown in Table 1, which summarizes the average bit rate and PSNR for the optimal and suboptimal strategies in different encoding conditions. We observe that the improvement is more relevant at higher spatial or temporal resolutions. Besides, in order to test the performance of the herein presented allocation scheme over more realistic video contents, we have also considered the encoding of a group of N = 300 frames extracted from the movie "Fear and Loathing in Las Vegas" (labeled as "Movie" sequence in the following) encompassing a scene change. The "Movie" sequence is encoded in CIF format at $f_0$ = 30 fps at the nominal rate of 480 Kb/s. Figure 4 reports the bit/frame budgets $r_i$ while Figure 5 reports the PSNR per frame. Inspection of the plots in Figures 4 and 5 confirms that the optimal allocation scheme allows a better sharing of the available bit budget among the frames to be encoded, with the aim of maximizing the encoded video quality.

Let us now show how the smoothness of the instantaneous bit rate due to the cooperative optimization strategy is beneficial in a realistic network scenario. For the sake of concreteness, let us refer to the simplified model of the end-to-end communication link depicted in Figure 6. The scheme comprises a transmission buffer of size $B_T$, a channel at fixed nominal rate $R_C$, and a playout buffer of size $B_R$.
We first show that the adoption of the optimal encoding strategy is beneficial in terms of transmission delay jitter. In order to quantify such benefit, we compute the transmission delays $d_i = r_i / R_C$ associated with the transmission of the frames of the encoded video sequence. In Figure 7, we plot the cumulative distribution of the delays $d_i = r_i / R_C$, in the case of the CIF sequence "Mother and Daughter" encoded at a nominal bit rate of 900 Kb/s with the optimal and suboptimal (periodical SP insertion) allocation schemes. In this case, the channel rate $R_C$ has been set to $R_C$ = 1000 Kb/s, corresponding to the nominal source rate plus a gross 10% margin. Results show that, resorting to the game theoretic allocation scheme, the transmission delay is always less than 100 ms; on the contrary, when the periodical SP allocation is employed, up to 10% of the frames suffer a higher delay.
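The delay statistic used here is simple to compute from the per-frame bit budgets. The following Python sketch uses hypothetical frame sizes (not the measured ones) and assumes that 1000 Kb/s is read as 1,000,000 bit/s; the function names are invented for illustration.

```python
def transmission_delays(frame_bits, channel_rate_bps):
    """Per-frame transmission delays d_i = r_i / R_C, in seconds."""
    return [bits / channel_rate_bps for bits in frame_bits]


def fraction_within(delays, threshold_s):
    """Empirical cumulative distribution evaluated at `threshold_s`."""
    return sum(d <= threshold_s for d in delays) / len(delays)


# Hypothetical frame sizes in bits, channel rate R_C = 1,000,000 bit/s.
frame_bits = [60_000, 90_000, 150_000, 80_000]
delays = transmission_delays(frame_bits, 1_000_000)

print(delays)                        # [0.06, 0.09, 0.15, 0.08]
print(fraction_within(delays, 0.1))  # 0.75
```

Evaluating `fraction_within` over a grid of thresholds gives the empirical cumulative distribution of the kind plotted in Figure 7; a smoother bit allocation concentrates the delays and makes this curve steeper.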
The smoothed nature of the video traffic generated using the described optimized procedure also affects the transmission buffer load. We have compared the frame loss rate observed while filling a buffer with a sequence encoded according to the optimal allocation scheme and the frame loss rate experienced by a sequence with periodic SP frames. The transmission buffer is managed with a first-in first-out (FIFO) policy; a frame is stored in the buffer only if there is available space for it, otherwise it is lost. In Figures 8, 9, and 10, we compare the frame loss rate obtained using the optimal and the periodical SP allocation scheme for the sequences "Coastguard" (QCIF format, 90 Kb/s, R C = 100 Kb/s), "Mother and Daughter" (CIF format, 900 Kb/s, R C = 1000 Kb/s), and "Movie" (CIF format, 480 Kb/s, R C = 500 Kb/s). In all the considered simulations, a reduction of the frame loss rate is observed when the optimal frame layer encoding scheme is adopted.
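A minimal simulation of this buffer policy can be sketched as follows. The sketch makes simplifying assumptions that are not stated in the text: time is discretized in frame periods, each frame arrives as a whole at the start of its period, and the buffer drains $R_C / f_0$ bits per period after the arrival. The function name and the traffic values are hypothetical.

```python
def frame_loss_rate(frame_bits, buffer_size_bits, channel_rate_bps, frame_rate_fps):
    """Fraction of frames dropped by a finite FIFO transmission buffer.

    A frame is stored only if it fits entirely in the free space of the
    buffer, otherwise it is lost. The buffer is drained at the channel rate.
    """
    drain_per_frame = channel_rate_bps / frame_rate_fps
    occupancy = 0.0
    lost = 0
    for bits in frame_bits:
        if occupancy + bits <= buffer_size_bits:
            occupancy += bits          # frame accepted
        else:
            lost += 1                  # not enough space: frame dropped
        occupancy = max(0.0, occupancy - drain_per_frame)
    return lost / len(frame_bits)


# Two hypothetical traffic patterns with the same total of 90000 bits:
smooth = [9_000] * 10
bursty = [36_000] + [6_000] * 9

print(frame_loss_rate(smooth, 20_000, 100_000, 10))  # 0.0
print(frame_loss_rate(bursty, 20_000, 100_000, 10))  # 0.1
```

Although both patterns carry the same number of bits, only the bursty one overflows the 20000-bit buffer, which mirrors the qualitative behavior discussed above.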
Let us now refer to the same scenario as in Figure 6, when a specific bandwidth reservation scheme, namely the Downstairs Reservation (DR) scheme, is employed. As variable bit rate (VBR) video data are likely to exhibit severe bit rate fluctuations on both short and large scales, suitable smoothing procedures are designed to realize the transmission of VBR video by means of a series of constant bit rate (CBR) segments. The video server is then required to reserve the correct amount of bandwidth to effectively transmit each segment. Several techniques to achieve such piece-wise CBR reservations have been proposed in recent literature. Among others, the DR scheme exhibits the desired property of avoiding upwards bandwidth reallocations, that is, every CBR segment is characterized by a bit rate equal to or less than that of the previous segments. Such a characteristic greatly simplifies the network admission control procedures. The DR scheme starts by reserving a channel rate equal to the peak bit rate of the encoded sequence; after the peak occurrence, the reserved rate is reduced to the next peak and so on. Specifically, let us suppose to have encoded the N frames of a given sequence so that the ith frame is assigned $r_i$ bits. The DR procedure starts by evaluating the following quantities

$$A_j = \frac{f_0}{j} \sum_{i=1}^{j} r_i, \qquad j = 1, \dots, N.$$

The $A_j$'s represent the average transmission bit rate for subsequences composed of $j$ frames. The largest of these averages indicates the frame for which the maximum instantaneous bit rate is observed in the encoded sequence; such a value is employed for bandwidth reservation of the first CBR segment. Once the largest $A_j$ has been identified, say for $j = \hat{j}$, so that the first segment spans the first $\hat{j}$ frames, new averages are evaluated starting from the $(\hat{j} + 1)$th frame and the largest value among the $A_l$'s is employed for the reservation of the following segment. The procedure is iterated for the entire sequence.
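The iterative procedure just described can be sketched in a few lines. The code below is an illustrative reading of the text, not the reference implementation of [10]: it assumes that the averages are cumulative means of the frame sizes converted to bit/s through the frame rate, and that ties are resolved by taking the longest segment. Frame indices are 0-based in the code, and all names and values are hypothetical.

```python
def downstairs_reservation(frame_bits, frame_rate_fps):
    """Return the CBR segments as (first_frame, last_frame, reserved_rate_bps)."""
    segments = []
    start = 0
    n = len(frame_bits)
    while start < n:
        best_rate, best_end = -1.0, start
        cumulative = 0
        for end in range(start, n):
            cumulative += frame_bits[end]
            # Average rate over the frames start..end, in bit/s.
            average = cumulative * frame_rate_fps / (end - start + 1)
            if average >= best_rate:   # on ties keep the longest segment
                best_rate, best_end = average, end
        segments.append((start, best_end, best_rate))
        start = best_end + 1           # restart after the current segment
    return segments


# Hypothetical frame sizes in bits at f_0 = 10 fps.
frame_bits = [4_000, 12_000, 5_000, 3_000, 6_000, 2_000]
for segment in downstairs_reservation(frame_bits, 10):
    print(segment)
# (0, 1, 80000.0)
# (2, 2, 50000.0)
# (3, 4, 45000.0)
# (5, 5, 20000.0)
```

The reserved rates never increase from one segment to the next: if a later run of frames had a higher average than the current maximum, extending the current segment over that run would have produced a larger average, contradicting the choice of the maximum.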
According to the DR guidelines reported in [10], we modify the network scenario in Figure 6 so that the channel rate $R_C$ varies according to the DR criterion. The transmission buffer is managed with a FIFO policy. The playout buffer is emptied at the nominal sequence frame rate, and the frames are extracted from the buffer according to their decoding order. The playout process starts with a playout delay $D$. A correctly transmitted frame is considered lost at the receiver side either if there is not enough space in the buffer for its storage or if it is received after its decoding deadline.^f The received frames exhibit different delays at the playout buffer, due to the transmission buffer queue and the random delay introduced by the channel. We model such random delay following the channel model described in [26]. We compare the frame loss rate for the optimal coding mode assignment and bit budget allocation and for the suboptimal method with fixed SP periodicity. For comparison, we also consider the DR-oriented allocation strategy presented in [10]. We consider, for this experiment, the test sequence "Foreman" in QCIF format, encoded at a nominal bit rate of 100 Kb/s. Setting $f_0$ = 30 fps, we obtain 9 steps of the DR rate within the 300-frame-long sequence.

Within the approach in [10], the number of inserted SP frames equals the number of steps of the downstairs reservation function; for fair comparison, we have employed the same number of SP frames, namely nine SP frames for the settings of this experiment, also for the presented approach and for the suboptimal scheme with fixed SP periodicity. Table 2 summarizes the averaged bit rate and the PSNR attained, under these settings, by using the optimal strategy, the suboptimal one with fixed SP periodicity, and the approach in [10].
Figure 11 reports the frame loss rate at the transmitter buffer, while Table 3 reports the overall end-to-end frame loss rate for different sizes of both the transmitter and the playout buffer, for a playout delay D = 5 s. Figure 12 reports the overall end-to-end frame loss rate for various sizes of the playout buffer when the transmission buffer size is fixed in order to obtain a transmission frame loss rate equal to 5% for all of the approaches (4.2 Kb for the optimal approach and 6.6 Kb for the periodical SP allocation and for the approach in [10]). Simulation results show that also in this specific scenario, for which the scheme in [10] has been designed, the coding mode assignment and the resource allocation obtained via the game theoretic approach significantly reduce the buffer losses, and hence enhance the quality of the received stream.

Also in this case, we have assessed the performance of the herein presented approach over a real video content. Specifically, we have considered a portion of N = 600 frames extracted from the recording of a soccer match (from now on labeled as "Sport" sequence) in CIF format at a frame rate of f 0 = 30 fps at a nominal rate of 450 Kb/s. The "Sport" sequence encompasses a scene change. We have run the DR scheme over 6 windows made up of 100 frames each in order to obtain the number of SP frames introduced by the approach in [10], and then we have employed the same number of SP frames, namely 20 SP frames for the settings of this experiment, also for the presented approach and for the suboptimal scheme with fixed SP periodicity.

Table 2 Average bit rate and PSNR measurements for the optimal presented allocation scheme, the suboptimal strategy (periodical SP allocation) and the work in [10] ("Foreman" test sequence, 30 fps, QCIF format)
Table 4 summarizes the averaged bit rate and the PSNR attained, for the "Sport" sequence, by using the optimal strategy, the suboptimal one with fixed SP periodicity, and the approach in [10]. Figure 13 reports the frame loss rate at the transmitter buffer, while Figure 14 reports the overall end-to-end frame loss rate for various sizes of the playout buffer when the transmission buffer size is fixed in order to obtain a transmission frame loss rate equal to 5% for all of the approaches (20 Kb for the optimal approach and 55 Kb for the periodical SP allocation and for the approach in [10]). Finally, for completeness, we also report the results obtained using the same settings in absence of the DR scheme. Specifically, the transmission buffer is emptied at a constant rate equal to the nominal sequence bit rate and the channel rate R C is constant and equal to R C = 100 Kb/s for the "Foreman" sequence, and to R C = 500 Kb/s for the "Sport" sequence. Figures 15 and 16 show, for both of the sequences, the frame loss rate evaluated at the transmission side, i.e. caused only by the transmission buffer overflow, for different sizes of the transmission buffer (expressed in Kb). Simulation results show that the described approach outperforms both the periodical SP insertion and the allocation criterion introduced in [10] for all the buffer sizes. As previously stated, this is explained by the smoothed nature of the video traffic generated using the described optimized procedure. Tables 5, 6, and 7 report, for both of the sequences, the overall end-to-end frame loss rate at various sizes of the transmission and the playout buffers. Also in terms of overall frame loss rate, our allocation method suffers fewer losses than the other approaches. Finally, Figures 17 and 18 report the overall end-to-end frame loss rate for various sizes of the playout buffer when the transmission buffer size is fixed in order to obtain a transmission frame loss rate equal to 5% for all of the approaches. In all cases, the optimal strategy exhibits the best performance in terms of transmission buffer load. All the presented results clearly highlight the impact of the optimal criteria for coding mode assignment and bit allocation with respect to state-of-the-art approaches.

Until now, we have considered optimization of primary SP frames, i.e. the random access frames of the encoded video sequence. When a switching is requested during a streaming session, the server sends a different version of the access frame, namely the secondary SP frame, for decoder buffer synchronization purposes. Since secondary SP frames are also encoded by motion compensation, optimization of primary SP allocation is beneficial for secondary SP bit allocation too.

Figure 12 End-to-end frame loss rate obtained using the DR scheme, for the different coding strategies (optimal, periodical SP allocation, and SP allocation [10]) versus the playout buffer size (B R ) ("Foreman" test sequence, QCIF format, 30 fps, and at a nominal bit rate of 100 Kb/s, 5% transmission buffer frame loss rate).

Table 4 Average bit rate and PSNR measurements for the optimal presented allocation scheme, the suboptimal strategy (periodical SP allocation) and the work in [10] ("Sport" test sequence, 30 fps, CIF format)
Numerical simulations have shown a variable gain of the optimal allocation scheme over the suboptimal one; in the case of bitstream switching between the 70 and 100 Kb/s versions of the QCIF sequence "Foreman", we have observed a reduction of up to 20%, with an average value of 10%, in the bits allocated to the secondary SP frames.
Figure 14 End-to-end frame loss rate obtained using the DR scheme, for the different coding strategies (optimal, periodical SP allocation, and SP allocation [10]) versus the playout buffer size ($B_R$) ("Sport" test sequence, CIF format, 30 fps, and at a nominal bit rate of 450 Kb/s, 5% transmission buffer frame loss rate).
http://jivp.eurasipjournals.com/content/2012/1/18
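The transmission-side losses discussed above for the constant-rate setting (buffer drained at a fixed channel rate, frames lost only through buffer overflow) can be illustrated with a minimal sketch. It is not the simulator used for the reported figures: the function name, the per-frame-interval time granularity, and the policy of discarding a whole frame when it does not fit are assumptions made here for illustration, and all sizes are expressed in bits.

```python
def transmission_loss_rate(frame_bits, buffer_bits, channel_rate, frame_rate):
    """Fraction of frames lost to transmission buffer overflow.

    frame_bits   : coded size of each frame, in bits
    buffer_bits  : transmission buffer size, in bits
    channel_rate : constant drain rate of the buffer, in bits per second
    frame_rate   : frames per second
    Assumption: a frame that does not fit entirely in the buffer is dropped.
    """
    drain_per_frame = channel_rate / frame_rate  # bits removed per frame interval
    occupancy = 0.0
    lost = 0
    for bits in frame_bits:
        if occupancy + bits > buffer_bits:
            lost += 1                 # overflow: the frame is discarded
        else:
            occupancy += bits         # the frame is queued for transmission
        # The channel drains the buffer during one frame interval.
        occupancy = max(0.0, occupancy - drain_per_frame)
    return lost / len(frame_bits)
```

Feeding the function two frame-size sequences with the same total number of bits, one uniform and one with a single large frame, shows the qualitative effect invoked in the text: for the same buffer size, the smoother traffic overflows less often.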

Conclusion
In this study, we have presented a procedure for optimal frame-level coding mode selection and bit budget allocation, with application to mobile H.264 video streaming. The optimization procedure is derived via a game theoretic approach. The cooperative game underlying the optimized procedure of the coding mode selection and frame bit allocation basically leads to smoother
We wish to prove that (10) holds. The proof of (10) will be carried out by induction. We will show that (i) (10) is true for $n = 2$; (ii) if (10) is true for $n = m - 1$, then it is true also for $n = m$.
The induction basis (i) is easily proved by simple algebra. In fact, when $n = 2$, (11) holds. Given the ordering of the elements of the sets $A^{(2)}$ and $P^{(2)}$, the term $(a_1 - a_2)(p_2 - p_1)$ is always non-negative, and hence (11) proves (i).
Having proved the induction basis (i), let us then assume that (12) holds for the sets $A^{(m-1)}$ and $P^{(m-1)}$, ordered as stated in the hypothesis. Let us then consider the ordered sets $A^{(m)}$ and $P^{(m)}$ of cardinality $m$. We show that, under these settings, (12) implies the statement for $n = m$, which, in turn, is rewritten as (14) by means of an auxiliary permutation belonging to $F^{(m-1)}$. By adding and subtracting $a_m p_m$ to the right-hand side of (14), and because of (12), the result follows, since the remaining term is non-negative given the ordering of the sets $A^{(m)}$ and $P^{(m)}$.
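For readers who want to follow the algebra, the induction can be sketched as below. This is a reconstruction under stated assumptions, not the displayed equations of the proof: it assumes that (10) is the rearrangement-type inequality written in the first display, that the elements of $A^{(n)}$ are in non-increasing order and those of $P^{(n)}$ in non-decreasing order (which is consistent with the term $(a_1 - a_2)(p_2 - p_1)$ being non-negative), and that $F^{(n)}$ is the set of permutations of $\{1, \dots, n\}$. The index $k$ and the symbol $\tilde{h}$ are notation introduced here. The `align*` environment requires the `amsmath` package.

```latex
% Assumed setting: a_1 \ge \dots \ge a_n, p_1 \le \dots \le p_n,
% F^{(n)} = set of permutations of \{1,\dots,n\}.
% Assumed form of the claim:
\[
\sum_{i=1}^{n} a_i\, p_i \;\le\; \sum_{i=1}^{n} a_i\, p_{h(i)}
\qquad \forall\, h \in F^{(n)} .
\]
% Basis, n = 2: the only non-identity permutation swaps the two elements.
\[
\bigl(a_1 p_2 + a_2 p_1\bigr) - \bigl(a_1 p_1 + a_2 p_2\bigr)
  = (a_1 - a_2)(p_2 - p_1) \;\ge\; 0 .
\]
% Step: for h \in F^{(m)}, let k = h^{-1}(m). Define \tilde{h} on
% \{1,\dots,m-1\} by \tilde{h}(i) = h(i) for i \ne k and, when k < m,
% \tilde{h}(k) = h(m). Then \tilde{h} \in F^{(m-1)}.
\begin{align*}
\sum_{i=1}^{m} a_i\, p_{h(i)}
  &= \sum_{i=1}^{m-1} a_i\, p_{\tilde{h}(i)}
     + a_k p_m + a_m p_{h(m)} - a_k p_{h(m)} \\
  &= \sum_{i=1}^{m-1} a_i\, p_{\tilde{h}(i)} + a_m p_m
     + (a_k - a_m)\bigl(p_m - p_{h(m)}\bigr) \\
  &\ge \sum_{i=1}^{m-1} a_i\, p_i + a_m p_m
   \;=\; \sum_{i=1}^{m} a_i\, p_i .
\end{align*}
% The second line adds and subtracts a_m p_m. The last step uses the
% induction hypothesis on \tilde{h}, together with a_k \ge a_m (k \le m)
% and p_m \ge p_{h(m)} (h(m) \le m).
```

When $h(m) = m$ the cross term vanishes and the bound reduces to the induction hypothesis applied to the restriction of $h$ to the first $m - 1$ indices.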

Optimal coding procedure
Here, we summarize the steps of the coding algorithm, optimized according to the criteria presented in the "Optimal SP selection and bit budget allocation" section.
Step I: Sequence Partitioning. The coding optimization algorithm is applied by first partitioning the overall sequence into subsequences of equal length N. Exactly one SP frame shall be introduced in each subsequence.
Step II: Innovation process RMSV estimation. According to the guidelines provided by Proposition 1, the RMSVs $\sigma_i$, $i = 0, \dots, N - 1$, of the innovation process of the N frames in each subsequence are estimated as the RMSV of the motion-compensation residuals and are sorted in ascending order. We observe that, during the coding process, the motion-compensation residuals are generated with respect to the decoded reference frame. Here, we instead estimate the RMSV of the motion-compensation residual with respect to the original reference frame. This design choice is well suited to streaming systems, since it allocates the primary SP frames at the same time index in all the encoded bitstreams, which enables streaming server rate adaptation by seamless switching among pre-encoded bitstreams.
Step III: Coding Mode Assignment. Once the RMSV has been evaluated, the SP coding mode is assigned to the frame with the minimum RMSV of the innovation process. The values for $K(c_i)$ can be directly derived from the rate-distortion curves at a typical distortion value, or assigned through an a priori criterion. In the specific case of only two possible coding modes (P or SP), it is sufficient to establish an ordering between $K(c_i = \mathrm{SP})$ and $K(c_i = \mathrm{P})$, according to the hypothesis of Proposition 1, regardless of their numerical values.
Step IV: Rate Evaluation. After the coding mode of each frame has been chosen, the preliminary assignment of the initial rates $r_i^0$ is performed, based on the assignment of the qualities $u_i^0$, $i = 1, \dots, N - 1$. Recent investigations on the theoretical and experimental rate-distortion performance of SP and P frames have highlighted that a given level of distortion is achieved at a higher rate for SP frames than for P frames [5]. Hence, to avoid initial quality fluctuations, a larger initial bit budget $r_i^0$ is assigned to the SP frame. The $r_i$, $i = 0, \dots, N - 1$, are then straightforwardly evaluated using (5).
Step V: Frame Coding. Once the bit budget per frame $r_i$ has been assigned, the subsequence is ready to be encoded. The optimal frame coding under an assigned bit budget per frame can be performed according to different rate control techniques. For instance, the optimal approach presented in [14] can be applied; according to this algorithm, the quantization parameter is chosen for each macroblock so as to achieve the fairest bit allocation among macroblocks that satisfies the bit budget constraint. If the whole frame is encoded with a single quantization parameter, the latter shall be chosen equal to the minimum value compatible with the assigned value $r_i$.

Endnotes
a Extension to the case where B frames are also considered is straightforward, provided the number of allowed B frames in the subsequence is fixed.
b For instance, if one and only one out of the N frames in each subsequence is allowed to be an SP frame and the other frames are set as P frames, then M = N and all the N-tuples c will exhibit the form $c = [\mathrm{P}\ \mathrm{P}\ \cdots\ \mathrm{SP}\ \cdots\ \mathrm{P}]$, thus differing from one another only in the location assigned to the SP frame.
c As in [14], here we set $g(\sigma_i) = \sigma_i^{\alpha}$ with $\alpha = 0.8$.
d As the optimal allocated bits evaluated according to (5) are real values, they must be quantized to provide an input to the encoder. For instance, the closest integer to $r_i^{(m)}$ can be considered as the assigned rate. The quantization loss introduced by this approximation corresponds to the effect of a single bit on the whole frame and is therefore negligible.
e Let us remark that, when the objective function is maximized, all the N frames exceed this limit. The minimum initial quality can instead be regarded as a parameter that determines the initial bit budget, which is assigned in an unbalanced way, and, by complement, the residual bit budget, which is assigned on a fair basis.
f In-network losses are neglected in this test.
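Steps I to III of the coding procedure (partitioning into subsequences of length N, RMSV estimation, and assignment of the SP mode to the frame with minimum RMSV) can be sketched as follows. This is a minimal illustration under assumptions: the function names are hypothetical, the motion-compensation residuals are taken as already computed (with respect to the original reference frame, as Step II prescribes) and supplied as flat lists of sample values, and the sequence length is assumed to be a multiple of N. Rate evaluation through (5) and the actual encoding are not covered.

```python
import math


def rmsv(residual):
    """Root mean square value of a flat sequence of residual samples."""
    return math.sqrt(sum(x * x for x in residual) / len(residual))


def assign_coding_modes(residuals, n):
    """Return one coding mode ("SP" or "P") per frame.

    residuals : per-frame motion-compensation residuals (flat sample lists)
    n         : subsequence length N (len(residuals) assumed a multiple of n)
    """
    modes = []
    for start in range(0, len(residuals), n):      # Step I: partitioning
        window = residuals[start:start + n]
        sigmas = [rmsv(r) for r in window]         # Step II: RMSV estimation
        # Step III: the SP mode goes to the frame with minimum RMSV
        # (the first such frame in case of ties).
        sp_index = min(range(len(sigmas)), key=sigmas.__getitem__)
        modes.extend("SP" if i == sp_index else "P" for i in range(len(window)))
    return modes
```

Because the residuals are measured against the original rather than the decoded reference, running the same selection for every target bit rate yields SP frames at the same time indices in all pre-encoded bitstreams.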