A static video summarization method based on the sparse coding of features and representativeness of frames

EURASIP Journal on Image and Video Processing

Table 1 Results on the UTE dataset, averaged over subjects

Method	Video name	N_RE	N_E	N_LK	N_K	Precision	Recall	F-measure
Uniform sampling	P01	6.3	9.5	7.6	12	0.636	0.711	0.656
	P02	7.8	10.2	8.9	15	0.594	0.778	0.668
	P03	4.5	8.3	5.9	12	0.492	0.545	0.513
	P04	6.4	9.0	7.6	16	0.477	0.718	0.565
	Avg.	6.2	9.3	7.5	13.8	0.550	0.688	0.600
Clustering-based [4]	P01	6.7	9.5	7.6	11	0.694	0.755	0.709
	P02	8.4	10.2	10.5	16	0.653	0.823	0.720
	P03	5.5	8.3	7.5	14	0.532	0.664	0.588
	P04	7.9	9.0	10.1	18	0.561	0.894	0.677
	Avg.	7.1	9.3	8.9	14.8	0.610	0.784	0.674
Attention-based [2]	P01	7.1	9.5	7.9	12	0.659	0.790	0.708
	P02	6.0	10.2	6.8	13	0.524	0.601	0.555
	P03	5.5	8.3	7.0	12	0.583	0.661	0.611
	P04	7.3	9.0	8.5	16	0.534	0.811	0.634
	Avg.	6.5	9.3	7.6	13.3	0.575	0.716	0.627
Object-driven [8]	P01	7.0	9.5	9.4	13	0.720	0.776	0.731
	P02	7.5	10.2	10.9	19	0.574	0.741	0.641
	P03	6.0	8.3	8.2	12	0.682	0.720	0.692
	P04	7.0	9.0	8.5	16	0.534	0.793	0.632
	Avg.	6.9	9.3	9.3	15.0	0.628	0.758	0.674
Proposed (w/o optimization)	P01	6.1	9.5	6.5	10	0.655	0.686	0.659
	P02	7.1	10.2	8.1	13	0.622	0.704	0.655
	P03	5.8	8.3	7.4	11	0.669	0.707	0.683
	P04	7.7	9.0	8.8	15	0.588	0.867	0.689
	Avg.	6.7	9.3	7.7	12.3	0.634	0.741	0.672
Proposed	P01	7.1	9.5	7.8	10	0.782	0.791	0.773
	P02	8.2	10.2	9.3	13	0.713	0.811	0.756
	P03	6.8	8.3	8.5	11	0.777	0.830	0.798
	P04	7.9	9.0	9.5	15	0.630	0.889	0.725
	Avg.	7.5	9.3	8.8	12.3	0.726	0.830	0.763
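
The "Avg." rows in Table 1 appear to be the arithmetic mean of the four per-video rows (P01 to P04) of each method. The following minimal sketch checks that reading for the "Proposed" method, using only values copied from the table. The helper names `column_means` and `matches_reported` and the tolerance of 0.001 (the table's rounding precision) are illustrative assumptions, not part of the original paper.

```python
from statistics import mean

# Per-video values copied from the "Proposed" rows of Table 1 (P01..P04).
proposed = {
    "precision": [0.782, 0.713, 0.777, 0.630],
    "recall": [0.791, 0.811, 0.830, 0.889],
    "f_measure": [0.773, 0.756, 0.798, 0.725],
}

# Values copied from the "Avg." row of the same method.
reported_avg = {"precision": 0.726, "recall": 0.830, "f_measure": 0.763}


def column_means(rows):
    """Arithmetic mean of each metric over the per-video rows."""
    return {name: mean(values) for name, values in rows.items()}


def matches_reported(computed, reported, tol=1e-3):
    """True if every computed mean agrees with the reported one within tol.

    tol is an assumption: the table reports three decimals, so
    differences below 0.001 are treated as rounding.
    """
    return all(abs(computed[key] - reported[key]) < tol for key in reported)


means = column_means(proposed)
consistent = matches_reported(means, reported_avg)
print(means, consistent)
```

The same check can be repeated for any other method by substituting its four per-video rows and its "Avg." row.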