Method | Video name | N_RE | N_E | N_LK | N_K | Precision | Recall | F-measure |
---|---|---|---|---|---|---|---|---|
Uniform sampling | P01 | 6.3 | 9.5 | 7.6 | 12 | 0.636 | 0.711 | 0.656 |
 | P02 | 7.8 | 10.2 | 8.9 | 15 | 0.594 | 0.778 | 0.668 |
 | P03 | 4.5 | 8.3 | 5.9 | 12 | 0.492 | 0.545 | 0.513 |
 | P04 | 6.4 | 9.0 | 7.6 | 16 | 0.477 | 0.718 | 0.565 |
 | Avg. | 6.2 | 9.3 | 7.5 | 13.8 | 0.550 | 0.688 | 0.600 |
Clustering-based [4] | P01 | 6.7 | 9.5 | 7.6 | 11 | 0.694 | 0.755 | 0.709 |
 | P02 | 8.4 | 10.2 | 10.5 | 16 | 0.653 | 0.823 | 0.720 |
 | P03 | 5.5 | 8.3 | 7.5 | 14 | 0.532 | 0.664 | 0.588 |
 | P04 | 7.9 | 9.0 | 10.1 | 18 | 0.561 | 0.894 | 0.677 |
 | Avg. | 7.1 | 9.3 | 8.9 | 14.8 | 0.610 | 0.784 | 0.674 |
Attention-based [2] | P01 | 7.1 | 9.5 | 7.9 | 12 | 0.659 | 0.790 | 0.708 |
 | P02 | 6.0 | 10.2 | 6.8 | 13 | 0.524 | 0.601 | 0.555 |
 | P03 | 5.5 | 8.3 | 7.0 | 12 | 0.583 | 0.661 | 0.611 |
 | P04 | 7.3 | 9.0 | 8.5 | 16 | 0.534 | 0.811 | 0.634 |
 | Avg. | 6.5 | 9.3 | 7.6 | 13.3 | 0.575 | 0.716 | 0.627 |
Object-driven [8] | P01 | 7.0 | 9.5 | 9.4 | 13 | 0.720 | 0.776 | 0.731 |
 | P02 | 7.5 | 10.2 | 10.9 | 19 | 0.574 | 0.741 | 0.641 |
 | P03 | 6.0 | 8.3 | 8.2 | 12 | 0.682 | 0.720 | 0.692 |
 | P04 | 7.0 | 9.0 | 8.5 | 16 | 0.534 | 0.793 | 0.632 |
 | Avg. | 6.9 | 9.3 | 9.3 | 15.0 | 0.628 | 0.758 | 0.674 |
Proposed (w/o optimization) | P01 | 6.1 | 9.5 | 6.5 | 10 | 0.655 | 0.686 | 0.659 |
 | P02 | 7.1 | 10.2 | 8.1 | 13 | 0.622 | 0.704 | 0.655 |
 | P03 | 5.8 | 8.3 | 7.4 | 11 | 0.669 | 0.707 | 0.683 |
 | P04 | 7.7 | 9.0 | 8.8 | 15 | 0.588 | 0.867 | 0.689 |
 | Avg. | 6.7 | 9.3 | 7.7 | 12.3 | 0.634 | 0.741 | 0.672 |
Proposed | P01 | 7.1 | 9.5 | 7.8 | 10 | 0.782 | 0.791 | 0.773 |
 | P02 | 8.2 | 10.2 | 9.3 | 13 | 0.713 | 0.811 | 0.756 |
 | P03 | 6.8 | 8.3 | 8.5 | 11 | 0.777 | 0.830 | 0.798 |
 | P04 | 7.9 | 9.0 | 9.5 | 15 | 0.630 | 0.889 | 0.725 |
 | Avg. | 7.5 | 9.3 | 8.8 | 12.3 | 0.726 | 0.830 | 0.763 |
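
The Avg. rows in the table are consistent with an unweighted arithmetic mean over the four videos (P01 to P04), computed column by column. The following minimal sketch checks this for the "Proposed" rows only; the variable and function names (`proposed`, `reported_avg`, `column_means`) are illustrative and do not come from the original work.

```python
from statistics import mean

# Per-video values copied from the "Proposed" rows (P01..P04) of the table.
proposed = {
    "precision": [0.782, 0.713, 0.777, 0.630],
    "recall": [0.791, 0.811, 0.830, 0.889],
    "f_measure": [0.773, 0.756, 0.798, 0.725],
}

# Values copied from the "Proposed" Avg. row of the table.
reported_avg = {"precision": 0.726, "recall": 0.830, "f_measure": 0.763}


def column_means(rows):
    """Return the unweighted mean of each metric across videos."""
    return {name: mean(values) for name, values in rows.items()}


averages = column_means(proposed)
for name, value in averages.items():
    print(f"{name}: computed {value:.4f}, reported {reported_avg[name]:.3f}")
```

Note that the averaged F-measure is the mean of the per-video F-measures, so it need not equal the harmonic mean of the averaged precision and recall.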