
Table 4 Run time and performance comparison on UCF101-24, measured on a single NVIDIA RTX8000 card with 16-frame video clips

From: Weakly supervised spatial–temporal attention network driven by tracking and consistency loss for action detection

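| Method   | Speed (fps) | Frame-mAP (%) |
|----------|-------------|---------------|
| P3D-CTN  | 28          | –             |
| I3D      | 30          | 77.7          |
| 3C-Net   | 45          | 84.4          |
| HAM-Net  | 29          | 92.1          |
| YOWO+LFB | 38          | 86.4          |
| Ours     | 31          | 94.8          |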

  1. For our method, ResNeXt-50 and ResNeXt-34 are used as its two 3D-CNN backbones.
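As an illustration only (this analysis is not part of the original paper), the speed/accuracy trade-off in Table 4 can be checked with a short script. It copies the reported numbers, converts each fps figure into an average time per frame, and lists the methods that no other method beats on both speed and Frame-mAP. P3D-CTN is skipped in that check because its Frame-mAP is not given in this table. The names `Result`, `ms_per_frame` and `pareto_front` are hypothetical helpers introduced for this sketch.

```python
from typing import List, NamedTuple, Optional


class Result(NamedTuple):
    method: str
    fps: float                   # reported speed in frames per second
    frame_map: Optional[float]   # reported Frame-mAP (%); None if not reported


# Values copied from Table 4 (UCF101-24, single RTX8000, 16-frame clips).
RESULTS = [
    Result("P3D-CTN", 28, None),
    Result("I3D", 30, 77.7),
    Result("3C-Net", 45, 84.4),
    Result("HAM-Net", 29, 92.1),
    Result("YOWO+LFB", 38, 86.4),
    Result("Ours", 31, 94.8),
]


def ms_per_frame(fps: float) -> float:
    """Average processing time per frame implied by a throughput figure."""
    return 1000.0 / fps


def pareto_front(results: List[Result]) -> List[str]:
    """Return methods that no other method matches or beats on both speed
    and Frame-mAP while strictly beating them on at least one.
    Rows without a Frame-mAP value are left out of the comparison."""
    scored = [r for r in results if r.frame_map is not None]
    front = []
    for r in scored:
        dominated = any(
            o.fps >= r.fps
            and o.frame_map >= r.frame_map
            and (o.fps > r.fps or o.frame_map > r.frame_map)
            for o in scored
        )
        if not dominated:
            front.append(r.method)
    return front


if __name__ == "__main__":
    for r in RESULTS:
        print(f"{r.method:10s} {ms_per_frame(r.fps):6.1f} ms/frame")
    print("Pareto front:", pareto_front(RESULTS))
    # → Pareto front: ['3C-Net', 'YOWO+LFB', 'Ours']
```

Read this way, I3D and HAM-Net are dominated by the proposed method (it is both faster and more accurate than either), while 3C-Net and YOWO+LFB remain on the front by trading Frame-mAP for speed. Converting 31 fps gives roughly 32.3 ms per frame on average; note that fps here is a throughput figure, so it does not by itself give the latency of a single 16-frame clip.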