EURASIP Journal on Image and Video Processing

Table 4 Run-time and performance comparison on the UCF101-24 dataset, measured on a single NVIDIA RTX 8000 card with 16-frame video clips

From: Weakly supervised spatial–temporal attention network driven by tracking and consistency loss for action detection

Method	Speed (fps)	Frame-mAP (%)
P3D-CTN	28	–
I3D	30	77.7
3C-Net	45	84.4
HAM-Net	29	92.1
YOWO+LFB	38	86.4
Ours	31	94.8

For our method, ResNeXt-50 and ResNeXt-34 are used as the two 3D-CNN backbones.
