EURASIP Journal on Image and Video Processing

Table 3 Action classification experiment

From: Weakly supervised spatial–temporal attention network driven by tracking and consistency loss for action detection

Method	Supervision mode	J-HMDB-21 (frame-mAP, %)	UCF101-24 (frame-mAP, %)
T-CNN [12]	Full	61.3	41.4
ACT [20]	Full	65.7	69.5
STEP [46]	Full	–	75.0
P3D-CTN [15]	Full	71.1	–
I3D [47]	Full	73.3	77.7
ACRN [24]	Full	77.9	80.4
YOWO+LFB [24]	Full	75.7	87.3
3C-Net [34]	Weak	77.9	86.4
HAM-Net [39]	Weak	88.1	92.1
Ours	Weak	90.2	94.8

The table reports frame-mAP (IoU = 0.5, 16-frame clips) for the proposed method and for recent fully and weakly supervised methods. Note that the proposed method is an attention network trained with classification supervision but without object-location supervision.
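As a rough illustration of the IoU = 0.5 criterion used for frame-mAP, the following Python sketch computes the intersection over union of two boxes and checks whether a single frame-level detection would count as a true positive. The function names, the (x1, y1, x2, y2) box format, and the use of `>=` at the threshold are assumptions made for this example; they are not taken from the article or from any benchmark's evaluation code.

```python
def box_iou(a, b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format (assumed)."""
    # Corners of the overlap rectangle.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height are clamped to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, pred_label, gt_box, gt_label, iou_threshold=0.5):
    """Hypothetical helper: a detection matches a ground-truth box when the
    action label agrees and the IoU reaches the threshold (assumed inclusive)."""
    return pred_label == gt_label and box_iou(pred_box, gt_box) >= iou_threshold


if __name__ == "__main__":
    gt = (0, 0, 4, 4)
    print(box_iou((0, 0, 4, 3), gt))                          # overlap of a shorter box
    print(is_true_positive((0, 0, 4, 3), "run", gt, "run"))   # same label, high overlap
    print(is_true_positive((2, 0, 6, 4), "run", gt, "run"))   # same label, low overlap
```

Frame-mAP then averages, over action classes, the average precision obtained by ranking such per-frame detections by confidence; the exact interpolation used for average precision varies between evaluation protocols and is not specified here.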
