EURASIP Journal on Image and Video Processing

Table 1 We used 80% of the UCF101-24 dataset for training and 20% for validation

From: Weakly supervised spatial–temporal attention network driven by tracking and consistency loss for action detection

Domain	Mode	20%	30%	50%	70%	100%
Source	Full	80.7	86.8	95.4	96.5	96.7
Target	Weak	93.3	94.9	96.1	96.3	-

The table reports Frame-mAP. In the source domain, our model is trained with full supervision on 20–100% of the training data. In the target domain, the network starts from the pre-trained source model and is trained on the remaining data, which has only classification annotations.
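
The split described in this note can be sketched as follows. This is a minimal illustration only: the function name `split_source_target`, its parameters, and the use of a seeded random shuffle are assumptions, since the table does not state how the fully annotated subset was selected.

```python
import random


def split_source_target(video_ids, source_fraction, seed=0):
    """Split training videos into a source subset (full annotations)
    and a target subset (classification labels only).

    Hypothetical helper: random selection is an assumption, not the
    procedure confirmed by the article.
    """
    if not 0.0 < source_fraction <= 1.0:
        raise ValueError("source_fraction must be in (0, 1]")
    ids = sorted(video_ids)            # fixed starting order for reproducibility
    random.Random(seed).shuffle(ids)   # seeded shuffle, independent of global state
    n_source = round(len(ids) * source_fraction)
    source = ids[:n_source]            # used with box and class annotations
    target = ids[n_source:]            # used with class labels only
    return source, target
```

With `source_fraction=1.0` the target subset is empty, which corresponds to the column where the table has no target-domain entry.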
