Table 1 We used 80% of the UCF101-24 dataset for training and 20% for validation

From: Weakly supervised spatial–temporal attention network driven by tracking and consistency loss for action detection

Domain   Mode   20%    30%    50%    70%    100%
Source   Full   80.7   86.8   95.4   96.5   96.7
Target   Weak   93.3   94.9   96.1   96.3   -

  1. The table reports Frame-mAP. In the source domain, our model is trained with full supervision on 20–100% of the training data. In the target domain, the network is initialized from that pre-trained model and trained on the remaining data, which has only classification annotations.
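
The split described in the note can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function name `split_source_target`, the random shuffling, the seed, and the placeholder video IDs are all assumptions made for the example.

```python
import random


def split_source_target(video_ids, source_fraction, seed=0):
    """Split training videos into a source subset (full annotations)
    and a target subset (classification labels only).

    Hypothetical helper: the random shuffle and seed are assumptions,
    not details taken from the paper.
    """
    if not 0.0 < source_fraction <= 1.0:
        raise ValueError("source_fraction must be in (0, 1]")
    ids = sorted(video_ids)
    random.Random(seed).shuffle(ids)  # deterministic shuffle for a given seed
    n_source = round(len(ids) * source_fraction)
    # The first part gets full supervision; the remainder is used weakly.
    return ids[:n_source], ids[n_source:]


# Placeholder IDs standing in for the training portion of the dataset.
train_ids = [f"video_{i:03d}" for i in range(100)]
for fraction in (0.2, 0.3, 0.5, 0.7, 1.0):
    source, target = split_source_target(train_ids, fraction)
    print(f"{fraction:.0%}: {len(source)} source (full), {len(target)} target (weak)")
```

At a source fraction of 100% the target subset is empty, which is consistent with the "-" entry in the Target row.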