Fig. 2 | EURASIP Journal on Image and Video Processing

From: Weakly supervised spatial–temporal attention network driven by tracking and consistency loss for action detection

Overall framework. In the source domain, the network is trained on the first dataset, which has both location and classification annotations. In the target domain, the network is initialized from the pre-trained model and trained on the second dataset, which has only temporal action-classification annotations. To ensure continuity of the target across the video sequence, a tracking-regularization loss is computed by a tracker as the discrepancy between the tracked location and the network's predicted location. The neighbor-consistency loss pulls the features of objects in neighboring frames of the video closer together
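The two regularizers described in the caption can be sketched in code. This is a minimal illustrative sketch, not the authors' implementation: it assumes the tracking-regularization loss is a mean squared distance between the tracker's box and the network's predicted box, and the neighbor-consistency loss is a mean squared distance between per-frame object features of adjacent frames. The function names and exact distance choices are assumptions for illustration.

```python
import numpy as np

def tracking_regularization_loss(pred_boxes, track_boxes):
    """Hypothetical sketch: penalize disagreement between the network's
    predicted boxes and the tracker's boxes, per frame.

    pred_boxes, track_boxes: arrays of shape (T, 4) with [x1, y1, x2, y2].
    """
    # Mean squared coordinate difference across all frames and box corners
    return float(np.mean((pred_boxes - track_boxes) ** 2))

def neighbor_consistency_loss(features):
    """Hypothetical sketch: encourage object features of neighboring
    frames to stay close, enforcing temporal smoothness.

    features: array of shape (T, D), one feature vector per frame.
    """
    # Squared L2 distance between each pair of adjacent frames, averaged
    diffs = features[1:] - features[:-1]
    return float(np.mean(np.sum(diffs ** 2, axis=1)))
```

For example, identical predicted and tracked boxes give zero tracking loss, and constant features across frames give zero consistency loss; any drift between neighbors increases the penalty.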
