Fig. 1
From: Weakly supervised spatial–temporal attention network driven by tracking and consistency loss for action detection
The source domain has accurate person location and action category labels in the frames, while the target domain has only inaccurate temporal position labels for actions in the video.