Fig. 5 | EURASIP Journal on Image and Video Processing

Fig. 5

From: Weakly supervised spatial–temporal attention network driven by tracking and consistency loss for action detection

The activation heat-maps are taken from the tensors just before the channel fusion network. The Jump example shows that HAM–NET’s attention mechanism is more likely to be disturbed by sudden or rapid object movements, such as moving clouds and crowds of people, because it draws on the optical flow of the video frames. The Walking-With-Dog example shows that HAM–NET is more likely to ignore important parts of an action, such as the presence of the dog, when the training data set contains a series of similar actions, such as Skiing, Ice-Dancing, and Long-Jump. Our method shows a higher level of robustness.
