Gated spatio and temporal convolutional neural network for activity recognition: towards gated multimodal deep learning

EURASIP Journal on Image and Video Processing

Table 11 HMDB-51 (split 1)

Methods	Accuracy
Spatial streams (three-channel RGB)	36%
Motion streams (three flow fields)	43%
Averaging (model A)	47.5%
Gating network (model C)	48%
Temporal segment network (averaging) [23]	69.93%
Our gating network (model C) + expert networks from the temporal segment network [23]	70%
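
The rows above contrast averaging the stream outputs with weighting them through a gating network. The sketch below illustrates that difference on class-probability vectors from a spatial and a motion stream. It is an assumption-laden illustration and not the paper's implementation: the function names, the three-class example, and the fixed `gate_logits` are hypothetical, whereas a real gating network would be trained and would produce its weights from the input.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def average_fusion(spatial, motion):
    """Fuse two class-probability vectors with equal weights."""
    return [(s + m) / 2 for s, m in zip(spatial, motion)]


def gated_fusion(spatial, motion, gate_logits):
    """Fuse two class-probability vectors with softmax-normalised gate weights.

    gate_logits is a hypothetical stand-in for the output of a gating
    network: one logit per stream (spatial first, motion second).
    """
    w_spatial, w_motion = softmax(gate_logits)
    return [w_spatial * s + w_motion * m for s, m in zip(spatial, motion)]


# Hypothetical per-stream predictions over three classes.
spatial_probs = [0.6, 0.3, 0.1]
motion_probs = [0.2, 0.7, 0.1]

avg = average_fusion(spatial_probs, motion_probs)

# Equal gate logits reduce gated fusion to plain averaging.
gated_equal = gated_fusion(spatial_probs, motion_probs, [0.0, 0.0])

# A gate that favours the spatial stream shifts the fused prediction.
gated_spatial = gated_fusion(spatial_probs, motion_probs, [2.0, 0.0])
```

With equal gate logits the two schemes coincide, so averaging can be read as the special case of gating in which every stream always receives the same weight.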