Gated spatio and temporal convolutional neural network for activity recognition: towards gated multimodal deep learning

EURASIP Journal on Image and Video Processing

Table 10 Accuracy on UCF-101 (split 1)

Methods	Accuracy
Spatial streams (three-channel RGB)	72.7%
Motion streams (three flow fields)	76.5%
SVM Fusion (model B)	81.5%
Averaging (model A)	82.7%
Gating network (model C) VGG-16	83%
Gating network (model C) ResNet-50	88.5%
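
The difference between the "Averaging (model A)" and "Gating network (model C)" rows can be illustrated with a minimal late-fusion sketch. It is not the authors' implementation: the function names, the scalar `gate` value, and the example logits are all hypothetical, and the gate is assumed here to be a single weight in [0, 1] that forms a convex combination of the two streams' class probabilities. In a real gating network that weight would be predicted per sample by a learned model rather than passed in by hand.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_fusion(spatial_probs, motion_probs):
    """Late fusion by giving both streams equal weight."""
    return [(s + t) / 2.0 for s, t in zip(spatial_probs, motion_probs)]


def gated_fusion(spatial_probs, motion_probs, gate):
    """Late fusion with a per-sample weight.

    Assumption: `gate` is the weight on the spatial stream and
    (1 - gate) is the weight on the motion stream.
    """
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    return [gate * s + (1.0 - gate) * t
            for s, t in zip(spatial_probs, motion_probs)]


def predict(probs):
    """Return the index of the most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical logits for one clip over three classes.
spatial = softmax([2.0, 1.0, 0.1])   # spatial (RGB) stream favours class 0
motion = softmax([0.5, 2.5, 0.3])    # motion (flow) stream favours class 1

averaged = average_fusion(spatial, motion)
gated = gated_fusion(spatial, motion, gate=0.9)  # hand-picked gate value

print("averaging predicts class", predict(averaged))
print("gating predicts class", predict(gated))
```

With a gate of 0.5 the gated combination reduces to plain averaging, so under this assumption averaging is the special case in which the weight never adapts to the input.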