
Table 3 State-of-the-art performance comparison on UCF101

From: Improved two-stream model for human action recognition

 

| Method | Pre-trained | CNN backbone | UCF-101 (%) |
|---|---|---|---|
| Two-stream CNN [9] | ImageNet | VGG16 | 88.7 |
| Conv + LSTM [8] | ImageNet | AlexNet | 69.1 |
| C3D [18] | ImageNet | VGG11 | 82.3 |
| RGB-I3D [19] | ImageNet | Inception v1 | 84.5 |
| TSN [20] | ImageNet | Inception v2 | 86.4 |
| 3D Hybrid Model [21] | 2D CNN | C3D | 89.4 |
| Two-stream model (proposed model) | ImageNet | DenseNet | 92.5 |

   
  1. The reported accuracy is the average over all three splits of the dataset.