Fig. 1
From: Consistent constraint-based video-level learning for action recognition
Comparison of the video-level learning framework and the clip-level learning framework. \(V_{i}\) is the \(i\)th video, and \(L_{i}\) is its action label. \(C^{i}_{j}\) is the \(j\)th clip in video \(i\), and \(m_{i}\) is the number of clips in video \(i\).