Fig. 3
From: Time-dependent bag of words on manifolds for geodesic-based classification of video activities towards assisted living and healthcare

Illustration of major steps in the proposed method. Notations and notes:
∙ \(I_{t}\) is the t-th frame of an input video, and L is the total number of frames
∙ "∘" marks the key points (head, hands, waist center, midpoint of feet), and the areas with dotted edges are local patches centered at the hands
∙ C is the frame-based covariance feature (a point on the manifold of symmetric positive definite (SPD) matrices \(Sym_{+}^{d}\)) extracted from the local patches and key points in \(I_{t}\)
∙ The codebook for the BoW+T model is generated by clustering covariance matrices on \(Sym_{+}^{d}\)
∙ The video is encoded by the BoW+T model as a time series of manifold points on a unit n-sphere \(\mathcal{S}^{n}\) and then classified by a kernel machine based on geodesic distance on that sphere
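
Two ingredients named in this caption can be sketched in a few lines: a frame-level covariance feature that lies on \(Sym_{+}^{d}\), and the geodesic (great-circle) distance between two points on a unit sphere. The following is a minimal Python sketch assuming NumPy. The function names `covariance_descriptor` and `sphere_geodesic`, the layout of the input array, and the small `eps` regularizer are illustrative assumptions, not the authors' implementation. The caption does not specify how the raw per-frame features are built or how the BoW+T encoding maps a video onto the sphere, so those steps are not shown.

```python
import numpy as np


def covariance_descriptor(features, eps=1e-6):
    """Return a d x d covariance matrix for one frame.

    `features` is assumed to be a (num_samples, d) array of raw feature
    vectors collected from the local patches and key points of a frame.
    A small multiple of the identity is added (an assumption made here)
    so that the result is strictly positive definite.
    """
    features = np.asarray(features, dtype=float)
    cov = np.atleast_2d(np.cov(features, rowvar=False))
    return cov + eps * np.eye(cov.shape[0])


def sphere_geodesic(x, y):
    """Great-circle distance between two points on a unit sphere.

    Inputs are normalized to unit length first; the dot product is
    clipped to [-1, 1] to guard against round-off before arccos.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x / np.linalg.norm(x)
    y = y / np.linalg.norm(y)
    return float(np.arccos(np.clip(np.dot(x, y), -1.0, 1.0)))
```

A kernel machine of the kind the caption mentions would consume distances such as `sphere_geodesic(x, y)` between encoded points. The specific kernel is not stated in the caption and is left open here.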