Fig. 3 | EURASIP Journal on Image and Video Processing

From: Time-dependent bag of words on manifolds for geodesic-based classification of video activities towards assisted living and healthcare

Fig. 3

Illustration of major steps in the proposed method. Notations and notes:
∙ \(I_t\) is the \(t\)-th frame of an input video, and \(L\) is the total number of frames
∙ “ ” are key points (head, hands, waist center, midpoint of feet), and the areas with dotted edges are local patches centered at hands
∙ \(C\) is the frame-based covariance feature (as a point on the manifold of SPD matrices \(Sym_{+}^{d}\)) extracted from local patches and key points in \(I_t\)
∙ The codebook for BoW+T model is generated by clustering covariance matrices on \(Sym_{+}^{d}\)
∙ The video is encoded by the BoW+T model as a time series of manifold points on a unit \(n\)-sphere \(\mathcal{S}^{n}\) and then classified by a kernel machine based on geodesic distance on that sphere
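Two of the geometric ingredients named in the caption can be sketched in a few lines of Python: a covariance descriptor that is a point on \(Sym_{+}^{d}\), and the geodesic (great-circle) distance between two points on a unit sphere. This is only an illustrative sketch, not the authors' implementation. The function names, the regularization constant `eps`, and the Gaussian form of the kernel with its `gamma` parameter are all assumptions, since the caption does not specify them.

```python
import numpy as np

def covariance_descriptor(features, eps=1e-6):
    """Return a d x d covariance matrix from an (N, d) array of feature vectors.

    A small multiple of the identity (eps, an assumed value) is added so the
    result is strictly positive definite, i.e. a point on Sym_+^d.
    """
    X = np.asarray(features, dtype=float)
    C = np.cov(X, rowvar=False)
    return C + eps * np.eye(X.shape[1])

def sphere_geodesic(u, v):
    """Great-circle distance between two vectors projected onto the unit sphere."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    # Clip guards against round-off pushing the dot product outside [-1, 1].
    return float(np.arccos(np.clip(np.dot(u, v), -1.0, 1.0)))

def geodesic_kernel(u, v, gamma=1.0):
    """Gaussian-form similarity built on the sphere geodesic (assumed form)."""
    return float(np.exp(-gamma * sphere_geodesic(u, v) ** 2))
```

In this sketch, `covariance_descriptor` would be applied per frame to features gathered from the local patches and key points, while `sphere_geodesic` compares two encoded points on \(\mathcal{S}^{n}\). The codebook clustering on \(Sym_{+}^{d}\) and the BoW+T encoding itself are not shown.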
