Table 7 Statistics for the COGNIMUSE database (Hollywood movies and GWW) annotated with audio-visual events, per event category. Only subcategories whose annotated instances exceed 20 min in total duration are shown.

From: COGNIMUSE: a multimodal video database annotated with saliency, events, semantics and emotion with application to summarization

Most frequent audio and visual events

| Category/subcategory | Instances | Duration (min) |
|---|---:|---:|
| Voice: speech male | 1874 | 102.39 |
| Voice: speech female | 1048 | 55.55 |
| Voice: crowd noise | 188 | 42.68 |
| Sound source: background music | 350 | 158.20 |
| Sound source: foreground music | 290 | 68.71 |
| Genre: symphonic | 119 | 118.61 |
| Genre: other genre | 70 | 42.14 |
| Instrument: string | 32 | 23.29 |
| Instrument: percussion | 102 | 91.86 |
| Instrument: mixed | 16 | 22.71 |
| General facial actions: talk | 1915 | 114.67 |
| General body mov.: walk | 456 | 41.72 |
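Because the table reports both an instance count and a total duration, it is possible to derive a rough average length per annotated instance (total minutes × 60 / instances). The sketch below does this. It is an illustration only: the per-instance averages are not reported in the original table, and the names `ROWS`, `MIN_TOTAL_MINUTES`, and `summarize` are hypothetical helpers introduced here.

```python
# Hypothetical helper: rows transcribed from Table 7 as
# (subcategory, number of instances, total duration in minutes).
ROWS = [
    ("Voice: speech male", 1874, 102.39),
    ("Voice: speech female", 1048, 55.55),
    ("Voice: crowd noise", 188, 42.68),
    ("Sound source: background music", 350, 158.20),
    ("Sound source: foreground music", 290, 68.71),
    ("Genre: symphonic", 119, 118.61),
    ("Genre: other genre", 70, 42.14),
    ("Instrument: string", 32, 23.29),
    ("Instrument: percussion", 102, 91.86),
    ("Instrument: mixed", 16, 22.71),
    ("General facial actions: talk", 1915, 114.67),
    ("General body mov.: walk", 456, 41.72),
]

# Inclusion threshold stated in the table caption (total duration, minutes).
MIN_TOTAL_MINUTES = 20.0


def mean_seconds_per_instance(instances: int, minutes: float) -> float:
    """Average length of one annotated instance, in seconds."""
    return minutes * 60.0 / instances


def summarize(rows):
    """Map each subcategory above the threshold to its mean seconds per instance."""
    return {
        name: mean_seconds_per_instance(count, minutes)
        for name, count, minutes in rows
        if minutes > MIN_TOTAL_MINUTES
    }


if __name__ == "__main__":
    # Print subcategories from longest to shortest average instance.
    for name, secs in sorted(summarize(ROWS).items(), key=lambda kv: -kv[1]):
        print(f"{name:32s} {secs:7.2f} s")
```

Every listed row passes the 20 min filter, which is consistent with the caption. The derived averages suggest that speech and talking annotations consist of many short segments, while music annotations (for example, symphonic genre) span much longer stretches.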