Table 6 Statistics for COGNIMUSE database (Hollywood movies and GWW) annotated with audio-visual events per event category

From: COGNIMUSE: a multimodal video database annotated with saliency, events, semantics and emotion with application to summarization

Audio events

No. of instances: 6262, total duration in hours: 19.24

| Category/Subcategory 1 | Instances | Dur. (min) |
|---|---|---|
| Voice | 3809 | 245.75 |
| Movement | 228 | 19.82 |
| Elements | 154 | 16.91 |
| Animals | 222 | 20.26 |
| Plants | 0 | 0.00 |
| Construction | 46 | 5.19 |
| Ventilation | 4 | 0.54 |
| Non-motorized trans. | 18 | 0.84 |
| Social signals | 444 | 15.66 |
| Motorized trans. | 48 | 3.86 |
| Non-amp. music | 12 | 5.16 |
| Amplified | 218 | 213.28 |
| Sound Source | 640 | 226.91 |
| Genre | 231 | 222.86 |
| Instrument | 200 | 162.80 |

Visual actions

No. of instances: 4847, total duration in hours: 4.58

| Category | Instances | Dur. (min) |
|---|---|---|
| General facial actions | 2233 | 129.67 |
| Facial action with obj. manip. | 90 | 4.08 |
| General body mov. | 1215 | 79.75 |
| Gestures | 284 | 9.09 |
| Body mov. with object inter. | 693 | 33.72 |
| Body mov. for human inter. | 332 | 18.79 |
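
As a quick consistency check, the visual-action rows listed above can be summed and compared with the stated totals for that block (4847 instances, 4.58 hours). The sketch below is illustrative only: the names `VISUAL_ACTIONS` and `summarize` are hypothetical and are not part of the COGNIMUSE database or its tooling. The values are copied from the table.

```python
# Visual-action rows copied from the table above:
# (category, number of instances, duration in minutes).
# VISUAL_ACTIONS and summarize are hypothetical names used for illustration.
VISUAL_ACTIONS = [
    ("General facial actions", 2233, 129.67),
    ("Facial action with obj. manip.", 90, 4.08),
    ("General body mov.", 1215, 79.75),
    ("Gestures", 284, 9.09),
    ("Body mov. with object inter.", 693, 33.72),
    ("Body mov. for human inter.", 332, 18.79),
]


def summarize(rows):
    """Return (total instances, total duration in hours) for a list of rows."""
    total_instances = sum(count for _, count, _ in rows)
    total_minutes = sum(minutes for _, _, minutes in rows)
    return total_instances, total_minutes / 60.0  # convert minutes to hours


instances, hours = summarize(VISUAL_ACTIONS)
print(f"instances: {instances}, duration: {hours:.3f} h")
```

Under these assumptions, the instance count sums to the stated 4847 and the summed duration agrees with the stated 4.58 hours to within rounding.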