Table 1 Annotation schemes included in the COGNIMUSE database, with a brief description of each layer/category

From: COGNIMUSE: a multimodal video database annotated with saliency, events, semantics and emotion with application to summarization

| Annotation scheme | Annotated content | Layers/categories | Annotation description |
|---|---|---|---|
| Saliency, i.e., video elements that captured the viewer's attention instantaneously or in segments | Total: ca. 7 h, including seven Hollywood movies (ca. 30 min each), one full-length movie (ca. 100 min), and five travel-series episodes (ca. 25 min each) | Audio | Acoustically interesting segments, e.g., abrupt or loud sounds |
| | | Visual | Visually interesting segments, e.g., motion or color variations |
| | | Audio-visual | Audio-visually interesting segments, e.g., an explosion (which involves both visual and acoustic saliency) |
| | | Semantics | Segments that are conceptually important as stand-alone semantic events, e.g., names, plot elements, or facial expressions |
| | | Informative Segments | Segments important for understanding the plot of the specific video clip; also considered a manually generated summary |
| | | Expert Summaries | Summaries created by an "expert" professionally involved in film production |
| Audio Events | Total: ca. 5 h, including seven Hollywood movies and one full-length movie | Human | Events involving various human, nature, or mechanical sounds and music, e.g., voice, movement, or animal sounds (see Table 4 for details) |
| | | Nature | |
| | | Mechanical | |
| | | Music | |
| Visual Actions | | Facial actions | General facial actions or body movements, including object manipulation or interaction, e.g., talking, smiling, sitting down or standing up (see Table 5 for details) |
| | | Body movements | |
| | | Gestures | |
| Cross-media semantics | Total: ca. 100 min, one full-length movie | Equivalence | Interaction relations between different modalities, i.e., images, language, body movements, or acoustic events |
| | | Complementarity | |
| | | Independence | |
| Emotion | Total: ca. 3.5 h, seven Hollywood movies | Arousal | Arousal corresponds to the viewer's excitement; valence describes the emotional evaluation from negative to positive |
| | | Valence | |
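As a rough consistency check on the "ca." totals in Table 1, the stated per-item durations can be summed. The following is a minimal Python sketch. The variable and function names are hypothetical illustrations, not part of the COGNIMUSE release. It uses only the approximate durations given in the table, and it leaves out Visual Actions because that row's content cell is empty in this table.

```python
# Hypothetical sketch: sums the approximate durations stated in Table 1.
# Each entry is (number of items, approximate minutes per item).
HOLLYWOOD_MOVIES = (7, 30)    # seven Hollywood movies, ca. 30 min each
FULL_LENGTH_MOVIE = (1, 100)  # one full-length movie, ca. 100 min
TRAVEL_EPISODES = (5, 25)     # five travel-series episodes, ca. 25 min each

# Annotated content per scheme, transcribed from Table 1.
# Visual Actions is omitted: its content cell is empty in this table.
CONTENT = {
    "Saliency": [HOLLYWOOD_MOVIES, FULL_LENGTH_MOVIE, TRAVEL_EPISODES],
    "Audio Events": [HOLLYWOOD_MOVIES, FULL_LENGTH_MOVIE],
    "Cross-media semantics": [FULL_LENGTH_MOVIE],
    "Emotion": [HOLLYWOOD_MOVIES],
}


def total_minutes(groups):
    """Return the sum of count * minutes_each over all item groups."""
    return sum(count * minutes for count, minutes in groups)


if __name__ == "__main__":
    for scheme, groups in CONTENT.items():
        minutes = total_minutes(groups)
        # e.g. Saliency: 435 min (~7.25 h), consistent with "ca. 7 h"
        print(f"{scheme}: {minutes} min (~{minutes / 60:.2f} h)")
```

The computed values are 435 min (about 7.25 h) for Saliency, 310 min (about 5.2 h) for Audio Events, 100 min for Cross-media semantics, and 210 min (3.5 h) for Emotion. These agree with the rounded totals listed in the table.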