COGNIMUSE: a multimodal video database annotated with saliency, events, semantics and emotion with application to summarization

Zlatintsi, Athanasia; Koutras, Petros; Evangelopoulos, Georgios; Malandrakis, Nikolaos; Efthymiou, Niki; Pastra, Katerina; Potamianos, Alexandros; Maragos, Petros

doi:10.1186/s13640-017-0194-1

EURASIP Journal on Image and Video Processing

Table 1 Annotation schemes included in the COGNIMUSE database, with a brief description of each layer/category


Annotation Scheme	Annotated Content	Layers/Categories	Annotation Description
Saliency, i.e., video elements that captured the viewer’s attention, either instantaneously or over segments	Total: ca. 7 h (seven Hollywood movies, ca. 30 min each; one full-length movie, ca. 100 min; five travel series, ca. 25 min each)	Audio	Acoustically interesting segments, e.g., abrupt or loud sounds.
		Visual	Visually interesting segments, e.g., motion or color variations.
		Audio-visual	Audio-visually interesting segments, e.g., an explosion, which is both visually and acoustically salient.
		Semantics	Segments that are conceptually important as stand-alone semantic events, e.g., names, plot elements, facial expressions.
		Informative Segments	Segments important for understanding the plot of the specific video clip; these also serve as a manually generated summary.
		Expert Summaries	Summaries created by an “expert”, i.e., someone professionally involved in film production.
Audio Events	Total: ca. 5 h (seven Hollywood movies and one full-length movie)	Human	Events involving human, nature or mechanical sounds, and music, e.g., voice, movement, animal sounds. For more information, see Table 4.
		Nature
		Mechanical
		Music
Visual Actions		Facial actions	General facial actions or body movements, including object manipulation or interaction, e.g., talking, smiling, sitting down/standing up. For more information, see Table 5.
		Body movements
		Gestures
Cross-media semantics	Total: ca. 100 min (one full-length movie)	Equivalence	Interaction relations between different modalities, e.g., images, language, body movements or acoustic events.
		Complementarity
		Independence
Emotion	Total: ca. 3.5 h (seven Hollywood movies)	Arousal	The viewer’s level of excitement.
		Valence	The viewer’s emotional evaluation, from negative to positive.
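
The approximate totals in Table 1 follow from the per-item durations. The Python sketch below is a minimal consistency check that uses only the approximate values stated in the table. The helper and constant names are hypothetical, and it assumes that the empty content cell for Visual Actions continues the Audio Events entry (seven Hollywood movies and one full-length movie).

```python
# Hypothetical constants: approximate per-item durations from Table 1, in minutes.
HOLLYWOOD_CLIP_MIN = 30   # seven Hollywood movies, ca. 30 min each
FULL_MOVIE_MIN = 100      # one full-length movie, ca. 100 min
TRAVEL_SERIES_MIN = 25    # five travel series, ca. 25 min each


def total_minutes(n_hollywood=0, n_full=0, n_travel=0):
    """Hypothetical helper: approximate annotated duration in minutes."""
    return (n_hollywood * HOLLYWOOD_CLIP_MIN
            + n_full * FULL_MOVIE_MIN
            + n_travel * TRAVEL_SERIES_MIN)


# Content covered by each annotation scheme, as listed in Table 1.
# Assumption: Visual Actions shares the Audio Events content (merged cell).
SCHEMES = {
    "Saliency": total_minutes(7, 1, 5),
    "Audio events / visual actions": total_minutes(7, 1),
    "Cross-media semantics": total_minutes(n_full=1),
    "Emotion": total_minutes(7),
}

for name, minutes in SCHEMES.items():
    print(f"{name}: {minutes} min = {minutes / 60:.2f} h")
```

Run as written, it reports 435 min (7.25 h) for saliency, 310 min (5.17 h) for audio events and visual actions, 100 min for cross-media semantics and 210 min (3.50 h) for emotion. These figures agree with the rounded “ca.” totals given in the table.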
