| Annotation Scheme | Annotated Content | Layers/Categories | Annotation Description |
|---|---|---|---|
| Saliency, i.e., video elements that capture the viewer's attention, either instantaneously or over segments | Total: ca. 7 h, including: seven Hollywood movies (ca. 30 min each); one full-length movie (ca. 100 min); five travel series (ca. 25 min each) | Audio | Acoustically interesting segments, e.g., abrupt or loud sounds |
| | | Visual | Visually interesting segments, e.g., motion or color variations |
| | | Audio-visual | Audio-visually interesting segments, e.g., an explosion (which is both visually and acoustically salient) |
| | | Semantics | Segments that are conceptually important as stand-alone semantic events, e.g., names, plot elements, facial expressions |
| | | Informative Segments | Segments important for understanding the plot of the specific video clip; they also serve as a manually generated summary |
| | | Expert Summaries | Summaries created by an "expert", i.e., a professional in film production |
| Audio Events | Total: ca. 5 h, including: seven Hollywood movies; one full-length movie | Human | Events involving human, nature, or mechanical sounds and music, e.g., voice, movement, animal sounds. For details, see Table 4. |
| | | Nature | |
| | | Mechanical | |
| | | Music | |
| Visual Actions | | Facial actions | General facial actions or body movements, including object manipulation or interaction, e.g., talking, smiling, sitting down/getting up. For details, see Table 5. |
| | | Body movements | |
| | | Gestures | |
| Cross-media Semantics | Total: ca. 100 min (one full-length movie) | Equivalence | Interaction relations between different modalities, e.g., images, language, body movements, or acoustic events |
| | | Complementarity | |
| | | Independence | |
| Emotion | Total: ca. 3.5 h (seven Hollywood movies) | Arousal | The viewer's degree of excitement |
| | | Valence | The viewer's emotional evaluation, ranging from negative to positive |