COGNIMUSE: a multimodal video database annotated with saliency, events, semantics and emotion with application to summarization

EURASIP Journal on Image and Video Processing

Table 5. Categories for visual action/event annotation

Categories (no. of layers)	Subcategory 1
General facial actions (×2)	Smile, cry, laugh, chew, talk, other
Facial actions with object manipulation (×2)	Smoke, eat, drink, other
General body movements (×2)	Sitting down, sitting up, standing up, running, [cartwheel], clap hands, climb, climb stairs, [dive], fall on the floor, [backhand flip], [handstand], jump, pull up, push up, [somersault], turn, dance, walk, other
Gestures (×2)	Wave hands, point at something, pantomime, other
Body movements with object interaction (×2)	Answering phone, driving car, getting out of the car, open car door, open door, brush hair, catch, draw sword, [dribble], [golf], hit something, kick ball, pick, pour, push something, ride bike, ride horse, [shoot ball], shoot bow, shoot gun, [swing baseball bat], sword exercise, throw, other
Body movements for human interaction (×2)	Fighting, hugging, kissing, grab hand, threaten person, [fencing], kick someone, punch, shake hands, sword fight, other