
Table 4: Categories for audio event annotation

From: COGNIMUSE: a multimodal video database annotated with saliency, events, semantics and emotion with application to summarization


Category
  Subcategory 1 (number of layers): Subcategory 2

Human
  Voice (×3): speech male, speech female, speech child, [speech synthetic], crowd noise, laughter, shouting, crying, coughing, [sneezing], breathing, spitting, singing, infant, other
  Movement (×3): footsteps, punching, other

Nature
  Elements (×2): wind, water, waves, thunder, fire, sand, other
  Animals (×2): dog bark, [dog howl], bird tweet, bird sing, horse galloping, horse neighing, [sheep], other
  Plants/Vegetation (×2): [leaves rustling], [other]

Mechanical
  Construction (×2): [jackhammer], hammering, drilling, [sawing], engine running, other
  Ventilation (×2): [air-conditioner], other
  Non-motorized Transport (×2): bicycle, skateboard, other
  Social Signals (×2): bells, clock chimes, alarm/siren, [fireworks], gun shot, explosion, glass breaking, door rusty, door opening/closing, swords, other
  Motorized Transport (×2): [marine], rail, road, [air], [other]

Music
  Amplified (×1): live, recorded
  Non-amplified (×1): live
  Sound Source (×1): Diegetic (originates from a source within the film’s world); Non-diegetic (mood music); Background music (music is not the main element of the scene); Foreground music (music is essentially the only thing heard)
  Genre (×1): classical, symphonic, rock, pop, [punk], jazz, folk/country, blues, [metal], rock ’n’ roll, hiphop, [reggae], electronic, funk/soul/rnb, ethnic/world, other
  Instrument (×1): [keyboard], string, wind, percussion, orchestra, electronic/amplified, mixed (e.g., rock band), other
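For anyone handling these labels in code, the taxonomy maps naturally onto a nested dictionary. The following is a minimal, hypothetical Python encoding: the `Subcategory` class, the `TAXONOMY` dict, and the `locate` helper are illustrative names, not part of the COGNIMUSE release. Bracketed labels are kept in a separate field because this excerpt does not explain what the brackets signify, and the Sound Source values are shortened to lowercase tags, which is an assumption made only for illustration.

```python
from dataclasses import dataclass


# Hypothetical in-memory encoding of Table 4; all names are illustrative only.
@dataclass(frozen=True)
class Subcategory:
    layers: int            # the "(×N)" number of layers from the table
    labels: tuple          # Subcategory 2 labels shown without brackets
    bracketed: tuple = ()  # labels shown in [brackets] (meaning not given here)


TAXONOMY = {
    "Human": {
        "Voice": Subcategory(3, ("speech male", "speech female", "speech child",
                                 "crowd noise", "laughter", "shouting", "crying",
                                 "coughing", "breathing", "spitting", "singing",
                                 "infant", "other"),
                             ("speech synthetic", "sneezing")),
        "Movement": Subcategory(3, ("footsteps", "punching", "other")),
    },
    "Nature": {
        "Elements": Subcategory(2, ("wind", "water", "waves", "thunder", "fire",
                                    "sand", "other")),
        "Animals": Subcategory(2, ("dog bark", "bird tweet", "bird sing",
                                   "horse galloping", "horse neighing", "other"),
                               ("dog howl", "sheep")),
        "Plants/Vegetation": Subcategory(2, (), ("leaves rustling", "other")),
    },
    "Mechanical": {
        "Construction": Subcategory(2, ("hammering", "drilling", "engine running",
                                        "other"), ("jackhammer", "sawing")),
        "Ventilation": Subcategory(2, ("other",), ("air-conditioner",)),
        "Non-motorized Transport": Subcategory(2, ("bicycle", "skateboard", "other")),
        "Social Signals": Subcategory(2, ("bells", "clock chimes", "alarm/siren",
                                          "gun shot", "explosion", "glass breaking",
                                          "door rusty", "door opening/closing",
                                          "swords", "other"), ("fireworks",)),
        "Motorized Transport": Subcategory(2, ("rail", "road"),
                                           ("marine", "air", "other")),
    },
    "Music": {
        "Amplified": Subcategory(1, ("live", "recorded")),
        "Non-amplified": Subcategory(1, ("live",)),
        # Assumed short tags for the four Sound Source descriptions.
        "Sound Source": Subcategory(1, ("diegetic", "non-diegetic", "background",
                                        "foreground")),
        "Genre": Subcategory(1, ("classical", "symphonic", "rock", "pop", "jazz",
                                 "folk/country", "blues", "rock 'n' roll", "hiphop",
                                 "electronic", "funk/soul/rnb", "ethnic/world",
                                 "other"), ("punk", "metal", "reggae")),
        "Instrument": Subcategory(1, ("string", "wind", "percussion", "orchestra",
                                      "electronic/amplified", "mixed", "other"),
                                  ("keyboard",)),
    },
}


def locate(label):
    """Return every (category, subcategory 1) pair whose labels include `label`."""
    return [
        (category, name)
        for category, subs in TAXONOMY.items()
        for name, sub in subs.items()
        if label in sub.labels or label in sub.bracketed
    ]


if __name__ == "__main__":
    print(locate("wind"))  # the same label appears under Nature and Music
```

Because generic labels such as `other`, `wind`, and `live` recur under several subcategories, a lookup by label alone is ambiguous, so `locate` returns every match rather than a single one; an annotation record would likely need to store the full (category, subcategory 1, label) path.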