Open Access

Special issue on animal and insect behaviour understanding in image sequences

  • Concetto Spampinato (1),
  • Giovanni Maria Farinella (2),
  • Bas Boom (3),
  • Vasileios Mezaris (4),
  • Margrit Betke (5) and
  • Robert B Fisher (3)
EURASIP Journal on Image and Video Processing 2015, 2015:1

Received: 25 September 2014

Accepted: 3 October 2014

Published: 30 January 2015

Imaging systems are increasingly used in a range of ecological monitoring applications, in particular for biological, fishery, geological and physical surveys. These technologies have radically improved the ability to capture high-resolution images in challenging environments and, consequently, to manage natural resources effectively. Unfortunately, advances in imaging devices have not been matched by improvements in automated analysis systems, which are needed because analysis otherwise relies on time-consuming and expensive input from human observers. This analytical ‘bottleneck’ greatly limits the potential of these technologies and increases the demand for automatic content analysis approaches that enable proactive provision of analytical information.

On the other hand, the study of behaviour through the processing of visual data has become an active research area in computer vision. The visual information gathered from image sequences is extremely useful for understanding the behaviour of the different objects in a scene, as well as how they interact with each other and with the surrounding environment. However, whilst a large number of video analysis techniques have been developed specifically for investigating events and behaviour in human-centred applications, very little attention has been paid to the understanding of other living organisms, such as animals and insects, although huge amounts of video data are routinely recorded: for example, the Fish4Knowledge project and the wide range of nest cams continuously monitor, respectively, underwater reefs and bird nests (variants also exist that focus on wolves, badgers, foxes etc.).

The automated analysis of visual data in real-life environments for animal and insect behaviour understanding poses several challenges for computer vision researchers. The scene conditions are uncontrolled, and the targets to be analysed move erratically in 3D, with sudden variations in direction and speed, while their appearance and non-rigid shape can change quickly. Computer vision tools able to analyse such complex environments are, therefore, envisaged to support biologists in their efforts to analyse the natural environment, promote its preservation and understand the behaviour and interactions of the living organisms (insects, animals etc.) that are part of it.

This special issue reports on the most recent approaches and tools for the identification and recognition of animals and insects and their behaviour by processing visual data.
  • Animal identification, recognition and behaviour understanding

    In ‘An automated chimpanzee identification system using face detection and recognition’, Loos et al. propose a framework to recognize chimpanzees by their facial appearance, assuming that human face recognition techniques are also applicable to chimpanzees. The framework performs face detection, registration using landmarks and face identification. More sophisticated descriptors are employed to deal with the challenges of chimpanzees’ face poses in the natural environment. The use of global and local features improves the recognition performance even further, as shown by the results obtained on the ChimpZoo and ChimpTai datasets.

    In ‘Automated detection of elephants in wildlife video’, Zeppelzauer et al. propose an approach for detection and tracking of elephants in wildlife videos. The method dynamically learns a colour model of elephants from a few training images and, then, localizes elephants in video sequences with different backgrounds and lighting conditions. The approach is able to detect elephants (and groups of elephants) of different sizes and poses performing different activities also in cases of occlusions (e.g. by vegetation), camera motion and lighting changes. Experiments show that both near- and far-distant elephants can be detected and tracked reliably. Moreover, the method does not make any assumptions based on the elephant species and is thus adaptable to other animal species.

    In ‘Automated identification of animal species in camera trap images’, Yu et al. present a method for automated animal species identification. Their method targets the analysis of images captured by motion-sensitive cameras that are regularly used in biodiversity monitoring, generating an abundance of data. In order to identify the species depicted in an image, the authors use dense SIFT and cell-structured LBP (cLBP) as local image descriptors and introduce an improved sparse coding spatial pyramid matching (ScSPM) approach for encoding the multiple local features into a global image description. The latter is used as input to a linear support vector machine classifier. The authors present the results of their approach on data captured in different settings (tropical rain forest, temperate forest and heathland) and show that high classification accuracy can be achieved for a variety of species.

    In ‘2D and 3D analysis of animal locomotion from bi-planar X-ray videos using augmented active appearance models’, Haase et al. analyse the locomotion of animals. To measure the locomotion, they use a high-speed X-ray dataset of five bird species containing 172,942 ground-truth landmarks placed by human experts. Both the standard active appearance model (AAM) and an augmented AAM developed by the authors are fitted to X-ray videos to create a holistic model for all anatomical landmarks within a probabilistic framework. The augmented AAM outperforms the standard model and, with calibration information, can be extended to 3D landmark positions, which are more relevant for biological evaluation.

    In ‘Automated quantification of the schooling behaviour of sticklebacks’, Ardekani et al. describe a video analysis technique for automatically localizing a fish in a tank in the presence of a moving experimental apparatus containing artificial fish. The goal of the study is to analyse the schooling behaviour of the living fish in response to movements of the artificial fish. Detecting the real fish is challenging because the artificial fish look like the target fish, so a feature-based extraction method would be ineffective. In addition, the experimental apparatus is moving, which makes the straightforward application of traditional background-subtraction techniques ineffective. The authors address the challenge by developing a background model that uses information from other non-contiguous frames, which are selected based on their appearance similarity with the frame of interest. The idea of using non-contiguous frames in the background model in this way is interesting and unusual. The authors evaluate their method by reporting missed and false detections and by comparing the schooling behaviour identified by manual annotation with that identified by automated annotation.
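The idea of building a background model from non-contiguous, similar-looking frames can be illustrated with a minimal sketch. It is not the implementation of Ardekani et al.: the similarity measure (mean absolute pixel difference), the per-pixel median, and the parameters `k`, `min_gap` and `threshold` are all assumptions chosen for illustration, and frames are represented as plain 2D lists of grey values.

```python
from statistics import median


def frame_distance(a, b):
    """Mean absolute pixel difference between two equally sized grey frames."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def similar_frame_background(frames, t, k=3, min_gap=2):
    """Per-pixel median of the k frames most similar to frame t.

    Frames closer than min_gap in time are skipped (hypothetical choice),
    so the target is unlikely to sit at the same place in the chosen frames.
    """
    candidates = [i for i in range(len(frames)) if abs(i - t) >= min_gap]
    if not candidates:
        raise ValueError("no frames outside the temporal gap")
    candidates.sort(key=lambda i: frame_distance(frames[i], frames[t]))
    chosen = candidates[:k]
    rows, cols = len(frames[t]), len(frames[t][0])
    return [[median(frames[i][r][c] for i in chosen) for c in range(cols)]
            for r in range(rows)]


def foreground_mask(frames, t, k=3, min_gap=2, threshold=30):
    """Mark pixels of frame t that differ strongly from the estimated background."""
    background = similar_frame_background(frames, t, k, min_gap)
    return [[abs(p - b) > threshold for p, b in zip(row, brow)]
            for row, brow in zip(frames[t], background)]
```

Because the selected frames show the moving apparatus in a similar position to the frame of interest, the apparatus tends to cancel out in the difference image, while a target that occupies a different position in each selected frame does not.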

  • Insect identification and behaviour understanding

    In ‘A two-fly tracker that solves occlusions by dynamic programming: computational analysis of Drosophila courtship behaviour’, Schusterreiter and Grossmann show how visual tracking technologies can support geneticists and neuroscientists in analysing the behaviour of flies, which can help them understand the relation between the flies’ genes, brain and behaviour. Schusterreiter and Grossmann focus on achieving accurate tracking of flies by solving the occlusion problems that arise in their target application, and they use the resulting fly tracker as part of a system that analyses the video and detects behaviour events. Their results show that the presented system is capable of identifying flies throughout a video with very high accuracy, thus making its practical use in such laboratory studies possible.

    In ‘Detecting and tracking honeybees in 3D at the beehive entrance using stereo vision’, Chiron et al. describe a real-time stereo vision-based system for monitoring flying honeybees in three dimensions at the beehive entrance. The proposed system detects bees at the beehive entrance by a hybrid segmentation approach using both intensity and depth images. 3D multi-target tracking based on the Kalman filter and Global Nearest Neighbour is then performed. Tests on robust ground truths for segmentation and tracking show that the proposed method outperforms standard 2D approaches.

    In ‘Comparison of two 3D tracking paradigms for freely flying insects’, Risse et al. discuss and compare state-of-the-art 3D tracking paradigms for flying insects such as Drosophila melanogaster. Probabilistic and global correspondence selection approaches are discussed and compared. The probabilistic approach is based on the Kalman filter for temporal tracking, whereas the global one is based on a global cost function. Furthermore, a novel greedy selection scheme is introduced for the correspondence selection approach. The tracking paradigms are evaluated using synthetic data generated by a swarm simulator.

    In ‘A human-computer collaborative workflow for the acquisition and analysis of terrestrial insect movement in behavioral field studies’, Reda et al. address the problem of characterizing and understanding insect movements. A framework for the acquisition, visualization and analysis of terrestrial insect trajectories from field-recorded videos is presented. The workflow has three main components: a semi-automated image processing pipeline to track and record insect trajectories, a trajectory visualization tool for qualitative analysis of insect movements and quantitative trajectory measurements for statistical hypothesis testing. The authors demonstrate the effectiveness of their framework in a study of the context-dependent navigational strategies employed by Kenyan seed harvester ants.
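The Global Nearest Neighbour association step mentioned above for the honeybee tracker of Chiron et al. can be sketched as follows. This is only an illustration, not the authors' code: the function name `gnn_assign`, the Euclidean cost and the optional `gate` distance are assumptions, and the predicted 3D positions are assumed to come from a separate Kalman prediction step that is not shown. The brute-force search over permutations is practical only for a handful of targets; a real system would use a polynomial-time assignment solver.

```python
from itertools import permutations
from math import dist


def gnn_assign(predicted, detections, gate=float("inf")):
    """Return {track_index: detection_index} minimising the total 3D distance.

    predicted:  list of predicted (x, y, z) track positions
    detections: list of measured (x, y, z) positions in the current frame
    gate:       pairs farther apart than this are dropped (hypothetical option)
    """
    n_tracks, n_dets = len(predicted), len(detections)
    best_cost, best = float("inf"), {}
    if n_tracks <= n_dets:
        # Every track receives a distinct detection.
        for perm in permutations(range(n_dets), n_tracks):
            cost = sum(dist(predicted[i], detections[j])
                       for i, j in enumerate(perm))
            if cost < best_cost:
                best_cost, best = cost, dict(enumerate(perm))
    else:
        # Fewer detections than tracks: every detection receives a distinct track.
        for perm in permutations(range(n_tracks), n_dets):
            cost = sum(dist(predicted[i], detections[j])
                       for j, i in enumerate(perm))
            if cost < best_cost:
                best_cost, best = cost, {i: j for j, i in enumerate(perm)}
    return {i: j for i, j in best.items()
            if dist(predicted[i], detections[j]) <= gate}


predicted = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
detections = [(2.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
print(gnn_assign(predicted, detections))  # {0: 0, 1: 1}
```

In this example, a greedy nearest-neighbour rule would first pair the second track with the first detection (distance 1) and leave the first track with a distance of 5, for a total of 6. The global assignment pairs each track with a detection at distance 2, for a total of 4, which is the difference between the greedy and global strategies.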

  • Bioinspired approaches

    In ‘Data feature selection based on Artificial Bee Colony algorithm’, Schiezaro and Pedrini present a bioinspired algorithm for feature selection to address the classification problem. The Artificial Bee Colony (ABC) approach is taken as the model. The work proposes a binary version of the ABC algorithm, where the number of new features to be analysed in a neighbourhood of a food source is determined through a perturbation of the parameter of the ABC algorithm. The feature selection method is then assessed on datasets of the UCI Machine Learning Repository.

Authors’ information

CS received his MS degree (grade 110/110 cum laude) and Ph.D. degree in computer engineering from the University of Catania (Italy) in 2004 and 2008, respectively, where he is currently an assistant professor. His research interests include mainly computer vision, pattern recognition, machine learning and multimedia. He has a particular interest in ecological data, being involved in several projects dealing with multimedia in ecology. He has coauthored more than 100 publications in international refereed journals and conference proceedings. As further research activities, he has organized and chaired dedicated workshops on multimedia in ecology (MAED 2012, MAED 2013 and MAED 2014), several special sessions at mainstream conferences and several special issues of international journals with impact factor. He is a member of the editorial board of Ecological Informatics Journal.

GMF received his M.S. degree in computer science (egregia cum laude) from the University of Catania, Italy, in 2004, and his Ph.D. degree in computer science in 2008. He joined the Image Processing Laboratory (IPLAB) at the Department of Mathematics and Computer Science - University of Catania, in 2008. He is an assistant professor of Computer Science at the University of Catania (since 2008) and a contract professor of Computer Vision at the Academy of Arts of Catania (since 2004). His research interests lie in the fields of computer vision, pattern recognition and machine learning. He has edited four volumes and coauthored more than 60 papers in international journals, conference proceedings and book chapters. He is a co-inventor of four international patents. He serves as a reviewer and on the programme committee for major international journals and international conferences. He founded (in 2006) and currently directs the International Computer Vision Summer School.

BB received a master’s degree in computer science from the Free University of Amsterdam in 2005, with a thesis entitled ‘Fast object detection’. This thesis was the result of a successful internship at the company PrimeVision, where he developed methods for fast detection (localization) of licence plates, faces and addresses in images. He received his PhD at the University of Twente, in the field of face recognition with special interests in face registration and illumination correction. His current research interests are domain-specific image retrieval, collection of image-based ground-truth annotations, discovering the illumination in a scene, object detection and recognition. He has organized several scientific workshops (VAIB 2012, VIGTA 2012 and 2013) and is a guest editor for the related special issues. He has published several journal and conference articles on biometrics and computer vision.

VM is a senior researcher (Researcher B) with the Information Technologies Institute/Centre for Research and Technology Hellas (CERTH), Thessaloniki, Greece. He received his bachelor’s and Ph.D. degrees in electrical and computer engineering from the Aristotle University of Thessaloniki, Thessaloniki, Greece, in 2001 and 2005, respectively. His research interests include image and video analysis, event detection in multimedia, machine learning for multimedia analysis, content-based and semantic image and video retrieval, and the application of image and video analysis technologies in specific domains (medical images, ecological data). He is a co-author of 28 papers in refereed international journals, 12 book chapters, two patents and more than 100 papers in international conferences. He serves as an associate editor for the IEEE Transactions on Multimedia and as a guest editor for special issues in other journals. He is a senior member of the IEEE.

MB is a professor of computer science at Boston University, where she co-leads the Image and Video Computing Research Group. She conducts research in computer vision, in particular, the development of methods for detection, segmentation, registration and tracking of objects in visible light, infrared and X-ray image data. She has worked on tracking animals, cells, gestures, people and vehicles, video-based human-computer interfaces, statistical object recognition and medical imaging analysis. She has published over 100 original research papers. Prof. Betke earned her Ph.D. degree in computer science and electrical engineering at the Massachusetts Institute of Technology in 1995. She received the National Science Foundation Faculty Early Career Development Award in 2001 for developing ‘Video-based Interfaces for People with Severe Disabilities’. She co-invented the ‘Camera Mouse’, an assistive technology used worldwide by children and adults with severe motion impairments. She was one of the two academic honorees of the ‘Top 10 Women to Watch in New England Award’ by Mass High Tech in 2005. She is a senior member of the ACM and IEEE. She currently leads a 5-year research programme to develop intelligent tracking systems that reason about group behaviour of people, bats, birds and cells.

RBF, BS (California Institute of Technology), MS (Stanford), PhD (Edinburgh), is a full professor at Edinburgh University. His research covers topics in 3D computer vision and video sequence understanding. He has contributed to a spin-off company, Dimensional Imaging. His research has led to 13 authored or edited books and more than 250 peer-reviewed scientific articles or book chapters. He has developed several popular on-line computer vision resources. Most recently, he has been the coordinator of the EC-funded Fish4Knowledge project, acquiring and analysing video data of 1.4 billion fish from about 10 camera-years of undersea video of tropical coral reefs. He is a fellow of the Int. Association for Pattern Recognition (2008) and the British Machine Vision Association (2010).



Acknowledgements

We would like to thank, first, the authors for their contributions to this special issue and then all the reviewers for the effort and time spent providing thorough reviews and valuable suggestions on the submitted manuscripts. Finally, we would also like to extend our thanks to the Editor in Chief, Professor Jean-Luc Dugelay, and the whole editorial staff of EURASIP Journal on Image and Video Processing for recognizing the importance that the subject of this special issue may have for future research in this emerging field, whose development will provide significant benefits for society by allowing scientists to exploit technological advances in order to better understand the world we live in.

Authors’ Affiliations

(1) Department of Computer Engineering, University of Catania
(2) Computer Science Department, University of Catania
(3) School of Informatics, University of Edinburgh
(4) Centre for Research and Technology Hellas (CERTH)
(5) Computer Science Department, Boston University


© Spampinato et al.; licensee Springer. 2015

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.