VISTA: achieving cumulative VIsion through energy efficient Silhouette recognition of mobile Targets through collAboration of visual sensor nodes

Jabbar, Sana; Akbar, Ali Hammad; Zafar, Saima; Quddoos, Muhammad Mubashir; Hussain, Majid

doi:10.1186/1687-5281-2014-32

Research
Open access
Published: 25 June 2014

EURASIP Journal on Image and Video Processing volume 2014, Article number: 32 (2014)

Abstract

Visual sensor networks (VSNs) are innovative networks founded on a broad range of areas such as networking, imaging, and database systems. These networks demand well-defined architectures in terms of sensor node and camera deployment, image capturing and processing, and well-organized distributed systems. Existing VSN architectures fall short of these demands because they are limited in approach and in design. In this paper, we propose VISTA, a distributed multi-layer vision architecture aimed at constructing the cumulative vision of mobile objects (MOs). VISTA realizes silhouette recognition of mobile targets through (a) pre-meditated deployment of sensor nodes (SNs) that are equipped with sonar sensors and fixed-view (FV) on-board cameras at the periphery of the region of interest (RoI) and SNs with only on-board cameras within the RoI, (b) pre-distribution of silhouettes of known objects across SNs, (c) sonar-based presence detection of an MO at the outskirts of the RoI, (d) MO silhouette capturing and matching at an interior node to determine the percentage match, (e) subsequent activation of the next interior cameras in order to improve the percentage match, and (f) termination of further activation upon threshold recognition of the MO. Experimental evaluation of our image processing algorithms against baseline algorithms with respect to execution time and memory shows a significant reduction in image data and memory occupancy. Experiments also show that a true match is achieved fully under broad daylight conditions and large backgrounds when our proposed background subtraction and pixel reduction techniques are used. The mobility-driven behavior of the associated network layer algorithms of VISTA is simulated in the network simulator NS2 by representing the surety of MO identification as a function of the number of cameras, database size and distribution, the MO’s trajectory, stored perspectives, and network depth. The simulation results show that the surety of target identification doubles and, in some situations, increases manifold as the number of deployed silhouettes increases against the baselined database size and mobility model. The results substantiate that VISTA is a suitable architecture for low-cost, autonomous, and efficient human and asset monitoring surveillance, friend-or-foe (FoF) identification, and target tracking systems.

1 Introduction

Visual sensor networks (VSNs) form the crossroads of networking, image capturing, processing and rendering techniques, and distributed systems. These innovative networks are emerging as an important research challenge and are gaining the notice of both the research community and application developers. Contemporary VSN architectures are limited in approach and in design. For instance, none of these architectures takes into account civil infrastructure and geographical information in the placement and simultaneous activation of sensors, cameras, or both. Likewise, existing VSN architectures focus on capturing images in their entirety, which tends to be redundant and at times even detrimental to user application requirements. Also, these schemes tend to overlook the constrained ambulatory behavior of mobile objects (MOs), such as varying mobility behavior in the interior and at the exterior of the region of interest (RoI). Furthermore, these architectures do not use the features and attributes of captured images as a means for defining the camera activation schedule and the coordination between sensor nodes (SNs). Finally, hardware choices for VSNs are limited to either cameras mounted onto mobile assemblies or cameras using pan-tilt-zoom (PTZ) assemblies, both involving mechanical motion. All in all, existing work makes strong assumptions about the presence and availability of video-customized hardware and codecs, bandwidths of the order of megabits per second (Mbps), and mains power supply or unconstrained battery sources, and VSN designs are defined in concordance with these assumptions. In this research, we adopt contra-concordance, i.e., we redefine and restrict VSN features to meet the limited capabilities of real wireless sensor networks (WSNs), which have limited computation and memory and are equipped with wireless transceivers. We propose VISTA, an architecture that involves redefining the video capturing capability of VSNs. 
The hardware for VISTA is deployed considering the civil infrastructure of the RoI to be monitored. VISTA proposes a deployment scheme in which SNs are placed at optimal positions in order to make communication effective. The camera at the next-hop SN is activated only when the MO comes within its range, so that redundant image capture is avoided. Until an MO is detected, only the SNs at the boundary of the RoI are active, which avoids unnecessary energy consumption at the interior SNs of the network.
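
The activation scheme outlined above can be pictured as a simple loop over the inner nodes along the MO's path. The following is a minimal sketch under stated assumptions, not the authors' implementation: the names `InnerNode`, `best_match_percent`, and `cumulative_recognition`, the rule of keeping the best match seen so far, and the 80% threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class InnerNode:
    """Hypothetical inner node: maps an object label to the % match it reports."""
    name: str
    match_percent: Dict[str, float]

    def best_match_percent(self) -> Tuple[Optional[str], float]:
        # Stand-in for on-node silhouette capture and matching.
        if not self.match_percent:
            return None, 0.0
        label = max(self.match_percent, key=self.match_percent.get)
        return label, self.match_percent[label]


def cumulative_recognition(path: List[InnerNode], threshold: float = 80.0):
    """Activate cameras along the MO's path until the match reaches the threshold.

    Returns (label, percent, names of the nodes that were activated).
    """
    best_label, best_percent = None, 0.0
    activated = []
    for node in path:
        activated.append(node.name)        # this camera is switched on only now
        label, percent = node.best_match_percent()
        if percent > best_percent:         # assumed rule: keep the best match so far
            best_label, best_percent = label, percent
        if best_percent >= threshold:      # threshold recognition: stop activating
            break
    return best_label, best_percent, activated


path = [
    InnerNode("IN1", {"person": 55.0}),
    InnerNode("IN2", {"person": 85.0}),
    InnerNode("IN3", {"person": 95.0}),
]
print(cumulative_recognition(path))  # ('person', 85.0, ['IN1', 'IN2'])
```

In this toy run the third camera is never switched on, which is the energy saving the scheme aims for.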

This research includes the following:

1. Comprehensive layered architecture for achieving cumulative vision.
2. Elaboration of the role played by each layer in order to accomplish the goal, which includes hardware role, image processing details, and final task accomplishment.
3. Description of operations concerning the edge nodes (ENs) and the inner nodes (INs).
4. Simulations in NS2 in order to validate our assertion regarding the goals achieved by the proposed architecture.

The remainder of the paper is organized as follows. In Section 2, related work is discussed. Section 3 presents the VISTA architecture in detail. Section 4 presents the experimental results based on NS2 simulations. Section 5 presents the discussion on the operation and performance of VISTA.

2 Literature review

The effective realization of VSNs poses contemporary research challenges. Research is being carried out in diverse directions, including camera calibration, image processing algorithms, hardware architecture, communication protocols, and applications. The research work that we explored covers multiple domains of VSNs. The objective of this research is to realize an energy-efficient solution for VSNs in terms of computation and communication that overcomes the shortcomings of previous efforts in this direction. The literature review for VISTA is based on the three domains discussed below.

Chen et al. [1] focus on capturing images from SNs and reducing these images to the object of interest (OoI) through mobile agents in a VSN. In this way, the volume of image data at each SN in the target region is reduced. Though a degree of compression is achieved through segmentation and transmission of the OoI only, the whole process remains an image processing and transmission scheme. Compression simply reduces the size of the image to be transmitted. However, image transmission is impractical for long-lived VSNs because VSNs, once deployed, are used sporadically over very long periods. Nelson and Khosla [2] describe a number of criteria that assist in improving the visual resolution of an MO. These criteria are used to control the focus and motion of single or multiple cameras. They address camera resolution by suggesting that cameras can actually be moved. Since the energy per SN in WSNs is very limited, this idea is impractical for VSNs. In [3], Navarro-Serment et al. present work on the inspection of moving targets in an RoI by activating multiple cameras, which distributes the collective task of identification and prevents energy consumption from being concentrated on a single robot. They address the problems of scheduling and maneuvering cameras to observe targets based on their present positions. Likewise, Capezio et al. [4] develop Cyber Scout, an autonomous surveillance and investigation system to detect and track OoIs. They use a network of all-terrain vehicles and focus on vision for inspection, autonomous navigation, and dynamic path planning. In [5], a motion segmentation algorithm is proposed for extracting foreground objects with a PTZ camera. An image mosaicing technique is used to build a planar background. The object is detected by comparing the current camera image with the corresponding background indexed from the mosaic. In [6], Saptharishi et al. propose a novel method for temporally and spatially moving objects by automatically learning the relevance of the object’s appearance features to the task of discrimination. This method is proposed for distributed surveillance systems. Ukita and Matsuyama [7] perform multi-target tracking using active vision agents (AVAs), each of which is a network-connected computer with an active fixed-view pan-tilt-zoom (FV-PTZ) camera. Multiple FV-PTZ active cameras are required for detailed measurements of 3D objects. However, their idea for surveillance and tracking is not implementable in VSNs because it relies on maneuvering cameras. Similarly, in [8], Matsuyama gives an overview of cooperative distributed vision (CDV). The goal of CDV is to embed network-connected mobile robots with active cameras in the real world and to realize wide-area dynamic scene understanding and visualization. However, none of the above ideas for surveillance and tracking is implementable in VSNs because of the maneuvering cameras and the significant power consumption of the devices.

For image recognition, Tien et al. [9] propose a novel method based on non-uniform rational B-splines (NURBS) and cross-ratios. Their method consumes both memory and computation time, although it requires fewer resources than the curve matching method. They use a small database to save memory. However, matching with NURBS curves first and then applying cross-ratios is still expensive in terms of time and computation for a VSN system. In VISTA, a rich database is proposed instead, i.e., more aspects of an object are deployed in the nodes, while computationally expensive matching algorithms are avoided.
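
To make the notion of a cheap percentage match against stored aspects concrete, the sketch below compares a captured binary silhouette with stored silhouettes using a pixel-agreement score (intersection over union). This measure is an assumption chosen for illustration; this section does not specify the matching rule used in VISTA, and the function names and the tiny 3×3 masks are hypothetical.

```python
from typing import Dict, List, Tuple

Mask = List[List[int]]  # binary silhouette: 1 = object pixel, 0 = background


def percent_match(captured: Mask, stored: Mask) -> float:
    """Intersection-over-union of two equal-sized binary masks, as a percentage."""
    if len(captured) != len(stored) or any(
        len(a) != len(b) for a, b in zip(captured, stored)
    ):
        raise ValueError("masks must have the same dimensions")
    intersection = union = 0
    for row_c, row_s in zip(captured, stored):
        for c, s in zip(row_c, row_s):
            intersection += c & s   # pixel set in both masks
            union += c | s          # pixel set in at least one mask
    return 100.0 * intersection / union if union else 0.0


def best_match(captured: Mask, database: Dict[str, Mask]) -> Tuple[str, float]:
    """Return the label of the stored silhouette with the highest percentage match."""
    scores = {label: percent_match(captured, mask) for label, mask in database.items()}
    label = max(scores, key=scores.get)
    return label, scores[label]


captured = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
database = {
    "cross": [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
    "bar": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
}
print(percent_match(captured, database["bar"]))  # 60.0
print(best_match(captured, database))            # ('cross', 100.0)
```

Storing more aspects of each object simply adds entries to `database`; the per-comparison cost stays a single pass over the pixels.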

In [10], Soro and Heinzelman take into account the unique characteristics and constraints of VSN that differentiate VSNs from other multimedia networks as well as traditional WSNs. They outline all areas of VSNs such as applications, signal processing algorithms, communication protocols, sensor management, hardware architectures, middleware support, and open research problems in VSNs by exploring several relevant research directions. They argue that traditional WSN protocols do not provide sufficient support in VSNs. Hence, there is a need to propose new communication protocols and vision algorithms suitable for resource-limited VSN systems.

Background subtraction is an important step in image matching on low-power devices. In [11], Stauffer and Grimson present a background subtraction method that involves thresholding the error between an estimate of the image without moving objects and the current image. The background model used in this work models each pixel as a mixture of Gaussians, with an on-line approximation used for updating the model. The Gaussian distributions of this mixture model are evaluated to classify the pixels that most likely belong to the background process. Since, in reality, multiple surfaces appear in the view frustum of a pixel and lighting conditions change, multiple adaptive Gaussians are required. A mixture of adaptive Gaussians is used in this approximation: as the parameters of the Gaussians are updated, the Gaussians are evaluated using a simple heuristic to determine which of them are part of the ‘background process’. The mixture of Gaussians (MoG) concept has also been used in a number of sensor network problems. A related work in this context is by Ihler et al. [12]. Their work addresses the problem of automatic self-localization of sensor nodes. The authors reformulate the sensor localization problem within a graphical model framework and use a recent generalization of particle filtering to approximate sensor locations. In this technique, each message is represented either by a sample-based density estimate (a mixture of Gaussians) or by an analytic function. The messages along observed edges are represented by samples, and the messages along unobserved edges are described as analytic functions. First, samples are drawn from the estimated marginal, and then these samples are used to approximate each outgoing message. Another paper that employs mixtures of Gaussian distributions in sensor networks is by Rabbat and Nowak [13]. 
They address the problem of in-network data aggregation in sensor networks comprising sensor nodes capable of sending sensed data to a base station. Normally, an estimate of a parameter or function must be derived from the collected data, which is large and redundant. The paper investigates distributed algorithms that process data prior to its transmission to a central point, which reduces the amount of energy spent in obtaining an accurate estimate. The estimation problem is defined as the incremental optimization of a cost function over the data collected from all nodes, such that each node adjusts the estimate based on its local data and transmits it to the next node. In the distributed expectation-maximization (DEM) algorithm, the measurements are modeled as samples drawn from a mixture of Gaussian distributions with unknown means and covariances, with the mixture weights being different at each sensor in the network. Initially, the parameters of the global density are estimated; these are passed through the network such that each sensor node identifies the component of the density that best fits its local data. Cho et al. [14] present a smart video surveillance system based on a visual sensor network. It comprises an inference framework, in which autonomous scene analysis is carried out using distributed and collaborative processing among camera nodes, and an effective occupancy reasoning algorithm. For each node in the network, they define a potential function representing how coherent the global inference is with the local measurement at that node. Next, a multi-tier architecture is built, and one node in each cluster is chosen as an anchorage node for global inferences. The amount of overlap between two nodes is used as the basis for constructing a work tree for distributed processing within a cluster. The existence probabilities for each camera are predicted using the binary images from the background subtraction. 
A modified mixture of Gaussians (MoG) algorithm is used. In [15], Tsai and Lin deal with the contextual redundancy associated with background and foreground objects in a scene. They propose a scene analysis technique that classifies macroblocks based on contextual redundancy. Only macroblocks of a specific context, which involves salient motion, are analyzed for motion through an object-based coding architecture. The context of a scene is defined as the association of a pixel in a scene with a static or moving background or a moving foreground, with or without illumination change, based on observations of a number of recent frames. The context of a macroblock is modeled by an estimated background image. In the scene analysis method, the most representative Gaussian is selected from the mixture of Gaussians. In [16], Ellis presents a multi-view video surveillance system with algorithms for detecting and tracking moving objects. The scene-dependent information is captured by creating models of the scene based on observations obtained from the camera network. To cater for background changes, the probability of detecting a pixel value is modeled by a mixture of Gaussians based on color and monochrome pixel values. In [17], Paletta et al. present a video surveillance system, based on a network of video cameras, for monitoring passenger flows at public transportation junctions. In the background modeling and motion detection module, they employ an adaptive model for background estimation that applies a mixture of Gaussians and appearance patterns, thereby providing a stable and robust background model.
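
The per-pixel mixture-of-Gaussians background model discussed above can be sketched for a single grayscale pixel as follows. This is a simplified illustration in the spirit of [11], not a faithful reimplementation: the learning rate `alpha` is used directly for the mean and variance updates, the variance is floored at `min_var`, and all parameter values and names (`PixelMoG`, `bg_weight`, and so on) are assumptions made for the example.

```python
import math


class PixelMoG:
    """Simplified mixture-of-Gaussians background model for one grayscale pixel."""

    def __init__(self, k=3, alpha=0.05, init_var=36.0, min_var=1.0,
                 match_sigmas=2.5, bg_weight=0.7):
        self.k = k                        # maximum number of Gaussians
        self.alpha = alpha                # learning rate (assumed value)
        self.init_var = init_var          # variance of a newly created Gaussian
        self.min_var = min_var            # floor that keeps the variance positive
        self.match_sigmas = match_sigmas  # match window in standard deviations
        self.bg_weight = bg_weight        # share of weight treated as background
        self.components = []              # each entry is [weight, mean, variance]

    def update(self, x):
        """Feed one pixel value; return True if it is classified as foreground."""
        # Rank Gaussians by weight / sigma: heavy, narrow ones come first.
        self.components.sort(key=lambda c: c[0] / math.sqrt(c[2]), reverse=True)

        # Find the first Gaussian within match_sigmas standard deviations of x.
        matched = None
        for comp in self.components:
            if abs(x - comp[1]) <= self.match_sigmas * math.sqrt(comp[2]):
                matched = comp
                break

        # The top-ranked Gaussians whose weights sum past bg_weight form the
        # background; x is background only if its match is among them.
        is_background = False
        cumulative = 0.0
        for comp in self.components:
            if comp is matched:
                is_background = True
                break
            cumulative += comp[0]
            if cumulative > self.bg_weight:
                break

        # Decay all weights, then reinforce the match or start a new Gaussian.
        for comp in self.components:
            comp[0] *= 1.0 - self.alpha
        if matched is None:
            if len(self.components) >= self.k:
                self.components.pop()  # sorted list: last is the lowest ranked
            self.components.append([self.alpha, float(x), self.init_var])
        else:
            diff = x - matched[1]
            matched[0] += self.alpha
            matched[1] += self.alpha * diff
            matched[2] = max((1.0 - self.alpha) * matched[2]
                             + self.alpha * diff * diff, self.min_var)

        # Renormalize so that the weights sum to one.
        total = sum(comp[0] for comp in self.components)
        for comp in self.components:
            comp[0] /= total
        return not is_background


model = PixelMoG()
for _ in range(50):
    model.update(100)       # learn a steady background value
print(model.update(200))    # True: the sudden change is flagged as foreground
print(model.update(100))    # False: the learned background value returns
```

Note that the very first sample is reported as foreground because no model exists yet, and that a value that persists long enough gains weight and is eventually absorbed into the background.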

Kumar [18] demonstrates the importance of various features in image matching. A framework is proposed consisting of hardware cameras and accompanying software. The software manages image processing and satisfies queries from other cameras over the network or from the camera itself. The software logic is implemented over the publisher-subscriber model. To satisfy queries, different handlers are registered with the publisher-subscriber block. It is asserted that scale-invariant feature transform (SIFT) features do not work well when there is a large orientation change and low resolution. It is also shown that SIFT features do not work efficiently across cameras that are far apart in time or location. Therefore, it is more effective to use more than one identification feature across different scenarios. In [19], Margi et al. study the Meerkats project, observe the trade-off between power efficiency and performance, and carry out verification through a test-bed based on the Crossbow Stargate platform. They observe the energy consumption of activities such as processing, image acquisition, flash memory access, and communication over the network. They also report steady-state and transient energy consumption behaviors. They show that transients are not at all negligible, either in terms of power or in terms of the delay incurred. They conclude that delay and energy measurements are very important for performance, and that transients play a significant role in both. In [20], Margi et al. present an analysis of power consumption and execution time for elementary tasks such as sensing, processing, and communication. These tasks compose the duty cycle of a VSN node based on the Crossbow Stargate board. They also predict the lifetime of a VSN system from the energy consumption characterization and draw attention to the fact that activating or deactivating the hardware and transitioning between different states of an SN require a non-negligible amount of time and energy. 
They illustrate that an SN performs the same functionality with different energy requirements, depending upon the SN’s current state. They also show that on-board detection always plays a significant role in energy saving, even if the rate of event detection is high. To detect events, an SN requires a blob detector, which further decides whether an image should be transmitted or not. However, blob detection is a power-consuming process that must be run in either case. Even when an event is detected, the image is compressed by the node and sent to the sink or to another node. Image compression saves energy, but when the blob detector detects larger blobs in the acquired image, sending that image consumes a considerable amount of energy. Image compression only reduces the size of the image, and even small blobs require high energy and a long time for transfer. To overcome the abovementioned problems, we introduce a novel idea in VISTA: the MO is identified on the node, and only the information about the MO is sent, without any image data. Qureshi and Terzopoulos [21–24] present work on smart camera networks consisting of static and active cameras that provide coverage of an environment with minimal reliance on a human operator. They propose a distributed strategy in which nodes are capable of local decision making and inter-node communication. Each camera node has an autonomous agent to communicate with nearby nodes. When a node is in the idle state, its camera does not perform any task. Upon receiving a message, a node calculates its relevance to the task by employing low-level visual routines (LVRs). A supervisor node decides whether or not to include the node in the group by observing its relevance value. A visual routine is executed every time a node receives a message. This means that, every time, the node bears the burden of running an LVR and calculating its relevance to the task. 
In contrast, in our approach, we invoke only the node that can participate efficiently in identifying the MO. An overview of VSNs along with research challenges in this area is given in [25]. The need for tight coupling between communication protocols and vision techniques for effective object monitoring and tracking is also highlighted. Many surveys of VSNs have been published in which VSN characteristics, the corresponding layers, and open research issues are discussed in detail. An extensive survey of wireless multimedia sensor networks is provided in [26], where Akyildiz et al. discuss various open research problems in the multimedia research area, including networking architectures, layers, and protocols. Similarly, in [10], the authors review the current state of the art in the field of visual sensor networks by exploring several relevant research directions. All these authors agree that previous architectures cannot fulfill the needs of this new era of smart visual sensor networks. They suggest that new energy-efficient architectures for sending visual information are needed.
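
The idea introduced above, identifying the MO on the node and sending only information about it rather than image data, can be illustrated by a rough payload comparison. The figures are illustrative assumptions, not measurements from the paper: an uncompressed 320×240 8-bit grayscale frame, and a hypothetical report layout carrying a node ID, an object ID, and a percentage match.

```python
import struct

# Assumed frame format: 320 x 240 pixels, 1 byte per pixel, uncompressed.
FRAME_BYTES = 320 * 240 * 1

# Hypothetical report layout (network byte order, no padding):
#   H = 16-bit node ID, H = 16-bit object ID, B = percentage match (0-100)
REPORT_FORMAT = "!HHB"


def encode_report(node_id: int, object_id: int, percent: int) -> bytes:
    """Pack an identification result into a fixed-size message."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return struct.pack(REPORT_FORMAT, node_id, object_id, percent)


def decode_report(payload: bytes):
    """Inverse of encode_report; returns (node_id, object_id, percent)."""
    return struct.unpack(REPORT_FORMAT, payload)


report = encode_report(node_id=7, object_id=42, percent=85)
print(len(report), FRAME_BYTES)   # 5 76800
print(decode_report(report))      # (7, 42, 85)
```

Under these assumptions the report is several orders of magnitude smaller than the frame, independent of how large the detected blob is.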

In [27], the authors highlight the major wireless visual sensor network approaches to energy efficiency and analyze the strategies already proposed in this domain. They suggest that LANMAR [28] and G-AODV [29] should be enhanced to increase the energy efficiency of future VSNs. They also suggest that, because of the different elements that enter into the design of visual sensor networks, multidisciplinary research is needed to design future VSNs that provide an effective trade-off between the energy consumed by the VSN and the QoS received by the end user. The work in [30] focuses on the functionality of VSNs as intelligent systems capable of operating autonomously and in a wide range of scenarios. The authors identify the need for extensive research on the placement of these nodes, the coverage of blind areas, and the localization and calibration of camera nodes within the network.

To overcome the abovementioned energy consumption and architectural problems, a novel architecture, VISTA, is proposed, through which the energy of SNs can be saved by means of a pre-planned database and camera scheduling.

3 VISTA architecture

This section elaborates the VISTA design, which comprises a layered architecture. First, the assumptions are formulated; then, the layers and modules that comprise this architecture are discussed in detail.

3.1 VISTA assumptions

The following assumptions are made for VISTA:

All nodes in VISTA are deployed in a pre-engineered topology.
At the time of network initialization, there is no mobile object inside the RoI. Even if an MO is already present, tracking it is outside the scope of VISTA.
On ENs, sonars are mounted along with cameras.
INs are equipped with cameras only.
The location of each SN in the RoI is pre-programmed.
All SNs have the same computational resources and memory.
ENs exhibit three different states of activation with respect to (w.r.t.) sonar, timer, camera, and transceiver, as shown in Table 1.
Similarly, INs exhibit two different states of activation w.r.t. timer, camera, and transceiver, as shown in Table 2.
The fields of view (FOVs) of the edge nodes’ sonars and cameras are calibrated to be exactly the same, respectively.
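
The hardware assumptions above can be captured in a small data model. This is a hypothetical sketch: the names `NodeKind`, `SensorNode`, and `COMPONENTS` are invented for illustration, locations are assumed to be 2D coordinates, and the activation states of Tables 1 and 2 are deliberately not modeled because their contents are not reproduced here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Tuple


class NodeKind(Enum):
    EDGE = "EN"    # edge node at the periphery of the RoI
    INNER = "IN"   # inner node within the RoI


# Components per node kind, following the assumptions listed above.
COMPONENTS = {
    NodeKind.EDGE: frozenset({"sonar", "timer", "camera", "transceiver"}),
    NodeKind.INNER: frozenset({"timer", "camera", "transceiver"}),
}


@dataclass(frozen=True)
class SensorNode:
    node_id: int
    kind: NodeKind
    location: Tuple[float, float]  # pre-programmed position (assumed 2D)

    @property
    def components(self) -> FrozenSet[str]:
        return COMPONENTS[self.kind]

    def can_detect_presence(self) -> bool:
        # Only nodes carrying a sonar can sense an MO entering the RoI.
        return "sonar" in self.components


edge = SensorNode(1, NodeKind.EDGE, (0.0, 0.0))
inner = SensorNode(2, NodeKind.INNER, (5.0, 5.0))
print(edge.can_detect_presence(), inner.can_detect_presence())  # True False
```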

Table 1 States exhibited by ENs
