Detection and tracking of moving objects in a maritime environment using level set with shape priors
 Duncan Frost^{1} and
 Jules-Raymond Tapamo^{1}
DOI: 10.1186/1687-5281-2013-42
© Frost and Tapamo; licensee Springer. 2013
Received: 14 February 2013
Accepted: 2 July 2013
Published: 26 July 2013
Abstract
Over the years, maritime surveillance has become increasingly important due to the recurrence of piracy. While surveillance has traditionally been a manual task carried out by crew members in lookout positions on parts of the ship, much work is being done to automate this task using digital cameras coupled with a computer that applies image processing techniques to intelligently track objects in the maritime environment. One such technique is level set segmentation, which evolves a contour towards objects of interest in a given image. This method works well but gives incorrect segmentation results when a target object is corrupted in the image. This paper explores the possibility of factoring prior knowledge of a ship’s shape into level set segmentation to improve results, a concept that has not been addressed in the maritime surveillance problem. It is shown that the developed video tracking system outperforms level set-based systems that do not use prior shape knowledge, working well even where these systems fail.
Keywords
Tracking; Level sets; Shape priors; Maritime surveillance
1 Introduction
While the word ‘pirate’ brings to mind thoughts of the swashbuckling, one-eyed seafarers of childhood fantasy, the term unfortunately still has use in today’s world. Costing an estimated US$13 to 16 billion a year [1], piracy remains a pertinent problem in areas such as the coast of Somalia and the Gulf of Guinea. Despite increased security, piracy in these areas has grown over the years. Although recent years have seen a slight drop in reported incidents, 439 attacks were still reported in 2011 according to the International Maritime Bureau [2].
Due to the increased threat of piracy, surveillance is an absolute must on cargo ships travelling in these dangerous areas. While radar systems have been extensively used in maritime environments, these generally require large, metallic targets. Modern pirates favour small, fast rigid inflatable boats that are mainly non-metallic and thus difficult to detect [3]. While the obvious solution would seem to be manual detection by dedicated crew members on board, the small number of crew present at any given time makes this infeasible. Unlike humans, who grow tired, automated video surveillance systems are able to constantly monitor camera feeds and keep track of a number of objects of interest around the ship.
Szpak and Tapamo [4] present an approach that attempts to track objects using a closed curve in the image (a method known as level set segmentation) after they have been detected using a motion-based detection system. While the tracking results are very good, object detection suffers from a large number of false detections of non-ship objects caused by wave motion. To address these shortcomings, this work investigates the possibility of integrating prior-known shape information into segmentation.
The rest of the paper is structured as follows: Section 2 covers the background on level set methods and their implementation and reviews the theory associated with shape priors. Section 3 discusses applications of image processing and level sets within the maritime surveillance environment. Section 4 introduces the proposed video tracking system and details various subsystem functionalities. The various subsystems are implemented in Section 5, and the final system is established. Section 6 concludes and outlines future work.
2 Background
There have been a number of attempts to address the problem of detecting and tracking objects at sea. Two of the main tasks in achieving this goal are understanding the nature of sea clutter and modelling it so that moving objects can be segmented accurately.
In [5], temporal characteristics of sea clutter and of a range of small boats are analysed using a comprehensive set of recorded datasets. This is done in an attempt to understand the dynamics and associated reflectivity of small boats. An empirical sea model is then derived, which allows the development of advanced detection and tracking algorithms that help improve the performance of surveillance and marine navigation radar against small boats.
Vicen-Bueno et al. [6] propose neural network-based signal processing techniques to reduce sea clutter. In [7], several machine vision techniques that could help ease search tasks in a maritime environment are investigated. Hidden Markov model-based tracking models are then used to design a system that improves detection performance.
In this paper we propose an approach to ship detection and tracking based on active contours. Active contour methods are segmentation techniques that use an iteratively evolving contour or interface in an image that separates different regions of interest. Active contours can be expressed using one of two principal approaches [8]:

Explicit or Lagrangian approach resulting in an interface known as a snake.

Implicit or Eulerian approach resulting in an interface called a level set.
Kass et al. [9] initially introduced the concept of active snakes for expressing the contour in an image, in which a parameterised spline is guided to a desirable position by a number of forces. The major problem with active snakes is their inability to deal with changes in topology [10]. Closed contours expressed as active snakes are unable to deal with splitting or merging of regions in an image. Level set methods were originally introduced by Osher and Sethian [11] as a method to evolve a contour with a speed proportional to its curvature. The main advantage of this method is that, unlike active snakes, it allows for cusps, corners and automatic topological changes [10].
To ensure a one-to-one mapping between the level set function and its corresponding contour, the level set function Φ is constrained to be a signed distance function, that is, |∇Φ| = 1 almost everywhere, with Φ > 0 inside the contour and Φ < 0 outside the contour [12]. Informally, Φ can be thought of as the distance to the zero-level contour for points inside the contour and its additive inverse for points outside.
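As a small illustration of the signed distance constraint, the sketch below builds Φ for a circular contour using the sign convention stated above (positive inside, negative outside) and checks numerically that the gradient magnitude is close to 1. The function names, the grid size and the choice of a circle are assumptions made purely for illustration.

```python
import math

def circle_sdf(width, height, cx, cy, radius):
    """Signed distance to a circle: positive inside, negative outside."""
    return [[radius - math.hypot(x - cx, y - cy) for x in range(width)]
            for y in range(height)]

def gradient_magnitude(phi, x, y):
    """Central-difference estimate of |grad phi| at an interior pixel."""
    dx = (phi[y][x + 1] - phi[y][x - 1]) / 2.0
    dy = (phi[y + 1][x] - phi[y - 1][x]) / 2.0
    return math.hypot(dx, dy)

# Hypothetical 64 x 64 grid with a circle of radius 10 at the centre.
phi = circle_sdf(64, 64, cx=32, cy=32, radius=10)
centre_value = phi[32][32]        # distance from the centre to the contour
on_contour = phi[32][42]          # a pixel lying exactly on the zero level
outside_value = phi[32][50]       # negative: the pixel is outside
grad_axis = gradient_magnitude(phi, 45, 32)
grad_diag = gradient_magnitude(phi, 40, 40)
```

Away from the centre of the circle, where the distance function is not differentiable, the estimated gradient magnitude stays close to 1 up to discretisation error.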
2.1 Chan-Vese energy functional
Rather than using a predefined set of forces in the image, variational level set methods seek a level set function that minimises a predefined cost function, more specifically known as an energy functional (sometimes also called a cost functional).
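For reference, the Chan-Vese functional of [10] is commonly written in the following form, given here with the sign convention used above (Φ > 0 inside the contour). The notation is an assumption based on the standard formulation: I is the image, c_1 and c_2 are the mean intensities inside and outside the contour, H is the Heaviside function and δ its derivative.

```latex
E_{cv}(c_1, c_2, \Phi) =
    \mu \int_{\Omega} \delta(\Phi)\,\lvert \nabla \Phi \rvert \, dx
  + \nu \int_{\Omega} H(\Phi) \, dx
  + \lambda_1 \int_{\Omega} \lvert I - c_1 \rvert^{2} H(\Phi) \, dx
  + \lambda_2 \int_{\Omega} \lvert I - c_2 \rvert^{2} \bigl(1 - H(\Phi)\bigr) \, dx
```

The first two terms penalise contour length and enclosed area, and the last two penalise intensity variation inside and outside the contour.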
where μ, ν, λ _{1} and λ _{2} are parameters weighting the importance of their respective penalty terms. The contour needs to be evolved until E _{cv} is minimised.
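A discrete version of such an energy can be sketched for a binary inside/outside mask as follows. This is only an illustration under stated assumptions: the contour length is approximated by counting 4-neighbour pixel pairs that straddle the boundary, the area is the number of inside pixels, and the function and parameter names are hypothetical.

```python
def chan_vese_energy(image, inside, mu=0.0, nu=0.0, lam1=1.0, lam2=1.0):
    """Discrete Chan-Vese-style energy of a binary segmentation mask."""
    height, width = len(image), len(image[0])
    in_vals = [image[y][x] for y in range(height) for x in range(width)
               if inside[y][x]]
    out_vals = [image[y][x] for y in range(height) for x in range(width)
                if not inside[y][x]]
    # Mean intensities inside (c1) and outside (c2) the contour.
    c1 = sum(in_vals) / len(in_vals) if in_vals else 0.0
    c2 = sum(out_vals) / len(out_vals) if out_vals else 0.0
    fit_in = sum((v - c1) ** 2 for v in in_vals)
    fit_out = sum((v - c2) ** 2 for v in out_vals)
    # Approximate contour length: neighbouring pixels on opposite sides.
    length = 0
    for y in range(height):
        for x in range(width):
            if x + 1 < width and inside[y][x] != inside[y][x + 1]:
                length += 1
            if y + 1 < height and inside[y][x] != inside[y + 1][x]:
                length += 1
    area = len(in_vals)
    return mu * length + nu * area + lam1 * fit_in + lam2 * fit_out

# A bright 2 x 2 object on a dark background (toy data).
image = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
good_mask = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
shifted_mask = [[0, 0, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]]

energy_good = chan_vese_energy(image, good_mask)
energy_shifted = chan_vese_energy(image, shifted_mask)
energy_with_length = chan_vese_energy(image, good_mask, mu=1.0)
```

The mask that coincides with the object has zero fitting energy, while the shifted mask is penalised, which is the behaviour the minimisation relies on.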
2.2 Level sets with shape priors
Shape priors can be very useful in segmentation, mainly when the object to be segmented is corrupted. This is often the case in real-world applications, such as maritime surveillance. In [13], it is shown how a partially occluded object can be accurately segmented. Shape knowledge can be added by modifying a variational energy functional designed for segmentation with an additional term that penalises deviation from a particular shape.
Parametric methods of incorporating shape information impose this knowledge directly on the level set by defining the level set as the output of some parametric function and minimising segmentation energy functionals with respect to these parameters. Tsai et al. [16] introduce a parametric level set function that is an affine transformed version of a prior-known shape, defined in a level set function Ψ, for the use of shape priors. Parameters controlling translation, scale and rotation of the original shape Ψ are then optimised, rather than the level set function itself.
The use of multiple shape priors is documented in the literature and involves the use of selective and competing shape priors. Tsai et al. [16] propose expressing a level set function as a parametric combination of the principal components from a set of training shapes. In [15, 17], the authors create a model of a set of prior shapes by assuming that shape priors follow pixel-wise Gaussian distributions. Cremers et al. [18] implement a shape model using a kernel density estimator to produce a statistical shape model that can model fairly distinct training shapes. A shape distance measure is then defined based on Chan and Zhu’s work [14].
3 Image processing approach to ship tracking
3.1 Traditional approach
The maritime environment is a particularly challenging one for tracking, and thus, the majority of works discussed here deal with very low-level solutions to the problem. As the ocean, constantly filled with moving waves, is prone to producing erroneous detections with methods that detect moving objects, some authors choose to characterise it and label pixels that do not match this characterisation as objects of interest. Sanderson et al. [3] implement an algorithm that does this using frequency information. Smith and Teal [19] implement a similar approach using a histogram-based descriptor of the appearance of the sea. Voles and Teal [20] propose a method that is based on the use of crude descriptors of tiles in an image; the image is divided into overlapping tiles that increase in size as they get closer to the bottom of the image. This method often results in imprecise segmentation results. Voles et al. [21] improve the method presented in [20] by adding motion information obtained using frame differencing; the algorithm designed is purely pixel-based and therefore fails to segment larger maritime objects.
Gupta et al. [22] describe the development of the Maritime Activity Analysis Workbench; this project aims at overcoming limitations of the current maritime surveillance systems. In [23], Ponsford et al. present the design and implementation of an integrated maritime surveillance system based on high-frequency surface wave radars. Leung et al. [24] combine a genetic algorithm and a radial basis function neural network to search for optimal values of a detector model. This model is then used to detect small surface targets in various sea conditions. In [25], an estimation of the ship size using an ANN-based clutter reduction system is proposed.
Socek et al. [26] present a method of combining existing object detection methods with colour information. It initially segments the foreground using background modelling with a Bayes decision framework that works best for backgrounds with complex variations that are not periodic. In a maritime environment, the algorithm suffered from poor segmentation results, with inaccurate boundaries and many scattered pixels. The authors seek to solve these issues by combining the results with those of colour-based segmentation. Colour segmentation is treated as a graph-partitioning problem, and combining it with the background-subtracted output results in enhanced performance. There have also been many other attempts to build maritime surveillance systems; some examples can be found in [27, 28].
3.2 Level set approach
Szpak and Tapamo [4] introduce an approach that uses level set methods to track detected maritime objects in a scene. Object detection is implemented using a modified method of single Gaussian background subtraction. Where normal background subtraction deals with pixels in isolation, a spatial-smoothness constraint is enforced to deal with neighbourhoods of pixels. The constraint assumes that real-world objects are spatially consistent entities and requires that a whole group of pixels, rather than single ones, exhibit motion behaviour before marking them as such. The output of this method is further segmented using level set methods. The contour is used in the tracking phase as described by Bertalmio et al. [29], where the final contour from the previous frame is used as the initial contour in the next. The algorithm was tested on 17 sequences and showed promising results, tracking successfully in all but three of them, and it even performed well in overcast and rainy conditions. It failed in three situations: where there was insufficient contrast between the ocean and the target, so that the specific motion of the target was not picked up; where the target moved too slowly and was thus considered part of the background; and where there was a lot of glint in the scene. Due to its high success rate and possible avenues for improvement, it was decided to base further work on the model proposed in [4].
4 Proposed model of ship tracking using level sets with shape priors
The fundamental contribution of the work presented in this paper is the proposition of a method of incorporating shape knowledge into the system discussed in [4].
4.1 Model overview
The input to the system is a sequence of greyscale image frames from a video of a maritime scene. The output of the system is ideally the same set of image frames with the maritime objects of interest highlighted by a level set contour. An overview of how the system functions is as follows:

The object tracker does not run until it has received a set of initial object positions from the object detector.

The object detector uses a background subtraction algorithm that only uses input frames periodically at a fixed spacing. Only once it has filled a buffer of frames is it able to produce an output, and so until then the tracker, and thus the system, has no output.

Once the buffer is full, the detector is able to output a set of objects to the tracker.

Once it has obtained a set of initial object positions, the object tracker continues to track these objects.
The object detector consists of a prefiltering stage, followed by a background subtraction algorithm. The resultant binary images are then filtered again by a postfiltering algorithm to remove false positives in the image. If it is desired to find a particular shape that is known beforehand, binary level set shape prior segmentation can be further applied in a level set filtering algorithm before the input to the object tracker.
4.2 Level set segmentation algorithm
While Szpak and Tapamo [4] used a general level set segmentation algorithm based on the work of Chan and Vese in [10], the work presented here deals with incorporating shape knowledge into the system. There is then a need for a different method to introduce shape priors.
4.2.1 Level set algorithm of Tsai et al
The system to be designed is based on the work of Tsai et al. [16]. The method presented in this work is based on a parametric function, which offers a number of benefits:

Parametric models allow for a faster evolution that is less prone to getting stuck in local minima as the energy is minimised directly by manipulating a few parameters rather than the entire contour.

Parametric models do not require function reinitialisation. The resultant segmenting level set function is always a transformed version of an original signed distance function, which itself is thus a signed distance function.

The limited degrees of freedom allow for more ‘brute force’ numerical methods of energy optimisation, which allow the contour to evolve according to any arbitrary energy functional without the need for symbolic differentiation.
Tsai et al. [16] deal with the incorporation of multiple shape priors in segmentation. For simplicity, however, the system described in this paper has been designed to use a single shape prior. For each sequence that is tested, the desired shape prior is manually set to the shape of an object that appears in that sequence. Simplification to a single shape prior allows the level set function to be manipulated using a set of pose parameters only (i.e. the one shape is known in advance and is given by the shape prior).
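With a single shape prior, the pose parameters can be evolved by gradient descent on the energy. Assuming, as in Tsai et al. [16], a pose made up of a translation (a, b), a scale h and a rotation θ, an update of the following form is consistent with the description of the symbols given next; the exact layout is an assumption.

```latex
p^{t+1} = p^{t} - \alpha_{p} \, \nabla_{p} E ,
\qquad p \in \{ a, \, b, \, h, \, \theta \}
```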
where p^{ t } and p^{ t+1} are the current and next values of p, respectively. ∇_{ p } E is the gradient of the energy with respect to p, and α _{ p } is a stepsize parameter controlling the speed of evolution.
4.2.2 Modifications to the work of Tsai et al
The gradient terms ∇_{ a } E, ∇_{ b } E, ∇_{ h } E, ∇_{ θ } E in Equation 15 are derived in [16] by symbolically differentiating the energy functional with respect to each of the parameters. This process is mathematically complex and undesirable. This section addresses how to estimate these gradient terms instead. Provided that the energy functional is differentiable with respect to its input parameters, the gradient with respect to each parameter can be estimated using a numerical method known as the central difference approximation, which avoids the difficult symbolic derivation. For level set evolution, the following notation is introduced:

Φ_{ a,b,h,θ } is the level set function defined by a,b,h,θ.

E(Φ_{ a,b,h,θ }) is the energy functional calculated from Φ_{ a,b,h,θ }.
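Using this notation, a central difference estimate of the gradient with respect to a takes the following form, where the perturbation size Δa is a symbol introduced here for illustration:

```latex
\nabla_{a} E \approx
\frac{E\left(\Phi_{a + \Delta a,\, b,\, h,\, \theta}\right)
    - E\left(\Phi_{a - \Delta a,\, b,\, h,\, \theta}\right)}{2\,\Delta a}
```

Each estimate requires two perturbed level set functions and their energies to be evaluated.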
This can similarly be repeated for the remaining gradient terms ∇_{ b } E, ∇_{ h } E and ∇_{ θ } E. These are then used to evolve the pose parameters normally according to Equation 15. Gradient estimation schemes are in fact more resource intensive than calculation from symbolically derived gradients as they require recalculation of two new level set functions and associated energy functionals every time a gradient is estimated. That being said, their main benefit is that any arbitrary energy functional can be plugged into the algorithm without the need for complex symbolic derivations.
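The estimation scheme described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: the energy is passed in as an arbitrary function of the pose (a, b, h, θ), and a simple quadratic `toy_energy` stands in for E(Φ_{a,b,h,θ}). The step sizes, perturbation sizes and function names are all assumptions.

```python
def central_difference_gradient(energy, pose, deltas):
    """Estimate dE/dp for each pose parameter with a central difference."""
    gradient = []
    for k, delta in enumerate(deltas):
        plus = list(pose)
        minus = list(pose)
        plus[k] += delta    # pose perturbed upwards in parameter k
        minus[k] -= delta   # pose perturbed downwards in parameter k
        gradient.append((energy(plus) - energy(minus)) / (2.0 * delta))
    return gradient

def evolve_pose(energy, pose, deltas, step_sizes, iterations):
    """Gradient descent on the pose using the estimated gradients."""
    pose = list(pose)
    for _ in range(iterations):
        gradient = central_difference_gradient(energy, pose, deltas)
        pose = [p - alpha * g
                for p, alpha, g in zip(pose, step_sizes, gradient)]
    return pose

def toy_energy(pose):
    """Stand-in energy with a known minimum at (3, -1, 2, 0)."""
    a, b, h, theta = pose
    return (a - 3.0) ** 2 + (b + 1.0) ** 2 + (h - 2.0) ** 2 + theta ** 2

start = [0.0, 0.0, 1.0, 0.5]
initial_gradient = central_difference_gradient(toy_energy, start, [1e-3] * 4)
final_pose = evolve_pose(toy_energy, start, deltas=[1e-3] * 4,
                         step_sizes=[0.1] * 4, iterations=200)
```

Because `energy` is treated as a black box, any other functional can be substituted without deriving its gradient, at the cost of two extra energy evaluations per parameter in every iteration.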
4.3 Object detector
4.3.1 Prefilter
To remove possible noise in the image before it is sent to the background subtraction stage, a prefilter may be used. Two prefilters have been proposed: the 3 × 3 Gaussian filter and the 3 × 3 Median filter.
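Both prefilters can be sketched with a shared 3 × 3 sliding window. The Gaussian weights used below (a common binomial approximation summing to 16) and the decision to leave border pixels unchanged are assumptions made for this illustration; the source does not specify them.

```python
from statistics import median

# Assumed 3 x 3 Gaussian approximation; the weights sum to 16.
GAUSSIAN_3X3 = [[1, 2, 1],
                [2, 4, 2],
                [1, 2, 1]]

def filter_3x3(image, reducer):
    """Apply `reducer` to every interior 3 x 3 window; borders are copied."""
    height, width = len(image), len(image[0])
    out = [list(row) for row in image]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [[image[y + dy][x + dx] for dx in (-1, 0, 1)]
                      for dy in (-1, 0, 1)]
            out[y][x] = reducer(window)
    return out

def median_reducer(window):
    """Median of the nine pixels in the window."""
    return median(v for row in window for v in row)

def gaussian_reducer(window):
    """Weighted average of the window using the assumed Gaussian weights."""
    total = sum(GAUSSIAN_3X3[r][c] * window[r][c]
                for r in range(3) for c in range(3))
    return total / 16.0

# Flat image with a single noisy pixel in the centre (toy data).
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 255
median_out = filter_3x3(noisy, median_reducer)
gaussian_out = filter_3x3(noisy, gaussian_reducer)
```

On this toy input the median filter removes the isolated spike entirely, whereas the Gaussian filter only spreads and attenuates it.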
4.3.2 Background subtraction
Once the image has been filtered, the system detects objects using a background modelling and subtraction algorithm. By producing a model of the background and subtracting it from the image, it is assumed that what remains will be objects of interest.
Take the example of a sequence of a fairly slowly moving object (such as a ship) captured using a high-frame-rate camera. Suppose a buffer of 50 previous frames is used to build a background model. The slow speed of the object combined with the high frame rate of the camera would probably result in very little object movement over these frames, so that most of the object would become part of the background model. By using frames that are spaced apart, the resultant background model is less likely to include the object because it will have moved over these frames. The disadvantage of this method is that it places an upper limit on how fast an object may move without being missed by the spaced frames. The spacing must be decided upon by the system designer. This is implemented in the decision block in Figure 2, where frames are only sent to the object detection stage at fixed intervals. This also means that the background subtraction stage is not able to provide a corresponding output for every input frame. This, however, is acceptable as the object tracker stage keeps track of objects for every frame after detection.
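The frame-spacing logic can be sketched as a small buffer that keeps only every s-th frame and reports when enough frames have been collected for the detector to produce an output. The class name and its interface are hypothetical, and the frames are represented by their indices for brevity.

```python
from collections import deque

class SpacedFrameBuffer:
    """Keeps the most recent `size` frames, sampled every `spacing` frames."""

    def __init__(self, size, spacing):
        self.spacing = spacing
        self.frames = deque(maxlen=size)  # oldest frames drop out automatically
        self.count = 0

    def push(self, frame):
        """Offer a frame; return True once the buffer is full."""
        if self.count % self.spacing == 0:
            self.frames.append(frame)
        self.count += 1
        return len(self.frames) == self.frames.maxlen

# Toy run: a buffer of 3 frames with a spacing of 5 frames.
buffer = SpacedFrameBuffer(size=3, spacing=5)
ready = [buffer.push(frame_index) for frame_index in range(12)]
first_ready = ready.index(True)
stored = list(buffer.frames)
```

With these toy settings the detector has no output for the first ten frames, which mirrors the start-up delay described in the model overview.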
4.3.3 Postfiltering
A major problem in using background subtraction algorithms alone for maritime surveillance is the motion of the sea. Although kernel density estimation would be able to filter out pixels which oscillate between two values, it would still classify a wave moving across the image, for example, as legitimate motion. It is for this reason that the binary image is filtered after background subtraction. This subsection details a number of different filters that can be used to remove these unwanted white pixels while keeping the desired ones.

Motion persistence filtering. Motion persistence filtering is a novel method introduced by this work that attempts to remove white pixels that only appear in a few background-subtracted frames. The logic behind motion persistence filtering is that while waves will produce legitimate motion pixels in a background subtraction algorithm, unlike that of ships, this motion is short-lived and may last for only a few frames.
Assuming an input set of motion images {B _{1}, B _{2}, …, B _{ t }}, this filter operates in a similar fashion to a two-dimensional kernel density estimator: for every white pixel in every image, a two-dimensional Gaussian kernel is placed (centred) over the pixel and its surrounding neighbours. The bandwidth of each Gaussian is set to be the distance to the nearest white pixel in the image. This builds a two-dimensional probability estimate, where more ‘persistent’ motion pixels have higher probabilities. The algorithm proceeds to find pixels connected to these high-density areas (over a fixed threshold) in the most recent motion frame B _{ t }. It was empirically decided to use three previous background-subtracted frames for motion persistence filtering. If background subtraction with frame spacing is used, there should be considerable changes in sea motion across these frames. The threshold for the method was likewise set at 0.000008.

Fixed-threshold connected component filtering. Connected component filtering is a low-level image processing technique that simply removes connected components (or blobs) from a binary image depending on some criteria. The first connected component filtering algorithm that was proposed simply removes blobs below a certain threshold on blob size using 8-connectivity to determine blobs.

Variable threshold connected component filtering. Voles and Teal note in [20] that because a maritime scene is an outdoor one with considerable depth of field, objects close to the camera are projected near the bottom of the image and thus appear larger than those further away from the camera. The second connected component filtering algorithm is based around this idea. Here the threshold on blob size is no longer a preset constant, but a linear function of a blob’s y-coordinate:
$\text{Threshold} = M(y).$ (24)

Spatial-smoothness filtering. Szpak and Tapamo [4] suggest that methods based on thresholding the area of connected components, as described above, are not suitable, as targets may be smaller than some waves in the image and thus be erroneously removed. While a variable threshold should solve this problem, their suggested method of spatial-smoothness filtering is implemented for comparison. This technique is built into the proposed single Gaussian background subtraction before pixels are thresholded and thus requires some modification; however, the expected behaviour is the same. For a pixel at (i, j) with probability f _{KDE}(I(i, j)), Γ is calculated for a window of 2r × 2c pixels around it as
$\Gamma = \sum_{p=-r}^{r} \sum_{q=-c}^{c} w_{i+p,j+q} \times f_{\text{KDE}}\left(I(i+p, j+q)\right).$ (25)
Γ is thus the weighted sum of the input pixel (i, j) and its neighbours’ probabilities. This effectively is a smoothing operation before the probability estimates are converted into a binary image in the background subtraction algorithm. The output of the background subtraction (BS) for this pixel is then modified as follows:
$\text{BS}(x,y) = \begin{cases} 1 & \text{if } \Gamma < \mathit{Th} \times 2r \times 2c \\ 0 & \text{otherwise}, \end{cases}$ (26)

where Th is the background subtraction threshold normally used. In [4], a 3 × 3 filter is used with constant weights with values of 1; these parameters will then be used for testing.
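The two connected component filters listed above can be sketched with a single routine that takes the size threshold as a function of the blob's row. Blobs are found with a breadth-first search using 8-connectivity. The use of the blob's centroid row as its y-coordinate, and the particular slope and intercept of the linear function M(y), are assumptions made for illustration only.

```python
from collections import deque

def label_blobs(binary):
    """Return the blobs of a binary image as lists of (y, x), 8-connected."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for y in range(height):
        for x in range(width):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            blob = []
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            blobs.append(blob)
    return blobs

def filter_blobs(binary, threshold_at_row):
    """Keep only blobs whose area reaches the threshold for their row."""
    out = [[0] * len(binary[0]) for _ in binary]
    for blob in label_blobs(binary):
        centroid_row = sum(y for y, _ in blob) / len(blob)  # assumed y measure
        if len(blob) >= threshold_at_row(centroid_row):
            for y, x in blob:
                out[y][x] = 1
    return out

mask = [[1, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 1, 1, 0],
        [0, 0, 0, 1, 1, 0],
        [1, 0, 0, 0, 0, 0]]

fixed_out = filter_blobs(mask, lambda y: 2)            # constant threshold
variable_out = filter_blobs(mask, lambda y: 1.0 + y)   # hypothetical M(y)
diagonal_blobs = label_blobs([[1, 0], [0, 1]])
```

With the fixed threshold only the 2 × 2 blob survives, while the hypothetical linear threshold keeps the single pixel near the top of the image (where distant objects appear small) and removes both blobs lower down.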
4.3.4 Level set filtering
The first term of this energy penalises inner mean values (c _{1}) that are not equal to 1, while the second term penalises outer mean values c _{2} that are not equal to 0. This energy is minimised according to the modified version discussed above. The segmentation is applied for every blob or connected component in the image in isolation. Naturally, the blob most likely to be the object sought is that with the lowest energy.
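The behaviour described above can be illustrated with a simple stand-in energy on a binary image. The exact functional is not reproduced here; the form used below, E = (1 − c_1)² + c_2², is an assumption chosen only because it penalises inner means different from 1 and outer means different from 0. The function name and toy data are likewise hypothetical.

```python
def binary_shape_energy(binary, inside):
    """Assumed energy: (1 - c1)^2 + c2^2 for a candidate inside-mask."""
    in_vals = [b for b_row, m_row in zip(binary, inside)
               for b, m in zip(b_row, m_row) if m]
    out_vals = [b for b_row, m_row in zip(binary, inside)
                for b, m in zip(b_row, m_row) if not m]
    c1 = sum(in_vals) / len(in_vals) if in_vals else 0.0    # inner mean
    c2 = sum(out_vals) / len(out_vals) if out_vals else 0.0  # outer mean
    return (1.0 - c1) ** 2 + c2 ** 2

# Binary image containing a single 2 x 2 blob (toy data).
binary = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
matching = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
too_large = [[1, 1, 1, 1] for _ in range(4)]

energy_matching = binary_shape_energy(binary, matching)
energy_too_large = binary_shape_energy(binary, too_large)
```

A shape prior that fits the blob exactly yields zero energy, whereas a poorly fitting contour yields a higher value, so ranking blobs by their final energy favours the one that best matches the prior.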
4.4 Object tracker
Once an object has been detected, its shape and position are known for a single frame. It is necessary to track the object for every frame thereafter. To do this, the object tracker makes use of a single level set function that evolves to sit around the object in each frame. The level set function is initialised in the image using the object detector and can come directly from the level set filtering stage of the object detector in the form of a single shape, or from the binary image at the output of the filtering stage in the form of a set of shapes. Assuming that the level set contour surrounds the object correctly in the first frame after detection, the tracker makes use of pixel information that is within the contour. The initial contour and its inner pixel information in the first frame are henceforth known as the target contour and target model, respectively. For every subsequent frame, the current level set contour (which now probably will not lie around the object) and the information about its inner pixels are known as the candidate contour and candidate model, respectively. By creating an energy functional that penalises deviations of the candidate model from the target model, one is able to force the candidate contour around objects appearing similar to those surrounded by the target contour in the original frame. Different energy functionals may be created by comparing the target and candidate models in various ways. The various functionals are discussed next.
4.4.1 Energy functionals for tracking
The following are energy functionals used for tracking:

Histogram. The simplest feature that can be drawn from the pixels is the histogram. Pixels are put into k bins, where $h_i^t$ is the number of pixels that fall into the i-th bin for the target t, and $h_i^c$ for the candidate c. The energy is the sum of squared differences of the bins:
$E = \sum_{i=1}^{k} \left(h_i^t - h_i^c\right)^2.$ (29)

Fast Fourier transform. Frequency information may be utilised to make the feature more invariant to changes in lighting. Given a bounding box around the contour M × N pixels in size, a modified Fast Fourier transform is used to only extract frequency information from pixels within the contour:
$F_{\Phi}(x,y) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} H\left(\Phi(m,n)\right) I(m,n)\, e^{-i 2\pi \left(\frac{xm}{M} + \frac{yn}{N}\right)}.$ (30)
The energy function is then defined as the difference in target and candidate spectra:
$E = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \left(F_{\Phi_c} - F_{\Phi_t}\right).$ (31)

Statistical descriptors. Statistical descriptors of the target pixels can be calculated. This approach has been used previously in maritime tracking work by Voles and Teal in [20]. The following descriptors in Table 1 have been modified to suit a level set case, once again for a bounding box around the contour M × N pixels in size:
After normalising with respect to a maximum value, these descriptors can be thought of as vectors that form a basis for a 4D space. The target contour’s pixel distribution is then represented as a point within this space, $D^t = [d_1^t, d_2^t, d_3^t, d_4^t]$, and similarly so for a candidate contour, $D^c = [d_1^c, d_2^c, d_3^c, d_4^c]$. The Euclidean distance between these two points can then be used as the energy functional:
$E = \sqrt{\sum_{k=1}^{4} \left(d_k^t - d_k^c\right)^2}.$ (32)
Table 1 Various statistical descriptors
Descriptor  Formula

Energy  $d_1 = \sum_{m=0}^{M} \sum_{n=0}^{N} H\left(\Phi(m,n)\right) \cdot I(m,n)^2$
Entropy  $d_2 = \sum_{m=0}^{M} \sum_{n=0}^{N} H\left(\Phi(m,n)\right) \cdot I(m,n) \cdot \log\left(I(m,n)\right)$
Homogeneity  $d_3 = \sum_{m=0}^{M} \sum_{n=0}^{N} \frac{H\left(\Phi(m,n)\right) \cdot I(m,n)}{1 + |m - n|}$
Contrast  $d_4 = \sum_{m=0}^{M} \sum_{n=0}^{N} H\left(\Phi(m,n)\right) \cdot (m - n)^2 \cdot I(m,n)$
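The histogram energy and the Euclidean descriptor distance described above are straightforward to compute once the pixels inside the contour have been selected. The sketch below assumes 8-bit intensities, a binary inside-mask in place of H(Φ), and equal-width bins; the function names are hypothetical.

```python
import math

def masked_histogram(image, inside, bins, max_value=256):
    """Histogram of the pixels inside the contour, using equal-width bins."""
    hist = [0] * bins
    for row, mask_row in zip(image, inside):
        for value, is_inside in zip(row, mask_row):
            if is_inside:
                index = min(int(value * bins / max_value), bins - 1)
                hist[index] += 1
    return hist

def histogram_energy(h_target, h_candidate):
    """Sum of squared bin differences between target and candidate."""
    return sum((t - c) ** 2 for t, c in zip(h_target, h_candidate))

def descriptor_energy(d_target, d_candidate):
    """Euclidean distance between two normalised descriptor vectors."""
    return math.sqrt(sum((t - c) ** 2
                         for t, c in zip(d_target, d_candidate)))

# Toy 2 x 2 image with one pixel per bin.
image = [[0, 64],
         [128, 255]]
all_inside = [[1, 1], [1, 1]]
diagonal_only = [[1, 0], [0, 1]]

target_hist = masked_histogram(image, all_inside, bins=4)
candidate_hist = masked_histogram(image, diagonal_only, bins=4)
hist_energy = histogram_energy(target_hist, candidate_hist)
desc_energy = descriptor_energy([0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0])
```

Both energies are zero when the candidate model matches the target model exactly and grow as the two diverge.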
4.4.2 Normalisation for rotation/scale invariance
This transforms both the candidate function and the pixels it contains to the same scale and rotation as the target function. The transformed function ${\stackrel{~}{\mathrm{\Phi}}}_{c}$ and image $\stackrel{~}{I}$ are then used in the evaluation of the energy functional. It should be emphasised that the values of h and Φ do not change and that the original candidate contour and image remain intact: their transformed values are used exclusively for evaluating energy functionals.
5 Experimental results and discussion
The system was implemented in MATLAB and tested using a set of ten maritime sequences obtained from the Council for Scientific and Industrial Research (South Africa). These sequences include a variety of scenes, weather conditions and maritime objects of interest. A specific target object was chosen for each of the sequences and used to test both object detection and tracking. This section introduces performance metrics regarding object detection and tracking and uses these metrics to compare the various choices for detection and tracking described previously. Thereafter, the final system is proposed.
5.1 Performance metrics
5.1.1 Object detection
where E(x) is the final energy obtained from the level set segmentation of blob x in the image and t is the index of the blob belonging to the object that is to be found. This essentially measures the contrast between the energy associated with segmentation of the actual object and the blob with the lowest energy of the remaining blobs. If the desired object has the lowest energy, P will be greater than 1, indicating a correct classification. However, if another blob has a lower energy than the desired object, P will be less than 1.
where A and B are the ground truth and resultant segmentation regions, respectively, and s varies from 0 to 1 depending on the proportion of pixels shared between the ground truth and the segmenting contour.
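The P score described above compares the final energy of the target blob with the lowest energy among the remaining blobs. One formula consistent with that description, P = min_{x ≠ t} E(x) / E(t), is assumed in the sketch below; the function name and the example energies are hypothetical.

```python
def p_score(energies, target_index):
    """Ratio of the lowest non-target energy to the target blob's energy."""
    target_energy = energies[target_index]
    other_energies = [e for i, e in enumerate(energies) if i != target_index]
    return min(other_energies) / target_energy

# Hypothetical final energies for three blobs in one frame.
energies = [2.0, 8.0, 4.0]
score_when_target_is_lowest = p_score(energies, target_index=0)
score_when_target_is_not_lowest = p_score(energies, target_index=1)
```

The sketch requires at least two blobs and a non-zero target energy. A value above 1 means the target blob has the lowest energy and would be classified correctly.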
5.1.2 Object tracking
5.2 Optimisation
There are a number of parameters that need to be decided upon in the object detection and tracking subunits. Given the performance metrics defined in the previous section, the parameter values are deduced in an optimal way.
5.2.1 Object detection
While the common strategy of splitting a dataset into two-thirds for training and one-third for testing has been shown to work well for reasonably sized datasets (over 100 cases [40]), it was used for the system dataset despite its small size. The training could be repeated over a larger dataset as future work; the concepts are emphasised here, with the term ‘optimal’ being used loosely. Sequences 1, 3, 4, 5, 8 and 9 were randomly chosen as a training set for optimisation, while the remaining sequences were used for testing.
Discussion of the two possible prefilters, the 3×3 Gaussian filter and the 3×3 Median filter, is postponed until last. No optimisation can be performed at the moment, but later in this paper, performance of the two filters is compared.

Background subtraction. In order to achieve a more satisfactory result, the background subtraction algorithm must be optimised with respect to the following:

The number of frames used for kernel density estimation (n)

The spacing between each frame (s)

The probability threshold used (Th)
These variables were adjusted to maximise the F _{2} score calculated for a small window around an object of interest in an image. A brute force search was run to find optimal parameters for each of the abovementioned training sequences. For every sequence, the values of n,s and Th were varied and the cost function was measured. Those that resulted in the lowest cost function were considered optimal for each sequence and are shown in Table 2.
The averages of these parameters were used in the final algorithm, on the expectation that they would give reasonable results across most of the sequences. The chosen parameters are therefore as follows: $n = 16,\quad s = 11 \quad \text{and} \quad \mathit{Th} = 0.0117$
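The averaging step can be checked directly against the per-sequence values listed in Table 2. The short sketch below only reproduces that arithmetic; the variable names are illustrative.

```python
# Per-sequence optimal parameters from Table 2: sequence -> (n, s, Th).
table2 = {
    1: (19, 10, 0.015),
    3: (19, 15, 0.005),
    4: (18, 14, 0.01),
    5: (14, 13, 0.015),
    8: (8, 8, 0.01),
    9: (18, 5, 0.015),
}


def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    values = list(values)
    return sum(values) / len(values)


n_avg = mean(row[0] for row in table2.values())   # 96 / 6
s_avg = mean(row[1] for row in table2.values())   # 65 / 6
th_avg = mean(row[2] for row in table2.values())  # 0.07 / 6

# Round to the precision quoted in the text.
n_final = round(n_avg)
s_final = round(s_avg)
th_final = round(th_avg, 4)
```

Rounding the three means gives n = 16, s = 11 and Th = 0.0117, which agrees with the values quoted above.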

Postfiltering. Out of the possible postfilters listed, the fixed threshold connected component algorithm has an optimal threshold based on blob size, while the variable threshold algorithm has an optimal threshold function that may be used. The algorithm has to filter out superfluous blobs while keeping those that belong to actual maritime objects, so a lower threshold for blob size must be selected. To find an optimal fixed threshold for fixed threshold connected component filtering, the area of the smallest ground truth object in each sequence was measured. Table 3 shows the area of the smallest ground truth object in each sequence.
Table 2 Optimal background subtraction parameters for various sequences
Sequence number  Optimal n  Optimal s  Optimal Th 

1  19  10  0.015 
3  19  15  0.005 
4  18  14  0.01 
5  14  13  0.015 
8  8  8  0.01 
9  18  5  0.015 
Table 3 Area of the smallest ground truth object in each sequence
Sequence  Object area 

1  458 
3  1047 
4  696 
5  677 
8  70 
9  1468 
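Fixed threshold connected component filtering can be sketched as follows. The sketch assumes 8-connectivity and a binary mask stored as a non-empty list of rows; neither detail is stated in the text. The threshold actually chosen by the authors is not reproduced here. The example only illustrates the constraint implied by Table 3: a fixed threshold must not exceed the smallest ground truth area (70 pixels, sequence 8) if every ground truth object is to survive filtering.

```python
from collections import deque


def filter_small_blobs(mask, min_area):
    """Return a copy of a binary mask keeping only blobs with at least min_area pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill collects one 8-connected blob.
            blob, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(blob) >= min_area:
                for y, x in blob:
                    out[y][x] = 1
    return out


# Smallest ground truth object area per training sequence (Table 3).
smallest_areas = {1: 458, 3: 1047, 4: 696, 5: 677, 8: 70, 9: 1468}
upper_bound = min(smallest_areas.values())  # largest threshold that keeps every object

# Toy mask with a 3-pixel blob and an isolated single pixel.
toy_mask = [
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
cleaned = filter_small_blobs(toy_mask, min_area=2)
```

On the toy mask, the isolated pixel is removed and the 3-pixel blob is kept. On real foreground masks the same function would be called with a threshold at or below `upper_bound`.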
5.2.2 Object tracking
5.3 Object detection results
Given the deduced background subtraction and postfiltering parameters, the performance of the object detection subsystems is now measured, and the choice of prefilter is addressed. The optimised object detection algorithm was run on the test set of sequences: sequences 2, 6, 7 and 10. The system was broken into its subsystems, each of which was tested individually. Subsystems that have a number of different algorithms at their disposal were tested using each of these algorithms, and the results were compared.
5.3.1 Background subtraction and prefilter
5.3.2 Postfiltering
5.3.3 Level set filtering
P scores for various image sequences using Chan-Vese energy as the classification criterion
Sequence  P score  Classification (pass/fail) 

2  6.9316  Pass 
6  0.4642  Fail 
7  4.2569  Pass 
10  13.1468  Pass 
Segmentation results from image sequences that passed level set filtering
Sequence  Dice’s coefficient 

2  0.8698 
7  0.6113 
10  0.6667 
5.4 Object tracking results
Performance metrics for each sequence
Sequence  TDR  OTE 

1  1  1.8795 
2  1  3.7144 
3  1  1.4506 
4  0.85  10.799 
5  0.71  4.8505 
6  0.38  7.2204 
7  1  20.62 
8  1  2.7759 
9  0.7933  1.5088 
10  1  2.6944 
Poor tracking results in sequence 6 can be attributed to the small object size and to the object's frequency characteristics being similar to those of the ocean, whereas camera panning spoiled the otherwise perfect tracking results in sequence 9. Although the tracking contour successfully overlapped the object in every frame of sequence 7, the sequence has a comparatively high object tracking error of 20.62. This is due to a local minimum in the tracking energy functional under the object, which left the tracking contour offset from its target in most frames.
5.5 Speed of algorithm
On a 2.0 GHz dual-core processor, our system could process approximately ten frames per second with a MATLAB implementation of the algorithms. This performance could be greatly improved with a C++ implementation and by parallelising some of the algorithms (e.g. background subtraction). The running time of the video tracker has purposefully been given lower priority in order to focus on the development of concepts and methodology. Since this work addresses unexplored topics, the main focus is on developing models and algorithms that address fundamental issues, rather than on supplementary practical issues such as algorithm speed, which may be optimised in future work for real-time computing.
6 Conclusions
This paper has investigated the use of prior knowledge of a ship's shape to assist level set segmentation in video tracking for a maritime surveillance problem. It shows that integrating shape priors into level set segmentation is feasible and results in promising video tracking performance. While the system did produce an acceptable set of results, it still relies on some assumptions that would not be practical in a real-life situation. Future work would allow for relaxation of these assumptions. The system uses only a single shape prior, which must be manually preset for every sequence; this would not be feasible in a real-life system. To remove the reliance on the user, a bank of multiple training shapes could be modelled using a method such as kernel density estimation. This model would then be used in place of a fixed shape prior for segmentation. It should also be noted that the system would be far more accurate if trained with a larger training set.
Declarations
Acknowledgements
The authors gratefully acknowledge PRISM/CSIR for funding this research. We would also like to thank the Optronic Sensor Systems at the Center for Scientific and Industrial Research (CSIR) for making the dataset available.
Authors’ Affiliations
References
 Stubblefield G, Parlatore B: Condition yellow. PassageMaker, Winter. 1999, 85.
 Piracy attacks in East and West Africa dominate world report (ICC Commercial Crime Services, 2012). http://www.iccccs.org/news/711piracyattacksineastandwestafricadominateworldreport. Accessed 09 Nov 2012
 Sanderson J, Teal M, Ellis T: Characterisation of a complex maritime scene using Fourier space analysis to identify small craft. In Seventh International Conference on Image Processing and its Applications (Conf. Publ. No. 465). Manchester: IEE; 1999:803–807.
 Szpak ZL, Tapamo JR: Maritime surveillance: tracking ships inside a dynamic background using a fast level-set. Expert Syst Appl. 2011, 38(6):6668–6680.
 Herselman PL, Baker CJ, de Wind HJ: An analysis of X-band calibrated sea clutter and small boat reflectivity at medium-to-low grazing angles. Int. J. Navigation Observation. 2008. 10.1155/2008/347518
 Vicen-Bueno R, Carrasco-Alvarez R, Rosa-Zurera M, Nieto-Borge JC: Sea clutter reduction and target enhancement by neural networks in a marine radar system. Sensors. 2009, 2009(9):1913–1936.
 Westall P, Ford J, O’Shea P, Hrabar S: Evaluation of machine vision techniques for aerial search of humans in maritime environments. In Digital Image Computing: Techniques and Applications (DICTA) 2008. Canberra; 1–3 Dec. 2008:176–183.
 Maistrou A: Level set methods - overview. 2008. http://campar.in.tum.de/twiki/pub/Chair/TeachingSs08EvolvingContoursHauptseminar/ActiveContours.pdf
 Kass M, Witkin A, Terzopoulos D: Snakes: active contour models. Int J. Comput. Vis. 1988, 1(4):321–331.
 Chan T, Vese LA: Active contours without edges. IEEE Trans Image Process. 2001, 10(2):266–277.
 Osher S, Sethian J: Fronts propagating with curvature-dependent speed: algorithms based on Hamilton-Jacobi formulations. J. Comput. Phys. 1987, 79: 12–49.
 Cremers D: Dynamical statistical shape priors for level set based tracking. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28(8):1262–1273.
 Bashir F, Porikli F: Performance evaluation of object detection and tracking systems. In IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS). New York: IEEE Computer Society; 2006:7–14.
 Chan T, Zhu W: Level set based shape prior segmentation. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 of CVPR 2005. San Diego; 20–25 June 2005:1164–1170.
 Rousson M, Paragios N: Shape priors for level set representations. Lecture Notes in Computer Science, vol. 2351. In Computer Vision – ECCV 2002. Edited by: Heyden A, Sparr G, Nielsen M, Johansen P. Heidelberg: Springer; 2002:78–92.
 Tsai AJ, Yezzi A, Wells W, Tempany D, Tucker C, Fan A, Grimson W, Willsky A: A shape-based approach to the segmentation of medical imagery using level sets. IEEE Trans. Med. Imaging 2003, 22(2):137–154. 10.1109/TMI.2002.808355
 Rousson M, Paragios N, Deriche R: Implicit active shape models for 3D segmentation in MRI imaging. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2004. Edited by: Barrillot C, Haynor DR, Hellier P. Heidelberg: Springer; 2004:209–216.
 Cremers D, Osher S, Soatto S: Kernel density estimation and intrinsic alignment for shape priors in level set segmentation. Int. J. Comput. Vis 2006, 60(3):335–351.
 Smith AA, Teal M: Identification and tracking of maritime objects in near-infrared image sequences for collision avoidance. In Seventh International Conference on Image Processing and its Applications (Conf. Publ. No. 465). Manchester: IEE; 1999:250–254.
 Voles P, Teal M: Maritime scene segmentation. http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/VOLES/marine.html. Accessed 12/04/2012
 Voles P, Teal M, Sanderson J: Target identification in a complex maritime scene. In IEE Colloquium on Motion Analysis and Tracking. London: IEE; 1999:15/1–15/4.
 Gupta KM, Aha DW, Hartley R, Moore PG: Adaptive maritime video surveillance. In Proceedings of SPIE09, Volume 7346 of Visual Analytics for Homeland Defense and Security. Bellingham: SPIE; 2009.
 Ponsford AM, Sevgi L, Chan H: An integrated maritime surveillance system based on high-frequency surface-wave radars. 2. Operational status and system performance. IEEE Antennas Propagation Mag 2001, 43(5):52–63. 10.1109/74.979367
 Leung H, Dubash N, Xie N: Detection of small objects in clutter using a GA-RBF neural network. IEEE Trans. Aerosp. Electron. Syst 2002, 38: 98–118. 10.1109/7.993232
 Vicen-Bueno R, Carrasco-Alvarez R, Rosa-Zurera M, Nieto-Borge J, Jarabo-Amores M: Artificial neural network-based clutter reduction systems for ship size estimation in maritime radars. EURASIP J. Adv. Signal Process 2010, 2010(9):380473.
 Socek D, Culibrk D, Marques O, Kalva H, Furht B: A hybrid color-based foreground object detection method for automated marine surveillance. Lecture Notes in Computer Science, vol. 3708. In Advanced Concepts for Intelligent Vision Systems – ACIVS 2005. Edited by: Blanc-Talon J, Philips W, Popescu D, Scheunders P. Heidelberg: Springer; 2005:340–347.
 Fefilatyev S, Goldgof D, Lembke C: Tracking ships from fast moving camera through image registration. In 2010 International Conference on Pattern Recognition. Los Alamitos: IEEE Computer Society; 2010:3500–3503.
 Seibert M, Rhodes BJ, Bomberger NA, Beane PO, Sroka JJ, Kogel W, Kreamer W, Stauffer C, Kirschner L, Chalom E, Bosse M, Tillson R: SeeCoast port surveillance. In Proceedings of SPIE Vol. 6204: Photonics for Port and Harbor Security II. Orlando; 17 Apr 2006.
 Bertalmio M, Sapiro G, Randall G: Morphing active contours. IEEE Trans. Pattern Anal. Mach. Intell 2000, 22(7):733–737. 10.1109/34.865191
 Yilmaz A, Javed O, Shah M: Object tracking: a survey. ACM Comput. Surv 2006, 38(4):1320. 10.1145/1177352.1177355
 Bibby C, Reid I: Real-time tracking of multiple occluding objects using level sets. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos: IEEE; 2010:1307–1314.
 Shi Y, Karl W: Real-time tracking using level sets. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos: IEEE; 2005:34–41.
 Elgammal A, Duraiswami R: Background and foreground modeling using nonparametric kernel density estimation for visual surveillance. Proc. IEEE 2002, 90: 1151–1163. 10.1109/JPROC.2002.801448
 Silverman B: Density estimation for statistics and data analysis. Monogr. Stat. Appl. Probability 1986, 1: 122.
 Harvey A, Oryshchenko V: Kernel density estimation for time series data. Int. J. Forecasting 2012, 28: 3–14. 10.1016/j.ijforecast.2011.02.016
 Yezzi A, Tsai A, Willsky A: A statistical approach to snakes for bimodal and trimodal imagery. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Volume 2 of ICCV 1999. Kerkyra; 20–27 Sept 1999:898–903.
 Hripcsak G, Rothschild A: Agreement, the F-measure, and reliability in information retrieval. J. Am. Med. Inf. Assoc 2005, 12(3):296–298. 10.1197/jamia.M1733
 Krishnaveni M, Radha V: Quantitative evaluation of segmentation algorithms based on level set method for ISL datasets. Int. J. Comput. Sci. Eng 2011, 3(2):2361–2369.
 Dice L: Measures of the amount of ecologic association between species. Ecology 1945, 26(3):297–302. 10.2307/1932409
 Dobbin K, Simon R: Optimally splitting cases for training and testing high dimensional classifiers. BMC Med. Genomics 2011, 41(7):131.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.